The Vyper programming language is a statically typed, Pythonic, high-level, domain-specific smart contract language for the Ethereum Virtual Machine (EVM). It focuses on readability and a high level of abstraction from the EVM’s instruction set. In this article we will dive into not just the semantics of the Vyper language, but also some behaviors of the Vyper compiler, such as memory and storage layout, as of version 0.3.7. This article assumes beginner to intermediate knowledge of the EVM, Solidity, and smart contract engineering, but no prior knowledge of Vyper.
Syntax
Variables
Types
Vyper's types include primitives such as signed and unsigned integers, address, fixed-point decimals, statically sized bytes, and boolean. Complex types include user-defined structs, user-defined enums, dynamically sized arrays, dynamically sized strings and bytes, and storage mappings. All types must be explicitly declared.
Note, the enum type is listed under complex types, though it is a compile-time abstraction over a 256-bit bitmap.
Note, dynamic types must have an upper bound known at compile time. This applies to storage, memory, and calldata values and directly determines how memory and storage are allocated.
Scoping
Variables declared at the global scope are treated as persistent storage variables, accessible as members of the “self” object. Storage variables are internal by default, that is to say, they are not directly accessible via an external call. A storage variable whose type is wrapped by “public” has a getter function generated at compile time. A variable whose type is wrapped by “constant” does not occupy storage, is accessed directly without the “self” object, and must be assigned a value known at compile time. A variable whose type is wrapped by “immutable” behaves as constants do, except that its value may be assigned in the initialization function rather than at compile time (more on this in the functions section).
Variables declared in the scope of a function, loop, or branch (local variables) are accessible until the end of the current scope. Local variables are stored in memory based on the memory layout specified under the compiler behavior section.
Control Flow
Branches
Conditional branches in Vyper are defined with the “if”, “elif”, and “else” keywords, as in Python. The “if” keyword must be followed by a boolean expression; if it resolves to true, the subsequent block is executed. The “elif” keyword is syntax sugar for “else if” and is likewise followed by a boolean expression whose block is executed if it resolves to true and all previous conditions were false. The “else” keyword is followed by a block that is executed if none of the previous “if” or “elif” expressions resolved to true.
Iteration
Arrays and integer ranges may be iterated with the “for” keyword. Arrays are iterated element by element, while integers are iterated from a start value up to, but not including, a stop value using the built-in range function.
Functions
Functions are declared with optional decorators, typed argument(s), return type(s), and a body.
Decorators
Decorators are prefixed by “@” and must be on the line(s) directly above their function.
The external decorator makes the function callable from outside the contract. Arguments are ABI-encoded in calldata, and the return keyword ABI-encodes the return value and sends it back to the external caller as returndata.
The internal decorator makes the function callable only from within the same contract. Arguments are pointers to memory and the return keyword leaves the relevant pointers on the stack.
Note, the differences in behavior between internal and external functions are largely consistent across EVM languages, despite minor implementation differences.
The mutability decorators are pure, view, and payable. Pure functions neither read nor write state and have no side effects. View functions may read but do not write state and have no side effects. Payable functions may read and write state, may have side effects, and may receive a non-zero amount of Ether attached to the call. By default, a function is non-payable, which behaves like a payable function except that execution reverts if a non-zero amount of Ether is attached to the call.
The nonreentrant decorator wraps the function with a reentrancy lock. On entry it asserts that the lock is not already held, and it releases the lock on exit, which prevents same-function reentrancy. Each nonreentrant decorator must be given a string that serves as a “key”. Functions with matching keys share the same lock, which prevents cross-function reentrancy.
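To make the key semantics concrete, here is a minimal Python model of keyed reentrancy locks. It is only an illustration: the `nonreentrant` decorator, the `Vault` class, and its `callback` hook are hypothetical names, the `locks` dictionary stands in for the per-key storage slots, and a Python exception stands in for an EVM revert.

```python
import functools


def nonreentrant(key):
    """Toy model of Vyper's nonreentrant decorator: one shared lock per key."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            # Vyper checks the lock for `key` on entry and reverts if it is held.
            if self.locks.get(key, False):
                raise RuntimeError(f"reentrant call blocked by lock {key!r}")
            self.locks[key] = True
            try:
                return fn(self, *args, **kwargs)
            finally:
                # The lock is released on exit (a real revert would also undo it).
                self.locks[key] = False
        return wrapper
    return decorator


class Vault:
    """Hypothetical contract whose withdraw() makes an untrusted external call."""

    def __init__(self, callback):
        self.locks = {}           # key -> bool, stands in for the lock slots
        self.callback = callback  # simulated external call target

    @nonreentrant("lock")
    def withdraw(self):
        self.callback(self)  # the callee may try to call back into the vault
        return "withdrawn"

    @nonreentrant("lock")
    def transfer(self):
        return "transferred"

    @nonreentrant("other")
    def poke(self):
        return "poked"
```

A callback into `transfer()` during `withdraw()` is rejected because both functions share the `"lock"` key, which is the cross-function case; a callback into `poke()` succeeds because it uses a different key.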
Special Functions
The initialization function “__init__” defines behavior during the creation of the contract. In this function, immutable variables must be assigned, storage variables may be assigned, and other code may be executed. At the end of the initialization function, the runtime bytecode is implicitly returned, completing the deployment of the contract.
The default function “__default__” defines behavior when the contract is called but no other function is matched. This is comparable to Solidity’s fallback function. Adding the payable decorator to the default function is comparable to Solidity’s receive function.
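The Python sketch below models how a call is routed to a matching function, to `__default__`, or to a revert, including the non-payable check described earlier. It is a simplified model, not the compiler's actual dispatcher: `dispatch`, the handler table, and `RuntimeError` as a stand-in for a revert are assumptions. Function selectors are the first four bytes of the keccak256 hash of the function signature; since Python's standard library has no keccak256, two well-known ERC-20 selectors are hardcoded.

```python
# Well-known ERC-20 selectors: the first 4 bytes of keccak256(signature).
TRANSFER = bytes.fromhex("a9059cbb")    # transfer(address,uint256)
BALANCE_OF = bytes.fromhex("70a08231")  # balanceOf(address)


def dispatch(calldata, value, functions, default=None):
    """Route a call by selector, falling back to __default__ if one exists.

    functions maps a 4-byte selector to a (handler, payable) pair, and
    default is the (handler, payable) pair for __default__, or None.
    """
    entry = functions.get(calldata[:4])
    if entry is None:
        if default is None:
            raise RuntimeError("revert: no matching function and no __default__")
        entry = default
    handler, payable = entry
    # Non-payable functions revert when Ether is attached to the call.
    if value != 0 and not payable:
        raise RuntimeError("revert: function is not payable")
    return handler(calldata)


functions = {
    TRANSFER: (lambda data: "transfer", False),     # non-payable
    BALANCE_OF: (lambda data: "balanceOf", False),  # non-payable
}
default = (lambda data: "default", True)  # payable __default__, like receive()

print(dispatch(TRANSFER + bytes(64), 0, functions, default))  # transfer
print(dispatch(b"", 10**18, functions, default))              # default
```

Plain Ether transfers carry empty calldata, so they never match a selector and land in the payable `__default__`, which is why it resembles Solidity's receive function.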
Events
Events may be defined with a name, typed arguments, and optionally up to three indexed arguments. An argument is marked as indexed by wrapping its type with “indexed”. Events are logged using the log keyword, which is only permitted in functions that may modify state, that is, non-payable or payable functions, not view or pure ones.
Built-ins
Vyper has a number of built-in functions that perform a range of operations, from type-checked low-level operations to elliptic curve arithmetic.
Chain Interactions
Chain interaction functions perform operations such as deploying contracts from blueprints or as proxies, making low-level external calls, logging, and reverting.
Cryptography
Cryptography functions include elliptic curve point addition and multiplication on the AltBN128 curve, recovery of the signer’s address from a secp256k1 signature, and hashing with the Keccak256 and SHA256 algorithms.
Data Manipulation
Data manipulation functions include concatenation, type conversion, and slicing. They also include integer-to-string conversion and extraction of 32 bytes from dynamic byte arrays.
Maths
Math functions include absolute value, rounding, min and max comparisons, max value for a data type, modular arithmetic, unchecked arithmetic, and square root operations.
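As a rough illustration of the difference between Vyper's default checked arithmetic and the unchecked built-ins, here is a Python model of 256-bit unsigned addition. The helper names mirror Vyper's `unsafe_add` and `max_value` built-ins, but they are plain Python stand-ins, and a `RuntimeError` models the revert.

```python
UINT256_MAX = 2**256 - 1  # the value max_value(uint256) returns in Vyper


def checked_add(a, b):
    """Model of Vyper's default `a + b` on uint256: reverts on overflow."""
    result = a + b
    if result > UINT256_MAX:
        raise RuntimeError("revert: uint256 overflow")
    return result


def unsafe_add(a, b):
    """Model of Vyper's unsafe_add on uint256: wraps modulo 2**256."""
    return (a + b) % 2**256


print(unsafe_add(UINT256_MAX, 1))  # 0
```

The unchecked variant skips the overflow check, which saves gas when the developer can prove the result stays in range.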
Utilities
Utility functions include environment information, the empty (zero) value of a type, the length of a dynamic type, application binary interface (ABI) encoding, and logging to the console in development environments.
Compiler Behavior
Intermediate Representation
The intermediate representation (IR) in the Vyper compiler uses a Lisp-like syntax in which each expression has a “valency” of zero or one; a valency of one indicates that the expression leaves a value on the stack.
Note, in the following descriptions, compilation may not always map expressions directly to opcodes. For example, an integer literal may map to a “push” instruction, but if the optimizer finds an opportunity to duplicate a value already on the stack instead, it may do so.
Simple Expressions
Integer literals have valency of one and may map to a literal push instruction.
EVM opcode expressions are compiled recursively: nested expressions are compiled first, and arguments are evaluated right to left so that the first argument ends up on top of the stack. The following expression stores the value two in storage slot one.
# ir
(sstore 1 2)
# opcodes
push1 0x02
push1 0x01
sstore
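The right-to-left rule fits in a few lines of Python. This is a toy model, not Vyper's actual IR compiler: it only handles integer literals, zero-argument opcodes written as bare strings, and opcode expressions written as Python lists, and it emits the smallest push instruction for each literal.

```python
# Toy model of IR-to-opcode compilation; not the Vyper compiler itself.
def push(value):
    """Emit the smallest push instruction for a non-negative integer literal."""
    if value == 0:
        return "push0"
    size = (value.bit_length() + 7) // 8  # number of bytes needed
    return f"push{size} 0x{value:0{2 * size}x}"


def compile_ir(expr):
    """Compile a nested IR expression into a list of opcode strings."""
    if isinstance(expr, int):
        return [push(expr)]
    if isinstance(expr, str):  # zero-argument opcode such as `caller` or `gas`
        return [expr]
    op, *args = expr
    out = []
    # Arguments are compiled last-to-first, so the first argument ends up on
    # top of the stack, where the opcode expects it.
    for arg in reversed(args):
        out.extend(compile_ir(arg))
    out.append(op)
    return out


print(compile_ir(["sstore", 1, 2]))  # ['push1 0x02', 'push1 0x01', 'sstore']
```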
“With” expressions bind a variable name to an initial value within a scope in which the variable may be used. Variables may be shadowed within that scope. The valency of a “with” expression is the valency of its body.
# ir
(with x 1
(mstore 32 x))
# opcodes
push1 0x01
push1 0x20
mstore
“Set” expressions mutate the value of a variable bound by an enclosing “with”. This expression has a valency of zero.
# ir
(with x 1
(set x (add x 1)))
# opcodes
push1 0x01
push1 0x01
add
Sequential Expressions
“Seq” expressions evaluate a series of expressions in order. The final expression determines the valency of the sequence, while the values of all earlier expressions with non-zero valency are popped off the stack.
# ir
(seq
(call gas address 0 0 0 0 0)
(call gas caller 0 0 0 0 0))
# opcodes
push0
push0
push0
push0
push0
address
gas
call
pop
push0
push0
push0
push0
push0
caller
gas
call
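Extending a similar toy compiler with “seq” shows where the `pop` comes from. Again, this is an illustrative Python model with a hand-written valency table covering only the opcodes used here, not the compiler's real implementation.

```python
# Toy model: valency (stack values produced) of the opcodes this sketch knows.
VALENCY = {"call": 1, "add": 1, "gas": 1, "address": 1, "caller": 1,
           "sstore": 0, "mstore": 0}


def push(value):
    if value == 0:
        return "push0"
    size = (value.bit_length() + 7) // 8
    return f"push{size} 0x{value:0{2 * size}x}"


def valency(expr):
    if isinstance(expr, int):
        return 1
    if isinstance(expr, str):
        return VALENCY[expr]
    if expr[0] == "seq":
        # A seq takes the valency of its final expression (zero if empty).
        return valency(expr[-1]) if len(expr) > 1 else 0
    return VALENCY[expr[0]]


def compile_ir(expr):
    if isinstance(expr, int):
        return [push(expr)]
    if isinstance(expr, str):
        return [expr]
    op, *args = expr
    out = []
    if op == "seq":
        for i, sub in enumerate(args):
            out.extend(compile_ir(sub))
            # Discard the value of every expression except the last one.
            if i < len(args) - 1 and valency(sub) == 1:
                out.append("pop")
        return out
    for arg in reversed(args):  # right-to-left argument evaluation
        out.extend(compile_ir(arg))
    out.append(op)
    return out


program = ["seq",
           ["call", "gas", "address", 0, 0, 0, 0, 0],
           ["call", "gas", "caller", 0, 0, 0, 0, 0]]
print("\n".join(compile_ir(program)))
```

Running it prints the same opcode listing as above, with a single `pop` between the two calls.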
Control Flow
Jump destinations may be aliased using the “label” expression and jumped to with the “goto” expression.
An “if” expression takes a valency-one condition expression; if the resulting stack value is non-zero, it executes the second expression, otherwise it executes the third.
(if
<condition_expr>
<truthy_expr>
<falsy_expr>)
A “repeat” expression contains an iterator variable name, initial value, number of iterations, an upper bound to the number of iterations checked at runtime, and a body to execute on each iteration. The “break” expression cleans the stack and exits the loop, “continue” increments the loop counter and continues at the loop’s start, and “cleanup_repeat” cleans the loop state from the stack.
(repeat
<var_name>
<initial>
<iterations>
<upper_bound>
<body>)
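The semantics of “repeat”, including the runtime bound check and “break” and “continue”, can be modeled in Python as below. This is a hypothetical interpreter helper, not part of Vyper: the body is a Python function that returns `"break"`, `"continue"`, or `None`, and a `RuntimeError` stands in for the failure when the iteration count exceeds the upper bound.

```python
def repeat(initial, iterations, upper_bound, body):
    """Toy model of (repeat <var> <initial> <iterations> <upper_bound> <body>)."""
    # The iteration count is checked against the upper bound at runtime.
    if iterations > upper_bound:
        raise RuntimeError("iteration count exceeds the loop's upper bound")
    i = initial
    for _ in range(iterations):
        action = body(i)  # the body sees the current iterator value
        if action == "break":
            break  # exit the loop early
        # "continue" and a normal end of the body both increment the counter
        i += 1


seen = []


def body(i):
    if i == 3:
        return "continue"  # skip the rest of this iteration
    if i == 6:
        return "break"     # leave the loop
    seen.append(i)


repeat(1, 10, 10, body)
print(seen)  # [1, 2, 4, 5]
```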
Opcode-like Abstractions
There are opcode-like expressions that create minor abstractions over the EVM opcodes. The “assert” expression reverts if its condition evaluates to zero, and “assert_unreachable” does the same but halts with the INVALID opcode instead of reverting. “ge” is greater-than-or-equal-to and “le” is less-than-or-equal-to; prefixing either with “s” (“sge”, “sle”) performs the same comparison on signed integers. “ne” is not-equal, and “select” is a conditional expression similar to “if” but intended to be branchless. The “sha3_32” and “sha3_64” expressions take one or two stack values, write them to the scratch space, and hash those 32 or 64 bytes of memory, respectively. The “ceil32” expression rounds its input up to the nearest multiple of 32.
Note, the condition of a “select” expression must resolve to exactly one or zero; any other value results in undefined behavior.
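To see why the condition of “select” must be exactly one or zero, consider a common branchless formulation, `b xor (cond * (a xor b))`, modeled in Python next to “ceil32”. Treat this as an assumption about how a branchless select can be built, not a verbatim copy of Vyper's lowering; the function names are illustrative.

```python
def select(cond, a, b):
    """Branchless select: returns a when cond == 1 and b when cond == 0."""
    return b ^ (cond * (a ^ b))


def ceil32(x):
    """Round x up to the nearest multiple of 32."""
    return (x + 31) & ~31


print(select(1, 5, 9), select(0, 5, 9))   # 5 9
print(select(2, 5, 9))                    # 17, which is neither operand
print(ceil32(0), ceil32(32), ceil32(33))  # 0 32 64
```

With no jump involved, any condition other than zero or one simply scales the mask and produces an unrelated value.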
Storage Layout
The storage layout of a Vyper contract starts at slot zero and increments by the number of slots each variable occupies. Values are never packed in storage, even in structs; everything is padded to a full 32-byte word.
Reentrancy locks generated by the nonreentrant keys occupy the first N slots of storage where N is the number of unique keys in the contract.
Primitive types occupy one slot each, structs occupy one slot for each field, statically sized arrays are laid out in storage sequentially, and dynamically sized arrays occupy one slot for the length followed by N slots, where N is the maximum capacity of the array. The slot of a mapping value is the keccak256 hash of the mapping's storage slot index concatenated with the key.
Note, Solidity does the opposite for mappings, hashing the key before the storage index. Additionally, Vyper dynamic arrays occupy sequential slots because their upper bound is known at compile time, whereas Solidity dynamic arrays have no upper bound, so Solidity stores the length at the array's slot and the elements starting at the hash of that slot to minimize the risk of slot collisions.
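These allocation rules can be summarized with a small Python sketch. It is an approximation under stated assumptions: the helper names and the `nonreentrant.<key>` labels are hypothetical, struct fields and array elements are assumed to be primitives (one slot each), and a HashMap is assumed to reserve a single slot index that serves as its hashing base. Since the standard library has no keccak256, the mapping helper only builds the 64-byte hash preimage to show the argument order.

```python
def allocate_storage(lock_keys, variables):
    """Assign storage slots: one per unique nonreentrant key, then each variable.

    variables is a list of (name, size_in_slots) pairs in declaration order.
    """
    layout, next_slot = {}, 0
    for key in dict.fromkeys(lock_keys):  # unique keys, first-seen order
        layout[f"nonreentrant.{key}"] = next_slot
        next_slot += 1
    for name, size in variables:
        layout[name] = next_slot
        next_slot += size
    return layout


PRIMITIVE_SLOTS = 1
HASHMAP_SLOTS = 1  # assumption: a HashMap reserves one base slot for hashing


def struct_slots(num_fields):
    return num_fields  # one slot per field, never packed


def static_array_slots(length):
    return length  # elements stored in consecutive slots


def dyn_array_slots(capacity):
    return 1 + capacity  # length slot followed by capacity element slots


def mapping_preimage(base_slot, key, solidity=False):
    """Return the 64 bytes that keccak256 hashes to locate mapping[key]."""
    slot_word = base_slot.to_bytes(32, "big")
    key_word = key.to_bytes(32, "big")
    # Vyper hashes slot || key, while Solidity hashes key || slot.
    return key_word + slot_word if solidity else slot_word + key_word


layout = allocate_storage(
    ["lock", "lock"],  # two functions sharing one key -> one lock slot
    [
        ("owner", PRIMITIVE_SLOTS),
        ("balances", HASHMAP_SLOTS),
        ("position", struct_slots(3)),
        ("checkpoints", static_array_slots(4)),
        ("history", dyn_array_slots(10)),
        ("total", PRIMITIVE_SLOTS),
    ],
)
print(layout)
```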
The default storage layout may be overridden at compile time using a JSON file specifying the variable name, the type, and slot index to use instead. This enables the use of Ethereum Improvement Proposals (EIPs) that specify storage slots.
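For example, a proxy following EIP-1967 stores its implementation address at the slot `keccak256("eip1967.proxy.implementation") - 1`. The Python sketch below builds an override document that pins a hypothetical `implementation` variable to that slot. The schema shown (a type and a slot per variable name) follows the description above, but the exact fields the compiler expects, and how the file is passed to it, should be checked against the documentation for the compiler version in use.

```python
import json

# EIP-1967 implementation slot: keccak256("eip1967.proxy.implementation") - 1
EIP1967_IMPLEMENTATION_SLOT = int(
    "360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc", 16
)

# Hypothetical variable names, each mapped to its type and the slot to use.
overrides = {
    "implementation": {"type": "address", "slot": EIP1967_IMPLEMENTATION_SLOT},
    "owner": {"type": "address", "slot": 0},
}

layout_json = json.dumps(overrides, indent=2)
print(layout_json)  # contents to save as the storage layout override file
```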
Memory Layout
Vyper memory is laid out in increments of 32 byte words. While EVM memory is linear and accessed by the byte index, the term “slot” in this section refers to a 32 byte word stored at a memory index that is a multiple of 32.
The first two slots of memory are used as scratch space for hashing data. Starting at the third slot (byte offset 64), variables are stored in memory sequentially in the order they come into scope. Function arguments come into scope first, so they are the first values copied to memory. Then, for each variable definition, a value is written to memory. Variables are dropped at the end of their local scope, so the memory used by variables defined in conditional branches, loops, and internal function calls is released and may be overwritten later to save memory.
Note, Vyper stores all variables in memory, including primitives, while Solidity stores primitives on the stack with only minor exceptions.
Primitive types occupy one slot, statically sized arrays occupy N slots where N is the length, and structs occupy one slot for every field. Dynamically sized arrays occupy one slot for the length, then reserve N slots where N is the maximum capacity of the array. Dynamically sized byte arrays and strings are the exception: after their length slot, the data is tightly packed, but they still reserve N slots where N is the maximum length divided by 32, rounded up.
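A rough Python sketch of these sizing rules, with sequential allocation starting at byte offset 64, might look like this. The helpers are hypothetical, element types are assumed to be primitives, and the scope-based reuse of memory described above is not modeled.

```python
RESERVED = 64  # the first two 32-byte slots are hashing scratch space


def ceil32(n):
    return (n + 31) & ~31


# Memory footprint in bytes per kind of type (primitive elements assumed).
def primitive_bytes():
    return 32


def static_array_bytes(length):
    return 32 * length


def dyn_array_bytes(capacity):
    return 32 + 32 * capacity  # length word + capacity element words


def bytestring_bytes(max_len):
    return 32 + ceil32(max_len)  # length word + tightly packed data


def allocate_memory(variables):
    """Assign byte offsets to (name, size) pairs in the order they enter scope."""
    offsets, cursor = {}, RESERVED
    for name, size in variables:
        offsets[name] = cursor
        cursor += size
    return offsets


frame = allocate_memory([
    ("x", primitive_bytes()),         # a uint256 argument
    ("data", bytestring_bytes(100)),  # a Bytes[100] argument
    ("arr", dyn_array_bytes(3)),      # a DynArray[uint256, 3] local
])
print(frame)  # {'x': 64, 'data': 96, 'arr': 256}
```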
Calldata Layout
Calldata in Vyper is laid out in accordance with Solidity’s ABI specification. The only divergence is the user-defined enum. Vyper’s enum is an abstraction over a bitmap represented as a uint256 while Solidity’s enum is an abstraction over sequential uint8 values, which may cause compatibility issues if developers port enums between languages.
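The enum mismatch is easiest to see side by side. In the Python sketch below, the enum and its member names are hypothetical; member values follow Vyper's bitmap encoding (powers of two, starting at 1) and Solidity's sequential encoding (starting at 0).

```python
MEMBERS = ["ADMIN", "MINTER", "PAUSER"]  # members of a hypothetical enum Role

# Vyper: each member is a distinct bit of a uint256 bitmap (1, 2, 4, ...).
vyper_values = {name: 1 << i for i, name in enumerate(MEMBERS)}
# Solidity: members are sequential uint8 values (0, 1, 2, ...).
solidity_values = {name: i for i, name in enumerate(MEMBERS)}

# Bitmap members can be combined and tested with bitwise operations.
roles = vyper_values["ADMIN"] | vyper_values["PAUSER"]
print(roles, bool(roles & vyper_values["MINTER"]))  # 5 False

# A Solidity caller sending MINTER (1) is read by Vyper as ADMIN (1).
vyper_by_value = {v: k for k, v in vyper_values.items()}
print(vyper_by_value[solidity_values["MINTER"]])  # ADMIN
```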
Conclusions
The Vyper compiler, despite front-end trade-offs such as compile-time upper bounds on dynamic types and back-end inefficiencies in caching and duplication, is a well-designed machine. Because its source code is written in Python, it serves both as a simple entry point for compiler contributions and as a reference for understanding the stages of compilation and the internal data structures and algorithms it takes to convert developer intents to EVM bytecode. While its abstractions may seem limiting to engineers who prefer low-level control, its built-in functions enable type-checked, low-cost abstractions that aren’t built into the language itself, and the optimizer is surprisingly efficient. Despite visible inefficiencies in the compiler output, its runtime bytecode size and gas efficiency rival those of even low-level languages such as Yul.
I hope this article was informative for beginner, intermediate, and advanced Vyper engineers alike. I personally learned a lot about the compiler during this deep dive, and if you would like to see what went into it, below are some tools and resources used for this article.
If you enjoy this kind of content, consider subscribing below and until next time, good hacking!