Recently, I wrote an article on the state of the art in Ethereum Virtual Machine language design. It covered the high and low level languages in use today and their benefits. This is a follow-up to that article.
It is only fitting that, as a follow-up, I am calling for the abandonment of the state of the art in pursuit of a fundamentally different approach: one that enables maximal developer control, defines clear abstractions, and promotes emergent abstractions over imposed ones.
Motivations
“If I had asked people what they wanted, they would have said faster horses”
— Henry Ford, Rust advocates
At the time of writing, Solidity dominates the smart contract language design space. Client interfaces only implement Solidity’s Application Binary Interface (ABI), new Ethereum Improvement Proposals (EIPs) implicitly use Solidity’s ABI, debuggers operate on Solidity source maps, and even other smart contract languages start at Solidity’s base layer of abstraction and attempt to deviate from the same bedrock with minor syntax differences and backend improvements.
Vyper has an excellent intermediate representation (IR), Fe has a novel model for bringing context and system functionality into local scopes, and Solidity is beginning to improve on modular and functional code independent of contract objects. However, each language starts from the same principles: contracts are objects, code is reused primarily via inheritance if at all, and the syntax is designed to look general purpose except where the EVM prohibits reasonable abstraction.
Adoption Games
Adoption of smart contract languages is zero-sum when nothing novel is created from divergent patterns. Engineers and companies are expected to use their own resources to learn a new language, but the opportunity is only sufficient when the new language does what the old cannot. Gas efficiency and compiler performance are huge improvements to developer and user experiences, but they are not sufficient for adoption.
Zero-sum adoption games invariably devolve into code snippet dunking, cherry picked gas benchmarks, and culture wars.
Adoption becomes positive-sum when and only when a new language enables constructs and functionality not possible in the status quo. Resource investments are justified as teams reach the limits of what Solidity can do (see: Seaport).
“Solidity 2.0”
The meta for zero-sum adoption is a so-called “Solidity 2.0”, a language that looks and behaves like Solidity, but with abstract data types and improved gas efficiency. This means generics, possibly even trait or typeclass constraints, a better front end to generate better middle end code, and an intermediate representation that does not take multiple IRL minutes to optimize.
This is not a terrible idea, but it still misses the mark and still falls into the same adoption game trap as every other underfunded compiler in the space.
Status Quo Disruption
The status quo exists for a reason: it is the meta. The contract-as-an-object model has served the EVM space well for subtyping interfaces, abstracting data serialization, and creating high level concepts on top of which we can build composable protocols.
However, we are reaching the limits of this abstraction stack. Assets cannot be efficiently transferred without assembly, memory is allocated as if a garbage collector or memory management runtime exists when it does not, inheritance tree complexity cascades as poor storage pointer support prefers abstract contracts over custom data types and functions, and the type system suffers from an identity crisis teetering between disallowing implicit type casts and allowing function pointer poisoning.
Edge
The solution to status quo ossification, Solidity lookalikes, and zero-sum adoption games is Edge. Edge begins from first principles and uses a relatively simple type system to construct novel abstraction stacks starting as low as type checked opcodes and going as high as generic contract objects.
Did you know that a 2021 report on smart contracts (PDF) by Trail of Bits found that around 90% of all EVM smart contracts are at least 56% similar, with 7% being completely identical? This does not signal a lack of innovation; rather, it signals that code reuse demands the most attention.
Clear semantics for namespaces and modules make code reusable, parametric polymorphism (generics) and subtyping (traits) make reusable code worth writing, and annotations for all EVM data locations minimize the semantics imposed by the compiler.
Minimizing compiler semantic imposition improves granularity of developer control without relying on backdoors such as inline assembly. Inline assembly should always be available, but outside of the standard library its use should be considered a failure of the compiler. In practice, inline assembly is used for a single reason: the developer has more context than can be provided to the compiler. This may be in the form of more efficient memory usage, arbitrary storage writes, bespoke serialization and deserialization, and unconventional storage methods.
From First Principles
Languages like Huff and ETK are special. They remove the guard rails and compiler-imposed abstractions, leaving only the EVM and aliases over it. In the EVM there are no data types, no functions, and no encoding schemes; there is only the word and the instruction.
Primitive Data Types
The word and the instruction are maximally flexible. The word may be a jump target, a condition, an external address, a data pointer, or just an arithmetic operand. The instruction does not care about the structure of the data. The conditional jump instruction will jump if the condition is a non-zero word. The call instruction does not care whether the call target is wider than 20 bytes. To expose this in the language, we could create a data type called “word”.
let myVariable: word = 0;
While this is maximally flexible, it is error prone because it performs no checks. From this, what data types are useful to derive?
First, there are 7 distinct data locations that may be loaded to the stack or copied to memory:
Storage: persistent disk storage (load, store)
Transient Storage: transaction-scoped storage, discarded at the end of each transaction (load, store)
Memory: linear read/write data buffer (load, store)
Calldata: linear read-only data buffer (load, copy)
Returndata: linear read-only data buffer (copy)
Code: linear read-only data buffer (copy)
External Code: linear read-only data buffer (copy)
Since each has its own set of instructions, it would make sense for a language to construct data pointer types for each.
let myStoragePtr: &s ptr = 0;
let myTransientPtr: &t ptr = 0;
let myMemoryPtr: &m ptr = 0;
let myCalldataPtr: &cd ptr = 0;
let myReturndataPtr: &r ptr = 0;
let myCodePtr: &co ptr = 0;
let myExtCodePtr: &ec ptr = 0;
It is worth noting that pointers may point to arbitrary data types.
Solidity enables pointers only to complex data types.
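As a rough illustration of why one pointer type per data location is useful, here is a small Python model (not Edge code). The class names `StoragePtr`, `MemoryPtr`, and `Machine` are hypothetical, only two of the seven locations are modeled, and the runtime `isinstance` checks stand in for what a compiler would reject statically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePtr:
    """Pointer into persistent storage, addressed by slot."""
    slot: int


@dataclass(frozen=True)
class MemoryPtr:
    """Pointer into linear memory, addressed by byte offset."""
    offset: int


class Machine:
    """Toy state holding a storage map and a small memory buffer."""

    def __init__(self):
        self.storage = {}
        self.memory = bytearray(64)

    def sload(self, ptr):
        # Only storage pointers are accepted by the storage load.
        if not isinstance(ptr, StoragePtr):
            raise TypeError("sload expects a storage pointer")
        return self.storage.get(ptr.slot, 0)

    def mload(self, ptr):
        # Only memory pointers are accepted by the memory load.
        if not isinstance(ptr, MemoryPtr):
            raise TypeError("mload expects a memory pointer")
        word = self.memory[ptr.offset:ptr.offset + 32]
        return int.from_bytes(word, "big")
```

Passing a `MemoryPtr` to `sload` fails here at run time; the point of typed pointers in the language is to move that failure to compile time.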
Additionally, constraints on integer size and signedness are sensible, so we create “intN” and “uintN”, which are signed and unsigned integers of N bits (the examples below abbreviate the unsigned form as “uN”, e.g. u256). Casting to a larger integer size is a no-op, but casting downward may require a check that the value fits in the new integer size.
let myInteger: u256 = 0;
let mySmolInteger: u1 = 0;
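The asymmetry between widening and narrowing casts can be sketched in Python. This is a model of the idea for unsigned integers only, not Edge's actual cast semantics; the function names `widen` and `narrow` are made up for the example.

```python
def widen(value: int, from_bits: int, to_bits: int) -> int:
    """Widening cast: every value of the smaller type already fits, so nothing is checked."""
    if from_bits > to_bits:
        raise ValueError("widen must not reduce the bit width")
    return value


def narrow(value: int, to_bits: int) -> int:
    """Narrowing cast: reject values that do not fit in the target width."""
    if not 0 <= value < (1 << to_bits):
        raise OverflowError(f"{value} does not fit in u{to_bits}")
    return value
```

For example, `narrow(255, 8)` succeeds while `narrow(256, 8)` raises, which is the check a downward cast may need to emit.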
Of course, not all numbers are created equal. A number of zero or one may be a boolean, and a 20 byte number may be an address.
let myBool: bool = true;
let myAddr: addr = 0x00..00;
This alone is enough to construct typed instructions where types must be explicitly declared or cast into the correct type for function execution.
fn call(
    gas: u256,
    target: addr,
    value: u256,
    argPtr: &m ptr,
    argLen: u256,
    retPtr: &m ptr,
    retLen: u256,
) -> bool;
Our call instruction can now be checked at compile time that the target is an address, the argument and returndata pointers are memory pointers, and that the call returns a value of zero or one.
Complex Data Types
Primitive data types are useful, but structured data types are also important for more complex interactions and computations.
The product type, aka structs and tuples, is important for grouping a number of different items with different data types together.
type MyStruct = { a: u8, b: u256 };
type MyPackedStruct = packed { a: u8, b: u8 };
type MyTuple = (u8, u256);
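To make the difference between a plain and a packed struct concrete, the following Python sketch packs two 8-bit fields into a single word. The layout (first field in the lowest byte) is an assumption chosen for the example, not a statement about how Edge lays out packed structs, and the function names are hypothetical.

```python
from typing import Tuple


def pack_u8_pair(a: int, b: int) -> int:
    """Pack two u8 fields into one word: `a` in the low byte, `b` in the next."""
    for field in (a, b):
        if not 0 <= field <= 0xFF:
            raise ValueError("field does not fit in u8")
    return (b << 8) | a


def unpack_u8_pair(word: int) -> Tuple[int, int]:
    """Recover (a, b) from a word produced by pack_u8_pair."""
    return word & 0xFF, (word >> 8) & 0xFF
```

An unpacked struct would instead give each field its own word, trading space for simpler reads and writes.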
The sum type, aka enum or union, is important for representing one of a number of different states, where each state may have different items and data types.
type MyEnum = First | Second;
type MyUnion =
    | Rgb({ r: u8, g: u8, b: u8 })
    | Hex(u24);
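A sum type like the one above can be approximated in Python with one class per variant. The sketch below is only a model: `Color` and `to_hex` are names invented for the example, and the final `raise` stands in for the exhaustiveness check a compiler would perform.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rgb:
    r: int
    g: int
    b: int


@dataclass(frozen=True)
class Hex:
    value: int  # a 24-bit color value


# A Color is exactly one of the two variants.
Color = Union[Rgb, Hex]


def to_hex(color: Color) -> int:
    """Handle each variant separately; every variant carries different fields."""
    if isinstance(color, Rgb):
        return (color.r << 16) | (color.g << 8) | color.b
    if isinstance(color, Hex):
        return color.value
    raise TypeError("not a Color variant")
```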
Functions, while simply a code pointer to jump to and from with a sequence of instructions at the jump target, are also, fundamentally, a data type. A function’s type, also known as its signature, is a transition from input arguments to output arguments.
type MyFunction = (u8, u8) -> (bool, u16);
Control Flow
Reading procedures start to finish unconditionally is not very useful, and while code pointers can be used for jumping, we can add syntax sugar over this to make it more ergonomic.
Loops
The simple loop block with optional “continue” and “break” keywords is a simple but powerful abstraction.
loop {
    break;
    continue;
}
More familiar loops can map directly to the underlying loop in a desugaring step.
// while loop
while (condition) { .. }
// desugars to
loop {
    if (!condition) break;
    ..
}
// for loop
for (let i = 0; i < list.len(); i++) { .. }
// desugars to
let i = 0;
loop {
    if (i >= list.len()) break;
    ..
    i++;
}
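The same desugaring can be checked in Python, where `while True` plays the role of the bare `loop`. This is a sketch of the transformation only; the function names are made up for the example.

```python
def sum_with_for(items):
    """The familiar counted loop."""
    total = 0
    for i in range(len(items)):
        total += items[i]
    return total


def sum_desugared(items):
    """The same loop written as an unconditional loop with an explicit break."""
    total = 0
    i = 0
    while True:                # the bare `loop`
        if i >= len(items):    # the exit test runs first on every pass
            break
        total += items[i]      # the loop body
        i += 1                 # the increment runs at the end of each pass
    return total
```

Both functions return the same result for any list, including the empty one, where the desugared form breaks on its first pass.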
Branching
Branching based on boolean expressions and pattern matching is useful both for all the obvious reasons and for program correctness. Exhaustive pattern matching over union data types helps correctness because every variant must be handled.
if (myBool) { } else { }
if (myOption matches Option::Some(n)) { n; } else { }
match myOption {
    Some(n) => { n; },
    None => { },
}
Functions
Grouping instructions together into a code block with input and output values is good for modularity, code reuse, and in some cases, reducing code size. We define functions as follows.
fn myOtherAddFn(a: u256, b: u256) -> u256 {
    return a + b;
}
Inline Assembly
Inline assembly should ideally be limited to the standard library; however, it remains a critical component in allowing developers to break all abstractions and operate on opcodes directly.
fn double(a: u8) -> u8 {
    asm (a) -> (a) {
        push1 0x02
        mul
    }
    return a;
}
Parametric Polymorphism
Writing libraries in Solidity is challenging because, to make a library generic, every single type must have its own function. That is to say, libraries use monomorphic data types in their construction.
Library engineers are vital users of any language. Parametric polymorphism is a core component of a great library system. Polymorphism refers to the ability to use different data types for the same purpose, and parametric refers to the type parameters that a data type or function accepts in place of concrete types.
type MyGenericEnum<T> =
    | Some(T)
    | None;
type MyGenericFunction<T> = (T, T) -> T;
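For readers who have not used generics, the idea behind a type like `(T, T) -> T` can be sketched with Python's type variables. The names `Combine` and `reduce_all` are invented for the example, and Python only checks these annotations with an external type checker, so this models the concept rather than Edge's implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Mirrors the generic function type (T, T) -> T.
Combine = Callable[[T, T], T]


def reduce_all(items: Sequence[T], combine: Combine, initial: T) -> T:
    """One definition that works for any element type T."""
    acc = initial
    for item in items:
        acc = combine(acc, item)
    return acc
```

The same `reduce_all` serves integers and strings alike, where a monomorphic library would need one copy per type.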
Subtyping
Subtyping, the constraining of otherwise generic data types to have certain constants, functions, and types associated with them, is also incredibly useful in constructing correctness focused systems and clean abstractions.
trait MyTrait {
    type MyType;
    const MY_CONST: MyType;
    fn toMyType(self: Self) -> MyType;
}
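A trait's role as a constraint can be approximated with a Python `Protocol`. In this sketch, `ToWord`, `U8`, `Flag`, and `encode` are hypothetical names, and the constant `BITS` loosely plays the part of an associated constant; it is a model of the concept, not of Edge's trait system.

```python
from typing import Protocol


class ToWord(Protocol):
    """Anything with a bit width and a conversion to a 256-bit word."""
    BITS: int

    def to_word(self) -> int: ...


class U8:
    BITS = 8

    def __init__(self, value: int):
        if not 0 <= value < 256:
            raise ValueError("value does not fit in u8")
        self.value = value

    def to_word(self) -> int:
        return self.value


class Flag:
    BITS = 1

    def __init__(self, value: bool):
        self.value = value

    def to_word(self) -> int:
        return 1 if self.value else 0


def encode(item: ToWord) -> bytes:
    """Works for any type satisfying ToWord, without knowing the concrete type."""
    return item.to_word().to_bytes(32, "big")
```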
Modules
Modules, or libraries, are the other core component necessary to create quality, reusable code. They are relatively simple, conceptually.
pub mod MyModule {
    pub mod MyNestedModule {
        pub type MyType = u8;
    }
}
use MyModule::MyNestedModule::MyType;
type MyOtherType = MyType;
Contracts
A contract object, in Solidity, is a collection of external functions that operate on its local state. Interfaces and abstract contracts may behave like subtypes, constraining minimum function implementations for the contract.
However, all contracts in the EVM are first and foremost single entry point executables. In this way they are similar to Rust and C programs, where the “main” function is the entry point of the program.
ABI
The ABI is a syntax sugar construct that enables both contract interface subtyping and calldata pattern matching.
abi MyABI {
    fn myFn(a: u8) -> u8;
}
fn main() {
    match sig<MyABI>() {
        MyABI::myFn(a) => { return a; },
        _ => revert(),
    }
}
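What the `match` on the signature hides is a dispatch on the first four bytes of calldata. The Python sketch below models that dispatch under the Solidity ABI convention of a 4-byte selector followed by 32-byte arguments. It uses the well-known ERC-20 selectors for `transfer(address,uint256)` and `balanceOf(address)` rather than the article's `myFn`, and the `Revert` exception and string return values are stand-ins chosen for the example.

```python
TRANSFER = bytes.fromhex("a9059cbb")    # selector of transfer(address,uint256)
BALANCE_OF = bytes.fromhex("70a08231")  # selector of balanceOf(address)


class Revert(Exception):
    """Stands in for the EVM revert in this model."""


def main(calldata: bytes) -> str:
    """Single entry point: pattern match on the selector, then decode arguments."""
    selector, args = calldata[:4], calldata[4:]
    if selector == BALANCE_OF:
        owner = int.from_bytes(args[0:32], "big")
        return f"balanceOf({owner:#x})"
    if selector == TRANSFER:
        to = int.from_bytes(args[0:32], "big")
        amount = int.from_bytes(args[32:64], "big")
        return f"transfer({to:#x}, {amount})"
    # The wildcard arm: anything unrecognized reverts.
    raise Revert("unknown selector")
```

Because the dispatch is ordinary code in `main`, a different encoding scheme could be swapped in without changing the language.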
Contract ABI
Contracts behave similarly to structs where ABIs can constrain the interface, and external functions behave as they do in other high level languages like Solidity.
contract MyContract {
    let myNumber: u8 = 0;
}
impl MyContract: MyABI {
    fn myFn(a: u8) -> u8 {
        return a + self.myNumber;
    }
}
Note that the “self” syntax is experimental and is subject to change.
Comptime
Comptime, or compile time, is a keyword enabling zero-cost abstractions through compile time code execution.
Comptime branches perform conditional compilation based on constant values.
comptime if CONFIG.CHAINID == 1 {
    tstore(slot, 1);
} else {
    sstore(slot, 1);
}
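Conditional compilation means only one of the two branches exists in the output. A toy code generator in Python can illustrate this: the build-time constant selects the opcode, and the emitted instruction list contains no trace of the other branch. The function `emit_store` and its textual instruction format are invented for the example.

```python
def emit_store(chain_id: int, slot: int, value: int) -> list:
    """'Compile' a store, choosing the opcode from a build-time constant."""
    # The decision is made here, while generating code, not while running it.
    opcode = "TSTORE" if chain_id == 1 else "SSTORE"
    # Both opcodes take the key from the top of the stack, so the value is pushed first.
    return [f"PUSH {value}", f"PUSH {slot}", opcode]
```

Calling `emit_store(1, 0, 1)` yields a program ending in `TSTORE`, and any other chain id yields one ending in `SSTORE`; neither program contains a branch.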
Comptime functions are evaluated and resolved at compile time. If the function cannot be resolved, a compiler error is thrown.
comptime fn eip1967slot(slotName: &m String) -> u256 {
    return keccak256(slotName) - 1;
}
sstore(
    eip1967slot("eip1967.proxy.implementation".toString()),
    caller(),
);
To A New Abstraction Stack
These fundamental building blocks enable both the definition of other languages’ constructs within Edge and more. Constructs could be type checked SSTORE2 implementations, in memory hash maps, Solidity ABI encoders, compressed ABI encoders, elliptic curve data types and methods, and even nested virtual machines with no stack overhead.
The granularity of Huff with the type system and compile time code execution of a high level language offers an unparalleled developer experience and enables functionality not possible in any smart contract language. When developers want to break abstraction and create novel patterns, they can do so without breaking out of the language with assembly blocks; they can simply change imports and use lower level APIs developed over the EVM.
Solidity remains a great introductory language for developers to create and deploy an NFT in a dozen or so lines of code. This is not the target audience of Edge. Edge is for experienced engineers and teams that need more than what Solidity can offer. Edge is for choosing granularity of abstraction without sacrificing important correctness checks within the language’s type system.
To a new abstraction stack.