The Fuel Virtual Machine (FuelVM) is a novel 64-bit register machine for smart contract execution. Understanding all of its inner workings is not a prerequisite for this article, but it is useful. To get up to speed, here is a prior article diving into the instruction set and other behaviors unique to the machine.
Sway is the primary domain-specific language (DSL) for writing FuelVM smart contracts, but how are the executables structured? How do they behave? This article digs into both questions.
Note that things are quickly changing in the FuelVM. The time of writing and versions are as follows.
Time of Writing: March 24, 2023
FuelVM Specification Commit: a1ff912
Sway Release: 0.35.5
Sway Contract
First, we need a Sway project. We can start by using the Fuel Orchestrator, forc (installation instructions here), to create a project via the following command.
forc new counter
This should generate a directory with the following structure.
.
├── .gitignore
├── Forc.toml
└── src
    └── main.sw
Next, let’s open our code editor and write the following contract.
contract;
/// Logged on counter increment
struct IncrementEvent {
    /// New counter value
    new_count: u64,
}
/// Counter ABI
abi Counter {
    /// Increments counter
    #[storage(read, write)]
    fn increment();
    /// Returns current counter value
    #[storage(read)]
    fn get() -> u64;
}
/// Storage layout
storage {
    /// Counter value
    count: u64 = 0,
}
/// Counter Implementation
impl Counter for Contract {
    /// Increments counter
    #[storage(read, write)]
    fn increment() {
        let new_count = storage.count + 1;
        storage.count = new_count;
        log(IncrementEvent { new_count });
    }
    /// Returns current counter value
    #[storage(read)]
    fn get() -> u64 {
        storage.count
    }
}
Briefly skimming the code: the file is a contract that implements two methods: “increment”, which increments the count and logs an “IncrementEvent”, and “get”, which returns the current count. We also define a storage layout that contains the counter, of type “u64” (an unsigned 64-bit integer).
Compilation
Next we compile this contract as follows.
forc build
This should produce a “Forc.lock” file, storing dependency information at the time of compilation and an “out” directory containing a directory named after the build profile, in this case “debug”, which contains the final application binary interface (ABI), storage layout, and executable binary.
out/debug/counter-abi.json
The ABI file contains five lists:
types
functions
loggedTypes
messagesTypes
configurables
The “types” list contains any relevant data types for the contract. In our case, “()” is the unit type (an empty value), “struct IncrementEvent” is the structure of the logged event, and “u64” is the count’s type.
The “functions” section contains our external functions. The “get” function takes no inputs and returns one output. Note that the output “type” is two; this is an index into the “types” array, in this case “u64”. The “increment” function takes no inputs and returns a value of type “0”, or “()”. This effectively means the function returns no values. Both functions have attributes, including a “doc-comment”, which refers to our triple slash “///” comments, and a “storage” attribute, which indicates which storage operations each function performs. The “get” function reads while the “increment” function reads and writes.
The “loggedTypes” section contains the “struct IncrementEvent” type since it is logged in the “increment” function.
The “messagesTypes” and “configurables” sections contain nothing for our contract.
out/debug/counter-storage_slots.json
The storage layout file contains a list of key-value pairs indicating the storage layout for the contract. Since we only specified a single storage variable, we only have a single pair. The “key” value is the SHA-256 hash digest of “storage_0”, where zero is the index of the storage variable. The “value” is a 32-byte zero value because storage keys and values are both 32 bytes wide and we initialize the “count” variable to zero.
Note: While not applicable here, it is worth noting that each struct field occupies its own storage slot. Fields are not packed by default, but this behavior is likely to change in the near future.
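The key derivation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the key really is the plain SHA-256 digest of the ASCII string “storage_<index>” for this Sway release; the helper name storage_slot_key is hypothetical and not part of any Fuel tooling.

```python
import hashlib


def storage_slot_key(index: int) -> str:
    """Hex SHA-256 digest of "storage_<index>" (assumed key scheme)."""
    return hashlib.sha256(f"storage_{index}".encode("ascii")).hexdigest()


# Key for the first (and only) storage variable, "count".
key = storage_slot_key(0)
print(key)
```

If the assumption holds, the printed digest should match the “key” field in the storage slots file, which is the same value that shows up truncated as 0xf383..b6ed later in this article.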
counter.bin
Note: In this section, I will remove a few unnecessary things, like the ASCII-decoded values in the hexdumps, and shorten some long values as follows:
0x0000..0000
Running the following command will print the binary as hexadecimal to the console. Notice that the first eight characters of each line (a four-byte offset) indicate the starting index of the line and are not part of the actual file contents.
hexdump -X out/debug/counter.bin
00000000 90 00 00 04 47 00 00 00 00 00 00 00 00 00 00 c0
00000010 5d fc c0 01 10 ff f3 00 5d 40 60 49 5d 47 f0 05
00000020 13 49 04 40 73 48 00 0f 5d 47 f0 06 13 49 04 40
00000030 73 48 00 19 72 f0 00 7b 36 f0 00 00 1a 48 50 00
00000040 91 00 00 20 50 41 20 00 5d 47 f0 07 10 45 13 00
00000050 50 41 20 00 60 41 10 20 50 41 20 00 38 45 04 00
00000060 24 44 00 00 1a 40 50 00 91 00 00 50 50 45 00 08
00000070 5d 4b f0 07 10 49 23 00 50 45 00 08 60 45 20 20
00000080 50 45 00 08 38 49 14 40 10 4d 20 40 50 45 00 28
00000090 5d 4b f0 07 10 49 23 00 50 45 00 28 60 45 20 20
000000a0 50 45 00 28 3a 45 14 c0 50 45 00 00 5f 45 30 00
000000b0 5d 43 f0 04 34 00 04 50 24 00 00 00 47 00 00 00
000000c0 f3 83 b0 ce 51 35 8b e5 7d aa 3b 72 5f e4 4a cd
000000d0 b2 d8 80 60 4e 36 71 99 08 0b 43 79 c4 1b b6 ed
000000e0 00 00 00 00 00 00 00 08 00 00 00 00 75 b7 04 57
000000f0 00 00 00 00 58 42 f1 be 00 00 00 00 00 00 00 c0
This is not very helpful, so we’ll revisit it after tinkering with some “forc” utilities.
Sway Compilation Tools
We'll start by looking at the Sway intermediate representation (IR) then at the generated assembly to pick apart our contract at different stages of compilation.
Sway IR
To show the Sway IR, we can include a flag in the compile command.
forc build --ir
This should print a somewhat human-readable format for our contract's code. Note that the following two values were inserted manually:
<repo_path>
<stdlib_path>
These two values will vary based on your machine’s directory structure.
contract {
    pub entry fn get<75b70457>() -> u64, !3 {
        local b256 key_for_0
        entry():
        v0 = get_local b256 key_for_0, !4
        v1 = const b256 0xf383..b6ed, !4
        store v1 to v0, !4
        v2 = state_load_word key v0, !4
        v3 = bitcast v2 to u64, !4
        ret u64 v3
    }
    pub entry fn increment<5842f1be>() -> (), !7 {
        local { u64 } __anon_0
        local b256 key_for_0
        local b256 key_for_0_
        local u64 new_count
        entry():
        v0 = get_local b256 key_for_0, !8
        v1 = const b256 0xf383..b6ed, !8
        store v1 to v0, !8
        v2 = state_load_word key v0, !8
        v3 = bitcast v2 to u64, !8
        v4 = const u64 1, !9
        v5 = add v3, v4, !12
        v6 = get_local b256 key_for_0_, !13
        v7 = const b256 0xf383..b6ed, !13
        store v7 to v6, !13
        v8 = bitcast v5 to u64, !13
        state_store_word v8, key v6, !13
        v9 = get_local { u64 } __anon_0, !14
        v10 = insert_value v9, { u64 }, v5, 0, !14
        v11 = const u64 0
        log { u64 } v10, v11, !18
        v12 = const unit ()
        ret () v12
    }
}
!0 = "<repo_path>/counter/src/main.sw"
!1 = span !0 701 746
!2 = storage "reads"
!3 = (!1 !2)
!4 = span !0 735 740
!5 = span !0 493 636
!6 = storage "readswrites"
!7 = (!5 !6)
!8 = span !0 542 547
!9 = span !0 550 551
!10 = span !0 534 551
!11 = state_index 0
!12 = (!10 !11)
!13 = span !0 561 586
!14 = span !0 600 628
!15 = span !0 596 629
!16 = "<stdlib_path>/sway-lib-std/src/logging.sw"
!17 = span !16 244 261
!18 = (!15 !17)
Looking at this from a high level, we have a contract that contains two public functions, “get” and “increment”.
Both functions contain a hexadecimal number inside the angle brackets. This is the function selector: the first four bytes, in hexadecimal, of the SHA-256 hash digest of the function signature. The function signature is the function name followed by the comma-separated argument types wrapped in parentheses. In our case the signatures are as follows:
sha256("get()") == 0x75b70457...
sha256("increment()") == 0x5842f1be...
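To make the selector derivation concrete, here is a minimal Python sketch. It assumes the selector is the first four bytes of the SHA-256 digest of the ASCII signature string, as described above; the function name function_selector is hypothetical.

```python
import hashlib


def function_selector(signature: str) -> bytes:
    """First four bytes of SHA-256(signature), the assumed selector scheme."""
    return hashlib.sha256(signature.encode("ascii")).digest()[:4]


# Print the selector for each of the contract's two methods.
for signature in ("get()", "increment()"):
    print(signature, "0x" + function_selector(signature).hex())
```

If the scheme holds, the printed values should line up with the numbers in the angle brackets of the IR above.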
There are also a few occurrences of the 32-byte value “0xf383..b6ed” (truncated here for brevity), which is the “key” value of the “count” variable defined in the storage layout file.
We can also see a few relevant keywords such as “state_load_word”, which loads from storage, “state_store_word”, which writes to storage, “log”, which logs the counter change, and “ret”, which returns from the current context.
contract {
    // ...
    pub entry fn increment<5842f1be>() -> (), !7 {
        // ...
        v2 = state_load_word key v0, !8
        // ...
        state_store_word v8, key v6, !13
        // ...
        log { u64 } v10, v11, !18
        // ...
    }
}
We need to go deeper.
Sway Assembly
To see the human-readable assembly, we can use a different flag while compiling.
forc build --intermediate-asm
This should print FuelVM instructions and some comments indicating what the sections and instructions mean. Note that comments in FuelVM assembly are started with “;” instead of “//” used in Sway files. For the sake of brevity, we will only look at the contract itself, but note that any code compiled from the core and standard libraries will also appear in this output.
;; --- ABSTRACT ALLOCATED PROGRAM ---
;; Contract
;; --- Prologue ---
.program:
ji .4
noop
DATA SECTION OFFSET[0..32]
DATA SECTION OFFSET[32..64]; data section offset
.4 ; end of metadata
lw $ds $is 1
add $$ds $$ds $is
; Begin contract ABI selector switch
lw $r0 $fp i73 ; load input function selector
lw $r1 data_2 ; load fn selector for comparison
eq $r2 $r0 $r1 ; function selector comparison
jnzi $r2 .0 ; jump to selected function
lw $r1 data_3 ; load fn selector for comparison
eq $r2 $r0 $r1 ; function selector comparison
jnzi $r2 .2 ; jump to selected function
movi $$tmp i123 ; special code for mismatched selector
rvrt $$tmp ; revert if no selectors matched
;; --- Functions ---
.program:
; contract method: get, selector: 0x75b70457
.0 ; --- start of function: get ---
move $r2 $sp ; save locals base register
cfei i32 ; allocate 32 bytes for locals
addi $r0 $r2 i0 ; get offset reg for get_ptr
lw $r1 data_0 ; literal instantiation
addi $r0 $r2 i0 ; get store offset
mcpi $r0 $r1 i32 ; store value
addi $r0 $r2 i0 ; get offset
srw $r1 $r0 $r0 ; single word state access
ret $r1
.program:
; contract method: increment, selector: 0x5842f1be
.2 ; --- start of function: increment ---
move $r0 $sp ; save locals base register
cfei i80 ; allocate 80 bytes for locals
addi $r1 $r0 i8 ; get offset reg for get_ptr
lw $r2 data_0 ; literal instantiation
addi $r1 $r0 i8 ; get store offset
mcpi $r1 $r2 i32 ; store value
addi $r1 $r0 i8 ; get offset
srw $r2 $r1 $r1 ; single word state access
add $r3 $r2 $one
addi $r1 $r0 i40 ; get offset reg for get_ptr
lw $r2 data_0 ; literal instantiation
addi $r1 $r0 i40 ; get store offset
mcpi $r1 $r2 i32 ; store value
addi $r1 $r0 i40 ; get offset
sww $r1 $r1 $r3 ; single word state access
addi $r1 $r0 i0 ; get offset reg for get_ptr
sw $r1 $r3 i0 ; insert_value @ 0
lw $r0 data_1 ; loading size for LOGD
logd $zero $zero $r1 $r0
ret $zero ; returning unit as zero
;; --- Data ---
.data:
data_0 .bytes[32] f383..b6ed
data_1 .word 8
data_2 .word 1974928471
data_3 .word 1480782270
This is a lot, so we'll start from a high level. The comments indicate the different parts of the program.
;; Contract: start of the contract
;; --- Prologue ---: contract entry point
;; --- Functions ---: contract functions
;; --- Data ---: contract constants
Constants
We'll start with the data section since the constants defined there will be used throughout the contract.
We have four constants:
data_0 - the “count” variable's storage slot
data_1 - the size of the data to log (IncrementEvent)
data_2 - the “get” selector, in decimal
data_3 - the “increment” selector, in decimal
These are accessed in the contract by referring to them by name. For example, loading the “get” selector into register “r1” can be done as follows with the “load word” or “lw” instruction.
lw $r1 data_2
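Since the assembly prints the selector constants in decimal, a quick conversion confirms they are the same selectors seen in the IR. The snippet below is a small sketch; the dictionary and helper names are only for illustration.

```python
# Decimal .word values copied from the data section of the assembly output.
DATA_WORDS = {"data_2": 1974928471, "data_3": 1480782270}


def as_selector(word: int) -> str:
    """Format a data word as a four-byte hexadecimal selector."""
    return f"0x{word:08x}"


for name, word in DATA_WORDS.items():
    print(name, as_selector(word))
# data_2 0x75b70457
# data_3 0x5842f1be
```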
Prologue
The contract starts with the following instruction.
ji .4
This jumps to the destination labeled “.4”. In this case, we're jumping over some contract metadata.
From “.4” we have the function dispatcher. This loads the function selector specified by the caller, then compares it with the two function selectors defined in our contract to determine which function the caller wants to call. If we find a match, we jump to the respective jump label. In this case, the label for “get” is “.0” and the label for “increment” is “.2”. Finally, if no function selectors match, we revert the call.
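The dispatcher's control flow can be modeled in Python as a rough sketch. This is not how the VM executes anything; it only mirrors the compare-and-jump sequence from the prologue. The Revert exception and the dispatch function are hypothetical names, while the selectors, labels, and the revert code 123 are taken from the assembly above.

```python
# (selector, jump label) pairs, in the order the prologue checks them.
SELECTOR_TABLE = [(0x75B70457, ".0"), (0x5842F1BE, ".2")]
MISMATCHED_SELECTOR_CODE = 123  # value loaded by `movi $$tmp i123`


class Revert(Exception):
    """Stands in for the `rvrt` instruction in this model."""


def dispatch(input_selector: int) -> str:
    """Return the label the prologue would jump to for a given selector."""
    for known_selector, label in SELECTOR_TABLE:
        # Mirrors: lw (load selector), eq (compare), jnzi (jump on match).
        if input_selector == known_selector:
            return label
    raise Revert(MISMATCHED_SELECTOR_CODE)


print(dispatch(0x75B70457))  # .0
print(dispatch(0x5842F1BE))  # .2
```

Note that the checks are sequential, so a contract with many methods pays for one comparison per selector that precedes the match.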
Functions
The “get” and “increment” bodies are in this section, starting with their jump label and ending with a “ret” instruction, which returns a value, if any, to the caller.
Stepping through the “get” function, we extend the call frame by 32 bytes for additional memory, load the storage key into memory, load the 64-bit value from storage into register “r1”, then return the value from “r1” to the caller.
Stepping through the “increment” function, we have a few more instructions. First we extend the call frame by 80 bytes for additional memory, load the storage key into memory, load the 64-bit value from storage, increment it, store the new value in storage where it was read from, create the “IncrementEvent” structure in memory with the new value as the “new_count” field, log it, then return zero to the caller.
Revisiting the Binary
Now that we have a frame of reference for reasoning about the binary file, let's revisit the hexdump output.
90 00 00 04 47 00 00 00 00 00 00 00 00 00 00 c0
5d fc c0 01 10 ff f3 00 5d 40 60 49 5d 47 f0 05
13 49 04 40 73 48 00 0f 5d 47 f0 06 13 49 04 40
73 48 00 19 72 f0 00 7b 36 f0 00 00 1a 48 50 00
91 00 00 20 50 41 20 00 5d 47 f0 07 10 45 13 00
50 41 20 00 60 41 10 20 50 41 20 00 38 45 04 00
24 44 00 00 1a 40 50 00 91 00 00 50 50 45 00 08
5d 4b f0 07 10 49 23 00 50 45 00 08 60 45 20 20
50 45 00 08 38 49 14 40 10 4d 20 40 50 45 00 28
5d 4b f0 07 10 49 23 00 50 45 00 28 60 45 20 20
50 45 00 28 3a 45 14 c0 50 45 00 00 5f 45 30 00
5d 43 f0 04 34 00 04 50 24 00 00 00 47 00 00 00
f3 83 b0 ce 51 35 8b e5 7d aa 3b 72 5f e4 4a cd
b2 d8 80 60 4e 36 71 99 08 0b 43 79 c4 1b b6 ed
00 00 00 00 00 00 00 08 00 00 00 00 75 b7 04 57
00 00 00 00 58 42 f1 be 00 00 00 00 00 00 00 c0
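According to the Fuel VM Instruction Set Specification, every instruction is encoded as exactly four bytes. The first byte is the opcode; the remaining 24 bits hold the arguments, which are 6-bit register identifiers and/or a 12-, 18-, or 24-bit immediate, so arguments do not necessarily align to byte boundaries.
opcode arguments (24 bits)
0x00 0x00 0x00 0x00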
In addition, the instructions are defined in a declarative macro in the Fuel VM repo here.
Skipping the Rust declarative macro internals for brevity, the first few lines give us what we need to reason about the executable.
impl_instructions! {
"Adds two registers."
0x10 ADD add [RegId RegId RegId]
"Bitwise ANDs two registers."
0x11 AND and [RegId RegId RegId]
// ...
}
Each opcode is defined in two lines: the first is a string that describes its functionality, and the second defines the byte representation of the opcode along with its identifiers and, where applicable, the argument types in brackets.
Checking the ADD instruction specification, there are three registers, “rA” is the register that stores the sum while “rB” and “rC” contain the operands. Therefore the encoding would be as follows.
<ADD> <rA> <rB> <rC>
With this information, let's take the first few lines of the binary, separate them into lines of four bytes each, and examine what each one does.
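; ji .4 (jump to instruction index 0x000004, i.e. byte offset 16)
90 00 00 04
; noop
47 00 00 00
; data section offset (8 bytes)
00 00 00 00
00 00 00 c0
; lw $ds $is 1 (opcode 0x5d; the remaining 24 bits 0xfcc001 split into
; rA = 0x3f ($ds), rB = 0x0c ($is), imm12 = 0x001: load the 8-byte word
; at $is + 1 * 8, the data section offset above, into $ds)
5d fc c0 01
; add $$ds $$ds $is (opcode 0x10; rA = 0x3f, rB = 0x3f, rC = 0x0c:
; add $is to $ds so $ds holds the absolute data section address)
10 ff f3 00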
; ...
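The field extraction for a single instruction can be sketched as follows. This is a minimal sketch under a few assumptions: register identifiers are 6 bits wide and packed immediately after the opcode byte, register 0x0c is $is per the specification's register table, and the compiler uses register 0x3f for $ds (inferred from the annotated bytes above). The function name decode_fields is hypothetical, and which fields are meaningful depends on the opcode.

```python
def decode_fields(instruction: bytes) -> dict:
    """Split a 4-byte FuelVM instruction into its candidate fields."""
    if len(instruction) != 4:
        raise ValueError("FuelVM instructions are exactly four bytes")
    value = int.from_bytes(instruction, "big")
    return {
        "opcode": value >> 24,        # top 8 bits
        "rA": (value >> 18) & 0x3F,   # next 6 bits
        "rB": (value >> 12) & 0x3F,   # next 6 bits
        "rC": (value >> 6) & 0x3F,    # next 6 bits (three-register forms)
        "imm12": value & 0xFFF,       # low 12 bits (two-register forms)
        "imm24": value & 0xFFFFFF,    # low 24 bits (e.g. ji)
    }


lw = decode_fields(bytes.fromhex("5dfcc001"))
add = decode_fields(bytes.fromhex("10fff300"))
ji = decode_fields(bytes.fromhex("90000004"))

print(hex(lw["opcode"]), lw["rA"], lw["rB"], lw["imm12"])   # 0x5d 63 12 1
print(hex(add["opcode"]), add["rA"], add["rB"], add["rC"])  # 0x10 63 63 12
print(hex(ji["opcode"]), ji["imm24"])                       # 0x90 4
```

Reading the arguments as whole bytes would give misleading register numbers, which is why the decoder shifts by multiples of six bits instead.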
Juicy. We can roughly understand what each 4-byte instruction is doing, but what about the eight "data offset" bytes? Concatenating the bytes together (0x00000000000000c0) and converting to base-10 yields 192. This is the offset, in bytes, from the start of the executable to the data section. Starting at this offset, we get the following:
;; --- Data ---
f3 83 b0 ce 51 35 8b e5 7d aa 3b 72 5f e4 4a cd ; storage_key[..16]
b2 d8 80 60 4e 36 71 99 08 0b 43 79 c4 1b b6 ed ; storage_key[16..]
00 00 00 00 00 00 00 08 00 00 00 00 75 b7 04 57 ; log_length : get_sel
00 00 00 00 58 42 f1 be 00 00 00 00 00 00 00 c0 ; increment_sel : _
The first two lines are our storage key. The third line is broken into two sections of eight bytes each: the first is 0x0000000000000008, which is the length of the “IncrementEvent” log, and the second is 0x0000000075b70457, which is the four-byte selector for the “get” function, padded to eight bytes. On the last line we have 0x000000005842f1be, which is the “increment” function selector.
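The same slicing can be done programmatically. The sketch below embeds the bytes from the hexdump above, reads the data section offset from bytes 8 through 15, and pulls out each constant. The variable names are only for illustration, and the layout (a 32-byte key followed by 8-byte big-endian words) is taken from the data section of the assembly output.

```python
# Bytes copied from the hexdump of out/debug/counter.bin shown above.
HEXDUMP = """
90 00 00 04 47 00 00 00 00 00 00 00 00 00 00 c0
5d fc c0 01 10 ff f3 00 5d 40 60 49 5d 47 f0 05
13 49 04 40 73 48 00 0f 5d 47 f0 06 13 49 04 40
73 48 00 19 72 f0 00 7b 36 f0 00 00 1a 48 50 00
91 00 00 20 50 41 20 00 5d 47 f0 07 10 45 13 00
50 41 20 00 60 41 10 20 50 41 20 00 38 45 04 00
24 44 00 00 1a 40 50 00 91 00 00 50 50 45 00 08
5d 4b f0 07 10 49 23 00 50 45 00 08 60 45 20 20
50 45 00 08 38 49 14 40 10 4d 20 40 50 45 00 28
5d 4b f0 07 10 49 23 00 50 45 00 28 60 45 20 20
50 45 00 28 3a 45 14 c0 50 45 00 00 5f 45 30 00
5d 43 f0 04 34 00 04 50 24 00 00 00 47 00 00 00
f3 83 b0 ce 51 35 8b e5 7d aa 3b 72 5f e4 4a cd
b2 d8 80 60 4e 36 71 99 08 0b 43 79 c4 1b b6 ed
00 00 00 00 00 00 00 08 00 00 00 00 75 b7 04 57
00 00 00 00 58 42 f1 be 00 00 00 00 00 00 00 c0
"""

binary = bytes.fromhex("".join(HEXDUMP.split()))

# Bytes 8..16 hold the big-endian offset of the data section.
data_offset = int.from_bytes(binary[8:16], "big")
data = binary[data_offset:]

storage_key = data[:32]                                   # data_0
log_length = int.from_bytes(data[32:40], "big")           # data_1
get_selector = int.from_bytes(data[40:48], "big")         # data_2
increment_selector = int.from_bytes(data[48:56], "big")   # data_3

print(data_offset)                      # 192
print(storage_key.hex())
print(log_length)                       # 8
print(f"0x{get_selector:08x}")          # 0x75b70457
print(f"0x{increment_selector:08x}")    # 0x5842f1be
```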
Additional Tooling
At the time of writing, there are a few tools being developed to assist with Fuel binary analysis both independently and within the Fuel team including the fuel-debugger for stepping through a transaction's execution and the fuel-disassembler for disassembling binaries into a more human readable assembly format.
Conclusions
To summarize, using the “forc” command, we were able to peek into the intermediate representation and human-readable assembly in Sway’s compilation pipeline to reason about FuelVM’s executable binaries. Note that this is a very simple contract and larger, more complex contracts will be much more difficult to reason about, but the framework for reasoning about them is still applicable.
Hopefully this article serves as a good resource for engineers, optimizers, and security researchers alike as the Fuel ecosystem develops and the necessity for understanding Fuel and Sway at the lowest levels grows.
If you enjoyed this thing and want to see more, subscribe below and follow me on Twitter for regular updates. Until next time, good hacking 🤘