Specify initial incomplete program schema
gnidan committed Oct 31, 2024
1 parent df0a8b7 commit 8f45e34
Showing 15 changed files with 589 additions and 0 deletions.
5 changes: 5 additions & 0 deletions packages/web/spec/program/_category_.json
@@ -0,0 +1,5 @@
{
  "label": "ethdebug/format/program",
  "position": 5,
  "link": null
}
84 changes: 84 additions & 0 deletions packages/web/spec/program/concepts.mdx
@@ -0,0 +1,84 @@
---
sidebar_position: 2
---

# Key concepts

## Programs are associated with a contract's compiled bytecode

This bytecode might either be the call bytecode, executed when a contract
account with this bytecode receives a message on-chain, or the create bytecode,
executed as part of deploying the contract associated with the bytecode.

Reflecting this relationship, **ethdebug/format/program** records contain
a reference to the concrete contract (i.e., not an `abstract contract` or
`interface`), the environment in which the bytecode will be executed (call or
create), and the compilation that yielded the contract and bytecode.
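
As a rough illustration, the top-level shape of a program record could be mirrored in TypeScript as follows. This is a minimal sketch based on the `properties` and `required` lists of the program schema in this commit. The names `ProgramSketch` and `looksLikeProgram` are hypothetical, the ID type of the compilation reference is an assumption, and nested values are left as `unknown` because their schemas are defined elsewhere.

```typescript
// Hypothetical mirror of the top-level program record; not an official type.
type Environment = "call" | "create";

interface ProgramSketch {
  // Assumption: a compilation reference is an object carrying an `id`.
  compilation?: { id: string | number };
  contract: { name?: string; definition: unknown };
  environment: Environment;
  context?: unknown;
  instructions: unknown[];
}

// Checks only the three required properties: contract, environment, instructions.
function looksLikeProgram(value: unknown): value is ProgramSketch {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const contract = record.contract;
  const hasContract =
    typeof contract === "object" && contract !== null && "definition" in contract;
  const hasEnvironment =
    record.environment === "call" || record.environment === "create";
  return hasContract && hasEnvironment && Array.isArray(record.instructions);
}
```

A check like this is no substitute for validating against the JSON Schema itself; it only makes the three required relationships (contract, environment, instructions) concrete.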

## Programs contain instruction listings for debuggers to reference

Programs contain a list of **ethdebug/format/program/instruction** objects,
where each instruction corresponds to one machine instruction in the
associated bytecode.

These instructions are listed in order and correspond one-to-one with the
encoded binary machine instructions in the bytecode. Each instruction
specifies the byte offset at which it appears in the bytecode; this offset is
equivalent to the program counter on non-EOF EVMs.
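
To see why offsets are not simply array indexes, consider how they follow from instruction sizes. On non-EOF EVMs, `PUSH1` through `PUSH32` carry 1 to 32 immediate bytes after the opcode, while every other instruction (including `PUSH0`) occupies a single byte. The sketch below uses hypothetical helper names, `instructionSize` and `computeOffsets`, to derive offsets from a list of mnemonics.

```typescript
// Size in bytes of one instruction on a non-EOF EVM:
// one opcode byte plus N immediate bytes for PUSHN.
function instructionSize(mnemonic: string): number {
  const match = /^PUSH(\d+)$/.exec(mnemonic);
  return match ? 1 + Number(match[1]) : 1;
}

// Walks the mnemonics in order, recording the byte offset of each.
function computeOffsets(mnemonics: string[]): number[] {
  const offsets: number[] = [];
  let offset = 0;
  for (const mnemonic of mnemonics) {
    offsets.push(offset);
    offset += instructionSize(mnemonic);
  }
  return offsets;
}

// Yields offsets 0, 1, 2, 4, 5, 6: the PUSH1 at offset 2 occupies two bytes.
const incrementerOffsets = computeOffsets([
  "PUSH0", "SLOAD", "PUSH1", "ADD", "PUSH0", "SSTORE",
]);
```

The mnemonic sequence is the one used in the example of the program schema, where the offsets jump from 2 to 4 for the same reason.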

By indexing these instructions by their offset, **ethdebug/format**
programs allow debuggers to look up high-level information at any point
during machine execution.
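
One way a debugger might build such an index is sketched below. The `InstructionSketch` type and the `indexByOffset` helper are hypothetical, and the sketch assumes offsets are plain numbers, as in the example of the program schema.

```typescript
// Hypothetical, simplified instruction shape for illustration only.
interface InstructionSketch {
  offset: number;
  operation?: { mnemonic: string; arguments?: string[] };
  context?: unknown;
}

// Builds a lookup table from byte offset (program counter) to instruction.
function indexByOffset(
  instructions: InstructionSketch[],
): Map<number, InstructionSketch> {
  const index = new Map<number, InstructionSketch>();
  for (const instruction of instructions) {
    index.set(instruction.offset, instruction);
  }
  return index;
}

const exampleIndex = indexByOffset([
  { offset: 0, operation: { mnemonic: "PUSH0" } },
  { offset: 1, operation: { mnemonic: "SLOAD" } },
  { offset: 2, operation: { mnemonic: "PUSH1", arguments: ["0x01"] } },
  { offset: 4, operation: { mnemonic: "ADD" } },
]);

// The program counter observed in a trace is the lookup key.
const atProgramCounter4 = exampleIndex.get(4);
```

A lookup at offset 3 finds nothing, because that byte is the immediate argument of the `PUSH1` at offset 2 rather than the start of an instruction.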

## Instructions describe high-level context details

Each instruction object in a program contains crucial information about the
high-level language state at that point in the bytecode execution.
Instructions represent these details using the
**ethdebug/format/program/context** schema, and these details may include:

- Source code ranges associated with the instruction (i.e., "source mappings")
- Variables known to be in scope following the instruction and where to
  find those variables' values in the machine state
- Control flow information such as an instruction being associated with the
  process of calling from one function to another

This information serves as a compile-time guarantee about the high-level
state of the world that exists following each instruction.

## Contexts inform high-level language semantics during machine tracing

The context information provided for each instruction serves as a bridge
between low-level EVM execution and high-level language constructs. Debuggers
can use these strong compile-time guarantees to piece together a useful and
consistent model of the high-level language code behind the running machine
binary.

By following the state of machine execution, a debugger can use context
information to stay apprised of the changing compile-time facts over the
course of the trace. Each successively encountered context serves as the
source of an observed state transition in the debugger's high-level state
model. This allows the debugger to maintain an ever-changing and coherent
view of the high-level language runtime.
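
The idea of folding contexts into a running model can be sketched as a reduction over the program counters observed in a trace. All names here (`replayTrace`, `HighLevelState`, and the simplified `*Sketch` types) are hypothetical, and the update rule is an assumption made for illustration: when a context lists variables, that list replaces the previous set of variables in scope.

```typescript
// Hypothetical, simplified shapes; the real schemas carry more detail.
interface VariableSketch {
  identifier: string;
  pointer: { location: string; slot: number };
}
interface ContextSketch {
  variables?: VariableSketch[];
}
interface InstructionSketch {
  offset: number;
  context?: ContextSketch;
}
interface HighLevelState {
  variablesInScope: VariableSketch[];
}

// Replays a trace of program counters, updating the model at each step.
function replayTrace(
  instructions: InstructionSketch[],
  programCounters: number[],
): HighLevelState {
  const byOffset = new Map<number, InstructionSketch>();
  for (const instruction of instructions) {
    byOffset.set(instruction.offset, instruction);
  }

  const initial: HighLevelState = { variablesInScope: [] };
  return programCounters.reduce<HighLevelState>((state, pc) => {
    // Assumption: a context's variable list supersedes the previous one.
    const variables = byOffset.get(pc)?.context?.variables;
    return variables ? { variablesInScope: variables } : state;
  }, initial);
}
```

A real debugger would track far more than variables (source ranges, call frames, and so on), but the structure stays the same: each step's context is an input to a state transition.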

In essence, the information provided by objects in this schema serves as a
means of reducing over state transitions, yielding a dynamic and accurate
representation of the program's high-level state. This enables debugging
tools to:

1. Map the current execution point back to the original source code
2. Reconstruct the state of variables at any given point
3. Provide meaningful stack traces that reference function names and source
locations
4. Offer insights into control flow, such as entering or exiting functions,
or iterating through loops
5. Present data structures (like arrays or mappings) in a way that reflects
their high-level representation, rather than their low-level storage

By leveraging these contexts, debugging tools can offer a more intuitive and
developer-friendly experience when working with EVM bytecode, effectively
translating between the machine-level execution and the high-level code that
developers write and understand. This continuous mapping between low-level
execution and high-level semantics allows developers to debug their smart
contracts more effectively, working with familiar concepts and structures
even as they delve into the intricacies of EVM operation.
5 changes: 5 additions & 0 deletions packages/web/spec/program/context/_category_.json
@@ -0,0 +1,5 @@
{
  "label": "Program contexts",
  "position": 6,
  "link": null
}
11 changes: 11 additions & 0 deletions packages/web/spec/program/context/code.mdx
@@ -0,0 +1,11 @@
---
sidebar_position: 4
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Code contexts

<SchemaViewer
schema={{ id: "schema:ethdebug/format/program/context/code" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/program/context/context.mdx
@@ -0,0 +1,11 @@
---
sidebar_position: 3
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Schema

<SchemaViewer
schema={{ id: "schema:ethdebug/format/program/context" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/program/context/variables.mdx
@@ -0,0 +1,11 @@
---
sidebar_position: 5
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Variables contexts

<SchemaViewer
schema={{ id: "schema:ethdebug/format/program/context/variables" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/program/instruction.mdx
@@ -0,0 +1,11 @@
---
sidebar_position: 5
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Instruction schema

<SchemaViewer
schema={{ id: "schema:ethdebug/format/program/instruction" }}
/>
73 changes: 73 additions & 0 deletions packages/web/spec/program/overview.mdx
@@ -0,0 +1,73 @@
---
sidebar_position: 1
---

# Overview

:::tip[Summary]

**ethdebug/format/program** is a JSON schema for describing compile-time
information about EVM bytecode, organized from the perspective of individual
machine instructions.

In **ethdebug/format**, a program record (or "program") represents one block of
executable EVM machine code that a compiler generated for a specific contract.
This could be either the contract's runtime call bytecode or the bytecode
to create the contract.

A program is structured as a sequence of instruction records ("instructions"),
where each corresponds to a single EVM instruction in the machine code. Each
instruction contains information about the high-level language context at that
point in the bytecode. This allows debuggers to map low-level machine state
back to high-level language concepts at any point during execution.

Key information that programs contain for a particular instruction might
include:
- the source range or source ranges that are "associated" with the
instruction
- the collection of known high-level variables at that point in time,
including their types and where to find the bytes with those variables'
values
- signals to indicate that the instruction is part of some control flow
  operation, such as calling one function from another.

These program records provide debuggers with a powerful reference resource
to be consulted while observing a running EVM. At each step of EVM machine
execution, debuggers can find the matching **ethdebug/format** program
instruction and use its information to maintain a coherent model of the
high-level world, step-by-step.

:::

This format defines the primary **ethdebug/format/program** schema as well as
various sub-schemas in the ethdebug/format/program/* namespace.

JSON values adhering to this schema contain comprehensive information about a
particular EVM bytecode object. This includes contract metadata (e.g., a
reference to the source range where the contract is defined) and, importantly,
an ordered list of **ethdebug/format/program/instruction** objects.

Each instruction object contains essential details for translating low-level
machine state at the time of the instruction back into high-level language
concepts. This allows debuggers to provide a meaningful representation of
program state at any point during execution.

## Reading this schema

The **ethdebug/format/program** schema is a root schema that composes other
related schemas in the ethdebug/format/program/* namespace.

These schemas (like all schemas in this format) are specified as
[JSON Schema](https://json-schema.org), draft 2020-12.

Please refer to one or more of the following resources in this section, or
see the navigation bar for complete contents:

- [Key concepts](/spec/program/concepts)

- [Schema](/spec/program) (**ethdebug/format/program** schema listing)

- [Instruction schema](/spec/program/instruction)
(**ethdebug/format/program/instruction** schema listing)

- [Context schema](/spec/program/context)
(**ethdebug/format/program/context** schema listing)
11 changes: 11 additions & 0 deletions packages/web/spec/program/program.mdx
@@ -0,0 +1,11 @@
---
sidebar_position: 4
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Schema

<SchemaViewer
schema={{ id: "schema:ethdebug/format/program" }}
/>
145 changes: 145 additions & 0 deletions schemas/program.schema.yaml
@@ -0,0 +1,145 @@
$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "schema:ethdebug/format/program"

title: ethdebug/format/program
description: |
  Debugging information about a particular bytecode in a compilation.
type: object

properties:
  compilation:
    title: Compilation reference by ID
    description: |
      A reference to the compilation as an `{ "id": ... }` object.
    $ref: "schema:ethdebug/format/materials/reference"

  contract:
    type: object
    properties:
      name:
        type: string

      definition:
        $ref: "schema:ethdebug/format/materials/source-range"
    required:
      - definition

  environment:
    title: Bytecode execution environment
    description: |
      Whether this bytecode is for contract creation or runtime calls.
    type: string
    enum:
      - call
      - create

  context:
    description: |
      The context known to exist prior to the execution of the first
      instruction in the bytecode.
    $ref: "schema:ethdebug/format/program/context"

  instructions:
    type: array
    description: |
      The full array of instructions for the bytecode.
    items:
      $ref: "schema:ethdebug/format/program/instruction"
    additionalItems: false

required:
  - contract
  - environment
  - instructions

examples:
  - # Incrementing a storage counter
    #
    # This example represents the call bytecode for the following pseudo-code:
    # ```
    # contract Incrementer;
    #
    # storage {
    #   [0] storedValue: uint256;
    # };
    #
    # code {
    #   let localValue = storedValue;
    #   storedValue += 1;
    #   value = tmp;
    # };
    # ```
    contract:
      name: "Incrementer"
      definition:
        source:
          id: 0
    environment: call
    context:
      variables:
        - &stored-value
          identifier: storedValue
          type:
            kind: uint
            bits: 256
          pointer:
            location: storage
            slot: 0
    instructions:
      - offset: 0
        operation:
          mnemonic: PUSH0
        context:
          variables:
            - *stored-value
      - offset: 1
        operation:
          mnemonic: SLOAD
        context:
          variables:
            - *stored-value
            - &local-value
              identifier: localValue
              type:
                kind: uint
                bits: 256
              pointer:
                location: stack
                slot: 0
      - offset: 2
        operation:
          mnemonic: PUSH1
          arguments: ["0x01"]
        context:
          variables:
            - *stored-value
            - <<: *local-value
              pointer:
                location: stack
                slot: 1

      - offset: 4
        operation:
          mnemonic: ADD
        context:
          variables:
            - *stored-value
            - *local-value
      - offset: 5
        operation:
          mnemonic: PUSH0
        context:
          variables:
            - *stored-value
            - <<: *local-value
              pointer:
                location: stack
                slot: 1

      - offset: 6
        operation:
          mnemonic: SSTORE
        context:
          variables:
            - *stored-value