Mutating Mempools and Programmable Transactions

Simon Brown
Mar 29, 2023


Photo by D koi on Unsplash

We’re on the precipice of the first major step towards account abstraction and the many possibilities it enables. New innovations for dealing with the effects of MEV have resulted in significant architectural changes to the Ethereum ecosystem, and more are still to come. These innovations will lead to a range of options for how transactions are processed and executed. Now might be a good time to start thinking about where these innovations may lead, and how we might get there.

One question that interests me is this: what if transactions were programmable, dynamic, adaptive? What if transactions were more declarative than imperative? What if they simply expressed our intent at a high level, rather than sending explicit low-level instructions to a specific dapp?

This essay explores what benefits these sorts of programmable transactions might afford users, the various ways it could be achieved, and how the idea might evolve.

First of all: why is the idea of programmable transactions worth exploring?

The benefits of executing some logic in a programmable transaction instead of a smart contract (pre-chain vs. on-chain) are that it saves gas, and that the transaction can respond to conditions in the mempool that may alter the global state between the time it is composed and the time it is executed.

This idea of responding to conditions faster than the block time of the underlying settlement layer opens up some interesting possibilities. For example, with token swaps, waiting up to Ethereum’s 12-second block time for a trade to settle will nearly always result in a stale price, as prices in other domains such as centralized exchanges move much faster. Another example could be an oracle update that appears after the transaction was composed and broadcast but before it is included in a block, changing the information available at the time of execution. Allowing transactions to be context-aware means that they can respond to new information as it arrives in the mempool, updating the conditions under which the transaction is executed.

One person who seems to be thinking deeply about this area is Vlad Zamfir, who discussed the concept of “Smart Transactions” in his talk at Devcon. There he describes “MEV time I/O”, whereby earlier transactions can directly interact with later transactions through EVM storage, opening up a whole new direction of research.

So what can you do with programmable transactions?

Examples

Here are some ideas of what capabilities they might enable:

Split payments

only process my payment if there are x other ETH transfers to the same address that add up to y within the next n blocks

This can reduce counter-party risk when splitting payments for goods or services, without the need to escrow funds in a smart contract and pay the gas that comes with it.
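To make the split-payment condition concrete, here is a minimal Python sketch of the check a mempool-aware executor (say, a block builder) might run. The `Transfer` type, the block-height bookkeeping, and reading “add up to y” as including our own payment are all illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A pending ETH transfer in a hypothetical mempool snapshot."""
    sender: str
    to: str
    value_wei: int
    seen_at_block: int  # block height at which the transfer was first seen


def split_payment_ready(mine: Transfer, others: list[Transfer],
                        x: int, y_wei: int, n_blocks: int) -> bool:
    """True if at least `x` other transfers to the same recipient, seen no later
    than `n_blocks` after ours, bring the total (ours included) to >= `y_wei`."""
    deadline = mine.seen_at_block + n_blocks
    matching = [t for t in others
                if t.to == mine.to and t.seen_at_block <= deadline]
    total = mine.value_wei + sum(t.value_wei for t in matching)
    return len(matching) >= x and total >= y_wei


# Usage: three people splitting a 100 wei bill sent to "0xShop".
me = Transfer("0xA", "0xShop", 40, seen_at_block=100)
pending = [Transfer("0xB", "0xShop", 30, 101),
           Transfer("0xC", "0xShop", 30, 103)]
print(split_payment_ready(me, pending, x=2, y_wei=100, n_blocks=5))  # True
```

If the condition never becomes true before the deadline, the executor simply never includes the transfer, which is where the gas saving over an escrow contract comes from.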

Atomic bilateral payments / swaps

only process my transfer to address A if there’s another transaction that transfers x ETH to address B that appears within n blocks

It’s worth noting, of course, that both value transfers appearing in the mempool at the same time doesn’t guarantee that they will actually be executed atomically; more on that later.

Collective bargaining / Flash sales

execute only in the presence of at least x transactions to the same contract address with the same 4-byte signature

This could be used by dapps to run promotions that offer a lower price for a short time if interest is sufficient, but without locking funds on-chain or wasting gas in the case where the minimum level of interest isn’t met.

JIT liquidity

only execute this transaction if there is x volume to a specific liquidity pool and no other transactions that are providing liquidity

This could allow liquidity providers to improve prices for users, and to reduce reverts caused by slippage exceeding the user’s tolerance, by providing liquidity to pools where there is a spike in demand. The added bonus is a lower risk of miscalculation and less exposure to impermanent loss, since liquidity isn’t left sitting in a pool for long periods.

Throttling

don’t process this transaction until there are at most x pending transactions from the same address in the mempool, and only if the block height % 2 = 0

This could be used to have transactions processed at a certain rate over a period of time.

Oracle updates

only process my transaction if there is a specific oracle update to address 0x1234 that has values within a certain range

This assumes there is an oracle update every block or every few blocks, and that the update will fall within a certain range; if it doesn’t, the transaction aborts. It could be used to conditionally top up collateralized positions.

Fee escalators

process my transaction with a gas price that increments at a specific rate over a specific time until it gets included in a block
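A fee escalator is mostly arithmetic. As a minimal sketch, assuming a linear schedule with a user-chosen cap and ignoring the priority fee, the bid after waiting k blocks is min(start + k·step, cap), and under EIP-1559 a transaction cannot be included until its max fee covers the block’s base fee. The function names below are hypothetical.

```python
from typing import Optional

GWEI = 10**9


def escalated_max_fee(start_wei: int, step_wei: int,
                      blocks_waited: int, cap_wei: int) -> int:
    """Linear escalator: raise the bid by `step_wei` per block waited, up to `cap_wei`."""
    if blocks_waited < 0:
        raise ValueError("blocks_waited must be non-negative")
    return min(start_wei + step_wei * blocks_waited, cap_wei)


def first_eligible_block(start_wei: int, step_wei: int, cap_wei: int,
                         base_fees_wei: list[int]) -> Optional[int]:
    """Index of the first upcoming block whose base fee the escalated bid covers,
    or None if no block in the list is covered. Covering the base fee is
    necessary for EIP-1559 inclusion, not sufficient (priority fee ignored)."""
    for waited, base_fee in enumerate(base_fees_wei):
        if escalated_max_fee(start_wei, step_wei, waited, cap_wei) >= base_fee:
            return waited
    return None


base_fees = [25 * GWEI, 25 * GWEI, 24 * GWEI, 23 * GWEI]
print(first_eligible_block(20 * GWEI, 2 * GWEI, 30 * GWEI, base_fees))  # 2
```

An exponential schedule would only change `escalated_max_fee`; the cap is what keeps the user’s worst-case cost bounded.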

Limit orders

execute my swap transaction only if and when a searcher can guarantee an effective price of at least x (example)

Block-level flash loans

transfer x ETH from address A to address B if there is also a transaction in the mempool to transfer x + x’ ETH back to address A from address B, and only if both transactions are executed in the same block
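Unlike the mempool-only conditions above, this one can only be checked against a built block, which is why a block builder is the natural place to enforce it. A minimal Python sketch, assuming the builder inspects a candidate block’s ordered transfers before sealing it; the names are hypothetical, and we additionally assume the loan must precede the repayment, since B needs the funds first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    to: str
    value_wei: int


def flash_loan_satisfied(block: list[Transfer], lender: str, borrower: str,
                         amount_wei: int, fee_wei: int) -> bool:
    """True if the loan (lender -> borrower, amount) is in the candidate block
    and a repayment of at least amount + fee from borrower to lender follows it.
    A builder would drop the loan transaction if this returns False."""
    loan = Transfer(lender, borrower, amount_wei)
    if loan not in block:
        return False
    after_loan = block[block.index(loan) + 1:]
    return any(t.sender == borrower and t.to == lender
               and t.value_wei >= amount_wei + fee_wei
               for t in after_loan)


candidate = [Transfer("A", "B", 100), Transfer("B", "C", 5), Transfer("B", "A", 103)]
print(flash_loan_satisfied(candidate, "A", "B", amount_wei=100, fee_wei=3))  # True
```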

Transactions qualified with general constraints

do not execute (or execute only) if more than x other transactions to the same address are in the mempool

This could improve multi-sig setups, where the individual signers’ transactions are only executed if a quorum is detected in the mempool within a certain time frame.

From the examples above, we can imagine an emergent domain-specific language (DSL) for expressing preferences about how transactions are executed; see the table below for a more detailed example of what this might start to look like.

Example DSL:

Note: all these conditions can be combined by conjunction (AND) or disjunction (OR).
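One way to get that composability, sketched here in Python under our own assumptions (the `Context` snapshot and every predicate name are hypothetical), is to model each condition as a predicate over a mempool snapshot and combine predicates with AND/OR combinators. The predicates below encode the flash-sale and throttling examples from earlier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PendingTx:
    sender: str
    to: str
    calldata: bytes = b""


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot a mempool-aware executor evaluates conditions against."""
    block_height: int
    mempool: tuple[PendingTx, ...]


Condition = Callable[[Context], bool]


def all_of(*conds: Condition) -> Condition:   # conjunction
    return lambda ctx: all(c(ctx) for c in conds)


def any_of(*conds: Condition) -> Condition:   # disjunction
    return lambda ctx: any(c(ctx) for c in conds)


def min_calls(to: str, selector: bytes, x: int) -> Condition:
    """Flash sale: at least x pending txs call `to` with this 4-byte selector."""
    return lambda ctx: sum(1 for t in ctx.mempool
                           if t.to == to and t.calldata[:4] == selector) >= x


def max_pending_from(sender: str, x: int) -> Condition:
    """Throttling: at most x pending txs from `sender`."""
    return lambda ctx: sum(1 for t in ctx.mempool if t.sender == sender) <= x


def block_height_mod(m: int, r: int) -> Condition:
    return lambda ctx: ctx.block_height % m == r


BUY = bytes.fromhex("deadbeef")  # placeholder selector, not a real function
ctx = Context(10, (PendingTx("0xA", "0xSale", BUY),
                   PendingTx("0xB", "0xSale", BUY),
                   PendingTx("0xA", "0xOther")))
throttle = all_of(max_pending_from("0xA", 3), block_height_mod(2, 0))
flash_sale = min_calls("0xSale", BUY, 3)
print(throttle(ctx), flash_sale(ctx), any_of(throttle, flash_sale)(ctx))  # True False True
```

Because combinators return ordinary conditions, arbitrarily nested AND/OR expressions fall out for free.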

This type of transaction isn’t possible with traditional blockchain architecture, because there is no way to express logic that is executed pre-chain, i.e. in the mempool. However, the mempool as we know it is changing. A range of options is emerging for routing transactions through numerous segmented mempools, and these afford us new abilities. Could they pave the way to some sort of programmable transaction in the future?

Level set on mempools

(and potential mansplaining)

Most blockchains / distributed public ledgers share the same basic high-level structure: transactions are composed and signed (usually in a wallet) and sent to a node on the network, which gossips them to the rest of the network through its connected peers, until every node has a copy in its local cache of pending transactions. Each node’s local cache is called its “mempool”, and collectively all of these local caches are referred to as “the mempool”.

All nodes have more or less the same set of pending transactions, and these transactions are “public”, insofar as every node on the network can see them. In practice there is always some inconsistency: under normal conditions, nodes see about 99% of pending transactions. Each node also has its own view of the global state, i.e. the tip of the chain, and the history of the chain. When a new block is added to the chain, each node removes the transactions in that block from its cache of pending transactions. Fairly straightforward.
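That lifecycle fits in a few lines. Below is a toy Python model of a single node’s mempool, assuming transactions are keyed by hash; real clients also order by nonce and fee, enforce size limits and evict invalid transactions, none of which is modelled here.

```python
class LocalMempool:
    """Toy model of one node's cache of pending transactions."""

    def __init__(self) -> None:
        self.pending: dict[str, bytes] = {}  # tx hash -> raw signed transaction
        self.head = 0                        # height of this node's view of the chain tip

    def receive(self, tx_hash: str, raw_tx: bytes) -> bool:
        """Store a transaction from a wallet or peer. Returns True if it was new,
        i.e. the node should gossip it on to its own peers."""
        if tx_hash in self.pending:
            return False
        self.pending[tx_hash] = raw_tx
        return True

    def on_new_block(self, height: int, included_hashes: list[str]) -> None:
        """Advance the tip and drop every transaction the block included."""
        self.head = height
        for tx_hash in included_hashes:
            self.pending.pop(tx_hash, None)


node = LocalMempool()
node.receive("0x01", b"...")
node.receive("0x02", b"...")
node.on_new_block(1, ["0x01"])
print(sorted(node.pending))  # ['0x02']
```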

Block Builders

Ethereum has started evolving beyond the traditional view of what the mempool is. This arguably started with block builders under mev-boost. Now as well as having nodes that gossip transactions across the public P2P mempool, we have block builders that maintain their own cache of transactions that they do not gossip to other nodes in the network.

Many transactions are sent directly to these block builders so that nobody but the block builder will see them, which avoids issues such as front-running that occur in the public mempool (see Flashbots Protect as an example). Many of these transactions are sent to multiple block builders to reduce inclusion latency (the time it takes for a transaction to get into a block), so block builders share a common subset of transactions that aren’t in the public mempool, while each also has a set of transactions unique to it.

There is also some overlap with the public mempool. It arises when searchers take transactions from the public mempool, place them in bundles and submit them to block builders, and when block builders take transactions from the public mempool to fill up their blocks once they’ve added the bundles and private transactions sent directly to them.

This means that block builders are in the perfect position to offer new transaction types that are more expressive, adaptive, and programmable. They can do this because they have a complete view of their own mempool as well as the public mempool, and can also guarantee atomic execution and ordering of transactions within a block. The only issue is that block building is currently centralized, insofar as each block is built by a single block builder, which places quite strong trust assumptions on users submitting transactions. There are multiple ways this could change in the future, though, and how it will is a fascinating open research question.

ERC-4337

The next radical transformation to Ethereum’s architecture is arguably ERC-4337 aka “Account Abstraction Using an Alternative Mempool”.

Under ERC-4337, a new type of transaction (well, technically a pseudo-transaction object) called a “user operation” is sent to a new class of actor called a “bundler”. Bundlers take user operations and bundle them into a single transaction that calls functions on users’ smart contract wallets, thereby enabling account abstraction. Each such transaction contains a batch of function calls to the smart contract wallets of different users, and is given to block builders that put it at the top of the block. For this reason it’s assumed most block builders will also operate 4337 bundlers themselves. Bundlers will also propagate user operations to other bundlers, forming a public “alt mempool”.
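As a rough sketch of the bundler’s job, the Python below models a user operation with the field names of the ERC-4337 `UserOperation` struct as specified in early 2023 (snake_cased here), and a bundle as the arguments that one EOA-signed transaction would pass to the EntryPoint’s `handleOps`. The selection policy (each sender’s lowest-nonce operation, ranked by priority fee) is our own simplification; a real bundler also simulates validation, enforces the spec’s opcode and storage rules, and handles 4337’s two-dimensional nonces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserOperation:
    # Fields mirror the ERC-4337 UserOperation struct (early-2023 version).
    sender: str                    # the user's smart contract wallet
    nonce: int
    init_code: bytes               # non-empty only when the wallet must be deployed
    call_data: bytes
    call_gas_limit: int
    verification_gas_limit: int
    pre_verification_gas: int
    max_fee_per_gas: int
    max_priority_fee_per_gas: int
    paymaster_and_data: bytes
    signature: bytes


@dataclass(frozen=True)
class Bundle:
    """Arguments of one EntryPoint.handleOps(ops, beneficiary) call."""
    ops: tuple[UserOperation, ...]
    beneficiary: str               # address that collects the bundler's fees


def build_bundle(alt_mempool: list[UserOperation], beneficiary: str,
                 max_ops: int) -> Bundle:
    # A later nonce can't execute before an earlier one, so keep each
    # sender's lowest-nonce operation only (simplified policy).
    next_op: dict[str, UserOperation] = {}
    for op in alt_mempool:
        current = next_op.get(op.sender)
        if current is None or op.nonce < current.nonce:
            next_op[op.sender] = op
    ranked = sorted(next_op.values(),
                    key=lambda op: op.max_priority_fee_per_gas, reverse=True)
    return Bundle(tuple(ranked[:max_ops]), beneficiary)
```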

So now we have another separate mempool containing user operations, which are aggregated into transactions that end up in block builders’ mempools, which in turn end up on-chain. Furthermore, there may be more than one 4337 mempool, for various reasons. One speculative reason is that a specific mempool could support particular paymasters with different validation rules, which might include some off-chain component, for instance.

Could ERC-4337 user operations enable programmable transactions?

Well, no. Or at least not without hacking the spec in some way. The ERC-4337 spec intentionally limits the verification logic that user operations can use, including a list of forbidden opcodes whose values can change between the time the user operation is verified and the time it actually lands in a block. This is to prevent DoS attacks and to give bundlers and paymasters more certainty that the transactions they’re processing will not revert on-chain.

However, there’s nothing to stop bundlers from allowing users to add extra qualifying execution constraints to their user operations, if the user is willing to trust the bundler to honor those constraints. For example, a segregated 4337 mempool could allow users to place constraints, expressed as EVM bytecode or a bespoke DSL, within the initCode field of the user operation. This field is usually empty except on a wallet’s first user operation, when it is populated with the address of a factory contract (followed by calldata for it) that deploys the user’s smart contract wallet. This could easily be adapted to include some form of qualifying constraint or condition attached to the user operation (again, see the DSL example above for what it could look like).
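To illustrate the idea (and only the idea: this is not part of ERC-4337, and a real design would have to reconcile it with initCode’s normal role of deploying the wallet), here is a Python sketch of a hypothetical segregated bundler that recognizes a made-up `CSTR` prefix in initCode, decodes a JSON rule, and checks it against its own view of the mempool before including the operation. The prefix, the JSON encoding and the single `min_pending_to` rule are all invented for this example.

```python
import json
from typing import Optional

CONSTRAINT_PREFIX = b"CSTR"  # hypothetical marker, NOT part of ERC-4337


def encode_constraint(rule: dict) -> bytes:
    """What a wallet talking to this hypothetical bundler might put in initCode."""
    return CONSTRAINT_PREFIX + json.dumps(rule, sort_keys=True).encode()


def decode_constraint(init_code: bytes) -> Optional[dict]:
    if not init_code.startswith(CONSTRAINT_PREFIX):
        return None  # ordinary user operation, no constraint attached
    return json.loads(init_code[len(CONSTRAINT_PREFIX):])


def should_include(init_code: bytes, pending_by_target: dict[str, int]) -> bool:
    """Evaluate the attached rule against the bundler's own mempool view.
    The user is trusting the bundler to actually run this check honestly."""
    rule = decode_constraint(init_code)
    if rule is None:
        return True
    if rule["op"] == "min_pending_to":
        return pending_by_target.get(rule["to"], 0) >= rule["count"]
    raise ValueError(f"unsupported rule: {rule['op']}")


code = encode_constraint({"op": "min_pending_to", "to": "0xSale", "count": 3})
print(should_include(code, {"0xSale": 2}), should_include(code, {"0xSale": 5}))  # False True
```

Nothing in this code forces the bundler to call `should_include` at all, which is exactly the trust assumption discussed next.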

Again, this opens up some interesting new possibilities, but it also carries strong trust assumptions: after all, the bundler can simply decide to ignore or misinterpret the conditions or constraints you’ve placed on the transaction’s execution. This could potentially be mitigated by using some sort of consensus protocol (e.g. Tendermint).

It also deviates from the specification, which is probably unhelpful right now, given that the 4337 mempool hasn’t even been properly established yet. Then again, there’s nothing to stop a bundler from doing something like this in the future, if only to test demand, especially as bundlers seek ways to differentiate their products in a highly commoditized market.

Order Flow Auctions

Certain types of transactions (e.g. token swaps) can also be routed to other segmented mempools: those of Order Flow Auctions (OFAs). These mempools are completely private, insofar as the transactions are not shared or propagated to any other actor. Instead, transaction “metadata” is made public and can be viewed by third parties; depending on the OFA design, these can be anyone who is interested or only those with permission. These third parties are usually MEV searchers or market makers, who bid against each other to provide liquidity and capture MEV through atomic or statistical arbitrage, on the condition that most of the MEV they capture is paid to the transaction originator.

The set of transactions within these OFA mempools does not intersect with the set of pending transactions held in any other mempool on the network. OFAs by themselves don’t facilitate programmable transactions in any way, but they move us closer to having transactions that are dynamic in how they are executed with respect to other transactions in the mempool. For example, MEV searchers can identify and group together similar transactions in order to increase the price impact on a certain liquidity pool, thus creating an arbitrage opportunity. If a price moves in a certain direction in a different domain within a couple of seconds of a token swap transaction being submitted, searchers can take advantage of this to back-run that transaction, which was based on a now-stale price.

SUAVE

SUAVE takes the idea of dynamic transactions that we see in OFAs and brings it to a whole new level. SUAVE aims to be a form of “cross-chain mempool” that users, rollups and L1s can outsource transaction sequencing and block building to. SUAVE is described as being: “an independent network that can act as a plug-and-play mempool and decentralized block builder for any blockchain”.

SUAVE is designed as an independent blockchain, and will process transactions in the form of “user preferences”, which can be regarded as the native transaction type on the network. There is also the concept of “executors” that take user preferences and bid to execute them for the best price. The execution of preferences will take the form of blocks that are bid to validators on various other chains. In this way SUAVE can be regarded as a means by which validators can outsource their block building to the SUAVE network, similar to how block building is outsourced via mev-boost today on Ethereum.

Preferences can be simple or quite complex, depending on what the user needs. As long as some executor is incentivized to execute a preference, it will get executed. Because executors bid against each other for the right to execute a transaction, any MEV that can be captured from it will mostly be bid back to the transaction originator. According to Flashbots’ description: “A preference is a message that a user signs to express a particular goal and that unlocks a payment if the user’s conditions have been met”.
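Flashbots’ definition suggests a simple model: a preference is a condition plus a payment, executors compete by promising the best outcome, and the payment is only released if the condition holds. The Python below is a toy model of that reading under our own assumptions (the types, the selection rule and the settlement function are hypothetical, not SUAVE’s actual data formats or auction mechanism), for a preference of the form “pay x to whoever gets me at least y of asset A”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preference:
    """Toy preference: pay `payment` to whoever delivers >= `min_out` of asset A."""
    user: str
    min_out: int
    payment: int


@dataclass(frozen=True)
class ExecutorBid:
    executor: str
    promised_out: int  # amount of asset A the executor commits to deliver


def select_executor(pref: Preference, bids: list[ExecutorBid]) -> Optional[ExecutorBid]:
    """Among bids that satisfy the condition, pick the one promising the most.
    Competition pushes promised_out up, returning captured MEV to the user."""
    eligible = [b for b in bids if b.promised_out >= pref.min_out]
    return max(eligible, key=lambda b: b.promised_out, default=None)


def settle(pref: Preference, delivered_out: int) -> int:
    """The payment is unlocked only if the user's condition was actually met."""
    return pref.payment if delivered_out >= pref.min_out else 0


pref = Preference("0xUser", min_out=1_000, payment=5)
bids = [ExecutorBid("e1", 990), ExecutorBid("e2", 1_010), ExecutorBid("e3", 1_025)]
winner = select_executor(pref, bids)
print(winner.executor, settle(pref, winner.promised_out))  # e3 5
```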

Wallets that support SUAVE will allow their users to compose and submit preferences to SUAVE, which will allow for interesting possibilities, such as facilitating atomic cross-chain or cross-rollup token transfers or cross-chain contract function calls, or simply availing of competitively priced token swaps, allowing executors to determine the optimal execution path. An array of other possibilities could emerge as well.

Some of the thought experiments around use-cases for SUAVE include:

  • Pay (x) to anyone who gets me at least (y) of asset A on domain B
  • Combine multisig sub-transactions for non-smart-contract multi-sigs
  • Execute my transaction which pays x ETH to the sender of the transaction at position i in the block at slot n if the block also includes this transaction in position i+1

This improves on the earlier approaches to programmable transactions in that it lowers trust assumptions, because a) it is a dedicated chain with its own consensus, and b) it controls execution by building full blocks.

It is likely that rollups will use SUAVE to outsource their block building. In this manner, SUAVE acts as a sort of “PBS-as-a-service”, where rollups / appchains can delegate the sequencing of transactions into blocks to the SUAVE network. This allows for optimal MEV capture through cross-domain arbitrage, ideally driving value back to transaction originators. In this way, SUAVE can be thought of as a sort of decentralized sequencing layer, similar in some ways to Espresso or Stack Network. There are other fascinating elements of the SUAVE protocol that I won’t dive into here, such as programmable privacy and decentralized block building. If you’d like to find out more, you can read about SUAVE here, and there are more technical details in the slides from Phil Daian’s talk at the Flashbots MEV Privacy Roast.

SUAVE represents a very interesting innovation in that it is a radical departure from the traditional idea of a blockchain having a single canonical mempool. It introduces the idea of a cross-chain mempool in which transactions can be qualified with constraints that describe how they can be executed in relation to information beyond the global state of the chain they’re being executed on.

Lit Protocol

Lit Protocol is an incredibly powerful innovation. While Lit Protocol doesn’t directly enable programmable transactions as described above, it does have some very interesting properties that could be used to build a programmable transaction layer. It is quite different from the previous examples because the nodes in the Lit Protocol network don’t all have access to Ethereum’s wider public mempool of pending transactions. However, Lit Protocol is interesting because it lets users compose a transaction and determine whether or not it is signed and propagated to the mempool, based on arbitrary logic and virtually any sort of off-chain data.

At a high level, Lit Protocol is a cross-chain middleware layer that can be used to perform encryption, decryption or transaction signing, based on on-chain and/or off-chain conditions.

Lit Protocol achieves this by maintaining a decentralized key management network using threshold cryptography. The network creates Programmable Key Pairs (PKPs) through DKG (distributed key generation), so that no single node has access to the full key at any point. PKPs can be used to sign transactions based on the execution of off-chain logic with inputs from on-chain and off-chain data sources. This off-chain logic is contained in immutable JavaScript functions that are stored on IPFS, called “Lit Actions”.

When a wallet or dapp sends a request to the Lit Protocol, all the nodes on the network verify the auth signature of the request, and then execute the Lit Action stored at the specified IPFS hash, passing any inputs in the request as parameters to the JS function. There is an upper bound on the amount of time the JS function is allowed to take for execution in order to prevent DOS attacks, and a supermajority of nodes on the network must agree on the output of the function. This output can take various forms, and one such form is a signed transaction that can be broadcast to a specific blockchain network. Used in this way, Lit Protocol allows for very powerful programmable transactions. To quote their documentation:
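The agreement step can be sketched as a simple vote. The Python below simulates the network rather than using any Lit SDK: each node runs the same deterministic action on the request’s parameters, a few nodes are assumed faulty, and the output is accepted only if at least two-thirds of nodes produced it. The two-thirds threshold, the string standing in for a signed transaction and the faulty-node behaviour are all our assumptions, and the execution time limit is not modelled. Real Lit Actions are JavaScript, and this sketch says nothing about how Lit actually aggregates results.

```python
from collections import Counter
from fractions import Fraction
from typing import Callable, Optional

SUPERMAJORITY = Fraction(2, 3)  # assumed threshold


def example_action(params: dict) -> str:
    """Stand-in for a Lit Action: only 'sign' if an off-chain price clears a floor."""
    if params["price"] >= params["floor"]:
        return "signed:" + params["tx"]
    return "no-signature"


def run_on_network(action: Callable[[dict], str], params: dict,
                   n_nodes: int, faulty: set[int]) -> Optional[str]:
    outputs = []
    for node in range(n_nodes):
        # Honest nodes run the same deterministic function on the same inputs;
        # faulty nodes return something arbitrary (here: a unique junk string).
        outputs.append(f"junk-{node}" if node in faulty else action(params))
    value, votes = Counter(outputs).most_common(1)[0]
    return value if Fraction(votes, n_nodes) >= SUPERMAJORITY else None


req = {"price": 1_850, "floor": 1_800, "tx": "0xabc"}
print(run_on_network(example_action, req, n_nodes=10, faulty={0, 1, 2}))  # signed:0xabc
```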

“Specifically, Lit Actions are JavaScript functions that are executed across Lit’s threshold cryptography network. They are JavaScript smart contracts that, when combined with PKPs, can be programmed to sign transactions and other arbitrary data.”

To be clear, there are actually way more applications and use cases that Lit Protocol enables, and you can get a great overview here. There’s a lot more you can do with Lit Protocol than my rather narrow definition of programmable transactions.

Rollups

I think it’s even possible that rollups could start to embrace new types of transactions alongside backwards-compatible EOA-signed transactions and 4337 user operations. These could include “qualified transactions” that allow a user to wrap a transaction in a qualifier object containing information about how they would like it to be executed. This would involve adapting the verification logic to accept a signature on either the transaction OR the qualifier object: if a signature was invalid for the transaction, it could be tested against the qualifier object that contains the transaction. How this can be done while minimizing trust assumptions in a centralized operator is another topic, and maybe a question to be explored within the shared sequencing space.
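The dual verification rule can be sketched in a few lines. To keep the Python below self-contained it uses HMAC-SHA256 from the standard library as a stand-in for ECDSA signatures, which is not how Ethereum or any rollup signs transactions; `QualifiedTx` and the qualifier fields are hypothetical. The point is only the control flow: try the plain transaction first, for backwards compatibility, then the qualifier object, whose signed payload commits to the wrapped transaction.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional


def sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in for an ECDSA signature; real Ethereum signatures are NOT HMACs.
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, payload), sig)


@dataclass
class QualifiedTx:
    tx: bytes         # an ordinary, backwards-compatible transaction payload
    qualifier: dict   # hypothetical execution preferences, e.g. {"max_block": 120}

    def qualifier_payload(self) -> bytes:
        # Signing the qualifier also commits to the wrapped transaction.
        return json.dumps({"tx": self.tx.hex(), "q": self.qualifier},
                          sort_keys=True).encode()


def classify(qtx: QualifiedTx, sig: bytes, key: bytes) -> Optional[str]:
    """Which object does the signature authorize? None if neither."""
    if verify(key, qtx.tx, sig):
        return "plain"      # execute as a normal transaction
    if verify(key, qtx.qualifier_payload(), sig):
        return "qualified"  # the sequencer must honor the qualifier
    return None


key = b"user-secret"
q = QualifiedTx(b"\x02\xf8", {"max_block": 120})
print(classify(q, sign(key, q.tx), key),
      classify(q, sign(key, q.qualifier_payload()), key))  # plain qualified
```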

Conclusions

At the moment programmable transactions, or “smart transactions”, are just an idea. As such, it remains to be seen how potential use-cases and applications will drive demand, and what form they will take. There will likely be more applications built on top of middleware networks such as Lit Protocol and SUAVE, and perhaps even custom 4337 bundlers. Other innovations may also open up more possibilities over time for how transactions are routed, propagated and processed.

To provide more dynamic and expressive transactions, access to the widest possible set of transactions is a clear advantage. At the very least this includes the public mempool, but ideally also various segregated mempools, including the 4337 alt mempool and the mempools of prominent block builders, while also maintaining privacy. This puts block builders in the best position to provide this sort of capability, especially if they are sequencing / building blocks for more than one chain.

I suspect that once these capabilities start to surface, the pace of innovation on the wallet / dapp layer to take advantage of them will be very fast. It will be very interesting to see what sort of product features this will enable and what the benefits to the user will be.

Many thanks to David Sneider and Ankit Chiplunkar for reviewing the post.
Special thanks to Matt Cutler for his valuable insights and feedback.
