
2023-03-14 edit: added a brief explainer on the role of node providers
In the last two posts of this series, I described what crypto data is and why it is a unique and compelling opportunity. I strongly encourage reading these before continuing if you have not already, as they provide important context for understanding this post. In this post, we will get our hands dirty and discuss the technical details of decoding Ethereum smart contract data into human-readable formats. This is a necessary first step towards a deeper understanding of the underlying user activity. We will focus on Ethereum as the primary example, but many of the concepts we discuss here apply more broadly to all EVM-compatible chains and smart contracts, e.g. Polygon, BSC, Optimism, etc.
As we have discussed in previous posts, a smart contract transaction is analogous to a backend API call in a smart contract powered web3 application. The details of each smart contract transaction and the resulting application state changes are recorded in data elements known as transactions, calls (also referred to as traces), and logs. The transaction data element represents the function call initiated by a user (or EOA, an externally owned account, to be more precise), the call data elements represent additional function calls initiated within the transaction by the smart contract, and the log data elements represent events that occurred during the transaction execution.
One important thing to note is that in order to interact with the smart contracts and work with the data elements mentioned above, you will need to go through a node. Nodes are participants running the blockchain software that form the network, and they serve as the gateway to the blockchain. Typically, the quickest way to get access to a node is through a node provider like Alchemy, Infura or Quicknode. If you are interested in learning more, this Alchemy blog post provides a great explainer on nodes and how to work with them.
Once we have the data elements, we can describe in granular detail the state changes that occurred in the applications and on the blockchain as a result of the transaction. And when analyzed in the aggregate, the collection of all transactions, traces, and logs for a given decentralized web3 application can provide holistic and insightful views of the user base and its activity in the product. However, doing so is made challenging by the fact that many of the salient details are recorded as hexadecimal encoded strings. See, for example, this transaction to swap a pair of tokens using Uniswap on the Ethereum network (this particular record can be obtained by querying the transactions table in Google’s Public Dataset on Ethereum, and can also be viewed on Etherscan):
hash: 0x87a3bc85da972583e22da329aa109ea0db57c54a2eee359b2ed12597f5cb1a64
nonce: 449
transaction_index: 37
from_address: 0x3c02cebb49f6e8f1fc96158099ffa064bbfee38b
to_address: 0x7a250d5630b4cf539739df2c5dacb4c659f2488d
value: 0E-9
gas: 228630
gas_price: 91754307665
input: 0x38ed1739000000000000000000000000000000000000000000000000000000009502f900000000000000000000000000000000000000000000a07e38bf71936cbe39594100000000000000000000000000000000000000000000000000000000000000a00000000000000000000000003c02cebb49f6e8f1fc96158099ffa064bbfee38b00000000000000000000000000000000000000000000000000000000616e11230000000000000000000000000000000000000000000000000000000000000003000000000000000000000000a0b86991c6218b36c1d19d4a2e9eb0ce3606eb48000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2000000000000000000000000528b3e98c63ce21c6f680b713918e0f89dfae555
receipt_cumulative_gas_used: 2119514
receipt_gas_used: 192609
receipt_contract_address: None
receipt_root: None
receipt_status: 1
block_timestamp: 2021-10-19 00:00:18
block_number: 13444845
block_hash: 0xe9ea4fc0ef9a13b1e403e68e3ff94bc94e472132528fe8f07ade422b84a43afc
max_fee_per_gas: None
max_priority_fee_per_gas: None
transaction_type: None
receipt_effective_gas_price: 91754307665
As you might have noticed if you looked at the transaction on Etherscan, it already decodes this raw record and provides great context to help you understand the transaction details. While this is extremely helpful, it is not designed to answer questions that require transformation and aggregation of the data, e.g. how much total value was traded by all Uniswap users, or what the 3-month retention of Uniswap users is. To answer these questions, we need to be able to gather all the records, decode them, and work with the relevant details in batch. We will go through how to do that in the rest of this post.
Decoding transactions
If we examine the raw data record, we can see that the transaction was initiated by the EOA, 0x3c02cebb49f6e8f1fc96158099ffa064bbfee38b, to the smart contract address associated with the Uniswap v2 router, 0x7a250d5630b4cf539739df2c5dacb4c659f2488d. However, the relevant request details are encoded as a long hexadecimal string in the input field.
Before we get into how to extract human-readable data from input, it will be instructive to talk through its structure. The leading 0x is an indicator that this string is hexadecimal, so it is not relevant to the actual information content. After that, every 2 hex characters represent a byte. The first four bytes, in this case 38ed1739, are the function selector: the first four bytes of the Keccak-256 hash of the signature of the function being called. The rest of the bytes are not hashes but the ABI-encoded arguments being passed to the function, laid out in 32-byte words. This means that the length of the input string can vary depending on the specific function invoked and the parameters it requires.
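To make this layout concrete, here is a minimal sketch that splits the example input by hand using only the Python standard library. It assumes we already know the argument types and order (which is exactly what the ABI, discussed next, provides). The helper names (word, split_calldata, as_uint, as_address) are illustrative, and the input is rebuilt one zero-padded 32-byte word at a time to keep the listing readable.

```python
def word(hex_value: str) -> str:
    """Left-pad a hex value to one 32-byte (64 hex character) word."""
    return hex_value.rjust(64, "0")

# The example input: 4-byte selector followed by 32-byte words.
INPUT = "0x38ed1739" + "".join(word(w) for w in [
    "9502f900",                                  # amountIn
    "a07e38bf71936cbe395941",                    # amountOutMin
    "a0",                                        # byte offset of the path array
    "3c02cebb49f6e8f1fc96158099ffa064bbfee38b",  # to
    "616e1123",                                  # deadline
    "3",                                         # path length
    "a0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",  # path[0]
    "c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",  # path[1]
    "528b3e98c63ce21c6f680b713918e0f89dfae555",  # path[2]
])

def split_calldata(input_hex: str):
    """Return (selector as hex, list of 32-byte words)."""
    raw = bytes.fromhex(input_hex[2:])  # drop the 0x prefix
    selector, body = raw[:4], raw[4:]
    words = [body[i:i + 32] for i in range(0, len(body), 32)]
    return selector.hex(), words

def as_uint(w: bytes) -> int:
    return int.from_bytes(w, "big")

def as_address(w: bytes) -> str:
    return "0x" + w[-20:].hex()  # an address occupies the last 20 bytes

selector, words = split_calldata(INPUT)
amount_in = as_uint(words[0])
amount_out_min = as_uint(words[1])
# Dynamic arrays are stored after the fixed-size arguments; the third
# word holds the byte offset at which the array (length, then items) starts.
path_start = as_uint(words[2]) // 32
path_len = as_uint(words[path_start])
path = [as_address(w) for w in words[path_start + 1:path_start + 1 + path_len]]
to = as_address(words[3])
deadline = as_uint(words[4])

print(selector, amount_in, amount_out_min, path, to, deadline)
```

Doing this by hand for every function quickly becomes impractical, which is why decoding is normally driven by the contract's interface definition instead.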
In order to decode this hexadecimal string, we need to reference something called the application binary interface, or ABI. This is a JSON object that contains all the function and event interface definitions (i.e. names and types) for the given smart contract. The ABI functions as a lookup for matching the hashed signatures in the transaction data against the human-readable interface definitions.
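As a rough illustration, a single ABI entry for the function called in our example might look like the JSON below, loaded here into Python. The field values follow the Uniswap v2 Router interface, but treat them as an assumption and check them against the published ABI. canonical_signature is a hypothetical helper, not part of any library.

```python
import json

# One entry of a contract ABI (a full ABI is a JSON list of such entries).
ABI_ENTRY_JSON = """
{
  "type": "function",
  "name": "swapExactTokensForTokens",
  "stateMutability": "nonpayable",
  "inputs": [
    {"name": "amountIn", "type": "uint256"},
    {"name": "amountOutMin", "type": "uint256"},
    {"name": "path", "type": "address[]"},
    {"name": "to", "type": "address"},
    {"name": "deadline", "type": "uint256"}
  ],
  "outputs": [
    {"name": "amounts", "type": "uint256[]"}
  ]
}
"""

def canonical_signature(entry: dict) -> str:
    """Build the signature string that gets hashed: name(type1,type2,...)."""
    types = ",".join(inp["type"] for inp in entry["inputs"])
    return f"{entry['name']}({types})"

entry = json.loads(ABI_ENTRY_JSON)
signature = canonical_signature(entry)
print(signature)
# The first four bytes of the Keccak-256 hash of this string are the
# selector (38ed1739 in our example). Keccak-256 is not in the Python
# standard library, so the hash itself is not computed here.
```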

ABIs can generally be found on block explorers like Etherscan, alongside the contract source code. The ABI for the Uniswap v2 Router contract, for instance, is published on its Etherscan contract page.
Once we have the ABI handy, we can write code to decode the transaction:
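A minimal sketch of such a decoder is shown here, assuming web3.py version 6 or later (where the checksum helper is named to_checksum_address). The names get_contract and decode_tx are illustrative, and this convert_to_hex is a simplified stand-in that only handles serialization; it does not try to re-attach names to tuple fields.

```python
import json
from functools import lru_cache

@lru_cache(maxsize=None)
def get_contract(address: str, abi_json: str):
    """Build (and cache) a web3 contract object for one address/ABI pair.

    The ABI is passed as a JSON string so that it is hashable for the cache.
    """
    # Third-party dependency, imported lazily so the helpers below can be
    # used and tested without it.
    from web3 import Web3
    return Web3().eth.contract(
        address=Web3.to_checksum_address(address),
        abi=json.loads(abi_json),
    )

def convert_to_hex(value):
    """Recursively turn decoded values into JSON-serializable ones."""
    if isinstance(value, (bytes, bytearray)):
        return "0x" + bytes(value).hex()
    if isinstance(value, dict):
        return {key: convert_to_hex(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [convert_to_hex(item) for item in value]
    return value

def decode_tx(address: str, input_hex: str, abi_json: str):
    """Return (function name, JSON string of arguments) for one transaction."""
    contract = get_contract(address, abi_json)
    func, params = contract.decode_function_input(input_hex)
    return func.fn_name, json.dumps(convert_to_hex(params))
```

No node connection is needed for this step, since decoding only uses the ABI and the input string already held locally.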
A few things to note in the example code:
- This code is designed for batch processing of a large number of transactions. It assumes that the data is already present in local storage (and not fetched live from the blockchain), and is well suited to a distributed processing framework like PySpark.
- @lru_cache(maxsize=None): We cache the contract object creation to reduce overhead from repeating the same computation across a large number of transactions. This assumes that the decoding is targeted at a smallish number (on the order of thousands) of distinct smart contracts.
- It leverages the decode_function_input method of the open-source web3 package to extract data based on the templates provided in the ABI. This method, however, returns data that is often not serializable (e.g. byte arrays) and is sometimes missing human-readable keys. Therefore, it is very helpful (maybe even necessary) to perform post-extraction processing using the utility method convert_to_hex to convert the data into serializable JSON objects and attach the human-understandable keys where missing. This makes it easier to persist and reuse the decoded data.
- The same code can be used for decoding trace data elements as well. This is because they are simply internal transactions initiated by a smart contract.
Using the code above yields this decoded input data:
function called: swapExactTokensForTokens
arguments: {
"amountIn": 2500000000,
"amountOutMin": 194024196127819599854524737,
"path": [
"0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",
"0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2",
"0x528B3e98c63cE21C6f680b713918E0F89DfaE555"
],
"to": "0x3c02cebB49F6e8f1FC96158099fFA064bBfeE38B",
"deadline": 1634603299
}
From this we can much more easily understand that:
- The call is to a method named swapExactTokensForTokens, and the user is putting in 2,500,000,000 units of the starting token and expecting to get back at least 194,024,196,127,819,599,854,524,737 units of the target token. These numbers may seem astronomical, but keep in mind that token amounts are typically denominated in units of 1/10^n of a token, where n is something like 18. n is sometimes referred to as the decimal value of the token.
- The path array describes the tokens being exchanged in this transaction. Each array element is the address of a token contract. The first one is USDC (a stablecoin pegged to the dollar), the second one is Wrapped ETH (ether with an ERC-20 interface), and the third one is DXO (a deep.space in-game currency).
- Putting the first two points together, we can deduce that the user request is to swap 2,500 USDC (USDC has a decimal value of 6) for at least ~194 million DXO (DXO has a decimal value of 18). Since this particular pairwise swap is not directly available, the transaction is routed through the intermediary token WETH.
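The unit conversion described above can be sketched in a few lines. The decimal values (6 for USDC, 18 for DXO) are the ones quoted in this post, and to_token_amount is a hypothetical helper name.

```python
from decimal import Decimal

def to_token_amount(raw_units: int, decimals: int) -> Decimal:
    """Convert raw on-chain units into whole tokens."""
    return Decimal(raw_units) / (Decimal(10) ** decimals)

# Values taken from the decoded input above.
usdc_in = to_token_amount(2500000000, 6)
dxo_out_min = to_token_amount(194024196127819599854524737, 18)

print(usdc_in)      # 2500 USDC
print(dxo_out_min)  # roughly 194 million DXO
```

Decimal is used rather than float so that 18-decimal amounts are not rounded during the conversion.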
Decoding logs
This transaction also emitted 7 events in the process of execution, which can be obtained by querying the logs table in Google’s Public Dataset on Ethereum, and can also be viewed on Etherscan. The two most salient records, which correspond to the swaps requested by the user, are:
log_index: 47
transaction_hash: 0x87a3bc85da972583e22da329aa109ea0db57c54a2eee359b2ed12597f5cb1a64
transaction_index: 37
address: 0xb4e16d0168e52d35cacd2c6185b44281ec28c9dc
data: 0x000000000000000000000000000000000000000000000000000000009502f90000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000093f8f932b016b1c
topics: [
'0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822',
'0x0000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488d',
'0x000000000000000000000000242301fa62f0de9e3842a5fb4c0cdca67e3a2fab']
block_timestamp: 2021-10-19 00:00:18
block_number: 13444845
block_hash: 0xe9ea4fc0ef9a13b1e403e68e3ff94bc94e472132528fe8f07ade422b84a43afc
and
log_index: 50
transaction_hash: 0x87a3bc85da972583e22da329aa109ea0db57c54a2eee359b2ed12597f5cb1a64
transaction_index: 37
address: 0x242301fa62f0de9e3842a5fb4c0cdca67e3a2fab
data: 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000093f8f932b016b1c000000000000000000000000000000000000000000a137bb41b9113069a51e190000000000000000000000000000000000000000000000000000000000000000
topics: [
'0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822',
'0x0000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488d',
'0x0000000000000000000000003c02cebb49f6e8f1fc96158099ffa064bbfee38b']
block_timestamp: 2021-10-19 00:00:18
block_number: 13444845
block_hash: 0xe9ea4fc0ef9a13b1e403e68e3ff94bc94e472132528fe8f07ade422b84a43afc
Again, the relevant details are encoded as hexadecimal strings in the topics and data fields. As was the case with the transaction input, it is instructive to go through the structure of these data fields. topics is an array where the first element represents the hashed signature of the event interface definition. Any additional elements in the topics array are the values of the event's indexed parameters, which are typically blockchain addresses involved in the event, and may or may not exist depending on the specific event. data holds the remaining (non-indexed) event parameter values, ABI-encoded in 32-byte words, and can vary in length depending on the event definition. As was the case with transactions, we need to reference the contract ABI in order to translate this into human-readable form.
The astute reader will notice that the contract addresses in the logs above, 0xb4e16d0168e52d35cacd2c6185b44281ec28c9dc and 0x242301fa62f0de9e3842a5fb4c0cdca67e3a2fab, are different from the v2 Router contract, 0x7a250d5630b4cf539739df2c5dacb4c659f2488d, which the user EOA initially called. These two addresses correspond to the Uniswap v2 pair contracts for the USDC-WETH and DXO-WETH token pairs. These contracts are responsible for holding the liquidity for their respective trading pairs and for actually making the swap. The Router contract, which the user interacted with initially, functions as a coordinator and initiates internal transactions (traces) to the appropriate pair contracts. Therefore, in order to decode these events, we also need the pair contract ABI. Example code to decode logs is as follows:
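A minimal, standard-library-only sketch of this kind of log decoding follows. It is a simplification under stated assumptions: the Swap event definition is taken from the Uniswap v2 pair interface, the lookup from topic hash to event definition (EVENTS_BY_TOPIC) is hard-coded rather than computed with Keccak-256 (in practice a helper such as eth_utils.event_abi_to_log_topic could build it from the ABI), only address and uint types are handled, and addresses are returned in lowercase rather than checksummed. All helper names are illustrative.

```python
def word(hex_value: str) -> str:
    """Left-pad a hex value to one 32-byte (64 hex character) word."""
    return hex_value.rjust(64, "0")

SWAP_TOPIC = "0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822"

# Event definition as it would appear in the pair contract ABI (assumed).
SWAP_EVENT = {
    "name": "Swap",
    "inputs": [
        {"name": "sender", "type": "address", "indexed": True},
        {"name": "amount0In", "type": "uint256", "indexed": False},
        {"name": "amount1In", "type": "uint256", "indexed": False},
        {"name": "amount0Out", "type": "uint256", "indexed": False},
        {"name": "amount1Out", "type": "uint256", "indexed": False},
        {"name": "to", "type": "address", "indexed": True},
    ],
}
EVENTS_BY_TOPIC = {SWAP_TOPIC: SWAP_EVENT}

def decode_word(raw: bytes, abi_type: str):
    if abi_type == "address":
        return "0x" + raw[-20:].hex()
    if abi_type.startswith("uint"):
        return int.from_bytes(raw, "big")
    raise ValueError(f"type not handled in this sketch: {abi_type}")

def decode_log(topics, data_hex):
    """Return (event name, arguments) for one log record."""
    event = EVENTS_BY_TOPIC[topics[0]]
    # Indexed parameters come from topics[1:], the rest from data.
    indexed = (bytes.fromhex(t[2:]) for t in topics[1:])
    body = bytes.fromhex(data_hex[2:])
    plain = (body[i:i + 32] for i in range(0, len(body), 32))
    args = {}
    for inp in event["inputs"]:
        source = indexed if inp["indexed"] else plain
        args[inp["name"]] = decode_word(next(source), inp["type"])
    return event["name"], args

# The first log record above (log_index 47), rebuilt word by word.
topics = [
    SWAP_TOPIC,
    "0x" + word("7a250d5630b4cf539739df2c5dacb4c659f2488d"),
    "0x" + word("242301fa62f0de9e3842a5fb4c0cdca67e3a2fab"),
]
data = "0x" + word("9502f900") + word("0") + word("0") + word("93f8f932b016b1c")

name, args = decode_log(topics, data)
print(name, args)
```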
Similar to the code for transaction decoding, the example code is optimized for batch decoding use cases, and is meant to be used in conjunction with something like PySpark to process a large number of log events. Running the above yields:
event emitted: Swap
arguments: {
"sender": "0x7a250d5630B4cF539739dF2C5dAcb4c659F2488D",
"to": "0x242301FA62f0De9e3842A5Fb4c0CdCa67e3A2Fab",
"amount0In": 2500000000,
"amount1In": 0,
"amount0Out": 0,
"amount1Out": 666409132118600476
}
and
event emitted: Swap
arguments: {
"sender": "0x7a250d5630B4cF539739dF2C5dAcb4c659F2488D",
"to": "0x3c02cebB49F6e8f1FC96158099fFA064bBfeE38B",
"amount0In": 0,
"amount1In": 666409132118600476,
"amount0Out": 194900241391490294085918233,
"amount1Out": 0
}
We can see that these two are indeed Swap events that followed the path in the initial request: USDC > WETH > DXO. The router contract (ending 488D) is the sender in both events, acting as the coordinator. The USDC-WETH pair contract (ending c9dc) swaps 2,500,000,000 units of USDC for 666,409,132,118,600,476 units of WETH, and then transfers the resulting WETH to the DXO-WETH pair contract (ending 2Fab). The DXO-WETH contract then swaps the 666,409,132,118,600,476 units of WETH for 194,900,241,391,490,294,085,918,233 units of DXO and sends it to the user (EOA ending E38B) as initially requested.
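Once the records are decoded, checks like the ones in this walkthrough become simple arithmetic. The short sketch below, using the decoded values from this post (the variable names are illustrative), confirms that the two hops link up and that the user received at least the requested minimum.

```python
from decimal import Decimal

# From the decoded transaction input.
amount_out_min = 194024196127819599854524737

# From the two decoded Swap events.
hop1 = {"amount0In": 2500000000, "amount1In": 0,
        "amount0Out": 0, "amount1Out": 666409132118600476}
hop2 = {"amount0In": 0, "amount1In": 666409132118600476,
        "amount0Out": 194900241391490294085918233, "amount1Out": 0}

# The WETH leaving the first pair should equal the WETH entering the second.
hops_link = hop1["amount1Out"] == hop2["amount1In"]

# The DXO actually received should be at least the requested minimum.
dxo_received = hop2["amount0Out"]
meets_minimum = dxo_received >= amount_out_min

# How far above the minimum the trade settled, in percent (about 0.45).
surplus_pct = Decimal(dxo_received - amount_out_min) / Decimal(amount_out_min) * 100

print(hops_link, meets_minimum, round(surplus_pct, 2))
```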
Closing thoughts
As this example hopefully illustrates, the process of decoding is relatively straightforward once you have the tools, but knowing what to decode and how to interpret the resulting data is not. Depending on the specific question you are trying to answer, some functions and events are more relevant than others. For the purpose of analyzing economic activity and user behavior in web3 applications, it is important to develop an understanding of how the specific smart contracts work, and to identify the key functions and events involved in the metric of interest. This is best done through a combination of actually using the product, examining the data exhaust on block explorers like Etherscan, and reading the smart contract source code. This is a crucial prerequisite for developing the right decoding and analysis strategy.
I hope this was a useful discussion and that it has helped you gain a better sense of how to work with crypto data. In my next post, I will show an example deep dive analysis on OpenSea, the largest NFT marketplace. Be sure to hit the email icon to subscribe if you would like to be notified when that posts.
_Thank you for reading and feel free to reach out if you have questions or comments. Twitter | Linkedin_