1. Introduction
Blockchain technology is an innovative digital ledger system that provides secure record-keeping by storing and redundantly verifying transactions on a distributed network of nodes [1]. This technology divides into two primary classes: public (or permissionless) and private (or permissioned) blockchains. Permissionless blockchains are open access and allow any individual or entity to participate [2], while permissioned blockchains require credential validation or an economic incentive before allowing participation in the network [3]. Permissionless blockchains have driven the development of decentralized applications (DApps), which exhibit features such as distributed business logic, distributed data, resilience to failures at central points, and a guarantee of data immutability [4].
However, permissionless blockchains face challenges that limit the optimal operation of DApps. One of the most relevant is storage scalability, specifically, the growth of the blockchain's total storage with the number of nodes that replicate it. To understand this problem, imagine a library that constantly receives new books (blockchain transactions) at a constant daily rate of ten books, known as the growth rate, $c$. For security and redundancy, the library stores copies of all the received books in different sections, with the number of sections equal to the number of nodes, $n$. In this scenario, the total number of books stored in the library, that is, the storage size $s$, can be calculated as $s = c \cdot n$. The challenge arises when the librarian cannot control the number of sections (nodes) in which the book copies are stored. For example, one day there are five sections, and the next day there are seven. This fluctuation in the number of sections affects the storage capacity of the library and the management of the books.
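The effect of a fluctuating section count can be illustrated with a short Python sketch. It is an illustration only: the function name and the daily section counts are hypothetical. It assumes, as in the analogy, that every section holds a copy of every book received so far, so on day $d$ the library stores $c \cdot d$ distinct books replicated across that day's $n_d$ sections.

```python
def library_storage(c: int, sections_per_day: list[int]) -> list[int]:
    """Total stored book copies at the end of each day.

    On day d the library has received c * d books, and every one of the
    n_d sections that exist that day keeps a copy of each of them.
    """
    return [c * day * n for day, n in enumerate(sections_per_day, start=1)]


# Hypothetical week: ten books per day, section count outside the librarian's control.
print(library_storage(10, [5, 7, 7, 6, 9, 9, 10]))
# [50, 140, 210, 240, 450, 540, 700]
```

Storage jumps whenever sections are added, even though the rate of incoming books never changes.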
A real-world example of this challenge is Bitcoin, where the total storage of the blockchain across all nodes has reached 3.28 petabytes [5]. This figure results from the size of the blockchain, currently approximately 488 GB per node, and from the number of nodes redundantly storing transactions, presently around 7065 [5]. Ethereum [6] is a notable case in which storage growth may follow an exponential trend, as depicted in Figure 1.
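As a worked check of the Bitcoin figures, multiplying the per-node size $c$ by the number of nodes $n$ reproduces the reported total if the values are read in binary units (GiB and PiB); this unit interpretation is our assumption. In decimal units, the same product is about 3.45 PB.

```latex
s = c \cdot n \approx 488\ \mathrm{GiB} \times 7065
  = 3{,}447{,}720\ \mathrm{GiB}
  = \frac{3{,}447{,}720}{1024^{2}}\ \mathrm{PiB}
  \approx 3.288\ \mathrm{PiB}
```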
The previously mentioned issues arise from the inherent redundancy built into the design of permissionless blockchains. This redundancy creates a delicate balance: improvements in transactional throughput (measured in transactions per second) inevitably lead to increased storage requirements, while attempts to reduce storage potentially compromise throughput due to decreased availability and increased latency.
There are three primary approaches to increasing transactional throughput: block size management, off-chain mechanisms, and sharding. Block size management increases the block size to allow more transactions per block, temporarily relieving transaction congestion [7,8]. Off-chain mechanisms process transactions outside the main blockchain through payment channels or sidechains, reducing the load on the main blockchain [9,10,11,12]. Sharding increases throughput by splitting the blockchain into smaller parts, called shards, that process transactions in parallel [13,14,15]. However, the impact of these approaches on storage growth needs careful consideration.
Approaches for enhancing storage efficiency are divided into centralized and decentralized approaches. Centralized approaches store data in a single location or through a central entity [16,17,18], while decentralized strategies distribute data across multiple nodes in the blockchain network, enhancing robustness and immutability [19,20]. Both share the goal of increasing storage efficiency, but they affect transactional throughput.
In summary, advances in blockchain technology aim to enhance transactional throughput and reduce node storage requirements. However, these goals are not independent, as improvements in one often impact the other. We identified a noticeable gap in analyses that relate storage growth to transactional throughput and vice versa. In this article, we unlock the transactional patterns of the UTXO model to reveal the relation between storage and transactional throughput, providing the first analysis of the relation between these parameters. To achieve this, we apply the following methodology:
1. Analysis and abstraction of transactional models.
2. Formal comparison of the models to highlight their storage cost.
3. Experiments with data from the Bitcoin and Ethereum blockchains.
The analysis resulting from this methodology shows that the UTXO model is more storage-intensive but offers flexibility in transactional throughput, suggesting a trade-off between the two parameters. The transactional behavior observed in the abstraction step led us to introduce a novel DAG-based abstraction of the Bitcoin transactional model: the spent-by relation. This relation unlocks the transactional patterns that represent any transaction on the blockchain and shows the relationship between throughput and storage. Finally, experiments on more than 800 million transactions identify the most storage-intensive transactional patterns.
The remainder of the paper is structured as follows: Section 2 presents an overview of the fundamental concepts of transactional models. Section 3 presents an overview of related work, with particular emphasis on strategies that impact storage and throughput in blockchain systems. Section 4 analyzes the execution of transactional models and their impact on blockchain storage. In Section 5, the spent-by relation is introduced as a novel abstraction of the UTXO model. In Section 5.3, we unlock the transactional patterns within the UTXO model. Finally, in Section 6, we present an experimental comparison of the storage costs of UTXO transactional patterns.
2. Fundamental Background of Transactional Models
Blockchain technology, at its most basic, provides a mechanism for the secure and verifiable storage of records through a redundancy system. This redundancy results from the verification and distributed storage of transactions in a network of nodes operating in a peer-to-peer (P2P) system. Transaction records on the blockchain network are grouped into blocks, forming a chain of blocks, hence the term “blockchain”. Each block contains a series of transactions, all of which are validated and confirmed by the network. Each block is linked to the previous one through a unique identifier called a hash. The hash is produced by a cryptographic function that takes the data of the current block together with the hash of the previous block and outputs a unique fixed-length string. Consequently, modifying any block changes its hash and breaks the link with the blocks that follow, which reveals the manipulation.
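The hash linking described above can be illustrated with a minimal Python sketch. It is a simplification under our own assumptions: real Bitcoin blocks apply double SHA-256 to an 80-byte block header that includes the previous block's hash and a Merkle root of the transactions, whereas here each block's SHA-256 hash is computed over a JSON encoding of its transactions and its predecessor's hash. The function names are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, transactions: list[str]) -> str:
    """SHA-256 over the block's transactions and its predecessor's hash."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(blocks: list[list[str]]) -> list[dict]:
    """Link blocks by storing each predecessor's hash inside the next block."""
    chain, prev = [], GENESIS_PREV
    for txs in blocks:
        current = block_hash(prev, txs)
        chain.append({"prev": prev, "txs": txs, "hash": current})
        prev = current
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Recompute every hash and check that each block points to its predecessor."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["txs"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([["coinbase->alice"], ["alice->bob"], ["bob->carol"]])
print(is_valid(chain))                   # True
chain[1]["txs"][0] = "alice->mallory"    # tamper with a recorded transaction
print(is_valid(chain))                   # False
```

Recomputing the tampered block's hash to hide the change would, in turn, break the `prev` link stored in the following block.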
This fundamental understanding of blockchain technology sets the stage for a deeper exploration of its complexity and functionality, especially in the context of the transactional models of Bitcoin and Ethereum. In this section, we discuss the main transactional models, specifically the unspent transaction output (UTXO) model [21] and the account model [22].
2.1. UTXO Model
In the UTXO model, the state of transactions is represented as a collection of unspent transaction outputs. This is illustrated in the directed acyclic graph (DAG) shown in Figure 2, where vertices symbolize transactions and edges represent pointers through which a new transaction consumes the outputs of a previous one.
The UTXO model has several definitions, for example, as a DAG. In this article, we highlight the definition provided by Jeyakumar et al. [23] for its ability to encompass the transactional models of both Bitcoin and Ethereum.
Definition 1. A directed graph $G = (V, E)$, where $V$ represents the set of nodes and $E \subseteq V \times V$ represents the set of edges. For each vertex $v \in V$, an edge is of the form $(u, v) \in E$, with $u \in V$.
Bitcoin transactions use one or more unspent outputs from previous transactions to create new outputs. These new outputs become unspent outputs that are available for future transactions.
According to Narula and Dryja [24], a digital signature, a public key, and a timestamp must be provided to consume an unspent output. In addition, the following properties must be met:
1. All outputs are distinct.
2. When spending, an input refers to a specific unspent output.
3. Unspent outputs are consumed, creating new outputs.
4. An output can only be spent once.
These properties are based on the Bitcoin protocol and replicated in other applications.
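These four properties can be expressed as a minimal in-memory UTXO set. The sketch is illustrative only: the `OutPoint` and `UTXOSet` names are hypothetical, values are assumed to be integers in satoshis, transaction identifiers are assumed to be unique, and the signature, public key, and timestamp checks mentioned above are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Identifies one output by (transaction id, output index), per property 1."""
    txid: str
    index: int


class UTXOSet:
    """Tracks unspent outputs; hypothetical helper for illustration only."""

    def __init__(self) -> None:
        self.unspent: dict[OutPoint, int] = {}  # outpoint -> value in satoshis

    def apply(self, txid: str, inputs: list[OutPoint], output_values: list[int]) -> None:
        # Properties 2 and 4: each input must name a specific output that is
        # still unspent, and the same output cannot appear twice.
        if len(set(inputs)) != len(inputs):
            raise ValueError("transaction spends the same output twice")
        for op in inputs:
            if op not in self.unspent:
                raise ValueError(f"{op} is unknown or already spent")
        # Property 3: consume the referenced outputs ...
        for op in inputs:
            del self.unspent[op]
        # ... and create new, uniquely identified outputs.
        for index, value in enumerate(output_values):
            self.unspent[OutPoint(txid, index)] = value


utxos = UTXOSet()
utxos.apply("tx0", [], [20_000_000])                      # funding tx without inputs
utxos.apply("tx1", [OutPoint("tx0", 0)], [10_000_000, 10_000_000])
try:
    utxos.apply("tx2", [OutPoint("tx0", 0)], [20_000_000])  # attempted double spend
except ValueError as err:
    print("rejected:", err)
```

All checks run before any output is removed, so a rejected transaction leaves the set unchanged.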
2.2. Account Model
In contrast to the UTXO model, the account model represents the blockchain state as a set of accounts or addresses, which are managed by entities or smart contracts [25]. These entities can be individuals, organizations, or automated systems. Smart contracts are an example of automated systems: programs that reside in Ethereum’s virtual machine (EVM) and execute complex operations and agreements autonomously while providing high reliability.
In Ethereum’s implementation of the account model, transactions are abstractly represented as state transitions. Figure 3 illustrates the flow of transactions that update account states as they are executed.
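The state-transition view of the account model can be sketched as a function that applies transfers to a balance map one at a time. This is a simplified illustration: the function name is hypothetical, balances are abstract integer units, and gas, nonces, signatures, and contract execution are ignored.

```python
def apply_transfers(balances: dict[str, int],
                    transfers: list[tuple[str, str, int]]) -> dict[str, int]:
    """Apply value transfers serially and return the resulting account state."""
    state = dict(balances)  # work on a copy so a failed batch leaves the input untouched
    for sender, recipient, amount in transfers:
        if amount <= 0 or state.get(sender, 0) < amount:
            raise ValueError(f"invalid transfer {sender} -> {recipient}: {amount}")
        state[sender] -= amount                               # debit the sender
        state[recipient] = state.get(recipient, 0) + amount   # credit the recipient
    return state


print(apply_transfers({"alice": 20, "bob": 0},
                      [("alice", "bob", 10), ("bob", "carol", 4)]))
# {'alice': 10, 'bob': 6, 'carol': 4}
```

Reversing the two transfers raises `ValueError`, because Bob's balance is still zero when his transfer is applied. The result depends on the order of execution, which is why the state is updated serially.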
In addition to traditional value transfers, Ethereum has transaction types specifically related to smart contracts, which are typically classified in the literature as contract deployment and contract invocation:
The process of contract deployment essentially involves the creation of a smart contract, which can be equated to an executable program that is assigned a unique address within the blockchain. The smart contract contains a set of predefined functions or instructions written in a programming language compatible with the Ethereum blockchain, such as Solidity [26].
Contract invocation, on the other hand, refers to executing or “calling” the functions embedded within the smart contract. These functions can be invoked by other addresses within the blockchain network, allowing them to interact with the smart contract and initiate specific operations, which range from simple value transfers to complex interactions involving multiple smart contracts [27].
Finally, there are other transactional models, such as the EUTXO model [28] and the account abstraction ledger [29], but these build on the models discussed above.
4. Understanding the Execution of Transaction Models and Their Relation to Blockchain Storage
This section analyzes the most relevant transactional models in the literature, namely the UTXO and account models. The goal is to understand their transactional behavior and its relationship with storage. To this end, we abstracted the transactional models of Bitcoin and Ethereum into transactional cases: three cases for the UTXO model and one for the account model. Using these abstractions, we performed a formal and experimental comparison and identified which of the two models incurs higher storage costs.
4.1. UTXO Model Storage Growth Analysis
In the UTXO transactional model, each transaction consumes one or more unspent outputs and generates one or more new outputs. When a new transaction is created, it is possible to choose which unspent outputs it consumes. This selection is arbitrary as long as the sum of the inputs is greater than or equal to the total value of the outputs. This arbitrariness allows for simultaneous operations while ensuring that the new transaction is directly linked to previous transactions on the blockchain. To better understand transaction execution, consider the following example.
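The selection rule can be illustrated with a largest-first coin selection sketch. Largest-first is only one of many possible strategies, and we do not claim it is what any particular wallet uses. Values are assumed to be integers in satoshis, the function name is hypothetical, and fees are ignored, so any excess of the selected inputs over the payment is returned as change.

```python
def select_largest_first(utxo_values: list[int], target: int) -> tuple[list[int], int]:
    """Choose unspent outputs, largest first, until they cover target.

    Returns the selected values and the change (selected total - target),
    so the result always satisfies sum(inputs) >= sum(outputs).
    """
    selected: list[int] = []
    total = 0
    for value in sorted(utxo_values, reverse=True):
        if total >= target:
            break
        selected.append(value)
        total += value
    if total < target:
        raise ValueError("insufficient funds")
    return selected, total - target


# Hypothetical wallet (values in satoshis) paying 0.1 BTC = 10,000,000 sat.
print(select_largest_first([5_000_000, 3_000_000, 12_000_000], 10_000_000))
# ([12000000], 2000000)
```

Any other subset whose sum covers the payment would be equally valid, which is the flexibility the text describes.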
Example: Suppose that Alice purchases a coffee from Bob using Bitcoin. Alice holds BTC 0.2 in unspent outputs in her wallet, and the coffee costs BTC 0.1. Depending on how the unspent outputs are selected, three cases can arise from the purchase: (a) a single output, (b) multiple outputs, each with a value less than the payment amount, or (c) multiple outputs, one of which has the same value as the payment amount.
(a) In the first case, as shown in Figure 5a, Alice has a single output in her wallet with a value of BTC 0.2. To pay for the coffee, she creates a transaction that splits this unspent output into two new outputs: one of BTC 0.1 that she sends to Bob and another of BTC 0.1 that she sends back to herself as change.
(b) In the second case, as shown in Figure 5b, Alice has multiple outputs in her wallet, each with a value less than the payment amount. To pay for the coffee, Alice merges several of these smaller unspent outputs, whose values add up to BTC 0.1, into a transaction that pays Bob.
(c) In the third case, as shown in Figure 5c, Alice has multiple outputs in her wallet, one of which has a value equal to the payment amount. To pay for the coffee, Alice spends the unspent output whose value matches the price of the coffee in a new transaction that transfers it to Bob.
The example above shows that the execution of the UTXO model has two features: the arbitrary selection of unspent outputs and the concurrent execution of transactions. The arbitrary selection of unspent outputs allows granular control over the inputs consumed by each transaction, providing flexibility, since a transaction can consume many different combinations of unspent outputs. This flexibility allows unspent outputs to be spent simultaneously. It also makes it possible to perform multiple operations from a single unspent output within a single transaction, increasing transactional throughput. However, we have observed that this simultaneous execution in the UTXO model incurs a high storage cost, which escalates with the number of new unspent outputs. This additional storage demand reduces the storage efficiency of the nodes. A detailed analysis of the storage costs associated with the UTXO model is provided in Section 5.3.
4.2. Account Model Storage Growth Analysis
In the account model, each user has a unique address that serves as an identifier and is associated with a balance resulting from the transaction history. Figure 6 shows how an address’s balance, as a state, is updated by transactions, which subtract the transferred value from the sender’s account and add it to the recipient’s account. An example of the account model is a traditional banking system, where a user has a unique account number associated with their balance. When a user initiates a transaction, the funds are debited from their account and credited to the recipient’s account. The account balance represents the current state of the user’s funds, and all transactions are recorded in a ledger.
Ethereum’s programmability allows for two additional types of transactions within its account model: those that deploy contracts on the Ethereum virtual machine (EVM) and those that invoke functions of these smart contracts. Each contract within the EVM operates under its own set of rules and is executed when invoked by external transactions. However, maintaining account states in Ethereum, as illustrated in Figure 6, requires transaction serialization. This requirement limits transactional throughput, but it is offset by transactions that require less storage capacity.
The transactional models described above differ significantly in how transactions are executed, which directly impacts storage requirements. For instance, the Bitcoin model can split one output to create new ones and merge multiple outputs into a smaller set, as shown in the Alice and Bob example. This flexibility means that the storage size of each transaction can vary depending on the number of outputs it manipulates. The account model, on the other hand, manages the state in a serialized manner that is less storage-intensive but offers lower throughput. Each transaction updates the state of accounts directly, leading to a more predictable and often smaller storage footprint per transaction than in the Bitcoin model.
To validate this analysis, we conduct an analytical study comparing the two models by representing them as graphs (the details of the formal comparison of the two transactional models are available in Appendix A), and we evaluate a particular case in the following subsection.
4.3. Transaction Sizes in Bitcoin and Ethereum
In this section, we analyze the Bitcoin and Ethereum blockchains. Based on the previous section, our hypothesis is that the UTXO model requires more storage than the account model. For our comparison, we used a random sample of 10% of the transactions processed on each blockchain up to 4 July 2023. This resulted in the analysis of 84,474,947 transactions in Bitcoin and 348,506,740 in Ethereum.
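The text does not specify how the 10% sample was drawn. One plausible approach, shown here as an assumption rather than the study's actual procedure, is a simple random sample without replacement using a fixed seed so that the sample can be reproduced; the function name and seed value are hypothetical.

```python
import random


def sample_fraction(tx_ids: list[str], fraction: float = 0.10, seed: int = 42) -> list[str]:
    """Simple random sample, without replacement, of about `fraction` of tx_ids."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(tx_ids) * fraction)
    rng = random.Random(seed)  # seeded generator makes the sample reproducible
    return rng.sample(tx_ids, k)


ids = [f"tx{i}" for i in range(1_000)]
subset = sample_fraction(ids)
print(len(subset))  # 100
```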
For data extraction, we used a set of specific tools and libraries: BlockSci version 0.7 [34], Geth version 1.12.0 [35], and Python 3 with the Pandas, NumPy, Multiprocessing, and Matplotlib libraries. The repository for reproducing the experiments can be found at https://github.com/jdom1824/Unlocking-UTXO-transactional-patterns (accessed on 3 June 2024). The results are visualized as histograms in Figure 7 and Figure 8 to facilitate comparison. The X-axis represents the size of the transactions, while the Y-axis represents the number of transactions.
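The histograms described above reduce to counting transactions per size bin. The following sketch uses only the Python standard library; the 0.1 MB bin width (100,000 bytes) and the decimal definition 1 MB = 10^6 bytes are our assumptions, and a plotting library such as Matplotlib would then draw the resulting counts.

```python
from collections import Counter


def size_histogram(sizes_bytes: list[int], bin_bytes: int = 100_000) -> dict[float, int]:
    """Count transactions per size bin.

    Keys are the lower edge of each bin in MB (assuming 1 MB = 10**6 bytes);
    integer division keeps the bin assignment exact.
    """
    counts = Counter(size // bin_bytes for size in sizes_bytes)
    return {index * bin_bytes / 1_000_000: n for index, n in sorted(counts.items())}


print(size_histogram([250, 99_999, 100_000, 250_000, 1_000_000]))
# {0.0: 2, 0.1: 1, 0.2: 1, 1.0: 1}
```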
Comparing the histograms shows that the distribution of transaction sizes in Bitcoin extends up to 1 MB. This is a significant size that reflects the robustness of the UTXO model, as it can handle large transactions while resisting failures. Ethereum, in contrast, behaves differently: only a small number of its transactions reach a size of up to 0.3 MB, less than a third of the maximum observed in Bitcoin, which indicates more compact transactions in Ethereum’s model.
A closer look at the data reveals that most Ethereum transactions are concentrated at sizes of up to about 0.13 MB. This range is narrower than in Bitcoin, where the bulk of the distribution extends up to 0.2 MB. This difference in distribution patterns between the two cryptocurrencies reflects the differences between their transactional models.
Based on these observations, the histograms suggest that the transactional model of Bitcoin implies a higher storage cost. This cost is not static; it is expected to escalate with the fragmentation of unspent outputs, as depicted in example (a) of Figure 5. This trend suggests that storage requirements will grow as Bitcoin usage increases.
Ethereum presents a different scenario: its storage cost is lower and is expected to remain within the same range. This stability is related to transaction serialization, indicating a more stable model for Ethereum in terms of storage, which has significant implications for the development of DApps on Ethereum.
4.4. Summary
This section analyzes the execution of the transactional models for both Bitcoin and Ethereum. We identified that the transactional model of Bitcoin is more flexible when selecting the available outputs to consume, while the Ethereum model presents simpler transactions that are easily programmable. The flexibility of the UTXO model makes it efficient when transferring value to users, while the account model is limited by serialization to update the state of the EVM.
We hypothesized that the UTXO transactional model incurs higher storage costs due to the splitting and consolidation of unspent outputs. We confirmed this hypothesis with the analytical study in Appendix A and with the histograms shown in Figure 7 and Figure 8. Although the UTXO model is storage-intensive, it also allows for significant transaction throughput. This is achieved by allowing multiple operations within a single transaction, which provides flexibility between transaction throughput and storage and points to a trade-off between these parameters.
In the following section, we focus on the UTXO transactional model, specifically on its flexibility to perform multiple operations. We examine its transactional patterns in depth to define the trade-off between storage and transactional throughput.
5. Unlocking Transactional Patterns Based on Spent-By Relation
This section unlocks the transactional patterns of the UTXO model to reveal the trade-off between transactional throughput and storage. To do this, we used abstractions from the previous analyses and defined the spent-by relation. We then compared the cardinalities of the sets related by the spent-by relation (less than, greater than, and equal to) to observe three transactional patterns within the UTXO model: splitting, merging, and transferring. For clarity in this analysis, we assume that the number of nodes ($\eta$) in a permissionless blockchain system grows linearly.
5.1. Defining the UTXO Model as a DAG
The UTXO model is defined as a DAG. Formally, it is represented as a tuple $G = (V, R)$, where $V$ is a finite set of vertices and $R$ is a set of edges, such that we have the following:
The set of vertices represents the outputs of the UTXO model and is divided into two disjoint subsets, $V = V_s \cup V_u$ with $V_s \cap V_u = \emptyset$. Here, $V_s$ is the set of spent outputs, and $V_u$ is the set of unspent outputs.
The set of edges $R$ is determined by the spent-by relation, which specifies how $V_s$ and $V_u$ are related.
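The partition of $V$ into spent and unspent outputs can be computed directly from a list of transactions. In this hypothetical encoding (our assumption, not the paper's data format), each output is identified by a (transaction id, output index) pair, and each transaction maps to the outputs it spends and the number of outputs it creates.

```python
Outpoint = tuple[str, int]  # (transaction id, output index)


def partition_outputs(
    transactions: dict[str, tuple[list[Outpoint], int]],
) -> tuple[set[Outpoint], set[Outpoint]]:
    """Return (V_s, V_u): spent and unspent outputs across all transactions.

    transactions maps txid -> (outputs it spends, number of outputs it creates).
    """
    vertices = {(txid, i) for txid, (_, n_out) in transactions.items() for i in range(n_out)}
    spent = {ref for inputs, _ in transactions.values() for ref in inputs}
    if not spent <= vertices:
        raise ValueError("an input references an output that does not exist")
    return spent, vertices - spent  # the two subsets are disjoint by construction


# Alice's case (a): tx0 funds her with one output; tx1 splits it into two.
spent, unspent = partition_outputs({"tx0": ([], 1), "tx1": ([("tx0", 0)], 2)})
print(sorted(spent), sorted(unspent))
# [('tx0', 0)] [('tx1', 0), ('tx1', 1)]
```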
5.2. Spent-By Relation “←”
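To define the spent-by relation, we begin by extracting from the graph $G$ a subgraph $G' = (V', R')$, as illustrated in Figure 9, where $V' \subseteq V$ and $R' \subseteq R$, such that $V' = V'_s \cup V'_u$, where $V'_s$ contains the spent outputs and $V'_u$ the unspent outputs of the subgraph.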
Let us define the set of edges $R'$, which satisfies the following properties:
$R' \subseteq V'_s \times V'_u$. This means that every edge is a pair $(x, y)$, where $x$ and $y$ are elements of $V'_s$ and $V'_u$, respectively.
$|R'| = |V'| - 1$. This means that the number of edges in $R'$ is one less than the number of vertices in $V'$.
The spent-by relation defines the relation between subsets of spent outputs and unspent outputs. Formally, we define the spent-by relation as a subset $\leftarrow$ of the Cartesian product $V'_s \times V'_u$:
$$\leftarrow \; = \{(x, y) \in V'_s \times V'_u \mid (x, y) \in R'\}$$
Based on the cardinality relation between $V'_s$ and $V'_u$, different transactional behaviors are observed: splitting, merging, and transferring. The splitting pattern occurs when a set of spent outputs is divided into a larger set of unspent outputs (i.e., $|V'_s| < |V'_u|$). The merging pattern manifests when multiple spent outputs are combined into a smaller number of unspent outputs (i.e., $|V'_s| > |V'_u|$). Lastly, the transferring pattern arises when each element in $V'_s$ is linked to precisely one element in $V'_u$ ($|V'_s| = |V'_u|$), representing a one-to-one relation between spent and unspent outputs.
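The cardinality comparison above amounts to a simple classifier. This sketch takes $|V'_s|$ and $|V'_u|$ for one transaction and returns the pattern name. The function name is hypothetical, and transactions without spent inputs (such as coinbase transactions) are rejected, because the three patterns require at least one spent output.

```python
def classify_pattern(n_spent: int, n_unspent: int) -> str:
    """Classify a transaction by comparing |V'_s| (spent) with |V'_u| (created)."""
    if n_spent < 1 or n_unspent < 1:
        raise ValueError("need at least one spent and one unspent output")
    if n_spent < n_unspent:
        return "splitting"
    if n_spent > n_unspent:
        return "merging"
    return "transferring"


# Alice and Bob, with hypothetical counts: (a) 1 -> 2, (b) 3 -> 1, (c) 1 -> 1
for case in [(1, 2), (3, 1), (1, 1)]:
    print(case, classify_pattern(*case))
```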
5.3. Unlocking Transactional Patterns
This section focuses on transactional patterns and introduces the relationship between throughput and storage parameters.
5.3.1. Splitting Pattern
To illustrate the splitting pattern, let us revisit the example of Alice and Bob, specifically the scenario presented in Figure 5a. This pattern involves dividing one or several unspent outputs into smaller parts, as illustrated in Figure 10. Note that we have generalized the splitting pattern to all scenarios in which the number of unspent outputs created is greater than the number of spent outputs consumed.
In the splitting pattern, the number of operations depends on a factor defined within the application. For instance, a single bitcoin in an unspent output can be divided into up to $10^8$ new outputs, one per satoshi [36]. Therefore, to calculate the number of outputs per splitting pattern and its associated storage, we present the following definitions:
Definition 2. (Outputs per splitting pattern) The number of outputs produced by a splitting pattern within a given time interval is quantified using two parameters: the splitting factor ($\alpha$) and the time interval ($t$), where $\alpha \in \mathbb{N}$, $\alpha > 1$, and $t > 0$. Consequently, the output rate per time interval can be expressed as follows:
$$\sigma_{\mathrm{split}} = \frac{\alpha}{t}$$
Definition 3. (Storage per output splitting pattern) The storage generated by the splitting pattern is related to the average output size ($\tau$), the number of outputs generated per time interval ($\sigma_{\mathrm{split}}$), and the number of nodes in the system ($\eta$). This is represented as follows:
$$S_{\mathrm{split}} = \tau \cdot \sigma_{\mathrm{split}} \cdot \eta$$
Note that the value of $\alpha$ in Definition 2 is determined by each application, which constrains the number of new outputs. We assume that $\alpha$ is a very large number and, therefore, that $\sigma_{\mathrm{split}}$ reflects a high degree of transactional throughput. However, as indicated in Definition 3, there is a strong relation between transactional throughput and storage. This relation is only observable at the level of transactional models. Our observations reveal that as the number of outputs processed in a transaction increases, so does the storage cost on the nodes. Consequently, storage grows in proportion to transactional throughput.
To evaluate the maximum growth of storage, we employ Big O notation. Treating $\tau$ as a constant, the increase in storage following the splitting pattern is $O(\sigma_{\mathrm{split}} \cdot \eta)$.
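The splitting definitions can be evaluated numerically. This sketch assumes the forms $\sigma_{\mathrm{split}} = \alpha / t$ and $S_{\mathrm{split}} = \tau \cdot \sigma_{\mathrm{split}} \cdot \eta$ (our reading of Definitions 2 and 3), with $\tau$ in bytes; the function names and parameter values are hypothetical. Doubling either the node count or the splitting factor doubles the storage, which is the linear behavior captured by the Big O bound.

```python
def splitting_output_rate(alpha: int, t: float) -> float:
    """Outputs produced per unit time by the splitting pattern (alpha / t)."""
    if alpha <= 1 or t <= 0:
        raise ValueError("alpha must be greater than 1 and t must be positive")
    return alpha / t


def splitting_storage(tau: float, alpha: int, t: float, eta: int) -> float:
    """Storage per unit time: average output size * output rate * node count."""
    return tau * splitting_output_rate(alpha, t) * eta


# Hypothetical parameters: 40-byte outputs, alpha = 10 outputs per 2 time units.
base = splitting_storage(tau=40, alpha=10, t=2, eta=1_000)
print(base)                                                        # 200000.0
print(splitting_storage(tau=40, alpha=10, t=2, eta=2_000) / base)  # 2.0
print(splitting_storage(tau=40, alpha=20, t=2, eta=1_000) / base)  # 2.0
```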
5.3.2. Merging Pattern
The merging pattern emerges from the consolidation of multiple outputs into a reduced set of unspent outputs, as illustrated in the example of Alice and Bob presented in the previous section, specifically in Figure 5b. The primary characteristic of the merging pattern is that it reduces the number of new outputs to a smaller set than the inputs, which balances the splitting pattern. The abstraction of this pattern is illustrated in Figure 11. To calculate the number of outputs per merging pattern and the amount of storage used per output, we present the following definitions.
Definition 4. (Outputs per merging pattern) We use $\sigma_{\mathrm{merge}}$ to represent the number of outputs generated by the merging pattern per time interval:
$$\sigma_{\mathrm{merge}} = \frac{|V'_u|}{t}$$
where $|V'_u| < |V'_s|$ and $t > 0$. Therefore, the number of outputs generated by the merging pattern equals the number of unspent outputs, which by definition is smaller than the number of spent outputs.
Definition 5. (Storage per output merging pattern) The average output size ($\tau$), the number of outputs generated per time interval ($\sigma_{\mathrm{merge}}$), and the number of nodes in the system ($\eta$) determine the storage generated by the merging pattern per time interval as follows:
$$S_{\mathrm{merge}} = \tau \cdot \sigma_{\mathrm{merge}} \cdot \eta$$
Definitions 4 and 5 illustrate how the merging pattern improves the efficiency of future transactions. This improvement results from consolidating multiple outputs into a reduced set of unspent outputs, which reduces the processing constraint for subsequent transactions. As a result, less time and fewer computational resources are required to process and validate transactions, boosting overall system efficiency. However, the storage per output of the merging pattern has the same form as that of the splitting pattern. We recognize that the average output size, $\tau$, can vary significantly depending on the pattern or transaction type; we explore this variation further in Section 6. Storage growth following the merging pattern occurs at a rate of $O(\sigma_{\mathrm{merge}} \cdot \eta)$.
5.3.3. Transferring Pattern
The transferring pattern represents the exchange of ownership between parties without the need to engage in computational processing to split or merge unspent outputs. This pattern can be visualized in a scenario where an unspent output changes ownership through its inclusion as an input in a new transaction, generating a new output, as illustrated in
Figure 5c.
The transferring pattern is a fundamental component of both the Bitcoin UTXO model and the Ethereum account model. In the UTXO model, it is characterized by the serialized tracing of unspent outputs, while in the account model, it updates the state of individual accounts or Ethereum addresses. Both models share the transferring pattern for managing transactions, as depicted in the abstraction shown in Figure 12.
An interesting feature of the transferring pattern is that only a one-to-one operation is carried out in each time interval. This structure has notable implications for both storage requirements and transactional throughput.
Definition 6. (Storage per output transferring pattern) The storage generated by the transferring pattern is related to the average output size ($\tau$), the number of outputs generated per time interval ($\sigma_{\mathrm{transfer}}$), and the number of nodes in the system ($\eta$). This is represented as follows:
$$S_{\mathrm{transfer}} = \tau \cdot \sigma_{\mathrm{transfer}} \cdot \eta$$
Since a transaction in the transferring pattern is constrained by the non-concurrency of its operations, storage grows at a constant rate. In terms of computational complexity, this means that the storage requirements for this pattern increase linearly with the number of nodes in the network, that is, $O(\eta)$. This conclusion follows from recognizing that the transferring pattern is sufficient to represent the serialization process within both the account model and the UTXO model.
5.4. Relationship between Throughput and Storage
Transactional throughput refers to the system’s capacity to process transactions over a time interval, and each transaction in environments such as Bitcoin can generate multiple outputs.
We consider the following parameters before defining the transactional throughput and its relationship with storage:
Outputs across transactional patterns: This parameter, denoted as $N_{\mathrm{out}}$, represents the total number of outputs generated by all transactional patterns (splitting, merging, and transferring). It is the sum of the outputs from each pattern, expressed as follows:
$$N_{\mathrm{out}} = N_{\mathrm{split}} + N_{\mathrm{merge}} + N_{\mathrm{transfer}}$$
Number of outputs of all transactional patterns in a time interval: This parameter, denoted as $\sigma$, represents the total number of outputs generated by all transactional patterns per time interval. It is calculated by dividing the total number of outputs $N_{\mathrm{out}}$ by the time interval $t$, expressed as follows:
$$\sigma = \frac{N_{\mathrm{out}}}{t}$$
Average number of outputs per transaction: This parameter, denoted as $\lambda$, represents the average number of outputs generated per transaction. It is calculated by dividing the total number of outputs $N_{\mathrm{out}}$ by the total number of transactions $N_{\mathrm{tx}}$, expressed as follows:
$$\lambda = \frac{N_{\mathrm{out}}}{N_{\mathrm{tx}}}$$
Definition 7. (Transactional Throughput) We define transactional throughput (tps) as the number of transactions processed per second. If $\sigma$ is the total number of outputs generated in a time interval $t$, and $\lambda$ is the average number of outputs per transaction, then transactional throughput is calculated as follows:
$$\mathrm{tps} = \frac{\sigma}{\lambda}$$
Definition 8. (Throughput-Storage Relationship) The storage generated by each transactional pattern is related to the average output size ($\tau$), the number of outputs generated per time interval ($\sigma$), and the number of nodes in the system ($\eta$). Therefore, the relation between transactional throughput and storage is given by the following:
$$S = \tau \cdot \sigma \cdot \eta = \tau \cdot \lambda \cdot \mathrm{tps} \cdot \eta$$
By increasing the transactional throughput (tps), we also increase the number of outputs per time interval ($\sigma$) and, therefore, the required storage.
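Definitions 7 and 8 can be checked numerically. The sketch below derives the output rate, the average outputs per transaction, tps, and storage from raw counts: outputs per pattern, the number of transactions, the interval length in seconds, the average output size in bytes, and the node count. The function name and all input values are hypothetical.

```python
def throughput_and_storage(n_split: int, n_merge: int, n_transfer: int,
                           n_tx: int, t: float, tau: float, eta: int) -> tuple[float, float]:
    """Return (tps, storage in bytes per second across all nodes).

    n_*: outputs produced by each pattern; n_tx: transactions; t: seconds;
    tau: average output size in bytes; eta: number of nodes.
    """
    n_out = n_split + n_merge + n_transfer   # outputs across all patterns
    sigma = n_out / t                        # outputs per second
    lam = n_out / n_tx                       # average outputs per transaction
    tps = sigma / lam                        # transactions per second (= n_tx / t)
    storage = tau * lam * tps * eta          # equals tau * sigma * eta
    return tps, storage


# Hypothetical interval: 500 outputs from 200 transactions in 100 s,
# 40-byte outputs, 10 nodes.
print(throughput_and_storage(300, 50, 150, n_tx=200, t=100, tau=40, eta=10))
# (2.0, 2000.0)
```

Doubling the number of transactions (and their outputs) in the same interval doubles both tps and storage, which is the proportionality stated in Definition 8.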
5.5. Summary
In this section, we unlock the transactional patterns inherent in the UTXO model. We formalize the UTXO model by representing it as a DAG and define the spent-by relation. We reveal the trade-off between transactional throughput and storage based on the definitions of each pattern, highlighting that storage growth is related to the number of new outputs generated. We analyze each pattern's contribution to storage size using Big O notation, under the premise that the number of nodes in the permissionless blockchain network increases at a linear rate. Although the analysis indicates that the splitting and merging patterns consume more storage, these results are not directly comparable, because we assumed a constant output size ($\tau$). In the following section, we examine this variable in more depth and determine which pattern is the most costly in storage and which provides more flexibility in throughput.
6. Experimental Comparison of Storage Costs in UTXO Transactional Patterns
This section analyzes the storage cost of each pattern to identify which pattern is the most storage-intensive and which provides greater flexibility in the storage/throughput trade-off.
In our previous theoretical analysis, we used a constant $\tau$ for the transferring, merging, and splitting transactional patterns. For this experimental study, we used the entire Bitcoin blockchain as our dataset, examining a total of 791,800 blocks to determine the storage of each pattern.
Figure 13 shows the experimental framework for our analysis using Bitcoin Core version 0.22 [37]. We synchronized a complete Bitcoin node up to 4 July 2023 and extracted data for further analysis using BlockSci version 0.7.0. After extracting the data, we filtered the dataset based on transaction patterns and converted it into graphical representations to enhance the clarity and interpretability of the results discussed in this section. From this work, we created a database containing 800 million transactions [38], which can be used to replicate the experiments.
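The filtering step can be sketched as a rule over each transaction's counts of spent outputs (inputs) and newly created outputs. The rule is an assumption:

- **Splitting:** more outputs than inputs.
- **Merging:** more inputs than outputs.
- **Transferring:** equal counts, the one-to-one case of the spent-by relation.

The function and enum names are hypothetical. The sketch does not use the BlockSci API, and it does not handle edge cases such as coinbase transactions the way the actual study may have.

```python
from collections import Counter
from enum import Enum


class Pattern(Enum):
    SPLITTING = "splitting"
    TRANSFERRING = "transferring"
    MERGING = "merging"


def classify(n_inputs: int, n_outputs: int) -> Pattern:
    """Assign a UTXO transaction to a pattern from its input/output counts (assumed rule)."""
    if n_inputs < 1 or n_outputs < 1:
        raise ValueError("a transaction needs at least one input and one output")
    if n_outputs > n_inputs:
        return Pattern.SPLITTING     # spent outputs fan out into more new outputs
    if n_outputs < n_inputs:
        return Pattern.MERGING       # several spent outputs are consolidated into fewer
    return Pattern.TRANSFERRING      # one-to-one relation between spent and new outputs


if __name__ == "__main__":
    # Hypothetical (inputs, outputs) pairs standing in for extracted transactions.
    sample = [(1, 2), (1, 1), (5, 1), (1, 3), (2, 2)]
    print(Counter(classify(i, o).value for i, o in sample))
```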
As mentioned before, the initial step taken with the dataset involved filtering and classifying Bitcoin transactions. This classification yields the distribution of transactional patterns within Bitcoin, shown as a pie chart in Figure 14. We observed that the splitting pattern is the most frequent in Bitcoin, accounting for 64.6% of classified transactions with a total of 545,585,796 transactions. This trend likely emerges because new bitcoins are created through the coinbase transaction, which includes a UTXO holding a significant amount of bitcoin. Given the high dollar value of each bitcoin, its use likely begins with a division.
The transferring pattern accounts for 22.1% of classified transactions, totaling 186,881,657, which makes it the second most used pattern in Bitcoin. A likely reason is that Bitcoin fees are levied according to the storage a transaction consumes; since this pattern is the least storage-intensive, it is also inexpensive to use.
The merging pattern accounts for 13.3% of transactions, amounting to 112,107,603. From these data, we infer that consolidating unspent outputs is a more storage-intensive process, on the assumption that the output available for spending must encompass the causal history of the previous transactions it consolidates.
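The percentages above can be checked directly from the three reported counts. The short sketch below reuses only the figures stated in this section; the variable names are illustrative. The three counts sum to 844,575,056 classified transactions.

```python
# Transaction counts per pattern as reported for the 791,800-block dataset.
counts = {
    "splitting": 545_585_796,
    "transferring": 186_881_657,
    "merging": 112_107_603,
}

total = sum(counts.values())
# Share of each pattern among all classified transactions, in percent (one decimal).
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print(total)   # 844575056
    print(shares)  # {'splitting': 64.6, 'transferring': 22.1, 'merging': 13.3}
```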
The classification depicted in Figure 14 reflects the most used patterns in Bitcoin and gives a first indication that the merging pattern is the most storage-intensive. We then analyze each transactional pattern individually, plotting the number of outputs against storage size. This clarifies the storage difference between the splitting and merging patterns and confirms that transferring is the least storage-intensive transactional pattern.
6.1. Storage Cost in Splitting Pattern
Figure 15 provides a graphical illustration of transactions classified under the splitting pattern. The X-axis represents the size of the transactions in bytes, whereas the Y-axis represents the number of outputs used in each transaction. Through an in-depth analysis of the data density and distribution depicted in the chart, we confirm our initial observation that the splitting pattern is dominant within Bitcoin.
Concerning the relation between the number of outputs and storage costs, we identified splitting transactions with up to 15,000 outputs in a single transaction, demanding up to 0.5 MB of storage. Nevertheless, most transactions fall within a range of up to 4000 outputs, with a storage requirement close to 0.2 MB.
We highlight the significance of the splitting pattern in Bitcoin. Although it is the most common transactional pattern and some of its transactions demand substantial storage resources, its overall storage trend remains moderate. Because one bitcoin can be divided into up to 100 million units (satoshis), a thorough analysis of this pattern is crucial to guide future research within the Bitcoin network.
6.2. Storage Cost in Transferring Pattern
Figure 16 provides a graphical illustration of transactions classified under the transferring pattern. Based on the spent-by relation, this pattern contains transactions that maintain a one-to-one operation within the set of outputs. In the graph, the X-axis represents the size of the transactions in bytes, while the Y-axis indicates the number of outputs used.
Some transactions reach up to 2000 outputs, with a storage cost of 0.15 MB. However, most transactions use approximately 500 outputs, with a storage requirement of about 0.05 MB.
In addition, Figure 16 reveals two distinct point distributions, each representing a specific transaction type. Notably, certain transactions with a larger number of outputs have a lower storage cost, especially in the 0.05 to 0.06 MB range. This variability in storage arises from the diversity of transaction types in Bitcoin, which include standard transactions, Multisig transactions [39], Pay-to-Script-Hash (P2SH) transactions [40], SegWit transactions [41], CoinJoin transactions [42], and time-locked transactions [43]. Each type has its own storage characteristics and requirements, which is reflected in the variety of transactions observed in the graph.
6.3. Storage Costs in the Merging Pattern
Figure 17 provides a classification of transactions under the merging pattern. In this chart, the X-axis represents the size of the transactions in megabytes (MB), while the Y-axis quantifies the number of outputs involved. Notably, several transactions reach up to 1 MB, which corresponds to the maximum capacity of a Bitcoin block before the SegWit implementation, with between 6000 and 7500 outputs. However, most transactions within this pattern involve approximately 2500 outputs and consume close to 0.2 MB of storage.
As in the previous figures, some transactions have a storage distribution that deviates from the rest of the merging pattern, reflecting the variety of transaction types that Bitcoin offers. This diversity is interesting for future studies and for a possible classification of patterns across different Bitcoin transaction types [44].
6.4. Analysis of Transaction Pattern in Storage/Throughput Flexibility
In our detailed review of the three patterns, we observed that the transferring pattern has the lowest storage requirement and is the second most common pattern in Bitcoin. The splitting pattern is the most common and offers the best trade-off between transactional throughput and storage. The merging pattern supports operations that consolidate thousands of outputs but requires more storage. However, the storage cost of each pattern varies depending on the transaction structure. For example, a structure with more spent outputs than unspent outputs is more costly in terms of storage, because each spent output must prove ownership of the coins by unlocking the transaction script, which requires a digital signature, as shown in Table 2. A comparison of the structures of spent and unspent outputs, depicted in Table 2 and Table 3, illustrates their different storage requirements.
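To illustrate why a structure with more spent outputs than unspent outputs costs more, the sketch below estimates transaction size with a linear model. The per-item sizes are assumptions, not values from Table 2 or Table 3. They are the commonly used approximations for legacy Pay-to-Public-Key-Hash (P2PKH) transactions:

- about 148 bytes per input, which carries the signature and public key;
- 34 bytes per output;
- about 10 bytes of fixed overhead.

The model ignores the growth of variable-length counters for large input or output counts.

```python
# Assumed approximate sizes for legacy P2PKH transactions (bytes); other
# script types (SegWit, multisig, P2SH) have different sizes.
TX_OVERHEAD = 10   # version, input/output counters, lock time
INPUT_SIZE = 148   # outpoint + unlocking script (signature and public key) + sequence
OUTPUT_SIZE = 34   # amount + locking script


def estimated_size(n_inputs: int, n_outputs: int) -> int:
    """Linear size estimate: spent outputs (inputs) cost far more than new outputs."""
    return TX_OVERHEAD + INPUT_SIZE * n_inputs + OUTPUT_SIZE * n_outputs


if __name__ == "__main__":
    k = 100
    splitting = estimated_size(1, k)   # one input fanned out into k outputs
    merging = estimated_size(k, 1)     # k inputs consolidated into one output
    print(splitting, merging)          # 3558 14844
```

Under these assumptions, handling the same k = 100 outputs costs roughly four times as much when they are spent (merging) as when they are created (splitting). This is consistent in direction with the observations in this section.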
6.5. Summary
In this section, we found that in the UTXO model, there is no fixed storage value for spent and unspent outputs; it varies depending on the transactional pattern and the type of transaction. We observed that the splitting pattern offers the best trade-off between throughput and storage, allowing thousands of outputs in a single transaction while keeping storage low. However, this benefit is offset by the merging pattern, which consolidates these outputs into transactions that, although more storage-intensive, reduce the number of outputs and keep the UTXO set from growing unchecked. Finally, we conclude that the key to achieving storage scalability in a permissionless blockchain system lies in strategies that optimally balance throughput and storage at the transactional-pattern level.
7. Discussion
To our knowledge, this research is the first to highlight the importance of the relationship between throughput and storage efficiency, setting the stage for future research on achieving high transactional throughput without sacrificing storage efficiency.
In the current state of the art, approaches tend to favor throughput at the expense of storage, or vice versa. For example, while techniques such as sharding and off-chain solutions improve throughput, they also introduce storage challenges. Conversely, storage reduction methods reduce transactional throughput. Our approach shows that a balance between the two parameters is achievable. For example, Section 6.1 reveals that the splitting pattern in the UTXO model supports a high number of operations with low storage consumption. Thus, techniques that generate this pattern more intensively than the others are likely to be favorable in terms of storage requirements. This insight paves the way for new blockchain designs that balance this trade-off, leading to more scalable blockchains.
7.1. Practical Implications of Transactional Patterns
Unlocking transactional patterns to abstract transactions in a granular manner is applicable across several areas of blockchain research. For instance, the direct relation between inputs and outputs that our model describes enhances traceability analyses. In high-frequency trading environments that use private blockchains and in which storage grows constantly, the splitting pattern could increase throughput by allowing transactions to be executed in parallel. Lastly, new types of transactions could be proposed based on the identified transactional patterns, which could enhance privacy and security in blockchain environments.
7.2. Discussion of Experimental Results
The experimental comparison based on the classification of transactions in 791,800 blocks shows how the storage requirements of each pattern grow with the number of outputs. For example, the splitting pattern, which represents 64.6% of the transactions, has an average storage growth of 32 bytes per output. This flexibility to increase the number of operations at a relatively low storage cost makes the pattern storage efficient.
The transferring pattern, which comprises 22.1% of the transactions, requires around 0.05 MB for approximately 500 outputs, or about 100 bytes per output. This places it in an intermediate position in terms of storage efficiency.
On the other hand, the merging pattern, which represents 13.3% of the transactions, consolidates multiple inputs into fewer outputs, which is inherently more storage-intensive. This pattern has an average output size of 128 bytes. Although it is crucial for managing and reducing the number of UTXOs in the system, it also introduces higher storage costs, with transactions that can reach up to 1 MB.
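The sketch below compares the single-copy storage each pattern would need for the same number of outputs. It uses the average per-output storage figures reported above (32, 100, and 128 bytes). It is a linear extrapolation from those averages, assumed to hold for any output count; the names and the 10,000-output batch are hypothetical.

```python
# Average storage per output (bytes) as reported in this discussion.
AVG_BYTES_PER_OUTPUT = {"splitting": 32, "transferring": 100, "merging": 128}


def storage_bytes(pattern: str, n_outputs: int) -> int:
    """Linear extrapolation: outputs times the pattern's average bytes per output."""
    return AVG_BYTES_PER_OUTPUT[pattern] * n_outputs


if __name__ == "__main__":
    n = 10_000
    costs = {p: storage_bytes(p, n) for p in AVG_BYTES_PER_OUTPUT}
    ranking = sorted(costs, key=costs.get)   # cheapest pattern first
    relative = {p: costs[p] / costs["splitting"] for p in costs}
    print(costs)      # {'splitting': 320000, 'transferring': 1000000, 'merging': 1280000}
    print(ranking)    # ['splitting', 'transferring', 'merging']
    print(relative)   # {'splitting': 1.0, 'transferring': 3.125, 'merging': 4.0}
```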
7.3. Future Research
Future research could explore models that delineate the relationships among transactional throughput, storage, latency, availability, and reachability. Additionally, future studies could investigate different transaction types in Bitcoin to develop methods that optimize storage efficiency.
One strategy for future work is to balance the set of outputs in a transaction by identifying transactional patterns. For example, a set of transactions in the mempool could be grouped according to the splitting and merging patterns into a single transaction, similar to a CoinJoin transaction. This would reduce storage requirements and allow more transactions to be processed per block, increasing throughput.
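The grouping idea can be sketched as concatenating the inputs and outputs of several mempool transactions into one transaction. This sketch is hypothetical. It reuses assumed legacy P2PKH sizes (about 10 bytes of overhead, 148 bytes per input, 34 per output) and ignores counter growth, multi-party signing, and fees. The only saving it captures is therefore the shared fixed overhead. Grouping outputs that pay the same recipient, which could save more, is not modeled.

```python
from dataclasses import dataclass

# Assumed approximate legacy P2PKH sizes in bytes (illustrative only).
TX_OVERHEAD = 10
INPUT_SIZE = 148
OUTPUT_SIZE = 34


@dataclass(frozen=True)
class Tx:
    n_inputs: int
    n_outputs: int

    def size(self) -> int:
        return TX_OVERHEAD + INPUT_SIZE * self.n_inputs + OUTPUT_SIZE * self.n_outputs


def group(txs: list[Tx]) -> Tx:
    """Combine several transactions into one, CoinJoin-style, keeping all inputs and outputs."""
    return Tx(sum(t.n_inputs for t in txs), sum(t.n_outputs for t in txs))


if __name__ == "__main__":
    # Hypothetical mempool content: two splitting transactions and one merging transaction.
    mempool = [Tx(1, 2), Tx(1, 3), Tx(2, 1)]
    separate = sum(t.size() for t in mempool)
    combined = group(mempool).size()
    print(separate, combined, separate - combined)   # 826 806 20
```

With k transactions grouped, this model saves (k - 1) x 10 bytes. Any larger saving would have to come from reducing the number of inputs or outputs themselves.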
We anticipate that any method that seeks to increase transactional throughput will also need to consider storage requirements. Future work could explore fragmenting the blockchain through transactional patterns to manage space while carefully increasing throughput. We invite other researchers to use the databases [38] and tools shared in this study to analyze blockchains based on the UTXO model, such as Litecoin, Dogecoin, and Cardano. Future work with these tools will aim to identify the transactional patterns of these blockchains and compare them with this study to improve the storage scalability of the system.