I-DELTA Security and Privacy Perspective
This is Part 8 in a series on the I-DELTA project. Read Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8.
Türkçe için buraya tıklayınız.
Privacy vs Security
The main security services provided by blockchain is integrity, availability, and fault tolerance. Privacy is not provided by design in many implementations [30]. However, privacy is usually a misunderstood subject. Most of the analyses take the cryptocurrency implementations into consideration. For these DLTs, there is the transparency of records as public ledgers are used in these implementations. However, these public records also do not keep any personal or private data. Indeed, such data should not be kept in private ledgers. Furthermore, it is not a common practice, especially in enterprise DLT implementations. Different implementations have different levels of privacy [29, 31]. Studies on privacy protection in blockchain are summarized in [29].
The problems of privacy and security are exacerbated with interoperability. As mentioned before, different DLTs may have different design rationales and characteristics. Hence, the privacy level or the decision of keeping the transactional data secret may differ from one DLT to another. Interoperability, enabling multiple DLTs working together, requires an utmost concern and advanced cryptographic techniques to guarantee the privacy and security levels of each DLT on common ground without wasting resources. This is why understanding the current problems, as well as the solutions in detail is important from the interoperability point of view.
Problems and Challenges
There are privacy challenges in collecting and storing personal data; PII (personally identifiable information) and PHI (Protected health information). The users should have trust in the system that their privacy is preserved. International laws like GDPR (General Data Protection Regulation of the EU - EU General Data Protection Regulation 2016/679) and national laws like KVKK (“Kişisel Verileri Koruma Kanunu” of Turkey) have strict regulations on the privacy of the citizens. The software developers of such systems are also responsible for these regulations [32]. There are challenges in satisfying Individuals' rights:
● Transparency and information
● Access, rectification, erasure
● Objection
● Automated individual decision-making
● Portability
There can also be restrictions on processing and exceptions to rights. The main challenges can be given as follows [33]:
● The approval of the share: The user should have the means to control the personal data and should know how that data is shared.
● Security of the collected data: The organizations are responsible for the security of the collected data. The data should not be revealed (when compromised) or sold to other parties.
● The right to be forgotten: The user can ask for the removal of the personal data and there should be some means to accomplish that.
Even when direct personal information that can be linked to a real identity is not stored on a DLT, depending on the use-cases, privacy still becomes an important issue. The pseudonyms, i.e., the DLT addresses provide limited privacy protection which can be defeated by using techniques such as transaction network analysis and behavioral analysis. Furthermore, regulations (e.g., anti-money-laundering) and policies such as KYC make these techniques easy to apply. For instance, when the user once uses his/her ledger address through an internet connection, the server can link the IP address with the DLT address. While sending a transaction to the DLT network, the broadcast and relay information can be used for the same task. In the literature, there are studies on the possibility of breaking the underlying P2P structure and decentralized protocols via classical attack patterns such as Denial-of-Service [41] and Sybil [42] attacks.
For better privacy protection, one can use multiple addresses, i.e., multiple pseudonyms. Even when multiple pseudonyms are used by the same person, the transaction network can be analyzed via advanced ML (e.g., clustering) techniques to link these addresses to each other. This analysis can further be improved by using behavioral patterns such as transaction time, day, hour, amount, etc. Especially for cryptocurrencies where fungibilityis an important concern, privacy is not only crucial for keeping the identities secret, but also it is necessary to keep the value of each token the same. For instance, there is a chance for tokens produced/transferred by a transaction funded from an unethical/illegal action/source to be denied by some markets, authorities, etc. This will lessen the tokens' convertibility and hence, their value. It will also damage the internal structure and affect the integrity of the corresponding cryptocurrency.
None of the above-mentioned problems are related to the secrecy of the data. DLT transactions form the network. They establish relations between the addresses and pseudonyms. This will happen regardless of the readability level of the information they carry. However, for some applications, one may want only the desired parties to read the data stored on the ledger including the assets (e.g., BTC, ETH, ALGO balances, retirement fund amount, financial information stored on transactions) and history (previous supply-chain steps, agreement details, electronic health records, etc.). Indeed, when the data is not intended to be used for validation purposes, it is a good practice not to store such data in the ledger. Instead, those data can be kept in encrypted form on legacy systems and the cloud [33]. However, when validation is necessary they must be processed by the smart contracts. Furthermore, they must be in an encrypted form and still be processable and verifiable. For instance, one should not be able to transfer an asset (e.g., crypto tokens) she does not own, and this must be verified by the third parties in DLT who do not know she had it.
Current Solutions and Approaches
As stated above, regardless of the data being encrypted or not, transactions form a network whose topology can be exploited via machine learning tasks leveraged to link multiple pseudonyms as well as real-life identities. The best practice to avoid this is using each address only once. Although this solves many linkability problems, frequent payment patterns can still leak information when advanced techniques are used. The DLT can be more resistant to such techniques via practical solutions such as mixing [43]. A mixing service works as a post-office; each message/transaction, containing the data and a recipient address is encrypted by the mixing service's public-key and sent. The mixing service, who can decrypt the message, acts as a relay node, and directs the transaction to its real recipient. Assuming that the service has a large number of users and hence receiving many messages, it is easy to obfuscate the output message order so that it is not mappable to the input order.
In a DLT, the implementation of an effective and efficient mixing service is not an easy task. The service must be secure (no theft, double spending, DoS resistance), provide a good level of anonymity (large user base, unbiased randomness), deniable (plausibly deniable, no cryptographic evidence), scalable (cost-efficient, can support a large number of users), and prevent misuses (e.g., money laundering). Various mixing protocols have been proposed and employed for cryptocurrencies. The main idea is transferring the tokens to the mixing service addresses first and let the service either relay them to the actual recipient or provide a voucher to be redeemed later.
● In centralized mixing, all the mixing protocol is controlled by a trusted third-party. Such a scheme has many risks including counterparty risk, i.e., the mixer can steal funds, logging risk, i.e., the mixer can log the transaction details, centralization risk, i.e., the mixing service is a single point of failure, can be hacked, controlled by an adversary, etc. Mixcoin [44] and Blindcoin [45] are two examples of centralized mixing protocols. The former adds a signature-based accountability mechanism to expose theft so that users can unambiguously prove if the mixer has misbehaved. Besides, the latter uses the idea of blinding to keep the input-output address mapping hidden from the mixing service.
● In altcoin exchange mixing, the tokens, e.g., BTC, first sent to an exchange and spent to buy a different token, e.g., ETH. These tokens are then sent to another exchange and the operation is repeated multiple times. In the last step, the original token, e.g., BTC, is bought. The difference between the initial amount and the final one, which is spent as exchange fees, is the cost of mixing service. Although this process is plausibly deniable (just a set of financial transactions) and seems to be untraceable due to the involvement of many exchanges, a powerful authority can still link the addresses to real identities due to the KYC regulations.
● Fair-exchange mixing leverages advanced cryptographic fair-exchange protocols to avoid coin theft. In such a service, each asset can be fairly exchanged with a voucher which later can be redeemed. A fair-exchange protocol is a multi-party protocol where parties accept to deliver an item if and only if they receive an item in return. Assuming such a service is used by many users at a time, finding the input-output mapping of the received and sent assets becomes hard. Although such schemes have stronger guarantees, it is not straightforward to implement them since they require a large amount of computation and a sufficient amount of asset transfer (e.g., liquidity for the assets are cryptocurrencies). TumbleBit [46] and Xim [47] are two examples of centralized fair-exchange mixing protocols.
● Decentralized mixing protocols try to remove counterparty risks and avoid fees by taking out the middleman (the centralized mixer). The idea is creating a network of peers who cooperate to make transactions that mix the assets, without relying on a trusted third party. CoinJoin, for instance, exploits the fact that a Bitcoin transaction can have multiple inputs and outputs [48]. Thus, although the anonymity set (i.e., the set of possible output addresses for a single input) is usually small, a single transaction can be used for mixing purposes. That is one of its inputs may be said to be directed to any of its outputs. Furthermore, the multisignature support (i.e., signing by multiple parties) in Bitcoin scripting can make multiple participants be in control of the process. Hence, the participants can avoid the counterparty risks of centralized mixing services. A participant only signs the transaction when it is correct, i.e., when one of the outputs is the desired output for the participant. However, without having private and anonymous communication channels for submitting the output addresses, the CoinJoin protocol is vulnerable to traffic analysis. Furthermore, since multi signature transactions with a large number of participants are not frequent in Bitcoin, these transactions are not plausibly deniable. Also, the protocol is not DoS-resistant since even a single participant can stop the protocol if s/he chooses not to sign her/his part. The CoinJoin idea is currently employed by cryptocurrencies such as DASHwith extensions [49]. Another mixing protocol, CoinShuffle borrows the idea of using multisignatures and enhances it by a privacy-preserving, cryptographic shuffling protocol [50]. The protocol simply makes every participant shuffle the output addresses provided by the previous participants without seeing them. Hence, they cannot be mapped to input addresses.
Another way to hide the addresses from curious adversaries is a ring signature [50]. A ring signature is created using a group of keys formed by the participant's key, and a number of other public keys chosen by her/him. When the transaction is broadcasted, a third party can verify that it is signed by one of the private keys corresponding to the public keys in the group. However, the third party cannot know which key is used. That is a ring signature hides the sender/source of a transaction among a group of possible senders. Ring signatures are being used by popular privacy-focused cryptocurrencies such as Monero [51] along with other countermeasures such as one-time addresses.
As protecting the privacy of the participants, keeping the data secret is also a hard problem. As mentioned before, the read access to the data can be restricted by storing the details of the transactions "off-chain". That is another isolated system, database, storage, etc. can be used to store the plain data where only the hash (a short, compressed form of the data obtained via a one-way function with cryptographic guarantees) of the transaction details are kept "on-chain". This is a common practice since the transaction details cannot be obtained from the hash functions. In fact, if the data is not crucial for verification purposes, e.g., personal details of BTC users, this is believed to be the best practice. However, when "off-chain" storage must be used for verification and validation, this is not a straightforward task. Indeed, when required, a participant with read-privileges on the isolated storage can check whether the hash is generated from the correct transaction data. However, "off-chain" storage duplicates the "sources" of truth which is the main responsibility of a DLT. How the isolated storage is maintained is another problem; having a trusted-third party totally negates the benefits of a DLT. Hence, each party can keep and manage her/his own transaction data and grant access privileges to the required parties. Unfortunately, this is also against the fundamental motivations of using a DLT. This is why cryptographic techniques are mostly used to have confidential transaction support on blockchains [32]. These techniques, which have been mostly proposed for other applications, can be summarized as follows:
● (Fully) Homomorphic Encryption (HE) For instance, let E(4) and E(5) be the ciphertexts for the values 4 and 5, obtained by a HE scheme with the encryption function E. Then one can have E(9) = E(4 + 5) or E(20) = E(4 x 5) without performing any decryption operations and performing the arithmetic on the range of the E, i.e. the encrypted domain. The use of HE on a distributed ledger unleashes exciting opportunities. For instance, with HE, a blockchain-based cryptocurrency can be implemented while hiding all the account balances from the public since each transaction can be performed by modifying the account balances and without revealing them.
The first FHE scheme is proposed by Craig Gentry [35] followed by more practical schemes such as BGV, FV, CKKS, and TFHE [36-39]. While TFHE implements fast "bootstraping'', which supports fully homomorphic operation; today, most homomorphic encryption schemes provide, in fact, "somewhat" homomorphic encryption (SWHE) capability, which simply means the number of homomorphic operations that can be applied on ciphertext is limited and the decryption is not possible if the limit, which is often referred as the noise budget, is exceeded. In particular, when a plaintext data is encrypted, the corresponding ciphertext contains a noise term, which increases after every homomorphic operation, i.e., addition or multiplication, over the ciphertext. Homomorphic multiplication is much more costly than homomorphic addition in terms of the noise budget. Fortunately, for many applications such as cryptocurrencies and digital asset management, addition is the only, if not, the most frequent arithmetic operation used. That being said, homomorphic computation is practicable for all (arithmetic) circuits with low multiplicative depth and high-degree of parallelism. Therefore, there is an interest in the literature for research on novel (parallel) algorithms resulting in low-depth circuits for efficient homomorphic computation, even for rudimentary operations such as sorting [40] which can enlighten new avenues for the use of HE in DLT-based applications.
● Secure Multi-Party Computation (SMC) leverages advanced cryptographic primitives and protocols for the secure computation of functions over private inputs of two or more parties. At the end of the protocol, parties learn their outputs and nothing else except for what can be learned in the ideal world. In a possible application, the inputs to SMC are (multiple) data items stored on a DLT, or inputs coming from several participants who may not know and trust each other. These inputs will be processed to obtain a final result that will be revealed to the participants or to the public.
In theory, the participants of an SMC protocol can be modeled as semi-honest, malicious, and covert adversaries. These models are closely related to the public/private, permissioned/permissionless characteristics of the underlying DLTs. For instance, a semi-honest adversary is a participant that follows the protocol but tries to learn from the exchange messages. Such a participant can exist even in private DLTs. In fact, one must assume that such participants always exist. A malicious party can behave in arbitrary ways to learn more about other parties' inputs. These participants exist and create security issues in permissionless DLTs. However, even for permissioned DLTs, one needs to have a monitoring and identification mechanism/tool for malicious actions. Once this exists, the read/write accesses of the participants who are detected to act maliciously can be revoked.
SMCs that are secure against malicious parties are much harder to construct and more resource-demanding than those against semi-honest adversaries. As a compromise between these two extremes, covert adversaries are allowed to cheat but get caught by honest participants with some probability. If this probability is sufficiently large, the cheating party is deterred as it is reluctant to lose its reputation. SMC protocols rely on oblivious transfer~(OT) protocols and FHE. With a carefully selected adversary model depending on the application requirements, DLTs with various security guarantees can be implemented for decentralized applications.
● Zero-Knowledge Proofs (ZKP) are used when someone wants to prove the correctness of a statement without revealing any other information other than the statement is correct. In a DLT, especially when assets/tokens/etc. are transferred from one address to another, the ownership of a sufficient amount of resources must be verified. With zero-knowledge proofs, the sender can prove that s/he has enough resources without revealing her/his actual balance, i.e., the amount of resources s/he has. Formally, a zero-knowledge proof has three important properties:
● Completeness: The verifier will be “always” convinced if the statement is correct.
● Soundness: The verifier will "never" be cheated if the statement is not correct.
● Zero-Knowledge: The verifier will not learn anything else except whether the statement is correct or not.
Although the zero-knowledge proofs are usually designed as interactive protocols, techniques such as Fiat-Shamir heuristic [52] are frequently used to make them as non-interactive. These protocols, which have many applications in practice, are called Non-Interactive ZKPs (NIZKs). However, due to the computation required to generate and/or verify such proofs, NIZKs can incur significant computational overheads and can be a burden to implement a scalable system. For DLTs having many transactions, this limits the use of traditional NIZKs in the literature. This is why many researchers have been trying to reduce the proving and verification complexity of NIZKs.
In the literature, the terms SNARGs (succinct non-interactive adaptive arguments) and SNARKs (succinct non-interactive adaptive argument of knowledge) have been used to formally define efficient and computationally sound proofs (of knowledge). A succinct, e.g., short, cheap in communication and computation, NIZK is, therefore, a SNARK and called as zk-SNARK. The use of zk-SNARKs in Bitcoin has been discussed by Ben-Sasson in 2013 [53] and the Zerocash protocol, which is also based on zk-SNARKs, is proposed in 2014 to have decentralized anonymous payments [54]. Bulletproofs, which can prove that a committed value is in a range with a logarithmic complexity of the range length, are proposed in 2017 [55]. Furthermore, in 2018, the zk-STARK protocol was introduced which uses a setup with no trusted parties, quasi-linear proving time, and poly-logarithmic verification time [56]. Thanks to these attempts, these cryptographic marvels have been currently in use in cryptocurrencies such as Zcash, which is based on the Zerocash protocol and zk-SNARKs [57].
This is Part 8 in a series on the I-DELTA project. Read Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8.
Türkçe için buraya tıklayınız.