Deanonymization

The big concern about decentralization that we look at with regard to deanonymization is that we can now go back through the blockchain's history to reveal information about a particular pseudonym. That is the goal of deanonymization, or linking. One way we can achieve this, which we alluded to earlier, is transaction graph analysis: inspecting the transaction history in the blockchain to derive useful information. In particular, we can construct transaction graphs, like the one you can see on the right side of the screen. On a transaction graph, each node is a pseudonym, and each edge is a transaction conducted between pseudonyms. From a transaction graph, you might be able to see that some pseudonyms make transactions more often than others, are paid more than others, or repeatedly transact with certain other pseudonyms.
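To make the graph idea concrete, here is a minimal sketch in Python. It assumes a heavily simplified ledger in which every transaction has exactly one sender and one receiver (real Bitcoin transactions can have many inputs and outputs), and the addresses, amounts, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified ledger: (sender pseudonym, receiver pseudonym, amount in BTC).
TRANSACTIONS = [
    ("addr_A", "addr_B", 0.5),
    ("addr_A", "addr_C", 0.1),
    ("addr_B", "addr_C", 0.2),
    ("addr_A", "addr_B", 0.3),
]


def build_graph(transactions):
    """Return {sender: {receiver: [amounts]}}: nodes are pseudonyms, edges are payments."""
    graph = defaultdict(lambda: defaultdict(list))
    for sender, receiver, amount in transactions:
        graph[sender][receiver].append(amount)
    return graph


def summarize(transactions):
    """Count payments sent per pseudonym and total BTC received per pseudonym."""
    sent_count = defaultdict(int)
    received_total = defaultdict(float)
    for sender, receiver, amount in transactions:
        sent_count[sender] += 1
        received_total[receiver] += amount
    return dict(sent_count), dict(received_total)


graph = build_graph(TRANSACTIONS)
sent_count, received_total = summarize(TRANSACTIONS)

print(sent_count)                # who transacts most often
print(received_total)            # who is paid the most
print(dict(graph["addr_A"]))     # who addr_A pays, and how much each time
```

Even on this toy data, the three questions from the paragraph above (who transacts more, who is paid more, who pays whom) fall straight out of the structure.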
One way of analyzing the transaction graph is by clustering, or attributing a cluster of addresses or pseudonyms to the same real-world entity. Taking what we know so far, we can identify two main heuristics for associating addresses with one another. The first is the merging of transaction outputs, which occurs when there are multiple inputs to a transaction.
For example, consider Bob, who wants to buy a coffee that costs 0.05 BTC, and has two outputs, one with 0.02 BTC and the other with 0.03 BTC. He merges the two outputs into one that's 0.05 BTC, enough to pay for his coffee. This is a fairly reasonable heuristic because it's often the case that outputs are merged by the same entity.
Rarely do people conduct joint payments. Another heuristic is that of change addresses. Say Bob wants to buy the same 0.05 BTC coffee the next day, but only has an output worth 1 whole BTC.
Bob would send 0.05 BTC to the coffee shop, and the remaining 0.95 BTC to himself at a change address. This is fairly reasonable because, looking at Bob's transaction history, one of his two outputs must have gone to a change address, unless he purchased two items at the same time. We could also look at whether the addresses have been associated with any previous transactions. As per best practice, change addresses are usually newly generated, so when Bob makes the transaction to buy coffee, he would be sending his change back to an address never before seen on the blockchain, something that we can easily identify.
In both cases, merging transaction outputs or change addresses, if we know that Bob owns one address, we can guess with high confidence that Bob owns the other associated address. So we use these two heuristics to link all these addresses to one single person, and in this way we can identify clusters. We talked about heuristics. Now we'll go over several techniques for identifying which cluster is who: linking clusters with their real-world identities.
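Before we get to naming clusters, here is a rough sketch of how the two heuristics could be turned into code. It uses a union-find structure to merge addresses. The transaction format, the address names, and the exact change rule (exactly two outputs, exactly one of which has never appeared before) are assumptions made for this example, not a description of any particular analysis tool.

```python
class UnionFind:
    """Minimal disjoint-set structure for grouping addresses into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_addresses(transactions):
    """Cluster addresses from transactions given in chronological order."""
    uf = UnionFind()
    seen = set()  # every address that has appeared on the (toy) chain so far
    for tx in transactions:
        inputs, outputs = tx["inputs"], tx["outputs"]
        for addr in inputs + outputs:
            uf.find(addr)  # register the address as its own cluster
        # Heuristic 1: inputs merged in one transaction likely share an owner.
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
        # Heuristic 2 (assumed rule): with exactly two outputs, a single
        # never-before-seen output address is treated as the sender's change.
        fresh = [a for a in outputs if a not in seen and a not in inputs]
        if inputs and len(outputs) == 2 and len(fresh) == 1:
            uf.union(inputs[0], fresh[0])
        seen.update(inputs)
        seen.update(outputs)
    clusters = {}
    for addr in uf.parent:
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())


# Hypothetical history loosely following the Bob example.
TXS = [
    {"inputs": ["bob_1", "bob_2"], "outputs": ["coffee_shop"]},            # merged outputs
    {"inputs": ["bob_3"], "outputs": ["coffee_shop", "bob_change"]},       # change address
    {"inputs": ["bob_change", "bob_1"], "outputs": ["bookstore"]},         # links both groups
]

clusters = cluster_addresses(TXS)
for cluster in clusters:
    print(sorted(cluster))
```

On this toy history, all four of Bob's addresses end up in one cluster, while the coffee shop and the bookstore each stay on their own. Keep in mind that both heuristics can be wrong, for instance with joint payments, so the result is a guess rather than proof.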
Businesses, at least those that accept cryptocurrency payments, are outward-facing and consumer-centric, making it easy to go to an online service such as Coinbase and make a transaction with them. Since we know our own public addresses, we could simply wait for the transaction we made to show up within a cluster, or be merged into a cluster, and that cluster would likely be that of the business. This tactic is called tagging by transacting. On the other hand, there's a much more passive approach. We could just look at the graph and infer identities from transaction activity.
In 2013, Mt. Gox was a large part of the Bitcoin ecosystem and accounted for much of the total transaction volume. The graph to the right shows the Bitcoin transaction graph from 2013, and the purple dot towards the right of the graph is Mt. Gox. Similarly, SatoshiDice was a gambling site that allowed users to gamble with small denominations of Bitcoin. This made for many transactions, though the total transaction volume wasn't nearly comparable to that of Mt. Gox.
So the dot representing SatoshiDice on the graph is very small. However, it's easy to see in the graph that although its transaction volume was small, its transaction frequency was quite high. This holds for any business or identity about which you have some preexisting knowledge: if you have leads on transaction volume, frequency, or timing, you can look at transaction graphs and make solid linking inferences. As for identifying individuals, there are similar ways to deanonymize them.
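Before turning to individuals, the volume-versus-frequency point can be made concrete with a small sketch. It assumes we have already grouped addresses into clusters and have a list of payments received per cluster. The cluster names and amounts are invented and are not real Mt. Gox or SatoshiDice data.

```python
from collections import Counter


def activity_profile(payments):
    """Return (payment count per cluster, total BTC received per cluster)."""
    frequency = Counter()
    volume = Counter()
    for cluster, amount in payments:
        frequency[cluster] += 1
        volume[cluster] += amount
    return frequency, volume


# Hypothetical data: cluster_X receives a few large payments (exchange-like),
# cluster_Y receives many tiny ones (gambling-site-like).
PAYMENTS = [("cluster_X", 250.0), ("cluster_X", 400.0)] + [("cluster_Y", 0.01)] * 500

frequency, volume = activity_profile(PAYMENTS)
print(frequency.most_common(1))  # busiest cluster by number of payments
print(volume.most_common(1))     # largest cluster by BTC received
```

If you already suspect that a business handles many small bets, a cluster that tops the frequency ranking while sitting low in the volume ranking is a reasonable candidate, though still only an inference.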
An easy way is to send them Bitcoin. If you can manage to get them to reveal their address, it's not that difficult to track them from there. This may require some social engineering if the other party is suspicious or particularly cautious.
Another way is to watch online activity, particularly forums. It's possible that an individual might carelessly post their address on a forum for convenience, in order to receive donations from the public or even to provide services. Anyone who is watching, however, can now link that pseudonym with any other activity.
Finally, several service providers, such as Coinalytics, offer services to deanonymize funds obtained through illicit means, using data analytics to discover your real identity. Taint analysis is one way of easily tracing the movement of funds through the Bitcoin network. Taint analysis allows one to tag a bad address and trace its associated activity. It was this type of strategy that ruined Ross Ulbricht's defense, by demonstrating that a majority of his funds came from suspicious origins.
As seen in the diagram, each of the red circles at the top represents an address with 100% taint, meaning that it has either been denoted as a dirty address or has received all its funds from dirty addresses. Any other address in the Bitcoin space will have a certain amount of taint depending on what proportion of its funds came from a dirty address. One might think that they can circumvent getting caught by sending their tokens to a bunch of random addresses.
However, by design of taint analysis, that won't work at all: the taint follows the funds, so every address that receives them inherits the taint.
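Here is a small sketch of why splitting funds doesn't help. It assumes a simple proportional model on account-style balances: an address's taint is the fraction of its current balance that traces back to a dirty address, and each payment carries the sender's current taint fraction with it. Real tools work on Bitcoin's transaction outputs and may use other rules. The addresses and amounts are hypothetical.

```python
def run_taint(initial_balances, dirty, transfers):
    """Return {address: taint fraction in [0, 1]} after applying transfers in order."""
    balance = dict(initial_balances)
    # Dirty addresses start fully tainted; everything else starts clean.
    tainted = {a: (b if a in dirty else 0.0) for a, b in balance.items()}
    for src, dst, amount in transfers:
        share = tainted[src] / balance[src] if balance[src] else 0.0
        moved_taint = amount * share  # the payment carries the sender's taint fraction
        balance[src] -= amount
        tainted[src] -= moved_taint
        balance[dst] = balance.get(dst, 0.0) + amount
        tainted[dst] = tainted.get(dst, 0.0) + moved_taint
    # Emptied addresses are reported as 0.0 since they hold no funds to be tainted.
    return {a: (tainted[a] / balance[a] if balance[a] > 0 else 0.0) for a in balance}


# Hypothetical scenario: a dirty address splits its funds across four fresh
# addresses, then one of them pays into an address that also has clean income.
INITIAL = {"thief": 10.0, "alice": 10.0}
TRANSFERS = [
    ("thief", "hop_1", 2.5),
    ("thief", "hop_2", 2.5),
    ("thief", "hop_3", 2.5),
    ("thief", "hop_4", 2.5),
    ("hop_1", "cashout", 2.5),
    ("alice", "cashout", 2.5),
]

taint = run_taint(INITIAL, {"thief"}, TRANSFERS)
print(taint["hop_2"])    # 1.0: splitting did not dilute anything
print(taint["cashout"])  # 0.5: half of its funds came from a dirty source
```

Every address the funds were scattered to is still fully tainted, and mixing with clean money only lowers the percentage rather than removing the link.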