At this point, it is well known that the SCOTUS decision to overturn Roe v. Wade introduces new digital surveillance risks. Protecting oneself in the age of surveillance capitalism has become increasingly relevant in recent years, but the removal of the constitutional right to abortion creates new implications. Since online data is frequently used as legal evidence to implicate individuals in crimes, it may now be used in court to incriminate those seeking abortion in states enforcing the ban. Thus, women who wish to access out-of-state abortions are advised to be covert.
One widely suggested strategy to avoid being monitored advises those who can get pregnant to delete their period tracking apps, use Signal to communicate, and only use cash to purchase feminine products and pregnancy tests. While these are all helpful tips, they are simply not enough, since data from all corners of digital life contains revealing correlations that have the potential to be exploited by law enforcement and judicial systems. Data subject to scrutiny includes app usage, browsing history, search queries, communications across unsecured and unencrypted channels, and more.
As the internet has grown, so has the underlying technology and network complexity used in its implementation. However, internet literacy has not evolved at the same rate, meaning that the vast majority of users do not meaningfully understand its risks, let alone know how to mitigate them. It is vital, now more than ever, to help bridge this knowledge gap so that individuals, especially those most vulnerable, are equipped to make informed choices online. To that end, I put together a manual for non-technical friends wishing to understand how their data moves around the internet, what parts of it are visible, why its visibility is dangerous, and what tools they can use to hide sensitive information. I decided to publish it here in case it can help someone else become familiar with the basics of internet architecture.
Table of Contents
- What is encryption?
- Encryption at rest and in transit
- “End-to-end” encryption is not full encryption
- How do I ensure my data is fully encrypted?
1. What is encryption?
Encryption provides confidentiality by taking plaintext, readable data (e.g. “pizza”) and scrambling it into ciphertext that cannot be read without the correct key (e.g. “⧫︎♒︎♓︎⬧︎♓︎”). Unencrypted data is not private, because anyone who can reach it can read it. This is a security risk: anyone in a position to exploit sensitive information, whether by viewing it or tampering with it, can do so easily.
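To make the idea of scrambling and unscrambling concrete, here is a minimal sketch in Python. It uses a one-time pad (XOR with a random, single-use key of the same length as the message), chosen only because it fits in a few lines of standard-library code. It is an illustration, not what real apps use, and the helper name `xor_bytes` is made up for this example.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy one-time pad)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = "pizza".encode("utf-8")
key = secrets.token_bytes(len(plaintext))  # random key, used only once

ciphertext = xor_bytes(plaintext, key)     # unreadable without the key
recovered = xor_bytes(ciphertext, key)     # the same operation reverses it

print(ciphertext.hex())                    # different on every run
print(recovered.decode("utf-8"))
```

Anyone holding only `ciphertext` learns nothing beyond the message length, while anyone holding `key` as well can recover “pizza”. Real systems use ciphers such as AES or ChaCha20 so that a short key can protect large amounts of data.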
2. Encryption at rest and in transit
Encryption methods vary depending on whether data is at rest or in transit. Data at rest refers to data that is saved on your device, whereas data in transit refers to the data circulated over the web.
- Encrypting data at rest. Securing data at rest prevents information on a device from being compromised if the device is stolen or physically accessed. This is typically done with a combination of disk encryption and file encryption. Disk encryption encrypts all data on a drive, including metadata, but does so using a single key, and all of that data is decrypted at run-time while the system is unlocked. File encryption uses multiple keys, so even if an attacker gains access to a running system, they may not be able to read the files and directories that remain encrypted.
- Encrypting data in transit. Someone does not need to be in possession of your physical machine to be in possession of your data. Securing data in transit prevents information from being compromised as it moves around the internet. When you interact with a website, or DM a friend on Twitter, this data must travel through many systems. The global internet is essentially a complex web of many smaller interconnected networks comprising many machines. As this information passes through these machines to reach its destination, its flow is defined by a series of protocols. Some of these protocols encrypt parts of this data transfer, but not all of it. To really protect your data, you need to ensure all of it is encrypted.
3. “End-to-end” encryption is not full encryption
What parts of the exchange are encrypted
The amount of information that is visible in plaintext depends on the context in which someone is trying to access your data, and how. As an example, say you are browsing the internet on an unsecured public Wi-Fi connection at an airport or coffee shop. Someone sharing the same wireless connection may not be able to see the exact URLs or specific web pages you visit, or view any plaintext data returned from the server, but they will be able to see information about the messages sent, such as the size of the data transmitted, when it was sent, and who sent and received it. While this information may not seem like enough to warrant worry, it can be used to make inferences about the communication and, in some cases, to work out which pages were visited.
Why is this technically possible
This is possible because the internet is organized according to a “layered” abstraction, where each layer has its own set of protocols. Most messaging apps and all secure (https) websites that tout “end-to-end encryption” use a protocol known as TLS, which sits above the transport layer. TLS alone, however, is not sufficient, since it does not encrypt the IP header: the part of each packet that holds information such as the source and destination IP addresses. This is because the IP header belongs to the network layer, below the layer where TLS operates, so TLS does not protect it. Knowing the IP address you are contacting, an onlooker can use a reverse-DNS lookup to find the hostname of the web server behind that address (i.e., the name of the website).
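The following sketch shows why the IP header leaks information even when the payload is encrypted. It builds a simplified 20-byte IPv4 header by hand and then reads it back the way an onlooker could. The addresses come from ranges reserved for documentation, the random bytes merely stand in for TLS ciphertext, and the function names `build_ipv4_header` and `observe` are invented for this example.

```python
import ipaddress
import secrets
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"  # 20 bytes, network byte order

def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a minimal IPv4 header (no options; checksum left as 0)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length of 5 words
    total_length = 20 + payload_len
    return struct.pack(
        IPV4_HEADER,
        version_ihl, 0, total_length,    # version/IHL, TOS, total length
        0, 0,                            # identification, flags/fragment
        64, 6, 0,                        # TTL, protocol (6 = TCP), checksum
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )

def observe(packet: bytes) -> dict:
    """What an onlooker can read without any key: header fields only."""
    fields = struct.unpack(IPV4_HEADER, packet[:20])
    return {
        "source": str(ipaddress.IPv4Address(fields[8])),
        "destination": str(ipaddress.IPv4Address(fields[9])),
        "total_length": fields[2],
        "protocol": fields[6],
    }

encrypted_payload = secrets.token_bytes(300)  # stand-in for TLS ciphertext
packet = build_ipv4_header("192.0.2.10", "198.51.100.7", 300) + encrypted_payload

print(observe(packet))
```

The payload stays opaque, but the source, destination, and size are all readable, which is exactly the metadata described above.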
How easy it is to determine the name of the website depends on how specific the DNS information is. If the name of the webserver maps directly to the domain name (i.e., the webserver has a single DNS entry mapping to the website), an onlooker can determine exactly which websites you are visiting. By contrast, if the webserver has multiple DNS entries and thus maps to a variety of hostnames, the IP address alone is ambiguous, but a hacker or surveillance agent may still determine the domain name if the client uses Server Name Indication (SNI) and the plaintext hostname is visible in the TLS handshake (historically called the SSL handshake).
The risk of relying on TLS alone
Different sites hosted by the same server may vary in size, so by comparing the size of the data transferred against the sites hosted at the destination IP address, it could be possible to narrow down which websites you are browsing. In addition to this metadata being visible in plaintext, most individuals today have a significant amount of publicly available information, much of it contributed through social media accounts. This information, combined with any targeted network surveillance, can be used to form a reasonably comprehensive understanding of an individual.
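As a rough illustration of the size-comparison idea, the sketch below matches an observed transfer size against a table of page sizes. Every name and number here is hypothetical and chosen only for the example. Real traffic analysis is considerably more involved, but the principle is the same.

```python
# Hypothetical sizes (in bytes) of pages served from one shared IP address.
KNOWN_PAGE_SIZES = {
    "site-a.example/home": 48_200,
    "site-b.example/clinic-locations": 131_750,
    "site-c.example/blog": 89_400,
}

def closest_pages(observed_bytes: int, known: dict, tolerance: float = 0.05) -> list:
    """Return pages whose known size is within `tolerance` of the observed size."""
    matches = []
    for page, size in known.items():
        if abs(observed_bytes - size) <= tolerance * size:
            matches.append(page)
    return matches

# An onlooker sees roughly 130,900 encrypted bytes flow back from the server.
print(closest_pages(130_900, KNOWN_PAGE_SIZES))
# → ['site-b.example/clinic-locations']
```

Even though nothing was decrypted, the size alone is enough to single out one candidate page in this toy table.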
Beyond someone snooping for metadata on a shared network connection, information may also be exfiltrated through compromised endpoints or through vulnerable intermediaries. As data travels through various machines online, an intermediary system may contain a vulnerability that enables decryption before the packet has been received by its end destination.
4. How do I ensure my data is fully encrypted?
You can ensure full encryption by using a VPN, Tor, or an encrypted SSH tunnel. Since VPNs and Tor are the more commonly used approaches, I will focus on briefly explaining how these technologies work and summarizing the key differences between the two.
A Virtual Private Network (VPN) adds a further layer of encryption. IPSec, a protocol commonly implemented by VPNs, can encrypt the entire original packet in tunnel mode, meaning the IP header is encrypted along with the payload. A trusted IPSec VPN in tunnel mode allows your data to travel through an encrypted tunnel to its point of termination, a VPN server that decrypts your traffic before sending it to its final destination (the webserver).
The webserver at the final destination only sees the IP address of the VPN server, with no knowledge of the originating or intermediate nodes, making it difficult to trace traffic back to your source IP address. In tunnel mode, the Encapsulating Security Payload (ESP) protocol also adds a new IP header to the packet, so that the actual destination IP address of the webserver is not revealed to anyone watching the connection between you and the VPN server. This makes it more difficult for someone to determine hostnames by correlating IP addresses, and therefore harder to see what websites you visited.
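The encapsulation step can be sketched as follows. This is a toy model, not real IPSec: the `Packet` class, the `encapsulate` and `decapsulate` helpers, the addresses, and the key are all assumptions made for illustration, and `toy_cipher` is a stand-in keystream built from SHA-256 that should never be used to protect real data.

```python
import hashlib
from dataclasses import dataclass

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy only, NOT real ESP crypto."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes

def encapsulate(inner: Packet, key: bytes, client_ip: str, vpn_ip: str) -> Packet:
    """Encrypt the whole inner packet (header included) and add a new outer header."""
    inner_bytes = f"{inner.src}|{inner.dst}|".encode() + inner.payload
    return Packet(src=client_ip, dst=vpn_ip, payload=toy_cipher(key, inner_bytes))

def decapsulate(outer: Packet, key: bytes) -> Packet:
    """VPN server side: decrypt and recover the original packet."""
    src, dst, payload = toy_cipher(key, outer.payload).split(b"|", 2)
    return Packet(src.decode(), dst.decode(), payload)

tunnel_key = b"hypothetical shared tunnel key"
inner = Packet("10.8.0.2", "198.51.100.7", b"request for a web page")
outer = encapsulate(inner, tunnel_key, client_ip="192.0.2.10", vpn_ip="203.0.113.5")

print(outer.src, "->", outer.dst)          # all an onlooker sees: client -> VPN
print(decapsulate(outer, tunnel_key).dst)  # only the VPN server learns this
```

An onlooker on the local network sees traffic addressed to the VPN server (`203.0.113.5` here), while the webserver's address travels inside the encrypted payload.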
Tor is an anonymity network that can enable users to access TCP-based applications (such as websites) while making them resistant to a hacker’s attempts to see what they are browsing.
When accessing a website using Tor, your data is sent through a series of nodes (known as onion routers) connected over a circuit. Your client negotiates a separate encryption key, through Diffie-Hellman key agreement, with each node on the circuit that the data must pass through. The client wraps the data in one layer of encryption per node, and each successive node “unwraps” exactly one layer with its own key.
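The key agreement idea can be illustrated with textbook finite-field Diffie-Hellman. This is a sketch of the general technique only, not of Tor's actual handshake. The modulus `P`, the base `G`, and the `keypair` helper are assumptions picked for readability, and the modulus is far too small for real use.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; illustrative only, too small for real security
G = 3           # public base agreed on by both sides

def keypair():
    """Pick a random private exponent and derive the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # in the range [2, P - 2]
    public = pow(G, private, P)
    return private, public

client_private, client_public = keypair()
node_private, node_public = keypair()

# Only the public values cross the network. Each side combines the other's
# public value with its own private exponent.
client_secret = pow(node_public, client_private, P)
node_secret = pow(client_public, node_private, P)

print(client_secret == node_secret)
```

Both sides arrive at the same secret without ever sending it, so an eavesdropper who sees only the public values cannot simply read the key off the wire. The client repeats this exchange with each node on the circuit.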
Each node can only see the nodes immediately before and after it and has no other information about the circuit path. Even your internet service provider would only know about the node in the first hop, with no awareness of the path your packets take through the circuit, so the destination they travel to is masked.
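The layering can be sketched with a toy three-node circuit. The node names, keys, destination, and the `wrap` and `peel` helpers are hypothetical, and `toy_cipher` is a stand-in keystream built from SHA-256 rather than the cryptography Tor actually uses. The point is only to show that each node learns its next hop and nothing else.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy only, NOT Tor's real crypto."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def wrap(message: bytes, destination: str, circuit: list) -> bytes:
    """Client side: add one layer per node, starting with the exit node's layer."""
    next_hop = destination
    onion = message
    for name, key in reversed(circuit):
        onion = toy_cipher(key, next_hop.encode() + b"|" + onion)
        next_hop = name
    return onion

def peel(onion: bytes, key: bytes):
    """Node side: remove one layer, revealing only the next hop and an inner blob."""
    next_hop, inner = toy_cipher(key, onion).split(b"|", 1)
    return next_hop.decode(), inner

# Hypothetical circuit: (node name, key the client shares with that node).
circuit = [("guard", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]

onion = wrap(b"hello", "example.org", circuit)

hops = []
for _name, key in circuit:
    next_hop, onion = peel(onion, key)
    hops.append(next_hop)

print(hops)   # → ['middle', 'exit', 'example.org']
print(onion)  # → b'hello'
```

The guard node learns only that the next hop is `middle`, the middle node learns only `exit`, and only the exit node learns the destination, without knowing who originally sent the message.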
Tor vs VPN
In comparison to VPNs, the security guarantees of Tor are slightly different. Because Tor upholds the receiver anonymity property, you do not need to inherently trust the Tor nodes, whereas the VPN service you use must be one that you trust, since VPN servers could be compromised. Tor does, however, introduce new attack vectors. While the layered encryption could prevent passive traffic analysis, plaintext information could still be accessed if a motivated party is able to compromise keys, potentially by infecting your device with malware, or if someone is capable of running hostile routers in the Tor network.