With the much-anticipated geth 1.5 (“let there bee light”) launch, Swarm has been incorporated into the formal go-ethereum release as an experimental feature. The latest version of the code is POC 0.2 RC5 — “embrace your daemons” (roadmap), which is an updated and refined iteration of the codebase that was utilized on the Swarm toynet during the previous months.
This version comes equipped with the swarm command, which launches a standalone Swarm daemon as a separate process, using your preferred IPC-compliant ethereum client if required. Bandwidth accounting (through the Swarm Accounting Protocol = SWAP) ensures smooth functioning and swift content distribution by encouraging nodes to share their bandwidth and relay information. The SWAP system is operational but deactivated by default. Storage incentives (punitive insurance) to safeguard the availability of infrequently-accessed content are expected in POC 0.4. Thus, at the moment, the client by default uses the blockchain solely for domain name resolution.
Through this blog post, we are excited to announce the initiation of our brand new Swarm testnet linked to the Ropsten ethereum testchain. The Ethereum Foundation is backing a cluster of 35 (which will expand to 105) Swarm nodes operating in the Azure cloud. This setup is hosting the Swarm homepage.
We view this testnet as the inaugural public trial, and we invite the community to join the network, contribute resources, and assist us in pinpointing issues, recognizing pain points, and providing feedback on usability. Guidance can be found in the Swarm guide. We encourage those capable of running persistent nodes (nodes that remain online) to reach out. We have already received commitments for 100TB deployments.
Please be aware that the testnet provides no assurances! Data may be lost or become inaccessible. In fact, guarantees of persistence cannot be provided at least until the implementation of the storage insurance incentive layer (scheduled for POC 0.4).
We envision guiding this project with increasing community engagement, thus we invite those interested to join our public discussion forums on gitter. We aim to establish the foundation for this dialogue through a series of blog posts focusing on the technology and ideology behind Swarm specifically and about Web3 in general. The first entry in this series will introduce the components and functionality of Swarm as it currently operates.
What is Swarm after all?
Swarm is a decentralized storage platform and content distribution service; a fundamental base layer service of the ethereum Web3 stack. The aim is to create a peer-to-peer storage and delivery solution that offers zero downtime, is resistant to DDOS attacks, fault-tolerant, and censorship-resistant, while being self-sustaining thanks to an integrated incentive structure. The incentive layer employs peer-to-peer accounting for bandwidth, offers deposit-based storage incentives, and facilitates resource trading for compensation. Swarm is crafted to closely integrate with the devp2p multiprotocol network layer of Ethereum as well as with the Ethereum blockchain for domain name resolution, service payments, and content availability assurance. Nodes on the current testnet utilize the Ropsten testchain solely for domain name resolution, with incentivisation turned off. The primary goal of Swarm is to deliver decentralized and redundant storage of Ethereum’s public records, particularly for storing and distributing dapp code and data as well as blockchain records.
Two key attributes differentiate Swarm from other decentralized distributed storage solutions. While existing services (Bittorrent, Zeronet, IPFS) permit you to register and share the content you host on your server, Swarm delivers the hosting itself as a decentralized cloud storage solution. There exists a genuine feeling that you can simply ‘upload and vanish’: you upload your assets to the swarm and retrieve them later, all potentially without needing a hard drive. Swarm aspires to be the universal storage and delivery service that, when fully operational, meets use-cases spanning from delivering low-latency real-time interactive web applications to serving as guaranteed persistent storage for scarcely utilized content.
The second notable feature is the incentive structure. The beauty of decentralized consensus of computation and state is that it allows programmable rulesets for communities, networks, and decentralized services that resolve their coordination challenges by deploying transparent self-enforcing incentives. Such incentive models view individual participants as agents acting on their rational self-interest; however, the emergent behavior of the network is significantly more advantageous to the participants compared to the absence of coordination.
Shortly after Vitalik’s whitepaper, the core Ethereum development team recognized that a generalized blockchain is a vital missing component needed, along with existing peer-to-peer technologies, to establish a fully decentralized internet. In May 2014, the concept of having distinct protocols (shh for Whisper, bzz for Swarm, eth for the blockchain) was introduced by Gavin and Vitalik who envisioned the Ethereum ecosystem within the expansive crypto 2.0 vision of the third web. The Swarm project exemplifies a system where incentivization empowers participants to efficiently pool their storage and bandwidth resources to deliver global content services to all contributors. One might say that the smart contracts of the incentives embody the hive mind of the swarm.
An extensive synthesis of our investigation into these matters resulted in the publication of the first two orange papers. Incentives are further elucidated in the devcon2 presentation regarding the Swarm incentive structure. Additional information will be provided in forthcoming articles.
How does Swarm work?
Swarm functions as a network, a service, and a protocol (a set of rules). A Swarm network is made up of nodes speaking a wire protocol called bzz, which uses the ethereum devp2p/rlpx network stack as the underlying transport. The Swarm protocol (bzz) specifies a mode of interaction. At its core, Swarm implements a distributed content-addressed chunk store. Chunks are arbitrary data blobs with a fixed maximum size (currently 4KB). Content addressing means that the address of any chunk is deterministically derived from its content. The addressing scheme relies on a hash function that takes a chunk as input and yields a 32-byte key as output. A cryptographic hash function is irreversible, collision-free, and uniformly distributed (indeed, these are the properties that underpin bitcoin and, in general, proof-of-work).
The hash of a chunk serves as the address that clients can utilize to retrieve the chunk (the hash’s preimage). Irreversible and collision-free addressing immediately affords integrity protection: regardless of how a client is aware of an address, it can discern if the chunk is intact or has been altered simply by hashing it.
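Content addressing and the integrity protection it gives can be sketched in a few lines of Python. SHA3-256 is used here as a stand-in for Swarm's actual chunk hash; this is an illustration of the idea, not the real implementation:

```python
import hashlib

def chunk_address(chunk: bytes) -> bytes:
    """Derive a 32-byte address deterministically from a chunk's content.
    (Illustrative: SHA3-256 stands in for Swarm's actual chunk hash.)"""
    return hashlib.sha3_256(chunk).digest()

def verify(address: bytes, chunk: bytes) -> bool:
    """Integrity check: re-hash the retrieved chunk and compare to the
    address it was requested under."""
    return chunk_address(chunk) == address

data = b"hello swarm"
addr = chunk_address(data)
assert len(addr) == 32                 # 32-byte key
assert verify(addr, data)              # intact chunk
assert not verify(addr, data + b"!")   # tampered chunk is detected
```

However a client learned the address, it can run the same check on whatever it receives, so no trusted intermediary is needed.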
Swarm’s primary feature as a distributed chunk store is that you can upload content into it. The nodes comprising the Swarm all allocate resources (disk space, memory, bandwidth, and CPU) to store and deliver chunks. But how do we determine which nodes keep a given chunk? Swarm nodes have an address (the hash of their bzz-account) within the same keyspace as the chunks themselves. Let’s refer to this address space as the overlay network. When we upload a chunk to the Swarm, the protocol ensures that it ends up stored at the nodes that are nearest to the chunk’s address (based on a well-defined measure of distance in the overlay address space). The process by which chunks reach their address is known as syncing and is an integral part of the protocol. Nodes that subsequently want to retrieve the content can locate it again by sending a query to nodes near the content’s address. In fact, when a node needs a chunk, it simply submits a request to the Swarm with the address of the content, and the Swarm relays the request until the data is found (or the request times out). In this regard, Swarm resembles a conventional distributed hash table (DHT) but with two critical (and underexplored) characteristics.
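The idea of nodes and chunks sharing one keyspace, with chunks stored at the nearest nodes, can be illustrated with a Kademlia-style XOR distance. This is a sketch of the concept only; the protocol's actual distance measure and redundancy parameters may differ:

```python
import os

def xor_distance(a: bytes, b: bytes) -> int:
    # Kademlia-style distance: XOR the two addresses, read the result
    # as a big-endian integer (smaller value = closer in the keyspace)
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

def closest_nodes(chunk_addr: bytes, node_addrs, k=2):
    # The k nodes whose overlay address is nearest to the chunk's
    # address are the ones expected to store it
    return sorted(node_addrs, key=lambda n: xor_distance(n, chunk_addr))[:k]

nodes = [os.urandom(32) for _ in range(8)]   # random 32-byte node addresses
chunk = os.urandom(32)                        # a chunk address
keepers = closest_nodes(chunk, nodes)
assert xor_distance(keepers[0], chunk) <= xor_distance(keepers[1], chunk)
```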
Swarm employs a set of TCP/IP connections wherein each node maintains a set of (semi-)permanent peers. All wire protocol communications between nodes are transmitted from node to node via active peer connections. Swarm nodes proactively manage their peer connections to sustain a specific set of links, which facilitates syncing and content retrieval through key-based routing. Consequently, a chunk-to-be-stored or a content-retrieval-request message can always be efficiently guided along these peer connections to the nodes that are closest to the content’s address. This variation of the routing schema is referred to as forwarding Kademlia.
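The forwarding step can be sketched as a greedy relay over peer connections. The following is a toy simulation with 1-byte addresses, not the real protocol, but it shows how a message always makes progress toward the target address:

```python
def xor_distance(a: bytes, b: bytes) -> int:
    # Distance between two overlay addresses: XOR, read as an integer
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

def route(start: bytes, target: bytes, peers_of: dict) -> list:
    """Greedy forwarding: each node hands the message to its connected
    peer closest to the target, stopping when no peer is closer than
    the current node itself."""
    node, path = start, [start]
    while True:
        best = min(peers_of[node], key=lambda p: xor_distance(p, target))
        if xor_distance(best, target) >= xor_distance(node, target):
            return path  # this node is the closest reachable one
        node = best
        path.append(node)

# A tiny network with 1-byte addresses (real addresses are 32 bytes):
peers_of = {
    b"\x00": [b"\x40", b"\x80"],
    b"\x40": [b"\x00", b"\x60"],
    b"\x60": [b"\x40", b"\x70"],
    b"\x70": [b"\x60"],
    b"\x80": [b"\x00"],
}
path = route(b"\x00", b"\x70", peers_of)
assert path == [b"\x00", b"\x40", b"\x60", b"\x70"]
```

Each hop strictly decreases the distance to the target, which is why requests can be relayed without any node knowing the full network.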
When combined with the SWAP incentive mechanism, a node’s rational self-interest prompts opportunistic caching behavior: the node saves all relayed chunks locally so that it can serve them the next time they are requested. As a result of this behavior, popular content tends to be replicated more redundantly across the network, effectively reducing retrieval latency – this phenomenon is often referred to as Swarm being ‘auto-scaling’ as a distribution network. Moreover, this caching behavior alleviates the original custodians from potential DDOS attacks. SWAP incentivizes nodes to cache all content they encounter until their storage capacity is filled. In reality, caching incoming chunks of average anticipated utility is a consistently beneficial strategy, even if it requires expunging older chunks. The most reliable indicator of demand for a chunk is the frequency of requests in the past. Therefore, it is logical to discard chunks that were requested the longest time ago. Consequently, content that becomes unpopular, outdated, or was never popular at all will be garbage collected and removed unless safeguarded by insurance. The outcome is that nodes will maximize the use of their dedicated resources for the benefit of users. This natural auto-scaling transforms Swarm into a type of maximum-utilization elastic cloud.
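The least-recently-requested eviction policy described above can be sketched with a small cache. The capacity and API here are illustrative, not Swarm's actual store:

```python
from collections import OrderedDict

class ChunkCache:
    """Opportunistic cache that evicts the chunk requested longest ago
    once capacity is reached (a sketch, not Swarm's real storage layer)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # address -> chunk, oldest request first

    def put(self, address: bytes, chunk: bytes) -> None:
        self.store[address] = chunk
        self.store.move_to_end(address)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently requested

    def get(self, address: bytes):
        if address in self.store:
            self.store.move_to_end(address)  # a request renews the chunk
            return self.store[address]
        return None

cache = ChunkCache(2)
cache.put(b"a", b"1"); cache.put(b"b", b"2")
cache.get(b"a")          # "a" is now more recently requested than "b"
cache.put(b"c", b"3")    # capacity exceeded: "b" is garbage collected
assert cache.get(b"b") is None
assert cache.get(b"a") == b"1"
```

Popular chunks keep being renewed and so survive, while unrequested ones age out, which is the mechanism behind the auto-scaling behavior described above.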
Documents and the Swarm hash
Having explained how Swarm operates as a distributed chunk store (fixed-size preimage archive), you might wonder: where do chunks come from, and why does it matter?
At the API layer, Swarm offers a chunker. The chunker accepts any kind of readable source, like a file or a video capture device, and slices it into fixed-sized chunks. These so-called data chunks or leaf chunks are hashed and subsequently synced with peers. The hashes of the data chunks are then compiled into chunks themselves (termed intermediate chunks), repeating the process. Presently, 128 hashes constitute a new chunk. Consequently, the data is represented by a Merkle tree, and it is the root hash of this tree that serves as the address for retrieving the uploaded file.
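The chunker's tree construction can be sketched as follows. This is a toy version: the real chunker also encodes subtree lengths and streams its input, and SHA3-256 here stands in for the actual Swarm hash:

```python
import hashlib

CHUNK_SIZE = 4096   # data (leaf) chunks: at most 4KB
BRANCHES = 128      # hashes packed into one intermediate chunk

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def swarm_hash(data: bytes) -> bytes:
    """Split the input into leaf chunks, hash each, then repeatedly pack
    up to 128 hashes into intermediate chunks and hash those, until a
    single root hash remains. That root is the retrieval address."""
    hashes = [h(data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)] or [h(b"")]
    while len(hashes) > 1:
        hashes = [h(b"".join(hashes[i:i + BRANCHES]))
                  for i in range(0, len(hashes), BRANCHES)]
    return hashes[0]

root = swarm_hash(b"x" * 10000)  # 3 leaf chunks -> 1 intermediate -> root
assert len(root) == 32
```

With 128-way branching, two levels of intermediate chunks already address 128 × 128 × 4KB = 64MB, so the tree stays shallow even for large files.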
When you retrieve this ‘file’, you look up the root hash and download its preimage. If the preimage is an intermediate chunk, it is interpreted as a sequence of hashes directing to chunks on a lower level. Ultimately, the process reaches the data level, and the content can be served. A crucial feature of a Merkleized chunk tree is that it provides integrity protection (what you seek is what you get) even during partial reads. For instance, this means that you can navigate back and forth in a large movie file and still remain confident that the data has not been altered. Advantages of utilizing smaller units (4KB chunk size) incorporate parallelization of content retrieval and reduced wasted traffic in the event of network disruptions.
Manifests and URLs
On top of the chunk Merkle trees, Swarm delivers a vital third layer of content organization: manifest files. A manifest is a JSON array of manifest entries. An entry minimally describes a path, a content type, and a hash linking to the actual content. Manifests allow for the creation of a virtual site hosted on Swarm, enabling URL-based addressing by always assuming that the host section of the URL refers to a manifest, and the path is aligned with the paths of manifest entries. Manifest entries can direct to other manifests, thus they can be recursively nested, enabling manifests to be defined as a compressed trie effectively accommodating vast datasets (e.g., Wikipedia or YouTube). Manifests may also be perceived as sitemaps or routing tables that connect URL strings to content. Since each step along the path involves either merkelised structures or content addresses, manifests offer integrity assurance for an entire site.
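A manifest and its path lookup might be sketched like this. The entries, field names, and hashes below are made up for illustration; the real manifest format has additional fields:

```python
# Hypothetical manifest for a small virtual site: each entry maps a URL
# path to a content type and the Swarm hash of the content.
manifest = [
    {"path": "index.html",   "contentType": "text/html", "hash": "hash-of-index"},
    {"path": "img/logo.png", "contentType": "image/png", "hash": "hash-of-logo"},
]

def resolve(manifest, url_path):
    """Match the path part of a URL against manifest entries
    (longest matching entry wins), returning the entry or None."""
    matches = [e for e in manifest if url_path.startswith(e["path"])]
    return max(matches, key=lambda e: len(e["path"])) if matches else None

entry = resolve(manifest, "index.html")
assert entry["contentType"] == "text/html"
assert resolve(manifest, "missing.txt") is None
```

The returned hash is then fetched from the chunk store exactly like any other content, so a whole site resolves down to plain content-addressed retrieval.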
Manifests can be accessed and directly navigated using the bzzr URL protocol. This application is illustrated by the Swarm Explorer, an example Swarm dapp that showcases manifest entries as if they were files on a disk organized within directories. Manifests can be effortlessly interpreted as directory structures so that a directory and a virtual host may be regarded as equivalent. A straightforward decentralized Dropbox implementation can be grounded on this characteristic. The Swarm Explorer is available on Swarm: you can utilize it to navigate any virtual site by inputting a manifest’s address hash into the URL: this link will exhibit the explorer navigating its own source code.
Hash-based addressing is immutable, which means you cannot overwrite or modify the content of a document under a static address. However, since chunks are synced to other nodes, Swarm is immutable in the stronger sense that once something is uploaded to Swarm, it cannot be unseen, unpublished, revoked, or deleted. For this reason, exercise extra caution with what you share. Nevertheless, you can update a site by creating a new manifest that includes new entries or omits outdated ones. This operation is cheap since it doesn’t require moving any of the actual content referenced. The photo album is another Swarm dapp that illustrates how this is done; the source is on GitHub. If you want your updates to show continuity or need an anchor pointing to the latest version of your content, you need name-based mutable addresses. This is where the blockchain, the Ethereum Name Service, and domain names come into play. A more comprehensive way to track changes is to use version control, like Git or Mango, a Git implementation that uses Swarm (or IPFS) as its backend.
Ethereum Name Service
To authorize changes or publish updates, domain names are essential. For an effective domain name service, you require the blockchain and some governance. Swarm employs the Ethereum Name Service (ENS) to resolve domain names into Swarm hashes. Tools are available to interact with the ENS for acquiring and managing domains. The ENS is vital as it acts as the bridge between the blockchain and Swarm.
If you utilize the Swarm proxy for browsing, the client presumes that the domain (the segment following bzz:/ up to the initial slash) resolves to a content hash through ENS. Owing to the proxy and the standard URL scheme handler interface, Mist integration should be wonderfully straightforward for Mist’s official launch with Metropolis.
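Splitting a bzz:/ URL into the ENS-resolved host and the manifest path can be sketched as follows; the URL and the function name are illustrative:

```python
def parse_bzz(url: str):
    """Split a bzz:/ URL into (host, path): the host is what gets
    resolved to a manifest hash via ENS, the path is matched against
    that manifest's entries."""
    prefix = "bzz:/"
    assert url.startswith(prefix), "not a bzz URL"
    host, _, path = url[len(prefix):].partition("/")
    return host, path

host, path = parse_bzz("bzz:/theswarm.eth/index.html")
assert host == "theswarm.eth"   # resolved through ENS to a content hash
assert path == "index.html"     # looked up in the resolved manifest
```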
Our roadmap is ambitious: Swarm 0.3 comes with a significant rework of the network layer and the syncing protocol, obfuscation and double masking for plausible deniability, Kademlia-routed peer-to-peer messaging, enhanced bandwidth accounting, and extended manifests with HTTP header support and metadata. Swarm 0.4 is scheduled to include client-side redundancy with erasure coding, scan and repair with proof of custody, encryption support, adaptive transmission channels for multicast streams, and the eagerly awaited storage insurance and litigation.
In upcoming posts, we will delve into obfuscation and plausible deniability, proof of custody and storage insurance, internode messaging, the testing and simulation framework for the network, and more. Stay tuned, bzz…
