This post is an attempt to give a tour of the Tezos code base and its state of development. All of the functionality described in the whitepaper has been implemented to this date, except for gas metering. Most of the remaining work consists of finishing a security addition that we made to the network shell to increase its resilience to DDoS attacks, optimizing smart-contract storage, and, most importantly, testing our network on a large scale and performing external security audits.
We give a high-level overview, with a low-level emphasis on the development that remains before launching the Tezos network (highlighted like this). This approach is both more transparent and more objective than giving timeline estimates about the completion of the project. Most links in this document point to directories in our GitHub repo where those features are implemented.
Broadly speaking, we have four different services: the node, the client, the baker, and the evil client.
The main program is the Tezos node. It connects to the network, relays blocks and transactions, validates transactions, and maintains a locally accessible view of the globally consistent public ledger. Think of it as the equivalent of bitcoind with generate set to off.
The client interfaces with the node through its JSON HTTP API to provide a more convenient command-line interface to the ledger. In particular, it handles keys, remembers contract addresses, and so on. You might call it our wallet.
The baker (which is technically split between the baker and the endorser and started from the client) monitors the network and creates blocks (or attests to the validity of other blocks through a signature) when the proof-of-stake algorithm calls upon it to do so.
The evil client, or attacker, is an integration test tool. Its goal is to implement various attacks: sending a large amount of data, sending fake data, or trying to trick a node into doing several reorganizations or validating a very long chain. It is philosophically inspired by Netflix's Chaos Monkey.
We also have a webclient, which contains a rudimentary version of the client that can be accessed locally in the browser, but we're rethinking this in favor of putting more logic on the client side, following Ethereum's approach with web3, so I won't really cover it in this post. We do have UX / UI wireframes for it too.
The alpha protocol, i.e. the protocol Tezos will initially ship with, is compiled within the node, but since it is extremely modular and concerns itself with problems that are largely orthogonal to the concerns of the node, we give it its own section. When we talk about a protocol in Tezos, we are not referring to a wire protocol, but to a blockchain protocol, defined by its transaction semantics and its consensus algorithm as described in the whitepaper.
The node connects to the rest of the Tezos network through a peer-to-peer network layer. It validates and relays blocks and transactions, maintains the state of the ledger, and interfaces with the current protocol to interpret the semantics of the various operations it encounters.
Both the peer-to-peer layer and the network shell expose RPCs, which are useful for building block explorers or debugging network connectivity issues.
1.1 Peer-to-peer layer
The peer-to-peer layer is the outermost layer of a Tezos node. It communicates with other Tezos nodes through a peer-to-peer gossip network. Beyond its basic functionality, the current peer-to-peer layer implements a few additional features:
- All communications between all nodes are encrypted and authenticated
- Peers are scored, monitored and placed in gray lists or blacklists if they misbehave (e.g. relay invalid information) or use too many resources. Peers can also be whitelisted.
- Peers regularly exchange known good peers to increase the overall connectivity of the network.
- Each peer's public key comes with a proof-of-work stamp to ensure a minimal cost for creating new identities in the network.
- We performed extensive fuzz-testing on the peer-to-peer layer in the past to identify possible risks of remotely triggering exceptions (OCaml is memory safe, so remote code execution is extremely unlikely, barring a bug in the OCaml code or in some of the C libraries we use, such as libsodium).
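To make the proof-of-work stamp idea from the list above concrete, here is a minimal sketch in Python. It is not the scheme used by the Tezos node: the hash function (SHA-256), the "leading zero bits" difficulty rule, the 8-byte nonce, and the names `check_stamp` and `generate_stamp` are all assumptions chosen for illustration. The point is only that creating an identity costs many hash evaluations, while checking one costs a single hash.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # A non-zero byte has bit_length() in 1..8; the rest are leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def check_stamp(public_key: bytes, nonce: bytes, difficulty: int) -> bool:
    """Verify a stamp with one hash evaluation (hypothetical rule)."""
    digest = hashlib.sha256(public_key + nonce).digest()
    return leading_zero_bits(digest) >= difficulty


def generate_stamp(public_key: bytes, difficulty: int) -> bytes:
    """Search for a nonce by brute force; expected cost is about 2**difficulty hashes."""
    counter = 0
    while True:
        nonce = counter.to_bytes(8, "big")
        if check_stamp(public_key, nonce, difficulty):
            return nonce
        counter += 1
```

A node following this sketch would call `generate_stamp` once when it creates its identity and send the nonce along with its public key; its peers would call `check_stamp` before accepting the connection. Raising `difficulty` by one roughly doubles the cost of creating a new identity without changing the cost of verification.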
Here is some of the remaining development and testing we would like to do on the peer-to-peer layer:
- Perform additional fuzz-testing. Our nodes are programmed to be very conservative and shut down at the slightest exception. This is great for debugging, but not always good behavior for a real-life system. It's been a while since we ran our fuzz tests, and the peer-to-peer layer has evolved quite a bit since then, so we should give it another round.
- Run a bigger, messier testnet with many more machines located all around the world. It's one thing for the peer-to-peer layer to work in a small environment, but we want to make sure it scales to a much larger number of nodes and higher latencies. This is something we'll learn as we grow our testnet.
1.2 Context Database
This is the part of the node that stores data related to the state of the ledger. We use Irmin as a backend, which has several benefits. It gives us a functional, append-only view of the storage, and it encodes the state of the ledger in a hash tree, which makes it easier to build simple verification clients.
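The two properties mentioned above, a functional view of the storage and a hash tree over the ledger state, can be illustrated with a small Python sketch. This is not Irmin's API or its on-disk encoding: the nested-dictionary representation, the `leaf:` and `node:` prefixes, and the names `node_hash` and `set_path` are assumptions made for the example.

```python
import hashlib


def node_hash(tree) -> bytes:
    """Hash a tree: leaves are bytes, internal nodes are dicts of name -> subtree."""
    if isinstance(tree, bytes):
        return hashlib.sha256(b"leaf:" + tree).digest()
    h = hashlib.sha256(b"node:")
    for name in sorted(tree):  # sorted so the hash does not depend on insertion order
        encoded = name.encode()
        h.update(len(encoded).to_bytes(4, "big") + encoded + node_hash(tree[name]))
    return h.digest()


def set_path(tree: dict, path: list, value: bytes) -> dict:
    """Return a new tree with `value` stored at `path`; the input tree is not modified."""
    head, rest = path[0], path[1:]
    new_tree = dict(tree)  # shallow copy: untouched subtrees are shared, not copied
    if not rest:
        new_tree[head] = value
    else:
        child = tree.get(head, {})
        if not isinstance(child, dict):
            child = {}  # overwrite a leaf that sits where a directory is needed
        new_tree[head] = set_path(child, rest, value)
    return new_tree


# Each update yields a new version; older versions stay valid and unchanged.
v1 = set_path({}, ["contracts", "alice", "balance"], b"100")
v2 = set_path(v1, ["contracts", "bob", "balance"], b"5")
```

Because every internal hash commits to the hashes of its children, the root hash of `v2` commits to the whole state, and a verification client only needs the hashes along one path to check a single value. The sketch recomputes hashes from scratch on every call; a real backend would cache them per node.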
There isn't much to say here other than we'd like to run scalability tests to see how Irmin performs with billions of internal nodes. If it turns out that the backend is too slow to handle a tree of this size, we would look to replace it with a key-value database such as LevelDB. This doesn't mean replacing Irmin itself, but rather adding a new backend to Irmin. This change would be completely transparent to the rest of the code.
In addition to the context database, the node also stores blocks and operations. Currently those are being stored inefficiently on disk. Moving them to LevelDB or Irmin will help preserve inodes on the file system.
1.3 Network Shell
The network shell sits at the interface between the network and the protocol. It takes blocks sent to it by its peers and validates them against the current protocol.
This is, in my opinion, one of the most delicate parts of the code and one of the hardest to get right (other parts of the code are more security sensitive, but easier to get right). Proof-of-work consensus algorithms are very resilient to spam and denial-of-service attacks. If a Bitcoin node tries to trick you into validating a bad reorganization, either you can tell very quickly that it is lying (by examining its headers), or tricking you is impractically expensive for it. By design, block creation in proof-of-stake is cheap, which makes it harder to defend against denial-of-service attacks.
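The asymmetry described above can be seen in a toy Python model of proof-of-work headers. This is a simplified sketch, not Bitcoin's actual header format: the `Header` fields, the fixed integer `target`, and the function names are assumptions for illustration. Producing a header that passes the check requires a brute-force search, whereas rejecting a bad chain takes one hash per header and no access to the block contents.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    prev_hash: bytes  # digest of the previous header
    payload: bytes    # stand-in for the commitment to the block contents
    nonce: int

    def digest(self) -> bytes:
        data = self.prev_hash + self.payload + self.nonce.to_bytes(8, "big")
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def meets_target(header: Header, target: int) -> bool:
    """Cheap check: a single (double) hash and an integer comparison."""
    return int.from_bytes(header.digest(), "big") < target


def mine(prev_hash: bytes, payload: bytes, target: int) -> Header:
    """Expensive search: try nonces until the header digest falls below the target."""
    nonce = 0
    while True:
        header = Header(prev_hash, payload, nonce)
        if meets_target(header, target):
            return header
        nonce += 1


def valid_chain(headers: list, genesis_hash: bytes, target: int) -> bool:
    """Validate links and proof-of-work for a list of headers, oldest first."""
    expected_prev = genesis_hash
    for header in headers:
        if header.prev_hash != expected_prev or not meets_target(header, target):
            return False  # reject as soon as one header is wrong
        expected_prev = header.digest()
    return True
```

In a proof-of-stake setting there is no equivalent of `mine` that is expensive for the sender, so a check like `valid_chain` cannot rely on the cost of producing headers to bound how much work a peer can force a node to do.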