Implementing a bittorrent client with rust pt 1
How lovely torrenting works, the network system by which information can be passed amongst peers. This has enabled a widely used p2p (peer to peer) infrastructure for data sharing. This makes the deep end of the internet more interesting. I got interested in this technology while I was learning about the blockchain and the p2p system of communication. With this, I decided to implement one with rust.
This post will describe the technology, how it’s build and will serve as a documentation
for the project. The source code can be found here.
Note, this project takes re-inventing the wheels seriously since the point is to
understand the system, hence, functionalities that could easily be gotten from a library
will be rebuilt.
Structure of the project will be
Introduction
The technology of bittorrent bases its operations on the ability to allow downloaders
serve as uploaders of the same file they are downloading or have downloaded. This is called
seeding by peers in a swarm. This is obviously the distributed file sharing.
The entire file to be downloaded is broken into small chunks known as pieces each having
the same size. This can be breaking a 1mb file into 1000 1kb pieces or another chunk size.
Each piece is encrypted using a hash and can be verified from the torrent descriptor.
Bencode parser
The bencode parser works by consuming each character and translating into an enum struct I call BenStruct using the bencode specification. It simply uses delimiters to break store strings, integers, lists, dictionaries all in a linear format. Hence, bencode is a good way of storing complex data structures in a linear memory storage.
Spec summary for Bencode
-
Integers
These start with ‘i’ and end with ‘e’. The number can be negative. Examples are
i32e // decoded to 32 i0e // decodes to 0 i-56e // decodes to -56
I call them the Int variant in the BenStruct enum.
-
Bytes
These follow the format of a length followed by a colon “:” and then the bytes, eg
4:spam // decodes to "spam"
I call them the Byte variant in the BenStruct enum. Note, these are not necessarily ascii bytes hence bencode isn’t considered human-readable and should not be treated as text files.
-
Lists
Starting with a delimiter of ‘l’ and ending with ‘e’, it can contain other data types. It has no restriction on the data type of the children.
li32e4:spami-56ee // decoded to [32, "spam", -56]
-
Dictionaries or Structs
The last standard data type. This is used to represent a key pair data structure. Following the format of starting with ‘d’ and ending with ‘e’. The keys are immediately followed by the value. The keys must be byte strings and must be sorted by lexicographical order.
d3:bar4:spam3:fooi45ee // decoded to {"bar": "spam", "foo": 45}
Algorithm for parsing bencode
My guide for parsing the bencode is actually an advantage from learning more on
data structures. I’ve always known what recursions are but until recently, I had a
view on how things work deeply and how the activation record
and the stack
works.
With this, I have a really better and interesting visualization of recursion in my head.
My notes on this recursion are here.
How this works recursively is
func process_bencode:
c = consume a character
match type of c and with type using spec summary above:
type int | byte => consume int | byte
type list => {
let base_vec = some form of vector
loop while list hasn't ended (use spec)
base_vec appends process_bencode() // note this recursion allows Nested lists
return base_vec
}
type dict => {
let base_hash_map = some form of hash_map
loop while dict hasn't ended
let key = process_bencode() // recursion from here
assert key is string
let value = process_bencode() // for any form of
base_hash_map inserts (key, value)
return base_hash_map
}
This is basically the flow I’m using with parsing bencode.
Next will be going on figuring out the structuring and the http section, sweet. For this, planning to implement a concurrent solution or building on hyper. How this goes, will be discussed in their respective parts.