File-sharing

This chapter documents the GNUnet file-sharing application. The original file-sharing implementation for GNUnet was designed to provide anonymous file-sharing. However, over time, we have also added support for non-anonymous file-sharing (which can provide better performance). Anonymous and non-anonymous file-sharing are quite integrated in GNUnet and, except for routing, share most of the concepts and implementation. There are three primary file-sharing operations: publishing, searching and downloading. For each of these operations, the user specifies an anonymity level. If both the publisher and the searcher/downloader specify "no anonymity", non-anonymous file-sharing is used. If either user specifies some desired degree of anonymity, anonymous file-sharing will be used.

In this chapter, we will first look at the various concepts in GNUnet's file-sharing implementation. Then, we will discuss specifics as to how they impact users that publish, search or download files.

File-sharing: Concepts

Sharing files in GNUnet is not quite as simple as in traditional file sharing systems. For example, it is not sufficient to just place files into a specific directory to share them. In addition to anonymous routing GNUnet attempts to give users a better experience in searching for content. GNUnet uses cryptography to safely break content into smaller pieces that can be obtained from different sources without allowing participants to corrupt files. GNUnet makes it difficult for an adversary to send back bogus search results. GNUnet enables content providers to group related content and to establish a reputation. Furthermore, GNUnet allows updates to certain content to be made available. This section is supposed to introduce users to the concepts that are used to achive these goals.

Files

A file in GNUnet is just a sequence of bytes. Any file-format is allowed and the maximum file size is theoretically 264 bytes, except that it would take an impractical amount of time to share such a file. GNUnet itself never interprets the contents of shared files, except when using GNU libextractor to obtain keywords.

Keywords

Keywords are the most simple mechanism to find files on GNUnet. Keywords are case-sensitive and the search string must always match exactly the keyword used by the person providing the file. Keywords are never transmitted in plaintext. The only way for an adversary to determine the keyword that you used to search is to guess it (which then allows the adversary to produce the same search request). Since providing keywords by hand for each shared file is tedious, GNUnet uses GNU libextractor to help automate this process. Starting a keyword search on a slow machine can take a little while since the keyword search involves computing a fresh RSA key to formulate the request.

Directories

A directory in GNUnet is a list of file identifiers with meta data. The file identifiers provide sufficient information about the files to allow downloading the contents. Once a directory has been created, it cannot be changed since it is treated just like an ordinary file by the network. Small files (of a few kilobytes) can be inlined in the directory, so that a separate download becomes unnecessary.

Pseudonyms

Pseudonyms in GNUnet are essentially public-private (RSA) key pairs that allow a GNUnet user to maintain an identity (which may or may not be detached from his real-life identity). GNUnet's pseudonyms are not file-sharing specific --- and they will likely be used by many GNUnet applications where a user identity is required.

Note that a pseudonym is NOT bound to a GNUnet peer. There can be multiple pseudonyms for a single user, and users could (theoretically) share the private pseudonym keys (currently only out-of-band by knowing which files to copy around).

Namespaces

A namespace is a set of files that were signed by the same pseudonym. Files (or directories) that have been signed and placed into a namespace can be updated. Updates are identified as authentic if the same secret key was used to sign the update. Namespaces are also useful to establish a reputation, since all of the content in the namespace comes from the same entity (which does not have to be the same person).

Advertisements

Advertisements are used to notify other users about the existence of a namespace. Advertisements are propagated using the normal keyword search. When an advertisement is received (in response to a search), the namespace is added to the list of namespaces available in the namespace-search dialogs of gnunet-fs-gtk and printed by gnunet-pseudonym. Whenever a namespace is created, an appropriate advertisement can be generated. The default keyword for the advertising of namespaces is "namespace".

Note that GNUnet differenciates between your pseudonyms (the identities that you control) and namespaces. If you create a pseudonym, you will not automatically see the respective namespace. You first have to create an advertisement for the namespace and find it using keyword search --- even for your own namespaces. The gnunet-pseudonym tool is currently responsible for both managing pseudonyms and namespaces. This will likely change in the future to reduce the potential for confusion.

Anonymity level

The anonymity level determines how hard it should be for an adversary to determine the identity of the publisher or the searcher/downloader. An anonymity level of zero means that anonymity is not required. The default anonymity level of "1" means that anonymous routing is desired, but no particular amount of cover traffic is necessary. A powerful adversary might thus still be able to deduce the origin of the traffic using traffic analysis. Specifying higher anonymity levels increases the amount of cover traffic required. While this offers better privacy, it can also significantly hurt performance.

Content Priority

Depending on the peer's configuration, GNUnet peers migrate content between peers. Content in this sense are individual blocks of a file, not necessarily entire files. When peers run out of space (due to local publishing operations or due to migration of content from other peers), blocks sometimes need to be discarded. GNUnet first always discards expired blocks (typically, blocks are published with an expiration of about two years in the future; this is another option). If there is still not enough space, GNUnet discards the blocks with the lowest priority. The priority of a block is decided by its popularity (in terms of requests from peers we trust) and, in case of blocks published locally, the base-priority that was specified by the user when the block was published initially.

Replication

When peers migrate content to other systems, the replication level of a block is used to decide which blocks need to be migrated most urgently. GNUnet will always push the block with the highest replication level into the network, and then decrement the replication level by one. If all blocks reach replication level zero, the selection is simply random.

File-sharing: Publishing

The command gnunet-publish can be used to add content to the network.
The basic format of the command is

$ gnunet-publish [-n] [-k KEYWORDS]* [-m TYPE:VALUE] FILENAME 

Important command-line options

The option -k is used to specify keywords for the file that should be inserted.
You can supply any number of keywords, and each of the keywords will be sufficient to locate and retrieve the file.

The -m option is used to specify meta-data, such as descriptions. You can use -m multiple times. The TYPE passed must be from the list of meta-data types known to libextractor. You can obtain this list by running extract -L.
Use quotes around the entire meta-data argument if the value contains spaces.
The meta-data is displayed to other users when they select which files to download. The meta-data and the keywords are optional and maybe inferred using GNU libextractor.

gnunet-publish has a few additional options to handle namespaces and directories.
See the man-page for details.

Indexing vs. Inserting

By default, GNUnet indexes a file instead of making a full copy. This is much more efficient, but requries the file to stay unaltered at the location where it was when it was indexed. If you intend to move, delete or alter a file, consider using the option -n which will force GNUnet to make a copy of the file in the database.

Since it is much less efficient, this is strongly discouraged for large files. When GNUnet indexes a file (default), GNUnet does not create an additional encrypted copy of the file but just computes a summary (or index) of the file. That summary is approximately two percent of the size of the original file and is stored in GNUnet’s database. Whenever a request for a part of an indexed file reaches GNUnet, this part is encrypted on-demand and send out. This way, there is no need for an additional encrypted copy of the file to stay anywhere on the drive. This is different from other systems, such as Freenet, where each file that is put online must be in Freenet’s database in encrypted format, doubling the space requirements if the user wants to preseve a directly accessible copy in plaintext.

Thus indexing should be used for all files where the user will keep using this file (at the location given to gnunet-publish) and does not want to retrieve it back from GNUnet each time. If you want to remove a file that you have indexed from the local peer, use the tool gnunet-unindex to un-index the file.

The option -n may be used if the user fears that the file might be found on his drive (assuming the computer comes under the control of an adversary).
When used with the -n flag, the user has a much better chance of denying knowledge of the existence of the file, even if it is still (encrypted) on the drive and the adversary is able to crack the encryption (e.g. by guessing the keyword.

File-sharing: Searching

The command gnunet-search can be used to search for content on GNUnet. The format is:

$ gnunet-search [-t TIMEOUT] KEYWORD

The -t option specifies that the query should timeout after approximately TIMEOUT seconds. A value of zero is interpreted as no timeout, which is also the default. In this case, gnunet-search will never terminate (unless you press CTRL-C).

If multiple words are passed as keywords, they will all be considered optional. Prefix keywords with a "+" to make them mandatory.

Note that searching using

$ gnunet-search Das Kapital

is not the same as searching for

$ gnunet-search "Das Kapital"

as the first will match files shared under the keywords "Das" or "Kapital" whereas the second will match files shared under the keyword "Das Kapital".

Search results are printed by gnunet-search like this:

$ gnunet-download -o "COPYING" -- gnunet://fs/chk/N8...C92.17992
=> The GNU Public License <= (mimetype: text/plain)

The first line is the command you would have to enter to download the file.
The argument passed to -o is the suggested filename (you may change it to whatever you like).
The -- is followed by key for decrypting the file, the query for searching the file, a checksum (in hexadecimal) finally the size of the file in bytes.
The second line contains the description of the file; here this is "The GNU Public License" and the mime-type (see the options for gnunet-publish on how to specify these).

File-sharing: Downloading

In order to download a file, you need the three values returned by gnunet-search.
You can then use the tool gnunet-download to obtain the file:

$ gnunet-download -o FILENAME -- GNUNETURL

FILENAME specifies the name of the file where GNUnet is supposed to write the result. Existing files are overwritten. If the existing file contains blocks that are identical to the desired download, those blocks will not be downloaded again (automatic resume).

If you want to download the GPL from the previous example, you do the following:

$ gnunet-download -o "COPYING" -- gnunet://fs/chk/N8...92.17992

If you ever have to abort a download, you can continue it at any time by re-issuing gnunet-download with the same filename. In that case, GNUnet will not download blocks again that are already present.

GNUnet’s file-encoding mechanism will ensure file integrity, even if the existing file was not downloaded from GNUnet in the first place.

You may want to use the -V switch (must be added before the --) to turn on verbose reporting. In this case, gnunet-download will print the current number of bytes downloaded whenever new data was received.

File-sharing: Directories

Directories are shared just like ordinary files. If you download a directory with gnunet-download, you can use gnunet-directory to list its contents. The canonical extension for GNUnet directories when stored as files in your local file-system is ".gnd". The contents of a directory are URIs and meta data.
The URIs contain all the information required by gnunet-download to retrieve the file. The meta data typically includes the mime-type, description, a filename and other meta information, and possibly even the full original file (if it was small).

File-sharing: Namespace Management

THIS TEXT IS OUTDATED AND NEEDS TO BE REWRITTEN FOR 0.10!

The gnunet-pseudonym tool can be used to create pseudonyms and to advertise namespaces. By default, gnunet-pseudonym simply lists all locally available pseudonyms.

Creating Pseudonyms

With the -C NICK option it can also be used to create a new pseudonym.
A pseudonym is the virtual identity of the entity in control of a namespace.
Anyone can create any number of pseudonyms. Note that creating a pseudonym can take a few minutes depending on the performance of the machine used.

Deleting Pseudonyms

With the -D NICK option pseudonyms can be deleted. Once the pseudonym has been deleted it is impossible to add content to the corresponding namespace. Deleting the pseudonym does not make the namespace or any content in it unavailable.

Advertising namespaces

Each namespace is associated with meta-data that describes the namespace.
This meta data is provided by the user at the time that the namespace is advertised. Advertisements are published under keywords so that they can be found using normal keyword-searches. This way, users can learn about new namespaces without relying on out-of-band communication or directories.
A suggested keyword to use for all namespaces is simply "namespace".
When a keyword-search finds a namespace advertisement, it is automatically stored in a local list of known namespaces. Users can then associate a rank with the namespace to remember the quality of the content found in it.

Namespace names

While the namespace is uniquely identified by its ID, another way to refer to the namespace is to use the NICKNAME. The NICKNAME can be freely chosen by the creator of the namespace and hence conflicts are possible. If a GNUnet client learns about more than one namespace using the same NICKNAME, the ID is appended to the NICKNAME to get a unique identifier.

Namespace root

An item of particular interest in the namespace advertisement is the ROOT.
The ROOT is the identifier of a designated entry in the namespace. The idea is that the ROOT can be used to advertise an entry point to the content of the namespace.

File-Sharing URIs

GNUnet (currently) uses four different types of URIs for file-sharing. They all begin with "gnunet://fs/". This section describes the four different URI types in detail.

Encoding of hash values in URIs

Most URIs include some hash values. Hashes are encoded using base32hex (RFC 2938).

Content Hash Key (chk)

A chk-URI is used to (uniquely) identify a file or directory and to allow peers to download the file. Files are stored in GNUnet as a tree of encrypted blocks. The chk-URI thus contains the information to download and decrypt those blocks. A chk-URI has the format "gnunet://fs/chk/KEYHASH.QUERYHASH.SIZE". Here, "SIZE" is the size of the file (which allows a peer to determine the shape of the tree), KEYHASH is the key used to decrypt the file (also the hash of the plaintext of the top block) and QUERYHASH is the query used to request the top-level block (also the hash of the encrypted block).

Location identifiers (loc)

For non-anonymous file-sharing, loc-URIs are used to specify which peer is offering the data (in addition to specifying all of the data from a chk-URI). Location identifiers include a digital signature of the peer to affirm that the peer is truly the origin of the data. The format is "gnunet://fs/loc/KEYHASH.QUERYHASH.SIZE.PEER.SIG.EXPTIME". Here, "PEER" is the public key of the peer (in GNUnet format in base32hex), SIG is the RSA signature (in GNUnet format in base32hex) and EXPTIME specifies when the signature expires (in milliseconds after 1970).

Keyword queries (ksk)

A keyword-URI is used to specify that the desired operation is the search using a particular keyword. The format is simply "gnunet://fs/ksk/KEYWORD". Non-ASCII characters can be specified using the typical URI-encoding (using hex values) from HTTP. "+" can be used to specify multiple keywords (which are then logically "OR"-ed in the search, results matching both keywords are given a higher rank): "gnunet://fs/ksk/KEYWORD1+KEYWORD2".

Namespace content (sks)

Namespaces are sets of files that have been approved by some (usually pseudonymous) user --- typically by that user publishing all of the files together. A file can be in many namespaces. A file is in a namespace if the owner of the ego (aka the namespace's private key) signs the CHK of the file cryptographically. An SKS-URI is used to search a namespace. The result is a block containing meta data, the CHK and the namespace owner's signature. The format of a sks-URI is "gnunet://fs/sks/NAMESPACE/IDENTIFIER". Here, "NAMESPACE" is the public key for the namespace. "IDENTIFIER" is a freely chosen keyword (or password!). A commonly used identifier is "root" which by convention refers to some kind of index or other entry point into the namespace.