GNUnet
GNU’s decentralized anonymous and censorship-resistant P2P framework.

File Sharing with GNUnet

Concepts

Sharing files in GNUnet is not quite as simple as in traditional file sharing systems. For example, it is not sufficient to just place files into a specific directory to share them. In addition to anonymous routing, GNUnet attempts to give users a better experience in searching for content. GNUnet uses cryptography to safely break content into smaller pieces that can be obtained from different sources without allowing participants to corrupt files. GNUnet makes it difficult for an adversary to send back bogus search results. GNUnet enables content providers to group related content and to establish a reputation. Furthermore, GNUnet allows updates to certain content to be made available. This section introduces the concepts that are used to achieve these goals.

Files

A file in GNUnet is just a sequence of bytes. Any file format is allowed and the maximum file size is theoretically 2^64 bytes, except that it would take an impractical amount of time to share such a file. GNUnet itself never interprets the contents of shared files, except when using libextractor to obtain keywords.
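To get a feeling for why the theoretical limit is impractical, the following rough Python calculation estimates the transfer time for a file of the maximum size. The throughput of 1 GB/s is an assumption chosen purely for illustration, not a GNUnet measurement, and the function name is hypothetical.

```python
# Back-of-the-envelope arithmetic only.
MAX_FILE_SIZE = 2 ** 64              # theoretical maximum file size in bytes
ASSUMED_BYTES_PER_SECOND = 10 ** 9   # assumption: sustained 1 GB/s, far above typical rates
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_years(size_bytes, bytes_per_second):
    """Return how many years it takes to move size_bytes at the given rate."""
    return size_bytes / bytes_per_second / SECONDS_PER_YEAR


years = transfer_years(MAX_FILE_SIZE, ASSUMED_BYTES_PER_SECOND)
print(f"about {years:.0f} years at the assumed rate")
```

Even at this optimistic rate the transfer would take several centuries, so the limit is of no practical concern.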

Keywords

Keywords are the simplest mechanism to find files on GNUnet. Keywords are case-sensitive and the search string must always match exactly the keyword used by the person providing the file. Keywords are never transmitted in plaintext; for details see the ECRS paper. Since providing keywords by hand for each shared file is tedious, GNUnet uses libextractor to help automate this process. Starting a keyword search on a slow machine can take a while since the keyword search involves computing a fresh RSA key to formulate the request.

Directories

A directory in GNUnet is a list of file identifiers with meta-data. The file identifiers provide sufficient information about the files to allow downloading the contents. Once a directory has been created, it cannot be changed since it is treated just like an ordinary file by the network.

Namespaces

A namespace is a set of files that were signed by the same pseudonym. A pseudonym is essentially a public-private RSA key pair. Note that a pseudonym is NOT bound to a GNUnet peer. There can be multiple pseudonyms for a single user, and users could share pseudonym keys (out-of-band). Files (or directories) that have been signed and placed into a namespace can be updated. Updates are identified as authentic if the same secret key was used to sign the update. Namespaces are also useful to establish a reputation, since all of the content in the namespace comes from the same entity (which does not have to be the same person).

Advertisements

Advertisements are used to notify other users about the existence of a namespace. Advertisements are propagated using the normal keyword search. When an advertisement is received (in response to a search), it is NOT displayed immediately. Instead, the namespace is added to the list of namespaces available in the namespace-search dialogs of gnunet-gtk and printed by gnunet-pseudonym. Whenever a namespace is created, an appropriate advertisement can be generated. The default keyword for the advertising of namespaces is namespace.

Collections

A collection is an automatically managed namespace. The root of the namespace points to a directory with all of the files inserted by the user since the collection was initiated. The root is updated upon request to reflect files published since the last update. The construction of the directory and the update of the namespace are done automatically by GNUnet on each insertion and do not require work from the user. Collections are advertised under the keyword collection.

Example

Here is how to start a collection:

$ gnunet-pseudonym -a -C NICKNAME -D DESCRIPTION -r AUTHORNAME
$ gnunet-insert FILENAME
$ gnunet-search collection
$ gnunet-gtk

In gnunet-gtk select Advanced-Search Namespace. In the dialog, select the NICKNAME from the list of namespace identifiers. The search key identifier will be filled out automatically to point to the root of the namespace. The search should yield a directory which contains the file FILENAME. After inserting additional files, additional directories with more files will show up in the search. To stop the collection, use

$ gnunet-pseudonym -E

Note that the UI may not always be very pretty for collections since this is a new feature.

File-sharing options in gnunet.conf

This section describes the options in gnunet.conf that relate to anonymous file sharing. Most options are in the configuration file for the gnunetd daemon; the others are specifically marked as client options.

FS: QUOTA

Use this option to specify how much space GNUnet is allowed to use on the drive. This does not include indexed files. The value is specified in MB, the default is 1024. Note that whenever you change this value, GNUnet may have to reorganize the database, which can take quite some time on the next start (obviously depending on the previous size of the database).

Large amounts of storage space may also have some impact on memory use; a typical value is around 250 kb of memory per gigabyte of storage space. Note that indexing files is much cheaper than inserting them (indexing is the default; insertion can be enforced with the -n switch): indexed files cause less memory usage, use less space in the database, and the operation is faster.
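The rule of thumb above can be turned into a quick estimate. The following Python sketch assumes that one gigabyte corresponds to 1024 MB of QUOTA and that memory use scales linearly with the quota; the function name is hypothetical and the result is only a rough guide.

```python
KB_MEMORY_PER_GB_STORAGE = 250  # rule of thumb quoted in the text


def estimated_memory_kb(quota_mb):
    """Estimate memory use (in kb) for a QUOTA given in MB.

    Assumes 1 GB = 1024 MB and linear scaling, which is a simplification.
    """
    return quota_mb / 1024 * KB_MEMORY_PER_GB_STORAGE


# The default QUOTA of 1024 MB and a larger, hypothetical 10 GB quota.
for quota in (1024, 10 * 1024):
    print(quota, "MB quota ->", estimated_memory_kb(quota), "kb of memory")
```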

FS: INDEX-DIRECTORY

This option specifies the name of the directory where indexed files are either copied to or symlinked from. When a file is indexed with the option -l and if gnunetd and the inserting process run on the same machine, then a symbolic link is created from the index-directory to that file. Without the -l option or if gnunetd runs on a different machine, a copy of the file is made instead. Note that the indexing process does a lot more than just this, thus moving files over to the index directory by hand will NOT share these files.

FS: ACTIVEMIGRATION

Setting this option to YES allows gnunetd to migrate data to the local machine. Setting this option to YES is highly recommended for efficiency. It is also the default. If you set this value to YES, GNUnet will store content on your machine that you cannot decrypt. While this may protect you from liability if the judge is sane, it may not (IANAL). If you put illegal content on your machine yourself, setting this option to YES will probably increase your chances to get away with it since you can plausibly deny that you inserted the content. Note that in either case, your anonymity would have to be broken first (which may be possible depending on the size of the GNUnet network and the strength of the adversary).

FS: EXTRACTORS

This is a client option respected by gnunet-insert and gnunet-gtk. This option specifies which additional extractors gnunet-insert should use for keyword extraction. The default set of extractors from your local libextractor installation is always used. Typically, an extractor for splitting keywords at word boundaries is added here.

MODULES: sqstore

Which database type should be used for content? Valid types are "sqstore_sqlite" and "sqstore_mysql". The libraries and header files for the specified type must have been available at compile time. If the type is changed, you must stop gnunetd and run gnunet-update to convert the database.

The mysql module requires manual setup, described below. The sqlite module only requires the installation of the respective database (with header files) before running configure.

Setting up the mysql database

MySQL 4.1 or later is required since GNUnet uses prepared statements. First, here are some performance numbers comparing MySQL and SQLite:

All figures are for an upload of 615 MB.

System                                                    MySQL 4            MySQL 5            SQLite
1.6 GHz AMD 64, GNU/Linux, gcc 3.3.5, IO: 53.67 MB/sec    350s (1714 kbps)   -                  424s (1414 kbps)
3 GHz Pentium 4, Windows XP, gcc 3.4.2                    6062s (99 kbps)    1388s (433 kbps)   517s (1162 kbps)

Highlights

Pros:

Cons:

Setup Instructions

First, you must have mysql including the development files (headers) installed on the system when you configure GNUnet. Not all binary distributions contain the mysql module, so you may also have to compile GNUnet by hand. After you have mysql installed and GNUnet compiled with mysql support, do the following:

  1. In /etc/gnunetd.conf, set

    sqstore = sqstore_mysql
  2. Then access mysql as root (root of the database, not of the system):

    $ mysql -u root -p 

    and do the following. [You should replace $USER with the username that will be running the gnunetd process].

    CREATE DATABASE gnunet;
    GRANT select,insert,update,delete,create,alter,drop,create temporary tables ON gnunet.* TO $USER@localhost;
    SET PASSWORD FOR $USER@localhost=PASSWORD("$the_password_you_like");
    FLUSH PRIVILEGES;
  3. In the $HOME directory of $USER, create a ".my.cnf" file with the following lines:

    [client]
    user=$USER
    password=$the_password_you_like

    Note that the .my.cnf file is a security risk since it exposes the password. You may want to keep the file in a place where it is not easily accessed. The $HOME/.my.cnf can be a symbolic link. It is also possible not to use any password if database security is no concern. Note that $USER only has privileges to mess up GNUnet's tables, nothing else (unless you grant more, of course).

  4. Finally, you may want to briefly check whether the DB connection works. First, log in as $USER. Then use:

    # mysql -u $USER -p
    mysql> use gnunet;

    If you get the message "Database changed" it probably works. If you get "ERROR 2002: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)", it may be resolvable by "ln -s /var/run/mysqld/mysqld.sock /tmp/mysql.sock". There may be some additional trouble depending on your mysql setup. Finally, after changing the configuration, you need to run gnunet-update.

REPAIRING TABLES

It is probably healthy to check your tables for inconsistencies every now and then. If you get odd SEGVs on gnunetd startup, it might be that the mysql databases have been corrupted.

The tables can be verified or fixed in the following ways:

  1. by shutting down mysqld (mandatory!) and running

    # myisamchk -r *.MYI

    in /var/lib/mysql/gnunet/ (or wherever the tables are stored).

  2. Another repair command is mysqlcheck. The usable command may depend on your mysql build/version.

  3. by executing

    mysql> REPAIR TABLE gnXXXX

    for each table in the gnunet database (USE gnunet; SHOW TABLES;)

If you have problems related to the mysql module, your best friend is probably the mysql manual. The first thing to check is that mysql is basically operational, that you can connect to it, create tables, issue queries and so on.


Commands for File Sharing

The only useful application that is currently available for GNUnet is anonymous file-sharing. The graphical interface is gnunet-gtk. For shell gurus, the following shell commands provide the interface:

gnunet-auto-share

gnunet-auto-share can be used to "automatically" share all of the files in a given directory. The basic format of the command is

$ gnunet-auto-share DIRECTORY-NAME*

After being started like this, gnunet-auto-share will put itself into the background (daemonize) and periodically check if new files have been copied into the given directories. Working in the background, gnunet-auto-share will ensure that all files in the given directories are published to gnunetd and thus available to the network. You will need to restart gnunet-auto-share whenever your computer is rebooted (just like you need to restart gnunetd). While your distribution may contain a script to automatically restart gnunetd, writing such a script for gnunet-auto-share may be a bit tricky since gnunet-auto-share would usually run as an ordinary user and needs to access the gnunet.conf configuration of that ordinary user (so that the user can add directories to share).

It is possible to manually specify keywords for the top-level files or directories shared using gnunet-auto-share. This can be done by creating an additional configuration file (by default: ~/.gnunet/metadata.conf). Use the names of the shared files as section names, the type of the metadata as the key and the value of the metadata as the values. The gnunet-auto-share man page contains a sample metadata.conf configuration file and more detailed descriptions of the various supported options.

gnunet-insert

The command gnunet-insert can be used to add content to the network. The basic format of the command is

$ gnunet-insert [-n] [-k KEYWORDS]* [-m TYPE:VALUE] FILENAME

The option -k is used to specify keywords for the file that should be inserted. You can supply any number of keywords, and each of the keywords will be sufficient to locate and retrieve the file. The -m option is used to specify meta-data, such as descriptions. You can use -m multiple times. The TYPE passed must be from the list of meta-data types known to libextractor. You can obtain this list by running extract -l. Use quotes around the entire meta-data argument if the value contains spaces. The meta-data is displayed to other users when they select which files to download. The meta-data and the keywords are optional and may be inferred by libextractor.

By default, GNUnet indexes a file instead of copying it. This is much more efficient, but requires the file to stay unaltered at the location where it was when it was indexed. Indexed files also must be accessible for gnunetd using the same absolute path. If indexing fails, make sure that the file permissions are set appropriately. If you intend to move, delete or alter a file, consider using the option -n which will force GNUnet to make a copy of the file in the database. Since it is much less efficient, this is strongly discouraged for large files.

When GNUnet indexes a file (default), GNUnet does not create an additional encrypted copy of the file but just computes a summary (or index) of the file. That summary is approximately two percent of the size of the original file and is stored in GNUnet's database. Whenever a request for a part of an indexed file reaches GNUnet, this part is encrypted on-demand and sent out. There is no need for an additional encrypted copy of the file to stay anywhere on the drive. This is very different from other systems, such as Freenet, where each file that is put online must be in Freenet's database in encrypted format, doubling the space requirements if the user wants to preserve a directly accessible copy in plaintext.

Thus indexing should be used for all files where the user will keep using this file (at the location given to gnunet-insert) and does not want to retrieve it back from GNUnet each time.
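As a rough illustration of the space savings, the Python sketch below compares the database space needed for an indexed and an inserted file. It uses the approximate two percent figure quoted above and assumes that an inserted copy takes about as much space as the original file (any encoding overhead is ignored). The function name is hypothetical.

```python
INDEX_SUMMARY_PERCENT = 2  # approximate size of the index summary, from the text


def database_bytes(file_size, indexed=True):
    """Estimate the database space used by one shared file, in bytes.

    indexed=True  -> only the summary (about two percent) is stored.
    indexed=False -> assumption: a full copy of roughly file_size bytes is stored.
    """
    if indexed:
        return file_size * INDEX_SUMMARY_PERCENT // 100
    return file_size


size = 615 * 10 ** 6  # a 615 MB file, chosen only as an example
print("indexed: ", database_bytes(size), "bytes")
print("inserted:", database_bytes(size, indexed=False), "bytes")
```

Under these assumptions indexing needs about one fiftieth of the database space of a full insertion, which is why -n is discouraged for large files.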

The option -n may be used if the user fears that the file might be found on his drive (assuming the computer comes under the control of an adversary). When used with the -n flag, the user has a much better chance of denying knowledge of the existence of the file, even if it is still (encrypted) on the drive and the adversary is able to crack the encryption (e.g. by guessing the keyword).

gnunet-insert has a ton of additional options to handle namespaces and directories. See the man-page for details. If you want to remove a file that you have indexed from the local peer, use the tool gnunet-unindex to un-index the file.


gnunet-search

The command gnunet-search can be used to search for content on GNUnet. The format is:

$ gnunet-search [-t TIMEOUT] KEYWORD [AND KEYWORD]*

The -t option specifies that the query should time out after approximately TIMEOUT seconds. A value of zero is interpreted as no timeout. If multiple words are passed as keywords and are not separated by an AND, gnunet-search will concatenate them to one bigger keyword. Thus,

$ gnunet-search Das Kapital

and

$ gnunet-search "Das Kapital"

are identical. You can use AND to separate keywords. In that case, gnunet-search will only display results that match all the keywords. gnunet-search cannot do multiple independent queries ("OR"); you must use multiple processes for that.
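The grouping rule can be sketched in a few lines of Python. This is not the code used by gnunet-search; it is a minimal illustration that assumes consecutive words are joined with a single space, as the "Das Kapital" example suggests. The function name is hypothetical.

```python
def group_keywords(args):
    """Group command-line words into keywords, splitting only at AND."""
    keywords = []
    current = []
    for word in args:
        if word == "AND":
            # AND ends the current keyword and starts the next one.
            if current:
                keywords.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        keywords.append(" ".join(current))
    return keywords


# Both invocations from the text yield the same single keyword.
print(group_keywords(["Das", "Kapital"]))
print(group_keywords(["Das Kapital"]))
# With AND, two keywords result and a result must match both of them.
print(group_keywords(["Marx", "AND", "Das", "Kapital"]))
```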

Search results are printed by gnunet-search like this:

gnunet://ecrs/chk/9E4MDN4VULE8KJG6U1C8FKH5HA8C5CHSJTILRTTPGK8MJ6VHORERHE68JU8Q0FDTOH1DGLUJ3NLE99N0ML0N9PIBAGKG7MNPBTT6UKG.1I823C58O3LKS24LLI9KB384LH82LGF9GUQRJHACCUINSCQH36SI4NF88CMAET3T3BHI93D4S0M5CC6MVDL1K8GFKVBN69Q6T307U6O.17992:
gnunet-download -o "COPYING" gnunet://ecrs/chk/9E4MDN4VULE8KJG6U1C8FKH5HA8C5CHSJTILRTTPGK8MJ6VHORERHE68JU8Q0FDTOH1DGLUJ3NLE99N0ML0N9PIBAGKG7MNPBTT6UKG.1I823C58O3LKS24LLI9KB384LH82LGF9GUQRJHACCUINSCQH36SI4NF88CMAET3T3BHI93D4S0M5CC6MVDL1K8GFKVBN69Q6T307U6O.17992
                    filename: COPYING
                 description: The GNU Public License
                      author: RMS
            publication date: Sat Jun 25 08:29:13 2005

The second line is the command you would have to enter to download the file. The argument passed to -o is the suggested filename (you may change it to whatever you like). The filename is followed by the GNUnet URI of the file, which contains the key for decrypting the file, the query for searching the file and finally the size of the file in bytes. The remaining lines contain the meta-data of the file: the filename, the description (here "The GNU Public License"), the author and the publication date (see the options for gnunet-insert on how to change these).
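The URI in the example output has three dot-separated fields after the gnunet://ecrs/chk/ prefix. Assuming that these are the key, the query and the file size in that order, a script that post-processes gnunet-search output could split a URI as in the following Python sketch. The class and function names are hypothetical and the parser only checks the overall shape of the URI.

```python
from typing import NamedTuple

CHK_PREFIX = "gnunet://ecrs/chk/"


class ChkUri(NamedTuple):
    key: str    # assumed: key for decrypting the file
    query: str  # assumed: query for searching the file
    size: int   # size of the file in bytes


def parse_chk_uri(uri):
    """Split a gnunet://ecrs/chk/ URI into its three fields."""
    if not uri.startswith(CHK_PREFIX):
        raise ValueError("not a gnunet://ecrs/chk/ URI")
    fields = uri[len(CHK_PREFIX):].split(".")
    if len(fields) != 3:
        raise ValueError("expected KEY.QUERY.SIZE")
    key, query, size = fields
    return ChkUri(key, query, int(size))


example = (
    "gnunet://ecrs/chk/"
    "9E4MDN4VULE8KJG6U1C8FKH5HA8C5CHSJTILRTTPGK8MJ6VHORERHE68JU8Q0FDTOH1DGLUJ3NLE99N0ML0N9PIBAGKG7MNPBTT6UKG"
    ".1I823C58O3LKS24LLI9KB384LH82LGF9GUQRJHACCUINSCQH36SI4NF88CMAET3T3BHI93D4S0M5CC6MVDL1K8GFKVBN69Q6T307U6O"
    ".17992"
)
print(parse_chk_uri(example).size)
```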

gnunet-download

In order to download a file, you need the three values returned by gnunet-search. You can then use the tool gnunet-download to obtain the file:

$ gnunet-download -o FILENAME GNUNETURL

FILENAME specifies the name of the file where GNUnet is supposed to write the result. Existing files are overwritten. If you want to download the GPL from the previous example, you do the following:

$ gnunet-download -o "COPYING" gnunet://ecrs/chk/9E4MDN4VULE8KJG6U1C8FKH5HA8C5CHSJTILRTTPGK8MJ6VHORERHE68JU8Q0FDTOH1DGLUJ3NLE99N0ML0N9PIBAGKG7MNPBTT6UKG.1I823C58O3LKS24LLI9KB384LH82LGF9GUQRJHACCUINSCQH36SI4NF88CMAET3T3BHI93D4S0M5CC6MVDL1K8GFKVBN69Q6T307U6O.17992

If you ever have to abort a download, you can continue it at any time by re-issuing gnunet-download with the same filename. In that case, GNUnet will not download blocks again that are already present. GNUnet’s file-encoding mechanism will ensure file integrity, even if the existing file was not downloaded from GNUnet in the first place. You may want to use the -V switch (must be added before the --) to turn on verbose reporting. In this case, gnunet-download will print the current number of bytes downloaded whenever new data was received.


The option -c CONFIGFILE can be passed to each of the commands to override the default location of the configuration file. The option -v shows the current version number. Use -h to get a short description of the options.

gnunet-unindex

gnunet-unindex can be used to un-index files that were inserted into GNUnet (works only for files that were inserted locally and that are still present on the local drive).

gnunet-directory

Directories are shared just like ordinary files. If you download a directory with gnunet-download, you can use gnunet-directory to list its contents. The contents of a directory are File Identifiers (FIs). An FI contains all the information required by gnunet-download to retrieve the file. Additionally, FIs can contain the mime-type, description, a filename and other meta information.

In order to make it possible to assemble directories, GNUnet stores all locally known FIs in a plaintext database, the FI database.

gnunet-directory can also be used to list the contents of the FI database. The option -l causes the display of all known FI entries. The FI database can be flushed using the -k option. There is currently no way to selectively remove a specific entry.


Note that there is no command line tool to create a directory from the FI database. To create a directory from the command line, you must use gnunet-insert. The main use of the FI database is for building directories (and namespace entries) with gnunet-gtk.

gnunet-pseudonym

By default this tool lists all locally available pseudonyms. With the -C NICK option it can also be used to create a new pseudonym. A pseudonym is the virtual identity of the entity in control of a namespace. Anyone can create any number of pseudonyms. Note that creating a pseudonym can take a few minutes depending on the performance of the machine used.

With the -D NICK option pseudonyms can be deleted. Once the pseudonym has been deleted it is impossible to add content to the corresponding namespace. Deleting the pseudonym does not make the namespace or any content in it unavailable.

Advertising namespaces

Each namespace is associated with meta-data that describes the namespace. This meta-data is provided by the user at the time that the pseudonym is created. Naturally all of the information is optional and may be incorrect since it is provided by the user and cannot be verified. The meta-data is published in what is called a namespace advertisement. These advertisements are exchanged within GNUnet and can be found using normal keyword searches. This way, users can learn about new namespaces without relying on out-of-band communication or directories. When a pseudonym is created, the namespace is by default advertised under the keyword namespace. When a keyword search finds a namespace advertisement, it is automatically stored in the local GNUnet database. The advertisement is preserved for tools like gnunet-pseudonym and gnunet-gtk that can reproduce it when appropriate.

Namespace roots

An item of particular interest in the namespace advertisement is the ROOT. The ROOT is the identifier of a designated file in the namespace. The idea is that the ROOT can be used to advertise an entry point to the content of the namespace. Note that currently all of the meta-data must be provided at the time when the namespace is created and cannot be updated later.

Naming namespaces

On the network, pseudonyms are uniquely identified using the hash of the corresponding public key. As a result, links to content in namespaces can be rather long: they need to incorporate a 512-bit binary hash! In order to make using namespaces a bit more practical, NICKNAMEs are used to identify pseudonyms on each system. The NICKNAME is derived from the metadata provided when the namespace was created. Since metadata can be freely chosen by the creator of the namespace, conflicts are possible. Different users may choose to create namespaces with the same meta-data, resulting in identical base-names for different namespaces. In order to ensure that there is only one namespace corresponding to a given name, a unique number is added to the base-name. The resulting NICKNAME is (usually) short and unique for the local system.

However, other systems may have chosen different unique numbers for the same namespace. For example, the same namespace may be called "Alice-1" on Carol's system and "Alice-2" on Bob's system, simply because Bob got an advertisement for another namespace "Alice" from Dave earlier (and hence Bob called that one "Alice-1"). If Bob were to discuss namespaces with Carol, they should not use the NICKNAMEs (which may differ between systems) but instead should use the hash of the public keys. Naturally, software can and should handle the necessary conversions between systems (by translating from NICKNAME to the hash of the public key and back).
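The Alice example can be replayed with a small Python sketch. It is not GNUnet's implementation: it simply assumes that each system numbers namespaces with the same base-name in the order in which it learns about them. The class name, method names and the placeholder hash strings are hypothetical.

```python
class NicknameTable:
    """Per-system mapping between public-key hashes and local NICKNAMEs."""

    def __init__(self):
        self._nick_by_hash = {}   # public-key hash -> local NICKNAME
        self._hash_by_nick = {}   # local NICKNAME -> public-key hash
        self._last_number = {}    # base-name -> highest number handed out

    def nickname_for(self, key_hash, base_name):
        """Return the local NICKNAME for a namespace, assigning one if needed."""
        if key_hash in self._nick_by_hash:
            return self._nick_by_hash[key_hash]
        number = self._last_number.get(base_name, 0) + 1
        self._last_number[base_name] = number
        nickname = f"{base_name}-{number}"
        self._nick_by_hash[key_hash] = nickname
        self._hash_by_nick[nickname] = key_hash
        return nickname

    def hash_for(self, nickname):
        """Translate a local NICKNAME back to the hash of the public key."""
        return self._hash_by_nick[nickname]


# Placeholder hashes standing in for the 512-bit public-key hashes.
ALICE_HASH = "HASH-OF-ALICE-KEY"
OTHER_ALICE_HASH = "HASH-OF-OTHER-ALICE-KEY"

carol = NicknameTable()
bob = NicknameTable()
carol_name = carol.nickname_for(ALICE_HASH, "Alice")
bob.nickname_for(OTHER_ALICE_HASH, "Alice")  # advertisement Bob got from Dave earlier
bob_name = bob.nickname_for(ALICE_HASH, "Alice")
print(carol_name, bob_name)
```

Carol and Bob end up with different NICKNAMEs for the same namespace, but translating either NICKNAME back with hash_for yields the same hash, which is the value they should exchange.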



Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Christian Grothoff.
Verbatim copying and distribution of this entire article
is permitted in any medium, provided this notice is preserved.
