Developer Handbook

This book is intended to be an introduction for programmers that want to extend the GNUnet framework. GNUnet is more than a simple peer-to-peer application. For developers, GNUnet is:

  • Free software under the GNU General Public License, with a community that believes in the GNU philosophy
  • A set of standards, including coding conventions and architectural rules
  • A set of layered protocols, both specifying the communication between peers as well as the communication between components of a single peer.
  • A set of libraries with well-defined APIs suitable for writing extensions

In particular, the architecture specifies that a peer consists of many processes communicating via protocols. Processes can be written in almost any language. C and Java APIs exist for accessing existing services and for writing extensions. It is possible to write extensions in other languages by implementing the necessary IPC protocols.

GNUnet can be extended and improved along many possible dimensions, and anyone interested in free software and freedom-enhancing networking is welcome to join the effort. This developer handbook attempts to provide an initial introduction to some of the key design choices and central components of the system. This manual is far from complete, and we welcome informed contributions, be it in the form of new chapters or insightful comments.

However, the website is experiencing a constant onslaught of sophisticated link-spam entered manually by exploited workers solving puzzles and customizing text. To limit this commercial defacement, we are strictly moderating comments and have disallowed "normal" users from posting new content. However, this is really only intended to keep the spam at bay. If you are a real user or aspiring developer, please drop us a note (IRC, e-mail, contact form) with your user profile ID number included. We will then relax these restrictions on your account. We're sorry for this inconvenience; however, few people would want to read this site if 99% of it was advertisements for bogus websites.

Introduction

This developer handbook is intended as a first introduction to GNUnet for new developers who want to extend the GNUnet framework. After the introduction, each of the GNUnet subsystems (directories in the src/ tree) is (supposed to be) covered in its own chapter. In addition to this documentation, GNUnet developers should be aware of the services available to them on the GNUnet server.

New developers can have a look at the GNUnet tutorials for C and Java available in the src/ directory of the repository or under the following links:

In addition to this book, the GNUnet server contains various resources for GNUnet developers. They are all conveniently reachable via the "Developer" entry in the navigation menu. Some additional tools (such as static analysis reports) require special developer access to perform certain operations. If you feel you need access, you should contact Christian Grothoff, GNUnet's maintainer.

The public subsystems on the GNUnet server that help developers are:

  • The Version control system keeps our code and enables distributed development. Only developers with write access can commit code; everyone else is encouraged to submit patches to the developer mailing list.
  • The GNUnet bugtracking system is used to track feature requests, open bug reports and their resolutions. Anyone can report bugs, but only developers can claim to have fixed them.
  • A buildbot is used to check GNUnet builds automatically on a range of platforms. Builds are triggered automatically after 30 minutes of no changes to Git.
  • The current quality of our automated test suite is assessed using Code coverage analysis. This analysis is run daily; however, the webpage is only updated if all automated tests pass at that time. Testcases that improve our code coverage are always welcome.
  • We try to automatically find bugs using a static analysis scan. This scan is run daily; however, the webpage is only updated if all automated tests pass at that time. Note that not everything that is flagged by the analysis is a bug; sometimes even good code can be marked as possibly problematic. Nevertheless, developers are encouraged to at least be aware of all issues in their code that are listed.
  • We use Gauger for automatic performance regression visualization. Details on how to use Gauger are here.
  • We use JUnit to automatically test gnunet-java. Automatically generated, current reports on the test suite are here.
  • We use Cobertura to generate test coverage reports for gnunet-java. Current reports on test coverage are here.

Project overview

The GNUnet project currently consists of several sub-projects. This section gives an initial overview of the various sub-projects. Note that this description also lists projects that are far from complete, including even those that have literally not a single line of code in them yet.

GNUnet sub-projects in order of likely relevance are currently:

svn/gnunet
Core of the P2P framework, including file-sharing, VPN and chat applications; this is what the developer handbook covers mostly
svn/gnunet-gtk/
Gtk+-based user interfaces, including gnunet-fs-gtk (file-sharing), gnunet-statistics-gtk (statistics over time), gnunet-peerinfo-gtk (information about current connections and known peers), gnunet-chat-gtk (chat GUI) and gnunet-setup (setup tool for "everything")
svn/gnunet-fuse/
Mounting directories shared via GNUnet's file-sharing on Linux
svn/gnunet-update/
Installation and update tool
svn/gnunet-ext/
Template for starting 'external' GNUnet projects
svn/gnunet-java/
Java APIs for writing GNUnet services and applications
svn/gnunet-www/
Code and media helping drive the GNUnet website
svn/eclectic/
Code to run GNUnet nodes on testbeds for research, development, testing and evaluation
svn/gnunet-qt/
Qt-based GNUnet GUI (dead?)
svn/gnunet-cocoa/
Cocoa-based GNUnet GUI (dead?)

We are also working on various supporting libraries and tools:

svn/Extractor/
GNU libextractor (metadata extraction)
svn/libmicrohttpd/
GNU libmicrohttpd (embedded HTTP(S) server library)
svn/gauger/
Tool for performance regression analysis
svn/monkey/
Tool for automated debugging of distributed systems
svn/libmwmodem/
Library for accessing satellite connection quality reports

Finally, there are various external projects (see links for a list of those that have a public website) which build on top of the GNUnet framework.

Code overview

This section gives a brief overview of the GNUnet source code. Specifically, we sketch the function of each of the subdirectories in the gnunet/src/ directory. The order given is roughly bottom-up (in terms of the layers of the system).

util/ --- libgnunetutil
Library with general utility functions; all GNUnet binaries link against this library. It covers anything from memory allocation and data structures to cryptography and inter-process communication. The goal is to provide an OS-independent interface and more 'secure' or convenient implementations of commonly used primitives. The API is spread over more than a dozen headers; developers should study those closely to avoid duplicating existing functions.
hello/ --- libgnunethello
HELLO messages are used to describe under which addresses a peer can be reached (for example, protocol, IP, port). This library manages parsing and generating of HELLO messages.
block/ --- libgnunetblock
The DHT and other components of GNUnet store information in units called 'blocks'. Each block has a type and the type defines a particular format and how that binary format is to be linked to a hash code (the key for the DHT and for databases). The block library is a wrapper around block plugins which provide the necessary functions for each block type.
statistics/
The statistics service enables associating values (of type uint64_t) with a component name and a string. The main uses are debugging (counting events), performance tracking and user entertainment (what did my peer do today?).
arm/
The automatic-restart-manager (ARM) service is the GNUnet master service. Its role is to start gnunet-services, to restart them when they crash and finally to shut down the system when requested.
peerinfo/
The peerinfo service keeps track of which peers are known to the local peer and also tracks the validated addresses (in the form of a HELLO message) for each of those peers. The peer is not necessarily connected to all peers known to the peerinfo service. Peerinfo provides persistent storage for peer identities --- peers are not forgotten just because of a system restart.
datacache/ --- libgnunetdatacache
The datacache library provides (temporary) block storage for the DHT. Existing plugins can store blocks in Sqlite, Postgres or MySQL databases. All data stored in the cache is lost when the peer is stopped or restarted (datacache uses temporary tables).
datastore/
The datastore service stores file-sharing blocks in databases for extended periods of time. In contrast to the datacache, data is not lost when peers restart. However, quota restrictions may still cause old, expired or low-priority data to be eventually discarded. Existing plugins can store blocks in Sqlite, Postgres or MySQL databases.
template/
Template for writing a new service. Does nothing.
ats/
The automatic transport selection (ATS) service is responsible for deciding which address (i.e. which transport plugin) should be used for communication with other peers, and at what bandwidth.
nat/ --- libgnunetnat
Library that provides basic functions for NAT traversal. The library supports NAT traversal with manual hole-punching by the user, UPnP and ICMP-based autonomous NAT traversal. The library also includes an API for testing if the current configuration works and the gnunet-nat-server which provides an external service to test the local configuration.
fragmentation/ --- libgnunetfragmentation
Some transports (UDP and WLAN, mostly) have restrictions on the maximum transfer unit (MTU) for packets. The fragmentation library can be used to break larger packets into chunks of at most 1k and transmit the resulting fragments reliably (with acknowledgement, retransmission, timeouts, etc.).
transport/
The transport service is responsible for managing the basic P2P communication. It uses plugins to support P2P communication over TCP, UDP, HTTP, HTTPS and other protocols. The transport service validates peer addresses, enforces bandwidth restrictions, limits the total number of connections and enforces connectivity restrictions (i.e. friends-only).
peerinfo-tool/
This directory contains the gnunet-peerinfo binary which can be used to inspect the peers and HELLOs known to the peerinfo service.
core/
The core service is responsible for establishing encrypted, authenticated connections with other peers, encrypting and decrypting messages and forwarding messages to higher-level services that are interested in them.
testing/ --- libgnunettesting
The testing library allows starting (and stopping) peers for writing testcases. It also supports automatic generation of configurations for peers, ensuring that the ports and paths are disjoint. libgnunettesting is also the foundation for the testbed service.
testbed/
The testbed service is used for creating small or large scale deployments of GNUnet peers for evaluation of protocols. It facilitates peer deployments on multiple hosts (for example, in a cluster) and establishing various network topologies (both underlay and overlay).
nse/
The network size estimation (NSE) service implements a protocol for (securely) estimating the current size of the P2P network.
dht/
The distributed hash table (DHT) service provides a distributed implementation of a hash table to store blocks under hash keys in the P2P network.
hostlist/
The hostlist service allows learning about other peers in the network by downloading HELLO messages from an HTTP server, can be configured to run such an HTTP server and also implements a P2P protocol to advertise and automatically learn about other peers that offer a public hostlist server.
topology/
The topology service is responsible for maintaining the mesh topology. It tries to maintain connections to friends (depending on the configuration) and also tries to ensure that the peer has a decent number of active connections at all times. If necessary, new connections are added. All peers should run the topology service, otherwise they may end up not being connected to any other peer (unless some other service ensures that core establishes the required connections). The topology service also tells the transport service which connections are permitted (for friend-to-friend networking).
fs/
The file-sharing (FS) service implements GNUnet's file-sharing application. Both anonymous file-sharing (using gap) and non-anonymous file-sharing (using dht) are supported.
cadet/
The CADET service provides a general-purpose routing abstraction to create end-to-end encrypted tunnels in mesh networks. We wrote a paper documenting key aspects of the design.
tun/ --- libgnunettun
Library for building IPv4, IPv6 packets and creating checksums for UDP, TCP and ICMP packets. The header defines C structs for common Internet packet formats and in particular structs for interacting with TUN (virtual network) interfaces.
mysql/ --- libgnunetmysql
Library for creating and executing prepared MySQL statements and to manage the connection to the MySQL database. Essentially a lightweight wrapper for the interaction between GNUnet components and libmysqlclient.
dns/
Service that allows intercepting and modifying DNS requests of the local machine. Currently used for IPv4-IPv6 protocol translation (DNS-ALG) as implemented by "pt/" and for the GNUnet naming system. The service can also be configured to offer an exit service for DNS traffic.
vpn/
The virtual public network (VPN) service provides a virtual tunnel interface (VTUN) for IP routing over GNUnet. Needs some other peers to run an "exit" service to work. Can be activated using the "gnunet-vpn" tool or integrated with DNS using the "pt" daemon.
exit/
Daemon to allow traffic from the VPN to exit this peer to the Internet or to specific IP-based services of the local peer. Currently, an exit service can only be restricted to IPv4 or IPv6, not to specific ports and/or IP address ranges. If this is not acceptable, additional firewall rules must be added manually. exit currently only works for normal UDP, TCP and ICMP traffic; DNS queries need to leave the system via a DNS service.
pt/
Protocol translation daemon. This daemon enables 4-to-6, 6-to-4, 4-over-6 or 6-over-4 transitions for the local system. It essentially uses "DNS" to intercept DNS replies and then maps results to those offered by the VPN, which then sends them using mesh to some daemon offering an appropriate exit service.
identity/
Management of egos (alter egos) of a user; identities are essentially named ECC private keys and used for zones in the GNU name system and for namespaces in file-sharing, but might find other uses later.
revocation/
Key revocation service; can be used to revoke the private key of an identity if it has been compromised.
namecache/
Cache for resolution results for the GNU name system; data is encrypted and can be shared among users. Loss of the data should ideally only result in a performance degradation (persistence not required).
namestore/
Database for the GNU name system with per-user private information, persistence required
gns/
GNU name system, a GNU approach to DNS and PKI.
dv/
A plugin for distance-vector (DV)-based routing. DV consists of a service and a transport plugin to provide peers with the illusion of a direct P2P connection for connections that use multiple (typically up to 3) hops in the actual underlay network.
regex/
Service for the (distributed) evaluation of regular expressions.
scalarproduct/
The scalar product service offers an API to perform a secure multiparty computation which calculates a scalar product between two peers without exposing the private input vectors of the peers to each other.
consensus/
The consensus service will allow a set of peers to agree on a set of values via a distributed set union computation.
rest/
The REST API allows access to GNUnet services using RESTful interaction. The services provide plugins that can be exposed by the REST server.
experimentation/
The experimentation daemon coordinates distributed experimentation to evaluate transport and ATS properties.
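
The fragmentation entry above describes breaking a large packet into chunks that fit a transport's MTU. The following is a minimal sketch of just the arithmetic involved, not the libgnunetfragmentation API: the names DemoFragment, demo_fragment_count and demo_fragment_at are hypothetical, and headers, acknowledgements and retransmission are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fragment; NOT the library's wire format. */
struct DemoFragment
{
  size_t offset; /* position of the fragment within the original message */
  size_t length; /* number of payload bytes carried by this fragment */
};

/* Number of fragments needed to carry msg_size bytes with at most
   mtu payload bytes per fragment. */
static size_t
demo_fragment_count (size_t msg_size, size_t mtu)
{
  assert (0 != mtu);
  return msg_size / mtu + ((0 != msg_size % mtu) ? 1 : 0);
}

/* Describe fragment number idx (counted from zero); only the last
   fragment may be shorter than mtu. */
static struct DemoFragment
demo_fragment_at (size_t msg_size, size_t mtu, size_t idx)
{
  struct DemoFragment f;

  assert (idx < demo_fragment_count (msg_size, mtu));
  f.offset = idx * mtu;
  f.length = msg_size - f.offset;
  if (mtu < f.length)
    f.length = mtu;
  return f;
}
```

Assuming "1k" means 1024 bytes, a 2500-byte message would be split into three fragments, the last one starting at offset 2048 and carrying the remaining 452 bytes.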

System Architecture

GNUnet developers like legos. The blocks are indestructible, can be stacked together to construct complex buildings and it is generally easy to swap one block for a different one that has the same shape. GNUnet's architecture is based on legos:

service lego block

This chapter documents the GNUnet lego system, also known as GNUnet's system architecture.

The most common GNUnet component is a service. Services offer an API (or several, depending on what you count as "an API") which is implemented as a library. The library communicates with the main process of the service using a service-specific network protocol. The main process of the service typically doesn't fully provide everything that is needed -- it has holes to be filled by APIs to other services.

User interfaces and daemons are a special kind of component in GNUnet. Like services, they have holes to be filled by APIs of other services. Unlike services, daemons do not implement their own network protocol and they have no API:

daemon lego block

The GNUnet system provides a range of services, daemons and user interfaces, which are then combined into a layered GNUnet instance (also known as a peer).

service stack

Note that while it is generally possible to swap one service for another compatible service, there is often only one implementation. However, during development we often have a "new" version of a service in parallel with an "old" version. While the "new" version is not working, developers working on other parts of the service can continue their development by simply using the "old" service. Alternative design ideas can also be easily investigated by swapping out individual components. This is typically achieved by simply changing the name of the "BINARY" in the respective configuration section.

Key properties of GNUnet services are that they must be separate processes and that they must protect themselves by applying tight error checking against the network protocol they implement (thereby achieving a certain degree of robustness).

On the other hand, the APIs are implemented to tolerate failures of the service, isolating their host process from errors by the service. If the service process crashes, other services and daemons around it should not also fail, but instead wait for the service process to be restarted by ARM.


Subsystem stability

This page documents the current stability of the various GNUnet subsystems. Stability here describes the expected degree of compatibility with future versions of GNUnet. For each subsystem we distinguish between compatibility on the P2P network level (communication protocol between peers), the IPC level (communication between the service and the service library) and the API level (stability of the API). P2P compatibility is relevant in terms of which applications are likely going to be able to communicate with future versions of the network. IPC communication is relevant for the implementation of language bindings that re-implement the IPC messages. Finally, API compatibility is relevant to developers that hope to be able to avoid changes to applications built on top of the APIs of the framework.

The following table summarizes our current view of the stability of the respective protocols or APIs:

Subsystem      P2P           IPC           C API
util           n/a           n/a           stable
arm            n/a           stable        stable
ats            n/a           unstable      testing
block          n/a           n/a           stable
cadet          testing       testing       testing
consensus      experimental  experimental  experimental
core           stable        stable        stable
datacache      n/a           n/a           stable
datastore      n/a           stable        stable
dht            stable        stable        stable
dns            stable        stable        stable
dv             testing       testing       n/a
exit           testing       n/a           n/a
fragmentation  stable        n/a           stable
fs             stable        stable        stable
gns            stable        stable        stable
hello          n/a           n/a           testing
hostlist       stable        stable        n/a
identity       stable        stable        n/a
multicast      experimental  experimental  experimental
mysql          stable        n/a           stable
namestore      n/a           stable        stable
nat            n/a           n/a           stable
nse            stable        stable        stable
peerinfo       n/a           stable        stable
psyc           experimental  experimental  experimental
pt             n/a           n/a           n/a
regex          stable        stable        stable
revocation     stable        stable        stable
social         experimental  experimental  experimental
statistics     n/a           stable        stable
testbed        n/a           testing       testing
testing        n/a           n/a           testing
topology       n/a           n/a           n/a
transport      stable        stable        stable
tun            n/a           n/a           stable
vpn            testing       n/a           n/a

Here is a rough explanation of the values:

stable
no incompatible changes are planned at this time; for IPC/APIs, if there are incompatible changes, they will be minor and might only require minimal changes to existing code; for P2P, changes will be avoided if at all possible for the 0.10.x-series
testing
no incompatible changes are planned at this time, but the code is still known to be in flux; so while we have no concrete plans, our expectation is that there will still be minor modifications; for P2P, changes will likely be extensions that should not break existing code
unstable
changes are planned and will happen; however, they will not be totally radical and the result should still resemble what is there now; nevertheless, anticipated changes will break protocol/API compatibility
experimental
changes are planned and the result may look nothing like what the API/protocol looks like today
unknown
someone should think about where this subsystem is headed
n/a
this subsystem does not have an API/IPC-protocol/P2P-protocol

Naming conventions and coding style guide

Here you can find some rules to help you write code for GNUnet.

Naming conventions

include files

  • _lib: library without need for a process
  • _service: library that needs a service process
  • _plugin: plugin definition
  • _protocol: structs used in network protocol
  • exceptions:
    • gnunet_config.h --- generated
    • platform.h --- first included
    • plibc.h --- external library
    • gnunet_common.h --- fundamental routines
    • gnunet_directories.h --- generated
    • gettext.h --- external library

binaries

  • gnunet-service-xxx: service process (has listen socket)
  • gnunet-daemon-xxx: daemon process (no listen socket)
  • gnunet-helper-xxx[-yyy]: SUID helper for module xxx
  • gnunet-yyy: command-line tool for end-users
  • libgnunet_plugin_xxx_yyy.so: plugin for API xxx
  • libgnunetxxx.so: library for API xxx

logging

  • services and daemons use their directory name in GNUNET_log_setup (e.g. 'core') and log using plain 'GNUNET_log'.
  • command-line tools use their full name in GNUNET_log_setup (e.g. 'gnunet-publish') and log using plain 'GNUNET_log'.
  • service access libraries log using 'GNUNET_log_from' and use 'DIRNAME-api' for the component (e.g. 'core-api').
  • pure libraries (without associated service) use 'GNUNET_log_from' with the component set to their library name (without lib or '.so'), which should also be their directory name (e.g. 'nat').
  • plugins should use 'GNUNET_log_from' with the directory name and the plugin name combined to produce the component name (e.g. 'transport-tcp').
  • logging should be unified per-file by defining a LOG macro with the appropriate arguments, along these lines:
    #define LOG(kind,...) GNUNET_log_from (kind, "example-api",__VA_ARGS__)
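
To illustrate why a per-file LOG macro is convenient, here is a minimal, self-contained sketch. It does not use the real GNUnet logging functions: demo_log_from is a hypothetical stand-in for GNUNET_log_from that formats into a buffer, the kind argument is simplified to a string, and demo_report_loss is an invented caller.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Most recent log line, kept in memory so the sketch prints nothing. */
static char demo_last_line[256];

/* Hypothetical stand-in for GNUNET_log_from(); the kind is simplified
   to a string for this sketch. */
static void
demo_log_from (const char *kind, const char *component,
               const char *format, ...)
{
  va_list ap;
  size_t off;

  assert (NULL != component);
  /* Write the "KIND component: " prefix first (truncated if too long). */
  snprintf (demo_last_line, sizeof (demo_last_line),
            "%s %s: ", kind, component);
  off = strlen (demo_last_line);
  /* Append the formatted message in the remaining space. */
  va_start (ap, format);
  vsnprintf (demo_last_line + off, sizeof (demo_last_line) - off,
             format, ap);
  va_end (ap);
}

/* One LOG macro per file fixes the component name for all calls in it. */
#define LOG(kind, ...) demo_log_from (kind, "example-api", __VA_ARGS__)

/* Example caller: it never has to repeat the component name. */
static void
demo_report_loss (unsigned int count)
{
  LOG ("WARNING", "lost %u messages", count);
}
```

With this arrangement, renaming the component for a whole file is a one-line change to the macro definition.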

configuration

  • paths (that are substituted in all filenames) are in PATHS (have as few as possible)
  • all options for a particular module (src/MODULE) are under [MODULE]
  • options for a plugin of a module are under [MODULE-PLUGINNAME]

exported symbols

  • must start with "GNUNET_modulename_" and be defined in "modulename.c"
  • exceptions: those defined in gnunet_common.h

private (library-internal) symbols (including structs and macros)

  • must NOT start with any prefix
  • must not be exported in a way that linkers could use them or
    other libraries might see them via headers; they must be either
    declared/defined in C source files or in headers that are in
    the respective directory under src/modulename/ and NEVER be
    declared in src/include/.
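
The split between exported and private symbols can be sketched as follows. The module name "foo" and both functions are hypothetical and exist only for illustration; in a real module the prototype would sit in a header under src/include/ and the definitions in foo.c.

```c
#include <assert.h>
#include <stddef.h>

/* Exported prototype: for a hypothetical module "foo" this would be
   declared in a public header under src/include/. */
size_t
GNUNET_foo_count_set_bits (const unsigned char *buf, size_t len);

/* Private helper: no prefix, and 'static' so that the linker never
   exports it and no other library can see it. */
static unsigned int
bits_in_byte (unsigned char b)
{
  unsigned int n = 0;

  while (0 != b)
  {
    n += b & 1u;
    b >>= 1;
  }
  return n;
}

/* Exported symbol: carries the "GNUNET_foo_" prefix. */
size_t
GNUNET_foo_count_set_bits (const unsigned char *buf, size_t len)
{
  size_t total = 0;
  size_t i;

  assert ( (NULL != buf) || (0 == len) );
  for (i = 0; i < len; i++)
    total += bits_in_byte (buf[i]);
  return total;
}
```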

testcases

  • must be called "test_module-under-test_case-description.c"
  • "case-description" may be omitted if there is only one test

performance tests

  • must be called "perf_module-under-test_case-description.c"
  • "case-description" may be omitted if there is only one performance test
  • must only be run if HAVE_BENCHMARKS is satisfied

src/ directories

  • gnunet-NAME: end-user applications (e.g., gnunet-search, gnunet-arm)
  • gnunet-service-NAME: service processes with accessor library (e.g., gnunet-service-arm)
  • libgnunetNAME: accessor library (_service.h-header) or standalone library (_lib.h-header)
  • gnunet-daemon-NAME: daemon process without accessor library (e.g., gnunet-daemon-hostlist) and no GNUnet management port
  • libgnunet_plugin_DIR_NAME: loadable plugins (e.g., libgnunet_plugin_transport_tcp)

Coding style

  • GNU guidelines generally apply
  • Indentation is done with spaces, two per level, no tabs
  • C99 struct initialization is fine
  • declare only one variable per line, so
      int i;
      int j;
    

    instead of

      int i,j;
    

    This helps keep diffs small and forces developers to think precisely about the type of every variable. Note that char * is different from const char* and int is different from unsigned int or uint32_t. Each variable type should be chosen with care.

  • While goto should generally be avoided, having a goto to the end of a function to a block of clean up statements (free, close, etc.) can be acceptable.
  • Conditions should be written with constants on the left (to avoid accidental assignment) and with the 'true' target being either the 'error' case or the significantly simpler continuation. For example:
    if (0 != stat ("filename", &sbuf))
    { 
      error(); 
    } 
    else
    { 
      /* handle normal case here */
    }
    

    instead of

    if (stat ("filename", &sbuf) == 0)
    { 
      /* handle normal case here */
    } 
    else
    { 
      error(); 
    }
    

    If possible, the error clause should be terminated with a 'return' (or 'goto' to some cleanup routine) and in this case, the 'else' clause should be omitted:

    if (0 != stat ("filename", &sbuf))
    { 
      error(); 
      return;
    } 
    /* handle normal case here */
    

    This serves to avoid deep nesting. The 'constants on the left' rule applies to all constants (including GNUNET_SCHEDULER_NO_TASK, NULL, and enums). With the two above rules (constants on left, errors in 'true' branch), there is only one way to write most branches correctly.

  • Combined assignments and tests are allowed if they do not hinder code clarity. For example, one can write:
     if (NULL == (value = lookup_function()))
    { 
      error(); 
      return;
    } 
    
  • Use break and continue wherever possible to avoid deep(er) nesting. Thus, we would write:
    next = head;
    while (NULL != (pos = next))
    { 
      next = pos->next;
      if (! should_free (pos))
         continue; 
      GNUNET_CONTAINER_DLL_remove (head, tail, pos);
      GNUNET_free (pos);
    } 
    

    instead of

    next = head;
    while (NULL != (pos = next))
    { 
      next = pos->next;
      if (should_free (pos))
      {
        /* unnecessary nesting! */
        GNUNET_CONTAINER_DLL_remove (head, tail, pos);
        GNUNET_free (pos);
      }
    } 
    
  • We primarily use for and while loops. A while loop is used if the method for advancing in the loop is not a straightforward increment operation. In particular, we use:
    next = head;
    while (NULL != (pos = next))
    { 
      next = pos->next;
      if (! should_free (pos))
         continue; 
      GNUNET_CONTAINER_DLL_remove (head, tail, pos);
      GNUNET_free (pos);
    } 
    

    to free entries in a list (as the iteration changes the structure of the list due to the free; the equivalent for loop no longer follows the simple for paradigm of for(INIT;TEST;INC)). However, for loops that do follow the simple for paradigm we do use for, even if it involves linked lists:

    /* simple iteration over a linked list */
    for (pos = head; NULL != pos; pos = pos->next)
    { 
       use (pos);
    } 
    
  • The first argument to all higher-order functions in GNUnet must be declared to be of type void * and is reserved for a closure. We do not use inner functions, as trampolines would conflict with setups that use non-executable stacks.
    The first statement in a higher-order function, which unusually should be part of the variable declarations, should assign the cls argument to the precise expected type. For example:
    int
    callback (void *cls, char *args)
    { 
       struct Foo *foo = cls;
       int other_variables;
    
       /* rest of function */
    } 
    
  • It is good practice to write complex if expressions instead of using deeply nested if statements. However, except for addition and multiplication, all operators should use parens. This is fine:
    if ( (1 == foo) || 
         ((0 == bar) && (x != y)) )
      return x;
    

    However, this is not:

    if (1 == foo) 
      return x;
    if (0 == bar && x != y)
      return x;
    

    Note that splitting the if statement above is debatable as the return x is a very trivial statement. However, once the logic after the branch becomes more complicated (and is still identical), the "or" formulation should be used for sure.

  • There should be two empty lines between the end of the function and the comments describing the following function. There should be a single empty line after the initial variable declarations of a function. If a function has no local variables, there should be no initial empty line. If a long function consists of several complex steps, those steps might be separated by an empty line (possibly followed by a comment describing the following step). The code should not contain empty lines in arbitrary places; if in doubt, it is likely better to NOT have an empty line (this way, more code will fit on the screen).
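
The rule above that allows a goto to a block of clean-up statements at the end of a function can be sketched as follows. The helpers demo_lower_copy and demo_equal_ignore_case are hypothetical and use plain malloc and free rather than GNUnet's allocation functions; the point is only that every error path leaves through the same clean-up block.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated lower-case copy of s, or NULL on failure. */
static char *
demo_lower_copy (const char *s)
{
  size_t len = strlen (s);
  char *copy = malloc (len + 1);
  size_t i;

  if (NULL == copy)
    return NULL;
  /* "<=" so that the terminating NUL byte is copied as well */
  for (i = 0; i <= len; i++)
    copy[i] = (char) tolower ((unsigned char) s[i]);
  return copy;
}

/* Compare two strings ignoring case.
   Returns 1 if equal, 0 if different, -1 on allocation failure. */
static int
demo_equal_ignore_case (const char *a, const char *b)
{
  char *la = NULL;
  char *lb = NULL;
  int ret = -1;

  assert ( (NULL != a) && (NULL != b) );
  if (NULL == (la = demo_lower_copy (a)))
    goto cleanup;
  if (NULL == (lb = demo_lower_copy (b)))
    goto cleanup;
  ret = (0 == strcmp (la, lb)) ? 1 : 0;
cleanup:
  /* single exit point: free (NULL) is a no-op, so this is always safe */
  free (la);
  free (lb);
  return ret;
}
```

Without the goto, the second allocation failure would need its own copy of the free call, and each further resource would add another duplicated clean-up path.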

Build-system

If you have code that is likely not to compile, or build rules that you do not want to trigger for most developers, use "if HAVE_EXPERIMENTAL" in your Makefile.am. Then it is OK to (temporarily) add non-compiling (or known-to-not-port) code.

If you want to compile all testcases but NOT run them, run configure with the
--enable-test-suppression option.

If you want to run all testcases, including those that take a while, run configure with the
--enable-expensive-testcases option.

If you want to compile and run benchmarks, run configure with the
--enable-benchmarks option.

If you want to obtain code coverage results, run configure with the
--enable-coverage option and run the coverage.sh script in contrib/.

Developing extensions for GNUnet using the gnunet-ext template

For developers who want to write extensions for GNUnet we provide the gnunet-ext template as an easy-to-use skeleton.

gnunet-ext contains the build environment and template files for the development of GNUnet services, command line tools, APIs and tests.

First of all you have to obtain gnunet-ext from SVN:

svn co https://gnunet.org/svn/gnunet-ext

The next step is to bootstrap and configure it. For configure you have to provide the path containing GNUnet with --with-gnunet=/path/to/gnunet and the prefix where you want to install the extension using --prefix=/path/to/install:

./bootstrap
./configure --prefix=/path/to/install --with-gnunet=/path/to/gnunet

If your GNUnet installation is not included in the default linker search path, you have to either add /path/to/gnunet/lib to the file /etc/ld.so.conf and run ldconfig, or add it to the environment variable LD_LIBRARY_PATH by using

export LD_LIBRARY_PATH=/path/to/gnunet/lib

Writing testcases

Ideally, any non-trivial GNUnet code should be covered by automated testcases. Testcases should reside in the same place as the code that is being tested. The name of source files implementing tests should begin with "test_" followed by the name of the file that contains the code that is being tested.

Testcases in GNUnet should be integrated with the autotools build system. This way, developers and anyone building binary packages will be able to run all testcases simply by running make check. The final testcases shipped with the distribution should output at most some brief progress information and not display debug messages by default. The success or failure of a testcase must be indicated by returning zero (success) or non-zero (failure) from the main function of the testcase. The integration with the autotools is relatively straightforward and only requires modifications to the Makefile.am in the directory containing the testcase. For a testcase testing the code in foo.c the Makefile.am would contain the following lines:

check_PROGRAMS = test_foo
TESTS = $(check_PROGRAMS)
test_foo_SOURCES = test_foo.c
test_foo_LDADD = $(top_builddir)/src/util/libgnunetutil.la

Naturally, other libraries used by the testcase may be specified in the LDADD directive as necessary.

Often testcases depend on additional input files, such as a configuration file. These support files have to be listed using the EXTRA_DIST directive in order to ensure that they are included in the distribution. Example:

EXTRA_DIST = test_foo_data.conf

Executing make check will run all testcases in the current directory and all subdirectories. Testcases can be compiled individually by running make test_foo and then invoked directly using ./test_foo. Note that due to the use of plugins in GNUnet, it is typically necessary to run make install before running any testcases. Thus the canonical command make check install has to be changed to make install check for GNUnet.

GNUnet's TESTING library

The TESTING library is used for writing testcases which involve starting one or more peers. While peers can also be started by testcases using the ARM subsystem, the TESTING library provides a more elegant way to do this. The configurations of the peers are auto-generated from a given template to have non-conflicting port numbers, ensuring that the peers' services do not run into bind errors. This is achieved by testing each port's availability by binding a listening socket to it before allocating it to a service in the generated configurations.

Another advantage of using TESTING is that it shortens the testcase startup time, as the hostkeys for peers are copied from a pre-computed set of hostkeys instead of being generated at peer startup, which may take a considerable amount of time when starting multiple peers or on an embedded processor.

TESTING also allows certain services to be shared among peers. This feature is invaluable when testing with multiple peers, as it reduces the number of services run per peer and hence the total number of processes run per testcase.

The TESTING library only handles creating, starting and stopping peers. Features useful for testcases such as connecting peers in a topology are not available in TESTING but are available in the TESTBED subsystem. Furthermore, TESTING only creates peers on the localhost, whereas testcases using TESTBED can create peers across multiple hosts.

API

TESTING abstracts a group of peers as a TESTING system. All peers in a system share a common hostname, and no two services of these peers use the same port or UNIX domain socket path.

A TESTING system can be created with the function GNUNET_TESTING_system_create(), which returns a handle to the system. This function takes a directory path which is used for generating the configurations of peers, an IP address from which connections to the peers' services should be allowed, the hostname to be used in the peers' configuration, and an array of shared service specifications of type struct GNUNET_TESTING_SharedService.

The shared service specification must specify the name of the service to share, the configuration pertaining to that shared service and the maximum number of peers that are allowed to share a single instance of the shared service.

A TESTING system created with GNUNET_TESTING_system_create() chooses ports from the default range 12000 - 56000 while auto-generating configurations for peers. This range can be customised with the function GNUNET_TESTING_system_create_with_portrange(). This function is similar to GNUNET_TESTING_system_create() except that it takes two additional parameters: the start and end of the port range to use.
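The port selection idea used by TESTING (probe a port by binding a listening socket, and pick ports from a configured range) can be sketched with plain POSIX sockets. This is only an illustration, not the library's implementation: the helper names port_is_free() and reserve_port() are hypothetical, and the sketch probes TCP on the loopback interface only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if a TCP listening socket can be bound
   to `port` on the loopback interface, 0 otherwise.  The probe socket
   is closed again before returning. */
static int
port_is_free (uint16_t port)
{
  struct sockaddr_in addr;
  int fd;
  int ok;

  assert (0 != port);           /* port 0 means "any port", not a probe */
  fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return 0;
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = htons (port);
  ok = (0 == bind (fd, (struct sockaddr *) &addr, sizeof (addr))) &&
       (0 == listen (fd, 1));
  close (fd);
  return ok;
}

/* Hypothetical helper: scan the inclusive range [low, high] and return
   the first port that passes the probe, or 0 if there is none. */
static uint16_t
reserve_port (uint16_t low, uint16_t high)
{
  uint32_t p;                   /* wider than uint16_t to avoid wrap-around */

  for (p = low; p <= high; p++)
    if (port_is_free ((uint16_t) p))
      return (uint16_t) p;
  return 0;
}
```

Calling reserve_port (12000, 56000) mirrors the default range mentioned above. Note that a probe like this is inherently racy: another process may take the port between the probe and its actual use.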

A TESTING system is destroyed with the function GNUNET_TESTING_system_destroy(). This function takes the handle of the system and a flag to remove the files created in the directory used to generate configurations.

A peer is created with the function GNUNET_TESTING_peer_configure(). This function takes the system handle, a configuration template from which the configuration for the peer is auto-generated, and the index of the pre-computed hostkey to copy for the peer. When successful, this function returns a handle to the peer which can be used to start and stop it and to obtain the identity of the peer. If unsuccessful, a NULL pointer is returned along with an error message. This function also ensures that the generated configuration has non-conflicting ports and paths.

Peers can be started and stopped by calling the functions GNUNET_TESTING_peer_start() and GNUNET_TESTING_peer_stop() respectively. A peer can be destroyed by calling the function GNUNET_TESTING_peer_destroy(). When a peer is destroyed, the ports and paths allocated in its configuration are reclaimed for use by new peers.

Finer control over peer stop

Using GNUNET_TESTING_peer_stop() is normally fine for testcases. However, calling this function for each peer is inefficient when trying to shut down multiple peers, as this function sends the termination signal to the given peer process and then waits for it to terminate. In this case it is faster to send the termination signals to all peers first and then wait on them. This is accomplished with the function GNUNET_TESTING_peer_kill(), which sends a termination signal to the peer, and the function GNUNET_TESTING_peer_wait(), which waits on the peer.
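The "signal everyone first, then wait" pattern can be illustrated with plain POSIX process calls. This is a minimal sketch that does not use the TESTING API: the children are dummy processes standing in for peers, and the helper names spawn_sleepers() and stop_all() are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `n` children that block until a signal arrives; they stand in
   for peer processes.  Returns the number of children started. */
static size_t
spawn_sleepers (pid_t *pids, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
  {
    pids[i] = fork ();
    if (0 == pids[i])
    {
      signal (SIGTERM, SIG_DFL);  /* child: make sure SIGTERM terminates us */
      for (;;)
        pause ();
    }
    if (pids[i] < 0)
      break;                      /* fork failed, stop spawning */
  }
  return i;
}

/* Phase 1: send SIGTERM to every child.  Phase 2: reap every child.
   Returns how many children were terminated by SIGTERM. */
static size_t
stop_all (const pid_t *pids, size_t n)
{
  size_t i;
  size_t stopped = 0;
  int status;

  assert (NULL != pids);
  for (i = 0; i < n; i++)
    (void) kill (pids[i], SIGTERM);
  for (i = 0; i < n; i++)
    if ((pids[i] == waitpid (pids[i], &status, 0)) &&
        WIFSIGNALED (status) &&
        (SIGTERM == WTERMSIG (status)))
      stopped++;
  return stopped;
}
```

With this ordering all children shut down concurrently, so the total waiting time is roughly that of the slowest child rather than the sum over all children.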

Further finer control can be achieved by choosing to stop a peer asynchronously with the function GNUNET_TESTING_peer_stop_async(). This function takes a callback parameter and a closure for it in addition to the handle to the peer to stop. The callback function is called with the given closure when the peer is stopped. Using this function eliminates blocking while waiting for the peer to terminate.

An asynchronous peer stop can be cancelled by calling the function GNUNET_TESTING_peer_stop_async_cancel(). Note that calling this function does not prevent the peer from terminating if the termination signal has already been sent to it. It does, however, cancel the callback that would be called when the peer is stopped.

Helper functions

Most of the testcases can benefit from an abstraction which configures a peer and starts it. This is provided by the function GNUNET_TESTING_peer_run(). This function takes the testing directory pathname, a configuration template, a callback and its closure. This function creates a peer in the given testing directory by using the configuration template, starts the peer and calls the given callback with the given closure.

The function GNUNET_TESTING_peer_run() starts the ARM service of the peer, which in turn starts the rest of the configured services. A similar function, GNUNET_TESTING_service_run(), can be used to start just a single service of a peer. In this case, the peer's ARM service is not started; instead, only the given service is run.

Testing with multiple processes

When testing GNUnet, the splitting of the code into services and clients often complicates testing. The solution to this is to have the testcase fork gnunet-service-arm, ask it to start the required server and daemon processes and then execute appropriate client actions (to test the client APIs or the core module or both). If necessary, multiple ARM services can be forked using different ports (!) to simulate a network. However, most of the time only one ARM process is needed. Note that on exit, the testcase should shut down ARM with a TERM signal (to give it the chance to cleanly stop its child processes).

The following code illustrates spawning and killing an ARM process from a testcase:

static void
run (void *cls, char *const *args, const char *cfgfile,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
{
  struct GNUNET_OS_Process *arm_pid;

  /* fork gnunet-service-arm with the configuration file of this test */
  arm_pid = GNUNET_OS_start_process (NULL, NULL,
                                     "gnunet-service-arm",
                                     "gnunet-service-arm",
                                     "-c", cfgfile, NULL);
  /* do real test work here */
  /* ask ARM to shut down cleanly, then wait for it to exit */
  if (0 != GNUNET_OS_process_kill (arm_pid, SIGTERM))
    GNUNET_log_strerror (GNUNET_ERROR_TYPE_WARNING, "kill");
  GNUNET_assert (GNUNET_OK == GNUNET_OS_process_wait (arm_pid));
  GNUNET_OS_process_close (arm_pid);
}

/* in main(): */
GNUNET_PROGRAM_run (argc, argv, "NAME-OF-TEST", "nohelp",
                    options, &run, cls);

An alternative way that works well to test plugins is to implement a mock-version of the environment that the plugin expects and then to simply load the plugin directly.

Performance regression analysis with Gauger

To help avoid performance regressions, GNUnet uses Gauger. Gauger is a simple logging tool that allows remote hosts to send performance data to a central server, where this data can be analyzed and visualized. Gauger shows graphs of the repository revisions and the performance data recorded for each revision, so sudden performance peaks or drops can be identified and linked to a specific revision number.

In the case of GNUnet, the buildbots log the performance data obtained during the tests after each build. The data can be accessed on GNUnet's Gauger page.

The menu on the left allows you to select either the results of just one build bot (under "Hosts") or review the data from all hosts for a given test result (under "Metrics"). If the absolute values of the results differ widely, for instance between arm and amd64 machines, the option "Normalize" on a metric view can help to get an idea of the performance evolution across all hosts.

Using Gauger in GNUnet and having the performance of a module tracked over time is very easy. First, of course, the testcase must generate some consistent metric which makes sense to have logged. Highly volatile or randomness-dependent metrics are probably not ideal candidates for meaningful regression detection.

To start logging any value, just include gauger.h in your testcase code. Then, use the macro GAUGER() to make the buildbots log whatever value is of interest to you to gnunet.org's Gauger server. No setup is necessary, as most buildbots already have everything in place and new metrics are created on demand. To delete a metric, you need to contact a member of the GNUnet development team (a file will need to be removed manually from the respective directory).

The code in the test should look like this:

/* other includes */
#include <gauger.h>

int
main (int argc, char *argv[])
{
  /* run test, generate data */
  GAUGER ("YOUR_MODULE", "METRIC_NAME", (float) value, "UNIT");
  return 0; /* or non-zero if the test failed */
}

Where:

YOUR_MODULE
is a category in the gauger page and should be the name of the module or subsystem like "Core" or "DHT"
METRIC_NAME
is the name of the metric being collected and should be concise and descriptive, like "PUT operations in sqlite-datastore".
value
is the value of the metric that is logged for this run.
UNIT
is the unit in which the value is measured, for instance "kb/s" or "kb of RAM/node".

If you wish to use Gauger for your own project, you can grab a copy of the latest stable release or check out Gauger's Subversion repository.

GNUnet's TESTBED Subsystem

The TESTBED subsystem facilitates testing and measuring of multi-peer deployments on a single host or over multiple hosts.

The architecture of the testbed module is divided into the following:

  • Testbed API: An API which is used by the testing driver programs. It provides functions for creating, destroying, starting and stopping peers, etc.
  • Testbed service (controller): A service which is started through the Testbed API. This service handles operations to create, destroy, start, stop peers, connect them, modify their configurations.
  • Testbed helper: When a controller has to be started on a host, the testbed API starts the testbed helper on that host which in turn starts the controller. The testbed helper receives a configuration for the controller through its stdin and changes it to ensure the controller doesn't run into any port conflict on that host.

The testbed service (controller) is different from the other GNUnet services in that it is not started by ARM and is not supposed to be run as a daemon. It is started by the testbed API through a testbed helper. In a typical scenario involving multiple hosts, a controller is started on each host. Controllers take up the actual task of creating peers, starting and stopping them on the hosts they run on.

When running deployments on a single localhost, the testbed API starts the testbed helper directly as a child process. When running deployments on remote hosts, the testbed API starts testbed helpers on each remote host through a remote shell. By default the testbed API uses SSH as the remote shell. This can be changed by setting the environment variable GNUNET_TESTBED_RSH_CMD to the required remote shell program. This variable can also contain parameters which are to be passed to the remote shell program. For example:

export GNUNET_TESTBED_RSH_CMD="ssh -o BatchMode=yes -o NoHostAuthenticationForLocalhost=yes %h"

The above command string also allows for substitutions through placemarks which begin with a `%'. At present the following substitutions are supported:

  • %h: hostname
  • %u: username
  • %p: port

Note that a substitution placemark is replaced only when the corresponding field is available, and only once. Specifying %u@%h does not work. If you want to use username substitution for SSH, pass the argument -l before the username substitution, for example: ssh -l %u -p %p %h
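The "replace only when the field is available, and only once" rule can be sketched as follows. This is an illustration under assumptions, not the testbed API's code: the function names substitute_once() and expand_rsh_cmd() are hypothetical, and the sketch assumes that a placemark whose field is unavailable is simply left in the string unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the first occurrence of "%<key>" in `in` with `value` and
   write the result to `out`.  If `value` is NULL (field unavailable) or
   the placemark is absent, `in` is copied unchanged (an assumption).
   Returns 0 on success, -1 if `out` is too small. */
static int
substitute_once (const char *in, char key, const char *value,
                 char *out, size_t out_size)
{
  char mark[3] = { '%', key, '\0' };
  const char *pos;
  int n;

  assert (NULL != in);
  pos = (NULL == value) ? NULL : strstr (in, mark);
  if (NULL == pos)
    n = snprintf (out, out_size, "%s", in);
  else
    n = snprintf (out, out_size, "%.*s%s%s",
                  (int) (pos - in), in, value, pos + 2);
  return ((n < 0) || ((size_t) n >= out_size)) ? -1 : 0;
}

/* Expand %u, %p and %h (each at most once) in a remote shell command
   template.  Fields that are not available are passed as NULL. */
static int
expand_rsh_cmd (const char *tmpl, const char *user, const char *port,
                const char *host, char *out, size_t out_size)
{
  char a[512];
  char b[512];

  if (0 != substitute_once (tmpl, 'u', user, a, sizeof (a)))
    return -1;
  if (0 != substitute_once (a, 'p', port, b, sizeof (b)))
    return -1;
  return substitute_once (b, 'h', host, out, out_size);
}
```

For the template "ssh -l %u -p %p %h" with user "alice", port "22" and host "10.0.0.1", this sketch produces "ssh -l alice -p 22 10.0.0.1"; a second %h in the template would be left as is.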

The testbed API and the helper communicate through the helper's stdin and stdout. As the helper is started through a remote shell on remote hosts, any output messages from the remote shell interfere with the communication and result in a failure while starting the helper. For this reason, it is suggested to use flags that make the remote shell produce no output messages and to have password-less logins. For the default remote shell, SSH, the default options are "-o BatchMode=yes -o NoHostAuthenticationForLocalhost=yes". Password-less logins should be ensured by using SSH keys.

Since the testbed API executes the remote shell as a non-interactive shell, certain scripts like .bashrc and .profile may not be executed. If this is the case, the testbed API can be forced to execute an interactive shell by setting the environment variable `GNUNET_TESTBED_RSH_CMD_SUFFIX' to a shell program. An example could be:

export GNUNET_TESTBED_RSH_CMD_SUFFIX="sh -lc"

The testbed API will then execute the remote shell program as: $GNUNET_TESTBED_RSH_CMD -p $port $dest $GNUNET_TESTBED_RSH_CMD_SUFFIX gnunet-helper-testbed

On some systems, problems may arise while starting testbed helpers if GNUnet is installed into a custom location since the helper may not be found in the standard path. This can be addressed by setting the variable `HELPER_BINARY_PATH' to the path of the testbed helper. Testbed API will then use this path to start helper binaries both locally and remotely.

The testbed API can be accessed by including the "gnunet_testbed_service.h" file and linking with -lgnunettestbed.

Supported Topologies

When testing multi-peer deployments, the peers often need to be connected in some topology. This requirement is addressed by the function GNUNET_TESTBED_overlay_connect(), which connects any two given peers in the testbed.

The API also provides a helper function GNUNET_TESTBED_overlay_configure_topology() to connect a given set of peers in any of the following supported topologies:

  • GNUNET_TESTBED_TOPOLOGY_CLIQUE: All peers are connected with each other
  • GNUNET_TESTBED_TOPOLOGY_LINE: Peers are connected to form a line
  • GNUNET_TESTBED_TOPOLOGY_RING: Peers are connected to form a ring topology
  • GNUNET_TESTBED_TOPOLOGY_2D_TORUS: Peers are connected to form a 2-dimensional torus topology. The number of peers need not be a perfect square; in that case the resulting torus may not have uniform poloidal and toroidal lengths
  • GNUNET_TESTBED_TOPOLOGY_ERDOS_RENYI: Topology is generated to form a random graph. The number of links to be present should be given
  • GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD: Peers are connected to form a 2D Torus with some random links among them. The number of random links is to be given
  • GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD_RING: Peers are connected to form a ring with some random links among them. The number of random links is to be given
  • GNUNET_TESTBED_TOPOLOGY_SCALE_FREE: Connects peers in a topology where peer connectivity follows a power law: new peers are connected with high probability to well-connected peers. See Emergence of Scaling in Random Networks. Science 286, 509-512, 1999.
  • GNUNET_TESTBED_TOPOLOGY_FROM_FILE: The topology information is loaded from a file. The path to the file has to be given. See Topology file format for the format of this file.
  • GNUNET_TESTBED_TOPOLOGY_NONE: No topology
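To make the three simplest topologies from the list above concrete, the following sketch enumerates the links of a line, a ring and a clique over peers numbered 0 to n-1. It is a stand-alone illustration, not the TESTBED implementation: the type and function names are hypothetical, and the peer numbering and link direction are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch only. */
enum topology { TOPOLOGY_LINE, TOPOLOGY_RING, TOPOLOGY_CLIQUE };

struct link
{
  unsigned int a;
  unsigned int b;
};

/* Write the links of topology `t` over `n` peers into `links`, which
   must have room for `cap` entries.  Returns the number of links. */
static size_t
make_links (enum topology t, unsigned int n, struct link *links, size_t cap)
{
  size_t cnt = 0;
  unsigned int i;
  unsigned int j;

  if (n < 2)
    return 0;
  switch (t)
  {
  case TOPOLOGY_LINE:
  case TOPOLOGY_RING:
    /* chain 0-1, 1-2, ..., (n-2)-(n-1) */
    for (i = 0; i + 1 < n; i++)
    {
      assert (cnt < cap);
      links[cnt].a = i;
      links[cnt].b = i + 1;
      cnt++;
    }
    /* a ring additionally closes the chain */
    if ((TOPOLOGY_RING == t) && (n > 2))
    {
      assert (cnt < cap);
      links[cnt].a = n - 1;
      links[cnt].b = 0;
      cnt++;
    }
    break;
  case TOPOLOGY_CLIQUE:
    /* every unordered pair of peers */
    for (i = 0; i < n; i++)
      for (j = i + 1; j < n; j++)
      {
        assert (cnt < cap);
        links[cnt].a = i;
        links[cnt].b = j;
        cnt++;
      }
    break;
  }
  return cnt;
}
```

For n peers this yields n-1 links for a line, n links for a ring (for n > 2) and n(n-1)/2 links for a clique, which shows how quickly the number of connections grows for dense topologies.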

The above supported topologies can be specified respectively by setting the variable OVERLAY_TOPOLOGY to the following values in the configuration passed to Testbed API functions GNUNET_TESTBED_test_run() and GNUNET_TESTBED_run():

  • CLIQUE
  • LINE
  • RING
  • 2D_TORUS
  • RANDOM
  • SMALL_WORLD
  • SMALL_WORLD_RING
  • SCALE_FREE
  • FROM_FILE
  • NONE

Topologies RANDOM, SMALL_WORLD and SMALL_WORLD_RING require the option OVERLAY_RANDOM_LINKS to be set to the number of random links to be generated in the configuration. The option will be ignored for the rest of the topologies.

Topology SCALE_FREE requires the option SCALE_FREE_TOPOLOGY_CAP to be set to the maximum number of peers which can connect to a peer, and the option SCALE_FREE_TOPOLOGY_M to be set to the minimum number of peers a peer should be connected to.

Similarly, the topology FROM_FILE requires the option OVERLAY_TOPOLOGY_FILE to contain the path of the file containing the topology information. This option is ignored for the rest of the topologies. See Topology file format for the format of this file.

Hosts file format

The testbed API offers the function GNUNET_TESTBED_hosts_load_from_file() to load from a given file details about the hosts which testbed can use for deploying peers. This function is useful to keep the data about hosts separate instead of hard-coding them.

Another helper function from testbed API, GNUNET_TESTBED_run() also takes a hosts file name as its parameter. It uses the above function to populate the hosts data structures and start controllers to deploy peers.

These functions require the hosts file to be of the following format:

  • Each line is interpreted to have details about a host
  • Host details should include the username to use for logging into the host, the hostname of the host and the port number to use for the remote shell program. All three values should be given.
  • These details should be given in the following format: <username>@<hostname>:<port>

Note that using canonical hostnames may cause problems while resolving the IP addresses (see this bug). Hence it is advised to provide the hosts' numerical IP addresses as hostnames whenever possible.
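A parser for one line of the hosts file format described above could look like the following sketch. It is not the code used by GNUNET_TESTBED_hosts_load_from_file(); the struct host_entry type, the buffer sizes and the function name parse_host_line() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one host; buffer sizes are arbitrary. */
struct host_entry
{
  char username[64];
  char hostname[256];
  unsigned int port;
};

/* Parse a line of the form "<username>@<hostname>:<port>" (optionally
   terminated by a newline).  Returns 0 on success, -1 if the line is
   malformed or a field does not fit. */
static int
parse_host_line (const char *line, struct host_entry *h)
{
  const char *at;
  const char *colon;
  char *end;
  unsigned long port;
  size_t ulen;
  size_t hlen;

  assert ((NULL != line) && (NULL != h));
  at = strchr (line, '@');
  colon = (NULL == at) ? NULL : strrchr (at, ':');
  if ((NULL == at) || (NULL == colon))
    return -1;
  ulen = (size_t) (at - line);
  hlen = (size_t) (colon - (at + 1));
  if ((0 == ulen) || (ulen >= sizeof (h->username)) ||
      (0 == hlen) || (hlen >= sizeof (h->hostname)))
    return -1;
  port = strtoul (colon + 1, &end, 10);
  if ((end == colon + 1) ||
      (('\0' != *end) && ('\n' != *end)) ||
      (0 == port) || (port > 65535))
    return -1;
  memcpy (h->username, line, ulen);
  h->username[ulen] = '\0';
  memcpy (h->hostname, at + 1, hlen);
  h->hostname[hlen] = '\0';
  h->port = (unsigned int) port;
  return 0;
}
```

The sketch rejects lines where any of the three values is missing, in line with the requirement that all three must be given.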

Topology file format

A topology file describes how peers are to be connected. It should adhere to the following format for testbed to parse it correctly.

Each line should begin with the target peer id. This should be followed by a colon (`:') and origin peer ids separated by `|'. All spaces except for newline characters are ignored. The API will then try to connect each origin peer to the target peer.

For example, the following file will result in 5 overlay connections: [2->1], [3->1],[4->3], [0->3], [2->0]

1:2|3
3:4| 0
0: 2
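The format can be parsed with a few lines of C. The following is a minimal sketch, not testbed's own parser: struct connection and the function names are hypothetical, and the sketch assumes that empty lines are skipped while any other deviation from the format is treated as an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: connect `origin` to `target`. */
struct connection
{
  unsigned int origin;
  unsigned int target;
};

/* Skip spaces and tabs, but not newlines. */
static const char *
skip_blanks (const char *p)
{
  while ((' ' == *p) || ('\t' == *p))
    p++;
  return p;
}

/* Parse a decimal peer id at *pp, skipping blanks around it.
   Advances *pp; returns 0 on success, -1 if no digit is found. */
static int
parse_id (const char **pp, unsigned int *id)
{
  const char *p = skip_blanks (*pp);
  char *end;

  if (! isdigit ((unsigned char) *p))
    return -1;
  *id = (unsigned int) strtoul (p, &end, 10);
  *pp = skip_blanks (end);
  return 0;
}

/* Parse a whole topology description.  Returns the number of
   connections written to `out`, or -1 on a syntax error or if `out`
   (with room for `cap` entries) is too small. */
static int
parse_topology (const char *text, struct connection *out, size_t cap)
{
  const char *p = text;
  size_t cnt = 0;
  unsigned int target;
  unsigned int origin;

  assert (NULL != text);
  while ('\0' != *p)
  {
    if ('\n' == *p)
    {
      p++;                      /* skip empty lines */
      continue;
    }
    if (0 != parse_id (&p, &target))
      return -1;
    if (':' != *p)
      return -1;
    p++;
    for (;;)
    {
      if ((cnt >= cap) || (0 != parse_id (&p, &origin)))
        return -1;
      out[cnt].origin = origin;
      out[cnt].target = target;
      cnt++;
      if ('|' != *p)
        break;
      p++;
    }
    if ('\n' == *p)
      p++;
    else if ('\0' != *p)
      return -1;
  }
  return (int) cnt;
}
```

Applied to the three example lines above, this sketch yields the five connections 2->1, 3->1, 4->3, 0->3 and 2->0, in that order.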

Testbed Barriers

The testbed subsystem's barriers API facilitates coordination among the peers run by the testbed and the experiment driver. The concept is similar to the barrier synchronisation mechanism found in parallel programming or multi-threading paradigms - a peer waits at a barrier upon reaching it until the barrier is reached by a predefined number of peers. This predefined number of peers required to cross a barrier is also called quorum. We say a peer has reached a barrier if the peer is waiting for the barrier to be crossed. Similarly a barrier is said to be reached if the required quorum of peers reach the barrier. A barrier which is reached is deemed as crossed after all the peers waiting on it are notified.
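The terms used above (quorum, reached, crossed) can be illustrated with a small counter. This sketch models only the bookkeeping on a single host, with the quorum given as an absolute number of peers; it is not the barriers API, and the names struct barrier, barrier_setup() and barrier_arrive() are hypothetical.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one barrier. */
struct barrier
{
  unsigned int quorum;   /* number of peers required to cross */
  unsigned int reached;  /* number of peers currently waiting */
  int crossed;           /* 1 once the waiting peers were notified */
};

/* Initialise a barrier with the given quorum. */
static void
barrier_setup (struct barrier *b, unsigned int quorum)
{
  assert (quorum > 0);
  b->quorum = quorum;
  b->reached = 0;
  b->crossed = 0;
}

/* Register one more peer waiting at the barrier.  Returns 1 if this
   arrival attains the quorum (the barrier is then crossed and all
   waiting peers would be notified), 0 otherwise. */
static int
barrier_arrive (struct barrier *b)
{
  if (b->crossed)
    return 0;            /* already crossed, nothing left to trigger */
  b->reached++;
  if (b->reached < b->quorum)
    return 0;
  b->crossed = 1;
  return 1;
}
```

With a quorum of 3, the first two calls to barrier_arrive() return 0 and the third returns 1, at which point all three peers would be released.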

The barriers API provides the following functions:

  1. GNUNET_TESTBED_barrier_init(): function to initialise a barrier in the experiment
  2. GNUNET_TESTBED_barrier_cancel(): function to cancel a barrier which has been initialised before
  3. GNUNET_TESTBED_barrier_wait(): function to signal barrier service that the caller has reached a barrier and is waiting for it to be crossed
  4. GNUNET_TESTBED_barrier_wait_cancel(): function to stop waiting for a barrier to be crossed

Among the above functions, the first two, namely GNUNET_TESTBED_barrier_init() and GNUNET_TESTBED_barrier_cancel(), are used by experiment drivers. All barriers should be initialised by the experiment driver by calling GNUNET_TESTBED_barrier_init(). This function takes a name to identify the barrier, the quorum required for the barrier to be crossed and a notification callback for notifying the experiment driver when the barrier is crossed. GNUNET_TESTBED_barrier_cancel() cancels an initialised barrier and frees the resources allocated for it. This function can be called on an initialised barrier before it is crossed.

The remaining two functions GNUNET_TESTBED_barrier_wait() and GNUNET_TESTBED_barrier_wait_cancel() are used in the peer's processes. GNUNET_TESTBED_barrier_wait() connects to the local barrier service running on the same host the peer is running on and registers that the caller has reached the barrier and is waiting for the barrier to be crossed. Note that this function can only be used by peers which are started by testbed as this function tries to access the local barrier service which is part of the testbed controller service. Calling GNUNET_TESTBED_barrier_wait() on an uninitialised barrier results in failure. GNUNET_TESTBED_barrier_wait_cancel() cancels the notification registered by GNUNET_TESTBED_barrier_wait().

Implementation

Since barriers involve coordination between experiment driver and peers, the barrier service in the testbed controller is split into two components. The first component responds to the message generated by the barrier API used by the experiment driver (functions GNUNET_TESTBED_barrier_init() and GNUNET_TESTBED_barrier_cancel()) and the second component to the messages generated by barrier API used by peers (functions GNUNET_TESTBED_barrier_wait() and GNUNET_TESTBED_barrier_wait_cancel()).

Calling GNUNET_TESTBED_barrier_init() sends a GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_INIT message to the master controller. The master controller then registers a barrier and calls GNUNET_TESTBED_barrier_init() for each of its subcontrollers. In this way barrier initialisation is propagated through the controller hierarchy. While propagating initialisation, any errors at a subcontroller, such as a timeout during further propagation, are reported up the hierarchy back to the experiment driver.

Similar to GNUNET_TESTBED_barrier_init(), GNUNET_TESTBED_barrier_cancel() propagates a GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_CANCEL message which causes controllers to remove an initialised barrier.

The second component is implemented as a separate service in the binary `gnunet-service-testbed', which already contains the testbed controller service. Although this deviates from the GNUnet process architecture of having one service per binary, it is needed in this case as this component needs access to the barrier data created by the first component. This component responds to GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT messages from local peers when they call GNUNET_TESTBED_barrier_wait(). Upon receiving a GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT message, the service checks whether the requested barrier has been initialised. If it has not, an error status is sent through a GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS message to the local peer and the connection from the peer is terminated. If the barrier has been initialised, the barrier's counter for reached peers is incremented and a notification is registered to notify the peer when the barrier is reached. The connection from the peer is left open.

When enough peers required to attain the quorum send GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT messages, the controller sends a GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS message to its parent informing that the barrier is crossed. If the controller has started further subcontrollers, it delays this message until it receives a similar notification from each of those subcontrollers. Finally, the barriers API at the experiment driver receives the GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS when the barrier is reached at all the controllers.
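The upward propagation described above, where a controller reports only once its own quorum is attained and all of its subcontrollers have reported, can be modelled as a check over a tree. This is a simplified, synchronous sketch and not the message-based implementation in the testbed service; struct controller, MAX_CHILDREN and controller_ready() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical model of a controller in the hierarchy. */
struct controller
{
  unsigned int local_quorum;   /* peers required at this controller */
  unsigned int local_reached;  /* peers waiting at this controller */
  struct controller *children[MAX_CHILDREN];
  size_t num_children;
};

/* Returns 1 if this controller may report to its parent that the
   barrier is reached: its local quorum is attained and every
   subcontroller below it is ready as well.  Returns 0 otherwise. */
static int
controller_ready (const struct controller *c)
{
  size_t i;

  assert (c->num_children <= MAX_CHILDREN);
  if (c->local_reached < c->local_quorum)
    return 0;
  for (i = 0; i < c->num_children; i++)
    if (! controller_ready (c->children[i]))
      return 0;
  return 1;
}
```

In this model the experiment driver would be notified only when controller_ready() holds for the master controller, that is, when the barrier is reached at every controller in the hierarchy.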

The barriers API at the experiment driver responds to the GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS message by echoing it back to the master controller and notifying the experiment driver through the notification callback that a barrier has been crossed. The echoed GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS message is propagated by the master controller through the controller hierarchy. This propagation triggers the notifications registered by peers at each of the controllers in the hierarchy. Note the difference between this downward propagation of the GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS message and its upward propagation: the upward propagation is needed for ensuring that the barrier is reached by all the controllers, and the downward propagation is for signalling that the barrier is crossed.

Automatic large-scale deployment of GNUnet in the PlanetLab testbed

PlanetLab is a testbed for computer networking and distributed systems research. It was established in 2002 and as of June 2010 was composed of 1090 nodes at 507 sites worldwide.

To automate GNUnet deployment, we created a set of tools that simplify large-scale deployment. We provide a set of scripts you can use to deploy GNUnet on a set of nodes and manage your installation.

Please also check https://gnunet.org/installation-fedora8-svn and https://gnunet.org/installation-fedora12-svn for detailed instructions on how to install GNUnet on a PlanetLab node.

PlanetLab Automation for Fedora8 nodes

Install buildslave on PlanetLab nodes running fedora core 8

Since most of the PlanetLab nodes are running the very old Fedora Core 8 image, installing the buildslave software is quite painful. For our PlanetLab testbed we figured out how best to install the buildslave software.

Install Distribute for python:

curl http://python-distribute.org/distribute_setup.py | sudo python

Install zope.interface <= 3.8.0 (4.0 and 4.0.1 will not work):

wget http://pypi.python.org/packages/source/z/zope.interface/zope.interface-3.8.0.tar.gz
tar xvfz zope.interface-3.8.0.tar.gz
cd zope.interface-3.8.0
sudo python setup.py install

Install the buildslave software (0.8.6 was the latest version):

wget http://buildbot.googlecode.com/files/buildbot-slave-0.8.6p1.tar.gz
tar xvfz buildbot-slave-0.8.6p1.tar.gz
cd buildbot-slave-0.8.6p1
sudo python setup.py install

The setup will download the matching twisted package and install it.
It will also try to install the latest version of zope.interface which will fail to install. Buildslave will work anyway since version 3.8.0 was installed before!

Setup a new PlanetLab testbed using GPLMT

  • Get a new slice and assign nodes
    Ask your PlanetLab PI to give you a new slice and assign the nodes you need

  • Install a buildmaster
    You can stick to the buildbot documentation:
    http://buildbot.net/buildbot/docs/current/manual/installation.html

  • Install the buildslave software on all nodes
    To install the buildslave on all nodes assigned to your slice you can use the tasklist install_buildslave_fc8.xml provided with GPLMT:

    ./gplmt.py -c contrib/tumple_gnunet.conf -t contrib/tasklists/install_buildslave_fc8.xml -a -p <planetlab password>

  • Create the buildmaster configuration and the slave setup commands
    The master and the slaves need to have credentials, and the master has to have all nodes configured. This can be done with the create_buildbot_configuration.py script in the scripts directory.

    This script takes a list of nodes retrieved directly from PlanetLab or read from a file and a configuration template and creates:
    - a tasklist which can be executed with gplmt to set up the slaves
    - a master.cfg file containing the PlanetLab nodes

    A configuration template is included in the <contrib> directory. Most importantly, the script replaces the following tags in the template:

    %GPLMT_BUILDER_DEFINITION
    %GPLMT_BUILDER_SUMMARY
    %GPLMT_SLAVES
    %GPLMT_SCHEDULER_BUILDERS

    Create configuration for all nodes assigned to a slice:

    ./create_buildbot_configuration.py -u <planetlab username> -p <planetlab password> -s <slice> -m <buildmaster+port> -t <template>

    Create configuration for some nodes in a file:

    ./create_buildbot_configuration.py -f <node_file> -m <buildmaster+port> -t <template>

  • Copy the master.cfg to the buildmaster and start it
    Use buildbot start <basedir> to start the server

  • Set up the buildslaves

Why do I get an ssh error when using the regex profiler?

Why do I get an ssh error "Permission denied (publickey,password)." when using the regex profiler, although passwordless ssh to localhost works using publickey and ssh-agent?

You have to generate a public/private key pair with no password:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_localhost

and then add the following to your ~/.ssh/config:

Host 127.0.0.1
IdentityFile ~/.ssh/id_localhost

Now make sure your hosts file looks like this:
[USERNAME]@127.0.0.1:22
[USERNAME]@127.0.0.1:22

You can test your setup by running `ssh 127.0.0.1` in a terminal and then running it again in the opened session. If you were not asked for a password on either login, then you should be good to go.

TESTBED Caveats

This section documents a few caveats when using the GNUnet testbed subsystem.

CORE must be started

A simple issue is #3993: Your configuration MUST somehow ensure that for each peer the CORE service is started when the peer is set up, otherwise TESTBED may fail to connect peers when the topology is initialized, as TESTBED will start some CORE services but not necessarily all (but it relies on all of them running). The easiest way is to set 'FORCESTART = YES' in the '[core]' section of the configuration file. Alternatively, having any service that directly or indirectly depends on CORE being started with FORCESTART will also do. This issue largely arises if users try to over-optimize by not starting any services with FORCESTART.

ATS must want the connections

When TESTBED sets up connections, it only offers the respective HELLO information to the TRANSPORT service. It is then up to the ATS service to decide to use the connection. The ATS service will typically eagerly establish any connection if the number of total connections is low (relative to bandwidth). Details may further depend on the specific ATS backend that was configured. If ATS decides to NOT establish a connection (even though TESTBED provided the required information), then that connection will count as failed for TESTBED. Note that you can configure TESTBED to tolerate a certain number of connection failures (see '-e' option of gnunet-testbed-profiler). This issue largely arises for dense overlay topologies, especially if you try to create cliques with more than 20 peers.

libgnunetutil

libgnunetutil is the fundamental library that all GNUnet code builds upon. Ideally, this library should contain most of the platform dependent code (except for user interfaces and really special needs that only few applications have). It is also supposed to offer basic services that most if not all GNUnet binaries require. The code of libgnunetutil is in the src/util/ directory. The public interface to the library is in the gnunet_util.h header. The functions provided by libgnunetutil fall roughly into the following categories (in roughly the order of importance for new developers):

  • logging (common_logging.c)
  • memory allocation (common_allocation.c)
  • endianness conversion (common_endian.c)
  • internationalization (common_gettext.c)
  • string manipulation (string.c)
  • file access (disk.c)
  • buffered disk IO (bio.c)
  • time manipulation (time.c)
  • configuration parsing (configuration.c)
  • command-line handling (getopt*.c)
  • cryptography (crypto_*.c)
  • data structures (container_*.c)
  • CPS-style scheduling (scheduler.c)
  • program initialization (program.c)
  • networking (network.c, client.c, server*.c, service.c)
  • message queueing (mq.c)
  • bandwidth calculations (bandwidth.c)
  • other OS-related functions (os*.c, plugin.c, signal.c)
  • pseudonym management (pseudonym.c)

It should be noted that only developers that fully understand this entire API will be able to write good GNUnet code.

Ideally, porting GNUnet should only require porting the gnunetutil library. More testcases for the gnunetutil APIs are therefore a great way to make porting of GNUnet easier.

Logging

GNUnet is able to log its activity, mostly for the purposes of debugging the program at various levels.

gnunet_common.h defines several log levels:

ERROR
for errors (really problematic situations, often leading to crashes)
WARNING
for warnings (troubling situations that might have negative consequences, although not fatal)
INFO
for various information. Used somewhat rarely, as GNUnet statistics is used to hold and display most of the information that users might find interesting.
DEBUG
for debugging. Does not produce much output on normal builds, but when extra logging is enabled at compile time, a staggering amount of data is outputted under this log level.

Normal builds of GNUnet (configured with --enable-logging[=yes]) are supposed to log nothing under DEBUG level. The --enable-logging=verbose configure option can be used to create a build with all logging enabled. However, such a build will produce large amounts of log data, which is inconvenient when one tries to hunt down a specific problem.

To mitigate this problem, GNUnet provides facilities to apply a filter to reduce the logs:

Logging by default
When no log levels are configured in any other way (see below), GNUnet will default to the WARNING log level. This mostly applies to GNUnet command line utilities, services and daemons; tests will always set log level to WARNING or, if --enable-logging=verbose was passed to configure, to DEBUG. The default level is suggested for normal operation.
The -L option
Most GNUnet executables accept an "-L loglevel" or "--log=loglevel" option. If used, it makes the process set a global log level to "loglevel". Thus it is possible to run some processes with -L DEBUG, for example, and others with -L ERROR to enable specific settings to diagnose problems with a particular process.
Configuration files.
Because GNUnet service and daemon processes are usually launched by gnunet-arm, it is not possible to pass different custom command line options directly to every one of them. The options passed to gnunet-arm only affect gnunet-arm and not the rest of GNUnet. However, one can specify a configuration key "OPTIONS" in the section that corresponds to a service or a daemon, and put a value of "-L loglevel" there. This will make the respective service or daemon set its log level to "loglevel" (as the value of OPTIONS will be passed as a command-line argument).

To specify the same log level for all services without creating separate "OPTIONS" entries in the configuration for each one, the user can specify a config key "GLOBAL_POSTFIX" in the [arm] section of the configuration file. The value of GLOBAL_POSTFIX will be appended to all command lines used by the ARM service to run other services. It can contain any option valid for all GNUnet commands, thus in particular the "-L loglevel" option. The ARM service itself is, however, unaffected by GLOBAL_POSTFIX; to set log level for it, one has to specify "OPTIONS" key in the [arm] section.

Environment variables.
Setting global per-process log levels with "-L loglevel" does not offer sufficient log filtering granularity, as one service will call interface libraries and supporting libraries of other GNUnet services, potentially producing lots of debug log messages from these libraries. Also, changing the config file is not always convenient (especially when running the GNUnet test suite).
To fix that, and to allow GNUnet to use different log filtering at runtime without re-compiling the whole source tree, the log calls were changed to be configurable at run time. To configure them one has to define environment variables "GNUNET_FORCE_LOGFILE", "GNUNET_LOG" and/or "GNUNET_FORCE_LOG":
  • "GNUNET_LOG" only affects the logging when no global log level is configured by any other means (that is, the process does not explicitly set its own log level, there are no "-L loglevel" options on command line or in configuration files), and can be used to override the default WARNING log level.
  • "GNUNET_FORCE_LOG" will completely override any other log configuration options given.
  • "GNUNET_FORCE_LOGFILE" will completely override the location of the file to log messages to. It should contain a relative or absolute file name. Setting GNUNET_FORCE_LOGFILE is equivalent to passing "--log-file=logfile" or "-l logfile" option (see below). It supports "[]" format in file names, but not "{}" (see below).

Because environment variables are inherited by child processes when they are launched, starting or re-starting the ARM service with these variables will propagate them to all other services.

"GNUNET_LOG" and "GNUNET_FORCE_LOG" variables must contain a specially formatted logging definition string, which looks like this:

[component];[file];[function];[from_line[-to_line]];loglevel[/component...]

That is, a logging definition consists of definition entries, separated by slashes ('/'). If only one entry is present, there is no need to add a slash to its end (although it is not forbidden either).
All definition fields (component, file, function, lines and loglevel) are mandatory, but (except for the loglevel) they can be empty. An empty field means "match anything". Note that even if fields are empty, the semicolon (';') separators must be present.
The loglevel field is mandatory, and must contain one of the log level names (ERROR, WARNING, INFO or DEBUG).
The lines field might contain one non-negative number, in which case it matches only one line, or a range "from_line-to_line", in which case it matches any line in the interval [from_line;to_line] (that is, including both start and end line).
GNUnet mostly defaults component name to the name of the service that is implemented in a process ('transport', 'core', 'peerinfo', etc), but logging calls can specify custom component names using GNUNET_log_from.
File name and function name are provided by the compiler (__FILE__ and __FUNCTION__ built-ins).

Component, file and function fields are interpreted as non-extended regular expressions (GNU libc regex functions are used). Matching is case-sensitive, ^ and $ will match the beginning and the end of the text. If a field is empty, its contents are automatically replaced with a ".*" regular expression, which matches anything. Matching is done in the default way, which means that the expression matches as long as it's contained anywhere in the string. Thus "GNUNET_" will match both "GNUNET_foo" and "BAR_GNUNET_BAZ". Use '^' and/or '$' to make sure that the expression matches at the start and/or at the end of the string.
The semicolon (';') can't be escaped, and GNUnet will not use it in component names (it can't be used in function names and file names anyway).

Every logging call in GNUnet code will be (at run time) matched against the log definitions passed to the process. If the fields of a log definition match the call arguments, then the call log level is compared to the log level of that definition. If the call log level is less than or equal to the definition log level, the call is allowed to proceed. Otherwise the logging call is forbidden, and nothing is logged. If no definitions matched at all, GNUnet will use the global log level or (if a global log level is not specified) will default to WARNING (that is, it will allow the call to proceed if its level is less than or equal to the global log level or to WARNING).

That is, definitions are evaluated from left to right, and the first matching definition is used to allow or deny the logging call. Thus it is advised to place narrow definitions at the beginning of the logdef string, and generic definitions at the end.
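The first-match rule described above can be sketched in standalone C. This is not GNUnet's implementation: the names struct log_def, field_matches and call_allowed, as well as the numeric level values, are hypothetical and chosen only for illustration. The sketch assumes POSIX regcomp/regexec (used without REG_EXTENDED, i.e. as non-extended expressions) and leaves out the caching of results.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical ordering: a lower value means a more severe level. */
enum level { LVL_ERROR = 1, LVL_WARNING = 2, LVL_INFO = 3, LVL_DEBUG = 4 };

/* One parsed definition entry; an empty pattern means "match anything". */
struct log_def
{
  const char *component;
  const char *file;
  const char *function;
  int from_line; /* negative: match any line */
  int to_line;   /* inclusive upper bound, ignored if from_line < 0 */
  enum level level;
};

/* Unanchored, case-sensitive match with a non-extended regular expression. */
static int
field_matches (const char *pattern, const char *text)
{
  regex_t rx;
  int ok;

  if ('\0' == pattern[0])
    return 1;
  if (0 != regcomp (&rx, pattern, REG_NOSUB))
    return 0; /* a broken pattern simply never matches in this sketch */
  ok = (0 == regexec (&rx, text, 0, NULL, 0));
  regfree (&rx);
  return ok;
}

/* Returns 1 if the logging call may proceed, 0 if it is forbidden. */
static int
call_allowed (const struct log_def *defs, size_t n, enum level global,
              const char *comp, const char *file, const char *func,
              int line, enum level call_level)
{
  assert ((NULL != comp) && (NULL != file) && (NULL != func));
  for (size_t i = 0; i < n; i++)
  {
    const struct log_def *d = &defs[i];

    if (! field_matches (d->component, comp))
      continue;
    if (! field_matches (d->file, file))
      continue;
    if (! field_matches (d->function, func))
      continue;
    if ((d->from_line >= 0) &&
        ((line < d->from_line) || (line > d->to_line)))
      continue;
    /* Definitions are tried left to right; the first match decides. */
    return call_level <= d->level;
  }
  /* No definition matched: fall back to the global (or default) level. */
  return call_level <= global;
}
```

With the two entries of "transport.*;;.*send.*;;DEBUG/;;;;WARNING", a DEBUG call from a "send" function in the transport component hits the first entry and is allowed, while a DEBUG call from the core component falls through to the second entry and is dropped.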

Whether a call is allowed or not is only decided the first time this particular call is made. The evaluation result is then cached, so that any attempts to make the same call later will be allowed or disallowed right away. Because of that, runtime log level evaluation should not significantly affect the process performance.
Log definition parsing is only done once, at the first call to GNUNET_log_setup () made by the process (which is usually done soon after it starts).

At the moment of writing there is no way to specify logging definitions from configuration files, only via environment variables.

At the moment GNUnet will stop processing a log definition when it encounters an error in definition formatting or an error in regular expression syntax, and will not report the failure in any way.

Examples

GNUNET_FORCE_LOG=";;;;DEBUG" gnunet-arm -s
Start GNUnet process tree, running all processes with DEBUG level (one should be careful with it, as log files will grow at alarming rate!)
GNUNET_FORCE_LOG="core;;;;DEBUG" gnunet-arm -s
Start GNUnet process tree, running the core service under DEBUG level (everything else will use configured or default level).
GNUNET_FORCE_LOG=";gnunet-service-transport_validation.c;;;DEBUG" gnunet-arm -s
Start GNUnet process tree, allowing any logging calls from gnunet-service-transport_validation.c (everything else will use configured or default level).
GNUNET_FORCE_LOG="fs;gnunet-service-fs_push.c;;;DEBUG" gnunet-arm -s
Start GNUnet process tree, allowing any logging calls from gnunet-service-fs_push.c (everything else will use configured or default level).
GNUNET_FORCE_LOG=";;GNUNET_NETWORK_socket_select;;DEBUG" gnunet-arm -s
Start GNUnet process tree, allowing any logging calls from the GNUNET_NETWORK_socket_select function (everything else will use configured or default level).
GNUNET_FORCE_LOG="transport.*;;.*send.*;;DEBUG/;;;;WARNING" gnunet-arm -s
Start GNUnet process tree, allowing any logging calls from the components that have "transport" in their names, and are made from functions that have "send" in their names. Everything else will be allowed to be logged only if it has WARNING level.

On Windows, one can use batch files to run GNUnet processes with special environment variables, without affecting the whole system. Such a batch file will look like this:

set GNUNET_FORCE_LOG=;;do_transmit;;DEBUG
gnunet-arm -s

(Note the absence of double quotes in the environment variable definition, as opposed to earlier examples, which use the shell.)
Another limitation: on Windows, GNUNET_FORCE_LOGFILE MUST be set in order for GNUNET_FORCE_LOG to work.

Log files

GNUnet can be told to log everything into a file instead of stderr (which is the default) using the "--log-file=logfile" or "-l logfile" option. This option can be passed via the command line, or from the "OPTIONS" and "GLOBAL_POSTFIX" configuration keys (see above). The file name passed with this option is subject to GNUnet filename expansion. If specified in "GLOBAL_POSTFIX", it is also subject to ARM service filename expansion; in particular, it may contain the "{}" (left and right curly brace) sequence, which will be replaced by ARM with the name of the service. This is used to keep logs from more than one service separate, while only specifying one template containing "{}" in GLOBAL_POSTFIX.

As part of a secondary file name expansion, the first occurrence of the "[]" sequence ("left square brace" followed by "right square brace") in the file name will be replaced with the process identifier of the process when it initializes its logging subsystem. As a result, all processes will log into different files. This is convenient for isolating messages of a particular process, and prevents I/O races when multiple processes try to write into the file at the same time. This expansion is done independently of the "{}" expansion that the ARM service does (see above).

The log file name that is specified via "-l" can contain format characters from the 'strftime' function family. For example, "%Y" will be replaced with the current year. Using "basename-%Y-%m-%d.log" would include the current year, month and day in the log file name. If a GNUnet process runs for long enough to need more than one log file, it will eventually clean up old log files. Currently, only the last three log files (plus the current log file) are preserved. So once the fifth log file goes into use (that is, after 4 days if you use "%Y-%m-%d" as above), the first log file will be automatically deleted. Note that if your log file name only contains "%Y", then log files would be kept for 4 years and the logs from the first year would be deleted once year 5 begins. If you do not use any date-related string format codes, logs would never be automatically deleted by GNUnet.
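The two expansions that a process applies to its own log file name can be sketched in standalone C. The function name expand_log_name is hypothetical, and the order used here ("[]" first, then strftime) is an assumption of the sketch, not something the text above specifies. The "{}" expansion is not shown, as it is performed by ARM before the process is started.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Replace the first "[]" in tmpl with pid, then expand strftime codes.
 * Returns 0 on success, -1 if a buffer is too small. */
static int
expand_log_name (const char *tmpl, long pid, const struct tm *when,
                 char *out, size_t out_size)
{
  char with_pid[512];
  const char *mark;

  assert ((NULL != tmpl) && (NULL != when) && (NULL != out));
  mark = strstr (tmpl, "[]");
  if (NULL == mark)
  {
    if (strlen (tmpl) >= sizeof (with_pid))
      return -1;
    strcpy (with_pid, tmpl);
  }
  else
  {
    /* text before "[]", the process identifier, text after "[]" */
    int n = snprintf (with_pid, sizeof (with_pid), "%.*s%ld%s",
                      (int) (mark - tmpl), tmpl, pid, mark + 2);

    if ((n < 0) || ((size_t) n >= sizeof (with_pid)))
      return -1;
  }
  if (0 == strftime (out, out_size, with_pid, when))
    return -1;
  return 0;
}
```

For example, the template "gnunet-[]-%Y-%m-%d.log" expanded for process 1234 on 7 March 2014 yields "gnunet-1234-2014-03-07.log".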

Updated behavior of GNUNET_log

It's currently quite common to see constructions like this all over the code:

#if MESH_DEBUG
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
    "MESH: client disconnected\n");
#endif

The reason for the #if is not to avoid displaying the message when disabled (GNUNET_ERROR_TYPE takes care of that), but to avoid the compiler including it in the binary at all, when compiling GNUnet for platforms with restricted storage space / memory (MIPS routers, ARM plug computers / dev boards, etc).

This presents several problems: the code gets ugly, hard to write and it is very easy to forget to include the #if guards, creating non-consistent code. A new change in GNUNET_log aims to solve these problems.

With this change, you need to run ./configure with at least --enable-logging=verbose to see debug messages.

Here is an example of code with dense debug statements:

 switch (restrict_topology)
  {
  case GNUNET_TESTING_TOPOLOGY_CLIQUE:
#if VERBOSE_TESTING
    GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
                _("Blacklisting all but clique topology\n"));
#endif
    unblacklisted_connections =
        create_clique (pg, &remove_connections, 
           BLACKLIST, GNUNET_NO);
    break;
  case GNUNET_TESTING_TOPOLOGY_SMALL_WORLD_RING:
#if VERBOSE_TESTING
    GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
       _("Blacklisting all but small world (ring) topology\n"));
#endif
    unblacklisted_connections =
        create_small_world_ring (pg, 
           &remove_connections, BLACKLIST);
    break;

Pretty hard to follow, huh?

From now on, it is not necessary to include the #if / #endif statements to achieve the same behavior. The GNUNET_log and GNUNET_log_from macros take care of it for you, depending on the configure option:

  • If --enable-logging is set to no, the binary will contain no log messages at all.
  • If --enable-logging is set to yes, the binary will contain no DEBUG messages, and therefore running with -L DEBUG will have no effect. Other messages (ERROR, WARNING, INFO, etc) will be included.
  • If --enable-logging is set to verbose or veryverbose, the binary will contain DEBUG messages (still, it will be necessary to run with -L DEBUG or set the DEBUG config option to show them).
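How a logging macro can make the #if guards unnecessary may be sketched as follows. This is a simplified stand-in, not the actual GNUNET_log definition: SKETCH_EXTRA_LOGGING, SKETCH_LOG, the level values and the helper functions are all hypothetical. The idea is that the condition inside the macro is a compile-time constant for DEBUG calls, so an optimizing compiler can drop the whole call (including the format string) when extra logging is disabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the configure choice: 0 ~ --enable-logging=yes,
 * 1 ~ --enable-logging=verbose (hypothetical macro name). */
#ifndef SKETCH_EXTRA_LOGGING
#define SKETCH_EXTRA_LOGGING 0
#endif

enum sketch_level { SK_ERROR = 1, SK_WARNING = 2, SK_INFO = 4, SK_DEBUG = 8 };

static unsigned int sketch_emitted; /* number of messages actually logged */

static void
sketch_log_emit (enum sketch_level level, const char *fmt, ...)
{
  va_list ap;

  assert (NULL != fmt);
  sketch_emitted++;
  fprintf (stderr, "[%d] ", (int) level);
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
}

/* For a constant DEBUG level and SKETCH_EXTRA_LOGGING == 0 the condition
 * is constant false, so the call is dead code. */
#define SKETCH_LOG(level, ...)                              \
  do {                                                      \
    if (SKETCH_EXTRA_LOGGING || (SK_DEBUG != (level)))      \
      sketch_log_emit ((level), __VA_ARGS__);               \
  } while (0)

/* Two calls without any #if guards; returns how many were logged. */
static unsigned int
sketch_demo (void)
{
  sketch_emitted = 0;
  SKETCH_LOG (SK_DEBUG, "client disconnected\n");
  SKETCH_LOG (SK_WARNING, "low on memory: %d KiB left\n", 12);
  return sketch_emitted;
}
```

Compiled as is, sketch_demo logs only the warning; compiled with -DSKETCH_EXTRA_LOGGING=1 it logs both messages. Run-time filtering (such as -L DEBUG) would be applied on top of this and is not part of the sketch.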

If you are a developer:

  • please make sure that you ./configure --enable-logging={verbose,veryverbose}, so you can see DEBUG messages.
  • please remove the #if statements around GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, ...) lines, to improve the readability of your code.

Since now activating DEBUG automatically makes it VERBOSE and activates all debug messages by default, you probably want to use the https://gnunet.org/logging functionality to filter only relevant messages. A suitable configuration could be:
$ export GNUNET_FORCE_LOG="^YOUR_SUBSYSTEM$;;;;DEBUG/;;;;WARNING"
This will behave almost like enabling DEBUG in that subsystem before the change. Of course you can adapt it to your particular needs; this is only a quick example.

Interprocess communication API

In GNUnet, a variety of new message types might be defined and used in interprocess communication. In this tutorial, we use the struct AddressLookupMessage as an example to introduce how to construct our own message type in GNUnet and how to implement the message communication between service and client.
(Here, a client uses the struct AddressLookupMessage as a request to ask the server to return the address of any other peer connecting to the service.)

Define new message types

First of all, you should define the new message type in gnunet_protocols.h:

 
// Request to look up addresses of peers in server.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP 29
// Response to the address lookup request.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY 30

Define message struct

After the type definition, the specified message structure should also be described in the header file, e.g. transport.h in our case.

 
GNUNET_NETWORK_STRUCT_BEGIN

struct AddressLookupMessage
{
  struct GNUNET_MessageHeader header;
  int32_t numeric_only GNUNET_PACKED;
  struct GNUNET_TIME_AbsoluteNBO timeout;
  uint32_t addrlen GNUNET_PACKED;
  /* followed by 'addrlen' bytes of the actual address, then
     followed by the 0-terminated name of the transport */
};

GNUNET_NETWORK_STRUCT_END

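Please note the use of GNUNET_NETWORK_STRUCT_BEGIN, GNUNET_NETWORK_STRUCT_END and GNUNET_PACKED. Together they ensure that the compiler does not insert padding, so that the struct has the same layout on all platforms when it is sent over the network.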
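The effect of packing can be illustrated with a standalone sketch. The names below (SKETCH_PACKED, the sketch_* structs and their members) are hypothetical stand-ins for the GNUnet definitions, and the sketch assumes GCC or Clang, where a packed attribute on a member is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang attribute syntax, used as a stand-in here. */
#define SKETCH_PACKED __attribute__ ((packed))

struct sketch_header
{
  uint16_t size SKETCH_PACKED;
  uint16_t type SKETCH_PACKED;
};

struct sketch_time_nbo
{
  uint64_t abs_value_nbo SKETCH_PACKED;
};

/* Same field order as the AddressLookupMessage above, packed. */
struct sketch_lookup_packed
{
  struct sketch_header header;
  int32_t numeric_only SKETCH_PACKED;
  struct sketch_time_nbo timeout;
  uint32_t addrlen SKETCH_PACKED;
};

/* Same fields without packing; the compiler is free to add padding. */
struct sketch_lookup_natural
{
  uint16_t size;
  uint16_t type;
  int32_t numeric_only;
  uint64_t timeout;
  uint32_t addrlen;
};

/* The packed layout is exactly the sum of its fields: 4 + 4 + 8 + 4. */
static void
sketch_check_layout (void)
{
  assert (20 == sizeof (struct sketch_lookup_packed));
  assert (8 == offsetof (struct sketch_lookup_packed, timeout));
  assert (16 == offsetof (struct sketch_lookup_packed, addrlen));
  assert (sizeof (struct sketch_lookup_natural) >= 20);
}
```

On a typical 64-bit platform the unpacked variant is 24 bytes because of trailing padding, so two peers built with different compilers or for different architectures could disagree about the message size if packing were left out.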

Connection between client and server

For typical communication, a connection between the client and the service has to be established first.

Client setting

Establish connection

First, on the client side, the underlying API is used to create a new connection to a service; in our example, we connect to the transport service.

 
struct GNUNET_CLIENT_Connection *client;
client = GNUNET_CLIENT_connect ("transport", cfg);

Initialize request message

When the connection is ready, we initialize the message. In this step, all the fields of the message should be properly initialized, namely the size, the type, and the user-defined data, such as the timeout, the address and the name of the transport.

 
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage) + addressLen
                  + strlen (nameTrans) + 1;
/* allocate the header, the address and the 0-terminated
   transport name in one block */
msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP);
msg->timeout = GNUNET_TIME_absolute_hton (abs_timeout);
msg->addrlen = htonl (addressLen);
/* the variable-size part starts right after the struct */
char *addrbuf = (char *) &msg[1];
memcpy (addrbuf, address, addressLen);
char *tbuf = &addrbuf[addressLen];
memcpy (tbuf, nameTrans, strlen (nameTrans) + 1);

 

Note that the functions htonl, htons and GNUNET_TIME_absolute_hton are applied here to convert from host byte order (little endian on most machines) to network byte order (big endian). For the usage of big/little endian order and the corresponding conversion functions, please refer to Introduction of Big Endian and Little Endian.
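The construction of such a variable-size message can also be tried out without GNUnet. The following sketch uses simplified stand-in types (struct sketch_header, struct sketch_lookup, build_lookup and SKETCH_TYPE_ADDRESS_LOOKUP are hypothetical names; the timeout field is left out to keep the struct free of padding) and assumes a POSIX system for htons/htonl.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_TYPE_ADDRESS_LOOKUP 29

struct sketch_header
{
  uint16_t size; /* total message size, network byte order */
  uint16_t type; /* message type, network byte order */
};

struct sketch_lookup
{
  struct sketch_header header;
  int32_t numeric_only; /* network byte order */
  uint32_t addrlen;     /* network byte order */
  /* followed by the address and the 0-terminated transport name */
};

/* Returns a malloc'ed message, or NULL if it would not fit into the
 * 16-bit size field or memory is exhausted. The caller frees it. */
static struct sketch_lookup *
build_lookup (const void *address, size_t address_len,
              const char *transport_name, int numeric_only)
{
  struct sketch_lookup *msg;
  size_t name_len;
  size_t len;
  char *addrbuf;

  assert ((NULL != address) && (NULL != transport_name));
  name_len = strlen (transport_name) + 1; /* include the terminator */
  if ((address_len > UINT16_MAX) || (name_len > UINT16_MAX))
    return NULL;
  len = sizeof (struct sketch_lookup) + address_len + name_len;
  if (len > UINT16_MAX)
    return NULL;
  msg = calloc (1, len);
  if (NULL == msg)
    return NULL;
  msg->header.size = htons ((uint16_t) len);
  msg->header.type = htons (SKETCH_TYPE_ADDRESS_LOOKUP);
  msg->numeric_only = (int32_t) htonl ((uint32_t) numeric_only);
  msg->addrlen = htonl ((uint32_t) address_len);
  addrbuf = (char *) &msg[1]; /* first byte after the fixed part */
  memcpy (addrbuf, address, address_len);
  memcpy (&addrbuf[address_len], transport_name, name_len);
  return msg;
}
```

The explicit check against UINT16_MAX matters because the size field of the header is only 16 bits wide; a longer message would silently be truncated by htons.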

Send request and receive response

Next, the client would send the constructed message as a request to the service and wait for the response from the service. To accomplish this goal, there are a number of API calls that can be used. In this example, GNUNET_CLIENT_transmit_and_get_response is chosen as the most appropriate function to use.

 
GNUNET_CLIENT_transmit_and_get_response (client,
    &msg->header, timeout, GNUNET_YES,
    &address_response_processor, arp_ctx);

The argument address_response_processor is a function of type GNUNET_CLIENT_MessageHandler, which is used to process the reply message from the service.

Server Setting

Startup service

To be able to receive the request message, the service first runs the standard GNUnet service startup sequence using GNUNET_SERVICE_run, as follows:

  
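int
main (int argc, char *const *argv)
{
  /* GNUNET_SERVICE_run returns GNUNET_OK on a clean shutdown */
  return (GNUNET_OK ==
          GNUNET_SERVICE_run (argc, argv,
                              "transport",
                              GNUNET_SERVICE_OPTION_NONE,
                              &run, NULL)) ? 0 : 1;
}

Add new handlers for specified messages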

In the function above, the argument run is the function that initializes the transport service. It is defined like this:

 
static void
run (void *cls,
     struct GNUNET_SERVER_Handle *serv,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
{
  GNUNET_SERVER_add_handlers (serv, handlers);
}

Here, GNUNET_SERVER_add_handlers must be called in the run function to add new handlers in the service. The parameter handlers is a list of struct GNUNET_SERVER_MessageHandler to tell the service which function should be called when a particular type of message is received, and should be defined in this way:

 
static struct GNUNET_SERVER_MessageHandler handlers[] = {
  {&handle_start, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_START, 0},
  {&handle_send, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_SEND, 0},
  {&handle_try_connect, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_TRY_CONNECT,
   sizeof (struct TryConnectMessage)},
  {&handle_address_lookup, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP, 0},
  {NULL, NULL, 0, 0}
};

As shown, each entry of the array is a struct. Its first member is the callback function that is called to process messages of the type given as the third member. The second member is the closure for the callback function, which is set to NULL in most cases. The last member is the expected size of messages of this type; usually we set it to 0 to accept variable sizes, but the exact size can be given for fixed-size messages. The array must end with the terminator entry {NULL, NULL, 0, 0}.
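How such a terminated handler table can be consumed is sketched below in standalone C. The types and functions (struct sketch_handler, sketch_dispatch, count_call) are hypothetical and only mirror the four members described above; the dispatch logic of the real server may differ, for example in how it treats size mismatches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_callback) (void *cls, uint16_t type, uint16_t size);

struct sketch_handler
{
  sketch_callback callback; /* NULL marks the end of the table */
  void *callback_cls;       /* closure passed to the callback */
  uint16_t type;            /* message type handled by this entry */
  uint16_t expected_size;   /* 0 accepts any size */
};

/* Example callback: counts invocations in the unsigned int behind cls. */
static void
count_call (void *cls, uint16_t type, uint16_t size)
{
  (void) type;
  (void) size;
  (*(unsigned int *) cls)++;
}

/* Returns 1 if a handler was called, 0 if no handler is registered for
 * the type, and -1 if the size does not match the expected size. */
static int
sketch_dispatch (const struct sketch_handler *handlers,
                 uint16_t type, uint16_t size)
{
  assert (NULL != handlers);
  for (size_t i = 0; NULL != handlers[i].callback; i++)
  {
    if (handlers[i].type != type)
      continue;
    if ((0 != handlers[i].expected_size) &&
        (handlers[i].expected_size != size))
      return -1;
    handlers[i].callback (handlers[i].callback_cls, type, size);
    return 1;
  }
  return 0;
}
```

Giving an exact expected size lets the dispatcher reject malformed fixed-size messages before the callback runs; handlers registered with size 0 must validate the size themselves.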

Process request message

After the initialization of the transport service, request messages can be processed. Before handling the main message data, the validity of the message should be checked, e.g., whether the size of the message is correct.

 
size = ntohs (message->size);   
if (size < sizeof (struct AddressLookupMessage))      
{        
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return;
} 

Note that, in contrast to the construction of the request message in the client, the server should use the functions ntohl and ntohs when extracting data from the message, so that the data in network byte order (big endian) is converted back into host byte order. For more details, please refer to Introduction of Big Endian and Little Endian.

Moreover in this example, the name of the transport stored in the message is a 0-terminated string, so we should also check whether the name of the transport in the received message is 0-terminated:

nameTransport = (const char *) &address[addressLen];
if (nameTransport[size - sizeof (struct AddressLookupMessage) 
                                - addressLen - 1] != '\0')
{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return; 
} 

Here, GNUNET_SERVER_receive_done should be called to tell the service that processing of the request is done and that the next message can be received. The argument GNUNET_SYSERR here indicates that the service didn't understand the request message, and the processing of this request is terminated.

In contrast, when the argument is GNUNET_OK, the request was handled successfully and the service continues to receive messages from this client.
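The two checks above (minimum size and 0-termination) can be combined into one standalone validation routine. The sketch below uses simplified stand-in types without the timeout field (struct sketch_header, struct sketch_lookup and check_lookup are hypothetical names) and assumes POSIX ntohs/ntohl. It also adds a bound on addrlen, which is an addition of this sketch: without it, a malicious addrlen larger than the message would make the index computation wrap around.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sketch_header
{
  uint16_t size; /* network byte order */
  uint16_t type; /* network byte order */
};

struct sketch_lookup
{
  struct sketch_header header;
  int32_t numeric_only; /* network byte order */
  uint32_t addrlen;     /* network byte order */
};

/* Validates a received buffer of 'received' bytes. Returns a pointer to
 * the 0-terminated transport name inside buf, or NULL if malformed. */
static const char *
check_lookup (const void *buf, size_t received)
{
  struct sketch_lookup msg;
  const char *payload;
  size_t size;
  size_t addrlen;

  assert (NULL != buf);
  if (received < sizeof (msg))
    return NULL;
  memcpy (&msg, buf, sizeof (msg)); /* buf may not be aligned */
  size = ntohs (msg.header.size);
  if ((size < sizeof (msg)) || (size > received))
    return NULL;
  addrlen = ntohl (msg.addrlen);
  /* at least one byte must remain after the address for the name */
  if (addrlen >= size - sizeof (msg))
    return NULL;
  payload = (const char *) buf + sizeof (msg);
  if ('\0' != payload[size - sizeof (msg) - 1])
    return NULL;
  return &payload[addrlen];
}
```

A handler would call such a routine first and, on a NULL result, report the error (GNUNET_break_op and GNUNET_SERVER_receive_done with GNUNET_SYSERR in the code above) before touching any of the variable-size data.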

 

Response to client

Once the processing of the current request is done, the server should send a response to the client. The server produces a new struct AddressLookupMessage in a similar way as the client did and sends it to the client, but here the type should be GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY rather than the GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP used by the client.

 
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
                   + addressLen + strlen (nameTrans) + 1;
/* allocate the fixed part plus the variable-size data */
msg = GNUNET_malloc (len);
msg->header.size
  = htons (len);
msg->header.type
  = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);

// ... 

struct GNUNET_SERVER_TransmitContext *tc;  
tc = GNUNET_SERVER_transmit_context_create (client);
GNUNET_SERVER_transmit_context_append_data (tc, NULL, 0,
      GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);
GNUNET_SERVER_transmit_context_run (tc, rtimeout);

Note that there are also a number of other APIs that the service can use to send messages.

Notification of clients

Often a service needs to (repeatedly) transmit notifications to a client or a group of clients. In these cases, the client typically has once registered for a set of events and then needs to receive a message whenever such an event happens (until the client disconnects). The use of a notification context can help manage message queues to clients and handle disconnects. Notification contexts can be used to send individualized messages to a particular client or to broadcast messages to a group of clients. An individualized notification might look like this:

GNUNET_SERVER_notification_context_unicast(nc, client,
   msg, GNUNET_YES); 

Note that after processing the original registration message for notifications, the server code still typically needs to call GNUNET_SERVER_receive_done so that the client can transmit further messages to the server.

Conversion between Network Byte Order (Big Endian) and Host Byte Order

Network Byte Order is big endian. Host Byte Order depends on the CPU: it is little endian on the common x86 systems and big endian on some other platforms. What is the difference between the two?

Consider a 4-byte integer stored in RAM. In big endian order (Network Byte Order), the most significant byte is stored at the lowest address and the least significant byte at the highest address. Little endian order, which most hosts use, is exactly the opposite: the least significant byte is stored at the lowest address and the most significant byte at the highest address.

Hosts communicate by exchanging data packets over the network, and the sender and the receiver may use different host byte orders. To make sure that both sides interpret multi-byte values in the same way, the sender must convert them to Network Byte Order before sending, and the receiver must convert them back to Host Byte Order after receiving.

There are ten convenient functions to perform the byte order conversion in GNUnet, as follows:

uint16_t htons(uint16_t hostshort)
Convert a 16-bit value from host byte order to network byte order.
uint32_t htonl(uint32_t hostlong)
Convert a 32-bit value from host byte order to network byte order.
uint16_t ntohs(uint16_t netshort)
Convert a 16-bit value from network byte order to host byte order.
uint32_t ntohl(uint32_t netlong)
Convert a 32-bit value from network byte order to host byte order.
unsigned long long GNUNET_ntohll (unsigned long long netlonglong)
Convert a 64-bit value from network byte order to host byte order.
unsigned long long GNUNET_htonll (unsigned long long hostlonglong)
Convert a 64-bit value from host byte order to network byte order.
struct GNUNET_TIME_RelativeNBO GNUNET_TIME_relative_hton (struct GNUNET_TIME_Relative a)
Convert relative time to network byte order.
struct GNUNET_TIME_Relative GNUNET_TIME_relative_ntoh (struct GNUNET_TIME_RelativeNBO a)
Convert relative time from network byte order.
struct GNUNET_TIME_AbsoluteNBO GNUNET_TIME_absolute_hton (struct GNUNET_TIME_Absolute a)
Convert absolute time to network byte order.
struct GNUNET_TIME_Absolute GNUNET_TIME_absolute_ntoh (struct GNUNET_TIME_AbsoluteNBO a)
Convert absolute time from network byte order.
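What these conversions do to the bytes in memory can be checked with a short standalone sketch. It assumes a POSIX system for htonl/ntohl; sketch_htonll, sketch_ntohll and sketch_wire_bytes are hypothetical helpers written for this illustration and are not the GNUnet functions listed above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable 64-bit host-to-network conversion: store the most
 * significant byte first, regardless of the host byte order. */
static uint64_t
sketch_htonll (uint64_t host)
{
  unsigned char bytes[8];
  uint64_t net;

  for (int i = 0; i < 8; i++)
    bytes[i] = (unsigned char) (host >> (56 - 8 * i));
  memcpy (&net, bytes, sizeof (net));
  return net;
}

/* Inverse of sketch_htonll: read the bytes most significant first. */
static uint64_t
sketch_ntohll (uint64_t net)
{
  unsigned char bytes[8];
  uint64_t host = 0;

  memcpy (bytes, &net, sizeof (bytes));
  for (int i = 0; i < 8; i++)
    host = (host << 8) | bytes[i];
  return host;
}

/* Copies the in-memory representation of htonl (value) into out. */
static void
sketch_wire_bytes (uint32_t value, unsigned char out[4])
{
  uint32_t net = htonl (value);

  assert (NULL != out);
  memcpy (out, &net, sizeof (net));
}
```

For the value 0x12345678, sketch_wire_bytes always produces the bytes 0x12, 0x34, 0x56, 0x78 in that order, on little endian and big endian hosts alike; this is what makes the representation safe to send over the network.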

Cryptography API

The gnunetutil APIs provide the cryptographic primitives used in GNUnet. GNUnet uses 2048 bit RSA keys for the session key exchange and for signing messages by peers and most other public-key operations. Most researchers in cryptography consider 2048 bit RSA keys as secure and practically unbreakable for a long time. The API provides functions to create a fresh key pair, read a private key from a file (or create a new file if the file does not exist), encrypt, decrypt, sign, verify and extract the public key into a format suitable for network transmission.

For the encryption of files and the actual data exchanged between peers, GNUnet uses 256-bit AES encryption. Fresh session keys are negotiated for every new connection. Again, there is no published technique to break this cipher in any realistic amount of time. The API provides functions for generation of keys, validation of keys (important for checking that decryptions using RSA succeeded), encryption and decryption.

GNUnet uses SHA-512 for computing one-way hash codes. The API provides functions to compute a hash over a block in memory or over a file on disk.

The crypto API also provides functions for randomizing a block of memory, obtaining a single random number and for generating a permutation of the numbers 0 to n-1. Random number generation distinguishes between WEAK and STRONG random number quality; WEAK random numbers are pseudo-random whereas STRONG random numbers use entropy gathered from the operating system.
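Generating a permutation of the numbers 0 to n-1 is commonly done with a Fisher-Yates shuffle. The sketch below shows that technique in standalone C; it is not the GNUnet function, sketch_permute is a hypothetical name, and rand() is used only as a stand-in for a WEAK (pseudo-random) source. The modulo operation introduces a slight bias, which is acceptable for an illustration but not for anything security-relevant.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a malloc'ed array holding a pseudo-random permutation of
 * 0 .. n-1, or NULL if n is 0 or memory is exhausted. */
static unsigned int *
sketch_permute (unsigned int n)
{
  unsigned int *ret;

  if (0 == n)
    return NULL;
  ret = malloc (n * sizeof (unsigned int));
  if (NULL == ret)
    return NULL;
  for (unsigned int i = 0; i < n; i++)
    ret[i] = i;
  /* Fisher-Yates: swap each position with a random earlier-or-equal one */
  for (unsigned int i = n - 1; i > 0; i--)
  {
    unsigned int j = (unsigned int) rand () % (i + 1);
    unsigned int tmp = ret[i];

    ret[i] = ret[j];
    ret[j] = tmp;
    assert (ret[i] < n);
  }
  return ret;
}
```

The caller owns the returned array and releases it with free().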

Finally, the crypto API provides a means to deterministically generate a 1024-bit RSA key from a hash code. These functions should most likely not be used by most applications; most importantly, GNUNET_CRYPTO_rsa_key_create_from_hash does not create an RSA-key that should be considered secure for traditional applications of RSA.

Message Queue API

Introduction
Often, applications need to queue messages that are to be sent to other GNUnet peers, clients or services. As all of GNUnet's message-based communication APIs, by design, do not allow messages to be queued, it is common to implement custom message queues manually when they are needed. However, writing very similar code in multiple places is tedious and leads to code duplication.

MQ (for Message Queue) is an API that provides the functionality to implement and use message queues. We intend to eventually replace all of the custom message queue implementations in GNUnet with MQ.

Basic Concepts
The two most important entities in MQ are queues and envelopes.

Every queue is backed by a specific implementation (e.g. for mesh, stream, connection, server client, etc.) that will actually deliver the queued messages. For convenience, some queues also allow specifying a list of message handlers. The message queue will then also wait for incoming messages and dispatch them appropriately.

An envelope holds the memory for a message, as well as metadata (Where is the envelope queued? What should happen after it has been sent?). An envelope can be queued in only one message queue.

Creating Queues
The following is a list of currently available message queues. Note that to avoid layering issues, message queues for higher level APIs are not part of libgnunetutil, but the respective API itself provides the queue implementation.

GNUNET_MQ_queue_for_connection_client
Transmits queued messages over a GNUNET_CLIENT_Connection
handle. Also supports receiving with message handlers.
GNUNET_MQ_queue_for_server_client
Transmits queued messages over a GNUNET_SERVER_Client
handle. Does not support incoming message handlers.
GNUNET_MESH_mq_create
Transmits queued messages over a GNUNET_MESH_Tunnel
handle. Does not support incoming message handlers.
GNUNET_MQ_queue_for_callbacks
This is the most general implementation. Instead of delivering and receiving messages with one of GNUnet's communication APIs, implementation callbacks are called. Refer to "Implementing Queues" for a more detailed explanation.

Allocating Envelopes
A GNUnet message (as defined by the GNUNET_MessageHeader) has three parts: The size, the type, and the body.

MQ provides macros to conveniently allocate an envelope containing a message, automatically setting the size and type fields of the message.

Consider the following simple message, with the body consisting of a single number value.

struct NumberMessage
{
  /** Type: GNUNET_MESSAGE_TYPE_EXAMPLE_1 */
  struct GNUNET_MessageHeader header;
  uint32_t number GNUNET_PACKED;
};

An envelope containing an instance of the NumberMessage can be constructed like this:

struct GNUNET_MQ_Envelope *ev;
struct NumberMessage *msg;
ev = GNUNET_MQ_msg (msg, GNUNET_MESSAGE_TYPE_EXAMPLE_1);
msg->number = htonl (42);

In the above code, GNUNET_MQ_msg is a macro. The return value is the newly allocated envelope. The first argument must be a pointer to some struct containing a struct GNUNET_MessageHeader header field, while the second argument is the desired message type, in host byte order.

The msg pointer now points to an allocated message, where the message type and the message size are already set. The message's size is inferred from the type of the msg pointer: It will be set to 'sizeof(*msg)', properly converted to network byte order.

If the message body's size is dynamic, the macro GNUNET_MQ_msg_extra can be used to allocate an envelope whose message has additional space allocated after the msg structure.

If no structure has been defined for the message, GNUNET_MQ_msg_header_extra can be used to allocate additional space after the message header. The first argument then must be a pointer to a GNUNET_MessageHeader.
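
The following self-contained sketch mimics what the allocation macros do to the message itself: reserve the fixed part plus optional extra space, then fill in size and type in network byte order. It is only an approximation under stated assumptions: ExampleHeader, ExampleNumberMessage and example_msg_alloc are hypothetical stand-ins, and the envelope metadata that MQ keeps alongside the message is left out entirely.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct GNUNET_MessageHeader: two 16-bit fields,
   both stored in network byte order. */
struct ExampleHeader
{
  uint16_t size;
  uint16_t type;
};

/* Stand-in for the NumberMessage from the text. */
struct ExampleNumberMessage
{
  struct ExampleHeader header;
  uint32_t number;
};

/* Allocates a zeroed message of base_size + extra_size bytes and fills
   in the header.  The type is given in host byte order.  Returns NULL
   if the total does not fit the 16-bit size field or malloc fails. */
void *
example_msg_alloc (size_t base_size, size_t extra_size, uint16_t type)
{
  struct ExampleHeader *hdr;
  size_t total = base_size + extra_size;

  assert (base_size >= sizeof (struct ExampleHeader));
  if ((total > UINT16_MAX) || (total < base_size))
    return NULL;
  hdr = calloc (1, total);
  if (NULL == hdr)
    return NULL;
  hdr->size = htons ((uint16_t) total);
  hdr->type = htons (type);
  return hdr;
}
```

With extra_size set to 0 this corresponds to the fixed-size case; a non-zero extra_size corresponds to the "extra" variants, where the caller writes the dynamic body directly after the structure.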

Envelope Properties
A few functions in MQ allow setting additional properties on envelopes:

GNUNET_MQ_notify_sent
Specifies a function that will be called once the envelope's message has been sent irrevocably. An envelope can be canceled precisely up to the point where the notify sent callback has been called.
GNUNET_MQ_disable_corking
No corking will be used when sending the message. Not every queue supports this flag. By default, envelopes are sent with corking.

Sending Envelopes
Once an envelope has been constructed, it can be queued for sending with GNUNET_MQ_send.

Note that in order to avoid memory leaks, an envelope must either be sent (the queue will free it) or destroyed explicitly with GNUNET_MQ_discard.

Canceling Envelopes
An envelope queued with GNUNET_MQ_send can be canceled with GNUNET_MQ_cancel. Note that after the notify sent callback has been called, canceling a message results in undefined behavior. Thus it is unsafe to cancel an envelope that does not have a notify sent callback. When canceling an envelope, it is not necessary to call GNUNET_MQ_discard, and the envelope can't be sent again.
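
The rules for sending, discarding and canceling can be summarized as a small state machine. The sketch below is purely illustrative: the state and operation names are hypothetical and do not exist in MQ, and ENVELOPE_INVALID marks the transitions that the text describes as leaks or undefined behavior.

```c
#include <assert.h>

/* Hypothetical envelope states, derived from the rules in the text. */
enum EnvelopeState
{
  ENVELOPE_ALLOCATED, /* created, not yet queued */
  ENVELOPE_QUEUED,    /* handed to a queue, not yet sent */
  ENVELOPE_SENT,      /* sent irrevocably; the queue frees it */
  ENVELOPE_RELEASED,  /* discarded or canceled; must not be reused */
  ENVELOPE_INVALID    /* the operation was not allowed */
};

/* Hypothetical operations, one per API call or event in the text. */
enum EnvelopeOp
{
  OP_SEND,    /* queue the envelope for sending */
  OP_SENT,    /* the queue reports the message as sent */
  OP_CANCEL,  /* cancel a queued envelope */
  OP_DISCARD  /* destroy an envelope that was never queued */
};

/* Returns the state after applying op, or ENVELOPE_INVALID. */
enum EnvelopeState
envelope_next (enum EnvelopeState s, enum EnvelopeOp op)
{
  switch (op)
  {
  case OP_SEND:
    return (ENVELOPE_ALLOCATED == s) ? ENVELOPE_QUEUED : ENVELOPE_INVALID;
  case OP_SENT:
    return (ENVELOPE_QUEUED == s) ? ENVELOPE_SENT : ENVELOPE_INVALID;
  case OP_CANCEL:
    /* Only safe before the message has been sent. */
    return (ENVELOPE_QUEUED == s) ? ENVELOPE_RELEASED : ENVELOPE_INVALID;
  case OP_DISCARD:
    return (ENVELOPE_ALLOCATED == s) ? ENVELOPE_RELEASED : ENVELOPE_INVALID;
  }
  return ENVELOPE_INVALID;
}
```

In this model the notify sent callback is what tells the application that an envelope has moved from "queued" to "sent", which is why canceling without such a callback is unsafe.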

Implementing Queues
TODO

Service API

Most GNUnet code lives in the form of services. Services are processes that offer an API for other components of the system to build on. Those other components can be command-line tools for users, graphical user interfaces or other services. Services provide their API using an IPC protocol. To do so, each service must listen on either a TCP port or a UNIX domain socket, and the service implementation uses the server API for this purpose. This use of the server API is exposed directly to the users of the service API. Thus, when using the service API, one usually also uses large parts of the server API. The service API provides various convenience functions, such as parsing command-line arguments and the configuration file, which are not found in the server API. The dual to the service/server API is the client API, which can be used to access services.

The most common way to start a service is to use the GNUNET_SERVICE_run function from the program's main function. GNUNET_SERVICE_run will then parse the command line and configuration files and, based on the options found there, start the server. It will then give back control to the main program, passing the server and the configuration to the GNUNET_SERVICE_Main callback. GNUNET_SERVICE_run will also take care of starting the scheduler loop. If this is inappropriate (for example, because the scheduler loop is already running), GNUNET_SERVICE_start and related functions provide an alternative to GNUNET_SERVICE_run.

When starting a service, the service_name option is used to determine which sections in the configuration file should be used to configure the service. A typical value here is the name of the src/ sub-directory, for example "statistics". The same string would also be given to GNUNET_CLIENT_connect to access the service.

Once a service has been initialized, the program should use the GNUNET_SERVICE_Main callback to register message handlers using GNUNET_SERVER_add_handlers. The service will already have registered a handler for the "TEST" message.

The option bitfield (enum GNUNET_SERVICE_Options) determines how a service should behave during shutdown. There are three key strategies:

instant (GNUNET_SERVICE_OPTION_NONE)
Upon receiving the shutdown signal from the scheduler, the service immediately terminates the server, closing all existing connections with clients.
manual (GNUNET_SERVICE_OPTION_MANUAL_SHUTDOWN)
The service does nothing by itself during shutdown. The main program will need to take the appropriate action by calling GNUNET_SERVER_destroy or GNUNET_SERVICE_stop (depending on how the service was initialized) to terminate the service. This method is used by gnunet-service-arm and rather uncommon.
soft (GNUNET_SERVICE_OPTION_SOFT_SHUTDOWN)
Upon receiving the shutdown signal from the scheduler, the service immediately tells the server to stop listening for incoming clients. Requests from normal existing clients are still processed and the server/service terminates once all normal clients have disconnected. Clients that are not expected to ever disconnect (such as clients that monitor performance values) can be marked as 'monitor' clients using GNUNET_SERVER_client_mark_monitor. Those clients will continue to be processed until all 'normal' clients have disconnected. Then, the server will terminate, closing the monitor connections. This mode is for example used by 'statistics', allowing existing 'normal' clients to set (possibly persistent) statistic values before terminating.
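
The termination condition of the soft strategy can be sketched as follows. This is a simplified model, not the server implementation: struct ExampleClient and soft_shutdown_may_finish are hypothetical names, and the real server tracks clients differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a client connection. */
struct ExampleClient
{
  bool connected;  /* still connected to the server */
  bool is_monitor; /* marked as a 'monitor' client */
};

/* Returns true once a soft shutdown may complete, that is, when no
   'normal' client is connected anymore.  Monitor clients never delay
   termination; their connections are closed when the server stops. */
bool
soft_shutdown_may_finish (const struct ExampleClient *clients, size_t n)
{
  size_t i;

  assert ((NULL != clients) || (0 == n));
  for (i = 0; i < n; i++)
    if (clients[i].connected && (! clients[i].is_monitor))
      return false;
  return true;
}
```

In this model, a server in soft shutdown would re-evaluate the condition whenever a client disconnects and terminate as soon as it holds.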

Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps

A commonly used data structure in GNUnet is a (multi-)hash map. It is most often used to map a peer identity to some data structure, but also to map arbitrary keys to values (for example to track requests in the distributed hash table or in file-sharing). As it is commonly used, the hash map is actually sometimes responsible for a large share of GNUnet's overall memory consumption (for some processes, 30% is not uncommon). The following text documents some API quirks (and their implications for applications) that were recently introduced to minimize the footprint of the hash map.

Analysis

The main reason for the "excessive" memory consumption by the hash map is that GNUnet uses 512-bit cryptographic hash codes --- and the (multi-)hash map also uses the same 512-bit 'struct GNUNET_HashCode'. As a result, storing just the keys requires 64 bytes of memory for each key. As some applications like to keep a large number of entries in the hash map (after all, that's what maps are good for), 64 bytes per hash is significant: keeping a pointer to the value and having a linked list for collisions consume between 8 and 16 bytes, and 'malloc' may add about the same overhead per allocation, putting us in the 16 to 32 byte per entry ballpark. Adding a 64-byte key then triples the overall memory requirement for the hash map.

To make things "worse", most of the time storing the key in the hash map is not required: it is typically already in memory elsewhere! In most cases, the values stored in the hash map are some application-specific struct that _also_ contains the hash. Here is a simplified example:

struct MyValue
{
   struct GNUNET_HashCode key;
   unsigned int my_data;
};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &key, val, ...);

This is a common pattern as later the entries might need to be removed, and at that time it is convenient to have the key immediately at hand:

GNUNET_CONTAINER_multihashmap_remove (map, &val->key, val);

Note that here we end up with two times 64 bytes for the key, plus maybe 64 bytes total for the rest of the 'struct MyValue' and the map entry in the hash map. The resulting redundant storage of the key increases overall memory consumption per entry from the "optimal" 128 bytes to 192 bytes. This is not just an extreme example: overheads in practice are actually sometimes close to those highlighted in this example. This is especially true for maps with a significant number of entries, as there we tend to really try to keep the entries small.

Solution

The solution that has now been implemented is to optionally allow the hash map to not make a (deep) copy of the hash but instead have a pointer to the hash/key in the entry. This reduces the memory consumption for the key from 64 bytes to 4 to 8 bytes. However, it can also only work if the key is actually stored in the entry (which is the case most of the time) and if the entry does not modify the key (which in all of the code I'm aware of has always been the case if the key is stored in the entry). Finally, when the client stores an entry in the hash map, it must provide a pointer to the key within the entry, not just a pointer to a transient location of the key. If the client code does not meet these requirements, the result is a dangling pointer and undefined behavior of the (multi-)hash map API.
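
The saving can be made concrete by comparing two possible entry layouts. The structs below are hypothetical stand-ins and not the actual map entry definitions; they only assume that an entry holds the key (or a pointer to it), a value pointer and a collision-list pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct GNUNET_HashCode: 512 bits, i.e. 64 bytes. */
struct ExampleHashCode
{
  uint32_t bits[512 / 8 / sizeof (uint32_t)];
};

/* Hypothetical entry layout when the map keeps its own copy of the key. */
struct EntryWithKeyCopy
{
  struct ExampleHashCode key;
  void *value;
  struct EntryWithKeyCopy *next;
};

/* Hypothetical entry layout when the map only points at a key that is
   owned by the value and outlives the entry. */
struct EntryWithKeyPointer
{
  const struct ExampleHashCode *key;
  void *value;
  struct EntryWithKeyPointer *next;
};

/* Bytes saved per entry by storing a pointer instead of a copy. */
size_t
bytes_saved_per_entry (void)
{
  return sizeof (struct EntryWithKeyCopy)
         - sizeof (struct EntryWithKeyPointer);
}
```

On a typical 64-bit platform the difference is 64 - 8 = 56 bytes per entry, and on a typical 32-bit platform 64 - 4 = 60 bytes, which matches the "4 to 8 bytes" for the pointer mentioned above.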

Migration

To use the new feature, first check that the values contain the respective key (and never modify it). Then, all calls to GNUNET_CONTAINER_multihashmap_put on the respective map must be audited and most likely changed to pass a pointer into the value's struct. For the initial example, the new code would look like this:

struct MyValue
{
   struct GNUNET_HashCode key;
   unsigned int my_data;
};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &val->key, val, ...);

Note that &key was changed to &val->key in the argument to the put call. This is critical as often key is on the stack or in some other transient data structure and thus having the hash map keep a pointer to key would not work. Only the key inside of val has the same lifetime as the entry in the map (this must of course be checked as well). Naturally, val->key must be initialized before the put call. Once all put calls have been converted and double-checked, you can change the call to create the hash map from

map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_NO);

to

map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_YES);

If everything was done correctly, you now use about 60 bytes less memory per entry in map. However, if now (or in the future) any call to put does not ensure that the given key is valid until the entry is removed from the map, undefined behavior is likely to be observed.

Conclusion

The new optimization is often applicable and can result in a reduction in memory consumption of up to 30% in practice. However, it makes the code less robust as additional invariants are imposed on the multi hash map client. Thus applications should refrain from enabling the new mode unless the resulting performance increase is deemed significant enough. In particular, it should generally not be used in new code (wait at least until benchmarks exist).

Availability

The new multi hash map code was committed in SVN 24319 (will be in GNUnet 0.9.4). Various subsystems (transport, core, dht, file-sharing) were previously audited and modified to take advantage of the new capability. In particular, memory consumption of the file-sharing service is expected to drop by 20-30% due to this change.

The CONTAINER_MDLL API

This text documents the GNUNET_CONTAINER_MDLL API. The GNUNET_CONTAINER_MDLL API is similar to the GNUNET_CONTAINER_DLL API in that it provides operations for the construction and manipulation of doubly-linked lists. The key difference to the (simpler) DLL-API is that the MDLL-version allows a single element (instance of a "struct") to be in multiple linked lists at the same time.

Like the DLL API, the MDLL API stores (most of) the data structures for the doubly-linked list with the respective elements; only the 'head' and 'tail' pointers are stored "elsewhere" -- and the application needs to provide the locations of head and tail to each of the calls in the MDLL API. The key difference for the MDLL API is that the "next" and "previous" pointers in the struct can no longer be simply called "next" and "prev" --- after all, the element may be in multiple doubly-linked lists, so we cannot just have one "next" and one "prev" pointer!

The solution is to have multiple fields that must have a name of the format "next_XX" and "prev_XX" where "XX" is the name of one of the doubly-linked lists. Here is a simple example:

 
struct MyMultiListElement 
{
  struct MyMultiListElement *next_ALIST;  
  struct MyMultiListElement *prev_ALIST;
  struct MyMultiListElement *next_BLIST;  
  struct MyMultiListElement *prev_BLIST;
  void *data;
};

Note that by convention, we use all-uppercase letters for the list names. In addition, the program needs to have a location for the head and tail pointers for both lists, for example:

 
static struct MyMultiListElement *head_ALIST;
static struct MyMultiListElement *tail_ALIST;
static struct MyMultiListElement *head_BLIST;
static struct MyMultiListElement *tail_BLIST;

Using the MDLL-macros, we can now insert an element into the ALIST:

 
GNUNET_CONTAINER_MDLL_insert (ALIST, head_ALIST, tail_ALIST, element);

Passing "ALIST" as the first argument to MDLL specifies which of the next/prev fields in the 'struct MyMultiListElement' should be used. The extra "ALIST" argument and the "_ALIST" in the names of the next/prev-members are the only differences between the MDLL and DLL-API. Like the DLL-API, the MDLL-API offers functions for inserting (at head, at tail, after a given element) and removing elements from the list. Iterating over the list should be done by directly accessing the "next_XX" and/or "prev_XX" members.
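
To show how a list name can select the right pair of pointers, here is a self-contained sketch using preprocessor token pasting. The EXAMPLE_MDLL_insert and EXAMPLE_MDLL_remove macros, struct ExampleElement and count_alist are hypothetical and written only for this illustration; they are not the GNUNET_CONTAINER_MDLL macros, although they take their arguments in the same order as the insert call shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element that can be a member of two lists at once. */
struct ExampleElement
{
  struct ExampleElement *next_ALIST;
  struct ExampleElement *prev_ALIST;
  struct ExampleElement *next_BLIST;
  struct ExampleElement *prev_BLIST;
  int data;
};

/* Insert at the head; 'list' is pasted onto next_ and prev_. */
#define EXAMPLE_MDLL_insert(list, head, tail, element) \
  do { \
    (element)->prev_ ## list = NULL; \
    (element)->next_ ## list = (head); \
    if (NULL == (tail)) \
      (tail) = (element); \
    else \
      (head)->prev_ ## list = (element); \
    (head) = (element); \
  } while (0)

/* Remove an element that is currently in the given list. */
#define EXAMPLE_MDLL_remove(list, head, tail, element) \
  do { \
    if (NULL == (element)->prev_ ## list) \
      (head) = (element)->next_ ## list; \
    else \
      (element)->prev_ ## list->next_ ## list = (element)->next_ ## list; \
    if (NULL == (element)->next_ ## list) \
      (tail) = (element)->prev_ ## list; \
    else \
      (element)->next_ ## list->prev_ ## list = (element)->prev_ ## list; \
    (element)->next_ ## list = NULL; \
    (element)->prev_ ## list = NULL; \
  } while (0)

/* Iterates over ALIST by following the next_ALIST members directly. */
unsigned int
count_alist (const struct ExampleElement *head)
{
  unsigned int n = 0;

  for (; NULL != head; head = head->next_ALIST)
    n++;
  return n;
}
```

Because each list has its own next/prev pair, removing an element from ALIST leaves its membership in BLIST untouched.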

The Automatic Restart Manager (ARM)

GNUnet's Automatic Restart Manager (ARM) is the GNUnet service responsible for system initialization and service babysitting. ARM starts and halts services, detects configuration changes and restarts services impacted by the changes as needed. It is also responsible for restarting services in case of crashes and is planned to incorporate automatic debugging for diagnosing service crashes, providing developers with insights about crash reasons. The purpose of this document is to give GNUnet developers an idea about how ARM works and how to interact with it.

Basic functionality

  • ARM source code can be found under "src/arm".
    Service processes are managed by the functions in "gnunet-service-arm.c" which is controlled with "gnunet-arm.c" (main function in that file is ARM's entry point).
  • The functions responsible for communicating with ARM, starting and stopping services (including the ARM service itself) are provided by the ARM API "arm_api.c".
    Function: GNUNET_ARM_connect() returns to the caller an ARM handle after setting it to the caller's context (configuration and scheduler in use). This handle can be used afterwards by the caller to communicate with ARM. Functions GNUNET_ARM_start_service() and GNUNET_ARM_stop_service() are used for starting and stopping services respectively.
  • A typical example of using these basic ARM services can be found in file test_arm_api.c. The test case connects to ARM, starts it, then uses it to start a service "resolver", stops the "resolver" then stops "ARM".

Key configuration options

Configurations for ARM and services should be available in a .conf file (As an example, see test_arm_api_data.conf). When running ARM, the configuration file to use should be passed to the command:

$ gnunet-arm -s -c configuration_to_use.conf

If no configuration is passed, the default configuration file will be used (see GNUNET_PREFIX/share/gnunet/defaults.conf which is created from contrib/defaults.conf).
Each service has a section that starts with the service name in square brackets, for example: "[arm]". The following options configure how ARM configures or interacts with the various services:

PORT
Port number on which the service is listening for incoming TCP connections. ARM will start the services should it notice a request at this port.
HOSTNAME
Specifies on which host the service is deployed. Note that ARM can only start services that are running on the local system (but will not check that the hostname matches the local machine name). This option is used by the gnunet_client_lib.h implementation to determine which system to connect to. The default is "localhost".
BINARY
The name of the service binary file.
OPTIONS
To be passed to the service.
PREFIX
A command to pre-pend to the actual command, for example, running a service with "valgrind" or "gdb"
DEBUG
Run in debug mode (much verbosity).
AUTOSTART
ARM will listen to UNIX domain socket and/or TCP port of the service and start the service on-demand.
FORCESTART
ARM will always start this service when the peer is started.
ACCEPT_FROM
IPv4 addresses the service accepts connections from.
ACCEPT_FROM6
IPv6 addresses the service accepts connections from.

Options that impact the operation of ARM overall are in the "[arm]" section. ARM is a normal service and has (except for AUTOSTART) all of the options that other services do. In addition, ARM has the following options:

GLOBAL_PREFIX
Command to be pre-pended to all services that are going to run.
GLOBAL_POSTFIX
Global option that will be supplied to all the services that are going to run.

Availability

As mentioned before, one of the features provided by ARM is starting services on demand. Consider the example of one service, the "client", that wants to connect to another service, the "server". The "client" will ask ARM to run the "server". ARM starts the "server". The "server" starts listening for incoming connections. The "client" will establish a connection with the "server". Then they will start to communicate.
One problem with that scheme is that it's slow!
The "client" service wants to communicate with the "server" service at once and is not willing to wait for it to be started and listening for incoming connections before its request is served.
One solution to that problem would be for ARM to start all services as default services. That would solve the problem, yet it is not practical, because some of the services started this way might never be used or might only be used after a relatively long time.
The approach followed by ARM to solve this problem is as follows:

  • For each service having a PORT field in the configuration file and that is not one of the default services (a service that accepts incoming connections from clients), ARM creates listening sockets for all addresses associated with that service.
  • The "client" will immediately establish a connection with the "server".
  • ARM --- pretending to be the "server" --- will listen on the respective port and notice the incoming connection from the "client", but will not accept it.
  • Instead, once there is an incoming connection, ARM will start the "server", passing on the listen sockets (now, the service is started and can do its work).
  • Other client services can now connect directly to the "server".

Reliability

One of the features provided by ARM, is the automatic restart of crashed services.
ARM needs to know which of the running services died. Function "gnunet-service-arm.c/maint_child_death()" is responsible for that. The function is scheduled to run upon receiving a SIGCHLD signal. It then iterates over ARM's list of running services and determines which services have died (crashed). ARM restarts every service that crashed.
Now consider the case of a service with a serious problem that causes it to crash each time it is started by ARM. If ARM kept blindly restarting such a service, we would get the pattern start-crash-restart-crash-restart-crash and so forth, which is of course not practical.
For that reason, ARM schedules the service to be restarted after waiting for some delay that grows exponentially with each crash/restart of that service.
To clarify the idea, consider the following example:

  • Service S crashed.
  • ARM receives the SIGCHLD and inspects its list of services to find the dead one(s).
  • ARM finds S dead and schedules it for restarting after the "backoff" time, which is initially set to 1ms. ARM then doubles the backoff time for S (now backoff(S) = 2ms).
  • Because there is a severe problem with S, it crashes again.
  • Again ARM receives the SIGCHLD and detects that it's S again that has crashed. ARM schedules it for restarting after its new backoff time (which became 2ms), and doubles its backoff time (now backoff(S) = 4ms).
  • And so on, until backoff(S) reaches a certain threshold (EXPONENTIAL_BACKOFF_THRESHOLD is set to half an hour). After reaching it, backoff(S) will remain at half an hour, so ARM won't spend a lot of time trying to restart a problematic service.
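
The doubling-with-cap rule from the example can be written down in a few lines. This is a minimal sketch rather than ARM's code: next_backoff_ms and EXAMPLE_BACKOFF_THRESHOLD_MS are hypothetical names, and the time representation (plain milliseconds) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cap from the text: half an hour, expressed here in milliseconds. */
#define EXAMPLE_BACKOFF_THRESHOLD_MS (30ULL * 60ULL * 1000ULL)

/* Given the delay that was just used for a restart, returns the delay
   to use after the next crash: twice as long, but never more than the
   threshold. */
uint64_t
next_backoff_ms (uint64_t current_ms)
{
  assert (current_ms > 0);
  if (current_ms >= EXAMPLE_BACKOFF_THRESHOLD_MS / 2)
    return EXAMPLE_BACKOFF_THRESHOLD_MS;
  return current_ms * 2;
}
```

Starting from 1ms, repeated calls yield 2ms, 4ms, 8ms and so on, until the value settles at the half-hour threshold.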

GNUnet's TRANSPORT Subsystem

This chapter documents how the GNUnet transport subsystem works. The GNUnet transport subsystem consists of three main components: the transport API (the interface used by the rest of the system to access the transport service), the transport service itself (most of the interesting functions, such as choosing transports, happen here) and the transport plugins. A transport plugin is a concrete implementation for how two GNUnet peers communicate; many plugins exist, for example for communication via TCP, UDP, HTTP, HTTPS and others. Finally, the transport subsystem uses supporting code, especially the NAT/UPnP library to help with tasks such as NAT traversal.

Key tasks of the transport service include:

  • Create our HELLO message, notify clients and neighbours if our HELLO changes (using NAT library as necessary)
  • Validate HELLOs from other peers (send PING), allow other peers to validate our HELLO's addresses (send PONG)
  • Upon request, establish connections to other peers (using address selection from ATS subsystem) and maintain them (again using PINGs and PONGs) as long as desired
  • Accept incoming connections, give ATS service the opportunity to switch communication channels
  • Notify clients about peers that have connected to us or that have been disconnected from us
  • If a (stateful) connection goes down unexpectedly (without explicit DISCONNECT), quickly attempt to recover (without notifying clients) but do notify clients quickly if reconnecting fails
  • Send (payload) messages arriving from clients to other peers via transport plugins and receive messages from other peers, forwarding those to clients
  • Enforce inbound traffic limits (using flow-control if it is applicable); outbound traffic limits are enforced by CORE, not by us (!)
  • Enforce restrictions on P2P connection as specified by the blacklist configuration and blacklisting clients

Note that the term "clients" in the list above really refers to the GNUnet-CORE service, as CORE is typically the only client of the transport service.

Address validation protocol

This section documents how the GNUnet transport service validates connections with other peers. It is a high-level description of the protocol necessary to understand the details of the implementation. It should be noted that when we talk about PING and PONG messages in this section, we refer to transport-level PING and PONG messages, which are different from core-level PING and PONG messages (both in implementation and function).

The goal of transport-level address validation is to minimize the chances of a successful man-in-the-middle attack against GNUnet peers on the transport level. Such an attack would not allow the adversary to decrypt the P2P transmissions, but a successful attacker could at least measure traffic volumes and latencies (raising the adversary's capabilities to those of a global passive adversary in the worst case). The scenario we are concerned about is an attacker, Mallory, giving a HELLO to Alice that claims to be for Bob, but contains Mallory's IP address instead of Bob's (for some transport). Mallory would then forward the traffic to Bob (by initiating a connection to Bob and claiming to be Alice). As a further complication, the scheme has to work even if, say, Alice is behind a NAT without traversal support and hence has no address of her own (and thus Alice must always initiate the connection to Bob).

An additional constraint is that HELLO messages do not contain a cryptographic signature since other peers must be able to edit (i.e. remove) addresses from the HELLO at any time (this was not true in GNUnet 0.8.x). A basic assumption is that each peer knows the set of possible network addresses that it might be reachable under (so for example, the external IP address of the NAT plus the LAN address(es) with the respective ports).

The solution is the following. If Alice wants to validate that a given address for Bob is valid (i.e. that the connection is actually established directly with the intended target), she sends a PING message over that connection to Bob. Note that in this case, Alice initiated the connection so only she knows for sure which address was used (Alice may be behind NAT, so whatever address Bob sees may not be an address Alice knows she has). Bob checks that the address given in the PING is actually one of his addresses (and does not belong to Mallory), and if it is, sends back a PONG (with a signature that says that Bob owns/uses the address from the PING). Alice checks the signature and is happy if it is valid and the address in the PONG is the address she used. This is similar to the 0.8.x protocol where the HELLO contained a signature from Bob for each address used by Bob. Here, the purpose code for the signature is GNUNET_SIGNATURE_PURPOSE_TRANSPORT_PONG_OWN. After this, Alice will remember Bob's address and consider the address valid for a while (12h in the current implementation). Note that after this exchange, Alice only considers Bob's address to be valid; the connection itself is not considered 'established'. In particular, Alice may have many addresses for Bob that she considers valid.

The PONG message is protected with a nonce/challenge against replay attacks and uses an expiration time for the signature (but those are almost implementation details).
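
The checks performed by Bob and Alice can be modeled in a few lines. The sketch below is a strong simplification under stated assumptions: all names are hypothetical, addresses are plain strings, and the cryptographic signature is replaced by a boolean placeholder, so it only illustrates the order of the checks and not the real message formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_ADDR_LEN 64

/* Hypothetical PING: Alice's challenge plus the address she used. */
struct ExamplePing
{
  uint32_t challenge;
  char address[EXAMPLE_ADDR_LEN];
};

/* Hypothetical PONG: echoes challenge and address.  'signature_ok' is
   a placeholder for a verified signature by Bob. */
struct ExamplePong
{
  uint32_t challenge;
  char address[EXAMPLE_ADDR_LEN];
  uint64_t expiration;
  bool signature_ok;
};

/* Bob's side: answer only if the address in the PING is one of his. */
bool
bob_make_pong (const char *const *own_addresses, size_t n,
               const struct ExamplePing *ping, uint64_t expiration,
               struct ExamplePong *pong)
{
  size_t i;

  for (i = 0; i < n; i++)
  {
    if (0 != strncmp (own_addresses[i], ping->address, EXAMPLE_ADDR_LEN))
      continue;
    pong->challenge = ping->challenge;
    memcpy (pong->address, ping->address, EXAMPLE_ADDR_LEN);
    pong->expiration = expiration;
    pong->signature_ok = true; /* stands in for signing the PONG */
    return true;
  }
  return false;
}

/* Alice's side: accept only a valid, fresh PONG for the address and
   challenge she actually used. */
bool
alice_accept_pong (const struct ExamplePing *sent,
                   const struct ExamplePong *pong, uint64_t now)
{
  return pong->signature_ok
         && (pong->challenge == sent->challenge)
         && (0 == strncmp (pong->address, sent->address, EXAMPLE_ADDR_LEN))
         && (now < pong->expiration);
}
```

If Mallory had planted her own address in the HELLO, the PING sent to that address would name an address Bob does not own, so no PONG signed by Bob would come back and Alice would never mark the address as valid.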

NAT library

The goal of the GNUnet NAT library is to provide a general-purpose API for NAT traversal without third-party support. So protocols that involve contacting a third peer to help establish a connection between two peers are outside of the scope of this API. That does not mean that GNUnet doesn't support involving a third peer (we can do this with the distance-vector transport or using application-level protocols), it just means that the NAT API is not concerned with this possibility. The API is written so that it will work for IPv6-NAT in the future as well as current IPv4-NAT. Furthermore, the NAT API is always used, even for peers that are not behind NAT --- in that case, the mapping provided is simply the identity.

NAT traversal is initiated by calling GNUNET_NAT_register. Given a set of addresses that the peer has locally bound to (TCP or UDP), the NAT library will return (via callback) a (possibly longer) list of addresses the peer might be reachable under. Internally, depending on the configuration, the NAT library will try to punch a hole (using UPnP) or just "know" that the NAT was manually punched and generate the respective external IP address (the one that should be globally visible) based on the given information.

The NAT library also supports ICMP-based NAT traversal. Here, the other peer can request connection-reversal by this peer (in this special case, the peer is even allowed to configure a port number of zero). If the NAT library detects a connection-reversal request, it returns the respective target address to the client as well. It should be noted that connection-reversal is currently only intended for TCP, so other plugins must pass NULL for the reversal callback. Naturally, the NAT library also supports requesting connection reversal from a remote peer (GNUNET_NAT_run_client).

Once initialized, the NAT handle can be used to test if a given address is possibly a valid address for this peer (GNUNET_NAT_test_address). This is used for validating our addresses when generating PONGs.

Finally, the NAT library contains an API to test if our NAT configuration is correct. Using GNUNET_NAT_test_start before binding to the respective port, the NAT library can be used to test if the configuration works. The test function acts as a local client, initializes the NAT traversal, and then contacts a gnunet-nat-server (running by default on gnunet.org) and asks for a connection to be established. This way, it is easy to test if the current NAT configuration is valid.

Distance-Vector plugin

The Distance Vector (DV) transport is a transport mechanism that allows peers to act as relays for each other, thereby connecting peers that would otherwise be unable to connect. This gives a larger connection set to applications that may work better with more peers to choose from (for example, File Sharing and/or DHT).

The Distance Vector transport essentially has two functions. The first is "gossiping" connection information about more distant peers to directly connected peers. The second is taking messages intended for non-directly connected peers and encapsulating them in a DV wrapper that contains the required information for routing the message through forwarding peers. Via gossiping, optimal routes through the known DV neighborhood are discovered and utilized and the message encapsulation provides some benefits in addition to simply getting the message from the correct source to the proper destination.

The gossiping function of DV provides an up-to-date routing table of peers that are available up to some number of hops. We call this a fisheye view of the network (like a fish, nearby objects are known while more distant ones are unknown). Gossip messages are sent only to directly connected peers, but they are sent about other known peers within the "fisheye distance". Whenever two peers connect, they immediately gossip to each other about their appropriate other neighbors. They also gossip about the newly connected peer to previously connected neighbors. In order to keep the routing tables up to date, disconnect notifications are propagated as gossip as well (because disconnects may not be sent/received, timeouts are also used to remove stagnant routing table entries).

Routing of messages via DV is straightforward. When the DV transport is notified of a message destined for a non-direct neighbor, the appropriate forwarding peer is selected, and the base message is encapsulated in a DV message which contains information about the initial peer and the intended recipient. At each forwarding hop, the initial peer is validated (the forwarding peer ensures that it has the initial peer in its neighborhood, otherwise the message is dropped). Next the base message is re-encapsulated in a new DV message for the next hop in the forwarding chain (or delivered to the current peer, if it has arrived at the destination).

Assume a three-peer network with peers Alice, Bob and Carol. Assume that Alice <-> Bob and Bob <-> Carol are direct (e.g. over TCP or UDP transports) connections, but that Alice cannot directly connect to Carol. This may be the case due to NAT or firewall restrictions, or perhaps based on one of the peers' respective configurations. If the Distance Vector transport is enabled on all three peers, it will automatically discover (from the gossip protocol) that Alice and Carol can connect via Bob and provide a "virtual" Alice <-> Carol connection. Routing between Alice and Carol happens as follows: Alice creates a message destined for Carol and notifies the DV transport about it. The DV transport at Alice looks up Carol in the routing table and finds that the message must be sent through Bob for Carol. The message is encapsulated, setting Alice as the initiator and Carol as the destination, and sent to Bob. Bob receives the message, verifies that both Alice and Carol are known to Bob, and re-wraps the message in a new DV message for Carol. The DV transport at Carol receives this message, unwraps the original message, and delivers it to Carol as though it came directly from Alice.
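
The Alice, Bob and Carol walk-through can be replayed with a small simulation. The sketch below is a toy model under simplifying assumptions: peers are Python objects in one process, the routing tables are filled in by hand instead of by gossip, and the names DVMessage and Peer are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DVMessage:
    initiator: str    # peer that originally sent the payload
    destination: str  # peer that should finally receive it
    payload: bytes    # the encapsulated base message


class Peer:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # shared dict: peer name -> Peer object
        self.next_hop = {}      # destination -> direct neighbour to forward to
        self.known = set()      # peers in this peer's DV neighbourhood
        self.delivered = []     # (initiator, payload) pairs handed to the application
        network[name] = self

    def send(self, destination, payload):
        """Encapsulate a base message and hand it to the selected forwarding peer."""
        msg = DVMessage(self.name, destination, payload)
        self.network[self.next_hop[destination]].receive(msg)

    def receive(self, msg):
        if msg.destination == self.name:
            # Arrived: unwrap and deliver as if it came directly from the initiator.
            self.delivered.append((msg.initiator, msg.payload))
            return
        # Forwarding hop: drop the message unless both end points are known.
        if msg.initiator not in self.known or msg.destination not in self.known:
            return
        rewrapped = DVMessage(msg.initiator, msg.destination, msg.payload)
        self.network[self.next_hop[msg.destination]].receive(rewrapped)


network = {}
alice, bob, carol = (Peer(n, network) for n in ("alice", "bob", "carol"))
alice.next_hop = {"bob": "bob", "carol": "bob"}   # Carol is reachable via Bob
bob.next_hop = {"alice": "alice", "carol": "carol"}
bob.known = {"alice", "carol"}
carol.next_hop = {"bob": "bob", "alice": "bob"}

alice.send("carol", b"hello")
```

After the call, the message appears in Carol's delivered list with Alice as the initiator, while Bob only forwarded it.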

SMTP plugin

This page describes the new SMTP transport plugin for GNUnet as it exists in the 0.7.x and 0.8.x branches. SMTP support is currently not available in GNUnet 0.9.x. This page also describes the transport layer abstraction (as it existed in 0.7.x and 0.8.x) in more detail and gives some benchmarking results. The performance results presented are quite old and may be outdated at this point.

  1. Why use SMTP for a peer-to-peer transport?
  2. How does it work?
  3. How do I configure my peer?
  4. How do I test if it works?
  5. How fast is it?
  6. Is there any additional documentation?

Why use SMTP for a peer-to-peer transport?

There are many reasons why one would not want to use SMTP:

  • SMTP uses more bandwidth than TCP, UDP or HTTP.
  • SMTP has a much higher latency.
  • SMTP requires significantly more computation (encoding and decoding time) for the peers.
  • SMTP is significantly more complicated to configure.
  • SMTP may be abused by tricking GNUnet into sending mail to non-participating third parties.

So why would anybody want to use SMTP?

  • SMTP can be used to contact peers behind NAT boxes (in virtual private networks).
  • SMTP can be used to circumvent policies that limit or prohibit peer-to-peer traffic by masquerading as "legitimate" traffic.
  • SMTP uses E-mail addresses which are independent of a specific IP, which can be useful to address peers that use dynamic IP addresses.
  • SMTP can be used to initiate a connection (e.g. initial address exchange) and peers can then negotiate the use of a more efficient protocol (e.g. TCP) for the actual communication.

In summary, SMTP can for example be used to send a message to a peer behind a NAT box that has a dynamic IP to tell the peer to establish a TCP connection to a peer outside of the private network. Even an extraordinary overhead for this first message would be irrelevant in this type of situation.

How does it work?

When a GNUnet peer needs to send a message to another GNUnet peer that has advertised (only) an SMTP transport address, GNUnet base64-encodes the message and sends it in an E-mail to the advertised address. The advertisement contains a filter which is placed in the E-mail header, such that the receiving host can filter the tagged E-mails and forward them to the GNUnet peer process. The filter can be specified individually by each peer and can be changed over time. This makes it impossible to censor GNUnet E-mail messages by searching for a generic filter.
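
To make the encoding step concrete, here is a minimal Python sketch of wrapping a binary message into a tagged E-mail and recovering it on the receiving side. It assumes standard base64 and uses Python's standard email package; the function names wrap and unwrap are invented for the example, and the actual mail layout produced by the plugin may differ.

```python
import base64
from email import message_from_string
from email.message import EmailMessage


def wrap(payload: bytes, to_addr: str, filter_header: str) -> str:
    """Build an E-mail that carries the payload and the peer's filter header."""
    name, value = filter_header.split(":", 1)
    mail = EmailMessage()
    mail["To"] = to_addr
    mail[name.strip()] = value.strip()  # e.g. "X-mailer: GNUnet"
    mail.set_content(base64.b64encode(payload).decode("ascii"))
    return mail.as_string()


def unwrap(raw: str, filter_header: str):
    """Return the payload if the mail carries the expected filter, else None."""
    name, value = filter_header.split(":", 1)
    mail = message_from_string(raw)
    if mail.get(name.strip()) != value.strip():
        return None  # not tagged as GNUnet traffic for this peer
    return base64.b64decode(mail.get_payload(decode=True))


raw = wrap(b"hello", "me@mail.gnu.org", "X-mailer: GNUnet")
```

Because the filter is just a header chosen by the receiving peer, two peers can use entirely different tags, which is what prevents filtering on one generic pattern.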

How do I configure my peer?

First, you need to configure procmail to filter your inbound E-mail for GNUnet traffic. The GNUnet messages must be delivered into a pipe, for example /tmp/gnunet.smtp. You also need to define a filter that is used by procmail to detect GNUnet messages. You are free to choose whichever filter you like, but you should make sure that it does not occur in your other E-mail. In our example, we will use X-mailer: GNUnet. The ~/.procmailrc configuration file then looks like this:

:0:
* ^X-mailer: GNUnet
/tmp/gnunet.smtp
# where do you want your other e-mail delivered to
# (default: /var/spool/mail/)
:0:
/var/spool/mail/

After adding this file, first make sure that your regular E-mail still works (e.g. by sending an E-mail to yourself). Then edit the GNUnet configuration. In the section SMTP you need to specify your E-mail address under EMAIL, your mail server (for outgoing mail) under SERVER, the filter (X-mailer: GNUnet in the example) under FILTER and the name of the pipe under PIPE.
The completed section could then look like this:

EMAIL = me@mail.gnu.org
MTU = 65000
SERVER = mail.gnu.org:25
FILTER = "X-mailer: GNUnet"
PIPE = /tmp/gnunet.smtp

Finally, you need to add smtp to the list of TRANSPORTS in the GNUNETD section. GNUnet peers will use the E-mail address that you specified to contact your peer until the advertisement times out. Thus, if you are not sure if everything works properly or if you are not planning to be online for a long time, you may want to configure this timeout to be short, e.g. just one hour. For this, set HELLOEXPIRES to 1 in the GNUNETD section.

This should be it, but you will probably want to test it first.

How do I test if it works?

Any transport can be subjected to some rudimentary tests using the gnunet-transport-check tool. The tool sends a message to the local node via the transport and checks that a valid message is received. While this test does not involve other peers and cannot check if firewalls or other network obstacles prohibit proper operation, this is a great testcase for the SMTP transport since it tests nearly all of the functionality.

gnunet-transport-check should only be used without running gnunetd at the same time. By default, gnunet-transport-check tests all transports that are specified in the configuration file. But you can specifically test SMTP by giving the option --transport=smtp.

Note that this test always checks if a transport can receive and send. While you can configure most transports to only receive or only send messages, this test will only work if you have configured the transport to send and receive messages.

How fast is it?

We have measured the performance of the UDP, TCP and SMTP transport layers directly and when used from an application using the GNUnet core. Measuring just the transport layer gives a better view of the actual overhead of the protocol, whereas evaluating the transport from the application puts the overhead into perspective from a practical point of view.

The loopback measurements of the SMTP transport were performed on three different machines spanning a range of modern SMTP configurations. We used a PIII-800 running RedHat 7.3 with the Purdue Computer Science configuration which includes filters for spam. We also used a Xeon 2 GHz with a vanilla RedHat 8.0 sendmail configuration. Furthermore, we used qmail on a PIII-1000 running Sorcerer GNU Linux (SGL).
The numbers for UDP and TCP are provided using the SGL configuration. The qmail benchmark uses qmail's internal filtering whereas the sendmail benchmarks rely on procmail to filter and deliver the mail. We used the transport layer to send a message of b bytes (excluding transport protocol headers) directly to the local machine. This way, network latency and packet loss on the wire have no impact on the timings. n messages were sent sequentially over the transport layer, sending message i+1 after the i-th message was received. All messages were sent over the same connection and the time to establish the connection was not taken into account since this overhead is minuscule in practice -- as long as a connection is used for a significant number of messages.

Message size   UDP     TCP     SMTP (Purdue sendmail)   SMTP (RH 8.0)   SMTP (SGL qmail)
11 bytes       31 ms   55 ms   781 s                    77 s            24 s
407 bytes      37 ms   62 ms   789 s                    78 s            25 s
1,221 bytes    46 ms   73 ms   804 s                    78 s            25 s

The benchmarks show that UDP and TCP are, as expected, both significantly faster than any of the SMTP services. Among the SMTP implementations, there can be significant differences depending on the SMTP configuration. Filtering with an external tool like procmail that needs to re-parse its configuration for each mail can be very expensive. Applying spam filters can also significantly impact the performance of the underlying SMTP implementation. The microbenchmark shows that SMTP can be a viable solution for initiating peer-to-peer sessions: a couple of seconds to connect to a peer are probably not even going to be noticed by users.

The next benchmark measures the possible throughput for a transport. Throughput can be measured by sending multiple messages in parallel and measuring packet loss. Note that not only UDP but also the TCP transport can actually lose messages since the TCP implementation drops messages if the write to the socket would block. While the SMTP protocol never drops messages itself, it is often so slow that only a fraction of the messages can be sent and received within the given time bounds. For this benchmark we report the message loss after allowing t time for sending m messages. If messages were not sent (or received) after an overall timeout of t, they were considered lost. The benchmark was performed using two Xeon 2 GHz machines running RedHat 8.0 with sendmail. The machines were connected with a direct 100 MBit ethernet connection.

Figures udp1200, tcp1200 and smtp-MTUs show that the throughput for messages of size 1,200 octets is 2,343 kbps, 3,310 kbps and 6 kbps for UDP, TCP and SMTP respectively. The high per-message overhead of SMTP can be mitigated by increasing the MTU; for example, an MTU of 12,000 octets improves the throughput to 13 kbps, as figure smtp-MTUs shows. Our research paper has some more details on the benchmarking results.
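
To put these throughput figures into perspective, the following back-of-the-envelope calculation converts them into an average time per message. It assumes that 1 kbps means 1,000 bits per second and that the reported throughput counts message payload only; neither assumption is stated in the original measurements.

```python
def seconds_per_message(size_octets, throughput_kbps):
    """Average time needed per message at the given sustained throughput."""
    return size_octets * 8 / (throughput_kbps * 1000)


measurements = [
    ("UDP", 1200, 2343),
    ("TCP", 1200, 3310),
    ("SMTP", 1200, 6),
    ("SMTP, larger MTU", 12000, 13),
]
per_message = {name: seconds_per_message(size, kbps)
               for name, size, kbps in measurements}
for name, secs in per_message.items():
    print(f"{name}: {secs:.4f} s per message")
```

Under these assumptions SMTP needs about 1.6 seconds per 1,200-octet message. With the larger MTU each message takes roughly 7.4 seconds but carries ten times the payload, which is why the throughput roughly doubles.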

Bluetooth plugin

This page describes the new Bluetooth transport plugin for GNUnet. The plugin is still in the testing stage so don't expect it to work perfectly. If you have any questions or problems just post them here or ask on the IRC channel.

  1. What do I need to use the Bluetooth plugin transport?
  2. How does it work?
  3. What possible errors should I be aware of?
  4. How do I configure my peer?
  5. How can I test it?

What do I need to use the Bluetooth plugin transport?

If you are a Linux user and you want to use the Bluetooth transport plugin, you should install the BlueZ development libraries (if they aren't already installed). For instructions on how to install the libraries, check the BlueZ site (http://www.bluez.org). If you don't know whether you have the necessary libraries, don't worry: just run the GNUnet configure script, and a notification at the end will warn you if the necessary libraries are missing.

If you are a Windows user, you should have MinGW/MSys2 installed with the latest updates (especially the ws2bth header). If this is your first build of GNUnet on Windows, you should check out the SBuild repository. It will semi-automatically assemble a MinGW/MSys2 installation with a lot of extra packages which are needed for the GNUnet build. So this will ease your work!
Finally, you just have to be sure that you have the correct drivers for your Bluetooth device installed and that your device is on and in a discoverable mode. The Windows Bluetooth Stack supports only the RFCOMM protocol, so we cannot turn on your device programmatically!

How does it work?

The Bluetooth transport plugin uses virtually the same code as the WLAN plugin and only the helper binary is different. The helper takes a single argument, which represents the interface name and is specified in the configuration file. Here are the basic steps that are followed by the helper binary used on Linux:

  • it verifies if the name corresponds to a Bluetooth interface name
  • it verifies if the interface is up (if it is not, it tries to bring it up)
  • it tries to enable the page and inquiry scan in order to make the device discoverable and to accept incoming connection requests

    The above operations require root access, so you should start the transport plugin with root privileges.

  • it finds an available port number, registers an SDP service which other devices use to find out on which port number the server is listening, and switches the socket to listening mode
  • it sends a HELLO message with its address
  • finally, it forwards traffic from the reading sockets to STDOUT and from STDIN to the writing socket

Once in a while the device will make an inquiry scan to discover nearby devices and will send HELLO messages to randomly chosen ones for peer discovery.

What possible errors should I be aware of?

This section is intended for Linux users.

Well there are many ways in which things could go wrong but I will try to present some tools that you could use to debug and some scenarios.

  • bluetoothd -n -d : use this command to run the Bluetooth daemon in the foreground and to print the logging messages
  • hciconfig : can be used to configure the Bluetooth devices. If you run it without any arguments, it will print information about the state of the interfaces. So if you receive an error that the device couldn't be brought up, you should try to bring it up manually and see if it works (use hciconfig -a hciX up). If you can't and the Bluetooth address has the form 00:00:00:00:00:00, it means that there is something wrong with the D-Bus daemon or with the Bluetooth daemon. Use the bluetoothd tool to see the logs.
  • sdptool : can be used to control and interrogate SDP servers. If you encounter problems regarding the SDP server (like the SDP server being down), you should check if the D-Bus daemon is running correctly and if the Bluetooth daemon started correctly (use the bluetoothd tool). Also, sometimes the SDP service could work but somehow the device couldn't register its service. Use sdptool browse [dev-address] to see if the service is registered. There should be a service with the name of the interface and GNUnet as provider.
  • hcitool : another useful tool which can be used to configure the device and to send some particular commands to it
  • hcidump : can be used for low-level debugging

How do I configure my peer?

On Linux, you just have to be sure that the interface name corresponds to the one that you want to use. Use the hciconfig tool to check that. By default it is set to hci0 but you can change it.

A basic configuration looks like this :

[transport-bluetooth]
# Name of the interface (typically hciX)
INTERFACE = hci0
# Real hardware, no testing
TESTMODE = 0
TESTING_IGNORE_KEYS = ACCEPT_FROM;

In order to use the Bluetooth transport plugin when the transport service is started, you must add the plugin name to the default transport service plugins list. For example :

[transport]
...
PLUGINS = dns, bluetooth
...

If you want to use only the Bluetooth plugin set PLUGINS = bluetooth

On Windows, you cannot specify which device to use. The only thing that you should do is to add bluetooth on the plugins list of the transport service.

How can I test it?

If you have two Bluetooth devices on the same machine which use Linux you must:

  • create two different configuration files (one which will use the first interface (hci0) and the other which will use the second interface (hci1)). Let's name them peer1.conf and peer2.conf.
  • run gnunet-peerinfo -c peerX.conf -s in order to generate the peers' private keys. The X must be replaced with 1 or 2.
  • run gnunet-arm -c peerX.conf -s -i=transport in order to start the transport service. (Make sure that you have "bluetooth" on the transport plugins list if the Bluetooth transport service doesn't start.)
  • run gnunet-peerinfo -c peer1.conf -s to get the first peer's ID. If you already know your peer ID (you saved it from the first command), this can be skipped.
  • run gnunet-transport -c peer2.conf -p=PEER1_ID -s to start sending data for benchmarking to the other peer.

This scenario will try to connect the second peer to the first one and then start sending data for benchmarking.

On Windows you cannot test the plugin functionality using two Bluetooth devices on the same machine because, after you install the drivers, conflicts will occur between the Bluetooth stacks. (At least that is what happened on my machine: I wasn't able to use the Bluesoleil stack and the WINDCOMM one at the same time.)

If you have two different machines and your configuration files are good, you can use the same scenario presented at the beginning of this section.

Another way to test the plugin functionality is to create your own application which will use the GNUnet framework with the Bluetooth transport service.

The implementation of the Bluetooth transport plugin

This page describes the implementation of the Bluetooth transport plugin.

First I want to remind you that the Bluetooth transport plugin uses virtually the same code as the WLAN plugin and only the helper binary is different. Also, the purpose of the helper binary of the Bluetooth transport plugin is the same as that of the one used for the WLAN transport plugin: it accesses the interface and then forwards traffic in both directions between the Bluetooth interface and stdin/stdout of the process involved.

The Bluetooth transport plugin can be used on both Linux and Windows platforms.

  1. Linux functionality
  2. Windows functionality
  3. Pending Features

Linux functionality

In order to implement the plugin functionality on Linux I used the BlueZ stack. For the communication with the other devices I used the RFCOMM protocol. Also I used the HCI protocol to gain some control over the device. The helper binary takes a single argument (the name of the Bluetooth interface) and is separated in two stages:

    THE INITIALIZATION

    • first, it checks if we have root privileges (remember that we need root privileges in order to be able to bring the interface up if it is down or to change its state).
    • second, it verifies if the interface with the given name exists.
    • If the interface with that name exists and it is a Bluetooth interface:

      • it creates an RFCOMM socket which will be used for listening and calls the open_device method
      • In the open_device method, it:

        • creates an HCI socket used to send control events to the device
        • searches for the device ID using the interface name
        • saves the device MAC address
        • checks if the interface is down and tries to bring it up
        • checks if the interface is in discoverable mode and, if not, tries to make it discoverable
        • closes the HCI socket and binds the RFCOMM one
        • switches the RFCOMM socket to listening mode
        • registers the SDP service (the service will be used by the other devices to get the port on which this device is listening)
      • it drops the root privileges
    • If the interface is not a Bluetooth interface, the helper exits with a suitable error

    THE LOOP

    The helper binary uses a list where it saves all the connected neighbour devices (neighbours.devices) and two buffers (write_pout and write_std). The first message which is sent is a control message with the device's MAC address in order to announce the peer's presence to the neighbours. Here is a short description of what happens in the main loop:

    • Every time when it receives something from the STDIN it processes the data and saves the message in the first buffer (write_pout). When it has something in the buffer, it gets the destination address from the buffer, searches the destination address in the list (if there is no connection with that device, it creates a new one and saves it to the list) and sends the message.
    • Every time when it receives something on the listening socket it accepts the connection and saves the socket on a list with the reading sockets.
    • Every time when it receives something from a reading socket it parses the message, verifies the CRC and saves it in the write_std buffer in order to be sent later to the STDOUT.

    So in the main loop we use the select function to wait until one of the file descriptors saved in one of the two file descriptor sets is ready to use. The first set (rfds) represents the reading set and it can contain the list with the reading sockets, the STDIN file descriptor or the listening socket. The second set (wfds) is the writing set and it can contain the sending socket or the STDOUT file descriptor. After the select function returns, we check which file descriptor is ready to use and handle that kind of event accordingly. For example: if it is the listening socket, then we accept a new connection and save the socket in the reading list; if it is the STDOUT file descriptor, then we write the message from the write_std buffer to STDOUT.
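
The structure of such a select-based loop can be sketched as follows. This is a heavily simplified Python model for a Unix-like system, not the helper's code: it handles a single peer socket, leaves out the listening socket, message parsing and CRC checks, and uses pipes and a socket pair as stand-ins for STDIN, STDOUT and the RFCOMM connection. Only the names rfds, wfds, write_pout and write_std are taken from the description above.

```python
import os
import select
import socket


def loop_once(stdin_fd, stdout_fd, peer_sock, write_pout, write_std):
    """Run one select() round; the two bytearray buffers are modified in place."""
    rfds = [stdin_fd, peer_sock]       # things we may read from
    wfds = []                          # things we currently want to write to
    if write_pout:
        wfds.append(peer_sock)         # data waiting to go out to the peer
    if write_std:
        wfds.append(stdout_fd)         # data waiting to go to STDOUT
    readable, writable, _ = select.select(rfds, wfds, [], 1.0)
    for fd in writable:
        if fd is peer_sock:
            sent = peer_sock.send(bytes(write_pout))
            del write_pout[:sent]
        else:
            written = os.write(stdout_fd, bytes(write_std))
            del write_std[:written]
    for fd in readable:
        if fd is peer_sock:
            write_std += peer_sock.recv(4096)      # from the peer, bound for STDOUT
        else:
            write_pout += os.read(stdin_fd, 4096)  # from STDIN, bound for the peer


# Stand-ins for the real descriptors (hypothetical test harness).
stdin_r, stdin_w = os.pipe()
stdout_r, stdout_w = os.pipe()
helper_sock, remote_sock = socket.socketpair()
write_pout, write_std = bytearray(), bytearray()

os.write(stdin_w, b"to-peer")
remote_sock.send(b"from-peer")
loop_once(stdin_r, stdout_w, helper_sock, write_pout, write_std)  # fills the buffers
loop_once(stdin_r, stdout_w, helper_sock, write_pout, write_std)  # flushes the buffers
received_by_peer = remote_sock.recv(4096)
written_to_stdout = os.read(stdout_r, 4096)
```

A descriptor is only added to the writing set while its buffer holds data; otherwise select would return immediately on every round because an idle socket is almost always writable.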

    To find out on which port a device is listening, we connect to the local SDP server and search for the service registered for that device.

    You should be aware of the fact that if the device fails to connect to another one when trying to send a message it will attempt one more time. If it fails again, then it skips the message.
    Also you should know that the transport Bluetooth plugin has support for broadcast messages.

    Details about the broadcast implementation

    First I want to point out that the broadcast functionality for the CONTROL messages is not implemented in a conventional way. Since an inquiry scan takes a long time and it would take some time to send a message to all the discoverable devices, I decided to tackle the problem in a different way. Here is how I did it:

    • If it is the first time when I have to broadcast a message I make an inquiry scan and save all the devices' addresses to a vector.
    • After the inquiry scan ends I take the first address from the list and I try to connect to it. If it fails, I try to connect to the next one. If it succeeds, I save the socket to a list and send the message to the device.
    • When I have to broadcast another message, first I search on the list for a new device which I'm not connected to. If there is no new device on the list I go to the beginning of the list and send the message to the old devices. After 5 cycles I make a new inquiry scan to check out if there are new discoverable devices and save them to the list. If there are no new discoverable devices I reset the cycling counter and go again through the old list and send messages to the devices saved in it.
    • Therefore :

      • every time when I have a broadcast message I look up on the list for a new device and send the message to it
      • if I have reached the end of the list 5 times and I'm connected to all the devices from the list, I make a new inquiry scan. The number of list cycles after an inquiry scan can be increased by redefining the MAX_LOOPS variable
      • when there are no new devices I send messages to the old ones.

      Doing so, the broadcast control messages will reach the devices but with delay.

      NOTICE: When I have to send a message to a certain device first I check on the broadcast list to see if we are connected to that device. If not we try to connect to it and in case of success we save the address and the socket on the list. If we are already connected to that device we simply use the socket.
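
The cycling strategy for broadcast messages can be modelled roughly as below. This Python sketch only approximates the behaviour described above and is not derived from the helper's source: the class Broadcaster and its two callbacks are hypothetical, and only the constant MAX_LOOPS is a name taken from the text.

```python
MAX_LOOPS = 5  # list cycles between two inquiry scans (name taken from the text)


class Broadcaster:
    def __init__(self, inquiry_scan, connect):
        self.inquiry_scan = inquiry_scan  # callable returning discoverable addresses
        self.connect = connect            # callable(address) -> connection or None
        self.devices = []                 # addresses found by inquiry scans
        self.sockets = {}                 # address -> open connection
        self.position = 0                 # index used to cycle through old devices
        self.loops = 0                    # completed list cycles since the last scan
        self.scanned = False

    def _scan(self):
        for addr in self.inquiry_scan():
            if addr not in self.devices:
                self.devices.append(addr)
        self.scanned = True
        self.loops = 0

    def next_target(self):
        """Pick the device that receives the next broadcast message."""
        if not self.scanned or self.loops >= MAX_LOOPS:
            self._scan()
        # Prefer a device from the list that we are not connected to yet.
        for addr in self.devices:
            if addr not in self.sockets:
                conn = self.connect(addr)
                if conn is not None:
                    self.sockets[addr] = conn
                    return addr
        # No new device: cycle through the devices we are already connected to.
        connected = [a for a in self.devices if a in self.sockets]
        if not connected:
            return None
        addr = connected[self.position % len(connected)]
        self.position += 1
        if self.position % len(connected) == 0:
            self.loops += 1  # reached the end of the list once more
        return addr


scans = []


def fake_scan():
    scans.append(1)          # count how often an inquiry scan is triggered
    return ["aa", "bb"]


broadcaster = Broadcaster(fake_scan, lambda addr: object())
targets = [broadcaster.next_target() for _ in range(13)]
```

With two reachable devices, the first two broadcasts open the connections, the next ten cycle through the list five times, and the thirteenth broadcast triggers the second inquiry scan.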

Windows functionality

For Windows I decided to use the Microsoft Bluetooth stack, which has the advantage of being included by default since Windows XP SP2. The main disadvantage is that it only supports the RFCOMM protocol, so we will not be able to have low-level control over the Bluetooth device. Therefore it is the user's responsibility to check that the device is up and in discoverable mode. Also, there are no tools which could be used for debugging in order to read the data coming from and going to a Bluetooth device, which obviously hindered my work. Another thing that slowed down the implementation of the plugin (besides my lack of familiarity with the win32 API) was that there were some bugs in MinGW regarding Bluetooth. Now they are solved, but you should keep in mind that you should have the latest updates (especially the ws2bth header).

Besides the fact that it uses Windows Sockets, the Windows implementation follows the same principles as the Linux one:

    It has an initialization part where it initializes Windows Sockets, creates an RFCOMM socket which will be bound and switched to listening mode, and registers an SDP service.

    In the Microsoft Bluetooth API there are two ways to work with the SDP:

    • an easy way which works with very simple service records
    • a hard way which is useful when you need to update or to delete the record

    Since I only needed the SDP service to find out on which port the device is listening, and that did not change, I decided to use the easy way. In order to register the service I used the WSASetService function, and I generated the Universally Unique Identifier with the Windows tool guidgen.exe.

    In the loop section the only difference from the Linux implementation is that I used the GNUNET_NETWORK library for functions like accept, bind, connect or select. I decided to use the GNUNET_NETWORK library because I also needed to interact with the STDIN and STDOUT handles and on Windows the select function is only defined for sockets, and it will not work for arbitrary file handles.

Another difference between the Linux and Windows implementations is that on Linux the Bluetooth address is represented in 48 bits, while on Windows it is represented in 64 bits. Therefore I had to make some changes to the plugin_transport_wlan header.
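
The difference between the two representations can be illustrated with a small conversion sketch. It treats the 48-bit form as six raw bytes and the 64-bit form as an integer whose upper 16 bits are zero; the byte order (most significant byte first) and the function names are assumptions made for the example and do not describe the plugin's actual header layout.

```python
def to_64bit(addr48: bytes) -> int:
    """Widen a 6-byte Bluetooth address to a 64-bit integer value."""
    if len(addr48) != 6:
        raise ValueError("a Bluetooth address has exactly 6 bytes")
    return int.from_bytes(addr48, "big")


def to_48bit(addr64: int) -> bytes:
    """Narrow a 64-bit address value back to 6 bytes."""
    if addr64 < 0 or addr64 >> 48:
        raise ValueError("upper 16 bits must be zero")
    return addr64.to_bytes(6, "big")


mac = bytes.fromhex("001122334455")
wide = to_64bit(mac)
```

Any structure that embeds the address therefore needs room for eight bytes on Windows, which is one plausible reason for having to touch a shared header.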

Also, currently on Windows the Bluetooth plugin doesn't have support for broadcast messages. When it receives a broadcast message it will skip it.

Pending features

  • Implement the broadcast functionality on Windows (currently working on)
  • Implement a testcase for the helper :

    The testcase consists of a program which emulates the plugin and uses the helper. It will simulate connections, disconnections and data transfers.

If you have a new idea about a feature of the plugin or suggestions about how I could improve the implementation you are welcome to comment or to contact me.

WLAN plugin

This section documents how the wlan transport plugin works. Parts which are not implemented yet or could be better implemented are described at the end.

The ATS Subsystem

ATS stands for "automatic transport selection", and the function of ATS in GNUnet is to decide which address (and thus transport plugin) should be used for two peers to communicate, and what bandwidth limits should be imposed on such an individual connection. To help ATS make an informed decision, higher-level services inform the ATS service about their requirements and the quality of the service rendered. The ATS service also interacts with the transport service to be apprised of working addresses and to communicate its resource allocation decisions. Finally, the ATS service's operation can be observed using a monitoring API.

The main logic of the ATS service only collects the available addresses, their performance characteristics and the applications' requirements, but does not make the actual allocation decision. This last critical step is left to an ATS plugin, as we have implemented (currently three) different allocation strategies which differ significantly in their performance and maturity, and it is still unclear if any particular plugin is generally superior.

GNUnet's CORE Subsystem

The CORE subsystem in GNUnet is responsible for securing link-layer communications between nodes in the GNUnet overlay network. CORE builds on the TRANSPORT subsystem which provides for the actual, insecure, unreliable link-layer communication (for example, via UDP or WLAN), and then adds fundamental security to the connections:

  • confidentiality with so-called perfect forward secrecy; we use ECDHE powered by Curve25519 for the key exchange and then use symmetric encryption, encrypting with both AES-256 and Twofish
  • authentication is achieved by signing the ephemeral keys using Ed25519, a deterministic elliptic-curve signature scheme (EdDSA)
  • integrity protection (using SHA-512 to do encrypt-then-MAC)
  • replay protection (using nonces, timestamps, challenge-response, message counters and ephemeral keys)
  • liveness (keep-alive messages, timeout)

Limitations

CORE does not perform routing; using CORE it is only possible to communicate with peers that happen to already be "directly" connected with each other. CORE also does not have an API to allow applications to establish such "direct" connections --- for this, applications can ask TRANSPORT, but TRANSPORT might not be able to establish a "direct" connection. The TOPOLOGY subsystem is responsible for trying to keep a few "direct" connections open at all times. Applications that need to talk to particular peers should use the CADET subsystem, as it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly by applications that either perform their own routing logic (such as anonymous file-sharing) or that do not require routing, for example because they are based on flooding the network. CORE communication is unreliable and delivery is possibly out-of-order. Applications that require reliable communication should use the CADET service. Each application can only queue one message per target peer with the CORE service at any time; messages cannot be larger than approximately 63 kilobytes. If messages are small, CORE may group multiple messages (possibly from different applications) prior to encryption. If permitted by the application (using the cork option), CORE may delay transmissions to facilitate grouping of multiple small messages. If cork is not enabled, CORE will transmit the message as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting bandwidth and congestion control). CORE does not allow flow control; applications are expected to process messages at line-speed. If flow control is needed, applications should use the CADET service.
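
Two of the limits above, one queued message per target peer and a maximum message size, can be modelled in a few lines. The Python sketch below only illustrates the rules from the point of view of a single application; the class CoreQueue, its methods and the exact value of MAX_MESSAGE_SIZE are assumptions and not part of the CORE API.

```python
MAX_MESSAGE_SIZE = 63 * 1024  # "approximately 63 kilobytes"; the exact limit is assumed


class CoreQueue:
    """Per-application view: at most one pending message per target peer."""

    def __init__(self):
        self.pending = {}  # target peer -> message waiting for transmission

    def queue(self, peer, message: bytes) -> bool:
        """Try to queue a message; returns False if a limit would be violated."""
        if len(message) > MAX_MESSAGE_SIZE:
            return False   # too large for CORE
        if peer in self.pending:
            return False   # the slot for this peer is still occupied
        self.pending[peer] = message
        return True

    def transmitted(self, peer):
        """Called once the pending message was handed on; frees the slot."""
        return self.pending.pop(peer, None)


q = CoreQueue()
first = q.queue("bob", b"one")      # accepted
second = q.queue("bob", b"two")     # rejected: Bob's slot is taken
other = q.queue("carol", b"three")  # accepted: different target peer
```

An application that produces messages faster than they can be sent therefore has to keep its own backlog and queue the next message only after the previous one has left.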

When is a peer "connected"?

In addition to the security features mentioned above, CORE also provides one additional key feature to applications using it, and that is a limited form of protocol-compatibility checking. CORE distinguishes between TRANSPORT-level connections (which enable communication with other peers) and application-level connections. Applications using the CORE API will (typically) learn about application-level connections from CORE, and not about TRANSPORT-level connections. When a typical application uses CORE, it will specify a set of message types (from gnunet_protocols.h) that it understands. CORE will then notify the application about connections it has with other peers if and only if those applications registered an intersecting set of message types with their CORE service. Thus, it is quite possible that CORE only exposes a subset of the established direct connections to a particular application --- and different applications running above CORE might see different sets of connections at the same time.

Applications that do not register a handler for any message type are a special case. CORE assumes that these applications merely want to monitor connections (or "all" messages via other callbacks) and will notify those applications about all connections. This is used, for example, by the gnunet-core command-line tool to display the active connections. Note that it is also possible that the TRANSPORT service has more active connections than the CORE service, as the CORE service first has to perform a key exchange with connecting peers before exchanging information about supported message types and notifying applications about the new connection.

libgnunetcore

The CORE API (defined in gnunet_core_service.h) is the basic messaging API used by P2P applications built using GNUnet. It provides applications the ability to send and receive encrypted messages to the peer's "directly" connected neighbours.

As CORE connections are generally "direct" connections, applications must not assume that they can connect to arbitrary peers this way, as "direct" connections may not always be possible. Applications using CORE are notified about which peers are connected. Creating new "direct" connections must be done using the TRANSPORT API.

The CORE API provides unreliable, out-of-order delivery. While the implementation tries to ensure timely, in-order delivery, neither message loss nor reordering is detected, and both must be tolerated by the application. Most importantly, CORE will NOT perform retransmission if messages could not be delivered.

Note that CORE allows applications to queue one message per connected peer. The rate at which each connection operates is influenced by the preferences expressed by local applications as well as restrictions imposed by the other peer. Local applications can express their preferences for particular connections using the "performance" API of the ATS service.

Applications that require more sophisticated transmission capabilities such as TCP-like behavior, or that need to send messages to arbitrary remote peers, should use the CADET API.

The typical use of the CORE API is to connect to the CORE service using GNUNET_CORE_connect, process events from the CORE service (such as peers connecting, peers disconnecting and incoming messages) and send messages to connected peers using GNUNET_CORE_notify_transmit_ready. Note that applications must cancel pending transmission requests if they receive a disconnect event for a peer that had a transmission pending; furthermore, queueing more than one transmission request per peer per application using the service is not permitted.

The CORE API also allows applications to monitor all communications of the peer prior to encryption (for outgoing messages) or after decryption (for incoming messages). This can be useful for debugging, diagnostics or to establish the presence of cover traffic (for anonymity). As monitoring applications are often not interested in the payload, the monitoring callbacks can be configured to only provide the message headers (including the message type and size) instead of copying the full data stream to the monitoring client.

The init callback of the GNUNET_CORE_connect function is called with the hash of the public key of the peer. This public key is used to identify the peer globally in the GNUnet network. Applications are encouraged to check that the provided hash matches the hash that they are using (as theoretically the application may be using a different configuration file with a different private key, which would result in hard to find bugs).

As with most service APIs, the CORE API isolates applications from crashes of the CORE service. If the CORE service crashes, the application will see disconnect events for all existing connections. Once the connections are re-established, the applications will receive matching connect events.

The CORE Client-Service Protocol

This section describes the protocol between an application using the CORE service (the client) and the CORE service process itself.

Setup

When a client connects to the CORE service, it first sends an InitMessage which specifies options for the connection and a set of message type values which are supported by the application. The options bitmask specifies which events the client would like to be notified about. The options include:

GNUNET_CORE_OPTION_NOTHING
No notifications
GNUNET_CORE_OPTION_STATUS_CHANGE
Peers connecting and disconnecting
GNUNET_CORE_OPTION_FULL_INBOUND
All inbound messages (after decryption) with full payload
GNUNET_CORE_OPTION_HDR_INBOUND
Just the MessageHeader of all inbound messages
GNUNET_CORE_OPTION_FULL_OUTBOUND
All outbound messages (prior to encryption) with full payload
GNUNET_CORE_OPTION_HDR_OUTBOUND
Just the MessageHeader of all outbound messages

Typical applications will only monitor for connection status changes.
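
The options listed above combine as a bitmask. The following minimal sketch shows how a client could assemble such a mask and how the receiving side could test it. The DEMO_ names and the numeric values are assumptions made for illustration only; the real constants are defined in gnunet_core_service.h and may use different values.

```c
#include <stdint.h>

/* Illustrative option bits. The values are assumptions and are NOT
   taken from the real GNUNET_CORE_OPTION_* definitions. */
enum demo_core_option
{
  DEMO_OPTION_NOTHING = 0,
  DEMO_OPTION_STATUS_CHANGE = 1 << 0,
  DEMO_OPTION_FULL_INBOUND = 1 << 1,
  DEMO_OPTION_HDR_INBOUND = 1 << 2,
  DEMO_OPTION_FULL_OUTBOUND = 1 << 3,
  DEMO_OPTION_HDR_OUTBOUND = 1 << 4
};

/* Returns 1 if the given notification was requested in 'options',
   0 otherwise. */
int
demo_wants (uint32_t options, enum demo_core_option opt)
{
  return (options & (uint32_t) opt) != 0;
}
```

A typical application in this sketch would pass only DEMO_OPTION_STATUS_CHANGE, while a monitoring tool might additionally set DEMO_OPTION_HDR_INBOUND and DEMO_OPTION_HDR_OUTBOUND.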

The CORE service responds to the InitMessage with an InitReplyMessage which contains the peer's identity. Afterwards, both CORE and the client can send messages.

Notifications

The CORE will send ConnectNotifyMessages and DisconnectNotifyMessages whenever peers connect or disconnect from the CORE (assuming their type maps overlap with the message types registered by the client). When the CORE receives a message that matches the set of message types specified during the InitMessage (or if monitoring is enabled for inbound messages in the options), it sends a NotifyTrafficMessage with the peer identity of the sender and the decrypted payload. The same message format (except with GNUNET_MESSAGE_TYPE_CORE_NOTIFY_OUTBOUND for the message type) is used to notify clients monitoring outbound messages; here, the peer identity given is that of the receiver.

Sending

When a client wants to transmit a message, it first requests a transmission slot by sending a SendMessageRequest which specifies the priority, deadline and size of the message. Note that these values may be ignored by CORE. When CORE is ready for the message, it answers with a SendMessageReady response. The client can then transmit the payload with a SendMessage message. Note that the actual message size in the SendMessage is allowed to be smaller than the size in the original request. A client may at any time send a fresh SendMessageRequest, which supersedes the previous SendMessageRequest; the previous request is then no longer valid. The client can tell which SendMessageRequest the CORE service's SendMessageReady message is for, as all of these messages contain a "unique" request ID (based on a counter incremented by the client for each request).
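
The request ID bookkeeping described above can be sketched as follows. This is only an illustration of the idea of matching a "ready" reply against the most recent request; the struct, the function names and the 16-bit width of the counter are assumptions and do not reflect the actual message layout.

```c
#include <stdint.h>

/* Hypothetical client-side state for one target peer. */
struct demo_send_state
{
  uint16_t next_id;     /* counter, incremented for each request */
  uint16_t pending_id;  /* ID of the request that is currently valid */
  int has_pending;      /* 1 while a request is outstanding */
};

/* Issue a fresh request. It supersedes any earlier request.
   Returns the ID that would be placed into the request message. */
uint16_t
demo_request (struct demo_send_state *s)
{
  s->pending_id = s->next_id++;
  s->has_pending = 1;
  return s->pending_id;
}

/* Handle a "ready" reply carrying 'reply_id'. Returns 1 if the payload
   may be sent now, 0 if the reply belongs to a superseded request. */
int
demo_ready (struct demo_send_state *s, uint16_t reply_id)
{
  if ((! s->has_pending) || (reply_id != s->pending_id))
    return 0;
  s->has_pending = 0;
  return 1;
}
```

In this sketch, a reply for an older request is simply dropped, which mirrors the rule that a superseded request is no longer valid.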

The CORE Peer-to-Peer Protocol

Creating the EphemeralKeyMessage

When the CORE service starts, each peer creates a fresh ephemeral (ECC) public-private key pair and signs the corresponding EphemeralKeyMessage with its long-term key (which we usually call the peer's identity; the hash of the public long-term key is what results in a struct GNUNET_PeerIdentity in all GNUnet APIs). The ephemeral key is ONLY used for an ECDHE exchange by the CORE service to establish symmetric session keys. A peer will use the same EphemeralKeyMessage for all peers for REKEY_FREQUENCY, which is usually 12 hours. After that time, it will create a fresh ephemeral key (forgetting the old one) and broadcast the new EphemeralKeyMessage to all connected peers, resulting in fresh symmetric session keys. Note that peers independently decide on when to discard ephemeral keys; it is not a protocol violation to discard keys more often. Ephemeral keys are also never stored to disk; restarting a peer will thus always create a fresh ephemeral key. The use of ephemeral keys is what provides forward secrecy.

Just before transmission, the EphemeralKeyMessage is patched to reflect the current sender_status, which specifies the current state of the connection from the point of view of the sender. The possible values are:

KX_STATE_DOWN
Initial value, never used on the network
KX_STATE_KEY_SENT
We sent our ephemeral key, do not know the key of the other peer
KX_STATE_KEY_RECEIVED
This peer has received a valid ephemeral key of the other peer, but we are waiting for the other peer to confirm its authenticity (ability to decode) via challenge-response.
KX_STATE_UP
The connection is fully up from the point of view of the sender (now performing keep-alives)
KX_STATE_REKEY_SENT
The sender has initiated a rekeying operation; the other peer has so far failed to confirm a working connection using the new ephemeral key

Establishing a connection

Peers begin their interaction by sending an EphemeralKeyMessage to the other peer once the TRANSPORT service notifies the CORE service about the connection. When a peer receives an EphemeralKeyMessage whose status indicates that the sender does not have the receiver's ephemeral key, the receiver sends its own EphemeralKeyMessage in response.
Additionally, if the receiver has not yet confirmed the authenticity of the sender, it also sends an (encrypted) PingMessage with a challenge (and the identity of the target) to the other peer. Peers receiving a PingMessage respond with an (encrypted) PongMessage which includes the challenge. Peers receiving a PongMessage check the challenge and, if it matches, set the connection to KX_STATE_UP.
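
The connection setup described above can be viewed as a small state machine. The sketch below is a simplified illustration only: the event names are hypothetical, rekeying (KX_STATE_REKEY_SENT) is left out, and the actual logic in gnunet-service-core_kx.c handles more cases.

```c
/* Simplified key exchange states, mirroring the list above. */
enum demo_kx_state
{
  DEMO_KX_DOWN,
  DEMO_KX_KEY_SENT,
  DEMO_KX_KEY_RECEIVED,
  DEMO_KX_UP
};

/* Hypothetical events driving the state machine. */
enum demo_kx_event
{
  DEMO_EV_TRANSPORT_CONNECT, /* TRANSPORT reported a new connection */
  DEMO_EV_VALID_KEY,         /* valid EphemeralKeyMessage received */
  DEMO_EV_PONG_OK,           /* PongMessage with matching challenge */
  DEMO_EV_PONG_BAD           /* PongMessage with wrong challenge */
};

/* Returns the state that follows 's' after event 'e'. Events that do
   not apply in the current state leave the state unchanged. */
enum demo_kx_state
demo_kx_step (enum demo_kx_state s, enum demo_kx_event e)
{
  switch (e)
  {
  case DEMO_EV_TRANSPORT_CONNECT:
    return (DEMO_KX_DOWN == s) ? DEMO_KX_KEY_SENT : s;
  case DEMO_EV_VALID_KEY:
    return (DEMO_KX_KEY_SENT == s) ? DEMO_KX_KEY_RECEIVED : s;
  case DEMO_EV_PONG_OK:
    return (DEMO_KX_KEY_RECEIVED == s) ? DEMO_KX_UP : s;
  case DEMO_EV_PONG_BAD:
    return s;
  }
  return s;
}
```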

Encryption and Decryption

All functions related to the key exchange and encryption/decryption of messages can be found in gnunet-service-core_kx.c (except for the cryptographic primitives, which are in util/crypto*.c).
Given the key material from ECDHE, a key derivation function is used to derive two pairs of encryption and decryption keys for AES-256 and Twofish, as well as initialization vectors and authentication keys (for HMAC). The HMAC is computed over the encrypted payload. Encrypted messages include an iv_seed and the HMAC in the header.

Each encrypted message in the CORE service includes a sequence number and a timestamp in the encrypted payload. The CORE service remembers the largest observed sequence number and a bit-mask which represents which of the previous 32 sequence numbers were already used. Messages with sequence numbers lower than the largest observed sequence number minus 32 are discarded. Messages with a timestamp that is more than REKEY_TOLERANCE (5 minutes) off are also discarded. This of course means that system clocks need to be reasonably synchronized for peers to be able to communicate. Additionally, as the ephemeral key changes every 12h, a peer would not even be able to decrypt messages older than 12h.
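
The sequence number check described above is a sliding-window replay filter. The following is a minimal sketch of that technique, assuming 32-bit sequence numbers and ignoring wrap-around; the struct and function names are hypothetical, and the timestamp check is not shown.

```c
#include <stdint.h>

/* Hypothetical replay window for one session. */
struct demo_replay_window
{
  uint32_t last_seq; /* largest sequence number accepted so far */
  uint32_t seen;     /* bit i set: (last_seq - 1 - i) was already used */
  int started;       /* 0 until the first message was accepted */
};

/* Returns 1 if 'seq' is fresh (and records it), 0 if the message must
   be discarded as a duplicate or as too old. */
int
demo_replay_accept (struct demo_replay_window *w, uint32_t seq)
{
  uint32_t diff;
  uint32_t bit;

  if (! w->started)
  {
    w->started = 1;
    w->last_seq = seq;
    w->seen = 0;
    return 1;
  }
  if (seq == w->last_seq)
    return 0; /* exact duplicate of the newest message */
  if (seq > w->last_seq)
  {
    /* Slide the window forward; the old 'last_seq' becomes a bit. */
    diff = seq - w->last_seq;
    if (diff > 32)
      w->seen = 0;
    else if (32 == diff)
      w->seen = UINT32_C (1) << 31;
    else
      w->seen = (w->seen << diff) | (UINT32_C (1) << (diff - 1));
    w->last_seq = seq;
    return 1;
  }
  diff = w->last_seq - seq;
  if (diff > 32)
    return 0; /* older than the window allows */
  bit = UINT32_C (1) << (diff - 1);
  if (0 != (w->seen & bit))
    return 0; /* already used */
  w->seen |= bit;
  return 1;
}
```

The bit-mask makes it possible to accept messages that arrive slightly out of order while still rejecting every sequence number that was used before.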

Type maps

Once an encrypted connection has been established, peers begin to exchange type maps. Type maps are used to allow the CORE service to determine which (encrypted) connections should be shown to which applications. A type map is an array of 65536 bits representing the different types of messages understood by applications using the CORE service. Each CORE service maintains this map, simply by setting the respective bit for each message type supported by any of the applications using the CORE service. Note that bits for message types embedded in higher-level protocols (such as MESH) will not be included in these type maps.

Typically, the type map of a peer will be sparse. Thus, the CORE service attempts to compress its type map using gzip-style compression ("deflate") prior to transmission. However, if the compression fails to compact the map, the map may also be transmitted without compression (resulting in GNUNET_MESSAGE_TYPE_CORE_COMPRESSED_TYPE_MAP or GNUNET_MESSAGE_TYPE_CORE_BINARY_TYPE_MAP messages respectively). Upon receiving a type map, the respective CORE service notifies applications about the connection to the other peer if they support any message type indicated in the type map (or no message type at all). If the CORE service experiences a connect or disconnect event from an application, it updates its type map (setting or unsetting the respective bits) and notifies its neighbours about the change. The CORE services of the neighbours then in turn generate connect and disconnect events for the peer that sent the type map for their respective applications. As CORE messages may be lost, the CORE service confirms receiving a type map by sending back a GNUNET_MESSAGE_TYPE_CORE_CONFIRM_TYPE_MAP. If such a confirmation (with the correct hash of the type map) is not received, the sender will retransmit the type map (with exponential back-off).
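
A type map as described above can be sketched as a plain bit array with one bit per 16-bit message type. The code below is an illustration under that assumption; the names are hypothetical, and compression and transmission are not shown.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_TYPE_COUNT 65536
#define DEMO_MAP_WORDS (DEMO_TYPE_COUNT / 32)

/* One bit for each possible message type. */
struct demo_type_map
{
  uint32_t bits[DEMO_MAP_WORDS];
};

/* Clear all bits of the map. */
void
demo_map_clear (struct demo_type_map *m)
{
  memset (m->bits, 0, sizeof (m->bits));
}

/* Mark 'type' as supported. */
void
demo_map_set (struct demo_type_map *m, uint16_t type)
{
  m->bits[type / 32] |= UINT32_C (1) << (type % 32);
}

/* Returns 1 if 'type' is marked as supported, 0 otherwise. */
int
demo_map_test (const struct demo_type_map *m, uint16_t type)
{
  return 0 != (m->bits[type / 32] & (UINT32_C (1) << (type % 32)));
}

/* Returns 1 if an application that registered the 'n' types in
   'app_types' should be told about a peer with type map 'peer'.
   Applications with no registered types see every peer. */
int
demo_app_sees_peer (const struct demo_type_map *peer,
                    const uint16_t *app_types,
                    unsigned int n)
{
  unsigned int i;

  if (0 == n)
    return 1;
  for (i = 0; i < n; i++)
    if (demo_map_test (peer, app_types[i]))
      return 1;
  return 0;
}
```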

GNUnet's CADET subsystem

The CADET subsystem in GNUnet is responsible for secure end-to-end communications between nodes in the GNUnet overlay network. CADET builds on the CORE subsystem which provides for the link-layer communication and then adds routing, forwarding and additional security to the connections. CADET offers the same cryptographic services as CORE, but on an end-to-end level. This is done so peers retransmitting traffic on behalf of other peers cannot access the payload data.

  • CADET provides confidentiality with so-called perfect forward secrecy; we use ECDHE powered by Curve25519 for the key exchange and then use symmetric encryption, encrypting with both AES-256 and Twofish
  • authentication is achieved by signing the ephemeral keys using Ed25519, a deterministic variant of ECDSA
  • integrity protection (using SHA-512 to do encrypt-then-MAC, although only 256 bits are sent to reduce overhead)
  • replay protection (using nonces, timestamps, challenge-response, message counters and ephemeral keys)
  • liveness (keep-alive messages, timeout)

In addition to the CORE-like security benefits, CADET offers other properties that make it a more universal service than CORE.

  • CADET can establish channels to arbitrary peers in GNUnet. If a peer is not immediately reachable, CADET will find a path through the network and ask other peers to retransmit the traffic on its behalf.
  • CADET offers (optional) reliability mechanisms. In a reliable channel traffic is guaranteed to arrive complete, unchanged and in-order.
  • CADET takes care of flow and congestion control mechanisms, not allowing the sender to send more traffic than the receiver or the network are able to process.

libgnunetcadet

The CADET API (defined in gnunet_cadet_service.h) is the messaging API used by P2P applications built using GNUnet. It provides applications the ability to send and receive encrypted messages to any peer participating in GNUnet. The API is heavily based on the CORE API.

CADET delivers messages to other peers in "channels". A channel is a permanent connection defined by a destination peer (identified by its public key) and a port number. Internally, CADET tunnels all channels towards a destination peer using one session key and relays the data on multiple "connections", independent from the channels.

Each channel has optional parameters, the most important being the reliability flag. If a channel is created as reliable and a message gets lost at the TRANSPORT/CORE level, CADET will retransmit the lost message and deliver it in order to the destination application.

To communicate with other peers using CADET, it is necessary to first connect to the service using GNUNET_CADET_connect. This function takes several parameters in form of callbacks, to allow the client to react to various events, like incoming channels or channels that terminate, as well as specify a list of ports the client wishes to listen to (at the moment it is not possible to start listening on further ports once connected, but nothing prevents a client from connecting to CADET several times, even using one connection per listening port). The function returns a handle which has to be used for any further interaction with the service.

To connect to a remote peer, a client has to call the GNUNET_CADET_channel_create function. The most important parameters given are the remote peer's identity (its public key) and a port, which specifies which application on the remote peer to connect to, similar to TCP/UDP ports. CADET will then find the peer in the GNUnet network, establish the proper low-level connections and do the necessary key exchanges to ensure authenticated, secure and verified communication. Similar to GNUNET_CADET_connect, GNUNET_CADET_channel_create returns a handle to interact with the created channel.

For every message the client wants to send to the remote application, GNUNET_CADET_notify_transmit_ready must be called, indicating the channel on which the message should be sent and the size of the message (but not the message itself!). Once CADET is ready to send the message, the provided callback will fire, and the message contents are provided to this callback.

Please note that CADET does not provide an explicit notification of when a channel is connected. In loosely connected networks, like big wireless mesh networks, this can take several seconds, even minutes in the worst case. To be alerted when a channel is online, a client can call GNUNET_CADET_notify_transmit_ready immediately after GNUNET_CADET_channel_create. When the callback is activated, it means that the channel is online. The callback may give 0 bytes to CADET if no message is to be sent.

If a transmission was requested but before the callback fires it is no longer needed, it can be cancelled with GNUNET_CADET_notify_transmit_ready_cancel, which uses the handle given back by GNUNET_CADET_notify_transmit_ready. As in the case of CORE, only one message can be requested at a time: a client must not call GNUNET_CADET_notify_transmit_ready again until the callback is called or the request is cancelled.

When a channel is no longer needed, a client can call GNUNET_CADET_channel_destroy to get rid of it. Note that CADET will try to transmit all pending traffic before notifying the remote peer of the destruction of the channel, including retransmitting lost messages if the channel was reliable.

Incoming channels, channels being closed by the remote peer, and traffic on any incoming or outgoing channels are given to the client when CADET executes the callbacks given to it at the time of GNUNET_CADET_connect.

Finally, when an application no longer wants to use CADET, it should call GNUNET_CADET_disconnect, but first all channels and pending transmissions must be closed (otherwise CADET will complain).

GNUnet's NSE subsystem

NSE stands for Network Size Estimation. The NSE subsystem provides other subsystems and users with a rough estimate of the number of peers currently participating in the GNUnet overlay. The computed value is not a precise number as producing a precise number in a decentralized, efficient and secure way is impossible. While NSE's estimate is inherently imprecise, NSE also gives the expected range. For a peer that has been running in a stable network for a while, the real network size will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2 estimate]. We will now give an overview of the algorithm used to calculate the estimate; all of the details can be found in this technical report.

Motivation

Some subsystems, like DHT, need to know the size of the GNUnet network to optimize some parameters of their own protocol. The decentralized nature of GNUnet makes efficiently and securely counting the exact number of peers infeasible. Although there are several decentralized algorithms to count the number of peers in a system, so far there is none to do so securely. Other protocols may allow any malicious peer to manipulate the final result or to take advantage of the system to perform DoS (Denial of Service) attacks against the network. GNUnet's NSE protocol avoids these drawbacks.

Security

The NSE subsystem is designed to be resilient against these attacks. It uses proofs of work to prevent one peer from impersonating a large number of participants, which would otherwise allow an adversary to artificially inflate the estimate. The DoS protection comes from the time-based nature of the protocol: the estimates are calculated periodically and out-of-time traffic is either ignored or stored for later retransmission by benign peers. In particular, peers cannot trigger global network communication at will.

Principle

The algorithm calculates the estimate by finding the globally closest peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more "densely packed" the ID space is, and therefore, more peers are in the network.

Example

Suppose all peers have IDs between 0 and 100 (our ID space), and the random value is 42. If the closest peer has the ID 70, we can imagine that the average "distance" between peers is around 30 and therefore there are around 3 peers in the whole ID space. On the other hand, if the closest peer has the ID 44, we can imagine that the space is rather packed with peers, maybe as many as 50 of them. Naturally, we could have been rather unlucky, and there is only one peer and it happens to have the ID 44. Thus, the current estimate is calculated as the average over multiple rounds, and not just a single sample.
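
The arithmetic of this toy example can be written down directly. The sketch below only mirrors the simplified reasoning of the example (ID space divided by the distance of the closest peer); it is not the formula used by the NSE subsystem, and the function name is hypothetical.

```c
#include <stdint.h>

/* Naive single-round estimate for the toy example:
   number of peers ~ size of ID space / distance of closest peer. */
uint32_t
demo_naive_estimate (uint32_t id_space, uint32_t target, uint32_t closest_id)
{
  uint32_t distance;

  distance = (closest_id > target) ? (closest_id - target)
                                   : (target - closest_id);
  if (0 == distance)
    return id_space; /* avoid division by zero: space looks fully packed */
  return id_space / distance;
}
```

With an ID space of 100 and target 42, a closest ID of 70 yields 3 and a closest ID of 44 yields 50, matching the numbers in the example.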

Algorithm

Given that example, one can imagine that the job of the subsystem is to efficiently communicate the ID of the closest peer to the target value to all the other peers, who will calculate the estimate from it.

Target value

The target value itself is generated by hashing the current time, rounded down to an agreed value. If the rounding amount is 1h (default) and the time is 12:34:56, the time to hash would be 12:00:00. The process is repeated once per rounding interval (in this example, every hour). Each repetition is called a round.
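
The rounding step can be sketched as follows, assuming times are expressed in whole seconds. The function name is hypothetical, and the hashing of the rounded time is not shown.

```c
#include <stdint.h>

/* Round 'now_s' down to the start of the current round, where each
   round lasts 'interval_s' seconds. */
uint64_t
demo_round_start (uint64_t now_s, uint64_t interval_s)
{
  if (0 == interval_s)
    return now_s; /* degenerate interval: nothing to round */
  return now_s - (now_s % interval_s);
}
```

For example, 12:34:56 is second 45296 of the day; with an interval of 3600 seconds the function returns 43200, which is 12:00:00. All peers with roughly synchronized clocks thus derive the same input for the hash.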

Timing

The NSE subsystem has some timing control to avoid everybody broadcasting its ID all at once. Once each peer has the target random value, it compares its own ID to the target and calculates the hypothetical size of the network if that peer were to be the closest. Then it compares the hypothetical size with the estimate from the previous rounds. For each value there is an associated point in the period, let's call it "broadcast time". If its own hypothetical estimate is the same as the previous global estimate, its "broadcast time" will be in the middle of the round. If it is bigger, it will be earlier, and if it is smaller (the most likely case), it will be later. This ensures that the peers closest to the target value start broadcasting their ID first.

Controlled Flooding

When a peer receives a value, first it verifies that it is closer than the closest value it had so far, otherwise it answers the incoming message with a message containing the better value. Then it checks a proof of work that must be included in the incoming message, to ensure that the other peer's ID is not made up (otherwise a malicious peer could claim to have an ID of exactly the target value every round). Once validated, it compares the broadcast time of the received value with the current time and if it's not too early, sends the received value to its neighbors. Otherwise it stores the value until the correct broadcast time comes. This prevents unnecessary traffic of sub-optimal values, since a better value can come before the broadcast time, rendering the previous one obsolete and saving the traffic that would have been used to broadcast it to the neighbors.

Calculating the estimate

Once the closest ID has been spread across the network, each peer gets the exact distance between this ID and the target value of the round and calculates the estimate with a mathematical formula described in the tech report. The estimate generated with this method for a single round is not very precise. Remember the case of the example, where the only peer is the ID 44 and we happen to generate the target value 42, thinking there are 50 peers in the network. Therefore, the NSE subsystem remembers the last 64 estimates and calculates an average over them, giving a result which usually has one bit of uncertainty (the real size could be half of the estimate or twice as much). Note that the actual network size is calculated in powers of two of the raw input, thus one bit of uncertainty means a factor of two in the size estimate.
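
Keeping the last 64 per-round values and averaging them can be sketched with a small ring buffer. This sketch assumes a plain unweighted mean over logarithmic (base 2) per-round values; the names are hypothetical and the actual service may weight or combine the rounds differently.

```c
#define DEMO_HISTORY 64

/* Ring buffer holding the most recent per-round estimates. */
struct demo_history
{
  double values[DEMO_HISTORY]; /* log2 estimates, one per round */
  unsigned int count;          /* number of valid entries */
  unsigned int next;           /* index that is overwritten next */
};

/* Record the estimate of a finished round, replacing the oldest entry
   once the buffer is full. */
void
demo_history_add (struct demo_history *h, double log2_estimate)
{
  h->values[h->next] = log2_estimate;
  h->next = (h->next + 1) % DEMO_HISTORY;
  if (h->count < DEMO_HISTORY)
    h->count++;
}

/* Unweighted mean over all stored rounds; 0.0 if nothing is stored. */
double
demo_history_mean (const struct demo_history *h)
{
  double sum = 0.0;
  unsigned int i;

  if (0 == h->count)
    return 0.0;
  for (i = 0; i < h->count; i++)
    sum += h->values[i];
  return sum / h->count;
}
```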

libgnunetnse

The NSE subsystem has the simplest API of all services, with only two calls: GNUNET_NSE_connect and GNUNET_NSE_disconnect.

The connect call gets a callback function as a parameter and this function is called each time the network agrees on an estimate. This usually is once per round, with some exceptions: if the closest peer has a late local clock and starts spreading its ID after everyone else agreed on a value, the callback might be activated twice in a round, the second value being always bigger than the first. The default round time is set to 1 hour.

The disconnect call disconnects from the NSE subsystem and the callback is no longer called with new estimates.

Results

The callback provides two values: the average and the standard deviation of the last 64 rounds. The values provided by the callback function are logarithmic, this means that the real estimate numbers can be obtained by calculating 2 to the power of the given value (2^average). From a statistics point of view this means that:

  • 68% of the time the real size is included in the interval [2^(average-stddev), 2^(average+stddev)]
  • 95% of the time the real size is included in the interval [2^(average-2*stddev), 2^(average+2*stddev)]
  • 99.7% of the time the real size is included in the interval [2^(average-3*stddev), 2^(average+3*stddev)]

The expected standard deviation for 64 rounds in a network of stable size is 0.2. Thus, we can say that normally:

  • 68% of the time the real size is in the range [-13%, +15%]
  • 95% of the time the real size is in the range [-24%, +32%]
  • 99.7% of the time the real size is in the range [-34%, +52%]
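
The percentages above follow from raising 2 to one, two and three times the standard deviation of 0.2. As a worked check (values rounded to two decimals):

```latex
\begin{align*}
2^{-0.2} &\approx 0.87, & 2^{+0.2} &\approx 1.15 &&\Rightarrow\ [-13\%,\ +15\%] \\
2^{-0.4} &\approx 0.76, & 2^{+0.4} &\approx 1.32 &&\Rightarrow\ [-24\%,\ +32\%] \\
2^{-0.6} &\approx 0.66, & 2^{+0.6} &\approx 1.52 &&\Rightarrow\ [-34\%,\ +52\%]
\end{align*}
```

The last line corresponds to the range of roughly two thirds to three halves of the estimate.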

As said in the introduction, we can be quite sure that usually the real size is between two thirds and three halves of the estimate. This can of course vary with network conditions. Thus, applications may want to also consider the provided standard deviation value, not only the average (in particular, if the standard deviation is very high, the average may be meaningless: the network size is changing rapidly).

Examples

Let's close with a couple of examples.

Average: 10, std dev: 1
Here the estimate would be 2^10 = 1024 peers.
The range in which we can be 95% sure is: [2^8, 2^12] = [256, 4096]. We can be very (>99.7%) sure that the network is not a hundred peers and absolutely sure that it is not a million peers, but somewhere around a thousand.
Average 22, std dev: 0.2
Here the estimate would be 2^22 = 4 Million peers.
The range in which we can be 99.7% sure is: [2^21.4, 2^22.6] = [2.8M, 6.3M]. We can be sure that the network size is around four million, with absolutely no way of it being 1 million.

To put this in perspective: the LHC Higgs boson results were announced with "5 sigma" and "6 sigma" certainties. In this case a 5 sigma minimum would be 2 million and a 6 sigma minimum, 1.8 million.

The NSE Client-Service Protocol

As with the API, the client-service protocol is very simple; it has only two different messages, defined in src/nse/nse.h:

  • GNUNET_MESSAGE_TYPE_NSE_START
    This message has no parameters and is sent from the client to the service upon connection.
  • GNUNET_MESSAGE_TYPE_NSE_ESTIMATE
    This message is sent from the service to the client for every new estimate and upon connection. Contains a timestamp for the estimate, the average and the standard deviation for the respective round.

When the GNUNET_NSE_disconnect API call is executed, the client simply disconnects from the service, with no message involved.

The NSE Peer-to-Peer Protocol

The NSE subsystem only has one message in the P2P protocol, the GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD message.

The key contents of this message are the timestamp to identify the round (differences in system clocks may cause some peers to send messages way too early or way too late, so the timestamp allows other peers to identify such messages easily), the proof of work used to make it difficult to mount a Sybil attack, and the public key, which is used to verify the signature on the message.

Every peer stores a message for the previous, current and next round. The messages for the previous and current round are given to peers that connect to us. The message for the next round is simply stored until our system clock advances to the next round. The message for the current round is what we are flooding the network with right now. At the beginning of each round the peer does the following:

  • calculates its own distance to the target value
  • creates, signs and stores the message for the current round (unless it has a better message in the "next round" slot which came early in the previous round)
  • calculates, based on the stored round message (own or received), when to start flooding it to its neighbors

Upon receiving a message the peer checks the validity of the message (round, proof of work, signature). The next action depends on the contents of the incoming message:

  • if the message is worse than the current stored message, the peer sends the current message back immediately, to stop the other peer from spreading suboptimal results
  • if the message is better than the current stored message, the peer stores the new message and calculates the new target time to start spreading it to its neighbors (excluding the one the message came from)
  • if the message is for the previous round, it is compared to the message stored in the "previous round slot", which may then be updated
  • if the message is for the next round, it is compared to the message stored in the "next round slot", which again may then be updated
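
The case distinction in the list above can be sketched as a single decision function. This is an illustration only: rounds are represented by a hypothetical round counter, "better" is modelled by a hypothetical quality value where higher means closer to the target, validation (round, proof of work, signature) is assumed to have happened already, and treating a message of equal quality as "ignore" is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical, already validated flood message. */
struct demo_flood
{
  uint64_t round;   /* round counter the message belongs to */
  uint32_t quality; /* higher means closer to the target value */
  int valid;        /* 1 if this slot or message holds data */
};

/* The three slots kept by each peer. */
struct demo_slots
{
  uint64_t current_round;
  struct demo_flood prev;
  struct demo_flood cur;
  struct demo_flood next;
};

enum demo_action
{
  DEMO_IGNORE,             /* nothing to do */
  DEMO_REPLY_WITH_OURS,    /* send our better message back at once */
  DEMO_STORE_AND_SCHEDULE, /* store and compute new flooding time */
  DEMO_STORE_ONLY          /* update previous or next round slot */
};

/* Decide what to do with incoming message 'm', updating 's'. */
enum demo_action
demo_handle_flood (struct demo_slots *s, const struct demo_flood *m)
{
  struct demo_flood *slot;

  if (m->round == s->current_round)
  {
    if (s->cur.valid && (m->quality < s->cur.quality))
      return DEMO_REPLY_WITH_OURS;
    if (s->cur.valid && (m->quality == s->cur.quality))
      return DEMO_IGNORE;
    s->cur = *m;
    s->cur.valid = 1;
    return DEMO_STORE_AND_SCHEDULE;
  }
  if (m->round + 1 == s->current_round)
    slot = &s->prev;
  else if (m->round == s->current_round + 1)
    slot = &s->next;
  else
    return DEMO_IGNORE; /* too far from the current round */
  if (slot->valid && (m->quality <= slot->quality))
    return DEMO_IGNORE;
  *slot = *m;
  slot->valid = 1;
  return DEMO_STORE_ONLY;
}
```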

Finally, when it comes to sending the stored message for the current round to the neighbors, a random delay is added for each neighbor to avoid traffic spikes and minimize cross-messages.

GNUnet's HOSTLIST subsystem

Peers in the GNUnet overlay network need address information so that they can connect with other peers. GNUnet uses so called HELLO messages to store and exchange peer addresses. GNUnet provides several methods for peers to obtain this information:

  • out-of-band exchange of HELLO messages (manually, using for example gnunet-peerinfo)
  • HELLO messages shipped with GNUnet (automatic with distribution)
  • UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
  • topology gossiping (learning from other peers we already connected to), and
  • the HOSTLIST daemon covered in this section, which is particularly relevant for bootstrapping new peers.

New peers have no existing connections (and thus cannot learn from gossip among peers), may not have other peers in their LAN and might be started with an outdated set of HELLO messages from the distribution. In this case, getting new peers to connect to the network requires either manual effort or the use of a HOSTLIST to obtain HELLOs.

HELLOs

The basic information peers require to connect to other peers is contained in so-called HELLO messages, which you can think of as business cards. Besides the identity of the peer (based on the cryptographic public key), a HELLO message may contain address information that specifies ways to contact a peer. By obtaining HELLO messages, a peer can learn how to contact other peers.

Overview for the HOSTLIST subsystem

The HOSTLIST subsystem provides a way to distribute and obtain contact information to connect to other peers using a simple HTTP GET request. Its implementation is split in three parts: the main file for the daemon itself (gnunet-daemon-hostlist.c), the HTTP client used to download peer information (hostlist-client.c) and the server component used to provide this information to other peers (hostlist-server.c). The server is basically a small HTTP web server (based on GNU libmicrohttpd) which provides a list of HELLOs known to the local peer for download. The client component is basically an HTTP client (based on libcurl) which can download hostlists from one or more websites. The hostlist format is a binary blob containing a sequence of HELLO messages. Note that any HTTP server can theoretically serve a hostlist; the built-in hostlist server simply makes it convenient to offer this service.
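
Since a hostlist is a sequence of concatenated messages, a downloaded blob can be split by walking the message headers. The sketch below assumes the usual GNUnet message framing, where each message starts with a 4-byte header holding a 16-bit total size followed by a 16-bit type, both in network byte order. The function name is hypothetical, and no HELLO-specific parsing or validation is attempted.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the messages in 'buf' (of 'len' bytes). Returns the number of
   complete messages, or -1 if the blob is truncated or malformed. */
int
demo_count_messages (const uint8_t *buf, size_t len)
{
  size_t off = 0;
  size_t size;
  int count = 0;

  while (off < len)
  {
    if (len - off < 4)
      return -1; /* not enough bytes left for a header */
    /* The first two header bytes are the size in big-endian order. */
    size = ((size_t) buf[off] << 8) | (size_t) buf[off + 1];
    if ((size < 4) || (size > len - off))
      return -1; /* size smaller than header or beyond the blob */
    off += size;
    count++;
  }
  return count;
}
```

A client would hand each message found this way to further processing; rejecting the whole blob on the first malformed header is a design choice of this sketch.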

Features

The HOSTLIST daemon can:

  • provide HELLO messages with validated addresses obtained from PEERINFO for download by other peers
  • download HELLO messages and forward these messages to the TRANSPORT subsystem for validation
  • advertise the URL of this peer's hostlist to other peers via gossip
  • automatically learn about hostlist servers from the gossip of other peers

Limitations

The HOSTLIST daemon does not:

  • verify the cryptographic information in the HELLO messages
  • verify the address information in the HELLO messages

Interacting with the HOSTLIST daemon

The HOSTLIST subsystem is currently implemented as a daemon, so there is no need for the user to interact with it and therefore there is no command line tool and no API to communicate with the daemon. In the future, we can envision changing this to allow users to manually trigger the download of a hostlist.

Since there is no command line interface to interact with HOSTLIST, the only way to interact with the hostlist is to use STATISTICS to obtain or modify information about the status of HOSTLIST:

$ gnunet-statistics -s hostlist

In particular, HOSTLIST includes a persistent value in statistics that specifies when the hostlist server might be queried next. As this value increases exponentially during runtime, developers may want to reset or manually adjust it. Note that HOSTLIST (but not STATISTICS) needs to be shut down if changes to this value are to have any effect on the daemon (as HOSTLIST does not monitor STATISTICS for changes to the download frequency).

Hostlist security: address validation

Since information obtained from other parties cannot be trusted without validation, we have to distinguish between validated and unvalidated addresses. Before using (and so trusting) information from other parties, this information has to be double-checked (validated). Address validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and the TRANSPORT subsystem. When acting as a server, the daemon obtains valid (validated) peer information (HELLO messages) from the PEERINFO service and provides it to other peers. When acting as a client, it contacts the HOSTLIST servers specified in the configuration, downloads the (unvalidated) list of HELLO messages and forwards this information to the TRANSPORT service to validate the addresses.

The HOSTLIST daemon

The hostlist daemon is the main component of the HOSTLIST subsystem. It is started by the ARM service and (if configured) starts the HOSTLIST client and server components.

If the daemon provides a hostlist itself, it can advertise its own hostlist to other peers. To do so, it sends a GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT message to other peers when they connect to this peer on the CORE level. This hostlist advertisement message contains the URL to access the HOSTLIST HTTP server of the sender. The daemon may also subscribe to this type of message from the CORE service, and then forward this kind of message to the HOSTLIST client. The client then uses all available URLs to download peer information when necessary.

When starting, the HOSTLIST daemon first connects to the CORE subsystem and, if hostlist learning is enabled, registers a CORE handler to receive this kind of message. Next it starts (if configured) the client and server. It passes them pointers to the CORE connect, disconnect and receive handlers, in which the client and server store their functions, so the daemon can notify them about CORE events.

To clean up on shutdown, the daemon has a cleaning task, shutting down all subsystems and disconnecting from CORE.

The HOSTLIST server

The server provides a way for other peers to obtain HELLOs. Basically it is a small web server other peers can connect to and download a list of HELLOs using standard HTTP; it may also advertise the URL of the hostlist to other peers connecting on CORE level.

The HTTP Server

During startup, the server starts a web server listening on the port specified with the HTTPPORT value (default 8080). In addition it connects to the PEERINFO service to obtain peer information. The HOSTLIST server uses the GNUNET_PEERINFO_iterate function to request HELLO information for all peers and adds their information to a new hostlist if they are suitable (expired addresses and HELLOs without addresses are both not suitable) and the maximum size for a hostlist is not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When PEERINFO finishes (with a last NULL callback), the server destroys the previous hostlist response available for download on the web server and replaces it with the updated hostlist. The hostlist format is basically a sequence of HELLO messages (as obtained from PEERINFO) without any special tokenization. Since each HELLO message contains a size field, the response can easily be split into separate HELLO messages by the client.
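
The splitting step described above can be sketched in a few lines of C. This is only an illustration, not the code from hostlist-client.c: it assumes that every message starts with a 4-byte header holding a 16-bit total size followed by a 16-bit type, both in network byte order, and the names split_hostlist, message_cb and count_message are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: each message starts with a 4-byte header made of a
 * 16-bit total size and a 16-bit type, both in network byte order. */
#define HEADER_SIZE 4

/* Hypothetical callback type, invoked once per complete message. */
typedef void (*message_cb) (void *cls, const uint8_t *msg, uint16_t size);

/* Walks `buf` and calls `cb` for every complete message.
 * Returns the number of bytes consumed (the caller keeps the rest until
 * more data arrives), or -1 if a size field is smaller than the header. */
static long
split_hostlist (const uint8_t *buf, size_t len, message_cb cb, void *cls)
{
  size_t off = 0;

  while (len - off >= HEADER_SIZE)
  {
    uint16_t size = (uint16_t) ((buf[off] << 8) | buf[off + 1]);

    if (size < HEADER_SIZE)
      return -1;                /* malformed: size cannot cover the header */
    if (size > len - off)
      break;                    /* message not fully downloaded yet */
    if (NULL != cb)
      cb (cls, &buf[off], size);
    off += size;
  }
  return (long) off;
}

/* Example callback: counts messages; `cls` points to an unsigned int. */
static void
count_message (void *cls, const uint8_t *msg, uint16_t size)
{
  (void) msg;
  (void) size;
  (*(unsigned int *) cls)++;
}
```

Because the function reports how many bytes it consumed, a caller can feed it a partially filled download buffer and keep the unconsumed tail for the next chunk of data.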

A HOSTLIST client connecting to the HOSTLIST server will receive the hostlist as an HTTP response, and the server will terminate the connection with the result code HTTP 200 OK. The connection will be closed immediately if no hostlist is available.

Advertising the URL

The server also advertises the URL to download the hostlist to other peers if hostlist advertisement is enabled. When a new peer connects and has hostlist learning enabled, the server sends a GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT message to this peer using the CORE service.

The HOSTLIST client

The client provides the functionality to download the list of HELLOs from a set of URLs. It performs a standard HTTP request to the URLs configured and learned from advertisement messages received from other peers. When a HELLO is downloaded, the HOSTLIST client forwards the HELLO to the TRANSPORT service for validation.

The client supports two modes of operation: download of HELLOs (bootstrapping) and learning of URLs.

Bootstrapping

For bootstrapping, it schedules a task to download the hostlist from the set of known URLs. The downloads are only performed if the number of current connections is smaller than a minimum number of connections (at the moment 4). The interval between downloads increases exponentially; however, once the interval becomes longer than an hour, its growth is limited. At that point, the interval is capped at (number of connections * 1h).
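
A minimal sketch of this scheduling policy follows. It is not the daemon's actual code: the one-second starting interval is an assumption, the function names are hypothetical, and the cap is written as (connections + 1) hours rather than (connections * 1h) so that a peer with zero connections still gets a non-zero cap.

```c
#include <stdint.h>

#define MIN_CONNECTIONS 4u                    /* threshold quoted in the text */
#define ONE_SECOND_US 1000000ULL
#define ONE_HOUR_US (3600ULL * ONE_SECOND_US)

/* A download is only attempted while the peer has fewer connections
 * than the minimum. */
static int
should_download (unsigned int connections)
{
  return connections < MIN_CONNECTIONS;
}

/* Doubles the current delay (starting at an assumed one second) and caps
 * the result at (connections + 1) hours.  The "+ 1" is an assumption of
 * this sketch so that the cap never collapses to zero. */
static uint64_t
next_delay_us (uint64_t current_us, unsigned int connections)
{
  uint64_t cap = ONE_HOUR_US * (uint64_t) (connections + 1);
  uint64_t next = (0 == current_us) ? ONE_SECOND_US : current_us * 2;

  return (next > cap) ? cap : next;
}
```

Tying the cap to the number of connections means that a well-connected peer polls hostlist servers less often than a poorly connected one.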

Once the decision has been taken to download HELLOs, the daemon chooses a random URL from the list of known URLs. URLs can be configured in the configuration or be learned from advertisement messages. The client uses an HTTP client library (libcurl) to initiate the download using the libcurl multi interface. Libcurl passes the data to the callback_download function, which stores the data in a buffer if space is available and the maximum size for a hostlist download is not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When a full HELLO has been downloaded, the HOSTLIST client offers this HELLO message to the TRANSPORT service for validation. When the download has finished or failed, statistical information about the quality of this URL is updated.

Learning

The client also manages hostlist advertisements from other peers. The HOSTLIST daemon forwards GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT messages to the client subsystem, which extracts the URL from the message. Next, a test of the newly obtained URL is performed by triggering a download from the new URL. If the URL works correctly, it is added to the list of working URLs.

The size of the list of URLs is restricted, so if an additional server is added and the list is full, the URL with the worst quality ranking (determined, for example, by the number of successful downloads and the number of HELLOs obtained) is discarded. During shutdown the list of URLs is saved to a file for persistence, and it is loaded again on startup. URLs from the configuration file are never discarded.
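
The eviction rule can be illustrated with a small fixed-size table. This is a sketch under stated assumptions, not the data structure used by hostlist-client.c: the limit of three entries, the single numeric quality score and all names (HostlistEntry, HostlistTable, table_add_learned) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_URLS 3            /* hypothetical limit, kept small for the example */
#define MAX_URL_LEN 128

struct HostlistEntry
{
  char url[MAX_URL_LEN];
  unsigned int quality;       /* higher is better (assumed metric) */
  int from_config;            /* non-zero: configured URL, never discarded */
};

struct HostlistTable
{
  struct HostlistEntry entries[MAX_URLS];
  size_t count;
};

/* Adds a learned URL.  If the table is full, the learned entry with the
 * lowest quality is overwritten.  Returns 1 if the URL was stored and 0
 * if it was not (URL too long, or every slot holds a configured URL). */
static int
table_add_learned (struct HostlistTable *t, const char *url,
                   unsigned int quality)
{
  struct HostlistEntry *slot = NULL;

  if (strlen (url) >= MAX_URL_LEN)
    return 0;
  if (t->count < MAX_URLS)
  {
    slot = &t->entries[t->count++];
  }
  else
  {
    for (size_t i = 0; i < t->count; i++)
    {
      struct HostlistEntry *e = &t->entries[i];

      if (e->from_config)
        continue;             /* configured URLs are never evicted */
      if ((NULL == slot) || (e->quality < slot->quality))
        slot = e;
    }
    if (NULL == slot)
      return 0;
  }
  strcpy (slot->url, url);    /* safe: length was checked above */
  slot->quality = quality;
  slot->from_config = 0;
  return 1;
}
```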

Usage

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES section for the ARM services. This is done in the default configuration.

For more information on how to configure the HOSTLIST subsystem see the installation handbook:
Configuring the hostlist to bootstrap
Configuring your peer to provide a hostlist

GNUnet's IDENTITY subsystem

Identities of "users" in GNUnet are called egos. Egos can be used as pseudonyms (fake names) or be tied to an organization (for example, GNU) or even the actual identity of a human. GNUnet users are expected to have many egos. They might have one tied to their real identity, some for organizations they manage, and more for different domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. The identity service manages the private keys of the egos of the local user; it does not manage identities of other users (public keys). Public keys for other users need names to become manageable. GNUnet uses the GNU Name System (GNS) to give names to other users and manage their public keys securely. This chapter is about the IDENTITY service, which is about the management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519, using RFC 6979, as required by GNS). Thus, users can perform actions under a particular ego by using (signing with) a particular private key. Other users can then confirm that the action was really performed by that ego by checking the signature against the respective public key.

The IDENTITY service allows users to associate a human-readable name with each ego. This way, users can use names that will remind them of the purpose of a particular ego. The IDENTITY service will store the respective private keys and allows applications to access key information by name. Users can change the name that is locally (!) associated with an ego. Egos can also be deleted, which means that the private key will be removed and it thus will not be possible to perform actions with that ego in the future.

Additionally, the IDENTITY subsystem can associate service functions with egos. For example, GNS requires the ego that should be used for the shorten zone. GNS will ask IDENTITY for an ego for the "gns-short" service. The IDENTITY service has a mapping of such service strings to the name of the ego that the user wants to use for this service, for example "my-short-zone-ego".

Finally, the IDENTITY API provides access to a special ego, the anonymous ego. The anonymous ego is special in that its private key is not really private, but fixed and known to everyone. Thus, anyone can perform actions as anonymous. This can be useful as with this trick, code does not have to contain a special case to distinguish between anonymous and pseudonymous egos.

libgnunetidentity

Connecting to the service

First, typical clients connect to the identity service using GNUNET_IDENTITY_connect. This function takes a callback as a parameter. If the given callback parameter is non-null, it will be invoked to notify the application about the current state of the identities in the system.

  • First, it will be invoked on all known egos at the time of the connection. For each ego, a handle to the ego and the user's name for the ego will be passed to the callback. Furthermore, a void ** context argument will be provided which gives the client the opportunity to associate some state with the ego.
  • Second, the callback will be invoked with NULL for the ego, the name and the context. This signals that the (initial) iteration over all egos has completed.
  • Then, the callback will be invoked whenever something changes about an ego. If an ego is renamed, the callback is invoked with the ego handle of the ego that was renamed, and the new name. If an ego is deleted, the callback is invoked with the ego handle and a name of NULL. In the deletion case, the application should also release resources stored in the context.
  • When the application destroys the connection to the identity service using GNUNET_IDENTITY_disconnect, the callback is again invoked with the ego and a name of NULL (equivalent to deletion of the egos). This should again be used to clean up the per-ego context.

The ego handle passed to the callback remains valid until the callback is invoked with a name of NULL, so it is safe to store a reference to the ego's handle.
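
The callback contract above boils down to three cases, which the following self-contained sketch distinguishes. It does not link against libgnunetidentity: struct Ego is an opaque stand-in for the ego handle, and EgoState, AppState and ego_event are hypothetical names for application code.

```c
#include <stdlib.h>
#include <string.h>

struct Ego;                         /* opaque stand-in for the ego handle */

struct EgoState                     /* hypothetical per-ego application state */
{
  char name[64];
};

struct AppState
{
  unsigned int known;               /* egos currently tracked */
  int initial_done;                 /* set once the initial iteration ends */
};

/* Mirrors the callback shape described above: closure, ego, per-ego
 * context and the user's name for the ego. */
static void
ego_event (void *cls, struct Ego *ego, void **ctx, const char *name)
{
  struct AppState *app = cls;
  struct EgoState *st;

  if (NULL == ego)
  {
    app->initial_done = 1;          /* end of the initial iteration */
    return;
  }
  if (NULL == name)
  {
    /* Ego deleted or connection destroyed: release the per-ego state. */
    if (NULL != *ctx)
    {
      free (*ctx);
      *ctx = NULL;
      app->known--;
    }
    return;
  }
  if (NULL == *ctx)
  {
    /* First time this ego is seen: attach fresh, zeroed state. */
    *ctx = calloc (1, sizeof (struct EgoState));
    if (NULL == *ctx)
      return;                       /* out of memory: skip this ego */
    app->known++;
  }
  st = *ctx;
  /* New ego or rename: remember the current name (always NUL-terminated,
   * since the last byte is never written). */
  strncpy (st->name, name, sizeof (st->name) - 1);
}
```

Keeping the per-ego state behind the context pointer means the application never needs its own lookup table from ego handles to state.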

Operations on Egos

Given an ego handle, the main operations are to get its associated private key using GNUNET_IDENTITY_ego_get_private_key or its associated public key using GNUNET_IDENTITY_ego_get_public_key.

The other operations on egos are pretty straightforward. Using GNUNET_IDENTITY_create, an application can request the creation of an ego by specifying the desired name. The operation will fail if that name is already in use. Using GNUNET_IDENTITY_rename the name of an existing ego can be changed. Finally, egos can be deleted using GNUNET_IDENTITY_delete. All of these operations will trigger updates to the callback given to the GNUNET_IDENTITY_connect function of all applications that are connected with the identity service at the time. GNUNET_IDENTITY_cancel can be used to cancel the operations before the respective continuations would be called. It is not guaranteed that the operation will not be completed anyway, only the continuation will no longer be called.

The anonymous Ego

A special way to obtain an ego handle is to call GNUNET_IDENTITY_ego_get_anonymous, which returns an ego for the "anonymous" user --- anyone knows and can get the private key for this user, so it is suitable for operations that are supposed to be anonymous but require signatures (for example, to avoid a special path in the code). The anonymous ego is always valid and accessing it does not require a connection to the identity service.

Convenience API to look up a single ego

As applications commonly just have to look up a single ego, there is a convenience API to do just that. Use GNUNET_IDENTITY_ego_lookup to look up a single ego by name. Note that this is the user's name for the ego, not the service function. The resulting ego will be returned via a callback and will only be valid during that callback. The operation can be cancelled via GNUNET_IDENTITY_ego_lookup_cancel (cancellation is only legal before the callback is invoked).

Associating egos with service functions

The GNUNET_IDENTITY_set function is used to associate a particular ego with a service function. The name used by the service and the ego are given as arguments. Afterwards, the service can use its name to look up the associated ego using GNUNET_IDENTITY_get.

The IDENTITY Client-Service Protocol

A client connecting to the identity service first sends a message with type GNUNET_MESSAGE_TYPE_IDENTITY_START to the service. After that, the client will receive information about changes to the egos by receiving messages of type GNUNET_MESSAGE_TYPE_IDENTITY_UPDATE. Those messages contain the private key of the ego and the user's name of the ego (or zero bytes for the name to indicate that the ego was deleted). A special bit end_of_list is used to indicate the end of the initial iteration over the identity service's egos.

The client can trigger changes to the egos by sending CREATE, RENAME or DELETE messages. The CREATE message contains the private key and the desired name. The RENAME message contains the old name and the new name. The DELETE message only needs to include the name of the ego to delete. The service responds to each of these messages with a RESULT_CODE message which indicates success or error of the operation, and possibly a human-readable error message.

Finally, the client can bind the name of a service function to an ego by sending a SET_DEFAULT message with the name of the service function and the private key of the ego. Such bindings can then be resolved using a GET_DEFAULT message, which includes the name of the service function. The identity service will respond to a GET_DEFAULT request with a SET_DEFAULT message containing the respective information, or with a RESULT_CODE to indicate an error.

GNUnet's NAMESTORE Subsystem

The NAMESTORE subsystem provides persistent storage for local GNS zone information. All local GNS zone information is managed by NAMESTORE. It provides both the functionality to administer local GNS information (e.g. delete and add records) as well as to retrieve GNS information (e.g. to list name information in a client). NAMESTORE only manages the persistent storage of zone information belonging to the user running the service: GNS information from other users obtained from the DHT is stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information with good performance. Here sqlite, MySQL and PostgreSQL are supported database backends. NAMESTORE clients interact with the IDENTITY subsystem to obtain cryptographic information about zones based on egos as described with the IDENTITY subsystem, but internally NAMESTORE refers to zones using the ECDSA private key. In addition, it collaborates with the NAMECACHE subsystem and stores zone information in the GNS cache when local information is modified, to increase look-up performance for local information.

NAMESTORE provides functionality to look up and store records, to iterate over a specific zone or all zones and to monitor zones for changes. NAMESTORE functionality can be accessed using the NAMESTORE API or the NAMESTORE command line tool.

libgnunetnamestore

To interact with NAMESTORE, clients first connect to the NAMESTORE service using the GNUNET_NAMESTORE_connect function, passing a configuration handle. As a result they obtain a NAMESTORE handle that they can use for operations, or NULL if the connection failed.

To disconnect from NAMESTORE, clients use GNUNET_NAMESTORE_disconnect and specify the handle to disconnect.

NAMESTORE internally uses the ECDSA private key to refer to zones. These private keys can be obtained from the IDENTITY subsystem. Here egos can be used to refer to zones, or the default ego assigned to the GNS subsystem can be used to obtain the master zone's private key.

Editing Zone Information

NAMESTORE provides functions to lookup records stored under a label in a zone and to store records under a label in a zone.

To store (and delete) records, the client uses the GNUNET_NAMESTORE_records_store function and has to provide the namestore handle to use, the private key of the zone, the label to store the records under, the records and the number of records, plus a callback function. After the operation is performed, NAMESTORE will call the provided callback function with the result: GNUNET_SYSERR on failure (including timeout, queue drop or failure to validate), GNUNET_NO if the content was already there or not found, or GNUNET_YES (or another positive value) on success, plus an additional error message.

Records are deleted by using the store command with 0 records to store. It is important to note that records are not merged when records already exist under the label. A client therefore first has to retrieve the existing records, merge them with the new records and then store the result.
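
The client-side merge step might look roughly like the sketch below. It uses a simplified stand-in for a GNS record (type, data pointer and size only) rather than the real record structure, treats two records as duplicates when type and data match, and the names Record, same_record and merge_records are hypothetical. The merged array would then be passed to the store operation in a single call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct Record                 /* simplified stand-in for a GNS record */
{
  uint32_t type;
  const void *data;
  size_t data_size;
};

/* Two records count as equal if type and payload are identical. */
static int
same_record (const struct Record *a, const struct Record *b)
{
  return (a->type == b->type) &&
         (a->data_size == b->data_size) &&
         ((0 == a->data_size) ||
          (0 == memcmp (a->data, b->data, a->data_size)));
}

/* Copies the existing records into `out`, then appends every record from
 * `add` that is not already present.  Returns the merged record count,
 * or -1 if `out` (with room for `out_cap` records) is too small. */
static long
merge_records (const struct Record *old, size_t n_old,
               const struct Record *add, size_t n_add,
               struct Record *out, size_t out_cap)
{
  size_t n = 0;

  for (size_t i = 0; i < n_old; i++)
  {
    if (n == out_cap)
      return -1;
    out[n++] = old[i];
  }
  for (size_t i = 0; i < n_add; i++)
  {
    int dup = 0;

    for (size_t j = 0; j < n; j++)
      if (same_record (&add[i], &out[j]))
      {
        dup = 1;
        break;
      }
    if (dup)
      continue;
    if (n == out_cap)
      return -1;
    out[n++] = add[i];
  }
  return (long) n;
}
```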

To perform a lookup operation, the client uses the GNUNET_NAMESTORE_records_lookup function. Here it has to pass the namestore handle, the private key of the zone and the label. It also has to provide a callback function which will be called with the result of the lookup operation: the zone for the records, the label, and the records including the number of records included.

A special operation is used to set the preferred nickname for a zone. This nickname is stored with the zone and is automatically merged with all labels and records stored in a zone. Here the client uses the GNUNET_NAMESTORE_set_nick function and passes the private key of the zone, the nickname as a string, plus a callback that receives the result of the operation.

Iterating Zone Information

A client can iterate over all information in a zone or all zones managed by NAMESTORE. Here a client uses the GNUNET_NAMESTORE_zone_iteration_start function and passes the namestore handle, the zone to iterate over and a callback function to call with the result. If the client wants to iterate over all zones, it passes NULL for the zone. A GNUNET_NAMESTORE_ZoneIterator handle is returned to be used to continue iteration.

NAMESTORE calls the callback for every result and expects the client to call GNUNET_NAMESTORE_zone_iterator_next to continue to iterate or GNUNET_NAMESTORE_zone_iterator_stop to interrupt the iteration. When NAMESTORE has reached the last item, it will call the callback with a NULL value to indicate the end of the iteration.

Monitoring Zone Information

Clients can also monitor zones to be notified about changes. Here the client uses the GNUNET_NAMESTORE_zone_monitor_start function and passes the private key of the zone and a callback function to call with updates for a zone. The client can request to first obtain the existing zone information by iterating over the zone, and can specify a synchronization callback to be called when the client and the namestore are synced.

On an update, NAMESTORE will call the callback with the private key of the zone, the label and the records and their number.

To stop monitoring, the client calls GNUNET_NAMESTORE_zone_monitor_stop and passes the handle obtained from the function to start the monitoring.

GNUnet's PEERINFO subsystem

The PEERINFO subsystem is used to store verified (validated) information about known peers in a persistent way. It obtains these addresses, for example, from the TRANSPORT service, which is in charge of address validation. Validation means that the information in the HELLO message is checked by connecting to the addresses and performing a cryptographic handshake to authenticate the peer instance claiming to be reachable at these addresses. PEERINFO does not validate the HELLO messages itself but only stores them and gives them to interested clients.

As future work, we are thinking about moving from storing just HELLO messages to providing a generic persistent per-peer information store. More and more subsystems tend to need to store per-peer information in a persistent way. To avoid duplicating this functionality, we plan to provide a PEERSTORE service providing this functionality.

Features

  • Persistent storage
  • Client notification mechanism on update
  • Periodic clean up for expired information
  • Differentiation between public and friend-only HELLO

Limitations

  • Does not perform HELLO validation

Peer Information

The PEERINFO subsystem stores this information in the form of HELLO messages, which you can think of as business cards. These HELLO messages contain the public key of a peer and the addresses a peer can be reached under. The addresses include an expiration date describing how long they are valid. This information is updated regularly by the TRANSPORT service by revalidating the address. If an address is expired and not renewed, it can be removed from the HELLO message.

Some peers do not want to have their HELLO messages distributed to other peers, especially when GNUnet's friend-to-friend mode is enabled. To prevent this undesired distribution, PEERINFO distinguishes between public and friend-only HELLO messages. Public HELLO messages can be freely distributed to other (possibly unknown) peers (for example using the hostlist, gossiping, broadcasting), whereas friend-only HELLO messages may not be distributed to other peers. Friend-only HELLO messages have an additional flag friend_only set internally. For public HELLO messages this flag is not set. PEERINFO does not and cannot check if a client is allowed to obtain a specific HELLO type.

The HELLO messages can be managed using the GNUnet HELLO library. Other GNUnet systems can obtain this information from PEERINFO and use it for their purposes. Clients are, for example, the HOSTLIST component, which provides this information to other peers in the form of a hostlist, or the TRANSPORT subsystem, which uses this information to maintain connections to other peers.

Startup

During startup the PEERINFO service loads persistent HELLOs from disk. First PEERINFO parses the directory configured in the HOSTS value of the PEERINFO configuration section, which is where PEERINFO stores its information. For all files found in this directory, valid HELLO messages are extracted. In addition, it loads HELLO messages shipped with the GNUnet distribution. These HELLOs are used to simplify network bootstrapping by providing valid peer information with the distribution. The use of these HELLOs can be prevented by setting the USE_INCLUDED_HELLOS option in the PEERINFO configuration section to NO. Files containing invalid information are removed.

Managing Information

The PEERINFO service stores information about known peers and a single HELLO message for every peer. A peer does not need to have a HELLO if no information is available. HELLO information from different sources, for example a HELLO obtained from a remote HOSTLIST and a second HELLO stored on disk, is combined and merged into one single HELLO message per peer which will be given to clients. During this merge process the HELLO is immediately written to disk to ensure persistence.

In addition, PEERINFO periodically scans the directory where information is stored for HELLO messages that are empty or contain expired TRANSPORT addresses. This periodic task scans all files in the directory and recreates the HELLO messages it finds. Expired TRANSPORT addresses are removed from the HELLO and if the HELLO does not contain any valid addresses, it is discarded and removed from disk.
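
The clean-up rule can be sketched as follows. The Address structure is a simplified stand-in for one address inside a HELLO (the real messages are handled through the GNUnet HELLO library), and drop_expired is a hypothetical name. A return value of 0 corresponds to the case in which the HELLO would be discarded and removed from disk.

```c
#include <stddef.h>
#include <stdint.h>

struct Address                /* simplified stand-in for one HELLO address */
{
  const char *transport;      /* e.g. "tcp" */
  uint64_t expiration_us;     /* absolute expiration time in microseconds */
};

/* Removes expired addresses in place while preserving the order of the
 * remaining ones.  Returns the number of addresses that are still valid;
 * 0 means the HELLO has no usable address left. */
static size_t
drop_expired (struct Address *addrs, size_t n, uint64_t now_us)
{
  size_t kept = 0;

  for (size_t i = 0; i < n; i++)
    if (addrs[i].expiration_us > now_us)
      addrs[kept++] = addrs[i];
  return kept;
}
```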

Obtaining Information

When a client requests information from PEERINFO, PEERINFO performs a lookup for the respective peer or all peers if desired and transmits this information to the client. The client can specify if friend-only HELLOs have to be included or not and PEERINFO filters the respective HELLO messages before transmitting information.

To notify clients about changes to PEERINFO information, PEERINFO maintains a list of clients interested in these notifications. Such a notification occurs if a HELLO for a peer was updated (due to a merge, for example) or a new peer was added.

The PEERINFO Client-Service Protocol

To connect to and disconnect from the PEERINFO service, PEERINFO utilizes the util client/server infrastructure, so no special message types are used here.

To add information for a peer, the plain HELLO message is transmitted to the service without any wrapping. All required information is stored within the HELLO message. The PEERINFO service provides a message handler accepting and processing these HELLO messages.

When obtaining PEERINFO information using the iterate functionality, specific messages are used. To obtain information for all peers, a struct ListAllPeersMessage with message type GNUNET_MESSAGE_TYPE_PEERINFO_GET_ALL and a flag include_friend_only to indicate if friend-only HELLO messages should be included is transmitted. If information for a specific peer is required, a struct ListAllPeersMessage with GNUNET_MESSAGE_TYPE_PEERINFO_GET containing the peer identity is used.

For both variants, the PEERINFO service replies for each HELLO message it wants to transmit with a struct ListAllPeersMessage with type GNUNET_MESSAGE_TYPE_PEERINFO_INFO containing the plain HELLO. The final message is a struct GNUNET_MessageHeader with type GNUNET_MESSAGE_TYPE_PEERINFO_INFO. Once the client receives this message, it can proceed with the next request if any is pending.

libgnunetpeerinfo

The PEERINFO API consists mainly of three different functionalities: maintaining a connection to the service, adding new information and retrieving information from the PEERINFO service.

Connecting to the Service

To connect to the PEERINFO service, the function GNUNET_PEERINFO_connect is used, taking a configuration handle as an argument. To disconnect from PEERINFO, the function GNUNET_PEERINFO_disconnect has to be called with the PEERINFO handle returned from the connect function.

Adding Information

GNUNET_PEERINFO_add_peer adds a new peer to the PEERINFO subsystem storage. This function takes the PEERINFO handle as an argument, the HELLO message to store and a continuation with a closure to be called with the result of the operation. The GNUNET_PEERINFO_add_peer function returns a handle to this operation, which allows cancelling the operation with the respective cancel function GNUNET_PEERINFO_add_peer_cancel. To retrieve information from PEERINFO, you can iterate over all information stored with PEERINFO or you can tell PEERINFO to notify you if new peer information is available.

Obtaining Information

To iterate over information in PEERINFO you use GNUNET_PEERINFO_iterate. This function expects the PEERINFO handle, a flag indicating whether HELLO messages intended for friend-only mode should be included, a timeout for how long the operation may take and a callback with a callback closure to be called for the results. If you want to obtain information for a specific peer, you can specify the peer identity; if this identity is NULL, information for all peers is returned. The function returns a handle that allows cancelling the operation using GNUNET_PEERINFO_iterate_cancel.

To get notified when peer information changes, you can use GNUNET_PEERINFO_notify. This function expects a configuration handle and a flag indicating whether friend-only HELLO messages should be included. The PEERINFO service will notify you about every change by calling the callback function. The function returns a handle to cancel notifications with GNUNET_PEERINFO_notify_cancel.

GNUnet's PEERSTORE subsystem

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to persistently store and retrieve arbitrary data. Each data record stored with PEERSTORE contains the following fields:

  • subsystem: Name of the subsystem responsible for the record.
  • peerid: Identity of the peer this record is related to.
  • key: a key string identifying the record.
  • value: binary record value.
  • expiry: record expiry date.

Functionality

Subsystems can store any type of value under a (subsystem, peerid, key) combination. A "replace" flag set during store operations forces the PEERSTORE to replace any old values stored under the same (subsystem, peerid, key) combination with the new value. Additionally, an expiry date is set after which the record is *possibly* deleted by PEERSTORE.

Subsystems can iterate over all values stored under any of the following combinations of fields:

  • (subsystem)
  • (subsystem, peerid)
  • (subsystem, key)
  • (subsystem, peerid, key)
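
The four combinations listed above amount to one matching rule in which the subsystem is mandatory while peerid and key may be left unspecified. The sketch below models that rule over an in-memory array. It is not the PEERSTORE implementation: peer identities are represented as plain strings, and StoredRecord, record_matches and count_matches are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct StoredRecord           /* simplified stand-in for a PEERSTORE record */
{
  const char *subsystem;
  const char *peerid;         /* textual stand-in for a peer identity */
  const char *key;
};

/* The subsystem must always match; a NULL peerid or key acts as a
 * wildcard, which yields the four combinations listed above. */
static int
record_matches (const struct StoredRecord *r, const char *subsystem,
                const char *peerid, const char *key)
{
  if (0 != strcmp (r->subsystem, subsystem))
    return 0;
  if ((NULL != peerid) && (0 != strcmp (r->peerid, peerid)))
    return 0;
  if ((NULL != key) && (0 != strcmp (r->key, key)))
    return 0;
  return 1;
}

/* Counts the records an iteration with the given filter would visit. */
static size_t
count_matches (const struct StoredRecord *recs, size_t n,
               const char *subsystem, const char *peerid, const char *key)
{
  size_t hits = 0;

  for (size_t i = 0; i < n; i++)
    if (record_matches (&recs[i], subsystem, peerid, key))
      hits++;
  return hits;
}
```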

Subsystems can also request to be notified about any new values stored under a (subsystem, peerid, key) combination by sending a "watch" request to PEERSTORE.

Architecture

PEERSTORE implements the following components:

  • PEERSTORE service: Handles store, iterate and watch operations.
  • PEERSTORE API: API to be used by other subsystems to communicate and issue commands to the PEERSTORE service.
  • PEERSTORE plugins: Handles the persistent storage. At the moment, only an "sqlite" plugin is implemented.

libgnunetpeerstore

libgnunetpeerstore is the library containing the PEERSTORE API. Subsystems wishing to communicate with the PEERSTORE service use this API to open a connection to PEERSTORE. This is done by calling GNUNET_PEERSTORE_connect which returns a handle to the newly created connection. This handle has to be used with any further calls to the API.

To store a new record, the function GNUNET_PEERSTORE_store is to be used which requires the record fields and a continuation function that will be called by the API after the STORE request is sent to the PEERSTORE service. Note that calling the continuation function does not mean that the record is successfully stored, only that the STORE request has been successfully sent to the PEERSTORE service. GNUNET_PEERSTORE_store_cancel can be called to cancel the STORE request only before the continuation function has been called.

To iterate over stored records, the function GNUNET_PEERSTORE_iterate is to be used. peerid and key can be set to NULL. An iterator callback function will be called with each matching record found and with a NULL record at the end to signal the end of the result set. GNUNET_PEERSTORE_iterate_cancel can be used to cancel the ITERATE request before the iterator callback is called with a NULL record.

To be notified of new values stored under a (subsystem, peerid, key) combination, the function GNUNET_PEERSTORE_watch is to be used. This will register the watcher with the PEERSTORE service; any new records matching the given combination will trigger the callback function passed to GNUNET_PEERSTORE_watch. This continues until GNUNET_PEERSTORE_watch_cancel is called or the connection to the service is destroyed.

After the connection is no longer needed, the function GNUNET_PEERSTORE_disconnect can be called to disconnect from the PEERSTORE service. Any pending ITERATE or WATCH requests will be destroyed. If the sync_first flag is set to GNUNET_YES, the API will delay the disconnection until all pending STORE requests are sent to the PEERSTORE service. Otherwise, the pending STORE requests will be destroyed as well.

GNUnet's SET Subsystem

The SET service implements efficient set operations between two peers over a mesh tunnel. Currently, set union and set intersection are the only supported operations. Elements of a set consist of an element type and arbitrary binary data. The size of an element's data is limited to around 62 KB.

Local Sets

Sets created by a local client can be modified and reused for multiple operations. As each set operation requires potentially expensive special auxiliary data to be computed for each element of a set, a set can only participate in one type of set operation (i.e. union or intersection). The type of a set is determined upon its creation. If the elements of a set are needed for an operation of a different type, all of the set's elements must be copied to a new set of the appropriate type.

Set Modifications

Even when set operations are active, one can add elements to and remove elements from a set. However, these changes will only be visible to operations that have been created after the changes have taken place. That is, every set operation only sees a snapshot of the set from the time the operation was started. This mechanism is not implemented by copying the whole set, but by attaching generation information to each element and operation.
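One possible way to realise such generation-based snapshots is sketched below. The data layout is an assumption made for illustration (each element keeps a list of added/removed generation pairs); the text does not specify how the SET service stores its generation information, and the names GenerationSet, start_operation and visible are hypothetical.

```python
class GenerationSet:
    """Toy set whose elements remember when they were added and removed,
    so that every operation sees a stable snapshot without copying."""

    def __init__(self) -> None:
        self.current_generation = 0
        # element -> list of (added_generation, removed_generation or None)
        self.lifetimes = {}

    def add(self, element) -> None:
        spans = self.lifetimes.setdefault(element, [])
        if spans and spans[-1][1] is None:
            return  # element is already present
        spans.append((self.current_generation, None))

    def remove(self, element) -> None:
        spans = self.lifetimes.get(element, [])
        if spans and spans[-1][1] is None:
            spans[-1] = (spans[-1][0], self.current_generation)

    def start_operation(self) -> int:
        """Freeze the current generation for a new operation, then advance
        so that later modifications belong to a newer generation."""
        snapshot = self.current_generation
        self.current_generation += 1
        return snapshot

    def visible(self, snapshot: int) -> set:
        """Elements that an operation started at `snapshot` can see."""
        return {
            element
            for element, spans in self.lifetimes.items()
            if any(added <= snapshot and (removed is None or removed > snapshot)
                   for added, removed in spans)
        }
```

An operation only stores the integer returned by start_operation; elements modified afterwards carry a higher generation and are therefore ignored when that operation enumerates the set.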

Set Operations

Set operations can be started in two ways: Either by accepting an operation request from a remote peer, or by requesting a set operation from a remote peer. Set operations are uniquely identified by the involved peers, an application id and the operation type.

The client is notified of incoming set operations by set listeners. A set listener listens for incoming operations of a specific operation type and application id. Once notified of an incoming set request, the client can accept the set request (providing a local set for the operation) or reject it.

Result Elements

The SET service has three result modes that determine how an operation's result set is delivered to the client:

  • Full Result Set. All elements of the set resulting from the set operation are returned to the client.
  • Added Elements. Only elements that result from the operation and are not already in the local peer's set are returned. Note that for some operations (like set intersection) this result mode will never return any elements. This can be useful if only the remote peer is actually interested in the result of the set operation.
  • Removed Elements. Only elements that are in the local peer's initial set but not in the operation's result set are returned. Note that for some operations (like set union) this result mode will never return any elements. This can be useful if only the remote peer is actually interested in the result of the set operation.
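The three result modes can be expressed with plain set arithmetic. The following sketch uses hypothetical mode and operation names ("full", "added", "removed", "union", "intersection") purely for illustration; it shows why "added" is always empty for an intersection and "removed" is always empty for a union.

```python
def operation_result(local: set, remote: set, operation: str) -> set:
    """Result set of the operation as seen by both peers."""
    if operation == "union":
        return local | remote
    if operation == "intersection":
        return local & remote
    raise ValueError(f"unknown operation: {operation}")


def delivered_elements(local: set, remote: set,
                       operation: str, mode: str) -> set:
    """Elements handed to the local client under the given result mode."""
    result = operation_result(local, remote, operation)
    if mode == "full":
        return result
    if mode == "added":
        # In the result, but not yet in the local set.
        return result - local
    if mode == "removed":
        # In the initial local set, but no longer in the result.
        return local - result
    raise ValueError(f"unknown result mode: {mode}")
```

For example, with a local set {1, 2, 3} and a remote set {3, 4}, a union in "added" mode delivers only {4}, while an intersection in "removed" mode delivers {1, 2}.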

libgnunetset

Sets

New sets are created with GNUNET_SET_create. Both the local peer's configuration (as each set has its own client connection) and the operation type must be specified. The set exists until either the client calls GNUNET_SET_destroy or the client's connection to the service is disrupted. In the latter case, the client is notified by the return value of functions dealing with sets. This return value must always be checked.

Elements are added and removed with GNUNET_SET_add_element and GNUNET_SET_remove_element.

Listeners

Listeners are created with GNUNET_SET_listen. Each time a remote peer suggests a set operation with an application id and operation type matching a listener, the listener's callback is invoked. The client then must synchronously call either GNUNET_SET_accept or GNUNET_SET_reject. Note that the operation will not be started until the client calls GNUNET_SET_commit (see Section "Supplying a Set").

Operations

Operations to be initiated by the local peer are created with GNUNET_SET_prepare. Note that the operation will not be started until the client calls GNUNET_SET_commit (see Section "Supplying a Set").

Supplying a Set

To create symmetry between the two ways of starting a set operation (accepting and initiating it), the operation handles returned by GNUNET_SET_accept and GNUNET_SET_prepare do not yet have a set to operate on; thus they cannot do any work yet.

The client must call GNUNET_SET_commit to specify a set to use for an operation. GNUNET_SET_commit may only be called once per set operation.

The Result Callback

Clients must specify both a result mode and a result callback with GNUNET_SET_accept and GNUNET_SET_prepare. The result callback is invoked with a status indicating either that an element was received, or that the operation failed or succeeded. The interpretation of the received element depends on the result mode. The callback needs to know which result mode it is used in, as the arguments do not indicate if an element is part of the full result set, or if it is in the difference between the original set and the final set.

The SET Client-Service Protocol

Creating Sets

For each set of a client, there exists a client connection to the service. Sets are created by sending the GNUNET_SERVICE_SET_CREATE message over a new client connection. Multiple operations for one set are multiplexed over one client connection, using a request id supplied by the client.

Listeners

Each listener also requires a separate client connection. By sending the GNUNET_SERVICE_SET_LISTEN message, the client notifies the service of the application id and operation type it is interested in. A client rejects an incoming request by sending GNUNET_SERVICE_SET_REJECT on the listener's client connection. In contrast, when accepting an incoming request, a GNUNET_SERVICE_SET_ACCEPT message must be sent over the client connection of the set that is supplied for the set operation.

Initiating Operations

Operations with remote peers are initiated by sending a GNUNET_SERVICE_SET_EVALUATE message to the service. The client connection over which this message is sent determines the set to use.

Modifying Sets

Sets are modified with the GNUNET_SERVICE_SET_ADD and GNUNET_SERVICE_SET_REMOVE messages.

Results and Operation Status

The service notifies the client of result elements and success/failure of a set operation with the GNUNET_SERVICE_SET_RESULT message.

Iterating Sets

All elements of a set can be requested by sending GNUNET_SERVICE_SET_ITER_REQUEST. The server responds with GNUNET_SERVICE_SET_ITER_ELEMENT and eventually terminates the iteration with GNUNET_SERVICE_SET_ITER_DONE. After each received element, the client must send GNUNET_SERVICE_SET_ITER_ACK. Note that only one set iteration may be active for a set at any given time.

The SET-Intersection Peer-to-Peer Protocol

The intersection protocol operates over CADET and starts with a GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST being sent by the peer initiating the operation to the peer listening for inbound requests. It includes the number of elements of the initiating peer, which is used to decide which side will send a Bloom filter first.

The listening peer checks if the operation type and application identifier are acceptable for its current state. If not, it responds with a GNUNET_MESSAGE_TYPE_SET_RESULT and a status of GNUNET_SET_STATUS_FAILURE (and terminates the CADET channel).

If the application accepts the request, the listener sends back a GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO if it has more elements in the set than the initiator. Otherwise, it immediately starts with the Bloom filter exchange. If the initiator receives a GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO response, it begins the Bloom filter exchange, unless the set size is indicated to be zero, in which case the intersection is considered finished after just the initial handshake.

The Bloom filter exchange

In this phase, each peer transmits a Bloom filter over the remaining keys of the local set to the other peer using a GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_BF message. This message additionally includes the number of elements left in the sender's set, as well as the XOR over all of the keys in that set.

The number of bits 'k' set per element in the Bloom filter is calculated based on the relative size of the two sets. Furthermore, the size of the Bloom filter is calculated based on 'k' and the number of elements in the set to maximize the amount of data filtered per byte transmitted on the wire (while avoiding an excessively high number of iterations).

The receiver of the message removes all elements from its local set that do not pass the Bloom filter test. It then checks if the set size of the sender and the XOR over the keys match what is left of its own set. If they do, it sends a GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_DONE back to indicate that the latest set is the final result. Otherwise, the receiver starts another Bloom filter exchange, except this time as the sender.

Salt

Bloom filter operations are probabilistic: With some non-zero probability the test may incorrectly say an element is in the set, even though it is not.

To mitigate this problem, the intersection protocol iterates exchanging Bloom filters using a different random 32-bit salt in each iteration (the salt is also included in the message). With different salts, the Bloom filter test may fail for different elements. By combining the results of the iterations, the probability of failure drops toward zero.

The iterations terminate once both peers have established that they have sets of the same size, and where the XOR over all keys computes the same 512-bit value (leaving a failure probability of 2^-511).
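The alternating Bloom filter exchange with a fresh salt per round and the size/XOR termination test can be simulated locally. The following is a simplified sketch under several assumptions: SHA-512 is used to derive both the 512-bit keys and the filter bits, the filter size and the number of bits per element are fixed instead of being computed from the set sizes, and the first peer always sends first. None of the function names belong to GNUnet.

```python
import hashlib
import random


def element_key(element: bytes) -> int:
    """512-bit key of an element (SHA-512 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha512(element).digest(), "big")


def bloom_positions(key: int, salt: int, k: int, nbits: int) -> list:
    """The k filter bits of a key; a new salt moves them in every round."""
    positions = []
    for i in range(k):
        material = key.to_bytes(64, "big") + salt.to_bytes(4, "big") + bytes([i])
        digest = hashlib.sha512(material).digest()
        positions.append(int.from_bytes(digest[:8], "big") % nbits)
    return positions


def xor_of(keys) -> int:
    acc = 0
    for key in keys:
        acc ^= key
    return acc


def intersect(set_a: set, set_b: set, k: int = 2, nbits: int = 64, rng=None):
    """Simulate both peers; returns (intersection, number of rounds)."""
    if rng is None:
        rng = random.Random()
    remaining = [{element_key(e): e for e in set_a},
                 {element_key(e): e for e in set_b}]
    sender, rounds = 0, 0
    while True:
        rounds += 1
        receiver = 1 - sender
        salt = rng.getrandbits(32)  # fresh random 32-bit salt per round
        bloom = set()
        for key in remaining[sender]:
            bloom.update(bloom_positions(key, salt, k, nbits))
        # The receiver drops every element that fails the filter test.
        remaining[receiver] = {
            key: e for key, e in remaining[receiver].items()
            if all(p in bloom for p in bloom_positions(key, salt, k, nbits))
        }
        # Compare with the size and XOR announced by the sender.
        same_size = len(remaining[receiver]) == len(remaining[sender])
        same_xor = xor_of(remaining[receiver]) == xor_of(remaining[sender])
        if same_size and same_xor:
            return set(remaining[receiver].values()), rounds
        sender = receiver  # roles swap for the next round
```

Because a false positive under one salt is unlikely to repeat under the next, surplus elements are filtered out after a few rounds, and the loop ends as soon as both sides hold sets with equal size and equal XOR.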

The SET-Union Peer-to-Peer Protocol

The SET union protocol is based on Eppstein's efficient set reconciliation without prior context. You should read this paper first if you want to understand the protocol.

The union protocol operates over CADET and starts with a GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST being sent by the peer initiating the operation to the peer listening for inbound requests. It includes the number of elements of the initiating peer, which is currently not used.

The listening peer checks if the operation type and application identifier are acceptable for its current state. If not, it responds with a GNUNET_MESSAGE_TYPE_SET_RESULT and a status of GNUNET_SET_STATUS_FAILURE (and terminates the CADET channel).

If the application accepts the request, it sends back a strata estimator using a message of type GNUNET_MESSAGE_TYPE_SET_UNION_P2P_SE. The initiator evaluates the strata estimator and initiates the exchange of invertible Bloom filters, sending a GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF.

During the IBF exchange, if the receiver cannot invert the Bloom filter or detects a cycle, it sends a larger IBF in response (up to a defined maximum limit; if that limit is reached, the operation fails). Elements decoded while processing the IBF are transmitted to the other peer using GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS, or requested from the other peer using GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS messages, depending on the sign observed during decoding of the IBF. Peers respond to a GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS message with the respective element in a GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS message. If the IBF fully decodes, the peer responds with a GNUNET_MESSAGE_TYPE_SET_UNION_P2P_DONE message instead of another GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF.

All Bloom filter operations use a salt to mingle keys before hashing them into buckets, such that future iterations have a fresh chance of succeeding if they failed due to collisions before.
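To make the decoding step more concrete, here is a minimal invertible Bloom filter (IBF) sketch in the spirit of the set reconciliation technique referenced above. It is an illustration under assumptions, not the GNUnet implementation: keys are small integers, SHA-256 stands in for the hash functions, the strata estimator is omitted, and a failed decode is answered by doubling the IBF size with a new salt. The sign of a decoded cell tells which side holds the element, which corresponds to the choice between sending an element and requesting it.

```python
import hashlib


def _h(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


class IBF:
    def __init__(self, size: int, salt: int, hashes: int = 3) -> None:
        self.size, self.salt, self.hashes = size, salt, hashes
        self.count = [0] * size
        self.key_sum = [0] * size
        self.hash_sum = [0] * size

    def _buckets(self, key: int) -> list:
        buckets, i = [], 0
        while len(buckets) < self.hashes:  # pick distinct buckets
            bucket = _h(f"{self.salt}:{i}:{key}") % self.size
            if bucket not in buckets:
                buckets.append(bucket)
            i += 1
        return buckets

    def _check(self, key: int) -> int:
        return _h(f"check:{self.salt}:{key}")

    def _toggle(self, key: int, sign: int) -> None:
        for bucket in self._buckets(key):
            self.count[bucket] += sign
            self.key_sum[bucket] ^= key
            self.hash_sum[bucket] ^= self._check(key)

    def insert(self, key: int) -> None:
        self._toggle(key, +1)

    def subtract(self, other: "IBF") -> "IBF":
        diff = IBF(self.size, self.salt, self.hashes)
        for i in range(self.size):
            diff.count[i] = self.count[i] - other.count[i]
            diff.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            diff.hash_sum[i] = self.hash_sum[i] ^ other.hash_sum[i]
        return diff

    def decode(self):
        """Peel pure cells; returns (only_ours, only_theirs, success)."""
        ours, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.size):
                pure = (self.count[i] in (1, -1)
                        and self.hash_sum[i] == self._check(self.key_sum[i]))
                if not pure:
                    continue
                key, sign = self.key_sum[i], self.count[i]
                (ours if sign == 1 else theirs).add(key)
                self._toggle(key, -sign)
                progress = True
                if len(ours) + len(theirs) > self.size:
                    return ours, theirs, False  # guard against cycles
        empty = not (any(self.count) or any(self.key_sum) or any(self.hash_sum))
        return ours, theirs, empty


def reconcile(local: set, remote: set, start_size: int = 8,
              max_size: int = 1024):
    """Returns (keys to send, keys to request) or raises at the size limit."""
    size, salt = start_size, 0
    while size <= max_size:
        a, b = IBF(size, salt), IBF(size, salt)
        for key in local:
            a.insert(key)
        for key in remote:
            b.insert(key)
        to_send, to_request, ok = a.subtract(b).decode()
        if ok:
            return to_send, to_request
        size, salt = size * 2, salt + 1  # larger IBF, fresh salt
    raise RuntimeError("IBF did not decode within the size limit")
```

Only the difference between the two sets determines the required IBF size, so two large sets that differ in a handful of elements can be reconciled with a small filter.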

GNUnet's STATISTICS subsystem

In GNUnet, the STATISTICS subsystem offers a central place for all subsystems to publish unsigned 64-bit integer run-time statistics. Keeping this information centrally means that there is a unified way for the user to obtain data on all subsystems, and individual subsystems do not have to always include a custom data export method for performance metrics and other statistics. For example, the TRANSPORT system uses STATISTICS to update information about the number of directly connected peers and the bandwidth that has been consumed by the various plugins. This information is valuable for diagnosing connectivity and performance issues.

Following the GNUnet service architecture, the STATISTICS subsystem is divided into an API which is exposed through the header gnunet_statistics_service.h and the STATISTICS service gnunet-service-statistics. The gnunet-statistics command-line tool can be used to obtain (and change) information about the values stored by the STATISTICS service. The STATISTICS service does not communicate with other peers.

Data is stored in the STATISTICS service in the form of tuples (subsystem, name, value, persistence). The subsystem determines to which other GNUnet subsystem the data belongs. name is the name through which value is associated. It uniquely identifies the record from among other records belonging to the same subsystem. In some parts of the code, the pair (subsystem, name) is called a statistic as it identifies the values stored in the STATISTICS service. The persistence flag determines if the record has to be preserved across service restarts. A record is said to be persistent if this flag is set for it; if not, the record is treated as a non-persistent record and it is lost after service restart. Persistent records are written to and read from the file statistics.data before shutdown and upon startup. The file is located in the HOME directory of the peer.

An anomaly of the STATISTICS service is that it does not terminate immediately upon receiving a shutdown signal if it has any clients connected to it. It waits for all the clients that are not monitors to close their connections before terminating itself. This is to prevent the loss of data during peer shutdown -- delaying the STATISTICS service shutdown helps other services to store important data to STATISTICS during shutdown.

libgnunetstatistics

libgnunetstatistics is the library containing the API for the STATISTICS subsystem. Any process that needs to use STATISTICS should use this API to open a connection to the STATISTICS service. This is done by calling the function GNUNET_STATISTICS_create(). This function takes the name of the subsystem that is trying to use STATISTICS and a configuration. All values written to STATISTICS with this connection will be placed in the section corresponding to the given subsystem's name. The connection to STATISTICS can be destroyed with the function GNUNET_STATISTICS_destroy(). This function allows for the connection to be destroyed immediately or upon transferring all pending write requests to the service.

Note: The STATISTICS subsystem can be disabled by setting DISABLE = YES under the [STATISTICS] section in the configuration. With such a configuration, all calls to GNUNET_STATISTICS_create() return NULL as the STATISTICS subsystem is unavailable, and no other functions from the API can be used.

Statistics retrieval

Once a connection to the statistics service is obtained, information about any other system which uses statistics can be retrieved with the function GNUNET_STATISTICS_get(). This function takes the connection handle, the name of the subsystem whose information we are interested in (a NULL value will retrieve information of all available subsystems using STATISTICS), the name of the statistic we are interested in (a NULL value will retrieve all available statistics), a continuation callback which is called when all of requested information is retrieved, an iterator callback which is called for each parameter in the retrieved information and a closure for the aforementioned callbacks. The library then invokes the iterator callback for each value matching the request.

A call to GNUNET_STATISTICS_get() is asynchronous and can be canceled with the function GNUNET_STATISTICS_get_cancel(). This is helpful when retrieving statistics takes too long and especially when we want to shut down and clean up everything.

Setting statistics and updating them

So far we have seen how to retrieve statistics. Here we will learn how to set statistics and update them so that other subsystems can retrieve them.

A new statistic can be set using the function GNUNET_STATISTICS_set(). This function takes the name of the statistic and its value and a flag to make the statistic persistent. The value of the statistic should be of the type uint64_t. The function does not take the name of the subsystem; it is determined from the previous GNUNET_STATISTICS_create() invocation. If the given statistic is already present, its value is overwritten.

An existing statistic can be updated, i.e., its value can be increased or decreased by an amount, with the function GNUNET_STATISTICS_update(). The parameters to this function are similar to GNUNET_STATISTICS_set(), except that it takes the amount to be changed as a type int64_t instead of the value.

The library will combine multiple set or update operations into one message if the client performs requests at a rate that is faster than the available IPC with the STATISTICS service. Thus, the client does not have to worry about sending requests too quickly.
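The difference between setting an absolute value and applying a relative update can be modelled in a few lines. The class below is a hypothetical stand-in for a client connection, not the C API. Two behaviours are assumptions of this sketch because the text does not specify them: updating a statistic that does not exist yet starts from zero, and results are clamped to the unsigned 64-bit range.

```python
UINT64_MAX = 2**64 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


class StatisticsHandle:
    """Toy model of one client connection; `service` stands in for the
    STATISTICS service and maps (subsystem, name) -> (value, persistent)."""

    def __init__(self, subsystem: str, service: dict) -> None:
        # The subsystem is fixed when the connection is created.
        self.subsystem = subsystem
        self.service = service

    def set(self, name: str, value: int, make_persistent: bool = False) -> None:
        if not 0 <= value <= UINT64_MAX:
            raise ValueError("value must fit into an unsigned 64-bit integer")
        # An existing statistic with the same name is overwritten.
        self.service[(self.subsystem, name)] = (value, make_persistent)

    def update(self, name: str, delta: int, make_persistent: bool = False) -> None:
        if not INT64_MIN <= delta <= INT64_MAX:
            raise ValueError("delta must fit into a signed 64-bit integer")
        # Assumption: a missing statistic is treated as zero.
        current, _ = self.service.get((self.subsystem, name), (0, False))
        # Assumption: the result is clamped to the uint64 range.
        clamped = min(max(current + delta, 0), UINT64_MAX)
        self.service[(self.subsystem, name)] = (clamped, make_persistent)
```

Note that neither method takes a subsystem argument; as described above, it is determined by the handle on which the call is made.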

Watches

An interesting feature of STATISTICS lies in serving notifications whenever a statistic of our interest is modified. This is achieved by registering a watch through the function GNUNET_STATISTICS_watch(). The parameters of this function are similar to those of GNUNET_STATISTICS_get(). Changes to the respective statistic's value will then cause the given iterator callback to be called. Note: A watch can only be registered for a specific statistic. Hence the subsystem name and the parameter name cannot be NULL in a call to GNUNET_STATISTICS_watch().

A registered watch will keep notifying any value changes until GNUNET_STATISTICS_watch_cancel() is called with the same parameters that are used for registering the watch.
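The watch behaviour can be sketched as a small registry that rejects wildcard registrations and only notifies on actual value changes. All names here are hypothetical; in particular, the boolean return value of watch_cancel is an assumption made so that the example can show that cancellation needs the same parameters as registration.

```python
class WatchRegistry:
    """Toy model of watches on (subsystem, name) statistics."""

    def __init__(self) -> None:
        self.values = {}
        self.watches = []  # list of (subsystem, name, callback)

    def watch(self, subsystem, name, callback) -> None:
        # A watch must name one specific statistic; no wildcards.
        if subsystem is None or name is None:
            raise ValueError("a watch needs a specific subsystem and name")
        self.watches.append((subsystem, name, callback))

    def watch_cancel(self, subsystem, name, callback) -> bool:
        # Cancellation must use the same parameters as the registration.
        try:
            self.watches.remove((subsystem, name, callback))
        except ValueError:
            return False
        return True

    def set(self, subsystem, name, value) -> None:
        changed = self.values.get((subsystem, name)) != value
        self.values[(subsystem, name)] = value
        if not changed:
            return  # watchers are only told about changes
        for watched_subsystem, watched_name, callback in list(self.watches):
            if (watched_subsystem, watched_name) == (subsystem, name):
                callback(subsystem, name, value)
```

A client that is interested in several statistics therefore registers one watch per (subsystem, name) pair.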

The STATISTICS Client-Service Protocol

Statistics retrieval

To retrieve statistics, the client transmits a message of type GNUNET_MESSAGE_TYPE_STATISTICS_GET containing the given subsystem name and statistic parameter to the STATISTICS service. The service responds with a message of type GNUNET_MESSAGE_TYPE_STATISTICS_VALUE for each of the statistics parameters that match the client's request. The service signals the end of the retrieved information by sending a message of type GNUNET_MESSAGE_TYPE_STATISTICS_END.

Setting and updating statistics

The subsystem name, parameter name, its value and the persistence flag are communicated to the service through the message GNUNET_MESSAGE_TYPE_STATISTICS_SET.

When the service receives a message of type GNUNET_MESSAGE_TYPE_STATISTICS_SET, it retrieves the subsystem name and checks for a statistic parameter matching the name given in the message. If a statistic parameter is found, the value is overwritten by the new value from the message; if not found, a new statistic parameter is created with the given name and value.

In addition to just setting an absolute value, it is possible to perform a relative update by sending a message of type GNUNET_MESSAGE_TYPE_STATISTICS_SET with an update flag (GNUNET_STATISTICS_SETFLAG_RELATIVE) signifying that the value in the message should be treated as an update value.

Watching for updates

The function GNUNET_STATISTICS_watch() registers the watch at the service by sending a message of type GNUNET_MESSAGE_TYPE_STATISTICS_WATCH. The service then sends notifications through messages of type GNUNET_MESSAGE_TYPE_STATISTICS_WATCH_VALUE whenever the statistic parameter's value is changed.

GNUnet's Distributed Hash Table (DHT)

GNUnet includes a generic distributed hash table that can be used by developers building P2P applications in the framework. This section documents high-level features and how developers are expected to use the DHT. We have a research paper detailing how the DHT works. Also, Nate's thesis includes a detailed description and performance analysis (in chapter 6).

Key features of GNUnet's DHT include:

  • stores key-value pairs with values up to (approximately) 63k in size
  • works with many underlay network topologies (small-world, random graph), underlay does not need to be a full mesh / clique
  • support for extended queries (more than just a simple 'key'), filtering duplicate replies within the network (bloomfilter) and content validation (for details, please read the subsection on the block library)
  • can (optionally) return paths taken by the PUT and GET operations to the application
  • provides content replication to handle churn

GNUnet's DHT is randomized and unreliable. Unreliable means that there is no strict guarantee that a value stored in the DHT is always found --- values are only found with high probability. While this is somewhat true in all P2P DHTs, GNUnet developers should be particularly wary of this fact (this will help you write secure, fault-tolerant code). Thus, when writing any application using the DHT, you should always consider the possibility that a value stored in the DHT by you or some other peer might simply not be returned, or returned with a significant delay. Your application logic must be written to tolerate this (naturally, some loss of performance or quality of service is expected in this case).

Block library and plugins

What is a Block?

Blocks are small (< 63k) pieces of data stored under a key (struct GNUNET_HashCode). Blocks have a type (enum GNUNET_BlockType) which defines their data format. Blocks are used in GNUnet as units of static data exchanged between peers and stored (or cached) locally. Uses of blocks include file-sharing (the files are broken up into blocks), the VPN (DNS information is stored in blocks) and the DHT (all information in the DHT and meta-information for the maintenance of the DHT are both stored using blocks). The block subsystem provides a few common functions that must be available for any type of block.

The API of libgnunetblock

The block library requires for each (family of) block type(s) a block plugin (implementing gnunet_block_plugin.h) that provides basic functions that are needed by the DHT (and possibly other subsystems) to manage the block. These block plugins are typically implemented within their respective subsystems.
The main block library is then used to locate, load and query the appropriate block plugin. Which plugin is appropriate is determined by the block type (which is just a 32-bit integer). Block plugins contain code that specifies which block types are supported by a given plugin. The block library loads all block plugins that are installed at the local peer and forwards the application request to the respective plugin.

The central functions of the block APIs (plugin and main library) are to allow the mapping of blocks to their respective key (if possible) and the ability to check that a block is well-formed and matches a given request (again, if possible). This way, GNUnet can avoid storing invalid blocks, storing blocks under the wrong key and forwarding blocks in response to a query that they do not answer.

One key function of block plugins is that they allow GNUnet to detect duplicate replies (via the Bloom filter). All plugins MUST support detecting duplicate replies (by adding the current response to the Bloom filter and rejecting it if it is encountered again). If a plugin fails to do this, responses may loop in the network.

Queries

The query format for any block in GNUnet consists of four main components. First, the type of the desired block must be specified. Second, the query must contain a hash code. The hash code is used for lookups in hash tables and databases and need not be unique for the block (however, if possible a unique hash should be used as this would be best for performance). Third, an optional Bloom filter can be specified to exclude known results; replies that hash to the bits set in the Bloom filter are considered invalid. False positives can be eliminated by sending the same query again with a different Bloom filter mutator value, which parameterizes the hash function that is used. Finally, an optional application-specific "eXtended query" (xquery) can be specified to further constrain the results. It is entirely up to the type-specific plugin to determine whether or not a given block matches a query (type, hash, Bloom filter, and xquery). Naturally, not all xqueries are valid and some types of blocks may not support Bloom filters either, so the plugin also needs to check if the query is valid in the first place.

Depending on the results from the plugin, the DHT will then discard the (invalid) query, forward the query, discard the (invalid) reply, cache the (valid) reply, and/or forward the (valid and non-duplicate) reply.
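The four query components and the duplicate detection can be sketched as a reply check for an imaginary block type. Everything specific here is an assumption made for illustration: the query is a plain dictionary, the xquery is interpreted as a required prefix of the payload, the status strings are invented, and the Bloom filter is modelled as a set of bit positions derived with SHA-512 and the mutator. A real plugin implements the interface from gnunet_block_plugin.h instead.

```python
import hashlib


def reply_bits(payload: bytes, mutator: int, nbits: int = 128, k: int = 4) -> list:
    """Bloom filter positions of a reply; the mutator changes the mapping."""
    bits = []
    for i in range(k):
        material = mutator.to_bytes(4, "big") + bytes([i]) + payload
        digest = hashlib.sha512(material).digest()
        bits.append(int.from_bytes(digest[:4], "big") % nbits)
    return bits


def new_query(block_type: int, key: bytes, mutator: int = 0, xquery: bytes = b"") -> dict:
    return {"type": block_type, "key": key, "mutator": mutator,
            "xquery": xquery, "seen_bits": set()}


def evaluate_reply(query: dict, block_type: int, key: bytes, payload: bytes) -> str:
    """Decide what to do with a reply (status names are hypothetical)."""
    if block_type != query["type"]:
        return "type-mismatch"
    if key != query["key"]:
        return "wrong-key"
    # Assumption: for this imaginary block type the xquery is a prefix filter.
    if not payload.startswith(query["xquery"]):
        return "no-match"
    bits = reply_bits(payload, query["mutator"])
    if all(bit in query["seen_bits"] for bit in bits):
        return "duplicate"  # already seen (or a Bloom filter false positive)
    # Remember the reply so that later copies are rejected.
    query["seen_bits"].update(bits)
    return "ok"
```

If a legitimate reply is rejected as a duplicate because of a false positive, repeating the query with a different mutator value gives it different bit positions and thus another chance to pass.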

Sample Code

The source code in plugin_block_test.c is a good starting point for new block plugins --- it does the minimal work by implementing a plugin that performs no validation at all. The respective Makefile.am shows how to build and install a block plugin.

Conclusion

In conclusion, GNUnet subsystems that want to use the DHT need to define a block format and write a plugin to match queries and replies. For testing, the "GNUNET_BLOCK_TYPE_TEST" block type can be used; it accepts any query as valid and any reply as matching any query. This type is also used for the DHT command line tools. However, it should NOT be used for normal applications due to the lack of error checking that results from this primitive implementation.

libgnunetdht

The DHT API itself is pretty simple and offers the usual GET and PUT functions that work as expected. The specified block type refers to the block library which allows the DHT to run application-specific logic for data stored in the network.

GET

When using GET, the main consideration for developers (other than the block library) should be that after issuing a GET, the DHT will continuously cause (small amounts of) network traffic until the operation is explicitly canceled. So GET does not simply send out a single network request once; instead, the DHT will continue to search for data. This is needed to achieve good success rates and also handles the case where the respective PUT operation happens after the GET operation was started. Developers should not cancel an existing GET operation and then explicitly re-start it to trigger a new round of network requests; this is simply inefficient, especially as the internal automated version can be more efficient, for example by filtering results in the network that have already been returned.

If an application that performs a GET request has a set of replies that it already knows and would like to filter, it can call GNUNET_DHT_get_filter_known_results with an array of hashes over the respective blocks to tell the DHT that these results are not desired (any more). This way, the DHT will filter the respective blocks using the block library in the network, which may result in a significant reduction in bandwidth consumption.

PUT

In contrast to GET operations, developers must manually re-run PUT operations periodically (if they intend the content to continue to be available). Content stored in the DHT expires or might be lost due to churn. Furthermore, GNUnet's DHT typically requires multiple rounds of PUT operations before a key-value pair is consistently available to all peers (the DHT randomizes paths and thus storage locations, and only after multiple rounds of PUTs there will be a sufficient number of replicas in large DHTs). An explicit PUT operation using the DHT API will only cause network traffic once, so in order to ensure basic availability and resistance to churn (and adversaries), PUTs must be repeated. While the exact frequency depends on the application, a rule of thumb is that there should be at least a dozen PUT operations within the content lifetime. Content in the DHT typically expires after one day, so DHT PUT operations should be repeated at least every 1-2 hours.
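The rule of thumb above amounts to a simple division. The helper names below are hypothetical and the default of twelve PUTs per lifetime is taken from the text; the sketch only shows how the one-day lifetime leads to the two-hour upper bound on the repetition interval.

```python
def put_interval_seconds(content_lifetime_seconds: float,
                         puts_per_lifetime: int = 12) -> float:
    """Longest interval that still yields the desired number of PUTs."""
    if puts_per_lifetime <= 0:
        raise ValueError("puts_per_lifetime must be positive")
    return content_lifetime_seconds / puts_per_lifetime


def put_schedule(start: float, content_lifetime_seconds: float,
                 puts_per_lifetime: int = 12) -> list:
    """Times at which the PUT should be repeated within one lifetime."""
    interval = put_interval_seconds(content_lifetime_seconds, puts_per_lifetime)
    return [start + i * interval for i in range(puts_per_lifetime)]
```

With a lifetime of one day (86400 seconds), the interval is 7200 seconds, i.e. two hours; repeating the PUT every hour doubles the number of rounds.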

MONITOR

The DHT API also allows applications to monitor messages crossing the local DHT service. The types of messages used by the DHT are GET, PUT and RESULT messages. Using the monitoring API, applications can choose to monitor these requests, possibly limiting themselves to requests for a particular block type.

The monitoring API is not only useful for diagnostics; it can also be used to trigger application operations based on PUT operations. For example, an application may use PUTs to distribute work requests to other peers. The workers would then monitor for PUTs that give them work, instead of looking for work using GET operations. This can be beneficial, especially if the workers have no good way to guess the keys under which work would be stored. Naturally, additional protocols might be needed to ensure that the desired number of workers will process the distributed workload.

DHT Routing Options

There are two important options for GET and PUT requests (the remaining two listed below are internal or not yet implemented):

GNUNET_DHT_RO_DEMULTIPLEX_EVERYWHERE
This option means that all peers should process the request, even if their peer ID is not closest to the key. For a PUT request, this means that all peers that a request traverses may make a copy of the data. Similarly for a GET request, all peers will check their local database for a result. Setting this option can thus significantly improve caching and reduce bandwidth consumption --- at the expense of a larger DHT database. If in doubt, we recommend that this option should be used.
GNUNET_DHT_RO_RECORD_ROUTE
This option instructs the DHT to record the path that a GET or a PUT request is taking through the overlay network. The resulting paths are then returned to the application with the respective result. This allows the receiver of a result to construct a path to the originator of the data, which might then be used for routing. Naturally, setting this option requires additional bandwidth and disk space, so applications should only set this if the paths are needed by the application logic.
GNUNET_DHT_RO_FIND_PEER
This option is an internal option used by the DHT's peer discovery mechanism and should not be used by applications.
GNUNET_DHT_RO_BART
This option is currently not implemented. It may in the future offer performance improvements for clique topologies.

The DHT Client-Service Protocol

PUTting data into the DHT

To store (PUT) data into the DHT, the client sends a struct GNUNET_DHT_ClientPutMessage to the service. This message specifies the block type, routing options, the desired replication level, the expiration time, key, value and a 64-bit unique ID for the operation. The service responds with a struct GNUNET_DHT_ClientPutConfirmationMessage with the same 64-bit unique ID. Note that the service sends the confirmation as soon as it has locally processed the PUT request. The PUT may still be propagating through the network at this time.

In the future, we may want to change this to provide (limited) feedback to the client, for example if we detect that the PUT operation had no effect because the same key-value pair was already stored in the DHT. However, changing this would also require additional state and messages in the P2P interaction.

GETting data from the DHT

To retrieve (GET) data from the DHT, the client sends a struct GNUNET_DHT_ClientGetMessage to the service. The message specifies routing options, a replication level (for replicating the GET, not the content), the desired block type, the key, the (optional) extended query and a unique 64-bit request ID.

Additionally, the client may send any number of struct GNUNET_DHT_ClientGetResultSeenMessages to notify the service about results that the client is already aware of. These messages consist of the key, the unique 64-bit ID of the request, and an arbitrary number of hash codes over the blocks that the client is already aware of. As messages are restricted to 64k, a client that already knows more than about a thousand blocks may need to send several of these messages. The client should transmit these messages as quickly as possible after the original GET request so that the DHT can filter those results in the network early on. However, as these messages are sent after the original request, it is conceivable that the DHT service may return blocks that match those already known to the client anyway.
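
The "about a thousand blocks" figure follows from the message size limit. The Python sketch below splits a list of known hash codes into batches that fit into one message each. The 64-byte hash size and the 80-byte fixed part of the message are assumptions for illustration, not values taken from the protocol definition.

```python
from typing import List

HASH_SIZE = 64            # bytes per hash code (assumed 512-bit hash)
MAX_MESSAGE_SIZE = 65535  # upper bound for a single message (just under 64k)
HEADER_SIZE = 80          # assumed fixed part: header, key and 64-bit request ID


def hashes_per_message() -> int:
    """Number of hash codes that fit into one message after the fixed part."""
    return (MAX_MESSAGE_SIZE - HEADER_SIZE) // HASH_SIZE


def split_seen_hashes(seen: List[bytes]) -> List[List[bytes]]:
    """Split the known hash codes into batches, one batch per message."""
    limit = hashes_per_message()
    return [seen[i:i + limit] for i in range(0, len(seen), limit)]
```

Under these assumptions, a client that knows 2500 blocks would need three messages.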

In response to a GET request, the service will send struct GNUNET_DHT_ClientResultMessages to the client. These messages contain the block type, expiration, key, unique ID of the request and of course the value (a block). Depending on the options set for the respective operations, the replies may also contain the path the GET and/or the PUT took through the network.

A client can stop receiving replies either by disconnecting or by sending a struct GNUNET_DHT_ClientGetStopMessage which must contain the key and the 64-bit unique ID of the original request. Using an explicit "stop" message is more common as this allows a client to run many concurrent GET operations over the same connection with the DHT service --- and to stop them individually.
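
To show why the unique ID matters, here is a minimal Python sketch of a client-side table that multiplexes several GET operations over one connection. The class and method names are hypothetical and do not correspond to the libgnunetdht API.

```python
import itertools
from typing import Callable, Dict


class GetRequests:
    """Hypothetical client-side table of active GETs, keyed by request ID."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._active: Dict[int, Callable[[bytes], None]] = {}

    def start(self, on_result: Callable[[bytes], None]) -> int:
        """Register a new GET and return its unique 64-bit ID."""
        unique_id = next(self._next_id) & 0xFFFFFFFFFFFFFFFF
        self._active[unique_id] = on_result
        return unique_id

    def stop(self, unique_id: int) -> None:
        """Stop one GET; this models sending an explicit stop message."""
        self._active.pop(unique_id, None)

    def dispatch(self, unique_id: int, block: bytes) -> bool:
        """Deliver a result; returns False if the request is no longer active."""
        handler = self._active.get(unique_id)
        if handler is None:
            return False
        handler(block)
        return True
```

Stopping one request leaves all other requests on the same connection untouched.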

Monitoring the DHT

To begin monitoring, the client sends a struct GNUNET_DHT_MonitorStartStop message to the DHT service. In this message, flags can be set to enable (or disable) monitoring of GET, PUT and RESULT messages that pass through a peer. The message can also restrict monitoring to a particular block type or a particular key. Once monitoring is enabled, the DHT service will notify the client about any matching event using struct GNUNET_DHT_MonitorGetMessages for GET events, struct GNUNET_DHT_MonitorPutMessage for PUT events and struct GNUNET_DHT_MonitorGetRespMessage for RESULTs. Each of these messages contains all of the information about the event.

The DHT Peer-to-Peer Protocol

Routing GETs or PUTs

When routing GETs or PUTs, the DHT service selects a suitable subset of neighbours for forwarding. The exact number of neighbours can be zero or more and depends on the hop counter of the query (initially zero) in relation to the (log of) the network size estimate, the desired replication level and the peer's connectivity. Depending on the hop counter and our network size estimate, the selection of the peers may be randomized or by proximity to the key. Furthermore, requests include a set of peers that a request has already traversed; those peers are also excluded from the selection.

PUTting data into the DHT

To PUT data into the DHT, the service sends a struct PeerPutMessage of type GNUNET_MESSAGE_TYPE_DHT_P2P_PUT to the respective neighbour. In addition to the usual information about the content (type, routing options, desired replication level for the content, expiration time, key and value), the message contains a fixed-size Bloom filter with information about which peers (may) have already seen this request. This Bloom filter is used to ensure that DHT messages never loop back to a peer that has already processed the request. Additionally, the message includes the current hop counter and, depending on the routing options, the message may include the full path that the message has taken so far. The Bloom filter should already contain the identity of the previous hop; however, the path should not include the identity of the previous hop and the receiver should append the identity of the sender to the path, not its own identity (this is done to reduce bandwidth).
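
The rule about who is added to the path is easy to get wrong, so here is a small Python sketch of it. A plain set stands in for the Bloom filter (so there are no false positives), and the function names and string peer identities are invented for this example.

```python
from typing import List, Set, Tuple


def receive_put(path: List[str], seen_peers: Set[str],
                sender: str, me: str) -> Tuple[List[str], Set[str]]:
    """Process a PUT at peer `me` that was received from `sender`."""
    # The sender left itself out of the path to save bandwidth, so the
    # receiver appends the sender's identity, not its own.
    new_path = path + [sender]
    # Before forwarding, mark ourselves as having processed the request.
    # (A set stands in for the fixed-size Bloom filter here.)
    new_seen = seen_peers | {me}
    return new_path, new_seen


def forward_targets(neighbours: List[str], seen_peers: Set[str]) -> List[str]:
    """Neighbours that have not yet seen the request are forwarding candidates."""
    return [n for n in neighbours if n not in seen_peers]
```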

GETting data from the DHT

A peer can search the DHT by sending struct PeerGetMessages of type GNUNET_MESSAGE_TYPE_DHT_P2P_GET to other peers. In addition to the usual information about the request (type, routing options, desired replication level for the request, the key and the extended query), a GET request also contains a hop counter, a Bloom filter over the peers that have processed the request already and, depending on the routing options, the full path traversed by the GET. Finally, a GET request includes a variable-size second Bloom filter and a so-called Bloom filter mutator value which together indicate which replies the sender has already seen. During the lookup, each block that matches the block type, key and extended query is additionally subjected to a test against this Bloom filter. The block plugin is expected to take the hash of the block, combine it with the mutator value and check if the result is not yet in the Bloom filter. The originator of the query will from time to time modify the mutator to (eventually) allow false positives filtered by the Bloom filter to be returned.
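
The role of the mutator can be illustrated with a toy Bloom filter in Python. The filter size, the number of hash functions and the use of SHA-512 to combine the block hash with the mutator are assumptions for this sketch; the real block plugins may combine the values differently.

```python
import hashlib
from typing import List


class ReplyFilter:
    """Toy Bloom filter over replies that the requester has already seen."""

    def __init__(self, mutator: int, size_bits: int = 256,
                 num_hashes: int = 4) -> None:
        self.mutator = mutator
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, block: bytes) -> List[int]:
        block_hash = hashlib.sha512(block).digest()
        # Combine the block hash with the mutator before deriving positions,
        # so a different mutator maps the same block to different bits.
        mingled = hashlib.sha512(
            self.mutator.to_bytes(4, "big") + block_hash).digest()
        return [
            int.from_bytes(mingled[4 * i:4 * i + 4], "big") % self.size_bits
            for i in range(self.num_hashes)
        ]

    def add(self, block: bytes) -> None:
        for pos in self._positions(block):
            self.bits |= 1 << pos

    def probably_seen(self, block: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(block))
```

Because the bit positions depend on the mutator, a block that is wrongly filtered under one mutator is unlikely to be filtered again once the originator picks a new one.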

Peers that receive a GET request perform a local lookup (depending on their proximity to the key and the query options) and forward the request to other peers. They then remember the request (including the Bloom filter for blocking duplicate results) and when they obtain a matching, non-filtered response a struct PeerResultMessage of type GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT is forwarded to the previous hop. Whenever a result is forwarded, the block plugin is used to update the Bloom filter accordingly, to ensure that the same result is never forwarded more than once. The DHT service may also cache forwarded results locally if the "CACHE_RESULTS" option is set to "YES" in the configuration.

The GNU Name System (GNS)

The GNU Name System (GNS) is a decentralized database that enables users to securely resolve names to values. Names can be used to identify other users (for example, in social networking), or network services (for example, VPN services running at a peer in GNUnet, or purely IP-based services on the Internet). Users interact with GNS by typing in a hostname that ends in ".gnu" or ".zkey".

Videos giving an overview of most of the GNS and the motivations behind it are available here and here. The remainder of this chapter targets developers who are familiar with the high-level concepts of GNS as presented in these talks.

GNS-aware applications should use the GNS resolver to obtain the respective records that are stored under that name in GNS. Each record consists of a type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond to DNS record types; larger values are used for GNS-specific records. Applications can define new GNS record types by reserving a number and implementing a plugin (which mostly needs to convert the binary value representation to a human-readable text format and vice-versa). The expiration time specifies how long the record is to be valid. The GNS API ensures that applications are only given non-expired values. The flags are typically irrelevant for applications, as GNS uses them internally to control visibility and validity of records.
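
As a small illustration of these rules, the Python sketch below models a record with the four fields named above, distinguishes DNS from GNS-specific types and filters out expired records. The Record class and the helper names are hypothetical; the real record structure is defined by libgnunetgnsrecord.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical model of a GNS record with the four fields named above."""
    record_type: int
    value: bytes
    expiration: float  # absolute expiration time in seconds
    flags: int


def is_dns_type(record_type: int) -> bool:
    """Types below 65536 are DNS record types; larger ones are GNS-specific."""
    return record_type < 65536


def unexpired(records: List[Record], now: float) -> List[Record]:
    """Keep only records that are still valid at time `now`."""
    return [r for r in records if r.expiration > now]
```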

Records are stored along with a signature. The signature is generated using the private key of the authoritative zone. This allows any GNS resolver to verify the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from other users, the NAMESTORE to store information specific to the local users, and the DHT to exchange data between users. A plugin API is used to enable applications to define new GNS record types.

libgnunetgns

The GNS API itself is extremely simple. Clients first connect to the GNS service using GNUNET_GNS_connect. They can then perform lookups using GNUNET_GNS_lookup or cancel pending lookups using GNUNET_GNS_lookup_cancel. Once finished, clients disconnect using GNUNET_GNS_disconnect.

Looking up records

GNUNET_GNS_lookup takes a number of arguments:

handle
This is simply the GNS connection handle from GNUNET_GNS_connect.
name
The client needs to specify the name to be resolved. This can be any valid DNS or GNS hostname.
zone
The client needs to specify the public key of the GNS zone against which the resolution should be done (the ".gnu" zone). Note that a key must be provided, even if the name ends in ".zkey". This should typically be the public key of the master-zone of the user.
type
This is the desired GNS or DNS record type to look for. While all records for the given name will be returned, this can be important if the client wants to resolve record types that themselves delegate resolution, such as CNAME, PKEY or GNS2DNS. Resolving a record of any of these types will only work if the respective record type is specified in the request, as the GNS resolver will otherwise follow the delegation and return the records from the respective destination, instead of the delegating record.
only_cached
This argument should typically be set to GNUNET_NO. Setting it to GNUNET_YES disables resolution via the overlay network.
shorten_zone_key
If GNS encounters new names during resolution, their respective zones can automatically be learned and added to the "shorten zone". If this is desired, clients must pass the private key of the shorten zone. If NULL is passed, shortening is disabled.
proc
This argument identifies the function to call with the result. It is given proc_cls, the number of records found (possibly zero) and the array of the records as arguments. proc will only be called once. After proc has been called, the lookup must no longer be cancelled.
proc_cls
The closure for proc.

Accessing the records

The libgnunetgnsrecord library provides an API to manipulate the GNS record array that is given to proc. In particular, it offers functions such as converting record values to human-readable strings (and back). However, most libgnunetgnsrecord functions are not interesting to GNS client applications.

For DNS records, the libgnunetdnsparser library provides functions for parsing (and serializing) common types of DNS records.

Creating records

Creating GNS records is typically done by building the respective record information (possibly with the help of libgnunetgnsrecord and libgnunetdnsparser) and then using the libgnunetnamestore to publish the information. The GNS API is not involved in this operation.

Future work

In the future, we want to expand libgnunetgns to allow applications to observe shortening operations performed during GNS resolution, for example so that users can receive visual feedback when this happens.

libgnunetgnsrecord

The libgnunetgnsrecord library is used to manipulate GNS records (in plaintext or in their encrypted format). Applications mostly interact with libgnunetgnsrecord by using the functions to convert GNS record values to strings or vice-versa, or to lookup a GNS record type number by name (or vice-versa). The library also provides various other functions that are mostly used internally within GNS, such as converting keys to names, checking for expiration, encrypting GNS records to GNS blocks, verifying GNS block signatures and decrypting GNS records from GNS blocks.

We will now discuss the four commonly used functions of the API. libgnunetgnsrecord does not perform these operations itself, but instead uses plugins to perform the operation. GNUnet includes plugins to support common DNS record types as well as standard GNS record types.

Value handling

GNUNET_GNSRECORD_value_to_string can be used to convert the (binary) representation of a GNS record value to a human readable, 0-terminated UTF-8 string. NULL is returned if the specified record type is not supported by any available plugin.

GNUNET_GNSRECORD_string_to_value can be used to try to convert a human readable string to the respective (binary) representation of a GNS record value.

Type handling

GNUNET_GNSRECORD_typename_to_number can be used to obtain the numeric value associated with a given typename. For example, given the typename "A" (for DNS A records), the function will return the number 1. A list of common DNS record types is here. Note that not all DNS record types are supported by GNUnet GNSRECORD plugins at this time.

GNUNET_GNSRECORD_number_to_typename can be used to obtain the typename associated with a given numeric value. For example, given the type number 1, the function will return the typename "A".

GNS plugins

Adding a new GNS record type typically involves writing (or extending) a GNSRECORD plugin. The plugin needs to implement the gnunet_gnsrecord_plugin.h API which provides basic functions that are needed by GNSRECORD to convert typenames and values of the respective record type to strings (and back). These gnsrecord plugins are typically implemented within their respective subsystems. Examples for such plugins can be found in the GNSRECORD, GNS and CONVERSATION subsystems.

The libgnunetgnsrecord library is then used to locate, load and query the appropriate gnsrecord plugin. Which plugin is appropriate is determined by the record type (which is just a 32-bit integer). The libgnunetgnsrecord library loads all gnsrecord plugins that are installed at the local peer and forwards the application request to the plugins. If the record type is not supported by the plugin, it should simply return an error code.

The central functions of the gnsrecord APIs (plugin and main library) are the same four functions for converting between values and strings, and typenames and numbers documented in the previous section.
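
The dispatch pattern described here can be sketched in Python as follows. Both plugin classes, the ask_plugins helper and the EXAMPLE-TEXT type with number 70000 are invented for illustration; only the four function names and the mapping of "A" to 1 come from the text above.

```python
from typing import List, Optional


class DnsAPlugin:
    """Hypothetical plugin for DNS A records (type number 1)."""

    def typename_to_number(self, name: str) -> Optional[int]:
        return 1 if name == "A" else None

    def number_to_typename(self, number: int) -> Optional[str]:
        return "A" if number == 1 else None

    def value_to_string(self, record_type: int, data: bytes) -> Optional[str]:
        if record_type != 1 or len(data) != 4:
            return None
        return ".".join(str(octet) for octet in data)

    def string_to_value(self, record_type: int, text: str) -> Optional[bytes]:
        parts = text.split(".")
        if record_type != 1 or len(parts) != 4:
            return None
        try:
            return bytes(int(part) for part in parts)
        except ValueError:  # not a number, or outside 0..255
            return None


class ExampleTextPlugin:
    """Invented plugin for a made-up type EXAMPLE-TEXT (number 70000)."""

    def typename_to_number(self, name: str) -> Optional[int]:
        return 70000 if name == "EXAMPLE-TEXT" else None

    def number_to_typename(self, number: int) -> Optional[str]:
        return "EXAMPLE-TEXT" if number == 70000 else None

    def value_to_string(self, record_type: int, data: bytes) -> Optional[str]:
        return data.decode("utf-8") if record_type == 70000 else None

    def string_to_value(self, record_type: int, text: str) -> Optional[bytes]:
        return text.encode("utf-8") if record_type == 70000 else None


def ask_plugins(plugins: List[object], function: str, *args):
    """Forward a request to each plugin until one supports the record type."""
    for plugin in plugins:
        result = getattr(plugin, function)(*args)
        if result is not None:
            return result
    return None  # no plugin supports this type
```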

The GNS Client-Service Protocol

The GNS client-service protocol consists of two simple messages, the LOOKUP message and the LOOKUP_RESULT. Each LOOKUP message contains a unique 32-bit identifier, which will be included in the corresponding response. Thus, clients can send many lookup requests in parallel and receive responses out-of-order. A LOOKUP request also includes the public key of the GNS zone, the desired record type and fields specifying whether shortening is enabled or networking is disabled. Finally, the LOOKUP message includes the name to be resolved.

The response includes the number of records and the records themselves in the format created by GNUNET_GNSRECORD_records_serialize. They can thus be deserialized using GNUNET_GNSRECORD_records_deserialize.

Hijacking the DNS-Traffic using gnunet-service-dns

This section documents how the gnunet-service-dns (and the gnunet-helper-dns) intercepts DNS queries from the local system. This is merely one method for how we can obtain GNS queries. It is also possible to change resolv.conf to point to a machine running gnunet-dns2gns or to modify libc's name system switch (NSS) configuration to include a GNS resolution plugin. The method described in this chapter is more of a last-ditch catch-all approach.

gnunet-service-dns enables intercepting DNS traffic using policy based routing. We MARK every outgoing DNS-packet if it was not sent by our application. Using a second routing table in the Linux kernel these marked packets are then routed through our virtual network interface and can thus be captured unchanged.

Our application then reads the query and decides how to handle it: A query to an address ending in ".gnu" or ".zkey" is hijacked by gnunet-service-gns and resolved internally using GNS. In the future, a reverse query for an address of the configured virtual network could be answered with records kept about previous forward queries. Queries that are not hijacked by some application using the DNS service will be sent to the original recipient. The answer to the query will always be sent back through the virtual interface with the original nameserver as source address.
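
The decision described above can be sketched as a single predicate in Python. The function name is hypothetical, and stripping a trailing dot and ignoring case are assumptions of this sketch rather than documented behaviour.

```python
GNS_SUFFIXES = (".gnu", ".zkey")


def handled_by_gns(query_name: str) -> bool:
    """Return True if a DNS query name should be resolved via GNS."""
    # Normalise the name: drop a trailing root dot and compare in lower case.
    name = query_name.rstrip(".").lower()
    return name.endswith(GNS_SUFFIXES)
```

Queries for which this returns False would be passed on to the original recipient.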

Network Setup Details

The DNS interceptor adds the following rules to the Linux kernel:

iptables -t mangle -I OUTPUT 1 -p udp --sport $LOCALPORT --dport 53 -j ACCEPT
iptables -t mangle -I OUTPUT 2 -p udp --dport 53 -j MARK --set-mark 3
ip rule add fwmark 3 table 2
ip route add default via $VIRTUALDNS table 2

Line 1 makes sure that all packets coming from a port our application opened beforehand ($LOCALPORT) will be routed normally. Line 2 marks every other packet to a DNS server with mark 3 (chosen arbitrarily). The third line adds a routing policy that sends packets carrying mark 3 to routing table 2, and the fourth line gives that table a default route via $VIRTUALDNS, so that marked packets leave through the virtual network interface.

Serving DNS lookups via GNS on W32

This section documents how libw32nsp (and gnunet-gns-helper-service-w32) resolve DNS queries on the local system. This only applies to GNUnet running on W32.

W32 has a concept of "Namespaces" and "Namespace providers". These are used to present various name systems to applications in a generic way. Namespaces include DNS, mDNS, NLA and others. For each namespace any number of providers could be registered, and they are queried in an order of priority (which is adjustable).

Applications can resolve names by using WSALookupService*() family of functions.

However, these are WSA-only facilities. Common BSD socket functions for namespace resolutions are gethostbyname and getaddrinfo (among others). These functions are implemented internally (by default - by mswsock, which also implements the default DNS provider) as wrappers around WSALookupService*() functions (see "Sample Code for a Service Provider" on MSDN).

On W32 GNUnet builds a libw32nsp - a namespace provider, which can then be installed into the system by using w32nsp-install (and uninstalled by w32nsp-uninstall), as described in "Installation Handbook".

libw32nsp is very simple and has almost no dependencies. As a response to NSPLookupServiceBegin(), it only checks that the provider GUID passed to it by the caller matches GNUnet DNS Provider GUID, checks that name being resolved ends in ".gnu" or ".zkey", then connects to gnunet-gns-helper-service-w32 at 127.0.0.1:5353 (hardcoded) and sends the name resolution request there, returning the connected socket to the caller.

When the caller invokes NSPLookupServiceNext(), libw32nsp reads a completely formed reply from that socket, unmarshalls it, then gives it back to the caller.

At the moment gnunet-gns-helper-service-w32 only ever gives one reply, and subsequent calls to NSPLookupServiceNext() will fail with WSA_NODATA (the first call to NSPLookupServiceNext() might also fail if GNS failed to find the name, or if there was an error connecting to it).

gnunet-gns-helper-service-w32 does most of the processing:

  • Maintains a connection to GNS.
  • Reads GNS config and loads appropriate keys.
  • Checks service GUID and decides on the type of record to look up, refusing to make a lookup outright when unsupported service GUID is passed.
  • Launches the lookup

When lookup result arrives, gnunet-gns-helper-service-w32 forms a complete reply (including filling a WSAQUERYSETW structure and, possibly, a binary blob with a hostent structure for gethostbyname() client), marshalls it, and sends it back to libw32nsp. If no records were found, it sends an empty header.

This works for most normal applications that use gethostbyname() or getaddrinfo() to resolve names, but fails to do anything with applications that use alternative means of resolving names (such as sending queries to a DNS server directly by themselves). This includes some well-known utilities, like "ping" and "nslookup".

The GNS Namecache

The NAMECACHE subsystem is responsible for caching (encrypted) resolution results of the GNU Name System (GNS). GNS makes zone information available to other users via the DHT. However, as accessing the DHT for every lookup is expensive (and as the DHT's local cache is lost whenever the peer is restarted), GNS uses the NAMECACHE as a more persistent cache for DHT lookups. Thus, instead of always looking up every name in the DHT, GNS first checks if the result is already available locally in the NAMECACHE. Only if there is no result in the NAMECACHE, GNS queries the DHT. The NAMECACHE stores data in the same (encrypted) format as the DHT. It thus makes no sense to iterate over all items in the NAMECACHE --- the NAMECACHE does not have a way to provide the keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in the DHT --- the block expires whenever any of the records in the (encrypted) block expires. The expiration time of the block is the only information stored in plaintext. The NAMECACHE service internally performs all of the required work to expire blocks; clients do not have to worry about this. Also, given that NAMECACHE stores only GNS blocks that local users requested, there is no configuration option to limit the size of the NAMECACHE. It is assumed to be always small enough (a few MB) to fit on the drive.

The NAMECACHE supports the use of different database backends via a plugin API.

libgnunetnamecache

The NAMECACHE API consists of five simple functions. First, there is GNUNET_NAMECACHE_connect to connect to the NAMECACHE service. This returns the handle required for all other operations on the NAMECACHE. Using GNUNET_NAMECACHE_block_cache clients can insert a block into the cache. GNUNET_NAMECACHE_lookup_block can be used to lookup blocks that were stored in the NAMECACHE. Both operations can be cancelled using GNUNET_NAMECACHE_cancel. Note that cancelling a GNUNET_NAMECACHE_block_cache operation can result in the block being stored in the NAMECACHE -- or not. Cancellation primarily ensures that the continuation function with the result of the operation will no longer be invoked. Finally, GNUNET_NAMECACHE_disconnect closes the connection to the NAMECACHE.

The maximum size of a block that can be stored in the NAMECACHE is GNUNET_NAMECACHE_MAX_VALUE_SIZE, which is defined to be 63 kB.

The NAMECACHE Client-Service Protocol

All messages in the NAMECACHE IPC protocol start with the struct GNUNET_NAMECACHE_Header which adds a request ID (32-bit integer) to the standard message header. The request ID is used to match requests with the respective responses from the NAMECACHE, as they are allowed to happen out-of-order.

Lookup

The struct LookupBlockMessage is used to lookup a block stored in the cache. It contains the query hash. The NAMECACHE always responds with a struct LookupBlockResponseMessage. If the NAMECACHE has no response, it sets the expiration time in the response to zero. Otherwise, the response is expected to contain the expiration time, the ECDSA signature, the derived key and the (variable-size) encrypted data of the block.

Store

The struct BlockCacheMessage is used to cache a block in the NAMECACHE. It has the same structure as the struct LookupBlockResponseMessage. The service responds with a struct BlockCacheResponseMessage which contains the result of the operation (success or failure). In the future, we might want to make it possible to provide an error message as well.

The NAMECACHE Plugin API

The NAMECACHE plugin API consists of two functions, cache_block to store a block in the database, and lookup_block to lookup a block in the database.

Lookup

The lookup_block function is expected to return at most one block to the iterator, and return GNUNET_NO if there were no non-expired results. If there are multiple non-expired results in the cache, the lookup is supposed to return the result with the largest expiration time.

Store

The cache_block function is expected to try to store the block in the database, and return GNUNET_SYSERR if this was not possible for any reason. Furthermore, cache_block is expected to implicitly perform cache maintenance and purge blocks from the cache that have expired. Note that cache_block might encounter the case where the database already has another block stored under the same key. In this case, the plugin must ensure that the block with the larger expiration time is preserved. This can be done either by simply adding new blocks and selecting for the most recent expiration time during lookup, or by checking which block is more recent during the store operation.
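
The following in-memory Python sketch shows one way to satisfy the two contracts, using the second strategy (comparing expiration times during the store operation). The class, the Block type and the injected clock are hypothetical, and the numeric return codes are illustrative stand-ins for the GNUnet constants of the same name.

```python
from typing import Callable, Dict, NamedTuple

# Illustrative stand-ins for the GNUnet return codes.
GNUNET_OK, GNUNET_NO, GNUNET_SYSERR = 1, 0, -1


class Block(NamedTuple):
    expiration: float  # absolute expiration time in seconds
    payload: bytes     # encrypted data, opaque to the cache


class MemoryNamecache:
    """Hypothetical in-memory backend implementing the two plugin functions."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._blocks: Dict[bytes, Block] = {}

    def cache_block(self, query: bytes, block: Block) -> int:
        now = self._clock()
        # Implicit cache maintenance: purge everything that has expired.
        self._blocks = {q: b for q, b in self._blocks.items()
                        if b.expiration > now}
        current = self._blocks.get(query)
        # Preserve the block with the larger expiration time.
        if current is None or block.expiration > current.expiration:
            self._blocks[query] = block
        # A dictionary cannot fail to store; a real database backend
        # would return GNUNET_SYSERR on error.
        return GNUNET_OK

    def lookup_block(self, query: bytes,
                     iterator: Callable[[Block], None]) -> int:
        block = self._blocks.get(query)
        if block is None or block.expiration <= self._clock():
            return GNUNET_NO
        iterator(block)  # at most one block is passed to the iterator
        return GNUNET_OK
```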

The REVOCATION Subsystem

The REVOCATION subsystem is responsible for key revocation of Egos. If a user learns that his private key has been compromised, or if he has lost it, he can use the REVOCATION system to inform all of the other users that this private key is no longer valid. The subsystem thus includes ways to query for the validity of keys and to propagate revocation messages.

Dissemination

When a revocation is performed, the revocation is first of all disseminated by flooding the overlay network. The goal is to reach every peer, so that when a peer needs to check if a key has been revoked, this will be purely a local operation where the peer looks at his local revocation list. Flooding the network is also the most robust form of key revocation --- an adversary would have to control a separator of the overlay graph to restrict the propagation of the revocation message. Flooding is also very easy to implement --- peers that receive a revocation message for a key that they have never seen before simply pass the message to all of their neighbours.
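
A minimal Python simulation of this flooding rule is shown below. The graph representation and the function name are assumptions for the sketch; each peer that learns of the revocation passes it to all of its neighbours exactly once.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood(neighbours: Dict[str, List[str]],
          origin: str) -> Tuple[Set[str], int]:
    """Return the set of peers reached and the number of messages sent."""
    known = {origin}
    queue = deque([origin])
    messages = 0
    while queue:
        peer = queue.popleft()
        for other in neighbours[peer]:
            messages += 1  # the peer passes the message to every neighbour
            if other not in known:
                # First time this peer sees the revocation: it will forward it.
                known.add(other)
                queue.append(other)
    return known, messages
```

In this model every reached peer sends one message per adjacent edge, so the total number of messages grows linearly with the number of edges in the overlay.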

Flooding can only distribute the revocation message to peers that are online. In order to notify peers that join the network later, the revocation service performs efficient set reconciliation over the sets of known revocation messages whenever two peers (that both support REVOCATION dissemination) connect. The SET service is used to perform this operation efficiently.

Revocation Message: Design Requirements

However, flooding is also quite costly, creating O(|E|) messages on a network with |E| edges. Thus, revocation messages are required to contain a proof-of-work, the result of an expensive computation (which, however, is cheap to verify). Only peers that have expended the CPU time necessary to provide this proof will be able to flood the network with the revocation message. This ensures that an attacker cannot simply flood the network with millions of revocation messages. The proof-of-work required by GNUnet is set to take days on a typical PC to compute; if the ability to quickly revoke a key is needed, users have the option to pre-compute revocation messages, store them off-line and use them instantly after their key has been compromised.

Revocation messages must also be signed by the private key that is being revoked. Thus, they can only be created while the private key is in the possession of the respective user. This is another reason to create a revocation message ahead of time and store it in a secure location.

libgnunetrevocation

The REVOCATION API consists of two parts, to query and to issue revocations.

Querying for revoked keys

GNUNET_REVOCATION_query is used to check if a given ECDSA public key has been revoked. The given callback will be invoked with the result of the check. The query can be cancelled using GNUNET_REVOCATION_query_cancel on the return value.

Preparing revocations

It is often desirable to create a revocation record ahead-of-time and store it in an off-line location to be used later in an emergency. This is particularly true for GNUnet revocations, where performing the revocation operation itself is computationally expensive and thus is likely to take some time. Thus, if users want the ability to perform revocations quickly in an emergency, they must pre-compute the revocation message. The revocation API enables this with two functions that are used to compute the revocation message, but not trigger the actual revocation operation.

GNUNET_REVOCATION_check_pow should be used to calculate the proof-of-work required in the revocation message. This function takes the public key, the required number of bits for the proof of work (which in GNUnet is a network-wide constant) and finally a proof-of-work number as arguments. The function then checks if the given proof-of-work number is a valid proof of work for the given public key. Clients preparing a revocation are expected to call this function repeatedly (typically with a monotonically increasing sequence of numbers of the proof-of-work number) until a given number satisfies the check. That number should then be saved for later use in the revocation operation.
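
The search loop described above can be sketched in Python as follows. This is not the GNUnet implementation: SHA-512 is used as a stand-in hash, the proof is assumed to be valid when the hash has enough leading zero bits, and the function names are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash value."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def check_pow(public_key: bytes, pow_number: int, required_bits: int) -> bool:
    """Check one candidate number (SHA-512 is an assumed stand-in hash)."""
    digest = hashlib.sha512(pow_number.to_bytes(8, "big") + public_key).digest()
    return leading_zero_bits(digest) >= required_bits


def find_pow(public_key: bytes, required_bits: int, start: int = 0) -> int:
    """Try monotonically increasing numbers until one satisfies the check."""
    pow_number = start
    while not check_pow(public_key, pow_number, required_bits):
        pow_number += 1
    return pow_number
```

Each additional required bit doubles the expected number of attempts, while verifying a given number always costs a single hash computation.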

GNUNET_REVOCATION_sign_revocation is used to generate the signature that is required in a revocation message. It takes the private key that (possibly in the future) is to be revoked and returns the signature. The signature can again be saved to disk for later use, which will then allow performing a revocation even without access to the private key.

Issuing revocations

Given an ECDSA public key, the signature from GNUNET_REVOCATION_sign_revocation and the proof-of-work, GNUNET_REVOCATION_revoke can be used to perform the actual revocation. The given callback is called upon completion of the operation. GNUNET_REVOCATION_revoke_cancel can be used to stop the library from calling the continuation; however, in that case it is undefined whether or not the revocation operation will be executed.

The REVOCATION Client-Service Protocol

The REVOCATION protocol consists of four simple messages.

A QueryMessage containing a public ECDSA key is used to check if a particular key has been revoked. The service responds with a QueryResponseMessage which simply contains a bit that says if the given public key is still valid, or if it has been revoked.

The second possible interaction is for a client to revoke a key by passing a RevokeMessage to the service. The RevokeMessage contains the ECDSA public key to be revoked, a signature by the corresponding private key and the proof-of-work. The service responds with a RevocationResponseMessage which can be used to indicate that the RevokeMessage was invalid (e.g. proof of work incorrect), or otherwise indicates that the revocation has been processed successfully.

The REVOCATION Peer-to-Peer Protocol

Revocation uses two disjoint ways to spread revocation information among peers. First of all, P2P gossip exchanged via CORE-level neighbours is used to quickly spread revocations to all connected peers. Second, whenever two peers (that both support revocations) connect, the SET service is used to compute the union of the respective revocation sets.

In both cases, the exchanged messages are RevokeMessages which contain the public key that is being revoked, a matching ECDSA signature, and a proof-of-work. Whenever a peer learns about a new revocation this way, it first validates the signature and the proof-of-work, then stores it to disk (typically to a file $GNUNET_DATA_HOME/revocation.dat) and finally spreads the information to all directly connected neighbours.

For computing the union using the SET service, the peer with the smaller hashed peer identity will connect (as a "client" in the two-party set protocol) to the other peer after one second (to reduce traffic spikes on connect) and initiate the computation of the set union. All revocation services use a common hash to identify the SET operation over revocation sets.
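
The tie-breaking rule can be sketched in Python as below. SHA-512 and the byte-wise comparison are assumptions of this example, and the function names are hypothetical; the point is that both peers evaluate the same rule and exactly one of them initiates.

```python
import hashlib


def hashed_identity(peer_identity: bytes) -> bytes:
    """Hash a peer identity (SHA-512 is an assumed stand-in)."""
    return hashlib.sha512(peer_identity).digest()


def initiates_set_union(my_identity: bytes, other_identity: bytes) -> bool:
    """The peer with the smaller hashed identity acts as the client."""
    return hashed_identity(my_identity) < hashed_identity(other_identity)
```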

The current implementation accepts revocation set union operations from all peers at any time; however, well-behaved peers should only initiate this operation once after establishing a connection to a peer with a larger hashed peer identity.

GNUnet's File-sharing (FS) Subsystem

This chapter describes the details of how the file-sharing service works. As with all services, it is split into an API (libgnunetfs), the service process (gnunet-service-fs) and user interface(s). The file-sharing service uses the datastore service to store blocks and the DHT (and indirectly datacache) for lookups for non-anonymous file-sharing. Furthermore, the file-sharing service uses the block library (and the block fs plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since the client library includes a large number of high-level abstractions; this is necessary since the FS service itself largely only operates on the block level. The FS library is responsible for providing a file-based abstraction to applications, including directories, metadata, keyword search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use keyword search is called the "Encoding for Censorship-Resistant Sharing" (ECRS). ECRS is largely implemented in the FS library; block validation is also reflected in the block FS plugin and the FS service. ECRS on-demand encoding is implemented in the FS service.

NOTE: The documentation in this chapter is quite incomplete.

Encoding for Censorship-Resistant Sharing (ECRS)

When GNUnet shares files, it uses a content encoding that is called ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is described in the (so far unpublished) research paper attached to this page. ECRS obsoletes the previous ESED and ESED II encodings which were used in GNUnet before version 0.7.0.

The rest of this page assumes that the reader is familiar with the attached paper. What follows is a description of some minor extensions that GNUnet makes over what is described in the paper. The reason why these extensions are not in the paper is that we felt that they were obvious or trivial extensions to the original scheme and thus did not warrant space in the research report.

Namespace Advertisements

An SBlock with identifier 'all zeros' is a signed advertisement for a namespace. This special SBlock contains metadata describing the content of the namespace. Instead of the name of the identifier for a potential update, it contains the identifier for the root of the namespace. The URI should always be empty. The SBlock is signed with the content provider's RSA private key (just like any other SBlock). Peers can search for SBlocks in order to find out more about a namespace.

KSBlocks

GNUnet implements KSBlocks, which are KBlocks that encrypt an SBlock instead of a CHK and metadata. In other words, KSBlocks enable GNUnet to find SBlocks using the global keyword search. Usually the encrypted SBlock is a namespace advertisement. The rationale behind KSBlocks and SBlocks is to enable peers to discover namespaces via keyword searches and to associate useful information with namespaces. When GNUnet finds KSBlocks during a normal keyword search, it adds the information to an internal list of discovered namespaces. Users looking for interesting namespaces can then inspect this list, reducing the need for out-of-band discovery of namespaces. Naturally, namespaces (or more specifically, namespace advertisements) can also be referenced from directories, but KSBlocks should make it easier for the owner of the pseudonym to advertise namespaces since they eliminate the need to first create a directory.

Collections are also advertised using KSBlocks.

Attachment: ecrs.pdf (270.68 KB)

File-sharing persistence directory structure

This section documents how the file-sharing library implements persistence of file-sharing operations and specifically the resulting directory structure. This code is only active if the GNUNET_FS_FLAGS_PERSISTENCE flag was set when calling GNUNET_FS_start. In this case, the file-sharing library will try hard to ensure that all major operations (searching, downloading, publishing, unindexing) are persistent, that is, can live longer than the process itself. More specifically, an operation is supposed to live until it is explicitly stopped.

If GNUNET_FS_stop is called before an operation has been stopped, a SUSPEND event is generated, and when the process next calls GNUNET_FS_start, a RESUME event is generated. Additionally, even if an application crashes (segfault, SIGKILL, system crash) and hence GNUNET_FS_stop is never called and no SUSPEND events are generated, operations are still resumed (with RESUME events). This is implemented by constantly writing the current state of the file-sharing operations to disk. Specifically, the current state is written to disk whenever anything significant changes. The exception is block-wise progress in publishing and unindexing, since those operations would be slowed down significantly and can be resumed cheaply even without detailed accounting. Note that if the process crashes (or is killed) during a serialization operation, FS does not guarantee that this specific operation is recoverable (there are no strict transactional semantics, again for performance reasons). However, all other unrelated operations should resume nicely.

Since we need to serialize the state continuously and want to recover as much as possible even after crashing during a serialization operation, we do not use one large file for serialization. Instead, several directories are used for the various operations. When GNUNET_FS_start executes, the master directories are scanned for files describing operations to resume. Sometimes, these operations can refer to related operations in child directories which may also be resumed at this point. Note that corrupted files are cleaned up automatically. However, dangling files in child directories (those that are not referenced by files from the master directories) are not automatically removed.

Persistence data is kept in a directory that begins with the "STATE_DIR" prefix from the configuration file (by default, "$SERVICEHOME/persistence/") followed by the name of the client as given to GNUNET_FS_start (for example, "gnunet-gtk") followed by the actual name of the master or child directory.

The names for the master directories follow the names of the operations:

  • "search"
  • "download"
  • "publish"
  • "unindex"

Each of the master directories contains one file, with a name chosen at random, for each active top-level (master) operation. Note that a download that is associated with a search result is not a top-level operation.

In contrast to the master directories, the child directories are only consulted when another operation refers to them. For each search, a subdirectory (named after the master search synchronization file) contains the search results. Search results can have an associated download, which is then stored in the general "download-child" directory. Downloads can be recursive, in which case children are stored in subdirectories mirroring the structure of the recursive download (either starting in the master "download" directory or in the "download-child" directory depending on how the download was initiated). For publishing operations, the "publish-file" directory contains information about the individual files and directories that are part of the publication. However, this directory structure is flat and does not mirror the structure of the publishing operation. Note that unindex operations cannot have associated child operations.
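
The layout described in this section can be sketched in Python. This is only an illustration of the naming scheme, not the code in the FS library: the function names are hypothetical, and joining the three path components with the platform's path separator is an assumption.

```python
import os

# Names of the master directories, as listed above.
MASTER_DIRECTORIES = ("search", "download", "publish", "unindex")


def persistence_dir(state_dir: str, client_name: str, directory: str) -> str:
    """STATE_DIR prefix, then the client name, then the master or child directory."""
    return os.path.join(state_dir, client_name, directory)


def find_master_operations(state_dir: str, client_name: str) -> dict:
    """Map each master directory name to the files found directly inside it.

    Only regular files are listed; subdirectories (child directories) are
    skipped because they are consulted only when an operation refers to them.
    """
    found = {}
    for name in MASTER_DIRECTORIES:
        path = persistence_dir(state_dir, client_name, name)
        if os.path.isdir(path):
            found[name] = sorted(
                entry for entry in os.listdir(path)
                if os.path.isfile(os.path.join(path, entry))
            )
        else:
            found[name] = []
    return found
```

For a client named "gnunet-gtk", persistence_dir(state_dir, "gnunet-gtk", "download-child") would give the location of the general child directory for downloads that belong to search results.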

GNUnet's REGEX Subsystem

Using the REGEX subsystem, you can discover peers that offer a particular service using regular expressions. The peers that offer a service specify it using a regular expression. Peers that want to patronize a service search using a string. The REGEX subsystem will then use the DHT to return a set of matching offerers to the patrons.
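
The relationship between offerers and patrons can be illustrated locally with a short Python sketch. It does not use the DHT or the REGEX API: the function name and the peer names are hypothetical, Python's regular expression syntax is not necessarily the dialect accepted by GNUnet, and requiring the whole string to match is an assumption.

```python
import re


def matching_offerers(announcements: dict, search_string: str) -> list:
    """Return the peers whose announced regular expression matches the string.

    announcements maps a (hypothetical) peer name to the regular expression
    that the peer announced for its service.
    """
    return sorted(
        peer
        for peer, pattern in announcements.items()
        if re.fullmatch(pattern, search_string)
    )
```

For example, with announcements {"peer-1": "web-[0-9]+", "peer-2": "mail-.*"}, a patron searching for the string "web-42" would be given peer-1 only.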

For the technical details, we have Max's defense talk and Max's Master's thesis. An additional publication is under preparation and available to team members (in Git).

How to run the regex profiler

The gnunet-regex-profiler can be used to profile the usage of mesh/regex for a given set of regular expressions and strings. Mesh/regex allows you to announce your peer ID under a certain regex and search for peers matching a particular regex using a string. See https://gnunet.org/szengel2012ms for a full introduction.

First of all, the regex profiler uses the GNUnet testbed, so all the requirements of testbed also apply to the regex profiler (for example, you need password-less SSH login to the machines listed in your hosts file).

Configuration

Moreover, an appropriate configuration file is needed. For an example configuration, refer to contrib/regex_profiler_infiniband.conf in SVN HEAD. The important details are highlighted in the following paragraphs.

The regular expressions are announced by gnunet-daemon-regexprofiler, so you have to make sure it is started by adding it to the AUTOSTART set of ARM:

[regexprofiler]
AUTOSTART = YES

Furthermore you have to specify the location of the binary:

[regexprofiler]
# Location of the gnunet-daemon-regexprofiler binary.
BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
# Regex prefix that will be applied to all regular expressions and search strings.
REGEX_PREFIX = "GNVPN-0001-PAD"
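
Before starting a long profiler run, it can be useful to check that the settings shown above are present in the configuration file. The following Python sketch uses the standard configparser module; the function name is hypothetical, and it assumes the file uses only plain INI syntax that configparser can read, which may not hold for every GNUnet configuration file.

```python
import configparser

# Settings from the [regexprofiler] section shown above.
EXPECTED = {"regexprofiler": ("AUTOSTART", "BINARY", "REGEX_PREFIX")}


def missing_settings(config_text: str) -> list:
    """Return "section/OPTION" entries that are absent from the configuration."""
    # strict=False tolerates a section that appears more than once.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    missing = []
    for section, options in EXPECTED.items():
        for option in options:
            if not parser.has_option(section, option):
                missing.append(f"{section}/{option}")
    return missing
```

An empty result means all three settings were found; the check says nothing about whether their values are sensible.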

When running the profiler with a large scale deployment, you probably want to reduce the workload of each peer. Use the following options to do this.

[dht]
# Force network size estimation
FORCE_NSE = 1

[dhtcache]
DATABASE = heap
# Disable RC-file for Bloom filter? (for benchmarking with limited IO availability)
DISABLE_BF_RC = YES
# Disable Bloom filter entirely
DISABLE_BF = YES

[nse]
# Minimize proof-of-work CPU consumption by NSE
WORKBITS = 1

Options

Finally, to run the profiler, some options and the input data need to be specified on the command line.

gnunet-regex-profiler -c config-file -d log-file -n num-links \
    -p path-compression-length -s search-delay -t matching-timeout \
    -a num-search-strings hosts-file policy-dir search-strings-file

config-file              the configuration file created earlier.
log-file                 file where to write statistics output.
num-links                number of random links between started peers.
path-compression-length  maximum path compression length in the DFA.
search-delay             time to wait between peers finishing linking and starting to match strings.
matching-timeout         timeout after which to cancel the searching.
num-search-strings       number of strings in the search-strings-file.

The hosts-file should contain a list of hosts for the testbed, one per line, in the following format: user@host_ip:port
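
A hosts file can be checked before a run with a small Python sketch such as the following. The Host type and the function name are hypothetical, and the sketch assumes the port is a decimal number.

```python
from typing import NamedTuple


class Host(NamedTuple):
    user: str
    address: str
    port: int


def parse_host_line(line: str) -> Host:
    """Parse one hosts-file line of the form user@host_ip:port."""
    text = line.strip()
    user, at, rest = text.partition("@")
    address, colon, port = rest.rpartition(":")
    if not (at and colon and user and address):
        raise ValueError(f"expected user@host_ip:port, got {text!r}")
    try:
        port_number = int(port)
    except ValueError:
        raise ValueError(f"invalid port in {text!r}") from None
    return Host(user, address, port_number)
```

Splitting at the last colon keeps the sketch simple; addresses that themselves contain colons are not handled specially.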

The policy-dir is a folder of text files, each containing one or more regular expressions. A peer is started for each file in that folder, and the regular expressions in the corresponding file are announced by this peer.

The search-strings-file is a text file containing search strings, one per line.
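
Since num-search-strings has to agree with the contents of the search-strings-file, the command line can be assembled by a small script. The Python sketch below builds the argument list from the options described above; the function names are hypothetical, blank lines are assumed not to count as search strings, and the search delay and matching timeout are passed through unchanged because their format is not specified here.

```python
def count_search_strings(path: str) -> int:
    """Count the non-empty lines of the search-strings-file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def profiler_command(config_file: str, log_file: str, num_links: int,
                     path_compression_length: int, search_delay: str,
                     matching_timeout: str, hosts_file: str,
                     policy_dir: str, search_strings_file: str) -> list:
    """Build the gnunet-regex-profiler argument list described above."""
    return [
        "gnunet-regex-profiler",
        "-c", config_file,
        "-d", log_file,
        "-n", str(num_links),
        "-p", str(path_compression_length),
        "-s", search_delay,
        "-t", matching_timeout,
        "-a", str(count_search_strings(search_strings_file)),
        hosts_file, policy_dir, search_strings_file,
    ]
```

The resulting list can be handed to subprocess.run from the Python standard library, which avoids shell quoting problems with file names.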

You can create regular expressions and search strings for every AS in the Internet using the attached scripts. You need one of the CAIDA routeviews prefix2as data files for this. Run create_regex.py <filename> <output path> to create the regular expressions and create_strings.py <input path> <outfile> to create a search strings file from the previously created regular expressions.

Attachment: create.tar.gz (813 bytes)