GNU libextractor - a simple library for keyword extraction

About GNU libextractor

libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. libextractor is part of the GNU project. Our official GNU website can be found at http://www.gnu.org/software/libextractor/. libextractor can be downloaded from this site or the GNU mirrors.

The goal is to provide developers of file-sharing networks or WWW-indexing bots with a universal library to obtain simple keywords to match against queries. libextractor includes a shell command, "extract", that, similar to the well-known "file" command, can extract meta-data from a file and print the results to stdout.
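
As a rough illustration of how an indexing bot might consume the output of "extract", here is a minimal Python sketch. It assumes that "extract" prints one keyword per line in the form "type - value" (for example "mimetype - image/jpeg"); that line format, and the helper names parse_extract_output, run_extract and matches_query, are assumptions made for this example and are not part of libextractor itself.

```python
import shutil
import subprocess
from collections import defaultdict


def parse_extract_output(text):
    """Parse lines of the assumed form 'type - value' into a dict of lists."""
    keywords = defaultdict(list)
    for line in text.splitlines():
        # Split on the first ' - ' only, so a value may itself contain ' - '.
        kind, sep, value = line.partition(" - ")
        if not sep:
            continue  # skip blank lines and anything not in the assumed format
        keywords[kind.strip()].append(value.strip())
    return dict(keywords)


def run_extract(path):
    """Run the `extract` command on one file and return its parsed keywords."""
    if shutil.which("extract") is None:
        raise FileNotFoundError("the 'extract' command is not installed")
    result = subprocess.run(
        ["extract", path], capture_output=True, text=True, check=True
    )
    return parse_extract_output(result.stdout)


def matches_query(keywords, query):
    """Return True if the query occurs (case-insensitively) in any value."""
    needle = query.lower()
    return any(
        needle in value.lower()
        for values in keywords.values()
        for value in values
    )
```

A caller would typically run run_extract on each file it indexes, store the resulting dictionary, and later call matches_query against incoming search terms. The parser is kept separate from the subprocess call so that it can be exercised on captured output without "extract" being installed.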

Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF (NES Sound Format), SID, OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
Also, various additional MIME types are detected.

libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Recent News

Fri Apr 25 08:46:10 MDT 2008 | libextractor v0.5.20b released.
This release fixes security issues in the XPDF-based PDF plugin (which is not the one used by default).
Mon Apr 14 14:23:33 MDT 2008 | libextractor v0.5.20a released.
This release updates the Swedish, Vietnamese, German and Gaelic translations and adds translations for Dutch.
Thu Mar 20 23:38:47 MDT 2008 | libextractor v0.5.20 released.
This release adds support for AppleSingle and AppleDouble files and improves extraction of track numbers and ISRC codes.
Sat Jan 12 14:10:59 MST 2008 | libextractor v0.5.19a released.
This release fixes security issues in the XPDF-based PDF plugin (which is not the one used by default).
Mon Jan 7 08:51:58 MST 2008 | libextractor v0.5.19 released.
This release adds support for Adobe Flash (FLV) and Free Lossless Audio Codec (FLAC) files. The quicktime, ole2 and split extractors were also improved.

Contact

libextractor is developed by Christian Grothoff and Vids Samanta. For questions about libextractor, send email to libextractor@gnu.org.


Translation engine based on i18nHTML (C) 2003, 2004, 2005, 2006, 2007 Christian Grothoff.
