GNU libextractor - a simple library for keyword extraction
About GNU libextractor
libextractor is a library for extracting meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be easily extensible by linking against external extractors for additional file types. libextractor is part of the GNU project. Our official GNU website can be found at http://www.gnu.org/software/libextractor/. libextractor can be downloaded from this site or from the GNU mirrors. The goal is to provide developers of file-sharing networks or WWW-indexing bots with a universal library for obtaining simple keywords to match against queries. libextractor includes a shell command, "extract", that, similar to the well-known "file" command, can extract meta-data from a file and print the results to stdout.
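The design described above, where per-format extractors do the actual work behind one common entry point, can be illustrated with a small sketch. This is not libextractor's API: the registry `EXTRACTORS`, the `register` decorator, and the function names are hypothetical and exist only for illustration. The real library is written in C and links against external extractors rather than registering Python functions.

```python
import struct
from typing import Callable, Dict, List, Tuple

Keyword = Tuple[str, str]  # (keyword type, value)
Extractor = Callable[[bytes], List[Keyword]]

# Hypothetical plugin registry, standing in for the set of loaded extractors.
EXTRACTORS: Dict[str, Extractor] = {}


def register(name: str) -> Callable[[Extractor], Extractor]:
    """Add an extractor to the registry under the given name."""
    def wrap(func: Extractor) -> Extractor:
        EXTRACTORS[name] = func
        return func
    return wrap


@register("png")
def extract_png(data: bytes) -> List[Keyword]:
    # PNG layout: 8-byte signature, 4-byte chunk length, "IHDR",
    # then big-endian width and height (4 bytes each).
    if len(data) < 24 or not data.startswith(b"\x89PNG\r\n\x1a\n"):
        return []
    if data[12:16] != b"IHDR":
        return []
    width, height = struct.unpack(">II", data[16:24])
    return [("mimetype", "image/png"), ("size", f"{width}x{height}")]


@register("gif")
def extract_gif(data: bytes) -> List[Keyword]:
    # GIF layout: 6-byte signature, then little-endian width and height
    # (2 bytes each).
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return []
    width, height = struct.unpack("<HH", data[6:10])
    return [("mimetype", "image/gif"), ("size", f"{width}x{height}")]


def extract_keywords(data: bytes) -> List[Keyword]:
    """Run every registered extractor and collect whatever each one finds."""
    keywords: List[Keyword] = []
    for extractor in EXTRACTORS.values():
        keywords.extend(extractor(data))
    return keywords
```

In this sketch, supporting another file type means registering one more function. Each extractor returns an empty list for data it does not recognize, so the caller never needs to know the file type in advance.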
Currently, libextractor supports the following formats:
HTML,
PDF,
PS,
OLE2 (DOC, XLS, PPT),
OpenOffice (sxw),
StarOffice (sdw),
DVI,
MAN,
FLAC,
MP3 (ID3v1 and ID3v2),
NSF (NES Sound Format),
SID,
OGG,
WAV,
EXIV2,
JPEG,
GIF,
PNG,
TIFF,
DEB,
RPM,
TAR(.GZ),
ZIP,
ELF,
FLV,
REAL,
RIFF (AVI),
MPEG,
QT
and
ASF.
libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Contact
libextractor is developed by Christian Grothoff and Vids Samanta. For questions about libextractor, send email to libextractor@gnu.org.
Translation engine based on i18nHTML (C) 2003, 2004, 2005, 2006, 2007 Christian Grothoff.