Index storage: problems

Andrey Ogarok
Member since 10.07.2007
#11

Don't reinvent the wheel. Use inverted lists (an inverted index). No DBMS is suitable for serious tasks of this kind. Take a look at the Lucene project.

The Apache Lucene project develops open-source search software, including:

Lucene Java, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

Droids is an intelligent robot crawling framework currently in incubation.

Lucene.Net is a source-code, class-per-class, API-per-API and algorithmic port of the Lucene Java search engine to the C# and .NET platform, utilizing the Microsoft .NET Framework. Lucene.Net is currently under incubation.

Lucy is a loose C port of Lucene Java, with Perl and Ruby bindings.

Mahout is a subproject with the goal of creating a suite of scalable machine learning libraries.

Nutch builds on Lucene Java to provide web search application software.

PyLucene is a Python port of the Lucene Java project.

Solr is a high performance search server built using Lucene Java, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.

Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
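A minimal sketch of what "inverted lists" means in practice: each term maps to a sorted list of IDs of the documents that contain it (a posting list), and a multi-word AND query is answered by merge-intersecting those lists. The function names, the sample documents and the naive regex tokenizer are illustrative assumptions; this is not Lucene's API. A real engine such as Lucene also stores term frequencies and positions, compresses its postings on disk and ranks the results.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive analyzer (assumption): lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to a sorted list of the document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the shortest posting list to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break
    return result


docs = [
    "Lucene provides indexing and search",
    "Solr is a search server built on Lucene",
    "Nutch provides web search",
]
index = build_index(docs)
print(index["search"])                      # [0, 1, 2]
print(search_and(index, "lucene search"))   # [0, 1]
```

Because the posting lists are kept sorted, a query touches only the lists for its own terms instead of scanning every document, which is the main reason inverted lists scale better than general-purpose table lookups for full-text search.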

www.asknet.ru: a question-answering search engine that automatically answers users' questions.
Andrey Ogarok
#12

If you need more detail, send me a private message.

