Apache Lucene

Software Details:
Version: 5.3.1 / 4.10.4 / 3.6.2 updated
Upload Date: 10 Dec 15
Distribution Type: Freeware
Downloads: 241

Apache Lucene is suitable for any application that requires full-text search while keeping server resource consumption low and returning fast, accurate results.

Lucene is widely regarded as one of the best search engine libraries available and sits at the core of many other search tools, the best known being Apache Solr.

Lucene is written entirely in Java. Since its release by the Apache Software Foundation, it has been ported to many other languages, and various third-party bindings and wrappers exist.

What is new in this release:

  • All file access now uses Java's NIO.2 APIs, which give Lucene stronger index safety through better error handling and safer commits.
  • Every Lucene segment now stores a unique id per-segment and per-commit to aid in accurate replication of index files.
  • During merging, IndexWriter now always checks the incoming segments for corruption before merging. As a result, after upgrading to 5.0.0, merging may uncover long-standing latent corruption in an older 4.x index.
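
The "safer commits" point can be illustrated with plain JDK NIO.2 calls. The following is a minimal sketch of a write, fsync, then atomic-rename pattern; it is not Lucene's actual commit code, and the class name, the ".tmp" suffix and the file name "segments_demo" are all hypothetical choices made for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
final class SafeCommitSketch {

    /** Writes data to a sibling temp file, forces it to disk, then renames it into place. */
    static void commit(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Flush file content and metadata to the storage device before publishing.
            ch.force(true);
        }
        // Readers see either no file or the complete file, never a partial write.
        // Whether an existing target is replaced is implementation specific.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Small usage example: commits into a fresh temp directory and reads the file back. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("commit-sketch");
            Path target = dir.resolve("segments_demo");
            commit(target, "hello".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

NIO.2 reports failures as specific IOException subclasses rather than boolean return values, which is one reason error handling tends to be more precise than with the older java.io.File API.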

What is new in version 5.0.0 / 4.10.3 / 3.6.2:

  • New Terms.getMin/Max methods to retrieve the lowest and highest terms per field.
  • New IDVersionPostingsFormat, optimized for ID lookups that associate a monotonically increasing version per ID.
  • Atomic update of a set of doc values fields.
  • Numerous optimizations for doc values search-time performance.
  • New (default) Lucene49NormsFormat to better compress certain cases such as very short fields.
  • New SORTED_NUMERIC docvalues type for efficient processing of multi-valued numeric fields.
  • Indexer passes previous token stream for easier reuse.
  • MoreLikeThis accepts multiple values per field.
  • All classes that estimate their RAM usage now implement a new Accountable interface.
  • Lucene files are now written by (File)OutputStream on all platforms, completely disallowing seeking with simplified IO APIs.
  • Improved the confusing error message shown when MMapDirectory cannot create a new map.

What is new in version 4.8.0:

  • Lucene has a new Rescorer/QueryRescorer API to perform second-pass rescoring or reranking of search results using more expensive scoring functions after first-pass hit collection.
  • AnalyzingInfixSuggester now supports near-real-time autosuggest.
  • Simplified impact-sorted postings (using SortingMergePolicy and EarlyTerminatingCollector) to use Lucene's Sort class to express the sort order.
  • Bulk scoring and normal iterator-based scoring were separated, so some queries can do bulk scoring more effectively.
  • Switched to MurmurHash3 to hash terms during indexing.
  • IndexWriter now supports updating of binary doc value fields.
  • HunspellStemFilter now uses 10 to 100x less RAM. It also loads all known OpenOffice dictionaries without error.
  • Lucene now also fsyncs the directory metadata on commits, if the operating system and file system allow it (Linux and Mac OS X are known to work).
  • Lucene now uses Java 7 file system functions under the hood, so index files can be deleted on Windows, even when readers are still open.
  • A serious bug in NativeFSLockFactory was fixed, which could allow multiple IndexWriters to acquire the same lock. The lock file is no longer deleted from the index directory even when the lock is not held.
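
For readers unfamiliar with MurmurHash3, here is a minimal sketch of the public 32-bit (x86_32) variant of the algorithm. It is an independent illustration written for this page, not Lucene's own implementation; the class and method names are hypothetical, and the seed handling Lucene uses internally is not shown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene.
final class Murmur3Sketch {

    /** 32-bit MurmurHash3 (x86_32 variant) over the whole byte array. */
    static int hash32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int blocks = len & ~3; // number of bytes covered by full 4-byte blocks

        // Body: mix each little-endian 4-byte block into the state.
        for (int i = 0; i < blocks; i += 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1 to 3 bytes, if any.
        int k = 0;
        switch (len & 3) {
            case 3:
                k ^= (data[blocks + 2] & 0xff) << 16; // fall through
            case 2:
                k ^= (data[blocks + 1] & 0xff) << 8;  // fall through
            case 1:
                k ^= (data[blocks] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
                break;
            default:
                break;
        }

        // Finalization: force all bits of the state to avalanche.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Convenience overload for UTF-8 text with seed 0. */
    static int hash32(String text) {
        return hash32(text.getBytes(StandardCharsets.UTF_8), 0);
    }
}
```

A term hash of this kind is typically used to pick a slot in an in-memory hash table while terms are being indexed; good bit mixing keeps collisions low for similar terms.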

What is new in version 4.7.0:

  • When sorting by String (SortField.STRING), you can now specify whether missing values should be sorted first (the default), or last.
  • NRT support for file systems that lack "delete on last close" or "cannot delete while referenced" semantics.
  • Added LongBitSet for managing more than 2.1B bits (otherwise use FixedBitSet).
  • Added Analyzer for Kurdish.
  • Added payload support to FileDictionary (Suggest) and made it more configurable.
  • Added a new BlendedInfixSuggester, which is like AnalyzingInfixSuggester but boosts suggestions that matched tokens with lower positions.
  • Added SimpleQueryParser: parser for human-entered queries.
  • Added MultiTermQuery support (wildcards, prefix, etc.) to PostingsHighlighter.
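
The 2.1B limit mentioned for LongBitSet comes from Java's int-based indexing. A bit set indexed by long sidesteps it by keeping the bits in a long[] and splitting the index into a word number and a bit position. The class below is a minimal sketch of that idea under a hypothetical name; it is not Lucene's LongBitSet and offers only a fraction of its operations.

```java
// Hypothetical helper, not part of Lucene.
final class LongIndexedBits {
    private final long[] words;
    private final long numBits;

    LongIndexedBits(long numBits) {
        if (numBits < 0) {
            throw new IllegalArgumentException("numBits must be >= 0");
        }
        this.numBits = numBits;
        // 64 bits per word; toIntExact fails loudly if the word count exceeds int range.
        this.words = new long[Math.toIntExact((numBits + 63) >>> 6)];
    }

    private void check(long index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index " + index + " out of range");
        }
    }

    /** Sets the bit at the given long index. */
    void set(long index) {
        check(index);
        words[(int) (index >>> 6)] |= 1L << (int) (index & 63);
    }

    /** Returns true if the bit at the given long index is set. */
    boolean get(long index) {
        check(index);
        return (words[(int) (index >>> 6)] & (1L << (int) (index & 63))) != 0;
    }

    /** Counts the set bits. */
    long cardinality() {
        long count = 0;
        for (long w : words) {
            count += Long.bitCount(w);
        }
        return count;
    }
}
```

Because each array element holds 64 bits, an int-indexed long[] can address far more than 2.1B bits, which is the capacity gain the release note refers to.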

What is new in version 4.6.0:

  • Added support for NumericDocValues field updates (without re-indexing the document) through IndexWriter.updateNumericDocValue(Term, String, Long).
  • New FreeTextSuggester can predict the next word using a simple n-gram language model, which is useful for "long tail" suggestions.
  • A new expression module allows for customized ranking with script-like syntax.
  • A new DirectDocValuesFormat can hold all doc values in heap as uncompressed native Java arrays.
  • Terms.hasFreqs can now determine if a given field indexed per-doc term frequencies.

What is new in version 4.5.0:

  • New in-memory DocIdSet implementations that perform notably better than FixedBitSet on small sets: WAH8DocIdSet, PFORDeltaDocIdSet and EliasFanoDocIdSet.
  • CachingWrapperFilter now caches filters with WAH8DocIdSet by default, which has the same memory usage as FixedBitSet in the worst case but is smaller and faster on small sets.
  • TokenStreams now set the position increment in end(), so we can handle trailing holes.
  • IndexWriter no longer clones the given IndexWriterConfig.
  • Various bugfixes and optimizations since the 4.4 release.

What is new in version 4.4.0:

  • New Replicator module: replicate index revisions between server and client.
  • New AnalyzingInfixSuggester: finds suggestions based on matches to any tokens in the suggestion, not just based on pure prefix matching.
  • New PatternCaptureGroupTokenFilter: emit multiple tokens, one for each capture group in one or more Java regexes.
  • New Lucene Facet module.

What is new in version 4.3.0:

  • New SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).
  • Added new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.
  • Significant performance improvements for BooleanQuery with minShouldMatch: skipping makes queries up to 4000% faster.
  • Various bugfixes and optimizations since the 4.2.1 release.

What is new in version 4.1.0:

  • Lucene no longer seeks when writing files (all fields are written in an append-only way). This means it works by default with append-only streams, HDFS, etc.
  • New suggest implementations: AnalyzingSuggester, where the underlying form (computed from a Lucene Analyzer) used for suggestions is separate from the returned text, and FuzzySuggester, which additionally allows for inexact matching on the input.
  • Near-realtime support was added to the facet module.
  • New highlighter (PostingsHighlighter) added to the highlighter module.
  • Added FilterStrategy to FilteredQuery for more flexibility in filtered query execution.
  • Added CommonTermsQuery to speed up queries with very high-frequency terms. Term frequencies are efficiently detected at query time; no index-time preparation is required.
  • Several bugfixes and optimizations since the 4.0 release.

What is new in version 4.0-alpha:

  • The index formats for terms, postings lists, stored fields, term vectors, etc. are pluggable via the Codec API. You can select from the provided implementations or customize the index format with your own Codec to meet your needs.
  • Substantially faster performance when using a Filter during searching.
  • File-system based directories can rate-limit the IO (MB/sec) of merge threads, to reduce IO contention between merging and searching threads.
  • FuzzyQuery is 100-200 times faster than in past releases.
  • A new spell checker, DirectSpellChecker, finds possible corrections directly against the main search index without requiring a separate index.

What is new in version 3.6.0:

  • In addition to Java 5 and Java 6, this release now has full Java 7 support (minimum JDK 7u1 required).
  • TypeTokenFilter filters tokens based on their TypeAttribute.
  • Fixed offset bugs in a number of CharFilters, Tokenizers and TokenFilters that could lead to exceptions during highlighting.
  • Added phonetic encoders: Metaphone, Soundex, Caverphone, Beider-Morse, etc.
  • CJKBigramFilter and CJKWidthFilter replace CJKTokenizer.
  • Kuromoji morphological analyzer tokenizes Japanese text, producing both compound words and their segmentation.
  • Static index pruning (Carmel pruning) removes postings with low within-document term frequency.
  • QueryParser now interprets '*' as an open end for range queries.
  • FieldValueFilter excludes documents missing the specified field.
  • CheckIndex and IndexUpgrader allow you to specify the specific FSDirectory implementation to use with the new -dir-impl command-line option.
  • FSTs can now do reverse lookup (by output) in certain cases and can be packed to reduce their size. There is now a method to retrieve top N shortest paths from a start node in an FST.
  • New WFSTCompletionLookup suggester supports finer-grained ranking for suggestions.
  • FST based suggesters now use an offline (disk-based) sort, instead of in-memory sort, when pre-sorting the suggestions.
  • ToChildBlockJoinQuery joins in the opposite direction (parent down to child documents).
  • New query-time joining is more flexible (but less performant) than index-time joins.
  • Added HTMLStripCharFilter to strip HTML markup.

What is new in version 3.5.0:

  • Substantially reduced (3-5X) the RAM required to hold the terms index when opening an IndexReader.
  • Added IndexSearcher.searchAfter which returns results after a specified ScoreDoc (e.g. last document on the previous page) to support deep paging use cases.
  • Added SearcherManager to manage sharing and reopening IndexSearchers across multiple search threads. Underlying IndexReader instances are safely closed if not referenced anymore.
  • Added SearcherLifetimeManager which safely provides a consistent view of the index across multiple requests (e.g. paging/drilldown).
  • Renamed IndexWriter.optimize to forceMerge to discourage use of this method since it is horribly costly and rarely justified anymore.
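
The deep-paging idea behind IndexSearcher.searchAfter can be shown without Lucene: instead of skipping a growing offset, each page starts strictly after the last hit of the previous page. The sketch below works on an already ranked in-memory list; the class, the Hit type and the tie-break rule (higher score first, then lower document id) are assumptions made for this illustration and not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class SearchAfterSketch {

    /** A scored hit, loosely modelled on the (doc, score) pair a ScoreDoc carries. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** True if a ranks strictly before b: higher score first, ties broken by lower doc id. */
    static boolean ranksBefore(Hit a, Hit b) {
        if (a.score != b.score) {
            return a.score > b.score;
        }
        return a.doc < b.doc;
    }

    /**
     * Returns up to n hits that rank strictly after the given hit.
     * rankedHits must already be in rank order; after == null means "first page".
     */
    static List<Hit> page(List<Hit> rankedHits, Hit after, int n) {
        List<Hit> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        for (Hit h : rankedHits) {
            if (after != null && !ranksBefore(after, h)) {
                continue; // h is on an earlier page (or is the anchor itself)
            }
            out.add(h);
            if (out.size() == n) {
                break;
            }
        }
        return out;
    }
}
```

To fetch the next page, pass the last hit of the current page as the anchor. In a real engine the benefit is that hits ranking at or before the anchor can be rejected during collection, so the collector only needs to keep one page of results in memory no matter how deep the page is.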

What is new in version 3.3.0:

  • The spellchecker module now includes suggest/auto-complete functionality, with three implementations: Jaspell, Ternary Trie, and Finite State.
  • Support for merging results from multiple shards, for both "normal" search results (TopDocs.merge) as well as grouped results using the grouping module (SearchGroup.merge, TopGroups.merge).
  • An optimized implementation of KStem, a less aggressive stemmer for English.
  • Single-pass grouping implementation based on block document indexing.
  • Improvements to MMapDirectory (now also the default implementation returned by FSDirectory.open on 64-bit Linux).
  • NRTManager simplifies handling near-real-time search with multiple search threads, allowing the application to control which indexing changes must be visible to which search requests.
  • TwoPhaseCommitTool facilitates performing a multi-resource two-phased commit, including IndexWriter.
  • The default merge policy, TieredMergePolicy, has a new method (set/getReclaimDeletesWeight) to control how aggressively it targets segments with deletions, and is now more aggressive than before by default.
  • PKIndexSplitter tool splits an index by a mid-point term.
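
Merging results from multiple shards is, at its core, a k-way merge of lists that are each already sorted by score. The following is a minimal sketch of that technique using a priority queue; it is not the TopDocs.merge implementation, and the class, the ShardHit type and the tie-break rule (lower shard index wins on equal scores) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
final class ShardMergeSketch {

    /** A hit tagged with the shard it came from. */
    static final class ShardHit {
        final int shard;
        final int doc;
        final float score;

        ShardHit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists (each sorted by descending score) into a global top N.
     * Queue entries are {shardIndex, positionInThatShard} cursors.
     */
    static List<ShardHit> merge(int topN, List<List<ShardHit>> shards) {
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
            ShardHit x = shards.get(a[0]).get(a[1]);
            ShardHit y = shards.get(b[0]).get(b[1]);
            int byScore = Float.compare(y.score, x.score); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a[0], b[0]);
        });

        // Seed the queue with the best hit of every non-empty shard.
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }

        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] cursor = queue.poll();
            List<ShardHit> shardHits = shards.get(cursor[0]);
            merged.add(shardHits.get(cursor[1]));
            // Advance this shard's cursor so its next-best hit competes.
            if (cursor[1] + 1 < shardHits.size()) {
                queue.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }
}
```

Only one cursor per shard sits in the queue at any time, so the merge costs roughly O(N log k) for k shards and never needs to concatenate and re-sort the full result lists.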

What is new in version 3.2.0:

  • A new grouping module, under lucene/contrib/grouping, enables search results to be grouped by a single-valued indexed field.
  • A new IndexUpgrader tool fully converts an old index to the current format.
  • A new Directory implementation, NRTCachingDirectory, caches small segments in RAM, to reduce the I/O load for applications with fast NRT reopen rates.
  • A new Collector implementation, CachingCollector, is able to gather search hits (document IDs and optionally also scores) and then replay them. This is useful for Collectors that require two or more passes to produce results.
  • Index a document block using IndexWriter's new addDocuments or updateDocuments methods. These experimental APIs ensure that the block of documents will forever remain contiguous in the index, enabling interesting future features like grouping and joins.
  • A new default merge policy, TieredMergePolicy, which is more efficient due to being able to merge non-contiguous segments.
  • NumericField is now returned correctly when you load a stored document (previously you received a normal Field back, with the numeric value converted to a string).

What is new in version 3.1.0:

  • ConstantScoreQuery now allows directly wrapping a Query.
  • IndexWriter is now configured with a new separate builder API, IndexWriterConfig. You can now control IndexWriter's previously fixed internal thread limit by calling setMaxThreadStates.
  • IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader.
  • MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher.
  • On 64bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it.
  • New TotalHitCountCollector just counts total number of hits.
  • ReaderFinishedListener API enables external caches to evict entries once a segment is finished.

What is new in version 3.0.1:

  • Remove unneeded synchronization in FuzzyTermEnum.
  • When resolving deleted terms, do so in term sort order for better performance.
  • Don't incorrectly keep warning about the same immense term, when IndexWriter.infoStream is on.
  • Fixed Min/MaxPayloadFunction returning 0 when only one payload is present.
  • Fixed queries consisting of all zero-boost clauses (for example, text:foo^0) sorting incorrectly and producing invalid docids.
  • Removed the protected inner class ScoreTerm from FuzzyQuery. The change was needed because the comparator of this class had to be changed in an incompatible way. The class was never intended to be public.

What is new in version 2.9.2:

  • BooleanQuery was ignoring disableCoord in its hashCode and equals methods, causing bad things to happen when caching BooleanQueries.
  • Don't incorrectly keep warning about the same immense term, when IndexWriter.infoStream is on.
  • At high indexing rates, NRT reader could temporarily lose deletions.

What is new in version 3.0.0:

  • Removed the system property to set SegmentReader class implementation.
  • Change return type of SnapshotDeletionPolicy#snapshot() from IndexCommitPoint to IndexCommit. Code that uses this method needs to be recompiled against Lucene 3.0 in order to work. The previously deprecated IndexCommitPoint is also removed.
  • Provide a convenience AttributeFactory that creates a Token instance for all basic attributes.
  • Remove recursion in NumericRangeTermEnum.
  • Optimize Levenshtein Distance computation in FuzzyQuery.
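
For context on what FuzzyQuery measures, here is a minimal sketch of the textbook Levenshtein edit distance using two rolling rows of the dynamic-programming table. It shows the distance itself, not the specific optimization made in this release, and the class name is hypothetical.

```java
// Hypothetical helper, not part of Lucene.
final class LevenshteinSketch {

    /** Minimum number of single-character insertions, deletions or substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b is just the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Reuse the two arrays instead of allocating a full table.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }
}
```

Keeping only two rows reduces memory from O(|a| * |b|) to O(|b|), a common first optimization when many candidate terms have to be compared against one query term.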
