Analyzers
To make your data searchable it is typically _analyzed_.
When text is analyzed using Lucene's @StandardAnalyzer@, for example,
whitespace and other irrelevant characters (eg punctuation) are
discarded, as are uninteresting words (eg 'and', 'or'), and the
remaining words are lower-cased. The input text is effectively
normalized for the search index.
Additionally, when you search with a query string, that too is
analyzed. This means that the terms you search on are
normalized in the same way as the terms in the index.
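As a rough illustration of why both sides need the same treatment, here is a toy sketch in plain Groovy. It is not Lucene's StandardAnalyzer: the `analyze` closure, its stop-word list and its tokenizing rule are simplified assumptions made up for this example.

```groovy
// Toy stand-in for an analyzer (an assumption for illustration, NOT Lucene code)
def stopWords = ['and', 'or', 'the'] as Set

def analyze = { String text ->
    // lower-case, then split on anything that is not a letter or digit
    def tokens = text.toLowerCase().split(/[^a-z0-9]+/)
    // drop empty tokens and stop words
    tokens.findAll { it && !(it in stopWords) }
}

def indexedTerms = analyze('The Quick, Brown Fox!')
def queryTerms = analyze('QUICK and brown')

// Both sides were normalized the same way, so the query terms match index terms
assert indexedTerms.containsAll(queryTerms)
```

Because the indexed text and the query string pass through the same function, differences in case, punctuation and stop words no longer prevent a match.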
Lucene includes many analyzers out of the box and you can also provide
your own.
What we get with Compass
Compass acts as a registry of Analyzers, each identified by a name.
Compass provides two analyzers: "default", which is used for indexing,
and "search", which is used for searching (analyzing query strings).
They are both instances of Lucene's StandardAnalyzer (or equivalent).
You can re-define both of these or define additional analyzers with
new names.
Defining Analyzer implementations
You can define an analyzer with Compass settings and (since 0.5.1) as a Spring bean.
Compass settings
The Compass settings can be defined either in the plugin's configuration or in a native Compass configuration file.
Compass actually provides shortcut names for some of the standard Lucene analyzers, and this is a simple way to define them, eg:
Map compassSettings = [
    'compass.engine.analyzer.german.type': 'German'
]
Here "German" is a synonym provided by Compass for one of the standard Lucene analyzers, and the analyzer has been named "german".
But you can also define your own implementations this way with a fully qualified class name:
Map compassSettings = [
    'compass.engine.analyzer.swedishChef.type': 'com.acme.lucene.analysis.SwedishChefAnalyzer'
]
See the Compass settings reference and the general discussion with XML examples for the complete range of options.
Spring bean
_Since 0.5.1_
If you define a Spring bean in resources.xml or resources.groovy that is an instance of
org.apache.lucene.analysis.Analyzer then it will be
automatically registered with Compass, using the Spring bean name as its name.
This allows you to inject your analyzer with other Spring beans and
configuration, eg
import com.acme.lucene.analysis.MyHtmlAnalyzer

beans = {
    htmlAnalyzer(MyHtmlAnalyzer) {
        context = someContext
        includeMeta = true
    }
}
defines an analyzer called @"htmlAnalyzer"@, while
import org.apache.lucene.analysis.standard.StandardAnalyzer

beans = {
    'default'(StandardAnalyzer, new HashSet()) // there are now no stop words
}
re-defines the "default" analyzer so that it has no stop-words (and
will not discard 'and', 'or', etc).
Using Analyzers
Indexing
For indexing purposes you define the analyzer in the mapping, either at the class level
class Book {
    static searchable = {
        analyzer 'bookAnalyzer'
    }
    String title
}
and/or at the property level
class Book {
    static searchable = {
        title analyzer: 'bookTitleAnalyzer'
    }
    String title
}
Property-level analyzers override class-level analyzers just for that property.
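Putting the two levels together, a mapping might look like the sketch below. The `summary` property is hypothetical, added only to show a property that falls back to the class-level analyzer, and both analyzer names are assumed to be registered already.

```groovy
class Book {
    static searchable = {
        analyzer 'bookAnalyzer'              // class level: applies to every property by default
        title analyzer: 'bookTitleAnalyzer'  // property level: overrides it for title only
    }
    String title
    String summary   // hypothetical property; would be analyzed with 'bookAnalyzer'
}
```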
Note that you can also use native Compass XML or annotations to map with custom analyzers.
Searching
You can choose which analyzer to use on a per-query basis
def sr = Song.search("only the lonely", analyzer: 'songLyricsAnalyzer')
or, with the plugin's configuration,
you can choose a search analyzer for all search queries (unless
overridden on a per-query basis).
defaultMethodOptions = [
    search: [reload: false, escape: false, offset: 0, max: 10, defaultOperator: "and", analyzer: 'myAnalyzer'],
    suggestQuery: [userFriendly: true]
]
You could also simply redefine the "search" analyzer to achieve the
same effect.
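For instance, mirroring the "default" example from the Spring bean section, a redefinition of "search" in resources.groovy might look like the sketch below. It assumes plugin version 0.5.1 or later, and that a bean named 'search' replaces the built-in search analyzer in the same way that a bean named 'default' replaces the indexing one.

```groovy
import org.apache.lucene.analysis.standard.StandardAnalyzer

beans = {
    // Assumption: a bean named 'search' overrides the built-in "search" analyzer
    'search'(StandardAnalyzer, new HashSet()) // query strings now keep their stop words
}
```

If you change only one of the two analyzers, keep in mind that query terms and index terms will then be normalized differently.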