elasticsearch ngram analyzer/tokenizer not working? - elasticsearch

It seems that the nGram tokenizer isn't working, or perhaps my understanding/use of it isn't correct.
My tokenizer has a min_gram of 3 and a max_gram of 5. I'm looking for the term 'madonna', which is definitely in my documents under artists.name. I can find the term with other techniques (using the simple analyzer and the like), but not using the ngram.
What I'm trying to accomplish by using the ngram is to find names while accounting for misspellings.
Please see a shortened version of my mappings, my settings, and my query, and if you have any ideas, please let me know - it's driving me nuts!
settings...
{
"myindex": {
"settings": {
"index": {
"analysis": {
"analyzer": {
"ngramAnalyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "nGramTokenizer"
}
},
"tokenizer": {
"nGramTokenizer": {
"type": "nGram",
"min_gram": "3",
"max_gram": "5"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"version": {
"created": "1020199"
},
"uuid": "60ggSr6TREaDTItkaNUagg"
}
}
}
}
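As a sanity check on those settings, here is roughly what a min_gram=3 / max_gram=5 character nGram produces for 'madonna' (a plain-Python sketch of the gram generation, not Elasticsearch's actual implementation):

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Generate character n-grams the way an nGram tokenizer would,
    after a lowercase filter has been applied."""
    text = text.lower()
    grams = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams

# 'madonna' yields 12 grams, e.g. 'mad', 'ado', ..., 'donna'
print(ngrams("Madonna"))
```

Since the same analyzer runs at both index and search time, a match query for "madonna" against the ngram field should overlap on every one of these grams.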
mappings ...
{
"myindex": {
"mappings": {
"mytype": {
"properties": {
"artists.name": {
"type": "string",
"analyzer": "simple",
"fields": {
"ngram": {
"type": "string",
"analyzer": "ngramAnalyzer"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
query ...
{"query": {"match": {"artists.name.ngram": "madonna"}}}
document ...
{
"_index": "myindex",
"_type": "mytype",
"_id": "602537592951",
"_version": 1,
"found": true,
"_source": {
"artists": [
{
"name": "Madonna",
"id": "P 64565"
}
]
}
}
EDIT
Incidentally, this query works (without the ngram):
{"query": {"match": {"artists.name": "madonna"}}}
This obviously has something to do with the nested object here; I'm apparently not applying the ngram analyzer to the nested object properly.
Ideas?

OK - I figured it out. I really hope this helps someone, because it drove me crazy.
Here's what my mapping turned out to look like:
{
"myindex": {
"mappings": {
"mytype": {
"properties": {
"artists": {
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string",
"analyzer": "ngramAnalyzer",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
And here's how I did it using NEST syntax.
First I had a subtype (class) called Person, which has a Name and an Id and looks like this (POCO):
[Serializable]
public class Person
{
public string Name { get; set; }
[ElasticProperty(Analyzer = "fullTerm", Index = FieldIndexOption.not_analyzed)]
public string Id { get; set; }
}
And then my mapping went something like this...
.AddMapping<MyIndex>(m => m
.MapFromAttributes()
.Properties(props =>
{
props
.Object<Person>(x => x.Name("artists")
.Properties(pp => pp
.MultiField(
mf => mf
.Name(s => s.Name)
.Fields(f => f
.String(s => s.Name(o => o.Name).Analyzer("ngramAnalyzer"))
.String(s => s.Name(o => o.Name.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
)
)
)
)
)
Note the Object&lt;Person&gt; call here, which indicates that 'artists' is another object beneath my type.
Thanks, me!!!
edit:
The curl mapping might be something like this...
curl -XPOST "http://localhost:9200/yourindex/_mappings" -H 'Content-Type: application/json' -d '{"myindex": {"mappings": {"mytype": {"properties": {"artists": {"properties": {"id": {"type": "string"}, "name": {"type": "string", "analyzer": "ngramAnalyzer", "fields": {"raw": {"type": "string", "index": "not_analyzed"}}}}}}}}}}'

Related

Elasticsearch NEST API: How to write Query descriptor to implement search with Starts with?

I'm using the Elasticsearch NEST client to search for company names stored in Elasticsearch. Here is a sample of my query extensions.
I want to change it so that when I search for "Starbucks", it only returns records starting with "Starbucks". Currently it is returning all records that contain "Starbucks".
Based on the documentation, I need to search on the "keyword" field in order to get that result.
I need sample code showing how to achieve this.
Elasticsearch index field:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Code:
var escapedSearchTerm = ElasticsearchQueryExtensions.EscapeQuery(companyName);
return new QueryContainerDescriptor<SearchResponseStorageContractV1>().Bool(b => b.Must(mu => mu
.QueryString(qs => qs
.AllowLeadingWildcard(true)
.AnalyzeWildcard(true)
.Fields(f => f.Field(s => s.Company.Name).Field(s => s.Organization.CommonName))
.Query(escapedSearchTerm)
)));
I am not familiar with the Elasticsearch NEST client, but in JSON you can implement this search functionality using a prefix query.
Adding a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Index Data:
{
"name":"Starbucks is a American multinational chain of coffeehouses"
}
{
"name":"coffee at Starbucks"
}
Search Query:
{
"query": {
"prefix": {
"name": {
"value": "Starbucks",
"case_insensitive": true // this param was introduced in 7.10.0
}
}
}
}
Search Result:
"hits": [
{
"_index": "67424740",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "Starbucks is a American multinational chain of coffeehouses"
}
}
]
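As a mental model, a prefix query combined with the keyword-tokenizer analyzer above behaves like a case-insensitive startswith check over the whole field value (a toy sketch, not how Elasticsearch actually evaluates it against the inverted index):

```python
def prefix_match(docs, field, value):
    """Toy model of a case-insensitive prefix query on a field that was
    analyzed with the keyword tokenizer (one token per whole value)."""
    needle = value.lower()
    return [d for d in docs if d[field].lower().startswith(needle)]

# The two documents from the example above
docs = [
    {"name": "Starbucks is a American multinational chain of coffeehouses"},
    {"name": "coffee at Starbucks"},
]
hits = prefix_match(docs, "name", "Starbucks")
print([h["name"] for h in hits])  # only the document that *starts* with Starbucks
```

This is why "coffee at Starbucks" does not appear in the search result: the prefix must match from the start of the (single, whole-value) token.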

What types are best for an Elasticsearch "KEYWORDS" (like hashtags) field?

I want to make an Elasticsearch index for KEYWORDS, like hashtags,
and make a synonym filter for the keywords.
I can think of two ways of indexing the keywords. The first is to make it a keyword type:
{
"settings": {
"keywordField": {
"type": "keyword"
}
}
}
If I make an index entry for League of Legends, it might look like this:
{
"keywordField": ["leagueoflegends", "league", "legends", "lol" /* synonym */]
}
Or a text type:
{
"settings": {
"keywordField": {
"type": "text",
"analyzer": "lowercase_and_whitespace_and_synonym_analyzer"
}
}
}
Maybe it looks like this:
{
"keywordField": ["league of legends"](synonym: lol => leagueoflegends)
}
If I use the _analyze API on this field, I expect "leagueoflegends", "league", "legends".
The search queries 'lol', 'league of legends', and 'League of Legends' all have to match this field.
Which practice is best?
Adding a working example with index data, mapping, search query, and search result. In the example below, I have taken two synonyms, lol and leagueoflegends.
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"leagueoflegends, lol"
]
}
},
"analyzer": {
"synonym_analyzer": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"keywordField": {
"type": "text"
}
}
}
}
Index Data:
{
"keywordField": ["leagueoflegends", "league", "legends"]
}
Search Query:
{
"query": {
"match": {
"keywordField": {
"query": "lol",
"analyzer": "synonym_analyzer"
}
}
}
}
Search Result:
"hits": [
{
"_index": "66872989",
"_type": "_doc",
"_id": "1",
"_score": 0.19363807,
"_source": {
"keywordField": [
"leagueoflegends",
"league",
"legends"
]
}
}
]
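Conceptually, the search-time synonym_analyzer lowercases, splits on whitespace, and expands each token with its synonyms before matching. A toy Python model (the SYNONYMS table here mirrors the single "leagueoflegends, lol" rule above):

```python
# Bidirectional synonym table, mirroring "leagueoflegends, lol"
SYNONYMS = {"lol": {"leagueoflegends"}, "leagueoflegends": {"lol"}}

def analyze(query):
    """Toy model of the synonym_analyzer: lowercase, split on whitespace,
    then add every synonym of each token."""
    tokens = set()
    for tok in query.lower().split():
        tokens.add(tok)
        tokens |= SYNONYMS.get(tok, set())
    return tokens

def matches(doc_tokens, query):
    """A match query hits if any analyzed query token overlaps the doc."""
    return bool(analyze(query) & set(doc_tokens))

doc = ["leagueoflegends", "league", "legends"]
print(matches(doc, "lol"))                 # hits via the synonym expansion
print(matches(doc, "League of Legends"))   # hits via the 'league' token
```

This is why applying the synonym filter at search time is enough here: the indexed tokens never change, only the query gets expanded.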

How to create and add values to a standard lowercase analyzer in Elasticsearch

I've been around the houses with this for the past few days, trying things in various orders, but can't figure out why it's not working.
I am trying to create an index in Elasticsearch with an analyzer which is the same as the "standard" analyzer but retains upper-case characters when records are stored.
I create my analyzer and index as follows:
PUT /upper
{
"settings": {
"index" : {
"analysis" : {
"analyzer": {
"rebuilt_standard": {
"tokenizer": "standard",
"filter": [
"standard"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
}
}
}
Then I add two records to test, like this...
POST /upper/doc
{
"text" : "TEST"
}
Add a second record...
POST /upper/doc
{
"text" : "test"
}
Using /upper/_settings gives the following:
{
"upper": {
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "upper",
"creation_date": "1537788581060",
"analysis": {
"analyzer": {
"rebuilt_standard": {
"filter": [
"standard"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "s4oDgdsFTxOwsdRuPAWEkg",
"version": {
"created": "6030299"
}
}
}
}
}
But when I search with the following query, I still get two matches - both the upper- and lower-case records - which must mean the analyzer is not applied when I store the records.
I search like so...
GET /upper/_search
{
"query": {
"term": {
"text": {
"value": "test"
}
}
}
}
Thanks in advance!
First things first: you set your analyzer on the title field instead of on the text field (your search is on the text property, and you are indexing docs with only a text property):
"properties": {
"title": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
Try this instead:
"properties": {
"text": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
and keep us posted ;)
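For anyone wondering why both records matched: because text had no explicit mapping, it fell back to the default standard analyzer, which lowercases at index time, so "TEST" and "test" both index as the token "test". A term query is not analyzed, so it looks up that exact token and hits both documents. A rough Python model of the index-time behavior:

```python
import re

def standard_analyze(value):
    """Rough model of the default standard analyzer: split into word
    tokens, then lowercase each one."""
    return [tok.lower() for tok in re.findall(r"\w+", value)]

# Both documents end up with the same indexed token, so the unanalyzed
# term query for "test" matches both.
print(standard_analyze("TEST"))
print(standard_analyze("test"))
```

The fix in the answer (mapping the rebuilt_standard analyzer onto the text field) means the stored tokens keep their case, so the term query for "test" would then only match the lower-case document.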

Elasticsearch unordered partial phrase matching with ngram

Maybe I am going down the wrong route, but I am trying to set up Elasticsearch to use partial phrase matching to return parts of words from any order within a sentence.
E.g. I have the following input:
test name
tester name
name test
namey mcname face
test
And I hope a search for "test name" (or "name test") returns all of these (hopefully sorted in order of score). I can do partial searches, and I can do out-of-order searches, but I am not able to combine the two. I am sure this must be a very common issue.
Below is my Settings
{
"myIndex": {
"settings": {
"index": {
"analysis": {
"filter": {
"mynGram": {
"type": "nGram",
"min_gram": "2",
"max_gram": "5"
}
},
"analyzer": {
"custom_analyser": {
"filter": [
"lowercase",
"mynGram"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "5"
}
}
}
}
}
}
}
My mapping
{
"myIndex": {
"mappings": {
"myIndex": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "custom_analyser"
}
}
}
}
}
}
And my query
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"name": {
"query": "test name",
"slop": 5
}
}
}]
}
}
}
Any help would be greatly appreciated.
Thanks in advance
Not sure if you found your solution - I bet you did, because this is such an old post - but I was on the hunt for the same thing and found this: Query-Time Search-as-you-type.
Look up slop.
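As a rough illustration of how slop is counted: each position a term has to move costs 1, so two transposed adjacent terms ("name test" versus the phrase "test name") need a slop of 2. A toy model for two-term phrases (not Lucene's actual sloppy-phrase algorithm):

```python
def min_slop(doc_tokens, first, second):
    """Minimum slop for match_phrase "first second" to hit doc_tokens.
    Toy two-term model: the phrase matches at slop 0 when `second` sits
    immediately after `first`; every position of displacement costs 1."""
    pa, pb = doc_tokens.index(first), doc_tokens.index(second)
    return abs(pb - (pa + 1))

print(min_slop(["name", "test"], "test", "name"))       # transposition: 2
print(min_slop(["test", "name"], "test", "name"))       # exact phrase: 0
print(min_slop(["test", "x", "name"], "test", "name"))  # one-word gap: 1
```

So in the question's query, slop: 5 is already generous enough to match "name test" against the phrase "test name"; the remaining problem is combining that with the partial (ngram) matching.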

Understanding Elasticsearch synonym

Being very new to Elasticsearch, I'm not sure what's the best way to use synonyms.
I have two fields: one is hashtag and the other is name. hashtag contains names in lower case without whitespace, whereas name contains the actual name in camel-case format.
I want to search based on name in the right format and get all matching names, along with the docs where it matches hashtag as well.
For example, name contains "Tom Cruise" and hashtag is "tomcruise". I want to search for "Tom Cruise", and the expected result is that it returns all docs which have either the name "Tom Cruise" or the hashtag "tomcruise".
Here is the way I'm creating this index:
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"synonym" : {
"type" : "synonym",
"ignore_case" : true,
"synonyms" : [
"tom cruise => tomcruise, tom cruise"
]
}
},
"analyzer": {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
}
}
}
}
PUT /my_index/my_type/_mapping
{
"my_type": {
"properties": {
"hashtag": {
"type": "string",
"search_analyzer": "synonym",
"analyzer": "standard"
},
"name":{
"type": "keyword"
}
}
}
}
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "hashtag": "tomcruise", "name": "abc" }
{ "index": { "_id": 2 }}
{ "hashtag": "tomhanks", "name": "efg" }
{ "index": { "_id": 3 }}
{ "hashtag": "tomcruise" , "name": "efg" }
{ "index": { "_id": 4 }}
{ "hashtag": "news" , "name": "Tom Cruise"}
{ "index": { "_id": 5 }}
{ "hashtag": "celebrity", "name": "Kate Winslet" }
{ "index": { "_id": 6 }}
{ "hashtag": "celebrity", "name": "Tom Cruise" }
When I run _analyze, it looks like I get the right tokens: [tomcruise, tom, cruise]
GET /my_index/_analyze
{
"text": "Tom Cruise",
"analyzer": "synonym"
}
Here's how I'm searching:
POST /my_index/my_type/_search?pretty
{
"query":
{
"multi_match": {
"query": "Tom Cruise",
"fields": [ "hashtag", "name" ]
}
}
}
Is this the right way to achieve my search requirement?
What's the best way to search like this in Kibana? I currently have to use the entire query, but what do I need to do if I want to just type "Tom Cruise" and get the expected result? I tried "_all" but it didn't work.
Updated:
After discussing with Russ Cam, and with my limited knowledge of Elasticsearch, I decided it would be overkill to use synonyms for my search requirement. So I changed the search analyzer to generate the same tokens and got the same result. I still want to know whether I'm doing it the right way.
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"word_joiner": {
"type": "word_delimiter",
"catenate_all": true
}
},
"analyzer": {
"test_analyzer" : {
"type": "custom",
"tokenizer" : "keyword",
"filter" : ["lowercase", "word_joiner"]
}
}
}
}
}
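For what it's worth, the keyword tokenizer + lowercase + word_delimiter (catenate_all) chain in the updated settings should turn "Tom Cruise" into the tokens tom, cruise, and tomcruise, which is why it covers both the name and hashtag fields without synonyms. A toy Python model of that analysis chain (assuming word_delimiter's default part generation):

```python
import re

def analyze_name(value):
    """Toy model of: keyword tokenizer -> lowercase -> word_delimiter
    with catenate_all. Splits the single token on non-word characters,
    keeps the parts, and adds the concatenation of all parts."""
    parts = [p for p in re.split(r"\W+", value.lower()) if p]
    tokens = list(parts)
    if len(parts) > 1:
        tokens.append("".join(parts))
    return tokens

print(analyze_name("Tom Cruise"))  # ['tom', 'cruise', 'tomcruise']
```

With this applied as the search analyzer, a query for "Tom Cruise" produces the token "tomcruise" alongside the word parts, so it can match the lower-case, whitespace-free hashtag field as well as the name field.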