Elasticsearch NEST API: How to write Query descriptor to implement search with Starts with? - elasticsearch

Using Elasticsearch Nest Client to search for company name store in Elasticsearch. Here is sample of my queryExtentions.
I want to change it to make sure when I search for "Starbucks", it should only return record starting with letter "Starbucks". Currently it is rerurning all the records where it has "StarBucks".
Based on documentation, I need to search on "Keyword" filed in order to get the result.
Need sample code to how to achieve this.
****Elastic Search Index Column"
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Code*
var escapedSearchTerm = ElasticsearchQueryExtensions.EscapeQuery(companyName);
return new QueryContainerDescriptor<SearchResponseStorageContractV1>().Bool(b => b.Must(mu => mu
.QueryString(qs => qs
.AllowLeadingWildcard(true)
.AnalyzeWildcard(true)
.Fields(f => f.Field(s => s.Company.Name).Field(s => s.Organization.CommonName))
.Query(escapedSearchTerm)
)));

I am not familiar with Elastic Search Nest Client, but in JSON format you can implement search with functionality using prefix query
Adding a working example with index data,mapping,search query and search result
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Index Data:
{
"name":"Starbucks is a American multinational chain of coffeehouses"
}
{
"name":"coffee at Starbucks"
}
Search Query:
{
"query": {
"prefix": {
"name": {
"value": "Starbucks",
"case_insensitive": true // this param was introduced in 7.10.0
}
}
}
}
Search Result:
"hits": [
{
"_index": "67424740",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "Starbucks is a American multinational chain of coffeehouses"
}
}
]

Related

Username search in Elasticsearch

I want to implement a simple username search within Elasticsearch. I don't want weighted username searches yet, so I would expect it wouldn't be to hard to find resources on how do this. But in the end, I came across NGrams and lot of outdated Elasticsearch tutorials and I completely lost track on the best practice on how to do this.
This is now my setup, but it is really bad because it matches so much unrelated usernames:
{
"settings": {
"index" : {
"max_ngram_diff": "11"
},
"analysis": {
"analyzer": {
"username_analyzer": {
"tokenizer": "username_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"username_tokenizer": {
"type": "ngram",
"min_gram": "1",
"max_gram": "12"
}
}
}
},
"mappings": {
"properties": {
"_all" : { "enabled" : false },
"username": {
"type": "text",
"analyzer": "username_analyzer"
}
}
}
}
I am using the newest Elasticsearch and I just want to query similar/exact usernames. I have a user db and users should be able to search for eachother, nothing to fancy.
If you want to search for exact usernames, then you can use the term query
Term query returns documents that contain an exact term in a provided field. If you have not defined any explicit index mapping, then you need to add .keyword to the field. This uses the keyword analyzer instead of the standard analyzer.
There is no need to use an n-gram tokenizer if you want to search for the exact term.
Adding a working example with index data, index mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index Data:
{
"username": "Jack"
}
{
"username": "John"
}
Search Query:
{
"query": {
"term": {
"username.keyword": "Jack"
}
}
}
Search Result:
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Jack"
}
}
]
Edit 1:
To match for similar terms, you can use the fuzziness parameter along with the match query
{
"query": {
"match": {
"username": {
"query": "someting",
"fuzziness":"auto"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "3",
"_score": 0.6065038,
"_source": {
"username": "something"
}
}
]

what types are best for elasticsearch "KEYWORDS"(like hashtags) field?

i want to make Elasticsearch index for something KEYWORDS, like.. hashtag.
and make synonym filter for KEYWORDs.
i think two ways indexing keyword, first is make keyword type.
{
"settings": {
"keywordField": {
"type": "keyword"
}
}
}
if make a index with League of Legends
maybe this.
{
"keywordField": ["leagueoflegends", "league", "legends", "lol" /* synonym */]
}
or text type:
{
"settings": {
"keywordField": {
"type": "text",
"analyzer": "lowercase_and_whitespace_and_synonym_analyzer"
}
}
}
maybe this.
{
"keywordField": ["league of legends"](synonym: lol => leagueoflegends)
}
if use _analyzer api for this field, expects "leagueoflegends", "league", "legends"
search query: 'lol', 'league of legends', 'League of Legends' have to match this field.
which practice is best?
Adding a working example with index data, mapping, search query, and search result. In the below example, I have taken two synonyms lol and leagueoflegends
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"leagueoflegends, lol"
]
}
},
"analyzer": {
"synonym_analyzer": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"keywordField": {
"type": "text"
}
}
}
}
Index Data:
{
"keywordField": ["leagueoflegends", "league", "legends"]
}
Search Query:
{
"query": {
"match": {
"keywordField": {
"query": "lol",
"analyzer": "synonym_analyzer"
}
}
}
}
Search Result:
"hits": [
{
"_index": "66872989",
"_type": "_doc",
"_id": "1",
"_score": 0.19363807,
"_source": {
"keywordField": [
"leagueoflegends",
"league",
"legends"
]
}
}
]

Space handling in Elastic Search

If a Document (Say a merchant name) that I am searching for has no space in it and user search by adding space in it, the result won't show in elastic search. How can that be improved to get results?
For example:
Merchant name is "DeliBites"
User search by typing in "Deli Bites", then the above merchant does not appear in results. The merchant only appears in suggestions when I have typed just "Deli" or "Deli" followed by a space or "Deli."
Adding another option, you can also use the edge n-gram tokenizer which will work in most of the cases, its simple to setup and use.
Working example on your data
Index definition
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "ngram",
"min_gram": 1,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
},
"index.max_ngram_diff" : 10
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
Index sample doc
{
"title" : "DeliBites"
}
Search query
{
"query": {
"match": {
"title": {
"query": "Deli Bites"
}
}
}
}
And search results
"hits": [
{
"_index": "65489013",
"_type": "_doc",
"_id": "1",
"_score": 0.95894027,
"_source": {
"title": "DeliBites"
}
}
]
I suggest using synonym token filter.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html
you should have a dictionary for all words that you want search.
something like this:
DelitBites => Deli Bites
ipod => i pod
before implementing synonym be sure you understood all aspect of it.
https://www.elastic.co/blog/boosting-the-power-of-elasticsearch-with-synonyms

Return only exact matches (substrings) in full text search (elasticsearch)

I have an index in elasticsearch with a 'title' field (analyzed string field). If I have the following documents indexed:
{title: "Joe Dirt"}
{title: "Meet Joe Black"}
{title: "Tomorrow Never Dies"}
and the search query is "I want to watch the movie Joe Dirt tomorrow"
I want to find results where the full title matches as a substring of the search query. If I use a straight match query, all of these documents will be returned because they all match one of the words. I really just want to return "Joe Dirt" because the title is an exact match substring of the search query.
Is that possible in elasticsearch?
Thanks!
One way to achieve this is as follows :
1) while indexing index title using keyword tokenizer
2) While searching use shingle token-filter to extract substring from the query string and match against the title
Example:
Index Settings
put test
{
"settings": {
"analysis": {
"analyzer": {
"substring": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"substring"
]
},
"exact": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"filter": {
"substring": {
"type":"shingle",
"output_unigrams" : true
}
}
}
},
"mappings": {
"movie": {
"properties": {
"title": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "exact"
}
}
}
}
}
}
}
Index Documents
put test/movie/1
{"title": "Joe Dirt"}
put test/movie/2
{"title": "Meet Joe Black"}
put test/movie/3
{"title": "Tomorrow Never Dies"}
Query
post test/_search
{
"query": {
"match": {
"title.raw" : {
"analyzer": "substring",
"query": "Joe Dirt tomorrow"
}
}
}
}
Result :
"hits": {
"total": 1,
"max_score": 0.015511602,
"hits": [
{
"_index": "test",
"_type": "movie",
"_id": "1",
"_score": 0.015511602,
"_source": {
"title": "Joe Dirt"
}
}
]
}

elasticsearch ngram analyzer/tokenizer not working?

it seems that the ngram tokenizer isn't working or perhaps my understanding/use of it isn't correct.
my tokenizer is doing a mingram of 3 and maxgram of 5. i'm looking for the term 'madonna' which is definitely in my documents under artists.name. i can find the term with other techniques (using simple analyzer and related), but not using ngram.
what i'm trying to accomplish by using the ngram is to find names and accounting for misspellings.
please see a shortened version of my mappings, my settings, and my query, and if you have any ideas, please let me know - it's driving me nuts!
settings...
{
"myindex": {
"settings": {
"index": {
"analysis": {
"analyzer": {
"ngramAnalyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "nGramTokenizer"
}
},
"tokenizer": {
"nGramTokenizer": {
"type": "nGram",
"min_gram": "3",
"max_gram": "5"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"version": {
"created": "1020199"
},
"uuid": "60ggSr6TREaDTItkaNUagg"
}
}
}
}
mappings ...
{
"myindex": {
"mappings": {
"mytype": {
"properties": {
"artists.name": {
"type": "string",
"analyzer": "simple",
"fields": {
"ngram": {
"type": "string",
"analyzer": "ngramAnalyzer"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
query ...
{"query": {"match": {"artists.name.ngram": "madonna"}}}
document ...
{
"_index": "myindex",
"_type": "mytype",
"_id": "602537592951",
"_version": 1,
"found": true,
"_source": {
"artists": [
{
"name": "Madonna",
"id": "P 64565"
}
]
}
}
EDIT
incidentally, this query works (without ngram):
{"query": {"match": {"artists.name": "madonna"}}}
this obviously has something to do with the nested object here. i'm apparently not applying the ngram to the nested object properly.
ideas?
ok - i figured it out. i really hope this helps someone b/c it drove me crazy.
here's what my mapping turned out to look like:
{
"myindex": {
"mappings": {
"mytype": {
"properties": {
"artists": {
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string",
"analyzer": "ngramAnalyzer",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
and here's how i did it using Nest syntax...
first i had a sub type (class) called Person which has a Name and Id which looks like this (POCO)...
[Serializable]
public class Person
{
public string Name { get; set; }
[ElasticProperty(Analyzer = "fullTerm", Index = FieldIndexOption.not_analyzed)]
public string Id { get; set; }
}
and then my mapping went something like this ...
.AddMapping<MyIndex>(m => m
.MapFromAttributes()
.Properties(props =>
{
props
.Object<Person>(x => x.Name("artists")
.Properties(pp => pp
.MultiField(
mf => mf
.Name(s => s.Name)
.Fields(f => f
.String(s => s.Name(o => o.Name).Analyzer("ngramAnalyzer"))
.String(s => s.Name(o => o.Name.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
)
)
)
)
)
Note: the Object here which indicates it's another object beneath my type 'artists'.
Thanks, me!!!
edit:
curl mappings might be something like this...
curl-XPOST"http://localhost:9200/yourindex/_mappings"-H'Content-Type:application/json'-d'{"myindex":{"mappings":{"mytype":{"properties":{"artists":{"properties":{"id":{"type":"string"},"name":{"type":"string","analyzer":"ngramAnalyzer","fields":{"raw":{"type":"string","index":"not_analyzed"}}}}}}}}}}'

Resources