ES Search partial word - ngram? - elasticsearch

I am using Elastic Search to index entities that contain two fields: agencyName and agencyAddress.
Let's say I have indexed one entity:
{
"agencyName": "Turismo Viajes",
"agencyAddress": "Av. Maipú 500"
}
I would like to be able to find this entity by searching on agencyName. Possible searches include:
1) urismo
2) Viaje
3) Viajes
4) Turismo
5) uris
The idea is that if I query with those strings I should always get that entity (probably with different score depending on how accurate it is).
For this I thought that nGram would work out, so I defined a global analyzer called phrase in my elasticsearch.yml file.
index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]
And I created the agency index like this:
{
"possible_clients" : {
"possible_client" : {
"properties" : {
"agencyName" : {
"type" : "string",
"analyzer" : "phrase"
},
"agencyAddress" : {
"type": "string"
}
}
}
}
}
The problem is that when making a call like this:
curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
"query": { "term": { "agencyName": "uris" }}
}'
I don't get any hits. Any ideas what I am doing wrong?
Thanks in advance.

You are searching with a term query. A term query never analyzes its input, so changing the analyzer has no effect on the search text. Use a match query instead, for example.
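A minimal sketch of the same search request rewritten with a match query, built here as a Python dictionary and printed as JSON. The field name and search text come from the question; nothing is sent to a server, and the variable name search_body is only for illustration.

```python
import json

# Request body using a match query, which analyzes the search text
# with the field's analyzer before looking up terms.
search_body = {"query": {"match": {"agencyName": "uris"}}}

# Print the JSON that could be passed to curl with -d.
print(json.dumps(search_body, indent=2))
```

The printed body can replace the term query in the curl call from the question.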

According to the docs, the default max_gram of your tokenizer is 2, so you index tu, ur, ri, is, sm, mo, and so on.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.
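The effect of max_gram can be illustrated with a small pure-Python sketch. The ngrams helper below is hypothetical and only approximates what an n-gram tokenizer emits for a single lowercased word; it is not Elasticsearch code.

```python
def ngrams(text, min_gram, max_gram):
    """Return all character n-grams of text with lengths min_gram..max_gram."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }

# With a maximum length of 2, the four-character string "uris" is never produced.
default_grams = ngrams("Turismo", 1, 2)
print("uris" in default_grams)  # False

# With a larger maximum, "uris" becomes one of the indexed grams.
wide_grams = ngrams("Turismo", 1, 20)
print("uris" in wide_grams)  # True
```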
Try setting a larger max_gram; see the docs for the ngram tokenizer and the ngram token filter.
And maybe you should not use both the ngram tokenizer and the ngram filter. I have always used just the filter (with the whitespace tokenizer).
Here is an edgeNGram filter we had to define. An nGram filter is defined the same way.
"filter" : {
"my_filter" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "20"
}
}
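One caveat is worth sketching: an edge n-gram filter only emits prefixes of each token, so it covers searches such as Turismo or Viaje but not infix searches such as uris. The edge_ngrams helper below is a hypothetical approximation in plain Python, not Elasticsearch code.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of token with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

grams = edge_ngrams("turismo", 1, 20)
print(grams)  # ['t', 'tu', 'tur', 'turi', 'turis', 'turism', 'turismo']

# "uris" starts in the middle of the word, so it is not an edge n-gram.
print("uris" in grams)  # False
```

For the searches listed in the question, which include strings from the middle of a word, a plain nGram filter with the same min_gram and max_gram settings is the closer fit.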
Hope it helps.

Related

Phrase suggester returns unexpected result when first letter is misspelled

I'm using the Elasticsearch Phrase Suggester to correct users' misspellings. Everything works as I expect unless the user enters a query whose first letter is misspelled. In that case the phrase suggester returns nothing or returns unexpected results.
My query for suggestion:
{
"suggest": {
"text": "user_query",
"simple_phrase": {
"phrase": {
"field": "title.phrase",
"collate": {
"query": {
"inline" : {
"bool": {
"should": [
{ "match": {"title": "{{suggestion}}"}},
{ "match": {"participants": "{{suggestion}}"}}
]
}
}
}
}
}
}
}
}
Example when first letter is misspelled:
"simple_phrase" : [
{
"text" : "گاشانچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "گارانتی",
"score" : 0.00253151
}]
}
]
Example when fifth letter is misspelled:
"simple_phrase" : [
{
"text" : "کاشاوچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "کاشانچی",
"score" : 0.1121
},
{
"text" : "کاشانجی",
"score" : 0.0021
},
{
"text" : "کاشنچی",
"score" : 0.0020
}]
}
]
I expect these two misspelled queries to produce the same suggestions (the ones I expect are those from the second example). What is wrong?
P.S: I'm using this feature for Persian language.
I have a solution for your problem; you only need to add some fields to your schema.
P.S.: I don't have much expertise in Elasticsearch, but I solved the same problem using Solr, and you can implement it the same way in Elasticsearch.
Create a new ngram field and copy all your titles into it.
When a query for a misspelled word returns an empty result, split the word into pieces and fire the query again; you should then get the expected results.
Example: suppose the user is searching for the word Akshay but types Skshay. Building the query as shown below should return the expected results.
This is a Solr example; you can achieve the same thing in Elasticsearch.
(ngram:"skshay" OR ngram:"sk" OR ngram:"ks" OR ngram:"sh" OR ngram:"ha" OR ngram:"ay")
We split the word into consecutive two-character pieces and query the ngram field with each one.
Hope it will help you.
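The splitting step in this answer could be sketched as follows. The function name build_ngram_query and its parameters are hypothetical; the field name ngram and the two-character pieces follow the Solr example above.

```python
def build_ngram_query(word, field="ngram", size=2):
    """Build an OR query from the whole word plus its consecutive pieces."""
    word = word.lower()
    # Slide a window of `size` characters across the word.
    pieces = [word[i:i + size] for i in range(len(word) - size + 1)]
    clauses = [f'{field}:"{term}"' for term in [word] + pieces]
    return "(" + " OR ".join(clauses) + ")"

print(build_ngram_query("Skshay"))
```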
From Elasticsearch doc:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html
prefix_length
The number of minimal prefix characters that must match in order be a
candidate suggestions. Defaults to 1. Increasing this number improves
spellcheck performance. Usually misspellings don’t occur in the
beginning of terms. (Old name "prefix_len" is deprecated)
So by default the phrase suggester assumes that the first character is correct, because the default value of prefix_length is 1.
Note: setting this value to 0 is not recommended, because it has performance implications.
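A toy model of the prefix rule, as a sketch only: it is not how Elasticsearch generates candidates internally, and the vocabulary uses hypothetical Latin-script stand-ins for the Persian terms in the question. It shows why a wrong first letter removes the intended word from the candidate set when prefix_length is 1.

```python
def candidates(query, vocabulary, prefix_length):
    """Keep only vocabulary terms that share the first prefix_length characters."""
    prefix = query[:prefix_length]
    return [term for term in vocabulary if term.startswith(prefix)]

# Hypothetical stand-ins for indexed terms.
vocabulary = ["kashanchi", "kashanji", "garanti"]

# First letter misspelled: the intended "kashanchi" is filtered out.
print(candidates("gashanchi", vocabulary, 1))  # ['garanti']

# With prefix_length 0 every term stays a candidate, at a higher cost.
print(candidates("gashanchi", vocabulary, 0))  # ['kashanchi', 'kashanji', 'garanti']
```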
You need to use the reverse analyzer. I explained it in my answer to this post:
Elasticsearch spell check suggestions even if first letter missed
Regarding duplicates, you can use:
skip_duplicates
Whether duplicate suggestions should be filtered out (defaults to
false).
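The reasoning behind the reverse analyzer suggestion can be sketched in plain Python: once both strings are reversed, a misspelled first letter becomes the last letter, so the two terms share a long prefix again. The words are hypothetical Latin-script stand-ins for the terms in the question, and shared_prefix_length is an illustrative helper.

```python
def shared_prefix_length(a, b):
    """Count how many leading characters two strings have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

typed, intended = "gashanchi", "kashanchi"

# As typed, the terms differ at the very first character.
print(shared_prefix_length(typed, intended))  # 0

# Reversed, the difference moves to the end of the string.
print(shared_prefix_length(typed[::-1], intended[::-1]))  # 8
```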

Multi word partial search

I am very new to Elasticsearch.
I would like to know how to do a partial multi-word search.
For example:
My document
{
"title":"harry porter"
}
I need to retrieve this document when searching with the following string:
1) har por
This is the same as the SQL query (select * from books where title like '%har%' or title like '%por%').
Using a completion suggester will provide most of the feature you want. It will find words starting with an arbitrary string, like "har" or "por".
Check out this question for a full example on how to set up a completion suggester.
As described in the documentation, you can achieve multi-word search (i.e. returning "harry porter" from a search for "por") by setting the option preserve_position_increments to false on the completion field:
PUT books
{
"mappings": {
"book" : {
"properties" : {
"suggest" : {
"type" : "completion",
"preserve_position_increments": false
},
"title" : {
"type": "keyword"
}
}
}
}
}
Refer to this: Edge NGram Tokenizer
This helps in partial multi-word search (similar to autocomplete suggestions). Hope this helps!
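For comparison, the SQL LIKE semantics from the question map fairly directly onto wildcard queries combined in a bool should clause. This is a different technique from the suggester and edge n-gram approaches mentioned above, sketched here as a Python dictionary builder; partial_match_query is a hypothetical helper name. Wildcard patterns are not analyzed, so this assumes a keyword title field as in the mapping above, and patterns with a leading * can be slow on large indices.

```python
import json

def partial_match_query(text, field="title"):
    """Build one wildcard clause per word, combined with OR semantics."""
    clauses = [{"wildcard": {field: f"*{word}*"}} for word in text.split()]
    return {"query": {"bool": {"should": clauses}}}

# Mirrors: title like '%har%' or title like '%por%'
print(json.dumps(partial_match_query("har por"), indent=2))
```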

ElasticSearch filter on exact url

Let's say I create this document in my index:
PUT /nursery/rhyme/1
{
"url" : "http://example.com/mary",
"text" : "Mary had a little lamb"
}
Why does this query not return anything?
POST /nursery/rhyme/_search
{
"query" : {
"match_all" : {}
},
"filter" : {
"term" : {
"url" : "http://example.com/mary"
}
}
}
The Term Query finds documents that contain the exact term specified in the inverted index. When you save the document, the url property is analyzed and it will result in the following terms (with the default analyzer) : [http, example, com, mary].
So what you currently have in your inverted index is that bunch of terms, and none of them is http://example.com/mary.
What you want is to not analyze the url property or to do a Match Query that will split the query into terms just like when indexing.
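A rough sketch of why the exact URL is not found. The rough_terms helper is hypothetical and simply splits on non-alphanumeric characters to reproduce the term list given above; the real analyzer's tokenization may differ in detail, for example in how it treats dots.

```python
import re

def rough_terms(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

url = "http://example.com/mary"
terms = rough_terms(url)
print(terms)  # ['http', 'example', 'com', 'mary']

# A term filter looks for the whole URL as a single term, which is not there.
print(url in terms)  # False
```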
Exact match does not work on an analyzed field. A string is analyzed by default, which means the string http://example.com/mary is split and stored in the inverted index as http, example, com, mary. That's why your query returns nothing.
You can make your field not analyzed
{
"url": {
"type": "string",
"index": "not_analyzed"
}
}
but for this you will have to reindex your data.
Read about not_analyzed and the term query here.
Hope this helps
In Elasticsearch 7.x you have to use the type "keyword" in the mapping properties, which is not analyzed: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
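A minimal sketch of what such a 7.x mapping body might look like for the document in the question, built as a Python dictionary and printed as JSON. It assumes typeless mappings and a plain text type for the text field; the variable name mapping_body is only for illustration.

```python
import json

# Index-creation body: url is stored as a single exact term,
# while text is analyzed for full-text search.
mapping_body = {
    "mappings": {
        "properties": {
            "url": {"type": "keyword"},
            "text": {"type": "text"},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```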

Elasticsearch: how to query a long field for exact match

My document has the following mapping property:
"sid" : {"type" : "long", "store": "yes", "index": "no"},
This property has only one value for each record. I would like to query this property. I tried the following queries:
{
"query" : {
"term" : {
"sid" : 10
}
}
}
{
"query" : {
"match" : {
"sid" : 10
}
}
}
However, I got no results, even though I do have a document with sid equal to 10. What am I doing wrong? I would like to query this property for an exact match.
Thanks and regards.
Quote from the documentation:
index: Set to analyzed for the field to be indexed and searchable after being
broken down into token using an analyzer. not_analyzed means that its
still searchable, but does not go through any analysis process or
broken down into tokens. no means that it won’t be searchable at all
(as an individual field; it may still be included in _all). Setting to
no disables include_in_all. Defaults to analyzed.
So, with index set to no, you cannot search on that field individually. You either need to change index from no to another value, or set "include_in_all":"yes" and use a different type of query:
"query": {
"match": {
"_all": 10
}
}

Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}} matches all documents whose content contains the number 2. However, I would like the content to be exactly 2, no more and no less; think of my requirement in the spirit of Java's String.equals.
Similarly, for the second query I would like a match only when the document's content is exactly '3 3' and nothing more or less. {"query":{
"match" : {
"content" : "3 3"
}
}}
How could I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query, I meant that you need to specify in your index type mapping that the url field is of type string and it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
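The difference can be sketched with a toy model in plain Python. The helpers below are hypothetical: the analyzed case is approximated by whitespace splitting with OR semantics between query terms, and the not_analyzed case by comparing the whole stored value.

```python
def analyzed_terms(value):
    """Approximate analysis: lowercase and split on whitespace."""
    return value.lower().split()

def match_analyzed(doc_value, query):
    """Match if any query term is among the document's terms."""
    doc_terms = set(analyzed_terms(doc_value))
    return any(term in doc_terms for term in analyzed_terms(query))

def term_not_analyzed(doc_value, query):
    """Match only if the whole stored value equals the query."""
    return doc_value == query

docs = ["2", "2 apples", "20", "3 3"]
print([d for d in docs if match_analyzed(d, "2")])     # ['2', '2 apples']
print([d for d in docs if term_not_analyzed(d, "2")])  # ['2']
```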
Update:
The official docs say that in newer versions of Elasticsearch you can no longer declare a field as "not_analyzed"; use the "keyword" type instead.
For old versions of Elasticsearch:
{
"foo": {
"type": "string",
"index": "not_analyzed"
}
}
For new versions:
{
"foo": {
"type": "keyword",
"index": true
}
}
Note that the keyword type was introduced in Elasticsearch 5.0, and the backward compatibility layer was removed in the Elasticsearch 6.0 release.
Official Doc
You should use filter instead of match.
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"content" : 2
}
}
}
}
}
This returns only documents whose content is exactly 2, not 20 or 2.1.
