Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}} matches all the documents whole content contains the number 2, however I would like the content to be exactly 2, no more no less - think of my requirement in a spirit of Java's String.equals.
Similarly for the second query, I would like to match when the document's content is exactly '3 3' and nothing more or less:

{
  "query": {
    "match": {
      "content": "3 3"
    }
  }
}
How could I do exact (String.equals) matching in Elasticsearch?

Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:

"url": {
  "type": "string",
  "index": "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query; I meant that you need to specify in your index type mapping that the url field is of type string and is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
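To make that concrete, here is a minimal sketch for a pre-5.0 cluster; the index and type names (my_index, my_type) are hypothetical:

# my_index and my_type are hypothetical names; pre-5.0 syntax
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "content": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

# with the field unanalyzed, a term query behaves like String.equals
POST /my_index/my_type/_search
{
  "query": {
    "term": {
      "content": "3 3"
    }
  }
}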
Update:
The official docs tell us that in newer versions of Elasticsearch you can no longer define a field as "not_analyzed"; you should use the "keyword" type instead.
For the old version of Elasticsearch:
{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}
For the new version:
{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
Note that the keyword type was introduced in Elasticsearch 5.0, and the backward compatibility layer (string with not_analyzed) was removed in the Elasticsearch 6.0 release.
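A minimal sketch of the modern equivalent, using typeless mappings (Elasticsearch 7.x and later); my_index is a hypothetical name:

# my_index is a hypothetical name; 7.x typeless mapping
PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword"
      }
    }
  }
}

# exact, String.equals-style matching against the keyword field
GET /my_index/_search
{
  "query": {
    "term": {
      "content": "3 3"
    }
  }
}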

You should use a filter instead of match (see the official doc on the term query).
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "content": 2
        }
      }
    }
  }
}
And you get docs whose content is exactly 2, not 20 or 2.1.
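Note that a term query only behaves like String.equals if the target field is unanalyzed. If content is a default-mapped string (text with a keyword sub-field, the dynamic-mapping default since 5.0), you would target the sub-field instead - a sketch:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "content.keyword": "3 3"
        }
      }
    }
  }
}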

Related

Using Regexp Search inside a must bool query vs using must_not bool query

I want to make queries like:
- get all documents containing/not containing "some value" for a given field
- get all documents having a value equal/not equal to "some value" for a given field.
As per my mapping, the fields are of String type, meaning they support both keyword and full-text search, something like:
"myField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
I was initially using regexp matching like this (this query is for "not matches"):
"bool": {
"must":[
{
"regexp": {
"myField.keyword": {
"value": "~(some value)",
"flags": "ALL"
}
}
}
]
}
So, basically: ~(word) for not, .*word.* for contains, and ~(.*word.*) for not containing.
But then I also came across the must_not bool query. I understand I can add a must_not clause for the not-equals cases alongside the must and should clauses (for boolean AND and OR between other fields) in my bigger bool query, but I am still not sure about contains and not-contains searches. Can someone definitively explain what the best practice is here, both in terms of performance and the accuracy of the result set returned? See the sketch after the next line.
Elasticsearch version used: currently transitioning from v6.3 to v7.1.1.
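For reference, this is the shape of the must_not alternative I am considering - a sketch assuming the same myField.keyword sub-field. Not equal to "some value":

"bool": {
  "must_not": [
    { "term": { "myField.keyword": "some value" } }
  ]
}

And my guess at "not containing", with a wildcard instead of a regexp:

"bool": {
  "must_not": [
    { "wildcard": { "myField.keyword": "*some value*" } }
  ]
}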

ElasticSearch filter on exact url

Let's say I create this document in my index:
PUT /nursery/rhyme/1
{
  "url": "http://example.com/mary",
  "text": "Mary had a little lamb"
}
Why does this query not return anything?
POST /nursery/rhyme/_search
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "term": {
      "url": "http://example.com/mary"
    }
  }
}
The Term Query finds documents that contain the exact term specified in the inverted index. When you save the document, the url property is analyzed and (with the default analyzer) it will result in the following terms: [http, example, com, mary].
So what you currently have in your inverted index is that bunch of terms, none of which is http://example.com/mary.
What you want is to not analyze the url property or to do a Match Query that will split the query into terms just like when indexing.
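A sketch of the match-query route (note that it matches any document sharing at least one term with the query, so only the not-analyzed route gives true String.equals semantics):

POST /nursery/rhyme/_search
{
  "query": {
    "match": {
      "url": "http://example.com/mary"
    }
  }
}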
Exact match does not work for an analyzed field. A string is analyzed by default, which means the string http://example.com/mary will be split and stored in the inverted index as http, example, com, mary. That's why your query returns no output.
You can make your field not analyzed
{
  "url": {
    "type": "string",
    "index": "not_analyzed"
  }
}
but for this you will have to reindex your data.
Read about not_analyzed and the term query in the documentation.
Hope this helps
In Elasticsearch 7.x you have to use the type "keyword" in the mapping properties, which is not analyzed: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
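A minimal 7.x sketch of that approach, reusing the nursery example (typeless mapping, so no rhyme type):

PUT /nursery
{
  "mappings": {
    "properties": {
      "url": { "type": "keyword" },
      "text": { "type": "text" }
    }
  }
}

# a term query against the keyword field now matches the exact url
GET /nursery/_search
{
  "query": {
    "term": {
      "url": "http://example.com/mary"
    }
  }
}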

Elasticsearch: how to query a long field for exact match

My document has the following mapping property:
"sid" : {"type" : "long", "store": "yes", "index": "no"},
This property has only one value for each record. I would like to query this property. I tried the following queries:
{
  "query": {
    "term": {
      "sid": 10
    }
  }
}

{
  "query": {
    "match": {
      "sid": 10
    }
  }
}
However, I got no results. I do have a document with sid equal to 10. Is anything I did wrong? I would like to query this property for an exact match.
Thanks and regards.
Quote from the documentation:
index: Set to analyzed for the field to be indexed and searchable after being broken down into tokens using an analyzer. not_analyzed means that it is still searchable, but does not go through any analysis process or get broken down into tokens. no means that it won't be searchable at all (as an individual field; it may still be included in _all). Setting to no disables include_in_all. Defaults to analyzed.
So, by setting index to no, you cannot search by that field individually. You either need to remove no from index and choose something else, or you can use "include_in_all": "yes" and a different type of query:
"query": {
"match": {
"_all": 10
}
}
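A sketch of the other route: removing "index": "no" from the mapping so the field becomes individually searchable (this requires reindexing your data):

"sid": { "type": "long", "store": "yes" }

After reindexing, the original term query should work as-is:

{
  "query": {
    "term": {
      "sid": 10
    }
  }
}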

ES Search partial word - ngram?

I am using Elastic Search to index entities that contain two fields: agencyName and agencyAddress.
Let's say I have indexed one entity:
{
  "agencyName": "Turismo Viajes",
  "agencyAddress": "Av. Maipú 500"
}
I would like to be able to search for this entity and get the entity above searching through the agencyName. Different searches could be:
1) urismo
2) Viaje
3) Viajes
4) Turismo
5) uris
The idea is that if I query with those strings I should always get that entity (probably with different score depending on how accurate it is).
For this I thought that nGram would work out, so I defined a global analyzer in my elasticsearch.yml file called phrase.
index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]
And I created the agency index like this:
{
  "possible_clients": {
    "possible_client": {
      "properties": {
        "agencyName": {
          "type": "string",
          "analyzer": "phrase"
        },
        "agencyAddress": {
          "type": "string"
        }
      }
    }
  }
}
The problem is that when making a call like this:
curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
  "query": { "term": { "agencyName": "uris" } }
}'
I don't get any hits. Any ideas what I am doing wrong?
Thanks in advance.
You are using a term query for searching. A term query is never analyzed, so changing the analyzer will not have any effect. You should use, for example, a match query.
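For example, a sketch of the same search as a match query, which analyzes the input with the field's analyzer before matching:

curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
  "query": { "match": { "agencyName": "uris" } }
}'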
According to the docs, the default max_gram of your tokenizer is 2. So you index tu, ur, ri, is, sm, mo, etc.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.
Try to set a larger max_gram:
ngram tokenizer
ngram tokenfilter
And maybe you should not use both the nGram tokenizer and the nGram filter; I always used just the filter (with the whitespace tokenizer). Here is an edgeNGram filter we had to define once; ngrams should work just the same:
"filter" : {
"my_filter" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "20"
}
}
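Putting the advice together, a sketch of how the analysis settings from the question might look with a whitespace tokenizer and the ngram work done by a filter (my_filter and the gram sizes are illustrative):

index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: whitespace
        filter: [lowercase, asciifolding, my_filter]
    filter:
      my_filter:
        type: nGram        # use edgeNGram to match only from the start of words
        min_gram: 1
        max_gram: 20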
Hope it helps.

is there a way to boost matches in long documents in elasticsearch

In Elasticsearch the length of the document matters a lot to the final score of the search results. So if I have a match in a field that is just one line long, it's going to score much higher than a single match in, say, a document with 5 pages of text.
Is there a way to override this behavior, or reliably and repeatedly boost the result to overcome this behavior?
I guess you mean that the length of the matching field is taken into account when computing the score. If you want to disable this behaviour you can omit norms while indexing. That way you would lose index-time boosting as well, but I guess you're not using it, and even if you need boosting you should use query-time boosting, which is way more flexible.
You have to update the mapping for your field like this:
"field_name" : {
"type" : "string",
"omit_norms" : true
}
If you want to override this default behaviour for all your string fields you can use a dynamic template like this:
{
  "type_name": {
    "dynamic_templates": [
      {
        "omit_norms_template": {
          "match_mapping_type": "string",
          "mapping": {
            "omit_norms": true
          }
        }
      }
    ]
  }
}
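On Elasticsearch 5.x and later, omit_norms on string fields was replaced by a norms setting on text fields; a minimal sketch of the equivalent mapping:

"field_name": {
  "type": "text",
  "norms": false
}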
