Find exact match phrase in ElasticSearch - elasticsearch

So I have the following ElasticSearch query:
"query": {
"bool": {
"must": [
{
"nested": {
"path": "specs",
"query": {
"bool": {
"must": [
{
"match": {
"specs.battery": "2 hours"
}
}
],
"minimum_should_match": 1
}
}
}
},
{
"terms": {
"category_ids": [
16405
]
}
}
]
}
}
At the moment it returns all documents that have either 2 or hours in specs.battery value. How could I modify this query, so that it only returns documents, that have exact phrase 2 hours in specs.battery field? As well, I would like to have the ability to have multiple phrases (2hrs, 2hours, 3 hours etc etc). Is this achievable?

By default, Elasticsearch tokenizes data when you index it. Indexing the expression "2 hours" therefore produces two tokens, 2 and hours, both mapped to the same document.
There is no single token "2 hours", so a match query searches for 2 or hours, and a term filter for "2 hours" finds nothing at all.
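The effect of tokenization can be sketched with a toy inverted index. This is only an illustration: the analyze function below is a simplified stand-in for the real analyzer (lowercase, split on non-alphanumerics), and the three sample documents are made up.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

# Made-up sample values for specs.battery.
docs = {1: "2 hours", 2: "12 hours", 3: "2 days"}

# Inverted index: token -> set of ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for tok in analyze(text):
        index.setdefault(tok, set()).add(doc_id)

def match_any(query):
    """Default match behaviour (OR): a hit on any query token is enough."""
    hits = set()
    for tok in analyze(query):
        hits |= index.get(tok, set())
    return hits

print(sorted(match_any("2 hours")))  # every document shares a token with the query
```

All three documents come back, because each one contains either 2 or hours, which is the behaviour described in the question.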
To have Elasticsearch treat "2 hours" as a single expression, you need to define specs.battery as not_analyzed in your mapping, as follows:
curl -XPOST localhost:9200/your_index -d '{
"mappings" : {
"your_index_type" : {
"properties" : {
...
"battery" : { "type" : "string", "index":"not_analyzed" }
...
}
}
}
}'
Then you can have an exact match using a filtered query as follows:
curl -XGET 'http://localhost:9200/_all/_search?pretty=true' -d '
{
"query": {
"filtered" : {
"filter" : {
"term": {
"battery": "2 hours"
}
}
}
}
}'
Then you'll have an exact match.
More details at: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
If, on the other hand, you absolutely need your field to be analyzed, or you work with an existing index that you can't change, you can still use a match query with the operator "and", as follows:
curl -XGET 'localhost:9200/your_index/_search' -d '
{
"query": {
"match": {
"battery": {
"query": "2 hours",
"operator": "and"
}
}
}
}'
Note that with this last option, a document containing "2 hours and something else" will still match, so it is not as precise as a "not_analyzed" field.
More details on the last topic at:
https://www.elastic.co/guide/en/elasticsearch/guide/current/match-multi-word.html
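For the "multiple phrases" part of the question, one possible approach on an analyzed field is a bool query with one match_phrase clause per phrase. The sketch below only builds the request body; the helper name battery_phrase_query is made up, and it assumes specs is a nested field as in the question. Keep in mind that match_phrase requires the terms to be adjacent and in order, but it still matches a longer value such as "2 hours and something else".

```python
import json

def battery_phrase_query(phrases, category_ids):
    """Build a search body that matches any of the given phrases in specs.battery."""
    phrase_clauses = [{"match_phrase": {"specs.battery": p}} for p in phrases]
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "specs",
                            "query": {
                                "bool": {
                                    # At least one of the phrases has to match.
                                    "should": phrase_clauses,
                                    "minimum_should_match": 1,
                                }
                            },
                        }
                    },
                    {"terms": {"category_ids": category_ids}},
                ]
            }
        }
    }

body = battery_phrase_query(["2 hours", "2hrs", "3 hours"], [16405])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request.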

Related

Find the first doc for each property value

I'm trying to get the first document that has a specific property.
For example, I have 50 docs with property "a":"1", with different dates,
and also 100 docs with "a":"2".
Is there a way to query the first doc of each "a" value by date?
Not exactly what you wanted, but you could run the following which will show you the results that match a:1 or a:2 and will order the results as you wanted.
{
"sort": {
"your_timestamp_field": {
"order": "desc"
}
},
"query": {
"filtered": {
"filter": {
"or": [
{
"term": {
"a": 1
}
},
{
"term": {
"a": 2
}
}
]
}
}
}
}
You could also run multiple queries using msearch. For example, place the lines below in a file named requests:
{"index": "your-index"}
{"size":1,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"filter":{"term":{"a":"1"}}}}}
{"index": "your-index"}
{"size":1,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"filter":{"term":{"a":"2"}}}}}
Then run
curl -XGET http://localhost:9200/your-index/_msearch --data-binary @requests; echo
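If there are many values of "a", writing the requests file by hand gets tedious. A small sketch that generates the msearch body instead is shown here; the function name msearch_body and the default timestamp field name are assumptions, and the query shape simply mirrors the lines above.

```python
import json

def msearch_body(index, field, values, timestamp_field="@timestamp", order="desc"):
    """Return a newline-delimited msearch body: one header/body pair per value."""
    lines = []
    for value in values:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({
            "size": 1,
            "sort": {timestamp_field: {"order": order}},
            "query": {"filtered": {"filter": {"term": {field: value}}}},
        }))
    # The msearch format expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"

body = msearch_body("your-index", "a", ["1", "2"])
print(body, end="")
```

Write the output to the requests file, or send it directly as the request body. Pass order="asc" if "first" means the oldest document rather than the newest.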

exact query search in elasticsearch

I have this query, which returns a document if the word "mumbai" appears anywhere in the title.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"title": "mumbai"
}
}
}
}
}
So the result contains...
mumbai
mumbai ports
financial capital mumbai
I need to return only "mumbai" term and not the other documents where mumbai word is associated with other phrases. Only the first result is correct. How do I discard other results?
update
This query is working as expected and it lists the sort value 58 (random value) if the match is exact.
curl -XPOST "localhost:9200/enwiki_content/page/_search?pretty" -d'
{
"fields": "title",
"query": {
"match": {"title": "Mumbai"}
},
"sort": {
"_script": {
"script": "_source.title == \"Mumbai\" ? \"58\": \"78\";",
"type": "string"
}
}
}'
I need to return the title where match is exact Mumbai (and hence the sort value 58). How do I filter or add the script to "fields" parameter?
To get mumbai to match with doc which contains only mumbai and nothing else, you'll have to store a token count field for the field you are searching on.
This token count field will contain the number of tokens the field contains. Using this field, you can match mumbai on your title field, and match token_count field with the number of tokens in mumbai (which is one).
Note that the token_count field in the other documents will be more than 1.
For reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html
Note: If you are using stopwords, then you need to know about the other caveats related to token count. You can find the information in the above link.
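As a rough sketch of the token count idea: the mapping below adds a token_count sub-field to title, and the query requires both a match on title and a token count equal to the number of tokens in the search text. The sub-field name "length" and the helper exact_title_query are made up for illustration, the "string" type follows the older mappings used in this thread, and the whitespace split is only an approximation of what the real analyzer would count.

```python
import json

# Hypothetical mapping: "length" is an assumed name for the token_count sub-field.
title_mapping = {
    "properties": {
        "title": {
            "type": "string",
            "fields": {
                "length": {"type": "token_count", "analyzer": "standard"}
            },
        }
    }
}

def exact_title_query(text):
    """Match titles that contain all tokens of text and no additional tokens."""
    # Whitespace split stands in for the analyzer's token count.
    n_tokens = len(text.split())
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": {"query": text, "operator": "and"}}},
                    {"term": {"title.length": n_tokens}},
                ]
            }
        }
    }

print(json.dumps(exact_title_query("mumbai"), indent=2))
```

With this query, "mumbai ports" and "financial capital mumbai" are excluded because their token counts are 2 and 3.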
Try the term query. It performs an exact-match search:
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "mumbai"
}
}
]
}
}
}
The term query does not treat Mumbai and mumbai as the same word; they are counted as different terms.
Second Option:
If you can change the mapping then you can set the title field as not_analyzed
Third Option
match query with analyzer option
{
"query": {
"match": {
"title": {
"query": "mumbai",
"analyzer": "keyword"
}
}
}
}

Querystring search on array elements in Elastic Search

I'm trying to learn elasticsearch with a simple example application, that lists quotations associated with people. The example mapping might look like:
{
"people" : {
"properties" : {
"name" : { "type" : "string"},
"quotations" : { "type" : "string" }
}
}
}
Some example data might look like:
{ "name" : "Mr A",
"quotations" : [ "quotation one, this and that and these"
, "quotation two, those and that"]
}
{ "name" : "Mr B",
"quotations" : [ "quotation three, this and that"
, "quotation four, those and these"]
}
I would like to be able to use the querystring api on individual quotations, and return the people who match. For instance, I might want to find people who have a quotation that contains (this AND these) - which should return "Mr A" but not "Mr B", and so on. How can I achieve this?
EDIT1:
Andrei's answer below seems to work, with data values now looking like:
{"name":"Mr A","quotations":[{"value" : "quotation one, this and that and these"}, {"value" : "quotation two, those and that"}]}
However, I can't seem to get a query_string query to work. The following produces no results:
{
"query": {
"nested": {
"path": "quotations",
"query": {
"query_string": {
"default_field": "quotations",
"query": "quotations.value:this AND these"
}
}
}
}
}
Is there a way to get a query_string query working with a nested object?
Edit2: Yes it is, see Andrei's answer.
For that requirement to be achieved, you need to look at nested objects, not to query a flattened list of values but individual values from that nested object. For example:
{
"mappings": {
"people": {
"properties": {
"name": {
"type": "string"
},
"quotations": {
"type": "nested",
"properties": {
"value": {
"type": "string"
}
}
}
}
}
}
}
Values:
{"name":"Mr A","quotations":[{"value": "quotation one, this and that and these"}, {"value": "quotation two, those and that"}]}
{"name":"Mr B","quotations":[{"value": "quotation three, this and that"}, {"value": "quotation four, those and these"}]}
Query:
{
"query": {
"nested": {
"path": "quotations",
"query": {
"bool": {
"must": [
{ "match": {"quotations.value": "this"}},
{ "match": {"quotations.value": "these"}}
]
}
}
}
}
}
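Regarding the query_string attempt in the question: in "quotations.value:this AND these", only the first term is bound to quotations.value, while "these" falls back to the default_field, which there is the nested object itself. A sketch of a variant that points default_field at the nested value field is given below. It is untested against a live cluster, and the helper name nested_query_string is made up.

```python
import json

def nested_query_string(path, field, expression):
    """Wrap a query_string query in a nested query on the given path."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "query_string": {
                        # Fully qualified field name, so every term targets it.
                        "default_field": f"{path}.{field}",
                        "query": expression,
                    }
                },
            }
        }
    }

body = nested_query_string("quotations", "value", "this AND these")
print(json.dumps(body, indent=2))
```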
Unfortunately there is no good way to do that.
https://web.archive.org/web/20141021073225/http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/complex-core-fields.html
When you get a document back from Elasticsearch, any arrays will be in
the same order as when you indexed the document. The _source field
that you get back contains exactly the same JSON document that you
indexed.
However, arrays are indexed — made searchable — as multi-value fields,
which are unordered. At search time you can’t refer to “the first
element” or “the last element”. Rather think of an array as a bag of
values.
In other words, it is always considering all values in the array.
This will return only Mr A
{
"query": {
"match": {
"quotations": {
"query": "quotation one",
"operator": "AND"
}
}
}
}
But this will return both Mr A & Mr B:
{
"query": {
"match": {
"quotations": {
"query": "this these",
"operator": "AND"
}
}
}
}
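The difference between the "bag of values" behaviour and per-quotation matching can be simulated in a few lines. This is a toy model, not Elasticsearch code: tokens is a crude stand-in for the analyzer, flattened_match treats all quotations of a person as one bag of tokens, and per_quotation_match checks each quotation on its own, which is what a nested mapping gives you.

```python
import re

people = {
    "Mr A": ["quotation one, this and that and these",
             "quotation two, those and that"],
    "Mr B": ["quotation three, this and that",
             "quotation four, those and these"],
}

def tokens(text):
    """Crude analyzer stand-in: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def flattened_match(quotations, terms):
    """All terms must appear somewhere across all quotations together."""
    bag = set()
    for quotation in quotations:
        bag |= tokens(quotation)
    return all(term in bag for term in terms)

def per_quotation_match(quotations, terms):
    """All terms must appear inside one single quotation."""
    return any(all(term in tokens(q) for term in terms) for q in quotations)

terms = ["this", "these"]
for name, quotations in people.items():
    print(name, flattened_match(quotations, terms), per_quotation_match(quotations, terms))
```

Mr B passes the flattened check because "this" and "these" occur in different quotations, but fails the per-quotation check.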
If scripting is enabled, this should work:
"script": {
"inline": "for (element in _source.quotations) { if (element.contains('this') && element.contains('these')) { return true; } }; return false;"
}

How to make query_string search exact phrase in ElasticSearch

I put 2 documents in Elasticsearch :
curl -XPUT "http://localhost:9200/vehicles/vehicle/1" -d'
{
"model": "Classe A"
}'
curl -XPUT "http://localhost:9200/vehicles/vehicle/2" -d'
{
"model": "Classe B"
}'
Why does this query return both documents:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe A\""
}
}
}'
And why does this one return only the second document:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe B\""
}
}
}'
I want Elasticsearch to match on the exact phrase I pass in the query parameter, WITH the whitespace. How can I do that?
What you need to look at is the analyzer you're using. If you don't specify one, Elasticsearch will use the Standard Analyzer. It is great for the majority of cases with plain text input, but doesn't work for the use case you mention.
The standard analyzer splits your string into words and then converts them to lowercase.
If you want to match the whole string "Classe A" and distinguish this from "Classe B", you can use the Keyword Analyzer. This will keep the entire field as one string.
Then you can use the match query which will return the results you expect.
Create the mapping:
PUT vehicles
{
"mappings": {
"vehicle": {
"properties": {
"model": {
"type": "string",
"analyzer": "keyword"
}
}
}
}
}
Perform the query:
POST vehicles/_search
{
"query": {
"match": {
"model": "Classe A"
}
}
}
If you wanted to use the query_string query, then you could set the operator to AND
POST vehicles/vehicle/_search
{
"query": {
"query_string": {
"query": "Classe B",
"default_operator": "AND"
}
}
}
Additionally, you can use query_string with escaped quotes, which will also return an exact phrase:
POST _search
{
"query": {
"query_string": {
"query": "\"Classe A\""
}
}
}
Use the match_phrase query, as shown below:
GET /company/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
Seems like in the latest versions of ES you can just use .keyword
POST vehicles/_search
{
"query": {
"term": {
"model.keyword": "Classe A"
}
}
}
It will match exactly the string "Classe A"
Dynamic fields that ES maps as text get a 'keyword' subfield, which is very useful for these cases:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
Another nice solution would be using match with minimum_should_match (providing the percentage of the words you want to match). It can be 100%, in which case it returns the results containing at least the given text.
Note that this approach does NOT consider the order of the words.
"query":{
"bool":{
"should":[
{
"match":{
"my_text":{
"query":"I want to buy a new new car",
"minimum_should_match":"90%"
}
}
}
]
}
}
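How a percentage translates into a required number of terms, and why word order does not matter, can be sketched with a toy matcher. This is a simplified model, not the actual Lucene implementation: it assumes the percentage is applied to the number of query terms and rounded down, and it uses a plain whitespace split instead of a real analyzer.

```python
def required_terms(term_count, percentage):
    """Number of query terms that must match, rounded down."""
    return (term_count * percentage) // 100

def matches(doc_text, query_text, percentage):
    """Toy matcher: counts query terms present in the document, ignoring order."""
    query_terms = query_text.lower().split()
    doc_terms = set(doc_text.lower().split())
    found = sum(1 for term in query_terms if term in doc_terms)
    return found >= required_terms(len(query_terms), percentage)

print(required_terms(8, 90))                        # 7.2 rounded down
print(matches("car new a buy", "buy a new car", 100))  # same words, different order
print(matches("buy a car", "buy a new car", 100))      # one word missing
print(matches("buy a car", "buy a new car", 50))       # half of the words is enough
```

The second call succeeds even though the words are shuffled, which is the caveat mentioned above; use a phrase query when order matters.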

Elasticsearch DSL query from an SQL statement

I'm new to Elasticsearch. I don't think I fully understand the concept of queries and filters. In my case I just want to use filters, as I don't want to use advanced features like scoring.
How would I convert the following SQL statement into elasticsearch query?
SELECT * FROM advertiser
WHERE company like '%com%'
AND sales_rep IN (1,2)
What I have so far:
curl -XGET 'localhost:9200/advertisers/advertiser/_search?pretty=true' -d '
{
"query" : {
"bool" : {
"must" : {
"wildcard" : { "company" : "*com*" }
}
}
},
"size":1000000
}'
How do I add the OR filters on the sales_rep field?
Thanks
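Add a "should" clause after your must clause. In a bool query with no must clause, one or more should clauses must match by default. Once a must clause is present, as it is here, should clauses become optional and only affect scoring, so you also need to set "minimum_should_match" to 1 (older releases call it "minimum_number_should_match"); you can set it to any number. Check out the bool query docs.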
For your case, this should work.
"should" : [
{
"term" : { "sales_rep_id" : "1" }
},
{
"term" : { "sales_rep_id" : "2" }
}
],
The same concept works for bool filters. Just change "query" to "filter". The bool filter docs are here.
I come across this post 4 years too late...
Anyways, perhaps the following code could be useful...
{
"query": {
"filtered": {
"query": {
"wildcard": {
"company": "*com*"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"sales_rep_id": [ "1", "2" ]
}
}
]
}
}
}
}
}
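On Elasticsearch 2.0 and later, where the filtered query is deprecated, the same SQL statement could be expressed with the filter clause of a bool query, which skips scoring. The sketch below only builds the request body; the helper name advertiser_query is made up, and the field name sales_rep_id is taken from the answers above rather than from the question's SQL.

```python
import json

def advertiser_query(company_fragment, sales_rep_ids):
    """Rough equivalent of: WHERE company LIKE '%x%' AND sales_rep IN (...)."""
    return {
        "query": {
            "bool": {
                # Every entry in "filter" must match (AND); no scores are computed.
                "filter": [
                    {"wildcard": {"company": "*" + company_fragment + "*"}},
                    # "terms" matches if the field equals any listed value (IN / OR).
                    {"terms": {"sales_rep_id": sales_rep_ids}},
                ]
            }
        }
    }

body = advertiser_query("com", [1, 2])
print(json.dumps(body, indent=2))
```

Note that the wildcard runs against the indexed terms of company, so on an analyzed field it matches individual lowercase tokens rather than the whole original string, and a leading wildcard can be slow on large indices.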
