Search query for Elasticsearch when a child element is an array of strings

I created documents in Elasticsearch in the following format:
curl -XPUT "http://localhost:9200/my_base.main_candidate/" -d'
{
  "specific_location": {
    "location_name": "Mumbai",
    "location_tags": [
      "Mumbai"
    ],
    "tags": [
      "Mumbai"
    ]
  }
}'
My requirement is to search for location_tags containing one of the given options like ["Mumbai", "Pune"]. How do I do this?
I tried:
curl -XGET "http://localhost:9200/my_base.main_candidate/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "specific_location.location_tags": ["Mumbai"]
        }
      }
    }
  }
}'
which didn't work. I got this output:
{
  "took": 72,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

There are several ways to solve this. Perhaps the most immediate one is to search for mumbai instead of Mumbai.
If I create the index with no mapping,
curl -XDELETE "http://localhost:9200/my_base.main_candidate/"
curl -XPUT "http://localhost:9200/my_base.main_candidate/"
then add a doc:
curl -XPUT "http://localhost:9200/my_base.main_candidate/doc/1" -d'
{
  "specific_location": {
    "location_name": "Mumbai",
    "location_tags": [
      "Mumbai"
    ],
    "tags": [
      "Mumbai"
    ]
  }
}'
then run your query with the lower-case term:
curl -XPOST "http://localhost:9200/my_base.main_candidate/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "specific_location.location_tags": [
            "mumbai"
          ]
        }
      }
    }
  }
}'
I get back the expected doc:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_base.main_candidate",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "specific_location": {
            "location_name": "Mumbai",
            "location_tags": [
              "Mumbai"
            ],
            "tags": [
              "Mumbai"
            ]
          }
        }
      }
    ]
  }
}
This is because, since no explicit mapping was used, Elasticsearch applies its defaults: the location_tags field is analyzed with the standard analyzer, which lower-cases terms. So the term Mumbai does not exist in the index, but mumbai does.
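You can check what the standard analyzer does to a value with the analyze API (a quick verification, assuming the same local node):
curl -XGET "http://localhost:9200/_analyze?analyzer=standard&text=Mumbai&pretty"
The response lists a single token, mumbai, which is what is actually stored in the index.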
If you want to be able to use upper-case terms in your query, you will need to set up an explicit mapping that tells Elasticsearch not to analyze the location_tags field. Maybe something like this:
curl -XDELETE "http://localhost:9200/my_base.main_candidate/"
curl -XPUT "http://localhost:9200/my_base.main_candidate/" -d'
{
  "mappings": {
    "doc": {
      "properties": {
        "specific_location": {
          "properties": {
            "location_tags": {
              "type": "string",
              "index": "not_analyzed"
            },
            "tags": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'
curl -XPUT "http://localhost:9200/my_base.main_candidate/doc/1" -d'
{
  "specific_location": {
    "location_name": "Mumbai",
    "location_tags": [
      "Mumbai"
    ],
    "tags": [
      "Mumbai"
    ]
  }
}'
curl -XPOST "http://localhost:9200/my_base.main_candidate/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "specific_location.location_tags": [
            "Mumbai"
          ]
        }
      }
    }
  }
}'
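Since a terms filter matches documents whose field contains any one of the supplied values, your original requirement (one of ["Mumbai", "Pune"]) now just means listing both terms. A sketch against the same not_analyzed index:
curl -XPOST "http://localhost:9200/my_base.main_candidate/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "specific_location.location_tags": [
            "Mumbai",
            "Pune"
          ]
        }
      }
    }
  }
}'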
Here is all the above code in a handy place:
http://sense.qbox.io/gist/74844f4d779f7c2b94a9ab65fd76eb0ffe294cbb
[EDIT: by the way, I used Elasticsearch 1.3.4 when testing the above code]

Related

Elasticsearch query to search with mm-yyyy format on date field

I want to query Elasticsearch with a month, like 03-2015, on a date field that is in yyyy-MM-dd format.
I tried the following, but it didn't work. It gives no error; it just returns 0 records:
curl -XPOST "http://localhost:9200/myindex/mytype/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "deliverydate": {
              "gte": "03-2015",
              "lte": "03-2015",
              "format": "mm-yyyy"
            }
          }
        }
      ]
    }
  }
}
'
Here is a sample document, as it appears in a search response:
{
  "took": 38,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "mytype",
        "_id": "33924",
        "_score": 1,
        "_source": {
          "id": 33924,
          "deliverydate": "2015-03-14",
          "name": "New test order"
        }
      }
    ]
  }
}
Can anyone please help me with this? Is this a valid search on Elasticsearch data?
Your format is not correct: in date formats, mm means minutes and MM means months, so it should be
curl -XPOST "http://localhost:9200/myindex/mytype/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "deliverydate": {
              "gte": "03-2015",
              "lte": "04-2015",
              "format": "MM-yyyy"
            }
          }
        }
      ]
    }
  }
}'
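One refinement worth noting: with lte, a document stamped exactly 2015-04-01T00:00:00 would also match. If you want March only, you could bound the upper end with lt instead (a sketch, under the same mapping assumptions):
curl -XPOST "http://localhost:9200/myindex/mytype/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "deliverydate": {
              "gte": "03-2015",
              "lt": "04-2015",
              "format": "MM-yyyy"
            }
          }
        }
      ]
    }
  }
}'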

ElasticSearch - Search fails for 2 string fields

I am running into a search problem and need some help. I have an article index with id, title, artist, and genre fields. When I run this query I get zero results:
POST /d3acampaign/article/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {"genre": "metal"}
      },
      "filter": {
        "term": {"artist": "Somanath"}
      }
    }
  }
}
But if I change the query to something like this:
POST /d3acampaign/article/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {"genre": "metal"}
      },
      "filter": {
        "term": {"id": "7"}
      }
    }
  }
}
I get the following result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "d3acampaign",
        "_type": "article",
        "_id": "7",
        "_score": 1.4054651,
        "_source": {
          "id": "7",
          "title": "The Last Airbender",
          "artist": "Somanath",
          "genre": "metal"
        }
      }
    ]
  }
}
Clarification: I notice the search fails whenever I filter on a string field such as artist or title.
The reason you get empty hits is that a term query matches the exact term stored in the index, including its case, while the default analyzer indexes terms in lowercase.
There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
The solution:
{
  "query": {
    "filtered": {
      "query": {
        "match": {"genre": "metal"}
      },
      "filter": {
        "term": {"artist": "somanath"} // to lower case
      }
    }
  }
}
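Alternatively, you can keep the original capitalization by using a match query for the artist: a match query analyzes its input with the field's analyzer, so Somanath is lower-cased before the lookup. A sketch, not verified against your exact index:
POST /d3acampaign/article/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "genre": "metal" } },
            { "match": { "artist": "Somanath" } }
          ]
        }
      }
    }
  }
}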
The other solution is to change your index mapping so the field is not_analyzed.
The full example:
#!/bin/bash
curl -XDELETE "localhost:9200/testindex"
curl -XPUT "localhost:9200/testindex/?pretty" -d '
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "index": "not_analyzed",
          "type": "string"
        }
      }
    }
  }
}'
curl -XGET "localhost:9200/testindex/_mapping?pretty"
curl -XPOST "localhost:9200/testindex/test/1" -d '{
  "name": "Jack"
}'
sleep 1
echo -e
echo -e
echo -e
echo -e "Filtered Query Search in not_analyzed index:"
echo -e
curl -XGET "localhost:9200/testindex/test/_search?pretty" -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {"name": "Jack"}
      }
    }
  }
}'

Elasticsearch return unique values for a field

I am trying to build an Elasticsearch query that will return only unique values for a particular field.
I do not want to return all the values for that field nor count them.
For example, suppose the field currently contains 50 different values, and I do a search that returns only 20 hits (size=20). I want each of the 20 results to have a unique value for that field, but I don't care about the 30 other values not represented in the result.
For example with the following search (pseudo code - not checked):
{
  from: 0,
  size: 20,
  query: {
    bool: {
      must: {
        range: { field1: { gte: 50 }},
        term: { field2: 'salt' },
        /**
         * I want to return only unique values for "field3", but I
         * don't want to return all of them or count them.
         *
         * How do I specify this in my query?
         **/
        unique: 'field3',
      },
      mustnot: {
        match: { field4: 'pepper'},
      }
    }
  }
}
You should be able to do this pretty easily with a terms aggregation.
Here's an example. I defined a simple index, containing a field that has "index": "not_analyzed" so we can get the full text of each field as a unique value, rather than terms generated from tokenizing it, etc.
DELETE /test_index
PUT /test_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Then I add a few docs with the bulk API.
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"title":"first doc"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"title":"second doc"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"title":"third doc"}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"title":"third doc"}
Now we can run our terms aggregation:
POST /test_index/_search?search_type=count
{
  "aggs": {
    "unique_vals": {
      "terms": {
        "field": "title"
      }
    }
  }
}
...
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_vals": {
      "buckets": [
        {
          "key": "third doc",
          "doc_count": 2
        },
        {
          "key": "first doc",
          "doc_count": 1
        },
        {
          "key": "second doc",
          "doc_count": 1
        }
      ]
    }
  }
}
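As a side note, search_type=count was later deprecated in favor of setting "size": 0 in the request body. And if you want a representative document for each unique value rather than just the bucket keys, you could nest a top_hits aggregation (available since ES 1.3) under the terms aggregation; a sketch against the same test_index:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "unique_vals": {
      "terms": {
        "field": "title",
        "size": 20
      },
      "aggs": {
        "first_doc": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
Each bucket then carries one full hit, which approximates the "20 unique results" behavior asked about in the question.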
I'm very surprised a filter aggregation hasn't been suggested. It goes back all the way to ES version 1.3.
The filter aggregation is similar to a regular filter query, but it can be nested into an aggregation chain to screen out documents that don't meet particular criteria, giving you sub-aggregation results based only on the documents that do.
First, we'll put our mapping.
curl --request PUT \
  --url http://localhost:9200/items \
  --header 'content-type: application/json' \
  --data '{
    "mappings": {
      "item": {
        "properties": {
          "field1": { "type": "integer" },
          "field2": { "type": "keyword" },
          "field3": { "type": "keyword" },
          "field4": { "type": "keyword" }
        }
      }
    }
  }
  '
Then let's load some data.
curl --request PUT \
--url http://localhost:9200/items/_bulk \
--header 'content-type: application/json' \
--data '{"index":{"_index":"items","_type":"item","_id":1}}
{"field1":50, "field2":["salt", "vinegar"], "field3":["garlic", "onion"], "field4":"paprika"}
{"index":{"_index":"items","_type":"item","_id":2}}
{"field1":40, "field2":["salt", "pepper"], "field3":["onion"]}
{"index":{"_index":"items","_type":"item","_id":3}}
{"field1":100, "field2":["salt", "vinegar"], "field3":["garlic", "chives"], "field4":"pepper"}
{"index":{"_index":"items","_type":"item","_id":4}}
{"field1":90, "field2":["vinegar"], "field3":["chives", "garlic"]}
{"index":{"_index":"items","_type":"item","_id":5}}
{"field1":900, "field2":["salt", "vinegar"], "field3":["garlic", "chives"], "field4":"paprika"}
'
Notice that only the documents with ids 1 and 5 pass the criteria, so we are left to aggregate over those two field3 arrays, four values in total: ["garlic", "chives"] and ["garlic", "onion"]. Also notice that field3 can be an array or a single value in the data, but I'm making them arrays here to illustrate how the counts work.
curl --request POST \
  --url http://localhost:9200/items/item/_search \
  --header 'content-type: application/json' \
  --data '{
    "size": 0,
    "aggregations": {
      "top_filter_agg": {
        "filter": {
          "bool": {
            "must": [
              {
                "range": { "field1": { "gte": 50 } }
              },
              {
                "term": { "field2": "salt" }
              }
            ],
            "must_not": [
              {
                "term": { "field4": "pepper" }
              }
            ]
          }
        },
        "aggs": {
          "field3_terms_agg": { "terms": { "field": "field3" } }
        }
      }
    }
  }
  '
After running the combined filter/terms aggregation, we have a count of 4 terms on field3 and three unique terms altogether.
{
  "took": 46,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "top_filter_agg": {
      "doc_count": 2,
      "field3_terms_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "garlic",
            "doc_count": 2
          },
          {
            "key": "chives",
            "doc_count": 1
          },
          {
            "key": "onion",
            "doc_count": 1
          }
        ]
      }
    }
  }
}
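An equivalent formulation moves the bool out of the filter aggregation and into the query itself. Aggregations run in the scope of the query, so a top-level terms aggregation sees only the matching documents (a sketch, same index as above):
curl --request POST \
  --url http://localhost:9200/items/item/_search \
  --header 'content-type: application/json' \
  --data '{
    "size": 0,
    "query": {
      "bool": {
        "must": [
          { "range": { "field1": { "gte": 50 } } },
          { "term": { "field2": "salt" } }
        ],
        "must_not": [
          { "term": { "field4": "pepper" } }
        ]
      }
    },
    "aggregations": {
      "field3_terms_agg": { "terms": { "field": "field3" } }
    }
  }
  '
The filter aggregation earns its keep when you also need other aggregations over the unfiltered document set in the same request.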

Elasticsearch array must and must_not

I have documents looking like this in my Elasticsearch database:
{
  "tags" => [
    "tag-1",
    "tag-2",
    "tag-3",
    "tag-A"
  ],
  "created_at" => "2013-07-02 12:42:19 UTC",
  "label" => "Mon super label"
}
I would like to be able to filter my documents with these criteria:
The document's tags array must have tag-1, tag-2, and tag-3 but must not have tag-A.
I tried to use a bool filter but I can't manage to make it work!
Here is a method that seems to accomplish what you want: http://sense.qbox.io/gist/4dd806936f12a9668d61ce63f39cb2c284512443
First I created an index with an explicit mapping. I did this so I could set the "tags" property to "index": "not_analyzed". This means that the text will not be modified in any way, which will simplify the querying process for this example.
curl -XPUT "http://localhost:9200/test_index" -d'
{
  "mappings": {
    "docs": {
      "properties": {
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "label": {
          "type": "string"
        }
      }
    }
  }
}'
and then add some docs:
curl -XPUT "http://localhost:9200/test_index/docs/1" -d'
{
  "tags": [
    "tag-1",
    "tag-2",
    "tag-3",
    "tag-A"
  ],
  "label": "item 1"
}'
curl -XPUT "http://localhost:9200/test_index/docs/2" -d'
{
  "tags": [
    "tag-1",
    "tag-2",
    "tag-3"
  ],
  "label": "item 2"
}'
curl -XPUT "http://localhost:9200/test_index/docs/3" -d'
{
  "tags": [
    "tag-1",
    "tag-2"
  ],
  "label": "item 3"
}'
Then we can query using must and must_not clauses in a bool filter as follows:
curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "tags": [
                  "tag-1",
                  "tag-2",
                  "tag-3"
                ],
                "execution": "and"
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "tags": "tag-A"
              }
            }
          ]
        }
      }
    }
  }
}'
which yields the correct result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "docs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "tags": [
            "tag-1",
            "tag-2",
            "tag-3"
          ],
          "label": "item 2"
        }
      }
    ]
  }
}
Notice the "execution" : "and" parameter in the terms filter in the must clause. This means only docs that have all the "tags" specified will be returned (rather than those that match one or more). That may have been what you were missing. You can read more about the options in the ES docs.
I made a runnable example here that you can play with, if you have ES installed and running at localhost:9200, or you can provide your own endpoint.
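If you would rather avoid the execution parameter (it was removed in later Elasticsearch versions), an equivalent formulation uses one term filter per required tag inside the must clause. A sketch against the same test_index:
curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "tag-1" } },
            { "term": { "tags": "tag-2" } },
            { "term": { "tags": "tag-3" } }
          ],
          "must_not": [
            { "term": { "tags": "tag-A" } }
          ]
        }
      }
    }
  }
}'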

Return the most recent record from ElasticSearch index

I would like to return the most recent record (top 1) from an Elasticsearch index, similar to the SQL query below:
SELECT TOP 1 Id, name, title
FROM MyTable
ORDER BY Date DESC;
Can this be done?
Do you have _timestamp enabled in your doc mapping?
{
  "doctype": {
    "_timestamp": {
      "enabled": "true",
      "store": "yes"
    },
    "properties": {
      ...
    }
  }
}
You can check your mapping here:
http://localhost:9200/_all/_mapping
If so, I think this might work to get the most recent:
{
  "query": {
    "match_all": {}
  },
  "size": 1,
  "sort": [
    {
      "_timestamp": {
        "order": "desc"
      }
    }
  ]
}
For information, _timestamp has been deprecated since 2.0.0-beta2.
Use the date type in your mapping instead. A simple date mapping, from the date datatype docs:
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}
You can also add a format field to the date:
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
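With that mapping, any of the declared patterns is accepted at index time, for example (a sketch, with a hypothetical my_index):
curl -XPOST "http://localhost:9200/my_index/my_type" -d '{
  "date": "2015-01-01 12:10:30"
}'
curl -XPOST "http://localhost:9200/my_index/my_type" -d '{
  "date": "2015-01-01"
}'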
Get the last ID by date (without using _timestamp):
Sample URL: http://localhost:9200/deal/dealsdetails/_search
Method: POST
Query:
{
  "fields": ["_id"],
  "sort": [
    {
      "created_date": {
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "size": 1
}
result:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": null,
    "hits": [{
      "_index": "deal",
      "_type": "dealsdetails",
      "_id": "10",
      "_score": 1,
      "sort": [
        1478266145174,
        1
      ]
    }]
  }
}
You can sort on a date field and use the size=1 parameter. Does that help?
If you are using the Python elasticsearch5 module or curl, make sure each document that gets inserted has a timestamp field of a date type, and that the timestamp value increases with each document. From Python you would do:
import elasticsearch5

# connect to the cluster (host and port are placeholders)
es = elasticsearch5.Elasticsearch('my_host:my_port')

# ask for a single hit, sorted newest-first on the timestamp field
es.search(
    index='my_index',
    size=1,
    sort='my_timestamp:desc'
)
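The curl equivalent uses the same field:direction syntax as a query-string parameter (same hypothetical index and field):
curl "localhost:9200/my_index/_search?size=1&sort=my_timestamp:desc"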
If your documents are not inserted with any field that is of type datetime, then I don't believe you can get the N "most recent".
Since this question was originally asked and answered, some of the inner workings of Elasticsearch have changed, particularly around timestamps. Here is a full example showing how to query for the single latest record. Tested on ES 6/7.
1) Tell Elasticsearch to treat timestamp field as the timestamp
curl -XPUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d '{"mappings":{"message":{"properties":{"timestamp":{"type":"date"}}}}}'
2) Put some test data into the index
curl -XPOST "localhost:9200/my_index/message/1" -H 'Content-Type: application/json' -d '{ "timestamp" : "2019-08-02T03:00:00Z", "message" : "hello world" }'
curl -XPOST "localhost:9200/my_index/message/2" -H 'Content-Type: application/json' -d '{ "timestamp" : "2019-08-02T04:00:00Z", "message" : "bye world" }'
3) Query for the latest record
curl -X POST "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"size": 1,"sort": [{"timestamp": {"order": "desc"}}]}'
4) Expected results
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "message",
        "_id": "2",
        "_score": null,
        "_source": {
          "timestamp": "2019-08-02T04:00:00Z",
          "message": "bye world"
        },
        "sort": [
          1564718400000
        ]
      }
    ]
  }
}
I used @timestamp instead of _timestamp:
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "sort": [{"@timestamp": {"order": "desc"}}]
}
_timestamp didn't work out for me, but this query does (as in mconlin's answer):
{
  "query": {
    "match_all": {}
  },
  "size": "1",
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
It could be trivial, but note that the _timestamp answer gave no error, just no good result either...
Hope this helps someone...
(kibana/elastic 5.0.4)
S.
