Elasticsearch Sort By Epoch Milliseconds Timestamp

My ES documents have the structure below:
"hits" : [
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "566d9a9d-62d4-4dcd-b3f3-c0598638fa43",
"_score" : 1.0,
"_source" : {
"values" : {
"isActive" : "false",
"length" : 18.49,
"latitude" : 33.69076,
"accuracy" : 7
},
"metadata" : {
"name" : "866425030270849",
"type" : "BAT-M1",
"ts" : "1572493157000"
}
}
},
I want to sort the index by metadata.ts (a date field with format epoch_millis) to get the latest record, using the following query:
curl -X GET "https://localhost:9200/testindex/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"term" : { "metadata.name" : "866425030270849" }
},
"sort": [
{ "devicedata.metadata.ts": "desc" }
],
"size": 1
}
'
But the sort does not return the most recent record. Please help!
In the query, devicedata is the nested object that contains metadata.
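A minimal Python sketch of the intended behavior, assuming the timestamp lives at metadata.ts as in the sample document above (the query sorts on devicedata.metadata.ts, which is a different path). It builds the same search body with the sort pointed at metadata.ts and computes the "latest" record locally by comparing ts as a number. The second document is hypothetical, included only to show that comparing the raw strings picks the wrong record, which is what a string sort (for example, if ts were mapped as keyword rather than date) would do.

```python
import json

# Hypothetical sample _source documents; ts is epoch milliseconds stored as a string.
docs = [
    {"metadata": {"name": "866425030270849", "ts": "1572493157000"}},
    {"metadata": {"name": "866425030270849", "ts": "999999999999"}},
]


def latest_record_query(device_name, ts_field="metadata.ts"):
    """Search body returning the newest document for one device.

    ts_field is an assumption: it must match the real path of ts in the mapping.
    """
    return {
        "query": {"term": {"metadata.name": device_name}},
        "sort": [{ts_field: {"order": "desc"}}],
        "size": 1,
    }


def latest_locally(docs):
    """Reference answer: compare ts numerically, as a date/epoch_millis field would."""
    return max(docs, key=lambda d: int(d["metadata"]["ts"]))


# Comparing the raw strings character by character picks the wrong record.
latest_by_string = max(docs, key=lambda d: d["metadata"]["ts"])

print(json.dumps(latest_record_query("866425030270849"), indent=2))
print(latest_locally(docs)["metadata"]["ts"])  # 1572493157000
print(latest_by_string["metadata"]["ts"])      # 999999999999
```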

Related

Delete all documents whose _id starts with a number in Elasticsearch

What is the fastest way to get all _ids?
I need a query to delete all documents whose _id starts with a number in Elasticsearch.
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "_2432475",
"_score" : 1.0,
"_source" : {
"name" : "999",
"file" : null,
"age" : null
}
},
Your best bet is to first copy the internal _id into a doc-level field (let's call it internal_id):
POST myindex/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.internal_id = ctx._id",
"lang": "painless"
}
}
and then use a match_phrase_prefix query like so:
GET myindex/_search
{
"query": {
"match_phrase_prefix": {
"internal_id": "_24"
}
}
}
Once you know which IDs to remove, you can delete them by ID:
curl -X POST 'http://localhost:9200/myindex/_delete_by_query' \
-H 'Content-Type: application/json' \
-d '{
"query": {
"terms": {
"_id": [ "1", "2" ]
}
}
}'
Wildcard queries on _id are not supported in Elasticsearch. Either index a similar key explicitly into the document, or
update the documents using _update_by_query to add the _id value as a field.
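A minimal Python sketch of the client-side part, assuming you already have the list of _id values (collected however you like, for example by scrolling over the index). It keeps the IDs whose first character is an ASCII digit and turns them into _delete_by_query bodies that use the same terms query on _id as above, split into batches. The batch size of 1000 is an arbitrary choice. Note that under this rule the sample ID _2432475 would not match, since it starts with an underscore.

```python
import json
import string


def ids_starting_with_digit(ids):
    """Keep only IDs whose first character is an ASCII digit 0-9."""
    return [i for i in ids if i and i[0] in string.digits]


def delete_bodies(ids, batch_size=1000):
    """Yield _delete_by_query bodies, each a terms query on _id for one batch."""
    for start in range(0, len(ids), batch_size):
        yield {"query": {"terms": {"_id": ids[start:start + batch_size]}}}


# Hypothetical IDs for illustration.
ids = ["_2432475", "2432475", "abc", "9x", ""]
to_delete = ids_starting_with_digit(ids)
for body in delete_bodies(to_delete, batch_size=1):
    # Each body would be sent as: POST /myindex/_delete_by_query
    print(json.dumps(body))
```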

Find list of Distinct string Values stored in a field in ElasticSearch

I have stored my data in Elasticsearch as shown below. When I aggregate on company_name, it returns only the distinct words in the field, not the entire distinct phrases.
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "1234",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
}
},
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "5678",
"_score" : 1.0,
"_source" : {
"company_name" : "State Bank of India",
"user" : ""
}
},
{
"_index" : "test01",
"_type" : "whatever01",
"_id" : "8901",
"_score" : 1.0,
"_source" : {
"company_name" : "Kotak Mahindra Bank",
"user" : ""
}
}
I tried using a terms aggregation:
GET /test01/_search/
{
"aggs" : {
"genres":
{
"terms" :
{ "field": "company_name"}
}
}
}
I get the following output:
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10531,
"buckets" : [
{
"key" : "bank",
"doc_count" : 2818
},
{
"key" : "mahindra",
"doc_count" : 1641
},
{
"key" : "state",
"doc_count" : 1504
}]
}}
How to get the entire string in the field "company_name" with only distinct values as given below?
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10531,
"buckets" : [
{
"key" : "Kotak Mahindra Bank",
"doc_count" : 2818
},
{
"key" : "State Bank of India",
"doc_count" : 1641
}
]
}}
It appears that you've set "fielddata": "true" for your field company_name, which is of type text. This is not a good idea, as it can end up consuming a lot of heap space, as mentioned in this link.
Furthermore, the values of a text field are broken down into tokens and saved in the inverted index through a process called analysis. Enabling fielddata on a text field therefore makes the terms aggregation run on those individual tokens, which is exactly the behavior you describe in your question.
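A rough Python simulation of that difference, using the three sample documents from the question. The tokenizer here (lowercase, then split on word characters) is only an approximation of Elasticsearch's standard analyzer. Counting each token once per document mimics the doc_count of a terms aggregation on the analyzed text field, while counting whole values mimics the same aggregation on a keyword field.

```python
import re
from collections import Counter

# The three sample documents from the question.
docs = [
    {"company_name": "State Bank of India"},
    {"company_name": "State Bank of India"},
    {"company_name": "Kotak Mahindra Bank"},
]


def rough_standard_tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase, split on word chars."""
    return re.findall(r"\w+", text.lower())


def text_field_buckets(docs, field):
    """doc_count per token, like a terms agg on a text field with fielddata enabled."""
    counts = Counter()
    for doc in docs:
        # A document counts once per distinct token it contains.
        counts.update(set(rough_standard_tokens(doc[field])))
    return dict(counts)


def keyword_field_buckets(docs, field):
    """doc_count per whole value, like a terms agg on a keyword sub-field."""
    return dict(Counter(doc[field] for doc in docs))


print(text_field_buckets(docs, "company_name"))
print(keyword_field_buckets(docs, "company_name"))
```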
What you need to do instead is create a sibling sub-field of type keyword, as mentioned in this link, and run the aggregation on that field.
Basically, define the mapping for company_name as below:
Mapping:
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"company_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Run the below aggregation query on this company_name.keyword field and you'd get what you are looking for.
Query:
POST <your_index_name>/_search
{
"aggs": {
"unique_names": {
"terms": {
"field": "company_name.keyword",
"size": 10
}
}
}
}
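If you consume the aggregation from code, here is a minimal Python sketch for pulling the distinct values out of the response. The agg_name argument must match the name used in the request (unique_names in the query above, genres in the question's output). The sketch also flags a non-zero sum_other_doc_count, which means some documents fell into buckets that were not returned, so raising size would list more distinct values.

```python
def distinct_values(response, agg_name):
    """Return ([(key, doc_count), ...], truncated) for a terms aggregation."""
    agg = response["aggregations"][agg_name]
    pairs = [(bucket["key"], bucket["doc_count"]) for bucket in agg["buckets"]]
    # Documents counted in sum_other_doc_count belong to buckets not returned.
    truncated = agg.get("sum_other_doc_count", 0) > 0
    return pairs, truncated


# Shaped like the desired output shown in the question.
response = {
    "aggregations": {
        "genres": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 10531,
            "buckets": [
                {"key": "Kotak Mahindra Bank", "doc_count": 2818},
                {"key": "State Bank of India", "doc_count": 1641},
            ],
        }
    }
}

pairs, truncated = distinct_values(response, "genres")
print(pairs)
print(truncated)  # True
```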
Hope this helps!

ElasticSearch - How can I get all of a document's fields?

I'm trying to investigate an ElasticSearch index for which I have no documentation. Some of the documents in this index have parent-child relationships. So I issued:
curl -XGET 'http://localhost:9200/myindex/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"has_parent": {
"type": "entity",
"query": {
"term": {
"_id": "PROFILE_19986956"
}
}
}
}
}'
And got:
"hits" : {
"total" : 13,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myindex",
"_type" : "property",
"_id" : "PROFILE_19986956_name",
"_score" : 1.0
},
...
]
}
Now I want to get the value of the document with ID PROFILE_19986956_name so I do curl -XGET 'http://localhost:9200/myindex/property/PROFILE_19986956_name?routing=0&pretty' and get:
{
"_index" : "myindex",
"_type" : "property",
"_id" : "PROFILE_19986956_name",
"_version" : 3,
"found" : true
}
The response contains no value for the name, which I was expecting to get. I know it has to be there, because searching for the entity's name yields a result, but for some reason I can't get the field that contains the name. How can I get ES to show it?
Look at the mapping: I think the fields are indexed but the source is disabled. Try:
curl -XGET 'http://localhost:9200/myindex?pretty'
and see if the mapping has :
"_source": {
"enabled": false
}
If you see this, the source of the documents was not stored in Elasticsearch, so you can't retrieve it from there.
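A small Python helper for that check, assuming you have loaded the JSON returned by the mapping request above (e.g. with json.loads). It walks the whole response and reports the path of every "_source": {"enabled": false} it finds, so it works whether or not the mapping is nested under a type name. The sample response is hypothetical.

```python
import json


def find_disabled_source(node, path=()):
    """Yield the key path of every "_source": {"enabled": false} in the response."""
    if isinstance(node, dict):
        source = node.get("_source")
        if isinstance(source, dict) and source.get("enabled") is False:
            yield path
        for key, value in node.items():
            yield from find_disabled_source(value, path + (key,))


# Hypothetical response text, shaped like GET /myindex on an index with a "property" type.
raw = '''
{"myindex": {"mappings": {"property": {
    "_source": {"enabled": false},
    "properties": {"name": {"type": "string"}}
}}}}
'''
hits = list(find_disabled_source(json.loads(raw)))
print(hits)  # [('myindex', 'mappings', 'property')]
```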

ElasticSearch ignoring field named 'tags' when specified in "fields"

I have a search index, products, containing a field named tags, which is an array. The tags values appear in the results when I don't add a fields section to my query, but when I do, tags is ignored outright and doesn't appear in the results, as shown below.
$ curl -XPOST 'http://localhost:9200/products/_search?pretty' -d '{ "query": {"match_all": {} }, "fields": ["tags", "id", "slug"], "size": 2}'
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 321826,
"max_score" : 1.0,
"hits" : [ {
"_index" : "products",
"_type" : "products",
"_id" : "39969794",
"_score" : 1.0,
"fields" : {
"id" : [ "39969794" ],
"slug" : [ "slug-39969794" ]
}
}, {
"_index" : "products",
"_type" : "products",
"_id" : "21296413",
"_score" : 1.0,
"fields" : {
"id" : [ "21296413" ],
"slug" : [ "slug-21296413" ]
}
} ]
}
}
Is there a reason or known issue for this? Is tags some kind of reserved word for ElasticSearch?
I'm using ES version 1.1.2 (Lucene 4.7).
tags is not an ES reserved word. So that's not your problem.
Is your tags an array of atomic types (numbers, strings or booleans)? Or is it an array of objects?
fields only works with leaf nodes. So "fields": ["tags"] should work fine with an array of strings but it would fail with an array of tag objects.
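A rough local check of that rule in Python, to run against a _source document you already have (for instance from the query without fields). It approximates the rule stated above rather than reproducing Elasticsearch's own code: a value counts as a leaf if it is a scalar or an array of scalars. The sample values are hypothetical.

```python
# Scalar JSON types: string, number, boolean, null.
SCALARS = (str, int, float, bool, type(None))


def is_leaf_value(value):
    """True for a scalar or an array of scalars; False if any object is involved."""
    if isinstance(value, list):
        return all(isinstance(v, SCALARS) for v in value)
    return isinstance(value, SCALARS)


# Hypothetical tags values.
print(is_leaf_value(["red", "sale"]))         # True: array of strings
print(is_leaf_value([{"name": "red"}]))       # False: array of tag objects
```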
Confused as to why you are using "fields" instead of "terms"?
$ curl -XPOST 'http://localhost:9200/products/_search?pretty' -d '
{"query":
{
"match_all": {}
},
"terms": ["tags", "id", "slug"],
"size": 2}'

Elasticsearch faceting query

I was wondering if I can write a faceting query for something like this:
My document structure
UserID, AnswerID[] (int array)
1 , [9,10,11,56,78,99]
2 , [10,11,56,78,99]
3 , [8,10,12,56, 79,99]
4 , [9,10,11,56,78,99]
If I just want the count of users who answered both 9 and 56, I can write a query. But I have two lists:
List A - 9,10,11
ListB - 56,78,99
I want all combinations of values from the two lists:
Count of users who answered [9,56], [9,78], [9,99], [10,56], [10,78], [10,99], [11,56]...
How do I write a query to achieve something like this?
Any help is appreciated,
Thanks.
It can work when the fields hold single values rather than lists:
# Print ES Version
curl 'http://localhost:9200/'
# Delete the index `testindex`
curl -XDELETE 'http://localhost:9200/testindex'
# Create the index `testindex`
curl -XPUT 'http://localhost:9200/testindex' -d '{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
}'
# Wait for yellow
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
# Index docs
curl -XPUT http://localhost:9200/testindex/type/1 -d '{ "listA":"value1", "listB":"value2" }'
curl -XPUT http://localhost:9200/testindex/type/2 -d '{ "listA":"value1", "listB":"value3" }'
curl -XPUT http://localhost:9200/testindex/type/3 -d '{ "listA":"value1", "listB":"value2" }'
# Refresh docs
curl -XPOST 'http://localhost:9200/testindex/_refresh'
# TermFacet
curl -XPOST 'http://localhost:9200/testindex/type/_search?pretty' -d '
{
"query": { "match_all" : {} },
"facets" : {
"tag" : {
"terms" : {
"script_field" : "_source.listA + \" - \" + _source.listB"
}
}
}
}'
Gives:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "testindex",
"_type" : "type",
"_id" : "1",
"_score" : 1.0, "_source" : { "listA":"value1", "listB":"value2" }
}, {
"_index" : "testindex",
"_type" : "type",
"_id" : "2",
"_score" : 1.0, "_source" : { "listA":"value1", "listB":"value3" }
}, {
"_index" : "testindex",
"_type" : "type",
"_id" : "3",
"_score" : 1.0, "_source" : { "listA":"value1", "listB":"value2" }
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 0,
"other" : -3,
"terms" : [ {
"term" : "value1 - value2",
"count" : 2
}, {
"term" : "value1 - value3",
"count" : 1
} ]
}
}
}
But I don't know how to do it with lists. I would love to know if it can be done.
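As a reference for what the query should return, here is a local Python computation (not an Elasticsearch query). It generalizes the answer's listA + " - " + listB concatenation to multi-valued fields by emitting one "a - b" term per combination, counting each document at most once per pair. Whether a facet script can emit several terms per document in your ES version is not something this page confirms; the sketch only shows the expected counts for the question's data.

```python
from collections import Counter
from itertools import product


def as_list(value):
    """Treat a scalar field value as a one-element list."""
    return value if isinstance(value, list) else [value]


def pair_counts(docs, field_a, field_b, allowed_a=None, allowed_b=None):
    """Count docs per "a - b" pair, restricted to the allowed values if given."""
    counts = Counter()
    for doc in docs:
        a_vals = [v for v in as_list(doc[field_a]) if allowed_a is None or v in allowed_a]
        b_vals = [v for v in as_list(doc[field_b]) if allowed_b is None or v in allowed_b]
        # A set, so a document counts at most once per pair (like doc_count).
        counts.update({f"{a} - {b}" for a, b in product(a_vals, b_vals)})
    return dict(counts)


# Single-valued docs from the answer's example.
scalar_docs = [
    {"listA": "value1", "listB": "value2"},
    {"listA": "value1", "listB": "value3"},
    {"listA": "value1", "listB": "value2"},
]
print(pair_counts(scalar_docs, "listA", "listB"))

# The users and answers from the question; both lists filter the same field.
users = [
    {"answers": [9, 10, 11, 56, 78, 99]},
    {"answers": [10, 11, 56, 78, 99]},
    {"answers": [8, 10, 12, 56, 79, 99]},
    {"answers": [9, 10, 11, 56, 78, 99]},
]
print(pair_counts(users, "answers", "answers", [9, 10, 11], [56, 78, 99]))
```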