Why can I not see the _timestamp field while being able to filter a query by it?
The following query returns the correct documents, but not the timestamp itself. How can I return the timestamp?
{
"fields": [
"_timestamp",
"_source"
],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"_timestamp": {
"from": "2013-01-01"
}
}
}
}
}
}
The mapping is:
{
"my_doctype": {
"_timestamp": {
"enabled": "true"
},
"properties": {
"cards": {
"type": "integer"
}
}
}
}
Sample output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "HjfryYQEQL6RkEX3VOiBHQ",
"_score" : 1.0, "_source" : {"cards": "5"}
}, {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "sDyHcT1BTMatjmUS0NSoEg",
"_score" : 1.0, "_source" : {"cards": "2"}
}]
}
}
When the _timestamp field is enabled, it is indexed but not stored by default. So, while you can search and filter by _timestamp, you cannot retrieve it with your documents. To be able to retrieve it, you need to recreate your index with the following mapping:
{
"my_doctype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
},
"properties": {
...
}
}
}
This way you will be able to retrieve timestamp as the number of milliseconds since the epoch.
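If you then want a readable date on the client side, the millisecond value can be converted in the application. A minimal Python sketch (the function name and the sample value are only for illustration, not something Elasticsearch returns by that name):

```python
from datetime import datetime, timedelta, timezone

# The UNIX epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(millis):
    """Convert milliseconds since the UNIX epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(millis))


# Illustrative value: midnight UTC on 2013-01-01, expressed in milliseconds.
print(timestamp_to_datetime(1356998400000).isoformat())
# prints 2013-01-01T00:00:00+00:00
```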
It is not necessary to store the timestamp field, since its exact value is preserved as a term, which is also more likely to already be in RAM, especially if you are querying on it. You can access the timestamp via its term using a script field:
{
"query": {
...
},
"script_fields": {
"timestamp": {
"script": "_doc['_timestamp'].value"
}
}
}
The resulting value is expressed in milliseconds since the UNIX epoch. It's quite obscene that Elasticsearch can't do this for you, but hey, nothing's perfect.
Let's say I have an indexed document like this:
{
  "id": 6,
  "name": "some name",
  "users": [
    {
      "id": 1,
      "name": "User1",
      "isEnabled": false
    },
    {
      "id": 2,
      "name": "User2",
      "isEnabled": false
    },
    {
      "id": 3,
      "name": "User3",
      "isEnabled": true
    }
  ]
}
What I need is to return that document when a user searches for the name "some name", but I also want to filter out all users that are not enabled, and to omit the document entirely if it has no enabled users.
I tried to use filters like this:
{
"query": {
"bool": {
"must": {
"match": {
"name": "some name"
}
},
"filter": {
"term": {
"users.isEnabled": true
}
}
}
}
}
But in that case I get the document with all of its users, whether they are enabled or not. Is there a way to do this? I could filter the users in code after getting the data from Elasticsearch, but that can break pagination if I remove documents without enabled users from the result set.
I'm a bit new to Elasticsearch, and so far I can't find how to do it. Thank you in advance!
Elasticsearch returns the whole document if any part of it matches. If you update your mapping and make the users array nested, you can achieve this by using inner hits. This is a basic example mapping that works:
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"users": {
"type": "nested"
}
}
}
}
If you then send a query like the following, the response will contain id and name from the parent document, plus an inner_hits section with the users that match the isEnabled condition.
{
"_source": ["id", "name"],
"query": {
"bool": {
"must": [
{
"match": {
"name": "some name"
}
},
{
"nested": {
"path": "users",
"query": {
"term": {
"users.isEnabled": {
"value": true
}
}
},
"inner_hits": {}
}
}
]
}
}
}
This is an example response:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9375811,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9375811,
"_source" : {
"name" : "some name",
"id" : 6
},
"inner_hits" : {
"users" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.540445,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "users",
"offset" : 2
},
"_score" : 1.540445,
"_source" : {
"id" : 3,
"name" : "User3",
"isEnabled" : true
}
}
]
}
}
}
}
]
}
}
You can then map the inner hits back onto the parent documents in the application.
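That mapping step could look roughly like this in Python. It is only a sketch that assumes the response has the shape shown above; the function name and the trimmed sample dict are made up for illustration:

```python
def enabled_users_by_document(response, inner_hits_name="users"):
    """Merge each parent hit's _source with the nested users found in inner_hits."""
    results = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        # Each inner hit carries the matching nested object in its own _source.
        users = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
        results.append({**hit["_source"], "users": users})
    return results


# Trimmed-down copy of the example response, keeping only the keys used here.
sample = {
    "hits": {
        "hits": [
            {
                "_source": {"name": "some name", "id": 6},
                "inner_hits": {
                    "users": {
                        "hits": {
                            "hits": [
                                {"_source": {"id": 3, "name": "User3", "isEnabled": True}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(enabled_users_by_document(sample))
```

Keep in mind that inner_hits only returns the top three nested hits by default, so you may need to raise its size option if a document can have more enabled users than that.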
I have an Elasticsearch 6.8.7 cluster.
I have a field with this mapping:
"event_object": { "enabled": false, "type": "object" }
I want to search for records that match certain other criteria and also have a particular value for a particular field in this object.
So far, I have tried variations of a normal search for the indexed fields combined with a filter script for the unindexed ones:
GET /my_index/_search
{
"query":{
"bool":{
"must":{
"query_string": {
"query": "foo:bar"
}
},
"filter": {
"script": {
"script": {
"source": "doc[\"event_object\"][\"state\"].value == \"R\""
}
}
}
}
},
"terminate_after":1000,
"from":0,
"size":1000
}
This is a hodgepodge I pieced together by trial and error from Google searches, but I can't get it to even compile, let alone run and filter.
It is not possible to access the content of JSON objects that have enabled: false. From the official documentation:
Elasticsearch skips parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way
So even scripting will not help here.
However, there's one way to access this disabled data from scripting, in a terms aggregation (using the include parameter and a top_hits sub-aggregation):
POST test/_search
{
"query": {
"match_all": {}
},
"aggs": {
"state": {
"terms": {
"script": "params._source.event_object.state",
"size": 100,
"include": "R"
},
"aggs": {
"hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
And you'd get a response like this one:
"aggregations" : {
"state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "R",
"doc_count" : 1,
"hits" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"event_object" : {
"state" : "R"
},
"test" : "hello"
}
}
]
}
}
}
]
}
}
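On the client side, the matching documents then have to be pulled out of the bucket rather than out of the usual top-level hits. A rough Python sketch, assuming the aggregation names "state" and "hits" used in the request above (the helper name is hypothetical):

```python
def documents_for_key(aggregations, agg_name, key, top_hits_name="hits"):
    """Return the _source of every top_hits document in the bucket with the given key."""
    for bucket in aggregations[agg_name]["buckets"]:
        if bucket["key"] == key:
            return [h["_source"] for h in bucket[top_hits_name]["hits"]["hits"]]
    # No bucket with that key means no document had that value.
    return []


# Trimmed-down copy of the example "aggregations" section.
aggregations = {
    "state": {
        "buckets": [
            {
                "key": "R",
                "doc_count": 1,
                "hits": {
                    "hits": {
                        "hits": [
                            {"_source": {"event_object": {"state": "R"}, "test": "hello"}}
                        ]
                    }
                },
            }
        ]
    }
}

print(documents_for_key(aggregations, "state", "R"))
```

Note that top_hits only returns as many documents per bucket as its size allows (10 in the request above), so this is not a replacement for regular paging.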
I couldn't find an answer on how to do a simple thing in Elasticsearch 6.8: I need to filter nested objects.
Index
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {
"human": {
"properties": {
"cats": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"breed": {
"type": "text"
},
"colors": {
"type": "integer"
}
}
},
"name": {
"type": "text"
}
}
}
}
}
Data
{
"name": "iridakos",
"cats": [
{
"colors": 1,
"name": "Irida",
"breed": "European Shorthair"
},
{
"colors": 2,
"name": "Phoebe",
"breed": "european"
},
{
"colors": 3,
"name": "Nino",
"breed": "Aegean"
}
]
}
I want to select the human with name="iridakos" together with the cats whose breed contains 'European' (ignoring case).
Only two cats should be returned.
A million thanks for your help.
For nested datatypes, you need to use nested queries.
Elasticsearch always returns the entire document in the response. Note that the nested datatype means that every item in the list is treated as a separate document in itself.
Hence, if you also want to know exactly which nested items matched, in addition to getting the entire document back, you need to use the inner_hits feature.
The query below should help you.
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "iridakos"
}
},
{
"nested": {
"path": "cats",
"query": {
"match": {
"cats.breed": "european"
}
},
"inner_hits": {}
}
}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.74455214,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1", <--- The document that hit
"_score" : 0.74455214,
"_source" : {
"name" : "iridakos",
"cats" : [
{
"colors" : 1,
"name" : "Irida",
"breed" : "European Shorthair"
},
{
"colors" : 2,
"name" : "Phoebe",
"breed" : "european"
},
{
"colors" : 3,
"name" : "Nino",
"breed" : "Aegean"
}
]
},
"inner_hits" : { <---- Note this
"cats" : {
"hits" : {
"total" : {
"value" : 2, <---- Count of nested doc hits
"relation" : "eq"
},
"max_score" : 0.52354836,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 1
},
"_score" : 0.52354836,
"_source" : { <---- First Nested Document
"breed" : "european"
}
},
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 0
},
"_score" : 0.39019167,
"_source" : { <---- Second Document
"breed" : "European Shorthair"
}
}
]
}
}
}
}
]
}
}
Note the inner_hits section in the response: that is where you will find the exact nested documents that matched.
Hope this helps!
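To illustrate why the nested type matters, here is a small Python simulation. It is a simplified model, not how Elasticsearch is implemented: it ignores text analysis and only mimics the fact that a plain object array is indexed as one multi-valued list per sub-field, while nested items are matched one at a time. All function names are made up for the example:

```python
def flatten(items, prefix):
    """Mimic a plain object array: one multi-valued list per sub-field."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def matches_as_object(items, prefix, conditions):
    """Each condition may be satisfied by a different item of the array."""
    flat = flatten(items, prefix)
    return all(
        value in flat.get(f"{prefix}.{key}", [])
        for key, value in conditions.items()
    )


def matches_as_nested(items, conditions):
    """All conditions must hold within one and the same item."""
    return any(
        all(item.get(key) == value for key, value in conditions.items())
        for item in items
    )


cats = [
    {"colors": 1, "name": "Irida"},
    {"colors": 2, "name": "Phoebe"},
    {"colors": 3, "name": "Nino"},
]
wanted = {"name": "Nino", "colors": 1}  # no single cat has both values

print(matches_as_object(cats, "cats", wanted))  # prints True
print(matches_as_nested(cats, wanted))          # prints False
```

The flattened model matches because "Nino" and 1 both occur somewhere in the array, even though they belong to different cats. That cross-item matching is what the nested type and the nested query avoid.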
You could use something like this:
{
"query": {
"bool": {
"must": [
{ "match": { "name": "iridakos" }},
{ "match": { "cats.breed": "European" }}
]
}
}
}
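To search on a cat's breed, you can use dot notation. Note that this only works when cats is mapped as a plain object; with the nested mapping from the question, cats.breed has to be queried through a nested query.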
I am executing a simple range query, but an empty result is returned, even though I know that there are many records/documents that satisfy the query.
Below are the 3 queries I have tried
(the third one is the intended query).
1)
"query": {
"range" : {
"endTime" : {
"gte" : 1559076400.0
}
}
}
2)
"query": {
"bool": {
"must": [
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
3)
"query": {
"bool": {
"filter": [
{"range" : {
"startTime" : {
"gt" : 1356873300.0
}
}
},
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
All 3 queries return an empty response.
Hope you people can help. Thank you.
Before inserting data into an Elasticsearch index, you need to define the field mappings as date or numeric types so that range searches can be applied.
Alternatively, keep dynamic mapping on so that Elasticsearch can identify the field types automatically from the inserted data.
In the latter case, do check the auto-generated mappings on your index.
Also check the date/timestamp format.
Steps to check mappings
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html
Since you are using epoch time, you need to state that in the mapping. This is what I did. Basically, both the mapping and the way the data is stored matter here. I am not sure whether data can be stored in one format and queried in another; I will do some more research and update the answer if that turns out to be possible.
1) Created the mapping, to show how the endTime mapping is done
2) Inserted a few sample documents
3) Queried the documents using epoch time, the way you wanted
Mapping
PUT so_test24
{
"mappings" : {
"_doc" : {
"properties" : {
"id" : {
"type" : "long"
},
"endTime" : {
"type" : "date",
"format": "epoch_millis"
}
}
}
}
}
Inserting the documents
POST /so_test24/_doc
{
"id": 1,
"endTime": "1546300800"
}
POST /so_test24/_doc
{
"id": 2,
"endTime": "1514764800"
}
POST /so_test24/_doc
{
"id": 3,
"endTime": "1527811200"
}
POST /so_test24/_doc
{
"id": 4,
"endTime": "1535760000"
}
The search Query
GET /so_test24/_search
{
"query": {
"range": {
"endTime": {"gte": "1532883892"}
}
}
}
The result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "uFIq42sB4TH56W1h-jGu",
"_score" : 1.0,
"_source" : {
"id" : 1,
"endTime" : "1546300800"
}
},
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "u1Iq42sB4TH56W1h-zEK",
"_score" : 1.0,
"_source" : {
"id" : 4,
"endTime" : "1535760000"
}
}
]
}
}
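One thing to watch for: the sample values above (for example 1546300800) correspond to 2019-01-01 when read as seconds, but a field declared as epoch_millis reads them as milliseconds. The comparison still works as long as the stored values and the query values use the same unit. If your data really is in seconds, as the float values in the question suggest, you could either declare the format as epoch_second or convert to milliseconds before indexing and querying. A minimal Python sketch of the second option, with hypothetical helper names:

```python
def seconds_to_millis(seconds):
    """Convert epoch seconds (int or float) to whole epoch milliseconds."""
    return int(round(seconds * 1000))


def range_gte(field, seconds):
    """Build a range query body for a date field mapped with epoch_millis."""
    return {"query": {"range": {field: {"gte": seconds_to_millis(seconds)}}}}


# The bound from the question, given in seconds.
print(range_gte("endTime", 1559076401.0))
# prints {'query': {'range': {'endTime': {'gte': 1559076401000}}}}
```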
I have stored three JSON objects in Elasticsearch; each object has a name and a projects array.
{"name": "haris","projects": [{"title": "Splunk"},{"title": "QRadar"},{"title": "LogAnalysis"}]}
{"name": "khalid","projects": [{"title": "MS"},{"title": "Google"},{"title": "Apple"}]}
{"name": "Hamid","projects": [{"title": "Toyota"},{"title": "Honda"},{"title": "Kia"}]}
I have written a query to fetch a particular object by _id, returning only its projects property:
curl -XGET 'localhost:9200/jsontest/_search?pretty' -d '{"query" : { "match" : {"_id":"AV1kzzZqAzHWQ2S7B8f1"} }, "_source": ["projects"]}'
As expected, it returns the projects array:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "jsontest",
"_type" : "json",
"_id" : "AV1kzzZqAzHWQ2S7B8f1",
"_score" : 1.0,
"_source" : {
"projects" : [{"title" : "Splunk"},{"title" : "QRadar"},{"title" : "LogAnalysis"}
]
}
}
]
}
}
Question: is there a way to retrieve the value at a particular index of projects? This is dummy data; in my real scenario projects can have a large number of elements, and each element is itself a JSON object with a lot of properties. I only need to retrieve the value at a certain index of projects.
Here is what I would do.
First, the mapping:
PUT test/my_objects/_mapping
{
"properties": {
"name":{
"type": "string",
"index": "not_analyzed"
},
"projects": {
"type": "nested"
}
}
}
Second, the projects are indexed:
PUT test/my_objects/1111
{
"name": "haris",
"projects": [
{"title": "Splunk"},
{"title": "QRadar"},
{"title": "LogAnalysis"}
]
}
Finally, the aggregation query:
GET test/my_objects/_search
{
"aggs": {
"by_name": {
"terms": {
"field": "name"
},
"aggs": {
"by_project": {
"nested": {
"path": "projects"
},
"aggs": {
"by_title": {
"terms": {
"field": "projects.title"
}
}
}
}
}
}
}
}
It's not tested and a bit tedious because of the nested aggs, but it should work if you adapt it further to your requirements.