Can I force ES to return dates in epoch_millis format?

I have this field mapping:
"time": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
and I'm querying documents with this filter:
"range": {
"time": {
"gt": "1473157500000",
"lte": "1473158700000",
"format": "epoch_millis"
}
This works and returns documents, but the results show the time field in a different format:
"time": "2016-09-06T10:25:23.678",
Is it possible to force queries to be returned in epoch_millis?

The _source always returns the data in the original document.
Ideally, it may be more desirable and flexible to convert the _source data to the desired format for presentation or otherwise on the client end.
However, for the above use case you could use fielddata_fields.
fielddata_fields returns fields in the format in which the field data is actually stored, which for a date field happens to be epoch_millis.
From the documentation:
Allows to return the field data representation of a field for each hit
Field data fields can work on fields that are not stored. It’s
important to understand that using the fielddata_fields parameter will
cause the terms for that field to be loaded to memory (cached), which
will result in more memory consumption.
Example:
POST <index>/_search
{
  "fielddata_fields": [
    "time"
  ]
}

From ES 6.5 onwards, we need to use docvalue_fields in this specific structure, since fielddata_fields has been deprecated. E.g. let's say we ingested a JSON doc of the following format:
{
  "@version": "1",
  "@timestamp": "2019-01-29T10:01:19.217Z",
  "host": "C02NTJRMG3QD",
  "message": "hello world!"
}
Now let's execute the following GET query with docvalue_fields:
curl -X GET \
  http://localhost:9200/myindex/_search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match_all": {}
    },
    "docvalue_fields": [
      {
        "field": "@timestamp",
        "format": "epoch_millis"
      }
    ]
  }'
And, we'll get the following response:
{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "doc",
        "_id": "mY8OmWgBF3ZItz5TVDiL",
        "_score": 1,
        "_source": {
          "@version": "1",
          "@timestamp": "2019-01-29T10:01:19.217Z",
          "host": "C02NTJRMG3QD",
          "message": "hello world!"
        },
        "fields": {
          "@timestamp": [
            "1548756079217"
          ]
        }
      }
    ]
  }
}

Related

Source to destination Key Field mapping in Elastic Search

I have an Elasticsearch index with source data coming in the following way:
"_source": {
"email": "smithamber#example.com",
"time": "2022-09-08T13:52:50.347861",
"message": "Pattern thank talk mention. Manage nearly tell beat. Difficult husband feel talk radio however.",
"sIp": "192.168.11.156",
"dIp": "80.254.211.60",
"ts": "2022-09-08T13:52:50"
}
Now I want a way to dynamically map the @timestamp [destination key] field of the ES doc to the time [source key] field. For this I am using:
"runtime_mappings": {
"#timestamp": {
"type": "date",
"format": "yyyyMMdd'T'HHmmss.SSSZ",
"script": {
"source": "if (doc[\"time\"].size() == 0) {return} else {return doc[\"time\"].value;}",
"lang": "painless"
}
}
}
However, this does not work. Is there a better way to map a source key field to a destination key field in Elasticsearch? I am open to static mapping as well, set once before creating the index for one kind of source data.
I am looking for the correct syntax for mapping my field.
Edited:
When I add the query -
{ "query": {
"range": {
"#timestamp": {
"gte": "now-5d",
"lte": "now"
}
}
}
}
I see no hits.
{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
However, the same query on the time field gets me all the filtered docs.
{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "topic-indexer-xxx",
        "_id": "c28sIYMB0xJUJru8c47O",
        "_score": 1.0,
        "_source": {
          "email": "albertthompson@example.com",
          "time": "2022-09-07T15:25:33.672016",
          "message": "Candidate future staff ever former run. Like quality personal specific trouble cell money move. Available majority memory model thing TV wrong. Summer anyone light key.",
          "sIp": "192.168.103.75",
          "dIp": "191.27.68.163"
        }
      },
      ....
}
For the mapping I have also tried dynamic templates, but still get no results when querying the @timestamp field:
{
  "dynamic_templates": [
    {
      "@timestamp": {
        "match": "time",
        "mapping": {
          "type": "date",
          "format": "strict_date_optional_time",
          "copy_to": "@timestamp"
        }
      }
    }
  ]
}
With @Paulo's response, I just did a little fine-tuning to resolve the issue. The mapping below (as set) works, and then I can run range queries on the @timestamp field, as shown after the mapping:
{
  "runtime": {
    "@timestamp": {
      "type": "date",
      "script": {
        "source": "if (doc['time'].size() != 0){ emit(doc['time'].value.toEpochMilli());}",
        "lang": "painless"
      }
    }
  },
  "properties": {
    "@timestamp": {
      "type": "date"
    }
  }
}
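A range query against this runtime field then works as expected (a minimal sketch; the index name is the one from the search output above):
GET topic-indexer-xxx/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5d",
        "lte": "now"
      }
    }
  }
}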
TL;DR:
I feel you got mixed up in your Painless script.
Please find below an example you should be able to reproduce on your side.
time is already a date on my side; Elasticsearch was able to detect it automatically.
On another note, runtime fields, while very flexible, may lead to performance issues in the long run.
Maybe you should be looking into ingest pipelines; a sketch follows the example below.
Solution
POST /73684302/_doc
{
  "email": "smithamber@example.com",
  "time": "2022-09-08T13:52:50.347861",
  "message": "Pattern thank talk mention. Manage nearly tell beat. Difficult husband feel talk radio however.",
  "sIp": "192.168.11.156",
  "dIp": "80.254.211.60",
  "ts": "2022-09-08T13:52:50"
}

POST /73684302/_doc
{
  "email": "smithamber@example.com",
  "message": "Pattern thank talk mention. Manage nearly tell beat. Difficult husband feel talk radio however.",
  "sIp": "192.168.11.156",
  "dIp": "80.254.211.60",
  "ts": "2022-09-08T13:52:50"
}

GET /73684302/_search
{
  "runtime_mappings": {
    "@timestamp": {
      "type": "date",
      "script": {
        "source": """
          if (doc["time"].size() != 0){
            emit(doc["time"].value.toEpochMilli());
          }
        """,
        "lang": "painless"
      }
    }
  },
  "_source": false,
  "fields": ["@timestamp"]
}
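As for the ingest pipeline route mentioned above, a minimal sketch could look like this (the pipeline name copy-time is hypothetical; the set processor copies the source field into @timestamp at index time, so no runtime script is needed when querying):
PUT _ingest/pipeline/copy-time
{
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{time}}"
      }
    }
  ]
}

POST /73684302/_doc?pipeline=copy-time
{
  "email": "smithamber@example.com",
  "time": "2022-09-08T13:52:50.347861"
}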

Why does the elasticsearch filter not give any results, whereas the kibana dashboard does?

I am querying Elasticsearch using Sense. When using a range filter on a field, I get empty hits, but I am able to get results using the Kibana dashboard. Why is the filter not working? My query:
GET _search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"field_name1": "value1"}},
        {"match": {"file_name2": "value2"}}
      ]
    }
  },
  "filter": {  <- not working (no data, but gets data from kibana)
    "range": {
      "@timestamp": {
        "gte": "2017-02-18"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "ignore_unmapped": true
      }
    }
  ]
}
From the Kibana dashboard, when I add the time range it adds time:(from:'2017-02-18T10:19:08.680Z',mode:absolute,to:'2017-02-19T10:19:08.680Z') and I am able to see results. The dashboard also adds some other stuff like metadata and a filter with negate, but I think they do the same. Only the time part seems to be different. So why the difference, and is my query correct? The sample URL:
https://elasticsearch/app/kibana#/discover?
_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-02-18T09:23:41.044Z',mode:absolute,to:'2017-02-19T09:23:41.044Z'))
&_a=(columns:!(description,id),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:index-value,key:field_name1,negate:!f,value:value1),query:(match:(field_name2:(query:value2,type:phrase))))),index:index-value,interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc),uiState:(),vis:(aggs:!((params:(field:field_name2,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))
&indexPattern=index-value&type=histogram
Thanks.
Sample JSON response:
{
  "took": some_number,
  "timed_out": false,
  "_shards": {
    "total": some_number,
    "successful": some_number,
    "failed": 0
  },
  "hits": {
    "total": some_number,
    "max_score": null,
    "hits": [
      {
        "_index": "index-name",
        "_type": "log-1",
        "_id": "alphanum",
        "_score": null,
        "_source": {
          "headers": "header-string",
          "query_string": "query-string",
          "server_variables": "server-variables",
          "cookies": "cookies",
          "extra_data": "some extra stuff",
          "exception_data_obj": {
            "stack_trace": "",
            "source": "",
            "message": "success",
            "additional_data": ""
          },
          "some_id": "211FA1F1-F312-1234-B539-F7AAE23EAA2F",
          "level": "Warn",
          "description": "Success",
          "@timestamp": "2017-01-20T01:33:27.303Z",
          "field1": "value1",
          "field2": "value2",
          "key": {
            "key.field1": "key.value1",
            "key.field2": "key.value2"
          },
          "@by": "app-name",
          "environment": "env-name"
        },
        "sort": [
          1484876007303
        ]
      },
      {}
    ]
  }
}
It's not the same query: in the Sense query you asked for must matches on field1 and field2, but in Kibana you didn't. Also, a standalone top-level filter is not the usual form; the range belongs inside the bool query (see the sketch below).
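A version with the range moved into bool.filter, based on the fields above (a sketch, not a verified fix; note that the original must clause says file_name2, which looks like a typo for field_name2):
GET _search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"field_name1": "value1"}},
        {"match": {"field_name2": "value2"}}
      ],
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2017-02-18"
          }
        }
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "ignore_unmapped": true
      }
    }
  ]
}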

GET query based on timestamp

I'm looking for help building a query that will retrieve the number of documents for a given time frame, for example the last 30 minutes.
The documents are syslogs like:
{
  "_index": "logstash-2017.01.16",
  "_type": "syslog",
  "_id": "AVmnIUFGd2leAWt2KJSr",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-01-16T11:54:48.318Z",
    "syslog_severity_code": 5,
    "syslog_facility": "user-level",
    "@version": "1",
    "host": "10.0.0.1",
    "syslog_facility_code": 1,
    "message": "Test Syslog Message",
    "type": "syslog",
    "syslog_severity": "notice",
    "tags": [
      "_grokparsefailure"
    ]
  }
}
My idea is to build this query into another script that will check for new items being added to ES.
Use Range Query:
GET index/type/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "from": "now-30m",
        "to": "now"
      }
    }
  }
}
This will give output like:
{
  "count": 2,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  }
}
where count carries the number of documents matched.
Read more about the Range Query in the Elasticsearch documentation.
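If you want the matching documents themselves rather than just the count, a minimal variant is to hit _search and sort newest first (a sketch against the same index; adjust size as needed):
GET index/type/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-30m",
        "lte": "now"
      }
    }
  },
  "sort": [
    { "@timestamp": { "order": "desc" } }
  ],
  "size": 10
}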

Latinise tokens at query time

I need to latinise the query tokens that I use when querying (or filtering). I can do this at the application level, but I was wondering if elasticsearch provides an out-of-the-box solution.
I'm using ES 1.7.5 (as a service)
By default elasticsearch will use the same analyzer at index time and at query time, but it is possible to specify a search_analyzer which will only be used at query time.
Let's take a look at the following example:
# First we define an analyzer which will fold non-ASCII characters, called `latinize`.
PUT books
{
  "settings": {
    "analysis": {
      "analyzer": {
        "latinize": {
          "tokenizer": "standard",
          "filter": ["asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "standard",        # We use the standard analyzer at index time.
          "search_analyzer": "latinize"  # But we use the latinize analyzer at query time.
        }
      }
    }
  }
}
# Now let's create a document and search for it with a non-latinized string.
POST books/book
{
  "name": "aaoaao"
}

POST books/_search
{
  "query": {
    "match": {
      "name": "ääöääö"
    }
  }
}
And bam! There is our document.
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "books",
        "_type": "book",
        "_id": "AVkIXdNyDpmDHTvI6Cp1",
        "_score": 0.30685282,
        "_source": {
          "name": "aaoaao"
        }
      }
    ]
  }
}
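You can also sanity-check the analyzer directly with the _analyze API (a sketch against the index above; on ES 1.x the parameters can be passed in the query string):
GET books/_analyze?analyzer=latinize&text=ääöääö
This should return a single token, aaoaao, confirming the ascii folding.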

Retrieving top terms query in Elasticsearch

I am using Elasticsearch 1.1.0 and trying to retrieve the top 10 terms in a field called text.
I've tried the following, but it instead returned all of the documents:
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}
EDIT
The following is an example of the result that is returned:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2747,
    "max_score": 1,
    "hits": [
      {
        "_index": "index_name",
        "_type": "type_name",
        "_id": "621637640908050432",
        "_score": 1,
        "_source": {
          "metadata": {
            "result_type": "recent",
            "iso_language_code": "en"
          },
          "in_reply_to_status_id_str": null,
          "in_reply_to_status_id": null,
          "created_at": "Thu Jul 16 11:08:57 +0000 2015",
          ...
What am I doing wrong?
Thanks.
First of all, don't use facets: they are deprecated. Even though you use an old version of Elasticsearch, switch to aggregations. Quoting the documentation:
Faceted search refers to a way to explore large amounts of data by
displaying summaries about various partitions of the data and later
allowing to narrow the navigation to a specific partition.
In Elasticsearch, facets are also the name of a feature that allowed
to compute these summaries. facets have been replaced by aggregations
in Elasticsearch 1.0, which are a superset of facets.
Use this query instead:
POST /your_index/your_type/_search?search_type=count
{
  "aggs": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}
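With search_type=count no hits are returned and the top terms show up under aggregations; the response looks roughly like this (illustrative bucket values, not actual output):
{
  "hits": {
    "total": 2747,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "text": {
      "buckets": [
        { "key": "some_term", "doc_count": 42 },
        { "key": "another_term", "doc_count": 17 }
      ]
    }
  }
}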
This will work fine.
Try this:
GET /index_name/type_name/_search?search_type=count
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}
