What does # mean in Elasticsearch documents?

My question is: what does the # mean in Elasticsearch documents? #timestamp is created automatically, along with #version. Why is this, and what is the point?
Here is some context: I have a web app that writes logs to files. Logstash forwards these logs to Elasticsearch, and I use Kibana to visualize everything.
Here is an example of one of the documents in Elasticsearch:
{
"_index": "logstash-2018.02.17",
"_type": "doc",
"_id": "0PknomEBajxXe2bTzwxm",
"_version": 1,
"_score": null,
"_source": {
"#timestamp": "2018-02-17T05:06:13.362Z",
"source": "source",
"#version": "1",
"message": "message",
"env": "development",
"host": "127.0.0.1"
},
"fields": {
"#timestamp": [
"2018-02-17T05:06:13.362Z"
]
},
"sort": [
1518843973362
]
}

Fields starting with # are usually metadata generated by Logstash. #timestamp is the time at which Logstash processed the event, and #version is likewise added by Logstash to denote the version number of the document.
Here is the reference.

The # fields are metadata created by Logstash. They are part of the data itself.
More info is here.

Related

What is the purpose of some fields in Elasticsearch documents to have # and _ at the beginning and some duplicates

Why do some fields of an Elasticsearch document start with the characters "#" and "_", and why are some of them duplicated? For example:
# at the beginning: #tags, #type
_ at the beginning: _score, _type
Some fields have two versions, one with "#" and the other with "_" (with the same value):
#version and _version
Some fields have two versions, one with "#" and the other without:
fields (at the root) and #fields (inside _source)
Some fields are duplicated (with the same value):
#timestamp (inside fields) and #timestamp (inside _source)
Below is a real Elasticsearch document with these fields:
{
"_index": "logstash-2021.02.25",
"_type": "_doc",
"_id": "q9_C1ncBR7kZ5_B9FyW_",
"_version": 1,
"_score": null,
"_source": {
"arquivo": "C:\\DEV_HOME\\GeradorDePlanilha.cs",
"level": "INFO",
"#tags": [],
"#type": "amqp",
"date": "2021-02-24 22:16:20.0910",
"#version": "1",
"type": "App.Log",
"metodo": "GerarPlanilhaComDadosGerenciaisBasicosAsync",
"#timestamp": "2021-02-25T01:16:20.091Z",
"origin": "App.Api.Worker",
"#fields": {},
"#source": "nlog://DESKTOP-F8BDSSI/API.Gerencia.GeradorDePlanilha",
"logger": "App.Log.Gerencia.GeradorDePlanilha",
"#message": "Método: GerarPlanilhaComDadosGerenciaisBasicosAsync Arquivo: C:\\DEV_HOME\\GeradorDePlanilha.cs Linha: 33",
"machineName": "DESKTOP-F8BDSSI"
},
"fields": {
"metodoCompleto": [
"GerarPlanilhaComDadosGerenciaisBasicosAsync - C:\\DEV_HOME\\GeradorDePlanilha.cs"
],
"#timestamp": [
"2021-02-25T01:16:20.091Z"
]
},
"sort": [
1614215780091
]
}
Fields prefixed with _ are reserved by Elasticsearch and used for meta-fields outside the document source (like _source, _score, _id, etc.). These fields are system generated and standard for all documents.
Fields prefixed with # have no special meaning in Elasticsearch but are used for meta-fields inside the document source. #timestamp is a conventional name for time-based documents. These fields are neither system generated nor standard for all documents.
You can use _ fields inside the document source, but you should prefer # fields. You can think of the underscore as meaning "system field".
For example, index names cannot start with _.
As for your document in particular, the _ fields outside the source were generated by Elasticsearch and the ones inside the source were generated by the Logstash conf file.
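As a rough illustration of the distinction above, this sketch splits one search hit into the meta-fields that wrap the document and the #-prefixed fields found inside _source. The helper name split_hit and the trimmed sample hit are made up for this example; it is not an Elasticsearch API.

```python
def split_hit(hit):
    """Group the keys of one search hit by where they come from."""
    source = hit.get("_source", {})
    return {
        # Top-level keys starting with "_" are added by Elasticsearch itself.
        "elasticsearch_meta": sorted(k for k in hit if k.startswith("_")),
        # Keys inside _source starting with "#" follow the shipper's convention.
        "shipper_meta": sorted(k for k in source if k.startswith("#")),
        # Everything else inside _source is application data.
        "application": sorted(k for k in source if not k.startswith("#")),
    }


# Trimmed copy of the document from the question.
hit = {
    "_index": "logstash-2021.02.25",
    "_id": "q9_C1ncBR7kZ5_B9FyW_",
    "_version": 1,
    "_source": {
        "level": "INFO",
        "#version": "1",
        "#timestamp": "2021-02-25T01:16:20.091Z",
        "logger": "App.Log.Gerencia.GeradorDePlanilha",
    },
}

groups = split_hit(hit)
print(groups)
```

Only the keys inside _source are under the control of whatever ships the document; the top-level keys are the same for every hit.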

What does _doc mean in elasticsearch sort search return?

When I search with a sort in Elasticsearch using the _search endpoint, I get a _doc value in the sort field. What is the difference between this and _doc as the document type?
Elasticsearch version: 6.2.2
"sort": [
1577413214250, # timestamp
393 # _doc
]
Kibana also uses _doc when implementing "Surrounding Documents":
{"index":["prophet-job-*"],"ignore_unavailable":true,"preference":1577428415532}
{"version":true,"size":5,"search_after":[1577413214250,385],"sort":[{"#timestamp":{"order":"asc","unmapped_type":"boolean"}},{"_doc":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["#timestamp"],"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}}}
{"index":["prophet-job-*"],"ignore_unavailable":true,"preference":1577428415532}
{"version":true,"size":5,"search_after":[1577413214250,385],"sort":[{"#timestamp":{"order":"desc","unmapped_type":"boolean"}},{"_doc":{"order":"asc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["#timestamp"],"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}}}
_doc in the context of sorting can be used if you do not care about the order and simply want the documents returned in the most efficient way possible. Think of it as a search option, as opposed to the index doc type "_doc".
For more information about sorting by _doc, see the official documentation.
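To make the role of the second sort value concrete, here is a minimal sketch of how the sort array of one hit can be fed back as search_after for the next page, in the same shape as the Kibana requests in the question. The function name next_page_body and its parameters are illustrative only, and the sample hit is reduced to the keys the sketch needs.

```python
import json


def next_page_body(last_hit, time_field, size=5):
    """Build the body for the page that follows last_hit."""
    return {
        "size": size,
        "sort": [
            {time_field: {"order": "asc"}},
            # _doc here is a sort key, unrelated to the "_doc" document type.
            {"_doc": {"order": "asc"}},
        ],
        # search_after takes the sort values of the last hit, in sort order.
        "search_after": list(last_hit["sort"]),
        "query": {"match_all": {}},
    }


last_hit = {"_id": "123456", "sort": [1577931865123, 11]}
body = next_page_body(last_hit, "#timestamp")
print(json.dumps(body))
```

The sketch copies the _doc value back unchanged and never interprets it, which is the safest way to treat a value that is only meaningful as a position within one search.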
Hi Dennis, below is a simple sample: http://test.kibana.some.net/elasticsearch/_msearch
Request payload:
{"index":["some-index-*"],"ignore_unavailable":true,"preference":1577931761749}
{"version":true,"size":5,"search_after":[1577931865123,12],"sort":[{"#timestamp":{"order":"asc","unmapped_type":"boolean"}},{"_doc":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["#timestamp"],"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}}}
{"index":["some-index-*"],"ignore_unavailable":true,"preference":1577931761749}
{"version":true,"size":5,"search_after":[1577931865123,12],"sort":[{"#timestamp":{"order":"desc","unmapped_type":"boolean"}},{"_doc":{"order":"asc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["#timestamp"],"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}}}
and the partial response is
{
"_index": "some-index-2020.01.02",
"_type": "doc",
"_id": "123456",
"_version": 1,
"_score": null,
"_source": {
"#timestamp": "2020-01-02T02:24:25.123Z",
"prospector": {
"type": "log"
},
"#version": "1",
"tags": [
"beats_input_codec_plain_applied"
],
"fields": {
"some-values": "xxxxxx"
},
"message": "[2020-01-02 10:24:24] [INFO] [evaluation.py:277] Finished evaluation at 2020-01-02-02:24:24"
},
"fields": {
"#timestamp": [
"2020-01-02T02:24:25.123Z"
]
},
"sort": [
1577931865123,
11 # this is _doc value
]
}
The second time I ran the same search, the response had the same content except that the _doc value changed to 12, so I'm confused about the definition of this field.

Elastic filter with dot (.) in name

I'm pretty new to ELK and seem to be starting with the complicated questions ;-)
I have elements that look like the following:
{
"_index": "asd01",
"_type": "doc",
"_id": "...",
"_score": 0,
"_source": {
"#version": "1",
"my-key": "hello.world.to.everyone",
"#timestamp": "2018-02-05T13:45:00.000Z",
"msg": "myval1"
}
},
{
"_index": "asd01",
"_type": "doc",
"_id": "...",
"_score": 0,
"_source": {
"#version": "1",
"my-key": "helloworld.from.someone",
"#timestamp": "2018-02-05T13:44:59.000Z",
"msg": "myval2"
}
}
I want to filter for my-key values that start with "hello." and ignore elements that start with "helloworld.". The dot seems to be interpreted as a wildcard, and no kind of escaping seems to work.
I am looking for a filter expression that I can use both in Kibana and directly in the API.
Can someone point me to how to get this working with Elasticsearch 6.1.1?
It's not being used as a wildcard, it's just being removed by the default analyzer (the standard analyzer). If you do not specify a mapping, Elasticsearch will create one for you. For string fields it creates a multi-field: the main field is of type text (analyzed with the default standard analyzer), and it gets a keyword sub-field that keeps the exact value. If you do not want this behaviour, you must specify the mapping explicitly during index creation, or update it and reindex the data.
Try using this:
GET asd01/_search
{
"query": {
"wildcard": {
"my-key.keyword": {
"value": "hello.*"
}
}
}
}
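The following sketch checks the pattern against the two sample values from the question without needing a cluster. It uses Python's fnmatchcase purely as a local stand-in for the wildcard semantics ("*" matches any run of characters, "." is an ordinary character); the helper wildcard_query is a made-up name that only builds the request body shown above.

```python
from fnmatch import fnmatchcase


def wildcard_query(field, pattern):
    """Build a wildcard query body for an exact-value (keyword) field."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


values = ["hello.world.to.everyone", "helloworld.from.someone"]
pattern = "hello.*"

# Local approximation only: the real matching happens inside Elasticsearch.
matches = [v for v in values if fnmatchcase(v, pattern)]

body = wildcard_query("my-key.keyword", pattern)
print(matches)
```

Because the pattern is compared with the whole stored value of the keyword sub-field, the literal dot after "hello" is what excludes "helloworld.from.someone".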

Removing From ElasticSearch by type last 7 day

I have different logs in Elasticsearch 2.2, separated by 'type'. How can I delete all data of only one type that is older than one week? Thanks.
Example of logs:
{
"_index": "logstash-2016.02.23",
"_type": "dns_ns",
"_id": "AVMOj--RqgDl5Axva2Nt",
"_score": 1,
"_source": {
"#version": "1",
"#timestamp": "2016-02-23T14:37:07.029Z",
"type": "dns_ns",
"host": "11.11.11.11",
"clientip": "22.22.22.22",
"queryname": "api.bing.com",
"zonetype": "Public_zones",
"querytype": "A",
"querytype2": "+ED",
"dnsip": "33.33.33.33"
},
"fields": {
"#timestamp": [
1456238227029
]
}
}
See here or here on how to delete by query. In Elasticsearch 2.*, you might find the Delete by Query plugin useful.
Deleting "types" is no longer directly supported in ES 2.x. A better plan is to use rolling indices, so that deleting data older than 7 days becomes very easy.
Take Logstash as an example: it creates an index for every day. You can create an alias so that queries cover all of those indices, and when it is time to delete old data you can simply remove the entire index with:
DELETE logstash-2015.12.16
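Picking which daily indices are due for deletion can be sketched as below. This assumes index names follow the logstash-YYYY.MM.DD pattern seen in the question's example document; the function name expired_indices, the keep_days parameter and the sample names are illustrative, and the sketch only prints the requests instead of sending them.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days=7, prefix="logstash-"):
    """Return the daily indices dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "logstash-2016.02.14",
    "logstash-2016.02.16",
    "logstash-2016.02.23",
    "other-index",
]
for name in expired_indices(names, today=date(2016, 2, 23)):
    print("DELETE " + name)
```

Note that this removes whole days of data for every type stored in the index, so it only answers the original question if each type is written to its own set of daily indices.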

Elasticsearch lucene query in grafana

I have Grafana 2.6 with Elasticsearch 1.6.2 as the data source.
Each of my documents has a field "status" that can have the values "Queued" or "Complete".
I would like to graph the number of documents with status:Queued over time.
Here is one document:
{
"_index": "myindex",
"_type": "e_sdoc",
"_id": "AVHFTlZiGCWSWOI9Qtj4",
"_score": 3.2619324,
"_source": {
"status": "Queued",
"update_date": "2015-12-04T00:01:35.589956",
"md5": "738b67990f820ba28f3c10bc6c8b6ea3",
"sender": "Someone",
"type": "0",
"last_client_update": "2015-11-18T18:13:32.879085",
"uuid": "a80efd11-8ecc-4ef4-afb3-e8cd75d167ad",
"name": "Europe",
"insert_date": "2015-11-18T18:14:34.302295",
"filesize": 10948809532,
"is_online": "off",
"id1": 77841,
"id2": 53550932
},
"fields": {
"insert_date": [
1447870474302
],
"update_date": [
1449187295589
],
"last_client_update": [
1447870412879
]
}
}
My question is: Grafana wants a Lucene query to submit to ES, but I have no idea what I should use.
I have searched through the official docs and the Grafana issues, and looked at the ES queries made by Kibana, but I can't find a valid syntax that works :/
The time field was the problem: it seems there is no #timestamp field in my documents.
I edited my Elasticsearch data source and changed 'Time field name' from #timestamp to update_date.
I now have data points!
(See the comments for the Lucene query.)
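The comments with the actual query are not reproduced here, so the following is only a sketch under assumptions: it uses the plain Lucene field:value form status:Queued from the question and counts matching documents per time bucket on update_date. The function name count_over_time, the aggregation name over_time and the one-hour interval are made up for illustration, and the body is not a copy of what Grafana sends.

```python
import json


def count_over_time(lucene_query, time_field, interval="1h"):
    """Build a search body that counts matching documents per time bucket."""
    return {
        "size": 0,  # only the bucket counts are needed, not the hits
        "query": {"query_string": {"query": lucene_query}},
        "aggs": {
            "over_time": {
                # Assumption: "interval" is the parameter name accepted by
                # the Elasticsearch version in use.
                "date_histogram": {"field": time_field, "interval": interval}
            }
        },
    }


body = count_over_time("status:Queued", "update_date")
print(json.dumps(body, indent=2))
```

The time field passed in has to be a date field that really exists in the documents, which is the same constraint that caused the empty graph in the first place.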

Resources