Elasticsearch: define fields' order in returned docs

I'm sending queries to Elasticsearch and it responds with the fields in an unpredictable order inside its documents.
How can I fix the order in which Elasticsearch returns the fields inside documents?
I mean, I'm sending this query:
{
  "index": "my_index",
  "_source": {
    "includes": ["field1","field2","field3","field14"]
  },
  "size": X,
  "body": {
    "query": {
      // stuff
    }
  }
}
and when it responds, the fields are not in the order I asked for.
I ultimately want to convert this to CSV, and I want the CSV headers in a fixed order.
is there something to do so i can get something like
doc1 :{"field1","field2","field3","field14"}
doc2 :{"field1","field2","field3","field14"}
...
in the same order as my "_source" ?
Thanks for your help.

A document in Elasticsearch is a JSON hash/map, and by definition maps are unordered.
One way around this would be to use Logstash to extract the docs from ES using an elasticsearch input and write them out in CSV using a csv output. That way you can guarantee that the fields in the CSV file will have the exact same order as specified. Another benefit is that you don't have to write your own boilerplate code to extract from ES and sink to CSV; Logstash does it all for you for free.
The Logstash configuration would look something like this:
input {
  elasticsearch {
    hosts => "localhost"
    query => '{ "query": { "match_all": {} } }'
    size => 100
    index => "my_index"
  }
}
filter {}
output {
  csv {
    fields => ["field1","field2","field3","field14"]
    path => "/path/to/file.csv"
  }
}
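If you would rather skip Logstash, you can also enforce the column order client-side, since it's the CSV writer, not Elasticsearch, that decides the header order. Here is a minimal sketch (not part of the original answer) using Python and the requests library; the endpoint, index name and field names are just the hypothetical ones from the question.
import csv
import requests

ES_URL = "http://localhost:9200/my_index/_search"  # hypothetical endpoint
FIELDS = ["field1", "field2", "field3", "field14"]  # desired CSV column order

body = {
    "_source": FIELDS,
    "size": 1000,
    "query": {"match_all": {}},
}

# fetch the hits; _source may come back in any order, which is fine
hits = requests.post(ES_URL, json=body).json()["hits"]["hits"]

with open("file.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)  # the header order is fixed here
    writer.writeheader()
    for hit in hits:
        # missing fields end up as empty cells
        writer.writerow({field: hit["_source"].get(field, "") for field in FIELDS})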

Related

No effect of “size” in ‘query’ while reindexing in elasticsearch

I have been using Logstash to migrate one index to another. I recently tried to reindex a certain amount of data from a large dataset in my local environment, so I used the following configuration for the migration:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "old_index"
    query => '{ "query": { "match_all": {} }, "size": 10 }'
  }
}
filter {
  mutate {
    remove_field => [
      "@version",
      "@timestamp"
    ]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "new_index"
    document_type => "contact"
    manage_template => false
    document_id => "%{contactId}"
  }
}
But this reindexes all the documents from old_index into new_index, whereas I was expecting only 10 documents to be reindexed into new_index.
Am I missing some concept of using Logstash with Elasticsearch?
The elasticsearch input doesn't run a conventional search; it uses a scan/scroll search instead. This means that all data will be retrieved from the index, and the size parameter only defines how much data is fetched during each scroll, not how much data is fetched altogether.
Also, note that the size parameter in the query itself has no effect. You need to use the size parameter of the elasticsearch input and not specify it in the query.
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "old_index"
    query => '{ "query": { "match_all": {} } }'
    size => 10     # size goes here, as an input setting, not inside the query
  }
}
That being said, if you're running ES 2.3 or later, there's a way to achieve what you desire using the Reindex API, like this:
POST /_reindex
{
  "size": 10,
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
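For reference, here is roughly the same call from a script, as a minimal sketch with Python's requests library (my own illustration, assuming a local cluster and the index names from the question). The counters in the response let you verify that no more than 10 documents were copied.
import requests

resp = requests.post(
    "http://localhost:9200/_reindex",
    json={
        "size": 10,                        # caps the total number of copied documents
        "source": {"index": "old_index"},
        "dest": {"index": "new_index"},
    },
).json()

# "created" plus "updated" should add up to at most 10
print(resp.get("created"), resp.get("updated"), resp.get("total"))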

How to make a field in Kibana numeric (from String)

I've inherited an ELK stack for logs and I'm still learning the ropes - I've been tasked with making two fields numeric on a certain type on our logstash indexes. Can't seem to figure out how to do this. Things I tried:
In the Kibana settings page, went to my logstash index and found the field. Went to edit on the controls tab, saw type listed as String (and it was immutable). Dropdown for format shows URL and String.
Went to one of my Elasticsearch hosts and found the grok rule for the document type, and found that they were indeed written to parse the field as a number. Example: %{NUMBER:response_code}
Ran out of ideas, since I don't know my way around the ELK stack.
Any help greatly appreciated, especially links to relevant documentation so I can understand what's going on. I'd be googling harder if I knew what to google.
Also note that %{NUMBER:response_code} doesn't turn the string into a number; it simply recognizes and parses a number present in a string, but the resulting response_code field is still a string, which you need to convert to a number using a mutate/convert filter. grok always parses a string into other smaller strings, and it is your job to convert the resulting fields into the types you expect.
So you need to add this after your grok filter:
mutate {
  convert => { "response_code" => "integer" }
}
From then on, the response_code in your event will be an integer, and the logstash template used to create your daily logstash indices contains a specific dynamic template for integer fields. Note that the response_code field will be an integer only once the next logstash index is created; the existing indices will not change.
You will need to reindex your data. Because the Elasticsearch mapping (i.e. schema) is already set to string for this field, you will not be able to index the data as an integer within the same index.
A typical ELK setup creates rolling indices (per day or month), so it's possible to switch from string to integer between indices, but this is not recommended as it will interfere with long-term aggregations and searches.
As you found out, changing the grok rule will help with future data. Now you need to pass all your existing data through Logstash again to apply the new rules.
To do this, you can either pass the log files again, or have Logstash read from Elasticsearch using
input {
  elasticsearch {
    hosts => "localhost"
    # also set the relevant index/query here, and pair this input with an
    # elasticsearch output that writes to the re-created index
  }
}
The newer versions of Elasticsearch should improve this by providing a native reindex API.
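To make that reindex route concrete, here is a minimal sketch (my own illustration, not from either answer) using Python's requests library: create a new index whose mapping declares response_code as an integer, then copy the old documents into it. The index names are hypothetical, and the typeless mapping syntax assumes a 7.x-style cluster, so adjust it for older versions.
import requests

ES = "http://localhost:9200"

# 1. Create the new index with the corrected mapping
requests.put(
    f"{ES}/logstash-2015.12.01-fixed",   # hypothetical name
    json={"mappings": {"properties": {"response_code": {"type": "integer"}}}},
)

# 2. Copy the documents; numeric strings such as "404" are coerced to integers
#    by the new mapping
requests.post(
    f"{ES}/_reindex",
    json={
        "source": {"index": "logstash-2015.12.01"},
        "dest": {"index": "logstash-2015.12.01-fixed"},
    },
)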
Try to view a sample of documents:
curl -XGET 'localhost:9200/_search?q=opcode:userLessonComplexityPoll&pretty'
Let's say you see these docs:
{
  "_index" : "myindex",
  "_type" : "logs",
  "_id" : "AWNoYI8pGmxxeL6jupEZ",
  "_score" : 1.0,
  "_source" : {
    "production" : "0",
    "lessonId" : "2144",
    "opcode" : "userLessonComplexityPoll",
    "courseId" : "45",
    "lessonType" : "minitest",
    ...
So, try the conversion on a single document first:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.lessonId instanceof String) { ctx._source.lessonId = Integer.parseInt(ctx._source.lessonId); }"
  },
  "query": {
    "terms": {
      "_id": ["AWNoYI8pGmxxeL6jupEZ", "AWMcRJYFGmxxeL6jucIZ"]
    }
  }
}'
Success? Then try to convert all documents by query:
curl -XPOST 'localhost:9200/educa_stats-*/_update_by_query?pretty' -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.lessonId instanceof String) { ctx._source.lessonId = Integer.parseInt(ctx._source.lessonId); }"
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "lessonId"
          }
        }
      ]
    }
  }
}'
All lessonId fields will be converted from String to int (range -2^31 to 2^31-1). That's all.

Insert aggregation results into an index

The goal is to build an Elasticsearch index with only the most recent documents in groups of related documents to track the current state of some monitoring counters and states.
I have crafted a simple Elasticsearch aggregation query:
{
  "size": 0,
  "aggs": {
    "group_by_monitor": {
      "terms": {
        "field": "monitor_name"
      },
      "aggs": {
        "get_latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
It groups related documents into buckets and selects the most recent document for each bucket.
Here are the different ideas I had to get the job done:
directly use the aggregation query to push the results into the index, but it does not seem possible : Is it possible to put the results of an ElasticSearch aggregation back into the index?
use the Logstash Elasticsearch input plugin to execute the aggregation query and the Elasticsearch output plugin to push into the index, but it seems like the input plugin only looks at the hits field and is unable to handle aggregation results: Aggregation Query possible input ES plugin!
use the Logstash http_poller plugin to get a JSON document, but it does not seem to allow specifying a body for the HTTP request!
use the Logstash exec plugin to execute cURL commands to get the JSON, but this seems quite cumbersome and is my last resort.
use the NEST API to build a basic application that will do polling, extract results, clean them and inject the resulting documents into the target index, but I'd like to avoid adding a new tool to maintain.
Is there a way of accomplishing this that is not overly complex?
Edit the logstash.conf file as follows:
input {
  elasticsearch {
    hosts => "localhost"
    index => "source_index_name"
    type => "index_type"
    query => '{Query}'
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    index => "target_index_name"
    document_id => "%{[@metadata][_id]}"
  }
}
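If the Logstash input can't see the aggregation results, another option is the small polling application mentioned in the question, only without NEST. Here is a minimal sketch in Python with the requests library (my own illustration): it runs the aggregation above, keeps the single top hit of each bucket, and bulk-indexes those documents into the target index. The index names are the placeholder ones used in this thread.
import json
import requests

ES = "http://localhost:9200"

agg_query = {
    "size": 0,
    "aggs": {
        "group_by_monitor": {
            "terms": {"field": "monitor_name"},   # default of 10 buckets; raise "size" if needed
            "aggs": {
                "get_latest": {
                    "top_hits": {"size": 1, "sort": [{"timestamp": {"order": "desc"}}]}
                }
            },
        }
    },
}

resp = requests.post(f"{ES}/source_index_name/_search", json=agg_query).json()
buckets = resp["aggregations"]["group_by_monitor"]["buckets"]

# Build a bulk body: one index action per latest document, reusing the original _id
bulk_lines = []
for bucket in buckets:
    hit = bucket["get_latest"]["hits"]["hits"][0]
    bulk_lines.append(json.dumps({"index": {"_index": "target_index_name", "_id": hit["_id"]}}))
    bulk_lines.append(json.dumps(hit["_source"]))

requests.post(
    f"{ES}/_bulk",
    data="\n".join(bulk_lines) + "\n",          # the bulk API requires a trailing newline
    headers={"Content-Type": "application/x-ndjson"},
)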

Get elasticsearch indices before specific date

My logstash service sends the logs to elasticsearch as daily indices.
elasticsearch {
  hosts => [ "127.0.0.1:9200" ]
  index => "%{type}-%{+YYYY.MM.dd}"
}
Does Elasticsearch provide an API to look up the indices created before a specific date?
For example, how could I get the indices created before 2015-12-15 ?
The only time I really care about what indexes are created is when I want to close/delete them using curator. Curator has "age" type features built in, if that's also your use case.
I think you are looking for the indices query; have a look at the documentation.
Here is an example:
GET /_search
{
  "query": {
    "indices": {
      "query": {
        "term": {"description": "*"}
      },
      "indices": ["2015-01-*", "2015-12-*"],
      "no_match_query": "none"
    }
  }
}
Each index has a creation_date setting.
Since the number of indices is supposed to be quite small, there's no such feature as "searching for indices"; you just get their metadata and filter them inside your app. The creation_date is also available via the _cat API.
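A minimal sketch of that filter-it-in-your-app approach, using Python and the requests library against a local cluster (my own illustration): read index.creation_date, which is stored in each index's settings as epoch milliseconds, and keep the indices created before the cutoff.
from datetime import datetime, timezone
import requests

cutoff = datetime(2015, 12, 15, tzinfo=timezone.utc)

# every index's settings include index.creation_date in epoch milliseconds
settings = requests.get("http://localhost:9200/_settings").json()

older = [
    name
    for name, data in settings.items()
    if datetime.fromtimestamp(
        int(data["settings"]["index"]["creation_date"]) / 1000, tz=timezone.utc
    ) < cutoff
]
print(older)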

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
  "query": {
    "match": {"_all": "some_search_string"}
  },
  "sort": [
    {
      "some_field": {
        "order": "asc"
      }
    }
  ]
}
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.
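For illustration, here is a minimal sketch of such a multi-field mapping and a sorted search, written with Python's requests library. The raw sub-field name and the mytype type name are hypothetical, and the not_analyzed string syntax assumes an older (pre-5.x) cluster, consistent with the _all query in the question; on newer versions you would use a text field with a keyword sub-field instead.
import requests

ES = "http://localhost:9200"

# some_field stays analyzed for full-text search, while some_field.raw keeps the
# original value untouched so it can be used for sorting
mapping = {
    "mappings": {
        "mytype": {
            "properties": {
                "some_field": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
}
requests.put(f"{ES}/my_index", json=mapping)

# same match query as before, but the sort targets the non-analyzed sub-field
query = {
    "query": {"match": {"_all": "some_search_string"}},
    "sort": [{"some_field.raw": {"order": "asc"}}],
}
print(requests.post(f"{ES}/my_index/_search", json=query).json())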
