Bulk upload log messages to local Elasticsearch - elasticsearch

We have a few external applications in cloud (IBM Bluemix) which logs its application syslogs in the bluemix logmet service which internally uses the ELK stack.
Now on a periodic basis, we would like to download the logs from the cloud and upload it into a local Elastic/Kibana instance. This is because storing logs in cloud services incurs cost and additional cost if we want to search the same by Kibana. The local elastic instance can delete/flush old logs which we don't need.
The downloaded logs will look like this
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","#timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","#version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG","type":"syslog","event_uuid":"d902dddb-afb7-4f55-b472-211f1d370837","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","#timestamp":"2017-09-29T02:30:28.636Z","message_type_str":"OUT","#version":"1","space_name_str":"prod","application_id_str":"dcd9f975-3be3-4451-a9db-6bed1d906ae8","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:28.636Z"}
I have created an index in our local elasticsearch as
curl -XPUT 'localhost:9200/commslog?pretty' -H 'Content-Type: application/json' -d'
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"logs" : {
"properties" : {
"instance_id_str" : { "type" : "text" },
"source_id_str" : { "type" : "text" },
"app_name_str" : { "type" : "text" },
"message" : { "type" : "text" },
"type" : { "type" : "text" },
"event_uuid" : { "type" : "text" },
"ALCH_TENANT_ID" : { "type" : "text" },
"logmet_cluster" : { "type" : "text" },
"org_name_str" : { "type" : "text" },
"#timestamp" : { "type" : "date" },
"message_type_str" : { "type" : "text" },
"#version" : { "type" : "text" },
"space_name_str" : { "type" : "text" },
"application_id_str" : { "type" : "text" },
"ALCH_ACCOUNT_ID_str" : { "type" : "text" },
"org_id_str" : { "type" : "text" },
"timestamp" : { "type" : "date" }
}
}
}
}'
Now to bulk upload the file, used the command
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '#commslogs.json'
The above command throws an error
Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]
The solution is to follow the rules for bulk upload as per
https://discuss.elastic.co/t/bulk-insert-file-having-many-json-entries-into-elasticsearch/46470/2
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
So i manually changed few of the log statements by adding action before every line
{ "index" : { "_index" : "commslog", "_type" : "logs" } }
This works!!.
Another option was to call the curl command, providing the _idex and _type in the path
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '#commslogs.json'
but without the action, this too throws the same error
The problem is we cannot do this for thousands of log records we get. Is there an option where once we download the log files from Bluemix and upload the files without adding the action.
NOTE We are not using logstash at the moment, but
is it possible to use logstash and just use grok to transform the
logs and add the necessary entries?
How can we bulk upload documents via Logstash?
Is logstash the ideal solution or we can just write a program to
transform and do that
Thanks

As #Alain Collins said, you should be able to use filebeat directly.
For logstash:
it should be possible to use logstash, but rather than using grok, you should use the json codec/filter, it would be much easier.
You can use the file input with logstash to process many files and wait for it to finish (to know when it's finished, use a file/stdout, possibly with the dot codec, and wait for it to stop writing).
Instead of just transforming the files with logstash, you should directly upload to elasticsearch (with the elasticsearch output).
As for your problem, I think it will be much easier to just use a small program to add the missing action line or use filebeat, unless you are experimented enough with logstash config to write and logstash config quicker than a program adding one line everywhere in the document.

Related

What does "mappings" do in Elasticsearch?

I just started learning Elasticsearch. I am trying out to create index, adding data, deleting data, and search data.
I can also understand the settings of Elasticsearch.
When using "PUT" to use settings
{
"settings": {
"index.number_of_shards" : 1,
"index.number_of_replicas" : 0
}
}
When using "GET" to retrieve settings information
{
"dsm" : {
"settings" : {
"index" : {
"creation_date" : "1555487684262",
"number_of_shards" : "1",
"number_of_replicas" : "0",
"uuid" : "qsSr69OdTuugP2DUwrMh4g",
"version" : {
"created" : "7000099"
},
"provided_name" : "dsm"
}
}
}
}
However,
What does "mappings" do in Elasticsearch?
{
"kibana_sample_data_flights" : {
"aliases" : { },
"mappings" : {
"properties" : {
"AvgTicketPrice" : {
"type" : "float"
},
"Cancelled" : {
"type" : "boolean"
},
"Carrier" : {
"type" : "keyword"
},
"Dest" : {
"type" : "keyword"
},
"DestAirportID" : {
"type" : "keyword"
},
"DestCityName" : {
}, // just part of data
The mapping document is a way of describing the structure of your data and defining the types eg boolean, text, keyword. These types are important as they determine how your fields are indexed and analysed.
Elasticsearch supports dynamic mapping, so effectively performs an automatic best guess of the appropriate types but you may wish to override these.
I found this to be a useful article to explain the mapping process:
https://www.elastic.co/blog/found-elasticsearch-mapping-introduction
Indexing is determined by the field type for example where the type is 'keyword' the search engine will be expecting an exact match, when the type is 'text' the search engine will be trying to determine how well the document matches the query term and in so doing so will be performing a 'full text search'.
So for example:
- A search for jump should also match jumped, jumps, jumping, and perhaps even leap.
This is a great article describing exact vs full text search and is where I took the jump example: https://www.elastic.co/guide/en/elasticsearch/guide/current/_exact_values_versus_full_text.html
Much of the power of elasticsearch is in the mapping and analysis.
Its the mapping of the index. This means it describes the data that is stored in this index. Take a deeper look here.

Getting unknown setting [index._id] error while adding data to Elasticsearch

I have created a mapping eventlog in Elasticsearch 5.1.1. I added it successfully however while adding data under it, I am getting Illegal_argument_exception with reason unknown setting [index._id]. My result from getting the indices is yellow open eventlog sX9BYIcOQLSKoJQcbn1uxg 5 1 0 0 795b 795b
My mapping is:
{
"mappings" : {
"_default_" : {
"properties" : {
"datetime" : {"type": "date"},
"ip" : {"type": "ip"},
"country" : { "type" : "keyword" },
"state" : { "type" : "keyword" },
"city" : { "type" : "keyword" }
}
}
}
}
and I am adding the data using
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog' -d '{"index":{"_id":1}}
{"datetime":"2016-03-31T12:10:11Z","ip":"100.40.135.29","country":"US","state":"NY","city":"Highland"}';
If I don't include the {"index":{"_id":1}} line, I get Illegal_argument_exception with reason unknown setting [index.apiKey].
The problem was arising with sending the data from the command line as a string. Keeping the data in a JSON file and sending it as binary solved it. The correct command is:
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog/_bulk?pretty' --data-binary #eventlogs.json

Kibana Create Index Pattern : strange behaviour of wildcard

I have just one index in elasticsearch, with name aa-bb-YYYY-MM.
Documents in this index contain a field i want to use as date field.
Those documents have been inserted from a custom script (not using logstash).
When creating the index pattern in kibana:
If i enter aa-bb-*, the date field is not found.
If i enter aa-*, the date field is not found.
If i enter aa*, the date field is found, and i can create the index pattern.
But i really need to group indexes by the first two "dimensions".I tried using "_" instead "-", with the same result.
Any idea of what is going on?
Its working for me. I'm on the latest build on the 5.0 release branch (just past the beta1 release). I don't know what version you're on.
I created this index and added 2 docs;
curl --basic -XPUT 'http://elastic:changeme#localhost:9200/aa-bb-2016-09' -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"test" : {
"properties" : {
"date" : { "type" : "date"},
"action" : {
"type" : "text",
"analyzer" : "standard",
"fields": {
"raw" : { "type" : "text", "index" : "not_analyzed" }
}
},
"myid" : { "type" : "integer"}
}
}
}
}'
curl -XPUT 'http://elastic:changeme#localhost:9200/aa-bb-2016-09/test/1' -d '{
"date" : "2015-08-23T00:01:00",
"action" : "start",
"myid" : 1
}'
curl -XPUT 'http://elastic:changeme#localhost:9200/aa-bb-2016-09/test/2' -d '{
"date" : "2015-08-23T14:02:30",
"action" : "stop",
"myid" : 1
}'
and I was able to create the index pattern with aa-bb-*

Elasticsearch mapping not working as expected

Having the following mapping:
curl -X PUT 'localhost:9200/cambio_indice?pretty=true' -d '{
"mappings" : {
"el_tipo" : {
"properties" : {
"name" : { "type" : "string" },
"age" : { "type" : "integer" },
"read" : { "type" : "integer" }
}}}}'
If I add the following code it works perfectly even though it doesn't match with the mapping (read is missing) but ES doesn't complain.
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/1?pretty=true' -d '{
"name" : "Eduardo Inda",
"age" : 23
}'
And if I add the following entry, it also works.
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/2?pretty=true' -d '{
"jose" : "stuff",
"ramon" : 23,
"garcia" : 1
}'
It seems that the mapping is not taking effect on the elements I'm adding. I'm doing something wrong when I try to map my type?
This is the default behaviour of Elasticsearch and is desirable in most of the cases. But for your case, if you do not want to allow indexing of fields not defined in your mapping, you need to update the mapping and set its "dynamic" property to "strict". Basically, your mapping definition should look like below:
{
"mappings": {
"el_tipo": {
"dynamic": "strict",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"read": {
"type": "integer"
}
}
}
}
}
Then if you try to index fields like "jose", "ramon" or "garcia", Elasticsearch will throw with an appropriate message saying that the dynamic addition of these fields is prohibited.
As per documentation Of ES:
By default, Elasticsearch provides automatic index and mapping when data is added under an index that has not been created before. In other words, data can be added into Elasticsearch without the index and the mappings being defined a priori. This is quite convenient since Elasticsearch automatically adapts to the data being fed to it - moreover, if certain entries have extra fields, Elasticsearch schema-less nature allows them to be indexed without any issues.
So new fields added by you will get automatically added to your mappings.
See this for more info

Issues when replicating from couchbase bucket to elasticsearch index?

This issue seems to be related to using the XDCR in couchbase. If I had the following simple objects
1: { "name" : "Mark", "age" : 30}
2: { "name" : "Bill", "age" : "forty"}
and set up an elasticsearch index as such
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"dynamic_templates": [
{
"store_generic": {
"match": "*",
"mapping": {
"store": "yes"
}
}
}
]
}
}'
I can then add the two objects to this index using the REST API
curl -XPUT localhost:9200/test/couchbaseDocument/1 -d '{
"name" : "Mark",
"age" : 30
}'
curl -XPUT localhost:9200/test/couchbaseDocument/2 -d '{
"name" : "Bill",
"age" : "forty"
}'
They are now both searchable (despite the fact the "age" is long for one and string for the other.
If, however, I stored these two objects in a couchbase bucket (rather than straight to elasticsearch) and set up the XDCR the first object replicates fine but the second fails with the following error
failed to execute bulk item (index) index {[test][couchbaseDocument][2], source[{"doc":{"name":"Bill","age":"forty"},"meta":{"id":"2","rev":"8-00000b9360d0a0bf0000000000000000","expiration":0,"flags":0}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [doc.age]
I can't figure out why it works via the REST API but not when couchbase replicates the same objects.
I followed the answer and used the following mapping to get things to work via XDCR
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"properties" : {
"doc": {
"properties" : {
"name" : {"type" : "string", "store" : "yes"},
"age" : {"type" : "string", "store" : "yes"}
}
}
}
}
}'
Now all the objects (despite having different types for the same fields) are replicated and searchable. I don't think there was any need to include the dynamic_templates approach I initially tried. The mapping works.
It's something you have to solve on elasticsearch side.
If the same field name can contain both numeric values and string values, you should create a mapping first which says that age is a String.
So elasticsearch won't try to auto guess type for this field.
Hope this helps

Resources