I'm trying to use the elasticsearch filter in Logstash to do some data enrichment.
I have two indexes, and my goal is to get some data from one of them and add it to the other.
I configured a Logstash filter that searches my Elasticsearch, and if there is a match the result goes to the output index.
But my filter is not working properly, because when I test it I get this error:
[WARN ] 2020-10-02 19:23:09.536 [[main]>worker2] elasticsearch - Failed to query elasticsearch for previous event {:index=>"logstash-*", :error=>"Unexpected character ('%' (code 37)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n
I think there is some issue between the variable in the template and Elasticsearch.
My Logstash is 7.3.2 and my ES is 7.4.2.
Here are my settings:
Logstash.conf
input {
  http { }
}
filter {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "logstash-*"
    query_template => "search-by-ip.json"
    fields => {
      "id" => "[suscriberid]"
    }
  }
}
output {
  stdout { codec => rubydebug }
}
-----------------
search-by-ip.json
{
"size": 1,
"query": { "match":{"IP": %{[ip]} } }
}
-------------------
testcase.sh
curl -XPOST "localhost:8080" -H "Content-Type: application/json" -d '{
"size": 1,
"query": { "match":{"ip": "192.168.1.4" }}
}'
Thanks!
If you ever process an event that does not have an [ip] field then the sprintf reference will not be substituted and you will get that error.
Note that ip and IP are different fields. I'm also not sure whether the %{[ip]} reference requires double quotes around it in the JSON template.
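For what it's worth, here is a minimal sketch of how the template and filter could be adjusted, assuming the event field really is [ip] and the indexed field is IP (both names taken from the question): quote the sprintf reference so the substituted value stays valid JSON, and only run the lookup when the event actually has an [ip] field.
search-by-ip.json
{
  "size": 1,
  "query": { "match": { "IP": "%{[ip]}" } }
}
-----------------
filter {
  if [ip] {    # skip events that have no [ip] field, so the template is always substituted
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "logstash-*"
      query_template => "search-by-ip.json"
      fields => {
        "id" => "[suscriberid]"
      }
    }
  }
}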
Related
I have some logs with the following format (I changed the IPs from public to private, but you get the idea):
192.168.0.1 [20/Nov/2019:16:09:28 +0000] GET /some_path HTTP/1.1 200 2 2
192.168.0.2 [20/Nov/2019:16:09:28 +0000] GET /some_path HTTP/1.1 200 2 2
I then grok these logs using the following pattern:
grok { match => { "message" => "%{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] %{WORD:method} %{URIPATHPARAM:request} %{DATA:httpversion} %{NUMBER:response} %{NUMBER:duration}" } }
geoip { source => "clientip" }
In my output section, I have the following code:
else if "host.name" in [host][name]{ #if statement with the hostname
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "mms18-%{+YYYY.MM.dd}"
user => "admin-user"
password => "admin-password"
}
}
The problem is that when I go to Kibana, geoip.location is mapped as an object, and I cannot use it on a map dashboard.
Since the index's name changes daily, I cannot manually put the correct geoip mapping, since I would have to do it every day.
One solution I thought of that partially solves the problem is removing the date from the index name in the Logstash output, so it has a constant index of "mms18", and then using this in the Kibana management console:
PUT mms18
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
However, this is not ideal since I want to have the option of showing all the indexes with their respective dates, and then choosing what to delete and what not.
Is there any way that I can achieve the correct mapping while also preserving the indexes with their dates?
Any help would be appreciated.
Use an index template (with a value for index_patterns like "mms18-*") that maps geoip as a geo_point.
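For example, a rough sketch of such a template using the legacy _template API on ES 7.x (the template name mms18_geoip is just a placeholder):
PUT _template/mms18_geoip
{
  "index_patterns": ["mms18-*"],
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
Every new daily index matching mms18-* will then get geoip.location mapped as geo_point automatically; indexes created before the template keep their old mapping until they are reindexed.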
How do I avoid duplicate documents in Elasticsearch?
The Elasticsearch index docs count (20,010,253) doesn't match the log line count (13,411,790).
documentation:
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
NiFi:
A real-time NiFi pipeline copies logs from the NiFi server to the ELK server.
NiFi has rolling log files.
logs line count on elk server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  }
  else {
    stdout {
      codec => rubydebug
    }
  }
}
You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
This can e.g. be used to create consistent document ids when inserting
events into Elasticsearch, allowing events in Logstash to cause
existing documents to be updated rather than new documents to be
created.
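As a rough sketch (the fingerprint source and key below are only placeholders, adjust them to your data):
filter {
  fingerprint {
    source => "message"                      # hash the raw log line
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "my-fingerprint-key"              # placeholder HMAC key
  }
}
output {
  elasticsearch {
    hosts => "ip:9200"
    index => "test_4"
    document_id => "%{[@metadata][fingerprint]}"   # same line re-read gives the same _id, so it updates instead of duplicating
  }
}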
Here is the search I'm executing to query my Elasticsearch database (it works fine):
curl -XPOST 'localhost:9200/test/_search?pretty' -d '
{
  "size": 1,
  "query": {
    "match": {
      "log.device": "xxxx"
    }
  },
  "sort": [
    {
      "_timestamp": {
        "order": "desc"
      }
    }
  ]
}'
I want to do the same thing through Logstash with the elasticsearch filter plugin. However, there is no "size" option listed in the documentation at https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html
elasticsearch {
hosts => ["localhost:9200/test"]
query => "log.device:%{[log][device]}"
sort => "#timestamp:desc"
}
Do you know how to handle this problem?
Thank you for your attention and for your help.
Joe
Since the size is hardcoded to 1 in that plugin, you don't have to add any size parameter.
Also make sure to sort on _timestamp not #timestamp.
Finally, the hosts parameter doesn't take any index.
So:
elasticsearch {
hosts => ["localhost:9200"]
query => "log.device:%{[log][device]}"
sort => "_timestamp:desc"
}
If you really need to specify an index, this is not supported yet, but I created a PR last week in order to support this. So until this gets merged and released, you can use my version instead of the official one:
$> git clone http://github.com/consulthys/logstash-filter-elasticsearch
$> cd logstash-filter-elasticsearch
$> gem build logstash-filter-elasticsearch.gemspec
$> $LS_HOME/bin/plugin -install logstash-filter-elasticsearch-2.0.4.gem
After installing the amended plugin, you'll be able to work on a specific index:
elasticsearch {
hosts => ["localhost:9200"]
index => "test"
query => "log.device:%{[log][device]}"
sort => "_timestamp:desc"
}
I'm sending queries to Elasticsearch and it responds with the fields of its documents in an unpredictable order.
How can I fix the order in which Elasticsearch returns the fields inside documents?
I mean, I'm sending this query:
{
  "index": "my_index",
  "_source": {
    "includes": ["field1","field2","field3","field14"]
  },
  "size": X,
  "body": {
    "query": {
      // stuff
    }
  }
}
and when it responds, it does not give me the fields in the order I asked for.
I ultimately want to convert this to CSV, and I want to fix the CSV headers.
Is there something I can do so that I get something like
doc1 :{"field1","field2","field3","field14"}
doc2 :{"field1","field2","field3","field14"}
...
in the same order as my "_source"?
Thanks for your help.
A document in Elasticsearch is a JSON hash/map and by definition maps are unordered.
One solution around this would be to use Logstash in order to extract docs from ES using an elasticsearch input and output them in CSV using a csv output. That way you can guarantee that the fields in the CSV file will have the exact same order as specified. Another benefit is that you don't have to write your own boilerplate code to extract from ES and sink to CSV, Logstash does it all for you for free.
The Logstash configuration would look something like this:
input {
  elasticsearch {
    hosts => "localhost"
    query => '{ "query": { "match_all": {} } }'
    size => 100
    index => "my_index"
  }
}
filter {}
output {
  csv {
    fields => ["field1","field2","field3","field14"]
    path => "/path/to/file.csv"
  }
}
ElasticSearch Index Creation
curl -XPOST 'http://localhost:9200/music/' -d '{}'
Field Mapping
curl -XPUT 'http://localhost:9200/music/_mapping/song' -d '
{
  "properties": {
    "name": {
      "type": "string"
    },
    "suggest": {
      "type": "completion"
    }
  }
}'
LogStash config file, musicStash.config
input {
  file {
    path => "pathToCsv"
    start_position => beginning
  }
}
filter {
  csv {
    columns => ["id", "name", "suggest"]
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "music"
    document_id => "%{id}"
  }
}
Now, while executing the Logstash config file, I received the following exception in the Elasticsearch console:
failed to put mappings on indices [[music]], type [logs]
java.lang.IllegalArgumentException: Mapper for [suggest] conflicts with existing mapping in other types:
[mapper [suggest] cannot be changed from type [completion] to [string]]
at org.elasticsearch.index.mapper.FieldTypeLookup.checkCompatibility(FieldTypeLookup.java:117)
And this is the error received in the Logstash console:
response=>{"index"=>{"_index"=>"music", "_type"=>"logs", "_id"=>"5", "status"=>400,
"error"=>{"type"=>"illegal_argument_exception",
"reason"=>"Mapper for [suggest] conflicts with existing mapping in other types:\n[mapper [suggest] cannot be changed from type [completion] to [string]]"}}}, :level=>:warn}
So how can I achieve the Elasticsearch auto-complete feature by importing a CSV file through Logstash?
You're missing the following setting in your elasticsearch output:
document_type => "song"
What happens is that Logstash creates a new type called logs (by default), and since ES 2.0 it is forbidden to have two fields with the same name but different types (string vs completion) in the same index, so the indexing request errors out.
Just modify your output like this and it will work:
output {
  elasticsearch {
    hosts => "localhost"
    index => "music"
    document_type => "song"
    document_id => "%{id}"
  }
}
I'm the author of elasticsearch_loader.
If you just want to load CSV data into Elasticsearch, you can make use of elasticsearch_loader.
After installation, you will be able to load csv/json/parquet files into Elasticsearch by issuing:
elasticsearch_loader \
--index-settings-file mappings.json \
--index completion \
--type song \
--id-field id \
csv \
input1.csv input2.csv
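For reference, a rough sketch of what mappings.json could contain for this use case, assuming the index-settings file is used as the index-creation body (the mapping itself is copied from the question):
{
  "mappings": {
    "song": {
      "properties": {
        "name": { "type": "string" },
        "suggest": { "type": "completion" }
      }
    }
  }
}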