'tags' in logstash modifying 'tags' column from sql query - elasticsearch

I have a Logstash setup that fetches data from Postgres. The problem is, I have a column called "tags" in SQL which is an array of strings, but during insertion of the data, Logstash appends its own tag into the tags array from my column. Is there any way to override that?
jdbc {
    tags => "color_list"
    jdbc_connection_string => "jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
    jdbc_user => "${DB_USER}"
    jdbc_password => "${DB_PASSWORD}"
    schedule => "* * * * *"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT tags from color_table;"
}
In my table, the tags column is empty, so I am expecting an empty array, but I am receiving [color_list] instead of []. Is there any way to override that?

Logstash manages "tags" on each event. It is a kind of metadata that you can manipulate in your pipeline: all filters have add_tag and remove_tag options, for example. Some filters will automatically add tags on failure (for example, _grokparsefailure if the grok pattern doesn't match the event's contents).
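For instance, a mutate filter can add or remove tags explicitly (a minimal sketch; the from_postgres tag is made up for illustration, while color_list is the tag set on the jdbc input above):
filter {
    mutate {
        add_tag => ["from_postgres"]
        remove_tag => ["color_list"]
    }
}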
I would advise not to try to use the tags field for anything else.
I suggest you rename your field from the DB instead. How about this?
statement => "SELECT tags as db_tags
from color_table;"
Then in your document you can process the [db_tags] field as you expect, and leave the [tags] field to Logstash.
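Putting the pieces together, the jdbc input from the question would look roughly like this (a sketch; only the statement changes, the connection settings stay as above):
jdbc {
    tags => "color_list"
    jdbc_connection_string => "jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
    jdbc_user => "${DB_USER}"
    jdbc_password => "${DB_PASSWORD}"
    schedule => "* * * * *"
    jdbc_driver_class => "org.postgresql.Driver"
    # Alias the DB column so it no longer collides with Logstash's own tags field
    statement => "SELECT tags as db_tags from color_table;"
}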

Related

logstash elasticsearch input plugin query does not execute on schedule

I want to import the latest data from an index in one Elasticsearch cluster into another index, so I have made the following input plugin settings:
elasticsearch {
    hosts => ["xxxxx"]
    user => "xxxx"
    password => "xxxxx"
    index => "xxxxxx"
    query => '{"sort":[{"timestamp":{"order":"desc"}}]}'
    schedule => "*/1 * * * *"
    size => 10000
    scroll => "1m"
    docinfo => true
}
The data was imported into the target index successfully, but the timestamp was not the latest: for the documents in the target index, the timestamp is the time I started Logstash. The query does not execute on the schedule.
I want to know whether the elasticsearch input plugin imports all of the data found by the query before it starts another scheduled run.

Elasticsearch: maintaining a unique _id across the indices behind an alias

We have ES data where several indices belong to the same alias. One of them is the write index.
How can we keep the _id of documents unique across the indices that belong to the same alias?
Right now we have duplicated _ids on our alias: each index has one record with the same id. We only want the latest record for that _id in our data; the newer one should overwrite the older.
If I understand the problem correctly, you can get uniqueness of data by using a fingerprint value as the _id via Logstash (assuming it is being used).
You can have something like the below in your Logstash filter:
fingerprint {
    source => ["session_id"]
    method => "SHA1"
}
The value in the fingerprint field can then be used as the document id, so the data is put into the index and updated on top of an already existing document.
Below is an example of the output section in Logstash:
elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "indexname"
    action => "update"
    document_id => "%{fingerprint}"
    doc_as_upsert => true
}

update multiple records in elastic using logstash

Hi guys, I have an issue with updating multiple records in Elasticsearch using Logstash.
My Logstash configuration is below:
output {
    elasticsearch {
        hosts => "******"
        user => "xxxxx"
        password => "yyyyyy"
        index => "index_name"
        document_type => "doc_type"
        action => "update"
        script_lang => "painless"
        script_type => "inline"
        document_id => "%{Id}"
        script => 'ctx._source.Tags = params.event.get("Tags");'
    }
}
My output to logstash dump folder looks like:
{"index_name":"feed_name","doc_type":"doc_type","Id":["b504d808-f82d-4eaa-b192-446ec0ba487f", "1bcbc54f-fa7a-4079-90e7-71da527f56a5"],"es_action":"update","Tags": ["tag1","tag2"]}
My biggest issue here is that I am not able to update those two records at once; instead I would need to create two records, each with a different ID.
Is there a way to solve this by writing a query in my output configuration?
In SQL that would look something like this:
Update Table
SET Tags
WHERE ID in (guid1, guid2)
I know that in this case I could emit two records from Logstash and the problem would be solved, but I also need to solve a second issue where I have to take all records that have tag1 and give them newTag.
Have you considered using the split filter to clone the event into one event per id? It seems that filter can help you.
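For example, a split filter on the Id array turns the sample document above into one event per id, and each of those events can then drive the existing document_id => "%{Id}" update (a minimal sketch; field names follow the sample document):
filter {
    # An event with Id => ["guid1", "guid2"] becomes two events,
    # each carrying a single Id value plus the shared Tags array.
    split {
        field => "Id"
    }
}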

Why does a parse failure occur in Logstash when the field type is the same as before, with no change?

The logstash log file says:
"tags"=>["_grokparsefailure"]}, "status_code"]}>, #data={"#version"=>"1", "#timestamp"=>"2016-09-24T08:00:54.894Z", "path"=>"/var/log/nginx/access.log", "host"=>"sample-com", "remote_addr"=>"127.0.0.1", "remote_user"=>"-", "date"=>"05/Sep/2016:10:03:01 +0000", "method"=>"GET", "uri_path"=>"/accounts", "version"=>"HTTP/1.1", "status_code"=>"200", "body_byte_sent"=>419, "referer"=>"-", "user_agent"=>"python-requests/2.4.3 CPython/2.7.9 Linux/3.16.0-4-amd64", "request_time"=>6.161, "auth_type"=>"Bearer", "client_id"=>"beta",
"web_client_ip"=>"172.*.131.177", "response_json"=>{"_links"=>{"applications"=>{"href"=>"/applications"}, "menus"=>{"href"=>"/menus"}, "messages"=>{"href"=>"/messages"}, "numbers"=>{"href"=>"/numbers"}, "self"=>{"href"=>"/accounts"}}, "account_status"=>"active", "creation_date"=>"2016-06-07 09:25:18", "credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>, "currency"=>"usd"}, "email"=>"*#gmail.com",
"id"=>"677756yt7557", "lastname"=>"Qurbani", "name"=>"M", "notifications"=>{"black_list"=>{"uids"=>[]}, "settings"=>{"email"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "language"=>"en", "push_notif"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "sms"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}}}, "phone"=>"+9****", "status"=>"inactive", "verification_status"=>{"email"=>"unverified", "phone"=>"verified"}}, "request_json"=>{}, "tags"=>["_grokparsefailure"]}, #metadata_accessors=#<LogStash::Util::Accessors:0x6ec6acbe #store={"path"=>"/var/log/nginx/access.log"}, #lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log"}, "path"]}>,
#cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.24", "_type"=>"logs", "_id"=>"AVdbNisZCijYhuqEamFy", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [response_json.credit]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [balance]"}}}}, :level=>:warn}
Here I have a log like below in credit section:
"credit": {"balance": 0.0, "currency": "usd"}
I have removed all the indices from Elasticsearch, and I didn't find any .sincedb* files in home or elsewhere to remove the Logstash sincedb.
Why does this error happen when I don't actually have a change in the balance value? What is the reason for it?
After restarting Logstash, it does not aggregate data from the log files!
I removed all since_dbs_* files from /var/lib/logstash/ and told the file input in the Logstash configuration to start tailing from the beginning position.
Now the below error is raised:
object mapping for [response_json.credit] tried to parse field [credit] as object, but found a concrete value
It seems that sometimes credit is sent as a scalar value and sometimes as an object with two fields!
EDIT 1:
Two different credit fields with different data have been posted into the single credit field in Elasticsearch. So I tried to rename these fields and remove credit from both configs in Logstash; for now I have:
add_field => {"first_credit" => "%{[response_json.credit]}"}
remove_field => ["response_json.credit"]
The new field gets added, but its value is literally %{[response_json.credit]}, and the field is not removed, so the error happens again. I want to take the value of credit, put it inside first_credit, and remove credit itself. I even tried the below:
add_field => {"first_credit" => "%{[response_json][credit]}"}
remove_field => ["response_json.credit"]
What am I doing wrong?
EDIT 2:
I have noticed that one file, access.log, has credit fields with two different kinds of values.
One credit is numeric: 2.99
The other credit is a JSON object: {"currency": "usd", "balance": 2.99}
I used the Logstash configuration below to solve the problem and save them all as strings in ES:
if ([response_json][credit]) {
    mutate {
        add_field => {"new_credit" => "%{[response_json][credit]}"}
        remove_field => [ "[response_json][credit]" ]
    }
}
It gives the below error:
"new_credit"=>"{\"balance\":3.102,\"currency\":\"usd\"}", "tags"=>["_grokparsefailure"]},
#metadata_accessors=#<LogStash::Util::Accessors:0x46761362 #store={"path"=>"/var/log/nginx/access.log.1"},
#lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log.1"}, "path"]}>,
#cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.27", "_type"=>"logs", "_id"=>"AVdqrION3CJVjhZgZcnl", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [new_credit]", "caused_by"=>{"type"=>"number_format_exception", "reason"=>"For input string: \"{\"balance\":3.102,\"currency\":\"usd\"}\""}}}}, :level=>:warn
From looking at your log, "credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>, I think this issue may be related.
If you check the Elasticsearch mapping of your index at {elasticsearch:ip}:9200/logstash-api-2016.09.24/_mapping, I bet that the balance field has an Integer mapping. If there was initially an integer mapping, any value that is not an integer (for example, an object) will fail.
You can resolve this by creating an index template that specifies balance as a float. If you choose to do this, ensure that you delete the old index or create a new one, as existing mappings cannot be modified.
You could also ensure that balance is always the same data type in the source of the logs.
Or you could add a mutate filter and convert the balance field to your desired data type.
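For that last option, a mutate convert along these lines could work (a minimal sketch, assuming the nested field path shown in your log; it only runs when balance exists):
filter {
    # Normalise the nested balance to a float so the mapping stays numeric
    # (field path taken from the log excerpt above).
    if [response_json][credit][balance] {
        mutate {
            convert => { "[response_json][credit][balance]" => "float" }
        }
    }
}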
Check out your mapping and let me know if my theory is right. :)
EDIT:
The code block you just sent will have exactly the same problem as before - the object credit and the plain numeric credit will be stored in the same field. The following will store credit[balance] (a number) and the plain numeric credit in the same field called new_credit, which should be mapped to a single numeric type.
if ([response_json][credit][balance]) {
    mutate {
        add_field => {"new_credit" => "%{[response_json][credit][balance]}"}
        remove_field => [ "[response_json][credit]" ]
    }
}
else {
    mutate {
        add_field => {"new_credit" => "%{[response_json][credit]}"}
        remove_field => [ "[response_json][credit]" ]
    }
}

Logstash Filter for a custom message

I am trying to parse a bunch of strings in Logstash, and the output is set to Elasticsearch.
Sample input string is: 2016 May 24 10:20:15 User1 CREATE "Create a new folder"
The grok filter is:
match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{WORD:user} %{WORD:action_performed} %{WORD:action_description} "}
In Elasticsearch, I am not able to see separate columns for the different fields such as timestamp, user, action_performed, etc.
Instead, the whole string is under a single column, "message".
I would like to store the information in separate fields instead of just a single column.
I am not sure what to change in the Logstash filter to achieve the desired result.
Thanks!
You need to change your grok pattern to the one below, i.e. use QUOTEDSTRING instead of WORD, and it will work!
match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{WORD:user} %{WORD:action_performed} %{QUOTEDSTRING:action_description}"}
