How to delete all documents in Elasticsearch with Logstash from a search

I am using Logstash to pass data to Elasticsearch and I would like to know how to delete all documents.
I already delete the documents that arrive with an id, but what I need now is to delete all documents that match a fixed value, for example fixedField = "Base1", regardless of whether the id obtained in the jdbc input exists or not.
The idea is to delete every document in Elasticsearch where fixedField = "Base1" and then insert the new documents I get from the jdbc input; this way I avoid leaving behind documents that no longer exist in my source (the jdbc input).
A more complete example:
My document_id values are 001, 002, 003, etc.
My fixed field is "Base1" for all three document_ids.
Any ideas?
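(For reference, the operation described above corresponds to an Elasticsearch _delete_by_query call with a term query on the fixed field. A minimal sketch of the request, assuming the field is indexed with a .keyword sub-field and using my_index as a placeholder index name:)
POST /my_index/_delete_by_query
{
  "query": { "term": { "fixedField.keyword": "Base1" } }
}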
input {
  jdbc {
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://xxxxx;databaseName=xxxx;"
    statement => "Select * from public.test"
  }
}
filter {
  if [is_deleted] {
    mutate {
      add_field => {
        "[@metadata][elasticsearch_action]" => "delete"
      }
    }
    mutate {
      remove_field => [ "is_deleted", "@version", "@timestamp" ]
    }
  } else {
    mutate {
      add_field => {
        "[@metadata][elasticsearch_action]" => "index"
      }
    }
    mutate {
      remove_field => [ "is_deleted", "@version", "@timestamp" ]
    }
  }
}
output {
  elasticsearch {
    hosts => "xxxxx"
    user => "xxxxx"
    password => "xxxxx"
    index => "xxxxx"
    document_type => "_doc"
    document_id => "%{id}"
  }
  stdout { codec => rubydebug }
}
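Note that the filter above sets [@metadata][elasticsearch_action] but the output never references it; for the delete/index switch to take effect, the elasticsearch output would also need an action option, as in the sql_last_value example further down this page. A minimal sketch, keeping the placeholders from the question:
output {
  elasticsearch {
    hosts => "xxxxx"
    user => "xxxxx"
    password => "xxxxx"
    index => "xxxxx"
    action => "%{[@metadata][elasticsearch_action]}"
    document_id => "%{id}"
  }
}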

I finally managed to delete, but... the problem I have now is that when the input starts it counts the number of records it fetches, and when they reach the output the deletion succeeds on the first pass; on the following n-1 passes this error message is displayed:
[HTTP Output Failure] Encountered non-2xx HTTP code 409
{:response_code=>409,
:url=>"http://localhost:9200/my_index/_delete_by_query",
The other thing that I think may be happening is that _delete_by_query is not a single bulk deletion but a query-then-delete, so the query returns n results and the delete is therefore attempted n times.
Any ideas how I could run it only once, or how to avoid that error?
To clarify, the error is not displayed just once; it is displayed n-1 times, matching the number of documents to be deleted.
input {
  jdbc {
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://xxxxx;databaseName=xxxx;"
    statement => "Select * from public.test"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "%{[@metadata][miEntidad]}"
    document_type => "%{[@metadata][miDocumento]}"
    document_id => "%{id}"
  }
  http {
    url => "http://localhost:9200/my_index/_delete_by_query"
    http_method => "post"
    format => "message"
    content_type => "application/json; charset=UTF-8"
    message => '{"query": { "term": { "properties.codigo.keyword": "TEX_FOR_SEARCH_AND_DELETE" } }}'
  }
}

Finally it worked like this (the key change is the conflicts=proceed parameter, which tells _delete_by_query to skip version conflicts instead of aborting with a 409):
output {
  http {
    url => "http://localhost:9200/%{[@metadata][miEntidad]}/_delete_by_query?conflicts=proceed"
    http_method => "post"
    format => "message"
    content_type => "application/json; charset=UTF-8"
    message => '{"query": { "term": { "properties.code.keyword": "%{[properties][code]}" } }}'
  }
  jdbc {
    connection_string => 'xxxxxxxx'
    statement => ["UPDATE test SET estate = 'A' WHERE entidad = ? ","%{[@metadata][miEntidad]}"]
  }
}

Related

Elasticsearch array of objects using Logstash

I have a MySQL database working as the primary database and I'm ingesting data into Elasticsearch from MySQL using Logstash. I have successfully indexed the users table into Elasticsearch and it is working perfectly fine; however, my users table has fields interest_id and interest_name which contain the ids and names of user interests as follows:
"interest_id" : "1,2",
"interest_name" : "Business,Farming"
What I'm trying to achieve:
I want to make an interests object that contains an array of interest ids and interest names, like so:
"interests" : [
  { "interest_id" : "1", "interest_name" : "Business" },
  { "interest_id" : "2", "interest_name" : "Farming" }
]
Please let me know if it's possible and also what the best approach is to achieve this.
My conf:
input {
  jdbc {
    jdbc_driver_library => "/home/logstash-7.16.3/logstash-core/lib/jars/mysql-connector-java-8.0.22.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/"
    jdbc_user => "XXXXX"
    jdbc_password => "XXXXXXX"
    sql_log_level => "debug"
    clean_run => true
    record_last_run => false
    statement_filepath => "/home/logstash-7.16.3/config/queries/query.sql"
  }
}
filter {
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}
output {
  elasticsearch {
    hosts => ["https://XXXXXXXXXXXX:443"]
    index => "users"
    action => "index"
    user => "XXXX"
    password => "XXXXXX"
    template_name => "myindex"
    template => "/home/logstash-7.16.3/config/my_mapping.json"
    template_overwrite => true
  }
}
I have tried doing this by creating a nested field interests in my mapping and then adding a mutate filter in my conf file like this:
mutate {
  rename => {
    "interest_id" => "[interests][interest_id]"
    "interest_name" => "[interests][interest_name]"
  }
}
With this I'm only able to get this output:
"interests" : {
"interest_id" : "1,2",
"interest_name" : "Business,Farming"
}
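No answer is recorded here, but one common way to get that shape is a ruby filter that splits both strings and zips them into an array of objects. A sketch, assuming interest_id and interest_name always hold the same number of comma-separated values and that interests is mapped as nested as the question describes:
filter {
  ruby {
    code => "
      ids   = event.get('interest_id').to_s.split(',')
      names = event.get('interest_name').to_s.split(',')
      # pair them up into [{ 'interest_id' => '1', 'interest_name' => 'Business' }, ...]
      event.set('interests', ids.zip(names).map { |id, name| { 'interest_id' => id, 'interest_name' => name } })
    "
    remove_field => [ "interest_id", "interest_name" ]
  }
}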

Need clarification on sql_last_value used in Logstash configuration

Hi all, I am using the code below for indexing data from MS SQL Server to Elasticsearch, but I am not clear about this sql_last_value.
input {
  jdbc {
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://xxxx:1433;databaseName=xxxx;"
    jdbc_user => "xxxx"
    jdbc_paging_enabled => true
    tracking_column => "modified_date"
    tracking_column_type => "timestamp"
    use_column_value => true
    jdbc_password => "xxxx"
    clean_run => true
    schedule => "*/1 * * * *"
    statement => "Select * from [dbo].[xxxx] where modified_date > :sql_last_value"
  }
}
filter {
  if [is_deleted] {
    mutate {
      add_field => {
        "[@metadata][elasticsearch_action]" => "delete"
      }
    }
    mutate {
      remove_field => [ "is_deleted", "@version", "@timestamp" ]
    }
  } else {
    mutate {
      add_field => {
        "[@metadata][elasticsearch_action]" => "index"
      }
    }
    mutate {
      remove_field => [ "is_deleted", "@version", "@timestamp" ]
    }
  }
}
output {
  elasticsearch {
    hosts => "xxxx"
    user => "xxxx"
    password => "xxxx"
    index => "xxxx"
    action => "%{[@metadata][elasticsearch_action]}"
    document_type => "_doc"
    document_id => "%{id}"
  }
  stdout { codec => rubydebug }
}
Where is this sql_last_value stored, and how can I view it?
Is it possible to set a custom value for sql_last_value?
Could anyone please clarify the above questions?
The sql_last_value is stored in a file called .logstash_jdbc_last_run and, according to the docs, it lives in $HOME/.logstash_jdbc_last_run. The file contains the timestamp of the last run, and it can be set to a specific value.
You should define the last_run_metadata_path parameter for each single jdbc input and point it to a more specific location, as all running jdbc input instances will share the same .logstash_jdbc_last_run file by default, which can lead to unwanted results.
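A minimal sketch of what that looks like (the path is hypothetical; any location writable by Logstash works):
input {
  jdbc {
    # ... connection settings as above ...
    use_column_value => true
    tracking_column => "modified_date"
    tracking_column_type => "timestamp"
    # give this input its own state file instead of the shared default
    last_run_metadata_path => "/usr/share/logstash/.users_jdbc_last_run"
  }
}
The file contains a single YAML-serialized value (the timestamp or tracking-column value of the last run), so setting a custom starting point is a matter of editing that file while Logstash is stopped.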

Logstash if type condition not working

In my Logstash I have one pipeline that runs 2 SQL queries to download data. Below is the conf file for the pipeline:
input {
  jdbc {
    jdbc_driver_library => "/opt/logstash/lib/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    #Hidden db connection details
    statement_filepath => "/etc/logstash/queries/transactions_all.sql"
    type => "transactions"
  }
  jdbc {
    jdbc_driver_library => "/opt/logstash/lib/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    #Hidden db connection details
    statement_filepath => "/etc/logstash/queries/snow_db_stats_ts_all.sql"
    type => "db_stats_ts"
  }
}
output {
  if [type] == "transactions" {
    elasticsearch {
      index => "servicenow_oraaspoc-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
  if [type] == "db_stats_ts" {
    elasticsearch {
      index => "snow_db_stats_ts-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
  stdout {
    codec => rubydebug
  }
}
I can see in the console that everything works fine, except that the index with type transactions is never saved to Elasticsearch. The condition if [type] == "transactions" { is never true, while the second condition works without any problems. I tried running the pipeline with just the transactions index, without the if condition, and it worked fine. For some reason this one if condition is not working, but why?
I have found one ridiculous workaround, but it won't work if I encounter another index with this problem:
if [type] == "db_stats_ts" { .. } else {
elasticsearch {
index => "servicenow_oraaspoc-%{+YYYY.MM.dd}"
hosts => ["localhost:9200"]
}
}
Like Thomas Decaux mentioned, you can use tags instead of type; by the way, you can use as many tags as you want for each jdbc block. Remove the types from the config file and use tags like in the example below:
tags => ["transactions"]
tags => ["db_stats_ts"]
Your output block should then look like this:
output {
  if "transactions" in [tags] {
    elasticsearch {
      index => "servicenow_oraaspoc-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
  if "db_stats_ts" in [tags] {
    elasticsearch {
      index => "snow_db_stats_ts-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
}
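A possible explanation for the original problem (not confirmed in this thread): the type setting on a Logstash input only applies when the event does not already have a type field, so if the transactions query returns a column named type, the value set by the input is silently ignored and the condition never matches. Besides tags, another way to sidestep such clashes is to route on a [@metadata] field, which is not included in the documents sent to Elasticsearch; a sketch with a hypothetical metadata key:
input {
  jdbc {
    # ... connection details ...
    statement_filepath => "/etc/logstash/queries/transactions_all.sql"
    add_field => { "[@metadata][source]" => "transactions" }
  }
  jdbc {
    # ... connection details ...
    statement_filepath => "/etc/logstash/queries/snow_db_stats_ts_all.sql"
    add_field => { "[@metadata][source]" => "db_stats_ts" }
  }
}
output {
  if [@metadata][source] == "transactions" {
    elasticsearch {
      index => "servicenow_oraaspoc-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  } else if [@metadata][source] == "db_stats_ts" {
    elasticsearch {
      index => "snow_db_stats_ts-%{+YYYY.MM.dd}"
      hosts => ["localhost:9200"]
    }
  }
}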

Logstash error when converting MySQL value to nested elasticsearch property on suggestion field

A huge cry for help here. When I try to convert a MySQL value to a nested Elasticsearch field using Logstash, I get the following error:
{"exception"=>"expecting List or Map, found class org.logstash.bivalues.StringBiValue", "backtrace"=>["org.logstash.Accessors.newCollectionException(Accessors.java:195)"
Using the following config file:
input {
  jdbc {
    jdbc_driver_library => "/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/data"
    jdbc_user => "username"
    jdbc_password => "password"
    statement => "SELECT id, suggestions, address_count FROM `suggestions` WHERE id <= 100"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
filter {
  mutate {
    rename => { 'address_count' => '[suggestions][payload][count]' }
  }
}
output {
  elasticsearch {
    hosts => [
      "localhost:9200"
    ]
    index => "dev_suggestions"
    document_type => "address"
  }
}
However, if I rename address_count to a field that is not already in my mapping, then it works just fine and correctly adds the value as a nested property. I have tried other fields in my index, not just suggestions.payloads.address_count, and I get the same issue; it only works if the field has not been defined in the mapping.
This has caused me some serious headaches, and if anyone could help me overcome this issue I would greatly appreciate it, as I've spent the last 48 hours banging my head on the table!
I initially assumed I could do the following with a MySQL query:
SELECT id, suggestion, '[suggestions][payload][count]' FROM `suggestions` WHERE id <= 100
Then I also tried:
SELECT id, suggestion, 'suggestions.payload.count' FROM `suggestions` WHERE id <= 100
Both failed to insert the value, with the latter option giving an error that a field name cannot contain dots.
And finally the mapping:
{
  "mappings": {
    "address": {
      "properties": {
        "suggestions": {
          "type": "completion",
          "payloads" : true
        }
      }
    }
  }
}
Thanks to Val. For future users in the same situation as me who need to convert MySQL data into nested Elasticsearch objects using Logstash, here is a working solution using Logstash 5 and Elasticsearch 2.*:
input {
  jdbc {
    jdbc_driver_library => "/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/data"
    jdbc_user => "username"
    jdbc_password => "password"
    statement => "SELECT addrid, suggestion, address_count FROM `suggestions` WHERE id <= 20"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
filter {
  ruby {
    code => "
      event.set('[suggestions][input]', event.get('suggestion'))
      event.set('[suggestions][payload][address_count]', event.get('address_count'))
      event.set('[v][payload][id]', event.get('addrid'))
    "
    remove_field => [ 'suggestion', 'address_count', 'addrid' ]
  }
}
output {
  elasticsearch {
    hosts => [
      "localhost:9200"
    ]
    index => "dev_suggestions"
    document_type => "address"
  }
}
I think you need to proceed differently. First, I would rename the suggestions field in your SQL query to something else and then build the suggestions object from the values you get from your SQL query.
statement => "SELECT id, suggestion, address_count FROM `suggestions` WHERE id <= 100"
Then you could use a ruby filter (and remove your mutate one) in order to build your suggestions field, like this:
Logstash 2.x code:
ruby {
  code => "
    event['suggestions']['input'] = event['suggestion']
    event['suggestions']['payload']['count'] = event['address_count']
  "
  remove_field => [ 'suggestion', 'address_count' ]
}
Logstash 5.x code:
ruby {
  code => "
    event.set('[suggestions][input]', event.get('suggestion'))
    event.set('[suggestions][payload][count]', event.get('address_count'))
  "
  remove_field => [ 'suggestion', 'address_count' ]
}
PS: All this assumes you're using ES 2.x, since the payload field was removed in ES 5.x.

Logstash Duplicate Data

I have duplicate data in Logstash.
How can I remove this duplication?
My input is:
input {
  file {
    path => "/var/log/flask/access*"
    type => "flask_access"
    max_open_files => 409599
  }
  stdin {}
}
The filter for the files is:
filter {
  mutate { replace => { "type" => "flask_access" } }
  grok {
    match => { "message" => "%{FLASKACCESS}" }
  }
  mutate {
    add_field => {
      "temp" => "%{uniqueid} %{method}"
    }
  }
  if "Entering" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "map['blockedprocess'] = 2"
      map_action => "create"
    }
  }
  if "Entering" in [api_status] or "Leaving" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "map['blockedprocess'] -= 1"
      map_action => "update"
    }
  }
  if "End Task" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "event['blockedprocess'] = map['blockedprocess']"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
Take a look at the image: the same log data, at the same time, and I only sent one log request.
I solved it.
I created a unique id with document_id in the output section.
document_id points to my temp field, and temp is the unique id in my project.
My output changed to:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{temp}"
    # sniffing => true
    # manage_template => false
    # index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    # document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}
Executing tests in my local lab, I've just found out that Logstash is sensitive to the number of config files kept in the /etc/logstash/conf.d directory.
If there is more than one config file, we can see duplicates of the same record.
So try removing all backup configs from the /etc/logstash/conf.d directory and restart Logstash.
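If keeping several config files in conf.d is intentional, note that Logstash concatenates every file in that directory into a single pipeline, so every event passes through every output. On Logstash 6.0 or later, separate pipelines can instead be declared in pipelines.yml; a sketch with hypothetical file names:
- pipeline.id: flask_access
  path.config: "/etc/logstash/conf.d/flask.conf"
- pipeline.id: other_pipeline
  path.config: "/etc/logstash/conf.d/other.conf"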
