Elasticsearch is not creating an index received from the Logstash output

I have an Ubuntu 20.04 VM with Elasticsearch, Logstash and Kibana (all release 7.7.0). What I'm trying to do is (among other things) have Logstash receive Syslog and Netflow traffic from Cisco devices, forward it to Elasticsearch, and from there to Kibana for visualization.
I created a Logstash config file (cisco.conf) where input and output sections look like this:
input {
    udp {
        port => 5003
        type => "syslog"
    }
    udp {
        port => 2055
        codec => netflow {
            include_flowset_id => true
            enable_metric => true
            versions => [5, 9]
        }
    }
}
output {
    stdout { codec => rubydebug }
    if [type] == "syslog" {
        elasticsearch {
            hosts => ["localhost:9200"]
            manage_template => false
            index => "ciscosyslog-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "netflow" {
        elasticsearch {
            hosts => ["localhost:9200"]
            manage_template => false
            index => "cisconetflow-%{+YYYY.MM.dd}"
        }
    }
}
The problem is: the index ciscosyslog is created in Elasticsearch with no problem:
$ curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open ciscosyslog-2020.05.21 BRshOOnoQ5CsdVn3l0Z3kw 1 1 1438 0 338.4kb 338.4kb
green open .async-search dpd-HWYJSyW653u7BAhQVg 1 0 2 0 34.1kb 34.1kb
green open .kibana_1 xA5PIwKsTHCeOFyj9_NIQA 1 0 111 8 231.9kb 231.9kb
yellow open ciscosyslog-2020.05.22 kB4vJAooT3-fbIg0dKKt8w 1 1 566 0 159.2kb 159.2kb
However, the index cisconetflow is not created, as seen in the table above.
I enabled debug output on Logstash and I can see Netflow messages arriving from the Cisco devices:
[WARN ] 2020-05-22 17:57:04.999 [[main]>worker1] Dissector - Dissector mapping, field not found in event {"field"=>"message", "event"=>{"host"=>"10.200.8.57", "@timestamp"=>2020-05-22T21:57:04.000Z, "@version"=>"1", "netflow"=>{"l4_src_port"=>443, "version"=>9, "l4_dst_port"=>41252, "src_tos"=>0, "dst_as"=>0, "protocol"=>6, "in_bytes"=>98, "flowset_id"=>256, "src_as"=>0, "ipv4_dst_addr"=>"10.200.8.57", "input_snmp"=>1, "output_snmp"=>4, "ipv4_src_addr"=>"104.244.42.133", "in_pkts"=>1, "flow_seq_num"=>17176}}}
[WARN ] 2020-05-22 17:57:04.999 [[main]>worker1] Dissector - Dissector mapping, field not found in event {"field"=>"message", "event"=>{"host"=>"10.200.8.57", "@timestamp"=>2020-05-22T21:57:04.000Z, "@version"=>"1", "netflow"=>{"l4_src_port"=>443, "version"=>9, "l4_dst_port"=>39536, "src_tos"=>0, "dst_as"=>0, "protocol"=>6, "in_bytes"=>79, "flowset_id"=>256, "src_as"=>0, "ipv4_dst_addr"=>"10.200.8.57", "input_snmp"=>1, "output_snmp"=>4, "ipv4_src_addr"=>"104.18.252.222", "in_pkts"=>1, "flow_seq_num"=>17176}}}
{
    "host" => "10.200.8.57",
    "@timestamp" => 2020-05-22T21:57:04.000Z,
    "@version" => "1",
    "netflow" => {
        "l4_src_port" => 57654,
        "version" => 9,
        "l4_dst_port" => 443,
        "src_tos" => 0,
        "dst_as" => 0,
        "protocol" => 6,
        "in_bytes" => 7150,
        "flowset_id" => 256,
        "src_as" => 0,
        "ipv4_dst_addr" => "104.244.39.20",
        "input_snmp" => 4,
        "output_snmp" => 1,
        "ipv4_src_addr" => "172.16.1.21",
        "in_pkts" => 24,
        "flow_seq_num" => 17176
    }
}
At this point I can't tell whether Logstash is not delivering the information to ES or ES is failing to create the index. The current facts are:
a) Netflow traffic is present at Logstash input
b) ES is creating only one of the two indexes received from Logstash.
Thanks.

You have conditionals in your output that use the type field. Your first input adds this field with the correct value, but your second input does not have the field, so it will never match your conditional.
Add the line type => "netflow" in your second input as you did with your first one.
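For illustration, this is the input section from the question with the missing field added to the second udp input (ports and codec options unchanged from the original config):
input {
    udp {
        port => 5003
        type => "syslog"
    }
    udp {
        port => 2055
        type => "netflow"
        codec => netflow {
            include_flowset_id => true
            enable_metric => true
            versions => [5, 9]
        }
    }
}
With both inputs tagging their events, each conditional in the output routes documents to its own index.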

Related

Logstash pagination quits early

I'm having a problem that I could not crack by googling. We are doing a load with the jdbc plugin, using explicit pagination. When the pipeline runs it loads about 3.2 million records and then quits without errors, as if it finished successfully, but it should load around 6.4 million records. Here is our configuration:
input {
    jdbc {
        id => "NightlyRun"
        jdbc_connection_string => "*******"
        jdbc_driver_class => "Driver"
        jdbc_user => "${USER}"
        jdbc_password => "${PASS}"
        lowercase_column_names => "false"
        jdbc_paging_enabled => true
        jdbc_page_size => 50000
        jdbc_paging_mode => "explicit"
        schedule => "5 2 * * *"
        statement_filepath => "/usr/share/logstash/sql-files/sqlQuery1.sql"
    }
}
output {
    elasticsearch {
        hosts => ["${ELASTIC_HOST}:9200"]
        index => "index"
        user => "logstash"
        password => "${PASSWORD}"
        document_id => "%{NUMBER}-%{value}"
    }
}
And the SQL query we use:
declare @PageSize int
declare @Offset integer
set @PageSize = :size
set @Offset = :offset;
WITH cte AS
(
    SELECT id
    FROM entry
    ORDER BY CREATE_TIMESTAMP
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
)
select * from entry
inner join cte on entry.id = cte.id
A select count(*) from entry returns the expected 6.4 million records, but Logstash loads only 3.2 million before quitting. How can I ensure Logstash loads all the records?
I tried running the query in database and setting offset to 3200000 and page size to 50000, database returns results, so it is not likely a database issue.

Logstash :sql_last_value is showing a wrong junk date (showing a date six months old as the last run time)

I am observing a very strange issue.
I am using Logstash + JDBC to load data from an Oracle DB into Elasticsearch.
Below is what my config file looks like:
input {
    jdbc {
        clean_run => "false"
        jdbc_driver_library => "<path_to_ojdbc8-12.1.0.jar>"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "<connection_string>"
        jdbc_user => "<usename>"
        jdbc_password_filepath => ".\pwd.txt"
        statement => "SELECT * FROM customers WHERE CUSTOMER_NAME LIKE 'PE%' AND UPD_DATE > :sql_last_value "
        schedule => "*/1 * * * * "
        use_column_value => true
        tracking_column_type => "timestamp"
        tracking_column => "upd_date"
        last_run_metadata_path => "<path to logstash_metadata>"
        record_last_run => true
    }
}
filter {
    mutate {
        copy => { "id" => "[@metadata][_id]" }
        remove_field => ["@version", "@timestamp"]
    }
}
output {
    elasticsearch {
        hosts => ["<host>"]
        index => "<index_name>"
        document_id => "%{[@metadata][_id]}"
        user => "<user>"
        password => "<pwd>"
    }
    stdout {
        codec => dots
    }
}
Now, I am triggering this file every minute, today being March 8th, 2021.
When I load for the first time, it's all good: :sql_last_value is '1970-01-01 00:00:00.000000 +00:00'.
But after this first load, logstash_metadata should ideally show '2021-03-08 <HH:MM:ss>'. Strangely, it is instead getting updated to 2020-09-11 01:05:09.000000000 Z in logstash_metadata (:sql_last_value).
As you can see, the difference is roughly 180 days.
I tried multiple times but it still updates the same way. Because of this my incremental load is getting screwed up.
My Logstash version is 7.10.2.
Help is much appreciated!
NOTE: I am not using pagination, as the number of results in the result set is always very low for my query.
The recorded date is the date of the last processed row.
Looking at your query, you don't have a specific order for the records read from the DB.
The Logstash jdbc input plugin wraps your query in one that orders rows by [1], 1 being the ordinal of the column it orders by.
So, to process records in the correct order and get the latest upd_date value, you need upd_date to be the first column in the select statement.
input {
    jdbc {
        clean_run => "false"
        jdbc_driver_library => "<path_to_ojdbc8-12.1.0.jar>"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "<connection_string>"
        jdbc_user => "<usename>"
        jdbc_password_filepath => ".\pwd.txt"
        statement => "SELECT c.UPD_DATE, c.CUSTOMER_NAME, c.<Other field>
                      FROM customers c
                      WHERE c.CUSTOMER_NAME LIKE 'PE%' AND c.UPD_DATE > :sql_last_value
                      ORDER BY c.UPD_DATE ASC"
        schedule => "*/1 * * * * "
        use_column_value => true
        tracking_column_type => "timestamp"
        tracking_column => "upd_date"
        last_run_metadata_path => "<path to logstash_metadata>"
        record_last_run => true
    }
}
Also note that this approach will exhaust the table the first time Logstash runs, even if you set jdbc_page_size. If that is what you want, that's fine.
But if you want Logstash to run one batch of X rows every minute and then stop until the next execution, you must combine jdbc_page_size with a query that limits the rows, so that Logstash retrieves exactly the number of records you want, in the correct order. In SQL Server it works like this:
input {
    jdbc {
        jdbc_driver_library => ...
        jdbc_driver_class => ...
        jdbc_connection_string => ...
        jdbc_user => ...
        jdbc_password_filepath => ...
        statement => "SELECT TOP 10000 c.UPD_DATE, c.CUSTOMER_NAME
                      FROM customers c
                      WHERE c.CUSTOMER_NAME LIKE 'PE%' AND c.UPD_DATE > :sql_last_value
                      ORDER BY c.UPD_DATE ASC"
        schedule => "*/1 * * * * "
        use_column_value => true
        tracking_column_type => "timestamp"
        tracking_column => "upd_date"
        jdbc_page_size => 10000
        last_run_metadata_path => "<path to logstash_metadata>"
        record_last_run => true
    }
}
For Oracle DB you'll have to change your query depending on the version: either FETCH FIRST x ROWS ONLY with Oracle 12c, or ROWNUM for older versions.
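As a rough, untested sketch of the Oracle 12c variant (same columns and page size as the SQL Server example above; adjust to your own schema), only the statement changes:
statement => "SELECT c.UPD_DATE, c.CUSTOMER_NAME
              FROM customers c
              WHERE c.CUSTOMER_NAME LIKE 'PE%' AND c.UPD_DATE > :sql_last_value
              ORDER BY c.UPD_DATE ASC
              FETCH FIRST 10000 ROWS ONLY"
The rest of the jdbc input block stays the same as in the SQL Server example.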
In any case, I suggest you take a look at the logs to check the queries logstash runs.

Logstash aggregate filter didn't add a new field into the index; in fact no index was created after adding the aggregate filter

Dears,
I have created two grok patterns for a single log file. I want to add an existing field to another document if a condition matches. Could you advise me how to add one existing field to another document?
My log input:
INFO [2020-05-21 18:00:17,240][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - CarrierCode = AI ServiceNumber = 0531 DeparturePortCode = DEL ArrivalPortCode = CCJ DepartureDateTime = Thu May 21 XXXXX AST 2020 ArrivalDateTime = Thu May 21 XXX
WARN [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
INFO [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - SensitiveDataFilterHelper - Sensitive Data Filter key is not enabled.
ERROR [2020-05-21 18:00:17,243][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - AbstractMessageHandler - APP_LOGICAL_VALIDATION_FAILURE: comment1 = Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
My filter:
filter {
    if [type] == "server" {
        grok {
            match => [ "message", "%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}"]
        }
    }
    if [failed_corid] == "%{seg_corid}" {
        aggregate {
            task_id => "%{appmessageid}"
            code => "map['carrierCode'] = [carrierCode]"
            map_action => "create"
            end_of_task => true
            timeout => 120
        }
    }
    mutate { remove_field => ["message"] }
    if "_grokparsefailure" in [tags] { drop {} }
}
output {
    if [type] == "server" {
        elasticsearch {
            hosts => ["X.X.X.X:9200"]
            index => "app-data-%{+YYYY-MM-DD}"
        }
    }
}
My required fields are found in different grok patterns. For example, the carrier code is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},
and failed_corid is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}
We want to merge the carrierCode field into the failed_corid documents. Kindly help us with how to do this in Logstash.
Expected output fields in the documents:
{
    "failed_corid": [id-433erfdtert3er]
    "carrier_code": AI
}

Logstash jdbc-input-plugin configuration for initial sql_last_value

I synchronise data between an Oracle database and an Elasticsearch instance.
The database table "SYNC_TABLE" has the following columns: "ID", a NUMBER; "LAST_MODIFICATION", a TIMESTAMP; and "TEXT", a VARCHAR2.
I use Logstash with the jdbc input plugin in order to perform data synchronisation on a regular basis.
This is the Logstash configuration file:
input {
    jdbc {
        jdbc_driver_library => "ojdbc6.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
        jdbc_user => "******"
        jdbc_password => "******"
        schedule => "* * * * *"
        statement => "SELECT * from SYNC_TABLE where LAST_MODIFICATION >= :sql_last_value"
        tracking_column => "LAST_MODIFICATION"
        tracking_column_type => "timestamp"
        use_column_value => true
    }
}
output {
    elasticsearch {
        index => "SYNC_TABLE"
        document_type => "SYNCED_DATA"
        document_id => "%{ID}"
        hosts => "localhost:9200"
    }
    stdout { codec => rubydebug }
}
I'd like to import all the data on the first run and then synchronise only the diff between the last run and the current time.
So I expect Logstash to make the following queries:
SELECT * from SYNC_TABLE where LAST_MODIFICATION >= '1 January 1970 00:00'
and then regularly
SELECT * from SYNC_TABLE where LAST_MODIFICATION >= 'time of last run'
The documentation says that the initial value for :sql_last_value should be 1 January 1970, but I see in my logs that it takes the current timestamp instead.
This is the first query:
SELECT * from SYNC_TABLE where LAST_MODIFICATION >= TIMESTAMP '2017-08-14 09:17:00.481000 +00:00'
Is there any mistake in the Logstash configuration file that makes Logstash use the current timestamp instead of the default ('1 January 1970 00:00')?
The problem was in the .logstash_jdbc_last_run file, which contained the sql_last_value from previous runs.
I removed this file and restarted Logstash.
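If you prefer not to delete the file by hand, the jdbc input also has a clean_run option which, as far as I know, resets :sql_last_value to its initial value on the next run. A minimal sketch (not the answerer's exact fix):
input {
    jdbc {
        # ... same connection and statement settings as above ...
        clean_run => true   # ignore the value persisted in .logstash_jdbc_last_run and start from the 1970-01-01 default
    }
}
Remember to set clean_run back to false afterwards, otherwise the full import is repeated on every restart.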

Logstash Grok Parsing Issue

I am using Logstash to read some log files.
Here are some records from the data source:
<2016-07-07 00:31:01> Start
<2016-07-07 00:31:59> Warning - Export_Sysem 6 (1) => No records to be exported
<2016-07-07 00:32:22> Export2CICAP (04) => Export PO : 34 record(s)
<2016-07-07 00:32:22> Export2CICAP (04) => Export CO : 87 record(s)
<2016-07-07 00:32:22> Export2CICAP (04) => Export FC
This is my conf file
grok {
    match => { "message" => [
        '<%{TIMESTAMP_ISO8601:Timestamp}> (%{WORD:Level} - )%{NOTSPACE:Job_Code} => %{GREEDYDATA:message}',
        '<%{TIMESTAMP_ISO8601:Timestamp}> %{WORD:Parameter} - %{GREEDYDATA:Message}',
        '<%{TIMESTAMP_ISO8601:Timestamp}> %{WORD:Status}'
    ]}
}
This is part of my output
{
    "message" => "??2016-07-07 00:31:01> Start\r?",
    "@version" => "1",
    "@timestamp" => "2016-07-08T03:22:01.076Z",
    "path" => "C:/CIGNA/Export.log",
    "host" => "SIMSPad",
    "type" => "txt",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "message" => "<2016-07-07 00:31:59> Warning - Export_Sysem 6 (1) => No records to be exported\r?",
    "@version" => "1",
    "@timestamp" => "2016-07-06T16:31:59.000Z",
    "path" => "C:/CIGNA/Export.log",
    "host" => "SIMSPad",
    "type" => "txt",
    "Timestamp" => "2016-07-07 00:31:59",
    "Parameter" => "Warning",
    "Message" => "Export_Sysem 6 (1) => No records to be exported\r?"
}
{
    "message" => "<2016-07-07 00:32:22> Export2CICAP (04) => Export CO : 87 record(s)\r?",
    "@version" => "1",
    "@timestamp" => "2016-07-06T16:32:22.000Z",
    "path" => "C:/CIGNA/Export.log",
    "host" => "SIMSPad",
    "type" => "txt",
    "Timestamp" => "2016-07-07 00:32:22",
    "Status" => "Export2CICAP"
}
As seen from the output, the first message has a grok parse failure and the other two messages were not fully parsed. How should I modify my grok statement so it can fully parse the messages?
For the first message, the problem comes from the two ?? characters, which do not appear in the pattern, thus creating the _grokparsefailure.
The second and third messages are not fully parsed because the first two patterns do not match them, so the messages are parsed by the last pattern.
For the second message, if you wish to parse it with the first pattern (<%{TIMESTAMP_ISO8601:Timestamp}> (%{WORD:Level} - )%{NOTSPACE:Job_Code} => %{GREEDYDATA:message}), your pattern is wrong:
The parentheses around %{WORD:Level} - do not appear in the log.
There is a space missing between :Timestamp}> and %{WORD:Level}: in the log there are two spaces, and only one in the pattern. Note that you can use %{SPACE} to avoid this problem (since %{SPACE} will match any number of spaces).
%{NOTSPACE:Job_Code} matches a sequence of characters without any space, but there is a space in Export_Sysem 6 (1), so Job_Code will be Export_Sysem and the => in the pattern will prevent a successful match with the first pattern.
Correct pattern:
<%{TIMESTAMP_ISO8601:Timestamp}> %{WORD:Level} - %{DATA:Job_Code} => %{GREEDYDATA:message}
For the third message, I don't see which pattern should be used.
If you add more details, I'll update my answer.
For reference: grok pattern definitions
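For completeness, a minimal sketch of the grok filter with the corrected pattern slotted in ahead of the original two (an illustration only, not tested against the full log; it uses the %{SPACE} variant mentioned above so both one and two spaces after the timestamp match):
grok {
    # first pattern corrected as suggested in the answer: %{SPACE} after the timestamp,
    # %{WORD:Level} without parentheses, and %{DATA:Job_Code} instead of %{NOTSPACE}
    match => { "message" => [
        '<%{TIMESTAMP_ISO8601:Timestamp}>%{SPACE}%{WORD:Level} - %{DATA:Job_Code} => %{GREEDYDATA:message}',
        '<%{TIMESTAMP_ISO8601:Timestamp}> %{WORD:Parameter} - %{GREEDYDATA:Message}',
        '<%{TIMESTAMP_ISO8601:Timestamp}> %{WORD:Status}'
    ]}
}
Lines like the third sample (Export2CICAP (04) => Export FC) still fall through to the last pattern, as noted in the answer.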
