Kibana Timelion is not graphing data from index - elasticsearch

I'm setting up a graph to display Cisco NetFlow v9 data using ELK stack 7.7.0. Data from the routers reaches Logstash, then Elasticsearch, and finally Kibana.
In Kibana I'm using Timelion to graph incoming bytes on a router interface. For that purpose I created the index cisconetflow and picked the field "in_bytes" for graphing. The Timelion expression looks like this:
.es(q='netflow.in_bytes',index=cisconetflow*)
But once I press the Update and refresh buttons I get no errors, yet nothing happens; no data is displayed in the graph.
If I only include the index in the Timelion expression, it shows some hits.
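For reference, that index-only expression was presumably just the bare .es() call, something like:
.es(index=cisconetflow*)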
Simultaneously I'm running a debug on Logstash and I can see that NetFlow data is present:
"host" => "172.16.8.57",
"#timestamp" => 2020-05-25T20:12:38.000Z,
"netflow" => {
"in_bytes" => 1638,
"flowset_id" => 256,
"input_snmp" => 1,
"protocol" => 17,
"l4_src_port" => 9131,
"ipv4_src_addr" => "192.168.1.70",
"version" => 9,
"src_tos" => 0,
"l4_dst_port" => 9131,
"ipv4_dst_addr" => "239.255.250.250",
"dst_as" => 0,
"flow_seq_num" => 23193,
"output_snmp" => 0,
"in_pkts" => 7,
"src_as" => 0
},
Same on the Kibana Discover dashboard: I see NetFlow data coming in and the netflow.in_bytes field is displayed as available.
So, any clue on what I'm missing to get the data in the chart?
Thanks.

OK, after researching I found I was missing the timefield and metric parameters in the expression; now I see traffic for the required field.
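For anyone hitting the same issue, the corrected expression would look roughly like this (assuming @timestamp is the time field and summing netflow.in_bytes; adjust the field names to your own mapping):
.es(index=cisconetflow*, timefield='@timestamp', metric='sum:netflow.in_bytes')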

Related

Migrating 3 million records from Oracle to Elasticsearch using Logstash

We are trying to migrate around 3 million records from Oracle to Elasticsearch using Logstash.
We are applying a couple of jdbc_streaming filters as part of our Logstash script: one to load connected nested objects and another to run a hierarchical query that loads data into a second nested object in the index.
We are able to index 0.4 million records in 24 hours. The total size occupied by those 0.4 million records is around 300 MB.
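(That works out to roughly 400,000 / 86,400 s ≈ 4.6 documents per second, and 300 MB / 400,000 ≈ 0.75 KB per document.)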
We tried multiple approaches to migrate the data from Oracle into Elasticsearch more quickly, but were not able to achieve the desired results.
Please find below the approaches we tried:
1. In the Logstash script, we used the jdbc_fetch_size, jdbc_page_size, jdbc_paging_enabled and clean_run parameters, set pipeline workers to 20, and set the pipeline batch size to 125 in the logstash.yml file.
2. On the Elasticsearch side, we set the number of replicas to 0, set the refresh interval to -1, tried increasing the value of the indices.memory.index_buffer_size parameter, and increased the number of watcher queues in the elasticsearch.yml file. (A config sketch illustrating these settings follows below.)
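For reference, a minimal sketch of where the settings from points 1 and 2 live; the numeric values are only the ones mentioned above or placeholders, not recommendations:
# logstash.yml (point 1)
pipeline.workers: 20
pipeline.batch.size: 125
# jdbc input options (point 1; the page/fetch sizes here are placeholders)
jdbc_paging_enabled => true
jdbc_page_size => 100000
jdbc_fetch_size => 1000
clean_run => false
# Elasticsearch index settings (point 2), applied via the settings API
PUT /<index_name>/_settings
{
  "index": {
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}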
We basically googled around and followed various suggestions from this site and others, but nothing seems to have worked so far.
We are using a single-node Elasticsearch setup, and neither the DB nor the Elasticsearch node is on the machine from which we are running the Logstash script.
Please find below the Logstash config file:
input {
  jdbc {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select * from "
  }
}
filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    #statement => "select claimnumber,claimtype,is_active from claim where policynumber = :policynumber"
    parameters => { "policynumber" => "policynumber" }
    target => "nested node"
  }
}
filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select listagg(column name,'/' ) within group(order by column name) from
                  where LEVEL > 1
                  start with =:
                  connect by prior = "
    parameters => { "p1" => "p1" }
    target => "nested node1"
  }
}
output {
  elasticsearch {
    hosts => [""]
    index => "<index_name>"
    document_id => "%{doc_id}"
  }
  stdout { codec => json }
}
Can you please help us identify the bottlenecks and also make suggestions on how to increase indexing performance?
Thank You

Logstash with Elasticsearch: I am using Logstash to ingest data into an ES index, but now I want Logstash to run 24/7

#file:db.conf
input {
  jdbc {
    jdbc_driver_library => ""
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@abcd.klm.uvw:1521/qtp1"
    jdbc_user => "user_wew"
    jdbc_password => "password_wew"
    statement => "select col1, col2, col3, col4, col5, col6, countid, max(version) as mv from master_object_table where version > :sql_last_value group by countid"
    schedule => "* * * * *"
    last_run_metadata_path => "C:/ES1/ELK_stack_7.4.2/logstash-7.4.2/logstash-7.4.2/Master_refresh_a.txt"
    use_column_value => true
    tracking_column => "version"
  }
}
filter {
  mutate {
    convert => {
      "countid" => "string"
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "refresh_index_a"
    document_id => "%{countid}"
    #document_type => "_doc"
  }
  file {
    path => "C:\\ES1\\ELK_stack_7.4.2\\logstash-7.4.2\\logstash-7.4.2\\bin\\logstashESRecordsIngestionDetails_refresh_a.txt"
    codec => rubydebug
  }
  stdout { codec => rubydebug }
}
Above is my Logstash config file. I want to run this Logstash 24/7, and I also need to handle the case where the machine running Logstash shuts down, since it is ingesting live data into the ES index. Is there any way to have the Logstash on another node continue the work if one server goes down? Please suggest.
As per the documentation:
Logstash is horizontally scalable and can form groups of nodes running the same pipeline. Logstash's adaptive buffering capabilities will facilitate smooth streaming even through variable throughput loads. If the Logstash layer becomes an ingestion bottleneck, simply add more nodes to scale out. Here are a few general recommendations:
Beats should load balance across a group of Logstash nodes.
A minimum of two Logstash nodes are recommended for high availability.
It's common to deploy just one Beats input per Logstash node, but multiple Beats inputs can also be deployed per Logstash node to expose independent endpoints for different data sources.
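One thing the quoted recommendations do not cover is surviving a restart or crash of a single Logstash machine. A minimal sketch, assuming Logstash 7.x, is to enable the persistent queue in logstash.yml so in-flight events survive a shutdown, and to run Logstash under a service manager (systemd on Linux, or e.g. NSSM as a Windows service) so it is restarted automatically:
# logstash.yml - persistent queue (the size is a placeholder, tune it for your volume)
queue.type: persisted
queue.max_bytes: 1gb
For true high availability you would, as the documentation says, run the same pipeline on a second Logstash node; note that with a jdbc input each node keeps its own sql_last_value state in last_run_metadata_path, so coordinating which node actually polls the database is left to you.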

How to turn off the pre-check of how many rows are in the result set in the Logstash jdbc input

I'm trying to turn off the pre-select that Logstash does to determine the count of rows, but the Exasol DB does not support LIMIT in aggregated selects. Is there any way to turn it off in Logstash?
input {
  jdbc {
    jdbc_driver_library => "/opt/jdbc/exajdbc6.0.15.jar"
    jdbc_driver_class => "com.exasol.jdbc.EXADriver"
    jdbc_user => "am_mon"
    jdbc_password => "XXXXX"
    jdbc_connection_string => "jdbc:exa:xxx.xx.xx.xx..xx:xxxx"
    jdbc_default_timezone => "Europe/Berlin"
    # schedule => "05 7 * * *"
    statement => "select local_date, LOCAL_HOUR, events from DWH_MON.V.M_EVENTS"
  }
}
Logstash Error Log:
[2019-06-07T12:28:00,834][ERROR][logstash.inputs.jdbc ] Java::JavaSql::SQLException: LIMIT not allowed in aggregated selects [line 1, column 127] (Session: 1635677142479452406): SELECT count(*) AS "COUNT" FROM (select local_date, LOCAL_HOUR, events from DWH_MON.V.M_EVENTS limit 1) AS "T1" LIMIT 1
[2019-06-07T12:28:00,838][WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#}
As Logstash wants to see how many rows are to be expected, it uses LIMIT 1, but Exasol can't process any LIMIT on aggregations.
It's a problem with Logstash, I guess. The LIMIT 1 part is unnecessary and should not be there in the first place.
You may try to use an SQL pre-processor to identify such queries and remove the LIMIT manually. But maybe it's easier to patch Logstash itself.
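Another avenue, though I have not verified it against this Exasol setup: newer versions of the jdbc input plugin added a jdbc_paging_mode option, and with jdbc_paging_mode => "explicit" the plugin is supposed to skip its own count/limit handling and leave paging to your statement via :size and :offset. A rough sketch, assuming a recent logstash-integration-jdbc plugin:
jdbc {
  ...
  jdbc_paging_enabled => true
  jdbc_paging_mode => "explicit"
  jdbc_page_size => 100000
  statement => "select local_date, LOCAL_HOUR, events from DWH_MON.V.M_EVENTS limit :size offset :offset"
}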

Create a new index in Elasticsearch for each log file by date

Currently
I have completed the above task by using one log file and passing the data with Logstash to one index in Elasticsearch:
yellow open logstash-2016.10.19 5 1 1000807 0 364.8mb 364.8mb
What I actually want to do
If I have the following log files, which are named according to year, month and date:
MyLog-2016-10-16.log
MyLog-2016-10-17.log
MyLog-2016-10-18.log
MyLog-2016-11-05.log
MyLog-2016-11-02.log
MyLog-2016-11-03.log
I would like to tell Logstash to read them by year, month and date and create the following indices:
yellow open MyLog-2016-10-16.log
yellow open MyLog-2016-10-17.log
yellow open MyLog-2016-10-18.log
yellow open MyLog-2016-11-05.log
yellow open MyLog-2016-11-02.log
yellow open MyLog-2016-11-03.log
Please could I have some guidance on how I need to go about doing this?
Thank you.
It is as simple as that:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "MyLog-%{+YYYY-MM-dd}.log"
  }
}
If the lines in the file contain datetime info, you should be using the date{} filter to set @timestamp from that value. If you do this, you can use the output format that @Renaud provided, "MyLog-%{+YYYY.MM.dd}".
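A minimal sketch of that date filter, assuming the timestamp has already been parsed into a field named logdate (the field name and format here are placeholders for whatever your grok/dissect produces):
filter {
  date {
    # parse e.g. "2016-10-17 13:55:01" from logdate into @timestamp
    match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
  }
}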
If the lines don't contain the datetime info, you can use the input's path for your index name, e.g. "%{path}". To get just the basename of the path:
mutate {
  gsub => [ "path", ".*/", "" ]
}
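Then, roughly, the output would reference it as the index name (keep in mind that Elasticsearch index names must be lowercase, so a mutate { lowercase => ["path"] } may be needed as well):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{path}"
  }
}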
Won't this configuration in the output section be sufficient for your purpose?
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    port => 9200
    protocol => "http"
    cluster => "elasticsearch"
    index => "syslog-%{+YYYY.MM.dd}"
  }
}

Logstash+Elasticsearch throughput

We're trying to process 5K msgs/sec with 2 identical machines, but it seems like we max out Logstash or Elasticsearch.
Each has:
64 GB RAM, ~3 GHz Xeon CPU
Logstash 1.5 installed
Elasticsearch 1.7.8 installed in cluster mode with the second machine
Logstash is configured to receive messages from a 16-node Kafka cluster and send them to Elasticsearch.
The data is CSV and contains 22 fields. Is that a normal throughput?
Here's the config:
input {
  kafka {
    type => "api"
    zk_connect => "node1:2181,node2:2181,node3:2181"
    codec => "plain"
    topic_id => "api_events"
    consumer_threads => 8
    queue_size => 10000
    rebalance_backoff_ms => 10000
    rebalance_max_retries => 10
  }
}
filter {
  csv {
    separator => "::"
    columns => [
      "hostname",
      "status",
      "body_bytes_sent",
      "request_time",
      "http_x_forwarded_for",
      "uri",
      "arg_key",
      "http_user_agent",
      "http_deviceid",
      "http_country_code",
      "http_language_code",
      "http_platform",
      "http_versioncode",
      "request_method",
      "http_x_forwarded_proto",
      "upstream_cache_status",
      "upstream_response_time",
      "upstream_header_time",
      "upstream_status",
      "bytes_sent",
      "time_local",
      "upstream_addr"
    ]
    remove_field => [
      "message"
    ]
  }
  mutate {
    convert => {
      "body_bytes_sent" => "integer"
      "request_time" => "float"
      "upstream_response_time" => "float"
      "upstream_header_time" => "float"
      "bytes_sent" => "integer"
    }
  }
}
output {
  elasticsearch {
    cluster => "MyCluster"
    protocol => "node"
    index => "api-%{+YYYY.MM.dd}"
    host => "elasticnode1"
    flush_size => 50000
    workers => 4
  }
}
I'm surprised this question has not been answered. The question was asked about rather old versions of Logstash; there have been multiple improvements and refactorings in Logstash since, so some of the parameters will be different.
5k messages a second sounds, at least on the face of it, like a pretty low target, but of course it depends on quite a few things which are not stated. For example, how big is each message? How many partitions is it listening to? Is each partition in the input actually receiving messages at high throughput, or is only the aggregate throughput high?
I would suggest starting with a small batch size (say 500) and 1 worker, slowly increasing the batch size until you see no improvement, and then increasing the workers to make use of the cores on the machine. It is possible you're not getting each batch full enough per request per worker. The following article shows how to profile and measure how the in-flight requests are doing with real arriving data:
https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html
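For reference, a rough sketch of where those knobs live in current Logstash versions (the numbers are just the starting points suggested above, not recommendations):
# logstash.yml
pipeline.workers: 1
pipeline.batch.size: 500
or, equivalently, on the command line: bin/logstash -f pipeline.conf -w 1 -b 500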
Of course the other side of this is Elasticsearch itself. It is worth verifying that Elasticsearch is not lagging in processing all the concurrent requests. How many Elasticsearch nodes are there, and how many of them are client/data nodes? There are a number of things to look out for on Elasticsearch as well when doing "heavy" indexing:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
I hope this helps you and others looking to do this today.
