Use Logstash to get nested Airflow logs and send them to Elasticsearch

I am new to Logstash and the ELK stack as a whole. I am trying to send my Airflow logs to Logstash, and I am confused about how to set up my configuration file, especially because I have several (nested) log files.
My airflow is deployed on an AWS EC2 instance and my logs directory is something like this: /home/ubuntu/run/logs/scheduler/
The scheduler directory has a couple of dated folders. Using one of the folders as an example:
/home/ubuntu/run/logs/scheduler/2022-08-31.
The dated folder has files such as
testing.py.log hello_world.py.log dag_file.py.log
Now, while configuring my /etc/logstash/conf.d/ (based on the log path I shared above), how can I define my path so that it picks up all the logs?
This is what my /etc/logstash/conf.d/apache-01.conf currently looks like, but I know the path isn't accurate:
input {
  file {
    path => "~/home/ubuntu/run/log/scheduler/"
    start_position => "beginning"
    codec -> "line"
  }
}
filter {
  grok {
    match => { "path" => "" }
  }
  mutate {
    add_field => {
      "log_id" => "%{[dag_id]}-%{[task_id]}-%{[execution_date]}-%{[try_number]}"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

The path parameter needs an absolute path.
To process all the .py.log files you can use this input:
input {
  file {
    path => "/home/ubuntu/run/logs/scheduler/*/*py.log"
    start_position => "beginning"
    codec => "line"
  }
}
To process only the files hello_world.py.log and dag_file.py.log, you can use an array for your path:
input {
  file {
    path => ["/home/ubuntu/run/logs/scheduler/*/hello_world.py.log", "/home/ubuntu/run/logs/scheduler/*/dag_file.py.log"]
    start_position => "beginning"
    codec => "line"
  }
}
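The filter block in the question is still empty. As a rough sketch (not part of the answer above), a grok on the path field can pull the date folder and the DAG file name out of the scheduler path. Note that dag_id, task_id, execution_date and try_number normally come from Airflow task logs rather than scheduler logs, so the field names used here (scheduler_date, dag_file) are illustrative assumptions; on newer Logstash versions with ECS compatibility enabled, the file path may live under [log][file][path] instead of path.
filter {
  grok {
    # e.g. /home/ubuntu/run/logs/scheduler/2022-08-31/dag_file.py.log
    # scheduler_date and dag_file are made-up field names for illustration
    match => { "path" => "/scheduler/%{DATA:scheduler_date}/%{GREEDYDATA:dag_file}\.py\.log" }
  }
}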

Related

Logstash not importing data

I am working on an ELK stack setup. I want to import data from a CSV file on my PC to Elasticsearch via Logstash. Elasticsearch and Kibana are working properly.
Here is my logstash.conf file:
input {
  file {
    path => "C:/Users/aron/Desktop/es/archive/weapons.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name", "type", "country"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200/"]
    index => "weapons"
    document_type => "ww2_weapon"
  }
  stdout {}
}
A sample row from my .csv file looks like this:
Name              | Type      | Country
10.5 cm Kanone 17 | Field Gun | Germany
German characters are all showing up.
I am running logstash via: logstash.bat -f path/to/logstash.conf
It starts working but freezes and becomes unresponsive along the way; here is a screenshot of stdout.
In Kibana, it created the index and imported 2 documents, but the data is all messed up. What am I doing wrong?
If your task is only to import that CSV, you are better off using the file upload in Kibana.
Should be available under the following link (for Kibana > v8):
<your Kibana domain>/app/home#/tutorial_directory/fileDataViz
Logstash is used if you want to do this job on a regular basis with new files coming in over time.
You can try the config below. It runs perfectly on my machine.
input {
  file {
    path => "path/filename.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["field1","field2",...]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "https://localhost:9200"
    user => "username"         # if any
    password => "password"     # if any
    index => "indexname"
    document_type => "doc_type"
  }
}
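As a side note (not from either answer above): on Windows you can rule out file-input and sincedb issues by piping the CSV through the stdin input and checking whether the csv filter alone parses the rows correctly. A minimal sketch, assuming the same three columns as the question; test.conf is just a placeholder name:
input {
  stdin { }
}
filter {
  csv {
    separator => ","
    columns => ["name", "type", "country"]
  }
}
output {
  stdout { codec => rubydebug }
}
Run it with something like: type weapons.csv | logstash.bat -f test.conf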

Generating Logs from multiple directories in Logstash. Logs not appearing in ElasticSearch

I am trying to collect logs from multiple directories through Logstash and send them to Elasticsearch.
This is my configuration:
input {
  file {
    path => ["/XXX/XXX/results/**/log_file.txt"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    pattern_definitions => { "THREAD_NAME" => "%{WORD}?%{NUMBER}?" }
    match => { "message" => "%{SPACE}?%{TIMESTAMP_ISO8601:asctime}?%{SPACE}?\|%{SPACE}?%{THREAD_NAME:thread_name}"}
  }
}
output {
  elasticsearch {
    hosts => ["x.x.x.x:9200"]
  }
  stdout { codec => rubydebug }
}
The file path is a relative path.
The logs are placed inside different directories under the results directory:
results/dir/log_file.txt.
I have tried this configuration with stdin and logs appeared inside Kibana, but Logstash doesn't pick up the logs in the directories. Please advise.

Elasticsearch not receiving input from Logstash

I'm running Logstash with the output set to Elasticsearch on my localhost. However, when I open up Elasticsearch, it appears that it did not receive any data from Logstash. Logstash parses the CSV file correctly, as I can see from the output in the terminal.
I've tried modifying the conf file, but the problem remains. The conf file is below:
input {
  file {
    path => "/Users/kevinliu/Desktop/logstash_tutorial/gg.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["name","price","unit","url"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "gg-prices"
  }
  stdout {}
}
When I access localhost:9200/ I just see the default "You Know, for Search" message from Elasticsearch.

Logstash multiple logs

I am following an online tutorial and have been provided with a cars.csv file and the following Logstash config file. My logstash is running perfectly well and is indexing the CSV as we speak.
The question is, I have another log file (entirely different data) which I need to parse and index into a different index.
1. How do I add this configuration without restarting Logstash?
2. If the above isn't possible and I edit the config file and then restart Logstash, it won't reindex the entire cars file, will it?
3. If I do 2, how do I format the config for multiple styles of log file?
E.g. my new log file looks like this:
01-01-2017 ORDER FAILED: £12.11 Somewhere : Fraud
Existing Config File:
input {
  file {
    path => "/opt/cars.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns =>
    [
      "maker",
      "model",
      "mileage",
      "manufacture_year",
      "engine_displacement",
      "engine_power",
      "body_type",
      "color_slug",
      "stk_year",
      "transmission",
      "door_count",
      "seat_count",
      "fuel_type",
      "date_last_seen",
      "date_created",
      "price_eur"
    ]
  }
  mutate {
    convert => ["mileage", "integer"]
  }
  mutate {
    convert => ["price_eur", "float"]
  }
  mutate {
    convert => ["engine_power", "integer"]
  }
  mutate {
    convert => ["door_count", "integer"]
  }
  mutate {
    convert => ["seat_count", "integer"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "cars"
    document_type => "sold_cars"
  }
  stdout {}
}
Config file for orders.log
input {
  file {
    path => "/opt/logs/orders.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z]+)( : (?<order_failure_reason>[A-Za-z ]+))?"}
  }
  mutate {
    convert => ["order_amount", "float"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "sales"
    document_type => "order"
  }
  stdout {}
}
Disclaimer: I'm a complete newbie. Second day using ELK.
For point 1, you can either set the following in your logstash.yml file:
config.reload.automatic: true
Or, when running Logstash with a conf file, start it like this:
bin/logstash -f conf-file-name.conf --config.reload.automatic
With either of these settings in place, you can start Logstash, and from then on any change you make to the conf file will be picked up without a restart.
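For reference, the reload interval can also be tuned in logstash.yml; config.reload.interval defaults to 3s, and the values below are just an example:
config.reload.automatic: true
config.reload.interval: 3s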
2. If the above isn't possible and I edit the config file and then restart Logstash - it won't reindex the entire cars file, will it?
If you use sincedb_path => "/dev/null", Logstash won't remember where it has stopped reading a file and will reindex it at each restart. You'll have to remove this line if you wish for Logstash to remember its position.
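For example, pointing sincedb_path at a regular file (the path below is only an illustration and must be writable by the Logstash user) makes the read position survive restarts; leaving sincedb_path out entirely has the same effect, with the sincedb kept under Logstash's data directory:
file {
  path => "/opt/cars.csv"
  start_position => "beginning"
  sincedb_path => "/var/lib/logstash/sincedb_cars"   # illustrative path, not from the original answer
}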
3. How do I format the config for multiple styles of log file?
To support multiple style of log files, you can put tags on the file inputs (see https://www.elastic.co/guide/en/logstash/5.5/plugins-inputs-file.html#plugins-inputs-file-tags) and then use conditionals (see https://www.elastic.co/guide/en/logstash/5.5/event-dependent-configuration.html#conditionals) in your file config.
Like this:
file {
  path => "/opt/cars.csv"
  start_position => "beginning"
  sincedb_path => "/dev/null"
  tags => [ "csv" ]
}
file {
  path => "/opt/logs/orders.log"
  start_position => "beginning"
  sincedb_path => "/dev/null"
  tags => [ "log" ]
}
if "csv" in [tags] {
  ...
} else if "log" in [tags] {
  ...
}
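Put together, a single config along these lines would route both files. This is a sketch assembled from the two configs above rather than a tested drop-in, and the csv column list is trimmed for brevity:
input {
  file {
    path => "/opt/cars.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => [ "csv" ]
  }
  file {
    path => "/opt/logs/orders.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => [ "log" ]
  }
}
filter {
  if "csv" in [tags] {
    csv {
      separator => ","
      columns => [ "maker", "model", "price_eur" ]   # trimmed; use the full column list from above
    }
  } else if "log" in [tags] {
    grok {
      match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z]+)( : (?<order_failure_reason>[A-Za-z ]+))?" }
    }
  }
}
output {
  if "csv" in [tags] {
    elasticsearch { hosts => "localhost" index => "cars" }
  } else if "log" in [tags] {
    elasticsearch { hosts => "localhost" index => "sales" }
  }
}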

logstash file input configuration

I have created a dummy folder in my home directory and put some log files in it. My config file looks like this:
input{
  file{
    type => "dummylog"
    path => [/home/rohit/dummy/*.log" ]
  }
}
output{
  elasticsearch{
    embedded => true
  }
}
Now, after running Logstash, I am unable to see any files in the Logstash web UI. Those files have not been fetched into Elasticsearch. I am using embedded Elasticsearch, so there is no need to run a separate process. Can anyone tell me where I am making a mistake?
input {
  file {
    path => "/home/rohit/dummy/*.log"
    type => "log"
  }
}
filter {
  if [type] == "log" {
    grok {
      pattern => "%{COMBINEDAPACHELOG}"
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { }
}
To check whether the config is correct, run:
$ bin/logstash agent -f httpd-shipper.conf --configtest
Perhaps you should fix the syntax errors in your config and then see if there is a deeper problem.
You are missing a " in the line:
path => [/home/rohit/dummy/*.log" ]
If that doesn't solve the problem, it would be wise to confirm that the user running the Logstash process has read access to the log(s) it is trying to read.
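For example, assuming Logstash runs as a dedicated logstash user (adjust the username to your setup), a quick read test looks like this:
# run a read test as the user that runs Logstash; a "Permission denied" here points to a permissions problem
sudo -u logstash head -n 1 /home/rohit/dummy/*.log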
Be very careful with the syntax. You missed a " (quote) in the path:
input{
  file{
    type => "dummylog"
    path => ["/home/rohit/dummy/*.log"]
  }
}
output{
  elasticsearch{
    embedded => true
  }
}
Also note that for this to work, elasticsearch has to be on the same host as logstash.
In addition to the missing opening quotation mark, you may have forgotten about start_position:
start_position => ... # string, one of ["beginning", "end"] (optional), default: "end"
So, it should be:
input{
  file{
    type => "dummylog"
    path => "/home/rohit/dummy/*.log"
    start_position => "beginning"
  }
}
output {
  elasticsearch{
    embedded => true
  }
}
