Importing csv file into elastic search - elasticsearch

I am trying to import a huge csv file to elastic serach.Trying to use logstash for the same.
Sample csv file [Note:multiple values ]
Shop_name,Review_Title,Review_Text,,,,
Accord ,Excellent ,Nice Collection.,,,,,
Accord , Bad ,Not too comfortable,,,
Accord , Good ,excellent location and Staff,,,
Accord , Good ,Great Colletion,,,
Shopon,good, staff very good ,,,
Harrisons ,Spacious,Nice Colletion
Logstash congiguration
input {
file {
path => ["shopreview.csv"]
start_position => "beginning"
}
}
filter {
csv {
columns => [
"Shop_name",
"Review_Title",
"Review_Text"
]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
action => "index"
hosts => ["127.0.0.1:9200"]
index => "reviews"
document_type => "shopreview"
document_id => "%{Shop_name}"
workers => 1
}
}
here when i query for review i should get all the reviews for the particular shop.
Issue
while I query with localhost:9020/review/shopreview/Accord I am not getting all values .only 1 entry. Is the config missing something . I am little new to elk stack

Related

Logstash not importing data

I am working on an ELK stack setup I want to import data from a csv file from my PC to elasticsearch via logstash. Elasticsearch and Kibana is working properly.
Here is my logstash.conf file:
input {
file {
path => "C:/Users/aron/Desktop/es/archive/weapons.csv"
start_position => "beginning"
sincedb_path => "NUL"
}
}
filter {
csv {
separator => ","
columns => ["name", "type", "country"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200/"]
index => "weapons"
document_type => "ww2_weapon"
}
stdout {}
}
And a sample row data from my .csv file looks like this:
Name
Type
Country
10.5 cm Kanone 17
Field Gun
Germany
German characters are all showing up.
I am running logstash via: logstash.bat -f path/to/logstash.conf
It starts working but it freezes and becomes unresponsive along the way, here is a screenshot of stdout
In kibana, it created the index and imported 2 documents but the data is all messed up. What am I doing wrong?
If your task is only to import that CSV you better use the file upload in Kibana.
Should be available under the following link (for Kibana > v8):
<your Kibana domain>/app/home#/tutorial_directory/fileDataViz
Logstash is used if you want to do this job on a regular basis with new files coming in over time.
You can try with below one. It is running perfectly on my machine.
input {
file {
path => "path/filename.csv"
start_position => "beginning"
sincedb_path => "NULL"
}
}
filter {
csv {
separator => ","
columns => ["field1","field2",...]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => "https://localhost:9200"
user => "username" ------> if any
password => "password" ------> if any
index => "indexname"
document_type => "doc_type"
}
}

How hit_cache_size in logstash dns filter works?

I am using dns filter in logstash for my csv file. In my csv file, I have two fields. They are website and count.
Here's the sample content of my csv file:
|website|n|
|www.google.com|n1|
|www.yahoo.com|n2|
|www.bing.com|n3|
|www.stackoverflow.com|n4|
|www.smackcoders.com|n5|
|www.zoho.com|n6|
|www.quora.com|n7|
|www.elastic.co|n8|
Here's my logstash config file:
input {
file {
path => "/home/paulsteven/log_cars/cars_dns.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
columns => ["website","n"]
}
dns {
resolve => [ "website" ]
action => "replace"
hit_cache_size => 8000
hit_cache_ttl => 300
failed_cache_size => 1000
failed_cache_ttl => 10
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "dnsfilter03"
document_type => "details"
}
stdout{}
}
Here's the sample data passing through logstash:
{
"#version" => "1",
"path" => "/home/paulsteven/log_cars/cars_dns.csv",
"website" => "104.28.5.86",
"n" => "n21",
"host" => "smackcoders",
"message" => "www.smackcoders.com,n21",
"#timestamp" => 2019-04-23T10:41:15.680Z
}
In the logstash config file, I want to know about hit_cache_size. What is the use of it. I read the guide of dns filter in th elastic website but unable to figure it out. I added the field in my logstash config but nothing happened. can i get any examples for that. I want to know the use of hit_cache_size. What is the job, it's doing in dns filter
The hit_cache_size allows you to store the result of a successful request, so if you need to run a dns request on the same host will look into the cache instead and only will do a dns lookup if the host is not cached.
If your data has unique hosts then there is no reason to use the hit_cache_size since the hosts only appears once.
The hit_cache_ttl works with the hit_cache_size and says how many seconds the request will be stored in the cache.

Logstash filter to identify address matches

I have a CSV file with customer addresses. I have also an Elasticsearch index with my own addresses. I use Logstash as tool to import the CSV file. I'd like to use a logstash filter to check in my index if the customer address already exists. All I found is the default elasticsearch filter ("Copies fields from previous log events in Elasticsearch to current events") which doesn't look the correct one to solve my problem. Does another filter exist for my problem?
Here my configuration file so far:
input {
file {
path => "C:/import/Logstash/customer.CSV"
start_position => "beginning"
sincedb_path => "NUL"
}
}
filter {
csv {
columns => [
"Customer",
"City",
"Address",
"State",
"Postal Code"
]
separator => ";"
}
}
output {
elasticsearch {
hosts => [ "localhost:9200" ]
index => "customer-unmatched"
}
stdout{}
}
You don't normally have access to the data in Elasticsearch while processing your Logstash event. Consider using a pipeline on an Ingest node

Logstash multiple logs

I am following an online tutorial and have been provided with a cars.csv file and the following Logstash config file. My logstash is running perfectly well and is indexing the CSV as we speak.
The question is, I have another log file (entirely different data) which I need to parse and index into a different index.
How do I add this configuration without restarting logstash?
If above isn't possible and I edit the config file then restart logstash - it won't reindex the entire cars file will it?
If I do 2. How do I format the config for multiple styles of log file.
eg. my new log file looks like this:
01-01-2017 ORDER FAILED: £12.11 Somewhere : Fraud
Existing Config File:
input {
file {
path => "/opt/cars.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
columns =>
[
"maker",
"model",
"mileage",
"manufacture_year",
"engine_displacement",
"engine_power",
"body_type",
"color_slug",
"stk_year",
"transmission",
"door_count",
"seat_count",
"fuel_type",
"date_last_seen",
"date_created",
"price_eur"
]
}
mutate {
convert => ["mileage", "integer"]
}
mutate {
convert => ["price_eur", "float"]
}
mutate {
convert => ["engine_power", "integer"]
}
mutate {
convert => ["door_count", "integer"]
}
mutate {
convert => ["seat_count", "integer"]
}
}
output {
elasticsearch {
hosts => "localhost"
index => "cars"
document_type => "sold_cars"
}
stdout {}
}
Config file for orders.log
input {
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z]+)( : (?<order_failure_reason>[A-Za-z ]+))?"}
}
mutate {
convert => ["order_amount", "float"]
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales"
document_type => "order"
}
stdout {}
}
Disclaimer: I'm a complete newbie. Second day using ELK.
For point 1, either in your logstash.yml file, you can set
config.reload.automatic:true
Or, while executing logstash with conf file, run it like:
bin/logstash -f conf-file-name.conf --config.reload.automatic
After doing either of these settings, you can start your logstash and from now on any change you make in conf file will be reflected back.
2. If above isn't possible and I edit the config file then restart logstash - it won't reindex the entire cars file will it?
If you use sincedb_path => "/dev/null", Logstash won't remember where is has stopped reading a document and will reindex it at each restart. You'll have to remove this line if you wish for Logstash to remember (see here).
3.How do I format the config for multiple styles of log file.
To support multiple style of log files, you can put tags on the file inputs (see https://www.elastic.co/guide/en/logstash/5.5/plugins-inputs-file.html#plugins-inputs-file-tags) and then use conditionals (see https://www.elastic.co/guide/en/logstash/5.5/event-dependent-configuration.html#conditionals) in your file config.
Like this:
file {
path => "/opt/cars.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
tags => [ "csv" ]
}
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
sincedb_path => "/dev/null"
tags => [] "log" ]
}
if csv in [tags] {
...
} else if log in [tags] {
...
}

Logstash Update a document in elasticsearch

Trying to update a specific field in elasticsearch through logstash. Is it possible to update only a set of fields through logstash ?
Please find the code below,
input {
file {
path => "/**/**/logstash/bin/*.log"
start_position => "beginning"
sincedb_path => "/dev/null"
type => "multi"
}
}
filter {
csv {
separator => "|"
columns => ["GEOREFID","COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
}
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
query => "GEOREFID:%{GEOREFID}"
fields => [["JSON_COUNTRY","G_COUNTRY"],
["XML_COUNTRY","D_COUNTRY"]]
}
if [G_COUNTRY] {
mutate {
update => { "D_COUNTRY" => "%{D_COUNTRY}"
}
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
document_id => "%{GEOREFID}"
}
}
We are using the above configuration when we use this the null value field is getting removed instead of skipping null value update.
Data comes from 2 different source. One is from XML file and the other is from JSON file.
XML log format : GEO-1|CD|23|John|892|Canada|31-01-2017|QC|-|-|-|-|-
JSON log format : GEO-1|AS|33|-|-|-|-|-|Mike|123|US|31-01-2017|QC
When adding one log new document will get created in the index. When reading the second log file the existing document should get updated. The update should happen only in the first 5 fields if log file is XML and last 5 fields if the log file is JSON. Please suggest us on how to do this in logstash.
Tried with the above code. Please check and can any one help on how to fix this ?
For the Elasticsearch output to do any action other than index you need to tell it to do something else.
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
document_id => "%{GEOREFID}"
}
This should probably be wrapped in a conditional to ensure you're only updating records that need updating. There is another option, though, doc_as_upsert
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
doc_as_upsert => true
document_id => "%{GEOREFID}"
}
This tells the plugin to insert if it is new, and update if it is not.
However, you're attempting to use two inputs to define a document. This makes things complicated. Also, you're not providing both inputs, so I'll improvise. To provide different output behavior, you will need to define two outputs.
input {
file {
path => "/var/log/xmlhome.log"
[other details]
}
file {
path => "/var/log/jsonhome.log"
[other details]
}
}
filter { [some stuff ] }
output {
if [path] == '/var/log/xmlhome.log' {
elasticsearch {
[XML file case]
}
} else if [path] == '/var/log/jsonhome.log' {
elasticsearch {
[JSON file case]
action => "update"
}
}
}
Setting it up like this will allow you to change the ElasticSearch behavior based on where the event originated.

Resources