Import text file in Elasticsearch

I'd like to import a text file into Elasticsearch.
The text file contains a single (hash) value per line.
After several hours of struggling, I still haven't managed to get it working.
Help is greatly appreciated.
I'm running Elasticsearch 5.1.2 with Logstash installed.
Sample data:
2d75cc1bf8e57872781f9cd04a529256
00f538c3d410822e241486ca061a57ee
3f066dd1f1da052248aed5abc4a0c6a1
781770fda3bd3236d0ab8274577dddde
86b6c59aa48a69e16d3313d982791398
I need just one index, 'hashes', with type 'md5'.

You can use duckimport; it's similar to Logstash but easier to use. (Disclosure: I'm the developer.)

Well, if you have Logstash, import it with Logstash.
Example config:
input {
  file {
    path => "/path/myfile"
    start_position => "beginning"
    type => "md5"
  }
}
output {
  elasticsearch {
    index => "hashes"
  }
}
This assumes you run Logstash on the same instance as Elasticsearch.
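For reference, here is a slightly more explicit sketch of the same idea. The file path and hosts value are placeholders, and sincedb_path => "/dev/null" is just a convenience so the file is re-read on every test run; document_type was still a valid elasticsearch-output option in Logstash 5.x and pins the type to md5 instead of relying on the event type.
input {
  file {
    # Placeholder path: a text file with one md5 hash per line
    path => "/path/to/hashes.txt"
    start_position => "beginning"
    # Re-read the file on every run while testing
    sincedb_path => "/dev/null"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "hashes"
    document_type => "md5"
  }
}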

Related

Have @timestamp in document as epoch-millis when using logstash

In a PoC that's being done in our project, we are trying out Logstash instead of our own Java-based indexing module to push data to Elasticsearch. The incoming JSON data doesn't have an @timestamp field, so Logstash adds that field in ISO format. But we already have a specific mapping for that ES index, and it requires us to push @timestamp in epoch-millis format.
I've tried playing with Ruby filters to convert @timestamp to epoch-millis, but no luck so far. Is there any way we can ingest records into ES through Logstash with @timestamp in epoch-millis format?
I'm using Logstash 6.5.4 and ES 6.2.2.
Update: after trying out the suggestion in the answer, my conf file looks like this:
input { stdin { } }
filter {
  ruby {
    code => "
      epoch_ts = event.timestamp.time.localtime.strftime('%s').to_i
      event.set( 'epoch', epoch_ts )
    "
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    script_type => "inline"
    script => 'ctx._source.@timestamp = params.event.get("epoch")'
  }
  stdout { codec => rubydebug }
}
But it still doesn't work: the @timestamp value doesn't change at all. Now I also need to remove that extra epoch field.
This Ruby code should work for you:
ruby {
  code => "
    epoch_ts = event.timestamp.time.localtime.strftime('%s').to_f
    event.set( '@timestamp', epoch_ts )
  "
}
After quite a while of searching the web, I finally gave up on this approach. Instead, I forced ES to return @timestamp in epoch_millis using the docvalue_fields approach.
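For context, the docvalue_fields idea is to ask Elasticsearch for the field as a doc value at query time instead of rewriting @timestamp at ingest time. A minimal sketch of such a request, assuming the index name myindex from the config above; depending on the ES version, date doc values either come back as epoch millis by default or accept an explicit per-field format, so treat this as an illustration rather than the exact query used:
GET myindex/_search
{
  "query": { "match_all": {} },
  "docvalue_fields": [ "@timestamp" ]
}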

reading .gz files using logstash

I am trying to use Logstash 5.5 to analyze archived (.gz) files generated every minute. Each .gz file contains a csv file. My .conf file looks like this:
input {
  file {
    type => "gzip"
    path => [ "C:\data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}
filter {
  csv {
    separator => ","
    columns => ["COL1","COL2","COL3","COL4","COL5","COL6","COL7"]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "mydata"
    document_type => "zipdata"
  }
  stdout {}
}
Initially I was getting an error about the missing gzip_lines plugin, so I installed it. After installing the plugin, Logstash says "Successfully started Logstash API endpoint", but nothing gets indexed: the Logstash logs show no data being indexed into Elasticsearch, and when I try to get the index in Kibana, it is not available there. So Logstash is not putting data into Elasticsearch.
Maybe I am using the wrong configuration. Please suggest the correct way of doing this.
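No answer is quoted for this one, but two details in that input section are worth checking: the file input documentation recommends forward slashes even for Windows paths, and as written the glob has no directory separator before the *, so it would not match files inside C:\data. A sketch of the input block with those points addressed (the exact path and sincedb location are assumptions, and this alone does not guarantee that gzip_lines decodes the archives):
input {
  file {
    type => "gzip"
    # Forward slashes work for Windows paths in the file input
    path => [ "C:/data/*.gz" ]
    start_position => "beginning"
    # An explicit sincedb file keeps re-runs predictable
    sincedb_path => "C:/data/sincedb_gz"
    codec => gzip_lines
  }
}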

use field in index name for elasticsearch plugin logstash

I am trying to have the Elasticsearch index based on a field, so I get an index for each source (allowing for secure access to each index).
I tried something along the lines of
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => [SERVER]"-%{+YYYY.MM.dd}"
  }
}
as well as
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "[SERVER]-%{+YYYY.MM.dd}"
  }
}
and neither works: the first errors out, and the second tries to create the index with the literal [SERVER] in it and then errors because of the uppercase characters. This might not be supported, as I can't find it anywhere in the docs, but I was wondering if anyone has gotten something like this working in their own ELK stack?
The right syntax for this is "%{SERVER}-%{+YYYY.MM.dd}"
According to the documentation :
[The index to write] can be dynamic using the %{foo} syntax.
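Tying that back to the uppercase error mentioned in the question: Elasticsearch index names must be lowercase, so if the SERVER field can contain uppercase characters it helps to lowercase it before it reaches the output. A sketch of what that can look like (the SERVER field name comes from the question; the rest is illustrative):
filter {
  mutate {
    # Elasticsearch index names must be lowercase
    lowercase => [ "SERVER" ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    # sprintf reference to the event's SERVER field plus a daily date suffix
    index => "%{SERVER}-%{+YYYY.MM.dd}"
  }
}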

Logstash configuration issue

I am new to Logstash and Elasticsearch, and I would like to index .pdf or .doc files in Elasticsearch via Logstash.
I configured Logstash with the multiline codec to get each file into a single message in Elasticsearch. Below is my configuration file:
input {
  file {
    path => "D:/BaseCV/*"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => ""
      what => "previous"
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {
    hosts => "localhost"
    index => "cvindex"
    document_type => "file"
  }
}
When Logstash starts, the first file I add comes through to Elasticsearch as a single message, but the following files are spread over several messages. I would like a one-to-one correspondence: 1 file = 1 message.
Is this possible? What should I change in my setup to solve the problem?
Thank you for your feedback.
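No answer is quoted here, but one common way to make the multiline codec accumulate a whole file into one event is a pattern that matches every line combined with what => "previous", together with a higher line limit; by default the codec flushes after max_lines (500), which would explain later files being split across several messages. A sketch under those assumptions; note that .pdf and .doc are binary formats, so even a single merged event will not contain clean text:
input {
  file {
    path => "D:/BaseCV/*"
    codec => multiline {
      # "^" matches the start of every line, so each line is appended to the previous one
      pattern => "^"
      what => "previous"
      # Default is 500 lines; raise it so large files are not split into several events
      max_lines => 100000
      # Flush the accumulated event after 2 seconds of inactivity
      auto_flush_interval => 2
    }
  }
}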

how to feed a json file to elasticsearch so it can be used by kibana?

I have to show my log file (a json file) on a Kibana dashboard. I configured Elasticsearch and Kibana.
I tried setting path.data in elasticsearch.yml to C:\Users\Rajesh\Desktop\temp (where my logs are), but when I search for any string from the dashboard it returns 0 results.
Could anyone please guide me? Thanks in advance.
You can use Logstash to read your log file and then output to Elasticsearch, then use Kibana to view it.
Logstash has a lot of plugins to help you do this.
Here is an example for your reference. This is the Logstash configuration: we read all the json data from a file and then output to Elasticsearch.
input {
  file {
    path => "/path/to/your/json/file"
    codec => json_lines {
    }
  }
}
output {
  elasticsearch {
    cluster => "abc"
  }
}
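One caveat worth adding: the cluster option in that answer belongs to much older releases of the elasticsearch output; current Logstash versions take hosts instead. A minimal sketch of the same idea for a recent Logstash, assuming the file contains one JSON object per line (the path and index name are placeholders):
input {
  file {
    # Placeholder path: one JSON object per line
    path => "/path/to/your/json/file"
    start_position => "beginning"
    # The file input already splits on newlines, so the plain json codec is enough
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mylogs"
  }
}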
