I'd like to import a text file in Elasticsearch. The text file contains 3 values per line. After spending several hours of struggling, I didn't get it done. Help is greatly appreciated.
Elasticsearch 5.4.0 with Logstash installed.
Sample data:
username email hash
username email hash
username email hash
username email hash
username email hash
also built a python script but its too slow:
import requests
import json
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
i = 1
with open("my2") as fileobject:
for line in fileobject:
username, email, hash = line.strip('\n').split(' ')
body = {"username": username, "email": email, "password": hash}
es.index(index='dbs', doc_type='db1', id=i, body=body)
i += 1
edit:
Thanks its work but i guess my filter is bad because i want it to look like this:
{
"_index": "logstash-2017.06.01",
"_type": "db",
"_id": "AVxinqK5XRvft8kN7Q6M",
"_version": 1,
"_score": null,
"_source": {
"username": "Marlb0ro",
"email": "Marlb0ro#site.com",
"hash": "123456",
}
and it put the data like this:
{
"_index": "logstash-2017.06.01",
"_type": "logs",
"_id": "AVxinqK5XRvft8kN7Q6M",
"_version": 1,
"_score": null,
"_source": {
"path": "C:/Users/user/Desktop/user/log.txt",
"#timestamp": "2017-06-01T07:46:22.488Z",
"#version": "1",
"host": "DESKTOP-FNGSJ6C",
"message": "username email password",
"tags": [
"_grokparsefailure"
]
},
"fields": {
"#timestamp": [
1496303182488
]
},
"sort": [
1496303182488
]
}
Simply put this in a file called grok.conf:
input {
file {
path => "/path/to/your/file.log"
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
grok {
match => {"message" => "%{WORD:username} %{WORD:email} %{WORD:hash}" }
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
}
Then run Logstash with bin/logstash -f grok.conf and you should be ok.
Related
I have sample csv file in s3 with 3 column without any header. But during data transfer from s3 csv to elasticsearch, I want to give some name to each column (in my case id, name, age to column 0 to 2 respectively).
Input Sample.csv
1,myname,23
2,myname2,24
Expected Output should be following doc in ES index:
[{
"_index": "user_detail",
"_type": "user_detail_type",
"_id": "1",
"_score": 1.0,
"_source": {
"id": "1",
"name": "myname",
"age": "23"
}
},
{
"_index": "user_detail",
"_type": "user_detail_type",
"_id": "2",
"_score": 1.0,
"_source": {
"id": "2",
"name": "myname2",
"age": "24"
}
}]
Logstash config that I have written is:
input {
s3 {
bucket => "users"
region => "us-east-1"
watch_for_new_files => false
prefix => "user.csv"
}
}
filter {
// Need help here
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "user_detail"
document_type => "user_detail_type"
document_id => "%{id}"
}
}
Doubt:
What should I write in filter section or any change in config to convert column[0] => id, column[1] => name, column[2] => age during Elasticsearch insertion.
I've been trying to configure a logstash pipeline with input type is snmptrap along with yamlmibdir. Here's the code
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "evetns"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
and the result shown in Kibana (JSON format)
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f #enterprise=[1.2.3.4.5.6], #timestamp=#<SNMP::TimeTicks:0x196a1590 #value=55>, #varbind_list=[#<SNMP::VarBind:0x21f5e155 #name=[1.11.12.13.14.15], #value=\"teststring\">], #specific_trap=99, #source_ip=\"xyz\", #agent_addr=#<SNMP::IpAddress:0x5a5c3c5f #value=\"xC0xC1xC2xC3\">, #generic_trap=6>",
"host": "xyz"
},
"fields": {
"#timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see in the message field, it's an array so how can I get all the field inside the array. also able to select these field to display on Kibana.
ps1. still got tags _jsonparsefailure if select type 'Table' in Expanded document
ps2. even if using gsub for remove '\' from expected json result, why still got an result with '\' ?
This is my json log file. I'm trying to store the file to my elastic-Search through my logstash.
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":
["Action", "Adventure", "Sci-Fi"] }
after storing the data into the elasticSearch, my results is as follow
{
"_index": "filebeat-6.2.4-2018.11.09",
"_type": "doc",
"_id": "n-J39mYB6zb53NvEugMO",
"_score": 1,
"_source": {
"#timestamp": "2018-11-09T03:15:32.262Z",
"source": "/Users/jinwoopark/Jin/json_files/testJson.log",
"offset": 106,
"message": """{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }""",
"id": "%{id}",
"#version": "1",
"host": "Jinui-MacBook-Pro.local",
"tags": [
"beats_input_codec_plain_applied"
],
"prospector": {
"type": "log"
},
"title": "%{title}",
"beat": {
"name": "Jinui-MacBook-Pro.local",
"hostname": "Jinui-MacBook-Pro.local",
"version": "6.2.4"
}
}
}
What I'm trying to do is that,
I want to store only "genre value" into the message field, and store other values(ex id, title) into extra fields(the created fields, which is id and title field). but the extra fields were stored with empty values(%{id}, %{title}). It seems like I need to modify my logstash json filter, but here I need your help.
my current configuration of logstash is as follow
input {
beats {
port => 5044
}
}
filter {
json {
source => "genre" //want to store only genre (from json log) into message field
}
mutate {
add_field => {
"id" => "%{id}" // want to create extra field for id value from log file
"title" => "%{title}" // want to create extra field for title value from log file
}
}
date {
match => [ "timestamp", "dd/MM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug
}
}
When you tell the json filter that the source is genre, it should ignore the rest of the document, which would explain why you don't get an id or title.
Seems like you should parse the entire json document, and use the mutate->replace plugin to move the contents of genre to message.
Environment
DB: Sybase
Logstash: 2.2.0 with JDBC Plugin, Elasticsearch Output plugin
SQL Query:
select res.id as 'res.id', res.name as 'res.name', tag.name as 'tag.name'
from Res res, ResTags rt, Tags tag
where res.id *= rt.resrow and rt.tagid *= tag.id
SQL Result:
res.id | res.name | tag.name
0 | result0 | null
0 | result0 | tagA
1 | result1 | tagA
1 | result1 | tagB
2 | result2 | tagA
2 | result2 | tagC
Index Mapping:
{
"mappings": {
"res": {
"properties": {
"id": { "type": "long"},
"name": { "type": "string" },
"tags": {
"type": "nested",
"properties": { "tagname": { "type": "string" }}
}
}
}
}
Conf File:
input {
jdbc {
jdbc_driver_library => "jtds-1.3.1.jar"
jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
jdbc_connection_string => "jdbc:jtds:sybase://hostname.com:1234/schema"
jdbc_user => "george"
jdbc_password => "monkey"
jdbc_fetch_size => 100
statement_filepath => "/home/george/sql"
}
}
output {
elasticsearch {
action => "update"
index => "myres"
document_type => "res"
document_id => "%{res.id}"
script_lang => "groovy"
hosts => [ "my.other.host.com:5921" ]
upsert => ' {
"id" : %{res.id},
"name" : "%{res.name}",
"tags" :[{ "tagname": "%{tag.name}" }]
}'
script => '
if (ctx._source.res.tags.containsValue(null)) {
// if null has been added replace it with actual value
cts._source.res.tags = [{"tagname": "%{tag.name}" }];
else {
// if you find the tag, then do nothing
if (ctx._source.res.tags.containsValue("%{tag.name}")) {}
else {
// if the value you try to add is not null
if (%{tag.name} != null)
// add it as a new object into the tag array
ctx._source.res.tags += {"tagname": "%{tag.name}"};
}
}
'
}
}
The GOAL is to add the multiple rows returned from the database into ES, concatenating the tags as new objects (this is simplified example, so add_tag and filters do not do the job, as I have json structure deeper than 2 levels (nested of nested, etc))
The desired outcome after the bulk upload into ES would be:
{
"hits": {
"total": 3,
"max_score": 1,
"hits": [ {
"_index": "myres",
"_type": "res",
"_id": 0,
"_score": 1,
"_source": {
"res": {
"id":0,
"name": "result0",
"tags": [{"tagname": "tagA"}],
"#version": "2",
"#timestamp": "2016-xx-yy..."
}
},{
"_index": "myres",
"_type": "res",
"_id": 1,
"_score": 1,
"_source": {
"res": {
"id":1,
"name": "result1",
"tags": [{"tagname": "tagA"},{"tagname": "tagB"}],
"#version": "2",
"#timestamp": "2016-xx-yy..."
}
}{
"_index": "myres",
"_type": "res",
"_id": 2,
"_score": 1,
"_source": {
"res": {
"id":2,
"name": "result2",
"tags": [{"tagname": "tagA"},{"tagname": "tagC"],
"#version": "2",
"#timestamp": "2016-xx-yy..."
}
}
}
...
ISSUE: if in the conf, output section the script is not commented out, the below error pops out. If the script is not included, then only the initial tags (as expected) are imported, and the second ones are not.
It looks like script is not working within elasticsearch output.
ERROR message:
[400] {"error":"ActionRequestValidationException[Validation Failed:
1: script or doc is missing;
2: script or doc is missing;
3: script or doc is missing;],"status":400]} {:class=> ... bla bla ...}
NOTES
To avoid wasting peoples' time, doc_as_upsert => true also does not work as expected. It just keeps on updating / overwriting and just keeps the latest row of the db.
Also, the river plugin for jdbc to ES does not support nested of nested structure so that does not work eithe
After processing data with: input | filter | output > ElasticSearch the format it's get stored in is somewhat like:
"_index": "logstash-2012.07.02",
"_type": "stdin",
"_id": "JdRaI5R6RT2do_WhCYM-qg",
"_score": 0.30685282,
"_source": {
"#source": "stdin://dist/",
"#type": "stdin",
"#tags": [
"tag1",
"tag2"
],
"#fields": {},
"#timestamp": "2012-07-02T06:17:48.533000Z",
"#source_host": "dist",
"#source_path": "/",
"#message": "test"
}
I filter/store most of the important information in specific fields, is it possible to leave out the default fields like: #source_path and #source_host? In the near future it's going to store 8 billion logs/month and I would like to run some performance tests with this default fields excluded (I just don't use these fields).
This removes fields from output:
filter {
mutate {
# remove duplicate fields
# this leaves timestamp from message and source_path for source
remove => ["#timestamp", "#source"]
}
}
Some of that will depend on what web interface you are using to view your logs. I'm using Kibana, and a customer logger (c#) that indexes the following:
{
"_index": "logstash-2013.03.13",
"_type": "logs",
"_id": "n3GzIC68R1mcdj6Wte6jWw",
"_version": 1,
"_score": 1,
"_source":
{
"#source": "File",
"#message": "Shalom",
"#fields":
{
"tempor": "hit"
},
"#tags":
[
"tag1"
],
"level": "Info"
"#timestamp": "2013-03-13T21:47:51.9838974Z"
}
}
This shows up in Kibana, and the source fields are not there.
To exclude certain fields you can use prune filter plugin.
filter {
prune {
blacklist_names => [ "#timestamp", "#source" ]
}
}
Prune filter is not a logstash default plugin and must be installed first:
bin/logstash-plugin install logstash-filter-prune