I have a sample CSV file in S3 with three columns and no header row. While transferring the data from the S3 CSV to Elasticsearch, I want to give each column a name (in my case id, name, and age for columns 0 to 2 respectively).
Input Sample.csv
1,myname,23
2,myname2,24
The expected output is the following documents in the ES index:
[{
"_index": "user_detail",
"_type": "user_detail_type",
"_id": "1",
"_score": 1.0,
"_source": {
"id": "1",
"name": "myname",
"age": "23"
}
},
{
"_index": "user_detail",
"_type": "user_detail_type",
"_id": "2",
"_score": 1.0,
"_source": {
"id": "2",
"name": "myname2",
"age": "24"
}
}]
The Logstash config I have written so far is:
input {
s3 {
bucket => "users"
region => "us-east-1"
watch_for_new_files => false
prefix => "user.csv"
}
}
filter {
# Need help here
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "user_detail"
document_type => "user_detail_type"
document_id => "%{id}"
}
}
Question:
What should I write in the filter section (or change elsewhere in the config) to map column[0] => id, column[1] => name, column[2] => age when the rows are inserted into Elasticsearch?
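One possible approach is the csv filter, which splits a delimited line and assigns the values to the names listed in columns. The following is a minimal, untested sketch. It assumes each line of the S3 object arrives as one event with the raw text in the message field; the remove_field step is optional and only there to keep the indexed document close to the expected output.

```logstash
filter {
  csv {
    # Raw CSV line is assumed to be in the default "message" field.
    source => "message"
    separator => ","
    # Names are assigned positionally: column 0 => id, 1 => name, 2 => age.
    columns => ["id", "name", "age"]
  }
  mutate {
    # Optional: drop the raw line once it has been split.
    remove_field => ["message"]
  }
}
```

With the id field populated this way, the document_id => "%{id}" setting in the output section should resolve to the value of the first column.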
I've been trying to configure a Logstash pipeline that uses the snmptrap input together with yamlmibdir. Here's the config:
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "events"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
This is the result shown in Kibana (JSON view):
{
  "_index": "logstash-2019.11.18-000001",
  "_type": "_doc",
  "_id": "Y_5zjG4B6M9gb7sxUJwG",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2019-11-21T05:33:07.675Z",
    "tags": [
      "_jsonparsefailure"
    ],
    "1.11.12.13.14.15": "teststring",
    "message": "#<SNMP::SNMPv1_Trap:0x244bf33f @enterprise=[1.2.3.4.5.6], @timestamp=#<SNMP::TimeTicks:0x196a1590 @value=55>, @varbind_list=[#<SNMP::VarBind:0x21f5e155 @name=[1.11.12.13.14.15], @value=\"teststring\">], @specific_trap=99, @source_ip=\"xyz\", @agent_addr=#<SNMP::IpAddress:0x5a5c3c5f @value=\"xC0xC1xC2xC3\">, @generic_trap=6>",
    "host": "xyz"
  },
  "fields": {
    "@timestamp": [
      "2019-11-21T05:33:07.675Z"
    ]
  },
  "sort": [
    1574314387675
  ]
}
As you can see, the message field contains what looks like an array. How can I get at all the fields inside it, and also select those fields for display in Kibana?
PS 1: I still get the _jsonparsefailure tag (visible when selecting the 'Table' view in the expanded document).
PS 2: Even though I use gsub to remove '\' from the expected JSON, why does the result still contain '\'?
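A possible explanation, based only on the output shown above: the message field does not contain JSON. It looks like the Ruby inspect string of the trap object, which would explain why the json filter adds _jsonparsefailure, and the remaining '\' characters are most likely just the JSON escaping of the quotes inside that string. The varbind already appears as its own field ("1.11.12.13.14.15": "teststring"), so one option is to drop the gsub, json, and split steps and work with that field directly. The sketch below is untested; trap_value is a hypothetical field name chosen for illustration.

```logstash
filter {
  # No gsub/json/split here: the message field is not JSON.
  mutate {
    # "trap_value" is a hypothetical, more readable name for the OID field.
    rename => { "1.11.12.13.14.15" => "trap_value" }
  }
}
```

After a rename like this, the field should be selectable in Kibana once the index pattern's field list has been refreshed.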
In my DB, I have data in the format below:
But in Elasticsearch I want to push the data grouped by item type, so that each record in Elasticsearch lists all item names and their values for one item type.
Like this:
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "10",
  "_source": {
    "item_type": "10",
    "fruits": "20",
    "veggies": "32",
    "butter": "11"
  }
}
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "11",
  "_source": {
    "item_type": "11",
    "hair gel": "50",
    "shampoo": "35"
  }
}
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "12",
  "_source": {
    "item_type": "12",
    "tape": "9",
    "10mm screw": "7",
    "blinker fluid": "78"
  }
}
Can I achieve this in Logstash?
I'm new to Logstash, but as far as I understand this can be done in a filter. I'm not sure which filter to use, or whether I have to create a custom filter for it.
Current conf example:
input {
jdbc {
jdbc_driver_library => "ojdbc6.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "myjdbc-configs"
jdbc_user => "dbuser"
jdbc_password => "dbpasswd"
schedule => "* * * * *"
statement => "SELECT * from item_table"
}
}
filter {
## WHAT TO WRITE HERE??
}
output {
elasticsearch {
hosts => [ "http://myeshost/" ]
index => "myindex"
}
}
Kindly suggest. Thank you.
You can achieve this using the aggregate filter plugin. I have not tested the config below, but it should give you an idea.
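filter {
  aggregate {
    # Group rows by item type. Lowercase names assume the jdbc input's
    # default lowercase_column_names => true.
    task_id => "%{item_type}"
    code => "
      map['item_type'] = event.get('item_type')
      # Store the value of each item under its name.
      map[event.get('item_name')] = event.get('item_value')
      # Discard the per-row event; only the aggregated map is emitted.
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3600
    timeout_tags => ['_aggregatetimeout']
  }
}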
Important Caveats for using aggregate filter:
The sql query MUST order the results by Item_Type, so the events are not out of order.
Column names in sql query should match the column names in the filter map[]
You should use ONLY ONE worker thread for aggregations otherwise events may be processed out of sequence and unexpected results will occur.
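To illustrate the first caveat, the input and output sections might look like the sketch below. It is untested and reuses the connection settings and table name from the question; the ORDER BY column item_type and the daily_needs index name are taken from the expected output and are assumptions about the real schema.

```logstash
input {
  jdbc {
    jdbc_driver_library => "ojdbc6.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "myjdbc-configs"
    jdbc_user => "dbuser"
    jdbc_password => "dbpasswd"
    schedule => "* * * * *"
    # Rows of the same item type must arrive next to each other.
    statement => "SELECT * FROM item_table ORDER BY item_type"
  }
}
output {
  elasticsearch {
    hosts => [ "http://myeshost/" ]
    index => "daily_needs"
    # One document per item type, as in the expected output.
    document_id => "%{item_type}"
  }
}
```

For the single-worker caveat, the pipeline can be started with one worker, for example bin/logstash -w 1 -f your.conf, where your.conf stands for the pipeline file.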
I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
"_index": "data",
"_type": "default",
"_id": "vANfNGYB9XD0VZRJUFfy",
"_version": 1,
"_score": null,
"_source": {
"vulnid": "CVE-2018-1000060",
"product": [],
"year": "2018",
"month": "02",
"day": "09",
"hour": "23",
"minute": "29",
"published": "2018-02-09T18:29:02.213-05:00",
},
"sort": [
1538424651203
]
}
My Logstash output section is:
output {
  csv {
    fields => ["_id", "vulnid", "published"]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the metadata field _id into the CSV file?
It makes no difference whether I specify the field as "_id", "@_id", or "@id".
When querying ES with the elasticsearch input, you have to set docinfo => true. It is false by default.
input {
elasticsearch {
hosts => [ your hosts ]
index => "ti"
query => '{your query}'
size => 1000
scroll => "1s"
docinfo => true
schedule => "14 * * * *"
}
}
Logstash cannot get the "_id" field from your input, most likely because you have not set the docinfo option to true.
docinfo includes Elasticsearch document information such as the index, type, and _id in the event. For more information, see https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Configure your input plugin like this:
input {
elasticsearch {
hosts => "hostname"
index => "yourIndex"
query => '{ "query": { "query_string": { "query": "*" } } }' # optional
size => 500 # optional
scroll => "5m" # optional
docinfo => true
}
}
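With docinfo enabled, the document information is not placed in a top-level _id field. A minimal, untested sketch of one way to get it into the CSV is shown below. It assumes the plugin stores the values under [@metadata] (the classic default for docinfo_target; newer plugin versions with ECS compatibility enabled may use a different location), and doc_id is a hypothetical field name introduced for illustration.

```logstash
filter {
  mutate {
    # Copy the document id out of @metadata into a regular field.
    # "doc_id" is a hypothetical name; the source path assumes
    # docinfo_target is "@metadata".
    add_field => { "doc_id" => "%{[@metadata][_id]}" }
  }
}
output {
  csv {
    fields => ["doc_id", "vulnid", "published"]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
```

Copying the value into a regular field keeps the csv output simple, at the cost of one extra field on the event.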
I'd like to import a text file into Elasticsearch. The text file contains 3 values per line. After several hours of struggling, I still haven't got it working. Help is greatly appreciated.
Elasticsearch 5.4.0 with Logstash installed.
Sample data:
username email hash
username email hash
username email hash
username email hash
username email hash
I also built a Python script, but it is too slow:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

i = 1
with open("my2") as fileobject:
    for line in fileobject:
        # Each line holds three space-separated values.
        username, email, pw_hash = line.strip('\n').split(' ')
        body = {"username": username, "email": email, "password": pw_hash}
        # One HTTP request per line, which is why this is slow.
        es.index(index='dbs', doc_type='db1', id=i, body=body)
        i += 1
Edit:
Thanks, it works, but I guess my filter is bad, because I want the document to look like this:
{
  "_index": "logstash-2017.06.01",
  "_type": "db",
  "_id": "AVxinqK5XRvft8kN7Q6M",
  "_version": 1,
  "_score": null,
  "_source": {
    "username": "Marlb0ro",
    "email": "Marlb0ro@site.com",
    "hash": "123456"
  }
}
but it stores the data like this:
{
  "_index": "logstash-2017.06.01",
  "_type": "logs",
  "_id": "AVxinqK5XRvft8kN7Q6M",
  "_version": 1,
  "_score": null,
  "_source": {
    "path": "C:/Users/user/Desktop/user/log.txt",
    "@timestamp": "2017-06-01T07:46:22.488Z",
    "@version": "1",
    "host": "DESKTOP-FNGSJ6C",
    "message": "username email password",
    "tags": [
      "_grokparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1496303182488
    ]
  },
  "sort": [
    1496303182488
  ]
}
Simply put this in a file called grok.conf:
input {
file {
path => "/path/to/your/file.log"
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
grok {
match => {"message" => "%{WORD:username} %{WORD:email} %{WORD:hash}" }
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
}
Then run Logstash with bin/logstash -f grok.conf and you should be ok.
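One caveat about the pattern above: WORD only matches word characters, so an email address such as Marlb0ro@site.com (which contains @ and .) would not match, and that may be the cause of the _grokparsefailure tag shown in the question. A hedged, untested variant that splits on spaces instead uses NOTSPACE; the remove_field list is an assumption about which of the extra fields from the question should be dropped.

```logstash
filter {
  grok {
    # NOTSPACE matches any run of non-whitespace characters,
    # so addresses and hashes with punctuation are accepted.
    match => { "message" => "%{NOTSPACE:username} %{NOTSPACE:email} %{NOTSPACE:hash}" }
  }
  mutate {
    # Optional: drop fields that are not wanted in the indexed document.
    remove_field => ["message", "path", "host"]
  }
}
```

The trade-off is that NOTSPACE performs no validation, so a malformed address is indexed as-is.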
Environment
DB: Sybase
Logstash: 2.2.0 with JDBC Plugin, Elasticsearch Output plugin
SQL Query:
select res.id as 'res.id', res.name as 'res.name', tag.name as 'tag.name'
from Res res, ResTags rt, Tags tag
where res.id *= rt.resrow and rt.tagid *= tag.id
SQL Result:
res.id | res.name | tag.name
0 | result0 | null
0 | result0 | tagA
1 | result1 | tagA
1 | result1 | tagB
2 | result2 | tagA
2 | result2 | tagC
Index Mapping:
{
"mappings": {
"res": {
"properties": {
"id": { "type": "long"},
"name": { "type": "string" },
"tags": {
"type": "nested",
"properties": { "tagname": { "type": "string" }}
}
}
}
}
}
Conf File:
input {
jdbc {
jdbc_driver_library => "jtds-1.3.1.jar"
jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
jdbc_connection_string => "jdbc:jtds:sybase://hostname.com:1234/schema"
jdbc_user => "george"
jdbc_password => "monkey"
jdbc_fetch_size => 100
statement_filepath => "/home/george/sql"
}
}
output {
elasticsearch {
action => "update"
index => "myres"
document_type => "res"
document_id => "%{res.id}"
script_lang => "groovy"
hosts => [ "my.other.host.com:5921" ]
upsert => ' {
"id" : %{res.id},
"name" : "%{res.name}",
"tags" :[{ "tagname": "%{tag.name}" }]
}'
script => '
if (ctx._source.res.tags.containsValue(null)) {
// if null has been added replace it with actual value
ctx._source.res.tags = [{"tagname": "%{tag.name}" }];
} else {
// if you find the tag, then do nothing
if (ctx._source.res.tags.containsValue("%{tag.name}")) {}
else {
// if the value you try to add is not null
if (%{tag.name} != null)
// add it as a new object into the tag array
ctx._source.res.tags += {"tagname": "%{tag.name}"};
}
}
'
}
}
The GOAL is to add the multiple rows returned from the database to ES, concatenating the tags as new objects. (This is a simplified example; add_tag and filters do not do the job, because my real JSON structure is more than two levels deep, with nested objects inside nested objects, and so on.)
The desired outcome after the bulk upload into ES would be:
{
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "myres",
        "_type": "res",
        "_id": 0,
        "_score": 1,
        "_source": {
          "res": {
            "id": 0,
            "name": "result0",
            "tags": [{"tagname": "tagA"}],
            "@version": "2",
            "@timestamp": "2016-xx-yy..."
          }
        }
      },
      {
        "_index": "myres",
        "_type": "res",
        "_id": 1,
        "_score": 1,
        "_source": {
          "res": {
            "id": 1,
            "name": "result1",
            "tags": [{"tagname": "tagA"}, {"tagname": "tagB"}],
            "@version": "2",
            "@timestamp": "2016-xx-yy..."
          }
        }
      },
      {
        "_index": "myres",
        "_type": "res",
        "_id": 2,
        "_score": 1,
        "_source": {
          "res": {
            "id": 2,
            "name": "result2",
            "tags": [{"tagname": "tagA"}, {"tagname": "tagC"}],
            "@version": "2",
            "@timestamp": "2016-xx-yy..."
          }
        }
...
ISSUE: If the script in the output section of the conf is not commented out, the error below is raised. If the script is left out, only the first tag of each result is imported (as expected) and the later ones are not.
It looks like the script is not working within the elasticsearch output.
ERROR message:
[400] {"error":"ActionRequestValidationException[Validation Failed:
1: script or doc is missing;
2: script or doc is missing;
3: script or doc is missing;],"status":400]} {:class=> ... bla bla ...}
NOTES
To avoid wasting people's time: doc_as_upsert => true also does not work as expected. It just keeps updating / overwriting, so only the latest row from the DB is kept.
Also, the river plugin for JDBC to ES does not support nested-of-nested structures, so that does not work either.
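An alternative that avoids scripted updates is to collect the tags in Logstash before indexing, using the aggregate filter. The sketch below is untested and rests on several assumptions: the SQL statement aliases the columns as res_id, res_name, and tag_name (hypothetical names without dots) and orders the rows by res.id; the pipeline runs with a single worker; and the Logstash version provides the event.get API (5.0 or later, whereas 2.2.0 used event['field']). The host and index are taken from the question.

```logstash
filter {
  aggregate {
    # One aggregation map per result row id.
    task_id => "%{res_id}"
    code => "
      map['id'] ||= event.get('res_id')
      map['name'] ||= event.get('res_name')
      map['tags'] ||= []
      tag = event.get('tag_name')
      # Skip the null tag produced by the outer join.
      map['tags'] << { 'tagname' => tag } unless tag.nil?
      # Discard the per-row event; only the aggregated map is emitted.
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 5
  }
}
output {
  elasticsearch {
    hosts => [ "my.other.host.com:5921" ]
    index => "myres"
    document_id => "%{id}"
  }
}
```

Each emitted event then carries id, name, and a tags array of objects at the top level, matching the index mapping, so a plain index action should be enough and no upsert script is involved.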