logstash remove_field not working when uploading csv to elasticsearch

I'm using elasticsearch, kibana and logstash 6.0.1.
I want to upload CSV data to Elasticsearch via Logstash while removing some fields (path, @timestamp, @version, host and message). My logstash.conf and emp.csv files are shown below. The upload works if I don't use the remove_field option, but I need it. Furthermore, the index was not created.
logstash.conf:
input {
  file {
    path => "e:\emp.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["code","color"]
    remove_field => ["path", "@timestamp", "@version", "host", "message"]
  }
  mutate { convert => ["code", "string"] }
  mutate { convert => ["color", "string"] }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "emp5"
    user => "elastic"
    password => "password"
  }
  stdout {}
}
emp.csv:
1,blue
2,red
What is missing in this case?

The fields you are trying to delete are not part of your CSV data; they are added to the event by Logstash itself. Try removing them in a separate mutate filter instead, for example the path and host fields:
(...)
filter {
  csv {
    separator => ","
    columns => ["code","color"]
  }
  mutate {
    remove_field => ["path", "host"]
  }
}
(...)
And for information: if the path and/or host field doesn't exist, there's no problem. The mutate plugin removes a field if it exists and simply does nothing if it doesn't.
Edit:
I have tested it on a fresh Elastic Stack:
You can delete the index with:
curl -X DELETE "localhost:9200/emp5"
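If the index exists, Elasticsearch should acknowledge the deletion with a response like:
{
  "acknowledged" : true
}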
Also note that with your current config Logstash will read the file only once.
You can change that behaviour by adding sincedb_path => "/dev/null"
(or sincedb_path => "NUL" on Windows) inside the input file section:
input {
  file {
    (...)  # here
  }
}
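Putting the pieces together, a complete logstash.conf could look like this (a sketch assembled from the snippets above; the sincedb_path value assumes Windows, use "/dev/null" otherwise, and you can add "message" to the remove_field list the same way):
input {
  file {
    path => "e:\emp.csv"
    start_position => "beginning"
    sincedb_path => "NUL"  # re-read the file on every run
  }
}
filter {
  csv {
    separator => ","
    columns => ["code","color"]
  }
  mutate {
    remove_field => ["path", "host"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "emp5"
    user => "elastic"
    password => "password"
  }
  stdout {}
}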
Then, after Logstash has run, verify the result with:
curl -X GET "localhost:9200/emp5?pretty"
{
  "emp5" : {
    "aliases" : { },
    "mappings" : {
      "doc" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "@version" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "code" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "color" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "blocks" : {
          "read_only_allow_delete" : "true"
        },
        "provided_name" : "emp5",
        "creation_date" : "1576099826712",
        "number_of_replicas" : "1",
        "uuid" : "reXYzqPgQryYcASoov9l5A",
        "version" : {
          "created" : "6080599"
        }
      }
    }
  }
}
As you can see, there are no host or path fields.

Related

match_only_text fields do not support sorting and aggregations

I would like to count the occurrences of each message on a field of type match_only_text and sort the result. Using a DSL query, the output needs to look like this:
{
  "Text message 1": 615,
  "Text message 2": 568,
  ...
}
So I tried this in Kibana:
GET my_index_name/_search?size=0
{
  "aggs": {
    "type_promoted_count": {
      "cardinality": {
        "field": "message"
      }
    }
  }
}
However, I get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "match_only_text fields do not support sorting and aggregations"
}
I am interested in the field "message"; this is its mapping:
"message" : {
"type" : "match_only_text"
}
This is a part of the index mapping:
"mappings" : {
"_meta" : {
"package" : {
"name" : "system"
},
"managed_by" : "ingest-manager",
"managed" : true
},
"_data_stream_timestamp" : {
"enabled" : true
},
"dynamic_templates" : [
{
"strings_as_keyword" : {
"match_mapping_type" : "string",
"mapping" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
],
"date_detection" : false,
"properties" : {
"#timestamp" : {
"type" : "date"
}
.
.
.
"message" : {
"type" : "match_only_text"
},
"process" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 1024
},
"pid" : {
"type" : "long"
}
}
},
"system" : {
"properties" : {
"syslog" : {
"type" : "object"
}
}
}
}
}
}
}
Please help.
Yes, by design, match_only_text is of the text field type family, hence you cannot aggregate on it.
You need to:
A. create a message.keyword sub-field in your mapping of type keyword:
PUT my_index_name/_mapping
{
  "properties": {
    "message" : {
      "type" : "match_only_text",
      "fields": {
        "keyword": {
          "type" : "keyword"
        }
      }
    }
  }
}
B. update the whole index (using _update_by_query) so the sub-field gets populated:
POST my_index_name/_update_by_query?wait_for_completion=false
Then, depending on the size of your index, call GET _tasks?actions=*byquery&detailed regularly to check the progress of the task.
C. run the aggregation on that sub-field.
POST my_index_name/_search
{
  "size": 0,
  "aggs": {
    "type_promoted_count": {
      "cardinality": {
        "field": "message.keyword"
      }
    }
  }
}
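Note that cardinality returns the number of distinct messages. If you want a count per distinct message, as in your example output, a terms aggregation on the same sub-field may be closer to what you need (a sketch; the aggregation name and size limit are illustrative):
POST my_index_name/_search
{
  "size": 0,
  "aggs": {
    "messages_by_count": {
      "terms": {
        "field": "message.keyword",
        "size": 100
      }
    }
  }
}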

How to perform `sum` and `avg` aggs even if the mapping of the field is of `text` and `keyword` types?

I'm trying to perform sum and avg aggregations in my Elasticsearch query. Everything works perfectly fine, but I've encountered a problem: I want to apply the aforementioned aggs to nested fields that are of text / keyword types.
They are mapped that way because we use the keyword analyzer in the search API when these specific nested fields and subfields are required.
Here's my mapping:
"eng" : {
"type" : "nested",
"properties" : {
"date_updated" : {
"type" : "long"
},
"soc_angry_count" : {
"type" : "float"
},
"soc_comment_count" : {
"type" : "float"
},
"soc_dislike_count" : {
"type" : "float"
},
"soc_eng_score" : {
"type" : "float"
},
"soc_er_score" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"soc_haha_count" : {
"type" : "float"
},
"soc_kf_score" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"soc_like_count" : {
"type" : "float"
},
"soc_love_count" : {
"type" : "float"
},
"soc_mm_score" : {
"type" : "float"
},
"soc_sad_count" : {
"type" : "float"
},
"soc_save_count" : {
"type" : "float"
},
"soc_share_count" : {
"type" : "float"
},
"soc_te_score" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"soc_view_count" : {
"type" : "float"
},
"soc_wow_count" : {
"type" : "float"
}
}
}
Please focus on the soc_er_score, soc_kf_score and soc_te_score subfields of the eng nested field...
When I'm performing the following aggs, it's working fine:
'aggs' => [
    'ENGAGEMENT' => [
        'nested' => [
            'path' => "eng"
        ],
        'aggs' => [
            'ARTICLES' => [
                // Use a histogram because date_updated is of
                // long data type; interval 86400 represents 1 day
                'histogram' => [
                    'field' => "eng.date_updated",
                    'interval' => "86400"
                ],
                'aggs' => [
                    'SUM' => [
                        'sum' => [
                            'field' => "eng.soc_like_score"
                        ]
                    ]
                ]
            ]
        ]
    ]
]
Calling the search API with this query returns the expected sums.
BUT if the query is like this:
'aggs' => [
    'ENGAGEMENT' => [
        'nested' => [
            'path' => "eng"
        ],
        'aggs' => [
            'ARTICLES' => [
                // Use a histogram because date_updated is of
                // long data type; interval 86400 represents 1 day
                'histogram' => [
                    'field' => "eng.date_updated",
                    'interval' => "86400"
                ],
                'aggs' => [
                    'SUM' => [
                        'sum' => [
                            'field' => "eng.soc_te_score"
                        ]
                    ]
                ]
            ]
        ]
    ]
]
This time, however, the search API fails instead of returning the sums.
SOLUTIONS PERFORMED
SOLUTION 1 (for confirmation)
After reading some thorough forum discussions, I learned that Java-based parsing is available, but it doesn't seem to work on my end.
Here's my revised query:
'aggs' => [
    'SUM' => [
        'sum' => [
            'field' => "Float.parseFloat(eng.soc_te_score).value"
        ]
    ]
]
But unfortunately, it responds with null or 0 values.
By the way, I'm using Laravel as my web framework, which is why my debugger / error output looks the way it does.
Requesting your help please, thank you in advance!
I would create another numeric subfield in addition to the keyword one. So you can use the keyword field for search and the numeric one for aggregations.
For example, modify your mapping like this:
"soc_er_score" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"numeric" : {
"type" : "long",
"ignore_malformed": true
}
}
},
You can then use:
soc_er_score for full text search
soc_er_score.keyword for sorting, terms aggregations and exact matching
soc_er_score.numeric for sum and other metric aggregations.
If you already have data in your index, simply modify the mapping by adding the new sub-field, like this:
PUT my-index/_mapping/doc
{
  "properties": {
    "eng": {
      "type": "nested",
      "properties": {
        "soc_er_score" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            },
            "numeric" : {
              "type" : "long",
              "ignore_malformed": true
            }
          }
        }
      }
    }
  }
}
And then call the update by query endpoint in order to pick up the new mapping:
POST my-index/_update_by_query
When done, the eng.soc_er_score.numeric field will be indexed for all your existing documents.
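With the numeric sub-field in place, the earlier nested sum aggregation can target it directly, for example (a sketch in plain query DSL; the aggregation names are illustrative):
POST my-index/_search
{
  "size": 0,
  "aggs": {
    "ENGAGEMENT": {
      "nested": { "path": "eng" },
      "aggs": {
        "SUM": {
          "sum": { "field": "eng.soc_er_score.numeric" }
        }
      }
    }
  }
}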
I was able to solve my problem with the following simple script:
"aggs" = [
'SUM' => [
'sum' => [
"script" => "Float.parseFloat(doc['eng.soc_te_score.keyword'].value)"
]
]
];
With this, even though my nested fields are of text and keyword type, I can still compute their average and sum.

Ingesting data from Spark to Elasticsearch with index template

In our existing design we use Logstash to fetch data from Kafka (JSON) and put it into Elasticsearch.
We also apply an index template mapping when inserting data from Logstash into ES; this is done by setting the 'template' property of the elasticsearch output plugin of Logstash, e.g.:
output {
  elasticsearch {
    template => "elasticsearch-template.json"  # template file path
    hosts => "localhost:9200"
    template_overwrite => true
    manage_template => true
    codec => plain
  }
}
elasticsearch-template.json looks like this:
{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "3s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true },
      "dynamic_templates" : [ {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fields" : {
              "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256, "doc_values" : true }
            }
          }
        }
      } ],
      "properties" : {
        "@version" : { "type" : "string", "index" : "not_analyzed" },
        "geoip" : {
          "type" : "object",
          "dynamic" : true,
          "properties" : {
            "location" : { "type" : "geo_point" }
          }
        }
      }
    }
  }
}
Now we are going to replace Logstash with Apache Spark, and I want to apply the index template in a similar way when inserting data into ES from Spark.
I am using the elasticsearch-spark_2.11 library for this implementation.
Thanks.
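One approach worth noting: index templates are applied server-side by Elasticsearch whenever a newly created index matches the template pattern, regardless of which client writes the data. So you can PUT the template once via the REST API before the Spark job runs, and any logstash-* index that the elasticsearch-spark connector then creates will pick up these mappings (a sketch; assumes a cluster version compatible with the template syntax above):
curl -XPUT 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d @elasticsearch-template.json
After that, writing from Spark with rdd.saveToEs("logstash-2018.01.01/doc") (index name illustrative) needs no template-specific configuration on the Spark side.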

Logstash issues in creating index: missing .raw field in Kibana

I have written a Logstash conf file for reading logs. If I use the default index, that is logstash-*, I can see the .raw fields in Kibana. However, if I create a new index in the Logstash conf file like this:
output {
  elasticsearch {
    hosts => "localhost"
    index => "batchjob-*"
  }
}
Then the new index doesn't get the .raw fields. Is there any way to solve this? Great thanks.
The raw fields are created by a specific index template that the Logstash elasticsearch output creates in Elasticsearch.
What you can do is simply copy that template to a file named batchjob.json and change the template name to batchjob-* (see below):
{
  "template" : "batchjob-*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true, "omit_norms" : true },
      "dynamic_templates" : [ {
        "message_field" : {
          "match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" }
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" },
            "fields" : {
              "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
            }
          }
        }
      } ],
      "properties" : {
        "@timestamp" : { "type" : "date" },
        "@version" : { "type" : "string", "index" : "not_analyzed" },
        "geoip" : {
          "dynamic" : true,
          "properties" : {
            "ip" : { "type" : "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "float" },
            "longitude" : { "type" : "float" }
          }
        }
      }
    }
  }
}
Then you can modify your elasticsearch output like this:
output {
  elasticsearch {
    hosts => "localhost"
    index => "batchjob-*"
    template_name => "batchjob"
    template => "/path/to/batchjob.json"
  }
}
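You can then check that the template was installed with (a quick verification; assumes the same local instance):
curl -XGET 'localhost:9200/_template/batchjob?pretty'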

How to map geoip field in logstash with elasticsearch in order to display it in tile map of Kibana4

I'd like to display geoip fields in the tile map of Kibana4.
Using the standard / automatic Logstash geoip mapping to Elasticsearch, it all works fine.
However, when creating a non-standard geoip field, I am not quite sure how to customize the elasticsearch-template.json in Logstash so that this field is represented correctly in Elasticsearch and can be chosen in Kibana4 for tile map creation.
Sure, customizing the standard template is not the best way; it would be better to create a custom template and point to it in the elasticsearch output of logstash.conf. I just quickly wanted to check how the mapping has to be defined, so I modified the standard template.
My logstash.conf:
input {
  tcp {
    port => 514
    type => syslog
  }
  udp {
    port => 514
    type => syslog
  }
}
filter {
  # The standard geoip field is automatically mapped by logstash to
  # elasticsearch via the elasticsearch-template.json file
  geoip { source => "host" }
  grok {
    match => [
      "message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} SPort:%{INT:sport} Dst:%{IPV4:dst} DPort:%{INT:dport} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}",
      "message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} Dst:%{IPV4:dst} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}",
      "message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} Dst:%{IPV4:dst} Type:%{POSINT:type} Code:%{INT:code} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}"
    ]
  }
  # This field is not mapped automatically by logstash, so it cannot be
  # chosen in Kibana4 for tile map creation
  geoip {
    source => "src"
    target => "src_geoip"
  }
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
  }
}
My ...logstash-1.4.2\lib\logstash\outputs\elasticsearch\elasticsearch-template.json:
{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true },
      "dynamic_templates" : [ {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fields" : {
              "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
            }
          }
        }
      } ],
      "properties" : {
        "@version" : { "type" : "string", "index" : "not_analyzed" },
        "geoip" : {
          "type" : "object",
          "dynamic" : true,
          "path" : "full",
          "properties" : {
            "location" : { "type" : "geo_point" }
          }
        },
        "src_geoip" : {
          "type" : "object",
          "dynamic" : true,
          "path" : "full",
          "properties" : {
            "location" : { "type" : "geo_point" }
          }
        }
      }
    }
  }
}
UPDATE: I hadn't figured out when this json file gets applied in Elasticsearch. I followed the hints outlined in this question and copied the json file to a config/templates folder in the Elasticsearch directory. After deleting the indices and restarting Elasticsearch, the template was applied successfully.
Anyway, the field "src_geoip.location" still does not show up in the tile map creation form of Kibana4 (only the standard geoip.location field does).
Try overwriting the template after editing it, and re-create the index patterns in Kibana after the config change:
output {
  elasticsearch {
    template_overwrite => "true"
    ...
  }
}
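Keep in mind that template changes only affect newly created indices, so (as noted in the update above) you may also need to delete and re-ingest the existing ones, e.g. (the index pattern here is illustrative, adjust to your own):
curl -XDELETE 'localhost:9200/logstash-*'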
You also need to add a mapping for the src_geoip object in the index template on your Elasticsearch instance. To set the default template for all indexes that match "logstash-netflow-*", execute the following:
curl -XPUT localhost:9200/_template/logstash-netflow -d '{
  "template" : "logstash-netflow-*",
  "mappings" : {
    "_default_" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : {
        "@timestamp" : { "index" : "analyzed", "type" : "date" },
        "@version" : { "index" : "analyzed", "type" : "integer" },
        "src_geoip" : {
          "dynamic" : true,
          "type" : "object",
          "properties" : {
            "area_code" : { "type" : "long" },
            "city_name" : { "type" : "string" },
            "continent_code" : { "type" : "string" },
            "country_code2" : { "type" : "string" },
            "country_code3" : { "type" : "string" },
            "country_name" : { "type" : "string" },
            "dma_code" : { "type" : "long" },
            "ip" : { "type" : "string" },
            "latitude" : { "type" : "double" },
            "location" : { "type" : "geo_point" },
            "longitude" : { "type" : "double" },
            "postal_code" : { "type" : "string" },
            "real_region_name" : { "type" : "string" },
            "region_name" : { "type" : "string" },
            "timezone" : { "type" : "string" }
          }
        },
        "netflow" : { ....snipped...... }
      }
    }
  }
}'
