How to transfer data to Elasticsearch via Logstash and use an analyzer? - elasticsearch

I have the Logstash config file below. Elasticsearch is reading my data as "a b" whereas I want it to read it as "ab". I found that I need to use not_analyzed for my Sscat field, and max_shingle_size / min_shingle_size for Products, to get the best result.
Should I use not_analyzed for the Products field as well? Will that give a better result?
How should I fill in my_id_analyzer to actually use the analyzer on different fields?
How should I connect the template with the Logstash config file?
input{
file{
path => "path"
start_position =>"beginning"
}
}
filter{
csv{
separator => ","
columns => ["Index", "Category", "Scat", "Sscat", "Products", "Measure", "Price", "Description", "Gst"]
}
mutate{convert => ["Index", "float"] }
mutate{convert => ["Price", "float"] }
mutate{convert => ["Gst", "float"] }
}
output{
elasticsearch{
hosts => "host"
user => "elastic"
password => "pass"
index => "masterdb"
}
}
I also have a template that should do this for all the future files that I upload:
curl -u user:pass -XPUT "host/_template/logstash-id" -d '{
"template": "logstash-*",
"settings" : {
"analysis": {
"analyzer": {
"my_id_analyzer": {
}
}
}
},
"mappings": {
"properties" : {
"id" : { "type" : "string", "analyzer" : "my_id_analyzer" }
}
}
}'

You can use "ignore_above" to restrict the field to a maximum length, along with "not_analyzed", when creating the mapping so that the text doesn't get analyzed.
Declaring the type as keyword instead of text is another alternative for you.
Regarding connecting the template with Logstash: why do you need this? Once you have the template created on Elasticsearch, you can create your index, which will follow the template definition, and start indexing.
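For illustration, here is a minimal sketch of what such a template could look like, combining the keyword / ignore_above suggestion with a hypothetical shingle analyzer named my_shingle_analyzer for the Products field. Field names are taken from the CSV columns in the question, and the syntax assumes Elasticsearch 7.x, where index_patterns and typeless mappings replace the older template key and string/not_analyzed types:
PUT _template/logstash-id
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_shingle_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Sscat": { "type": "keyword", "ignore_above": 256 },
      "Products": { "type": "text", "analyzer": "my_shingle_analyzer" }
    }
  }
}
If you would rather have Logstash install the template itself instead of creating it up front, the elasticsearch output plugin also has template options (they appear again in the pipelines further down this page); the path and template name below are placeholders:
elasticsearch {
  hosts => "host"
  user => "elastic"
  password => "pass"
  index => "masterdb"
  template => "/path/to/template.json"
  template_name => "logstash-id"
  template_overwrite => true
}
Note that a template only applies to indices whose names match its index_patterns, so with an index named masterdb the pattern would need to be "masterdb*" rather than "logstash-*".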

Related

Elasticsearch - Reindex documents with stored / excluded fields

I have an index mapping with the following configuration:
"mappings" : {
"_source" : {
"excludes" : [
"special_field"
]
},
"properties" : {
"special_field" : {
"type" : "text",
"store" : true
}
}
}
So, when a new document is indexed using this mapping, I get the following result:
{
"_index": "********-2021",
"_id": "************",
"_source": {
...
},
"fields": {
"special_field": [
"my special text"
]
}
}
If a _search query is performed, special_field is not returned inside _source, as it is excluded.
With the following _search query, special_field data is returned perfectly:
GET ********-2021/_search
{
"stored_fields": [ "special_field" ],
"_source": true
}
Right now I'm trying to reindex all documents inside that index, but I'm losing the info stored in special_field and only the _source field is getting reindexed.
Is there a way to put that special_field back inside the _source field?
Is there a way to reindex those documents without losing special_field data?
How could these documents be migrated to another cluster without losing special_field data?
Thank you all.
Thanks Hamid Bayat, I finally got it using a small Logstash pipeline.
I will share it:
input {
elasticsearch {
hosts => "my-first-cluster:9200"
index => "my-index-pattern-*"
user => "****"
password => "****"
query => '{ "stored_fields": [ "special_field" ], "_source": true }'
size => 500
scroll => "5m"
docinfo => true
docinfo_fields => ["_index", "_type", "_id", "fields"]
}
}
filter {
if [@metadata][fields][special_field] {
mutate {
add_field => { "special_field" => "%{[@metadata][fields][special_field]}" }
}
}
}
output {
elasticsearch {
hosts => ["http://my-second-cluster:9200"]
password => "****"
user => "****"
index => "%{[#metadata][_index]}"
document_id => "%{[#metadata][_id]}"
template => "/usr/share/logstash/config/index_template.json"
template_name => "template-name"
template_overwrite => true
}
}
I had to add "fields" to docinfo_fields => ["_index", "_type", "_id", "fields"] in the elasticsearch input plugin, and all my stored_fields ended up in the [@metadata][fields] event field.
As the @metadata field is not indexed, I had to add a new field at the root level with the [@metadata][fields][special_field] value.
It's working like a charm.

Converting fields from String to Date in Logstash

I'm trying to index emails into Elasticsearch with Logstash.
My conf file is like this:
sudo bin/logstash -e 'input
{ imap
{ host => "imap.googlemail.com"
password => "********"
user => "********#gmail.com"
port => 993
secure => "true"
check_interval => 10
folder => "Inbox"
verify_cert => "false" } }
output
{ stdout
{ codec => rubydebug }
elasticsearch
{ index => "emails"
document_type => "email"
hosts => "localhost:9200" } }'
The problem is that two fields of the output are parsed as string fields, but they are supposed to be date fields.
The format of the fields is as below:
"x-dbworld-deadline" => "31-Jul-2019"
"x-dbworld-start-date" => "18-Nov-2019"
How can I convert these two fields into date fields ?
Thanks!
How about creating the index mapping on Elasticsearch?
It may look like this:
PUT date-test-191211
{
"mappings": {
"_doc": {
"properties": {
"x-dbworld-deadline": {
"type": "date",
"format": "dd-MMM-yyyy"
},
"x-dbworld-start-date": {
"type": "date",
"format": "dd-MMM-yyyy"
}
}
}
}
}
Then, those fields are recognized as date fields.
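To verify, one can index a sample document with the two values from the question and check the mapping (a quick sketch, assuming the Elasticsearch 6.x _doc type used above):
PUT date-test-191211/_doc/1
{
  "x-dbworld-deadline": "31-Jul-2019",
  "x-dbworld-start-date": "18-Nov-2019"
}

GET date-test-191211/_mapping
If a value does not match the dd-MMM-yyyy format, Elasticsearch rejects the document with a mapper_parsing_exception, which is a quick way to confirm the mapping is active.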

ElasticSearch 5.0.0 - error about object name is already in use

I am learning ElasticSearch and have hit a block. I am trying to use Logstash to load a simple CSV into ElasticSearch. This is the data; each row is a postcode, longitude, latitude:
ZE1 0BH,-1.136758103355,60.150855671143
ZE1 0NW,-1.15526666950369,60.1532197533966
I am using the following logstash conf file to filter the CSV to create a "location" field
input {
file {
path => "postcodes.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["postcode", "lat", "lon"]
separator => ","
}
mutate { convert => {"lat" => "float"} }
mutate { convert => {"lon" => "float"} }
mutate { rename => {"lat" => "[location][lat]"} }
mutate { rename => {"lon" => "[location][lon]"} }
mutate { convert => { "[location]" => "float" } }
}
output {
elasticsearch {
action => "index"
hosts => "localhost"
index => "postcodes"
}
stdout { codec => rubydebug }
}
And I have added the mapping to ElasticSearch using the console in Kibana
PUT postcodes
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"feature": {
"_all": { "enabled": true },
"properties": {
"postcode": {"type": "text"},
"location": {"type": "geo_point"}
}
}
}
}
I check the mappings for the index using:
GET postcodes/_mapping
{
"postcodes": {
"mappings": {
"feature": {
"_all": {
"enabled": true
},
"properties": {
"location": {
"type": "geo_point"
},
"postcode": {
"type": "text"
}
}
}
}
}
}
So this all seems to be correct, having looked at the documentation and the other questions posted.
However, when I run
bin/logstash -f postcodes.conf
I get an error:
[location] is defined as an object in mapping [logs] but this name is already used for a field in other types
I have tried a number of alternative methods:
I deleted the index, created a template.json, and changed my conf file to have the extra settings:
manage_template => true
template => "postcode_template.json"
template_name =>"open_names"
template_overwrite => true
and this gets the same error.
I have managed to get the data loaded by not supplying a template; however, the data never gets loaded as a geo_point, so you cannot use the Kibana Tile Map to visualise the data.
Can anyone explain why I am receiving that error and what method I should use?
Your problem is that you don't have document_type => "feature" on your elasticsearch output. Without that, it's going to create the object under the type logs, which is why you are getting this conflict.
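For example, a minimal sketch of the corrected output section (only document_type is new; the rest matches the config from the question):
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
    document_type => "feature"
  }
  stdout { codec => rubydebug }
}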

Elasticsearch - change field from not_analyzed to analyzed

Is it possible to modify the property of an existing field from not_analyzed to analyzed?
If not, what can I do in order to keep all my documents in the store?
I cannot delete the mappings (because then all documents will be gone) and I need that old field as analyzed.
You cannot modify an existing field; however, you can either create another field or add a sub-field to your not_analyzed field.
I'm going for the latter solution. So first, add a new sub-field to your existing field, like this:
curl -XPUT localhost:9200/index/_mapping/type -d '{
"properties": {
"your_field": {
"type": "string",
"index": "not_analyzed",
"fields": {
"sub": {
"type": "string"
}
}
}
}
}'
Above, we've added the sub-field called your_field.sub (which is analyzed) to the existing your_field (which is not_analyzed).
Next, we'll need to populate that new sub-field. If you're running the latest ES 2.3, you can use the powerful Reindex API:
curl -XPOST localhost:9200/_reindex -d '{
"source": {
"index": "index"
},
"dest": {
"index": "index"
},
"script": {
"inline": "ctx._source.your_field = ctx._source.your_field"
}
}'
Otherwise, you can simply use the following Logstash configuration, which will re-index your data in order to populate the new sub-field:
input {
elasticsearch {
hosts => "localhost:9200"
index => "index"
docinfo => true
}
}
filter {
mutate {
remove_field => [ "#version", "#timestamp" ]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{[#metadata][_index]}"
document_type => "%{[#metadata][_type]}"
document_id => "%{[#metadata][_id]}"
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
You can use this... there is something known as multi-field type mapping, which allows you to have more than one mapping for a single field, so you can query the same data in different ways.
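As an illustration, once the sub-field is populated you can run analyzed queries against it while keeping exact matches on the parent field, along these lines (a sketch; the query text is hypothetical):
curl -XGET localhost:9200/index/_search -d '{
  "query": {
    "match": {
      "your_field.sub": "some analyzed text"
    }
  }
}'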

Logstash/Elasticsearch CSV Field Types, Date Formats and Multifields (.raw)

I've been playing around with getting a tab delimited file into Elasticsearch using the CSV filter in Logstash. Getting the data in was actually incredibly easy, but I'm having trouble getting the field types to come in right when I look at the data in Kibana. Dates and integers continue to come in as strings, so I can't plot by date or do any analysis functions on integers (sum, mean, etc).
I'm also having trouble getting the .raw version of the fields to populate. For example, in device I have data like "HTC One", but if I do a pie chart in Kibana, it'll show up as two separate groupings, "HTC" and "One". When I try to chart device.raw instead, it comes up as a missing field. From what I've read, it seems like Logstash should automatically create a raw version of each string field, but that doesn't seem to be happening.
I've been sifting through the documentation, google and stack, but haven't found a solution. Any ideas appreciated! Thanks.
Config file:
#logstash.conf
input {
file {
path => "file.txt"
type => "event"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["userid","date","distance","device"]
separator => " "
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
port => "9200"
protocol => "http"
index => "userid"
workers => 2
template => "template.json"
}
#stdout {
# codec => rubydebug
#}
}
Here's the template file:
#template.json:
{
"template": "event",
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0,
"index" : {
"query" : { "default_field" : "userid" }
}
},
"mappings": {
"_default_": {
"_all": { "enabled": false },
"_source": { "compress": true },
"dynamic_templates": [
{
"string_template" : {
"match" : "*",
"mapping": { "type": "string", "index": "not_analyzed" },
"match_mapping_type" : "string"
}
}
],
"properties" : {
"date" : { "type" : "date", "format": "yyyy-MM-dd HH:mm:ss"},
"device" : { "type" : "string", "fields": {"raw": {"type": "string","index": "not_analyzed"}}},
"distance" : { "type" : "integer"}
}
}
}
Figured it out - the template pattern has to match the index name. So the "template" : "event" line should have been "template" : "userid".
I found another (easier) way to specify the type of the fields. You can use Logstash's mutate filter to change the type of a field. Simply add the following filter after your csv filter in your Logstash config:
mutate {
convert => [ "fieldname", "integer" ]
}
For details, check out the Logstash docs for mutate convert.
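Applied to the config in the question, that means adding a mutate block right after the csv filter, for example (a sketch; only distance is converted here, since the date field is already typed by the template's date mapping):
filter {
  csv {
    columns => ["userid","date","distance","device"]
    separator => "	"  # tab-delimited input, as in the question
  }
  mutate {
    convert => [ "distance", "integer" ]
  }
}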
