Elasticsearch _mapping API is not telling me which fields are not analyzed - elasticsearch

I am trying to stop elasticsearch from analyzing some fields in my documents.
I posted this mapping:
{
"properties" : {
"f1" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
},
"f2" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
},
"f3" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
}
}
}
Then I query the _mapping endpoint, but the response doesn't tell me whether those fields
are analyzed or not:
{
"myindex" : {
"mappings" : {
"mytype" : {
"properties" : {
"f1" : {
"type" : "keyword",
"include_in_all" : false
},
"f2" : {
"type" : "keyword",
"include_in_all" : false
},
"f3" : {
"type" : "keyword",
"include_in_all" : false
}
}
}
}
}
}
In the examples I have seen, querying the _mapping API seems to show whether or not fields are analyzed.

In elasticsearch 5.0 and later there's a new way of separating analyzed and non-analyzed content:
Strings are dead, long live strings!
Keyword datatype
But, to summarize:
keyword is not analyzed
text is analyzed
and the index property that had three values ("no", "analyzed", "not_analyzed") is now simplified to just true and false
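For example, on 5.x the same intent as the original mapping would be expressed with keyword (a minimal sketch; the index and type names are taken from the question above), and GET _mapping then echoes "type": "keyword" back, which is what tells you the field is not analyzed:

curl -XPUT 'localhost:9200/myindex' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "mytype": {
      "properties": {
        "f1": { "type": "keyword", "include_in_all": false },
        "f2": { "type": "text" }
      }
    }
  }
}'

curl -XGET 'localhost:9200/myindex/_mapping?pretty'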

Related

Ingesting data from Spark to Elasticsearch with index template

In our existing design we are using Logstash to fetch data from Kafka (JSON) and put it into Elasticsearch.
We also apply an index template mapping when inserting data from Logstash into ES, which is done by setting the 'template' property of Logstash's Elasticsearch output plugin, e.g.,
output {
elasticsearch {
template => "elasticsearch-template.json", //template file path
hosts => "localhost:9200"
template_overwrite => true
manage_template => true
codec => plain
}
}
elasticsearch-template.json looks like below,
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "3s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true},
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256, "doc_values":true}
}
}
}
} ],
"properties" : {
"#version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
}
}
Now we are going to replace Logstash with Apache Spark, and I want to apply the index template in a similar way when inserting data into ES from Spark.
I am using the elasticsearch-spark_2.11 library for this implementation.
Thanks.
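One approach worth sketching (not specific to the elasticsearch-spark library, and the template name below is an assumption): the template only needs to exist in Elasticsearch before the first matching index is created, so it can be registered once with the _template API and the Spark job can then write as usual:

curl -XPUT 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d @elasticsearch-template.json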

Logstash issues in creating index remove .raw field in kibana

I have written a Logstash conf file for reading logs. If I use the default index, that is logstash-*, I can see the .raw fields in Kibana. However, if I create a new index in the conf file in Logstash like
output{
elasticsearch {
hosts => "localhost"
index => "batchjob-*"}
}
Then the new index doesn't get the .raw fields. Is there any way to solve this? Thanks.
The raw fields are created by a specific index template that the Logstash elasticsearch output creates in Elasticsearch.
What you can do is simply copy that template to a file named batchjob.json and change the template name to batchjob-* (see below)
{
"template" : "batchjob-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date" },
"#version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "float" },
"longitude" : { "type" : "float" }
}
}
}
}
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => "localhost"
index => "batchjob-*"
template_name => "batchjob"
template => "/path/to/batchjob.json"
}
}
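If you want to double-check that the template was picked up (assuming Elasticsearch runs on localhost), you can query the template endpoint:

curl -XGET 'localhost:9200/_template/batchjob?pretty'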

How to use _timestamp in logstash elasticsearch

I am trying to figure out how to use the _timestamp with logstash.
I have tried to add to the mapping:
"_timestamp" : {
"enabled" : true,
"path" : "#timestamp"
},
But that does not have the expected effect. I did this in the elasticsearch-template.json file (I tried with and without "store": true):
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_timestamp" : {
"enabled" : true,
"store" : true,
"path" : "#timestamp"
},
"_all" : {"enabled" : true},
"dynamic_templates" : [ {
.....
And I added the modified file to the output filter
output {
elasticsearch_http {
template => '/tmp/elasticsearch-template.json'
host => '127.0.0.1'
port=>9200
}
}
In order to make sure the database is clean I repeatedly do:
curl -XDELETE http://localhost:9200/logstash*
curl -XDELETE http://localhost:9200/_template/logstash
rm ~/.sincedb_*
and then I try to import my logfile. But for some reason, the _timestamp is not set.
The mapping seems to be ok
{
"logstash-2014.03.24" : {
"_default_" : {
"dynamic_templates" : [ {
"string_fields" : {
"mapping" : {
"index" : "analyzed",
"omit_norms" : true,
"type" : "string",
"fields" : {
"raw" : {
"index" : "not_analyzed",
"ignore_above" : 256,
"type" : "string"
}
}
},
"match" : "*",
"match_mapping_type" : "string"
}
} ],
"_timestamp" : {
"enabled" : true,
"store" : true,
"path" : "#timestamp"
},
"properties" : {
"#version" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"index_options" : "docs"
},
"geoip" : {
"dynamic" : "true",
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}
},
"logs" : {
"dynamic_templates" : [ {
"string_fields" : {
"mapping" : {
"index" : "analyzed",
"omit_norms" : true,
"type" : "string",
"fields" : {
"raw" : {
"index" : "not_analyzed",
"ignore_above" : 256,
"type" : "string"
}
}
},
"match" : "*",
"match_mapping_type" : "string"
}
} ],
"_timestamp" : {
"enabled" : true,
"store" : true,
"path" : "#timestamp"
},
"properties" : {
"#timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
The documents in the database look like
{
"_id": "Cps2Lq1nTIuj_VysOwwcWw",
"_index": "logstash-2014.03.25",
"_score": 1.0,
"_source": {
"#timestamp": "2014-03-25T00:47:09.703Z",
"#version": "1",
"created": "2014-03-25 01:47:09,703",
"host": "macbookpro.fritz.box",
"message": "2014-03-25 01:47:09,703 - Starting new HTTP connection (1): localhost",
"path": "/Users/scharf/git/ckann/annotator-store/logs/requests.log",
"text": "Starting new HTTP connection (1): localhost"
},
"_type": "logs"
},
why is the _timestamp not set???
In short, it does work.
I tested your exact scenario and here's what I found:
When _source is enabled and _timestamp is taken from a path in the _source,
you will never see _timestamp as part of the document. However, if you add the ?fields query string part, for example:
http://<localhost>:9200/es_test_logs/ESTest1/ilq4PU3tR9SeoLo794wZlg?fields=_timestamp
you will get the correct _timestamp value.
If, instead of using path, you pass _timestamp externally (in the _source document), you will see _timestamp under the _source property in the document as normal.
If you disable the _source field, you will not see ANY property at all in the document, even those you set as "store" : true. You will only see them when specifying ?fields, or when building a query that returns those fields.
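For example, a search that requests _timestamp explicitly (a sketch against the index and type from the question; this assumes the fields list is available on the search API in your Elasticsearch version):

curl -XGET 'localhost:9200/logstash-2014.03.25/logs/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "fields": ["_timestamp", "_source"]
}'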

Use nested fields in kibana panels

I tried to display a Kibana dashboard and it works well. Unfortunately, when I want to add a pie chart (or another representation) containing the countries of the companies' locations, I get an empty panel.
I'm able to use Kibana queries to filter on a specific country, but I'm not able to display a panel with nested documents.
My mapping (I have to use nested fields because a company can have several locations):
{
"settings" : {
"number_of_shards" : 1
},
"mappings": {
"company" : {
"properties" : {
"name" : { "type" : "string", "store" : "yes" },
"website" : { "type" : "string", "store" : "yes" },
"employees" : { "type" : "string", "store" : "yes" },
"type": { "type" : "string", "store" : "yes" },
"locations" : {
"type" : "nested",
"properties" : {
"city" : { "type" : "string", "store" : "yes" },
"country" : { "type" : "string", "store" : "yes" },
"coordinates" : { "type" : "geo_point", "store" : "yes" }
}
}
}
}
}
}
Do you know how I could display a panel with nested objects? Is it implemented?
Thanks,
Kevin
You are missing one parameter ("include_in_parent": true) in your mapping.
The correct mapping should be:
{
"settings" : {
"number_of_shards" : 1
},
"mappings": {
"company" : {
"properties" : {
"name" : { "type" : "string", "store" : "yes" },
"website" : { "type" : "string", "store" : "yes" },
"employees" : { "type" : "string", "store" : "yes" },
"type": { "type" : "string", "store" : "yes" },
"locations" : {
"type" : "nested",
"include_in_parent": true,
"properties" : {
"city" : { "type" : "string", "store" : "yes" },
"country" : { "type" : "string", "store" : "yes" },
"coordinates" : { "type" : "geo_point", "store" : "yes" }
}
}
}
}
}
}
It's clearly a Kibana bug. The facet query generated by Kibana is missing the "nested" field to indicate this.
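For reference, a terms facet sent directly to Elasticsearch does return data for the nested field when the nested path is specified (a sketch; the index name and facet name are assumptions):

curl -XGET 'localhost:9200/companies/_search?pretty' -d '
{
  "size": 0,
  "facets": {
    "countries": {
      "terms": { "field": "locations.country" },
      "nested": "locations"
    }
  }
}'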

ElasticSearch query with Nested filter is not working

I am indexing Elasticsearch documents with a nested field that contains another nested field (so a two-level nested structure). I want to match a document based on data from the inner nested field, but it is not working.
The NestedFilterBuilder looks like this:
"nested" : {
"filter" : {
"or" : {
"filters" : [ {
"term" : {
"event_attribute_value" : "Obama"
}
}, {
"term" : {
"event_attribute_value" : "President"
}
} ]
}
},
"path" : "eventnested.attributes"
}
This is the Java code I am using to generate the query:
orFilter.add(termFilter("event_attribute_value","president"));
NestedFilterBuilder nestedFilterBuilder = new NestedFilterBuilder("eventnested.attributes", orFilter);
finalFilter.add(nestedFilterBuilder);
The mapping the index is built on is:
"eventnested":{
"type" : "nested", "store" : "yes", "index" : "analyzed", "omit_norms" : "true",
"include_in_parent":true,
"properties":{
"event_type":{"type" : "string", "store" : "yes", "index" : "analyzed","omit_norms" : "true"},
"attributes":{
"type" : "nested", "store" : "yes", "index" : "analyzed", "omit_norms" : "true",
"include_in_parent":true,
"properties":{
"event_attribute_name":{"type" : "string", "store" : "yes", "index" : "analyzed","omit_norms" : "true"},
"event_attribute_value":{"type" : "string", "store" : "yes", "index" : "analyzed","omit_norms" : "true"}
}
},
"event_attribute_instance":{"type" : "integer", "store" : "yes", "precision_step" : "0"}
}
}
Is there something I am doing wrong?
According to your mapping, event_attribute_value is analyzed. This means that during indexing the phrase "President Obama" is analyzed into two tokens: "president" and "obama". You are searching for the tokens "President" and "Obama", which don't exist in the index.
You can solve this problem by:
changing the field mapping to not_analyzed,
replacing the term filter with a text (match) query, or
using the correct tokens in your term filter ("president" and "obama" in this case), as sketched below.
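A sketch of the last option, reusing the nested filter from the question with lowercased tokens:

"nested" : {
  "filter" : {
    "or" : {
      "filters" : [ {
        "term" : { "event_attribute_value" : "obama" }
      }, {
        "term" : { "event_attribute_value" : "president" }
      } ]
    }
  },
  "path" : "eventnested.attributes"
}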
