ElasticSearch nesting for varied value - elasticsearch

Im looking for a clean solution for this. Basically I have arrays which are numbered by an integer. This number may be from 1-50. Rather than repeating my index 50 times, is there a work around for this?
Here is an example of how I would do it for level 1.
Thanks
"test" : {
"properties" : {
"1" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"taglevel" : {
"type" : "long"
}
}
},
"2" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"taglevel" : {
"type" : "long"
}
}
},
"3" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"taglevel" : {
"type" : "long"
}
}
},
repeat 47 times more until
"50" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"taglevel" : {
"type" : "long"
}
}
},

Related

Kibana index pattern mapping conflict

I am tired of reindexing every 2 3 weeks i have to do reindex.
{
"winlogbeat_sysmon" : {
"order" : 0,
"index_patterns" : [
"log-wlb-sysmon-*"
],
"settings" : {
"index" : {
"lifecycle" : {
"name" : "winlogbeat_sysmon_policy",
"rollover_alias" : "log-wlb-sysmon"
},
"refresh_interval" : "1s",
"number_of_shards" : "1",
"number_of_replicas" : "1"
}
},
"mappings" : {
"properties" : {
"thread_id" : {
"type" : "long"
},
"z_elastic_ecs.event.code" : {
"type" : "long"
},
"geoip" : {
"type" : "object",
"properties" : {
"ip" : {
"type" : "ip"
},
"latitude" : {
"type" : "half_float"
},
"location" : {
"type" : "geo_point"
},
"longitude" : {
"type" : "half_float"
}
}
},
"dst_ip_addr" : {
"type" : "ip"
}
}
},
"aliases" : { }
}
}
this is the template i set earlier from then i didn't change anything
in current and previous indices of log-wlb-sysmon has dst_ip_addr has ip field and older indices of log-wlb-sysmon has text field in logstash i didn't see any warnning for this issue

Elastic Search and product nomenclature: hyphens and spaces

I'm having a hard time figuring out how to set up Elasticsearch for the typical product model nomenclature. For instance, a product called "Shure SM7B" should appear as a result when searching for SM7B, SM 7B, SM 7, SM-7... and vice versa: searching for SM7B should give results like SM-7, SM7...
For now, I'm getting this kind of results: if I search for "Roland D 50", I get Roland D 50, Roland D-50, Roland D-550, Roland D-20 and so on... but if I search for "Roland D50", I get only "Roland D50" results.
This is my current mapping/settings:
{
"products" : {
"mappings" : {
"Product" : {
"properties" : {
"article_reviews" : {
"type" : "integer"
},
"brand_id" : {
"type" : "integer"
},
"category" : {
"type" : "text"
},
"category_id" : {
"type" : "integer"
},
"date" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"description" : {
"type" : "text"
},
"has_image" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
},
"last_review_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"min_price" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text"
},
"name_order" : {
"type" : "keyword"
},
"price_history" : {
"type" : "integer"
},
"rating" : {
"type" : "float"
},
"reviews" : {
"type" : "integer"
},
"shops" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"widget" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
Also, I'd need to autocomplete my searches, so for instance, "Shure SM" should show results like Shure SM-7, Shure SM7, Shure SM58, Shure SM 57, etc... narrowing them down as I type.
Any clues? Thank you!

How to do ES Moving Avearge Prediction with Logstash?

I am using Elasticsearch 2.3.2, and Logstash 2.3.3. I have found from https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-movavg-aggregation.html which states that moving average can do predictions. I know it is possible to only make query in ES, but I am not sure how should I do that with logstash.
I have a logstash file which reads a csv log file storing CPU usage for every 15 seconds. Should I just include the following into the logstash output json file for the related index as an output mapping?
{
"the_movavg":{
"moving_avg":{
"buckets_path": "the_sum",
"window" : 30,
"model" : "holt_winters",
"settings" : {
"type" : "mult",
"alpha" : 0.5,
"beta" : 0.5,
"gamma" : 0.5,
"period" : 7,
"pad" : true
}
}
}
This is my json file for logstash
{
"template" : "linux_cpu-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date" },
"#version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "float" },
"longitude" : { "type" : "float" }
}
}
}
}
}
}
And is it possible to have it as a graph as to be shown in Kibana?

Mysteriously wrong values of numerical fields in ElasticSearch

I've spent the last 2 days investigating this mind-bending issue:I have an index with custom mappings on which I perform some aggregations. The problem is that in the results of the aggregation on numerical fields,it returns values that do not appear in the database from which the data was imported, even though the number of results is the same.
I found a similar issue here where the problem was inconsistent mapping of a field across an index, but in my case it is mapped as the same type. The problem happens with the fields: award.value.amount, award.value.x_amountEur, tender.value.x_amountEur as far as I have checked.This is my current mapping as stated by curl -XGET 'http://localhost:9200/documents/_mappings?pretty&human'
(the part that contains the target fields):
{
"documents" : {
"mappings" : {
"document" : {
"properties" : {
"additionalIdentifiers" : {
"type" : "string",
"index" : "not_analyzed"
},
"award" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"contract_number" : {
"type" : "string",
"index" : "not_analyzed"
},
"date" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_day" : {
"type" : "integer"
},
"x_month" : {
"type" : "integer"
},
"x_year" : {
"type" : "integer"
}
}
},
"description" : {
"type" : "string"
},
"initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_vat" : {
"type" : "float"
}
}
},
"minValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"x_amountEur" : {
"type" : "float"
}
}
},
"title" : {
"type" : "string"
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
},
"x_initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_amountEur" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
},
"awardCriteria" : {
"type" : "string"
},
"contract_number" : {
"type" : "string"
},
"document_id" : {
"type" : "string",
"index" : "not_analyzed"
},
"numberOfTenderers" : {
"type" : "string"
},
"procurementMethod" : {
"type" : "string"
},
"procuring_entity" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"country" : {
"type" : "string"
},
"countryName" : {
"type" : "string",
"index" : "not_analyzed"
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"suppliers" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"tender" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
}
This is the aggregation I am using in order to get the values of contracts between each pair of supplier - procuring_entity:
Document.es.search({
"search_type": "count" ,
"body":{
"aggregations": {
"entities":{
"nested": {
"path": "procuring_entity"
},
"aggs": {
"procuring_entity_names": {
"terms": {
"field": "procuring_entity.x_slug",
"size": 0
},
"aggs": {
"suppliers": {
"nested": {
"path": "suppliers"
},
"aggs": {
"suppliers_names": {
"terms":{
"field": "suppliers.x_slug",
"size": 0
},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms":{
"field": "award.value.x_amountEur",
"size": 0
}
}
}
}
}
}
}
}
}
}
}
}
}
}})
The result with type float is :
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679.086669921875, "doc_count"=>373},
{"key"=>0.0, "doc_count"=>12},
{"key"=>73610.3203125, "doc_count"=>1},
{"key"=>244000.0, "doc_count"=>1},
{"key"=>342348.9375, "doc_count"=>1}]}}}
The problem is that in MongoDB the same query returns 388 documents that all have award.value.x_amountEur = 3679.08661250056 , as presented by Mongoid query:
Document.where(:"procuring_entity.x_slug" => "vsia-bernu-kliniska-universitates-slimnica")
.keep_if{|doc| doc.suppliers.first.x_slug == "recipe-plus-as"}
.map{|doc| doc.award.value.x_amountEur}.uniq
=>[3679.08661250056]
A query directly into MongoDB returns the same.
I have also tried to map the targeted fields as double, which gave the same result and as long which returned the following (even more incorrect result):
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679, "doc_count"=>371},
{"key"=>0, "doc_count"=>12},
{"key"=>44300, "doc_count"=>1},
{"key"=>80472, "doc_count"=>1},
{"key"=>331636, "doc_count"=>1},
{"key"=>342348, "doc_count"=>1},
{"key"=>1658805, "doc_count"=>1}]}}}
I'm using Elasticsearch 2.0, mongoid 5.0.1 and mongoid-elasticsearch for indexing. I can't think of anything else to do so any suggestion is welcomed and appreciated.
I tried to test your scenario with ES 2.0 and there is something that I'm missing. I cannot make it create buckets for the award.value.x_amountEur unless I use a reverse_nested aggregation to "get out" from one nested path and into another.
So, instead of the awards aggregation that you have I'm using the same aggregation but "wrapped" in a reverse_nested aggregation:
"aggs": {
"getting_back": {
"reverse_nested": {},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms": {
"field": "award.value.x_amountEur"
}
}
}
}
}
}
}
And for this one I am seeing something ok.
Later edit: following mine and more general #Val's suggestion, the complete solution was to use reverse_nested on both awards and suppliers aggregations.

How can i map custom date format in elasticsearch and Kibana4

I have nginx logs and i have this date format [02/Mar/2015:13:02:51 +0000]
What should i use in elasticsearch and what i should put in the dateformat field of Kibana4?
curl -XGET 'http://localhost:9200/_mapping?pretty'
{
"nginx" : {
"mappings" : {
"t07_nginx" : {
"properties" : {
"#timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"body_bytes_sent" : {
"type" : "string"
},
"geoip_country_code" : {
"type" : "string"
},
"host" : {
"type" : "string"
},
"http_host" : {
"type" : "string"
},
"http_referer" : {
"type" : "string"
},
"http_user_agent" : {
"type" : "string",
"index" : "not_analyzed"
},
"http_x_forwarded_for" : {
"type" : "string"
},
"message" : {
"type" : "string"
},
"msec request_time" : {
"type" : "string"
},
"remote_addr" : {
"type" : "string"
},
"request_http_protocol" : {
"type" : "string"
},
"request_time" : {
"type" : "string"
},
"request_type" : {
"type" : "string"
},
"request_url" : {
"type" : "string"
},
"status" : {
"type" : "string"
},
"upstream_addr" : {
"type" : "string"
},
"upstream_response_time" : {
"type" : "string"
}
}
}
}
}
with the above i can't see any data(events) in Kibana
Thanks
What does the input plugin for nginx/output plugin for elasticsearch in your fluentd config file look like?
Also, make sure you have your time range setup correctly in kibana. I believe it defaults to 15 minutes.

Resources