Elasticsearch nested query with the Ruby gem

I am using the elasticsearch Ruby gem to connect to an ES server and have an index with the mapping below. I am trying to work out the proper syntax for querying these nested objects. I have been experimenting with queries such as the following, but keep getting errors. Could someone get me started on the proper syntax for querying a structure like this? Thanks!
client = Elasticsearch::Client.new log:true
client.search index: 'injuries', nested: { path: { week: {id: '1' } } }
Returns:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":"SearchPhaseExecutionException[Failed to execute phase [query
Sample Mapping:
{
"injuries" : {
"mappings" : {
"tbd" : {
"properties" : {
"injuries" : {
"properties" : {
"timestamp" : {
"properties" : {
"__content__" : {
"type" : "string"
},
"timeZone" : {
"type" : "string"
}
}
}
}
}
}
},
"football" : {
"properties" : {
"injuries" : {
"properties" : {
"timestamp" : {
"properties" : {
"__content__" : {
"type" : "string"
},
"timeZone" : {
"type" : "string"
}
}
},
"week" : {
"properties" : {
"id" : {
"type" : "string"
},
"inactivePlayers" : {
"properties" : {
"inactivePlayer" : {
"properties" : {
"firstName" : {
"type" : "string"
},
"lastName" : {
"type" : "string"
},
"playerId" : {
"type" : "string"
},
"position" : {
"type" : "string"
},
"status" : {
"type" : "string"
},
"teamId" : {
"type" : "string"
}
}
}
}
},
"injuredPlayers" : {
"properties" : {
"injuredPlayer" : {
"properties" : {
"displayName" : {
"type" : "string"
},
"firstName" : {
"type" : "string"
},
"gameStatus" : {
"type" : "string"
},
"injury" : {
"type" : "string"
},
"lastName" : {
"type" : "string"
},
"playerId" : {
"type" : "string"
},
"position" : {
"type" : "string"
},
"practiceStatus" : {
"type" : "string"
},
"teamId" : {
"type" : "string"
}
}
}
}
},
"season" : {
"type" : "string"
},
"seasonType" : {
"type" : "string"
}
}
}
}
}
}
}
}
}
}

Your nested query doesn't appear to have a query defined. I think it should be something like:
"nested" : {
"path" : "week",
"query" : {
"match" : {"week.id" : "1"}
}
}
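With the Ruby gem, the query DSL goes into the body: argument of client.search rather than being passed as top-level keyword arguments. Below is a minimal sketch of how the request body could be built. It rests on two assumptions that are not confirmed by the question: that week is (re)mapped with "type": "nested" (the mapping shown declares it as a plain object, and a nested query is rejected on a non-nested path), and that the full path is injuries.week because week sits under injuries in the mapping. The helper name nested_match_body is hypothetical.

```ruby
require 'json'

# Hypothetical helper (not part of the elasticsearch gem): builds a search
# body containing a nested query that matches one field under a nested path.
def nested_match_body(path, field, value)
  {
    query: {
      nested: {
        path: path,
        query: { match: { "#{path}.#{field}" => value } }
      }
    }
  }
end

# Assumption: "week" lives under "injuries" and is mapped as type "nested".
body = nested_match_body('injuries.week', 'id', '1')
puts JSON.pretty_generate(body)

# With a running cluster, the body would be passed to the gem like this:
#   client = Elasticsearch::Client.new log: true
#   client.search index: 'injuries', body: body
```

The client call is left as a comment so that the snippet runs without the gem or a server.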

Related

ElasticSearch Multi Search query does not return results

I am new to ElasticSearch and am running version 2.3.5.
I am running this query:
{
"query" : {
"multi_match" : {
"type" : "cross_fields",
"query" : "John Schmidt Sankt Boulevard 118b 2554 Island",
"minimum_should_match" : "50%",
"operator" : "and",
"fields" : ["*Name", "*Street.*hasStringValue", "*hasStreetNumber", "*hasPostCode", "*PostalLocality.*hasStringValue"]
}
}
}
However, it does not return any result. If I remove the 'b' after 118 from the query, it returns the document.
All other fields match, so how can I make ElasticSearch return the document?
Here is the mapping:
{
"my_index" : {
"mappings" : {
"datasubject" : {
"properties" : {
"@context" : {
"properties" : {
"con" : {
"type" : "string"
},
"cor" : {
"type" : "string"
},
"geo" : {
"type" : "string"
},
"per" : {
"type" : "string"
}
}
},
"cor:Person" : {
"properties" : {
"con:hasContactPoint" : {
"properties" : {
"con:Mobile" : {
"properties" : {
"con:hasAreaCode" : {
"type" : "string"
},
"con:hasCompleteTelephoneNumberString" : {
"type" : "string"
},
"con:hasCountryCode" : {
"type" : "string"
}
}
},
"con:PostalAddress" : {
"properties" : {
"con:hasAddressPoint" : {
"properties" : {
"geo:StreetAddress" : {
"properties" : {
"con:hasPostCode" : {
"type" : "string"
},
"con:hasPostalLocality" : {
"properties" : {
"geo:PostalLocality" : {
"properties" : {
"cor:hasStringValue" : {
"type" : "string"
}
}
}
}
},
"geo:hasStreet" : {
"properties" : {
"geo:Street" : {
"properties" : {
"cor:hasStringValue" : {
"type" : "string"
}
}
}
}
},
"geo:hasStreetNumber" : {
"type" : "string"
}
}
}
}
}
}
}
}
},
"cor:hasBirthDate" : {
"properties" : {
"cor:Date" : {
"properties" : {
"cor:hasDateValue" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
}
}
}
}
},
"cor:hasName" : {
"properties" : {
"per:Name" : {
"properties" : {
"per:familyName" : {
"type" : "string"
},
"per:givenName" : {
"type" : "string"
}
}
}
}
},
"cor:isIdentifiedBy" : {
"properties" : {
"cor:GEDIvA" : {
"properties" : {
"cor:hasCompleteIdentifierValue" : {
"type" : "string"
}
}
},
"dataset/pdi:IndividualId" : {
"properties" : {
"cor:hasCompleteIdentifierValue" : {
"type" : "string"
}
}
}
}
}
}
}
}
}
}
}
}
And here are the index settings:
{
"gdprui" : {
"settings" : {
"index" : {
"creation_date" : "1525442279108",
"analysis" : {
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : "true"
},
"substring" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "10"
}
},
"analyzer" : {
"default" : {
"filter" : [ "standard", "my_ascii_folding", "lowercase", "substring", "reverse" ],
"tokenizer" : "standard"
}
}
},
"number_of_shards" : "5",
"number_of_replicas" : "2",
"uuid" : "EMqhJwGWRKi1F5gFwuSKTQ",
"version" : {
"created" : "2030599"
}
}
}
}
}

Elasticsearch Aggregation sorting

My Elasticsearch mapping is
{
"mappings" : {
"loc" : {
"dynamic": "true",
"properties" : {
"geoip" : {
"properties" : {
"location" : { "type": "geo_point"}
}
},
"lon" : { "type" : "double" },
"lat" : { "type" : "double" },
"altitude" : { "type" : "double" },
"id" : { "type" : "long" },
"date" : { "type" : "date", "format" : "epoch_millis" },
"ip" : { "type" : "string" },
"port" : { "type" : "string" }
}
}
}
}
I want to sort by time, so I wrote this query:
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "0.2km",
"geoip.location" : {
"lat" : 36.773353,
"lon" : 126.933847
}
}
}
}
},
"size" : 0,
"sort" : { "date" : { "order" : "desc" } },
"aggs" : {
"ids" : {
"terms" : {
"field" : "id"
},
"aggs" : {
"dedup_docs" : {
"top_hits" : {"size" : 1}
}
}
}
}
}
I want to apply the GPS filter, group the results by id, and return the latest entry for each id, sorted chronologically.
However, the date values in the result come back unordered.
I do not know how to modify the query.
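One possible reading of the problem: the top-level sort only orders search hits, and with "size" : 0 there are none, so it has no effect on what top_hits returns inside each bucket. The top_hits aggregation accepts its own sort, and a terms aggregation can be ordered by a single-value metric sub-aggregation. The sketch below builds such a request body as a Ruby hash. It has not been run against a cluster, and the sub-aggregation name latest is hypothetical.

```ruby
require 'json'

# Sketch of a request body: same geo filter as in the question, but the
# ordering is moved into the aggregation. "latest" is a made-up name.
body = {
  size: 0,
  query: {
    bool: {
      must: { match_all: {} },
      filter: {
        geo_distance: {
          distance: '0.2km',
          'geoip.location' => { lat: 36.773353, lon: 126.933847 }
        }
      }
    }
  },
  aggs: {
    ids: {
      # Order the id buckets by their most recent date, newest first.
      terms: { field: 'id', order: { latest: 'desc' } },
      aggs: {
        latest: { max: { field: 'date' } },
        # Keep only the newest document inside each id bucket.
        dedup_docs: {
          top_hits: { size: 1, sort: [{ date: { order: 'desc' } }] }
        }
      }
    }
  }
}

puts JSON.pretty_generate(body)
```

If only the newest document per id matters and the bucket order does not, the order option and the latest sub-aggregation can be dropped.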

aggregation fails on nested aggregation field

I have this mapping for the fuas type:
curl -XGET 'http://localhost:9201/living_team/_mapping/fuas?pretty'
{
"living_v1" : {
"mappings" : {
"fuas" : {
"properties" : {
"backlogStatus" : {
"type" : "long"
},
"comment" : {
"type" : "string"
},
"dueTimestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"matter" : {
"type" : "string"
},
"metainfos" : {
"properties" : {
"category 1" : {
"type" : "string"
},
"key" : {
"type" : "string"
},
"null" : {
"type" : "string"
},
"processos" : {
"type" : "string"
}
}
},
"resources" : {
"properties" : {
"noteId" : {
"type" : "string"
},
"resourceId" : {
"type" : "string"
}
}
},
"status" : {
"type" : "long"
},
"timestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"user" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
I'm trying to perform this aggregation:
curl -XGET 'http://ESNode01:9201/living_team/fuas/_search?pretty' -d '
{
"aggs" : {
"demo" : {
"nested" : {
"path" : "metainfos"
},
"aggs" : {
"key" : { "terms" : { "field" : "metainfos.key" } }
}
}
}
}
'
ES returns:
"error" : {
"root_cause" : [ {
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [metainfos] is not nested"
} ],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query_fetch",
"grouped" : true,
"failed_shards" : [ {
"shard" : 3,
"index" : "living_v1",
"node" : "HfaFBiZ0QceW1dpqAnv-SA",
"reason" : {
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [metainfos] is not nested"
}
} ]
},
"status" : 500
}
Any ideas?
Your metainfos mapping is missing "type": "nested".
It should have been:
"metainfos" : {
"type":"nested",
"properties" : {
"category 1" : {
"type" : "string"
},
"key" : {
"type" : "string"
},
"null" : {
"type" : "string"
},
"processos" : {
"type" : "string"
}
}
}
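To check a mapping for this before running a nested aggregation, the output of the _mapping call can be inspected programmatically. The sketch below is only an illustration: nested_path? is a hypothetical helper, and the two hashes are trimmed-down stand-ins for the "properties" section of the fuas mapping. Note that, as far as I know, an existing object field cannot be switched to nested in place, so applying the corrected mapping generally means creating a new index and reindexing.

```ruby
# Hypothetical helper: walks the "properties" of a type mapping and reports
# whether the field at a dotted path is declared with "type": "nested".
def nested_path?(properties, path)
  field = nil
  path.split('.').each do |name|
    field = properties && properties[name]
    return false unless field
    properties = field['properties']
  end
  field['type'] == 'nested'
end

# Trimmed-down stand-ins for the mapping in the question and the fixed one.
current = { 'metainfos' => { 'properties' => { 'key' => { 'type' => 'string' } } } }
fixed   = { 'metainfos' => { 'type' => 'nested',
                             'properties' => { 'key' => { 'type' => 'string' } } } }

puts nested_path?(current, 'metainfos') # prints false
puts nested_path?(fixed, 'metainfos')   # prints true
```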

Mysteriously wrong values of numerical fields in ElasticSearch

I've spent the last 2 days investigating this mind-bending issue: I have an index with custom mappings on which I perform some aggregations. The problem is that the aggregation results on numerical fields contain values that do not appear in the database from which the data was imported, even though the number of results is the same.
I found a similar issue here where the problem was inconsistent mapping of a field across an index, but in my case it is mapped as the same type. As far as I have checked, the problem happens with the fields award.value.amount, award.value.x_amountEur and tender.value.x_amountEur. This is my current mapping as reported by curl -XGET 'http://localhost:9200/documents/_mappings?pretty&human'
(the part that contains the target fields):
{
"documents" : {
"mappings" : {
"document" : {
"properties" : {
"additionalIdentifiers" : {
"type" : "string",
"index" : "not_analyzed"
},
"award" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"contract_number" : {
"type" : "string",
"index" : "not_analyzed"
},
"date" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_day" : {
"type" : "integer"
},
"x_month" : {
"type" : "integer"
},
"x_year" : {
"type" : "integer"
}
}
},
"description" : {
"type" : "string"
},
"initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_vat" : {
"type" : "float"
}
}
},
"minValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"x_amountEur" : {
"type" : "float"
}
}
},
"title" : {
"type" : "string"
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
},
"x_initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_amountEur" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
},
"awardCriteria" : {
"type" : "string"
},
"contract_number" : {
"type" : "string"
},
"document_id" : {
"type" : "string",
"index" : "not_analyzed"
},
"numberOfTenderers" : {
"type" : "string"
},
"procurementMethod" : {
"type" : "string"
},
"procuring_entity" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"country" : {
"type" : "string"
},
"countryName" : {
"type" : "string",
"index" : "not_analyzed"
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"suppliers" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"tender" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
}
This is the aggregation I am using in order to get the values of contracts between each pair of supplier - procuring_entity:
Document.es.search({
"search_type": "count" ,
"body":{
"aggregations": {
"entities":{
"nested": {
"path": "procuring_entity"
},
"aggs": {
"procuring_entity_names": {
"terms": {
"field": "procuring_entity.x_slug",
"size": 0
},
"aggs": {
"suppliers": {
"nested": {
"path": "suppliers"
},
"aggs": {
"suppliers_names": {
"terms":{
"field": "suppliers.x_slug",
"size": 0
},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms":{
"field": "award.value.x_amountEur",
"size": 0
}
}
}
}
}
}
}
}
}
}
}
}
}
}})
The result with type float is:
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679.086669921875, "doc_count"=>373},
{"key"=>0.0, "doc_count"=>12},
{"key"=>73610.3203125, "doc_count"=>1},
{"key"=>244000.0, "doc_count"=>1},
{"key"=>342348.9375, "doc_count"=>1}]}}}
The problem is that in MongoDB the same query returns 388 documents that all have award.value.x_amountEur = 3679.08661250056, as shown by this Mongoid query:
Document.where(:"procuring_entity.x_slug" => "vsia-bernu-kliniska-universitates-slimnica")
.keep_if{|doc| doc.suppliers.first.x_slug == "recipe-plus-as"}
.map{|doc| doc.award.value.x_amountEur}.uniq
=>[3679.08661250056]
A query directly into MongoDB returns the same.
I have also tried mapping the targeted fields as double, which gave the same result, and as long, which returned the following (an even more incorrect result):
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679, "doc_count"=>371},
{"key"=>0, "doc_count"=>12},
{"key"=>44300, "doc_count"=>1},
{"key"=>80472, "doc_count"=>1},
{"key"=>331636, "doc_count"=>1},
{"key"=>342348, "doc_count"=>1},
{"key"=>1658805, "doc_count"=>1}]}}}
I'm using Elasticsearch 2.0, mongoid 5.0.1 and mongoid-elasticsearch for indexing. I can't think of anything else to try, so any suggestion is welcome and appreciated.
I tried to test your scenario with ES 2.0 and there is something that I'm missing. I cannot make it create buckets for the award.value.x_amountEur unless I use a reverse_nested aggregation to "get out" from one nested path and into another.
So, instead of the awards aggregation that you have I'm using the same aggregation but "wrapped" in a reverse_nested aggregation:
"aggs": {
"getting_back": {
"reverse_nested": {},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms": {
"field": "award.value.x_amountEur"
}
}
}
}
}
}
}
With this aggregation the results I get look reasonable.
Later edit: following my suggestion and @Val's more general one, the complete solution was to use reverse_nested on both the awards and suppliers aggregations.
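As a rough illustration of that final shape, the sketch below builds the whole aggregation as a Ruby hash, wrapping both the suppliers and the awards aggregations in reverse_nested. It is a reconstruction from the description above, not the asker's actual final query, and it has not been run against a cluster. The helper from_root and the aggregation name back_to_root are hypothetical.

```ruby
require 'json'

# Hypothetical helper: wraps a nested aggregation in reverse_nested so that
# it is evaluated from the root document instead of from inside the
# enclosing nested path. "back_to_root" is a made-up aggregation name.
def from_root(name, nested_path, sub_aggs)
  {
    back_to_root: {
      reverse_nested: {},
      aggs: { name => { nested: { path: nested_path }, aggs: sub_aggs } }
    }
  }
end

award_amounts = {
  award_amounts: { terms: { field: 'award.value.x_amountEur', size: 0 } }
}

suppliers_names = {
  suppliers_names: {
    terms: { field: 'suppliers.x_slug', size: 0 },
    aggs: from_root(:awards, 'award.value', award_amounts)
  }
}

aggregations = {
  entities: {
    nested: { path: 'procuring_entity' },
    aggs: {
      procuring_entity_names: {
        terms: { field: 'procuring_entity.x_slug', size: 0 },
        aggs: from_root(:suppliers, 'suppliers', suppliers_names)
      }
    }
  }
}

puts JSON.pretty_generate(aggregations)
```

The resulting hash would take the place of the "aggregations" value in the Document.es.search call from the question. The extra back_to_root level also shows up in the response, so code that reads the buckets has to step through it.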

How can I map a custom date format in Elasticsearch and Kibana 4

I have nginx logs with this date format: [02/Mar/2015:13:02:51 +0000]
What should I use in Elasticsearch, and what should I put in the dateformat field of Kibana 4?
curl -XGET 'http://localhost:9200/_mapping?pretty'
{
"nginx" : {
"mappings" : {
"t07_nginx" : {
"properties" : {
"@timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"body_bytes_sent" : {
"type" : "string"
},
"geoip_country_code" : {
"type" : "string"
},
"host" : {
"type" : "string"
},
"http_host" : {
"type" : "string"
},
"http_referer" : {
"type" : "string"
},
"http_user_agent" : {
"type" : "string",
"index" : "not_analyzed"
},
"http_x_forwarded_for" : {
"type" : "string"
},
"message" : {
"type" : "string"
},
"msec request_time" : {
"type" : "string"
},
"remote_addr" : {
"type" : "string"
},
"request_http_protocol" : {
"type" : "string"
},
"request_time" : {
"type" : "string"
},
"request_type" : {
"type" : "string"
},
"request_url" : {
"type" : "string"
},
"status" : {
"type" : "string"
},
"upstream_addr" : {
"type" : "string"
},
"upstream_response_time" : {
"type" : "string"
}
}
}
}
}
With the above I can't see any data (events) in Kibana.
Thanks
What do the nginx input plugin and the Elasticsearch output plugin look like in your fluentd config file?
Also, make sure you have your time range set up correctly in Kibana. I believe it defaults to 15 minutes.
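On the date format itself, one option is to convert the nginx timestamp to ISO 8601 before it is indexed, since that is a form the dateOptionalTime format in the mapping above accepts. The sketch below shows the conversion in plain Ruby. The helper name nginx_time_to_iso8601 is hypothetical, and where this conversion would sit in the log pipeline is an assumption. The alternative is to declare a custom format on the date field in the mapping, probably something like dd/MMM/yyyy:HH:mm:ss Z, but I have not verified that pattern against this setup.

```ruby
require 'time'

# Hypothetical helper: converts an nginx access-log timestamp such as
# "[02/Mar/2015:13:02:51 +0000]" to an ISO 8601 string in UTC.
def nginx_time_to_iso8601(raw)
  stripped = raw.delete('[]') # drop the surrounding square brackets
  Time.strptime(stripped, '%d/%b/%Y:%H:%M:%S %z').utc.iso8601
end

puts nginx_time_to_iso8601('[02/Mar/2015:13:02:51 +0000]')
# prints 2015-03-02T13:02:51Z
```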

Resources