Is it possible to define default mapping for an inner object in ElasticSearch? - elasticsearch

Say I have a document like this:
{
"events" : [
{
"event_id" : 123,
"props" : {
"version": "33"
}
},
{
"event_id" : 124,
"props" : {
"version": "44a"
}
}
]
}
Is it possible to specify that the events.props.version be mapped to some type?
I've tried:
{
"template" : "logstash-*",
...
"mappings" : {
"_default_" : {
"properties" : {
"events.props.version" : { "type" : "string" }
}
}
}
}
But that doesn't seem to work.

Please have a look at the Elasticsearch Mapping API.
To set a type or an analyzer on an inner field, each level of nesting needs its own properties block. Try the following:
{
"mappings": {
"properties": {
"events": {
"properties": {
"event_id": {
"type": "string",
"analyzer": "keyword"
},
"props": {
"properties": {
"version": {
"type": "string"
}
}
}
}
}
}
}
}
If this does not work, please share your current mapping.
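The idea that each level needs its own properties block can be sketched in Python. The helper below is hypothetical (dotted_to_properties is not part of any Elasticsearch client); it only expands a dotted path such as events.props.version into the nested structure shown above. The "string" type mirrors the question; Elasticsearch 5 and later use "text" or "keyword" instead.

```python
import json


def dotted_to_properties(path, leaf):
    """Expand 'a.b.c' into {'properties': {'a': {'properties': {'b': ...}}}}."""
    mapping = leaf
    # Wrap from the innermost field outwards so every level gets its own "properties".
    for name in reversed(path.split(".")):
        mapping = {"properties": {name: mapping}}
    return mapping


if __name__ == "__main__":
    body = dotted_to_properties("events.props.version", {"type": "string"})
    print(json.dumps(body, indent=2))
```

The resulting dictionary is what would sit under the type name (here _default_) in the template's mappings section.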

Sure, but you need to use the "object" type:
From the doc ( https://www.elastic.co/guide/en/elasticsearch/reference/1.5/mapping-object-type.html ) if you want to map
{
"tweet" : {
"person" : {
"name" : {
"first_name" : "Shay",
"last_name" : "Banon"
},
"sid" : "12345"
},
"message" : "This is a tweet!"
}
}
you can write:
{
"tweet" : {
"properties" : {
"person" : {
"type" : "object",
"properties" : {
"name" : {
"type" : "object",
"properties" : {
"first_name" : {"type" : "string"},
"last_name" : {"type" : "string"}
}
},
"sid" : {"type" : "string", "index" : "not_analyzed"}
}
},
"message" : {"type" : "string"}
}
}
}

Related

match_only_text fields do not support sorting and aggregations elasticsearch

I would like to count how often each message occurs in a field of type match_only_text and sort the messages by that count. Using a DSL query, the output should look like this:
{" Text message 1":615
" Text message 2":568
....}
So I tried this in Kibana:
GET my_index_name/_search?size=0
{
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message"
}
}
}
}
However, I get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "match_only_text fields do not support sorting and aggregations"
}
I am interested in the field "message". This is its mapping:
"message" : {
"type" : "match_only_text"
}
This is a part of the index mapping:
"mappings" : {
"_meta" : {
"package" : {
"name" : "system"
},
"managed_by" : "ingest-manager",
"managed" : true
},
"_data_stream_timestamp" : {
"enabled" : true
},
"dynamic_templates" : [
{
"strings_as_keyword" : {
"match_mapping_type" : "string",
"mapping" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
],
"date_detection" : false,
"properties" : {
"@timestamp" : {
"type" : "date"
}
.
.
.
"message" : {
"type" : "match_only_text"
},
"process" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 1024
},
"pid" : {
"type" : "long"
}
}
},
"system" : {
"properties" : {
"syslog" : {
"type" : "object"
}
}
}
}
}
}
}
Please Help
Yes, by design, match_only_text is of the text field type family, hence you cannot aggregate on it.
You need to:
A. create a message.keyword sub-field in your mapping of type keyword:
PUT my_index_name/_mapping
{
"properties": {
"message" : {
"type" : "match_only_text",
"fields": {
"keyword": {
"type" : "keyword"
}
}
}
}
}
B. update the whole index (using _update_by_query) so the sub-field gets populated:
POST my_index_name/_update_by_query?wait_for_completion=false
Then, depending on the size of your index, call GET _tasks?actions=*byquery&detailed regularly to check the progress of the task.
C. run the aggregation on that sub-field.
POST my_index_name/_search
{
"size": 0,
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message.keyword"
}
}
}
}
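Note that cardinality returns the number of distinct messages, not a count per message. If the goal is the per-message counts described in the question, a terms aggregation on the same sub-field is the usual choice. Below is a minimal sketch in Python, assuming the message.keyword sub-field from step A exists and has been populated; the function names, the aggregation name and the default size are illustrative only.

```python
def message_counts_body(field="message.keyword", size=10):
    """Build a search body that counts documents per distinct field value."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "message_counts": {
                "terms": {
                    "field": field,
                    "size": size,
                    # most frequent values first (this is also the default order)
                    "order": {"_count": "desc"},
                }
            }
        },
    }


def buckets_to_counts(response):
    """Turn the terms buckets of a search response into {message: count}."""
    buckets = response["aggregations"]["message_counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body can be sent as-is with POST my_index_name/_search, and buckets_to_counts reshapes the response into the form requested in the question.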

Not able to update mapping in elastic search

I have been trying to update my mapping but have not been able to. The question is mainly about updating the nested part. Suppose there is a field "anand" that contains a field "hello":
{
"properties": {
"anand": {
"hello": {
"type": "short"
}
}
}
}
But I am getting the error
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "No type specified for field [anand]"
}
],
"type" : "mapper_parsing_exception",
"reason" : "No type specified for field [anand]"
},
"status" : 400
}
Current Mapping is
{
"anandschool" : {
"mappings" : {
"properties" : {
"anand" : {
"type" : "nested"
},
"doc" : {
"properties" : {
"properties" : {
"properties" : {
"shop_tier" : {
"type" : "long"
}
}
}
}
},
"message" : {
"type" : "byte"
},
"properties" : {
"properties" : {
"shop_tier" : {
"type" : "long"
},
"shop_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"shop" : {
"type" : "long"
}
}
}
}
}
I even mapped anand as a nested type first, hoping that would make it work:
{
"properties": {
"anand": {
"type": "nested"
}
}
}
Self Answer
When updating the mapping of an inner field, the new sub-fields must go inside the properties of the parent field.
For the example above, update with:
{
"properties": {
"anand": {
"properties":{
"hello": {
"type": "short"
}
}
}
}
}
However, this does not work when the parent field is itself nested; for example, if the type of anand is "nested", the update fails. If anyone knows a solution for that case, let me know.
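For the nested case, a common approach is to repeat "type": "nested" next to the new properties, so that the update does not try to turn the field from a nested type into a plain object. Below is a minimal Python sketch of such a request body, assuming anand is already mapped as nested as in the current mapping above. The helper name is hypothetical, and the behaviour should be checked against the Elasticsearch version in use.

```python
def nested_field_update(field, new_properties):
    """Build a PUT <index>/_mapping body that adds sub-fields to a nested field."""
    return {
        "properties": {
            field: {
                # Assumption: the field is already mapped as nested, so the type is repeated.
                "type": "nested",
                "properties": new_properties,
            }
        }
    }


body = nested_field_update("anand", {"hello": {"type": "short"}})
```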

Must match query is not working in elastic search

I am trying to find all videos with the name "The Shining"
that also have parent_id = 189 and parent_type = "folder".
My query seems to connect all of the match statements with "OR" instead of "AND".
What am I doing wrong?
{
"fields": ["name","parent_id","parent_type"],
"query": {
"and": {
"must":[
{
"match": {
"name": "The Shining"
}
},
{
"match": {
"parent_id": 189
}
},
{
"match": {
"parent_type": "folder"
}
}
]
}
}
}
Mapping:
{"video" : {
"mappings" : {
"video" : {
"properties" : {
"homepage_tags" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "integer"
},
"metaType" : {
"type" : "string"
},
"tag_category_name" : {
"type" : "string"
},
"tag_category_order" : {
"type" : "integer"
},
"tag_name" : {
"type" : "string"
}
}
},
"id" : {
"type" : "integer"
},
"name" : {
"type" : "string"
},
"parent_id" : {
"type" : "integer"
},
"parent_type" : {
"type" : "string"
},
"provider" : {
"type" : "string"
},
"publish" : {
"type" : "string"
},
"query" : {
"properties" : {
"bool" : {
"properties" : {
"must" : {
"properties" : {
"match" : {
"properties" : {
"name" : {
"type" : "string"
},
"parent_id" : {
"type" : "long"
}
}
}
}
}
}
}
}
},
"source_id" : {
"type" : "string"
},
"subtitles" : {
"type" : "nested",
"include_in_root" : true,
"properties" : {
"content" : {
"type" : "string",
"store" : true,
"analyzer" : "no_stopwords"
},
"end_time" : {
"type" : "float"
},
"id" : {
"type" : "integer"
},
"parent_type" : {
"type" : "string"
},
"start_time" : {
"type" : "float"
},
"uid" : {
"type" : "integer"
},
"video_id" : {
"type" : "string"
},
"video_parent_id" : {
"type" : "integer"
},
"video_parent_type" : {
"type" : "string"
}
}
},
"tags" : {
"type" : "nested",
"properties" : {
"content" : {
"type" : "string"
},
"end_time" : {
"type" : "string"
},
"id" : {
"type" : "integer"
},
"metaType" : {
"type" : "string"
},
"parent_type" : {
"type" : "string"
},
"start_time" : {
"type" : "string"
}
}
},
"uid" : {
"type" : "integer"
},
"vid_url" : {
"type" : "string"
}
}
}
}}}
I was able to solve the problem by reading Volodymryr's answer.
Additional matches were being found because they matched "the". I tried adding the operator argument, but unfortunately that did not work. What I did instead was use "match_phrase" for the name and switch my two other match clauses to "term":
[
'query' => [
'bool' => [
'must' => [
[
'match_phrase' => [
'name' => $searchTerm
]
],
[
'term' => [
'parent_id' => intVal($parent_id)
]
],
[
'term' => [
'parent_type' => strtolower($parent_type)
]
]
]
]
]
];
Because of the match query you will get matches on "the" or "shining", and the score depends on how many terms match. One of the easiest fixes would be to add the operator "and":
{
"match": {
"name": {
"query": "The Shining",
"operator": "and"
}
}
}
But this is not what you need, since it will also match names such as "shining The" or "The sun is shining".
Another option, if you need exact matches on name, is to make the name field not analyzed. In ES 5 you can set the field type to keyword.
In addition, I would recommend using a bool query with term queries, since they do exact matching:
{
"fields": ["name","parent_id","parent_type"],
"query": {
"bool": {
"must":[{
"term": {
"name": "The Shining"
}
},
{
"term": {
"parent_id": 189
}
},
{
"term": {
"parent_type": "folder"
}
}
]
}
}
}
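The combination described in the self-answer (match_phrase for the name, term for the two exact fields) can also be written as a plain request body. Below is a minimal sketch in Python that mirrors the PHP array above; the function name is illustrative. It lowercases parent_type on the assumption that the field is analyzed with an analyzer that lowercases terms at index time, as the PHP version also does.

```python
def video_query(name, parent_id, parent_type):
    """Build a bool query where every clause must match (logical AND)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # phrase match: the words must appear together and in order
                    {"match_phrase": {"name": name}},
                    # exact matches, no analysis of the query value
                    {"term": {"parent_id": int(parent_id)}},
                    {"term": {"parent_type": parent_type.lower()}},
                ]
            }
        }
    }


body = video_query("The Shining", "189", "Folder")
```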

Mysteriously wrong values of numerical fields in ElasticSearch

I've spent the last 2 days investigating this mind-bending issue: I have an index with custom mappings on which I perform some aggregations. The problem is that the results of the aggregation on numerical fields contain values that do not appear in the database from which the data was imported, even though the number of results is the same.
I found a similar issue here where the problem was inconsistent mapping of a field across an index, but in my case it is mapped as the same type. As far as I have checked, the problem happens with the fields award.value.amount, award.value.x_amountEur and tender.value.x_amountEur. This is my current mapping as returned by curl -XGET 'http://localhost:9200/documents/_mappings?pretty&human'
(the part that contains the target fields):
{
"documents" : {
"mappings" : {
"document" : {
"properties" : {
"additionalIdentifiers" : {
"type" : "string",
"index" : "not_analyzed"
},
"award" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"contract_number" : {
"type" : "string",
"index" : "not_analyzed"
},
"date" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_day" : {
"type" : "integer"
},
"x_month" : {
"type" : "integer"
},
"x_year" : {
"type" : "integer"
}
}
},
"description" : {
"type" : "string"
},
"initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_vat" : {
"type" : "float"
}
}
},
"minValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"x_amountEur" : {
"type" : "float"
}
}
},
"title" : {
"type" : "string"
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
},
"x_initialValue" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"x_amountEur" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
},
"awardCriteria" : {
"type" : "string"
},
"contract_number" : {
"type" : "string"
},
"document_id" : {
"type" : "string",
"index" : "not_analyzed"
},
"numberOfTenderers" : {
"type" : "string"
},
"procurementMethod" : {
"type" : "string"
},
"procuring_entity" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"country" : {
"type" : "string"
},
"countryName" : {
"type" : "string",
"index" : "not_analyzed"
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"suppliers" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"address" : {
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"email" : {
"type" : "string"
},
"locality" : {
"type" : "string"
},
"postalCode" : {
"type" : "string"
},
"streetAddress" : {
"type" : "string"
},
"telephone" : {
"type" : "string"
},
"x_url" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string"
},
"x_slug" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"tender" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"value" : {
"type" : "nested",
"properties" : {
"_id" : {
"properties" : {
"$oid" : {
"type" : "string"
}
}
},
"amount" : {
"type" : "float"
},
"currency" : {
"type" : "string"
},
"x_amountEur" : {
"type" : "float"
},
"x_vat" : {
"type" : "float"
},
"x_vatbool" : {
"type" : "boolean"
}
}
}
}
}
This is the aggregation I am using in order to get the values of contracts between each pair of supplier - procuring_entity:
Document.es.search({
"search_type": "count" ,
"body":{
"aggregations": {
"entities":{
"nested": {
"path": "procuring_entity"
},
"aggs": {
"procuring_entity_names": {
"terms": {
"field": "procuring_entity.x_slug",
"size": 0
},
"aggs": {
"suppliers": {
"nested": {
"path": "suppliers"
},
"aggs": {
"suppliers_names": {
"terms":{
"field": "suppliers.x_slug",
"size": 0
},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms":{
"field": "award.value.x_amountEur",
"size": 0
}
}
}
}
}
}
}
}
}
}
}
}
}
}})
The result with type float is:
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679.086669921875, "doc_count"=>373},
{"key"=>0.0, "doc_count"=>12},
{"key"=>73610.3203125, "doc_count"=>1},
{"key"=>244000.0, "doc_count"=>1},
{"key"=>342348.9375, "doc_count"=>1}]}}}
The problem is that in MongoDB the same query returns 388 documents that all have award.value.x_amountEur = 3679.08661250056, as shown by this Mongoid query:
Document.where(:"procuring_entity.x_slug" => "vsia-bernu-kliniska-universitates-slimnica")
.keep_if{|doc| doc.suppliers.first.x_slug == "recipe-plus-as"}
.map{|doc| doc.award.value.x_amountEur}.uniq
=>[3679.08661250056]
A query directly into MongoDB returns the same.
I have also tried mapping the targeted fields as double, which gave the same result, and as long, which returned the following (even more incorrect) result:
{"entities"=>
{"doc_count"=>24300,
"procuring_entity_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
"doc_count"=>1360,
"suppliers"=>
{"doc_count"=>1360,
"suppliers_names"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>"recipe-plus-as",
"doc_count"=>388,
"awards"=>
{"doc_count"=>388,
"awards"=>
{"doc_count_error_upper_bound"=>0,
"sum_other_doc_count"=>0,
"buckets"=>
[{"key"=>3679, "doc_count"=>371},
{"key"=>0, "doc_count"=>12},
{"key"=>44300, "doc_count"=>1},
{"key"=>80472, "doc_count"=>1},
{"key"=>331636, "doc_count"=>1},
{"key"=>342348, "doc_count"=>1},
{"key"=>1658805, "doc_count"=>1}]}}}
I'm using Elasticsearch 2.0, mongoid 5.0.1 and mongoid-elasticsearch for indexing. I can't think of anything else to do so any suggestion is welcomed and appreciated.
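A side note on the first bucket key in the float result: 3679.086669921875 is what 3679.08661250056 becomes when rounded to single precision, which is consistent with the float mapping. The Python sketch below reproduces that rounding with the standard struct module; the helper name is illustrative. It accounts only for the small difference in that one key, not for the extra buckets.

```python
import struct


def as_float32(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = 3679.08661250056  # value reported by MongoDB in the question
print(as_float32(stored))  # prints 3679.086669921875
```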
I tried to test your scenario with ES 2.0 and there is something that I'm missing. I cannot make it create buckets for the award.value.x_amountEur unless I use a reverse_nested aggregation to "get out" from one nested path and into another.
So, instead of the awards aggregation that you have I'm using the same aggregation but "wrapped" in a reverse_nested aggregation:
"aggs": {
"getting_back": {
"reverse_nested": {},
"aggs": {
"awards": {
"nested": {
"path": "award.value"
},
"aggs": {
"award_amounts": {
"terms": {
"field": "award.value.x_amountEur"
}
}
}
}
}
}
}
With this aggregation the results look correct.
Later edit: following my suggestion and the more general one from @Val, the complete solution was to use reverse_nested on both the awards and the suppliers aggregations.
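Putting the later edit together, the full aggregation might look like the sketch below, written as a Python dictionary. This is one reading of "reverse_nested on both aggregations", not the exact request from the thread: the names back_to_root_for_suppliers and back_to_root_for_awards are made up, and the "size": 0 settings from the question are left out.

```python
def pair_amounts_aggregation():
    """Aggregation body that leaves each nested path before entering the next one."""
    awards = {
        "reverse_nested": {},  # step out of "suppliers" back to the root document
        "aggs": {
            "awards": {
                "nested": {"path": "award.value"},
                "aggs": {
                    "award_amounts": {"terms": {"field": "award.value.x_amountEur"}}
                },
            }
        },
    }
    suppliers = {
        "reverse_nested": {},  # step out of "procuring_entity" back to the root document
        "aggs": {
            "suppliers": {
                "nested": {"path": "suppliers"},
                "aggs": {
                    "suppliers_names": {
                        "terms": {"field": "suppliers.x_slug"},
                        "aggs": {"back_to_root_for_awards": awards},
                    }
                },
            }
        },
    }
    return {
        "aggregations": {
            "entities": {
                "nested": {"path": "procuring_entity"},
                "aggs": {
                    "procuring_entity_names": {
                        "terms": {"field": "procuring_entity.x_slug"},
                        "aggs": {"back_to_root_for_suppliers": suppliers},
                    }
                },
            }
        }
    }
```

Each reverse_nested step returns to the root document, so the following nested aggregation is evaluated against the same document rather than inside the previous nested path.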

elasticsearch nested query with ruby gem

I am using the elasticsearch ruby gem to connect to an ES server and currently have an index with the mapping below. I am trying to understand the proper syntax for querying these nested objects. I have been experimenting with queries such as the following, but keep getting errors. Could someone get me started on the proper syntax for querying a structure like this? Thanks!
client = Elasticsearch::Client.new log:true
client.search index: 'injuries', nested: { path: { week: {id: '1' } } }
Returns:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":"SearchPhaseExecutionException[Failed to execute phase [query
Sample Mapping:
{
"injuries" : {
"mappings" : {
"tbd" : {
"properties" : {
"injuries" : {
"properties" : {
"timestamp" : {
"properties" : {
"__content__" : {
"type" : "string"
},
"timeZone" : {
"type" : "string"
}
}
}
}
}
}
},
"football" : {
"properties" : {
"injuries" : {
"properties" : {
"timestamp" : {
"properties" : {
"__content__" : {
"type" : "string"
},
"timeZone" : {
"type" : "string"
}
}
},
"week" : {
"properties" : {
"id" : {
"type" : "string"
},
"inactivePlayers" : {
"properties" : {
"inactivePlayer" : {
"properties" : {
"firstName" : {
"type" : "string"
},
"lastName" : {
"type" : "string"
},
"playerId" : {
"type" : "string"
},
"position" : {
"type" : "string"
},
"status" : {
"type" : "string"
},
"teamId" : {
"type" : "string"
}
}
}
}
},
"injuredPlayers" : {
"properties" : {
"injuredPlayer" : {
"properties" : {
"displayName" : {
"type" : "string"
},
"firstName" : {
"type" : "string"
},
"gameStatus" : {
"type" : "string"
},
"injury" : {
"type" : "string"
},
"lastName" : {
"type" : "string"
},
"playerId" : {
"type" : "string"
},
"position" : {
"type" : "string"
},
"practiceStatus" : {
"type" : "string"
},
"teamId" : {
"type" : "string"
}
}
}
}
},
"season" : {
"type" : "string"
},
"seasonType" : {
"type" : "string"
}
}
}
}
}
}
}
}
}
}
Your nested query doesn't appear to have a query defined. I think it should be something like:
"nested" : {
"path" : "week",
"query" : {
"match" : {"week.id" : "1"}
}
}
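The same structure as a complete search body is sketched below in Python; the helper name is illustrative. Two assumptions are worth stating: a nested query only works on a path that is mapped with "type": "nested" (the sample mapping shows week as a plain object under injuries, so the path and field names may need adjusting), and with the Ruby gem a body like this would normally be passed through the body: argument of client.search rather than as top-level keyword arguments.

```python
def nested_match(path, field, value):
    """Build a search body with a match query wrapped in a nested query."""
    return {
        "query": {
            "nested": {
                "path": path,  # must point at a field mapped as type "nested"
                "query": {"match": {field: value}},
            }
        }
    }


body = nested_match("week", "week.id", "1")
```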
