Aggregating a Key/Value list in Elasticsearch

The general problem is that I've created a Name/Value mapping in Elasticsearch to deal with a potentially huge number of user-supplied tags, as opposed to allowing an open schema where people can just create documents with new properties.
I've got an Elasticsearch mapping that looks like this:
"Tags" : {
"properties" : {
"Value" : {
"analyzer" : "keyword",
"type" : "string"
},
"Name" : {
"analyzer" : "keyword",
"type" : "string"
}
}
},
With records that look like this
"Tags" : [
{
"Name" : "group",
"Value" : "foobar"
},
{
"Name" : "season",
"Value" : "winter"
}
],
What I'm trying to do with an Elasticsearch query is to write a script that will aggregate only the season entries.
...
"script" : "for (int i = 0; i < doc['Tags.Name'].values.size(); i++) {
if (doc['Tags.Name'].values[i] == 'season') {
return doc['Tags.Value'].values[i]
} }"
...
I've gone through about 200 permutations of the above script and it's not quite returning the results that I would like to see.

Your Tags field should be nested, so that you can write a nested query that selects only the season tags and then aggregate on those values only. That would also let you ditch the script, which is going to perform very badly if you have a huge number of tags.
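Besides performance, the script has a correctness problem. With a plain object mapping, Elasticsearch flattens an array of objects into independent multi-valued fields (Tags.Name and Tags.Value), so the i-th Name is not guaranteed to belong to the i-th Value. The Python below is a toy model, not Elasticsearch code: it assumes, as for keyword doc values, that each field's values come back sorted and de-duplicated, and the sample tags are made up.

```python
# Toy model, not Elasticsearch code. Assumption: a plain object array is
# flattened into independent fields whose doc values come back sorted and
# de-duplicated, as they do for keyword fields.


def flatten_object_array(objects, field):
    """Values a script would see for doc['Tags.<field>'].values."""
    return sorted({obj[field] for obj in objects})


def season_by_position(tags):
    """Mimic the script: find 'season' in Tags.Name, read Tags.Value at the same index."""
    names = flatten_object_array(tags, "Name")
    values = flatten_object_array(tags, "Value")
    for i, name in enumerate(names):
        if name == "season":
            # de-duplication can make the two lists different lengths
            return values[i] if i < len(values) else None
    return None


def season_per_object(tags):
    """What a nested mapping preserves: each Name stays paired with its own Value."""
    for tag in tags:
        if tag["Name"] == "season":
            return tag["Value"]
    return None


# Made-up sample record
tags = [
    {"Name": "season", "Value": "winter"},
    {"Name": "group", "Value": "zeta"},
]
print(season_by_position(tags))  # zeta (wrong pairing)
print(season_per_object(tags))   # winter
```

With a nested mapping each object keeps its own Name/Value pair, which is what season_per_object models.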
So your mapping needs to look like this:
"Tags" : {
"type": "nested", <---- add this
"properties" : {
"Value" : {
"analyzer" : "keyword",
"type" : "string"
},
"Name" : {
"analyzer" : "keyword",
"type" : "string"
}
}
},
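Then your query should include a nested filter on the season tag name to select the matching documents. Because a nested aggregation sees every Tags object of those documents (including the group tags), also put a filter aggregation on Tags.Name inside it, so that your terms aggregation only works on the season values.
{
"query": {
"filtered": {
"filter": {
"nested": {
"path": "Tags",
"filter": {
"term": {
"Tags.Name": "season"
}
}
}
}
}
},
"aggs": {
"season_tags": {
"nested": {
"path": "Tags"
},
"aggs": {
"only_season": {
"filter": {
"term": {
"Tags.Name": "season"
}
},
"aggs": {
"season_values": {
"terms": {
"field": "Tags.Value"
}
}
}
}
}
}
}
}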
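A toy Python model helps show how the nested filter and the nested aggregation interact. It assumes each Tags object behaves like its own hidden document and uses made-up sample data: the nested filter keeps a root document if any of its tags is a season tag, but a nested aggregation over those documents still visits every tag, including the group ones. Restricting the buckets to season values therefore needs a Tags.Name filter at the nested level as well.

```python
from collections import Counter

# Toy model: each entry in "Tags" stands in for one hidden nested document.
docs = [
    {"Tags": [{"Name": "group", "Value": "foobar"}, {"Name": "season", "Value": "winter"}]},
    {"Tags": [{"Name": "group", "Value": "foobar"}, {"Name": "season", "Value": "summer"}]},
    {"Tags": [{"Name": "group", "Value": "other"}]},
]


def has_tag(doc, name):
    """Nested filter: the root document matches if ANY of its tags matches."""
    return any(tag["Name"] == name for tag in doc["Tags"])


def nested_terms(docs, name_filter=None):
    """Nested agg + terms on Tags.Value, optionally behind a filter on Tags.Name."""
    counts = Counter()
    for doc in docs:
        for tag in doc["Tags"]:
            if name_filter is None or tag["Name"] == name_filter:
                counts[tag["Value"]] += 1
    return counts


matching = [doc for doc in docs if has_tag(doc, "season")]
print(nested_terms(matching))            # still counts the group value "foobar"
print(nested_terms(matching, "season"))  # only "winter" and "summer"
```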

Related

How can I use query_string to match both nested and non-nested fields at the same time?

I have an index with a mapping something like this:
"email" : {
"type" : "nested",
"properties" : {
"from" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
},
"subject" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
},
"to" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
}
}
},
"textExact" : {
"type" : "text",
"analyzer" : "lowercase_standard",
"fielddata" : true
}
I want to use query_string to search for matches in both the nested and the non-nested field at the same time, e.g.
email.to:foo#example.com AND textExact:bar
But I can't figure out how to write a query that will search both fields at once. The following doesn't work, because query_string searches do not return nested documents:
"query": {
"query_string": {
"fields": [
"textExact",
"email.to"
],
"query": "email.to:foo#example.com AND textExact:bar"
}
}
I can write a separate nested query, but that will only search against nested fields. Is there any way I can use query_string to match both nested and non-nested fields at the same time?
I am using Elasticsearch 6.8. Cross-posted on the Elasticsearch forums.
Nested documents can only be queried with the nested query.
You can use either of the two approaches below.
1. Combine the nested query and the normal query in a bool must clause, which works like an "AND" across the different queries.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "email",
"query": {
"term": {
"email.to": "foo#example.com"
}
}
}
},
{
"match": {
"textExact": "bar"
}
}
]
}
}
}
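A toy Python model of approach 1, under stated assumptions: the nested clause matches a root document when any of its email objects matches, and the match on textExact is approximated by lowercase whitespace tokens (the real lowercase_standard analyzer is not reproduced). The helper names and sample documents are made up.

```python
# Toy model of the bool/must query: a root document matches only if every clause matches.

def nested_term(doc, path, field, value):
    """Nested query: true if ANY nested object under `path` has field == value."""
    return any(obj.get(field) == value for obj in doc.get(path, []))


def match_token(doc, field, token):
    """Crude stand-in for a match query: lowercase whitespace tokens."""
    return token.lower() in doc.get(field, "").lower().split()


def bool_must(doc):
    return (nested_term(doc, "email", "to", "foo#example.com")
            and match_token(doc, "textExact", "bar"))


docs = [
    {"textExact": "bar", "email": [{"to": "foo#example.com"}, {"to": "x#example.com"}]},
    {"textExact": "bar", "email": [{"to": "x#example.com"}]},
    {"textExact": "baz", "email": [{"to": "foo#example.com"}]},
]
print([bool_must(d) for d in docs])  # [True, False, False]
```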
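2. copy_to
The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field. Here it copies the nested email.to values into the root-level to_email field, which query_string can search directly.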
{
"mappings": {
"properties": {
"textExact":{
"type": "text"
},
"to_email":{
"type": "keyword"
},
"email":{
"type": "nested",
"properties": {
"to":{
"type":"keyword",
"copy_to": "to_email" --> copies to non-nested field
},
"from":{
"type":"keyword"
}
}
}
}
}
}
Query
{
"query": {
"query_string": {
"fields": [
"textExact",
"to_email"
],
"query": "to_email:foo#example.com AND textExact:bar"
}
}
}
Result
"_source" : {
"textExact" : "bar",
"email" : [
{
"to" : "sdfsd#example.com",
"from" : "a#example.com"
},
{
"to" : "foo#example.com",
"from" : "sdfds#example.com"
}
]
}
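A similar toy model for approach 2. copy_to writes the email.to values into the indexed root-level to_email field, but it does not modify _source, which is why to_email does not appear in the result above. The helper names are hypothetical, and textExact is again approximated by lowercase whitespace tokens.

```python
# Toy model of approach 2 (copy_to). Helper names are hypothetical.

def indexed_fields(source):
    """What gets indexed: textExact tokens plus the copy_to target to_email.
    The _source document itself is left unchanged."""
    return {
        "textExact": source["textExact"].lower().split(),  # rough analyzer stand-in
        "to_email": [e["to"] for e in source.get("email", []) if "to" in e],
    }


def matches(source):
    """to_email:foo#example.com AND textExact:bar, evaluated on the root document."""
    fields = indexed_fields(source)
    return "foo#example.com" in fields["to_email"] and "bar" in fields["textExact"]


source = {
    "textExact": "bar",
    "email": [
        {"to": "sdfsd#example.com", "from": "a#example.com"},
        {"to": "foo#example.com", "from": "sdfds#example.com"},
    ],
}
print(matches(source))       # True
print("to_email" in source)  # False
```

One trade-off of this design: to_email is a flat, document-level list, so a condition such as email.to = X AND email.from = Y can no longer be checked against the same nested object.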

How to split object (nested) into multiple columns in Elasticsearch / Kibana data table visualization

I have a nested object indexed in Elasticsearch (7.10) and I need to visualize it with a Kibana table. The problem is that Kibana puts all values of the nested fields that share the same name into one column.
Part of the index:
{
"index" : {
"mappings" : {
"properties" : {
"data1" : {
"type" : "keyword"
},
"Details" : {
"type" : "nested",
"properties" : {
"Amount" : {
"type" : "float"
},
"Currency" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"DetailType" : {
"type" : "keyword"
},
"Price" : {
"type" : "float"
},
"Quantity" : {
"type" : "float"
},
"TotalAmount" : {
"type" : "float"
.......
The problem in the table: (screenshot not included)
How can I get three rows named Details, each with one split term (e.g. DetailType: "start_fee")?
Update:
I could query the nested object in the console:
GET _search
{
"query": {
"nested": {
"path": "Details",
"query": {
"bool": {
"must": [
{ "match": { "Details.DetailType": "energybased_fee" }}
]
}
},
"inner_hits": {
}
}}}
But how can I visualize in the table only the "inner_hits" value?

Elasticsearch: Sort the Documents on the index value of the search string in a text field

I have Elasticsearch data like this:
PUT /text/_doc/1
{
"name": "pdf1",
"text":"For the past six weeks. The unemployment crisis has unfolded so suddenly and rapidly."
}
PUT /text/_doc/2
{
"name": "pdf2",
"text":"The unemployment crisis has unfolded so suddenly and rapidly."
}
In this example I am doing a full-text search for all documents that contain the substring "unemployment" in the "text" field. In the end I want the documents sorted in ascending order of the position (index) of "unemployment" in the "text" field. For example, "unemployment" occurs earliest in doc 2, at index 4, so I want that document to be returned first in the results.
GET /text/_search?pretty
{
"query": {
"match": {
"text": "unemployment"
}
}
}
I have tried a few things, such as term_vector; here is the mapping I used, but it didn't help.
PUT text/_mapping
{
"properties": {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"text" : {
"type" : "text",
"term_vector": "with_positions_offsets"
}
}
}
Can anyone please help me with the right mapping and search query?
Thanks in advance!
Try this query
GET text/_search
{
"query": {
"function_score": {
"query": {
"match": {
"text": "unemployment"
}
},
"functions": [
{
"script_score": {
"script": {
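"source": """
// text.keyword has no value when the text exceeds ignore_above (256 chars)
if (doc['text.keyword'].size() == 0) { return 0; }
def docval = doc['text.keyword'].value;
def index = (float) docval.indexOf('unemployment');
// the sooner the word appears the better, so 'invert' the index;
// +1 avoids dividing by zero when the text starts with the word.
// With boost_mode 'replace', this value alone is the final score.
return index > -1 ? (1 / (index + 1)) : 0;
"""
}
}
}
],
"boost_mode": "replace"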
}
}
}
This uses the auto-generated mapping:
{
"text" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
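Note that this is case-sensitive, so it would be reasonable to also have a lowercase-normalized keyword field and access that one in the script_score script. Also, because the auto-generated text.keyword subfield has "ignore_above": 256, texts longer than 256 characters get no keyword value at all, so the script cannot find the word in them. This might get you on the right path.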
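The ranking idea behind the script can be checked outside Elasticsearch. This Python sketch assumes the score is 1 / (index + 1) of the first occurrence, with 0 when the word is missing (the +1 avoids dividing by zero when a text starts with the word), and that documents are ordered by that score alone, without adding the query's relevance score.

```python
# Toy model of the script_score ranking, outside Elasticsearch.

def position_score(text, needle):
    """1 / (index + 1) of the first occurrence, 0.0 if absent (case-sensitive)."""
    index = text.find(needle)  # -1 when missing, like Java's indexOf
    return 1.0 / (index + 1) if index > -1 else 0.0


docs = {
    "pdf1": "For the past six weeks. The unemployment crisis has unfolded so suddenly and rapidly.",
    "pdf2": "The unemployment crisis has unfolded so suddenly and rapidly.",
}

# Highest score first == earliest occurrence first
ranked = sorted(docs, key=lambda name: position_score(docs[name], "unemployment"), reverse=True)
print(ranked)  # ['pdf2', 'pdf1']
```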

Elasticsearch terms aggregate duplicates

I have a field that uses an ngram analyzer, and I'm trying to use a terms aggregation on that field to return unique documents by the field. The keys returned in the aggregation don't match the fields of the documents being returned, and I'm getting duplicate names.
"analysis" : {
"filter" : {
"autocomplete_filter" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "20"
}
},
"analyzer" : {
"autocomplete" : {
"type" : "custom",
"filter" : [ "lowercase", "autocomplete_filter" ],
"tokenizer" : "standard"
}
}
}
}
"name" : {
"type" : "string",
"analyzer" : "autocomplete",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
{
"query": {
"query_string": {
"query":"bra",
"fields":["name"],
"use_dis_max":true
}
},
"aggs": {
"group_by_name": {
"terms": { "field":"name.raw" }
}
}
}
I'm getting back the following names (hits) and keys (aggregation buckets):
Braingeyser, Brainstorm, Braingeyser, Brainstorm, Brainstorm, Brainstorm, Bramblecrush, Brainwash, Brainwash, Braingeyser
{"key":"Bog Wraith","doc_count":18}
{"key":"Birds of Paradise","doc_count":15}
{"key":"Circle of Protection: Black","doc_count":15}
{"key":"Lightning Bolt","doc_count":15}
{"key":"Grizzly Bears","doc_count":14}
{"key":"Black Knight","doc_count":13}
{"key":"Bad Moon","doc_count":12}
{"key":"Boomerang","doc_count":12}
{"key":"Wall of Bone","doc_count":12}
{"key":"Balance","doc_count":11}
How can I get Elasticsearch to return only unique names from the aggregation?
To avoid duplicates, you can add a top_hits sub-aggregation so that each unique name bucket returns a single hit:
"aggs": {
"group_by_name": {
"terms": { "field":"name.raw" },
"aggs": {
"remove_dups": {
"top_hits": {
"size": 1,
"_source": false
}
}
}
}
}
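One likely reason the aggregation keys do not look like the hits: if no search_analyzer is set on name, the autocomplete analyzer is also applied to the query text, so "bra" becomes the edge n-grams b, br and bra, and query_string's default OR operator then matches any name containing a word that starts with "b". That fits every key in the list above (Bog Wraith, Wall of Bone, Lightning Bolt, ...). Also, aggregations run over all matching documents, while the hits only show the top results. The Python below is a rough model of that analysis chain (a regex stands in for the standard tokenizer) with a made-up list of names.

```python
import re

# Rough model (assumption) of the "autocomplete" analyzer:
# crude standard-tokenizer stand-in, lowercase, then edge n-grams of 1..20.


def autocomplete_tokens(text, min_gram=1, max_gram=20):
    tokens = set()
    for word in re.findall(r"\w+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.add(word[:n])
    return tokens


def matches(name, query, search_tokens):
    """query_string's default OR: any shared token is enough to match."""
    return bool(autocomplete_tokens(name) & search_tokens(query))


def plain_tokens(query):
    """A search analyzer without edge n-grams: just lowercase words."""
    return set(re.findall(r"\w+", query.lower()))


names = ["Brainstorm", "Bog Wraith", "Wall of Bone", "Counterspell"]
same_analyzer = [n for n in names if matches(n, "bra", autocomplete_tokens)]
plain_search = [n for n in names if matches(n, "bra", plain_tokens)]
print(same_analyzer)  # ['Brainstorm', 'Bog Wraith', 'Wall of Bone']
print(plain_search)   # ['Brainstorm']
```

If that is the cause, setting a plain search analyzer on name (for example "search_analyzer": "standard") is a natural thing to try; verify it against your Elasticsearch version.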

Search query for elastic search

I have documents in Elasticsearch with the following mapping:
{
"stringindex" : {
"mappings" : {
"files" : {
"properties" : {
"BaseOfCode" : {
"type" : "long"
},
"BaseOfData" : {
"type" : "long"
},
"Characteristics" : {
"type" : "long"
},
"FileType" : {
"type" : "long"
},
"Id" : {
"type" : "string"
},
"Strings" : {
"properties" : {
"FileOffset" : {
"type" : "long"
},
"RO_BaseOfCode" : {
"type" : "long"
},
"SectionName" : {
"type" : "string"
},
"SectionOffset" : {
"type" : "long"
},
"String" : {
"type" : "string"
}
}
},
"SubSystem" : {
"type" : "long"
}
}
}
}
}
}
My requirement is: when I search for a particular string (Strings.String), I want to get only the FileOffset (Strings.FileOffset) for that string.
How do I do this?
Thanks
I suppose that you want to perform a nested query and retrieve only one field as the result, but I see problems in your mapping, hence I will split my answer into 3 sections:
What is the problem I see:
How to query nested fields (this is more ES background):
How to find a solution:
1) What is the problem I see:
You want to query a nested field, but you don't have a nested field.
The nested field part:
The field "Strings" is not nested in the type "files" (nested data without a nested mapping may cause problems later). If it were, your mapping for the field "Strings" would look something like this, with "type": "nested" next to "properties":
{
"stringindex" : {
"mappings" : {
"files" : {
"properties" : {
"Strings" : {
"type" : "nested",
"properties" : {
"String" : {
"type" : "string"
}
}
}
}
}
}
}
}
Note: yes, I cut most of the fields; I did this to make it easier to see that your mapping does not declare a nested field.
With a nested field in hand, we need a nested query.
The specific field result part:
To retrieve only one field as the result, you have to include the "_source" property in your query.
2) How to query nested fields:
This is more for ES background, if you have never worked with nested fields.
Small example:
You define a type with a nested field:
{
"nesttype" : {
"properties" : {
"name" : { "type" : "string" },
"parents" : {
"type" : "nested" ,
"properties" : {
"sex" : { "type" : "string" },
"name" : { "type" : "string" }
}
}
}
}
}
You create some inputs:
{ "name" : "Dan", "parents" : [{ "name" : "John" , "sex" : "m" },
{ "name" : "Anna" , "sex" : "f" }] }
{ "name" : "Lana", "parents" : [{ "name" : "Maria" , "sex" : "f" }] }
Then you query, but only fetch the nested field "parents.name":
{
"query": {
"nested": {
"path": "parents",
"query": {
"bool": {
"must": [
{
"term": {
"parents.sex": "m"
}
}
]
}
}
}
},
"_source" : [ "parents.name" ]
}
The output of this query is "the names of the parents of all people who have a parent of sex 'm'". One entry (Dan) has a father, whereas the other (Lana) doesn't, so it will only retrieve Dan's parents' names.
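The behaviour described above can be modelled in a few lines of Python (a toy model, not Elasticsearch code; the helper names are hypothetical). A root document is a hit when at least one nested parents object matches, and the _source filter then returns parents.name for all of that document's parents, so Anna comes back alongside John even though only John matches sex = m.

```python
# Toy model of a nested query followed by _source filtering.
people = [
    {"name": "Dan", "parents": [{"name": "John", "sex": "m"}, {"name": "Anna", "sex": "f"}]},
    {"name": "Lana", "parents": [{"name": "Maria", "sex": "f"}]},
]


def nested_hits(docs, path, field, value):
    """Root documents with at least one nested object where field == value."""
    return [doc for doc in docs if any(obj[field] == value for obj in doc[path])]


def source_filter(doc, path, field):
    """Mimics "_source": ["parents.name"]: keeps the field for EVERY nested object."""
    return {path: [{field: obj[field]} for obj in doc[path]]}


hits = [source_filter(d, "parents", "name") for d in nested_hits(people, "parents", "sex", "m")]
print(hits)  # [{'parents': [{'name': 'John'}, {'name': 'Anna'}]}]
```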
3) How to find a solution:
To fix your mapping:
You only need to include the type "nested" in the field "Strings":
{
"files" : {
"properties" : {
...
"Strings" : {
"type" : "nested" ,
"properties" : {
"FileOffset" : { "type" : "long" },
"RO_BaseOfCode" : { "type" : "long" },
...
}
}
...
}
}
}
To query your data:
{
"query": {
"nested": {
"path": "Strings",
"query": {
"bool": {
"must": [
{
"match": {
"Strings.String": "my string"
}
}
]
}
}
}
},
"_source" : [ "Strings.FileOffset" ]
}
Great answer by dan, but I think it doesn't cover everything.
His solution doesn't quite solve your problem, and you may not have noticed that yet.
Consider a scenario where the data looks like this:
doc_1
{
"Id": 1,
"Strings": [
{
"string": "x",
"fileoffset": "f1"
},
{
"string": "y",
"fileoffset": "f2"
}
]
}
doc_2
{
"Id": 2,
"Strings": {
"string": "z",
"fileoffset": "f3"
}
}
When you run the query the way dan described, say with a filter on Strings.string = x, the response looks like this:
{
"hits": [
{
"_index": "stringindex",
"_type": "files",
"_id": "11961",
"_score": 1,
"_source": {
"Strings": [
{
"fileoffset": "f1"
},
{
"fileoffset": "f2"
}
]
}
}
]
}
This is because Elasticsearch returns hits for documents where any of the objects inside the nested field (here Strings) pass the filter criteria. In this case one object in doc_1 has Strings.string = x, so doc_1 is returned with all of its offsets, but the hit doesn't tell us which nested object passed the criteria.
So you can use a nested aggregation instead (inner_hits on the nested query is another way to see which nested objects matched).
Here is a solution for you:
POST index/type/_search
{
"size": 0,
"aggs": {
"StringsNested": {
"nested": {
"path": "Strings"
},
"aggs": {
"StringFilter": {
"filter": {
"term": {
"Strings.string": "x"
}
},
"aggs": {
"FileOffsets": {
"terms": {
"field": "Strings.fileoffset"
}
}
}
}
}
}
}
}
So, response is like,
"aggregations": {
"StringsNested": {
"doc_count": 3,
"StringFilter": {
"doc_count": 1,
"FileOffsets": {
"buckets": [
{
"key": "f1",
"doc_count": 1
}
]
}
}
}
}
Remember to map Strings as nested, as dan said.
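A toy Python model of the nested, filter and terms aggregation chain, using the doc_1/doc_2 data above. The helper names are hypothetical, and a single Strings object is treated like a one-element array. The returned tuple holds the nested doc count, the filtered doc count and the fileoffset buckets.

```python
from collections import Counter

# Toy model of: nested agg (Strings) -> filter agg (string == "x") -> terms agg (fileoffset).
docs = [
    {"Id": 1, "Strings": [{"string": "x", "fileoffset": "f1"},
                          {"string": "y", "fileoffset": "f2"}]},
    {"Id": 2, "Strings": {"string": "z", "fileoffset": "f3"}},
]


def nested_objects(doc, path):
    """A single object is indexed like a one-element array."""
    value = doc.get(path, [])
    return value if isinstance(value, list) else [value]


def nested_filter_terms(docs, path, field, value, terms_field):
    all_nested = [obj for doc in docs for obj in nested_objects(doc, path)]
    matching = [obj for obj in all_nested if obj.get(field) == value]
    buckets = Counter(obj[terms_field] for obj in matching)
    return len(all_nested), len(matching), buckets


print(nested_filter_terms(docs, "Strings", "string", "x", "fileoffset"))
# (3, 1, Counter({'f1': 1}))
```

Unlike the query-based approach, only the offset of the matching nested object (f1) ends up in a bucket.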
