Facet to get all keys from an object in elasticsearch - elasticsearch

Let's say I have the following docs:
{
"title": "Some Title",
"options": {
"key5": 1,
"key3": 0,
"key1": 1
}
},
{
"title": "Some Title",
"options": {
"key2": 0,
"key3": 0,
"key5": 1
}
}
I want to get all the keys from options object using facet.
If options were a simple array of keys as strings, I would simply use a facet like this:
"facets" : {
"options" : {
"terms" : {
"field" : "options"
}
}
}
But it doesn't work in my case.
So if a query returns those two docs, I should get these keys: ["key5","key3","key1","key2"]
What kind of facet do I actually need?

You can't do that using a facet.
You have two options:
Keep your current document structure and get the list of keys from the type mapping (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html). This returns the schema of your type, which contains all the fields encountered so far.
Change your structure: keep the key as a field too, so your options object becomes an array of documents like:
"options" :
[
{ "key" : "key1", "value" : 1},
{ "key" : "key2", "value" : 0}
]
You will probably want to keep the key-value pairs associated when searching or faceting, so configure options as a nested type (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html).
Then you can facet on the "options.key" field to get a list of top keys.
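A minimal sketch of that restructured mapping, assuming the pre-2.0 string syntax (index and type names are made up for illustration):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string" },
        "options": {
          "type": "nested",
          "properties": {
            "key": { "type": "string", "index": "not_analyzed" },
            "value": { "type": "integer" }
          }
        }
      }
    }
  }
}
```

With this in place, a terms facet on "options.key" (adding "nested": "options" so the facet runs in the nested scope) should return each distinct key as a bucket.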

If I understand you correctly, you want a terms facet over each and every field in your nested options object, a kind of "wildcard facet"?
I don't think there is any functionality in the facet API that allows this kind of operation. If I'm not mistaken, fields used for faceting have to be mapped, so it might be possible to extract the field names in a separate query by inspecting the index mappings.

Related

Filter nested array using jmes query

I need to get the names of the companies in which 'John' worked in the 'sales' department. My JSON
looks like this:
[
{
"name" : "John",
"company" : [{
"name" : "company1",
"department" : "sales"
},
{
"name" : "company2",
"department" : "backend"
},
{
"name" : "company3",
"department" : "sales"
}
],
"phone" : "1234"
}
]
And my jmesquery is like this:
jmesquery: "[? name=='John'].company[? department=='sales'].{Company: name}"
But with this query, I'm getting a null array.
This is because your first filter [?name=='John'] is creating a projection, and more specifically a filter projection, that you will have to reset in order to further filter it.
Resetting a projection can be achieved using pipes.
Projections are an important concept in JMESPath. However, there are times when projection semantics are not what you want. A common scenario is when you want to operate on the result of a projection rather than projecting an expression onto each element in the array.
For example, the expression people[*].first will give you an array containing the first names of everyone in the people array. What if you wanted the first element in that list? If you tried people[*].first[0], that expression would just evaluate first[0] for each element in the people array, and because indexing is not defined for strings, the final result would be an empty array, []. To accomplish the desired result, you can use a pipe expression, <expression> | <expression>, to indicate that a projection must stop.
Source: https://jmespath.org/tutorial.html#pipe-expressions
So, here would be a first step in your query:
[?name=='John'] | [].company[?department=='sales'].{Company: name}
This said, this still ends in an array of array:
[
[
{
"Company": "company1"
},
{
"Company": "company3"
}
]
]
Because you can end up with multiple people named John in a sales department.
So, one array for the users and another for the companies/departments.
In order to fix this, you can use the flatten operator: [].
So we end with:
[?name=='John'] | [].company[?department=='sales'].{Company: name} []
Which gives:
[
{
"Company": "company1"
},
{
"Company": "company3"
}
]

Is there a way to apply the synonym token filter in ElasticSearch to field names rather than the value?

Consider the following JSON file:
{
"titleSony": "Matrix",
"cast": [
{
"firstName": "Keanu",
"lastName": "Reeves"
}
]
}
Now, I know in ElasticSearch, you can apply a synonym token filter to field values as given in the following link: Elasticsearch Analysis: Synonym token filter.
Hence, I can create a "synonym.txt" file with Matrix => Matx, then if I search for titleSony:Matx, it will return the documents with Matrix as well.
Now, what I would like is to create a synonym for the field name titleSony. For example - titleSony => titleAll, such that when I search for titleAll, I should get all documents with titleSony as well.
Is there any way to accomplish this in ElasticSearch?
Yes, somewhat. Elasticsearch has some default behavior very similar to this, which I'll touch on in a bit.
The feature you're looking for is the copy_to mapping parameter. It allows you to specify that the terms in one field should be copied into another. This is useful for consolidating terms you expect to match into a single field, to help simplify your query when you would like to match against any one of a number of fields.
In this example, you would specify in your mapping that the terms in the titleSony field ought to be copied into the titleAll field. Presumably you'd have other fields (say, titleDisney) which also copy into that field as well. So a search against titleAll will effectively match the other fields whose terms are copied into it.
An excerpt of your mapping might look something like this:
{
"movies" : {
"properties" : {
"titleSony" : { "type" : "string", "copy_to" : "titleAll" },
"titleDisney" : { "type" : "string", "copy_to" : "titleAll" },
"titleAll" : { "type" : "string" },
"cast" : { ... },
...
}
}
}
I mentioned earlier that Elasticsearch does something like this. By default it creates a special field called _all into which all the document's terms are copied. This field lets you construct very simple queries to match against terms that occur in any field on the document. So as you see, this is a fairly common convention in Elasticsearch. (Elasticsearch mapping: _all field.)
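With that mapping in place, a search against the combined field is an ordinary match query (a sketch; the field and term follow the example above):

```json
{
  "query": {
    "match": { "titleAll": "Matrix" }
  }
}
```

This single query effectively covers titleSony, titleDisney, and any other field that copies into titleAll.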

Elastic search filter value like "123-325-23243" during aggregation

In an Elasticsearch query, when I try to aggregate on a field with values like 1234-3245-34234-2342, it just returns key: 1234.
Is there any way to specify the property type or a regular expression so the whole value is used?
Some more explanation:
"aggregations": {
"myagg": {
"terms": {
"field": "did",
"size": 50
}
}
}
When I try it on data whose values look like ABC-CDEF-DEFG, it is not able to aggregate on the whole value. It shows the key as only ABC:
"key" : "ABC", "doc_count" : 24069
It can't take the entire key like ABC-DEF-GHI-fhho.
Check your mapping; I expect you did not define anything, in which case the standard analyzer is used for strings. The standard analyzer breaks the value up at the "-", which is why you get the term you mentioned. Make the field not_analyzed and you should get better results.
When I use field.raw, that fixes the issue: https://github.com/elasticsearch/kibana/issues/364
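The .raw trick comes from a multi-field mapping that keeps an unanalyzed copy of the field alongside the analyzed one. A sketch for the did field from the question, assuming the pre-2.x string syntax:

```json
{
  "properties": {
    "did": {
      "type": "string",
      "fields": {
        "raw": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

Aggregating on "did.raw" instead of "did" then yields whole values such as ABC-CDEF-DEFG as single keys.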

Elasticsearch aggregation on object

How can I run an aggregation query on an object property but get all of its properties in the result? E.g. I want to get [{'doc_count': 1, 'key': {'id': 1, 'name': 'tag name'}}], but I get [{'doc_count': 1, 'key': '1'}] instead. Aggregation on the field 'tags' returns zero results.
Mapping:
{
"test": {
"properties" : {
"tags" : {
"type" : "object",
"properties": {
"id" : {"type": "string", "index": "not_analyzed"},
"name" : {"type": "string", "index": "not_analyzed", "enabled": false}
}
}
}
}
}
Aggregation query: (returns only IDs as expected, but how can I get ID & name pairs in results?)
"aggregations": {
"tags": {
"terms": {
"field": "tags.id",
"order": { "_count": "desc" }
}
}
}
EDIT:
Got ID & name by aggregating on "script": "_source.tags", but I am still looking for a faster solution.
You can use a script if you want, e.g.
"terms":{"script":"doc['tags.id'].value + '|' + doc['tags.name'].value"}
For each created bucket you will get a key with the values of the fields included in your script. To be honest, though, the purpose of aggregations is not to return full docs, but to do calculations on groups of documents (buckets) and return the results, e.g. sums and distinct values. What you are actually doing with your query is creating buckets based on the field tags.id.
Keep in mind that the key in the result will include both values separated by a '|', so you might have to parse it to extract the information you need.
It's also possible to nest aggregations: you could aggregate by id, then by name.
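That nesting might look like the following sketch (bucket names are made up, and it assumes tags.name is actually indexed, which the "enabled": false in the question's mapping would prevent):

```json
{
  "aggregations": {
    "by_id": {
      "terms": { "field": "tags.id" },
      "aggregations": {
        "by_name": {
          "terms": { "field": "tags.name" }
        }
      }
    }
  }
}
```

Each tags.id bucket then carries a sub-bucket holding the matching name, so no string concatenation or client-side splitting is needed.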
Additional information: the answer above (cpard's) works perfectly with nested objects. The weird results you got may come from the fact that you are using the object type rather than nested.
The difference between these types is that a nested object keeps the internal relation between the elements of an object. That is why "terms":{"script":"doc['tags.id'].value + '|' + doc['tags.name'].value"} makes sense. If you use the object type, Elasticsearch doesn't know which tags.name goes with which tags.id.
For more detail:
https://www.elastic.co/blog/managing-relations-inside-elasticsearch

Changing elasticsearch mapping

Take the simplest case of indexing the following document in elasticsearch
{
"name": "Mark",
"age": 28
}
With automatic mapping the mapping for this index would now look like
"properties" : {
"doc" : {
"properties" : {
"age" : { "type" : "long" },
"name" : { "type" : "string" }
}
}
}
But say I then wanted to allow the case where this document should be indexed
{
"name": "Bill",
"age": "seven"
}
If I try this the mapping does not update and elasticsearch throws an error since there is a conflict with the type of the age property.
Is there any way to do this so both docs could be automatically indexed and consequently queryable?
Mappings are defined per type, so what you could do is have two types in your index:
numeric
alphabetical
And split the documents according to the value in the age field. If you run a query, you can query both types.
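A sketch of that split (type names follow the answer; this relies on one index holding multiple types, which pre-6.x Elasticsearch allows):

```json
{
  "mappings": {
    "numeric": {
      "properties": { "age": { "type": "long" } }
    },
    "alphabetical": {
      "properties": { "age": { "type": "string" } }
    }
  }
}
```

You would index {"name": "Mark", "age": 28} into the numeric type and {"name": "Bill", "age": "seven"} into the alphabetical type, then search across both types in a single query.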
You can add new fields to a mapping, but you cannot change the mapping of an existing field. To do that, you need to drop the index, create a new mapping, and reindex the data.
For more info, refer to this link: reference
You can't change an existing mapping; you can only add new fields to it.
Otherwise, you have to delete the old mapping and create a new mapping for that particular index.
