How to get distinct keys of a nested object in an elasticsearch document? - elasticsearch

I'd like to look across an index for a unique list of keys in a nested object.
So in the example below, I want the output
["alpha", "beta", "gamma", "sigma", "theta" ]
Most of the Google search results I found were about unique values rather than keys.
Example docs:
{
"foo": "bar",
"fooNested": {
"alpha": 1,
"beta": 4,
"gamma": 2
}
},
{
"foo": "HelloWorld",
"fooNested": {
"sigma": 9,
"theta": 1
}
}
Is this possible using the REST API?

You can use the mapping API to get all properties in the index and parse the response client side to list the properties under the nested object. Alternatively, you can store the field names as values in each document and query those.
Example:
"fooNested": {
"sigma": 9,
"theta": 1,
"keys":["sigma","theta"]
}
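The first suggestion (parsing the mapping client side) could look roughly like the sketch below. It assumes the typeless mapping response shape of recent Elasticsearch versions (index name, then "mappings", then "properties"); the index name my_index, the helper nested_keys, and the sample response are hypothetical, and older versions with mapping types would need one more level of lookup.

```python
def nested_keys(mapping_response: dict, index: str, field: str) -> list:
    """Return the sorted sub-field names mapped under `field`.

    Assumes a typeless mapping response such as the one returned by
    GET <index>/_mapping on recent Elasticsearch versions.
    """
    properties = mapping_response[index]["mappings"]["properties"]
    sub_properties = properties.get(field, {}).get("properties", {})
    return sorted(sub_properties)


# Hypothetical mapping response for the two example documents.
sample_mapping = {
    "my_index": {
        "mappings": {
            "properties": {
                "foo": {"type": "keyword"},
                "fooNested": {
                    "properties": {
                        "alpha": {"type": "long"},
                        "beta": {"type": "long"},
                        "gamma": {"type": "long"},
                        "sigma": {"type": "long"},
                        "theta": {"type": "long"},
                    }
                },
            }
        }
    }
}

print(nested_keys(sample_mapping, "my_index", "fooNested"))
# ['alpha', 'beta', 'gamma', 'sigma', 'theta']
```

Note that the mapping lists every key ever indexed, so it cannot tell you which keys appear in a filtered subset of documents; the stored "keys" array covers that case.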

Related

how to use Elastic Search nested queries by object key instead of object property

Following the Elastic Search example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
nested_objects: [ <== array
{ name: "x", value: 123 },
{ name: "y", value: 456 } <== "name" property searchable
]
}
But what if I want nested objects to be arranged in key-value structure that gets updated with new objects, and I want to search by the KEY? example:
{
nested_objects: { <== key-value, not array
"x": { value: 123 },
"y": { value: 456 } <== how can I search by "x" and "y" keys?
"..." <=== more arbitrary keys are added now and then
}
}
Thank you!
You can try to do this using the query_string query, like this:
GET my_index/_search
{
"query": {
"query_string": {
"query":"nested_objects.\\*.value:123"
}
}
}
It will try to match the value field of any sub-field of nested_objects.
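When the request body is built in code rather than typed by hand, the escaping is easy to get wrong. The sketch below builds the same body in Python; the helper name wildcard_subfield_query is hypothetical, and the only point it illustrates is that the single backslash in the query string becomes the doubled backslash seen in the JSON above once the body is serialized.

```python
import json


def wildcard_subfield_query(parent: str, leaf: str, value) -> dict:
    """Build a query_string body matching `leaf` under any sub-field of `parent`."""
    # In Python source "\\*" is one backslash followed by "*".
    query = f"{parent}.\\*.{leaf}:{value}"
    return {"query": {"query_string": {"query": query}}}


body = wildcard_subfield_query("nested_objects", "value", 123)

# json.dumps escapes the backslash again, giving the JSON form from the answer.
print(json.dumps(body))
# {"query": {"query_string": {"query": "nested_objects.\\*.value:123"}}}
```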
OK, so my final solution after some ES insights is as follows:
1. Because my object keys "x", "y", ... are arbitrary, they make a mess of my index mapping, so planning this kind of structure is generally not good ES practice. For the sake of the mappings, I resorted to the structure described in the "Weighted tags" article:
{ "name":"x", "value":123 },
{ "name":"y", "value":456 },
...
This means that updating the value of the sub-object named "x" is harder (and slower): I first need to fetch the entire top-level object, traverse the sub-objects until I find the one named "x", update its value, and then write the entire sub-object array back into ES.
This approach also causes concurrency issues when multiple processes update the same index. ES has optimistic locking that I can use to retry when needed, or I can queue updates and handle them serially.
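The traverse-and-update step described above can be sketched as a small pure function. This is only an illustration of the client-side part: the helper name set_named_value is hypothetical, appending a missing name is an assumption about the desired behavior, and fetching the document and writing the array back (with whatever retry or queueing is chosen) are left out.

```python
def set_named_value(items: list, name: str, value) -> list:
    """Return a copy of the name/value array with `name` set to `value`.

    If no entry with that name exists, a new one is appended
    (an assumption about the desired behavior).
    """
    updated = []
    found = False
    for item in items:
        if item.get("name") == name:
            # Copy the entry so the fetched document is not mutated in place.
            updated.append({**item, "value": value})
            found = True
        else:
            updated.append(item)
    if not found:
        updated.append({"name": name, "value": value})
    return updated


sub_objects = [{"name": "x", "value": 123}, {"name": "y", "value": 456}]
print(set_named_value(sub_objects, "x", 999))
# [{'name': 'x', 'value': 999}, {'name': 'y', 'value': 456}]
```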

Count Unique Objects

My index looks like this:
"_source": {
"ProductName": "Random Product Name",
"Views": {
"Washington": [
{ "4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z" },
{ "4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z" },
{ "4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z" }
]
}
}
I am trying to count the number of unique objects in Views.Washington.
In this case, the result would be 2, since two objects (the first and third in the array) have the same key name.
My first thought was to use aggregations, but I am not sure how to use them with nested objects like these.
Can this be done with normal aggregations?
Will I need to use a script?
Yes, this can be done with aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
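For checking the expected result, here is a client-side sketch that counts the distinct key names in a fetched document. It is not the aggregation the answer links to, just a plain Python pass over the _source; the helper name count_unique_keys is hypothetical.

```python
def count_unique_keys(entries: list) -> int:
    """Count distinct key names across a list of single-key objects."""
    return len({key for entry in entries for key in entry})


source = {
    "ProductName": "Random Product Name",
    "Views": {
        "Washington": [
            {"4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z"},
            {"4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z"},
            {"4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z"},
        ]
    },
}

print(count_unique_keys(source["Views"]["Washington"]))
# 2
```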

Custom unicode sorting order for PouchDB/CouchDB Index (Mango Query)

I am using PouchDB (with a Cloudant remote database) to have a local database in a dictionary web app.
I need to have an index with a custom Pashto alphabet order (using Arabic unicode letters).
The localdb.find queries with $gte (alphabetically searching with partial words) do not work well because of the irregular Unicode characters in the Pashto alphabet.
Is it possible to create a custom sort, based on the Pashto alphabet, for an index?
See Mango Query Language
In this reference it is mentioned that:
The most important feature of a view result is that it is sorted by key.
Assume you have a database consisting of docs with a unicodeString field inside each doc. So a sample doc would look like below:
{
"_id":"2018-01-30-18-04-11",
"_rev":"AE19EBC7654",
"title":"Hello elephant",
"unicodeString":"שלום פיל"
}
Now you can have a CouchDB view with a map function like this:
function(doc) {
emit(doc.unicodeString, doc.title); // doc.unicodeString is key
// doc.title is value
}
The above view sorts all the docs in the database by key, which is doc.unicodeString. Therefore, if you use this view, all of your docs are sorted by the Unicode string they contain.
If you have 3 docs in the database and query the above view, you receive a response like this, in which the rows array is sorted by the key of each row:
{
"total_rows": 3,
"offset": 0,
"rows": [
{
"key": "ארץ",
"id": "2017-09-01-09-05-11",
"value": "Earth"
},
{
"key": "בין",
"id": "2015-01-19-11-30-28",
"value": "between"
},
{
"key": "שלום פיל",
"id": "2018-01-30-18-04-11",
"value": "Hello elephant"
}
]
}
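The answer relies on the default key ordering and does not show a custom alphabet. One possible way to get a custom order, sketched here as an assumption rather than something the answer confirms, is to derive a sort key that replaces each character with its position in the desired alphabet. The alphabet below is a toy ordering, not the real Pashto alphabet, and the helper name custom_sort_key is hypothetical; porting the idea to a JavaScript map function is untested.

```python
# Hypothetical custom alphabet: a toy ordering, NOT the real Pashto alphabet.
CUSTOM_ALPHABET = "cab"


def custom_sort_key(word: str, alphabet: str = CUSTOM_ALPHABET) -> list:
    """Map each character to its rank in `alphabet`.

    Characters outside the alphabet sort after all known letters,
    ordered among themselves by code point.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return [rank.get(ch, len(alphabet) + ord(ch)) for ch in word]


words = ["abc", "cab", "bca"]
print(sorted(words, key=custom_sort_key))
# ['cab', 'abc', 'bca']
```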

Hiding _source fields based on other fields

Let's say I have two documents in an Elasticsearch index:
[
{
"foo": 1,
"bar": 2,
"visible_fields": ["foo"]
},
{
"foo": 1,
"bar": 2,
"visible_fields": ["bar"]
}
]
I want only the fields listed in visible_fields for each document to be returned in a query response. How would I do that?
I'm thinking a custom plugin or script could solve it but I don't know how or where to start. Looking through the source code for the existing plugins I can't find anything that I can use to access and modify the _source fields.
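As a point of comparison, the filtering is straightforward once the hits have reached the client. The sketch below is not a server-side solution and does not keep the hidden fields from leaving Elasticsearch; it only shows the intended result. The helper name visible_source is hypothetical, and dropping visible_fields itself from the output is an assumption.

```python
def visible_source(source: dict) -> dict:
    """Keep only the fields named in the document's own visible_fields list."""
    allowed = source.get("visible_fields", [])
    return {key: value for key, value in source.items() if key in allowed}


docs = [
    {"foo": 1, "bar": 2, "visible_fields": ["foo"]},
    {"foo": 1, "bar": 2, "visible_fields": ["bar"]},
]

print([visible_source(doc) for doc in docs])
# [{'foo': 1}, {'bar': 2}]
```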

Index main-object, sub-objects, and do a search on sub-objects (that returns sub-objects)

I have an object like this (simplified here). Each strain has many chromosomes, each of which has many loci, which have many features, which have many products, and so on. Here I show just one of each.
The structure in json is:
{
"name": "my strain",
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I want to add this object to Elasticsearch. For the moment I've added the objects separately: locus, features, and products. That works for searching (I want to type a keyword and look in the names of locus, features, and products), but I need to duplicate data such as public and authorized_users in each sub-object.
Can I index the whole object in Elasticsearch and search only at the locus, features, and products levels? And can I get the matches back individually (without returning the Strain object)?
Yes, you can search at any level (i.e., with a query on "chromosomes.locus.name").
But as you have arrays at each level, you will have to use nested objects (and the nested query) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
For your last question: no, you cannot get sub-objects individually; Elasticsearch returns the whole JSON source object.
If you want only data from sub-objects, you will have to use nested aggregations.
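A nested query on the locus name could be assembled roughly as follows. This sketch assumes that both chromosomes and chromosomes.locus are mapped with the nested type and wraps one nested clause per level; the helper name nested_match is hypothetical, and a match query on the name field is just one choice of inner query.

```python
import json


def nested_match(path_levels: list, field: str, text: str) -> dict:
    """Wrap a match query in one nested clause per nested path.

    `path_levels` lists the nested paths from outermost to innermost.
    """
    query = {"match": {field: text}}
    # Wrap from the innermost nested path outward.
    for path in reversed(path_levels):
        query = {"nested": {"path": path, "query": query}}
    return {"query": query}


body = nested_match(
    ["chromosomes", "chromosomes.locus"],
    "chromosomes.locus.name",
    "locus1",
)
print(json.dumps(body, indent=2))
```

The same helper extends to features and products by adding the deeper paths to the list.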
