Hiding _source fields based on other fields - elasticsearch

Let's say I have two documents in an Elasticsearch index:
[
{
"foo": 1,
"bar": 2,
"visible_fields": ["foo"]
},
{
"foo": 1,
"bar": 2,
"visible_fields": ["bar"]
}
]
I want only the fields listed in visible_fields for each document to be returned in a query response. How would I do that?
I'm thinking a custom plugin or script could solve it, but I don't know how or where to start. Looking through the source code of the existing plugins, I can't find anything I could use to access and modify the _source fields.
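A minimal plugin-free sketch, assuming it is acceptable to trim each hit on the client after the search returns. The full documents still leave the cluster, so this only hides fields from whatever consumes the helper; it is not access control, which would still need the server-side plugin or script route. The helper names `visible_source` and `filter_hits` are hypothetical; the response shape (`hits.hits[]._source`) is the standard search response body, and only flat top-level keys are handled.

```python
from typing import Any, Dict, List


def visible_source(source: Dict[str, Any], control_field: str = "visible_fields") -> Dict[str, Any]:
    """Keep only the top-level _source keys named in source[control_field].

    Dotted paths (e.g. "a.b") are not handled; this is a flat-key sketch.
    The control field itself is dropped unless it lists itself.
    """
    allowed = set(source.get(control_field, []))
    return {key: value for key, value in source.items() if key in allowed}


def filter_hits(search_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply visible_source to every hit of a parsed search response body."""
    return [visible_source(hit["_source"]) for hit in search_response["hits"]["hits"]]


if __name__ == "__main__":
    response = {
        "hits": {
            "hits": [
                {"_source": {"foo": 1, "bar": 2, "visible_fields": ["foo"]}},
                {"_source": {"foo": 1, "bar": 2, "visible_fields": ["bar"]}},
            ]
        }
    }
    print(filter_hits(response))  # [{'foo': 1}, {'bar': 2}]
```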

Related

Elasticsearch cannot index array of integers

Given mapping:
{
"mappings": {
"properties": {
"user_ids": {"type": "integer"}
}
}
}
Observations:
Indexing {"user_ids": 1}: the data shows up correctly.
Indexing {"user_ids": "x"}: an error is thrown, failed to parse field [user_ids] of type [integer] in document, indicating that the mapping is working correctly.
However, indexing {"user_ids": [1]} just clears the field, without throwing an error.
Question:
Why does this happen and how can I index arrays of integers?
Extra:
Removing all settings config doesn't change anything.
Using the keyword type doesn't change anything either.
If relevant, I'm using the latest opensearch-py.
It's not clear what you mean by "clears the field". Indexing an array of integers works perfectly fine, as the example below shows; I hope you are sending the same requests.
PUT <index-name>/_doc/1
{
"user_ids": [
1,2,3
]
}
And the get API returns all the integers in the array (response trimmed to _source):
GET <index-name>/_doc/1
{
"_source": {
"user_ids": [
1,
2,
3
]
}
}
Turns out it was my own error: I was using elasticsearch-head to quickly check values, and it doesn't support displaying array values :/ Once I double-checked with queries, the values came back correct.
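Elasticsearch has no dedicated array type: any field, including an `integer` field, can hold one value or an array of values, and `_source` gives back exactly what was sent. So when checking values through the API instead of a UI, client code may see either `1` or `[1, 2, 3]` for `user_ids`. A small sketch of a normalizing reader; the helper names are hypothetical and the input is the parsed body of the GET request above.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Normalize a _source value: missing/null -> [], scalar -> [scalar], list unchanged."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def user_ids_from_get(get_response: Dict[str, Any]) -> List[Any]:
    """Read user_ids from a parsed GET <index-name>/_doc/<id> response body."""
    return as_list(get_response.get("_source", {}).get("user_ids"))


if __name__ == "__main__":
    print(user_ids_from_get({"_source": {"user_ids": [1, 2, 3]}}))  # [1, 2, 3]
    print(user_ids_from_get({"_source": {"user_ids": 1}}))  # [1]
```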

How to get distinct keys of a nested object in an elasticsearch document?

I'd like to look across an index for a unique list of keys in a nested object.
So in the example below, I want the output
["alpha", "beta", "gamma", "sigma", "theta" ]
Most of the Google search results were about unique values rather than keys.
Example docs:
{
"foo": "bar",
"fooNested": {
"alpha": 1,
"beta": 4,
"gamma": 2
}
},
{
"foo": "HelloWorld",
"fooNested": {
"sigma": 9,
"theta": 1
}
}
Is this possible using the rest api?
You can use the mapping API to get all properties in the index and parse the response client-side to list the properties under the nested object, or you can store the keys as values in a field of their own and query that field.
Example:
"fooNested": {
"sigma": 9,
"theta": 1,
"keys":["sigma","theta"]
}
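A sketch of the first option, parsing the mapping client-side. It assumes the typeless mapping layout of Elasticsearch 7.x and later (`{index: {"mappings": {"properties": ...}}}`); older versions put a type name between `mappings` and `properties`. The index name `my-index` and the helper name are hypothetical. The mapping lists every key ever indexed, not only keys of documents that still exist or that match a query.

```python
from typing import Any, Dict, List


def object_keys_from_mapping(mapping_response: Dict[str, Any], field: str) -> List[str]:
    """Collect the sub-field names of `field` across every index in a parsed
    GET <index>/_mapping response (typeless 7.x-style layout assumed)."""
    keys = set()
    for index_body in mapping_response.values():
        properties = index_body.get("mappings", {}).get("properties", {})
        keys.update(properties.get(field, {}).get("properties", {}).keys())
    return sorted(keys)


if __name__ == "__main__":
    response = {
        "my-index": {
            "mappings": {
                "properties": {
                    "foo": {"type": "text"},
                    "fooNested": {
                        "properties": {
                            "alpha": {"type": "long"},
                            "beta": {"type": "long"},
                            "gamma": {"type": "long"},
                            "sigma": {"type": "long"},
                            "theta": {"type": "long"},
                        }
                    },
                }
            }
        }
    }
    print(object_keys_from_mapping(response, "fooNested"))
    # ['alpha', 'beta', 'gamma', 'sigma', 'theta']
```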

How to use Elasticsearch nested queries by object key instead of object property

Following the Elasticsearch example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
nested_objects: [ <== array
{ name: "x", value: 123 },
{ name: "y", value: 456 } <== "name" property searchable
]
}
But what if I want nested objects to be arranged in a key-value structure that gets updated with new objects, and I want to search by the KEY? Example:
{
nested_objects: { <== key-value, not array
"x": { value: 123 },
"y": { value: 456 }, <== how can I search by "x" and "y" keys?
"..." <=== more arbitrary keys are added now and then
}
}
Thank you!
You can try to do this using the query_string query, like this:
GET my_index/_search
{
"query": {
"query_string": {
"query":"nested_objects.\\*.value:123"
}
}
}
It will try to match the value field of any sub-field of nested_objects.
OK, so my final solution, after some ES insights, is as follows:
1. Because my object keys "x", "y", ... are arbitrary, every new key adds a new field to my index mapping, which makes a mess of it. Generally speaking, this kind of structure is not good ES practice. So, for the sake of the mappings, I resorted to the structure described in the "Weighted tags" article:
[
{ "name": "x", "value": 123 },
{ "name": "y", "value": 456 },
...
]
This means that when it's time to update the value of the sub-object named "x", I have a harder (and slower) time finding it: I first need to fetch the entire top-level object, traverse the sub-objects until I find the one named "x", and update its value. Then I write the entire sub-object array back into ES.
This approach also causes concurrency issues when multiple processes update the same index. ES has optimistic locking that I can use to retry when needed; alternatively, I can queue updates and handle them serially.
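The read-modify-write loop described above can be sketched as follows, using the name/value array from the final solution. `get_doc` and `put_doc` are hypothetical callables: in a real client, `get_doc` would read `_source`, `_seq_no` and `_primary_term` from a GET of the document, and `put_doc` would index it with the `if_seq_no` and `if_primary_term` parameters so that a concurrent write fails with a 409 conflict instead of being silently overwritten. `VersionConflict` is a hypothetical stand-in for whatever exception the client raises on that 409.

```python
from typing import Any, Callable, Dict, List, Tuple


class VersionConflict(Exception):
    """Hypothetical stand-in for the client's 409 version-conflict error."""


def upsert_entry(entries: List[Dict[str, Any]], name: str, value: Any) -> List[Dict[str, Any]]:
    """Return a copy of `entries` with the entry called `name` set to `value`.

    A new {"name": ..., "value": ...} entry is appended if none has that name.
    """
    updated = [dict(entry) for entry in entries]  # copy so the caller's list is untouched
    for entry in updated:
        if entry.get("name") == name:
            entry["value"] = value
            return updated
    updated.append({"name": name, "value": value})
    return updated


def update_with_retry(
    get_doc: Callable[[], Tuple[Dict[str, Any], int, int]],
    put_doc: Callable[[Dict[str, Any], int, int], None],
    field: str,
    name: str,
    value: Any,
    max_retries: int = 5,
) -> Dict[str, Any]:
    """Read-modify-write with optimistic concurrency: re-read and retry on conflict."""
    for _ in range(max_retries):
        source, seq_no, primary_term = get_doc()
        new_source = dict(source)
        new_source[field] = upsert_entry(source.get(field, []), name, value)
        try:
            # put_doc is expected to send seq_no/primary_term as if_seq_no/if_primary_term.
            put_doc(new_source, seq_no, primary_term)
            return new_source
        except VersionConflict:
            continue  # another process wrote first; start over from a fresh read
    raise RuntimeError(f"gave up after {max_retries} conflicting attempts")
```

Queueing updates and applying them serially, the other option mentioned, removes the need for the retry loop but still needs something like `upsert_entry`.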

Index main-object, sub-objects, and do a search on sub-objects (that return sub-objects)

I have an object like this (simplified here). Each strain has many chromosomes, which have many loci, which have many features, which have many products, and so on. Here I just show one of each.
The structure in JSON is:
{
"name": "my strain",
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I want to add this object to Elasticsearch. For the moment I've added the objects separately: locus, features and products. Searching works (I want to type a keyword and look in the names of loci, features, and products), but I need to duplicate data like public and authorized_users in each sub-object.
Can I index the whole object in Elasticsearch and just search at the locus, features and products levels? And get each of them individually (without returning the strain object)?
Yes, you can search at any level (i.e., with a query on a field like "chromosomes.locus.name").
But since you have arrays at each level, you will have to use nested objects (and nested queries) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
For your last question: no, you cannot get sub-objects individually; Elasticsearch returns the whole JSON source object.
If you only want data from the sub-objects, you will have to use nested aggregations.
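A sketch of the nested mapping and query this answer describes, as Python dicts to send as request bodies. It assumes a typeless (7.x-style) mapping, a text `name` at every level, and the nesting order from the question's JSON; all helper names are hypothetical. A multi-level nested query wraps one `nested` clause per level, outermost path first. Because `public` and `authorized_users` stay on the root document, a filter on them applies without copying them into every sub-object, and the hits are still whole strain documents.

```python
import json
from typing import Any, Dict, List

# Nesting order taken from the question's JSON (strain > chromosomes > locus > features > products).
LEVELS = ["chromosomes", "locus", "features", "products"]


def level_mapping(levels: List[str]) -> Dict[str, Any]:
    """Map every level as `nested` with a text `name`, built from the innermost level out."""
    mapping: Dict[str, Any] = {}
    for level in reversed(levels):
        props: Dict[str, Any] = {"name": {"type": "text"}}
        props.update(mapping)  # attach the level below (nothing for the innermost one)
        mapping = {level: {"type": "nested", "properties": props}}
    return mapping


STRAIN_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "public": {"type": "boolean"},
            "authorized_users": {"type": "integer"},
            **level_mapping(LEVELS),
        }
    }
}


def nested_match(levels: List[str], text: str) -> Dict[str, Any]:
    """Match `text` on the deepest level's `name`, one `nested` clause per level."""
    paths = [".".join(levels[: i + 1]) for i in range(len(levels))]
    query: Dict[str, Any] = {"match": {paths[-1] + ".name": text}}
    for path in reversed(paths):  # wrap innermost path first, so the outermost ends up on top
        query = {"nested": {"path": path, "query": query}}
    return query


def keyword_search(text: str, user_id: int) -> Dict[str, Any]:
    """Search locus, feature and product names; keep public or authorized strains."""
    return {
        "query": {
            "bool": {
                "should": [nested_match(LEVELS[:depth], text) for depth in (2, 3, 4)],
                "minimum_should_match": 1,
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"term": {"public": True}},
                                {"term": {"authorized_users": user_id}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(keyword_search("product1", 23), indent=2))
```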

Facet to get all keys from an object in Elasticsearch

Let's say I have the following docs:
{
"title": "Some Title",
"options": {
"key5": 1,
"key3": 0,
"key1": 1
}
},
{
"title": "Some Title",
"options": {
"key2": 0,
"key3": 0,
"key5": 1
}
}
I want to get all the keys from the options object using a facet.
If options were a simple array of string keys, I would simply use a facet like this:
"facets" : {
"options" : {
"terms" : {
"field" : "options"
}
}
}
But it doesn't work in my case.
So if a query returns those two docs, I should get these keys: ["key5","key3","key1","key2"]
What kind of facet do I actually need?
You can't do that using a facet.
You have two options:
1. Keep your current document structure and get the list of keys from the type mapping (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html). This returns the schema of your type, which holds every field encountered.
2. Change your structure: store the key as a field too, so that options becomes an array of objects like:
"options" :
[
{ "key" : "key1", "value" : 1},
{ "key" : "key2", "value" : 0}
]
You will probably want to keep the context of the key-value pairs when searching or faceting, so configure it as a nested type (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html).
Then you can facet on the "options.key" field to get a list of top keys.
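A sketch of the second option. Facets were removed in Elasticsearch 2.0; on versions that have aggregations, the counterpart of faceting on `options.key` is a `nested` aggregation wrapping a `terms` aggregation. The mapping assumes a typeless (7.x-style) layout with `options.key` mapped as `keyword`; the helper names and the bucket `size` of 100 are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional

# Typeless (7.x-style) mapping for the restructured field; keyword keys aggregate as-is.
OPTIONS_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "options": {
                "type": "nested",
                "properties": {
                    "key": {"type": "keyword"},
                    "value": {"type": "integer"},
                },
            },
        }
    }
}


def to_key_value_pairs(options: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"key5": 1, ...} into [{"key": "key5", "value": 1}, ...] before indexing."""
    return [{"key": key, "value": value} for key, value in options.items()]


def distinct_keys_body(query: Optional[Dict[str, Any]] = None, size: int = 100) -> Dict[str, Any]:
    """Search body: a nested aggregation into `options`, then terms on `options.key`."""
    body: Dict[str, Any] = {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "options": {
                "nested": {"path": "options"},
                "aggs": {"keys": {"terms": {"field": "options.key", "size": size}}},
            }
        },
    }
    if query is not None:
        body["query"] = query  # restrict the keys to documents matching this query
    return body


def keys_from_response(response: Dict[str, Any]) -> List[str]:
    """Extract the bucket keys from the parsed search response."""
    buckets = response["aggregations"]["options"]["keys"]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    print(to_key_value_pairs({"key5": 1, "key3": 0, "key1": 1}))
```

Buckets come back ordered by document count, not in the order the keys first appeared.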
If I understand you correctly, you want to make a terms facet for each and every field in your nested options object, a kind of "wildcard facet"?
I think there is no functionality in the facet API that allows this kind of operation. If I am not mistaken, fields used for faceting have to be mapped, so it might be possible to extract the fields in a separate query by inspecting the index mappings.
