Elasticsearch query with sorting by array of values

Is it possible in Elasticsearch to query and sort by an array of values? For example:
Give me all results, but results with "country_code" in ['de', 'au', 'es'] should be prioritized in the order they appear in the array.
Is that possible?

Elasticsearch does not really handle arrays as such; internally it is just the same field holding three different values: country_code = "de" AND country_code = "au" AND country_code = "es" all at the same time. You can, however, use script-based sorting and handle the array logic in Painless.
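As a rough sketch of that approach (the index name my_index is illustrative, and it assumes country_code is a single-valued keyword field that is present on every document), a script sort can rank documents by the position of their code in the preferred list, pushing unlisted codes to the end:

POST /my_index/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "_script": {
        "type": "number",
        "order": "asc",
        "script": {
          "lang": "painless",
          "source": "def preferred = ['de', 'au', 'es']; int idx = preferred.indexOf(doc['country_code'].value); return idx == -1 ? preferred.size() : idx;"
        }
      }
    }
  ]
}

Documents that might be missing country_code would need a doc['country_code'].size() > 0 check before reading .value.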

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns, and the columns have different types in the mapping: keyword, text, boolean, integer.
As records in the table might have identical values, I sort by an array of 2 attributes:
The first attribute is what the client wants to sort by. It can be a simple sort object or a sort query with a nested filter.
The second attribute is a default sort by id, needed to order the documents which have identical values for the column the customer wants to sort by.
I checked multiple topics/issues on GitHub and here on the Elastic forum to understand how to implement the search_after mechanism for backward paging, but it is not working for all the cases I need.
Please have a look at the image: imagine there is a limit = 3, the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names in the image are: A, B, C, D, E.
The ids are the numeric parts of the Doc labels.
When the customer wants to go back to the previous page (page #2 in my picture), I pass the following to Elasticsearch:
"sort": [
  { "name": "desc" },
  { "_id": "desc" }
],
"search_after": [null, Doc7._id]
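For concreteness, the complete request looks roughly like this (the index name and page size are illustrative, and <Doc7._id> stands for the actual id value of Doc7):

POST /documents/_search
{
  "size": 3,
  "sort": [
    { "name": "desc" },
    { "_id": "desc" }
  ],
  "search_after": [null, "<Doc7._id>"]
}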
As a result, I get only one document, which is Doc6: null in my image. That seems logical: I am asking Elasticsearch to search, descending, after [null, Doc7._id], and only one document matches, which is Doc6. But it is not what I need.
I can't come up with a solution that returns the data I need.
Could anyone help, please?

Aggregation on time interval

Let's say a document has
{ startDate, endDate }
fields among other fields.
Now we want to create a range aggregation based on these fields, such that a document appears in every bucket that overlaps with its start-end interval.
I found that aggregation is not yet supported on range fields => https://github.com/elastic/elasticsearch/issues/34644
As a workaround, I can use a script: since I know the size (time interval) of the buckets, I could generate an array of values between the start and end dates:
aggregation_field : [startDate + bucketSize, startDate + bucketSize * 2, ..., endDate]
But in some cases this array could be huge.
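To make that workaround concrete (the index name events is illustrative, and a daily bucket size is assumed), the document would be indexed with the expanded timestamps, and a date_histogram on the multi-valued field then counts it in every overlapping bucket:

PUT /events/_doc/1
{
  "startDate": "2020-01-01T00:00:00Z",
  "endDate": "2020-01-03T00:00:00Z",
  "aggregation_field": [
    "2020-01-01T00:00:00Z",
    "2020-01-02T00:00:00Z",
    "2020-01-03T00:00:00Z"
  ]
}

POST /events/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "aggregation_field",
        "calendar_interval": "day"
      }
    }
  }
}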
Are there other workarounds?
Thanks!

Elasticsearch filter by unique value

I would like to get only results that have a unique value for a field. How can I do it?
I can count the unique values with:
$params['body']['aggs']['test_count'] = array(
    "cardinality" => array(
        "field" => "id",
        "precision_threshold" => 00
    )
);
but the duplicate results still appear. How can I set a filter to get only the values that have a distinct id?
If all you need is the unique values themselves, you can use a terms aggregation to get them.
If you want the full document, the closest thing Elasticsearch has is the top hits aggregation. Refer specifically to the field collapsing example.
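As a minimal sketch of that field-collapsing pattern (the index name my_index is illustrative, and it assumes id is mapped as an aggregatable type such as keyword), a terms aggregation over id with a nested top_hits returns one full document per distinct id:

POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_ids": {
      "terms": { "field": "id", "size": 100 },
      "aggs": {
        "one_doc": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}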

MongoDB compound indexes vs Single Field Indexes in terms of space consumption

According to this post, compound indexes are bigger in size (I could not find much info in the docs, so if you could point me there I would be grateful).
Suppose I have to search for the whole address (we can assume I will always have all the fields available, both in the collection and in the query) through a collection of addresses like:
{
  name: String,
  street: String,
  postcode: String,
  city: String,
  country: String
}
My question is: how much bigger would a compound index be?
If a compound index is bigger than a single-field one, wouldn't it be better to add a hash of the concatenation of all values to each object, put a single index on the hash field, and search by that (although it does not sound like good practice)?
These accomplish different things. A compound index has an order, and that order has an effect. For instance, the index { 'country' : 1, 'city' : 1, 'postcode' : 1 } would allow you to search for all addresses in a specific city of a specific country. A hash can't do that; hashes only support exact matches.
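For example (the collection name addresses is illustrative), a query on a prefix of the compound key is served by the index:

> db.addresses.createIndex({ "country": 1, "city": 1, "postcode": 1 })
> // Served by the index: the query matches the { country, city } prefix of the compound key
> db.addresses.find({ "country": "US", "city": "Chicago" })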
I don't see how this is bad practice at all; it's just a very narrow use case. Remember that every slight difference in spelling, additional whitespace, etc. will result in a different hash value, and that you can't even answer a simple question like "how many addresses in country X do we store?". But if you don't need that, why not?
By the way, MongoDB has built-in support for this. If the address is embedded, using a hashed index on the entire subdocument will accomplish what you need:
MongoDB supports hashed indexes of any single field. The hashing function collapses embedded documents and computes the hash for the entire value,
e.g.:
> db.hash.insert( { "name": "john", "address": { "city": "Chicago", "state": "IL", "country": "US" } } )
WriteResult({ "nInserted" : 1 })
> db.hash.createIndex( { "address": "hashed" } )
...
> // This query uses the index and finds the document:
> db.hash.find( { "address": { "city": "Chicago", "state": "IL", "country": "US" } } )
> // This query won't find the document because of the missing "state", but is still fast (IXSCAN):
> db.hash.find( { "address": { "city": "Chicago", "country": "US" } } )

Sum of total tokens in array

I have a document as below:
{
  "array": [ "Aone", "Btwo", "Aone" ]
}
I need to compute the sum of the number of elements in array using an aggregation.
value_count gives me the unique tokens, but that is not what I am looking for.
First you need to make array a multi-field with a new sub-field called numOfTokens, declared with the token_count type.
You can find more about it here - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
This will create an additional field called array.numOfTokens on each document, holding the number of tokens for that field.
Next you can run a simple sum aggregation on that field using - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#search-aggregations-metrics-sum-aggregation
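A minimal sketch of the two steps, written in current mapping syntax rather than the older syntax of the linked docs (the index name my_index and the standard analyzer are assumptions):

PUT /my_index
{
  "mappings": {
    "properties": {
      "array": {
        "type": "text",
        "fields": {
          "numOfTokens": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "total_tokens": {
      "sum": { "field": "array.numOfTokens" }
    }
  }
}

For the example document, each of the three values produces one token, so array.numOfTokens holds [1, 1, 1] and the sum comes out as 3.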
