Performance with nested data in a script field - elasticsearch

Is there a more performant way to run a calculation on nested data in a script field, or a better way to organize the data? In the code below, the data contains values for 50 states and/or other regions. Each user is tied to an area, so the script below checks whether the average_value in their area meets or exceeds a certain threshold and returns a true/false value for each matching document.
Mapping
{
"mydata" : {
"properties" : {
...some fields,
"related" : {
"type" : "nested",
"properties" : {
"average_value" : {
"type" : "integer"
},
"state" : {
"type" : "string"
}
}
}
}
}
}
Script
"script_fields" : {
"inBudget" : {
"script" : {
"inline" : "_source.related.find { it.state == default_area && it.average_value >= min_amount } != null",
"params" : {
"min_amount" : 100,
"default_area" : "CA"
}
}
}
}
I have a working solution using the above method, but it slows my query down and I am curious whether there is a better one. I have been toying with the idea of using an inner object with a key, like related_CA, and keeping each state's data in a separate object. However, for flexibility I would rather not have to pre-define each region in a mapping, as I may not have them all ahead of time. I feel like I might be missing a simpler or better way, and I am open to reorganizing the data/mapping, changing the script, or both.
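To make the two layouts in the question concrete, here is a minimal sketch in plain JavaScript (not an Elasticsearch script). It compares the array scan that the script field performs with the keyed-object lookup the question is considering. The sample values, the function names, and the exact key layout (related.CA rather than related_CA) are assumptions for illustration, and the sketch says nothing about how Elasticsearch itself would perform with either mapping.

```javascript
// Hypothetical document shaped like the nested mapping above (values are made up).
const docWithArray = {
  related: [
    { state: "NY", average_value: 80 },
    { state: "CA", average_value: 120 },
  ],
};

// Same check as the script field: is there an entry for the area
// whose average_value meets the threshold?
function inBudgetFromArray(doc, defaultArea, minAmount) {
  return doc.related.some(
    (r) => r.state === defaultArea && r.average_value >= minAmount
  );
}

// Alternative layout considered in the question: one key per region,
// so the check becomes a direct lookup instead of a scan.
const docWithKeys = {
  related: { NY: { average_value: 80 }, CA: { average_value: 120 } },
};

function inBudgetFromKeys(doc, defaultArea, minAmount) {
  const entry = doc.related[defaultArea];
  return entry !== undefined && entry.average_value >= minAmount;
}

console.log(inBudgetFromArray(docWithArray, "CA", 100)); // true
console.log(inBudgetFromKeys(docWithKeys, "CA", 100)); // true
console.log(inBudgetFromKeys(docWithKeys, "TX", 100)); // false
```

The keyed layout avoids the per-document scan, at the cost of one mapped field per region, which is the flexibility concern raised above.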

Related

Elasticsearch: Sort by calculated date value

Is it possible to compare a date field to the current time and then sort by the result of that comparison (something like CASE expressions in an SQL ORDER BY)?
The goal is to move documents whose datetime field is greater than the current time to the top of the list, while all documents whose datetime field is less than the current time are equal in priority and should not be sorted by this field.
First, storing the time as a numeric timestamp (for example, microtime) makes this easier to work with. Elasticsearch also has a script sort feature, and you can use if statements in these scripts.
{
    "query" : {
        ....
    },
    "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "inline": "if (doc['time'].value > current_time) return doc['time'].value; else return current_time",
                "params" : {
                    "current_time" : 1476354035
                }
            },
            "order" : "desc"
        }
    }
}
Pass the current time as the current_time parameter each time you run the query.
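As a rough check of what this sort key does, the script's logic can be mirrored in plain JavaScript and applied to a few made-up documents. The function name sortKey and the sample timestamps are assumptions. With this key, descending order places future documents first, while the past documents all share the key current_time and so tie with each other. Ascending order would instead place the tied past documents first.

```javascript
// Mirrors the script: future timestamps keep their own value,
// everything else collapses to currentTime.
function sortKey(docTime, currentTime) {
  return docTime > currentTime ? docTime : currentTime;
}

const currentTime = 1476354035; // same sample value as the params above
const sampleDocs = [
  { id: "past-1", time: 1476000000 },
  { id: "future-far", time: 1477000000 },
  { id: "past-2", time: 1475000000 },
  { id: "future-near", time: 1476400000 },
];

// Descending order puts future documents first; past documents tie.
const ranked = sampleDocs
  .map((d) => ({ id: d.id, key: sortKey(d.time, currentTime) }))
  .sort((a, b) => b.key - a.key);

console.log(ranked.map((d) => d.id));
```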

Elasticsearch: filter by document count in a nested document

I have this schema in Elasticsearch:
[
'ID' : '1233',
Geomtries:[{
'doc1' : 'F1',
'doc2' : 'F2'
},
(optional for some of the documents)
{
'doc2' : 'F1',
'doc3' : 'F2'
}]
]
Geomtries is a nested element.
I want to get all documents that have exactly one object inside Geomtries.
What I have tried so far:
"script" : {"script" : "if (Geomtries.size < 2) return true"}
But I get an exception: no such property GEOMTRIES
If the field is mapped as type nested, the typical doc[fieldkey].values.size() approach does not seem to work. I found the following script to work:
{
"from" : 0,
"size" : <SIZE>,
"query" : {
"filtered" : {
"filter" : {
"script" : {
"script" : "_source.containsKey('Geomtries') && _source['Geomtries'].size() == 1"
}
}
}
}
}
NB: You must use _source instead of doc.
The problem is in the way you access fields in your script. Use:
doc['Geometry'].size()
or
_source.Geometry.size()
By the way, for performance reasons I would denormalize and add a GeometryNumber field. You can use the transform mapping to compute the size at index time.
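The denormalization idea can also be sketched on the client side. This is a different route from the transform mapping mentioned above: the count is computed in application code before the document is sent for indexing. The function name withGeometryNumber is hypothetical. GeometryNumber and Geomtries are the names used in this thread.

```javascript
// Adds a count field before the document is sent for indexing.
// GeometryNumber is the field name suggested above; Geomtries is the
// array field from the question.
function withGeometryNumber(doc) {
  const geometries = Array.isArray(doc.Geomtries) ? doc.Geomtries : [];
  return { ...doc, GeometryNumber: geometries.length };
}

const prepared = withGeometryNumber({
  ID: "1233",
  Geomtries: [{ doc1: "F1", doc2: "F2" }],
});
console.log(prepared.GeometryNumber); // 1

// A plain term filter on the precomputed field could then replace the script.
const oneGeometryFilter = { term: { GeometryNumber: 1 } };
console.log(JSON.stringify(oneGeometryFilter));
```

Filtering on a stored number avoids loading _source for every candidate document, which is the main cost of the script approach.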

Elasticsearch document aliases

I have multiple mappings which come from the same datasource but have small differences, like the example below.
{
"type_A" : {
"properties" : {
"name" : {
"type" : "string"
},
"meta_A" : {
"type" : "string"
}
}
}
}
{
"type_B" : {
"properties" : {
"name" : {
"type" : "string"
},
"meta_B" : {
"type" : "string"
}
}
}
}
What I want to be able to do is:
Directly query specific fields (like meta_A)
Directly query all documents from the data source
Query all documents from a specific mapping
What I was looking into is the type filter, so preferably I could write a query like this:
{
"query": {
"filtered" : {
"filter" : {
"type" : { "value" : "unified_type" }
}
}
// other query clauses
}
}
So instead of listing "type_A" and "type_B" in an or clause in the type filter, I would like to use this "unified_type", without giving up the ability to query "type_A" directly.
How could I achieve this?
I don't think that's possible. However, you could use the copy_to functionality: your fields stay as they are now, and their values are also copied into a unified field.
From the documentation: "The copy_to parameter allows you to create custom _all fields. In other words, the values of multiple fields can be copied into a group field, which can then be queried as a single field. For instance, the first_name and last_name fields can be copied to the full_name field."
So you would copy both "meta_A" and "meta_B" into a "unified_meta" field and query that one.
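A minimal sketch of what such mappings could look like, built as JavaScript objects so they can be serialized and sent to Elasticsearch. The helper buildTypeMapping and the sample query value are hypothetical, and the explicit unified_meta field definition is an assumption. The "string" type simply follows the mappings in the question.

```javascript
// Builds a mapping body for one type; metaField is "meta_A" or "meta_B".
function buildTypeMapping(typeName, metaField) {
  return {
    [typeName]: {
      properties: {
        name: { type: "string" },
        // Values of the type-specific field are also copied into unified_meta.
        [metaField]: { type: "string", copy_to: "unified_meta" },
        unified_meta: { type: "string" },
      },
    },
  };
}

const mappingA = buildTypeMapping("type_A", "meta_A");
const mappingB = buildTypeMapping("type_B", "meta_B");
console.log(JSON.stringify(mappingA, null, 2));
console.log(JSON.stringify(mappingB, null, 2));

// Query on the shared field, which can match documents of either type.
const unifiedQuery = { query: { match: { unified_meta: "some value" } } };
console.log(JSON.stringify(unifiedQuery));
```

Queries on meta_A or meta_B keep working as before, since copy_to leaves the original fields in place.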

elasticsearch query to find documents that don't exist

Is there a way in Elasticsearch, through filters, queries, aggregations, etc., to search for a list of document IDs and get back which IDs did not match?
With a small list it is easy enough to compare the results against the requested IDs, but I'm dealing with lists of tens of thousands of IDs and doing that comparison is not going to be performant.
Do you mean the not filter? From https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-not-filter.html
"filtered" : {
"query" : {
"term" : { "name.first" : "shay" }
},
"filter" : {
"not" : {
"range" : {
"postDate" : {
"from" : "2010-03-01",
"to" : "2010-04-01"
}
}
}
}
}
Take a look at the guide at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
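For reference, the client-side comparison the question mentions can be done in a single pass with a Set rather than a nested loop. The sketch below assumes the IDs are fetched with an ids query and that the hits' _id values are collected into an array. The function names and the sample IDs are made up, and whether this is fast enough for tens of thousands of IDs is something to measure rather than assume.

```javascript
// Returns the requested IDs that are absent from the returned hits.
function findMissingIds(requestedIds, returnedIds) {
  const found = new Set(returnedIds);
  return requestedIds.filter((id) => !found.has(id));
}

// Body for an ids query; size must be large enough to return every hit.
function buildIdsQuery(requestedIds) {
  return {
    query: { ids: { values: requestedIds } },
    size: requestedIds.length,
  };
}

const requested = ["1", "2", "3", "4"];
const returned = ["2", "4"]; // made-up _id values from a response
console.log(JSON.stringify(buildIdsQuery(requested)));
console.log(findMissingIds(requested, returned)); // [ '1', '3' ]
```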

Mongo DB MapReduce: Emit key from array based on condition

I am new to MongoDB, so excuse me if this is rather trivial. I would really appreciate the help.
The idea is to generate a histogram over some specific values, in this case the MIME types of some files. For that I am using a map reduce job.
I have a MongoDB collection with documents in the following form:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : [
{
"key" : "key1",
"value" : "Plain text",
"status" : "SINGLE_RESULT",
},
{
"key" : "key2",
"value" : "text/plain",
"status" : "SINGLE_RESULT",
},
{
"key" : "key3",
"value" : 3469,
"status" : "OK",
}
]
}
Please note that almost every document has more metadata key/value entries than shown here.
Map Reduce job
I tried doing the following:
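function map() {
    // Find the MIME type stored under "key2"; stays "" if the key is absent.
    var mime = "";
    this.metadata.forEach(function (m) {
        if (m.key === "key2") {
            mime = m.value;
        }
    });
    emit(mime, {count: 1});
}
// reduce receives the emitted key and the array of values for that key.
function reduce(key, values) {
    var res = {count: 0};
    values.forEach(function (v) { res.count += v.count; });
    return res;
}
db.collection.mapReduce(map, reduce, {out: {inline: 1}})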
This seems to work for a small number of documents (~15K), but iterating through all metadata key/value entries takes a lot of time during the mapping phase. When running this on more documents (~1 million), the operation takes forever.
So my question is:
Is there some way to emit the MIME type (the value) directly instead of iterating through all keys to select it? Or is there a better way to write the map reduce functions?
Something like emit (this.metadata.value {$where this.metadata.key:"key2"}) or similar...
Thanks for your help!
Two thoughts ...
First thought: How attached are you to this document schema? Could you instead have the metadata field value as an embedded document rather than an embedded array, like so:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : {
"key1" : {
"value" : "Plain text",
"status" : "SINGLE_RESULT"
},
"key2": {
"value" : "text/plain",
"status" : "SINGLE_RESULT"
},
"key3" : {
"value" : 3469,
"status" : "OK"
}
}
}
Then your map step does away with the loop entirely:
function map() {
emit( this.metadata["key2"].value, { count : 1 } );
}
At that point, you might even be able to cast this as a "group" command rather than a "mapReduce".
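The loop-free map step can be tried outside MongoDB with a small in-memory stand-in. Everything here apart from the map and reduce logic is an assumption for illustration: runMapReduce is a hypothetical helper, emit is passed in as an argument instead of being a global, and the sample documents are made up.

```javascript
// In-memory stand-in for mapReduce, for illustration only.
// Unlike MongoDB, emit is passed to map as an argument here.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));

  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// Map step for the embedded-document schema: no loop needed.
function mapEmbedded(emit) {
  emit(this.metadata["key2"].value, { count: 1 });
}

function reduceCounts(key, values) {
  const res = { count: 0 };
  values.forEach((v) => { res.count += v.count; });
  return res;
}

const embeddedDocs = [
  { metadata: { key2: { value: "text/plain", status: "SINGLE_RESULT" } } },
  { metadata: { key2: { value: "image/png", status: "SINGLE_RESULT" } } },
  { metadata: { key2: { value: "text/plain", status: "SINGLE_RESULT" } } },
];

const histogram = runMapReduce(embeddedDocs, mapEmbedded, reduceCounts);
console.log(histogram);
```

Note that this map step throws for a document that has no "key2" entry, so real data would likely need a guard before the emit call.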
Second thought: Absent a schema change like that, particularly if "key2" appears early in the metadata array, you could at least exit the loop eagerly once the key is found to save yourself some iterations, like so:
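function map() {
    var mime = "";
    // A plain for loop is used because break cannot exit a forEach callback.
    for (var i = 0; i < this.metadata.length; i++) {
        if (this.metadata[i].key === "key2") {
            mime = this.metadata[i].value;
            break;
        }
    }
    emit(mime, {count: 1});
}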
Not sure if either path is the key to victory, but hopefully helpful thoughts. Best of luck!
