ES reversed filtering

Pardon the title, not sure how better to describe the problem.
Anyway, I have a table with entries(id, name, age, ..dynamic fields) and filter_groups(id, filters[])
Each filter group has a list of filters of the form {filter, field, value} that is used client-side to filter an HTML table of entries.
Examples:
[{ filter: 'less_than', field: 'age', value: 10 },
 { filter: 'is', field: 'name', value: "john doe" }]
If I wanted to fetch all the entries matching a particular filter group, it seems fairly straightforward to construct the query and send it to ES using the filters.
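For reference, the example filter group above could translate into something like this (a sketch: mapping 'less_than' to a range clause and 'is' to a term clause, and assuming name is indexed as a keyword):

{
  "query": {
    "bool": {
      "filter": [
        { "range": { "age": { "lt": 10 } } },
        { "term": { "name": "john doe" } }
      ]
    }
  }
}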
However, if you reverse the situation and, given an entry, want to fetch all the filter_groups whose filters match that entry, how would you go about doing it?
Thanks

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I use sorting as an array of 2 attributes:
The first attribute is what the client wants to sort by. It can be a simple sorting object or a sort query with a nested filter.
The second attribute is basically a default sort by id, needed to order documents which have identical values for the column the customer wants to sort by.
I checked multiple topics/issues on GitHub and on the Elastic forum to understand how to implement the search_after mechanism for backward pagination, but it does not work for all the cases I need.
Please have a look at the image:
Imagine there is a limit = 3, the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names on the image are: A, B, C, D, E.
The ids are the numeric parts of the Doc labels.
When the customer wants to go back to the previous page, which is page #2 in my picture, what I do is pass the following to Elasticsearch:
sort: [
  { name: 'desc' },
  { _id: 'desc' }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is Doc6 (the name: null one on my image). It seems logical, because I ask Elasticsearch to search descending after [null, 7], and only one document corresponds to that: Doc6. But it's not what I need.
I can't come up with a solution that gets the data I need.
Could anyone help, please?
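One direction worth exploring (my assumption, not something from the original question) is to avoid null sort keys entirely: give documents with a missing name a concrete sentinel through the sort option missing, so that search_after can always be fed a real value instead of null:

sort: [
  // 'zzz' is a hypothetical sentinel chosen to sort after every real name
  { name: { order: 'desc', missing: 'zzz' } },
  { _id: 'desc' }
],
search_after: ['zzz', Doc7._id]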

ElasticSearch Chaining queries based on result from first query

I have some data and I am looking to implement a search feature that probably requires chaining multiple queries. For example, a few people are part of a group, but each member is a separate document in the database. None of the data is nested.
For example:
data = [
  {
    id: '1',
    name: 'abc',
    familyId: '3'
  },
  {
    id: '2',
    name: 'def',
    familyId: '3'
  },
  {
    id: '3',
    name: 'ghi',
    familyId: null
  },
]
So now I am trying to implement a search feature where people can search by name, and if the name matches I want to show that result along with the person's family members. Each document is separate, and there is no connection between them apart from the familyId.
So currently my solution is to search by name first, then check whether a familyId is present in the result, and if so make another ES query to get all the members and then show the combined result.
Is there a way I could make it one query that gives me the desired output?
Any suggestion is very much appreciated.
There's no native Elasticsearch functionality for this, unfortunately; your approach is the current best way to do it.
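A minimal sketch of that two-query flow (the index name people is hypothetical; the values come from the example data):

# Step 1: find people matching the searched name
GET people/_search
{
  "query": { "match": { "name": "abc" } }
}

# Step 2: if the hit carries a familyId, fetch every member of that family
GET people/_search
{
  "query": { "term": { "familyId": "3" } }
}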

ElasticSearch Aggregations: subtracting aggregations based upon match

Using a simple albeit somewhat artificial example, let's say that I have several inventory docs stored in ElasticSearch where every document represents either the purchase or the sale of an item:
[
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149382734621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149383464621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149384824621},
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149387435621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149388434621},
{item_id: "bizbaz", type: "cost", value: 41.23, timestamp:149389424621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149389914621},
{item_id: "waahoo", type: "sale", value: 11.23, timestamp:149389914621},
...
]
And for a specified time range I want to calculate the current profit for each item. So for example I would want to return:
foobar_profit = sum(value of all documents item_id="foobar" and type="sale")
              - sum(value of all documents item_id="foobar" and type="cost")
bizbaz_profit = sum(value of all documents item_id="bizbaz" and type="sale")
              - sum(value of all documents item_id="bizbaz" and type="cost")
...
There are two aspects that I don't yet understand how to achieve:
1. I know how to aggregate over terms, so this would allow me to sum the value of all "foobar" items regardless of type. But I don't know how to sum over all documents that match on two fields. For instance, I want to aggregate the above data set on the compound key (item_id, type). The dataset above would then yield the aggregations:
(foobar,cost)->24.68
(foobar,sale)->65.48
(bizbaz,cost)->41.23
(bizbaz,sale)->90.24
(waahoo,sale)->11.23
2. Presuming I can do #1, I will have aggregations like foobar_cost and foobar_sale. But I don't know how to combine two aggregations so that in this case foobar_profit = foobar_sale - foobar_cost. So the above aggregations would become:
foobar_profit->40.8
bizbaz_profit->49.01
waahoo_profit->11.23
Some final notes:
In the example above, I only list 3 item_ids, but consider that there will be thousands of item_ids, so I can't do special-case queries per item_id.
Also, for a particular item, the cost and sale items will come in at different times, so we can't put the cost and sale price in the same document and diff the fields.
I can send back all the data and do the last step of the aggregations client side, but this might be a ton of data. Really, I need to do it on server side if possible so that I can sort the results by profit and return the top N.
You can just use nested aggregations. See here for a working example: https://gist.github.com/mattweber/71033b1bf2ebed1afd8e
I use a MatchAll Query in this example but you can replace that with a RangeQuery or whatever you need.
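For a self-contained illustration, here is the shape such a query can take (a sketch: the filter sub-aggregations follow the gist's idea, while the bucket_script and bucket_sort parts are my assumption for computing and ranking profit server-side, not something stated in the answer above):

{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "items": {
      "terms": { "field": "item_id" },
      "aggs": {
        "sales": {
          "filter": { "term": { "type": "sale" } },
          "aggs": { "total": { "sum": { "field": "value" } } }
        },
        "costs": {
          "filter": { "term": { "type": "cost" } },
          "aggs": { "total": { "sum": { "field": "value" } } }
        },
        "profit": {
          "bucket_script": {
            "buckets_path": { "sale": "sales>total", "cost": "costs>total" },
            "script": "params.sale - params.cost"
          }
        },
        "by_profit": {
          "bucket_sort": { "sort": [{ "profit": { "order": "desc" } }], "size": 10 }
        }
      }
    }
  }
}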

How do I model my document in MongoDB to make it paginable for nested attributes?

I'm trying to cache my tweets and show them based on the keywords I save. However, as tweets grow over time I need to paginate them.
I'm using Ruby and Mongoid, and this is what I have come up with so far:
class SavedTweet
  include Mongoid::Document
  field :saved_id, :type => String
  field :slug, :type => String
  field :tweets, :type => Array
end
And the tweets array would be like this
{id: "id", text: "text", created_at: "created_at"}
So it's like a bucket for each keyword that you save. My first problem is that MongoDB cannot sort the second level of a document, in this case the tweets array, and that makes pagination much harder because I cannot use skip and limit. I would have to load all the tweets, put them in a cache, and paginate from there.
The question is how I should model my problem to make it paginable in MongoDB rather than in memory. I'm assuming that doing it in MongoDB would be faster. Right now I'm in the early stage of my application, so it's easier to change the model now than later. If you have any suggestions or opinions, I'd really appreciate them.
An option could be to save tweets in a different collection and link them with your SavedTweet class. It will be easy to query and you could use skip and limit without problems.
{id: "id", text: "text", created_at: "created_at", saved_tweet:"_id"}
EDIT: a better explanation, with two additional options.
As far as I can see, you have three options, if I understand your requirements correctly:
1. Use the same schema that you are already using. You would have two problems: you cannot use skip and limit with a usual query, and you have a limit of 16 MB per document. The first could be resolved with an Aggregation Framework query ($unwind, $skip and $limit could be helpful; see the first sketch at the end of this answer). The second could be a problem if you have a lot of tweet documents in the array, because one document cannot be more than 16 MB in size.
2. Use two collections to store your tweets. One collection would have the same structure that you already have. For example:
{
save_id:"23232",
slug:"adkj"
}
And the other collection would have one document per tweet.
{
id: "id",
text: "text",
created_at: "created_at",
saved_tweet:"_id"
}
With the saved_tweet field you are linking saved_tweets with tweets in a 1 to N relation. This way, you can run queries over the tweet collection and still use the limit and skip operators (see the second sketch at the end of this answer).
3. Save all the info in the same document. If your saved_tweet collection only has those fields, you can save everything in a single document (one document for each tweet). Something like this:
{
  save_id: "23232",
  slug: "adkj",
  tweet: {
    tweet_id: "id",
    text: "text",
    created_at: "created_at"
  }
}
With this solution you are duplicating fields, because save_id and slug would be the same in other documents of the same saved_tweet, but it could be an option if you have a small number of fields and those fields are not subdocuments or arrays.
I hope it is clear now.
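To make option 1 concrete, here is a sketch of the Aggregation Framework pagination (the collection name saved_tweets, the keyword slug "adkj", and the page size of 20 are assumptions for illustration):

db.saved_tweets.aggregate([
  { $match: { slug: "adkj" } },            // select the bucket for this keyword
  { $unwind: "$tweets" },                  // one result document per embedded tweet
  { $sort: { "tweets.created_at": -1 } },  // newest tweets first
  { $skip: 20 },                           // skip page 1
  { $limit: 20 }                           // return page 2
])

And for option 2, pagination becomes a plain query over the separate tweet collection (again a sketch; tweets as the collection name and savedTweetId holding the _id of the SavedTweet document are assumptions):

db.tweets.find({ saved_tweet: savedTweetId })
         .sort({ created_at: -1 })
         .skip(20)
         .limit(20)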

How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
_id: ObjectId("51659dc99d62eedc1a000001"),
type: "image_search",
branch: "qa_media_discovery_feelobot",
time_elapsed: 19000,
test: "1365613930 All Media",
search_term: null,
env: "delta",
date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = @coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = @coll.aggregate([
  { "$group" => { "_id" => "$type", "avg" => { "$avg" => "$time_elapsed" } } },
  { "$match" => { "_id" => "image_search" } }
]).first['avg']
To break this down:
We are grouping the matches by the type field, and returning the $avg time_elapsed for each type. We name the resulting average avg. Then, of those groups, filter out only the ones where the group _id matches image_search. Finally, since aggregate always returns an array, get the first result (there should only be one), and grab the avg field that we named.
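For context, with only the single sample document shown above (time_elapsed: 19000), the aggregate call would return something like the following before .first['avg'] extracts the number (a hedged illustration, not captured output):

[{ "_id" => "image_search", "avg" => 19000.0 }]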
Use the MongoDB aggregation framework: http://docs.mongodb.org/manual/core/aggregation/
