Elasticsearch Aggregation Huge Bucket - elasticsearch

I have an category tree on each document. Like
[{
name: "RootCat",
parentId: 104319,
id: 104319
},
{
name: "FirstLevel",
parentId: 104319,
id: 104328
},
{
name: "n Level",
parentId: 104328,
id: 107929
}]
The problem when i want to have a tree for each search like :
Root Cat
- First Level
--Second Level-1
--Second Level-2
Aggregations come up with all buckets. And i have about 40000 categories, so it makes huge network traffic. How can i get only categories that i want to show.My filter aggreagtion below
.Filter(SearchConstants.Aggregation.Category,
y => y.Aggregations(r => r.Filter("filteredAggs",
cc => cc.Filter(GetPostFilters(searchQuery)) .Aggregations(ra => ra.Nested("cat",
ty => ty.Path(rtw => rtw.CategoryList).Aggregations(
abc => abc.Terms("categoryId", t => t.Field(q => q.CategoryList.First().Id).Size(0)
I want to get one level children with all parents. Like
Root Cat
Firts Level
Second Level
(But buckets have all level children)

Not sure what you mean, by default the search results impact the actual aggregations that are returned. Another option is to use a filter in the aggregation itself. But this is dependent on your requirements. Check the docs for implementing a filter in the aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-bucket-filter-aggregation.html

Related

ElasticSearch Chaining queries based on result from first query

I have some data and I am looking to implement a search feature that probably requires chaining multiple queries. for example there are few people who are part of a group but each member in the database are separate. None of the data is nested.
For example
data = [
{
id: '1'
name: 'abc',
familyId: '3'
},
{
id: '2'
name: 'def',
familyId: '3'
},
{
id: '3'
name: 'ghi',
familyId: null
},
]
So now I am trying to implement a search feature where people can search by name, and if the name matches I want to show that result along with his family members. Each data is different and there is no connection between them apart from the familyId.
So currently my solution is to make a search using the name first and then from the result of my first search I will see if there is family ID present in the result, and if yes make another ES query to get all the members and then show the result.
Is there a away I could make it one query that will give me the desired output?
Any suggestion is very much appreciated.
there's no native Elasticsearch functionality for this unfortunately, your approach is the current best way to do it

Counting documents by property occurrence in Kibana

I'm trying to create a visualization that looks like this:
Foobar, 10
Bar, 8
Baz, 5.6
The first column is the aggregation itself. Imagine i have documents like this:
{
id: 1,
name: 'lorem ipsum',
type: 'A'
author: {
name: 'Foobar',
}
}
{
id: 2,
name: 'dolor sit amet',
type: 'B',
author: {
name: 'Foobar',
}
}
So, i want to add a +1 to the score of "Foobar" everytime i find a document of type A. And a +2 to the score if i find a document of type B. Basically, aggregating by the author name, and calculating a dynamic value on results.
Is this possible in Kibana? Thanks for the help.
AFAIK, you can't do this in Kibana in visualize panel, maybe you can try it in program then index the result into es.

Restrict a find query MongoDB with ruby driver

I have a mongo query that look like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contain a huge quantity of documents and the find is really slow.
So I was wondering if I could restrict the query for the last x inserted documents ?
Something like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks and sorry for my english.
You can't do that restriction at query time. But you could have a separated caped collection for this effect.
You insert on both and do this query on the caped collection which will will only retain the last N documents.
But this will only fit if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections

ElasticSearch Aggregations: subtracting aggregations based upon match

Using a simple albeit somewhat artificial example, let's say that I have several inventory docs stored in ElasticSearch where every document represents either the purchase or the sale of an item:
[
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149382734621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149383464621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149384824621},
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149387435621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149388434621},
{item_id: "bizbaz", type: "cost", value: 41.23, timestamp:149389424621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149389914621},
{item_id: "waahoo", type: "sale", value: 11.23, timestamp:149389914621},
...
]
And for a specified time range I want to calculate the current profit for each item. So for example I would want to return:
foobar_profit = sum(value of all documents item_id="foobar" and type="sale")
-sum(value of all documents item_id="foobar" and type="cost")
bizbaz_profit = sum(value of all documents item_id="bizbaz" and type="sale")
-sum(value of all documents item_id="bizbaz" and type="cost")
...
There are two aspects that I don't yet understand how to achieve.
I know how to aggregate over terms, so this would allow me to sum the value of of all "foobar" items regardless of type. But I don't know how to sum over all documents that match on two fields. For instance, I want to aggregate the above data set on the compound key (item_id,type). The dataset above would then yield the aggregations:
(foobar,cost)->24.68
(foobar,sale)->65.48
(bizbaz,cost)->41.23
(bizbaz,sale)->90.24
(waahoo,sale)->11.23
Presuming I can do #1, I will have aggregations like foobar_cost and foobar_sale. But I don't know how to combine two aggregations so that in this case foobar_profit = foobar_sale - foobar_cost. So the above aggregations would become
foobar_profit->40.8
bizbaz_profit->49.01
waahoo_profit->11.23
Some final notes:
In the example above, I only list 3 item_ids, but consider that there will be thousands of item_ids, so I can't do special-case queries per item_id.
Also, for a particular item, the cost and sale items will come in at different times, so we can't put the cost and sale price in the same document and diff the fields.
I can send back all the data and do the last step of the aggregations client side, but this might be a ton of data. Really, I need to do it on server side if possible so that I can sort the results by profit and return the top N.
You can just use nested aggregations. See here for a working example: https://gist.github.com/mattweber/71033b1bf2ebed1afd8e
I use a MatchAll Query in this example but you can replace that with a RangeQuery or whatever you need.

How do you calculate the average of a all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
_id: ObjectId("51659dc99d62eedc1a000001"),
type: "image_search",
branch: "qa_media_discovery_feelobot",
time_elapsed: 19000,
test: "1365613930 All Media",
search_term: null,
env: "delta",
date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = #coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = #coll.aggregate([ {"$group" => {"_id"=>"$type", "avg"=> {"$avg"=>"$time_elapsed"}}}, {"$match" => {"_id"=>"image_search"}} ]).first['avg']
To break this down:
We are grouping the matches by the type field, and returning the $avg time_elapsed for each type. We name the resulting average avg. Then, of those groups, filter out only the ones where the group _id matches image_search. Finally, since aggregate always returns an array, get the first result (there should only be one), and grab the avg field that we named.
Use the mongodb aggregation framework http://docs.mongodb.org/manual/core/aggregation/

Resources