Returning Aggregates Per Item in Elasticsearch - elasticsearch

I have a collection of documents (e.g. with fields for each student) and would like to return the aggregates for each student together in one query. I can only think of querying by student and then aggregating, but I would like to avoid looping in my code to get the aggregate for each student.

If I understand your question correctly, the student name or student ID is a field in each document?
For example,
{
"name": "Steve",
"id": "sid",
"grade": 1,
...
}
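If this is the case, I think you just need to nest one aggregation under another (a sub-aggregation). Put a terms aggregation on the student's name or id field first, and then add aggregations for the fields you are interested in underneath it. Each student then gets one bucket with its own aggregate values, all in a single query. (This is different from the nested aggregation type, which applies to fields mapped as nested.)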
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
https://qbox.io/blog/elasticsearch-aggregations-nested-documents-tutorial
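As a rough illustration of the terms-plus-sub-aggregation idea, here is a minimal sketch that builds such a request body as a plain Python dict. It assumes the id field is mapped as keyword and grade is numeric; the aggregation names (per_student, avg_grade, max_grade), the function name and the bucket limit of 1000 are made up for the example.

```python
import json


def per_student_aggregation(student_field="id", metric_field="grade", max_students=1000):
    """Build a search body with one bucket per student and metrics inside each bucket.

    Field names and aggregation names are illustrative assumptions.
    """
    return {
        "size": 0,  # skip the hits; only the aggregation results are needed
        "aggs": {
            "per_student": {
                # one bucket per distinct value of the student field
                "terms": {"field": student_field, "size": max_students},
                # sub-aggregations are computed separately inside every bucket
                "aggs": {
                    "avg_grade": {"avg": {"field": metric_field}},
                    "max_grade": {"max": {"field": metric_field}},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(per_student_aggregation(), indent=2))
```

The printed JSON can be sent as the body of a _search request. The response should then contain one entry per student under aggregations.per_student.buckets, so no per-student loop of queries is needed.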

Related

How to group documents of different types according to the same matching field in Elasticsearch?

To preface, I'm working with Spring Data Elasticsearch.
Let's say I have about 10k documents each for a Car entity and an Owner entity:
Car: {VIN, make, model, color}
Owner: {VIN, owner}
Let's say that each car can have 0-many owners.
In the end, I want a bunch of CarProfile objects that consist of the matching Car data and Owner data together.
CarProfile: {VIN, make, model, color, List<String> owners}
I was thinking of two approaches to this:
1. Index all the Car and Owner data into ES. Group the documents by VIN, traverse each group, and convert each group into a CarProfile object.
2. Index all the Owner data into ES. Traverse the Car data and, for each Car, retrieve any matching Owner information by VIN, then convert all the data to a CarProfile object.
Approach 1 would be more convenient, but I'm not sure if that approach is possible. It seems like aggregations can only give you a subset of the data (like how many owners per car) but not all the document data together. Any suggestions would be welcome.
You can check field collapsing. (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/collapse-search-results.html)
You can keep the documents flat with the following document model.
CarProfile: {VIN, make, model, color, owner}
You can then group by VIN with a query like the one below. Note that collapse is a top-level key next to query, not part of it, and that it returns only the top hit for each VIN:
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "VIN"
  }
}
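Because collapsing keeps a single hit per group, collecting every owner for a VIN needs the inner_hits option of the collapse clause. The following is a minimal sketch of such a request body, assuming VIN is mapped as keyword; the inner hits name same_vin, the function name and the size of 10 are hypothetical choices.

```python
import json


def collapse_by_vin(inner_hits_size=10):
    """Build a search body that groups hits by VIN and expands each group.

    The inner_hits name and size are illustrative assumptions.
    """
    return {
        "query": {"match_all": {}},
        # collapse sits beside "query" at the top level of the body
        "collapse": {
            "field": "VIN",
            # inner_hits returns up to `size` documents sharing the same VIN
            "inner_hits": {"name": "same_vin", "size": inner_hits_size},
        },
    }


if __name__ == "__main__":
    print(json.dumps(collapse_by_vin(), indent=2))
```

Each hit in the response should then carry an inner_hits.same_vin section holding the documents of that group, from which a CarProfile with its list of owners could be assembled.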

Why does ES recommend to use single mapping per index and doesn't provide any "Join" functionality for this?

As you know, starting from version 6, the Elasticsearch team deprecated multiple types per index as well as the old parent-child relationships. Proof is here
They recommend using the join datatype instead of parent-child. But let's look at the join documentation here. It says:
The join datatype is a special field that creates parent/child relation within documents of the same index.
They suggest using multiple indexes and restrict each index to a single mapping, _doc, but the join field is designed to work only within a single index.
How should I proceed? How could I create parent-child relationships across separate indexes?
Example:
Index: "City"
{
"name": "Moscow",
"id": 1
}
Index: "Product"
{
"name": "Shirt",
"city": 1,
"id": 1
}
How could I get the "Shirt" above if I know only the city name "Moscow"?

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But I do not want only the documents found by the search query. I also want the whole transaction those documents belong to, which is identified by transaction_id. For example, if a document has been found with transaction_id equal to 123, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first fetches all documents that match the criteria, and the second returns all documents that have one of the transaction_id values found by the first query.
But is there any way to do it in a single query?
You can use a parent-child relationship between the transaction and your object. Or denormalize your data by nesting the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning 2 queries.
Try an index mapping similar to the following, and specify the parent transaction ID when indexing each object. (The _parent mapping applies to Elasticsearch versions before 6.0; later versions use the join field instead.)
{
  "mappings": {
    "transaction": {},
    "object": {
      "_parent": {
        "type": "transaction"
      }
    }
  }
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
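For comparison, the application-side join with two queries could look roughly like the sketch below. It only builds the request bodies and reads a response dict of the usual hits.hits[]._source shape; the function names are hypothetical, and actually sending the requests is left to whichever client is in use.

```python
def matching_objects_query(name, color):
    """First query: find objects whose name and color match, fetching only transaction_id."""
    return {
        "_source": ["transaction_id"],
        "query": {
            "bool": {
                "must": [
                    {"match": {"name": name}},
                    {"match": {"color": color}},
                ]
            }
        },
    }


def transaction_ids_from_hits(response):
    """Collect the distinct transaction_id values from a search response, in order."""
    ids = []
    for hit in response["hits"]["hits"]:
        tid = hit["_source"]["transaction_id"]
        if tid not in ids:
            ids.append(tid)
    return ids


def documents_in_transactions_query(transaction_ids):
    """Second query: fetch every document belonging to any of the given transactions."""
    return {"query": {"terms": {"transaction_id": transaction_ids}}}


if __name__ == "__main__":
    # A hand-written stand-in for the first response, for illustration only.
    fake_response = {"hits": {"hits": [
        {"_source": {"transaction_id": 123}},
        {"_source": {"transaction_id": 456}},
        {"_source": {"transaction_id": 123}},
    ]}}
    ids = transaction_ids_from_hits(fake_response)
    print(documents_in_transactions_query(ids))
```

If the first query can match many documents, a terms aggregation on transaction_id may be a cheaper way to collect the IDs than reading them from the hits.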

Automatically indexing by a field name as desc

I have an index type for a book store, and new books are added to it every week.
Every query on this index sorts by one field (in this case "price") in descending order, which adds some overhead on ES because of the data volume.
In this service we always show books to the user from maximum to minimum price.
Is it possible, automatically or manually, to keep the documents of the book type always sorted by price in descending order inside the index, so that queries return them sorted that way without having to specify:
"sort" : { "price" : { "order" : "desc" } }
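No, not in the way you describe: without a sort clause, results are ordered by relevance score, so the query still has to ask for the sort. Elasticsearch stores the data internally as Lucene segments. Elasticsearch 6.0 and later can order documents within each segment through index sorting (the index.sort.field and index.sort.order settings), which can speed up sorted queries but does not remove the need for the sort clause. Take a look here to better understand the internal structure of ES: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up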
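One way to avoid repeating the sort clause by hand is to add it in application code. The helper below is a minimal sketch of that idea; its name and defaults are made up for this example, and it works on plain request-body dicts rather than on any particular client.

```python
def with_default_sort(body, field="price", order="desc"):
    """Return a copy of a search body, adding a sort clause only if it has none."""
    sorted_body = dict(body)  # shallow copy so the caller's dict is left unchanged
    sorted_body.setdefault("sort", [{field: {"order": order}}])
    return sorted_body


if __name__ == "__main__":
    print(with_default_sort({"query": {"match_all": {}}}))
```

Every search body passed through with_default_sort then carries the price-descending sort unless the caller supplied a different one.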

How can I query/filter an elasticsearch index by an array of values?

I have an elasticsearch index with numeric category ids like this:
{
  "id": "50958",
  "name": "product name",
  "description": "product description",
  "upc": "00302590602108",
  "categories": [
    "26",
    "39"
  ],
  "price": "15.95"
}
I want to be able to pass an array of category ids (a parent id with all of its children, for example) and return only results that match one of those categories. I have been trying to get it to work with a term query, but no luck yet.
Also, as a new user of elasticsearch, I am wondering if I should use a filter/facet for this...
ANSWERED!
I ended up using a terms query (as opposed to term). I'm still interested in knowing if there would be a benefit to using a filter or facet.
As you already discovered, a terms query works. I would suggest a terms filter though, since filters are faster and cacheable.
Facets won't limit the results, but they are excellent tools. They count the hits for specific terms within your total results and can be used for faceted navigation.
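A minimal sketch of the filter approach, written as a request body in a Python dict. It uses a terms clause in the filter part of a bool query, which is how filters are expressed in later Elasticsearch versions; older versions used a separate terms filter. The function name is hypothetical, and the ids are converted to strings because the sample document stores them that way.

```python
import json


def category_filter_query(category_ids):
    """Build a search body matching documents in at least one of the given categories."""
    return {
        "query": {
            "bool": {
                # filter clauses do not affect scoring and can be cached
                "filter": [
                    {"terms": {"categories": [str(c) for c in category_ids]}}
                ]
            }
        }
    }


if __name__ == "__main__":
    # e.g. a parent category id together with its children
    print(json.dumps(category_filter_query([26, 39]), indent=2))
```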
