How to group documents of different types according to the same matching field in Elasticsearch? - elasticsearch

To preface, I'm working with Spring Data Elasticsearch.
Let's say I have about 10k documents each of a Car entity and an Owner entity:
Car: {VIN, make, model, color}
Owner: {VIN, owner}
Let's say that each car can have 0-many owners.
In the end, I want a bunch of CarProfile objects that consist of the matching Car data and Owner data together.
CarProfile: {VIN, make, model, color, List<String> owners}
I was thinking of two approaches to this:
1. Index all the Car and Owner data into ES. Group the documents by VIN and traverse through each group and convert each group into a CarProfile object.
2. Index all the Owner data into ES. Traverse through the Car data and for each Car, retrieve any matching Owner information with the VIN, then convert all the data to a CarProfile object.
Approach 1 would be more convenient, but I'm not sure if that approach is possible. It seems like aggregations can only give you a subset of the data (like how many owners per car) but not all the document data together. Any suggestions would be welcome.

You can check field collapsing. (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/collapse-search-results.html)
You can keep documents flat with the following document model.
CarProfile: {VIN, make, model, color, owner}
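You can then group by VIN with a query like the one below. Collapsing returns the top-ranked document for each VIN with all of its fields; the other documents of a group are returned only if you add inner_hits to the collapse clause. The VIN field must be a keyword or numeric field for this to work: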
{
"query": {
"match_all": {}
},
"collapse": {
"field": "VIN"
}
}
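A minimal Python sketch of the collapse request with inner_hits, which expands each VIN group beyond its top hit. It assumes the flat document model above and a keyword-mapped VIN field; the function names and the inner-hits name "by_vin" are hypothetical. It only builds the request body and folds an already-fetched response into CarProfile-shaped dicts, so sending the request is left to whichever client you use.

```python
from typing import Any, Dict, List


def build_collapse_body(group_size: int = 100) -> Dict[str, Any]:
    """Build a search body that collapses on VIN and expands each group."""
    return {
        "query": {"match_all": {}},
        # "collapse" is a sibling of "query", not a child of it.
        "collapse": {
            "field": "VIN",  # assumption: VIN is mapped as keyword
            "inner_hits": {"name": "by_vin", "size": group_size},
        },
    }


def fold_profiles(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Fold a collapsed search response into CarProfile-like dicts."""
    profiles = []
    for hit in response["hits"]["hits"]:
        car = hit["_source"]
        group = hit["inner_hits"]["by_vin"]["hits"]["hits"]
        # Collect every non-empty owner found in the VIN group.
        owners = [g["_source"]["owner"] for g in group if g["_source"].get("owner")]
        profiles.append({
            "VIN": car["VIN"],
            "make": car["make"],
            "model": car["model"],
            "color": car["color"],
            "owners": owners,
        })
    return profiles
```

A car without owners simply ends up with an empty owners list, provided it was indexed as a document that has no owner value.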

Related

Returning Aggregates Per Item in Elasticsearch

I have a collection of documents (eg with fields for each student) and would like to return the aggregates for each student together in 1 query. I can only think of querying by student and then doing an aggregate but would like to prevent looping in my code to get the aggregate for each student.
If I understand your question correctly, the student name or student ID is a field in each document?
For example,
{
"name": "Steve",
"id": "sid",
"grade": 1,
...
}
If this is the case, I think you just need a sub-aggregation (an aggregation nested inside another one). Put a terms aggregation on the student's name or id field first, and then add the aggregations for the fields you are interested in under that terms aggregation.
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
https://qbox.io/blog/elasticsearch-aggregations-nested-documents-tutorial
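As a rough Python sketch of that request body: a terms aggregation on the student field with metric aggregations under it. The field names "id" and "grade" come from the example document; the aggregation names, the choice of avg and max, and the bucket limit are assumptions for illustration, and the student field is assumed to be aggregatable (for example keyword-mapped).

```python
from typing import Any, Dict


def build_per_student_aggs(student_field: str = "id",
                           metric_field: str = "grade",
                           max_students: int = 1000) -> Dict[str, Any]:
    """One request that returns metrics for every student bucket."""
    return {
        "size": 0,  # skip the hits, only aggregation results are needed
        "aggs": {
            "per_student": {
                "terms": {"field": student_field, "size": max_students},
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}},
                    "max_metric": {"max": {"field": metric_field}},
                },
            }
        },
    }


def read_per_student(response: Dict[str, Any]) -> Dict[Any, Dict[str, Any]]:
    """Flatten the bucket list into {student: {count, avg, max}}."""
    buckets = response["aggregations"]["per_student"]["buckets"]
    return {
        b["key"]: {
            "count": b["doc_count"],
            "avg": b["avg_metric"]["value"],
            "max": b["max_metric"]["value"],
        }
        for b in buckets
    }
```

This avoids looping over students in application code: every student comes back as one bucket of the same response.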

Elasticsearch projections onto new type

Is it possible to get a projection as a query result in elasticsearch?
For example:
I have 3 types in my index:
User { Id, Name, Groups[], Location { Lat, Lon } }
Group { Id, Name, Topics[] }
Message { Id, UserId, GroupId, Content}
And I want to get the number of messages and users in a group in a given area, so my input would be:
{ Lat, Lon, Distance, GroupId }
and the output would be:
Group { Id, Name, Topics, NumberOfUsers, NumberOfMessages }
where the actual output of the query is a combination of data returned by the query and aggregations within that data.
Is this possible?
There are no JOINs in Elasticsearch (except for parent-child, but those shouldn't be used for heavy joining either). With your current data model you'll only be able to do application-side JOINs, and depending on your actual data that might mean a lot of round trips. I don't think this will work out too well.
PS: Generally, please provide some simple test documents with usable data. If I have to put together a test data set to try out your problem, the chances that anybody will actually try it get rather slim.
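For what an application-side join could look like here, a hedged Python sketch follows. It only builds the two request bodies and combines their results. The assumptions are that Location is mapped as geo_point, that Groups holds group ids, that the second body is sent to a count-style endpoint, and that all function names are hypothetical.

```python
from typing import Any, Dict, List


def build_users_in_area_body(lat: float, lon: float, distance: str,
                             group_id: str, size: int = 1000) -> Dict[str, Any]:
    """Round trip 1: users of the group located within the given distance."""
    return {
        "size": size,
        "_source": ["Id"],
        "query": {"bool": {"filter": [
            {"term": {"Groups": group_id}},
            {"geo_distance": {
                "distance": distance,  # e.g. "10km"
                "Location": {"lat": lat, "lon": lon},
            }},
        ]}},
    }


def build_message_count_body(user_ids: List[str], group_id: str) -> Dict[str, Any]:
    """Round trip 2: messages in the group written by those users."""
    return {"query": {"bool": {"filter": [
        {"term": {"GroupId": group_id}},
        {"terms": {"UserId": user_ids}},
    ]}}}


def project_group(group: Dict[str, Any], user_ids: List[str],
                  message_count: int) -> Dict[str, Any]:
    """Combine the Group document with both counts into the projection."""
    return {**group,
            "NumberOfUsers": len(user_ids),
            "NumberOfMessages": message_count}
```

The number of user ids passed into the second request grows with the area, which is the round-trip and payload cost mentioned above.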

elastic search get distinct random field values

We have Elasticsearch documents with the following fields:
{
"stockId": 1,
"sellerId": 100
}
Multiple stockIds can be mapped to a single sellerId, but one stock can only be mapped to a single seller. There are around 10K stocks mapped to 1K sellers, but each sellerId might have a different number of stocks, i.e. a few might have 100 while others have only 1.
Problem Statement: We want to select 'N' random documents out of all the indexed documents. The condition is that each of these 'N' documents should belong to a different seller, i.e. have a distinct "sellerId". (We need to give an award to these sellers.)
What I have tried: I am trying to solve this with an Elasticsearch query that fetches 'N' random distinct 'sellerId' values (and then a second query to fetch 1 document for each of these 'N' sellers). One way could be to aggregate on 'sellerId' and then pick 'N' random keys, but this is not a desirable approach performance-wise. Can someone help with a better query?
I would rebuild my mapping to create a nested document type, with seller being the parent and stockid being the nested object:
{
"sellerid" : { "type" : "integer" },
"stock_obj" : {
"type" : "nested",
"properties" : {
"stockid" : { "type" : "integer" }
}
}
}
When you rebuild your index, you would create only one document per seller, and each seller document would hold all of that seller's stock ids. It seems like there are about 10 stocks per seller on average, which Elasticsearch can handle fine. (If there were thousands of stocks per seller, I would do this differently.)
Then, I would do a search for N sellers, sorted randomly, and then as a second sort field, you would sort the stock ids randomly. Not the simplest mapping, but the query is easy and should be fast.
Also, separately, if you're just dealing with ~10k seller/stock data points that are integers, using elasticsearch is probably overkill. It can do what you want, but its main purpose is for searching large amounts of text.
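A hedged Python sketch of the "N sellers, sorted randomly" step against the one-document-per-seller mapping above. It uses a function_score query with random_score to randomize the order. Instead of a second random sort on stock ids, it picks one stock per seller in application code, which is a swapped-in simplification rather than what the answer describes. The function names are hypothetical.

```python
import random
from typing import Any, Dict, List, Optional


def build_random_sellers_body(n: int, seed: Optional[int] = None) -> Dict[str, Any]:
    """Search body returning n seller documents in random order."""
    random_score: Dict[str, Any] = {}
    if seed is not None:
        # A seed makes the order reproducible; it is paired with a field.
        random_score = {"seed": seed, "field": "_seq_no"}
    return {
        "size": n,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": random_score,
            }
        },
    }


def pick_one_stock_per_seller(hits: List[Dict[str, Any]],
                              rng: Optional[random.Random] = None) -> List[Dict[str, int]]:
    """Choose one random stock from each returned seller document."""
    rng = rng or random.Random()
    winners = []
    for hit in hits:
        source = hit["_source"]
        stocks = source.get("stock_obj", [])
        if not stocks:
            continue  # a seller without stocks cannot win
        winners.append({
            "sellerid": source["sellerid"],
            "stockid": rng.choice(stocks)["stockid"],
        })
    return winners
```

Because each seller is exactly one document, the N hits are distinct sellers by construction.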

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But I do not want only the documents that were found by the search query. I also want the whole transaction to which those documents belong, which is identified by transaction_id. For example, if a document has been found with transaction_id equal to 123, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first one fetches all documents that match the criteria, and the second one returns all documents that have one of the transaction_id values found by the first query.
But is there any way to do it in a single query?
You can use a parent-child relationship between the transaction and your object, or denormalize your data and nest the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning two queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
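With such a mapping, the single query could be sketched as a has_parent query wrapping a has_child query: return every object whose parent transaction has at least one child matching name and color. This Python sketch only builds the request body. It assumes an Elasticsearch version that still supports the _parent mapping shown above, and the function name is hypothetical.

```python
from typing import Any, Dict


def build_transaction_siblings_body(name: str, color: str) -> Dict[str, Any]:
    """All objects belonging to transactions that contain a matching object."""
    matching_object = {
        "bool": {
            "must": [
                {"match": {"name": name}},
                {"match": {"color": color}},
            ]
        }
    }
    return {
        "query": {
            # Outer: objects whose parent transaction satisfies the inner query.
            "has_parent": {
                "parent_type": "transaction",
                "query": {
                    # Inner: transactions with at least one matching child.
                    "has_child": {
                        "type": "object",
                        "query": matching_object,
                    }
                },
            }
        }
    }
```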
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Automatically indexing by a field name as desc

I have an index with a book story type, and new books are added to it every week.
Every query against this index sorts by one field (in this case "price") in descending order, which adds overhead on ES because of the data volume.
In this service we always show books to the user from maximum to minimum price.
Is it possible, automatically or manually, to keep the documents of the book type always sorted by price in descending order, so that queries come back sorted that way without having to specify:
"sort" : { "price" : { "order" : "desc" } }
No, you cannot keep your data ordered based on a field. Elasticsearch stores the data internally as Lucene segments. Take a look here to better understand the internal structure of ES: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
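Since the sort clause still has to be sent with each search, one way to avoid repeating it by hand is a small helper on the application side. A minimal Python sketch, where the helper name and the DEFAULT_SORT constant are hypothetical:

```python
import copy
from typing import Any, Dict

# The sort the service always wants: most expensive books first.
DEFAULT_SORT = [{"price": {"order": "desc"}}]


def with_default_sort(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the search body with the price sort added if absent."""
    result = copy.deepcopy(body)
    result.setdefault("sort", copy.deepcopy(DEFAULT_SORT))
    return result
```

A body that already carries its own sort is left unchanged, so individual queries can still override the default.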
