Elasticsearch - Count of associations between indexes? - elasticsearch

Coming from a relational database background, I want to know if there is a way to retrieve the number of unique associations between two indexes.
Basic Example (Using relational databases)
I have 3 tables: Person, Cars, Person-Cars
Person-Cars has two columns (person_id, car_id) and holds the associations (ownership) between people and cars.
On Elasticsearch, I have created an index for Person and for Cars.
Main Point
Every time I fetch a Car document, I want to know how many people own that car (IOW, how many associations it has to unique people).
--
To achieve that, would I need another index for Person-Cars and then have to index all the association records? Is there a simpler way? What would be the best way to do this in ES?
I have looked into aggregations, but I'm not sure; I think they can only work on a single level (person or cars).
Thanks!

On Elasticsearch, I have created an index for Person and for Cars.
Most of the time it makes sense to store the data in a denormalized fashion in Elasticsearch, i.e. to define one-to-many relationships as nested objects, as a parent-child relationship, or simply as multi-value fields.
What would be the best way to do this in ES?
It depends on your use case (parent-child, nested, or multi-value). Having a separate index for each type will definitely add overhead. The schema can only be modelled better if you describe the other use cases and the types of queries you will need.
Considering only the use case you shared, the car document below will solve it. "owners" holds the ids of the owners (both ids and names can be stored if required); or you can simply maintain the "owners_cnt" counter as well:
{
"id": 1,
"brand": "Hyundai",
"owners": [21, 31, 51],
"owners_cnt": 3
}
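A minimal sketch of producing those car documents from the relational data, assuming the Person-Cars rows can be exported as (person_id, car_id) pairs and the cars as an id-to-brand mapping (both input shapes and the name build_car_docs are hypothetical). Collecting owners in a set means "owners_cnt" counts unique people, which is what the question asks for, even if an association row appears twice.

```python
from collections import defaultdict


def build_car_docs(cars, person_cars):
    """Build denormalized car documents with owner ids and a unique-owner count.

    cars: dict mapping car_id -> brand (hypothetical input shape)
    person_cars: iterable of (person_id, car_id) association rows
    """
    owners = defaultdict(set)
    for person_id, car_id in person_cars:
        owners[car_id].add(person_id)  # a set ignores duplicate association rows

    docs = []
    for car_id, brand in cars.items():
        owner_ids = sorted(owners.get(car_id, set()))
        docs.append({
            "id": car_id,
            "brand": brand,
            "owners": owner_ids,
            "owners_cnt": len(owner_ids),  # number of unique owners
        })
    return docs


docs = build_car_docs({1: "Hyundai"}, [(21, 1), (31, 1), (51, 1), (21, 1)])
print(docs)  # [{'id': 1, 'brand': 'Hyundai', 'owners': [21, 31, 51], 'owners_cnt': 3}]
```

The resulting dicts could then be bulk-indexed into the car index; that step is left out here.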
Whenever a person buys or sells a car, the car document needs to be updated in this case. If cars are bought and sold frequently, and a purchase requires updating both the car and the person documents, this type of modelling makes less sense.
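To make the update cost concrete, here is a hypothetical client-side read-modify-write for one buy or sell event (apply_ownership_change is an illustrative name, not an Elasticsearch API). Every ownership change rewrites the car document, and this sketch ignores concurrent updates, which a real setup would have to handle, for example with Elasticsearch's optimistic concurrency control when writing the document back.

```python
def apply_ownership_change(car_doc, person_id, bought):
    """Return an updated copy of a car document after a purchase or sale.

    Keeps the "owners" list and the "owners_cnt" counter in sync, so the
    counter always equals the number of unique owners. Concurrency is ignored.
    """
    owners = set(car_doc.get("owners", []))
    if bought:
        owners.add(person_id)
    else:
        owners.discard(person_id)  # selling a car you don't own is a no-op here
    updated = dict(car_doc)
    updated["owners"] = sorted(owners)
    updated["owners_cnt"] = len(owners)
    return updated


car = {"id": 1, "brand": "Hyundai", "owners": [21, 31, 51], "owners_cnt": 3}
car = apply_ownership_change(car, 61, bought=True)
car = apply_ownership_change(car, 21, bought=False)
print(car["owners"], car["owners_cnt"])  # [31, 51, 61] 3
```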
In that case it makes sense to store the car ids within the person document:
{
"id":1,
"name":"Raj",
"cars":[1,2,3]
}
In this case, we can use the query below to fetch the number of persons who bought the car with id=3:
GET person/_count
{
"query": {
"match": {
"cars": 3
}
}
}
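A small Python sketch of the same count. owners_count_request builds the request body for GET person/_count, and count_owners_locally is an in-memory stand-in that illustrates the matching semantics: a multi-value field such as "cars" matches when any of its values equals the requested id. Both function names and the sample person documents are hypothetical.

```python
import json


def owners_count_request(car_id):
    """Request body for GET person/_count: persons whose "cars" field contains car_id."""
    return {"query": {"match": {"cars": car_id}}}


def count_owners_locally(person_docs, car_id):
    """In-memory equivalent: a multi-value field matches if any of its values matches."""
    return sum(1 for doc in person_docs if car_id in doc.get("cars", []))


# Hypothetical sample person documents.
persons = [
    {"id": 1, "name": "Raj", "cars": [1, 2, 3]},
    {"id": 2, "name": "Ana", "cars": [3]},
    {"id": 3, "name": "Li", "cars": [2]},
]
print(json.dumps(owners_count_request(3)))  # {"query": {"match": {"cars": 3}}}
print(count_owners_locally(persons, 3))  # 2
```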
Again, the model can be improved if you share more context.

Related

JPA entity relationship dependent on other field

My database has a table that represents the fields common to a bunch of other things. So let's say there's one table, Vehicle, and Vehicle has fields like VIN, color, type...
Then there are other tables like Car and Truck, and when the "type" on the Vehicle is "car", we want to look at the Car table to find other properties, maybe fuelEfficiency and numPassengers. When the type is "truck", we want to know things about trucks, like loadCapacity or whatever.
How do you model something like this? Do you embed the Vehicle into all of the other types? Is there a way to do something like a @OneToOne between the tables, conditional on the "type" field?
You could use a @MappedSuperclass or the JOINED inheritance strategy if you want to follow the real ORM way of doing this. But it comes at a cost: these strategies bring other overheads, such as query restrictions and more joins per query.
If you denormalize the tables, you could use a data filter strategy (I think only Hibernate supports this). But then you have everything in a single class, and that could be a bigger problem to deal with.
Since you are using REST, you are probably looking to serve /api/vehicles/1 (please correct me if I'm wrong).
For such scenarios, it is easier to use a single "Vehicle" table that contains the "type". You can configure the Jackson serializer to omit all null fields. So in the case of a car, the vehicle attributes and car parameters will be included, but the null values for truck, bus, etc. will be omitted.
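In Java the usual way to get this behaviour is Jackson's @JsonInclude(JsonInclude.Include.NON_NULL), on the class or via a configured ObjectMapper. The sketch below is a Python analogue of the same idea, not Jackson itself: a single Vehicle record with nullable type-specific fields, serialized without its None values. The field names follow the question; the class, helper, and sample values are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Vehicle:
    """Single-table model: common fields plus nullable type-specific ones."""
    vin: str
    color: str
    type: str
    fuel_efficiency: Optional[float] = None  # car-only
    num_passengers: Optional[int] = None     # car-only
    load_capacity: Optional[float] = None    # truck-only


def to_json_dict(vehicle):
    """Drop None-valued fields, similar in spirit to Jackson's NON_NULL inclusion."""
    return {k: v for k, v in asdict(vehicle).items() if v is not None}


car = Vehicle(vin="VIN123", color="red", type="car",
              fuel_efficiency=6.5, num_passengers=5)
print(to_json_dict(car))
```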

How to model user-configurable schemas in Elasticsearch 6+?

I work at a small SaaS startup.
As with most SaaS software, you create an account and invite others.
We have some entities, like "product", but you can configure the product's fields.
Let's say you work with cars, you can create fields like "Model", "Year", "Weight", etc.
Let's say you work with clothes, you can create fields like "Size", "Gender", etc.
We have this modeled in SQL, but I want a replica in Elasticsearch for general searches and especially for customizable reports.
To model this, I was considering these options:
One Index per Account Entity
When an account is created, I'd create an index named something like "product-" and it would have its own schema.
When the account's admin creates or changes a field, I'd need to use the Update By Query or Reindex API... I don't know.
When an account is deleted, I'd need to delete the indexes.
PRO: each index has a solid schema.
CONS: creating/deleting indexes dynamically sounds scary.
One Index per Entity.
This one seems OK too. I'd put "account_id" on every document and filter on it every time.
Would have only one index per entity: "products", "users", "contacts", "sales" etc.
PRO: way simpler
CONS: each index has multiple schemas, one per account_id.
I'm not sure how to handle the relationships either... Can I create relationships dynamically?
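For the One Index per Entity option, the "filter it every time" part can be centralized in one helper so no query forgets it. A minimal sketch, assuming account_id is mapped as a keyword field; account_scoped_query, the account id "acct-42", and the "Model" field are hypothetical. Putting the term in a bool filter clause means it restricts the results without affecting scoring.

```python
def account_scoped_query(account_id, user_query):
    """Wrap a user query so it only ever matches one account's documents.

    account_id is assumed to be mapped as a keyword field; the filter clause
    runs in filter context, so it does not contribute to relevance scores.
    """
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"account_id": account_id}}],
            }
        }
    }


body = account_scoped_query("acct-42", {"match": {"Model": "Civic"}})
print(body)
```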
One Index to rule them all.
One index "entities", with fields "account_id", "entity_type".
Maybe I need to do this to map the relationships properly. I'm quite confused; I did not fully understand the join field.
Is there anything I'm missing?
Thanks for reading until here :)

Search/retrieve by a large OR query clause with Solr or Elasticsearch

I have a search database of car models: "Nissan Gtr", "Hyundai Elantra", "Honda Accord", etc.
Now I also have a list of users and the types of cars they like:
user1 likes: carId:1234, carId:5678 etc...
Given user1, I would like to return all the cars he likes; it can be anywhere from 0 to hundreds.
What's the best way to model this in Solr or potentially another "nosql" system that can help with this problem?
I'm using Solr but I have the opportunity to use another system if I can and if it makes sense.
EDIT:
The Solr solution is too slow for joins (maybe we can try nested documents). And the current MySQL solution, which uses join tables, has over 2 billion rows.
So you just want to store a mapping between User->Cars and retrieve the cars based on the user... sounds very simple:
Your docs are Users: they contain an id (indexed) and other fields.
One of the fields is 'carsliked', multivalued, which contains the set of car ids the user likes.
You keep details about each car in a different collection, for example.
Given a user id, you retrieve the 'carsliked' field and get the car details with a cross-collection join.
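The answer's approach, sketched in Python with in-memory dicts standing in for the two collections (cars_liked_by and the sample data are hypothetical; the car ids and model names come from the question). In Solr or Elasticsearch the second step would be a query on the car ids, for example an Elasticsearch terms or ids query, rather than a dict lookup.

```python
def cars_liked_by(user_id, users, cars):
    """Two-step lookup: read the user's multivalued 'carsliked' field,
    then fetch the matching car documents from the separate car collection."""
    user = users.get(user_id)
    if user is None:
        return []
    return [cars[car_id] for car_id in user.get("carsliked", []) if car_id in cars]


users = {"user1": {"id": "user1", "carsliked": [1234, 5678]}}
cars = {
    1234: {"id": 1234, "model": "Nissan Gtr"},
    5678: {"id": 5678, "model": "Honda Accord"},
    9999: {"id": 9999, "model": "Hyundai Elantra"},
}
print([c["model"] for c in cars_liked_by("user1", users, cars)])  # ['Nissan Gtr', 'Honda Accord']
```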
You could also use nested objects to store each liked car (with all the info about it) inside each user, but that is a bit more complex. As a plus, you don't need the join in the query.
Solr would allow you to do many more things, for example, given a car, find which users like it. Elasticsearch will work exactly the same way (as will probably many other tools, given how simple your use case seems).

Most efficient way to model data in Elasticsearch

I have an example of modelling an e-commerce site. Say the site has a few hundred shops and a few million products, with 1,000 to 100,000 products per shop. I need to be able to aggregate on both the product and the shop fields. All the products and all the shops have the same schema.
Product
{
"productName"
"price"
"category"
}
Shop
{
"shopName"
"rating"
}
1) Is it more efficient to have a) one index per shop, b) the same index with one type per shop, or c) the same index and type, with a field that identifies the product's shop?
I read some related articles, and most of them favour the same index with one type per shop. But they also say that a single index with a large number of docs might be even slower than multiple indices.
2) I also need to perform JOINs and aggregations between the shops and the products. For example, I need to retrieve all the products from shops with a rating higher than 8/10 and also get the number of products per category. Is it preferable to use a) an application-side JOIN, b) parent-child relationships, c) the Siren plug-in, or d) something else?
I would definitely use a single denormalized index/type for the use cases you've mentioned. If you need more fields for the shop, you can create another index for shops while still keeping the first denormalized index. Note that you may need a unique shop id alongside the shop name.
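A sketch of the denormalized approach for the example in the question (products from shops rated above 8, counted per category), assuming each product document carries copies of its shop's fields under the hypothetical names shop_id, shopName, and shop_rating, and that category is mapped as a keyword field. high_rated_category_request builds a search body with a range query and a terms aggregation; high_rated_category_counts computes the same result in memory over hypothetical sample documents, to show what the aggregation returns.

```python
from collections import Counter


def high_rated_category_request(min_rating=8):
    """Search body: products from shops rated above min_rating, counted per category."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"shop_rating": {"gt": min_rating}}},
        "aggs": {"products_per_category": {"terms": {"field": "category"}}},
    }


def high_rated_category_counts(products, min_rating=8):
    """In-memory equivalent over denormalized product documents."""
    return Counter(p["category"] for p in products if p["shop_rating"] > min_rating)


def product(name, category, shop_id, shop_name, shop_rating):
    """Denormalized product document: shop fields are copied into each product."""
    return {"productName": name, "category": category, "shop_id": shop_id,
            "shopName": shop_name, "shop_rating": shop_rating}


products = [
    product("Kettle", "kitchen", 1, "Shop A", 9),
    product("Toaster", "kitchen", 1, "Shop A", 9),
    product("Lamp", "lighting", 2, "Shop B", 6),
    product("Bulb", "lighting", 3, "Shop C", 8.5),
]
print(high_rated_category_counts(products))  # Counter({'kitchen': 2, 'lighting': 1})
```

The trade-off is the usual one for denormalization: if a shop's rating changes, every product document of that shop has to be updated (for example with Update By Query).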

"Join query" in ElasticSearch

Let's say we have two index types: members and restaurants. Both contain a city attribute.
I want to filter members (e.g. by name) and would like to include a list of restaurant names from each member's hometown/city in the results.
Is it possible to do this with just one ES query? I guess it would be similar to a DB join.
Thanks.
ES doesn't have a concept of joins like a relational database, because it is a search index rather than a relational database. Your best bet is to make two calls: one to get the member documents, then another to get the restaurants.
Unless you have odd circumstances, this should still be very efficient.
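The two-call approach could look like this in Python. members_request and restaurants_request build the request bodies for the first and second call, and attach_restaurants does the join in the application. The function names and sample hits are hypothetical, and the terms query assumes city is mapped as a keyword field so cities match exactly.

```python
def members_request(name):
    """First call: find the members by name."""
    return {"query": {"match": {"name": name}}}


def restaurants_request(cities):
    """Second call: restaurants in any of the members' cities (city assumed keyword)."""
    return {"query": {"terms": {"city": sorted(set(cities))}}}


def attach_restaurants(member_hits, restaurant_hits):
    """Application-side join: add restaurant names to each member by city."""
    by_city = {}
    for restaurant in restaurant_hits:
        by_city.setdefault(restaurant["city"], []).append(restaurant["name"])
    return [{**m, "restaurants": by_city.get(m["city"], [])} for m in member_hits]


# Hypothetical results of the two calls (the _source of each hit).
members = [{"name": "Alice", "city": "Paris"}, {"name": "Bob", "city": "Rome"}]
restaurants = [
    {"name": "Le Bistro", "city": "Paris"},
    {"name": "Trattoria", "city": "Rome"},
    {"name": "Cafe Flore", "city": "Paris"},
]
print(restaurants_request(m["city"] for m in members))
print(attach_restaurants(members, restaurants))
```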
