Search/retrieve by a large OR query clause with Solr or Elasticsearch - elasticsearch

I have a search database of car models: "Nissan Gtr", "Hyundai Elantra", "Honda Accord", etc.
I also have a list of users and the cars each of them likes:
user1 likes: carId:1234, carId:5678, etc.
Given user1, I would like to return all the cars he likes; there can be anywhere from 0 to hundreds.
What is the best way to model this in Solr, or in another "NoSQL" system that can help with this problem?
I'm using Solr, but I have the opportunity to use another system if it makes sense.
EDIT:
The Solr solution is too slow for joins (maybe we can try nested documents). The current MySQL solution, which uses join tables, has over 2 billion rows.

So you just want to store a mapping User -> Cars and retrieve the cars for a given user. That sounds very simple:
Your docs are users: they contain an id (indexed) and other fields.
One of the fields is 'carsliked', multivalued, which contains the set of car ids the user likes.
You keep the details about each car in a different collection, for example.
Given a user id, you retrieve the 'carsliked' field and get the car details with a cross-collection join.
You could also use nested objects to store each liked car (with all the info about it) inside each user, but that is a bit more complex. As a plus, you don't need the join at query time.
Solr would also let you do many more things, for example: given a car, which users like it? Elasticsearch will work exactly the same way (and probably many other tools would too, given how simple your use case seems).
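As a rough sketch of the "fetch the liked ids, then fetch the cars" step, the helpers below build the two request payloads rather than calling a server. The index name "users", the field names "car_id" and "id", and the function names are assumptions made for illustration; the Elasticsearch body uses the terms-lookup form of the terms query, and the Solr string uses the {!terms} query parser.

```python
def terms_lookup_query(user_id, users_index="users", path="carsliked", car_field="car_id"):
    """Build an Elasticsearch query body that matches cars whose `car_field`
    value appears in the `path` field of the given user document.
    Index and field names are hypothetical."""
    return {
        "query": {
            "terms": {
                car_field: {"index": users_index, "id": user_id, "path": path}
            }
        }
    }


def solr_terms_filter(car_ids, field="id"):
    """Build a Solr filter query string with the {!terms} query parser,
    for the case where the liked ids were already read from the user doc."""
    return "{!terms f=%s}%s" % (field, ",".join(str(c) for c in car_ids))


es_body = terms_lookup_query("user1")
solr_fq = solr_terms_filter([1234, 5678])
```

With either form the list of ids travels as one compact clause instead of a long chain of OR terms, which is usually easier on the query parser when a user likes hundreds of cars.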

Related

Inserting Dynamic Data Into Cassandra Column Family

It might not be a relevant title for this question, but if I explain what I mean, it makes sense.
In order to learn how Cassandra works, I have the following scenario :
Consider I have an online store with lots of different products such as cars, smartphones, clothes etc. in which every product has its own specs.
I need some examples of how to model my Products column family.
It should be mentioned that I need to filter them by specs, something like:
SELECT * FROM Products WHERE Ram > 3;
Cassandra data modeling suggests one table for each query.
Now, if you want to query on specs, you need to make specs one of the keys. Using that table you can run a query like this:
SELECT * from products where specs = 'RAM';
If you want to filter on brand, you need to keep brand_name as one of the columns and your query would look like this:
SELECT * from products where brand_name = 'brandname';
So what we are looking at here is that we need to make 'RAM' one of the columns.
You would need to do some data wrangling to create the data for each spec. For example, say you have data with product_id, spec, inventory. You would need to analyze it and decide whether to create a column for each spec or to group several specs into new columns. The final data may look like:
product_id, ram; product_id, hdd; etc. You get the idea.
If you have a lot of specs, then maybe you need to create a separate table for each kind of product and design the data model from there.
I'd suggest you take a data modeling course for a better understanding of Cassandra data modeling.
I'm not quite sure if this is the best answer. However, after some research I found THIS.
This way I can store a product's specs in a column of type MAP<TEXT, TEXT>, then create an INDEX on it.
Then I can query it like the following :
SELECT * FROM Products WHERE Specs['Ram'] > '3GB';
Here, Ram is a key in the MAP and 3GB is its value.
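One thing worth checking with a MAP<TEXT, TEXT> column is that the values are text, so any ordering comparison on them is a string comparison. The sketch below is plain Python, not CQL, and the product rows and helper names are made up. It only illustrates how comparing '16GB' with '3GB' as strings differs from comparing the parsed numbers.

```python
import re


def parse_gb(value):
    """Parse a size string such as '3GB' into an integer number of gigabytes."""
    match = re.fullmatch(r"\s*(\d+)\s*GB\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognised size: %r" % value)
    return int(match.group(1))


# Hypothetical rows: each product carries its specs as a text-to-text map.
products = [
    {"name": "phone-a", "specs": {"Ram": "2GB"}},
    {"name": "phone-b", "specs": {"Ram": "4GB"}},
    {"name": "laptop-c", "specs": {"Ram": "16GB"}},
]

# String comparison: '16GB' sorts before '3GB' because '1' < '3'.
text_matches = [p["name"] for p in products if p["specs"]["Ram"] > "3GB"]

# Numeric comparison after parsing the value.
numeric_matches = [
    p["name"] for p in products if parse_gb(p["specs"]["Ram"]) > parse_gb("3GB")
]
```

If range filters on a spec matter, storing that spec as a numeric column (or a map with numeric values) avoids the mismatch.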

How to search for related data using Plastic/Elasticsearch?

I'm using Laravel 5, Elasticsearch 5 and Plastic in an e-commerce-like project. Through the use of Elasticsearch and Plastic I'm able to create indexed data, paired with a powerful search tool, while still working with Eloquent models. In this way, I'm expecting to be able to make what we could consider extremely heavy queries (with lots of JOIN and LIKE) simple and fast. I'm expecting 63 million queries a day...
That said, I'm having an issue with the following situation:
I have a table Department and a table Employee, they are related. I'm
searching for Departments and using a keyword that may only exist in
some column from Employee's table, is it possible to get all
Departments where there are Employees with that keyword?
Example:
When searching Departments with keyword 'Xbox', the system should be able to provide all departments with Employees that mention 'Xbox' in their profile.
After experimenting around, the solution was found. You must:
Relate the models in Laravel (check Eloquent Relationships at Laravel Documentation)
Make the appropriate mapping in the model, using the buildDocument() function
Make it searchable (as per example)
Rebuild your mappings
Regenerate your indexes
And that's it.
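The steps above presumably end up indexing each department together with its related employees, so that a keyword found only in an employee profile still matches the department. Plastic itself is PHP; the sketch below uses Python only to show the shape of such a denormalized document and of a matching query body. The field names ("employees", "profile", "department_id") and both function names are assumptions, not Plastic's API.

```python
def build_department_document(department, employees):
    """Return the document to index for one department, with its employees
    embedded so their text becomes searchable on the department itself."""
    return {
        "id": department["id"],
        "name": department["name"],
        "employees": [
            {"id": e["id"], "profile": e["profile"]}
            for e in employees
            if e["department_id"] == department["id"]
        ],
    }


def department_search_body(keyword):
    """Elasticsearch match query against the embedded employee profiles."""
    return {"query": {"match": {"employees.profile": keyword}}}


doc = build_department_document(
    {"id": 1, "name": "Games"},
    [
        {"id": 7, "department_id": 1, "profile": "Xbox and PC titles"},
        {"id": 8, "department_id": 2, "profile": "Accounting"},
    ],
)
body = department_search_body("Xbox")
```

The trade-off is that a department document has to be reindexed whenever one of its employees changes.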

Elasticsearch the best way to design multiple one to many and many to many

I have two scenarios that I want to support, but I don’t know the best way to design relations in Elasticsearch. I read the entire Elasticsearch documentation but couldn’t find the best way to design types for my scenarios.
Multiple one to many.
Let’s assume that I have the following tables in my relational database that I want to transfer to Elasticsearch:
Transaction table: Id, User1Id, User2Id, …
User table: Id, Name
Transaction contains two references to User. As far as I know, I cannot use a parent->child relation with two parents. I need to store transactions and users in separate types because they can be changed separately. I need to be able to search transactions by user details and return the users connected with the transactions. Any idea how to design such a structure in Elasticsearch?
Many to many
Let’s assume that we have the following tables:
Order table: Id, …
OrderLine table: OrderId, UserId, Amount, …
User table: Id, Name
An order line is always saved with its order, so I thought I could store the order with its order lines as nested objects, but the user must be in a separate table. Is there any way to connect the multiple users from the order lines with the user type? I assume I can use an application-side join, but I always need to retrieve the order and its order lines together and be able to search orders by user data.
I could use grandparent and grandchild relations, but then I would need to do the joins in the application. Any idea how to design this in the best way?

"Join query" in ElasticSearch

Let's say we have two index types: members and restaurants. Both contain a city attribute.
I want to filter members (e.g. by name) and would like to include a list of restaurant names from each member's hometown/city in the results.
Is it possible to do this using just one ES query? I guess it should be similar to DB join.
Thanks.
ES doesn't have the concept of joins. This is due to it being an index rather than a relational database. Your best bet is to make two calls: one to get the members' documents, then another to get the restaurants.
Unless you have odd circumstances, this should still be very efficient.
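The two-call approach amounts to an application-side join. In the sketch below, two in-memory lists stand in for the members and restaurants indices, and the function names are hypothetical. In a real setup the first function would be a name query on members and the second a single terms query on city.

```python
def search_members(members, name):
    """Stand-in for the first ES call: filter members by name."""
    return [m for m in members if name.lower() in m["name"].lower()]


def restaurants_by_city(restaurants, cities):
    """Stand-in for the second ES call: one lookup covering all needed cities."""
    grouped = {city: [] for city in cities}
    for r in restaurants:
        if r["city"] in grouped:
            grouped[r["city"]].append(r["name"])
    return grouped


def members_with_restaurants(members, restaurants, name):
    """Join the two result sets in the application, keyed on city."""
    hits = search_members(members, name)
    by_city = restaurants_by_city(restaurants, {m["city"] for m in hits})
    return [dict(m, restaurants=by_city[m["city"]]) for m in hits]


members = [
    {"name": "Ann Lee", "city": "Oslo"},
    {"name": "Bob Ray", "city": "Rome"},
]
restaurants = [
    {"name": "Fjord", "city": "Oslo"},
    {"name": "Trattoria", "city": "Rome"},
    {"name": "Nordlys", "city": "Oslo"},
]
result = members_with_restaurants(members, restaurants, "ann")
```

Collecting the distinct cities first keeps this at exactly two round trips, regardless of how many members match.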

How are application like twitter implemented?

Suppose A follows 100 people;
that would then need 100 join statements,
which I think is horrible for the database.
Or are there other ways?
Why would you need 100 Joins?
You would have a simple table "Follows" with your ID and the other person's ID in it...
Then you retrieve the "Tweets" by joining something like this:
SELECT TOP 100
    tweet.*
FROM
    tweet
INNER JOIN
    followers ON followers.FollowerID = tweet.AuthorID
WHERE
    followers.MasterID = yourID
Now you just need decent caching, and make sure you use a non-locking query, and you have all the information... (Well, maybe add some user data into the mix.)
Edit:
tweet
    ID - tweetid
    AuthorID - ID of the poster
Followers
    MasterID - (Basically your ID)
    FollowerID - (ID of the person following you)
The Followers table has a composite key based on MasterID and FollowerID.
It should have 2 indexes: one on (MasterID, FollowerID) and one on (FollowerID, MasterID).
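A runnable approximation of this schema and the single-join query, using SQLite from Python's standard library. Several details are assumptions: the Body column and the sample rows are invented, LIMIT replaces TOP because SQLite has no TOP, and the join follows the query's reading of the columns (MasterID is you, FollowerID is the account whose tweets you see).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (
    ID INTEGER PRIMARY KEY,
    AuthorID INTEGER NOT NULL,
    Body TEXT
);
CREATE TABLE followers (
    MasterID INTEGER NOT NULL,
    FollowerID INTEGER NOT NULL,
    PRIMARY KEY (MasterID, FollowerID)   -- index on (MasterID, FollowerID)
);
-- Second index for lookups in the other direction.
CREATE INDEX followers_by_follower ON followers (FollowerID, MasterID);

INSERT INTO followers VALUES (1, 2), (1, 3);
INSERT INTO tweet VALUES (10, 2, 'from 2'), (11, 3, 'from 3'), (12, 4, 'from 4');
""")


def timeline(connection, your_id, limit=100):
    """One join, however many accounts the user is linked to."""
    rows = connection.execute(
        """
        SELECT tweet.ID, tweet.AuthorID, tweet.Body
        FROM tweet
        INNER JOIN followers ON followers.FollowerID = tweet.AuthorID
        WHERE followers.MasterID = ?
        ORDER BY tweet.ID DESC
        LIMIT ?
        """,
        (your_id, limit),
    )
    return rows.fetchall()


rows_for_user_1 = timeline(conn, 1)
```

The number of joins stays at one whether the user is linked to 2 accounts or 100; only the number of matching rows in followers grows.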
The real trick is to minimize your database usage (e.g., cache, cache, cache) and to understand usage patterns. In the specific case of Twitter, they use a bunch of different techniques, from queuing and an insane amount of in-memory caching to some really clever data flow optimizations. Give "Scaling Twitter: Making Twitter 10000 percent faster" and the other associated articles a read. The answer to your question about how to implement "following" is to denormalize the data (precalculate and maintain join tables instead of performing joins on the fly) or not to use a database at all. <-- Make sure to read this!
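One way to picture "precalculate and maintain join tables" is fan-out on write: when someone posts, the tweet id is pushed onto a stored timeline for each follower, so reading a timeline needs no join. The sketch below is a minimal in-memory illustration of that idea under assumed names (TimelineStore and its methods); it is not a description of how Twitter actually implements it.

```python
from collections import defaultdict, deque


class TimelineStore:
    """Keeps a bounded, precomputed timeline per user (newest first)."""

    def __init__(self, max_len=100):
        # author id -> ids of the accounts following that author
        self.followers = defaultdict(set)
        # user id -> most recent tweet ids from the accounts they follow
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def post(self, author_id, tweet_id):
        # Fan out on write: the work happens once, at posting time.
        for follower_id in self.followers[author_id]:
            self.timelines[follower_id].appendleft(tweet_id)

    def timeline(self, user_id):
        # Reading is a plain lookup, with no join.
        return list(self.timelines[user_id])


store = TimelineStore(max_len=2)
store.follow(1, 2)
store.follow(1, 3)
store.post(2, "t1")
store.post(3, "t2")
store.post(2, "t3")
```

Reads become cheap at the cost of more work per post, which tends to suit workloads where timelines are read far more often than tweets are written.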

Resources