Elasticsearch custom score function

Let's say I have the following tables:
products: id, name, latitude, longitude
users: id, name, latitude, longitude
interests: id, name
user_interests: user_id, interest_id
These get inserted into or updated in Elasticsearch, using model observers, whenever they are created or edited.
Now I want to write a custom script/scoring function in Elasticsearch that returns a matching score when a user searches for a product. The score will be based on the distance between the user's location and the product, the user's interests, how well the product name matches, and so on.
Being new to Elasticsearch, what's the right approach to implementing this? Any tutorials, online resources, or examples are highly appreciated.
PS: I am using PostgreSQL as a database. I can create a function there called get_match(product_id, user_id) which returns a number (0-100) based on matching criteria and do something like:
-- pseudo SQL: rank every product for the searching user ($1 = that user's id)
select p.name, get_match(p.id, u.id) as match
from products p
join users u on u.id = $1
order by match desc
I want to achieve similar functionality in Elasticsearch, if possible.
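One way to express a score like get_match inside Elasticsearch is a function_score query: a normal text query supplies the base relevance, and scoring functions adjust it. The sketch below only builds the request body. It assumes a hypothetical product mapping with a text field name and a geo_point field location (built from the latitude/longitude columns), and it approximates the interest factor by matching the user's interest names against the product name. The 10km scale and weight of 2 are illustrative values to tune, not recommendations.

```javascript
// Sketch: build an Elasticsearch function_score request body that ranks
// products for one user. Assumed (hypothetical) product mapping:
//   name: text, location: geo_point
// user: { lat, lon, interestNames: [...] } (hypothetical shape)
function buildProductMatchQuery(searchText, user) {
  return {
    query: {
      function_score: {
        // Base relevance: full-text match of the search text on the product name
        query: { match: { name: searchText } },
        functions: [
          {
            // Distance factor: 1.0 at the user's location, decaying to 0.5
            // (the default decay) at `scale` away from the user
            gauss: {
              location: {
                origin: { lat: user.lat, lon: user.lon },
                scale: "10km"
              }
            }
          },
          {
            // Interest factor: extra weight when the product name matches
            // any of the user's interest names
            filter: { match: { name: user.interestNames.join(" ") } },
            weight: 2
          }
        ],
        score_mode: "sum",     // add the function scores together
        boost_mode: "multiply" // multiply that sum by the text-match score
      }
    }
  };
}
```

You would send the resulting body to the _search endpoint of whatever index holds the products. If some part of the score cannot be expressed with the built-in functions, function_score also accepts a script_score function written in Painless, which is the closest equivalent to a hand-written get_match.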

Related

Search/retrieve by a large OR query clause with Solr or Elasticsearch

I have a search database of car models: "Nissan Gtr", "Hyundai Elantra", "Honda Accord", etc.
I also have a list of users and the cars they like:
user1 likes: carId:1234, carId:5678, etc.
Given user1, I would like to return all the cars they like; this can be anywhere from zero to hundreds.
What is the best way to model this in Solr, or potentially in another "NoSQL" system that can help with this problem?
I'm using Solr but I have the opportunity to use another system if I can and if it makes sense.
EDIT:
The Solr solution is too slow for joins (maybe we can try nested documents). The current MySQL solution, which uses join tables, has over 2 billion rows.
So you just want to store a mapping from users to cars, and retrieve the cars for a given user. That sounds very simple:
Your documents are users: each contains an id (indexed) and other fields.
One of the fields is 'carsliked', a multivalued field containing the set of car ids the user likes.
You keep the details of each car in a different collection, for example.
Given a user id, you retrieve the 'carsliked' field and get the car details with a cross-collection join.
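The cross-collection join in the steps above can be written with Solr's join query parser and its fromIndex parameter. This sketch only builds the query parameters; the collection names users and cars, and the use of id as the car key, are assumptions for illustration.

```javascript
// Sketch: query parameters for a Solr cross-collection join that returns the
// cars liked by one user. Hypothetical names: a "users" collection whose docs
// have "id" and a multivalued "carsliked" field, and a "cars" collection whose
// "id" field holds the same car ids.
function buildLikedCarsParams(userId) {
  const params = new URLSearchParams();
  // Run id:<userId> against the users collection, collect the values of its
  // "carsliked" field, and return cars whose "id" is one of those values.
  params.set("q", `{!join from=carsliked to=id fromIndex=users}id:${userId}`);
  // A user may like hundreds of cars, so ask for more than the default 10 rows
  params.set("rows", "1000");
  return params;
}
```

These parameters would go to the cars collection's /select handler. In SolrCloud the fromIndex form has placement restrictions (typically a single-shard "from" collection co-located with the "to" collection), so check the join query parser documentation for your Solr version.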
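In Elasticsearch, the closest counterpart to that lookup is the terms query's "terms lookup" form, which reads the term values from a field of another document. A sketch of the request body, assuming a users index whose documents hold a carsliked array and a cars index with a keyword field car_id (both names hypothetical):

```javascript
// Sketch: an Elasticsearch terms lookup query body. Elasticsearch fetches the
// user document and uses the values at `path` as the terms to match.
// Hypothetical names: "users" index with a "carsliked" array, and a "cars"
// index with a keyword field "car_id" holding the same ids.
function buildLikedCarsQuery(userId) {
  return {
    size: 1000, // a user may like hundreds of cars; the default size is 10
    query: {
      terms: {
        car_id: {
          index: "users",   // index holding the user document
          id: userId,       // _id of the user document to read
          path: "carsliked" // field in that document containing the car ids
        }
      }
    }
  };
}
```

This body would be sent to the cars index's _search endpoint. The terms query is capped by the index.max_terms_count setting (65,536 by default), which is far above the "zero to hundreds" described in the question.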

RethinkDB query OrderBy distances between central point and subtable of locations

I'm fairly new to RethinkDB and am trying to solve a thorny problem.
I have a database that currently consists of two kinds of accounts, customers and technicians. I want to write a query that will produce a table of technicians, ordered by their distance to a given customer. The technician and customer accounts each have coordinate location attributes, and the technicians have a service area attribute in the form of a roughly circular polygon of coordinates.
For example, a query that returns a table of technicians whose service area overlaps the location of a given customer looks like this:
r.db('database').table('Account')
.filter(r.row('location')('coverage').intersects(r.db('database')
.table('Account').get("6aab8bbc-a49f-4a9d-80cc-88c95d0bae8d")
.getField('location').getField('point')))
From here I want to order the resulting subtable of technicians by their distance to the customer they're overlapping.
It's hard to work on this without a sample dataset to play around with, so I'm using my imagination.
Your Account table stores both customers and technicians.
Technician documents have a location.coverage field.
Using intersects, you can return a list of technicians whose coverage area includes the customer's location.
To order them, we can pass a function to the orderBy command. For each technician, we take their point field, compute its distance to the customer's point with the distance command, and return that number as the sort key.
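// Look up the customer's point once, then reuse it for filtering and sorting
r.db('database').table('Account')
  .get("6aab8bbc-a49f-4a9d-80cc-88c95d0bae8d")('location')('point')
  .do(function(customerPoint) {
    return r.db('database').table('Account')
      // technicians whose coverage polygon contains the customer's point
      .filter(function(account) {
        return account('location')('coverage').intersects(customerPoint);
      })
      // nearest technician first (distance() returns meters by default)
      .orderBy(function(technician) {
        return technician('location')('point').distance(customerPoint);
      });
  })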
I hope it helps. If not, let's post some sample data here and we can try to figure it out together.
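When posting sample data, it can help to know what ordering to expect. The sketch below reproduces the filter-then-order logic in plain JavaScript over an in-memory array. It is an approximation, not RethinkDB's implementation: points are assumed to be [longitude, latitude] pairs (the order r.point uses), coverage is assumed to be an array of such pairs, containment uses a planar ray-casting test, and distance uses the haversine formula on a sphere, whereas RethinkDB's distance() works on an ellipsoid, so exact values will differ slightly.

```javascript
// Sketch: in-memory equivalent of "technicians covering the customer,
// nearest first". Points are [lon, lat]; coverage is an array of [lon, lat].
const EARTH_RADIUS_M = 6371000; // mean Earth radius (spherical approximation)

function toRad(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle (haversine) distance in meters between two [lon, lat] points
function haversine([lon1, lat1], [lon2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Planar ray casting: toggle "inside" for each polygon edge crossed by a ray
// going east from the point
function polygonContains(polygon, [x, y]) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    if ((yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// Accounts with a coverage polygon containing the customer's point, nearest first
function techniciansByDistance(accounts, customerPoint) {
  return accounts
    .filter((a) => a.location.coverage && polygonContains(a.location.coverage, customerPoint))
    .sort(
      (a, b) =>
        haversine(a.location.point, customerPoint) - haversine(b.location.point, customerPoint)
    );
}
```

Accounts without a coverage field (the customers) are skipped, which mirrors how the ReQL filter drops rows where that field is missing.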

How to model this MySQL table relationship in Elasticsearch

I have a large number of shop items imported into Elasticsearch, and I can query them.
I am wondering how best to model the following MySQL table relationship in Elasticsearch:
Shop items can have different offers, and there are different offer types. In some shops an item may be on offer; in other shops the same item may not be on offer or may have a different offer type. Items don't have to have offers. I model this as follows:
Items table
item_id
Offers table
shop_id, item_id, offer_type, user_id
Where user_id is the id of the user who created the offer.
As an example, take item_id 1, shop_ids 1 and 2, and offer_types premium and featured.
Then the offers table could look like:
shop_id, item_id, offer_type, user_id
1,1,featured,45
2,1,premium,33
2,1,featured,45
But it's not the case that every item is on offer. And even if item_id 1 is on offer in shops 1 and 2, it might not be on offer in other shops.
I want to be able to query my /items type, always for one shop at a time. For that shop I want to get all the items in, e.g., a certain price range and of a certain category (which I can already do), but I also need to know, for each item in the results, which offer it has, if any (e.g. featured, premium, or whatever the offer_type is).
How can I best model this behaviour in Elasticsearch?
One approach is a nested object relationship: a shop document contains its set of items, and the document id is your shop id.
For your case:
1) Get all items of a shop: GET http://host/your_index/shops_type/shopid
This will give you all the items in that shop along with their offer_type; you can then filter them in your program logic.
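To make the shop-centric nested model concrete, here is a sketch. The mapping is typeless (Elasticsearch 7 and later), whereas the URL above uses an older mapping type (shops_type). The category and price fields, and the sample values in the usage check, are hypothetical. Because Elasticsearch fields can hold arrays, an item with two offers in the same shop (item 1 in shop 2 in the question) can store both offer types. The filtering function is the "program logic" step applied to the _source of the fetched shop document.

```javascript
// Sketch: mapping for shop documents whose items are nested objects.
// category and price are hypothetical fields.
const shopsMapping = {
  mappings: {
    properties: {
      items: {
        type: "nested",
        properties: {
          item_id: { type: "keyword" },
          category: { type: "keyword" },
          price: { type: "double" },
          // absent when the item has no offer in this shop; may be an array
          offer_type: { type: "keyword" }
        }
      }
    }
  }
};

// Given the _source of one shop document, keep items in a price range and
// category, and report each item's offer types ([] when it has none).
function itemsForShop(shopSource, { minPrice, maxPrice, category }) {
  return shopSource.items
    .filter((it) => it.price >= minPrice && it.price <= maxPrice && it.category === category)
    .map((it) => ({ item_id: it.item_id, offer_types: [].concat(it.offer_type || []) }));
}
```

If you would rather not filter in application code, a nested query with inner_hits can return only the matching items of a shop from Elasticsearch itself.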

Grab all documents whose IDs are not present in a separate table

I have a users table and a tasks table. In tasks, each document consists of a user_id field (which is an ID from the users table) and some other, irrelevant fields.
I would like to filter users by some criteria (.filter({'field': 'value'})), then get only those users whose id does NOT appear in the tasks table (in its user_id field).
I started the query with r.table('users').filter({'field': 'value'}), but I don't know how to express "this user must not be found in the tasks table".
You can nest ReQL expressions, including a query against another table, inside a method such as filter.
You will want a secondary index here to make this fast.
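// Create a secondary index on tasks.user_id (once) and wait for it; conn is an open connection
r.table('tasks').indexCreate('user_id').run(conn)
r.table('tasks').indexWait('user_id').run(conn)
// Keep only matching users who have no task referencing their id
r.table('users').filter({'field': 'value'}).filter(function(user) {
  return r.table('tasks').getAll(user('id'), {index: 'user_id'}).isEmpty()
}).run(conn)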
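The query above is an anti-join: keep users for whom the lookup into tasks comes back empty. For checking its output against a small sample, the same logic over in-memory arrays looks like this. It is a sketch, not ReQL: the object criteria are compared with strict equality, which matches ReQL's object filter only for primitive values.

```javascript
// Sketch: in-memory version of "users matching criteria that have no tasks".
function usersWithoutTasks(users, tasks, criteria) {
  // Set of user ids referenced by at least one task (plays the role of the index)
  const busy = new Set(tasks.map((t) => t.user_id));
  return users.filter(
    (u) =>
      // like .filter({field: 'value'}), for primitive values only
      Object.entries(criteria).every(([k, v]) => u[k] === v) &&
      // like getAll(user('id'), {index: 'user_id'}).isEmpty()
      !busy.has(u.id)
  );
}
```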

Elastic search query scenario

I am building an application which requires a location based search of hotels.
I have three main classes:
class Hotel {
String name
String latitude
String longitude
}
class HotelResource {
Hotel hotel
String name
}
class HotelResourceAvailability{
HotelResource resource
}
HotelResourceAvailability - holds the availability data of a hotel resource.
The query scenario:
As a user, I want to search for all the hotels in a particular location that have at least one hotel resource available,
and get the count of available resources for each of those hotels.
Note: hotels matching the location criteria but without any available resource should be filtered out.
I am new to Elasticsearch and am finding it difficult to decide on an approach; any pointers would be really appreciated.
Three points to get you started:
I would keep your data model as flat as possible: Elasticsearch isn't relational, so you can't easily join from one object to another.
Latitude and longitude can be stored in a geo_point field; you can then use geo queries (such as geo_distance) and sort by distance to find the nearest matching hotels.
Is hotel availability based on dates? If so, I would use nested documents or a parent-child relationship.
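Putting these three points together, the whole scenario can be answered by one search request. This sketch only builds the request body and rests on assumptions: one document per hotel with a keyword hotel_id, a geo_point location, and a nested resources array whose entries carry a boolean available field. All of these names are hypothetical, and if availability depends on dates, the boolean would be replaced by a date-based condition.

```javascript
// Sketch: hotels near a point that have at least one available resource,
// plus a per-hotel count of available resources. Hypothetical mapping:
//   hotel_id: keyword, location: geo_point,
//   resources: nested { name, available: boolean }
function buildHotelSearch(lat, lon, radius) {
  return {
    size: 0, // only the aggregation results are used here
    query: {
      bool: {
        filter: [
          // hotels within `radius` (e.g. "5km") of the given point
          { geo_distance: { distance: radius, location: { lat, lon } } },
          // ...that have at least one available resource
          {
            nested: {
              path: "resources",
              query: { term: { "resources.available": true } }
            }
          }
        ]
      }
    },
    aggs: {
      hotels: {
        terms: { field: "hotel_id", size: 100 }, // one bucket per matching hotel
        aggs: {
          resources: {
            nested: { path: "resources" },
            aggs: {
              // doc_count of this bucket = number of available resources
              available: { filter: { term: { "resources.available": true } } }
            }
          }
        }
      }
    }
  };
}
```

In the response, each entry of aggregations.hotels.buckets has the hotel_id as its key and the count at resources.available.doc_count. Hotels without any available resource never reach the aggregation because the nested filter excludes them.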
