Use a many to many relation in Elasticsearch - elasticsearch

We are struggling to design a mapping in Elasticsearch that lets us query relational data, a problem we have not been able to solve with our SQL-shaped, non-document-oriented way of thinking.
We want a many-to-many relation between different Elasticsearch entries, so that we can edit an entry once and have every place that uses it pick up the change.
To describe the problem, we'll use the following simple data model:
Broadcast       Content
------------    -----------
Id              Id
Day             Title
Contents []     Description
So we have two different types to index, broadcasts and contents.
A broadcast can have many contents and single contents could also be part of different broadcasts (e.g. repetition).
JSON like:
index/broadcasts
{
"Id": "B1",
"Day": "2014-10-15",
"Contents": [
"C1",
"C2"
]
}
{
"Id": "B2",
"Day": "2014-10-16",
"Contents": [
"C1",
"C3"
]
}
index/contents
{
"Id": "C1",
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
}
{
"Id": "C2",
"Title": "Have a break!",
"Description": "Everything about Android"
}
{
"Id": "C3",
"Title": "Late Night Disaster",
"Description": "Comedy show"
}
Now we want to rename "Late Night Disaster" to something more precise and keep all references up to date.
How could we approach this? Are there further options in ES, like includes in RavenDB?
Nested objects and parent-child relations haven't helped us so far.

What about denormalizing? It seems awkward coming from the SQL mindset, but give it a try. Even with millions of documents, Lucene indexing copes well, and a rename becomes a batch job that rewrites every broadcast embedding the content.
[
{
"Id": "B1",
"Day": "2014-10-15",
"Contents": [
{
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
},
{
"Title": "Have a break!",
"Description": "Everything about Android"
}
]
},
{
"Id": "B2",
"Day": "2014-10-16",
"Contents": [
{
"Title": "Wake up",
"Description": "Morning show with Jeff Bridges"
},
{
"Title": "Late Night Disaster",
"Description": "Comedy show"
}
]
}
]
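The rename-as-batch-job idea can be sketched in plain Python. This is a minimal in-memory illustration, not an Elasticsearch client call: the function name `rename_content_title` and the new title "Late Night Comedy" are made up for the example, and in a real index the changed documents would still have to be re-indexed (for instance through a bulk request).

```python
from typing import Any


def rename_content_title(
    broadcasts: list[dict[str, Any]], old_title: str, new_title: str
) -> list[str]:
    """Rename a content title in every broadcast that embeds it.

    Returns the Ids of the broadcasts that changed, i.e. the documents
    that would have to be re-indexed.
    """
    changed_ids = []
    for broadcast in broadcasts:
        touched = False
        for content in broadcast.get("Contents", []):
            if content.get("Title") == old_title:
                content["Title"] = new_title
                touched = True
        if touched:
            changed_ids.append(broadcast["Id"])
    return changed_ids


# Same denormalized documents as in the answer above.
broadcasts = [
    {"Id": "B1", "Day": "2014-10-15", "Contents": [
        {"Title": "Wake up", "Description": "Morning show with Jeff Bridges"},
        {"Title": "Have a break!", "Description": "Everything about Android"},
    ]},
    {"Id": "B2", "Day": "2014-10-16", "Contents": [
        {"Title": "Wake up", "Description": "Morning show with Jeff Bridges"},
        {"Title": "Late Night Disaster", "Description": "Comedy show"},
    ]},
]

# "Late Night Comedy" is a hypothetical new title.
changed = rename_content_title(broadcasts, "Late Night Disaster", "Late Night Comedy")
print(changed)
```

Note that the denormalized documents no longer carry the content Id, so the sketch has to match on the title. Keeping the Id inside each embedded content would make the batch job more robust.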

Related

How to use elasticsearch for smart(simple) searching on marketplace?

I'm still learning Elasticsearch and a lot of things are unclear to me, including this example:
Suppose I have a marketplace like Amazon (many products with many sub-options and availability by city), and I want to use Elasticsearch to search by a single text string.
For example, I want to search "lord of the rings compilation in dublin" and Elasticsearch should return only book compilations of Lord of the Rings that are available in Dublin.
I can put documents with any schema into Elasticsearch (it is used only for searching).
So this is my current schema (data compiled from the production database):
[
{
"name": "lord of rings",
"seller": "Home Production",
"availability": [
{
"city": "dublin",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "london",
"category": "book",
"types": [
"one book",
"compilation"
]
}
]
},
{
"name": "lord of rings",
"seller": "Some",
"availability": [
{
"city": "dublin",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "london",
"category": "book",
"types": [
"one book",
"compilation"
]
},
{
"city": "dublin",
"category": "dvd",
"types": [
"disk"
]
}
]
}
]
This is a very abstract example. We can format the data schema in any way for ease of searching. The search city is always known (it is not part of the text query).
The difficulty is that one seller, for one product, has many cities of availability, and in each city we know the availability "options" (for example, a single book or a whole collection).
I don't know how to describe it in more detail or how to search for it on Google.
I tried multi_match, but it gives wrong answers if I search for 'lord of rings dvd in dublin'.
It ranks the first document as more relevant, although in fact the second document is the correct answer.
Relevance issues are not easy to solve. Sometimes you get better results if you boost the city field in the multi_match query.
Beyond that, you will need to study your scenario more closely to make the documents you want rank higher.
I recommend the book Relevant Search, which helps a lot in understanding why some results are not as relevant as you expect.
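As a rough illustration of the boosting suggestion, the request body could be built like this. It is only a sketch: the helper `build_search_body`, the choice of fields, and the boost value of 3 are assumptions, while the `field^boost` notation is the standard way to weight a field in a multi_match query.

```python
import json


def build_search_body(text: str, city: str, city_boost: float = 3.0) -> dict:
    """Build a multi_match request body that weights the city field higher."""
    return {
        "query": {
            "multi_match": {
                "query": f"{text} {city}",
                "fields": [
                    "name",
                    "availability.category",
                    "availability.types",
                    # field^N multiplies the score contribution of that field
                    f"availability.city^{city_boost}",
                ],
            }
        }
    }


body = build_search_body("lord of rings dvd", "dublin")
print(json.dumps(body, indent=2))
```

A boost only changes the ranking. Because `availability` is an array of plain objects, a city from one entry and a category from another can still match together, so if that pairing matters, a nested mapping for `availability` is probably worth evaluating as well.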

Delete existing Records if they are not in sent array Rails 5 API

I need help deleting records that exist in the DB but are not in the array sent in a request.
My Array:
[
{ "id": "509",
"name": "Motions move great",
"body": "",
"subtopics": [
{
"title": "Tywan",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportations Gracious",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportation part",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
},
{
"name": "Motions kkk",
"body": "",
"subtopics": [
{
"title": "Transportations",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
}
]
Below is my implementation. Where am I going wrong?
@topics = @course.topics.map{|m| m.id()}
@delete= @topics
puts @delete
if Topic.where.not('id IN(?)', @topics).any?
  @topics.each do |topic|
    topic.destroy
  end
end
It's not clear to me where in your code you pick up the ids sent in the array you showed, so I'm assuming something like this:
objects_sent = [
{ "id": "509",
"name": "Motions move great",
"body": "",
"subtopics": [
{
"title": "Tywan",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportations Gracious",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
},
{
"title": "Transportation part",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
},
{
"name": "Motions kkk",
"body": "",
"subtopics": [
{
"title": "Transportations",
"url_path": "https://ugonline.s3.amazonaws.com/resources/6ca0fd64-8214-4788-8967-b650722ac97f/WhatsApp+Audio+2021-09-24+at+13.57.34.mpeg"
}
]
}
]
Since your array looks like this, the only information you need for the database query is the ids (assuming the ids in the array are the ids in the database, otherwise it wouldn't make sense). You can get them like this:
sent_ids = objects_sent.map{|o| o['id'].to_i}
It also seems, from the code you showed, that you want to destroy them within a specific course. There are two ways to do that. First, using the association (the one I prefer):
@course.topics.where.not(id: sent_ids).destroy_all
Or you can query the Topic model directly, passing the course_id:
Topic.where(course_id: @course.id).where.not(id: sent_ids).destroy_all
ActiveRecord builds the query correctly either way. Test both and see which works better for you.

ElasticSearch - how to edit a field inside an array of a document

I have a document in our ElasticSearch index which looks like this:
{
"_index": "nm_doc",
"_type": "nm_doc",
"_id": "JRPXqmQBatyecf67YEfq",
"_score": 0.86147696,
"_source": {
"text": "A 29-year-old IT professional from Bhopal was convicted and sentenced to life imprisonment by an Additional Sessions Court in Pune on Wednesday for the rape and brutal murder of a woman in 2008, after she had refused his advances. Watch What Else is Making News The court found Manu Mohinder Ebrol, who worked in the same firm as the girl, of raping and killing the woman after stabbing her 18 times on the night of October 20, 2008, in her rented apartment. After committing the crime, Ebrol had fled to Bhopal. He was arrested later by Pune Police. The prosecution examined 26 witnesses for the case and forensic evidence such as call details and medical records also proved crucial. For all the latest Pune News , download Indian Express App",
"entities": [
{
"name": "Mohinder Ebrol"
},
{
"name": "Sessions Court"
},
{
"name": "Pune Police"
},
{
"name": "Pune News"
},
{
"name": "Indian Express"
}
]
}
}
If I wanted to edit just the first name in that array (Mohinder Ebrol) to be Manu Ebrol, how would I accomplish this via API call? Do I need to pass in the entire array to update the one name?
I have figured it out via the documentation:
The call URL is:
POST http://elastichost:9200/indexname/_doc/JRPXqmQBatyecf67YEfq/_update?pretty
And the body simply looks like this (yes, you do have to provide the entire array):
{
"doc": { "entities": [
{
"name": "Manu Ebrol"
},
{
"name": "Sessions Court"
},
{
"name": "Pune Police"
},
{
"name": "Pune News"
},
{
"name": "Indian Express"
}
] }
}
Hope this can help someone in the future.
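For anyone scripting this, the update body can be derived from the existing `_source` rather than typed by hand. The sketch below only builds the JSON body and does not send it. The helper name `build_entity_rename_body` is made up for the example.

```python
import copy
import json


def build_entity_rename_body(source: dict, index: int, new_name: str) -> dict:
    """Return a partial-update body with one entity renamed.

    The whole "entities" array is sent, because a partial "doc" update
    replaces an array field rather than merging into it.
    """
    entities = copy.deepcopy(source["entities"])  # leave the input untouched
    entities[index]["name"] = new_name
    return {"doc": {"entities": entities}}


# Only the "entities" part of the _source shown above is needed here.
source = {
    "entities": [
        {"name": "Mohinder Ebrol"},
        {"name": "Sessions Court"},
        {"name": "Pune Police"},
        {"name": "Pune News"},
        {"name": "Indian Express"},
    ]
}

body = build_entity_rename_body(source, 0, "Manu Ebrol")
print(json.dumps(body, indent=2))
```

The printed JSON is what would go into the POST request to the `_update` endpoint.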

RethinkDB: How to do recursive joins on three tables?

I am developing a platform with a JSON API using Python Flask. In some cases I need to join three tables. "How to join tables with a array of IDs" gave me some guidance, but I need a solution that goes beyond it.
Let's assume we have the following tables for a messaging app.
Accounts
Conversations
Messages
Message Readers
Accounts table snippet
{
"id": "account111",
"name": "John Doe",
},
Conversations table snippet
{
"id": "conversation111",
"to": ["account111", "account222", "account333"], // accounts who are participating the conversation
"subject": "RethinkDB",
}
Messages table snippet
{
"id": "message111",
"text": "I love how RethinkDB does joins.",
"from": "account111", // accounts who is the author of the message
"conversation": "conversation111"
}
Message Readers table snippet
{
"id": "messagereader111",
"message": "message111",
"reader": "account111",
}
My question is "What's the magic query to get the document below when I receive a get request on an account document with id="account111"?"
{
"id": "account111",
"name": "John Doe",
"conversations": [ // 2) Join account table with conversations
{
"id": "conversation111",
"name": "RethinkDB",
"to": [ // 3) Join conversations table with accounts
{
"id": "account111",
"name": "John Doe",
},
{
"id": "account222",
"name": "Bobby Zoya",
},
{
"id": "account333",
"name": "Maya Bee",
},
],
"messages": [ // 4) Join conversations with messages
{
"id": "message111",
"text": "I love how RethinkDB does joins.",
"from": { // 5) Join messages with accounts
"id": "account111",
"name": "John Doe",
},
"message_readers": [
{
"name": "John Doe",
"id": "account111",
}
],
},
],
},
],
}
Any guidance or advice would be fantastic. JavaScript or Python code would be awesome.
I had a hard time understanding what you want (you have multiple documents with the id 111), but I think this is the query you are looking for
Python query:
r.table("accounts").map(lambda account:
    account.merge({
        "conversations": r.table("conversations").filter(lambda conversation:
            conversation["to"].contains(account["id"])
        ).coerce_to("array").map(lambda conversation:
            conversation.merge({
                "to": conversation["to"].map(lambda account_id:
                    r.table("accounts").get(account_id)
                ).pluck(["id", "name"]).coerce_to("array"),
                "messages": r.table("messages").filter(lambda message:
                    message["conversation"] == conversation["id"]
                ).coerce_to("array").map(lambda message:
                    message.merge({
                        "from": r.table("accounts").get(message["from"]).pluck(["id", "name"]),
                        "readers": r.table("message_readers").filter(lambda message_reader:
                            message["id"] == message_reader["message"]
                        ).coerce_to("array").order_by(r.desc("created_on")),
                    })
                ).order_by(r.desc("created_on"))
            })
        ).order_by(r.desc("modified_on"))
    })
).order_by("id").run(db_connection)
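To make the shape of the result easier to follow, here is the same nesting written against plain Python dictionaries. This is not ReQL and does not touch a database. It is a small in-memory sketch using the sample rows from the question, with a made-up helper name `expand_account`, and it handles a single account rather than mapping over all of them.

```python
# Sample rows taken from the question; accounts are keyed by id for lookups.
accounts = {
    "account111": {"id": "account111", "name": "John Doe"},
    "account222": {"id": "account222", "name": "Bobby Zoya"},
    "account333": {"id": "account333", "name": "Maya Bee"},
}
conversations = [
    {"id": "conversation111", "subject": "RethinkDB",
     "to": ["account111", "account222", "account333"]},
]
messages = [
    {"id": "message111", "text": "I love how RethinkDB does joins.",
     "from": "account111", "conversation": "conversation111"},
]
message_readers = [
    {"id": "messagereader111", "message": "message111", "reader": "account111"},
]


def expand_account(account_id: str) -> dict:
    """Nest conversations, participants, messages and readers under one account."""
    result = dict(accounts[account_id])
    result["conversations"] = [
        {
            **conv,
            # replace participant ids with the account documents
            "to": [accounts[a] for a in conv["to"]],
            "messages": [
                {
                    **msg,
                    "from": accounts[msg["from"]],
                    "readers": [
                        rd for rd in message_readers if rd["message"] == msg["id"]
                    ],
                }
                for msg in messages
                if msg["conversation"] == conv["id"]
            ],
        }
        for conv in conversations
        if account_id in conv["to"]
    ]
    return result


doc = expand_account("account111")
print(doc["conversations"][0]["messages"][0]["from"]["name"])
```

Each list comprehension corresponds to one `filter(...).map(...merge(...))` step in the query above.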

ElasticSearch - Querying only for particular array elements that are not empty

I'm relatively new to ES and am having difficulty finding really good references or tutorials on the query dsl.
We have a document type of the example below. The query I wish to conduct is thus: "Return all the email_package records that have at least one entities record (one record in the 'entities' array)." And yes I want the complete 'email' record.
Could anyone assist? Also, if you could point to a reference, tutorial or cookbook that addresses questions like this, that would be greatly appreciated.
"email_package": {
"email": {
"date": "2007-02-13T18:24:22-04:00",
"subject": "this is the subject",
"body": "this is the body"
},
"entities": [
{
"Louisville": {
"City": "South"
}
},
{
"Memphis": {
"City": "South"
}
}
]
}
// more 'email_package records follow...
Your document is a bit problematic, since you seem to be nesting objects and giving each of them a different name. If you are not bound to the current structure, I would change the mapping into something more manageable, so that queries become straightforward, e.g.:
"email_package": {
"email": {
"body": "this is the body1",
"date": "2007-02-13T18:24:22-04:00",
"subject": "this is the subject"
},
"entities": [
{
"name": "Louisville",
"City": "South"
},
{
"name": "Memphis",
"City": "South"
}
]
}
Query:
{ "filter": {
"exists": {
"field": "email_package.entities.name"
}
}
}
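The same check can be sketched in Python, both as a local predicate and as a request body. The helper names are made up for the example, and wrapping `exists` in a `bool` filter is an assumption about newer Elasticsearch versions, whereas the answer above uses the older top-level `filter` form.

```python
def has_entities(document: dict) -> bool:
    """True if the email_package holds at least one entity with a name."""
    entities = document.get("email_package", {}).get("entities", [])
    return any(entity.get("name") is not None for entity in entities)


def build_exists_query(field: str = "email_package.entities.name") -> dict:
    """Request body that keeps only documents where the field has a value."""
    return {"query": {"bool": {"filter": {"exists": {"field": field}}}}}


with_entities = {
    "email_package": {
        "email": {"subject": "this is the subject"},
        "entities": [{"name": "Louisville", "City": "South"}],
    }
}
without_entities = {
    "email_package": {"email": {"subject": "this is the subject"}, "entities": []}
}

print(has_entities(with_entities), has_entities(without_entities))
print(build_exists_query())
```

The predicate is handy for checking test fixtures locally. The query body is what would be sent to the search endpoint, and the hits still contain the complete document, including the `email` part.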
