Elasticsearch: document relationship

I'm building an Elasticsearch autocomplete-as-you-type feature.
I'm using features like n-grams and custom analyzers to get the behavior I need.
Currently I'm breaking my head over indexing the following data.
Let's say I have a Payments type;
each document in this type looks like this:
{
  ..elastic meta data..
  "paymentId": 123453425342,
  "providerAccount": {
    "id": 123456,
    "firstName": "Alex",
    "lastName": "Web"
  },
  "consumerAccount": {
    "id": 7575757,
    "firstName": "John",
    "lastName": "Doe"
  },
  "amount": 556,
  "date": 342523454235345 // some unix timestamp
}
So basically this document represents not only the payment itself but also the payment's relationships: the two entities related to it. A payment always has its provider and consumer.
I need this data in the payment document because I want to show it in the UI.
By indexing it like this, handling updates of a Consumer or Provider might be a big pain, because each time one of them changes a property I have to update every payment that references that entity.
Another possible solution is to store only the ids of these consumers/providers, query the payments, and then make two more queries for the entities to retrieve the needed fields. But I'm not sure about this, because I'm making an AJAX request each time a character is entered, so performance is a concern.
I have also looked into the parent/child relationship solution, which basically fits my case, but I wasn't able to figure out whether I can also retrieve the parent (consumer/provider) fields while querying the child (payment).
What would you suggest?
Thanks!

Yes, you can retrieve the parent while querying the child using has_child.
Considering payment as the child and consumer as the parent, you can search all the consumers with:
GET /index_name/consumer/_search
{
  "query": {
    "has_child": {
      "type": "payment",
      "query": {
        // any query on payment table
      },
      "inner_hits": {}
    }
  }
}
This fetches all the consumers based on a query on the child, i.e. payment in your case.
inner_hits is what you are looking for: it retrieves the matching children as well. It was introduced in Elasticsearch 1.5.0, so your version should be 1.5.0 or greater.
See https://www.elastic.co/blog/elasticsearch-1-5-0-released.
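If you build this search body from application code, a minimal sketch of the construction might look like the following. This is plain Python assembling the same dict shown above; the helper name `build_consumer_search` is illustrative, and the resulting body is what you would pass to a client such as elasticsearch-py.

```python
def build_consumer_search(payment_query: dict) -> dict:
    """Build a search body that finds consumers with at least one matching
    child payment, returning those payments via inner_hits."""
    return {
        "query": {
            "has_child": {
                "type": "payment",
                "query": payment_query,
                # inner_hits includes the matching child payments in the response
                "inner_hits": {},
            }
        }
    }

body = build_consumer_search({"match": {"amount": 556}})
print(body["query"]["has_child"]["type"])  # payment
```

The inner query is whatever payment-level query you need (a match on amount here, purely as an example).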

Your problem is not an issue. I suppose you want to freeze the data after the payment, right? In that case you don't need to update the account data in existing payment documents.
Furthermore, parent/child is easy to update but less efficient to query. For autocomplete, stick with your current mapping!

Related

Query does not return some items from DynamoDB via GraphQL

May I know why some items in DynamoDB are not being fetched by GraphQL?
When searching via the DynamoDB console I can easily see and query the item, but through GraphQL some items are not showing. Mind you, this isn't a connection problem, because I can query other items; there's just a specific item that is not being returned.
For example, if I query all Posts, it returns an array of posts, but the item is not among them. However, when I query a Post directly by its ID, it works fine.
Example code that is not working:
listPosts(filter: {groupID: {eq: "25"}}) {
  items {
    id
    content
  }
}
but when I do this, it is working well:
getPost(id: "c59ce7e9") {
  id
  content
}
I had the same issue and can share what I found, which worked for me.
The default resolver for the list operation has a limit: 20 built in:
{
  "version": "2017-02-28",
  "operation": "Scan",
  "filter": #if($context.args.filter) $util.transform.toDynamoDBFilterExpression($ctx.args.filter) #else null #end,
  "limit": $util.defaultIfNull($ctx.args.limit, 20),
  "nextToken": $util.toJson($util.defaultIfNullOrEmpty($ctx.args.nextToken, null)),
}
I imagine you could change this, or you could add a limit argument to your query like this:
listPosts(filter: {groupID: {eq: "25"}}, limit: 100) {
  items {
    id
    content
  }
}
The limit should be higher than the number of records.
You can see why this is an issue: it uses the Scan operation, meaning it inspects every record for a match, which hurts performance. You could add pagination, or better, craft a proper query for this. You will need to look into pagination, relations, and connections:
https://docs.aws.amazon.com/appsync/latest/devguide/designing-your-schema.html#advanced-relations-and-pagination
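Rather than raising the limit, you can page through results with nextToken until it comes back empty. A minimal sketch of that loop, with a stubbed `fetch_page` standing in for the real AppSync listPosts call (all names and the in-memory data are illustrative):

```python
def fetch_page(limit, next_token):
    """Stub for one listPosts request; a real client would call AppSync here.
    Returns (items, next_token), with next_token None on the last page."""
    data = [{"id": str(i), "groupID": "25"} for i in range(45)]
    start = int(next_token or 0)
    page = data[start:start + limit]
    new_token = str(start + limit) if start + limit < len(data) else None
    return page, new_token

def list_all_posts(limit=20):
    """Accumulate every page by following next_token until exhausted."""
    items, token = [], None
    while True:
        page, token = fetch_page(limit, token)
        items.extend(page)
        if token is None:  # no more pages
            return items

posts = list_all_posts()
print(len(posts))  # 45 - all items, regardless of the per-request limit
```

This pattern retrieves everything even when the dataset exceeds any single request's limit, which is why pagination is the robust fix rather than a hard-coded high limit.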

How to extend a response object in GraphQL with linked contents?

I've been working with RESTful APIs. To serve a page I needed to make lots of calls, so I started looking into GraphQL. I've read GraphQL's documentation but couldn't find exactly what I need.
Let me try to explain what I need to do:
I have Topic and Document models. Assume I have 100 topics and 1000+ documents.
I need to list all topics together with the list of documents linked to each topic. Currently we make one call to get all topics and then one call per topic to get its documents.
In GraphQL, is there a smarter/better way to do that?
Topic {
  name: String,
  id: ref
}
Document {
  name: String,
  topic: Ref(Topic.id)
}
Expected response:
Response: [
  {
    name: "topic1",
    documents: [{name: "document1"}, {name: "document2"}]
  },
  {
    name: "topic2",
    documents: [{name: "documentX"}, {name: "documentY"}, ...]
  },
  ...
]
And if I extend my requirement a bit: there might be another type used by Topic.
Category {
  name: String,
  id: ref
}
Topic {
  ...
  category: Ref(Category.id)
}
So there might be multiple topics and multiple documents. Is there a way to get categories with the linked documents for each category in a single response? :)
Category1: [Doc1, Doc2, Doc3, Doc4, Doc5]

Category1
├─ Topic1: Doc1, Doc2
└─ Topic2: Doc3, Doc4, Doc5
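The single-response shape asked for here is exactly what GraphQL produces when each nested field has a resolver: one query like `{ topics { name documents { name } } }` returns the whole tree in one round trip. A minimal sketch of the grouping a `Topic.documents` resolver would perform, with plain Python standing in for the GraphQL runtime and purely illustrative data:

```python
# Illustrative in-memory data; a real resolver would hit your datastore.
topics = [{"id": 1, "name": "topic1"}, {"id": 2, "name": "topic2"}]
documents = [
    {"name": "document1", "topic": 1},
    {"name": "document2", "topic": 1},
    {"name": "documentX", "topic": 2},
]

def resolve_documents(topic):
    """Resolver for Topic.documents: select the documents referencing this topic."""
    return [{"name": d["name"]} for d in documents if d["topic"] == topic["id"]]

# The GraphQL engine effectively does this fan-out for you per topic:
response = [{"name": t["name"], "documents": resolve_documents(t)} for t in topics]
print(response[0]["documents"])  # [{'name': 'document1'}, {'name': 'document2'}]
```

The same idea extends one level up for Category: a `Category.topics` resolver plus the `Topic.documents` resolver gives you categories with their linked documents in a single response. (In production you would batch these lookups, e.g. with a DataLoader-style pattern, to avoid N+1 queries.)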

Apollo/React mutating two related tables

Say I have two tables, one containing products and the other containing prices.
In Graphql the query might look like this:
option {
  id
  price {
    id
    optionID
    price
    date
  }
  description
}
I present the user with a single form (in React) where they can enter the product detail and price at the same time.
When they submit the form I need to create an entry in the "product" table and then create a related entry in the "price" table.
I'm very new to GraphQL, and React for that matter, and am finding it a steep learning curve. I have been following an Apollo tutorial and reading docs, but so far the solution to this task remains a mystery!
Could someone put me out of my misery and give me, or point me in the direction of, the simplest example of handling the mutations necessary for this?
Long story short, that's something that should actually be handled by your server if you want to optimize for as few requests as possible.
Problem: The issue here is that you have a dependency. You need the product to be created first and then with that product's ID, relate that to a new price.
Solution: The best way to implement this on the server is by adding another field to Product in your mutation input that allows you to input the details for Price as well in the same request input. This is called a "nested create" on Scaphold.
For example:
# Mutation
mutation CreateProduct($input: CreateProductInput!) {
  createProduct(input: $input) {
    changedProduct {
      id
      name
      price {
        id
        amount
      }
    }
  }
}

# Variables
{
  input: {
    name: "My First Product",
    price: {
      amount: 1000
    }
  }
}
Then, on the server, you can parse out the price object in your resolver arguments and create the new price object while creating the product. Meanwhile, you can also relate them in one go on the server as well.
Hope this helps!
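The server-side "nested create" described above can be sketched in a few lines: the resolver creates the product first, then uses the new product's id to create the related price. Storage here is an in-memory stub and all names (`create_product`, `productID`) are illustrative; real code would write to your database, ideally inside one transaction.

```python
import itertools

_ids = itertools.count(1)       # stand-in for database-generated ids
products, prices = {}, {}       # stand-in for the two tables

def create_product(args):
    """Resolver sketch: create the product, then the nested price (if any),
    relating the price to the product via the freshly created product id."""
    product_id = next(_ids)
    product = {"id": product_id, "name": args["name"]}
    products[product_id] = product
    if "price" in args:                       # the nested create
        price_id = next(_ids)
        price = {"id": price_id, "productID": product_id,
                 "amount": args["price"]["amount"]}
        prices[price_id] = price
        product["price"] = price              # echo it back in the payload
    return product

result = create_product({"name": "My First Product", "price": {"amount": 1000}})
print(result["price"]["amount"])  # 1000
```

Doing both writes in one resolver is what lets the client send a single mutation instead of two dependent requests.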

Logstash -> Elasticsearch - update denormalized data

Use case explanation
We have a relational database with data about our day-to-day operations. The goal is to allow users to search the important data with a full-text search engine. The data is normalized and thus not in the best form to make full-text queries, so the idea was to denormalize a subset of the data and copy it in real-time to Elasticsearch, which allows us to create a fast and accurate search application.
We already have a system in place that enables Event Sourcing of our database operations (inserts, updates, deletes). The events contain only the changed columns and primary keys (on an update we don't get the whole row). Logstash already gets notified for each event, so this part is already handled.
Actual problem
Now we are getting to our problem. Since the plan is to denormalize our data we will have to make sure updates on parent objects are propagated to the denormalized child objects in Elasticsearch. How can we configure logstash to do this?
Example
Lets say we maintain a list of Employees in Elasticsearch. Each Employee is assigned to a Company. Since the data is denormalized (for the purpose of faster search), each Employee also carries the name and address of the Company. An update changes the name of a Company - how can we configure logstash to update the company name in all Employees, assigned to the Company?
Additional explanation
#Darth_Vader:
The problem we are facing is, that we get an event that a Company has changed, but we want to modify documents of type Employee in Elasticsearch, because they carry the data about the company in itself. Your answer expects that we will get an event for every Employee, which is not the case.
Maybe this will make it clearer. We have 3 employees in Elasticsearch:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Then an update happens in the source DB.
UPDATE company SET name = 'Company NEW' WHERE cmp_id = 1;
We get an event in logstash, where it says something like this:
{type:'company',cmp_id:'1',old.name:'Company A',new.name:'Company NEW'}
This should then be propagated to Elasticsearch, so that the resulting employees are:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Notice that the field company.name changed.
I suggest a similar solution to what I've posted here, i.e. to use the http output plugin in order to issue an update by query call to the Employee index. The query would need to look like this:
POST employees/_update_by_query
{
  "script": {
    "source": "ctx._source.company.name = params.name",
    "lang": "painless",
    "params": {
      "name": "Company NEW"
    }
  },
  "query": {
    "term": {
      "company.cmp_id": "1"
    }
  }
}
So your Logstash config should look like this:
input {
...
}
filter {
mutate {
add_field => {
"[script][lang]" => "painless"
"[script][source]" => "ctx._source.company.name = params.name"
"[script][params][name]" => "%{new.name}"
"[query][term][company.cmp_id]" => "%{cmp_id}"
}
remove_field => ["host", "#version", "#timestamp", "type", "cmp_id", "old.name", "new.name"]
}
}
output {
http {
url => "http://localhost:9200/employees/_update_by_query"
http_method => "post"
format => "json"
}
}
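The transformation the filter performs can also be mirrored in plain code, which is handy for testing: given a company-change event, build the `_update_by_query` body that rewrites the denormalized company name on every matching employee. A sketch, assuming the flat event shape from the example above:

```python
def build_update_by_query(event: dict) -> dict:
    """Turn a company-change event into an Elasticsearch _update_by_query body
    that updates company.name on all employees of that company."""
    return {
        "script": {
            "source": "ctx._source.company.name = params.name",
            "lang": "painless",
            "params": {"name": event["new.name"]},   # the new company name
        },
        # match every employee denormalized against this company id
        "query": {"term": {"company.cmp_id": event["cmp_id"]}},
    }

event = {"type": "company", "cmp_id": "1",
         "old.name": "Company A", "new.name": "Company NEW"}
body = build_update_by_query(event)
print(body["script"]["params"]["name"])  # Company NEW
```

POSTing this body to `/employees/_update_by_query` is exactly what the http output above does; the Logstash filter just assembles the same structure field by field.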

Elasticsearch with multiple parent/child relationship

I'm building an application with a complicated model: say Book, User, and Review.
A Review contains both a Book id and a User id.
To be able to search for Books that contain at least one review, I've set Book as Review's parent and route accordingly. However, I also need to find Users who wrote reviews containing certain phrases.
Is it possible to have both Book and User as Review's parents? Is there a better way to handle this situation?
Note that I'm not able (or willing) to change the way the data is modeled, because the data is transferred to Elasticsearch from a persistence database.
As far as I know you can't have a document with two parents.
My suggestion, based on the Application-side join chapter of Elasticsearch: The Definitive Guide:
Create a parent/child relationship Book/Review.
Be sure you have a user_id property in the Review mapping containing the id of the user who wrote that review.
I think that covers both use cases you described:
- Books that contain at least one review: solved with a has_child filter/query.
- Users who wrote reviews that contain certain phrases: solved by querying the reviews with the phrase you want and performing a cardinality aggregation on the user_id field. If you need user information, query your database (or another Elasticsearch index) with the retrieved ids.
Edit: "give me the books that have reviews this month written by a user whose name starts with John"
I recommend you collect all such advanced use cases and denormalize the data you need to achieve them. In this particular case it's enough to denormalize the user name into Review. In any case, the Elasticsearch people have written about managing relations on their blog and in Elasticsearch: The Definitive Guide.
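The application-side join described above can be sketched with stubbed data: first run the review query for the phrase, then collect the distinct user_ids (what a terms/cardinality aggregation would give you), then look the users up in a second step. All data and names below are illustrative:

```python
# Stand-ins for the Review index and the user store (DB or another index).
reviews = [
    {"book_id": 1, "user_id": 10, "text": "a great read"},
    {"book_id": 2, "user_id": 10, "text": "boring"},
    {"book_id": 1, "user_id": 11, "text": "great pacing"},
]
users = {10: {"name": "John Smith"}, 11: {"name": "Jane Roe"}}

def users_who_wrote(phrase):
    """Application-side join: review query -> distinct user_ids -> user lookup."""
    matching = [r for r in reviews if phrase in r["text"]]   # the review query
    user_ids = sorted({r["user_id"] for r in matching})      # distinct ids (agg step)
    return [users[i] for i in user_ids]                      # second lookup

print(users_who_wrote("great"))  # [{'name': 'John Smith'}, {'name': 'Jane Roe'}]
```

The trade-off is the extra round trip for the user lookup, which is usually acceptable because the set of distinct user_ids is far smaller than the set of matching reviews.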
Something like this (just make the Books type the parent of both the Users and Reviews types):
.../index/users/_search?pretty" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "has_parent": {
              "parent_type": "books",
              "filter": {
                "has_child": {
                  "type": "Reviews",
                  "query": {
                    "term": {
                      "text_review": "some word"
                    }
                  }
                }
              }
            }
          }
        ]
      }
    }
  }
}
'
You have two options:
- Elasticsearch nested objects
- Elasticsearch parent/child
Both are compared and evaluated nicely here.
