Elasticsearch: how to support transactions involving multiple documents

I use Elasticsearch with denormalized data, like this:
PUT /my_index/user/1
{
"name": "John Smith",
"email": "john@smith.com",
"dob": "1970/10/24"
}
PUT /my_index/blogpost/2
{
"title": "Relationships",
"body": "It's complicated...",
"user": {
"id": 1,
"name": "John Smith"
}
}
The problem is that Elasticsearch does not support ACID transactions. Changes to individual documents are ACIDic, but changes involving multiple documents are not. If I want to change the user name in both /my_index/user/1 and /my_index/blogpost/2 in one transaction, so that an error in either one rolls back both, how do I do that?

There are no transactions in ES and, according to inside sources, there never will be.
The best way to achieve what you want is to send your updates in a single bulk request and then check the status of each individual item in the response.
POST _bulk
{"index": {"_index": "my_index", "_type": "user", "_id": "1"}}
{ "name": "John Smith", "email": "john@smith.com", "dob": "1970/10/24" }
{"index": {"_index": "my_index", "_type": "blogpost", "_id": "2"}}
{ "title": "Relationships", "body": "It's complicated...", "user": { "id": 1, "name": "John Smith" }}
When your client gets the response, it should check the items array and make sure that each item's status is 200 (updated) or 201 (created). If that's the case, your bulk "transaction" was properly committed. If not, only the items with status 200 or 201 were committed and the rest failed; nothing is rolled back automatically, so the client has to retry or undo the partial changes itself.
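The check described above can be sketched in Python. This is a minimal illustration, not part of the original answer: it works on an already-parsed bulk response (a plain dict), the function name failed_bulk_items is hypothetical, and sample_response is made-up data shaped like a bulk response in which the second item was rejected.

```python
from typing import Any, Dict, List


def failed_bulk_items(bulk_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one entry per bulk item whose status is neither 200 nor 201."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}} or {"update": {...}}.
        for action, result in item.items():
            if result.get("status") not in (200, 201):
                failures.append({
                    "position": position,
                    "action": action,
                    "_id": result.get("_id"),
                    "status": result.get("status"),
                    "error": result.get("error"),
                })
    return failures


# Made-up response: the user document was written, the blog post was rejected.
sample_response = {
    "took": 3,
    "errors": True,
    "items": [
        {"index": {"_index": "my_index", "_type": "user", "_id": "1", "status": 200}},
        {"index": {"_index": "my_index", "_type": "blogpost", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception"}}},
    ],
}

failures = failed_bulk_items(sample_response)
if failures:
    # Nothing is rolled back by Elasticsearch; the client decides what to do next.
    print("partially applied; retry or compensate for:", failures)
else:
    print("all items committed")
```

If the list is non-empty, the client can retry just the failed items or re-index the previous versions of the documents that did go through.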

Related

Apollo cache error on deleted "child" table

NOTE: the __typename Client below refers to a person (like a customer) - not client/server.
Getting the following error:
Cache data may be lost when replacing the clientServices field of a Client object.
To address this problem (which is not a bug in Apollo Client), define a custom merge function for the Client.clientServices field, so InMemoryCache can safely merge these objects:
existing: [{"__ref":"ClientService:46"}]
incoming: []
What's happening is that I've deleted a "child" record - so there's no data.
The first time, I get this returned. Note that there is data in the clientServices field:
{
"client": {
"id": "41",
"companyId": "3",
"firstName": "Lew",
"lastName": "Terry",
"email": "lewterry@diamond.com",
"clientServices": [
{
"id": "46",
"serviceId": "13",
"description": "Individual psychotherapy - 45 minutes",
"sessionFee": 90,
"cptCode": "90834",
"__typename": "ClientService"
}
],
"__typename": "Client"
}
}
And here are the results of a refetch after the client record is saved and the clientServices "child" data has been deleted, so it's an empty array:
{
"client": {
"id": "41",
"companyId": "3",
"firstName": "Lew",
"lastName": "Terry",
"email": "lewterry@diamond.com",
"clientServices": [],
"__typename": "Client"
}
}
Do I really need a custom merge? Or is there another solution?

Auto Increment a field value every time a doc is updated in elasticsearch

This is the payload:
{
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn"
}
The result has this form:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn"
}
}
This is the document in which I want to increment a counter field every time the document is updated.
We have to add a new field named counter_value.
Expected result:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn",
"counter_value": 1
}
}

You can increment the counter with a scripted update. However, Elasticsearch already maintains a version number for every document. Depending on your use case, it might be enough to add the version parameter to your query:
curl -XGET 'http://localhost:9200/rwe/_search?version=true'
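As a rough sketch of the scripting route, the Python below builds the JSON body for a scripted update that creates counter_value on first use and increments it afterwards. Several things here are assumptions rather than part of the original answer: the helper name build_increment_body is hypothetical, the script is written in Painless, and the request path shown in the comment is the Elasticsearch 7.x form of the Update API. Unlike _version, this counter only changes when the update actually goes through this script.

```python
import json
from typing import Any, Dict


def build_increment_body(step: int = 1) -> Dict[str, Any]:
    """Build an Update API body that increments counter_value by `step`."""
    # Painless script: add to the field if it exists, otherwise create it.
    source = (
        "if (ctx._source.containsKey('counter_value')) "
        "{ ctx._source.counter_value += params.step } "
        "else { ctx._source.counter_value = params.step }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": source,
            "params": {"step": step},
        }
    }


body = build_increment_body()

# Assumed request (7.x path style), sent with any HTTP client:
#   POST /rwe/_update/8wEed3ABcYN_H8khP4hB
print(json.dumps(body, indent=2))
```

Any other field changes for the same update can be made inside the same script, so that the counter and the data change together.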

ElasticSearch - how to edit a field inside an array of a document

I have a document in our ElasticSearch index which looks like this:
{
"_index": "nm_doc",
"_type": "nm_doc",
"_id": "JRPXqmQBatyecf67YEfq",
"_score": 0.86147696,
"_source": {
"text": "A 29-year-old IT professional from Bhopal was convicted and sentenced to life imprisonment by an Additional Sessions Court in Pune on Wednesday for the rape and brutal murder of a woman in 2008, after she had refused his advances. Watch What Else is Making News The court found Manu Mohinder Ebrol, who worked in the same firm as the girl, of raping and killing the woman after stabbing her 18 times on the night of October 20, 2008, in her rented apartment. After committing the crime, Ebrol had fled to Bhopal. He was arrested later by Pune Police. The prosecution examined 26 witnesses for the case and forensic evidence such as call details and medical records also proved crucial. For all the latest Pune News , download Indian Express App",
"entities": [
{
"name": "Mohinder Ebrol"
},
{
"name": "Sessions Court"
},
{
"name": "Pune Police"
},
{
"name": "Pune News"
},
{
"name": "Indian Express"
}
]
}
}
If I wanted to edit just the first name in that array (Mohinder Ebrol) to be Manu Ebrol, how would I accomplish this via API call? Do I need to pass in the entire array to update the one name?

I figured it out from the documentation.
The call URL is:
POST http://elastichost:9200/indexname/_doc/JRPXqmQBatyecf67YEfq/_update?pretty
And the body simply looks like this (yes, you do have to provide the entire array):
{
"doc": { "entities": [
{
"name": "Manu Ebrol"
},
{
"name": "Sessions Court"
},
{
"name": "Pune Police"
},
{
"name": "Pune News"
},
{
"name": "Indian Express"
}
] }
}
Hope this can help someone in the future.
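Since the whole array has to be sent, it can help to build the body from the array you already fetched instead of typing it by hand. The following is a minimal sketch under that assumption; rename_entity is a hypothetical helper, and the entities list is copied from the document in the question.

```python
import copy
import json
from typing import Any, Dict, List


def rename_entity(entities: List[Dict[str, Any]], old_name: str, new_name: str) -> Dict[str, Any]:
    """Return a partial-update body with the first matching entity renamed."""
    updated = copy.deepcopy(entities)  # leave the caller's list untouched
    for entity in updated:
        if entity.get("name") == old_name:
            entity["name"] = new_name
            break
    else:
        raise ValueError(f"no entity named {old_name!r}")
    # The full array goes back in the "doc" part of the update request.
    return {"doc": {"entities": updated}}


entities = [
    {"name": "Mohinder Ebrol"},
    {"name": "Sessions Court"},
    {"name": "Pune Police"},
    {"name": "Pune News"},
    {"name": "Indian Express"},
]

body = rename_entity(entities, "Mohinder Ebrol", "Manu Ebrol")
print(json.dumps(body, indent=2))
```

The printed JSON is the request body for the update call shown above.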

Best approach for an Elasticsearch time-based feeds module?

I am new to Elasticsearch and am looking for the best way to build a feed module that has time-based feeds, along with their groups and comments.
I read up a little and came up with the following.
PUT /group
{
"mappings": {
"groupDetail": {},
"content": {
"_parent": {
"type": "groupDetail"
}
},
"comment": {
"_parent": {
"type": "content"
}
}
}
}
That way each type is stored as separate documents in the index.
But afterwards I found a post saying that parent/child is a more costly operation for search than nested objects.
Something like the following shows two groups (feeds), each holding its details with content and comments as nested elements.
{
"_index": "group",
"_type": "groupDetail",
"_id": 6829,
"_score": 1,
"_source": {
"groupid": 6829,
"name": "Jignesh Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": true,
"tags": [
"spotrs",
"surat"
],
"content": [
{
"contentid": 1,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 1"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 2,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
{
"_index": "group",
"_type": "groupDetail",
"_id": 6849,
"_score": 1,
"_source": {
"groupid": 6849,
"name": "Xyz Group Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": false,
"tags": [
"spotrs",
"food"
],
"content": [
{
"contentid": 3,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 3"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 4,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
Now, if I go with nested objects, I am unsure about one thing: if users add comments very frequently, will re-indexing become a problem?
So the main thing I want to ask is: which approach lets me add comments frequently while keeping content search fast?

Performance
Parent/child stores related documents on the same shard, as separate documents, which avoids network hops;
Parent/child needs a join when retrieving data;
A nested object stores the inner and outer objects together, as a single document;
So, we can infer:
Updating a nested object re-indexes the whole document, which can be very expensive if the document is large;
Updating a parent or a child alone does not affect the other;
Searching nested objects is a little faster, because it skips the join;
Suggestions
As far as I understand your problem, you should use parent/child.
With nested objects, as a group accumulates more and more comments, adding a new comment still re-indexes the whole group document, which can be very time-consuming;
On the other hand, searching comments with parent/child needs just one more lookup after finding the child, which is relatively acceptable.
Furthermore, you should also weigh how often comments are searched against how often they are added:
If you search a lot but add few new comments, nested objects may be the better choice;
Otherwise, choose parent/child;
By the way, you can combine both:
While a feed is active, use parent/child to store it;
When it is closed, i.e., no more comments can be added, move it to a new index with nested objects;
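To make the write-cost argument concrete, here is a small Python sketch that compares how much JSON has to be re-indexed when one comment is added under each model. It is only a rough proxy: serialized size stands in for indexing work, the helper names are hypothetical, and make_group produces made-up data loosely shaped like the documents in the question.

```python
import copy
import json
from typing import Any, Dict


def payload_bytes(doc: Dict[str, Any]) -> int:
    """Size of the JSON that would be sent for (re-)indexing."""
    return len(json.dumps(doc).encode("utf-8"))


def nested_write_bytes(group_doc: Dict[str, Any], content_index: int, comment: Dict[str, Any]) -> int:
    """Nested model: the comment lives inside the group document, so the whole document is rewritten."""
    updated = copy.deepcopy(group_doc)
    updated["content"][content_index]["comments"].append(comment)
    return payload_bytes(updated)


def child_write_bytes(comment: Dict[str, Any]) -> int:
    """Parent/child model: the comment is its own small document."""
    return payload_bytes(comment)


def make_group(comment_count: int) -> Dict[str, Any]:
    """Made-up group document with one content item and `comment_count` comments."""
    comments = [
        {"byuser": 5859, "comment": f"Comment {i}", "upddate": "2016-10-01T04:09:33.916Z"}
        for i in range(comment_count)
    ]
    return {"groupid": 6829, "name": "Jignesh Public",
            "content": [{"contentid": 1, "comments": comments}]}


new_comment = {"byuser": 5860, "comment": "One more", "upddate": "2017-04-19T05:19:40.281Z"}

for count in (10, 1000):
    nested = nested_write_bytes(make_group(count), 0, new_comment)
    child = child_write_bytes(new_comment)
    print(f"{count} existing comments: nested={nested} bytes, child={child} bytes")
```

The nested figure grows with the number of existing comments while the child figure stays constant, which is the trade-off the suggestions above rely on.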

Unless you give more detail than "very frequently", it is going to be hard to come up with a recommendation. You also have not mentioned what your data looks like. A comment on a blog post might happen rarely, even in heated discussions. A comment or reply in a forum thread (which will result in a huge document) might be something very different. I'd personally start with nested and see how it goes, but I do not know all the requirements, so this might be a very wrong answer.

ElasticSearch _Source is always empty on the return

I am posting a query to http://localhost:9200/movie_db/movie/_search but the _source attribute is always empty in the response. I enabled it in the mapping, but that doesn't help.
Movie DB:
TRY DELETE /movie_db
PUT /movie_db {"mappings": {"movie": {"properties": {"title": {"type": "string", "analyzer": "snowball"}, "actors": {"type": "string", "position_offset_gap" : 100, "analyzer": "standard"}, "genre": {"type": "string", "index": "not_analyzed"}, "release_year": {"type": "integer", "index": "not_analyzed"}, "description": {"_source": true, "type": "string", "analyzer": "snowball"}}}}}
BULK INDEX movie_db/movie
{"_id": 1, "title": "Hackers", "release_year": 1995, "genre": ["Action", "Crime", "Drama"], "actors": ["Johnny Lee Miller", "Angelina Jolie"], "description": "High-school age computer expert Zero Cool and his hacker friends take on an evil corporation's computer virus with their hacking skills."}
{"_id": 2, "title": "Johnny Mnemonic", "release": 1995, "genre": ["Science Fiction", "Action"], "actors": ["Keanu Reeves", "Dolph Lundgren"], "description": "A guy with a chip in his head shouts incomprehensibly about room service in this dystopian vision of our future."}
{"_id": 3, "title": "Swordfish", "release_year": 2001, "genre": ["Action", "Crime"], "actors": ["John Travolta", "Hugh Jackman", "Halle Berry"], "description": "A cast of characters challenge society's commonly held view that computer experts are not the beautiful people. Somehow, the CIA is hacked in under 5 minutes."}
{"_id": 4, "title": "Tomb Raider", "release_year": 2001, "genre": ["Adventure", "Action", "Fantasy"], "actors": ["Angelina Jolie", "Jon Voigt"], "description": "The story of a girl and her quest for antiquities in the face of adversity. This epic is adapter from its traditional video-game format to the big screen"}
Query:
{
"query" :
{
"term" : { "genre" : "Crime" }
}
}
Results:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
"_index": "movie_db",
"_type": "movie",
"_id": "3",
"_score": 0.30685282,
"_source": {}
},
{
"_index": "movie_db",
"_type": "movie",
"_id": "1",
"_score": 0.30685282,
"_source": {}
}
]
}
}

I had the same problem: despite enabling _source in my query as well as in my mappings, _source would always be {}.
Your proposed solution of setting cluster.name in elasticsearch.yml gave me the hint that the problem must be some hidden setting in the old cluster.
I found out that I had an index template definition that came with a plugin I installed (in my case elasticsearch-transport-couchbase), which said
"_source" : {
"includes" : [ "meta.*" ]
},
thereby implicitly excluding all fields other than meta.* from _source.
Check your templates like this:
curl -XGET localhost:9200/_template/?pretty
I deleted the couchbase template like so:
curl -XDELETE localhost:9200/_template/couchbase
and created a new, almost identical one but with source enabled.
Here is how:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
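If there are many templates, scanning the output by eye is error-prone. The sketch below, in Python, walks an already-parsed GET _template response and reports every mapping whose _source is disabled or filtered. The exact response layout varies by Elasticsearch version, so the code simply searches the whole mappings tree; find_source_filters is a hypothetical helper, and the sample data (including the couchbaseDocument type name) is made up for illustration.

```python
from typing import Any, Dict, List


def find_source_filters(templates: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List template mappings that disable or restrict _source."""
    findings: List[Dict[str, Any]] = []

    def walk(node: Any, template: str, path: List[str]) -> None:
        if not isinstance(node, dict):
            return
        source = node.get("_source")
        if isinstance(source, dict):
            restricted = (
                source.get("enabled") is False
                or bool(source.get("includes"))
                or bool(source.get("excludes"))
            )
            if restricted:
                findings.append({"template": template, "path": "/".join(path), "_source": source})
        # Keep searching deeper levels (e.g. per-type mappings in older versions).
        for key, value in node.items():
            if key != "_source":
                walk(value, template, path + [key])

    for name, body in templates.items():
        walk(body.get("mappings", {}), name, [])
    return findings


# Made-up example of a parsed `GET _template` response.
sample_templates = {
    "couchbase": {
        "template": "*",
        "mappings": {"couchbaseDocument": {"_source": {"includes": ["meta.*"]}}},
    },
    "logs": {
        "template": "logs-*",
        "mappings": {"_default_": {"properties": {"message": {"type": "string"}}}},
    },
}

for finding in find_source_filters(sample_templates):
    print(finding)
```

Any template it reports is a candidate for the delete-and-recreate step described above.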

Solution:
In the Elasticsearch config folder, open elasticsearch.yml, set cluster.name to a different value, and then restart elasticsearch.bat.

I once accidentally passed a single field in the _source array, and that field didn't even exist. For example, with "_source": ["bazinga"], the source in the aggregation results was empty.
So maybe you could simply pass a totally unrelated string in the _source array. This can be a better solution than making changes in the elasticsearch.yml file.
