Fully update documents without creating if not existent - elasticsearch

Is there any method in Elasticsearch for fully (not partially) updating a document without creating a new one if it doesn't already exist?
So far, I have found the _update method, which partially updates a document when a doc attribute is passed in the JSON request body. However, I would like to replace the entire document in this case, not just part of it.
I have also found that the index method, sent as a PUT request, works fine, except that it creates a new document if the id is not yet indexed.
Setting the op_type parameter to create enforces document creation instead of update.
I was wondering whether there is any way to always enforce an update and never create a new document?
Or is there perhaps another method that would let me achieve this?

If I understand correctly, you want to index a doc, but only if it already exists? Like an op_type option of update?
You can mostly do it with the update API, given that your mapping remains consistent. With an _update, if the document doesn't exist, you'll get back a 404. If it does exist, ES will merge the contents of doc with whatever document exists there. If you make sure you're sending over a new doc with all the fields in the mapping, then you're effectively replacing it outright.
Note, however, that you can do it without the document merge rather efficiently in two requests; the first one checking for doc existence with a HEAD request. If HEAD /idx/type/id is successful, then do a PUT. This is essentially what's happening internally anyway with the update API, with a little extra overhead. But HEAD is really cheap because it's not shuffling any payload around. It simply returns an HTTP 200/404.
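The two-request approach described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `replace_if_exists` and `InMemoryClient` are hypothetical names, and the client is assumed to expose `exists(index=..., id=...)` and `index(index=..., id=..., document=...)`, which mirrors the keyword style of recent elasticsearch-py releases (older releases take `body=` rather than `document=`). Because the check and the write are separate requests, a document deleted in between would still be re-created.

```python
class InMemoryClient:
    """Hypothetical stand-in for an Elasticsearch client, for illustration only."""

    def __init__(self):
        self.store = {}

    def exists(self, index, id):
        # Plays the role of the HEAD request: no payload, just yes/no.
        return (index, id) in self.store

    def index(self, index, id, document):
        # Plays the role of the PUT request: replaces the whole document.
        self.store[(index, id)] = dict(document)
        return {"result": "updated"}


def replace_if_exists(client, index, doc_id, document):
    """Fully replace a document, but never create one that is missing.

    Returns True if the document was replaced, False if it did not exist.
    """
    if not client.exists(index=index, id=doc_id):
        return False
    client.index(index=index, id=doc_id, document=document)
    return True


client = InMemoryClient()
client.store[("idx", "1")] = {"title": "old", "body": "old text"}

replaced = replace_if_exists(client, "idx", "1", {"title": "new"})
skipped = replace_if_exists(client, "idx", "2", {"title": "new"})
```

Here the existing document is replaced outright (the old body field is gone), while the missing id is left alone.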

Related

Partial mutations with vuex-orm over graphql?

Background:
I use mongodb where a typical document may contain fields with large values. A description field may hold over 200KB.
The same document also contains a title field which is limited to 64 characters (max).
I’d like to set up the code so that it's more network-efficient when the user modifies title.
Current state:
I use doc.$push() to store the changes in the document.
Spec is: https://vuex-orm.github.io/plugin-graphql/guide/push.html
In this case the browser devtools network tab shows that the whole document is being sent, including the description despite not being modified. Needless to say, that's an unreasonable overhead for such a network request.
How do I set it up so that only the title value is included and sent in the network request?
One approach is to use apollo-client with a custom mutation query that updates only title. This approach ends up cluttering the codebase, because there are more collections and more fields that need to be updated without resending the whole document. So really, I am looking for a generic approach.
So, any ideas on how to execute partial mutations with vuex-orm over graphql?

Can I use a painless script on a GET to update a counter?

I have an index with documents that have an "access count" field which is intended to store the number of times that the document has been accessed. (Much like this web page.)
Of course I can use an update after each get to update the field, but is there a way to attach a Painless script to the get request to increment the field? Something like:
doc['access_count'] += 1;
I don't see an obvious answer out of the documentation, but if anyone has done this it would be helpful to know.
A GET call is supposed to be idempotent, i.e. calling the same URI multiple times doesn't change the underlying resource. So what you're asking goes against that principle, and (luckily) it's not possible.
You'll need to update a counter on that document separately.
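A separate counter update could look something like the sketch below, which only builds the body of a scripted _update request. The function name `increment_counter_body` is hypothetical, and the field name is taken from the question. Note that update scripts address the stored document through `ctx._source`, not the `doc[...]` syntax shown in the question.

```python
def increment_counter_body(field="access_count", amount=1):
    """Build the body of an _update request that increments a numeric field."""
    return {
        "script": {
            "lang": "painless",
            # Update scripts see the document as ctx._source, not doc[...].
            "source": f"ctx._source.{field} += params.amount",
            "params": {"amount": amount},
        }
    }


body = increment_counter_body()
# With a real cluster this body would be sent right after the GET,
# as the payload of an _update request for the same document id.
```

Passing the increment through `params` rather than embedding it in the script text keeps the script source constant across calls.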

Usage of filter_path with helpers.scan in elasticsearch client

When doing a search operation in Elasticsearch I want the metadata to be filtered out so that only "_source" is returned in the response. I'm able to achieve this through "search" in the following way:
out1 = es.search(index='index.com', filter_path=['hits.hits._id',
'hits.hits._source'])
But when I do the same with the scan method it just returns an empty list:
out2 = helpers.scan(es, query, index='index.com',
                    doc_type='2016-07-27', filter_path=['hits.hits._source'])
The problem may be with the way I'm processing the response of the 'scan' method or with the way I'm passing the value to filter_path. To check the output I convert out2 to a list.
The scan helper currently doesn't allow passing extra parameters to the scroll API so your filter_path doesn't apply to it. It does, however, get applied to the initial search API call which is used to initiate the scan/scroll cycle. This means that the scroll_id is stripped from the response causing the entire operation to fail.
In your case even passing the filter_path parameter to the scroll API calls would cause the helper to fail because it would strip the scroll_id which is needed for this operation to work and also because the helper relies on the structure of the response.
My recommendation would be to use source filtering if you need to limit the size of the response, or to use a smaller size parameter than the default 1000.
Hope this helps,
Honza
You could pass filter_path=['_scroll_id', '_shards', 'hits.hits._source'] to the scan helper to get it to work. Obviously that leaves some metadata in the response but it removes as much as possible while allowing the scroll to work. _shards is required because it is used internally by the scan helper.
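As a small sketch of that workaround, the helper below (a hypothetical function, not part of the elasticsearch package) prepends the two entries the answer says the scan helper needs to whatever paths the caller actually wants. The actual `helpers.scan` call is left as a comment because it needs the client library and a running cluster.

```python
# Entries the answer above says must survive filtering for scan to work.
REQUIRED_BY_SCAN = ["_scroll_id", "_shards"]


def scan_filter_path(wanted):
    """Return a filter_path list that keeps what the scan helper needs."""
    paths = list(REQUIRED_BY_SCAN)
    for path in wanted:
        if path not in paths:
            paths.append(path)
    return paths


paths = scan_filter_path(["hits.hits._source"])
# Intended use, assuming `es`, `query` and the index from the question:
# results = helpers.scan(es, query=query, index='index.com', filter_path=paths)
```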

Is it possible to check previous value of a key in beforeSave?

Let's say I want to perform custom logic only when a user's verified field changes from false to true (in order to make sure they are allowed to perform this operation). Is there a way in Cloud Code to see what the 'current', i.e. about-to-be-overwritten, value of a field is?
I would look at changedAttributes(), previousAttributes() and previous("columnName") to see if these have been exposed in the beforeSave handler yet.
Update note: none of those methods help.
The only other option I've seen in some older questions is to check object.existed() and in that case do a get() request to load the original values before the save. Obviously this causes 2 API requests per save.
It would be great to hear back if the changed/previous methods work.
Update
I have since done some more thorough testing, and the only option is to get() the previous version of the record. Nothing else works. This of course requires that you do it in the before-save handler.

GET vs POST in AJAX?

Why are there both GET and POST requests in AJAX, given that neither affects the page URL anyway? What difference does it make to pass sensitive data over GET in AJAX when the data is not reflected in the page URL?
You should use the proper HTTP verb according to what you require from your web service.
When dealing with a Collection URI like: http://example.com/resources/
GET: List the members of the collection, complete with their member URIs for further navigation. For example, list all the cars for sale.
PUT: Meaning defined as "replace the entire collection with another collection".
POST: Create a new entry in the collection where the ID is assigned automatically by the collection. The ID created is usually included as part of the data returned by this operation.
DELETE: Meaning defined as "delete the entire collection".
When dealing with a Member URI like: http://example.com/resources/7HOU57Y
GET: Retrieve a representation of the addressed member of the collection expressed in an appropriate MIME type.
PUT: Update the addressed member of the collection or create it with the specified ID.
POST: Treats the addressed member as a collection in its own right and creates a new subordinate of it.
DELETE: Delete the addressed member of the collection.
Source: Wikipedia
Well, as for GET, you still have the URL length limitation. Other than that, it is quite conceivable that the server treats POST and GET requests differently; thus the need to be able to specify what request you're doing.
Another difference between GET and POST is the way caching is handled in browsers. POST responses are generally not cached by browsers. GET responses may or may not be cached, depending on the caching rules specified in your response headers.
Two primary reasons for having them:
GET requests have some pretty restrictive limitations on size; POST are typically capable of containing much more information.
The backend may be expecting GET or POST, depending on how it's designed. We need the flexibility of doing a GET if the backend expects one, or a POST if that's what it's expecting.
It's simply down to respecting the rules of HTTP.
GET: calls must be idempotent. This means that calling it multiple times gives the same result, and it is not intended to change the underlying data. You might use this for a search box etc.
POST: calls are NOT idempotent. They are allowed to change the underlying data, so POST might be used in a create method. If you call it multiple times you will create multiple entries.
You normally send parameters to the AJAX script, it returns data based on these parameters. It works just like a form that has method="get" or method="post". When using the GET method, the parameters are passed in the query string. When using POST method, the parameters are sent in the post body.
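To make the query-string versus body distinction concrete, here is a minimal sketch using only Python's standard library. It builds the two request objects without sending them; the parameter names are made up, and the URL is the example one used earlier on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "cars", "page": "2"}
encoded = urlencode(params)

# GET: parameters travel in the query string of the URL.
get_request = Request("http://example.com/resources/?" + encoded, method="GET")

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://example.com/resources/",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
```

The GET request carries no body at all, while the POST request's URL contains none of the parameters.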
Generally, if your parameters have very few characters and do not contain sensitive information then you send them via GET method. Sensitive data (e.g. password) or long text (e.g. an 8000 character long bio of a person) are better sent via POST method.
Thanks..
I mainly use the GET method with Ajax and I haven't had any problems so far, except the following:
Internet Explorer (unlike Firefox and Google Chrome) caches GET calls that use the same GET values.
So polling with Ajax GET at some interval can keep showing the same results unless you change the URL, for example by adding an irrelevant random number to each Ajax GET.
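The cache-busting trick can be sketched as a small URL helper. This is an illustration in Python rather than browser code; the function name `with_cache_buster` and the `_` parameter name are arbitrary choices, and a nanosecond timestamp stands in for the "irrelevant random number".

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url, token=None):
    """Append a throwaway '_' parameter so each GET URL is unique."""
    if token is None:
        token = str(time.time_ns())
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


busted = with_cache_buster("http://example.com/resources/?q=cars", token="12345")
```

The server simply ignores the extra parameter, but the changed URL no longer matches a cached entry.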
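Others have covered the main points (context/idempotency, and size), but I'll add another: exposure of your input args. With SSL/TLS, both the query string and the request body are encrypted in transit, but URLs are commonly recorded in server logs, browser history and Referer headers. So if your input args are sensitive, send them in a POST body.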
When we use the GET method in Ajax, the field values are placed in the URL, so only what survives in a URL is sent. For example, if the content of a text area is simply appended to the URL without being URL-encoded, its newline characters are lost. That is not the case with the POST method, where the content travels in the request body.
