Usage of filter_path with helpers.scan in elasticsearch client - elasticsearch

When doing a search operation in elasticsearch I want the metadata to be filtered out so that only "_source" is returned in the response. I'm able to achieve this with "search" in the following way:
out1 = es.search(index='index.com', filter_path=['hits.hits._id',
'hits.hits._source'])
But when I do the same with the scan method it just returns an empty list:
out2 = helpers.scan(es, query, index='index.com',
doc_type='2016-07-27',filter_path= ['hits.hits._source'])
The problem may be with the way I'm processing the response of the 'scan' method or with the way I'm passing the value to filter_path. To check the output I convert out2 to a list.

The scan helper currently doesn't allow passing extra parameters to the scroll API so your filter_path doesn't apply to it. It does, however, get applied to the initial search API call which is used to initiate the scan/scroll cycle. This means that the scroll_id is stripped from the response causing the entire operation to fail.
In your case even passing the filter_path parameter to the scroll API calls would cause the helper to fail because it would strip the scroll_id which is needed for this operation to work and also because the helper relies on the structure of the response.
My recommendation would be to use source filtering if you need to limit the size of the response, or to use a smaller size parameter than the default 1000.
Hope this helps,
Honza
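The source-filtering recommendation above could look roughly like the following. This is a minimal sketch, not the helper's documented recipe: the function names, the field list and the size of 500 are assumptions for illustration, and elasticsearch-py is imported lazily so the pure helpers can be tried without a cluster.

```python
def build_scan_query(fields, query=None):
    """Build a search body that asks Elasticsearch to return only `fields`
    inside each hit's _source (source filtering)."""
    return {
        "_source": list(fields),
        "query": query if query is not None else {"match_all": {}},
    }


def iter_sources(hits):
    """Drop the per-hit metadata on the client side and keep only _source."""
    for hit in hits:
        yield hit["_source"]


def scan_sources(es, index, fields, size=500):
    """Scan `index`, yielding only the _source of every hit.

    `es` is an Elasticsearch client instance; `size` is the per-shard batch
    size, chosen smaller than the default of 1000 (an illustrative value).
    """
    from elasticsearch import helpers  # imported lazily; needs elasticsearch-py

    hits = helpers.scan(es, query=build_scan_query(fields), index=index, size=size)
    return iter_sources(hits)
```

The scroll response keeps its full structure here, so the helper still sees the scroll_id; only the document payload shrinks.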

You could pass filter_path=['_scroll_id', '_shards', 'hits.hits._source'] to the scan helper to get it to work. Obviously that leaves some metadata in the response but it removes as much as possible while allowing the scroll to work. _shards is required because it is used internally by the scan helper.
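A sketch of that workaround, assuming the installed version of the scan helper forwards extra keyword arguments such as filter_path to the initial search call as described above. The function name and the optional `scan` parameter are hypothetical; the parameter exists only so the function can be exercised without a running cluster.

```python
# Keep the keys the scan helper relies on and drop the rest of the metadata.
SCAN_FILTER_PATH = ["_scroll_id", "_shards", "hits.hits._source"]


def scan_filtered(es, query, index, scan=None):
    """Yield only the _source of each hit from a filtered scan.

    `scan` defaults to elasticsearch.helpers.scan; passing another callable
    with the same calling convention is an assumption made for testing.
    """
    if scan is None:
        from elasticsearch.helpers import scan  # needs elasticsearch-py

    for hit in scan(es, query=query, index=index, filter_path=SCAN_FILTER_PATH):
        # With this filter_path each hit carries nothing but _source.
        yield hit["_source"]
```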

Related

How to submit queries from the elastic cloud api console?

I'm new to the elastic-cloud interface. It lets you choose between the operations get, post, put and del. I'm trying to submit queries, but I don't know the precise syntax. For instance:
tweet/_search?q=something
works, but:
tweet/_search?q={ "match_all": {} }
does not, returning a parser error. I have tried with double quotes, but it seems that then it searches for the query as a string.
The preferred way to test the search APIs is to use the POST method. In some cases a GET request can even give incorrect search results, because the query is ignored and you get the top 10 results of a match_all query instead.
Elasticsearch supports both GET and POST for searching, but a GET request that carries a payload isn't common on modern app servers; although Elasticsearch implements it, it requires carefully crafting your queries.
Still, if you want to use the GET API, then for complex queries it's better to send the query as part of the request body. I know it sounds weird to send a body with a GET request, but it works 😀.
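As an illustration of sending the query in the request body with POST, here is a minimal Python sketch that only builds the request. The base URL and the helper name are assumptions; in the cloud console the equivalent would be choosing post, entering tweet/_search as the path and putting the JSON in the body.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, query):
    """Build a POST request for <index>/_search with the query DSL in the body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local endpoint; nothing is sent until the request is opened.
request = build_search_request("http://localhost:9200", "tweet", {"match_all": {}})
```

Note that the query DSL goes under a top-level "query" key in the body, whereas the q= URL parameter expects a query string rather than JSON.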

Fully update documents without creating if not existent

Is there any method in Elasticsearch for fully (not partially) updating a document without creating a new one if it doesn't already exist?
So far I have found the _update method, which partially updates a document when a doc attribute is passed inside the JSON request body. However, I would like to replace the entire document in this case, not only part of it.
I have also found the index method, where sending a PUT request works fine but creates a new document if the id is not yet indexed.
Setting the op_type parameter to create enforces document creation instead of an update.
I was wondering if there is any way to always enforce update and never create a new one?
Or perhaps is there another method that would allow me to achieve such task?
If I understand correctly, you want to index a doc, but only if it already exists? Like an op_type option of update?
You can mostly do it with the update API, given that your mapping remains consistent. With an _update, if the document doesn't exist, you'll get back a 404. If it does exist, ES will merge the contents of doc with whatever document exists there. If you make sure you're sending over a new doc with all the fields in the mapping, then you're effectively replacing it outright.
Note, however, that you can do it without the document merge rather efficiently in two requests; the first one checking for doc existence with a HEAD request. If HEAD /idx/type/id is successful, then do a PUT. This is essentially what's happening internally anyway with the update API, with a little extra overhead. But HEAD is really cheap because it's not shuffling any payload around. It simply returns an HTTP 200/404.
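The two-request approach could be sketched as follows, using only the standard library. The function name and the injectable `urlopen` parameter are assumptions made so the logic can be tried without a cluster, and the document URL is whatever path your index uses (the answer's /idx/type/id form is kept in the example).

```python
import json
import urllib.error
import urllib.request


def replace_if_exists(doc_url, document, urlopen=urllib.request.urlopen):
    """Replace the document at `doc_url` only if it already exists.

    Returns True if the document was replaced, False if it was missing.
    """
    # Step 1: cheap existence check; HEAD carries no payload either way.
    head = urllib.request.Request(doc_url, method="HEAD")
    try:
        urlopen(head)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # not indexed yet, so do not create it
        raise

    # Step 2: the document exists, so a PUT replaces it outright.
    put = urllib.request.Request(
        doc_url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urlopen(put)
    return True
```

A call might look like replace_if_exists("http://localhost:9200/idx/type/1", {"title": "new"}). The two requests are not atomic, so a document deleted between the HEAD and the PUT would be created again.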

How can I use a list function in CouchDB to generate a valid (/normal) ViewResults object?

I have a simple problem I need to solve, and list functions are my current attempt to do so. I have a view that generates almost what I need, but in certain cases there are duplicate entries that make it through when I send in edge-case parameters.
Therefore, I am looking to filter these extra results out. I have found examples of filtering, which I am using (see this SO post). However, rather than generate HTML or XML or what-have-you, I just want a regular ol' view result. That is, the same kind of object that I would get if I queried CouchDB without a list function. It should have JSON data as normal and be the same in every way, except that it is missing duplicate results.
Any help on this would be appreciated! I have tried to send() data in quite a few different ways, but I usually get that "No JSON object could be decoded", or that indices need to be integers and not strings. I even tried to use the list to store every row until the end and send the entire list object back at once.
Example code (this is using an example from this page to send data):
function(head, req) {
var row; var dupes = [];
while(row=getRow()) {
if (dupes.indexOf(row.key) == -1) {
dupes.push(row.key);
send(row.value);
}
};
}
Lastly, I'm using Flask with Flask-CouchDB, and I'm seeing the aforementioned errors in the flask development server that I'm running.
Thanks! I can try to supply more details if need be.
Don't you need to prepend a [, send a , after each row value except the last, and end with ]? To actually mimic a view result, you'd actually need to wrap that in a JSON structure:
{"total_rows":0,"offset":0,"rows":[<your stuff here>]}
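To show the shape of the output described above, here is a sketch in Python rather than CouchDB's JavaScript. It is not a list function: each yielded string stands for one send() call, the function name is hypothetical, and total_rows and offset are the placeholder values from the line above rather than real counts.

```python
import json


def view_result_chunks(rows):
    """Yield the string chunks a list function would send() so that the
    concatenated output is one valid view-result JSON document."""
    seen = set()
    first = True
    yield '{"total_rows":0,"offset":0,"rows":['
    for row in rows:
        # Serialise the key so list and object keys can be compared too.
        key = json.dumps(row["key"], sort_keys=True)
        if key in seen:
            continue  # duplicate key: skip the row
        seen.add(key)
        if not first:
            yield ","  # separator between rows, never after the last one
        yield json.dumps(row)  # the whole row, not only row["value"]
        first = False
    yield "]}"
```

Joining the chunks and passing the result to json.loads gives back a dict with a "rows" list, which is what a client expecting a normal view result needs to decode.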

GET vs POST in AJAX?

Why are there GET and POST requests in AJAX as it does not affect page URL anyway? What difference does it make by passing sensitive data over GET in AJAX as the data is not getting reflected to page URL?
You should use the proper HTTP verb according to what you require from your web service.
When dealing with a Collection URI like: http://example.com/resources/
GET: List the members of the collection, complete with their member URIs for further navigation. For example, list all the cars for sale.
PUT: Meaning defined as "replace the entire collection with another collection".
POST: Create a new entry in the collection where the ID is assigned automatically by the collection. The ID created is usually included as part of the data returned by this operation.
DELETE: Meaning defined as "delete the entire collection".
When dealing with a Member URI like: http://example.com/resources/7HOU57Y
GET: Retrieve a representation of the addressed member of the collection expressed in an appropriate MIME type.
PUT: Update the addressed member of the collection or create it with the specified ID.
POST: Treats the addressed member as a collection in its own right and creates a new subordinate of it.
DELETE: Delete the addressed member of the collection.
Source: Wikipedia
Well, as for GET, you still have the url length limitation. Other than that, it is quite conceivable that the server treats POST and GET requests differently; thus the need to be able to specify what request you're doing.
Another difference between GET and POST is the way caching is handled in browsers. POST responses are generally not cached. GET responses may or may not be cached, depending on the caching rules specified in your response headers.
Two primary reasons for having them:
GET requests have some pretty restrictive limitations on size; POST are typically capable of containing much more information.
The backend may be expecting GET or POST, depending on how it's designed. We need the flexibility of doing a GET if the backend expects one, or a POST if that's what it's expecting.
It's simply down to respecting the rules of the http protocol.
Get - calls must be idempotent. This means that if you call it multiple times you will get the same result. It is not intended to change the underlying data. You might use this for a search box etc.
Post - calls are NOT idempotent. It is allowed to make a change to the underlying data, so might be used in a create method. If you call it multiple times you will create multiple entries.
You normally send parameters to the AJAX script, it returns data based on these parameters. It works just like a form that has method="get" or method="post". When using the GET method, the parameters are passed in the query string. When using POST method, the parameters are sent in the post body.
Generally, if your parameters have very few characters and do not contain sensitive information then you send them via GET method. Sensitive data (e.g. password) or long text (e.g. an 8000 character long bio of a person) are better sent via POST method.
Thanks..
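The difference in where the parameters travel can be seen with a small Python sketch that only builds the two requests. The URL and helper names are assumptions for illustration; an Ajax library does the equivalent work in the browser.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url, params):
    """GET: the parameters become part of the URL as a query string."""
    return Request(url + "?" + urlencode(params), method="GET")


def build_post(url, params):
    """POST: the URL stays clean and the parameters travel in the body."""
    return Request(
        url,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_request = build_get("http://example.com/search", {"q": "a b"})
post_request = build_post("http://example.com/search", {"q": "a b"})
```

Here get_request.full_url contains the encoded parameters, while post_request.full_url does not and post_request.data holds them instead.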
I mainly use the GET method with Ajax and I haven't got any problems until now except the following:
Internet Explorer (unlike Firefox and Google Chrome) caches GET calls that use the same GET values.
So polling with Ajax GET at some interval can show the same results unless you change the URL for each Ajax GET, for example by appending an irrelevant random number.
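Others have covered the main points (context/idempotency, and size), but I'll add another: confidentiality. With SSL both the query string and the request body are encrypted in transit, but URLs still end up in server logs, browser history and Referer headers, so sensitive input args are better sent in a POST body.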
When we use the GET method in Ajax, only the content of the value of the field is sent, not the format in which the content is. For example, content in the text area is just added in the URL in case of the GET method (without a new line character). That is not the case in the POST method.

jQuery POST and GET methods: Construct URL or use data param?

I am using the post and get methods for Ajax calls, and have a general question. There are two methods I've seen people use on the web:
Construct the URL and parameters by hand
Use the data parameter
Both approaches work. I've included them below:
// Construct the POST URL by hand
queryStringDelimiter = "?";
settings.queryParam = "q";
$.post(settings.url + queryStringDelimiter + settings.queryParam + "=" + query, {}, callback, settings.contentType);
// Use the data param
$.post(settings.url, {q:query}, callback, settings.contentType);
Are there any situations where you would construct the URL and parameters by hand instead of using the built-in data parameter? Any advantages of one method over the other?
I'd say the data approach is better since it formalizes the process and reduces the chances of producing errors while string building. Besides, the jQuery library will do the string building for you, so it's basically the same amount of work.
I can think of no reason why one would construct them by hand unless they didn't know about the data parameter. If there are more than one or two parameters, it's also cleaner to keep them separated: if you have to loop through the data and possibly modify some values, you just iterate over the object instead of parsing a string manually.
If you let jQuery concatenate the data into the appropriately formatted string you...
avoid having to duplicate that code...
avoid worrying about escaping the data for transport...
can easily switch between GET and POST requests in the future...
Really, the only argument AGAINST using the data parameter is if you already have the data in a concatenated format.
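The escaping point is easy to demonstrate. The sketch below uses Python's standard library as a stand-in for what jQuery does with the data parameter; the sample query value is made up.

```python
from urllib.parse import parse_qs, urlencode

query = "rock & roll"

# Built by hand: the raw "&" is read as a parameter separator on the other side.
by_hand = "q=" + query
# Built from a mapping: the library escapes the value before concatenating.
encoded = urlencode({"q": query})

received_by_hand = parse_qs(by_hand)   # the value arrives truncated
received_encoded = parse_qs(encoded)   # the value arrives intact
```

Decoding the hand-built string yields only "rock " for q, while the encoded string round-trips to the original value.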
If I am using a GET I tend to just construct the URL, but when using POST I use the data parameter.
I do it because it is closer to how I was doing ajax calls before jQuery, when I wrote everything myself.

Resources