Is it possible to receive a gzipped response via elasticsearch-py? - elasticsearch

I have an API (via hug) that sits between a UI and Elasticsearch.
The API uses elasticsearch-py to run searches, for example:
import hug
from elasticsearch import Elasticsearch

es = Elasticsearch([URL], http_compress=True)

@hug.post('/search')
def search(body):
    return es.search(index='index', body=body)
This works fine; however, I cannot figure out how to obtain a compressed JSON result.
Elasticsearch is capable of this because a curl test checks out — the following returns a mess of characters to the console instead of JSON and this is what I want to emulate:
curl -X GET -H 'Accept-Encoding: gzip' <URL>/<INDEX>/_search
I've tried the approach here to modify HTTP headers, but interestingly enough the "Accept-Encoding": "gzip" header is already there: it just doesn't appear to be passed to Elastic because the result is always uncompressed.
Lastly, I'm passing http_compress=True when creating the Elastic instance; however, this only compresses the payload — not the result.
Has anyone had a similar struggle and figured it out?
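For reference, the "mess of characters" that curl prints is simply the gzip-encoded JSON body. Below is a minimal sketch, using only the Python standard library rather than elasticsearch-py, of what such a payload looks like and how an API layer could compress an already-decoded result itself. The helper names and the sample dict are hypothetical, and the sketch does not show how to make the client return the raw compressed bytes.

```python
import gzip
import json


def gzip_json(payload: dict) -> bytes:
    """Serialize a dict to JSON and gzip it, as a server would for 'Content-Encoding: gzip'."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def gunzip_json(raw: bytes) -> dict:
    """Reverse of gzip_json: decompress the bytes and parse the JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Hypothetical stand-in for a search result returned by the client.
sample = {"hits": {"total": 2, "hits": [{"_id": "1"}, {"_id": "2"}]}}
compressed = gzip_json(sample)

# gzip streams always start with the two magic bytes 0x1f 0x8b.
print(compressed[:2])
print(gunzip_json(compressed) == sample)
```

If the API layer returns bytes like these, it would also need to set a Content-Encoding: gzip response header so that the UI knows to decompress them.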

Related

Passing request_body with GET request?

In this Elasticsearch query example, it looks to me like query_string is passed in the request body of a GET request. Is that right? I thought a request body couldn't be sent with a GET request, so how can this example be valid?
GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}
In fact, when I used the COPY AS CURL option from that link, I got the following:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}
'
Am I missing something, or is the example wrong? I also don't see a way to send a request body with a GET request in Postman.
The fact is that you can send a GET request with a body. The current HTTP standard rfc7231 (obsoletes rfc2616 and updates rfc2817) does not strictly define what must happen to a GET request with a body. The previous versions were different in this regard. For that reason, some HTTP servers allow it, but some others don't, I'm afraid. This case is mentioned in the latest standard as follows:
A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.
In terms of Elasticsearch, using GET for a search request is a design decision. They feel it makes more sense semantically, because it represents a data-retrieving action better than the POST verb does.
On the other hand, as mentioned above, a GET request with a body is not universally supported. That's why Postman does not allow you to do so, although Kibana > Dev Tools does it by using cURL. For the same reason, the Elasticsearch search API also accepts POST requests. So, when you cannot make a GET request with a body, you can obtain exactly the same result by making a POST request.
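The POST fallback can be sketched as follows. This is a minimal illustration using Python's standard urllib; the helper name build_search_request is hypothetical, and the localhost URL is taken from the curl example in the question. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url: str, query: dict) -> urllib.request.Request:
    """Build a POST request to <base_url>/_search carrying the query as a JSON body."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = {
    "query": {
        "query_string": {
            "default_field": "content",
            "query": "this AND that OR thus",
        }
    }
}
request = build_search_request("http://localhost:9200", query)

# urllib.request.urlopen(request) would actually send it; that step is left out
# here because it needs a running cluster.
print(request.get_method(), request.full_url)
```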
This is actually a very interesting question. In fact, a lot of HTTP clients don't support GET requests with a body (I recently found that an iOS client in Cocoa isn't able to do so).
I have also had a lot of discussions about this with my colleagues. To me, after using Elasticsearch for a long time, GET with a body sounds like a perfectly fine HTTP request; however, some may argue that GET shouldn't carry a body at all according to the HTTP standard. I will leave that discussion out of this answer.
In general, this means that if you're using a client that doesn't support GET with a body, you can either change the request to POST or switch to another tool. I used to use cURL all the time, or Kibana Dev Tools if I needed to construct a complex query on the fly.

How do I log all queries in embedded ElasticSearch?

I'm trying to debug an ElasticSearch query. I've enabled explain for the problematic query, and that is showing that the query is doing a product of intermediate scores where it should be doing a sum. (I'm creating the query request using elastic4s.)
The problem is I cannot see what the generated query actually is. I want to determine whether the bug is in elastic4s (generating the query request incorrectly), in my code, or in elasticsearch. So I've enabled logging for the embedded elasticsearch instance used in the tests using the following code:
ESLoggerFactory.setDefaultFactory(new Slf4jESLoggerFactory())
val settings = Settings.settingsBuilder
  .put("path.data", dataDirPath)
  .put("path.home", "/var/elastic/")
  .put("cluster.name", clusterName)
  .put("http.enabled", httpEnabled)
  .put("index.number_of_shards", 1)
  .put("index.number_of_replicas", 0)
  .put("discovery.zen.ping.multicast.enabled", false)
  .put("index.refresh_interval", "10ms")
  .put("script.engine.groovy.inline.search", true)
  .put("script.engine.groovy.inline.update", true)
  .put("script.engine.groovy.inline.mapping", true)
  .put("index.search.slowlog.threshold.query.debug", "0s")
  .put("index.search.slowlog.threshold.fetch.debug", "0s")
  .build
but I can't find any queries being logged in the log file configured in my logback.xml. Other log messages from elasticsearch are appearing there, just not the actual queries.
You can't, at least not directly, and at least not in the ES versions currently available. It's something that has been discussed at some length (e.g. https://github.com/elastic/elasticsearch/issues/9172 and https://github.com/elastic/elasticsearch/issues/12187), and it seems like this may change soon with the rewrite of the tasks API. In the meantime, you can use things like ES Restlog (https://github.com/etsy/es-restlog) and/or put nginx in front of ES and capture the queries in the nginx logs. You can also use tcpdump (e.g. tcpdump -vvv -x -X -i any port 9200) and capture the query as it's running on the server. One last option is to modify your application so that it echoes the query instead of executing it (and/or inserts the query into ES itself before executing it, since the query itself is JSON).
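The last option, echoing the query from the application, might look roughly like this. It is a language-agnostic idea sketched in Python rather than Scala; run_logged, render_query and fake_execute are hypothetical names, and fake_execute stands in for whatever call actually sends the query to Elasticsearch.

```python
import json
import logging

logger = logging.getLogger("es.queries")


def render_query(query: dict) -> str:
    """Render the query as a stable JSON string suitable for a log line."""
    return json.dumps(query, sort_keys=True)


def run_logged(query: dict, execute):
    """Log the query body, then hand it to the real executor."""
    logger.info("ES query: %s", render_query(query))
    return execute(query)


def fake_execute(query: dict) -> dict:
    """Stand-in for the real search call; simply echoes the query back."""
    return {"took": 1, "echo": query}


result = run_logged({"query": {"match_all": {}}}, fake_execute)
```

Routing every search through one function like run_logged means the logged JSON is exactly what the application handed to the client, although not necessarily what the client put on the wire.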
In the specific case of elastic4s, it offers the ability to call .show on the elastic4s query object to generate what the JSON body part of the request would have been if the JSON-over-HTTP protocol had been used to send the request, for most types of request. This can then be logged at a convenient point in your code, e.g. if you have one method that generates all ES search queries. The code in Elasticsearch that generates the fake JSON could still have bugs of course, so it should not entirely be trusted. However, it's worth trying to reproduce the issue with the output of .show using Sense against a real Elasticsearch cluster over HTTP - if you can, you (a) know that it's not an elastic4s bug, and (b) can easily manipulate the JSON to try to figure out what's causing the problem.
show calls toString in some cases, so with the plain Elasticsearch API or another JVM-based wrapper on top of it, you can call that to get the JSON string to log.
With embedded Elasticsearch, this is as good as you're going to get in terms of logging - short of putting a breakpoint on the builder invocations and observing the actual Java Elasticsearch request objects that are created (which is the most accurate approach).

Retrieve POST response data with cURL

In jQuery/AJAX you can make POST requests and get their response with something like
$.post(url, data, function(res){
    //do something
});
Where res contains the server's response. I am trying to replicate this with cURL in bash.
curl -d "data=data" --cookie cookies.txt --header "Content-Type:text/html" https://example.com/path > result.html
Returns gibberish (some sort of js object maybe?), but I am expecting html. Is there a way to retrieve the data that would be in res using cURL?
Thanks in advance.
Sometimes servers send back compressed content. It looks like random garbage. Use curl --compressed to get the decompressed result.
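As a rough illustration of what a client has to do with such a response, here is a minimal Python sketch that decodes a body according to its Content-Encoding header value. The decode_body helper is hypothetical and only handles gzip and deflate; it is not how curl itself is implemented.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Return the decompressed body for the given Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        # HTTP "deflate" is normally the zlib format.
        return zlib.decompress(raw)
    # No (or unknown) encoding: pass the bytes through unchanged.
    return raw


html = b"<html><body>hello</body></html>"
print(decode_body(gzip.compress(html), "gzip") == html)
```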
It seems that you are trying to get data from a POST request. The server returns a JavaScript object, and you are retrieving it the right way; it is the JavaScript in the page that converts the returned data to HTML, not cURL. So basically you can't do that with cURL alone.

How to include an image in an http POST request

I'm trying to make a POST request from the command line to my Flask app, and I want it to include an image. But I don't know how to include it with the command. I've only used strings as data successfully.
So, if my POST request looks like this:
curl -i -H "Content-Type: application/json" -X POST -d '{"username":"user1", "password":"password", "image":##What do I put here?##}' http://localhost:5000/my_app/api/users
I don't know what to put in that image part of the JSON. I'm tagging flask in this question because it might be a specific answer with regards to flask.
I would like to include an actual image here, and then on the Flask side of things, put the image in a folder of the app where all the uploads go, then save the path to the image in the database for later access. But, to do that, I need to know how to send an image in the first place. Any thoughts?
It seems you're mixing things up here.
From your example, it seems you want to upload an image inside a JSON object. This is generally a bad idea for two reasons:
Overhead: the image data should be encoded in printable characters, e.g. using base64. This creates a huge overhead on the data itself causing the JSON decoder to slow down.
Testing: You can't try this using curl on the commandline. You should make some command line utility to test the request.
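The size part of that overhead is easy to quantify: base64 turns every 3 bytes into 4 characters, so the encoded image is about a third larger before the JSON decoder even sees it. A minimal sketch follows; the byte string is a hypothetical stand-in for real image data, and base64_size is an illustrative helper.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Stand-in for image data: 3072 arbitrary bytes.
image_bytes = bytes(range(256)) * 12
encoded = base64.b64encode(image_bytes)

print(len(image_bytes), len(encoded))
```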
HTTP already knows about data uploads. So, to maintain the JSON structure without the slowdown described above, you should upload your image as a traditional file upload in one form field, and send the JSON data structure in another field.
Using curl this is achieved with the -F option.
curl -i -Ffiledata=@file.jpg -Fdata='{"username":"user1", "password":"password"}' http://localhost:5000/my_app/api/users
With the above command you are sending an HTTP POST with a file upload named filedata and a data field containing your JSON payload, to be parsed in the receiving view handler.
You must use two HTTP form fields because an HTTP upload uses multipart encoding.
Behind the scenes
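With curl -F, the image is sent as its own part of the multipart body (as raw bytes by default, not base64), and the application framework knows exactly how to decode that part. A JSON parser, by contrast, would have to work through the encoded image while parsing, so the multipart approach is a lot faster.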

How do I find all data from a JSON POST request with curl

I am playing with the open data project at spogo.co.uk (sport england).
See here for a search example: https://spogo.co.uk/search#all/Football Pitch/near-london/range-5.
I have been using cygwin and curl to POST JSON data to the MVC controller. An example is below:
curl -i -X POST -k -H "Accept: application/json" -H "Content-Type: application/json; charset=utf-8" https://spogo.co.uk/search/all.json --data '{"searchString":"","address": {"latitude":55,"longitude":-3},"page":0}'
Question:
How can I find out what other variables can be included in the post data?
How can I return all results, rather than just 20 at a time? Cycling through page numbers doesn't deliver all at once.
AJAX is simply a technique for posting data over a connection asynchronously, and JSON is just a string format that can contain data. Neither has a built-in mechanism for querying information such as which fields are accepted or how much data is returned.
You will want to check the web service documentation on spogo.co.uk for these answers. If their web service exposes such functionality, they will be the final authority on what the commands and formats are.
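If the service only offers paging, the usual fallback is to walk the pages client-side and concatenate the results. Below is a minimal sketch, assuming (this is not confirmed for spogo.co.uk) that pages start at 0 and that an empty page marks the end. The names fetch_all and fake_fetch_page are hypothetical, and the fake fetcher stands in for the real curl/HTTP call.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int], List[Dict]], max_pages: int = 1000) -> List[Dict]:
    """Request page 0, 1, 2, ... until an empty page comes back, then return everything."""
    results: List[Dict] = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
    return results


# Fake data source: 45 records served 20 at a time.
DATA = [{"id": i} for i in range(45)]


def fake_fetch_page(page: int, size: int = 20) -> List[Dict]:
    return DATA[page * size:(page + 1) * size]


all_results = fetch_all(fake_fetch_page)
print(len(all_results))
```

The max_pages cap is only a safety net against a service that never returns an empty page.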

Resources