From curl's manpage:
Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
So if using
curl \
--retry 9999 \
--continue-at - \
https://mydomain.test/some.file.bin \
| target-program
and the download fails (once) half-way through, and the server supports range requests, will curl retry via a range request, so that target-program receives the full bytes of some.file.bin as its input?
From testing, curl will not retry using a range request.
To check, I wrote a deliberately broken HTTP server that requires the client to retry with a range request in order to receive the full response. Using wget
wget -O - http://127.0.0.1:8888/ | less
results in the full response
abcdefghijklmnopqrstuvwxyz
and I can see on the server side there was a request with 'Range': 'bytes=24-' in the request headers.
However, using curl
curl --retry 9999 --continue-at - http://127.0.0.1:8888/ | less
results in only the incomplete response, with no range request in the server log:
abcdefghijklmnopqrstuvwx
The Python server used:
import asyncio
import re

from aiohttp import web


async def main():
    data = b'abcdefghijklmnopqrstuvwxyz'

    async def handle(request):
        print(request.headers)

        # A too-short response with an exception that will close the
        # connection, so the client should retry
        if 'Range' not in request.headers:
            start = 0
            end = len(data) - 2
            data_to_send = data[start:end]
            headers = {
                'Content-Length': str(len(data)),
                'Accept-Ranges': 'bytes',
            }
            print('Sending headers', headers)
            print('Sending data', data_to_send)
            response = web.StreamResponse(
                headers=headers,
                status=200,
            )
            await response.prepare(request)
            await response.write(data_to_send)
            raise Exception()

        # Any range request
        match = re.match(r'^bytes=(?P<start>\d+)-(?P<end>\d+)?$', request.headers['Range'])
        start = int(match['start'])
        end = int(match['end']) + 1 if match['end'] else len(data)
        data_to_send = data[start:end]  # end is already exclusive here
        headers = {
            'Content-Range': 'bytes {}-{}/{}'.format(start, end - 1, len(data)),
            'Content-Length': str(len(data_to_send)),
        }
        print('Sending headers', headers)
        print('Sending data', data_to_send)
        response = web.StreamResponse(
            headers=headers,
            status=206,
        )
        await response.prepare(request)
        await response.write(data_to_send)
        await response.write_eof()
        return response

    app = web.Application()
    app.add_routes([web.get(r'/', handle)])
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '0.0.0.0', 8888)
    await site.start()
    await asyncio.Future()

asyncio.run(main())
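For comparison, the resume-on-failure behaviour that wget shows can be sketched with just the standard library. This is only an illustration of range-based resumption against the server above, not something curl provides, and the retry loop is deliberately minimal:

import http.client
import urllib.request

url = 'http://127.0.0.1:8888/'
received = b''
total = None

while total is None or len(received) < total:
    req = urllib.request.Request(url)
    if received:
        # Resume from where the previous attempt was cut off
        req.add_header('Range', 'bytes={}-'.format(len(received)))
    try:
        with urllib.request.urlopen(req) as resp:
            if total is None:
                total = int(resp.headers['Content-Length'])
            received += resp.read()
    except http.client.IncompleteRead as e:
        # Keep the bytes that did arrive, then loop and send a range request
        received += e.partial

print(received)  # b'abcdefghijklmnopqrstuvwxyz'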
Does the AsyncElasticsearch client open a new session for each async request?
AsyncElasticsearch (from elasticsearch-py) uses AIOHTTP. From what I understand, AIOHTTP recommends using a context manager for the aiohttp.ClientSession object, so as not to create a new session for each request:
async with aiohttp.ClientSession() as session:
...
I'm trying to speed up my bulk ingests.
How do I know if the AsyncElasticsearch client is using the same session, or setting up multiple?
Do I need the above async with... command in my code snippet below?
# %%------------------------------------------------------------------------------------
# Create async elastic client
async_es = AsyncElasticsearch(
    hosts=[os.getenv("ELASTIC_URL")],
    verify_certs=False,
    http_auth=(os.getenv("ELASTIC_USERNAME"), os.getenv("ELASTIC_PW")),
    timeout=60 * 60,
    ssl_show_warn=False,
)

# %%------------------------------------------------------------------------------------
# Upload csv to elastic
# Chunk files to keep memory low
with pd.read_csv(file, usecols=["attributes"], chunksize=50_000) as reader:
    for df in reader:
        # Upload to elastic with username as id
        async def generate_actions(df_chunk):
            for index, record in df_chunk.iterrows():
                doc = record.replace({np.nan: None}).to_dict()
                doc.update(
                    {"_id": doc["username"], "_index": "users"}
                )
                yield doc

        es_upl_chunk = 1000

        async def main():
            tasks = []
            for i in range(0, len(df), es_upl_chunk):
                tasks.append(
                    helpers.async_bulk(
                        client=async_es,
                        actions=generate_actions(df[i : i + es_upl_chunk]),
                        chunk_size=es_upl_chunk,
                    )
                )
            successes = 0
            errors = []
            print("Uploading to es...")
            progress = tqdm(unit=" docs", total=len(df))
            for task in asyncio.as_completed(tasks):
                resp = await task
                successes += resp[0]
                errors.extend(resp[1])
                progress.update(es_upl_chunk)
            return successes, errors

        responses = asyncio.run(main())
        print(f"Uploaded {responses[0]} documents from {file}")
        if len(responses[1]) > 0:
            print(
                f"WARNING: Encountered the following errors: {','.join(responses[1])}"
            )
It turned out that AsyncElasticsearch was not the right client to speed up bulk ingests in this case. I used the helpers.parallel_bulk() function instead.
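For reference, a minimal sketch of the parallel_bulk() approach, assuming a local Elasticsearch and the same "users" index and username-as-id scheme as above; the thread_count and chunk_size values are illustrative, not tuned:

import numpy as np
import pandas as pd
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(hosts=["http://localhost:9200"])

def generate_actions(df_chunk):
    # Same action shape as the async version above
    for _, record in df_chunk.iterrows():
        doc = record.replace({np.nan: None}).to_dict()
        doc.update({"_id": doc["username"], "_index": "users"})
        yield doc

df = pd.DataFrame({"username": ["alice", "bob"], "attributes": ["a", "b"]})

successes, errors = 0, []
for ok, item in helpers.parallel_bulk(es, generate_actions(df), thread_count=4, chunk_size=1000):
    if ok:
        successes += 1
    else:
        errors.append(item)
print(f"Uploaded {successes} documents, {len(errors)} errors")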
I need to translate an autocannon performance test into Locust Python code and reach the same requests-per-second criterion (> 3000).
This is the autocannon command:
AUTOCANNON="taskset -c 8-15 /opt/autocannon-tests/node_modules/.bin/autocannon --amount 100000 --connections 30 --bailout 5 --json"
$AUTOCANNON $URL/applications -m PUT -H "Content-Type:application/json" -H "Authorization=$AUTHORIZATION_HEADER" -b '{"name":"test"}'
With autocannon I managed to reach more than 3000 requests per second.
I then wrote this Python code:
class _PerformanceTask(SequentialTaskSet):
    def __init__(self, *args, **kwargs):
        SequentialTaskSet.__init__(self, *args, **kwargs)
        self.username = 'admin'
        self.password = 'admin'
        self.token = None
        self.identifier = time.time()
        self.error = None
        self.as3_user_id = None
        self.non_admin_user_token = None
        self.as3_user_token = None
        self.system_id = None
        self.open_api_retrieve_count = 0
        self.declare_id = None
        self.network_id = None
        self.irule_app = None
        self.irule_network_id = None
        self.application_editor_user = None

    def on_start(self):
        self.login()

    def _log(self, fmt, *args):
        print('[%s] %s' % (self.identifier, fmt % args))

    def _request(self, method, path, non_admin_user_token=False, headers=None, **kwargs):
        self._log('[%s]%s', method, path)
        self._log('%s', repr(kwargs))
        if not headers:
            headers = {'Content-Type': 'application/json'}
        if self.token:
            headers['Authorization'] = 'Bearer %s' % self.token
        if non_admin_user_token:
            headers['Authorization'] = 'Bearer %s' % self.non_admin_user_token
        resp = self.client.request(method, path, headers=headers, **kwargs)
        self._log('resp status code: %s', resp.status_code)
        self._log('resp content: %s', resp.text)
        assert resp.status_code in (200, 201, 204, 202)
        if re.search(r'^[\[\{]', resp.text):
            return resp.json()
        return resp.text

    def login(self):
        self._log('login')
        resp = self._request(
            method='GET',
            path='/login',
            auth=(self.username, self.password),
        )
        self.token = resp['token']
        self._log('token is: %s', self.token)

    @task
    def run_performance(self):
        self._log('PUT request to $URL/applications with auth. header.')
        resp = self._request(
            method='PUT',
            path='/applications',
            json={
                "name": "test",
            }
        )
        self._log('response is: %s', resp)


class PerformanceTask(FastHttpUser):
    tasks = [_PerformanceTask]
Note: I am using FastHttpUser and have locust-plugins installed.
But I can't reach the same result.
These are the ways I run the performance.py script:
locust --locustfile performance.py --host https://localhost:5443/api/v1 --headless -u 30 -i 100000
and also distributed:
locust --locustfile performance.py --host https://localhost:5443/api/v1 --headless -u 30 -i 10000 --master --expect-workers=8
and start workers like:
locust --locustfile performance.py --worker --master-host=127.0.0.1 -i 10000 &
Either way, I get a table of results and the rate is much lower no matter how I run it:
req/s failures/s
224.49 0.00
I hope you have ideas.
I'm not familiar with autocannon so I'm not entirely sure, but a quick look through the documentation suggests that --connections doesn't translate to Locust's --users/-u; it's described as "The number of concurrent connections to use." To get something similar, I believe you'd have to set up a FastHttpSession and specify concurrency there. Something like:
fast_http_session = FastHttpSession(environment=env, base_url="https://localhost:5443/api/v1", user=None, concurrency=30)
You'll need to get the environment from Locust when it runs to pass it into there, and may or may not want to specify your actual user (which you can pass as self if you put this in your user class).
But that should get you the number of concurrent connections to use, and then you'd want to crank up the number of users you spawn. As you make your calls using the session you created, the users will reuse the 30 open connections, it will just be up to you to discover how many users you need to spawn to "saturate" the connections like autocannon claims to do and/or how many the machine you run it on can handle.
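To make that concrete, here is a hedged sketch of a user class that creates one shared FastHttpSession in on_start(); it assumes the concurrency keyword shown above is accepted by your locust/locust-plugins versions, and the host URL mirrors the question:

from locust import task
from locust.contrib.fasthttp import FastHttpSession, FastHttpUser

class PerformanceUser(FastHttpUser):
    host = "https://localhost:5443/api/v1"

    def on_start(self):
        # self.environment is supplied by Locust at runtime
        self.session = FastHttpSession(
            environment=self.environment,
            base_url=self.host,
            user=self,
            concurrency=30,
        )

    @task
    def put_application(self):
        # All users funnel their requests through the 30 shared connections
        self.session.request("PUT", "/applications", json={"name": "test"})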
I would like to receive an image via a POST request and then read it. I'm trying to do it this way:
import numpy as np
from PIL import Image
from fastapi import FastAPI, File, UploadFile, HTTPException, Depends
from loguru import logger
from starlette.status import HTTP_422_UNPROCESSABLE_ENTITY

app = FastAPI()


@app.post("/predict_image")
@logger.catch
def predict_image(predict_image: UploadFile = File(...)):
    logger.info('predict_image POST request performed')
    try:
        pil_image = np.array(Image.open(predict_image.file))
    except Exception:
        raise HTTPException(
            status_code=HTTP_422_UNPROCESSABLE_ENTITY, detail="Unable to process file"
        )
    pred = pil_image.shape
    logger.info('predict_image POST request performed, shape {}'.format(pred))
    return {'input_shape': pred}
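For context, the endpoint is exercised with a multipart upload along these lines (a sketch using the requests library; the file path and port are assumptions):

import requests

# The form field name must match the endpoint's parameter name, "predict_image"
with open("test.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/predict_image",
        files={"predict_image": f},
    )
print(resp.status_code, resp.text)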
Calling the POST request returns INFO: 127.0.0.1:59364 - "POST /predict_image HTTP/1.1" 400 Bad Request.
How can I fix it?
UPD:
The example from the official tutorial returns the same error:
@app.post("/uploadfile/")
async def create_upload_file(file: UploadFile = File(...)):
    return {"filename": file.filename}
You probably need to install python-multipart; FastAPI relies on it to parse the multipart/form-data bodies that File and UploadFile parameters receive.
Just:
pip install python-multipart
Fixed this way:
from io import BytesIO


@app.post("/predict_image/")
@logger.catch
def make_inference(file: bytes = File(...)):
    try:
        pil_image = np.array(Image.open(BytesIO(file)))
    except Exception:
        raise HTTPException(
            status_code=HTTP_422_UNPROCESSABLE_ENTITY, detail="Unable to process file"
        )
    return {'input_shape': pil_image.shape}
This is inspired by ipython-notebook-proxy and based on ipydra, extending the latter to support more complex user authentication as well as a proxy, because in my use case only port 80 can be exposed.
I am using flask-sockets for the gunicorn worker, but I am having trouble proxying WebSockets. IPython uses three different WebSocket connections, /shell, /stdin, and /iopub, but I am only able to get the 101 Switching Protocols response for the first two. And /stdin receives a Connection Close frame as soon as it is created.
This is the excerpt of the code in question:
# Flask imports...
from werkzeug import LocalProxy
from ws4py.client.geventclient import WebSocketClient

# I use my own LocalProxy because flask-sockets does not support Werkzeug Rules
websocket = LocalProxy(lambda: request.environ.get('wsgi.websocket', None))
websockets = {}

PROXY_DOMAIN = "127.0.0.1:8888"  # IPython host and port

methods = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH",
           "CONNECT"]


@app.route('/', defaults={'url': ''}, methods=methods)
@app.route('/<path:url>', methods=methods)
def proxy(url):
    with app.test_request_context():
        if websocket:
            while True:
                data = websocket.receive()
                websocket_url = 'ws://{}/{}'.format(PROXY_DOMAIN, url)
                if websocket_url not in websockets:
                    client = WebSocketClient(websocket_url,
                                             protocols=['http-only', 'chat'])
                    websockets[websocket_url] = client
                else:
                    client = websockets[websocket_url]
                client.connect()
                if data:
                    client.send(data)
                client_data = client.receive()
                if client_data:
                    websocket.send(client_data)
    return Response()
I also tried to create my own WebSocket proxy class, but it doesn't work either.
class WebSocketProxy(WebSocketClient):
    def __init__(self, to, *args, **kwargs):
        self.to = to
        print(("Proxy to", self.to))
        super(WebSocketProxy, self).__init__(*args, **kwargs)

    def opened(self):
        m = self.to.receive()
        print("<= %d %s" % (len(m), str(m)))
        self.send(m)

    def closed(self, code, reason):
        print(("Closed down", code, reason))

    def received_message(self, m):
        print("=> %d %s" % (len(m), str(m)))
        self.to.send(m)
The regular request-response cycle works like a charm, so I removed that code. If you're interested, the complete code is hosted in hidra.
I run the server with
$ gunicorn -k flask_sockets.worker hidra:app
Here is my solution(ish). It is crude, but it should serve as a starting point for building a websocket proxy. The full code is available in the unreleased project pyramid_notebook.
This uses ws4py and uWSGI instead of gunicorn.
We use uWSGI's internal mechanism to receive the downstream websocket message loop. There is nothing like WSGI for websockets in the Python world (yet?), but it looks like every web server implements its own mechanism.
A custom ws4py ProxyClient is created which combines the ws4py event loop with the uWSGI event loop.
The thing is started and messages start flying around.
This uses a Pyramid request (based on WebOb), but that really shouldn't matter; the code should be fine for any Python WSGI app with small modifications.
As you can see, this does not really take advantage of asynchronicity; it just sleep()s if there is nothing coming in from the socket.
Code goes here:
"""UWSGI websocket proxy."""
from urllib.parse import urlparse, urlunparse
import logging
import time
import uwsgi
from ws4py import WS_VERSION
from ws4py.client import WebSocketBaseClient
#: HTTP headers we need to proxy to upstream websocket server when the Connect: upgrade is performed
CAPTURE_CONNECT_HEADERS = ["sec-websocket-extensions", "sec-websocket-key", "origin"]
logger = logging.getLogger(__name__)
class ProxyClient(WebSocketBaseClient):
"""Proxy between upstream WebSocket server and downstream UWSGI."""
#property
def handshake_headers(self):
"""
List of headers appropriate for the upgrade
handshake.
"""
headers = [
('Host', self.host),
('Connection', 'Upgrade'),
('Upgrade', 'websocket'),
('Sec-WebSocket-Key', self.key.decode('utf-8')),
# Origin is proxyed from the downstream server, don't set it twice
# ('Origin', self.url),
('Sec-WebSocket-Version', str(max(WS_VERSION)))
]
if self.protocols:
headers.append(('Sec-WebSocket-Protocol', ','.join(self.protocols)))
if self.extra_headers:
headers.extend(self.extra_headers)
logger.info("Handshake headers: %s", headers)
return headers
def received_message(self, m):
"""Push upstream messages to downstream."""
# TODO: No support for binary messages
m = str(m)
logger.debug("Incoming upstream WS: %s", m)
uwsgi.websocket_send(m)
logger.debug("Send ok")
def handshake_ok(self):
"""
Called when the upgrade handshake has completed
successfully.
Starts the client's thread.
"""
self.run()
def terminate(self):
raise RuntimeError("NO!")
super(ProxyClient, self).terminate()
def run(self):
"""Combine async uwsgi message loop with ws4py message loop.
TODO: This could do some serious optimizations and behave asynchronously correct instead of just sleep().
"""
self.sock.setblocking(False)
try:
while not self.terminated:
logger.debug("Doing nothing")
time.sleep(0.050)
logger.debug("Asking for downstream msg")
msg = uwsgi.websocket_recv_nb()
if msg:
logger.debug("Incoming downstream WS: %s", msg)
self.send(msg)
s = self.stream
self.opened()
logger.debug("Asking for upstream msg")
try:
bytes = self.sock.recv(self.reading_buffer_size)
if bytes:
self.process(bytes)
except BlockingIOError:
pass
except Exception as e:
logger.exception(e)
finally:
logger.info("Terminating WS proxy loop")
self.terminate()
def serve_websocket(request, port):
"""Start UWSGI websocket loop and proxy."""
env = request.environ
# Send HTTP response 101 Switch Protocol downstream
uwsgi.websocket_handshake(env['HTTP_SEC_WEBSOCKET_KEY'], env.get('HTTP_ORIGIN', ''))
# Map the websocket URL to the upstream localhost:4000x Notebook instance
parts = urlparse(request.url)
parts = parts._replace(scheme="ws", netloc="localhost:{}".format(port))
url = urlunparse(parts)
# Proxy initial connection headers
headers = [(header, value) for header, value in request.headers.items() if header.lower() in CAPTURE_CONNECT_HEADERS]
logger.info("Connecting to upstream websockets: %s, headers: %s", url, headers)
ws = ProxyClient(url, headers=headers)
ws.connect()
# Happens only if exceptions fly around
return ""
We are developing a WP8 app that requires push notifications.
To test it, we have run the push notification POST request with the curl command line, making sure that it actually connects, authenticates with the client SSL certificate, and sends the correct data. We know for a fact that this works, as we are receiving pushes on the devices.
This is the curl command we have been using for testing purposes:
curl --cert client_cert.pem -v -H "Content-Type:text/xml" -H "X-WindowsPhone-Target:Toast" -H "X-NotificationClass:2" -X POST -d "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>" https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE
Of course our SSL cert is needed to actually use the URL, but I was hoping someone else has done this and can see what we are doing wrong.
Now, our problem is that we need to make this work with Ruby instead, something we have been unable to do so far.
We have tried HTTParty and also net/http directly, both without any luck.
Here is a very simple HTTParty test script I have been testing with:
require "httparty"
payload = "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>"
uri = "https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE"
opts = {
body: payload,
headers: {
"Content-Type" => "text/xml",
"X-WindowsPhone-Target" => "Toast",
"X-NotificationClass" => "2"
},
debug_output: $stderr,
pem: File.read("/Users/kenny/Desktop/client_cert.pem"),
ca_file: File.read('/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt')
}
resp = HTTParty.post uri, opts
puts resp.code
This seems to connect via SSL properly, but then the MS IIS server returns 403 for some reason we don't understand.
Here is essentially the same thing I've tried using net/http:
require "net/http"
url = URI.parse "https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE"
payload = "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>"
pem_path = "./client_cert.pem"
cert = File.read pem_path
http = Net::HTTP.new url.host, url.port
http.use_ssl = true
http.cert = OpenSSL::X509::Certificate.new cert
http.key = OpenSSL::PKey::RSA.new cert
http.ca_path = '/etc/ssl/certs' if File.exists?('/etc/ssl/certs') # Ubuntu
http.ca_file = '/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt' if File.exists?('/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt') # Mac OS X
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
r = Net::HTTP::Post.new url.path
r.body = payload
r.content_type = "text/xml"
r["X-WindowsPhone-Target"] = "toast"
r["X-NotificationClass"] = "2"
http.start do
  resp = http.request r
  puts resp.code, resp.body
end
Like the HTTParty version, this also returns 403.
I'm starting to get the feeling that this won't actually work with net/http, but I've also seen a few examples of code claiming to work, but I can't see any difference compared to what we have tested with here.
Does anyone know how to fix this? Is it possible? Should I use libcurl instead perhaps? Or even do a system call to curl? (I may have to do the last one as an interim solution if we can't get this to work soon).
Any input is greatly appreciated!
Thanks,
Kenny
Try using a tool like http://mitmproxy.org to compare the requests from your code and from curl.
For example, in addition to the headers you specify, curl sends User-Agent and Accept headers; the Microsoft servers may be checking for these for some reason.
If this does not help, then it's SSL-related.
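If setting up a proxy is too heavy, one crude alternative is to point both curl and the Ruby script at a throwaway plain-HTTP listener and diff the raw requests by eye. A minimal sketch (the port is arbitrary, and this obviously skips the TLS/client-certificate part of the exchange):

import socket

# Prints whatever the client sends, so the headers from curl and Ruby can be compared
srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 9999))
srv.listen(1)
while True:
    conn, _ = srv.accept()
    print(conn.recv(65536).decode("latin-1"))
    conn.close()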