How do I get useful diagnostics from boto? - amazon-ec2

How can I get useful diagnostics out of boto? All I ever seem to get is the infuriatingly useless "400 Bad Request". I recognize that boto is just passing along what the underlying API makes available, but surely there's some way to get something more useful than "Bad Request".
Traceback (most recent call last):
File "./mongo_pulldown.py", line 153, in <module>
main()
File "./mongo_pulldown.py", line 24, in main
print "snap = %r" % snap
File "./mongo_pulldown.py", line 149, in __exit__
self.connection.delete_volume(self.volume.id)
File "/home/roy/deploy/current/python/local/lib/python2.7/site-packages/boto/ec2/connection.py", line 1507, in delete_volume
return self.get_status('DeleteVolume', params, verb='POST')
File "/home/roy/deploy/current/python/local/lib/python2.7/site-packages/boto/connection.py", line 985, in get_status
raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request

I didn't have much luck with putting the debug setting in the config file, but the call to ec2.connect_to_region() takes a debug parameter, with the same values as in j0nes' answer.
ec2 = boto.ec2.connect_to_region("eu-west-1", debug=2)
Everything that connection object sends/receives will get dumped to stdout.

You can configure the boto.cfg file to be more verbose:
[Boto]
debug = 2
debug: Controls the level of debug messages that will be printed by
the boto library. The following values are defined:
0 - no debug messages are printed
1 - basic debug messages from boto are printed
2 - all boto debugging messages plus request/response messages from httplib

Related

Rasa Timeout Issue

When running Rasa (tried on versions 1.3.3, 1.3.7, 1.3.8) I encounter this timeout exception message almost every time I make a call. I am running a simple program that recognises when a user offers their age, and stores the age in a database through an action response:
Bot loaded. Type a message and press enter (use '/stop' to exit):
Your input -> I am 24 years old
2019-10-10 13:29:33 ERROR asyncio - Task exception was never retrieved
future: <Task finished coro=<configure_app.<locals>.run_cmdline_io() done, defined at /Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/rasa/core/run.py:123> exception=TimeoutError()>
Traceback (most recent call last):
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/rasa/core/run.py", line 127, in run_cmdline_io
server_url=constants.DEFAULT_SERVER_FORMAT.format("http", port)
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/rasa/core/channels/console.py", line 138, in record_messages
async for response in bot_responses:
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/async_generator/_impl.py", line 366, in step
return await ANextIter(self._it, start_fn, *args)
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/async_generator/_impl.py", line 205, in throw
return self._invoke(self._it.throw, type, value, traceback)
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/async_generator/_impl.py", line 209, in _invoke
result = fn(*args)
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/rasa/core/channels/console.py", line 103, in send_message_receive_stream
async for line in resp.content:
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/aiohttp/streams.py", line 40, in __anext__
rv = await self.read_func()
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/aiohttp/streams.py", line 329, in readline
await self._wait('readline')
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/aiohttp/streams.py", line 297, in _wait
await waiter
File "/Users/Kami/Documents/rasa/venv/lib/python3.7/site-packages/aiohttp/helpers.py", line 585, in __exit__
raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
Transport closed # ('127.0.0.1', 63319) and exception experienced during error handling
Originally I thought this timeout was being caused by using large lookup tables for another part of my Rasa program, but for age recognition I am using a simple regex:
## regex:age
- ^(0?[1-9]|[1-9][0-9]|[1][1-9][1-9])$
And even this also causes the timeout.
Please help me solve this. I don't even need to avoid the timeout, I just want to know where I can catch/ignore this exception.
Thanks!
I was fetching data from an API wherein I was getting a Timeout error because it was not able to fetch the data in the default time limit :
Go to the directory: venv/Lib/site-packages/rasa/core/channels/console.py
Change the default value of DEFAULT_STREAM_READING_TIMEOUT_IN_SECONDS to more than 10, in my case I changed it to 30 it worked.
Another reason could be fetching of data again and again within a short period of time which could result in a timeout.
Observations :
When DEFAULT_STREAM_READING_TIMEOUT_IN_SECONDS is set to 10 i get timeout error
When DEFAULT_STREAM_READING_TIMEOUT_IN_SECONDS is set to 30 and keep on running rasa shell again and again I get a timeout error
When DEFAULT_STREAM_READING_TIMEOUT_IN_SECONDS is set to 30 and run rasa shell not frequently it functions perfectly.
Make sure that you uncomment the below code
action_endpoint:
url: "http://localhost:5055/webhook"
in the endpoints.yml. It is used when you are making custom actions to query database.
I had the same problem and it was not solved by increasing timeout.
Make sure you are sending back a 'string' to the rasa shell from rasa action sever. What I mean is, if you are using 'text = ' in your utter_message, make sure that the async result is also a string and not just an object or something else. Change the type if required.
dispatcher.utter_message(text='has to be a string')
Running 'rasa shell -vv' showed me that it is receiving an object and that is why it is not able to parse it, hence timeout.
I can't comment now, but add followup to Vishal response. To check that hooks are present and waiting for connection you can use -vv command line switch. This display all available hooks at startup. For example:
2020-04-21 14:05:56 DEBUG rasa.core.utils - Available web server routes:
/webhooks/rasa GET custom_webhook_RasaChatInput.health
/webhooks/rasa/webhook POST custom_webhook_RasaChatInput.receive
/webhooks/rest GET custom_webhook_RestInput.health
/webhooks/rest/webhook POST custom_webhook_RestInput.receive
/ GET hello

HDFS: Errno 22 on attempt to edit an existing file in mounted NFS volume

Summary: I mounted HDFS nfs volume in OSX and it will not let me edit existing files. I can append and create files with content but not "open them with write flag".
Originally, I asked about a particular problem with JupyterLab failing to save notebooks into nfs mounted volumes but while trying to dig down to the roots, I realized (hopefully right) that it's about editing existing files.
I mounted HDFS nfs on OSX and I can access the files, read and write and whatnot. JupyterLab though can do pretty much everything but can't really save notebooks.
I was able to identify the pattern for what's really happening and the problem boils down to this: you can't open existing files in nfs volume for write:
This will work with a new file:
with open("rand.txt", 'w') as f:
f.write("random text")
But if you try to run it again (the file has been created now and the content is there), you'll get the following Exception:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-15-94a46812fad4> in <module>()
----> 1 with open("rand.txt", 'w') as f:
2 f.write("random text")
OSError: [Errno 22] Invalid argument: 'rand.txt'
I am pretty sure the permissions and all are ok:
with open("seven.txt", 'w') as f:
f.write("random text")
f.writelines(["one","two","three"])
r = open("seven.txt", 'r')
print(r.read())
random textonetwothree
I can also append to files no problem:
aleksandrs-mbp:direct sasha$ echo "Another line of text" >> seven.txt && cat seven.txt
random textonetwothreeAnother line of text
I mount it with the following options:
aleksandrs-mbp:hadoop sasha$ mount -t nfs -o
vers=3,proto=tcp,nolock,noacl,sync localhost:/
/srv/ti/jupyter-samples/~Hadoop
Apache documentation suggests that NFS gateway does not support random writes. I tried looking at mount documentation but could not find anything specific that points to enforcing sequential writing. I tried playing with different options but it doesn't seem to help much.
This is the exception I get from JupyterLab when it's trying to save the notebook:
[I 03:03:33.969 LabApp] Saving file at /~Hadoop/direct/One.ipynb
[E 03:03:33.980 LabApp] Error while saving file: ~Hadoop/direct/One.ipynb [Errno 22] Invalid argument: '/srv/ti/jupyter-samples/~Hadoop/direct/.~One.ipynb'
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 471, in save
self._save_notebook(os_path, nb)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 293, in _save_notebook
with self.atomic_writing(os_path, encoding='utf-8') as f:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 82, in __enter__
return next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 213, in atomic_writing
with atomic_writing(os_path, *args, log=self.log, **kwargs) as f:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 82, in __enter__
return next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 103, in atomic_writing
copy2_safe(path, tmp_path, log=log)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 51, in copy2_safe
shutil.copyfile(src, dst)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/shutil.py", line 115, in copyfile
with open(dst, 'wb') as fdst:
OSError: [Errno 22] Invalid argument: '/srv/ti/jupyter-samples/~Hadoop/direct/.~One.ipynb'
[W 03:03:33.981 LabApp] 500 PUT /api/contents/~Hadoop/direct/One.ipynb?1534835013966 (::1): Unexpected error while saving file: ~Hadoop/direct/One.ipynb [Errno 22] Invalid argument: '/srv/ti/jupyter-samples/~Hadoop/direct/.~One.ipynb'
[W 03:03:33.981 LabApp] Unexpected error while saving file: ~Hadoop/direct/One.ipynb [Errno 22] Invalid argument: '/srv/ti/jupyter-samples/~Hadoop/direct/.~One.ipynb'
This is what I see in the NFS logs at the same time:
2018-08-21 03:05:34,006 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 16417
2018-08-21 03:05:34,006 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 16417
Not exactly sure what this means but if I understand the RFC, it should be part of implementation:
Servers must support extending the file size via SETATTR.
I understand the complexity behind mounting hdfs and letting clients write all they want while keeping these files distributed and maintain integrity. Is there though a compromise that would enable writes via nfs?

Computing Sky View Factor in GrassGis

Hy community,
I´m currently working on my Master Thesis and I have to compute the "sky view factor". Since ESRI Arcmap is not a helpful choice to do that, I found that it is fairly easy to compute with GrassGIS (V.7) using the r.skyview command.
But I get an error message in the logfile i can´t really deal with. Hope that someone of you is experienced with that kind of problem and can help me out with this.
Here is what the GrassGIS output says:
*(Fri Jan 09 16:17:10 2015)
r.skyview input=Subset#PERMANENT output=Subset_SVF ndir=16 maxdistance=15.0
Unknown module parameter "keyword" at line 21
Unknown module parameter "keyword" at line 22
FEHLER: Value <rast> ambiguous for parameter <type>
Valid options: raster,raster_3d,vector,old_vector,ascii_vector,labels,region,group,all
Traceback (most recent call last):
File "C:\Users\Axel-HP\AppData\Roaming\GRASS7\addons/scripts/r.skyview.py", line 120, in <module>
sys.exit(main())
File "C:\Users\Axel-HP\AppData\Roaming\GRASS7\addons/scripts/r.skyview.py", line82, in main
old_maps = _get_horizon_maps()
File "C:\Users\Axel-HP\AppData\Roaming\GRASS7\addons/scripts/r.skyview.py", line 114, in_get_horizon_maps
pattern=TMP_NAME + "*")[gcore.gisenv()['MAPSET']]
File "C:\Temp\GRASSGIS7\etc\python\grass\script\core.py", line 1176, in list_grouped
type=types, pattern=pattern,
exclude=exclude).splitlines():
File "C:\Temp\GRASSGIS7\etc\python\grass\script\core.py", line 425, in read_command
return handle_errors(returncode, stdout, args, kwargs)
File "C:\Temp\GRASSGIS7\etc\python\grass\script\core.py", line 308, in handle_errors
returncode=returncode)
grass.exceptions.CalledModuleError: Module run None
['g.list', '--q', '-m', 'type=rast', 'pattern=tmp_horizon_2340*'] ended with error
Process ended with non-zero return code 1. See errors in the (error) output.
(Fri Jan 09 16:17:11 2015) Befehl ausgeführt (1 Sek)*
I just tested r.skyview and it is working. There were big changes in GRASS module parameter names recently which caused the trouble, but now it should work without problems.

HUE Web UI will not login first time

I've installed CDH 4.2.1 and now I'm trying to access HUE Web UI for the first time. I enter a new user name and password, click Sign Up, and wait and wait and nothing happens 20 minutes. If I open another window and try to access the login page then I get a message that the database is locked.
I'm running on a single node. And following is the error message for the second window:
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/eventlet-0.9.14-py2.6.egg/eventlet/wsgi.py", line 336, in handle_one_response
result = self.application(self.environ, start_response)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/handlers/wsgi.py", line 245, in __call__
response = middleware_method(request, response)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/sessions/middleware.py", line 36, in process_response
request.session.save()
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/contrib/sessions/backends/db.py", line 63, in save
obj.save(force_insert=must_create, using=using)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/base.py", line 434, in save
self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/base.py", line 500, in save_base
rows = manager.using(using).filter(pk=pk_val)._update(values)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/query.py", line 491, in _update
return query.get_compiler(self.db).execute_sql(None)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/sql/compiler.py", line 861, in execute_sql
cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/sql/compiler.py", line 727, in execute_sql
cursor.execute(sql, params)
File "/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/backends/sqlite3/base.py", line 200, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: database is locked
Any idea?
Thank you,
Roberto.
The error here means it is trying to connect to Sqlite and fails as the first connection is still not finished (and Sqlite is not concurrent) so it is not very useful here.
If would look in the hue logs, especially 'runcpserver.log' if there is more information.
Adding 'export DESKTOP_DEBUG=1' in the environment and restarting Hue might give more details.
I would go on http://HUE_SERVER:HUE_PORT/dump_config, look at the 'database' value and delete the file and run a /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/share/hue syncb or sync database if in CM.
It will recreate the database and make sure no other process is using it.
If it still does not work I would give a try to MySQL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_15_8.html

Tornado app halting regularly for few seconds with 100% CPU

I am trying to troubleshoot an app running on tornado 2.4 on Ubuntu 11.04 on EC2. It appears to be hitting 100% CPU regularly and halts at that request for few seconds.
Any help on this is greatly appreciated.
Symptoms:
top shows 100% cpu just at the time it halts. Normally server is about 30-60% cpu utilization.
It halts every 2-5 minutes just for one request. I have checked that there are no cronjobs affecting this.
It halts for about 2 to 9 seconds. Problem goes away on restarting tornado and worsens with tornado uptime. Longer the server is up, for longer duration it halts.
Http requests for which the problem appears do not seem to have any pattern.
Interestingly, next request in log sometimes sometimes matches the halting duration and some times does not. Example:
00:00:00 GET /some/request ()
00:00:09 GET /next/request (9000ms)
00:00:00 GET /some/request ()
00:00:09 GET /next/request (1ms)
# 9 seconds gap in requests is certainly not possible as clients are constantly polling.
Database (mongodb) shows no expensive or large number of queries. No page faults. Database is on the same machine - local disk.
vmstat shows no change in read/write sizes compared to last few minutes.
tornado is running behind nginx.
sending SIGINT when it was most likely halting, gives different stacktrace everytime. Some of them are below:
Traceback (most recent call last):
File "chat/main.py", line 3396, in <module>
main()
File "chat/main.py", line 3392, in main
tornado.ioloop.IOLoop.instance().start()
File "/home/ubuntu/tornado/tornado/ioloop.py", line 515, in start
self._run_callback(callback)
File "/home/ubuntu/tornado/tornado/ioloop.py", line 370, in _run_callback
callback()
File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
callback(*args, **kwargs)
File "/home/ubuntu/tornado/tornado/iostream.py", line 303, in wrapper
callback(*args)
File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
callback(*args, **kwargs)
File "/home/ubuntu/tornado/tornado/httpserver.py", line 298, in _on_request_body
self.request_callback(self._request)
File "/home/ubuntu/tornado/tornado/web.py", line 1421, in __call__
handler = spec.handler_class(self, request, **spec.kwargs)
File "/home/ubuntu/tornado/tornado/web.py", line 126, in __init__
application.ui_modules.iteritems())
File "/home/ubuntu/tornado/tornado/web.py", line 125, in <genexpr>
self.ui["_modules"] = ObjectDict((n, self._ui_module(n, m)) for n, m in
File "/home/ubuntu/tornado/tornado/web.py", line 1114, in _ui_module
def _ui_module(self, name, module):
KeyboardInterrupt
Traceback (most recent call last):
File "chat/main.py", line 3398, in <module>
main()
File "chat/main.py", line 3394, in main
tornado.ioloop.IOLoop.instance().start()
File "/home/ubuntu/tornado/tornado/ioloop.py", line 515, in start
self._run_callback(callback)
File "/home/ubuntu/tornado/tornado/ioloop.py", line 370, in _run_callback
callback()
File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
callback(*args, **kwargs)
File "/home/ubuntu/tornado/tornado/iostream.py", line 303, in wrapper
callback(*args)
File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
callback(*args, **kwargs)
File "/home/ubuntu/tornado/tornado/httpserver.py", line 285, in _on_headers
self.request_callback(self._request)
File "/home/ubuntu/tornado/tornado/web.py", line 1408, in __call__
transforms = [t(request) for t in self.transforms]
File "/home/ubuntu/tornado/tornado/web.py", line 1811, in __init__
def __init__(self, request):
KeyboardInterrupt
Traceback (most recent call last):
File "chat/main.py", line 3351, in <module>
main()
File "chat/main.py", line 3347, in main
tornado.ioloop.IOLoop.instance().start()
File "/home/ubuntu/tornado/tornado/ioloop.py", line 571, in start
self._handlers[fd](fd, events)
File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
callback(*args, **kwargs)
File "/home/ubuntu/tornado/tornado/netutil.py", line 342, in accept_handler
callback(connection, address)
File "/home/ubuntu/tornado/tornado/netutil.py", line 237, in _handle_connection
self.handle_stream(stream, address)
File "/home/ubuntu/tornado/tornado/httpserver.py", line 156, in handle_stream
self.no_keep_alive, self.xheaders, self.protocol)
File "/home/ubuntu/tornado/tornado/httpserver.py", line 183, in __init__
self.stream.read_until(b("\r\n\r\n"), self._header_callback)
File "/home/ubuntu/tornado/tornado/iostream.py", line 139, in read_until
self._try_inline_read()
File "/home/ubuntu/tornado/tornado/iostream.py", line 385, in _try_inline_read
if self._read_to_buffer() == 0:
File "/home/ubuntu/tornado/tornado/iostream.py", line 401, in _read_to_buffer
chunk = self.read_from_fd()
File "/home/ubuntu/tornado/tornado/iostream.py", line 632, in read_from_fd
chunk = self.socket.recv(self.read_chunk_size)
KeyboardInterrupt
Any tips on how to troubleshoot this is greatly appreciated.
Further observations:
strace -p, during the time it hangs, shows empty output.
ltrace -p during hang time shows only free() calls in large numbers:
free(0x6fa70080) =
free(0x1175f8060) =
free(0x117a5c370) =
It sounds like you're suffering from garbage collection (GC) storms. The behavior you've described is typical of that diagnosis, and the ltrace further supports the hypothesis.
Lots of objects are being allocated and disposed of in the main/event loops being exercised by your usage ... and the periodic flurries of calls to free() result from that.
One possible approach would be to profile your code (or libraries on which you are depending) and see if you can refactor it to use (and re-use) objects from pre-allocated pools.
Another possible mitigation would be to make your own, more frequent, calls to trigger the garbage collection --- more expensive in aggregate but possibly less costly at each call. (That would be a trade-off for more predictable throughput).
You might be able to use the Python: gc module both for investigating the issue more deeply (using gc.set_debug()) and for a simple attempted mitigation (calls to gc.collect() after each transaction for example). You might also try running your application with gc.disable() for a reasonable length of time to see if further implicates the Python garbage collector. Note that disabling the garbage collector for an extended period of time will almost certainly cause paging/swapping ... so use it only for validation of our hypothesis and don't expect that to solve the problem in any meaningful way. It may just defer the problem 'til the whole system is thrashing and needs to be rebooted.
Here's an example of using gc.collect() in another SO thread on Tornado: SO: Tornado memory leak on dropped connections

Resources