I am developing a Wikipedia bot to analyze editing contributions. Unfortunately, it takes hours to complete a single run and during that time Wikipedia's database replication delay—at some point during the run—is sure to exceed 5 seconds (the default maxlag value). The recommendation in the API's maxlag parameter is to detect the lag error, pause for X seconds and retry.
But all I am doing is reading contributions with:
usrpg = pywikibot.Page(site, 'User:' + username)
usr = pywikibot.User(usrpg)
for contrib in usr.contributions(total=max_per_user_contribs):
# (analyzes contrib here)
How to detect the error and resume it? This is the error:
WARNING: API error maxlag: Waiting for 10.64.32.21: 7.1454429626465 seconds lagged
Traceback (most recent call last):
File ".../bot/core/pwb.py", line 256, in <module>
if not main():
File ".../bot/core/pwb.py", line 250, in main
run_python_file(filename, [filename] + args, argvu, file_package)
File ".../bot/core/pwb.py", line 121, in run_python_file
main_mod.__dict__)
File "analyze_activity.py", line 230, in <module>
attrs = usr.getprops()
File ".../bot/core/pywikibot/page.py", line 2913, in getprops
self._userprops = list(self.site.users([self.username, ]))[0]
File ".../bot/core/pywikibot/data/api.py", line 2739, in __iter__
self.data = self.request.submit()
File ".../bot/core/pywikibot/data/api.py", line 2183, in submit
raise APIError(**result['error'])
pywikibot.data.api.APIError: maxlag: Waiting for 10.64.32.21:
7.1454 seconds lagged [help:See https://en.wikipedia.org/w/api.php for API usage]
<class 'pywikibot.data.api.APIError'>
CRITICAL: Closing network session.
It occurs to me to catch the exception thrown in that line of code:
raise APIError(**result['error'])
But then restarting the contributions for the user seems terribly inefficient. Some users have 400,000 edits, so rerunning that from the beginning is a lot of backsliding.
I have googled for examples of doing this (detecting the error and retrying) but I found nothing useful.
Converting the previous conversation in comments into an answer.
One possible method to resolve this is to try/catch the error and redo the piece of code which caused the error.
But, pywikibot already does this internally for us ! Pywikibot, by default tries to retry every failed API call 2 times if you're using the default user-config.py it generates. I found that increasing the following configs does the trick in my case:
maxlag = 20
retry_wait = 20
max_retries = 8
The maxlag is the parameter recommended to increase according to the documentation of Maxlag parameter, especially if you're doing a large number of writes in a short span of time. But, the retry_wait and max_retries configs are useful in case someone else is writing a lot (As is my case: My scripts just read from wiki).
Related
I am using Instaloader via command line on Windows 11, with the following command:
.\instaloader --login=MYUSERNAME :saved --dirname-pattern="Saved_Posts\{profile}" --filename-pattern="{profile}-{shortcode}" --no-resume --no-metadata-json --slide 1 --no-captions --no-video-thumbnails --no-iphone
This attempts to download approximately 12,000 saved posts from a profile. Instaloader behaves as expected for several thousand posts, occasionally giving the following error:
Too many queries in the last time. Need to wait 15 seconds, until 13:19.
The process then resumes successfully for several hundred more posts. Eventually, however, I start encountering 429 errors:
JSON Query to graphql/query: 429 Too Many Requests [retrying; skip with ^C]
Number of requests within last 10/11/20/22/30/60 minutes grouped by type:
d6f4427fbe92d846298cf93df0b937d3: 0 0 0 0 0 0
f883d95537fbcd400f466f63d42bd8a1: 0 0 0 1 1 11
* 2b0673e0dc4580674a88d426fe00ea90: 59 64 121 134 191 709
Instagram responded with HTTP error "429 - Too Many Requests". Please
do not run multiple instances of Instaloader in parallel or within
short sequence. Also, do not use any Instagram App while Instaloader
is running.
The request will be retried in 7 seconds, at 14:01.
This error then repeats over and over again, I believe until the default maximum connection attempts limit is reached and it moves onto the next post — which also receives the same error. Importantly, this error does not go away after several hours of these 'slower' requests being made; it seems to persist as long as Instaloader stays open. I have seen these 429 errors with very few requests in the last 60 minutes (i.e. <100), which makes me think I am hitting quite a long shadowban.
I have tried setting the maximum connection attempts to 0 (i.e. retry indefinitely), but this time limit appears to be capped at 666 seconds, or 11 minutes. The error does not seem to clear even leaving Instaloader to send requests every 11 minutes in this way; it is as though each individual request 'resets' the ban or something.
I am looking for a way of resolving this issue, which could include:
Adding a command to force 429 errors to be retried after subsequently longer periods of time (instead of the number of seconds being capped at 666 seconds)
Adding a command that 'preserves' wait times after each 429 error. e.g. if downloading Post 456 fails and retries after 5, then 10, then 15 seconds before successfully downloading, and then downloading Post 457 immediately fails... start the wait for a retry on Post 457 at at LEAST 15 seconds, rather than going back to 5!
Avoiding the 429 errors in the first place, if there appears to be an issue with my command line prompt
Breaking down the request into 'batches' and running one of those prompts every few days. e.g. is there a way to download Saved Posts 1-500, then 500-1000, and so on? (The Saved Posts are not necessarily in chronological order of the post date, which is what I've tried so far)
I have looked at several other posts on 429 errors but the general theme seems to be either:
Wait some time for the issue to clear — have tried this for up to 48 hours, but running the command again starts from post #1 and never makes it to the latter half of posts
Disable iPhone API requests — already done, which helps but does not solve the issue
The 429 errors simply should not be encountered during normal behaviour – well, they are!
I'm developing a load testing tool with gevent.
I create a testing script like the following
while True:
# send http request
response = client.sendAndRecv()
gevent.sleep(0.001)
The send/receive action completed very quick, like 0.1ms
So the expected rate should be close to 1000 per second.
But actually I got it like about 500 per second on both Ubuntu and Windows platform.
Most likely the gevent sleep is not accuate.
Gevent use libuv or libev for internal loop. And I got the following description about how libuv handle poll timeout from here
If the loop was run with the UV_RUN_NOWAIT flag, the timeout is 0.
If the loop is going to be stopped (uv_stop() was called), the timeout is 0.
If there are no active handles or requests, the timeout is 0.
If there are any idle handles active, the timeout is 0.
If there are any handles pending to be closed, the timeout is 0.
If none of the above cases matches, the timeout of the closest timer is taken, or if there are no active timers, infinity.
It seems when we have gevent sleep , actually it will setup a timer, and libuv loop use the timeout of the closest timer.
I really doubt that is the root cause : the OS system select timeout is not precise !!
I noticed libuv loop could run with UV_RUN_NOWAIT mode, and it will make loop timeout 0. That is no sleeping if no iOS event.
It may cause the load of one CPU core to 100%, but it is acceptable to me.
So I modify the function run of gevent code hub.py, as the following
loop.run(nowait=True)
But when I run the tool, I got the complain 'This operation would block forever', like the following
gevent.sleep(0.001)
File "C:\Python37\lib\site-packages\gevent\hub.py", line 159, in sleep
hub.wait(t)
File "src\gevent\_hub_primitives.py", line 46, in gevent.__hub_primitives.WaitOperationsGreenlet.wait
File "src\gevent\_hub_primitives.py", line 55, in gevent.__hub_primitives.WaitOperationsGreenlet.wait
File "src\gevent\_waiter.py", line 151, in gevent.__waiter.Waiter.get
File "src\gevent\_greenlet_primitives.py", line 60, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src\gevent\_greenlet_primitives.py", line 60, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src\gevent\_greenlet_primitives.py", line 64, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src\gevent\__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
So what should I do?
Yes, I finally found the trick.
if libuv loop run mode is not UV_RUN_DEFAULT, gevent will do some checking and if libuv loop is 'nowait' mode, It will say "This operation would block forever".
That's wired, actually it will not blcok forever.
Anyway, I just modify the line 473 of the file libuv/loop.py as the following
if mode == libuv.UV_RUN_DEFAULT:
while self._ptr and self._ptr.data:
self._run_callbacks()
self._prepare_ran_callbacks = False
# here, change from UV_RUN_ONCE to UV_RUN_NOWAIT
ran_status = libuv.uv_run(self._ptr, libuv.UV_RUN_NOWAIT)
After that, run the load tool, Wow..... exactly as what I expected, TPS is very close to what I set, but one core load is 100%.
That totally acceptable, because it is load testing tool.
So if we have real time OS kenel, we don't bother to do that.
I am using the wxProgressdialog to show time between switching ports and time between taking measurements. I am running this test for over 24 hours (repeating the same thing over and over while recording the data). Therro that appears during hour 7 is:
Traceback (most recent call last):
File "C:\Users\localuser\Desktop\Thermal\Cheyenne_Antenna_Cal_PDA_Thermal_Test.py", line 2117, in take_measurements_at_interval
self.take_measurement(self)
File "C:\Users\localuser\Desktop\Thermal\Cheyenne_Antenna_Cal_PDA_Thermal_Test.py", line 2185, in take_measurement
self.Measure_Plot(self)
File "C:\Users\localuser\Desktop\Thermal\Cheyenne_Antenna_Cal_PDA_Thermal_Test.py", line 2231, in Measure_Plot
style=wx.PD_AUTO_HIDE | wx.PD_ELAPSED_TIME | wx.PD_REMAINING_TIME)
File "C:\Python27\lib\site-packages\wx-2.8-msw-unicode\wx_windows.py", line 2951, in init
windows.ProgressDialog_swiginit(self,windows.new_ProgressDialog(*args, **kwargs))
wx._core.PyAssertionError: C++ assertion "wxAssertFailure" failed at ....\src\msw\control.cpp(159) in wxControl::MSWCreateControl(): CreateWindowEx("STATIC", flags=52000100, ex=00000000) failed
Here is the code that is being used to 'delay time'
#Giving Time for switch to toggle next port
progressMax = 5
dialog = wx.ProgressDialog("A progress box", "Time to switch", progressMax,
style=wx.PD_AUTO_HIDE | wx.PD_ELAPSED_TIME | wx.PD_REMAINING_TIME)
keepGoing = True
count = 0
while keepGoing and count < progressMax:
count = count + 1
wx.Sleep(1)
keepGoing = dialog.Update(count)
dialog.Destroy()
The code pauses 5 seconds to allow switch hardware and PNA to be steady before data is recorded. All of this is happening in a 'For' loop for a period of time. If anyone needs more information I will be happy to proved.
If the window creation fails after running for a long time, chances are you simply run out of windows, which are still a very limited resource under Microsoft Windows (the exact limit depends on the Windows version, but could be as log as 16,384).
This could happen if you never return to the main event loop during all this time because the top level windows will only be really destroyed (and not just hidden) once you get back to it.
I have an agent-based model, where several agents are started by a central process and communicate via another central process. Every agent and the communication process communicate via zmq. However, when I start more than 100 agents standard_out sends:
Invalid argument (src/stream_engine.cpp:143) Too many open files
(src/ipc_listener.cpp:292)
and Mac Os prompts a problem report :
Python quit unexpectedly while using the libzmq.5.dylib plug-in.
The problem appears to me that too many contexts are opened. But how can I avoid this with multiprocessing?
I attach part of the code below:
class Agent(Database, Logger, Trade, Messaging, multiprocessing.Process):
def __init__(self, idn, group, _addresses, trade_logging):
multiprocessing.Process.__init__(self)
....
def run(self):
self.context = zmq.Context()
self.commands = self.context.socket(zmq.SUB)
self.commands.connect(self._addresses['command_addresse'])
self.commands.setsockopt(zmq.SUBSCRIBE, "all")
self.commands.setsockopt(zmq.SUBSCRIBE, self.name)
self.commands.setsockopt(zmq.SUBSCRIBE, group_address(self.group))
self.out = self.context.socket(zmq.PUSH)
self.out.connect(self._addresses['frontend'])
time.sleep(0.1)
self.database_connection = self.context.socket(zmq.PUSH)
self.database_connection.connect(self._addresses['database'])
time.sleep(0.1)
self.logger_connection = self.context.socket(zmq.PUSH)
self.logger_connection.connect(self._addresses['logger'])
self.messages_in = self.context.socket(zmq.DEALER)
self.messages_in.setsockopt(zmq.IDENTITY, self.name)
self.messages_in.connect(self._addresses['backend'])
self.shout = self.context.socket(zmq.SUB)
self.shout.connect(self._addresses['group_backend'])
self.shout.setsockopt(zmq.SUBSCRIBE, "all")
self.shout.setsockopt(zmq.SUBSCRIBE, self.name)
self.shout.setsockopt(zmq.SUBSCRIBE, group_address(self.group))
self.out.send_multipart(['!', '!', 'register_agent', self.name])
while True:
try:
self.commands.recv() # catches the group adress.
except KeyboardInterrupt:
print('KeyboardInterrupt: %s,self.commands.recv() to catch own adress ~1888' % (self.name))
break
command = self.commands.recv()
if command == "!":
subcommand = self.commands.recv()
if subcommand == 'die':
self.__signal_finished()
break
try:
self._methods[command]()
except KeyError:
if command not in self._methods:
raise SystemExit('The method - ' + command + ' - called in the agent_list is not declared (' + self.name)
else:
raise
except KeyboardInterrupt:
print('KeyboardInterrupt: %s, Current command: %s ~1984' % (self.name, command))
break
if command[0] != '_':
self.__reject_polled_but_not_accepted_offers()
self.__signal_finished()
#self.context.destroy()
the whole code is under http://www.github.com/DavoudTaghawiNejad/abce
Odds are it's not too many contexts, it's too many sockets. Looking through your repo, I see you're (correctly) using IPC as your transport; IPC uses a file descriptor as the "address" to pass data back and forth between different processes. If I'm reading correctly, you're opening up to 7 sockets per process, so that'll add up quickly. I'm betting that if you do some debugging in the middle of your code, you'll see that it doesn't fail when the last context is created, but when the last socket pushes the open file limit over the edge.
My understanding is that the typical user limit for open FDs is around 1000, so at around 100 agents you're pushing 700 open FDs just for your sockets. The remainder is probably just typical. There should be no problem increasing your limit up to 10,000, higher depending on your situation. Otherwise you'll have to rewrite to use less sockets per process to get a higher process limit.
This has nothing to do with zeromq nor python. It's the underlying operating system, that allow only up to a certain threshold of concurrently opened files. This limit includes normal files, but also socket connections.
You can see your current limit using ulimit -n, it will probably default to 1024. Machines running servers or having other reasons (like your multiprocessing) often require to set this limit higher or just to unlimited. – More info about ulimit.
Additionally, there's another global limit, however it's nothing I had to adjust yet.
In general, you should ask yourself, if you really need that many agents. Usually, X / 2X worker processes should be enough, where X corresponds to your CPU count.
You should increase the number of allowed open files for the process as in this question:
Python: Which command increases the number of open files on Windows?
the default per process is 512
import win32file
print win32file._getmaxstdio() #512
win32file._setmaxstdio(1024)
print win32file._getmaxstdio() #1024
I am doing a method where I am using 2 Parse.Cloud.httpRequest calls, with one being inside of the other. However, this method seem to fail with an alarming frequency. Like 1 in 5 tries, each time the error is:
Request failed with response code 500
{"uuid":"bc75e304-8964-30f9-c9d5-92fabf02f624","status":500,"error":{"code":-1,"error":"Request timed out"},"headers":{},"text":"{\"code\":124,\"error\":\"Request timed out\"}","cookies":{}}
I looked up code 124, and it corresponds to
Timeout 124 Error code indicating that the request timed out on the server. Typically this indicates that the request is too expensive to run.
I am only running a couple REST requests per minute and the run of the method does not exceed 3 seconds. I checked the same calls via REST and there is never any problems.
What's the cause for this problem and can I fix it by upgrading my parse account?