```
# try.py
import uos
dir = 16384
def walk(t): # recursive function
    print('-',t)
    w = uos.ilistdir(t)
    for x in w:
        L = list(x)
        print(L[0], L[1], L[3])
        if L[1] == dir:
            walk(L[0])
        else:
            return
z = uos.ilistdir()
for x in z:
    L = list(x)
    print(L[0], L[1], L[3])
    if L[1] == dir:
        walk(L[0])
```
The code stops with an error on line 7:
Output:
Traverse.py 32768 773
boot.py 32768 139
lib 16384 0
-lib
one 16384 0
-one
Traceback (most recent call last):
  File "<stdin>", line 21, in <module>
  File "<stdin>", line 12, in walk
  File "<stdin>", line 7, in walk
OSError: [Errno 2] ENOENT
The directory structure is:
lib
one
two
three
three.py
boot.py
main.py
one.py
Traverse.py
It seems that it stops on a directory that has no files in it.
I don't have an ESP to test with, but there are some problems here:
You shouldn't return when the entry is a file; continue with the next entry instead. This is why the walk stops too soon.
You should skip the current and parent directory entries to avoid infinite recursion.
When recursing you have to prepend the top directory. That is probably the reason for the error: your code calls walk('two'), but there is no such directory relative to the working directory; it has to be one/two.
You can call walk on the current directory, so the last bit where you duplicate the implementation isn't needed.
Additionally:
Iterating ilistdir yields tuples, which can be indexed directly, so there is no need to convert them into a list.
Passing a collection to print directly also works, so there is no need for a separate print(x[0], x[1], ...).
Adaptation, with slightly different printing of full paths so it's easier to follow:
import uos
dir_code = 16384
def walk(t):
    print('-', t)
    for x in uos.ilistdir(t):
        print(x)
        if x[1] == dir_code and x[0] != '.' and x[0] != '..':
            walk(t + '/' + x[0])
walk('.')
This will still print directories twice, and all that indexing makes things hard to read. Adaptation with tuple unpacking and printing directories once:
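import uos
dir_code = 16384
def walk(top):
    print(top)
    for entry in uos.ilistdir(top):
        # entries are 3- or 4-tuples depending on the port, so unpack only the first two fields
        name, code = entry[:2]
        if code != dir_code:
            print(top + '/' + name)
        elif name not in ('.', '..'):
            walk(top + '/' + name)
walk('.')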
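The same idea (always recurse with the full path, never return early on a file) can also be sketched as a generator that yields file paths instead of printing them. This is only an illustration under a few assumptions: iter_files is a made-up name, it uses listdir and stat rather than ilistdir so that it also runs on desktop Python, and 0x4000 is the same directory code as 16384 above.

```python
import os  # on MicroPython builds that only ship `uos`, use `import uos as os`

S_IFDIR = 0x4000  # directory bit in st_mode, the same value as 16384


def iter_files(top):
    """Yield the path of every non-directory entry below `top`."""
    for name in os.listdir(top):
        path = top + '/' + name  # prepend the parent so nested lookups resolve
        if os.stat(path)[0] & S_IFDIR:
            # directory: recurse with the full path, not just the entry name
            for sub in iter_files(path):
                yield sub
        else:
            # file: report it and keep iterating (no early return)
            yield path
```

Calling `for p in iter_files('.'): print(p)` then prints each file once with its full relative path.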
This program displays a home circuit breaker panel. The user can view what is on each breaker in the panel (the data comes from an imported dictionary of breaker panel info), or check which breakers control any listed zone (kitchen, basement, etc.). The Breakerville program closes when the user decides to quit, and it is supposed to play a wave file on exit. After the program is built into an exe with PyInstaller, the wave file doesn't play; I only get the Windows 'beep'.
I suspect that I need to edit the spec file to get the wave file to work in the built exe. Is this correct, and if so, how should I modify the spec file?
from playsound import playsound # CURRENTLY USING
from chart import chart
from BreakerZones import BreakerZones
import time
import sys
import colorama
import yaml # to print the nested_lookup results(n) on separate lines
from nested_lookup import nested_lookup, get_all_keys # importing 2 items from nested_lookup
from colorama import Fore, Back, Style
colorama.init(autoreset=True) # If you don't want to print Style.RESET_ALL all the time,
# reset automatically after each print statement with True
print(colorama.ansi.clear_screen())
print('\n'*4) # prints a newline 4 times
print(Fore.MAGENTA + ' Arriving-' + Fore.GREEN + ' *** BREAKERVILLE USA ***')
def main():
    print('\n' * 2)
    print(Fore.BLUE + ' Breaker Numbers and Zones')
    k = get_all_keys(BreakerZones)
    # raw amount of keys even repeats , has quotes
    new_l = [] # eliminate extra repeating nested keys
    for e in k: # has quotes
        if e not in new_l and sorted(e) not in new_l: #
            new_l.append(e) #
    print()
    new_l.sort() # make alphabetical
    newer_l = ('%s' % ', '.join(map(str, new_l)).strip("' ,")) # remove ['%s'] brackets so they don't show up when run
    print(' ', yaml.dump(newer_l, default_flow_style=False)) # strip("' ,") or will see leading "' ," in output
    print(Fore.BLUE + ' ENTER A BREAKER # OR ZONE', Fore.GREEN + ': ', end='')
    i = input().strip().lower() # these lines is workaround for the colorama
    print() # user input() issue of 'code' appearing in screen output
    if i in k:
        n = (nested_lookup(i, BreakerZones, wild=False, with_keys=False)) # wild=True means key not case sensitive,
        print(yaml.dump(n, default_flow_style=False)) # 'with_keys' returns values + keys also
        # for key, value in n.items(): eliminated by using yaml
        # print(key, '--', value) eliminated by using yaml
    else:
        print(Fore.YELLOW + ' Typo,' + Fore.GREEN + ' try again')
        main()
    print()
    print(Fore.GREEN + ' Continue? Y or N: C for breaker chart : ', end='') # see comments ENTER A BREAKER
    ans = input().strip().lower() # strip() removes any spaces before or after user input
    if ans == 'c':
        chart()
        print()
        print(Fore.GREEN + ' Continue? Y or N : ', end='')
        ans = input().strip().lower() # strip() removes any spaces before or after user input
        if ans == 'y': # shorter version 'continue Y or N' after printing breaker chart
            main()
        else:
            print()
            print(Fore.MAGENTA + ' Departing -' + Fore.GREEN + ' *** BREAKERVILLE ***')
            playsound('train whistle.wav')
            time.sleep(2) # delay to exit program
            sys.exit()
    elif ans != 'y':
        print()
        print(Fore.MAGENTA + ' Good Day -' + Fore.GREEN + ' *** BREAKERVILLE ***')
        playsound('train whistle.wav') #CURRENTLY USING
        time.sleep(2) # delay to exit program
        sys.exit()
    else:
        main()
main()
For the record: the issue is fixed by providing the full path to the sound file.
This is probably linked to the implementation of playsound and to what the current working directory is when the bundled exe runs, since a relative file name is resolved against it. Please refer to https://pyinstaller.readthedocs.io/en/stable/runtime-information.html#run-time-information for a better understanding of that topic with PyInstaller.
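One way to build that full path is sketched below. It is not taken from the original program: resource_path is a hypothetical helper, and it assumes the wave file sits next to the script during development and is shipped with the bundle (for example through the datas list of the spec file). sys.frozen and sys._MEIPASS are the attributes described on the PyInstaller page linked above.

```python
import os
import sys


def resource_path(relative_name):
    """Return an absolute path for a data file shipped with the program."""
    if getattr(sys, 'frozen', False) and hasattr(sys, '_MEIPASS'):
        # running from a PyInstaller bundle: data files live in the bundle dir
        base_dir = sys._MEIPASS
    else:
        # running as a normal script: look next to the script itself
        base_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    return os.path.join(base_dir, relative_name)


# Hypothetical usage in the program above:
# playsound(resource_path('train whistle.wav'))
```

Because the result is absolute, it no longer matters which working directory the exe is started from.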
Sphinx-autodoc flattens dicts, lists, and tuples - making long ones barely readable. Pretty-print format isn't always desired either, as some nested containers are better kept flattened than columned. Is there a way to display iterables as typed in source code?
Get it straight from source, and add an .rst command for it:
# conf.py
from importlib import import_module
from docutils import nodes
from sphinx import addnodes
from inspect import getsource
from docutils.parsers.rst import Directive
class PrettyPrintIterable(Directive):
    required_arguments = 1
    def run(self):
        def _get_iter_source(src, varname):
            # 1. identifies target iterable by variable name, (cannot be spaced)
            # 2. determines iter source code start & end by tracking brackets
            # 3. returns source code between found start & end
            start = end = None
            open_brackets = closed_brackets = 0
            for i, line in enumerate(src):
                if line.startswith(varname):
                    if start is None:
                        start = i
                if start is not None:
                    open_brackets += sum(line.count(b) for b in "([{")
                    closed_brackets += sum(line.count(b) for b in ")]}")
                if open_brackets > 0 and (open_brackets - closed_brackets == 0):
                    end = i + 1
                    break
            return '\n'.join(src[start:end])
        module_path, member_name = self.arguments[0].rsplit('.', 1)
        src = getsource(import_module(module_path)).split('\n')
        code = _get_iter_source(src, member_name)
        literal = nodes.literal_block(code, code)
        literal['language'] = 'python'
        return [addnodes.desc_name(text=member_name),
                addnodes.desc_content('', literal)]
def setup(app):
    app.add_directive('pprint', PrettyPrintIterable)
Example .rst and result:
(:autodata: with empty :annotation: is to exclude the original flattened dictionary).
Some code borrowed from this answer.
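To see what the bracket tracking does without building any docs, the helper can be tried on its own. The sketch below copies the same logic into a standalone function; get_iter_source, the sample source and the LIMITS name are made up for the illustration.

```python
def get_iter_source(src_lines, varname):
    """Return the source lines of the literal assigned to `varname`."""
    start = end = None
    opened = closed = 0
    for i, line in enumerate(src_lines):
        if start is None and line.startswith(varname):
            start = i  # first line of the assignment
        if start is not None:
            opened += sum(line.count(b) for b in "([{")
            closed += sum(line.count(b) for b in ")]}")
            if opened > 0 and opened == closed:
                end = i + 1  # all brackets balanced: the literal ends here
                break
    return "\n".join(src_lines[start:end])


sample = '''import os

LIMITS = {
    "low": (0, 10),
    "high": [90, 100],
}

OTHER = 1
'''.split("\n")

snippet = get_iter_source(sample, "LIMITS")
print(snippet)
```

The returned text keeps the line breaks and indentation exactly as typed, which is what the directive then wraps in a literal block. Brackets inside string literals or comments would confuse this simple count.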
Hi, I am running this Python script to remove over-represented sequences from my FASTQ files, but I keep getting an error. I am new to bioinformatics and have been following a fixed pipeline for sequence assembly. I wanted to remove over-represented sequences with this script:
python /home/TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py -1 R1_1.fq -2 R1_2.fq
Here is the error:
Traceback (most recent call last):
  File "TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py", line 46, in <module>
    leftseqs=ParseFastqcLog(opts.l_fastqc)
  File "TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py", line 33, in ParseFastqcLog
    with open(fastqclog) as fp:
TypeError: coercing to Unicode: need string or buffer, NoneType found
Here is the script:
import sys
import gzip
from os.path import basename
import argparse
import re
from itertools import izip,izip_longest
def seqsmatch(overreplist,read):
    flag=False
    if overreplist!=[]:
        for seq in overreplist:
            if seq in read:
                flag=True
                break
    return flag
def get_input_streams(r1file,r2file):
    if r1file[-2:]=='gz':
        r1handle=gzip.open(r1file,'rb')
        r2handle=gzip.open(r2file,'rb')
    else:
        r1handle=open(r1file,'r')
        r2handle=open(r2file,'r')
    return r1handle,r2handle
def FastqIterate(iterable,fillvalue=None):
    "Grab one 4-line fastq read at a time"
    args = [iter(iterable)] * 4
    return izip_longest(fillvalue=fillvalue, *args)
def ParseFastqcLog(fastqclog):
    with open(fastqclog) as fp:
        for result in re.findall('Overrepresented sequences(.*?)END_MODULE', fp.read(), re.S):
            seqs=([i.split('\t')[0] for i in result.split('\n')[2:-1]])
    return seqs
if __name__=="__main__":
    parser = argparse.ArgumentParser(description="options for removing reads with over-represented sequences")
    parser.add_argument('-1','--left_reads',dest='leftreads',type=str,help='R1 fastq file')
    parser.add_argument('-2','--right_reads',dest='rightreads',type=str,help='R2 fastq file')
    parser.add_argument('-fql','--fastqc_left',dest='l_fastqc',type=str,help='fastqc text file for R1')
    parser.add_argument('-fqr','--fastqc_right',dest='r_fastqc',type=str,help='fastqc text file for R2')
    opts = parser.parse_args()
    leftseqs=ParseFastqcLog(opts.l_fastqc)
    rightseqs=ParseFastqcLog(opts.r_fastqc)
    r1_out=open('rmoverrep_'+basename(opts.leftreads).replace('.gz',''),'w')
    r2_out=open('rmoverrep_'+basename(opts.rightreads).replace('.gz',''),'w')
    r1_stream,r2_stream=get_input_streams(opts.leftreads,opts.rightreads)
    counter=0
    failcounter=0
    with r1_stream as f1, r2_stream as f2:
        R1=FastqIterate(f1)
        R2=FastqIterate(f2)
        for entry in R1:
            counter+=1
            if counter%100000==0:
                print "%s reads processed" % counter
            head1,seq1,placeholder1,qual1=[i.strip() for i in entry]
            head2,seq2,placeholder2,qual2=[j.strip() for j in R2.next()]
            flagleft,flagright=seqsmatch(leftseqs,seq1),seqsmatch(rightseqs,seq2)
            if True not in (flagleft,flagright):
                r1_out.write('%s\n' % '\n'.join([head1,seq1,'+',qual1]))
                r2_out.write('%s\n' % '\n'.join([head2,seq2,'+',qual2]))
            else:
                failcounter+=1
        print 'total # of reads evaluated = %s' % counter
        print 'number of reads retained = %s' % (counter-failcounter)
        print 'number of PE reads filtered = %s' % failcounter
    r1_out.close()
    r2_out.close()
Maybe you have already solved it. I had the same error, but now it is running well. The traceback says that opts.l_fastqc is None: the -fql and -fqr options were not passed on the command line, so the script ends up calling open(None).
Hope this helps.
(1) Files we need:
usage: RemoveFastqcOverrepSequenceReads.py [-h] [-1 LEFTREADS] [-2 RIGHTREADS] [-fql L_FASTQC] [-fqr R_FASTQC]
(2) Specify the fastqc_data.txt files that are in the FastQC output (unzip the output directory first):
'-fql','--fastqc_left',dest='l_fastqc',type=str,help='fastqc text file for R1'
'-fqr','--fastqc_right',dest='r_fastqc',type=str,help='fastqc text file for R2'
(3) Keep the reads and the fastqc_data text files in the same directory.
(4) Specify the path before each file:
python RemoveFastqcOverrepSequenceReads.py \
-1 ./bicho.fq.1.gz -2 ./bicho.fq.2.gz \
-fql ./fastqc_data_bicho_1.txt -fqr ./fastqc_data_bicho_2.txt
(5) Run! :)
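If you maintain your own copy of the script, the confusing TypeError can be turned into a clear usage message by marking the four options as required. This is only a sketch in Python 3 of how the parser could be declared; build_parser is a made-up name and the change is not part of the published script.

```python
import argparse


def build_parser():
    """Same four options as the script, but argparse rejects missing ones."""
    parser = argparse.ArgumentParser(
        description="options for removing reads with over-represented sequences")
    parser.add_argument('-1', '--left_reads', dest='leftreads', type=str,
                        required=True, help='R1 fastq file')
    parser.add_argument('-2', '--right_reads', dest='rightreads', type=str,
                        required=True, help='R2 fastq file')
    parser.add_argument('-fql', '--fastqc_left', dest='l_fastqc', type=str,
                        required=True, help='fastqc text file for R1')
    parser.add_argument('-fqr', '--fastqc_right', dest='r_fastqc', type=str,
                        required=True, help='fastqc text file for R2')
    return parser
```

With this, running the command from the question (only -1 and -2) stops immediately with a message naming the missing -fql and -fqr arguments instead of failing later inside open().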
I am struggling to find a text comparison tool or algorithm that can compare an expected text against the current state of a text while it is being typed.
I will have a test subject type out a text that is in front of them. My idea is to compare the current state of the typed text against the expected text whenever a key is pressed. That way I want to find out when and where the subject makes mistakes (I also want to find errors that are not in the final text but were present in the intermediate text for some time).
Can someone point me in the right direction?
Update #1
I have access to the typing data in a csv format:
This is example output data of me typing "foOBar". Every line has the form (timestamp, Key, Press/Release)
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
In Python
Given your input csv file (I called it keyboard.csv):
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
The following code does the following:
Reads its content and stores it in a list named steps.
For each step in steps, recognizes what happened:
If it was a shift press or release, sets a flag (shift_on) accordingly.
If it was an arrow press, moves the cursor (the index in current where characters are inserted). If the cursor is at the start or at the end of the string it shouldn't move; that's what min() and max() are for.
If it was a letter/number/symbol, inserts it into current at the cursor position and increments cursor.
Here you have it:
import csv
steps = [] # list of all actions performed by user
expected = "Hello"
with open("keyboard.csv") as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        steps.append((float(row[0]), row[1], row[2]))
# Now we parse the information
current = [] # text written by the user
shift_on = False # is shift pressed
cursor = 0 # where is the cursor in the current text
for step in steps:
    time, key, action = step
    if key == 'LeftShift':
        if action == 'P':
            shift_on = True
        else:
            shift_on = False
        continue
    if key == 'LeftArrow' and action == 'P':
        cursor = max(0, cursor-1)
        continue
    if key == 'RightArrow' and action == 'P':
        cursor = min(len(current), cursor+1)
        continue
    if action == 'P':
        if shift_on is True:
            current.insert(cursor, key.upper())
        else:
            current.insert(cursor, key.lower())
        cursor += 1
        # Now you can join current into a string
        # and compare current with expected
        print(''.join(current)) # printing current (just to see what's happening)
    else:
        # What to do when a key is released?
        # Depends on your needs...
        continue
To compare current and expected have a look here.
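As one possible way to do that comparison, here is a small sketch using difflib from the standard library. It is a swapped-in technique rather than the approach behind the link, and diff_against_expected is a made-up helper name. It compares the typed text only against the part of the expected text that should have been typed so far.

```python
from difflib import SequenceMatcher


def diff_against_expected(expected, current):
    """Return the non-matching edit operations between expected and current."""
    # only the part of `expected` that should already have been typed
    prefix = expected[:len(current)]
    matcher = SequenceMatcher(None, prefix, current, autojunk=False)
    # each opcode is (tag, i1, i2, j1, j2); keep everything that is not 'equal'
    return [op for op in matcher.get_opcodes() if op[0] != 'equal']


mismatches = diff_against_expected("foobar", "foOBar")
for tag, i1, i2, j1, j2 in mismatches:
    print(tag, repr("foobar"[i1:i2]), '->', repr("foOBar"[j1:j2]))
```

Unlike a position-by-position check, this also reports inserted or dropped characters as 'insert' and 'delete' operations instead of marking everything after them as wrong.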
Note: by playing around with the code above and a few more flags you can make it recognize also symbols. This will depend on your keyboard. In mine Shift + 6 = &, AltGr + E = € and Ctrl + Shift + AltGr + è = {. I think this is a good point to start.
Update
Comparing two texts isn't a difficult task, and you can find tons of pages on the web about it.
Anyway, I wanted to present an object-oriented approach to the problem, so I added the compare part that I omitted in the first solution.
This is still rough code, without basic validation of the input. But, as you asked, it points you in a direction.
class UserText:
    # Initialize UserText:
    # - empty text
    # - cursor at beginning
    # - shift off
    def __init__(self, expected):
        self.expected = expected
        self.letters = []
        self.cursor = 0
        self.shift = False
    # compares a and b and returns a
    # list containing the indices of
    # mismatches between a and b
    @staticmethod
    def compare(a, b):
        err = []
        for i in range(min(len(a), len(b))):
            if a[i] != b[i]:
                err.append(i)
        return err
    # Parse a command given in the
    # form (time, key, action)
    def parse(self, command):
        time, key, action = command
        output = ""
        if action == 'P':
            if key == 'LeftShift':
                self.shift = True
            elif key == 'LeftArrow':
                self.cursor = max(0, self.cursor - 1)
            elif key == 'RightArrow':
                self.cursor = min(len(self.letters), self.cursor + 1)
            else:
                # Else, a letter/number was pressed. Let's
                # add it to self.letters in cursor position
                if self.shift is True:
                    self.letters.insert(self.cursor, key.upper())
                else:
                    self.letters.insert(self.cursor, key.lower())
                self.cursor += 1
                ########## COMPARE WITH EXPECTED ##########
                output += "Expected: \t" + self.expected + "\n"
                output += "Current: \t" + str(self) + "\n"
                errors = UserText.compare(str(self), self.expected[:len(str(self))])
                output += "\t\t"
                i = 0
                for e in errors:
                    while i != e:
                        output += " "
                        i += 1
                    output += "^"
                    i += 1
                output += "\n[{} errors at time {}]".format(len(errors), time)
            return output
        else:
            if key == 'LeftShift':
                self.shift = False
            return output
    def __str__(self):
        return "".join(self.letters)
import csv
steps = [] # list of all actions performed by user
expected = "foobar"
with open("keyboard.csv") as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        steps.append((float(row[0]), row[1], row[2]))
# Now we parse the information
ut = UserText(expected)
for step in steps:
    print(ut.parse(step))
The output for the csv file above was:
Expected:       foobar
Current:        f

[0 errors at time 17293398.576653]
Expected:       foobar
Current:        fo

[0 errors at time 17293401.313254]
Expected:       foobar
Current:        foO
                  ^
[1 errors at time 17293402.073046]
Expected:       foobar
Current:        foOB
                  ^^
[2 errors at time 17293403.178612]
Expected:       foobar
Current:        foOBa
                  ^^
[2 errors at time 17293404.966193]
Expected:       foobar
Current:        foOBar
                  ^^
[2 errors at time 17293405.725405]
I found the solution to my own question around a year ago. Now I have time to share it with you:
In their 2003 paper 'Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric', R. William Soukoreff and I. Scott MacKenzie propose three major new metrics: 'total error rate', 'corrected error rate' and 'not corrected error rate'. These metrics have become well established since the publication of this paper. These are exactly the metrics I was looking for.
If you are trying to do something similar to what I did, e.g. compare the writing performance on different input devices, this is the way to go.
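For orientation, here is a minimal sketch of how the three rates are computed once the keystrokes have been classified. It assumes the paper's three counts are already available: correct characters (C), incorrect characters that were fixed (IF) and incorrect characters that were not fixed (INF). Deriving those counts from a keystroke log is the harder part and is not shown; error_rates and its parameter names are made up for the illustration.

```python
def error_rates(correct, incorrect_not_fixed, incorrect_fixed):
    """Return the three error rates as fractions of C + INF + IF."""
    total = correct + incorrect_not_fixed + incorrect_fixed
    if total == 0:
        raise ValueError("no classified keystrokes")
    return {
        # all errors, whether or not they were corrected later
        "total_error_rate": (incorrect_not_fixed + incorrect_fixed) / total,
        # errors the subject noticed and fixed while typing
        "corrected_error_rate": incorrect_fixed / total,
        # errors that remain in the final text
        "not_corrected_error_rate": incorrect_not_fixed / total,
    }


rates = error_rates(correct=9, incorrect_not_fixed=1, incorrect_fixed=2)
print(rates)
```

The corrected and not corrected rates always add up to the total error rate, which is what makes the three metrics convenient to report together.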
I'm trying to run a kind of simulation in a Python for loop in parallel using Dask multiprocessing. Parallelization works fine when the number of iterations is fairly low but fails when it increases. The issue occurs on Win7 (4 cores, 10 GB RAM), Win10 (8 cores, 8 GB RAM) and an Azure VM running Windows Server 2016 (16 cores, 32 GB RAM). The slowest one, Win7, gets through the most iterations before failing. The issue can be mitigated by adding a long enough sleep at the end of each function included in the process, but the required amount of sleeping results in very low performance, similar to running sequentially.
I hope someone will be able to help me out here. Thanks in advance for comments and answers!
The following simple code contains some phases of the for loop and reproduces the error.
import json
import numpy as np
import pandas as pd
from pymongo import MongoClient
# Create random DataFrame
df = pd.DataFrame(np.random.randint(0,100,size=(100,11)), columns=list('ABCDEFGHIJK'))
# Save to Mongo
client = MongoClient()
db = client.errordemo
res = db.errordemo.insert_many(json.loads(df.to_json(orient='records')))
db.client.close()
class ToBeRunParallel:
    def __init__(self):
        pass
    def functionToBeRunParallel(self, i):
        # Read data from mongo
        with MongoClient() as client:
            db = client.errordemo
            dataFromMongo = pd.DataFrame.from_records(db.errordemo.find({}, {'_id': 0}))
        # Randomize data (the helper `rand` is not shown in this excerpt)
        dataRand = dataFromMongo.apply(pd.to_numeric).apply(rand, volatility=0.1)
        # Sum rows
        dataSum = dataRand.sum(axis=1)
        # Select randomly one of the resulting values and return
        return dataSum.sample().values[0]
Call the function functionToBeRunParallel either in a console or in Jupyter (both fail). 'errordemo' is a local module containing the class ToBeRunParallel. When running on the Azure VM, the code succeeds with 500 loops and fails with 5,000.
import errordemo
from dask import delayed, compute, multiprocessing
# Determine how many times to loop
rng = range(15000)
# Define empty result lists
resList = []
# Create instance
err = errordemo.ToBeRunParallel()
# Loop in parallel using Dask
for i in rng:
    sampleValue = delayed(err.functionToBeRunParallel)(i)
    resList.append(sampleValue)
# Compute in parallel
result = compute(*resList, get=multiprocessing.get)
The error stack in Jupyter is as follows.
---------------------------------------------------------------------------
AutoReconnect Traceback (most recent call last)
<ipython-input-3-9f535dd4c621> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', '# Determine how many times to loop\nrng = range(50000)\n\n# Define empty result lists\nresList = []\n\n# Create instance\nerr = errordemo.ToBeRunParallel()\n\n# Loop in parallel using Dask\nfor i in rng:\n sampleValue = delayed(err.functionToBeRunParallel)(i)\n resList.append(sampleValue)\n \n# Compute in parallel \nresult = compute(*resList, get=dask.multiprocessing.get)')
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2113 magic_arg_s = self.var_expand(line, stack_depth)
2114 with self.builtin_trap:
-> 2115 result = fn(magic_arg_s, cell)
2116 return result
2117
<decorator-gen-60> in time(self, line, cell, local_ns)
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\magics\execution.py in time(self, line, cell, local_ns)
1178 else:
1179 st = clock2()
-> 1180 exec(code, glob, local_ns)
1181 end = clock2()
1182 out = None
<timed exec> in <module>()
C:\ProgramData\Anaconda3\lib\site-packages\dask\base.py in compute(*args, **kwargs)
200 dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
201 keys = [var._keys() for var in variables]
--> 202 results = get(dsk, keys, **kwargs)
203
204 results_iter = iter(results)
C:\ProgramData\Anaconda3\lib\site-packages\dask\multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, **kwargs)
85 result = get_async(pool.apply_async, len(pool._pool), dsk3, keys,
86 get_id=_process_get_id,
---> 87 dumps=dumps, loads=loads, **kwargs)
88 finally:
89 if cleanup:
C:\ProgramData\Anaconda3\lib\site-packages\dask\async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
498 _execute_task(task, data) # Re-execute locally
499 else:
--> 500 raise(remote_exception(res, tb))
501 state['cache'][key] = res
502 finish_task(dsk, key, state, results, keyorder.get)
AutoReconnect: localhost:27017: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
Traceback
---------
File "C:\ProgramData\Anaconda3\lib\site-packages\dask\async.py", line 266, in execute_task
result = _execute_task(task, data)
File "C:\ProgramData\Anaconda3\lib\site-packages\dask\async.py", line 247, in _execute_task
return func(*args2)
File "C:\Git_repository\footie\Pipeline\errordemo.py", line 20, in functionToBeRunParallel
dataFromMongo = pd.DataFrame.from_records(db.errordemo.find({}, {'_id': 0}))
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 981, in from_records
first_row = next(data)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\cursor.py", line 1090, in next
if len(self.__data) or self._refresh():
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\cursor.py", line 1012, in _refresh
self.__read_concern))
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\cursor.py", line 850, in __send_message
**kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\mongo_client.py", line 844, in _send_message_with_response
exhaust)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\mongo_client.py", line 855, in _reset_on_error
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\server.py", line 99, in send_message_with_response
with self.get_socket(all_credentials, exhaust) as sock_info:
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 82, in __enter__
return next(self.gen)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\server.py", line 163, in get_socket
with self.pool.get_socket(all_credentials, checkout) as sock_info:
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 82, in __enter__
return next(self.gen)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\pool.py", line 582, in get_socket
sock_info = self._get_socket_no_auth()
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\pool.py", line 618, in _get_socket_no_auth
sock_info, from_pool = self.connect(), False
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\pool.py", line 555, in connect
_raise_connection_failure(self.address, error)
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\pool.py", line 65, in _raise_connection_failure
raise AutoReconnect(msg)
UPDATE:
Following this post, I created a decorator to catch the AutoReconnect exception, as shown below. Together with extra parameters for MongoClient the looping works, but it's still very slow, double the time it should take (timings on the Azure VM):
500 iterations: 3.74s
50,000 iterations: 12min 12s
import random
from time import sleep
from pymongo import errors
def safe_mongocall(call):
    def _safe_mongocall(*args, **kwargs):
        # retry up to 5 times, sleeping briefly after each AutoReconnect
        for i in range(5):
            try:
                return call(*args, **kwargs)
            except errors.AutoReconnect:
                sleep(random.random() / 100)
        print('Error: Failed operation!')
    return _safe_mongocall
@safe_mongocall
def functionToBeRunParallel(self, i):
    # Read data from mongo
    with MongoClient(connect=False, maxPoolSize=None, maxIdleTimeMS=100) as client:
        db = client.errordemo
        dataFromMongo = pd.DataFrame.from_records(db.errordemo.find({}, {'_id': 0}))
    # Randomize data
    dataRand = dataFromMongo.apply(pd.to_numeric).apply(rand, volatility=0.1)
    # Sum rows
    dataSum = dataRand.sum(axis=1)
    # Select randomly one of the resulting values and return
    return dataSum.sample().values[0]
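The actual issue is exhaustion of TCP/IP ports, so the solution is to avoid exhausting them. Every call opens a new connection to MongoDB, and on Windows a closed connection presumably keeps its local port reserved for a while, so a fast loop can run out of free ports; that is what WinError 10048 reports. Following an article by Microsoft, I added the following registry values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters: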
MaxUserPort: 65534
TcpTimedWaitDelay: 30
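A different way to avoid using up ports, not what was done above and not tested against this workload, is to stop opening a new connection for every task and keep one client per worker process instead. The sketch below shows the caching idea only; get_client and _clients are made-up names, and the factory argument stands in for MongoClient so the snippet runs without a database.

```python
import os

_clients = {}  # one cached client per process id


def get_client(factory):
    """Return this process's client, creating it on first use."""
    pid = os.getpid()
    if pid not in _clients:
        # first call in this worker process: open the connection once
        _clients[pid] = factory()
    return _clients[pid]


# Hypothetical usage inside functionToBeRunParallel:
# client = get_client(MongoClient)
# db = client.errordemo
```

Keying the cache by process id means each worker opens its own client rather than sharing one across processes, and repeated tasks in the same worker reuse the same connection pool.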