How to get progress bar with tqdm in a for loop over directory

I am trying to conditionally load some files from a directory. I would like to have a progress bar from tqdm on the process. I am currently running this:
import os
from tqdm import tqdm

loaddir = r'D:\Folder'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    for name in tqdm(files):
        if name.startswith('Test'):
            pass  # do things
which gives
Data load initiated
0%| | 0/6723 [00:00<?, ?it/s]
0%| | 26/6723 [00:00<00:28, 238.51it/s]
1%| | 47/6723 [00:00<00:31, 213.62it/s]
1%| | 72/6723 [00:00<00:30, 220.84it/s]
1%|▏ | 91/6723 [00:00<00:31, 213.59it/s]
2%|▏ | 115/6723 [00:00<00:30, 213.73it/s]
This has two problems:
When progress is updated a new line appears in my IPython console in Spyder
I am actually timing the loop over the files and not over the files that start with 'Test' and therefore progress and remaining time are not accurate.
However, if I try this:
loaddir = r'D:\Folder'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    for name in files:
        if tqdm(name.startswith('Test')):
            pass  # do things
I get the following error.
Traceback (most recent call last):
File "<ipython-input-80-b801165d4cdb>", line 21, in <module>
if tqdm(name.startswith('Probe')):
TypeError: 'NoneType' object cannot be interpreted as an integer
I would like to have a progress bar in only one line that updates whenever the startswith loop is activated.
----UPDATE----
I also found out here that it can also be used like this:
files = [f for f in tqdm(files) if f.startswith('Test')]
This allows tracking progress in a list comprehension by wrapping the iterable with tqdm. However, in Spyder this results in a separate line for each progress update.
----UPDATE2----
It actually works fine in Spyder. Sometimes, if the loop fails, it might go back to printing a separate line for each progress update, but I haven't seen this very often after the latest updates.

Firstly, the answer:
loaddir = r'D:\surfdrive\COMSOL files\Batch folder\Current batch simulation files'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    files = [f for f in files if f.startswith('Test')]
    for name in tqdm(files):
        pass  # do things
This will work in any decent environment (including a bare terminal). The solution is to not give tqdm the unused filenames. You may find https://github.com/tqdm/tqdm/wiki/How-to-make-a-great-Progress-Bar insightful.
Secondly, the multiple-line output issue is well known and is due to some environments being broken (https://github.com/tqdm/tqdm#faq-and-known-issues) by not supporting carriage return (\r).
The relevant links for this problem in Spyder are https://github.com/tqdm/tqdm/issues/512 and https://github.com/spyder-ide/spyder/issues/6172.
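As a minimal illustration of that mechanism (plain Python, no tqdm), this is the carriage-return trick a progress bar relies on; a console that does not honor \r prints each update on its own line instead:
import sys
import time

for i in range(5):
    # '\r' returns the cursor to the start of the line, so the next write
    # overwrites the previous progress text instead of adding a new line
    sys.stdout.write(f"\rprogress: {i + 1}/5")
    sys.stdout.flush()
    time.sleep(0.2)
print()  # end with a newline so the final state stays visible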

(Spyder maintainer here) This is a known limitation of tqdm progress bars in Spyder. I'd recommend you open an issue about it in its GitHub repository.

Specify position=0 and leave=True like this:
for i in tqdm(range(10), position=0, leave=True):
    pass  # Some code
Or in a list comprehension:
nums = [i for i in tqdm(range(10), position=0, leave=True)]
It's worth mentioning that you can make `position=0` and `leave=True` the default settings, so you won't need to specify them each time, like this:
from tqdm import tqdm
from functools import partial

tqdm = partial(tqdm, position=0, leave=True)  # this line does the magic

# for loop
for i in tqdm(range(10)):
    pass  # Some code

# list comprehension
nums = [i for i in tqdm(range(10))]
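Putting the two answers together, a minimal sketch of the original loop might look like this (the path and the startswith filter are taken from the question; treat it as a sketch, not a drop-in):
import os
from functools import partial
from tqdm import tqdm

# make single-line, persistent bars the default
tqdm = partial(tqdm, position=0, leave=True)

loaddir = r'D:\Folder'
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    test_files = [f for f in files if f.startswith('Test')]  # only the files that will be loaded
    for name in tqdm(test_files):
        pass  # do things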

Related

Is there a way to change the working directory of fiddle?

I'm trying to load a C shared library within Ruby using Fiddle.
Here is a minimal example:
require 'fiddle'
require 'fiddle/import'
module Era
  extend Fiddle::Importer
  dlload './ServerApi.so'

  extern 'int era_init_lib()'
  extern 'void era_deinit_lib()'
  extern 'int era_process_request(const char* request, char** response)'
  extern 'void era_free(char* response)'
end

Era.era_init_lib
begin
  # ...
ensure
  Era.era_deinit_lib
end
The shared library loads without issues. However, when I call Era.era_init_lib it tries to load additional libraries (Network.so and Protobuf.so). I have these files located in the current working directory (in the same directory as ServerApi.so).
However, when I try to execute the code above I receive the following error:
! Failed to load library: /home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so, error: /home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so: cannot open shared object file: No such file or directory
If I place the file at the location the error describes everything works fine.
My guess is that the C working directory of fiddle is different from the Ruby working directory. I would like to keep the project files within the project and not in the Ruby installation directory.
How can I use Network.so from my project folder?
All the *.so files are provided by a third-party. I do not have the source and as a result cannot change these files. The function signatures are provided by the documentation.
Searching for Network.so in the strace output gives me these results:
readlink("/proc/self/exe", "/home/username/.rvm/rubies/ruby-2."..., 4096) = 44
openat(AT_FDCWD, "/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7fcc16666d90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fcc16b44520, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "! Failed to load library: ", 26! Failed to load library: ) = 26
write(2, "/home/username/.rvm/rubies/ruby-2."..., 50/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so) = 50
write(2, ", error: ", 9, error: ) = 9
write(2, "/home/username/.rvm/rubies/ruby-2."..., 109/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so: cannot open shared object file: No such file or directory) = 109
write(2, "\n", 1) = 1
I've also written a C program which does the same thing, and it works perfectly fine when the files are dropped into the same directory. So it might be the fault of the library, which I assume checks the location of the currently running program and then tries to load the libraries from that folder. This would explain the behavior when run as a Ruby script (since it runs as part of the Ruby program), whereas a C binary runs standalone.
For those who want to re-create the (Linux) issue: you can download the necessary files from here, which gives you the server-linux-x86_64.sh file.
Supported distros are Suse, Ubuntu, Debian, Red Hat and CentOS, but others may also work fine.
You can either run the installer, which should place the files in /opt/eset/RemoteAdministrator/Server. Or, assuming most of you don't want to install the full application you can run the following command:
sed '1,/^# Start of TAR\.GZ file #$/d' server-linux-x86_64.sh | sed '1d' > server-linux-x86_64.tar.gz
This removes all the installer instructions from the .sh file and leaves only the binary .tar.gz data, writing it to server-linux-x86_64.tar.gz.
Copy the files ServerApi.so, Protobuf.so and Network.so into a directory of your liking. Create a Ruby script (with the question code) in the same directory and run the script.
Because ServerApi.so checks /proc/self/exe for the location of all subsequent files to load, and it is very difficult to modify this target by normal means, it is easier to just modify ServerApi.so itself so that it uses something else besides proc for the source.
If we run strings ServerApi.so, we can verify that the location to check is stored inside a string in ServerApi.so:
strings ServerApi.so | grep 'proc/self/exe'
B/proc/self/exe
So now all we need to do is modify this string to something else that works for us.
The easiest way to modify the string is to replace it with something that is exactly the same length as the original. This way we do not have to worry about changing the end-of-string zero padding or accidentally changing the total size of ServerApi.so.
Here we can see a suitable candidate could be /tmp/scriptexe:
/proc/self/exe
/tmp/scriptexe <- same length
So let's do that:
sed -e 's/proc\/self\/exe/tmp\/scriptexe/' ServerApi.so > ServerApi_Mod.so
Now we can verify the change:
strings ServerApi_Mod.so | grep scriptexe
B/tmp/scriptexe
Next we need to create /tmp/scriptexe to actually point to our Ruby script:
ln -s /the/full/path/to/our/ruby/script.rb /tmp/scriptexe
Then we modify our script:
dlload './ServerApi_Mod.so'
Now we can run it as normal:
ruby script.rb
And everything should work.
If we read the strace output we see that the library obtains the current executable location from /proc/self/exe, and then searches subsequent libraries from there.
/proc/self/exe is not easily modifiable, but by using a hard link to a Ruby executable in the current directory we can trick it to point to a new folder.
The problem is that making a hard link requires root.
In any case, here is a self-contained solution (note that it will ask for root password the first time you run it, in order to create the hard link).
Put this at the top of your script:
# Obtain path to current executable
exe = File.readlink("/proc/self/exe")

# Check if we are running the hard-linked version
if !exe.match(/localruby/)
  if !File.exist?('localruby')
    # Create a hard link to the current Ruby exe using sudo
    system("sudo ln #{exe} localruby")
  end

  puts "Restarting..."
  # In order to prevent an infinite busy loop in case of some mishap
  sleep 1

  # Rerun self using the hard-linked Ruby executable.
  # This will make /proc/self/exe point to the hard link, which then
  # allows the ESET library to search for .so files in the current folder.
  exec('./localruby', File.expand_path(__FILE__))
end
require 'fiddle'
require 'fiddle/import'
# ...rest of your script goes here...
A simple solution without any extra Ruby code is to just create the hard link manually, and then always run the script with ./localruby myscript.rb, instead of using the normal ruby myscript.rb.

Some .txt files will open with the wrong encoding

I have been working on a project for a while now, and I just reached another big step! However, for some .txt files that my program creates, it will give me this message:
File was loaded in the wrong encoding: 'UTF-8'
Most of the .txt files are fine, but it gives me this error for others at the top (I can still read them). Here is my code:
from socket import *
import codecs
import subprocess

ipa = '192.168.1.'  # These are the first 3 digits of the IP addresses that the program looks for.

def is_up(adr):
    s = socket(AF_INET, SOCK_STREAM)
    s.settimeout(0.01)
    if not s.connect_ex((adr, 135)):
        s.close()
        return 1
    else:
        s.close()

def main():
    for i in range(1, 256):
        adr = ipa + str(i)
        if is_up(adr):
            with codecs.open("" + getfqdn(adr) + ".txt", "w+", 'utf-8-sig') as f:
                subprocess.run('ipconfig | findstr /i "ipv4"', stdout=f, shell=True, check=True)
                subprocess.run('wmic/node:'+adr+' product get name, version, vendor', stdout=f, shell=True, check=True)

main()
# Most code provided by Ashish Jain
Unfortunately I don't think I'm allowed to say exactly which files are giving me trouble, because I might be distributing information that someone can use for malicious intent.
Since your script only writes to files, there's no reason to open them in w+ mode, which enables reading. Opening the files in w mode should be enough.
Furthermore, the commands that your script runs are evidently not outputting utf-8-sig-encoded text, hence the error. In most cases, letting them write with the default encoding (by not specifying one) will suffice.
Lastly, you're missing a space between wmic and /node: in the second command you run.
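A minimal sketch of those three changes applied to the with-block in the question (keeping the question's getfqdn(adr) naming; a sketch, not tested against wmic):
with open(getfqdn(adr) + ".txt", "w") as f:  # plain "w", default encoding
    subprocess.run('ipconfig | findstr /i "ipv4"', stdout=f, shell=True, check=True)
    subprocess.run('wmic /node:' + adr + ' product get name, version, vendor',
                   stdout=f, shell=True, check=True)  # note the space after wmic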

Use Python to search through output of external .exe

I am trying to use Python to build a 'wrapper' around an external .exe file. The file, when run, will reply back something like the following:
Ignoring profile '\\MachineName\C$\Users\UserName1' (reason: directory inclusion)
Ignoring profile '\\MachineName\C$\Users\UserName2' (reason: directory inclusion)
The following user profiles match the deletion criteria:
\\MachineName\C$\Users\UserName3
There could be any number of ignored profiles and any number of matching profiles or none.
What I would like to know is can I get Python to search the output for this exe and then do something else if there is a matching profile?
The code to run the exe is simply:
subprocess.Popen(r'c:\delprof2\DelProf2.exe /l', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Thanks!
# Python 2.7
import subprocess

def extractProfiles(source):
    result = []
    for line in source:
        line = line.strip()
        if not line:
            continue
        result.append(line)
    return result

profiles = []
proc = subprocess.Popen(....)
for line in proc.stdout:
    if line.strip() != 'The following user profiles match the deletion criteria:':
        continue
    profiles = extractProfiles(proc.stdout)
    break

# now do something with the profiles
Some caveats:
The above is somewhat brittle in that it's looking for an exact match of when to start remembering profiles ("The following user profiles...") in the output of the subprocess. If you're not sure it will be exactly that sentence character-for-character, it may be worth using the re module and a regular expression to find it (a rough sketch follows after these caveats).
The above assumes that the "matching" profiles are the only things that will appear in the subprocess's output after that triggering sentence is seen ("The following user profiles..."). You'd have to do something to detect the delimiter signalling the end of the list of profiles if that is not the case.
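Here is a rough sketch of the regular-expression variant mentioned in the first caveat; the patterns are assumptions and should be adjusted to DelProf2's real output:
import re
import subprocess

TRIGGER = re.compile(r'profiles match the deletion criteria', re.IGNORECASE)
PROFILE = re.compile(r'^\\\\\S+')  # lines that look like \\Machine\C$\Users\Name

proc = subprocess.Popen([r'c:\delprof2\DelProf2.exe', '/l'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
profiles = []
seen_trigger = False
for line in proc.stdout:
    line = line.strip()
    if not seen_trigger:
        seen_trigger = bool(TRIGGER.search(line))
    elif PROFILE.match(line):
        profiles.append(line)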

Python: Check if a directory is an alias

Does Python have a simple function for checking if a directory is an actual directory or if it's just an alias to another directory? I'm trying to list all files/folders in a directory, but because of these alias folders I'm getting a lot of stuff that looks like this:
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle/Home/bundle
I know I can write a function that will compare paths and quit if it seems like I'm going in circles, but is there a simple function that does exactly that that I'm not aware of?
E.g.
os.isAlias( …pathname… )
Thanks!
Here's a version of os.path.realpath that works on Mac aliases as well as on symbolic links under Python 2:
from Carbon import File

def osx_realpath(path):
    return File.FSResolveAliasFile(path, True)[0].as_pathname()
If you call osx_realpath on each directory before you recurse into it you should avoid duplication. Alternatively you could define something like
def is_osx_realpath(path):
    return path == osx_realpath(path)
Here you have to worry a little about false negatives, however. If you filter for is_osx_realpath and the path you start with is an alias, your program will stop without looking at anything.
So far I don't know of a way to do this under Python 3. I have a question here where I'm hoping for an answer. Right now I can't do better than using subprocess.call to invoke something that does the check on the command line.
EDIT: I should add that not only is Carbon.File unavailable in Python 3, it is also deprecated and best avoided in Python 2 as well; however, it's the most pragmatic solution I know of for Python 2 at present.
EDIT 2: here is a way to check if a file is an alias that I believe to be Python 3-friendly. However, I don't have code to resolve the alias. I believe you need PyObjC installed.
import os
from AppKit import NSWorkspace

def is_alias(path):
    uti, err = NSWorkspace.sharedWorkspace().typeOfFile_error_(
        os.path.realpath(path), None)
    if err:
        raise Exception(str(err))
    else:
        return "com.apple.alias-file" == uti
(source)
The answer above is incorrect.
While it is true that Finder reports symlinks as aliases, they are distinct things.
Symlinks are a basic feature of UNIX, but aliases are an Apple-only feature.
If you doubt this, create a symlink to a directory and an alias. The symlink will be small, typically 50-100 bytes, whereas the alias can be several MB.
os.path.islink( … ) will report symlinks, but not aliases.
I am not sure how you would find them in Python, but the following link shows other methods.
https://stackoverflow.com/a/21151368/838253
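For example, a quick check with os.path.islink shows the difference (the paths here are placeholders):
import os

os.path.islink('/path/to/some_symlink')  # True: symlinks are detected
os.path.islink('/path/to/finder_alias')  # False: a Finder alias is an ordinary file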
You can check whether a file or directory is an alias with the GetFileInfo command in Mac OS X. GetFileInfo -aa foo prints a line with "1" if foo is an alias and "0" if not.
import subprocess

def is_alias(path):
    # decode() so the comparison also works on Python 3, where check_output returns bytes
    return subprocess.check_output(["GetFileInfo", "-aa", path]).decode() == "1\n"
Seems a little sad to spawn a process for every check, but I think this works with versions of Mac OS X since probably 10.4.4 (2006), 32-bit, 64-bit, Python 2 and Python 3. The version of GetFileInfo I have (from 2009) is a "universal" i386 + PPC binary.
GetFileInfo is part of Xcode, which is large, but you can download the command-line tools separately (see the "Separate Download" section here).
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/GetFileInfo.1.html
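As a usage example, the is_alias helper above could be used to filter aliases out of a directory listing (the directory name is a placeholder):
import os

directory = '/Users/me/some/folder'
real_entries = [name for name in os.listdir(directory)
                if not is_alias(os.path.join(directory, name))]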
Old question, but I just ran into this myself.
I have no perfect method for checking whether a file is an alias; however, when using mimetypes, Python will return None for an alias or a symlink. This might be useful in some situations. I've only tested this with Python 3.8 on macOS Big Sur.
import mimetypes

for idx, f in enumerate(filepaths):
    type = mimetypes.guess_type(f)[0]
    print(f"type is: {type}")
returns (without my added comments):
type is: None # <-- Folder Alias
type is: None # <-- File Alias
type is: text/x-python
type is: None # <-- Folder Alias
type is: video/mp4
type is: image/png
type is: None # <-- Folder Alias
type is: None # <-- Symlink
type is: image/png
type is: application/zip
type is: image/png
type is: image/jpeg
type is: None # <-- Symlink
I ran some files through exiftool just to see what types they returned, and aliases and symlinks both showed the following:
File Type : ALIAS
File Type Extension : alias
MIME Type : application/x-macos
You might be able to init the mimetypes for these, but I haven't tested it and I'm not sure whether it will give false positives if anything else shows up as application/x-macos.

Completely disable IPython output caching

I'm dealing with some GB-sized numpy arrays in IPython. When I delete them, I definitely want them gone, in order to recover the memory. IPython's output cache is quite annoying there, as it keeps the objects alive even after deleting the last actively intended reference to them. I already set
c.TerminalInteractiveShell.cache_size = 0
in the IPython configuration, but this only disables caching of entries to _oh; the other variables like _, __ and so on are still created. I'm also aware of %xdel, but I'd prefer to disable the cache completely, as I rarely use the output history anyway, so that a plain del would work again right away.
Looking at IPython/core/displayhook.py, lines 209-214, I would say that it is not configurable. You could try making a PR to add an option to disable it totally.
Enter
echo "__builtin__._ = True" > ~/.config/ipython/profile_default/startup/00-disable-history.py
and your history should be gone.
Edit:
Seems like the path to the config directory is sometimes a bit different, either ~/.config/ipython or just ~/.ipython/. So just check which one you got and adjust the path accordingly. The solution still works with jupyter console.
Seems that we can suppress the output cache by putting a ";" at the end of the line now.
See http://ipython.org/ipython-doc/stable/interactive/tips.html#suppress-output
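For example, assuming the semicolon suppresses caching as described there, a quick session check might look like this:
In [1]: 10 * 10;   # trailing ';' suppresses display (and, per the tip, caching) of the result

In [2]: len(Out)
Out[2]: 0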
Create an ipython profile:
!ipython profile create
The output might be (for ipython v4.0):
[ProfileCreate] Generating default config file: '/root/.ipython/profile_default/ipython_config.py'
[ProfileCreate] Generating default config file: '/root/.ipython/profile_default/ipython_kernel_config.py'
Then add the line 'c.InteractiveShell.cache_size = 0' to the ipython_kernel_config.py file by running:
!echo 'c.InteractiveShell.cache_size = 0' >> /root/.ipython/profile_default/ipython_kernel_config.py
Load another IPython kernel and check if this works:
In [1]: 123
Out[1]: 123
In [2]: _1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-51-21553803e553> in <module>()
----> 1 _1
NameError: name '_1' is not defined
In [3]: len(Out)
Out[3]: 0
