How does a shell script read the data in a batch test folder - shell

I recently replicated a SEGAN experiment based on TensorFlow 0.12.1. The author provides a shell script for testing (clean_wav.sh), as shown in the figure below:
This is the original version provided by the author. I modified it according to the path of my test data, as follows:
Noisy_testset_wav_16k is my test data folder, but when I run the script the system reports an error:
The path is a directory, but when I change it to a single file:
NOISY_WAVNAME='/home/zyf/SEGAN/ SEGAN/segan-master1/noisy_testset_wav_16k/p232_023.wav'
the script runs normally and the program works as intended.
However, it only processes one audio file at a time; it cannot process files in batches. If anyone knows the reason or has a suggestion, please give me some advice. Thank you very much.

The code is written in a way that only processes a single file; you can add a loop to the shell script to process all files in the folder:
# loop over every noisy wav file in the test folder
for f in "$NOISY_WAVDIR"/*.wav; do
    python main.py --init_noise_std 0. --save_path segan_v1.1 \
                   --batch_size 100 --g_nl prelu --weights SEGAN-41700 \
                   --preemph 0.95 --bias_deconv True \
                   --bias_downconv True --bias_D_conv True \
                   --test_wav "$f" --save_clean_path "$SAVE_PATH"
done
but that is not an optimal use of the GPU, since the audio is not processed in batches. Ideally you would modify the Python code to process the audio in batches, but that would not be a trivial task.
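If you did want to go down that road, a very rough sketch of the file-grouping side in Python is below; the directory path and batch size are placeholders, and the batched inference itself would still have to be implemented inside SEGAN's main.py:

import glob
import os

NOISY_WAVDIR = "/path/to/noisy_testset_wav_16k"   # placeholder path
BATCH_SIZE = 100                                  # placeholder batch size

# gather the noisy wav files and group them into fixed-size batches
wav_files = sorted(glob.glob(os.path.join(NOISY_WAVDIR, "*.wav")))
batches = [wav_files[i:i + BATCH_SIZE]
           for i in range(0, len(wav_files), BATCH_SIZE)]

for batch in batches:
    # a batched test routine would load and denoise every file in `batch`
    # in one forward pass instead of launching one process per file
    print("would process {} files, e.g. {}".format(len(batch), batch[0]))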

Related

Using gitpython to get current hash does not work when using qsub for job submission on a cluster

I use python to do my data analysis and lately I came up with the idea to save the current git hash in a log file so I can later check which code version created my results (in case I find inconsistencies or whatever).
It works fine as long as I do it locally.
import git
import os
rep = git.Repo(os.getcwd(), search_parent_directories=True)
git_hash = rep.head.object.hexsha
with open('logfile.txt', 'w+') as writer:
    writer.write('Code version: {}'.format(git_hash))
However, I have a lot of heavy calculations that I run on a cluster to speed things up (running the analyses of subjects in parallel), using qsub, which looks more or less like this:
qsub -l nodes=1:ppn=12 analysis.py -q shared
This always results in a git.exc.InvalidGitRepositoryError.
EDIT
Printing os.getcwd() showed me that on the cluster the current working directory is always my $HOME directory, no matter from where I submit the job.
My next solution was to get the directory where the file is located using some of the solutions suggested here.
However, these solutions result in the same error because (that's how I understand it) my file is somehow copied to a directory deep in the root structure of the cluster's headnode (/var/spool/torque/mom_priv/jobs).
I could of course write down the location of my file as a hardcoded variable, but I would like a general solution for all my scripts.
So after I explained my problem to IT in detail, they were able to help me solve it.
Apparently the $PBS_O_WORKDIR variable stores the directory from which the job was submitted.
So I adjusted my access to the githash as follows:
try:
    script_file_directory = os.environ["PBS_O_WORKDIR"]
except KeyError:
    script_file_directory = os.getcwd()
try:
    rep = git.Repo(script_file_directory, search_parent_directories=True)
    git_hash = rep.head.object.hexsha
except git.InvalidGitRepositoryError:
    git_hash = 'not-found'
# create a log file that saves some information about the run script
with open('logfile.txt', 'w+') as writer:
    writer.write('Code version: {}\n'.format(git_hash))
I first check whether the PBS_O_WORKDIR variable exists (i.e. whether I am running the script as a job on the cluster). If it does, I get the git hash from that directory; if it doesn't, I use the current working directory.
Very specific, but maybe one day someone has the same problem...

Creating a production ready binary from Julia code

I have a Julia program that inputs a csv, transforms the data via a bunch of functions, and outputs a csv file. I want to turn this into a binary so that I can run it on different machines without having the source code on them.
I am looking at PackageCompiler.jl, but I can't find any understandable documentation for creating a binary app. I am trying:
using PackageCompiler
@time create_app("JuliaPrograms", "test"; precompile_execution_file="script.jl")
The file that contains all my code is script.jl and it lives in the dir JuliaPrograms, and I want the compiled binary to be named test.
When I run julia script.jl it performs as I want. I want to be able to run ./test with the same result.
However, I get this error:
ERROR: could not find project at "/Users/userx/JuliaPrograms/"
What am I doing wrong? Do I need some special project directory?
Per the docs here: https://julialang.github.io/PackageCompiler.jl/dev/apps.html#Creating-an-app-1 you need to make sure you define:
function julia_main()::Cint
# do something based on ARGS?
return 0 # if things finished successfully
end
a function called julia_main as the entry point to the app. You can find an example app here: https://github.com/JuliaLang/PackageCompiler.jl/tree/master/examples/MyApp
You may also want to check the location of the code itself. Is it being saved at "/Users/userx/JuliaPrograms/"? You can switch your directory in the Julia REPL by typing ;, which will put you into shell mode, and then you can cd into the directory where your code is.

Shell script to verify data packages

I need to make a shell script to check my algorithms against loads of data (test packages saved in .in files; every package contains one folder with the .in file and another with the .out file that is supposed to hold the correct result).
Sometimes there are about 1000 files in one package, so there is no point in doing it manually. I need some kind of loop which opens each .in file, redirects it to the input of my C++ program, and also redirects the output of the program (saving the result to .out files). But the point is that I can't pick up this language as quickly as I need to.
I would also like the script to compare the results of my algorithm to the .out files from the packages.
for f in ExternalIn/*.in; do
    # part of code which opens a process with my algorithm and compares
    # its .out file to the .out file from the package
Skipping checks for missing files, whitespace-safety, etc., you probably need something like:
for f in ExternalIn/*.in; do
    # diff the result of my_cpp_app eating file.in with file.out
    # and store the comparison result in file.diff
    diff ${f/.in/.out} <(my_cpp_app <$f 2>/dev/null) > ${f/.in/.diff}
done
Although I would probably do it with a find / xargs pipeline, which is not only safer but also allows parallel execution.
Or even write a Makefile for this and use make, which after all is a tool for exactly this kind of work.

Which performs better in python, subprocess or open file and go line-by-line?

I have a script in python that does the following:
For a folder of XML files (each file lacks a docroot):
1. Read the first 7 lines of the source file, but do nothing with them, as they need to not be in the output.
2. Write a new file (in a separate directory) that starts with the XML tag and an opening Docroot / Parent Tag.
3. While still reading the source file at line 8, go line by line and append to the same new file.
4. Append a closing Docroot / Parent Tag to the end of the new file.
Inspiration from John Machin Feb 1 2011
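A minimal pure-Python sketch of those steps (the paths and the docroot tag name below are placeholders, not taken from the original script) could look like this:

import itertools

SRC = "source_dir/input.xml"    # placeholder source file
DST = "output_dir/input.xml"    # placeholder destination in a separate directory
DOCROOT = "root"                # placeholder docroot / parent tag name

with open(SRC, "r", encoding="utf-8") as src, open(DST, "w", encoding="utf-8") as dst:
    dst.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    dst.write("<{}>\n".format(DOCROOT))
    # skip the first 7 lines, then copy the rest line by line
    for line in itertools.islice(src, 7, None):
        dst.write(line)
    dst.write("</{}>\n".format(DOCROOT))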
I have a similar solution using bash and sed.
The project sponsor is looking to have the script called by AWS Lambda and, as such, is leaning towards Python as the script's language.
I'm looking for a performance boost and scaling (the source files range in size from 2 MB to 241 MB, and may be larger in the future).
Is it better to stick with a pure Python solution, or to use Python but call out to the sed routines or run the bash script using the subprocess module? Thanks.
Per the advice from Jean-François Fabre, I used subprocess.call() and it worked great. Thanks again for your help.
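For reference, the subprocess route can be as simple as the sketch below; the helper script name and its arguments are hypothetical, not from the original post:

import subprocess

# hypothetical bash/sed helper that strips the leading lines and wraps the file;
# the script name and its arguments are placeholders
ret = subprocess.call(["bash", "strip_and_wrap.sh",
                       "source_dir/input.xml", "output_dir/input.xml"])
if ret != 0:
    raise RuntimeError("shell helper failed with exit code {}".format(ret))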

Writing to popen and reading back several files in Ruby

I need to run some shell commands on a number of files and sometimes I get back more than one file in response. The question is: How can I read back several files from IO.popen in Ruby?
For instance, imagine the following case:
file = grid.get(record['_id']) # fetch a file from database
IO.popen('tar -Oxmz', 'ab') {|pipe| pipe.write(file.read)} # pass to tar and extract
This necessitates that I reread all the extracted files from the filesystem. I figured out that this is the speed bottleneck of my script, and I wonder if I can accomplish the same task in memory. I tried the following:
file = grid.get(record['_id'])
IO.popen('tar -Oxmz', 'w+b') do |pipe|
  pipe.write(file.read)
  pipe.close_write
  output = pipe.read
end
It works, but I get the whole response, including several extracted files, in one piece (in the variable output). I need the files separated from each other, and possibly with their names. Is there any way to do this?
By the way, the resulting files are most of the time text, but sometimes binary. Running a pipe for each output file is not a solution, because the actual overhead of running the commands for each file outweighs the benefits of doing the transformation in-memory.
P.S. The actual use case does not rely on tar only. I use software that do not have Ruby wrappers.
