Using gitpython to get current hash does not work when using qsub for job submission on a cluster

I use Python for my data analysis, and lately I came up with the idea of saving the current git hash in a log file so I can later check which code version created my results (in case I find inconsistencies or whatever).
It works fine as long as I do it locally.
import git
import os

rep = git.Repo(os.getcwd(), search_parent_directories=True)
git_hash = rep.head.object.hexsha

with open('logfile.txt', 'w+') as writer:
    writer.write('Code version: {}'.format(git_hash))
However, I have a lot of heavy calculations that I run on a cluster to speed things up (running the analyses of subjects in parallel), using qsub, which looks more or less like this:
qsub -l nodes=1:ppn=12 analysis.py -q shared
This always results in a git.exc.InvalidGitRepositoryError.
EDIT
Printing os.getcwd() showed me that on the cluster the current working directory is always my $HOME directory, no matter where I submit the job from.
My next solution was to get the directory where the file is located using some of the solutions suggested here.
However, these solutions result in the same error because (as far as I understand it) my file is somehow copied to a directory deep in the root structure of the cluster's head node (/var/spool/torque/mom_priv/jobs).
I could of course write down the location of my file as a hardcoded variable, but I would like a general solution for all my scripts.

So after I explained my problem to IT in detail, they could help me solve the problem.
Apparently the $PBS_O_WORKDIR variable stores the directory from which the job was submitted.
So I adjusted my access to the githash as follows:
try:
    script_file_directory = os.environ["PBS_O_WORKDIR"]
except KeyError:
    script_file_directory = os.getcwd()

try:
    rep = git.Repo(script_file_directory, search_parent_directories=True)
    git_hash = rep.head.object.hexsha
except git.InvalidGitRepositoryError:
    git_hash = 'not-found'

# create a log file that saves some information about the run script
with open('logfile.txt', 'w+') as writer:
    writer.write('Code version: {}\n'.format(git_hash))
I first check whether the PBS_O_WORKDIR variable exists (i.e., whether the script is running as a job on the cluster). If it does, I get the git hash from that directory; if it doesn't, I use the current working directory.
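For reuse across scripts, the same logic can be wrapped in a small helper (a sketch; the function name is my own):
import os
import git

def get_git_hash():
    """Return the current commit hash, preferring the qsub submission dir."""
    # PBS/Torque sets PBS_O_WORKDIR to the directory the job was submitted
    # from; when running locally it is absent, so fall back to the cwd.
    directory = os.environ.get("PBS_O_WORKDIR", os.getcwd())
    try:
        rep = git.Repo(directory, search_parent_directories=True)
        return rep.head.object.hexsha
    except git.InvalidGitRepositoryError:
        return 'not-found'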
Very specific, but maybe one day someone has the same problem...

Related

Creating a production ready binary from Julia code

I have a Julia program that takes a csv as input, transforms the data via a bunch of functions, and outputs a csv file. I want to turn this into a binary so that I can run it on different machines without having the source code on them.
I am looking at PackageCompiler.jl, but I can't find any understandable documentation for creating a binary app. I am trying:
using PackageCompiler
@time create_app("JuliaPrograms", "test"; precompile_execution_file="script.jl")
The file that contains all my code is script.jl and it lives in the directory JuliaPrograms; I want the compiled binary to be named test.
When I run julia script.jl it performs as I want. I want to be able to run ./test with the same result.
However, I get this error:
ERROR: could not find project at "/Users/userx/JuliaPrograms/"
What am I doing wrong? Do I need some special project directory?
Per the docs here: https://julialang.github.io/PackageCompiler.jl/dev/apps.html#Creating-an-app-1, you need to make sure you define a function called julia_main as the entry point to the app:
function julia_main()::Cint
    # do something based on ARGS?
    return 0 # if things finished successfully
end
You can find an example app here: https://github.com/JuliaLang/PackageCompiler.jl/tree/master/examples/MyApp
You may also want to check the location of the code itself. Is it being saved at "/Users/userx/JuliaPrograms/"? You can switch your directory in the Julia REPL by typing ;, which will put you into shell mode, and then you can cd into the directory where your code is.
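Concretely, the "could not find project" error usually means that JuliaPrograms is not a Julia package: create_app expects a project directory with a Project.toml, not a loose folder of scripts. A minimal sketch of such a layout, using the names from the question (the real_main split follows the MyApp example; the exact file contents here are my assumption):
# Generate the skeleton once with: using Pkg; Pkg.generate("JuliaPrograms")
# Then put the entry point in JuliaPrograms/src/JuliaPrograms.jl:

module JuliaPrograms

function julia_main()::Cint
    try
        real_main()   # the logic currently in script.jl
    catch
        return 1
    end
    return 0
end

function real_main()
    # read the input csv, run the transformations, write the output csv
end

end # module
After create_app("JuliaPrograms", "test") succeeds, the executable should end up under test/bin/ (named after the package), rather than being literally ./test.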

How does a shell script read the data in a batch test folder

I recently replicated a SEGAN experiment based on TensorFlow 0.12.1. The author provides a shell script for testing (clean_wav.sh), as shown in the figure below:
This is the original version provided by the author. According to the path of my test data, my modified version is as follows:
Noisy_testset_wav_16k is my test data folder, but when I run the script, the system reports an error:
This folder is a directory, but when I change the path to:
NOISY_WAVNAME='/home/zyf/SEGAN/SEGAN/segan-master1/noisy_testset_wav_16k/p232_023.wav'
the script runs normally and the program function can also be achieved.
However, this only processes one audio file at a time; it cannot process files in batches. If anyone knows the reason or has any suggestions, I would be grateful for your advice. Thank you very much.
The code is written in a way that only processes a single file; you can add a loop in the shell script to process all the files in the folder:
for f in "$NOISY_WAVDIR"/*.wav; do
    python main.py --init_noise_std 0. --save_path segan_v1.1 \
        --batch_size 100 --g_nl prelu --weights SEGAN-41700 \
        --preemph 0.95 --bias_deconv True \
        --bias_downconv True --bias_D_conv True \
        --test_wav "$f" --save_clean_path "$SAVE_PATH"
done
but that would not make optimal use of the GPU, since you do not process the audio in batches. Ideally you'd want to modify the Python code to process audio in batches, but that would not be a trivial task.

Reinstalling packages from a list generated by command: ado dir

I am recovering Stata following a Windows upgrade. I have a list of my packages generated from ado dir in the following format:
[1] package mdesc from http://fmwww.bc.edu/RePEc/bocode/m
'MDESC': module to tabulate prevalence of missing values
[2] package univar from http://fmwww.bc.edu/RePEc/bocode/u
'UNIVAR': module to generate univariate summary with box-and-whiskers plot
[3] package tabmiss from http://www.ats.ucla.edu/stat/stata/ado/analysis
tabmiss. Shows tabulation of number of missing and non-missing values
I have many packages and would like to reinstall them without having to designate each directory/url via net cd. While using net cd together with net install, or ssc install with package names, in a loop is trivial (as below), it would seem that an automated method for this task might be available.
net cd http://www.ats.ucla.edu/stat/stata/ado/analysis
local ucla tabmiss csgof powerlog ldfbeta
foreach x of local ucla {
net install `x'
}
To my knowledge, there is no built-in or automated method of tracking and managing your installed packages outside of what is available through ado or net.
I would also tend to agree with @Nick Cox that this task seems strange, and I can't imagine how a new Stata install or reinstall could know what was installed previously, but I find the question interesting for other reasons.
The main reason is for users who have Stata installed on multiple machines and need the same packages on each. I faced a similar issue when I purchased a new computer and installed Stata but wanted all of the packages I use to be available as well. Outside of moving the ado directory or selected contents, I'm not aware of any quick solution.
Here it would be possible to use the output of ado dir on one machine to determine what you need to install on a second machine with a new Stata install.
The method you propose using a foreach loop could save you time from having to type in or copy/paste a lot of packages and URLs. At the same time however, this is only beneficial if you have many packages from only a few repositories because you will need to net cd to the URL each time as you show in your example.
An alternative is a programmatic solution. As you know, ado dir will list each installed package, its URL, and a short description. Using this, a log file, and the built-in I/O functionality, a short program can be written to automate the process and dynamically build a do file that contains the commands to install the already-installed packages.
The code below generates a do file containing commands (in this case, net describe package, from(url)) for each package I have installed on my computer.
clear *
tempfile log1
log using "`log1'", text name(mylog)
ado dir
log close mylog

tempname logfile
file open `logfile' using "`log1'", read
file read `logfile' line

file open dfh using "path/to/your/dofile.do", write replace

local pckage "package"

while r(eof) == 0 {
    if `: list pckage in line' {
        local packageName : word 3 of `line'
        local dirName : word 5 of `line'
        di "`packageName' `dirName'"
        file write dfh "net describe `packageName', from(`dirName')"
        file write dfh _newline
    }
    file read `logfile' line
}

file close `logfile'
file close dfh
In the above code, I create a temp file to write a .txt log file to and store the contents of ado dir in that file.
Then, I open the log file using file open and read it line by line in the while loop.
Above the loop, I'm creating a do file at /path/to/your/dofile.do to hold the output of the loop - the dynamically created commands relating to the installed packages on my machine.
The loop will iterate so long as r(eof) == 0, where r(eof) is an end-of-file marker. I use an if statement to sort out the lines of the log file that contain the word package, as I'm only interested in those lines with the package name and URL in them.
Inside of the if block, I parse the local macro line to pull the package name and the URL/directory name.
This is important: this section of code assumes that the 3rd and 5th words in the macro will always be the package name and URL, respectively. Confirm this from the output of ado dir before executing.
You will also need to change the command that is being written to the file handle dfh inside of the loop to what you want (net install, etc) when you are ready to execute.
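For example, the two write lines inside the loop might become something like this (a sketch; net install's replace option overwrites any copy already installed):
file write dfh "net install `packageName', from(`dirName') replace"
file write dfh _newline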
For more help on using file, locals, and tempfiles execute any of the following in Stata:
help file
help extended_fcn
help macrolists
There may be nicer ways to parse the contents of ado dir but this has worked for me. And of course I'd always advise that you take the time to understand what the code is doing so that you can make any necessary tweaks to fit your particular situation.

Writing to popen and reading back several files in Ruby

I need to run some shell commands on a number of files and sometimes I get back more than one file in response. The question is: How can I read back several files from IO.popen in Ruby?
For instance, imagine the following case:
file = grid.get(record['_id']) # fetch a file from database
IO.popen('tar -Oxmz', 'ab') {|pipe| pipe.write(file.read)} # pass to tar and extract
This necessitates that I reread all the extracted files from the filesystem. I figured out that this is the speed bottleneck of my script, and I wonder if I can accomplish the same task in-memory. I tried the following:
file = grid.get(record['_id'])
IO.popen('tar -Oxmz', 'w+b') do |pipe|
  pipe.write(file.read)
  pipe.close_write
  output = pipe.read
end
It works, but I get the whole response, which here includes several extracted files, in one piece (in the variable output). I need the files separate from each other and possibly with their names. Is there any way to do this?
By the way, the resulting files are most of the time text, but sometimes binary. Running a pipe for each output file is not a solution, because the actual overhead of running the commands for each file outweighs the benefits of doing the transformation in-memory.
P.S. The actual use case does not rely on tar only. I use software that does not have Ruby wrappers.
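For the tar case specifically, one way to keep everything in memory is to skip popen entirely and parse the archive with Gem::Package::TarReader, which yields each entry together with its name (a sketch; grid and record come from the question above, and this of course only covers the tar-based commands):
require 'zlib'
require 'stringio'
require 'rubygems/package'   # provides Gem::Package::TarReader

file = grid.get(record['_id'])                        # fetch the .tar.gz
gz   = Zlib::GzipReader.new(StringIO.new(file.read))  # gunzip in memory

files = {}
Gem::Package::TarReader.new(gz) do |tar|
  tar.each do |entry|
    files[entry.full_name] = entry.read if entry.file?
  end
end
# files now maps each archived path to its (text or binary) contents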

NodeJS fs.watch on directory only fires when changed by editor, but not shell or fs module

When the code below is run, the watch is only triggered if I edit and save tmp.txt manually, using either my IDE, TextEditor.app, or vim.
It is not triggered by the write stream, nor by manual shell output redirection (typing echo "test" > /path/to/tmp.txt).
Although if I watch the file itself, and not its dirname, then it works.
var fs, Path, file, watchPath, w;
fs   = require('fs');
Path = require('path');
file = __dirname + '/tmp.txt';
watchPath = Path.dirname(file); // changing this to just `file` makes it trigger

w = fs.watch(watchPath, function (e, f) {
    console.log("will not get here by itself");
    w.close();
});

fs.writeFileSync(file, "test", "utf-8");

fs.createWriteStream(file, {
    flags: 'w',
    mode: 0777
}).end('the_date="' + new Date + '";'); // another method fails as well

setTimeout(function () {
    fs.writeFileSync(file, "test", "utf-8");
}, 500); // as does this one

// child_process exec and spawn fail the same way, with or without a timeout
So the questions are: why? And how can I trigger this event programmatically from a node script?
Thanks!
It doesn't trigger because a change to the contents of a file isn't a change to the directory.
Under the covers, at least as of 0.6, fs.watch on Mac uses kqueue, and it's a pretty thin wrapper around kqueue file system notifications. So, if you really want to understand the details, you have to understand kqueue, and inodes and things like that.
But if you want a short "lie-to-children" explanation: What a user thinks of as a "file" is really two separate things—the actual file, and the directory entry that points to the actual file. This is what allows you to have things like hard links, and files that can still be read and written even after you've deleted them, and so on.
In general, when you write to an existing file, this doesn't make any change to the directory entry, so anyone watching the directory won't see any change. That's why echo >tmp.txt doesn't trigger you.
However, if you, e.g., write a new temporary file and then move it over the old file, that does change the directory entry (making it a pointer to the new file instead of the old one), so you will be notified. That's why TextEditor.app does trigger you.
The thing is, you've asked to watch the directory and not the file.
The directory isn't updated when the file is modified, such as via shell redirection; in this case, the file is opened, modified, and closed. The directory isn't changed -- only the file is.
When you use a text editor to modify a file, the usual set of system calls behind the scenes looks something like this:
fd = open("foo.new")
write(fd, new foo contents)
unlink("foo")
rename("foo.new", "foo")
This way, the foo file is either entirely the old file or entirely the new file, and there's no way for there to be a "partial file" with the new contents. The renaming operations do modify the directory, thus triggering the directory watch.
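Putting that together, one way to trigger the watch programmatically from Node is to do the same dance: write somewhere else, then rename over the target (a sketch; the .new suffix is arbitrary):
var fs = require('fs');

var file = __dirname + '/tmp.txt';
var tmp  = file + '.new';

fs.writeFileSync(tmp, 'test', 'utf-8'); // write the new contents aside
fs.renameSync(tmp, file);               // the rename updates the directory
                                        // entry, so the directory watch fires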
Although the above answers seem reasonable, they are not fully accurate. It is actually a very useful feature to be able to listen to a directory for file changes, not just "renames". I think this feature works as expected on Windows at least, and as of Node 0.9.2 it also works on Mac, since fs.watch changed to the FSEvents API, which supports this feature:
Version 0.9.2 (Unstable)
