Code shows error in cluster but works fine otherwise - bash

Hello everyone,
I have a bash file which has the following code:
./lda --num_topics 15 --alpha 0.1 --beta 0.01 --training_data_file testdata/test_data.txt --model_file Model_Files/lda_model_t15.txt --burn_in_iterations 120 --total_iterations 150
This works perfectly fine when run locally, but when I run it on a cluster it does not load the data that it is supposed to load from the connected .cc files. I have #!/bin/bash in the header. What can I do to rectify this situation? Please help!

You will need to give the full path to the lda executable. Since the script is not invoked by you manually, the shell on the cluster will not know where to find the executable when it runs the script. And since this is not a shell command, you don't necessarily even need the #!/bin/bash line.
/<FullPath>/lda --num_topics 15 --alpha 0.1 --beta 0.01 --training_data_file testdata/test_data.txt --model_file Model_Files/lda_model_t15.txt --burn_in_iterations 120 --total_iterations 150
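If the relative data paths (testdata/... and Model_Files/...) also depend on where the cluster starts the job, one hedged option is to pin the working directory to the script's own location. A minimal sketch, assuming the data folders live next to the script; <FullPath> stays a placeholder as above:
#!/bin/bash
# Sketch: resolve the directory this script lives in and cd there, so the
# relative data/model paths resolve the same way as a manual run.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
/<FullPath>/lda --num_topics 15 --alpha 0.1 --beta 0.01 --training_data_file testdata/test_data.txt --model_file Model_Files/lda_model_t15.txt --burn_in_iterations 120 --total_iterations 150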

Related

Rsync include or exclude directories using text file

I'm using rsync to back up some data from a remote host.
This is how I'm using the rsync command:
rsync --dry-run -avhi -e ssh --include-from=/home/rsync_list/test.txt root@10.10.4.61:/ /mnt/BACKUP/my_BACKUP/
This is the file /home/rsync_list/test.txt:
+ /usr/acs/conf/**
+ /usr/acs/bin/**
+ /raid0/opmdps/TEMP_folder/**
- *
I want to copy only the listed folders and exclude everything else.
I always get:
receiving file list ... done
sent 103 bytes received 48 bytes 302.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
Could you tell me what I'm doing wrong? How should I write the rsync command if I would like to sync, for example, only /raid0/opmdps/TEMP_folder/ without its subfolders?
I wonder if you really tried only the command you posted?
Not only are you using the --dry-run option, the output itself indicates this:
total size is 0 speedup is 0.00 (DRY RUN)
Please consult the manpage:
-n, --dry-run perform a trial run with no changes made
https://linux.die.net/man/1/rsync
May I suggest you give it a run without --dry-run?
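In other words, the same command from the question with only --dry-run removed:
rsync -avhi -e ssh --include-from=/home/rsync_list/test.txt root@10.10.4.61:/ /mnt/BACKUP/my_BACKUP/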

Error creating LMDB database file in MATLAB for Caffe

I am trying to convert a dataset of images to LMDB format for use with Caffe, and I need to call the convert_imageset tool to apply this conversion from inside Matlab.
I am using Linux, and I have created a shell (.sh) script with the needed parameters for running the conversion. Here is an example of what my shell script looks like:
GLOG_logtostderr=1 /usr/local/caffe-master2/build/tools/convert_imageset -resize_height=256 -resize_width=256 images_folder data_split/train.txt data_split/dataCNN_train_lmdb
When I simply run my script from the terminal like this:
./example_shell.sh
it works without any problem.
But when I try to do it from Matlab using the system() function:
system('./example_shell.sh')
it seems it is not able to open/find my files, raising the following error for each image in train.txt:
I0917 18:15:13.637830 8605 convert_imageset.cpp:82] A total of 68175 images.
I0917 18:15:13.638947 8605 db.cpp:34] Opened lmdb data_split/dataCNN_train_lmdb
E0917 18:15:13.639143 8605 io.cpp:77] Could not open or find file ...
E0917 18:15:13.639143 8605 io.cpp:77] Could not open or find file ...
E0917 18:15:13.639143 8605 io.cpp:77] Could not open or find file ...
Here are some sample lines from the train.txt file (do not mind the 0s, they are just dummy labels):
/media/user/HDD_2TB/Food_101_Dataset/images/beef_carpaccio/970563.jpg 0
/media/user/HDD_2TB/Food_101_Dataset/images/chocolate_mousse/1908117.jpg 0
/media/user/HDD_2TB/Food_101_Dataset/images/cup_cakes/632892.jpg 0
/media/user/HDD_2TB/Food_101_Dataset/images/garlic_bread/1498092.jpg 0
/media/user/HDD_2TB/Food_101_Dataset/images/ceviche/3115634.jpg 0
They are absolute paths, so there should be no problem.
Any idea about what could be happening would be very helpful!
Thank you,
Marc
I have not been able to solve the specific problem within Matlab, but I have managed the following (admittedly weird) workaround, using .txt files for communication (a rough sketch of the handshake follows the list):
Call the main Matlab program from Python.
Python checks whenever Matlab needs to call the ./example_shell.sh script.
Python does the conversion by calling ./example_shell.sh.
Matlab execution continues.
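A rough shell sketch of that file-based handshake; the answer used Python, and the trigger/done file names here are hypothetical, not from the original post:
#!/bin/bash
# Watch for a trigger file written by Matlab, run the conversion script,
# then write a "done" file that Matlab polls for before continuing.
while true; do
  if [ -f run_conversion.txt ]; then
    rm -f run_conversion.txt
    ./example_shell.sh
    touch conversion_done.txt
  fi
  sleep 5
done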

Jobs being killed on Condor Environment

I am running an executable in Condor that basically processes an input image and saves a binary image in a given folder. I run this code on 213 images.
My Condor configuration file contents are as follows:
universe = vanilla
executable = /datasets/me/output_cpen_database/source_codes_techniques/test/vole
arguments = cmfd -I /datasets/me/cpen_database/scale1/$(Process)/$(Process).png -O /datasets/me/output_cpen_database/scale1/dct/$(Process)/ --numThreads 10 --chan GRAY --featvec DCT --blockSize 16 --minDistEuclidian 50 --kdsort --fastsats --minSameShift 1000 --markRegions --useOrig --writePost --writeMatrix
initialdir = /datasets/me/output_cpen_database/source_codes_techniques/test
requirements = (OpSysAndVer == "Ubuntu12")
request_cpus = 5
request_memory = 20000
output = logs/output-$(Process).log
error = logs/error-$(Process).log
log = logs/log-$(Process).log
Notification = Complete
Notify_User = mymail@gmail.com
Queue 214
Some images are processed OK, but in some cases I receive the following error in my mailbox:
Condor job 1273.47
/datasets/me/output_cpen_database/source_codes_techniques/test/vole cmfd -I /datasets/me/cpen_database/scale1/47/47.png -O /datasets/me/output_cpen_database/scale1/dct/47/ --numThreads 10 --chan GRAY --featvec DCT --blockSize 16 --minDistEuclidian 50 --kdsort --fastsats --minSameShift 1000 --markRegions --useOrig --writePost --writeMatrix
died on signal 9 (Killed)
I was wondering if this happens because of a lack of memory, but this image (number 47) is not larger than 20 MB (it is actually 16.7 MB).
As I said before, Condor runs this executable fine for some other images.
Should I increase the request_memory in my configuration file? What is happening here?
Usually, a job dying on signal 9 means problems with some of the shared libraries required by your executable. What I would check is whether or not all jobs die on a particular host. If that's the case, you could run the code manually and see if you get a missing shared library error.
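For instance, a quick check on the suspect execute host (the vole path is the one from the submit file; any "not found" line points to a missing shared library):
# List shared-library dependencies of the executable and show only
# the ones the dynamic loader cannot resolve on this host.
ldd /datasets/me/output_cpen_database/source_codes_techniques/test/vole | grep "not found"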

Automating scripts, Rscripts

I'm having a lot of trouble automating my .R files, and I'm having trouble understanding the information about it. But here goes:
I'm using Windows 7 and simply want to automatically run an R script every morning at 08:00. The .R file produces its output by itself, so I don't want a separate output file. I've created a .bat file like this:
"C:\R\R-3.0.1\bin\x64\Rscript.exe" "C:\R\R-3.0.1\bin\x64\Scripts\24AR_v1bat.R"
Echo %DATE% %TIME% %ERRORLEVEL% >> C:\R\R-3.0.1\bin\x64\scripts\24AR_v1.txt
When I run this manually, it works perfectly, both with and without the:
--default-packages=list
When I run it through the cmd window, it works perfectly. Yet when I try to run it through the Task Scheduler it runs, but does not work (I get either a 1 or a 2 error code in my error-message file).
I've looked at R Introduction - Invoking R from the command line, and help(Rscript), but I still can't manage to get it to work.
NEW EDIT: I found that removing the MS SQL call lets my code run from the scheduler. Not sure if I should make this a new question?
EDIT: Adding the R-script
# 24 Hour AR-model, v1 ----------------------------------------------------
#Remove all variables from the workspace
#rm(list=ls())
# Loading Packages
library(forecast)
#Get spot-prices System from 2012-01-01 to today
source("/location/Scripts/SQL_hourlyprices.R")
sys <- data.frame()
sys <- spot
rm(spot)
# Ordering the data, first making a matrix with names: SYS
colnames(sys) <- c("date","hour","day","spot")
hour <-factor(sys[,2])
day <-factor(sys[,3])
dt<-sys[,1]
dt<-as.Date(dt)
x<-sys[,4]
q <-ts(x, frequency=24)
x0<- q[hour==0]
x1<- q[hour==1]
x0 <-ts(x0, frequency=7)
x1 <-ts(x1, frequency=7)
# ARIMA MODELS
y0<-Arima(x0,order=c(2,1,0))
y1<-Arima(x1,order=c(2,1,1))
fr0 <- forecast.Arima(y0,h=1)
fr1 <- forecast.Arima(y1,h=1)
h1<-as.numeric(fr0$mean)
h2<-as.numeric(fr1$mean)
day1 <-Sys.Date()+1
atable<-data.frame()
runtime<-Sys.time()
atable<-cbind(runtime,day1,h1,h2)
options(digits=4)
write.table(atable, file="//location/24ar_v1.csv",
append=TRUE,quote=FALSE, sep=",", row.names=F, col.names=F)
But as I said, I can manually run the code with the batch file and have it work perfectly, yet with the scheduler it won't work.
After hours of trying everything, it seems the problem was that I had:
source("/location/Scripts/SQL_hourlyprices.R")
where I simply had a SQL call inside:
sqlQuery(dbdata2, "SELECT CONVERT(char(10), [lokaldatotid],126) AS date,
DATEPART(HOUR,lokaldatotid) as hour,
DATENAME(DW,lokaldatotid) as dag,
pris as spot
FROM [SpotPriser] vp1
WHERE (vp1.boers_id=0)
AND (vp1.omraade_id=0)
AND lokaldatotid >='2012-01-01'
GROUP BY lokaldatotid, pris
ORDER BY lokaldatotid, hour desc") -> spot
When I moved this directly into the script, and deleted the source-line, the scripts would run with the scheduler.
I have no idea why....

ab (Apache Bench) error: apr_poll: The timeout specified has expired (70007) on Windows

I'm load testing IIS 7.5 (WinR2/SP1) from my Windows 7/SP1 client. I have a script that makes three ab calls like:
start /B cmd /c ab.exe -k -n 500 -c 50 http://rhvwr2vsu410/HelloWebAPI/Home/SyncProducts > SyncProducts.txt
When the concurrency is > 5, I soon get the error message
apr_poll: The timeout specified has expired (70007)
And ab stops making requests. I don't even get to "Completed 100 requests".
This happens within 30 seconds of starting my script. The ab documentation page doesn't provide much. There is a related Stack Overflow question and a related Server Fault question.
You must have the 2.4 version of ab and use the -s timeout option.
Edit:
https://www.wampserver.com/ - includes Apache 2.4.x Win32 and Win64.
Deprecated, and by now no longer available:
You can use my win32-x86 binary (compiled under Visual Studio 2008 from trunk 8 Feb 2013):
http://mars.iti.pk.edu.pl/~nkg/ab-standalone.exe (no longer available)
http://mars.iti.pk.edu.pl/~nkg/ab-standalone-src.zip (no longer available)
I made it using http://code.google.com/p/apachebench-standalone/wiki/HowToBuild and
http://ftp.ps.pl/pub/apache//apr/binaries/win32/apr-1.3.6-iconv-1.2.1-util-1.3.8-win32-x86-msvcrt60.zip (no longer available).
ab --help
-s timeout Seconds to max. wait for each response
Default is 30 seconds
Add the option -s 120 to your ab command, where 120 is the new timeout in seconds. If that is not enough, set it even higher...
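For example, the call from the question with only the timeout option added:
start /B cmd /c ab.exe -k -s 120 -n 500 -c 50 http://rhvwr2vsu410/HelloWebAPI/Home/SyncProducts > SyncProducts.txt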
ab --help
-s timeout Seconds to max. wait for each response
Default is 30 seconds
-k Use HTTP KeepAlive feature
It works for me
Sounds like an ab bug.
I had a similar problem on OS X (now that you mention it happens on Windows, I feel more confident that ab is the culprit). I went around profiling and tracing my web application, but couldn't find anything. I then tested static pages off of nginx, and it still gave me the error. So I then went and found a replacement... jMeter. Works great, but I would still like to know what the ab problem is.
