Run local macro in RSUBMIT SAS/CONNECT - parallel-processing

This is similar to another question here, but using OPTIONS MSTORED SASMSTORE instead.
The issue is having macros stored in a library utils and calling them inside an RSUBMIT block. A minimal example would be:
LIBNAME utils 'path/to/utils';
OPTIONS MSTORED SASMSTORE=utils;
%MACRO foo() / STORE;
*sas code;
%MEND foo;
%MACRO run_foo_parallel() / STORE;
OPTIONS SASCMD="sas";
SIGNON task;
RSUBMIT task;
%foo();
ENDRSUBMIT;
WAITFOR _all_;
SIGNOFF _all_;
%MEND;
In my real problem I am parallelizing, running foo() on several datasets at a time, and I don't know how to tell SAS where to find %foo() (i.e. the correct utils folder) inside the RSUBMIT. Things I've considered:
%SYSLPUT, but that only works for macro variables.
Using RSUBMIT task inheritlib=(utils); and adding OPTIONS MSTORED SASMSTORE=utils; inside the RSUBMIT. But then it raises the error: A lock is not available for UTILS.SASMCR.CATALOG.
Any help would be valuable! The "easy trick" is to define foo() inside the RSUBMIT without the STORE option, but that does not seem like best practice.
Thank you all

You can't run a macro defined in one session in another session, so make the macros available to both sessions. You have used these two lines to make %foo() available in the local session:
LIBNAME utils 'path/to/utils';
OPTIONS MSTORED SASMSTORE=utils;
So run those lines in the remote session before trying to call the macro.
SIGNON task;
RSUBMIT task;
LIBNAME utils 'path/to/utils';
OPTIONS MSTORED SASMSTORE=utils;
%foo();
ENDRSUBMIT;
Make sure the path used in the remote session is valid wherever that session is running. Possibly you could upload the compiled macro catalog, similar to the file upload in the topic you linked, but that would require that the remote session is using the same version of SAS so that the catalog is readable.
Since you appear to be using this for parallel processing, and not to connect to a SAS server that is actually remote, you can set up the command you use to launch SAS so it establishes the macro search path automatically; launching the parallel sessions with !sascmd then yields the same setup.
So put the two setup lines in your AUTOEXEC (or config file, or -initstmt command line option, etc.) so they run before the code in your actual program. Then when you spin up the parallel sessions using:
signon task cmd='!sascmd';
they will run there as well. Now both sessions have the same set of compiled macros available.
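Putting it together, here is one way the driver macro might look. This is only a sketch: it assumes the spawned session can see the same 'path/to/utils', that the setup lines are not yet in the AUTOEXEC, and that asynchronous RSUBMIT (WAIT=NO) is wanted so WAITFOR has something to wait for. Note %NRSTR, which keeps the local session from resolving %foo() before the code is shipped to the remote session.
%MACRO run_foo_parallel() / STORE;
    SIGNON task SASCMD='!sascmd';
    RSUBMIT task WAIT=NO;               /* run asynchronously */
        /* point the spawned session at the same stored-macro catalog */
        LIBNAME utils 'path/to/utils';
        OPTIONS MSTORED SASMSTORE=utils;
        %NRSTR(%foo());                  /* resolved by the remote session, not the client */
    ENDRSUBMIT;
    WAITFOR _ALL_ task;
    SIGNOFF _ALL_;
%MEND run_foo_parallel;
If the setup lines already run from the AUTOEXEC of the spawned session, the LIBNAME/OPTIONS pair inside the RSUBMIT becomes redundant.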

Related

Limitations on file append when using in multi-processed environment

My process creates a log file and appends a new line at the end of the file by opening it in append mode ("a"), e.g.:
fopen("log.txt", "a");
The order of the writes is not critical, but I need to ensure that fopen always succeeds. My question is, can the call above be executed from multiple processes at the same time on Windows, Linux and macOS without any race-condition?
If not, what is the most common and easy way to ensure I can write to the log file? There is OS-level file locking, but a separate lock file (e.g. log.txt.lock) is also possible. Could anyone share some insights or resources that go into more detail?
If you do not use any synchronization between processes, it is very likely that at some point several processes will try to write to the file at once, and the best you can get is a mess of interleaved strings.
To synchronize work across several processes (with Python's multiprocessing module), use a Lock. It prevents several processes from doing the same work simultaneously.
It will look something like this:
import multiprocessing
# create lock in main process and "send" it to child processes.
lock = multiprocessing.Lock()
# ...
# in child Process
with lock:
    do_some_work()
If you need more detailed example, feel free to ask.
You can also check the example in the official docs.
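A minimal runnable sketch of that idea (the file name, message text, and worker count are made-up placeholders; the lock is created in the parent and passed explicitly to each child):
import multiprocessing

def append_line(lock, path, message):
    # Only one process at a time may hold the lock, so the lines do not interleave.
    with lock:
        with open(path, "a") as f:
            f.write(message + "\n")

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    workers = [multiprocessing.Process(target=append_line,
                                       args=(lock, "log.txt", "hello from worker %d" % i))
               for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
Note that a multiprocessing.Lock only coordinates processes started from the same parent program; fully independent processes would still need OS-level file locking or a separate lock file, as mentioned in the question.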

Parameterise Parm File Name In Informatica

I want to know how to (or whether I can) parameterize the parm file name in Informatica.
A little bit of background: I am building a standard mapping in Informatica, which business users can call directly after selecting in a GUI the standard filters they want to apply.
The parm file name will be given by the business user, and all the filters that he or she selected will be in the parm file. The file will be dropped into the parm folder on the Informatica server.
This is the simple scenario, when only one user is using it at a time.
I also want to find out what I should do when multiple users are working in the GUI, generating parm files and invoking the Informatica mapping. How do I get multiple instances of the same mapping running at the same time?
I hope I am making sense here....
Thanks!!!
You can achieve this by using concurrent execution of the workflow. Read up on it to understand how you can implement it.
Once you know how to implement it, have a backend script called by the GUI assign an instance name to each call. For each instance name you can have an individual parameter file (I believe there would be a finite set of combinations of variable values in your case). You can use the command below to call individual instances, either from your GUI or from any other backend code:
pmcmd %workflow_name% %informatica_folder_name%
-paramfile %paramfilepathandname% -rin %instance_name%
It might sound a bit confusing, but once you understand how concurrent workflows work, you can build on it based on the above input.
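As a rough, hedged sketch (the integration service, domain, folder, workflow, file path, and instance names below are all placeholders, and the credentials would normally come from your environment rather than being typed in clear text), a concurrent start with its own parameter file and run-instance name could look like:
pmcmd startworkflow -sv INT_SVC -d Domain_Dev -u infa_user -p infa_pwd -f BUSINESS_FOLDER -paramfile /infa/parm/user1_filters.parm -rin user1_run wf_standard_map
Each GUI-generated parm file gets its own -rin value, so several runs of the same workflow can be active at once, provided the workflow has been configured to allow concurrent execution.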
It will only be possible if you call Informatica from an external tool, not from the client tools. One way is described by @Utsav; the other is to use the Informatica Web Services Hub (WSH) to call a workflow: you can indicate the parameter file you want to be used with the workflow, as well as the desired instance name.
I think this guide to concurrent workflows may be what you are looking for:
https://kb.informatica.com/howto/6/Pages/17/301264.aspx

Perl script performance misbehaves while calling another script

I have a Perl script which calls another Perl script that performs normal database queries.
But sometimes, when the inner script is called very frequently, it does not insert the proper values into the database. While debugging I found that it was not able to fetch all the records from the database, which it must do before inserting the newly calculated values.
I use the system() function to call the child script. This waits until the child process ends, so how is it possible for it to misbehave when the child is called frequently? In the normal scenario the main script waits about 30 seconds, which results in proper execution of the child script.
Does anyone have any suggestions for debugging my code, or a solution for this kind of issue?
Running a separate Perl script for every DB insert is very inefficient.
Note that Perl first compiles the called script every time it is called, which creates significant overhead.
It is much better to use OOP and put the DB handling code into a separate class, which is compiled only once, at run time. Or you can use modules and put the DB code into functions; they will be compiled only once too. Look at the "use" statement.
For example, a simple program with a module:
main_file.pl
use strict;
use warnings;
use DB_code;

my $data = 'example record';   # placeholder data to insert
DB_code::insert($data);
DB_code.pm
package DB_code;
use strict;

sub insert {
    my $data = shift;
    print "Your data has been inserted!";
}

1;
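If the point is also to connect to the database only once, a hedged sketch of DB_code.pm using the DBI module could look like the following (the DSN, table, and column names are made up for illustration, and DBD::SQLite is assumed to be installed):
package DB_code;
use strict;
use warnings;
use DBI;

# Connect once when the module is loaded and reuse the handle for every insert.
my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '', { RaiseError => 1 });
my $sth = $dbh->prepare('INSERT INTO results (value) VALUES (?)');

sub insert {
    my $data = shift;
    $sth->execute($data);
}

1;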

Prevent overwriting modules in Julia parallelization

I've written a Julia module with various functions which I call to analyze data. Several of these functions are dependent on packages, which are included at the start of the file "NeuroTools.jl."
module NeuroTools
using MAT, PyPlot, PyCall;
function getHists(channels::Array{Int8,2}...
Many of the functions I have are useful to run in parallel, so I wrote a driver script to map functions to different threads using remotecall/fetch. To load the functions on each thread, I launch Julia with the -L option to load my module on each worker.
julia -p 16 -L NeuroTools.jl parallelize.jl
To bring the loaded functions into scope, the "parallelize.jl" script has the line
@everywhere using NeuroTools
My parallel function works and executes properly, but each worker thread spits out a bunch of warnings from the modules being overwritten.
WARNING: replacing module MAT
WARNING: Method definition read(Union{HDF5.HDF5Dataset, HDF5.HDF5Datatype, HDF5.HDF5Group}, Type{Bool}) in module MAT_HDF5...
(continues for many lines)
Is there a way to load the module differently or change the scope to prevent all these warnings? The documentation does not seem entirely clear on this issue.
Coincidentally, I was looking for the same thing this morning. If you want to completely suppress the output, this will work:
(rd, wr) = redirect_stdout()
To do that on a worker, you'd need to call
remotecall_fetch(worker_id, redirect_stdout)
If you want to be able to turn it back on later, you could do:
out = STDOUT
(a, b) = redirect_stdout()
# then to turn it back on, do:
redirect_stdout(out)
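To apply the same call on every worker, a sketch (using the same argument order as the answer above) might be:
# apply the redirection suggested above on each worker before the module loads
for pid in workers()
    remotecall_fetch(pid, redirect_stdout)
end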
This is fixed in the more recent releases, and @everywhere using ... is right if you really need the module in scope in all workers. This GitHub issue talks about the problem and has links to some of the other relevant discussions.
If you are still using an older version of Julia where this was the case, just write using NeuroTools in NeuroTools.jl after defining the module, instead of executing @everywhere using NeuroTools.
The Parallel Computing section of the Julia documentation for version 0.5 says,
using DummyModule causes the module to be loaded on all processes; however, the module is brought into scope only on the one executing the statement.
Executing @everywhere using NeuroTools used to tell each process to load the module on all processes, and the result was a pile of "replacing module" warnings.
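For illustration, a hedged sketch of what parallelize.jl might reduce to under the fixed behaviour (channel_data is a made-up placeholder for the collection of Int8 matrices that getHists expects):
# parallelize.jl -- launched with: julia -p 16 -L NeuroTools.jl parallelize.jl
@everywhere using NeuroTools            # bring the module into scope on every worker
results = pmap(getHists, channel_data)  # each worker applies getHists to one element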

redis: EVAL and the TIME

I like the Lua scripting for Redis, but I have a big problem with TIME.
I store events in a sorted set.
The score is the time, so that in my application I can view all events in a given time window.
redis.call('zadd', myEventsSet, TIME, EventID);
OK, but this is not working: I cannot access TIME (the server time).
Is there any way to get the time from the server without passing it as an argument to my Lua script? Or is passing the time as an argument the best way to do it?
This is explicitly forbidden (as far as I remember). The reasoning behind this is that your Lua functions must be deterministic and depend only on their arguments. What if this Lua call gets replicated to a slave with a different system time?
Edit (by Linus G Thiel): This is correct. From the redis EVAL docs:
Scripts as pure functions
A very important part of scripting is writing scripts that are pure functions. Scripts executed in a Redis instance are replicated on slaves by sending the script -- not the resulting commands.
[...]
In order to enforce this behavior in scripts Redis does the following:
Lua does not export commands to access the system time or other external state.
Redis will block the script with an error if a script calls a Redis command able to alter the data set after a Redis random command like RANDOMKEY, SRANDMEMBER, TIME. This means that if a script is read-only and does not modify the data set it is free to call those commands. Note that a random command does not necessarily mean a command that uses random numbers: any non-deterministic command is considered a random command (the best example in this regard is the TIME command).
There is a wealth of information on why this is, how to deal with this in different scenarios, and what Lua libraries are available to scripts. I recommend you read the whole documentation!
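As a sketch of the workaround the question already mentions (the key name, score, and event ID below are placeholders), the caller obtains the time itself and passes it in through ARGV, so the script stays deterministic:
-- add_event.lua: the score (time) and the member are supplied by the caller
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])
return 1
Called from redis-cli it might look like:
EVAL "redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2]) return 1" 1 myEventsSet 1510000000 event42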
