I'm trying to develop my own flow manager, and even though I'm not fully familiar with Ansible, it looks like it can do the job.
I'd like to evaluate part of the concept with you and understand whether it is doable in Ansible or not. So rather than asking for a solution, I'm asking for suggestions about the architecture.
Here are the requirements:
The flow executes on one machine.
The flow should be divided into an arbitrary number of steps (depending on project requirements) that can be executed sequentially or in parallel, e.g.:
- step_0
- step_1
- step_2
  step_3
- step_4
  step_5
Here step_0 should be executed first, and once it is done step_1 should be launched. Once step_1 is done, steps 2 and 3 should start in parallel, and when both of them are done, steps 4 and 5 should be run, again in parallel.
Every step should be a logical wrapper around an arbitrary number of commands. E.g. step_0 can execute a script that creates a directory skeleton, followed by commands for setting environment variables, followed by commands for linking. Then step_1 starts a new logical unit, etc.
For every step I would like to have common generic callbacks before and after step execution. Callback requirements (again, e.g. for step_0):
pre_exe callback:
- create flag files:
  step_0.START
  step_0.RUNNING
- create log file step_0.log and redirect the output of step_0 to step_0.log
post_exe callback:
- delete step_0.RUNNING
- create flag file step_0.DONE
- grep step_0.log for failing_signature (one or more strings - fail, error, etc.)
- grep step_0.log for passing_signature (a few strings - pass, script_finished_successfully, etc.)
- based on the grep results, create flag file step_0.PASS (in case of !FSIG & PSIG) or step_0.FAIL (in any other case)
- if step_0.FAIL is created, terminate flow execution
Generally it would be good to have PSIG and FSIG configurable at the step level, but I can imagine it with hard-coded strings for all steps (the per-step logic is sketched below).
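Purely as an illustration and not Ansible, here is a rough Python sketch of that per-step pre/post logic (step names, commands, and signature strings are made up):

import pathlib, re, subprocess, sys

def run_step(name, commands, psig=("pass",), fsig=("fail", "error")):
    # pre_exe: create flag files and the log file
    pathlib.Path(f"{name}.START").touch()
    pathlib.Path(f"{name}.RUNNING").touch()
    log = pathlib.Path(f"{name}.log")
    with log.open("w") as out:
        for cmd in commands:
            subprocess.run(cmd, shell=True, stdout=out, stderr=out)
    # post_exe: update flags, grep the log for signatures, decide PASS/FAIL
    pathlib.Path(f"{name}.RUNNING").unlink(missing_ok=True)
    pathlib.Path(f"{name}.DONE").touch()
    text = log.read_text()
    failed = any(re.search(s, text) for s in fsig)
    passed = any(re.search(s, text) for s in psig)
    if passed and not failed:
        pathlib.Path(f"{name}.PASS").touch()
    else:
        pathlib.Path(f"{name}.FAIL").touch()
        sys.exit(f"{name} failed, terminating flow")

The open architectural question is which Ansible constructs would best map onto this per-step wrapper and onto the parallel groups.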
I would be happy if somebody could confirm whether this is doable in Ansible or not, and if it is, suggest a high-level architecture so that I can focus my attention.
I'm new to Snakemake (started trying it out in the last week or so), hoping it will take care of more of the small details of workflows for me; previously I have coded up my own specific workflows in Python.
I put together a small workflow which, among its steps, takes Illumina PE reads and runs Kraken against them. I then parse the Kraken output to detect the most common species (within a set of allowable species) if a species value wasn't provided (running with snakemake -s test.snake --config R1_reads= R2_reads= species='').
I have 2 questions.
What is the recommended approach given the dynamic output/input?
Currently my strategy for this is to create a temp file which contains the detected species and then cat {input.species} it into other shell commands. This doesn't seem elegant, but looking through the docs I couldn't quite find an adequate alternative. I noticed PersistentDicts would let me pass variables between run: commands, but I'm unsure if I can use that to load variables into a shell: section. I also noticed that wrappers could allow me to handle it, however from the point I need that variable on I'd be wrapping the remainder of my workflow.
Is Snakemake the right tool if I want to use the species afterwards to run a set of scripts specific to the species (with multiple species-specific workflows)?
Right now my impression of how to solve this problem is to have multiple workflow files for the species, and a run: block with a switch which calls the associated species workflow depending on the species.
Appreciate any insight on these questions.
-Kim
You can mark output as dynamic (e.g. expecting one file per species). Then, Snakemake will determine the downstream DAG of jobs after those files have been generated. See http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#dynamic-files
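A minimal sketch of that dynamic-files pattern (rule names, the parser script, and file names are made up for illustration):

rule detect_species:
    input:
        "kraken/report.txt"
    output:
        dynamic("species/{species}.txt")          # one file per detected species
    shell:
        "parse_kraken_report.py {input} species/"   # hypothetical parser script

rule aggregate:
    input:
        dynamic("species/{species}.txt")          # downstream jobs are determined once these files exist
    output:
        "summary.txt"
    shell:
        "cat {input} > {output}"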
I have a Java program that will process 800 images.
I decided to use Condor as a platform for distributed computing, with the aim of dividing those images across the available nodes -> having them processed -> combining the results back.
Say I have 4 nodes. I want to divide the processing so that each node handles 200 images, and then combine the end results.
I have tried executing it normally by submitting it as a Java job and setting requirements = Machine == .. (listing all the nodes), but it doesn't seem to work.
How can I divide the processing and execute it in parallel?
HTCondor can definitely help you but you might need to do a little bit of work yourself :-)
There are two possible approaches that come to mind: job arrays and DAG applications.
Job arrays: as you can see from example 5 on the HTCondor Quick Start Guide, you can use the queue command to submit more than 1 job. For instance, queue 800 at the bottom of your job file would submit 800 jobs to your HTCondor pool.
What people do in this case is organize the data to process using a filename convention and exploit that convention in the job file. For instance you could rename your images as img_0.jpg, img_1.jpg, ... img_799.jpg (possibly using symlinks rather than renaming the actual files) and then use a job file along these lines:
Executable = /path/to/my/script
Arguments = /path/to/data/dir/img_$(Process)
Queue 800
When the 800 jobs run, $(Process) gets automatically assigned the value of the corresponding process ID (i.e. an integer going from 0 to 799), which means that your code will pick up the correct image to process.
DAG: Another approach is to organize your processing in a simple DAG. In this case you could have a pre-processing script (SCRIPT PRE entry in your DAG file) organizing your input data (possibly creating symlinks named appropriately). The real job would be just like the example above.
I like Lua scripting for Redis, but I have a big problem with TIME.
I store events in a sorted set.
The score is the time, so that in my application I can view all events in a given time window.
redis.call('zadd', myEventsSet, TIME, EventID);
OK, but this is not working - I cannot access TIME (the server time).
Is there any way to get the time from the server without passing it as an argument to my Lua script? Or is passing the time as an argument the best way to do it?
This is explicitly forbidden (as far as I remember). The reasoning behind this is that your Lua functions must be deterministic and depend only on their arguments. What if this Lua call gets replicated to a slave with a different system time?
Edit (by Linus G Thiel): This is correct. From the Redis EVAL docs:
Scripts as pure functions
A very important part of scripting is writing scripts that are pure functions. Scripts executed in a Redis instance are replicated on slaves by sending the script -- not the resulting commands.
[...]
In order to enforce this behavior in scripts Redis does the following:
Lua does not export commands to access the system time or other external state.
Redis will block the script with an error if a script calls a Redis command able to alter the data set after a Redis random command like RANDOMKEY, SRANDMEMBER, TIME. This means that if a script is read-only and does not modify the data set it is free to call those commands. Note that a random command does not necessarily mean a command that uses random numbers: any non-deterministic command is considered a random command (the best example in this regard is the TIME command).
There is a wealth of information on why this is, how to deal with this in different scenarios, and what Lua libraries are available to scripts. I recommend you read the whole documentation!
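Given that, the second option from the question (passing the time in as an argument) is the usual approach: the script stays a pure function of its KEYS and ARGV. A minimal sketch, shown with Python/redis-py purely for illustration (the client language doesn't matter):

import time
import redis   # redis-py client, used here just as an example

r = redis.Redis()

# The script only touches its KEYS/ARGV, so it remains deterministic.
ADD_EVENT = "return redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])"

def add_event(event_id):
    now = time.time()            # timestamp taken on the client side
    return r.eval(ADD_EVENT, 1, "myEventsSet", now, event_id)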
This is my problem: I've got a batch script that I can't modify (let's call it foo), and I would like to count how many times per day this script is executed, to keep track of that data.
Preferably, I would like to write the number of executions, with the date and exit code, to some kind of log file.
So my question is whether this is possible, and if so, how - that is, how to create a batch script (or something else) that works in the background and writes every execution of foo to a log.
(I know this would be easy if I could modify foo but I can't. Also, everything is running on WinXP machines.)
You could write a wrapper script that does the logging and calls the existing script. Then use the wrapper in place of the original script.
Consider writing a program that interrogates the Task Manager.
See http://www.netomatix.com/ProcDiagnostics.aspx
You could, for example, write a simple console app which runs on a timer; every 5 seconds it checks whether your foo application process exists. If it finds that it does, it treats that as the start time of the application; if it doesn't find it, it assumes the application has now closed and logs that information. It wouldn't be accurate to the second by any means, but it would give you a rough approximation of when the thing is running and closing.
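A rough sketch of that polling idea (written in Python purely for illustration, since any language with access to the process list would do; the process name and log path are hypothetical):

import datetime, subprocess, time

TARGET = "foo.exe"              # hypothetical name of the process to watch
LOG = "foo_usage.log"

was_running = False
while True:
    # 'tasklist' lists the running processes on Windows
    processes = subprocess.check_output("tasklist", shell=True, text=True)
    running = TARGET.lower() in processes.lower()
    if running != was_running:
        with open(LOG, "a") as log:
            state = "started" if running else "stopped"
            log.write(f"{datetime.datetime.now()} {TARGET} {state}\n")
        was_running = running
    time.sleep(5)

As noted, this only approximates start/stop times to within the polling interval, and unlike the wrapper approach it cannot capture the exit code.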
You might be able to configure Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to capture the information you require.
I'm attempting to automate a really old DOS application. I've decided the best way to do this is via input redirection. The legacy app (menu-driven) has many tasks within tasks, with branching logic. In order to easily understand and reuse the input for these tasks, I'd like to break them into bite-size pieces. Since I'll need to start a fresh app on each run, repeating a context to consume a piece might be messy.
I'd like to create an object model that:
allows me to concentrate on the task at hand
allows me to reuse common tasks from different start points
prevents me from calling a task from the wrong start point
To be more explicit, given I have the following task hierarchy:
START
  A
    A1
      A1a
      A1b
    A2
      A2a
  B
    B1
      B1a
I'd like an object model that lets me generate an input file for task "A1b" by using building blocks like:
START -> do_A, do_A1, do_A1b
but prevents me from:
START -> do_A1 // because I'm assuming a different call chain from above
This will help me write "do_A1b" because I can always assume the same starting context, and it will simplify writing "do_A1a" because it has THE SAME starting context. What patterns will help me out here? I'm using Ruby at the moment, so if dynamic language features can help, I'm game.
EDIT: after re-reading your question, I realized I misunderstood it. Let me answer what you actually asked...
I would create a hierarchy of classes. The simplest ones would have functions like "do task A1b" that would output the appropriate steps to accomplish this. On top of that, I would build functions that would call the sub-tasks in specific orders to accomplish specific goals.
Pretending Vim was the program being controlled, the first-level tasks would be things like 'Enter insert mode', 'Enter command mode', 'write the file', or 'input this arbitrary set of inputs'. On top of this I would build functions like 'insert "foobar" into the open file at the start of line 5', which would call the lower-level tasks.
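A rough sketch of that layering, with a guard against calling a task from the wrong starting point (Python used here just to show the shape; it maps directly onto Ruby classes, and all task names and keystrokes are hypothetical):

class InputScript:
    # Accumulates the lines to redirect into the legacy app and tracks
    # the current menu context, so tasks can refuse to run from the
    # wrong starting point.
    def __init__(self):
        self.lines = []
        self.context = "START"

    def emit(self, *inputs):
        self.lines.extend(inputs)

    def require(self, expected):
        if self.context != expected:
            raise RuntimeError(f"need context {expected!r}, got {self.context!r}")

# Low-level tasks: each declares the context it needs and the one it leaves.
def do_A(s):
    s.require("START"); s.emit("A"); s.context = "A"

def do_A1(s):
    s.require("A"); s.emit("1"); s.context = "A1"

def do_A1b(s):
    s.require("A1"); s.emit("b"); s.context = "A1b"

# Higher-level building block composed from the low-level tasks.
def input_for_A1b():
    s = InputScript()
    for task in (do_A, do_A1, do_A1b):
        task(s)
    return "\n".join(s.lines)

Calling do_A1 on a fresh InputScript raises immediately, which gives you the "prevents me from calling a task from the wrong start point" requirement.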