I'm currently trying to run some quick diagnostic tests on some parallel code. I'm submitting the code through a batch system to the cluster backend via .pbs scripts. I'm capturing diagnostic data from the executables, but I would like to plot it using gnuplot.
Is there any way to do this? I've ssh'd into the cluster front-end through an X11-enabled terminal, so I feel like I'm almost there. Perhaps something I could do by passing the -I flag to qsub?
I'm also aware that this may not be the best way to do this at all. Any suggestions would be appreciated (e.g. can a .pbs script run the plot command on the front-end?)
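To make this concrete, here is roughly the workflow I am imagining (an untested sketch; I am assuming from the man page that my site's qsub supports -I for interactive jobs and -X for X11 forwarding, which may vary by PBS flavour):

    # Log in to the front-end with X11 forwarding enabled.
    ssh -X me@cluster-frontend

    # Request an interactive job; -X (if supported) forwards X11 from
    # the compute node back through the front-end.
    qsub -I -X -l nodes=1:ppn=4,walltime=00:30:00

    # Inside the interactive session: run the diagnostics, then plot.
    ./diagnostics > diag.dat
    gnuplot -persist -e "plot 'diag.dat' using 1:2 with lines"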
I am writing e2e tests for a command-line app where I have to do file manipulation (such as cp, mv, rm, touch, and mkdir). The tests execute just fine in my local environment. The problem occurs when they are executed on the server across platforms, where the file manipulations interfere with each other. My questions are:
It seems wrong to have shell commands in test code to begin with; should I just code the commands programmatically?
If the answer to the above is yes, is there something that would work as a "temporary file system" that is visible only to the process, so that when the tests run on other platforms the files don't get messed up?
It seems like a mutex lock could work as well, but it would slow down the entire build.
Sorry, this is a general and a specific question at the same time. I doubt there will be a perfect answer, but I would love to hear some suggestions and opinions, as I am new to both Go and testing. I appreciate the help!
There is nothing wrong with using OS commands in your code; they would not be exposed otherwise. However, as you are seeing now, they can be incompatible with the target environment and subject to its restrictions.
One tool that can work as a layer over the file commands is Afero, which can even simulate an in-memory filesystem or S3 resources.
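If you do keep shelling out, another low-tech way to stop parallel runs from stepping on each other is to give every test its own scratch directory. A sketch in shell (the commands mirror the ones the question mentions):

    # Each test run gets a private sandbox; mktemp -d guarantees a
    # unique path, so concurrent runs cannot collide.
    sandbox=$(mktemp -d)
    trap 'rm -rf "$sandbox"' EXIT   # clean up even if the test fails

    cd "$sandbox"
    mkdir -p input output
    touch input/sample.txt
    cp input/sample.txt output/
    mv output/sample.txt output/renamed.txt
    rm output/renamed.txt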
I have spent time working on a bioinformatics project and produced numerous scripts, and now I would like to use them to build a bioinformatics tool that runs in the command-line terminal, with the customary manual and binary files. I would like to be able 1. to protect the code, 2. to make it fancy by not having to juggle multiple scripts, and 3. to share the code with anyone interested.
Since I don't really know where to start, I would like to ask for some orientation on the topic. I have been reading about script compilation and I think this could work, but I have scripts in three different languages, mainly Python and Bash, so I have not seen any tutorial for this specific case.
Any help, such as sharing resources (videos, manuals, software, etc.) or giving tips, is appreciated. I know this is a VERY open question, so open answers are also welcome.
You could use the Python argparse library to build a command-line application that accepts arguments and flags. With this method, you can provide flags for user input and run your different scripts, including the bash scripts, based on that input.
https://realpython.com/command-line-interfaces-python-argparse/
Similarly, you can do this in a bash script that presents the user with options and runs your other scripts based on the input.
https://www.redhat.com/sysadmin/arguments-options-bash-scripts
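A minimal sketch of that pattern using bash's built-in getopts (the script names here are placeholders for your own):

    #!/usr/bin/env bash
    # Single entry point that dispatches to the existing scripts.
    usage() { echo "usage: $0 -t {align|annotate} -i input_file" >&2; exit 1; }

    while getopts "t:i:" opt; do
      case "$opt" in
        t) task="$OPTARG" ;;
        i) input="$OPTARG" ;;
        *) usage ;;
      esac
    done
    [ -z "${task:-}" ] || [ -z "${input:-}" ] && usage

    case "$task" in
      align)    ./align.sh "$input" ;;          # placeholder bash script
      annotate) python3 annotate.py "$input" ;; # placeholder python script
      *)        usage ;;
    esac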
I'm not sure what you mean by "protect the code". If you mean hide the code: as far as I know, you cannot easily hide bash and Python code or turn them into binaries if you want to share the scripts.
I have a program written in C++11. On the current input it takes too long to run. Luckily, the data can be safely split into chunks for parallel processing, which makes it a good candidate for, say, a Map/Reduce service.
AWS EMR could be a possible solution. However, since my code uses many modern libraries, it's quite a pain to compile it on the instances that are assigned for Apache Hadoop clusters. For example, I want to use soci (not available at all), boost 1.58+ (1.53 is there), etc etc. I also need a modern C++ compiler.
Obviously, all libraries and compilers can be upgraded manually (and the process scripted), but this sounds like a lot of manual work. And what about the slave nodes: will they get all the libraries? Somehow I'm not sure. And the whole process of initializing the environment can now take a very long time, killing much of the performance advantage that distributing the jobs was supposed to bring in the first place.
On the other hand, I don't really need all the advanced functionality that Apache Hadoop provides. And I don't want to set up a personal permanent cluster with my own installation of Hadoop or similar, because I will need to run the tasks only periodically and most of the time the servers will be idle, wasting money.
So, what would be the best product (or overall strategy) that could do the following:
Grab the given binaries + set of input files
Run the binaries on a predefined number of instances, using a recent Linux, ideally Ubuntu 15.10
Put the resulting files in a predefined location (S3 bucket?)
Shut everything down
I am sure I could write a number of scripts using the aws tool to achieve that manually, but I really don't want to reinvent the wheel. Any thoughts?
Thanks in advance!
Honestly, that would be pretty easy to script, and you'll probably need to use scripting to grab the latest code on the servers when they start up anyway. I would suggest looking into defining an Auto Scaling group with scheduled scaling policies. Alternatively, you could have a Lambda function scheduled to run and issue the API calls to create your instances.
You could either have a startup script on the server AMI, or simply pass a user-data script when you create the instances, which pulls down the binaries and input files and runs the command. The final step of the script could be to copy the results to S3 and shut down the server.
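A sketch of such a user-data script (the bucket, keys, and binary name are made up; it assumes the instance has an IAM role permitting the S3 calls, and that instance-initiated shutdown behaviour is set to terminate if you want the instance to go away entirely):

    #!/bin/bash
    # Runs once at first boot when passed as EC2 user data.
    set -euo pipefail

    # Pull down the binary and its input chunk (placeholder paths).
    aws s3 cp s3://my-bucket/bin/worker /usr/local/bin/worker
    aws s3 cp s3://my-bucket/input/chunk-001.dat /tmp/input.dat
    chmod +x /usr/local/bin/worker

    # Do the work, then upload the result to the predefined location.
    /usr/local/bin/worker /tmp/input.dat /tmp/output.dat
    aws s3 cp /tmp/output.dat s3://my-bucket/output/chunk-001.out

    # Shut everything down.
    shutdown -h now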
The (relatively new) AWS Batch is made for this purpose specifically.
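For completeness, once a Batch job queue and job definition exist, kicking off one chunk from a script is a single CLI call along these lines (all names are placeholders):

    # Submit one chunk as a Batch job; the queue and job definition
    # must already be set up in the account.
    aws batch submit-job \
      --job-name chunk-001 \
      --job-queue my-queue \
      --job-definition my-cpp-worker \
      --container-overrides '{"command": ["/usr/local/bin/worker", "chunk-001"]}'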
Is it possible to write to a 3rd output stream? My situation is that I have a number of scripts that execute various commands remotely across a grid of machines. Those commands produce stdout and stderr. However, I would like to feed progress back to the central controlling machine without cluttering it with the interlaced stdout and stderr of the various machines in the grid. I was thinking that if it were possible to write to a 3rd output stream, I could use it for specific status events from the grid, which the controlling script can report on, while stdout and stderr remain redirected to log files for debugging should something go wrong.
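For example, I picture something along these lines in a shell (a sketch of the idea; progress_reader and the file names are placeholders):

    #!/bin/bash
    # worker.sh: stdout and stderr go to logs as usual; fd 3 carries
    # only the status events meant for the controller.
    echo "normal output"                  # fd 1 (stdout)
    echo "something went wrong" >&2       # fd 2 (stderr)
    echo "STATUS: 50% complete" >&3       # fd 3, the extra stream

    # The caller opens fd 3 when launching, e.g. into a status file:
    #   ./worker.sh >out.log 2>err.log 3>status.log
    # or streams it back live while still logging stdout/stderr
    # (the 3>&1 must come first so fd 3 inherits the pipe):
    #   ./worker.sh 3>&1 >out.log 2>err.log | progress_reader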
For what it is worth, I will probably be implementing this in Ruby, and the machines involved will be a mixture of Windows and Unix machines.
I don't think how you architect your logging is constrained by the language you're using, but log4r and syslog come to mind if you're set on Ruby. If you need a truly multi-platform solution, you might consider some kind of message bus such as ØMQ, although this will add an extra layer of complexity.
It sounds like common logfiles for info and errors that all your scripts write to might be the simplest solution. Seeing as you're managing lots of small processes rather than one big monolithic app, using a tool like Splunk might help to aggregate and analyse all the logged events.
I am trying to learn how Ruby is used in a server-based back-end environment. For example, I want to be running a Ruby script 24/7 on a server. What are the best practices for this, and how does one go about doing it?
Can anyone provide some resources on how to do this, or label what I am trying to do? I am unsure of the terms I am supposed to be googling.
Use cron. From the OS's point of view, a Ruby app is just a script, like bash.
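For example, to run the script every five minutes (paths are placeholders; the flock wrapper is optional and prevents overlapping runs):

    # Edit the crontab with: crontab -e
    # Then add a line like this; flock -n skips a run if the previous
    # one is still executing.
    */5 * * * * /usr/bin/flock -n /tmp/job.lock /usr/bin/ruby /home/me/job.rb >> /home/me/job.log 2>&1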
Also, all Unix OSes have some kind of daemon facility (see the examples in /etc/init.d).
Try BackgroundRb. It is a Rails plugin that works like a Linux daemon: you can use any classes/models defined in your Rails application within the background code, and you can also pass data to/from the background process.