Groovy script executable - shell

I'd like to make my Groovy script portable, so I put this on the first line:
#!/usr/bin/env groovy
The problem comes up when I run the script outside of its directory: it can't find its libraries. I come from the Python world, where all imports are resolved relative to the script's path. In Groovy it seems I have to specify -classpath, but I can't do that in the first #! line.
Any suggestions on how to resolve this?

If the libraries are stored in a Maven repository that is accessible wherever you want to run it, one solution would be to use Grape to grab the libraries.
This provides several benefits:
You don't have to worry about the classpath at all
You don't have to distribute the libraries — just make sure Groovy is available on the client
Libraries are downloaded just once, so even if you upgrade the application, you only have to redistribute the .groovy file.
A simple example:
#!/usr/bin/env groovy
@Grab(group='commons-io', module='commons-io', version='2.3')
import org.apache.commons.io.FileUtils
// ... use FileUtils like normal ...
There are a lot of libraries already available on mvnrepository.com.
Even if you have non-public libraries, it's relatively easy to put them into a local / private repository.
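If you do go the private-repository route, Grape can be pointed at it with @GrabResolver. A minimal sketch (the repository URL and the com.example coordinates are placeholders for your own artifacts):
#!/usr/bin/env groovy
// The resolver URL and the com.example coordinates below are placeholders
@GrabResolver(name='internal', root='https://repo.example.com/maven2/')
@Grab(group='com.example', module='my-lib', version='1.0')
import com.example.MyLib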

Related

Combining pkg-config with module environment

This question may not make much sense if my understanding of pkg-config and environment modules is somewhat off, but I'll ask anyway, as I could not find anything specific on this topic. There might be an entirely better solution available; if so, I am all ears!
A while back I started using modules to easily load my development environment as needed (i.e. using commands like module load foo etc.). More recently, I have adopted the meson build system for my projects. In meson, libraries are treated as dependencies, which are discovered using pkg-config in the background. So now I have two ways of discovering libraries and setting up their lib and include directories.
As an example, I have the following (simplified) module script for library foo (I am using lmod which is based on lua):
prepend_path("LD_LIBRARY_PATH", "/opt/foo/lib")
prepend_path("CPATH", "/opt/foo/include")
I could also have a pkg-config file (*.pc) doing something similar (that is, if my understanding of pkg-config is correct):
prefix=/opt/foo
exec_prefix=${prefix}
includedir=${prefix}/include
libdir=${exec_prefix}/lib
Name: foo
Description: The foo library
Version: 1.0
Cflags: -I${includedir}
Libs: -L${libdir} -lfoo
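(For reference, with that file on PKG_CONFIG_PATH, the query below is essentially what meson runs behind the scenes; the output shown is just what the file above expands to:)
$ pkg-config --cflags --libs foo
-I/opt/foo/include -L/opt/foo/lib -lfoo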
Now both seem to be doing pretty much the same thing (in terms of setting up my environment), but using modulefiles alone will not allow meson to find my dependencies, so I still have to use pkg-config. That basically means maintaining two files, either manually or dynamically, which sounds like a maintenance burden and not very clean. Equally, I could create the pkg-config file and add its location to PKG_CONFIG_PATH, i.e. something like
prepend_path("LD_LIBRARY_PATH", "/opt/foo/lib")
prepend_path("CPATH", "/opt/foo/include")
prepend_path("PKG_CONFIG_PATH", /path/to/*.pc/file)
but again this requires two files (pkg-config and module). I rather like the module environment and don't want to ditch it, so is there a better / cleaner way of doing things, where I just load a modulefile which lets pkg-config (and thus meson in turn) know about the dependency?
As of today, there is no bridge between environment modules and the pkg-config tool. The best approach I can think of that keeps the module system is a script that queries every available pkg-config file and creates the corresponding modulefile, run regularly to keep things in sync.
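A minimal sketch of such a generator, assuming Lmod-style Lua modulefiles and a modulefile directory of your own choosing (MODULEDIR below is a placeholder):
#!/usr/bin/env bash
# Sketch: generate one Lua modulefile per package known to pkg-config.
# MODULEDIR is a placeholder; point it at a directory on your MODULEPATH.
MODULEDIR="${MODULEDIR:-$HOME/modulefiles}"
pkg-config --list-all | while read -r name _; do
    version=$(pkg-config --modversion "$name")
    libdir=$(pkg-config --variable=libdir "$name")        # may be empty for some packages
    includedir=$(pkg-config --variable=includedir "$name")
    pcdir=$(pkg-config --variable=pcfiledir "$name")      # directory holding the .pc file
    mkdir -p "$MODULEDIR/$name"
    # Mirror what the hand-written modulefile did, plus PKG_CONFIG_PATH.
    cat > "$MODULEDIR/$name/$version.lua" <<EOF
prepend_path("LD_LIBRARY_PATH", "$libdir")
prepend_path("CPATH", "$includedir")
prepend_path("PKG_CONFIG_PATH", "$pcdir")
EOF
done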

Sharing common script functions with maven-rpm-plugin

I have two separate projects that are built into RPMs using the maven-rpm-plugin.
Both packages have a postinstall script, which contains some duplicated code.
I would like to move the duplicated code into a single 'functions' script that could be inherited by both packages. Is this possible?
Write the common functions as a standalone *.sh shell script that is installed on the system and then invoked (or sourced) from %post (or any other RPM scriptlet).
Remove the duplicated code by having both packages depend on whichever package you choose to install the common script (while avoiding dependency loops).
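A minimal sketch of what the %post scriptlet of each RPM could then look like (the package name, path and function below are hypothetical):
# %post scriptlet; the shared functions are assumed to be installed by a
# separate "acme-common-scripts" package that both RPMs depend on
. /usr/share/acme/functions.sh   # hypothetical install location of the common script
acme_configure_service "$1"      # hypothetical shared helper; $1 is the scriptlet's install count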

Deb file from sh script

I'm trying to establish whether it is possible to create a deb package for the following app:
http://openfoam.org/download/4-0-source/
It uses an Allmake shell script which contains various standard shell commands and wmake commands to compile the source. wmake appears to be specific to this application but does call make:
http://www.cfdsupport.com/OpenFOAM-Training-by-CFD-Support/node25.html
https://github.com/OpenFOAM/OpenFOAM-2.1.x/blob/master/wmake/wmake
Is it possible to call the shell script from within a debian/rules file? Or is there a better way of doing this, if it is indeed possible?
Any assistance is much appreciated.
Indeed, the general idea of the debian/rules file is to run whatever commands are required to configure and install the upstream package into a location suitable for the dpkg toolchain.
Modern debhelper-based debian/rules files are typically extremely terse, because most typical packages adhere to build conventions for which good, very simple canned helpers are available, but traditional, more complex and explicit rules files are well-documented in older Debian packaging documentation.
Basically, the debian/rules file is a Makefile; it should have a binary target with the commands to build the upstream package into the Debian package root.
https://www.debian.org/doc/manuals/maint-guide/dreq.en.html#rules is probably useful as a starting point. Unless your needs are really arcane, the dh defaults will mostly make sense, and dh lets you easily override the parts which don't, as sketched below.
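A minimal dh-style sketch of that idea, assuming the upstream build is driven by the Allmake script in the source root (the override target follows the standard debhelper naming; a real package will likely need more than this):
#!/usr/bin/make -f
# debian/rules: let dh drive the standard packaging steps...
%:
	dh $@

# ...but replace the default build step with the upstream build script.
override_dh_auto_build:
	./Allmake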

How to run ./configure from another ./configure

I was wondering, is it possible to run a configure script from another one? What I have is the situation where my own project uses autotools for configuration and make. So before any build, a configure script is run (as usual). But now I want to add another library to my project which uses the same build principle (it is necessary to run its configure script before building it). So instead of making my future users run two configure scripts, is there a way to automate this (but without using a shell script - bash, perl, etc.)?
Can this be done, and if so, how?
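For what it's worth, autoconf has a built-in mechanism for exactly this: AC_CONFIG_SUBDIRS, which makes the top-level configure run the sub-project's configure for you. A minimal sketch, assuming the bundled library lives in a subdirectory named extlib (a placeholder):
# Top-level configure.ac (sketch; 'myproject' and 'extlib' are placeholders)
AC_INIT([myproject], [1.0])
AM_INIT_AUTOMAKE([foreign])
# Run the bundled library's own configure from this one
AC_CONFIG_SUBDIRS([extlib])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
In the top-level Makefile.am you would then also list extlib in SUBDIRS so that make recurses into it.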

Easiest way to install Python dependencies on Spark executor nodes?

I understand that you can send individual files as dependencies with Python Spark programs. But what about full-fledged libraries (e.g. numpy)?
Does Spark have a way to use a provided package manager (e.g. pip) to install library dependencies? Or does this have to be done manually before Spark programs are executed?
If the answer is manual, then what are the "best practice" approaches for synchronizing libraries (installation path, version, etc.) over a large number of distributed nodes?
Having actually tried it, I think the link I posted as a comment doesn't do exactly what you want with dependencies. What you are quite reasonably asking for is a way to have Spark play nicely with setuptools and pip for installing dependencies. It blows my mind that this isn't supported better in Spark. The third-party dependency problem is largely solved in general-purpose Python, but under Spark, the assumption seems to be that you'll go back to manual dependency management.
I have been using an imperfect but functional pipeline based on virtualenv. The basic idea is
Create a virtualenv purely for your Spark nodes
Each time you run a Spark job, run a fresh pip install of all your own in-house Python libraries. If you have set these up with setuptools, this will install their dependencies
Zip up the site-packages dir of the virtualenv. This will include your library and its dependencies, which the worker nodes will need, but not the standard Python library, which they already have
Pass the single .zip file, containing your libraries and their dependencies as an argument to --py-files
Of course you would want to code up some helper scripts to manage this process. Here is a helper script adapted from one I have been using, which could doubtless be improved a lot:
#!/usr/bin/env bash
# Helper script to fulfil Spark's Python packaging requirements.
# Installs everything into a designated virtualenv, then zips up the virtualenv for use as the value
# supplied to the --py-files argument of `pyspark` or `spark-submit`
# First argument should be the top-level virtualenv
# Second argument is the zipfile which will be created, and
# which you can subsequently supply as the --py-files argument to
# spark-submit
# Subsequent arguments are all the private packages you wish to install
# If these are set up with setuptools, their dependencies will be installed
VENV=$1; shift
ZIPFILE=$1; shift
PACKAGES=$*
. "$VENV/bin/activate"
for pkg in $PACKAGES; do
    pip install --upgrade "$pkg"
done
TMPZIP="${TMPDIR:-/tmp}/$RANDOM.zip" # abs path; use a random number to avoid clashes with other processes
( cd "$VENV/lib/python2.7/site-packages" && zip -q -r "$TMPZIP" . )
mv "$TMPZIP" "$ZIPFILE"
I have a collection of other simple wrapper scripts that I run to submit my Spark jobs. I simply call this script first as part of that process and make sure that the second argument (the name of a zip file) is then passed as the --py-files argument when I run spark-submit (as documented in the comments). I always run these scripts, so I never end up accidentally running old code. Compared to the Spark overhead, the packaging overhead is minimal for my small-scale project.
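For illustration, a typical invocation might look like this (the script and package names are placeholders):
# Build the dependency zip, then hand it to spark-submit
./package_deps.sh /path/to/venv deps.zip my_inhouse_pkg_a my_inhouse_pkg_b
spark-submit --py-files deps.zip my_job.py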
There are loads of improvements that could be made: e.g. being smart about when to create a new zip file, or splitting it into two zip files, one containing often-changing private packages and one containing rarely-changing dependencies, which don't need to be rebuilt as often. You could be smarter about checking for file changes before rebuilding the zip. Checking the validity of the arguments would also be a good idea. However, for now this suffices for my purposes.
The solution I have come up with is not designed for large-scale dependencies like NumPy specifically (although it may work for them). Also, it won't work if you are building C-based extensions, and your driver node has a different architecture to your cluster nodes.
I have seen recommendations elsewhere to just run a Python distribution like Anaconda on all your nodes, since it already includes NumPy (and many other packages), and that might be the better way to get NumPy and other C-based extensions going. Regardless, we can't always expect Anaconda to have the PyPI package we want in the right version, and you may not control your Spark environment enough to put Anaconda on it, so I think this virtualenv-based approach is still helpful.
