What is the `patchShebangs` command in Nix build expressions? - shell

Came across the patchShebangs command while looking at packages in the Nixpkgs repo, and saw it used in various phases of the standard environment's generic builder, but not sure what it is for or why it is needed in the first place.

In short: shell scripts used during a Nix build won't work out of the box because Nix clears the environment, and so the interpreter directive (shebang), on the first line of the script determining the program to use to evaluate the script body, will not find it. patchShebangs looks up the interpreter in the Nix store, and amends the script shebang.
0. Introduction
patchShebangs is indirectly mentioned in the Nixpkgs manual when describing the phases of the generic builder of the Nixpkgs standard environment, stating that the fixup phase at one point
rewrites the interpreter paths of shell scripts to paths found in PATH. E.g., /usr/bin/perl will be rewritten to /nix/store/some-perl/bin/perl found in PATH.
->
It is important to note that (paraphrasing #jonringer's comment), "the patchShebangs command is only available during the build if you source the $stdenv/setup setup hook" (more on that below) "provided by stdenv's (the Nixpkgs standard environment's) default builder (you get this by default when using stdenv.mkDerivation), which is why the starting point of almost all nix expressions is import <nixpkgs> {}, stdenv.mkDerivation, or something similar."
1. Where is patchShebangs defined
The file patch-shebangs.sh in the Nixpkgs repo (also documented at 6.7.4. patch-shebangs.sh) defines the patchShebangs function, which in turn is used to implement patchShebangsAuto, the setup hook that is registered to run during the fixup phase.
2. Why are shebang rewrites needed when building Nix packages?
According to the comment at the top of patch-shebangs.sh:
# This setup hook causes the fixup phase to rewrite all script
# interpreter file names (`#! /path') to paths found in $PATH. E.g.,
# /bin/sh will be rewritten to /nix/store/<hash>-some-bash/bin/sh.
# /usr/bin/env gets special treatment so that ".../bin/env python" is
# rewritten to /nix/store/<hash>/bin/python. Interpreters that are
# already in the store are left untouched.
# A script file must be marked as executable, otherwise it will not be
# considered.
IMPORTANT NOTE: The criterion above that the "script file must be marked as executable, otherwise it will not be considered" is an important one.
The line in a shell script starting with #! is called shebang (among others), and it is an interpreter directive to the executing shell as for what program to use to decipher the text below; the characters after #! has to consitute an absolute path that points to this executable. For example, #!/usr/bin/python3 will expect to find the python3 program there to carry out the commands in the body of the shell script written in the Python programming language.
Using shell scripts during package build phases becomes problematic though because
When Nix runs a builder, it initially completely clears the environment (except for the attributes declared in the derivation). For instance, the PATH variable is empty. This is done to prevent undeclared inputs from being used in the build process. If for example the PATH contained /usr/bin, then you might accidentally use /usr/bin/gcc.
->
The quote above is from the Nix manual but the builder, that is shown there as an example, uses $stdenv/setup - a shell script that sets up a pristine sandbox environment for the build process, unsetting most (all?) environment variables from the calling shell, and only including a small number of utilities. (This is done to make builds reproducible, as much as possible.)1
$stdenv/setup is usually called implicitly when using stdenv.mkDerivation with the generic builder (i.e., when the builder attribute is left undeclared) but one can write their own builders and invoke it explicitly during the build process.
TIP: This answer shows one way to find where a certain Nix function is defined (although it is not infallible).
As a corollary, the programs pointed to by the shebang directives won't be at those locations (or unavailable to reach from the sandbox), but they are actually around (or will be) in the Nix store so the paths will need to be re-pointed to their location in there.
NOTE: The generic builder populates PATH from inputs of the derivation so one must make sure that these are included as a dependency.
3. How to use
3.1 Implicitly
As mentioned above,patchShebangs is automatically invoked by the patchShebangsAuto setup hook during the fixup phase whenever a package is built - unless one opts out of this by setting the dontPatchShebangs variable (or the dontFixup variable for that matter) (see Variables controlling the fixup phase in the Nixpkgs manual).
Reminder to self: 6.4 Bash Conditional Expressions.
3.1.0 What scripts is patchShebangs used on when invoked automatically?
Usually on scripts installed by packages (for example to $out/bin).
Or the ones provided default by the Nixpkgs standard library? I presume that these have to be generic enough to run on different platforms so that (1) the template is built, and (2) scripts shebangs are patched in the end. (#jtojnar confirmed this conjecture, but this section needs references, hence the small case.)
3.1.1 How to use the variables controlling a build phase?
Pass it to mkDerivation like any other variable controlling the builder.
stdenv.mkDerivation {
#...
dontPatchShebangs = true;
#...
}
3.2 Explicitly
Historical note: Originally, patchShebangs was not externally callable, but it was later extracted to make its functionality re-usable in other build phases as well.
Again, from the comments in the implementation:
# Run patch shebangs on a directory or file.
# Can take multiple paths as arguments.
# patchShebangs [--build | --host] PATH...
# Flags:
# --build : Lookup commands available at build-time
# --host : Lookup commands available at runtime
# Example use cases,
# $ patchShebangs --host /nix/store/...-hello-1.0/bin
# $ patchShebangs --build configure
It needs to be run on scripts that are to be executed directly (shell scripts included) during build time. These may be
coming from the source of what is being packaged
written by one to be used as helpers during the build process2
Specific examples from around the web:
In Nix, how can I build a package that has a Python post-install script? (Unix & Linux Stackexchange)
hard-coded bin path and NixOS (Stackoverflow)
[QUESTION] Alias and symlinks in NixOS derivations (Reddit)
This systemd-specific issue on IRC
... and quoting #jtojnar:
That is exactly the use case for the explicit patchShebangs call. Meson build system expects to run src/shared/generate-syscall-list.py so it calls it. But that fails because /usr/bin/env does not exist in the build sandbox. And it only gets confusing because kernel/libc/something else reports that the script does not exist, even though it was the interpreter from the shebang which does not exist.
Footnotes
[1]: TODO: Find out more about how the sandbox(es) are built exactly and what are barred and what are allowed. Quoting #jtojnar to bring one example:
/usr/bin/env, which is not available in sandbox either. (NixOS only has that in user space for convenience but that does not carry over to Nix sandbox..
[2]: #jtojnar's comment: "Right, you will not need to use it explicitly for scripts that are only executed at run time, since those will be handled by the implicit call."
All links in this thread have (hopefully) been saved to the Internet Archive. (The soundtrack of the thread is this gem.)

Related

Is it possible to have separate bash completion functions for separate commands which happen to share the same name?

I have two separate scripts with the same filename, in different paths, for different projects:
/home/me/projects/alpha/bin/hithere and /home/me/projects/beta/bin/hithere.
Correspondingly, I have two separate bash completion scripts, because the proper completions differ for each of the scripts. In the completion scripts, the "complete" command is run for each completion specifying the full name of the script in question, i.e.
complete -F _alpha_hithere_completion /home/me/projects/alpha/bin/hithere
However, only the most-recently-run script seems to have an effect, regardless of which actual version of hithere is invoked: it seems that bash completion only cares about the filename of the command and disregards path information.
Is there any way to change this behavior so that I can have these two independent scripts with the same name, each with different completion functions?
Please note that I'm not interested in a solution which requires alpha to know about beta, or which would require a third component to know about either of them--that would defeat the purpose in my case.
The Bash manual describes the lookup process for completions:
If the command word is a full pathname, a compspec for the full pathname is searched for first. If no compspec is found for the full pathname, an attempt is made to find a compspec for the portion following the final slash. If those searches do not result in a compspec, any compspec defined with the -D option to complete is used as the default.
So the full path is used by complete, but only if you invoke the command via its full path. As for getting completions to work using just the short name, I think your only option (judging from the spec) is going to be some sort of dynamic hook that determines which completion function to invoke based on the $PWD - I don't see any evidence that Bash supports overloading a completion name like you're envisioning.
Yes, this is possible. But it's a bit tricky. I am using this for a future scripting concept I am developing: All scripts have the same name as they are build scripts, but still bash completion can do its job.
For this I use a two step process: First of all I place a main script in ~/.config/bash_completion.d. This script is designed to cover all scripts of the particular shared script name. I configured ~/.bashrc in order to load that bash completion file for these scripts.
The script will obtain the full file path of the particular script file I want to have bash completion for. From this path I generate an identifier. For this identifier there exists a file that provides actual bash completion data. So if bash completion is performed the bash completion function from the main bash completion script will check for that file and load it's content. Then it will continue with regular bash completion operation.
If you have two scripts with the same name you will have two different identifiers as those scripts share the same name but have different paths. Therefore two different configurations for bash completion can be used.
This concept works like a charm.
NOTE: I will update this answer soon providing some source code.

Coding a relative path to file in OS X [duplicate]

I have a Haskell script that runs via a shebang line making use of the runhaskell utility. E.g...
#! /usr/bin/env runhaskell
module Main where
main = do { ... }
Now, I'd like to be able to determine the directory in which that script resides from within the script, itself. So, if the script lives in /home/me/my-haskell-app/script.hs, I should be able to run it from anywhere, using a relative or absolute path, and it should know it's located in the /home/me/my-haskell-app/ directory.
I thought the functionality available in the System.Environment module might be able to help, but it fell a little short. getProgName did not seem to provide useful file-path information. I found that the environment variable _ (that's an underscore) would sometimes contain the path to the script, as it was invoked; however, as soon as the script is invoked via some other program or parent script, that environment variable seems to lose its value (and I am needing to invoke my Haskell script from another, parent application).
Also useful-to-know would be whether I can determine the directory in which a pre-compiled Haskell executable lives, using the same technique or otherwise.
As I understand it, this is historically tricky in *nix. There are libraries for some languages to provide this behavior, including FindBin for Haskell:
http://hackage.haskell.org/package/FindBin
I'm not sure what this will report with a script though. Probably the location of the binary that runhaskell compiled just prior to executing it.
Also, for compiled Haskell projects, the Cabal build system provides data-dir and data-files and the corresponding generated Paths_<yourproject>.hs for locating installed files for your project at runtime.
http://www.haskell.org/cabal/release/cabal-latest/doc/users-guide/authors.html#paths-module
There is a FindBin package which seems to suit your needs and it also works for compiled programs.
For compiled executables, In GHC 7.6 or later you can use System.Environment.getExecutablePath.
getExecutablePath :: IO FilePathSource
Returns the absolute pathname of the current executable.
Note that for scripts and interactive sessions, this is the path to the
interpreter (e.g. ghci.)
There is executable-path which worked with my runghc script. FindBin didn't work for me as it returned my current directory instead of the script dir.
I could not find a way to determine script path from Haskell (which is a real pity IMHO). However, as a workaround, you can wrap your Haskell script inside a shell script:
#!/bin/sh
SCRIPT_DIR=`dirname $0`
runhaskell <<EOF
main = putStrLn "My script is in \"$SCRIPT_DIR\""
EOF

Xeon Phi cannot execute binary file

I am trying to execute a binary file on a xeon phi coprocessor, and it is coming back with "bash: cannot execute binary file". So I am trying to find how to either view an error log or have it display what's happening when I tell it to execute that is causing it not work. I have already tried bash --verbose but it didn't display any additional information. Any ideas?
You don't specify where you compiled your executable nor where you tried to execute from.
To compile a program on the host system to be executed directly on the coprocessor, you must either:
if using one of the Intel compilers, add -mmic to the compiler
command line
if using gcc, use the cross-compilers provided with the MPSS
(/usr/linux-k1om-4.7) - note, however, that the gcc compiler does not
take advantage of vectorization on the coprocessor
If you want to compile directly on the coprocessor, you can install the necessary files from the additional rpm files provided for the coprocessor (found in mpss-/k1om) using the directions from the MPSS user's guide for installing additional rpm files.
To run a program on the coprocessor, if you have compiled it on the host, you must either:
copy your executable file and required libraries to the coprocessor
using scp before you ssh to the coprocessor yourself to execute the
code.
use the micnativeloadex command on the host - you can find a man page
for that on the host.
If you are writing a program using the offload model (part of the work is done using the host then some of the work is passed off to the coprocessor), you can compile on the host using the Intel compilers with no special options.
Note, however, that, regardless of what method you use, any libraries to be used with an executable for the coprocessor will need themselves to be built for the coprocessor. The default libraries exist but any libraries you add, you need to build a version for the coprocessor in addition to any version you make for the host system.
I highly recommend the articles you will find under https://software.intel.com/en-us/articles/programming-and-compiling-for-intel-many-integrated-core-architecture. These articles are written by people who develop and/or support the various programming tools for the coprocessor and should answer most of your questions.
Update: What's below does NOT answer the OP's question - it is one possible explanation for the cannot execute binary file error, but the fact that the error message is prefixed with bash: indicates that the binary is being invoked correctly (by bash), but is not compatible with the executing platform (compiled for a different architecture) - as #Barmar has already stated in a comment.
Thus, while the following contains some (hopefully still somewhat useful) general information, it does not address the OP's problem.
One possible reason for cannot execute binary file is to mistakenly pass a binary (executable) file -- rather than a shell script (text file containing shell code) -- as an operand (filename argument) to bash.
The following demonstrates the problem:
bash printf # fails with '/usr/bin/printf: /usr/bin/printf: cannot execute binary file'
Note how the mistakenly passed binary's path prefixes the error message twice; If the first prefix says bash: instead, the cause is most likely not a problem of incorrect invocation, but one of trying to a invoke an incompatible binary (compiled for a different architecture).
If you want bash to invoke a binary, you must use the -c option to pass it, which allows you to specify an entire command line; i.e., the binary plus arguments; e.g.:
bash -c '/usr/bin/printf "%s\n" "hello"' # -> 'hello'
If you pass a mere binary filename instead of a full path - e.g., -c 'program ...' - then a binary by that name must exist in one of the directories listed in the $PATH variable that bash sees, otherwise you'll get a command not found error.
If, by contrast, the binary is located in the current directory, you must prefix the filename with ./ for bash to find it; e.g. -c './program ...'

Where does Ruby memory config go and how can one check if it is set?

In REE, and MRI 1.9+, ruby's garbage collector can be tuned:
http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning
http://smartic.us/2010/10/27/tune-your-ruby-enterprise-edition-garbage-collection-settings-to-run-tests-faster/
http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/
But none of these articles say where to put this configuration. I imagine that if it's in the environment, ruby will pick it up when it starts -- however, there's no way to check this as far as I can tell. The settings don't show up in any runtime constants that I can find.
So, where do I put this configuration, and how can I double-check that it's being used?
These settings are environment variables, so you would just need to set them in the parent process of the ruby process itself. Many people recommend creating a simple shell script for this purpose, perhaps calling it /usr/local/bin/ruby-custom:
#!/bin/bash
export RUBY_HEAP_MIN_SLOTS=20000
export RUBY_HEAP_SLOTS_INCREMENT=20000
...etc...
exec "/path/to/ruby" "$#"
The first few lines set whichever custom variables you want, and the last line invokes ruby itself, passing it whatever arguments this script was initially given.
You will next need to mark this script as executable (chmod a+x /usr/local/bin/ruby-custom) and then configure Passenger to use it as the ruby executable, by adding this to your Apache .conf file:
PassengerRuby /usr/local/bin/ruby-custom

Is it possible to override hashbang/shebang path behavior

I have a bunch of scripts (which can't be modified) written on Windows. Windows allows relative paths in its #! commands. We are trying to run these scripts on Unix but Bash only seems to respect absolute paths in its #! directives. I've looked around but haven't been able to locate an option in Bash or a program designed to replace and interpreter name. Is it possible to override that functionality -- perhaps even by using a different shell?
Typically you can just specify the binary to execute the script, which will cause the #! to be ignored. So, if you have a Python script that looks like:
#!..\bin\python2.6
# code would be here.
On Unix/Linux you can just say:
prompt$ python2.6 <scriptfile>
And it'll execute using the command line binary. I view the hashbang line as one which asks the operating system to use the binary specified on the line, but you can override it by not executing the script as a normal executable.
Worst case you could write some wrapper scripts that would explicitly tell the interpreter to execute the code in the script file for all the platforms that you'd be using.

Resources