Multiline shebang in OCaml? - shell

In short, I'd like to abstract this shebang so I can literally copy and paste it into other .ml files without having to specify the filename each time:
#!/usr/bin/env ocamlscript -o hello
print_endline "Hello World!"
I realize I could just drop the -o hello bit, but I'd like all the binaries to have UNIX names (hello), instead of Windows names (hello.ml.exe).
You need a complex shebang to do this. A Clojure example that has the desired behavior:
":";exec clj -m `basename $0 .clj` $0 ${1+"$#"}
":";exit
Clojure is Java-based, which is why clj needs the basename of the file (something, not something.clj). To get the basename, you need a multiline shebang, because a single-line shebang can only pass a single, simple, static command-line argument. To write a multiline shebang, you need a syntax which simultaneously:
Sends shell commands to the shell
Hides the shell commands from the main language
Does anyone know of OCaml trickery to do this? I've tried the following with no success:
(*
exec ocamlscript -o `basename $0 .ml` $0 ${1+"$@"}
exit
*)
let rec main = print_endline "Hello World!"

What you're looking for is a shell and Objective Caml polyglot (where the shell part invokes an ocaml interpreter to perform the real work). Here's a relatively simple one. Adapt to use ocamlscript if necessary, though I don't see the point.
#!/bin/sh
"true" = let exec _ _ _ = "-*-ocaml-*- vim:set syntax=ocaml: " in
exec "ocaml" "$0" "$#"
;;
(* OCaml code proper starts here *)
print_endline "hello"

After some trials, I found this shebang:
#!/bin/sh
"true" = let x' = "" in (*'
sh script here
*) x'
It is sort of an improvement on Gilles’ proposal, as it permits writing a full shell script inside the OCaml comment without being bothered at all by syntax incompatibilities.
The script must terminate (e.g. with exec or exit) without reaching the end of the comment; otherwise a syntax error will occur. This can be fixed easily, but it is not very useful given the intended use of such a trick.
Here is a variant that entails zero runtime overhead on the OCaml side, but declares a new type name (choose an arbitrarily complicated one if this bothers you):
#!/bin/sh
type int' (*' >&- 2>&-
sh script here
*)
For example, here is a script that executes the OCaml code with modules Str and Unix, and can also compile it when passed the parameter --compile:
#!/bin/sh
type int' (*' >&- 2>&-
if [ "$1" = "--compile" ]; then
name="${0%.ml}"
ocamlopt -pp 'sed "1s/^#\!.*//"' \
str.cmxa unix.cmxa "$name.ml" -o "$name" \
|| exit
rm "$name".{cm*,o}
exit
else
exec ocaml str.cma unix.cma "$0" "$#"
fi
*)
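Coming back to the original goal, the same polyglot trick can compute the basename for ocamlscript, so the block can be pasted verbatim into any .ml file. A sketch, assuming ocamlscript accepts -o as in the question and forwards the arguments after the script name (check its documentation for the exact argument-passing syntax):
#!/bin/sh
type int' (*' >&- 2>&-
# shell part: build a binary named after the script, then hand over to it
# the -- separator for script arguments is an assumption; see the ocamlscript docs
exec ocamlscript -o "$(basename "$0" .ml)" "$0" -- "$@"
*)
let () = print_endline "Hello World!"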

I do not think that ocamlscript supports this. It may be worth submitting a feature request to the author to allow customization of the compiled binary's extension without specifying the full output basename.

Related

Store a command in a variable; implement without `eval`

This is almost the exact same question as in this post, except that I do not want to use eval.
Question in short: I want to execute the command echo aaa | grep a by first storing it in a string variable Command='echo aaa | grep a', and then running it without using eval.
In the post above, the selected answer used eval. That works for me too. What concerns me is that there are plenty of warnings about eval below it, followed by some attempts to circumvent it. However, none of them solve my problem (essentially the OP's). I have commented on their attempts, but since that post has been there for a long time, I suppose it is better to post the question again with the restriction of not using eval.
Concrete Example
What I want is a shell script that runs my command when I am happy:
#!/bin/bash
# This script run-if.sh runs the commands when I am happy
# Warning: the following script does not work (of course)
if [ "$1" == "I-am-happy" ]; then
"$2"
fi
$ run-if.sh I-am-happy [insert-any-command]
Your sample usage can't ever work with an assignment, because assignments are scoped to the current process and its children. Because there's no reason to try to support assignments, things get suddenly far easier:
#!/bin/sh
if [ "$1" = "I-am-happy" ]; then
    shift; "$@"
fi
This then can later use all the usual techniques to run shell pipelines, such as:
run-if-happy "$happiness" \
sh -c 'echo "$1" | grep "$2"' _ "$untrustedStringOne" "$untrustedStringTwo"
Note that we're passing the execve() syscall an argv with six elements:
sh (the shell to run; change to bash etc if preferred)
-c (telling the shell that the following argument is the code for it to run)
echo "$1" | grep "$2" (the code for sh to parse)
_ (a constant which becomes $0)
...whatever the shell variable untrustedStringOne contains... (which becomes $1)
...whatever the shell variable untrustedStringTwo contains... (which becomes $2)
Note here that echo "$1" | grep "$2" is a constant string -- in single-quotes, with no parameter expansions or command substitutions -- and that untrusted values are passed into the slots that fill in $1 and $2, out-of-band from the code being evaluated; this is essential to have any kind of increase in security over what eval would give you.
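To see why this matters, here is a hypothetical demonstration (reusing the variable names from above): even a hostile value stays data, because the expansion of $1 happens inside sh after parsing, and expansion results are never re-parsed as code:
# an injection attempt: pasted into the code itself, this string would run rm
untrustedStringOne='$(rm -rf /)'
untrustedStringTwo='rm'
sh -c 'echo "$1" | grep "$2"' _ "$untrustedStringOne" "$untrustedStringTwo"
# prints: $(rm -rf /)    -- the command substitution is never executed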

How to iterate over double-quoted strings in POSIX shell?

I am trying to check that all the non-POSIX commands my script depends on are present before the script proceeds with its main job. This will help me ensure that it does not generate errors later due to missing commands.
I want to keep the list of all such non-POSIX commands in a variable called DEPS so that as the script evolves and depends on more commands, I can edit this variable.
I want the script to support commands with spaces in them, e.g. my program.
This is my script.
#!/bin/sh
DEPS='ssh scp "my program" sftp'
for i in $DEPS
do
echo "Checking $i ..."
if ! command -v "$i"
then
echo "Error: $i not found"
else
echo "Success: $i found"
fi
echo
done
However, this doesn't work, because "my program" is split into two words while the for loop iterates: "my and program" as you can see in the output below.
# sh foo.sh
Checking ssh ...
/usr/bin/ssh
Success: ssh found
Checking scp ...
/usr/bin/scp
Success: scp found
Checking "my ...
Error: "my not found
Checking program" ...
Error: program" not found
Checking sftp ...
/usr/bin/sftp
Success: sftp found
The output I expected is:
# sh foo.sh
Checking ssh ...
/usr/bin/ssh
Success: ssh found
Checking scp ...
/usr/bin/scp
Success: scp found
Checking my program ...
Error: my program not found
Checking sftp ...
/usr/bin/sftp
Success: sftp found
How can I solve this problem while keeping the script POSIX compliant?
I'll repeat the answer I gave to your previous question: use a while loop with a here document rather than a for loop. You can embed newlines in a string, which is all you need to separate command names in a string if those command names might contain whitespace. (If your command names contain newlines, strongly consider renaming them.)
For maximum POSIX compatibility, use printf, since the POSIX specification of echo is remarkably lax due to differences in how echo was implemented in various shells prior to the definition of the standard.
deps="ssh
scp
my program
sftp
"
while read -r cmd; do
printf "Checking $cmd ...\n"
if ! command -v "$cmd"; then
printf "Error: $i not found\n"
else
printf "Success: $cmd found\n"
fi
printf "\n"
done <<EOF
$deps
EOF
This happens because the steps after parameter expansion are string-splitting and glob-expansion -- not syntax-level parsing (such as handling quoting). To go all the way back to the beginning of the parsing process, you need to use eval.
Frankly, the best approaches are to either:
Target a shell that supports arrays (ksh, bash, zsh, etc.) rather than trying to support POSIX sh
Not try to retrieve the value from a variable at all.
...there's a reason proper array support is ubiquitous in modern shells; writing unambiguously correct code, particularly when handling untrusted data, is much harder without it.
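For illustration, a minimal sketch of the array-based approach (bash; command names taken from the question):
#!/bin/bash
deps=(ssh scp "my program" sftp)    # each array element keeps its spaces intact
for cmd in "${deps[@]}"; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "Success: $cmd found"
    else
        echo "Error: $cmd not found"
    fi
done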
That said, you have the option of using "$@" (the positional parameters) to store your contents, which can be set, albeit dangerously, using eval:
deps='goodbye "cruel world"'
eval "set -- $deps"
for program; do
    echo "processing $program"
done
If you do this inside of a function, you'll override only the function's argument list, leaving the global list unmodified.
Alternately, eval "yourfunction $deps" will have the same effect, setting the argument list within the function to the results of running all the usual parsing and expansion phases on the contents of $deps.
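A short sketch of that function-scoped variant (the function name check_deps is made up for the example):
check_deps() {
    # "$@" here is the function's own argument list, produced by the
    # parsing and expansion phases that eval re-ran on $deps
    for program; do
        printf 'processing %s\n' "$program"
    done
}
deps='goodbye "cruel world"'
eval "check_deps $deps"    # processes "goodbye", then "cruel world"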
Because the script is under your control, you can use eval with reasonable safety, so @Charles Duffy's answer is a simple and good solution. Use it. :)
Also, consider using autoconf to generate the usual configure script, which does a good job of what you need - e.g. checking for commands and much more... At the least, look at some configure scripts for ideas on how to solve common problems...
If you want to play with your own implementation:
divide the dependencies into two groups
core_deps - unix tools that are commonly needed for the script itself, like sed, cat, cp and such. These programs don't contain spaces in their names, nor are there spaces in $PATH.
runtime_deps - programs that are needed for your application, but not for the script itself.
do the checks in two steps (or more, for example if you also need to check for libraries)
never use a for loop for space-delimited elements unless you receive them as function arguments - then you can use "$@"
A starting script could be something like the following:
_check_core_deps() {
    for _cmd
    do
        _cpath=$(command -v "$_cmd")
        case "$_cpath" in
            /*) continue;;
            *) echo "Missing install dependency [$_cmd] - can't continue" ; exit 1 ;;
        esac
    done
    return 0
}
core_deps="grep sed hooloovoo cp"   # list of "core" commands - they don't contain spaces
_check_core_deps $core_deps || exit 1
The above will blow up on the non-existent "hooloovoo" command. :)
Now you can safely continue; all the core commands needed for the install script are available. In the next step, you can check the other, stranger dependencies.
Some ideas:
# function that returns your dependencies as lines from a HEREDOC
# (e.g. a line could contain any character except "\n")
# you can decorate the dependencies with comments...
# because we have sed (checked in the 1st step), we can use it
# if you want, you can add "fields" too, for some extended functionality with a specified delimiter
list_deps() {
    _sptab=$(printf " \t")   # the $' \t' form is approved by POSIX for the next version only
    # the "sed" removes comments and empty lines
    # the UUOC (useless use of cat) is intentional here -
    # for example, if you want to add "tr" before the "sed"
    # of course, you can remove it...
    cat - <<DEPS | sed "s/[$_sptab]*#.*//;/^[$_sptab]*$/d"
########## DEPENDENCIES ############
#some comment
ssh
scp
sftp
#comment
#bla bla
my program #some comment
/Applications/Some Long And Spaced OSX Application.app
DEPS
########## END of DEPENDENCIES #####
}
_check_deps() {
    # in the "while" loop you can use IFS=: or such, and add another variable to read
    # to get more fields for some extended functionality
    list_deps | while read -r line
    do
        # do any checks with the line
        # implement additional functionalities as functions
        # etc...
        # remember - you're in a subshell here
        printf "command:%s\n" "$line"
    done
}
_check_deps
One more thing :) (or two):
If you have doubts about the content of some variable, don't use echo. POSIX doesn't define how echo should behave when the argument contains escape sequences (e.g. echo "some\ntext"). Use:
printf '%s' "$variable"
Never use uppercase-only variable names like "DEPS"... by convention they're reserved for environment variables...

Quoting rules for script -c 'command'

I am looking for the quoting/splitting rules for a command passed to script -c command. The man page just says
-c, --command command: Run the command rather than an interactive shell.
but I want to make sure "command" is properly escaped.
The COMMAND argument is just a regular string that is processed by the shell as if it were an excerpt of a file. We may think of -c COMMAND as being functionally equivalent to
printf '%s' COMMAND > /tmp/command_to_execute.sh
sh /tmp/command_to_execute.sh
The form -c COMMAND is, however, superior to the version relying on an auxiliary file, because it avoids the race conditions involved in using one.
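For instance, with util-linux script(1), a captured pipeline might look like this (the log file path is arbitrary):
# the single-quoted COMMAND reaches the shell unmodified and is parsed there;
# the pipe is interpreted by that shell, not by script itself
script -c 'dmesg | tail -n 5' /tmp/session.log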
In the typical usage of the -c COMMAND option, we pass COMMAND as a single-quoted string, as in this pseudo-code example:
sh -c '
  do_some_complicated_tests "$1" "$2";
  if something; then
    proceed_this_way "$1" "$2";
  else
    proceed_that_way "$1" "$2";
  fi' ARGV0 ARGV1 ARGV2
If the COMMAND must contain single-quoted strings, we can rely on printf to build the COMMAND string, but this can be tedious. An example of this technique is illustrated
by the overcomplicated grep-like COMMAND defined here:
% AWKSCRIPT='$0 ~ expr {print($0)}'
% COMMAND=$(printf 'awk -v expr="$1" \047%s\047' "$AWKSCRIPT")
% sh -c "$COMMAND" print_matching 'tuning' < /usr/share/games/fortune/freebsd-tips
"man tuning" gives some tips how to tune performance of your FreeBSD system.
Recall that 047 is the octal representation of the ASCII code for the single-quote character.
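For what it's worth, the classic alternative to the printf/\047 route is the '\'' idiom: end the single-quoted string, append a backslash-escaped quote, and reopen the quoting. The same example rewritten that way (a sketch equivalent to the COMMAND built above):
% sh -c 'awk -v expr="$1" '\''$0 ~ expr {print($0)}'\''' print_matching 'tuning' < /usr/share/games/fortune/freebsd-tips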
As a side note, these constructions are quite common in Makefiles, where they can replace shell functions.

What does !# (reversed shebang) mean?

From this link: http://scala.epfl.ch/documentation/getting-started.html
#!/bin/sh
exec scala "$0" "$@"
!#
object HelloWorld extends App {
  println("Hello, world!")
}
HelloWorld.main(args)
I know that $0 is the script name and $@ is all the arguments passed to the execution, but what does !# mean? (Googling bash "!#" seems to show no results.)
Does it mean to exit the script, with stdin coming from the remaining lines?
This is part of scala itself, not bash. Note what's happening: the exec command replaces the process with scala, which then reads the file given as "$0", i.e., the bash script file itself. Scala ignores the part between #! and !# and interprets the rest of the text as the scala program. They chose the "reverse shebang" as an appropriate counterpart to the shebang.
To see what I mean about exec replacing the process, try this simple script:
#!/bin/sh
exec ls
echo hello
It will not print "hello" since the process will be replaced by the ls process when exec is executed.
Reference: http://www.scala-lang.org/files/archive/nightly/docs-2.10.2/manual/html/scala.html
As a side comment, consider a multiline script:
#!/bin/sh
SOURCE="$LIB1/app.jar:$LIB2/app2.jar"
exec scala -classpath "$SOURCE" -savecompiled "$0" "$@"
!#
Also note -savecompiled, which can speed up re-executions notably.

bash: function + source + declare = boom

Here is a problem:
In my bash scripts I want to source several files with some checks, so I have:
if [ -r foo ] ; then
source foo
else
logger -t $0 -p crit "unable to source foo"
exit 1
fi
if [ -r bar ] ; then
source bar
else
logger -t $0 -p crit "unable to source bar"
exit 1
fi
# ... etc ...
Naively, I tried to create a function that does this:
function safe_source() {
if [ -r $1 ] ; then
source $1
else
logger -t $0 -p crit "unable to source $1"
exit 1
fi
}
safe_source foo
safe_source bar
# ... etc ...
But there is a snag.
If one of the files foo, bar, etc. has a global such as --
declare GLOBAL_VAR=42
-- it will effectively become:
function safe_source() {
# ...
declare GLOBAL_VAR=42
# ...
}
thus a global variable becomes local.
The question:
An alias in bash seems too weak for this, so must I unroll the above function, and repeat myself, or is there a more elegant approach?
... and yes, I agree that Python, Perl, or Ruby would make my life easier, but when working with legacy systems, one doesn't always have the privilege of choosing the best tool.
This is a bit of a late answer, but declare now supports a -g option, which makes a variable global when used inside a function. The same works in a sourced file.
If you need a global variable, use:
declare -g DATA="Hello World, meow!"
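Applied to the question, safe_source can stay exactly as written, provided the sourced files declare their globals with -g (available since bash 4.2). A minimal sketch, assuming a sourced file named foo:
# contents of the sourced file "foo":
#     declare -g GLOBAL_VAR=42
safe_source() {
    if [ -r "$1" ]; then
        source "$1"
    else
        logger -t "$0" -p crit "unable to source $1"
        exit 1
    fi
}
safe_source foo
echo "$GLOBAL_VAR"    # prints 42; without -g the variable would be local and empty here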
Yes, Bash's 'eval' command can make this work. 'eval' isn't very elegant, and it sometimes can be difficult to understand and debug code that uses it. I usually try to avoid it, but Bash often leaves you with no other choice (like the situation that prompted your question). You'll have to weigh the pros and cons of using 'eval' for yourself.
Some background on 'eval'
If you're not familiar with 'eval', it's a Bash built-in command that expects you to pass it a string as its parameter. 'eval' dynamically interprets and executes your string as a command in its own right, in the current shell context and scope. Here's a basic example of a common use (dynamic variable assignment):
$> a_var_name="color"
$> eval ${a_var_name}="blue"
$> echo -e "The color is ${color}."
The color is blue.
See the Advanced Bash Scripting Guide for more info and examples: http://tldp.org/LDP/abs/html/internal.html#EVALREF
Solving your 'source' problem
To make 'eval' handle your sourcing issue, you'd start by rewriting your function, 'safe_source()'. Instead of actually executing the command, 'safe_source()' should just PRINT the command as a string on STDOUT:
function safe_source() { echo eval " \
    if [ -r $1 ] ; then \
        source $1 ; \
    else \
        logger -t $0 -p crit \"unable to source $1\" ; \
        exit 1 ; \
    fi \
"; }
Also, you'll need to change your function invocations, slightly, to actually execute the 'eval' command:
`safe_source foo`
`safe_source bar`
(Those are backticks/backquotes, BTW.)
How it works
In short:
We converted the function into a command-string emitter.
Our new function emits an 'eval' command invocation string.
Our new backticks call the new function in a subshell context, returning the 'eval' command string output by the function back up to the main script.
The main script executes the 'eval' command string, captured by the backticks, in the main script context.
The 'eval' command re-parses and executes the command string in the main script context, running the whole if-then-else block, including (if the file exists) executing the 'source' command.
It's kind of complex. Like I said, 'eval' is not exactly elegant. In particular, there are a couple of special things you should notice about the changes we made:
The entire IF-THEN-ELSE block has become one whole double-quoted string, with backslashes at the end of each line "hiding" the newlines.
Some of the shell special characters (like '"') have been backslash-escaped, while others (like '$') have been left unescaped.
'echo eval' has been prepended to the whole command string.
Extra semicolons have been appended to terminate each command, a role that the (now-hidden) newlines originally performed.
The function invocation has been wrapped in backticks.
Most of these changes are motivated by the fact that the newlines won't survive: the backticks' command substitution splits the captured output into words, so 'eval' never sees line breaks. It can only separate multiple commands if we delimit them with semicolons instead. The new function's line breaks are purely a formatting convenience for the human eye.
If any of this is unclear, run your script with Bash's '-x' (debug execution) flag turned on, and that should give you a better picture of exactly what's happening. For instance, in the function context, the function actually produces the 'eval' command string by executing this command:
echo eval ' if [ -r <INCL_FILE> ] ; then source <INCL_FILE> ; else logger -t <SCRIPT_NAME> -p crit "unable to source <INCL_FILE>" ; exit 1 ; fi '
Then, in the main context, the main script executes this:
eval if '[' -r <INCL_FILE> ']' ';' then source <INCL_FILE> ';' else logger -t <SCRIPT_NAME> -p crit '"unable' to source '<INCL_FILE>"' ';' exit 1 ';' fi
Finally, again in the main context, the eval command executes these two commands (if the file exists):
'[' -r <INCL_FILE> ']'
source <INCL_FILE>
Good luck.
declare inside a function makes the variable local to that function. export affects the environment of child processes, not the current or parent environments.
You can set the values of your variables inside the functions and do the declare -r, declare -i or declare -ri after the fact.
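A minimal sketch of that pattern (the names are invented for the example):
init_config() {
    CONFIG_DIR=/etc/myapp    # plain assignments inside a function stay global
    RETRY_COUNT=3
}
init_config
declare -r CONFIG_DIR        # add the read-only attribute after the fact
declare -ri RETRY_COUNT      # read-only integer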
