Prepend header to file without changing the file - bash

Background
The enscript command can apply syntax highlighting to various types of source files, including SQL statements, shell scripts, PHP code, HTML files, and more. I am using enscript to generate 300dpi images of source code for a technical manual to:
Generate content for the book based on actual source code.
Distribute the source code along with the book, without any modification.
Run and test the scripts while writing the book.
Problem
The following shell script performs the conversion almost as desired:
#!/bin/bash
DIRNAME=$(dirname $1)
FILENAME=$(basename $1)
# Remove the extension from the filename.
BASENAME=${FILENAME%%.*}
FILETYPE=${FILENAME##*.}
LIGHTGRAY="#f3f3f3"
enscript --escapes --color -f Courier10 -X ps -B -1 --highlight=$FILETYPE \
$2 -h -o - $1 | \
gs -dSAFER -sDEVICE=pngalpha -dGraphicsAlphaBits=4 -dNOPAUSE -r300 \
-sOutputFile=$BASENAME.png -dBackgroundColor=16$LIGHTGRAY > /dev/null && \
convert -trim $BASENAME.png $BASENAME-trimmed.png && \
mv $BASENAME-trimmed.png $BASENAME.png
The problem is that the background is not a light gray colour. According to the enscript man page, the --escapes (-e) option indicates that the file (i.e., $1) has enscript-specific control sequences embedded within it.
Adding the control sequences means having to duplicate code, which defeats the purpose of having a single source.
Solution
The enscript documentation implies that it should be possible to concatenate two files together (the target and a "header") before running the script, to create a third file:
^#shade{0.85} -- header line
#!/bin/bash -- start of source file
Then delete the third file once the command completes.
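For illustration, the temporary-file approach could be scripted roughly like this (a sketch only; enscript-header.txt is a hypothetical file holding the ^#shade{0.85} control line, and the variables come from the script above):
# Sketch of the temporary-file workaround described above.
TMPFILE=$(mktemp) || exit 1
cat enscript-header.txt "$1" > "$TMPFILE"
enscript --escapes --color -f Courier10 -X ps -B -1 --highlight="$FILETYPE" \
    -h -o - "$TMPFILE" | \
    gs -dSAFER -sDEVICE=pngalpha -dGraphicsAlphaBits=4 -dNOPAUSE -r300 \
    -sOutputFile="$BASENAME.png" -dBackgroundColor=16$LIGHTGRAY > /dev/null
rm -f "$TMPFILE"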
Questions
Q.1. What is a more efficient way to pipe the control sequences and the source file to the enscript program without using a third file?
Q.2. What other options are available to automate syntax highlighting for a book, while honouring the single source requirements I have described? (For example, write the book in LyX and use LaTeX commands for import and syntax highlighting.)

Q1 You can use braces '{}' to do I/O redirection:
{ echo "^#shade{0.85}"; cat $1; } |
enscript --color -f Courier10 -X ps -B -1 --highlight=$FILETYPE $2 -h -o - |
gs -dSAFER -sDEVICE=pngalpha -dGraphicsAlphaBits=4 -dNOPAUSE -r300 \
-sOutputFile=$BASENAME.png -dBackgroundColor=16$LIGHTGRAY > /dev/null &&
convert -trim $BASENAME.png $BASENAME-trimmed.png &&
mv $BASENAME-trimmed.png $BASENAME.png
This assumes that enscript reads its standard input when not given an explicit file name; if not, you may need to use an option (perhaps '-i -') or some more serious magic, possibly even 'process substitution' in bash.
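For instance, a process-substitution variant might look like this (a sketch; it assumes enscript will happily read the /dev/fd path that bash supplies as the file argument):
# Hedged sketch: feed the header line plus the source file to enscript via
# process substitution, so no temporary file is ever written to disk.
enscript --escapes --color -f Courier10 -X ps -B -1 --highlight="$FILETYPE" \
    -h -o - <(echo "^#shade{0.85}"; cat "$1")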
You could also use parentheses to run a sub-shell:
(echo "^#shade{0.85}"; cat $1) | ...
Note that the semi-colon after cat is necessary with braces and not necessary with parentheses (and a space is necessary after the open brace) - such are the mysteries of shell scripting.
Q2 I don't have any alternatives to offer. When I produced a book (20 years ago now, using troff), I wrote a program to convert source into the necessary markup, so that the book was produced from the source code itself, by an automated process.
(Is 300 dpi sufficiently high resolution?)
Edit
To work around the enscript program interpreting the escape sequence embedded in the conversion script itself:
{ cat ../../enscript-header.txt $1; } |

Q2: Use LaTeX with the listings package.

Related

How to pass a Bash command to `entr`, quoting to guard against filenames with spaces?

My Goal
I'm writing a small Bash script which uses entr, a utility that re-runs arbitrary commands when it detects file-system events. My immediate goal is to pass entr a command which converts a given markdown file to HTML. entr will run this command every time the markdown file changes. A simplified but working script looks like:
# script 1
in="$1"
out="${in%.md}.html"
echo "$in" | entr pandoc "${in}" -o "${out}"
This works fine. The filename to be watched is supplied to entr on stdin. On detecting changes in that file, entr runs the command specified by its args. In this example that is pandoc, and all the args after it, to convert the markdown file to an HTML file.
For future reference, set -x shows that entr was invoked as we'd expect. (Throughout, lines starting with + show the output from set -x):
+ entr pandoc 'READ ME.md' -o 'READ ME.html'
The problem
I want to look up the command given to entr depending on the file type of the given input file. So the file-conversion command ends up in a variable, and I want to use that variable as the command-line args to entr. But I can't get the quoting right.
Again, simplified:
# script 2
in="$1"
out="${in%.md}.html"
cmd="pandoc \"${in}\" -o \"${out}\""
echo "$in" | entr "$cmd"
(shellcheck.net detects no issues on the above)
This fails. Because "$cmd" in the final line is in quotes, the entirety of $cmd
is treated as a single arg to entr:
+ entr 'pandoc "READ ME.md" -o "READ ME.html"'
entr tries to interpret the whole thing as the name of an executable, which
it cannot find:
entr: exec pandoc "READ ME.md" -o "READ ME.html": No such file or directory
So how should I modify script 2, to use the content of $cmd as the args to
entr?
What have I tried?
Check that $cmd is being formed as I expect? If I echo "$cmd" right after
it is defined in script 2, it looks exactly how I'd hope:
pandoc "READ ME.md" -o "READ ME.html"
I tried messing around with alternate ways of constructing cmd, such as:
cmd='pandoc "'"${in}"'" -o "'"${out}"'"'
but variations like this produce identical values of $cmd, and identical
behavior to script 2.
Try not quoting the use of $cmd?
Since the final line of script 2 erroneously treats the whole of "$cmd"
as a single arg, and we want it to split up the words into separate args
instead, maybe removing the quotes and using a bare $cmd is a step in the
right direction?
echo "$in" | entr $cmd
Predictably enough though, this splits $cmd up on every space, even the
ones inside our double-quotes:
+ entr pandoc '"READ' 'ME.md"' -o '"READ' 'ME.html"'
This makes Pandoc try, and fail, to open a file called "READ:
pandoc: "READ: openBinaryFile: does not exist (No such file or directory)
Try constructing $cmd using printf?
I notice printf -v can store output in a variable. How about using that
instead of assigning to cmd?
printf -v cmd 'pandoc "%s" -o "%s"' "$in" "$out"
Predictably enough, this produces the same results as script2. I tried some
speculative variations, such as %q in the format string, or using $in
and $out directly in the format string, but didn't stumble on anything
that seemed to help.
Try using the ${var@Q} form of parameter expansion.
echo "$in" | entr ${cmd@Q}
Tried with and without double quotes around the use of ${cmd@Q}. No joy,
I guess I'm misunderstanding what @Q is for.
+ entr ''\''pandoc' '"READ' 'ME.md"' -o '"READ' 'ME.html"'\'''
entr: exec 'pandoc: No such file or directory
Details
I'm using Bash v5.1.16, in Pop!_OS 22.04, derived from Ubuntu 22.04 (Jammy).
The current 'apt' version of entr (v5.1) in Ubuntu Jammy (22.04) is too old for my needs (e.g. the -z flag doesn't work), so I'm compiling my own from the latest v5.3 source release.
I know there are a lot of questions about quoting in Bash, but I don't see any that seem to match this. Apologies if I'm wrong.
Assemble the command as an array, instead of a string.
I read somewhere that maybe $@ might do what I need, so I put the parts of $cmd into an array:
in="$1"
out="${in%.md}.html"
cmd=(pandoc "$in" -o "$out")
echo "$in" | entr "${cmd[#]}"
This correctly quotes the items in ${cmd[#]} which require it (e.g. have spaces in.)
+ entr pandoc 'READ ME.md' -o 'READ ME.html'
So 'entr' successfully calls 'pandoc', which successfully converts the documents. It works! I confess I did not expect that.
This approach seems viable for other similar situations, not just when invoking entr.
So I have a solution. It doesn't seem completely ideal for my future plans. I had visions of these 'file conversion commands' being configurable, and hence defined in a text file somewhere, so that users (==me, probably) could override them and define their own, and I'm not fluent enough with Bash to be sure how to go about that when commands are defined as arrays instead of strings.
I can't help but feel I've overlooked something simpler.
Use a shell to interpret the value of "$cmd":
echo "$in" | entr sh -c "$cmd"
Similarly, entr has a -s option which invokes a shell for you (chosen using the first word in $SHELL):
echo "$in" | entr -s "$cmd"
These both work well, at the minor cost of spawning an extra shell process.
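If the conversion commands are eventually meant to live in a user-editable config file, the string-based variants above combine naturally with that; here is a hypothetical sketch (the conversions.conf format and the %i/%o placeholders are my invention, not anything entr provides):
# conversions.conf might contain one tab-separated rule per line, e.g.:
#   md<TAB>pandoc "%i" -o "%o"
in="$1"
out="${in%.md}.html"
ext="${in##*.}"

# Look up the command template for this file extension (hypothetical format).
cmd_template=$(awk -F'\t' -v e="$ext" '$1 == e { print $2; exit }' conversions.conf)

# Substitute the file names for the placeholders, then let entr -s hand the
# resulting string to $SHELL -c. This still breaks if a filename contains a
# literal double quote.
cmd=${cmd_template//'%i'/$in}
cmd=${cmd//'%o'/$out}
echo "$in" | entr -s "$cmd"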

how to untar certain files from an archive and grep in parallel in bash

We've got an extensive number of tarballs, and in each tarball I need to search for a particular pattern, but only in certain files whose names are known beforehand.
As disk access is slow and there are quite a few cores and plenty of memory available on this system, we aim to minimise disk writes and work through memory as much as possible.
echo "a.txt" > file_subset_in_tar.txt
echo "b.txt" >> file_subset_in_tar.txt
echo "c.txt" >> file_subset_in_tar.txt
tarball_name="tarball.tgz";
pattern="mypattern"
echo "pattern: $pattern"
(parallel -j-2 tar xf $tarball_name -O ::: `cat file_subset_in_tar.txt` | grep -ac "$pattern")
This works just fine when run directly in a bash terminal. However, when I paste it into a script with a bash shebang at the top, it just prints zero.
If I change $pattern to a hard-coded string, it runs OK. It feels like there is something wrong with the pipe sequencing or something similar. Ideally, an update to the attempt above, or another solution satisfying the disk/memory requirements mentioned, would be much appreciated.
I believe your parallel command is constructed incorrectly. You can run the pipeline of commands like the following:
parallel -j -2 "tar xf $tarball_name -O {} | grep -ac $pattern" :::: file_subset_in_tar.txt
Also note that the backticks and the use of cat are unnecessary; parameters can be fed to parallel from a file using ::::.
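For the "extensive amount of tarballs" mentioned in the question, the same pipeline can be wrapped in a loop; a sketch (the *.tgz glob and the per-tarball heading are assumptions, and $pattern must not contain double quotes):
#!/bin/bash
pattern="mypattern"

# Hypothetical sketch: repeat the corrected pipeline for every tarball in the
# current directory, still extracting to stdout (-O) so nothing is written to disk.
for tarball_name in *.tgz; do
    echo "== $tarball_name =="
    parallel -j -2 "tar xf $tarball_name -O {} | grep -ac \"$pattern\"" :::: file_subset_in_tar.txt
done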

Bash: How to assign output of command that ends with segmentation fault to variable

I am using a small bash program, written by someone else, that runs from cron on my Synology NAS; basically it searches for subtitles for my movie collection and converts their encoding to UTF-8 if needed.
The main bash script calls several subscripts, and unfortunately it doesn't work 100% as it should. During my investigation I have narrowed the problem down to this specific function in one of the subscripts:
subs_getCharset_SO() {
    local file="$1"
    local charset=
    local et=
    tools_isDetected "file" || return $G_RETFAIL
    et=$(file \
        --brief \
        --mime-encoding \
        --exclude apptype \
        --exclude tokens \
        --exclude cdf \
        --exclude compress \
        --exclude elf \
        --exclude soft \
        --exclude tar \
        "$file" | wrappers_lcase_SO) || {
        return $G_RETFAIL
    }
    case "$et" in
        *utf*) charset="UTF8";;
        *iso*) charset="ISO-8859-2";;
        us-ascii) charset="US-ASCII";;
        csascii) charset="CSASCII";;
        *ascii*) charset="ASCII";;
        *) charset="WINDOWS-1250";;
    esac
    echo "$charset"
}
It turns out that running the file command on every movie file always causes a Segmentation fault. I have reproduced it by running this command manually in a terminal:
admin@Synek:/volume1/video/Filmy/Ghostland.2018$ file --brief --mime-encoding Ghostland.2018.txt
The output is:
utf-8
Segmentation fault
So my main problem, as I see it, is that the output of the file command is not assigned to the et variable. Ideally I would like to capture the first line of the output and assign it to the et variable, or at least redirect the output to a file. So far I have tried some solutions that I found on the web:
admin@Synek:/volume1/video/Filmy/Ghostland.2018$ { file --brief --mime-encoding ./Ghostland.2018.txt; } 2> log
which outputs in the terminal just the line that I need and omits the Segmentation fault message:
utf8
Running:
admin@Synek:/volume1/video/Filmy/Ghostland.2018$ cat log
Gives:
Segmentation fault
But I just can't find a way to get the first line, printed before the Segmentation fault, written to the log output file.
Any help appreciated!
When stdout is to a TTY, GNU libc (like most implementations) configures line-buffering by default, so output written with the standard C library is printed whenever a full line is complete (since it's assumed that a human is watching and wants to see results as soon as they're available, even if that makes overall execution take longer). By contrast, when stdout is to a FIFO or a file, a larger output buffer is used for better efficiency.
Because a SIGSEGV doesn't allow a program to flush its buffers, that means that data still in the buffer at the time of the failure is lost.
On a system with GNU coreutils, you can configure unbuffered or line-buffered stdout (by default, programs can still override it) using the tool stdbuf:
result=$(stdbuf -o0 file --brief --mime-encoding ./Ghostland.2018.txt)
...or, on systems without GNU coreutils but with expect installed, you can use the tool unbuffer:
result=$(unbuffer file --brief --mime-encoding ./Ghostland.2018.txt)
See BashFAQ #9 for more background on buffering and its control from the shell.
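Applied to the function above, only the command substitution needs to change; a sketch, assuming stdbuf (GNU coreutils) is available on the NAS:
    # Prefix the crashing command with stdbuf -o0 so its stdout is unbuffered and
    # the encoding line reaches the command substitution before the segfault.
    et=$(stdbuf -o0 file \
        --brief \
        --mime-encoding \
        --exclude apptype \
        --exclude tokens \
        --exclude cdf \
        --exclude compress \
        --exclude elf \
        --exclude soft \
        --exclude tar \
        "$file" | wrappers_lcase_SO) || {
        return $G_RETFAIL
    }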

read multiple files in bash

I have two .txt files that I want to read line by line, simultaneously, in a .sh script. Both .txt files have the same number of lines. Inside the loop I want to use the sed command to change the full_sample_name and sample_name in another file.
I know how this works if you just read one file, but I cannot get it to work for two files.
#! /bin/bash
FULL_SAMPLE="file1.txt"
SAMPLE="file2.txt"
while read ... && ...
do
sed -e "s/\<full_sample_name\>/$FULL_SAMPLE/g" -e "s/\<sample_name\>/$SAMPLE/g" pipeline.sh > $SAMPLE.sh
done < ...?
Charles provided a very good answer.
You could use paste to join the lines of the files with some delimiter (that shouldn't appear in the files):
paste -d ":" file1.txt file2.txt | while IFS=":" read -r full samp; do
do_stuff_with "$full" and "$samp"
done
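Wired into the question's sed command, that might look like the following (a sketch; the ":" delimiter must not appear in either file):
paste -d ":" file1.txt file2.txt | while IFS=":" read -r full_sample_name sample_name; do
    sed -e "s/\<full_sample_name\>/$full_sample_name/g" \
        -e "s/\<sample_name\>/$sample_name/g" \
        pipeline.sh > "$sample_name.sh"
done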
#!/bin/bash
full_sample_file="file1.txt"
sample_file="file2.txt"
while read -r -u 3 full_sample_name && read -r -u 4 sample_name; do
    sed -e "s/\<full_sample_name\>/$full_sample_name/g" \
        -e "s/\<sample_name\>/$sample_name/g" \
        pipeline.sh >"$sample_name.sh"
done 3<"$full_sample_file" 4<"$sample_file" # automatically closed on loop exit
In this case, I'm assigning file descriptor 3 to file1.txt and file descriptor 4 to file2.txt.
By the way, with bash 4.1 or newer, you no longer need to handle file descriptors manually:
# opening explicitly, since even if opened on the loop, these need
# to be explicitly closed.
exec {full_sample_fd}<file1.txt
exec {sample_fd}<file2.txt
while read -r -u "$full_sample_fd" full_sample_name \
&& read -r -u "$sample_fd" sample_name; do
: do stuff here with "$full_sample_name" and "$sample_name"
done
# close the files explicitly
exec {full_sample_fd}>&- {sample_fd}>&-
One more note: You could make this a bit more efficient, and also more correct, by not using sed at all, but instead reading the file to be converted into a shell variable and doing the replacements there. It is more correct when your sample_name and full_sample_name values aren't guaranteed to evaluate to themselves when interpreted as regular expressions, and it treats the angle brackets as literal characters rather than word-boundary regex markers; it does assume that your input file contains no literal NULs (which, as a shell script, it shouldn't).
exec {full_sample_fd}<file1.txt
exec {sample_fd}<file2.txt
IFS= read -r -d '' input_file <pipeline.sh
while read -r -u "$full_sample_fd" full_sample_name \
&& read -r -u "$sample_fd" sample_name; do
output=${input_file//'<full_sample_name>'/${full_sample_name}}
output=${output//'<sample_name>'/${sample_name}}
printf '%s' "$output" >"${sample_name}.sh"
done
# close the files explicitly
exec {full_sample_fd}>&- {sample_fd}>&-
With GNU Parallel it will look like this:
#! /bin/bash
do_sed() {
    sed -e "s/\<full_sample_name\>/$1/g" -e "s/\<sample_name\>/$2/g" pipeline.sh > "$2".sh
}
export -f do_sed
parallel --xapply do_sed {1} {2} :::: file1.txt file2.txt
The added benefit is that the jobs are run in parallel. Depending on your storage system this may speed up the processing: on a RAID6 I have seen a 6x speedup by running 10 jobs in parallel. YMMV, so the only way to know for sure is to test and measure.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process as soon as one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

bash completion for teamocil

I'm trying to define bash auto-completion for teamocil so that when I type teamocil <tab> it completes with the file names in the folder ~/.teamocil/, without the file extensions. There's an example for zsh on the website:
compctl -g '~/.teamocil/*(:t:r)' teamocil
how can I use this in bash?
Edit: Influenced by michael_n's answer, I have come up with a one-liner:
complete -W "$(teamocil --list)" teamocil
Here's a generalized version of another completion script I have that does something similar. It assumes a generic hypothetical command "flist", using some directory of files defined by FLIST_DIR to complete the command (omitting options).
Modify the following for your program (teamocil): change the default dir from $HOME/flist to $HOME/.teamocil, define your own filters/transformations, etc.; and then just source it (e.g., . ~/bin/completion/bash_completion_flist), optionally adding it to your existing list of bash completions.
# bash_completion_flist:
# for some hypothetical command called "flist",
# generate completions using a directory of files
FLIST_DIR=${FLIST_DIR:-"$HOME/flist"}
_flist_list_files() {
    ls "$FLIST_DIR" | sed 's/\..*//'
}
_flist() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    COMPREPLY=()
    [[ ${cur} != -* ]] \
        && COMPREPLY=($(compgen -W "$(_flist_list_files)" -- ${cur}))
}
complete -o bashdefault -o default -o nospace -F _flist flist 2>/dev/null \
    || complete -o default -o nospace -F _flist flist
Notes:
it could be shorter, but this is more or less a template for longer, more complicated completions. (Functions are Good.)
the actual completion command (complete -o ...) is a bit of a hack to work across different versions of bash.
the suffix stripping is over-simplified if there are extra "." characters in the filename, and is left as an exercise for the reader :-) There are multiple ways to do this (sed, awk, etc.); the best is via bash-isms (base=${filename%.*}), but the easiest is arguably the simple sed with some assumptions about the filename format.
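Adapted to teamocil specifically, the template might boil down to something like this (a sketch; it assumes the layouts live in ~/.teamocil and that stripping everything from the first "." onward is acceptable):
# bash_completion_teamocil: hypothetical adaptation of the flist template above
_teamocil_list_layouts() {
    ls "$HOME/.teamocil" | sed 's/\..*//'
}
_teamocil() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    COMPREPLY=()
    [[ ${cur} != -* ]] \
        && COMPREPLY=($(compgen -W "$(_teamocil_list_layouts)" -- ${cur}))
}
complete -o default -o nospace -F _teamocil teamocil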
Bash implements a similar idea in a different way, so completion commands and files written for zsh won't work in bash. But you can write your own rules for autocompletion. More info:
An introduction to bash completion: part 1
An introduction to bash completion: part 2
SO: How to enable tab-completion of command line switches in bash?
SO: Auto-complete command line arguments

Resources