Ack in a bash subshell

I want to search a number of files in a bash loop.
Suppose I run an ack search inside the loop:
#!/bin/bash
seq 3 | while read i
do
test=`ack root /etc/passwd`
echo $test
done
prints one empty line.
Whereas
#!/bin/bash
seq 3 | while read i
do
# test=`ack root /etc/passwd`
echo $test
done
prints 3 empty lines.
If I run just that one command from bash, it works:
ack root /etc/passwd
This also works:
$ test=`ack root /etc/passwd`
$ echo $test
I think ack somehow breaks the loop.
Here's the origin of the problem:
ls input/* | while read scan
do
temp=`basename "$scan"`
baseName=${temp%.*}
extension=${temp#*.}
# OCR:
abbyyocr11 -rl Russian -if "$scan" -f TextUnicodeDefaults -of "temp/$baseName.txt"
# get data:
firstName=` ack '^Имя\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
middleName=`ack '^Отчество\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
lastName=` ack '^Фамилия\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
# copy the file with a meaningful name:
cp --backup=numbered "$scan" "output/$lastName$firstName$middleName.$extension"
done
Edit
Turns out the --nofilter option solves it. According to the --help message it forces ack to treat stdin as a tty as opposed to a pipe. I wonder what this means.
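To illustrate what that means: a program can check whether its stdin is a terminal or a pipe, and bash exposes the same check as test -t 0. A minimal sketch (the script name is made up):
#!/bin/bash
# istty.sh: report what kind of stdin this script was given
if [ -t 0 ]
then
echo "stdin is a tty"
else
echo "stdin is a pipe or redirect"
fi
Run at a prompt, ./istty.sh prints "stdin is a tty"; run as echo x | ./istty.sh it prints "stdin is a pipe or redirect". This is presumably the check that --nofilter overrides.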

It looks like ack, on seeing a pipe on stdin, goes into filter mode and searches the pipe rather than the named file. That consumes the rest of seq's output, which is why $test is empty and why the loop ends after one iteration.
I don't know why it prefers the pipe over the file argument, but a quick and easy fix would be:
seq 3 | while read i
do
test=`</dev/null ack root /etc/passwd`
echo $test
done
or in your case
# get data:
{
#....
} </dev/null
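Spelled out with the commands from the question, that grouping would be:
# get data:
{
firstName=` ack '^Имя\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
middleName=`ack '^Отчество\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
lastName=` ack '^Фамилия\s+(.+)' --output='$1' "temp/$baseName.txt" | sed 's/ //g'`
} </dev/null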

Related

Set a command to a variable in bash script problem

I'm trying to run a command stored in a variable, but I am getting strange results.
Expected result "1":
grep -i nosuid /etc/fstab | grep -iq nfs
echo $?
1
Unexpected result when the command is stored in a variable:
cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
$cmd
echo $?
0
It seems it returns 0 because the command was valid, not because of the actual outcome. How can I do this better?
When you expand a variable, the result is split into words but not re-parsed for shell syntax, so you can only execute exactly one simple command stored in a variable. The pipe is passed as an argument to the first grep.
Example
$ printArgs() { printf %s\\n "$@"; }
# Two commands. The 1st command has parameters "a" and "b".
# The 2nd command prints stdin from the first command.
$ printArgs a b | cat
a
b
$ cmd='printArgs a b | cat'
# Only one command with parameters "a", "b", "|", and "cat".
$ $cmd
a
b
|
cat
How to do this better?
Don't execute the command using variables.
Use a function.
$ cmd() { grep -i nosuid /etc/fstab | grep -iq nfs; }
$ cmd
$ echo $?
1
Solution to the actual problem
I see three options to your actual problem:
Use a DEBUG trap and the BASH_COMMAND variable inside the trap (a sketch follows this list).
Enable bash's history feature for your script and use the history builtin.
Use a function which takes a command string and executes it using eval.
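For the first option, a minimal sketch (illustrative only): the DEBUG trap runs before every simple command, and BASH_COMMAND holds the command about to be executed, so you can log each command as the script runs:
#!/bin/bash
# print every command to stderr just before bash runs it
trap 'echo "RUNNING: $BASH_COMMAND" >&2' DEBUG
date
hostname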
Regarding your comment on the last approach: You only need one function. Something like
execAndLog() {
description="$1"
shift
if eval "$*"; then
info="PASSED: $description: $*"
passed+=("${FUNCNAME[1]}")
else
info="FAILED: $description: $*"
failed+=("${FUNCNAME[1]}")
fi
}
You can use this function as follows:
execAndLog 'Scanned system' 'grep -i nfs /etc/fstab | grep -iq noexec'
The first argument is the description for the log, the remaining arguments are the command to be executed.
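Since the function records the calling function's name in the passed and failed arrays, you can print a summary at the end of the script. A sketch, assuming the arrays start out empty:
passed=() failed=()
# ... execAndLog calls go here ...
echo "passed: ${#passed[@]}, failed: ${#failed[@]}"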
Using bash -x or set -x will allow you to see what bash executes:
> cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
> set -x
> $cmd
+ grep -i nosuid /etc/fstab '|' grep -iq nfs
As you can see, your pipe | is passed as a literal argument to the first grep command.

Bad Substitution error with pdfgrep as variable?

I'm using a bash script to parse information from a PDF and use it to rename the file (with the help of pdfgrep). However, after some tinkering, I'm getting a "Bad Substitution" error on line 5. Any ideas on how to reformat it?
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+")
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+")
$({ read dobmonth; read dobday; read dobyear; } < (pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+"))
# Check id1 is found, else do nothing
if [ ${#id1} ]; then
mv "$f" "${id1}_${id2}_${printf '%02d-%02d-%04d\n' "$dobmonth" "$dobday" "$dobyear"}.pdf"
fi
done
There are several unrelated bugs in this code; a corrected version might look like the following:
#!/usr/bin/env bash
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+") || continue
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+") || continue
{ read dobmonth; read dobday; read dobyear; } < <(pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+")
printf -v date '%02d-%02d-%04d' "$dobmonth" "$dobday" "$dobyear"
mv -- "$f" "${id1}_${id2}_${date}.pdf"
done
< (...) isn't meaningful bash syntax. If you want to redirect from a process substitution, you should use the redirection syntax < and the process substitution <(...) separately.
$(...) generates a subshell -- a separate process with its own memory, such that variables assigned in that subprocess aren't exposed to the larger shell as a whole. Consequently, if you want the contents you set with read to be visible, you can't have them be in a subshell.
${printf ...} isn't meaningful syntax. Perhaps you wanted a command substitution? That would be $(printf ...), not ${printf ...}. However, it's more efficient to use printf -v varname 'fmt' ..., which avoids the overhead of forking off a subshell altogether.
Because we put the || continues on the id1=$(... | grep ...) command, we no longer need to test whether id1 is nonempty: The continue will trigger and cause the shell to continue to the next file should the grep fail.
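To see the subshell point in isolation, a minimal sketch:
# the pipe puts the brace group in a subshell (bash runs the last
# element of a pipeline in a subshell by default), so a is lost afterwards:
printf '1\n2\n3\n' | { read a; read b; read c; }
echo "a=$a" # prints a=
# process substitution keeps the reads in the current shell:
{ read a; read b; read c; } < <(printf '1\n2\n3\n')
echo "a=$a" # prints a=1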
Do what Charles suggests regarding creating the new file name, but you might consider a different approach to parsing the PDF file, to cut down on how many pdfgrep, pipe, and grep calls you're doing per file. I don't have pdfgrep on my system, nor do I know what your input file looks like, but if we use this input file:
$ cat file
foo
ID #: M13
foo
Date Of Birth: 05 21 1996
foo
Second ID: V27
foo
and grep -E in place of pdfgrep, then here's how I'd get the info from the input file: read it once and parse the output with awk, instead of reading it multiple times with pdfgrep and using multiple pipes and greps to extract what you need:
$ grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
M13 V27 05 21 1996
Given that, you can use the same read approach to save the output in variables (or an array). You may need to massage the awk command, depending on what your pdfgrep output actually looks like.
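With the sample file above, that could look like the following sketch (adjust the variable list to match your real output):
read id1 id2 dobmonth dobday dobyear < <(grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}')
echo "$id1 $id2" # ==> M13 V27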

bash: process substitution, paste and echo

I'm trying out process substitution and this is just a fun exercise.
I want to append the string "XXX" to every line of ls output:
paste -d ' ' <(ls -1) <(echo "XXX")
How come this does not work? XXX is not appended. However, if I want to append the file name to itself, such as
paste -d ' ' <(ls -1) <(ls -1)
it works.
I do not understand the behavior. Both echo and ls -1 write to stdout, but echo's output doesn't end up next to every line.
Try doing this, using a printf trick: the %.0s format consumes each filename argument while printing zero characters of it, leaving one XXX line per file.
paste -d ' ' <(ls -1) <(printf "%.0sXXX\n" * )
Demo:
$ ls -1
filename1
filename10
filename2
filename3
filename4
filename5
filename6
filename7
filename8
filename9
Output:
filename1 XXX
filename10 XXX
filename2 XXX
filename3 XXX
filename4 XXX
filename5 XXX
filename6 XXX
filename7 XXX
filename8 XXX
filename9 XXX
If you just want to append XXX to each filename, this one will be simpler:
printf "%sXXX\n" *
If you want the XXX after every line of ls -1 output, you need a second command that outputs the string once per line. You are echoing it just once, and therefore it gets appended to the first line of ls output only.
If you are searching for a tiny command line to achieve the task you may use sed:
ls -1 | sed -n 's/\(^.*\)$/\1 XXX/p'
And here's a funny one, using no external commands except ls and the legendary yes command!
while read -u 4 head && read -u 5 tail ; do echo "$head $tail"; done 4< <(ls -1) 5< <(yes XXX)
(I'm only posting this because it's funny and it's actually not 100% off topic since it uses file descriptors and process substitutions)
Regarding this suggestion from another answer:
for i in $( ls -1 ); do echo "$i XXXX"; done
Never use for i in $(command). See this answer for more details.
So, to answer the original question, you could simply use something like this:
for file in *; do echo "$file XXXX"; done
Another solution with awk:
ls -1|awk '{print $0" XXXX"}'
awk '{print $0" XXXX"}' <(ls -1) # with process substitution
Another solution with sed:
ls -1|sed "s/\(.*\)/\1 XXXX/g"
sed "s/\(.*\)/\1 XXXX/g" <(ls -1) # with process substitution
And useless solutions, just for fun:
while read; do echo "$REPLY XXXX"; done <<< "$(ls -1)"
ls -1|while read; do echo "$REPLY XXXX"; done
paste does it only for the first line, since it pairs the first line of the first input with the first line of the second input:
paste -d ' ' <(ls -1) <(echo "XXX")
... outputs:
/dir/file-a XXX
/dir/file-b
/dir/file-c
To get it on every line, you have to iterate:
for i in $( ls -1 ); do echo "$i XXXX"; done
You can use xargs for the same effect:
ls -1 | xargs -I{} echo {} XXX

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed can read from stdin line-by-line so you don't need read
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
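For the record, a quick way to compare the two (a sketch; /etc/services is just a conveniently long file, and the exit 0 should be dropped from the original function first or it will close your shell):
time hilite "tcp" < /etc/services > /dev/null
Define hilite once as the while-loop version and once as the sed-only version, run the timing for each, and compare.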
Well, you could simply do this:
egrep "$1" $line | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
cat > "$FILE"
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also takes a file/pathname as an optional second argument; without one it buffers stdin to a temporary file, for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"
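or, passing the file as the second argument directly:
hilite "\b(bin|[U1])\b" /etc/passwd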

Extract numbers from filename

In bash I thought to use sed, but I can't figure out how to extract a pattern instead of doing the usual replace.
For example:
FILENAME='blah_blah_#######_blah.ext'
the number of digits (written with "#" placeholders above) could be either 7 or 10
I want to extract only the number
If all you need is to remove anything but digits, you could use
ls | sed -e 's/[^0-9]//g'
to get all digits grouped per filename (123test456.ext will become 123456), or
ls | egrep -o '[0-9]+'
for all groups of numbers (123test456.ext will turn up 123 and 456)
You can use this simple code:
filename=zc_adsf_qwer132467_xcvasdfrqw
echo ${filename//[^0-9]/} # ==> 132467
Just bash:
shopt -s extglob
filename=zc_adsf_qwer132467_xcvasdfrqw
tmp=${filename##+([^0-9])}
nums=${tmp%%+([^0-9])}
echo $nums # ==> 132467
or, with bash's =~ operator (available since bash 3.0):
[[ "$filename" =~ [0-9]+ ]] && nums=${BASH_REMATCH[0]}
Is there any number anywhere else in the file name? If not:
ls | sed 's/[^0-9][^0-9]*\([0-9][0-9]*\).*/\1/g'
Should work.
A Perl one-liner might work a bit better here, because Perl has more advanced regular expression parsing and lets you require that the run of digits be between 7 and 10 long:
ls | perl -ne 's/.*\D+(\d{7,10}).*/$1/;print if /^\d+$/;'
$ ls -1
blah_blah_123_blah.ext
blah_blah_234_blah.ext
blah_blah_456_blah.ext
With such files in a directory, you run:
$ ls -1 | sed 's/blah_blah_//' | sed 's/_blah.ext//'
123
234
456
or with a single sed run:
$ ls -1 | sed 's/^blah_blah_\([0-9]*\)_blah.ext$/\1/'
This will work for you:
echo $FILENAME | sed -e 's/[^(0-9|)]//g' | sed -e 's/|/,/g'
