Speed up bash filter function to run commands consecutively instead of per line - bash

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?

sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
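A variation on the function above, as a minimal sketch: with sed -E (extended regular expressions, available in GNU and BSD sed) the pattern can be used as-is, so the REGEX_SED escaping step is not needed. The name hilite_ere is made up for this example, and the sketch assumes the pattern contains no / character, since / is the delimiter of the s command here.

```shell
# Sketch only: hilite_ere is a hypothetical name, not part of the original answer.
# Assumes the pattern contains no "/" (the delimiter of the s command).
hilite_ere() {
    esc=$(printf '\033')    # literal ESC, avoids relying on \x1b support in sed
    # -n plus the p flag prints only lines where a substitution happened
    sed -n -E "s/$1/${esc}[7m&${esc}[0m/gp"
}

# Usage: print only the matching lines, with matches in reverse video
printf 'root\nbin\nsys\n' | hilite_ere '(bin|U)'
```

Only one sed process is started for the whole input, regardless of how many lines are piped in.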

If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).

I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed reads stdin line by line, so you don't need read.
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.

Well, you could simply do this:
egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )

Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
echo -n > $FILE
while read line
do
echo $line >> $FILE
done
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also accepts a file/pathname as an optional second argument; without one it reads stdin, for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"

Related

mv: Cannot stat - No such file or directory

I have piped the output of ls command into a file. The contents are like so:
[Chihiro]_Grisaia_no_Kajitsu_-_01_[1920x816_Blu-ray_FLAC][D2B961D6].mkv
[Chihiro]_Grisaia_no_Kajitsu_-_02_[1920x816_Blu-ray_FLAC][38F88A81].mkv
[Chihiro]_Grisaia_no_Kajitsu_-_03_[1920x816_Blu-ray_FLAC][410F74F7].mkv
My attempt to rename these episodes according to episode number is as follows:
cat grisaia | while read line;
#get the episode number
do EP=$(echo $line | egrep -o "_([0-9]{2})_" | cut -d "_" -f2)
if [[ $EP ]]
#escape special characters
then line=$(echo $line | sed 's/\[/\\[/g' | sed 's/\]/\\]/g')
mv "$line" "Grisaia_no_Kajitsu_${EP}.mkv"
fi
done
The mv commands exit with code 1 with the following error:
mv: cannot stat
'\[Chihiro\]_Grisaia_no_Kajitsu_-_01_\[1920x816_Blu-ray_FLAC\]\[D2B961D6\].mkv':
No such file or directory
What I really don't get is that if I copy the file that could not be stat and attempt to stat the file, it works. I can even take the exact same string that is output and execute the mv command individually.
If you surround your variable ($line) with double quotes (") you don't need to escape those special characters. So you have two options there:
Remove the following assignment completely:
then # line=$(echo $line | sed 's/\[/\\[/g' | sed 's/\]/\\]/g')
or
Remove the double quotes in the following line:
mv $line "Grisaia_no_Kajitsu_${EP}.mkv"
Further considerations
Parsing the output of ls is never a good idea. Think about filenames with spaces. See this document for more information.
The cat here is unnecessary:
cat grisaia | while read line;
...
done
Use this instead to avoid an unnecessary pipe:
while read line;
...
done < grisaia
Why is it good to avoid pipes in some scenarios? (answering a comment)
Pipes create subshells (which are expensive), and they also invite mistakes such as the following:
last=""
cat grisaia | while read line; do
last=$line
done
echo $last # surprise!! it outputs an empty string
The reason is that $last inside the loop belongs to another subshell.
Now, see the same approach without pipes:
while read line; do
last=$line
done < grisaia
echo $last # it works as expected and prints the last line
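Putting the advice above together, here is a minimal sketch of the rename that globs the files directly instead of reading a saved ls listing, and quotes every expansion so the brackets need no escaping. The function name rename_episodes and its directory argument are assumptions made for this example.

```shell
# Sketch only: rename_episodes and its directory argument are hypothetical.
rename_episodes() {
    dir=$1
    for path in "$dir"/*.mkv; do
        [ -e "$path" ] || continue          # the glob matched nothing
        name=${path##*/}                    # strip the directory part
        # Extract the two-digit episode number between underscores, e.g. _01_
        ep=$(printf '%s\n' "$name" | sed -n 's/.*_\([0-9][0-9]\)_.*/\1/p')
        if [ -n "$ep" ]; then
            mv -- "$path" "$dir/Grisaia_no_Kajitsu_${ep}.mkv"
        fi
    done
}

# Usage (run against the directory holding the files):
# rename_episodes .
```

Files whose names contain no _NN_ part are left untouched.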

Conditionally pipe output through sed

Is there a way to conditionally pipe the output of a command through sed, in a bash script? Depending upon a script option, I either want to pipe the output of a long pipe through sed, or omit the pipe through sed. Currently I'm doing
if [ $pipeit ]; then
sed_args='/omit this line/d'
else
sed_args='/$^/d' # pass-thru (what's a better sed pass thru?)
fi
some_cmd | sed "$sed_args"
I would keep it as simple as:
if [ $pipeit ]; then
some_cmd | sed '/omit this line/d'
else
some_cmd
fi
Why should you call sed if you don't need it? Just for your information, a possible sed command that does not change the input would be sed -n p
Btw, if some_cmd is kind of a large beast and you want to avoid duplicating it, wrap it into a function.
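The function wrapping suggested above might look like this minimal sketch. Here some_cmd is a stand-in that prints three fixed lines, and run_cmd is a name invented for the example.

```shell
# Sketch only: some_cmd is a stand-in for the real (long) pipeline,
# and run_cmd is a hypothetical wrapper name.
some_cmd() {
    printf 'first\nomit this line\nlast\n'
}

run_cmd() {
    if [ -n "$pipeit" ]; then
        some_cmd | sed '/omit this line/d'
    else
        some_cmd
    fi
}

pipeit=1
run_cmd
```

The long command is written once, and sed is started only when it is actually needed.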
By default sed prints all lines:
if [ $pipeit ]; then
sed_args='/omit this line/d'
else
sed_args="" # pass-thru
fi
some_cmd | sed "${sed_args}"
There is this other tested solution:
some_cmd | if [ $pipeit ]; then
sed "/omit this line/d"
else
sed ""
fi
cat could be used instead of the sed ""
Finally, a string can be built and executed using eval.
some_cmd='printf "foo\n\nbar\n"'
if [ $pipeit ]; then
conditional_pipe='| sed "/foo/d"'
else
conditional_pipe=""
fi
eval "${some_cmd}" "${conditional_pipe}"
If some_cmd is complex it might be tricky to build a string that would behave as expected with eval.
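Because of that, one way to sidestep eval is to define the filter as a function once and always pipe through it. This is a minimal sketch that assumes the choice is known before the pipeline runs. The name maybe_filter is invented for this example.

```shell
# Sketch only: maybe_filter is a hypothetical name.
pipeit=1    # set to the empty string to pass everything through

if [ -n "$pipeit" ]; then
    maybe_filter() { sed '/foo/d'; }
else
    maybe_filter() { cat; }
fi

printf 'foo\n\nbar\n' | maybe_filter
```

No command string is built, so the quoting inside some_cmd is never re-parsed.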
----
First solution for history
Using an impossible match with sed would make it print all lines to stdout:
$ printf "foo\n\nbar\n" | sed "/./{/^$/d}"
foo
bar
/./ selects a line with at least one char.
/^$/ selects an empty line.

Pipe output to egrep function

I'm trying to define a bash function, highlight, that I can use to highlight search terms in the output of a previous command. When I do this from the terminal, it works fine as follows:
# highlight all occurrences of bar in file foo
cat foo | egrep '(bar|$)'
Yes, catting is a simplified example, but it demonstrates how I can do this from the command line. I'd like to use this generically as: cat foo | highlight bar
From what I've read, I can't simply pipe results to egrep like I hoped so I naively tried defining my bash function as:
highlight() {
while read line; do
pat="'("$1"|$)'"
echo \"$line\" | egrep $pat
done
}
However, this isn't working. Please advise.
Your quoting is just about entirely wrong.
pat="'("$1"|$)'"
you include literal single quotes in the pattern, and you're actually not quoting the function parameter.
echo \"$line\" | egrep $pat
You're including literal double quotes in the echo statement, and failing to quote both variables.
This is better:
highlight() {
while read -r line; do
pat="($1|$)"
echo "$line" | grep -E "$pat"
done
}
However, grep knows how to read from stdin, so simplify:
highlight() { grep -E "($1|$)"; }
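The one-liner above only highlights anything if grep is already colouring its matches (for example through an alias). A minimal sketch that requests colour explicitly, assuming a grep that supports --color (GNU and BSD grep do):

```shell
# Sketch only: assumes a grep that supports --color.
highlight() {
    # "$" matches the empty string at the end of every line, so all lines are
    # selected, but only occurrences of the pattern get coloured.
    # --color=always keeps the colour even when the output is piped onward.
    grep -E --color=always "($1|\$)"
}

printf 'foo\nbar\nhibar\n' | highlight bar
```

With --color=auto instead, the escape codes are emitted only when standard output is a terminal.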
I'm not sure what you read, but it is wrong. egrep stands for extended grep because it is using the extended POSIX regular expression syntax. It behaves like the standard grep -E
Read the man page:
egrep is the same as grep -E
It seems to me you should have changed this
pat="'("$1"|$)'"
To
pat="($1|\$)"
Also, I don't see the need of "$". In addition, I think it is better to move the initialization of "pat" out of the loop. Here is what I got (tested):
#!/bin/bash
function highlight {
pattern="$1"
while read line; do
echo "$line" | egrep --color "$pattern"
done
}
echo -e 'a\nb\nbar\nhibar' | highlight bar

awk parse filename and add result to the end of each line

I have number of files which have similar names like
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to take the number before .csv (1 or 2) from the file name and append it to the end of every line in the file, separated by a TAB.
I have written this code. It finds the number I need, but I do not know how to put this number into the file. There is a space in the filename, and my script breaks because of it.
Also, I am not sure how to pass a list of files to the script. For now I am working with only one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string
for file in *
do
sfx=$(echo "$file" | sed 's/.*_\(.*\).csv.*/\1/')
sed -i "s/$/\t$sfx/" "$file"
done
Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To make this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To save the changes back to the same file (if you have GNU sed):
sed -i 's/.*\(.\).csv.*/&\t\1/' file
Untested, but this should do what you want (extract the number before .csv and append that number to the end of every line in the .out file)
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > FILENAME"_aaaa" }' *.out
for file in *_aaaa; do mv "$file" "${file/_aaaa}"; done
If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while [[ 0 < $# ]]; do
num=$(echo "$1" | sed -r 's/.*_([0-9]+).csv.*/\1/' )
#awk -e "{ print \$0\"\t${num}\"; }" < "$1" > "$1.new"
#sed -r "s/$/\t$num/" < "$1" > "$1.new"
#sed -ri "s/$/\t$num/" "$1"
shift
done
Run the script and give it names of the files you want to process. $# is the number of command line arguments for the script which is decremented at the end of the loop by shift, which drops the first argument, and shifts the other ones. Extract the number from the filename and pick one of the three commented lines to do the appending: awk gives you more flexibility, first sed creates new files, second sed processes them in-place (in case you are running GNU sed, that is).
Instead of awk, you may want to go with sed or coreutils.
Grab number from filename, with grep for variety:
num=$(<<<filename grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<filename is equivalent to echo filename.
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num/" filename
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<filename wc -l)
Combine filename and num with paste:
paste filename <(seq $len | while read; do echo $num; done)
Complete example
for filename in DWH_Export*; do
num=$(echo "$filename" | grep -Eo '[^_]+\.csv' | cut -d. -f1)
sed -i "s/\$/\t$num/" "$filename"
done
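A portable variant of the loop above, as a minimal sketch: it avoids sed -i (a GNU extension) by writing to a new file and moving it into place, and it quotes every expansion so the space in the filenames is harmless. The function name append_csv_number and the .new suffix are assumptions made for this example.

```shell
# Sketch only: append_csv_number and the ".new" suffix are hypothetical choices.
append_csv_number() {
    for file in "$@"; do
        name=${file##*/}
        # Digits between the last "_" before ".csv" and ".csv" itself
        num=$(printf '%s\n' "$name" | sed 's/.*_\([0-9][0-9]*\)\.csv.*/\1/')
        # Append TAB + number to every line, then replace the original file
        awk -v n="$num" '{ print $0 "\t" n }' "$file" > "$file.new" &&
            mv -- "$file.new" "$file"
    done
}

# Usage:
# append_csv_number DWH_Export*.out
```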

Substitution with sed + bash function

My question seems general, but I can't find any answers.
In a sed command, how can you replace the substitution pattern with a value returned by a simple bash function?
For instance, I created the following function:
function parseDates(){
#Some process here with $1 (the pattern found)
return "dateParsed;
}
and the following sed command:
myCatFile=`sed -e "s/[0-3][0-9]\/[0-1][0-9]\/[0-9][0-9]/& parseDates &\}/p" myfile`
I found that the character '&' represents the matched text. I'd like it to be passed to my bash function, and the whole match to be replaced by the match followed by dateParsed.
Does anybody have an idea?
Thanks
you can use the "e" option in sed command like this:
cat t.sh
myecho() {
echo ">>hello,$1<<"
}
export -f myecho
sed -e "s/.*/myecho &/e" <<END
ni
END
you can see the result without "e":
cat t.sh
myecho() {
echo ">>hello,$1<<"
}
export -f myecho
sed -e "s/.*/myecho &/" <<END
ni
END
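For reference, a minimal self-contained sketch of the e flag. It is a GNU sed extension, and the command is run through sh, so an exported bash function is only visible when sh is bash. The sketch therefore calls echo directly. Because the matched text becomes part of a shell command, this is only suitable for trusted input.

```shell
# Sketch only: requires GNU sed (the e flag is not in POSIX sed).
# The substitution builds the command line  echo ">>hello,ni<<"  which sed then
# runs; the command's output replaces the pattern space.
printf 'ni\n' | sed 's/.*/echo ">>hello,&<<"/e'
# prints: >>hello,ni<<
```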
Agree with Glenn Jackman.
If you want to use bash function in sed, something like this :
sed -rn 's/^([[:digit:].]+)/`date -d @&`/p' file |
while read -r line; do
eval echo "$line"
done
My file here begins with a unix timestamp (e.g. 1362407133.936).
Bash function inside sed (maybe for other purposes):
multi_stdin(){ # Makes a function accept its arguments or stdin (via a pipe)
[[ -n "$1" ]] && echo "$*" || cat -
}
sans_accent(){
multi_stdin "$@" | sed '
y/àáâãäåèéêëìíîïòóôõöùúûü/aaaaaaeeeeiiiiooooouuuu/
y/ÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ/AAAAAAEEEEIIIIOOOOOUUUU/
y/çÇñÑߢÐð£Øø§µÝý¥¹²³ªº/cCnNBcDdLOoSuYyY123ao/
'
}
eval $(echo "Rogério Madureira" | sed -n 's#.*#echo & | sans_accent#p')
or
eval $(echo "Rogério Madureira" | sed -n 's#.*#sans_accent &#p')
Rogerio Madureira
And if you need to keep the output into a variable:
VAR=$( eval $(echo "Rogério Madureira" | sed -n 's#.*#echo & | sans_accent#p') )
echo "$VAR"
Do it step by step. (You could also use an alternate delimiter, such as "|" instead of "/".)
function parseDates(){
#Some process here with $1 (the pattern found)
echo "dateParsed"
}
value=$(parseDates)
sed -n "s|[0-3][0-9]/[0-1][0-9]/[0-9][0-9]|& $value &|p" myfile
Note the use of double quotes instead of single quotes, so that $value can be interpolated
I'd like to know if there's a way to do this too. However, for this particular problem you don't need it. If you surround the different components of the date with ()s, you can back reference them with \1 \2 etc and reformat however you want.
For instance, let's reverse 03/04/1973:
echo 03/04/1973 | sed -e 's/\([0-9][0-9]\)\/\([0-9][0-9]\)\/\([0-9][0-9][0-9][0-9]\)/\3\/\2\/\1/g'
sed -e 's#[0-3][0-9]/[0-1][0-9]/[0-9][0-9]#& $(parseDates &)#' myfile |
while read -r line; do
eval echo "$line"
done
You can glue together a sed-command by ending a single-quoted section, and reopening it again.
sed -n 's|[0-3][0-9]/[0-1][0-9]/[0-9][0-9]|& '$(parseDates)' &|p' datefile
However, in contrast to other examples, a function in bash can't return strings, only print them:
function parseDates(){
# Some process here with $1 (the pattern found)
echo dateParsed
}
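Building on the point that a function must print its result, here is a minimal sketch that keeps the function call out of sed entirely. The loop extracts the first date on each line, calls the function in a command substitution, and splices the result back in. The body of parseDates (turning dd/mm/yy into yy-mm-dd) and the name annotate_dates are assumptions made for this example, and grep -o is assumed to be available (GNU and BSD grep have it).

```shell
# Sketch only: the parseDates body and annotate_dates are hypothetical.
parseDates() {
    # Print (not return) the converted date: dd/mm/yy -> yy-mm-dd
    printf '%s\n' "$1" | sed 's|\(..\)/\(..\)/\(..\)|\3-\2-\1|'
}

annotate_dates() {
    while IFS= read -r line; do
        # First dd/mm/yy date on the line, if any
        d=$(printf '%s\n' "$line" | grep -Eo '[0-3][0-9]/[0-1][0-9]/[0-9][0-9]' | head -n 1)
        if [ -n "$d" ]; then
            # "|" as delimiter, because the date itself contains "/"
            printf '%s\n' "$line" | sed "s|$d|$d $(parseDates "$d")|"
        else
            printf '%s\n' "$line"
        fi
    done
}

printf 'seen on 03/04/73 here\nno date\n' | annotate_dates
```

This forks several processes per line, so it trades speed for the ability to call an arbitrary shell function on each match.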

Resources