Tailing less with a custom LESSOPEN colorize script - bash

I've written the following script to pick out keywords from a log file and highlight terms:
#!/bin/bash
case "$1" in
*.log) sed -e "s/\(.*\[Error\ \].*\)/\x1B[31m&\x1b[0m/" "$1" \
| sed -e "s/\(.*\[Warn\ \ \].*\)/\x1B[33m&\x1b[0m/" \
| sed -e "s/\(.*\[Info\ \ \].*\)/\x1B[32m&\x1b[0m/" \
| sed -e "s/\(.*\[Debug\ \].*\)/\x1B[32m&\x1b[0m/"
;;
esac
It works OK until I try and follow/tail less (Shift+F) at which point it fails to tail any new log lines. Any ideas why?

That colorizes what you pass as an argument to the script. What you want instead is to read from stdin. Wrap your case statement in the following loop:
while read LINE; do
case "$LINE" in
# ... rest of your code here
esac
done
Now you can pipe it into your script:
tail -f somefile | colorize_script.sh
Additional answer:
I had this same need several years ago so I wrote a script that works like grep but colorizes the matching text instead of hiding non-matching text. If you have tcl on your system you can grab my script from here: http://wiki.tcl.tk/38096
Just copy/paste the code (it's only 200 lines) into an empty file and chmod it to make it executable. Name it cgrep (for color-grep) and put it somewhere in your executable path. Now you can do something like this:
tail -f somefile | cgrep '.*\[Error\s*\].*' -fg yellow -bg red

Related

User input into variables and grep a file for pattern

H!
So I am trying to run a script which looks for a string pattern.
For example, from a file I want to find 2 words, located separately
"I like toast, toast is amazing. Bread is just toast before it was toasted."
I want to invoke it from the command line using something like this:
./myscript.sh myfile.txt "toast bread"
My code so far:
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w "$keyword_first""$keyword_second" )
echo $find_keyword
i have tried a few different ways. Directly from the command line I can make it run using:
cat myfile.txt | grep -E 'toast|bread'
I'm trying to put the user input into variables and use the variables to grep the file
You seem to be looking simply for
grep -E "$2|$3" "$1"
What works on the command line will also work in a script, though you will need to switch to double quotes for the shell to replace variables inside the quotes.
In this case, the -E option can be replaced with multiple -e options, too.
grep -e "$2" -e "$3" "$1"
You can pipe to grep twice:
find_keyword=$(cat $text_file | grep -w "$keyword_first" | grep -w "$keyword_second")
Note that your search word "bread" is not found because the string contains the uppercase "Bread". If you want to find the words regardless of this, you should use the case-insensitive option -i for grep:
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
In a full script:
#!/bin/bash
#
# usage: ./myscript.sh myfile.txt "toast" "bread"
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
echo $find_keyword

bash: cURL from a file, increment filename if duplicate exists

I'm trying to curl a list of URLs to aggregate the tabular data on them from a set of 7000+ URLs. The URLs are in a .txt file. My goal was to cURL each line and save them to a local folder after which I would grep and parse out the HTML tables.
Unfortunately, because of the format of the URLs in the file, duplicates exist (example.com/State/City.html. When I ran a short while loop, I got back fewer than 5500 files, so there are at least 1500 dupes in the list. As a result, I tried to grep the "/State/City.html" section of the URL and pipe it to sed to remove the / and substitute a hyphen to use with curl -O. cURL was trying to grab
Here's a sample of what I tried:
while read line
do
FILENAME=$(grep -o -E '\/[A-z]+\/[A-z]+\.htm' | sed 's/^\///' | sed 's/\//-/')
curl $line -o '$FILENAME'
done < source-url-file.txt
It feels like I'm missing something fairly straightforward. I've scanned the man page because I worried I had confused -o and -O which I used to do a lot.
When I run the loop in the terminal, the output is:
Warning: Failed to create the file State-City.htm
I think you dont need multitude seds and grep, just 1 sed should suffice
urls=$(echo -e 'example.com/s1/c1.html\nexample.com/s1/c2.html\nexample.com/s1/c1.html')
for u in $urls
do
FN=$(echo "$u" | sed -E 's/^(.*)\/([^\/]+)\/([^\/]+)$/\2-\3/')
if [[ ! -f "$FN" ]]
then
touch "$FN"
echo "$FN"
fi
done
This script should work and also take care of downloading same files multiple files.
just replace the touch command by your curl one
First: you didn't pass the url info to grep.
Second: try this line instead:
FILENAME=$(echo $line | egrep -o '\/[^\/]+\/[^\/]+\.html' | sed 's/^\///' | sed 's/\//-/')

how to cat command output to string in shell script

in my script i need to loop through lines in a file, once i find some specific line i need to save it to variable so later on i can use it outside the loop, i tried the following but it wont' work:
count=0
res=""
python my.py -p 12345 |
while IFS= read -r line
do
count=$((count+1))
if [ "$count" -eq 5 ]; then
res=`echo "$line" | xargs`
fi
done
echo "$res"
it output nothing, i also tried this,
res=""
... in the loop...
res=$res`echo "$line" | xargs`
still nothing. please help. thanks.
Update: Thanks for all the help. here is my final code:
res=python my.py -p 12345 | sed -n '5p' | xargs
for finding a specific line in a file, have you considered using grep?
grep "thing I'm looking for" /path/to/my.file
this will output the lines that match the thing you're looking for. Moreover this can be piped to xargs as in your question.
If you need to look at a particularly numbered line of a file, consider using the head and tail commands (which can also be piped to grep).
cat /path/to/my.file | head -n5 | tail -n1 | grep "thing I'm looking for"
These commands take the first lines specified (in this case, 5 and 1 respectively) and only prints those out. Hopefully this will help you accomplish your task.
Happy coding! Leave a comment if you have any questions.

Optimize shell script for multiple sed replacements

I have a file containing a list of replacement pairs (about 100 of them) which are used by sed to replace strings in files.
The pairs go like:
old|new
tobereplaced|replacement
(stuffiwant).*(too)|\1\2
and my current code is:
cat replacement_list | while read i
do
old=$(echo "$i" | awk -F'|' '{print $1}') #due to the need for extended regex
new=$(echo "$i" | awk -F'|' '{print $2}')
sed -r "s/`echo "$old"`/`echo "$new"`/g" -i file
done
I cannot help but think that there is a more optimal way of performing the replacements. I tried turning the loop around to run through lines of the file first but that turned out to be much more expensive.
Are there any other ways of speeding up this script?
EDIT
Thanks for all the quick responses. Let me try out the various suggestions before choosing an answer.
One thing to clear up: I also need subexpressions/groups functionality. For example, one replacement I might need is:
([0-9])U|\10 #the extra brackets and escapes were required for my original code
Some details on the improvements (to be updated):
Method: processing time
Original script: 0.85s
cut instead of awk: 0.71s
anubhava's method: 0.18s
chthonicdaemon's method: 0.01s
You can use sed to produce correctly -formatted sed input:
sed -e 's/^/s|/; s/$/|g/' replacement_list | sed -r -f - file
I recently benchmarked various string replacement methods, among them a custom program, sed -e, perl -lnpe and an probably not that widely known MySQL command line utility, replace. replace being optimized for string replacements was almost an order of magnitude faster than sed. The results looked something like this (slowest first):
custom program > sed > LANG=C sed > perl > LANG=C perl > replace
If you want performance, use replace. To have it available on your system, you'll need to install some MySQL distribution, though.
From replace.c:
Replace strings in textfile
This program replaces strings in files or from stdin to stdout. It accepts a list of from-string/to-string pairs and replaces each occurrence of a from-string with the corresponding to-string. The first occurrence of a found string is matched. If there is more than one possibility for the string to replace, longer matches are preferred before shorter matches.
...
The programs make a DFA-state-machine of the strings and the speed isn't dependent on the count of replace-strings (only of the number of replaces). A line is assumed ending with \n or \0. There are no limit exept memory on length of strings.
More on sed. You can utilize multiple cores with sed, by splitting your replacements into #cpus groups and then pipe them through sed commands, something like this:
$ sed -e 's/A/B/g; ...' file.txt | \
sed -e 's/B/C/g; ...' | \
sed -e 's/C/D/g; ...' | \
sed -e 's/D/E/g; ...' > out
Also, if you use sed or perl and your system has an UTF-8 setup, then it also boosts performance to place a LANG=C in front of the commands:
$ LANG=C sed ...
You can cut down unnecessary awk invocations and use BASH to break name-value pairs:
while IFS='|' read -r old new; do
# echo "$old :: $new"
sed -i "s~$old~$new~g" file
done < replacement_list
IFS='|' will give enable read to populate name-value in 2 different shell variables old and new.
This is assuming ~ is not present in your name-value pairs. If that is not the case then feel free to use an alternate sed delimiter.
Here is what I would try:
store your sed search-replace pair in a Bash array like ;
build your sed command based on this array using parameter expansion
run command.
patterns=(
old new
tobereplaced replacement
)
pattern_count=${#patterns[*]} # number of pattern
sedArgs=() # will hold the list of sed arguments
for (( i=0 ; i<$pattern_count ; i=i+2 )); do # don't need to loop on the replacement…
search=${patterns[i]};
replace=${patterns[i+1]}; # … here we got the replacement part
sedArgs+=" -e s/$search/$replace/g"
done
sed ${sedArgs[#]} file
This result in this command:
sed -e s/old/new/g -e s/tobereplaced/replacement/g file
You can try this.
pattern=''
cat replacement_list | while read i
do
old=$(echo "$i" | awk -F'|' '{print $1}') #due to the need for extended regex
new=$(echo "$i" | awk -F'|' '{print $2}')
pattern=${pattern}"s/${old}/${new}/g;"
done
sed -r ${pattern} -i file
This will run the sed command only once on the file with all the replacements. You may also want to replace awk with cut. cut may be more optimized then awk, though I am not sure about that.
old=`echo $i | cut -d"|" -f1`
new=`echo $i | cut -d"|" -f2`
You might want to do the whole thing in awk:
awk -F\| 'NR==FNR{old[++n]=$1;new[n]=$2;next}{for(i=1;i<=n;++i)gsub(old[i],new[i])}1' replacement_list file
Build up a list of old and new words from the first file. The next ensures that the rest of the script isn't run on the first file. For the second file, loop through the list of replacements and perform them each one by one. The 1 at the end means that the line is printed.
{ cat replacement_list;echo "-End-"; cat YourFile; } | sed -n '1,/-End-/ s/$/³/;1h;1!H;$ {g
t again
:again
/^-End-³\n/ {s///;b done
}
s/^\([^|]*\)|\([^³]*\)³\(\n\)\(.*\)\1/\1|\2³\3\4\2/
t again
s/^[^³]*³\n//
t again
:done
p
}'
More for fun to code via sed. Try maybe for a time perfomance because this start only 1 sed that is recursif.
for posix sed (so --posix with GNU sed)
explaination
copy replacement list in front of file content with a delimiter (for line with ³ and for list with -End-) for an easier sed handling (hard to use \n in class character in posix sed.
place all line in buffer (add the delimiter of line for replacement list and -End- before)
if this is -End-³, remove the line and go to final print
replace each first pattern (group 1) found in text by second patttern (group 2)
if found, restart (t again)
remove first line
restart process (t again). T is needed because b does not reset the test and next t is always true.
Thanks to #miku above;
I have a 100MB file with a list of 80k replacement-strings.
I tried various combinations of sed's sequentially or parallel, but didn't see throughputs getting shorter than about a 20-hour runtime.
Instead I put my list into a sequence of scripts like "cat in | replace aold anew bold bnew cold cnew ... > out ; rm in ; mv out in".
I randomly picked 1000 replacements per file, so it all went like this:
# first, split my replace-list into manageable chunks (89 files in this case)
split -a 4 -l 1000 80kReplacePairs rep_
# next, make a 'replace' script out of each chunk
for F in rep_* ; do \
echo "create and make executable a scriptfile" ; \
echo '#!/bin/sh' > run_$F.sh ; chmod +x run_$F.sh ; \
echo "for each chunk-file line, strip line-ends," ; \
echo "then with sed, turn '{long list}' into 'cat in | {long list}' > out" ; \
cat $F | tr '\n' ' ' | sed 's/^/cat in | replace /;s/$/ > out/' >> run_$F.sh ;
echo "and append commands to switch in and out files, for next script" ; \
echo -e " && \\\\ \nrm in && mv out in\n" >> run_$F.sh ; \
done
# put all the replace-scripts in sequence into a main script
ls ./run_rep_aa* > allrun.sh
# make it executable
chmod +x allrun.sh
# run it
nohup ./allrun.sh &
.. which ran in under 5 mins, a lot less than 20 hours !
Looking back, I could have used more pairs per script, by finding how many lines would make up the limit.
xargs --show-limits </dev/null 2>&1 | grep --color=always "actually use:"
Maximum length of command we could actually use: 2090490
So just under 2MB; how many pairs would that be for my script ?
head -c 2090490 80kReplacePairs | wc -l
76923
So it seems I could have used 2 * 40000-line chunks
to expand on chthonicdaemon's solution
live demo
#! /bin/sh
# build regex from text file
REGEX_FILE=some-patch.regex.diff
# test
# set these with "export key=val"
SOME_VAR_NAME=hello
ANOTHER_VAR_NAME=world
escape_b() {
echo "$1" | sed 's,/,\\/,g'
}
regex="$(
(echo; cat "$REGEX_FILE"; echo) \
| perl -p -0 -e '
s/\n#[^\n]*/\n/g;
s/\(\(SOME_VAR_NAME\)\)/'"$(escape_b "$SOME_VAR_NAME")"'/g;
s/\(\(ANOTHER_VAR_NAME\)\)/'"$(escape_b "$ANOTHER_VAR_NAME")"'/g;
s/([^\n])\//\1\\\//g;
s/\n-([^\n]+)\n\+([^\n]*)(?:\n\/([^\n]+))?\n/s\/\1\/\2\/\3;\n/g;
'
)"
echo "regex:"; echo "$regex" # debug
exec perl -00 -p -i -e "$regex" "$#"
prefixing lines with -+/ allows empty "plus" values, and protects leading whitespace from buggy text editors
sample input: some-patch.regex.diff
# file format is similar to diff/patch
# this is a comment
# replace all "a/a" with "b/b"
-a/a
+b/b
/g
-a1|a2
+b1|b2
/sg
# this is another comment
-(a1).*(a2)
+b\1b\2b
-a\na\na
+b
-a1-((SOME_VAR_NAME))-a2
+b1-((ANOTHER_VAR_NAME))-b2
sample output
s/a\/a/b\/b/g;
s/a1|a2/b1|b2/;;
s/(a1).*(a2)/b\1b\2b/;
s/a\na\na/b/;
s/a1-hello-a2/b1-world-b2/;
this regex format is compatible with sed and perl
since miku mentioned mysql replace:
replacing fixed strings with regex is non-trivial,
since you must escape all regex chars,
but you also must handle backslash escapes ...
naive escaper:
echo '\(\n' | perl -p -e 's/([.+*?()\[\]])/\\\1/g'
\\(\n

Reading a file line by line in ksh

We use some package called Autosys and there are some specific commands of this package. I have a list of variables which i like to pass in one of the Autosys commands as variables one by one.
For example one such variable is var1, using this var1 i would like to launch a command something like this
autosys_showJobHistory.sh var1
Now when I launch the below written command, it gives me the desired output.
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if i put the var1 in a file say Test.txt and launch the same command using cat, it gives me nothing. I have the impression that command autosys_showJobHistory.sh does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What I am doing wrong in the second command ?
Wrote all of below, and then noticed your grep statement.
Recall that ksh doesn't support .. as an indicator for 'expand this range of values'. (I assume that's your intent). It's also made ambiguous by your lack of quoting arguments to grep. If you were using syntax that the shell would convert, then you wouldn't really know what reg-exp is being sent to grep. Always better to quote argments, unless you know for sure that you need the unquoted values. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any char' operator '.' OR do you want to only match a period char? If you want to only match a period, then you need to escape it like \..
Finally, if any of your files you're processing have been created on a windows machine and then transfered to Unix/Linux, very likely that the line endings (Ctrl-MCtrl-J) (\r\n) are causing you problems. Cleanup your PC based files (or anything that was sent via ftp) with dos2unix file [file2 ...].
If the above doesn't help, You'll have to "divide and conquer" to debug your problem.
When I did the following tests, I got the expected output
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to cause comment is your use of the cat commnad in this context, which will bring you the UUOC award. That can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or by using a matched pair to track the status on just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep 1[1..6]:[0..9][0..9] \
| grep 24.12.2012 \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, the print -u2 -- .... (u2=stderror, -- closes option processing for print)
Now you can make sure no extra space or tab chars are creeping in, by looking at that output.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see what step is causing this to break, right? So does the autosys_script by itself still produce the intermediate output you're expecting? Then does autosys + 1 grep produce out as expected, +2 greps, + tail? You should be able to easily see where you're loosing your output.
IHTH

Resources