Using a pipe to read a file, run a script and write to the same file - bash

I need to write a one-line script that takes a file and, for each line that contains the word "word", appends the number of words in that line to the end of the line, writing the result back to the same file. I can use a second script that does whatever I want.
My problem is that after I run the script, the file I passed to it is empty.
This is the one-line script:
#!/bin/bash
cat $1 | ./words_num word | cat $1
words_num
#!/bin/bash
while read line; do
temp=`echo $line | grep $1 | wc -l`
if (($temp==1)); then
word_cnt=`echo $line | wc -w`
echo "$line $word_cnt"
else
echo "$line"
fi
done
For example, before running, the file is:
bla bla blaa word
words blaa
bla bla
and after running, the file is:
bla bla blaa word 4
words blaa 2
bla bla
Can you help?

The one-liner:
cat $1 | ./words_num word | cat $1
is peculiar. It is approximately equivalent to:
cat $1 | ./words_num word >/dev/null; cat $1
which is unlikely to be the intended result: the final cat $1 ignores its standard input and simply prints the file again, so the output of words_num is thrown away. It is also a candidate for a UUOC (Useless Use of cat) award.
If the intention is to overwrite the original file with the amended version, then you should probably write:
./words_num word < $1 > tmp.$$; mv tmp.$$ $1
If you want to see the results on the screen as well, then:
./words_num word < $1 | tee tmp.$$; mv tmp.$$ $1
Both these will leave a temporary file around if interrupted. You can avoid that with:
#!/bin/bash
trap "rm -f tmp.$$; exit 1" 0 1 2 3 13 15
./words_num word < $1 | tee tmp.$$
mv tmp.$$ $1
trap 0
The trap sets handlers for EXIT, HUP, INT, QUIT, PIPE and TERM (0 1 2 3 13 15) that remove the temporary file (if it exists) and exit with a failure status. The trap 0 at the end cancels the EXIT trap so the script can exit successfully.
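If your system has mktemp (a safe assumption on modern Linux and BSD, but still an assumption), the same idea can be sketched with an unpredictable temporary file name instead of tmp.$$:
#!/bin/bash
# Sketch only: same cleanup logic as above, but mktemp chooses the name.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"; exit 1' 0 1 2 3 13 15
./words_num word < "$1" | tee "$tmpfile"
mv "$tmpfile" "$1"
trap 0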
As for the words_num script, that seems to call for awk rather than shell:
#!/bin/bash
[ $# == 0 ] && { echo "Usage: $0 word [file ...]" >&2; exit 1; }
word=$1
shift
awk "/$word/"' { print $0, NF; next } { print }' "$#"
You can reduce that if you're into code golfing your awk scripts, but I prefer clarity to sub-par code. It looks for lines containing the word, prints the line along with the number of fields in the line, and moves to the next line. If the line doesn't match, it is simply printed. The assignment and shift mean that "$@" contains all the other arguments to words_num, and awk will automatically cycle through the named files, or read standard input if no files are named.
The script should check that the given word does not contain any slashes as that will mess up the regex (it would be OK to replace each one that appears with [/], a character class containing only a slash). That level of bullet-proofing is left for the interested user.
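For example, assuming the awk-based words_num above is saved and made executable, running it on the sample data from the question (here in a hypothetical file called input.txt) should give:
$ ./words_num word input.txt
bla bla blaa word 4
words blaa 2
bla bla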

cat $1 | ./words_num word | tee $1

Related

Unix bash script grep loop counter (for)

I am looping over a grep result. The result contains 10 lines (every line has different content), so the code in the loop gets executed 10 times.
I need to get the index, 0-9, during the run so I can do actions based on the index.
ABC=(cat test.log | grep "stuff")
counter=0
for x in $ABC
do
echo $x
((counter++))
echo "COUNTER $counter"
done
Currently the counter won't really change.
Output:
51209
120049
148480
1211441
373948
0
0
0
728304
0
COUNTER: 1
If your requirement is only to print a counter (which is what the shown samples suggest), you could use awk (if you are OK with it). This can be done with a single awk command, without creating a variable and then using grep as you are doing currently; awk can perform both the search and the counter printing in one shot.
awk -v counter=0 '/stuff/{print "counter=" ++counter}' Input_file
Replace the string stuff above with the actual string you are looking for, and put your actual file name in place of Input_file.
This should print like:
counter=1
counter=2
........and so on
Your shell script contains what should be an obvious syntax error.
ABC=(cat test.log | grep "stuff")
This fails with
-bash: syntax error near unexpected token `|'
There is no need to save the output in a variable if you only want to process one line at a time (and obviously no need for the useless cat).
grep "stuff" test.log | nl
gets you numbered lines, though the index will be 1-based, not zero-based.
If you absolutely need zero-based, refactoring to Awk should solve it easily:
awk '/stuff/ { print n++, $0 }' test.log
If you want to loop over this and do something more with this information,
awk '/stuff/ { print n++, $0 }' test.log |
while read -r index output; do
echo index is "$index"
echo output is "$output"
done
Because the while loop executes in a subshell, the value of index will not be visible outside of the loop. (I suspect that is what happened to the counter in your real code as well; the code you posted will not reproduce it either, since it does not run at all.)
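If you do need the index after the loop, one sketch (bash-specific, since it relies on process substitution) is to feed the loop from a process substitution instead of a pipe, so the while runs in the current shell:
# The loop body runs in the current shell, so index survives the loop.
while read -r index output; do
    echo "index is $index"
    echo "output is $output"
done < <(awk '/stuff/ { print n++, $0 }' test.log)
echo "final index was $index"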
Do not store the result of grep in a scalar variable such as $ABC.
If a line of the log file contains whitespace, the variable $x is split on it due to bash's word splitting.
(BTW, the statement ABC=(cat test.log | grep "stuff") causes a syntax error.)
Please try something like:
readarray -t abc < <(grep "stuff" test.log)
for x in "${abc[#]}"
do
echo "$x"
echo "COUNTER $((++counter))"
done
or
readarray -t abc < <(grep "stuff" test.log)
for i in "${!abc[#]}"
do
echo "${abc[i]}"
echo "COUNTER $((i + 1))"
done
You can use the increment statement below:
counter=$(( $counter + 1 ))
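For instance, a minimal sketch using that increment in the grep loop (again with a process substitution so the counter is still visible after the loop; this assumes bash):
counter=0
while read -r x; do
    counter=$(( $counter + 1 ))
    echo "$x"
    echo "COUNTER $counter"
done < <(grep "stuff" test.log)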

How to compare 2 files word by word and store the differing words in a result output file

Suppose there are two files:
File1.txt
My name is Anamika.
File2.txt
My name is Anamitra.
I want result file storing:
Result.txt
Anamika
Anamitra
I use PuTTY so I can't use wdiff; is there any other alternative?
Not my greatest script, but it works. Others might come up with something more elegant.
#!/bin/bash
if [ $# != 2 ]
then
echo "Arguments: file1 file2"
exit 1
fi
file1=$1
file2=$2
# Do this for both files
for F in $file1 $file2
do
if [ ! -f $F ]
then
echo "ERROR: $F does not exist."
exit 2
else
# Create a temporary file with every word from the file
for w in $(cat $F)
do
echo $w >> ${F}.tmp
done
fi
done
# Compare the temporary files, since they are now 1 word per line
# The egrep keeps only the lines that diff marks with > or <
# The awk keeps only the word (i.e. removes < or >)
# The sed removes any character that is not alphanumeric.
# Removes a . at the end for example
diff ${file1}.tmp ${file2}.tmp | egrep -E "<|>" | awk '{print $2}' | sed 's/[^a-zA-Z0-9]//g' > Result.txt
# Cleanup!
rm -f ${file1}.tmp ${file2}.tmp
This uses a trick with the for loop. If you use a for loop to iterate over a file's contents, it loops over each word, not each line, as bash beginners tend to assume. Here that is actually a nice thing to know, since it transforms the files into 1 word per line.
Ex: file content == This is a sentence.
After the for loop is done, the temporary file will contain:
This
is
a
sentence.
Then it is trivial to run diff on the files.
One last detail: your sample output did not include the . at the end, hence the sed command to keep only alphanumeric characters.
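As a side note (not part of the original answer), the same one-word-per-line transformation can be sketched without temporary files, assuming bash process substitution and tr are available:
# tr squeezes runs of whitespace into single newlines, giving one word per line.
diff <(tr -s '[:space:]' '\n' < "$file1") <(tr -s '[:space:]' '\n' < "$file2") |
    grep -E '^[<>]' | awk '{print $2}' | sed 's/[^a-zA-Z0-9]//g' > Result.txt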

Using cut on stdout with tabs

I have a file which contains lines of tab-separated text
echo -e "foo\tbar\tfoo2\nx\ty\tz" > file.txt
I'd like to get the first column with cut. It works if I do
$ cut -f 1 file.txt
foo
x
But if I read it in a bash script
while read line
do
new_name=`echo -e $line | cut -f 1`
echo -e "$new_name"
done < file.txt
Then I get instead
foo bar foo2
x y z
What am I doing wrong?
/edit: My script looks like this right now:
while IFS=$'\t' read word definition
do
clean_word=`echo -e $word | external-command`
echo -e "$clean_word\t<b>$word</b><br>$definition" >> $2
done < $1
External command removes diacritics from a Greek word. Can the script be optimized any further without changing external-command?
What is happening is that you did not quote $line when echoing it. Word splitting then replaces the tabs with single spaces between the words, and since cut's default delimiter is a TAB, it does not find one and prints the whole line.
So quoting works:
while read line
do
new_name=`echo -e "$line" | cut -f 1`
#----------------^^^^^^^
echo -e "$new_name"
done < file.txt
Note, however, that you could have used IFS to set the tab as field separator and read more than one parameter at a time:
while IFS=$'\t' read name rest;
do
echo "$name"
done < file.txt
returning:
foo
x
And, again, note that awk is even faster for this purpose:
$ awk -F"\t" '{print $1}' file.txt
foo
x
So, unless you want to call some external command while looping the file, awk (or sed) is better.
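Applying the same quoting advice to the edited script from the question would look roughly like this (a sketch; external-command stands in for whatever filter you are really calling):
while IFS=$'\t' read -r word definition
do
    clean_word=$(echo -e "$word" | external-command)
    echo -e "$clean_word\t<b>$word</b><br>$definition" >> "$2"
done < "$1"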

Correctly count the number of lines in a bash variable

I need to count the number of lines in a given variable. For example, I need to find how many lines VAR has, where VAR=$(git log -n 10 --format="%s").
I tried echo "$VAR" | wc -l, which indeed works, but if VAR is empty it prints 1, which is wrong. Is there a workaround for this? Something better than using an if clause to check whether the variable is empty... (maybe add a line and subtract 1 from the returned value?)
wc -l counts the number of newline characters. You can use grep -c '^' to count lines instead.
You can see the difference with:
#!/bin/bash
count_it() {
echo "Variablie contains $2: ==>$1<=="
echo -n 'grep:'; echo -n "$1" | grep -c '^'
echo -n 'wc :'; echo -n "$1" | wc -l
echo
}
VAR=''
count_it "$VAR" "empty variable"
VAR='one line'
count_it "$VAR" "one line without \n at the end"
VAR='line1
'
count_it "$VAR" "one line with \n at the end"
VAR='line1
line2'
count_it "$VAR" "two lines without \n at the end"
VAR='line1
line2
'
count_it "$VAR" "two lines with \n at the end"
which produces:
Variable contains empty variable: ==><==
grep:0
wc : 0
Variable contains one line without \n at the end: ==>one line<==
grep:1
wc : 0
Variable contains one line with \n at the end: ==>line1
<==
grep:1
wc : 1
Variable contains two lines without \n at the end: ==>line1
line2<==
grep:2
wc : 1
Variable contains two lines with \n at the end: ==>line1
line2
<==
grep:2
wc : 2
You can always write it conditionally:
[ -n "$VAR" ] && echo "$VAR" | wc -l || echo 0
This will check whether $VAR has contents and act accordingly.
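Wrapped in a small helper function (the name count_lines is just for illustration), that becomes reusable:
count_lines() {
    # Prints 0 for an empty variable, otherwise the line count.
    [ -n "$1" ] && echo "$1" | wc -l || echo 0
}
VAR=$(git log -n 10 --format="%s")
count_lines "$VAR"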
For a pure bash solution: instead of putting the output of the git command into a variable (which, arguably, is ugly), put it in an array, one line per field:
mapfile -t ary < <(git log -n 10 --format="%s")
Then you only need to count the number of fields in the array ary:
echo "${#ary[#]}"
This design will also make your life simpler if, e.g., you need to retrieve the 5th commit message:
echo "${ary[4]}"
Try:
echo -n "$VAR" | grep -c '^'

Count multiple occurrences of a word on the same line using grep

Here I made a small script that takes input from the user, searches for a pattern in a file, and displays the required number of lines from that file where the pattern is found. Right now this code searches for the pattern line-wise, following standard grep practice. What I mean is: if the pattern occurs twice on the same line, I want the output to print that line twice. Hope I make some sense.
#!/bin/sh
cat /dev/null>copy.txt
echo "Please enter the sentence you want to search:"
read "inputVar"
echo "Please enter the name of the file in which you want to search:"
read "inputFileName"
echo "Please enter the number of lines you want to copy:"
read "inputLineNumber"
[[-z "$inputLineNumber"]] || inputLineNumber=20
cat /dev/null > copy.txt
for N in `grep -n "$inputVar" "$inputFileName" | cut -d ":" -f1`
do
LIMIT=`expr $N + $inputLineNumber`
sed -n $N,${LIMIT}p $inputFileName >> copy.txt
echo "-----------------------" >> copy.txt
done
cat copy.txt
As I understood it, the task is to count the number of pattern occurrences in a line. It can be done like so:
count=$((`echo "$line" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
Suppose you have one file to read. Then the code would be as follows:
#!/bin/bash
file=$1
pattern="an."
#reading file line by line
cat -n $file | while read input
do
#storing line to $tmp
tmp=`echo $input | grep "$pattern"`
#counting the number of occurrences
count=$((`echo "$tmp" | sed -e "s|$pattern|\n|g" | wc -l` - 1))
#printing $tmp line $count times
for i in `seq 1 $count`
do
echo $tmp
done
done
I checked this for pattern "an." and input:
I pass here an example of many 'an' letters
an
ananas
an-an-as
Output is:
$ ./test.sh input
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
1 I pass here an example of many 'an' letters
3 ananas
4 an-an-as
4 an-an-as
Adapt this to your needs.
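A different sketch of the same per-line count uses grep -o instead of sed; -o is not required by POSIX but is available in GNU and BSD grep:
#!/bin/bash
file=$1
pattern="an."
while IFS= read -r line; do
    # grep -o prints each match on its own line, so counting those lines
    # counts the occurrences of the pattern within $line
    count=$(printf '%s\n' "$line" | grep -o "$pattern" | wc -l)
    for (( i = 0; i < count; i++ )); do
        printf '%s\n' "$line"
    done
done < "$file"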
How about using awk?
Assume the pattern you are searching for is in variable $pattern and the file you are checking is $file
Then:
count=`awk 'BEGIN{n=0}{n+=split($0,a,"'$pattern'")-1}END {print n}' $file`
or, for a single line:
count=`echo "$line" | awk '{n=split($0,a,"'$pattern'")-1; print n}'`
