Looping over input fields as array - bash

Is it possible to do something like this:
$ cat foo.txt
1 2 3 4
foo bar baz
hello world
$ awk '{ for(i in $){ print $[i]; } }' foo.txt
1
2
3
4
foo
bar
baz
hello
world
I know you could do this:
$ awk '{ split($0,array," "); for(i in array){ print array[i]; } }' foo.txt
2
3
4
1
bar
baz
foo
world
hello
But then the result is not in order, because for (i in array) visits elements in an unspecified order in awk.

Found out myself:
$ awk '{ for(i = 1; i <= NF; i++) { print $i; } }' foo.txt
NF is the number of fields on the current line, so counting i up from 1 to NF visits the fields in their original order.

I'd use sed:
sed 's/ /\n/g' foo.txt
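Note that \n in the replacement is a GNU sed extension; a portable way to get the same split is tr:
tr ' ' '\n' < foo.txt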

No need for awk, sed or perl. You can easily do this directly in the shell:
for i in $(cat foo.txt); do echo "$i"; done
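One caveat: the unquoted $(cat foo.txt) is also subject to globbing, so a field like * would expand to filenames. A sketch that splits on whitespace without that risk:
while read -ra words; do
  printf '%s\n' "${words[@]}"
done < foo.txt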

If you're open to using Perl, either of these should do the trick:
perl -lane 'print $_ for @F' foo.txt
perl -lane 'print join "\n", @F' foo.txt
These command-line options are used:
-n loop around each line of the input file, do not automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code


How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
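This works because awk expands backslash escape sequences in -v assignments: the literal \n characters in $sourceStr arrive inside awk as real newlines, which the default split() then treats as whitespace. A quick check (prints 2 because the \n became a real newline separating two fields):
awk -v s='a\nb' 'BEGIN { n = split(s, parts, "\n"); print n }'
2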
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
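grep options used here:
-F - treat the pattern as a fixed string rather than a regex
-x - match whole lines only
-m 1 - stop after the first match
-n - prefix each matching line with its line number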
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: when sed encounters a line that contains "hij" (regex /hij/), it prints the line number (the = command) and quits (the q command); otherwise it prints nothing (the -n switch) and goes on to the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into newlines: the variable ends up holding one single line, and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
And now that your variable contains whitespace (newlines), as stated by William Pursell, you must enclose $sourceStr in double quotes to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | gawk '/'$str/'{ print NR; exit }'
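Interpolating $str straight into the program breaks if it contains regex metacharacters or whitespace; passing it in with -v is safer (same output for a plain string like hij):
echo -e "$sourceStr" | gawk -v s="$str" '$0 ~ s { print NR; exit }'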

Bash shell: Count occurrences of pattern (in one file) listed in arrays (array elements loaded from different file)

Hi, I have loaded the patterns from pattern.txt into an array, and now I would like to grep the count of each array element in a second file (named count.csv).
pattern.txt
abc
def
ghi
count.csv
1234,abc,joseph
5678,ramson,abc
2231,sam,def
1123,abc,richard
2521,ghi,albert
7371,jackson,def
bash shell script is given below:
declare -a myArray
myArray=( $(awk '{print $1}' ./pattern.txt))
for ((i=0; i < ${#myArray[*]}; i++))
do
var1=$(grep -c "${myArray[i]}" count.csv)
echo $var1
done
But, when I run the script, instead of giving below output
3
2
1
It gives output as
0
0
1
i.e. it only gives correct count of last array element.
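The loop itself is sound, so the zero counts most likely come from the data rather than the code. One common culprit (not visible in the question) is DOS/Windows \r line endings in pattern.txt: each array element then ends in a carriage return, so grep searches for abc\r and so on and finds nothing, while the last element can still match if the file's final line has no trailing line ending. Worth checking and, if needed, stripping:
cat -A pattern.txt                       # CR shows up as ^M before the $ at end of line
tr -d '\r' < pattern.txt > pattern.clean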
grep + sort + uniq pipeline solution:
grep -o -w -f pattern.txt count.csv | sort | uniq -c
The output:
3 abc
2 def
1 ghi
grep options:
-f - obtain pattern(s) from file
-o - print only the matched parts of matching lines
-w - select only those lines containing matches that form whole words
The alternative awk approach:
awk 'NR==FNR{p[$0]; next}{ for(i=1;i<=NF;i++){ if($i in p) {p[$i]++; break} }}
END {for(i in p) print p[i],i}' pattern.txt FS="," count.csv
The output:
2 def
3 abc
1 ghi
p[$0] - accumulating patterns from the 1st input file (pattern.txt)
for(i=1;i<=NF;i++) - iterating through the fields of the line of the 2nd file (count.csv)
if($i in p) {p[$i]++; break} - incrementing counter for each matched pattern
It is better to use awk for processing text files line by line:
awk -F, 'NR==FNR {wrd[$1]; next} $2 in wrd{wrd[$2]++} $3 in wrd{wrd[$3]++}
END{for (w in wrd) print w, wrd[w]}' pattern.txt count.csv
def 2
abc 3
ghi 1
Reference: Effective AWK Programming
You could also skip the array and just loop over the patterns:
while read -r pattern; do
[[ -n $pattern ]] && grep -c "$pattern" count.csv
done < pattern.txt
grep -c outputs just the counts of the matches
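To label each count with its pattern, a small variation (illustrative) prints them side by side:
while read -r pattern; do
  [[ -n $pattern ]] && printf '%s %s\n' "$pattern" "$(grep -c "$pattern" count.csv)"
done < pattern.txt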
Try using this command instead:
mapfile -t myArray < pattern.txt
for pattern in "${myArray[@]}"; do
  grep -o "$pattern" count.csv | wc -l
done
Output:
3
2
1
mapfile will store every pattern in pattern.txt into myArray
The for loop will iterate through each pattern in myArray and print the number of occurrences of the pattern in count.csv
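Note that grep -o counts every occurrence while grep -c counts matching lines. With the sample count.csv the two agree, since no pattern appears twice on one line, so this would work here too and saves a process:
grep -c "$pattern" count.csv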

Get the word after a match in a line when the line has multiple matches

I have a big text file with content as below:
Register foo1 ... Register foo2 ... Register foo10...
Register foo20 ...
Un-Register bar1 ... Register foo21 ...
I wrote the bash script below, which only works if there is one "Register" per line; how do I get all the foo's when several appear on the same line?
#!/bin/bash
file=/tmp/log
grep -e 'Register\s' $file | awk '{print $2}' | grep -v Un-Register | while read -r line; do
#do something with $line
done
Try this:
perl -pe 's/\s+Register/\nRegister/g' file |
grep -oP '^Register\s+\Kfoo\S*'
Output :
foo1
foo2
foo10...
foo20
foo21
Here's a perl one-liner to find the words after "Register" but not "Un-Register"; all the words found on one input line are kept on one output line:
$ perl -nE 'say "@{[/(?<!Un-)Register\s+\K\S+/g]}"' file
foo1 foo2 foo10...
foo20
foo21
A less dense version:
$ perl -nE '
@words = / (?<!Un-) # preceding characters are not "Un-"
Register \s+ # must have "Register" followed by whitespace
\K # disregard the previous from matching
\S+ # capture the next non-whitespace characters
/gx; # "g"lobally on this line
say "#words";
' file
Here is a non-regex awk solution to get the job done:
awk '{
s=""
for (i=2; i<=NF; i++)
if ($(i-1) == "Register")
s = sprintf("%s%s", (s==""?"":s OFS), $i)
print s
}' file
foo1 foo2 foo10...
foo20
foo21
egrep -o '(^|[^-])Register \w*' file | awk '{print $2 }'
The grep matches the word Register (but not Un-Register) and prints each match on its own line (-o option),
and awk prints only the word after it.

tab delimit a file in bash

I have two files. I would like to join them by column and convert them from tab delimited to space delimited.
What is needed on top of
paste fileA fileB
to make that work?
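Note that paste's standard -d option sets the delimiter between the pasted columns, so for the space-delimited part alone this may already be enough:
paste -d' ' fileA fileB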
With awk:
awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR]"\t"$2}' file1 file2
Example:
$ cat m
cat
dog
$ cat r
foo bar
bar foo
$ awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR]"\t"$2}' m r
cat bar
dog foo
In pure bash, something like this:
exec 3<file1
exec 4<file2
while :; do
  read -r -u 3 f1_w || break
  read -r -u 4 f2_w1 f2_w2 || break
  echo -e "${f1_w}\t${f2_w2}"
done

Replace certain token with the content of a file (using a bash-script)

I have a file containing some text and the words INSERT_HERE1 and INSERT_HERE2. I'd like to replace these words with the content of file1.txt and file2.txt respectively.
I suspect sed or awk could pull it off but I've basically never used them.
Sed does have a built-in read file command. The commands you want would look something like this:
$ sed -e '/INSERT_HERE1/ {
r FILE1
d }' -e '/INSERT_HERE2/ {
r FILE2
d }' < file
This would output
foo
this is file1
bar
this is file2
baz
The r command reads the file, and the d command deletes the line with the INSERT_HERE tags. You need the curly braces and the multi-line input because the filename after r extends to the end of the line, so the following command has to start on a line of its own; depending on your shell, you may need \ at the end of the lines to avoid premature execution. If this is something you would use a lot, you can just put the commands in a file and use sed -f to run it.
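For the sed -f variant, the script file would look something like this (FILE1/FILE2 as above; the script name insert.sed is illustrative):
$ cat insert.sed
/INSERT_HERE1/ {
r FILE1
d
}
/INSERT_HERE2/ {
r FILE2
d
}
$ sed -f insert.sed file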
If you are okay with Perl you can do:
$ cat FILE1
this is file1
$ cat FILE2
this is file2
$ cat file
foo
INSERT_HERE1
bar
INSERT_HERE2
baz
$ perl -ne 's/^INSERT_HERE(\d+)\s+$/`cat FILE$1`/e;print' file
foo
this is file1
bar
this is file2
baz
$
This is not tested, but would be pretty close to what you need:
sed -e "s/INSERT_HERE1/`cat file1.txt`/" -e "s/INSERT_HERE2/`cat file2.txt`/" <file >file.out
It will not properly handle replacement files containing slashes (or newlines), though, so you may need to tweak it a bit.
I'd recommend Perl instead, though. Something like this:
#!/usr/bin/perl -w
my $f1 = `cat file1.txt`;
my $f2 = `cat file2.txt`;
while (<>) {
    chomp;
    s/INSERT_HERE1/$f1/;
    s/INSERT_HERE2/$f2/;
    print "$_\n";
}
This assumes that INSERT_HERE1 and INSERT_HERE2 appear at most once per line, and that file1.txt does not itself contain the text INSERT_HERE2 (neither would be difficult to fix, though). Use like this:
./script <file >file.out
This is suitable for small substitution files that may be substituted many times:
awk 'BEGIN {
while ((getline line < ARGV[1]) > 0) {file1 = file1 nl line; nl = "\n"};
close (ARGV[1]); nl = "";
while ((getline line < ARGV[2]) > 0) {file2 = file2 nl line; nl = "\n"};
close (ARGV[2]);
ARGV[1] = ""; ARGV[2] = "" }
{ gsub("token1", file1);
gsub("token2", file2);
print }' file1.txt file2.txt mainfile.txt
You may want to add some extra newlines here and there, depending on how you want your output to look. (Here token1 and token2 stand in for the question's INSERT_HERE1 and INSERT_HERE2.)
Easily done with Bash. If you need it to be POSIX shell let me know:
#!/bin/bash
IFS= # Needed to prevent read from trimming leading/trailing whitespace
f1=$(< /path/to/file1.txt)
f2=$(< /path/to/file2.txt)
while read -r line; do
if [[ "$line" == "INSERT_HERE1" ]]; then
echo "$f1"
elif [[ "$line" == "INSERT_HERE2" ]]; then
echo "$f2"
else
echo "$line"
fi
done < /path/to/input/file
This snippet replaces every token specified in the replace array with the content of the corresponding file. For example, here it replaces
<!--insert.txt-->
with the contents of "insert.txt"
#!/bin/bash
replace[1]=\<!--insert.txt--\> ; file[1]=insert.txt
replace[2]=\<!--insert2.txt--\> ; file[2]=insert2.txt
replacelength=${#replace[@]}
cat blank.txt > tmp.txt
for i in $(seq 1 ${replacelength})
do
echo Replacing ${file[i]} ...
sed -e "/${replace[i]}/r ${file[i]}" -e "/${replace[i]}/d" tmp.txt > tmp_2.txt
mv tmp_2.txt tmp.txt
done
mv tmp.txt file.txt
If you're not afraid of .zip files you can try this example as long as it is online: http://ablage.stabentheiner.de/2013-04-16_contentreplace.zip
I would use perl's in-place replacement with the -i.ext option:
perl -pi.bak -e 's|INSERT_HERE1|`cat FILE1`|ge;
s|INSERT_HERE2|`cat FILE2`|ge;' myfile
Then use diff myfile.bak myfile to verify the changes.
