Get the word after a match in a line when line has multiple match - bash

I have a big text file with content as below:
Register foo1 ... Register foo2 ... Register foo10...
Register foo20 ...
Un-Register bar1 ... Register foo21 ...
I wrote below bash script, which will work only if there is one "Register" per line, but how to get all foo's in same line?
#!/bin/bash
file=/tmp/log
grep -e 'Register\s' $file | awk '{print $2}' | grep -v Un-Register | while read -r line; do
#do something with $line
done

Try this :
perl -pe 's/\s+Register/\nRegister/g' file |
grep -oP '^Register\s+\Kfoo\S*'
Output :
foo1
foo2
foo10...
foo20
foo21

here's a perl one-liner to find the word after "Register" but not "Un-Register", and all the words from a line will be kept on a line
$ perl -nE 'say "#{[/(?<!Un-)Register\s+\K\S+/g]}"' file
foo1 foo2 foo10...
foo20
foo21
A less dense version:
$ perl -nE '
#words = / (?<!Un-) # preceding characters are not "Un-"
Register \s+ # must have "Register" followed by whitespace
\K # disregard the previous from matching
\S+ # capture the next non-whitespace characters
/gx; # "g"lobally on this line
say "#words";
' file

Here is a non-regex awk solution to get the job done:
awk '{
s=""
for (i=2; i<=NF; i++)
if ($(i-1) == "Register")
s = sprintf("%s%s", (s==""?"":s OFS), $i)
print s
}' file
foo1 foo2 foo10...
foo20
foo21

egrep -o '(^|[^-])Register \w*' file | awk '{print $2 }'
first grep filters Register word (and not Un-Register) and print the matches in new lines (-o option)
And awk prints only the word

Related

how to change words with the same words but with number at the back bash

I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a line that is nor a directory nor a file
file.csv BRB1 REZ3 SYO2
And what I want to do is change the content of the file with the words that are on the line and then get the nth letter of that word with the number at the end of the those words in capital
and the output should then be
umo
I know that I can get over the line with
for i in "${#:2}"
do
words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split them into an array a. For each record in file.csv, iterate this array and if the second field of current record matches the first three characters of the current array value, then strip the target character from the first field of the current record and append it to a variable. Print the value of the aggregated variable.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],0,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, 'BEGIN{split(arr,a," ")} \
{for (v in a) { \
if ($2 == substr(a[v],0,3)) { \
n=substr(a[v],length(a[v]),1); \
w=w""substr($1,n,1);
}
}
} END{print w}' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Create a pattern string from command arguments and convert lines with sed.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in ${#:2}; do
pat+=$(echo $i | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${#:2}";
do
pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
index=$(echo "$i" | grep -oE '[0-9]+')
pattern_string+=$(echo "$pattern_words|")
idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line
do
line_pattern=$(echo $line | grep -oE $pattern_string)
[[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" && echo $line | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < $filename
first find the capital words pattern and catch the index corresponding
then construct the hole pattern words string which connect with |
at last, iterate the every line according to the pattern string, and find the letter by the index
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2

Replace one character by the other (and vice-versa) in shell

Say I have strings that look like this:
$ a='/o\\'
$ echo $a
/o\
$ b='\//\\\\/'
$ echo $b
\//\\/
I'd like a shell script (ideally a one-liner) to replace / occurrences by \ and vice-versa.
Suppose the command is called invert, it would yield (in a shell prompt):
$ invert $a
\o/
$ invert $b
/\\//\
For example using sed, it seems unavoidable to use a temporary character, which is not great, like so:
$ echo $a | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
\o/
$ echo $b | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
/\\//\
For some context, this is useful for proper printing of git log --graph --all | tac (I like to see newer commits at the bottom).
tr is your friend:
% echo 'abc' | tr ab ba
bac
% echo '/o\' | tr '\\/' '/\\'
\o/
(escaping the backslashes in the output might require a separate step)
I think this can be done with (g)awk:
$ echo a/\\b\\/c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a\/b/\c
$ echo a\\/b/\\c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a/\b\/c
$
-F "/" This defines the separator, The input will be split in "/", and should no longer contain a "/" character.
for(i=1;i<=NF;i++) gsub(/\\/,"/",$i);. This will replace, in all items in the input, the backslash (\) for a slash (/).
If you want to replace every instance of / with \, you can uses the y command of sed, which is quite similar to what tr does:
$ a='/o\'
$ echo "$a"
/o\
$ echo "$a" | sed 'y|/\\|\\/|'
\o/
$ b='\//\\/'
$ echo "$b"
\//\\/
$ echo "$b" | sed 'y|/\\|\\/|'
/\\//\
If you are strictly limited to GNU AWK you might get desired result following way, let file.txt content be
\//\\\\/
then
awk 'BEGIN{FPAT=".";OFS="";arr["/"]="\\";arr["\\"]="/"}{for(i=1;i<=NF;i+=1){if($i in arr){$i=arr[$i]}};print}' file.txt
gives output
/\\////\
Explanation: I inform GNU AWK that field is any single character using FPAT built-in variable and that output field separator (OFS) is empty string and create array where key-value pair represent charactertobereplace-replacement, \ needs to be escaped hence \\ denote literal \. Then for each line I iterate overall all fields using for loop and if given field hold character present in array arr keys I do exchange it for corresponding value, after loop I print line.
(tested in gawk 4.2.1)

AWK -F with print all but last record

/Home/in/test_file.txt
echo /Home/in/test_file.txt | awk -F'/' '{ print $2,$3 }'
Gives the result as:
Home in
But I need /Home/in/ as the result .I have to get all except test_file.txt
How to achieve this?
$ echo '/Home/in/test_file.txt' | awk '{sub("/[^/]+$","")} 1'
/Home/in
$ echo '/Home/in/test_file.txt' | awk '{sub("[^/]+$","")} 1'
/Home/in/
$ echo '/Home/in/test_file.txt' | sed 's:/[^/]*$::'
/Home/in
$ echo '/Home/in/test_file.txt' | sed 's:[^/]*$::'
/Home/in/
$ dirname '/Home/in/test_file.txt'
/Home/in
Your attempt awk -F'/' '{ print $2,$3 }' didn't do what you wanted as -F'/' is telling awk to split the input into fields at every / and then print $2,$3 is telling awk to print the 2nd and 3rd fields separated by a blank char (the default value for OFS). You could do:
$ echo '/Home/in/test_file.txt' | awk 'BEGIN{FS=OFS="/"} { print "",$2,$3,"" }'
/Home/in/
to get the expected output but it'd be the wrong approach since it's removing the field you don't want AND removing the input separators AND then adding new output separators which happen to the have the same value as the input separators rather than simply removing the field you don't want like the other solutions above do.
echo /Home/in/test_file.txt | awk -F'/[^/]*$' '{ print $1 }'
..will print the everything but the trailing slash
There are several ways to achieve this:
Using dirname:
$ dirname /home/in/test_file.txt
/home/in
Using Shell substitution:
$ var="/home/in/test_file.txt"
$ echo "${var%/*}"
/home/in
Using sed: (See Ed Morton)
Using AWK:
$ echo "/home/in/test_file.txt" | awk -F'/' '{OFS=FS;$NF=""}1'
/home/in/
Remark: all these work since you can't have a filename with a forward slash (Is it possible to use "/" in a filename?)
Note: all but dirname will fail if you just have a single file_name without a path. While dirname foo will return ./ all others will return foo
awk behaves as it should.
When you define slash / as a separator, the fields in your expression become the content between the separators.
If you need the separator to be printed as well, you need to do it explicitly, like:
echo /Home/in/test_file.txt | awk -F'/' '{ printf "%s/%s/",$2,$3 }'
replace your last field with an empty string and
put the slash back in as the (builtin) Output Field Separator (OFS)
echo /Home/in/test_file.txt | awk -F'/' -vOFS='/' '{$NF="";print}

printing results in one line separated by commas in bash

How can I print all text file location separated by commas in one line? Can I do this in for loop?
Here is an example of files.
/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt
/data/home/files/txt_files_2/file3.txt
output would look like
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Thanks
Here is the correct code
#!/bin/bash
delim=""
for i in /data/home/files/txt_files_1/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
printf "\\"
printf "\n"
for i in /data/home/files/txt_files_2/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
For single file input:
awk -v OFS=, -v RS= 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Or
awk -v OFS=, -v RS= -v ORS='\n\n' 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
You can use printf "%s," "$file" to print several names into a single line. To get the delimiters right, I use this trick:
delim=""
...loop...
printf "%s%s" "$delim" "$file"
delim=","
printf "\n"
<command to generate lines of paths> | tr '\n' ','
example:
echo "/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt" | tr '\n' ','
outputs:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt,,/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt
Assuming your input is in a file called list, this Perl one-liner does the job:
perl -F'\n' -00 -ane 'push #a, join(",", #F) }{ print(join(" \\\n\n", #a), "\n")' list
explanation
-00, in combination with -n, reads the file one block (paragraph) at a time.
The -a switch in combination with -F'\n' auto-splits the text on each new line. The result goes into the array #F.
An array is built, each element containing the comma separated list of the elements in #F
Once the file has been processed, all the elements of the array #a are printed, joined together as you specified. The additional "\n" on the end is optional.
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt

Looping over input fields as array

Is it possible to do something like this:
$ cat foo.txt
1 2 3 4
foo bar baz
hello world
$ awk '{ for(i in $){ print $[i]; } }' foo.txt
1
2
3
4
foo
bar
baz
hello
world
I know you could do this:
$ awk '{ split($0,array," "); for(i in array){ print array[i]; } }' foo.txt
2
3
4
1
bar
baz
foo
world
hello
But then the result is not in order.
Found out myself:
$ awk '{ for(i = 1; i <= NF; i++) { print $i; } }' foo.txt
I'd use sed:
sed 's/\ /\n/g' foo.txt
No need for awk, sed or perl. You can easily do this directly in the shell:
for i in $(cat foo.txt); do echo "$i"; done
If you're open to using Perl, either of these should do the trick:
perl -lane 'print $_ for #F' foo.txt
perl -lane 'print join "\n",#F' foo.txt
These command-line options are used:
-n loop around each line of the input file, do not automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the #F array. Defaults to splitting on whitespace.
-e execute the perl code

Resources