Formatting based on condition in bash - bash

Here is my code:
grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b|first_name.{0,40}|[(]?[2-9]{1}[0-9]{2}[)-. ]?[2-9]{1}[0-9]{2}[-. ]?[0-9]{4}" file.txt | awk -v ORS= '
NR>1 && !/,/ {print "\n"}
{print}
END {if (NR) print "\n"}' | sed -e :a -e '$!N;s/\n[0-9]{3}/,/;ta' -e 'P;D' | sed '$!N;s/\n\s*[0-9]//;P;D'
I'm pretty close. The above code works, but it is removing the first digit from the phone number.
I'm looking for a bash solution to do the following:
Combine two lines if the lines do not start with a number.
If the line starts with a number, combine the previous two lines + the line with the number for 3 fields in one line.
Here's an example:
jim.bob3@email.com
Jim Bob
jane.bob@email.com
Jane Bob
joebob1122@email.com
Joe Bob
555 555 5555
jbob44@email.com
Jeff Bob
....
Results:
jim.bob3@email.com Jim Bob
jane.bob@email.com Jane Bob
joebob1122@email.com Joe Bob 555 555 5555
jbob44@email.com Jeff Bob
Thanks!

If your Input_file is the same as the shown sample, then the following awk solution may help you.
awk '{printf("%s",$0~/^name/&&FNR>1?RS $0:FNR==1?$0:FS $0)} END{print ""}' Input_file
Output will be as follows.
name1@email.com Jim Bob
name2@email.com Jane Bob
name3@email.com Joe Bob 555 555 5555
name4@email.com Jeff Bob
Explanation: the following code is for understanding purposes only, NOT for running; use the code above for running.
awk '{printf(\ ##Using awk's printf function here to print the values.
"%s",\ ##Mentioning %s tells printf that we are going to print a string here.
$0~/^name/&&FNR>1\ ##Checking the condition: if a line starts with the string name and the line number is greater than 1, then:
?\ ##? means the following value is printed because the condition is TRUE.
RS $0\ ##printing RS (record separator) and the current line here.
:\ ##: means in case the above condition is NOT TRUE then perform the following:
FNR==1\ ##Checking another condition here: if the line number is 1 then do the following:
?\ ##? means execute the statement following ? in case the above condition is TRUE.
$0\ ##simply printing the current line here.
:\ ##: means in case the above conditions are NOT TRUE then perform the action following :
FS $0)} ##Printing FS (field separator) and the current line here.
END{print ""}' Input_file ##Printing a null string here to output a final newline, and mentioning the Input_file name here too.
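The condition above keys on lines starting with name, which matches this answer's own sample output; a minimal adaptation for the sample shown in the question (an assumption: every record begins with a line containing @) could be:
awk '{printf("%s",/@/&&FNR>1?RS $0:FNR==1?$0:FS $0)} END{print ""}' Input_file
This should print the four combined lines shown under Results.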

Using awk
awk '/@/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
Input:
$ cat infile
jim.bob3@email.com
Jim Bob
jane.bob@email.com
Jane Bob
joebob1122@email.com
Joe Bob
555 555 5555
jbob44@email.com
Jeff Bob
Output:
$ awk '/@/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
jim.bob3@email.com Jim Bob
jane.bob@email.com Jane Bob
joebob1122@email.com Joe Bob 555 555 5555
jbob44@email.com Jeff Bob
Explanation:
awk '/@/{ # if row/line/record contains @
if(s)print s; # if variable s was set before print it.
s="" # nullify or reset variable s
}
{
s=(s?s OFS:"")$0 # concatenate variable s with its previous content if it was set before, with
# OFS (output field separator) and
# current row/line/record ($0),
# otherwise s will be just current record
}
END{ # end block
if(s)print s # if s was set before print it
}
' infile

Related

add ### at the beginning of a file if there is a match with the content of a list of strings in another file

I have a file with some strings; I need to grep these strings in another file and, if they match, add ### at the beginning of the lines that match.
Assuming this file (1.txt) the file with strings:
123
456
789
and this one the file (2.txt) where to perform the add of the ###:
mko 123 nhy
zaq rte vfr
cde nbv 456
789 bbb aaa
ooo www qqq
I'm expecting this output:
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq
I've already tried the following without success:
cat 1.txt |while read line ; do sed '/^$line/s/./###&/' 2.txt >2.txt.out; done
With your shown samples, please try the following awk code.
awk '
FNR==NR{
  arr[$0]
  next
}
{
  for(i=1;i<=NF;i++){
    if($i in arr){
      $0="###" $0
      break
    }
  }
}
1
' 1.txt 2.txt
Explanation: Adding detailed explanation here.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which is TRUE while 1.txt is being read.
  arr[$0] ##Creating array arr with the current line as its index.
  next ##next will skip all further statements from here.
}
{
  for(i=1;i<=NF;i++){ ##Traversing through all fields from here.
    if($i in arr){ ##Checking if the current field is present in arr, then do the following.
      $0="###" $0 ##Adding ### before the current line.
      break;
    }
  }
}
1 ##Printing the current edited/non-edited line here.
' 1.txt 2.txt ##Mentioning Input_file names here.
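Because the comparison is done field by field against exact array indices, partial matches are not tagged; for example, a hypothetical extra line containing 1234 would pass through untouched:
$ echo 'mko 1234 nhy' | awk 'FNR==NR{arr[$0]; next} {for(i=1;i<=NF;i++) if($i in arr){$0="###" $0; break}} 1' 1.txt -
mko 1234 nhy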
This might work for you (GNU sed):
sed 's@.*@/&/s/^#*/###/@' file1 | sed -f - file2
Create a sed script from file1 and run it against file2.
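For the sample strings file (1.txt in the question), the generated script that the second sed runs against file2 would look roughly like this, assuming the @-delimited form above:
/123/s/^#*/###/
/456/s/^#*/###/
/789/s/^#*/###/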
$ while read -r line; do sed -i "/\<$line\>/s/^/###/" 2.txt; done < 1.txt
$ cat 2.txt
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq

Find a line with a single word and merge it with the next line

I have an issue with grep that I can't sort out.
What I have.
A listing of first names and last names, like:
John Doe
Alice Smith
Bob Smith
My problem.
Sometimes, firstname and lastname are disjointed, like:
Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line
What I'd like to achieve.
Concatenate the "orphan" name with the next line.
Alice Smith
Bob Doolittle
Mark Von Doe
What I already tried
grep -ozP "^\w+\n\w.+" file | tr '\n' ' '
So, here I ask grep to find a line with just one word and concatenate it with the following line, even if this next line has more than one word.
It works correctly, but only if the isolated word is at the very beginning of the file. If it appears below the first line, grep does not spot it. So a quick and dirty solution where I would loop through the file and remove a line after each pass doesn't work for me.
If awk is acceptable:
awk '
NF==1 {printf "%s ",$1; getline; print; next}
1' names.dat
Where:
NF==1 - if only one name/field in the current record ...
printf / getline / print / next - print field #1, read next line and print it, then skip to next line
1 - print all other lines as is
As a one-liner:
awk 'NF==1{printf "%s ",$1;getline;print;next}1' names.dat
This generates:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
You can use GNU sed like this:
sed -E -i '/^[^[:space:]]+$/{N;s/\n/ /}' file
See the sed demo:
s='Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line'
sed -E '/^[^[:space:]]+$/{N;s/\n/ /}' <<< "$s"
Output:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
Details:
/^[^[:space:]]+$/ finds a line with no whitespace
{N;s/\n/ /} - N reads in the next line, appending a newline char and that line to the current pattern space; s/\n/ / then replaces the newline char with a space.
Use this Perl one-liner:
perl -lane 'BEGIN { $is_first_name = 1; } if ( @F == 1 && $is_first_name ) { @prev = @F; $is_first_name = 0; } else { print join " ", @prev, @F; $is_first_name = 1; @prev = (); }' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
Using awk:
awk '
{f=$2 ? 1 : 0}                     # f is 1 when the line has a second field, 0 for an orphan word
v==1{v=0; print; next}             # the previous line was an orphan: print this line to complete it
f==0{v=1; printf "%s ", $1; next}  # orphan line: print it without a newline and raise the flag
1                                  # complete lines are printed as-is
' file
Output
Alice Smith
Bob Doolittle
Mark Von Doe
This might work for you (GNU sed):
sed -E 'N;s/^(\S+)\n/\1 /;P;D' file
Append the next line.
If the first line in the pattern space contains one word only, replace the following newline with a space.
Print/delete the first line and repeat.

awk print "matched" after successful match

Trying to use awk to search a file for specific lines and writing "matched" after printing each line if it matches.
For example, if I have a file that contains a list of names and emails but I only want to match emails ending in "@yahoo.com", I want it to print out that line with "matched" at the end; if the line does NOT contain @yahoo.com, I just want it to print out that line and continue.
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
This returns:
Joe Smith joe.smith@yahoo.com matched
John Doe john.doe@gmail.com
Sally Sue sally.sue@yahoo.com
So it's only matching the first @yahoo.com, then printing the other lines regardless of their email address. What am I missing?
Could you please try the following.
awk '/@yahoo\.com/{print $0,"matched"}' Input_file
Explanation: your shown sample Input_file does not have commas, so that field-separator part was removed.
Also, a person's name can have more than 2 fields, so rather than matching the 3rd field, the condition is checked against the whole line here.
This is probably what you want:
awk -F, '{print $0 ($NF ~ /@yahoo\.com$/ ? " matched" : "")}' emails.txt
but without seeing your input file it's just an untested guess.
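A quick check of that guess against a comma-separated sample (an assumed input format, matching the -F, in the command) would give:
$ cat emails.txt
Joe,Smith,joe.smith@yahoo.com
John,Doe,john.doe@gmail.com
Sally,Sue,sally.sue@yahoo.com
$ awk -F, '{print $0 ($NF ~ /@yahoo\.com$/ ? " matched" : "")}' emails.txt
Joe,Smith,joe.smith@yahoo.com matched
John,Doe,john.doe@gmail.com
Sally,Sue,sally.sue@yahoo.com matched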
I tested the example with the following input. The field delimiter given in your example is a comma; after splitting, you can access the email in the third field ($3) and match your condition.
Input:
Joe,Smith,joe.smith@yahoo.com
John,Doe,john.doe@gmail.com
Sally,Sue,sally.sue@yahoo.com
Script
awk -F, '{if($3~/yahoo.com/){ print $1,$2,$3 " matched"}else{ print $1,$2,$3 }}' emails.txt
Output:
Joe Smith joe.smith@yahoo.com matched
John Doe john.doe@gmail.com
Sally Sue sally.sue@yahoo.com matched
If the line ends with @yahoo.com, reassign the entire line to itself followed by the string matched.
awk '/@yahoo.com$/{$0=$0 " matched"}1' input
Joe Smith joe.smith@yahoo.com matched
John Doe john.doe@gmail.com
Sally Sue sally.sue@yahoo.com matched

apply dictionary mapping to the column of a file with awk

I have a text file file.txt with several columns (tab separated), and the first column can contain indexes such as 1, 2, and 3. I want to update the first column so that 1 becomes "one", 2 becomes "two", and 3 becomes "three". I created a bash file a.sh containing:
declare -A DICO=( [1]="one" [2]="two" [3]="three" )
awk '{ $1 = ${DICO[$1]}; print }'
But now when I run cat file.txt | ./a.sh I get:
awk: cmd. line:1: { $1 = ${DICO[$1]}; print }
awk: cmd. line:1: ^ syntax error
I'm not able to fix the syntax. Any ideas? Also, there may be a better way to do this with bash, but I could not think of another simple approach.
For instance, if the input is a file containing:
2 xxx
2 yyy
1 zzz
3 000
4 bla
The expected output would be:
two xxx
two yyy
one zzz
three 000
UNKNOWN bla
EDIT: since the OP has now added samples, the solution has been changed accordingly.
awk 'BEGIN{split("one,two,three",array,",")} {$1=$1 in array?array[$1]:"UNKNOWN"} 1' OFS="\t" Input_file
Explanation: adding an explanation for the above code too.
awk '
BEGIN{ ##Starting BEGIN block of awk code here.
  split("one,two,three",array,",") ##Creating an array named array whose values are the strings one, two and three, using comma as the delimiter.
}
{
  $1=$1 in array?array[$1]:"UNKNOWN" ##Re-creating the first column: if $1 is an index of array then its value becomes array[$1], else it becomes the string UNKNOWN.
}
1 ##Mentioning 1 here. awk works on the method of condition then action; this makes the condition TRUE, and since no action is mentioned the default action, printing the current line, happens.
' Input_file ##mentioning Input_file name here.
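Run against the sample from the question, the split-based one-liner should produce the expected output (a quick check, feeding the sample as tab-separated input as stated in the question):
$ printf '2\txxx\n2\tyyy\n1\tzzz\n3\t000\n4\tbla\n' | awk 'BEGIN{split("one,two,three",array,",")} {$1=$1 in array?array[$1]:"UNKNOWN"} 1' OFS="\t" -
two	xxx
two	yyy
one	zzz
three	000
UNKNOWN	bla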
Since you hadn't shown samples, this couldn't be tested completely; could you please try the following and let me know if it helps.
awk 'function check(value){gsub(value,array[value],$1)} BEGIN{split("one,two,three",array,",")} check(1) check(2) check(3); 1' Input_file
Adding a non-one-liner form of the solution here too.
awk '
function check(value){
  gsub(value,array[value],$1)
}
BEGIN{
  split("one,two,three",array,",")
}
check(1)
check(2)
check(3);
1' OFS="\t" Input_file
The code was also tested as follows:
Let's say we have following Input_file:
cat Input_file
1213121312111122243434onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvr13232424
Then after running the code following will be the output:
onetwoonethreeonetwoonethreeonetwooneoneoneonetwotwotwo4three4three4onetwothree wguwvrwvrwvbvrwvrvr
vkewjvrkmvronethreetwothreetwo4two4
Given a dico file containing this:
$ cat dico
1 one
2 two
3 three
You could use this awk script:
awk 'NR==FNR{a[$1]=$2;next}($1 in a){$1=a[$1]}1' dico file.txt
This fills the array a with the content of the dico file and replaces the first element of the file.txt file if this one is part of the array.
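The same NR==FNR idea can also be fed directly from the bash associative array in a.sh, so no separate dico file is needed. A minimal sketch, assuming bash process substitution is available and that keys and values contain no whitespace:
# Turn the bash array into a "key value" stream and read it first (NR==FNR),
# then map column 1 of file.txt, falling back to UNKNOWN as in the expected output.
declare -A DICO=( [1]="one" [2]="two" [3]="three" )
awk 'NR==FNR{a[$1]=$2; next} {$1=($1 in a) ? a[$1] : "UNKNOWN"} 1' \
  <(for k in "${!DICO[@]}"; do printf '%s %s\n' "$k" "${DICO[$k]}"; done) file.txt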

awk pattern match for a line with two specific words

Using AWK, I want to print the last line containing two specific words.
Suppose I have log.txt which contains below logs
log1|Amy|Call to Bob for Food
log2|Jaz|Call to Mary for Toy and Cookies
log3|Ron|Call to Jerry then Bob for Book
log4|Amy|Message to John for Cycle
Now, I need to extract the last line with "Call" and "Bob".
I tried with:
#!/bin/bash
log="log.txt"
var="Bob"
check=$(awk -F'|' '$3 ~ "/Call.*$var/" {print NR}' $log | tail -1)
echo "Value:$check"
so Value:3 (3rd record) should be printed.
But it's not printed. Please suggest. I have to use awk.
With GNU awk for word boundaries, to avoid "Bob" incorrectly matching "Bobbing":
$ awk -v var="Bob" -F'|' '$3 ~ "Call.*\\<" var "\\>"{nr=NR; rec=$0} END{if (nr) print nr, rec}' file
3 log3|Ron|Call to Jerry then Bob for Book
See http://cfajohnson.com/shell/cus-faq-2.html#Q24. If Bob is already saved in a shell variable named var then use awk -v var="$var" ....
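To wire that back into the original script so only the line number is captured (the asker's Value:3), a sketch along the same lines, still assuming GNU awk, could be:
log="log.txt"
var="Bob"
check=$(awk -v var="$var" -F'|' '$3 ~ "Call.*\\<" var "\\>"{nr=NR} END{if (nr) print nr}' "$log")
echo "Value:$check"   # prints Value:3 for the sample log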
