awk: if condition is met, add string to these lines - bash

I am trying to combine these two commands:
awk -F[' '] '{if ($1=="string" || $4=="string") print $0" ""blue"}' file >file.out
awk -F[' '] '{if ($1!="string" && $4!="string") print $0}' file >>file.out
Basically I want to add a column, but only print blue in that column if either the first or the fourth column is equal to a string.
input:
string 123 452 abc
def 420 902 ghi
expected output:
string 123 452 abc blue
def 420 902 ghi

You can do it in a single pass like
awk '$1 == "string" || $4 == "string" {print $0 " blue"; next} {print}' file > file.out
which tests whether either the first or fourth field is string and, if so, prints the line followed by blue. The next statement then skips the second block for that line (so we don't have to repeat the negated test). Any line that didn't get blue appended is simply printed as it was.
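Against the sample input this should give the expected output:
$ awk '$1 == "string" || $4 == "string" {print $0 " blue"; next} {print}' file
string 123 452 abc blue
def 420 902 ghi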

Another awk solution:
$ awk '$1=="string" || $4=="string"{ NF+=1; $NF="blue" }1' file
string 123 452 abc blue
def 420 902 ghi

A ternary inside print handles both cases in a single rule:
awk '{print $0 (($1 == "string" || $4 == "string") ? OFS "blue":"")}' infile

Another similar alternative:
awk '$1=="string" || $4=="string" {$0=$0" blue"} 1' infile
Output:
string 123 452 abc blue
def 420 902 ghi

The following may also help you. Here I am not referring to field numbers in the check (assuming your Input_file is the same as the shown sample): if a line starts with "string" or ends with "string", it prints the line with blue appended; otherwise it prints the line as-is.
awk '/^string/||/string$/{print $0,"blue";next} 1' Input_file

awk to get first column if a specific number in the line is greater than a digit

I have a data file (file.txt) contains the below lines:
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis
I'm expecting to get the first column ($1) only if the ETA= number is greater than 15; here, only the first column of the 2nd and 3rd lines is expected:
345
456
I tried cat file.txt | awk -F [,TPF=]' '{print $1}' but it prints the whole line for the lines that have ETA at the end.
Using awk
$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA") if ($(i+1) > 15) print $1}' input_file
345
456
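One hedged caveat about the comparison: since a value like 12:00 is not a pure number, $(i+1) > 15 falls back to a string comparison, which happens to give the right answer for these samples but would misfire on, say, ETA=9:00. Adding +0 forces a numeric comparison:
$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA" && $(i+1)+0 > 15) print $1}' input_file
345
456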
With your shown samples, please try the following GNU awk code. It uses GNU awk's match function with the regex (^[0-9]+).*ETA=([0-9]+):[0-9]+, which creates 2 capturing groups and saves their values into the array arr. Then, if the 2nd element of arr is greater than 15, it prints the 1st element of arr, as per the requirement.
awk '
match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{
print arr[1]
}
' Input_file
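Run against the sample data (saved as Input_file here), this should print only the two IDs whose ETA hour exceeds 15:
$ awk 'match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{print arr[1]}' Input_file
345
456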
I would harness GNU AWK for this task in the following way. Let file.txt's content be
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=02:00,dom=sss.co.uk,user2=lis
then
awk 'substr($0,index($0,"ETA=")+4,2)+0>15{print $1}' file.txt
gives output
345
Explanation: I use the string functions index, to find where ETA= is, and substr, to get the 2 characters after ETA= (the offset of 4 is used because ETA= is 4 characters long and index gives the start position). I use +0 to convert the result to an integer and then compare it with 15. Disclaimer: this solution assumes every row has ETA= followed by exactly 2 digits.
(tested in GNU Awk 5.0.1)
Whenever input contains tag=value pairs as yours does, it's best to first create an array of those mappings (v[] below); then you can just access the values by their tags (names):
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
v["ETA"]+0 > 15 {
print $1
}
$ awk -f tst.awk file
345
456
With that approach you can trivially enhance the script in future to access whatever values you like by their names, test them in whatever combinations you like, output them in whatever order you like, etc. For example:
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
(v["pro"] ~ /b/) && (v["ETA"]+0 > 15) {
print $1, v["team"], v["dom"]
}
$ awk -f tst.awk file
345,abc,sbc.int
456,efg,sss.co.uk
Think about how you'd enhance any other solution to do the above or anything remotely similar.
It's unclear why you think your attempt would do anything of the sort. Your attempt uses a completely different field separator and does not compare anything against the number 15.
You'll also want to get rid of the useless use of cat.
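For example, instead of
cat file.txt | awk '{print $1}'
you can simply pass the file to awk directly (the print program here is just a placeholder):
awk '{print $1}' file.txt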
When you specify a column separator with -F, that changes what the first column $1 actually means: it is then everything before the first occurrence of the separator. You probably want to separately split the line to obtain the first, space-separated column.
awk -F 'ETA=' '$2 > 15 { split($0, n, /[ \t]+/); print n[1] }' file.txt
The value in $2 will be the data after the first separator (and up until the next one) but using it in a numeric comparison simply ignores any non-numeric text after the number at the beginning of the field. So for example, on the first line, we are actually literally checking if 12:00, team=xyz,user1=tom,dom=dby.com is larger than 15 but it effectively checks if 12 is larger than 15 (which is obviously false).
When the condition is true, we split the original line $0 into the array n on sequences of whitespace, and then print the first element of this array.
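You can see that numeric coercion in isolation with a throwaway one-liner (not part of the solution):
$ awk 'BEGIN { print ("12:00, team=xyz,user1=tom,dom=dby.com" + 0) }'
12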
Using awk you could match ETA= followed by 1 or more digits. Then get the match without the ETA= part and check if the number is greater than 15 and print the first field.
awk 'match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file
Output
345
456
If the first field should also start with a number:
awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file

Find a match in a field and print next n fields

BASH noob here.
I have a tab separated file structured like this:
ABC DEF x 123 456
GHI x 678 910
I need to match "x" and print x plus the following two fields:
x 123 456
x 678 910
I've tried a few things but the issue that throws me off is that "x" is not always in the same field. Can somebody please help?
Thanks in advance.
If you are working in bash, the shell provides built-in parameter expansions with substring removal. They (along with many more) are:
${var#pattern} Strip shortest match of pattern from front of $var
${var##pattern} Strip longest match of pattern from front of $var
${var%pattern} Strip shortest match of pattern from back of $var
${var%%pattern} Strip longest match of pattern from back of $var
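As a quick illustration with a throwaway variable (hypothetical, just to show the four forms with the patterns *x and x*):
$ var="axbxc"
$ echo "${var#*x}"   # bxc  (shortest *x stripped from the front)
$ echo "${var##*x}"  # c    (longest *x stripped from the front)
$ echo "${var%x*}"   # axb  (shortest x* stripped from the back)
$ echo "${var%%x*}"  # a    (longest x* stripped from the back)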
So in your case you want to strip the longest match from the front, using *x as the pattern, e.g.
while read line || [ -n "$line" ]; do
echo "x${line##*x}"
done
Here you read each line and trim from the front through the last 'x' (the 'x' itself is removed as well), then output "x....." where "....." is the rest of the line, restoring the 'x'.
(for large data sets, you would want to use awk or sed for efficiency reasons)
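For instance, a rough awk equivalent of the same trim, for lines that do contain an x (a sketch, not a drop-in replacement):
awk '{sub(/.*x/, "x")} 1' file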
Example Use/Output
Using your sample data in a heredoc, you could do:
$ while read line || [ -n "$line" ]; do
> echo "x${line##*x}"
> done << 'eof'
> ABC DEF x 123 456
> GHI x 678 910
> eof
x 123 456
x 678 910
You can just select-copy/middle-mouse-paste the following in your xterm to test:
while read line || [ -n "$line" ]; do
echo "x${line##*x}"
done << 'eof'
ABC DEF x 123 456
GHI x 678 910
eof
Using grep -o For Simplicity
Your other option is to use grep -o, where the -o option returns only the part of the line matching the expression you provide, so
grep -o 'x.*$' file
Is another simple option, e.g.
$ grep -o 'x.*$' << 'eof'
> ABC DEF x 123 456
> GHI x 678 910
> eof
x 123 456
x 678 910
Let me know if you have any further questions.
In case you need to match only tab separated field x:
pcregrep -o '(^|\t)\Kx(\t|$).*' file
awk 'n=match($0,/(^|\t)x(\t|$)/) {$0=substr($0,n); sub(/^\t/,""); print}' file
To print only the two following fields:
pcregrep -o '(^|\t)\Kx(\t[^\t]*){2}' file
awk 'n=match($0,/(^|\t)x\t[^\t]*\t[^\t]*/) {$0=substr($0,n,RLENGTH); sub(/^\t/,""); print}' file
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
match($0,/[[:space:]]+x[[:space:]]+[0-9]+[[:space:]]+[0-9]+$/){
val=substr($0,RSTART,RLENGTH)
sub(/^[[:space:]]+/,"",val)
print val
}
' Input_file
OR, to match more than one space-separated set of digits after x, try the following.
awk '
match($0,/[[:space:]]+x[[:space:]]+([0-9]+[[:space:]]+){1,}[0-9]+/){
val=substr($0,RSTART,RLENGTH)
sub(/^[[:space:]]+/,"",val)
print val
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/[[:space:]]+x[[:space:]]+[0-9]+[[:space:]]+[0-9]+$/){ ##Using match function to match regex here.
val=substr($0,RSTART,RLENGTH) ##Creating val which has sub string of matched regex(previous step) length.
sub(/^[[:space:]]+/,"",val) ##Substituting initial space with NULL in val here.
print val ##Printing val here.
}
' Input_file ##mentioning Input_file name here.
I need to match x and print x plus the following two fields:
Using awk without any regex:
awk 'BEGIN {FS=OFS="\t"} {for (i=1; i<=NF; ++i) if ($i == "x") break;
print $i, $(i+1), $(i+2)}' file
x 123 456
x 678 910
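If some lines might not contain a standalone x field at all, a small guard (just a sketch) keeps the loop from printing past the last field:
awk 'BEGIN {FS=OFS="\t"} {for (i=1; i<=NF; ++i) if ($i == "x") break;
if (i <= NF) print $i, $(i+1), $(i+2)}' file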
Or, using gnu sed:
sed -E 's/(^|.*\t)(x(\t[^\t]+){2}).*/\2/' file
x 123 456
x 678 910
If you want to remove everything before the "x", you can run a sed command like this:
sed 's/^.*x/x/g' file.txt
It finds the pattern ^.*x and replaces it with x.
Breakdown of ^.*x:
^ means beginning of a line
.* a wildcard pattern that matches any run of characters (including none)
x the character "x"
Hence it replaces everything on the line up to and including the last "x" with just "x".
For more info on sed's find and replace command, see https://www.cyberciti.biz/faq/how-to-use-sed-to-find-and-replace-text-in-files-in-linux-unix-shell/.
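One more note on that pattern: .* is greedy, so if a line contains several x characters the match extends to the last one, exactly like ${line##*x} earlier. A quick check with a made-up line:
$ printf 'axbxc 1 2\n' | sed 's/^.*x/x/'
xc 1 2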

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt; month.txt; ...). Each new text file would contain all "header" lines (the ones starting with >) and their content (the lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the number of lines is not the same for each category (day, month, etc.). However, each sub-category has the same number of lines (=3).
EDIT: As per the OP, adding one more solution now.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' ' ##Creating field separator as > or _ in current lines.
/^>/{ file=$2".txt" } ##Searching a line which starts with > if yes then creating a variable named file whose value is 2nd field".txt"
{ print > file } ##Printing current line to variable file(which will create file name of variable file's value).
' Input_file ##Mentioning Input_file name here.
The following awk may help you with the same.
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for >name_number format
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) closes the previously opened output file, in case there are too many files to keep open
print > op print input record to filename saved in op
Since each subcategory is composed of the same number of lines, you can use grep's -A / --after-context option to specify the number of lines to print after a matching header.
So if you know the list of categories in advance, you just have to grep their subcategories' headers to redirect them, with their content, to the correct file:
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
grep ">$category" -A $lines_by_subcategory input.txt >> "$category.txt"
done
Note that this isn't the most efficient solution, as it must browse the input once for each category. Other solutions could instead browse the content and redirect each subcategory to its respective file in a single pass.
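For instance, here is a minimal single-pass sketch in plain bash (assuming the file starts with a header line and every header starts with > and contains an underscore; note the >> append, so remove old output files before re-running):
while IFS= read -r line; do
  if [[ $line == '>'* ]]; then
    out=${line%%_*}     # '>day' or '>month'
    out=${out#>}.txt    # 'day.txt' or 'month.txt'
  fi
  printf '%s\n' "$line" >> "$out"
done < input.txt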

Formatting based on condition in bash

Here is my code:
grep -E -o "\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b|first_name.{0,40}|[(]?[2-9]{1}[0-9]{2}[)-. ]?[2-9]{1}[0-9]{2}[-. ]?[0-9]{4}" file.txt | awk -v ORS= '
NR>1 && !/,/ {print "\n"}
{print}
END {if (NR) print "\n"}' | sed -e :a -e '$!N;s/\n[0-9]{3}/,/;ta' -e 'P;D' | sed '$!N;s/\n\s*[0-9]//;P;D'
I'm pretty close. The above code works, but is removing the first digit from phone number.
I'm looking for a bash solution to do the following:
Combine two lines if the lines do not start with a number.
If the line starts with a number, combine the previous two lines + the line with the number for 3 fields in one line.
Here's an example:
jim.bob3#email.com
Jim Bob
jane.bob#email.com
Jane Bob
joebob1122#email.com
Joe Bob
555 555 5555
jbob44#email.com
Jeff Bob
....
Results:
jim.bob3#email.com Jim Bob
jane.bob#email.com Jane Bob
joebob1122#email.com Joe Bob 555 555 5555
jbob44#email.com Jeff Bob
Thanks!
If your Input_file is the same as the shown sample then the following awk solution may help you with the same.
awk '{printf("%s",$0~/#/&&FNR>1?RS $0:FNR==1?$0:FS $0)} END{print ""}' Input_file
Output will be as follows.
jim.bob3#email.com Jim Bob
jane.bob#email.com Jane Bob
joebob1122#email.com Joe Bob 555 555 5555
jbob44#email.com Jeff Bob
Explanation: the following code is only for understanding purposes, NOT for running; use the code above for actually running it.
awk '{printf(\ ##Using printf keyword from awk here to print the values etc.
"%s",\ ##Mentioning %s means it tells printf that we are going to print a string here.
$0~/#/&&FNR>1\ ##Checking the condition here: if a line contains # (i.e. it is an email line) and its line number is greater than 1 then:
?\ ##? means following statement will be printed as condition is TRUE.
RS $0\ ##printing RS(record separator) and current line here.
:\ ##: means in case mentioned above condition was NOT TRUE then perform following steps:
FNR==1\ ##Checking again condition here if a line number is 1 then do following:
?\ ##? means execute statements in case above condition is TRUE following ?
$0\ ##printing simply current line here.
:\ ##: means in case above mentioned conditions NOT TRUE then perform actions following :
FS $0)} ##Printing FS(field separator) and current line here.
END{print ""}' file24 ##Printing a NULL value here to print a new line and mentioning the Input_file name here too.
Using awk
awk '/#/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
Input:
$ cat infile
jim.bob3#email.com
Jim Bob
jane.bob#email.com
Jane Bob
joebob1122#email.com
Joe Bob
555 555 5555
jbob44#email.com
Jeff Bob
Output:
$ awk '/#/{if(s)print s;s=""}{s=(s?s OFS:"")$0}END{if(s)print s}' infile
jim.bob3#email.com Jim Bob
jane.bob#email.com Jane Bob
joebob1122#email.com Joe Bob 555 555 5555
jbob44#email.com Jeff Bob
Explanation:
awk '/#/{ # if row/line/record contains #
if(s)print s; # if variable s was set before print it.
s="" # nullify or reset variable s
}
{
s=(s?s OFS:"")$0 # concatenate variable s with its previous content if it was set before, with
# OFS o/p field separator and
# current row/line/record ($0),
# otherwise s will be just current record
}
END{ # end block
if(s)print s # if s was set before print it
}
' infile

Sum Values in Shell Script

I have the following values in this format in a file:
Filevalue.txt
abc
123
dev
456
hij
567
123
542
I need to sum the numerical values that appear below each alphabetic label.
Output
abc 123
dev 456
hij 1232
Any help will be deeply appreciated.
Here's another awk that maintains the order.
Example
$ awk '/[0-9]+/{a+=$1;next}a{print a;a=0}{printf($1FS);a=0}END{print a}' file
abc 123
dev 456
hij 1232
Explanation
/[0-9]+/{a+=$1;next}: When a number is detected as the content of the record, its value is accumulated into a variable; then next is used to stop further processing and pass control to the next record.
a{print a;a=0}: Only when the accumulator is non-zero do we print its value, which corresponds to the previous word, and reset it.
{printf($1FS);a=0}: Print the current record plus the field separator, without a trailing newline. This is applied to all text records.
END{print a}: Print the final accumulated value that remains after the last record.
It's possible to do this by using awk and arrays:
$ awk '{if($1!~/^[0-9]+$/){cur=$1}else{sums[cur]+=$1}}END{for(el in sums){printf("%s %d\n",el,sums[el])}}' Filevalue.txt
hij 1232
dev 456
abc 123
Here is the same code but written to the file sum.awk and with comments:
# this block will be executed for every line in the file
{
# if the current line is not a number, then it's the name of the next
# group; save it in the cur variable
if ($1 !~ /^[0-9]+$/) {
cur = $1
} else {
# here we're summing values
sums[cur] += $1
}
}
# this block will be executed at the end of file processing; here we're
# printing the array of sums
END {
for (el in sums) {
printf("%s %d\n", el, sums[el])
}
}
Use it like this:
$ awk -f sum.awk Filevalue.txt
hij 1232
dev 456
abc 123
The only downside of this approach is that the for (el in sums) loop doesn't preserve the order of the keys.
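If you do need the output in input order, one possible tweak is to also record the order in which labels first appear and loop over that in END (a sketch, not the only way):
$ awk '$1 !~ /^[0-9]+$/ { cur=$1; if (!(cur in sums)) { order[++n]=cur; sums[cur]=0 }; next }
{ sums[cur] += $1 }
END { for (i=1; i<=n; i++) printf("%s %d\n", order[i], sums[order[i]]) }' Filevalue.txt
abc 123
dev 456
hij 1232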
