Error with awk (newline or end of string) - bash

I am having an issue with the following command:
awk ‘{if ($1 ~ /^##contig/) {next}else if ($1 ~ /^#/) {print $0; next}else {print $0 | “sort -k1,1V -k2,2n”}’ file.vcf > out.vcf
It gives the following error:
^ unexpected newline or end of string

Your command contains "fancy quotes" instead of normal ones, in addition to a missing }.
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }' file.vcf > out.vcf
Changing your command to the above should work as expected.
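If it is not obvious where the offending characters are, one way to locate smart quotes (or any other non-ASCII bytes) in a saved script is to search for bytes outside the ASCII range. This is just a sketch: it assumes GNU grep (for -P) and uses script.sh as a placeholder file name:
LC_ALL=C grep -nP '[^\x00-\x7F]' script.sh
Each reported line contains at least one non-ASCII character, which in cases like this is usually a typographic quote pasted from a web page or word processor.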

Get substring of regex from string

I am using a bash script to parse logs and need to extract a substring.
My string is something like this:
TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing
How can I extract the substring starting with image using bash?
Would you please try the following:
awk '
/FROM image/ {
    if (match($0, /image[^[:space:]]+/))
        print(substr($0, RSTART, RLENGTH))
}
' logfile
The regex image[^[:space:]]+ matches a substring which starts with
image and is followed by non-space character(s).
The match() function then sets the awk variables RSTART and RLENGTH to the position
and the length of the matched substring.
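Since the question mentions a bash script, the same extraction can also be done in bash itself with the =~ operator and BASH_REMATCH, without calling awk. A minimal sketch, assuming the text is already in a variable named line:
line='TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing'
if [[ $line =~ image[^[:space:]]+ ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"   # prints image/something/docker:2.6.1.566-978.7
fi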
Another awk option:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
Or using different log string:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"bdkjf asfjkklsdfsg FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
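If awk is not a requirement, grep -o prints only the part of each line that matches, so the same result can be obtained with (assuming the text is in logfile):
grep -oE 'image[^[:space:]]+' logfile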

How to compare two different files and two different columns bash

file 1
Client ID,USER ID,DH SERV, ...
,abs,2022-04-24, ...
,btg,2022-04-24, ...
file 2
abs,124235235
dsg,262356527
If the second column of file 1 equals the first column of file 2, then put the second column of file 2 into the first column of file 1.
I need to get:
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
How can I do this?
These are my attempts, but I don't understand awk very well:
#!/bin/bash
#awk -F, 'FNR==NR{a[$1]=$0;next} ($1 in a){print $2,a[$1]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{a[FNR]=$1; next} {$2 == a[FNR] ? a[FNR]","$0 : $0}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{a[FNR]=$1; next} {$2 == a[FNR] ? a[FNR]","$0 : $0}' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv > test
#awk -F, '{print FILENAME, NR, FNR, a[FNR]=$2,"||", b[NR]=$1}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#Work
#awk -F, 'NR==FNR{A[$2]; next}$1 in A' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{A[NR]=$1; next}($2 in A) {print A[NR]}' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv> test
#awk -F, 'NR==FNR{A[$2]=$2; next}$1 in A{print A[$2]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'FNR==NR{A[$1]=$1; next}$2 in A{print A[$1]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
awk -F, 'NR==FNR {arr[$1]=$2 $1; next}
{print arr[$1]","$0}
' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv > test
$ awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2; next} $2 in a{$1=a[$2]} 1' file2 file1
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
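For readers less familiar with the NR==FNR idiom, the one-liner above can be spelled out with comments like this (the same program, just reformatted):
awk '
BEGIN   { FS = OFS = "," }     # read and write comma-separated fields
NR==FNR { a[$1] = $2; next }   # while reading file2: map user id -> client id
$2 in a { $1 = a[$2] }         # in file1: if the user id is known, fill column 1
1                              # print every line of file1, changed or not
' file2 file1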
This should do the trick:
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$2} NR!=FNR{if($2 in a){print a[$2],$2,$3,$4 }else{print $0}}' file2 file1
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
Just using bash's associative and indexed arrays:
f(){
    # delimit input/output with comma not whitespace
    local IFS=,
    # treat m as an associative array
    declare -A m
    # initialize m from the second file: user id -> client id
    while read -r k v; do
        m[$k]=$v
    done <"$2"
    # process lines of the first file
    while read -ra i; do
        # get the second column (the user id)
        k=${i[1]}
        # fill the first column from the map, keeping it as-is when there is no match
        i[0]=${m[$k]:-${i[0]}}
        # print the line (quoting forces interpolation of IFS)
        echo "${i[*]}"
    done <"$1"
}
f "file 1" "file 2"

awk-IF...ELSE IF issue in command

cat file1
xizaoshuijiao #E0488_5#
chifandaqiu #E0488_3#
gongzuoyouxi #E0977_5#
cat file2
#E0488_3# #E0488_3#
#E0488_5# #E0488_5#
#E0977_3# #E0977_3#
#E0977_5# #E0977_5#
#E0977_6# #E0977_6#
Purpose: if $NF of file1 is found in file2's $1, then replace $NF in file1 with file2's $2; otherwise, make no change.
My code:
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else if!($NF in a)\
{print $0}\
}' file2 file1
Then it came error:
awk: cmd. line:5: else if!($NF in a)\
awk: cmd. line:5: ^ syntax error
awk: cmd. line:6: {print $0}\
awk: cmd. line:6: ^ syntax error
So it seems to be a "!" issue. I want to print all content of file1 (both changed and unchanged lines). How can I do it?
You can rewrite it in this form:
awk 'NR==FNR {a[$1]=$2; next}
     $NF in a {$NF=a[$NF]}1' file2 file1
Since your file2 has the same values for $1 and $2, the replacement seems useless here.
Since you want to print unconditionally, don't print in the condition block. Here the trailing 1 corresponds to {print}, which is the same as {print $0}.
Replace:
if!($NF in a)
With:
if(!($NF in a))
! is part of the test condition, and awk expects the whole test condition to be inside the parentheses.
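Applied to the original program, the corrected condition would look roughly like this (a sketch that stores a[$1]=$2 so the replacement actually uses file2's second column; note that else if (!($NF in a)) is equivalent to a plain else, since it covers exactly the lines the first test rejects):
awk 'NR==FNR {a[$1]=$2; next}
     {
       if ($NF in a)          {$NF = a[$NF]; print}
       else if (!($NF in a))  {print}    # same effect as a plain else
     }' file2 file1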
Here is my code after verification:
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else # a plain else works, no need for else if..., but why? How can I achieve it with else if !($NF in a)?
{print $0}\
}' file2 file1

Remove duplicate from csv using bash / awk

I have a csv file with the format :
"id-1"|"A"
"id-2"|"C"
"id-1"|"B"
"id-1"|"D"
"id-2"|"B"
"id-3"|"A"
"id-3"|"A"
"id-1"|"B"
I want to group by the unique ids in the first column and concatenate the types into a single row, like this:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
I found that awk does a great job of handling such scenarios, but all I could achieve is this:
"id-1"|"A":"B":"D":"B"
"id-2"|"B":"C"
"id-3"|"A":"A"
I used this command:
awk -F "|" '{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS="|" file
How can I remove the duplicates and also handle the formatting of the second column types?
quick fix:
$ awk -F "|" '!seen[$0]++{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS="|" file
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
!seen[$0]++ will be true only if the line has not already been seen.
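On its own, !seen[$0]++ is the standard awk idiom for dropping duplicate lines while keeping the original order, e.g.:
$ printf '"id-3"|"A"\n"id-3"|"A"\n"id-1"|"B"\n' | awk '!seen[$0]++'
"id-3"|"A"
"id-1"|"B"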
If the second column should all be within double quotes:
$ awk -v dq='"' 'BEGIN{FS=OFS="|"}
!seen[$0]++{a[$1]=a[$1] ? a[$1]":"$2 : $2}
END{for (i in a){gsub(dq,"",a[i]); print i, dq a[i] dq}}' file
"id-1"|"A:B:D"
"id-2"|"C:B"
"id-3"|"A"
With GNU awk for true multi-dimensional arrays and gensub() and sorted_in:
$ awk -F'|' '
{ a[$1][gensub(/"/,"","g",$2)] }
END {
PROCINFO["sorted_in"] = "#ind_str_asc"
for (i in a) {
c = 0
for (j in a[i]) {
printf "%s%s", (c++ ? ":" : i "|\""), j
}
print "\""
}
}
' file
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
The output rows and columns will both be string-sorted (i.e. alphabetically by characters) in ascending order.
Short GNU datamash + tr solution:
datamash -st'|' -g1 unique 2 <file | tr ',' ':'
The output:
"id-1"|"A":"B":"D"
"id-2"|"B":"C"
"id-3"|"A"
----------
If the between-item double quotes should be eliminated, use the following alternative:
datamash -st'|' -g1 unique 2 <file | sed 's/","/:/g'
The output:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"
For the sample input, the following one-liners will work, though the output is unsorted.
One-liners:
# using two arrays (recommended)
awk 'BEGIN{FS=OFS="|"}!seen[$1,$2]++{a[$1] = ($1 in a ? a[$1] ":" : "") $2}END{for(i in a)print i,a[i]}' infile
# using regexp
awk 'BEGIN{FS=OFS="|"}{ a[$1] = $1 in a ? ( a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2 ) : $2}END{for(i in a)print i,a[i]}' infile
Test Results:
$ cat infile
"id-1"|"A"
"id-2"|"C"
"id-1"|"B"
"id-1"|"D"
"id-2"|"B"
"id-3"|"A"
"id-3"|"A"
"id-1"|"B"
$ awk 'BEGIN{FS=OFS="|"}!seen[$1,$2]++{a[$1] = ($1 in a ? a[$1] ":" : "") $2}END{for(i in a)print i,a[i]}' infile
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
$ awk 'BEGIN{FS=OFS="|"}{ a[$1] = $1 in a ? ( a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2 ) : $2}END{for(i in a)print i,a[i]}' infile
"id-1"|"A":"B":"D"
"id-2"|"C":"B"
"id-3"|"A"
More readable:
Using a regexp:
awk 'BEGIN{
    FS=OFS="|"
}
{
    a[$1] = $1 in a ? (a[$1] ~ ("(^|:)"$2"(:|$)") ? a[$1] : a[$1]":"$2) : $2
}
END{
    for(i in a)
        print i,a[i]
}
' infile
Using two arrays:
awk 'BEGIN{
    FS=OFS="|"
}
!seen[$1,$2]++{
    a[$1] = ($1 in a ? a[$1] ":" : "") $2
}
END{
    for(i in a)
        print i,a[i]
}' infile
Note: you can also use !seen[$0]++, which uses the entire line as the index; but if in your real data you want to key on specific columns, use !seen[$1,$2]++, where column 1 and column 2 together form the index.
awk + sort solution:
awk -F'|' '{ gsub(/"/,"",$2); a[$1]=b[$1]++? a[$1]":"$2:$2 }
END{ for(i in a) printf "%s|\"%s\"\n",i,a[i] }' <(sort -u file)
The output:
"id-1"|"A:B:D"
"id-2"|"B:C"
"id-3"|"A"

Multiple pattern matching

I have an input file with columns separated by | as follows.
[3yu23yuoi]|$name
!$fjkdjl|[kkklkl]
$hjhj|$mmkj
I want the output as
0 $name
!$fjkdjl 0
$hjhj $mmkj
Whenever a field begins with $, !$, or "any", I want it printed as-is; otherwise 0.
I have tried the following command. It just prints everything the same as the input file:
awk -F="|" '{if (($1 ~ /^.*\$/) || ($1 ~ /^.*\!$/) || ($1 ~ /^any/)) {print $1} else if ($1 ~ /^\[.*/){print "0"} else if (($2 ~ /^.*\$/) || ($2 ~ /^.*\!$/) || ($2 ~ /^any/)) {print $2} else if($2 ~ /^\[.*/){print "0"}}' input > output
This should do:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|any)/) $i=0}1' file
0 $name
!$fjkdjl 0
$hjhj $mmkj
If a field does not start with $, !$, or any, it is set to 0.
Or if you prefer tab as the separator:
awk -F\| '{$1=$1;for (i=1;i<=NF;i++) if ($i!~/^(\$|!\$|^any)/) $i=0}1' OFS="\t" file
0 $name
!$fjkdjl 0
$hjhj $mmkj
$1=$1 makes sure every line is rebuilt with the output separator, even if no field was changed.
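To see why the $1=$1 is needed: awk only rebuilds a record with OFS when some field is assigned, so a line where no field gets set to 0 would otherwise keep its original | separators. A minimal illustration:
$ printf '$hjhj|$mmkj\n' | awk -F\| -v OFS='\t' '1'
$hjhj|$mmkj
$ printf '$hjhj|$mmkj\n' | awk -F\| -v OFS='\t' '{$1=$1}1'
$hjhj	$mmkj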
