Awk replace after loop - bash

I want to override the child_value with the parent_value using awk. The solution must be generic, applicable to larger data sources. The parent record is defined by $1==$2.
This is my input file (format: ID;PARENT_ID;VALUE):
10;20;child_value
20;20;parent_value
This is the result I want:
10;20;parent_value
20;20;parent_value
This is my current approach:
awk -F\; '
BEGIN {
OFS = FS
}
{
if ($1 == $2) {
mapping[$1] = $3
}
all[$1]=$0
}
END {
for (i in all) {
if (i[$3] == 'child_value') {
i[$3] = mapping[i]
}
print i
}
}
' file.in
Needless to say, it doesn't work like that ;-) Can anyone help?

For multiple parent/child pairs, perhaps on non-consecutive lines...
$ awk -F\; -v OFS=\; 'NR==FNR {if($1==$2) a[$2]=$3; next}
$1!=$2 {$3=a[$2]}1' file{,}
10;20;parent_value
20;20;parent_value
This assumes the second field is the parent ID.
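Here file{,} is bash brace expansion: it expands to the filename twice, so awk reads the same file in two passes. NR==FNR is true only during the first pass, where the parent values are collected; the second pass rewrites the child rows. You can check the expansion itself with:
$ echo file{,}
file file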

Well, if your data is sorted in descending order before processing (you could use sort if it isn't sorted at all, or tac if it's sorted in ascending order), it's enough to hash the first entry of each key in $2 and use that value on the first match for the following records with the same key in $2:
$ sort -t\; -k2nr -k1nr bar | \
awk '
BEGIN{
FS=OFS=";"
}
{
if($2 in a) # if $2 in hash a, use it
$3=a[$2]
else # else add it
a[$2]=$3
if(p!=$2) # delete previous entries to avoid wasting memory
delete a[p]
p=$2 # p is for previous on next round
}1'
20;20;parent_value
10;20;parent_value


Bash group by on the basis of n number of columns

This is related to my previous question (bash command for group by count).
What if I want to generalize this? For instance
The input file is
ABC|1|2
ABC|3|4
BCD|7|2
ABC|5|6
BCD|3|5
The output should be
ABC|9|12
BCD|10|7
The result is calculated by grouping on the first column and adding up the values of the 2nd and 3rd columns, similar to the GROUP BY command in SQL.
I tried modifying the command provided in the link but failed. I don't know whether I'm making a conceptual error or a silly mistake, but none of the commands below work.
Command used
awk -F "|" '{arr[$1]+=$2} END arr2[$1]+=$5 END {for (i in arr) {print i"|"arr[i]"|"arr2[i]}}' sample
awk -F "|" '{arr[$1]+=$2} END {arr2[$1]+=$5} END {for (i in arr) {print i"|"arr[i]"|"arr2[i]}}' sample
awk -F "|" '{arr[$1]+=$2 arr2[$1]+=$5} END {for (i in arr2) {print i"|"arr[i]"|"arr2[i]}}' sample
Additionally, what I'm trying here is limited to summing only 2 columns. What if there are n columns, and we want to perform operations such as addition on one column and subtraction on another? How can this be further modified?
Example
ABC|1|2|4|......... upto n columns
ABC|4|5|6|......... upto n columns
DEF|1|4|6|......... upto n columns
Let's say a sum is needed for the first column, maybe an average for the second column, some other operation for the third column, etc. How can this be tackled?
For 3 fields (key and 2 data fields):
$ awk '
BEGIN { FS=OFS="|" } # set separators
{
a[$1]+=$2 # sum second field to a hash
b[$1]+=$3 # ... b hash
}
END { # in the end
for(i in a) # loop all
print i,a[i],b[i] # and output
}' file
BCD|10|7
ABC|9|12
More generic solution for n columns using GNU awk:
$ awk '
BEGIN { FS=OFS="|" }
{
for(i=2;i<=NF;i++) # loop all data fields
a[$1][i]+=$i # sum them up to related cells
a[$1][1]=i # set field count to first cell
}
END {
for(i in a) {
b=""                           # reset the buffer
for(j=2;j<a[i][1];j++)         # buffer output
b=b (b==""?"":OFS) a[i][j]
print i,b # output
}
}' file
BCD|10|7
ABC|9|12
The latter is only tested for 2 fields (busy at a meeting :).
A gawk approach using a multidimensional array:
awk 'BEGIN{ FS=OFS="|" }{ a[$1]["f2"]+=$2; a[$1]["f3"]+=$3 }
END{ for(i in a) print i,a[i]["f2"],a[i]["f3"] }' file
a[$1]["f2"]+=$2 - summing up values of the 2nd field (f2 - field 2)
a[$1]["f3"]+=$3 - summing up values of the 3rd field (f3 - field 3)
The output:
ABC|9|12
BCD|10|7
Additional short datamash solution (will give the same output):
datamash -st\| -g1 sum 2 sum 3 <file
-s - sort the input lines
-t\| - field separator
sum 2 sum 3 - sums up values of the 2nd and 3rd fields respectively
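Since each field gets its own operation name, GNU datamash also covers the mixed-operations part of the question directly, e.g. sum for the 2nd field and mean for the 3rd:
datamash -st\| -g1 sum 2 mean 3 <file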
awk -F\| '{ array[$1]="";for (i=1;i<=NF;i++) { arr[$1,i]+=$i } } END { for (i in array) { printf "%s",i;for (p=2;p<=NF;p++) { printf "|%s",arr[i,p] } print "" } }' filename
We use two arrays, array and arr: array is a single-dimensional array tracking all the first pieces, and arr is a multidimensional array keyed on the first piece plus the field index, so for the first input line arr["ABC",2]=1 and arr["ABC",3]=2. At the end we loop through array and, for each field in the data set, pull the data out of the multidimensional array arr.
This will work in any awk and will retain the input key order in the output:
$ cat tst.awk
BEGIN { FS=OFS="|" }
!seen[$1]++ { keys[++numKeys] = $1 }
{
for (i=2;i<=NF;i++) {
sum[$1,i] += $i
}
}
END {
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "%s%s", key, OFS
for (i=2;i<=NF;i++) {
printf "%s%s", sum[key,i], (i<NF?OFS:ORS)
}
}
}
$ awk -f tst.awk file
ABC|9|12
BCD|10|7
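To address the last part of the question (a different operation per column), one approach is to pass an operation list that parallels the data columns. A minimal sketch, assuming only sum and avg are needed; the ops variable and its comma-separated format are illustrative, not from the question. It works in any awk and keeps the input key order:
$ cat ops.awk
BEGIN { FS=OFS="|"; split(ops,op,",") }  # op[1] applies to field 2, op[2] to field 3, ...
!seen[$1]++ { keys[++numKeys] = $1 }     # remember input key order
{
  cnt[$1]++                              # rows per key, needed for avg
  for (i=2; i<=NF; i++)
    sum[$1,i] += $i                      # accumulate every data column
}
END {                                    # NF here is from the last line read,
  for (keyNr=1; keyNr<=numKeys; keyNr++) { # so a uniform column count is assumed
    key = keys[keyNr]
    out = key
    for (i=2; i<=NF; i++)                # anything not "avg" falls back to the sum
      out = out OFS (op[i-1]=="avg" ? sum[key,i]/cnt[key] : sum[key,i])
    print out
  }
}
$ awk -v ops="sum,avg" -f ops.awk file
ABC|9|4
BCD|10|3.5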

Finding hits based on subgroups (A, B, C) in big data set

I have a huge amount of data to analyze.
From this example file I need to know all hits ("samples") that were found either
only in B,
only in C,
in A and C or
in A and B,
but not the ones that are found
in B and C or
in A, B and C.
[edit: to keep it simpler: there should be no co-occurrence of B and C]
These letters are found in column $8.
The first two columns together can be used as an identifier for each "sample".
Example: You can see that for "463;88" we find A and C in column $8, which would make "463;88" a hit that I need in a separate output file. "348;64" is found in A, B and C and would therefore be discarded/ignored.
File1.csv
463;88;1;193187729;280062;CDC73;IS;A;0.0
463;88;1;193188065;280062;CDC73;IS;A;0.0
463;88;1;193188527;280062;CDC73;IS;A;0.0
463;88;1;193188542;280062;CDC73;IS;C;0.0
348;64;1;155219446;384172;GBAP1;IS;B;0.0
348;64;1;155224629;384172;GBAP1;IS;C;0.0
348;64;1;155224965;384172;GBAP1;IS;A;0.0
71;35;2;27400461;145220;PPM1G;IS;A;0.0
71;35;2;27400930;145220;PPM1G;IS;A;0.0
71;35;2;27401162;145220;PPM1G;IS;A;0.0
71;35;2;27403518;145220;PPM1G;IS;B;0.0
71;35;2;27403545;145220;PPM1G;IS;B;0.0
71;35;2;27404353;145220;PPM1G;IS;B;0.0
71;35;2;27419156;145220;NRBP1;IS;B;0.0
7;14;20;2894103;92099;PTPRA;IS;B;0.0
7;14;20;2906211;92099;PTPRA;IS;C;0.0
7;14;20;2907301;92099;PTPRA;IS;C;0.0
...
Does anyone have a suggestion how to do this, eg with bash, awk, grep...?
It does not need to be very efficient or fast, it just needs to be reliable.
Edit:
In several steps, I generated a csv table with columns 1 and 2 of the lines that contain <3 different entries in column $8.
awk print $1, $2, $8 | sort -n | uniq > file.tmp1
awk print $1, $2 from file.tmp1 | sort -n | uniq -c | sed for csv format > file.tmp2
Finally,
awk to print only the identifier columns from file.tmp2 where the count was <3 (= only one or two different letters in column $8 of original file).
File2.csv
6;3;
12;9;
348;40;
463;88;
...
Then, I wanted to use
fgrep --file=File2.csv File1.csv
but this does not seem to work properly. And it still requires manual analysis, as it also gives me false hits.
Another alternative. This keeps only the keys of the lines to be deleted, but scans the file twice. Also, the file doesn't need to be sorted.
$ awk -F';' '{k=$1 FS $2}
NR==FNR {if($8=="B") b[k];
else if($8=="C") c[k];
if(k in b && k in c) d[k];
next}
!(k in d)' file{,}
463;88;1;193187729;280062;CDC73;IS;A;0.0
463;88;1;193188065;280062;CDC73;IS;A;0.0
463;88;1;193188527;280062;CDC73;IS;A;0.0
463;88;1;193188542;280062;CDC73;IS;C;0.0
71;35;2;27400461;145220;PPM1G;IS;A;0.0
71;35;2;27400930;145220;PPM1G;IS;A;0.0
71;35;2;27401162;145220;PPM1G;IS;A;0.0
71;35;2;27403518;145220;PPM1G;IS;B;0.0
71;35;2;27403545;145220;PPM1G;IS;B;0.0
71;35;2;27404353;145220;PPM1G;IS;B;0.0
71;35;2;27419156;145220;NRBP1;IS;B;0.0
With gawk's bitwise operations, this can be further simplified to:
$ awk -F';' 'BEGIN {c["B"]=1; c["C"]=2}
{k=$1 FS $2}
NR==FNR {d[k]=or(d[k],c[$8]); next}
d[k]!=3' file{,}
or() updates the accumulated value when "B" or "C" is seen, and it is idempotent, so duplicate letters don't change anything. If both have been seen, the value will be 3; the second pass prints every line whose key didn't reach 3.
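or() is gawk's built-in bitwise OR; letters other than B and C map to an empty c[$8], which it treats as 0, leaving the value unchanged:
$ gawk 'BEGIN{print or(0,1), or(1,1), or(1,2)}'
1 1 3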
You don't show the expected output in your question, so this is a guess, but is it what you're looking for?
$ cat tst.awk
BEGIN { FS=OFS=";" }
{ curr = $1 FS $2 }
curr != prev { prt(); prev=curr }
{ lines[++numLines]=$0; seen[$8]++ }
END { prt() }
function prt() {
if ( !(seen["B"] && seen["C"]) ) {
for ( lineNr=1; lineNr<=numLines; lineNr++) {
print lines[lineNr]
}
}
delete seen
numLines=0
}
$ awk -f tst.awk file
463;88;1;193187729;280062;CDC73;IS;A;0.0
463;88;1;193188065;280062;CDC73;IS;A;0.0
463;88;1;193188527;280062;CDC73;IS;A;0.0
463;88;1;193188542;280062;CDC73;IS;C;0.0
71;35;2;27400461;145220;PPM1G;IS;A;0.0
71;35;2;27400930;145220;PPM1G;IS;A;0.0
71;35;2;27401162;145220;PPM1G;IS;A;0.0
71;35;2;27403518;145220;PPM1G;IS;B;0.0
71;35;2;27403545;145220;PPM1G;IS;B;0.0
71;35;2;27404353;145220;PPM1G;IS;B;0.0
71;35;2;27419156;145220;NRBP1;IS;B;0.0
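Note that this relies on all lines with the same $1/$2 key being consecutive, as they are in the sample; if they can be scattered across the file, sort the input first:
$ sort -t';' -k1,1n -k2,2n file | awk -f tst.awk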
Something like this should work, unless you run out of memory:
BEGIN { FS=";"; }
{
keys[$1,$2] = 1;
data[$1,$2,$8] = $0;
}
END {
for (key in keys) {
a = data[key,"A"];
b = data[key,"B"];
c = data[key,"C"];
if (!(b && c)) {
if (a) { print a; }
if (b) { print b; }
if (c) { print c; }
}
}
}
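One caveat: data[$1,$2,$8] keeps only the last line read for each key/letter combination, so at most one line per letter is printed for each sample, rather than every matching line.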
Assuming that all lines with the same key are consecutive, this should work (the guard on key skips the empty initial group, and the END block flushes the final one):
BEGIN { FS=";"; }
$1";"$2 != key {
if (key != "" && !(data["B"] && data["C"])) {
print key;
}
delete data;
key = $1";"$2;
}
{
data[$8] = 1;
}
END {
if (key != "" && !(data["B"] && data["C"])) {
print key;
}
}
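Saved as, say, consecutive.awk (the name is just for illustration), this prints only the qualifying keys, one per group:
$ awk -f consecutive.awk File1.csv
463;88
71;35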

Using Awk and match()

I have a sequencing file to analyze that has many lines like the following tab-separated line:
chr12 3356475 . C A 76.508 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=3;CIGAR=1X;DP=3;DPB=3;DPRA=0;EPP=9.52472;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=8.76405;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=111;QR=0;RO=0;RPP=9.52472;RPPR=0;RUN=1;SAF=3;SAP=9.52472;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:RO:QR:AO:QA:GL 1/1:3:0:0:3:111:-10,-0.90309,0
I am trying to use awk to match particular regions to their DP value. This is how I'm trying it:
awk '$2 == 33564.. { match(DP=) }' file.txt | head
Neither the matching nor the wildcards seem to work.
Ideally this code would output 3 because that is what DP equals.
You can use both ; and tab as field delimiters. Doing so, you can access the number in $2 and the DP= field in $14:
awk -F'[;\t]' '$2 ~ /33564../{sub(/DP=/,"",$14);print $14}' file.txt
The sub function is used to remove DP= from $14 which leaves only the value.
Btw, if you also add = to the set of field delimiters the value of DP will be in field 21:
awk -F'[;\t=]' '$2 ~ /33564../{print $21}' file.txt
Having worked with genomic data, I believe that the following will be more robust than the previously posted solution. The main difference is that the key-value pairs are treated as such, without any assumption about their ordering, etc. The minor difference is the caret ("^") in the regex:
awk -F'\t' '
$2 ~ /^33564../ {
n=split($8,a,";");
for(i=1;i<=n;i++) {
split(a[i],b,"=");
if (b[1]=="DP") {print $2, b[2]} }}'
If this script is to be used more than once, then it would be better to abstract the lookup functionality, e.g. like so:
awk -F'\t' '
function lookup(key, string, i,n,a,b) {
n=split(string,a,";");
for(i=1;i<=n;i++) {
split(a[i],b,"=");
if (b[1]==key) {return b[2]}
}
}
$2 ~ /^33564../ {
val = lookup("DP", $8);
if (val) {print $2, val;}
}' file.txt
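For the sample line above (where $2 is 3356475 and DP=3), both variants print:
3356475 3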

Unix AWK field separator - finding sum of one field grouped by another

I am using the awk command below, which returns each unique value of field $11 and its number of occurrences in the file, separated by commas. Along with that, I am looking for the sum of field $14 (the last value) in the output. Please help me with it.
sample string in file
EXSTAT|BNK|2014|11|05|15|29|46|23169|E582754245|QABD|S|000|351
$14 is the last value, 351.
bash-3.2$ grep 'EXSTAT|' abc.log|grep '|S|' |
awk -F"|" '{ a[$11]++ } END { for (b in a) { print b"," a[b] ; } }'
QDER,3
QCOL,1
QASM,36
QBEND,23
QAST,3
QGLBE,30
QCD,30
TBENO,1
QABD,9
QABE,5
QDCD,5
TESUB,1
QFDE,12
QCPA,3
QADT,80
QLSMR,6
bash-3.2$ grep 'EXSTAT|' abc.log
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226
Just add another associative array:
awk -F"|" '{a[$11]++;c[$11]+=$14}END{for(b in a){print b"," a[b]","c[b]}}'
tested below:
> cat temp
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226
> awk -F"|" '{a[$11]++;c[$11]+=$14}END{for(b in a){print b"," a[b]","c[b]}}' temp
QGLBE,1,424
QADT,20,5510
QASM,1,1592
QCD,1,373
>
You need not use grep to search the file for EXSTAT; awk can do that for you as well.
For example:
awk 'BEGIN{FS="|"; OFS=","} $1~EXSTAT && $12~S {sum[$11]+=$14; count[$11]++}END{for (i in sum) print i,count[i],sum[i]}' abc.log
for the input file abc.log with contents
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226
it will give an output as
QASM,1,1592
QGLBE,1,424
QADT,20,5510
QCD,1,373
What does it do?
BEGIN{FS="|"; OFS=","} is executed before the input file is processed. It sets FS, the input field separator, to | and OFS, the output field separator, to ,
$1~/EXSTAT/ && $12~/S/ {sum[$11]+=$14; count[$11]++} is the action performed for each line
$1~/EXSTAT/ && $12~/S/ checks whether the first field matches EXSTAT and the 12th field matches S
sum[$11]+=$14 sums field $14 into the array sum, indexed by $11
count[$11]++ increments the array count, indexed by $11
END{for (i in sum) print i,count[i],sum[i]} is executed at the end of the file and prints the contents of the arrays
You can use a second array.
awk -F"|" '/EXSTAT\|/&&/\|S\|/{a[$11]++}/EXSTAT\|/{s[$11]+=$14}\
END{for(b in a)print b","a[b]","s[b];}' abc.log
Explanation
/EXSTAT\|/&&/\|S\|/{a[$11]++} on lines that contain both EXSTAT| and |S|, increment a[$11].
/EXSTAT\|/ on lines containing EXSTAT| add $14 to s[$11]
END{for(b in a)print b","a[b]","s[b];} print out all keys in array a, values of array a, and values of array s, separated by commas.
#!/usr/bin/awk -f
BEGIN {
FS = "|"
}
$1 == "EXSTAT" && $12 == "S" {
foo[$11] += $14
}
END {
for (bar in foo)
printf "%s,%s\n", bar, foo[bar]
}
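With the sample log above, this prints the per-code sums in unspecified order (note that this variant omits the counts):
QGLBE,424
QCD,373
QASM,1592
QADT,5510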

How to remove several columns and the field separators at once in AWK?

I have a big file with several thousands of columns. I want to delete some specific columns and the field separators at once with AWK in Bash.
I can delete one column at a time with this one-liner (column 3 will be deleted along with its corresponding field separator):
awk -vkf=3 -vFS="\t" -vOFS="\t" '{for(i=kf; i<NF;i++){ $i=$(i+1);}; NF--; print}' < Big_File
However, I want to delete several columns at once... Can someone help me figure this out?
You can pass the list of columns to be deleted from the shell to awk like this:
awk -vkf="3,5,11" ...
then in the awk program parse it into an array:
split(kf,kf_array,",")
and then go through all the columns and test whether each particular column is in kf_array, and possibly skip it.
Another possibility is to call your one-liner several times :-)
Here is an implementation of Kamil's idea:
awk -v remove="3,8,5" '
BEGIN {
OFS=FS="\t"
split(remove,a,",")
for (i in a) b[a[i]]=1
}
{
j=1
for (i=1;i<=NF;++i) {
if (!(i in b)) {
$j=$i
++j
}
}
NF=j-1
print
}
'
If you can use cut instead of awk, this is easier.
E.g. this keeps columns 1, 3, and 50 onward from file:
cut -f1,3,50- file
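If it's more natural to list the columns to delete rather than the ones to keep, GNU cut can do that too with --complement, e.g. to drop columns 3, 5 and 8:
cut --complement -f3,5,8 file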
Something like this should work:
awk -F'\t' -v remove='3|8|5' '
{
rec=ofs=""
for (i=1;i<=NF;i++) {
if (i !~ "^(" remove ")$" ) {
rec = rec ofs $i
ofs = FS
}
}
print rec
}
' file
