How to add an input file name to multiple output files in awk? - bash

The question might be trivial. I'm trying to figure out a way to add a part of my input file name to multiple outputs generated by the following awk script.
zcat "$1" | awk '
BEGIN {
    # the number of sequences per file
    if (!N) N = 10000;
    # file prefix
    if (!prefix) prefix = "seq";
    # file suffix
    if (!suffix) suffix = "fa";
    # this keeps track of the sequences
    count = 0
}
# skip empty lines at the beginning
/^$/ { next; }
# act on fasta header
/^>/ {
    if (count % N == 0) {
        if (output) close(output)
        output = sprintf("%s%07d.%s", prefix, count, suffix)
    }
    print > output
    count++
    next
}
# write the fasta body into the file
{
    print >> output
}'
The input file passed as $1 is 30_C_283_1_5.9.fa.gz
The output files generated by the script are
myseq0000000.fa, myseq1000000.fa and so on....
I would like the output to be
30_C_283_1_5.9_myseq000000.fa, 30_C_283_1_5.9_myseq100000.fa....
Looking forward to any input on this.

There's a way to direct the output from inside the Awk script:
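A minimal sketch of that idea, assuming the script is called with the gzipped FASTA file as its first argument: the shell strips the .fa.gz extension and hands the result to awk as the prefix variable via -v. The function name split_fasta and the optional second argument are illustrative, not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: split_fasta and the "_myseq" infix are illustrative names.
# Usage: split_fasta INPUT.fa.gz [SEQUENCES_PER_FILE]
split_fasta() {
    local input=$1
    local base
    base=$(basename "$input" .fa.gz)    # 30_C_283_1_5.9.fa.gz -> 30_C_283_1_5.9

    # gzip -dc is used in place of zcat because it behaves the same on Linux and macOS
    gzip -dc "$input" | awk -v prefix="${base}_myseq" -v N="${2:-10000}" -v suffix="fa" '
        /^$/ { next }                   # skip empty lines
        /^>/ {                          # FASTA header: start a new file every N sequences
            if (count % N == 0) {
                if (output) close(output)
                output = sprintf("%s%07d.%s", prefix, count, suffix)
            }
            count++
        }
        output != "" { print > output } # headers and sequence lines go to the current file
    '
}
```

Called as split_fasta 30_C_283_1_5.9.fa.gz, this should write files named 30_C_283_1_5.9_myseq0000000.fa and so on, since the prefix is built before awk starts and the rest of the naming logic is unchanged.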


Retrieve specific values from file

I have a file containing:
process {
    withName : teq {
        file = "/path/to/teq-0.20.9.txt"
    }
}
process {
    withName : cad {
        file = "/path/to/cad-4.0.txt"
    }
}
process {
    withName : sik {
        file = "/path/to/sik-20.0.txt"
    }
}
I would like to retrieve the value at the end of the file path for teq, cad and sik.
I was first thinking about something like
grep -E 'teq'
then keeping only the second row and stripping the repeated parts of the line.
But it may be easier to do something like:
for a in
line=$(sed -n '{$a}p'
if line=teq
#next line using sed -n?
do print nextline &> teq.txt
else if line=cad
do print nextline &> cad.txt
else if line=sik
do print nextline &> sik.txt
(obviously it doesn't work)
output wanted:
teq.txt containing teq-0.20.9, cad.txt containing cad-4.0 and sik.txt containing sik-20.0
Is there a good way to do that? Thank you for your comments
Based on your given sample:
awk '/withName/{close(f); f=$3 ".txt"}
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");
print > f}' ip.txt
/withName/{close(f); f=$3 ".txt"}: if the line contains withName, save the output filename in f using the third field. close() closes any previously opened file handle.
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");: if the line contains file, remove everything except the required value.
print > f}: print the modified line and redirect it to the filename in f.
If you can have multiple entries for the same name, use >> instead of >.
Here is a solution in awk:
awk '/withName/{name=$3} /file =/{print $3 > name ".txt"}'
/withName/{name=$3}: when I see the line containing "withName", I save that name
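/file =/{print $3 > name ".txt"}: when I see the line with "file =", I print its third field (the quoted path) to a file named after the saved name.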

Split CSV into two files based on column matching values in an array in bash / posh

I have an input CSV that I would like to split into two CSV files. If the value of column 4 matches any value in WLTarray, the row should go to output file 1; if it doesn't, it should go to output file 2.
"22532" "79994" "18809" "21032"
input CSV file:
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file1:
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file2:
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
I've been looking at awk to filter this (python & perl not an option in my environment) but I think there is probably a much smarter way:
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}" # Everything in the WLTarray will go to $filename-WLT.tmp
do
    awk -F, '($4=='$WLTvalue'){print}' $filename.tmp >> $filename-WLT.tmp # move the lines to the WLT file
done
# now filter to remove non matching values? why not just move the rows entirely?
With regular awk you can make use of split and substr (to handle double-quote removal for comparison) and split the csv file as you indicate. For example you can use:
awk 'BEGIN { FS=","; s="22532 79994 18809 21032"
    split (s,a," ")     # split s into array a
    for (i in a)        # loop over each index in a
        b[a[i]]=1       # use value in a as index for b
}
FNR == 1 {              # first record, write header to both output files
    print $0 > "output1.csv"
    print $0 > "output2.csv"
    next
}
substr($4,2,length($4)-2) in b {    # 4th field w/o quotes in b?
    print $0 > "output1.csv"        # write to output1.csv
    next
}
{ print $0 > "output2.csv" }        # otherwise write to output2.csv
' input.csv
in the BEGIN {...} rule you set the field separator (FS) to break on commas, split the string holding the field-4 values destined for output1.csv into the array a, then loop over the values in a, using them as the indexes of array b (to allow a simple "in b" check);
the first rule applies to the first record in the file (the header line), which is simply written out to both output files. If your file has no header line, drop this rule so the first record is not copied to both files;
the next rule removes the double-quotes surrounding field 4 and then checks whether the number in field 4 matches an index in array b. If so, the record is written to output1.csv; otherwise it is written to output2.csv.
Example Input File
$ cat input.csv
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
Resulting Output Files
$ cat output1.csv
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
$ cat output2.csv
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
You can use gawk like this:
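#!/usr/bin/gawk -f
BEGIN {
    FS = ","
    split("22532 79994 18809 21032", a)
    for (i in a) {
        WLTarray["\"" a[i] "\""] = 1   # key on the quoted value, since field 4 keeps its quotes
    }
}
NR > 1 {                               # skip the first (header) record
    if ($4 in WLTarray) {
        print >> "output1.csv"
    } else {
        print >> "output2.csv"
    }
}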
Make it executable and run it like this:
chmod +x test.awk
./test.awk input.csv
Using grep with a filter file as input was the simplest answer.
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}"
do
    awkstring="'\$4 == "\"\\\"$WLTvalue\\\"\"" {print}'"
    eval "awk -F, $awkstring input.csv >> output.WLT.csv"
done
grep -v -x -f output.WLT.csv input.csv > output.NonWLT.csv
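The eval-based loop can probably be avoided by handing the whole list to awk in a single variable. A minimal sketch, assuming the same input.csv layout (every field double-quoted, no commas inside field 4); the function name split_by_wlt is made up for this example, and the output file names simply reuse the ones above.

```shell
#!/usr/bin/env bash
# Sketch only: split_by_wlt is a hypothetical helper.
# Usage: split_by_wlt input.csv val1 val2 ...
split_by_wlt() {
    local input=$1
    shift
    # "$*" joins the remaining arguments with spaces: "22532 79994 ..."
    awk -F, -v list="$*" '
        BEGIN {
            n = split(list, vals, " ")
            for (i = 1; i <= n; i++)
                wanted["\"" vals[i] "\""] = 1   # store values with their CSV quotes
        }
        ($4 in wanted) { print > "output.WLT.csv"; next }
                       { print > "output.NonWLT.csv" }
    ' "$input"
}
```

With the array from the question it would be called as split_by_wlt input.csv "${WLTarray[@]}". The file is read once, and no shell quoting has to survive an eval.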

Find, Replace, Remove - within a file

I'm currently using this code:
awk 'BEGIN { s = \"{$CNEW}\" } /WORD_MATCH/ { $0 = s; n = 1 } 1; END { if(!n) print s }' filename > new_filename
It finds the line matching WORD_MATCH in a file called filename and replaces that line with $CNEW; the result is written to new_filename.
This all works well. But I have an issue where I may want to DELETE the line instead of replacing it.
So I set $CNEW = '', which gives me a blank line in the file but does not actually remove the line.
Is there any way to adapt the awk command to allow removal of the line?
The total aim is :
If there isn't a line in the file containing WORD_MATCH add one, based on $CNEW
If there is a line in the file containing WORD_MATCH update that line with the new value from $CNEW
If $CNEW = '' then delete the line containing WORD_MATCH.
There will only be one line in the file containing WORD_MATCH.
awk -v s="$CNEW" '/WORD_MATCH/ { n=1; if (s) $0=s; else next; } 1; END { if(s && !n) print s }' file
How it works
-v s="$CNEW"
This creates s as an awk variable with the value $CNEW. Note that the use of -v neatly eliminates the quoting problems that can occur by trying to define s in a BEGIN block.
/WORD_MATCH/ { n=1; if (s) $0=s; else next; }
If the current line matches WORD_MATCH, then set n to 1. If s is non-empty, then set the current line to s. If not, skip the rest of the commands and start over on the next line.
1;
This is awk's cryptic shorthand for printing the line.
END { if(s && !n) print s }
At the end of the file, if n is still not 1 and s is non-empty, then print s.
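To try the three cases side by side, the command can be wrapped in a small shell function. This is only a sketch: the name update_line is made up for illustration, and WORD_MATCH stands in for the real pattern. It also tests s != "" rather than plain s, on the assumption that a replacement value of 0 should still count as non-empty.

```shell
#!/usr/bin/env bash
# Sketch only: update_line is a hypothetical wrapper around the awk command.
# Usage: update_line NEW_VALUE FILE   (an empty NEW_VALUE deletes the line)
update_line() {
    awk -v s="$1" '
        /WORD_MATCH/ { n = 1; if (s != "") $0 = s; else next }  # replace, or drop when s is empty
        1                                                       # print every surviving line
        END { if (s != "" && !n) print s }                      # no match seen: append s
    ' "$2"
}
```

For example, update_line "$CNEW" filename > new_filename covers replace, delete and append, depending on whether $CNEW is empty and whether a WORD_MATCH line exists.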

How to pipe program output so as to eliminate specific text

I have a program which prints results to the terminal, including a header and a footer. The header ends when the first line containing only '-' characters is encountered, and the footer begins when the last line containing a '-' is encountered. I would like to pass the output of this program through another program that will cut out the header and footer, leaving only the data. I am not sure what the most efficient way to do this is. The files are roughly 20MB in size. I am running Mac OS X.
You could use awk to do the work. Below is an awk program I wrote in a file named clip.awk.
You can trim a data file like the one you described, here called data.txt, like this:
$ cat data.txt | awk -f clip.awk
Here is the program clip.awk:
BEGIN { state = 0 }     # HEADER
# match a line of all ----
/^-+$/ {
    if (state == 0)
        state = 1;      # DATA
    else
        state = 2;      # FOOTER
    # Skip to next line
    next;
}
# print any line while in DATA section
{ if (state == 1) print }

Replace or append block of text in file with contents of another file

I have two files:
second line;
I want to run a script and have input.conf replace the contents between the #blockbegin and #blockend lines.
I already have this:
sed -i -ne '/^#blockbegin/ {p; r input.conf' -e ':a; n; /#blockend/ {p; b}; ba}; p' super.conf
It works well until I change or remove the #blockend line in super.conf; then the script replaces all lines after #blockbegin.
In addition, I want the script to replace the block or, if the block doesn't exist in super.conf, append a new block with the content of input.conf to super.conf.
It can be accomplished by remove + append, but how do I remove the block using sed or another Unix command?
Though I gotta question the utility of this scheme -- I tend to favor systems that complain loudly when expectations aren't met instead of being more loosey-goosey like this -- I believe the following script will do what you want.
Theory of operation: It reads in everything up-front, and then emits its output all in one fell swoop.
Assuming you name the file injector, call it like injector input.conf super.conf.
#!/usr/bin/env awk -f
# Expects to be called with two files. First is the content to inject,
# second is the file to inject into.
FNR == 1 {
    # This switches from "read replacement content" to "read template"
    # at the boundary between reading the first and second files. This
    # will of course do something surprising if you pass more than two
    # files.
    readReplacement = !readReplacement;
}
# Read a line of replacement content.
readReplacement {
    rCount++;
    replacement[rCount] = $0;
    next;
}
# Read a line of template content.
{
    tCount++;
    template[tCount] = $0;
}
# Note the beginning of the replacement area.
/^#blockbegin$/ {
    beginAt = tCount;
}
# Note the end of the replacement area.
/^#blockend$/ {
    endAt = tCount;
}
# Finished reading everything. Process it all.
END {
    if (beginAt && endAt) {
        # Both beginning and ending markers were found; replace what's
        # in the middle of them.
        emitTemplate(1, beginAt);
        emitReplacement();
        emitTemplate(endAt, tCount);
    } else {
        # Didn't find both markers; just append.
        emitTemplate(1, tCount);
        emitReplacement();
    }
}
# Emit the indicated portion of the template to stdout.
function emitTemplate(from, to) {
    for (i = from; i <= to; i++) {
        print template[i];
    }
}
# Emit the replacement text to stdout.
function emitReplacement() {
    for (i = 1; i <= rCount; i++) {
        print replacement[i];
    }
}
I've written a Perl one-liner:
perl -0777lni -e 'BEGIN{open(F,pop(@ARGV))||die;$b="#blockbegin";$e="#blockend";local $/;$d=<F>;close(F);}s|\n$b(.*)$e\n||s;print;print "\n$b\n",$d,"\n$e\n" if eof;' edited.file input.file
edited.file - path to updated file
input.file - path to file with new content of block
The script first deletes the block (if a matching one is found) and then appends a new block with the new content.
Do you mean something like this?
sed '/^#blockbegin/,/#blockend/d' super.conf
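Building on that range delete, the remove + append idea from the question could be sketched as follows. It assumes the markers start lines of their own and appear in order, and it only runs the delete when both markers are present, so a missing #blockend cannot wipe out the rest of the file. The function name replace_block is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: replace_block is a hypothetical helper.
# Usage: replace_block input.conf super.conf
replace_block() {
    local new=$1 target=$2 tmp
    tmp=$(mktemp) || return 1

    if grep -q '^#blockbegin' "$target" && grep -q '^#blockend' "$target"; then
        # Both markers present: drop the old block, markers included.
        sed '/^#blockbegin/,/^#blockend/d' "$target" > "$tmp"
    else
        # A marker is missing: keep the file as it is, so nothing is deleted by accident.
        cat "$target" > "$tmp"
    fi

    # Append the new block wrapped in its markers.
    {
        echo '#blockbegin'
        cat "$new"
        echo '#blockend'
    } >> "$tmp"

    # Write back through the existing file so its permissions are kept.
    cat "$tmp" > "$target" && rm -f "$tmp"
}
```

Note that the block always ends up at the bottom of super.conf, which differs from the in-place replacement done by the awk script above.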
