Retrieve specific values from file - bash

I have a file test.cf containing:
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : sik {
file = "/path/to/sik-20.0.txt"
}
}
I would like to retrieve the value at the end of the file path for teq, cad and sik.
I was first thinking about something like
grep -E 'teq' test.cf
and then keep only the second row and remove the repeated part of the line.
But it may be easier to do something like:
for a in test.cf
do
line=$(sed -n '{$a}p' test.cf)
if line=teq
#next line using sed -n?
do print nextline &> teq.txt
else if line=cad
do print nextline &> cad.txt
else if line=sik
do print nextline &> sik.txt
done
(obviously it doesn't work)
EDIT:
output wanted:
teq.txt containing teq-0.20.9, cad.txt containing cad-4.0 and sik.txt containing sik-20.0
Is there a good way to do that? Thank you for your comments

Based on your given sample:
awk '/withName/{close(f); f=$3 ".txt"}
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");
print > f}' test.cf
/withName/{close(f); f=$3 ".txt"}: if the line contains withName, save the output filename in f using the third field. close() closes any previously opened file handle.
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "");: if the line contains file, remove everything except the required value.
print > f: print the modified line and redirect it to the filename in f.
If the same name can appear more than once, use >> instead of >.
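The difference between > and >> only shows up when a name repeats, because close(f) makes the next > reopen the file and truncate it. A minimal sketch, assuming a hypothetical test.cf in which teq appears twice (the second version number is made up for illustration):

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: teq has two entries, cad has one.
cat > test.cf <<'EOF'
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : teq {
file = "/path/to/teq-0.21.0.txt"
}
}
EOF

# Same program as above, but appending so that repeated names accumulate.
awk '/withName/ {close(f); f = $3 ".txt"}
/file/ {sub(/.*\//, ""); sub(/\.txt".*/, ""); print >> f}' test.cf

cat teq.txt
```

Because >> appends, teq.txt and cad.txt should be removed before re-running the command, otherwise old values stay in them.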

Here is a solution in awk:
awk '/withName/{name=$3} /file =/{print $3 > (name ".txt")}' test.cf
/withName/{name=$3}: when I see the line containing "withName", I save that name
When I see the line with "file =", I print its third field (the quoted path) to a file named after the saved name
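This one-liner stores the third field verbatim, that is, the quoted full path. One possible way to reduce it to the bare value asked for in the question, assuming every path ends in .txt followed by the closing quote (the array p and the variable v are names introduced here for illustration):

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the question.
cat > test.cf <<'EOF'
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : sik {
file = "/path/to/sik-20.0.txt"
}
}
EOF

awk '
/withName/ { name = $3 }
/file =/ {
    n = split($3, p, "/")        # last path component ends up in p[n]
    v = p[n]
    sub(/\.txt"$/, "", v)        # drop the extension and the closing quote
    print v > (name ".txt")
}' test.cf

cat teq.txt cad.txt sik.txt
```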

Related

AWK - Unable to update the files

I have 300 txt files in my directory in the following format
regional_vol_WM_atlas[1-300].txt
651328 651328
553949 553949
307287 307287
2558 2558
The following awk script was supposed to create a new file for each existing file in my directory by performing a calculation on its fourth row.
#!/bin/bash
awk=/usr/bin/awk
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newfilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'regional_vol_WM_atlas[0-9].txt regional_vol_WM_atlas[0-9][0-9].txt regional_vol_WM_atlas1[0-4][0-9].txt regional_vol_WM_atlas15[02].txt
Unfortunately, I could not update any file in the directory. When I run the script, I get the following error:
dev#dev-OptiPlex-780:/media/dev/Daten/Task1/subject1/t1$ '/media/dev/Daten/Task1/subject1/t1/Method'
/media/dev/Daten/Task1/subject1/t1/Method: line 18: regional_vol_WM_atlas10.txt: command not found
Could you please point out where I am going wrong?
Your script is not calling awk. It defines a variable named awk and then tries to execute the file regional_vol_WM_atlas10.txt with the variable awkcommand set in its environment. Alas, that file is not in your PATH, so bash cannot find it. You need to instead do:
awk "$awkcommand" file1 file2 ...
(where file1, file2, etc. are the input files you want to use as input.)
Also, note that your current script is appending the literal text regional_vol_WM_atlas[0-9].txt to the end of the awk program stored in awkcommand (bash does not expand globs in a variable assignment), which you do not want. Overall, what you were trying to do should have been written:
#!/bin/bash
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newfilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'
awk "$awkcommand" regional_vol_WM_atlas[0-9].txt \
regional_vol_WM_atlas[0-9][0-9].txt \
regional_vol_WM_atlas1[0-4][0-9].txt \
regional_vol_WM_atlas15[02].txt
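With 300 input files, some awk implementations may run out of open file descriptors unless each output file is closed before the next one is opened. A sketch of the same program with close() added, run on two hypothetical sample files. It also uses a single > redirection, which truncates a file only when it is first opened, and the stricter pattern /\.txt$/; both are choices made for this sketch, not part of the answer above:

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical input files with the layout shown in the question.
for i in 1 2; do
    printf '651328 651328\n553949 553949\n307287 307287\n2558 2558\n' \
        > "regional_vol_WM_atlas$i.txt"
done

awkcommand='
FNR == 1 {
    if (newfilename != "") close(newfilename)   # release the previous output file
    newfilename = FILENAME
    sub(/\.txt$/, "_prop.txt", newfilename)
}
FNR == 4 {
    $1 = ($1 / 0.824198) * 0.8490061
    $2 = ($2 / 0.824198) * 0.8490061
}
{
    print > newfilename    # ">" truncates on first open, later prints are added after it
}
'

awk "$awkcommand" regional_vol_WM_atlas[0-9].txt

cat regional_vol_WM_atlas1_prop.txt
```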
The problem is that a variable can be assigned for a command, for example:
x='hello' some_command
Which in effect is what bash thinks you are trying to do. The culprit is the whitespace, which acts as a command separator, so just escape (prefix with a \) the whitespace in the list of filenames:
#!/bin/bash
awk=/usr/bin/awk
awkcommand='
FNR == 1 {
newfilename = FILENAME ; sub(".txt", "_prop.txt", newfilename)
printf "" > newfilename
}
FNR == 4 {
$1=($1/0.824198)*0.8490061
$2=($2/0.824198)*0.8490061
}
{
print >> newfilename
}
'\ regional_vol_WM_atlas[0-9].txt\ regional_vol_WM_atlas[0-9][0-9].txt\ regional_vol_WM_atlas1[0-4][0-9].txt\ regional_vol_WM_atlas15[02].txt
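The only thing I have altered is the final line. With the whitespace escaped, the whole line is a single assignment to awkcommand, so the error goes away, but awk itself is never run; the files are only processed once awk is invoked with the program and the file list as separate arguments.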

Find, Replace, Remove - with in file

I'm currently using this code:
awk 'BEGIN { s = \"{$CNEW}\" } /WORD_MATCH/ { $0 = s; n = 1 } 1; END { if(!n) print s }' filename > new_filename
This finds the line matching WORD_MATCH in a file called filename, replaces it with $CNEW, and writes the result to new_filename.
This all works well. But I have an issue where I may want to DELETE the line instead of replace it.
So I set $CNEW = '', which gives me a blank line in the file but does not actually remove the line.
Is there any way to adapt the AWK command to allow the removal of the line?
The total aim is :
If there isn't a line in the file containing WORD_MATCH add one, based on $CNEW
If there is a line in the file containing WORD_MATCH update that line with the new value from $CNEW
If $CNEW ='' then delete the line containing WORD_MATCH.
There will only be one line in the file containing WORD_MATCH
Thanks
awk -v s="$CNEW" '/WORD_MATCH/ { n=1; if (s) $0=s; else next; } 1; END { if(s && !n) print s }' file
How it works
-v s="$CNEW"
This creates s as an awk variable with the value $CNEW. Note that the use of -v neatly eliminates the quoting problems that can occur by trying to define s in a BEGIN block.
/WORD_MATCH/ { n=1; if (s) $0=s; else next; }
If the current line matches WORD_MATCH, then set n to 1. If s is non-empty, then set the current line to s. If not, skip the rest of the commands and start over on the next line.
1
This is cryptic shorthand for print the line.
END { if(s && !n) print s }
At the end of the file, if n is still not 1 and s is non-empty, then print s.
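The three cases from the question (replace, delete, append) can be checked with a small sketch. The wrapper function update_line and the sample file contents are hypothetical, introduced only for this illustration; the awk program is the one shown above:

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: $1 is the new line (may be empty), $2 is the input file.
update_line() {
    awk -v s="$1" '/WORD_MATCH/ { n=1; if (s) $0=s; else next; } 1; END { if(s && !n) print s }' "$2"
}

printf 'alpha\nWORD_MATCH=old\nomega\n' > with.txt      # contains the line
printf 'alpha\nomega\n' > without.txt                    # does not contain it

replaced=$(update_line 'WORD_MATCH=new' with.txt)        # line is replaced
deleted=$(update_line '' with.txt)                       # line is removed
appended=$(update_line 'WORD_MATCH=new' without.txt)     # line is added at the end

printf '%s\n---\n%s\n---\n%s\n' "$replaced" "$deleted" "$appended"
```

One caveat of the if (s) test: a value such as 0 is treated as false by awk, so comparing with s != "" may be the safer choice if $CNEW can be numeric.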

How to add an input file name to multiple output files in awk?

The question might be trivial. I'm trying to figure out a way to add a part of my input file name to multiple outputs generated by the following awk script.
Script:
zcat $1 | BEGIN {
# the number of sequences per file
if (!N) N=10000;
# file prefix
if (!prefix) prefix = "seq";
# file suffix
if (!suffix) suffix = "fa";
# this keeps track of the sequences
count = 0
}
# skip empty lines at the beginning
/^$/ { next; }
# act on fasta header
/^>/ {
if (count % N == 0) {
if (output) close(output)
output = sprintf("%s%07d.%s", prefix, count, suffix)
}
print > output
count ++
next
}
# write the fasta body into the file
{
print >> output
}
The input in $1 variable is 30_C_283_1_5.9.fa.gz
The output files generated by the script are
myseq0000000.fa, myseq1000000.fa and so on....
I would like the output to be
30_C_283_1_5.9_myseq000000.fa, 30_C_283_1_5.9_myseq100000.fa....
I look forward to any input on this.
There's a way to direct the output from inside the Awk script:
https://www.gnu.org/software/gawk/manual/html_node/Redirection.html
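Since the script already builds each output name from a prefix variable, one possible approach is to derive that prefix from the input file name in the shell and pass it in with -v. This is only a sketch: the tiny FASTA sample, the variable names input and base, and N=2 are assumptions made so the example runs quickly, and gzip -dc is used as a portable stand-in for zcat:

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny hypothetical FASTA input, compressed under the name from the question.
input=30_C_283_1_5.9.fa.gz
printf '>s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n' | gzip > "$input"

# Strip the .fa.gz suffix to get the part to prepend to each output name.
base=$(basename "$input" .fa.gz)

gzip -dc "$input" | awk -v prefix="${base}_myseq" -v N=2 -v suffix=fa '
/^$/ { next }                      # skip empty lines
/^>/ {                             # act on each FASTA header
    if (count % N == 0) {
        if (output) close(output)
        output = sprintf("%s%07d.%s", prefix, count, suffix)
    }
    print > output
    count++
    next
}
{ print > output }                 # sequence body goes to the current file
'

ls
```

With these settings the first two records land in 30_C_283_1_5.9_myseq0000000.fa and the third in 30_C_283_1_5.9_myseq0000002.fa.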

Adding file information to an AWK comparison

I'm using awk to perform a file comparison against a file listing in found.txt
while read line; do
awk 'FNR==NR{a[$1]++;next}$1 in a' $line compare.txt >> $CHECKFILE
done < found.txt
found.txt contains full path information to a number of files that may contain the data. While I am able to determine that data exists in both files and output that data to $CHECKFILE, I wanted to be able to put the line from found.txt (the filename) where the line was found.
In other words I end up with something like:
File " /xxxx/yyy/zzz/data.txt "contains the following lines in found.txt $line
just not sure how to get the /xxxx/yyy/zzz/data.txt information into the stream.
Appended for clarification:
The file found.txt contains the full path information to several files on the system
/path/to/data/directory1/file.txt
/path/to/data/directory2/file2.txt
/path/to/data/directory3/file3.txt
each of the files has a list of parameters that need to be checked for existence before appending additional information to them later in the script.
so for example, file.txt contains the following fields
parameter1 = true
parameter2 = false
...
parameter35 = true
the compare.txt file contains a number of parameters as well.
So if parameter35 (or any other parameter) shows up in one of the three files, I get its output dropped to the Checkfile.
Both of the scripts (yours and the one I posted) will give me that output but I would also like to echo in the line that is being read at that point in the loop. Sounds like I would just be able to somehow pipe it in, but my awk expertise is limited.
It's not really clear what you want but try this (no shell loop required):
awk '
ARGIND==1 { ARGV[ARGC] = $0; ARGC++; next }
ARGIND==2 { keys[$1]; next }
$1 in keys { print FILENAME, $1 }
' found.txt compare.txt > "$CHECKFILE"
ARGIND is gawk-specific; if your awk does not have it, add FNR==1{ARGIND++} as the first rule of the script.
Pass the name into awk inside a variable like this:
awk -v file="$line" '{... print "File: " file }'
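Put together with the loop from the question, the -v approach might look like the sketch below. The sample files, the wording of the output line, and the name check.out are made up for illustration:

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data files and comparison list.
printf 'parameter1 = true\nparameter35 = true\n' > file.txt
printf 'parameter2 = false\n' > file2.txt
printf 'parameter35\nparameter99\n' > compare.txt
printf '%s\n' "$workdir/file.txt" "$workdir/file2.txt" > found.txt

CHECKFILE=check.out

while IFS= read -r line; do
    # file carries the current path from found.txt into the awk program
    awk -v file="$line" '
        FNR == NR { a[$1]; next }                        # first file: remember its keys
        $1 in a   { print "File: " file " contains " $0 }
    ' "$line" compare.txt
done < found.txt > "$CHECKFILE"

cat "$CHECKFILE"
```

Redirecting once after done, rather than appending inside the loop, means the check file is rewritten on every run.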

AWK split for multiple delimiters lines

I'm trying to split a file using an AWK one-liner, but the code below that I came up with is not working properly.
awk '
BEGIN { idx=0; file="original_file.split." }
/^REC_DELIMITER.(HIGH|TOP)$/ { idx++ }
/^REC_DELIMITER.TOP$/,/^REC_DELIMITER.(HIGH|TOP)$/ { print > file sprintf("%03d", idx) }
' original_file
Test file is "original_file":
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
AWK code above is for REC_DELIMITER.TOP and it is giving me these files:
original_file.split.001:
REC_DELIMITER.TOP
original_file.split.003:
REC_DELIMITER.TOP
however, I'm trying to get this:
original_file.split.001:
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
original_file.split.003:
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
There will be other record delimiters, and when needed, we can run for them like REC_DELIMITER.HIGH, this way getting files like below:
original_file.split.002:
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
original_file.split.004:
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
Any help is very much appreciated. I have been trying to get this working for the past few days, and the AWK code above is the best I was able to come up with. I now need help from the AWK masters. :)
Thank you!
You can try something like this:
awk '
/REC_DELIMITER\.TOP/ {
a=1
b=0
file = sprintf (FILENAME".split.%03d",++n)
}
/REC_DELIMITER\.HIGH/ {
b=1
a=0
file = sprintf (FILENAME".split.%03d",++n)
}
a {
print $0 > file
}
b {
print $0 > file
}' file
You need something like this (untested):
awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock=0 }
$0 ~ delim { inBlock=1; idx++ }
inBlock { print > sprintf("original_file.split.%03d", idx) }
' original_file
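A runnable variant of this idea, tried against the sample from the question. Two things differ from the code above and are choices made for this sketch: idx is incremented on every delimiter so the file numbers match the ones asked for (001 and 003 for TOP), and the previous output file is closed before a new one is opened (out is a name introduced here):

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the question.
cat > original_file <<'EOF'
REC_DELIMITER.TOP
lineA1
lineA2
lineA3
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3
EOF

awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock = 0; idx++ }            # any delimiter ends the current block
$0 ~ delim {                                 # the wanted delimiter starts a new file
    inBlock = 1
    if (out != "") close(out)
    out = sprintf("original_file.split.%03d", idx)
}
inBlock { print > out }
' original_file

ls
```

Running it again with dtype="HIGH" should produce original_file.split.002 and original_file.split.004.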
awk -vRS=REC_DELIMITER '/^.TOP\n/{print RS $0 > sprintf("original_file.split.%03d",n)};!++n' original_file
(Give or take an extra newline at the end.)
Generally, when input is supposed to be treated as a series of multi-line records with a special line as delimiter, the most direct approach is to set RS (and often ORS) to that delimiter.
Normally you'd want to add newlines to its beginning and/or end, but this case is a little special so it's easier without them.
Edited to add: You need GNU Awk for this. Standard Awk considers only the first character of RS.
I made some changes so that each delimiter type goes to its own file, even when it occurs again later in the file. Make a file such as splitter.awk with the contents below, then chmod +x it and run it with ./splitter.awk original_file
#!/usr/bin/awk -f
BEGIN {
idx=0;
file="original_file.split.";
out=""
}
{
if($0 ~ /^REC_DELIMITER.(TOP|HIGH)/){
if (!cnt[$0]) {
cnt[$0] = ++idx;
}
out=cnt[$0];
}
print > (file sprintf("%03d", out))
}
I'm not very used to AWK; however, plasticide's answer pointed me in the right direction and I finally got the AWK script working as required.
In the code below, the first IF sets echo to 0 if any delimiter is found. The second IF sets echo to 1 if a wanted delimiter is found, so that only the wanted records are split out of the file.
I know the regex could be something like /^(REC_(DELIMITER\.(TOP|HIGH|LOW)|NO_CATEGORY))$/, but since the regex is created dynamically by a shell script that reads a list of delimiters from a specific file, it will look more like the one in the AWK code below.
awk 'BEGIN {
idx=0; echo=1; file="original_file.split."
}
{
#All the delimiters to consider in given file
if($0 ~ /^(REC_DELIMITER.TOP|REC_DELIMITER.HIGH|REC_DELIMITER.LOW|REC_NO_CATEGORY)$/) {
echo=0
}
#Delimiters that should actually be pulled
if($0 ~ /^(REC_DELIMITER.HIGH|REC_DELIMITER.LOW)$/) {
idx++; echo=1
}
#Print to a file if the wanted delimiter matched
if(echo) {
print > (file idx)
}
}' original_file
Thank you all. I really appreciate it very much.

Resources