I have a large .xml file like that:
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
...
I need the result like the following:
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
...
How can I get the column in BASH?
I have the following method in PL/SQL,but it's very inconvenient:
SELECT C1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 1)) c1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 2)) c2,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 3)) c3
FROM TEST;
Use cut:
cut -d'"' -f2,4,6 --output-delimiter=" " test.txt
Or you can use sed if the number of columns is not known:
sed 's/[a-z][a-z0-9]\+="\([^"]\+\)"/\1/g' < test.txt
Explanation:
[a-z][a-z0-9]\+ - matches a string starting with a alpha char followed by any number of alphanumeric chars
"\([^"]\+\)" - captures any string inside the quotes
\1 - represents the captured string that in this case is used to replace the entire match
A perl approach (based on the awk answer by #A-Ray)
perl -F'"' -ane 'print join(" ",#F[ map { 2 * $_ + 1} (0 .. $#F) ]),"\n";' < test.txt
Explanation:
-F'"' set input separator to "
-a turn autosplit on - this results in #F being filed with content of fields in the input
-n iterate through all lines but don't print them by default
-e execute code following
map { 2 * $_ + 1} (0 .. $#F) generates a list of indexes (1,3,5 ...)
#F[map { 2 * $_ + 1} (0 .. $#F)] takes a slice from the array, selecting only odd fields
join - joins the slice with spaces
NOTE: I would not use this approach without a good reason, the first two are easier.
Some benchmarking (on a Raspberry Pi, with a 60000 lines input file and output thrown away to /dev/null)
cut - 0m0.135s no surprise there
sed - 0m5.864s
perl - 0m8.218s - I guess regenerating the index list every line isn't that fast (with a hard coded slice list it goes to half, but that would defeat the purpose)
the read based solution - 0m52.027s
You can also look at the built-in substring replacement/removal bash offers. Either in a short script or One-Liner:
#!/bin/bash
while read -r line; do
new=${line//c[0-9]=/} ## remove 'cX=', where X is '0-9'
new=${new//\"/} ## remove all '"' (double-quotes)
echo "$new"
done <"$1"
exit 0
Input
$ cat dat/stuff.xml
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
Output
$ bash parsexmlcx.sh dat/stuff.xml
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
As a One-Liner
while read -r line; do new=${line//c[0-9]=/}; new=${new//\"/}; echo "$new"; done <dat/stuff.xml
awk -F '"' '{ for(i=2; i<=NF; i+=2) { printf $i" " } print "" }'
Explanation
-F '"' makes Awk treat quotation marks (") as field delimiters. For example, Awk will split the line...
c1="a1" c2="b1" c3="cccc1"
...into fields numbered as...
1: 'c1='
2: 'a1'
3: ' c2='
4: 'b1'
5: ' c3='
6: 'cccc1'
7: ''
for(i=2; i<=NF; i+=2) { printf $i" " } starts at field 2, prints the value of the field, skips a field, and continues. In this case, fields 2, 4, and 6 will be printed.
print outputs a string following by a newline. printf also outputs a string, but doesn't append a newline. Therefore...
printf $i" "
...outputs the value of field $i followed by a space.
print ""
...simply outputs a newline.
Related
Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
Need a bash routine that return the page number of a given line number from a text file containing page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea to use read -d $'\f' and then to count the lines is good.
This version migth appear not ellegant: if nline is greater than or equal to the number of lines in the file, then the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
Performance of awk is better than the above bash version. awk was created for such text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
----
For history, there is another bash version.
This version is using only built-it commands to count the lines in current page.
The speedtest.sh that you had provided in the comments showed it is a little bit ahead (20 sec approx.) which makes it equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i"
'$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>'s contained in a file. (it will work for POSIX shell as well, with substitute for string index and expr for math) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ ${l:0:1} = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>'s was create with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can define RS (the record separator, default newline) to form feed (\f) and IFS (the input field separator, default any sequence of horizontal whitespace) to newline (\n) and obtain the number of lines as the number of "fields" in a "record" which is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script, or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
old=$line
((line += count))
echo "Lines $old through line are on page $page"
((page++)
done
This gnu awk script prints the "page" for the linenumber given as command line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05 where formfeeds.awk is the script, formfeeds.txt is the file and '05' is a linenumber.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the commandline argument stored in search
/[\f]/ applies when there is a formfeed
I have tried several different search terms but have not found exactly what I want, I am sure there is already an answer for this so please point me to it if so.
I would like to understand how to increment a letter code given a standard number convention in a bash script.
Starting with AAAA=0 or with leading zerosAAAA=000000 (26x26x26x26) I would like to increment the value with a a positive single digit each time, so aaab=000001,aaac=000002 and aaba=000026 and aaaca=000052 etc.
Thanks Art!
I guess this is what you want
echo {a..z}{a..z}{a..z}{a..z} | tr ' ' '\n' | nl
will be too long, perhaps test with this first
echo {a..z}{a..z} | tr ' ' '\n' | nl
if you don't need the line numbers remove last pipe and nl
If you need the output in xxxx=nnnnnn format, you can use awk
echo {a..z}{a..z}{a..z}{a..z} | tr ' ' '\n' | awk '{printf "%s=%06d\n", $0, NR-1}'
aaaa=000000
aaab=000001
aaac=000002
aaad=000003
aaae=000004
aaaf=000005
aaag=000006
aaah=000007
aaai=000008
aaaj=000009
...
zzzv=456971
zzzw=456972
zzzx=456973
zzzy=456974
zzzz=456975
Fast
If you are aiming for speed and simplicity:
#!/bin/bash
i=0
for text in {a..z}{a..z}{a..z}{a..z}; do
printf '%06d %5.5s\n' "$i" "$text"
(( i++ ))
done
Precise
Aiming at having a function that convert any number to the character string:
We must Understand that what you are describing is a number written in base 26, using the character a as 0, b as 1, c as 3, etc.
Thus, aaaa means 0000, aaab means 0001, aaac means 0002, .... aaaz means 0025
and aaba means 0026, aaca means 0052.
bc could do the base conversion directly (as numbers):
$ echo 'obase=26; 199'|bc
07 17
The 7th letter is: a0, b1, c2, d3, e4, f5, g6, (h)7,
the 17th letter is (r).
If we set the variable list to: list=$(printf '%s' {a..z}) or list=abcdefghijklmnopqrstuvwxyz
We could get each letter from the number with: ${list:7:1} and ${list:17:1}
$ echo "${list:7:1} and ${list:17:1}"
h and r
$ printf '%s' "${list:7:1}" "${list:17:1}" # Using printf:
hr
Script
All together inside an script, is:
#!/bin/bash
list=$(printf '%s' {a..z})
getletters(){
local numbers
numbers="$(echo "obase=26; $1"|bc)"
for number in $numbers; do
printf '%s' "${list:10#$number:1}";
done;
echo;
}
count=2
limit=$(( 26**$count - 1 ))
for (( i=0; i<=$limit; i++)); do
printf '%06d %-5.5s\n' "$i" "$(getletters "$i")"
done
Please change count from 2 to 4 to get the whole list. Be aware that such list is more than half a million lines: The limit is 456,975 and will take some time.
With perl, you can ++ a string to increment the letter:
for (my ($n,$s) = (0,"aaaa"); $n < 200; $n++, $s++) {
printf "%s=%0*d\n", $s, length($s), $n;
}
outputs
aaaa=0000
aaab=0001
aaac=0002
aaad=0003
...
aaby=0050
aabz=0051
aaca=0052
aacb=0053
...
aahp=0197
aahq=0198
aahr=0199
I need advice on how to achieve this output:
myoutputfile.txt
Tom Hagen 1892
State: Canada
Hank Moody 1555
State: Cuba
J.Lo 156
State: France
output of mycommand:
/usr/bin/mycommand
Tom Hagen
1892
Canada
Hank Moody
1555
Cuba
J.Lo
156
France
Im trying to achieve with this shell script:
IFS=$'\r\n' GLOBIGNORE='*' :; names=( $(/usr/bin/mycommand) )
for name in ${names[#]}
do
#echo $name
echo ${name[0]}
#echo ${name:0}
done
Thanks
Assuming you can always rely on the command to output groups of 3 lines, one option might be
/usr/bin/mycommand |
while read name;
read year;
read state; do
echo "$name $year"
echo "State: $state"
done
An array isn't really necessary here.
One improvement could be to exit the loop if you don't get all three required lines:
while read name && read year && read state; do
# Guaranteed that name, year, and state are all set
...
done
An easy one-liner (not tuned for performance):
/usr/bin/mycommand | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
It reads 3 lines at a time from the pipe and then passes them to a new instance of printf which is used to format the output.
If you have whitespace at the beginning (it looks like that in your example output), you may need to use something like this:
/usr/bin/mycommand | sed -e 's/^\s*//g' | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
#!/bin/bash
COUNTER=0
/usr/bin/mycommand | while read LINE
do
if [ $COUNTER = 0 ]; then
NAME="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 1 ]; then
YEAR="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 2 ]; then
STATE="$LINE"
COUNTER=0
echo "$NAME $YEAR"
echo "State: $STATE"
fi
done
chepner's pure bash solution is simple and elegant, but slow with large input files (loops in bash are slow).
Michael Jaros' solution is even simpler, if you have GNU xargs (verify with xargs --version), but also does not perform well with large input files (external utility printf is called once for every 3 input lines).
If performance matters, try the following awk solution:
/usr/bin/mycommand | awk '
{ ORS = (NR % 3 == 1 ? " " : "\n")
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
{ print (NR % 3 == 0 ? "State: " : "") $0 }
' > myoutputfile.txt
NR % 3 returns the 0-based index of each input line within its respective group of consecutive 3 lines; returns 1 for the 1st line, 2 for the 2nd, and 0(!) for the 3rd.
{ ORS = (NR % 3 == 1 ? " " : "\n") determines ORS, the output-record separator, based on that index: a space for line 1, and a newline for lines 2 and 3; the space ensures that line 2 is appended to line 1 with a space when using print.
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") strips leading and trailing whitespace from the line - including, if present, a trailing \r, which your input seems to have.
{ print (NR % 3 == 0 ? "State: " : "") $0 } prints the trimmed input line, prefixed by "State: " only for every 3rd input line, and implicitly followed by ORS (due to use of print).
I'm looking for a way to merge 4 lines of dna probing results into one line.
The problem here is:
I don't want to append the lines. But associating them
The 4 lines of dna probing:
A----A----------A----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G--G--G------G------
---TT--------T-T---------T-----
I need these to be 1 line, not just appended but intermixed without the dashes.
First characters of the result:
ACCTTAGCCCCGC...
This seem to be a kind of general problem, so the language choosed to solve this don't matter.
For fun: one bash way:
lines=(
A----A----------A----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G--G--G------G------
---TT--------T-T---------T-----
)
result=""
for ((i=0;i<${#lines};i++)) ;do
chr=- c=()
for ((l=0;l<${#lines[#]};l++)) ;do
[ "${lines[l]:i:1}" != "-" ] &&
chr="${lines[l]:i:1}" &&
c+=($l)
done
[ ${#c[#]} -eq 0 ] && printf 'Char #%d not replaced.\n' $i
[ ${#c[#]} -gt 1 ] && c="${c[*]}" && chr="*" &&
printf "Conflict at char #%d (lines: %s).\n" $i "${c// /, }"
result+=$chr
done
echo $result
With provided input, there is no conflict and all characters is replaced. So the output is:
ACCTTAGCCCCGCTGTAGCCCACAGTAAAAC
Note: Question stand for 4 different files, so lines= syntax could be:
lines=($(cat file1 file2 file3 file4))
But with a wrong input:
lines=(
A----A---A-----A-----A-A--AAAA-
-CC----CCCC-C-----CCC-C-------C
------G----G---G-G------G------
---TT--------T-T---------T-----
)
output could be:
Conflict at char #9 (lines: 0, 1).
Char #14 not replaced.
Conflict at char #15 (lines: 0, 2, 3).
Char #16 not replaced.
and
echo $result
ACCTTAGCC*CGCT-*-GCCCACAGTAAAAC
Small perl filter
But if input are not to be verified, this little perl filter could do the job:
(Thanks #jm666 for }{ syntax)
perl -nlE 'y+-+\0+;$,|=$_}{say$,' <(cat file1 file2 file3 file4)
where
-n process all lines without output
-l whipe leading cariage return at end of lines
y+lhs+rhs+ replace (translate) chars from 'lhs' to 'rhs'
\0 is the *null* character, binary 0.
$, is a variable
|= binary or, between himself and current line ($_)
}{ at END, once all lines processed
Alternative way - not very efficient - but short:
file="./gene"
line1=$(head -1 "$file")
seq ${#line1} | xargs -n1 -I% cut -c% "$file" | paste -s - | tr -cd '[A-Z\n]'
prints:
ACCTTAGCCCCGCTGTAGCCCACAGTAAAAC
Assumption: each line has the same length.
Decomposition:
the line1=$(head -1 "$file") read the 1st line into the variable line1
the seq ${#line1} generate a sequence of numbers 1..char_count_in_the_line1, like
1
2
..
31
the xargs -n1 -I% cut -c% "$file" will run for each above number the command cut like cut -c22 filename - what extract the given column from the file, e.g. you will get output like:
A
-
-
-
-
C
-
-
# and so on
the paste -s - will join the above lines into one long line with the \t (tab) separator, like:
A - - - - C - - - C - - - - - T ... etc...
finally the tr -cd '[A-Z\n]' remove everything what isn't uppercase character or newline, so will get the final
ACCTTAGCCCCGCTGTAGCCCACAGTAAAAC
Okay, I have two files: one is baseline and the other is a generated report. I have to validate a specific string in both the files match, it is not just a single word see example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criteria here is to find unique number such as "56633223223" and retrieve above 1 line and below 3 lines, i can do that on both the basefile and the report, and then compare if they match. In whole i need shell script for this.
Since the strings above and below are unique but the line count varies, I had put it in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now from below "Rcount" I get how many iterations to be performed, and in each iteration i have to get ith row and see if the word count is 3, if it is then take those values into variable form and use something like this
I'm stuck at the below, which command to be used. I'm thinking of using AWK but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
wcount=_________________'(Awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
echo "all good"
else
echo "error"
fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests that read hasn't assigned an empty value to each variable. The -z checks that there are only three columns, so $junk is empty.
I PROMISE you you are going about this all wrong. To find words in file1 and search for those words in file2 and file3 is just:
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem, but we're struggling to do that because what you're asking for doesn't make sense as part of any kind of reasonable solution to what it sounds like your problem is so it's hard to suggest anything sensible.
If you tells us what you are trying to do AS A WHOLE with sample input and expected output for your whole process then we can help you.
We don't seem to be getting anywhere so let's try a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
1 a b
2 b 56633223223
3 56633223223 c
4 c d
5 d h
6 e 56633223225
7 f i
8 g Z
9 h k
10 56633223225 l
11 i
12 j
13 k
14 l
Now lets take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {
numPre[$1] = $2
numSuc[$1] = $3
}
ARGIND==2 {
oldLine[FNR] = $0
if ($0 in numPre) {
oldHitFnr[$0] = FNR
}
}
ARGIND==3 {
newLine[FNR] = $0
if ($0 in numPre) {
newHitFnr[$0] = FNR
}
}
END {
for (str in numPre) {
if ( str in oldHitFnr ) {
if ( str in newHitFnr ) {
for (i=-numPre[str]; i<=numSuc[str]; i++) {
oldFnr = oldHitFnr[str] + i
newFnr = newHitFnr[str] + i
if (oldLine[oldFnr] != newLine[newFnr]) {
print str, "mismatch at old line", oldFnr, "new line", newFnr
print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
}
}
}
else {
print str, "is present in old file but not new file"
}
}
else if (str in newHitFnr) {
print str, "is present in new file but not old file"
}
}
}
.
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputing that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new" and the file "actlist" said the 2 files had to be common from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
I have given the solution as per the requirement.
awk '{ # awk starts from here and read a file line by line
if (NF == 3) # It will check if current line is having 3 fields. NF represents number of fields in current line
{ word1=$1; # If current line is having exact 3 fields then 1st field will be assigned to word1 variable
word2=$2; # 2nd field will be assigned to word2 variable
word3=$3; # 3rd field will be assigned to word3 variable
print word1, word2, word3} # It will print all 3 fields
}' filename.txt >> output.txt # THese 3 fields will be redirected to a file which can be used for further processing.
This is as per the requirement, but there are many other ways of doing this but it was asked using awk.