I am trying to edit text in Bash. I got to a point where I am no longer able to continue, and I need help.
The text I need to edit:
Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
AMGN
Amgen Inc
Medical
132,594,808
227.76
AXP
American Express Company
Finance
91,986,280
114.24
BA
Boeing Company
Aerospace
114,768,960
203.30
The text I need:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
I already tried:
sed 's/$/,/' BIPSukol.txt > BIPSukol1.txt | awk 'NR==1{print}' BIPSukol1.txt | awk '(NR-1)%5{printf "%s ", $0;next;}1' BIPSukol1.txt | sed 's/.$//'
But it doesn't quite do the job.
(BIPSukol1.txt is the name of the file I am editing)
The biggest problem you have is you do not have consistent delimiters between your fields. Some have commas, some don't and some are just a combination of 3-fields that happen to run together.
The tool you want is awk. It will allow you to treat the first line differently and then condition the output that follows with convenient counters you keep within the script. In awk you write rules (what comes between the outer {...}) and awk applies your rules in the order they are written. This allows you to "fix up" your haphazard format and arrive at the desired output.
The first rule, FNR==1, is applied to the 1st line. It loops over the fields, finds the problematic "Market Cap, $K" heading and treats its three fields as one, skipping beyond it to output the remaining headings. It stores n = NF - 3 (the header has 8 whitespace-separated fields, and you have 5 lines of data for each Symbol), resets count to zero, and skips to the next record.
When count==n the next rule is triggered. It outputs the records stored in the a[] array, zeros count and deletes the a[] array for refilling.
The next rule is applied to every record (line) of input from the 2nd on. It removes any surrounding whitespace by forcing awk to rebuild the record with $1 = $1, and then stores the record in the array, incrementing count.
The last rule, END is a special rule that runs after all records are processed (it lets you sum final tallies or output final lines of data) Here it is used to output the records that remain in a[] when the end of the file is reached.
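As a small illustration of the $1 = $1 idiom mentioned above (the input string is made up for the demo): assigning to any field makes awk rebuild $0 from its fields joined by OFS, which drops leading and trailing blanks and squeezes runs of blanks down to one space.

```shell
# Assigning to a field makes awk rebuild $0 from the fields, joined by OFS
# (a single space by default). The brackets just make the trimming visible.
out=$(printf '   American   Express  Company  \n' | awk '{ $1 = $1; print "[" $0 "]" }')
printf '%s\n' "$out"    # [American Express Company]
```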
Putting it all together in another cut at awk:
awk '
FNR==1 {
    for (i=1;i<=NF;i++)
        if ($i == "Market") {
            printf ",Market Cap $K"
            i = i + 2
        }
        else
            printf (i>1?",%s":"%s"), $i
    print ""
    n = NF-3
    count = 0
    next
}
count==n {
    for (i=1;i<=n;i++)
        printf (i>1?",%s":"%s"), a[i]
    print ""
    delete a
    count = 0
}
{
    $1 = $1
    a[++count] = $0
}
END {
    for (i=1;i<=count;i++)
        printf (i>1?",%s":"%s"), a[i]
    print ""
}
' file
Example Use/Output
Note: you can simply select-copy the script above and then middle-mouse-paste it into an xterm with the directory set so it contains file (you will need to rename file to whatever your input filename is)
$ awk '
> FNR==1 {
> for (i=1;i<=NF;i++)
> if ($i == "Market") {
> printf ",Market Cap $K"
> i = i + 2
> }
> else
> printf (i>1?",%s":"%s"), $i
> print ""
> n = NF-3
> count = 0
> next
> }
> count==n {
> for (i=1;i<=n;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> delete a
> count = 0
> }
> {
> $1 = $1
> a[++count] = $0
> }
> END {
> for (i=1;i<=count;i++)
> printf (i>1?",%s":"%s"), a[i]
> print ""
> }
> ' file
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
(note: it is unclear why you want the "Links" heading included since there is no information for that field -- but that is how your desired output is specified)
More Efficient No Array
You always have afterthoughts that creep in after you post an answer, no different than remembering a better way to answer a question as you are walking out of an exam, or thinking about the one additional question you wished you would have asked after you excuse a witness or rest your case at trial. (there was some song that captured it -- a little bit ironic :)
The following does essentially the same thing, but without using arrays. Instead of buffering the records in an array and outputting them all at once, it simply outputs each record as soon as it is formatted. It was one of those afterthoughts:
awk '
FNR==1 {
    for (i=1;i<=NF;i++)
        if ($i == "Market") {
            printf ",Market Cap $K"
            i = i + 2
        }
        else
            printf (i>1?",%s":"%s"), $i
    print ""
    n = NF-3
    count = 0
    next
}
count==n {
    print ""
    count = 0
}
{
    $1 = $1
    printf (++count>1?",%s":"%s"), $0
}
END { print "" }
' file
(same output)
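If only the data lines matter, a different technique from the awk scripts above is the standard paste utility: each "-" argument reads the next line from standard input, so five of them join every five lines with the chosen delimiter. This is only a sketch. It skips the header with tail instead of converting it, and the inline sample data is a shortened copy of the input from the question.

```shell
# Sample input in the layout from the question: header first, then five lines per symbol.
data='Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
BA
Boeing Company
Aerospace
114,768,960
203.30'

# tail drops the header line; paste joins each group of five lines with commas.
out=$(printf '%s\n' "$data" | tail -n +2 | paste -d, - - - - -)
printf '%s\n' "$out"
```

The header would still need separate handling, for example the FNR==1 rule shown earlier.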
With your shown samples, could you please try the following (written and tested in GNU awk). Judging by the OP's attempts, I am assuming that after the header of Input_file you want to join every 5 lines into a single line.
awk '
BEGIN{
OFS=","
}
FNR==1{
NF--
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
Or, if your awk doesn't support NF--, then try the following.
awk '
BEGIN{
OFS=","
}
FNR==1{
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +Links( +)?$/,"",lastPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
NOTE: Your header/first line needs special handling because we can't simply replace every space with a comma, so this solution takes care of it as per the shown samples.
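The header handling can be tried in isolation. The sketch below applies the same match()/RSTART/RLENGTH idea to the header line alone; the variable names pre, mid and post are made up here, and unlike the scripts above it keeps the trailing Links heading, which is an assumption based on the desired output in the question.

```shell
out=$(printf 'Symbol Name Sector Market Cap, $K Last Links\n' | awk '{
    match($0, /Market.*\$K/)                 # sets RSTART and RLENGTH for the odd heading
    mid  = substr($0, RSTART, RLENGTH)       # "Market Cap, $K"
    pre  = substr($0, 1, RSTART - 1)         # everything before it
    post = substr($0, RSTART + RLENGTH)      # everything after it
    gsub(/,/, "", mid)                       # drop the comma inside the heading
    gsub(/ +/, ",", pre)                     # spaces become separators elsewhere
    gsub(/ +/, ",", post)
    print pre mid post
}')
printf '%s\n' "$out"    # Symbol,Name,Sector,Market Cap $K,Last,Links
```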
With GNU awk. If your first line is always the same.
echo 'Symbol,Name,Sector,Market Cap $K,Last,Links'
awk 'NR>1 && NF=5' RS='\n ' ORS='\n' FS='\n' OFS=',' file
Output:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
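To see how these variables interact, here is a minimal sketch on made-up data in which records are separated by blank lines. That is not the layout of the file in the question, so treat it as an illustration of RS, FS and OFS only. An empty RS switches awk to paragraph mode, FS="\n" makes each line one field, and $1 = $1 rebuilds the record with OFS.

```shell
# Hypothetical input: one record per paragraph, one field per line.
out=$(printf 'AAPL\nApple Inc\n118.03\n\nBA\nBoeing Company\n203.30\n' |
      awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," } { $1 = $1; print }')
printf '%s\n' "$out"
```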
I've found this useful answer on SO to a problem I'm having (https://stackoverflow.com/a/30387380).
However, I cannot figure out how to use the construct within a for loop.
Below is my last attempt:
awk '
BEGIN{split(ENVIRON["LABELS"], label)}
{
for (i = 1; i <= NF; i++)
!found && /label[i]/ { print "# "label[i];found=1} 1
}' >> "${TMPFIL}"
But that fails with:
awk: syntax error at source line 5
context is
!found && /label[i]/ >>> { <<<
awk: illegal statement at source line 5
EDIT TO ADD DETAIL....
Further to the answer from @Inian, which needs further refining, here's some more background to help.
I have a list of readings (in a text file):
Foo{foober="x"} 5
Foo{foober="x"} 5
Bar{barfoo="y"} 0
Bar{barfoo"y"} 0
So, given something like :
LABELS='
Foo
Bar' \
awk '
BEGIN{split(ENVIRON["LABELS"], label)}
{
for (i = 1; i <= NF; i++)
!found && /label[i]/ { print "# "label[i];found=1} 1
}' >> "${TMPFIL}"
The expected output looks like :
# Foo
Foo{foober="x"} 5
Foo{foober="x"} 5
# Bar
Bar{barfoo="y"} 0
Bar{barfoo"y"} 0
EDIT2: As per OP's ask adding following code now.
awk -F"{" 'old!=$1{print "# "$1} {old=$1;print}' Input_file
Could you please try the following. It assumes you want a header for each distinct $1, where the delimiter is whitespace (the default in awk).
awk '!a[$1]++{print "# header_group_"++count} 1' Input_file
In case you want to look for the string before {, then try the following.
awk 'BEGIN{FS="{"} !a[$1]++{print "# header_group_"++count} 1' Input_file
You could try this awk script:
awk -F'{' '$1!=old{print "# header_group_" ++c}{old=$1}1' file
This relies on the field separator { such that the first field is the key to group lines together.
When the first field is different from the previous, the header line is printed.
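On the syntax error itself: a pattern { action } pair is only valid at the top level of an awk program, and /label[i]/ is a literal regex, not a lookup of the variable. Inside an action the test has to be an ordinary if statement that uses the variable directly. The sketch below keeps the LABELS loop from the question; the seen[] array and the choice of index() to test for a leading label are assumptions made for this example, and the sample readings are written with a consistent = sign.

```shell
out=$(printf 'Foo{foober="x"} 5\nFoo{foober="x"} 5\nBar{barfoo="y"} 0\nBar{barfoo="y"} 0\n' |
LABELS='Foo
Bar' awk '
BEGIN { n = split(ENVIRON["LABELS"], label) }   # default splitting treats newlines as whitespace
{
    for (i = 1; i <= n; i++)
        # index() == 1 means the line starts with label[i]; seen[] limits each header to one print
        if (index($0, label[i]) == 1 && !seen[i]++)
            print "# " label[i]
    print
}')
printf '%s\n' "$out"
```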
I want to insert array values, along with all the other contents of testfile.ps, into result.ps, but the array values are not getting printed. Please help.
My requirement is that every time the condition is met, the next array element should be printed together with the other contents of testfile.ps into result.ps.
Actually arr[0] and arr[1] are big strings in my project, but for simplicity I have shortened them.
#!/bin/bash
a[0]=""lineto""\n""stroke""
a[1]=""476.00"" ""26.00""
awk '{ if($1 == "(Page" ){for (i=0; i<2; i++){print $arr[i]; print $0; }}
else print }' testfile.ps > result.ps
testfile.ps
(Page 1 of 2 )
move
(Page 1 of 3 )
"gsave""\n""2.00"" ""setlinewidth""\n"
result.ps should be
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
gsave
2.00
setlinewidth
This means that the second time the condition is met, the array index should be incremented to 1 and a[1] should be printed.
I also tried this approach, with only a single array element, but I am not getting any output:
awk -v "a0=$a[0]" 'BEGIN {a[0]=""lineto""stroke""; if($1 == "move" ){for (i in a){ print a0;print $0; }} else print }' testfile.txt
edited:
Hi, I have resolved the issue to some extent but am stuck at one place: how can I compare two strings like "a=476.00 1.00 lineto\nstroke\ngrestore\n" and "b=26.00 moveto\n368.00 1.00 lineto\n" in an awk command? I am trying:
awk -v "a=476.00 1.00 lineto\nstroke\ngrestore\n" -v "b=26.00 moveto\n368.00 1.00 lineto\n" -v "i=$a" '{
if ($1 == "(Page" && ($2%2==0 || $2==1) && $3 == "of"){
print i;
if [ i == a ];then
i=b; print $0;
fi
else if [ i == b ];then
i=c; print $0;
fi
else print $0;
}'testfile.txt
You are using in your awk program a variable arr which is never initialized.
In your case, you want to pass a variable from the shell to awk. From the awk man page:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such
variable values are available to the BEGIN rule of an AWK program.
Hence, you need something like
awk -v "a0=${a[0]}" -v "a1=${a[1]}" .....
and in a BEGIN block, you can set up your array arr from the variables a0 and a1 in any way you want.
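A minimal sketch of that setup follows. It uses plain shell variables a0 and a1 as stand-ins for the bash array elements (with a real bash array you would pass "${a[0]}" and "${a[1]}"), and the counter n and the shortened sample input are assumptions for the demo. Note that awk expands escape sequences such as \n in -v assignments.

```shell
a0='lineto\nstroke'     # stand-in for ${a[0]}; awk turns \n into a newline
a1='476.00 26.00'       # stand-in for ${a[1]}

out=$(printf '(Page 1 of 2 )\nmove\n(Page 1 of 3 )\ngsave\n' |
awk -v a0="$a0" -v a1="$a1" '
BEGIN { arr[0] = a0; arr[1] = a1; n = 2 }   # build the awk array from the -v variables
{ print }                                   # copy every input line
$1 == "(Page" && i < n { print arr[i++] }   # after each page line, emit the next element
')
printf '%s\n' "$out"
```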
Gather the data to a single var using a separator:
$ awk -v s="lineto\nstroke;476.00 26.00" ' # ; as separator
BEGIN{ n=split(s,a,";") } # split s var to a array
1 # output record
/\(Page/ && i<n { print a[++i] } # if (Page and still data in a
' file
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
"gsave""\n""2.00"" ""setlinewidth""\n"
I am trying to solve the SPOJ problem SIZECON using the awk programming language, using the code below:
awk ' {
t = $1;
while ( t-- ) {
getline b;
x += b * (b > 0);
print x;
}
exit;
}'
OUTPUT:
4(No.of test cases)
5
5
-5
5
6
11
-1
11
The Expected INPUT and OUTPUT is:
Input:
4
5
-5
6
-1
Output:
11
The code works perfectly fine on my Linux system, but I get an NZEC error when submitting it to SPOJ. Can anyone help me? Thanks in advance.
This might be what you want:
$ awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' file
11
I originally was just going to test for t having a value but the requirements on that site just say it will be less than 1000 so I guess it could be zero.
Also you need to print s+0 to ensure you get a numeric value instead of a null string if t is zero or the file is empty.
NR<2 tests for the first input line. It would be more naturally written as NR==1 but I understand you are looking for brevity over clarity.
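The s+0 point is easy to check on its own. In this small sketch s is never assigned, so it holds awk's uninitialized value, which prints as an empty string unless it is forced into a numeric context:

```shell
# Reading /dev/null means no records are processed, so s stays uninitialized.
out=$(awk 'END { print "[" s "]"; print "[" (s+0) "]" }' /dev/null)
printf '%s\n' "$out"
# [] 
# [0]
```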
awk scripts are made of a series of <condition> { <action> } segments, wrapped in an implicit while read loop so the posted script is equivalent to this pseudo-code in a procedural language:
while read line from file
do
lineNr++
if (lineNr < 2) {
t=$0
next
}
if (line > 0) {
s+=$0
}
if (lineNr > t) {
print s+0
exit
}
done
I think you should be able to figure the rest out given that and with google and the awk man pages when needed.
Given the set of integers, find the sum of all positive integers in it.
This is what you're doing? Seems pretty simple:
awk '
{
if ( NR == 1 ) {
total_to_read = $0 + 1
next
}
if ( $0 > 0 ) total += $0
if ( total_to_read == NR ) {
print total
exit
}
}' test.txt
The END phase is what you want to do at the end of the loop. I am simply taking each element in the loop and adding it to total if it's greater than 0.
It's not that simple. He needs to only read the number of values specified by the integer on the first line of input and he needs the briefest possible solution (excepting white space) - Ed Morton
My original answer was to show that you were overthinking Awk. Awk does the loop for you.
I've modified the above program to include the read-the-first-number requirement. No more END needed. I save the first value, and go to the next line. When I get to the total lines to read, I print out that total, and do exit which should end my loop.
You can see this is actually equivalent to the pseudo-code given in Ed Morton's answer. It should be easier to understand.
Ed Morton pointed out that Awk can have a series of <expression> {code} segments. I always knew you could have one, but never thought of doing it multiple times.
This means that I could use this to imply if statements instead of spelling them out. Making your code a wee bit shorter:
awk '
( NR == 1 ) {
total_to_read = $1 + 1
next
}
( $0 > 0 ) {total += $0}
( total_to_read == NR ) {
print total
exit
}' test.txt
To make it even shorter, we could use shorter variable names. Let's use t for total_to_read and s for total:
awk '
( NR == 1 ) {
t = $1 + 1
next
}
( $0 > 0 ) {s += $0}
( t == NR ) {
print s
exit
}' test.txt
A few more tweaks. Instead of testing NR == 1, I'll test NR < 2. NR is the number of records read so far, and if NR is less than 2, it has to be 1. You can't have zero or a negative number of records in your implied awk loop.
In my original program, I was adding 1 to t (total lines to read), then testing to exit if t == NR. If I don't add 1 to the total lines to read, I save a few characters, and I can test NR > t, which saves another character:
awk '
( NR < 2 ) {
t = $0
next
}
( $0 > 0 ) {s += $0}
( NR > t ) {
print s
exit
}' test.txt
Now, let's eliminate all that useless whitespace and cram it all together!
awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' test.txt
And, I get Ed Morton's answer... Damn.
Well, at least I hope you understand this step-by-step explanation, and understand how Ed Morton's solution works.
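To check the point about reading only as many values as the first line announces, here is a quick run on made-up input: the first line says 2, so the trailing 100 must not be counted.

```shell
# First line is the count t; only the next t values may contribute to the sum.
out=$(printf '2\n5\n6\n100\n' | awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}')
printf '%s\n' "$out"    # 11
```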
I'm piping a program's output through some awk commands, and I'm almost where I need to be. The command thus far is:
myprogram | awk '/chk/ { if ( $12 > $13) printf("%s %d\n", $1, $12 - $13); else printf("%s %d\n", $1, $13 - $12) } ' | awk '!x[$0]++'
The last bit is a poor man's uniq, which isn't available on my target. Given the chance the command above produces an output such as this:
GR_CB20-chk_2, 0
GR_CB20-chk_2, 3
GR_CB200-chk_2, 0
GR_CB200-chk_2, 1
GR_HB20-chk_2, 0
GR_HB20-chk_2, 6
GR_HB20-chk_2, 0
GR_HB200-chk_2, 0
GR_MID20-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 0
GR_MID200-chk_2, 2
What I'd like to have is this:
GR_CB20-chk_2, 3
GR_CB200-chk_2, 1
GR_HB20-chk_2, 6
GR_HB200-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 2
That is, I'd like to print only the line that has the maximum value for a given tag (the first 'field'). The above example is representative of the actual data in that the output will be sorted (as though it had been piped through a sort command).
Based on my answer to a similar need, this script keeps things in order and doesn't accumulate a big array. It prints the line with the highest value from each group.
#!/usr/bin/awk -f
{
    s = substr($0, 0, match($0, /,[^,]*$/))
    if (s != prevs) {
        if ( FNR > 1 ) print prevline
        prevval = $2
        prevline = $0
    }
    else if ( $2 > prevval ) {
        prevval = $2
        prevline = $0
    }
    prevs = s
}
END {
    print prevline
}
If you don't need the items to be in the same order they were output from myprogram, the following works:
... | awk '{ if ($2 > x[$1]) x[$1] = $2 } END { for (k in x) printf "%s %d\n", k, x[k] }'
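A middle ground between the two approaches, sketched here under the assumption that holding one entry per tag in memory is acceptable: keep the maximum per tag in an array, and record the order in which tags first appear so the output keeps the input order. The array names max and order are made up for this example, and the input is a shortened copy of the sample above.

```shell
out=$(printf 'GR_CB20-chk_2, 0\nGR_CB20-chk_2, 3\nGR_HB200-chk_2, 0\nGR_MID20-chk_2, 3\nGR_MID20-chk_2, 0\n' |
awk '
!($1 in max)      { order[++n] = $1; max[$1] = $2 + 0 }   # first time this tag is seen
$2 + 0 > max[$1]  { max[$1] = $2 + 0 }                    # keep the larger value
END { for (i = 1; i <= n; i++) print order[i], max[order[i]] }
')
printf '%s\n' "$out"
```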