awk get the nextline - bash

I'm trying to use awk to format a file that contains multiple lines.
Contents of the file:
ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15
Desired result:
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
I am using the code below:
awk -F ";" '{
    ligne = ligne $0
    ma_var = $1
    {
        getline
        if ($1 != ma_var) {
            ligne = ligne "\n" $0
        }
        else {
            ligne = ligne ";" NF
        }
    }
}
END {
    print ligne
}' ${FILE_IN} > ${FILE_OUT}
The objective is to compare the first column of the next line to the first column of the current line; if it matches, append the last column of the next line to the current line and delete the next line, otherwise print the next line.
Kind regards,

As with life, it's a lot easier to make decisions based on what has happened (the previous line) than what will happen (the next line). Re-state your requirement as "the objective is to compare the first column of the current line to the first column of the previous line; if it matches, append the last column of the current line to the previous line and delete the current line, otherwise print the current line" and the code to implement it becomes relatively straightforward:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 == p1 { prev = prev OFS $NF; next }
{ if (NR>1) print prev; prev=$0; p1=$1 }
END { print prev }
$ awk -f tst.awk file
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
If you're ever tempted to use getline again, be sure you fully understand everything discussed at http://awk.freeshell.org/AllAboutGetline before making a decision.
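To see one of the pitfalls, here is a minimal sketch (with a made-up three-line input, not the OP's file) of what an unguarded getline does at end of input: when the read fails it returns 0 but leaves $0 and NR untouched, so the last line gets processed as if a "next line" existed.

```shell
# Hypothetical 3-line input; print NR and $0 after each getline.
printf 'a\nb\nc\n' |
awk '{ getline; print NR, $0 }'
# The getline on line "c" fails at EOF but leaves NR and $0 unchanged,
# so "3 c" is printed even though there was no next line to read.
```

A guarded call, `if ((getline) > 0) ...`, is how you avoid acting on a failed read.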

I would take a slightly different approach than Ed:
$ awk '$1 == p { printf ";%s", $NF; next } NR > 1 { print "" } {p=$1;
printf "%s" , $0} END{print ""}' FS=\; input
At each line, check if the first column matches the previous line's. If it does, print a semicolon followed by the last field. If it doesn't, finish the previous record with a newline (except before the first line), then print the whole line with no trailing newline, remembering the first column for the next comparison.

Related

Merge rows with same value and every 100 lines in csv file using command

I have a csv file like below:
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
...
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
...
I want to combine the csv file into a new csv file like below:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
http://www.z.com/4
...
http://www.z.com/100
",flower
"http://www.z.com/101
http://www.z.com/102
http://www.z.com/103
http://www.z.com/104
...
http://www.z.com/200
",flower
I want each cell in the first column to hold at most 100 http urls.
The same value from column two will appear in the corresponding cell.
Is there a simple command pattern to achieve this?
I used the command below:
awk '{if(NR%100!=0)ORS="\t";else ORS="\n"}1' test.csv > result.csv
$ awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
{print $1; n+=1} END {print "\"," p}' test.csv
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
",flower
First set the field separator to the comma (-F,). Then:
If the second field changes ($2!=p) or if we already printed 100 lines in the current batch (n==100):
if it is not the first line, print a double quote, a comma, the previous second field and a newline,
print a double quote,
store the new second field in variable p for later comparisons,
reset line counter n.
For all lines print the first field and increment line counter n.
At the end print a double quote, a comma and the last value of second field.
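To make the n==100 batching visible without 100 lines of input, here is the same one-liner with the batch size dropped to 2, run on a tiny made-up input (u1..u4 are stand-in urls, not from the question):

```shell
# Batch size 2: a new quoted cell starts when $2 changes OR after 2 lines.
printf 'u1,x\nu2,x\nu3,x\nu4,y\n' |
awk -F, '$2!=p || n==2 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
         {print $1; n+=1} END {print "\"," p}'
```

The three x-lines split into a batch of two and a batch of one, each closed with `",x`, which is exactly what the n==100 test does at scale.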
1st solution: with the samples you have shown, please try the following awk code.
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
' Input_file
2nd solution: in case your Input_file is NOT sorted by the 2nd column, try the following sort + awk code.
sort -t, -k2 Input_file |
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
'
Output will be as follows:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3",flower
Given:
cat file
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
Here is a two pass awk to do this:
awk -F, 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
If you want to print either at the change of the $2 value or at some fixed line interval (like 100) you can do:
awk -F, -v n=100 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR || FNR%n==0{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
Either prints:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4"
,apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3"
,flower

AWK Script: Matching Two File with Unique Identifier and append record if already match

I'm trying to compare two files, using a field as the unique identifier to match on.
File 1 has the account number, which is compared against the second file.
If the account number is in both files, the next condition matches the value and appends a flag to the original record.
Sample file 1:
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
Sample file 2:
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING1
But the awk always picks up the last occurrence in the file, even when there is already a match before the end of the records.
Actual output based on the condition below:
ACCT1,PHONE1,TEST1,000
ACCT2,PHONE2,TEST3,001
Expected Output:
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
The code I'm trying:
awk -f test.awk pass=0 samplefile2.txt pass=1 samplefile1.txt > output.txt
BEGIN{
}
pass==0{
    FS=","
    ACT=$1
    RES1[ACT]=$2
}
pass==1{
    ACCTNO=$1
    PHNO=$2
    FIELD3=$3
    LVCODE=RES1[ACCTNO]
    if(LVCODE=="SOMETHING1"){ OTHERFLAG="001" }
    else if(LVCODE=="SOMETHING4"){ OTHERFLAG="002" }
    else{ OTHERFLAG="000" }
    printf("%s,", ACCTNO)
    printf("%s,", PHNO)
    printf("%s,", FIELD3)
    printf("%s", OTHERFLAG)
    printf "\n"
}
I tried looping over the variable that holds the array, but unfortunately it turned into an infinite loop during my run.
You may use this awk command:
awk '
BEGIN {FS=OFS=","}
NR==FNR {
map[$1] = $0
next
}
$1 in map {
print map[$1], ($2 == "SOMETHING1" ? "001" : ($2 == "SOMETHING4" ? "002" : "000"))
delete map[$1]
}' file1 file2
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
Once we print a matching record from file2, we delete that record from the associative array map to ensure only the first matching record is evaluated.
It sounds like you want to know the first occurrence of ACCTx in samplefile2.txt if SOMETHING1 or SOMETHING4 is present. I think you should read samplefile1.txt first into a data structure and then iterate line by line through samplefile2.txt looking for your criteria:
BEGIN {
    FS=","
    while (getline < ACCOUNTFILE) accounts[$1]=$0
}
{ OTHERFLAG = "" }
$2 == "SOMETHING1" { OTHERFLAG="001" }
$2 == "SOMETHING4" { OTHERFLAG="002" }
($1 in accounts) && OTHERFLAG!="" {
    print(accounts[$1] "," OTHERFLAG)
    # delete the account so that it does not print again.
    # Only the first occurrence in samplefile2.txt will matter.
    delete accounts[$1]
}
END {
    # Print remaining accounts that did not match above
    for (acct in accounts) print(accounts[acct] ",000")
}
Run above with:
awk -v ACCOUNTFILE=samplefile1.txt -f test.awk samplefile2.txt
I am not sure what you want to do if both SOMETHING1 and SOMETHING4 are in samplefile2.txt for the same ACCT1. If you want 'precedence' so that SOMETHING4 will overrule SOMETHING1 if it comes after you will need additional logic. In that case you probably want to avoid the 'delete' and keep updating the accounts[$1] array until you reach the end of the file and then print all the accounts at the end.
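A sketch of that precedence variant, under the assumption that SOMETHING4 (002) should beat SOMETHING1 (001); the string-comparison trick and the per-account flag array are my additions, not part of either original script. The best flag seen so far is kept per account and everything is printed only at END:

```shell
# Build the sample inputs (ACCT1 has both SOMETHING1 and a later SOMETHING4).
cat > samplefile1.txt <<'EOF'
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
EOF
cat > samplefile2.txt <<'EOF'
ACCT1,SOMETHING1
ACCT1,SOMETHING4
ACCT2,SOMETHING1
EOF
awk -F, '
NR==FNR { acct[$1]=$0; flag[$1]="000"; next }    # pass 1: load accounts
$1 in acct {
    f = ($2=="SOMETHING4" ? "002" : ($2=="SOMETHING1" ? "001" : ""))
    if (f > flag[$1]) flag[$1] = f               # "002" > "001" > "000" as strings
}
END { for (a in acct) print acct[a] "," flag[a] }
' samplefile1.txt samplefile2.txt | sort
```

Because `for (a in acct)` visits keys in an unspecified order, the output is piped through sort; ACCT1 ends up with 002 despite matching SOMETHING1 first.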

awk: search for a string, but only inside a range

I want to search for a string in a file, and print that line plus the line preceding it, but only after the line where string XXX appears in the file. How can I achieve this?
Here is an example: search for lines containing the string "### records", but only after the line that says "start real work"
INPUT FILE
cat << EOF > x.x
start job
initialization
20 records
start real work
first step
30 records
# comments
second step
0 records
#comments
third step
22 records
end
EOF
AWK ONE-LINER - this searches through the whole file; I can't figure out how to start searching for the string "### records" only after the line that says "start real work"
awk '/records/ && !/^0 records/{for(i=1;i<=x;)print a[i++];print} \
{for(i=1;i<x;i++)a[i]=a[i+1];a[x]=$0;}' x=1 x.x
DESIRED OUTPUT
first step
30 records
third step
22 records
With the samples you have shown, please try the following awk code.
awk '
/start real work/{
found=1
next
}
val && /records/{
if($1>0){
print val ORS $0
}
val=""
next
}
found && NF && !/#/{
val=$0
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/start real work/{ ##Check if line contains start real work then do following.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
val && /records/{ ##Checking if val is set and line contains records then do following.
if($1>0){ ##Check if 1st field is greater than 0 then do following.
print val ORS $0 ##printing val ORS and current line here.
}
val="" ##Nullifying val here.
next ##next will skip all further statements from here.
}
found && NF && !/#/{ ##Checking if found is SET and NF is NOT NULL and lines is not having #
val=$0 ##Then set val to current line.
}
' Input_file ##Mentioning Input_file name here.
awk '
/start real work/ { inWork = 1 }
inWork && /^[1-9].* records/ { print prev ORS $0 }
{ prev = $0 }
' file
first step
30 records
third step
22 records
awk '/^start real work/{flag=1} flag && !/[0-9]{2}/{lastline=$0} flag && /^[0-9]{2} records/{print lastline;print}' x.x
first step
30 records
third step
22 records
note:
your problem description does not mention anything about the "step" lines shown in your output.
The idea is to set a flag when you see the signal to begin and check the flag along with any other test you may require.
If the flag is set and a line is not a valid "records" line,
then stash it (as lastline).
If the flag is set and the line is a valid two digit "records" line then output the stashed line and then the current line.
With awk in paragraph mode:
awk -v RS= -v FS='\n' -v OFS='\n' '
/start real work/ {f=1;next}
f && (/records/ && !/^#/)
f && (/^#/ && $3 !~ /^0/) {print $2,$3}
' file
first step
30 records
third step
22 records

Editing text in Bash

I am trying to edit text in Bash. I got to a point where I am no longer able to continue and I need help.
The text I need to edit:
Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
AMGN
Amgen Inc
Medical
132,594,808
227.76
AXP
American Express Company
Finance
91,986,280
114.24
BA
Boeing Company
Aerospace
114,768,960
203.30
The text I need:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
I already tried :
sed 's/$/,/' BIPSukol.txt > BIPSukol1.txt | awk 'NR==1{print}' BIPSukol1.txt | awk '(NR-1)%5{printf "%s ", $0;next;}1' BIPSukol1.txt | sed 's/.$//'
But it doesn't quite do the job.
(BIPSukol1.txt is the name of the file I am editing.)
The biggest problem you have is that you do not have consistent delimiters between your fields. Some have commas, some don't, and some are just a combination of 3 fields that happen to run together.
The tool you want is awk. It will allow you to treat the first line differently and then condition the output that follows with convenient counters you keep within the script. In awk you write rules (a pattern plus the action between the outer {...}), and awk applies your rules in the order they are written. This allows you to "fix up" your haphazard format and arrive at the desired output.
The first rule, FNR==1, is applied to the 1st line. It loops over the fields, treats the problematic "Market Cap $K" heading as one field, and skips beyond it to output the remaining headings. It stores the per-symbol line count n = NF - 3, as you only have 5 lines of data for each Symbol, and skips to the next record.
When count==n the next rule is triggered, which outputs the records stored in the a[] array, zeros count and deletes the a[] array for refilling.
The next rule is applied to every record (line) of input from the 2nd on. It simply removes any whitespace from the fields by forcing awk to recalculate the fields with $1 = $1, then stores the record in the array, incrementing count.
The last rule, END, is a special rule that runs after all records are processed (it lets you sum final tallies or output final lines of data). Here it is used to output the records that remain in a[] when the end of the file is reached.
Putting it altogether in another cut at awk:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
for (i=1;i<=n;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
delete a
count = 0
}
{
$1 = $1
a[++count] = $0
}
END {
for (i=1;i<=count;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
}
' file
Example Use/Output
Note: you can simply select-copy the script above and middle-mouse-paste it into an xterm with the directory set so it contains file (you will need to rename file to whatever your input filename is). Running it produces:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
(note: it is unclear why you want the "Links" heading included since there is no information for that field -- but that is how your desired output is specified)
More Efficient No Array
You always have afterthoughts that creep in after you post an answer, no different than remembering a better way to answer a question as you are walking out of an exam.
The following does essentially the same thing, but without using arrays. Instead it simply outputs the information after formatting it, rather than buffering it in an array to output all at once:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
print ""
count = 0
}
{
$1 = $1
printf (++count>1?",%s":"%s"), $0
}
END { print "" }
' file
(same output)
With the samples you have shown, could you please try the following (written and tested in GNU awk). Considering (judging by the OP's attempts) that after the header of Input_file you want to make every 5 lines into a single line.
awk '
BEGIN{
OFS=","
}
FNR==1{
NF--
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
OR if your awk doesn't support NF-- then try the following.
awk '
BEGIN{
OFS=","
}
FNR==1{
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +Links( +)?$/,"",lastPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
NOTE: your header/first line needs special handling because we can't simply substitute , for every space; that is taken care of in this solution as per the shown samples.
With GNU awk, if your first line is always the same:
echo 'Symbol,Name,Sector,Market Cap $K,Last,Links'
awk 'NR>1 && NF=5' RS='\n ' ORS='\n' FS='\n' OFS=',' file
Output:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
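The trick in that one-liner is the multi-character RS: records are split wherever a newline is followed by a space, and FS='\n' then makes each line of a record a field. Here is a minimal sketch with a stand-in input (the leading spaces marking each block are an assumption to mirror the trick; this needs an awk that treats a multi-character RS as a regex, such as GNU awk or mawk):

```shell
# "HDR" is record 1; each " X\n..." chunk after a "newline + space"
# boundary becomes its own record whose lines are fields.
printf 'HDR\n A\nB\nC\n D\nE' |
awk 'NR>1 { $1=$1; print }' RS='\n ' FS='\n' OFS=','
```

The `$1=$1` forces awk to rebuild $0 with OFS, turning each multi-line record into one comma-separated line (the `NF=5` in the answer above does the same rebuild as a side effect).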

Display input file without print in awk

My code is in the middle of manipulating two input files.
awk -F'|' -v PARM_VAL="${PARM_VALUE[*]}" '
BEGIN { split(PARM_VAL,pa," ") }
FNR==NR
{
for(i=1;i<=NF;i++)
a[NR,i]=$i;
}
END {printf "second value of SPPIN : "a[2,2]", parm : "pa[2]", File val : " FILENAME "First rec of SPPOUT: " $0 ;printf "\n" } ' SPP_IN SPP_OUT
I am passing the parm array to awk and storing the first input file in an array. I just executed the above command.
My first input file is getting displayed without a print. Any way to suppress or avoid it?
Don't split FNR == NR and the { of the action.
FNR == NR
{
Put them on the same line instead.
FNR == NR {
awk is seeing FNR==NR as a pattern without an action and using the default action of print.
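A minimal demonstration of that default action, on a toy two-line input (not the OP's files):

```shell
# A pattern with no action on its line defaults to { print }:
printf 'one\ntwo\n' | awk 'FNR==1'        # prints only line 1
# With the (empty) action attached on the same line, nothing is printed:
printf 'one\ntwo\n' | awk 'FNR==1 { }'
```

In the OP's script the stray newline turns `FNR==NR` into the first form, so every line of the first file is echoed.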
