Replace column by comparing with the other column - bash

I have a file like this:
1 CC AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 CC BBB
1 Na BBB
1 Na BBB
1 xa BBB
1 CC CCC
1 Na CCC
1 da CCC
I would like to replace column 2 with "01" for AAA, "02" for BBB, and so on for the entire file. The output should look like this:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC
I don't know how to make this work. Please help me if possible. A new group starts at every CC: the change from AAA to BBB can only be tracked by the CC in the 2nd column.

One way of doing it in awk:
awk '$3!=a&&NF{a=$3;x=sprintf("%02d",++x);print $1,x,$3;next}$3==a&&NF{print $1,x,$3;next }1' inputFile

Here's one way using awk:
awk '$3 != r { ++i } { $2 = sprintf ("%02d", i) } { r = $3 }1' OFS="\t" file
I've set OFS to a tab character, but you can choose whatever you like. Results:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC

Since every group starts with a CC row, it seems like you want to count the CC rows:
awk '$2=="CC" { a+=1 } {$2=sprintf("%02d",a)} 1' input
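If the names in column 3 are not guaranteed to be contiguous (say AAA shows up again after BBB) and every AAA row should still get 01, one option is to remember the number given to each name the first time it appears. A minimal sketch, assuming whitespace-separated input with the group name in column 3; the function name number_groups is only for illustration:

```shell
# number_groups FILE: replace column 2 with a zero-padded group number.
# Each distinct value in column 3 gets the next number the first time it
# appears, so repeated (even non-adjacent) names reuse their number.
number_groups() {
    awk '
        NF == 0 { print; next }                       # pass blank lines through
        !($3 in id) { id[$3] = sprintf("%02d", ++n) } # first sighting: assign next id
        { $2 = id[$3]; print }                        # rebuild the line with the id
    ' "$1"
}
```

Usage: number_groups file. Assigning to $2 makes awk rebuild the line with OFS (a single space by default), just like the answers above.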

Related

Joining two files that both have duplicate rows

I am trying to join two files that have identical column 1 and different column 2:
File1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
File2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
When I try to join them with
join File1 File2 > File3
I get
aaa 1 2
bbb 3 2
bbb 3 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 1 0
ccc 1 1
ccc 1 1
ccc 1 0
ccc 0 1
ccc 0 1
ccc 0 0
join expands the duplicates: for each key it pairs every matching line of File1 with every matching line of File2. All I want is for it to go line by line, so the output should be:
aaa 1 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 0 0
How do I tell join to ignore duplicates and just combine the files line-by-line?
EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.
Based on a suggestion from @Andre Wildberg, this worked best:
paste File1 <(cut -d " " -f 2 File2)
This allowed me to loop through a list of files:
cat File1 > tmp
for file in $files
do
paste tmp <(cut -d " " -f 2 "$file") > tmpf
mv tmpf tmp
done
mv tmp FinalFile
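The same loop can be written without process substitution (a bash feature, not POSIX sh) by piping cut into paste and using - for standard input. Note that paste joins columns with a tab unless you give -d. A minimal sketch; merge_second_columns is a hypothetical helper name, and like the loop above it assumes every file has the same number of rows in the same order:

```shell
# merge_second_columns FIRST OTHER...: start from FIRST and append column 2
# of every OTHER file, pairing rows purely by line number (no key matching).
# paste joins with a tab by default; -d ' ' keeps the output space-separated.
merge_second_columns() {
    acc=$(mktemp) || return 1
    cp "$1" "$acc"
    shift
    for f in "$@"; do
        # "-" makes paste read the cut output from stdin
        cut -d ' ' -f 2 "$f" | paste -d ' ' "$acc" - > "$acc.next" &&
            mv "$acc.next" "$acc"
    done
    cat "$acc"
    rm -f "$acc"
}
```

Usage, with the same $files list as above: merge_second_columns File1 $files > FinalFile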
Assumptions:
all files have the same number of rows
all files have the same values in the first column for the same numbered row
the final result set can fit into memory
Sample input:
$ for f in f{1..4}
do
echo "############ $f"
cat $f
done
############ f1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
############ f2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
############ f3
aaa 12
bbb 12
bbb 12
ccc 11
ccc 11
ccc 10
############ f4
aaa 202
bbb 202
bbb 202
ccc 201
ccc 201
ccc 200
One awk idea:
awk '
FNR==NR { a[FNR]=$0; next }
{ a[FNR]=a[FNR] OFS $2 }
END { for (i=1;i<=FNR;i++)
print a[i]
}
' f1 f2 f3 f4
This generates:
aaa 1 2 12 202
bbb 3 2 12 202
bbb 3 2 12 202
ccc 1 1 11 201
ccc 1 1 11 201
ccc 0 0 10 200
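If you are not certain the second assumption holds, the awk above can check it while it merges, reporting any row whose first column differs from the first file's and exiting with a non-zero status. A sketch under the same assumptions; merge_checked is just an illustrative name:

```shell
# merge_checked FILE...: line-by-line merge of column 2, like the awk above,
# but it also verifies that column 1 matches the first file on every row.
merge_checked() {
    awk '
        FNR == NR { key[FNR] = $1; row[FNR] = $0; n = FNR; next }  # first file
        $1 != key[FNR] {                                            # key mismatch
            printf "%s:%d: expected %s, got %s\n", FILENAME, FNR, key[FNR], $1 > "/dev/stderr"
            bad = 1
        }
        { row[FNR] = row[FNR] OFS $2 }                              # append column 2
        END { for (i = 1; i <= n; i++) print row[i]; exit bad }
    ' "$@"
}
```

Usage: merge_checked f1 f2 f3 f4 > merged || echo "column 1 mismatch" >&2. Looping up to the first file's line count (n) rather than FNR also keeps the output tied to the first file if a later file is shorter or longer.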

How to replace columns (matching pattern) using awk?

I am trying to use awk to edit files, but I can't manage to do it without creating intermediate files.
Basically, I want to search file2, file3, and so on by column 1, and replace the 2nd column of the lines whose 1st column matches (note that file2 and file3 may contain other lines).
I have
File1.txt
aaa 111
aaa 222
bbb 333
bbb 444
File2.txt
zzz zzz
aaa 999
zzz zzz
aaa 888
File3.txt
bbb 000
bbb 001
yyy yyy
yyy yyy
Desired output
aaa 999
aaa 888
bbb 000
bbb 001
This does what you specified, but I suspect there are many edge cases it doesn't cover:
$ awk 'NR==FNR{a[$1]; next} $1 in a' file{1..3}
aaa 999
aaa 888
bbb 000
bbb 001
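The filter above prints the matching lines of file2 and file3 but never uses File1's rows. If the goal is instead to keep File1's rows in order and replace their second column, pairing the n-th aaa in File1 with the n-th aaa found in the other files, a single awk pass can do it without intermediate files. A sketch under that interpretation (an assumption about what you want); replace_in_order is a hypothetical name, and rows with no replacement keep their original value:

```shell
# replace_in_order FILE1 OTHER...: keep FILE1's rows and their order, but
# replace column 2 of the n-th row with key K by column 2 of the n-th line
# with key K found in the other files (scanned in argument order).
replace_in_order() {
    awk '
        NR == FNR { key[++n] = $1; val[n] = $2; want[$1]; next }  # FILE1
        $1 in want { repl[$1, ++seen[$1]] = $2 }                   # other files
        END {
            for (i = 1; i <= n; i++) {
                k = key[i]
                c = ++used[k]
                v = ((k, c) in repl) ? repl[k, c] : val[i]         # fall back to original
                print k, v
            }
        }
    ' "$@"
}
```

Usage: replace_in_order File1.txt File2.txt File3.txt prints the desired output for the sample files.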

How to apply loop in two files in awk with one matched column?

I am trying to extract data from two files with a common column but I am unable to fetch the required data.
File1
A B C D E F G
Dec 3 abc 10 2B 21 OK
Dec 1 %xyZ 09 3F 09 NOK
Dec 5 mnp 89 R5 11 OK
File2
H I
abc 10
xyz 00
pqr 45
I am able to get the output columns A B C D E F G, but I am unable to add column I between columns D and E.
Trial 1:
awk 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
{k=$3; sub(/^\%/,"",k)} k in a{print $1,$2,$3,$4,a[$2],$5,$6,$7; delete a[k]}
END{for(k in a) print k,a[k] > "unmatched"}' File2 File1 > matched
Required output:
matched:
A B C D I E F G
Dec 3 abc 10 10 2B 21 OK
Dec 1 %xyZ 09 00 3F 09 NOK
unmatched:
H I
pqr 45
Could you please help me get this output? Thank you.
Be careful: File1 has an upper-case Z (%xyZ). I changed it to lower case in my test; if it's not a typo, it's another small detail to deal with.
$ awk 'FNR==1 {next}
NR==FNR {a[$1]=$2; next}
{k=$3; sub(/^\%/,"",k)}
k in a {print $1,$2,$3,$4,a[k],$5,$6,$7; delete a[k]}
END {for(k in a) print k,a[k] > "unmatched"}' File2 File1 > matched
$ cat matched
Dec 3 abc 10 10 2B 21 OK
Dec 1 %xyz 09 00 3F 09 NOK
$ cat unmatched
pqr 45
File 2 with four columns:
$ cat a
A B C D E F G
Dec 3 abc 10 2B 21 OK
Dec 1 %xyz 09 3F 09 NOK
Dec 5 mnp 89 R5 11 OK
$ cat b
H I J K
abc 10 j1 k1
xyz 00 j2 k2
pqr 45 j3 k3
$ cat x.awk
FNR==1 {next}
NR==FNR {a[$1]=$0; next}
{k=$3; sub(/^\%/,"",k)}
k in a {
split(a[k], b)
print $1,$2,b[2],$3,b[3],b[4],$4,$5,$6,$7; delete a[k]
}
END {for(k in a) print a[k] > "unmatched"}
$ awk -f x.awk b a
Dec 3 10 abc j1 k1 10 2B 21 OK
Dec 1 00 %xyz j2 k2 09 3F 09 NOK
$ cat unmatched
pqr 45 j3 k3
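Two details could be folded into the script: matching case-insensitively with tolower() so %xyZ still matches xyz, and writing the header lines that the required output shows. The regex /^%/ is used instead of /^\%/, since % needs no escaping and some awks may warn about the unknown escape. A sketch for the original two-column File2; match_files is an illustrative name, and the output file name unmatched is taken from the question:

```shell
# match_files FILE2 FILE1: writes the joined rows (with header) to stdout and
# the unused FILE2 rows (with header) to the file "unmatched".
match_files() {
    awk '
        NR == FNR && FNR == 1 { hdr = $2; print > "unmatched"; next }   # File2 header (H I)
        NR == FNR { k = tolower($1); val[k] = $2; name[k] = $1; next }  # File2 rows
        FNR == 1 { print $1, $2, $3, $4, hdr, $5, $6, $7; next }        # File1 header
        { k = tolower($3); sub(/^%/, "", k) }                           # fold case, drop leading %
        k in val { print $1, $2, $3, $4, val[k], $5, $6, $7; delete val[k] }
        END { for (k in val) print name[k], val[k] > "unmatched" }
    ' "$1" "$2"
}
```

Usage: match_files File2 File1 > matched. Column 3 is printed as it appears in File1, so %xyZ keeps its upper-case Z in the output.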

Can Anyone help me with Computer theory?

Hey guys, I was given the following:
Consider the language S* where S = {aa, aaa}. Describe all the ways that a^12 can be written as the concatenation of factors in S.
I got 0 on this question even though it seemed pretty straightforward.
I would interpret it as: in how many different ways can you build up 12 a's from these factors?
That gives 12 ways, as you said:
one for only aa:
aa aa aa aa aa aa
one for only aaa
aaa aaa aaa aaa
and (5*4)/2 = 10 with three aa and two aaa (choose which 2 of the 5 factors are aaa):
aa aa aa aaa aaa
aa aa aaa aa aaa
aa aaa aa aa aaa
aaa aa aa aa aaa
aa aa aaa aaa aa
aa aaa aa aaa aa
aaa aa aa aaa aa
aa aaa aaa aa aa
aaa aa aaa aa aa
aaa aaa aa aa aa
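The count of 12 can be checked without listing everything: if a^12 uses i factors aa and j factors aaa, then 2i + 3j = 12, and each solution contributes one factorization per ordering of its i + j factors:

```latex
% a^{12} written with i factors aa and j factors aaa:
2i + 3j = 12,\quad i, j \ge 0 \;\Longrightarrow\; (i, j) \in \{(6,0),\ (3,2),\ (0,4)\}

% each solution: choose which j of the i + j positions hold aaa
\binom{6}{0} + \binom{5}{2} + \binom{4}{4} = 1 + 10 + 1 = 12
```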

Oracle SQL How do you fill out missing months and corresponding 0 [duplicate]

I have an MView that returns data grouped by idNumber and Month. I want to display 0 when there is no data for a specific month. This is my query:
select MonthName, myCost, myNumber
from
(
select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals,
(
select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12
) ALLMONTHS
where mm = MonthName
So I was expecting:
Month Number Data
-----------------------
1 abc123 4444
2 0
3 abc123 4444
4 abc123 4444
5 0
6 abc123 4444
7 abc123 4444
8 0
9 abc123 4444
10 abc123 4444
11 0
12 abc123 4444
Instead I'm still getting:
1 abc123 4444
3 abc123 4444
4 abc123 4444
6 abc123 4444
7 abc123 4444
9 abc123 4444
10 abc123 4444
12 abc123 4444
Any ideas?
Thanks!
EDIT: Thanks for the answers. I did have the outer join in my query but forgot to type it in because I was concentrating on changing the names of the tables/columns.
So yes, I have tried the OUTER JOIN and I am still not getting the expected results. Any feedback is greatly appreciated.
EDIT: This is the data in myOracle_MV:
3777.24 AAA 1 2012
49973.12 AAA 2 2012
4049.91 AAA 3 2012
469.485 AAA 4 2012
5872.22 AAA 5 2012
65837.71 AAA 6 2012
566.23 AAA 7 2012
18432.95 AAA 8 2012
4337.75 AAA 12 2011
18811 BBB 1 2012
29872.67 BBB 2 2012
29068.55 BBB 3 2012
264957.8 BBB 4 2012
67673 BBB 5 2012
855.02 BBB 6 2012
5226.1 BBB 7 2012
2663.24 BBB 8 2012
5490.58 BBB 12 2011
3845.47 CCC 1 2012
3050.54 CCC 2 2012
3784.44 CCC 3 2012
799.73 CCC 4 2012
124884.2 CCC 5 2012
5157.24 CCC 6 2012
19184.78 CCC 7 2012
2280.05 CCC 8 2012
107.07 DDD 3 2012
181.78 DDD 4 2012
110.09 DDD 5 2012
18016.19 DDD 6 2012
1772.95 DDD 7 2012
63.32 DDD 8 2012
Very similar to existing answers, but this:
select months.month, mv.mycost, mv.mynumber
from (
select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) months
left join myoracle_mv mv
on mv.month = months.month
order by months.month, mv.mycost, mv.mynumber;
gives this with the data you posted:
MONTH MYCOST MYNUMBER
----- ------ ----------
01 AAA 3777.24
01 BBB 18811
01 CCC 3845.47
02 AAA 49973.12
02 BBB 29872.67
02 CCC 3050.54
03 AAA 4049.91
03 BBB 29068.55
03 CCC 3784.44
03 DDD 107.07
04 AAA 469.485
04 BBB 264957.8
04 CCC 799.73
04 DDD 181.78
05 AAA 5872.22
05 BBB 67673
05 CCC 124884.2
05 DDD 110.09
06 AAA 65837.71
06 BBB 855.02
06 CCC 5157.24
06 DDD 18016.19
07 AAA 566.23
07 BBB 5226.1
07 CCC 19184.78
07 DDD 1772.95
08 AAA 18432.95
08 BBB 2663.24
08 CCC 2280.05
08 DDD 63.32
09
10
11
12 AAA 4337.75
12 BBB 5490.58
35 rows selected
If you want a zero to appear in the mynumber column then you can make that:
select months.month, mv.mycost, coalesce(mv.mynumber, 0) as mynumber
which gives:
...
08 DDD 63.32
09 0
10 0
11 0
12 AAA 4337.75
...
From the comments on Jafar's answer it sounds like maybe you'd got that far on your own but you want zero values for all mycost values for all months. If that is the case then you need to get the list of possible values for mycost and outer join to that as well. This is taking all values that are in the MV already:
select months.month, costs.mycost, coalesce(mv.mynumber, 0) as mynumber
from (
select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) months
cross join (
select distinct mycost
from myoracle_mv) costs
left join myoracle_mv mv
on mv.month = months.month
and mv.mycost = costs.mycost
order by months.month, costs.mycost, mv.mynumber;
and gives:
MONTH MYCOST MYNUMBER
----- ------ ----------
01 AAA 3777.24
01 BBB 18811
01 CCC 3845.47
01 DDD 0
02 AAA 49973.12
02 BBB 29872.67
02 CCC 3050.54
02 DDD 0
03 AAA 4049.91
03 BBB 29068.55
03 CCC 3784.44
03 DDD 107.07
04 AAA 469.485
04 BBB 264957.8
04 CCC 799.73
04 DDD 181.78
05 AAA 5872.22
05 BBB 67673
05 CCC 124884.2
05 DDD 110.09
06 AAA 65837.71
06 BBB 855.02
06 CCC 5157.24
06 DDD 18016.19
07 AAA 566.23
07 BBB 5226.1
07 CCC 19184.78
07 DDD 1772.95
08 AAA 18432.95
08 BBB 2663.24
08 CCC 2280.05
08 DDD 63.32
09 AAA 0
09 BBB 0
09 CCC 0
09 DDD 0
10 AAA 0
10 BBB 0
10 CCC 0
10 DDD 0
11 AAA 0
11 BBB 0
11 CCC 0
11 DDD 0
12 AAA 4337.75
12 BBB 5490.58
12 CCC 0
12 DDD 0
48 rows selected
But hopefully you have another table that holds the possible mycost values (assuming that represents something like a cost center rather than a price; it's slightly hard to tell what's what), and you can use that instead of the subquery.
Also note that if you wanted to add a filter, e.g. to restrict data to a particular year, you'd need to do that in the left join clause, not in a where clause, or you'd turn the outer join back into an inner one. For example, adding this:
where mv.year = 2011
would mean you only got back two rows:
MONTH MYCOST MYNUMBER
----- ------ ----------
12 AAA 4337.75
12 BBB 5490.58
But if you made that another condition on the outer join, you'd still get 48 rows back, with 46 of them having zeros and two having the values above:
...
left join myoracle_mv mv
on mv.month = months.month
and mv.mycost = costs.mycost
and mv.year = 2011
order by months.month, costs.mycost, mv.mynumber;
...
11 CCC 0
11 DDD 0
12 AAA 4337.75
12 BBB 5490.58
12 CCC 0
12 DDD 0
48 rows selected
You'd need to do an outer join between the two inline views:
select MonthName, myCost, myNumber
from (select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals
right outer join
(select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12) ALLMONTHS
on( myTotals.mm = allmonths.MonthName )
You can also use the old Oracle-specific (+) syntax for outer joins but I would generally suggest using the SQL standard syntax.
Maybe something like this:
select MonthName, COALESCE(myCost,0), myNumber
from
(
select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12
) ALLMONTHS LEFT OUTER JOIN
(
select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals ON
mm = MonthName
You need an outer join (note the (+) in the last line of the query):
select MonthName, myCost, myNumber from
(
select
MONTH mm, myCost, myNumber
from
myOracle_mv
) myTotals,
(
select
to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from
dual
connect by level <= 12
)ALLMONTHS
where mm(+) = MonthName
For your example, you don't need to calculate dates:
select NUM.lvl as mm, NVL(myCost, 0), myNumber
from
(select level as lvl from dual connect by level <= 12) NUM
left outer join myOracle_mv MV ON ( MV.MONTH = NUM.lvl )
;
