Hey guys I was given the following:
Consider the language S* where S = {aa, aaa}. Describe all the ways that a^12 can be written as the concatenation of factors in S.
I got 0 on this question even though it seemed pretty straightforward.
I interpreted it as asking in how many different ways I can reach 12 'a's.
Which would be 12 as you said:
one for only aa:
aa aa aa aa aa aa
one for only aaa
aaa aaa aaa aaa
and (5*4)/2 = 10 for two aaas and three aas (choosing which 2 of the 5 positions hold an aaa)
aa aa aa aaa aaa
aa aa aaa aa aaa
aa aaa aa aa aaa
aaa aa aa aa aaa
aa aa aaa aaa aa
aa aaa aa aaa aa
aaa aa aa aaa aa
aa aaa aaa aa aa
aaa aa aaa aa aa
aaa aaa aa aa aa
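The count can be double-checked by brute force. The sketch below is in Python, which is my choice and not part of the original question, and `factorizations` is a hypothetical helper name. It enumerates every ordered sequence of factors from S whose concatenation is a^12.

```python
def factorizations(n, factors=("aa", "aaa")):
    """Return every ordered way to write 'a' * n as a concatenation of factors."""
    if n == 0:
        return [[]]  # the empty word has exactly one factorization
    result = []
    for f in factors:
        if len(f) <= n:
            # Put f first, then factor the remaining n - len(f) letters.
            for rest in factorizations(n - len(f), factors):
                result.append([f] + rest)
    return result


ways = factorizations(12)
for w in ways:
    print(" ".join(w))
print(len(ways))  # 12
```

The total agrees with the hand count: 1 + 1 + 10 = 12.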
I am trying to join two files that have identical column 1 and different column 2:
File1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
File2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
When I try to join them with
join File1 File2 > File3
I get
aaa 1 2
bbb 3 2
bbb 3 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 1 0
ccc 1 1
ccc 1 1
ccc 1 0
ccc 0 1
ccc 0 1
ccc 0 0
join expands the duplicate keys into every pairing, when all I want it to do is go line by line, so the output should be
aaa 1 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 0 0
How do I tell join to ignore duplicates and just combine the files line-by-line?
EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.
Based on a suggestion from @Andre Wildberg, this worked best:
paste File1 <(cut -d " " -f 2 File2)
This allowed me to loop through a list of files:
cp File1 tmp
for file in $files
do
# append column 2 of the next file to the running result
paste tmp <(cut -d " " -f 2 "$file") > tmpf
mv tmpf tmp
done
mv tmp FinalFile
Assumptions:
all files have the same number of rows
all files have the same values in the first column for the same numbered row
the final result set can fit into memory
Sample input:
$ for f in f{1..4}
do
echo "############ $f"
cat $f
done
############ f1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
############ f2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
############ f3
aaa 12
bbb 12
bbb 12
ccc 11
ccc 11
ccc 10
############ f4
aaa 202
bbb 202
bbb 202
ccc 201
ccc 201
ccc 200
One awk idea:
awk '
FNR==NR { a[FNR]=$0; next }
{ a[FNR]=a[FNR] OFS $2 }
END { for (i=1;i<=FNR;i++)
print a[i]
}
' f1 f2 f3 f4
This generates:
aaa 1 2 12 202
bbb 3 2 12 202
bbb 3 2 12 202
ccc 1 1 11 201
ccc 1 1 11 201
ccc 0 0 10 200
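For comparison, here is a minimal Python sketch of the same row-by-row merge. It relies on the same assumptions listed above (equal row counts, matching first columns). `merge_second_columns` and `parse` are hypothetical helper names, and the inputs are inline strings rather than real files so that the sketch is self-contained.

```python
def parse(text):
    """Split a block of text into rows of whitespace-separated fields."""
    return [line.split() for line in text.strip().splitlines()]


def merge_second_columns(tables):
    """Append column 2 of each later table to the rows of the first, row by row."""
    merged = [list(row) for row in tables[0]]
    for table in tables[1:]:
        if len(table) != len(merged):
            raise ValueError("all inputs must have the same number of rows")
        for out_row, row in zip(merged, table):
            out_row.append(row[1])
    return merged


f1 = parse("aaa 1\nbbb 3\nbbb 3\nccc 1\nccc 1\nccc 0")
f2 = parse("aaa 2\nbbb 2\nbbb 2\nccc 1\nccc 1\nccc 0")
f3 = parse("aaa 12\nbbb 12\nbbb 12\nccc 11\nccc 11\nccc 10")

merged = merge_second_columns([f1, f2, f3])
for row in merged:
    print(" ".join(row))
```

Unlike join, nothing here is matched by key, so duplicate keys are never expanded.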
I would like to count the number of times each distinct substring appears in a set of strings in the 2nd column of a tab-separated file. To do this I split the column to separate the substrings and then try to count them. However, it does not work correctly.
The input is like
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA AA
The desired output
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AA=9;AC=2
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC AA=10;CC=1
and so on....
awk 'BEGIN {FS=OFS="\t"} {gf=split($2,gfp," ")} {for (i=1;i<=gf;i++){
if (gfp[i]=="AA"){i++; printf $1FS$2FS"%s\n" i, gfp[i]}
else if (gfp[i]=="AC" || gfp[i] == "CA"){i++; printf $1FS$2FS"%s"gfp[i]"="i";\n"}
}}' input > output
I also tried another script, but I think it repeats each count as many times as the substring occurs in the row. Here I performed a second split inside the first one to distinguish between the substrings:
awk 'BEGIN {FS=OFS="\t"} {gf=split($2,gfp," ");} {for (i=1;i<=gf;i++){
par=gfp[i];
gfeach=split($2,gfpeach,par);
print par "=" gfeach[i]";"
}
}' input > output
I'm sure there are easier ways to do it, but I cannot solve it completely. Is it possible to do this in a UNIX environment? Thanks in advance.
Your input doesn't match your output so we're all just guessing but this might be what you want:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
delete cnt
split($2,tmp,/ /)
for (i in tmp) {
str = tmp[i]
cnt[str]++
}
printf "%s", $0
sep = OFS
for (str in cnt) {
printf "%s%s=%d", sep, str, cnt[str]
sep = ";"
}
print ""
}
Depending on what your input really is the above will output the following:
$ cat file
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA AA
$ awk -f tst.awk file
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AA=9;AC=2
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA AA AA=11
$ cat file
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC
$ awk -f tst.awk file
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AA=9;AC=2
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC AA=10;CC=1
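The per-line counting idea can also be sketched in Python. This is only an illustration under an assumption that the question does not confirm: the first three whitespace-separated fields are the id, alleles and chromosome, and everything after them is a genotype call. `annotate` is a hypothetical name.

```python
from collections import Counter


def annotate(line):
    """Append 'GENOTYPE=count' pairs for the genotype calls found on a line."""
    fields = line.split()
    # Assumption: fields 1-3 are id, alleles, chromosome; the rest are calls.
    counts = Counter(fields[3:])
    # Counter keeps first-seen order, so keys appear in order of first occurrence.
    summary = ";".join(f"{key}={n}" for key, n in counts.items())
    return f"{line}\t{summary}"


row1 = annotate("rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA")
row2 = annotate("rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC")
print(row1)
print(row2)
```

A fresh Counter is built for every line, which plays the same role as `delete cnt` in the awk script.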
Something like this?
$ awk '{for(i=4;i<=NF;i++) c[$i]++;
for(k in c) {s=s sep k"="c[k]; sep=";"; c[k]=0}
$NF=$NF OFS s; s=sep=""}1' file | column -t
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AA=9;AC=2
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA AA AA=11;AC=0
Note that the set of reported keys grows as the file is read, since only the keys observed up to a given row are printed. For example, if CC first appears in the second row, its count won't be listed on the first line.
Could do it in perl
perl -lpe '$a{$_}++ for /\b[A-Z]{2}\b/g;
$_.=" ".join(";",map{"$_=$a{$_}"}keys%a);
%a = map{$_=>0}keys%a' file
produces
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AA=9;AC=2
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC AA=10;CC=1;AC=0
For new requirement
perl -lpe '$a{$_}++ for /\b[A-Z]{2}\b/g;
$_.=" ".join(";",map{"$_=$a{$_}"}keys%a);
undef %a' file
produces
rs12255619 A/C chr10 AA AA AC AA AA AA AA AA AA AC AA AC=2;AA=9
rs7909677 A/G chr10 AA AA AA AA AA AA AA AA AA AA CC CC=1;AA=10
#!/bin/bash
# Substrings to count on each line
strings="AA AC CC"
while IFS= read -r line; do
    echo -n "$line: "
    for name in $strings; do
        # Put one word per line, then count whole-word matches
        num=$(echo "$line" | xargs -n1 | grep -cw "$name")
        if [[ $num -ne 0 ]]; then
            echo -n "$name=$num;"
        fi
    done
    echo
done < inputFile.txt
I am trying to use awk to edit files but I can't manage to do it without creating intermediate files.
Basically I want to look up column 1 of File1 in file2, file3 and so on, and replace the 2nd column with the value from the lines whose 1st column matches. (Note that file2 and file3 may contain other lines.)
I have
File1.txt
aaa 111
aaa 222
bbb 333
bbb 444
File2.txt
zzz zzz
aaa 999
zzz zzz
aaa 888
File3.txt
bbb 000
bbb 001
yyy yyy
yyy yyy
Desired output
aaa 999
aaa 888
bbb 000
bbb 001
This does what you specified, but I guess there are many edge cases not covered.
$ awk 'NR==FNR{a[$1]; next} $1 in a' file{1..3}
aaa 999
aaa 888
bbb 000
bbb 001
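The logic of that one-liner is: collect the column-1 keys from the first file, then print every line of the later files whose first field is one of those keys. A rough Python equivalent follows, with the sample data inlined as lists. `matching_lines` is a hypothetical name, and the same edge cases remain uncovered.

```python
def matching_lines(key_lines, *other_files):
    """Return lines from other_files whose first field is a key in key_lines."""
    keys = {line.split()[0] for line in key_lines if line.strip()}
    out = []
    for lines in other_files:
        for line in lines:
            fields = line.split()
            if fields and fields[0] in keys:
                out.append(line)
    return out


file1 = ["aaa 111", "aaa 222", "bbb 333", "bbb 444"]
file2 = ["zzz zzz", "aaa 999", "zzz zzz", "aaa 888"]
file3 = ["bbb 000", "bbb 001", "yyy yyy", "yyy yyy"]

result = matching_lines(file1, file2, file3)
print("\n".join(result))
```

Note that this filters by key only; it does not pair the n-th aaa line of File1 with the n-th aaa line of File2.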
I have a materialized view (Mview) that returns data grouped by idNumber and Month. I want to display 0 if there is no data for a specific month. This is my query:
select MonthName, myCost, myNumber
from
(
select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals,
(
select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12
) ALLMONTHS
where mm = MonthName
So I was expecting:
Month Number Data
-----------------------
1 abc123 4444
2 0
3 abc123 4444
4 abc123 4444
5 0
6 abc123 4444
7 abc123 4444
8 0
9 abc123 4444
10 abc123 4444
11 0
12 abc123 4444
Instead I'm still getting:
1 abc123 4444
3 abc123 4444
4 abc123 4444
6 abc123 4444
7 abc123 4444
9 abc123 4444
10 abc123 4444
12 abc123 4444
Any ideas?
Thanks!
EDIT: Thanks for the answers. I did have the outer join in my query but forgot to type it in because I was concentrating on changing the names of the tables/columns.
So yes, I have tried with the OUTER JOIN and I'm still not getting the expected results. Any feedback is greatly appreciated.
EDIT: This is the data on myOracle_MV:
3777.24 AAA 1 2012
49973.12 AAA 2 2012
4049.91 AAA 3 2012
469.485 AAA 4 2012
5872.22 AAA 5 2012
65837.71 AAA 6 2012
566.23 AAA 7 2012
18432.95 AAA 8 2012
4337.75 AAA 12 2011
18811 BBB 1 2012
29872.67 BBB 2 2012
29068.55 BBB 3 2012
264957.8 BBB 4 2012
67673 BBB 5 2012
855.02 BBB 6 2012
5226.1 BBB 7 2012
2663.24 BBB 8 2012
5490.58 BBB 12 2011
3845.47 CCC 1 2012
3050.54 CCC 2 2012
3784.44 CCC 3 2012
799.73 CCC 4 2012
124884.2 CCC 5 2012
5157.24 CCC 6 2012
19184.78 CCC 7 2012
2280.05 CCC 8 2012
107.07 DDD 3 2012
181.78 DDD 4 2012
110.09 DDD 5 2012
18016.19 DDD 6 2012
1772.95 DDD 7 2012
63.32 DDD 8 2012
Very similar to existing answers, but this:
select months.month, mv.mycost, mv.mynumber
from (
select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) months
left join myoracle_mv mv
on mv.month = months.month
order by months.month, mv.mycost, mv.mynumber;
gives this with the data you posted:
MONTH MYCOST MYNUMBER
----- ------ ----------
01 AAA 3777.24
01 BBB 18811
01 CCC 3845.47
02 AAA 49973.12
02 BBB 29872.67
02 CCC 3050.54
03 AAA 4049.91
03 BBB 29068.55
03 CCC 3784.44
03 DDD 107.07
04 AAA 469.485
04 BBB 264957.8
04 CCC 799.73
04 DDD 181.78
05 AAA 5872.22
05 BBB 67673
05 CCC 124884.2
05 DDD 110.09
06 AAA 65837.71
06 BBB 855.02
06 CCC 5157.24
06 DDD 18016.19
07 AAA 566.23
07 BBB 5226.1
07 CCC 19184.78
07 DDD 1772.95
08 AAA 18432.95
08 BBB 2663.24
08 CCC 2280.05
08 DDD 63.32
09
10
11
12 AAA 4337.75
12 BBB 5490.58
35 rows selected
If you want a zero to appear in the mynumber column then you can make that:
select months.month, mv.mycost, coalesce(mv.mynumber, 0) as mynumber
which gives:
...
08 DDD 63.32
09 0
10 0
11 0
12 AAA 4337.75
...
From the comments on Jafar's answer it sounds like maybe you'd got that far on your own but you want zero values for all mycost values for all months. If that is the case then you need to get the list of possible values for mycost and outer join to that as well. This is taking all values that are in the MV already:
select months.month, costs.mycost, coalesce(mv.mynumber, 0) as mynumber
from (
select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) months
cross join (
select distinct mycost
from myoracle_mv) costs
left join myoracle_mv mv
on mv.month = months.month
and mv.mycost = costs.mycost
order by months.month, costs.mycost, mv.mynumber;
and gives:
MONTH MYCOST MYNUMBER
----- ------ ----------
01 AAA 3777.24
01 BBB 18811
01 CCC 3845.47
01 DDD 0
02 AAA 49973.12
02 BBB 29872.67
02 CCC 3050.54
02 DDD 0
03 AAA 4049.91
03 BBB 29068.55
03 CCC 3784.44
03 DDD 107.07
04 AAA 469.485
04 BBB 264957.8
04 CCC 799.73
04 DDD 181.78
05 AAA 5872.22
05 BBB 67673
05 CCC 124884.2
05 DDD 110.09
06 AAA 65837.71
06 BBB 855.02
06 CCC 5157.24
06 DDD 18016.19
07 AAA 566.23
07 BBB 5226.1
07 CCC 19184.78
07 DDD 1772.95
08 AAA 18432.95
08 BBB 2663.24
08 CCC 2280.05
08 DDD 63.32
09 AAA 0
09 BBB 0
09 CCC 0
09 DDD 0
10 AAA 0
10 BBB 0
10 CCC 0
10 DDD 0
11 AAA 0
11 BBB 0
11 CCC 0
11 DDD 0
12 AAA 4337.75
12 BBB 5490.58
12 CCC 0
12 DDD 0
48 rows selected
But hopefully you have another table that holds the possible mycost values (assuming that's representing something like a cost center, rather than a price; slightly hard to tell what's what) and you can use that instead of the subquery.
Also note that if you wanted to add a filter, e.g. to restrict data to a particular year, you'd need to do that in the left join clause, not as a where clause, or you'd turn the outer join back into an inner one. For example, adding this:
where mv.year = 2011
would mean you only got back two rows:
MONTH MYCOST MYNUMBER
----- ------ ----------
12 AAA 4337.75
12 BBB 5490.58
But if you made that another condition on the outer join you'd still get 48 rows back, with 46 of them having zeros and two having the values above:
...
left join myoracle_mv mv
on mv.month = months.month
and mv.mycost = costs.mycost
and mv.year = 2011
order by months.month, costs.mycost, mv.mynumber;
...
11 CCC 0
11 DDD 0
12 AAA 4337.75
12 BBB 5490.58
12 CCC 0
12 DDD 0
48 rows selected
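The difference between filtering in the where clause and filtering in the join condition can be reproduced on a small scale. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than Oracle, a plain `months` table instead of the `connect by` generator, and just four of the posted rows (two cost values, so the full grid is 12 x 2 = 24 rows, not 48). The table layout is my assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table months (month integer);
    create table mv (mynumber real, mycost text, month integer, year integer);
""")
con.executemany("insert into months values (?)", [(m,) for m in range(1, 13)])
con.executemany(
    "insert into mv values (?, ?, ?, ?)",
    [(3777.24, "AAA", 1, 2012), (4337.75, "AAA", 12, 2011),
     (18811, "BBB", 1, 2012), (5490.58, "BBB", 12, 2011)],
)

# Every month paired with every cost value, outer-joined to the data.
base = """
    select months.month, costs.mycost, coalesce(mv.mynumber, 0)
    from months
    cross join (select distinct mycost from mv) costs
    left join mv
      on mv.month = months.month
     and mv.mycost = costs.mycost
"""
# Filter in WHERE: rows with no match have a null year and are discarded.
in_where = con.execute(base + " where mv.year = 2011").fetchall()
# Filter in ON: unmatched grid rows survive with a zero.
in_on = con.execute(base + " and mv.year = 2011").fetchall()

print(len(in_where), len(in_on))  # 2 24
```

With the filter in the where clause only the two 2011 rows survive; with it in the join condition the whole month-by-cost grid is kept and the missing cells are zero.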
You'd need to do an outer join between the two inline views
select MonthName, myCost, myNumber
from (select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals
right outer join
(select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12) ALLMONTHS
on( myTotals.mm = allmonths.MonthName )
You can also use the old Oracle-specific (+) syntax for outer joins but I would generally suggest using the SQL standard syntax.
Maybe something like this
select MonthName, COALESCE(myCost,0), myNumber
from
(
select to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from dual
connect by level <= 12
) ALLMONTHS LEFT OUTER JOIN
(
select MONTH mm, myCost, myNumber
from myOracle_mv
) myTotals ON
mm = MonthName
You need an outer join. With the old Oracle syntax, put (+) in the where clause on the column of the side that may have no matching rows, which here is the materialized view:
select MonthName, myCost, myNumber from
(
select
MONTH mm, myCost, myNumber
from
myOracle_mv
) myTotals,
(
select
to_char(date '2012-12-1' + numtoyminterval(level,'month'), 'mm') MonthName
from
dual
connect by level <= 12
)ALLMONTHS
where mm(+) = MonthName
For your example, you don't need to calculate dates:
select NUM.mm, NVL(myCost, 0), myNumber
from
(select level mm from dual connect by level <= 12) NUM
left outer join myOracle_mv MV ON ( MV.MONTH = NUM.mm )
;
I have a file like this
1 CC AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 CC BBB
1 Na BBB
1 Na BBB
1 xa BBB
1 CC CCC
1 Na CCC
1 da CCC
I would like to remove column 2 and replace it with "01" for AAA, "02" for BBB and so on for the entire file. The final output should look like this:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC
I don't have any clue how to get this working. Please help me if possible. Every CC in the 2nd column marks the start of a new group; that is, the change from AAA to BBB can be tracked only by the CC in the 2nd column.
One way of doing it in awk:
awk '$3!=a&&NF{a=$3;x=sprintf("%02d",++x);print $1,x,$3;next}$3==a&&NF{print $1,x,$3;next }1' inputFile
Here's one way using awk:
awk '$3 != r { ++i } { $2 = sprintf ("%02d", i) } { r = $3 }1' OFS="\t" file
I've set the OFS to a tab-char, but you can choose what you like. Results:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC
Seems like you want:
awk '$2=="CC" { a+=1 } {$2=sprintf("%02d",a)} 1' input
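The same renumbering can be sketched in Python. This version increments the counter whenever column 3 changes, as the second awk answer does, rather than on CC in column 2; that is one reading of the question, and `renumber` is a hypothetical name.

```python
def renumber(lines):
    """Replace column 2 with a zero-padded group number that follows column 3."""
    out = []
    group = 0
    previous = None
    for line in lines:
        first, _, third = line.split()
        if third != previous:
            # A new value in column 3 starts the next group.
            group += 1
            previous = third
        out.append(f"{first} {group:02d} {third}")
    return out


sample = ["1 CC AAA", "1 Na AAA", "1 CC BBB", "1 xa BBB", "1 CC CCC", "1 da CCC"]
renumbered = renumber(sample)
print("\n".join(renumbered))
```

If the group boundary really is the CC marker, test the second field against "CC" instead of comparing the third field with the previous one.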