Is it possible to GROUP BY data instead of a column? - oracle

I was wondering whether it is possible to use GROUP BY based on the data of a certain column in a specific way, instead of on the column itself. So my question is: can I create groups based on the 0 occurrences of a certain field?
DIA MES YEAR TODAY TOMORROW ANALYSIS LIMIT
---------- ---------- ---------- ---------- ---------- ---------- ----------
19 9 2016 111 988 0 150
20 9 2016 988 853 853 150
21 9 2016 853 895 895 150
22 9 2016 895 776 776 150
23 9 2016 776 954 0 150
26 9 2016 954 968 968 150
27 9 2016 968 810 810 150
28 9 2016 810 937 937 150
29 9 2016 937 769 769 150
30 9 2016 769 1020 0 150
3 10 2016 1020 923 923 150
4 10 2016 923 32 32 150
In this case, I would want to create groups like this:
Group 1 (Analysis): 0
Group 2 (Analysis): 853, 895, 776, 0
Group 3 (Analysis): 968, 810, 937, 769, 0
...

Assuming your table name is tbl, something like this should work (it's called the "start-of-group" method if you want to Google it):
select cnt,
       -- e.g. collect each group's values; replace with whatever aggregate you need
       listagg(analysis, ', ')
         within group (order by year, mes, dia) as analysis_group
from ( select tbl.*,
              -- running count of 0 rows; ending the frame at "1 preceding" makes
              -- each 0 close its own group rather than open the next one
              count(case when analysis = 0 then 1 end)
                over (order by year, mes, dia
                      rows between unbounded preceding and 1 preceding) as cnt
       from tbl
     )
group by cnt
order by cnt;
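The same start-of-group idea can be sketched outside the database with awk (a hedged illustration, not part of the Oracle answer): a counter is bumped only after a 0 is appended, so each 0 closes the current group, exactly as in the desired output.

```shell
# start-of-group sketch: each 0 closes the current group, because the
# counter is incremented only *after* a 0 is appended
printf '%s\n' 0 853 895 776 0 968 810 937 769 0 1020 923 32 |
awk 'BEGIN { cnt = 0 }
     { groups[cnt] = groups[cnt] (groups[cnt] == "" ? "" : ", ") $1
       if ($1 == 0) cnt++ }
     END { for (i = 0; i in groups; i++) print "Group " i+1 ": " groups[i] }'
```

This prints the groups from the question (Group 1: 0, Group 2: 853, 895, 776, 0, ...) plus a trailing partial group for the rows after the last 0.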


How can I sort csv data alphabetically then numerically by column?

If I have a set of data that has repeating name values, but with different variations per repeating value, how can I sort so that only the top entries of each repeating name are shown? Hopefully that makes sense; I demonstrate what I mean below.
Take for example this set of data in a tab separated csv file
Ranking ID Year Make Model Total
1 128 2010 Infiniti G37 128
2 124 2015 Jeep Wrangler 124
3 15 014 Audi S4 120
4 113 2012 Acura Tsx sportwagon 116
5 83 2014 Honda Accord 112
6 112 2008 Acura TL 110
7 65 2009 Honda Fit 106
8 91 2010 Mitsu Lancer 102
9 50 2015 Acura TLX 102
10 31 2007 Honda Fit 102
11 216 2007 Chrystler 300 96
12 126 2010 Volkswagen Eos 92
13 13 2016 Honda Civic 1.5t 92
If you look in the Make column, you can see names like Acura and Honda repeat, with differences in the Model and Total column. Assume that there's 200 or so rows of this in the csv file. How can I sort the file so that the items are grouped by Make with only three of the highest in value under the Total column being displayed by each Make?
Expected output below
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 106
4 83 2014 Honda Accord 112
5 31 2007 Honda Fit 102
6 13 2016 Honda Civic 1.5t 92
...
Here is my awk code so far; I can't get past this part to even attempt grouping the makes by the Total column:
BEGIN {
    FS = OFS = "\t";
}
FNR == 1 {
    print;
    next;
}
FNR > 1 {
    a[NR] = $4;
}
END {
    PROCINFO["sorted_in"] = "#val_str_desc"
    for(i = 1; i < FN-1; i++) {
        print a[i];
    }
}
Currently, my code reads the text file, prints the headers (column titles), and then stops there; it doesn't go on to print the rest of the data in alphabetical order. Any ideas?
The following assumes bash (if you don't use bash, replace $'\t' with a quoted real tab character) and GNU coreutils. It also assumes that you want to sort alphabetically by the Make column first, then numerically in decreasing order by Total, and finally keep at most the first 3 entries for each Make.
Sorting is a job for sort, head and tail can be used to isolate the header line, and awk can be used to keep maximum 3 of each Make, and re-number the first column:
$ head -n1 data.tsv; tail -n+2 data.tsv | sort -t$'\t' -k4,4 -k6,6rn |
awk -F'\t' -vOFS='\t' '$4==p {n+=1} $4!=p {n=1;p=$4} {$1=++r} n<=3'
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
10 128 2010 Infiniti G37 128
11 124 2015 Jeep Wrangler 124
12 91 2010 Mitsu Lancer 102
13 126 2010 Volkswagen Eos 92
Note that this differs from your expected output: Make is sorted in alphabetical order (Audi comes right after Acura, not Honda) and only the 3 largest Totals are kept (112, 106, 102 for Honda, not 112, 102, 92).
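The key options in the sort invocation do the heavy lifting: -k4,4 restricts the first key to the Make field alone, and -k6,6rn compares the Total field numerically in reverse. A tiny demonstration of that two-key behaviour on made-up tab-separated data:

```shell
# two sort keys: field 1 ascending as text, field 2 descending numerically
printf 'b\t2\na\t1\na\t9\nb\t5\n' |
sort -t"$(printf '\t')" -k1,1 -k2,2rn
```

Rows group by the first field while the second field descends within each group.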
If you use GNU awk and your input file is small enough to fit in memory, you can also do all of this with just awk, thanks to its multidimensional arrays and its asorti function, which sorts arrays by index:
$ awk -F'\t' -vOFS='\t' 'NR==1 {print; next} {l[$4][$6][$0]}
  END {
      PROCINFO["sorted_in"] = "#ind_str_asc"
      for (m in l) {
          n = asorti(l[m], t, "#ind_num_desc"); n = (n>3) ? 3 : n
          for (i=1; i<=n; i++) for (s in l[m][t[i]]) {$0 = s; $1 = ++r; print}
      }
  }' data.tsv
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92
Using GNU awk for arrays of arrays and sorted_in:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR == 1 {
    print
    next
}
{
    rows[$4][$6][++numRows[$4,$6]] = $0
}
END {
    PROCINFO["sorted_in"] = "#ind_str_asc"
    for ( make in rows ) {
        PROCINFO["sorted_in"] = "#ind_num_desc"
        cnt = 0
        for ( total in rows[make] ) {
            for ( rowNr=1; rowNr<=numRows[make,total]; rowNr++ ) {
                if ( ++cnt <= 3 ) {
                    row = rows[make][total][rowNr]
                    print row, cnt
                }
            }
        }
    }
}
$ awk -f tst.awk file
Ranking ID Year Make Model Total
4 113 2012 Acura Tsx sportwagon 116 1
6 112 2008 Acura TL 110 2
9 50 2015 Acura TLX 102 3
3 15 014 Audi S4 120 1
11 216 2007 Chrystler 300 96 1
5 83 2014 Honda Accord 112 1
7 65 2009 Honda Fit 106 2
10 31 2007 Honda Fit 102 3
1 128 2010 Infiniti G37 128 1
2 124 2015 Jeep Wrangler 124 1
8 91 2010 Mitsu Lancer 102 1
12 126 2010 Volkswagen Eos 92 1
The above handles cases where multiple cars of one make have the same total by always printing just the top 3 rows for that make, e.g. given this input where 4 Acuras all have a 116 total:
$ cat file
Ranking ID Year Make Model Total
1 128 2010 Infiniti G37 128
2 124 2015 Jeep Wrangler 124
3 15 014 Audi S4 120
4 113 2012 Acura Tsx sportwagon 116
4 113 2012 Acura Foo 116
4 113 2012 Acura Bar 116
4 113 2012 Acura Other 116
5 83 2014 Honda Accord 112
6 112 2008 Acura TL 110
7 65 2009 Honda Fit 106
8 91 2010 Mitsu Lancer 102
9 50 2015 Acura TLX 102
10 31 2007 Honda Fit 102
11 216 2007 Chrystler 300 96
12 126 2010 Volkswagen Eos 92
13 13 2016 Honda Civic 1.5t 92
this is the output, showing just 3 of those four 116-total Acuras:
$ awk -f tst.awk file
Ranking ID Year Make Model Total
4 113 2012 Acura Tsx sportwagon 116 1
4 113 2012 Acura Foo 116 2
4 113 2012 Acura Bar 116 3
3 15 014 Audi S4 120 1
11 216 2007 Chrystler 300 96 1
5 83 2014 Honda Accord 112 1
7 65 2009 Honda Fit 106 2
10 31 2007 Honda Fit 102 3
1 128 2010 Infiniti G37 128 1
2 124 2015 Jeep Wrangler 124 1
8 91 2010 Mitsu Lancer 102 1
12 126 2010 Volkswagen Eos 92 1
If that's not what you want, then move the if ( ++cnt <= 3 ) test to the outer loop, or handle it however else you want.

Splitting a file into smaller files of max n characters without cutting any line

Here is a sample input text file generated with the cal command:
$ cal 2743 > sample_text
In this example this file has 2180 characters:
$ wc sample_text
36 462 2180 sample_text
I want to split it into smaller files, each one having no more than 700 characters, while preserving lines in a complete state (no line may be cut).
I can view each such block with the following awk code:
$ awk '{l=length+l;if(l<=700){print l,$0}else{l=length;print "\nnext block\n",l,$0}}' sample_text
32 2743
98 January February March
164 Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
230 1 2 1 2 3 4 5 6 1 2 3 4 5 6
296 3 4 5 6 7 8 9 7 8 9 10 11 12 13 7 8 9 10 11 12 13
362 10 11 12 13 14 15 16 14 15 16 17 18 19 20 14 15 16 17 18 19 20
428 17 18 19 20 21 22 23 21 22 23 24 25 26 27 21 22 23 24 25 26 27
494 24 25 26 27 28 29 30 28 28 29 30 31
560 31
560
626 April May June
692 Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
next block
66 1 2 3 1 1 2 3 4 5
132 4 5 6 7 8 9 10 2 3 4 5 6 7 8 6 7 8 9 10 11 12
198 11 12 13 14 15 16 17 9 10 11 12 13 14 15 13 14 15 16 17 18 19
264 18 19 20 21 22 23 24 16 17 18 19 20 21 22 20 21 22 23 24 25 26
330 25 26 27 28 29 30 23 24 25 26 27 28 29 27 28 29 30
396 30 31
396
462 July August September
528 Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
594 1 2 3 1 2 3 4 5 6 7 1 2 3 4
660 4 5 6 7 8 9 10 8 9 10 11 12 13 14 5 6 7 8 9 10 11
next block
66 11 12 13 14 15 16 17 15 16 17 18 19 20 21 12 13 14 15 16 17 18
132 18 19 20 21 22 23 24 22 23 24 25 26 27 28 19 20 21 22 23 24 25
198 25 26 27 28 29 30 31 29 30 31 26 27 28 29 30
264
264
330 October November December
396 Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
462 1 2 1 2 3 4 5 6 1 2 3 4
528 3 4 5 6 7 8 9 7 8 9 10 11 12 13 5 6 7 8 9 10 11
594 10 11 12 13 14 15 16 14 15 16 17 18 19 20 12 13 14 15 16 17 18
660 17 18 19 20 21 22 23 21 22 23 24 25 26 27 19 20 21 22 23 24 25
next block
66 24 25 26 27 28 29 30 28 29 30 26 27 28 29 30 31
132 31
My problem is saving each max-700-character block into a separate file. With the following command it only produces one file.0, while the expected result for this input example is the split files file.0, file.1, file.2 and file.3:
$ awk 'c=0;{l=length+l;if(l<=700){print>"file."c}else{c=c++;l=length;print>"file."c}}' sample_text
$ cksum *
3868619974 2180 file.0
3868619974 2180 sample_text
This should do it:
BEGIN {
    maxChars = 700
    out = "file.0"
}
{
    numChars = length($0)
    totChars += numChars
    if ( totChars > maxChars ) {
        close(out)
        out = "file." ++cnt
        totChars = numChars
    }
    print > out
}

Replace exact numbers in a column keeping order

I have this file, and I would like to replace the numbers in the 3rd column so that they appear in consecutive order. Also, I need to skip the first row (the header of the file).
Initial file:
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 3 CTTG 61 145M
2005 17 3 TTCG 30 145M
91823 17 4 ATGAAGC 22 146M
91823 17 4 GTAGGCC 19 146M
16523 17 5 GGGGGTCGGT 45 30M1D115M
Modified file:
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
Do you know how I could do it?
Could you please try the following:
awk 'prev!=$1{++count}{$3=count;prev=$1;$1=$1} 1' OFS="\t" Input_file
To leave the header line untouched, use the following:
awk 'FNR==1{print;next}prev!=$1{++count}{$3=count;prev=$1;$1=$1} 1' OFS="\t" Input_file
Second solution: in case your Input_file's 1st field is NOT in order, the following may help you here.
awk 'FNR==NR{if(!a[$1]++){b[$1]=++count};next} {$3=b[$1];$1=$1} 1' OFS="\t" Input_file Input_file
To leave the header line untouched with the 2nd solution, use the following:
awk 'FNR==1{if(++val==1){print};next}FNR==NR{if(!a[$1]++){b[$1]=++count};next} {$3=b[$1];$1=$1} 1' OFS="\t" Input_file Input_file
Another minimalist awk:
$ awk '{$3=c+=p!=$1;p=$1}1' file | column -t
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
And a version that preserves the header:
$ awk 'NR==1; NR>1{$3=c+=p!=$1;p=$1; print | "column -t"}' file
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
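The trick in $3=c+=p!=$1 is that the comparison p!=$1 yields 1 on the first row (p is still empty) and whenever field 1 changes, and 0 otherwise, so c becomes a running group counter. A minimal demonstration on toy input:

```shell
# c increments only when the first field differs from the previous row's
printf '%s\n' a a b b b c |
awk '{ print $1, c += (p != $1); p = $1 }'
```

Each distinct run of the first field gets the next counter value: a 1, b 2, c 3.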

cumsum with more than 1 variable using ddply

I'm trying to get a cumulative sum for more than one variable using ddply, but it's not working.
I'm using this code:
ddply(.data=Summaryday, .variables=('DaysToClose_'), .fun=transform,
      cumsumPosit=cumsum(PositCount),
      cumsumNegat=cumsum(NegatCount))
but the result isn't correct:
DaysToClose_ PositCount NegatCount cumsumPosit cumsumNegat
1 1 7340 27256 7340 27256
2 2 2243 7597 2243 7597
3 3 1526 4545 1526 4545
4 4 1315 3756 1315 3756
5 5 1142 3320 1142 3320
6 6 1216 3118 1216 3118
7 7 1252 3324 1252 3324
8 8 1180 3077 1180 3077
9 9 975 2053 975 2053
10 10 684 1429 684 1429
11 11 613 1244 613 1244
12 12 596 1199 596 1199
13 13 542 1218 542 1218
14 14 711 1434 711 1434
15 15 645 1333 645 1333
16 16 577 899 577 899
17 17 373 667 373 667
18 18 369 656 369 656
19 19 340 624 340 624
If someone can help me on this, I appreciate that.
I am not sure why you would use ddply here. You can't really subset by DaysToClose_, because each row is then its own subset, so you always get the cumsum of a single value. You probably want mutate instead:
library(tidyverse)
data %>% mutate(cumsumPosit = cumsum(PositCount),
                cumsumNegat = cumsum(NegatCount))

Creating frequency interval in Crystal Report

I am trying to create a dataset of frequency intervals in Crystal Reports, something like the one below. The first column is the row id, the second is the start of the interval, the third is the end of the interval, and the fourth is the interval name.
1 0 29 0 - 29
2 30 59 30 - 59
3 60 89 60 - 89
4 90 119 90 - 119
5 120 149 120 - 149
6 150 179 150 - 179
7 180 209 180 - 209
8 210 239 210 - 239
9 240 269 240 - 269
10 270 299 270 - 299
11 300 329 300 - 329
12 330 359 330 - 359
13 360 389 360 - 389
14 390 419 390 - 419
15 420 449 420 - 449
16 450 479 450 - 479
17 480 509 480 - 509
18 510 539 510 - 539
19 540 569 540 - 569
20 570 599 570 - 599
21 600 629 600 - 629
22 630 659 630 - 659
23 660 689 660 - 689
24 690 719 690 - 719
25 720 749 720 - 749
26 750 779 750 - 779
27 780 809 780 - 809
28 810 839 810 - 839
29 840 869 840 - 869
30 870 899 870 - 899
Can I write a CTE to generate this interval table so that I can use it directly in Crystal Reports without writing a function on the database side? Below is the code I wrote:
declare
    intervalStart integer := 0;
    intervalEnd integer := 900;
    intervalMins varchar(10) := 30;
    totalIntervals number := 0;
begin
    begin
        execute immediate 'create global temporary table intervalTable (row_Id int not null, intStart integer, intEnd integer, intervalName varchar2(25)) ON COMMIT DELETE ROWS';
    exception when others then
        dbms_output.put_line(sqlerrm);
    end;
    totalIntervals := intervalEnd/intervalMins;
    --dbms_output.put_line(totalIntervals);
    for i in 1 .. totalIntervals loop
        intervalStart := 0;
        intervalEnd := 0;
        intervalStart := intervalStart + (i-1)*intervalMins;
        intervalEnd := intervalEnd + (i*intervalMins)-1;
        --dbms_output.put_line(intervalStart || ' - ' || intervalEnd);
        insert into intervalTable
        (
            row_id,
            intStart,
            intEnd,
            intervalName
        )
        values (i, intervalStart, intervalEnd, (intervalStart || ' - ' || intervalEnd));
    end loop;
end;
I think you want something like this:
with freq_data as (
    select level as id,
           (level-1)*30 as start_interval,
           ((level-1)*30) + 29 as end_interval,
           (level-1)*30 || ' - ' || to_char(((level-1)*30) + 29) as label
    from dual
    connect by level <= 30
    order by level
)
select * from freq_data;
Output
ID START_INTERVAL END_INTERVAL LABEL
1 0 29 0 - 29
2 30 59 30 - 59
3 60 89 60 - 89
4 90 119 90 - 119
5 120 149 120 - 149
6 150 179 150 - 179
7 180 209 180 - 209
8 210 239 210 - 239
9 240 269 240 - 269
10 270 299 270 - 299
11 300 329 300 - 329
12 330 359 330 - 359
13 360 389 360 - 389
14 390 419 390 - 419
15 420 449 420 - 449
16 450 479 450 - 479
17 480 509 480 - 509
18 510 539 510 - 539
19 540 569 540 - 569
20 570 599 570 - 599
21 600 629 600 - 629
22 630 659 630 - 659
23 660 689 660 - 689
24 690 719 690 - 719
25 720 749 720 - 749
26 750 779 750 - 779
27 780 809 780 - 809
28 810 839 810 - 839
29 840 869 840 - 869
30 870 899 870 - 899
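For comparison, the same interval table can be produced outside the database with a short awk loop (a sketch mirroring the CTE's arithmetic, not part of the Crystal Reports setup):

```shell
# 30 half-hour buckets: id, start, end, and a "start - end" label
awk 'BEGIN { OFS = "\t"
             for (i = 1; i <= 30; i++) {
                 s = (i-1)*30; e = s + 29
                 print i, s, e, s " - " e
             } }'
```

The first row is 1, 0, 29, "0 - 29" and the last is 30, 870, 899, "870 - 899", matching the query output above.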
An example using the above in a join query:
create table my_test
(
    num number
    -- other important data ...
);
-- insert some random numbers
insert into my_test
select trunc(DBMS_RANDOM.VALUE(0,900))
from dual
connect by level <= 10;
commit;
Now joining to get the label for each num field:
with freq_data as (
    select level as id,
           (level-1)*30 as start_interval,
           ((level-1)*30) + 29 as end_interval,
           (level-1)*30 || ' - ' || to_char(((level-1)*30) + 29) as label
    from dual
    connect by level <= 30
    order by level
)
select t.num, d.label
from my_test t
left join freq_data d on (t.num between d.start_interval and d.end_interval);
Output:
NUM LABEL
64 60 - 89
73 60 - 89
128 120 - 149
154 150 - 179
267 240 - 269
328 300 - 329
550 540 - 569
586 570 - 599
745 720 - 749
795 780 - 809