Query to capture rows when a value changes in a column - Hadoop

I need to capture the specific rows where the value of a particular column, such as "Toggle", changes.
I have the data below:
ID ROW Toggle Date
661 1 1 2017-03-01
661 2 1 2017-03-02
661 3 1 2017-03-03
661 4 1 2017-03-04
661 5 1 2017-03-05
661 6 1 2017-03-06
661 7 1 2017-03-07
661 8 1 2017-03-08
661 9 1 2017-03-09
661 10 1 2017-03-10
661 11 1 2017-03-11
661 12 1 2017-03-12
661 13 1 2017-03-13
661 14 1 2017-03-14
661 15 1 2017-03-15
661 16 1 2017-03-16
661 17 1 2017-03-17
661 18 1 2017-03-18
661 19 1 2017-03-19
661 20 1 2017-03-20
661 21 1 2017-03-21
661 22 1 2017-03-22
661 23 1 2017-03-23
661 24 1 2017-03-24
661 25 1 2017-03-25
661 26 1 2017-03-26
661 27 1 2017-03-27
661 28 1 2017-03-28
661 29 1 2017-03-29
661 30 1 2017-03-30
661 31 1 2017-03-31
661 32 1 2017-04-01
661 33 1 2017-04-02
661 34 1 2017-04-03
661 35 1 2017-04-04
661 36 1 2017-04-05
661 37 0 2017-04-06
661 38 0 2017-04-07
661 39 0 2017-04-08
661 40 0 2017-04-09
Query used :
select b.id, b.ROW, b.tog, b.ts
from
(select id, ts, tog,
ROW_NUMBER() OVER (order by ts ASC) as ROW
from database.source_table
where id = 661
) b
Can anyone help me with the query so that I can fetch only the 1st and 37th rows from the source table?

Use row_number() plus a filter. This query will output the 1st and 37th rows:
select b.id, b.ROW, b.toggle, b.date
from
(select id, date, toggle,
ROW_NUMBER() OVER (partition by id, toggle order by date ASC) as rn,
ROW_NUMBER() OVER (partition by id order by date ASC) as ROW
from test_table
where id = 661
) b
where rn=1
order by date asc
Result:
OK
661 1 1 2017-03-01
661 37 0 2017-04-06
Time taken: 192.38 seconds, Fetched: 2 row(s)
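Note that partitioning by (id, toggle) returns only the first row of each distinct Toggle value, so it works for this data but would miss later flips (e.g. if Toggle went back to 1). A lag()-based variant flags every change point; a sketch, assuming the same test_table and a Hive version with windowing functions (0.11+), with date backquoted in case it is reserved in your Hive version:
select id, rn, toggle, `date`
from
(select id, toggle, `date`,
        row_number() over (partition by id order by `date` asc) as rn,
        lag(toggle) over (partition by id order by `date` asc) as prev_toggle
 from test_table
 where id = 661
) b
where prev_toggle is null      -- first row for this id
   or toggle <> prev_toggle    -- Toggle changed vs the previous row
order by `date` asc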

Related

Replace exact numbers in a column keeping order

I have this file, and I would like to renumber the 3rd column so that the values appear in order. Also, I need to skip the first row (the header of the file).
Initial file:
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 3 CTTG 61 145M
2005 17 3 TTCG 30 145M
91823 17 4 ATGAAGC 22 146M
91823 17 4 GTAGGCC 19 146M
16523 17 5 GGGGGTCGGT 45 30M1D115M
Modified file:
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
Do you know how I could do it?
Could you please try the following.
awk 'prev!=$1{++count}{$3=count;prev=$1;$1=$1} 1' OFS="\t" Input_file
To leave the header line untouched, use the following:
awk 'FNR==1{print;next}prev!=$1{++count}{$3=count;prev=$1;$1=$1} 1' OFS="\t" Input_file
Second solution: in case the 1st field of your Input_file is NOT in order, the following (which reads Input_file twice) may help you here.
awk 'FNR==NR{if(!a[$1]++){b[$1]=++count};next} {$3=b[$1];$1=$1} 1' OFS="\t" Input_file Input_file
To leave the header untouched with the 2nd solution, use the following.
awk 'FNR==1{if(++val==1){print};next}FNR==NR{if(!a[$1]++){b[$1]=++count};next} {$3=b[$1];$1=$1} 1' OFS="\t" Input_file Input_file
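Spelled out, the two-pass idea works like this (a commented, functionally equivalent sketch):
awk '
FNR == NR {               # 1st pass over Input_file
    if (!a[$1]++)         # first time this 1st field appears,
        b[$1] = ++count   #   give it the next group number
    next
}
{                         # 2nd pass: rewrite the 3rd column
    $3 = b[$1]
    $1 = $1               # force awk to rebuild the record with OFS
    print
}' OFS="\t" Input_file Input_file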
Another minimalist awk:
$ awk '{$3=c+=p!=$1;p=$1}1' file | column -t
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
With-header version:
$ awk 'NR==1; NR>1{$3=c+=p!=$1;p=$1; print | "column -t"}' file
#results from program A
8536 17 1 CGTCGCCTAT 116 147M2D
8536 17 1 CGTCGCTTAT 116 147M2D
8536 17 1 CGTTGCCTAT 116 147M2D
8536 17 1 CGTTGCTTAT 116 147M2D
2005 17 2 CTTG 61 145M
2005 17 2 TTCG 30 145M
91823 17 3 ATGAAGC 22 146M
91823 17 3 GTAGGCC 19 146M
16523 17 4 GGGGGTCGGT 45 30M1D115M
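For readability, the condensed one-liner expands to the following equivalent sketch:
awk '{
    if ($1 != p) c++      # bump the group counter when the 1st field changes
    $3 = c                # overwrite the 3rd column with the counter
    p = $1                # remember the current 1st field
    print                 # the bare "1" in the original
}' file | column -t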

Laravel groupby id get newest records

I have:
ID | BRAND_ID | CUST_ID | EXPIRY_DATE | CREATED_DATE
1 1 22 2018-02-02 2018-01-01 00:00:00
2 1 22 2018-02-02 2018-02-02 00:00:00
3 1 22 2019-02-02 2018-02-02 00:05:00
4 1 22 2019-02-02 2018-02-02 00:05:00
5 1 22 2018-02-02 2018-02-02 00:07:00
6 1 22 2018-02-02 2018-02-02 00:07:00
I am trying to get the newest records, grouping by CUST_ID:
->groupBy('CUST_ID')
->orderBy('CREATED_DATE', 'desc')
but when I add groupBy I get the first 2 rows, not the last 2.
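The usual cause here is that MySQL's GROUP BY collapses rows before ORDER BY runs, so whichever row the engine meets first wins. A minimal sketch of one common workaround, correlating each row to the newest CREATED_DATE of its CUST_ID (not from the original thread; it assumes a table named records):
use Illuminate\Support\Facades\DB;

$newest = DB::table('records as r')
    ->select('r.*')
    // keep only rows whose CREATED_DATE is the maximum within their CUST_ID
    ->whereRaw('r.CREATED_DATE = (select max(r2.CREATED_DATE)
                                    from records r2
                                   where r2.CUST_ID = r.CUST_ID)')
    ->orderBy('r.CREATED_DATE', 'desc')
    ->get();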

cumsum with more than 1 variable using ddply

I'm trying to get cumsum for more than one variable using ddply, but it's not working.
I'm using this code:
ddply(.data = Summaryday, .variables = 'DaysToClose_', .fun = transform,
      cumsumPosit = cumsum(PositCount),
      cumsumNegat = cumsum(NegatCount))
but the result isn't correct:
DaysToClose_ PositCount NegatCount cumsumPosit cumsumNegat
1 1 7340 27256 7340 27256
2 2 2243 7597 2243 7597
3 3 1526 4545 1526 4545
4 4 1315 3756 1315 3756
5 5 1142 3320 1142 3320
6 6 1216 3118 1216 3118
7 7 1252 3324 1252 3324
8 8 1180 3077 1180 3077
9 9 975 2053 975 2053
10 10 684 1429 684 1429
11 11 613 1244 613 1244
12 12 596 1199 596 1199
13 13 542 1218 542 1218
14 14 711 1434 711 1434
15 15 645 1333 645 1333
16 16 577 899 577 899
17 17 373 667 373 667
18 18 369 656 369 656
19 19 340 624 340 624
If someone can help me on this, I appreciate that.
I am not sure why you would use ddply here. You can't really split by DaysToClose_, because each row then forms its own subset, so you always get the cumsum of a single value. You probably want mutate instead:
library(tidyverse)
data %>% mutate(cumsumPosit = cumsum(PositCount),
cumsumNegat = cumsum(NegatCount))
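Had the data contained repeated DaysToClose_ values, per-group running totals would need a group_by in front of the same mutate call, e.g.:
data %>%
  group_by(DaysToClose_) %>%
  mutate(cumsumPosit = cumsum(PositCount),
         cumsumNegat = cumsum(NegatCount)) %>%
  ungroup()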

Is it possible to GROUP BY data instead of a column?

I was wondering whether it is possible to use GROUP BY based on the data in a certain column in a specific way, instead of on the column itself. My question is: can I create groups based on the occurrences of 0 in a certain field?
DIA MES YEAR TODAY TOMORROW ANALYSIS LIMIT
---------- ---------- ---------- ---------- ---------- ---------- ----------
19 9 2016 111 988 0 150
20 9 2016 988 853 853 150
21 9 2016 853 895 895 150
22 9 2016 895 776 776 150
23 9 2016 776 954 0 150
26 9 2016 954 968 968 150
27 9 2016 968 810 810 150
28 9 2016 810 937 937 150
29 9 2016 937 769 769 150
30 9 2016 769 1020 0 150
3 10 2016 1020 923 923 150
4 10 2016 923 32 32 150
In this case, for example, I would want to create groups like this:
Group 1 (Analysis): 0
Group 2 (Analysis): 853, 895, 776, 0
Group 3 (Analysis): 968, 810, 937, 769, 0
...
Assuming your table name is tbl, something like this should work (it's called the "start-of-group" method if you want to Google it). A running count of the 0 rows numbers the groups; counting only the preceding rows makes each 0 close its own group, as in your example:
select cnt, count(*) as rows_in_group
from ( select tbl.*,
              count(case when analysis = 0 then 1 end)
                over (order by year, mes, dia
                      rows between unbounded preceding and 1 preceding) as cnt
       from tbl
     )
group by cnt
;
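To reproduce the groups listed in the question, each group's ANALYSIS values can then be aggregated, for instance with listagg (a sketch, assuming Oracle 11g R2 or later):
select cnt as grp,
       listagg(analysis, ', ')
         within group (order by year, mes, dia) as analysis_values
from ( select tbl.*,
              count(case when analysis = 0 then 1 end)
                over (order by year, mes, dia
                      rows between unbounded preceding and 1 preceding) as cnt
       from tbl
     )
group by cnt
order by cnt;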

Oracle grand total column and row

I have this table as a result from another query
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9
----------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605
ADECUACIONES 0 0 0 0 2 0 1 0 50
AET 0 0 2 0 0 0 0 0 11
EXECUTED 0 80 1 18 9 57 34 30 20
IN PROCESS 0 0 0 0 0 4 25 2 112
FREQ 0 55 2 76 25 117 7 73 48
INSTALL 1 4 1 10 5 14 2 13 62
WO INSTALL 9 2 51 24 143 17 15 59 16
WOT VL 0 1 0 0 1 0 0 0 0
OTHER 22 7 20 28 44 30 6 6 109
PROG 1 0 1 0 0 2 3 0 0
PTE PROG 0 5 0 0 0 0 3 19 93
TMX 0 0 0 28 4 8 11 3 14
PROJ 0 1 12 26 13 8 0 2 4
What I expect to have is this
STATUS R1 R2 R3 R4 R5 R6 R7 R8 R9 TOTAL
----------------------------------------------------------
ACCEPTED 322 241 278 473 575 595 567 449 605 4105
ADECUACIONES 0 0 0 0 2 0 1 0 50 53
AET 0 0 2 0 0 0 0 0 11 13
EXECUTED 0 80 1 18 9 57 34 30 20 249
IN PROCESS 0 0 0 0 0 4 25 2 112 143
FREQ 0 55 2 76 25 117 7 73 48 403
INSTALL 1 4 1 10 5 14 2 13 62 112
WO INSTALL 9 2 51 24 143 17 15 59 16 336
WOT VL 0 1 0 0 1 0 0 0 0 2
OTHER 22 7 20 28 44 30 6 6 109 272
PROG 1 0 1 0 0 2 3 0 0 7
PTE PROG 0 5 0 0 0 0 3 19 93 120
TMX 0 0 0 28 4 8 11 3 14 68
PROJ 0 1 12 26 13 8 0 2 4 66
TOTAL 355 396 368 683 821 852 674 656 1144 5949
I've been playing with grouping() and rollup(), but I always get duplicated rows and unwanted null values.
If you have trouble telling the total row apart from the data rows, the grouping_id function will help you (you can select grouping_id(col), but also grouping_id(col1, col2, col3, etc.)).
But your case is simpler. It looks like this:
drop table fg_test_group;
create table fg_test_group (a number, b number, c number, d number);
insert into fg_test_group values (1, 2, 3, 4);
insert into fg_test_group values (2, 2, 3, 4);
insert into fg_test_group values (3, 2, 3, 4);
select nvl(to_char(a), 'total') as a , sum(b), sum(c), sum(d), grouping_id(a)
from fg_test_group
group by rollup (a)
;
where a is STATUS in your case. A demo closer to your actual table:
CREATE TABLE TEST1 (STATUS VARCHAR2(10), R1 NUMBER, R2 NUMBER, R3 NUMBER);
INSERT INTO TEST1 VALUES ('ACCEPTED', 322,241,278);
INSERT INTO TEST1 VALUES ('EXECUTED', 0, 80, 1);
INSERT INTO TEST1 VALUES ('FREQ', 0, 55, 2);
COMMIT;
select NVL(TO_CHAR(STATUS), 'total') as STATUS ,SUM(R1) R1, SUM(R2) R2 , SUM(R3) R3, SUM(R1+R2+R3)
from TEST1
group by rollup (STATUS)
;
STATUS R1 R2 R3 SUM(R1+R2+R3)
ACCEPTED 322 241 278 841
EXECUTED 0 80 1 81
FREQ 0 55 2 57
total 322 376 281 979
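Applied to the nine-column table from the question, the same rollup pattern produces both the TOTAL column and the TOTAL row in one pass; since STATUS is already a string, nvl alone suffices, and aliasing the row sum keeps the header clean (a sketch, assuming the first query's result is available as a table or inline view named t):
select nvl(status, 'TOTAL') as status,
       sum(r1) r1, sum(r2) r2, sum(r3) r3,
       sum(r4) r4, sum(r5) r5, sum(r6) r6,
       sum(r7) r7, sum(r8) r8, sum(r9) r9,
       sum(r1 + r2 + r3 + r4 + r5 + r6 + r7 + r8 + r9) as total
from t
group by rollup (status)
;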
