d
The 'd' above is a data frame with 14 observations and 2 variables. Printing it gives the following table:
NUMBER DATE
1 20 2017-01-01
2 30 2017-01-02
3 40 2017-01-03
4 40 2017-01-04
5 50 2017-01-05
6 50 2017-01-06
7 60 2017-01-07
8 20 2017-01-08
9 30 2017-01-09
10 40 2017-01-10
11 40 2017-01-11
12 50 2017-01-12
13 50 2017-01-13
14 60 2017-01-14
After running the following code:
a <- c(0, 7)
for (i in a) {
  w <- subset(d, DATE >= as.Date("2017-01-01") + a & DATE <= as.Date("2017-01-07") + a)
  print(w)
}
I get the following output. I was expecting two tables, the first with the dates 1st-7th and the second with the dates 8th-14th. If I change the 'a' variable to just 0 or just 7, the loop returns the dates I'd expect (but obviously only one table per run).
NUMBER DATE
1 20 2017-01-01
3 40 2017-01-03
5 50 2017-01-05
7 60 2017-01-07
8 20 2017-01-08
10 40 2017-01-10
12 50 2017-01-12
14 60 2017-01-14
NUMBER DATE
1 20 2017-01-01
3 40 2017-01-03
5 50 2017-01-05
7 60 2017-01-07
8 20 2017-01-08
10 40 2017-01-10
12 50 2017-01-12
14 60 2017-01-14
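So how do I get the output to show table 1 with the 1st-7th dates and table 2 with the 8th-14th dates?
I realise my error now. I was using 'a' instead of 'i' inside the for loop. Because 'a' is the whole vector c(0, 7), R recycles it across the rows: odd-numbered rows were tested against the 1st-7th window and even-numbered rows against the 8th-14th window, which is why both tables show every other date.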
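For illustration, here is the corrected logic sketched in Python rather than R. The helper name window and the rebuilt rows list are my own, not from the original code. The point is only that each pass of the loop shifts both bounds by the single offset i, not by the whole vector.

```python
from datetime import date, timedelta

# Rebuild the 14-row data frame from the question as (NUMBER, DATE) tuples.
numbers = [20, 30, 40, 40, 50, 50, 60] * 2
rows = [(n, date(2017, 1, 1) + timedelta(days=k)) for k, n in enumerate(numbers)]


def window(rows, offset_days):
    """Return the rows whose date falls in 1st-7th Jan shifted by offset_days."""
    start = date(2017, 1, 1) + timedelta(days=offset_days)
    end = date(2017, 1, 7) + timedelta(days=offset_days)
    return [r for r in rows if start <= r[1] <= end]


# One table per offset: the loop variable (not the whole list) shifts the bounds.
tables = [window(rows, i) for i in (0, 7)]

for table in tables:
    for number, day in table:
        print(number, day)
    print()
```

In the R code the equivalent change is replacing both occurrences of + a with + i inside subset().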
We've run an interrupted time series analysis on some aggregate count data using a Poisson regression. The code is shown below, where Subject Total is the count, Quarter is time, int2 is the dummy variable for the intervention [0 pre, 1 post], and time_since_intervention2 counts the quarters since the intervention [0 pre, 1:N post].
fit1a <- glm(`Subject Total` ~ Quarter + int2 + time_since_intervention2 , df, family = "poisson")
Quarter Subject Total int2 time_since_intervention2 subjectfit subcounter
1 1 34 0 0 34.20968 34.20968
2 2 32 0 0 33.39850 33.39850
3 3 36 0 0 32.60656 32.60656
4 4 34 0 0 31.83339 31.83339
5 5 23 0 0 31.07856 31.07856
6 6 34 0 0 30.34163 30.34163
7 7 33 0 0 29.62217 29.62217
8 8 24 0 0 28.91977 28.91977
9 9 31 0 0 28.23402 28.23402
10 10 32 0 0 27.56454 27.56454
11 11 21 0 0 26.91093 26.91093
12 12 26 0 0 26.27282 26.27282
13 13 22 0 0 25.64984 25.64984
14 14 28 0 0 25.04163 25.04163
15 15 28 0 0 24.44784 24.44784
16 16 22 0 0 23.86814 23.86814
17 17 14 1 1 17.88365 23.30218
18 18 16 1 2 17.01622 22.74964
19 19 20 1 3 16.19087 22.21020
20 20 19 1 4 15.40556 21.68355
21 21 13 1 5 14.65833 21.16939
22 22 15 1 6 13.94735 20.66743
23 23 16 1 7 13.27085 20.17736
24 24 8 1 8 12.62717 19.69892
Because the coefficients need to be exponentiated to get back to the count scale, the summary is currently derived using the margins package.
> summary(margins(fit1a))
factor AME SE z p lower upper
int2 -5.7843 5.1734 -1.1181 0.2635 -15.9241 4.3555
Quarter -0.5809 0.2469 -2.3526 0.0186 -1.0649 -0.0970
time_since_intervention2 -0.6227 0.9955 -0.6255 0.5316 -2.5738 1.3285
If I am reading the output correctly, it suggests that the level change between the final quarter of the pre-intervention period and the first quarter of the post-intervention period is -5.7843.
I've tried plugging the coefficient values into my model [initial intercept = 35.0405575], but the results don't appear to match the subjectfit data at all, which I believed they would. Should the level change reported by the margins package replicate the difference in the full data?
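As a rough arithmetic check, sketched in Python rather than R, the log-scale coefficients can be recovered from the fitted values in the table and compared with the margins output. This assumes that subcounter is the counterfactual fit with int2 and time_since_intervention2 held at 0, which the table suggests but the post does not state. The variable names are mine. Under that assumption, each coefficient multiplied by the mean fitted count comes out close to the reported AME values (-5.7843, -0.5809, -0.6227). That would mean the AME is an average over all 24 quarters, not the drop at the intervention.

```python
import math

# subjectfit column from the table, Quarter 1..24.
subjectfit = [
    34.20968, 33.39850, 32.60656, 31.83339, 31.07856, 30.34163,
    29.62217, 28.91977, 28.23402, 27.56454, 26.91093, 26.27282,
    25.64984, 25.04163, 24.44784, 23.86814, 17.88365, 17.01622,
    16.19087, 15.40556, 14.65833, 13.94735, 13.27085, 12.62717,
]
# subcounter at Quarter 17 and 18 (assumed to be the no-intervention fit).
counter_q17, counter_q18 = 23.30218, 22.74964

# Pre-intervention fits differ only by the Quarter term: fit = exp(b0 + b_quarter * Quarter).
b_quarter = math.log(subjectfit[1] / subjectfit[0])
exp_intercept = subjectfit[0] / math.exp(b_quarter)  # fitted value at Quarter 0

# Post-intervention, fit / counterfactual = exp(b_int2 + b_tsi * time_since_intervention2).
ratio_q17 = subjectfit[16] / counter_q17
ratio_q18 = subjectfit[17] / counter_q18
b_tsi = math.log(ratio_q18 / ratio_q17)
b_int2 = math.log(ratio_q17) - b_tsi

# Coefficient times the mean fitted count, to compare with the margins AME column.
mean_fit = sum(subjectfit) / len(subjectfit)
ame_like = {
    "int2": b_int2 * mean_fit,
    "Quarter": b_quarter * mean_fit,
    "time_since_intervention2": b_tsi * mean_fit,
}

# Fitted drop at Quarter 17 relative to the counterfactual, in counts.
drop_q17 = subjectfit[16] - counter_q17

print(round(exp_intercept, 3), ame_like, round(drop_q17, 3))
```

On this reading, 35.0405575 is the exponentiated intercept, and the terms combine multiplicatively on the count scale: the fitted count is the product of exp(intercept) and exp(coefficient * value) for each term, not a sum of AME values. The fitted gap at Quarter 17 is about -5.42 counts.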
I have this query, which works:
SELECT TO_CHAR(last_date_called,'HH24'), count(*)
FROM log_table
GROUP BY TO_CHAR(last_date_called,'HH24');
But in some cases there are not 24 hours' worth of data. What I want is to always generate 24 rows and return 0 for any hour with no data. So the results may look like this:
00 10
01 25
02 33
03 0
04 55
05 0
06 23
And so on, up to hour 23.
You'll need a row generator to create all hours in a day, and then outer join it to your "real" table. Something like this (see comments within code):
with
  hours as
    -- row generator, to create all hours in a day
    (select lpad(level - 1, 2, '0') hour
     from dual
     connect by level <= 24
    ),
  log_table (last_date_called) as
    -- sample data, just to return "something"
    (select to_date('08.07.2021 13:32', 'dd.mm.yyyy hh24:mi') from dual union all
     select to_date('16.02.2021 08:20', 'dd.mm.yyyy hh24:mi') from dual
    )
-- final query
select h.hour,
       count(l.last_date_called) cnt
from hours h left join log_table l on h.hour = to_char(l.last_date_called, 'hh24')
group by h.hour
order by h.hour;
HO CNT
-- ----------
00 0
01 0
02 0
03 0
04 0
05 0
06 0
07 0
08 1
09 0
10 0
11 0
12 0
13 1
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 rows selected.
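The same idea (generate every hour first, then attach counts and default to 0) can be sketched outside the database. The Python below is only an illustration of the technique, using the two sample timestamps from the query above. The names calls and hourly are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Same two sample rows as the log_table CTE above.
calls = [
    datetime(2021, 7, 8, 13, 32),
    datetime(2021, 2, 16, 8, 20),
]

# Count calls per two-digit hour, like TO_CHAR(last_date_called, 'HH24').
counts = Counter(ts.strftime("%H") for ts in calls)

# "Row generator": all 24 hours, with 0 where the log has nothing (the outer join).
hourly = [(f"{h:02d}", counts.get(f"{h:02d}", 0)) for h in range(24)]

for hour, cnt in hourly:
    print(hour, cnt)
```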
I have this table:
Year  Month  Agency  Value
2019      9       1    233
2019      9       4    132
2019      8       3    342
2020      3       2    321
2020      3       4     34
2020      5       2     56
2020      5       4    221
2020      5       1    117
2018     12       2    112
2018     12       2    411
2020      4       3    241
2020      4       2    155
I'd like to add a new measure/column that is 1 for rows in the latest month of the latest year, and 0 in all other cases:
Year  Month  Agency  Value  Filter
2019      9       1    233       0
2019      9       4    132       0
2019      8       3    342       0
2020      3       2    321       0
2020      3       4     34       0
2020      5       2     56       1
2020      5       4    221       1
2020      5       1    117       1
2018     12       2    112       0
2018     12       2    411       0
2020      4       3    241       0
2020      4       2    155       0
I've been able to "copy" the rows with Month=5 and Year=2020 ("the latest of the latest") into a new table:
TableData - Last Charge =
var table =
    FILTER(
        TableData,
        AND(
            MAX('TableData'[Year]) = 'TableData'[Year],
            MAX('TableData'[Month]) = 'TableData'[Month]
        )
    )
return
    SUMMARIZE(table, TableData[Year], TableData[Month], TableData[Agency], TableData[Value])
However, I don't want to create new tables. I'd rather have a measure/column that I can use as a filter when I build a chart.
Thanks a lot, and sorry for my poor English.
I solved it with this measure:
Measure =
VAR a =
    MAX ( 'Table'[Year] )
VAR b =
    MAX ( 'Table'[Months] )
VAR c =
    MAXX ( ALL ( 'Table' ), [Year] )
VAR d =
    MAXX ( FILTER ( ALL ( 'Table' ), [Year] = c ), [Months] )
RETURN
    IF ( a * 100 + b = c * 100 + d, 1, 0 )
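To make the logic of the measure easier to follow, here is the same rule sketched in plain Python (not DAX) over the sample rows from the question: find the latest year, then the latest month within that year, and flag the rows that match both. The names latest_year, latest_month and flags are illustrative only.

```python
# (Year, Month, Agency, Value) rows from the question's table.
rows = [
    (2019, 9, 1, 233), (2019, 9, 4, 132), (2019, 8, 3, 342),
    (2020, 3, 2, 321), (2020, 3, 4, 34), (2020, 5, 2, 56),
    (2020, 5, 4, 221), (2020, 5, 1, 117), (2018, 12, 2, 112),
    (2018, 12, 2, 411), (2020, 4, 3, 241), (2020, 4, 2, 155),
]

# Latest year over the whole table (variable c in the measure).
latest_year = max(year for year, _, _, _ in rows)

# Latest month within that year only (variable d in the measure).
latest_month = max(month for year, month, _, _ in rows if year == latest_year)

# 1 for rows in the latest month of the latest year, 0 otherwise.
flags = [
    1 if (year, month) == (latest_year, latest_month) else 0
    for year, month, _, _ in rows
]

for row, flag in zip(rows, flags):
    print(*row, flag)
```

Taking the maximum month inside the latest year matters here: month 12 appears in 2018, so a plain maximum over the Month column would flag nothing.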
I'm working with Oracle and its GROUP BY clause seems to behave very differently from what I'd expect.
When using this query:
SELECT stats.gds_id,
stats.stat_date,
SUM(stats.A_BOOKINGS_NBR) as "Bookings",
SUM(stats.RESPONSES_LESS_1_NBR) as "<1",
SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
SUM(STATS.RESPONSES_LESS_3_NBR) AS "<3",
SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) as "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, stats.stat_date
I get results like this:
GDS_ID STAT_DATE Bookings <1 <2 <3 <4 <5 >5 <6 <7 >7 Total
02 12-JUN-11 0 1 0 0 0 0 0 0 0 0 1
1A 01-JUN-11 15 831 52 6 2 2 4 1 1 2 897
1A 01-JUN-11 15 758 59 8 1 1 5 2 1 2 832
1A 01-JUN-11 10 593 40 2 2 1 2 1 0 1 640
1A 01-JUN-11 12 678 40 10 5 2 3 1 0 2 738
1A 01-JUN-11 24 612 56 6 1 3 4 0 0 4 682
1A 01-JUN-11 23 552 37 7 1 1 2 0 1 1 600
1A 01-JUN-11 35 1147 132 13 6 0 8 0 2 6 1306
1A 01-JUN-11 91 2331 114 14 5 1 14 3 1 10 2479
As you can see, I get multiple rows with the same STAT_DATE per GDS_ID. Why is that, and how can I make it group by both of those columns, i.e. sum the values for each GDS_ID per STAT_DATE?
Probably because STAT_DATE has a time component, which is being taken into account in the GROUP BY but not being displayed in the results due to the default format mask. To ignore the time, do this:
SELECT stats.gds_id,
TRUNC(stats.stat_date) stat_date,
SUM(stats.A_BOOKINGS_NBR) as "Bookings",
SUM(stats.RESPONSES_LESS_1_NBR) as "<1",
SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
SUM(STATS.RESPONSES_LESS_3_NBR) AS "<3",
SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) as "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, TRUNC(stats.stat_date)
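The effect is easy to reproduce outside Oracle. The Python sketch below uses made-up sample rows (they are not the real gwydb.statistics data) to show how grouping on a full timestamp keeps rows apart that collapse into one once the time of day is dropped, which is what TRUNC does in the query above.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (gds_id, stat_date with a time component, bookings).
records = [
    ("1A", datetime(2011, 6, 1, 9, 0), 15),
    ("1A", datetime(2011, 6, 1, 10, 0), 10),
    ("02", datetime(2011, 6, 12, 0, 0), 0),
]


def total_bookings(records, key):
    """Sum bookings per (gds_id, key(stat_date)) group."""
    totals = defaultdict(int)
    for gds_id, stamp, bookings in records:
        totals[(gds_id, key(stamp))] += bookings
    return dict(totals)


# Grouping on the raw timestamp: the two 1A rows stay separate.
by_timestamp = total_bookings(records, lambda stamp: stamp)

# Grouping on the date only (like TRUNC): the two 1A rows are summed.
by_day = total_bookings(records, lambda stamp: stamp.date())

print(len(by_timestamp), len(by_day))
```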
I have this task and I can't figure out how to do it.
I need to find each person's age in days, given their birth and death dates. Here is the data file:
8
Albertas Einšteinas 1879 03 14 1955 04 18
Balys Sruoga 1896 02 02 1947 10 16
Antanas Vienuolis 1882 04 07 1957 08 17
Ernestas Rezerfordas 1871 08 30 1937 10 17
Nilsas Boras 1885 10 07 1962 11 18
Nežiniukas Pirmasis 8 05 24 8 05 25
Nežiniukas Antrasis 888 05 25 888 05 25
Nežiniukas Trečiasis 1 01 01 125 01 01
and this is how the result file should look:
1879 3 14 1955 4 18 27775
1896 2 2 1947 10 16 18871
1882 4 7 1957 8 17 27507
1871 8 30 1937 10 17 24138
1885 10 7 1962 11 18 28147
8 5 24 8 5 25 1
888 5 25 888 5 25 0
1 1 1 125 1 1 45260
One thing to note: every February has 28 days (there are no leap years).
My function for calculating age:
function AmziusFunc(Mas : TZmogus) : longint;
var amzius, max : longint;
begin
  max := 125 * 365;
  amzius := (Mas.mirY - Mas.gimY) * 365 + (Mas.mirM - Mas.gimM) * 31 +
            (Mas.mirD - Mas.gimD);
  if ( amzius >= max ) then amzius := 0;
  AmziusFunc := amzius;
end;
What should I change there? Thanks.
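One possible approach, sketched in Python rather than Pascal: the function above multiplies the month difference by 31, but months have different lengths. A common fix is to turn each date into a running day number (365 days per year, the days in all earlier months, then the day of the month) and subtract. The names DAYS_IN_MONTH, day_number and age_in_days are illustrative. The table of days before each month would carry over to a Pascal array directly.

```python
# Month lengths with February fixed at 28 days, as the task states.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# DAYS_BEFORE_MONTH[m - 1] is the number of days in the year before month m.
DAYS_BEFORE_MONTH = [sum(DAYS_IN_MONTH[:m]) for m in range(12)]


def day_number(year, month, day):
    """Running day count for a date, with every year 365 days long."""
    return year * 365 + DAYS_BEFORE_MONTH[month - 1] + day


def age_in_days(birth, death):
    """Days between two (year, month, day) tuples."""
    return day_number(*death) - day_number(*birth)


# First and last lines of the sample data file.
print(age_in_days((1879, 3, 14), (1955, 4, 18)))
print(age_in_days((1, 1, 1), (125, 1, 1)))
```

Checked by hand against the expected result file, this gives 27775 for the first line and 45260 for the last.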