GROUP BY isn't grouping properly - Oracle

I'm working with Oracle, and its GROUP BY clause seems to behave very differently than I'd expect.
When using this query:
SELECT stats.gds_id,
       stats.stat_date,
       SUM(stats.A_BOOKINGS_NBR) AS "Bookings",
       SUM(stats.RESPONSES_LESS_1_NBR) AS "<1",
       SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
       SUM(stats.RESPONSES_LESS_3_NBR) AS "<3",
       SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
       SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
       SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
       SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
       SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
       SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
       SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, stats.stat_date
I get results like this:
GDS_ID STAT_DATE Bookings <1 <2 <3 <4 <5 >5 <6 <7 >7 Total
02 12-JUN-11 0 1 0 0 0 0 0 0 0 0 1
1A 01-JUN-11 15 831 52 6 2 2 4 1 1 2 897
1A 01-JUN-11 15 758 59 8 1 1 5 2 1 2 832
1A 01-JUN-11 10 593 40 2 2 1 2 1 0 1 640
1A 01-JUN-11 12 678 40 10 5 2 3 1 0 2 738
1A 01-JUN-11 24 612 56 6 1 3 4 0 0 4 682
1A 01-JUN-11 23 552 37 7 1 1 2 0 1 1 600
1A 01-JUN-11 35 1147 132 13 6 0 8 0 2 6 1306
1A 01-JUN-11 91 2331 114 14 5 1 14 3 1 10 2479
As you can see, I have multiple duplicate STAT_DATEs per GDS_ID. Why is that, and how can I make it group by both of those, i.e. sum the values for each GDS_ID per STAT_DATE?

Probably because STAT_DATE has a time component, which is taken into account by the GROUP BY but not displayed in the results because of the default format mask. To ignore the time, do this:
SELECT stats.gds_id,
       TRUNC(stats.stat_date) stat_date,
       SUM(stats.A_BOOKINGS_NBR) AS "Bookings",
       SUM(stats.RESPONSES_LESS_1_NBR) AS "<1",
       SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
       SUM(stats.RESPONSES_LESS_3_NBR) AS "<3",
       SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
       SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
       SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
       SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
       SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
       SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
       SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, TRUNC(stats.stat_date)
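You can confirm the hidden time component by rendering the column with an explicit format mask (a quick diagnostic query; the format string is just an example):
SELECT TO_CHAR(stats.stat_date, 'DD-MON-YYYY HH24:MI:SS') AS stat_date_full
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011';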

Related

R margins outcome non-compliant with expected raw data

We've run an interrupted time series analysis on some aggregate count data using a Poisson regression. Code is shown below, where Subject Total is the count, Quarter is time, int2 is the dummy variable for the intervention (0 pre, 1 post), and time_since_intervention2 is the dummy variable for time since intervention (0 pre, 1:N post).
fit1a <- glm(`Subject Total` ~ Quarter + int2 + time_since_intervention2, data = df, family = "poisson")
Quarter Subject Total int2 time_since_intervention2 subjectfit subcounter
1 1 34 0 0 34.20968 34.20968
2 2 32 0 0 33.39850 33.39850
3 3 36 0 0 32.60656 32.60656
4 4 34 0 0 31.83339 31.83339
5 5 23 0 0 31.07856 31.07856
6 6 34 0 0 30.34163 30.34163
7 7 33 0 0 29.62217 29.62217
8 8 24 0 0 28.91977 28.91977
9 9 31 0 0 28.23402 28.23402
10 10 32 0 0 27.56454 27.56454
11 11 21 0 0 26.91093 26.91093
12 12 26 0 0 26.27282 26.27282
13 13 22 0 0 25.64984 25.64984
14 14 28 0 0 25.04163 25.04163
15 15 28 0 0 24.44784 24.44784
16 16 22 0 0 23.86814 23.86814
17 17 14 1 1 17.88365 23.30218
18 18 16 1 2 17.01622 22.74964
19 19 20 1 3 16.19087 22.21020
20 20 19 1 4 15.40556 21.68355
21 21 13 1 5 14.65833 21.16939
22 22 15 1 6 13.94735 20.66743
23 23 16 1 7 13.27085 20.17736
24 24 8 1 8 12.62717 19.69892
Due to the need to exponentiate the outcome, the summary is currently being derived using the margins package.
> summary(margins(fit1a))
factor AME SE z p lower upper
int2 -5.7843 5.1734 -1.1181 0.2635 -15.9241 4.3555
Quarter -0.5809 0.2469 -2.3526 0.0186 -1.0649 -0.0970
time_since_intervention2 -0.6227 0.9955 -0.6255 0.5316 -2.5738 1.3285
If I am reading the outcome correctly, it suggests that the level change between the final quarter of the pre-intervention period and the first quarter of the post-intervention period is -5.7843.
I've tried plugging coefficient values into my model (initial intercept = 35.0405575), but they don't appear to correspond to the subjectfit data at all, as I believed they would. Should the level change reported by the margins package replicate the difference in the full data?
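For what it's worth, margins treats a numeric 0/1 regressor as continuous, so the AME it reports for int2 is the derivative of the predicted count averaged over all observations, not the jump between the last pre-intervention and first post-intervention fitted values; the two are therefore not expected to match. A rough sketch of the quantity margins approximates (assuming fit1a and df as above):
# For a Poisson GLM with log link, d E[y]/d x = beta_x * mu row by row;
# the AME averages this over the sample.
ame_int2 <- unname(coef(fit1a)["int2"] * mean(fitted(fit1a)))
ame_int2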

The Traveling Salesman algorithm bug

I have tried to make an algorithm solving the traveling salesman problem as follows:
%main function:
[siz, ~] = size(table);
done(1:siz) = false;
done(1) = true;
[dist, path] = bruteForce(table, done, 1);
function bruteForce:
function [distance, path] = bruteForce(table, done, index)
    size = length(done);
    dmin = inf;
    distance = 0;
    path = [];
    % finding minimum distance
    for i = 1:size
        if ~done(i)
            done(i) = true;
            % iterating through all nodes using recursion
            [d, p] = bruteForce(table, done, i);
            if (d < dmin)
                dmin = d;
                path = [i p];
                distance = dmin + table(i, index);
            end
            % freeing the node again
            done(i) = false;
        end
    end
    if distance == 0
        distance = table(1, index);
        path = 1;
    end
end
Unfortunately, for the following matrix:
B = [0 29 20 21 16 31 100 12 4 31 18;
29 0 15 29 28 40 72 21 29 41 12;
20 15 0 15 14 25 81 9 23 27 13;
21 29 15 0 4 12 92 12 25 13 25;
16 28 14 4 0 16 94 9 20 16 22;
31 40 25 12 16 0 95 24 36 3 37;
100 72 81 92 94 95 0 90 101 99 84;
12 21 9 12 9 24 90 0 15 25 13;
4 29 23 25 20 36 101 15 0 35 18;
31 41 27 13 16 3 99 25 35 0 38;
18 12 13 25 22 37 84 13 18 38 0];
Instead of getting the expected result:
1-8-5-4-10-6-3-7-2-11-9-1 = 253km
I get:
1-8-11-3-4-6-10-5-9-2-7-1 = 271km
Could you help me find the bug?
If brute force is a must and speed is no issue, then just use the perms function for the number of cities. This allows for an easy implementation:
table = [0 29 20 21 16 31 100 12 4 31 18;
29 0 15 29 28 40 72 21 29 41 12;
20 15 0 15 14 25 81 9 23 27 13;
21 29 15 0 4 12 92 12 25 13 25;
16 28 14 4 0 16 94 9 20 16 22;
31 40 25 12 16 0 95 24 36 3 37;
100 72 81 92 94 95 0 90 101 99 84;
12 21 9 12 9 24 90 0 15 25 13;
4 29 23 25 20 36 101 15 0 35 18;
31 41 27 13 16 3 99 25 35 0 38;
18 12 13 25 22 37 84 13 18 38 0];
[siz, ~] = size(table);
[bp, b] = bruteForce(table, siz)

function [bestpath, best] = bruteForce(table, siz)
    p = perms(1:siz);
    [r, c] = size(p);
    best = inf;
    for i = 1:r
        path = p(i, :);
        dist = distCalculatorReturn(table, path);
        if dist < best
            best = dist;
            bestpath = path;
        end
    end
    bestpath = [bestpath, bestpath(1)];
end

function [totaldist] = distCalculatorReturn(distMatrix, proposedPath)
    dist = 0;
    i = 1;
    while i ~= length(proposedPath)
        dist = dist + distMatrix(proposedPath(i), proposedPath(i+1));
        i = i + 1;
    end
    dist = dist + distMatrix(proposedPath(1), proposedPath(end));
    totaldist = dist;
end
This yields the answer you are looking for. However, if you are only solving problems of that size, why not apply standard simulated annealing? It gives much faster solution times and should handle problems of this size consistently:
table = [0 29 20 21 16 31 100 12 4 31 18;
29 0 15 29 28 40 72 21 29 41 12;
20 15 0 15 14 25 81 9 23 27 13;
21 29 15 0 4 12 92 12 25 13 25;
16 28 14 4 0 16 94 9 20 16 22;
31 40 25 12 16 0 95 24 36 3 37;
100 72 81 92 94 95 0 90 101 99 84;
12 21 9 12 9 24 90 0 15 25 13;
4 29 23 25 20 36 101 15 0 35 18;
31 41 27 13 16 3 99 25 35 0 38;
18 12 13 25 22 37 84 13 18 38 0];
[path, dist] = tsp(table, length(table))

function [path, dist] = tsp(D, n)
    L = 40*n;                            % moves attempted per temperature level
    epsi = 1e-9;                         % stopping temperature
    x = randperm(n);                     % random initial tour
    fx = distCalculatorReturn(D, x);     % reuses distCalculatorReturn from above
    T = 1000000;                         % initial temperature
    while T > epsi
        for i = 1:L
            % pick two distinct positions to swap
            num1 = 1 + floor(rand*n);
            num2 = 1 + floor(rand*n);
            while num1 == num2
                num1 = 1 + floor(rand*n);
            end
            y = x;
            swap1 = y(num1);
            y(num1) = y(num2);
            y(num2) = swap1;
            fy = distCalculatorReturn(D, y);
            if fy < fx
                x = y;
                fx = fy;
            elseif rand < exp(-(fy - fx)/T)  % accept a worse tour with decaying probability
                x = y;
                fx = fy;
            end
        end
        T = 0.9*T;                       % geometric cooling
    end
    path = [x, x(1)];
    dist = fx;
end
Your code does recurse through all the nodes, but it does not compare complete path costs. The test if (d < dmin) picks, at each level, the branch whose remaining subtour is cheapest while ignoring the cost of the edge from the current node to that branch; table(i, index) is only added after the winner has been chosen. As your example shows, that does not necessarily lead to the overall shortest path. The comparison needs to be made on d + table(i, index), the subtour cost plus the connecting edge.
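A minimal correction keeping your structure might look like this (a sketch: distance starts at inf so the comparison can include the connecting edge, and the base case closes the tour back to node 1):
function [distance, path] = bruteForce(table, done, index)
    n = length(done);
    distance = inf;
    path = [];
    for i = 1:n
        if ~done(i)
            done(i) = true;
            [d, p] = bruteForce(table, done, i);
            % compare the complete branch cost: edge (index, i) plus the best completion from i
            if d + table(i, index) < distance
                distance = d + table(i, index);
                path = [i p];
            end
            done(i) = false;
        end
    end
    if isinf(distance)   % no unvisited nodes left: close the tour
        distance = table(1, index);
        path = 1;
    end
end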
Here is my go at your problem:
% distance matrix
B = [0 29 20 21 16 31 100 12 4 31 18;
29 0 15 29 28 40 72 21 29 41 12;
20 15 0 15 14 25 81 9 23 27 13;
21 29 15 0 4 12 92 12 25 13 25;
16 28 14 4 0 16 94 9 20 16 22;
31 40 25 12 16 0 95 24 36 3 37;
100 72 81 92 94 95 0 90 101 99 84;
12 21 9 12 9 24 90 0 15 25 13;
4 29 23 25 20 36 101 15 0 35 18;
31 41 27 13 16 3 99 25 35 0 38;
18 12 13 25 22 37 84 13 18 38 0];
% compute all possible paths assuming we always start at node 1
nNodes = size(B,1);
paths = perms(2:nNodes);
nPaths = size(paths,1);
paths = [ones(nPaths,1) paths ones(nPaths,1)]; % start and finish tour at node 1
% with a random start point:
% paths = perms(1:nNodes);
% paths = [paths paths(:,1)];
% compute overall distance for each path
distance = inf;
for idx = 1:nPaths
    from = paths(idx, 1:end-1);
    to = paths(idx, 2:end);
    d = sum(diag(B(from, to)));
    if d < distance
        distance = d;
        optPath = paths(idx, :);
    end
end
This leads to the following result:
optPath = [1 9 11 2 7 3 6 10 4 5 8 1]
distance = 253
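As a side note, B(from,to) builds a full nNodes-by-nNodes submatrix only to read its diagonal. Indexing the needed entries directly should give the same distances with less memory traffic (a small sketch using the same from/to vectors):
d = sum(B(sub2ind(size(B), from, to)));   % pairwise edge lengths along the path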

EC2 high stolen time without load

I can see a very high % of stolen time on an EC2 web server (t2.micro) without any load (one current user), together with high page load times. Is there a correlation between high load time and high stolen time? I have the same symptoms with another server of class t2.medium.
Do you have an explanation?
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 79824 7428 479172 0 0 0 0 52 49 18 0 0 0 82
1 0 0 79792 7436 479172 0 0 0 6 54 49 18 0 0 0 82
1 0 0 79824 7444 479172 0 0 0 5 54 51 18 0 0 0 82

Fetch data from columns using max over another column using Oracle analytic functions

I need to fetch the sum of the bal and res columns for each distinct account (acc), taken at that account's MAX(timestamp); look at this:
ID ACC BAL RES TIMESTAMP
--------------------------
1 100 70 0 1430238709
2 101 4 0 1430238710
3 102 0 0 1430238720
4 103 3 1 1430238721
5 100 22 1 1430238731
6 101 89 0 1430238732
7 102 101 1 1430238742
8 103 105 1 1430238753
9 100 106 0 1430238763
10 101 100 1 1430238774
11 102 1 1 1430238784
12 103 65 0 1430238795
What I need is, for MAX(timestamp) <= 1430238763, the sum bal + res grouped by acc, like this:
ACC TOT
-------
100 106
101 89
102 102
103 106
I know how to do it using subqueries, but I would like to try analytic functions.
Regards
How about:
Select * from (
    Select t.*, max(TIMESTAMP) over (partition by acc) mx
    from tab t
    where TIMESTAMP <= 1430238763
)
where mx = TIMESTAMP;
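If you want to avoid the inline view altogether, Oracle's KEEP (DENSE_RANK LAST) aggregate picks bal + res from the latest row per account in one pass (a sketch against the same assumed tab table):
SELECT acc,
       MAX(bal + res) KEEP (DENSE_RANK LAST ORDER BY timestamp) AS tot
FROM tab
WHERE timestamp <= 1430238763
GROUP BY acc;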
hth
Your query cannot be solved without a subquery:
SELECT acc, SUM(bal + res) AS tot
FROM table_name
WHERE (acc, timestamp) IN
      (SELECT acc, MAX(timestamp)
       FROM table_name
       WHERE timestamp <= 1430238763
       GROUP BY acc)
GROUP BY acc;
Regards.

Fastest way to find the sign of a difference of squares

Given an image I and two matrices m1, m2 (the same size as I), the function f is defined as:
f = sign((I - m1).^2 - (I - m2).^2)
Because my design only needs the sign of f, the function can be rewritten as:
f = +1 if abs(I - m1) >= abs(I - m2), and f = -1 otherwise
I think the second formula is faster than the first because:
- it avoids the square terms, and
- it computes the sign directly, instead of the two steps of the first equation (compute f, then check its sign).
Do you agree with me? Do you have an even faster formula for f?
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
The output f is
f =[1 1 1 1 1
1 1 -1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1]
I implemented the first way, but I did not finish the second way in MATLAB. Could you help me with the second way and compare it with the first?
UPDATE: I am adding the code of chepyle and Divakar to make the question clearer. Note that both give the same result as the f above.
function compare()
    I = [16 23 11 42 10
         11 21 22 24 30
         16 22 154 155 156
         25 28 145 151 156
         11 38 147 144 153];
    m1 = [0 0 0 0 0
          0 0 22 11 0
          0 23 34 56 0
          0 56 0 0 0
          0 11 0 0 0];
    m2 = [0 0 0 0 0
          0 0 12 11 0
          0 22 111 156 0
          0 32 0 0 0
          0 12 0 0 0];
    function f = first_way()
        f = sign((I-m1).^2 - (I-m2).^2);
        f(f==0) = 1;
    end
    function f = second_way()
        f = double(abs(I-m1) >= abs(I-m2));
        f(f==0) = -1;
    end
    function f = third_way()
        v1 = abs(I-m1);
        v2 = abs(I-m2);
        f = int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
        f(f==0) = 1;
    end
    disp(['First way : ' num2str(timeit(@first_way))])
    disp(['Second way: ' num2str(timeit(@second_way))])
    disp(['Third way : ' num2str(timeit(@third_way))])
end
First way : 1.2897e-05
Second way: 1.9381e-05
Third way : 2.0077e-05
This seems to be comparable and might be a wee bit faster at times than the original approach -
f = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
    sign(abs(2*I-m1-m2)) - 1 - sign(abs(2*I-m1-m2) + abs(m1-m2))
Benchmarking Code
%// Create random inputs
N = 5000;
I = randi(1000,N,N);
m1 = randi(1000,N,N);
m2 = randi(1000,N,N);
num_iter = 20; %// Number of iterations for all approaches

%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end

disp('------------------------- With Original Approach')
tic
for iter = 1:num_iter
    out1 = sign((I-m1).^2 - (I-m2).^2);
    out1(out1==0) = -1;
end
toc, clear out1

disp('------------------------- With Proposed Approach')
tic
for iter = 1:num_iter
    out2 = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
           sign(abs(2*I-m1-m2)) - 1 - sign(abs(2*I-m1-m2) + abs(m1-m2));
end
toc
Results
------------------------- With Original Approach
Elapsed time is 1.751966 seconds.
------------------------- With Proposed Approach
Elapsed time is 1.681263 seconds.
There is a problem with the accuracy of the second formula, but for the sake of comparison, here's how I would implement it in MATLAB, along with a third approach that avoids both the squaring and the sign() function, in line with your intent. Note that MATLAB's matrix and sign functions are pretty well optimized; the second and third approaches are both slower.
function compare()
    I = [16 23 11 42 10
         11 21 22 24 30
         16 22 154 155 156
         25 28 145 151 156
         11 38 147 144 153];
    m1 = [0 0 0 0 0
          0 0 22 11 0
          0 23 34 56 0
          0 56 0 0 0
          0 11 0 0 0];
    m2 = [0 0 0 0 0
          0 0 12 11 0
          0 22 111 156 0
          0 32 0 0 0
          0 12 0 0 0];
    function f = first_way()
        f = sign((I-m1).^2 - (I-m2).^2);
    end
    function f = second_way()
        v1 = (I-m1);
        v2 = (I-m2);
        f = int8(v1<=0 & v2>0) + -1*int8(v1>0 & v2<=0);
    end
    function f = third_way()
        v1 = abs(I-m1);
        v2 = abs(I-m2);
        f = int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
    end
    disp(['First way : ' num2str(timeit(@first_way))])
    disp(['Second way: ' num2str(timeit(@second_way))])
    disp(['Third way : ' num2str(timeit(@third_way))])
end
The output:
First way : 9.4226e-06
Second way: 1.2247e-05
Third way : 1.1546e-05
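Incidentally, if the ties-map-to-+1 convention of the expected output is what you want, the whole comparison collapses to one vectorized expression with no zero-fixup pass (a small sketch):
f = 2*(abs(I - m1) >= abs(I - m2)) - 1;   % +1 where abs(I-m1) >= abs(I-m2), -1 elsewhere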
