R Margins outcome non-compliant with excpected raw data - time

We've run an Interrupted Time Series analysis on some aggregate count data using a Poisson regression. Code shown below - where Subject Total is the count, Quarter is time, int2 is the dummy variable for the intervention [0 pre, 1 post] and time_since_intervention2 the dummy variable for time since intervention [0 pre, 1:N post].
fit1a <- glm(`Subject Total` ~ Quarter + int2 + time_since_intervention2 , df, family = "poisson")
Quarter Subject Total int2 time_since_intervention2 subjectfit subcounter
1 1 34 0 0 34.20968 34.20968
2 2 32 0 0 33.39850 33.39850
3 3 36 0 0 32.60656 32.60656
4 4 34 0 0 31.83339 31.83339
5 5 23 0 0 31.07856 31.07856
6 6 34 0 0 30.34163 30.34163
7 7 33 0 0 29.62217 29.62217
8 8 24 0 0 28.91977 28.91977
9 9 31 0 0 28.23402 28.23402
10 10 32 0 0 27.56454 27.56454
11 11 21 0 0 26.91093 26.91093
12 12 26 0 0 26.27282 26.27282
13 13 22 0 0 25.64984 25.64984
14 14 28 0 0 25.04163 25.04163
15 15 28 0 0 24.44784 24.44784
16 16 22 0 0 23.86814 23.86814
17 17 14 1 1 17.88365 23.30218
18 18 16 1 2 17.01622 22.74964
19 19 20 1 3 16.19087 22.21020
20 20 19 1 4 15.40556 21.68355
21 21 13 1 5 14.65833 21.16939
22 22 15 1 6 13.94735 20.66743
23 23 16 1 7 13.27085 20.17736
24 24 8 1 8 12.62717 19.69892
Due to the need to exponentiate the outcome the summary is currently being derived using the margins package.
> summary(margins(fit1a))
factor AME SE z p lower upper
int2 -5.7843 5.1734 -1.1181 0.2635 -15.9241 4.3555
Quarter -0.5809 0.2469 -2.3526 0.0186 -1.0649 -0.0970
time_since_intervention2 -0.6227 0.9955 -0.6255 0.5316 -2.5738 1.3285
If reading the outcome correctly it would suggest that the level change between the final quarter in the pre-intervention period and first in the post-intervention period is -5.7843.
I've tried inputting coefficient values into my model [initial intercept = 35.0405575], but they don't appear to correlate at all with the subjectfit data, which I believed it would. Should the level change reported by the margins package replicate the difference in the full data.....?

Related

Why does printing out local data mess with MPI_GATHERV results?

The following code needs 12 processors to run properly.
program GameOfLife
use mpi
implicit none
integer ierr, myid, numprocs
integer send, recv, count, tag
parameter (tag=111)
integer recv_buff, send_buff, request
integer stat(MPI_STATUS_SIZE)
integer N, m, i, j, sum
parameter (N=3) !# of squares per a processors
integer W, H
parameter (W=4,H=3) !# of processors up and across
integer A(N,N), buff(0:N+1,0:N+1), G(N*H, N*W)
! real r
integer sizes(2), subsizes(2), starts(2), recvcount(N*N)
integer newtype, intsize, resizedtype
integer(kind=MPI_ADDRESS_KIND) extent, begin
integer disp(W*H)
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
! fill up subgrid
do i = 1, N
do j = 1, N
! call random_number(r)
A(i,j) = myid! floor(r*2)
end do
end do
do i = 1, N
print *, A(i, :)
end do
starts = [0,0]
sizes = [N*H, N*W]
subsizes = [N, N]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = intsize*N
begin = 0
call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
disp = [0, 1, 2, 9, 10, 11, 18, 19, 20, 27, 28, 29]
recvcount = 1
call MPI_GATHERV(A,N*N,MPI_INTEGER,G,recvcount,disp,resizedtype,0,MPI_COMM_WORLD,ierr)
call MPI_WAIT(request, stat, ierr)
if ( myid == 0 ) then
do i = 1, N*H
print *, G(i,:)
end do
endif
call MPI_FINALIZE(ierr)
end program GameOfLife
When ran without printing out the matrix A, everything works mostly okay. But If I try to print out A before feeding it to the gather statement, I get a jumbled mess.
What's going on here? I assume memory is trying to be accessed at the same time or something along those lines.
Output of G looks like
0 0 0 4 4 4 0 -1302241474 1 13 13 13
0 0 0 4 4 4 0 0 0 13 13 13
0 0 0 4 4 4 -10349344 -12542198 -10350200 13 13 13
1 1 1 5 5 5 59 59 59 14 14 14
1 1 1 5 5 5 -1342953792 0 0 14 14 14
1 1 1 5 5 5 32767 0 0 14 14 14
2 2 2 6 6 6 -1342953752 1451441280 0 15 15 15
2 2 2 6 6 6 32767 10985 0 15 15 15
2 2 2 6 6 6 -10350200 1 0 15 15 15
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0

How do I rearrange elements in a 1D matrix array in-place?

I want to align the memory of a 5x5 matrix represented as an one-dimensional array.
The original array looks like this:
let mut a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
after resizing the memory to memory aligned bounds (power of 2), the array will look like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
the len of a is now 64 elements.
so it will become an 8x8 matrix
the goal is to have following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The background is to have a memory aligned to a power of two, so calculations can be partially done in parallel ( for OpenCL float4, or the available vector sizes.). I also do not want to use a new array to simply insert the old elements at the correct positions to keep memory consumption low.
At first, I thought about swapping the elements at the range, where there should be a zero with the elements at the end of the array, keeping a pointer to the elements and simulating a queue, but elements would stack up towards the end, and I didn't came up with a working solution.
My language of choice is rust. Is there any smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Now taking all this information we can come up with the formula to get the new index Y for the old index X:
Y = (X / N) * M + (X % N)
So you can just make a loop from N^2 - 1 to N and copy the element to the new position calculated with the formula and set its original position to 0. (Everything is 0-based, I hope rust is 0-based as well or you will have to add some +1.)
According to maraca's solution, the code would look like this:
fn zeropad<T: Copy>(
first: T,
data: &mut Vec<T>,
dims: (usize, usize),
) -> (usize, usize) {
let r = next_pow2(dims.0);
let c = next_pow2(dims.1);
if (r, c) == (dims.0, dims.1) {
return (r, c);
}
let new_len = r * c;
let old_len = data.len();
let old_col = dims.1;
// resize
data.resize(new_len, first);
for i in (old_col..old_len).rev() {
let row: usize = i / c;
let col: usize = i % c;
// bigger matrix
let pos_old = row * c + col;
// smaller matrix
let pos_new = (i / dims.1) * c + (i % dims.1);
data[pos_new] = data[pos_old];
data[pos_old] = first;
}
return (r, c);
}

EC2 high stolen time without load

I can see very high % of stolen time on a EC2 web server (t2.micro) without any load (one current user) with a high page load time. Is there a correlation between hight load time and hight stolen time? I have the same symptoms with another server from class t2.medium
Do you have an explanation?
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 79824 7428 479172 0 0 0 0 52 49 18 0 0 0 82
1 0 0 79792 7436 479172 0 0 0 6 54 49 18 0 0 0 82
1 0 0 79824 7444 479172 0 0 0 5 54 51 18 0 0 0 82

Fastest way to find the sign of different square

Given an image I and two matrices m_1 ;m_2 (same size with I). The function f is defined as:
Because my goal design wants to get the sign of f . Hence, the function f can rewritten as following:
I think that second formula is faster than first formula because: It
can ignore the square term
It can compute the sign directly, instead of two steps in first equation: compute the f and check sign.
Do you agree with me? Do you have another faster formula for f
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
The ouput f is
f =[1 1 1 1 1
1 1 -1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1]
I implemented the first way, but I did not finish the second way by matlab. Could you check help me the second way and compare it
UPDATE: I would like to add code of chepyle and Divakar to make clearly question. Note that both of them give the same result as above f
function compare()
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
function f=first_way()
f=sign((I-m1).^2-(I-m2).^2);
f(f==0)=1;
end
function f= second_way()
f = double(abs(I-m1) >= abs(I-m2));
f(f==0) = -1;
end
function f= third_way()
v1=abs(I-m1);
v2=abs(I-m2);
f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
f(f==0) = 1;
end
disp(['First way : ' num2str(timeit(#first_way))])
disp(['Second way: ' num2str(timeit(#second_way))])
disp(['Third way : ' num2str(timeit(#third_way))])
end
First way : 1.2897e-05
Second way: 1.9381e-05
Third way : 2.0077e-05
This seems to be comparable and might be a wee bit faster at times than the original approach -
f = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
sign(abs(2*I-m1-m2)) - 1 -sign(abs(2*I-m1-m2) + abs(m1-m2))
Benchmarking Code
%// Create random inputs
N = 5000;
I = randi(1000,N,N);
m1 = randi(1000,N,N);
m2 = randi(1000,N,N);
num_iter = 20; %// Number of iterations for all approaches
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------- With Original Approach')
tic
for iter = 1:num_iter
out1 = sign((I-m1).^2-(I-m2).^2);
out1(out1==0)=-1;
end
toc, clear out1
disp('------------------------- With Proposed Approach')
tic
for iter = 1:num_iter
out2 = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
sign(abs(2*I-m1-m2)) - 1 -sign(abs(2*I-m1-m2) + abs(m1-m2));
end
toc
Results
------------------------- With Original Approach
Elapsed time is 1.751966 seconds.
------------------------- With Proposed Approach
Elapsed time is 1.681263 seconds.
There is a problem with the accuracy of second formula, but for the sake of comparison, here's how I would implement it in matlab, along with a third approach to avoid squaring and the sign() function, inline with your intent. Note that the matlab's matrix and sign functions are pretty well optimized, the second and third approaches are both slower.
function compare()
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
function f=first_way()
f=sign((I-m1).^2-(I-m2).^2);
end
function f= second_way()
v1=(I-m1);
v2=(I-m2);
f= int8(v1<=0 & v2>0) + -1* int8(v1>0 & v2<=0);
end
function f= third_way()
v1=abs(I-m1);
v2=abs(I-m2);
f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
end
disp(['First way : ' num2str(timeit(#first_way))])
disp(['Second way: ' num2str(timeit(#second_way))])
disp(['Third way : ' num2str(timeit(#third_way))])
end
The output:
First way : 9.4226e-06
Second way: 1.2247e-05
Third way : 1.1546e-05

Group By isn't grouping properly

I'm working with oracle and it's group by clause seems to behave very differently than I'd expect.
When using this query:
SELECT stats.gds_id,
stats.stat_date,
SUM(stats.A_BOOKINGS_NBR) as "Bookings",
SUM(stats.RESPONSES_LESS_1_NBR) as "<1",
SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
SUM(STATS.RESPONSES_LESS_3_NBR) AS "<3",
SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) as "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, stats.stat_date
I get results like this:
GDS_ID STAT_DATE Bookings <1 <2 <3 <4 <5 >5 <6 <7 >7 Total
02 12-JUN-11 0 1 0 0 0 0 0 0 0 0 1
1A 01-JUN-11 15 831 52 6 2 2 4 1 1 2 897
1A 01-JUN-11 15 758 59 8 1 1 5 2 1 2 832
1A 01-JUN-11 10 593 40 2 2 1 2 1 0 1 640
1A 01-JUN-11 12 678 40 10 5 2 3 1 0 2 738
1A 01-JUN-11 24 612 56 6 1 3 4 0 0 4 682
1A 01-JUN-11 23 552 37 7 1 1 2 0 1 1 600
1A 01-JUN-11 35 1147 132 13 6 0 8 0 2 6 1306
1A 01-JUN-11 91 2331 114 14 5 1 14 3 1 10 2479
As you can see, I have multiple duplicate STAT_DATE's per GDS_ID. Why is that, and how can I make it group by both of those? I.E. Sum the values for each GDS_ID per STAT_DATE.
Probably because STAT_DATE has a time component, which is being taken into account in the GROUP BY but not being displayed in the results due to the default format mask. To ignore the time, do this:
SELECT stats.gds_id,
TRUNC(stats.stat_date) stat_date,
SUM(stats.A_BOOKINGS_NBR) as "Bookings",
SUM(stats.RESPONSES_LESS_1_NBR) as "<1",
SUM(stats.RESPONSES_LESS_2_NBR) AS "<2",
SUM(STATS.RESPONSES_LESS_3_NBR) AS "<3",
SUM(stats.RESPONSES_LESS_4_NBR) AS "<4",
SUM(stats.RESPONSES_LESS_5_NBR) AS "<5",
SUM(stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) AS ">5",
SUM(stats.RESPONSES_LESS_6_NBR) AS "<6",
SUM(stats.RESPONSES_LESS_7_NBR) AS "<7",
SUM(stats.RESPONSES_GREATER_7_NBR) AS ">7",
SUM(stats.RESPONSES_LESS_1_NBR + stats.RESPONSES_LESS_2_NBR + stats.RESPONSES_LESS_3_NBR + stats.RESPONSES_LESS_4_NBR + stats.RESPONSES_LESS_5_NBR + stats.RESPONSES_LESS_6_NBR + stats.RESPONSES_LESS_7_NBR + stats.RESPONSES_GREATER_7_NBR) as "Total"
FROM gwydb.statistics stats
WHERE stats.stat_date >= '01-JUN-2011'
GROUP BY stats.gds_id, TRUNC(stats.stat_date)

Resources