Vertical white lines when plotting heatmap in TIFF image

When I plot a matrix to a TIFF file with the image function, I often get vertical or horizontal white lines.
My matrix has 150,000 rows x 2,000 columns; the lines also appear when plotting matrices of 150,000 rows x 100 columns, with the same results.
Where do the lines come from? Is this some sort of pixelation artifact? I get them almost all the time.
The matrix looks like this:
     V999 V1000 V1001 V1002 V1003 V1004 V1005 V1006 V1007 V1008 V1009 V1010
[1,]    1     4     0     0    15    15    15    15     8     0     1     0
[2,]    0     3    12     5    15    15    15     1    15     4     0     2
[3,]    0     0     0     3     6    15    15    15    15    15     0     3
[4,]    3     6    15    15    15    15    15     0     3    15    15     2
[5,]   15    15    15     0     3    15    15     2     1     5     8    11
[6,]    2     1     5     8    11    15    15    15     0     0     4     3
tiff("test.tiff", width=450, height=1100)
image(t(mc), col = col1, main="950-1500"
dev.off()
Any hints/comments will be much appreciated.

You're seeing an aliasing artifact: the device has far fewer pixels than your matrix has rows, so some cells end up one pixel wider or taller than others. On an interactive device such as x11() you can try dragging the window bigger or smaller until you find a window height and width compatible with your desired resolution; for a file device, pick dimensions that divide evenly into your matrix.
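A minimal sketch of that idea for the file device (assuming the mc and col1 objects from the question; useRaster is an optional argument of image() that often renders large regular grids more cleanly): pick a height that divides the number of matrix rows evenly, so every pixel row covers the same number of matrix cells.
h <- 1000  # 150000 rows / 1000 px = 150 matrix rows per pixel row
stopifnot(nrow(mc) %% h == 0)
tiff("test.tiff", width = 450, height = h)
image(t(mc), col = col1, main = "950-1500", useRaster = TRUE)
dev.off()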

Related

Amazon QuickSight - Running Difference

I have the following table and want to add a column with the running difference.

time_gap  amounts
0         150
0.5       19
1.5       2
6         10
7         4
My desired output is:

time_gap  amounts  diff
0         150      150
0.5       19       131
1.5       2        129
6         10       119
7         4        115
What I've tried:
I duplicated the amounts column and used the "difference" table calculation, but got the difference between two consecutive rows instead:

time_gap  amounts  diff
0         150
0.5       19       -131
1.5       2        -17
6         10       8
7         4        -6

I also tried some calculated-field formulas, but that didn't work either.
Thank you!
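One possible approach (a sketch only - I haven't verified this in QuickSight, and the exact argument lists of firstValue and runningSum should be checked in the docs): the desired column is the first amount minus the running sum of all later amounts, which is the same as twice the first value minus the running sum. As a calculated field:
diff = 2 * firstValue(sum({amounts}), [{time_gap} ASC]) - runningSum(sum({amounts}), [{time_gap} ASC])
Checking row 2: 2*150 - (150+19) = 131, which matches the desired output.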

Degrees of Freedom in SAS Proc MIXED

Below is SAS PROC MIXED code generated by JMP. Both JMP and SAS give me a confidence interval for the variance components (in the "Covariance Parameter Estimates" table). I would like the degrees of freedom in the output, or at least a formula to calculate them. Can anyone tell me where to find that?
DATA Untitled;
INPUT x y Batch $;
DATALINES;
0 107.2109269 4
3 106.3777088 4
6 103.8625117 4
9 103.7524023 4
12 101.6895595 4
18 104.4145268 4
24 100.6606813 4
0 107.6603635 5
3 105.9161433 5
6 106.0260339 5
9 104.660272 5
12 102.5820776 5
18 103.7961511 5
24 101.2887124 5
0 109.2066284 6
3 106.9341754 6
6 106.6141445 6
9 106.8234541 6
12 104.7778902 6
18 106.0184734 6
24 102.9822743 6
;
RUN;
PROC MIXED ASYCOV NOBOUND DATA=Untitled ALPHA=0.05;
CLASS Batch;
MODEL y = x/ SOLUTION DDFM=KENWARDROGER;
RANDOM Batch / SOLUTION ;
RUN;
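PROC MIXED does not print degrees of freedom for covariance parameters directly, but as a sketch (using the common Satterthwaite-type approximation, not a JMP-specific formula): the COVTEST option adds standard errors and Wald Z statistics for the covariance parameters, and CL requests the corresponding confidence limits. The implied degrees of freedom for a variance component can then be approximated as df = 2*Z**2 = 2*(estimate/stderr)**2.
PROC MIXED COVTEST CL ASYCOV NOBOUND DATA=Untitled ALPHA=0.05;
CLASS Batch;
MODEL y = x / SOLUTION DDFM=KENWARDROGER;
RANDOM Batch / SOLUTION;
RUN;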

plotting multiple graphs and animation from a data file in gnuplot

Suppose I have the following sample data file.
0 1 2
0 3 4
0 1 9
0 9 2
0 19 0
0 6 1
0 11 0
1 3 2
1 3 4
1 1 6
1 9 2
1 15 0
1 6 6
1 11 1
2 3 2
2 4 4
2 1 6
2 9 6
2 15 0
2 6 6
2 11 1
The first column gives the time, the second the values of x, and the third y. I wish to plot graphs of y as a function of x from this data file at the different times,
i.e., for t=0 I plot using 2:3 with lines over the t=0 rows only; then I do the same for the variables at t=1.
In the end I want a GIF, i.e., an animation of how the y vs. x graph changes shape as time goes on. How can I do this in gnuplot?
What have you tried so far? (Check help ternary and help gif)
You need to filter your data with the ternary operator and then create the animation.
Code:
### plot filtered data and animate
reset session
$Data <<EOD
0 1 2
0 3 4
0 1 9
0 9 2
0 19 0
0 6 1
0 11 0
1 3 2
1 3 4
1 1 6
1 9 2
1 15 0
1 6 6
1 11 1
2 3 2
2 4 4
2 1 6
2 9 6
2 15 0
2 6 6
2 11 1
EOD
set terminal gif animate delay 50 optimize
set output "myAnimation.gif"
set xrange[0:20]
set yrange[0:10]
do for [i=0:2] {
plot $Data u 2:($1==i?$3:NaN) w lp pt 7 ti sprintf("Time: %g",i)
}
set output
### end of code
Result: (animated GIF, not reproduced here)
Addition:
The meaning of $1==i?$3:NaN in words:
if the value in the first column is equal to i, then the result is the value in the third column; otherwise it is NaN ("Not a Number"), and NaN points are simply not drawn.
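To check the filter on a single frame without the animation, a quick sketch using the same $Data block:
plot $Data u 2:($1==1 ? $3 : NaN) w lp pt 7 ti "Time: 1"
Only the rows whose first column equals 1 contribute visible points.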

Speed up code to compare fields in a struct

I have the struct Trajectories with fields uniqueDate, dateAll, and label. I want to compare the fields uniqueDate and dateAll and, where they correspond, save into label a value from another struct, s.
I have written this code:
for k=1:nCols
    for j=1:size(Trajectories(1,k).dateAll,1)
        for i=1:size(Trajectories(1,k).uniqueDate,1)
            if (~isempty(s(1,k).places)) && (Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1)) && (Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2)) && (Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))
                for z=1:24
                    if (Trajectories(1,k).dateAll(j,4)==z) && (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1))
                        Trajectories(1,k).label(j)=s(1,k).places.all(z,i);
                    elseif (Trajectories(1,k).dateAll(j,4)==z) && (size(s(1,k).places.all,2)<size(Trajectories(1,k).uniqueDate,1))
                        for l=1:size(s(1,k).places.all,2)
                            Trajectories(1,k).label(l)=s(1,k).places.all(z,l);
                        end
                    end
                end
            end
        end
    end
end
E.g.:
Trajectories(1,4).dateAll=[1 2004 8 1 14 1 15 0 0 0 1 42 13 2;596 2004 8 1 16 20 14 0 0 0 1 29 12 NaN;674 2004 8 1 18 26 11 0 0 0 1 20 38 1;674 2004 8 2 10 7 40 0 0 0 14 26 5 3;674 2004 8 2 11 3 29 0 0 0 1 54 3 3;631 2004 8 2 11 57 56 0 0 0 0 30 8 2;1 2004 8 2 12 4 35 0 0 0 1 53 21 2;631 2004 8 2 12 52 58 0 0 0 0 20 36 2;631 2004 8 2 13 5 3 0 0 0 1 49 40 2;631 2004 8 2 14 0 20 0 0 0 1 56 12 2;631 2004 8 2 15 2 0 0 0 0 1 57 39 2;631 2004 8 2 16 1 4 0 0 0 1 55 53 2;1 2004 8 2 17 9 15 0 0 0 1 48 41 2];
Trajectories(1,4).uniqueDate= [2004 8 1;2004 8 2;2004 8 3;2004 8 4];
It runs, but it's very, very slow. How can I modify it to speed it up?
Let's work from the inside out and see where it gets us.
Step 1: Simplify your comparison condition:
if (~isempty(s(1,k).places))&&(Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1))&&(Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2))&&(Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))
becomes
if (~isempty(s(1,k).places)) && all( Trajectories(1,k).dateAll(j,1:3)==Trajectories(1,k).uniqueDate(i,1:3) )
Then we want to remove this from a for-loop. The "intersect" function is useful here:
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
We now have a vector i1 of the indices of all rows in dateAll that intersect with uniqueDate.
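As a toy illustration of intersect with the 'rows' option (made-up values, not the question's data):
A = [2004 8 1; 2004 8 2; 2004 8 5];
B = [2004 8 2; 2004 8 3];
[common, iA, iB] = intersect(A, B, 'rows')
% common = [2004 8 2], iA = 2, iB = 1,
% i.e. row 2 of A matches row 1 of B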
Now we can remove the loop comparing z using a similar approach:
[iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);
We have to be careful about our indices here, using a subset of a subset.
This simplifies the code to:
for k=1:nCols
if isempty(s(1,k).places)
continue; % skip to the next value of k, no need to do the rest of the comparison
end
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
[iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);
usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1));
if (usescalarlabel)
Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,i2(iz1));
else
% you will need to check this: I think here you were needlessly repeating this step for every match
Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,:);
end
end
But wait! That z loop is exactly the same as using indexing. So we don't need that second intersect after all:
for k=1:nCols
if isempty(s(1,k).places)
continue; % skip to the next value of k, no need to do the rest of the comparison
end
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1));
label_indices = Trajectories(1,k).dateAll(i1,4);
if (usescalarlabel)
Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,i2);
else
% you will need to check this: I think here you were needlessly repeating this step for every match
Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,:);
end
end
You'll need to check the indexing in this - I'm sure I've made a mistake somewhere without having data to test against - but that should give you an idea of how to proceed: remove the loops and use vector expressions instead. Without seeing the data, that's as far as I can optimise. You may be able to go further if you can reformat your data into a set of 3D matrices / cells instead of using structs.
I am suspicious of your condition, which I have called "usescalarlabel" - it seems like you are mixing two data types. I would also strongly recommend separating the dateAll matrices into separate "date" and "data" matrices, since columns 4 onwards don't seem to be dates. Also, the example you copy/pasted seems to have an extra value in column 1; in that case you'll need to compare Trajectories(1,k).dateAll(:,2:4) instead of Trajectories(1,k).dateAll(:,1:3).
Good luck.

What is the worst case scenario for quicksort?

When does the quicksort algorithm take O(n^2) time?
Quicksort works by taking a pivot, then putting all the elements lower than that pivot on one side and all the higher elements on the other; it then recursively sorts the two subgroups in the same way, all the way down until everything is sorted. If you pick the worst pivot each time (the highest or lowest element in the list), you'll only have one group to sort, containing everything except the original pivot. This in essence gives you n groups that each need to be iterated through n times, hence the O(n^2) complexity.
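In recurrence form (a short derivation of that claim): each call does linear partitioning work but only removes the pivot from the problem, so

\[ T(n) = T(n-1) + cn, \quad T(1) = c \;\Longrightarrow\; T(n) = c\sum_{k=1}^{n} k = \frac{c\,n(n+1)}{2} = O(n^2). \]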
The most common reason for this occurring is that the pivot is chosen to be the first or last element in the list. For unsorted lists this is as valid a choice as any other; however, for sorted or nearly sorted lists (which occur quite commonly in practice) it is very likely to give you the worst-case scenario. This is why all half-decent implementations tend to take a pivot from the centre of the list.
There are modifications to the standard quicksort algorithm to avoid this edge case - one example is the dual-pivot quicksort that was integrated into Java 7.
In short, Quicksort for sorting an array in ascending order works like this:
Choose a pivot element
Partition the array such that all elements smaller than the pivot are on the left side
Recursively do steps 1 and 2 for the left side and the right side
Ideally, you would want a pivot element that partitions the sequence into two equally long subsequences, but this is not so easy.
There are different schemes for choosing the pivot element. Early versions just took the leftmost element. In the worst case, the pivot element will always be the lowest element of the current range.
Leftmost element is pivot
In this case it is easy to see that the worst case is a monotonically increasing array:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Rightmost element is pivot
Similarly, when choosing the rightmost element the worst case will be a decreasing sequence.
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Center element is pivot
One possible remedy for the worst case of presorted arrays is to use the center element (or slightly left of center if the sequence is of even length). Then the worst case is rather more exotic. It can be constructed by modifying the Quicksort algorithm to set the array elements corresponding to the currently selected pivot element to a monotonically increasing value. I.e. we know the first pivot is the center, so the center must be the lowest value, e.g. 0. Next it gets swapped to the leftmost position, i.e. the leftmost value is now in the center and would be the next pivot element, so it must be 1. Now we can already guess that the array would look like this:
1 ? ? 0 ? ? ?
Here is the C++ code for the modified Quicksort to generate a worst sequence:
// g++ -std=c++11 worstCaseQuicksort.cpp && ./a.out
#include <algorithm> // swap
#include <iostream>
#include <vector>
#include <numeric> // iota
int main( void )
{
std::vector<int> v(20); /**< will hold the worst case later */
/* p basically saves the indices of what was the initial position of the
* elements of v. As they get swapped around by Quicksort p becomes a
* permutation */
auto p = v;
std::iota( p.begin(), p.end(), 0 );
/* in the worst case we need to work on v.size() subsequences, because
 * the initial sequence is always split after the first element */
for ( auto i = 0u; i < v.size(); ++i )
{
/* i can be interpreted as:
* - subsequence starting index
* - current minimum value, if we start at 0 */
/* note that in the last step iPivot == v.size()-1 */
auto const iPivot = ( v.size()-1 + i )/2;
v[ p[ iPivot ] ] = i;
std::swap( p[ iPivot ], p[i] );
}
for ( auto x : v ) std::cout << " " << x;
}
The result:
0
0 1
1 0 2
2 0 1 3
1 3 0 2 4
4 2 0 1 3 5
1 5 3 0 2 4 6
4 2 6 0 1 3 5 7
1 5 3 7 0 2 4 6 8
8 2 6 4 0 1 3 5 7 9
1 9 3 7 5 0 2 4 6 8 10
6 2 10 4 8 0 1 3 5 7 9 11
1 7 3 11 5 9 0 2 4 6 8 10 12
10 2 8 4 12 6 0 1 3 5 7 9 11 13
1 11 3 9 5 13 7 0 2 4 6 8 10 12 14
8 2 12 4 10 6 14 0 1 3 5 7 9 11 13 15
1 9 3 13 5 11 7 15 0 2 4 6 8 10 12 14 16
16 2 10 4 14 6 12 8 0 1 3 5 7 9 11 13 15 17
1 17 3 11 5 15 7 13 9 0 2 4 6 8 10 12 14 16 18
10 2 18 4 12 6 16 8 14 0 1 3 5 7 9 11 13 15 17 19
1 11 3 19 5 13 7 17 9 15 0 2 4 6 8 10 12 14 16 18 20
16 2 12 4 20 6 14 8 18 10 0 1 3 5 7 9 11 13 15 17 19 21
1 17 3 13 5 21 7 15 9 19 11 0 2 4 6 8 10 12 14 16 18 20 22
12 2 18 4 14 6 22 8 16 10 20 0 1 3 5 7 9 11 13 15 17 19 21 23
1 13 3 19 5 15 7 23 9 17 11 21 0 2 4 6 8 10 12 14 16 18 20 22 24
There is order in this. The right side is just increments of two starting with zero. The left side also has an order. Let's nicely format the left side of the 73-element-long worst-case sequence using ASCII art:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
------------------------------------------------------------------------------------------------------------
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
37 39 41 43 45 47 49 51 53
55 57 59 61 63
65 67
69
71
The header row is the element index. In the first row, numbers starting from 1 and increasing by 2 are assigned to every 2nd element; in the second row the same is done to every 4th element; in the 3rd row numbers are assigned to every 8th element, and so on. In this case the first value to be written in the i-th row is at index 2^i-1, but for certain lengths this looks a tad different.
The resulting structure is reminiscent of an inverted binary tree whose nodes are labeled bottom-up starting from the leaves.
Median of leftmost, center and rightmost elements is pivot
Another way is to use the median of the leftmost, center, and rightmost elements. In this case the worst case can only make the (w.l.o.g.) left subsequence have length 2, not just length 1 like in the examples above. We also assume that the rightmost value will always be the highest of the median-of-three, which means it is the highest of all values. Making adjustments in the program above, we now have this:
auto p = v;
std::iota( p.begin(), p.end(), 0 );
auto i = 0u;
for ( ; i < v.size(); i+=2 )
{
auto const iPivot0 = i;
auto const iPivot1 = ( i + v.size()-1 )/2;
v[ p[ iPivot1 ] ] = i+1;
v[ p[ iPivot0 ] ] = i;
std::swap( p[ iPivot1 ], p[i+1] );
}
if ( v.size() > 0 && i == v.size() )
v[ v.size()-1 ] = i-1;
The generated sequences are:
0
0 1
0 1 2
0 1 2 3
0 2 1 3 4
0 2 1 3 4 5
0 4 2 1 3 5 6
0 4 2 1 3 5 6 7
0 4 2 6 1 3 5 7 8
0 4 2 6 1 3 5 7 8 9
0 8 2 6 4 1 3 5 7 9 10
0 8 2 6 4 1 3 5 7 9 10 11
0 6 2 10 4 8 1 3 5 7 9 11 12
0 6 2 10 4 8 1 3 5 7 9 11 12 13
0 10 2 8 4 12 6 1 3 5 7 9 11 13 14
0 10 2 8 4 12 6 1 3 5 7 9 11 13 14 15
0 8 2 12 4 10 6 14 1 3 5 7 9 11 13 15 16
0 8 2 12 4 10 6 14 1 3 5 7 9 11 13 15 16 17
0 16 2 10 4 14 6 12 8 1 3 5 7 9 11 13 15 17 18
0 16 2 10 4 14 6 12 8 1 3 5 7 9 11 13 15 17 18 19
0 10 2 18 4 12 6 16 8 14 1 3 5 7 9 11 13 15 17 19 20
0 10 2 18 4 12 6 16 8 14 1 3 5 7 9 11 13 15 17 19 20 21
0 16 2 12 4 20 6 14 8 18 10 1 3 5 7 9 11 13 15 17 19 21 22
0 16 2 12 4 20 6 14 8 18 10 1 3 5 7 9 11 13 15 17 19 21 22 23
0 12 2 18 4 14 6 22 8 16 10 20 1 3 5 7 9 11 13 15 17 19 21 23 24
Pseudorandom element with random seed 0 is pivot
The worst-case sequences for the center element and median-of-three already look pretty random, but in order to make Quicksort even more robust, the pivot element can be chosen randomly. If the random sequence used is at least reproducible on every Quicksort run, then we can also construct a worst-case sequence for it. We only have to adjust the iPivot = line in the first program, e.g. to:
srand(0); // you shouldn't use 0 as a seed
for ( auto i = 0u; i < v.size(); ++i )
{
auto const iPivot = i + rand() % ( v.size() - i );
[...]
The generated sequences are:
0
1 0
1 0 2
2 3 1 0
1 4 2 0 3
5 0 1 2 3 4
6 0 5 4 2 1 3
7 2 4 3 6 1 5 0
4 0 3 6 2 8 7 1 5
2 3 6 0 8 5 9 7 1 4
3 6 2 5 7 4 0 1 8 10 9
8 11 7 6 10 4 9 0 5 2 3 1
0 12 3 10 6 8 11 7 2 4 9 1 5
9 0 8 10 11 3 12 4 6 7 1 2 5 13
2 4 14 5 9 1 12 6 13 8 3 7 10 0 11
3 15 1 13 5 8 9 0 10 4 7 2 6 11 12 14
11 16 8 9 10 4 6 1 3 7 0 12 5 14 2 15 13
6 0 15 7 11 4 5 14 13 17 9 2 10 3 12 16 1 8
8 14 0 12 18 13 3 7 5 17 9 2 4 15 11 10 16 1 6
3 6 16 0 11 4 15 9 13 19 7 2 10 17 12 5 1 8 18 14
6 0 14 9 15 2 8 1 11 7 3 19 18 16 20 17 13 12 10 4 5
14 16 7 9 8 1 3 21 5 4 12 17 10 19 18 15 6 0 11 2 13 20
1 2 22 11 16 9 10 14 12 6 17 0 5 20 4 21 19 8 3 7 18 15 13
22 1 15 18 8 19 13 0 14 23 9 12 10 5 11 21 6 4 17 2 16 7 3 20
2 19 17 6 10 13 11 8 0 16 12 22 4 18 15 20 3 24 21 7 5 14 9 1 23
So how can one check whether those sequences are correct?
Measure the time the sort takes for each sequence and plot it over the sequence length N. If the curve scales as O(N^2) instead of O(N log(N)), then these are indeed worst-case sequences (see the sketch after this list).
Adjust a correct Quicksort to print debug output about the subsequence lengths and/or the chosen pivot elements. One of the subsequences should always be of length 1 (or 2 for median-of-three), and the chosen pivot elements printed should be increasing.
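A minimal self-contained sketch of the first check (my own, not one of the programs above; it uses the leftmost-pivot scheme on an already sorted input, which we know from above is a worst case): doubling N should roughly quadruple the measured time.
// g++ -std=c++11 timeWorstCase.cpp && ./a.out
#include <chrono>
#include <iostream>
#include <numeric>   // iota
#include <utility>   // swap
#include <vector>

/* Lomuto-style quicksort that always picks the leftmost element as pivot */
void quicksort( std::vector<int> & v, int lo, int hi )
{
    if ( lo >= hi ) return;
    int const pivot = v[lo];
    int i = lo;
    for ( int j = lo + 1; j <= hi; ++j )
        if ( v[j] < pivot ) std::swap( v[++i], v[j] );
    std::swap( v[lo], v[i] );
    quicksort( v, lo, i - 1 );
    quicksort( v, i + 1, hi );
}

int main( void )
{
    for ( int n = 1000; n <= 8000; n *= 2 )
    {
        std::vector<int> v( n );
        std::iota( v.begin(), v.end(), 0 );  // sorted input = worst case here
        auto const t0 = std::chrono::steady_clock::now();
        quicksort( v, 0, n - 1 );
        auto const t1 = std::chrono::steady_clock::now();
        std::cout << n << " "
                  << std::chrono::duration<double>( t1 - t0 ).count() << " s\n";
    }
    /* the time ratio between successive lines should approach 4 */
}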
Getting a pivot equal to the lowest or highest number should also trigger the worst-case scenario of O(n^2).
Different implementations of quicksort need different datasets to hit their worst-case runtime. It depends on where the algorithm selects its pivot element.
And also, as Ghpst said, selecting the biggest or smallest number would give you a worst case.
If I remember correctly, quicksort normally uses a random element as pivot to minimize the chance of getting a worst case; a sketch of that choice follows.
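For illustration only (a sketch, not taken from any specific implementation), picking a uniformly random pivot index could look like this:
#include <iostream>
#include <random>

/* pick a uniformly random pivot index in the inclusive range [lo, hi] */
int choosePivotIndex( int lo, int hi )
{
    static std::mt19937 gen{ std::random_device{}() };
    return std::uniform_int_distribution<int>( lo, hi )( gen );
}

int main( void )
{
    std::cout << choosePivotIndex( 0, 19 ) << std::endl;
}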
I think if the array is in reverse order, then the worst case occurs when the pivot is the last element of that array.
The factors that contribute to the worst-case scenario of quicksort are as follows:
The worst case occurs when the subarrays are completely unbalanced,
i.e. when there are 0 elements in one subarray and n-1 elements in the other.
In other words, the worst-case running time of quicksort occurs when it takes in an already sorted array (in decreasing order), putting it at a time complexity of O(n^2).
