I am trying to filter my data list using D3. What I want to do is filter my data based on a date I specify and a threshold value for precipitation.
Here is my code:
$(function() {
    $("#datepicker").datepicker();
    $("#datepicker").on("change", function() {
        //var currentDate = $( "#datepicker" ).datepicker( "getDate" )/1000;
        //console.log(currentDate)
    });
});
function GenerateReport() {
    d3.csv("/DataTest.csv", function(data) {
        var startdate = $( "#datepicker" ).datepicker( "getDate" )/1000;
        var enddate = startdate + 24*60*60;
        var data_Date = d3.values(data.filter(function(d) { return d["Date"] >= startdate && d["Date"] <= enddate; }));
        var x = document.getElementById("threshold").value;
        console.log(data_Date);
        var data_Date_Threshold = data_Date.filter(function(d) { return d.Precipitation > x; });
    });
}
My data set looks like this:
ID Date Prcip Flow Stage
1010 1522281000 0 0 0
1010 1522281600 0 0 0
1010 1522285200 10 0 0
1010 1522303200 12 200 1.2
1010 1522364400 6 300 2
1010 1522371600 4 400 2.5
1010 1522364400 6 500 2.8
1010 1522371600 4 600 3.5
2120 1522281000 0 0 0
2120 1522281600 0 0 0
2120 1522285200 10 100 1
2120 1522303200 12 1000 2
2120 1522364400 6 2000 3
2120 1522371600 4 2500 3.2
2290 1522281000 0 0 0
2290 1522281600 4 0 0
2290 1522285200 5 200 1
2290 1522303200 10 800 1.5
2290 1522364400 6 1500 3
2290 1522371600 0 1000 2
6440 1522281000 0 0 0
6440 1522281600 4 0 0
6440 1522285200 5 200 0.5
6440 1522303200 10 800 1
6440 1522364400 6 1500 2
6440 1522371600 0 100 1.4
When I use the filter function, I have some problems.
What I have found is that when I use x = 2 to filter the precipitation values, it does not catch precipitation = 10 or 12. However, when I use x = 1, it works fine. My guess is that the comparison only looks at the first character (e.g., with x = 2 it treats precipitation = 10 or 12 as less than 2, because it only compares the "1" in 10 and 12). Has anyone had the same issue? Can anyone help me solve this problem?
Thanks.
You are comparing strings. This comparison is therefore done lexicographically.
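A quick illustration of the difference (values read from a CSV arrive as strings):
console.log("10" < "2");  // true  -- strings are compared character by character, and "1" < "2"
console.log(10 < 2);      // false -- numbers are compared by value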
In order to accomplish what you want, you need to first convert these strings to numbers:
var x = Number(document.getElementById("threshold").value)
var data_Date_Threshold = data_Date.filter(function(d) {return Number(d.Precipitation) > x});
Alternatively, floats:
var x = parseFloat(document.getElementById("threshold").value)
var data_Date_Threshold = data_Date.filter(function(d) {return parseFloat(d.Precipitation) > x});
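Note that d3.csv hands back every field as a string, so the Date comparison in your code has the same problem. An alternative, sketched below under the assumption that your D3 version supports a row-conversion function (d3.csv(url, row, callback)), is to coerce the types once when the file is loaded:
d3.csv("/DataTest.csv", function(d) {
    d.Date = +d.Date;                   // Unix timestamp as a number
    d.Precipitation = +d.Precipitation; // numeric precipitation
    return d;
}, function(data) {
    // data now holds numbers, so >= and > comparisons behave as expected
});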
I want to align the memory of a 5x5 matrix represented as a one-dimensional array.
The original array looks like this:
let mut a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
After resizing the memory to memory-aligned bounds (the next power of 2), the array will look like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The length of a is now 64 elements, so it becomes an 8x8 matrix.
The goal is to have the following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The background is that memory aligned to a power of two allows calculations to be partially done in parallel (for OpenCL float4, or the available vector sizes). To keep memory consumption low, I also do not want to allocate a new array and simply insert the old elements at the correct positions.
At first, I thought about swapping the elements in the ranges where there should be a zero with the elements at the end of the array, keeping a pointer to the elements and simulating a queue, but elements would stack up towards the end, and I didn't come up with a working solution.
My language of choice is Rust. Is there any smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Now taking all this information we can come up with the formula to get the new index Y for the old index X:
Y = (X / N) * M + (X % N)
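For example, with N = 5 and M = 8, the old index X = 7 (row 1, column 2 of the 5x5 matrix) moves to Y = (7 / 5) * 8 + (7 % 5) = 1 * 8 + 2 = 10, which is row 1, column 2 of the 8x8 matrix.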
So you can just make a loop from N^2 - 1 down to N and copy each element to the new position calculated with the formula, then set its original position to 0. (Everything is 0-based; I hope Rust is 0-based as well, or you will have to add some +1.)
According to maraca's solution, the code would look like this:
fn zeropad<T: Copy>(
    first: T,
    data: &mut Vec<T>,
    dims: (usize, usize),
) -> (usize, usize) {
    let r = next_pow2(dims.0);
    let c = next_pow2(dims.1);
    if (r, c) == (dims.0, dims.1) {
        return (r, c);
    }
    let new_len = r * c;
    let old_len = data.len();
    let old_col = dims.1;
    // resize
    data.resize(new_len, first);
    for i in (old_col..old_len).rev() {
        let row: usize = i / c;
        let col: usize = i % c;
        // bigger matrix
        let pos_old = row * c + col;
        // smaller matrix
        let pos_new = (i / dims.1) * c + (i % dims.1);
        data[pos_new] = data[pos_old];
        data[pos_old] = first;
    }
    return (r, c);
}
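The next_pow2 helper is not shown above, so the version below is an assumption (it simply rounds up to the next power of two). A minimal sketch of calling zeropad:
// Assumed helper: round n up to the next power of two (e.g. 5 -> 8).
fn next_pow2(n: usize) -> usize {
    n.next_power_of_two()
}

fn main() {
    // 5x5 matrix stored row-major in a Vec.
    let mut a: Vec<u32> = (1..=25).collect();
    let (rows, cols) = zeropad(0, &mut a, (5, 5));
    assert_eq!((rows, cols), (8, 8));
    // First two rows of the 8x8 matrix: 1 2 3 4 5 0 0 0 and 6 7 8 9 10 0 0 0
    assert_eq!(&a[0..8], &[1, 2, 3, 4, 5, 0, 0, 0]);
    assert_eq!(&a[8..16], &[6, 7, 8, 9, 10, 0, 0, 0]);
}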
I have a pretty bad way to convert my input logs to the input dataset.
I have an SFrame sf with the following format:
user_id int
timestamp datetime.datetime
action int
reasoncode str
The action column takes 9 values, ranging from 1 to 9.
So, every user_id can perform more than one action, more than once.
I am trying to obtain all unique user_id values from sf and create an op_sf in the following manner:
y = 225
def calc_class(a, x):
    # dte is a reference datetime defined elsewhere
    diffd = a['timestamp'].apply(lambda t: (dte - t).days)
    g = 0
    b = 0
    for i in diffd:
        if i > y:
            g += 1
        else:
            b += 1
    if b >= x:
        return 4
    elif b != 0:
        return 3
    elif g >= 0:
        return 2
    else:
        return 1

l1 = []
ids = sf['user_id'].unique()
for idd in ids:
    temp = sf[sf['user_id'] == idd]
    zero1 = temp[temp['action'] == 1]
    zero2 = temp[temp['action'] == 2]
    zero3 = temp[temp['action'] == 3]
    zero4 = temp[temp['action'] == 4]
    zero5 = temp[temp['action'] == 5]
    zero6 = temp[temp['action'] == 6]
    zero7 = temp[temp['action'] == 7]
    zero8 = temp[temp['reasoncode'] == 'xyz']
    zero9 = temp[temp['reasoncode'] == 'abc']
    # clas1 to clas9 come from calc_class for each action;
    # each is an integer ranging from 1 to 4
    clas1 = calc_class(zero1, 2)
    clas2 = calc_class(zero2, 2)
    clas3 = calc_class(zero3, 2)
    clas4 = calc_class(zero4, 2)
    clas5 = calc_class(zero5, 2)
    clas6 = calc_class(zero6, 2)
    clas7 = calc_class(zero7, 2)
    clas8 = calc_class(zero8, 2)
    clas9 = calc_class(zero9, 2)
    l1.append([idd, clas1, clas2, clas3, clas4,
               clas5*(-1), clas6*(-1), clas7*(-1), clas8*(-1), clas9])
I wanted to know if this is the fastest way of doing this, and specifically whether it is possible to do the same thing without generating the zero1 to zero9 SFrames.
An example sf:
user_id timestamp action reasoncode
574 23/09/15 12:43 1 None
574 23/09/15 11:15 2 None
574 06/10/15 11:20 2 None
574 06/10/15 11:21 3 None
588 04/11/15 10:00 1 None
588 05/11/15 10:00 1 None
555 15/12/15 13:00 1 None
585 22/12/15 17:30 1 None
585 15/01/16 07:44 7 xyz
588 06/01/16 08:10 7 abc
l1 corresponding to the above sf:
574 1 2 2 0 0 0 0 0 0
588 3 0 0 0 0 0 0 0 3
555 3 0 0 0 0 0 0 0 0
585 3 0 0 0 0 0 0 3 0
I think your logic is relatively complex, but it's still more efficient to use column-wise operations on the whole dataset, rather than extracting the subset of rows for each user. The key tools are SFrame.groupby, SFrame.apply, SFrame.unstack, and SFrame.unpack. API docs here:
https://dato.com/products/create/docs/generated/graphlab.SFrame.html
Here's a solution that uses slightly simpler data than your example and slightly simpler logic to code the old vs. new actions.
# Set up and make the data
import graphlab as gl
import datetime as dt

sf = gl.SFrame({'user': [574, 574, 574, 588, 588, 588],
                'timestamp': [dt.datetime(2015, 9, 23), dt.datetime(2015, 9, 23),
                              dt.datetime(2015, 10, 6), dt.datetime(2015, 11, 4),
                              dt.datetime(2015, 11, 5), dt.datetime(2016, 1, 6)],
                'action': [1, 2, 3, 1, 1, 7]})

# Count old vs. new actions.
sf['days_elapsed'] = (dt.datetime.today() - sf['timestamp']) / (3600 * 24)
sf['old_threshold'] = sf['days_elapsed'] > 225

aggregator = {'total_count': gl.aggregate.COUNT('user'),
              'old_count': gl.aggregate.SUM('old_threshold')}
grp = sf.groupby(['user', 'action'], aggregator)

# Code the actions according to old vs. new. Use your own logic here.
grp['action_code'] = grp.apply(
    lambda x: 2 if x['total_count'] > x['old_count'] else 1)
grp = grp[['user', 'action', 'action_code']]

# Reshape the results into columns.
sf_new = (grp.unstack(['action', 'action_code'], new_column_name='action_code')
             .unpack('action_code'))

# Fill in zeros for entries with no actions.
for c in sf_new.column_names():
    sf_new[c] = sf_new[c].fillna(0)

print sf_new
+------+---------------+---------------+---------------+---------------+
| user | action_code.1 | action_code.2 | action_code.3 | action_code.7 |
+------+---------------+---------------+---------------+---------------+
| 588 | 2 | 0 | 0 | 2 |
| 574 | 1 | 1 | 1 | 0 |
+------+---------------+---------------+---------------+---------------+
[2 rows x 5 columns]
I have a vector
A = [ 1 1 1 2 2 3 6 8 9 9 ]
I would like to write a loop that counts the frequencies of the values in my vector within a range I choose; this should include values that have 0 frequency.
For example, if I chose the range 1:9, my result would be
3 2 1 0 0 1 0 1 2
If I picked 1:11, the result would be
3 2 1 0 0 1 0 1 2 0 0
Is this possible? Ideally I would have to do this for giant matrices and vectors, so the fastest way to calculate this would be appreciated.
Here's an alternative suggestion to histcounts, which appears to be ~8x faster on MATLAB R2015b:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11;
N = accumarray(A(:), 1, [maxRange,1])'; % each value of A is used as a subscript, adding 1 per occurrence
N =
3 2 1 0 0 1 0 1 2 0 0
Comparing the speed:
K>> tic; for i = 1:100000, N1 = accumarray(A(:), 1, [maxRange,1])'; end; toc;
Elapsed time is 0.537597 seconds.
K>> tic; for i = 1:100000, N2 = histcounts(A,1:maxRange+1); end; toc;
Elapsed time is 4.333394 seconds.
K>> isequal(N1, N2)
ans =
1
As per the loop request, here's a looped version, which should not be too slow since the latest engine overhaul:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11; %// your range
output = zeros(1,maxRange); %// initialise output
for ii = 1:maxRange
tmp = A==ii; %// temporary storage
output(ii) = sum(tmp(:)); %// find the number of occurences
end
which would result in
output =
3 2 1 0 0 1 0 1 2 0 0
Faster and non-looping would be @beaker's suggestion to use histcounts:
[N,edges] = histcounts(A,1:maxRange+1);
N =
3 2 1 0 0 1 0 1 2 0
where the +1 makes sure the last entry is included as well.
Assuming the input A is a sorted array and the range starts at 1 and goes up to some value greater than or equal to the largest element in A, here's an approach using diff and find -
%// Inputs
A = [2 4 4 4 8 9 11 11 11 12]; %// Modified for variety
maxN = 13;
idx = [0 find(diff(A)>0) numel(A)]+1; %// start index of each run of equal values, plus an end marker
out = zeros(1,maxN); %// OR for better performance : out(maxN) = 0;
out(A(idx(1:end-1))) = diff(idx); %// each run length goes into the bin of that run's value
Output -
out =
0 1 0 3 0 0 0 1 1 0 3 1 0
This can be done very easily with bsxfun.
Let the data be
A = [ 1 1 1 2 2 3 6 8 9 9 ]; %// data
B = 1:9; %// possible values
Then
result = sum(bsxfun(@eq, A(:), B(:).'), 1);
gives
result =
3 2 1 0 0 1 0 1 2
Given an image I and two matrices m1 and m2 (the same size as I), the function f is defined as:
f = sign((I - m1).^2 - (I - m2).^2)
Because my design only needs the sign of f, the function can be rewritten as:
f = +1 where abs(I - m1) >= abs(I - m2), and -1 otherwise
I think the second formula is faster than the first one because:
- it can ignore the square terms, and
- it computes the sign directly, instead of the two steps in the first equation (compute f, then check its sign).
Do you agree with me? Do you have an even faster formula for f?
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
The output f is
f =[1 1 1 1 1
1 1 -1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1]
I implemented the first way, but I did not finish the second way in MATLAB. Could you help me with the second way and compare it with the first?
UPDATE: I have added the code of chepyle and Divakar to make the question clearer. Note that both of them give the same result as the f above.
function compare()
    I =[16 23 11 42 10
        11 21 22 24 30
        16 22 154 155 156
        25 28 145 151 156
        11 38 147 144 153];
    m1 =[0 0 0 0 0
         0 0 22 11 0
         0 23 34 56 0
         0 56 0 0 0
         0 11 0 0 0];
    m2 =[0 0 0 0 0
         0 0 12 11 0
         0 22 111 156 0
         0 32 0 0 0
         0 12 0 0 0];
    function f=first_way()
        f=sign((I-m1).^2-(I-m2).^2);
        f(f==0)=1;
    end
    function f= second_way()
        f = double(abs(I-m1) >= abs(I-m2));
        f(f==0) = -1;
    end
    function f= third_way()
        v1=abs(I-m1);
        v2=abs(I-m2);
        f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
        f(f==0) = 1;
    end
    disp(['First way : ' num2str(timeit(@first_way))])
    disp(['Second way: ' num2str(timeit(@second_way))])
    disp(['Third way : ' num2str(timeit(@third_way))])
end
First way : 1.2897e-05
Second way: 1.9381e-05
Third way : 2.0077e-05
This seems to be comparable and might be a wee bit faster at times than the original approach -
f = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
sign(abs(2*I-m1-m2)) - 1 -sign(abs(2*I-m1-m2) + abs(m1-m2))
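As a quick sanity check (a sketch using the I, m1 and m2 from the question), the proposed expression matches the original formula once its zeros are mapped to -1:
f1 = sign((I-m1).^2 - (I-m2).^2);
f1(f1==0) = -1;                    %// original approach, zeros mapped to -1
f2 = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
     sign(abs(2*I-m1-m2)) - 1 - sign(abs(2*I-m1-m2) + abs(m1-m2));
isequal(f1, f2)                    %// returns 1 (true)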
Benchmarking Code
%// Create random inputs
N = 5000;
I = randi(1000,N,N);
m1 = randi(1000,N,N);
m2 = randi(1000,N,N);
num_iter = 20; %// Number of iterations for all approaches
%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end
disp('------------------------- With Original Approach')
tic
for iter = 1:num_iter
    out1 = sign((I-m1).^2-(I-m2).^2);
    out1(out1==0)=-1;
end
toc, clear out1
disp('------------------------- With Proposed Approach')
tic
for iter = 1:num_iter
    out2 = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
        sign(abs(2*I-m1-m2)) - 1 - sign(abs(2*I-m1-m2) + abs(m1-m2));
end
toc
Results
------------------------- With Original Approach
Elapsed time is 1.751966 seconds.
------------------------- With Proposed Approach
Elapsed time is 1.681263 seconds.
There is a problem with the accuracy of the second formula, but for the sake of comparison, here's how I would implement it in MATLAB, along with a third approach that avoids the squaring and the sign() function, in line with your intent. Note that MATLAB's matrix and sign functions are pretty well optimized; the second and third approaches are both slower.
function compare()
    I =[16 23 11 42 10
        11 21 22 24 30
        16 22 154 155 156
        25 28 145 151 156
        11 38 147 144 153];
    m1 =[0 0 0 0 0
         0 0 22 11 0
         0 23 34 56 0
         0 56 0 0 0
         0 11 0 0 0];
    m2 =[0 0 0 0 0
         0 0 12 11 0
         0 22 111 156 0
         0 32 0 0 0
         0 12 0 0 0];
    function f=first_way()
        f=sign((I-m1).^2-(I-m2).^2);
    end
    function f= second_way()
        v1=(I-m1);
        v2=(I-m2);
        f= int8(v1<=0 & v2>0) + -1* int8(v1>0 & v2<=0);
    end
    function f= third_way()
        v1=abs(I-m1);
        v2=abs(I-m2);
        f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
    end
    disp(['First way : ' num2str(timeit(@first_way))])
    disp(['Second way: ' num2str(timeit(@second_way))])
    disp(['Third way : ' num2str(timeit(@third_way))])
end
The output:
First way : 9.4226e-06
Second way: 1.2247e-05
Third way : 1.1546e-05
Assume the following matrix:
myMatrix = [
1 0 1
1 0 0
1 1 1
1 1 1
0 1 1
0 0 0
0 0 0
0 1 0
1 0 0
0 0 0
0 0 0
0 0 1
0 0 1
0 0 1
];
Given the above (and treating each column independently), I'm trying to create a matrix that will contain the number of rows since the last value of 1 has "shown up". For example, in the first column, the first four values would become 0 since there are 0 rows between each of those rows and the previous value of 1.
Row 5 would become 1, row 6 = 2, row 7 = 3, row 8 = 4. Since row 9 contains a 1, it would become 0 and the count starts again with row 10. The final matrix should look like this:
FinalMatrix = [
0 1 0
0 2 1
0 0 0
0 0 0
1 0 0
2 1 1
3 2 2
4 0 3
0 1 4
1 2 5
2 3 6
3 4 0
4 5 0
5 6 0
];
What is a good way of accomplishing something like this?
EDIT: I'm currently using the following code:
[numRow,numCol] = size(myMatrix);
oneColumn = 1:numRow;
FinalMatrix = repmat(oneColumn',1,numCol);
toSubtract = zeros(numRow,numCol);
for m=1:numCol
    rowsWithOnes = find(myMatrix(:,m));
    for mm=1:length(rowsWithOnes)
        toSubtract(rowsWithOnes(mm):end,m) = rowsWithOnes(mm);
    end
end
FinalMatrix = FinalMatrix - toSubtract;
which runs about 5 times faster than the bsxfun solution posted over many trials and data sets (which are about 1500 x 2500 in size). Can the code above be optimized?
For a single column you could do this:
col = 1; %// desired column
vals = bsxfun(@minus, 1:size(myMatrix,1), find(myMatrix(:,col)));
vals(vals<0) = inf;
result = min(vals, [], 1).';
Result for first column:
result =
0
0
0
0
1
2
3
4
0
1
2
3
4
5
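To get all columns at once with the same idea, here is a minimal sketch (assuming, as the expected output implies, that rows before a column's first 1 count from an implicit 1 at "row 0"):
[numRow, numCol] = size(myMatrix);
FinalMatrix = zeros(numRow, numCol);
for col = 1:numCol
    pos = [0; find(myMatrix(:,col))];        %// positions of the 1s, plus a virtual row 0
    vals = bsxfun(@minus, 1:numRow, pos);    %// distance from each 1 to every row
    vals(vals < 0) = inf;                    %// ignore 1s that occur later
    FinalMatrix(:,col) = min(vals, [], 1).'; %// distance to the most recent 1
end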
find + diff + cumsum based approach -
offset_array = zeros(size(myMatrix));
for k1 = 1:size(myMatrix,2)
    a = myMatrix(:,k1);
    %// run lengths of equal values; a 1 is prepended, so the odd entries
    %// are the lengths of the zero-runs that end right before a 1
    widths = diff(find(diff([1 ; a])~=0));
    %// positions right after each such zero-run (the 0->1 steps)
    idx = find(diff(a)==1)+1;
    %// subtracting the run length at those positions makes cumsum reset to 0
    offset_array(idx(idx<=numel(a)),k1) = widths(1:2:end);
end
FinalMatrix1 = cumsum(double(myMatrix==0) - offset_array);
Benchmarking
The benchmarking code for comparing the above mentioned approach against the one in the question is listed here -
clear all
myMatrix = round(rand(1500,2500)); %// create random input array
for k = 1:50000
    tic(); elapsed = toc(); %// Warm up tic/toc
end
disp('------------- With FIND+DIFF+CUMSUM based approach') %//'#
tic
offset_array = zeros(size(myMatrix));
for k1 = 1:size(myMatrix,2)
    a = myMatrix(:,k1);
    widths = diff(find(diff([1 ; a])~=0));
    idx = find(diff(a)==1)+1;
    offset_array(idx(idx<=numel(a)),k1) = widths(1:2:end);
end
FinalMatrix1 = cumsum(double(myMatrix==0) - offset_array);
toc
clear FinalMatrix1 offset_array idx widths a
disp('------------- With original approach') %//'#
tic
[numRow,numCol] = size(myMatrix);
oneColumn = 1:numRow;
FinalMatrix = repmat(oneColumn',1,numCol); %//'#
toSubtract = zeros(numRow,numCol);
for m=1:numCol
    rowsWithOnes = find(myMatrix(:,m));
    for mm=1:length(rowsWithOnes)
        toSubtract(rowsWithOnes(mm):end,m) = rowsWithOnes(mm);
    end
end
FinalMatrix = FinalMatrix - toSubtract;
toc
The results I got were -
------------- With FIND+DIFF+CUMSUM based approach
Elapsed time is 0.311115 seconds.
------------- With original approach
Elapsed time is 7.587798 seconds.