Get top hit with filter in Kibana visualization - kibana-7

I'll try to explain my issue.
I have this index:
ST ID time
0 1 12:04
1 1 12:00
0 2 12:02
1 2 11:58
0 2 11:22
1 3 12:33
0 3 12:20
In Kibana, I'm trying to build a table that shows, for each ID, its top hit (the document with the latest time), but only when that top hit has ST = 0. The expected result is:
ST ID time
0 1 12:04
0 2 12:02
As you can see, the top hit for ID = 3 has ST = 1, so ID 3 shouldn't appear in the table.
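A minimal Python sketch of the requested logic (plain Python, not a Kibana or Elasticsearch configuration), assuming the rows above and zero-padded "HH:MM" times. It shows the key point: pick the latest document per ID first and only then test ST. Filtering on ST = 0 before picking the top hit would wrongly return 12:20 for ID 3.

```python
# Rows of the index from the question: (ST, ID, time).
ROWS = [
    (0, 1, "12:04"), (1, 1, "12:00"),
    (0, 2, "12:02"), (1, 2, "11:58"), (0, 2, "11:22"),
    (1, 3, "12:33"), (0, 3, "12:20"),
]


def top_hit_if_st0(rows):
    """Return the latest row per ID, keeping it only if its ST is 0."""
    latest = {}
    for st, doc_id, t in rows:
        # Zero-padded "HH:MM" strings sort in the same order as the times.
        if doc_id not in latest or t > latest[doc_id][2]:
            latest[doc_id] = (st, doc_id, t)
    # The ST filter is applied after choosing the top hit per ID.
    return [row for _, row in sorted(latest.items()) if row[0] == 0]


print(top_hit_if_st0(ROWS))  # [(0, 1, '12:04'), (0, 2, '12:02')]
```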
Could someone help me with this?
BR

Related

plotting multiple graphs and animation from a data file in gnuplot

Suppose I have the following sample data file.
0 1 2
0 3 4
0 1 9
0 9 2
0 19 0
0 6 1
0 11 0
1 3 2
1 3 4
1 1 6
1 9 2
1 15 0
1 6 6
1 11 1
2 3 2
2 4 4
2 1 6
2 9 6
2 15 0
2 6 6
2 11 1
The first column gives the value of time, the second the values of x, and the third column y. I wish to plot y as a function of x from this data file at different times,
i.e., for t=0 I plot using 2:3 with lines only the rows where t=0. Then I do the same for the rows at t=1, and so on.
At the end of the day, I want to get a GIF, i.e., an animation of how the y vs x graph changes shape as time goes on. How can I do this in gnuplot?
What have you tried so far? (Check help ternary and help gif)
You need to filter your data with the ternary operator and then create the animation.
Code:
### plot filtered data and animate
reset session
$Data <<EOD
0 1 2
0 3 4
0 1 9
0 9 2
0 19 0
0 6 1
0 11 0
1 3 2
1 3 4
1 1 6
1 9 2
1 15 0
1 6 6
1 11 1
2 3 2
2 4 4
2 1 6
2 9 6
2 15 0
2 6 6
2 11 1
EOD
set terminal gif animate delay 50 optimize
set output "myAnimation.gif"
set xrange[0:20]
set yrange[0:10]
do for [i=0:2] {
plot $Data u 2:($1==i?$3:NaN) w lp pt 7 ti sprintf("Time: %g",i)
}
set output
### end of code
Addition:
The meaning of $1==i?$3:NaN in words:
If the value in the first column is equal to i, the result is the value in the third column; otherwise it is NaN ("Not a Number"). Points whose value is NaN are not plotted, so each frame shows only the rows for time i.
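To make the filtering step concrete, here is a small Python sketch (not gnuplot) of which points each frame of the loop draws. It mirrors the expression ($1==i?$3:NaN) on the same data and skips NaN points the way the plot does; the function name frame is made up for the illustration.

```python
import math

# $Data from the script above: (time, x, y) per line.
DATA = [
    (0, 1, 2), (0, 3, 4), (0, 1, 9), (0, 9, 2), (0, 19, 0), (0, 6, 1), (0, 11, 0),
    (1, 3, 2), (1, 3, 4), (1, 1, 6), (1, 9, 2), (1, 15, 0), (1, 6, 6), (1, 11, 1),
    (2, 3, 2), (2, 4, 4), (2, 1, 6), (2, 9, 6), (2, 15, 0), (2, 6, 6), (2, 11, 1),
]


def frame(rows, i):
    """Points drawn for frame i by `u 2:($1==i?$3:NaN)`."""
    points = []
    for t, x, y in rows:
        # Same test as the ternary: y if column 1 equals i, else NaN.
        value = y if t == i else math.nan
        # NaN points are not plotted, so they are skipped here.
        if not math.isnan(value):
            points.append((x, value))
    return points


for i in range(3):  # like `do for [i=0:2]`
    print(f"Time: {i}", frame(DATA, i))
```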

add a new column for unique ID in hive table

I have a table in Hive with two columns, session_id and duration, like this:
session_id duration
1 14
1 10
1 20
1 10
1 12
1 16
1 8
2 9
2 6
2 30
2 22
I want to add a new column with an ID that increments whenever
the session_id changes or the duration is > 15.
I want the output to look like this:
session_id duration unique_id
1 14 1
1 10 1
1 20 2
1 10 2
1 12 2
1 16 3
1 8 3
2 9 4
2 6 4
2 30 5
2 22 6
Any ideas how to do that in HiveQL?
Thanks!
SQL tables represent unordered sets. You need a column specifying the ordering of the values, because you seem to care about the ordering. This could be an id column or a created-at column, for instance.
You can do this using a cumulative sum:
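select t.*,
       -- a new id starts on the first row of each session (seqnum = 1)
       -- or on any row with duration > 15; ?? is your ordering column
       sum(case when duration > 15 or seqnum = 1 then 1 else 0 end) over
           (order by ??) as unique_id
from (select t.*,
             row_number() over (partition by session_id order by ??) as seqnum
      from t
     ) t;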
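A small Python check of the cumulative-sum logic, assuming the rows are already listed in the order defined by the ordering column (the ?? above) and that each session's rows are contiguous. It reproduces the expected unique_id column from the question.

```python
# (session_id, duration) rows, listed in the order given by the ordering column.
ROWS = [(1, 14), (1, 10), (1, 20), (1, 10), (1, 12), (1, 16), (1, 8),
        (2, 9), (2, 6), (2, 30), (2, 22)]


def unique_ids(rows):
    """Running sum of the flag (duration > 15 or seqnum = 1)."""
    ids = []
    running = 0
    prev_session = None
    for session_id, duration in rows:
        # seqnum = 1 corresponds to the first row of each session
        # (assumes rows of a session are contiguous).
        first_in_session = session_id != prev_session
        if duration > 15 or first_in_session:
            running += 1
        ids.append(running)
        prev_session = session_id
    return ids


print(unique_ids(ROWS))  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6]
```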

Speed up code to compare fields in a struct

I have the struct Trajectories with fields uniqueDate, dateAll and label. I want to compare the fields uniqueDate and dateAll and, if there is a match, save in label a value from another struct.
I have written this code:
for k=1:nCols
    for j=1:size(Trajectories(1,k).dateAll,1)
        for i=1:size(Trajectories(1,k).uniqueDate,1)
            if (~isempty(s(1,k).places))&&(Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1))&&(Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2))&&(Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))
                for z=1:24
                    if(Trajectories(1,k).dateAll(j,4)==z)&&(size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1))
                        Trajectories(1,k).label(j)=s(1,k).places.all(z,i);
                    elseif(Trajectories(1,k).dateAll(j,4)==z)&&(size(s(1,k).places.all,2)<size(Trajectories(1,k).uniqueDate,1))
                        for l=1:size(s(1,k).places.all,2)
                            Trajectories(1,k).label(l)=s(1,k).places.all(z,l);
                        end
                    end
                end
            end
        end
    end
end
E.g.:
Trajectories(1,4).dateAll=[1 2004 8 1 14 1 15 0 0 0 1 42 13 2;596 2004 8 1 16 20 14 0 0 0 1 29 12 NaN;674 2004 8 1 18 26 11 0 0 0 1 20 38 1;674 2004 8 2 10 7 40 0 0 0 14 26 5 3;674 2004 8 2 11 3 29 0 0 0 1 54 3 3;631 2004 8 2 11 57 56 0 0 0 0 30 8 2;1 2004 8 2 12 4 35 0 0 0 1 53 21 2;631 2004 8 2 12 52 58 0 0 0 0 20 36 2;631 2004 8 2 13 5 3 0 0 0 1 49 40 2;631 2004 8 2 14 0 20 0 0 0 1 56 12 2;631 2004 8 2 15 2 0 0 0 0 1 57 39 2;631 2004 8 2 16 1 4 0 0 0 1 55 53 2;1 2004 8 2 17 9 15 0 0 0 1 48 41 2];
Trajectories(1,4).uniqueDate= [2004 8 1;2004 8 2;2004 8 3;2004 8 4];
It runs, but it's very, very slow. How can I modify it to speed it up?
Let's work from the inside out and see where it gets us.
Step 1: Simplify your comparison condition:
if (~isempty(s(1,k).places))&&(Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1))&&(Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2))&&(Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))
becomes
if (~isempty(s(1,k).places)) && all( Trajectories(1,k).dateAll(j,1:3)==Trajectories(1,k).uniqueDate(i,1:3) )
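Then we want to take this comparison out of the for-loop. The "intersect" function is useful here:
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
We now have a vector i1 of the rows in dateAll whose date appears in uniqueDate, and a vector i2 of the matching rows in uniqueDate. Note that intersect returns each common row only once, so if several rows of dateAll share the same date, i1 holds only one of them; [tf, loc] = ismember(Trajectories(1,k).dateAll(:,1:3), Trajectories(1,k).uniqueDate, 'rows') flags every matching row instead (loc gives the matching row in uniqueDate).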
Now we can remove the loop comparing z using a similar approach:
[iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);
We have to be careful about our indices here, using a subset of a subset.
This simplifies the code to:
for k=1:nCols
if isempty(s(1,k).places)
continue; % skip to the next value of k, no need to do the rest of the comparison
end
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
[iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);
usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1));
if (usescalarlabel)
Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,i2(iz1));
else
% you will need to check this: I think here you were needlessly repeating this step for every match
Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,:);
end
end
But wait! That z loop is exactly the same as using indexing. So we don't need that second intersect after all:
for k=1:nCols
if isempty(s(1,k).places)
continue; % skip to the next value of k, no need to do the rest of the comparison
end
[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1));
label_indices = Trajectories(1,k).dateAll(i1,4);
if (usescalarlabel)
Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,i2);
else
% you will need to check this: I think here you were needlessly repeating this step for every match
Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,:);
end
end
You'll need to check the indexing in this - I'm sure I've made a mistake somewhere without having data to test against, but that should give you an idea on how to proceed removing the loops and using vector expressions instead. Without seeing the data that's as far as I can optimise. You may be able to go further if you can reformat your data into a set of 3d matrices / cells instead of using structs.
I am suspicious of your condition, which I have called "usescalarlabel": it seems like you are mixing two data types. Also, I would strongly recommend splitting the dateAll matrices into separate "date" and "data" matrices, as columns 4 onwards don't seem to be dates. Also, the example you copy/pasted seems to have an extra value in column 1. In that case you'll need to compare Trajectories(1,k).dateAll(:,2:4) instead of Trajectories(1,k).dateAll(:,1:3).
Good luck.
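For readers outside MATLAB, here is a Python illustration of the same idea of replacing the nested comparison loops with one lookup per row. It swaps in a different technique from intersect: a hash map from each date to its index in uniqueDate, similar in spirit to ismember(...,'rows'). It assumes, as discussed above, that columns 2-4 of dateAll hold the date and column 5 the hour. The places table (standing in for s(1,k).places.all) and the function name label_rows are made up for the example.

```python
# (year, month, day, hour): columns 2-5 of the dateAll rows in the question.
DATE_ALL = [
    (2004, 8, 1, 14), (2004, 8, 1, 16), (2004, 8, 1, 18),
    (2004, 8, 2, 10), (2004, 8, 2, 11), (2004, 8, 2, 11),
    (2004, 8, 2, 12), (2004, 8, 2, 12), (2004, 8, 2, 13),
    (2004, 8, 2, 14), (2004, 8, 2, 15), (2004, 8, 2, 16),
    (2004, 8, 2, 17),
]
UNIQUE_DATE = [(2004, 8, 1), (2004, 8, 2), (2004, 8, 3), (2004, 8, 4)]

# Hypothetical 24 x len(UNIQUE_DATE) table: row = hour (1..24), column = date index.
PLACES = [[hour * 10 + d for d in range(len(UNIQUE_DATE))] for hour in range(1, 25)]


def label_rows(date_all, unique_date, places):
    # Build the date -> index lookup once instead of comparing
    # every dateAll row against every uniqueDate row.
    date_index = {d: i for i, d in enumerate(unique_date)}
    labels = []
    for year, month, day, hour in date_all:
        i = date_index.get((year, month, day))
        # The hour is used directly as an index, like the removed z loop.
        labels.append(None if i is None else places[hour - 1][i])
    return labels


labels = label_rows(DATE_ALL, UNIQUE_DATE, PLACES)
print(labels[:4])  # [140, 160, 180, 101]
```

Rows whose date is missing from uniqueDate get None here; in MATLAB you would leave those label entries untouched.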

Store values from a variable and reuse them

This is a question that could help me solve another, still unsolved question I posted. Basically I need to condition a dataset in Stata, and I thought of a procedure that would first store certain values of a variable in a sort of matrix and then compare the values of another variable with those stored in the matrix. A simple example could be the following:
obs id act1 act2 year act1year
1 1 0 1 2000 0
2 1 1 0 2001 2001
3 1 0 1 2004 0
4 2 1 0 2001 2001
5 2 1 0 2002 2002
6 2 0 1 2004 0
For each id, the code should store the nonzero values of act1year in the matrix (for id 1, that is 2001). Then, for each observation with act2 = 1 (obs 1 and 3 for id 1), it should check whether any stored value lies in the range [year(i)-2, year(i)]. For id 1 neither range contains 2001, so those observations are dropped. For id 2 the code should store [2001, 2002] and then check whether the range [year(6)-2, year(6)] contains any of the stored values.
I hope my question is clear enough! Apologies for not posting any attempt, but I really have no idea how to do this.
Both this question and the previous discussion are difficult for me to understand, so let me suggest the following as a starting point for a solution that identifies observations in which either (a) act1 occurs or (b) act2 occurs no more than 2 years after the most recent act1 occurrence.
clear
input id act1 act2 year
1 0 1 2000
1 1 0 2001
1 0 1 2004
2 1 0 2001
2 1 0 2002
2 0 1 2004
end
generate a1yr = 0
replace a1yr = year if act1==1
generate act1r = -act1
bysort id (year act1r): replace a1yr=a1yr[_n-1] if a1yr==0 & _n>1
generate tokeep = 0
replace tokeep = 1 if act1==1
replace tokeep = 1 if act2==1 & year-a1yr<=2
list, clean noobs
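To show what the Stata code computes step by step, here is a Python port of the same logic (the function name flag_rows is made up). It sorts like bysort id (year act1r), carries the most recent act1 year forward within each id, and flags act1 rows plus act2 rows within 2 years of that act1 year.

```python
# (id, act1, act2, year), as in the input block above.
ROWS = [(1, 0, 1, 2000), (1, 1, 0, 2001), (1, 0, 1, 2004),
        (2, 1, 0, 2001), (2, 1, 0, 2002), (2, 0, 1, 2004)]


def flag_rows(rows):
    """Return the tokeep flag for each row, in input order."""
    # Like `bysort id (year act1r)`: by id, then year, act1 rows first on ties.
    order = sorted(range(len(rows)),
                   key=lambda k: (rows[k][0], rows[k][3], -rows[k][1]))
    tokeep = [0] * len(rows)
    prev_id = None
    a1yr = 0
    for k in order:
        id_, act1, act2, year = rows[k]
        if id_ != prev_id:
            a1yr = 0  # no act1 seen yet in this id (Stata's a1yr = 0)
            prev_id = id_
        if act1 == 1:
            a1yr = year
            tokeep[k] = 1
        elif act2 == 1 and year - a1yr <= 2:
            tokeep[k] = 1
    return tokeep


print(flag_rows(ROWS))  # [0, 1, 0, 1, 1, 1]
```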
Based on the previous discussion as it now stands, I suggest substituting the following data into the code above and checking whether the code then meets the needs of that discussion.
input obsno id act1 act2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2002
5 1 0 1 2003
6 2 1 0 2000
7 2 1 0 2001
8 2 0 1 2002
9 2 0 1 2002
10 2 0 1 2003
end

Sorting rows and columns of adjacency matrix to reveal cliques

I'm looking for a reordering technique to group connected components of an adjacency matrix together.
For example, I've made an illustration with two groups, blue and green. Initially the 1 entries are distributed across the rows and columns of the matrix. By reordering the rows and columns, all 1s can be located in two contiguous sections of the matrix, revealing the blue and green components more clearly.
I can't remember what this reordering technique is called. I've searched for many combinations of adjacency matrix, clique, sorting, and reordering.
The closest hits I've found are:
- symrcm, which moves the elements closer to the diagonal but does not make groups.
- "Is there a way to reorder the rows and columns of matrix to create a dense corner, in R?", which focuses on removing completely empty rows and columns.
Please either provide the common name for this technique so that I can google more effectively, or point me in the direction of a Matlab function.
I don't know whether there is a better alternative which should give you direct results, but here is one approach which may serve your purpose.
Your input:
>> A
A =
0 1 1 0 1
1 0 0 1 0
0 1 1 0 1
1 0 0 1 0
0 1 1 0 1
Method 1
Use the first column to build the row mask (maskRow) and the first row to build the column mask (maskCol).
maskRow marks the rows that have a 1 in the first column; maskCol marks the columns that do not have a 1 in the first row:
maskRow = A(:,1)==1;
maskCol = A(1,:)~=1;
Rearrange the Rows (according to the Row-mask)
out = [A(maskRow,:);A(~maskRow,:)];
Gives something like this:
out =
1 0 0 1 0
1 0 0 1 0
0 1 1 0 1
0 1 1 0 1
0 1 1 0 1
Rearrange columns (according to the column-mask)
out = [out(:,maskCol),out(:,~maskCol)]
Gives the desired results:
out =
1 1 0 0 0
1 1 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1
To check that the indices end up where they are supposed to be (or if you want the corresponding rearranged indices):
Before Re-arranging:
idx = reshape(1:25,5,[])
idx =
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
After re-arranging (same process we did before)
outidx = [idx(maskRow,:);idx(~maskRow,:)];
outidx = [outidx(:,maskCol),outidx(:,~maskCol)]
Output:
outidx =
2 17 7 12 22
4 19 9 14 24
1 16 6 11 21
3 18 8 13 23
5 20 10 15 25
Method 2
For the general case, when you don't know the structure of the matrix beforehand, here is a procedure to find maskRow and maskCol.
Logic used:
1. Take the first row and use it as the column mask (maskCol).
2. For each row from the 2nd to the last, compare the current row with maskCol. If any 1 in the row matches a 1 in maskCol, take the element-wise logical OR and use it as the new maskCol.
3. Negate maskCol at the end, so that the first row's group of columns is moved after the other columns.
4. Find maskRow the same way (without the negation), iterating over the columns instead of the rows.
Code:
%// If you have a square matrix, you can combine both these loops into a single loop.
%// Start from logical masks (~= 0) so that A(maskRow,:) is valid logical
%// indexing even if no later row/column ever matches.
maskCol = A(1,:) ~= 0;
for ii = 2:size(A,1)
    if any(A(ii,:) & maskCol)
        maskCol = maskCol | A(ii,:);
    end
end
maskCol = ~maskCol;
maskRow = A(:,1) ~= 0;
for ii = 2:size(A,2)
    if any(A(:,ii) & maskRow)
        maskRow = maskRow | A(:,ii);
    end
end
Here is an example to try that:
%// Here I removed some 'ones' from first, last rows and columns.
%// Compare it with the original example.
A = [0 0 1 0 1
0 0 0 1 0
0 1 1 0 0
1 0 0 1 0
0 1 0 0 1];
Then, repeat the procedure you followed before:
out = [A(maskRow,:);A(~maskRow,:)]; %// same code used
out = [out(:,maskCol),out(:,~maskCol)]; %// same code used
Here is the result:
>> out
out =
0 1 0 0 0
1 1 0 0 0
0 0 0 1 1
0 0 1 1 0
0 0 1 0 1
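To experiment with Method 2 outside MATLAB, here is a Python port using plain lists instead of logical arrays (the function names grow_mask and reorder_method2 are made up). It performs the same single pass per mask and reproduces the result above for this A.

```python
def grow_mask(vectors):
    """Single pass of Method 2: start from the first vector and OR in every
    later vector that shares at least one 1 with the current mask."""
    mask = [v != 0 for v in vectors[0]]
    for vec in vectors[1:]:
        if any(m and v != 0 for m, v in zip(mask, vec)):
            mask = [m or v != 0 for m, v in zip(mask, vec)]
    return mask


def reorder_method2(A):
    columns = [list(col) for col in zip(*A)]
    mask_col = [not m for m in grow_mask(A)]  # maskCol = ~maskCol
    mask_row = grow_mask(columns)             # rows linked to column 1
    # Masked rows/columns first, the rest after: [A(mask,:); A(~mask,:)].
    row_order = ([i for i, m in enumerate(mask_row) if m]
                 + [i for i, m in enumerate(mask_row) if not m])
    col_order = ([j for j, m in enumerate(mask_col) if m]
                 + [j for j, m in enumerate(mask_col) if not m])
    return [[A[i][j] for j in col_order] for i in row_order]


A = [[0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 1, 1, 0, 0],
     [1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1]]
for row in reorder_method2(A):
    print(row)
```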
Note: This approach works in most cases, but it can fail in some rare cases.
Here is an example:
%// this works well.
A = [0 0 1 0 1 0
1 0 0 1 0 0
0 1 0 0 0 1
1 0 0 1 0 0
0 0 1 0 1 0
0 1 0 0 1 1];
%// This may not
%// Second col, last row changed to zero from one
A = [0 0 1 0 1 0
1 0 0 1 0 0
0 1 0 0 0 1
1 0 0 1 0 0
0 0 1 0 1 0
0 0 0 0 1 1];
Why does it fail?
As we loop through the rows to find the column mask, when we reach the 3rd row, none of its 1s match the current maskCol (which at that point only contains the first row's ones). So the only information carried by the 3rd row (the 1 in the 2nd column) is lost, even though the last row later links column 6 to the group.
This is a rare case, because usually some other row still carries the same information. See the first example: there, too, none of the elements of the third row match the first row, but since the last row has the same information (a 1 in the 2nd column), it gave correct results. Still, it is good to know about this weakness.
Method 3
This one is a brute-force alternative, which can be used if you think the previous method might fail. Here, we use a while loop to run the previous code (finding the column mask) several times with the updated maskCol, so that it finds the correct mask.
Procedure:
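maskCol = A(1,:) ~= 0;
count = 1;
while(count<3)   %// number of passes; increase it if the mask is still growing
    for ii = 2:size(A,1)
        if any(A(ii,:) & maskCol)
            maskCol = maskCol | A(ii,:);
        end
    end
    count = count+1;
end
%// Then negate maskCol and build maskRow as in Method 2.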
Here the previous example (where the previous method fails) is run with and without the while loop.
Without Brute force:
>> out
out =
1 0 1 0 0 0
1 0 1 0 0 0
0 0 0 1 1 0
0 1 0 0 0 1
0 0 0 1 1 0
0 0 0 0 1 1
With Brute-Forcing while loop:
>> out
out =
1 1 0 0 0 0
1 1 0 0 0 0
0 0 0 1 1 0
0 0 1 0 0 1
0 0 0 1 1 0
0 0 0 0 1 1
The number of passes required to get the correct result may vary. A safe choice is to keep repeating the pass until maskCol no longer changes.
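The grouping sought here amounts to finding the connected components of the bipartite graph whose nodes are the rows and columns and whose edges are the nonzero entries; ordering rows and columns by component gives the block structure without guessing a number of passes. Below is a Python sketch of that swapped-in technique (breadth-first search), not the answer's mask method; the function name block_order is made up. It is run on the example where Method 2 fails.

```python
from collections import deque


def block_order(A):
    """Row and column orders that make each connected component of the
    bipartite row/column graph (edge i-j when A[i][j] != 0) contiguous."""
    n_rows, n_cols = len(A), len(A[0])
    row_seen, col_seen = [False] * n_rows, [False] * n_cols
    row_order, col_order = [], []
    for start in range(n_rows):
        if row_seen[start]:
            continue
        row_seen[start] = True
        queue = deque([("r", start)])
        while queue:  # BFS alternating between rows and columns
            kind, k = queue.popleft()
            if kind == "r":
                row_order.append(k)
                for j in range(n_cols):
                    if A[k][j] and not col_seen[j]:
                        col_seen[j] = True
                        queue.append(("c", j))
            else:
                col_order.append(k)
                for i in range(n_rows):
                    if A[i][k] and not row_seen[i]:
                        row_seen[i] = True
                        queue.append(("r", i))
    # Columns without any nonzero entry go at the end.
    col_order += [j for j in range(n_cols) if not col_seen[j]]
    return row_order, col_order


A = [[0, 0, 1, 0, 1, 0],
     [1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0, 1],
     [1, 0, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 0, 0, 0, 1, 1]]
rows, cols = block_order(A)
B = [[A[i][j] for j in cols] for i in rows]
for r in B:
    print(r)
```

Here the first four reordered rows use only the first four reordered columns and the last two rows only the last two columns.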
Good Luck!
