I have a column called Project_ID that lists the names of many different projects, say Project A, Project B and so on. A second column lists the sales for each project.
A third column holds the time series index. For example:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
B 22 1
B 38 2
B 76 3
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
D 14 1
D 62 2
From this dataset, I need to keep only those projects that have at least 4 time series points and store them as a new dataset. My question is how to do this in R. The expected result is:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
Could someone please help?
Thanks a lot!
I tried to do some filtering with R but had little success.
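The filtering step can be described independently of R: count the rows per project, then keep only the rows whose project reaches the minimum count. The sketch below shows that logic in Python rather than R, using the sample rows from the question; the function name `keep_projects_with_min_points` is made up for illustration. In R, one common way to express the same thing is dplyr's `group_by(Project_ID)` followed by `filter(n() >= 4)`.

```python
from collections import Counter

# Sample rows from the question: (Project_ID, Sales, time series index).
rows = [
    ("A", 10, 1), ("A", 25, 2), ("A", 31, 3), ("A", 59, 4),
    ("B", 22, 1), ("B", 38, 2), ("B", 76, 3),
    ("C", 82, 1), ("C", 23, 2), ("C", 83, 3), ("C", 12, 4), ("C", 90, 5),
    ("D", 14, 1), ("D", 62, 2),
]


def keep_projects_with_min_points(rows, min_points=4):
    """Keep only rows whose project has at least `min_points` rows."""
    # First pass: how many rows does each project have?
    counts = Counter(project for project, _sales, _t in rows)
    # Second pass: keep rows of sufficiently long projects, preserving order.
    return [row for row in rows if counts[row[0]] >= min_points]


filtered = keep_projects_with_min_points(rows)
for row in filtered:
    print(*row)
```

With the sample data, only the rows of projects A and C remain, in their original order.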
I have matrix A
A =
5 10 15 20 25
10 9 8 7 6
-5 -15 -25 -35 -45
1 2 3 4 5
28 91 154 217 280
I need to build a matrix B from the first, fourth and fifth rows and the first and fifth columns of matrix A.
How can I do it?
>> B = A([1,4,5],[1,5])
B =
5 25
1 5
28 280
You should look up how to use index expressions in the MATLAB and Octave languages to extract and work with submatrices.
See the Octave help on Index expressions: https://octave.org/doc/latest/Index-Expressions.html
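For readers more familiar with other languages, here is a rough Python equivalent of the index expression above, written with plain nested lists. The helper `submatrix` is a hypothetical name; it takes 1-based row and column numbers to mirror the MATLAB/Octave call, whereas Python itself indexes from 0.

```python
A = [
    [5, 10, 15, 20, 25],
    [10, 9, 8, 7, 6],
    [-5, -15, -25, -35, -45],
    [1, 2, 3, 4, 5],
    [28, 91, 154, 217, 280],
]


def submatrix(matrix, rows, cols):
    """Pick the given rows and columns (1-based, as in MATLAB/Octave)."""
    return [[matrix[r - 1][c - 1] for c in cols] for r in rows]


B = submatrix(A, rows=[1, 4, 5], cols=[1, 5])
for row in B:
    print(*row)
```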
I have some data:
P = [3 10 25 32 43 1 3
6 12 35 39 49 4 9
2 9 23 36 47 2 9
...
7 20 35 42 44 3 7
15 18 19 41 42 4 6
10 18 32 35 46 3 10];
Data is always between 1 and 50.
I select the 5 leftmost columns and the 2 rightmost columns:
L=P(:,1:5);
R=P(:,6:7);
I am counting occurrences:
a=tabul(L);
b=tabul(R);
At this point, a contains:
50. 3.
49. 4.
48. 3.
which tells me that the value 50 occurs 3 times, 49 occurs 4 times and so on.
What I need now is to sort the rows of matrix a by the second column in descending order, so that each value in the first column stays paired with its count. The result would look like this:
49. 4.
50. 3.
48. 3.
How can I sort matrix a this way (later I will sort b the same way)?
I was trying something like:
[a,idx]=gsort(a(:,2),"g","d");
a=a(idx,:);
but this does not do what I need.
It does not work because you are overwriting a in the gsort call although you just need the index here. The following does what you want:
[dummy,idx]=gsort(a(:,2),"g","d");
a=a(idx,:);
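The fix boils down to sorting one column only to obtain a permutation, and then applying that permutation to whole rows. Below is a minimal sketch of the same idea in Python (not Scilab), assuming the three sample rows shown in the question. It relies on Python's stable sort for the tied counts; it makes no claim about how gsort orders ties.

```python
# Rows of (value, count), as produced by the tabulation in the question.
a = [[50, 3], [49, 4], [48, 3]]

# Sort the row indices by the count column, descending. The original
# matrix is left untouched, so the indices can be applied afterwards.
idx = sorted(range(len(a)), key=lambda i: a[i][1], reverse=True)

# Reorder whole rows with the permutation.
a_sorted = [a[i] for i in idx]

for row in a_sorted:
    print(*row)
```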
I have a table with a list of numbers. Each number belongs to an entity.
Entity Number
1 1
1 2
1 3
1 4
...
1 20
2 21
2 22
2 23
1 24
2 25
2 26
2 30
2 31
2 32
2 33
The goal is to list the numbers, grouped by the entities as ranges (min-max pairs).
I need to find a way to group the above table as:
Entity Min Max
1 1 20
2 21 23
1 24 24
2 25 26
2 30 33
I successfully did this during my studies, but I always found it hard and can't remember how the algorithm worked.
This looks similar to "SQL Data Range Min Max category"
and "TSQL Select Min & Max row when grouping".
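The expected output suggests that a new range starts whenever the entity changes or the numbers stop being consecutive (entity 2 is split at the gap between 26 and 30). Here is a minimal sketch of that single-pass logic in Python rather than SQL, assuming the rows are already ordered by Number. The function name `group_ranges` is made up for illustration, and the rows elided with "..." in the question are assumed to be 5 through 19 for entity 1.

```python
def group_ranges(rows):
    """Collapse (entity, number) rows into (entity, min, max) runs.

    Assumes `rows` is sorted by number. A run continues only while the
    entity stays the same and the number increases by exactly 1.
    """
    ranges = []
    for entity, number in rows:
        last = ranges[-1] if ranges else None
        if last and last[0] == entity and last[2] + 1 == number:
            last[2] = number  # extend the current run
        else:
            ranges.append([entity, number, number])  # start a new run
    return [tuple(r) for r in ranges]


# Assumption: the elided rows in the question are entity 1, numbers 5..19.
rows = [(1, n) for n in range(1, 21)]
rows += [(2, 21), (2, 22), (2, 23), (1, 24), (2, 25), (2, 26),
         (2, 30), (2, 31), (2, 32), (2, 33)]

result = group_ranges(rows)
for entity, lo, hi in result:
    print(entity, lo, hi)
```

In SQL this kind of task is usually called a "gaps and islands" problem.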
Most face recognition SDKs provide only two major functions:
1. Detecting faces and extracting templates from photos; this is called detection.
2. Comparing two templates and returning a similarity score; this is called recognition.
However, beyond those two functions, what I am looking for is an algorithm or SDK for grouping photos with similar faces together, e.g. based on similarity scores.
Thanks
First, perform step 1 to extract the templates, then compare each template with all the others by applying step 2 to all possible pairs, obtaining their similarity scores.
Sort the matches by similarity score in descending order, decide on a threshold, and group together those templates whose score meets or exceeds it.
Take, for instance, the following case:
Ten templates: A, B, C, D, E, F, G, H, I, J.
Scores between: 0 and 100.
Similarity threshold: 80.
Similarity table:
A B C D E F G H I J
A 100 85 8 0 1 50 55 88 90 10
B 85 100 5 30 99 60 15 23 8 2
C 8 5 100 60 16 80 29 33 5 8
D 0 30 60 100 50 50 34 18 2 66
E 1 99 16 50 100 8 3 2 19 6
F 50 60 80 50 8 100 20 55 13 90
G 55 15 29 34 3 20 100 51 57 16
H 88 23 33 18 2 55 51 100 8 0
I 90 8 5 2 19 13 57 8 100 3
J 10 2 8 66 6 90 16 0 3 100
Sorted matches list:
BE 99
AI 90
FJ 90
AH 88
AB 85
CF 80
------- <-- Threshold cutoff line
DJ 66
.......
Iterate through the list down to the threshold cutoff point, where the scores no longer reach the threshold, while maintaining a full templates set and an association set (group) for each template. This yields the final groups:
// Empty initial full templates set
fullSet = {};
// Iterate through the pairs list
foreach (templatePair : pairList)
{
    // If the full set contains the first template from the pair
    if (fullSet.contains(templatePair.first))
    {
        // Add the second template to its group
        templatePair.first.addTemplateToGroup(templatePair.second);
        // If the full set also contains the second template
        if (fullSet.contains(templatePair.second))
        {
            // The second template is removed from the full set
            fullSet.remove(templatePair.second);
            // The second template's group is added to the first template's group
            templatePair.first.addGroupToGroup(templatePair.second.group);
        }
    }
    // If the full set contains only the second template from the pair
    else if (fullSet.contains(templatePair.second))
    {
        // Add the first template to its group
        templatePair.second.addTemplateToGroup(templatePair.first);
    }
    else
    {
        // If none of the templates are present in the full set, add the first one
        // to the full set and the second one to the first one's group
        fullSet.add(templatePair.first);
        templatePair.first.addTemplateToGroup(templatePair.second);
    }
}
Execution details on the list:
BE: fullSet.add(B); B.addTemplateToGroup(E);
AI: fullSet.add(A); A.addTemplateToGroup(I);
FJ: fullSet.add(F); F.addTemplateToGroup(J);
AH: A.addTemplateToGroup(H);
AB: A.addTemplateToGroup(B); fullSet.remove(B); A.addGroupToGroup(B.group);
CF: F.addTemplateToGroup(C);
In the end, you end up with the following similarity groups:
A - I, H, B, E
F - J, C
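As a cross-check of the grouping on the similarity table above, here is a small runnable sketch in Python. It deliberately swaps in a different technique: a union-find (disjoint set) structure that links every pair whose score is at least the threshold, instead of the full-set bookkeeping of the pseudocode. Since the resulting groups do not depend on processing order, it also skips the sorting step. The function name `group_templates` and the `>=` threshold comparison are assumptions made for this example.

```python
from itertools import combinations

names = "ABCDEFGHIJ"
# Similarity table from the example (row and column order: A..J).
scores = [
    [100, 85, 8, 0, 1, 50, 55, 88, 90, 10],
    [85, 100, 5, 30, 99, 60, 15, 23, 8, 2],
    [8, 5, 100, 60, 16, 80, 29, 33, 5, 8],
    [0, 30, 60, 100, 50, 50, 34, 18, 2, 66],
    [1, 99, 16, 50, 100, 8, 3, 2, 19, 6],
    [50, 60, 80, 50, 8, 100, 20, 55, 13, 90],
    [55, 15, 29, 34, 3, 20, 100, 51, 57, 16],
    [88, 23, 33, 18, 2, 55, 51, 100, 8, 0],
    [90, 8, 5, 2, 19, 13, 57, 8, 100, 3],
    [10, 2, 8, 66, 6, 90, 16, 0, 3, 100],
]


def group_templates(names, scores, threshold):
    """Group templates connected by pairs scoring at least `threshold`."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link both templates of every pair that meets the threshold.
    for i, j in combinations(range(len(names)), 2):
        if scores[i][j] >= threshold:
            parent[find(names[j])] = find(names[i])

    # Collect the members of each group under its representative.
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


groups = group_templates(names, scores, threshold=80)
for members in groups:
    print(" ".join(members))
```

Unlike the walkthrough above, this version also reports templates that match nothing (here D and G) as single-member groups. Note that linking is transitive: B and I end up in the same group through A even though their direct score is only 8, which may or may not be what you want for face grouping.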
Imagine I've defined the following name in J:
m =: >: i. 2 4 5
This looks like the following:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
I want to create a monadic verb of rank 1 that applies to each list in this list of lists. It will double (+:) or add 1 (>:) to each alternate item in the list. If we were to apply this verb to the first row, we'd get 2 3 6 5 10.
It's fairly easy to get a list of booleans which alternate with each item, e.g., 0 1 $~{:$ m gives us 0 1 0 1 0. I thought, aha! I'll use something like +:`>: @. followed by some expression, but I could never quite get it to work.
Any suggestions?
UPDATE
The following appears to work, but perhaps it can be refactored into something more elegant by a J pro.
poop =: monad define
(($ y) $ 0 1 $~{:$ y) ((]+:)`(]>:) @. [)"0 y
)
I would use the oblique adverb with rank 1 (/."1), so that it applies to successive elements of each list in turn.
You can pass a gerund to /. and it applies the verbs in order, extending cyclically.
+:`>: /."1 m
2
3
6
5
10
12
8
16
10
20
22
13
26
15
30
32
18
36
20
40
42
23
46
25
50
52
28
56
30
60
62
33
66
35
70
72
38
76
40
80
I spent a long time looking at it, and I believe I know why ,@ works to recover the shape of the argument.
The shape of the arguments to the parenthesized phrase is the shape of the argument passed to it on the right, even though the rank is altered by the " conjunction (well, that is what trace called it, I thought it was an adverb). If , were monadic, it would be a ravel, and the result would be a vector or at least of a lower rank than the input, based on adverbs to ravel. That is what happens if you take the conjunction out: you get a vector.
So what I believe is happening is that the conjunction is making , act like a dyadic , which is called an append. The append alters what it is appending to what it is appending to. It is appending to nothing but that thing still has a shape, and so it ends up altering the intermediate vector back to the shape of the input.
Now I'm probably wrong. But $ ,"0@(+:`>:/.)"1 >: i. 2 4 5 gives 2 4 5 1 1, which I thought sort of proved my case.
(,@(+:`>:/.)"1 a) works, but note that ((* 2 1 $~ $)@(+ 0 1 $~ $)"1 a) would also have worked (and is about 20 times faster on large arrays, in my brief tests).
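For readers who do not speak J, the intended transformation can be sketched in Python as a reference for checking results: within each row, items at even positions (counting from 0) are doubled and items at odd positions are incremented. The function name `alternate_double_increment` is made up for illustration, and the nested list `m` is built to mirror `>: i. 2 4 5`.

```python
def alternate_double_increment(row):
    """Double the items at even positions, add 1 to those at odd positions."""
    return [2 * x if i % 2 == 0 else x + 1 for i, x in enumerate(row)]


# Mirror of `>: i. 2 4 5`: the integers 1..40 arranged as 2 planes of 4 rows of 5.
m = [[[5 * (4 * p + r) + c + 1 for c in range(5)] for r in range(4)]
     for p in range(2)]

# Apply the row function to every row, keeping the 2 x 4 x 5 shape.
result = [[alternate_double_increment(row) for row in plane] for plane in m]

for plane in result:
    for row in plane:
        print(*row)
    print()
```

The first row comes out as 2 3 6 5 10, matching the example in the question, and the overall shape stays 2 by 4 by 5.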