A problem similar to the travelling salesman problem - algorithm

I have different features in my dataset, named as follows: A B C D E F G H.
There is a correlation between these features:
Features Correlation
----------------------
A B 70
A C 78
B C 96
A G 93
.
.
.
Therefore, I would like to group similar features together so they can be represented by one feature
Something Like this
Seed Group Correlations Avg
-----------------------------------
A D & G (98 + 93) / 2 = 95.5
B F & C & E (85 + 96 + 79) / 3 ≈ 86.7
..
..
..
H - -
So I get all closely correlated features in the same group.
Another view to the problem
multiple cities in the country (City A B C D.. H)
Each city has a connection to another city
Cities Connection %
----------------------
A B 70
A C 78
B C 96
A G 93
.
.
.
We would like to hire area managers such that cities with close connections can be served by the same area manager.
We want to have the optimal number of area managers and where they should reside
Office Area Other Served Areas Connection Avg
------------------------------------------------------
A D & G (98 + 93) / 2 = 95.5
B F & C & E (85 + 96 + 79) / 3 ≈ 86.7
..
..
..
H - -
I just want a method for splitting these features/cities in an optimal way, covering as many features/cities as possible with a minimum number of groups/area managers.
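One way to sketch the grouping described above (an illustrative approach, not the only method) is a greedy pass over the pairwise correlations, strongest first: merge each pair into a group when the correlation clears a threshold. The `group_features` helper, the sample pairs, and the 80-point threshold below are all assumptions made up for illustration:

```python
# Hypothetical sketch: greedy grouping of features by pairwise correlation.
# The 80-point threshold and the seed choice are illustrative assumptions.

def group_features(pairs, threshold=80):
    """pairs: list of (feat1, feat2, correlation). Returns {seed: set(members)}."""
    groups = {}    # seed feature -> set of grouped features
    assigned = {}  # feature -> the seed of its group
    # Visit the strongest correlations first.
    for a, b, corr in sorted(pairs, key=lambda p: -p[2]):
        if corr < threshold:
            break
        if a not in assigned and b not in assigned:
            # Neither feature is grouped yet: a becomes the seed.
            groups[a] = {b}
            assigned[a] = a
            assigned[b] = a
        elif a in assigned and b not in assigned:
            groups[assigned[a]].add(b)
            assigned[b] = assigned[a]
        elif b in assigned and a not in assigned:
            groups[assigned[b]].add(a)
            assigned[a] = assigned[b]
        # If both are already grouped, leave them where they are.
    return groups

pairs = [("A", "D", 98), ("A", "G", 93), ("B", "C", 96), ("B", "F", 85), ("A", "B", 70)]
g = group_features(pairs)
for seed in sorted(g):
    print(seed, "->", sorted(g[seed]))
# A -> ['D', 'G']
# B -> ['C', 'F']
```

With these sample values, A seeds {D, G} and B seeds {C, F}, matching the shape of the table in the question; the weak A-B pair (70) falls below the threshold and does not merge the two groups.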

Related

Creating a subset of data

I have a column called Project_Id which lists the names of many different projects, say Project A, Project B and so on. A second column lists the sales for each project.
A third column shows time series information. For example:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
B 22 1
B 38 2
B 76 3
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
D 14 1
D 62 2
From this dataset, I need to select (and thus create a new dataset from) only those projects which have at least 4 time series points. How do I do this with R code? The new dataset would be:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
Could someone please help?
Thanks a lot!
I tried to do some filtering with R but had little success.
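The question asks for R (in base R an `ave()` per group would do it, or dplyr's `group_by()` followed by `filter(n() >= 4)`); since the rest of this page uses Python, here is the same filtering logic sketched in plain Python on the example rows:

```python
from collections import Counter

# Rows from the example dataset: (Project_ID, Sales, TimeSeriesInfo).
rows = [
    ("A", 10, 1), ("A", 25, 2), ("A", 31, 3), ("A", 59, 4),
    ("B", 22, 1), ("B", 38, 2), ("B", 76, 3),
    ("C", 82, 1), ("C", 23, 2), ("C", 83, 3), ("C", 12, 4), ("C", 90, 5),
    ("D", 14, 1), ("D", 62, 2),
]

# Count how many time-series points each project has ...
counts = Counter(pid for pid, _, _ in rows)
# ... and keep only rows belonging to projects with at least 4 points.
subset = [row for row in rows if counts[row[0]] >= 4]

print(sorted({pid for pid, _, _ in subset}))  # ['A', 'C']
```

Projects A (4 points) and C (5 points) survive; B and D are dropped, as in the desired output.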

How to count number of occurrences in a sorted text file

I have a sorted text file with the following format:
Company1 Company2 Date TransactionAmount
A B 1/1/19 20000
A B 1/4/19 200000
A B 1/19/19 324
A C 2/1/19 3456
A C 2/1/19 663633
A D 1/6/19 3632
B C 1/9/19 84335
B C 1/23/19 253
B C 1/13/19 850
B D 1/1/19 234
B D 1/8/19 635
C D 1/9/19 749
C D 1/10/19 203200
Ultimately I want a Python dictionary so that each pair maps to a list containing the number of transactions and the total amount of all transactions. For instance, (A,B) would map to [3,220324].
The file has ~250,000 lines in this format and each pair may have 1 transaction up to ~10 or so transactions. There are also tens of thousands of pairs of companies.
Here's the only way I've thought of implementing it.
my_dict = {}
file = open("my_file.txt").readlines()[1:]
for i in file:
    i = i.split()
    pair = (i[0], i[1])
    amt = int(i[3])
    if pair in my_dict:
        exist = my_dict[pair]
        exist[0] += 1
        exist[1] += amt
        my_dict[pair] = exist
    else:
        my_dict[pair] = [1, amt]
I feel like there is a faster way to do this. Any ideas?
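One common speed-up (a sketch, untested against the real 250,000-line file) is to stream the file line by line instead of `readlines()`, and to use `collections.defaultdict` so each pair is looked up only once; the `tally` helper and the sample lines below are illustrative:

```python
from collections import defaultdict

def tally(lines):
    """Map (Company1, Company2) -> [transaction_count, total_amount]."""
    my_dict = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.split()
        entry = my_dict[(parts[0], parts[1])]  # single lookup, auto-created
        entry[0] += 1
        entry[1] += int(parts[3])
    return my_dict

# With the real file you would stream it rather than read it all:
# with open("my_file.txt") as f:
#     next(f)          # skip the header line
#     result = tally(f)
sample = [
    "A B 1/1/19 20000",
    "A B 1/4/19 200000",
    "A B 1/19/19 324",
    "A C 2/1/19 3456",
]
result = tally(sample)
print(result[("A", "B")])  # [3, 220324]
```

Streaming avoids holding all 250,000 lines in memory at once, and `defaultdict` removes the `if pair in my_dict` branch and the second dictionary lookup.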

Can someone explain what is happening with the final part of this RSA example?

The title may not be correct, I was unsure how to phrase my question.
I am attempting to program, in Python 3.6, an asymmetric cipher similar to (I believe) the one used for RSA-encrypted communication.
My understanding of the logic is as follows:
Person1 (p1) picks two prime numbers say 17 and 19
let p = 17 and q = 19
the product of these two numbers will be called n (n = p * q)
n = 323
p1 will then make public n
P1 will then make public another prime called e, e = 7
Person2(p2) wants to send p1 the letter H (72 in Ascii)
To do this p2 does the following ((72 ^ e) % n) and calls this value M
M = 13
p2 sends M to p1
p1 receives M and now needs to decrypt it
p1 can do this by calculating D, where (e*D) % ((p-1)*(q-1)) = 1
In this example i know D = 247
With D p1 can calculate p2 message using M^D % n
which successfully gives 72 ('H' in ASCII)
With this said, the following rule must apply:
GCD(e, m) = 1
where m = ((p-1)*(q-1)),
otherwise D with (e*D) % ((p-1)*(q-1)) = 1 does not exist.
Now comes my issue! :)
Calculating D when the numbers are not so easy to work with.
Please tell me if there is an easier way to calculate D, but this is where I got up to using online aid.
(The example I looked at online used different values, so they are as follows:)
p=47
q=71
n = p*q = 3337
(p-1)*(q-1) = 3220
e = 79
Now we must find D. We know (e*D) % ((p-1)*(q-1)) = 1
Therefore D = 79^-1 % 3220
The equation is rewritten as 79*d = 1 mod 3220
This is where I get confused
Using the regular Euclidean algorithm, gcd(79,3220) must equal 1, or there may not actually be a solution (are my descriptions correct here?)
3220 = 40*79 + 60 (79 goes into 3220 40 times with remainder 60)
79 = 1*60 + 19 (The last remainder 60 goes into 79 once with r 19)
60 = 3*19 + 3 (The last remainder 19 goes into 60 three times with r 3)
19 = 6*3 + 1 (The last remainder 3 goes into 19 6 times with r 1)
3 = 3*1 + 0 (The last remainder 1 goes into 3 three times with r 0)
The last nonzero remainder is the gcd. Thus gcd(79,3220) = 1 (as it should be)
The last step here is where I do not know what on earth is happening.
I am told to write the gcd (one) as a linear combination of 79 and 3220 by working back up the tree...
1 = 19-6*3
= 19-6*(60-3*19)
= 19*19 - 6*60
= 19*(79-60) - 6*60
= 19*79 - 25*60
= 19*79 - 25*(3220-40*79)
= 1019*79 - 25*3220
After this I am left with 1019*79 - 25*3220 = 1 and if i mod 3220 on both sides i get 1019*79 = 1 mod 3220
(the term that contains 3220 goes away because 3220 = 0 mod 3220).
Thus d = 1019.
So, the problem is to unwind the following sequence:
3220 = 40*79 + 60
79 = 1*60 + 19
60 = 3*19 + 3
19 = 6*3 + 1
3 = 3*1 + 0
First, forget the last row and start from the one with the last non-null remainder.
Then proceed step by step:
1 = 19 - 6*3 ; now expand 3
= 19 - 6*(60 - 3*19) = 19 - 6*60 + 18*19
= 19*19 - 6*60 ; now expand 19
= 19*(79 - 1*60) - 6*60 = 19*79 - 19*60 - 6*60
= 19*79 - 25*60 ; now expand 60
= 19*79 - 25*(3220 - 40*79) = 19*79 - 25*3220 + 1000*79
= 1019*79 - 25*3220 ; done
Note that the idea is to expand, at each step, the previous remainder. For instance, when expanding remainder 19 with: 79 - 1*60, you transform 19*19 - 6*60 into 19*(79 - 1*60) - 6*60. This lets you regroup around 79 and 60 and keep going.
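The whole back-substitution can be done in a single pass with the extended Euclidean algorithm; a short Python sketch (note that on Python 3.8+, the built-in `pow(79, -1, 3220)` returns the same inverse directly):

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # b*x + (a % b)*y == g  =>  a*y + b*(x - (a // b)*y) == g
    return g, y, x - (a // b) * y

def mod_inverse(e, m):
    """Return d with (e*d) % m == 1, or raise if gcd(e, m) != 1."""
    g, x, _ = extended_gcd(e, m)
    if g != 1:
        raise ValueError("no inverse: gcd(e, m) != 1")
    return x % m

print(mod_inverse(79, 3220))  # 1019 (matches the worked example)
print(mod_inverse(7, 288))    # 247  (matches the original p=17, q=19 example)
```

The recursion carries the linear-combination coefficients down and back up in one pass, so there is no need to write out and then unwind the division table by hand.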

Tableau LOD to find median

I have some data:
Inst Dest_Group Dest Cipn1 N
I1 C a 43
I1 F a 63
I1 U a 54
I1 C b 96
I1 F b 3
I1 U b 78
I1 C c 12
I1 F c 65
I1 U c 49
I2 C a 3
I2 F a 47
etc...
My worksheet is set up so that [Dest Cipn1] is a row, and [Dest Group] is a column. They display [value] as a bar chart. [value] = {include [Inst] : sum([N])} / {fixed [Inst] : sum([N])}
This worksheet is filtered on [Inst] = I1. I would like to add a reference line that shows the median value for each bar (cell) across all the [Inst]. (In the end I will add a band that displays 25th - 75th percentile but I figured working with the median would be simpler first).
I thought this would work, but it doesn't: [AllInstMedian] = {fixed [Inst],[Dest Group], [Dest Cipn1] : Sum([N])} / {fixed [Inst] : Sum([N])}
Any suggestions? I'm attaching a sample workbook here, hoping that helps.
This is cross-posted here
Thank you
Steve Mayer commented with an answer on the Tableau link posted in the question. I ended up using a LOOKUP trick to copy [Inst] and then used WINDOW_PERCENTILE table calculations for the 25th and 75th percentiles.

how to group photos with similar faces together

Most face recognition SDKs provide only two major functions:
detecting faces and extracting templates from photos; this is called detection.
comparing two templates and returning a similarity score; this is called recognition.
However, beyond those two functions, what I am looking for is an algorithm or SDK for grouping photos with similar faces together, e.g. based on similarity scores.
Thanks
First, perform step 1 to extract the templates, then compare each template with all the others by applying step 2 to all the possible pairs, obtaining their similarity scores.
Sort the matches by this similarity score, decide on a threshold, and group together those templates that exceed it.
Take, for instance, the following case:
Ten templates: A, B, C, D, E, F, G, H, I, J.
Scores between: 0 and 100.
Similarity threshold: 80.
Similarity table:
A B C D E F G H I J
A 100 85 8 0 1 50 55 88 90 10
B 85 100 5 30 99 60 15 23 8 2
C 8 5 100 60 16 80 29 33 5 8
D 0 30 60 100 50 50 34 18 2 66
E 1 99 16 50 100 8 3 2 19 6
F 50 60 80 50 8 100 20 55 13 90
G 55 15 29 34 3 20 100 51 57 16
H 88 23 33 18 2 55 51 100 8 0
I 90 8 5 2 19 13 57 8 100 3
J 10 2 8 66 6 90 16 0 3 100
Sorted matches list:
BE 99
AI 90
FJ 90
AH 88
AB 85
CF 80
------- <-- Threshold cutoff line
DJ 66
.......
Iterate through the list up to the threshold cutoff point, where the values no longer exceed it, maintaining a full templates set and an association group for each template, to obtain the final groups:
// Empty initial full templates set
fullSet = {};
// Iterate through the pairs list
foreach (templatePair : pairList)
{
    // If the full set contains the first template from the pair
    if (fullSet.contains(templatePair.first))
    {
        // Add the second template to its group
        templatePair.first.addTemplateToGroup(templatePair.second);
        // If the full set also contains the second template
        if (fullSet.contains(templatePair.second))
        {
            // The second template is removed from the full set
            fullSet.remove(templatePair.second);
            // The second template's group is merged into the first template's group
            templatePair.first.addGroupToGroup(templatePair.second.group);
        }
    }
    // If the full set contains only the second template from the pair
    else if (fullSet.contains(templatePair.second))
    {
        // Add the first template to the second template's group
        templatePair.second.addTemplateToGroup(templatePair.first);
    }
    else
    {
        // If neither template is present in the full set, add the first one
        // to the full set and the second one to the first one's group
        fullSet.add(templatePair.first);
        templatePair.first.addTemplateToGroup(templatePair.second);
    }
}
Execution details on the list:
BE: fullSet.add(B); B.addTemplateToGroup(E);
AI: fullSet.add(A); A.addTemplateToGroup(I);
FJ: fullSet.add(F); F.addTemplateToGroup(J);
AH: A.addTemplateToGroup(H);
AB: A.addTemplateToGroup(B); fullSet.remove(B); A.addGroupToGroup(B.group);
CF: F.addTemplateToGroup(C);
In the end, you end up with the following similarity groups:
A - I, H, B, E
F - J, C
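The merge bookkeeping in the pseudocode above is exactly what a disjoint-set (union-find) structure does; as a sketch (the function names are my own, and the pairs are the ones at or above the 80-point threshold from the table above), the same grouping in Python:

```python
def group_by_similarity(pairs, threshold=80):
    """pairs: list of (t1, t2, score). Union every pair with score >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:              # walk up to the root
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        if score >= threshold:
            parent[find(a)] = find(b)      # union the two groups

    # Collect the members of each root into a group.
    groups = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

# The matches at or above the threshold from the similarity table.
pairs = [("B", "E", 99), ("A", "I", 90), ("F", "J", 90),
         ("A", "H", 88), ("A", "B", 85), ("C", "F", 80)]
for g in group_by_similarity(pairs):
    print(sorted(g))
```

This yields the same two groups, {A, B, E, H, I} and {C, F, J}, without needing the explicit full-set/remove/merge bookkeeping, and union-find scales well when there are many templates.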
