Convert between priorities of two systems - algorithm

I have two "systems", each with some records.
System A has records with priorities from 1-4 (let's say minSystemAP = 4, maxSystemAP = 1).
System A Priority:
4 - Low, 3 - Medium, 2 - High, 1 - Critical.
System B has records with priorities from 1-10 (let's say minSystemBP = 10, maxSystemBP = 1).
System B Priority: 1 - Minimum, 5 - Medium, 10 - High.
I'm trying to create a record from System B in System A.
How can I "convert" a priority in System B to a priority in System A?
Meaning, a record with priority 10 in System B should become a record with priority ~4 in System A,
and a record with priority 5 in System B should become a record with priority ~2 in System A.
What is the best way to do that?

The mapping must assign multiple priorities in System B to one priority in System A. E.g.:
A 1 1 2 2 2 3 3 3 4 4
B 1 2 3 4 5 6 7 8 9 10
This mapping function meets your requirements and can be implemented simply as integer division:
int prioA = prioB / 3 + 1;
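For reference, a generalization of the same idea as a sketch in Python (my illustration, not from the answer; the parameter names are mine): scale the position of prioB within System B's range onto System A's range and round. With the ranges from the question, it reproduces the table above exactly.
def convert(prio_b, min_b=1, max_b=10, min_a=1, max_a=4):
    # map prio_b's position in [min_b, max_b] linearly onto [min_a, max_a]
    return round(min_a + (prio_b - min_b) * (max_a - min_a) / (max_b - min_b))
print([convert(p) for p in range(1, 11)])  # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]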

Algorithm to distribute evenly products value into care packages

I'm currently solving a problem that states:
A company filed for bankruptcy and decided to pay its employees with the last remaining valuable items in the company, but only if the items can be distributed evenly among them, so that every employee receives at least one item and the difference between the basket value of the employee carrying the most valuable items and that of the employee carrying the least valuable items does not exceed a certain value x.
Input:
First row contains the number of employees;
Second row contains the value x that the difference between the most valuable and the least valuable baskets may not exceed;
Third row contains all the items with their values.
Output:
The first number is the value of the least valuable basket and the second is the value of the most valuable basket.
Example:
Input:
5
4
2 5 3 11 4 3 1 15 7 8 10
Output:
13 15
Input:
5
4
1 1 1 11 1 3 1 2 7 8
Output:
NO (It's impossible to distribute evenly)
Input:
5
10
1 1 1 1
Output:
NO (It's impossible to distribute evenly)
My solution for this problem, taking the first input, is to sort the items in ascending or descending order, so from
2 5 3 11 4 3 1 15 7 8 10 --> 1 2 3 3 4 5 7 8 10 11 15
then create an adjacency list, or just store it in simple variables, where we add the biggest remaining number to the basket with the lowest sum while iterating over the item values array:
Element 0: 15
Element 1: 11 <- 3 (sum 14)
Element 2: 10 <- 3 (sum 13)
Element 3: 8 <- 4 <- 1 (sum 13)
Element 4: 7 <- 5 <- 2 (sum 14)
So my solution runs in O(n log n + 2n): the first part uses merge sort, and then a pass finds the max and min basket values. What do you guys think about this solution?
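A minimal sketch of that greedy in Python (my illustration, not from the question; a heap keeps the currently lowest basket on top):
import heapq
def distribute(n_employees, x, items):
    if len(items) < n_employees:
        return "NO"  # someone would receive no item
    items = sorted(items, reverse=True)
    baskets = items[:n_employees]      # seed baskets with the n largest items
    heapq.heapify(baskets)
    for v in items[n_employees:]:
        heapq.heappush(baskets, heapq.heappop(baskets) + v)  # biggest remaining -> lowest basket
    lo, hi = min(baskets), max(baskets)
    return f"{lo} {hi}" if hi - lo <= x else "NO"
print(distribute(5, 4, [2, 5, 3, 11, 4, 3, 1, 15, 7, 8, 10]))  # 13 15
print(distribute(5, 4, [1, 1, 1, 11, 1, 3, 1, 2, 7, 8]))       # NO
print(distribute(5, 10, [1, 1, 1, 1]))                         # NO
This implements the greedy exactly as you describe it; whether that greedy is optimal for every input is a separate question.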

From edge or arc list to clusters in Stata

I have a Stata dataset that represents connections between users that looks like this:
src_user linked_user
1        2
2        3
3        5
1        4
6        7
I would like to get something like this:
user cluster
1    1
2    1
3    1
4    1
5    1
6    2
7    2
where isid user evaluates to TRUE and I have grouped all users into disjoint clusters. I have tried thinking of this as a reshape problem, but without much success. None of the user-written SNA commands seem to accomplish this as far as I can tell.
What is the most efficient way of doing this in Stata, other than looping, which I am eager to avoid?
If you reshape the data to long form, you can use group_id (from SSC) to get what you want.
clear
input user1 user2
1 2
2 3
3 5
1 4
6 7
end
* one observation per edge; reshape long so each row is (edge id, user)
gen id = _n
reshape long user, i(id) j(n)
* start with one cluster per edge, then merge clusters that share a user
clonevar cluster = id
list, sepby(cluster)
group_id cluster, match(user)
* keep a single observation per user
bysort cluster user (id): keep if _n == 1
list, sepby(cluster)
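For reference outside Stata: what group_id does here amounts to computing connected components of the user graph. A minimal union-find sketch in Python (my illustration, not part of the answer):
def cluster(edges):
    parent = {}
    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    for a, b in edges:
        parent[find(a)] = find(b)          # union the two components
    labels = {}
    # number clusters 1, 2, ... in order of first appearance
    return {u: labels.setdefault(find(u), len(labels) + 1) for u in parent}
print(cluster([(1, 2), (2, 3), (3, 5), (1, 4), (6, 7)]))
# {1: 1, 2: 1, 3: 1, 5: 1, 4: 1, 6: 2, 7: 2}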

KDB: pnl in FIFO manner

Consider the following table:
Id Verb Qty Price
`1 Buy  6   10.0
`2 Sell 5   11.0
`3 Buy  4   10.0
`4 Sell 3   11.0
`5 Sell 8   9.0
`6 Buy  1   8.0
etc...
What I would like is to associate a PnL with each transaction, computed on a FIFO (first-in-first-out) basis. Thus, for Id=`1, I want the PnL to be -6*(10.0) + 5*(11.0) + 1*(11.0) = +$6.00; for Id=`3, the PnL is -4*(10.0) + 2*(11.0) + 2*(9.0) = $0; etc.
In layman's terms: for the first buy order of size 6, I want to offset it with the first 6 sells, and for the second buy order of size 4, offset it with the subsequent 4 sells that have not already been consumed by the PnL computation for the buy-6 order.
Any advice?
Take data from your example:
txn:([] t: til 6; side:`Buy`Sell`Buy`Sell`Sell`Buy; qty:6 5 4 3 8 1; px: 10.0 11.0 10.0 11.0 9.0 8.0)
It's best to maintain buy and sell transactions/fills separately in your database:
buys: select from txn where side=`Buy
sells: select from txn where side=`Sell
Functions we'll need [1]:
/ first-in first-out allocation of bid/buy and ask/sell fills
/ returns connectivity matrix of (b)id fills in rows and (a)sk fills in columns
fifo: {deltas each deltas sums[x] &\: sums[y]};
/ connectivity list from connectivity matrix
lm: {raze(til count x),''where each x};
/ realized profit & loss
rpnl: {[b;s]
t: l,'f ./: l:lm (f:fifo[exec qty from b;exec qty from s])>0;
pnl: (select bt:t, bqty:qty, bpx:px from b#t[;0]),'(select st:t, sqty:qty, spx:px from s#t[;1]),'([] qty: t[;2]);
select tstamp: bt|st, rpnl:qty*spx-bpx from pnl
}
Run:
q)rpnl[buys;sells]
tstamp rpnl
-----------
1      5
3      1
3      2
4      -2
5      1
According to my timings, this should be ~2x faster than the next best solution, since it's nicely vectorized.
Footnotes:
The fifo function is a textbook example from Q for Mortals. For your data, it looks like this:
q)fifo[exec qty from buys;exec qty from sells]
5 1 0
0 2 2
0 0 1
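For readers less familiar with q, the same allocation can be sketched in plain Python (my illustration, not part of the answer; entry (i, j) is the quantity of buy i filled by sell j):
from itertools import accumulate
def fifo(buy_qtys, sell_qtys):
    cb = list(accumulate(buy_qtys))    # running buy totals: 6 10 11
    cs = list(accumulate(sell_qtys))   # running sell totals: 5 8 16
    m = [[min(b, s) for s in cs] for b in cb]
    # q's "deltas each deltas": difference down the rows, then along each row
    rows = [m[0]] + [[x - y for x, y in zip(m[i], m[i - 1])] for i in range(1, len(m))]
    return [[r[0], *(r[j] - r[j - 1] for j in range(1, len(r)))] for r in rows]
print(fifo([6, 4, 1], [5, 3, 8]))  # [[5, 1, 0], [0, 2, 2], [0, 0, 1]]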
The lm function tells which buy and sell pairs were crossed (non-zero fills). More background here: [kdb+/q]: Convert adjacency matrix to adjacency list
q)lm fifo[exec qty from buys;exec qty from sells]>0
0 0
0 1
1 1
1 2
2 2
The cryptic first line of rpnl is then a combination of the two concepts above:
q)t: l,'f ./: l:lm (f:fifo[exec qty from buys;exec qty from sells])>0;
0 0 5
0 1 1
1 1 2
1 2 2
2 2 1
A similar approach to JPC, but keeping things tabular:
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)tab
Id Verb Qty Price
-----------------
1  Buy  6   10
2  Sell 5   11
3  Buy  4   10
4  Sell 3   11
5  Sell 8   9
6  Buy  1   8
pnlinfo:{[x;y]
b:exec first'[(Qty;Price)] from x where Id=y;
r:exec (remQty;fifo[remQty;b 0];Price) from x where Verb=`Sell;
x:update remQty:r 1 from x where Verb=`Sell;
update pnl:neg[(*) . b]+sum[r[2]*r[0]-r[1]] from x where Id=y
};
fifo:{x-deltas y&sums x};
pnlinfo/[update remQty:Qty from tab where Verb=`Sell;exec Id from tab where Verb=`Buy]
Id Verb Qty Price remQty pnl
----------------------------
1  Buy  6   10           6
2  Sell 5   11    0
3  Buy  4   10           0
4  Sell 3   11    0
5  Sell 8   9     5
6  Buy  1   8            1
Assumes that Buys will be offset against previous sells as well as future sells.
You could also in theory use other distributions such as
lifo:{x-reverse deltas y&sums reverse x}
but I haven't tested that.
Here is a first attempt to get the ball rolling. Not efficient.
q)t:([]id:1+til 6;v:`b`s`b`s`s`b;qty:6 5 4 3 8 1; px:10 11 10 11 9 8)
//how much of each sale offsets a given purchase
q)alloc:last each (enlist d`s){(fx-c;c:deltas y&sums fx:first x)}\(d:exec qty by v from t)`b
//revenues, ie allocated sale * appropriate price
q)revs:alloc*\:exec px from t where v=`s
q)(sum each revs)-exec qty*px from t where v=`b
6 0 1
Slightly different approach, without using over/scan (except in sums...).
Here we create a list of duplicated indices (one per unit of Qty) for every Sell order and use cut to assign them to the appropriate Buy order; then we index into the Price of those Sells and take the difference with the Price of the appropriate Buy order.
This should scale with table size, but memory will blow up when Qty is large.
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)sideMap:`Buy`Sell!1 -1
q)update pnl:sum each neg Price - Price{sells:where neg 0&x; -1_(count[sells]&0,sums 0|x) _ sells}Qty*sideMap[Verb] from tab
Id Verb Qty Price pnl
---------------------
1  Buy  6   10    6
2  Sell 5   11    0
3  Buy  4   10    0
4  Sell 3   11    0
5  Sell 8   9     0
6  Buy  1   8     1
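For comparison with the q solutions above, here is a plain-Python FIFO matcher (my sketch, not from any of the answers; like the second answer, it also offsets buys against earlier unmatched sells). It reproduces the per-lot PnL of rpnl:
from collections import deque
def fifo_pnl(trades):
    buys, sells, fills = deque(), deque(), []
    for side, qty, px in trades:
        book, other = (buys, sells) if side == "Buy" else (sells, buys)
        while qty and other:                      # cross against resting opposite lots
            oqty, opx = other[0]
            matched = min(qty, oqty)
            bpx, spx = (px, opx) if side == "Buy" else (opx, px)
            fills.append(matched * (spx - bpx))   # realized PnL of this lot
            qty -= matched
            if matched == oqty:
                other.popleft()
            else:
                other[0] = (oqty - matched, opx)
        if qty:
            book.append((qty, px))                # remainder rests in the book
    return fills
trades = [("Buy", 6, 10.0), ("Sell", 5, 11.0), ("Buy", 4, 10.0),
          ("Sell", 3, 11.0), ("Sell", 8, 9.0), ("Buy", 1, 8.0)]
print(fifo_pnl(trades))  # [5.0, 1.0, 2.0, -2.0, 1.0], matching rpnl above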

File sharding between servers algorithm

I want to distribute files across multiple servers and have them available with very little overhead. So I was thinking of the following naive algorithm:
Provided that each file has a unique ID number, e.g. 120151, I'm thinking of segmenting the files using the modulo (%) operator. This works if I know the number of servers in advance:
Example with 2 servers (stands for n servers):
server 1 : ID % 2 = 0 (contains even IDs)
server 2 : ID % 2 = 1 (contains odd IDs)
However, when I need to scale and add more servers, I will have to re-shuffle the files to obey the new algorithm's rules, and we don't want that.
Example:
Say I add server 3 into the mix because I cannot handle the load. Server 3 will contain files that respect the following criterion:
server 3 : ID%3 = 2
Step 1 is to move the files from server 1 and server 2 where ID%3 = 2.
However, I'll have to move some files between server 1 and server 2 so that the following occurs:
server 1 : ID%3 = 0
server 2 : ID%3 = 1
What's the optimal way to achieve this?
My approach would be to use consistent hashing. From Wikipedia:
Consistent hashing is a special kind of hashing such that when a hash table is resized and consistent hashing is used, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots.
The general idea is this:
Think of your servers as arranged on a ring, ordered by their server_id
Each server is assigned a uniformly distributed (random) id, e.g. server_id = SHA(node_name).
Each file is equally assigned a uniformly distributed id, e.g. file_id = SHA(ID), where ID is as given in your example.
Choose the server that is 'closest' to the file_id, i.e. where server_id > file_id (start choosing with the smallest server_id).
If there is no such node, wrap around to the start of the ring.
Note: you can use any hash function that generates uniformly distributed hashes, so long as you use the same hash function for both servers and files.
This way, you keep O(1) access, and adding/removing servers is straightforward and does not require reshuffling all files:
a) when adding a new server, the new node takes over the files with ids lower than its own from the next node on the ring;
b) when removing a server, all of its files are handed to the next node on the ring.
Tom White's graphically illustrated overview explains in more detail.
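A minimal sketch of such a ring in Python (my illustration; the class and server names are made up):
import hashlib
from bisect import bisect
def h(key):
    # any uniformly distributed hash works, as long as servers and files share it
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)
class Ring:
    def __init__(self, server_names):
        self.ring = sorted((h(name), name) for name in server_names)
        self.ids = [server_id for server_id, _ in self.ring]
    def server_for(self, file_id):
        # first server_id > file_id, wrapping around if none
        i = bisect(self.ids, h(str(file_id))) % len(self.ring)
        return self.ring[i][1]
ring = Ring(["server1", "server2", "server3"])
print(ring.server_for(120151))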
To summarize your requirements:
Each server should store an (almost) equal amount of files.
You should be able to determine which server holds a given file - based only on the file's ID, in O(1).
When adding a file, requirements 1 and 2 should hold.
When adding a server, you want to move some files to it from all existing servers, such that requirements 1 and 2 still hold.
Your strategy when adding a 3rd server (x is the file's ID):
x%6 Old New
0 0 0
1 1 1
2 0 --> 2
3 1 --> 0
4 0 --> 1
5 1 --> 2
Alternative strategy:
x%6 Old New
0 0 0
1 1 1
2 0 0
3 1 1
4 0 --> 2
5 1 --> 2
To locate a server after the change:
0: x%6 in [0,2]
1: x%6 in [1,3]
2: x%6 in [4,5]
Adding a 4th server:
x%12 Old New
0 0 0
1 1 1
2 0 0
3 1 1
4 2 2
5 2 2
6 0 0
7 1 1
8 0 --> 3
9 1 --> 3
10 2 2
11 2 --> 3
To locate a server after the change:
0: x%12 in [0,2, 6]
1: x%12 in [1,3, 7]
2: x%12 in [4,5,10]
3: x%12 in [8,9,11]
When you add a server, you can always build a new function (actually, several alternative functions). The value of the divisor for n servers equals lcm(1,2,...,n), so it grows very fast.
Note that you didn't mention whether files are ever removed, or whether you plan to handle that.
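A sketch of this lookup-table strategy in Python (my illustration; table3 encodes the "alternative strategy" for three servers above):
table3 = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2, 5: 2}  # residue mod 6 -> server; 6 = lcm(1,2,3)
def locate(file_id, table):
    return table[file_id % len(table)]
print(locate(120151, table3))  # 120151 % 6 = 1 -> server 1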

What are some good ways to calculate a score for how different or close 2 users' choices are?

For example, suppose users rank chocolate, ice cream, donut, ..., in order of preference.
If user 1 chooses
A B C D E F G H I J
and user 2 chooses
J A B C I G F E D H
what are some good ways to calculate a score from 0 to 100 that tells how close their choices are? It has to make sense: if most answers are the same and just 1 or 2 differ, the score cannot come out extremely low. And if most answers are merely shifted by one position, we cannot count them as "all different" and hand out a score of 0 for those one-position differences.
Assign each item an integer value starting at 1:
A=1, B=2, C=3, D=4, E=5, F=6 (stopping at F for simplicity).
Then consider the position each item is placed in, and use that position as a multiplier:
if an item comes first, its multiplier is 1; if it is the 6th item, the multiplier is 6.
Figure out the maximum score you could have (when everything is in consecutive order):
item a b c d e f
order 1 2 3 4 5 6
value 1 2 3 4 5 6
score 1 4 9 16 25 36 Sum = 91, Score = 100% (MAX)
item a b d c e f
order 1 2 3 4 5 6
value 1 2 4 3 5 6
score 1 4 12 12 25 36 Sum = 90 Score = 99%
=======================
order 1 2 3 4 5 6
item f d b c e a
value 6 4 2 3 5 1
score 6 8 6 12 25 6 Sum = 63 Score = 69%
order 1 2 3 4 5 6
item d f b c e a
value 4 6 2 3 5 1
score 4 12 6 12 25 6 Sum = 65 Score = 71%
Obviously this is a very crude scheme that I just came up with, and it may not work for everything. Examples 3 and 4 differ by a single swap, yet their scores are 2% apart (versus examples 1 and 2, which are 1% apart). It's just a thought; I'm no algorithm expert. You could probably take the final number and transform it further for a better numerical comparison.
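For what it's worth, the scheme sketches out like this in Python (my code, matching the worked examples above):
def crude_score(ranking):
    value = {item: i + 1 for i, item in enumerate(sorted(ranking))}
    total = sum(value[item] * pos for pos, item in enumerate(ranking, start=1))
    best = sum((i + 1) ** 2 for i in range(len(ranking)))  # consecutive order
    return round(100 * total / best)
print(crude_score("abcdef"))  # 100
print(crude_score("abdcef"))  # 99
print(crude_score("fdbcea"))  # 69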
You could:
Calculate the edit distance between the sequences;
Subtract the edit distance from the sequence length;
Divide that by the length of the sequence;
Multiply it by a hundred:
Score = 100 * (SequenceLength - Levenshtein(Sequence1, Sequence2)) / SequenceLength
Edit distance is basically the number of operations required to transform sequence one into sequence two. A standard algorithm for it is the Levenshtein distance.
Examples:
Weights
insert: 1
delete: 1
substitute: 1
Seq 1: ABCDEFGHIJ
Seq 2: JABCIGFEDH
Score = 100 * (10-7) / 10 = 30
Seq 1: ABCDEFGHIJ
Seq 2: ABDCFGHIEJ
Score = 100 * (10-3) / 10 = 70
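A compact Python version of this scoring (my sketch; unit weights for insert/delete/substitute, as above):
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete
                           cur[j - 1] + 1,            # insert
                           prev[j - 1] + (ca != cb))) # substitute
        prev = cur
    return prev[-1]
def score(s1, s2):
    return 100 * (len(s1) - levenshtein(s1, s2)) // len(s1)
print(score("ABCDEFGHIJ", "JABCIGFEDH"))  # 30
print(score("ABCDEFGHIJ", "ABDCFGHIEJ"))  # 70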
The most straightforward way to calculate this is the Levenshtein distance, i.e. the number of changes that must be made to transform one string into another.
A disadvantage of the Levenshtein distance for your task is that it doesn't measure closeness between the products themselves, i.e. you will not know how close A and J are to each other. For example, user 1 may like donuts and user 2 may like buns, and you may know that most people who like the first also like the second. From this information you can infer that user 1 makes choices close to those of user 2, even though their lists don't share the same elements.
If this is your case, you will have to use one of two things: statistical methods to infer correlations between choices, or a recommendation engine.
