KDB: pnl in FIFO manner - fifo

Consider the following table:
Id Verb Qty Price
`1 Buy 6 10.0
`2 Sell 5 11.0
`3 Buy 4 10.0
`4 Sell 3 11.0
`5 Sell 8 9.0
`6 Buy 1 8.0
etc...
What I would like is to associate a PNL with each transaction, computed on a FIFO (first-in-first-out basis). Thus, for Id=`1, I want the PNL to be -6*(10.0) +5*(11.0) + 1*(11.0) = +$6.00, for Id=`3, Pnl is -4*(10.0)+2*(11.0)+(2*9.0) = $0, etc.
In layman's terms, For the first buy-order of size 6, I want to offset this by the first 6 sells, and for the second buy-order of size 4, offset this with the subsequent 4 sells that have not been included in the pnl computation for the buy-6 order.
Any advice?

Take data from your example:
txn:([] t: til 6; side:`Buy`Sell`Buy`Sell`Sell`Buy; qty:6 5 4 3 8 1; px: 10.0 11.0 10.0 11.0 9.0 8.0)
Best to maintain buys and sells transactions/fills separately in your database:
buys: select from txn where side=`Buy
sells: select from txn where side=`Sell
Functions we'll need [1]:
/ first-in first-out allocation of bid/buy and ask/sell fills
/ returns connectivity matrix of (b)id fills in rows and (a)sk fills in columns
fifo: {deltas each deltas sums[x] &\: sums[y]};
/ connectivity list from connectivity matrix
lm: {raze(til count x),''where each x};
/ realized profit & loss
rpnl: {[b;s]
t: l,'f ./: l:lm (f:fifo[exec qty from b;exec qty from s])>0;
pnl: (select bt:t, bqty:qty, bpx:px from b#t[;0]),'(select st:t, sqty:qty, spx:px from s#t[;1]),'([] qty: t[;2]);
select tstamp: bt|st, rpnl:qty*spx-bpx from pnl
}
Run:
q)rpnl[buys;sells]
tstamp rpnl
-----------
1 5
3 1
3 2
4 -2
5 1
According to my timings, should be ~ 2x faster than the next best solution, since it's nicely vectorized.
Footnotes:
fifo function is a textbook example from Q for Mortals. In your case, it looks like this:
q)fifo[exec qty from buys;exec qty from sells]
5 1 0
0 2 2
0 0 1
lm function tells which buys and sell pairs were crossed (non-zero fills). More background here: [kdb+/q]: Convert adjacency matrix to adjacency list
q)lm fifo[exec qty from buys;exec qty from sells]>0
0 0
0 1
1 1
1 2
2 2
Cryptic first line of rpnl is then combination of the two concepts above:
q)t: l,'f ./: l:lm (f:fifo[exec qty from buys;exec qty from sells])>0;
0 0 5
0 1 1
1 1 2
1 2 2
2 2 1

A similar approach to JPC, but keeping things tabular:
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)tab
Id Verb Qty Price
-----------------
1 Buy 6 10
2 Sell 5 11
3 Buy 4 10
4 Sell 3 11
5 Sell 8 9
6 Buy 1 8
pnlinfo:{[x;y]
b:exec first'[(Qty;Price)] from x where Id=y;
r:exec (remQty;fifo[remQty;b 0];Price) from x where Verb=`Sell;
x:update remQty:r 1 from x where Verb=`Sell;
update pnl:neg[(*) . b]+sum[r[2]*r[0]-r[1]] from x where Id=y
};
fifo:{x-deltas y&sums x};
pnlinfo/[update remQty:Qty from tab where Verb=`Sell;exec Id from tab where Verb=`Buy]
Id Verb Qty Price remQty pnl
----------------------------
1 Buy 6 10 6
2 Sell 5 11 0
3 Buy 4 10 0
4 Sell 3 11 0
5 Sell 8 9 5
6 Buy 1 8 1
Assumes that Buys will be offset against previous sells as well as future sells.
You could also in theory use other distributions such as
lifo:{x-reverse deltas y&sums reverse x}
but I haven't tested that.

Here is a first attempt to get the ball rolling. Not efficient.
q)t:([]id:1+til 6;v:`b`s`b`s`s`b;qty:6 5 4 3 8 1; px:10 11 10 11 9 8)
//how much of each sale offsets a given purchase
q)alloc:last each (enlist d`s){(fx-c;c:deltas y&sums fx:first x)}\(d:exec qty by v from t)`b
//revenues, ie allocated sale * appropriate price
q)revs:alloc*\:exec px from t where v=`s
q)(sum each revs)-exec qty*px from t where v=`b
6 0 1

Slightly different approach without using over/scan(except in sums...).
Here we create a list of duplicated indices(one per unit Qty) of every Sell order and use cut to assign them to the appropriate Buy order, then we index into the Price of those Sells and find the difference with the Price of the appropriate Buy order.
This should scale with table size, but memory will blow up when Qty is large.
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)sideMap:`Buy`Sell!1 -1
q)update pnl:sum each neg Price - Price{sells:where neg 0&x; -1_(count[sells]&0,sums 0|x) _ sells}Qty*sideMap[Verb] from tab
Id Verb Qty Price pnl
---------------------
1 Buy 6 10 6
2 Sell 5 11 0
3 Buy 4 10 0
4 Sell 3 11 0
5 Sell 8 9 0
6 Buy 1 8 1

Related

Algorithm to distribute evenly products value into care packages

i'm currently solving a problem that states:
A company filed for bankruptcy and decided to pay the employees with the last remaining valuable items in the company only if it can be distributed evenly among them so that all of them have at least received 1 item and that the difference between the employee carrying the most valuable items and the employee carrying the least valuable items can not exceed a certain value x;
Input:
First row contains number of employee;
Second row contains the x value so that the the difference between the employee carrying the most valuable items and the employee carrying the least valuable items can not exceed;
Third row contains all the items with their value;
Output:
First number is the least valuable basket of items value and the second is the most valuable basket;
Example:
Input:
5
4
2 5 3 11 4 3 1 15 7 8 10
Output:
13 15
Input:
5
4
1 1 1 11 1 3 1 2 7 8
Output:
NO (It's impossible to distribute evenly)
Input:
5
10
1 1 1 1
Output:
NO (It's impossible to distribute evenly)
My solution to resolve this problem taking the first input is to, sort the items in ascending or descending order so from
2 5 3 11 4 3 1 15 7 8 10 --> 1 2 3 3 4 5 7 8 10 11 15
then create an adjacency list or just store it in simple variables where we add the biggest number to the lowest basket while iterating the item values array
Element 0: 15
Element 1: 11 <- 3 (sum 14)
Element 2: 10 <- 3 (sum 13)
Element 3: 8 <- 4 <- 1 (sum 13)
Element 4: 7 <- 5 <- 2 (sum 14)
So that my solution will have O(nlogN + 2n), first part using merge sort and then finding max e min value, what do you guys think about this solution?

Maximize the minimum score

Given a grid of dimensions A*B with values between 1-9, find a sequence of B numbers that maximizes the minimum number of values matched when compared with A rows.
Describe the certain steps you would take to maximize the minimum score.
Example:
Grid Dimension
A = 5 , B = 10
Grid Values
9 3 9 2 9 9 4 5 7 6
6 3 4 2 8 5 7 5 9 2
4 9 5 8 3 7 3 2 7 6
7 5 8 9 9 4 7 3 3 7
2 6 8 3 2 4 5 4 2 2
Possible Answer
6 3 8 2 9 4 7 5 7 4
Score Calculation
This answer scores
5 when compared with Row 1
5 when compared with Row 2
1 when compared with Row 3
4 when compared with Row 4
2 when compared with Row 5
And thus the minimal score for this answer is 1.
I would go for a local hill-climbing approach that you can complement with a randomization to avoid local minima. Something like:
1. Generate a random starting solution S
2. Compute its score score(S, row) for each row. We'll call min_score(S) the minimum score among all rows for S.
3. Attempt to improve the solution with:
For each digit i (1..B) in S:
If i belongs to a row such that score(S, row) > (min_score(S) + 1) then:
Change i to be the digit of a row with min_score(S). If there was only one row with min_score(S), then min_score(S) has improved by 1
Update the scores of all the rows.
If min_score(S) hasn't improved for more than N iterations of 3, go back to 1 and start with a new random solution.

Increase the numbers in apl

I have the following data:
a b c d
5 9 6 0
3 1 3 2
Characters in the first row, numbers in the second row.
How do I get the character corresponding to the highest number in the second row, and how do I increase the corresponding number in the second row? (For example, here, column b has the highest number, 9, so increase that number by 10%.)
I use Dyalog version 17.1.
With:
⎕←data←3 4⍴'a' 'b' 'c' 'd' 5 9 6 0 3 1 3 2
a b c d
5 9 6 0
3 1 3 2
You can extract the second row with:
2⌷data
5 9 6 0
Now grade it descending, that is, find the indices that would sort it from highest to lowest:
⍒2⌷data
2 3 1 4
The first number is the column we're looking for:
⊃⍒2⌷data
2
Now we can use this to extract the character from the first row:
data[⊂1,⊃⍒2⌷data]
b
But we only need the column index, not the actual character. The full index of the number we want to increase is:
2,⊃⍒2⌷data
2 2
Extracting the data to see that we got the right index:
data[⊂2,⊃⍒2⌷data]
9
Now we can either create a new array with the target value increased by 10%:
1.1×#(⊂2,⊃⍒2⌷data)⊢data
a b c d
5 9.9 6 0
3 1 3 2
Or change it in-place:
data[⊂2,⊃⍒2⌷data]×←1.1
data
a b c d
5 9.9 6 0
3 1 3 2
Try it online!

Making a scatterplot with PCA and how to read results

Im a little newbie with R and not familiar with PCA. My problem is, from a survey I have a list with observations from nine variables, first one is the gender of the respondents, the next five (Q51_1_c,Q51_2_c,Q51_4_c,Q51_6_c,Q51_7_c) ask about entrepreneurial issues and the others ask about future expectations (Q56_1_c, Q56_2_c, Q56_3_c). Except gender, all this variables takes values between 1 and 5. I want to make a scatter plot with two axis. First one with "entrepreneurial variables" and second axis with "future expectations variables" and then define as points in the scatter plot the position of Male and Female. My data look like this:
x <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
To run PCA this is my code:
x <- na.omit(x) #Jus to simplyfy
resul <- prcomp(x[,-1], scale = TRUE)
x$PC1 <- resul$x[,1] #Saving Scores PC1
x$PC2 <- resul$x[,2] #Saving Scores PC2
The result axis are like this:
biplot(resul, scale = 0)
Finally, to make the scatter plot:
x %>%
group_by(Q1b) %>%
summarise(mean_PC1 = mean(PC1),
mean_PC2 = mean(PC2)) %>%
ggplot(aes(x=mean_PC1, y=mean_PC2, colour=Q1b)) +
geom_point() +
theme_bw()
Which gives me this:
I'm not sure how about read the results... Should I accept that Females in general get higher values in the dimension of future expectations than Males. And Males get higher values in the entrepreneurial dimension?
Thanks in advance!!
Your interpretation of the axes looks correct, i.e., PC1 is a gradient which from left to right represents decreasing "entrepreneurialness", while PC2 is a gradient which from bottom to top represents increasing future expectations (assuming that "5" in the original data means highest entrepreneurialness/expectations).
In terms of whether males and females are different, you probably need to plot more than the just the means for each group: even if males and females are truly identical in their entrepreneurialness/expectations, you'd never expect the means from two samples to sit right on top of each other on a scatter plot. To address this, you could plot the actual observations rather than their means (i.e., one point per row, coloured by gender) and see if they intermingle vs. separate in the plot space. Or, regress gender against the principal components.
Another issue is whether it's appropriate to use PCA on ordinal data - see here for discussion.

Dynamic Programming - Two spies at the river

I think this is a very complicated dynamic programming problem.
Two spies each have a secret number in [1..m]. To exchange numbers they agree to meet at the river and "innocently" take turns throwing stones: from a pile of n=26 identical stones, each spy in turn throws at least one stone in the river.
The only information is in the number of stones each thrown in each turn. What is the largest m can be so they are sure they can complete the exchange?
Develop a recursive formula to count. Here is the start of the table; complete it to n=26. (You should not expect a closed form.)
n 1 2 3 4 5 6 7 8 9 10 11 12
m 1 1 1 2 2 3 4 6 8 12 16 23
Here are some hints from our professor: I suggest changing the problem to making the following table: Let R(n,m) be the range of numbers [1..R(n,m)] that A can indicate to B if they start with n stones, and both know that A has to also receive a number in [1..m] from B.
For example, if A needs no more information, R(n,1) can be computed by considering how many stones A could throw (one to n), then B thows 1 (if any remain) and A gets to decide again. The base cases R(0,1) = R(1,1) = 1, and you can write a recursive rule if you are careful at the boundaries. (You should find the Fibonacci numbers for R(n,1).)
If A needs information, then B has to send it by his or her choices, so things are a little more complicated. Here is the start of the table:
n\ m 1 2 3 4 5
0 1 0 0 0 0
1 1 0 0 0 0
2 2 0 0 0 0
3 3 1 0 0 0
4 5 2 1 0 0
5 8 4 2 1 1
6 13 7 4 3 2
7 21 12 8 6 4
8 34 20 15 11 8
9 55 33 27 19 16
From the R(n,m) table, how would you recover the entries of the earlier table (the table showing m as a function of n)?

Resources