Caching using Discrete Time Markov Chains and Probability

Suppose that a web server has three web pages, labelled 1, 2, and 3. The probabilities that a user moves from one page to another are:
P(1->1) = 0, P(1->2) = x, P(1->3) = 1-x
P(2->1) = y, P(2->2) = 0, P(2->3) = 1-y
P(3->1) = 0, P(3->2) = 1, P(3->3) = 0
(For example, when a user is currently at page 1, they request page 2 next with probability x and page 3 with probability (1-x).) Assume that 0 < x < y < 1/2. Suppose that the web server's cache has enough memory to store two pages. Whenever a request is for a page that is not in the cache, the browser will store that page in the cache, replacing the page least likely to be requested next. For example, if the cache contained pages 2 and 3, and page 1 was requested, the cache would be updated to contain pages 1 and 3 (since x < 1-x).
(a) Find the proportion of time (requests) that the cache contains pages 1 and 2. (Hint: be careful about your choice of state.)
(b) Find the probability of a cache miss (a request is not available in the cache).
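One way to sanity-check a hand derivation is to simulate the policy exactly as stated and count. The sketch below is only that: the function name simulate_cache, the starting state (page 1 with pages 1 and 3 cached), the step count and the seed are all assumptions of mine, not part of the question. The eviction rule uses the transition row of the page that was just requested, which is how I read the worked example (x < 1-x).

```python
import random


def simulate_cache(x, y, steps=200_000, seed=0):
    """Estimate (share of requests with cache == {1, 2}, miss rate)."""
    rng = random.Random(seed)
    # transition[p] lists (next_page, probability) pairs from the question.
    transition = {
        1: [(2, x), (3, 1 - x)],
        2: [(1, y), (3, 1 - y)],
        3: [(2, 1.0)],
    }
    page, cache = 1, frozenset({1, 3})  # assumed starting state
    misses = 0
    cache_is_1_2 = 0
    for _ in range(steps):
        pages, probs = zip(*transition[page])
        nxt = rng.choices(pages, weights=probs)[0]
        if nxt not in cache:
            misses += 1
            # Evict the cached page that is least likely to follow nxt.
            follow = dict(transition[nxt])
            evict = min(cache, key=lambda p: follow.get(p, 0.0))
            cache = (cache - {evict}) | {nxt}
        page = nxt
        if cache == {1, 2}:
            cache_is_1_2 += 1
    return cache_is_1_2 / steps, misses / steps


if __name__ == "__main__":
    print(simulate_cache(0.2, 0.4))
```

The state tracked here is the pair (current page, cache contents), which is the kind of state the hint seems to point at: the cache contents alone do not determine what happens next.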

Split array into four boxes such that sum of XOR's of the boxes is maximum

Given an array of integers which are needed to be split into four
boxes such that sum of XOR's of the boxes is maximum.
I/P -- [1,2,1,2,1,2]
O/P -- 9
Explanation: Box1--[1,2]
Box2--[1,2]
Box3--[1,2]
Box4--[]
I've tried using recursion but failed for larger test cases as the
Time Complexity is exponential. I'm expecting a solution using dynamic
programming.
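def max_Xor(b1, b2, b3, b4, A, index, size):
    # Base case: every number has been placed, so sum the XOR of each box.
    if index == size:
        return b1 + b2 + b3 + b4
    # Try A[index] in each of the four boxes and keep the best total.
    m = max(max_Xor(b1 ^ A[index], b2, b3, b4, A, index + 1, size),
            max_Xor(b1, b2 ^ A[index], b3, b4, A, index + 1, size),
            max_Xor(b1, b2, b3 ^ A[index], b4, A, index + 1, size),
            max_Xor(b1, b2, b3, b4 ^ A[index], A, index + 1, size))
    return m

def main():
    A = [1, 2, 1, 2, 1, 2]  # sample input from above
    print(max_Xor(0, 0, 0, 0, A, 0, len(A)))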
Thanks in Advance!!

There are several things to speed up your algorithm:
Build in some start-up logic: it doesn't make sense to put anything into box 3 until boxes 1 & 2 are differentiated. In fact, you should generally have an order of precedence to keep you from repeating configurations in a different order.
Memoize your logic; this avoids repeating computations.
For large cases, take advantage of what value algebra exists.
This last item may turn out to be the biggest saving. For instance, if your longest numbers include several 5-bit and 4-bit numbers, it makes no sense to consider shorter numbers until you've placed those decently in the boxes, gaining maximum advantage for the leading bits. With only four boxes, you cannot have a sum from 3-bit numbers that dominates a single misplaced 5-bit number.
Your goal is to place an odd number of 5-bit numbers into 3 or all 4 boxes; against this, check only whether this "pessimizes" bit 4 of the remaining numbers. For instance, given six 5-bit numbers (range 16-31) and a handful of small ones (0-7), your first consideration is to handle only combinations that partition the 5-bit numbers by (3, 1, 1, 1), as this leaves that valuable fifth bit turned on in each set.
With a more even mixture of values in your input, you'll also need to consider how to distribute the 4-bits for a similar "keep it odd" heuristic. Note that, as you work from largest to smallest, you need worry only about keeping it odd, and watching the following bit.
These techniques should let you prune your recursion enough to finish in time.

We can use Dynamic programming here to break the problem into smaller sets then store their result in a table. Then use already stored result to calculate answer for bigger set.
For example:
Input -- [1,2,1,2,1,2]
We need to divide the array into 4 consecutive boxes such that the sum of the XORs of all boxes is maximised.
Let's take your test case, break the problem into smaller sets, and start solving for the smaller sets.
box = 1, num = [1,2,1,2,1,2]
ans = 1 3 2 0 1 3
Since we only have one box, all numbers will go into this box. We will store this answer in a table. Let's call the matrix DP.
DP[0] = [1 3 2 0 1 3]
DP[i][j] stores the answer for distributing numbers 0..j among i+1 boxes (rows are 0-indexed, so DP[0] is the one-box row).
Now let's take the case where we have two boxes, and take the numbers one by one.
num = [1]: since we only have one number, it will go into the first box.
DP[1][0] = 1
Let's add another number.
num = [1 2]
Now there are two ways to put this new number into the boxes.
case 1: 2 will go to the First box. Since we already have answer
for both numbers in one box. we will just use that.
answer = DP[0][1] + 0 (Second box is empty)
case 2: 2 will go to second box.
answer = DP[0][0] + 2 (only 2 is present in the second box)
Maximum of the two cases will be stored in DP[1][1].
DP[1][1] = max(3+0, 1+2) = 3.
Now for num = [1 2 1].
Again for new number we have three cases.
box1 = [1 2 1], box2 = [], DP[0][2] + 0
box1 = [1 2], box2 = [1], DP[0][1] + 1
box1 = [1 ], box2 = [2 1], DP[0][0] + 2^1
Maximum of these three will be answer for DP[1][2].
Similarly we can find the answer for num = [1 2 1 2 1 2], box = 4 (one row per number of boxes):
1 3 2 0 1 3
1 3 4 6 5 3
1 3 4 6 7 9
1 3 4 6 7 9
Also note that a xor b xor a = b. You can use this property (with prefix XORs) to get the XOR of any segment of the array in constant time, as suggested in the comments.
This way you can break the problem into smaller subsets and use the answers for the smaller sets to compute the bigger ones. Hope this helps. After understanding the concept you can go ahead and implement it in better than exponential time.
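The table-filling above could be sketched roughly as follows. This is only an illustration of the consecutive-boxes reading used in this answer (the question itself does not say the boxes must be contiguous), and the names max_xor_split, prefix and segment_xor are my own.

```python
def max_xor_split(nums, boxes=4):
    """Return the DP table; dp[b][j] is the best sum of box XORs when
    nums[0..j] is cut into at most b + 1 consecutive boxes."""
    n = len(nums)
    # prefix[i] is the XOR of nums[0..i-1], so any segment XOR is O(1).
    prefix = [0] * (n + 1)
    for i, value in enumerate(nums):
        prefix[i + 1] = prefix[i] ^ value

    def segment_xor(lo, hi):
        # XOR of nums[lo..hi], inclusive on both ends.
        return prefix[hi + 1] ^ prefix[lo]

    dp = [[0] * n for _ in range(boxes)]
    for j in range(n):
        dp[0][j] = segment_xor(0, j)  # one box holds everything
    for b in range(1, boxes):
        for j in range(n):
            best = dp[b - 1][j]  # the newest box stays empty
            for k in range(j):
                # Earlier boxes cover nums[0..k], newest box nums[k+1..j].
                best = max(best, dp[b - 1][k] + segment_xor(k + 1, j))
            dp[b][j] = best
    return dp


if __name__ == "__main__":
    for row in max_xor_split([1, 2, 1, 2, 1, 2]):
        print(*row)
```

For the sample input this prints the same four rows as the table above, and the answer is the last entry of the last row. The three nested loops make it O(boxes * n^2).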

I would go bit by bit from the highest bit to the lowest bit. For every bit, try all combinations that distribute the still unused numbers that have that bit set so that an odd number of them is in each box, nothing else matters. Pick the best path overall. One issue that complicates this greedy method is that two boxes with a lower bit set can equal one box with the next higher bit set.
Alternatively, memoize the boxes state in your recursion as an ordered tuple.
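A minimal sketch of the memoized variant, assuming numbers may go into any box (as in the question's recursion). Sorting the box XORs into a tuple makes permutations of the same configuration share one cache entry; the name max_xor_sum is hypothetical, and the worst case can still be large when many distinct XOR values are reachable.

```python
from functools import lru_cache


def max_xor_sum(values, boxes=4):
    values = tuple(values)

    @lru_cache(maxsize=None)
    def best(index, state):
        # state is the sorted tuple of the current XOR of each box.
        if index == len(values):
            return sum(state)
        result = 0
        tried = set()
        for i, box in enumerate(state):
            if box in tried:
                continue  # boxes with equal XOR are interchangeable
            tried.add(box)
            nxt = list(state)
            nxt[i] ^= values[index]
            result = max(result, best(index + 1, tuple(sorted(nxt))))
        return result

    return best(0, (0,) * boxes)


if __name__ == "__main__":
    print(max_xor_sum([1, 2, 1, 2, 1, 2]))
```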

dynamic programming problem, finding optimal visits to stores

Say we have a list of stores that contain some value when visited.
E.g. store_value = [2,4,9,1,4,2].
Running from store to store to collect the value has some cost, e.g. run_cost = [0,1,2,3,1,2].
That is, if I run to collect value 9 at store i = 3 (not 0-indexed), it will have the cost 2, which means I wouldn't have been able to visit the 2 previous stores, because of the cost required. Consider it the amount rested before running to store i.
Now using dynamic programming, we could define V(x,i), where V(0,i) is the maximal value obtainable after the first i stores if we DO NOT run to store i, and V(1,i) is the maximal value obtainable after the first i stores if we DO run to store i.
What would V(0,i) and V(1,i) look like, running from store i = 1..6?
I tried running the algorithm, but something tells me I am doing something wrong.
From what I could gather:
V(0,1) = 0, V(1,1) = 2
From here on is where I think I'm wrong:
V(0,2) = 2, V(1,2) = 4 ... and so forth
If someone could help me understand how I should think about this problem, I'd appreciate it a lot.

An easier formulation would be to define V(i) as the maximum value that can be achieved with stores 1..i. The recursive definition is then:
V(i) = max(
V(i - 1), //do not visit store i
store_value[i] + V(i - run_cost[i] - 1) //visit store i
)
Some care needs to be taken when run_cost is 0.
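A short sketch of that recurrence, using 1-based store numbers as in the question and V(0) = 0. Clamping the index at 0 when the run cost reaches back past the first store is my assumption, and the function name max_store_value is made up.

```python
def max_store_value(store_value, run_cost):
    """Return [V(0), V(1), ..., V(n)] for the recurrence above."""
    n = len(store_value)
    best = [0] * (n + 1)  # best[0] = V(0): no stores, no value
    for i in range(1, n + 1):
        # Visiting store i rules out the run_cost stores just before it.
        prev = max(0, i - run_cost[i - 1] - 1)
        skip = best[i - 1]
        visit = store_value[i - 1] + best[prev]
        best[i] = max(skip, visit)
    return best


if __name__ == "__main__":
    print(max_store_value([2, 4, 9, 1, 4, 2], [0, 1, 2, 3, 1, 2]))
```

With the question's data this gives V(1..6) = 2, 4, 9, 9, 13, 13, where 13 comes from visiting stores 3 and 5.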

Best way to distribute a given resource (eg. budget) for optimal output

I am trying to find a solution in which a given resource (eg. budget) will be best distributed to different options which yields different results on the resource provided.
Let's say I have N = 1200 and some functions. (a, b, c, d are some unknown variables)
f1(x) = a * x
f2(x) = b * x^c
f3(x) = a*x + b*x^2 + c*x^3
f4(x) = d^x
f5(x) = log x^d
...
Also, let's say there are n of these functions, each yielding a different result based on its input x, where x = 0 or x >= m, and m is a constant.
Although I am not able to find exact formula for the given functions, I am able to find the output. This means that I can do:
X = f1(N1) + f2(N2) + f3(N3) + ... + fn(Nn) where (N1 + ... Nn) = N as many times as there are ways of distributing N into n numbers, and find a specific case where X is the greatest.
How would I actually go about finding the best distribution of N with the least computation power, using whatever libraries currently available?

If you are happy with allocations constrained to be whole numbers then there is a dynamic programming solution that fills a table of O(Nn) entries (each entry scans up to N ways of splitting, so roughly O(N^2 n) time in total). You can increase accuracy by scaling if you want, but this will increase CPU time.
For each i=1 to n maintain an array where element j gives the maximum yield using only the first i functions giving them a total allowance of j.
For i=1 this is simply the result of f1().
For i=k+1, when working out the result for j, consider each possible way of splitting j units between f_{k+1}() and the table that tells you the best return from a distribution among the first k functions - so you can calculate the table for i=k+1 using the table created for k.
At the end you get the best possible return for n functions and N resources. It is easier to find out what that best answer is if you maintain a set of arrays telling you the best way to distribute k units among the first i functions, for all possible values of i and k. Then (with n = 100, say) you can look up the best allocation for f100(), subtract the amount allocated to f100() from N, look up the best allocation for f99() given the resulting resources, and carry on like this until you have worked out the best allocations for all f().
As an example suppose f1(x) = 2x, f2(x) = x^2 and f3(x) = 3 if x>0 and 0 otherwise. Suppose we have 3 units of resource.
The first table is just f1(x) which is 0, 2, 4, 6 for 0,1,2,3 units.
The second table is the best you can do using f1(x) and f2(x) for 0,1,2,3 units and is 0, 2, 4, 9, switching from f1 to f2 at x=2.
The third table is 0, 3, 5, 9. I can get 3 and 5 by using 1 unit for f3() and the rest for the best solution in the second table. 9 is simply the best solution in the second table - there is no better solution using 3 resources that gives any of them to f3().
So 9 is the best answer here. One way to work out how to get there is to keep the tables around and retrace that answer. 9 comes from f3(0) + 9 from the second table, so all 3 units are available to f2() + f1(). The second table's 9 comes from f2(3), so there are no units left for f1() and we get f1(0) + f2(3) + f3(0).
When you are working out the resources to use at stage i=k+1, you have a table from i=k that tells you exactly the result to expect from the resources you have left over after you have decided to use some at stage i=k+1. The best distribution does not become incorrect, because at stage i=k you have worked out the result for the best distribution given every possible number of remaining resources.
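As a rough sketch of the tables described above (the function and variable names are mine, and it ignores the question's "x = 0 or x >= m" restriction, which could be handled by skipping the disallowed unit counts in the inner loop):

```python
def best_allocation(funcs, total):
    """Return (tables, allocation) for whole-unit budgets 0..total."""
    # First table: all j units go to the first function.
    best = [funcs[0](j) for j in range(total + 1)]
    tables = [best]
    # choices[i][j] = units given to function i when the first i + 1
    # functions share j units optimally.
    choices = [list(range(total + 1))]
    for f in funcs[1:]:
        new_best, pick = [], []
        for j in range(total + 1):
            top, top_units = None, 0
            for units in range(j + 1):
                value = f(units) + best[j - units]
                if top is None or value > top:
                    top, top_units = value, units
            new_best.append(top)
            pick.append(top_units)
        best = new_best
        tables.append(new_best)
        choices.append(pick)
    # Walk back from the last function to recover who gets what.
    allocation = [0] * len(funcs)
    remaining = total
    for i in range(len(funcs) - 1, -1, -1):
        allocation[i] = choices[i][remaining]
        remaining -= allocation[i]
    return tables, allocation


if __name__ == "__main__":
    example = [lambda x: 2 * x, lambda x: x ** 2, lambda x: 3 if x > 0 else 0]
    print(best_allocation(example, 3))
```

On the three example functions with 3 units this reproduces the tables 0, 2, 4, 6 / 0, 2, 4, 9 / 0, 3, 5, 9 and the allocation f1(0) + f2(3) + f3(0).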

Linear Hashing calculation?

I am currently studying for my exams and have come up against this question:
(5d) Suppose we are using linear hashing, and start with an empty table with 2 buckets (M = 2), split = 0 and a load factor of 0.9. Explain the steps we go through when the following hashes are added (in order):
5,7,12,11,9
The answer provided for this is:
*— —5— (0,1)
* — —5,7 —
split —*—5,7— — (0,1,2)
—12*—5,7— — —
split —12—5—*—7— (0,1,2,3)
split =M, M = 2*M, split = 0
*—12—5— —7—
*—12—5— —7,11—
split —*—5— —7,11—12— (0,1,2,3,4)
—*—5,9— —7,11—12—
split — —9*— —7,11—12—5— (0,1,2,3,4,5)
This answer doesn't make any sense to me and the lecturer did not go through this.
How do I tackle this question?

I edited your question because the answer looks like a list of descriptions of the hash table state as each operation is performed. Did your professor cover linear hashing at all? The Wikipedia description doesn't mention a load factor precisely, but it's in the original LH paper by Witold Litwin, and it's integral to when a controlled split occurs. I also found these descriptions:
Let l denote the Linear Hashing scheme’s load factor, i.e., l = S/b where S is the total number of records and b is the number of buckets used.
Linear Hashing by Zhang, et al (PDF)
The linear hashing algorithm performs splits in a deterministic order, rather than splitting at a bucket that overflowed. The splits are performed in linear order (bucket 0 first, then bucket 1, then 2, ...), and a split is performed when any bucket overflows. If the bucket that overflows is not the bucket that is split (which is the common case), overflow techniques such as chaining are used, but the common case is that few overflow buckets are needed.
snip
Instead of splitting on every collision, you can do a split when the "load" (which is bytes stored / (num buckets * bucket size), i.e. utilization of the data structure) crosses some watermark. This is called controlled splitting; the previously described is called uncontrolled splitting.
Linear Hashing: A new Tool for File and Table Addressing Witold Litwin, Summary by: Steve Gribble and Armando Fox, Online Berkley.edu retrieved June 16
So basically, a load factor is a means of predictably controlling when a split will occur. One implementation of linear hashing, apparently called 'uncontrolled split', adds a new bucket and performs a split whenever a collision occurs. Using a load factor of 0.9 only has a split occur when 90% of the table's buckets are full - or rather, would be full based on the prediction that items are assigned to buckets uniformly.
Based on this and the Wikipedia article I just read, the setup is this:
Table is initially empty with two buckets (N = 2) - - (numbered 0 and 1)
N for number of buckets makes so much more sense to me than M, so I'm using that in my answer.
Apparently N is never changed even as new buckets are added to the table.
Our growth factor (L for bucket level) is 0. It is incremented every time every bucket in the table has been split once, which coincides with when our table has doubled in size.
Step pointer S (also called a split pointer) points to 0th bucket. It indicates which bucket will have a split applied to it next.
This follows the wikipedia article description I linked to above. Now we need to cover the hash and bucket assignment.
A decent hash function for integers you expect to have a normal distribution is to just use the integer itself. So for an input integer I, our hash H(I) is just I. I think this follows the answer key, which is good because the question is unanswerable without knowing H.
To determine which bucket an integer I is added to, one of two function values will be used, depending on whether or not the assignment points to before or after S.
First, calculate H(I) mod (N x 2^L), which is really just I mod (N x 2^L). I'm going to call this B(I) below for brevity (also for bucket). Call this the assignment address A.
If A is greater than or equal to S, we assign input I to address A and move on.
If A (B(I)) is less than S, we actually use a different hash function, which I'll call B'(I), calculated as I mod (N x 2^(L+1)), giving us an actual assignment address of A'.
I think the reasoning for this is to keep the assignment to buckets more even as buckets are split along the way, but I don't have the mathematical proof of its importance.
I think the * in the answer's notation above denotes the location of the split pointer S. In my notation for the rest of the question below:
Let - denote an empty bucket, i denote a bucket with the Integer i in it, and i,j denote a bucket with both i and j in it.
So the first step of your answer key "— —5— (0,1)" is saying bucket 0 is empty and bucket 1 has 5 in it. I would rewrite this as - 5 for clarity.
I'm thinking your answer breakdown reads like this:
Add 5 to the table.
The linear hashing algorithm puts it into the second bucket (index 1) because:
B(5) = 5 mod (2 x 2^0) = 5 mod (2 x 1) = 5 mod 2 = 1
1 is greater than S, which is still 0, so we use 1 as the address.
Table now has - 5 (0th bucket empty, 1st bucket with 5 in it).
N, L, and S are unchanged
Add 7 to the table.
B(7) = 7 mod 2 = 1, so 7 is added to the same bucket as 5. S still hasn't changed, so again 1 is used as the address.
Table now has - 5,7
A split occurs! Not because a bucket has overflowed, but because the load factor has been exceeded. 2 items added, 2 total buckets, 2/2 = 1.0 > 0.9 = do a split.
First a new bucket is added at the end of the table.
S is incremented to 1. N is not incremented. L is unchanged
The split is done on a bucket. A split means all the items in the bucket get their assignment recalculated based on the new hash table size. However, one key to linear hashing is that the actual buckets are split in order, so the 0th bucket is split even though the 1st bucket is the one that's full.
Post split, the table is now - 5,7 -, with buckets 0 and 2 empty, and 1 still with 5 and 7 in it.
Add 12 to the table.
B(12) = 12 mod (2 x 2^0) = 12 mod 2 = 0
S is 1 and B(12) is 0, so we calculate B'(12) instead for our address.
Coincidentally, this is 12 mod (2 x 2^(0+1)) = 12 mod 4, which is still 0, so 12 is added to the 0th bucket.
Table now has 12 5,7 -, only the 3rd, new bucket is empty.
A split occurs again, because 3/3 = 1.0 > 0.9. This split promises to be more interesting than the last!
A new bucket is added to the end of the table, giving us 12 5,7 - -
S = 1, so the bucket with 5,7 is split. That means new buckets are picked for 5 and 7.
Increment S to 2. This is done after the split target bucket is picked, but before the new buckets are assigned. This ensures the new table is more evenly distributed (again, my supposition, don't have proof).
5 mod 2 = 1, 1 < S, calculate 5 mod (2 x 2^1) = 5 mod 4 = 1. 5 is re-assigned to its same bucket.
7 mod 2 = 1, 1 < S, calculate 7 mod (2 x 2^1) = 7 mod 4 = 3. 7 is re-assigned to 3.
Table now has 12 5 - 7
S = 2, N still equals 2, and L still = 0. S has now reached N x 2^L = 2 x 2^0 = 2, so S is reset to 0 and L is incremented to 1.
Add 11 to the table.
B(11) = 11 mod (2 x 2^1) = 11 mod 4 = 3. 11 is assigned to the 3rd bucket.
Table now has 12 5 - 7,11, 4 items and 4 buckets, so a split occurs again.
S is 0 again, so the 0th bucket with 12 is reassigned after a new bucket is added. S is incremented to 1 before choosing a new bucket for 12.
B(12) = 12 mod (2 x 2^1) = 12 mod 4 = 0. 0 < 1, so recalculate
B'(12) = 12 mod (2 x 2^(1+1)) = 12 mod 8 = 4. 12 is assigned to the 4th bucket.
Table now contains - 5 - 7,11 12
Add 9 to the table.
I'll leave the steps to the last one for you. There are a few nuances to the LH algorithm that I'm not quite grasping. I might ask additional questions about them. But hopefully that's enough for you to get going on. In the future, I would recommend asking the course instructor directly.
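If it helps to check your own trace, here is a small simulation of the procedure as I described it above. It is a sketch under the same assumptions (H(I) = I, a split whenever items / buckets exceeds the load factor, buckets split in pointer order), and the class and attribute names are my own, not from your course notes.

```python
class LinearHashTable:
    def __init__(self, initial_buckets=2, max_load=0.9):
        self.n = initial_buckets  # N: never changes
        self.level = 0            # L: number of completed doublings
        self.split = 0            # S: next bucket to split
        self.max_load = max_load
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def address(self, key):
        a = key % (self.n * 2 ** self.level)            # B(I)
        if a < self.split:
            # Bucket already split this round: use B'(I).
            a = key % (self.n * 2 ** (self.level + 1))
        return a

    def insert(self, key):
        self.buckets[self.address(key)].append(key)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split()

    def _split(self):
        self.buckets.append([])
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        wider = self.n * 2 ** (self.level + 1)
        for key in old:
            # Each key stays put or moves to the newly added bucket.
            self.buckets[key % wider].append(key)
        self.split += 1
        if self.split == self.n * 2 ** self.level:
            self.split = 0
            self.level += 1


if __name__ == "__main__":
    table = LinearHashTable()
    for key in (5, 7, 12, 11, 9):
        table.insert(key)
        print(key, table.buckets, "S =", table.split, "L =", table.level)
```

Printing the table after each insert lets you compare every intermediate state against the lines of the answer key.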

Generate a random interval where probability of n appearing is 1/n

Let's say we have a news website with 100 pages, each displaying several articles, and we want to parse the website regularly to keep statistics on the number of comments per article.
The number of comments on an article will change rapidly for new articles (on the first pages), and really slowly for very old articles (on the last pages).
So I want to parse the first pages far more often than the last pages.
A solution to this problem I imagined would be each time to generate an interval of the pages we want to parse, with the additional requirement that n in this interval would have a probability 1/n of appearing.
For example, we would parse the page 1 every time.
The page 2 would appear in the interval half of the time.
The page 3, 1/3 of the time...
Our algorithm would then generate the 'interval' [1,1] most of the time. The interval [1,2] would be less likely, [1,3] even less ... and [1,100] would be really rare.
Do you see a way to implement this algorithm with the usual random function of most of the languages ?
Is there another way to solve the problem (parse more often the recent content on a website) making more sense ?
Thanks for your help.
edit:
Here is an implementation in Python based on the answer provided by #david-eisenstat.
I tried to implement the version with random() generating integers, but I obtain strange results.
from math import floor
from random import random

# return a number between 1 and n
def randPage(n):
    while True:
        r = floor(1 / (1 - random()))
        if r <= n:
            return r

If you have a function random() that returns doubles in the interval [0, 1), then you look at pages 1 to floor(1 / (1 - random())). Page n is examined if and only if the output of random() is in the interval [1 - 1/n, 1), which has length 1/n.
If you're using an integer random() function in the interval [0, RAND_MAX], then let k = random() and look at RAND_MAX / k pages if k != 0 or all of them if k == 0.
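For the floating-point version, a minimal sketch might look like this. Capping at max_page instead of re-drawing is my choice, so that page n still appears with probability 1/n for every n up to max_page; the names last_page and pages_to_parse are made up.

```python
import math
import random


def last_page(u, max_page):
    """Map a uniform draw u in [0, 1) to the last page to parse."""
    # Page n is included exactly when u >= 1 - 1/n.
    return min(max_page, math.floor(1 / (1 - u)))


def pages_to_parse(max_page, rng=random):
    """Return the interval [1, last] as a range, using one random draw."""
    return range(1, last_page(rng.random(), max_page) + 1)


if __name__ == "__main__":
    print(list(pages_to_parse(100)))
```

Note that rejecting and re-drawing values above the cap, instead of clamping them, changes the distribution of the last page.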
