Coding Question for SDE Position - Related to DP - data-structures

My friend took a test in which he got this question. I tried my best to solve it but couldn't get very far.
Can someone provide an approach to solve this question?
Problem statement: [image]
Input/Output test cases: [image]

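Let's define a function f(i, j) that gives the maximum total value of the diamonds taken from the first i boxes (1, 2, ..., i), given that diamond j was picked from box i.
Then the answer to your problem is $\max_j f(n, j)$ over $j = 1, 2, \dots, b_n$, where $b_n$ is the number of diamonds in box n.
We must pick exactly one diamond from each box, and we maximize the value by combining diamond j with the best choice from the previous box whose color differs from diamond j's, so the recurrence is:
$f(i, j) = V_j + \max_k f(i-1, k)$, where diamond k (in box i-1) must not have the same color as diamond j, and $f(1, j) = V_j$ for the first box.
To compute $\max_k f(i-1, k)$ efficiently, keep track of two values for each box i: the largest value of f(i, j), together with its color, and the largest value among diamonds whose color differs from that one. Simply keeping the two largest values is not enough, because if both share the color of the next diamond, neither can be used.
max1 = f(i, j1)
max2 = f(i, j2), where j2 is the best diamond in box i whose color differs from the color of j1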
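Here is the pseudocode (max1 and max2 each hold a value and a color; NONE means "no valid value"):
// Before box 1: an empty selection of value 0 whose color matches no diamond.
max1 = {value: 0, color: NONE};
max2 = NONE;
for (int i = 1; i <= N; i++) {
    for (int j = 1; j <= diamonds_of_box_i; j++) {
        f[i][j] = NONE;
        if (max1 != NONE && max1.color != diamond[i][j].color)
            f[i][j] = max1.value + V[i][j];
        else if (max2 != NONE)
            f[i][j] = max2.value + V[i][j]; // max2.color differs from max1.color, which equals this diamond's color
    }
    // Summarize box i for the next box.
    max1 = max2 = NONE;
    for (int j = 1; j <= diamonds_of_box_i; j++) {
        if (f[i][j] == NONE) continue;
        if (max1 == NONE || f[i][j] > max1.value) {
            // The old best becomes the runner-up only if its color differs.
            if (max1 != NONE && max1.color != diamond[i][j].color)
                max2 = max1;
            max1 = {value: f[i][j], color: diamond[i][j].color};
        }
        else if (diamond[i][j].color != max1.color && (max2 == NONE || f[i][j] > max2.value)) {
            max2 = {value: f[i][j], color: diamond[i][j].color};
        }
    }
}
res = NONE; // stays NONE if no valid selection exists
for (int j = 1; j <= diamonds_of_box_N; j++) {
    if (f[N][j] != NONE && (res == NONE || f[N][j] > res))
        res = f[N][j];
}
print(res);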
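A runnable C++ sketch of the same recurrence follows. The original input format is not shown, so the Diamond struct (a value plus an integer color) and the name bestChain are assumptions for illustration; std::nullopt plays the role of the pseudocode's NONE.

```cpp
#include <optional>
#include <vector>

// Hypothetical input format: one vector of diamonds per box.
struct Diamond {
    long long value;
    int color;
};

// Maximum total value when exactly one diamond is taken from every box and
// diamonds from neighbouring boxes never share a color; nullopt if impossible.
std::optional<long long> bestChain(const std::vector<std::vector<Diamond>>& boxes) {
    if (boxes.empty()) return 0;
    // Summary of f(i-1, *): best1 is the best value (with color best1Color) and
    // best2 is the best value among diamonds whose color differs from best1Color.
    std::optional<long long> best1, best2;
    int best1Color = 0;
    bool firstBox = true;
    for (const auto& box : boxes) {
        std::optional<long long> next1, next2;
        int next1Color = 0;
        for (const Diamond& d : box) {
            // max over k of f(i-1, k) with color(k) != d.color
            std::optional<long long> prev;
            if (firstBox) prev = 0;
            else if (best1 && best1Color != d.color) prev = best1;
            else prev = best2;  // may be empty: d cannot extend any valid chain
            if (!prev) continue;
            const long long f = *prev + d.value;  // f(i, j)
            if (!next1 || f > *next1) {
                // The old best keeps its role as runner-up only if its color differs.
                if (next1 && next1Color != d.color) next2 = next1;
                next1 = f;
                next1Color = d.color;
            } else if (d.color != next1Color && (!next2 || f > *next2)) {
                next2 = f;
            }
        }
        if (!next1) return std::nullopt;  // no diamond in this box can be used
        best1 = next1;
        best2 = next2;
        best1Color = next1Color;
        firstBox = false;
    }
    return best1;
}
```

For example, with boxes {(10, color 0), (9, color 0), (1, color 1)} and {(5, color 0)}, the answer is 1 + 5 = 6: the two largest values in the first box share a color, which is why the second slot tracks a different color rather than the second-largest value.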

Your first step is to determine whether a given set of inputs has any solution at all. One way a solution would be impossible is if two consecutive boxes each contain only one color, and it is the same color. The problem uses numbers, but I like to think in colors: say a box is red and the next box is also red. That's no good; the condition fails.
But it gets worse: what if a box is red, the next box is red and orange, and the NEXT box is just orange? If you have to take red in the first box, then you must take orange in the second box. But you also must take orange in the third box!
You start by constructing a matrix of these boxes, keeping track of which gems are inside them and which of those gems are eligible to be chosen. Iterating over the matrix, take each box with only one color and exclude that color from the boxes on either side. Then iterate again, since some boxes will now have only one color left (like the red/orange box in the previous example). If a box ends up with no valid choices, there is no solution. There may be some edge case I'm missing, but hopefully you get the idea.
This concept is actually very similar to how Sudoku solvers work; consider looking at those for inspiration.
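The elimination step described above can be sketched in C++ as follows. Each box is reduced to the set of colors still allowed in it (a hypothetical representation, since the problem's real input format isn't shown), and the made-up function propagateColors repeats the "a single-color box excludes that color from its neighbours" rule until nothing changes. Because each constraint only links neighbouring boxes in a chain, this propagation is what CSP texts call arc consistency, and on a chain that is enough: if no box empties, some valid pick exists.

```cpp
#include <set>
#include <vector>

// allowed[i]: colors still selectable in box i; modified in place.
// Returns false if some box runs out of colors (no valid selection exists).
bool propagateColors(std::vector<std::set<int>>& allowed) {
    const int n = static_cast<int>(allowed.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; ++i) {
            if (allowed[i].empty()) return false;
            if (allowed[i].size() != 1) continue;
            // Box i is forced to this color, so its neighbours cannot use it.
            const int forced = *allowed[i].begin();
            for (int nb : {i - 1, i + 1}) {
                if (nb >= 0 && nb < n && allowed[nb].erase(forced) > 0) changed = true;
            }
        }
    }
    return true;  // a full pass made no change and found no empty box
}
```

With the red / red-and-orange / orange example from above encoded as {0}, {0, 1}, {1}, the function returns false; with {0}, {0, 1}, {0, 1} it returns true and narrows the boxes to {0}, {1}, {0}.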
If every box has a valid choice, then it's time to optimize for value. I would start by identifying the largest-value diamond in each box. Obviously you want to pick it as often as possible. A naive algorithm might iterate over the boxes in a single pass, picking the highest gem, or the second-highest gem if the highest gem's color was already taken by the previous box. A more advanced algorithm might look ahead at the next few boxes (or the entire rest of the set?) to determine the best combination. There's probably a more advanced way to handle it that I'm not seeing, but again, this should point an interested party in the right direction.

Related

How to deal with draws by repetition in a transposition table?

I'm trying to solve Three Men's Morris. The details of the game don't matter, other than that it's a game similar to tic-tac-toe, but players may be able to force a win from some positions, or force the game to repeat forever by playing the same moves over and over in other positions. So I want to write a function that tells whether a player can force a win, or force a draw by repetition.
I've tried using simple negamax, which works fine but is way too slow to traverse the game tree with unlimited depth. I want to use transposition tables, since the number of possible positions is very low (<6000), but that's where my problem comes from. As soon as I add the transposition table (just a list of all fully searched positions and their values, 0, 1, or -1), the AI starts making weird moves, suddenly saying it's a draw in positions where I have a forced win.
I think the problem comes from transposition table entries being saved as draws, since it seemed to work when I limited the depth and only saved forced wins, but I'm not sure how to fix the problem and allow for unlimited depth.
Here's the code in case there's an issue with my implementation:
int evaluate(ThreeMensMorris &board){
    //game is won or drawn
    if(board.isGameWon()) return -1; //current player lost
    if(board.isRepetition()) return 0; //draw by repetition
    //check if this position is already in the transposition table
    //if so, return its value
    uint32_t pos = board.getPosInt();
    for(int i = 0; i < transIdx; i++)
        if(transList[i] == pos)
            return valueList[i];
    //negamax
    //NOTE: moves are formatted as two numbers, "from" and "to",
    //where "to" is -1 to place a piece for the first time
    //so this nested for loop goes over all possible moves
    int bestValue = -100;
    for(int i = 0; i < 9; i++){
        for(int j = -1; j < 9; j++){
            if(!board.makeMove(i, j)) continue; //illegal move
            int value = -1 * evaluate(board);
            board.unmakeMove(i, j);
            if(value > bestValue) bestValue = value;
        }
    }
    //we have a new position complete with a value, push it to the end of the list
    transList[transIdx] = pos;
    valueList[transIdx] = bestValue;
    transIdx++;
    return bestValue;
}
I suggest you start by looking at transposition tables for chess: https://www.chessprogramming.org/Transposition_Table. You need to give each game state an (almost) unique number, e.g. through Zobrist hashing; maybe this is what you do in board.getPosInt()?
A possible fault is that you don't consider whose turn it is. Even if the position on the board is the same, it is not the same position if in one case it is player A's turn and in the other player B's. Are there other things to consider in this game? In chess there are things like en passant possibilities that need to be considered, and other special cases, to know whether a position is actually the same, not just the pieces themselves.
Transposition tables are really complex and super hard to debug unfortunately. I hope you get it to work though!
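A different route from the one in the answer, and not something the question's code does: because the game has fewer than 6000 positions, the whole game graph can be solved by retrograde analysis instead of a depth-first search with a cache. Starting from lost positions, a position is a Win if some move reaches a position that is lost for the opponent, and a Loss once every move reaches a position that is won for the opponent. Everything never labelled is a Draw: neither side can force a result, so the game can be kept cycling. This avoids storing a 0 that was only valid for one particular move history. In this sketch, solveGraph, succ and lost are hypothetical names; each node must encode the board and the side to move, as the answer points out, and a position with no legal moves that is not already lost is assumed to be a draw.

```cpp
#include <queue>
#include <vector>

enum class Result { Draw, Win, Loss };  // from the point of view of the side to move

// succ[p]: positions reachable by one legal move from p (the opponent moves next).
// lost[p]: true if the side to move in p has already lost (like isGameWon()).
std::vector<Result> solveGraph(const std::vector<std::vector<int>>& succ,
                               const std::vector<bool>& lost) {
    const int n = static_cast<int>(succ.size());
    std::vector<std::vector<int>> pred(n);
    std::vector<int> undecidedMoves(n);
    for (int p = 0; p < n; ++p) {
        undecidedMoves[p] = static_cast<int>(succ[p].size());
        for (int q : succ[p]) pred[q].push_back(p);
    }
    std::vector<Result> res(n, Result::Draw);
    std::vector<bool> decided(n, false);
    std::queue<int> work;
    for (int p = 0; p < n; ++p) {
        if (lost[p]) { res[p] = Result::Loss; decided[p] = true; work.push(p); }
    }
    while (!work.empty()) {
        const int p = work.front();
        work.pop();
        for (int q : pred[p]) {
            if (decided[q]) continue;
            if (res[p] == Result::Loss) {
                // q can move into a position that is lost for the opponent.
                res[q] = Result::Win; decided[q] = true; work.push(q);
            } else if (--undecidedMoves[q] == 0) {
                // Every move from q lets the opponent win.
                res[q] = Result::Loss; decided[q] = true; work.push(q);
            }
        }
    }
    return res;  // anything still undecided stays Draw
}
```

Since every position is labelled once, the cost is linear in the number of positions plus moves, which is tiny for a graph of this size.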

2D House Robber Algorithm

This seems to be a variation of the LeetCode House Robber problem, but I found it significantly harder to tackle:
There are houses laid out on an NxN grid. Each house is known to contain some amount of valuables. The robber's task is to rob as many houses as possible to maximize the amount of loot. However, there is a security system in place, and if you rob two adjacent houses (to the left, right, above or below) an alarm will go off. Find the maximum loot the robber can rob.
Houses:         Alignment:
10 20 10        0 1 0
20 40 20   =>   1 0 1
10 20 10        0 1 0
This alignment results in the maximum of 80.
I've learned how to solve the optimum selection of houses for a single row with dynamic programming from https://shanzi.gitbooks.io/algorithm-notes/problem_solutions/house_robber.html:
public class HouseRobber {
    public int rob(int[] nums) {
        if (nums.length == 0) return 0;
        if (nums.length == 1) return nums[0];
        int[] dp = new int[nums.length];
        dp[0] = nums[0];
        int max = dp[0];
        for (int i = 1; i < dp.length; i++) {
            dp[i] = nums[i];
            // Do not need to check k < i - 3.
            for (int j = 2; i - j >= 0 && j <= 3; j++) {
                dp[i] = Math.max(dp[i], dp[i - j] + nums[i]);
            }
            max = Math.max(dp[i], max);
        }
        return max;
    }
}
But once I select one row's optimum selection, it might not align with the optimum selections for the rows above and below it. Even if I find the optimum combination for two rows, the next row might have more valuables than the two rows combined and would require another adjustment, and so on.
This is difficult because there are a lot more variables to consider than for houses on a single row, and there could also be more than one optimum alignment that gives the robber the maximum loot (such as the example above).
I did find what seemed to be a poorly written solution in Python, but since I only understand C-based languages (Java, C#, C++), I couldn't get much out of it. Could anyone help me with a solution or at least some pointers?
Thanks for your time!
I went through the Python code you mentioned. The solution using flow looks plain wrong to me. There is no path from source to sink with finite weight. That solution basically colors the grid like a chessboard and chooses either all the black squares or all the white squares. That is not optimal in the following case:
1 500
300 1
1000 300
It is better to choose 1000 and 500 but that solution will choose 300+300+500.
The dynamic programming solution is exponential.
I don't know enough math to understand the LP solution.
Sorry for not answering your question.
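The exponential DP mentioned in that answer can be done row by row: describe each row's selection as a bitmask with no two adjacent bits set, and treat two consecutive rows' masks as compatible when they share no bit (no house directly above another). This is a sketch under the assumptions that the grid is rectangular and all values are non-negative; robGrid is a hypothetical name. The number of valid masks per row grows exponentially with the width (roughly 1.618 to the power N), so this is only practical for narrow grids.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Maximum loot on a rectangular grid when no two robbed houses share an edge.
// Assumes every row has the same length and all values are non-negative.
long long robGrid(const std::vector<std::vector<long long>>& grid) {
    if (grid.empty() || grid[0].empty()) return 0;
    const int cols = static_cast<int>(grid[0].size());
    // All row selections with no two horizontally adjacent houses (bit c = column c).
    std::vector<int> masks;
    for (int m = 0; m < (1 << cols); ++m)
        if ((m & (m >> 1)) == 0) masks.push_back(m);
    // best[a]: best loot over the rows processed so far, if the last row used masks[a].
    std::vector<long long> best(masks.size(), 0);
    for (const auto& row : grid) {
        std::vector<long long> next(masks.size(), 0);
        for (std::size_t a = 0; a < masks.size(); ++a) {
            long long rowSum = 0;
            for (int c = 0; c < cols; ++c)
                if ((masks[a] >> c) & 1) rowSum += row[c];
            // The previous row must not rob a house directly above a robbed one.
            long long prevBest = 0;
            for (std::size_t b = 0; b < masks.size(); ++b)
                if ((masks[a] & masks[b]) == 0) prevBest = std::max(prevBest, best[b]);
            next[a] = prevBest + rowSum;
        }
        best = next;
    }
    return *std::max_element(best.begin(), best.end());
}
```

On the 3x3 example from the question this returns 80, and on the 1 500 / 300 1 / 1000 300 grid above it returns 1500 (1000 + 500), the selection the chessboard coloring misses.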

2048 game: how many moves did I do?

2048 used to be quite popular just a little while ago. Everybody played it, and a lot of people posted nice screenshots with their accomplishments (myself among them). Then at some point I began to wonder whether it is possible to tell how long someone played to get to that score. I benchmarked it, and it turns out that (at least on the Android application I have) no more than one move can be made per second. Thus, if you play long enough (and fast enough), the number of moves you've made is quite a good approximation of the number of seconds you've played. Now the question is: given a screenshot of a 2048 game, is it possible to compute how many moves were made?
Here is an example screenshot (actually my best effort on the game so far):
From the screenshot you can see the field layout at the current moment and the number of points that the player has earned. So: is this information enough to compute how many moves I've made and if so, what is the algorithm to do that?
NOTE: I would like to remind you that points are only scored when two tiles "combine", and the number of points scored is the value of the new tile (i.e. the sum of the values of the tiles being combined).
The short answer is it is possible to compute the number of moves using only this information. I will explain the algorithm to do that and I will try to post my answer in steps. Each step will be an observation targeted at helping you solve the problem. I encourage the reader to try and solve the problem alone after each tip.
Observation number one: after each move exactly one tile appears. This tile is either a 4 or a 2. Thus what we need to do is count the number of tiles that appeared. At least in the version of the game I played, the game always started with 2 tiles with a 2 on them, placed at random.
We don't care about the actual layout of the field. We only care about the numbers that are present on it. This will become more obvious when I explain my algorithm.
Seeing the values in the cells on the field we can compute what the score would be if 2 had appeared after each move. Call that value twos_score.
The number of fours that have appeared is equal to the difference of twos_score and actual_score divided by 4. This is true because for forming a 4 from two 2-s we would have scored 4 points, while if the 4 appears straight away we score 0. Call the number of fours fours.
We can compute the number of twos we needed to form all the numbers on the field. After that we need to subtract 2 * fours from this value as a single 4 replaces the need of two 2s. Call this twos.
Using these observations we are able to solve the problem. Now I will explain in more detail how to perform the separate steps.
How to compute the score if only twos appeared?
I will prove by induction that to form the number $2^n$, the player would score $2^n (n - 1)$ points.
The statement is obvious for $2 = 2^1$, as it appears directly and therefore no points are scored for it: $2^1 \cdot (1 - 1) = 0$.
Let's assume that for a fixed k, forming the number $2^k$ scores $2^k (k - 1)$ points.
For k + 1: $2^{k+1}$ can only be formed by combining two tiles of value $2^k$. Thus the player will score $2^k (k - 1) + 2^k (k - 1) + 2^{k+1}$ (the score for the two tiles being combined plus the score for the new tile).
This equals $2^{k+1} (k - 1) + 2^{k+1} = 2^{k+1} (k - 1 + 1) = 2^{k+1} k$. This completes the induction.
Therefore, to compute the score if only twos appeared, we need to iterate over all numbers on the board and accumulate the score for each of them using the formula above.
How to compute the number of twos needed to form the numbers on the field?
It is much easier to notice that the number of twos needed to form $2^n$ is $2^{n-1}$. A strict proof can again be done using induction, but I will leave this to the reader.
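A quick mechanical check of both formulas (a check, not a proof): the hypothetical helper buildFromTwos builds $2^n$ exactly as in the induction, by merging two copies of $2^{n-1}$, and tallies the points scored and the number of spawned 2s used.

```cpp
#include <utility>

// Hypothetical helper: build a tile of value 2^n when only 2s ever spawn.
// Returns {points scored along the way, number of spawned 2s used}.
std::pair<long long, long long> buildFromTwos(int n) {
    if (n == 1) return {0, 1};  // a 2 simply appears: no points
    std::pair<long long, long long> half = buildFromTwos(n - 1);
    // Two independent copies of 2^(n-1), then one merge worth 2^n points.
    return {2 * half.first + (1LL << n), 2 * half.second};
}
```

For every n from 1 to 20 the tallies agree with $2^n (n - 1)$ points and $2^{n-1}$ twos.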
The code
I will provide code for solving the problem in C++. However, I do not use anything too language-specific (apart from vector, which is simply a dynamically expanding array), so it should be very easy to port to many other languages.
#include <utility>
#include <vector>
using namespace std;

/**
 * @param a - a vector containing the values currently in the field.
 *            A value of zero means "empty cell".
 * @param score - the score the player currently has.
 * @return a pair where the first number is the number of twos that appeared
 *         and the second number is the number of fours that appeared.
 */
pair<int,int> solve(const vector<vector<int> >& a, int score) {
    // counts[l] is the number of tiles with value 2^l on the field
    vector<int> counts(20, 0);
    for (int i = 0; i < (int)a.size(); ++i) {
        for (int j = 0; j < (int)a[0].size(); ++j) {
            if (a[i][j] == 0) {
                continue;
            }
            int num = 0;  // stays 0 (and is ignored below) if the value is not a power of two
            for (int l = 1; l < 20; ++l) {
                if (a[i][j] == 1 << l) {
                    num = l;
                    break;
                }
            }
            counts[num]++;
        }
    }
    // What the score would be if only twos appeared every time
    int twos_score = 0;
    for (int i = 1; i < 20; ++i) {
        twos_score += counts[i] * (1 << i) * (i - 1);
    }
    // For each 4 that appears instead of a two the overall score decreases by 4
    int fours = (twos_score - score) / 4;
    // How many twos are needed for all the numbers on the field (ignoring score)
    int twos = 0;
    for (int i = 1; i < 20; ++i) {
        twos += counts[i] * (1 << (i - 1));
    }
    // Each four replaces two 2-s
    twos -= fours * 2;
    return make_pair(twos, fours);
}
Now to answer how many moves we've made we should add the two values of the pair returned by this function and subtract two because two tiles with 2 appear straight away.
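As a worked example, here is a compact variant of the same computation (countMoves is a hypothetical name) that takes the non-empty tiles as a flat list and returns the move count directly, assuming every tile value is a power of two of at least 2. Trace: start with two 2s; move 1 merges them into a 4 (score 4) and spawns a 2; move 2 merges nothing and spawns a 2; move 3 merges the two 2s (score 8) and spawns a 4; move 4 merges two 4s into an 8 (score 16) and spawns a 2. The board is now 8, 4, 2 with score 16, and the formulas give twos_score = 20, fours = 1 and twos = 5, so 5 + 1 - 2 = 4 moves.

```cpp
#include <vector>

// Number of moves made, given the non-empty tile values (powers of two >= 2)
// and the current score. Assumes the game started with two 2s.
long long countMoves(const std::vector<long long>& tiles, long long score) {
    long long twosScore = 0;   // score if only 2s had ever spawned
    long long twosNeeded = 0;  // 2s needed to build every tile from 2s only
    for (long long v : tiles) {
        int n = 0;
        while ((1LL << n) < v) ++n;  // v == 2^n
        twosScore += v * (n - 1);    // 2^n * (n - 1)
        twosNeeded += v / 2;         // 2^(n - 1)
    }
    const long long fours = (twosScore - score) / 4;  // each spawned 4 saves 4 points
    const long long twos = twosNeeded - 2 * fours;    // each spawned 4 replaces two 2s
    return twos + fours - 2;                          // the two starting tiles are not moves
}
```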

Divvying people into rooms by last name?

I often teach large introductory programming classes (400 - 600 students) and when exam time comes around, we often have to split the class up into different rooms in order to make sure everyone has a seat for the exam.
To keep things logistically simple, I usually break the class apart by last name. For example, I might send students with last names A - H to one room, last name I - L to a second room, M - S to a third room, and T - Z to a fourth room.
The challenge in doing this is that the rooms often have wildly different capacities and it can be hard to find a way to segment the class in a way that causes everyone to fit. For example, suppose that the distribution of last names is (for simplicity) the following:
Last name starts with A: 25
Last name starts with B: 150
Last name starts with C: 200
Last name starts with D: 50
Suppose that I have rooms with capacities 350, 50, and 50. A greedy algorithm for finding a room assignment might be to sort the rooms into descending order of capacity, then try to fill in the rooms in that order. This, unfortunately, doesn't always work. For example, in this case, the right option is to put last name A in one room of size 50, last names B - C into the room of size 350, and last name D into another room of size 50. The greedy algorithm would put last names A and B into the 350-person room, then fail to find seats for everyone else.
It's easy to solve this problem by just trying all possible permutations of the room orderings and then running the greedy algorithm on each ordering. This will either find an assignment that works or report that none exists. However, I'm wondering if there is a more efficient way to do this, given that the number of rooms might be between 10 and 20 and checking all permutations might not be feasible.
To summarize, the formal problem statement is the following:
You are given a frequency histogram of the last names of the students in a class, along with a list of rooms and their capacities. Your goal is to divvy up the students by the first letter of their last name so that each room is assigned a contiguous block of letters and does not exceed its capacity.
Is there an efficient algorithm for this, or at least one that is efficient for reasonable room sizes?
EDIT: Many people have asked about the contiguous condition. The rules are
Each room should be assigned at most a block of contiguous letters, and
No letter should be assigned to two or more rooms.
For example, you could not put A - E, H - N, and P - Z into the same room. You could also not put A - C in one room and B - D in another.
Thanks!
It can be solved using some sort of DP solution on [m, 2^n] space, where m is number of letters (26 for english) and n is number of rooms. With m == 26 and n == 20 it will take about 100 MB of space and ~1 sec of time.
Below is the solution I have just implemented in C# (with a few minor changes it should also compile as C++ or Java):
int[] GetAssignments(int[] studentsPerLetter, int[] rooms)
{
    int numberOfRooms = rooms.Length;
    int numberOfLetters = studentsPerLetter.Length;
    int roomSets = 1 << numberOfRooms; // 2 ^ (number of rooms)
    int[,] map = new int[numberOfLetters + 1, roomSets];
    for (int i = 0; i <= numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            map[i, j] = -2;
    map[0, 0] = -1; // starting condition
    for (int i = 0; i < numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            if (map[i, j] > -2)
            {
                for (int k = 0; k < numberOfRooms; k++)
                    if ((j & (1 << k)) == 0)
                    {
                        // This room is still empty.
                        int roomCapacity = rooms[k];
                        int t = i;
                        for (; t < numberOfLetters && roomCapacity >= studentsPerLetter[t]; t++)
                            roomCapacity -= studentsPerLetter[t];
                        // Mark the next state as reachable, also recording the index of the room
                        // just occupied; this helps to construct the solution backwards.
                        map[t, j | (1 << k)] = k;
                    }
            }
    // Constructing the solution.
    int[] res = new int[numberOfLetters];
    int lastIndex = numberOfLetters - 1;
    for (int j = 0; j < roomSets; j++)
    {
        int roomMask = j;
        while (map[lastIndex + 1, roomMask] > -1)
        {
            int lastRoom = map[lastIndex + 1, roomMask];
            int roomCapacity = rooms[lastRoom];
            for (; lastIndex >= 0 && roomCapacity >= studentsPerLetter[lastIndex]; lastIndex--)
            {
                res[lastIndex] = lastRoom;
                roomCapacity -= studentsPerLetter[lastIndex];
            }
            roomMask ^= 1 << lastRoom; // Remove the last room from the set.
            j = roomSets; // Exit the outer loop.
        }
    }
    return lastIndex > -1 ? null : res;
}
Example from the OP's question:
int[] studentsPerLetter = { 25, 150, 200, 50 };
int[] rooms = { 350, 50, 50 };
int[] ans = GetAssignments(studentsPerLetter, rooms);
The answer will be: 2, 0, 0, 1
This gives the index of the room for each last-name letter. If an assignment is not possible, my solution returns null.
[Edit]
After thousands of auto-generated tests, my friend found a bug in the code that constructs the solution backwards. It does not affect the main algorithm, so fixing this bug is left as an exercise for the reader.
The test case that reveals the bug is students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56]. My solution returns no answer, but an answer actually exists. Please note that the first part of the solution works properly; only the answer construction part fails.
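One possible fix for the reconstruction, sketched in C++ rather than the answer's C# (so it is not the answerer's code, and assignRooms is a made-up name): for every set of rooms, record only the furthest letter those rooms can cover and the room that was added last, then walk those records backwards. Keeping just the furthest letter per set is safe because covering a longer prefix never makes the remaining letters harder to seat. Recording where each block starts, instead of re-running the greedy fill backwards, is what avoids the mismatch in the original reconstruction.

```cpp
#include <optional>
#include <vector>

// For each letter, the index of its room, or std::nullopt if the letters cannot
// be split into contiguous blocks that fit the rooms (a room may stay unused).
std::optional<std::vector<int>> assignRooms(const std::vector<int>& students,
                                            const std::vector<int>& rooms) {
    const int letters = static_cast<int>(students.size());
    const int n = static_cast<int>(rooms.size());
    const int sets = 1 << n;
    // reach[mask]: how many leading letters the rooms in mask can cover (-1: unreachable).
    std::vector<int> reach(sets, -1);
    // lastRoom[mask]: the room whose block ends at reach[mask].
    std::vector<int> lastRoom(sets, -1);
    reach[0] = 0;
    // Subsets are numerically smaller, so reach[mask] is final when mask is processed.
    for (int mask = 0; mask < sets; ++mask) {
        if (reach[mask] < 0) continue;
        for (int k = 0; k < n; ++k) {
            if (mask & (1 << k)) continue;
            // Give room k as many letters as fit, starting at the first uncovered one.
            int capacity = rooms[k];
            int t = reach[mask];
            while (t < letters && students[t] <= capacity) capacity -= students[t++];
            const int next = mask | (1 << k);
            if (t > reach[next]) {
                reach[next] = t;
                lastRoom[next] = k;
            }
        }
    }
    for (int mask = 0; mask < sets; ++mask) {
        if (reach[mask] != letters) continue;
        // Walk back through the recorded rooms; each step peels off one block.
        std::vector<int> result(letters, -1);
        for (int m = mask; m != 0;) {
            const int k = lastRoom[m];
            const int prev = m ^ (1 << k);
            for (int letter = reach[prev]; letter < reach[m]; ++letter) result[letter] = k;
            m = prev;
        }
        return result;
    }
    return std::nullopt;
}
```

This uses O(2^n) memory and O(2^n · n · m) time for n rooms and m letters, one entry per room set instead of the answer's [m, 2^n] table. On students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56] it should find a valid split, for instance A-B in the 89-seat room, C-D in the 82-seat room and E-H in the 56-seat room, though the exact rooms it picks may differ.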
This problem is NP-complete, and thus there is no known polynomial-time (i.e. efficient) solution for it (unless someone proves P = NP). You can reduce an instance of the knapsack or bin-packing problem to your problem to prove that it is NP-complete.
To solve it, you can use the 0-1 knapsack problem. Here is how:
First pick the biggest classroom and try to allocate as many groups of students to it as you can (using 0-1 knapsack), i.e. up to the size of the room. You are guaranteed not to split a group of students, as this is 0-1 knapsack. Once done, take the next biggest classroom and continue.
(You can use any known heuristic to solve the 0-1 knapsack problem.)
Here is the reduction:
You need to reduce a general instance of 0-1 knapsack to a specific instance of your problem.
So let's take a general instance of 0-1 knapsack: a sack with weight capacity W, and groups x_1, x_2, ..., x_n whose corresponding weights are w_1, w_2, ..., w_n.
Now the reduction: this general instance is reduced to your problem as follows.
You have one classroom with seating capacity W. Each x_i ($i \in \{1, \dots, n\}$) is a group of students whose last name begins with the i-th letter, and their number (i.e. the size of the group) is w_i.
Now you can prove that if the 0-1 knapsack instance has a solution, your problem has a solution, and conversely, if there is no solution for the 0-1 knapsack instance, then your problem has no solution.
Please remember the important thing about a reduction: a general instance of a known NP-complete problem is mapped to a specific instance of your problem.
Hope this helps :)
Here is an approach that should work reasonably well, given common assumptions about the distribution of last names by initial. Fill the rooms from smallest capacity to largest as compactly as possible within the constraints, with no backtracking.
It seems reasonable (to me at least) for the largest room to be listed last, as being for "everyone else" not already listed.
Is there any reason to make life so complicated? Why can't you assign registration numbers to the students and then use those numbers to allocate them however you want? :) You don't need to write any code; students are happy, everyone is happy.

Pixies in the custard swamp puzzle

(With thanks to Rich Bradshaw)
I'm looking for optimal strategies for the following puzzle.
As the new fairy king, it is your duty to map the kingdom's custard swamp.
The swamp is covered in an ethereal mist, with islands of custard scattered throughout.
You can send your pixies across the swamp, with instructions to fly low or high at each point.
If a pixie swoops down over a custard, it will be distracted and won't complete its sequence.
Since the mist is so thick, all you know is whether a pixie got to the other side or not.
In coding terms:
bool flutter( bool[size] swoop_map );
This returns whether a pixie exited for a given sequence of swoops.
The simplest way is to pass in sequences with only one swoop. That reveals all custard islands in 'size' tries.
I'd rather have something proportional to the number of custards, but I have problems with sequences like:
C......C (that is, custards at beginning and end)
Links to other forms of this puzzle would be welcome as well.
This makes me think of divide and conquer. Maybe something like this (this is slightly broken pseudocode. It may have fence-post errors and the like):
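// Marks every custard in [lo, hi) in custard[] (TRUE = custard found).
// Top level: bool[size] custard = ALLFALSE; check(0, size, custard);
void check(int lo, int hi, bool[size] custard)
{
    int mid = lo + (hi - lo) / 2;
    for each half [a, b) in { [lo, mid), [mid, hi) }:
        if (a == b) continue;             // empty half (only when hi - lo == 1)
        bool[size] swoop = ALLFALSE;
        for (int i = a; i < b; ++i) swoop[i] = TRUE;
        if (!flutter(swoop)) {            // pixie was distracted: custard in [a, b)
            if (b - a == 1) custard[a] = TRUE;
            else check(a, b, custard);
        }
}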
In plain English, it calls flutter on each half of the custard map. If a half's flight succeeds (flutter returns true), that whole half has no custard. Otherwise, the algorithm is applied recursively to that half. I'm not sure if it is possible to do better. However, this algorithm is kind of lame if the swamp is mostly custard.
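A runnable C++ version of the divide-and-conquer idea, with flutter passed in as a function so that a swamp can be simulated; the names Flutter, locate and mapSwamp are made up for this sketch. Unlike the pseudocode, it first spends one flight on the whole swamp, so a custard-free swamp is confirmed with a single flight, and locate is only ever called on a range known to contain custard.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// flutter(swoop) returns true if the pixie gets across, i.e. no custard under any swoop.
using Flutter = std::function<bool(const std::vector<bool>&)>;

// Marks every custard in [lo, hi); the caller guarantees the range holds custard.
void locate(const Flutter& flutter, int size, int lo, int hi, std::vector<bool>& custard) {
    if (hi - lo == 1) {
        custard[lo] = true;
        return;
    }
    const int mid = lo + (hi - lo) / 2;
    const int bounds[3] = {lo, mid, hi};
    for (int h = 0; h < 2; ++h) {
        std::vector<bool> swoop(size, false);
        for (int i = bounds[h]; i < bounds[h + 1]; ++i) swoop[i] = true;
        if (!flutter(swoop))  // pixie was distracted: this half holds custard
            locate(flutter, size, bounds[h], bounds[h + 1], custard);
    }
}

std::vector<bool> mapSwamp(const Flutter& flutter, int size) {
    std::vector<bool> custard(size, false);
    if (size > 0 && !flutter(std::vector<bool>(size, true))) locate(flutter, size, 0, size, custard);
    return custard;
}
```

On the C......C swamp from the question (size 8, custards at both ends) this uses 11 flights, against 8 single-swoop flights; the savings only appear when the swamp is large and the custards are few.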
Idea Two:
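int itsize = 1;
bool[size] retval = ALLFALSE;   // retval[p] == TRUE means custard at p
for (int pos = 0; pos < size;)
{
    int end = min(pos + itsize, size);
    bool[size] nextval = ALLFALSE;
    for (int pos2 = pos; pos2 < end; ++pos2) nextval[pos2] = true;
    bool flut = flutter(nextval);
    if (flut || itsize == 1)
    {
        // Either the whole block is clear, or it is a single square: its content is known.
        for (int pos2 = pos; pos2 < end; ++pos2) retval[pos2] = !flut;
        pos = end;
    }
    if (flut) itsize *= 2;    // no custard: try a block twice as big next time
    if (!flut) itsize = 1;    // custard somewhere in the block: go back to single squares
}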
In plain English, it starts by calling flutter on one element of the custard map at a time. If it does not find custard, the next call covers twice as many elements as the previous call. If it does find custard in a block of more than one element, it drops back to one element at a time from the start of that block. This is kind of like binary search, except only in one direction, since it does not know how many items it is searching for. I have no idea how efficient this is.
Brian's first divide-and-conquer algorithm is optimal in the following sense: there exists a constant C such that, over all swamps with n squares and at most k custards, no algorithm has a worst case that is more than C times better than Brian's. Brian's algorithm uses $O(k \log(n/k))$ flights, which is within a constant factor of the information-theoretic lower bound $\log_2 \binom{n}{k} \ge \log_2\left((n/k)^k\right) = k \log_2(n/k) = \Omega(k \log(n/k))$. (You need an assumption like $k \le n/2$ to make the last step rigorous, but at that point we've already reached the maximum of $O(n)$ flights.)
Why does Brian's algorithm use only $O(k \log(n/k))$ flights? At recursion depth i, it makes at most $\min(2^i, 2k)$ flights, since only the at most k ranges at the previous depth that contain custard are split further. The sum for $0 \le i \le \log_2 k$ is $O(k)$. The sum for $\log_2 k < i \le \log_2 n$ is at most $2k(\log_2 n - \log_2 k) = 2k \log_2(n/k)$.
