Microsoft technical interview: Matrix Algorithm - algorithm

I recently had an interview in which the interviewer gave me some pseudocode and asked questions related to it. Unfortunately, I was not able to answer his questions due to lack of preparation. Due to time constraint, I could not ask him the solution for that problem. I would really appreciate if someone could guide me and help me understand the problem so I can improve for the future. Below is the pseudocode:
A sample state of ‘a’:
[[ 2, NULL, 2, NULL],
[ 2, NULL, 2, NULL],
[NULL, NULL, NULL, NULL],
[NULL, NULL, NULL, NULL]]
FUNCTION foo()
FOR y = 0 to 3
FOR x = 0 to 3
IF a[x+1][y] != NULL
IF a[x+1][y] = a[x][y]:
a[x][y] := a[x][y]*2
a[x+1][y] := NULL
END IF
IF a[x][y] = NULL
a[x][y] := a[x+1][y]
a[x+1][y] := NULL
END IF
END IF
END FOR
END FOR
END FUNCTION
The interviewer asked me:
What is the issue with the above code and how would I fix it?
Once corrected, what does function foo do? Please focus on the result of the function, not the details of the implementation.
How could you make foo more generic? Explain up to three possible generalization directions and describe a strategy for each, no need to write the code!
I mentioned to him:
The state of the matrix looks incorrect because an integer matrix cannot have null values. By default they are assigned 0, false for Boolean and null for the reference type.
Another issue with the above code is at IF a[x+1][y] != NULL, the condition will produce an array index out-of-bounds error when x equals 3.
But I felt the interviewer was looking for something else in my answer and was not satisfied with the explanation.

Have you played the game "2048" (link to game)? If not, this question will likely not make much intuitive sense to you, and because of that, I think it's a poor interview question.
What this attempts to do is simulate one step of the 2048 game where the numbers go upward. Numbers will move upward by one cell unless they hit another number or the matrix border (think of gravity pulling all numbers upward). If the two numbers are equal, they combine and produce a new number (their sum).
Note: this isn't exactly one step of the 2048 game because numbers only move one cell upward, while in the game they move "all they way" until they hit something else. To get a step of the 2048 game, you'd repeat the given function until no more changes occur.
The issue in the code is, as you mentioned, the array index out-of-bounds. It should be fixed by iterating over x = 0 to 2 instead.
To make this more general, you have to be creative:
The main generalization is that it should take a "direction" parameter. (Again you wouldn't know this if you haven't played the 2048 game yourself.) Instead of gravity pulling numbers upward, gravity can pull numbers in any of the 4 cardinal directions.
Maybe the algorithm shouldn't check for NULL but should check against some other sentinel value (which is another input).
It's also pretty easy to generalize this to larger matrices.
Maybe there should be some other rule that dictates when numbers get combined, and how precisely they get combined (not necessarily 2 times the first). These rules can be given in the form of lambdas.
As for this part of your answer:
integer matrix cannot have null values, by default they are assigned 0, false for Boolean and null for the reference type
That is largely dependent on the language being used, so I wouldn't say this is an error in the pseudocode (which isn't supposed to be in any particular language). For instance, in weakly-typed languages you can certainly have a matrix with int and NULL values.
You don't mention what you said about the function's behavior. If I were the interviewer, I would want to see someone "think out loud" and realize at least the following:
The code is trying to compare each element with the one below it.
Nothing happens unless the lower element is NULL.
If the two elements are equal, then the lower one is replaced with NULL and the upper element becomes twice as large.
If the top element is NULL, then the lower non-NULL element "moves" to the top element's place.
These observations about the code are straightforward to obtain just by reading the source code. Whether or not you make sense of these "rules" and notice that it's (similar to) the 2048 game is largely dependent on whether you've played the game before.

Here's the python code for the same program. I have fixed the index out of bound issue in this code. Hope this helps.
null = 0
array = [[2,null,2,null],[2,null,2,null],[null,null,null,null],[null,null,null,null]]
range = [0,1,2]
for y in range:
for x in range:
if array[x+1][y] != null:
if array[x+1][y] == array[x][y]:
array[x][y] = array[x][y]*2
array[x+1][y] = null
if array[x][y] == null:
array[x][y] = array[x+1][y]
array[x+1][y] = null
print(array)

Once corrected, what does function foo do? Please focus on the result of the function, not the details of the implementation
The output will be :
4 null 4 null
null null null null
null null null null
null null null null

Related

Designing an algorithm to check combinations

I´m having serious performance issues with a job that is running everyday and I think i cannot improve the algorithm; so I´m gonnga explain you what is the problem to solve and the algorithm we have, and maybe you have some other ideas to solve the problem better.
So the problem we have to solve is:
There is a set of Rules, ~ 120.000 Rules.
Every rule has a set of combinations of Codes. Codes are basically strings. So we have ~8 combinations per rule. Example of a combination: TTAAT;ZZUHH;GGZZU;WWOOF;SSJJW;FFFOLL
There is a set of Objects, ~800 objects.
Every object has a set of ~200 codes.
We have to check for every Rule, if there is at least one Combination of Codes that is fully contained in the Objects. It means =>
loop in Rules
Loop in Combinations of the rule
Loop in Objects
every code of the combination found in the Object? => create relationship rule/object and continue with the next object
end of loop
end of loop
end of loop
For example, if we have the Rule with this combination of two codes: HHGGT; ZZUUF
And let´s say we have an object with this codes: HHGGT; DHZZU; OIJUH; ZHGTF; HHGGT; JUHZT; ZZUUF; TGRFE; UHZGT; FCDXS
Then we create a relationship between the Object and the Rule because every code of the combination of the rule is contained in the codes of the object => this is what the algorithm has to do.
As you can see this is quite expensive, because we need 120.000 x 8 x 800 = 750 millions of times in the worst-case scenario.
This is a simplified scenario of the real problem; actually what we do in the loops is a little bit more complicated, that´s why we have to reduce this somehow.
I tried to think in a solution but I don´t have any ideas!
Do you see something wrong here?
Best regards and thank you for the time :)
Something like this might work better if I'm understanding correctly (this is in python):
RULES = [
['abc', 'def',],
['aaa', 'sfd',],
['xyy', 'eff',]]
OBJECTS = [
('rrr', 'abc', 'www', 'def'),
('pqs', 'llq', 'aaa', 'sdr'),
('xyy', 'hjk', 'fed', 'eff'),
('pnn', 'rrr', 'mmm', 'qsq')
]
MapOfCodesToObjects = {}
for obj in OBJECTS:
for code in obj:
if (code in MapOfCodesToObjects):
MapOfCodesToObjects[code].add(obj)
else:
MapOfCodesToObjects[code] = set({obj})
RELATIONS = []
for rule in RULES:
if (len(rule) == 0):
continue
if (rule[0] in MapOfCodesToObjects):
ValidObjects = MapOfCodesToObjects[rule[0]]
else:
continue
for i in range(1, len(rule)):
if (rule[i] in MapOfCodesToObjects):
codeObjects = MapOfCodesToObjects[rule[i]]
else:
ValidObjects = set()
break
ValidObjects = ValidObjects.intersection(codeObjects)
if (len(ValidObjects) == 0):
break
for vo in ValidObjects:
RELATIONS.append((rule, vo))
for R in RELATIONS:
print(R)
First you build a map of codes to objects. If there are nObj objects and nCodePerObj codes on average per object, this takes O(nObj*nCodePerObj * log(nObj*nCodePerObj).
Next you iterate through the rules and look up each code in each rule in the map you built. There is a relation if a certain object occurs for every code in the rule, i.e. if it is in the set intersection of the objects for every code in the rule. Since hash lookups have O(1) time complexity on average, and set intersection has time complexity O(min of the lengths of the 2 sets), this will take O(nRule * nCodePerRule * nObjectsPerCode), (note that is nObjectsPerCode, not nCodePerObj, the performance gets worse when one code is included in many objects).

Retrieving keys with maximum values in a Hashmap in java 8 [duplicate]

As of Java 1.5, you can pretty much interchange Integer with int in many situations.
However, I found a potential defect in my code that surprised me a bit.
The following code:
Integer cdiCt = ...;
Integer cdsCt = ...;
...
if (cdiCt != null && cdsCt != null && cdiCt != cdsCt)
mismatch = true;
appeared to be incorrectly setting mismatch when the values were equal, although I can't determine under what circumstances. I set a breakpoint in Eclipse and saw that the Integer values were both 137, and I inspected the boolean expression and it said it was false, but when I stepped over it, it was setting mismatch to true.
Changing the conditional to:
if (cdiCt != null && cdsCt != null && !cdiCt.equals(cdsCt))
fixed the problem.
Can anyone shed some light on why this happened? So far, I have only seen the behavior on my localhost on my own PC. In this particular case, the code successfully made it past about 20 comparisons, but failed on 2. The problem was consistently reproducible.
If it is a prevalent problem, it should be causing errors on our other environments (dev and test), but so far, no one has reported the problem after hundreds of tests executing this code snippet.
Is it still not legitimate to use == to compare two Integer values?
In addition to all the fine answers below, the following stackoverflow link has quite a bit of additional information. It actually would have answered my original question, but because I didn't mention autoboxing in my question, it didn't show up in the selected suggestions:
Why can't the compiler/JVM just make autoboxing “just work”?
The JVM is caching Integer values. Hence the comparison with == only works for numbers between -128 and 127.
Refer: #Immutable_Objects_.2F_Wrapper_Class_Caching
You can't compare two Integer with a simple == they're objects so most of the time references won't be the same.
There is a trick, with Integer between -128 and 127, references will be the same as autoboxing uses Integer.valueOf() which caches small integers.
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
Resources :
JLS - Boxing
On the same topic :
autoboxing vs manual boxing java
"==" always compare the memory location or object references of the values. equals method always compare the values. But equals also indirectly uses the "==" operator to compare the values.
Integer uses Integer cache to store the values from -128 to +127. If == operator is used to check for any values between -128 to 127 then it returns true. for other than these values it returns false .
Refer the link for some additional info
Integer refers to the reference, that is, when comparing references you're comparing if they point to the same object, not value. Hence, the issue you're seeing. The reason it works so well with plain int types is that it unboxes the value contained by the Integer.
May I add that if you're doing what you're doing, why have the if statement to begin with?
mismatch = ( cdiCt != null && cdsCt != null && !cdiCt.equals( cdsCt ) );
The issue is that your two Integer objects are just that, objects. They do not match because you are comparing your two object references, not the values within. Obviously .equals is overridden to provide a value comparison as opposed to an object reference comparison.
Besides these given great answers, What I have learned is that:
NEVER compare objects with == unless you intend to be comparing them
by their references.
As well for correctness of using == you can just unbox one of compared Integer values before doing == comparison, like:
if ( firstInteger.intValue() == secondInteger ) {..
The second will be auto unboxed (of course you have to check for nulls first).

How write a hash function to make such expression to be true?

pseudocode:
// deprecated x!=y && hash(x) == hash(y) // how to make this true?
x!=y && hash(x) == hash(y) && (z!=x && z!=y) && (hash(x) != hash(z) && (hash(y) != hash(z)) // how to make this true?
x and y can be any readable value
Whatever the language, the pseudocode is just help to understand what I mean.
I just wonder how to implement such hash function.
PS: For math, i am an idiot. I can not imagine if there is an algorithm that can do this.
UPDATE 1:
The pseudocode has bug, so I updated the code(actually still has bug, never mind, I will explain).
My original requirement is to make a hash function that can return same value for different parameter, and the parameter value should contains some rule. It means, only the parameter value in same category would gets same hash code, others are not.
e.g.
The following expressions are clearly(you can treat '0' as placeholder):
hash("1.1") == hash("1.0") == hash("0.1")
hash("2.2") == hash("2.0") == hash("0.2")
and
hash("2.2") != hash("2.1") != hash("1.2")
I think this question can do such description:
There are two or more different values contains implied same attribute.
Only these values have such same attribute in the world.
The attribute can obtain through some way(maybe a function), hash() will call it inside.
hash() one of the values, you can retrive the attribute, then you can get the unique hashCode.
It's looks like hash collision, but we exactly know what they are. Also looks like many-to-one model.
How to design collision rules? The values could be any character or numeric. And how to implement the designs?
PPS: This is a question full of bugs, maybe the updated parts cannot explain the the problem either. Or maybe this is a false proposition. I want abstract my issue as a general model, but it makes my mind overflowed. If necessary I will post my actual issue that I am facing.
Any constant hash trivially satisfies your condition:
hash(v) = 42
A less constant answer than yuri kilocheck's would be to use the mod operator:
hash(v) = v % 10;
Then you'll have:
hash(1) = 1
hash(2) = 2
hash(3) = 3
...
hash(11) = 1
hash(12) = 2

Ada Enumerated Type Range

As I understand it, Ada uses 0 based indexes on its enumerated types.. So in Status_Type below, the ordinal value goes from 0 to 5.
type Status_Type is
(Undefined,
Available,
Fout,
Assigned,
Effected,
Cleared);
My question is.. what are the ordinal values for the following examples? Do they start at 0 or do they start from the ordinal value from the super type?
subtype Sub_Status_Type is Status_Type
range Available.. Effected;
subtype Un_Status_Type is Sub_Status_Type
range Fout .. Assigned;
Would Sub_Status_Type ordinal values go from 1 to 4 or from 0 to 3?
Would Un_Status_Type ordinal values go from 3 to 4 or from 1 to 2 or from 0 to 1?
For the subtypes, a 'pos will return the same value as it would have for the base type (1..4 and 2..3 respectively, I believe). Subtypes aren't really new and different types, so much as they are the same old type, but with some extra limitations on its possible values.
But it should be noted that these values are assigned under the scenes. It really should make no difference to you what they are, unless you are using the 'val and 'pos attributes, or you are interfacing to code written outside of Ada (or to hardware).
Plus, if it does end up mattering, you should know that the situation is actually much more complicated. 'pos and 'val don't return the actual bit value the compiler uses for those enumeration values when it generates code. They just return their "ordinal position"; their offset from the first value.
By default they will usually be the same thing. However, you can change the value assignments (but not the ordinal position assignments) yourself with a for ... use clause, like in the code below:
for Status_Type use
(Undefined => 1,
Available => 2,
Out => 4,
Assigned => 8,
Effected => 16,
Cleared => 32);
The position number is defined in terms of the base type. So Sub_Status_Type'Pos(Assigned) is the same as Status_Type'Pos(Assigned), and the position values of Sub_Status_Type go from 1 to 4, not 0 to 3.
(And note that the position number isn't affected by an enumeration representation clause; it always starts at 0 for the first value of the base type.)
Incidentally, it would have been easy enough to find out by running a small test program that prints the values of Sub_Status_Type'Pos(...) -- which would also have told you that you can't use the reserved word out as an identifier.
As I understand it, Ada uses 0 based indexes on its enumerated types
Yes, it uses 0 for the indexes, or rather for the position of the values of the type. This is not the value of the enumeration literals, and not the binary representation of them.
what are the ordinal values for the following examples?
There are no "ordinal" values. The values of the type are the ones you specified. You are confusing "value", "representation", and "position" here.
The values of your Status_Type are Undefined, Available, Out, Assigned, Effected, and Cleared.
The positions are 0, 1, 2, 3, 4, and 5. These are what you can use to translate with 'Pos and 'Val.
The representation defaults to the position, but you can freely assign other values (as long as you keep the correct order). These are used if you write it to a file, or send it through a socket, or load it into a register..
I think the best way to answer your questions is in reverse:
A subtype is, mathematically speaking, a continuous subset of its parent type. So, if the type SIZES is (1, 2, 3, 4, 5, 6, 7, 8) and you define a subtype MEDIUM as (4,5) the first element of MEDIUM is 4.
Example:
Type Small_Natural is 0..16;
Subtype Small_Positive is Small_Natural'Succ(Small_Natural'First)..Small_Natural'Last;
This defines two small sets of possible-values, which are tightly related: namely that Positive numbers are all the Natural Numbers save Zero.
I used this form to illustrate that with a few text-changes we have the following example:
Type Device is ( Not_Present, Power_Save, Read, Write );
Subtype Device_State is Device'Succ(Device'First)..Device'Last;
And here we are modeling the intuitive notion that a device must be present to have a state, but note that the values in the subtype ARE [exactly] the values in the type from which they are derived.
This answers your second question: Yes, an element of an enumeration would have the same value that its parent-type would.
As to the first, I believe the starting position is actually implementation defined (if not then I assume the LM defaults it to 0). You are, however free to override that and provide your own numbering, the only restriction being that elements earlier in the enumeration are valued less than the value that you are assigning currently [IIRC].

Should I test if equal to 1 or not equal to 0?

I was coding here the other day, writing a couple of if statements with integers that are always either 0 or 1 (practically acting as bools). I asked myself:
When testing for positive result, which is better; testing for int == 1 or int != 0?
For example, given an int n, if I want to test if it's true, should I use n == 1 or n != 0?
Is there any difference at all in regards to speed, processing power, etc?
Please ignore the fact that the int may being more/less than 1/0, it is irrelevant and does not occur.
Human's brain better process statements that don't contain negations, which makes "int == 1" better way.
It really depends. If you're using a language that supports booleans, you should use the boolean, not an integer, ie:
if (value == false)
or
if (value == true)
That being said, with real boolean types, it's perfectly valid (and typically nicer) to just write:
if (!value)
or
if (value)
There is really very little reason in most modern languages to ever use an integer for a boolean operation.
That being said, if you're using a language which does not support booleans directly, the best option here really depends on how you're defining true and false. Often, false is 0, and true is anything other than 0. In that situation, using if (i == 0) (for false check) and if (i != 0) for true checking.
If you're guaranteed that 0 and 1 are the only two values, I'd probably use if (i == 1) since a negation is more complex, and more likely to lead to maintenance bugs.
If you're working with values that can only be 1 or 0, then I suggest you use boolean values to begin with and then just do if (bool) or if (!bool).
In language where int that are not 0 represents the boolean value 'true', and 0 'false', like C, I will tend to use if (int != 0) because it represents the same meaning as if (int) whereas int == 1 represents more the integer value being equal to 1 rather than the boolean true. It may be just me though. In languages that support the boolean type, always use it rather than ints.
A Daft question really. If you're testing for 1, test for 1, if you're testing for zero, test for zero.
The addition of an else statement can make the choice can seem arbitrary. I'd choose which makes the most sense, or has more contextual significance, default or 'natural' behaviour suggested by expected frequency of occurrence for example.
This choice between int == 0 and int != 1 may very well boil down to subjective evaluations which probably aren't worth worrying about.
Two points:
1) As noted above, being more explicit is a win. If you add something to an empty list you not only want its size to be not zero, but you also want it to be explicitly 1.
2) You may want to do
(1 == int)
That way if you forget an = you'll end up with a compile error rather than a debugging session.
To be honest if the value of int is just 1 or 0 you could even say:
if (int)
and that would be the same as saying
if (int != 0)
but you probably would want to use
if (int == 1)
because not zero would potentially let the answer be something other than 1 even though you said not to worry about it.
If only two values are possible, then I would use the first:
if(int == 1)
because it is more explicit. If there were no constraint on the values, I would think otherwise.
IF INT IS 1
NEXT SENTENCE
ELSE MOVE "INT IS NOT ONE" TO MESSAGE.
As others have said, using == is frequently easier to read than using !=.
That said, most processors have a specific compare-to-zero operation. It depends on the specific compiler, processor, et cetera, but there may be an almost immeasurably small speed benefit to using != 0 over == 1 as a result.
Most languages will let you use if (int) and if (!int), though, which is both more readable and get you that minuscule speed bonus.
I'm paranoid. If a value is either 0 or 1 then it might be 2. May be not today, may be not tomorrow, but some maintenance programmer is going to do something weird in a subclass. Sometimes I make mistakes myself [shh, don't tell my employer]. So, make the code say tell me that the value is either 0 or 1, otherwise it cries to mummy.
if (i == 0) {
... 0 stuff ...
} else if (i == 1) {
... 1 stuff ...
} else {
throw new Error();
}
(You might prefer switch - I find its syntax in curly brace language too heavy.)
When using integers as booleans, I prefer to interpret them as follows: false = 0, true = non-zero.
I would write the condition statements as int == 0 and int != 0.
I would say it depends on the semantics, if you condition means
while ( ! abort ) negation is ok.
if ( quit ) break; would be also ok.
if( is_numeric( $int ) ) { its a number }
elseif( !$int ) { $int is not set or false }
else { its set but its not a number }
end of discussion :P
I agree with what most people have said in this post. It's much more efficient to use boolean values if you have one of two distinct possibilities. It also makes the code a lot easier to read and interpret.
if(bool) { ... }
I was from the c world. At first I don't understand much about objective-c. After some while, I prefer something like:
if (int == YES)
or
if (int == NO)
in c, i.e.:
if (int == true)
if (int == false)
these days, I use varchar instead of integer as table keys too, e.g.
name marital_status
------ --------------
john single
joe married
is a lot better than:
name marital_status
------ --------------
john S
joe M
or
name marital_status
------ --------------
john 1
joe 2
(Assuming your ints can only be 1 or 0) The two statements are logically equivalent. I'd recommend using the == syntax though because I think it's clearer to most people when you don't introduce unnecessary negations.

Resources