I am trying to find the best algorithm for my particular application. I have searched around on SO, Google, read various articles about Levenshtein distances, etc. but honestly it's a bit out of my area of expertise. And most seem to find how similar two input strings are, like a Hamming distance between strings.
What I'm looking for is different, more of a fuzzy record search (and I'm sure there is a name for it, that I don't know to Google). I am sure someone has solved this problem before and I'm looking for a recommendation to point me in the right direction for my further research.
In my case I need a fuzzy search over a database of music artists and their albums. As you can imagine, the database will have millions of entries, so an algorithm that scales well is crucial. It's not important to my question that Artist and Album are in different columns; the database could just store all words in one column if that helped the search.
The database to search:
|-------------------|---------------------|
| Artist            | Album               |
|-------------------|---------------------|
| Alanis Morissette | Jagged Little Pill  |
| Moby              | Everything is Wrong |
| Air               | Moon Safari         |
| Pearl Jam         | Ten                 |
| Nirvana           | Nevermind           |
| Radiohead         | OK Computer         |
| Beck              | Odelay              |
|-------------------|---------------------|
The query text will contain anywhere from a single word of the Artist_Album concatenation up to the entire thing. The query text comes from OCR, so it is likely to contain single-character errors, but the bigger issue is that the words are not guaranteed to be in the right order. Additionally, there could be extra words in the search that aren't a part of the album (like cover art text). For example, "OK Computer" might be at the top of the album and "Radiohead" below it, or some albums have text arranged in columns, which intermixes the word order.
Possible search strings:
C0mputer Rad1ohead
Pearl Ten Jan
Alanis Jagged Morisse11e Litt1e Pi11
Air Moon Virgin Records
Moby Everything
Note that with OCR, some letters will look like numbers, or the wrong letter completely (Jan instead of Jam). And in the case of Radiohead's OK Computer and Moby's Everything Is Wrong, the query text doesn't even have all of the words. In the case of Air's Moon Safari, the extra words Virgin Records are searched, but Safari is missing.
Is there a general algorithm that could return the single likeliest result from the database, and if none meet some "likeliness" score threshold, it returns nothing? I'm actually developing this in Python, but that's just a bonus, I'm looking more for where to get started researching.
Let's break the problem down in two parts.
First, you want to define some measure of likeness (this is called a metric). This metric should return a small number if the query text closely matches the album/artist cover, and return a larger number otherwise.
Second, you want a datastructure that speeds up this process. Obviously, you don't want to calculate this metric against every record each time a query is run.
part 1: the metric
You already mentioned Levenshtein distance, which is a great place to start.
Think outside the box though.
LD makes certain assumptions (each character replacement is equally likely, deletion is equally likely as insertion, etc). You can obviously improve the performance of this metric by taking into account what faults OCR is likely to introduce.
E.g. turning a '1' into an 'i' should not be penalized as harshly as turning a '0' into an '_'.
I would implement the metric in two stages. For any given two strings:
split both strings in tokens (assume space as the separator)
look for the most similar words (using a modified version of LD)
assign a final score based on 'matching words', 'missing words' and 'added words' (preferably weighted)
This is an example implementation (fiddle around with the constants):
static double m(String a, String b){
String[] aParts = a.split(" ");
String[] bParts = b.split(" ");
boolean[] bUsed = new boolean[bParts.length];
int matchedTokens = 0;
int tokensInANotInB = 0;
int tokensInBNotInA = 0;
for(int i=0;i<aParts.length;i++){
String a0 = aParts[i];
boolean wasMatched = false; // assume no match until one is found
for(int j=0;j<bParts.length;j++){
String b0 = bParts[j];
double d = levenshtein(a0, b0);
/* If we match the token a0 with a token from b0
* update the number of matchedTokens
* escape the loop
*/
if(!bUsed[j] && d < 2){ // each token of b may be matched only once
bUsed[j]=true;
wasMatched = true;
matchedTokens++;
break;
}
}
if(!wasMatched){
tokensInANotInB++;
}
}
for(boolean partUsed : bUsed){
if(!partUsed){
tokensInBNotInA++;
}
}
return (matchedTokens
+ tokensInANotInB * -0.3 // the query is allowed to contain extra words at minimal cost
+ tokensInBNotInA * -0.5 // the album title should not contain too many extra words
) / java.lang.Math.max(aParts.length, bParts.length);
}
This function uses a modified levenshtein function:
static double levenshtein(String x, String y) {
double[][] dp = new double[x.length() + 1][y.length() + 1];
for (int i = 0; i <= x.length(); i++) {
for (int j = 0; j <= y.length(); j++) {
if (i == 0) {
dp[i][j] = j;
}
else if (j == 0) {
dp[i][j] = i;
}
else {
dp[i][j] = Math.min(dp[i - 1][j - 1]
+ costOfSubstitution(x.charAt(i - 1), y.charAt(j - 1)),
Math.min(dp[i - 1][j] + 1,
dp[i][j - 1] + 1));
}
}
}
return dp[x.length()][y.length()];
}
Which uses the function 'cost of substitution' (which works as explained)
static double costOfSubstitution(char a, char b){
if(a == b)
return 0.0;
else{
// 1 and i
if(a == '1' && b == 'i')
return 0.5;
if(a == 'i' && b == '1')
return 0.5;
// 0 and O
if(a == '0' && b == 'o')
return 0.5;
if(a == 'o' && b == '0')
return 0.5;
if(a == '0' && b == 'O')
return 0.5;
if(a == 'O' && b == '0')
return 0.5;
// default
return 1.0;
}
}
I only included a couple of examples (turning '1' into 'i' or '0' into 'o').
But I'm sure you get the idea.
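Since the question mentions Python, here is a rough sketch of the same two-stage idea in that language. It is not a port of the Java above: the confusion pairs in CONFUSABLE, the max_dist cutoff, the 0.2 threshold and the function names are all assumptions chosen for illustration, and each record is simply the artist and album joined into one string.

```python
# Hypothetical OCR confusion pairs; extend these from real OCR error statistics.
CONFUSABLE = {frozenset(p) for p in [("1", "i"), ("1", "l"), ("0", "o"), ("5", "s")]}

def sub_cost(a, b):
    """Cost of replacing character a with b (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0

def weighted_levenshtein(x, y):
    """Edit distance with cheaper substitutions for OCR look-alikes."""
    prev = [float(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [float(i)]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j - 1] + sub_cost(cx, cy),  # substitute
                           prev[j] + 1.0,                   # delete from x
                           cur[j - 1] + 1.0))               # insert into x
        prev = cur
    return prev[-1]

def score(query, record, max_dist=1.5):
    """Similarity of at most 1.0; higher is better."""
    q_tokens, r_tokens = query.split(), record.split()
    used = [False] * len(r_tokens)
    matched = 0
    for q in q_tokens:
        for j, r in enumerate(r_tokens):
            if not used[j] and weighted_levenshtein(q, r) <= max_dist:
                used[j] = True
                matched += 1
                break
    extra_in_query = len(q_tokens) - matched      # e.g. cover art text
    missing_from_query = len(r_tokens) - matched  # record words OCR did not see
    raw = matched - 0.3 * extra_in_query - 0.5 * missing_from_query
    return raw / max(len(q_tokens), len(r_tokens))

def best_match(query, records, threshold=0.2):
    """Return the best-scoring record, or None if nothing clears the threshold."""
    best = max(records, key=lambda r: score(query, r))
    return best if score(query, best) >= threshold else None

RECORDS = [
    "Alanis Morissette Jagged Little Pill",
    "Moby Everything is Wrong",
    "Air Moon Safari",
    "Pearl Jam Ten",
    "Nirvana Nevermind",
    "Radiohead OK Computer",
    "Beck Odelay",
]

print(best_match("C0mputer Rad1ohead", RECORDS))  # Radiohead OK Computer
```

This scans every record, so it only illustrates the scoring; it does nothing about speed on millions of rows.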
part 2: the datastructure
Look into BK-trees. They are a specific datastructure to hold metric information. Your metric needs to be a genuine metric (in the mathematical sense of the word). But that's easily arranged.
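To make the idea concrete, here is a minimal BK-tree sketch in Python. It indexes single words under plain integer Levenshtein distance, which is a genuine metric; the class name, the tuple-based node layout and the word list are assumptions for illustration only. The fractional OCR costs from part 1 would need to be scaled to integers (and checked against the triangle inequality) before being used here.

```python
def levenshtein(x, y):
    """Plain integer edit distance."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j - 1] + (cx != cy), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]

class BKTree:
    def __init__(self, distance):
        self.distance = distance
        self.root = None  # each node is (word, {distance_to_parent: child_node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, radius):
        """Return sorted (distance, word) pairs with distance <= radius."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= radius:
                results.append((d, candidate))
            # Triangle inequality: only subtrees keyed within
            # [d - radius, d + radius] can contain a match.
            for k, child in children.items():
                if d - radius <= k <= d + radius:
                    stack.append(child)
        return sorted(results)

words = ["radiohead", "computer", "nirvana", "nevermind", "pearl", "jam", "ten", "moby"]
tree = BKTree(levenshtein)
for w in words:
    tree.add(w)

print(tree.search("rad1ohead", 1))  # [(1, 'radiohead')]
```

One way to combine the two parts would be to index the distinct words of all records in the tree, look up each query token, and then score only the records that contain a matched word.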
I am trying to generate a small algorithm that will give a user a decimal score out of 1 based on how close their answer is to a true answer. These answers will always be numeric and be things like 'How many x did this?'
I will be setting a sensible maximum and minimum value for each answer; if a user's answer falls outside this range, they will score nothing. However, I am a bit stuck on creating the equation ...
As an example, a correct answer could be 100 and a sensible minimum could be set as 50. A user specifying 75 would thus be given a score of 0.5
Perhaps this is getting a bit complicated, but it would also be nice to allocate the score on a curve, so that the result is not linear and the weighting is higher the nearer you are to the correct answer
Any help or better ideas for this scoring would be much appreciated
A formula could look like this:
score = 1 - abs(input - answer) / (answer - min)
For your example we have input = 75, answer = 100 and min = 50, so:
score = 1 - abs(75 - 100) / (100 - 50) = 1 - 25 / 50 = 0.5
If you wanted the scoring to be non-linear (to reward closeness to the answer) you could try a 'squared difference' formula. E.g.
score = 1 - (abs((answer - input)/(answer - minimum)))^2
e.g. with answer = 100, minimum = 60, input = 70 you would get:
score = 1 - (abs((100 - 70)/(100 - 60)))^2 = 0.4375
If you want to give a greater reward for closeness, you could use a higher power. Note that division by zero will occur if answer = minimum.
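As a small sketch of the formula with an adjustable power (Python; the function and parameter names are made up for illustration, and the symmetric treatment of guesses above the answer is an assumption):

```python
def closeness_score(guess, answer, bound, power=2):
    """Score in [0, 1]: 1 for an exact answer, 0 at or beyond the bound.

    `bound` is the sensible minimum (or maximum); the same distance is
    applied on both sides of the answer.
    """
    if bound == answer:
        raise ValueError("bound must differ from answer")  # avoids division by zero
    error = abs(guess - answer) / abs(answer - bound)
    if error >= 1:
        return 0.0
    return 1.0 - error ** power

print(closeness_score(75, 100, 50, power=1))  # 0.5    (linear)
print(closeness_score(70, 100, 60))           # 0.4375 (squared)
```

With power=1 this reduces to the linear score; larger powers keep the score close to 1 over a wider band around the answer and make it fall off more steeply near the bound.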
I implemented the algorithm in Java and made a small test case.
public class Quiz{
public static double calculateScore(int input,
int correctAnswer,
int minimumAnswer){
if(input == correctAnswer){
return 1;
}
double correctInterval = Math.abs(correctAnswer - minimumAnswer);
double relativeAnswer = Math.abs(correctAnswer - input);
if(relativeAnswer > correctInterval){
return 0;
}else{
double score = relativeAnswer/correctInterval;
score *= score;// make ^2 to avoid a linear progression
return 1.0 - score;
}
}
}
public class QuizTest{
@Test
public void testCalculateScore() {
assertTrue(0 == Quiz.calculateScore(5, 20, 15));
assertTrue(0 == Quiz.calculateScore(30, 20, 15));
assertTrue(1 == Quiz.calculateScore(20, 20, 15));
assertTrue(0 < Quiz.calculateScore(17, 20, 15));
assertTrue(0 < Quiz.calculateScore(22, 20, 15));
assertTrue(Quiz.calculateScore(18, 20, 15) == Quiz.calculateScore(22, 20, 15));
assertTrue(Quiz.calculateScore(17, 20, 15) < Quiz.calculateScore(22, 20, 15));
}
}
The test run is successful
QUESTION:
We define super digit of an integer x using the following rules:
Iff x has only 1 digit, then its super digit is x.
Otherwise, the super digit of x is equal to the super digit of the digit-sum of x. Here, digit-sum of a number is defined as the sum of its digits.
For example, super digit of 9875 will be calculated as:
super-digit(9875) = super-digit(9+8+7+5)
= super-digit(29)
= super-digit(2+9)
= super-digit(11)
= super-digit(1+1)
= super-digit(2)
= 2.
You are given two numbers, n and k. You have to calculate the super digit of P.
P is created when number n is concatenated k times. That is, if n = 123 and k = 3, then P = 123123123.
Input Format
Input will contain two space separated integers, n and k.
Output Format
Output the super digit of P, where P is created as described above.
Constraint
1 ≤ n < 10^100000
1 ≤ k ≤ 10^5
Sample Input
148 3
Sample Output
3
Explanation
Here n = 148 and k = 3, so P = 148148148.
super-digit(P) = super-digit(148148148)
= super-digit(1+4+8+1+4+8+1+4+8)
= super-digit(39)
= super-digit(3+9)
= super-digit(12)
= super-digit(1+2)
= super-digit(3)
= 3.
I have written the following program to solve the above problem, but how can I solve it more efficiently, and are string operations more efficient than math operations? For some inputs it takes a long time, for example:
861568688536788 100000
object SuperDigit {
def main(args: Array[String]) {
/* Enter your code here. Read input from STDIN. Print output to STDOUT. Your class should be named Solution
*/
def generateString (no:String,re:BigInt , tot:BigInt , temp:String):String = {
if(tot-1>re) generateString(no+temp,re+1,tot,temp)
else no
}
def totalSum(no:List[Char]):BigInt = no match {
case x::xs => x.asDigit+totalSum(xs)
case Nil => '0'.asDigit
}
def tot(no:List[Char]):BigInt = no match {
case _ if no.length == 1=> no.head.asDigit
case no => tot(totalSum(no).toString.toList)
}
var list = readLine.split(" ");
var one = list.head.toString();
var two = BigInt(list(1));
//println(generateString("148",0,3,"148"))
println(tot(generateString(one,BigInt(0),two,one).toList))
}
}
One reduction is to realise that you do not have to concatenate the number (considered as a string) k times; you can instead start with the number k * qs(n), where qs is the function that maps a number to its sum of digits, i.e. qs(123) = 1+2+3. Here is an approach in a more functional style. I do not know whether it can be made faster than this.
object Solution {
def qs(n: BigInt): BigInt = n.toString.foldLeft(BigInt(0))((n, ch) => n + (ch - '0').toInt)
def main(args: Array[String]) {
val input = scala.io.Source.stdin.getLines
val Array(n, k) = input.next.split(" ").map(BigInt(_))
println(Stream.iterate(k * qs(n))(qs(_)).find(_ < 10).get)
}
}
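For comparison, the same reduction can be sketched in Python (function names are illustrative). The second function goes one step further and relies on the standard digital-root identity: repeatedly summing the digits of a positive integer t ends at 1 + (t - 1) mod 9, so no loop is needed. Reading n as a string avoids building a huge integer at all.

```python
def super_digit(n: str, k: int) -> int:
    """Super digit of n repeated k times, without building the repeated string."""
    total = k * sum(int(ch) for ch in n)   # digit sum of P
    while total >= 10:
        total = sum(int(ch) for ch in str(total))
    return total

def super_digit_mod9(n: str, k: int) -> int:
    """Same result via the digital-root identity 1 + (t - 1) % 9 for t > 0."""
    total = k * sum(int(ch) for ch in n)
    return 0 if total == 0 else 1 + (total - 1) % 9

print(super_digit("148", 3))       # 3
print(super_digit_mod9("148", 3))  # 3
```

Both run in time proportional to the number of digits of n, independent of k.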
Recently I found this in some code I wrote a few years ago. It was used to rationalize a real value (within a tolerance) by determining a suitable denominator and then checking if the difference between the original real and the rational was small enough.
Edit to clarify: I don't actually want to convert all real values. For instance, I could choose a max denominator of 14, and a real value that equals 7/15 would stay as-is. That isn't obvious here because the max denominator is an outside variable in the algorithms I wrote.
The algorithm to get the denominator was this (pseudocode):
denominator(x)
frac = fractional part of x
recip = 1/frac
if (frac < tol)
return 1
else
return recip * denominator(recip)
end
end
Seems to be based on continued fractions although it became clear on looking at it again that it was wrong. (It worked for me because it would eventually just spit out infinity, which I handled outside, but it would be often really slow.) The value for tol doesn't really do anything except in the case of termination or for numbers that end up close. I don't think it's relatable to the tolerance for the real - rational conversion.
I've replaced it with an iterative version that is not only faster but I'm pretty sure it won't fail theoretically (d = 1 to start with and fractional part returns a positive, so recip is always >= 1) :
denom_iter(x d)
return d if d > maxd
frac = fractional part of x
recip = 1/frac
if (frac = 0)
return d
else
return denom_iter(recip d*recip)
end
end
What I'm curious to know is whether there's a way to pick the maxd that will ensure that it converts all values that are possible for a given tolerance. I'm assuming 1/tol but don't want to miss something. I'm also wondering if there's a way in this approach to actually limit the denominator size, since this allows some denominators larger than maxd.
This can be considered a 2D minimization problem on error:
ArgMin ( r - q / p ), where r is real, q and p are integers
I suggest using the gradient descent algorithm. The gradient of this objective function is:
f'(q, p) = (-1/p, q/p^2)
The initial guess r_o can be q being the closest integer to r, and p being 1.
The stopping condition can be thresholding of the error.
The pseudo-code of GD can be found in wiki: http://en.wikipedia.org/wiki/Gradient_descent
If the initial guess is close enough, the objective function should be convex.
As Jacob suggested, this problem can be better solved by minimizing the following error function:
ArgMin ( p * r - q ), where r is real, q and p are integers
This is a linear program, which can be solved efficiently by any ILP (Integer Linear Programming) solver. GD works on non-linear cases, but lacks efficiency on linear problems.
Initial guesses and the stopping condition can be similar to those stated above. A better choice can be made for each individual solver.
I suggest you still assume convexity near the local minimum, which can greatly reduce cost. You can also try the simplex method, which works well on linear programming problems.
I give credit to Jacob on this.
A problem similar to this is solved in the Approximations section beginning ca. page 28 of Bill Gosper's Continued Fraction Arithmetic document. (Ref: postscript file; also see text version, from line 1984.) The general idea is to compute continued-fraction approximations of the low-end and high-end range limiting numbers, until the two fractions differ, and then choose a value in the range of those two approximations. This is guaranteed to give a simplest fraction, using Gosper's terminology.
The python code below (program "simpleden") implements a similar process. (It probably is not as good as Gosper's suggested implementation, but is good enough that you can see what kind of results the method produces.) The amount of work done is similar to that for Euclid's algorithm, ie O(n) for numbers with n bits, so the program is reasonably fast. Some example test cases (ie the program's output) are shown after the code itself. Note, function simpleratio(vlo, vhi) as shown here returns -1 if vhi is smaller than vlo.
#!/usr/bin/env python
def simpleratio(vlo, vhi):
    rlo, rhi, eps = vlo, vhi, 0.0000001
    if vhi < vlo: return -1
    num = denp = 1
    nump = den = 0
    while 1:
        klo, khi = int(rlo), int(rhi)
        if klo != khi or rlo-klo < eps or rhi-khi < eps:
            tlo = denp + klo * den
            thi = denp + khi * den
            if tlo < thi:
                return tlo + (rlo-klo > eps)*den
            elif thi < tlo:
                return thi + (rhi-khi > eps)*den
            else:
                return tlo
        nump, num = num, nump + klo * num
        denp, den = den, denp + klo * den
        rlo, rhi = 1/(rlo-klo), 1/(rhi-khi)

def test(vlo, vhi):
    den = simpleratio(vlo, vhi);
    fden = float(den)
    ilo, ihi = int(vlo*den), int(vhi*den)
    rlo, rhi = ilo/fden, ihi/fden;
    izok = 'ok' if rlo <= vlo <= rhi <= vhi else 'wrong'
    # print(...) with a single argument works under both Python 2 and Python 3
    print('{:4d}/{:4d} = {:0.8f} vlo:{:0.8f} {:4d}/{:4d} = {:0.8f} vhi:{:0.8f} {}'.format(ilo,den,rlo,vlo, ihi,den,rhi,vhi, izok))

test (0.685, 0.695)
test (0.685, 0.7)
test (0.685, 0.71)
test (0.685, 0.75)
test (0.685, 0.76)
test (0.75, 0.76)
test (2.173, 2.177)
test (2.373, 2.377)
test (3.484, 3.487)
test (4.0, 4.87)
test (4.0, 8.0)
test (5.5, 5.6)
test (5.5, 6.5)
test (7.5, 7.3)
test (7.5, 7.5)
test (8.534537, 8.534538)
test (9.343221, 9.343222)
Output from program:
> ./simpleden
8/ 13 = 0.61538462 vlo:0.68500000 9/ 13 = 0.69230769 vhi:0.69500000 ok
6/ 10 = 0.60000000 vlo:0.68500000 7/ 10 = 0.70000000 vhi:0.70000000 ok
6/ 10 = 0.60000000 vlo:0.68500000 7/ 10 = 0.70000000 vhi:0.71000000 ok
2/ 4 = 0.50000000 vlo:0.68500000 3/ 4 = 0.75000000 vhi:0.75000000 ok
2/ 4 = 0.50000000 vlo:0.68500000 3/ 4 = 0.75000000 vhi:0.76000000 ok
3/ 4 = 0.75000000 vlo:0.75000000 3/ 4 = 0.75000000 vhi:0.76000000 ok
36/ 17 = 2.11764706 vlo:2.17300000 37/ 17 = 2.17647059 vhi:2.17700000 ok
18/ 8 = 2.25000000 vlo:2.37300000 19/ 8 = 2.37500000 vhi:2.37700000 ok
114/ 33 = 3.45454545 vlo:3.48400000 115/ 33 = 3.48484848 vhi:3.48700000 ok
4/ 1 = 4.00000000 vlo:4.00000000 4/ 1 = 4.00000000 vhi:4.87000000 ok
4/ 1 = 4.00000000 vlo:4.00000000 8/ 1 = 8.00000000 vhi:8.00000000 ok
11/ 2 = 5.50000000 vlo:5.50000000 11/ 2 = 5.50000000 vhi:5.60000000 ok
5/ 1 = 5.00000000 vlo:5.50000000 6/ 1 = 6.00000000 vhi:6.50000000 ok
-7/ -1 = 7.00000000 vlo:7.50000000 -7/ -1 = 7.00000000 vhi:7.30000000 wrong
15/ 2 = 7.50000000 vlo:7.50000000 15/ 2 = 7.50000000 vhi:7.50000000 ok
8030/ 941 = 8.53347503 vlo:8.53453700 8031/ 941 = 8.53453773 vhi:8.53453800 ok
24880/2663 = 9.34284641 vlo:9.34322100 24881/2663 = 9.34322193 vhi:9.34322200 ok
If, rather than the simplest fraction in a range, you seek the best approximation given some upper limit on denominator size, consider code like the following, which replaces all the code from def test(vlo, vhi) forward.
def smallden(target, maxden):
    global pas
    pas = 1  # count the first attempt, so passes[pas-1] below indexes correctly
    tol = 1/float(maxden)**2
    while 1:
        den = simpleratio(target-tol, target+tol);
        if den <= maxden: return den
        tol *= 2
        pas += 1

# Test driver for smallden(target, maxden) routine
import random
totalpass, trials, passes = 0, 20, [0 for i in range(20)]
print('Maxden Num Den Num/Den Target Error Passes')
for i in range(trials):
    target = random.random()
    maxden = 10 + round(10000*random.random())
    den = smallden(target, maxden)
    num = int(round(target*den))
    got = float(num)/den
    print('{:4d} {:4d}/{:4d} = {:10.8f} = {:10.8f} + {:12.9f} {:2}'.format(
        int(maxden), num, den, got, target, got - target, pas))
    totalpass += pas
    passes[pas-1] += 1
print('Average pass count: {:0.3}\nPass histo: {}'.format(
    float(totalpass)/trials, passes))
In production code, drop out all the references to pas (etc.), ie, drop out pass-counting code.
The routine smallden is given a target value and a maximum value for allowed denominators. Given maxden possible choices of denominators, it's reasonable to suppose that a tolerance on the order of 1/maxden² can be achieved. The pass-counts shown in the following typical output (where target and maxden were set via random numbers) illustrate that such a tolerance was reached immediately more than half the time, but in other cases tolerances 2 or 4 or 8 times as large were used, requiring extra calls to simpleratio. Note, the last two lines of output from a 100000-number test run are shown following the complete output of a 20-number test run.
Maxden Num Den Num/Den Target Error Passes
1198 32/ 509 = 0.06286837 = 0.06286798 + 0.000000392 1
2136 115/ 427 = 0.26932084 = 0.26932103 + -0.000000185 1
4257 839/2670 = 0.31423221 = 0.31423223 + -0.000000025 1
2680 449/ 509 = 0.88212181 = 0.88212132 + 0.000000486 3
2935 440/1853 = 0.23745278 = 0.23745287 + -0.000000095 1
6128 347/1285 = 0.27003891 = 0.27003899 + -0.000000077 3
8041 1780/4243 = 0.41951449 = 0.41951447 + 0.000000020 2
7637 3926/7127 = 0.55086292 = 0.55086293 + -0.000000010 1
3422 27/ 469 = 0.05756930 = 0.05756918 + 0.000000113 2
1616 168/1507 = 0.11147976 = 0.11147982 + -0.000000061 1
260 62/ 123 = 0.50406504 = 0.50406378 + 0.000001264 1
3775 52/3327 = 0.01562970 = 0.01562750 + 0.000002195 6
233 6/ 13 = 0.46153846 = 0.46172772 + -0.000189254 5
3650 3151/3514 = 0.89669892 = 0.89669890 + 0.000000020 1
9307 2943/7528 = 0.39094049 = 0.39094048 + 0.000000013 2
962 206/ 225 = 0.91555556 = 0.91555496 + 0.000000594 1
2080 564/1975 = 0.28556962 = 0.28556943 + 0.000000190 1
6505 1971/2347 = 0.83979548 = 0.83979551 + -0.000000022 1
1944 472/ 833 = 0.56662665 = 0.56662696 + -0.000000305 2
3244 291/1447 = 0.20110574 = 0.20110579 + -0.000000051 1
Average pass count: 1.85
Pass histo: [12, 4, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The last two lines of output from a 100000-number test run:
Average pass count: 1.77
Pass histo: [56659, 25227, 10020, 4146, 2072, 931, 497, 233, 125, 39, 33, 17, 1, 0, 0, 0, 0, 0, 0, 0]
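For comparison, Python's standard library can produce the closest fraction under a denominator cap directly, via fractions.Fraction.limit_denominator. The sketch below is not the simpleratio method above; the wrapper names and the tolerance check are assumptions added to mirror the question's requirement that values with no close enough fraction stay as-is.

```python
from fractions import Fraction

def best_rational(target, maxden):
    """Closest fraction to target whose denominator is at most maxden."""
    return Fraction(target).limit_denominator(maxden)

def rationalize(target, maxden, tol):
    """Return the fraction only if it lies within tol of target, else None."""
    frac = best_rational(target, maxden)
    return frac if abs(frac - Fraction(target)) <= tol else None

print(best_rational(0.75, 10))       # 3/4
print(rationalize(7/15, 14, 1e-6))   # None: no denominator <= 14 is close enough
print(rationalize(7/15, 15, 1e-6))   # 7/15
```

Unlike the doubling loop in smallden, this never returns a denominator above maxden, and the tolerance is checked separately afterwards.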
Four men have to cross a bridge at night. Any party that crosses, either one or two men, must carry the flashlight with them. The flashlight must be walked back and forth; it cannot be thrown, etc. Each man walks at a different speed. One takes 1 minute to cross, another 2 minutes, another 5, and the last 10 minutes. If two men cross together, they must walk at the slower man's pace. There are no tricks--the men all start on the same side, the flashlight cannot shine a long distance, no one can be carried, etc.
The question is: what's the fastest they can all get across? I am basically looking for a generalized approach to this kind of problem. A friend told me that it can be solved with the Fibonacci series, but that solution does not work in all cases.
Please note this is not homework.
There is an entire PDF (alternate link) that solves the general case of this problem (in a formal proof).
17 minutes - this is a classic MS question.
1,2 => 2 minutes passed.
1 returns => 3 minutes passed.
5,10 => 13 minutes passed.
2 returns => 15 minutes passed.
1,2 => 17 minutes passed.
In general the largest problem / slowest people should always be put together, and sufficient trips of the fastest made to be able to bring the light back each time without using a slow resource.
I would solve this problem by placing a fake job ad on Dice.com, and then asking this question in the interviews until someone gets it right.
As per Wikipedia
The puzzle is known to have appeared as early as 1981, in the book Super Strategies For Puzzles and Games. In this version of the puzzle, A, B, C and D take 5, 10, 20, and 25 minutes, respectively, to cross, and the time limit is 60 minutes
This question was however popularized after its appearance in the book "How Would You Move Mount Fuji?"
The question can be generalized to N people, each with their own crossing time.
The program below works for any number N of people and their times.
using System;
using System.Collections.Generic;

class Program
{
public static int TotalTime(List<int> band, int n)
{
if (n < 3)
{
return band[n - 1];
}
else if (n == 3)
{
return band[0] + band[1] + band[2];
}
else
{
int temp1 = band[n - 1] + band[0] + band[n - 2] + band[0];
int temp2 = band[1] + band[0] + band[n - 1] + band[1];
// Take whichever way of moving the two slowest people is cheaper.
return Math.Min(temp1, temp2) + TotalTime(band, n - 2);
}
}
static void Main(string[] args)
{
// change the no of people crossing the bridge
// add or remove corresponding time to the list
int n = 4;
List<int> band = new List<int>() { 1, 2, 5, 10 };
band.Sort();
Console.WriteLine("The total time taken to cross the bridge is: " + Program.TotalTime(band, n));
Console.ReadLine();
}
}
OUTPUT:
The total time taken to cross the bridge is: 17
For,
int n = 5;
List<int> band = new List<int>() { 1, 2, 5, 10, 12 };
OUTPUT:
The total time taken to cross the bridge is: 25
For,
int n = 4;
List<int> band = new List<int>() { 5, 10, 20, 25 };
OUTPUT
The total time taken to cross the bridge is: 60
Here's the response in ruby:
@values = [1, 2, 5, 10]
# @values = [1, 2, 5, 10, 20, 25, 30, 35, 40]
@values.sort!
@position = @values.map { |v| :first }
@total = 0
def send_people(first, second)
  first_time = @values[first]
  second_time = @values[second]
  @position[first] = :second
  @position[second] = :second
  p "crossing #{first_time} and #{second_time}"
  first_time > second_time ? first_time : second_time
end
def send_lowest
  value = nil
  @values.each_with_index do |v, i|
    if @position[i] == :second
      value = v
      @position[i] = :first
      break
    end
  end
  p "return #{value}"
  return value
end
def highest_two
  first = nil
  second = nil
  first_arr = @position - [:second]
  if (first_arr.length % 2) == 0
    @values.each_with_index do |v, i|
      if @position[i] == :first
        first = i unless first
        second = i if !second && i != first
      end
      break if first && second
    end
  else
    @values.reverse.each_with_index do |v, i|
      real_index = @values.length - i - 1
      if @position[real_index] == :first
        first = real_index unless first
        second = real_index if !second && real_index != first
      end
      break if first && second
    end
  end
  return first, second
end
#we first send the first two
@total += send_people(0, 1)
#then we get the lowest one from there
@total += send_lowest
#we loop through the rest with highest 2 always being sent
while @position.include?(:first)
  first, second = highest_two
  @total += send_people(first, second)
  @total += send_lowest if @position.include?(:first)
end
p "Total time: #{@total}"
Another Ruby implementation inspired by @roc-khalil's solution
@values = [1,2,5,10]
# @values = [1,2,5,10,20,25]
@left = @values.sort
@right = []
@total_time = 0
def trace(moving)
  puts moving
  puts "State: #{@left} #{@right}"
  puts "Time: #{@total_time}"
  puts "-------------------------"
end
# move right the fastest two
def move_fastest_right!
  fastest_two = @left.shift(2)
  @right = @right + fastest_two
  @right = @right.sort
  @total_time += fastest_two.max
  trace "Moving right: #{fastest_two}"
end
# move left the fastest runner
def move_fastest_left!
  fastest_one = @right.shift
  @left << fastest_one
  @left.sort!
  @total_time += fastest_one
  trace "Moving left: #{fastest_one}"
end
# move right the slowest two
def move_slowest_right!
  slowest_two = @left.pop(2)
  @right = @right + slowest_two
  @right = @right.sort
  @total_time += slowest_two.max
  trace "Moving right: #{slowest_two}"
end
def iterate!
  move_fastest_right!
  return if @left.length == 0
  move_fastest_left!
  move_slowest_right!
  return if @left.length == 0
  move_fastest_left!
end
puts "State: #{@left} #{@right}"
puts "-------------------------"
while @left.length > 0
  iterate!
end
Output:
State: [1, 2, 5, 10] []
-------------------------
Moving right: [1, 2]
State: [5, 10] [1, 2]
Time: 2
-------------------------
Moving left: 1
State: [1, 5, 10] [2]
Time: 3
-------------------------
Moving right: [5, 10]
State: [1] [2, 5, 10]
Time: 13
-------------------------
Moving left: 2
State: [1, 2] [5, 10]
Time: 15
-------------------------
Moving right: [1, 2]
State: [] [1, 2, 5, 10]
Time: 17
-------------------------
An exhaustive search of all possibilities is simple with such a small problem space. Breadth or depth first would work. It is a simple CS problem.
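As a rough sketch of such a search (Python; the state encoding and function name are assumptions for illustration), the version below uses uniform-cost search rather than plain breadth-first search, because crossings have different durations. A state is the set of people still on the starting side plus which side the torch is on.

```python
import heapq
from itertools import combinations

def min_crossing_time(times, capacity=2):
    """Minimum total time, found by uniform-cost search over all states."""
    everyone = frozenset(range(len(times)))
    start = (everyone, True)          # True: the torch is on the starting side
    best = {start: 0}
    heap = [(0, 0, start)]            # (elapsed, tie-breaker, state)
    counter = 1
    while heap:
        elapsed, _, (left, torch_left) = heapq.heappop(heap)
        if not left:
            return elapsed            # nobody remains on the starting side
        if elapsed > best[(left, torch_left)]:
            continue                  # stale heap entry
        pool = left if torch_left else everyone - left
        for size in range(1, capacity + 1):
            for group in combinations(sorted(pool), size):
                cost = max(times[i] for i in group)   # group walks at the slowest pace
                new_left = left - set(group) if torch_left else left | set(group)
                state = (frozenset(new_left), not torch_left)
                if elapsed + cost < best.get(state, float("inf")):
                    best[state] = elapsed + cost
                    heapq.heappush(heap, (elapsed + cost, counter, state))
                    counter += 1
    return None

print(min_crossing_time([1, 2, 5, 10]))    # 17
print(min_crossing_time([5, 10, 20, 25]))  # 60
```

The number of states grows as 2^N, so this is only practical for small groups, but it makes a handy reference for checking the greedy programs in the other answers.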
I prefer the missionary and cannibal problems myself
17 -- a very common question
-> 1-2 = 2
<- 2 = 2
-> 5,10 = 10 (none of them has to return)
<- 1 = 1
-> 1,2 = 2
all on the other side
total = 2+2+10+1+2 = 17
usually people get it as 19 in the first try
Consider two sides, side 1 and side 2, where N people must cross from side 1 to side 2. If at most L people can be on the bridge at once, the logic for crossing is:
Step 1 : Move L number of the fastest members from side 1 to side 2
Step 2 : Bring back the fastest person back from Side 2 to Side 1
Step 3 : Move L number of slowest members from side 1 to side 2
Step 4 : Bring back the fastest person among the ones present in Side 2
Repeat these steps until no one is left on Side 1, which happens either at the end of step 1 or at the end of step 3.
C# code for N people, with at most 2 crossing at a time, is below. It takes the number of people N at runtime, then reads a name and a crossing time for each of the N people. The output also lists the sequence of crossings that gives the total time.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace RiverCrossing_Problem
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, int> Side1 = new Dictionary<string, int>();
Dictionary<string, int> Side2 = new Dictionary<string, int>();
Console.WriteLine("Enter number of persons");
int n = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("Enter the name and time taken by each");
for(int a =0; a<n; a++)
{
string tempname = Console.ReadLine();
int temptime = Convert.ToInt32(Console.ReadLine());
Side1.Add(tempname, temptime);
}
Console.WriteLine("Shortest time and logic:");
int totaltime = 0;
int i = 1;
do
{
KeyValuePair<string, int> low1, low2, high1, high2;
if (i % 2 == 1)
{
LowestTwo(Side1, out low1, out low2);
Console.WriteLine("{0} and {1} goes from side 1 to side 2, time taken = {2}", low1.Key, low2.Key, low2.Value);
Side1.Remove(low2.Key);
Side1.Remove(low1.Key);
Side2.Add(low2.Key, low2.Value);
Side2.Add(low1.Key, low1.Value);
totaltime += low2.Value;
low1 = LowestOne(Side2);
Console.WriteLine("{0} comes back to side 1, time taken = {1}", low1.Key, low1.Value);
totaltime += low1.Value;
Side1.Add(low1.Key, low1.Value);
Side2.Remove(low1.Key);
i++;
}
else
{
HighestTwo(Side1, out high1, out high2);
Console.WriteLine("{0} and {1} goes from side 1 to side 2, time taken = {2}", high1.Key, high2.Key, high1.Value);
Side1.Remove(high1.Key);
Side1.Remove(high2.Key);
Side2.Add(high1.Key, high1.Value);
Side2.Add(high2.Key, high2.Value);
totaltime += high1.Value;
low1 = LowestOne(Side2);
Console.WriteLine("{0} comes back to side 1, time taken = {1}", low1.Key, low1.Value);
Side2.Remove(low1.Key);
Side1.Add(low1.Key, low1.Value);
totaltime += low1.Value;
i++;
}
} while (Side1.Count > 2);
KeyValuePair<string, int> low3, low4;
LowestTwo(Side1, out low3, out low4);
Console.WriteLine("{0} and {1} goes from side 1 to side 2, time taken = {2}", low3.Key, low4.Key, low4.Value);
Side2.Add(low4.Key, low4.Value);
Side2.Add(low3.Key, low3.Value);
totaltime += low4.Value;
Console.WriteLine("\n");
Console.WriteLine("Total Time taken = {0}", totaltime);
}
public static void LowestTwo(Dictionary<string, int> a, out KeyValuePair<string, int> low1, out KeyValuePair<string, int> low2)
{
Dictionary<string, int> b = new Dictionary<string, int>(a); // work on a copy so the caller's dictionary is not modified
low1 = b.OrderBy(kvp => kvp.Value).First();
b.Remove(low1.Key);
low2 = b.OrderBy(kvp => kvp.Value).First();
}
public static void HighestTwo(Dictionary<string,int> a, out KeyValuePair<string,int> high1, out KeyValuePair<string,int> high2)
{
Dictionary<string, int> b = new Dictionary<string, int>(a); // work on a copy so the caller's dictionary is not modified
high1 = b.OrderByDescending(k => k.Value).First();
b.Remove(high1.Key);
high2 = b.OrderByDescending(k => k.Value).First();
}
public static KeyValuePair<string, int> LowestOne(Dictionary<string,int> a)
{
Dictionary<string, int> b = a;
return b.OrderBy(k => k.Value).First();
}
}
}
Sample output for an arbitrary input of 7 people, with 2 crossing at a time:
Enter number of persons
7
Enter the name and time taken by each
A
2
B
5
C
3
D
7
E
9
F
4
G
6
Shortest time and logic:
A and C goes from side 1 to side 2, time taken = 3
A comes back to side 1, time taken = 2
E and D goes from side 1 to side 2, time taken = 9
C comes back to side 1, time taken = 3
A and C goes from side 1 to side 2, time taken = 3
A comes back to side 1, time taken = 2
G and B goes from side 1 to side 2, time taken = 6
C comes back to side 1, time taken = 3
A and C goes from side 1 to side 2, time taken = 3
A comes back to side 1, time taken = 2
A and F goes from side 1 to side 2, time taken = 4
Total Time taken = 40
I mapped out the possible solutions algebraically and worked out the fastest time. Label the times A, B, C, D, where A is the smallest and D is the biggest.
The formula for the shortest time is B+A+D+B+B, or 3B+A+D.
In words: three times the second-fastest time, plus the fastest and the slowest.
Looking at the program, there was also the question of more items. I haven't gone through it, but I am guessing the formula still applies: the second item times 3, plus the sum of everything except the 2nd-slowest times.
E.g. 4 items give 3 x second + first and fourth,
then 5 items give 3 x second + first, third and fifth.
I would like to check this using the program.
I also just looked at the PDF shared above; for more items it is the sum of
3 x second + fastest + sum of the slowest of each subsequent pair.
Looking at the steps of the optimized solution, the idea is:
- right: the two items going right are the fastest and 2nd fastest,
- left: the single item going back is the fastest,
- right: bring the slowest 2 items, which costs only the slowest time and disregards the second slowest,
- left: the 2nd fastest item goes back,
- final right: the fastest and 2nd fastest again.
So, summing up: the 2nd fastest crosses 3 times, the fastest returns once, and the slowest goes with the 2nd slowest.
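A quick way to check the 4-item formula is to compare it with the other plan that the C# TotalTime program above also considers (its temp1), in which the fastest person escorts each of the others. The Python sketch below is only an illustration; the function name and the second test input are made up.

```python
def four_person_times(a, b, c, d):
    """Totals for the two standard plans, given a <= b <= c <= d."""
    # a,b cross; a returns; c,d cross; b returns; a,b cross
    shuttle_pair = a + 3 * b + d
    # a escorts d, then c, then b, returning alone twice
    escort_each = 2 * a + b + c + d
    return shuttle_pair, escort_each

print(four_person_times(1, 2, 5, 10))  # (17, 19)
print(four_person_times(1, 5, 6, 7))   # (23, 20)
```

So 3B+A+D gives the shortest time only when 2B is less than A+C; when the second-fastest walker is comparatively slow, the escort plan is quicker, and the shortest time is the smaller of the two totals.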
A simple algorithm is as follows. Assume 'N' is the number of people who can cross at the same time, and that one person has to cross back carrying the torch.
1. When moving people from the first side to the second side, give preference to the 'N' slowest walkers.
2. Always use the fastest walker to take the torch from the second side back to the first side.
3. When moving people from the first side to the second side, take into consideration who will bring back the torch in the next step. If the torch bearer in the next step would be as fast as the fastest walker among the 'N' slowest walkers in the current step, then instead of choosing the 'N' slowest walkers as in step 1, choose the 'N' fastest walkers.
Here is a sample python script which does this: https://github.com/meowbowgrr/puzzles/blob/master/bridgentorch.py