Exercise 9 - 45 minutes
You have been given a list of the sequential numbers from 1 to 10,000, but they are all out of order; furthermore, one number is missing from the list. The goal is to find which number is missing. Write out in plain English your strategy for solving this problem. Be as concise as possible.
Write Ruby code that takes this list of numbers as an argument, and returns the missing number.
My initial impression is that some sort function will help me put the array in order, but then I reread the problem: it's not asking for a sorted sequence, it's asking for a missing number. The next step is to consider how to determine the next number in a sequence. I think of the 99 Bottles challenge in Chris Pine's book and realize that "n + 1" or "n - 1" will be part of the solution, as will a range that begins with 1 and ends with 10,000 (1..10_000 in Ruby).
I next think about indexing and that I'll need to loop through the range using #upto or #each to determine the missing number as well as some sort of conditional statement that allows me to return the missing value. I'll be defining a method "missing_number" but what is the input?
Is it an array? Or is it a range? I am going to go with an array, since the numbers arrive unsorted; when I test it, I'll build the input from a range.
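The loop-and-conditional idea above can be sketched like this, before switching to the sum-based approach. This is a minimal illustration, not the code the exercise settles on: the method name missing_number_by_scan is hypothetical, and it assumes the input is an array holding 1..n with exactly one number removed. A Set is used so each membership check is constant time instead of a linear search.

```ruby
require "set"

# Hypothetical sketch: walk the full range 1..n (n = array.length + 1)
# and return the first number that is not present in the input.
def missing_number_by_scan(array)
  present = array.to_set                         # fast membership checks
  (1..array.length + 1).find { |n| !present.include?(n) }
end

nums = (1..10).to_a.shuffle
nums.delete(7)
puts missing_number_by_scan(nums)                # prints 7
```

Because the range runs to array.length + 1, the sketch also handles the case where the missing number is the largest one.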
After doing a bit of research I came across a strategy whose key step is to sum all of the numbers in the array and subtract that sum from the sum of the full range 1..n; the difference is the missing number.
This makes a lot of sense because the sum of the full range is a known constant, n(n + 1) / 2, so I selected this approach
to inform the code.
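# Returns the one number missing from an unordered list of 1..n
# (so the list holds n - 1 numbers).
def missing_number(array)
  n = array.length + 1
  expected_sum = n * (n + 1) / 2   # sum of 1..n
  expected_sum - array.sum         # the shortfall is the missing number
end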
x = (1..10_000).to_a.shuffle   # the numbers, out of order
x.delete(rand(1..10_000))      # rand(10000) could return 0, which is not in the list
puts missing_number(x)
Related
One straightforward way to find it is to check each element against the element that follows it and collect the increasing ones in another array, but that is not very clean. Another way would be a divide-and-conquer approach based on merge sort: the goal is not to sort the numbers, but to divide them recursively into subarrays and merge only those runs that satisfy a[i] < a[i + 1]. But I'm not sure how to implement the merging and checking part.
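It can be done with a single-pass scan over the candidate array, tracking the length of the current consecutive increasing run. In Ruby, it can look like the following:
a = [5, 1, 3, 10, 5, 15, 25, 35, 45, 3, 4, 5]
longest_seq = 1   # longest increasing run seen so far
temp_seq = 1      # length of the run ending at the current element
(1...a.size).each do |i|
  if a[i - 1] < a[i]
    temp_seq += 1   # still increasing: extend the current run
  else
    temp_seq = 1    # run broken: start a new one at a[i]
  end
  # update on every step so a run that reaches the end of the array is counted
  longest_seq = temp_seq if temp_seq > longest_seq
end
longest_seq is the number you are looking for (to my understanding); here it is 5, for the run 5, 15, 25, 35, 45.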
Forgive me if I misunderstand your problem, but why use separate arrays, merging, sorting, etc.? If, as I understand it, you're just looking for the longest sequence of increasing elements, recursion could be your friend. Pass a pointer to element N to func(N); if element N+1 is greater than element N, increment N and have func(N) call itself again (making sure you don't exceed the bounds of your array).
A simplistic explanation, but I think you see what I'm driving at.
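The recursive idea above might look like this in Ruby. It is a sketch under stated assumptions: longest_run is a hypothetical name, it uses an index instead of a pointer, and it assumes a non-empty array. Ruby does not eliminate tail calls by default, so a very long array could raise SystemStackError; the iterative scan avoids that.

```ruby
# Hypothetical sketch: each call looks at element i, extends or resets the
# current increasing run, and recurses on i + 1 until it passes the end.
def longest_run(a, i = 1, current = 1, best = 1)
  return best if i >= a.size                    # walked past the end: done
  current = a[i - 1] < a[i] ? current + 1 : 1   # extend or restart the run
  longest_run(a, i + 1, current, [best, current].max)
end

p longest_run([5, 1, 3, 10, 5, 15, 25, 35, 45, 3, 4, 5])  # => 5
```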
Depending on whether the number of elements in a set of numbers is odd or even, the median of that set is defined as the middle value or the average of the two middle values, respectively, of the list that results when the set is sorted.
Below is code for calculating the "running" median of a list of numbers. "Running" median is a dynamic median which is re-calculated with the appearance of a new number as the list is scanned for all numbers that have appeared thus far. Input is an integer n followed by a list of n integers, and output should be the "running" median of the list as the list is scanned. For example,
3
4
1
5
should yield
4
2.5
4
because 4 is the median of [4], 2.5 ((1 + 4) / 2) is the median of [4, 1], and 4 again is the median of [4, 1, 5].
My program works correctly, but it times out on a certain test on very large inputs. I suspect that this copying step is the problem.
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
But I am not sure, because this copy is meant to be a shallow copy as far as I know. Also, I am not doing a regular insert into the array that holds the numbers anywhere, since that would involve shifting elements and thus slow the code down.
n=gets.chomp.to_i
a=[]
n.times do
t=gets.chomp.to_i
a==[]||(t<=a.first) ? a.unshift(t): t>=a.last ? a.push(t) : a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
Can anybody point out where the problem could be? Thank you.
Here is a less obfuscated version.
n=gets.chomp.to_i
a=[]
n.times do
t=gets.chomp.to_i
if a==[]||(t<=a.first)
a.unshift(t)
else
k=a.index(a.bsearch{|x|x>=t})
k = a.length if k.nil?
a=a[0,k].push(t)+ a[k,a.length-k]
end
p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
I think...
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
...because it's creating a new array every time, is likely an expensive operation as the array gets bigger.
Better might actually be something that mutates the original array.
a.insert((a.index{|x|x>t} || -1), t)
It also handles the edge cases of less than the first element or greater than the last, so you can remove those tests. It also works on the first pass (empty array a).
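Here is a sketch of that in-place insert with one change swapped in: the insertion point is found with Array#bsearch_index (Ruby 2.3+), a binary search, instead of the linear a.index scan. Array#insert still shifts elements, but in place, without building new arrays. The method name running_medians is hypothetical, and it returns the medians as floats instead of printing them.

```ruby
# Hypothetical sketch: keep `sorted` ascending and report the median after
# each new number arrives.
def running_medians(numbers)
  sorted = []
  numbers.map do |t|
    # first index whose element is greater than t; nil means "append at end"
    k = sorted.bsearch_index { |x| x > t } || sorted.size
    sorted.insert(k, t)                  # mutates the array, no copying
    l = sorted.size
    if l.even?
      (sorted[l / 2 - 1] + sorted[l / 2]) / 2.0   # average of the two middles
    else
      sorted[l / 2].to_f                          # single middle element
    end
  end
end

p running_medians([4, 1, 5])   # => [4.0, 2.5, 4.0]
```

A two-heap structure is the usual way to get O(log n) per update, but Ruby's standard library has no heap, so this sketch sticks to the sorted-array approach from the thread.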
I am trying to loop over the numbers 1 to 1000 in such a way that I get all possible pairs, e.g., 1 and 1, 1 and 2, 1 and 3, ..., but also 2 and 1, 2 and 2, 2 and 3, and so on.
In this case I have a condition (amicable_pair?) that returns true if two numbers are an amicable pair. I want to check all numbers from 1 to n against each other and add all amicable pairs to a running total. The first value will be added to the total if it is part of an amicable pair (not the second value of the pair, since we'll find that later in the loop). To do this I wrote the following "Java-like" code:
def add_amicable_pairs(n)
amicable_values = []
for i in 1..n
for j in 1..n
if (amicable_pair?(i,j))
amicable_values.push(i)
puts "added #{i} from amicable pair #{i}, #{j}"
end
end
end
return amicable_values.inject(:+)
end
Two issues with this: (1) it is really slow. (2) In Ruby you should not use for-loops.
This is why I am wondering how this can be accomplished in a faster and more Ruby-like way. Any help would be greatly appreciated.
Your code has O(n^2) runtime, so if n gets moderately large then it will naturally be slow. Brute-force algorithms are always slow if the search space is large. To avoid this, is there some way you can directly find the "amicable pairs" rather than looping through all possible combinations and checking one by one?
As far as how to write the loops in a more elegant way, I would probably rewrite your code as:
(1..n).to_a.product((1..n).to_a).select { |a, b| amicable_pair?(a, b) }.map(&:first).reduce(0, :+)
(1..1000).to_a.repeated_permutation(2).select{|pair| amicable_pair?(*pair)}
.map(&:first).inject(:+)
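On the earlier point about finding amicable pairs directly instead of checking all n² combinations: one common alternative (swapped in here, not taken from the answers above) is to precompute every number's proper-divisor sum s(a) with a sieve. Each a then has exactly one candidate partner, b = s(a), so no inner loop is needed. The names divisor_sums and amicable_sum are hypothetical; like the original 1..n loop, the sketch ignores partners larger than n.

```ruby
# Hypothetical sketch: proper-divisor sums for 0..n, built sieve-style.
def divisor_sums(n)
  sums = Array.new(n + 1, 0)
  (1..n / 2).each do |d|
    # d is a proper divisor of every multiple m = 2d, 3d, ... up to n
    (2 * d).step(n, d) { |m| sums[m] += d }
  end
  sums
end

# Sum of every a in 2..n that has an amicable partner b <= n.
def amicable_sum(n)
  s = divisor_sums(n)
  (2..n).select do |a|
    b = s[a]
    b != a && b <= n && s[b] == a   # b != a excludes perfect numbers like 6
  end.sum
end

puts amicable_sum(1000)   # 220 + 284 = 504
```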
This is a solution for calculating the median value in an array. I get the first three lines, duh ;), but the fourth line (the return) is where the magic is happening. Can someone explain how the 'sorted' variable is being used and why it's next to brackets, and why the other variable 'len' is enclosed in those parentheses and then brackets? It's almost like sorted is all of a sudden being used as an array? Thanks!
def median(array)
sorted = array.sort
len = sorted.length
return ((sorted[(len - 1) / 2] + sorted[len / 2]) / 2.0).to_f
end
puts median([3,2,3,8,91])
puts median([2,8,3,11,-5])
puts median([4,3,8,11])
Consider this:
[1,2,2,3,4] and [1,2,3,4]. Both arrays are sorted, but have odd and even numbers of elements respectively. So, that piece of code is taking into account these 2 cases.
sorted is indeed an array. You sort [2,3,1,4] and you get back [1,2,3,4]. Then you calculate the two middle indices, (len - 1) / 2 and len / 2, and find the average of the values at them; for an odd number of elements both indices point to the same element.
Yes, array.sort is returning an array and it is assigned to sorted. You can then access it via array indices.
If you have an odd number of elements, say 5 elements as in the example, the indices come out to be:
(len-1)/2=(5-1)/2=2
len/2=5/2=2 --- (remember this is integer division, so the decimal gets truncated)
So you add the value at index 2 to itself and then divide by 2, which gives back the value at index 2.
If you have an even number of elements, say 4,
(len-1)/2=(4-1)/2=1 --- (remember this is integer division, so the decimal gets truncated)
len/2=4/2=2
So in this case, you are effectively averaging the elements at indices 1 and 2, the two middle elements, which is the definition of the median when you have an even number of elements.
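To make the two cases visible, here is an equivalent version of the one-liner with the odd/even branch written out. It is only an illustration (median_explicit is a hypothetical name). For an odd length, len / 2 equals (len - 1) / 2; for an even length, len / 2 - 1 equals (len - 1) / 2, so it reads exactly the same elements as the original.

```ruby
def median_explicit(array)
  sorted = array.sort
  len = sorted.length
  mid = len / 2                                   # integer division
  if len.odd?
    sorted[mid].to_f                              # the single middle value
  else
    (sorted[mid - 1] + sorted[mid]) / 2.0         # average of the two middles
  end
end

puts median_explicit([3, 2, 3, 8, 91])   # 3.0
puts median_explicit([4, 3, 8, 11])      # 6.0
```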
It's almost like sorted is all of a sudden being used as an array?
Yes, it is. On line 2 it's being initialized as being an array with the same elements as the input, but in ascending order (default sort is ascending). On line 3 you have len which is initialized with the length of the sorted array, so yeah, sorted is being used as an array since then, because that's what it is.
I have numbers in a specific range (usually from 0 to about 1000). An algorithm selects some numbers from this range (about 3 to 10 numbers). This selection is done quite often, and I need to check whether a permutation of the chosen numbers has already been selected.
E.g. one step selects [1, 10, 3, 18] and another one [10, 18, 3, 1]; then the second selection can be discarded because it is a permutation.
I need to do this check very fast. Right now I put all arrays in a hashmap and use a custom hash function that just sums up all the elements, so 1+10+3+18=32, and also 10+18+3+1=32. For equals I use a bitset to quickly check whether elements are in both sets (I do not need sorting when using the bitset, but it only works when the range of numbers is known and not too big).
This works ok, but can generate lots of collisions, so the equals() method is called quite often. I was wondering if there is a faster way to check for permutations?
Are there any good hash functions for permutations?
UPDATE
I have done a little benchmark: generate all combinations of numbers in the range 0 to 6, and array length 1 to 9. There are 3003 possible permutations, and a good hash should generate close to this many different hashes (I use 32-bit numbers for the hash):
41 different hashes for just adding (so there are lots of collisions)
8 different hashes for XOR'ing values together
286 different hashes for multiplying
3003 different hashes for (R + 2e) and multiplying as abc has suggested (using 1779033703 for R)
So abc's hash can be calculated very fast and is a lot better than all the rest. Thanks!
PS: I do not want to sort the values when I do not have to, because this would get too slow.
One potential candidate might be this.
Fix an odd integer R.
For each element e you want to hash compute the factor (R + 2*e).
Then compute the product of all these factors.
Finally divide the product by 2 to get the hash.
The factor 2 in (R + 2e) guarantees that all factors are odd, so the product (even with 32-bit wraparound) is always odd and can never become 0. The division by 2 at the end is there because
the product is always odd, so the division just removes a constant bit.
E.g. I choose R = 1779033703. This is an arbitrary choice; some experiments should show whether a given R is good or bad. Assume your values are [1, 10, 3, 18].
The product (computed using 32-bit ints) is
(R + 2) * (R + 20) * (R + 6) * (R + 36) = 3376724311
Hence the hash would be
3376724311/2 = 1688362155.
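A Ruby sketch of this (R + 2e) product hash follows. Ruby integers have arbitrary precision, so the sketch assumes unsigned 32-bit behavior and masks with 0xFFFFFFFF after every multiply. Because the result is unsigned, it is not guaranteed to print the same number as a signed 32-bit implementation in another language. The names permutation_hash, HASH_R and MASK_32 are hypothetical. Multiplication modulo 2^32 is commutative, which is what makes the hash order-independent.

```ruby
HASH_R = 1_779_033_703   # the answer's arbitrary odd constant
MASK_32 = 0xFFFF_FFFF    # emulate unsigned 32-bit overflow

# Order-independent hash: product of the odd factors (R + 2e), wrapped at 32 bits.
def permutation_hash(values)
  product = values.reduce(1) { |acc, e| (acc * (HASH_R + 2 * e)) & MASK_32 }
  product >> 1   # product is odd, so halving just drops the constant low bit
end

p permutation_hash([1, 10, 3, 18]) == permutation_hash([10, 18, 3, 1])  # => true
```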
Summing the elements is already one of the simplest things you could do. But I don't think it's a particularly good hash function w.r.t. pseudo randomness.
If you sort your arrays before storing them or computing hashes, every good hash function will do.
If it's about speed: Have you measured where the bottleneck is? If your hash function is giving you a lot of collisions and you have to spend most of the time comparing the arrays bit-by-bit the hash function is obviously not good at what it's supposed to do. Sorting + Better Hash might be the solution.
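If sorting is acceptable, Ruby already hashes and compares arrays by content (Array#hash and Array#eql?), so sorted arrays can go straight into a Set. This is a minimal sketch of that "sort, then any good hash" route. The class name SortedSelectionSet is hypothetical, and it relies on Set#add?, which returns nil when the element is already present.

```ruby
require "set"

# Hypothetical sketch: canonicalize each selection by sorting it, then let
# Set do the hashing and equality checks.
class SortedSelectionSet
  def initialize
    @seen = Set.new
  end

  # true if no permutation of `values` has been seen before
  def new_selection?(values)
    !@seen.add?(values.sort).nil?
  end
end

s = SortedSelectionSet.new
p s.new_selection?([1, 10, 3, 18])   # => true
p s.new_selection?([10, 18, 3, 1])   # => false
```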
If I understand your question correctly you want to test equality between sets where the items are not ordered. This is precisely what a Bloom filter will do for you. At the expense of a small number of false positives (in which case you'll need to make a call to a brute-force set comparison) you'll be able to compare such sets by checking whether their Bloom filter hash is equal.
The algebraic reason why this holds is that the OR operation is commutative. This holds for other semirings, too.
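The OR-based idea can be sketched as a small Bloom-filter-style signature. This is an illustration under stated assumptions, not a full Bloom filter: it uses a single 64-bit word, k = 3 bit positions per element derived from Ruby's built-in Array#hash (seeded per process, so signatures are only comparable within one run), and the name bloom_signature is hypothetical. Since OR is commutative, element order does not matter. A Bloom filter represents sets, so repeated numbers collapse ([1, 2, 2] and [1, 2] get the same signature), and an exact comparison is still needed when signatures match.

```ruby
SIGNATURE_BITS = 64

# Hypothetical sketch: OR together k bit positions per element.
def bloom_signature(values, k = 3)
  values.reduce(0) do |sig, e|
    (0...k).reduce(sig) do |s, i|
      s | (1 << ([e, i].hash % SIGNATURE_BITS))   # % keeps the index in 0..63
    end
  end
end

p bloom_signature([1, 10, 3, 18]) == bloom_signature([10, 18, 3, 1])  # => true
```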
Depending on whether you have a lot of collisions (the same hash but not a permutation), you might presort the arrays while hashing them. In that case you can do a more aggressive kind of hashing, where you don't only add up the numbers but also apply some bit magic to get quite different hashes.
This is only beneficial if you get loads of unwanted collisions because the hash you are using now is too poor. If you hardly get any collisions, the method you are using seems fine.
I would suggest this:
1. Check whether the two permutations have the same length (if not, they are not equal).
2. Sort only one array. Instead of sorting the other array, iterate through the elements of the first array and search for each of them in the sorted second array (compare only while the elements in the second array are smaller; do not iterate through the whole array).
Note: if your permutations can contain repeated numbers (e.g. [1,2,2,10]), you will need to remove elements from the second array as they match members of the first one.
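In Ruby:
def permutation?(arr1, arr2)
  return false if arr1.length != arr2.length
  sorted = arr2.sort
  arr1.each do |elem|
    j = 0
    j += 1 while j < sorted.length && sorted[j] < elem  # skip smaller elements
    return false if j == sorted.length || sorted[j] != elem
    sorted.delete_at(j)  # consume the match so repeated numbers are counted
  end
  true
end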
the idea is that instead of sorting another array we can just try to match all of its elements in the sorted array.
You can probably reduce the collisions a lot by using the product as well as the sum of the terms.
1*10*3*18=540 and 10*18*3*1=540
so the sum-product hash would be [32,540]
you still need to do something about collisions when they do happen though
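One way to handle those collisions is to use the [sum, product] pair only as a bucket key and then confirm with an exact multiset comparison. This sketch assumes Ruby 2.7+ for Enumerable#tally (element counts, so order is ignored and repeated numbers are respected). The class name SelectionRegistry and its add? method are hypothetical. For example, [1, 6, 6] and [2, 2, 9] share the key [13, 36] but are kept apart.

```ruby
# Hypothetical sketch: sum-product key for bucketing, exact check inside a bucket.
class SelectionRegistry
  def initialize
    @buckets = Hash.new { |h, k| h[k] = [] }
  end

  # true if `values` is new (not a permutation of a stored selection)
  def add?(values)
    key = [values.sum, values.reduce(1, :*)]
    counts = values.tally                 # e.g. {1=>1, 6=>2}
    bucket = @buckets[key]
    return false if bucket.any? { |c| c == counts }
    bucket << counts
    true
  end
end

r = SelectionRegistry.new
p r.add?([1, 10, 3, 18])   # => true
p r.add?([10, 18, 3, 1])   # => false
```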
I like using the string's default hash code (Java, C#; not sure about other languages); it generates pretty unique hash codes.
So first sort the array, and then generate a string from it using some delimiter.
so you can do the following (Java):
int[] arr = selectRandomNumbers();
Arrays.sort(arr);
int hash = (arr[0] + "," + arr[1] + "," + arr[2] + "," + arr[3]).hashCode();
if performance is an issue, you can change the suggested inefficient string concatenation to use StringBuilder or String.format
int hash = String.format("%d,%d,%d,%d", arr[0], arr[1], arr[2], arr[3]).hashCode();
String hash code of course doesn't guarantee that two distinct strings have different hash, but considering this suggested formatting, collisions should be extremely rare