I have this structure
x = [8349310431, 8349314533]
y = [667984788, 667987788]
z = [148507632380, 153294624079]
map = Hash[x.zip([y, z].transpose).sort]
#=> {
# 8349310431=>[667984788, 148507632380],
# 8349314533=>[667987788, 153294624079]
# }
and I need to compare each key with the rest of the keys. If the difference between two keys is less than 100, the first elements that the keys point to have to be compared, and if the difference between those elements is less than 100, the procedure is repeated with the second elements that the keys point to.
Example:
|key[0] - key[1]| = |8349310431 - 8349314533| = 4102 (absolute value)
Because this difference is greater than 100, we now subtract the first elements that the keys point to:
|element1Key1 - element1Key2| = |667984788 - 667987788| = 3000 (absolute value)
As this difference is greater than 100, we repeat this with the second elements:
|element2Key1 - element2Key2| = |15329460 - 15329462| = 2 (absolute value)
As it is less than 100, we stop here and record the result, for example in a counter.
If the difference is already less than 100 at the key comparison, we cannot stop there; the comparison has to be carried through to the second element that the keys point to.
But how do I do it?
Sorry for my English, but I don't speak it. I hope you understand, and thanks.
Does this make sense?
x = [8349310431, 8349314513]
y = [667984788, 667987788]
z = [15329460, 15329462]
[x, y, z].detect { |a, b| (a-b).abs < 100 } # => [15329460, 15329462]
Just in case, why build a hash?
I'm attempting to return a count of the total number of matching and non-matching entries in a set of two ranges.
I'm trying to avoid looping over the array twice like this:
#expected output:
#inside: 421 | outside: 55
constant_range = 240..960
sample_range = 540..1015
sample_range_a = sample_range.to_a
def generate_range
  inside = sample_range_a.select { |val| constant_range.include?(val) }.count
  outside = sample_range_a.select { |val| !constant_range.include?(val) }.count
end
# I was thinking of a counter, but thought that would be even less efficient
def generate_range
  a = 0
  b = 0
  # Ruby has no ++ operator, so the counters are incremented with += 1
  sample_range_a.each { |val| constant_range.include?(val) ? (a += 1) : (b += 1) }
end
I don't know if this is entirely your case, but if they're always number ranges with an interval of 1 and not any arbitrary array, the solution can be optimized to O(1), unlike the other methods using to_a that are at least O(n). In other words, if you have a BIG range, those array solutions would choke badly.
Assuming that you'll always use an ascending range of numbers with interval of 1, it means you can count them just by using size (count would be our enemy in this situation).
With that said, you can use simple math to first check whether the ranges intersect at all; if they don't, just return 0. Otherwise, compute the overlapping range and take its size.
def range_intersection_count(x, y)
  return 0 if x.last < y.first || y.last < x.first
  ([x.begin, y.begin].max..[x.max, y.max].min).size
end
This will count the number of elements that intersect in two ranges in O(1). You can test this code with something like
range_intersection_count(5000000..10000000000, 3000..1000000000000)
and then try the same input with the other methods and watch your program hang.
The final solution would look something like this:
constant_range = (240..960)
sample_range = (540..1015)
inside_count = range_intersection_count(constant_range, sample_range) # = 421
outside_count = sample_range.size - inside_count # = 55
constant_range = (240..960).to_a
sample_range = (540..1015).to_a
inside_count = (sample_range & constant_range).count #inside: 421
outside_count = sample_range.count - inside_count #outside: 55
You can use - (difference) in Ruby:
constant_range = (240..960).to_a
sample_range = (540..1015).to_a
puts (sample_range - constant_range).count # 55
I have a variable, between 0 and 1, which should dictate the likelihood that a second variable, a random number between 0 and 1, is greater than 0.5. In other words, if I were to generate the second variable 1000 times, the average should be approximately equal to the first variable's value. How do I write this code?
Oh, and the second variable should always be capable of producing either 0 or 1 in any condition, just more or less likely depending on the value of the first variable. Here is a link to a graph which models approximately how I would like the program to behave. Each equation represents a separate value for the first variable.
You have a variable p and you are looking for a mapping function f(x) that maps random rolls x in [0, 1] to the same interval [0, 1] such that the expected value, i.e. the average of all rolls, is p.
You have chosen the function prototype
f(x) = pow(x, c)
where c must be chosen appropriately. If x is uniformly distributed in [0, 1], the average value is:
int(f(x) dx, [0, 1]) == p
With the integral:
int(pow(x, c) dx) == pow(x, c + 1) / (c + 1) + K
evaluated from 0 to 1 this gives 1 / (c + 1) == p, so one gets:
c = 1/p - 1
A different approach is to make p the median value of the distribution, such that half of the rolls fall below p, the other half above p. This yields a different distribution. (I am aware that you didn't ask for that.) Now, we have to satisfy the condition:
f(0.5) == pow(0.5, c) == p
which yields:
c = log(p) / log(0.5)
With the current function prototype, you cannot satisfy both requirements. Your function is also asymmetric (f(x, p) != f(1-x, 1-p)).
Python functions below:
import math
import random

def medianrand(p):
    """Random number between 0 and 1 whose median is p"""
    c = math.log(p) / math.log(0.5)
    return math.pow(random.random(), c)

def averagerand(p):
    """Random number between 0 and 1 whose expected value is p"""
    c = 1/p - 1
    return math.pow(random.random(), c)
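As a rough empirical check of the two exponents derived above, one can draw many samples of random() ** c and compare the sample mean and sample median with p. This is only a sketch: the helper name sample_power, the sample size, and the fixed seed are assumptions made for illustration and are not part of the answer above.

```python
import math
import random
import statistics


def sample_power(c, n=100_000, seed=0):
    """Draw n values of u ** c with u uniform in [0, 1) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    return [rng.random() ** c for _ in range(n)]


p = 0.3

# Exponent chosen so that the expected value is p.
mean_samples = sample_power(1 / p - 1)
mean_est = sum(mean_samples) / len(mean_samples)

# Exponent chosen so that the median is p.
median_samples = sample_power(math.log(p) / math.log(0.5))
median_est = statistics.median(median_samples)

print("mean estimate:", round(mean_est, 3))
print("median estimate:", round(median_est, 3))
```

For p = 0.3, both estimates should come out close to 0.3, up to sampling noise.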
You can do this by using a dummy. First set the first variable to a value between 0 and 1. Then create a random number in the dummy between 0 and 1. If this dummy is bigger than the first variable, you generate a random number between 0 and 0.5, and otherwise you generate a number between 0.5 and 1.
In pseudocode:
real a = 0.7
real total = 0.0
for i between 0 and 1000 begin
    real dummy = rand(0,1)
    real b
    if dummy > a then
        b = rand(0,0.5)
    else
        b = rand(0.5,1)
    end if
    total = total + b
end for
real avg = total / 1000
Please note that this algorithm will generate average values between 0.25 and 0.75. For a = 1 it will only generate random values between 0.5 and 1, which should average to 0.75. For a=0 it will generate only random numbers between 0 and 0.5, which should average to 0.25.
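The pseudocode above could be written in Python roughly as follows, to check the 0.25 to 0.75 claim numerically. The function name two_stage_average, the sample size, and the seed are assumptions for illustration. Since a fraction a of the draws comes from [0.5, 1] (mean 0.75) and the rest from [0, 0.5] (mean 0.25), the expected average is 0.25 + 0.5 * a.

```python
import random


def two_stage_average(a, n=100_000, seed=0):
    """Average of n draws from the two-stage scheme (hypothetical helper name)."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    total = 0.0
    for _ in range(n):
        dummy = rng.random()
        if dummy > a:
            b = rng.uniform(0.0, 0.5)  # lower half, chosen with probability about 1 - a
        else:
            b = rng.uniform(0.5, 1.0)  # upper half, chosen with probability about a
        total += b
    return total / n


for a in (0.0, 0.7, 1.0):
    print(a, round(two_stage_average(a), 3), "expected about", 0.25 + 0.5 * a)
```

So the average tracks a linearly, but only over the compressed interval from 0.25 to 0.75.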
I've made a sort of pseudo-solution to this problem, which I think is acceptable.
Here is the algorithm I made:
import random
a = 0.2 # variable one
b = random.random() # variable two
b = b ** (1 / (2 ** (4 * a - 1)))
It doesn't actually produce the average results that I wanted, but it's close enough for my purposes.
Edit: Here's a graph I made from a large number of data points that I generated with a Python script using this algorithm:
import random

mod = 6
div = 100
for z in range(div):
    s = 0
    for i in range(100000):
        a = (z + 1) / float(div)  # variable one
        b = random.random()  # variable two
        c = b ** (1 / (2 ** ((mod * a * 2) - mod)))
        s += c
    # one row per value of a: a, then the average of the generated values
    print(str((z + 1) / float(div)) + "\t" + str(round(s / 100000.0, 3)))
Each point in the table is the result of 100000 randomly generated points from the algorithm; their x positions being the a value given, and their y positions being their average. Ideally they would fit to a straight line of y = x, but as you can see they fit closer to an arctan equation. I'm trying to mess around with the algorithm so that the averages fit the line, but I haven't had much luck as of yet.
I had a question in an interview and I couldn't find the optimal solution (and it's quite frustrating lol)
So you have a list of n ones and zeros:
110000110101110..
The goal is to extract the longest subsequence containing as many 1s as 0s.
Here for example it is "110000110101" or "100001101011" or "0000110101110"
I have an idea for O(n^2), just scanning all possibilities from the beginning to the end, but apparently there is a way to do it in O(n).
Any ideas?
Thanks a lot!
Consider '10110':
Create a variable S. Create array A=[0].
Iterate from first number and add 1 to S if you notice 1 and subtract 1 from S if you notice 0 and append S to A.
For our example sequence A will be: [0, 1, 0, 1, 2, 1]. A is simply an array which stores a difference between number of 1s and 0s preceding the index. The sequence has to start and end at the place which has the same difference between 1s and 0s. So now our task is to find the longest distance between same numbers in A.
Now create 2 empty dictionaries (hash maps) First and Last.
Iterate through A and save position of first occurrence of every number in A in dictionary First.
Iterate through A (starting from the end) and save position of the last occurrence of each number in A in dictionary Last.
So for our example array First will be {0:0, 1:1, 2:4} and Last will be {0:2, 1:5, 2:4}
Now find the key(max_key) for which the difference between corresponding values in First and Last is the largest. This max difference is the length of the subsequence. Subsequence starts at First[max_key] and ends at Last[max_key].
I know it is a bit hard to understand, but it has complexity O(n): four loops, each of complexity O(n). You can replace the dictionaries with arrays, of course, but that is more complicated than using dictionaries.
Solution in Python.
def find_subsequence(seq):
    S = 0
    A = [0]
    for e in seq:
        if e == '1':
            S += 1
        else:
            S -= 1
        A.append(S)
    First = {}
    Last = {}
    for pos, e in enumerate(A):
        if e not in First:
            First[e] = pos
    for pos, e in enumerate(reversed(A)):
        if e not in Last:
            Last[e] = len(seq) - pos
    max_difference = 0
    max_key = None
    for key in First:
        difference = Last[key] - First[key]
        if difference > max_difference:
            max_difference = difference
            max_key = key
    if max_key is None:
        return ''
    return seq[First[max_key]:Last[max_key]]

find_subsequence('10110') # Gives '0110'
find_subsequence('1') # gives ''
J.F. Sebastian's code is more optimised.
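For illustration, the same running-difference idea can be folded into a single pass that remembers only the first position at which each difference was seen. This is a minimal sketch written for this explanation, not the code referred to above; the function name longest_balanced is an assumption.

```python
def longest_balanced(seq):
    """Return a longest substring of seq with equally many '1's and '0's."""
    first_seen = {0: 0}  # running difference -> earliest prefix length with that difference
    balance = 0          # (number of '1's) - (number of '0's) seen so far
    best_start = best_end = 0
    for pos, ch in enumerate(seq, 1):
        balance += 1 if ch == '1' else -1
        if balance in first_seen:
            # Same difference as an earlier prefix: the part in between is balanced.
            start = first_seen[balance]
            if pos - start > best_end - best_start:
                best_start, best_end = start, pos
        else:
            first_seen[balance] = pos
    return seq[best_start:best_end]


print(longest_balanced('10110'))  # '0110'
print(longest_balanced('1'))      # ''
```

Because a later position with the same difference always gives a longer stretch than an earlier one, no separate Last dictionary is needed.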
EXTRA
This problem is related to Maximum subarray problem. Its solution is also based on summing elements from start:
def max_subarray(arr):
    max_diff = total = min_total = start = tstart = end = 0
    for pos, val in enumerate(arr, 1):
        total += val
        if min_total > total:
            min_total = total
            tstart = pos
        if total - min_total > max_diff:
            max_diff = total - min_total
            end = pos
            start = tstart
    return max_diff, arr[start:end]
I have two arrays: fasta_ids & frags_by_density. Both contain the same set of ≈1300 strings.
fasta_ids is ordered numerically e.g. ['frag1', 'frag2', 'frag3'...]
frags_by_density contains the same strings ordered differently e.g. ['frag14', 'frag1000'...]
The way in which frags_by_density is ordered is irrelevant to the question (but for any bioinformaticians, the 'frags' are contigs ordered by SNP density).
What I want to do is find the indexes in the frags_by_density array that contain each of the strings in fasta_ids. I want to end up with a new array of those positions (indexes), which will be in the same order as the fasta_ids array.
For example, if the order of the 'frag' strings was identical in both the fasta_ids and frags_by_density arrays, the output array would be: [0, 1, 2, 3...].
In this example, the value at index 2 of the output array (2), corresponds to the value at index 2 of fasta_ids ('frag3') - so I can deduce from this that the 'frag3' string is at index 2 in frags_by_density.
Below is the code I have come up with, at the moment it gets stuck in what I think is an infinite loop. I have annotated what each part should do:
x = 0 #the value of x will represent the position (index) in the density array
position_each_frag_id_in_d = [] #want to get positions of the values in frag_ids in frags_by_density
iteration = []
fasta_ids.each do |i|
  if frags_by_density[x] == i
    position_each_frag_id_in_d << x #if the value at position x matches the value at i, add it to the new array
    iteration << i
  else
    until frags_by_density[x] == i #otherwise increment x until they do match, and add the position
      x +=1
    end
    position_each_frag_id_in_d << x
    iteration << i
  end
  x = iteration.length # x should be incremented, however I cannot simply do: x += 1, as x may have been incremented by the until loop
end
puts position_each_frag_id_in_d
This was quite a complex question to put into words. Hopefully there is a much easier solution, or at least someone can modify what I have started.
Update: renamed the array to fasta_ids, as it is in the code (sorry for any confusion): fasta_ids = frag_ids
Non-optimized version. array.index(x) returns the index of x in array, or nil if it is not found. compact then removes the nil elements from the array.
position_of_frag_id_in_d = fasta_ids.map { |x| frags_by_density.index(x) }.compact
Let's say I have a min and a max number. max can be anything, but min will always be greater than zero.
I can get the range min..max and let's say I have a third number, count -- I want to divide the range by 10 (or some other number) to get a new scale. So, if the range is 1000, it would increment in values of 100, 200, 300, and find out where the count lies within the range, based on my new scale. So, if count is 235, it would return 2 because that's where it lies on the range scale.
Am I making any sense? I'm trying to create a heat map based on a range of values, basically ... so I need to create the scale based on the range and find out where the value I'm testing lies on that new scale.
I was working with something like this, but it didn't do it:
def heat_map(project, word_count, division)
  unless word_count == 0
    max = project.words.maximum('quantity')
    min = project.words.minimum('quantity')
    range = min..max
    total = range.count
    break_point = total / division
    heat_index = total.to_f / word_count.to_f
    heat_index.round
  else
    "freezing"
  end
end
I figured there's probably an easier ruby way I'm missing.
Why not just use arithmetic and rounding? Assuming that number is between min and max and you want the range split into n_div divisions and x is the number you want to find the index of (according to above it looks like min = 0, max = 1000, n_div = 10, and x = 235):
def heat_index(x, min, max, n_div)
  break_point = (max - min).to_f/n_div.to_f
  heat_index = (((x - min).to_f)/break_point).to_i
end
Then heat_index(235, 0, 1000, 10) gives 2.
I'm just quickly brainstorming an idea, but would something like this help?
>> (1..100).each_slice(10).to_a.index { |subrange| subrange.include? 34 }
=> 3
>> (1..100).each_slice(5).to_a.index { |subrange| subrange.include? 34 }
=> 6
This tells you in which subrange (the subrange size is determined by the argument to each_slice) the value (the argument to subrange.include?) lies.
>> (1..1000).each_slice(100).to_a.index { |subrange| subrange.include? 235 }
=> 2
Note that the indices for the subranges start from 0, so you may want to add 1 to them depending on what you need. Also this isn't ready as is, but should be easy to wrap up in a method.
How's this? It makes an array of range boundaries and then checks if the number lies between them.
def find_range(min, max, query, increment)
  values = []
  (min..max).step(increment) { |value| values << value }
  values.each_with_index do |value, index|
    break if values[index + 1].nil?
    # the lower bound is inclusive so that a query equal to a boundary still gets an index
    if query >= value && query < values[index + 1]
      return index
    end
  end
end
EDIT: removed redundant variable