Why does Elixir's MapSet become unordered after 32 elements?

iex> MapSet.new(1..32) |> Enum.to_list
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
iex> MapSet.new(1..33) |> Enum.to_list
[11, 26, 15, 20, 17, 25, 13, 8, 7, 1, 32, 3, 6, 2, 33, 10, 9, 19, 14, 5, 18, 31,
22, 29, 21, 27, 24, 30, 23, 28, 16, 4, 12]
Here's the implementation in Elixir 1.3:
def new(enumerable) do
  map =
    enumerable
    |> Enum.to_list
    |> do_new([])
  %MapSet{map: map}
end

defp do_new([], acc) do
  acc
  |> :lists.reverse
  |> :maps.from_list
end

defp do_new([item | rest], acc) do
  do_new(rest, [{item, true} | acc])
end
Order doesn't matter in a MapSet, but I'm still wondering: why does a MapSet become unordered after 32 elements?

This is not specific to MapSet, but the same thing happens with normal Map (MapSet uses Map under the hood):
iex(1)> for i <- Enum.shuffle(1..32), into: %{}, do: {i, i}
%{1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5, 6 => 6, 7 => 7, 8 => 8, 9 => 9,
10 => 10, 11 => 11, 12 => 12, 13 => 13, 14 => 14, 15 => 15, 16 => 16,
17 => 17, 18 => 18, 19 => 19, 20 => 20, 21 => 21, 22 => 22, 23 => 23,
24 => 24, 25 => 25, 26 => 26, 27 => 27, 28 => 28, 29 => 29, 30 => 30,
31 => 31, 32 => 32}
iex(2)> for i <- Enum.shuffle(1..33), into: %{}, do: {i, i}
%{11 => 11, 26 => 26, 15 => 15, 20 => 20, 17 => 17, 25 => 25, 13 => 13, 8 => 8,
7 => 7, 1 => 1, 32 => 32, 3 => 3, 6 => 6, 2 => 2, 33 => 33, 10 => 10, 9 => 9,
19 => 19, 14 => 14, 5 => 5, 18 => 18, 31 => 31, 22 => 22, 29 => 29, 21 => 21,
27 => 27, 24 => 24, 30 => 30, 23 => 23, 28 => 28, 16 => 16, 4 => 4, 12 => 12}
This is because (most likely as an optimization) Erlang stores maps with up to MAP_SMALL_MAP_LIMIT entries as an array sorted by key. Only once the size exceeds MAP_SMALL_MAP_LIMIT does Erlang switch to storing the data in a Hash Array Mapped Trie-like data structure. In non-debug builds of Erlang, MAP_SMALL_MAP_LIMIT is defined as 32, so all maps with up to 32 entries should print in sorted order. Note that, as far as I know, this is an implementation detail and you should not rely on it; the value of the constant may change in the future, or the implementation may switch to a completely different algorithm if it performs better.
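To make the size threshold concrete, here is a toy model in Ruby. It is only a sketch and not Erlang's actual implementation: the ToyMap class and its toy_hash mixing function are made up for illustration, and only the limit of 32 comes from the explanation above. Small collections are iterated in sorted key order, larger ones in hash order.

```ruby
# Toy model only: this is NOT how Erlang implements maps internally.
class ToyMap
  SMALL_LIMIT = 32 # mirrors the MAP_SMALL_MAP_LIMIT value quoted above

  def initialize(keys)
    @keys = keys.uniq
  end

  # Up to SMALL_LIMIT keys: sorted order. Beyond that: hash order.
  def keys
    if @keys.size <= SMALL_LIMIT
      @keys.sort
    else
      @keys.sort_by { |k| toy_hash(k) }
    end
  end

  private

  # Hypothetical stand-in for a real hash function; it only scrambles
  # integers deterministically so the effect is visible.
  def toy_hash(key)
    (key * 2_654_435_761) % (2**32)
  end
end

small = ToyMap.new((1..32).to_a.shuffle).keys
large = ToyMap.new((1..33).to_a.shuffle).keys

puts small.inspect # sorted, regardless of insertion order
puts large.inspect # no longer sorted
```

The exact order printed for the larger collection depends entirely on the made-up hash function, just as the real order depends on Erlang's internal hashing.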

Related

How to iterate an array every 30 items

I have an array of products with 234 items.
I need to create another array with a pagination (every 10 items)
example:
[
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
...
]
How can I solve this?
I've tried in_groups_of, but without success.
You're looking for each_slice.
Whenever you have an array problem, check the Enumerable module first. in_groups_of is a Rails method and uses each_slice under the hood.
Just use Enumerable#each_slice:
[*1..34].each_slice(10).to_a
# =>
# [
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
# [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
# [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
# [31, 32, 33, 34]
# ]
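Applied to the question's numbers, a minimal sketch might look like this. The products array here is just a stand-in range of 234 integers; real product objects would work the same way.

```ruby
# Stand-in data: 234 integers instead of real product records.
products = (1..234).to_a

# Split into pages of 10; the last page holds whatever is left over.
pages = products.each_slice(10).to_a

puts pages.size          # 24
puts pages.first.inspect # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
puts pages.last.inspect  # [231, 232, 233, 234]
```

Unlike Rails' in_groups_of, which by default pads the final group with nil, each_slice simply leaves the last page shorter.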

Re-numbering residues in PDB file with biopython

I have a sequence alignment as:
RefSeq :MXKQRSLPLXQKRTKQAISFSASHRIYLQRKFSH .....
Templatepdb:-----------------ISFSASHR------FSHAQADFAG
I am trying to write code that renumbers the residues in a PDB file based on this alignment:
original pdb : RES ID= 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 5 ...
new pdb : RES ID = 18 18 18 19 19 19 19 19 20 20 20 21 21 22 23 24 25 31 31 31 31 32 32 33 34 35 36 ...
If the alignment only has gaps at the beginning, this is easy to figure out: count the gaps ("-") and add that count to the residue number in residue.id.
However, I could not find a way to handle gaps in the middle of the sequence.
Do you have any suggestions?
If I understand correctly, your input is an alignment:
'-----------------ISFSASHR------FSHAQADFAG'
and a list of residue numbers:
[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18]
And your output is the residue number shifted by the number of gaps before the residue:
[18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]
Below is the code to demonstrate it. There are numerous ways to calculate the output.
The way I do it is to keep a dictionary shift_dict with key as the original number and value as the shifted number.
import itertools
import random


def random_residue_number(sequence):
    # Simulate a PDB file: each residue number repeats once per atom.
    nested = [[i + 1] * random.randint(1, 10) for i in range(len(sequence))]
    merged = list(itertools.chain.from_iterable(nested))
    return merged


def aligned_residue_number(alignment, original_number):
    gap_shift = 0
    residue_count = 0
    shift_dict = {}
    for residue in alignment:
        if residue == '-':
            gap_shift += 1
        else:
            residue_count += 1
            # New number = original number + gaps seen so far.
            shift_dict[residue_count] = gap_shift + residue_count
    return [shift_dict[number] for number in original_number]


sequence = 'ISFSASHRFSHAQADFAG'
alignment = '-----------------ISFSASHR------FSHAQADFAG'
original_number = random_residue_number(sequence)
print(original_number)
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18]
new_number = aligned_residue_number(alignment, original_number)
print(new_number)
# [18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]

Remove n elements from array dynamically and add to another array

nums= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
new_array=[]
How do I grab every two items divisible by 5 and add them to a new array?
This is the desired result; new_array should contain these values:
[[5,10],[15,20],[25,30]]
Note: I want to do this without pushing them all into the array and then calling
array.each_slice(2). The process should happen dynamically.
Try this
new_array = nums.select { |x| x % 5 == 0 }.each_slice(2).entries
No push involved.
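If "dynamically" means building the pairs in a single pass, without first collecting all the multiples and slicing afterwards, one possible sketch is below. The method name pairs_of_multiples and its keyword arguments are made up for illustration.

```ruby
# Hypothetical helper: groups matching numbers as they are encountered.
def pairs_of_multiples(nums, divisor: 5, size: 2)
  nums.each_with_object([]) do |n, groups|
    next unless (n % divisor).zero?

    # Start a new group when there is none yet or the last one is full.
    if groups.empty? || groups.last.size == size
      groups << [n]
    else
      groups.last << n
    end
  end
end

nums = (1..30).to_a
new_array = pairs_of_multiples(nums)
puts new_array.inspect # [[5, 10], [15, 20], [25, 30]]
```

If the number of matches is odd, the final group is simply left with a single element.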

Make a square multiplication table in Ruby

I got this question in an interview and got almost all the way to the answer but got stuck on the last part. If I want to get the multiplication table for 5, for instance, I want to get the output to be formatted like so:
1, 2, 3, 4, 5
2, 4, 6, 8, 10
3, 6, 9, 12, 15
4, 8, 12, 16, 20
5, 10, 15, 20, 25
My answer to this is:
def make_table(n)
  s = ""
  1.upto(n).each do |i|
    1.upto(n).each do |j|
      s += (i*j).to_s
    end
    s += "\n"
  end
  p s
end
But the output for make_table(5) is:
"12345\n246810\n3691215\n48121620\n510152025\n"
I've tried variations with array but I'm getting similar output.
What am I missing or how should I think about the last part of the problem?
You can use map and join to get a String in one line :
n = 5
puts (1..n).map { |x| (1..n).map { |y| x * y }.join(', ') }.join("\n")
It iterates over rows (x=1, x=2, ...). For each row, it iterates over cells (y=1, y=2, ...) and calculates x*y. It joins the cells of each row with ", " and joins the rows of the table with a newline:
1, 2, 3, 4, 5
2, 4, 6, 8, 10
3, 6, 9, 12, 15
4, 8, 12, 16, 20
5, 10, 15, 20, 25
If you want to keep the commas aligned, you can use rjust :
puts (1..n).map { |x| (1..n).map { |y| (x * y).to_s.rjust(3) }.join(',') }.join("\n")
It outputs :
  1,  2,  3,  4,  5
  2,  4,  6,  8, 10
  3,  6,  9, 12, 15
  4,  8, 12, 16, 20
  5, 10, 15, 20, 25
You could even go fancy and calculate the width of n**2 before aligning commas :
n = 11
width = Math.log10(n**2).ceil + 1
puts (1..n).map { |x| (1..n).map { |y| (x * y).to_s.rjust(width) }.join(',') }.join("\n")
It outputs :
   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11
   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22
   3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33
   4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44
   5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55
   6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66
   7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77
   8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88
   9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99
  10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110
  11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121
Without spaces between the figures, the result is indeed unreadable. Have a look at the % operator, which formats strings and numbers. Instead of
s += (i*j).to_s
you could write
s += '%3d' % (i*j)
If you really want the output formatted the way you described in your post (which I don't find very readable), you could write
s += "#{i*j}, "
This leaves you with two extra characters at the end of each line, which you have to remove. An alternative would be to use an array. Instead of the inner loop, you would then have something like
s += 1.upto(n).to_a.map {|j| i*j}.join(', ') + "\n"
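Putting the % formatting and the array-plus-join idea together, a reworked make_table might look like the sketch below. Deriving the column width from the number of digits in n*n is my own choice here, not something from the question.

```ruby
def make_table(n)
  # Width of the largest product, so every column lines up.
  width = (n * n).to_s.length

  (1..n).map do |i|
    # "%2d" % 6 right-aligns 6 in a field of width 2, giving " 6".
    (1..n).map { |j| "%#{width}d" % (i * j) }.join(", ")
  end.join("\n")
end

puts make_table(5)
```

Because the method returns the string instead of printing it, the caller decides whether to use puts, write it to a file, or compare it in a test.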
You don't need to construct a string if you're only interested in printing the table and not in returning it (as a string).
(1..n).each do |a|
  (1..n-1).each { |b| print "#{a * b}, " }
  puts a * n
end
This is how I'd do it.
require 'matrix'
n = 5
puts Matrix.build(n) { |i,j| (i+1)*(j+1) }.to_a.map { |row| row.join(', ') }
1, 2, 3, 4, 5
2, 4, 6, 8, 10
3, 6, 9, 12, 15
4, 8, 12, 16, 20
5, 10, 15, 20, 25
See Matrix::build.
You can make it much shorter, but here's my version.
range = Array(1..12)
range.each do |element|
  range.map { |item| print "#{element * item} " } && puts
end

Confusion with enum#with_index - starting offset index

a=[11,22,31,224,44].to_enum
=> #<Enumerator: [11, 22, 31, 224, 44]:each>
a.select.with_index{|x| puts x if x<2 }
=> []
a.with_index(2)
=> #<Enumerator: #<Enumerator: [11, 22, 31, 224, 44]:each>:with_index(2)>
irb(main):011:0> a.with_index(2){|x| puts x if x==224}
224
=> [11, 22, 31, 224, 44]
a.with_index(2){|x| puts x if x < 224}
11
22
31
44
=> [11, 22, 31, 224, 44]
Confusion: Here I have set the starting offset to 2. But looking at the output, why does 11 come first
instead of 31, given that 31 is at position 2?
a.with_index(2){|x| puts x if x > 224}
=> [11, 22, 31, 224, 44]
a.with_index(1){|x| puts x if x > 224}
=> [11, 22, 31, 224, 44]
a.with_index(1){|x| puts x if x < 224}
11
22
31
44
=> [11, 22, 31, 224, 44]
Confusion: Here above I have set the starting offset to 1. But looking at the output, why does 11 come first instead of 22, given that 22 is at position 1?
Taking all of this together: even though a starting offset is given, why does Enumerator#with_index not start the evaluation from that offset?
Note: Is there a direct way to print the index along with the contents?
Enumerator#with_index has confusing documentation, but hopefully this will make it clearer.
a=[11,22,31,224,44].to_enum
=> #<Enumerator: [11, 22, 31, 224, 44]:each>
a.with_index { |val,index| puts "index: #{index} for #{val}" }
index: 0 for 11
index: 1 for 22
index: 2 for 31
index: 3 for 224
index: 4 for 44
a.with_index(2) { |val,index| puts "index: #{index} for #{val}" }
index: 2 for 11
index: 3 for 22
index: 4 for 31
index: 5 for 224
index: 6 for 44
As you can see, what it actually does is offset the index, not start iterating from the given index.
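To contrast the two behaviours, here is a short sketch: one call that only relabels the indices, and one that actually skips the leading elements. The variable names are made up for illustration.

```ruby
values = [11, 22, 31, 224, 44]

# The offset changes only the label: every element is still visited.
values.each.with_index(1) { |val, idx| puts "#{idx}: #{val}" }

# To really start from position 2, drop the earlier elements explicitly.
from_index_two = values.each_with_index.drop(2)
puts from_index_two.inspect # [[31, 2], [224, 3], [44, 4]]
```

The first call also answers the side question about printing the index along with the contents; each_with_index does the same with a fixed starting index of 0.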

Resources