Convert digits in string to ints and then back to string - ruby

Say I have a string: formula = "C3H12O4"
How can I convert the digit chars in the string to ints?
My end goal is to do something along the lines of:
formula * 4
Once the digit characters have been converted to ints and multiplied, the result should be converted back to a string, giving:
"C12H48O16"

formula = "C3H12O4"
Code
p formula.gsub(/\d+/) { |x| x.to_i * 4 }
output
"C12H48O16"

If you had many conversions to do it might be worthwhile to include the following in a benchmark of different methods:
h = (0..9).each_with_object({}) { |n,h| h[n.to_s] = (4*n).to_s }
#=> {"0"=>"0", "1"=>"4", "2"=>"8", "3"=>"12", "4"=>"16",
# "5"=>"20", "6"=>"24", "7"=>"28", "8"=>"32", "9"=>"36"}
Then for each string of interest the following calculation would be performed:
"C3H12O4".gsub(/\d/, h)
#=> "C12H48O16"
"99Ra$32".gsub(/\d/, h)
#=> "3636Ra$128"
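This uses the form of String#gsub that employs a hash to make the substitutions. Note that /\d/ matches one digit at a time, so each digit is multiplied independently: "99" becomes "3636", not "396". If whole numbers must be multiplied, match /\d+/ and use the block form shown earlier.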
A variant of this is the following.
"C3H12O4".gsub(/./) { |c| h.fetch(c, c) }
#=> "C12H48O16"
Here gsub matches every character, which it passes to the block to be held by the block variable c. Hash#fetch is then used to look up and return h[c], provided h has a key c. If h does not have a key c, fetch's second argument (c) is returned.
The use of the hash avoids the need to convert back and forth between integers and strings, except in the creation of the hash, of course, but that is done only once.
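The lookup-table idea can also be applied to whole numbers rather than single digits. The following is only a sketch and not part of the answer above; the names times_four and scale_formula are made up for illustration. It caches each number string the first time it is seen, so repeated numbers are converted only once.

```ruby
# Hypothetical cache: maps a number string to 4 times that number, as a string.
# The default block runs only on the first lookup of each key.
times_four = Hash.new { |cache, digits| cache[digits] = (digits.to_i * 4).to_s }

# Hypothetical helper: multiply every whole number in the formula by 4.
scale_formula = ->(formula) { formula.gsub(/\d+/) { |digits| times_four[digits] } }

p scale_formula.call("C3H12O4")  #=> "C12H48O16"
p scale_formula.call("C15H2")    #=> "C60H8"
```

Whether the cache is actually faster than calling to_i and to_s each time would have to be measured in the benchmark.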

Related

Move decimal fixed number of spaces with Ruby

I have large integers (typically 15-30 digits) stored as strings that represent a certain amount of a given currency (such as ETH). Also stored with each one is the number of digits to move the decimal point.
{
"base_price"=>"5000000000000000000",
"decimals"=>18
}
The output that I'm ultimately looking for is 5.00 (which is what you'd get if you took the decimal point from 5000000000000000000 and moved it to the left 18 positions).
How would I do that in Ruby?
Given:
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
You could use:
my_number = my_map["base_price"].to_i / (10**my_map["decimals"]).to_f
puts(my_number)
h = { "base_price"=>"5000000000000000000", "decimals"=>18 }
bef, aft = h["base_price"].split(/(?=\d{#{h["decimals"]}}\z)/)
#=> ["5", "000000000000000000"]
bef + '.' + aft[0,2]
#=> "5.00"
The regular expression uses the positive lookahead (?=\d{18}\z) to split the string at a ("zero-width") location between digits such that 18 digits follow to the end of the string.
Alternatively, one could write:
str = h["base_price"][0, h["base_price"].size-h["decimals"]+2]
#=> h["base_price"][0, 3]
#=> "500"
str.insert(str.size-2, '.')
#=> "5.00"
Neither of these addresses potential boundary cases such as
{ "base_price"=>"500", "decimals"=>1 }
or
{ "base_price"=>"500", "decimals"=>4 }
Nor do they consider rounding issues.
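One way to cover those boundary cases is to left-pad the digit string with zeroes before splitting it. This is a minimal sketch under stated assumptions, not the method of either answer: shift_decimal and its places parameter are hypothetical names, the input is assumed to be a non-negative string of digits, and the fraction is truncated rather than rounded.

```ruby
# Hypothetical helper: place the decimal point `decimals` digits from the right
# and keep `places` fractional digits (truncating, not rounding).
def shift_decimal(digits, decimals, places = 2)
  # Pad so there is always at least one digit before the decimal point.
  padded = digits.rjust(decimals + 1, '0')
  whole  = padded[0, padded.size - decimals]
  frac   = padded[padded.size - decimals, decimals]
  # Pad the fraction on the right in case decimals < places.
  "#{whole}.#{frac.ljust(places, '0')[0, places]}"
end

puts shift_decimal("5000000000000000000", 18)  # prints 5.00
puts shift_decimal("500", 1)                   # prints 50.00
puts shift_decimal("500", 4)                   # prints 0.05
```

Working on the string avoids the precision loss that a Float conversion can introduce for 15-30 digit values.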
Regular expressions and interpolation?
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
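my_map["base_price"].sub(
/(0{#{my_map["decimals"]}})\s*$/,
'.\1'
)
The number of decimal places is interpolated into the regular expression as the count of zeroes to look for at the end of the string (followed by zero or more whitespace characters). The match is replaced with a . followed by the captured zeroes. The back-reference '\1' is used rather than "#{$1}" because the latter is interpolated before sub runs and so does not see the current match. Note that this matches only when the trailing digits are all zeroes.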
Producing:
=> "5.000000000000000000"

How to store a list of small numbers in Postgres

I have a long list of small numbers, all of them < 16 but there can be more than 10000 of them in a unique list.
I get the values as a comma separated list, like:
6,12,10,2,2,2,6,12,8,2,2,6,10,2,4,12,14,10,2, .... lots and lots of numbers
And finally I need to store the values in a database in the most efficient way in order to be read back and processed again ... as a string, comma separated values.
I was thinking of sort of storing them in a big TEXT field ... however I find that adding all the commas in there would be a waste of space.
I am wondering if there is any best practice for this scenario.
For more technical details:
for Database I have to use Postgres (and I am sort of beginner in this field), and the programming language is Ruby (also beginner :) )
For a fast and reasonably space-efficient solution, you could simply write a hexadecimal string:
string = '6,12,10,2,2,2,6,12,8,2,2,6,10,2,4,12,14,10,2'
p string.split(',').map { |v| v.to_i.to_s(16) }.join
# "6ca2226c8226a24cea2"
p '6ca2226c8226a24cea2'.each_char.map { |c| c.to_i(16) }.join(',')
# "6,12,10,2,2,2,6,12,8,2,2,6,10,2,4,12,14,10,2"
It brings the advantage of being easily readable by any DB and any program.
Also, it works even if there are leading 0s in the string: "0,0,6".
If you have an even number of elements, you could pack 2 hex characters into one byte, halving the string length.
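The byte-packing idea can be sketched with Array#pack and String#unpack1 (Ruby 2.4+). This is an illustration only: the sample values are made up, and storing the result in a Postgres bytea column is an assumption about how you would use it.

```ruby
values = [6, 12, 10, 2, 2, 2, 6, 12, 8, 2]   # even number of elements, all < 16

hex    = values.map { |v| v.to_s(16) }.join   # one hex character per value
packed = [hex].pack('H*')                     # two hex characters per byte

p hex              #=> "6ca2226c82"
p packed.bytesize  #=> 5

# Reverse the process: bytes -> hex characters -> integers.
restored = packed.unpack1('H*').each_char.map { |c| c.to_i(16) }
p restored == values  #=> true
```

With an odd number of elements, pack fills the last half-byte with a zero, so the element count would have to be stored separately to drop that padding on the way back.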
numbers = "6,12,10,2,2,2,6,12,8,2,2,6,10,2,4,12,14,10,2"
numbers.split(',')
.map { |n| n.to_i.to_s(2).rjust(4, '0') }
.join
.to_i(2)
.to_s(36)
#⇒ "57ymwcgbl1umt2a"
"57ymwcgbl1umt2a".to_i(36)
.to_s(2)
.tap { |e| e.prepend('0') until (e.length % 4).zero? }
.scan(/.{4}/)
.map { |e| e.to_i(2).to_s }
.join(',')
#⇒ "6,12,10,2,2,2,6,12,8,2,2,6,10,2,4,12,14,10,2"

Converting binary (or hex) into ASCII

I need to convert a large binary string (a sequence of bytes) into ASCII like this table. I can also start with a hex string.
I read this post: Converting binary data to string in ruby. I found a solution that converts to characters in the extended ASCII table. I could write conditionals for every case in order to convert, but there has to be an easier way. Can someone help?
The link you specified contains javascript code, that performs a conversion, on the page, not obfuscated:
function OnConvert()
{
hex = document.calcform.hex.value;
hex = hex.match(/[0-9A-Fa-f]{2}/g);
len = hex.length;
if( len==0 ) return;
txt='';
for(i=0; i<len; i++)
{
h = hex[i];
code = parseInt(h,16);
t = String.fromCharCode(code);
txt += t;
}
document.calcform.txt.value = txt;
}
I did not understand your task clearly: if you enter e.g. EEEFFA in that form, you get îïú as output, which, in my opinion, is extended ASCII. But there is a simple way to achieve the same functionality in Ruby.
▶ "EEEFFA".scan(/[0-9a-f]{2}/i).map { |cp| cp.to_i(16) }.inject('', &:concat)
#⇒ "îïú"
UPD: As I understand from the comments, you want to convert every group of 8 zeros and ones to the corresponding ASCII character. Here you go (assuming you have a long string containing only zeros and ones):
▶ '010000010100001001000011'.
▷ scan(/[01]{8}/). # allow only zeros and ones, scan by 8
▷ map { |e| e.to_i 2 }. # parse each 8-bit group as a base-2 integer
▷ inject '', &:concat # concatenate into one string
#⇒ 'ABC'
A slight variation on @mudasobwa's excellent solution, using an (apparently undocumented) feature of String#oct:
'010000010100001001000011'
.scan(/0[01]{7}/)
.map { |b| b.prepend('0b').oct.chr }
.join
And hex, for completeness:
'627e29397c5727611147503e36355a4f683737'
.scan(/[0-7]\h/)
.map { |x| x.prepend('0x').oct.chr }
.join
I've opened a bug report at ruby-lang if anybody is interested...
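For comparison, Array#pack can do both conversions directly, using the 'B*' (bit string) and 'H*' (hex string) directives. This is a sketch that none of the answers above proposed, and the sample inputs are made up:

```ruby
bits = '010000010100001001000011'
hex  = '414243'

from_bits = [bits].pack('B*')  # every 8 bits become one byte
from_hex  = [hex].pack('H*')   # every 2 hex digits become one byte

p from_bits  #=> "ABC"
p from_hex   #=> "ABC"
```

The resulting strings are binary (ASCII-8BIT) encoded; bytes above 127 are left as raw bytes rather than mapped to an extended-ASCII character set.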

Complementary DNA sequence

I'm having a problem writing this loop; it seems to stop after the second sequence.
I want to return the complementary DNA sequence to the given DNA sequence.
E.g. ('AGATTC') -> ('TCTAAG'), where A:T and C:G
def get_complementary_sequence(dna):
    """(str) -> str
    > Return the DNA sequence that is complementary to the given DNA sequence
    >>> get_complementary_sequence('AT')
    ('TA')
    >>> get_complementary_sequence('AGATTC')
    ('TCTAAG')
    """
    x = 0
    complementary_sequence = ''
    for char in dna:
        complementary_sequence = (get_complement(dna))
    return complementary_sequence + (dna[x:x+1])
Can anyone spot why the loop does not continue?
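Here is an example of how I would do it; it takes only a few lines of code:
DNA = "CCAGCTTATCGGGGTACCTAAATACAGAGATAT"  # example DNA fragment

def complement(sequence):
    # Reversing first yields the reverse complement; drop the
    # reversal if only the plain complement is wanted.
    reverse = sequence[::-1]
    # str.maketrans is the Python 3 form (Python 2: from string import maketrans)
    return reverse.translate(str.maketrans('ATCG', 'TAGC'))

print(complement(DNA))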
What's wrong?
You're calling:
complementary_sequence = (get_complement(dna))
...n times where n is the length of the string. This leaves you with whatever the return value of get_complement(dna) is in complementary_sequence. Presumably just one letter.
You then return this one letter (complementary_sequence) followed by the substring dna[0:1] (i.e. the first letter in dna), because x is always 0.
This would be why you always get two characters returned.
How to fix it?
Assuming you have a function like:
def get_complement(d):
    return {'T': 'A', 'A': 'T', 'C': 'G', 'G': 'C'}.get(d, d)
...you could fix your function by simply using str.join() and a list comprehension:
def get_complementary_sequence(dna):
    """(str) -> str
    > Return the DNA sequence that is complementary to the given DNA sequence
    >>> get_complementary_sequence('AT')
    ('TA')
    >>> get_complementary_sequence('AGATTC')
    ('TCTAAG')
    """
    return ''.join([get_complement(c) for c in dna])
You call get_complement on all of dna instead of on each char. This simply calls the same function with the same argument len(dna) times. There's no reason to loop through the chars if you never use them. If get_complement() can take a char, I would recommend:
for char in dna:
    complementary_sequence += get_complement(char)
The implementation of get_complement would take a single character and return its complement.
Also, you're returning complementary_sequence + (dna[x:x+1]). If you want the function to conform to the behavior that you've documented, note that the + (dna[x:x+1]) adds an extra (wrong) character from the beginning of the dna string. All you need to return is complementary_sequence! Thanks to @Kevin for noticing.
What you're doing:
>>> dna = "1234"
>>> for char in dna:
...     print(dna)
...
1234
1234
1234
1234
What I think is closer to what you want to be doing:
>>> for char in dna:
...     print(char)
...
1
2
3
4
Putting it all together:
# you could also use a list comprehension, with a join() call, but
# this is closer to your original implementation.
def get_complementary_sequence(seq):
    complement = ''
    for char in seq:
        complement += get_complement(char)
    return complement

def get_complement(base):
    complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
    return complements[base]

>>> get_complementary_sequence('AT')
'TA'
>>> get_complementary_sequence('AGATTC')
'TCTAAG'

How do I convert a Ruby string with brackets to an array?

I would like to convert the following string into an array/nested array:
str = "[[this, is],[a, nested],[array]]"
newarray = # this is what I need help with!
newarray.inspect # => [['this','is'],['a','nested'],['array']]
You'll get what you want with YAML.
But there is a little problem with your string. YAML expects a space after each comma. So we need this:
str = "[[this, is], [a, nested], [array]]"
Code:
require 'yaml'
str = "[[this, is],[a, nested],[array]]"
### transform your string in a valid YAML-String
str.gsub!(/(\,)(\S)/, "\\1 \\2")
YAML::load(str)
# => [["this", "is"], ["a", "nested"], ["array"]]
You could also treat it as almost-JSON. If the strings really are only letters, like in your example, then this will work:
require 'json'
JSON.parse(yourarray.gsub(/([a-z]+)/,'"\1"'))
If they could have arbitrary characters (other than [ ] , ), you'd need a little more:
JSON.parse("[[this, is],[a, nested],[array]]".gsub(/, /,",").gsub(/([^\[\]\,]+)/,'"\1"'))
For a laugh:
ary = eval("[[this, is],[a, nested],[array]]".gsub(/(\w+?)/, "'\\1'") )
=> [["this", "is"], ["a", "nested"], ["array"]]
Disclaimer: You definitely shouldn't do this, as eval is a terrible idea (it executes whatever code the string contains), but it is fast and has the useful side effect of raising an exception if your nested arrays aren't valid.
Looks like a basic parsing task. The general approach is to write a recursive function with the following algorithm:
base case (input doesn't begin with '['): return the input
recursive case:
    strip the outer brackets and split the rest on ',' (only on commas at this nesting level)
    for each substring, call this method again with the substring
    return an array containing the results of these recursive calls
The only slightly tricky part is splitting the input on top-level commas only. You could write a separate function that scans through the string and keeps a count of open brackets minus closed brackets seen so far, and splits on a comma only when that count is zero.
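The algorithm described above can be sketched as follows. The method names split_top_level and parse_nested are hypothetical, and the code assumes well-formed input with balanced brackets:

```ruby
# Hypothetical helper: split on commas that are not inside any brackets.
def split_top_level(str)
  parts = []
  depth = 0
  current = ''.dup
  str.each_char do |ch|
    depth += 1 if ch == '['
    depth -= 1 if ch == ']'
    if ch == ',' && depth.zero?
      parts << current
      current = ''.dup
    else
      current << ch
    end
  end
  parts << current unless current.empty?
  parts
end

# Hypothetical recursive parser following the base/recursive cases above.
def parse_nested(input)
  input = input.strip
  return input unless input.start_with?('[')  # base case: a plain word
  inner = input[1...-1]                       # drop the outer brackets
  split_top_level(inner).map { |part| parse_nested(part) }
end

p parse_nested("[[this, is],[a, nested],[array]]")
#=> [["this", "is"], ["a", "nested"], ["array"]]
```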
Make a recursive function that takes the string and an integer offset, and "reads" out an array. That is, have it return an array or string (that it has read) and an integer offset pointing after the array. For example:
s = "[[this, is],[a, nested],[array]]"
yourFunc(s, 1) # returns ['this', 'is'] and 11.
yourFunc(s, 2) # returns 'this' and 6.
Then you can call it with another function that provides an offset of 0, and makes sure that the finishing offset is the length of the string.
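A minimal sketch of this offset-based reader might look like the following. The names read_value and parse_brackets are hypothetical stand-ins for yourFunc and its wrapper, and error handling is limited to the two checks shown:

```ruby
# Hypothetical reader: returns [value, offset just past the value].
def read_value(str, offset)
  if str[offset] == '['
    items = []
    offset += 1                                    # skip '['
    until str[offset] == ']'
      raise ArgumentError, 'unterminated array' if str[offset].nil?
      item, offset = read_value(str, offset)
      items << item
      offset += 1 while [',', ' '].include?(str[offset])  # skip separators
    end
    [items, offset + 1]                            # skip ']'
  else
    finish = offset
    finish += 1 until str[finish].nil? || ',]'.include?(str[finish])
    [str[offset...finish].strip, finish]
  end
end

# Hypothetical wrapper: start at offset 0 and require the whole string be consumed.
def parse_brackets(str)
  value, finish = read_value(str, 0)
  raise ArgumentError, 'trailing characters' unless finish == str.length
  value
end

s = "[[this, is],[a, nested],[array]]"
p read_value(s, 1)   #=> [["this", "is"], 11]
p read_value(s, 2)   #=> ["this", 6]
p parse_brackets(s)  #=> [["this", "is"], ["a", "nested"], ["array"]]
```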

Resources