Whereas str[] replaces a character, str.insert inserts a character at a given position. But it requires two lines of code:
str = "COSO17123456"
str.insert 4, "-"
str.insert 7, "-"
=> "COSO-17-123456"
I was wondering how to do this in one line of code. I came up with the following solution:
str = "COSO17123456"
str.each_char.with_index.reduce("") { |acc,(c,i)| acc += c + ( (i == 3 || i == 5) ? "-" : "" ) }
=> "COSO-17-123456"
Is there a built-in Ruby helper for this task? If not, should I stick with the insert option rather than combining several iterators?
Use each to iterate over an array of indices:
str = "COSO17123456"
[4, 7].each { |i| str.insert i, '-' }
str #=> "COSO-17-123456"
You can use slices and .join:
> [str[0..3], str[4..5],str[6..-1]].join("-")
=> "COSO-17-123456"
Note that the indices after the first one differ from the insert version, because no earlier insertion has shifted the string. You work with absolute positions in the original string rather than positions that move as insertions are made, which is more natural (to me, anyway).
If you want to insert at specific absolute index values, you can also use .each_with_index and control the behavior character by character:
str2 = ""
tgts=[3,5]
str.split("").each_with_index { |c,idx| str2+=c; str2+='-' if tgts.include? idx }
Both of the above create a new string.
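If you would rather keep using insert while still referring to positions in the original string, one option is to work on a copy and insert from the highest index down, so that earlier insertions never shift the positions still to be processed. A minimal sketch; insert_at is a hypothetical helper name, not a built-in:

```ruby
# Hypothetical helper: returns a copy of str with sep inserted before each
# index in indices, where the indices refer to the ORIGINAL string.
def insert_at(str, indices, sep = "-")
  result = str.dup
  # Highest index first, so positions still to be processed are not shifted.
  indices.sort.reverse_each { |i| result.insert(i, sep) }
  result
end

insert_at("COSO17123456", [4, 6]) #=> "COSO-17-123456"
```

The indices here are 4 and 6 (absolute), not 4 and 7 as in the in-place version.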
String#insert returns the string itself.
This means you can chain the method calls, which can be prettier and more efficient if you only have to do it a couple of times, as in your example:
str = "COSO17123456".insert(4, "-").insert(7, "-")
puts str
COSO-17-123456
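Keep in mind that insert modifies its receiver, so chaining on an existing variable changes that string as well. A small sketch of how dup can be used when the original has to stay intact; dashed is a hypothetical name used only for illustration:

```ruby
# Hypothetical helper: chain insert on a copy so the caller's string is kept.
def dashed(str)
  # dup first: insert would otherwise modify the caller's string in place
  str.dup.insert(4, "-").insert(7, "-")
end

code = "COSO17123456"
dashed(code) #=> "COSO-17-123456"
code         #=> "COSO17123456" (unchanged)
```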
Your reduce version can therefore be written more concisely as:
[4,7].reduce(str) { |str, idx| str.insert(idx, '-') }
I'll bring one more variation to the table, String#unpack:
new_str = str.unpack("A4A2A*").join('-')
# or with String#%
new_str = "%s-%s-%s" % str.unpack("A4A2A*")
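For readers unfamiliar with unpack: in the format string, A4 takes four bytes, A2 takes the next two, and A* takes whatever remains (the A directive also strips trailing spaces and nulls). A sketch that shows the intermediate array; split_code is a hypothetical name:

```ruby
# Hypothetical helper that exposes the array produced by unpack.
def split_code(str)
  # "A4A2A*": 4 bytes, then 2 bytes, then the rest of the string
  str.unpack("A4A2A*")
end

split_code("COSO17123456")           #=> ["COSO", "17", "123456"]
split_code("COSO17123456").join("-") #=> "COSO-17-123456"
```

The counts are bytes rather than characters, so this is best suited to ASCII data like the example.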
I have large integers (typically 15-30 digits) stored as a string that represent a certain amount of a given currency (such as ETH). Also stored with that is the number of digits to move the decimal.
{
"base_price"=>"5000000000000000000",
"decimals"=>18
}
The output that I'm ultimately looking for is 5.00 (which is what you'd get if you took the decimal from 5000000000000000000 and moved it to the left 18 positions).
How would I do that in Ruby?
Given:
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
You could use:
my_number = my_map["base_price"].to_i / (10**my_map["decimals"]).to_f
puts(my_number)
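Note that this division goes through a Float, which carries only about 15 to 17 significant decimal digits, so values with 15-30 digits may lose their trailing digits. A sketch that stays in integer arithmetic using Integer#divmod and truncates (does not round) to two places; format_amount is a hypothetical name, and non-negative amounts are assumed:

```ruby
# Hypothetical helper: shift the decimal point without going through Float.
# Assumes base_price is a non-negative integer stored as a string.
def format_amount(base_price, decimals, places = 2)
  whole, frac = base_price.to_i.divmod(10**decimals)
  # Left-pad the fractional part to `decimals` digits, then keep `places` of them.
  frac_digits = frac.to_s.rjust(decimals, "0").ljust(places, "0")[0, places]
  "#{whole}.#{frac_digits}"
end

format_amount("5000000000000000000", 18)   #=> "5.00"
format_amount("123456789012345678901", 18) #=> "123.45"
format_amount("500", 4)                    #=> "0.05"
```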
h = { "base_price"=>"5000000000000000000", "decimals"=>18 }
bef, aft = h["base_price"].split(/(?=\d{#{h["decimals"]}}\z)/)
#=> ["5", "000000000000000000"]
bef + '.' + aft[0,2]
#=> "5.00"
The regular expression uses the positive lookahead (?=\d{18}\z) to split the string at a ("zero-width") location between digits such that 18 digits follow to the end of the string.
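To see the zero-width split on a shorter input, here is a small sketch; split_before_last is a hypothetical name, and the input is assumed to consist of digits only:

```ruby
# Hypothetical helper: split a digit string just before its last n digits.
def split_before_last(digits, n)
  # (?=\d{n}\z) matches the empty position that has exactly n digits after it
  digits.split(/(?=\d{#{n}}\z)/)
end

split_before_last("123456", 2) #=> ["1234", "56"]
```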
Alternatively, one could write:
str = h["base_price"][0, h["base_price"].size-h["decimals"]+2]
#=> h["base_price"][0, 3]
#=> "500"
str.insert(str.size-2, '.')
#=> "5.00"
Neither of these addresses potential boundary cases such as
{ "base_price"=>"500", "decimals"=>1 }
or
{ "base_price"=>"500", "decimals"=>4 }
Nor do they consider rounding issues.
Regular expressions and interpolation?
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
my_map["base_price"].sub(
/(0{#{my_map["decimals"]}})\s*$/,
'.\1'
)
The number of decimal places is interpolated into the regular expression as the count of zeroes to look for from the end of the string (plus zero or more whitespace characters). This is matched, and the match is subbed with a . in front of it.
Producing:
=> "5.000000000000000000"
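That pattern only matches when the trailing digits are all zeroes. A sketch that accepts any digits and uses the block form of sub, in which $1 is read only after the match has been made; add_point is a hypothetical name:

```ruby
# Hypothetical helper: put a decimal point before the last `decimals` digits.
def add_point(digits, decimals)
  # \d{n}\z captures the last n digits; the block runs once the match exists,
  # so $1 refers to this match.
  digits.sub(/(\d{#{decimals}})\z/) { ".#{$1}" }
end

add_point("5000000000000000000", 18) #=> "5.000000000000000000"
add_point("1234567", 2)              #=> "12345.67"
```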
Idea. Given the string, return all the matches (with overlaps) and the text before these matches.
Example. For the text atatgcgcatatat and the query atat there are three matches, and the desired output is atat, atatgcgcatat and atatgcgcatatat.
Problem. I use Ruby 2.2 and String#scan method to get multiple matches. I've tried to use lookahead, but the regex /(?=(.*?atat))/ returns every substring that ends with atat. There must be some regex magic to solve this problem, but I can't figure out the right spell.
I believe this is at least better than the OP's answer:
text = "atatgcgcatatat"
query = "atat"
res = []
text.scan(/(?=#{query})/){res.push($` + query)} #`
res # => ["atat", "atatgcgcatat", "atatgcgcatatat"]
Given the nature and purpose of regex, there is no way to do that. When a regex matches text, there is no way to include the same text in another match. Therefore, the best option that I can think of is to use a look-behind to find the ending position of each match:
(?<=atat)
With your example input of atatgcgcatatat, that would return the following three matches:
Position 4, Length 0
Position 12, Length 0
Position 14, Length 0
You could then loop through those results, get the position for each one, and then get the sub-string that starts at the beginning of the input string and ends at that position. If you don't know how to get the positions of each match, you may find the answers to this question helpful.
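The loop described above could be sketched roughly as follows. Note that this version finds the positions with String#index rather than with regex match positions, which is a swapped-in technique; prefixes_through_matches is a hypothetical name:

```ruby
# Hypothetical helper: for every (possibly overlapping) occurrence of query,
# return the text from the start of the string to the end of that occurrence.
def prefixes_through_matches(text, query)
  results = []
  from = 0
  while (start = text.index(query, from))
    results << text[0, start + query.size] # text up to the end of this match
    from = start + 1                       # step one character to allow overlaps
  end
  results
end

prefixes_through_matches("atatgcgcatatat", "atat")
#=> ["atat", "atatgcgcatat", "atatgcgcatatat"]
```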
You could do this:
str = 'atatgcgcatatat'
target = 'atat'
[].tap do |a|
  str.gsub(/(?=#{target})/) { a << str[0, $~.end(0)+target.size] }
end
#=> ["atat", "atatgcgcatat", "atatgcgcatatat"]
Notice that the string returned by gsub is discarded.
It seems, there's no way to solve the problem in just one go.
One possible solution is to use this knowledge to get indices of matches when using String#scan, and then return the array of sliced strings:
def find_by_end text, query
  res = []
  n = query.length
  text.scan( /(?=(#{query}))/ ) do |m|
    res << text.slice(0, $~.offset(0).first + n)
  end
  res
end
find_by_end "atatgcgcatatat", "atat" #=> ["atat", "atatgcgcatat", "atatgcgcatatat"]
A slightly different solution was proposed by @StevenDoggart. Here's a nice, short snippet that uses this hack to solve the problem:
"atatgcatatat".to_enum(:scan, /(?<=atat)/).map { $` } #`
#=> ["atat", "atatgcatat", "atatgcatatat"]
As @CasimiretHippolyte notes, reversing the string might help to solve the problem. It actually does, but it's hardly the prettiest solution:
"atatgcatatat".reverse.scan(/(?=(tata.*))/).flatten.map(&:reverse).reverse
#=> ["atat", "atatgcatat", "atatgcatatat"]
I have some parameters that I have to sort into different lists. The prefix determines which list should it belong to.
I use prefixes like: c, a, n, o and an additional hyphen (-) to determine whether to put it in the include list or the exclude list.
I use the regex grouped as:
/^(-?)([o|a|c|n])(\w+)/
But here the third group (\w+) is not generic, and it should actually be dependent on the second group's result. I.e, if the prefix is:
'c' or 'a' -> /\w{3}/
'o' -> /\w{2}/
else -> /\w+/
Can I do this with a single regex? Currently I am using an if condition to do so.
Example input:
Valid:
"-cABS", "-aXYZ", "-oWE", "-oqr", "-ncanbeanyting", "nstillanything", "a123", "-conT" (will go to c_exclude_list)
Invalid:
"cmorethan3chars", "c1", "-a1234", "prefizisnotvalid", "somethingelse", "oABC"
Output: for each arg push to the correct list, ignore the invalid.
c_include_list, c_exclude_list, a_include_list, a_exclude_list etc.
You can use this pattern:
/(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b/
The idea is to use lookbehinds to check the previous character without including it in the capture group.
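A sketch of how such a pattern might be used to fill the lists. Here the pattern is anchored with \A and \z so that it validates one whole argument at a time (an assumption; the version above uses \b so that it can scan a longer string). PATTERN and sort_args are hypothetical names:

```ruby
# \A and \z anchor the pattern to one whole argument.
PATTERN = /\A(-?)([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\z/

# Hypothetical helper: push each valid argument's payload to its list.
def sort_args(args)
  lists = Hash.new { |hash, key| hash[key] = [] }
  args.each do |arg|
    m = PATTERN.match(arg)
    next unless m # ignore invalid arguments
    kind = m[1].empty? ? "include" : "exclude"
    lists["#{m[2]}_#{kind}_list"] << m[3]
  end
  lists
end

sort_args(%w[-cABS a123 -oWE nstillanything oABC c1 -a1234])
#=> {"c_exclude_list"=>["ABS"], "a_include_list"=>["123"],
#    "o_exclude_list"=>["WE"], "n_include_list"=>["stillanything"]}
```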
Since version 2.0, Ruby has switched from Oniguruma to Onigmo (a fork of Oniguruma), which adds support for conditional regex, among other features.
So you can use the following regex to customize the pattern based on the prefix:
^-?(?:([ca])|(o)|(n))?(?(1)\w{3}|(?(2)\w{2}|(?(3)\w+)))$
Demo at rubular
Is a single, mind-bending regex the best way to deal with this problem?
Here's a simpler approach that does not employ a regex at all. I suspect that it would be at least as efficient as a single regex, considering that with the latter you must still assign matching strings to their respective arrays. I think it also reads better and would be easier to maintain. The code below should be easy to modify if I have misunderstood some fine points of the question.
Code
def devide_em_up(str)
  h = { a_exclude: [], a_include: [], c_exclude: [], c_include: [],
        o_exclude: [], o_include: [], other_exclude: [], other_include: [] }
  str.split.each do |s|
    exclude = (s[0] == ?-)
    s = s[1..-1] if exclude
    first = s[0]
    s = s[1..-1] if 'cao'.include?(first)
    len = s.size
    case first
    when 'a'
      (exclude ? h[:a_exclude] : h[:a_include]) << s if len == 3
    when 'c'
      (exclude ? h[:c_exclude] : h[:c_include]) << s if len == 3
    when 'o'
      (exclude ? h[:o_exclude] : h[:o_include]) << s if len == 2
    else
      (exclude ? h[:other_exclude] : h[:other_include]) << s
    end
  end
  h
end
Example
Let's try it:
str = "-cABS cABT -cDEF -aXYZ -oWE -oQR oQT -ncanbeany nstillany a123 " +
"-conT cmorethan3chars c1 -a1234 prefizisnotvalid somethingelse oABC"
devide_em_up(str)
#=> {:a_exclude=>["XYZ"], :a_include=>["123"],
# :c_exclude=>["ABS", "DEF", "onT"], :c_include=>["ABT"],
# :o_exclude=>["WE", "QR"], :o_include=>["QT"],
# :other_exclude=>["ncanbeany"],
# :other_include=>["nstillany", "prefizisnotvalid", "somethingelse"]}
I have some sequences in a string denoted by "#number" (/#\d/)
I want to remove redundant sequences, i.e. a #number that repeats the previous #number sequence in the text. So for #2lorem#2ipsum the 2nd #2 is removed, but for #2lorem#1ipsum#2dolor nothing is removed because #1 is between the two #2 sequences.
"#2randomtext#2randomtext#2randomtext#1bla#2bla2#2bla2"
becomes:
"#2randomtextrandomtextrandomtext#1bla#2bla2bla2"
"#2randomtext#2randomtext#2randomtext#1bla#2bla2#2bla2".gsub /(?<=(#\d))([^#]*)\1/,'\2'
=> "#2randomtextrandomtextrandomtext#1bla#2bla2bla2"
You can split it into tokens:
my_string = "#2randomtext#2randomtext#2randomtext#1bla#2bla2#2bla2"
tokens = my_string.scan /(#\d+)?((?:(?!#\d+).)*)/
#=> [["#2", "randomtext"], ["#2", "randomtext"], ["#2", "randomtext"], ["#1", "bla"], ["#2", "bla2"], ["#2", "bla2"], [nil, ""]]
Then chunk, map and join:
tokens.chunk{|x| x[0].to_s}.map{|n, v| [n, v.map(&:last)]}.join
#=> "#2randomtextrandomtextrandomtext#1bla#2bla2bla2"
my_string = "#2randomtext#2randomtext#2randomtext#1bla#2bla2#2bla2"
prev_sequence = String.new
penultimate_index = my_string.length - 2
for i in 0..penultimate_index
  if my_string[i] == '#'
    new_sequence = "##{my_string[i+1]}"
    if new_sequence == prev_sequence
      my_string.slice!( i, 2 )
    else
      prev_sequence = new_sequence
    end
  end
end
puts my_string
Easy... split your string into an array, and then compare each entry's leading number with the one that comes right after it. If it's the same, remove it. The complicated part (though not that much) is that you can't remove entries from an array while looping through them, so what you need to do is make a recursive function. Here's the pseudocode:
-= Global values =-
Declare StringArray and set it to OriginalString.SplitOn("#")
-= Method RemoveLeadingDuplicates =-
Declare Counter
Declare RemoveIndex
loop for each string in StringArray
if previous lead == current lead
Set RemoveIndex
break from loop
else
previous lead = current lead
Increase Counter By 1
end loop
if RemoveIndex is not null
Remove the item at specified index from the array
Call RemoveLeadingDuplicates
Return
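The same idea of comparing each lead with the previous one can be sketched in Ruby without recursion. This version uses gsub with a block instead of splitting into an array, which is a swapped-in technique; remove_repeated_markers is a hypothetical name:

```ruby
# Hypothetical helper: drop every #number marker that repeats the marker
# most recently kept.
def remove_repeated_markers(text)
  previous = nil
  text.gsub(/#\d+/) do |marker|
    if marker == previous
      ""                # same as the last marker kept: remove it
    else
      previous = marker # new marker: remember it and keep it in the text
    end
  end
end

remove_repeated_markers("#2randomtext#2randomtext#2randomtext#1bla#2bla2#2bla2")
#=> "#2randomtextrandomtextrandomtext#1bla#2bla2bla2"
```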
I have the following function, which accepts text and a word count; if the number of words in the text exceeds the word count, the text is truncated with an ellipsis.
#Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two step process:
Truncate the string to a max length (no need for regex for this)
Using regex, find a max words quantity from the truncated string.
Edit:
Another approach is to split the string into words, loop through the array adding up
the lengths. When you find the overrun, join 0 .. index just before the overrun.
Hint: regex ^(\s*.+?\b){5} will match first 5 "words"
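The second approach (split into words and add up the lengths) might look roughly like this. It is only a sketch: snippet_by_length is a hypothetical name, the omission is not counted against max_chars, and a single word longer than max_chars yields just the omission:

```ruby
# Hypothetical helper: keep whole words until either limit would be exceeded.
def snippet_by_length(text, max_words, max_chars, omission = "...")
  words = text.split
  kept = []
  length = 0
  words.each do |word|
    extra = kept.empty? ? word.size : word.size + 1 # +1 for the joining space
    break if kept.size >= max_words || length + extra > max_chars
    kept << word
    length += extra
  end
  kept.join(" ") + (kept.size < words.size ? omission : "")
end

snippet_by_length("Once upon a time in a world far far away", 5, 100) #=> "Once upon a time in..."
snippet_by_length("Once upon a time in a world far far away", 5, 10)  #=> "Once upon..."
```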
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet str, max_words, max_chars, omission='...'
  max_chars = 1 + omission.size if max_chars <= omission.size # need at least one char plus the ellipsis
  words = str.split
  omit = words.size > max_words || str.length > max_chars ? omission : ''
  snip = words[0...max_words].join ' '
  snip = snip[0...(max_chars - omission.size)] if snip.length > max_chars
  snip + omit
end
As others have pointed out, Rails' String#truncate offers almost the functionality you want (truncating to fit a length at a natural boundary), but it doesn't let you independently state a max char length and a word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0...20] + (m.size > 20 ? '...' : '') }
=> "hello world this is ..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0...5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the..."