I have a string like:
hn$8m3kj4.23hs#8;
i need to split it as follow: first entry should be of one char length, second entry of 2 chars, third entry of one char, fourth - by 2 chars and so on.
then concatenate one char with two chars entries by a semicolon :
if some chars at the end remains unpaired, they should be displayed as well.
it is important to skip all non alphanumeric chars.
so the final string should be:
h:n8 m:3k j:42 3:hs 8:
see, 8 has no 2 chars pair but it is displayed anyway.
i have tried with a loop but i get huge code.
also tried regexs but it split by wrong number of chars.
you can try this:
s = "hn$8m3kj4.23hs#8;"
s.gsub(/\W/, '').scan(/(.)(..)?/).map { |i| i.join ':' }.join ' '
=> "h:n8 m:3k j:42 3:hs 8:"
this will not skip underscores though.
if you need to skip them as well, use this one:
s = "hn$8m3k_j4.23hs#8;_"
s.gsub(/\W|_/, '').scan(/(.)(..)?/).map { |i| i.join ':' }.join ' '
=> "h:n8 m:3k j:42 3:hs 8:"
See live demo here
Related
Beginner here, obviously. I need to add a sum and a string together and from the product, I have to replace every 4th character with underscore, the end product should look something like this: 160_bws_np8_1a
I think .gsub is the way, but I can find a way to format the first part in .gsub where I have to specify every 4th character.
total = (1..num).sum
final_output = "#{total.to_s}" + "06bwsmnp851a"
return final_output.gsub(//, "_")
This would work:
s = '12345678901234'
s.gsub(/(...)./, '\1_')
#=> "123_567_901_34"
The regex matches 3 characters (...) that are captured (parentheses) followed by another character (.). Each match is replaced by the first capture (\1) and a literal underscore (_).
s = "12345678901234"
Here are two ways to do that. Both return
"123_567_901_34"
Match every four-character substring and replace the match with the first three characters of the match followed by an underscore
s.gsub(/.{4}/) { |s| s[0,3] << '_' }
Chain the enumerator s.gsub(/./) to Enumerator#with_index and replace every fourth character with an underscore
s.gsub(/./).with_index { |c,i| i%4 == 3 ? '_' : c }
See the form of String#gsub that takes a single argument and no block.
I have this code running inside a buffer (used to unescape a JS string in Ruby):
elsif hex_substring =~ /^\\u[0-9a-fA-F]{1,4}/
hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/) do |match|
hex_byte = match[0]
buffer << JSON.load(%Q("#{hex_byte}"))
hex_index += hex_byte.length
end
...
I have a concern that the scan() is matching a bit too much:
hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]
I am using only "\\ud83c\\udfec", not "\\udfec".
Is there a way in Ruby or in regex to grab only the first part?
You should use a single grouping construct here, the one to match 1 or more occurrences of four hex chars, and omit the inner capturing group that resulted in an extra item in the resulting array:
.scan(/^(?:\\u[\da-fA-F]{4})+/)
Note that + is a simpler and shorter way to write {1,} (one or more occurrences).
Details
^ - start of string
(?: - start of a non-capturing group (what it matches won't be added to the final scan result):
\\u - a \u substring
[\da-fA-F]{4} - four hex chars
)+ - 1 or more occurrences (of the group pattern sequence).
Given any email address I would like to leave only the first and last two characters and input 4 asterisks to the left and right of # character.
The best way to explain are examples:
lorem.ipsum#gmail.com changed to lo****#****om
foo#foo.de changed fo****#****de
How to do it with gsub?
**If you want to mask with a fixed number of * symbols, you may yse
'lorem.ipsum#gmail.com'.sub(/\A(..).*#.*(..)\z/, '\1****#****\2')
# => lo****#****om
See the Ruby demo.
Here,
\A - start of string anchor
(..) - Group 1: first 2 chars
.*#.* - any 0+ chars other than line break chars as many as possible up to the last # followed with another set of 0+ chars other than line break ones
(..) - Group 2: last 2 chars
\z - end of string.
The \1 in the replacment string refers to the value kept in Group 1, and \2 references the value in Group 2.
If you want to mask existing chars while keeping their number, you might consider an approach to capture the parts of the string you need to keep or process, and manipulate the captures inside a sub block:
'lorem.ipsum#gmail.com'.sub(/\A(..)(.*)#(.*)(..)\z/) {
$1 + "*"*$2.length + "#" + "*"*$3.length + $4
}
# => lo*********#*******om
See the Ruby demo
Details
\A - start of string
(..) - Group 1 capturing any 2 chars
(.*) - Group 2 capturing any 0+ chars as many as possible up to the last....
# - # char
(.*) - Group 3 capturing any 0+ chars as many as possible up to the
(..) - Group 4: last two chars
\z - end of string.
Note that inside the block, $1 contains Group 1 value, $2 holds Group 2 value, and so on.
Using gsub with look-ahead and look-behind regex patterns:
'lorem.ipsum#gmail.com'.gsub(/(?<=.{2}).*#.*(?=\S{2})/, '****#****')
=> "lo****#****om"
Using plain ruby:
str.first(2) + '****#****' + str.last(2)
=> "lo****#****om"
I have a solution which doesn't fully solve your problem but it's pretty flexible and I think it's worth it to share it for anyone else looking for similar solutions.
module CoreExtensions
module String
module MaskChars
def mask_chars(except_first_n: 1, except_last_n: 2, mask_with: '*')
if except_first_n.zero? && except_last_n.zero?
raise ArgumentError, "except_first_n and except_last_n can't both be zero"
end
if length < (except_first_n + except_last_n)
raise ArgumentError, "String '#{self}' must be at least #{except_first_n}"\
" (except_first_n) #{except_last_n} (except_last_n) ="\
" #{except_first_n + except_last_n} characters long"
end
sub(
/\A(.{#{except_first_n}})(.*)(.{#{except_last_n}})\z/,
'\1' + (mask_with * (length - (except_first_n + except_last_n))) + '\3'
)
end
end
end
end
Let me explain the regex in /\A(.{#{except_first_n}})(.*)(.{#{except_last_n}})\z/
\A - start of string
(.#{except_first_n}) or (.{1}) Group 1: first n chars. Default value of except_first_n is 1
(.*) Group 2 capturing any 0+ chars as many as possible before the last n characters
(.#{except_last_n}) or (.{2}) Group 3: last n chars. Default value of except_last_n is 2
\z - end of string
Let me explain what's happening in '\1' + (mask_with * (length - (except_first_n + except_last_n))) + '\3'
We are substituting the string with group 1 (\1) at the start, it'll contain characters equalling except_first_n argument's value. We are not gonna use group 2, we need to replace group 2 with the character from mask_with argument, to calculate the amount of times we need to add mask_with character, we use this formula length - (except_first_n + except_last_n) (total length of the string minus the sum value of except_first_n and except_last_n. This will ensure that we have the exact number of mask_with characters between the except_first_n and the except_last_n characters).
Then I created an initializer file config/initializers/core_extensions.rb with this line:
String.include CoreExtensions::String::MaskChars
It will add mask_chars as an instance method to the String class available to all strings.
It should work like this:
account = "123456789101112"
=> "123456789101112"
account.mask_chars
=> "1************12"
account.mask_chars(except_first_n: 3, except_last_n: 4, mask_with: '#')
=> "123########1112"
I think this is a pretty useful method which can be useful in many scenarios and very flexible too.
I have the following function which accepts text and a word count and if the number of words in the text exceeded the word-count it gets truncated with an ellipsis.
#Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two step process:
Truncate the string to a max length (no need for regex for this)
Using regex, find a max words quantity from the truncated string.
Edit:
Another approach is to split the string into words, loop through the array adding up
the lengths. When you find the overrun, join 0 .. index just before the overrun.
Hint: regex ^(\s*.+?\b){5} will match first 5 "words"
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet str, max_words, max_chars, omission='...'
max_chars = 1+omision.size if max_chars <= omission.size # need at least one char plus ellipses
words = str.split
omit = words.size > max_words || str.length > max_chars ? omission : ''
snip = words[0...max_words].join ' '
snip = snip[0...(max_chars-3)] if snip.length > max_chars
snip + omit
end
As other have pointed out Rails String#truncate offers almost the functionality you want (truncate to fit in length at a natural boundary), but it doesn't let you independently state max char length and word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0..20] + (m.size > 20 ? '...' : '') }
=> "hello world this is t..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0..5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the world..."
I would like to insert a <wbr> tag every 5 characters.
Input: s = 'HelloWorld-Hello guys'
Expected outcome: Hello<wbr>World<wbr>-Hell<wbr>o guys
s = 'HelloWorld-Hello guys'
s.scan(/.{5}|.+/).join("<wbr>")
Explanation:
Scan groups all matches of the regexp into an array. The .{5} matches any 5 characters. If there are characters left at the end of the string, they will be matched by the .+. Join the array with your string
There are several options to do this. If you just want to insert a delimiter string you can use scan followed by join as follows:
s = '12345678901234567'
puts s.scan(/.{1,5}/).join(":")
# 12345:67890:12345:67
.{1,5} matches between 1 and 5 of "any" character, but since it's greedy, it will take 5 if it can. The allowance for taking less is to accomodate the last match, where there may not be enough leftovers.
Another option is to use gsub, which allows for more flexible substitutions:
puts s.gsub(/.{1,5}/, '<\0>')
# <12345><67890><12345><67>
\0 is a backreference to what group 0 matched, i.e. the whole match. So substituting with <\0> effectively puts whatever the regex matched in literal brackets.
If whitespaces are not to be counted, then instead of ., you want to match \s*\S (i.e. a non whitespace, possibly preceded by whitespaces).
s = '123 4 567 890 1 2 3 456 7 '
puts s.gsub(/(\s*\S){1,5}/, '[\0]')
# [123 4 5][67 890][ 1 2 3 45][6 7]
Attachments
Source code and output on ideone.com
References
regular-expressions.info
Finite Repetition, Greediness
Character classes
Grouping and Backreferences
Dot Matches (Almost) Any Character
Here is a solution that is adapted from the answer to a recent question:
class String
def in_groups_of(n, sep = ' ')
chars.each_slice(n).map(&:join).join(sep)
end
end
p 'HelloWorld-Hello guys'.in_groups_of(5,'<wbr>')
# "Hello<wbr>World<wbr>-Hell<wbr>o guy<wbr>s"
The result differs from your example in that the space counts as a character, leaving the final s in a group of its own. Was your example flawed, or do you mean to exclude spaces (whitespace in general?) from the character count?
To only count non-whitespace (“sticking” trailing whitespace to the last non-whitespace, leaving whitespace-only strings alone):
# count "hard coded" into regexp
s.scan(/(?:\s*\S(?:\s+\z)?){1,5}|\s+\z/).join('<wbr>')
# parametric count
s.scan(/\s*\S(?:\s+\z)?|\s+\z/).each_slice(5).map(&:join).join('<wbr>')