Replacing %uXXXX with the corresponding Unicode codepoint in Ruby

I have filenames that contain %uXXXX substrings, where XXXX is four hexadecimal digits, for example %u0151. I got these filenames by applying URI.unescape, which replaced the %XX substrings with the corresponding characters but left the %uXXXX substrings untouched. I would like to replace them with the corresponding Unicode codepoints using String#gsub. I tried the following, without success:
"rep%u00fcl%u0151".gsub(/%u([0-9a-fA-F]{4,4})/,'\u\1')
I get this:
"rep\\u00fcl\\u0151"
Instead of this:
"repülő"

Try this code:
string.gsub(/%u([0-9A-F]{4})/i){[$1.hex].pack("U")}
In the comments, cremno suggests a better, faster solution:
string.gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}
In the comments, bobince adds important restrictions that are worth reading in full.

Per commenter @cremno's idea, you can also try this code:
gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
For example:
s = "rep%u00fcl%u0151"
s.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
# => "repülő"
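The attempt in the question fails because '\u\1' is an ordinary replacement string: \u escapes are only interpreted inside double-quoted literals when the source is parsed, so gsub emits a literal backslash-u followed by the captured digits. The block form computes the character at match time instead. A minimal sketch, where decode_percent_u is a hypothetical helper name and not something from the answers:

```ruby
# Hypothetical helper: replaces each %uXXXX escape with the UTF-8
# character for that codepoint, using the block form of gsub.
def decode_percent_u(str)
  str.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
end

decoded = decode_percent_u("rep%u00fcl%u0151")
puts decoded
```

This sketch assumes every escape names a codepoint that is valid on its own. As far as I know, a lone surrogate value such as %uD800 would make Integer#chr raise a RangeError, so input of that kind needs separate handling.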

Related

Split a string and remove the first element in string

Original string: '4.0.0-4.0-M-672092'
How can I turn the original string into "4.0-M-672092" with one line of code?
Any help is highly appreciated.
Thanks and regards

The 'split' method works in this case
https://apidock.com/ruby/String/split
'4.0.0-4.0-M-672092'.split('-')[1..-1].join('-')
# => "4.0-M-672092"
Be careful: this is fine here, but on long strings it can be wasteful, since it splits the whole string and then joins the array back together.
If you need something more efficient for longer strings, find the index of the first "-" (your split point) and take the substring that starts at the next position:
text = '4.0.0-4.0-M-672092'
text[(text.index('-') + 1)..-1]
# => "4.0-M-672092"
This needs the string in a variable, so it is not a one-liner. If the separator is not found, index returns nil and nil + 1 raises a NoMethodError, so guard against that (for example with a rescue) if it can happen.
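A minimal sketch of the index-based approach with an explicit nil check instead of a rescue. The helper name after_first_hyphen is hypothetical, and returning the input unchanged when there is no hyphen is an assumption about the desired behaviour:

```ruby
# Hypothetical helper: returns everything after the first "-".
# Assumption: if there is no "-", the string is returned unchanged.
def after_first_hyphen(text)
  i = text.index('-')
  i ? text[(i + 1)..-1] : text
end

puts after_first_hyphen('4.0.0-4.0-M-672092')
puts after_first_hyphen('no hyphen here')
```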

Simplest way (Array#second comes from ActiveSupport, so outside Rails use .last or [1] instead):
'4.0.0-4.0-M-672092'.split('-', 2).second

"4.0.0-4.0-M-672092"[/(?<=-).*/]
#=> "4.0-M-672092"
The regular expression reads, "Match zero or more characters other than newlines, as many as possible (.*), provided the match is preceded by a hyphen." (?<=-) is a positive lookbehind. See String#[].

Replace special character with its index

I need to replace all special characters within a string with their index.
For example,
"I-need_to#change$all%special^characters^"
should become:
"I1need6to9change16all20special28characters39"
The index of each special character differs.
I have checked many links about replacing all occurrences of a character with a single character.
I found a very similar link, but I do not want to adopt that approach, because I need to replace every special character with its own index number.
I have also tried to do something like this:
str.gsub!(/[^0-9A-Za-z]/, '')
Here str is my example string.
This removes all the special characters, but I want each one replaced by its index instead. This could apply either to all special characters or to these seven:
\/*[]:?
I mainly need to replace these seven, but it would be OK to replace all of them.
I need a simpler way.
Thanks in advance.

You can use the global variable $` (the part of the string that precedes the current match) and the block form of gsub:
irb> str = "I-need_to#change$all%special^characters^"
=> "I-need_to#change$all%special^characters^"
irb> str.gsub(/[^0-9A-Za-z]/) { $`.length }
=> "I1need6to9change16all20special28characters39"
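The same idea can be restricted to the seven characters listed in the question. A minimal sketch, assuming the constant name SEVEN_SPECIALS and the sample string (both are illustrative, not from the original); it uses Regexp.last_match.begin(0), which gives the same offset as $`.length:

```ruby
# Matches only the seven characters from the question: \ / * [ ] : ?
SEVEN_SPECIALS = %r{[\\/*\[\]:?]}

sample = 'a/b:c?d'
# Replace each matched character with its index in the original string.
result = sample.gsub(SEVEN_SPECIALS) { Regexp.last_match.begin(0).to_s }
puts result
```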

Check if a string contains a character in a unicode range (using Ruby)

I want to create a simple function in Ruby that will check if the given string contains any unicode characters in the ranges such as the following:
U+007B -- U+00BF
U+02B0 -- U+037F
U+2000 -- U+2BFF
How can I accomplish this? Google is coming up blank for me; all I find is about removing Unicode characters or checking whether a string contains Unicode.

The easiest thing would probably be a regex using String#index, String#match, or even String#[]:
string.index(/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/)
string.match(/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/)
string[/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/]
All three will give you nil (which is falsey) if they don't find the pattern and non-nil (which will be truthy) if they do.

I would do as below:
my_string = "{ How are you ?}"
puts my_string.chars.any? { |chr| ("\u007B".."\u00BF").include?(chr) }
#=> true
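To cover all three ranges from the question in one function, the check can also be done on integer codepoints. A minimal sketch, where TARGET_RANGES and in_target_ranges? are hypothetical names:

```ruby
# Codepoint ranges taken from the question.
TARGET_RANGES = [0x007B..0x00BF, 0x02B0..0x037F, 0x2000..0x2BFF].freeze

# Hypothetical helper: true if any character's codepoint falls
# inside one of the ranges above.
def in_target_ranges?(string)
  string.each_codepoint.any? do |cp|
    TARGET_RANGES.any? { |range| range.cover?(cp) }
  end
end

puts in_target_ranges?("{ How are you ?}") # "{" is U+007B
puts in_target_ranges?("plain text")
```

Comparing integers avoids any dependence on how string ranges iterate, at the cost of being a little longer than the regex.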

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can I replace the abc123 with something else, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.

You can do it using this character class: [^\/] (anything that is not a slash) and lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")

For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.

Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement; this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.

The problem is that you're replacing the entire matched string, "start/abc123/end", with the replacement. You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, if the middle part consists only of digits, just match the digits:
mystring.gsub(/\d+/, "7")

You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string with \k<name>:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
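In a replacement string, Ruby refers to a named group as \k<name>. A minimal sketch using the string and the replacement from the question; the group names head and tail are illustrative:

```ruby
mystring = "start/abc123/end"

# Keep the captured "start/" and "/end" and swap only the middle part.
# Single quotes keep the backslashes literal for gsub to interpret.
result = mystring.gsub(/(?<head>start\/).*(?<tail>\/end)/, '\k<head>123abc\k<tail>')
puts result
```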

Separate word Regex Ruby

I am looping over a bunch of input files and extracting a tag from each. However, I want to separate some of the words. The incoming strings are in the form cs###, where each # is a digit from 0-9. I want the result to be cs ###. The closest answer I found was "Regex to separate Numeric from Alpha", but I cannot get it to work, because the string there is predefined (static) and mine changes.
Found answer:
Never mind, I found the answer. The following separates letters from digits and removes any unwanted non-alphanumeric characters, so something like ab5#6$% becomes ab 56:
gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ').gsub(/[^0-9a-z ]/i, '')
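A minimal sketch of that two-step approach wrapped in a method, so it can be applied to each incoming tag. The name separate_alnum is hypothetical, and the second step is assumed to delete unwanted characters (empty replacement) so that the result matches the ab 56 example:

```ruby
# Hypothetical helper: inserts a space at every letter/digit boundary,
# then drops anything that is not a letter, digit, or space.
def separate_alnum(str)
  str.gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ')
     .gsub(/[^0-9a-z ]/i, '')
end

puts separate_alnum("cs123")
puts separate_alnum("ab5#6$%")
```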

If your string is something like
str = "cs3232
cs23
cs423"
Then you can do something like
str.scan(/((cs)(\d{1,10}))/m).collect{|e| e.shift; e }
# [["cs", "3232"], ["cs", "23"], ["cs", "423"]]
