I have filenames that contain %uXXXX substrings, where XXXX are hexadecimal digits, for example %u0151. I got these filenames by applying URI.unescape, which replaced the %XX substrings with the corresponding characters but left the %uXXXX substrings untouched. I would like to replace them with the corresponding Unicode code points using String#gsub. I tried the following, without success:
"rep%u00fcl%u0151".gsub(/%u([0-9a-fA-F]{4,4})/,'\u\1')
I get this:
"rep\\u00fcl\\u0151"
Instead of this:
"repülő"
Try this code:
string.gsub(/%u([0-9A-F]{4})/i){[$1.hex].pack("U")}
In the comments, cremno has a better, faster solution:
string.gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}
In the comments, bobince adds important restrictions, worth reading in full.
Following commenter cremno's idea, try this code as well:
gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
For example:
s = "rep%u00fcl%u0151"
s.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }
# => "repülő"
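As a hedged extension of the answers above: if the %uXXXX escapes are UTF-16 code units (an assumption; this is how JavaScript's legacy escape() writes them), a character outside the Basic Multilingual Plane arrives as two consecutive escapes, and converting each one separately with chr raises a RangeError. The sketch below decodes each run of escapes as UTF-16 instead. The method name decode_percent_u is hypothetical.

```ruby
# Hypothetical helper: decodes runs of %uXXXX escapes as UTF-16 code units.
# Assumes the input string is UTF-8 and the escapes are well-formed UTF-16.
def decode_percent_u(str)
  str.gsub(/(?:%u[0-9a-f]{4})+/i) do |run|
    # Collect the 16-bit code units of this run of escapes.
    units = run.scan(/%u([0-9a-f]{4})/i).flatten.map(&:hex)
    # Pack them big-endian, label the bytes as UTF-16BE, then transcode.
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

decode_percent_u('rep%u00fcl%u0151') # => "repülő"
```

An unpaired surrogate in the input makes encode raise Encoding::InvalidByteSequenceError, so rescue that error if such input is possible.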
Original string: '4.0.0-4.0-M-672092'
How can I turn the original string into "4.0-M-672092" using one line of code?
Any help is highly appreciated.
Thanks and Regards
The 'split' method works in this case
https://apidock.com/ruby/String/split
'4.0.0-4.0-M-672092'.split('-')[1..-1].join('-')
# => "4.0-M-672092"
Be careful, though: this is fine here, but on long texts it becomes inefficient, since it splits the whole string and then joins the array back together.
If you need something more efficient for longer texts, find the index of the first "-" (your split point) and take the substring starting at the next position:
text = '4.0.0-4.0-M-672092'
text[(text.index('-') + 1)..-1]
# => "4.0-M-672092"
This needs an extra line for the variable, and if the separator is not found, index returns nil and nil + 1 raises a NoMethodError, so guard against that (or use a rescue) if it can happen.
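A possible middle ground, sketched here with String#partition: it splits only at the first occurrence of the separator and never raises when the separator is missing. The helper name after_first is hypothetical, and returning nil for "no separator" is a design choice, not something the question specifies.

```ruby
# Hypothetical helper: returns everything after the first separator,
# or nil when the separator does not occur in the text.
def after_first(text, sep = '-')
  # partition returns [head, separator, tail]; the separator slot is ""
  # when nothing was found.
  _head, found, tail = text.partition(sep)
  found.empty? ? nil : tail
end

after_first('4.0.0-4.0-M-672092') # => "4.0-M-672092"
```

As a one-liner, '4.0.0-4.0-M-672092'.partition('-').last gives the same result, and yields an empty string when there is no hyphen.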
Simplest way:
'4.0.0-4.0-M-672092'.split('-', 2).last  # Array#second is available only with ActiveSupport (Rails)
"4.0.0-4.0-M-672092"[/(?<=-).*/]
#=> "4.0-M-672092"
The regular expression reads, "Match zero or more characters other than newlines, as many as possible (.*), provided the match is preceded by a hyphen." (?<=-) is a positive lookbehind. See String#[].
I need to replace all special characters within a string with their index.
For example,
"I-need_to#change$all%special^characters^"
should become:
"I1need6to9change16all20special28characters39"
The index of each special character differs.
I have looked at many answers about replacing all special characters with a single character, or replacing occurrences of one character.
I found a very similar question, but I do not want to adopt its approach to replacing with the index number, as I need to replace all of the special characters.
I have also tried to do something like this:
str.gsub!(/[^0-9A-Za-z]/, '')
Here str is my example string.
This removes all of the special characters, but I want each one replaced with its index instead. I need this either for all special characters or for these seven:
\/*[]:?
I mainly need to replace these seven, but it would be OK to replace all of them.
I need a simpler way.
Thanks in advance.
You can use the global variable $` and the block form of gsub:
irb> str = "I-need_to#change$all%special^characters^"
=> "I-need_to#change$all%special^characters^"
irb> str.gsub(/[^0-9A-Za-z]/) { $`.length }
=> "I1need6to9change16all20special28characters39"
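An equivalent sketch that avoids the special global by asking the MatchData for the start offset of the match. The method name replace_with_index and the SEVEN_CHARS constant are hypothetical; the constant covers only the seven characters listed in the question.

```ruby
# The seven characters from the question: \ / * [ ] : ?
SEVEN_CHARS = /[\\\/*\[\]:?]/

# Hypothetical helper: replaces every match of pattern with the character
# index at which that match starts in the original string.
def replace_with_index(str, pattern = /[^0-9A-Za-z]/)
  # Regexp.last_match is the MatchData of the current match;
  # begin(0) is the character offset where the whole match starts.
  str.gsub(pattern) { Regexp.last_match.begin(0) }
end

replace_with_index('I-need_to#change$all%special^characters^')
# => "I1need6to9change16all20special28characters39"
replace_with_index('a/b:c?', SEVEN_CHARS)
# => "a1b3c5"
```

The indices refer to positions in the original string, not in the partially rewritten result, which matches the expected output in the question.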
I want to create a simple function in Ruby that will check if the given string contains any unicode characters in the ranges such as the following:
U+007B -- U+00BF
U+02B0 -- U+037F
U+2000 -- U+2BFF
How can I accomplish this? Google is coming up blank for me; everything I find is about removing Unicode characters or checking whether a string contains any Unicode at all.
The easiest thing would probably be a regex using String#index, String#match, or even String#[]:
string.index(/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/)
string.match(/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/)
string[/[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/]
All three will give you nil (which is falsey) if they don't find the pattern and non-nil (which will be truthy) if they do.
I would do as below:
my_string = "{ How are you ?}"
puts my_string.chars.any? { |chr| ("\u007B".."\u00BF").include?(chr) }
#=> true
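Wrapped up as the simple function the question asks for, a minimal sketch covering all three ranges might look like this. The names SPECIAL_RANGES and special_chars? are hypothetical, and the sketch assumes the input is a valid UTF-8 string and Ruby 2.4 or later (for String#match?).

```ruby
# Character class covering the three ranges listed in the question.
SPECIAL_RANGES = /[\u007B-\u00BF\u02B0-\u037F\u2000-\u2BFF]/

# Hypothetical helper: true if the string contains at least one
# character from any of the ranges above.
def special_chars?(string)
  string.match?(SPECIAL_RANGES)
end

special_chars?('{ How are you ?}') # => true  ("{" is U+007B)
special_chars?('How are you?')     # => false
```

match? returns a plain boolean and does not set the match globals, which suits a yes/no check.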
Suppose I have the following string:
mystring = "start/abc123/end"
How can you swap out the abc123 for something else, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using the character class [^\/] (anything that is not a slash) together with lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement. This turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string (named groups are referenced as \\k<name>):
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
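Since the edit to the question says the middle part can be any combination of alphanumeric characters, here is a hedged sketch of the lookaround approach restricted to that case. The method name replace_middle is hypothetical; the block form of sub is used so that backslashes in the replacement are not interpreted as back-references.

```ruby
# Matches one or more alphanumeric characters sitting between
# "start/" and "/end", without consuming either delimiter.
MIDDLE = %r{(?<=start/)[[:alnum:]]+(?=/end)}

# Hypothetical helper: swaps the middle segment for the replacement.
def replace_middle(str, replacement)
  # The block's return value is inserted literally.
  str.sub(MIDDLE) { replacement }
end

replace_middle('start/abc123/end', '123abc') # => "start/123abc/end"
```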
I have a bunch of input files that I process in a loop, extracting tags from them. However, I want to separate some of the words. The incoming strings are in the form cs###, where each # is a digit from 0-9. I want the result to be cs ###. The closest answer I found was this: Regex to separate Numeric from Alpha. But I cannot get it to work, because the string there is predefined (static) and mine changes.
Found answer:
Never mind, I found the answer. The following separates alphabetic from numeric characters and removes any unwanted non-alphanumeric characters, so anything like ab5#6$% becomes ab 56:
gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ').gsub(/[^0-9a-z ]/i, '')
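To make the behaviour described above concrete, here is a small self-contained sketch. The names BOUNDARY and separate_alnum are hypothetical, and removing the unwanted characters before inserting spaces is a choice made for this sketch so that digits split by punctuation end up adjacent.

```ruby
# Zero-width position between a letter and a digit, in either order.
BOUNDARY = /(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i

# Hypothetical helper: drops everything except letters, digits and spaces,
# then inserts a space at each letter/digit boundary.
def separate_alnum(text)
  text.gsub(/[^0-9a-z ]/i, '').gsub(BOUNDARY, ' ')
end

separate_alnum('cs123')   # => "cs 123"
separate_alnum('ab5#6$%') # => "ab 56"
```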
If your string is something like
str = "cs3232
cs23
cs423"
Then you can do something like
str.scan(/((cs)(\d{1,10}))/m).collect{|e| e.shift; e }
# => [["cs", "3232"], ["cs", "23"], ["cs", "423"]]
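To get from those pairs to the "cs ###" form the question asks for, one option is to capture only the prefix and the digits and join each pair with a space. This is a sketch; the method name space_out_tags is hypothetical, and it assumes every tag starts with the literal prefix cs.

```ruby
# Hypothetical helper: finds every cs### tag in the text and
# returns each one as "cs ###".
def space_out_tags(str)
  # With two capture groups, scan yields [prefix, digits] pairs.
  str.scan(/(cs)(\d+)/).map { |prefix, digits| "#{prefix} #{digits}" }
end

space_out_tags("cs3232\ncs23\ncs423")
# => ["cs 3232", "cs 23", "cs 423"]
```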