Delete all non-Cyrillic symbols from a string - Ruby

I want to have a field on my form that can contain symbols like #, $, etc. But in another case I want it to contain only letters, without any symbols. How do I remove all non-letter symbols and keep all Russian Cyrillic letters?
Here is a small example:
I have the string "йцукен#$%йцукен"
and in the end I want to get "йцукен йцукен"

Try this:
'йцукен#$%йцукен'.gsub(/\P{Cyrillic}++/, ' ')
explanation:
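\p{Cyrillic} matches any character of the Unicode Cyrillic script, which covers the Cyrillic letters.
\P{Cyrillic} (capital P) is its negation: it matches every character that is not Cyrillic.
The + (written possessively here as ++, which only disables backtracking) matches a whole run of such characters, so each run is replaced by a single space. If you want to preserve other characters too (digits, in this example), put them in a negated class together with \p{Cyrillic}: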
'123йцукен#$%йцукен456'.gsub(/[^\p{Cyrillic}0-9]++/, ' ')
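A minimal sketch that wraps both patterns in one helper. The name only_cyrillic and the keep_digits: keyword are hypothetical, and the final strip is an assumption: it drops the space that a leading or trailing run of symbols would otherwise leave behind.

```ruby
# Keep only Cyrillic characters; every run of anything else becomes one space.
# only_cyrillic and keep_digits: are hypothetical names for this sketch.
def only_cyrillic(text, keep_digits: false)
  pattern = keep_digits ? /[^\p{Cyrillic}0-9]+/ : /\P{Cyrillic}+/
  # strip removes the space left by a leading or trailing run of symbols
  text.gsub(pattern, ' ').strip
end

puts only_cyrillic('йцукен#$%йцукен')                            # prints йцукен йцукен
puts only_cyrillic('123йцукен#$%йцукен456')                      # prints йцукен йцукен
puts only_cyrillic('123йцукен#$%йцукен456', keep_digits: true)   # prints 123йцукен йцукен456
```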

Brute force with a list of allowed characters:
def filter(input, allowed)
  input.chars.inject('') do |result, char|
    result << char if allowed.include? char
    result
  end
end
test_string = 'abc$6&йцxyz'
allowed_characters = 'abcxyzйц'
puts filter(test_string, allowed_characters)
=> abcйцxyz
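Two alternatives to the brute-force loop, as a sketch. String#delete accepts the same character-set syntax as String#tr, so a leading ^ means "everything except these"; this assumes the allowed list contains no -, ^ or \, which are special inside such a set. The Set variant is the same idea as filter above, with faster lookups for long lists. Both method names are hypothetical.

```ruby
require 'set'

# Sketch 1: delete everything NOT in the allowed set (leading ^ negates).
# Assumes allowed contains no '-', '^' or '\', which are special in tr-style sets.
def filter_with_delete(input, allowed)
  input.delete("^#{allowed}")
end

# Sketch 2: same idea as filter, but a Set gives constant-time lookups
# when the list of allowed characters is long.
def filter_with_set(input, allowed)
  allowed_set = allowed.chars.to_set
  input.chars.select { |char| allowed_set.include?(char) }.join
end

puts filter_with_delete('abc$6&йцxyz', 'abcxyzйц') # prints abcйцxyz
puts filter_with_set('abc$6&йцxyz', 'abcxyzйц')    # prints abcйцxyz
```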

Be careful with the expression "йцукен#$%йцукен" as written in the question: inside double quotes, #$ can start interpolation of a global variable (for example "#$0"), so a single-quoted string such as 'йцукен#$%йцукен' is the safer way to write a literal like this. If you want to replace sequences of characters like '#%$' with a single space rather than just deleting them, the following works:
'йцукен#$%йцукен'.tr('#%$', " ").squeeze(" ")
# => "йцукен йцукен"
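If you'd rather not list every unwanted symbol, tr also accepts a negated set. A sketch, assuming only the Russian alphabet needs to be kept: а-я and А-Я are contiguous Unicode ranges, but ё and Ё lie outside them and are added explicitly (letters from other Cyrillic alphabets, such as Ukrainian і, would be treated as symbols). The helper name russian_words is hypothetical.

```ruby
# Assumption: only the Russian alphabet counts as "letters".
# а-я and А-Я are contiguous ranges; ё and Ё are outside them, so they are listed.
RUSSIAN_LETTERS = 'а-яА-ЯёЁ'

# Hypothetical helper: the leading ^ makes tr replace every character
# NOT in the set with a space, then runs of spaces are collapsed.
def russian_words(text)
  text.tr("^#{RUSSIAN_LETTERS}", ' ').squeeze(' ').strip
end

puts russian_words('йцукен#$%йцукен')   # prints йцукен йцукен
puts russian_words('Ёлка!!!ёжик...12')  # prints Ёлка ёжик
```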

Related

Ruby - replace characters in a string with a symbol

I am trying to receive a string of characters and change all characters aside from spaces to "*". Here is where I am:
def change_word(word)
  new_word = word.chars
  new_word.each { |replace| replace.gsub!(/./, "*") }
  new_word.to_s
  new_word.join
end
I'm taking a word, adding the individual characters to an array and assigning this to a new variable, replacing everything in said array with the required symbol, changing everything in the array to a string and then joining everything in the array to output a bunch of *'s.
What I would like to do (and it's not necessary that the solution follows the previous syntax) is take all letters and replace them with *. Spaces should stay as a space, only letters should become *.
What about gsub(/\S/, '*')?
It finds every non-whitespace character and replaces each one with *. \S is a regex character class that matches non-whitespace characters (thanks @jdno).
E.g.
pry> "as12 43-".gsub(/\S/, '*')
=> "**** ***"
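So in your case:
def change_word(word)
  word.gsub(/\S/, '*')
end
You may also extract the regex into a constant. This is mainly for readability and reuse: a regex literal without interpolation is already compiled only once, so the speed gain is negligible.
CHANGE_WORD_PATTERN = /\S/
def change_word(word)
  word.gsub(CHANGE_WORD_PATTERN, '*')
end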
When the first argument of String#tr starts with "^", it means "every character except these". So "^ " means every character except a space.
word = "12 34 rfv"
word.tr("^ ","*") # => "** ** ***"
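Note that both \S and tr("^ ", "*") mask digits and punctuation as well, while the question asks for letters only. A sketch contrasting the two, assuming letters-only is what's wanted; [[:alpha:]] is a POSIX bracket expression that Ruby matches against Unicode letters, Cyrillic included. Both method names are hypothetical.

```ruby
# \S masks every non-whitespace character, digits and punctuation included.
def mask_non_space(word)
  word.gsub(/\S/, '*')
end

# [[:alpha:]] masks only letters (Unicode-aware, so Cyrillic works too).
def mask_letters(word)
  word.gsub(/[[:alpha:]]/, '*')
end

puts mask_non_space('as12 43-') # prints **** ***
puts mask_letters('as12 43-')   # prints **12 43-
```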

How to print an escape character in Ruby?

I have a string containing an escape character:
word = "x\nz"
and I would like to print it as x\nz.
However, puts word gives me:
x
z
How do I get puts word to output x\nz instead of creating a new line?
Use String#inspect
puts word.inspect #=> "x\nz"
Or just p
p word #=> "x\nz"
I have a string containing an escape character:
No, you don't. You have a string containing a newline.
How do I get puts word to output x\nz instead of creating a new line?
The easiest way would be to just create the string in the format you want in the first place:
word = 'x\nz'
# or
word = "x\\nz"
If that isn't possible, you can translate the string the way you want:
word = word.gsub("\n", '\n')
# or
word.gsub!("\n", '\n')
You may be tempted to do something like
puts word.inspect
# or
p word
Don't do that! #inspect is not guaranteed to have any particular format. The only requirement it has is that it return a human-readable string representation suitable for debugging. You should never rely on the content of #inspect; the only thing you can rely on is that it is human-readable.
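If you need the visible form without depending on #inspect, here is a sketch that generalizes the explicit gsub approach with a hash: gsub looks up each match in the hash and uses the value as the replacement. The helper name show_escapes and the set of characters it covers are assumptions; extend the table as needed.

```ruby
# Hypothetical table: which control characters to show, and how.
VISIBLE_ESCAPES = { "\n" => '\n', "\t" => '\t', "\r" => '\r' }.freeze

# gsub with a hash replaces each match with the hash value for that match.
def show_escapes(str)
  str.gsub(/[\n\t\r]/, VISIBLE_ESCAPES)
end

word = "x\nz"
puts show_escapes(word)          # prints x\nz
puts word.length                 # prints 3 (x, newline, z)
puts show_escapes(word).length   # prints 4 (x, backslash, n, z)
```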

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I also want to remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions", I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically, I swapped the caret (^) for a dollar sign ($) and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it doesn't work.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference lies in the relationship between the string you are matching against and the regex tokens that describe the condition you want to trigger. Here, you are looking for a period (\.) that is followed only by optional whitespace (\s*) and then the end of the line ($). I would read the regex above as "any characters of any length, followed by a period, followed by any amount of whitespace, followed by the end of the line." Note that the leading .* also consumes everything before the period on that line, so the whole line is removed, not just the last word.
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill if you're only concerned with the starting or ending character, and a regular expression will generally be slower than the built-in methods.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end
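A sketch for the case where the input is one whitespace-separated string rather than an array, assuming it is acceptable to collapse the original spacing to single spaces. drop_marked_words is a hypothetical name. On Ruby 2.5 and later, String#delete_suffix covers what replace_suffix does.

```ruby
# Assumes the input is a single whitespace-separated string; split/join
# collapse the original spacing to single spaces.
def drop_marked_words(text)
  text.split.reject { |w| w.start_with?('#') || w.end_with?('.') }.join(' ')
end

puts drop_marked_words('#a day in the life.') # prints day in the

# Ruby 2.5+: built-in equivalent of replace_suffix.
puts 'a8&23q2aas.'.delete_suffix('.')        # prints a8&23q2aas
```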

How to insert a newline character to an array of characters

I want to insert a newline character into an array of characters which initially is a string. Let's say I have a variable myvar = "Blizzard". A string is formed from an array of characters. How can I insert a newline character inside it? In hope of making an output like this:
"B
lizzard"
I tried this:
myvar[1] = "\n"
but it's not working, and the output is like this:
"B\nlizzard"
My goal is to make the output like this:
B
l
i
z
z
a
r
d
without using puts. I have to do it by inserting newline characters into the array. Can someone point out where my mistake is, and if possible help me with this?
To add \n you can use this:
myvar = "Blizzard"
myvar.chars.map { |c| c + "\n" }.join.strip
Or, better, @Uri's solution:
myvar.chars.join "\n"
But you can also print the letters one per line with the following code:
myvar.chars.each { |c| puts c }
or:
myvar.each_char { |c| puts c } # for ruby >= 2.0
by Darek Nędza
'Blizzard'.chars.join("\n")
# => "B\nl\ni\nz\nz\na\nr\nd"
If all you want is to print the characters each in a new row you can do the following:
puts 'Blizzard'.chars
Output:
B
l
i
z
z
a
r
d
You have done myvar[1] = "\n" correctly. Your problem is not how you did it, but what you are expecting.
You seem to be confusing the inspection of a string with the output of puts. Inspection is what irb displays as the return value, and it is a meta-representation of what you have. As long as it is a string, it is delimited by double quotes, and special characters are escaped with a backslash \. A newline character is therefore shown as "\n". When you pass the string to puts, on the other hand, the output shows what the special characters represent.
What you showed as the desired result (the multi-line one) is what puts prints. You will never get that from inspecting the string.
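A sketch of the two operations involved here. Note that []= with a single index replaces the character at that position, so myvar[1] = "\n" drops the "l"; String#insert keeps it. p shows the inspection, puts shows the output.

```ruby
myvar = 'Blizzard'

# []= with a single index REPLACES the character at that index.
replaced = myvar.dup
replaced[1] = "\n"
p replaced        # prints "B\nizzard" (the "l" is gone)

# String#insert puts the newline before index 1 and keeps every character.
inserted = myvar.dup.insert(1, "\n")
p inserted        # prints "B\nlizzard" (inspection)
puts inserted     # prints B, then lizzard on the next line (output)
```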

Remove hex escape from string

I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
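On Ruby 2.4 and later, unpack1 saves the .first call. A sketch, including the reverse conversion with Array#pack; .b (a binary copy) is used in the comparison because pack returns a binary-encoded string while the literal is UTF-8.

```ruby
bytes = "\xfe\xff"

# Ruby 2.4+: unpack1 returns the first unpacked value directly.
hex = bytes.unpack1('H*')
puts hex                # prints feff

# Reverse direction: pack the hex digits back into raw bytes.
back = [hex].pack('H*')
p back.bytes            # prints [254, 255]
p back == bytes.b       # prints true (both compared as binary strings)
```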
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
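Looking at the raw bytes makes the "one escape sequence, one character" point concrete. A short sketch:

```ruby
double = "\xfe"   # one hex escape sequence becomes one byte
single = '\xfe'   # no escape processing in single quotes

p double.bytesize # prints 1
p double.bytes    # prints [254]
p single.length   # prints 4
p single.chars    # prints ["\\", "x", "f", "e"]
```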
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method makes the escape sequences visible by escaping their backslashes. Writing the backslashes out by hand has the same effect:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one slash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan(/x(..)/) do |groups_arr|
  result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
"? #An optional quote mark
\\ #A literal '\'
x #An 'x'
(..) #Any two characters, captured in group 1
"? #An optional quote mark
/xm) do
Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
  result << sprintf('%02x', int_code)
end
p result
--output:--
"feff"
Why are you calling inspect? That's what adds the extra quotes.
Also, putting the string in double quotes means each \x is processed as a hex escape sequence. Put it in single quotes and everything should work.
'\xfe\xff'.gsub("\\x","")
=> "feff"
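Plain %x, as in the each_byte answer above, drops the leading zero for bytes below 0x10, while %02x keeps it. A sketch with zero-padding, written with map and join; to_hex is a hypothetical name, and unpack1('H*') gives the same result.

```ruby
# Two hex digits per byte; %02x pads bytes below 0x10 with a leading zero.
def to_hex(str)
  str.each_byte.map { |byte| format('%02x', byte) }.join
end

p to_hex("\xfe\xff")   # prints "feff"
p to_hex("\x0a\xff")   # prints "0aff" (plain %x would give "aff")
```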
