Apply .capitalize on a Cyrillic array in Ruby - ruby

I want to capitalise the string elements of an array with Ruby.
This is my code:
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.each {|month| month.capitalize!}
puts headermonths
I get the following output:
января
февраля
марта
апреля
мая
июня
июля
августа
октября
ноября
декабря
If I print the array with:
print headermonths
I get the following:
["\u044F\u043D\u0432\u0430\u0440\u044F", "\u0444\u0435\u0432\u0440\u0430\u043B\u044F", "\u043C\u0430\u0440\u0442\u0430", "\u0430\u043F\u0440\u0435\u043B\u044F", "\u043C\u0430\u044F", "\u0438\u044E\u043D\u044F", "\u0438\u044E\u043B\u044F", "\u0430\u0432\u0433\u0443\u0441\u0442\u0430", "\u043E\u043A\u0442\u044F\u0431\u0440\u044F", "\u043D\u043E\u044F\u0431\u0440\u044F", "\u0434\u0435\u043A\u0430\u0431\u0440\u044F"]
But I would like to have an output like:
Января
Февраля
Марта
Апреля
Мая
Июня
Июля
Августа
Октября
Ноября
Декабря
How do I achieve this with a Ruby method?

You can use the unicode gem
require 'unicode'
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.map! {|month| Unicode::capitalize month }
p headermonths
# >> ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]

Stand-alone solution :
# From : https://en.wikipedia.org/wiki/Cyrillic_alphabets :
upcase = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
downcase = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.each{|word| word[0] = word[0].tr(downcase,upcase)}
# => ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]
If you want it to work with words in both the Latin and Cyrillic alphabets:
headermonths.each{|word| word[0] = word[0].tr(downcase,upcase).upcase }
With ActiveSupport
You can use ActiveSupport::Multibyte :
require 'active_support/core_ext/string/multibyte'
"января".mb_chars.capitalize.to_s #=> "Января"
So your script becomes :
require 'active_support/core_ext/string/multibyte'
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.map!{|word| word.mb_chars.capitalize.to_s}
#=> ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]
Ruby 2.4
The code in your question would work just as expected with Ruby 2.4.
See "Case sensitivity for unicode characters" here.
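As a minimal sketch of the Ruby 2.4+ behaviour mentioned above (assuming a UTF-8 source file, and with the month list shortened for brevity), the built-in capitalize handles Cyrillic without any gem:

```ruby
# Requires Ruby 2.4 or later, where String#capitalize applies Unicode case mapping.
months = %w[января февраля марта]

# map returns a new array; the original strings are left untouched.
capitalized = months.map(&:capitalize)

puts capitalized
```

Using map instead of each with capitalize! keeps the original array intact, which may or may not be what you want.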

The example below is a capitalize variant that works in any Ruby starting with 1.9, but for Cyrillic only, because the offset of 32 between lowercase and uppercase code points is hardcoded.
NB: thanks and credit go to @Stefan and @EricDuminil, who led me in the right direction.
headermonths = %w|января февраля марта апреля мая июня
                  июля августа октября ноября декабря|
puts (headermonths.each do |s|
  s[0] = (s[0].ord - 32).chr(Encoding::UTF_8)
end.inspect)
#⇒ ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня",
# "Июля", "Августа", "Октября", "Ноября", "Декабря"]
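The hardcoded 32 comes from the Unicode layout: the letters а..я sit at U+0430..U+044F and А..Я at U+0410..U+042F. The sketch below checks that offset and shows one letter (ё) that falls outside the range. The helper name cyrillic_upcase_first is hypothetical and only guards the subtraction; it is not part of the answer above.

```ruby
# Code point distance between lowercase and uppercase in the basic Cyrillic range.
offset = "я".ord - "Я".ord        # 0x044F - 0x042F
yo_offset = "ё".ord - "Ё".ord     # 0x0451 - 0x0401, outside the basic range

# Hypothetical helper: shifts the first character only if it lies in а..я,
# and returns an unchanged copy otherwise.
def cyrillic_upcase_first(word)
  first = word[0]
  return word.dup unless first && ("а".."я").cover?(first)
  (first.ord - 32).chr(Encoding::UTF_8) + word[1..-1]
end

puts offset
puts yo_offset
puts cyrillic_upcase_first("мая")
```

Without such a guard, subtracting 32 from a character that is already uppercase or not Cyrillic would produce an unrelated character.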

Related

Garbage Base64 decoded string [duplicate]

This question already has answers here:
p vs puts in Ruby
(8 answers)
Closed 3 years ago.
Could somebody explain to me why there are two different outputs?
Code in IRB (interactive Ruby shell):
irb(main):001:0> require 'base64'
=> true
irb(main):002:0> cookie = "YXNkZmctLTBEAiAvi95NGgcgk1W0pyUKXFEo6IuEvdxhmrfLqNVpskDv5AIgVn8wfIWf0y41cb%2Bx9I0ah%2F4BIIeRJ54nX2qGcxw567Y%3D"
=> "YXNkZmctLTBEAiAvi95NGgcgk1W0pyUKXFEo6IuEvdxhmrfLqNVpskDv5AIgVn8wfIWf0y41cb%2Bx9I0ah%2F4BIIeRJ54nX2qGcxw567Y%3D"
irb(main):003:0> decoded_cookie = Base64.urlsafe_decode64(URI.decode(cookie))
=> "asdfg--0D\x02 /\x8B\xDEM\x1A\a \x93U\xB4\xA7%\n\\Q(\xE8\x8B\x84\xBD\xDCa\x9A\xB7\xCB\xA8\xD5i\xB2#\xEF\xE4\x02 V\x7F0|\x85\x9F\xD3.5q\xBF\xB1\xF4\x8D\x1A\x87\xFE\x01 \x87\x91'\x9E'_j\x86s\x1C9\xEB\xB6"
Code from Linux terminal:
asd@asd:~# ruby script.rb
asdfg--0D /��M� �U��%
\Q(苄��a��˨�i�#�� V0|���.5q������ ��'�'_j�s9�
Script:
require 'base64'
require 'ecdsa'
cookie = "YXNkZmctLTBEAiAvi95NGgcgk1W0pyUKXFEo6IuEvdxhmrfLqNVpskDv5AIgVn8wfIWf0y41cb%2Bx9I0ah%2F4BIIeRJ54nX2qGcxw567Y%3D"
def decode_cookie(cookie)
  decoded_cookie = Base64.urlsafe_decode64(URI.decode(cookie))
end
puts (decode_cookie(cookie))
How can I get the same output in the terminal?
I need this output:
"asdfg--0D\x02 /\x8B\xDEM\x1A\a \x93U\xB4\xA7%\n\Q(\xE8\x8B\x84\xBD\xDCa\x9A\xB7\xCB\xA8\xD5i\xB2#\xEF\xE4\x02 V\x7F0|\x85\x9F\xD3.5q\xBF\xB1\xF4\x8D\x1A\x87\xFE\x01 \x87\x91'\x9E'_j\x86s\x1C9\xEB\xB6"
in the Linux terminal.
A string like "\x8B" is an escaped representation of a single byte, not the literal characters \, x, 8 and B. Ruby's inspect uses this representation for bytes that are not printable or not valid in the string's encoding, and for characters that would otherwise affect whitespace (for example, "\n" is a newline, not a \ followed by an n).
The reason you get different output in irb is that you don't print the string with puts there (as you do in your script). irb shows the inspect form of the value of each expression, so evaluating decoded_cookie displays the escaped representation rather than the raw content.
You can display the raw content by printing it:
require 'base64'
cookie = "YXNkZmctLTBEAiAvi95NGgcgk1W0pyUKXFEo6IuEvdxhmrfLqNVpskDv5AIgVn8wfIWf0y41cb%2Bx9I0ah%2F4BIIeRJ54nX2qGcxw567Y%3D"
decoded_cookie = Base64.urlsafe_decode64(URI.decode(cookie))
puts decoded_cookie
# asdfg--0D /��M �U��%
# \Q(苄��a��˨�i�#�� V0|���.5q����� ��'�'_j�s9�
#=> nil
You can find more info about the "\xnn" representation here.
If you'd like the script to display the string representation use p instead of puts, or use puts decoded_cookie.inspect.
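To make the difference concrete, here is a small sketch using a short made-up byte string (not the cookie from the question). It only illustrates that the escapes exist in the inspect form, not in the string itself:

```ruby
# A short byte string with a control byte, a high byte and a newline.
raw = "asdfg\x02\x8B\n".b

# inspect builds the escaped, quoted form that irb and p display.
escaped = raw.inspect

puts escaped         # prints the escaped form, quotes included
puts raw.bytesize    # the string itself holds 8 raw bytes and no backslashes
```

Calling p raw would print the same text as puts escaped.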

How can I read lines in a text file with Ruby

I am new to Ruby programming. I am trying to read a text file line by line.
Here is my sample textfile:
john
doe
john_d
somepassword
Here is my code:
f = File.open('input.txt', 'r')
a = f.readlines
n = a[0]
s = a[1]
u = a[2]
p = a[3]
str = "<user><name=\"#{n}\" surname=\"#{s}\" username=\"#{u}\" password=\"#{p}\"/></user>"
File.open('result.txt', 'w') { |file| file.print(str) }
The output should look like this:
<user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
But result.txt looks like this, with a newline character after every value:
<user><name="john
" surname="doe
" username="john_d
" password="somepassword"/></user>
How can I correct this?
Every value includes a newline character because there is a newline character at the end of every line of the file.
Just remove it when you don't need it:
n = a[0].gsub("\n", '')
s = a[1].gsub("\n", '')
# ...
As spickermann explained, you can also just change the second line to:
a = f.readlines.map! { |line| line.chomp }
As @iGian already mentioned, chomp is a good option for cleaning up your text. I am not sure which version of Ruby you are using, but here is the link to the official Ruby 2.5 documentation on chomp, so you can see how it will help you: https://ruby-doc.org/core-2.5.0/String.html#method-i-chomp
See the content of variable a after using chomp:
2.4.1 :001 > f = File.open('input.txt', 'r')
=> #<File:input.txt>
2.4.1 :002 > a = f.readlines.map! {|line| line.chomp}
=> ["john", "doe", "john_d", "somepassword"]
Depending on how many other corner cases you expect in your input, strip is another option for cleaning up your strings. Its official documentation, with examples, is here: https://ruby-doc.org/core-2.5.0/String.html#method-i-strip
See the content of variable a after using strip:
2.4.1 :001 > f = File.open('input.txt', 'r')
=> #<File:input.txt>
2.4.1 :002 > a = f.readlines.map! {|line| line.strip}
=> ["john", "doe", "john_d", "somepassword"]
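The two methods give the same result for the sample file, but they are not equivalent. A short sketch with a made-up line that has surrounding spaces shows the difference:

```ruby
line = "  john_d \n"

chomped  = line.chomp   # removes only the trailing line terminator
stripped = line.strip   # removes leading and trailing whitespace as well

p chomped
p stripped
```

If leading or trailing spaces can be meaningful (in a password, for instance), chomp is the safer choice.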
FName = 'temp'
File.write FName, "john
doe
john_d
somepassword"
#=> 28
Here are two ways.
s = "<user><name=\"%s\" surname=\"%s\" username=\"%s\" password=\"%s\"/></user>"
puts s % File.readlines(FName).map(&:chomp)
# <user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
puts s % File.read(FName).split("\n")
# <user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
See String#% and, as mentioned in that doc, Kernel#sprintf.
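Another option, assuming Ruby 2.4 or later, is to let readlines drop the line terminators itself via the chomp: true keyword. The sketch below writes the sample data to a temporary file so that it is self-contained. The variable pw is used instead of p only to avoid shadowing Kernel#p.

```ruby
require "tempfile"

result = nil
Tempfile.create("input") do |file|
  # Same four lines as the sample input file.
  file.write("john\ndoe\njohn_d\nsomepassword\n")
  file.flush

  # chomp: true strips the trailing newline from every line while reading.
  n, s, u, pw = File.readlines(file.path, chomp: true)
  result = "<user><name=\"#{n}\" surname=\"#{s}\" username=\"#{u}\" password=\"#{pw}\"/></user>"
end

puts result
```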

How can I remove escape characters from string? UTF issue?

I've read in an XML file that has lines such as
<Song name="Caught Up In You" id='162' duration='276610'/>
I'm reading in the file with
f=File.open(file)
f.each_with_index do |line,index|
  if line.match('Song name="')
    @songs << line
    puts line if (index % 1000) == 0
  end
end
However, when I try to use the entries, I find that I get text with escaped characters such as:
"\t\t<Song name=\"Veinte Anos\" id='3118' duration='212009'/>\n"
How can I eliminate the escape characters, either in the initial store or in the later selection?
@songs[rand(@songs.size)]
ruby 2.0
Your text does not have 'escape' characters. The .inspect version of the string shows these. Observe:
> s = gets
Hello "Michael"
#=> "Hello \"Michael\"\n"
> puts s
Hello "Michael"
> p s # The same as `puts s.inspect`
"Hello \"Michael\"\n"
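Applied to the line from the question, a quick check (a sketch only, using the sample line as a literal) confirms that the stored string contains no backslashes; the tabs and the newline are real whitespace that strip removes:

```ruby
line = "\t\t<Song name=\"Veinte Anos\" id='3118' duration='212009'/>\n"

has_backslash = line.include?("\\")    # false: the escapes exist only in the inspect form
name = line[/name="([^"]*)"/, 1]       # first capture group of the match, or nil

puts line.strip
puts name
```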
However, the real answer is to process this XML file as XML. For example:
require 'nokogiri' # gem install nokogiri
doc = Nokogiri.XML( IO.read( 'mysonglist.xml' ) ) # Read and parse the XML file
songs = doc.css( 'Song' ) # Gives you a NodeList of song els
puts songs.map{ |s| s['name'] } # Print the name of all songs
puts songs.map{ |s| s['duration'] } # Print the durations (as strings)
mins_and_seconds = songs.map{ |s| (s['duration'].to_i/1000.0).divmod(60) }
#=> [ [ 4, 36.6 ], … ]

Thor & YAML outputting as binary?

I'm using Thor and trying to output YAML to a file. In irb I get what I expect. Plain text in YAML format. But when part of a method in Thor, its output is different...
class Foo < Thor
  include Thor::Actions
  desc "bar", "test"
  def set
    test = {"name" => "Xavier", "age" => 30}
    puts test
    # {"name"=>"Xavier", "age"=>30}
    puts test.to_yaml
    # !binary "bmFtZQ==": !binary |-
    #   WGF2aWVy
    # !binary "YWdl": 30
    File.open("data/config.yml", "w") {|f| f.write(test.to_yaml) }
  end
end
Any ideas?
All Ruby 1.9 strings have an encoding attached to them.
YAML encodes some non-UTF-8 strings as binary, even when they look innocent and contain no high-bit characters. You might think that your code always uses UTF-8, but built-ins can return non-UTF-8 strings (e.g. File path routines).
To avoid binary encoding, make sure all your strings' encodings are UTF-8 before calling to_yaml. Change the encoding with the force_encoding("UTF-8") method.
For example, this is how I encode my options hash into yaml:
options = {
  :port => 26000,
  :rackup => File.expand_path(File.join(File.dirname(__FILE__), "../sveg.rb"))
}
utf8_options = {}
options.each_pair { |k,v| utf8_options[k] = ((v.is_a? String) ? v.force_encoding("UTF-8") : v)}
puts utf8_options.to_yaml
Here is an example of YAML encoding simple strings as binary:
>> x = "test"
=> "test"
>> x.encoding
=> #<Encoding:UTF-8>
>> x.to_yaml
=> "--- test\n...\n"
>> x.force_encoding "ASCII-8BIT"
=> "test"
>> x.to_yaml
=> "--- !binary |-\n dGVzdA==\n"
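The exact output for plain ASCII strings depends on the Ruby and Psych version, so the irb session above may not reproduce on a newer Ruby. The sketch below therefore uses a hypothetical non-ASCII value, where the effect of the encoding tag should be visible either way:

```ruby
require "yaml"

# Hypothetical non-ASCII value, tagged as binary (ASCII-8BIT) on purpose.
binary_name = "Xavié".b
binary_yaml = { "name" => binary_name }.to_yaml

# force_encoding changes its receiver in place, so work on a copy.
utf8_name = binary_name.dup.force_encoding("UTF-8")
utf8_yaml = { "name" => utf8_name }.to_yaml

puts binary_yaml   # value is emitted with a !binary tag, Base64-encoded
puts utf8_yaml     # value is emitted as readable text
```

Note that force_encoding only relabels the bytes; it does not convert them, so it is appropriate only when the bytes really are valid UTF-8.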
Since version 1.9.3p125, Ruby's built-in YAML engine treats strings with BINARY encoding differently than before. All you need to do is set a correct non-BINARY encoding on each String before calling to_yaml.
In Ruby 1.9, every String object has an Encoding object attached to it. As this blog post by James Edward Gray II explains, Ruby has three built-in default encodings that apply when a String is created:
http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
One of them may solve your problem: the source code encoding.
This is the encoding of your source code. You can specify it by adding a magic encoding comment on the first line, or on the second line if the first line is a shebang.
The magic comment can be any of the following:
# encoding: utf-8
# coding: utf-8
# -*- encoding : utf-8 -*-
So in your case, if you use Ruby 1.9.3p125 or later, adding one of these magic comments at the beginning of your code should solve the problem.
# encoding: utf-8
require 'thor'
class Foo < Thor
  include Thor::Actions
  desc "bar", "test"
  def bar
    test = {"name" => "Xavier", "age" => 30}
    puts test
    #{"name"=>"Xavier", "age"=>30}
    puts test["name"].encoding.name
    #UTF-8
    puts test.to_yaml
    #---
    #name: Xavier
    #age: 30
    puts test.to_yaml.encoding.name
    #UTF-8
  end
end
I have been struggling with this using 1.9.3p545 on Windows - just with a simple hash containing strings - and no Thor.
The gem ZAML solves the problem quite simply:
require 'ZAML'
yaml = ZAML.dump(some_hash)
File.write(path_to_yaml_file, yaml)

ruby string splitting problem

I have this string:
"asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
I want to get the value between ACK= and the next & symbol. That value can change.
Thanks.
I want the solution in Ruby.
require "cgi"
query_string = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asmda=asdakmsd"
parsed_query_string = CGI.parse(query_string)
#=> { "asdasda" => ["asdaskdmasd"],
# "asmda" => ["asdasmda", "asdakmsd"],
# "ACK" => ["Success"] }
parsed_query_string["ACK"].first
#=> "Success"
If you also want to reconstruct the query string (especially together with the rest of a URL), I would recommend looking into the addressable gem.
require "addressable/uri"
# Note the leading '?'
query_string = "?asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asmda=asdakmsd"
parsed_uri = Addressable::URI.parse(query_string)
parsed_uri.query_values["ACK"]
#=> "Success"
parsed_uri.query_values = parsed_uri.query_values.merge("ACK" => "Changed")
parsed_uri.to_s
#=> "?ACK=Changed&asdasda=asdaskdmasd&asmda=asdakmsd"
# Note how the order has changed and the duplicate key has been removed due to
# Addressable's built-in normalisation.
"asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"[/ACK=([^&]*)&/]
$1 # => 'Success'
A quick approach:
s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
s.gsub(/ACK[=\w]+&/,"ACK[changedValue]&")
#=> asdasda=asdaskdmasd&asmda=asdasmda&ACK[changedValue]&asdmas=asdakmsd&asmda=adasda
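Note that the gsub above also drops the = sign. If the goal is to replace only the value and keep the key=value shape, a sketch along these lines may be closer (the replacement value "Changed" is just a placeholder):

```ruby
s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"

# Match ACK only at the start of the string or right after an &,
# then keep that prefix and swap the value.
changed = s.sub(/(\A|&)ACK=[^&]*/) { "#{$1}ACK=Changed" }

puts changed
```

Anchoring on the start of the string or a preceding & avoids matching a longer key that merely ends in ACK.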
s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
m = s.match /.*ACK=(.*?)&/
puts m[1]
and just for fun without regexp:
Hash[s.split("&").map{|p| p.split("=")}]["ACK"]
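The Hash-based one-liner keeps only the last value when a key is repeated (asmda appears twice here). As a sketch of an alternative that uses only the standard library's uri module, URI.decode_www_form returns every pair in order:

```ruby
require "uri"

s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"

# Array of [key, value] pairs; duplicates are kept and values are URL-decoded.
pairs = URI.decode_www_form(s)

ack = pairs.assoc("ACK")&.last                                   # first ACK value, or nil
asmda_values = pairs.select { |key, _| key == "asmda" }.map(&:last)

puts ack
p asmda_values
```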

Resources