Transliteration with Iconv in Ruby

When I'm trying to transliterate a Cyrillic utf-8 string with
Iconv.iconv('ascii//ignore//translit', 'utf-8', string).to_s
(see questions/1726404/transliteration-in-ruby)
I'm getting everything but those symbols that have to be transliterated.
For example: 'r-строка' → 'r-' and 'Gévry' → 'Gvry'.
What's wrong?
Ruby 1.8.7 / Rails 2.3.5 / WSeven

require 'iconv'
p Iconv.iconv('ascii//translit//ignore', 'utf-8', 'Gévry') #=> ["Gevry"]
# not 'ascii//ignore//translit'
For Cyrillic the translit gem might work.
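Iconv was removed from Ruby's standard library in 2.0, so on newer Rubies the call above is unavailable without the iconv gem. For Latin accents, a minimal sketch (not the answer's method; it assumes Ruby 2.2+ for String#unicode_normalize, and the helper name strip_accents is hypothetical) decomposes each character and drops the combining marks. It does not transliterate Cyrillic.

```ruby
# Hypothetical helper: strips diacritics from Latin text.
# Assumes Ruby 2.2+ (String#unicode_normalize) and UTF-8 input.
def strip_accents(text)
  # NFD splits "é" into "e" + U+0301 (combining acute accent);
  # \p{Mn} matches such non-spacing marks, which are then removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

p strip_accents('Gévry')    #=> "Gevry"
p strip_accents('r-строка') #=> "r-строка" (Cyrillic is left as is here)
```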

It seems the solution is too tricky for me. Problem solved using stringex gem.

Another way is to write a custom transliteration with String#tr and String#gsub, without using Iconv.
# encoding: UTF-8
def russian_translit(text)
translited = text.tr('абвгдеёзийклмнопрстуфхэыь', 'abvgdeezijklmnoprstufhey\'')
translited = translited.tr('АБВГДЕЁЗИЙКЛМНОПРСТУФХЭЫЬ', 'ABVGDEEZIJKLMNOPRSTUFHEY\'')
translited = translited.gsub(/[жцчшщъюяЖЦЧШЩЪЮЯ]/,
'ж' => 'zh', 'ц' => 'ts', 'ч' => 'ch', 'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ю' => 'ju', 'я' => 'ja',
'Ж' => 'ZH', 'Ц' => 'TS', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '', 'Ю' => 'JU', 'Я' => 'JA')
return translited
end
p russian_translit("В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!")
#=> "V chaschah juga zhil by tsitrus? Da, no fal'shivyj ekzempljar!"

Related

Obfuscating numbers in a string?

I have a challenge that calls for obfuscating numbers in a string, such as a SSN, for example: XXX-XX-4430. I've gotten pretty close:
def hide_all_ssns(string)
string.scan(/\w{3}-\w{2}-\w{4}/)
string.gsub('/\w{3}-\w{2}', 'XXX-XX')
end
but I get an error:
Error! hide_all_ssns obfuscates any SSNs in the string expected:
"XXX-XX-1422, XXX-XX-0744, XXX-XX-8762" got: "234-60-1422,
350-80-0744, 013-60-8762" (using ==)
I initially had the regular-expression (/\d{3}-\d{2}-\d{4}/) but thought that the problem was attempting to convert the integers in the string to X. Now I'm using \w, yet I am getting the same error.
Does anyone have any insight? I'm a newbie to coding and have exhausted Ruby-doc, as well as any blogs I can find on regex/gsub, but I am getting nowhere.
You're misusing gsub (the regular expression needs to be a regex literal between forward slashes, not a quoted string), but I still think gsub! is what you want...
def hide_all_ssns(string)
string.scan(/\w{3}-\w{2}-\w{4}/)
string.gsub!(/\w{3}-\w{2}/, 'XXX-XX')
end
Working example:
1.9.3p448 :063 > string = "123-45-6789"
=> "123-45-6789"
1.9.3p448 :064 > def hide_all_ssns(string)
1.9.3p448 :065?> string.scan(/\w{3}-\w{2}-\w{4}/)
1.9.3p448 :066?> string.gsub!(/\w{3}-\w{2}/, 'XXX-XX')
1.9.3p448 :067?> end
=> nil
1.9.3p448 :068 > hide_all_ssns(string)
=> "XXX-XX-6789"
1.9.3p448 :069 > string
=> "XXX-XX-6789"
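Note that gsub! returns nil when nothing is replaced, so a method ending in gsub! returns nil for input without SSNs, and it also mutates the caller's string. A minimal sketch that avoids both, assuming SSNs always appear as three, two and four digits separated by hyphens (the method name mask_ssns is hypothetical):

```ruby
# Hypothetical helper: masks the first five digits of each SSN-shaped token.
def mask_ssns(string)
  # \1 re-inserts the captured last four digits; gsub returns a new string
  # and leaves the argument untouched.
  string.gsub(/\b\d{3}-\d{2}-(\d{4})\b/, 'XXX-XX-\1')
end

p mask_ssns("234-60-1422, 350-80-0744, 013-60-8762")
#=> "XXX-XX-1422, XXX-XX-0744, XXX-XX-8762"
p mask_ssns("no numbers here") #=> "no numbers here"
```

Matching the whole SSN and capturing only the last group keeps other hyphenated tokens in the string from being masked by accident.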
Why does it have to be so hard? All U.S. social security numbers are the same format, right? So, work from that point. Here's some variations on a theme, ordered by escalating obscurity:
ssn = '123-45-6789' # => "123-45-6789"
ssn[0, 6] = 'XXX-XX' # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
numbers = ssn.scan(/\d+/) # => ["123", "45", "6789"]
'XXX-XX-' + numbers.last # => "XXX-XX-6789"
Or:
ssn = '123-45-6789' # => "123-45-6789"
ssn[0, 6] = ssn[0, 6].gsub(/\d/, 'X') # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
ssn[0,6] = ssn[0, 6].tr('0-9', 'X') # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
numbers = ssn.split('-') # => ["123", "45", "6789"]
[*numbers[0, 2].map{ |s| 'X' * s.size }, numbers[-1]].join('-') # => "XXX-XX-6789"
Or:
ssn[/(\d+)-(\d+)-(\d+)/] # => "123-45-6789"
[$1, $2, $3] # => ["123", "45", "6789"]
[$3, *[$2, $1].map{ |s| s.gsub(/./, 'X') }].reverse.join('-') # => "XXX-XX-6789"
Of course, using one of these would be cheating, since you're supposed to figure the challenge out by yourself, but they're good food for thought and a decent starting point for your own solution.
Short and simple... You could maybe try something like this:
crypted = ('X' * 6) + "4543-2329-1354-1111".to_s[14..18]
=> "XXXXXX-1111"

rake import- only adding one line from csv to database

I am attempting to import a CSV file into my Rails database (SQLite in development) following this tutorial. Data is actually getting inserted into my database, but it seems to insert only the first record from the CSV file. The rake task seems to run without problems, and running it with --trace reveals no additional information.
require 'csv'
desc "Import Voters from CSV File"
task :import => [:environment] do
file = "db/GOTV.csv"
CSV.foreach(file, :headers => false) do |row|
Voter.create({
:last_name => row[0],
:first_name => row[1],
:middle_name => row[2],
:name_suffix => row[3],
:primary_address => row[4],
:primary_city => row[5],
:primary_state => row[6],
:primary_zip => row[7],
:primary_zip4 => row[8],
:primary_unit => row[9],
:primary_unit_number => row[10],
:phone_number => row[11],
:phone_code => row[12],
:gender => row[13],
:party_code => row[14],
:voter_score => row[15],
:congressional_district => row[16],
:house_district => row[17],
:senate_district => row[18],
:county_name => row[19],
:voter_key => row[20],
:household_id => row[21],
:client_id => row[22],
:state_voter_id => row[23]
})
end
end
Just ran into this as well - guess you solved it some other way, but still might be useful for others.
In my case, the issue seems to be an incompatible change in the CSV library.
I guess you were using Ruby 1.8, where the signature is:
CSV.foreach(path, rs = nil, &block)
The docs here are severely lacking (there are actually none at all), so I have to guess from the source: http://ruby-doc.org/stdlib-1.8.7/libdoc/csv/rdoc/CSV.html#method-c-foreach
Anyway, 'rs' is clearly not an options hash; it looks like the record separator.
In Ruby 1.9 this is nicer: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-foreach
self.foreach(path, options = Hash.new, &block)
so this is the version that supports options such as :headers.
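On Ruby 1.9 and later the options are honored, so the loop visits every row. A minimal sketch with a throwaway file standing in for db/GOTV.csv (the two sample rows, and the plain hashes used in place of Voter.create, are made up for illustration):

```ruby
require 'csv'
require 'tempfile'

rows = []
Tempfile.create(['voters', '.csv']) do |f|
  # Hypothetical sample data: last name, first name
  f.write("Smith,John\nDoe,Jane\n")
  f.flush
  # With the options-hash signature, :headers => false is understood
  # and the block runs once per line of the file.
  CSV.foreach(f.path, :headers => false) do |row|
    rows << { :last_name => row[0], :first_name => row[1] }
  end
end

p rows.size #=> 2
```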

Ruby String.encode still gives "invalid byte sequence in UTF-8"

In IRB, I'm trying the following:
1.9.3p194 :001 > foo = "\xBF".encode("utf-8", :invalid => :replace, :undef => :replace)
=> "\xBF"
1.9.3p194 :002 > foo.match /foo/
ArgumentError: invalid byte sequence in UTF-8
from (irb):2:in `match'
Any ideas what's going wrong?
I'd guess that "\xBF" already thinks it is encoded in UTF-8 so when you call encode, it thinks you're trying to encode a UTF-8 string in UTF-8 and does nothing:
>> s = "\xBF"
=> "\xBF"
>> s.encoding
=> #<Encoding:UTF-8>
\xBF isn't valid UTF-8 so this is, of course, nonsense. But if you use the three argument form of encode:
encode(dst_encoding, src_encoding [, options] ) → str
[...] The second form returns a copy of str transcoded from src_encoding to dst_encoding.
You can force the issue by telling encode to ignore what the string thinks its encoding is and treat it as binary data:
>> foo = s.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "�"
Where s is the "\xBF" that thinks it is UTF-8 from above.
You could also use force_encoding on s to force it to be binary and then use the two-argument encode:
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('binary')
=> "\xBF"
>> s.encoding
=> #<Encoding:ASCII-8BIT>
>> foo = s.encode('utf-8', :invalid => :replace, :undef => :replace)
=> "�"
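A quick way to confirm this diagnosis is String#valid_encoding?, which reports whether the bytes are well-formed for the encoding the string claims to have. A small sketch, assuming a UTF-8 source file as in the examples above:

```ruby
s = "\xBF"
p s.valid_encoding?      #=> false

# Treat the bytes as binary so encode actually transcodes them.
fixed = s.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
p fixed.valid_encoding?  #=> true
p fixed.match(/foo/)     #=> nil, and no ArgumentError
```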
If you're only working with ascii characters you can use
>> "Hello \xBF World!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "Hello � World!"
But what happens if we use the same approach with valid UTF-8 characters that are not ASCII?
>> "¡Hace \xBF mucho frío!".encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "��Hace � mucho fr��o!"
Uh oh! We want frío to remain with the accent. Here's an option that keeps the valid UTF8 characters
>> "¡Hace \xBF mucho frío!".chars.select{|i| i.valid_encoding?}.join
=> "¡Hace  mucho frío!"
Also in Ruby 2.1 there is a new method called scrub that solves this problem
>> "¡Hace \xBF mucho frío!".scrub
=> "¡Hace � mucho frío!"
>> "¡Hace \xBF mucho frío!".scrub('')
=> "¡Hace  mucho frío!"
This is fixed if you read the source text file using an explicit encoding:
File.open( 'thefile.txt', 'r:iso8859-1' )
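A minimal sketch of that approach, assuming the file really is ISO-8859-1 (the sample bytes are made up). The mode string 'r:iso-8859-1:utf-8' names the external encoding and asks Ruby to transcode to UTF-8 while reading; in ISO-8859-1 the byte \xBF is the character ¿.

```ruby
require 'tempfile'

text = Tempfile.create('latin1') do |f|
  f.binmode
  f.write("Hace \xBF mucho".b)  # raw ISO-8859-1 bytes
  f.flush
  # External encoding ISO-8859-1, transcoded to UTF-8 on read
  File.open(f.path, 'r:iso-8859-1:utf-8') { |io| io.read }
end

p text                  #=> "Hace ¿ mucho"
p text.valid_encoding?  #=> true
```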

How to merge Ruby hashes

How can I merge these two hashes:
{:car => {:color => "red"}}
{:car => {:speed => "100mph"}}
To get:
{:car => {:color => "red", :speed => "100mph"}}
There is a Hash#merge method:
ruby-1.9.2 > a = {:car => {:color => "red"}}
=> {:car=>{:color=>"red"}}
ruby-1.9.2 > b = {:car => {:speed => "100mph"}}
=> {:car=>{:speed=>"100mph"}}
ruby-1.9.2 > a.merge(b) {|key, a_val, b_val| a_val.merge b_val }
=> {:car=>{:color=>"red", :speed=>"100mph"}}
You can create a recursive method if you need to merge nested hashes:
def merge_recursively(a, b)
a.merge(b) {|key, a_item, b_item| merge_recursively(a_item, b_item) }
end
ruby-1.9.2 > merge_recursively(a,b)
=> {:car=>{:color=>"red", :speed=>"100mph"}}
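The recursive version above calls merge on every conflicting pair of values, so it raises NoMethodError when both hashes hold a non-hash value under the same key. A sketch that recurses only when both sides are hashes and otherwise lets the right-hand value win (the name merge_nested is hypothetical; the tie-break mirrors plain Hash#merge):

```ruby
# Hypothetical helper: recursive merge that tolerates scalar conflicts.
def merge_nested(a, b)
  a.merge(b) do |_key, a_item, b_item|
    if a_item.is_a?(Hash) && b_item.is_a?(Hash)
      merge_nested(a_item, b_item)  # both sides nested: go one level down
    else
      b_item                        # otherwise behave like Hash#merge
    end
  end
end

a = {:car => {:color => "red", :wheels => 4}}
b = {:car => {:speed => "100mph", :wheels => 3}}
p merge_nested(a, b)
# equals {:car => {:color => "red", :wheels => 3, :speed => "100mph"}}
```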
Hash#deep_merge
Rails 3.0+
a = {:car => {:color => "red"}}
b = {:car => {:speed => "100mph"}}
a.deep_merge(b)
=> {:car=>{:color=>"red", :speed=>"100mph"}}
Source: https://speakerdeck.com/u/jeg2/p/10-things-you-didnt-know-rails-could-do
Slide 24
Also,
http://apidock.com/rails/v3.2.13/Hash/deep_merge
You can use the merge method defined in the ruby library. https://ruby-doc.org/core-2.2.0/Hash.html#method-i-merge
Example
h1={"a"=>1,"b"=>2}
h2={"b"=>3,"c"=>3}
h1.merge(h2)
This returns {"a"=>1,"b"=>3,"c"=>3}
When both hashes contain the same key, merge keeps the value from its argument, so the value for key "b" changes from 2 to 3.
To keep both values instead, pass a block to merge:
h1.merge(h2){|k,v1,v2|[v1,v2]}
The above snippet returns
{"a"=>1,"b"=>[2,3],"c"=>3}
h1 = {:car => {:color => "red"}}
h2 = {:car => {:speed => "100mph"}}
h3 = h1[:car].merge(h2[:car])
h4 = {:car => h3}

How to get the formatting options for the to_yaml method working on ruby 1.9.1?

According to the YAML documentation it's possible to pass a hash of options to the .to_yaml method.
Currently, when I pass the options as suggested by the documentation, it doesn't work: the hash is ignored.
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> user = { "1" => { "name" => "john", "age" => 44 } }
user.to_yaml
=> "--- \n\"1\": \n name: john\n age: 44\n"
Now, passing some options:
irb(main):014:0> user.to_yaml( :Indent => 4, :UseHeader => true, :UseVersion => true )
=> "--- \n\"1\": \n name: john\n age: 44\n"
irb(main):015:0> user.to_yaml( :Separator => "\n" )
=> "--- \n\"1\": \n name: john\n age: 44\n"
irb(main):016:0> user.to_yaml( :separator => "\n" )
=> "--- \n\"1\": \n name: john\n age: 44\n"
irb(main):017:0> RUBY_VERSION
=> "1.9.1"
As you can see, passing the options doesn't work. Only the defaults are used:
YAML::DEFAULTS
=> {:Indent=>2, :UseHeader=>false, :UseVersion=>false, :Version=>"1.0", :SortKeys=>false, :AnchorFormat=>"id%03d", :ExplicitTypes=>false, :WidthType=>"absolute", :BestWidth=>80, :UseBlock=>false, :UseFold=>false, :Encoding=>:None}
Is this a known bug, or is it currently working for anyone using Ruby 1.9.1?
I have dug relatively deep into the C source for this in the not so distant past. I'm posting just to validate what's already been said in the comments.
Basically, can't do it. The Syck options get lost somewhere in the process, before ever hitting the YAML writer.
The best you can have is to_yaml_style. Sometimes.
This is the same for 1.8 and 1.9.
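For readers on newer Rubies: the YAML library there is backed by Psych rather than Syck, and to_yaml forwards its options to Psych.dump. The option names differ from the Syck-era ones in the question; :indentation and :line_width are among those Psych documents. A minimal sketch, assuming a Psych-based Ruby (this does not help on 1.9.1):

```ruby
require 'yaml'

user = { "1" => { "name" => "john", "age" => 44 } }

# :indentation is a Psych option; Syck-style keys such as :Indent are not used here.
yaml = user.to_yaml(:indentation => 4)
puts yaml

# Nested keys are indented by four spaces, and the data round-trips.
p yaml.include?("\n    name: john\n")
p YAML.load(yaml) == user
```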

Resources