Ruby: How to print ® - ruby

Save rb file with ASCII, ® can't be right displayed
Save rb file with Unicode, it would cause error
Invalid char \357' in expression
Invalid char\273' in expression
Invalid char `\277' in expression

You could also try Array#pack.
puts [174].pack('U*')
That won't require any non-ASCII characters in your source code.

You have to declare the source encoding:
# coding: utf-8
p "®"
(just add the # coding: utf-8 line in your file to declare its encoding to utf-8)

If you're displaying this in a web browser, use the ® or ® HTML entities. The browser should interpret them as the correct character.

Related

Text file encoding properly as utf-8 in Scite editor but fails to encode to uft-8 in ruby

I have a text file which if viewed in the Scite editor with the encoding set to utf-8, displays all text correctly, including capital letters with an accute accent (i.e. Á).
However, if I write a ruby script and use mystring.encode("utf-8") it will give me this error on capital letters that carry an acute accent (i.e. Á):
encode': "\x81" to UTF-8 in conversion from Windows-1252 to UTF-8 (Encoding::UndefinedConversionError)
Is this expected behaviour? How can I encode the whole text to utf-8 using ruby, knowing that otherwise it does get successfully encoded in the Scite editor?
Code:
ine_file = File.open("../../_data/ine_spain_demographics.csv", 'r')
ine_towns_population_hash = Hash.new
ine_file.each do|line|
values = line.split(";")
town_name = values[3]
population = values[4]
begin
ine_towns_population_hash[town_name.encode("utf-8")] = population
rescue
puts "problematic string: " + town_name
end
end
You say that ine_file.external_encoding says Windows-1252 so the file is being opened as a Windows-1252 encoded file. Then you say town_name.encode("utf-8") in an attempt to encoded a string as UTF-8 and Ruby complains. But the file is actually UTF-8; reading UTF-8 bytes as Windows-1252 and then trying to recode those bytes as UTF-8 isn't going to work.
You need to open the file in UTF-8 mode:
File.open("../../_data/ine_spain_demographics.csv", 'r:UTF-8')
and stop trying to change the encoding of town_name, just use town_name as-is.
It seems like it's misinterpreting the encoding of ine_spain_demographics.csv.
Looking at the doc's for encode and open you have two options:
Use replace in encode to tell Ruby what character to use town_name.encode("utf-8", replace: '').
Identify the correct file encoding and specify it: File.open("../../_data/ine_spain_demographics.csv", 'r:ISO-8859-1')

unable to convert array data to json when '¿' is there

this is my ruby code
require 'json'
a=Array.new
value="¿value"
data=value.gsub('¿','-')
a[0]=data
puts a
puts "json is"
puts jsondata=a.to_json
getting following error
C:\Ruby193>new.rb
C:/Ruby193/New.rb:3: invalid multibyte char (US-ASCII)
C:/Ruby193/New.rb:3: syntax error, unexpected tIDENTIFIER, expecting $end
value="┐value"
^
That's not a JSON problem — Ruby can't decode your source because it contains a multibyte character. By default, Ruby tries to decode files as US-ASCII, but ¿ isn't representable in US-ASCII, so it fails. The solution is to provide a magic comment as described in the documentation. Assuming your source file's encoding is UTF-8, you can tell Ruby that like so:
# encoding: UTF-8
# ...
value = "¿value"
# ...
With an editor or an IDE the soluton of icktoofay (# encoding: UTF-8 - in the first line) is perfect.
In a shell with IRB or PRY it is difficult to find a working configuration. But there is a workaround that at least worked for my encoding problem which was to enter German umlaut characters.
Workaround for PRY:
In PRY I use the edit command to edit the contents of the input buffer
as described in this pry wiki page.
This opens an external editor (you can configure which editor you want). And the editor accepts special characters that can not be entered in PRY directly.

Invalid multibyte char(UTF-8)

I am trying compile this Ruby code with option --1.9:
\# encoding: utf-8
module Modd
def cpd
#"_¦+?" mySQL
"ñ,B˜"
end
end
I used the GVim editor and compiled then got the following error:
SyntaxError: f3.rb:6: invalid multibyte char (UTF-8)
After that I used Notepad++ and changed to Encode as UTF-8 and compiled with this option:
jruby --1.9 f3.rb
then I get:
SyntaxError: f3.rb:1: \273Invalid char `\273' ('╗') in expression
I have seen this happen when the BOM gets messed up during a charset conversion (the BOM in octal is 357 273 277). If you open the file with a hexadecimal editor (:%!xxd on vi), you will more than likely see characters at the beginning of the file, before the first #.
If you recreate that file directly in utf-8, or get rid of these spurious characters, this should solve your problem.

`gsub': incompatible character encodings: UTF-8 and IBM437

I try to use search, google but with no luck.
OS: Windows XP
Ruby version 1.9.3po
Error:
`gsub': incompatible character encodings: UTF-8 and IBM437
Code:
require 'rubygems'
require 'hpricot'
require 'net/http'
source = Net::HTTP.get('host', '/' + ARGV[0] + '.asp')
doc = Hpricot(source)
doc.search("p.MsoNormal/a").each do |a|
puts a.to_plain_text
end
Program output few strings but when text is ”NOŻYCE” I am getting error above.
Could somebody help?
You could try converting your HTML to UTF-8 since it appears the original is in vintage-retro DOS format:
source.encode!('UTF-8')
That should flip it from 8-bit ASCII to UTF-8 as expected by the Hpricot parser.
The inner encoding of the source variable is UTF-8 but that is not what you want.
As tadman wrote, you must first tell Ruby that the actual characters in the string are in the IBM437 encoding. Then you can convert that string to your favourite encoding, but only if such a conversion is possible.
source.force_encoding('IBM437').encode('UTF-8')
In your case, you cannot convert your string to ISO-8859-2 because not all IBM437 characters can be converted to that charset. Sticking to UTF-8 is probably your best option.
Anyway, are you sure that that file is actually transmitted in IBM437? Maybe it is stored as such in the HTTP server but it is sent over-the-wire with another encoding. Or it may not even be exactly in IBM437, it may be CP852, also called MS-DOC Latin 2 (different from ISO Latin 2).

Why do I get an "Invalid Byte Sequence in UTF-8" error reading a text file?

I'm writing a Ruby script to process a large text file, and keep getting an odd encoding error.
Here's the situation:
input_data = File.new(in_path, 'r').read
p input_data.encoding.name # UTF-8
break_char = "\r".encode("UTF-8")
p break_char # "\r"
p break_char.encoding.name # "UTF-8"
input_data.split(",".encode("UTF-8"))
p Encoding.compatible?(input_data, break_char) # # Encoding:UTF-8>
This produces the error :in 'split': invalid byte sequence in UTF-8 (ArgumentError)
I read http://blog.grayproductions.net/articles/ruby_19s_string and looked at other solutions to apparently the same problem, but still can't work out why it's happening when I believe I am controlling the encodings.
I'm on OSX working with ruby 1.9.2
Obviously your input file is not UTF-8 (or at least, not entirely). If you don't care about non-ascii characters, you can simply assume your file is ascii-8bit encoded. BTW, your separator (break_char) is not causing problems as comma is encoded the same way in UTF-8 as in ASCII.
fname = 'test.in'
# create example file and fill it with invalid UTF-8 sequence
File.open(fname, 'w') do |f|
f.write "\xc3\x28"
end
# then try to read and parse it
s = File.open(fname) do |f| # file opened as UTF-8
#s = File.open(fname, 'r:ascii-8bit') do |f| # file opened as ascii-8bit
f.read
end
p s.split ','
I fail to get an error here on Linux even when the input file is not UTF-8. (I'm using Ruby 1.9.2, as well.)
Logically, either this problem is linked with OS-X, or it's something to do with your input data. Does it happen regardless of the data in the input file?
(I realise that this is not a proper answer, but I lack the rep to add a comment. And since no-one has responded yet, I thought it better than nothing...)
You read the file using the default encoding your system provides. So ruby tags the string as utf8, which doesn't mean it's really utf8-data. Try file <input file> to guess what kind of encoding is in there, then tell ruby it's that one (unclean: force_encoding(<encoding>), clean: tell the File object what encoding it is, I don't know how to do that) and then use encode!("utf8") to convert it to utf8.
Here are 2 common situations and how to deal with them:
Situation 1
You have an UTF-8 input-file with possibly a few invalid bytes
Remove the invalid bytes:
test = "Partly valid\xE4 UTF-8 encoding: äöüß"
File.open( 'input_file', 'w' ) {|f| f.write(test)}
str = File.read( 'input_file' )
str.scrub('')
=> "Partly valid UTF-8 encoding: äöüß"
Situation 2
You have an input-file that could be in either UTF-8 or ISO-8859-1 encoding
Check which encoding it is and convert to UTF-8 (if necessary):
test = "String in ISO-8859-1 encoding: \xE4\xF6\xFC\xDF"
File.open( 'input_file', 'w' ) {|f| f.write(test)}
str = File.read( 'input_file' )
unless str.valid_encoding?
str.encode!( 'UTF-8', 'ISO-8859-1', invalid: :replace )
end #unless
=> "String in ISO-8859-1 encoding: äöüß"
Notes
The above code snippets assume that Ruby encodes all your strings in UTF-8 by default. Even though, this is almost always the case, you can make sure of this by starting your scripts with # encoding: UTF-8.
If invalid, it is programmatically possible to detect most multi-byte encodings like UTF-8 (in Ruby, see: #valid_encoding?). However, it is NOT possible (or at least extremely hard) to programmatically detect invalidity of single-byte-encodings like ISO-8859-1. Thus the above code snippet does not work the other way around, i.e. detecting if a String is valid ISO-8859-1 encoding.
Even though UTF-8 has become increasingly popular as the default encoding in computer-systems, ISO-8859-1 and other Latin1 flavors are still very popular in the Western countries, especially in North America. Be aware that there a several single-byte encodings out there that are very similar, but slightly vary from ISO-8859-1. Examples: CP1252 (a.k.a. Windows-1252), ISO-8859-15
[ruby] [encoding] [utf8] [file-encoding] [character-encoding]
Please try this one:-
input_data = File.open("path/your_file.pdf", "rb") {|io| io.read}
Thanks

Resources