I'm trying to return russian text:
#Produces(MediaType.TEXT_PLAIN + ";charset=utf-8")
return Response.status(200).entity("Русский текст.").build();
Before I started reading standalone-full.xml, I was getting normal Russian text, but when I changed this in standalone.bat:
set "SERVER_OPTS=--server-config=standalone-full.xml"
I'm getting something like this.
This don't help me:
I have mobile automation code in Ruby with locale property files and code is using JavaProperties::Properties.new(filename with path) which is returning hash and we are reading property value by providing property name.
Recently fr_CA.properties file was updated with unicode chars, something like "Solde du dernier relev\u00E9". After the update, I'm getting value "Solde du dernier relevé" instead of "Solde du dernier relevé".
I need some help how/where to provide UTF-8 conversion type.
Quick help highly appreciated.
#filePaths={
:pathTo_some_JavaProperties => #resourcesPath+"/service_"+locale+""+platform_fileName+".properties",
:pathTo_locale_other_JavaProperties => #resourcesPath+"/MoblClient_XmlService"+locale+".properties"
// more file paths
}
begin
#someHash = JavaProperties::Properties.new(#filePaths.fetch(:pathTo_some_JavaProperties))
rescue Errno::ENOENT
filesNotFound << #filePaths.fetch(:pathTo_some_JavaProperties)
end
// Reading value as #someHash['propName'] which is giving output as "Solde du dernier relevé"
Ok, here's what I am getting:
In test.properties:
item1 = Solde du dernier relev\u00E9
Then in Ruby,
> JavaProperties.load('test.properties')[:item1]
# => "item1 Solde du dernier relevé"
You should try getting your problematic code as stripped as possible, and then see if you keep getting the error.
BTW, I think you should use JavaProperties.load, not JavaProperties.new as in your sample.
Good day,
I have a lot of big xml files that i need to parse, but problems is they have 'gb2312' encoding. I would normaly use SAX parser for this.
So here is in example of xml:
<?xml version="1.0" encoding="gb2312"?>
<Root>
<ValueList Count="112290" FieldCount="11">
<Item1 Value1="23743" Value2="Дипломатия � Пустой кувшин" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item2 Value1="6611" Value2="ДЛ � 018 омела � золотой кинжал" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item3 Value1="6608" Value2="Наука (ДЛ)�круг фей 021�тяпка" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
<Item4 Value1="6612" Value2="Знаки ДЛ � 003руны � разрушение" Value3="1" Value4="" Value5="6" Value6="0" Value7="0" Value8="0" Value9="0" Value10="0" Value11="0"/>
....
</Root>
I'm trying to use Nokogiri SAX (also tried libxml-ruby with same result) parser:
require 'nokogiri'
class SchemaParser < Nokogiri::XML::SAX::Document
def initialize
#cnt = 0
end
def start_element name, attrs =[]
if name == "Item1"
#cnt+= 1
puts #cnt
end
end
end
parser = Nokogiri::XML::SAX::Parser.new(SchemaParser.new)
parser.parse_io(File.open('2_4_EQUIPMENT_ESSENCE.xml'), 'gb2312')
But this gives error "`check_encoding': 'GB2312' is not a valid encoding (ArgumentError)". If I remove encoding declaration and let Nokogiri detect encoding himself, I will receive this error:
encoding error : input conversion failed due to input error, bytes 0xA8 0x43 0x20 0xA7
encoding error : input conversion failed due to input error, bytes 0xA8 0x43 0x20 0xA7
I/O error : encoder error
I also tried to open File with proper encoding, but that didn't help SAX parser:
[3] pry(main)> f = File.open('2_4_EQUIPMENT_ESSENCE.xml', "r:gb2312")
=> #<File:2_4_EQUIPMENT_ESSENCE.xml>
[4] pry(main)> f.external_encoding.name
=> "GB2312"
Did anyone use 'gb2312' encoding with SAX parsers in ruby? Any recommendations how to proceed?
It seems the issue is that Libxml2 does not support the GB2312 encoding (see here for a list of supported encodings).
I'm not sure if you have tried this, but I think you can work around this by removing the encoding declaration from the XML files (so Libxml2 does not try to transcode the data) and set the external encoding of the File object to GB2312, because then Ruby will transcode the file to UTF-8 as it is read, and from then on everything will remain as UTF-8.
So, here is my workaround.
Problems:
Some of characters presented in xml are not 'gb2312' encoding, I have found that 'GB18030' would be a better choice with full Chinese characters.
I converted all xml's to utf8, so i can use SAX parser.
I ended up with this rake task:
desc "convert chinese xml files to utf-8"
task :convert do
rm_rf 'data/utf8'
mkdir 'data/utf8'
Dir.foreach('data') {|f|
if f.end_with?('.xml')
puts "converted:: data/utf8/#{f}" if system("iconv -f GB18030 -t UTF-8 data/#{f} > data/utf8/#{f}")
end
}
#replace encodings for xml files
system("bundle exec ruby -pi -e \"gsub(/gb2312/, 'UTF-8')\" data/utf8/*.xml")
end
In an app I recently built for a client the following code resulted in the variable #nameText being evaluated, and then resulting in an error 'no text' (since the variable doesn't exist).
To get around this I used gsub, as per the example below. Is there a way to tell Magick not to evaluate the string at all?
require 'RMagick'
#image = Magick::Image.read( '/path/to/image.jpg' ).first
#nameText = '#SomeTwitterUser'
#text = Magick::Draw.new
#text.font_family = 'Futura'
#text.pointsize = 22
#text.font_weight = Magick::BoldWeight
# Causes error 'no text'...
# #text.annotate( #image, 0,0,200,54, #nameText )
#text.annotate( #image, 0,0,200,54, #nameText.gsub('#', '\#') )
This is the C code from RMagick that is returning the error:
// Translate & store in Draw structure
draw->info->text = InterpretImageProperties(NULL, image, StringValuePtr(text));
if (!draw->info->text)
{
rb_raise(rb_eArgError, "no text");
}
It is the call to InterpretImageProperties that is modifying the input text - but it is not Ruby, or a Ruby instance variable that it is trying to reference. The function is defined here in the Image Magick core library: http://www.imagemagick.org/api/MagickCore/property_8c_source.html#l02966
Look a bit further down, and you can see the code:
/* handle a '#' replace string from file */
if (*p == '#') {
p++;
if (*p != '-' && (IsPathAccessible(p) == MagickFalse) ) {
(void) ThrowMagickException(&image->exception,GetMagickModule(),
OptionError,"UnableToAccessPath","%s",p);
return((char *) NULL);
}
return(FileToString(p,~0,&image->exception));
}
In summary, this is a core library feature which will attempt to load text from file (named SomeTwitterUser in your case, I have confirmed this -try it!), and your work-around is probably the best you can do.
For efficiency, and minimal changes to input strings, you could rely on the selectivity of the library code and only modify the string if it starts with #:
#text.annotate( #image, 0,0,200,54, #name_string.gsub( /^#/, '\#') )
When I try to submit a textarea with Mechanize and Ruby 2.0, I always get an
Encoding::UndefinedConversionError: U+0151 from UTF-8 to ISO-8859-1
Then I tryied to convert the text with Iconv, I got a similar result:
Iconv.iconv("LATIN1", "UTF-8", text)
I get this error message:
Iconv::IllegalSequence: "őzködik, melyet "...
As the text contains east-european characters. What can I do to avoid this kind of inconveniences or how can I convert properly between different encodings?
I have found an elegant solution:
replacements = [["À", "À"], ["Á", "Á"], ["Â", "Â"], ["Ã", "Ã"], ["Ä", "Ä"], ["Å", "Å"], ["Æ", "Æ"], ["Ç", "Ç"], ["È", "È"], ["É", "É"], ["Ê", "Ê"], ["Ë", "Ë"], ["Ì", "Ì"], ["Í", "Í"], ["Î", "Î"], ["Ï", "Ï"], ["Ð", "Ð"], ["Ñ", "Ñ"], ["Ò", "Ò"], ["Ó", "Ó"], ["Ô", "Ô"], ["Õ", "Õ"], ["Ö", "Ö"], ["Ø", "Ø"], ["Ù", "Ù"], ["Ú", "Ú"], ["Û", "Û"], ["Ü", "Ü"], ["Ý", "Ý"], ["Þ", "Þ"], ["ß", "ß"], ["à", "à"], ["á", "á"], ["â", "â"], ["ã", "ã"], ["ä", "ä"], ["å", "å"], ["æ", "æ"], ["ç", "ç"], ["è", "è"], ["é", "é"], ["ê", "ê"], ["ë", "ë"], ["ì", "ì"], ["í", "í"], ["î", "î"], ["ï", "ï"], ["ð", "ð"], ["ñ", "ñ"], ["ò", "ò"], ["ó", "ó"], ["ô", "ô"], ["õ", "õ"], ["ö", "ö"], ["ø", "ø"], ["ù", "ù"], ["ú", "ú"], ["û", "û"], ["ü", "ü"], ["ý", "ý"], ["þ", "þ"], ["ÿ", "ÿ"]]
def replace(str,replacements)
replacements.each {|replacement| str.gsub!(replacement[0], replacement[1])}
return str
end
my_string=replace(my_string,replacements)