MongoDB/GridFS Java driver: using UTF-8 metadata

I am trying to use GridFS to load a file along with some metadata
using the Java driver (2.5.3).
Things work fine as long as the metadata is ASCII, but I get an
exception the moment I try to set a UTF-8 string with non-ASCII
characters.
String metaData = "学海";
GridFS gridFS = new GridFS(db);
GridFSInputFile inputFile = gridFS.createFile(new File(filePath));
DBObject dbObj = inputFile.getMetaData();
dbObj.put("metaData", metaData); // exception is thrown here if the value is non-ASCII
inputFile.save();

Are you able to use UTF8 strings when storing regular documents?
Based on your description, it sounds like you're trying to report a bug rather than ask a question.
MongoDB uses JIRA for reporting bugs. Including the code you are using will help the driver developers correct the issue.

Related

Reading Japanese query parameters from the URL

I am getting query parameter as "ã\\u0083\\u008eã\\u0083¼ã\\u0083\\u0096ã\\u0083©ã\\u0083³ã\\u0083\\u0089å\\u0093\\u0081" in my controller for Japanese character "ノーブランド品".
Is there a way to translate all query parameters into UTF-8?
I have tried multiple solutions, but none of them seems to work.
The first solution I tried is
URLDecoder.decode(string, "UTF-8");
Another solution I tried is
ByteBuffer buffer = StandardCharsets.UTF_8.encode(encodedName);
decodedName = StandardCharsets.UTF_8.decode(buffer).toString();
Is there a way to decode the string back to Japanese once it has been garbled like this? I am asking because the page that calls us is not owned by us.
Thanks
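The garbled value has the shape one would expect if UTF-8 bytes were decoded as ISO-8859-1 somewhere before the controller sees them: each Japanese character shows up as three Latin-1 characters. Assuming that is what happened here (the question does not confirm it), neither attempt above can help: URLDecoder.decode finds no percent escapes left to decode, and encoding to UTF-8 and decoding straight back returns the same string. A minimal sketch of reversing the wrong decoding step follows; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any framework.
final class MojibakeRepair {

    // Reproduces the assumed fault: UTF-8 bytes read as ISO-8859-1.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps every char U+0000..U+00FF back to the
    // same byte value, so the original UTF-8 bytes are recovered intact.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Japanese string from the question, written with Unicode escapes.
        String original = "\u30ce\u30fc\u30d6\u30e9\u30f3\u30c9\u54c1";
        String garbled = garble(original);
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

If the framework or servlet container can be configured to decode request parameters as UTF-8 in the first place, that is usually preferable to repairing strings afterwards.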

Firestore will not save words with accents?

I'm trying to move data to Firestore from a MySQL table encoded as UTF-8 (specifically, utf8mb4_unicode_520_ci). I'm using Golang's Firestore libraries along with sqlx. Most, if not all, words that contain accented characters fail, e.g., müller, évident, etc. The error returned is as follows:
rpc error: code = Internal desc = grpc: error while marshaling: proto:
field "google.firestore.v1.Value.ValueType" contains invalid UTF-8
I can enter the accented characters into Firestore manually using the browser-based interface, so I'm guessing the issue lies with the Golang library. Is there any workaround that would preserve the accented characters?
The solution to my issue was unrelated to Firestore and the libraries I was using; the problem was in a word-tokenization function I had written. The tokenization was mangling accented characters into bad UTF-8, so converting the strings to runes before tokenization solved the issue.
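The original code was Go and is not shown, so the following Java sketch only illustrates the general failure mode, under the assumption that the tokenizer cut strings at byte offsets: ü and é take two bytes each in UTF-8, and a cut between those bytes leaves a fragment that a strict decoder rejects. Cutting at code point boundaries (the Java analogue of working with runes) keeps every fragment valid. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class.
final class Utf8SplitDemo {

    // Strict check: any malformed byte sequence makes the decoder throw.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Byte-oriented cut: may stop in the middle of a multi-byte character.
    static byte[] firstBytes(String s, int n) {
        return Arrays.copyOfRange(s.getBytes(StandardCharsets.UTF_8), 0, n);
    }

    // Code-point-oriented cut: always stops on a character boundary.
    static String firstCodePoints(String s, int n) {
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        String word = "m\u00fcller"; // "mueller" spelled with u-umlaut
        // Two bytes: 'm' plus only the lead byte of the umlaut.
        System.out.println(isValidUtf8(firstBytes(word, 2))); // false
        byte[] safe = firstCodePoints(word, 2).getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(safe)); // true
    }
}
```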

Read a CSV file with special characters in Ruby and store into SQL Server

I'm trying to import a CSV file (UTF-8 encoding) in Ruby (2.0.0) into my database (MSSQL 2008R2, COLLATION French_CI_AS), but the special characters (French accents on vowels) are not stored properly: éèçôü becomes éèçôü (or other similar gibberish).
I use this piece of code to read the file :
CSV.foreach(file, col_sep: ';', encoding: "utf-8") do |row|
# ...
end
I tried various encodings in the CSV options (utf-8, iso-8859-1, windows-1252), but none of them stored the special characters correctly.
Before you ask, my database collation supports those characters, since we have successfully imported data containing those using PHP importers. If I dump the data using puts or a file logger, everything is correct.
Is something wrong with my code, or do I need to specify something else (the encoding of the Ruby source file, for example)?
Thanks
EDIT : The data saving is done by a PHP REST API that works fine with accented characters. It stores data as it is received.
In Ruby, I parse my data, store it in an object and then send the JSON-encoded object in the body of my PUT request. But if I use an SQL query directly from Ruby, the problem remains :
query = <<-SQL
UPDATE MyTable SET MyTable_title = '#{row_data['title']}' WHERE MyTable_id = '#{row_data['id']}'
SQL
res = db.execute query
I suspected that this had something to do with the encoding of your CSV file, so I started digging into that. I found that windows-1252 encoding will insert control characters.
You can read more about it here: Converting special charactes such as ü and à back to their original, latin alphbet counterparts in C#

Failed to compare UTF-8 characters in Ruby

I'm using Ruby with Cucumber for automation.
I'm trying to pass Japanese characters as a parameter to a user-defined function that verifies them against the database.
Below is the statement I used:
x=$objDB.run_select_query_verifyText('select name from xxxx where id=1','ごせり槎ゃぱ')
The run_select_query_verifyText() function connects to the database, fetches the records, and verifies the text passed as a parameter (the Japanese characters).
The function returns true if the string matches the table data in the database, and false otherwise.
But I always get false, and I found that the Japanese string is converted to "??????" when the data is compared.
Note: My program works fine with English characters.
Your problem is most likely with character encodings. The database returns the content in a different encoding than the Ruby string you are working with. You need to figure out what the database encoding is and make sure both are the same.
If you are using Ruby 1.9, you can check the current encoding with yourstring.encoding and change it to e.g. UTF-8 with yourstring.encode("UTF-8").
If you are on Ruby 1.8, things are a bit trickier, as the String class doesn't natively support encodings. You can use e.g. the character-encodings gem to work around this.
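A string of question marks is what typically appears when text is encoded into a charset that cannot represent it, with each unmappable character replaced by ?. The question is about Ruby and does not show where the conversion happens, so this Java sketch only demonstrates the mechanism; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class LossyEncodingDemo {

    // Encodes text into the given charset and decodes it again, as happens
    // when a connection or column uses a narrower charset than the caller.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String japanese = "\u3054\u305b\u308a"; // three hiragana characters
        // US-ASCII cannot represent them, so each one becomes '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.US_ASCII)); // ???
        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true
    }
}
```

Once the characters have been replaced this way, the information is gone, which is why the comparison can never succeed until both sides use an encoding that covers Japanese.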

String not valid UTF-8 (BSON::InvalidStringEncoding) when saving a UTF8 compatible string to MongoDB through Mongoid ORM

I am importing data from a MySQL table into MongoDB using Mongoid for my ORM. I am getting an error when trying to save an email address as a string. The error is:
/Library/Ruby/Gems/1.8/gems/bson-1.2.4/lib/../lib/bson/bson_c.rb:24:in `serialize': String not valid UTF-8 (BSON::InvalidStringEncoding)
from /Library/Ruby/Gems/1.8/gems/bson-1.2.4/lib/../lib/bson/bson_c.rb:24:in `serialize'
From my GUI - this is a screenshot of the table info. You can see it's encoded in UTF8.
Also from my GUI - this is a screen shot of the field in my MySQL table that I am importing
This is what happens when I grab the data from MySQL CLI.
And finally, when I inspect the data in my ruby object, I get something that looks like this:
I'm a bit confused here because my table is in UTF-8 regardless, and that funky character is apparently a valid two-byte UTF-8 character. Does anyone know why I'm getting this error?
Try using this helper:
http://snippets.dzone.com/posts/show/4527
It adds a utf8? method to String, so you can grab the string from MySQL and check whether it is UTF-8:
my_string.utf8?
If it is not, you can try changing the encoding of your string using other methods such as:
my_string.asciify_utf8
my_string.latin1_to_utf8
my_string.cp1252_to_utf8
my_string.utf16le_to_utf8
Maybe this string is stored in MySQL in one of these encodings.
