json parser error unexpected token - ruby
I am getting a json response array as below.
"[{\"id\":\"23886\",\"item_type\":2,\"name\":\"Equalizer\",\"label\":null,\"desc\":null,\"genre\":null,\"show_name\":null,\"img\":\"http:\\/\\/httpg3.scdn.arkena.com\\/10242\\/v2_images\\/tf1\\/0\\/tf1_media_ingest95290_image\\/tf1_media_ingest95290_image_0_208x277.jpg\",\"url\":\"\\/films\\/media-23886-Equalizer.html\",\"duration\":\"2h27mn\",\"durationtime\":\"8865\",\"audio_languages\":null,\"prod\":null,\"year\":null,\"vf\":\"1\",\"vost\":\"1\",\"sd\":true,\"hd\":false,\"sdprice\":\"4.99\",\"hdprice\":null,\"sdfile\":null,\"hdfile\":null,\"sdbundle\":\"12771\",\"hdbundle\":\"12771\",\"teaser\":\"23887\",\"att_getter\":\"Tout le monde a le droit \\u00e0 la justice\",\"orig_prod\":null,\"director\":null,\"actors\":null,\"csa\":\"CSA_6\",\"season\":null,\"episode\":null,\"typeid\":\"1\",\"isfav\":false,\"viewersrating\":\"4.0\",\"criticsrating\":\"3.0\",\"onThisPf\":1},{\"id\":\"23998\",\"item_type\":2,\"name\":\"Le Labyrinthe\",\"label\":null,\"desc\":null,\"genre\":null,\"show_name\":null,\"img\":\"http:\\/\\/httpg3.scdn.arkena.com\\/10242\\/v2_images\\/tf1\\/1\\/tf1_media_ingest94727_image\\/tf1_media_ingest94727_image_1_208x277.jpg\",\"url\":\"\\/films\\/media-23998-Le_Labyrinthe.html\",\"duration\":\"1h48mn\",\"durationtime\":\"6533\",\"audio_languages\":null,\"prod\":null,\"year\":null,\"vf\":\"1\",\"vost\":\"1\",\"sd\":true,\"hd\":false,\"sdprice\":\"4.99\",\"hdprice\":null,\"sdfile\":null,\"hdfile\":null,\"sdbundle\":\"12699\",\"hdbundle\":\"12699\",\"teaser\":\"23999\",\"att_getter\":\"Saurez-vous r\\u00e9chapper du labyrinthe ?\",\"orig_prod\":null,\"director\":null,\"actors\":null,\"csa\":\"CSA_1\",\"season\":null,\"episode\":null,\"typeid\":\"1\",\"isfav\":false,\"viewersrating\":\"3.5\",\"criticsrating\":\"4.0\",\"onThisPf\":1},{\"id\":\"23688\",\"item_type\":2,\"name\":\"Gone 
Girl\",\"label\":null,\"desc\":null,\"genre\":null,\"show_name\":null,\"img\":\"http:\\/\\/httpg3.scdn.arkena.com\\/10242\\/v2_images\\/tf1\\/0\\/tf1_media_ingest92895_image\\/tf1_media_ingest92895_image_0_208x277.jpg\",\"url\":\"\\/films\\/media-23688-Gone_Girl.html\",\"duration\":\"2h22mn\",\"durationtime\":\"8579\",\"audio_languages\":null,\"prod\":null,\"year\":null,\"vf\":\"1\",\"vost\":\"1\",\"sd\":true,\"hd\":false,\"sdprice\":\"4.99\",\"hdprice\":null,\"sdfile\":null,\"hdfile\":null,\"sdbundle\":\"12507\",\"hdbundle\":\"12507\",\"teaser\":\"23689\",\"att_getter\":\"Il ne faut pas se fier aux apparences...\",\"orig_prod\":null,\"director\":null,\"actors\":null,\"csa\":\"CSA_2\",\"season\":null,\"episode\":null,\"typeid\":\"1\",\"isfav\":false,\"viewersrating\":\"4.0\",\"criticsrating\":\"4.5\",\"onThisPf\":1}]"
When I try to parse it, I get an "unexpected token" ParserError, which I believed was due to the quotes at the beginning and end of the response.
I was wrong to say that the parser error was due to the quotes at the beginning and end of the response, but I am not sure why it happens. When I try to parse the JSON response array, it does throw the error.
Any idea whether there is anything wrong with the JSON response array?
I tried to parse it as below, but it throws a parser error:
JSON.parse(File.read('demo')). The demo file contains the json
response which I pasted.
First of all, the JSON you posted is a Ruby String literal, and Ruby parses it as JSON without error. However, if you paste that string into a file, it will not be valid JSON because of the escape sequences, the most numerous of which is \".
In a double-quoted Ruby string literal, the sequence \", which is two characters long, is converted to one character; in a file that same sequence stays two characters long: a \ and a ". In other words, escape sequences that are legal inside a Ruby String literal do not represent the same thing when pasted into a file.
Another example: in a Ruby String the escape sequence \u20AC is a single character, the Euro sign. However, if you paste that sequence into a file, it will be six characters long: a \, a u, a 2, a 0, an A, and a C.
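To make the difference concrete, here is a minimal sketch. The helper name parses_as_json? is made up for illustration, and the single-quoted literal stands in for the text as it would sit in a file, where the backslashes are real characters:

```ruby
require 'json'

# Hypothetical helper: true if the text parses as JSON, false otherwise.
def parses_as_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Double-quoted literal: each \" collapses to a single " character.
as_ruby_string = "[{\"id\":\"23886\"}]"

# Single-quoted literal: the backslashes are kept, as in a pasted file.
as_pasted_text = '[{\"id\":\"23886\"}]'

puts parses_as_json?(as_ruby_string)
puts parses_as_json?(as_pasted_text)
puts as_pasted_text.length - as_ruby_string.length
```

The last line prints how many extra characters the pasted form carries: one backslash per escaped quote.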
Response to comment:
There is an invisible byte order mark (BOM) at the start of the json, which you can see by executing:
p resp
...which produces the output:
\xEF\xBB\xBF[{\"id\":\"2388\" .....
The UTF-8 representation of the BOM is the byte sequence
0xEF,0xBB,0xBF
Byte order has no meaning in UTF-8, so its only use in UTF-8 is to
signal at the start that the text stream is encoded in UTF-8.
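You can skip the first 3 bytes like this (the index counts characters, so this relies on the string not being tagged as UTF-8, where the BOM would count as a single character):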
resp[3..-1]
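If you would rather not depend on how the string happens to be tagged, a byte-level check is one option. This is only a sketch: strip_utf8_bom and UTF8_BOM are names I made up, and it assumes the payload after the BOM really is UTF-8.

```ruby
require 'json'

# The three bytes of the UTF-8 BOM, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: drop a leading UTF-8 BOM if present,
# then tag the remaining bytes as UTF-8.
def strip_utf8_bom(data)
  bytes = data.b                                   # binary copy of the input
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)
  bytes.force_encoding('UTF-8')
end

resp = "\xEF\xBB\xBF[{\"id\":\"23886\"}]"
p JSON.parse(strip_utf8_bom(resp))
```

When reading from a file, Ruby's open modes also accept a BOM-aware external encoding, for example File.read('demo', mode: 'r:BOM|UTF-8'), which skips the BOM while reading.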
I had this error when reading in JSON files, and it turned out that JSON.parse somehow did not like my UTF-8-encoded files. When I first re-encoded the files to ISO 8859-1 (which is a superset of ASCII rather than ASCII itself), everything went fine.
Try this. It works.
require 'json'
my_obj = JSON.parse("your json string", :symbolize_names => true)
Related
Escaping %E1 in ruby
I'm parsing some sports data, including names like 'Olaz%E1bal' and '%C1lvaro Morata' from an external feed (read: I can't change it). I want to decode these strings, but I can't figure out how. Here's what I've tried:
URI.unescape: Expected: "Olazábal" Actual: "Olaz\xE1bal"
CGI::unescape: Expected: "Olazábal" Actual: "Olaz\xE1bal"
CGI::unescape_html: Expected: "Olazábal" Actual: "Olaz%E1bal"
HTMLEntities.decode: Expected: "Olazábal" Actual: "Olaz%E1bal"
Did you check the string encoding? \xE1 is the latin1 representation of á, so it would be invalid in a utf-8 encoded string. Try to enforce a latin1 encoding by calling .force_encoding('ISO-8859-1') on the string. Also mind that it is common to use UTF-8 in URLs as well, e.g. one would encode á as %C3%A1.
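A rough sketch of that suggestion, assuming the feed really is percent-encoded Latin-1. The helper name decode_latin1_percent is mine, and I use URI.decode_www_form_component from the standard library for the unescaping step (note that it also turns + into a space):

```ruby
require 'uri'

# Hypothetical helper: percent-decode to raw bytes, declare those bytes
# to be ISO-8859-1, then transcode them to UTF-8.
def decode_latin1_percent(str)
  raw = URI.decode_www_form_component(str, Encoding::BINARY)
  raw.force_encoding('ISO-8859-1').encode('UTF-8')
end

puts decode_latin1_percent('Olaz%E1bal')
puts decode_latin1_percent('%C1lvaro Morata')
```

If the feed ever switches to UTF-8 percent-encoding (%C3%A1), this would produce mojibake, so it is worth confirming the encoding with the feed's owner.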
CSV writes Ñ into its code form instead of its actual form
I have a CSV file. I checked its encoding using this:
File.open('C:\path\to\file\myfile.txt').read.encoding
and it returned the encoding as:
=> #<Encoding:IBM437>
I'm reading this CSV per row, stripping spaces and doing other stuff. After "cleansing" it, I push it to a new file. I'm doing it like this:
CSV.foreach(file_read, encoding: "IBM437:UTF-8") do |r|
  # some code
  CSV.open(file_appended, "a", col_sep: "|") do |csv|
    csv << r
  end
end
Now my problem is that inside the CSV I'm reading, there's a word with an accented character, Ñ to be exact. This character is being appended to the new file as \u2564. It's a problem because the accented character is a vital part of that word, and I want that character to appear in the new file as-is. I tried the following source:destination encodings, but to no avail:
ISO-8859-1:UTF8 (and vice versa)
ISO-8859-1:Windows-1252 (and vice versa)
Am I missing something? Here is my Ruby version, in case you need to know:
ruby 1.9.3p392 (2013-02-22) [i386-mingw32]
Thanks in advance!
The line below solved my problem: Encoding.default_external = "iso-8859-1" It tells Ruby that the file being read is encoded in ISO-8859-1, and therefore correctly interprets the Ñ character. Credit goes to Darshan Computing's answer here. Just look for Update #2.
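For anyone wondering why \u2564 showed up: the byte 0xD1 is Ñ in ISO-8859-1 but a box-drawing character in IBM437. A small sketch of that, where the helper name decode_byte_as is made up and the single byte stands in for the character in the file:

```ruby
# Hypothetical helper: interpret raw bytes under a given source encoding
# and transcode the result to UTF-8.
def decode_byte_as(bytes, source_encoding)
  bytes.b.force_encoding(source_encoding).encode('UTF-8')
end

raw = "\xD1".b   # the single byte as it sits in the file

# Read as IBM437, the byte becomes U+2564 (a box-drawing character).
p decode_byte_as(raw, 'IBM437')

# Read as ISO-8859-1, the same byte becomes U+00D1 (Ñ).
p decode_byte_as(raw, 'ISO-8859-1')
```

So the file was most likely ISO-8859-1 all along, and IBM437 was only Ruby's default external encoding on that Windows console, not a property of the file.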
Converting gsub() pattern from ruby 1.8 to 2.0
I have a Ruby program that I'm trying to upgrade from Ruby 1.8 to Ruby 2.0.0-p247. This works just fine in 1.8.7:
begin
  ARGF.each do |line|
    # a collection of peculiarities, appended as they appear in data
    line.gsub!("\x92", "'")
    line.gsub!("\x96", "-")
    puts line
  end
rescue => e
  $stderr << "exception on line #{$.}:\n"
  $stderr << "#{e.message}:\n"
  $stderr << #line
end
But under Ruby 2.0, this results in an exception when it encounters the 96 or 92 byte in a data file that otherwise appears to contain only ASCII:
invalid byte sequence in UTF-8
I have tried all manner of things: double backslashes, using a regex object instead of the string, force_encoding(), etc. and am stumped. Can anybody fill in the missing puzzle piece for me? Thanks.
=============== additions: 2013-09-25 ============
Changing \x92 to \u2019 did not fix the problem. The program does not error until it actually hits a 92 or 96 in the input file, so I'm confused as to how the character pattern in the string is the problem when there are hundreds of thousands of lines of input data that are matched against the patterns without incident.
It's not the regex that's throwing the exception, it's the Ruby compiler. \x92 and \x96 are how you would represent ’ and – in the windows-1252 encoding, but Ruby expects the string to be UTF-8 encoded. You need to get out of the habit of putting raw byte values like \x92 in your string literals. Non-ASCII characters should be specified by Unicode escape sequences (in this case, \u2019 and \u2013). It's a Unicode world now, stop thinking of text in terms of bytes and think in terms of characters instead.
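Since the asker reports that the error only appears once a 92 or 96 byte shows up in the input, it may also help to tell Ruby what encoding the input is in before substituting. A minimal sketch, assuming the data is Windows-1252; normalize_cp1252 is a name I made up:

```ruby
# Hypothetical helper: treat the raw line as Windows-1252, transcode it to
# UTF-8, then replace the typographic characters with ASCII ones.
def normalize_cp1252(raw_line)
  line = raw_line.b.force_encoding('Windows-1252').encode('UTF-8')
  line.gsub("\u2019", "'")   # right single quotation mark (byte 0x92)
      .gsub("\u2013", "-")   # en dash (byte 0x96)
end

puts normalize_cp1252("it\x92s pages 1\x962".b)
```

In the original loop this would be puts normalize_cp1252(line) in place of the two gsub! calls. Bytes that Windows-1252 leaves undefined may raise during encode, so real data might need the invalid:/undef: options.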
Incompatible character encodings error
I'm trying to run a Ruby script which generates translated HTML files from a JSON file. However, I get this error:
incompatible character encodings: UTF-8 and CP850
translation_hash = JSON.parse(File.read('translation_master.json').force_encoding("ISO-8859-1").encode("utf-8", replace: nil))
It seems to get stuck on this line of the JSON:
"3": "Klassisch geschnittene Anzüge",
because there is a special character "ü". The JSON file's encoding is ANSI. Any ideas what could be wrong?
Try adding # encoding: UTF-8 to the top of the Ruby file. This magic comment tells Ruby to treat the source file itself, including the string literals in it, as UTF-8. If this doesn't work, try to find out what kind of encoding the text uses and change the line accordingly.
IMHO your code should work if the encoding of the JSON file is "ISO-8859-1" and if it is a valid JSON file. So you should first verify that "ISO-8859-1" is the correct encoding and, while you are at it, that the file is valid JSON.
# read the file with the encoding you assume is correct
json_or_not = File.read('translation_master.json').force_encoding("ISO-8859-1")
# print the result and check whether anything looks odd
puts json_or_not
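Both checks can be rolled into one step. This is just a sketch under the same assumption (the bytes are ISO-8859-1); parse_latin1_json is a hypothetical helper name, and it takes the raw bytes so it can be tried without the file:

```ruby
require 'json'

# Hypothetical helper: interpret raw bytes as ISO-8859-1, transcode to
# UTF-8, and parse. Returns nil (with a warning) if the text is not JSON.
def parse_latin1_json(bytes)
  text = bytes.b.force_encoding('ISO-8859-1').encode('UTF-8')
  JSON.parse(text)
rescue JSON::ParserError => e
  warn "not valid JSON: #{e.message}"
  nil
end

# \xFC is the ISO-8859-1 byte for "ü".
sample = "{\"3\": \"Klassisch geschnittene Anz\xFCge\"}".b
p parse_latin1_json(sample)
```

With the real file, the bytes could come from File.binread('translation_master.json').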
How do you use unicode characters within a regular expression in Ruby?
I am attempting to write a line of code that will take a line of Japanese text and delete a certain set of characters. However, I am having trouble using Unicode characters inside the regular expression. I am currently using
text.gsub(/《.*?》/u, '')
but I get the error
'gsub': invalid byte sequence in Windows-31J (Argument error)
Can anyone tell me what I am doing incorrectly?
Example text: その仕草《しぐさ》があまりに無造作《むぞうさ》だったので
Expected result: その仕草があまりに無造作だったので
Thanks
edit: # encoding: utf-8 is present at the top of the script.
Try this: text.encode('utf-8', 'utf-8').gsub(/《.*?》/u, '')
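The error message suggests the string is tagged as Windows-31J even though its bytes are UTF-8. If that is the case (an assumption on my part), relabeling the string before the substitution is another way to write it. strip_ruby_annotations is a made-up helper name:

```ruby
# encoding: utf-8

# Hypothetical helper: relabel the bytes as UTF-8 (no transcoding happens),
# then remove everything between 《 and 》, including the brackets.
def strip_ruby_annotations(text)
  utf8 = text.dup.force_encoding('UTF-8')
  utf8.gsub(/《.*?》/, '')
end

# Simulate the mislabeled input: UTF-8 bytes tagged as Windows-31J.
mislabeled = 'その仕草《しぐさ》があまりに無造作《むぞうさ》だったので'.dup.force_encoding('Windows-31J')
puts strip_ruby_annotations(mislabeled)
```

If the text really were Windows-31J bytes, text.encode('UTF-8', 'Windows-31J') would be the step to use instead of force_encoding.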