UTF-8 i18n file - internationalization

I'm trying to add a Chinese localisation to a scaffolded Yesod site. I have a zh.msg message file saved as UTF-8 format using Notepad in Windows, but when I run cabal install in the project directory, I get this:
Handler\Home.hs:15:11:
Not in scope: data constructor `MsgHello'
Perhaps you meant `Msg<stderr>: hPutChar: invalid argument (invalid character)
The line in question is where I render my homepage:
$(widgetFile "homepage")
I changed both message files to be Unicode formatted instead of UTF-8, and get this message instead:
Foundation.hs:1:1:
Exception when trying to run compile-time code:
Cannot decode byte '\xff': Data.Text.Encoding.Fusion.streamUtf8: invalid UTF-8 stream
So I guess UTF-8 is the way to go... somehow.
(I'm using Notepad because I haven't set up gVim to render Unicode characters. It's apparently a bit of a feat.)

When I went to commit my changes I discovered the issue. The diff for my English file looked like this:
-Hello: Hello
+<U+FEFF>Hello: Hello
I guess notepad added the character in, and it was working its way into the Haskell code. I solved it using vim according to this answer.

Related

Encoding problem if I put my code in module or other ps1 file

My code was working well with special chars. I could use Write-Host "é" without any issue.
And then I moved some of my functions to an other PS1 file that I "dot sourced" (using Import-Module does the same), and I got encoding errors : prénom became prénom
I don't understand anything about encoding. VS Code doesn't allow me to change the encoding of a file. It has a parameter to set the default encoding but its defaulted on UTF8 and when I set Windows1252 it changes nothing. If I use Geany to update the encoding to Windows1252 it works... until I save the file again with VS Code.
Everything was working well when all my code was in the same file. Why would creating this second .ps1 file (which I created from the Windows Explorer) be a problem?
Working on Windows 10, in french, with VS Code 1.50.
Thank you in advance

How do I properly unzip a zip with Chinese character that from Windows in OSX?

One day I just zipped a file with Chinese character called 周國賢 - 密封罩.flac, to a zip, using bandizip & designated encoding to utf-8.
And then I try to unzip it in my MacbookPro, which is (probably) using Macintosh as encoding. The file unzipped is called ©P∞ÍΩ - ±K´ ∏n.flac, which does not match the above Chinese name.
So, I try to test about the encoding, and found that Macintosh->big5 would return the Macintosh mysterious symbol into Cantonese, but have some unmatching characters: 周衰�璀� - 密封罩.flac.
I have tried another file: §˝µ· - ¨ı®ß.ape: and it actually output the correct name of the file: 王菲 - 紅豆.ape
So, here is my question: how do I unzip a file that with big5 chinese character properly and without any information loss? Or how do I zip a file correctly to prevent information loss/ incorrect characters? (edit #2: you can use bandizip to zip the file into utf-8 encoding)
BTW, The encoding converter I am using is https://r12a.github.io/apps/encodings/, which could be quite helpful for you to check for encoding. Don't forget to click change encodings shown. And I am not the owner of the encoding converter.
edit #1: I have found that the setting in bandizip is wrong...well sorry for the inconvenience caused. Nonetheless, I figure out that The Unarchiver in Mac Apple Store can unzip big5 correctly. This can be a workaround, but still I don't know how to unzip big5 characters properly WITHOUT any loss.

How do I get quicklisp to load rfc2388 in slime?

I'm trying to load hunchentoot via quicklisp in slime, and getting the following error:
READ error during COMPILE-FILE:
:ASCII stream decoding error on
#<SB-SYS:FD-STREAM
for "file [redacted]/dists/quicklisp/software/rfc2388-20120107-http/rfc2388.asd"
{100607B723}>:
the octet sequence #(196) cannot be decoded.
(in form starting at line: 29, column: 29,
file-position: 1615)
[Condition of type ASDF:LOAD-SYSTEM-DEFINITION-ERROR]
I get this when trying to run either:
(ql:quickload "hunchentoot")
Or simply:
(ql:quickload "rfc2388")
It seems that others are getting this too. I found one hint at a possible answer, saying:
The system file is encoded as UTF-8.
I'm not sure how to configure things so that SBCL on Windows starts with
UTF-8 as its default encoding for loading sources, but that's what you
need to do.
From there, I've tried (based on e.g. [this] adding the following to my emacs config:
(set-language-environment "UTF-8")
(setq slime-lisp-implementations
'((sbcl ("/opt/local/bin/sbcl") :coding-system utf-8-unix)))
(setq slime-net-coding-system 'utf-8-unix)
But... I still get the same error, even after completely re-starting emacs, to make sure I had a fresh Slime that was reading the above config.
So, what am I missing, and/or otherwise how can I get this to load?
Thanks in advance! (More thanks to come for a successful answer. ;)
Have you checked your locale settings? Emacs configuration only tells it what coding systems to set for communication between SLIME and SWANK.
You can check for locale settings with /usr/bin/locale, for example:
navi ~ » locale
LANG=pl_PL.UTF-8
LC_CTYPE=pl_PL.UTF-8
LC_NUMERIC=pl_PL.UTF-8
LC_TIME=pl_PL.UTF-8
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY=pl_PL.UTF-8
LC_MESSAGES=C
LC_PAPER=pl_PL.UTF-8
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT=pl_PL.UTF-8
LC_IDENTIFICATION=pl_PL.UTF-8
LC_ALL=
navi ~ »
Mine is setup for UTF-8 everywhere, as you can see, except for displaying 'C' messages.
Try this:
change into the .../quicklisp/dists/quicklisp/software/rfc2388* directory and load rfc2388.asd into a text editor.
Move down to the :author parameter of the defsystem form. Replace the author's name by the name given at the top of the file.
Store file using ASCII encoding.
Of course, when a new version of the library is published, the workaround gets lost. Or else store the modified project in local-projects.
With the original UTF-8 encoding still in effect, the DEBUGGER should present an INPUT-REPLACEMENT option to replace offending input characters by a replacement string. Choose that option, type "?" or "x" or any string you like at the prompt and then ENTER. The load then completes. Of course, that is not something you would like to do every time.
So the best idea is probably to send an email to the author and ask to provide an ascii version for quicklisp.
There should be a .cache directory in your HOME that contains all the fasl files. Sometimes removing those old fasl files seems to work for me when something goes wrong with compilation.

Ruby, unable to delete file on windows - I suspect encoding problem

Similar to this question - now I can create file "Austra Skujytė.txt" but I am unable to delete it. I suspect that it is caused by ė as other files with fancy characters are also affected. AFAIK there is no way to specify encoding like in file opening:
out=File.open("#{file}", "a:UTF-8")
How can I fix it?
To delete the file, try using the short 8.3 filename; e.g.,
File.delete("AUSTRA~1.TXT")
You can convert a long filename to the short format using FFI:
https://github.com/ffi/ffi/wiki/Windows-Examples#wiki-intermediate
It's a bit hacky, but it may be what you need.

How can I modify .xfdl files? (Update #1)

The .XFDL file extension identifies XFDL Formatted Document files. These belong to the XML-based document and template formatting standard. This format is exactly like the XML file format however, contains a level of encryption for use in secure communications.
I know how to view XFDL files using a file viewer I found here. I can also modify and save these files by doing File:Save/Save As. I'd like, however, to modify these files on the fly. Any suggestions? Is this even possible?
Update #1: I have now successfully decoded and unziped a .xfdl into an XML file which I can then edit. Now, I am looking for a way to re-encode the modified XML file back into base64-gzip (using Ruby or the command line)
If the encoding is base64 then this is the solution I've stumbled upon on the web:
"Decoding XDFL files saved with 'encoding=base64'.
Files saved with:
application/vnd.xfdl;content-encoding="base64-gzip"
are simple base64-encoded gzip files. They can be easily restored to XML by first decoding and then unzipping them. This can be done as follows on Ubuntu:
sudo apt-get install uudeview
uudeview -i yourform.xfdl
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xfdl
The first command will install uudeview, a package that can decode base64, among others. You can skip this step once it is installed.
Assuming your form is saved as 'yourform.xfdl', the uudeview command will decode the contents as 'UNKNOWN.001', since the xfdl file doesn't contain a file name. The '-i' option makes uudeview uninteractive, remove that option for more control.
The last command gunzips the decoded file into a file named 'yourform-unpacked.xfdl'.
Another possible solution - here
Side Note: Block quoted < code > doesn't work for long strings of code
The only answer I can think of right now is - read the manual for uudeview.
As much as I would like to help you, I am not an expert in this area, so you'll have to wait for someone more knowledgable to come down here and help you.
Meanwhile I can give you links to some documents that might help you:
UUDeview Home Page
Using XDFLengine
Gettting started with the XDFL Engine
Sorry if this doesn't help you.
You don't have to get out of Ruby to do this, can use the Base64 module in Ruby to encode the document like this:
irb(main):005:0> require 'base64'
=> true
irb(main):007:0> Base64.encode64("Hello World")
=> "SGVsbG8gV29ybGQ=\n"
irb(main):008:0> Base64.decode64("SGVsbG8gV29ybGQ=\n")
=> "Hello World"
And you can call gzip/gunzip using Kernel#system:
system("gzip foo.something")
system("gunzip foo.something.gz")

Resources