rubyzip and unicode characters in filenames - ruby

I am creating a zip archive with the rubyzip gem and the Zip::ZipOutputStream class, and I have a problem with Unicode (Cyrillic) letters: in the archive they appear as question marks, like ????? ???? ??.doc. Does rubyzip support Unicode?

I looked at the rubyzip methods, and it doesn't seem that rubyzip lets you change the filename encoding. It probably uses your computer's default code page. You could use Chilkat Zip instead, as in this example, unless you have specific requirements that Chilkat cannot address.

You can use the following snippet to convert UTF-8 to CP437, which covers only a few non-ASCII characters. Windows 7 and older assume that filenames are encoded in CP437.
# first normalize the string (mb_chars requires ActiveSupport;
# plain Ruby 2.2+ offers String#unicode_normalize instead)
normalized_filename = input.mb_chars.normalize.to_s
# then encode in cp437
filename_for_zip = normalized_filename.encode("cp437")
# add file to zip
zipfile.add(filename_for_zip, pdf_file)
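A possible refinement of the snippet above, as a sketch only: CP437 has no Cyrillic letters, so a plain encode("cp437") raises Encoding::UndefinedConversionError for names like the one in the question. The helper to_cp437 below is hypothetical (it is not part of rubyzip) and uses the standard String#encode options to substitute a placeholder character instead of raising.

```ruby
# Hypothetical helper, not part of rubyzip: convert a UTF-8 filename to CP437,
# replacing characters that CP437 cannot represent instead of raising.
def to_cp437(name, replacement: "_")
  name.encode("CP437", "UTF-8",
              invalid: :replace, undef: :replace, replace: replacement)
end

# Accented Latin letters survive; Cyrillic letters become the placeholder.
p to_cp437("r\u00E9sum\u00E9.doc").bytes
p to_cp437("\u041F\u0440\u0438\u0432\u0435\u0442.doc").bytes
```

The placeholder loses information, so this only keeps the archive readable; it does not preserve the original name.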

You can also just run zip directly.
`cd yourfolder; zip archivename file1 file2`
Note the backticks: Ruby runs the enclosed command in a shell. This worked for me on Ubuntu with Cyrillic file names, while rubyzip was generating an archive with unreadable file names.

Related

How do I properly unzip a zip from Windows with Chinese characters on OS X?

One day I zipped a file with a Chinese name, 周國賢 - 密封罩.flac, using Bandizip with the encoding set to UTF-8.
Then I tried to unzip it on my MacBook Pro, which is (probably) using Macintosh as its encoding. The unzipped file is called ©P∞ÍΩ - ±K´ ∏n.flac, which does not match the Chinese name above.
So I tested the encoding and found that converting Macintosh->big5 turns the mysterious Macintosh symbols back into Chinese, but with some mismatched characters: 周衰�璀� - 密封罩.flac.
I tried another file, §˝µ· - ¨ı®ß.ape, and it actually produced the correct file name: 王菲 - 紅豆.ape
So here is my question: how do I unzip a file with Big5 Chinese characters properly and without any information loss? Or how do I zip a file correctly to prevent information loss or incorrect characters? (edit #2: you can use Bandizip to zip the file with UTF-8 encoding)
BTW, the encoding converter I am using is https://r12a.github.io/apps/encodings/, which can be quite helpful for checking encodings. Don't forget to click "change encodings shown". I am not the owner of the encoding converter.
edit #1: I have found that my setting in Bandizip was wrong... sorry for the inconvenience. Nonetheless, I figured out that The Unarchiver from the Mac App Store can unzip Big5 names correctly. This can be a workaround, but I still don't know how to unzip Big5 characters properly WITHOUT any loss.
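The Macintosh->big5 conversion described in the question can be sketched in Ruby. The helper name repair_mojibake is hypothetical, and the sketch assumes the name was stored as Big5 bytes and displayed as Mac OS Roman. It can only work when every original byte survived as a distinct visible character, which may be why the first file name in the question comes back damaged.

```ruby
# Hypothetical helper: undo mojibake by re-encoding the garbled text into the
# encoding it was wrongly decoded with, then reinterpreting those bytes as the
# encoding the archive actually used.
def repair_mojibake(garbled, wrong: "macRoman", actual: "Big5")
  garbled.encode(wrong).force_encoding(actual).encode("UTF-8")
end

# The second garbled name from the question, written with escapes so that
# the exact code points are unambiguous.
garbled = "\u00A7\u02DD\u00B5\u00B7 - \u00A8\u0131\u00AE\u00DF.ape"
puts repair_mojibake(garbled)
```

If the garbled text contains a character that the assumed wrong encoding cannot represent, String#encode raises Encoding::UndefinedConversionError, which is a useful signal that the guess about the encodings is off.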

UTF-8 i18n file

I'm trying to add a Chinese localisation to a scaffolded Yesod site. I have a zh.msg message file saved as UTF-8 format using Notepad in Windows, but when I run cabal install in the project directory, I get this:
Handler\Home.hs:15:11:
Not in scope: data constructor `MsgHello'
Perhaps you meant `Msg<stderr>: hPutChar: invalid argument (invalid character)
The line in question is where I render my homepage:
$(widgetFile "homepage")
I changed both message files to Notepad's "Unicode" format (UTF-16) instead of UTF-8, and got this message instead:
Foundation.hs:1:1:
Exception when trying to run compile-time code:
Cannot decode byte '\xff': Data.Text.Encoding.Fusion.streamUtf8: invalid UTF-8 stream
So I guess UTF-8 is the way to go... somehow.
(I'm using Notepad because I haven't set up gVim to render Unicode characters. It's apparently a bit of a feat.)
When I went to commit my changes I discovered the issue. The diff for my English file looked like this:
-Hello: Hello
+<U+FEFF>Hello: Hello
I guess Notepad added this byte order mark (BOM) at the start of the file, and it was working its way into the Haskell code. I solved it using vim according to this answer.
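For readers without vim at hand, stripping the BOM can also be scripted. The sketch below uses Ruby (to match the other examples on this page, not the Haskell project itself) and relies on the "BOM|UTF-8" open mode, which skips a leading U+FEFF if present. The helper names are hypothetical.

```ruby
# Hypothetical helpers: read a UTF-8 file while skipping a leading byte order
# mark, and rewrite the file in place without it.
def read_without_bom(path)
  File.read(path, mode: "r:BOM|UTF-8")
end

def strip_bom!(path)
  content = read_without_bom(path)
  File.write(path, content)
  content
end
```

Calling strip_bom!("messages/en.msg") (the path is only an example) would leave a file that starts directly with the first message key.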

how to split/rejoin the zip file using ruby

I am new to Ruby. Is there any way to split a large zip file and then join the split files back into one large zip file?
I found a link with a split sample, but I get an error when running it (split object error).
split sample link
Can anyone help me split/join zip files in Ruby?
Zip::ZipFile.split isn't available in the latest released rubyzip version, 0.9.9. It exists only in the master branch of the source code. If you just need to split a large file into small parts and join them later, and you don't need the intermediate parts to be usable on their own, you can try the split command of Unix/Linux. For example, you may want to copy the small files with a USB drive and join them on another computer.
# each file will contain 1048576 bytes
# the file will be split into xaa, xab, xac...
# you can add an optional prefix at the end of the command
split -b 1048576 large_input_file.zip
# join them some where after
cat x* >large_input_file.zip
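Since the question asks for Ruby, the same raw split and join can be sketched without shelling out. The helpers split_file and join_files and the ".000"-style part suffix are hypothetical choices for illustration. As with split/cat, the parts are plain byte chunks, not valid zip files on their own.

```ruby
# Split the file at `path` into numbered parts of at most `part_size` bytes.
# Returns the list of part paths in order.
def split_file(path, part_size)
  parts = []
  File.open(path, "rb") do |input|
    index = 0
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = input.read(part_size))
      part = format("%s.%03d", path, index)
      File.binwrite(part, chunk)
      parts << part
      index += 1
    end
  end
  parts
end

# Concatenate the parts, in the given order, into `output`.
def join_files(parts, output)
  File.open(output, "wb") do |out|
    parts.each { |part| out.write(File.binread(part)) }
  end
  output
end
```

For example, parts = split_file("large_input_file.zip", 1_048_576) followed by join_files(parts, "rejoined.zip") should reproduce the original bytes.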
The rubyzip gem provides a way to create multi-part zip files from a large zip file. You can use p7zip or WinRAR to unzip the split zip file parts. However, it's strange that unzip doesn't support multi-part zip files. The manual of unzip says,
Multi-part archives are not yet supported, except in conjunction with zip. (All parts must be concatenated together in order, and then "zip -F" (for zip 2.x) or "zip -FF" (for zip 3.x) must be performed on the concatenated archive in order to "fix" it. Also, zip 3.0 and later can combine multi-part (split) archives into a combined single-file archive using "zip -s- inarchive -O outarchive". See the zip 3 manual page for more information.) This will definitely be corrected in the next major release.
If you want this, you can clone the latest master branch and use that lib to do the job.
$ git clone https://github.com/aussiegeek/rubyzip.git
$ vim split.rb
Then in your ruby file "split.rb":
$:.unshift './rubyzip/lib'
require 'zip/zip'
part_zip_count = Zip::ZipFile.split("large_zip_file.zip", 102400, false)
puts "Zip file split into #{part_zip_count} parts"
You can check out the docs for split.

Ruby, unable to delete file on windows - I suspect encoding problem

Similar to this question - now I can create file "Austra Skujytė.txt" but I am unable to delete it. I suspect that it is caused by ė as other files with fancy characters are also affected. AFAIK there is no way to specify encoding like in file opening:
out=File.open("#{file}", "a:UTF-8")
How can I fix it?
To delete the file, try using the short 8.3 filename; e.g.,
File.delete("AUSTRA~1.TXT")
You can convert a long filename to the short format using FFI:
https://github.com/ffi/ffi/wiki/Windows-Examples#wiki-intermediate
It's a bit hacky, but it may be what you need.

How can I modify .xfdl files? (Update #1)

The .XFDL file extension identifies XFDL Formatted Document files. These belong to the XML-based document and template formatting standard. The format is much like XML; however, it contains a level of encryption for use in secure communications.
I know how to view XFDL files using a file viewer I found here. I can also modify and save these files by doing File:Save/Save As. I'd like, however, to modify these files on the fly. Any suggestions? Is this even possible?
Update #1: I have now successfully decoded and unzipped a .xfdl into an XML file, which I can then edit. Now I am looking for a way to re-encode the modified XML file back into base64-gzip (using Ruby or the command line).
If the encoding is base64, then this is the solution I've stumbled upon on the web:
"Decoding XFDL files saved with 'encoding=base64'.
Files saved with:
application/vnd.xfdl;content-encoding="base64-gzip"
are simple base64-encoded gzip files. They can be easily restored to XML by first decoding and then unzipping them. This can be done as follows on Ubuntu:
sudo apt-get install uudeview
uudeview -i yourform.xfdl
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xfdl
The first command will install uudeview, a package that can decode base64, among others. You can skip this step once it is installed.
Assuming your form is saved as 'yourform.xfdl', the uudeview command will decode the contents as 'UNKNOWN.001', since the xfdl file doesn't contain a file name. The '-i' option makes uudeview non-interactive; remove that option for more control.
The last command gunzips the decoded file into a file named 'yourform-unpacked.xfdl'.
Another possible solution - here
Side Note: Block quoted < code > doesn't work for long strings of code
The only answer I can think of right now is - read the manual for uudeview.
As much as I would like to help you, I am not an expert in this area, so you'll have to wait for someone more knowledgeable to come down here and help you.
Meanwhile I can give you links to some documents that might help you:
UUDeview Home Page
Using XDFLengine
Getting started with the XDFL Engine
Sorry if this doesn't help you.
You don't have to leave Ruby to do this. You can use the Base64 module in Ruby to encode the document like this:
irb(main):005:0> require 'base64'
=> true
irb(main):007:0> Base64.encode64("Hello World")
=> "SGVsbG8gV29ybGQ=\n"
irb(main):008:0> Base64.decode64("SGVsbG8gV29ybGQ=\n")
=> "Hello World"
And you can call gzip/gunzip using Kernel#system:
system("gzip foo.something")
system("gunzip foo.something.gz")
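The gzip step can also stay inside Ruby by using the standard Zlib module together with Base64. The sketch below is one way to re-encode edited XML. The helper names are hypothetical, and it assumes the file layout is simply the header line quoted earlier followed by the base64 text, which should be checked against a real form before relying on it.

```ruby
require "base64"
require "zlib"

# Header line taken from the quoted answer above. That a real .xfdl file is
# exactly "header line + base64 text" is an assumption of this sketch.
XFDL_HEADER = 'application/vnd.xfdl;content-encoding="base64-gzip"'

# Gzip the XML, base64-encode it, and prepend the header line.
def encode_xfdl(xml)
  XFDL_HEADER + "\n" + Base64.encode64(Zlib.gzip(xml))
end

# Reverse of encode_xfdl: drop the header line, base64-decode, gunzip.
def decode_xfdl(text)
  header, body = text.split("\n", 2)
  raise ArgumentError, "unexpected header: #{header}" unless header == XFDL_HEADER
  Zlib.gunzip(Base64.decode64(body.to_s)).force_encoding("UTF-8")
end
```

A round trip such as File.write("yourform.xfdl", encode_xfdl(File.read("yourform-unpacked.xfdl"))) would then replace the uudeview and gzip commands.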
