Wrong filenames after unzipping on ubuntu - windows

Problem
I have a zip-file that I would like to unzip on Ubuntu with the correct filenames (they contain æ,ø,å).
What I have tried:
1. Unrar in Windows 10 - WORKS!
Everything works as expected and filenames are correct.
2. Unzip in Ubuntu
unzip file.zip
The characters æ, ø and å are missing from the filenames; 'æ', for instance, has been replaced with 'C'.
I attempt to detect the encoding of the zip-file, but it doesn't seem to tell me anything.
file file.zip
3. Unzip with encoding in Ubuntu
I attempt to unpack the file using various encodings that are often used for æ,ø,å-containing texts.
unzip -O UTF-8 file.zip
unzip -O ISO-8859-1 file.zip
unzip -O windows-1257 file.zip
None work...
4. Unzip using 7zip in Ubuntu
It has been suggested that 7zip may fix the problem, but it does not.
7z x file.zip
5. Unzip using 7zip and danish language setting in Ubuntu
It is suggested that I change the ubuntu language settings and then try again.
saveLang=$LANG
export LANG=da_DK
7z x file.zip
export LANG=$saveLang
This also does not work.
6. Unzip using Python3 in Ubuntu - WORKS!
The unzip works correctly if I use Python3 for the purpose, but there must be an easier way?
import zipfile
with zipfile.ZipFile('file.zip', "r") as z:
    z.extractall("/home/xxxx/")
7. Next step
I am considering finding a list of "ALL" encodings, and then just extracting the filenames and going through them manually. Something along the line of this...
while IFS= read -r p; do
    echo "$p"
    unzip -j -O "$p" file.zip
done < encodings.txt
Conclusion
Windows and Python3 seem to have some MAGIC under the hood that I cannot replicate. Do you have any suggestions as to what this "MAGIC" is?
How do I identify the encoding of the filenames in a zip file?
Where can I get a list of all encodings for step 7?
Is there an easy way to solve this problem without having to write e.g. a Python script?

The key piece of information you provided was that unrar on Windows was able to create the filenames correctly. Unless unrar is doing some encoding detection under the hood, that means there is a good chance that the encoding used in the zip file matches the default code page of your Windows setup.
Running chcp on Windows shows your active code page:
Active code page: 850
It's then a simple matter of telling unzip that the encoding used in the zip file is CP850
unzip -O CP850 file.zip
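On the question of how to identify the filename encoding in the first place, one option is to look at the raw name bytes and try a few candidate code pages. The following is only a minimal Python sketch: the helper names raw_name, candidate_decodings and show, as well as the list of candidate encodings, are my own illustrative choices. It relies on the fact that Python's zipfile module decodes names as cp437 unless the archive flags them as UTF-8 (general-purpose flag bit 0x800), so re-encoding with cp437 gives back the stored bytes.

```python
import zipfile


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Recover the filename bytes as they are stored in the archive."""
    if info.flag_bits & 0x800:
        # Flag bit 11 set: the name was stored as UTF-8.
        return info.filename.encode("utf-8")
    # Otherwise zipfile decoded the name as cp437; undo that step.
    return info.filename.encode("cp437")


def candidate_decodings(raw: bytes,
                        encodings=("cp850", "cp437", "cp1252", "utf-8")):
    """Decode raw bytes with each candidate; None marks a failed decode."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


def show(zip_path):
    """Print every stored name next to its candidate decodings."""
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            raw = raw_name(info)
            print(raw, candidate_decodings(raw))
```

Calling show("file.zip") and picking the candidate that displays the expected æ, ø and å gives the name to pass to unzip -O.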

Related

Using gunzip on Windows in command line

I need to use gunzip (which is the decompression tool of gzip) in a terminal on Windows
I've downloaded gzip from here (first download link)
I installed it and added its /bin folder to my PATH variable. The gzip command works, but gunzip is not even executable, so I can't use it.
gunzip content:
#!/bin/sh
PATH=${GZIP_BINDIR-'c:/progra~1/Gzip/bin'}:$PATH
exec gzip -d "$@"
Thanks
I made it work
As I said I needed to install gzip and add its /bin folder to my PATH variable
Then edit the ProgramFiles/GnuWin32/bin/gunzip file using this (replace everything):
@echo off
gzip -d %1
and save it to .bat format so you now have a gunzip.bat file in your /bin folder
Now I can use gunzip in a terminal :)
I have been using MobaXterm software in my local machine (Windows).
It works like Putty and WinSCP together and opens the local desktop in linux mode, thus it is easy for me to gunzip the *.gz type files.
The code you posted is bash, suitable for linux.
You need to make a dos/command version of it to be run on windows
i.e.
REM THIS IS CMD
PATH=c:/progra~1/Gzip/bin;%PATH%
gzip.exe -d %*
Since it is a different build anyway, it is hard to say whether all command-line parameters are the same as the ones you are used to on Linux. So even with this .cmd or .bat file, you may not be able to work the same way you do in a Linux environment.
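If Python 3 happens to be installed on the Windows machine, a portable alternative to the wrapper scripts above could look like the sketch below. It is not a drop-in replacement for the real gunzip: the function name gunzip and the keep parameter are my own, and it only handles a single .gz file at a time.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep=False):
    """Decompress src (a .gz file) next to itself and return the new path."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"{src} does not end in .gz")
    dst = src.with_suffix("")  # data.txt.gz -> data.txt
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep:
        # Like gunzip, remove the compressed file after success.
        src.unlink()
    return dst
```

For example, gunzip("archive.txt.gz") would leave archive.txt in the same folder.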

I can unzip on a remote machine but not on my computer

On a cluster I zipped a large (61GB, 9.2GB when zipped) directory.
zip -r zzDirectory Directory
I then copied zzDirectory.zip to my personal computer with scp.
scp -r name@host.com:/path/to/zzDirectory.zip path/in/my/computer/zzDirectory.zip
And finally I tried to unzip it. Unzipping from bash failed:
warning [zzDirectory.zip]: 5544449626 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [zzDirectory.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
So I double-clicked the icon in the Finder and the system started to unzip zzDirectory.zip. However, some files are missing and it looks like (I am not 100% sure yet) some newline characters (\n) are missing as well. unzip used to work fine on my computer before.
To investigate where the problem comes from, I unzipped zzDirectory.zip on the cluster, and everything seems to work fine (no missing files).
I repeated the transfer and unzipped again, but the problem persists. Note that the transfers are made over the internet. My OS is Mac OS X Yosemite 10.10.2.
How can I solve this issue? I would prefer not to transfer unzipped data because of bandwidth constraints. Do you think I should try tar, or should I use specific options of the unzip command?
On OS X you could try:
ditto -x -k the_over4gb.zip /path/to/dir/where/want/unzip
e.g:
ditto -x -k zzDirectory.zip .
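The placeholder name the_over4gb.zip hints that the size of the archive (over 4 GB) is the issue, which would mean the extracting tool needs ZIP64 support. Assuming that is the cause, and assuming Python 3 is available, its zipfile module can read ZIP64 archives. A minimal sketch, where the function name extract_zip is my own:

```python
import zipfile


def extract_zip(zip_path, dest_dir):
    """Check every member's CRC, then extract the archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as z:
        # testzip() reads all members and returns the first bad name, or None.
        # This is slow on a multi-gigabyte archive but catches a bad transfer.
        bad = z.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        z.extractall(dest_dir)
        return z.namelist()
```

If extract_zip("zzDirectory.zip", ".") raises BadZipFile, the file was probably damaged in transfer rather than merely too large for the local unzip.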

pack files on windows and preserve folder timestamps

I want to put a big folder on a Windows box into one archive (tar, zip, gzip, whatever). Is there a tool that can preserve all folder timestamps?
The timestamps have to be preserved after unpacking the archive on a Linux box.
Any ideas are welcome!
tar will do fine. gzip is for single file compression, zip won't preserve directory timestamps.
EDIT: Sample.
tar jcpf backup.tbz2 thedir
rm -rf thedir
tar jxpf backup.tbz2
Timestamps preserved.
EDIT2:
cygwin tar correctly preserves timestamps. Tested with tar jcf on cygwin, tar jxf on linux.
EDIT3:
WinRar preserves directory timestamps, linux unrar restores them properly.
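For comparison, the same tar round trip can be sketched with Python's tarfile module, which records the modification time of directories when packing and reapplies it after all members have been extracted. This is only a sketch under the assumption that Python 3 is available on both machines; the function names pack and unpack are mine, and I have not tested it on a terminal server.

```python
import os
import tarfile


def pack(src_dir, archive_path):
    """Create a bzip2-compressed tar of src_dir (directory entries included)."""
    name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:bz2") as tar:
        tar.add(src_dir, arcname=name)


def unpack(archive_path, dest_dir):
    """Extract the archive; tarfile restores directory mtimes at the end."""
    os.makedirs(dest_dir, exist_ok=True)
    kwargs = {}
    if hasattr(tarfile, "data_filter"):
        # Newer Python versions expect an explicit extraction filter.
        kwargs["filter"] = "data"
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir, **kwargs)
```

Usage would be pack("thedir", "backup.tbz2") on the Windows side and unpack("backup.tbz2", ".") on the Linux side.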
DotNetZip preserves timestamps on folders, as well as files.
It seems that there's no solution that meets all my requirements:
Pack on Windows and preserve folder timestamps
Unpack on Linux and preserve original folder timestamps
I prefer a copy/paste (portable) installation of the tool; otherwise the deployment gets too complicated.
A partial, drop-in cygwin installation made by just copying the necessary exe and dll files works, but doesn't preserve the folder timestamps.
A full cygwin installation is not easily possible since the Windows client machines are on a terminal server (see http://www.cygwin.com/faq/faq.setup.html#faq.setup.setup-fails-on-ts)
Zip doesn't work because unzip on Ubuntu cannot preserve folder timestamps, even if the zip tool of choice does.

Pdflatex for windows

Does anyone know how to convert .tex files to .pdf in windows? I tried cygwin but it said the command "pdflatex" was not recognised
Thanks
Philip
There's no reason to complicate things with Cygwin. Go download and install a TeX distribution for Windows - I personally use TeX Live, but various other distributions are available, such as MiKTeX or W32TeX.
If you want to use UTF-8 for your bibliography, and you're using BibTeX, I recommend using bibtexu instead of the regular bibtex (since bibtex doesn't actually support UTF-8). There's a download on the W32TeX site.
If you need to stick with cygwin, install texlive and texlive-collection-latex
The following command worked for me, under cygwin. I installed pandoc 1.13.2 and MiKTeX 2.9.5105 64-bit. Then I ran:
pandoc -s \
--latex-engine='C:\Program Files\MiKTeX 2.9\miktex\bin\x64\pdflatex.exe' \
-f markdown_github -t latex \
"my-file.md" -o "my-file.pdf"
The key here is that I gave the full path to MiKTeX's pdflatex.exe in the --latex-engine option, in quotes, using the Windows path (the pandoc I installed is the Windows pandoc, so it requires Windows-style paths to find resources).
I used -f markdown_github because of the file format of my-file.md
I used -t latex, but that's optional AFAIK.
Installing the tetex (and optionally tetex-extra) package in Cygwin worked for me.
MiKTeX and texify work for me under plain Windows.

Untar multipart tarball on Windows

I have a series of files named filename.part0.tar, filename.part1.tar, … filename.part8.tar.
I guess tar can create multiple volumes when archiving, but I can't seem to find a way to unarchive them on Windows. I've tried to untar them using 7zip (GUI & commandline), WinRAR, tar114 (which doesn't run on 64-bit Windows), WinZip, and ZenTar (a little utility I found).
All programs run through the part0 file, extracting 3 rar files, then quit reporting an error. None of the other part files are recognized as .tar, .rar, .zip, or .gz.
I've tried concatenating them using the DOS copy command, but that doesn't work, possibly because part0 thru part6 and part8 are each 100Mb, while part7 is 53Mb and therefore likely the last part. I've tried several different logical orders for the files in concatenation, but no joy.
Other than installing Linux, finding a live distro, or tracking down the guy who left these files for me, how can I untar these files?
Install 7-zip. Right click on the first tar. In the context menu, go to "7zip -> Extract Here".
Works like a charm, no command-line kung-fu needed:)
EDIT:
I only now noticed that you mention already having tried 7zip. It might have balked if you tried to "open" the tar by going "open with" -> 7zip - Their command-line for opening files is a little unorthodox, so you have to associate via 7zip instead of via the file association system built-in to windows. If you try the right click -> "7-zip" -> "extract here", though, that should work- I tested the solution myself (albeit on a 32-bit Windows box- Don't have a 64 available)
1) download gzip http://www.gzip.org/ for windows and unpack it
2) gzip -c filename.part0.tar > foo.gz
gzip -c filename.part1.tar >> foo.gz
...
gzip -c filename.part8.tar >> foo.gz
3) unpack foo.gz
worked for me
As above, I had the same issue and ran into this old thread. For me it was a severe case of RTFM when installing a Siebel VM. These instructions were straight from the manual:
cat \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.1of3 \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.2of3 \
OVM_EL5U3_X86_ORACLE11G_SIEBEL811ENU_SIA21111_PVM.tgz.3of3 \
| tar xzf -
Worked for me!
The tar -M switch should do it for you on Windows (I'm using tar.exe).
tar --help says:
-M, --multi-volume create/list/extract multi-volume archive
I found this thread because I had the same problem with these files. Yes, the same exact files you have. Here's the correct order: 042358617 (i.e. start with part0, then part4, etc.)
Concatenate in that order and you'll get a tarball you can unarchive. (I'm not on Windows, so I can't advise on what app to use.) Note that of the 19 items contained therein, 3 are zip files that some unarchive utilities will report as being corrupted. Other apps will allow you to extract 99% of their contents. Again, I'm not on Windows, so you'll have to experiment for yourself.
Enjoy! ;)
This works well for me with multivolume tar archives (numbered .tar.1, .tar.2 and so on) and even allows to --list or --get specific folders or files in them:
#!/bin/bash
TAR=/usr/bin/tar
ARCHIVE=bkup-01Jun
RPATH=home/user
RDEST=restore/
EXCLUDE=.*
mkdir -p $RDEST
$TAR vf $ARCHIVE.tar.1 -F 'echo '$ARCHIVE'.tar.${TAR_VOLUME} >&${TAR_FD}' -C $RDEST --get $RPATH --exclude "$EXCLUDE"
Copy to a script file, then just change the parameters:
TAR=location of tar binary
ARCHIVE=Archive base name (without .tar.multivolumenumber)
RPATH=path to restore (leave empty for full restore)
RDEST=restore destination folder (relative or absolute path)
EXCLUDE=files to exclude (with pattern matching)
Interesting thing for me is you really DON'T use the -M option, as this would only ask you questions (insert next volume etc.)
Hello, perhaps this will help.
I had the same problem.
An automatic backup of my website, made on CentOS at 4 am, creates multiple files in multi-volume tar format (saveblabla.tar, saveblabla.tar1.tar, saveblabla.tar2.tar, etc.).
After downloading these files to my PC (Windows), I couldn't extract them with either the Windows cmd or 7zip (unknown error).
I first did a binary copy to reassemble the tar files (as suggested above in this thread):
copy /b file1+file2+file3 destination
After that, 7zip worked! Thanks for your help.
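The binary concatenation that copy /b and cat perform in the answers above can also be sketched in Python, which makes the part order explicit. The function names parts_in_order and concat_parts are my own, and the filename.partN.tar pattern and the order string "042358617" are taken from the question and the earlier answer, not verified by me.

```python
import shutil


def parts_in_order(base, order):
    """Build part names such as filename.part0.tar from a digit string."""
    return [f"{base}.part{digit}.tar" for digit in order]


def concat_parts(part_paths, dest_path):
    """Append each part to dest_path in binary mode, in the given order."""
    with open(dest_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Copy in chunks so 100 MB parts are not held in memory.
                shutil.copyfileobj(src, out)
    return dest_path
```

With the order reported above, concat_parts(parts_in_order("filename", "042358617"), "filename.tar") would produce a single tarball to hand to 7zip or tar.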

Resources