I get broken archive when I download to the folder - bash

I get right archive when I used command:
wget https://mysite/published/my_archive.tar.gz
I can open this archive and use.
But I create fofder for archive
drwxr-xr-x folder_for_archive
and used command for download archive to the folder:
wget https://mysite/published/my_archive.tar.gz -o "./folder_for_archive/https://mysite/published/my_archive.tar.gz"
But archive is broken and I cannot use it.
What can I do?

The -o argument to wget doesn't do what you think it does:
-o logfile
--output-file=logfile
Log all messages to logfile. The messages are normally reported to standard error.
Either replace wget by curl (whose -o option does do what you expect), or use -O (capital letter O) instead:
-O file
--output-document file
The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
Notice the use of plural "documents": wget is a recursive downloader, so it might produce more than one file. But in your case, it should be fine.
See the wget manual for more information on both options.

Related

Considering a specific name for the downloaded file

I download a .tar.gz file using wget using this command:
wget hello.tar.gz
This is a part of a long script, sometimes when I want to download this file, an error occurs and when for the second time the file is downloaded the name of the downloaded file changes to something like this:
hello.tar.gz.2
the third time:
hello.tar.gz.3
How can I say that the whatever the name of the downloaded is, change it to hello.tar.gz?
In other words I don't want the name of the downloaded file be anything other than hello.tar.gz?
wget hello.tar.gz -O <fileName>
wget have internal option like -r, -p to change default behavior
So just try the following:
wget -p <url>
wget -r <url>
Since now you noticed the incremental change. Discard any repeated files and rely on the following as initial condition:
wget hello.tar.gz
mv hello.tar.gz.2 hello.tar.gz

How to curl for extracting valid .zip file from redirecting link

I am trying to automate a data downloading process. For this purpose, my goal is to extract (using bash commands) the .zip from a redirection link that could be seen on display here: https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303
I have seen that people suggest the -L tag with curl for redirections, but it doesn't seem to work for my case. The specific command I have tried is:
curl -L -o output.zip https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
The command file output.zip shows that the extracted .zip file is actually a HTML document text. On the other hand, clicking the redirection link (used inside curl command) downloads the extracted folder automatically via a browser.
Any ideas, tips, or suggestions on what I should try (or whether this is possible or not) will be highly appreciated!
If you execute curl with the --verbose option you can see that it is a cookie related problem. The cookie engine needs to be enabled. You can download the desired file as follows:
curl -b cookies.txt -L https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip -o test.zip
It doesn't matter if the file provided with the -b option doesn't exist. We just need to activate the cookie engine.
Refer to Send cookies with curl and Save cookies between two curl requests for futher information.
You can download that file with wget on Linux
$ wget https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
$ unzip Sambanis_Aug_06.zip
Archive: Sambanis_Aug_06.zip
inflating: Sambanis (Aug 06).dta
inflating: Sambanis Appendix (Aug 06).pdf

wget how to download only newer version of a file

A script regularly downloads some data file from a remote server with wget:
CERTDIR=folder1
SPOOLDIR=folder2
URL="http://..."
FILENAME="$SPOOLDIR/latest.xml.gz"
/usr/bin/wget \
-N \
--quiet \
--private-key=${CERTDIR}/keynopass.pem \
--ca-certificate=${CERTDIR}/ca.pem \
--certificate=${CERTDIR}/client.pem \
"$URL" \
--output-document ${FILENAME}
The -N switch is used to turn on timestamping. (possibly redundant, this seems to be the default)
I was anticipating that the file only will be downloaded if there is a newer remote version.
But this is not the case. The actual download is done, no matter if the remote file has the same timestamp as the local file.
The file is a bit lengthy, so my plan was to check for a new version frequently, but download only as needed. Unfortunately this seems impossible with that approach.
Just guessing: the URL references no file, but is an api call. Could this be the reason?
But: the timestamp of the local file is set to the timestamp of the remote file - so I know, that the timestamp information is available.
Do I miss something?
Notes:
the remote server is not controlled by me
the local server runs ubuntu 16.04
wget --version: GNU Wget 1.17.1 built on linux-gnu.
The documentation mentions that:
Use of -O is not intended to mean simply "use the name file instead of
the one in the URL;" rather, it is analogous to shell redirection:
wget -O file http://foo is intended to work like wget -O - http://foo > file;
file will be truncated immediately, and all downloaded content will be written there.
For this reason, -N (for timestamp-checking) is not supported in
combination with -O: since file is always newly created, it will
always have a very new timestamp. A warning will be issued if this
combination is used.
Thus one option would be to leave out the -O option, let wget download the file (if needed), and just create a symlink in your target directory called latest.xml.gz pointing to the downloaded file...

Use curl to download a Dropbox folder via shared link (not public link)

Dropbox makes it easy to programmatically download a single file via curl (EX: curl -O https://dl.dropboxusercontent.com/s/file.ext). It is a little bit trickier for a folder (regular directory folder, not zipped). The shared link for a folder, as opposed to a file, does not link directly to the zipped folder (Dropbox automatically zips the folder before it is downloaded). It would appear that you could just add ?dl=1 to the end of the link, as this will directly start the download in a browser. This, however, points to an intermediary html document that redirects to the actual zip folder and does not seem to work with curl. Is there anyway to use curl to download a folder via a shared link? I realize that the best solution would be to use the Dropbox api, but for this project it is important to keep it as simple as possible. Additionally, the solution must be incorporated into a bash shell script.
It does appear to be possible with curl by using the -L option. This forces curl to follow the redirect. Additionally, it is important to specify an output name with a .zip extension, as the default will be a random alpha-numeric name with no extension. Finally, do not forget to add the ?dl=1 to the end of the link. Without it, curl will never reach the redirect page.
curl -L -o newName.zip https://www.dropbox.com/sh/[folderLink]?dl=1
Follow redirects (use -L). Your immediate problem is that Curl is not following redirects.
Set a filename. (Optional)
Dropbox already sends a Content-Disposition Header with its Dropbox filename. There is no reason to specify the filename if you use the correct curl flags.
Conversely, you can force a filename using something of your choosing.
Use one of these commands:
curl https://www.dropbox.com/sh/AAbbCCEeFF123?dl=1 -O -J -L
Preserve/write the remote filename (-O,-J) and follows any redirects (-L).
This same line works for both individually shared files or entire folders.
Folders will save as a .zip automatically (based on folder name).
Don't forget to change the parameter ?dl=0 to ?dl=1 (see comments).
OR:
curl https://www.dropbox.com/sh/AAbbCCEeFF123?dl=1 -L -o [filename]
Follow any redirects (-L) and set a filename (-o) of your choosing.
NOTE: Using the -J flag in general:
WARNING: Exercise judicious use of this option, especially on Windows. A rogue server could send you the name of a DLL or other file that could possibly be loaded automatically by Windows or some third party software.
Please consult: https://curl.haxx.se/docs/manpage.html#OPTIONS (See: -O, -J, -L, -o) for more.

Mac zip compress without __MACOSX folder?

When I compress files with the built in zip compressor in Mac OSX, it causes an extra folder titled "__MACOSX" to be created in the extracted zip.
Can I adjust my settings to keep this folder from being created or do I need to purchase a third party compression tool?
UPDATE: I just found a freeware app for OSX that solves my problem: "YemuZip"
UPDATE 2: YemuZip is no longer freeware.
Can be fixed after the fact by zip -d filename.zip __MACOSX/\*
And, to also delete .DS_Store files: zip -d filename.zip \*/.DS_Store
When I had this problem I've done it from command line:
zip file.zip uncompressed
EDIT, after many downvotes: I was using this option for some time ago and I don't know where I learnt it, so I can't give you a better explanation. Chris Johnson's answer is correct, but I won't delete mine. As one comment says, it's more accurate to what OP is asking, as it compress without those files, instead of removing them from a compressed file. I find it easier to remember, too.
Inside the folder you want to be compressed, in terminal:
zip -r -X Archive.zip *
Where -X means: Exclude those invisible Mac resource files such as “_MACOSX” or “._Filename” and .ds store files
source
Note: Will only work for the folder and subsequent folder tree you are in and has to have the * wildcard.
This command did it for me:
zip -r Target.zip Source -x "*.DS_Store"
Target.zip is the zip file to create. Source is the source file/folder to zip up. The -x parameter specifies the file/folder to exclude.
If the above doesn't work for whatever reason, try this instead:
zip -r Target.zip Source -x "*.DS_Store" -x "__MACOSX"
I'm using this Automator Shell Script to fix it after.
It's showing up as contextual menu item (right clicking on any file showing up in Finder).
while read -r p; do
zip -d "$p" __MACOSX/\* || true
zip -d "$p" \*/.DS_Store || true
done
Create a new Service with Automator
Select "Files and Folders" in "Finder"
Add a "Shell Script Action"
zip -r "$destFileName.zip" "$srcFileName" -x "*/\__MACOSX" -x "*/\.*"
-x "*/\__MACOSX": ignore __MACOSX as you mention.
-x "*/\.*": ignore any hidden file, such as .DS_Store .
Quote the variable to avoid file if it's named with SPACE.
Also, you can build Automator Service to make it easily to use in Finder.
Check link below to see detail if you need.
Github
The unwanted folders can be also be deleted by the following way:
zip -d filename.zip "__MACOSX*"
Works best for me
The zip command line utility never creates a __MACOSX directory, so you can just run a command like this:
zip directory.zip -x \*.DS_Store -r directory
In the output below, a.zip which I created with the zip command line utility does not contain a __MACOSX directory, but a 2.zip which I created from Finder does.
$ touch a
$ xattr -w somekey somevalue a
$ zip a.zip a
adding: a (stored 0%)
$ unzip -l a.zip
Archive: a.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
-------- -------
0 1 file
$ unzip -l a\ 2.zip # I created `a 2.zip` from Finder before this
Archive: a 2.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
0 01-02-16 20:31 __MACOSX/
149 01-02-16 20:29 __MACOSX/._a
-------- -------
149 3 files
-x .DS_Store does not exclude .DS_Store files inside directories but -x \*.DS_Store does.
The top level file of a zip archive with multiple files should usually be a single directory, because if it is not, some unarchiving utilites (like unzip and 7z, but not Archive Utility, The Unarchiver, unar, or dtrx) do not create a containing directory for the files when the archive is extracted, which often makes the files difficult to find, and if multiple archives like that are extracted at the same time, it can be difficult to tell which files belong to which archive.
Archive Utility only creates a __MACOSX directory when you create an archive where at least one file contains metadata such as extended attributes, file flags, or a resource fork. The __MACOSX directory contains AppleDouble files whose filename starts with ._ that are used to store OS X-specific metadata. The zip command line utility discards metadata such as extended attributes, file flags, and resource forks, which also means that metadata such as tags is lost, and that aliases stop working, because the information in an alias file is stored in a resource fork.
Normally you can just discard the OS X-specific metadata, but to see what metadata files contain, you can use xattr -l. xattr also includes resource forks and file flags, because even though they are not actually stored as extended attributes, they can be accessed through the extended attributes interface. Both Archive Utility and the zip command line utility discard ACLs.
You can't.
But what you can do is delete those unwanted folders after zipping. Command line zip takes different arguments where one, the -d, is for deleting contents based on a regex. So you can use it like this:
zip -d filename.zip __MACOSX/\*
Cleanup .zip from .DS_Store and __MACOSX, including subfolders:
zip -d archive.zip '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Walkthrough:
Create .zip as usual by right-clicking on the file (or folder) and selecting "Compress ..."
Open Terminal app (search Terminal in Spotlight search)
Type zip in the Terminal (but don't hit enter)
Drag .zip to the Terminal so it converts to the path
Copy paste -d '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Hit enter
Use zipinfo archive.zip to list files inside, to check (optional)
I have a better solution after read all of the existed answers. Everything could done by a workflow in a single right click.
NO additional software, NO complicated command line stuffs and NO shell tricks.
The automator workflow:
Input: files or folders from any application.
Step 1: Create Archive, the system builtin with default parameters.
Step 2: Run Shell command, with input as parameters. Copy command below.
zip -d "$#" "__MACOSX/*" || true
zip -d "$#" "*/.DS_Store" || true
Save it and we are done! Just right click folder or bulk of files and choose workflow from services menu. Archive with no metadata will be created alongside.
IMAGE UPDATE: I chose "Quick Action" when creating a new workflow - here’s an English version of the screenshot:
do not zip any hidden file:
zip newzipname filename.any -x "\.*"
with this question, it should be like:
zip newzipname filename.any -x "\__MACOSX"
It must be said, though, zip command runs in terminal just compressing the file, it does not compress any others. So do this the result is the same:
zip newzipname filename.any
Keka does this. Just drag your directory over the app screen.
Do you mean the zip command-line tool or the Finder's Compress command?
For zip, you can try the --data-fork option. If that doesn't do it, you might try --no-extra, although that seems to ignore other file metadata that might be valuable, like uid/gid and file times.
For the Finder's Compress command, I don't believe there are any options to control its behavior. It's for the simple case.
The other tool, and maybe the one that the Finder actually uses under the hood, is ditto. With the -c -k options, it creates zip archives. With this tool, you can experiment with --norsrc, --noextattr, --noqtn, --noacl and/or simply leave off the --sequesterRsrc option (which, according to the man page, may be responsible for the __MACOSX subdirectory). Although, perhaps the absence of --sequesterRsrc simply means to use AppleDouble format, which would create ._ files all over the place instead of one __MACOSX directory.
This is how i avoid the __MACOSX directory when compress files with tar command:
$ cd dir-you-want-to-archive
$ find . | xargs xattr -l # <- list all files with special xattr attributes
...
./conf/clamav: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/global: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/web_server: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
Delete the attribute first:
find . | xargs xattr -d com.apple.quarantine
Run find . | xargs xattr -l again, make sure no any file has the xattr attribute. then you're good to go:
tar cjvf file.tar.bz2 dir
Another shell script that could be used with the Automator tool (see also benedikt's answer on how to create the script) is:
while read -r f; do
d="$(dirname "$f")"
n="$(basename "$f")"
cd "$d"
zip "$n.zip" -x \*.DS_Store -r "$n"
done
The difference here is that this code directly compresses selected folders without macOS specific files (and not first compressing and afterwards deleting).

Resources