I need to move a large number of files to S3 with the time-stamps intact (c-time, m-time, etc. must be preserved, so I cannot use the aws s3 sync command). For this I use the following command:
sudo tar -c --use-compress-program=pigz -f - <folder>/ | aws s3 cp - s3://<bucket>/<path-to-folder>/
When trying to create a tar.gz file with the above command for a folder that is 80+ GB, I ran into the following error:
upload failed: - to s3://<bucket>/<path-to-folder>/<filename>.tar.gz An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive
Upon researching this, I found that there is a limit of 68GB for tar files (the size of the file-size field in the tar header).
Upon further research, I also found a solution (here) that shows how to create a set of tar.gz files using split:
tar cvzf - data/ | split --bytes=100GB - sda1.backup.tar.gz.
which can later be extracted with:
cat sda1.backup.tar.gz.* | tar xzvf -
However, split has a different signature:
split [OPTION]... [FILE [PREFIX]]
So the obvious solution:
sudo tar -c --use-compress-program=pigz -f - folder/ | split --bytes=20GB - prefix.tar.gz. | aws s3 cp - s3://<bucket>/<path-to-folder>/
will not work, since split treats the prefix as a string and writes its output to files with that set of names.
The question is: is there a way to code this so that I can effectively use a piped solution (i.e., not use additional disk space) and yet get a set of files (called prefix.tar.gz.aa, prefix.tar.gz.ab, etc.) in S3?
Any pointers would be helpful.
--PK
That looks like a non-trivial challenge. Pseudo-code might look like this:
# Start with an empty list
list = ()
counter = 1
foreach file in folder/ do
    if adding file to list exceeds tar or s3 limits then
        # Flush current list of files to S3
        write list to tmpfile
        run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
        list = ()
        counter = counter + 1
    end if
    add file to list
end foreach
if list non-empty
    write list to tmpfile
    run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
end if
This uses the --files-from option of tar to avoid needing to pass individual files as command arguments and running into limitations there.
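A minimal shell sketch of the pseudo-code above, assuming GNU stat and tar and file names without embedded newlines. The names BATCH_LIMIT, upload_part and tar_in_batches are made up for illustration, and the limit counts uncompressed input bytes, so each archive will usually come out smaller than the limit.

```shell
#!/usr/bin/env bash
# Sketch only: BATCH_LIMIT, upload_part and tar_in_batches are illustrative names.

# Maximum uncompressed bytes per archive (default 20 GiB).
BATCH_LIMIT=${BATCH_LIMIT:-$((20 * 1024 * 1024 * 1024))}

# Receives one tar.gz stream on stdin; $1 is the batch counter.
# Replace the destination with your own bucket and path.
upload_part() {
    aws s3 cp - "s3://<bucket>/<path-to-file>.$1"
}

tar_in_batches() {
    dir=$1
    list=$(mktemp) || return 1
    find "$dir" -type f | {
        total=0 count=0 counter=1
        while IFS= read -r file; do
            size=$(stat -c %s "$file")          # GNU stat: size in bytes
            if [ "$count" -gt 0 ] && [ $((total + size)) -gt "$BATCH_LIMIT" ]; then
                # Adding this file would exceed the limit: flush the current list
                tar -czf - --files-from="$list" | upload_part "$counter"
                : > "$list"
                total=0 count=0 counter=$((counter + 1))
            fi
            printf '%s\n' "$file" >> "$list"
            total=$((total + size)) count=$((count + 1))
        done
        # Flush whatever is left after the last file
        if [ "$count" -gt 0 ]; then
            tar -czf - --files-from="$list" | upload_part "$counter"
        fi
    }
    rm -f "$list"
}
```

Usage would be something like: tar_in_batches folder/ (with sudo if needed, and with --use-compress-program=pigz in place of -z to keep pigz). Two alternatives that keep a single tar stream may also be worth checking: GNU split has a --filter=COMMAND option that pipes each chunk to a command with the chunk name in $FILE, and the aws s3 cp documentation describes an --expected-size option for stdin uploads larger than 50GB.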
I'm writing a script that needs to list file entries from a zip file. My problem is that when an entry's name contains an emoji, the CLI doesn't output the file name correctly:
❯ zip -r foo.zip test/
adding: test/ (stored 0%)
adding: test/😊.txt (stored 0%)
src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0
❯ unzip -l foo.zip
Archive: foo.zip
Length Date Time Name
--------- ---------- ----- ----
0 04-08-2022 20:54 test/
0 04-08-2022 20:54 test/�???.txt <---- here is my problem
--------- -------
0 2 files
src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0
❯ unzip foo.zip test/😊.txt
Archive: foo.zip
extracting: test/�???.txt
Is there a way to tell unzip to list the file entries with consideration of special characters?
Thanks!
It doesn't seem possible to accurately list the files in a zip archive with unzip (tested with unzip 6.00); you'll have to select another tool.
I chose perl in my answer because it has the required functionality in its core library. Here I used a newline as delimiter (-l) but you should replace it with a NULL-BYTE (-l0) if you want to be able to read and process the outputted paths 100% accurately from bash:
perl -l -e '
use IO::Uncompress::Unzip;
$zip = IO::Uncompress::Unzip->new($ARGV[0]);
while($zip->nextStream()) {
print $zip->getHeaderInfo()->{Name}
}
' foo.zip
test/
test/😊.txt
Remark: Python also has a zipfile module in its standard library. I didn't post a Python solution because of the encoding issues of its stdout; the fixes aren't compatible between Python versions.
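As a sketch of the NUL-delimited variant mentioned above: the wrapper below prints each entry name followed by a NUL byte (-l0), and a second function reads those names back in bash. The function names list_zip_entries and show_zip_entries are made up for illustration, and the loop uses the for form shown in the IO::Uncompress::Unzip documentation rather than the while form above.

```shell
# Sketch only: list_zip_entries and show_zip_entries are illustrative names.

# Print every entry name of archive $1, each terminated by a NUL byte.
list_zip_entries() {
    perl -l0 -e '
        use IO::Uncompress::Unzip qw($UnzipError);
        my $zip = IO::Uncompress::Unzip->new($ARGV[0])
            or die "cannot open $ARGV[0]: $UnzipError\n";
        for (my $status = 1; $status > 0; $status = $zip->nextStream()) {
            print $zip->getHeaderInfo()->{Name};
        }
    ' "$1"
}

# Read the names back one by one (read -d is a bash feature).
show_zip_entries() {
    list_zip_entries "$1" | while IFS= read -r -d '' entry; do
        printf 'entry: %s\n' "$entry"
    done
}
```

Calling show_zip_entries foo.zip then handles names with spaces, newlines or emoji as opaque byte strings; whether the terminal renders them correctly still depends on the locale.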
I am trying to create a tar from a file that contains a list of other files, and to write it to stdout.
Let's suppose there is a file called "files-to-create" which contains the paths of other files, like /home/abc.txt and /home/def.txt, and I want to create a tar of abc.txt and def.txt.
My script contains:
exec 100>&1
tar cf - -T files-to-sync >&100
and I am calling the script and saving its output to another file like this:
/script.sh > final_tar.tar
But while creating the tar I am getting an error. Can somebody help me out?
You can use the following script to reach your goal, let me know if something is unclear:
Prototype 1:
$ cat scriptTar.sh
#!/bin/bash
readonly HELP="$(basename "$0") <list_of_files> <output_tar>
this script will generate a tar file composed of all files present in <list_of_files> input file
the output tar file will be saved as <output_tar>
to run the script provide the input and output filenames"
readonly INPUT_LIST_FILE=$1
readonly OUTPUT_TAR_FILE=$2
if [ -z "$INPUT_LIST_FILE" ] || [ -z "$OUTPUT_TAR_FILE" ]
then
    # Quote $HELP so the multi-line message keeps its line breaks
    echo "$HELP" >&2
    exit 1
fi
tar cf - -T "$INPUT_LIST_FILE" > "$OUTPUT_TAR_FILE"
exit $?
Folder content:
$ tree .
.
├── a
│ └── abc.txt
├── b
│ └── def.txt
├── c
│ └── ghj.txt
├── files-to-sync.in
└── scriptTar.sh
3 directories, 5 files
List file content:
$ cat files-to-sync.in
./a/abc.txt
./b/def.txt
./c/ghj.txt
Execution:
$ ./scriptTar.sh files-to-sync.in output.tar
tar file content:
$ tar -tvf output.tar
-rw-rw-r-- arobert/arobert 4 2018-02-22 16:50 ./a/abc.txt
-rw-rw-r-- arobert/arobert 4 2018-02-22 16:50 ./b/def.txt
-rw-rw-r-- arobert/arobert 4 2018-02-22 16:50 ./c/ghj.txt
Or use the following script if you really want to display it on stdout:
Prototype 2 via ssh:
#!/bin/bash
readonly HELP="ERROR: $(basename "$0") <list_of_files>
this script will generate to stdout a tar file composed of all files present in <list_of_files> input file
to run the script provide the input file and redirect the output to a file"
readonly INPUT_LIST_FILE=$1
if [ -z "$INPUT_LIST_FILE" ]
then
    # Send the help text to stderr so it cannot end up inside the redirected tar
    echo "$HELP" >&2
    exit 1
fi
tar cf - -T "$INPUT_LIST_FILE"
Execution via ssh:
$ ssh user@localhost "cd /home/user/test_tar/; ./scriptTar.sh files-to-sync.in" > output.tar
user@localhost's password:
Content of the tar generated:
tar -tf output.tar
./a/abc.txt
./b/def.txt
./c/ghj.txt
extracting the content:
tar xvf output.tar
./a/abc.txt
./b/def.txt
./c/ghj.txt
checking the files:
more ?/*.txt
::::::::::::::
a/abc.txt
::::::::::::::
abc
::::::::::::::
b/def.txt
::::::::::::::
abc
::::::::::::::
c/ghj.txt
::::::
However if I were you, I would not only generate a tar file but add some compression (tar.gz) and transfer the file with rsync to be able to restart the download from the point where it stopped in case of transfer error.
So the proper solution is:
Case 1: If you are passing the list of files as an argument
you can use this (note that shell variable names cannot contain hyphens):
files_to_sync=$1
tar cf - -T "$files_to_sync"
Case 2: If you want to use absolute paths in the list of files
you can use this:
tar cfP - -T /path/to/the/file
Use -P in the case of absolute paths.
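The two cases can be sketched as small functions that write the archive to stdout. The names tar_from_list and tar_from_list_abs are made up for illustration; the only difference between them is -P, which keeps the leading "/" in member names instead of stripping it.

```shell
# Sketch only: tar_from_list and tar_from_list_abs are illustrative names.

# Write a tar of the files named in list file $1 to stdout.
# A leading "/" is stripped from member names (tar's default).
tar_from_list() {
    tar -cf - -T "$1"
}

# Same, but -P keeps absolute paths in the archive.
tar_from_list_abs() {
    tar -cPf - -T "$1"
}
```

For example, tar_from_list files-to-sync > final_tar.tar. A plain redirect of stdout is enough here, so the extra file descriptor from the question is not needed.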
I have a pkg file created by Install Maker for Mac.
I want to replace one file in the pkg, but I must do this on a Linux system because it is part of the download process: when a user starts to download the file, the server must replace one file in the pkg.
I have a solution for unpacking the pkg and replacing a file, but I don't know how to pack it into a pkg again.
http://emresaglam.com/blog/1035
http://ilostmynotes.blogspot.com/2012/06/mac-os-x-pkg-bom-files-package.html
Packages are just .xar archives with a different extension and a specified file hierarchy. Unfortunately, part of that file hierarchy is a cpio.gz archive of the actual installables, and usually that's what you want to edit. And there's also a Bom file that includes information on the files inside that cpio archive, and a PackageInfo file that includes summary information.
If you really do just need to edit one of the info files, that's simple:
mkdir Foo
cd Foo
xar -xf ../Foo.pkg
# edit stuff
xar -cf ../Foo-new.pkg *
But if you need to edit the installable files:
mkdir Foo
cd Foo
xar -xf ../Foo.pkg
cd foo.pkg
cat Payload | gunzip -dc |cpio -i
# edit Foo.app/*
rm Payload
find ./Foo.app | cpio -o | gzip -c > Payload
mkbom Foo.app Bom # or edit Bom
# edit PackageInfo
rm -rf Foo.app
cd ..
xar -cf ../Foo-new.pkg *
I believe you can get mkbom (and lsbom) for most linux distros. (If you can get ditto, that makes things even easier, but I'm not sure if that's nearly as ubiquitously available.)
Here is a bash script inspired by abarnert's answer which will unpack a package named MyPackage.pkg into a subfolder named MyPackage_pkg and then open the folder in Finder.
#!/usr/bin/env bash
filename="$*"
dirname="${filename/\./_}"
pkgutil --expand "$filename" "$dirname"
cd "$dirname"
tar xvf Payload
open .
Usage:
pkg-upack.sh MyPackage.pkg
Warning: This will not work in all cases, and will fail with certain files, e.g. the PKGs inside the OSX system installer. If you want to peek inside the pkg file and see what's inside, you can try SuspiciousPackage (free app), and if you need more options such as selectively unpacking specific files, then have a look at Pacifist (nagware).
You might want to look into my fork of pbzx here: https://github.com/NiklasRosenstein/pbzx
It allows you to stream pbzx files that are not wrapped in a XAR archive. I've experienced this with recent Xcode Command-Line Tools disk images (e.g. 10.12 Xcode 8).
pbzx -n Payload | cpio -i
In addition to what @abarnert said, I found out today that the default cpio utility on Mountain Lion uses a different archive format by default (not sure which), even though the man page states it would use the old cpio/odc format. So, if anyone stumbles upon the "cpio read error: bad file format" message while trying to install a manipulated package, be sure to include the format in the re-pack step:
find ./Foo.app | cpio -o --format odc | gzip -c > Payload
@shrx I've succeeded in unpacking BSD.pkg (part of the Yosemite installer) by using the "pbzx" command.
pbzx <pkg> | cpio -idmu
The "pbzx" command can be downloaded from the following link:
pbzx Stream Parser
If you are experiencing errors during PKG installation following the accepted answer, I will give you another procedure that worked for me (please note the little changes to xar, cpio and mkbom commands):
mkdir Foo
cd Foo
xar -xf ../Foo.pkg
cd foo.pkg
cat Payload | gunzip -dc | cpio -i
# edit Foo.app/*
rm Payload
find ./Foo.app | cpio -o --format odc --owner 0:80 | gzip -c > Payload
mkbom -u 0 -g 80 Foo.app Bom # or edit Bom
# edit PackageInfo
rm -rf Foo.app
cd ..
xar --compression none -cf ../Foo-new.pkg *
The resulting PKG is uncompressed, cpio now uses the odc format and sets the file owner explicitly, and mkbom is given the same owner and group.
Bash script to extract a pkg (inspired by this answer: https://stackoverflow.com/a/23950738/16923394):
Save the following code to a file named pkg-upack.sh in the $HOME/Downloads folder:
#!/usr/bin/env bash
filename="$*"
dirname="${filename/\./_}"
mkdir "$dirname"
# pkgutil --expand "$filename" "$dirname"
xar -xf "$filename" -C "$dirname"
cd "$dirname"/*.pkg
pwd
# tar xvf Payload
cat Payload | gunzip -dc |cpio -i
# cd usr/local/bin
# pwd
# ls -lt
# cp -i * $HOME/Downloads/
Uncomment the last four lines, if you are using a rudix package.
Usage:
cd $HOME/Downloads
chmod +x ./pkg-upack.sh
./pkg-upack.sh MyPackage.pkg
This was tested with the ffmpeg and mawk packages from rudix.org (https://rudix.org); search for them on that site.
Source : My open source projects : https://sourceforge.net/u/nathan-sr/profile/
I often need to fetch tgz files, decompress them, and then delete the tgz.
How can I do all three steps with one simple command?
wget http://site/path/file.tgz -O - | tar -zxvf -
You can use:
curl <url> | tar xz
Or put in your bashrc:
function ctxz {
    curl "$1" | tar xz
}
and just use:
ctxz <url>
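A slightly more defensive sketch of the same idea, with a hypothetical function name fetch_untar and an optional target directory. Because the archive is streamed straight into tar, no .tgz file is ever written to disk, so there is nothing to delete afterwards.

```shell
# Sketch only: fetch_untar is an illustrative name.
# Usage: fetch_untar URL [DIR]
fetch_untar() {
    url=$1
    dir=${2:-.}
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
    curl -fsSL "$url" | tar -xzf - -C "$dir"
}
```

The function's exit status is that of tar; in bash, running set -o pipefail first makes a failed download visible in the exit status as well.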