List and Remove directories in an archive - bash

I wonder how to list the content in an archive file and remove some directories from it?
For example, I have an archive file data.tar.
I would like to list its content without extracting it. Is it possible to control how many directory levels are shown? I mean not necessarily every file, but only entries down to a given depth of the path.
I also would like to remove some directories matching "*/count1000" from it.

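To see the contents of the tar file:
tar tvf mytar.tar
To extract a file:
tar xvf mytar.tar myfile.txt
To delete members matching a pattern (quote the pattern so the shell does not expand it, and pass --wildcards so that GNU tar treats it as a glob rather than a literal name):
tar -f mytar.tar --delete --wildcards '*/count1000'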
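The deletion step can be tried safely on a throwaway archive first. The following is a minimal sketch, assuming GNU tar; the scratch directory and the run1/run2 layout are hypothetical names invented for the demonstration, not part of the original question.

```shell
set -eu

# Hypothetical scratch area and file names, used only for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/run1/count1000" "$work/src/run1/count10" "$work/src/run2/count1000"
touch "$work/src/run1/count1000/a.dat" "$work/src/run1/count10/b.dat" "$work/src/run2/count1000/c.dat"
tar -C "$work/src" -cf "$work/data.tar" run1 run2

# The quotes stop the shell from expanding the pattern; --wildcards asks
# GNU tar to match it as a glob against member names.
tar -f "$work/data.tar" --delete --wildcards '*/count1000'

# List what is left in the archive, then clean up the scratch area.
remaining=$(tar -tf "$work/data.tar")
printf '%s\n' "$remaining"
rm -rf "$work"
```

Note that with GNU tar's default matching for member names, * does not cross a slash, so this pattern only reaches count1000 directories exactly one level below the archive root.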

You can strip the 1st path component from the listing with:
tar -tzf myarchive.tar.gz | cut -d/ -f2-
You can strip the first two path components with:
tar -tzf myarchive.tar.gz | cut -d/ -f3-
etc.
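For the other half of the question, limiting the listing to a given depth, one possible approach is to keep only the leading path components and drop duplicates. This is a rough sketch rather than a tar feature: the depth variable and the sample layout are assumptions made up for illustration.

```shell
set -eu

# Hypothetical sample tree, built only so the listing has something to show.
work=$(mktemp -d)
mkdir -p "$work/src/top/a/deep" "$work/src/top/b"
touch "$work/src/top/a/deep/x.txt" "$work/src/top/b/y.txt"
tar -C "$work/src" -cf "$work/data.tar" top

depth=2  # hypothetical: number of leading path components to keep

# cut keeps the first $depth slash-separated fields of every member name;
# sort -u collapses the many entries that share the same truncated prefix.
listing=$(tar -tf "$work/data.tar" | cut -d/ -f1-"$depth" | LC_ALL=C sort -u)
printf '%s\n' "$listing"
rm -rf "$work"
```

The archive is still read in full; only the displayed names are truncated.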

Related

Find last created tar.gz and extract it

I need to find last created tar.gz file and extract it to some directory, something like this:
ls -t $(pwd)/Backup_db/ | head -1 | xargs tar xf -C /somedirectory
How to do it the right way in CentOS 7?
You can find the most recently modified file with a command substitution and use the result in place of a file name. Create the target directory first, then extract the tar file into it.
new_dir="path/to/new/dir"
mkdir -p "$new_dir"
tar -zxvf "$(ls -t *.tar.gz | head -1)" -C "$new_dir"
Note that ls -t <dir> prints bare file names without the <dir>/ prefix, whereas ls -t <dir>/* prints the full <dir>/<filename> path. With that change, and with the tar options reordered so that xargs appends the file name directly after -xf (and -n1 forced for safety), the following should work for you:
ls -t "$(pwd)"/Backup_db/*.tar.gz | head -1 | xargs -n1 tar -C /somedirectory -xf
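Parsing ls output breaks on unusual file names, so a variant based on find may be more robust. The sketch below assumes GNU find (for -printf), which CentOS 7 ships; the Backup_db and out directories and the two sample archives are hypothetical stand-ins created just for the demonstration.

```shell
set -eu

# Hypothetical backup directory with two archives of different ages.
work=$(mktemp -d)
mkdir -p "$work/Backup_db" "$work/out"
echo old > "$work/old.txt"
echo new > "$work/new.txt"
tar -C "$work" -czf "$work/Backup_db/old.tar.gz" old.txt
tar -C "$work" -czf "$work/Backup_db/new.tar.gz" new.txt
touch -t 202001010000 "$work/Backup_db/old.tar.gz"   # push the old archive back in time

# Print "mtime<TAB>path" for each archive, sort numerically by mtime,
# keep the last (newest) line and strip the timestamp column.
newest=$(find "$work/Backup_db" -maxdepth 1 -type f -name '*.tar.gz' \
           -printf '%T@\t%p\n' | sort -n | tail -n 1 | cut -f2-)

tar -C "$work/out" -xzf "$newest"
newest_name=$(basename "$newest")
extracted=$(ls "$work/out")
rm -rf "$work"
```

This still assumes file names without embedded newlines, and it selects by modification time, which is not necessarily creation time.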

How to remove empty directories from a tarball in-place

I extracted a layer from a Docker image, which is archived in a file called layer.tar. I want to remove empty directories from it.
I don't want to unpack and then repack the files in that archive; I want to keep the original info, so I want to do it in-place.
I know how to delete files from tar but I don't know any simple method to delete empty directories in-place.
Let's create an archive t.tar with a/b/c/ and a/b/c/d/ as empty directories:
mkdir -p dir
cd dir
mkdir -p a/b/c/d
mkdir -p 1/2/3/4
touch a/file_a a/b/file_ab # directories a/b/c and a/b/c/d are empty
touch 1/2/3/file_123 1/2/3/4/file_1234 # directories 1/2/3 and 1/2/3/4 are not empty
tar cf ../t.tar a 1
cd ..
Using tar tf and some filtering we can split the archive listing into directories (tmpdirs) and files (tmpfiles). Then, for each directory in tmpdirs, we check with a simple grep whether any path in tmpfiles contains it, and remove the directories that have no files using tar's --delete option. The -F flag makes grep treat the directory name as a literal string rather than a regular expression:
tar tf t.tar | tee >(grep '/$' > tmpdirs) | grep -v '/$' > tmpfiles
cat tmpdirs | xargs -n1 -- sh -c 'grep -qF -- "$1" tmpfiles || echo "$1"' -- \
| tac \
| xargs -- tar --delete -f t.tar
Note that tac may look unnecessary, but the entries were sorted alphabetically in the tar listing, so a/b/c/ comes before a/b/c/d/. When tar removes the directory a/b/c/ together with all its subdirectories first and then tries to remove a/b/c/d/, it fails with a "Not found in archive" error. tac is a cheap way to fix that: tar first removes a/b/c/d/ and then a/b/c/.
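The same idea can be sketched without temporary files by letting awk do the prefix check. This is only an illustration under a few assumptions: GNU tar for --delete, member names without newlines, and the sample tree from the question rebuilt in a scratch directory. The variable names are made up for the example.

```shell
set -eu

# Rebuild the sample archive from the question in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/dir/a/b/c/d" "$work/dir/1/2/3/4"
touch "$work/dir/a/file_a" "$work/dir/a/b/file_ab" \
      "$work/dir/1/2/3/file_123" "$work/dir/1/2/3/4/file_1234"
tar -C "$work/dir" -cf "$work/t.tar" a 1

# Entries ending in "/" are directories. A directory is treated as empty when
# no file entry starts with its name. index() compares literal strings, so
# regex metacharacters in names are harmless. Directories are printed in
# reverse listing order, so children come before their parents.
empty_dirs=$(tar -tf "$work/t.tar" | awk '
  /\/$/ { dirs[++nd] = $0; next }
        { files[++nf] = $0 }
  END {
    for (i = nd; i >= 1; i--) {
      used = 0
      for (j = 1; j <= nf; j++)
        if (index(files[j], dirs[i]) == 1) { used = 1; break }
      if (!used) print dirs[i]
    }
  }')

# Delete one directory per tar call; slower, but safe for names with spaces.
printf '%s\n' "$empty_dirs" | while IFS= read -r d; do
  [ -n "$d" ] || continue
  tar --delete -f "$work/t.tar" "$d"
done

after=$(tar -tf "$work/t.tar")
rm -rf "$work"
```

Each tar --delete call rewrites the archive, so for large archives batching the names into a single call, as in the xargs pipeline above, is likely faster.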

How to delete all files not in a set

I have a plain text file with a list of file names. For example,
A.doc
E.doc
F.pdf
I would like to delete all files in the current directory except for those.
Can this be done in bash?
Let's say the list of files to keep is goodfiles.txt. Then:
ls | grep -vxF -f goodfiles.txt
gives you the list of "other" files, the ones that would be deleted. The -F option makes grep treat each name as a literal string, so the dot in A.doc is not a regex wildcard. Once you have confirmed that those are the files you want to delete, run:
ls | grep -vxF -f goodfiles.txt | xargs -d '\n' rm
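A loop over a glob avoids parsing ls output and makes it easy to spare the list file itself, which the pipeline above would also remove if goodfiles.txt lives in the same directory. The sketch below is one way to do it, run in a scratch directory with hypothetical sample files.

```shell
set -eu

# Scratch directory with sample files; AXdoc and "notes 1.txt" are
# hypothetical extras that are NOT on the keep list.
work=$(mktemp -d)
cd "$work"
touch A.doc E.doc F.pdf AXdoc "notes 1.txt"
printf '%s\n' A.doc E.doc F.pdf > goodfiles.txt

for f in *; do
  [ -f "$f" ] || continue                       # skip directories and the like
  if [ "$f" = goodfiles.txt ]; then continue; fi  # keep the list itself
  # -F literal match, -x whole line, -q quiet; "--" guards names starting with "-".
  grep -Fxq -- "$f" goodfiles.txt || rm -- "$f"
done

kept=$(ls)
cd /
rm -rf "$work"
```

Replacing rm with echo on the first run gives a dry run that only prints what would be deleted.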

Decompressing a tarball containing a single root directory while renaming the root destination

Here is the scenario:
$ wget "http://foo.bar/repository/nightly/src/foo-latest.tar.gz"
$ tar -xzf foo-latest.tar.gz
$ ls # the archive root contained a single directory named after software name and the build date
foo-20140115-0024
What you want is for the extracted files to end up in the directory foo instead of foo-20140115-0024. You can of course move the directory once it has been decompressed:
$ mv `tar -tvzf foo-latest.tar.gz | head -n1 | awk '{print $6}'` foo
Here is the question: is there a shorter or more proper way to achieve the same result?
This should work:
$ mkdir foo
$ tar -C foo --strip-components=1 -xzf foo-latest.tar.gz
First we create the output directory.
After that we use -C to extract the archive into that directory and --strip-components=1 to drop the root directory stored in the archive.
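To see the effect without downloading anything, the two commands can be tried on a locally built tarball. This is a small sketch assuming GNU tar; the bin/tool file inside the archive is a hypothetical placeholder.

```shell
set -eu

# Build a tarball whose single root directory carries a build date.
work=$(mktemp -d)
mkdir -p "$work/src/foo-20140115-0024/bin"
echo hi > "$work/src/foo-20140115-0024/bin/tool"
tar -C "$work/src" -czf "$work/foo-latest.tar.gz" foo-20140115-0024

# Extract into foo/, dropping the first path component of every member.
mkdir "$work/foo"
tar -C "$work/foo" --strip-components=1 -xzf "$work/foo-latest.tar.gz"

# Paths are now relative to foo/, with no dated directory in between.
extracted=$(cd "$work/foo" && find . -type f)
printf '%s\n' "$extracted"
rm -rf "$work"
```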

how to untar the file and rename the folder in one command line operation?

I want to download a file, untar it and rename the folder.
I am able to download the file and untar it with
curl https://s3.amazonaws.com/sampletest/sample.tar.gz | tar xz
How can I rename the folder in the same command?
curl https://s3.amazonaws.com/sampletest/sample.tar.gz | tar xz | mv ???????
I do not want to use the folder name explicitly in the command.
It's possible, but not trivial. It's easier to create your own directory, cd into it, then pass --strip-components 1 or --strip-path 1 to tar if your tar (e.g. GNU Tar) supports it.
File name transformations:
  --strip-components=NUMBER   strip NUMBER leading components from file names on extraction
  --transform=EXPRESSION, --xform=EXPRESSION
                              use sed replace EXPRESSION to transform file names
If your system does not have GNU tar installed, it might still have pax (a POSIX tool) available. pax supports the -s option, which allows arbitrary changes to the path names of the processed files.
That would then be:
curl https://s3.amazonaws.com/sampletest/sample.tar.gz | gunzip | pax -r -s "/old/new/"
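With GNU tar, the --transform option listed above can rename the root directory while reading from a pipe, without naming the original folder. The following is a sketch under assumptions: a local file piped through cat stands in for the curl download, and sample-1.2.3 and renamed are hypothetical directory names.

```shell
set -eu

# Local stand-in for the downloaded archive.
work=$(mktemp -d)
mkdir -p "$work/src/sample-1.2.3"
echo data > "$work/src/sample-1.2.3/readme.txt"
tar -C "$work/src" -czf "$work/sample.tar.gz" sample-1.2.3

# Stand-in for: curl <url> | tar ...
# The sed expression replaces the first path component, whatever it is,
# with "renamed" for every member as it is extracted.
cat "$work/sample.tar.gz" | tar -C "$work" -xzf - --transform 's,^[^/]*,renamed,'

result=$(cat "$work/renamed/readme.txt")
rm -rf "$work"
```

This only gives a clean result when the archive has a single root directory; with several top-level entries they would all be merged under the new name.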
