I extracted a layer from a Docker image, which is archived in a file called layer.tar. I want to remove the empty directories from it.
I don't want to unpack and then repack the files in that archive; I want to keep the original info, so I want to do it in place.
I know how to delete files from a tar archive, but I don't know any simple method to delete empty directories in place.
Let's create an archive t.tar with empty directories a/b/c/ and a/b/c/d/:
mkdir -p dir
cd dir
mkdir -p a/b/c/d
mkdir -p 1/2/3/4
touch a/file_a a/b/file_ab # directories a/b/c and a/b/c/d are empty
touch 1/2/3/file_123 1/2/3/4/file_1234 # directory 1/2/3/4 is not empty
tar cf ../t.tar a 1
cd ..
Using tar tf and some filtering, we can list the directories and files in a tar archive separately. Then, for each directory in tmpdirs, we can check whether any file in tmpfiles lives under it with a simple grep, and remove the empty directories using tar's --delete option:
tar tf t.tar | tee >(grep '/$' > tmpdirs) | grep -v '/$' > tmpfiles
cat tmpdirs | xargs -n1 -- sh -c 'grep -q "$1" tmpfiles || echo "$1"' -- \
| tac \
| xargs -- tar --delete -f t.tar
Note that tac may look unneeded, but the entries were sorted alphabetically in the tar, so without it tar would first remove a/b/c/ together with all its subdirectories and then fail with a "Not found in archive" error when it tries to remove a/b/c/d/. tac is a cheap way to fix that: tar first removes a/b/c/d/ and only then a/b/c/.
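Putting the pieces together, here is a self-contained sketch of the whole procedure (assumes GNU tar and coreutils; it builds a throwaway archive in a temp dir, and splits the listing with two grep passes instead of tee >(…) so the script doesn't depend on process-substitution timing):

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p a/b/c/d 1/2/3/4
touch a/file_a a/b/file_ab 1/2/3/file_123 1/2/3/4/file_1234
tar cf t.tar a 1

# List once, then split into directories and files
tar tf t.tar > entries
grep '/$'    entries > tmpdirs
grep -v '/$' entries > tmpfiles

# A directory is empty when no file path starts with it;
# reverse the list so children are deleted before their parents
while read -r d; do
  grep -q "^$d" tmpfiles || echo "$d"
done < tmpdirs | tac | xargs -r tar --delete -f t.tar
```

The ^ anchor makes the grep match directory prefixes rather than substrings anywhere in a path, and xargs -r (GNU) skips running tar entirely if no empty directories were found.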
I need to find the last created tar.gz file and extract it to some directory, something like this:
ls -t $(pwd)/Backup_db/ | head -1 | xargs tar xf -C /somedirectory
How to do it the right way in CentOS 7?
You can find the most recently modified file in a subshell and then use that in place of a filename. Create the new directory, then extract the tar file into it.
new_dir="path/to/new/dir"
mkdir -p "$new_dir"
tar -zxvf "$(ls -t *.tar.gz | head -1)" -C "$new_dir"
Note that ls -t <dir> will not show the full <dir>/<filename> path for the files, but ls -t <dir>/* will, so after also reordering xargs flags (and forcing -n1 for safety), below should work for you:
ls -t $(pwd)/Backup_db/*.tar.gz | head -1 | xargs -n1 tar -C /somedirectory -xf
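Parsing ls output breaks on unusual filenames; GNU find can select the newest archive more robustly by printing each file's mtime. A self-contained sketch (the Backup_db and somedirectory names mirror the question; the two fabricated backups are just for demonstration):

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir Backup_db somedirectory stage
echo old > stage/data.txt
tar -czf Backup_db/backup1.tar.gz -C stage data.txt
sleep 1
echo new > stage/data.txt
tar -czf Backup_db/backup2.tar.gz -C stage data.txt

# Print "mtime path" per archive, keep the newest, cut the timestamp off
latest=$(find Backup_db -maxdepth 1 -name '*.tar.gz' -printf '%T@ %p\n' \
         | sort -nr | head -n1 | cut -d' ' -f2-)
tar -xzf "$latest" -C somedirectory
```

This still breaks only on filenames containing newlines, which is a much rarer case than spaces.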
I have a text file which specifies files that need to be copied:
...
b/bamboo/forest/00000456.jpg
b/bamboo/forest/00000483.jpg
...
c/corridor/00000334.jpg
c/corridor/00000343.jpg
...
However, I would like to copy them while preserving their subdirectory structure. So the result would be:
...
newfolder/b/bamboo/forest/00000483.jpg
newfolder/b/bamboo/forest/00000456.jpg
...
newfolder/c/corridor/00000334.jpg
newfolder/c/corridor/00000343.jpg
...
I have this: cat /path/to/files.txt | xargs cp -t /dest/path/. But it just copies everything into one directory.
You can use cp --parents:
--parents -- append source path to target directory
cat /path/to/files | xargs cp --parents -t new_directory
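A runnable sketch of this approach (assumes GNU cp and xargs; the sample paths are taken from the question, and xargs -d '\n' is added so that paths containing spaces survive):

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p b/bamboo/forest c/corridor new_directory
touch b/bamboo/forest/00000456.jpg c/corridor/00000334.jpg
printf '%s\n' b/bamboo/forest/00000456.jpg c/corridor/00000334.jpg > files.txt

# -d '\n' makes xargs split on newlines only;
# --parents recreates each source path under the target directory
xargs -d '\n' cp --parents -t new_directory < files.txt
```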
If that isn't working for you, then you can take the boring approach and iterate over each line of /path/to/files.txt, use mkdir -p to create the target directory as needed, and then simply copy the file:
while read -r file; do
new_dir="new_directory/$(dirname "$file")"
# ^ this is the destination directory for this file
mkdir -p "$new_dir"
cp "$file" "$new_dir/"
done < /path/to/files.txt
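Another option, if cp --parents isn't available (it's a GNU extension, absent on BSD/macOS): pipe the file list through tar, which recreates the directory hierarchy on extraction. A sketch with the question's paths, assuming a tar that supports -T/--files-from (GNU tar and bsdtar both do):

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p b/bamboo/forest c/corridor newfolder
touch b/bamboo/forest/00000483.jpg c/corridor/00000343.jpg
printf '%s\n' b/bamboo/forest/00000483.jpg c/corridor/00000343.jpg > files.txt

# The first tar archives exactly the listed paths; the second
# recreates the directory structure while extracting under newfolder
tar -cf - -T files.txt | tar -xf - -C newfolder
```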
Here is the scenario:
$ wget "http://foo.bar/repository/nightly/src/foo-latest.tar.gz"
$ tar -xzf foo-latest.tar.gz
$ ls # the archive root contained a single directory named after software name and the build date
foo-20140115-0024
What you want is that, in the end, the extracted files are placed in the directory foo instead of foo-20140115-0024. You can of course move the directory once decompressed:
$ mv `tar -tvzf foo-latest.tar.gz | head -n1 | awk '{print $6}'` foo
Here is the question: is there a shorter/more proper way to achieve the same result?
This should work:
$ mkdir foo
$ tar -C foo --strip-components=1 -xzf foo-latest.tar.gz
First we create the output directory.
Then we use -C to extract the archive into that directory and --strip-components=1 to drop the root directory of the archive.
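A self-contained demonstration of those two steps (--strip-components is supported by GNU tar and bsdtar; the archive name and contents here are fabricated to match the scenario above):

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Fabricate an archive whose single root directory carries a build date
mkdir foo-20140115-0024
echo hello > foo-20140115-0024/readme.txt
tar -czf foo-latest.tar.gz foo-20140115-0024

# Extract into foo/, dropping the dated top-level directory
mkdir foo
tar -C foo --strip-components=1 -xzf foo-latest.tar.gz
```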
I accidentally unzipped files into the wrong directory; there are actually hundreds of files, and now the directory is a mix of the original files and the wrongly unzipped files. I want to pick out the unzipped files and remove them using a shell script, e.g.
$unzip foo.zip -d test_dir
$cd target_dir
$ls test_dir | rm -rf
Nothing happened and no files were deleted. What's wrong with my command? Thanks!
The following script has two main benefits over the other answers so far:
It does not require you to unzip a whole second copy into a temp dir (it just lists the file names)
It works on files that may contain spaces (parsing ls breaks on spaces)
while read -r _ _ _ file; do
arr+=("$file")
done < <(unzip -qql foo.zip)
rm -f "${arr[@]}"
The right way to do this is with xargs:
$find ./test_dir -print | xargs rm -rf
Edited: thanks to SiegeX for explaining the OP's question to me.
This reads the wrong files from the test dir and removes them from the target dir.
$unzip foo.zip -d /path_to/test_dir
$cd target_dir
(cd /path_to/test_dir ; find ./ -type f -print0 ) | xargs -0 rm
I use find with -print0 and xargs -0 because filenames can contain blanks and newlines. But if that is not your case, you can run it with ls:
$unzip foo.zip -d /path_to/test_dir
$cd target_dir
(cd /path_to/test_dir ; ls ) | xargs rm -rf
Before executing, you should test the script by replacing rm with echo.
Try
for file in $(unzip -qql FILE.zip | awk '{ print $4 }'); do
rm -rf DIR/YOU/MESSED/UP/$file
done
unzip -l lists the content with a bunch of information about the zipped files. You just have to grep the file names out of it.
EDIT: using -qql as suggested by SiegeX
The following worked for me (bash)
unzip -l filename.zip | awk '{print $NF}' | xargs rm -Rf
Do this:
$ ls test_dir | xargs rm -rf
You need ls test_dir | xargs rm -rf as your last command
Why:
rm doesn't take input from stdin, so you can't pipe the list of files to it. xargs takes the output of the ls command and passes it to rm as command-line arguments so that it can delete them.
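A quick sandboxed demonstration of this difference (note that piping ls into xargs still misbehaves on filenames with spaces or newlines; this sketch only illustrates why the stdin pipe does nothing):

```shell
set -e
work=$(mktemp -d)
cd "$work"
touch junk1 junk2

# rm ignores its stdin completely: nothing is deleted here
# (-f keeps rm quiet about having no operands)
ls | rm -rf
[ -e junk1 ]   # the files are still there

# xargs converts the lines on stdin into arguments for rm
ls | xargs rm -rf
```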
Compacting the previous one: run this command in /DIR/YOU/MESSED/UP
unzip -qql FILE.zip | awk '{print "rm -rf " $4 }' | sh
enjoy
I wonder how to list the content of an archive file and remove some directories from it?
For example, I have an archive file data.tar.
I would like to list its content without extracting it. Is it possible to control the directory depth when viewing? I mean not necessarily every file, but just down to some level of the path.
I also would like to remove some directories matching "*/count1000" from it.
To see the contents of the tar file:
tar tvf mytar.tar
To extract a file:
tar xvf mytar.tar myfile.txt
To delete the matching entries (quote the pattern so the shell doesn't expand it first):
tar -f mytar.tar --delete '*/count1000'
You can strip the 1st path component with:
cat myarchive.tar.gz | tar -tzf - | grep --only-matching -e "/.*"
You can strip the 2nd path component with:
cat myarchive.tar.gz | tar -tzf - | grep --only-matching -e "/.*/.*"
etc.
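cut can do the same stripping without crafting a new regex per level: -f2- drops one leading path component, -f3- drops two, and so on. A self-contained sketch with a throwaway archive (grep . filters out the root entry, which becomes an empty line once its only component is cut away):

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p top/sub
echo x > top/sub/file.txt
tar -czf myarchive.tar.gz top

# Strip the 1st path component from each listed member
tar -tzf myarchive.tar.gz | cut -d/ -f2- | grep . > stripped.txt
cat stripped.txt
```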