Moving files inside a tar archive - bash

I have a script that archives a mongo collection:
archive.tar.gz contains:
folder/file.bson
and I need to add an additional top-level folder to that structure, for example:
top-folder/folder/file.bson
It seems that one way is to unpack and re-pack everything, but is there any other solution?
The problem is that there is a third-party script that unpacks the archive and fetches the files from top-folder/folder/file.bson, and in the current format the path is wrong.

.tar.gz is actually what the name suggests - first tar converts a directory structure to a byte stream (i.e. a single file), and this byte stream is then compressed by gzip.
Which means that changing the file path inside the archive is equal to byte-editing a compressed data stream - an unnecessarily difficult thing to do without decompressing the stream.
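If the goal is only to avoid an unpacked copy on disk, the re-pack can be done as a stream: the data still has to be decompressed and recompressed, but each member is copied straight into a new archive under its new path. A minimal sketch using Python's standard tarfile module; the function name add_top_folder and the default prefix are hypothetical, chosen to match the question:

```python
import tarfile

def add_top_folder(src_path, dst_path, prefix="top-folder"):
    """Copy every member of src_path into dst_path under prefix/."""
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            # Hard links point at another member inside the archive, so
            # their target has to move under the prefix as well.
            if member.islnk():
                member.linkname = f"{prefix}/{member.linkname}"
            member.name = f"{prefix}/{member.name}"
            # Drop stale pax path records so the new names are the ones written.
            member.pax_headers.pop("path", None)
            member.pax_headers.pop("linkpath", None)
            # Only regular files carry data; directories and links do not.
            data = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, data)
```

Calling add_top_folder("archive.tar.gz", "archive-fixed.tar.gz") would then produce an archive whose entries start with top-folder/. Both file names here are placeholders.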

Related

How do I create a GZIP bundle in NiFi?

I have thousands of files that I want to GZIP together to make sending them more efficient. I used MergeContent, but that creates zip files, not GZIP. The system on the other side is only looking for GZIP. I can use CompressContent to create a single GZIP file, but that's not efficient for sending across the network. Also I need to preserve headers on the individual files which is why I wanted to use MergeContent.
I could write the files to disk as flowfile packages, run a script, pick up the result, then send it, but I would think I can do that in NiFi without writing to disk.
Any suggestions?
You are confusing compression with archiving.
Tar or Zip is a method of archiving 1 or more input files into a single output file. E.g. file1.txt, file2.txt and file3.txt are separate files that are archived into files.tar. When you unpack the archive, you get all 3 files back as they were. An archive is not necessarily compressed.
GZIP is a method of compression, with the goal of reducing the size of the file. It takes 1 input, compresses it, and gives 1 output. E.g. You input file1.txt which is 100Kb, you compress it, you get file1.txt.gz which is 3Kb.
MergeContent is merging, thus it can produce archives like ZIP and TAR. It is not compressing.
CompressContent is compressing, thus it can produce compressed files like GZIP. It is not merging.
If you want to combine many files into a compressed archive like a tar.gz then you can use MergeContent (tar) > CompressContent (gzip). This will first archive all of the input FlowFiles into a tar file, and then GZIP compress the tar into a tar.gz.
See this answer for more detail on compression vs archiving: Difference between archiving and compression
(Note: MergeContent has an optional Compression flag when using it to create ZIPs, so in that one specific use-case it can also apply some compression to the archive, but it is only for zip)
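The two stages are easy to see outside NiFi. The following is only an illustration of archive-then-compress using Python's standard library, not NiFi code; tar_bundle and gzip_bundle are hypothetical names standing in for the MergeContent (tar) and CompressContent (gzip) steps:

```python
import gzip
import io
import tarfile

def tar_bundle(files):
    """Archive step: pack {name: bytes} into one uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def gzip_bundle(tar_bytes):
    """Compression step: one input stream in, one compressed stream out."""
    return gzip.compress(tar_bytes)

files = {"file1.txt": b"a" * 1000, "file2.txt": b"b" * 1000, "file3.txt": b"c" * 1000}
tar_bytes = tar_bundle(files)          # many files -> one archive
tar_gz_bytes = gzip_bundle(tar_bytes)  # one archive -> one compressed file
```

The archive step knows about file names and boundaries; the compression step only ever sees a single byte stream, which is why the two have to be chained to get a tar.gz.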

Chilkat unzip files only from root directory

zip.UnzipMatching("qa_output","*.xml",true)
With this syntax I can unzip every Xml in every directory from my zip file and create the same directory structure.
But how can I unzip only the xml in the root directory?
I cannot understand how to write the filter.
I tried with "/*.xml" but nothing is extracted.
If I write "*/*.xml" I only extract xml files from subdirectory (and I skip the xml in the root directory!).
Can anyone help me?
example of a zip files content:
a1.xml
b1.xml
c1.xml
dir1\a2.xml
dir1\c2.xml
dir2\dir3\c3.xml
With UnzipMatching("qa_output","*.xml", true) I extract all these files with the original directory structure, but I want to extract only a1.xml, b1.xml and c1.xml.
Is there a way to write filter to achieve this result, or a different command, or a different approach?
I think what you want is to call UnzipMatchingInto: All files (matching the pattern) in the Zip are unzipped into the specified dirPath regardless of the directory path information contained in the Zip. This has the effect of collapsing all files into a single directory. If several files in the Zip have the same name, the files unzipped last will overwrite the files already unzipped.
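For comparison, the root-only filter itself is simple to express: an entry is at the root when its stored name contains no path separator. The sketch below uses Python's standard zipfile module rather than Chilkat, so it shows the filtering idea and not the Chilkat API; extract_root_matching is a hypothetical helper name:

```python
import fnmatch
import zipfile

def extract_root_matching(zip_path, dest_dir, pattern="*.xml"):
    """Extract entries matching pattern that sit at the archive root."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # An entry in a subdirectory has a separator in its name;
            # some tools write backslashes, so check both.
            if "/" in name or "\\" in name:
                continue
            if fnmatch.fnmatch(name, pattern):
                zf.extract(info, dest_dir)
                extracted.append(name)
    return extracted
```

With the example archive from the question, this would pick a1.xml, b1.xml and c1.xml and skip everything under dir1 and dir2.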

Create a .tar.bz2 file given an array of files

In a Bash script, I have an array that contains a list of files (in the form of their complete file paths):
declare -a individual_files=("/path/to/a" "/path/to/b" "/path/to/c")
I want to create a compressed file in tar.bz2 which contains all the files in the array, using tar command.
So far, I have tried
tar rf files.tar "${individual_files[@]}"
tar cjf files.tar.bz2 files.tar
But for some reason, files.tar.bz2 always contains the last file in the array only.
What is the correct command(s) for doing so, preferably without creating the intermediate .tar file?
UPDATED: using @PanRuochen's answer, this is what I see in the verbose info:
+ tar cfvj /Users/skyork/test.tar.bz2 /Users/skyork/.emacs /Users/skyork/.Rprofile /Users/skyork/.aspell.en.pws /Users/skyork/.bash_profile /Users/skyork/.vimrc /Users/skyork/com.googlecode.iterm2.plist
tar: Removing leading '/' from member names
a Users/skyork/.emacs
a Users/skyork/.Rprofile
a Users/skyork/.aspell.en.pws
a Users/skyork/.bash_profile
a Users/skyork/.vimrc
a Users/skyork/com.googlecode.iterm2.plist
But still, the resulting test.tar.bz2 file has only the last file of the array (/Users/skyork/com.googlecode.iterm2.plist) in it.
My bad, the files are indeed there but hidden.
tar cfvj files.tar.bz2 "${individual_files[@]}"
The v flag gives you verbose output showing how the bz2 file is created.
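Since the files in this case were present but hidden (their names start with a dot), listing the archive members directly is a more reliable check than looking at the extracted folder. A small sketch with Python's standard tarfile module, assuming nothing beyond a list of paths; make_tar_bz2 and list_members are hypothetical helper names:

```python
import tarfile

def make_tar_bz2(archive_path, file_paths):
    """Write all file_paths into one bzip2-compressed tar in a single pass."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in file_paths:
            tar.add(path)

def list_members(archive_path):
    """Return every member name, including dotfiles."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

As with the tar command, no intermediate .tar file is created, and the leading / is dropped from member names.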

How to restore a folder structure 7Zip'd with split volume option?

I 7Zip'd a multi-gig folder which contained many folders, each with many files, using the split to volumes (9Meg) option. 7Zip created files of type .zip.001, .zip.002, etc. When I extract .001 it appears to work correctly but I get an 'unexpected end of data' error. 7Zip does not automatically go to .002. When I extract .002, it also gives the same error and it does not continue the original folder/file structure. Instead it extracts a zip file in the same folder as the previously extracted files. How do I properly extract split files to obtain the original folder/file structure? Thank you.

Listing the contents of a LZMA compressed file?

Is it possible to list the contents of an LZMA file (.7z) without uncompressing the whole file? Also, can I extract a single file from the LZMA file?
My problem: I have a 30GB .7z file that uncompresses to >5TB. I would like to manipulate the original .7z file without needing to do a full uncompress.
Yes. Start with XZ Utils. There are Perl and Python APIs.
You can find the file you want from the headers. Each file is compressed separately, so you can extract just the one you want.
Download lzma922.tar.bz2 from the LZMA SDK files page on Sourceforge, then extract the files and open up C/Util/7z/7zMain.c. There, you will find routines to extract a specific archive file from a .7z archive. You don't need to extract all the data from all the entries, the example code shows how to extract just the one you are interested in. This same code has logic to list the entries without extracting all the compressed data.
I solved this problem by installing 7zip (https://www.7-zip.org/) and using the parameter l. For example:
7z l file.7z
The output has some descriptive information and the list of files in the compressed file. Then, I call this inside Python using the subprocess library:
import subprocess
# Run "7z l" and capture its listing as UTF-8 text; check=True raises if 7z fails.
result = subprocess.run(["7z", "l", "file.7z"], capture_output=True, encoding="utf-8", check=True)
output = result.stdout
Don't forget to make sure the program 7z is accessible in your PATH variable. I had to do this manually in Windows.
