Want to grep in .tar.gz file in Solaris - bash

I have files in .tar.gz format on Solaris and want to grep for a line inside them. I am using the zipgrep command, but it fails to find the line. Below are my sample files and the command I am using.
Sample files:
BD201701.tar.gz
BD201702.tar.gz
BD201703.tar.gz
I am using the command below to search for a line that contains bangladesh.
zipgrep 'bangladesh' BD2017*
But it shows the error below.
[BD201701.tar.gz]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
zipinfo: cannot find zipfile directory in one of BD201701.tar.gz or
BD201701.tar.gz.zip, and cannot find BD201701.tar.gz.ZIP, period.
/usr/bin/zipgrep: test: argument expected

zipgrep has been made for processing PKZIP archive files. In PKZIP archives the compression is applied individually on each contained file, so the process boils down to a sequence of operations like this (not actual code!):
foreach file in archive:
unzip file to tmpfile
grep tmpfile
A compressed tar archive is different. First you pack a large bunch of files into an archive, and then the compression is applied to the whole bunch. So to search inside that the whole archive has to be unpacked first, i.e. something like this (pseudocode again):
make and change to temporary directory
tar -xzf ${archive}
grep -R ${searchstring} ./
However, a tar archive itself is just a bunch of files "glued" together, with some information about their filename and size in between. So you could simply decompress the archive into a pipe, disregarding file separation, and search through that (on Solaris the gzip-aware command is gzcat; the stock zcat only handles .Z files):
gzcat file.tar.gz | grep 'pattern'
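The pipe approach can be sketched for several archives at once. This assumes gzip is on the PATH (gzip -dc does the same job as gzcat); search_targz is a hypothetical helper name, not a standard tool. The tr step strips the NUL padding of the tar headers so that grep implementations which detect binary input still print the matching lines.

```shell
# search_targz PATTERN ARCHIVE...
# Hypothetical helper: greps the decompressed tar stream of each archive.
search_targz() {
    pattern=$1
    shift
    for archive in "$@"; do
        # gzip -dc writes the decompressed tar stream to stdout.
        # tr removes NUL bytes so grep treats the stream as text.
        gzip -dc "$archive" | tr -d '\000' | grep -e "$pattern" |
            while IFS= read -r line; do
                # Prefix each match with the archive it came from.
                printf '%s: %s\n' "$archive" "$line"
            done
    done
}
```

Called as search_targz 'bangladesh' BD2017*.tar.gz, it prints one line per match. Because file boundaries inside the archive are ignored, the first line of each member can carry leftover header text, and the output does not say which member a match came from.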

zgrep does not work on Solaris servers. If the requirement is to find a pattern in one or all files inside a directory, given that the files are gzip-compressed, the following command can be used.
gzgrep 'pattern' filename
or
gzgrep 'bangladesh' BD2017*

use zgrep:
zgrep 'bangladesh' BD2017*

Related

Linux Bash - modifying extracted text from stdout

I would like to recursively scan a given directory for all .zip files, extract the text from each file using Apache Tika (in my case this is the /opt/solr/bin/post script) into a single text file, and put that text file into the same directory as the original zip file.
To find all zip files recursively and extract all the content I use:
find . -name "*zip" -exec sh -c 'f="{}"; /opt/solr/bin/post "$f" \
-params="...params..." > "$f.txt"' \;
The content of the extracted file is:
java -classpath /opt/solr/dist/solr-core-8.7.0.jar -Dauto=yes -Dout=yes -
Dparams=literal.search_area=test&extractOnly=true
&extractFormat=text&defaultField=text -Dc=mycoll
-Ddata=files org.apache.solr.util.SimplePostTool zip.zip
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/mycoll/update?
literal.search_area=test&extractOnly=true&extractFormat=text
&defaultField=text...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,
odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file zip.zip (application/octet-stream) to [base]/extract
{
"responseHeader":{
"status":0,
"QTime":1614},
"":"**EXTRACTED TEXT**",
"null_metadata":[
"stream_size",["79855"],
"X-Parsed-By",["org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pkg.PackageParser"],
"stream_content_type",["application/octet-stream"],
"resourceName",["/mnt/remote/users/zhilov/!tmp/zip.zip"],
"Content-Type",["application/zip"]]}
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/mycoll/update?
literal.search_area=test&extractOnly=true&
extractFormat=text&defaultField=text...
Time spent: 0:00:03.495
From that output I would like to cut out the beginning and the end of the file leaving only EXTRACTED TEXT inside of the generated file for further indexing.
Is it possible to do all those operations in one bash command line? Or at least with a bash script?
Try this:
sed -n '/QTime/{N;s/.*\n.*:.//;s/.,$//p;}'
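As a sketch of how that sed command behaves, the wrapper below (extract_text is a hypothetical name) reads the tool output on stdin and prints only the extracted text. It relies on the layout shown in the question: the text sits on the single line directly after the QTime line.

```shell
# Hypothetical wrapper around the sed one-liner above.
extract_text() {
    # On the QTime line, append the next line (N), delete everything up to
    # and including the last colon plus the opening quote, then drop the
    # trailing quote and comma and print the remainder.
    sed -n '/QTime/{N;s/.*\n.*:.//;s/.,$//p;}'
}

# Demonstration on a shortened copy of the sample output.
printf '%s\n' \
    '"responseHeader":{' \
    '"status":0,' \
    '"QTime":1614},' \
    '"":"**EXTRACTED TEXT**",' \
    '"null_metadata":[' | extract_text
```

Inside the find loop from the question, the output of the post script could be piped through extract_text before being redirected to "$f.txt". Two caveats: text that spans several lines is not captured, and because .*: is greedy, a colon inside the extracted text would cut off everything before it.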
This question addresses the UTF-8 problem.

Extracting specific files with file extension from a .tar.xz archive using MacOS terminal

I have a number of compressed archives with the extension .tar.xz. I am advised that, when decompressed, the total size required is around 2TB.
Within the archives are a number of images that I am solely after.
Is there a way to extract only files with particular extensions, for example .jpeg and .gif, from the compressed archives without having to extract every file?
Thanks
It's trivial to just extract one of the file types; for example:
tar -xJf archive.tar.xz '*.jpeg'
will extract all files with the .jpeg extension. (-J selects xz; -j is for bzip2, although the tar shipped with macOS detects the compression automatically when extracting.) It's important to quote the *, as otherwise the shell would attempt to expand it, and tar would only try to match the files that were found (or fail because there were no files with that name).
You can similarly use other patterns like '*.gif', or both together:
tar -xJf archive.tar.xz '*.jpeg' '*.gif'
Because you tagged that you're using OS X, I'll skip the --wildcards option, which is needed when extracting by pattern with GNU tar on Linux.
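For a script that has to run on both macOS and Linux, a minimal sketch could pick the option at run time. extract_matching is a hypothetical helper name, and the GNU check assumes that tar --version mentions "GNU tar". It uses plain -xf and relies on tar detecting the compression itself, which current GNU tar and bsdtar both do when reading from a file.

```shell
# extract_matching ARCHIVE DESTDIR PATTERN...
# Hypothetical helper: extracts only members matching the given patterns.
extract_matching() {
    archive=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    if tar --version 2>/dev/null | grep -q 'GNU tar'; then
        # GNU tar treats member arguments literally unless --wildcards is given.
        tar -xf "$archive" -C "$dest" --wildcards "$@"
    else
        # bsdtar (macOS) matches patterns by default.
        tar -xf "$archive" -C "$dest" "$@"
    fi
}
```

For example, extract_matching archive.tar.xz ./images '*.jpeg' '*.gif'. Note that GNU tar reports an error if one of the patterns matches nothing in the archive.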

Create a .tar.bz2 file given an array of files

In a Bash script, I have an array that contains a list of files (in the form of their complete file paths):
declare -a individual_files=("/path/to/a" "/path/to/b" "/path/to/c")
I want to create a compressed file in tar.bz2 which contains all the files in the array, using tar command.
So far, I have tried
tar rf files.tar "${individual_files[@]}"
tar cjf files.tar.bz2 files.tar
But for some reason, files.tar.bz2 always contains the last file in the array only.
What is the correct command(s) for doing so, preferably without creating the intermediate .tar file?
UPDATED: using @PanRuochen's answer, this is what I see in the verbose info:
+ tar cfvj /Users/skyork/test.tar.bz2 /Users/skyork/.emacs /Users/skyork/.Rprofile /Users/skyork/.aspell.en.pws /Users/skyork/.bash_profile /Users/skyork/.vimrc /Users/skyork/com.googlecode.iterm2.plist
tar: Removing leading '/' from member names
a Users/skyork/.emacs
a Users/skyork/.Rprofile
a Users/skyork/.aspell.en.pws
a Users/skyork/.bash_profile
a Users/skyork/.vimrc
a Users/skyork/com.googlecode.iterm2.plist
But still, the resulting test.tar.bz2 file has only the last file of the array (/Users/skyork/com.googlecode.iterm2.plist) in it.
My bad, the files are indeed there but hidden.
tar cfvj files.tar.bz2 "${individual_files[@]}"
The v flag makes tar print each file as it is added to the archive.

Unable to create the md5sum file I need to create. Manually doing it would be far too labour-intensive

I need to create/recreate an md5sum file for all files in a directory and all files in all sub-directories of that directory.
I am using a rockettheme template that requires a valid md5sum document and I have made changes to the files, so the originally included md5sum file is no longer valid.
There are over 300 files that need to be checksummed, and the md5hash added to a single file.
The basic structure of the file is as follows:
1555599f85c7cd6b3d8f1047db42200b admin/forms/fields/imagepicker.php
8a3edb0428f11a404535d9134c90063f admin/forms/fields/index.html
8a3edb0428f11a404535d9134c90063f admin/forms/index.html
8a3edb0428f11a404535d9134c90063f admin/index.html
8a3edb0428f11a404535d9134c90063f admin/presets/index.html
b6609f823ffa5cb52fc2f8a49618757f admin/presets/preset1.png
7d84b8d140e68c0eaf0b3ee6d7b676c8 admin/presets/preset2.png
0de9472357279d64771a9af4f8657c2a admin/presets/preset3.png
5bda28157fe18bffe11cad1e4c8a78fa admin/presets/preset4.png
2ff2c5c22e531df390d2a4adb1700678 admin/presets/preset5.png
4b3561659633476f1fd0b88034ae1815 admin/presets/preset6.png
8a3edb0428f11a404535d9134c90063f admin/tips/index.html
2afd5df9f103032d5055019dbd72da38 admin/tips/overview.xml
79f1beb0ce5170a8120ba65369503bdc component.php
caf4a31db542ca8ee63501b364821d9d css/grid-responsive.css
8a3edb0428f11a404535d9134c90063f css/index.html
8697baa2e31e784c8612e2c56a1cd472 css/master-gecko.css
0857bc517aa15592eb796553fd57668b css/master-ie10.css
a4625ce5b8e23790eacb7704742bf735 css/master-ie8.css
This is just a snippet, but the logic is there.
hash path/to/file/relative/to/MD5SUM_file
Can anyone help me write a shell script (bash shell) that I can add to my path that will execute and generate a file called "MD5SUM_new"? I want the output file name to be "MD5SUM_new" so I can review the content before issuing a mv MD5SUM_new MD5SUM
FYI, the MD5SUM_new file needs to be saved in the root level of the template.
Thanks
This is quite easy, really. To hash all files under the current directory (skipping the output file itself, and coping with spaces in file names):
find . -type f ! -name md5sums -exec md5sum {} + > md5sums
Then, you can make sure it's correct:
md5sum -c md5sums
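To match the format asked for in the question (paths relative to the template root, output in MD5SUM_new), the idea can be wrapped in a small function. This is only a sketch: make_md5sum_file is a hypothetical name, it assumes GNU md5sum, and it assumes the checksum files MD5SUM and MD5SUM_new should not list themselves.

```shell
# make_md5sum_file [ROOT]
# Hypothetical helper: writes ROOT/MD5SUM_new with "hash  relative/path" lines.
make_md5sum_file() {
    root=${1:-.}
    (
        cd "$root" || exit 1
        # Hash every regular file except the checksum files themselves,
        # strip the leading "./" that find adds, and sort by path.
        find . -type f ! -path ./MD5SUM ! -path ./MD5SUM_new -exec md5sum {} + |
            sed 's|  \./|  |' |
            LC_ALL=C sort -k 2 > MD5SUM_new
    )
}
```

Run it as make_md5sum_file /path/to/template, review the result, check it with md5sum -c MD5SUM_new from the template root, and then mv MD5SUM_new MD5SUM.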

Replacing a file in a zip archive

Using Ruby (1.9.3) I need to replace a single file in a zip archive.
The situation is as follows. I have ~1000 zip archives that need to be updated, specifically one file in each of them needs to be replaced. The archives are all of the same structure. Is there a quick and dirty way for Ruby, or a library/gem for Ruby, to simply say "replace the file in this zip archive with this file on the filesystem"?
I'll work on a solution of my own in the meantime.
You can call the zip command from Ruby, which will probably be the best solution. From the zip manpage:
-d
--delete
Remove (delete) entries from a zip archive. For example:
zip -d foo foo/tom/junk foo/harry/\* \*.o
will remove the entry foo/tom/junk, all of the files that start with foo/harry/, and all of the files that end with .o (in any path). Note that shell pathname expansion has been inhibited with backslashes, so that zip can see the asterisks, enabling zip to match on the contents of the zip archive instead of the contents of the current directory. (The backslashes are not used on MSDOS-based platforms.) Can also use quotes to escape the asterisks as in
zip -d foo foo/tom/junk "foo/harry/*" "*.o"
Not escaping the asterisks on a system where the shell expands wildcards could result in the asterisks being converted to a list of files in the current directory and that list used to delete entries from the archive.
Under MSDOS, -d is case sensitive when it matches names in the zip archive. This requires that file names be entered in upper case if they were zipped by PKZIP on an MSDOS system. (We considered making this case insensitive on systems where paths were case insensitive, but it is possible the archive came from a system where case does matter and the archive could include both Bar and bar as separate files in the archive.) But see the new option -ic to ignore case in the archive.
If you want a pure ruby solution take a look at ZipFileSystem
Zip::ZipFile looks promising. It appears to have a way to delete and add files to a zip archive.
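For the zip command route, deleting first is not strictly needed: when zip adds a file whose name matches an existing entry, it replaces that entry. A minimal sketch, assuming the Info-ZIP zip tool is installed; replace_in_zip and its arguments are hypothetical names used for illustration.

```shell
# replace_in_zip ARCHIVE SRCDIR MEMBER
# Hypothetical helper: replaces MEMBER in ARCHIVE with SRCDIR/MEMBER.
replace_in_zip() {
    archive=$1   # path to the .zip archive
    srcdir=$2    # directory laid out like the archive contents
    member=$3    # entry path inside the archive, e.g. conf/a.txt
    # Resolve the archive to an absolute path before changing directory.
    archive_abs=$(cd "$(dirname "$archive")" && pwd)/$(basename "$archive")
    # Run zip from srcdir so the stored name equals the member path;
    # an existing entry with the same name is replaced.
    (cd "$srcdir" && zip -q "$archive_abs" "$member")
}
```

From Ruby this could be invoked once per archive via system, or the loop over the ~1000 archives could stay in the shell.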
