How to create tar files automatically - bash

I like to create tar-files to distribute some scripts using bash.
For every script certain configuration-files and libraries (or toolboxes) are needed,
e.g. a script called CheckTool.py needs Checks.ini, CheckToolbox.py and CommontToolbox.py to run, which are stored in specific folders on my harddisk and need to be copied in the same manner on the users harddisk.
I can create a tarfile manually for each script, but i like to have it more simple.
For this i have the idea to define a list of all needed files and their pathes for a specific script and read this in a bashscript, which creates the tar file.
I started with:
#!/bin/bash
while read line
do
echo "$line"
done < $1
Which is reading the files and pathes. In my example the lines are:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
My question is how do I have to organize the data to make a tar file with the specified files using bash?
Or is there someone having a better idea?

No need for a complicated script, use option -T of tar. Every file listed in there will be added to the tar file:
-T, --files-from FILE
get names to extract or create from FILE
So your script becomes:
#!/bin/bash
tar -cvpf something.tar -T listoffiles.txt
listoffiles.txt format is super easy, one file per line. You might want to put full path to ensure you get the right files:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
You can add tar commands to the script as needed, or you could loop on the list files, from that point on, your imagination is the limit!

Related

How do I use grep command to search in .bz2.gz.bz2 file?

Basically I have .bz2.gz.bz2 file which on extraction gives a .bz2.gz file and on again extraction gives .bz2 file. In this .bz2 file, is my txt file which I want to search on using grep command. I have searched for this but I got bzgrep command which will only search in bz2 file and not the corresponding .gz.bz2 file and give me no results.
Is there a command in unix system which will recursively search in a zipped archive for zipped archive and return results only when it finds the txt file inside it?
P.S: the txt file may be deep in the archive to level 10 max. I want the command to recursively find the txt file and search for the required string. And there will be no other than an archive inside the archive until the txt file level.
I'm not sure I fully understand but maybe this will help:
for i in /path/to/each/*.tar.bz2; do
tar -xvjf "$i" -C /path/to/save/in
rm $i
done
extract all `tar.bz2` and save them in directory then remove the `.bz2`
Thnx for sharing your question.
There are a couple of strange things with it though:
It makes no sense to have a .bz2.gz.bz2 file, so have you created this file yourself? If so, I'd advise you to reconsider doing so in that manner.
Also, you mention there is a .bz2 that would apparently contain different archives, but a .bz2 can only contain one single file by design. So if it contains archives it is probably a .tar.bz2 file in which the tar-file holds the actual archives.
In answer to your question, why can't you write a simple shell script that will unpack your .bz2.gz.bz2 into a .bz2.gz and then into a .bz2 file and then execute your bzgrep command on that file?
I do not understand where it is exactly that you seem to get stuck..

How can I find a directory-diff of millions of files to script maintenance?

I have been working on how to verify that millions of files that were on file system A have infact been moved to file system B. While working on a system migration, it became evident that all the files needed to be audited to prove that the files have been moved. The files were initially moved via rsync, which does provide logs, although not in a format that is helpful for doing an audit. So, I wrote this script to index all the files on System A:
#!/bin/bash
# Get directories and file list to be used to verify proper file moves have worked successfully.
LOGDATE=`/usr/bin/date +%Y-%m-%d`
FILE_LIST_OUT=/mounts/A_files_$LOGDATE.txt
MOUNT_POINTS="/mounts/AA mounts/AB"
touch $FILE_LIST_OUT
echo TYPE,USER,GROUP,BYTES,OCTAL,OCTETS,FILE_NAME > $FILE_LIST_OUT
for directory in $MOUNT_POINTS; do
# format: type,user,group,bytes,octal,octets,file_name
gfind $directory -mount -printf "%y","%u","%g","%s","%m","%p\n" >> $FILE_LIST_OUT
done
The file indexing works fine and takes about two hours to index ~30 million files.
On side B is where we run into issues. I have written a very simple shell script that reads the index file, tests to see if the file is there, and then counts up how many files are there, but it's running out of memory while looping through the 30 million lines on indexed file names. Effectively doing this little bit of code below through a while loop, and counters to increment for files found and not found.
if [ -f "$TYPE" "$FILENAME" ] ; then
print file found
++
else
file not found
++
fi
My questions are:
Can a shell script do this type of reporting from such a large list. A 64 bit unix system ran out of memory while trying to execute this script. I have already considered breaking up the input script into smaller chunks to make it faster. Currently it can
If as shell script is inappropriate, what would you suggest?
You just used rsync, use it again...
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn’t affect the data that goes into the file-lists, and thus it doesn’t affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore existing will ensure that the already-handled files don’t get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
That will actually fix any problems (at least in the same sense that any diff-list on file-exist tests could fix problem. Using --ignore-existing means rsync only does the file-exist tests (so it'll construct the diff list as you request and use it internally). If you just want information on the differences, check --dry-run, and --itemize-changes.
Lets say you have two directories, foo and bar. Let's say bar has three files, 1,2, and 3. Let's say that bar, has a directory quz, which has a file 1. The directory foo is empty:
Now, here is the result,
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/
>f+++++++++ 1
>f+++++++++ 2
>f+++++++++ 3
cd+++++++++ quz/
>f+++++++++ quz/1
Note, you're not interested in the cd+++++++++ -- that's just showing you that rsync issued a chdir. Now, let's add a file in foo called 1, and let's use grep to remove the chdir(s),
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/ | grep -v '^cd'
>f+++++++++ 2
>f+++++++++ 3
>f+++++++++ quz/1
f is for file. The +++++++++ means the file doesn't exist in the DEST dir.
Here is the bonus, remove --dry-run, and, it'll go ahead and make the changes for you.
Have you considered a solution such as kdiff3, which will diff directories of files ?
Note the feature for version 0.9.84
Directory-Comparison: Option "Full Analysis" allows to show the number
of solved vs. unsolved conflicts or deltas vs. whitespace-changes in
the directory tree.
There is absolutely no problem reading a 30 million line file in a shell script. The reason why your process failed was most likely that you tried to read the file entirely into memory, e.g. by doing something wrong like for i in $(cat file).
The correct way of reading a file is:
while IFS= read -r line
do
echo "Something with $line"
done < someFile
A shell script is inappropriate, yes. You should be using a diff tool:
diff -rNq /original /new
If you're not particular about the solution being a script, you could also look into meld, which would let you diff directory trees quite easily and you can also set ignore patterns if you have any.

shell "if" statement

I am new to unix and am practicing a simple script to unzip a load of files within a specified directory. I need the program to move the zipped file into another folder when it is done unzipping it (I called this oldzipped folder). For simplicity, I have removed the part of the code unzipping the file and currently have the program working for a specific file rather than the *tar.7z file extention. For some reason, the mv statement is not working. Unix is saying the following when I try to run the script. Could someone give me a hand with this? Again, I know this is the long way of doing things, but I want practice writing a script. Please be nice, as I am very new to Unix :(
unzip5: line 14: [ASDE0002.tar.7z]: command not found
#!~/bin/bash
# My program to try to unzip several files with ending of tar.7z
# I have inserted the ability to enter the directory where you want this to be done
echo "What file location is required for unzipping?"
read dirloc
cd $dirloc
mkdir oldzippedfiles
for directory in $dirloc
do
if
[ASDE0002.tar.7z]
then
mv -f ASDE0002.tar.7z $dirloc/oldzippedfiles
fi
done
echo "unzipping of file is complete"
exit 0
[ is the name of a (sometimes built-in) command which accepts arguments. As such you need to put a space after it as you would when invoking any other program. Also, you need a test. For example, to determine if the file exists and is a file, you need to use -f:
if [ -f ASDE0002.tar.7z ]
then
mv -f ASDE0002.tar.7z $dirloc/oldzippedfiles
fi
Here are some other possible tests.

extract specific folder in shell command using unzip

My backup.zip has the following structure.
OverallFolder
lots of files and subfolders inside
i used this unzip backup.zip -d ~/public_html/demo
so i end up with ~/public_html/demo/OverallFolder/my other files.
How do i extract so that i end up with all my files INSIDE OverallFolder GOING DIRECTLY into ~public_html/demo?
~/public_html/demo/my other files
like this?
if you can't find any options to do that, this is the last resort
mv ~/public_html/demo/OverallFolder/* ~/public_html/demo/
(cd ~public_html/demo; unzip $OLDPWD/backup.zip)
This, in a subshell, changes to your destination directory, unzips the file from your source directory, and when the subshell exits, leaves you back in your source directory.
That, or something similar, should work in most shells.

Bash script to archive files and then copy new ones

Need some help with this as my shell scripting skills are somewhat less than l337 :(
I need to gzip several files and then copy newer ones over the top from another location. I need to be able to call this script in the following manner from other scripts.
exec script.sh $oldfile $newfile
Can anyone point me in the right direction?
EDIT: To add more detail:
This script will be used for monthly updates of some documents uploaded to a folder, the old documents need to be archived into one compressed file and the new documents, which may have different names, copied over the top of the old. The script needs to be called on a document by document case from another script. The basic flow for this script should be -
The script file should create a new gzip
archive with a specified name (created from a prefix constant in the script and the current month and year e.g. prefix.september.2009.tar.gz) only if it
does not already exist, otherwise add to the existing one.
Copy the old file into the archive.
Replace the old file with the new one.
Thanks in advance,
Richard
EDIT: Added mode detail on the archive filename
Here's the modified script based on your clarifications. I've used tar archives, compressed with gzip, to store the multiple files in a single archive (you can't store multiple files using gzip alone). This code is only superficially tested - it probably has one or two bugs, and you should add further code to check for command success etc. if you're using it in anger. But it should get you most of the way there.
#!/bin/bash
oldfile=$1
newfile=$2
month=`date +%B`
year=`date +%Y`
prefix="frozenskys"
archivefile=$prefix.$month.$year.tar
# Check for existence of a compressed archive matching the naming convention
if [ -e $archivefile.gz ]
then
echo "Archive file $archivefile already exists..."
echo "Adding file '$oldfile' to existing tar archive..."
# Uncompress the archive, because you can't add a file to a
# compressed archive
gunzip $archivefile.gz
# Add the file to the archive
tar --append --file=$archivefile $oldfile
# Recompress the archive
gzip $archivefile
# No existing archive - create a new one and add the file
else
echo "Creating new archive file '$archivefile'..."
tar --create --file=$archivefile $oldfile
gzip $archivefile
fi
# Update the files outside the archive
mv $newfile $oldfile
Save it as script.sh, then make it executable:
chmod +x script.sh
Then run like so:
./script.sh oldfile newfile
something like frozenskys.September.2009.tar.gz, will be created, and newfile will replace oldfile. You can also call this script with exec from another script if you want. Just put this line in your second script:
exec ./script.sh $1 $2
A good refference for any bash scripting is Advanced Bash-Scripting Guide.
This guide explains every thing bash scripting.
The basic approach I would take is:
Move the files you want to zip to a directory your create.
(commands mv and mkdir)
zip the directory. (command gzip, I assume)
Copy the new files to the desired location (command cp)
In my experience bash scripting is mainly knowing how to use these command well and if you can run it on the command line you can run it in your script.
Another command that might be useful is
pwd - this returns the current directory
Why don't you use version control? It's much easier; just check out, and compress.
(apologize if it's not an option)

Resources