Bash Script to Extract split archive files using rar - bash

Here's what I want to do:
I have thousands of split rar archives in a folder named Archives.
The files are named 0001.part1.rar 0002.part2.rar 0003.part3.rar etc.
1. Read 0001.part1.rar
2. Create a directory based on the prefix of the file above, e.g. 0001
3. Move all files with the same prefix into the directory created above.
4. Extract the files within that directory
5. Delete all rar files in that directory
6. Rename the extracted file based on a list of names from a text file.
7. Rar the renamed file with different arguments.
8. Move the renamed file (from step 6) to a new directory called Done.
9. Proceed to file 0002.part1.rar, then do steps 2-8, and so forth.
Additionally, how would I incorporate it with cron?
This should be run only once.
After extracting the first set of rars, the files change to:
file.001
file.002
file.003
etc. which I need to extract too.
Clarification on Step 6:
After extracting the second set of rars (file.001, file.002 etc.)
I want to rename it based on a list of names from a text file.
e.g. List of files from a text file:
0001 - GB Funds.DAT
0002 - US Bonds.DAT
0003 - SG Securities.DAT
0004 - LU Credits.DAT
Clarification on Step 7:
After renaming the file I want to move it to a new folder called "Done"
Clarification on Step 9:
Go back to the main folder with all the other archives
and continue extracting the next set of archives and
do the same steps from 1 to 8.

You can write a shell script including something like this:
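# foo.sh
set -e
set -u
# Collect the unique prefixes (0001, 0002, ...) of all split archives.
for i in $(find . -maxdepth 1 -type f -name '*.part*.rar' | sed 's|^\./||; s/\.part[0-9]*\.rar$//' | sort -u); do
mkdir "$i"
mv "$i".part*.rar "$i"
cd "$i"
unrar x "$i.part1.rar"
# Look up the new name for this prefix in ../rename.txt ("0001 - GB Funds.DAT").
DST=$(grep "^$i - " ../rename.txt | sed 's/^[0-9]\+ - //')
# Assumes the extracted file is named after the prefix.
mv "$i" "$DST"
# and so on, rar it together again, move it to the Done directory etc.
cd ..
done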
Run it then via:
bash foo.sh
You will have to clarify steps 6, 8 and 9.
I don't know why you want to run it via cron, since you only want to run it once. The at command is designed for running one-time jobs; alternatively, run it in a screen session.
I suggest that you do a few tests with 1-3 files from your collection and the script you end up with, before starting the whole job.
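For a small test run like that, two pieces of the job can be tried in isolation: collecting the unique archive prefixes and looking up the new name for a prefix. The following is a minimal sketch, not a finished script. The function names list_prefixes and lookup_name are made up for illustration, and it assumes the list file uses the "0001 - Name" layout shown in the question and that prefixes contain only digits.

```shell
#!/bin/bash
# Hypothetical helpers for testing on a few files first.

# list_prefixes DIR: print each unique archive prefix in DIR, one per line.
# "0001.part1.rar" and "0001.part2.rar" both yield "0001".
list_prefixes() {
    for f in "$1"/*.part*.rar; do
        [ -e "$f" ] || continue          # the glob matched nothing
        f=${f##*/}                       # strip the directory part
        printf '%s\n' "${f%.part*.rar}"  # strip ".partN.rar"
    done | sort -u
}

# lookup_name PREFIX LISTFILE: print the name listed for PREFIX.
# Assumes lines like "0001 - GB Funds.DAT" and a digits-only PREFIX.
lookup_name() {
    sed -n "s/^$1 - //p" "$2" | head -n 1
}
```

For example, list_prefixes Archives prints the prefixes found in the Archives folder, and lookup_name 0001 rename.txt prints the name listed for 0001. Neither function moves or deletes anything, so both are safe to run against the real collection.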

Related

Mac OS - Batch Rename All Files in Folder but Disregard All SubFolders

I have a bunch of folders in which I would like to rename all the files, excluding any subfolders.
For example, let's say I have two parent folders:
ParentFolder1 - [PF1]
ParentFolder2 - [PF2]
Each parent folder has various amounts of subfolders:
SubParentFolder1_1
SubParentFolder1_2
SubParentFolder2_1
Inside the ParentFolder and each SubParentFolder there can be files such as .mp3, .txt, etc., or more subfolders.
How would I go about renaming all and any files in this manner:
example.mp3 -> example - [PF1]
example.txt -> example - [PF2]
example.docx -> example - [PF2]
Appreciate any input!
This is a way to list files (not folders) in a range of directories and then do something with them... Specifics of renaming are up to you.
for FOLD in Parent*;
do for FILE in $(ls -p $FOLD | grep -v "/");
do echo "$FOLD/$FILE" "$FOLD/${FILE%.*}";
done; done;
Explanation:
For each folder (FOLD) in directories matching the wildcard Parent*,
list the contents, adding / to the end of directory names.
Do inverse grep on that list, leaving just the file names.
Loop through each FILE and echo out the original folder+file, followed by the folder and file with the suffix removed by pattern matching.
Once you test this, you can replace echo with mv to do the actual renaming. (I've put these on separate lines to make them more readable, but would usually run this as one long command.)
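If file names may contain spaces (as the parent folder names in the question do), parsing ls output becomes fragile. A hedged alternative sketch uses a glob and a -f test instead. The name plan_renames is hypothetical, and the tag is assumed to be whatever follows the last " - " in the folder name. It only prints the planned renames and changes nothing.

```shell
#!/bin/bash
# plan_renames DIR: print "old -> new" for each regular file directly in DIR.
# Subfolders are skipped; nothing is renamed.
plan_renames() {
    dir=${1%/}
    tag=${dir##* - }    # assumption: "ParentFolder1 - [PF1]" gives "[PF1]"
    for path in "$dir"/*; do
        [ -f "$path" ] || continue    # skips subfolders and an unmatched glob
        name=${path##*/}
        # ${name%.*} drops the last extension, e.g. example.mp3 -> example
        printf '%s -> %s\n' "$path" "$dir/${name%.*} - $tag"
    done
}
```

Running plan_renames "ParentFolder1 - [PF1]" lists one line per file. Once the output looks right, the printf line could be swapped for a quoted mv call.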

How to copy specific files from directories, while the directories name was extracted from an excel file using Bash script

I'm new to Bash and I have a list of directory names stored in an Excel file. I'd like to find those directories (they are located in different places on the computer) and copy specific files from each directory (a list of 4 files with specific endings) to a remote computer.
For example:
For a directory name in the Excel sheet, "NA123", I'd like to find it and copy part of its contents to a remote computer, for example copy the files samples-sheet.csv, toInfo.xml, newfiles.gz, todo.csv to the remote computer, under a folder named "NA123".
How do I begin to do that?
Edit: an example of how it needs to work
A short example of the csv is as below:
A
1 14RD00129_TS1_01
2 SD-2015-06_01
3 US-005
4 RA99
All the names in the csv are directories that can be found under /home/bella/samples in 3 different folders: some will be in /home/bella/samples/gruop_1, some in /home/bella/samples/gruop_2, and some in /home/bella/samples/gruop_3
So first I need to iterate through the csv file to locate the matching directory on my computer, then I need to copy 4 specific files to a remote computer into a directory with the same name. Hope this is clearer...
I guess your CSV file should only consist of directory names then, since there's only one column. I assume there is no header line in the CSV (A in your example) and no line number. You can take this as a starting point:
samples='/home/bella/samples'
while IFS= read -r line; do
dir=$(find "$samples"/gruop_{1..3} -type d -name "$line")
scp "$dir"/{samples-sheet.csv,toInfo.xml,newfiles.gz,todo.csv} \
user@host.com:"/path/to/$line"
done < 'file.csv'
Basically, you could do something like:
# create the directory on the remote:
ssh remote-ip 'mkdir -p NA123'
# copy the files to the remote in the directory just created
for f in samples-sheet.csv toInfo.xml newfiles.gz todo.csv; do scp $f remote-ip:NA123/; done
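The two answers can be combined into a dry run that prints the ssh and scp commands instead of executing them. This is only a sketch under assumptions: the helper names clean_names, locate_dir and plan_copy are hypothetical, the CSV is assumed to hold one directory name per line (possibly with Windows line endings from the Excel export), and each directory is assumed to sit directly inside one of the group folders.

```shell
#!/bin/bash
# clean_names CSV: one directory name per line, without CRs or blank lines.
clean_names() {
    tr -d '\r' < "$1" | sed '/^[[:space:]]*$/d'
}

# locate_dir ROOT NAME: print the first directory called NAME that sits
# directly inside one of the group folders (ROOT/<group>/NAME).
locate_dir() {
    find "$1" -mindepth 2 -maxdepth 2 -type d -name "$2" | head -n 1
}

# plan_copy ROOT CSV REMOTE: print (do not run) the commands for each name.
plan_copy() {
    clean_names "$2" | while IFS= read -r name; do
        dir=$(locate_dir "$1" "$name")
        if [ -z "$dir" ]; then
            echo "# not found: $name"
            continue
        fi
        echo "ssh $3 mkdir -p '$name'"
        echo "scp '$dir'/{samples-sheet.csv,toInfo.xml,newfiles.gz,todo.csv} $3:'$name'/"
    done
}
```

A call such as plan_copy /home/bella/samples file.csv user@host.com prints the commands for review. Names that cannot be located show up as "# not found" comments rather than producing a broken scp call.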

Files not found in the current working dir

I have a really basic question. I'm trying to run STAR to align some reads from a sequencing experiment. I have around 30 samples. I first split the fastq.gz files into 30 folders according to the sample of origin. In other words, in each folder corresponding to a Sample* there are 4 files named *.fastq.gz (4 because of 4 lanes). I'm trying to loop STAR over each folder in this way:
for i in Sample*/;
do
cd $i;
qsub my_align_script.sh;
cd ..; done
where my_align_script.sh contains the following:
for i in *fastq.gz; do
STAR --runMode alignReads --genomeLoad LoadAndKeep --readFilesCommand zcat --outSAMtype BAM Unsorted --genomeDir "path" --readFilesIn $i ${i%.fastq.gz}.fastq.gz --runThreadN 10 --outFileNamePrefix ${i%.fastq.gz}
done
but unfortunately it does not seem to find any *fastq.gz file.
I tried to force it to look in the current working directory of each Sample* folder by specifying cd $path at the beginning of the script, but it still finds nothing when I run the for loop to launch the jobs over the folders.

How to create multiple files in each directories and then compress it through tar (BASH)

What I am currently struggling with is creating multiple files and storing them in each directory.
I have made 100 directories like this:
mkdir ./dir.{01..100}
What I want is to create 100 text files in each directory, so that the result looks like this:
opening dir.01 in the home directory shows files named 01.txt to 100.txt.
I also want to compress all 100 directories, each containing 100 text files, into one archive using tar.
I am struggling with:
creating 100 text files in each of the 100 directories
using tar to bundle all 100 directories together.
I am interested in creating the 100 text files in each of the 100 directories, and MUCH MORE interested in how to use tar to join all 100 directories together in a specific file (fdtar, for instance).
If you are fine with empty files,
touch ./dir.{01..100}/{01..100}.txt
If you need each file to contain something, use that as the driver in a loop:
for file in ./dir.{01..100}/{01..100}.txt; do
printf "This is the file %s\n" "$file">"$file"
done
The touch command could bump into ARG_MAX ("argument list too long") on some platforms, because all 10,000 names are passed to a single process, but it works fine on macOS and should work fine on any reasonably standard Linux.
Splitting the work into an inner and an outer loop keeps each expansion small and could work around that problem:
for dir in ./dir.{01..100}; do
for file in {01..100}.txt; do
printf "This is file %s/%s\n" "$dir" "$file" >"$dir/$file"
done
done
If I understand correctly, you need two things. First, you have 100 directories and need to create a file in each. With a for loop in bash, run from the parent directory that contains all the directories you have created:
for n in dir.*
do
f=$(echo "$n" | sed 's/dir\.//')
echo "This is file $n" >"$n/$f.txt"
done
Regarding tar that is even easier because tar will take multiple directories and glue them together. From the parent directory try:
tar cvf fd.tar dir.*
The c option will create the archive. v will tell tar to print all it is doing so you know what is happening. f fd.tar will create the archive with that name.
When you undo the tar operation, you will use:
tar xvf fd.tar
In this case x will extract the contents of the tar archive and will create all 100 directories for you at the directory from which you invoke it.
Note that I have used fd.tar and not fdtar as the .tar extension is the customary way to signal that the file is a tar archive.
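Both parts can be tried on a small scale before running the full 100 by 100 case. The sketch below is an illustration only: make_tree and pack_tree are hypothetical names, and counters with printf are used instead of brace expansion so the zero padding does not depend on the bash version.

```shell
#!/bin/bash
# make_tree BASE DIRS FILES: create BASE/dir.NN/MM.txt, each holding one line.
make_tree() {
    d=1
    while [ "$d" -le "$2" ]; do
        dir=$(printf '%s/dir.%02d' "$1" "$d")
        mkdir -p "$dir"
        f=1
        while [ "$f" -le "$3" ]; do
            file=$(printf '%s/%02d.txt' "$dir" "$f")
            printf 'This is file %s\n' "$file" > "$file"
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}

# pack_tree BASE ARCHIVE: archive every dir.* under BASE into ARCHIVE.
# A relative ARCHIVE path is created inside BASE.
pack_tree() {
    ( cd "$1" && tar cf "$2" dir.* )
}
```

For the case in the question this would be make_tree . 100 100 followed by pack_tree . fd.tar, and tar tf fd.tar lists the archive contents without extracting them.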

Adding a status (file integrity)check to a cbr cbz converting bash script

First post, so Hi! Let me start by saying I'm a total noob regarding programming. I understand very basic stuff, but when it comes to checking exit codes or what the adequate term is, I'm at a loss. Apparently my searchfoo is really weak in this area, I guess it's a question of terminology.
Thanks in advance for taking your time to reading this/answering my question!
Description: I found a script that converts/repacks .cbr files to .cbz files. These files are basically your average rar and zip files, just renamed to another extension, as they are used by (comic) book applications such as comicrack, qcomicbook and whatnot. Surprisingly enough, there are no cbr -> cbz converters out there. The advantage of .cbz, besides escaping the proprietary rar file format, is that one can store the metadata from Comic Vine with e.g. comictagger.
Issue: Sometimes the repackaging of the files doesn't end well, which would hopefully be alleviated by an integrity check and another go. I modified said script slightly to use p7zip, as it can pack/unpack 7z, zip files and some others, i.e. great for options. p7zip can test the archive by:
7z t comicfile.cbz tmpworkingdir
I guess it's a matter of using if and else here(?) to check the integrity and then give it another go if there are any errors.
Question/tl;dr: What would be the "best"/adequate approach to add an integrity check to the script below?
#!/bin/bash
#Source: http://comicrack.cyolito.com/forum/13-scripts/30013-cbr3cbz-rar-to-zip-conversion-for-linux
echo "Converting CBRs to CBZs"
# Set the "field separator" to something other than spaces/newlines" so that spaces
# in the file names don't mess things up. I'm using the pipe symbol ("|") as it is very
# unlikely to appear in a file name.
IFS="|"
# Set working directory where to create the temp dir. The user you are using must have permission
# to write into this directory.
# For performance reasons I'm using ram disk (/dev/shm/) in Ubuntu server.
WORKDIR="/dev/shm/"
# Set name for the temp dir. This directory will be created under WORKDIR
TEMPDIR="cbr2cbz"
# The script should be invoked as "cbr2cbz {directory}", where "{directory}" is the
# top-level directory to be searched. Just to be paranoid, if no directory is specified,
# then default to the current working directory ("."). Let's put the name of the
# directory into a shell variable called SOURCEDIR.
# Note: "$1" = "The first command line argument"
if test -z "$1"; then
SOURCEDIR=`pwd`
else
SOURCEDIR="$1"
fi
echo "Working from directory $SOURCEDIR"
# We need an empty directory to work in, so we'll create a temp directory here
cd "$WORKDIR"
mkdir "$TEMPDIR"
# and step into it
cd "$TEMPDIR"
# Now, execute a loop, based on a "find" command in the specified directory. The
# -printf "%p|" will cause the file names to be separated by the pipe symbol, rather than
# the default newline. Note the backticks ("`") (the key above the tab key on US
# keyboards).
for CBRFILE in `find "$SOURCEDIR" -name "*.cbr" -printf "%p|"`; do
# Now for the actual work. First, extract the base file name (without the extension)
# using the "basename" command. Warning: more backticks.
BASENAME=`basename $CBRFILE ".cbr"`
# And the directory path for that file, so we know where to put the finished ".cbz"
# file.
DIRNAME=`dirname $CBRFILE`
# Now, build the "new" file name,
NEWNAME="$BASENAME.cbz"
# We use RAR file's name to create folder for unpacked files
echo "Processing $CBRFILE"
mkdir "$BASENAME"
# and unpack the rar file into it
7z x "$CBRFILE" -O"$BASENAME"
cd "$BASENAME"
# Lets ensure the permissions allow us to pack everything
sudo chmod 777 -R ./*
# Put all the extracted files into new ".cbz" file
7z a -tzip -mx=9 "$NEWNAME" *
# And move it to the directory where we found the original ".cbr" file
mv "$NEWNAME" $DIRNAME/"$NEWNAME"
# Finally, "cd" back to the original working directory, and delete the temp directory
# created earlier.
cd ..
rm -r "$BASENAME"
# Delete the RAR file also
rm "$CBRFILE"
done
# At the end we cleanup by removing the temp folder from ram disk
cd ..
echo "Conversion Done"
rm -r "$TEMPDIR"
[edit 2] I removed unrar as an dependency and use p7zip instead, as it can extract rar-files.
You will need two checks:
7z t will test the integrity of the archive
You should also test the integrity of all the image files in the archive. You can use tools like ImageMagick for this.
A simple test would be identify file, but that might read only the header. I'd use convert file -resize 5x5 png:- > /dev/null
This scales the image down to 5x5 pixels, converts it to PNG and then pipes the result to /dev/null (discarding it). For the scaling, the whole image has to be read. If this command fails with an error, something is wrong with the image file.
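The two checks could be wired into the conversion loop with a small retry helper. What follows is a sketch under assumptions, not the script author's method: retry, repack_and_test and check_images are hypothetical names, 7z (from p7zip) and ImageMagick's convert are assumed to be on the PATH, and the extracted directory is assumed to contain only image files.

```shell
#!/bin/bash
# retry MAX CMD [ARGS...]: run CMD until it exits 0, at most MAX times.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        echo "attempt $attempt/$max failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# repack_and_test ARCHIVE: pack the current directory into ARCHIVE, then
# let 7z verify the result. Removes a stale ARCHIVE from an earlier attempt.
repack_and_test() {
    rm -f -- "$1"
    7z a -tzip -mx=9 "$1" ./* > /dev/null && 7z t "$1" > /dev/null
}

# check_images DIR: fail if any file in DIR cannot be fully decoded.
check_images() {
    for img in "$1"/*; do
        [ -f "$img" ] || continue
        convert "$img" -resize 5x5 png:- > /dev/null 2>&1 || return 1
    done
    return 0
}
```

Inside the loop, the plain 7z a line could then become something like retry 3 repack_and_test "$NEWNAME" || echo "giving up on $CBRFILE" >&2, with the rm "$CBRFILE" step executed only when the retry succeeded, so a failed conversion never deletes the original.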

Resources