Shell Script to redirect to different directory and create a list file - shell

src_dir="/export/home/destination"
list_file="client_list_file.txt"
file=".csv"
echo "src directory="$src_dir
echo "list_file="$list_file
echo "file="$file
cd /export/home/destination
touch $list_file
x=`ls *$file | sort >$list_file`
if [ -s $list_file ]
then
echo "List File is available, archiving now"
y=`tar -cvf mystuff.tar $list_file`
else
echo "List File is not available"
fi
The script above works: it creates a list file of all .csv files and tars it.
However, I want to be able to run the script from a different directory. It should change to the destination directory, create a list file of all the .csv files there, and make a .tar from the list file (i.e. archive the list file).
I am not sure what to change.

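Filename handling has many pitfalls. File naming under POSIX is very permissive (names may contain spaces, newlines, and glob characters), so parsing the output of commands like ls or find may not give the expected result, even though it works 99% of the time. A glob is the reliable way to get the list of files:
# start with an empty list so repeated runs do not append duplicates
: > "$src_dir/$list_file"
for file in "$src_dir"/*.csv; do
    # skip the literal pattern when no .csv file exists
    [ -e "$file" ] || continue
    basename "$file" >> "$src_dir/$list_file"
done
tar -cvf "$src_dir/mystuff.tar" "$src_dir/$list_file"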
It is also worth learning bash properly and searching first before asking a question on SO next time:
http://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
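For the question as asked (run the script from anywhere, build the list inside the destination directory, then archive the list file), a minimal sketch could look like the following. The function name make_list_archive is hypothetical; the messages and file names are taken from the question. The whole body runs in a subshell, so the cd does not change the caller's working directory.

```shell
# Hypothetical helper: make_list_archive <directory> [list-file-name]
make_list_archive() (
    src_dir=$1
    list_file=${2:-client_list_file.txt}

    cd "$src_dir" || exit 1

    # Rebuild the list from scratch on every run.
    : > "$list_file"
    for f in *.csv; do
        [ -e "$f" ] || continue          # the glob matched nothing
        printf '%s\n' "$f" >> "$list_file"
    done

    if [ -s "$list_file" ]; then
        echo "List File is available, archiving now"
        tar -cf mystuff.tar "$list_file"
    else
        echo "List File is not available"
        exit 1
    fi
)
```

It would be called as make_list_archive /export/home/destination from any directory.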

Related

Extracting certain files from a tar archive on a remote ssh server

I am running numerous simulations on a remote server (via ssh). The outcomes of these simulations are stored as .tar archives in an archive directory on this remote server.
What I would like to do, is write a bash script which connects to the remote server via ssh and extracts the required output files from each .tar archive into separate folders on my local hard drive.
These folders should have the same name as the .tar file from which the files come (To give an example, say the output of simulation 1 is stored in the archive S1.tar on the remote server, I want all '.dat' and '.def' files within this .tar archive to be extracted to a directory S1 on my local drive).
For the extraction itself, I was trying:
for f in *.tar; do
(
mkdir ../${f%.tar}
tar -x -f "$f" -C ../${f%.tar} "*.dat" "*.def"
)
done
wait
Every .tar file is around 1GB and there are a lot of them. Downloading everything takes too much time, which is why I only want to extract the necessary files (see the extensions in the code above).
Now the code works perfectly when I have the .tar files on my local drive. However, what I can't figure out is how I can do it without first having to download all the .tar archives from the server.
When I first connect to the remote server via ssh username@host, the script stops and the terminal just stays connected to the server.
Btw I am doing this in VS Code and running the script through terminal on my MacBook.
I hope I have described it clear enough. Thanks for the help!
Stream the results of tar back with filenames via SSH
To get the data you wish to retrieve from .tar files, you'll need to pass the results of tar to a string of commands with the --to-command option. In the example below, we'll run three commands.
# Send the file's name back to your shell
echo $TAR_FILENAME
# Send the contents of the file back
cat /dev/stdin
# Send EOF (Ctrl+d) back (note: since we're already in a $'' we don't use the $ again)
echo '\004'
Once the information is captured in your shell, we can start to process the data. This is a three-step process.
1. Get the file's name.
   - Note that this code does not handle directories at all; it simply strips them away (i.e. dir/1.dat -> 1.dat).
   - You could write code to create directories for the file by replacing the forward slashes / with spaces and iterating over each directory name, but that seems out of scope here.
2. Check for the EOF (end-of-file) marker.
3. Add the content to the file.
# Get the files via ssh and tar
files=$(ssh -n <user@server> $'tar -xf <tar-file> --wildcards \'*\' --to-command=$\'echo $TAR_FILENAME; cat /dev/stdin; echo \'\004\'\'')
# Keeps track of what state we're in (filename or content)
state="filename"
filename=""
# Each line is one of these:
# - file's name
# - file's data
# - EOF
while IFS= read -r line; do
    if [[ $state == "filename" ]]; then
        # strip any leading directories (dir/1.dat -> 1.dat)
        filename=${line##*/}
        touch "<output-folder>/$filename"
        echo "Copying: $filename"
        state="content"
    elif [[ $state == "content" ]]; then
        # look for EOF (ctrl+d)
        if [[ $line == $'\004' ]]; then
            filename=""
            state="filename"
        else
            # append data to file
            echo "$line" >> "<output-folder>/$filename"
        fi
    fi
# Double quotes here are very important
done < <(echo -e "$files")
Alternative: tar + scp
If the above example seems overly complex for what it's doing, it is. An alternative that touches the disk more and requires two separate SSH connections is to extract the files you need from your .tar file into a folder and scp that folder back to your workstation. The patterns are quoted so that tar, not the remote shell, expands them.
ssh -n <username>@<server> 'mkdir output/; tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"'
scp -r <username>@<server>:output/ ./
The breakdown
First, we'll make a place to keep our outputted files. You can skip this if you already know the folder they'll be in.
mkdir output/
Then, we'll extract the matching files to this folder we created (if you don't want them to be in a different folder remove the -C output/ option).
tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"
Lastly, now that we're running commands on our machine again, we can run scp to reconnect to the remote machine and pull the files back.
scp -r <username>@<server>:output/ ./
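A middle ground between the two approaches is to let the server extract only the wanted members into a temporary directory and stream them back as a fresh tar over a single SSH connection. The sketch below is only that, a sketch: remote_filter_cmd and fetch_subset are hypothetical helper names, it assumes GNU tar and mktemp on the server, and it assumes archive names without spaces or quote characters.

```shell
# Hypothetical helper: prints the command line that should run on the server.
# The $(...) and "$d" are deliberately left unexpanded here (single quotes),
# so they are evaluated on the server, not locally.
remote_filter_cmd() {
    printf 'd=$(mktemp -d) && tar -xf %s -C "$d" --wildcards "*.dat" "*.def" && tar -cf - -C "$d" .; rm -rf "$d"' "$1"
}

# Hypothetical helper: fetch_subset user@host S1.tar
# Creates ./S1 locally and unpacks the streamed .dat/.def files into it.
fetch_subset() {
    mkdir -p "${2%.tar}" &&
        ssh -n "$1" "$(remote_filter_cmd "$2")" | tar -xf - -C "${2%.tar}"
}
```

Looping over the archive names and calling fetch_subset username@host "$f" for each one keeps the script running locally, so it no longer stops at an interactive ssh prompt. Only the selected files cross the network.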

Bash diff that stops when it finds the first difference

I have this script that I use for backups. The problem is that it is kind of slow. I want to know if there is a diff command that stops when it finds the first difference.
DocumentsFiles=("Books" "Comics" "Distros" "Emulators" "Facturas" "Facultad" "Laboral" "Mods" "Music" "Paintings" "Projects" "Scripts" "Tesis" "Torrents" "Utilities")
OriginDocumentsFile="E:\Documents\\"
DestinationDocumentsFile="F:\Files\Documents\\"
## loop file to file and copy in backup
for directory in "${DocumentsFiles[@]}"
do
RealOrigin="${OriginDocumentsFile}${directory}"
RealDestination="${DestinationDocumentsFile}${directory}"
echo $directory
if [ -a "$RealDestination" ]; then
echo ok
if diff -r $RealOrigin $RealDestination; then
echo "${directory} are equal!"
else
rm -rfv $RealDestination
cp -ruv $RealOrigin "${DestinationDocumentsFile}"
fi
else
cp -ruv $RealOrigin "${DestinationDocumentsFile}"
fi
done
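diff -q reports "only when files differ" (per man diff), so I believe it stops comparing each pair of files at the first difference. With -r it still visits every file in the tree and prints one line per differing file.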
But this is a bit of an XY problem. Really you need a better backup program like rsync. From man rsync:
"It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination."
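If the script only needs a yes/no answer and should give up at the very first mismatch anywhere in the tree, one option is to compare file by file with cmp -s and leave the loop on the first failure. This is a rough sketch rather than a drop-in replacement for diff -r: trees_equal is a hypothetical helper name, and it assumes file names contain no newlines.

```shell
# Hypothetical helper: trees_equal <dirA> <dirB>
# Exit status 0 only if both trees contain the same paths with the same contents.
trees_equal() {
    [ -d "$1" ] && [ -d "$2" ] || return 1

    # Cheap check first: both trees must contain the same relative paths.
    [ "$(cd "$1" && find . | sort)" = "$(cd "$2" && find . | sort)" ] || return 1

    # Then compare contents. cmp -s is silent and stops reading a pair at the
    # first differing byte; "exit 1" ends the loop at the first differing file.
    (cd "$1" && find . -type f) | while IFS= read -r f; do
        cmp -s "$1/$f" "$2/$f" || exit 1
    done
}
```

In the backup loop it could replace the diff call, for example: if trees_equal "$RealOrigin" "$RealDestination"; then echo "${directory} are equal!"; fi.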

bash shell copy the file from one location to another using break and continue

I have a small script that copies all the files from one directory (SRC) to another directory (DES). The script below runs perfectly.
#!/bin/bash
SRC="/home/user/dir1/*"
DES="/home/user/dir2/"
for file in "$SRC"
do
if [ -f "$file" ]
then
cp "$file" "$DES"
echo "$file -----> file copied"
fi
done
Now, while copying files from one directory to the other, how can I skip a file if a file with the same name already exists in the DES directory, and continue copying the remaining files from source to destination as usual?
How do I use break and continue in the loop to do this?
Thanks,
I recommend using rsync:
src="/home/user/dir1/"
dst="/home/user/dir2/"
rsync -rav --ignore-existing "${src}" "${dst}"
The switch --ignore-existing tells rsync to skip files which exist at the destination.
Why not just reduce the entire script to a oneliner?
cp -n /home/user/dir1/* /home/user/dir2/
The -n flag (--no-clobber) prevents cp from overwriting existing files.
If your real situation is more complicated, you can also take a look at rsync.
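Since the question asks about continue specifically, here is a minimal sketch of the loop version. copy_missing is a hypothetical helper name. Note that the glob is kept outside the double quotes; a quoted "$SRC" that contains * is not expanded by the shell.

```shell
# Hypothetical helper: copy_missing <source-dir> <destination-dir>
copy_missing() {
    for file in "$1"/*; do
        # Skip directories, and the literal pattern if the source is empty.
        [ -f "$file" ] || continue

        name=${file##*/}
        if [ -e "$2/$name" ]; then
            echo "$name -----> skipped (already exists)"
            continue                      # move on to the next file
        fi

        cp "$file" "$2/" && echo "$name -----> file copied"
    done
}
```

It would be called as copy_missing /home/user/dir1 /home/user/dir2. break is not needed here: it would stop the whole loop at the first existing file instead of skipping just that one.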

How to develop this sort script further

I have a small interactive unix script that takes a terminal command input for a chosen file type and upon receiving such iterates through a large folder of unsorted files on my desktop and pulls files of that chosen selection to a new sorted folder.
i.e. The user types jpg in the command and all jpg files are pulled out of the unsorted folder and into the sorted folder.
It works terrific as it stands, but I would like to develop my script further so that instead of all file types being pushed into a communal sorted folder, I could have jpg files being pushed into a dedicated folderjpg, png files pushed in folderpng and finally all docx files moved into docxfolder.
How can I achieve such in the leanest possible manner assuming that these dedicated folders for the file types mentioned have been created on my desktop.
#!/bin/bash
echo "Good Morning, Please enter your file type name for sorting [ENTER]:"
read extension
mv -v /Users/christopherdorman/desktop/unsorted/*.${extension} /Users/christopherdorman/desktop/sorted/
if [[ $? -eq 0 ]]; then
echo "Good News, Your files have been successfully processed"
fi
I would write it this way:
read -p "Good Morning, Please enter your file type name for sorting [ENTER]:" extension
if cd /Users/christopherdorman/desktop; then
destination="folder$extension"
# ensure the destination folder exists
mkdir -p "$destination"
if mv -v unsorted/*."$extension" "$destination"; then
echo "Good News, Your files have been successfully processed"
fi
fi
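If you would rather sort several file types in one run instead of answering the prompt once per type, the same idea can be wrapped in a loop. This is only a sketch: sort_files is a hypothetical helper name, and it follows the folder<extension> naming used in the answer above (so docx files land in folderdocx, not docxfolder).

```shell
# Hypothetical helper: sort_files <base-dir> <extension>...
# Moves <base-dir>/unsorted/*.<ext> into <base-dir>/folder<ext>.
sort_files() (
    base=$1
    shift
    for ext in "$@"; do
        dest="$base/folder$ext"
        mkdir -p "$dest"                  # ensure the destination folder exists
        for f in "$base"/unsorted/*."$ext"; do
            [ -e "$f" ] || continue       # no files of this type
            mv -v "$f" "$dest/"
        done
    done
)
```

For the setup in the question it would be called as sort_files /Users/christopherdorman/desktop jpg png docx.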

Create a detailed self tracing log in bash

I know you can create a log of the output by typing in script nameOfLog.txt and exit in terminal before and after running the script, but I want to write it in the actual script so it creates a log automatically. There is a problem I'm having with the exec >>log_file 2>&1 line:
The code redirects the output to a log file and a user can no longer interact with it. How can I create a log where it just basically copies what is in the output?
And, is it possible to have it also automatically record the process of files that were copied? For example, if a file at /home/user/Deskop/file.sh was copied to /home/bckup, is it possible to have that printed in the log too or will I have to write that manually?
Is it also possible to record the amount of time it took to run the whole process and count the number of files and directories that were processed or am I going to have to write that manually too?
My future self appreciates all the help!
Here is my whole code:
#!/bin/bash
collect()
{
find "$directory" -name "*.sh" -print0 | xargs -0 cp -t ~/bckup #xargs handles files names with spaces. Also gives error of "cp: will not overwrite just-created" even if file didn't exist previously
}
echo "Starting log"
exec >>log_file 2>&1
timelimit=10
echo "Please enter the directory that you would like to collect.
If no input in 10 secs, default of /home will be selected"
read -t $timelimit directory
if [ ! -z "$directory" ] #if directory doesn't have a length of 0
then
echo -e "\nYou want to copy $directory." #-e is so the \n will work and it won't show up as part of the string
else
directory=/home/
echo "Time's up. Backup will be in $directory"
fi
if [ ! -d ~/bckup ]
then
echo "Directory does not exist, creating now"
mkdir ~/bckup
fi
collect
echo "Finished collecting"
exit 0
To answer the "how to just copy the output" question: use a program called tee and then a bit of exec magic, explained in the question "redirect COPY of stdout to log file from within bash script itself".
Regarding the analytics (time needed, files accessed, etc) -- this is a bit harder. Some programs that can help you are time(1):
time - run programs and summarize system resource usage
and strace(1):
strace - trace system calls and signals
Check the man pages for more info. If you have control over the script, it will probably be easier to do the logging yourself instead of parsing strace output.
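As a rough illustration of doing the logging yourself, the copy step can pipe its own output through tee, use cp -v so every copied file is printed, and take timestamps before and after. collect_logged is a hypothetical helper name, and the count is simply the number of *.sh files found under the source directory, which assumes the backup directory is not inside it.

```shell
# Hypothetical helper: collect_logged <directory> <backup-dir> <log-file>
collect_logged() {
    start=$(date +%s)

    # cp -v prints one line per copied file; tee shows that line on the
    # terminal and appends a copy to the log. Only output is piped, so a
    # read elsewhere in the script can still take input from the user.
    find "$1" -name '*.sh' -exec cp -v {} "$2" \; 2>&1 | tee -a "$3"

    count=$(find "$1" -name '*.sh' | wc -l)
    end=$(date +%s)
    echo "Copied $((count)) files in $((end - start)) seconds" | tee -a "$3"
}

# For whole-script logging in bash, the same idea is usually written once
# near the top of the script as:
#   exec > >(tee -a log_file) 2>&1
```

In the script from the question this would replace the collect function and the exec >>log_file 2>&1 line, called as collect_logged "$directory" ~/bckup log_file.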
