After I finish writing scripts on my local machine, I need to copy them to the cluster to execute them. For example, I want to copy all the MATLAB files in my current directory to a directory on the server id@server.
Can anyone help me write a very basic Makefile for this purpose?
Thanks a lot!
John
Here is an adaptation of Jens's answer, together with my answer here, that takes advantage of the capabilities of Make to only copy across those files that have been modified since the last time you copied the files to the server. That way, if you have hundreds of .m files and you modify one of them, you won't copy all of them across to the server.
It makes use of an empty hidden file, .last_push, that serves only to record (through its own timestamp) the time at which we last copied files to the server.
FILES = $(shell find . -name '*.m')
SCP = scp
DEST = id@server:path/relative/to/your/serverhomedir
LAST_PUSH = .last_push
.PHONY : push
push : $(LAST_PUSH)
# Recipe lines must start with a tab character.
$(LAST_PUSH) : $(FILES)
	$(SCP) $? $(DEST)
	touch $(LAST_PUSH)
Run this with make or make push. The key is the variable $?, which is populated with the list of all prerequisites that are newer than the target - in this case, the list of .m files that have been modified more recently than the last push.
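If make isn't available, the same "only what changed since the last push" idea can be sketched in plain sh. This version compares each file against the .last_push stamp with find -newer. It is a swapped-in technique, not what the Makefile does internally. The names changed_m_files, push and DEST are hypothetical, and the scp destination is the placeholder from the question.

```shell
#!/bin/sh
# Sketch: push only the .m files modified since the last push, using the
# timestamp of an empty stamp file (same role as $(LAST_PUSH) in the Makefile).
LAST_PUSH=.last_push
# Hypothetical destination; replace with your own user, host and path.
DEST=${DEST:-id@server:path/relative/to/your/serverhomedir}

changed_m_files() {
    if [ -e "$LAST_PUSH" ]; then
        # Only files whose modification time is newer than the stamp.
        find . -name '*.m' -newer "$LAST_PUSH" -print
    else
        # No stamp yet: every .m file counts as changed.
        find . -name '*.m' -print
    fi
}

push() {
    files=$(changed_m_files)
    if [ -z "$files" ]; then
        echo "Nothing to push."
        return 0
    fi
    # $files is left unquoted so each path becomes its own argument;
    # this assumes the file names contain no spaces.
    scp $files "$DEST" && touch "$LAST_PUSH"
}
```

Calling push copies the changed files and then updates the stamp. The stamp is only touched when scp succeeds, so a failed copy is retried on the next run.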
How do you copy files to the server? Assuming you have ssh/scp available:
FILES = file1 file2 *.m
copy:
	scp $(FILES) id@server:path/relative/to/your/serverhomedir
Run with
$ make copy
As a shell script, it could look like this:
#!/bin/sh
set -- file1 file2 *.m
scp "$@" id@server:path/relative/to/your/serverhomedir
Don't forget to chmod u+x yourscript.
I'd like to create tar files to distribute some scripts using bash.
Every script needs certain configuration files and libraries (or toolboxes).
For example, a script called CheckTool.py needs Checks.ini, CheckToolbox.py and CommonToolbox.py to run. These are stored in specific folders on my hard disk and need to be copied into the same folders on the user's hard disk.
I can create a tar file manually for each script, but I'd like something simpler.
My idea is to define, for each script, a list of all the files it needs with their paths, and have a bash script read that list and create the tar file.
I started with:
#!/bin/bash
while IFS= read -r line
do
echo "$line"
done < "$1"
This reads the file paths. In my example the lines are:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
My question is: how should I organize the data so that bash can make a tar file with the specified files?
Or does anyone have a better idea?
There's no need for a complicated script; use tar's -T option. Every file listed in the given file will be added to the archive:
-T, --files-from FILE
get names to extract or create from FILE
So your script becomes:
#!/bin/bash
tar -cvpf something.tar -T listoffiles.txt
The format of listoffiles.txt is simple: one file per line. Relative paths are resolved against the directory you run tar from, so you might want to use full paths to make sure you get the right files:
./CheckTools/CheckMesh.bs
./Configs/CheckMesh.ini
./Toolboxes/CommonToolbox.bs
./Toolboxes/CheckToolbox.bs
You can add more tar commands to the script as needed, or loop over several list files; from that point on, your imagination is the limit!
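Looping over several list files could look like the sketch below. It assumes a hypothetical layout with one list file per script in a lists/ directory, each named after the script it packages (for example lists/CheckMesh.txt), and it writes one compressed archive per list. build_archives is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build one .tar.gz per list file.
# Assumption: every lists/*.txt file is a tar -T list (one path per line),
# named after the script it packages, e.g. lists/CheckMesh.txt.
build_archives() {
    listdir=${1:-lists}
    for list in "$listdir"/*.txt; do
        [ -e "$list" ] || continue      # glob matched nothing: no list files
        name=$(basename "$list" .txt)
        # -c create, -z gzip, -f output archive, -T read member names from $list
        tar -czf "$name.tar.gz" -T "$list" || return 1
        echo "created $name.tar.gz"
    done
}
```

Run it from the directory the paths in the lists are relative to, for example with build_archives lists.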
Can the shell override where output files are placed? (Not the console/screen output, but files created by a program.) I have a script that currently runs a sequence of input files through a program and for each one produces a lot of different output files.
for i in `seq 1 24`
do
../Bin/myprog inputfile.$i.in
done
Is there a way to create new directories for each run of the program and place the corresponding output files in each directory? So I would get dir1: <output files from run 1>; dir2 <output files from run 2> etc. I suppose one way would be to just write another script to create directories and sort all the files after the program(s) had run, but is there a more elegant way to do it?
As suggested in the comments, this might be what you need, assuming that your program just dumps output into the current working directory.
for i in `seq 1 24`
do
mkdir -p "dir$i"
pushd "dir$i" > /dev/null
../../Bin/myprog "../inputfile.$i.in"
popd > /dev/null
done
If you are trying to change where an existing program (e.g., myprog) writes its files, this is only possible if the program writes its files relative to the current directory. In that case, the outer script that invokes myprog can create a "destination" directory and cd into it before invoking myprog.
If the myprog program writes to an absolute path, e.g., /var/tmp/myprog.tmp, the only way to override where this write actually goes is to place a symbolic link at the absolute path linking to the desired destination. This will only work if the program (myprog) doesn't first delete an existing file before writing to it.
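The symbolic-link trick can be sketched as a small helper. Before each run, it points the program's hard-coded absolute path at a file inside that run's directory. It carries the same caveats as above: it only works if the program opens the path for writing without deleting it first, and only if you are allowed to create a link at that path. run_with_redirect is a hypothetical helper, and the absolute path you pass in stands for something like /var/tmp/myprog.tmp.

```shell
#!/bin/sh
# Sketch: redirect a write to a fixed absolute path into a per-run directory
# by replacing that path with a symbolic link before running the program.
# Usage: run_with_redirect ABS_PATH RUN_DIR COMMAND [ARGS...]
run_with_redirect() {
    abs_out=$1    # absolute path the program always writes to
    run_dir=$2    # directory where this run's copy should end up
    shift 2       # the remaining arguments are the command to run
    mkdir -p "$run_dir" || return 1
    # ln -s makes a symbolic link; -f removes any existing file at abs_out first.
    ln -sf "$(cd "$run_dir" && pwd)/$(basename "$abs_out")" "$abs_out" || return 1
    "$@"
}
```

In the loop from the question, this would be called once per iteration, e.g. run_with_redirect /var/tmp/myprog.tmp "dir$i" ../Bin/myprog "inputfile.$i.in".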
The third and most extreme possibility for directing absolute file path writes is to create a chroot'ed file system, in which the myprog output files will be contained, after which the outer script can copy or move them to where they are desired.
To summarize: other than changing the source, setting the working directory for relative-path output files, or chrooting a filesystem for absolute-path files, there really is no "elegant" way to replace the actual output files used in a program.
I take delivery of files from multiple places as part of a publishing aggregation service. I need a way to move files that have been delivered to me from one location to another without losing the directory structure, which I use for sorting.
Example:
Filepath of delivery: Server/Vendor/To_Company/Customer_Name/**
Filepath of processing: ~/Desktop/MM-DD-YYYY/Returned_Files/Customer_Name/**
I know I can move all of the directories by doing something such as:
find Server/Vendor/To_Company/* -maxdepth 0 -exec mv -n {} ~/Desktop/MM-DD-YYYY/Returned_Files/ \;
but with that I can only run the script once per day, and there are times when I need to run it several times.
Ideally, I should be able to create a mirror of the directory structure in my daily processing folder and then move the files from one to the other.
You can use the rsync command with the --remove-source-files option. You can run it as many times as needed.
# Trial run, without making any actual transfer:
rsync --dry-run -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
# Actual run:
rsync -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
Reference:
http://www.cyberciti.biz/faq/linux-unix-bsd-appleosx-rsync-delete-file-after-transfer/
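If rsync isn't available, the question's own idea (recreate the directory structure in the daily folder, then move the files into it) can be sketched with find, mkdir -p and mv. This is a swapped-in alternative, not part of the rsync answers. move_tree is a hypothetical helper, and mv -n is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: move every regular file from SRC into DEST, recreating its
# directory path first, so the move can be repeated any number of times.
# Like rsync --remove-source-files, it leaves empty directories behind in SRC.
move_tree() {
    src=$1
    dest=$2
    [ -d "$src" ] || return 1
    mkdir -p "$dest" || return 1
    dest=$(cd "$dest" && pwd) || return 1   # make dest absolute; we cd into src below
    (
        cd "$src" || exit 1
        # For each file (path relative to src), create its parent under dest,
        # then move it there; mv -n never overwrites an existing file.
        find . -type f -exec sh -c '
            dest=$1
            shift
            for f in "$@"; do
                mkdir -p "$dest/$(dirname "$f")" && mv -n "$f" "$dest/$f"
            done
        ' sh "$dest" {} +
    )
}
```

For example: move_tree Server/Vendor/To_Company ~/Desktop/"$(date +%m-%d-%Y)"/Returned_Files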
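You could use rsync to do this for you. rsync only creates the last directory of the destination path, so create the dated folder first:
mkdir -p ~/Desktop/"$(date +%m-%d-%Y)"/Returned_Files
rsync -a --remove-source-files Server/Vendor/To_Company/Customer_Name ~/Desktop/"$(date +%m-%d-%Y)"/Returned_Files/
Add -n to do a dry run and make sure it does what you want.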
From the manual page:
--remove-source-files
This tells rsync to remove from the sending side the files (meaning non-directories) that are a part of the transfer and have been successfully duplicated on the receiving side.
Note that you should only use this option on source files that are quiescent. If you are using this to move files that show up in a particular directory over to another host, make sure that the finished files get renamed into the source directory, not directly written into it, so that rsync can’t possibly transfer a file that is not yet fully written. If you can’t first write the files into a different directory, you should use a naming idiom that lets rsync avoid transferring files that are not yet finished (e.g. name the file "foo.new" when it is written, rename it to "foo" when it is done, and then use the option --exclude='*.new' for the rsync transfer).
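On the delivering side, the naming idiom from the manual can be sketched as a helper. It writes each file as name.new and renames it only once the copy is complete. deliver_file and the drop directory are hypothetical; the rename is only atomic when both names are on the same file system, which is the case here because both live in the drop directory.

```shell
#!/bin/sh
# Sketch of the "foo.new" idiom: copy under a temporary name, then rename,
# so an rsync run using --exclude='*.new' never sees a half-written file.
deliver_file() {
    src=$1        # finished file to hand over
    drop_dir=$2   # directory the rsync job reads from
    name=$(basename "$src")
    cp "$src" "$drop_dir/$name.new" || return 1
    # A rename within one directory makes "$name" appear all at once.
    mv "$drop_dir/$name.new" "$drop_dir/$name"
}
```

The receiving side would then run something like rsync -a --remove-source-files --exclude='*.new' drop/ destination/.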
I have been working on how to verify that millions of files that were on file system A have in fact been moved to file system B. During a system migration, it became evident that all the files needed to be audited to prove that they had been moved. The files were initially moved via rsync, which does provide logs, although not in a format that is helpful for an audit. So I wrote this script to index all the files on system A:
#!/bin/bash
# Get directories and file list to be used to verify proper file moves have worked successfully.
LOGDATE=$(/usr/bin/date +%Y-%m-%d)
FILE_LIST_OUT=/mounts/A_files_$LOGDATE.txt
MOUNT_POINTS="/mounts/AA /mounts/AB"
echo TYPE,USER,GROUP,BYTES,OCTAL,OCTETS,FILE_NAME > "$FILE_LIST_OUT"
for directory in $MOUNT_POINTS; do
# format: type,user,group,bytes,octal,octets,file_name
gfind "$directory" -mount -printf '%y,%u,%g,%s,%m,%p\n' >> "$FILE_LIST_OUT"
done
The file indexing works fine and takes about two hours to index ~30 million files.
Side B is where we run into issues. I have written a very simple shell script that reads the index file, tests whether each file is there, and counts how many files are there, but it runs out of memory while looping through the 30 million lines of indexed file names. Effectively, it runs the code below in a while loop, incrementing counters for files found and not found:
if [ -e "$FILENAME" ] ; then
echo "file found: $FILENAME"
found=$((found + 1))
else
echo "file not found: $FILENAME"
notfound=$((notfound + 1))
fi
My questions are:
Can a shell script do this type of reporting from such a large list? A 64-bit Unix system ran out of memory while trying to execute this script. I have already considered breaking up the input script into smaller chunks to make it faster. Currently it can
If a shell script is inappropriate, what would you suggest?
You just used rsync, use it again...
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn’t affect the data that goes into the file-lists, and thus it doesn’t affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don’t get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
That will actually fix any problems (at least in the same sense that any diff list built from file-existence tests could fix them). Using --ignore-existing means rsync only does the file-existence tests, so it will construct the diff list you are asking for and use it internally. If you just want information on the differences, look at --dry-run and --itemize-changes.
Let's say you have two directories, foo and bar. bar has three files, 1, 2 and 3, and a directory quz, which has a file 1. The directory foo is empty.
Here is the result:
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/
>f+++++++++ 1
>f+++++++++ 2
>f+++++++++ 3
cd+++++++++ quz/
>f+++++++++ quz/1
Note that you're not interested in the cd+++++++++ line; it just reports that rsync would create the directory quz/. Now, let's add a file called 1 to foo, and use grep to remove the directory lines:
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/ | grep -v '^cd'
>f+++++++++ 2
>f+++++++++ 3
>f+++++++++ quz/1
f is for file, and the leading > means the file would be transferred into the destination. The +++++++++ means the file doesn't exist in the DEST dir.
Here's the bonus: remove --dry-run and it will go ahead and make the changes for you.
Have you considered a tool such as kdiff3, which can diff whole directories of files?
Note this feature from version 0.9.84:
Directory-Comparison: Option "Full Analysis" allows to show the number
of solved vs. unsolved conflicts or deltas vs. whitespace-changes in
the directory tree.
There is absolutely no problem reading a 30 million line file in a shell script. The reason why your process failed was most likely that you tried to read the file entirely into memory, e.g. by doing something wrong like for i in $(cat file).
The correct way of reading a file is:
while IFS= read -r line
do
echo "Something with $line"
done < someFile
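Applied to the audit, that line-by-line loop can check every indexed path while holding only two counters in memory. The sketch assumes an index laid out like the one written by the side-A script: a header line, then the six comma-separated fields that the gfind -printf format emits (type, user, group, bytes, octal mode, path), with the path last. audit_index is a hypothetical helper, and paths containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: stream the index one line at a time and count which paths exist.
# Missing paths go to stderr; the summary line goes to stdout.
audit_index() {
    index=$1
    found=0
    missing=0
    {
        read -r _header                  # skip the header line
        # The last variable (name) receives the rest of the line,
        # so file names that contain commas stay intact.
        while IFS=, read -r type user group bytes octal name; do
            if [ -e "$name" ] || [ -L "$name" ]; then
                found=$((found + 1))
            else
                missing=$((missing + 1))
                printf 'missing: %s\n' "$name" >&2
            fi
        done
        echo "found=$found missing=$missing"
    } < "$index"
}
```

Redirecting stderr, e.g. audit_index A_files_2013-01-01.txt 2> missing.txt (hypothetical file names), keeps the list of missing paths on disk rather than in memory.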
A shell script is inappropriate, yes. You should be using a diff tool:
diff -rNq /original /new
If you're not particular about the solution being a script, you could also look into meld, which would let you diff directory trees quite easily and you can also set ignore patterns if you have any.
Need some help with this as my shell scripting skills are somewhat less than l337 :(
I need to gzip several files and then copy newer ones over the top from another location. I need to be able to call this script in the following manner from other scripts.
exec script.sh $oldfile $newfile
Can anyone point me in the right direction?
EDIT: To add more detail:
This script will be used for monthly updates of some documents uploaded to a folder. The old documents need to be archived into one compressed file, and the new documents, which may have different names, copied over the top of the old ones. The script needs to be called on a document-by-document basis from another script. The basic flow for this script should be:
Create a new gzip archive with a specified name (built from a prefix constant in the script and the current month and year, e.g. prefix.september.2009.tar.gz) only if it does not already exist; otherwise add to the existing one.
Copy the old file into the archive.
Replace the old file with the new one.
Thanks in advance,
Richard
EDIT: Added more detail on the archive filename
Here's the modified script based on your clarifications. I've used tar archives, compressed with gzip, to store the multiple files in a single archive (you can't store multiple files using gzip alone). This code is only superficially tested - it probably has one or two bugs, and you should add further code to check for command success etc. if you're using it in anger. But it should get you most of the way there.
#!/bin/bash
oldfile=$1
newfile=$2
month=$(date +%B)
year=$(date +%Y)
prefix="frozenskys"
archivefile=$prefix.$month.$year.tar
# Check for existence of a compressed archive matching the naming convention
if [ -e "$archivefile.gz" ]
then
echo "Archive file $archivefile.gz already exists..."
echo "Adding file '$oldfile' to existing tar archive..."
# Uncompress the archive, because you can't add a file to a
# compressed archive
gunzip "$archivefile.gz"
# Add the file to the archive
tar --append --file="$archivefile" "$oldfile"
# Recompress the archive
gzip "$archivefile"
# No existing archive: create a new one and add the file
else
echo "Creating new archive file '$archivefile.gz'..."
tar --create --file="$archivefile" "$oldfile"
gzip "$archivefile"
fi
# Update the files outside the archive
mv "$newfile" "$oldfile"
Save it as script.sh, then make it executable:
chmod +x script.sh
Then run like so:
./script.sh oldfile newfile
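An archive named something like frozenskys.September.2009.tar.gz will be created, and newfile will replace oldfile. You can also call this script with exec from another script if you want. Just put this line in your second script:
exec ./script.sh "$1" "$2"
Note that exec replaces the calling script's process, so nothing after that line in the calling script will run; drop exec if the caller needs to continue afterwards.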
A good reference for bash scripting is the Advanced Bash-Scripting Guide.
It covers nearly everything about bash scripting.
The basic approach I would take is:
Move the files you want to archive to a directory you create (commands mkdir and mv).
Archive and compress that directory (commands tar and gzip; gzip on its own compresses single files, not directories).
Copy the new files to the desired location (command cp).
In my experience, bash scripting is mainly a matter of knowing how to use these commands well; if you can run it on the command line, you can run it in your script.
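Those three steps can be sketched as a single function. It assumes a hypothetical layout: one folder holds the current documents and another holds this month's replacements. archive_and_replace and the old_Month.Year naming are illustrative only, not the asker's required prefix.

```shell
#!/bin/sh
# Sketch of the steps above: stash the old files in a new directory,
# archive and compress that directory with tar -z, then copy in the new files.
archive_and_replace() {
    docs=$1       # folder holding the current documents
    newdocs=$2    # folder holding the replacement documents
    stash=old_$(date +%B.%Y)                       # e.g. old_September.2009
    mkdir -p "$stash" || return 1                  # 1. a directory you create
    mv "$docs"/* "$stash"/ || return 1             #    move the old files into it
    tar -czf "$stash.tar.gz" "$stash" || return 1  # 2. tar + gzip the directory
    cp "$newdocs"/* "$docs"/                       # 3. copy the new files into place
}
```

For example, archive_and_replace docs incoming, run from the folder where the archive should be written.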
Another command that might be useful is
pwd - this returns the current directory
Why don't you use version control? It's much easier: just check out and compress.
(Apologies if that's not an option.)