Find oldest item in a folder SVN - bash

I want to write a bash script that keeps 10 backups of a website in SVN: the site is backed up nightly and the oldest backup is then deleted.
Is there an SVN command that gives me the age of these files in SVN, so that I can programmatically call "svn delete" on the oldest one?

Subversion is definitely not the tool for this job. Once you commit something to subversion, there is no practical way to delete it.
There are a lot of ways to achieve your goal using standard commands in bash. You can use tools like ftp, wget, curl, scp, ssh, or whatever to download your site files, then tar and zip them up with different file names based on the date.
#!/bin/bash
# Name of the backup that is now 10 days old and should be removed.
DELETEME="htdocs_$(date '+%Y%m%d' -d '-10 days').tar.gz"
# Name of tonight's backup.
NEW="htdocs_$(date '+%Y%m%d').tar.gz"
SOURCE='/path/on/server/to/backup'
HOST='IP_or_hostname'
USER='user_on_HOST'
# Stream a compressed tar of the remote directory straight into the new archive.
ssh "$USER@$HOST" tar czvf - "$SOURCE" > "$NEW"
# Drop the oldest backup (-f so the first few runs don't fail before one exists).
rm -vf "$DELETEME"
Then just schedule this as a daily cron job.
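For example, a crontab entry along these lines (the script path and log file are placeholders) would run it nightly:
# Run the backup script every night at 02:30 and log its output.
30 2 * * * /path/to/backup_site.sh >> /var/log/backup_site.log 2>&1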

It doesn't sound like you understand how Subversion works.
Subversion is a version control system, and you really use it the other way around: you develop your web pages and JavaScript in Subversion and then deploy them from Subversion to your website. You get a complete history of all of your files, and you can use features like tags to mark the specific revisions of your website that went live. That way, you can find out who made changes and why they were made.
It sounds like you simply want to make a backup of your website, and then delete the oldest backup to save room.
You should look into rsync, which is really great for backups: it is fast and pretty simple to use.
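For example, a minimal rsync sketch (the host name and paths below are placeholders, and it assumes SSH access to the web server) that keeps dated snapshots while hard-linking unchanged files against the previous one:
# Pull tonight's snapshot, hard-linking files that didn't change since the last run.
# (On the very first run the link-dest target won't exist yet; rsync just copies everything.)
rsync -a --delete --link-dest=/backups/latest \
    user@webhost:/var/www/site/ "/backups/$(date +%Y%m%d)/"
# Point "latest" at the snapshot we just made.
ln -sfn "/backups/$(date +%Y%m%d)" /backups/latest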
You can look at the Subversion online manual and read the first two or three chapters. It explains how Subversion is used, and it is one of the best manuals for open source software out there. After you read it, you might decide to use Subversion after all, not for backups but for development.

Related

Retrieving latest file in a directory from a remote server

I was hoping to crack this myself, but it seems I have fallen at the first hurdle because I can't make head nor tail of the other options I've read about.
I wish to access a database file hosted as follows (i.e. the hhsuite_dbs is a folder containing several databases)
http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_08Oct15.tgz
Periodically, they update these databases, so I want to download the latest version. My plan is to run a bash script via cron, most likely monthly (though I've yet to even tackle the scheduling aspect of the task).
I believe the database is refreshed fortnightly, so if my script runs monthly I can expect there to be a new version. I'll then be running downstream programs that require the database.
My question, then, is how do I go about retrieving this? (For a little more finesse, I'd also like to be able to check whether the remote file has changed in name or content, to avoid a large download if it's unnecessary.) Is the best approach to query the name of the file, or its last-modified date (given that they may change the naming syntax of the file too)? To my naive brain, some kind of globbing on pdb70 (something I think I can rely on to be in the filename), pulled down with wget, was all I had come up with so far.
EDIT: Another confounding issue that has just occurred to me is that the file I want won't necessarily be the newest in the folder (there are other types of databases there too); rather, I need the newest version of, in this case, the pdb70 database.
Solutions I've looked at so far have mentioned weex, lftp and curlftpls, but all of these seem to require logins/passwords for the server, which I don't have/need if I just download it via the web. I've also seen mention of rsync, but on a cursory read it seems like people steer clear of it for FTP use.
Quite a few barriers in your way for this.
My first suggestion is that rather than getting the filename itself, you simply mirror the directory using wget, which should already be installed on your Ubuntu system, and let wget figure out what to download.
base="http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/"
cd /some/place/safe/
wget --mirror -nd "$base"
And new files will be created in the "safe" directory.
But that just gets you your mirror. What you're still after is the "newest" file.
Luckily, wget sets the datestamp of files it downloads, if it can. So after mirroring, you might be able to do something like:
newestfile=$(ls -t /some/place/safe/pdb70*gz | head -1)
Note that this fails if ever there are newlines in the filename.
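If that worries you, a minimal newline-safe alternative (same assumed path) is to let the shell glob the names and compare modification times with -nt:
newestfile=""
for f in /some/place/safe/pdb70*gz; do
    # -nt keeps whichever file has the more recent modification time.
    [ "$f" -nt "$newestfile" ] && newestfile="$f"
done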
Another possibility might be to check the difference between the current file list and the last one. Something like this:
#!/bin/bash
base="http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/"
cd /some/place/safe/
wget --mirror -nd "$base"
rm index.html* *.gif   # remove debris from mirroring an index
ls > /tmp/filelist.txt.$$
if [ -f /tmp/filelist.txt ]; then
    echo "Difference since last check:"
    diff /tmp/filelist.txt /tmp/filelist.txt.$$
fi
mv /tmp/filelist.txt.$$ /tmp/filelist.txt
You can parse the diff output (man diff for more options) to determine what file has been added.
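For instance, a hedged one-liner (it would sit inside the if block, before the mv) to pull the newly appeared pdb70 archive out of the diff might look like:
# Lines prefixed with "> " are files that appeared since the last run.
newpdb=$(diff /tmp/filelist.txt /tmp/filelist.txt.$$ | sed -n 's/^> //p' | grep '^pdb70' || true)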
Of course, with a solution like this, you could run your script every day and hopefully download a new update within a day of it being ready, rather than up to a fortnight later. The nice thing about --mirror is that it won't re-download files that are already on hand.
Oh, and I haven't tested what I've written here. That's one monstrously large file.

FTP backup script using hard links

Usually I use rsync-based backups.
But now I have to write a backup script that goes from a Windows server to Linux.
So there is no rsync available - only FTP.
I like the idea of using hard links to save disk space and of incremental backups to minimize traffic.
Is there a similar backup script for FTP instead of rsync?
UPDATE:
I need to backup Windows server through FTP. Backup script executes at Linux backup server.
SOLUTION:
I found this useful script, which backs up over FTP with hard links and an incremental feature.
Note for Ubuntu users: there is no md5 command in Ubuntu. Use md5sum instead.
# filehash1="$(md5 -q "$curfile"".gz")"
# filehash2="$(md5 -q "$mysqltmpfile")"
filehash1="$(md5sum "$curfile"".gz" | awk '{ print $1 }')"
filehash2="$(md5sum "$mysqltmpfile" | awk '{ print $1 }')"
Edit, since the setup was not clear enough for me from the original question.
Based on the update to the question, the situation is that you need to pull the data onto the backup server from the Windows system via FTP. In this case you could adapt the script you found yourself (see comment) or use a similar idea (rough sketch below):
Use cp -lr to clone the previous backup with hard links.
Use lftp's mirror command to overwrite this copy with anything that was updated on the remote system.
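A rough, untested sketch of that pull-style approach; the host, credentials and paths are placeholders, and it assumes lftp is installed:
#!/bin/bash
backup_root=/backups/windows-server
today=$(date +%Y%m%d)
# Most recent previous snapshot, if any (snapshot directories are named YYYYMMDD).
prev=$(ls -1d "$backup_root"/???????? 2>/dev/null | tail -1)
mkdir -p "$backup_root/$today"
# Clone the previous snapshot as hard links so unchanged files take no extra space.
if [ -n "$prev" ] && [ "$prev" != "$backup_root/$today" ]; then
    cp -lr "$prev/." "$backup_root/$today/"
fi
# Overwrite anything that changed on the remote FTP server.
lftp -u backupuser,PASSWORD ftp.example.com -e "
  mirror --only-newer /remote/path $backup_root/$today
  quit
"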
But I initially assumed that you need to push the data from the Windows system to the backup server, i.e. that the FTP server is on the backup system. That case cannot be handled this way (original answer follows):
Since FTP has no notion of links at all, any transfer will only result in new or overwritten files. The only way would be to use the SITE command to issue server-specific commands and deal with hard links that way. But SITE commands are usually heavily restricted, so you can typically do something like change permissions but not do anything with hard links.
And even if you could create hard links with SITE, you would have to implement the logic that decides when to use such links. With rsync this logic is built into the rsync server and executed on the server side. With FTP you would have to build all the logic on the client side, which means you would have to download a file, compare it with the local file, and then decide whether to upload the new file or whether a hard link to an existing file could be used.

How to keep two folders automatically synchronized?

I would like to have a synchronized copy of one folder with all its subtree.
It should work automatically in this way: whenever I create, modify, or delete stuff from the original folder those changes should be automatically applied to the sync-folder.
Which is the best approach to this task?
BTW: I'm on Ubuntu 12.04
Final goal is to have a separated real-time backup copy, without the use of symlinks or mount.
I used Ubuntu One to synchronize data between my computers, and after a while something went wrong and all my data was lost during a synchronization.
So I thought to add a step further to keep a backup copy of my data:
I keep my data stored on a "folder A"
I need the answer to my current question to create a one-way sync of "folder A" to "folder B" (a cron'd script with rsync, perhaps?). It needs to be one-way only, from A to B; any changes to B must not be applied to A.
Then I simply keep "folder B" synchronized with Ubuntu One.
In this manner any change in A will be applied to B, which will be detected by U1 and synchronized to the cloud. If anything goes wrong and U1 deletes my data on B, I still have it on A.
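A minimal sketch of that one-way A-to-B sync (paths are placeholders) would be a cron'd rsync like this:
# Mirror folder A into folder B; --delete makes B an exact copy of A,
# and nothing in B is ever pushed back to A.
rsync -a --delete /path/to/folderA/ /path/to/folderB/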
Inspired by lanzz's comments, another idea could be to run rsync at startup to back up the contents of the folder under Ubuntu One, and start Ubuntu One only after rsync has completed.
What do you think about that?
How do I know when rsync ends?
You can use inotifywait (with the modify,create,delete,move flags enabled) and rsync.
while inotifywait -r -e modify,create,delete,move /directory; do
    rsync -avz /directory /target
done
If you don't have inotifywait on your system, run sudo apt-get install inotify-tools
You need something like this:
https://github.com/axkibe/lsyncd
It is a tool which combines rsync and inotify: the former is a tool that, with the correct options set, mirrors a directory down to the last bit; the latter tells the kernel to notify a program of changes to a directory or file.
It says:
It aggregates and combines events for a few seconds and then spawns one (or more) process(es) to synchronize the changes.
But - according to Digital Ocean at https://www.digitalocean.com/community/tutorials/how-to-mirror-local-and-remote-directories-on-a-vps-with-lsyncd - it ought to be in the Ubuntu repository!
I have similar requirements, and this tool, which I have yet to try, seems suitable for the task.
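A minimal way to try it, assuming the Ubuntu package is simply called lsyncd and reusing the /directory and /target paths from above:
sudo apt-get install lsyncd
# Command-line shorthand (if available in your version): watch /directory
# and mirror changes to /target via rsync.
lsyncd -rsync /directory /target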
Just a simple modification of @silgon's answer:
while true; do
    inotifywait -r -e modify,create,delete /directory
    rsync -avz /directory /target
done
(@silgon's version sometimes crashes on Ubuntu 16 if you run it from cron)
Using the cross-platform fswatch and rsync:
fswatch -o /src | xargs -n1 -I{} rsync -a /src /dest
You can take advantage of fschange. It’s a Linux filesystem change notification. The source code is downloadable from the above link, you can compile it yourself. fschange can be used to keep track of file changes by reading data from a proc file (/proc/fschange). When data is written to a file, fschange reports the exact interval that has been modified instead of just saying that the file has been changed.
If you are looking for a more advanced solution, I would suggest checking out Resilio Connect.
It is cross-platform and provides extended options for use and monitoring. Since it is BitTorrent-based, it can be faster than other existing sync tools. (Disclosure: this was written on their behalf.)
I use this free program to synchronize local files and directories: https://github.com/Fitus/Zaloha.sh. The repository contains a simple demo as well.
The good point: it is a single bash shell script, not a black box like other programs, and the documentation is there as well. Also, with some technical talent, you can "bend" and "integrate" it to create the final solution you like.

How to export only last changes from SVN repository?

I want to export only the changes (files and folders - the tree) between the HEAD and PREVIOUS revisions using the svn export command. Is it possible? How do I do it on Windows (command line)?
svn diff -rPREV:HEAD --summarize
will give you a list of the modified files. You should be able to iterate over that list with whatever scripting language you have available, create the necessary directories, and then run svn cat to get the contents of each file.
This would be quite easy in bash, maybe someone else can tell you where to go from here in Windows.
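A rough, untested bash sketch of that idea (run inside an up-to-date working copy, e.g. from Git Bash or Cygwin on Windows; OUT is a placeholder):
#!/bin/bash
OUT=../export_changes
svn diff -r PREV:HEAD --summarize . | while read -r status path; do
    [ "$status" = "D" ] && continue              # deleted items: nothing to export
    if [ -d "$path" ]; then
        mkdir -p "$OUT/$path"                    # recreate changed directories
    else
        mkdir -p "$OUT/$(dirname "$path")"
        svn cat -r HEAD "$path" > "$OUT/$path"   # file content as of HEAD
    fi
done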

Incremental deploy from a shell script

I have a project where I'm forced to use FTP as the means of deploying files to the live server.
I'm developing on Linux, so I hacked together a bash script that makes a backup of the FTP server's contents,
deletes all the files on the FTP server, and uploads all the fresh files from the Mercurial repository
(while taking care of user-uploaded files and folders, making post-deploy changes, etc.).
It works well, but the project is starting to get big enough that the deployment process takes too long.
I'd like to modify the script to look up which files have changed, and only deploy the modified files. (the backup is fine atm as it is)
I'm using Mercurial as the VCS, so my idea is to somehow ask it for the files changed between two revisions, iterate over those files,
upload each modified file, and delete each removed file.
I can use hg log -vr rev1:rev2, and from the output, I can carve out the changed files with grep/sed/etc.
Two problems:
I have heard the horror stories that parsing the output of ls leads to insanity, so my guess is that the same applies here:
if I try to parse the output of hg log, the filenames will undergo word-splitting and all kinds of transformations.
hg log doesn't tell me whether a file was modified, added or deleted. Differentiating between modified and deleted files would be the least I need.
So, what would be the correct way to do this? I'm using yafc as an ftp client, in case it's needed, but willing to switch.
You could use a custom style that does the parsing for you.
hg log --rev rev1:rev2 --style mystyle
Then pipe it to sort -u to get a unique list of files. The file "mystyle" would look like this:
changeset = '{file_mods}{file_adds}\n'
file_mod = '{file_mod}\n'
file_add = '{file_add}\n'
The file_mods and file_adds templates list the files modified or added in each changeset. There are similar file_dels and file_del templates for deleted files.
Alternatively, you could use hg status -ma --rev rev1-1:rev2, which prefixes modified/added files with an M or an A. You need to pass a different revision range, one less than rev1, as the status is computed against that "baseline". Deleted files are similar: add the -r (--removed) flag and each removed file is prefixed with an R.
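A hedged sketch of how that output could drive the deploy; rev1/rev2 are set elsewhere, and upload_file/delete_remote stand in for whatever your yafc/FTP wrapper provides:
#!/bin/bash
hg status -mar --rev "$((rev1 - 1)):$rev2" | while read -r flag file; do
    case "$flag" in
        M|A) upload_file "$file" ;;    # modified or added: push the new version
        R)   delete_remote "$file" ;;  # removed: delete it on the live server
    esac
done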
