How to preserve file modification time with LFTP - ftp

I wrote a script that syncs several servers at once. One of my problems is that I cannot get LFTP to preserve the initial file modification time, so it always uploads all the files even though only some have changed.
Do you know how to force LFTP to preserve file modification time when downloading or uploading?
Thanks for any help.
Korchkidu

On the following page
http://www.bouthors.fr/wiki/doku.php?id=en:linux:synchro_lftp
the authors state:
When uploading, it is not possible to set the date/time on the files uploaded; that's why --ignore-time is needed.
So if you use the flag combination --only-newer and --ignore-time, you get reasonable backup behaviour: every file that differs in size is replaced. Of course this doesn't help if you really need to rely on time synchronization, but if it is just a regular backup of data, it will do the job.
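For example, a reverse mirror (upload) that relies only on size comparison could look roughly like this (the user, password, host name and paths here are placeholders):
lftp -u user,password ftp.example.com <<'EOF'
mirror -R --only-newer --ignore-time --verbose /local/site /remote/site
exit
EOF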

Try this, from the lftp man page (http://lftp.yar.ru/lftp-man.html):
ftp:use-site-utime (boolean)
when true, lftp sends 5-argument `SITE UTIME' command to set file modification time on uploaded files. Default is true.

Looking through man lftp, I found this setting:
ftp:use-mdtm-overloaded (boolean)
when true, lftp uses the two-argument MDTM command to set file modification time on uploaded files. Default is false.
Thus, I set it to:
set ftp:use-mdtm-overloaded true
and it worked as expected.
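If you want the setting applied to every session, it can also go into lftp's startup file, e.g. (a sketch; which rc file gets read can depend on how your lftp is installed):
# ~/.lftprc
set ftp:use-mdtm-overloaded true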

Related

LFTP file prefix while is being downloaded

Is it possible to prefix the files that are being downloaded? (similar to rsync)
For instance, while fruit_banana is being downloaded, it already shows up as fruit_banana even though the transfer is not complete. (It would be nice to mark it somehow, to differentiate between completed and incomplete files.)
Eventually found the answer:
The following extra parameters are needed:
set xfer:use-temp-file yes
set xfer:temp-file-name *.lftp
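For example, with these settings in place a download of fruit_banana shows up as fruit_banana.lftp and is renamed once the transfer finishes. A minimal session sketch (host, credentials and path are made up):
lftp -u user,password -e "set xfer:use-temp-file yes; set xfer:temp-file-name *.lftp; get /remote/fruit_banana; exit" ftp.example.com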

sql loader without .dat extension

Oracle's sqlldr defaults to a .dat extension for the data file, and I want to override that. I'd rather not rename the file. Googling turned up a few answers suggesting a trailing dot, like data='fileName.', but that is not working for me. Please share your ideas.
The error message is that fileName.dat is not found.
SQL*Loader has default extensions for all of its input files (data, log, control, ...):
data = .dat
log = .log
control = .ctl
bad = .bad
PARFILE = .par
But you have to pass the filename without the apostrophes and without the dot:
sqlldr user/pass@db control=control data=data
SQL*Loader will then add the extensions itself: control.ctl, data.dat.
Nevertheless, I do not understand why you do not want to specify the extension.
You can't, at least in Unix/Linux environments. In Windows you can use the trailing period trick, specifying either INFILE 'filename.' in the control file or DATA=filename. on the command line. Windows file name handling allows that; you can for instance do DIR filename. at a command prompt and it will list the file with no extension (as will DIR filename). But you can't do that with *nix, from a shell prompt or anywhere else.
You said you don't want to copy or rename the file. Temporarily renaming it might be the simplest solution, but as you may have a reason not to do that even briefly you could instead create a hard or soft link to the file which does have an extension, and use that link as the target instead. You could wrap that in a shell script that takes the file name argument:
# set variable from correct positional parameter; if you pass in the control
# file name or other options, this might not be $1 so adjust as needed
# if the temporary file won't be in the same directory, this needs to be a full path
filename=$1
# optionally check file exists, is readable, etc. but overkill for demo
# can also check temporary file does not already exist - stop or remove
# create soft link somewhere it won't impact any other processes
ln -s ${filename} /tmp/${filename##*/}.dat
# run SQL*Loader with soft link as target
sqlldr user/password@db control=file.ctl data=/tmp/${filename##*/}.dat
# clean up
rm -f /tmp/${filename##*/}.dat
You can then call that as:
./scriptfile.sh /path/to/filename
If you can create the link in the same directory then you only need to pass the file, but if it's somewhere else - which may be necessary depending on why renaming isn't an option, and desirable either way - then you need to pass the full path of the data file so the link works. (If the temporary file will be in the same filesystem you could use a hard link, and you wouldn't have to pass the full path then either, but it's still cleaner to do so.)
As you haven't shown your current command line options you may have to adjust that to take into account anything else you currently specify there rather than in the control file, particularly which positional argument is actually the data file path.
I have the same issue. I get a monthly download of reference data used in a medical application, and the 485 downloaded files (about 2 GB) don't have file extensions. Unless I can load without file extensions, I have to copy the files with a .dat extension and load from there.

s3cmd sync is remote copying the wrong files to the wrong locations

I've got the following as part of a shell script to copy site files up to a S3 CDN:
for i in "${S3_ASSET_FOLDERS[#]}"; do
s3cmd sync -c /path/to/.s3cfg --recursive --acl-public --no-check-md5 --guess-mime-type --verbose --exclude-from=sync_ignore.txt /path/to/local/${i} s3://my.cdn/path/to/remote/${i}
done
Say S3_ASSET_FOLDERS is:
("one/" "two/")
and say both of those folders contain a file called... "script.js"
and say I've made a change to two/script.js - but not touched one/script.js
Running the above command will first copy the file from /one/ to the correct location, although I've no idea why it thinks it needs to:
INFO: Sending file '/path/to/local/one/script.js', please wait...
File '/path/to/local/one/script.js' stored as 's3://my.cdn/path/to/remote/one/script.js' (13551 bytes in 0.1 seconds, 168.22 kB/s) [1 of 0]
... and then a remote copy operation for the second folder:
remote copy: two/script.js -> script.js
What's it doing? Why?? Those files aren't even similar. Different modified times, different checksums. No relation.
And I end up with an s3 bucket with two incorrect files in. The file in /two/ that should have been updated, hasn't. And the file in /one/ that shouldn't have changed is now overwritten with the contents of /two/script.js
Clearly I'm doing something bizarrely stupid because I don't see anyone else having the same issue. But I've no idea what??
First of all, try running it without the --no-check-md5 option.
Second, I suggest you pay attention to the directory names, specifically the trailing slashes.
s3cmd documentation says:
With directories there is one thing to watch out for – you can either upload the directory and its contents or just the contents. It all depends on how you specify the source.
To upload a directory and keep its name on the remote side specify the source without the trailing slash
On the other hand, to upload just the contents, specify the directory with a trailing slash
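In other words, something like this (the bucket and paths are only illustrative):
# uploads the directory itself, ending up under s3://my.cdn/path/to/remote/one/...
s3cmd sync /path/to/local/one s3://my.cdn/path/to/remote/
# uploads only the contents of one/, ending up directly under s3://my.cdn/path/to/remote/...
s3cmd sync /path/to/local/one/ s3://my.cdn/path/to/remote/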

bash scripting de-dupe

I have a shell script. A cron job runs it once a day. At the moment it just downloads a file from the web using wget, appends a timestamp to the filename, then compresses it. Basic stuff.
This file doesn't change very frequently though, so I want to discard the newly downloaded copy if it is identical to the one I already have.
Easiest way to do this?
Thanks!
Do you really need to compress the file?
wget provides -N, --timestamping, which turns on time-stamping. Here's what that does: say your file is located at www.example.com/file.txt.
The first time you do:
$ wget -N www.example.com/file.txt
[...]
[...] file.txt saved [..size..]
The next time it'll be like this:
$ wget -N www.example.com/file.txt
Server file no newer than local file “file.txt” -- not retrieving.
Except if the file on the server was updated.
That would solve your problem, if you didn't compress the file.
If you really need to compress it, then I guess I'd go with comparing the hash of the new file/archive and the old one. What matters in that case is: how big is the downloaded file? Is it worth compressing it first and then checking the hashes? Is it worth decompressing the old archive and comparing the hashes? Is it better to store the old hash in a text file? Do any of these have an advantage over simply overwriting the old file?
Only you can know that; run some tests.
So if you go the hash way, consider sha256 and xz (lzma2 algorithm) compression.
I would do something like this (in Bash):
newfilesum="$(wget -q www.example.com/file.txt -O- | tee file.txt | sha256sum)"
oldfilesum="$(xzcat file.txt.xz | sha256sum)"
if [[ $newfilesum != $oldfilesum ]]; then
xz -f file.txt # overwrite with the new compressed data
else
rm file.txt
fi
and that's it.
Calculate a hash of the content of the file and check against the new one. Use for instance md5sum. You only have to save the last MD5 sum to check if the file changed.
Also, take into account that the web is evolving to provide more information about pages, that is, metadata. A well-built web site should include the file version and/or date of modification (or a valid Expires header) as part of the response headers. This, among other things, is part of what makes Web 2.0 scale.
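A minimal sketch of that save-the-last-sum idea, assuming the download lands in file.txt and the previous checksum is kept in file.txt.md5 (both names invented):
newsum=$(md5sum file.txt | awk '{print $1}')
oldsum=$(cat file.txt.md5 2>/dev/null)
if [ "$newsum" != "$oldsum" ]; then
    echo "$newsum" > file.txt.md5   # remember the new sum
    # the file changed: keep it (timestamp/compress it here)
else
    rm file.txt                     # unchanged: discard the new download
fi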
How about downloading the file, and checking it against a "last saved" file?
For example, the first time it downloads myfile, it saves it as myfile-[date] and compresses it. It also adds a symbolic link, such as lastfile, pointing to myfile-[date]. The next time the script runs, it can check whether the contents of whatever lastfile points to are the same as the newly downloaded file.
Don't know if this would work well, but it's what I could think of.
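A rough sketch of that approach (the file names, URL and compression step are just examples; gzip -k needs a gzip recent enough to support -k):
wget -q -O myfile.new www.example.com/file.txt
if [ -e lastfile ] && cmp -s myfile.new "$(readlink lastfile)"; then
    rm myfile.new                      # identical to the last download: discard
else
    new="myfile-$(date +%Y%m%d)"
    mv myfile.new "$new"
    gzip -k "$new"                     # keep the uncompressed copy for the next comparison
    ln -sfn "$new" lastfile            # point lastfile at the newest copy
fi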
You can compare the new file with the last one using the sum command, which computes a checksum of the file. If both files have the same checksum, they are very, very likely to be exactly the same. There's another command called md5 that computes an MD5 fingerprint, but the sum command is available on all systems.

Sync File Modification Time Across Multiple Directories

I have a computer A with two directory trees. The first directory contains the original mod dates that span back several years. The second directory is a copy of the first with a few additional files. There is a second computer, B, which contains a directory tree that is the same as the second directory on computer A (new mod times and additional files). How do I update the files in the two newer directories on both machines so that the mod times on the files are the same as in the original? Note that these directory trees are on the order of tens of gigabytes, so the solution would have to include some method of sending only the date information to the second computer.
The answer by Paul is partly correct: rsync is able to do this, but with different parameters. The correct command is
rsync -Prt --size-only original_dir copy_dir
where -P enables partial transfers and displays a progress indicator, -r recurses through subdirectories, -t preserves time stamps and --size-only doesn't transfer files that match in size.
The following command will make sure that TEST2 gets the same date assigned as TEST1:
touch -t `stat -t '%Y%m%d%H%M.%S' -f '%Sm' TEST1` TEST2
Now instead of using hard-coded values here, you could find the files using the "find" utility and then run touch via SSH on the remote machine. However, that means you may have to enter the password for each file, unless you switch SSH to key-based authentication. I'd rather not do it all in a super fancy one-liner. Instead let's work with temp files. First go to the directory in question and run a find (you can filter by file type, size, extension, whatever pleases you; see "man find" for details. I'm just filtering by type file here to exclude any directories):
find . -type f -print -exec stat -t '%Y%m%d%H%M.%S' -f '%Sm' "{}" \; > /tmp/original_dates.txt
Now we have a file that looks like this (in my example there are only two entries there):
# cat /tmp/original_dates.txt
./test1
200809241840.55
./test2
200809241849.56
Now just copy the file over to the other machine and place it in the directory (so the relative file paths match) and apply the dates:
cat original_dates.txt | (while read FILE && read DATE; do touch -t $DATE "$FILE"; done)
Will also work with file names containing spaces.
One note: I used the last "modification" date in stat, as that's what you wrote in the question. If, however, you want to use the "creation" date instead (every file has a creation date, a last modification date and a last access date), you need to alter the stat call a bit:
'%Sm' - last modification date
'%Sc' - creation date
'%Sa' - last access date
However, touch can only change the modification time and access time, I think it can't change the creation time of a file ... so if that was your real intention, my solution might be sub-optimal... but in that case your question was as well ;-)
I would go through all the files in the source directory tree and gather the modification times from them into a script that I could run on the other directory trees. You will need to be careful about a few 'gotchas'. First, make sure that your output script has relative paths, and make sure you run it from the proper target directory, which should be the root directory of the target tree. Also, when changing machines make sure you are using the same timezone as you were on the machine where you generated the script.
Here's a Perl script I put together that will output the touch commands needed to update the times on the other directory trees. Depending on the target machines, you may need to tweak the date formats or command options, but this should give you a place to start.
#!/usr/bin/perl
my $STARTDIR="$ENV{HOME}/test";
chdir $STARTDIR;
my @files = `find . -type f`;
chomp @files;
foreach my $file (@files) {
    my $mtime = localtime((stat($file))[9]);
    print qq(touch -m -d "$mtime" "$file"\n);
}
The other approach you could try is to attach the remote directory using NFS and then copy the times using find and touch -r.
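A sketch of that second idea, assuming the original tree is NFS-mounted at /mnt/original (an invented mount point) and you run it from the root of the copy:
cd /path/to/copy
# GNU find substitutes {} even inside a longer argument; a strictly POSIX find
# only replaces a standalone {}, so adjust the -exec if yours is stricter
find . -type f -exec touch -r /mnt/original/{} {} \;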
I think rsync (with the right options) will do this - it claims to only send file differences, so presumably it will work out that there are no differences to be transferred.
--times preserves the modification times, which is what you want.
See (for instance)
http://linux.die.net/man/1/rsync
Also add -I, --ignore-times (don't skip files that match in size and time), so that all files are "transferred", and trust rsync's file-differences optimisation to make it "fairly efficient" - see the excerpt from the man page below.
-t, --times
This tells rsync to transfer modification times along with the files and update them on the remote system. Note that if this option is not used, the optimization that excludes files that have not been modified cannot be effective; in other words, a missing -t or -a will cause the next transfer to behave as if it used -I, causing all files to be updated (though the rsync algorithm will make the update fairly efficient if the files haven't actually changed, you're much better off using -t).
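Putting the two options together, something along these lines (host and paths are invented):
rsync -rt -I --progress /path/to/original/ user@remote:/path/to/copy/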
I used the following Python scripts instead.
Python scripts run much faster than an approach creating new processes for each file (like using find and stat). The solution below also works in case of timezone differences between systems, as it uses UTC times. It also works with paths containing spaces (but not paths containing newline!). It doesn't set times for symlinks, because the operating system provides no mechanism to modify the timestamp of a symlink, but in a file manager the time of the file the symlink points at is shown instead anyway. It uses a maxTime parameter to avoid resetting dates for files that are actually modified after copying from the original directory.
listMTimes.py:
import os
from datetime import datetime
from pytz import utc
for dirpath, dirnames, filenames in os.walk('./'):
    for name in filenames+dirnames:
        path = os.path.join(dirpath, name)
        # Avoid symlinks because os.path.getmtime and os.utime get and
        # set the time of the pointed file, and in the new directory,
        # the link may have been redirected.
        if not os.path.islink(path):
            mtime = datetime.fromtimestamp(os.path.getmtime(path), utc)
            print(mtime.isoformat()+" "+path)
setMTimes.py:
import datetime, fileinput, os, sys, time
import dateutil.parser
from pytz import utc
# Based on
# http://stackoverflow.com/questions/6999726/python-getting-millis-since-epoch-from-datetime
def unix_time(dt):
    epoch = datetime.datetime.fromtimestamp(0, utc)
    delta = dt - epoch
    return delta.total_seconds()

if len(sys.argv) != 2:
    print('Syntax: '+sys.argv[0]+' <maxTime>')
    print('  where <maxTime> is an ISO time, e.g. "2013-12-02T23:00+02:00".')
    exit(1)

# A file with modification time newer than maxTime is not reset to
# its original modification time.
maxTime = unix_time(dateutil.parser.parse(sys.argv[1]))

for line in fileinput.input([]):
    (datetimeString, path) = line.rstrip('\r\n').split(' ', 1)
    mtime = dateutil.parser.parse(datetimeString)
    if os.path.exists(path) and not os.path.islink(path):
        if os.path.getmtime(path) <= maxTime:
            os.utime(path, (time.time(), unix_time(mtime)))
Usage: in the first directory (the original) run
python listMTimes.py >/tmp/original_dates.txt
Then in the second directory (a copy of the original, possibly with some files modified/added/deleted) run something like this:
python setMTimes.py 2013-12-02T23:00+02:00 </tmp/original_dates.txt
