On my Synology NAS, I have an AFP share with files that have been transferred back and forth for decades across different OSes.
Original setup: probably an ext4 filesystem with a Synology-hosted NFS mount, years ago (various client systems, Linux/Windows)
Current setup: ext4 filesystem, with Synology-hosted AFP mounts (to a macOS 10.15 system, though I doubt that matters)
For files that were copied via NFS originally, and now hosted via AFP, all the file dates seem to be offset by some amount. I can sort of eyeball the datetime offset, but is there a definitive number I can use? (And a simple way to parse/modify the times using something like GetFileInfo?)
For reference, I have a copy of iTerm2-3_2_6.zip, dated "2039-01-22 08:25:17". I would probably map that to 2019-01-21 (release date for 3.2.7), implying a 20-year offset.
The closest thing I can think of is the macOS (Core Foundation) epoch starting on 2001-01-01 instead of the UNIX epoch of 1970-01-01, but that's a 31-year offset.
There's also the "year 2038 problem", and some software might be doing something clever with 32-bit overflows to support post-2038 datetimes, but I have at least one file dated "2031-08-10", so that seems unlikely.
This seems to be related to 32-bit vs 64-bit overflows somewhere in this complicated storage stack; the affected datetimes plus the error offset consistently land close to 2^31 seconds, though I wasn't able to find any clear pattern.
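A quick back-of-the-envelope comparison of the candidate offsets (my own rough check, assuming the stored values are plain UNIX epoch seconds):

```python
from datetime import datetime, timezone

# Candidate explanations for the shift, in years (rough numbers, UTC):
unix_epoch  = datetime(1970, 1, 1, tzinfo=timezone.utc)
apple_epoch = datetime(2001, 1, 1, tzinfo=timezone.utc)
observed    = 632599096  # seconds, measured from the one file present on both systems
year        = 365.25 * 86400

print((apple_epoch - unix_epoch).total_seconds() / year)  # ~31.0  (epoch mismatch)
print(2**31 / year)                                       # ~68.1  (32-bit rollover)
print(observed / year)                                    # ~20.05 (what I actually see)
```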
Also, I noticed strange behavior from my Synology system's use of eCryptfs, which seems to lag behind on metadata updates when they are done in batches. (In particular, I suspect that some eCryptfs/Synology metadata is getting translated incorrectly, or just never written to disk.)
Anyway, I basically wrote a Python script (a sketch is included after the caveats below) that does the following:
check whether os.stat() reports an atime/mtime newer than 2030
check that both atime and mtime are past that cutoff; error out if only one of them is
adjust both times back by 632599096 seconds (an offset derived by comparing copies of the one file I found in common between the two systems)
with the following caveats to watch out for:
macOS's GetFileInfo/SetFile utilities only support 32-bit datetimes, so you should generally avoid using them (even just to verify the metadata updates).
something in the Synology/eCryptfs encryption gets very slow, so after a few dozen metadata updates, further updates will no longer be visible (even after calling sync from the shell). But if you give it some time, you'll see the corrected atime/mtime values.
The OS-specific os.stat() field, ctime, really does track metadata-change times, and there doesn't seem to be a way to set it manually (nor any need to, since it isn't visible through any GUI).
The combination of slow metadata updates and GetFileInfo reporting the wrong time made this incredibly frustrating until I figured both out. In practice, it means you have to test metadata updates on a few files at a time, and a large batch run can only be verified hours later (I waited a day).
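For reference, here is a minimal sketch of the approach (reconstructed, not the exact script; the 2030 cutoff, the offset, and the share path are specific to my setup):

```python
#!/usr/bin/env python3
"""Minimal sketch: walk a tree, find files whose atime/mtime landed past 2030,
and pull both timestamps back by the measured offset. Dry-run by default."""
import os
import sys
from datetime import datetime, timezone

CUTOFF = datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp()
OFFSET = 632599096  # seconds, measured by comparing one file present on both systems
SHARE  = "/path/to/share"  # placeholder: the mounted share root

def fix_times(path, dry_run=True):
    st = os.stat(path)
    atime_bad = st.st_atime > CUTOFF
    mtime_bad = st.st_mtime > CUTOFF
    if not (atime_bad or mtime_bad):
        return  # timestamps look sane; leave the file alone
    if atime_bad != mtime_bad:
        sys.exit(f"only one of atime/mtime is in the future for {path!r}; bailing out")
    new_times = (st.st_atime - OFFSET, st.st_mtime - OFFSET)
    print(f"{path}: mtime {st.st_mtime:.0f} -> {new_times[1]:.0f}")
    if not dry_run:
        os.utime(path, new_times)  # note: ctime will update, and eCryptfs may lag

for root, dirs, files in os.walk(SHARE):
    for name in files:
        fix_times(os.path.join(root, name), dry_run=True)
```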
This should have been a blog post, good riddance.
I know, VB6 is ancient... OK, but...
Years ago I wrote a backup program, not being satisfied with the commercial products I had tested.
Now I wanted to refresh it with some enhancements and a new look; the result is quite good for me. Since the file copying process is generally rather slow, I thought I would compile it to squeeze out a few seconds... and instead... it is much slower.
Here is some info:
Win10-64 (version 22H2 just upgraded)
Tested on the same PC with identical parameters
VB6 runs with admin privileges, in Win7 SP3 compatibility mode.
Even if it is not relevant here, the job was to copy a folder containing 426 subfolders and 4598 files of various sizes (from 1 kB to 435 MB, 1.05 GB in total) from an internal SSD to an external SSD.
The interpreted version took 7.2 sec, while the compiled version finished in 18.6 sec!
I tried different native-code compilation settings, turning off all the advanced checks on array bounds, integer overflow and floating point, without any notable difference.
I could accept a small difference for some unknown reason, but it is unreal to get a 2.5:1 ratio.
Any idea?
EDIT
Based on comments:
I repeated the comparison several times; the variation (in both the compiled and the interpreted mode) is around +/- 1 sec.
Files are copied using FileSystemObject.CopyFile
my admin privileges are the same for both
Again, I'm not complaining about nor worried by the absolute time the copy takes; I can live with it, since it is an operation run once a week during quiet hours.
What is surprising is WHY it happens.
Even the idea of compiling the program was mostly curiosity, since there is very little to optimize in the code; it is just a For-Next loop with very few calculations and assignments.
The program takes the directory and file info from a text-based DB created by recursively scanning the source folder, then loads it into a custom array... pretty simple.
This is done before the actual copy phase, which is what I'm investigating.
I run the following command to find files matching ".*large_files.*":
[root@iz2ze9wve43n2nyuvmsfx5z ~]# find / -iregex ".*large_files.*"
/root/search_large_files.py
It found the file, but the cursor keeps blinking endlessly, even if I leave it alone for over half an hour.
What's the bug in my command that causes this problem?
Well, it may be that you just have massive file systems :-)
But, if you think it shouldn't be taking that long, you may well have mount points that are slower than normal, such as NFS mounts where you have to go out over the network to get file information.
You could probably see the slow-down in that case if you just ran find / on its own. If it goes out to an external location (like, I don't know, a ZX80 running in Antarctica), the output rate may show that, and you'll be able to identify where in the hierarchy it happens.
Another possibility is to restrict the search to the file system you're actually on, to minimise the chance it will go external. You do that with the -xdev flag, which prevents find from crossing file system boundaries. On my VM with one root file system but mounts for my C and D host drives, that cut the time down from two minutes to seventeen seconds.
Of course, that won't search other local file systems, but you could, if necessary, write a script that runs find (with -xdev) against every file system marked ext4 (and whatever others you deem to be local); see the sketch below.
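If you'd rather not shell out repeatedly, here is a rough Python sketch of the same idea (Linux-specific: it reads ext4 mount points from /proc/mounts and refuses to descend into other file systems, much like -xdev):

```python
#!/usr/bin/env python3
"""Rough sketch: search only ext4 file systems, without crossing mount points."""
import os
import re

pattern = re.compile(r".*large_files.*", re.IGNORECASE)  # same regex as the question

def ext4_mounts():
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, fstype = line.split()[:3]
            if fstype == "ext4":
                yield mountpoint

def same_device(path, dev):
    try:
        return os.stat(path).st_dev == dev
    except OSError:
        return False  # unreadable entry; don't descend into it

def search(mountpoint):
    top_dev = os.stat(mountpoint).st_dev
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        # emulate find's -xdev: prune directories that live on another device
        dirnames[:] = [d for d in dirnames if same_device(os.path.join(dirpath, d), top_dev)]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if pattern.search(full):
                print(full)

for mp in ext4_mounts():
    search(mp)
```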
I have a relatively simple bash script that reads from a set of static input files, stores the input in bash variables and then does a bunch of processing over said input by calling out to external scripts (e.g. written in Python, Go, other bash scripts etc.) and using the intermediate results.
Lately I have been experiencing an intermittent problem where a single character seems to be getting altered somewhere during the processing which then causes subsequent errors. Specifically, a lot of the processing I'm doing involves slicing up a list of comma-separated records, and one of the values on each line is a unix timestamp, e.g. 1354245000.
What seems to be happening is that occasionally one of these values will get altered slightly, so I end up with a timestamp like 13542458=2 or 13542458>2 or 13542458;2 coming out of one of the intermediate scripts. This then subsequently gets fed into another script, which throws an exception when it tries to parse the value to an integer.
In the title of this question, I've suggested that this might be a potential CPU/RAM error. I know it is generally folly to blame errors on low-level things like hardware or compilers, but the nature of this particular error makes me think it may be possible, for the following reasons:
The input files are the same on each invocation of the script, and the script only fails on some invocations.
I cannot think of any sources of randomness in the source code prior to where the script is breaking. It's basically just slicing and dicing csv input.
I cannot think of any sources of concurrency in the source code -- even the Go scripts aren't actually written to run anything concurrently.
This problem has only arisen in the last week or so. Prior to this time, this error would never occur.
While I haven't documented every erroneous character, they often seem to be quite close in the ASCII table to the digits (=, >, ;, etc.); see the quick check after this list. That said, I guess the Hamming distance between two characters that are far apart in the table can also be small if a high-order bit changes.
The script often breaks at a different stage on different runs. i.e. I have a number of separate Python scripts, and sometimes it'll make it past one script and then the error will be induced in another. Other times it'll be induced on an earlier script.
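As a quick illustration of why this smells like single bit flips (the digit pairings below are my guesses; I can't know which digit each corrupted character actually replaced):

```python
# Each corrupted character observed is exactly one bit away from some ASCII digit.
for bad, digit in [("=", "5"), (">", "6"), (";", "3")]:
    xor = ord(bad) ^ ord(digit)
    print(f"{bad!r} vs {digit!r}: XOR = {xor:#04x}, bits flipped = {bin(xor).count('1')}")
# All three pairs differ only in the 0x08 bit, i.e. a single flipped bit each.
```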
What I'd like to know is, is there any methodical way to either confirm or rule out a hardware error for this problem? Or if it is a hardware problem, is it possibly undetectable by the operating system?
A bit of further info on the machine:
Linux 64-bit, Ubuntu 12.04
Intel i7 processor
16GB DDR3 RAM
I'm hoping someone can either point me to a reliable way to verify whether the hardware is to blame or otherwise a sound reason as to what else might be the cause.
Try booting into Memtest to check your memory.
While it is highly unlikely that it will be hardware, if you have exhausted your standard software debugging as suggested by @OliCharlesworth, here is an outline of a hardware error investigation:
(1) check your log area for any `MCE` logs (machine check exceptions). If you find any, either in your log area (syslog) or sometimes in the present working dir or in /, you have a hardware failure.
(2) check your log area for disk errors, e.g.:
smartd[3963]: Device: /dev/sda [SAT], 34 Currently unreadable (pending) sectors
(3) check your drive integrity, e.g. (as root): `smartctl -a /dev/sda`; if there is any abnormality, run `smartctl -t short /dev/sda` (change the drive as required)
(4) download/install/boot into [memtest86](http://www.memtest86.com/download.htm) and run the complete test
If your CPU/motherboard has thrown no MCEs, you have no disk errors, your drive tests OK with smartctl, and you have no memory errors with memtest86, then go back and recheck the software debugging. While other hardware errors can still be present (bad capacitors, etc.), the likelihood at this point is software. Good luck.
This is so wrong.
I want to perform a large copy operation; moving 250 GB from my laptop hard drive to an external drive.
OS X Lion claims this will take about five hours.
After a couple of hours of chugging, it reports that one particular file could not be copied (for whatever reason; I cannot remember and I don't have the patience to repeat the experiment at the moment).
And on that note it bails.
I am frankly left aghast.
That this problem persists in this day and age is to me scarcely believable. I remember hitting up against the same scenario 20 years back with Windows 3.1.
How hard would it be for the folks at Apple (or Microsoft, for that matter) to implement file copying in such a way that it skips over failures, writing a list of failed operations on the fly to stderr? And how much more useful would that implementation be? (Both these questions are rhetorical, by the way; simply an expression of my utter bewilderment. Please don't answer them except by means of comments or supplements to an answer to the actual question, which follows.)
More to the point (and this is my actual question), how can I implement this myself in OS X?
PS I'm open to all solutions here: programmatic / scripting / third-party software
I hear and understand your rant, but this is bordering on being a SuperUser-type question and not a programming question (saved only by the fact you said you would like to implement this yourself).
From the description, it sounds like the Finder bailed when it couldn't copy one particular file (my guess is that it was looking for admin and/or root permission for some privileged folder).
For massive copies like this, you can use the Terminal command line, e.g. `cp` or `sudo cp`, with options like `-R` (which continues copying even if errors are detected -- unless you're using "legacy" mode) or `-n` (don't copy if the file already exists at the destination). You can see all the possible options by typing `man cp` at the Terminal command line.
If you really wanted to do this programmatically, there are options in NSWorkspace (the performFileOperation:source:destination:files:tag: method; look at the NSWorkspaceCopyOperation constant). You can also do more low-level stuff via NSFileManager and its copyItemAtPath:toPath:error: method, but that's really getting into brute-force territory.
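If a scripting route is acceptable, here is a minimal sketch (mine, not Apple tooling; source and destination are passed on the command line) of a copy that skips failures and logs them to stderr:

```python
#!/usr/bin/env python3
"""Minimal sketch: copy a directory tree, skip files that fail, log failures to stderr."""
import os
import shutil
import sys

def copy_tree_skipping_errors(src, dst):
    for dirpath, dirnames, filenames in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            try:
                shutil.copy2(source, target)  # data plus timestamps/permissions
            except OSError as exc:
                print(f"FAILED: {source}: {exc}", file=sys.stderr)

if __name__ == "__main__":
    copy_tree_skipping_errors(sys.argv[1], sys.argv[2])
```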
My Mac app keeps a collection of objects (with Core Data), each of which has a cover image, and to which I assign a UUID upon creation. I had originally been storing the cover images as a field in my Core Data store, but recently started storing them on disk in the file system, instead.
Initially, I'm storing the covers in a flat directory, using the UUID to name the file, as below. This gives me O(1) fetching, as I know exactly where to look.
...
/.../Covers/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
/.../Covers/6BEC2FC4-B9DA-4E28-8A58-387BC6FF8E06.jpg
...
I've looked at the way other applications handle this task, though, and noticed a multi-level scheme, as below (for instance). This could still be implemented in O(1) time.
...
/.../Covers/A/B/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
/.../Covers/C/D/6BEC2FC4-B9DA-4E28-8A58-387BC6FF8E06.jpg
...
What might be the reason to do it this way? Does OS X limit the number of files in a directory? Is it in some way faster to retrieve them from disk? It would make the code used to calculate the file's name more complicated, so I want to find out if there is a good reason to do it that way.
On certain file systems (and I believe HFS+ is among them), having too many files in the same directory will cause performance issues.
I used to work at an ISP where they would break up the home directories (they had 90k+ of them) using a multi-directory scheme. You can partition your directories using, say, the first two characters of the UUID, then the next two, e.g.:
/.../Covers/3B/72/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
/.../Covers/6B/EC/6BEC2FC4-B9DA-4E28-8A58-387BC6FF8E06.jpg
That way you don't need to calculate any extra characters or codes; you just use the ones you already have to break things up. Since your UUIDs will be different every time, this should suffice. A short sketch of the mapping is below.
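A short sketch of the mapping (Python here just to show the idea; the covers root and the .jpg suffix are placeholders):

```python
import os
import uuid

def cover_path(covers_root, cover_uuid):
    """Map a UUID to a two-level directory: Covers/3B/72/3B72....jpg"""
    name = str(cover_uuid).upper()
    return os.path.join(covers_root, name[:2], name[2:4], name + ".jpg")

cover_id = uuid.UUID("3B723A52-C228-4C5F-A71C-3169EBA33677")
path = cover_path("/path/to/Covers", cover_id)
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the two levels on first write
print(path)  # .../Covers/3B/72/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
```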
The main reason is that, in the latter scheme, as you've mentioned, disk retrieval is faster because each directory is smaller (so the FS looks a file up in a smaller table).
As others mentioned, on some file systems it takes longer for the OS to open a file, because one directory with many files takes longer to read than a couple of short directories.
However, you should perform measurements on your particular file system and for your particular usage scenario. I did this for NTFS on Windows XP and was surprised to discover that the flat directory performed better than the hierarchical structure in all kinds of tests.