How to know in Ruby if a file is completely downloaded - ruby

Our issue is that our project has files being downloaded using wget to the file system. We are using ruby to read the downloaded files for data.
How is it possible to tell if the file is completely downloaded so we don't read a half complete file?

I asked a very similar question and got some good answers... in summary, use some combination of one or more of the following:
download the file to a holding area and finally copy to your input directory;
use a marker file, created once the download completes, to signal readiness;
poll the file twice and see if its size has stopped increasing;
check the file's permissions (some download processes block reads during download);
use another method like in-thread download by the Ruby process.
To quote Martin Cowie, "This is a middleware problem as old as the hills"...
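Two of the options above, the marker file and the size poll, can be sketched on the reading side roughly as follows. This is only an illustration: the method names, the ".done" marker suffix and the one-second interval are assumptions, not anything wget provides by itself.

```ruby
# Sketch: decide whether a downloaded file looks ready to read.
# The ".done" suffix and the default interval are hypothetical choices.

# True if the file's size is unchanged across two checks taken
# `interval` seconds apart. This is a heuristic: a stalled download
# also looks "stable".
def size_stable?(path, interval: 1.0)
  first = File.size(path)
  sleep(interval)
  first == File.size(path)
rescue Errno::ENOENT
  false # the file does not exist (or vanished between the checks)
end

# True if the downloader has created a marker file next to the data file.
def marker_present?(path, suffix: ".done")
  File.exist?(path + suffix)
end
```

The marker check is the more reliable of the two, but it requires that whatever runs wget creates the marker only after wget has exited successfully.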

The typical approach to this is to download the file to a temporary location and when finished 'move' it to the final destination for processing.
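A minimal sketch of that approach in Ruby, assuming the temporary name lives in the same directory as the final one (so the rename stays on one filesystem) and that readers skip files with the hypothetical ".part" suffix:

```ruby
require "fileutils"

# Sketch: let the caller write to a temporary name, then rename it into place.
# The method name and the ".part" suffix are hypothetical.
def publish_atomically(final_path)
  tmp_path = "#{final_path}.part"
  yield tmp_path                     # caller writes the complete file here
  File.rename(tmp_path, final_path)  # readers only ever see a finished file
  final_path
ensure
  # Clean up a leftover partial file if the block raised; no-op after a rename.
  FileUtils.rm_f(tmp_path) if tmp_path
end

# Possible usage with wget (the URL variable is a placeholder):
#   publish_atomically("incoming/data.csv") do |tmp|
#     system("wget", "-q", "-O", tmp, url) or raise "download failed"
#   end
```

If the holding area is on a different filesystem, the move becomes a copy followed by a delete, and the reader can again observe a partial file.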

Related

Spark textFileStream [duplicate]

Should the file name contain a number for textFileStream to pick it up? My program picks up new files only if the file name contains a number, and ignores all other files even if they are new. Is there any setting I need to change to pick up all the files? Please help.
No. It scans the directory for new files which appear within the window. If you are writing to S3, do a direct write with your code, as the file doesn't appear until the final close(), so there is no need to rename. In contrast, if you are working with file streaming sources against normal filesystems, you should create the file outside the scanned directory and rename it in at the end; otherwise work-in-progress files may get read. And once read, a file is never re-read.
After spending hours analyzing the stack trace, I figured out that the problem was the S3 address. I was providing "s3://mybucket", which was working for Spark 1.6 and Scala 2.10.5. On Spark 2.0 (and Scala 2.11), it must be provided as "s3://mybucket/". Maybe it is some regex-related issue. It is working fine now. Thanks for all the help.

Rename a file that multiple processes are trying to use

I have 2 applications running in parallel, both doing the following:
check for file not containing "processed"
process the file and then rename it to filename+processed
for every file, only one application shall use it (on a first come first served basis)
I get the files and I also lock them so the other application cannot process them. But when it comes to renaming the file I run into a problem. To rename the file, I wanted to use the File.renameTo function. However, for that to work, I have to release the lock on the file. But once I release the lock, another process may try to use the file, which is exactly what should not happen.
Is there any way to prevent the application B from using the file between application A releasing the lock and finishing renaming the file?
EDIT
Some more information:
File creation if the file doesn't exist has to be prevented.
The file will be processed with RandomAccessFile (with read and write permission; this creates a new file if it doesn't exist).
Note: On Linux, one can rename a file that is locked, so this problem doesn't occur there. However, on Windows a locked file cannot be renamed; I have to release the lock, then rename it. But the time during which the lock is released lets other applications see that the file is available, and they will then try to use it.
Windows applications can do this using the SetFileInformationByHandle function, which allows you to rename the file using the handle you already have open. You probably can't do this natively from Java.
However, a more straightforward solution would be to rename the file (to filename+processing, for example) before you start processing it. Whichever process successfully renames the file in this way is the one responsible for processing it and eventually renaming it to filename+processed.
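The claim-by-rename idea can be sketched as follows. The question is about Java, where File.renameTo returning false plays the same role; the sketch below uses Ruby, and the method name and the ".processing" suffix are assumptions. It relies on the rename failing for the process that arrives second because the original name no longer exists.

```ruby
# Sketch: claim a file by renaming it before processing.
# Returns the claimed path, or nil if another process got there first.
# The ".processing" suffix is a hypothetical naming convention.
def claim(path, suffix: ".processing")
  claimed = path + suffix
  File.rename(path, claimed)
  claimed
rescue Errno::ENOENT, Errno::EACCES
  # ENOENT: the source name is gone, so someone else already renamed it.
  # EACCES: the file could not be renamed (for example, it is in use on Windows).
  nil
end
```

A worker would call claim on each candidate, process only the files for which it gets a path back, and finally rename the claimed file to filename+processed.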

How to edit the contents of index.dat windows file

I need to be able edit the content of index.dat file programmatically (C:\Documents and Settings\Username\Cookies\index.dat). More precisely I need to modify it in order that index.dat for one user can be used for a different user name. Is there any documentation out there for this kind of binary file?
Pasco (http://www.foundstone.com/us/resources/proddesc/pasco.htm) is a free index.dat parser that comes with the source code.
Docs will be hard to come by - Microsoft has never publicly documented the structure of this file. That said, you can find docs on the web such as the one mentioned above.
However, note that IE keeps close tabs on this file. The file is locked while IE is running (meaning, you can open/read it in some modes but not in others) and you can certainly not write to it.
One method that might still work is to boot-up in safe mode and then assign yourself administrator rights and then see if you can find the files to delete them.
The method I now use is to create a batch file that renames the subfolder below the folder containing the index.dat files and then copies back to the original location only the folders that don't contain these files. The resulting batch file needs to be run from a separate Windows account that has full administrator permissions.
The freeware code editor PSPad will allow you to view and to edit the contents of all of the index.dat files on your computer in hexadecimal form. This is done by replacing all of the digits in the first eight columns with zeros. This removes all of the information contained in the files.
It's a tedious process, requiring holding down the "0" (zero numeric key) as all of the edits are made, but anyone then accessing any of the index.dat files will get no information.
IE must be closed when doing this or you may receive an error message when attempting to save the modified file(s).

Can VS_VERSION_INFO be added to non-exe files?

My windows co-workers were asking me if I could modify my non-windows binary files such that when their "Properties" are examined under Windows, they could see a "Version" tab like that which would show for a Visual Studio compiled exe.
Specifically, I have some gzipped binary files and was wondering if I could modify them to satisfy this demand. If there's a better way, that would be fine, too.
Is there a way I could make my binaries appear to be exe files?
I tried simply appending the VS_VERSION_INFO block from notepad.exe to the end of one of my binaries in the hope that Windows scans for the block, but it didn't work.
I tried editing the other information regarding Author, Subject, Revision, etc. That doesn't modify the file; it just creates another data fork (what's the Windows term?) for the file in NTFS.
It is not supported by windows, since each file type has their own file format. But that doesn't mean you can't accomplish it. The resources stored inside dlls and exes are part of the file format.
Display to the user:
If you wanted this information to be displayed to the user, this would probably be best accomplished with using a property page shell extension. You would create a similar looking page, but it wouldn't be using the exact same page. There is a really good multi part tutorial on shell extensions, including property pages starting with that link.
Where to actually store the resource:
Instead of appending a block to the file, you could store the resource into a separate alternate data stream on the same file. This would leave the original file stream non corrupted on disk and not cause its primary file size to change.
Alternate data streams allow more than one data stream to be associated with a filename. Each stream is identified by a colon : at the end of the filename and an identifier.
You can create them for example by doing:
notepad test.txt:adsname1
notepad test.txt:adsname2
notepad test.txt
Getting the normal Win32 APIs working:
If you wanted the normal API to work, you'd have to intercept the Win32 APIs: LoadLibraryEx, FindResource, LoadResource and LockResource. This is probably not worth the trouble though since you are already creating your own property page.
Can't think of any way to do this short of a shell extension. The approach I've taken in the past is a separate "census" program that knows how to read version information from any kind of file.
Zip files can be converted into exe files by using a program that turns a zip file into a self-extracting zip (I know that WinZip does this, there are most likely free utilities for this also; here's one that came up on a search but I haven't actually tried it). Once you've got an exe, you should be able to use a tool like Resource Hacker to change the version information.
It won't work. Either Windows would have to know every file format or no file format would be disturbed if version information were appended to it.
No, resource section is only expected inside PE (portable executable; exe, dll, sys).
It is more than just putting the data inside the file: there is a table in the file header that points to the data.
What you can do, if you have an NTFS drive, is use an NTFS stream to store custom properties. This way the content of the binary file remains the same, but you will need a custom shell extension to show the content of the stream.

How can I atomically replace a file on a webserver so its latest version is continually available?

I'm working on a project that generates Google Earth KML files and saves the file to a web-accessible directory. It's running on Windows with ActivePerl. (not my preferred platform but it's what I must work with.)
The method I'm using for this is: write to temp.kml, use File::Copy to copy temp.kml to real.kml. This occurs once a second.
Google Earth grabs this real.kml via an apache2 webserver. The problem is, errors get thrown when Google Earth grabs the real.kml at the same time as temp.kml is being copied to real.kml.
I understand that there's a good chance this is unavoidable, but is there any way that I can minimize the frequency of errors thrown?
Instead of copying the file, why not just move it from your temp directory to the web directory once your processing has finished? If your temp directory is on the same filesystem as the web directory, this should result in only the name of the file changing, while the contents remain unchanged. There should be a smaller chance of a race condition.
Use File::Copy's move function to move the file.
