I have a VBScript that processes text files on a server; however, very occasionally and without any apparent reason, the script gets stuck in a loop.
Files are uploaded by clients via FTP (smallish, about 2-10 KB). A .NET service watching the FTP folder kicks off an executable (written in VB6), which moves the file from the FTP folder to a folder where it is processed. At this point the exe kicks off the VBScript. 98% of the time the script runs without issue.
When the infinite loop occurs I kill the exe process. The odd thing is that when I re-kick off the process manually (by copying the exact same file back to the folder that's being watched), it goes through without issue.
In short, the script opens the file into a text stream, then loops through the TextStream object until its AtEndOfStream property is true. Within the loop it creates a new file with some additional information added. Whenever the infinite-loop situation occurs, the temp file doesn't contain any of the source-file data, just the extra data added by the script. For example, the following code works 98% of the time:
Do While ts.AtEndOfStream <> True
    sOriginalRow = ts.ReadLine
    sUpdatedRow = sOriginalRow & ",Extra_Data"
    NewFile.WriteLine sUpdatedRow
Loop
So if the source file contains:
LineA
LineB
LineC
The new file is created containing:
LineA,Extra_Data
LineB,Extra_Data
LineC,Extra_Data
But when the problem occurs, the new file is instantly populated with thousands of rows of
,Extra_Data,Extra_Data,Extra_Data,Extra_Data,Extra_Data...
It's as though the AtEndOfStream property never becomes true.
My initial thought was that the source file was getting corrupted somehow, but when these source files are reprocessed they are fine. Another thought is that the TextStream object is being created incorrectly and not picking up the newline character, maybe because the file is locked by another process or something.
The only way I have found to replicate the behaviour is to comment out the ReadLine part of the code, which in effect prevents the current line from ever advancing, e.g.:
sOriginalRow = "" 'ts.ReadLine
Can anyone offer any suggestions?
Well, as you say, the only things I can offer are suggestions, not real answers:
Maybe your stream bumps into an ObjectDisposedException (not catchable in VBScript), so the AtEndOfStream condition ends up neither True nor False.
For the sake of the experiment, you could try changing Do While ts.AtEndOfStream <> True to Do While Not ts.AtEndOfStream or Do Until ts.AtEndOfStream, but my bio-logical circuits tell me that is probably not going to work.
There is another problem described where stdIn and stdErr conflict with each other, causing a hang in particular situations.
Could you also please answer the comment from Tmdean? On Error Resume Next could create a real mess: if ts.AtEndOfStream returns Null or garbage (because ts was destroyed, for example), it will never become True and will cause a loop in Foreverland.
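A cheap insurance policy, whatever the root cause turns out to be, is a watchdog counter inside the loop, so a stuck run aborts loudly instead of spinning forever. Here is a minimal sketch of that guard in Python (the cap of 1000000 lines is an arbitrary assumption, far above any legitimate 2-10 KB upload); the same pattern ports straight back to the VBScript loop:

max_lines = 1000000   # assumed safety bound, not from the original script
lines_done = 0
with open("source.txt") as src, open("temp.txt", "w") as out:
    for row in src:
        lines_done += 1
        if lines_done > max_lines:
            raise RuntimeError("runaway loop: end of stream never reached?")
        out.write(row.rstrip("\n") + ",Extra_Data\n")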
I have a Python program which performs a simple operation on a file:
with open(self.cache_filename_url, "a", encoding="utf8") as f:
    w = csv.writer(f, delimiter=',', quotechar='"', lineterminator='\n')
    w.writerow([cache_url, rpd_products])
As you can see it just opens the file and appends a CSV line to it. It does this a lot, in a loop.
I accidentally ran two copies of this program simultaneously, so I think they would have been appending to the file simultaneously. I am trying to determine the worst-case scenario for file corruption.
Do you think the writes would at least be atomic operations in this case? For example this wouldn't be a problem for me:
old line
old line
new line written by instance 1
new line written by instance 2
new line written by one
This would be a problem for me:
old line
old line
[half of new line written by instance 1] [half of new line by instance 2]
etc
To put it another way, is it possible for the two append operations to "interfere" with each other?
EDIT: I am using Windows 7
Opening the same file multiple times in shared write mode can definitely be problematic. And if they don't open in shared mode, one of them will throw an exception because it cannot open the file.
If SHARED mode:
Both instances will have their own internal pointer. In most cases, they will probably write independently. You could get:
Process A opens file, sets pointer to end (byte 1024)
Process B opens file, sets pointer to end (byte 1024)
Process B writes at byte 1024 and closes file
Process A writes at byte 1024 and closes file.
Both processes will have written to the file at the same location. You've basically lost the record from Process B; and depending on how the close works (whether it truncates), if the lines written are different lengths, you could be left with part of Process B's line if it was the longer one.
If it is in EXCLUSIVE mode, one process will fail to open the file, and whatever exception handling you have will kick in.
Which mode you are in can be system dependent, as Python doesn't seem to provide any mechanisms for controlling the share mode.
Update: I ran a check on my file, and I did indeed have corrupted partial lines (the case under "This would be a problem for me" in my question).
It's unfortunate, especially since it implies you could have problems even when you intend to share a file between two processes.
I am still interested in any pointers on how to avoid this outcome. I will hold off on marking an answer as accepted for now. (The other answer is good, but doesn't provide enough details on these modes or how to determine which will be used.)
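Not from the answers above, but one way to avoid torn lines on Windows is to take a byte-range lock around every append, so the two instances serialize. A minimal sketch using the standard msvcrt module (locking byte 0 as a crude mutex is my own convention; it only helps if both instances go through the same helper):

import csv
import msvcrt

def append_row(path, fields):
    with open(path, "a", encoding="utf8", newline="") as f:
        f.seek(0)
        # LK_LOCK blocks (retrying for about 10 seconds) until the other instance unlocks
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)
        try:
            w = csv.writer(f, delimiter=',', quotechar='"', lineterminator='\n')
            w.writerow(fields)
            f.flush()   # push the whole row out while we still hold the lock
        finally:
            f.seek(0)
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)

With both copies funneling their writerow calls through something like this, a row can no longer be cut in half, though the ordering of rows between the two processes remains arbitrary.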
I would like to create an application that monitors which processes are started. I want to use it to track how often I open my mp3 files. The program would run in the background and count the number of times I run each mp3 file, and later I would sort my mp3 files based on this number.
I used Process Monitor to check if it can be done. When I filter the output so that it shows only my mp3 player's processes, and set it to show only the "Process Start" operation, I can read which file was opened from the command-line detail that Process Monitor shows for files opened in real time. After each file run there is a new entry in Process Monitor.
As you can see, I could easily count the Process Monitor entries to get the number of times each file was started. However, I don't know how this is done internally, because Process Monitor is an .exe and I can't see inside its code.
What is the easiest way to solve my problem? In fact, the programming language I use doesn't matter; it can be C#, C++, or Python.
Thank you in advance.
I have come up with something on my own. It should be enough to use the psutil library in Python 2.7. My process name will be constant, because I always use the same program to run my mp3/video files.
Find the PID of my process.
When I have the PID, I can check which files are currently opened by the process.
Always remember the last opened file. If the current file is different from the last one, increment the current file's open count by 1. That's how I can build a list of opened files ranked by opening frequency.
Use a while loop to check this in the background.
Here's a small amount of code, but in general the solution is quite simple :)
import psutil

for pid in psutil.pids():
    # if the process has already exited (or access is denied), the .name() call
    # will raise, so we need to try-catch it
    try:
        temp_process = psutil.Process(pid)
        if temp_process.name() == "mpc-hc64.exe":
            print temp_process.cmdline()     # in my case the opened file is listed in the cmdline, may not always be true
            print temp_process.open_files()  # this however should ALWAYS return files that are opened by the current process!!
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        print "no process name"
I do some Word automation, filling in the blanks in Word documents which are used as templates.
One template is used more often than the others, and this is what causes the error: the file gets locked, and Word is unable to open it, even though I want to open it read-only.
Opening the document
do until lole_word.Documents.Count = 0
    lole_word.Documents[1].Close(lole_word.SaveOptions.wdDoNotSaveChanges)
loop
boolean lb_readOnly
lb_readonly = true
lole_word.Documents.Open(as_fileIn, lb_readOnly)
The problem is that the template document opens once with no flaws of any kind. But when the same template has to be reused, although lole_word.Documents.Count always returns 0, Word finds the previously used template locked and finally shows a dialog asking whether I want to open it in read-only mode.
I wish to avoid this annoyance and simply open the file in read-only mode, as it will be saved elsewhere once it is filled in.
My problem is that even though I specify read-only mode by setting the second parameter to true, Word doesn't seem to see it that way and still pops up its "File Already in Use by Another User" dialog; my application then loses control over Word and crashes.
We had a similar problem, and I wish I could remember how we solved it. We may have used the Quit command. I know that we also did an attempted file open in exclusive mode (with no intention of using the file) and immediately closed it. If we got a file-locked return code, we prompted the user to close Excel first, because there were times they would have the program open outside of OLE. I know this isn't exactly what you were looking for, but I hope it leads you somewhere. I recall this being an intermittent problem, and in some cases users had to open Task Manager and kill the extraneous Excel process.
I vaguely remember the locking being caused by the file system and not Word, as we were opening in read only as well.
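For what it's worth, here is a sketch of that probe-then-open idea in Python with pywin32 (the template path is hypothetical, and suppressing Word's alerts via DisplayAlerts is my assumption about an acceptable workaround, not part of the original setup):

import win32com.client

def is_locked(path):
    # the "attempted exclusive open" trick: ask for write access, which the
    # sharing mode of whoever currently holds the file should refuse
    try:
        with open(path, "r+b"):
            return False
    except OSError:
        return True

template = r"C:\templates\invoice.dot"   # hypothetical path
word = win32com.client.Dispatch("Word.Application")
word.DisplayAlerts = 0                   # wdAlertsNone: suppress the "File in Use" dialog

if is_locked(template):
    print("Template still held by another process; close it (or kill WINWORD.EXE) first.")
else:
    doc = word.Documents.Open(template, ReadOnly=True)
    # ... fill in the blanks, then save the result elsewhere ...
    doc.Close(SaveChanges=0)             # wdDoNotSaveChanges
    word.Quit()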
I have been facing this problem for some time now; it's a laziness caused in part by the fact that Microsoft Office automatically saves the files you are working on, with versions and automatic recovery.
Many times when I start a new notebook in Mathematica to do some tests or whatever, I forget to save what I am doing.
Every now and then, depending on the computer I am using, the computer crashes and all the beautiful work I was doing is lost forever...
Is there a way to get around this other than manically saving my files every five minutes? How about file versioning?
BTW: Using MMA V8
Regarding autosaving, you may want to check out the NotebookAutoSave option, which can be set to True through Format -> Option Inspector. You have to choose "Selected notebook", then go to Notebook Options -> File Options, and set NotebookAutoSave to True. Then your notebook will be saved after every evaluation. Whether or not this is a satisfactory solution of course depends on the situation.
But my experience is that the most reliable way is to develop a Ctrl+S reflex - this one never lets me down and works quite well.
As for the versioning, it is much easier with packages, for which you can use WorkBench, which has integrated support for CVS and support for SVN via an Eclipse plugin. For notebooks, I refer you to this SO thread. You may also find this Mathgroup discussion of some interest.
EDIT
For M8, for auto-saving purposes you can probably also run
RunScheduledTask[NotebookSave[EvaluationNotebook[]],{300}]
But I cannot test this code at the moment.
EDIT2
I just came across this post in the Toolbag repository - which may also be an alternative for the autosave part of the question (but please see also the discussion in comments on the relative advantages of scheduled tasks vs. Dynamic)
Since you have MMA version 8 you could use:
saveTask = CreateScheduledTask[FrontEndExecute[FrontEndToken["Save"]], 5*60];
StartScheduledTask[saveTask];
to save every 5 minutes (change the term 5*60 for other timings).
To remove the auto-save task use:
RemoveScheduledTask[saveTask];
To save only a fixed, specific notebook, store its handle in nb (finding it using Notebooks, SelectedNotebook, InputNotebook or EvaluationNotebook) and use FrontEndToken[nb,"Save"] instead of just FrontEndToken["Save"]
I have a Mathematica package that provides auto-backup functionality. When enabled, the current notebook--call it "blah.nb"--will be backed up to "blah.nb~" after a configurable amount of time has elapsed. I use it constantly and it has saved me from losing work many, many times. It's better than autosaving since it doesn't touch the actual notebook file: if you screw something up or something gets corrupted you don't want to overwrite your main file. :)
It's on GitHub here.
I've got an autosave routine that saves a copy of every open, modified notebook every 5 minutes (or whatever interval you prefer). It leaves your manually-saved copy alone, and saves a "swap file" in a separate directory that can be easily recovered if need be. The code (to be copied to init.m) is given in this answer: https://mathematica.stackexchange.com/questions/18380/automatic-recovery-after-crash/65852#65852, and copied below:
Motivated by the same concerns, I wrote the following code and added it to my init.m file. There are two main entries you'll want to change to use this. The global variable $SwapDirectory is where the swap files are saved (by swap file, I mean it in the Vim sense: an "extra" copy of your notebook, separate from your manually saved copy, that periodically captures any new work). The swap files are organized within the swap directory in a directory structure which "mirrors" their original file locations, and have ".swp" appended to their file names. The other variable you might want to change is the number of seconds between autosaves, indicated by the "300" (corresponding to 5 minutes) near the bottom of the code below. At the appropriate times, this code will (automatically, in the background) save swap files for ALL open notebooks, unless they are unmodified from their manually-saved versions (this exception makes the code more efficient and, more importantly, prevents the storage of swap files for documentation notebooks, for example).
In its current form, the code does not filter for only the input cells, but hopefully you can use the other answers to make that modification yourself.
Some things to note:
1) the Mathematica Put command seems to have trouble writing to network drives, even when offline access is enabled. Therefore, it is probably best to choose a SwapDirectory that is on your local machine.
2) Within SwapDirectory, you should create a sub-directory called "Recovery". This is where the AutoSaveSwap routine will make an initial save of any notebooks for which there is NO existing manual save location.
3) Simply evaluate
RecoverSwap["filePath"]
where "filePath" is a string representing the filePath of the MANUALLY-SAVED copy of the file (i.e., not the file that was created by AutoSave). This will then pop up a window containing the most recent auto-saved version of the file. The manually saved version is NEVER overwritten, unless you explicitly choose to do so. Once the recovered version pops up, you can save it whereever you like, or discard it at your discretion.
4) You should probably add this code to the KERNEL version of init.m ($UserBaseDirectory/Kernel/init.m) rather than the frontend version... this way, if you quit and restart the kernel, the autosave feature will also restart. On the other hand, this means that you must evaluate at least one expression after each start or restart to begin auto-saving. Once this initial evaluation is done, you do NOT need to have evaluated a cell for it to be backed up (unlike the built-in autosave utility).
Hope this helps someone! Feel free to respond with any questions, suggestions, or requests for improvement you may have. And if you find this post useful, upvotes would be most appreciated! Take care.
$SwapDirectory= "C:\\Users\\pacoj\\Swap Files\\";
SaveSwap[nb_NotebookObject]:=Module[
{fileName, swapFileName, nbout, nbdir, nbdirout, recoveryDir},
If[ ! SameQ[Quiet[NotebookFileName[nb]], $Failed],
(* if the notebook is already saved to the file system *)
fileName = Last[ FileNameSplit[ NotebookFileName[nb]] ];
swapFileName = fileName <> ".swp";
nbdir = Rest[FileNameSplit # NotebookDirectory[nb]];
nbdirout= FileNameJoin[ FileNameSplit[$SwapDirectory]~Join~nbdir]<>"\\";
If[!DirectoryQ[nbdirout], CreateDirectory[nbdirout]];
nbout = NotebookGet[nb];
Put[nbout, nbdirout <> swapFileName],
(* else, if the file has never been saved, save as untitled *)
recoveryDir= $SwapDirectory <> "Recovery\\\";
fileName= ("WindowTitle" /. NotebookInformation[nb])<>".nb";
NotebookSave[nb, recoveryDir <> fileName]
]
];
RecoverSwap::noswp= "swap file `1` not found in expected location";
RecoverSwap[nbfilename_String] := Module[
    {fileName, swapFileName, nbin, nbdir, nbdirout},
    fileName = Last[ FileNameSplit[ nbfilename] ];
    swapFileName = fileName <> ".swp";
    nbdir = Most[ Rest[FileNameSplit @ nbfilename] ];
    nbdirout = FileNameJoin[ FileNameSplit[$SwapDirectory]~Join~nbdir] <> "\\";
    If[ FileNames[swapFileName, {nbdirout}] == {},
        Message[RecoverSwap::noswp, nbdirout <> swapFileName]; Return[],
        nbin = Get[nbdirout <> swapFileName]; NotebookPut[nbin]
    ]
];
AutoSaveSwaps = CreateScheduledTask[
    SaveSwap /@ Select[Notebooks[], "ModifiedInMemory" /. NotebookInformation[#]&],
    300
]
StartScheduledTask[AutoSaveSwaps]
Please note this is not a duplicate of File r/w locking and unlink. (The difference is the platform; operations on files like locking and deletion have totally different semantics, thus the solution would be different.)
I have the following problem. I want to create a file-system-based session storage where each session's data is stored in a simple file named with the session id.
I want the following API: write(sid,data,timeout), read(sid,data,timeout), remove(sid), where sid == file name. I also want to have some kind of GC that can remove all timed-out sessions.
Quite a simple task if you work with a single process, but absolutely non-trivial when working with multiple processes or even over shared folders.
The simplest solution I thought of was:
write/read:
handle = CreateFile
LockFile(handle)
read/write data
UnlockFile(handle)
CloseHandle(handle)
GC (for each file in directory):
handle = CreateFile
LockFile(handle)
check if timeout occurred
DeleteFile
UnlockFile(handle)
CloseHandle(handle)
But AFAIK I can't call DeleteFile on an opened, locked file (unlike on Unix, where file locking is not mandatory and unlink is allowed for opened files).
But if I put DeleteFile outside of the locking block, a bad scenario may happen:
GC - CreateFile/LockFile/Unlock/CloseHandle,
write - CreateFile/LockFile/WriteUpdatedData/Unlock/CloseHandle
GC - DeleteFile
Does anybody have an idea how such an issue may be solved? Are there any tricks that allow combining file locking and file removal, or making operations on a file atomic (Win32)?
Notes:
I don't want to use a database.
I am looking for a solution using the Win32 API for NT 5.01 and above.
Thanks.
I don't really understand how this is supposed to work. However, deleting a file that's opened by another process is possible. The process that creates the file has to use the FILE_SHARE_DELETE flag for the dwShareMode argument of CreateFile(). A subsequent DeleteFile() call will succeed. The file doesn't actually get removed from the file system until the last handle on it is closed.
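A minimal ctypes sketch of that sequence (the session-file path is illustrative; the constants are the documented Win32 values):

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                                 wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                                 wintypes.HANDLE]
kernel32.DeleteFileW.argtypes = [wintypes.LPCWSTR]
kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

GENERIC_READ      = 0x80000000
GENERIC_WRITE     = 0x40000000
FILE_SHARE_READ   = 0x00000001
FILE_SHARE_WRITE  = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING     = 3
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

path = u"C:\\sessions\\abc123.sess"   # hypothetical session file
handle = kernel32.CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                              None, OPEN_EXISTING, 0, None)

if handle != INVALID_HANDLE_VALUE:
    # DeleteFileW succeeds because the open handle granted FILE_SHARE_DELETE;
    # the file only disappears from the namespace when the last handle closes
    kernel32.DeleteFileW(path)
    kernel32.CloseHandle(handle)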
You currently have data in the record that allows the GC to determine if the record is timed out. How about extending that housekeeping info with a "TooLateWeAlreadyTimedItOut" flag?
GC sets TooLateWeAlreadyTimedItOut = true
Release lock
<== writer comes in here, sees the "TooLate" flag and so does not write
GC deletes
In other words, we're using a kind of optimistic locking approach. This does require some additional complexity in the writer, but now you're not dependent upon any OS-specific wrinkles.
I'm not clear what happens in the case:
GC checks timeout
GC deletes
Writer attempts write, and finds no file ...
Whatever you have planned for this case can also be used in the "TooLate" case
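A sketch of what the writer's half of that optimistic check might look like in Python (the one-byte tombstone flag at offset 0 and the msvcrt byte-range lock around it are my own illustration of the idea, not a prescribed design):

import msvcrt

TOO_LATE = b"\x01"   # hypothetical tombstone value in the file's first byte

def try_update(path, record):
    with open(path, "r+b") as f:
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)   # lock the flag byte
        try:
            if f.read(1) == TOO_LATE:
                return False   # GC got here first: do not write
            f.seek(1)
            f.write(record)    # session still live: update the payload
            f.flush()          # flush while we still hold the lock
            return True
        finally:
            f.seek(0)
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)

The GC takes the same lock, sets the flag, releases, and only then deletes; a writer that gets False back falls through to whatever it already does when the file is missing.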
Edited to add:
You have said that it's valid for this sequence to occur:
GC Deletes
(Very slightly later) Writer attempts a write, sees no file, creates a new one
The writer can treat the "TooLate" flag as identical to this case. It just creates a new file with a different name, using a version number as a trailing part of its name. Opening a session file the first time requires a directory search, but then you can stash the latest name in the session.
This does assume that there can only be one Writer thread for a given session, or that we can mediate between two Writer threads creating the file, but that must be true for your simple GC/Writer case to work.
For Windows, you can use the FILE_FLAG_DELETE_ON_CLOSE option to CreateFile - that will cause the file to be deleted when you close the handle. But I'm not sure that this satisfies your semantics (because I don't believe you can clear the delete-on-close attribute).
Here's another thought: what about renaming the file before you delete it? You simply can't close the window where a write comes in after you've decided to delete the file, but what if you rename the file before deleting it? Then when the write comes in, it'll see that the session file doesn't exist and will recreate it.
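A small sketch of that rename-then-delete dance (the ".deleting" suffix is a name I made up; on NTFS a rename within one volume is atomic, and it fails if a writer currently holds the file open, which conveniently makes the GC back off):

import os

def gc_remove(path):
    doomed = path + ".deleting"   # hypothetical name that writers never open
    try:
        os.rename(path, doomed)   # atomic on the same NTFS volume
    except OSError:
        return   # held open or already re-created: skip this GC pass
    os.remove(doomed)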
The key thing to keep in mind is that you simply can't close the window in question. IMHO there are two solutions:
Adding a flag like djna mentioned or
Require that a per-session named mutex be acquired which has the unfortunate side effect of serializing writes on the session.
What is the downside of having a TooLate flag? In other words, what goes wrong if you delete the file prematurely? After all, your system has to deal with the file not being present...