Store parsable backup of all printed documents - Windows

What I'm trying to accomplish is to always keep a parsable duplicate of all printed documents, and execute a secondary process for each print.
(i.e., be able to parse all text and account for pages, vectors, images, etc.)
Processing the document can either be done immediately or deferred (immediately is desirable).
As formats go, any PDL might be suitable; my best guess is that XPS would be the best bet for a parsable format, but recommendations for other formats are appreciated.
Ideally, I'd like not to mess with the user's interaction with printing (e.g., the print settings page), or to create a virtual printer that saves an XPS and then forwards the print job to the physical printer, since users might not be tech-savvy enough to set it up or use it properly, and might break the process at a later date.
What I'm looking for at this time:
Documentation on the print process and flow (WDK, PDL, what else?)
How this could be accomplished, if at all possible; are there any existing solutions?
Any direction on what I should be looking at.

It's only part of an answer, but rumor has it you can tell Windows to keep spooled documents (right-click the printer, choose "Printer Properties", Advanced, "Keep Printed Documents").
You could enable this, and then create a scheduled task (or system service, etc.) that watches the spool directory and moves all files older than a certain threshold to a more appropriate location for further processing. (The age threshold would be a reasonable way to avoid trying to move files that are currently being written.)
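A minimal sketch of such a mover in C#, assuming the default spool directory and a thirty-second age threshold (both are guesses you'd adjust for your environment, and the account running it needs rights to the spool folder):
    using System;
    using System.IO;
    class SpoolArchiver
    {
        static void Main()
        {
            // Default spool location; printers can be configured to spool elsewhere.
            var spoolDir = @"C:\Windows\System32\spool\PRINTERS";
            var archiveDir = @"C:\PrintArchive";  // hypothetical destination
            Directory.CreateDirectory(archiveDir);
            foreach (var file in Directory.GetFiles(spoolDir, "*.spl"))
            {
                // Skip anything modified in the last 30 seconds; the spooler may still be writing it.
                if (DateTime.UtcNow - File.GetLastWriteTimeUtc(file) < TimeSpan.FromSeconds(30))
                    continue;
                File.Move(file, Path.Combine(archiveDir, Path.GetFileName(file)));
            }
        }
    }
Run it from a scheduled task every few minutes and point your further processing at the archive folder.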
Then you'd have to find a program to convert the .spl files to whatever format you like, or try interpreting them yourself. It looks pretty low-level, but Microsoft does offer some documentation about the MS-EMF and MS-EMFSPOOL formats that might be a start.

Related

Is there a way to tell if a file has changed in the Windows API other than opening it or timestamps?

I'm writing a program which needs to look at a very large number of files, some of which are very large in size. I'd like to visit a file only once, unless it changes. If it changes I need to revisit it again.
The only way I know of to do this is with datestamps: look at the modified date to see if it is newer than the last time you looked at the file. Obviously timestamps can be changed programmatically, so I'm wondering if there is a way to determine whether a file has changed other than that. (I'm thinking along the lines of a UUID for the file that changes every time it is modified, or an epoch counter, but I'm open to more exotic solutions.)
You can monitor changes to these files, assuming you continue to run the whole time. Check the FindFirstChangeNotification API. You can take a look at this project as an example. Sysinternals also has a similar tool; I believe it's implemented in a similar way.
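If you're in managed code, the FileSystemWatcher class gives you the same kind of notifications (it wraps the related ReadDirectoryChangesW API). A minimal sketch, with the watched directory as a placeholder:
    using System;
    using System.IO;
    class ChangeMonitor
    {
        static void Main()
        {
            // Watch a directory tree and record which files change,
            // so they can be revisited on the next pass.
            using (var watcher = new FileSystemWatcher(@"C:\data"))
            {
                watcher.IncludeSubdirectories = true;
                watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
                watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
                watcher.EnableRaisingEvents = true;
                Console.ReadLine();  // keep the process alive while monitoring
            }
        }
    }
Note that, as the answer says, this only helps while your process is running; nothing is recorded while it is down.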

Converting text file with spaces between CR & LF

I've never seen this line ending before and I am trying to load the file into a database.
The lines all have a fixed width. After the CSV text which contains the data (the length varies line by line), there is a CR, followed by multiple spaces, ending with an LF; the spaces provide the padding that equalizes the line width. For example, with the trailing bytes shown in hex:
Line1,Data 1,Data 2,Data 3,4,5   [0D 20 20 20 20 20 0A]
Line2,Data 11,Data 21,Data 31,41,51   [0D 20 20 20 0A]
Line3,Data12,Data22,Data 32,42,52   [0D 20 20 20 20 0A]
I am about to handle this with a StreamReader/StreamWriter in C#, but forty of these files come in each month, and if there is a way to convert each file all at once instead of one line at a time, I would rather do that.
Any thoughts?
Line-by-line processing of a stream doesn't have to be a bottleneck if you implement it at the right point in your overall process.
When I've had to do this kind of preprocessing I put a folder watch on the inbound folder, then automatically pick up each file and process it upon arrival, putting the original into an archive folder and writing the processed file into another location from which data will be parsed or loaded into the database. Unless you have unusual real-time requirements, you'll never notice this kind of overhead. If you do have real-time requirements, this issue will pale in comparison to all the other issues you'll face with batched data files :)
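For the conversion itself, here's a rough C# sketch of that preprocessing step, which also covers the "all at once" wish: it reads each whole file, collapses CR + padding spaces + LF down to a plain CRLF, and writes the result out (the folder names are hypothetical):
    using System.IO;
    using System.Text.RegularExpressions;
    class PadStripper
    {
        // CR, one or more padding spaces, LF -> plain CRLF.
        static readonly Regex TrailingPad = new Regex("\r +\n", RegexOptions.Compiled);
        static void Main()
        {
            Directory.CreateDirectory(@"C:\processed");
            Directory.CreateDirectory(@"C:\archive");
            foreach (var path in Directory.GetFiles(@"C:\inbound", "*.csv"))
            {
                var name = Path.GetFileName(path);
                var text = File.ReadAllText(path);
                File.WriteAllText(Path.Combine(@"C:\processed", name),
                                  TrailingPad.Replace(text, "\r\n"));
                File.Move(path, Path.Combine(@"C:\archive", name));  // keep the original
            }
        }
    }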
But you may not even have to go through a preprocessing step at all. You didn't indicate what database you will be using or how you plan to load the data, but many databases do include utilities to process fixed-length records. In the past, fixed-format files came with every imaginable kind of bizarre format (and contained all kinds of stuff that had to be stripped out or converted). As a result those utilities tend to be very efficient at this kind of task. In my experience they can easily be at least an order of magnitude faster than line-by-line processing, which can make a real difference on larger bulk loads.
If your database doesn't have good bulk-import tools, there are a number of open-source or freeware utilities already written that do pretty much exactly what you need; you can find them on GitHub and elsewhere (for example, NPM's replace package and zzzprojects' findandreplace).
For a quick-and-dirty approach that lets you preview all the changes while you develop a more robust solution, many text editors can find and replace across multiple files; I've used that approach successfully in the past. For example, Notepad++'s Find in Files dialog lets you use a regex to remove or change whatever you like in all files matching defined criteria.

Does SQL*Loader have any functionality that allows for customizing the log file?

I have been asked to create a system that allows third-party companies to dump data into several of our tables. These third parties provide CSV files on a periodic basis, and after doing some research it seemed like Oracle itself had a standard tool for the job, "sqlldr". I've since gotten it working to an acceptable degree, and we have a job scheduled to run that script once a day.
But one of the third parties supplies really dirty data, of the sort where I can't expect it to always load every row/record (it looks like up to about 8% will fail). My boss asked me to forward "all output" from the first few tests to him, and like a moron I also sent the log file.
He has asked that this "report" be modified to include those exceptions that aren't unique constraints along with the line in the input file that caused the exception.
This means that I need data from the log file, but also from the (I believe) reject file in a single document. Rather than write a convoluted shell script to combine those two, does SQL*Loader itself allow any customization that might achieve the same thing? I've read through the Oracle documentation and haven't found anything that suggests this, but I've also learned not to trust it entirely either.
Is this possible? Ideally, the solution would allow me to add values to the reject file that don't exist in the original input file, but I'm also interested in any customization of the log file or reject file.
No.
I was going to stop there, but you can define the name of the log file, which might help with the issue. Most automation with SQL*Loader involves wrapping it in shell scripts; aka, "roll your own."
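For what it's worth, the "roll your own" wrapper doesn't have to be a shell script. Here is a rough C# sketch resting on two assumptions worth verifying against your files: that rejections show up in the log as "Record N: Rejected - ..." followed by an ORA-xxxxx line, and that the i-th rejection in the log corresponds to the i-th record in the .bad file (the file names are placeholders):
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    class RejectReport
    {
        static void Main()
        {
            var badRecords = File.ReadAllLines("load.bad");
            var log = File.ReadAllText("load.log");
            // Pull out each rejection: the original record number plus the ORA error.
            var rejects = Regex.Matches(log,
                    @"^Record (\d+): Rejected - [^\r\n]*\r?\n(ORA-\d+:[^\r\n]*)",
                    RegexOptions.Multiline)
                .Cast<Match>()
                .Select((m, i) => new
                {
                    Record = m.Groups[1].Value,
                    Error = m.Groups[2].Value,
                    Input = i < badRecords.Length ? badRecords[i] : "(not in .bad file)"
                })
                // Leave out the unique-constraint rejections (ORA-00001), per the spec.
                .Where(r => !r.Error.StartsWith("ORA-00001"));
            using (var report = new StreamWriter("reject_report.txt"))
                foreach (var r in rejects)
                    report.WriteLine($"Record {r.Record}: {r.Error}\n  Input: {r.Input}");
        }
    }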

AppleScript: system-wide global variable accessible by all scripts

We have a PDF document processing system, implemented in AppleScript (we call the scripts from the shell using osascript). In some of the scripts, we call Acrobat Preflight droplets from the AppleScript.
This usually works without problems. However, in some cases where the processed document is big and/or complex, the droplet returns control to the script before the report is written and the document is moved to the "success" or "failure" folder. The consequence is that the process continues but, without the moved file, eventually fails.
The workaround so far has been to add a delay after those droplet calls. This does help, but it is a waste of time for small documents, and there will always be a document big and complex enough to take longer than the delay.
We also found out that the time needed for finishing writing the report and moving the document depends on the speed of the system (had to be expected…).
A better workaround would be to calculate the delay from the document size, its number of pages, and a machine-dependent parameter. Document size and number of pages are no big deal; they can be retrieved in the AppleScript.
The problem is the machine-dependent parameter, which can be determined experimentally. But how do I make that parameter available to all the scripts needing it?
Incorporating it into the scripts is not an option: we have a number of systems installed, and if we did that, we'd end up in a maintenance nightmare. Passing it as an argument in the initial system call is also not possible, because the calls are many and that again would lead to a maintenance nightmare.
So, is there a way to set up a place where that machine parameter can be stored and easily read from any AppleScript, no matter how it itself is called?
Thanks a lot for your advice.
You might find the Property List Suite in System Events useful. It’s a standard means of storing and then retrieving such information. Property List files themselves are simply XML files, so you can even create them outside of AppleScript and then read them within your scripts.
There’s a description with examples at https://apple.stackexchange.com/questions/58007/how-do-i-pass-variables-values-between-subsequent-applescript-runs-persistent
A simple suggestion, if you only have one parameter to keep track of, would be to just have a text file in a known location on each machine. The only content of the text file would be the machine parameter. I like to use the Application Support folder for this kind of thing.
Assuming your machine parameter is CPU speed, you can save a text file in /Library/Application Support/Preflight Scripts/machinecpu.txt with the contents:
2.4
Then in AppleScript, you would just read the text file:
set machineParam to read file "Macintosh HD:Library:Application Support:Preflight Scripts:machinecpu.txt"

Is it possible to create a file that cannot be copied?

To restrict the scope, let's assume we are in the Windows world only.
Also assume we don't want to play with permission policies.
Is it possible for us to create a file that cannot be copied?
Thank you in advance.
"Trying to make digital files uncopyable is like trying to make water not wet." ~ Bruce Schneier
No. You can't create a file that a SYSADMIN can't copy. You could encrypt it, though.
Well, how about creating a file that uses up more than 50% of the total space on that machine and that is not compressible?
For instance, let us assume that you want to save a boolean (true or false) in such a fashion.
Depending on its value, you could then write a bit stream of ones or zeroes and encrypt said stream using some kind of encryption algorithm, such as AES in CBC mode. The massive redundancy also buys you error tolerance: even in case of heavy data corruption, you should be able to recover your boolean by checking whether ones or zeroes are prevalent in the decrypted stream.
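Leaving the AES step aside, a toy sketch of the redundant write and the majority-vote read could look like this in C# (the file name and the 1 GiB size are arbitrary stand-ins for "more than 50% of the disk"):
    using System;
    using System.IO;
    class RedundantBool
    {
        const long Size = 1L << 30;  // 1 GiB here; most of the disk in the scenario
        static void Main()
        {
            Write(true);
            Console.WriteLine(Read());  // True, even after partial corruption
        }
        static void Write(bool value)
        {
            // All-ones for true, all-zeroes for false.
            var chunk = new byte[1 << 20];
            if (value)
                for (int i = 0; i < chunk.Length; i++) chunk[i] = 0xFF;
            using (var f = File.Create("flag.bin"))
                for (long written = 0; written < Size; written += chunk.Length)
                    f.Write(chunk, 0, chunk.Length);
        }
        static bool Read()
        {
            // Majority vote over all bits: whichever is prevalent wins.
            long ones = 0, total = 0;
            var chunk = new byte[1 << 20];
            using (var f = File.OpenRead("flag.bin"))
            {
                int n;
                while ((n = f.Read(chunk, 0, chunk.Length)) > 0)
                    for (int i = 0; i < n; i++, total += 8)
                        for (int b = chunk[i]; b != 0; b >>= 1)
                            ones += b & 1;
            }
            return ones * 2 > total;
        }
    }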
In that case you cannot copy it around (completely) on the machine...
Of course, any type of external memory that can be added to the system would pose a problem in this scenario. But the file would be already encrypted, so don't worry about it too much...
Any file that can be read can have its contents written to another location (such as another file, i.e. copied).
The only thing you can do is limit who/what can read the file.
What is the motivation behind this? If it is a read-only file, you can have it as an embedded resource within your assembly.
Nice try, RIAA.
But seriously, no, you cannot. It is always possible to copy; you can just make it more difficult for people to make sense of the file, or try to hide it using something like encryption. Spotify does it.
If you really try hard, though, you could make a rootkit for Windows and use it to prevent Windows from even knowing about the file, and also to prevent copies. The file would still be there and copyable by other tools, or by Linux accessing the NTFS volume.
If a running process opens a file and holds an exclusive lock, then other processes cannot read the file until you close the handle or your process terminates. However, an admin could forcibly close the handle.
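In C#, holding such a lock takes only a couple of lines; while this process keeps the handle open with FileShare.None, other processes can't open the file for reading, so they can't copy it:
    using System;
    using System.IO;
    class Locker
    {
        static void Main()
        {
            // FileShare.None refuses all other open attempts while we hold the handle.
            using (var handle = new FileStream("secret.dat", FileMode.Open,
                                               FileAccess.Read, FileShare.None))
            {
                Console.WriteLine("File locked; press Enter to release.");
                Console.ReadLine();
            }
        }
    }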
Short answer: No.
You can, of course, use security settings to limit who can read the file. But if someone can read it, then they can copy it. Even if you found some operating system trick to disable "ordinary" copying, if someone can read the file, they can extract the contents, store it in memory, and then write it somewhere else.
You can encrypt the contents so it's only useful to your own program, that knows how to decrypt it.
That's about it.
When using Windows 7 to copy some files from a hard drive, certain files popped up a message saying they could not be copied in their entirety; certain data would be omitted from the copy. I suspect that had something to do with slack space at the end of the files, though I thought the message was curious. I would have expected the copy operation to just ignore the slack space.
If you are running old (OLD) versions of Windows, there are certain characters you can put in the filename that make it invalid, not listed in folders, etc. They were used a lot in the old pub FTP days of file sharing ;)
In the old DOS days, you used to be able to flag disk sectors as bad and still read from them. This meant the OS ignored the sector in question but your application would know where to look and be able to get the data. Not sure this would work these days.
Another old MS-DOS trick was to put a space character in the middle of the filename (yes, spaces were valid characters for filenames). Since there was no method on the command line to escape a space, the file couldn't be copied using the DOS commands.
This answer is outside Windows, so, yeah.
Don't know if it's already been said, but what about a file that is an inseparable part of the firmware, so that it is always on and running? Perhaps it generates a sequence that is required by the rest of the system, and an incidental effect of its running is to prevent 80% or more of its code from being replicated. Let's say it's on an entirely different board, protected by surge protectors, heavy EM-proof shielding, and anything else required to make it completely unerasable.
If it's possible to make a program that is ALWAYS on and running as long as the copying software is running, then yes.
I have another way, and this IS with Windows: I will come to your house and give you a disk, then proceed to destroy every single computer you put the disk into. This doesn't work on XP.
Well technically you could create and write to a write-only network share.
