Can I parse a PDF with PowerShell, using no extra libraries?

I would like to parse a PDF with a Windows PowerShell script. Is it possible to do this without any open source libraries, though? My workplace views third-party libraries as a security risk.
The PDFs follow an expected text format, and I need to extract two numbers from them to be used later.
Sample pdf:
Obviously, the Device ID and Agreement number are crossed off for security, but those are the two strings I care about.
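One library-free approach that sometimes works for simple, text-based PDFs is to inflate the content streams and read the string operands of the Tj (show text) operator. Below is a minimal sketch of that idea in Python, using only the standard library; the same steps could in principle be ported to PowerShell with .NET's built-in classes. The function names and the "label followed by digits" layout are assumptions for illustration. PDFs that use hex strings, custom font encodings, TJ arrays, or object streams will need more work.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" in a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string operand followed by the Tj (show text) operator.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text_chunks(pdf_bytes):
    """Return the literal strings shown with Tj in every stream of the file."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams are zlib-compressed (/FlateDecode).
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: treat the stream as uncompressed.
            data = raw
        for text in TJ_RE.findall(data):
            chunks.append(text.decode("latin-1"))
    return chunks


def find_number(text, label):
    """Return the first run of digits that follows `label`, or None."""
    found = re.search(re.escape(label) + r"\D*(\d+)", text)
    return found.group(1) if found else None
```

Typical use would be to read the file with open(path, "rb").read(), join the chunks with spaces, and call find_number(text, "Device ID").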

Related

Applescript to grab text from a text file and highlight said text in Skim. Two failed attempts?

PDF's ubiquity, especially in academia, makes being able to highlight PDFs (and save these annotations) extremely important. Some academic journals (especially in law) allow the user to send journal articles to a Kindle reader, which makes reading and taking notes extremely easy. The question is how to transfer the underlined text from the MyClippings.txt file to the PDF. About a year ago I found that this is possible through an action in Adobe Acrobat Pro X which parses a text file that is fed to it and highlights the relevant sections. The action takes advantage of the Search and Redact tool but instead of redacting, it highlights.
However, I would like to get out of the Adobe environment (for different reasons, one of which is that the Adobe reader is resource-hungry and not free). Skim on Mac OS X sounds like a good alternative because of its support for AppleScript integration. I found two projects on GitHub which attempt to do this, but I failed with both of them.
my-clippings-to-pdf
Skim-AppleScript
Would anyone with knowledge of AppleScript take a look at that code and tell me whether it looks sound? It seems this would be great functionality for integrating PDFs and ePub in a useful and meaningful way, especially for academics.

Protecting/Encrypting Software Data

I've tried to find the answer for days from many sources, but unfortunately have not found a solution.
The problem is how to prevent users from accessing software data (videos, images, etc.). For example, I have a piece of software or a mobile application, and it has some folders that contain videos. I don't want users to access these files directly and copy them.
In addition, since these files are big, any conversion of a file takes a long time, which slows down the application. I think encrypting the whole file would take too long.
I'm asking my question independently of any environment. It could be a Windows or Android application. Is there any method or technique to achieve this?
Edit: If there is a way to decode/encode the files quickly, that would help me. Or some kind of password protection solution...
Sorry for my English.
Short version: Not really. All you can do is obfuscate the files, and (on mobile) you have limited resources (CPU, RAM) with which to do this.
No matter what you do to the files, your program must contain all the information required to decode them or they will not be usable. Ergo, the determined attacker will be able to get the files.
If all you are trying to do is keep out the casual person, then you probably don't need to do anything - extracting files from within mobile applications is generally beyond the normal user.
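To make the "obfuscation only" point concrete, here is a minimal Python sketch that scrambles just the first few kilobytes of a file, which is cheap even for large videos and is enough to stop most players from opening it. The key, the header length, and the function name are hypothetical. Because the key has to ship inside the application, a determined attacker can recover it; this keeps out casual copying only.

```python
# Hypothetical values: the key ships inside the app, so this is obfuscation, not encryption.
KEY = b"not-a-real-secret"
HEADER_BYTES = 4096


def toggle_header(path, key=KEY, length=HEADER_BYTES):
    """XOR the first `length` bytes of the file in place.

    XOR is its own inverse, so calling this twice restores the file.
    """
    with open(path, "r+b") as f:
        header = f.read(length)
        scrambled = bytes(b ^ key[i % len(key)] for i, b in enumerate(header))
        f.seek(0)
        f.write(scrambled)
```

Only the header is rewritten, so the cost does not grow with file size. An application would more likely undo the XOR in memory while reading than rewrite the file on disk.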

Pretty print code to PDF

I'm searching for a tool that will take a source directory and produce a single PDF containing the source code, preferably with syntax highlighting.
I would like to read the PDF on my phone, in order to get familiar with a code-base, or just to see what I can learn by reading a lot of code. I will most often be reading Ruby.
I would prefer if the tool ran on Linux. I don't mind paying for a tool if it is particularly good.
Any suggestions?
You could whip something up yourself with Prawn and Ultraviolet.
PDF is no good for reflowing. You might like an HTML-based solution better.
And for reading existing code, a linear model is no good. You need to jump from one file to another. A hypertext model with history would probably work best on the limited screen real estate of a phone. It should borrow some features of the Smalltalk IDEs (jump to senders, implementors).
For the UI, take a look at clamato
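As a rough illustration of the HTML route, the following Python sketch walks a source directory and writes every matching file into a single page with a linked index. It uses only the standard library, so there is no syntax highlighting; the function name and the default "*.rb" pattern are assumptions for illustration.

```python
import html
from pathlib import Path


def bundle_sources(root, pattern="*.rb"):
    """Return one HTML page holding every matching file under `root`, with an index."""
    root = Path(root)
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    index, sections = [], []
    for number, path in enumerate(files):
        name = html.escape(path.relative_to(root).as_posix())
        code = html.escape(path.read_text(encoding="utf-8", errors="replace"))
        index.append(f'<li><a href="#f{number}">{name}</a></li>')
        sections.append(
            f'<h2 id="f{number}">{name}</h2>\n<pre>{code}</pre>\n'
            '<p><a href="#top">back to index</a></p>'
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        "<style>pre { white-space: pre-wrap; }</style></head>\n"
        '<body id="top">\n<ul>\n' + "\n".join(index) + "\n</ul>\n"
        + "\n".join(sections) + "\n</body></html>\n"
    )
```

The pre-wrap style lets long lines reflow on a narrow screen, which is the main advantage over a fixed-layout PDF.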
GNU source-highlight supports many languages and can output, among other formats, LaTeX, which can then be converted to PDF.
The SciTE editor can export the currently edited file (with syntax highlighting) to PDF (and HTML, RTF, LaTeX and XML).
Alas, it doesn't have batch conversion capability, but IIRC somebody made a batch tool out of this code base.
I realize this is very late, but I wanted to do the same thing, except I wanted it for my tablet, which is a Galaxy Note 10.1 with a Wacom digitizer that I can use to annotate code. I found that one good solution is to use Doxygen to generate a PDF, which will have hyperlinks and everything you would want in a PDF. For my use case, I would pair it with EzPDF on Android to annotate the code. This was also for the purpose of learning a new codebase. In the end I did not use the generated PDF, but it was pretty usable.

Getting the path & filename of the open document in any Windows application

Goal
Let me start with my final vision of what I'd like to be able to do first: In Windows, I'd like to be able to use a global keyboard shortcut that I define (say, Ctrl+Alt+C) to copy the full path and filename of the open document in the foreground application to the clipboard.
This would be useful to, for example, be able to subsequently paste the path & filename into an "Open File" dialog in an email client to attach that document to an email, without having to manually browse to the target document in the filesystem.
Specific Question
Now, the specific part I'm interested in implementing is: how can I get the path and filename of the current "open document" of any arbitrary currently-running Windows application? (If this can't be done with any Windows application, then the next best thing would be for this to work with as many applications as possible.)
Obviously, this wouldn't apply to some applications that don't necessarily have the concept of a "currently open document" that corresponds to a file on the local filesystem, such as an email client, an IM client, or (usually) a web browser.
Application-Specific Solutions
I'm aware that it's possible to write application-specific solutions to do this. For example, the following MS Word VBA subroutine will copy the filename and path of the open document in Word to the clipboard:
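' Requires a reference to the Microsoft Forms 2.0 Object Library (for DataObject).
Sub CopyDocumentPathToClipboard()
    Dim myDataObject As DataObject
    Set myDataObject = New DataObject
    myDataObject.SetText ActiveDocument.FullName
    myDataObject.PutInClipboard
End Sub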
However, what I really want is something that will work for any of the applications on my system (or, again, for as many of them as reasonably possible) without having to try and write an application-specific solution for each one.
Idea: Recent Documents Folder
One idea: Could the Recent Documents folder (and/or its underlying Windows APIs) somehow be leveraged to help with this? It seems to have information about the same concept of "open documents" that I'm interested in here, that apparently applies across various application types. (Looking at the contents of the Recent Documents folder on my machine, I see entries in there that were apparently made for documents that I opened with various applications including MS Word, MS Excel, Eclipse, Adobe Acrobat Reader, Paint.NET, TOAD, and Notepad2.)
Preferred Solution Language
I'd prefer solutions in C# or C++ code, but I'm open to any suggestions for how to go about doing this, regardless of implementation language!
Windows 7?
Update 11/2009: Now that Windows 7 is widely available, I figured it might be worth coming back to this question and asking: Does Windows 7 provide any new APIs, or any other mechanism, that would help with what I'm trying to accomplish here?
The best you could probably do is look at the recent documents registry keys, and get the list of most recent documents. Some sample code for working with this data is in this CodeProject article. The list is stored under:
HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs
However, this isn't going to show you whether a document is currently open or not. You could potentially check the title of all open applications, since many applications put document names in their window titles, but this is not a requirement, and many applications do not do that.
There is no mandatory mechanism for an application to specify its open document, so this is not generically possible.
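The window-title idea can be sketched as a heuristic. The Python function below assumes the foreground window's title has already been obtained elsewhere (for example via the Win32 GetForegroundWindow and GetWindowText calls) and tries to pick out the part that looks like a file name or path. The function name and the patterns are illustrative assumptions, not a general solution.

```python
import re

# A token that looks like a file name: no path-forbidden characters, then a short extension.
FILENAME_RE = re.compile(r'[^\\/:*?"<>|]+\.[A-Za-z0-9]{1,5}')
# A token that starts like an absolute Windows path, e.g. C:\...
DRIVE_RE = re.compile(r"[A-Za-z]:\\")


def guess_document(title):
    """Return the part of a window title that looks like a file name or path, else None."""
    # Many applications format titles as "document - Application".
    for part in re.split(r"\s+-\s+", title):
        # Some editors prefix unsaved documents with "*".
        part = part.strip().lstrip("*").strip()
        if DRIVE_RE.match(part) or FILENAME_RE.fullmatch(part):
            return part
    return None
```

As the answer says, this cannot be relied on: many titles contain only a bare file name without its path, some contain no document name at all, and an application name such as Paint.NET would itself be mistaken for a file name.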

Windows Help files - what are the options?

Back in the old days, Help was not trivial but possible: generate some funky .rtf file with special tags, run it through a compiler, and you got a WinHelp file (.hlp) that actually works really well.
Then, Microsoft decided that WinHelp was not hip and cool anymore and switched to CHM, to the point that they actually axed WinHelp from Vista.
Now, CHM may be nice, but everyone who has tried to open a .chm file on the network will know the nice "Navigation to the webpage was canceled" screen that is caused by security restrictions.
While there are ways to make CHM work off the network, this is hardly a good choice, because when users press the Help button they want help, not to have to change some funky settings.
Bottom line: I find CHM absolutely unusable. But with WinHelp not being an option anymore either, I wonder what the alternatives are, especially when it comes to integrating with my application (i.e. for WinHelp and CHM there are functions that allow you to directly jump to a topic)?
PDF has the disadvantage of requiring the Adobe Reader (or one of the more lightweight ones that not many people use). I could live with that seeing as this is kind of standard nowadays, but can you tell it reliably to jump to a given page/anchor?
HTML files seem to be the best choice, you then just have to deal with different browsers (CSS and stuff).
Edit: I am looking to create my own help files. As I am a fan of the "No Setup, Just Extract and Run" philosophy, I have had that problem many times in the past, because many of my users run the application off the network, which causes exactly this problem.
So I am looking for a more robust and future-proof way to provide help to my users without having to code a different help system for each application I make.
CHM is a really nice format, but the security restrictions make it unusable, as a help system is supposed to provide help to the user, not to generate even more problems.
HTML would be the next best choice, ONLY IF you would serve them from a public web server. If you tried to bundle it with your app, all the files (and images (and stylesheets (and ...) ) ) would make CHM look like a gift from the gods.
That said, when actually bundled in the installation package, (instead of being served over the network), I found the CHM files to work nicely.
OTOH, another pitfall about CHM files: Even if you try to open a CHM file on a local disk, you may bump into the security block if you initially downloaded it from somewhere, because the file could be marked as "came from external source" when it was obtained.
I don't like the HTML option, and actually moved from plain HTML to CHM by compressing and indexing the files. We even use them for a handful of non-Windows customers.
It simply solved the constant little breakage of people putting it on the network (nesting depth limited, strange locking effects), antivirus that died in directories with 30000 html files, and 20 minutes decompression time while installing on an older system, browser safety zones and features, miscalculations of needed space in the installer etc.
And that doesn't even include the people who start "correcting" them, third-party products with faulty "integration" attempts, complaints about slowness (browser start-up), etc.
We had all waited years for the problems to go away as OSes and hardware improved, but the problems kept recurring in a bewildering number of varieties, and enough was enough. We found chmlib and decided that, if the OS-provided readers ever stopped working, we could always fall back on a simple external reader based on it, so we switched.
Meanwhile we also have our own compiler, so we are MS-free and future-proof. That doesn't mean we will never change (solutions with local web servers seem to be the favourite nowadays), but at least we have a choice.
Our software is both distributed locally to the clients and served from a network share. We opted for generating both a CHM file and a set of HTML files for serving from the network. Users starting the program locally use the CHM file, and users getting their program served from a network share have to use the HTML files.
We use Help and Manual and can thus easily produce both types of output from the same source project. The HTML files also contain searching capabilities and don't require a web server, so though it isn't an optimal solution, it works fine.
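The local-versus-network choice described above could be made at run time roughly as follows. This is a minimal Python sketch; the file names help.chm and help/index.html and the function name are hypothetical, and it only recognises UNC paths (\\server\share), so a network share mapped to a drive letter would still be treated as local.

```python
from pathlib import PureWindowsPath


def pick_help_entry(app_dir, chm_name="help.chm", html_index="help/index.html"):
    """Choose the CHM file for local installs and the HTML set for UNC paths."""
    base = PureWindowsPath(app_dir)
    # For a UNC path the drive component has the form \\server\share.
    on_network = base.drive.startswith("\\\\")
    return str(base / (html_index if on_network else chm_name))
```

For example, pick_help_entry(r"C:\Tools\MyApp") points at the CHM file, while pick_help_entry(r"\\fileserver\apps\MyApp") points at the HTML index.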
So far all the single-file types for Windows seem broken in one way or another:
WinHelp - obsoleted
HtmlHelp (CHM) - obsoleted on Vista, doesn't work from a network share, other than that works really nicely
Microsoft Help 2 (HXS) - seems to work right up until the point when it doesn't (corrupted indexes or similar); used by Visual Studio 2005 and above, as an example
If you don't want to use an installer and you don't want the user to perform any extra steps to allow CHM files over the network, why not fall back to WinHelp? Vista does not include WinHlp32.exe out of the box, but it is freely available as a download for both Vista and Server 2008.
It depends on how important the online documentation is to your product. A good documentation infrastructure can be complex to establish, but once done it pays off. Here is how we do it:
Help source: DITA-compliant XML, stored in SCC (ClearCase).
Help editing: XMetal.
Help compilation: customized DITA Open Toolkit, with custom Perl/Java preprocessing.
Help source cross-references application resources (.RC files etc.) at compile time.
Help deliverables from a single source: PDF, CHM, Eclipse Help, HTML.
A single source repository produces help for multiple products (10+) with thousands of shared topics.
From what you describe, I would look at Eclipse Help. It's not simple to integrate into .NET or MFC applications: you basically have to do the help mapping to resolve the request to a URL, then fire the URL at an Eclipse Help wrapper or a browser.
Is the question how to generate your own help files, or what is the best help file format?
Personally, I find CHM to be excellent. One of the first things I do when setting up a machine is to download the PHP Manual in CHM format (http://www.php.net/download-docs.php) and add a hotkey to it in Crimson Editor. So when I press F1 it loads the CHM and performs a search for the word my cursor is on (great for quick function reference).
If you are doing "just extract and run", you are going to run into security issues. This is especially true if your users are running Vista (or later). Is there a reason why you want to avoid packaging your applications inside an installer? Using an installer would alleviate the "external source" problem. You would be able to use .chm files without any problems.
We use InstallAware to create our install packages. It's not cheap, but is very good. If cost is your concern, WIX is open source and pretty robust. WIX does have a learning curve, but it's easy to work with.
PDF has the disadvantage of requiring the Adobe Reader
I use Foxit Reader on Windows at home and at work. A lot smaller and very quick to open. Very handy when you are wondering what exactly a80000326.pdf is and why it is clogging up your documents folder.
I think the solution we're going to end up going with for our application is hosting the help files ourselves. This gives us immediate access to the files and the ability to keep them up to date.
What I plan is to have the content loaded into a huge series of XML files, each one containing help for a specific item. This XML would contain links to other XML files. We would use XSLT to display the contents as necessary.
Depending on the licensing, we may build a client-specific XSLT file in order to tailor the look and feel to what they need. We may need to be able to only show help for particular versions of our product as well and that can be done by filtering out stuff in the XSLT.
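The per-version filtering idea can be illustrated without XSLT. The sketch below uses Python's ElementTree instead, purely to show the logic; the topic layout (title, section, and link elements and a "versions" attribute) is a hypothetical schema, not something we have settled on.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic layout: <section> elements may carry a "versions" attribute.
SAMPLE = """<topic id="export">
  <title>Exporting data</title>
  <section versions="1 2">Use File, Export.</section>
  <section versions="2">Use the Export toolbar button.</section>
  <section>Exports are saved as CSV.</section>
  <link href="import.xml">Importing data</link>
</topic>"""


def render_topic(xml_text, version):
    """Return plain-text help for one product version, dropping sections for other versions."""
    topic = ET.fromstring(xml_text)
    lines = [topic.findtext("title", default="")]
    for section in topic.findall("section"):
        versions = section.get("versions")
        # Sections without the attribute apply to every version.
        if versions is None or version in versions.split():
            lines.append(section.text or "")
    for link in topic.findall("link"):
        lines.append(f"See also: {link.text} ({link.get('href')})")
    return "\n".join(lines)
```

In an XSLT stylesheet the same test would be a predicate on the section template, with the product version passed in as a stylesheet parameter.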
I use a commercial package called AuthorIT that can generate a number of different formats, such as chm, html, pdf, word, windows help, xml, xhtml, and some others I have never heard of (does dita ring a bell?).
It is a content management system oriented towards the needs of technical documentation writers.
The advantage is that you can use and re-use the same content to build a set of guides, and then generate them in different formats.
So the bottom line relative to the question of choosing chm or html or whatever is that if you are using this you are not locked into a given format, but you can provide several among which the user can choose, and you can even add more formats as you go along, at no extra cost.
If you just have one guide to create it won't be worth your while, but if you have a documentation set to manage then it is the best to my knowledge. Their support is very helpful also.
