I recently discovered that concatenating text to the end of a PDF file does not change properties of the PDF file. This may be a very silly question, but if a program were concatenated to the PDF file, could it somehow be executed?
For example, opening this PDF file would create a text file in the home directory with the words "hello world" in it.
*pdf contents*...
trailer^M
<</Size 219/Root 186 0 R/Info 177 0 R/ID[<5990BFFB4DF3DB26CE6A92829BB5C41B> <B35E036CA0E7BA4CBF39B3D74DCE4CAF>]/Prev 4494028 >>^M
startxref^M
4663747^M
%%EOF^M
#!/bin/bash
echo "hello world" > ~/hello.txt
Would this work with a different file format? Does the embedded code need to be a binary executable?
As (fortunately), that's not part of the standard, you can't do that.
Unfortunately, the standard supports "launch actions", to execute arbitrary code with user confirmation. Those are now disabled by default and don't allow to execute embedded bulbs, but if enabled you could use that to execute arbitrary code that finds and executes the code embedded on the pdf.
The standard also supports javascript that excecutes sandboxed, but it a reader specific bug that allows may escaping the sandbox.
Related
I have a script in python that does the following:
For a folder of XML files (each file lacks a docroot) :
Read the first 7 lines of source file, but do nothing with them as they need to "not be in the output"
Write a new file (in separate directory) that starts with XML tag & opening Docroot /
Parent Tag
While still reading source file at line 8, go line-by-line and append same new file
Append a closing Docroot / Parent tag to end of new file
Inspiration from John Machin Feb 1 2011
I have a similar solution using bash and sed.
The project sponsor is looking to have the script get called by AWS Lambda, and as such, is leaning towards python as the script's language.
I'm looking for a performance boost and scaling (the source files range in size from 2 MB to 241 MB, and may be larger in the future).
Is it better to stick to Pure Python solution, or use Python, but call out to the sed routines or run the bash script using the subprocess module? Thanks.
Per the advice from Jean-François Fabre , I used subprocess.call() and it worked great. Thanks again for your help
I am looking for a solution i have been trying to research for a little while.
I am making a .bat program to use at my work for templates to be placed in a certification document. Trying to speed up the process of copying and pasting .
Currently i am using notepad, highlighting and CRTL+C / V to copy the template.
I have built a .bat program that will do it in a selection window (enter 1-8 ect ect)
The only problem i am having is i was wondering if i can place the template inside the scripting of the .bat, and it will copy it to the clipboard.
example of template (must have spaces and breaks):
~~~~~~~~~~~~~~~~~~~~~~~
Processed OK:
Surface Hardness Tested HR:
Name
03/21/2015
Company
~~~~~~~~~~~~~~~~~~~~~~~~~~
i know of the "clip" command, but i can only find ways to copy stuff from a .txt. document, and i would like to avoid having 8+ .txt documents for every template. (e.g: clip > temp1.txt)
So is there a way to copy a prefab'ed template from within the .bat
example:
if %TYPE%==1 goto temp1
:temp1
clip ???????
Any help would be great!
Thanks!
Sure you can. You can create arbitrarily complex streams to be fed into the clip program, such as:
#echo off
(
echo hello
echo goodbye
) | clip
If you run a cmd file containing those commands, then open up a notepad document and use CTRL-V, it will paste the text:
hello
goodbye
Just replace the two echo commands with whatever code you need to generate the template.
I just bought abbyy finereader 11 copr to rund it from another programm, but i cant find any commends to be used for finereader.exe.
so without any commands it simply openens and scans but i need to tell it where to save the document and how to name and the to close the app again, also it would be cool to have it as a background task.
While doing my OCR research project, found one. Works with FR12, didn't tested with earlier versions.
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
general command line: <open_keys/scanning> [<recognition_keys>] [<export_keys>]
<open_keys/scanning> ::= ImageFiles | /scan [SourceName] | /file [filename1 filename2], where
ImageFiles - list of files for recognition
SourceName - images source (scanner); if not specified, current is used
filename.. - list of files for recognition
<recognition_keys> ::= [/lang Language] [/optionsFile OptionsFileName], where
Language - name of language in English (russian, greek, Mixed)
OptionsFileName - path to options file
<export_key> ::= /out ExportFile | /send Target, where
ExportFile - name of file with extension to save file to
(txt, rtf, doc, docx, xml, htm(l), xls, xlsx, ppt, pptx, pdf, dbf, csv, lit);
Target - name of target app where to open
(MSWord, MSExcel, WordPro, WordPerfect, StarWriter, Mail, Clipboard, WebBrowser, Acrobat, PowerPoint)
This command opens FR ui, processes the file and then closes it (if you pass argument /quit). FineCmd.exe located in FR directory where you installed it
Hello I saw this msg very late but i m using ABBYY command line for 10years .
I prefer ABBYY 8 because makes same good job faster and does not open any GUI . It comes with FineOCR.exe:
"C:\...\ABBYY FineReader 8\FineOCR.exe" %1 /lang greek english /send MsWord
It does OCR and opens MsWord . FineOCR.txt is a simple help file.
Regarding ABBYY 11,12 (all versions) there is a FineCmd.exe . Using something like:
"c:\...\FineReader\FineCMD.exe" %1 /lang greek english /send MsWord
does what FineOCR did before (but no .txt help file)
Unfortunately, Such a professional OCR software doesn't support command line utilities. For batch processing, it offers HOT FOLDER utility inside it (from GUI). http://informationworker.ru/finereader10.en/hotfolder_and_scheduling/installandrun.htm
If you want to make OCR batch processing from your program, they sell another software, called 'ABBYY Recoginition Server'.
There also offer a comprehensive API for programmers : http://www.abbyy.com/ocr_sdk_windows/technical_specifications/developer_environment/
If your plan is to batch process them and write the contents to a Database, you can also do a programmatical trick to overcome such limitation, as I did recently in one of my projects (It is a bit offline-way but it is simple and works) : While parsing the files and putting them to your Database table from your program, move (or copy) them all into a folder while changing their filename to include an ID from your Database table. Then use 'hot folder' utility to OCR all files, by having the same filename with TXT extention (It is set from 'hot folder' settings). Then in your program parse the folder's text files, get their content as string, and parse the table IDS from filename, the rest is updating your table with that information.)
An year later, ABBYY does support command line usage: http://www.ocr4linux.com/en:documentation
Version 14 does not save the output file using:
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
or
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt
Versions 11 & 12 work well using the above commands (does save the output) but does display the GUI which can be closed using /quit.
Versions 9 & 10 don't come with FineCmd.exe or FineOCR.exe.
Version 8 can OCR and send the output to an application of choice but cannot save using /out. In my experience it does open the GUI.
Okay so here is what I want to do. I want to add a print option that prints whatever the user's document is to a PDF and adds some headers before sending it off to a device.
I guess my questions are: how do I add a virtual "printer" driver for the user that will launch the application I've been developing that will make the PDF (or make the PDF and launch my application with references to the newly generated PDF)? How do I interface with CUPS to generate the PDF? I'm not sure I'm being clear, so let me know if more information would be helpful.
I've worked through this printing with CUPS tutorial and seem to get everything set up okay, but the file never seems to appear in the appropriate temporary location. And if anyone is looking for a user-end PDF-printer, this cups-pdf-for-mac-os-x is one that works through the installer, however I have the same issue of no file appearing in the indicated directory when I download the source and follow the instructions in the readme. If anyone can get either of these to work on a mac through the terminal, please let me know step-by-step how you did it.
The way to go is this:
Set up a print queue with any driver you like. But I recommend to use a PostScript driver/PPD. (A PostScript PPD is one which does not contain any *cupsFilter: ... line.):
Initially, use the (educational) CUPS backend named 2dir. That one can be copied from this website: KDE Printing Developer Tools Wiki. Make sure when copying that you get the line endings right (Unix-like).
Commandline to set up the initial queue:
lpadmin \
-p pdfqueue \
-v 2dir:/tmp/pdfqueue \
-E \
-P /path/to/postscript-printer.ppd
The 2dir backend now will write all output to directory /tmp/pdfqueue/ and it will use a uniq name for each job. Each result should for now be a PostScript file. (with none of the modifications you want yet).
Locate the PPD used by this queue in /etc/cups/ppd/ (its name should be pdfqueue.ppd).
Add the following line (best, near the top of the PPD):
*cupsFilter: "application/pdf 0 -" (Make sure the *cupsFilter starts at the very beginning of the line.) This line tells cupsd to auto-setup a filtering chain that produces PDF and then call the last filter named '-' before it sends the file via a backend to a printer. That '-' filter is a special one: it does nothing, it is a passthrough filter.
Re-start the CUPS scheduler:sudo launchctl unload /System/Library/LaunchDaemons/org.cups.cupsd.plist
sudo launchctl load /System/Library/LaunchDaemons/org.cups.cupsd.plist
From now on your pdfqueue will cause each job printed to it to end up as PDF in /tmp/pdfqueue/*.pdf.
Study the 2dir backend script. It's simple Bash, and reasonably well commented.
Modify the 2dir in a way that adds your desired modifications to your PDF before saving on the result in /tmp/pdfqueue/*.pdf...
Update: Looks like I forgot 2 quotes in my originally prescribed *cupsFilter: ... line above. Sorry!
I really wish I could accept two answers because I don't think I could have done this without all of #Kurt Pfeifle 's help for Mac specifics and just understanding printer drivers and locations of files. But here's what I did:
Download the source code from codepoet cups-pdf-for-mac-os-x. (For non-macs, you can look at http://www.cups-pdf.de/) The readme is greatly detailed and if you read all of the instructions carefully, it will work, however I had a little trouble getting all the pieces, so I will outline exactly what I did in the hopes of saving someone else some trouble. For this, the directory with the source code is called "cups-pdfdownloaddir".
Compile cups-pdf.c contained in the src folder as the readme specifies:
gcc -09 -s -lcups -o cups-pdf cups-pdf.c
There may be a warning: ld: warning: option -s is obsolete and being ignored, but this posed no issue for me. Copy the binary into /usr/libexec/cups/backend. You will likely have to the sudo command, which will prompt you for your password. For example:
sudo cp /cups-pdfdownloaddir/src/cups-pdf /usr/libexec/cups/backend
Also, don't forget to change the permissions on this file--it needs root permissions (700) which can be changed with the following after moving cupd-pdf into the backend directory:
sudo chmod 700 /usr/libexec/cups/backend/cups-pdf
Edit the file contained in /cups-pdfdownloaddir/extra/cups-pdf.conf. Under the "PDF Conversion Settings" header, find a line under the GhostScript that reads #GhostScript /usr/bin/gs. I did not uncomment it in case I needed it, but simply added beneath it the line Ghostscript /usr/bin/pstopdf. (There should be no pre-cursor # for any of these modifications)
Find the line under GSCall that reads #GSCall %s -q -dCompatibilityLevel=%s -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -sOutputFile="%s" -dAutoRotatePage\
s=/PageByPage -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode -dPDFSETTINGS=/prepress -c .setpdfwrite \
-f %s Again without uncommenting this, under this I added the line GSCall %s %s -o %s %s
Find the line under PDFVer that reads #PDFVer 1.4 and change it to PDFVer, no spaces or following characters.
Now save and exit editing before copying this file to /etc/cups with the following command
sudo cp cups-pdfdownloaddir/extra/cups-pdf.conf /etc/cups
Be careful of editing in a text editor because newlines in UNIX and Mac environments are different and can potentially ruin scripts. You can always use a perl command to remove them, but I'm paranoid and prefer not to deal with it in the first place.
You should now be able to open a program (e.g. Word, Excel, ...) and select File >> Print and find an available printer called CUPS-PDF. Print to this printer, and you should find your pdfs in /var/spool/cups-pdf/yourusername/ by default.
*Also, I figured this might be helpful because it helped me: if something gets screwed up in following these directions and you need to start over/get rid of it, in order to remove the driver you need to (1) remove the cups-pdf backend from /usr/libexec/cups/backend (2) remove the cups-pdf.conf from /etc/cups/ (3) Go into System Preferences >> Print & Fax and delete the CUPS-PDF printer.
This is how I successfully set up a pdf backend/filter for myself, however there are more details, and other information on customization contained in the readme file. Hope this helps someone else!
The .XFDL file extension identifies XFDL Formatted Document files. These belong to the XML-based document and template formatting standard. This format is exactly like the XML file format however, contains a level of encryption for use in secure communications.
I know how to view XFDL files using a file viewer I found here. I can also modify and save these files by doing File:Save/Save As. I'd like, however, to modify these files on the fly. Any suggestions? Is this even possible?
Update #1: I have now successfully decoded and unziped a .xfdl into an XML file which I can then edit. Now, I am looking for a way to re-encode the modified XML file back into base64-gzip (using Ruby or the command line)
If the encoding is base64 then this is the solution I've stumbled upon on the web:
"Decoding XDFL files saved with 'encoding=base64'.
Files saved with:
application/vnd.xfdl;content-encoding="base64-gzip"
are simple base64-encoded gzip files. They can be easily restored to XML by first decoding and then unzipping them. This can be done as follows on Ubuntu:
sudo apt-get install uudeview
uudeview -i yourform.xfdl
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xfdl
The first command will install uudeview, a package that can decode base64, among others. You can skip this step once it is installed.
Assuming your form is saved as 'yourform.xfdl', the uudeview command will decode the contents as 'UNKNOWN.001', since the xfdl file doesn't contain a file name. The '-i' option makes uudeview uninteractive, remove that option for more control.
The last command gunzips the decoded file into a file named 'yourform-unpacked.xfdl'.
Another possible solution - here
Side Note: Block quoted < code > doesn't work for long strings of code
The only answer I can think of right now is - read the manual for uudeview.
As much as I would like to help you, I am not an expert in this area, so you'll have to wait for someone more knowledgable to come down here and help you.
Meanwhile I can give you links to some documents that might help you:
UUDeview Home Page
Using XDFLengine
Gettting started with the XDFL Engine
Sorry if this doesn't help you.
You don't have to get out of Ruby to do this, can use the Base64 module in Ruby to encode the document like this:
irb(main):005:0> require 'base64'
=> true
irb(main):007:0> Base64.encode64("Hello World")
=> "SGVsbG8gV29ybGQ=\n"
irb(main):008:0> Base64.decode64("SGVsbG8gV29ybGQ=\n")
=> "Hello World"
And you can call gzip/gunzip using Kernel#system:
system("gzip foo.something")
system("gunzip foo.something.gz")