How can I extract text from Win8 print driver generated PostScript file - ghostscript

I can extract text from a Win7 print driver generated PostScript file, but not from Win8.
For example, creating some text in Windows' "Notepad", telling Notepad to print using an HP PostScript print driver, and telling the print driver to output to a file, I obtain a file that I then want to extract text from.
I have tried Ghostscript's ps2ascii, pstopdf | pdftotext, and a number of other things on an Ubuntu platform. While some of these work on the Win7 output, I cannot find any combination that works on the Win8 output.
Is there an Open Source solution to this?

You cannot guarantee getting text from any PostScript program; it's not designed for that.
However, Ghostscript's txtwrite device will do a decent job on the output from the Windows PostScript printer driver. It's much better than ps2ascii because (amongst other things) it can handle Unicode, so it's not limited to ASCII.
Beware that applications may generate PostScript themselves, so even if the output appears to be from the Windows PostScript printer driver, the actual content might be generated by the application.
Also, you will only get text out of the Windows PostScript printer driver if the application actually writes text to the device context. For example, if you print a PDF from the Edge browser then you will get text in the output. If you print the same PDF from Chrome on the same system, the text is instead rendered as vectors (i.e. line, arc, stroke, fill, etc.), not as text.
Just be aware that what you are trying to do isn't going to be 100% successful in the general case.
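As a rough illustration of the txtwrite suggestion above, here is a minimal Python sketch that shells out to Ghostscript. It assumes the executable is on the PATH as `gs`. The function names `txtwrite_command` and `extract_text` are made up for this example, and reading the result as UTF-8 is an assumption about the device's default output encoding.

```python
import subprocess
from pathlib import Path


def txtwrite_command(ps_path, txt_path, gs_binary="gs"):
    """Build a Ghostscript command line that uses the txtwrite device."""
    return [
        gs_binary,
        "-q",                        # suppress the startup banner
        "-dNOPAUSE",                 # do not pause after each page
        "-dBATCH",                   # exit once the input has been processed
        "-sDEVICE=txtwrite",         # the text extraction device
        f"-sOutputFile={txt_path}",  # where the extracted text is written
        str(ps_path),
    ]


def extract_text(ps_path, txt_path, gs_binary="gs"):
    """Run Ghostscript and return the extracted text (assumed to be UTF-8)."""
    subprocess.run(txtwrite_command(ps_path, txt_path, gs_binary), check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

Calling `extract_text("win8_output.ps", "win8_output.txt")` (both file names are placeholders) would run the conversion. Whether any text comes out still depends on the caveats above.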

Related

How to tell the printer to print grayscale or colored content

I have these lines of code that tell the printer to print a document, but it only prints colored content.
from win32com.client import Dispatch  # assuming pywin32 provides Dispatch here
word = Dispatch("Word.Application")
word.Documents.Open(self.filePath)
word.ActiveDocument.PrintOut()
word.ActiveDocument.Close()
word.Quit()
What I want is to tell the printer to print grayscale content. Is there any possible solution for this?
Q: How to tell the printer to print grayscale or colored content?
Short Answer:
You have to talk to the relevant printer driver, which is completely platform- and API-specific.
Longer Answer:
The snippet you showed, word = Dispatch("Word.Application"), uses a Python wrapper for Microsoft COM/ActiveX, specifically for the MS Word COM/ActiveX component (which was presumably registered on your PC when you installed MS Word).
So all you have to do is look at the options provided by "Word.Application":
https://learn.microsoft.com/en-us/office/vba/api/word.application.printout
Be advised, you might also have to play around with "Printer Device Settings", for example:
https://learn.microsoft.com/en-us/office/vba/api/access.printer

PostScript Printer loading

We have a VarioPrint 135 laser printer, which can print via PostScript. I have tried setting various properties to make it start printing immediately, but with no luck.
I was trying to print 10000 A4 pages, but the printer spools the whole job every time (even if I set "print directly to printer"). Waiting more than an hour for spooling is not acceptable.
I am able to print via the PCL driver, and with it the printer starts printing as soon as it has begun spooling some files, but we can't use that driver because it has no full-bleed option.
I expect the same behaviour when using the PostScript driver.
Any ideas?

Zebra printer uploading PCX instead of GRF image

I have two different Zebra printers, the RW420 and the iMZ320.
I am trying to print images on them.
I am using the Java/Android SDK provided by Zebra to first upload the image.
printer.storeImage("R:IMAGE.GRF", ZebraImageFactory.getImage(bmp), ImageUtils.IMAGE_DIMEN, ImageUtils.IMAGE_DIMEN);
On the iMZ320, the image uploads just fine and I am able to print it out.
However, on the RW420, I cannot print the image and when I print the config page with the list of file names, the file is listed as 'IMAGE.PCX'
The printer's language is set to 'ZPL'
Any ideas on why this is happening?
So it depends upon how you created 'printer' in your example. If you used the ZebraPrinterFactory.getInstance(Connection connection) call directly, the SDK will communicate with the printer and determine the type of printer based on a few criteria. For RW420 it will use CPCL as the default language of choice (even though it is in ZPL mode) which will force it to use PCX rather than GRF.
To override this, you can create the printer using the explicit language you wish to use.
ZebraPrinter printer = ZebraPrinterFactory.getInstance(PrinterLanguage.ZPL, connection);

Printing a graphic to a Zebra LP2844 with the GW EPL command?

I need to print an image that is being returned to me through a web service (the data is returned as RAW) and I cannot for the life of me figure out how to print a graphic to a label with EPL.
The EPL manual defines the Graphic Write instruction as:
GWp1, p2, p3, p4, DATA
All of the parameters are returned to me, so I don't have to worry about calculating the height, width, etc., but my problem is that I don't know how to format the DATA.
The manual says DATA should be
Raw binary data without graphic file formatting. Data must be in bytes.
I've tried passing a binary string and a hex string, but nothing seems to work. There is no example on how to use this command in the EPL manual and after hours of searching online I have not been able to find a single example of how to use the command (i.e. example EPL commands that I can copy & paste to send to the printer).
Does anyone have an idea of how to use this command? Could you provide me with an example? (by example I don't mean a framework, code, etc., what I mean is just the plain EPL commands).
I can confirm that the data is raw, uncompressed binary. It is also inverted: the 0 bits print as black, at least on my UPS-firmware LP-2844. I have no idea why all the examples from Zebra show the data encoded as hex.
It's worth noting that most print servers (HP Jetdirect, Lantronix LPS1-T, and almost certainly the Zebra built-in and external print servers) will form a binary connection to the printer if you spit data at them on port 9100 (using netcat for example):
nc printer_hostname_or_ip_address 9100 < test_file.txt
You get no feedback from the printer, except for the label having printed or not.
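For sending a job from a program rather than with netcat, a minimal Python sketch of the same idea follows. The function name `send_raw` and the 10-second timeout are made up for this example. It simply opens a TCP connection to port 9100 and writes the bytes unchanged, and like the nc command it gets no feedback from the printer.

```python
import socket


def send_raw(host, payload, port=9100, timeout=10.0):
    """Send bytes unchanged to a raw TCP print port, then close the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

The equivalent of the nc line above would be `send_raw("printer_hostname_or_ip_address", open("test_file.txt", "rb").read())`. Opening the file in binary mode matters, because bitmap data must not go through any text encoding or newline translation.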
It takes my LP-2844 (UPS firmware) printer about 5-6 seconds to print a label containing an 816 wide x 1218 tall downloaded bitmap onto a 4" wide x 6" tall label. It seems to be all imaging time: sending three labels at once is not any faster, and the network connection (through a Lantronix LPS1-T) is held open until the final label prints. That image is at the native resolution of the printer (203 dots/inch), and there is no dithering or resizing going on (I don't think EPL2 even knows how to dither or resize).
It might be possible to speed up the imaging time by optimizing the label into many smaller bitmaps (and horizontal and vertical line segments, and perhaps filled-in rectangular blocks). This wouldn't be a very hard optimization because the image is a single-bit black and white bitmap, and the code would be fairly simple. I don't know if it would really speed it up, though.
A more modern Zebra GX420 running ZPL with a built-in ethernet port ($500 online) can print the same label (with essentially the same graphic download encoding) in 1-2 seconds.
By the way, since I haven't yet actually answered the question, the raw EPL code for this is:
(a blank line)
N
q816
Q1218,20
GW10,10,102,1218,(124236 bytes of inverted bitmap data)
P
All the newlines are 0x0a (Unix-style).
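To make the byte layout concrete, here is a hedged Python sketch that assembles the same command sequence. The helper names `pack_inverted_rows` and `build_epl_label` are made up for this example. Packing eight dots per byte with the leftmost dot in the high bit is an assumption, while the "0 bit prints black" convention is taken from the observation above.

```python
def pack_inverted_rows(pixels):
    """Pack rows of booleans (True = black dot) into bytes, 8 dots per byte.

    Assumes the leftmost dot is the most significant bit. Black is stored
    as a 0 bit, matching the inverted data described above.
    """
    out = bytearray()
    for row in pixels:
        if len(row) % 8:
            raise ValueError("row width must be a multiple of 8 dots")
        for i in range(0, len(row), 8):
            byte = 0
            for dot in row[i:i + 8]:
                byte = (byte << 1) | (0 if dot else 1)
            out.append(byte)
    return bytes(out)


def build_epl_label(bitmap, width_dots, height_dots, x=10, y=10, gap=20):
    """Wrap raw bitmap bytes in the N / q / Q / GW / P sequence shown above."""
    bytes_per_row = width_dots // 8
    if len(bitmap) != bytes_per_row * height_dots:
        raise ValueError("bitmap length does not match the stated dimensions")
    header = (
        f"\nN\nq{width_dots}\nQ{height_dots},{gap}\n"
        f"GW{x},{y},{bytes_per_row},{height_dots},"
    ).encode("ascii")
    # The bitmap bytes are appended as-is: no hex encoding, no escaping.
    return header + bitmap + b"\nP\n"
```

For the 816 x 1218 label above, `bytes_per_row` is 816 / 8 = 102 and the bitmap is 102 * 1218 = 124236 bytes, which matches the GW parameters in the listing.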
Maybe this will help; it has examples (and corrects an error in the manual). Also, it may be easier to use the GM command instead and just delete the image each time (see here for a related Stack Exchange question).
That being said, I've never gotten my Eltrons to successfully print an image (but my jobs don't require it).
Good luck!!
EDIT: Here's another link with example Perl code. They're aiming for Chinese characters but show how to print the Great Wave image (which oddly is Japanese).
I found that it is not possible to send a graphic to a Zebra printer with EPL using ASCII characters. The data must actually be sent as RAW data. So, for example, you can't send a graphic to the printer using Zebra Setup Utilities, or through any other means that cannot write RAW data from a file directly to the printer.
The only way around this I've found is to create the label as an image and send that image to the printer via a print command within your application.

Print any document with a textual identifier on it

Is there any possibility of printing any document (e.g. image, PDF, Office document, etc) with a text label at the top of page? Modifying actual files isn't an option for me. I'm wondering if there's anything like that provided in Windows printing system.
Thanks.
Some printers allow you to add a "watermark" to every page they print (but that functionality is all in the printer drivers, not in Windows itself). If that's available to you, you could probably tweak the watermark to be what you need.
Another tactic (but a challenging one!) would be to create your own printer driver that accepts the Print command from any program, just like a printer, then adds the text label you want, then forwards the print job on to a real physical printer.

Resources