How to highlight text in a PDF from the command line (Windows)?

I'm looking for a way to highlight words (e.g. "some words [0-9]"), or better, the whole line containing the given words, in some single-sided PDFs.
It will be part of a batch process on Windows, so I need a command-line way to do this. I've looked at Ghostscript, but I cannot see how to use it for this.
I hope I didn't do anything wrong. I looked at other questions, mainly "Add comments to PDF files automagically with regular expressions", but that did not really help. Also, English is not my native language, as you may have noticed already.
Thanks in advance.

Ghostscript can't do this. Generalized text tools also can't, because (1) most PDFs have the text commands in compressed blocks, (2) text often is not 'encoded' in any standard way (sometimes the font provides a ToUnicode map, but often not even that), and (3) what looks like text may not even be text; it may just be bitmapped images.
A tool like 'mutool clean -d' can "expand" a PDF so that (1) is solved and the text commands can be found in the PDF, but you may still have things like:
(!"##$) Tj
instead of Hello because of (2). And then there's the other way kerned text is done in PDF, even if standard encoding is used:
[(H) 120 (e) 80 (l) 95 (l) 95 (o)] TJ
It might be possible, but very difficult, and would require programming, and still would not address (3) (that would require OCR of the bitmapped text).
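To illustrate the kerned-text case, here is a minimal Python sketch that rejoins the string fragments of a TJ array. It assumes the content stream is already decompressed, that the strings are simple literal strings without escapes or nested parentheses, and that the font uses a standard encoding, so it does nothing for problems (2) and (3). The function name join_tj_strings is hypothetical.

```python
import re

def join_tj_strings(tj_operand: str) -> str:
    """Concatenate the literal (...) strings of a TJ array, dropping the kerning numbers."""
    # Only simple literal strings are matched; escaped or nested
    # parentheses and hex strings <...> are not handled in this sketch.
    fragments = re.findall(r"\(([^()\\]*)\)", tj_operand)
    return "".join(fragments)

line = "[(H) 120 (e) 80 (l) 95 (l) 95 (o)] TJ"
print(join_tj_strings(line))  # prints "Hello"
```

Only after this kind of reassembly could a search pattern be matched against the text, and the match would still have to be mapped back to page coordinates to draw a highlight.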

Related

Ghostscript - Indentation of postscript code

Is there an option to ask Ghostscript to indent the PostScript it creates?
Everything starts at the beginning of a line and I find it difficult to follow.
Alternatively, I am using Emacs and ps-mode.
If anyone knows how to indent code in this mode I would appreciate a tip (apologies if this is not relevant to this StackExchange).
No, there is no option for indenting the output.
PostScript is pretty much regarded as a write-only language anyway, and the output of ps2write (which is what I assume you are using though you don't say) is particularly difficult since it fundamentally outputs PDF syntax with a PostScript program on the front to parse it into PostScript operations.
Why do you want to read it ?
[EDIT]
You can always edit your question, you don't need to post a new answer.
I'm afraid what you want to do isn't as simple as you might think.
It might be possible for this use case if the PDF files you receive are always created the same way, but there are significant problems.
The font you use as a substitute for the missing font must be encoded the same way. Say for example the font in the PDF file is encoded so that 0x41 is 'A', you need to make sure that the replacement font is also encoded so that 0x41 is an 'A'. So just the findfont, scalefont, setfont sequence is not always going to be sufficient, sometimes you will need to re-encode the font.
CIDFonts will be a major stumbling block. Firstly because ps2write simply doesn't emit CIDFonts at all. These were not part of level 2 PostScript. As a result all text in a CIDFont will be embedded as bitmaps. If your original file doesn't contain the CIDFont then you'll get the fallback CIDFont bitmapped.
Secondly CIDFonts can use multiple-byte character codes, of variable length. You can't simply replace a CIDFont with a Font, it just won't work.
The best solution, obviously, is to have the PDF files created with the required fonts embedded. This is best practice. If you can't get that, then I'd suggest that rather than trying to hand-edit PostScript, you use the fontmap.GS and cidfmap files which Ghostscript uses to find fonts.
Ghostscript already has a load of code to do font substitution automatically, using both Fonts and CIDFonts as substitutes, and it does all the hard work of re-encoding the fonts or building CMaps as required. If you are on Windows much of this may already be done for you, when you install Ghostscript it will ask if you want to create font mappings. If you said yes then it will
Add the font substitutions you want to use in those files (they have comments explaining the layout) and then use the pdfwrite device to make a new PDF file. Set EmbedAllFonts to true (you may need to add an AlwaysEmbed font array as well, listing the fonts specifically) and SubsetFonts to false.
That should create a new PDF file where the missing fonts have been replaced by your defined substitutes. Those substitutes will have been embedded in the new PDF file and they will not have been subset (Acrobat will generally refuse to edit text in a subset font).
The switches I mentioned above are standard Adobe Distiller parameters, but they are documented for pdfwrite here. There's some documentation on adding fonts here and here and specifically for CIDFonts here.
Basically I'd suggest you define your substitutions and let Ghostscript do the work for you.
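As a rough sketch of that last step, the following Python snippet assembles a pdfwrite command line with the two switches mentioned above. The executable name gswin64c (the 64-bit Windows console build) and the file names are assumptions, and the helper build_gs_command is hypothetical. The AlwaysEmbed array is not shown, since it has to be supplied separately as a distiller parameter.

```python
from typing import List

def build_gs_command(src: str, dst: str, gs_exe: str = "gswin64c") -> List[str]:
    """Build a Ghostscript command that rewrites src to dst with full font embedding."""
    return [
        gs_exe,                    # assumed executable name; adjust for your install
        "-dBATCH", "-dNOPAUSE",    # run non-interactively
        "-sDEVICE=pdfwrite",       # produce a new PDF
        "-dEmbedAllFonts=true",    # embed substitutes as well
        "-dSubsetFonts=false",     # keep whole fonts so the text stays editable
        "-sOutputFile=" + dst,
        src,
    ]

# Hypothetical file names, for illustration only.
print(" ".join(build_gs_command("card.pdf", "card-embedded.pdf")))
```

The list can be passed to subprocess.run once the font substitutions have been added to the map files.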
This is not an answer to the problem but rather an answer to KenS's question about "Why do you want to read it?"
I tried to put it in the comment box but it was too long.
I am a retired engineer with a strong programming background.
I would like to read and understand the postscript code for the reason shown below.
I play duplicate bridge as a hobby. I receive a PDF file of what is known as a convention card (a single-page document of bridge agreements).
Frequently I would like to edit these files.
When I open with Adobe Illustrator I have to spend a significant amount of time replacing fonts that are not on my system with fonts that I do have.
I can take the PDF and export it as a postscript file using Ghostscript.
I was going to write a little program to replace the embedded fonts with the fonts that I use to replace them.
I was going to leave the postscript file unaltered and insert things like
/HelveticaMonospacedPro-RG findfont
12 scalefont setfont
just above where the text is written.
I was planning on using the fonts that I have on my system (e.g., HelveticaMonospacedPro-RG).

Arabic/Persian label Matlab figure

Matlab cannot display Arabic/Persian labels of the figure. Also I cannot see my installed fonts and I don't want to add the labels by another program. How can I fix this problem?
What you're looking for is a way to display unicode characters in axes labels.
It seems that this problem was encountered before, but there's no simple solution for it. See workarounds here and here.
One important thing though: do not edit .m files containing Unicode/UTF-8 characters (such as Arabic, Farsi, Hebrew, Chinese, etc.) in MATLAB, because it messes up the characters upon saving. Use an external editor (like Notepad++) to edit and save the files (as UTF-8 without BOM), and only run them in MATLAB.

markdown or markup to powerpoint?

I need to maintain some slides in both LaTeX Beamer and PowerPoint. (This is to make the slides available to instructors elsewhere too, 90% of whom do not know how to use LaTeX and are unwilling to learn it. I am a LaTeX guy on Linux.)
I have tried the route via LibreOffice (and OpenDocument), but this did not come out well. Right now, the best method I have found is to author the PDF in Beamer, then run it through a Nuance OCR program to get MS Word, without even going all the way to PowerPoint (which is where I really need to be).
If I only had a markup language that produced nice PowerPoint, I could probably code a Perl translator from Markdown to this intermediate markup language. (Going from Markdown to LaTeX Beamer is relatively easy.)
I don't think this exists, but hope springs eternal. After all, it is almost 2014 now. Does anyone know of a solution?
One solution is to use odpdown: it converts Markdown to the OpenDocument Presentation (ODP) format, which can be imported into PowerPoint.
It is not yet complete (table support is missing, and it may not run on certain Windows setups), but it could be a start. If you have Linux running, it seems to work there.
Steve Rindsberg's answer in the comments works on PowerPoint 2007! Let me repeat it here:
I suspect that PowerPoint is the likeliest solution. ;-) But what sort
of slides are you creating? If they're simple heading and bullet point
slides, all you need to produce is a simple text file. Any text that
starts in the left column will be the heading of a new slide. Indent
one tab and it becomes a first-level bullet point under the current
heading; indent two tabs, it becomes a second level bullet point and
so on. Simply use File | Open on the text file to pull it into PPT.
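The tab-indented outline described in the quote is easy to generate. Here is a minimal Python sketch; the function name to_ppt_outline and the (title, bullets) data layout are made up for illustration, and the resulting text still has to be saved to a file and opened through File | Open as described.

```python
def to_ppt_outline(slides):
    """Render slides as a tab-indented text outline.

    slides is a list of (title, bullets) pairs; bullets is a list of
    (level, text) pairs, where level 1 is a first-level bullet.
    """
    lines = []
    for title, bullets in slides:
        lines.append(title)                    # left column: new slide heading
        for level, text in bullets:
            lines.append("\t" * level + text)  # one tab per bullet level
    return "\n".join(lines) + "\n"

slides = [
    ("Intro", [(1, "First point"), (2, "Detail")]),
    ("Next steps", [(1, "Another point")]),
]
print(to_ppt_outline(slides), end="")
```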
Steve: Is this all that PP converts? Or is there a reference of other "sneaky" markup that PP knows about?
(pandoc: unfortunately, the conversion from LibreOffice to PowerPoint was pretty poor when I last tried it. I also tried to save and understand the PowerPoint XML format, but that was really bad.)
The easiest way to handle this is to work with:
RStudio (and R if not already installed)
RMarkdown
Pandoc 2.0.5 (minimum)
Install those 3 (or 4) items, then read: https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html
The installation time is worth the time saved copy-pasting everything from scratch.
I am also a Linux guy and I also use LaTeX engines to create nice documents. Based on my experience, here's what you should do:
Stop writing directly in LaTeX and start using org-mode to write documents instead (I spent years writing in LaTeX and now that's over, except when I use the moderncv package)
Org supports latex math formulas and .org files are easily exported in .tex files
Org can also be easily exported in markdown
Once you have your markdown, there are several tools that will allow you to create a PowerPoint. Two of them are pandoc and md2pptx
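For the pandoc route, a minimal Python sketch of the Markdown side might look like this. The function name to_markdown_slides and the file names are assumptions. With only level-1 headings in the file, pandoc should treat each heading as a slide title, and it selects the PowerPoint writer from the .pptx output extension.

```python
def to_markdown_slides(slides):
    """Render (title, bullets) pairs as Markdown, one level-1 heading per slide."""
    parts = []
    for title, bullets in slides:
        parts.append("# " + title)
        parts.append("")
        for level, text in bullets:
            # Nested list items are indented by four spaces per level.
            parts.append("    " * (level - 1) + "- " + text)
        parts.append("")
    return "\n".join(parts)

# Assumed file names; save the Markdown as slides.md, then run this command.
PANDOC_COMMAND = ["pandoc", "slides.md", "-o", "slides.pptx"]

print(to_markdown_slides([("Intro", [(1, "First point"), (2, "Detail")])]))
```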

How can I change the background color of specific characters in a RTF document?

I'm trying to output RTF (Rich Text Format) from a Ruby program - and I'd prefer to just emit RTF directly without using the RTF gem as I'm doing pretty simple stuff.
I would like to highlight specific characters in a DNA sequence alignment and from the docs it seems that I can either use \highlightN ... \highlight0 or \cbN ... \cb1
The problem is that I cannot get \cb to work in either Word:Mac 2008 or Mac TextEdit (\cf works fine so I know it's not a color table issue)
\highlight does work but seemingly only with two of the possible colors (black and red) and \highlight does not use the custom color table.
By creating simple docs in Word with character shading and saving as RTF I can see blocks of ridiculously verbose RTF code that presumably does what I want, but it is so impenetrable that I'm not seeing the wood for the trees.
Part of the problem may well be that Mac Word is just not implementing RTF properly. I don't have a Windows version of Word handy.
Anyone know the right way to shade blocks of text?
Thanks
--Rob
There is a note in the RTF Pocket Guide that says MS Word does not implement the \cb command. It says MS Word uses \chshdng0\chcbpatN (where "N" is the color number that you would use with \cb). The book recommends using something like the following for compatibility with programs that implement \cbN and/or \chshdng0\chcbpatN: {\chshdng0\chcbpat5\cb5 text}.
Note: The copy of the book I have was published in 2003, so it might be a bit out-of-date.
The sequence of RTF commands that seems to be most universally supported by RTF-capable applications is:
\chshdng10000\chcbpatN\chcfpatN\cbN
These commands:
set the shading to 100 percent
set the pattern foreground and background colors to the color from the color table (we're not actually specifying a shading pattern)
set the character background to the color from the color table
Word was the most difficult application to properly render background colors in:
Despite what the latest (1.9.1) RTF spec says, Word 2013 does not resolve \highlightN colors from the \colortbl. Instead, \highlightN maps to a predefined list of colors. It looks like those colors come from the 1.5 version of the RTF spec.
Regarding \cb, the 1.9.1 spec contains this helpful pointer at the end of the section on Color Table:
Note: Windows versions of Word have never supported \cbN, but it can be emulated by the control word sequence \chshdng0\chcbpatN.
This is almost a useful suggestion, except that if you read the documentation for \chshdngN:
Character shading. The N argument is a value representing the shading of the text in hundredths of a percent.
So, 0 turns out to not be a very useful value; 100 / 0.01 gives us the 10000 we used in the sequence above.
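A minimal sketch of emitting that control-word sequence directly is shown below. It is written in Python rather than the asker's Ruby, and the helper names rtf_highlight and rtf_document are hypothetical. It assumes the text contains no backslashes or braces (true for DNA letters), so no RTF escaping is done.

```python
def rtf_highlight(text, color_index):
    """Wrap text in a group whose background is color-table entry color_index."""
    n = color_index
    return "{\\chshdng10000\\chcbpat%d\\chcfpat%d\\cb%d %s}" % (n, n, n, text)

def rtf_document(body, colors):
    """Build a small RTF document with a color table made from (r, g, b) tuples."""
    # Entry 0 is the empty "auto" color, so custom colors start at index 1.
    table = "".join("\\red%d\\green%d\\blue%d;" % rgb for rgb in colors)
    return ("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Courier New;}}"
            "{\\colortbl;" + table + "}\\f0 " + body + "}")

colors = [(255, 0, 0), (255, 255, 0)]       # index 1 = red, index 2 = yellow
body = "ACGT" + rtf_highlight("TTAG", 2) + "CCGA"
print(rtf_document(body, colors))
```

Wrapping each highlighted run in its own group means the shading ends at the closing brace, so no explicit reset is needed.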
Use WordPad to create RTF documents, not Word. WordPad creates much simpler documents, i.e. approaching human-readable.
I use WordPad every time I need to display formatted text in a WinForms application and need something that the RichTextBox control can handle when assigned to its Rtf property.

Text editor/viewer with ANSI codes rendering support for Windows [closed]

I need some tool to display text containing ANSI codes correctly on Windows. No full support needed, but at least coloring/bold is a must.
Reason: My logger/debug module produce nicely rendered rich output with important sections colored using ANSI codes. This helps a lot when debugging on the serial terminal, but if I dump the debug to a file or copy-paste it into a text editor on Windows (interactive remote debug is not always viable), at best all the ANSI codes are stripped, at worst they are rendered as junk characters obscuring the real data. Rudimentary editing capabilities would be appreciated to be able to pick out specific parts, annotate, and so on.
The open-source editor Atom has the package language-ansi-styles. It supports all kinds of formatting except ;r;g;b.
You might have some more luck with ASCII/ANSI utilities, like the ones listed here:
List of ASCII/ANSI/NFO utilities
Note: some files on this page might be outdated; you might find newer versions of these utilities on their respective homepages.
For example, the latest version of NFOPad can be found here.
I've been looking for a solution to display the ANSI colors as well (for program debug output readability) and stumbled upon Sublime Text (paid software with a trial, http://www.sublimetext.com/) with the ANSIescape package (https://github.com/aziz/SublimeANSI, or installed through Package Control).
It supports coloring and the bold escape is recognized but not displayed, although a special color can be assigned to it in the settings file. Also worth noting that this plugin shows text in read-only mode, and needs to be turned off if editing is necessary.
The GitHub page shows a screenshot, and I have personally tried it and verified that it works.
If you're primarily interested in viewing the file instead of editing it, Ansifilter will convert it to HTML, which you can then view and at least search in your browser, or RTF if wordpad would be good enough (hard to imagine). Huh, looks like there's a notepad++ plugin version on the download page, too, so that might be perfect if it allows you to load into notepad++.
http://www.andre-simon.de/doku/ansifilter/ansifilter.html
There's also a different plugin for vim which colors text according to ANSI codes.
http://www.vim.org/scripts/script.php?script_id=302
However, while it highlights the text in the correct color, it leaves the ANSI codes themselves in there (in a faded, near-background color) which probably will mess up any alignment formatting in the file, as well as making it harder to move around the file (lots of "empty space" to wade the cursor through, searching for a word won't match if there's an ansi code in the middle of it, etc.). There's a patch it can take advantage of to hide the codes too, but that would require patching and then recompiling vim itself from source.
Yeah, suggesting vim is pretty unhelpful if you aren't a vim user already, it has too huge of a learning curve, I know. But it might be useful to the vim users out there.
I know it won't be of much help - but I was looking for the exact same thing on linux; was just trying to view some log outputs that had bash ANSI color codes inside. Unfortunately, those ANSI color codes were spread across several lines - meaning 'cat'-ing the file and piping into 'less -R', 'most' and similar tools, would simply display the starting line where the color originated, but not the subsequent lines that should've been colored.
Funnily enough, I thought the usual Linux tools like 'nano', 'gedit', 'vim' and whatnot would be able to render ANSI color codes in a text file, but there is very little information out there on ANSI color in text files for these editors. I've only found info on ANSI color for the text editor 'joe':
Cheap ANSI Color! - http://tldp.org/LDP/LG/issue01to08/articles.html#ansi
but couldn't get the recommendations there to work (also couldn't get 'emacs' to work either, at least not by directly reading a text file with ANSI color characters inside).
The good thing - it seems what you need, if you need ANSI color in text, is to look for ASCII art / NFO utilities as recommended above - and the one that I finally found, and was working for me, was tetradraw (via www.linux.org/apps/AppId_42.html ; can be sudo apt-get installed in Ubuntu ... actually, tetradraw is the name of the drawing/editor part - however there is a separate viewer that also works with ANSI color codes, tetraview).
Well, who would have thought, that you need to track down an ASCII art utility, in order to read log files :)
Anyways, hope this may somehow help in the further search of ANSI color text editors for Windows, too.. Cheers!
If you just want to view then the terminal program "Tera Term" can do this. Just click "File" -> "Replay Log" and select your file containing the ANSI codes.
You can download Tera Term here:
http://logmett.com/index.php?/download/tera-term-477-freeware.html
In Emacs, just eval the following before opening your .nfo file:
(add-to-list 'auto-coding-alist '("\\.nfo\\'" . cp437-dos))
I spent a while testing multiple programs from the URL referred to by Andras Vass, with no results (they don't show colors, or they keep showing the ANSI codes as a mess of characters).
Tired of searching, I finally found ANSIFilter (not the Notepad++ plugin referred to by Jeffson), the only one that works for me.
I have added it to Windows context menu, so I can now easily open my ANSI text files.
I would be surprised if Emacs can't do that,
at least with the embedded shell.
There are:
http://www.emacswiki.org/emacs/AnsiTerm
http://www.emacswiki.org/emacs/MultiTerm
http://www.emacswiki.org/emacs/ansi-color.el
Update: as has been pointed out, these only colorize terminal output. But you may be able to edit the shell buffer contents in Emacs too, e.g. cat file && colorize.
But wait a minute, I had just found these:
http://vaperized.com/ansiexpress.htm
http://www.syaross.org/thedraw/
http://picoe.ca/products/pablodraw/
If the debug logging of your application goes via 1 class/function, you could try to split the output so that:
ANSI-like logging is shown on the terminal/console
HTML-like logging is written to file
For your application all logging goes to this class, and this class splits the output to terminal/console and file.
Make a 'standard' in your logging class for specifying colors and boldness (e.g. predefined codes like Ctrl-A means red, Ctrl-B means bold, ..., or specific methods in the logging class for setting the color and boldness, or maybe even the ANSI-codes), and translate this in your central logging class to:
the correct ANSI codes on terminal
the correct HTML codes in file
Alternatively, I think that instead of HTML you also could use rich-text, but I don't know all the possibilities of rich text so you may have to look this up.
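The central logging class described in this answer could be sketched in Python roughly as follows. The class name SplitLogger, the style names, and the choice of inline CSS are all assumptions made for illustration; only two styles are defined.

```python
import html
import io

# Assumed style vocabulary: each style maps to an ANSI code and to inline CSS.
ANSI = {"red": "\x1b[31m", "bold": "\x1b[1m"}
ANSI_RESET = "\x1b[0m"
CSS = {"red": "color:red", "bold": "font-weight:bold"}

class SplitLogger:
    """Send each message to a console stream with ANSI codes and to a file as HTML."""

    def __init__(self, console, html_file):
        self.console = console
        self.html_file = html_file

    def log(self, message, style=None):
        safe = html.escape(message)  # keep <, > and & literal in the HTML log
        if style is None:
            self.console.write(message + "\n")
            self.html_file.write(safe + "<br>\n")
        else:
            self.console.write(ANSI[style] + message + ANSI_RESET + "\n")
            self.html_file.write('<span style="%s">%s</span><br>\n' % (CSS[style], safe))

# In-memory streams stand in for the serial terminal and the log file.
console, log_file = io.StringIO(), io.StringIO()
logger = SplitLogger(console, log_file)
logger.log("boot ok")
logger.log("checksum mismatch", style="red")
print(log_file.getvalue(), end="")
```

The resulting HTML file can be opened in any browser on Windows, while the terminal output keeps its ANSI coloring.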
You could try notepad++ (see http://notepad-plus.sourceforge.net/uk/site.htm). It's pretty powerful (Scintilla based) and has an option to view non-printable characters (like line-breaks and the like).

Resources