Indic language not rendered in iText under certain scenarios

iText version – 5.3.5
I'm generating a PDF that contains text in an Indic language (Tamil). The text is wrapped in a Phrase that uses a base font and is written to the document using ColumnText. A few glyphs are rendered as a symbol (a question mark inside a diamond).
com.lowagie.text.pdf.ColumnText.showTextAligned(canvas, PdfContentByte.ALIGN_LEFT, new Phrase(DATA, font), (float)X, (float)Y, rotationVal, RUN_DIRECTION, 0);
Refer screenshot below.
I have used two sentences. When I print only the second sentence, it is rendered correctly in the PDF. But when I print the two sentences together, the second one is not rendered properly.

There are some contradictions in your question. You say that you are using iText 5.3.5, but you mention com.lowagie.text.pdf which was used only in version 2.1.7 and earlier.
You also expect that versions predating iText 7 support Tamil. This is not the case. If you want Tamil support, you need at least iText 7 (available as AGPL software) in combination with PdfCalligraph: http://itextpdf.com/itext7/pdfCalligraph
Take a look at the following screen shot to see the difference between writing Tamil without the addon versus Tamil with the addon:
PdfCalligraph is a value addon to iText 7. It is not available as open source software.

Related

Different rendering between .Rmd and .Rmarkdown when using blogdown

I have found some strange differences in the way .Rmd files are rendered compared to .Rmarkdown.
My setup:
Beautiful hugo theme
Blogdown 0.9
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
If I change the file extension between Rmd and Rmarkdown, I observe the following differences:
Rmd doesn't render multi level lists properly
Rmd doesn't render footnotes [^1] properly
Rmarkdown doesn't render math properly
Python code chunks don't have a nice little execute button in the upper right in Rmd.
Is this expected behavior? Is there something with the theme that causes this?
Yes, the difference between .Rmd and .Rmarkdown is expected, and Yihui et al. outline it in their book (it's around the middle of the page):
You cannot use Markdown features only supported by Pandoc, such as fenced Divs.
Math expressions only work if you apply the JavaScript solution mentioned in Section B.3.
The main thing to note is that .Rmarkdown files get converted to .markdown documents first, which are then passed on to Hugo's Markdown renderer (e.g., Goldmark or Blackfriday) to generate HTML, while .Rmd uses Pandoc by default.

Converted PDF from html is not read by screen reader

I am working on a .NET application that converts HTML to PDF using Winnovative htmltopdf, and the produced PDF should be readable by screen readers (we are currently testing with the JAWS screen reader). However, the produced PDF is not read in an ADA-compliant way. For example, if there is a level 1 heading with the text 'this is heading 1', it should be read as 'heading 1, this is heading 1', but it is read like plain text (just 'this is heading 1'). The Tagged property of the PDF says No, so I thought that was the reason.
But I have also tried ABCPdf. Now the Tagged property of the PDF says Yes, and it is still read as plain text. Can someone who has already done something similar (producing a PDF from HTML with a .NET library so that it is readable by screen readers) point out what I am missing?
Thanks
I found the solution. I was using iText version 3, which doesn't support accessibility features. iText 5 does support them, and the PDF produced from HTML now has the Tagged property set to 'Yes' and is readable by screen readers.

What underlying graphics library does Sublime Text use for text output?

I noticed that Sublime Text has much better rendering for some fonts and sizes than Scintilla-based editors. How is that achieved? Is there some well-known text renderer underneath it, or did they develop their own?
Sublime Text uses highly tuned settings over platform-specific libraries to render text. It has actually used several different libraries throughout the years. I couldn't find any specifics for OSX, but there are some details for Windows/Linux in the forums and release notes.
In ST3 build 3034, it is noted that the graphics rendering engine was "ported to Skia from Cairo". However, it is not clear if there was custom text rendering being done, or if it is just for UI elements.
Sublime Text 1
I can't find a good reference for version 1, but through bits and pieces of forum conversations, it looks like it may have been a custom rendering engine written on top of OpenGL. Other forum posts point out that all future rendering would be pure software, as there were a lot of cross-platform issues caused by GPU rendering.
The best quote I could find was this:
Sublime Text 2 uses software rendering only. Ultimately, it caused too many compatibility issues in 1.x, and with going cross platform for version 2, that would only have increased. Nonetheless, I'll talk about Sublime Text 1.x for a bit.
The basic text rendering itself was fairly standard: textured quads are drawn to the screen, one per character. However, there are a few details worth noting:
* Characters are buffered, and sorted by color, to reduce state changes.
* Most OpenGL applications will have a single channel texture for the characters, and blend the desired color on top. Sublime Text uses RGB textures, with the text pre-composed with the correct foreground and background color. This takes more memory, but allows ClearType to be used, which is important for a text editor.
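The first point in the quote (buffering characters and sorting them by color) can be illustrated with a minimal sketch. This is not Sublime Text's code: the `Glyph` type, its fields, and the color names are hypothetical, and the "state change" is modeled simply as switching the active color before drawing a quad.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    """One buffered character quad (hypothetical structure)."""
    char: str
    x: int       # horizontal position of the quad, arbitrary units
    color: str   # foreground color the quad is drawn with


def count_color_switches(glyphs):
    """Count how often the active color must change when drawing in order."""
    switches = 0
    active = None
    for glyph in glyphs:
        if glyph.color != active:
            switches += 1
            active = glyph.color
    return switches


def batch_by_color(glyphs):
    """Reorder buffered glyphs so that equal colors are contiguous.

    Each glyph carries its own position, so draw order does not affect
    where it lands on screen.
    """
    return sorted(glyphs, key=lambda g: g.color)


# "a=b+c" with identifiers in white and operators in red.
line = [
    Glyph("a", 0, "white"),
    Glyph("=", 1, "red"),
    Glyph("b", 2, "white"),
    Glyph("+", 3, "red"),
    Glyph("c", 4, "white"),
]

print(count_color_switches(line))                  # drawn in text order
print(count_color_switches(batch_by_color(line)))  # drawn after sorting
```

For this five-glyph line, drawing in text order needs five color switches, while the sorted buffer needs only two (one per distinct color).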
Sublime Text 2 (release notes)
Windows uses GDI, with an option added in build 2081 to use DirectWrite instead.
In build 2170, Pango was added as the renderer on Linux to improve support for CJK text.
Sublime Text 3 (release notes)
DirectWrite replaced GDI as the default renderer on Windows in build 3127, unless you are using the fonts Consolas or Courier New. GDI was kept as a rendering option in the settings.

Display of Asian characters (with Unicode): Difference in character spacing when presented in a RichEdit control compared with using ExtTextOut

This picture illustrates my predicament:
All of the characters appear to be the same size, but the space between them is different when presented in a RichEdit control compared with when I use ExtTextOut.
I would like to present the characters the same as in the RichEdit control (ideally), in order to preserve wrap positions.
Can anyone tell me:
a) Which is the more correct representation?
b) Why the RichEdit control displays the text with no gaps between the Asian Characters?
c) Is there any way to make ExtTextOut reproduce the behaviour of the RichEdit control when drawing these characters?
d) Would this be any different if I was working on an Asian version of Windows?
Perhaps I'm being optimistic, but if anyone has any hints to offer, I'd be very interested to hear.
In case it helps:
Here's my text:
快的棕色狐狸跳在懶惰狗1 2 3 4 5 6 7 8 9 0
Apologies to Asian readers: this is merely for testing our Unicode implementation, and I don't even know what language the characters are taken from, let alone whether they mean anything.
In order to view the effect by pasting these characters into a RichEdit control (eg. Wordpad), you may find you have to swipe them and set the font to 'Arial'.
The rich text that I obtain is:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Arial;}}{\colortbl ;\red0\green0\blue0;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\fs22\u24555?\u30340?\u26837?\u33394?\u29392?\u29432?\u36339?\u22312?\u25078?\u24816?\u29399?1 2 3 4 5 6 7 8 9 0\par\pard\'a3 $$ \'80\'80\cf1\lang2057\fs16\par}
It doesn't appear to contain a value for character 'pitch' which was my first thought.
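For readers who want to check which characters the RTF above actually carries, here is a minimal Python sketch that decodes the `\uN?` escapes. It assumes the one-character fallback declared by `\uc1` in the header (the `?` after each number) and only handles characters in the Basic Multilingual Plane; the function name `decode_rtf_unicode` is made up for this example.

```python
import re

# \uN is a decimal UTF-16 code unit. RTF stores it as a signed 16-bit
# value, so writers may emit values above 32767 as negative numbers.
# The "?" that follows is the fallback character for readers without
# Unicode support (\uc1 in the header says the fallback is one char).
_UNICODE_ESCAPE = re.compile(r"\\u(-?\d+)\?")


def decode_rtf_unicode(fragment):
    """Replace \\uN? escapes with the characters they stand for (BMP only)."""
    def replace(match):
        code = int(match.group(1))
        if code < 0:
            code += 65536  # undo the signed 16-bit encoding
        return chr(code)
    return _UNICODE_ESCAPE.sub(replace, fragment)


# The escaped run copied from the RTF in the question.
fragment = (
    r"\u24555?\u30340?\u26837?\u33394?\u29392?\u29432?"
    r"\u36339?\u22312?\u25078?\u24816?\u29399?1 2 3 4 5 6 7 8 9 0"
)

print(decode_rtf_unicode(fragment))
```

The decoded string matches the sample text given earlier, which confirms that the RTF stores only the code points and the single font `Arial`; nothing in it describes character spacing.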
I don't know the answer, but there are several things to suspect:
There are several versions of the rich edit control. Perhaps you're using an older one that doesn't have all the latest typographic improvements.
There are many styles and flags that affect the behavior of a rich edit control, so you might want to explore which ones are set and what they do. For example, look at EM_GETEDITSTYLE.
Many Asian fonts come in two versions on Windows. One is optimized for horizontal layout, and the other for vertical layout. The latter usually has the same name, but with @ prepended to it. Perhaps you are using the wrong one in the rich edit control.
UPDATE: By messing around with Wordpad, I was able to reproduce the problem with the crowded text in the rich edit control.
Open a new document in Wordpad on Windows 7. Note that the selected font is Calibri.
Paste the sample text into the document.
Text appears correct, but Wordpad changed the font to SimSun.
Select the text and change the font back to Calibri or Arial.
The text will now be overcrowded, very similar to your example. Thus it appears the fundamental problem is with font linking and fallback. ExtTextOut is probably selecting an appropriate font for the script automatically. Your challenge is to figure out how to identify the right font for the script and set that font in the rich edit control.
This will only help with part of your problem, but there is a way to draw text to a DC that will look exactly the same as it does with RichEdit: what's called the windowless RichEdit control. It's not exactly easy to use: I wrote a CodeProject article on it a few years back. I used this to solve the problem of a scrollable display of blocks of text, each one of which can be edited by clicking on it: the normal drawing is done with the windowless RichEdit, and the editing by showing a "real" RichEdit control on top of it.
That would at least get you the text looking the same in both cases, though unfortunately both cases would show too little character spacing.
One further thought: if you could rely on Microsoft Office being installed, you could also try the later versions of RichEdit that come with Office. There's more about these on Murray Sargent's blog, as well as some interesting articles on font binding that might also help.
ExtTextOut allows you to specify the logical spacing between characters. It has the parameter lpDx, which is a const pointer to an array of values that indicate the distance between the origins of adjacent character cells. The Microsoft API documentation notes that if you don't set it, the function uses its own default spacing. I would have to say that's why ExtTextOut is working fine.
In particular, when you construct an EMR_EXTTEXTOUTW record in an EMF, it populates an EMR_TEXT structure with this DX array. Judging by one of your comments, that allowed the RichEdit control to insert the EMF using the information contained in the record, whereas if you don't set a font binding, the RTF reader does some matching to work out which font to use.
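To make the role of the lpDx array concrete, here is a small arithmetic sketch in Python. It does not call the Win32 API. The helper names and the glyph widths are hypothetical, and it only models the documented meaning of lpDx: entry i is the distance from the origin of character cell i to the origin of cell i + 1.

```python
from itertools import accumulate


def cell_origins(start_x, dx):
    """x origin of each character cell, given lpDx-style distances.

    For n characters there are n distances; the last one only positions
    whatever would be drawn after the string, so it is not needed here.
    """
    return list(accumulate(dx[:-1], initial=start_x))


def spacing_array(widths, extra=0):
    """Build an lpDx-style array from glyph widths plus extra spacing."""
    return [width + extra for width in widths]


# Hypothetical advance widths (logical units) for three full-width glyphs.
widths = [15, 15, 15]

tight = spacing_array(widths)            # glyphs touch, no gaps
loose = spacing_array(widths, extra=3)   # 3 extra units between origins

print(cell_origins(10, tight))
print(cell_origins(10, loose))
```

With these made-up widths, the tight layout places the cells at 10, 25, 40 and the loose one at 10, 28, 46. Passing an explicit array of this kind is the mechanism by which a caller could force ExtTextOut to use a particular spacing instead of the default.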
In terms of the RichEdit control, the following article might be useful:
Use Font Binding in a Rich Edit Control
After character sets are assigned, Rich Edit scans the text around the
insertion point forward and backward to find the nearest fonts that
have been used for the character sets. If no font is found for a
character set, Rich Edit uses the font chosen by the client for that
character set. If the client hasn't specified a font for the character
set, Rich Edit uses the default font for that character set. If the
client wants some other font, the client can always change it, but
this approach will work most of the time. The current default font
choices are based on the following table. Note that the default fonts
are set per-process, and there are separate lists for UI usage and for
non-UI usage.
If you haven't set the character set, then it further explains that it falls back to ANSI_CHARSET. However, it's most definitely a lot more complicated than that, as that blog article by Murray Sargent (a programmer at Microsoft) shows.

How can I change the background color of specific characters in a RTF document?

I'm trying to output RTF (Rich Text Format) from a Ruby program - and I'd prefer to just emit RTF directly without using the RTF gem as I'm doing pretty simple stuff.
I would like to highlight specific characters in a DNA sequence alignment and from the docs it seems that I can either use \highlightN ... \highlight0 or \cbN ... \cb1
The problem is that I cannot get \cb to work in either Word:Mac 2008 or Mac TextEdit (\cf works fine so I know it's not a color table issue)
\highlight does work but seemingly only with two of the possible colors (black and red) and \highlight does not use the custom color table.
By creating simple docs in Word with character shading and saving as RTF I can see blocks of ridiculously verbose RTF code that presumably does what I want, but it is so impenetrable that I'm not seeing the wood for the trees.
Part of the problem may well be that Mac Word is just not implementing RTF properly. I don't have a Windows version of Word handy.
Anyone know the right way to shade blocks of text?
Thanks
--Rob
There is a note in the RTF Pocket Guide that says MS Word does not implement the \cb command. It says MS Word uses \chshdng0\chcbpatN (where "N" is the color number that you would use with \cb). The book recommends using something like the following for compatibility with programs that implement \cbN and/or \chshdng0\chcbpatN: {\chshdng0\chcbpat5\cb5 text}.
Note: The copy of the book I have was published in 2003, so it might be a bit out-of-date.
The sequence of RTF commands that seems to be most universally supported by RTF-capable applications is:
\chshdng10000\chcbpatN\chcfpatN\cbN
These commands:
set the shading to 100 percent
set the pattern foreground and background colors to the color from the color table (we're not actually specifying a shading pattern)
set the character background to the color from the color table
Word was the most difficult application to properly render background colors in:
Despite what the latest (1.9.1) RTF spec says, Word 2013 does not resolve \highlightN colors from the \colortbl. Instead, \highlightN maps to a predefined list of colors. It looks like those colors come from the 1.5 version of the RTF spec.
Regarding \cb, the 1.9.1 spec contains this helpful pointer at the end of the section on Color Table:
Note: Windows versions of Word have never supported \cbN, but it can be emulated by the control word sequence \chshdng0\chcbpatN.
This is almost a useful suggestion, except that if you read the documentation for \chshdngN:
Character shading. The N argument is a value representing the shading of the text in hundredths of a percent.
So, 0 turns out to not be a very useful value; 100 / 0.01 gives us the 10000 we used in the sequence above.
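As a minimal sketch of how the sequence above could be emitted directly (the question uses Ruby, but the string building ports over one-to-one), the following Python builds a small RTF document that shades selected runs. The function names, the font choice, and the sample sequence are assumptions made for this example; only the control words come from the answer.

```python
def shading_units(percent):
    """Convert a percentage to \\chshdng units (hundredths of a percent)."""
    return round(percent * 100)


def escape_rtf(text):
    """Escape the three characters that are special in RTF text."""
    return (text.replace("\\", "\\\\")
                .replace("{", "\\{")
                .replace("}", "\\}"))


def shade(text, color_index):
    """Wrap text in a group that shades it with a color table entry."""
    n = color_index
    return "{\\chshdng%d\\chcbpat%d\\chcfpat%d\\cb%d %s}" % (
        shading_units(100), n, n, n, escape_rtf(text))


def rtf_document(runs, colors):
    """Build an RTF document.

    runs:   list of (text, color_index or None); None means unshaded.
    colors: list of (r, g, b); entry i becomes color table index i + 1,
            because index 0 is the empty "auto" entry.
    """
    table = "".join("\\red%d\\green%d\\blue%d;" % rgb for rgb in colors)
    body = "".join(
        escape_rtf(text) if index is None else shade(text, index)
        for text, index in runs
    )
    return ("{\\rtf1\\ansi\\deff0"
            "{\\fonttbl{\\f0\\fmodern Courier New;}}"
            "{\\colortbl ;" + table + "}"
            "\\f0 " + body + "\\par}")


# Highlight one base in a short, made-up alignment row with yellow.
doc = rtf_document(
    runs=[("ACG", None), ("T", 1), ("GA", None)],
    colors=[(255, 255, 0)],
)
print(doc)
```

Writing `doc` to a `.rtf` file and opening it in the target applications is the only real test of compatibility; the sketch itself only guarantees that the control words are assembled in the order given above.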
Use WordPad to create RTF documents, not Word. WordPad creates much simpler documents, i.e. approaching human-readable.
I use WordPad every time I need to display formatted text in a WinForms application, and need something that the RichTextBox control can handle being assigned to its Rtf parameter.
