Converted PDF from html is not read by screen reader - pdf-generation

I am working on a .NET application that converts HTML to PDF using Winnovative HTML to PDF, and the resulting PDF must be readable by screen readers (we are currently testing with JAWS). However, the PDF it produces is not read in an ADA-compliant way. For example, a heading 1 with the text 'this is heading 1' should be announced as "heading 1, this is heading 1", but it is read as plain text (just "this is heading 1"). The Tagged property of the PDF says No, so I assumed that was the reason.
However, I have also tried ABCpdf. Now the Tagged property says Yes, yet the PDF is still read as plain text. Can someone who has already done something similar (producing a PDF from HTML with a .NET library so that screen readers can read it) share what I am missing?
Thanks

I found the solution. I was using iText version 3, which doesn't support accessibility features. iText 5 does, and the PDF now produced from the HTML has the Tagged property set to 'Yes' and is readable by screen readers.

Related

Reading PDF paragraph text along with CSS (color etc.) using iText

I have a PDF with tables and cells, and I have added a paragraph with formatted text to a cell. I need to read the list of all tables, followed by the paragraph text together with some of its CSS styling (some text is highlighted in color). Please let me know how to start with this, or point me to any link I can go through.
I am using iText 7.
Thanks
Daya
There are lots of pages about CSS to go through; use them to get familiar with CSS. W3Schools is a good place to start :)
I converted the PDF to XLSX, which helped in identifying the font size, color, and name.

FOP 2.2 RTF Export | Table header text | Bottom to top alignment not working

I'm using FOP 2.2 and the XML mentioned below to generate the table.
It produces correct PDF documents, see below,
but in the RTF export the text is neither rotated to vertical nor inverted.
Any help would be appreciated!!
FOP RTF support is very limited. Per the FOP website:
"JFOR, an open source XSL-FO to RTF converter has been integrated into Apache FOP. This will create an RTF (rich text format) document that will attempt to contain as much information from the XSL-FO document as possible. It should be noted that is not possible (due to RTF's limitations) to map all XSL-FO features to RTF. For complex documents, the RTF output will never reach the feature level from PDF, for example. Thus, using RTF output is only recommended for simple documents such as letters."
Most likely this feature is simply not supported. Your test shows that the same input renders correctly as PDF but not as RTF, which points to a limitation of FOP's RTF output.

Debugging PDF for error

I'm creating PDF files using PDFClown java library.
Sometimes, when opening these files with Adobe Acrobat Reader, I get the famous error message:
"An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
The error appears when reading the attached file in Adobe only after scrolling down to the 8th page and then scrolling back up to the 3rd page. Alternatively, zooming out to 33.3% also produces the message.
Just for the record, Foxit Reader reads the file flawlessly, as do other PDF viewers such as browsers.
My questions are:
What's wrong with my file?? (file is attached)
How can I find what's wrong with it? Is there a tool that tells you where the error lies?
Thanks!
OK, this wasn't easy.
Due to a bug in PDFClown, the main stream of information in the PDF page was corrupted.
After its end it had a copy of a past instance of it.
This caused a partial text section without the starting command "BT", which left a single "ET" without a "BT" at the end of the stream.
Once I corrected this, it ran great.
Thank you all for your help.
I would have had a much more difficult time debugging it without the tool RUPS, which @Bruno suggested.
edit:
The bug was in the Buffer.java:clone() (line 217)
instead of line:
clone.append(data);
needs to be:
clone.append(data, 0, this.length);
Without this correction it clones the whole data buffer and sets the cloned Buffer's length to data[].length. This is very problematic if Buffer.length is smaller than data[].length.
The result in my case was garbage at the end of the stream.
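The effect of that one-line difference can be reproduced in isolation. The following is a minimal sketch, not PDF Clown's actual code: SketchBuffer, cloneBuggy and cloneFixed are hypothetical names, and the only assumption carried over from the fix above is a growable buffer whose backing array (data) can be larger than its logical length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a growable byte buffer; NOT PDF Clown's Buffer class. */
final class SketchBuffer {
    private byte[] data;  // backing array, usually larger than the content
    private int length;   // number of bytes actually in use

    SketchBuffer(int capacity) {
        data = new byte[capacity];
    }

    /** Appends a whole array, including any unused slots it may contain. */
    void append(byte[] bytes) {
        append(bytes, 0, bytes.length);
    }

    /** Appends count bytes of the array, starting at offset. */
    void append(byte[] bytes, int offset, int count) {
        if (length + count > data.length) {
            data = Arrays.copyOf(data, Math.max(data.length * 2, length + count));
        }
        System.arraycopy(bytes, offset, data, length, count);
        length += count;
    }

    int length() {
        return length;
    }

    /** Returns only the bytes in use. */
    byte[] toByteArray() {
        return Arrays.copyOf(data, length);
    }

    /** Mirrors the faulty line: the unused tail of data is copied as content. */
    SketchBuffer cloneBuggy() {
        SketchBuffer clone = new SketchBuffer(data.length);
        clone.append(data);
        return clone;
    }

    /** Mirrors the corrected line: only the first length bytes are copied. */
    SketchBuffer cloneFixed() {
        SketchBuffer clone = new SketchBuffer(data.length);
        clone.append(data, 0, this.length);
        return clone;
    }

    public static void main(String[] args) {
        SketchBuffer original = new SketchBuffer(16);
        original.append("BT ET".getBytes(StandardCharsets.US_ASCII));
        System.out.println("original length:    " + original.length());
        System.out.println("buggy clone length: " + original.cloneBuggy().length());
        System.out.println("fixed clone length: " + original.cloneFixed().length());
    }
}
```

With a 16-byte backing array holding 5 bytes of content, the buggy clone reports a length of 16 and carries 11 trailing zero bytes, while the fixed clone keeps the length at 5. If the backing array had been reused after a longer write, the tail would hold stale content instead of zeros.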
The error appears when reading the attached file in Adobe only after scrolling down to the 8th page and then scrolling back up to the 3rd page. Alternatively, zooming out to 33.3% also produces the message.
Well, I get it more easily: I merely open the PDF and scroll down using the cursor keys. As soon as the top 2 cm of page 3 appear, the message appears.
What's wrong with my file??
The content of pages 1 and 2 looks OK, so let's look at the content of page 3.
My initial attribution of the issue to the use of text-specific operations (especially Tf and Tw) outside of a text object was wrong, as Stefano Chizzolini pointed out: some text-related operations are indeed allowed outside text objects, namely the text state operations, cf. figure 9 of the PDF specification.
So while being less common, text state operations at page description level are completely ok.
After my incorrect attempt to explain the issue, the OP's own answer indicated that the
main stream of information in the PDF page has been corrupted. After its end it had a copy of a past instance of it. This caused a partial text section without the starting command "BT", which left a single "ET" without a "BT" at the end of the stream.
An ET without a prior BT would indeed be an error, and quite likely it would be accompanied by operations at the wrong level. Inspecting the stream content of that third page (the page this issue focuses on), however, I could not find any unmatched ET. In the course of that inspection I discovered instead that the content stream contains more than 2000 trailing 0 bytes! Adobe Reader seems unable to cope with these 0 bytes.
The bug the OP found can explain the issue:
in the Buffer.java:clone() (line 217)
instead of line:
clone.append(data);
needs to be:
clone.append(data, 0, this.length);
Without this correction it clones the whole data buffer and sets the cloned Buffer's length to data[].length. This is very problematic if Buffer.length is smaller than data[].length.
Trailing 0 bytes can be an effect of such a buffer copying bug.
Furthermore, the symptoms the OP found (after its end it had a copy of a past instance of it) can also be an effect of such a bug. So I assume the OP found those symptoms on a different page, not page 3, but fixing the bug cured all of them.
How can I find what's wrong with it? Is there a tool that tells you where the error lies?
There are PDF syntax checkers, e.g. the Preflight tool included in Adobe Acrobat, but even that fails on your file.
So essentially you have to extract the page content (using a PDF browser, e.g. RUPS) and check manually with the PDF specification on the other screen.
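Part of that manual check can be scripted once the decoded content stream has been saved from a PDF browser. The sketch below is only an illustration under stated assumptions: ContentStreamCheck and both method names are hypothetical, and the tokenizer is naive (it splits on whitespace and NUL bytes, so a BT or ET standing alone inside a string literal or inline image data would be miscounted). It looks for the two symptoms discussed in this thread: trailing 0 bytes and unpaired BT/ET operators.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for sanity-checking a decoded page content stream. */
final class ContentStreamCheck {

    /** Counts the 0x00 bytes at the very end of the stream. */
    static int trailingZeroBytes(byte[] stream) {
        int end = stream.length;
        while (end > 0 && stream[end - 1] == 0) {
            end--;
        }
        return stream.length - end;
    }

    /**
     * Returns true if every BT is closed by an ET and no ET appears
     * without a preceding BT. Naive: tokens are split on whitespace only.
     */
    static boolean textObjectsBalanced(byte[] stream) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String text = new String(stream, StandardCharsets.ISO_8859_1);
        boolean inTextObject = false;
        // trim() also drops leading and trailing NUL bytes (anything <= U+0020).
        for (String token : text.trim().split("[\\x00\\s]+")) {
            if (token.equals("BT")) {
                if (inTextObject) {
                    return false; // text objects must not be nested
                }
                inTextObject = true;
            } else if (token.equals("ET")) {
                if (!inTextObject) {
                    return false; // ET without a matching BT
                }
                inTextObject = false;
            }
        }
        return !inTextObject; // a BT left open is also an error
    }

    public static void main(String[] args) {
        byte[] good = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] stray = "BT (Hi) Tj ET (tail) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] padded = Arrays.copyOf(good, good.length + 2000); // zero-filled tail

        System.out.println("good balanced:  " + textObjectsBalanced(good));
        System.out.println("stray balanced: " + textObjectsBalanced(stray));
        System.out.println("padded zeros:   " + trailingZeroBytes(padded));
    }
}
```

A check like this only narrows the search. It does not replace reading the stream against the PDF specification, since many other errors are possible.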
The general post "How do you debug PDF files?" might also have been helpful, as RUPS, pdfstreamdump, etc. are mentioned there.

How do HTML entities in the alt or title attribute of an image affect tool tips and audio user agents?

BACKGROUND
I was given text from a copywriter that contained a lot of © characters. So in Sublime Text, I did a search and replace with:
&copy;
A lot of the text was already added to the images in titles and alt attributes. So the text in there was affected as well.
On my own projects I previously just stripped out all legal marks such as registered, trademark, and copyright symbols, because I figured only the actual text mattered. But this project is for a company with more compliance regulations, so I am taking things such as screen readers into account. My first thought is that a screen reader would speak the word "copyright". Is that correct? Does it even matter in this context?
So should I use the HTML entities?
So far, I have found the following on Google:
HTML title Attribute, summarized at W3Schools, but with no mention of HTML entities.
The global structure of an HTML document, in the W3C recommendation, which discusses the title tag, but with no mention of HTML entities.
Using the HTML title attribute – updated, which had good insights on the title attribute but, again, no mention of HTML entities.
Does anyone have experience with this, or other suggestions for things to search for? Or will I be fine with just plain text?
Generally speaking the screen readers should do an acceptable job with them. You can find an excellent summary of how JAWS handles this at http://accessibleculture.org/research-files/character-references/jaws-all.php. Not all of them are read correctly, but most are. I also tested that page with VoiceOver on OS X Yosemite and it seems to do a good job as well. So, I think you can use character entities as you normally would and expect decent (if not perfect) results with mainstream screen readers.
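For reference, this is roughly what writing those symbols as character entities inside an alt or title value looks like. It is a minimal sketch, not taken from the linked research: AttributeEntities and toAttributeValue are hypothetical names, and only three legal marks plus the characters that must be escaped in an attribute are handled. It expects raw text as input; running it on text that already contains entities would escape the ampersands a second time.

```java
/** Hypothetical helper that escapes raw text for use in an HTML attribute. */
final class AttributeEntities {

    /** Replaces markup characters and three legal marks with named entities. */
    static String toAttributeValue(String rawText) {
        StringBuilder out = new StringBuilder(rawText.length() + 16);
        for (int i = 0; i < rawText.length(); i++) {
            char c = rawText.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '\u00A9': out.append("&copy;"); break;  // copyright sign
                case '\u00AE': out.append("&reg;"); break;   // registered sign
                case '\u2122': out.append("&trade;"); break; // trade mark sign
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String alt = toAttributeValue("Acme\u00AE logo, \u00A9 Acme & Sons");
        // The file name logo.png is a placeholder for illustration only.
        System.out.println("<img src=\"logo.png\" alt=\"" + alt + "\">");
    }
}
```

How each entity is announced still depends on the screen reader and its settings, as the answer above notes.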

Flash / Actionscript: How to insert images in dynamic text (In line image)?

I want to insert images into dynamic text boxes, and they should be inline.
Detail:
I am building an application in Flash CS4. It is just like a chat room that shows a conversation; the only difference is that it shows stored messages (stored in an XML file). I want to insert smiley faces (emoticons) in the message text (using HTML tags), but the problem is that the images are not inline (as they are in chat rooms such as Yahoo, Hotmail, etc.).
I have no idea what to do...
Please paste your code where you set the dynamic-text-box.text
Make sure that you wrap the whole text in html braces, not only images.
You can embed images in any HTMLText field using the <img> tag.
The image, however, must be loaded externally. You can't get images stored in your library.
It's good to have a solution to my problem, but sad that I had to find it myself :-P
The simplest solution I found was to upgrade from Flash CS4 to Flash CS6. In Flash CS6, text has an extra feature, TLF (Text Layout Framework). Using TLF, I can insert graphics into a text area, and the inserted graphics are inline as well.
Problem solved :-)
