Does FOP 2.1 support ViewerPreferences? - pdf-generation

I'm using FOP 2.1 and am trying to set ViewerPreferences, e.g. DisplayDocTitle -> true.
I'm trying this (taken from another question):
<fo:declarations>
<pdf:dictionary type="Catalog" xmlns:pdf="http://xmlgraphics.apache/org/fop/extensions/pdf">
<pdf:dictionary type="normal" key="ViewerPreferences">
<pdf:entry key="DisplayDocTitle" type="boolean">true</pdf:entry>
</pdf:dictionary>
</pdf:dictionary>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
...
but I am getting:
Jul 13, 2016 11:18:31 AM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Unknown formatting object "{http://xmlgraphics.apache/org/fop/extensions/pdf}dictionary" encountered (a child of fo:declarations}. (See position 242:105)
Jul 13, 2016 11:18:31 AM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Unknown formatting object "{http://xmlgraphics.apache/org/fop/extensions/pdf}dictionary" encountered (a child of dictionary}. (See position 243:69)
and there are no ViewerPreferences inside the PDF.
When I put the dictionaries below the <x:xmpmeta xmlns:x="adobe:ns:meta/"> element, I still get no ViewerPreferences; the only difference is that pdfbox preflight then complains:
The file test.pdf is not valid, error(s) :
7.3 : Error on MetaData, Cannot find a definition for the namespace http://xmlgraphics.apache/org/fop/extensions/pdf
What am I doing wrong? Am I too early to try it? Where would I have to patch FOP?

According to the release notes FOP 2.0 introduced, among other things, a
Low level mechanism to augment PDF /Catalog and /Page dictionaries
but there are not many examples of its usage on the website.
Looking at the testcases included in the source distribution, in particular the ones named pdf-dictionary-extension_*.xml, I was able to put together something similar to your code which does not generate run-time exceptions; admittedly, I'm not familiar enough with this PDF feature to say whether the output actually achieves what you are trying to do:
<fo:declarations>
<pdf:catalog xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
<pdf:dictionary type="normal" key="ViewerPreferences">
<pdf:boolean key="DisplayDocTitle">true</pdf:boolean>
</pdf:dictionary>
</pdf:catalog>
</fo:declarations>
Two differences from your code:
there is no <pdf:dictionary type="Catalog">; use pdf:catalog instead
there is no generic <pdf:entry key="..." type="..."> element; instead there is a specific element for each possible entry type: pdf:array, pdf:boolean, pdf:name, pdf:number, pdf:string, ...
(disclosure: I'm a FOP developer, though not very active nowadays)
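Comparing the two snippets also shows a difference in the namespace URI: the question uses http://xmlgraphics.apache/org/fop/extensions/pdf (a slash after "apache"), while the working snippet uses http://xmlgraphics.apache.org/fop/extensions/pdf. As a rough aid, the Python sketch below lists elements in an FO fragment whose namespace looks like a near miss of the FOP PDF extension namespace. The function name and the suffix-matching heuristic are assumptions made for illustration; none of this is part of FOP itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the working snippet above.
FOP_PDF_NS = "http://xmlgraphics.apache.org/fop/extensions/pdf"


def find_suspect_namespaces(fo_xml: str) -> list[tuple[str, str]]:
    """Return (local name, namespace) pairs for elements whose namespace
    ends like the FOP PDF extension namespace but is not identical to it.

    Hypothetical helper: the suffix check is only a heuristic.
    """
    suspects = []
    for elem in ET.fromstring(fo_xml).iter():
        if not isinstance(elem.tag, str) or not elem.tag.startswith("{"):
            continue  # skip anything without a namespace
        namespace, local = elem.tag[1:].split("}", 1)
        if namespace != FOP_PDF_NS and namespace.endswith("/fop/extensions/pdf"):
            suspects.append((local, namespace))
    return suspects


# The declarations from the question, reduced to the relevant elements.
sample = """<fo:declarations xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <pdf:dictionary type="Catalog" xmlns:pdf="http://xmlgraphics.apache/org/fop/extensions/pdf">
    <pdf:dictionary type="normal" key="ViewerPreferences"/>
  </pdf:dictionary>
</fo:declarations>"""

for local, namespace in find_suspect_namespaces(sample):
    print(f"{local}: {namespace}")
```

Running this against the question's fragment reports both pdf:dictionary elements, which matches the two "Unknown formatting object" warnings in the log.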

As a supplement to @lfurini's excellent finding, here are some more things that can easily be done the same way. This was tested with FOP 2.1 but may also work from 2.0 onward. Remove the comment markers from the relevant sections to try them:
<fo:declarations>
<pdf:catalog xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
<!-- this opens in full-screen mode, e.g. as presentation -->
<!-- pdf:name key="PageMode">FullScreen</pdf:name -->
<!-- this opens the second page so it is fully visible -->
<!-- (count seems to start at 0) -->
<!-- pdf:array key="OpenAction">
<pdf:number>1</pdf:number>
<pdf:name>Fit</pdf:name>
</pdf:array -->
<!-- this will replace the window title from filename to below dc:title -->
<pdf:dictionary type="normal" key="ViewerPreferences">
<pdf:boolean key="DisplayDocTitle">true</pdf:boolean>
</pdf:dictionary>
</pdf:catalog>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<!-- Dublin Core properties go here -->
<dc:title>Sample Document title</dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
</fo:declarations>
Details of possible values can be looked up in the PDF specification (from page 139 in this v1.7 version, TABLE 3.25 Entries in the catalog dictionary). Take care not to use values that would normally be set by FOP anyway; restrict yourself to entries that are relevant to the viewer/reader.

Related

Issue with XSL-FO fo:footnote when generating a PDF/UA-1 document with FOP: "tagged PDF Note id is missing"

I have an issue with <fo:footnote> when generating a PDF/UA-1 document with FOP.
The resulting PDF displays the footnote correctly on the page but does not pass PDF/UA validation. A severe error, "id is missing", is raised on the PDF Note tag, so the document does not conform. I'm using PAC3 for the conformance test.
In the example below I have extracted the basic <fo:footnote> element, which has a unique id.
How can I generate the missing ID attribute in the tagged PDF Note element?
Here is the really simple XSL-FO footnote. Note that I used an id to reference the footnote.
some text...
<fo:footnote id="FNE0001">
<fo:inline font-size="6pt" baseline-shift="super">E0001</fo:inline>
<fo:footnote-body>
<fo:block>
<fo:inline>E0001</fo:inline><fo:inline > JO L 139 du 29.5.2002, p. 9.</fo:inline>
</fo:block>
</fo:footnote-body>
</fo:footnote> some text...
Apache FOP has been set up to generate PDF/UA through the conf file as follows:
<renderers>
<renderer mime="application/pdf">
<!-- Before setting the pdf-ua-mode, we must insert metadata Title in FO declaration -->
<pdf-ua-mode>PDF/UA-1</pdf-ua-mode>
....
PAC3 is checking against the failure conditions in the Matterhorn Protocol; latest edition at https://www.pdfa.org/resource/the-matterhorn-protocol/.
PAC3 is probably reporting on failure condition 19-003, ID entry of the tag is not present.
fo:footnote-body can have an id property. You could try adding an ID to that. In the fo:inline for the footnote marker, you might also need to add an fo:basic-link that refers to that ID.
FWIW, your footnote does not cause that error when checking PDF/UA generated by AH Formatter, and I'm only guessing at what FOP would be doing differently. (PAC3 and I generally disagree when it complains about a footnote being a possibly incorrect use of a Note tag, but that's another story: PAC3 tries to automate checking the conditions that should be checked by a human, and it doesn't always get it right.)

iText7 html to pdf conversion with target-counter having dynamic target value not working

I have been looking into why the page numbers in my generated TOC are not working on the latest (3.0.3) version of pdfHTML, even though the release notes state that this is now supported.
The devil is in the detail, as noted in the example (https://kb.itextpdf.com/home/it7kb/examples/pdfhtml-support-for-generating-cross-references-for-toc-creation-with-target-counter-target-counters-css-properties) the following works very well:
.preface::before {
content: target-counter(url('#id1'), page) ' ';
}
But as soon as you change this to something more dynamic, so that you do not have to specify an entry for each topic you would like to have in the TOC, it fails:
.preface::before {
content: target-counter(attr(href), page) ' ';
}
As also indicated in the logging by the following line:
WARN com.itextpdf.html2pdf.attach.impl.layout.PageTargetCountRenderer - Cannot resolve target-counter value with given target "attr(href)"
It looks like "attr(href)" is taken as a literal and not resolved in the context in which it is used, i.e. in this case extracting the tag's href value and using that to find the right page number.
Could this be fixed somehow - or can this be covered by some custom coding?
Thanks.
The latest version of the renderer (3.0.5 at the time of writing) supports this properly!

InvoiceAddRq fails for unknown reason

I hope I stay in line here, I don't want to waste your time.
I developed a simple app that extracts customer invoice data from a database and writes XML requests to load the invoice data into QuickBooks. It works well. I've run into and fixed a dozen or so quirks and glitches along the way; it has been in production for about a year.
While loading today's invoices into QuickBooks, one failed.
There were no objectionable characters or oddities in the data of the failed invoice; I compared it to other invoices and found very similar data that was successful; I verified the ListID values for each entity; I am stumped.
If anybody can help with this, I would appreciate it greatly.
Here's the xml that failed today (the only change is that I blocked a couple of names):
FAILED InvoiceAddRq:
<?xml version="1.0"?>
<?qbxml version="8.0"?>
<QBXML>
<QBXMLMsgsRq onError="stopOnError">
<InvoiceAddRq requestID="16012816370245509">
<InvoiceAdd>
<CustomerRef><ListID>3E00001-1139583887</ListID></CustomerRef>
<ARAccountRef><ListID>380000-1137509930</ListID></ARAccountRef>
<TemplateRef><ListID>B0000-1142608867</ListID></TemplateRef>
<TxnDate>2016-01-28</TxnDate>
<RefNumber>60125008</RefNumber>
<PONumber>AI600292569</PONumber>
<TermsRef><ListID>20000-1137508984</ListID></TermsRef>
<ShipDate>2016-01-25</ShipDate>
<Other>XYXYX PLASTICS</Other>
<InvoiceLineAdd>
<ItemRef><ListID>20000-1139578831</ListID></ItemRef>
<Desc>RECOVERING 1/25DEL. 1/26 # 11:57 AM EST – POD XYXYX</Desc>
<Quantity>1</Quantity>
<Rate>480.79</Rate>
<Other1>6</Other1>
<Other2>3466</Other2>
</InvoiceLineAdd>
</InvoiceAdd>
<IncludeRetElement>TxnID</IncludeRetElement><IncludeRetElement>TimeCreated</IncludeRetElement><IncludeRetElement>RefNumber</IncludeRetElement>
</InvoiceAddRq>
</QBXMLMsgsRq>
</QBXML>
I don't see anything wrong here. I compared this with the xml for many successful InvoiceAdd requests that occurred before and after this one. Then I tried it again for the insanity check (same result).
Below I'll post a similar invoice as a successful example.
SUCCESSFUL InvoiceAddRq:
<?xml version="1.0"?>
<?qbxml version="8.0"?>
<QBXML>
<QBXMLMsgsRq onError="stopOnError">
<InvoiceAddRq requestID="160128164851801">
<InvoiceAdd>
<CustomerRef><ListID>80000915-1294766937</ListID></CustomerRef>
<ARAccountRef><ListID>380000-1137509930</ListID></ARAccountRef>
<TemplateRef><ListID>B0000-1142608867</ListID></TemplateRef>
<TxnDate>2016-01-28</TxnDate>
<RefNumber>60125011</RefNumber>
<PONumber>2200166200</PONumber>
<TermsRef><ListID>20000-1137508984</ListID></TermsRef>
<ShipDate>2016-01-26</ShipDate>
<Other>X&amp;Y MEHOOPANY </Other>
<InvoiceLineAdd>
<ItemRef><ListID>20000-1139578831</ListID></ItemRef>
<Desc>UNABLE TO DELIVER DUE TO THE WEATHER 1-22DEL. 1-25, POD. AXYXYX BXYXYXY #14:30</Desc>
<Quantity>1</Quantity>
<Rate>148.60</Rate>
<Other1>1</Other1>
<Other2>3</Other2>
</InvoiceLineAdd>
</InvoiceAdd>
<IncludeRetElement>TxnID</IncludeRetElement><IncludeRetElement>TimeCreated</IncludeRetElement><IncludeRetElement>RefNumber</IncludeRetElement>
</InvoiceAddRq>
</QBXMLMsgsRq>
</QBXML>
Thanks in advance.
edit: I forgot to mention that the error was very generic
"QuickBooks found an error when parsing the provided XML text stream."
Source:QBXMLRP2.RequestProcessor.2
OK, I found the cause. The logged message cited an invalid byte "(–)". The dash character was bouncing my invoice. I had previously looked into invoices using a dash and found that many had loaded successfully using the same method. What I had failed to notice was that the dash characters were different in all of the successful invoices I found except one.
The failed XML includes – (&#8211; or, if you prefer, &ndash;), the "en dash", instead of the normal ASCII dash "-" (&#45;).
The SDK validator doesn't flag it as a problem. Also, one invoice that included an en dash had previously loaded successfully; I can't explain that.
I use character substitution tables to deal with the requirements (or idiosyncrasies) of XML, qbXML, and the database (Oracle). Instead of changing my encoding, I'll substitute - for all occurrences of – (and —, the "em dash", for that matter). This worked for the previously failing invoice.
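The substitution described above could look roughly like this in Python. This is only a sketch: the original app's language and table layout are not shown, and the names DASH_TABLE, normalize_dashes and non_ascii_chars are hypothetical. The second helper is an extra that lists any remaining non-ASCII characters before a request is sent, which would have pointed at the en dash directly.

```python
# Map typographic dashes to the ASCII hyphen-minus.
DASH_TABLE = str.maketrans({
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def normalize_dashes(text: str) -> str:
    """Replace en and em dashes with a plain ASCII '-'."""
    return text.translate(DASH_TABLE)


def non_ascii_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# The <Desc> value from the failed request, with the en dash written as an escape.
desc = "RECOVERING 1/25DEL. 1/26 # 11:57 AM EST \u2013 POD XYXYX"

print(non_ascii_chars(desc))                    # shows where the en dash sits
print(normalize_dashes(desc))                   # same text with an ASCII dash
print(non_ascii_chars(normalize_dashes(desc)))  # empty once the dash is replaced
```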
Thanks again to WilliamLorfing for the help.

How to construct valid rdf/xml?

I have a valid RDF/XML file and I have to wrap it in another tag, so that there is only one element at the first level.
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rtc_cm="http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/"
xmlns:oslc="http://open-services.net/ns/core#" >
<rdf:Description rdf:about="https://10.0.2.79:9443/ccm/oslc/types/_tsVvMWWwEeWQIIEAtKgWEg/com.ibm.team.apt.workItemType.story">
<rdf:type rdf:resource="http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/Type"/>
<rtc_cm:projectArea rdf:resource="https://10.0.2.79:9443/ccm/oslc/projectareas/_tsVvMWWwEeWQIIEAtKgWEg"/>
<rtc_cm:category rdf:datatype="http://www.w3.org/2001/XMLSchema#string">com.ibm.team.workitem.workItemType.story</rtc_cm:category>
<rtc_cm:iconUrl rdf:resource="https://10.0.2.79:9443/ccm/service/com.ibm.team.workitem.common.internal.model.IImageContentService/processattachment/_tsVvMWWwEeWQIIEAtKgWEg/workitemtype/story.gif"/>
<dcterms:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Story</dcterms:title>
<dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">com.ibm.team.apt.workItemType.story</dcterms:identifier>
</rdf:Description>
...
<rdf:Description ...2>
</rdf:Description ...2>
</rdf:RDF>
As you can see, there can be more than one Description element.
I want to put all of them in one tag. How to do that? If I try:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rtc_cm="http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/"
xmlns:oslc="http://open-services.net/ns/core#" >
<rdf:MyTag rdf:about="https://10.0.2.79:9443/ccm/oslc/types/_tsVvMWWwEeWQIIEAtKgWEg/com.ibm.team.apt.workItemType.story">
<rdf:Description rdf:about="https://10.0.2.79:9443/ccm/oslc/types/_tsVvMWWwEeWQIIEAtKgWEg/com.ibm.team.apt.workItemType.story">
<rdf:type rdf:resource="http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/Type"/>
<rtc_cm:projectArea rdf:resource="https://10.0.2.79:9443/ccm/oslc/projectareas/_tsVvMWWwEeWQIIEAtKgWEg"/>
<rtc_cm:category rdf:datatype="http://www.w3.org/2001/XMLSchema#string">com.ibm.team.workitem.workItemType.story</rtc_cm:category>
<rtc_cm:iconUrl rdf:resource="https://10.0.2.79:9443/ccm/service/com.ibm.team.workitem.common.internal.model.IImageContentService/processattachment/_tsVvMWWwEeWQIIEAtKgWEg/workitemtype/story.gif"/>
<dcterms:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Story</dcterms:title>
<dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">com.ibm.team.apt.workItemType.story</dcterms:identifier>
</rdf:Description>
</rdf:MyTag>
</rdf:RDF>
I added MyTag, but errors appear:
Error: {E205} rdf:Description is not allowed as an element tag here.[Line = 9, Column = 128]
Error: {E201} rdf:about not allowed as attribute here.[Line = 9, Column = 128]
Error: {E201} rdf:resource not allowed as attribute here.[Line = 10, Column = 79]
Error: {E201} Multiple children of property element[Line = 11, Column = 110]
Error: {E201} rdf:resource not allowed as attribute here.[Line = 11, Column = 110]
......
Warning: {W113} rdf:MyTag is not a recognized RDF property or type.[Line = 8, Column = 122]
I use for validation:
http://www.w3.org/RDF/Validator/
I think I am doing something stupid. Probably http://www.w3.org/RDF/Validator/ defines tags other than MyTag, but if I open that link I cannot see which tags are valid. How do I fix the error?
"I want to put all of them in one tag"
Why?
RDF/XML is a serialisation of an RDF graph in XML; you are constrained by the rules of RDF/XML and you really should not care what the XML looks like.
If you are working with a system that does care, then that is a bug or poor design in that system, and the system should change, not the RDF. This of course assumes that your RDF/XML expresses the appropriate RDF graph that the system wishes to consume.
Any manipulation of RDF/XML (or any other RDF serialisation) should be done using an appropriate API or Toolkit that will understand and enforce the rules for you.
For anything beyond trivial examples you should really never be editing RDF by hand.
As @RobV indicated, technically it's possible to do this, but it is fundamentally problematic that the tool you're using requires it, and this likely indicates deeper problems. So I doubt that fixing this will help you all that much: any fix that results in syntactically legal RDF/XML is likely to still be impossible for this tool to process.
However, here goes: in RDF/XML, each Description element captures the description (in terms of its properties) for a particular resource. The resource described is the one identified in the rdf:about attribute.
So:
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
</rdf:Description>
is a description of a single resource http://example.org/person1, and its property rdfs:label, which has the value "John". In RDF-terms: we have defined a single triple (or statement):
<http://example.org/person1> rdfs:label "John" .
Your RDF file contains more than one description. Although you haven't shown the rdf:about attributes of any of the other Description elements, I assume they are different, which means they are descriptions of different things:
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/person2">
<rdfs:label>Paul</rdfs:label>
</rdf:Description>
which corresponds to the following RDF statements:
<http://example.org/person1> rdfs:label "John" .
<http://example.org/person2> rdfs:label "Paul" .
We could of course put everything in a single Description element, but then what should the value of its rdf:about attribute become? If we picked one at random we'd get something like this:
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
<rdfs:label>Paul</rdfs:label>
</rdf:Description>
While doing this leads to syntactically correct RDF/XML, it corresponds to the following RDF model:
<http://example.org/person1> rdfs:label "John" .
<http://example.org/person1> rdfs:label "Paul" .
This is clearly undesirable, since it is factually wrong (there is not one person called both "John" and "Paul"; there are two separate persons, one named "John", the other named "Paul"). In other words, it fixes the syntax problem, but messes up the actual meaning of your data.
The first element underneath the rdf:RDF element in RDF/XML is expected to be either a rdf:Description element, or a different element which indicates a class name (this is a shorthand for writing down a description of a resource and assigning it a class by adding a property). Let's forget about that shorthand notation for a bit and just focus on rdf:Description. Clearly, we can't just have something like this:
<rdf:Description>
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/person2">
<rdfs:label>Paul</rdfs:label>
</rdf:Description>
</rdf:Description>
This is syntactically invalid because an RDF processor expects the elements nested inside a description to be the properties of the resource, but here we just have other descriptions, with no indication as to what the actual relation between those descriptions and the 'outer' description is. This can be fixed by further nesting each 'inner' description in a property, like so:
<rdf:Description>
<ex:contains>
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
</rdf:Description>
</ex:contains>
<ex:contains>
<rdf:Description rdf:about="http://example.org/person2">
<rdfs:label>Paul</rdfs:label>
</rdf:Description>
</ex:contains>
</rdf:Description>
This is syntactically legal (assuming you also add a namespace definition to the rdf:RDF element for the ex namespace), and preserves the meaning of your original, separate descriptions. If you want to limit the number of times you have to repeat the ex:contains element, you can also use an RDF Collection, like so:
<rdf:Description>
<ex:contains rdf:parseType="Collection">
<rdf:Description rdf:about="http://example.org/person1">
<rdfs:label>John</rdfs:label>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/person2">
<rdfs:label>Paul</rdfs:label>
</rdf:Description>
</ex:contains>
</rdf:Description>
As said though: while this fixes the specific problem you asked about, neither of these solutions is likely to help you much further. If the tool you're using to process this data can't deal with more than one rdf:Description, it's unlikely to be able to process either of the above files meaningfully. Also, as soon as you use any sort of proper RDF tooling to process or edit the file, it is likely to change this surface syntax again to something with more than one description. The point is that relying on the precise formatting of the RDF/XML syntax is a Very Bad Idea(tm).
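For illustration only, the ex:contains nesting described above can be produced mechanically. The Python sketch below treats the file as plain XML using the standard library, which is exactly the kind of surface-level manipulation the caveats above warn about; a proper RDF toolkit remains the better choice for real data. The ex namespace URI and the function name are assumptions, not something defined in the question.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
EX_NS = "http://example.org/ns#"  # hypothetical namespace for ex:contains

# Keep readable prefixes in the serialised output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rdfs", RDFS_NS)
ET.register_namespace("ex", EX_NS)


def wrap_descriptions(rdf_xml: str) -> str:
    """Move every top-level rdf:Description under a single outer
    rdf:Description, each one wrapped in its own ex:contains property."""
    root = ET.fromstring(rdf_xml)
    outer = ET.Element(f"{{{RDF_NS}}}Description")
    for child in list(root):
        if child.tag == f"{{{RDF_NS}}}Description":
            root.remove(child)
            prop = ET.SubElement(outer, f"{{{EX_NS}}}contains")
            prop.append(child)
    root.append(outer)
    return ET.tostring(root, encoding="unicode")


sample = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/person1">
    <rdfs:label>John</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person2">
    <rdfs:label>Paul</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

print(wrap_descriptions(sample))
```

The outer rdf:Description has no rdf:about, so it denotes an anonymous (blank) node, just as in the hand-written example.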

Is altChunk valid within PresentationML?

Is it valid to use an altChunk element within any of the slide XML files within a pptx OOXML package?
I have read through the ECMA-376 spec, and while altChunk is defined within the WordprocessingML section of the spec (e.g.: "Any document part that permits a p element can also contain an altChunk element, whose id attribute refers to a relationship"), it is not mentioned anywhere else.
PresentationML apparently doesn't have a p element, and the other valid parent elements for AltChunk (body (§17.2.2); comment (§17.13.4.2); docPartBody (§17.12.6); endnote (§17.11.2); footnote (§17.11.10); ftr (§17.10.3); hdr (§17.10.4); tc (§17.4.66)) don't appear to be valid within PresentationML.
My attempts to use altChunk within a slide xml file (with proper validated entries in the relevant rels file) have resulted in invalid xml: PPT2010 offers to repair the file, and this great tool http://www.probatron.org:8080/officeotron/officeotron.html offers a number of different errors (e.g.: "Invalid content was found starting with element 'p:altChunk'." or "One of '{"http://schemas.openxmlformats.org/drawingml/2006/main":p}' is expected") depending on where I place the altChunk element.
(FWIW, the actual problem I am trying to solve is to include some basic HTML within a ppt slide.)
What do you want to accomplish by including the HTML within a slide?
If you simply need to store it (or any other text) you can use Tags. I don't know how they're implemented in XML but you can add one via a bit of VBA and see what appears in the XML:
Sub AddTagToSlide()
With ActivePresentation.Slides(1)
.Tags.Add "ThisIsTheTagName", "ThisIsTheTagValue"
End With
' Did it work? How do we retrieve a tag?
With ActivePresentation.Slides(1)
MsgBox .Tags("ThisIsTheTagName")
End With
End Sub
You can add as many tags as you like to the Presentation object, Slide objects, and Shape objects, among other things. There's no UI for tags, so they're not visible to users.

Resources