Issue with XSL-FO fo:footnote when generating a PDF/UA-1 document with FOP: "tagged PDF note id is missing" - pdf-generation

I have an issue with <fo:footnote> when generating a PDF/UA-1 document with FOP.
The resulting PDF displays the footnote correctly on the page but doesn't pass PDF/UA validation. A severe error, “id is missing”, is raised on the PDF Note tag, so the document is not conformant. I'm using PAC3 for the conformance test.
In the example below I have extracted the basic <fo:footnote> element, which has a unique id.
How can I generate the missing ID attribute on the tagged Note element in the PDF?
Here is the really simple XSL-FO footnote. Note that I used an id to reference the footnote.
some text...
<fo:footnote id="FNE0001">
<fo:inline font-size="6pt" baseline-shift="super">E0001</fo:inline>
<fo:footnote-body>
<fo:block>
<fo:inline>E0001</fo:inline><fo:inline > JO L 139 du 29.5.2002, p. 9.</fo:inline>
</fo:block>
</fo:footnote-body>
</fo:footnote> some text...
Apache FOP has been set to generate PDF/UA through the conf file as follows:
<renderers>
<renderer mime="application/pdf">
<!-- Before setting the pdf-ua-mode, we must insert metadata Title in FO declaration -->
<pdf-ua-mode>PDF/UA-1</pdf-ua-mode>
....

PAC3 is checking against the failure conditions in the Matterhorn Protocol; latest edition at https://www.pdfa.org/resource/the-matterhorn-protocol/.
PAC3 is probably reporting on failure condition 19-003, ID entry of the tag is not present.
fo:footnote-body can have an id property. You could try adding an ID to that. In the fo:inline for the footnote marker, you might also need to add an fo:basic-link that refers to that ID.
FWIW, your footnote does not cause that error when checking PDF/UA generated by AH Formatter, and I'm only guessing at what FOP would be doing differently. (PAC3 and I generally disagree when it complains about a footnote being a possibly incorrect use of a Note tag, but that's another story: PAC3 tries to automate checking the conditions that should be checked by a human, and it doesn't always get it right.)
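A minimal sketch of the markup suggested above, written in Python only to make the resulting structure explicit. It has not been tested against FOP or PAC3, so whether it clears the validation error is an assumption. The id value `FNB0001` and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_footnote(marker, text, footnote_id, body_id):
    """Build an fo:footnote whose body has its own id and whose marker links to it."""
    footnote = ET.Element(fo("footnote"), {"id": footnote_id})

    # Footnote marker in the running text, wrapped in a link to the body.
    inline = ET.SubElement(footnote, fo("inline"),
                           {"font-size": "6pt", "baseline-shift": "super"})
    link = ET.SubElement(inline, fo("basic-link"),
                         {"internal-destination": body_id})
    link.text = marker

    # Footnote body carrying the id (hypothetical value chosen by the caller).
    body = ET.SubElement(footnote, fo("footnote-body"), {"id": body_id})
    block = ET.SubElement(body, fo("block"))
    label = ET.SubElement(block, fo("inline"))
    label.text = marker
    content = ET.SubElement(block, fo("inline"))
    content.text = " " + text
    return footnote


footnote = build_footnote("E0001", "JO L 139 du 29.5.2002, p. 9.",
                          "FNE0001", "FNB0001")
fo_markup = ET.tostring(footnote, encoding="unicode")
print(fo_markup)
```

The printed element can be compared with the footnote in the question: the only differences are the id on fo:footnote-body and the fo:basic-link around the marker.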

Related

iText7 html to pdf conversion with target-counter having dynamic target value not working

I have been looking into why the page numbers in my generated TOC are not working on the latest (3.0.3) version of pdfHTML, even though the release notes state that this is now supported.
The devil is in the detail, as noted in the example (https://kb.itextpdf.com/home/it7kb/examples/pdfhtml-support-for-generating-cross-references-for-toc-creation-with-target-counter-target-counters-css-properties) the following works very well:
.preface::before {
content: target-counter(url('#id1'), page) ' ';
}
But as soon as you change this to something more dynamic, so that you do not have to specify an entry for each topic you would like to have in the TOC, it fails.
.preface::before {
content: target-counter(attr(href), page) ' ';
}
As also indicated in the logging by the following line:
WARN com.itextpdf.html2pdf.attach.impl.layout.PageTargetCountRenderer - Cannot resolve target-counter value with given target "attr(href)"
It looks like "attr(href)" is taken as a literal and not resolved in the context in which it is used, e.g. in this case extracting the tag's href value and using that to find the right page number.
Could this be fixed somehow - or can this be covered by some custom coding?
Thanks.
Using the latest version of the renderer (3.0.5 at the time of writing) effectively supports this properly!

How to properly scrape filtered content into Google Sheets using an XPath query?

This is about content from a website that I want to get and put into my Google Sheets, but I'm having difficulty understanding the class of the content.
target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get. Picture 1: The part which I want to scrape
And this is what the code looks like in the inspector. Picture 2: The code shown in the inspector
The target is inside a span element, but the span looks very difficult to me, so I tried to simplify it using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span")
Picture 3: A list is shown when using this formula
After some tries I was able to get the right target, but it confuses me. I'm using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span[@class='last original'][1]")
Picture 4: The right target is shown when the XPath query is more specific
As you can see in Picture 2, 'last original' is not really the full name of the class. When I put 'last original ng-binding' instead, it gave me an error saying the imported content is empty.
So, please correct me if my formula is wrong, or if it only worked by accident and there is another, correct way.
How about this answer?
Modified formula 1:
When the class name can be either last original or last original ng-binding, how about the following XPath and formula?
=IMPORTXML(A1,"//span[contains(@class,'last original')][1]")
In this case, the URL https://www.cnbc.com/quotes/?symbol=XAU= is put in cell "A1".
Here, //span[contains(@class,'last original')][1] is used as the XPath. It retrieves the value of the span whose class name includes last original, so both last original and last original ng-binding match.
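The difference between an exact @class comparison and contains() can be sketched in Python. This is only an illustration on hypothetical markup: the two snippets and the price value are made up, and the assumption that the extra ng-binding class is added by a script in the browser (so IMPORTXML never sees it) is not confirmed in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same span as served by the site and as shown in
# the browser inspector after a script has appended an extra class name.
SERVED = ('<table><tr><td><span class="last original">1111.50</span>'
          '</td></tr></table>')
RENDERED = ('<table><tr><td><span class="last original ng-binding">1111.50'
            '</span></td></tr></table>')


def exact_match(markup, class_value):
    """Like //span[@class='...']: the whole attribute value must be equal."""
    root = ET.fromstring(markup)
    return [s.text for s in root.findall(f".//span[@class='{class_value}']")]


def contains_match(markup, fragment):
    """Like //span[contains(@class,'...')]: a substring match is enough."""
    root = ET.fromstring(markup)
    return [s.text for s in root.iter("span")
            if fragment in s.get("class", "")]


print(exact_match(SERVED, "last original"))       # ['1111.50']
print(exact_match(RENDERED, "last original"))     # []
print(contains_match(SERVED, "last original"))    # ['1111.50']
print(contains_match(RENDERED, "last original"))  # ['1111.50']
```

The exact comparison only succeeds when the attribute is character-for-character identical, which is why the substring test is the more tolerant choice here.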
Modified formula 2:
As an alternative XPath, how about the following formula?
=IMPORTXML(A1,"//meta[@itemprop='price']/@content")
It seems that the value is also included in the page metadata, so this sample retrieves the value from the metadata.
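For readers who want to see what the metadata lookup does, here is a rough Python equivalent using only the standard library. The sample HTML and the price value are hypothetical; the real page may structure its metadata differently.

```python
from html.parser import HTMLParser


class MetaPriceParser(HTMLParser):
    """Collect the content attribute of every <meta itemprop="price"> tag."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("itemprop") == "price":
            self.prices.append(attributes.get("content"))


# Hypothetical page fragment; the value is a placeholder.
SAMPLE = ('<html><head><meta itemprop="price" content="1111.50">'
          '</head><body></body></html>')

parser = MetaPriceParser()
parser.feed(SAMPLE)
print(parser.prices)  # ['1111.50']
```

Reading the value from metadata avoids depending on the class names of the visible span.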
Reference:
IMPORTXML
To complete @Tanaike's answer, two alternatives:
=IMPORTXML(B2;"//span[@class='year high']")
"Year high" always seems to be equal to the current stock index value.
Or, with the value retrieved from the script element:
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note: I'm based in Europe, so depending on your locale you may need to replace ; with , in the formulas.

InvoiceAddRq fails for unknown reason

I hope I stay in line here, I don't want to waste your time.
I developed a simple app that extracts customer invoice data from a database and writes XML requests to load the invoice data into QuickBooks. It works well. I've run into and fixed a dozen or so quirks and glitches along the way; it has been in production for about a year.
While loading today's invoices into QuickBooks, one failed.
There were no objectionable characters or oddities in the data of the failed invoice; I compared it to other invoices and found very similar data that was successful; I verified the ListID values for each entity; I am stumped.
If anybody can help with this, I would appreciate it greatly.
Here's the xml that failed today (the only change is that I blocked a couple of names):
FAILED InvoiceAddRq:
<?xml version="1.0"?>
<?qbxml version="8.0"?>
<QBXML>
<QBXMLMsgsRq onError="stopOnError">
<InvoiceAddRq requestID="16012816370245509">
<InvoiceAdd>
<CustomerRef><ListID>3E00001-1139583887</ListID></CustomerRef>
<ARAccountRef><ListID>380000-1137509930</ListID></ARAccountRef>
<TemplateRef><ListID>B0000-1142608867</ListID></TemplateRef>
<TxnDate>2016-01-28</TxnDate>
<RefNumber>60125008</RefNumber>
<PONumber>AI600292569</PONumber>
<TermsRef><ListID>20000-1137508984</ListID></TermsRef>
<ShipDate>2016-01-25</ShipDate>
<Other>XYXYX PLASTICS</Other>
<InvoiceLineAdd>
<ItemRef><ListID>20000-1139578831</ListID></ItemRef>
<Desc>RECOVERING 1/25DEL. 1/26 @ 11:57 AM EST – POD XYXYX</Desc>
<Quantity>1</Quantity>
<Rate>480.79</Rate>
<Other1>6</Other1>
<Other2>3466</Other2>
</InvoiceLineAdd>
</InvoiceAdd>
<IncludeRetElement>TxnID</IncludeRetElement><IncludeRetElement>TimeCreated</IncludeRetElement><IncludeRetElement>RefNumber</IncludeRetElement>
</InvoiceAddRq>
</QBXMLMsgsRq>
</QBXML>
I don't see anything wrong here. I compared this with the XML for many successful InvoiceAdd requests that occurred before and after this one. Then I tried it again as a sanity check (same result).
Below I'll post a similar invoice as a successful example.
SUCCESSFUL InvoiceAddRq:
<?xml version="1.0"?>
<?qbxml version="8.0"?>
<QBXML>
<QBXMLMsgsRq onError="stopOnError">
<InvoiceAddRq requestID="160128164851801">
<InvoiceAdd>
<CustomerRef><ListID>80000915-1294766937</ListID></CustomerRef>
<ARAccountRef><ListID>380000-1137509930</ListID></ARAccountRef>
<TemplateRef><ListID>B0000-1142608867</ListID></TemplateRef>
<TxnDate>2016-01-28</TxnDate>
<RefNumber>60125011</RefNumber>
<PONumber>2200166200</PONumber>
<TermsRef><ListID>20000-1137508984</ListID></TermsRef>
<ShipDate>2016-01-26</ShipDate>
<Other>X&Y MEHOOPANY </Other>
<InvoiceLineAdd>
<ItemRef><ListID>20000-1139578831</ListID></ItemRef>
<Desc>UNABLE TO DELIVER DUE TO THE WEATHER 1-22DEL. 1-25, POD. AXYXYX BXYXYXY @14:30</Desc>
<Quantity>1</Quantity>
<Rate>148.60</Rate>
<Other1>1</Other1>
<Other2>3</Other2>
</InvoiceLineAdd>
</InvoiceAdd>
<IncludeRetElement>TxnID</IncludeRetElement><IncludeRetElement>TimeCreated</IncludeRetElement><IncludeRetElement>RefNumber</IncludeRetElement>
</InvoiceAddRq>
</QBXMLMsgsRq>
</QBXML>
Thanks in advance.
edit: I forgot to mention that the error was very generic
"QuickBooks found an error when parsing the provided XML text stream."
Source:QBXMLRP2.RequestProcessor.2
OK, I found the cause. The logged message cited an invalid byte "(–)". The dash character was bouncing my invoice. I had previously looked into invoices using a dash and found many that were successfully loaded using the same method. What I had failed to notice was that the dash characters were different in all of the successful invoices I found except one.
The failed XML includes "–" (the "En dash", U+2013) instead of the normal ASCII dash "-" (U+002D).
The SDK validator doesn't flag it as a problem. Also, one invoice that included an "En dash" had previously loaded successfully; I can't explain that.
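Since the en dash is hard to tell apart from a hyphen by eye, a small check like the following can list every non-ASCII character in a field before the request is sent. This is a sketch, not part of the original app: the function name is hypothetical and the sample strings are shortened versions of the Desc values above.

```python
import unicodedata


def find_non_ascii(text):
    """Return (offset, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Shortened sample values; \u2013 is the en dash from the failed invoice.
failed_desc = "AM EST \u2013 POD XYXYX"
ok_desc = "WEATHER 1-22DEL. 1-25, POD."

print(find_non_ascii(failed_desc))  # [(7, 'U+2013', 'EN DASH')]
print(find_non_ascii(ok_desc))      # []
```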
I use character substitution tables to deal with the requirements (or idiosyncrasies) of Xml, QbXml, and the database (Oracle). Instead of changing my encoding, I'll substitute - for all occurrences of – (and —, "Em dash", for that matter). This worked for the previously failing invoice.
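The substitution described above could look roughly like this in Python. The table and function names are hypothetical, and the original app's substitution tables are not shown in the thread, so this is only one way to do it.

```python
# Map typographic dashes to the ASCII hyphen-minus.
DASH_TABLE = str.maketrans({
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def normalize_dashes(text):
    """Replace en and em dashes with a plain ASCII hyphen."""
    return text.translate(DASH_TABLE)


desc = "AM EST \u2013 POD XYXYX"
cleaned = normalize_dashes(desc)
print(cleaned)            # AM EST - POD XYXYX
print(cleaned.isascii())  # True
```

Running the cleaned value through an ASCII check afterwards confirms that no other unexpected characters remain in the field.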
Thanks again to WilliamLorfing for the help.

Xpath for Gmail encountering issue

I'm investigating an automation task for Gmail with a non-English interface language.
I have 2 parent labels ('Cá nhân', 'Cá thể') and 3 sub-labels. They are in the left panel of Gmail:
Cá nhân
Private
Remotekit
Cá thể
Cá tính
I installed the XPath Helper extension in Chrome and opened it to locate the 'Cá nhân' label:
//div[@class='aio aip']//a[contains(string(),'C')]
--> Result returns 4: 'Cá nhân,Cá thể,Cá tính,Cơ quan'
Then I type the full name:
//div[@class='aio aip']//a[contains(string(),'Cá nhân')]
--> Result returns: Null
But if I type //div[@class='aio aip']//a[contains(string(),'Ca')]
--> Result returns: 'Cá nhân'
Why are the results different between the short name 'Ca' and the full name 'Cá nhân'? Is this a UTF-8 encoding issue?
Strangely, //div[@class='aio aip']//a[contains(string(),'Cá thể')] finds the 'Cá thể' label exactly.
Please help me figure out the issue.
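The thread has no confirmed answer. One possible explanation that fits the results above is Unicode normalization rather than UTF-8 itself: if the 'Cá nhân' label is stored in decomposed form (a plain 'a' followed by a combining accent) while the typed query uses the precomposed 'á', then 'Ca' matches but 'Cá nhân' does not. The Python sketch below only illustrates that effect; that Gmail stores the label this way is an assumption.

```python
import unicodedata

# "Cá nhân" written with escapes so the form is unambiguous.
typed = unicodedata.normalize("NFC", "C\u00e1 nh\u00e2n")    # precomposed
on_page = unicodedata.normalize("NFD", "C\u00e1 nh\u00e2n")  # decomposed (assumed)

print("Ca" in typed)     # False: 'C' is followed by U+00E1
print("Ca" in on_page)   # True: 'C', 'a', then combining U+0301
print(typed in on_page)  # False: same look, different code points

# Normalizing both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", on_page) == typed)  # True
```

If this is the cause, copying the label text from the page and pasting it into the XPath query, instead of typing it, should produce a match.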

Is altChunk valid within PresentationML?

Is it valid to use an altChunk element within any of the slide XML files in a pptx OOXML package?
I have read through the ECMA-376 spec, and while altChunk is defined within the WordprocessingML section of the spec (e.g.: "Any document part that permits a p element can also contain an altChunk element, whose id attribute refers to a relationship"), it is not mentioned anywhere else.
PresentationML apparently doesn't have a p element, and the other valid parent elements for AltChunk (body (§17.2.2); comment (§17.13.4.2); docPartBody (§17.12.6); endnote (§17.11.2); footnote (§17.11.10); ftr (§17.10.3); hdr (§17.10.4); tc (§17.4.66)) don't appear to be valid within PresentationML.
My attempts to use altChunk within a slide xml file (with proper validated entries in the relevant rels file) have resulted in invalid xml: PPT2010 offers to repair the file, and this great tool http://www.probatron.org:8080/officeotron/officeotron.html offers a number of different errors (e.g.: "Invalid content was found starting with element 'p:altChunk'." or "One of '{"http://schemas.openxmlformats.org/drawingml/2006/main":p}' is expected") depending on where I place the altChunk element.
(FWIW, the actual problem I am trying to solve is to include some basic HTML within a ppt slide.)
What do you want to accomplish by including the HTML within a slide?
If you simply need to store it (or any other text) you can use Tags. I don't know how they're implemented in XML but you can add one via a bit of VBA and see what appears in the XML:
Sub AddTagToSlide()
With ActivePresentation.Slides(1)
.Tags.Add "ThisIsTheTagName", "ThisIsTheTagValue"
End With
' Did it work? How do we retrieve a tag?
With ActivePresentation.Slides(1)
MsgBox .Tags("ThisIsTheTagName")
End With
End Sub
You can add as many tags as you like to the Presentation object, Slide objects, and Shape objects, among other things. There's no UI for tags, so they're not visible to users.
