Undefined namespace prefix error when using xlink - image

I have this XML:
<figure>
<objetmedia>
<image typeimage="figure" xlink:actuate="onLoad" xlink:href="picture-02.jpg" xlink:show="embed" xlink:type="simple"/>
</objetmedia>
</figure>
And I have this XSL script:
<xsl:template match="figure">
<figure>
<xsl:apply-templates select="objetmedia" mode="image"/>
</figure>
</xsl:template>
<xsl:template match="objetmedia" mode="image">
<img src='{image/@xlink:href}' />
</xsl:template>
But I get this error:
Warning: XSLTProcessor::transformToXml(): Undefined namespace prefix
Warning: XSLTProcessor::transformToXml(): xmlXPathCompiledEval: evaluation failed
Warning: XSLTProcessor::transformToXml(): runtime error: file script.xsl line 154 element img
Warning: XSLTProcessor::transformToXml(): Internal error: Failed to evaluate the AVT of attribute 'src'.
Why?

You need to declare the xlink: namespace prefix in your stylesheet and bind it to the same namespace URI that the original document uses. You haven't shown the part of your input document that includes the namespace declaration, but if it's the standard XLink namespace then you'd need to add
xmlns:xlink="http://www.w3.org/1999/xlink"
to an appropriate place, usually your xsl:stylesheet element.
The point is that XPath expressions and match patterns use the prefix bindings of the stylesheet, not those of the input XML document. What matters for matching is the namespace URI and the local name. Your stylesheet could equally well declare xmlns:x="http://www.w3.org/1999/xlink" and then look for @x:href; as long as the namespace URIs match, it will find the right thing.
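The same rule can be seen outside XSLT. The following is a minimal sketch in Python using only the standard library's xml.etree.ElementTree, not the asker's PHP/XSLT setup. It assumes the input declares the standard XLink namespace on the figure element (the question does not show the declaration), and the helper name image_href is made up for illustration. Two documents that use different prefixes for the same URI give the same lookup result:

```python
import xml.etree.ElementTree as ET

XLINK_URI = "http://www.w3.org/1999/xlink"

# Same structure as the question's input. The xmlns declaration on <figure>
# is an assumption: the question does not show where it is declared.
DOC_WITH_XLINK_PREFIX = (
    '<figure xmlns:xlink="http://www.w3.org/1999/xlink"><objetmedia>'
    '<image xlink:href="picture-02.jpg" xlink:type="simple"/>'
    '</objetmedia></figure>'
)
# Identical content, but the prefix "x" is bound to the same URI.
DOC_WITH_X_PREFIX = (
    '<figure xmlns:x="http://www.w3.org/1999/xlink"><objetmedia>'
    '<image x:href="picture-02.jpg" x:type="simple"/>'
    '</objetmedia></figure>'
)


def image_href(xml_text):
    """Return the XLink href of the first objetmedia/image element."""
    root = ET.fromstring(xml_text)
    image = root.find("objetmedia/image")
    # Attributes are keyed by "{namespace-uri}local-name", never by prefix.
    return image.get("{%s}href" % XLINK_URI)


print(image_href(DOC_WITH_XLINK_PREFIX))
print(image_href(DOC_WITH_X_PREFIX))
```

Both calls return the same value because the parser discards the prefix and keeps only the URI and the local name.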

Related

Replace image src in vml markup with globally available images using Nokogiri

Is it possible to find Outlook-specific markup via Capybara/Nokogiri?
Given the following markup (erb <% %> tags are processed into regular HTML)
...
<div>
<!--[if gte mso 9]>
<v:rect
xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
>
<v:fill type="tile"
src="<%= avatar_background_url.split('?')[0] %>"
color="<%= background_color %>" />
<v:textbox inset="0,0,0,0">
<![endif]-->
<div>
How can I get the list of <v:fill ../> tags? (Or, failing that, how can I get the whole comment, if finding the tag inside a conditional comment is a problem?)
I have tried the following
doc.xpath('//v:fill')
*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill
Do I need to somehow register the VML namespace?
EDIT - following @ThomasWalpole's approach:
doc.xpath('//comment()').each do |comment_node|
  vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
  if vml_node_match
    original_image_uri = URI.parse(vml_node_match['url'])
    vml_tag = vml_node_match[0]
    handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
  end
end
My handle_vml_image_replacement then ends up calling the following replace_comment_image_src:
def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
new_url = new_url.split('?').first # VML does not support URL with query params
puts "Replacing comment src URL in #{comment} by #{new_url}"
node.content = node.content.gsub(old_url, new_url)
end
But then it seems the comment is no longer actually a comment, and I can sometimes see the HTML as if it had been escaped. Am I using the wrong method to change the comment text with Nokogiri?

Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for help along the way.
My goal was to replace images (linking to localhost) in VML markup with globally available images, for testing with services like MOA or Litmus.
doc.xpath('//comment()').each do |comment_node|
# Note : cannot capture beginning of tag, since it might span across several lines
src_attr_match = /.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
next unless src_attr_match
original_image_uri = URI.parse(src_attr_match['url'])
handle_comment_image_replacement(original_image_uri, comment_node)
end
Which later calls (after picking a URL replacement strategy depending on the source image type):
def self.replace_comment_image_src(node:, old_url:, new_url:)
new_url = new_url.split('?').first
node.native_content = node.content.gsub(old_url, new_url)
end
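For readers who do not use Ruby, the same text-level rewrite can be sketched in Python with only the standard library. This is an illustration of the regex approach on a plain comment string, not a port of the Nokogiri calls. The sample comment text and the replacement URL below are made up for the example.

```python
import re

# Same idea as the Ruby pattern above: capture an http: URL inside src="...".
SRC_PATTERN = re.compile(r'src="(?P<url>http:[^"]*)"')


def replace_comment_image_src(comment_text, new_url):
    """Swap the first src URL found in comment_text for new_url, minus its query string."""
    match = SRC_PATTERN.search(comment_text)
    if match is None:
        return comment_text  # nothing to replace
    # The question notes that VML does not support URLs with query params.
    clean_url = new_url.split("?")[0]
    return comment_text.replace(match.group("url"), clean_url)


# Hypothetical conditional-comment body and replacement URL.
sample_comment = (
    '[if gte mso 9]><v:fill type="tile"\n'
    ' src="http://localhost:3000/avatar.png?1364298066" color="#ffffff" />'
    '<![endif]'
)
updated_comment = replace_comment_image_src(
    sample_comment, "http://cdn.example.com/avatar.png?v=2"
)
print(updated_comment)
```

Because the pattern only looks for the src attribute, it also works when the v:fill tag spans several lines, which is the reason the final Ruby version drops the tag name from the regex.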

How to deal with namespaces in XML with Ruby and Nokogiri

I have this XML document:
<item>
<title>The Big Bang Theory - Temporada 1</title>
<lge:titleid>season_id4</lge:titleid>
<lge:thumb>https://d12h56qju7t1ah.cloudfront.net/system/artworks/4/h80/the-big-bang-theory-1.20130326114106.jpg?1364298066</lge:thumb>
</item>
and parse it with:
parsed = Nokogiri::XML.parse(File.read(file_doc))
parsed.class returns "Nokogiri::XML::Document".
But when I do:
parsed.at_xpath('lge:titleid').text
I get:
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //lge:titleid
What could be the reason for this and how can I get the text of that particular node?
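The general mechanism behind this kind of error can be sketched in Python with the standard library's xml.etree.ElementTree, which is a stand-in here and not Nokogiri: a prefixed name in a path expression only resolves through a prefix-to-URI map that the query can see. The namespace URI below is hypothetical, because the question does not show the xmlns:lge declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI: the question never shows the real xmlns:lge declaration.
LGE_URI = "http://example.com/lge"

DOC = (
    '<item xmlns:lge="http://example.com/lge">'
    '<title>The Big Bang Theory - Temporada 1</title>'
    '<lge:titleid>season_id4</lge:titleid>'
    '</item>'
)

root = ET.fromstring(DOC)

# Without a prefix map, the query cannot resolve "lge" and is rejected.
try:
    root.find(".//lge:titleid")
    prefix_rejected = False
except SyntaxError:
    prefix_rejected = True

# With the map, the prefix resolves to the URI and the element is found.
title_id = root.find(".//lge:titleid", {"lge": LGE_URI}).text
print(prefix_rejected, title_id)
```

The prefix used in the query does not have to be the one used in the document; only the URI it maps to has to match.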

Xmllint using XSD+XMLCatalog cannot find uri

I am trying to use xmllint with an XSD file that uses one xs:import without the @schemaLocation attribute set. (In OxygenXMLEditor this setup works fine.)
XSD relevant section:
...
<!-- this one is the trouble -->
<xs:import namespace="http://www.w3.org/1999/xhtml" />
<!-- this one is resolved fine -->
<xs:import namespace="http://www.idpf.org/2007/ops" schemaLocation="conf/b.xsd"/>
...
Given this XML Catalog file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<uri name="http://www.w3.org/1999/xhtml" uri="file:///home/me/code/base5html.xsd"/>
</catalog>
Ok, then I run it:
$ export XML_CATALOG_FILES=~/code/mycatalog.xml
$ export XML_DEBUG_CATALOG=1
$ xmllint --load-trace --noout --schema myschema.xsd test_doc.xml
Result (approx):
Loaded URL="myschema.xsd" ID="(null)"
Loaded URL="conf/b.xsd" ID="(null)"
Loaded URL="c.xsd" ID="(null)"
c.xsd:17: element element: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}p' does not resolve to a(n) element declaration.
c.xsd:18: element element: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}cite' does not resolve to a(n) element declaration.
c.xsd:26: element attributeGroup: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/1999/xhtml}normAttrGrp' does not resolve to a(n) attribute group definition.
...
(more taken out)
...
WXS schema public/ctrl/lov.xsd failed to compile
Loaded URL="myschema.xml" ID="(null)"
Q: Isn't XSD + XML Catalog a supported feature in libxml2?
(No hints to be found anywhere on the homepage: http://xmlsoft.org/catalog.html)

Trouble Parsing XML using Ruby XML Parser

I am having trouble parsing some returned XML using this command: XML::Parser.string(xml_string).parse
Here is the XML I am trying to parse:
<div style=\"border:1px solid #990000;padding-left:20px;margin:0 0 10px 0;\">
<h4>A PHP Error was encountered</h4>
<p>Severity: Notice</p>
<p>Message: Undefined index: HTTP_USER_AGENT</p>
<p>Filename: test</p>
<p>Line Number: test</p>
</div><?xml version=\"1.0\" encoding=\"UTF-8\"?>
<response>
<review>
<reviewer><![CDATA[test]]></reviewer>
<ip><![CDATA[test]]></ip>
<rating><![CDATA[test]]></rating>
<content><![CDATA[test.]]></content>
<date><![CDATA[test]]></date>
</review>
</response>
I get this error:
Fatal error: XML declaration allowed only at the start of the document at :10.Fatal error: Extra content at the end of the document at :11.
LibXML::XML::Error: Fatal error: Extra content at the end of the document
What is going on here?
Your string is not a valid XML document; it appears to be two documents concatenated together. (The first one is a "<div>" the second one is a "<response>".)
Try separating them into two strings and parsing each of them separately.
When you are fetching xml_string, I believe you need to set the user agent. You are not providing a user agent so the server serving the XML is choking.
Use this code to add a user agent to your request:
resp = http.post(path, query, {'User-Agent' => "Ruby"})
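If the server cannot be fixed, the "separate the two documents" suggestion can be sketched as follows. This is a minimal Python illustration using only the standard library, not the Ruby LibXML parser from the question. The helper name parse_trailing_xml and the shortened payload are made up for the example, and it assumes the real XML starts at the first "<?xml" declaration.

```python
import xml.etree.ElementTree as ET


def parse_trailing_xml(payload):
    """Parse the XML document that starts at the first '<?xml' declaration."""
    start = payload.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found in payload")
    # Everything before the declaration (here, the PHP notice markup) is dropped.
    # Encoding to bytes keeps the declared UTF-8 encoding consistent with the data.
    return ET.fromstring(payload[start:].encode("utf-8"))


# Shortened stand-in for the response shown in the question.
payload = (
    '<div><h4>A PHP Error was encountered</h4></div>'
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<response><review><reviewer><![CDATA[test]]></reviewer></review></response>'
)
root = parse_trailing_xml(payload)
print(root.tag, root.findtext("review/reviewer"))
```

This only works around the symptom; sending a User-Agent header so that the server stops emitting the PHP notice addresses the cause.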

How to use xpath from lxml on null namespaced nodes?

What is the best way to handle the lack of a namespace on some of the nodes in an xml document using lxml? Should I first modify all None named nodes to add the "gmd" name and then change the tree attributes to name http://www.isotc211.org/2005/gmd as "gmd"? If so, is there a clean way to do this with lxml or something else that would be relatively clean/safe?
from lxml import etree
nsmap = charts_tree.nsmap
nsmap.pop(None) # complains without this on the xpath with
# TypeError: empty namespace prefix is not supported in XPath
len (charts_tree.xpath('//*/gml:Polygon',namespaces=nsmap))
# 1180
len (charts_tree.xpath('//*/DS_DataSet',namespaces=nsmap))
# 0 ... Bummer!
len (charts_tree.xpath('//*/DS_DataSet'))
# 0 ... Also a bummer
e.g. http://www.charts.noaa.gov/ENCs/ENCProdCat_19115.xml
<DS_Series xmlns="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gss="http://www.isotc211.org/2005/gss" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20070417/gmd/gmd.xsd">
<composedOf>
<DS_DataSet>
<has>
<MD_Metadata>
<parentIdentifier>
<gco:CharacterString>NOAA ENC Product Catalog</gco:CharacterString>
</parentIdentifier>
...
<EX_BoundingPolygon>
<polygon>
<gml:Polygon gml:id="US1AK90M_P1">
<gml:exterior>
<gml:LinearRing>
<gml:pos>67.61505 -178.99979</gml:pos>
<gml:pos>73.99999 -178.99979</gml:pos>
...
<gml:pos>64.99997 -178.99979</gml:pos>
<gml:pos>67.61505 -178.99979</gml:pos>
</gml:LinearRing>
I believe your DS_DataSet does carry a namespace: because it sits inside DS_Series, it inherits the default namespace "http://www.isotc211.org/2005/gmd".
Try mapping that URI to a prefix in your namespace dictionary (print the dictionary first to check whether it is already there; otherwise add it) and refer to the namespace by the new key:
nsmap['some_ns'] = "http://www.isotc211.org/2005/gmd"
len (charts_tree.xpath('//*/some_ns:DS_DataSet',namespaces=nsmap))
Which becomes:
nsmap['gmd'] = nsmap[None]
nsmap.pop(None)
len(charts_tree.xpath('//*/gmd:DS_DataSet',namespaces=nsmap))
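The effect is easy to reproduce on a small document. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages (lxml's xpath() takes the same kind of prefix map through namespaces=). The two-element document is a made-up reduction of the NOAA catalog.

```python
import xml.etree.ElementTree as ET

GMD_URI = "http://www.isotc211.org/2005/gmd"

# Reduced stand-in for the catalog: a default namespace on the root element.
DOC = (
    '<DS_Series xmlns="http://www.isotc211.org/2005/gmd" '
    'xmlns:gml="http://www.opengis.net/gml/3.2">'
    '<composedOf><DS_DataSet><has/></DS_DataSet></composedOf>'
    '<composedOf><DS_DataSet><has/></DS_DataSet></composedOf>'
    '</DS_Series>'
)
root = ET.fromstring(DOC)

# An unprefixed name in the query means "no namespace", so nothing matches.
bare = root.findall(".//DS_DataSet")
# Binding any prefix (here "gmd") to the default-namespace URI does match.
prefixed = root.findall(".//gmd:DS_DataSet", {"gmd": GMD_URI})
print(len(bare), len(prefixed))
```

The prefix "gmd" is an arbitrary choice made in the query; the document itself never uses it.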
