Is it possible to alter a php file using XPath? - xpath

I'm unsure about this. Would having PHP (or, I guess, any template language such as Django's or Mako) inside an HTML file prevent me from making changes to it with XPath?
I'm very new to XPath. I would think that you could not, but as I said, I'm unsure.

XPath is a query language. You use it to query XML content, not to change it.
You can use XPath in conjunction with other technologies (XSLT is the first one that comes to mind) to query your XML and then use the results of those queries to transform it.

XPath doesn't change the XML document.
Use XSLT or any other XPath-hosting language that can produce a new XML document.
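To make the split between "query" and "change" concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and the names SOURCE, title and result are made up for illustration. It also assumes the input is well-formed XML, which a template file with embedded PHP or Django tags may not be.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the question.
SOURCE = "<page><title>Old title</title><body>text</body></page>"

root = ET.fromstring(SOURCE)

# The XPath expression only locates the node; it changes nothing.
title = root.find("./title")

# The change itself is made through the host language's tree API.
title.text = "New title"

# Serialise the modified tree to get the new document.
result = ET.tostring(root, encoding="unicode")
print(result)
```

The same pattern applies in any XPath-hosting language: XPath finds the node, and the surrounding API or XSLT produces the changed document.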

Related

Get the inner XML using XPath?

This is my XML
<my_xml>
<record>
<p>hello <b>world</b> this is some html</p>
</record>
</my_xml>
Can I use XPath to return the following?
<p>hello <b>world</b> this is some html</p>
my_xml/record/child::*
child::* selects all element children of the context node
see details
The quick answer is, no. You can't accomplish this with XPath, but, once you select the parent node (i.e. "record" in your example), you should be able to manipulate it in whichever language you are using to parse the XML. Unfortunately, it may not be "easy".
It sounds like you would want something like the innerHTML property, but for XML DOM instead of the HTML DOM. Unfortunately, nothing like this exists for the XML DOM. If you don't care about the nodes themselves, you could use the textContent property; in the case of your example, you would get "hello world this is some html", which doesn't seem to be what you want.
Check out this similar question, which includes a parsing algorithm in Java. It seems that you will need to write a similar algorithm in whichever language you're using to parse the XML.
For anyone looking for this in the future: this IS very much possible using a dot (.), which returns the entire node content as text (at least in MSSQL's XPath it does).
'(/my_xml/record/.)[1]'
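As a rough illustration of the "select the parent node, then serialise it in the host language" approach described above, here is a sketch in Python with the standard library's xml.etree.ElementTree. The helper name inner_xml is hypothetical; it is one possible way to emulate innerHTML, not a built-in feature.

```python
import xml.etree.ElementTree as ET

XML = """<my_xml>
<record>
<p>hello <b>world</b> this is some html</p>
</record>
</my_xml>"""

root = ET.fromstring(XML)          # root is the my_xml element
record = root.find("./record")     # XPath selects the parent node


def inner_xml(element):
    """Return the element's content: its leading text plus every child subtree."""
    parts = [element.text or ""]
    for child in element:
        # tostring() serialises the child and the text that follows it (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# strip() drops the newlines that surround <p> inside <record>.
inner = inner_xml(record).strip()
print(inner)
```

XPath does the selection here, and the markup is produced by the serialiser, which matches the point that XPath alone does not return markup as a string.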

Problems with Xalan using XPATH (unclosed tags)

Greetings,
I'm facing a problem with the following tech-stack: JWebUnit -> HtmlUnit -> Xalan.
I'm trying to find an element by XPATH, but the HTML document is pretty malformed.
Xalan stops finding elements once the XPath reaches the /body element. I believe that's because the document contains two <body> tags, one of which is unclosed.
Everything works for /html/head or /html. But when I try /html/body (or /html/body[1], //body[1], or anything inside those tags) I get only null from Xalan.
Is there any way to work around that? I can't change the HTML document itself. Thank you kindly for your attention.
Best regards,
Thiago
HtmlUnit must be using something to convert HTML to XML. Perhaps you can tell it to use jsoup or tagsoup, which are very tolerant of messy HTML?
You might also write code to dump the XML tree to a file so you can see what's in it.

XPATH remove attribute

Hi, does anyone know how to remove an attribute using XPath? In particular, the rel attribute and its text from a link, i.e. given <a href='http://google.com' rel='some text'>Link</a>, I want to remove rel='some text'.
There will be multiple links in the HTML I am parsing.
You can select items using XPath, but that's all it can do: it is a query language.
You need to use XSLT or an XML parser in order to remove attributes/elements.
As pointed out by Oded, XPath merely identifies XML nodes. To remove/edit XML, you need some additional tooling.
One solution is the Ant-based plugin XMLTask (disclaimer - I wrote this). It provides a simple mechanism to read an XML file, identify parts of that using XPath, and change it (including removing nodes).
e.g.
<remove path="web/servlet/context[@id='redundant']"/>
Have you already tried using JavaScript for this, if that is applicable in your scenario?
// Collect every <a> element in the page and drop its rel attribute.
var allLinks = document.getElementsByTagName("a");
for (var i = 0; i < allLinks.length; i++)
{
allLinks[i].removeAttribute("rel");
}
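For a parser-based version of what the answers above describe (XPath to find the links, the parser's API to remove the attribute), here is a minimal sketch in Python with the standard library's xml.etree.ElementTree. The HTML fragment is a made-up, well-formed example; real-world HTML would likely need a more tolerant parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment based on the link in the question.
HTML = (
    "<div>"
    "<a href='http://google.com' rel='some text'>Link</a>"
    "<a href='http://example.com'>No rel</a>"
    "</div>"
)

root = ET.fromstring(HTML)

# XPath selects every <a> that carries a rel attribute...
for link in root.findall(".//a[@rel]"):
    # ...and the element API removes it.
    link.attrib.pop("rel", None)

cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```

Links without a rel attribute are never selected, so they pass through unchanged.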

Parse XHTML using Ruby

Is there any way I can parse a remote HTML page in Ruby, preferably using jQuery-like selectors?
For example, I could select all the divs having a specific class, and get the content of all those elements in an array.
I was trying to use regex for this, but I think using an XML parser would be better.
I found hpricot is very similar.

Is there an XPath equivalent for Linq to XML?

I have been using Linq to XML for a few hours and while it seems lovely and powerful when it comes to loops and complex selections, it doesn't seem so good for situations where I just want to select a single node value which XPath seems to be good at.
I may be missing something obvious here but is there a way to use XPath and Linq to XML together without having to parse the document twice?
You can still use XPath, with the XPathEvaluate, XPathSelectElement and XPathSelectElements extension methods. You can also call CreateNavigator to create an XPathNavigator.
