FreeMarker: get results as DOM tree - freemarker

I want to use FreeMarker to process an XML template. However, instead of letting FreeMarker write the output to a Writer, I’d like to traverse the processed DOM tree. (Essentially, I want to fill a Protobuf3 structure from it.)
Is that possible?

FreeMarker does not impose any kind of DOM-like structure on the template, unlike Thymeleaf.
So you cannot get a DOM from it directly; the output has to be parsed into a DOM afterwards.
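
The "render first, parse afterwards" approach can be sketched as follows. This is only an illustration in Python, not FreeMarker code: string.Template stands in for the template engine, io.StringIO stands in for the Writer, and the template text, field names and helper functions are all made up for the example. With FreeMarker the same idea would be to process the template into an in-memory writer and hand the resulting string to an XML parser.

```python
import io
from string import Template
from xml.dom import minidom

# Stand-in for the real template: string.Template is NOT FreeMarker.
# The element and field names here are hypothetical.
TEMPLATE = Template("<person><name>${name}</name><id>${id}</id></person>")


def render_to_dom(data):
    """Render into an in-memory buffer, then parse that text as XML."""
    buffer = io.StringIO()                      # plays the role of the Writer
    buffer.write(TEMPLATE.substitute(data))     # note: no XML escaping is done here
    return minidom.parseString(buffer.getvalue())


def element_children(node):
    """Return only the element children, skipping text and comment nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


doc = render_to_dom({"name": "Ada", "id": "42"})

# Walk the parsed tree and collect values, e.g. to fill a message structure.
fields = {
    child.tagName: child.firstChild.data
    for child in element_children(doc.documentElement)
}
```

The extra parse step costs little for small documents, and it also verifies that the template actually produced well-formed XML before anything is copied into the target structure.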

Related

Get the inner XML using XPath?

This is my XML
<my_xml>
<record>
<p>hello <b>world</b> this is some html</p>
</record>
</my_xml>
Can I use XPath to return the following?
<p>hello <b>world</b> this is some html</p>
my_xml/record/child::*
child::* selects all element children of the context node
The quick answer is, no. You can't accomplish this with XPath, but, once you select the parent node (i.e. "record" in your example), you should be able to manipulate it in whichever language you are using to parse the XML. Unfortunately, it may not be "easy".
It sounds like you would want something like the innerHTML property, but for XML DOM instead of the HTML DOM. Unfortunately, nothing like this exists for the XML DOM. If you don't care about the nodes themselves, you could use the textContent property; in the case of your example, you would get "hello world this is some html", which doesn't seem to be what you want.
Check out this similar question, which includes a parsing algorithm in Java. It seems that you will need to write a similar algorithm in whichever language you're using to parse the XML.
For anyone looking for this in the future: this is very much possible using a dot (.), which returns the entire node content as text (at least in MSSQL XPath it does).
'(/my_xml/record/.)[1]'
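
For the "select the parent node, then serialize it in your host language" route described above, a minimal sketch in Python with the standard xml.etree.ElementTree module might look like this. The inner_xml helper is a made-up name for the example, and the sample document is the one from the question with the whitespace between elements removed.

```python
import xml.etree.ElementTree as ET

XML = "<my_xml><record><p>hello <b>world</b> this is some html</p></record></my_xml>"


def inner_xml(element):
    """Serialize everything inside `element`, but not its own start/end tags."""
    parts = [element.text or ""]          # text before the first child, if any
    for child in element:
        # tostring() emits the child's markup plus any text trailing it
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(XML)       # root is the <my_xml> element
record = root.find("record")    # path is relative to the root element
result = inner_xml(record)
```

XPath is used only to locate the record element; turning its children back into markup is done by the serializer, which is the part XPath itself does not cover.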

Using Boost Property Tree to replace DOM Parser

I need to write an XML parser using Boost Property Tree which can replace an existing MSXML DOM parser. Basically, my code should return the list of child nodes, the number of child nodes, etc. Can this be achieved using Property Tree? E.g. GetfirstChild(), selectNodes(), Getlength(), etc.
I saw a lot of APIs related to Boost Property Tree, but the documentation seems minimal and confusing. As of now, I am able to parse the entire XML using BOOST_FOREACH, but the path to each node is hard-coded, which will not serve my purpose.
boost::property_tree can parse XML, and since it is a tree you can use it as a substitute for an XML DOM. However, the library is not intended to be a fully fledged XML parser and it is not compliant with the XML standard. For instance, it can successfully parse non-well-formed XML input, and it doesn't support some XML features. So it's your choice: if you want a simple interface to simple XML configuration, then yes, you should use boost::property_tree.

Letting Nokogiri decide whether to use #fragment or #parse

I have a piece of HTML that I would like to parse with Nokogiri, but I do not know whether it is a full HTML document (with DOCTYPE, etc) or a fragment (e.g. just a div with some elements in it).
This makes a difference for Nokogiri, because it should use #fragment for parsing fragments but #parse for parsing full documents.
Is there a way to determine whether a given piece of text is a fragment or a full HTML document?
Depends on how trashed your page is, but
/^(?:\s*<!DOCTYPE)|(?:\s*<html)/
should work in most cases.
The simplest way would be to look for the <html> tag, which nearly every full document contains, using for instance the regular expression /<html[\s>]/ (the character class allows for attributes).
Is this sufficient to solve your problem?
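
The heuristic from the answers above can be tried out in isolation. The following is a sketch in Python rather than Ruby, purely to show the pattern's behaviour; looks_like_full_document is a made-up name, and the result would then decide between Nokogiri's #parse and #fragment. It is a heuristic only: a full document that omits both the DOCTYPE and the <html> tag will be classified as a fragment.

```python
import re

# Treat the input as a full document if it contains a DOCTYPE declaration
# or an <html> start tag (with or without attributes), case-insensitively.
FULL_DOCUMENT = re.compile(r"<!DOCTYPE\s+html|<html[\s>]", re.IGNORECASE)


def looks_like_full_document(markup):
    """Return True if the markup appears to be a complete HTML document."""
    return FULL_DOCUMENT.search(markup) is not None


samples = {
    "<!DOCTYPE html><html><body></body></html>": looks_like_full_document(
        "<!DOCTYPE html><html><body></body></html>"
    ),
    "<div><p>hi</p></div>": looks_like_full_document("<div><p>hi</p></div>"),
}
```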

XPath queries using HtmlAgilityPack fail to select nodes with self-closing tags

I'm trying to query all input nodes. All of the nodes that are not self-closing are being returned fine, but the nodes that are self-closing are not. Is there a way to address this that doesn't require me to change the HTML?
Thanks!
This is the default behavior. If you want to change it, you need to modify the HtmlNode.ElementsFlags collection, for example by removing INPUT from it, just as I explained for OPTION in a similar question here on SO: XHTML Parsing with HTMLAgilityPack

Dynamically updating RDF File

Is it possible to update an RDF file dynamically from user-generated input through a web form? The exact scenario would be SKOS concept definitions being created and updated through user input to HTML forms.
I was considering xpath but is there a better / generally accepted / best practice way of doing this kind of thing?
For this type of thing there are IMO two approaches:
1 - Using Named Graphs in a Triple Store
Rather than editing an actual fixed file, you use a Graph which is stored as a named graph in a Triple Store that supports triple-level updates (i.e. you can change individual Triples in a Graph). For example, you could use a store like Virtuoso or a Jena-based store (Jena SDB/TDB) to do this; basically any store that supports the SPARUL language or has its own equivalent.
2 - Using a fixed RDF file and altering it
From your mention of XPath I assume that you are intending to store your file as RDF/XML. While XPath would potentially work for this, it's going to be dependent on the exact serialization of your file and may get very complex. If your app is going to allow users to submit and edit their own files, then there'll be no guarantees about how the RDF has been serialized into RDF/XML, so your XPath expressions might not work. If you control all the serialization and processing of the RDF/XML, then you can keep it in a format that your XPath will work on.
From my point of view, the simplest way to take this approach is to load the file into memory using an appropriate RDF library, manipulate it in memory, and then persist the whole thing back to disk when the user is done (or at regular intervals, or whatever is appropriate to your application).

Resources