How to retrieve JSON embedded in HTML comments using XPATH?

I have HTML output of the form:
<!-- wp:uagb/section {"block_id":"e00ee750-246d-46fd-a034-c6dc37497309","contenttype":"exercise","contenttitle":"here is exercise 1","contentname":"Exercise"} -->
<div id="here-is-exercise-1" class="contenttype-wrapper sometopictype-exercise" data-id="e00ee750-246d-46fd-a034-c6dc37497309">
<!-- wp:paragraph -->
<p>Some stuff</p>
<!-- /wp:paragraph -->
</div>
<!-- /wp:uagb/section -->
So you have these comments that start with <!-- wp:uagb/section, and in those comments you have JSON of the form
{"block_id":"e00ee750-246d-46fd-a034-c6dc37497309","contenttype":"exercise","contenttitle":"here is exercise 1","contentname":"Exercise"}
I am trying to form an XPath query (in PHP) that is of the form
"Get me all JSON objects that contain the parameter `contenttype`"
I am pretty proficient with XPath when it comes to normal DOM extraction, but I'm not quite sure how to go about this. Ideas?

Ok, based on this post
Extracting Comments from HTML code using Xpath
here is what I came up with:
$contenttypes = $xpath->query("//comment()[contains(., 'contenttype')]");
foreach ($contenttypes as $contenttype) {
    // Grab the {...} portion of the comment text and decode it
    preg_match_all("/\\{(.*?)\\}/", $contenttype->data, $matches);
    $data = json_decode(trim($matches[0][0], '"'));
}
And then the data I need is in that $data variable. I didn't realize that XPath has built-in support for comments!
I am sure someone has a way to get the JSON object in a single XPath query. I couldn't figure that out, so I just went with preg_match_all.
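As a follow-up, here is a minimal sketch of pulling the JSON text out inside the XPath expression itself. It assumes the JSON starts at the first { in the comment, and since a string-valued XPath expression only returns the first match, the loop above is still the way to go for multiple blocks:
```php
// Sketch: let XPath cut off everything before the first "{", then decode.
// (//comment()[...])[1] picks the first matching comment in document order.
$json = '{' . $xpath->evaluate(
    "substring-after((//comment()[contains(., 'contenttype')])[1], '{')"
);
$data = json_decode(trim($json));
```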

Related

Accessing Oracle ATG variables with Javascript

I am trying to pass the contents of a bean to JavaScript so that I can parse it and create a JSON object (yes, I am still on ATG 9.1). However, I am having trouble getting the data from the server side to the client side. I am new to this stuff and would appreciate any explanation, as documentation on this is scarce and not very helpful.
<dsp:tomap var="cartMap" bean="MyShoppingCartModifier.order" recursive="true"/>
<script>
var myCartMap = "${cartMap}";
//Logic (easy)
</script>
Doing this generates an "Uncaught SyntaxError: Unexpected token ILLEGAL" in my browser (Chrome).
Any wisdom will greatly help me in my quest to learn this stuff.
The problem is your usage of the tomap tag. You can't just pass in an entire tomap'd object, because the tomap tag isn't going to create a nice, parsable JSON object.
You should either:
1) Format the JSON yourself right within your tags. Choose only the values that you want from the order.
<script>
var myCart = {
total : '<dsp:valueof bean="MyShoppingCartModifier.order.priceInfo.total"/>'
...
}
// Then use myCart for something here
</script>
or 2) There's a little-known JSP-to-JSON library, found here: http://json-taglib.sourceforge.net, that is very useful. To use it, you'd create a separate page, something like orderJSON.jspf, that generates a pure JSON object from your order. Then in the page that needs this JS, you can do:
<script>
var myCart = <%@ include file="/path/to/orderJSON.jspf" %>;
// Then use myCart for something here.
</script>
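For reference, here is a rough sketch of what orderJSON.jspf could look like with that taglib. The json: tag names and the taglib URI are taken from the json-taglib project page, and the bean path is the one from the question; treat all of them as assumptions and verify against your setup:
```jsp
<%@ taglib uri="/dspTaglib" prefix="dsp" %>
<%@ taglib uri="http://www.atg.com/taglibs/json" prefix="json" %>
<dsp:page>
  <json:object>
    <json:property name="total">
      <dsp:valueof bean="MyShoppingCartModifier.order.priceInfo.total"/>
    </json:property>
    <%-- add further properties from the order as needed --%>
  </json:object>
</dsp:page>
```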

How do I insert unique meta tags per page using docpad events

I'm trying to write a DocPad plugin that will allow me to insert meta tags unique to each page, for example og:title or og:description. I've been able to accomplish this globally with the populateCollections event, but have not been able to do it per page.
I'd like this to work without the need for a template function, so that the meta tag is inserted automatically based on the document's meta. One way might be to grab the contentRendered value in the writeBefore event and do string manipulation that way, but that seems hacky.
Any ideas?
This worked for what I needed. Basically, I'm getting the rendered content right before the file is written using the writeBefore event, and doing a very simple string replace which adds the meta tags and their unique values, which is pulled from the model in the collection.
writeBefore: (opts) ->
    docpad = @docpad
    templateData = docpad.getTemplateData()
    siteUrl = templateData.site.url
    for model in opts.collection.models
        if model.get('outExtension') == 'html'
            url = @getTag('og:url', siteUrl + model.get('url'))
            title = @getTag('og:title', model.get('title'))
            # 'description' is used in the replace below but was never defined in
            # the original snippet; model.get('description') is an assumption
            description = @getTag('og:description', model.get('description'))
            content = model.get('contentRendered')
            if content
                content = content.replace(/<\/title>/, '</title>' + url + title + description)
                model.set('contentRendered', content)

# Helper
getTag: (ogName, data) ->
    return "\n <meta property=\"#{ogName}\" content=\"#{data}\" />"
Great answer, David. I'm leaving this one here in case someone faces the same issue I did.
Check whether the document's meta is broken (here, the date), and if it is, don't render:
renderBefore: (opts) ->
    # docpad instance for the log call below (not defined in the original snippet)
    docpad = @docpad
    for model in opts.collection.models
        if model.get('date').toLocaleDateString() == 'Invalid Date'
            model.set('write', false)
            docpad.log model.get('title') + ' has broken date format!\n\n\n\n\n'
    false
I am using partials with collections, adding what is needed in the document like this:
```
title: Meetings and Events
layout: page
description: "This is my custom description."
tags: ['resources']
pageOrder: 3
pageclass: rc-events
```
I needed a custom CSS class per page. You can then call it in your default template like this:
<div id="main" class="container <%= @document.pageclass %>">
It should be the same for a meta tag:
<meta name="description" content="<%= @document.description %>" />
Or check your docpad.coffee file and put together a helper function that prepares the content from a default site value combined with a @document value. Then you can just call something like:
<meta name="description" content="<%= @getPreparedDescription() %>" />
Which is built by this helper function:
# Get the prepared site/document description
getPreparedDescription: ->
    # If we have a document description, use that; otherwise use the site's description
    @document.description or @site.description
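For context, here is a rough sketch of where that helper could sit in docpad.coffee; the structure follows the standard DocPad config layout, and the site description value is just a placeholder:
```coffee
docpadConfig =
  templateData:
    site:
      # Placeholder fallback value, not from the question
      description: "Default site description"

    # Prefer the document's own description, fall back to the site default
    getPreparedDescription: ->
      @document.description or @site.description

module.exports = docpadConfig
```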

HtmlAgilityPack Div Class Contains String

I'm trying to scrape only the article text from web pages. I have discovered that the article is always surrounded by div tags. Unfortunately, the class of these div tags is slightly different for each web page. I looked into using XPath, but I don't think it will work due to the different class names. Is there a way I can get all the div tags and then read each one's class?
Examples
<div class="entry_single">
<p>I recently traveled without my notebook for the first time in ages.</p>
</div>
<div class="entry-content-pagination">
<p>Ward 9 Ald. Steven Dove</p>
</div>
That'd be easier using LINQ:
foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    string className = div.GetAttributeValue("class", string.Empty);
    // do something with the class name
}
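Building on that loop, here is a minimal sketch that keeps only the divs whose class starts with "entry", matching the two examples above; the input file name is a placeholder and the predicate will need adjusting for other sites:
```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class ArticleScraper
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("page.html"); // placeholder path

        // Keep only divs whose class starts with "entry",
        // as in the two example snippets above.
        var articleDivs = doc.DocumentNode
            .Descendants("div")
            .Where(div => div.GetAttributeValue("class", string.Empty)
                             .StartsWith("entry"));

        foreach (var div in articleDivs)
            Console.WriteLine(div.InnerText.Trim());
    }
}
```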

jQuery GET html as traversable jQuery object

This is a super simple question that I just can't seem to find a good answer to.
$.get('/myurl.html', function(response){
    console.log(response); // works!
    console.log( $(response).find('#element').text() ); // null :(
}, 'html');
I am just trying to traverse the HTML response. So far the only thing I can think of that would work is to regex out what's inside the body tags and use that string to create my traversable jQuery object. But that just seems stupid. Anyone care to point out the right way to do this?
Maybe its my html?
<html>
<head>
<title>Center</title>
</head>
<body>
<!-- tons-o-stuff -->
</body>
</html>
This also works fine but will not suit my needs:
$('#myelem').load('/myurl.html #element');
It fails because it doesn't like <html> and <body>.
Using the method described here: A JavaScript parser for DOM
$.get('/myurl.html', function(response){
    var doc = document.createElement('html');
    doc.innerHTML = response;
    console.log( $("#element", doc).text() );
}, 'html');
I think the above should work.
When jQuery parses HTML, it will normally strip out the html and body tags. So if the element you are searching for is at the top level of the document structure once those tags have been removed, the find function may not be able to locate it.
See this question for further info: Using jQuery to search a string of HTML
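A sketch of one common workaround is to wrap the fragment in a container element so that former top-level nodes become descendants, which .find() can then reach either way (the URL and element ID are the ones from the question):
```js
$.get('/myurl.html', function (response) {
    // Wrapping makes every node in the response a descendant of the <div>,
    // so .find('#element') works whether or not #element was top level.
    var $wrapped = $('<div>').html(response);
    console.log($wrapped.find('#element').text());
}, 'html');
```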
Try this:
$("#element", $(response)).text()
This searches for the element ID within $(response), treating $(response) as the context to search in.
Maybe I'm misunderstanding something, but
.find('#element')
matches elements with an ID of "element," like
<p id="element">
Since I don't see the "tons of stuff" HTML I don't understand what elements you're trying to find.

Html tags in xml (rss)

I followed http://damieng.com/blog/2010/04/26/creating-rss-feeds-in-asp-net-mvc to create an RSS feed for my blog. Everything is fine except the HTML tags in the XML document. A typical problem:
&lt;br /&gt;
instead of
<br />
Normally I would use
@Html.Raw()
or
MvcHtmlString()
But how can I fix it in an XML document created with SyndicationFeed?
Edit:
Ok, I'm starting to think that my question is pointless.
Should I just leave my RSS as it is?
With the XML element, you can wrap the text containing your HTML in a CDATA section:
<![CDATA[
your html
]]>
I don't recommend doing that, however.
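For what it's worth, System.ServiceModel.Syndication can mark item content as HTML explicitly, which lets feed readers render the markup without hand-rolled CDATA. A minimal sketch; the title and body strings are placeholders:
```csharp
using System.ServiceModel.Syndication;

// Content created with CreateHtmlContent is escaped in the XML but carries
// type="html", so readers unescape and render the markup.
var item = new SyndicationItem
{
    Title = new TextSyndicationContent("Post title"),          // placeholder
    Content = SyndicationContent.CreateHtmlContent(
        "<p>Body text with a <br /> line break</p>")            // placeholder
};
```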
Wrap the text inside the CDATA section:
var xml = '<person><name><![CDATA[<h1>john smith</h1>]]></name></person>',
    xmlDoc = $.parseXML( xml ),
    $xml = $( xmlDoc ),
    $title = $xml.find( "name" );
$($title.text()).appendTo("body");
