HTMLUnit HtmlPage.getBody() returns null even though the response contains a <body> tag

I'm using HTMLUnit in Java to extract information from website.
Ran into a strange phenom where the page is not fully parsed into the DOM tree.
After the following:
HtmlPage lineHours = (HtmlPage) _webClient.getTopLevelWindows().get(1).getEnclosedPage();
Evaluating the expression lineHours.asXml() in the debugger gives the following (... marks omitted sensitive data):
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<script ...>
</script>
</head>
</html>
Whereas printing lineHours.getWebResponse().getContentAsString() gives the following:
<html>
<head>
<script ...>
</script>
</head>
</html>
<body>
<div> ...
In short, the body tag is not parsed into the DOM tree, and therefore all XPath queries and helper methods such as HtmlPage.getBody() fail. Note that in the raw response the <body> tag comes after the closing </html> tag. In a regular browser the page renders fine.
Any ideas?
Thanks
Tomer

This was eventually solved by parsing the response into a DOM tree with a Xerces parser and reading the data from that tree instead.
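As an alternative to re-parsing with Xerces, here is a minimal, stdlib-only sketch that detects and repairs the specific malformation seen in the raw response (a closing </html> before <body>). It assumes that is the only problem in the markup; the class StrayBodyFix and its methods are hypothetical names, not part of HTMLUnit, and feeding the repaired string back into HTMLUnit is not shown.

```java
/**
 * Hypothetical helper (not part of HTMLUnit): detects and repairs markup where
 * the closing </html> tag appears before <body>, as in the raw response above.
 */
public class StrayBodyFix {

    /** Case-insensitive indexOf that keeps indices aligned with the original string. */
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /** True when a <body> start tag occurs after the first </html> closing tag. */
    public static boolean hasStrayBody(String raw) {
        int htmlEnd = indexOfIgnoreCase(raw, "</html>", 0);
        return htmlEnd >= 0 && indexOfIgnoreCase(raw, "<body", htmlEnd) >= 0;
    }

    /**
     * Drops the premature </html> and, if no other </html> follows,
     * appends one at the end so the body sits inside the html element.
     */
    public static String moveStrayBody(String raw) {
        if (!hasStrayBody(raw)) {
            return raw;
        }
        int htmlEnd = indexOfIgnoreCase(raw, "</html>", 0);
        String repaired = raw.substring(0, htmlEnd) + raw.substring(htmlEnd + "</html>".length());
        if (indexOfIgnoreCase(repaired, "</html>", 0) < 0) {
            repaired = repaired + "</html>";
        }
        return repaired;
    }

    public static void main(String[] args) {
        String raw = "<html><head><script></script></head></html><body><div>data</div></body>";
        System.out.println(hasStrayBody(raw));  // true
        System.out.println(moveStrayBody(raw)); // <html><head><script></script></head><body><div>data</div></body></html>
    }
}
```

The search uses regionMatches instead of lowercasing the whole string, so the indices always refer to the original response text.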

Related

How to format HTML returned by Verify.PlayWright for better comparison

I am using Verify.PlayWright to take HTML element snapshots. When the comparison opens, all the HTML is on one line, which makes it hard to see the differences. Is there a way to format the HTML to get a nicer comparison?
var root = await page.QuerySelectorAsync("#sectionContainer .tree-root");
await Verifier.Verify(root);
You can use Verify.AngleSharp. It has a feature that pretty prints HTML (https://github.com/VerifyTests/Verify.AngleSharp#pretty-print) for comparison purposes.
1. Install https://nuget.org/packages/Verify.AngleSharp/
2. Call VerifyAngleSharpDiffing.Initialize() once at assembly load time.
3. Use PrettyPrintHtml in your test:
[Test]
public Task PrettyPrintHtml()
{
var html = @"<!DOCTYPE html>
<html><body><h1>My First Heading</h1>
<p>My first paragraph.</p></body></html>";
return Verifier.Verify(html)
.UseExtension("html")
.PrettyPrintHtml();
}
which will produce a verified file containing
<!DOCTYPE html>
<html>
<head></head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>

Cannot print this document yet, it is still being loaded - Firefox Printer Error

My API generates a dynamic HTML document and writes it into a popup window like so:
var popup = window.open('', "_blank", 'toolbar=0,location=0,menubar=1,scrollbars=1');
popup.document.write(result);
After the document is reviewed by a user, they can print it by calling
window.print();
Chrome handles it without any problems, but Firefox shows a Printer error:
"Cannot print this document yet, it is still being loaded"
The print window opens only if I hit Ctrl+R.
It appears that $(document).ready() never fires in Firefox and the page keeps waiting for something to load.
The status bar in the popup says "Read fonts.gstatic.com".
Here's an abbreviated version of the document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="https://fonts.googleapis.com/css?family=Orbitron|Jura|Prompt" rel="stylesheet">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.0/jquery.min.js"></script>
<title>Invoice #15001</title>
<style>
...
</style>
</head>
<body>
<div id="invoice_body" >
...
</div><!-- Invoice body -->
</body>
</html>
I have a feeling it has something to do with Google Fonts. Any help is appreciated.
When you pass "" as the URL to window.open, Firefox loads 'about:blank' at which point script security is likely preventing you from pulling in external resources via http or https ...
I was able to reproduce your problem and got the same error when trying to print. I got it working by using a data URL when calling window.open ...
Based on your example, result is a string containing the HTML for the popup, so you would call window.open like this, and no longer use document.write for anything:
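// encodeURIComponent keeps characters such as # and % in the HTML from breaking the data URL
var popup = window.open("data:text/html;charset=utf-8," + encodeURIComponent(result), "printPopup", "toolbar=0,location=0,menubar=0,scrollbars=1");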
I tested this with result being a string containing:
<html><head>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
<style> body { font-family: 'Tangerine', serif; font-size: 48px; } </style>
<title>Test</title></head>
<body>
<div>Testing testing</div>
<div>Print</div>
</body>
</html>
And clicking the print link worked as expected...
I had to go the extra mile, but here is what I did:
I added server-side code that saves an HTML file and returns a link to that file instead of the HTML content:
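// Render the invoice template into a string instead of sending it to the browser
ob_start();
include('ezts_invoice_template.php');
$dom = ob_get_clean();
// Write it to a per-session temporary file
$ezts_file_path = EZTS_PLUGIN_PATH.'kernel/tmp/'.session_id().'_tmp.html';
$ezts_file = fopen($ezts_file_path, 'w+');
$result = fwrite($ezts_file, $dom);
fclose($ezts_file);
// Return the public URL of that file as JSON
echo json_encode(array('result' => 'success', 'file' => plugin_dir_url(__FILE__).'tmp/'.session_id().'_tmp.html'));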
In JS, I open the popup using the link returned from PHP:
var popup = window.open(result.file, "_blank", 'toolbar=0,location=0,menubar=0,scrollbars=1');
Finally, in the template file I added an event listener that requests deletion of the temporary file when the window is closed:
window.addEventListener('beforeunload', function(event) {
window.opener.eztsApiRequest('deleteTempFile',
'',
function(result, status){ console.log(result); });
}, false);
It's not as easy, but it works great.

Polymer iron-ajax data binding example not working

I'm having problems with iron-ajax and data binding in Polymer 1.0.2. Even a slightly modified example from the Polymer documentation doesn't work.
Here is the code with my changes:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<script src="../../../bower_components/webcomponentsjs/webcomponents-lite.js"></script>
<link rel="import" href="../../../bower_components/polymer/polymer.html">
<link rel="import" href="../../../bower_components/iron-ajax/iron-ajax.html">
</head>
<body>
<template is="dom-bind">
<iron-ajax
auto
url="http://jsonplaceholder.typicode.com/posts/"
lastResponse="{{data}}"
handleAs="json">
</iron-ajax>
<template is="dom-repeat" items="{{data}}">
<div><span>{{item.id}}</span></div>
</template>
</template>
<script>
(function (document) {
'use strict';
var app = document.querySelector('#app');
window.addEventListener('WebComponentsReady', function() {
var ironAjax = document.querySelector('iron-ajax');
ironAjax.addEventListener('response', function() {
console.log(ironAjax.lastResponse[0].id);
});
ironAjax.generateRequest();
});
})(document);
</script>
</body>
</html>
All I changed was entering a URL to get a real JSON response and setting the auto and handleAs properties. I also added a small script with a listener for the response event. The listener is working fine and handles the response, but the spans in the dom-repeat template aren't rendered.
I'm using Polymer 1.0.2 and iron-elements 1.0.0
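It seems the documentation example you used is missing a - character in the lastResponse attribute.
You must change lastResponse to last-response.
Look at this example from the iron-ajax GitHub page.
When you set a property through an attribute on an element, you have to convert the camelCase property name to a dash-case attribute name, i.e.:
lastResponse maps to last-response
HTML attribute names are case-insensitive, so lastResponse in markup is read as lastresponse, which does not match the lastResponse property. The same applies to handleAs, which should be written handle-as.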
Property name to attribute name mapping
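The naming rule can be illustrated with a minimal sketch, written in Java for illustration only (Polymer performs this conversion internally in JavaScript). The class CaseMapSketch and its methods are hypothetical names, and the sketch assumes plain ASCII camelCase property names.

```java
/** Hypothetical illustration of the property <-> attribute naming rule. */
public class CaseMapSketch {

    /** camelCase property name -> dash-case attribute name, e.g. lastResponse -> last-response. */
    public static String camelToDash(String property) {
        StringBuilder out = new StringBuilder();
        for (char c : property.toCharArray()) {
            if (Character.isUpperCase(c)) {
                // Each uppercase letter starts a new dash-separated word.
                out.append('-').append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** dash-case attribute name -> camelCase property name, e.g. handle-as -> handleAs. */
    public static String dashToCamel(String attribute) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : attribute.toCharArray()) {
            if (c == '-') {
                upperNext = true; // drop the dash and capitalize the next letter
            } else {
                out.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(camelToDash("lastResponse"));  // last-response
        System.out.println(camelToDash("handleAs"));      // handle-as
        System.out.println(dashToCamel("last-response")); // lastResponse
    }
}
```

Note that dashToCamel("lastresponse") returns "lastresponse", which is not the lastResponse property iron-ajax exposes.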

Read/write to Parse Core db from Google Apps Script

I'm just starting to use Parse Core (as Google's ScriptDB is being decommissioned soon) and am having some trouble.
I'm able to read from and write to the Parse Core DB using a standard HTML page, as shown below:
<!doctype html>
<head>
<meta charset="utf-8">
<title>My Parse App</title>
<meta name="description" content="My Parse App">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="css/reset.css">
<link rel="stylesheet" href="css/styles.css">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<script type="text/javascript" src="http://www.parsecdn.com/js/parse-1.2.18.min.js"></script>
</head>
<body>
<div id="main">
<h1>You're ready to use Parse!</h1>
<p>Read the documentation and start building your JavaScript app:</p>
<ul>
<li>Parse JavaScript Guide</li>
<li>Parse JavaScript API Documentation</li>
</ul>
<div style="display:none" class="error">
Looks like there was a problem saving the test object. Make sure you've set your application ID and javascript key correctly in the call to <code>Parse.initialize</code> in this file.
</div>
<div style="display:none" class="success">
<p>We've also just created your first object using the following code:</p>
<code>
var TestObject = Parse.Object.extend("TestObject");<br/>
var testObject = new TestObject();<br/>
testObject.save({foo: "bar"});
</code>
</div>
</div>
<script type="text/javascript">
Parse.initialize("PyMFUxyBxR8IDgndjZ378CeEXH2c6WLK1wK2JHYX", "IgiMfiuy3LFjzH0ehmyf5Rkti8AmVtwcGqc6nttN");
var TestObject = Parse.Object.extend("TestObject");
var testObject = new TestObject();
testObject.save({foo: "bar"}, {
success: function(object) {
$(".success").show();
},
error: function(model, error) {
$(".error").show();
}
});
</script>
</body>
</html>
However, when I serve the same page through HtmlService as shown below, I get no response from Parse. Parse Core.html contains essentially all of the code above (the only change is that I removed the CSS links).
function doGet() {
var htmlPage = HtmlService.createTemplateFromFile('Parse Core.html')
.evaluate()
.setSandboxMode(HtmlService.SandboxMode.NATIVE)
.setTitle('Parse Core Test');
return htmlPage;
}
Link to ParseDb Library for Apps Script
Here is the key to add the library: MxhsVzdWH6ZQMWWeAA9tObPxhMjh3Sh48
Install that library and it lets you use most of the same methods ScriptDb provided; saving and querying are almost identical. Make sure to read the library's notes on how to add the applicationId and restApiKey. It is a little different in that you can silo data by class, and the class must be specified in the call to Parse.
Bruce is leading the way on database connections for Apps Script. He has plenty of documentation on using Parse.com, as well as his own DbConnection Drive, which would allow you to use a number of back-end systems.
Excel Liberation - Bruce's Site.

Ajax using <g:remoteLink> in Grails

Reading through the Grails docs (see here http://grails.org/doc/latest/guide/theWebLayer.html#ajax), I was led to believe that I could use Ajax to update a div using the following syntax:
My view (Ajax/index.gsp)
<!doctype html>
<head>
<meta name="layout" content="main"/>
</head>
<body>
<div id="error"></div>
<div id="message"></div>
<g:remoteLink action="retrieveMessage" update="message">Ajax magic... Click here</g:remoteLink>
</body>
</html>
My controller (AjaxController):
package genericsite
class AjaxController {
def index() { }
def retrieveMessage() {
render "Weeee! Ajax!"
}
}
However, when I click the link, it just navigates to a page showing "Weeee! Ajax!". I know how to do this the typical jQuery way, but this is slightly more convenient...
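The default "main" layout doesn't include a JavaScript library, so the Ajax call generated by remoteLink can't run and the link simply navigates to the action, which is why you land on a page showing only the rendered text. If you want to use remoteLink or any of its related tags, you'll need to add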
<r:require module="jquery"/>
or (if you're on a pre-2.0 version of Grails or not using the resources plugin)
<g:javascript library="jquery"/>
to the <head> section of your GSP.
