Nokogiri query for HTML comments contained in JavaScript? - ruby

I have the following HTML and am trying to get the comments in the script nodes:
<html>
<head>
<script language="JavaScript" type="text/javascript">
<!--
url = 'http://someurl.com';
-->
</script>
</head>
</html>
Using this, I get the script nodes:
javascript_code = doc.xpath("/html/head/script")
But when I add comment() to the XPath, it returns nothing:
javascript_code = doc.xpath("/html/head/script/comment()")
I have no idea why this is not working; it seems like it should be simple. Is it possible to get the comment?

If you parse the document as XML, Nokogiri will find the comment. If you parse it as HTML, however, Nokogiri puts the entire contents of the script tag into a CDATA section, so the comment() node test matches nothing. You can then parse the comment out of the script's text yourself.
require 'rubygems'
require 'nokogiri'
body = DATA.read
doc = Nokogiri::XML(body)
puts doc.search('/html/head/script/comment()').text.strip
# puts "url = 'http://someurl.com';"
doc = Nokogiri::HTML(body)
puts doc.search('/html/head/script').text.strip
# puts "<!--\n url = 'http://someurl.com';\n -->"
__END__
<html>
<head>
<script language="JavaScript" type="text/javascript">
<!--
url = 'http://someurl.com';
-->
</script>
</head>
</html>
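As a rough sketch of the "parse it out" step: once the script text has been read from the HTML-parsed document as above, the comment markers can be stripped with a regular expression. The helper name strip_html_comment_wrapper is made up for this example, and it assumes the script contains at most one <!-- ... --> wrapper.

```ruby
# Hypothetical helper (not part of Nokogiri): removes a single "<!-- ... -->"
# wrapper from script text and returns the JavaScript inside it.
# If no wrapper is found, the text is returned unchanged (apart from stripping).
def strip_html_comment_wrapper(script_text)
  match = script_text.match(/<!--(.*?)-->/m)
  (match ? match[1] : script_text).strip
end

# This string stands in for doc.search('/html/head/script').text
script_text = "\n<!--\nurl = 'http://someurl.com';\n-->\n"
puts strip_html_comment_wrapper(script_text)
```

Working on the plain string keeps the example independent of how the document was parsed; the same helper could be applied to each script node's text in turn.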

Related

How to format HTML returned by Verify.PlayWright for better comparison

I am using Verify.PlayWright to take HTML element snapshots. When the comparison opens, all the HTML is on one line, which makes it hard to see the differences. Is there a way to format the HTML to get a nicer comparison?
var root = await page.QuerySelectorAsync("#sectionContainer .tree-root");
await Verifier.Verify(root);
You can use Verify.AngleSharp. It has a feature that pretty prints HTML (https://github.com/VerifyTests/Verify.AngleSharp#pretty-print) for comparison purposes.
Install https://nuget.org/packages/Verify.AngleSharp/
Call VerifyAngleSharpDiffing.Initialize() once at assembly load time.
Use PrettyPrintHtml in your test:
[Test]
public Task PrettyPrintHtml()
{
var html = @"<!DOCTYPE html>
<html><body><h1>My First Heading</h1>
<p>My first paragraph.</p></body></html>";
return Verifier.Verify(html)
.UseExtension("html")
.PrettyPrintHtml();
}
which will produce a verified file containing
<!DOCTYPE html>
<html>
<head></head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>

Why does the browser tell me Parse is not defined even though I have imported it in the head of the HTML file?

I'm trying to connect my HTML files with the Parse server. I followed the directions in the back4app guides and added the following code to the head of index.html, but the browser keeps telling me Parse is not defined.
<script src="https://npmcdn.com/parse/dist/parse.min.js"></script>
<script type="text/javascript">
Parse.serverURL = "https://parseapi.back4app.com";
Parse.initialize(
"MY_APP_ID",
"MY_JS_KEY"
);
</script>
Can you please test the code below?
<script src="https://cdnjs.cloudflare.com/ajax/libs/parse/2.1.0/parse.js"></script>
<script type="text/javascript">
function myFunction() {
Parse.initialize("APP_ID", "JS_KEY");
Parse.serverURL = 'https://parseapi.back4app.com/';
}
</script>

Ruby Mechanize gem not following meta refresh

I have a Ruby 2.2 automation script that uses Mechanize to log in to Google Payments. When I try to access the url Mechanize stops on the meta refresh. The content of the page is:
<!DOCTYPE html>
<html>
<head>
<title>Redirecting...</title>
<script type="text/javascript" language="javascript">
var url = 'https:\/\/accounts.google.com\/ServiceLogin?
service\x3dbilling\x26passive\x3d1209600\x26continue\x3dhttps:\/\/payments.google.com\/
payments\/home%23__HASH__\x26followup\x3dhttps:\/\/payments.google.com\/payments\/
home'; var fragment = ''; if (self.document.location.hash) {fragment = self.document.
location.hash.replace(/^#/,'');}url = url.replace(new RegExp("__HASH__", 'g'),
encodeURIComponent(fragment));window.location.assign(url);
</script><noscript><meta
http-equiv="refresh" content="0; url='https://accounts.google.com/ServiceLogin?
service=billing&passive=1209600&continue=https://payments.google.com
/payments/home&followup=https://payments.google.com/payments/home'"></meta>
</noscript></head>
<body></body>
</html>
Here is the part of my script to get to the login screen:
@agent = Mechanize.new
@agent.follow_meta_refresh = true
page = @agent.get("http://payments.google.com/payments/home")
puts page.content
The page.content at the end shows only the above HTML; the meta refresh is not followed. Any suggestions on how I can follow it would be greatly appreciated.
Assuming the script isn't really line-wrapped like that, you can pull the URL out of the script yourself and request it:
url = page.body[/url = '(.*?)'/, 1]
page = @agent.get url
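One thing to watch with this approach: the URL captured from the script still carries the JavaScript string escapes shown in the page (\/ and \x3d, \x26), so it may need unescaping before it is requested. Here is a minimal sketch of that, assuming the URL sits on one line; redirect_url_from_script is a hypothetical helper name, and the shortened URL in the sample body is only for illustration.

```ruby
# Hypothetical helper: extracts the redirect URL from the inline script and
# undoes the JavaScript escapes seen in the page (\xNN hex escapes and \/).
# Returns nil when no "url = '...'" assignment is found.
def redirect_url_from_script(body)
  raw = body[/url = '(.*?)'/, 1]
  return nil if raw.nil?

  # \x3d -> "=", \x26 -> "&"
  unescaped = raw.gsub(/\\x(\h\h)/) { Regexp.last_match(1).hex.chr }
  # \/ -> "/"
  unescaped.gsub('\\/', '/')
end

# Sample body with a shortened URL; the quoted heredoc keeps backslashes literal.
body = <<~'HTML'
  <script type="text/javascript">
  var url = 'https:\/\/accounts.google.com\/ServiceLogin?service\x3dbilling\x26passive\x3d1209600';
  </script>
HTML

puts redirect_url_from_script(body)
# prints https://accounts.google.com/ServiceLogin?service=billing&passive=1209600
```

With Mechanize, the returned string could then be passed to @agent.get in place of the raw capture. The __HASH__ placeholder in the real page is left untouched by this sketch.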

Read/write to Parse Core db from Google Apps Script

I'm just starting to use Parse Core (as Google's ScriptDb is being decommissioned soon) and am having some trouble.
So I'm able to get Parse Core db to read/write using just a standard HTML page as shown below:
<!doctype html>
<head>
<meta charset="utf-8">
<title>My Parse App</title>
<meta name="description" content="My Parse App">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="css/reset.css">
<link rel="stylesheet" href="css/styles.css">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<script type="text/javascript" src="http://www.parsecdn.com/js/parse-1.2.18.min.js"></script>
</head>
<body>
<div id="main">
<h1>You're ready to use Parse!</h1>
<p>Read the documentation and start building your JavaScript app:</p>
<ul>
<li>Parse JavaScript Guide</li>
<li>Parse JavaScript API Documentation</li>
</ul>
<div style="display:none" class="error">
Looks like there was a problem saving the test object. Make sure you've set your application ID and javascript key correctly in the call to <code>Parse.initialize</code> in this file.
</div>
<div style="display:none" class="success">
<p>We've also just created your first object using the following code:</p>
<code>
var TestObject = Parse.Object.extend("TestObject");<br/>
var testObject = new TestObject();<br/>
testObject.save({foo: "bar"});
</code>
</div>
</div>
<script type="text/javascript">
Parse.initialize("PyMFUxyBxR8IDgndjZ378CeEXH2c6WLK1wK2JHYX", "IgiMfiuy3LFjzH0ehmyf5Rkti8AmVtwcGqc6nttN");
var TestObject = Parse.Object.extend("TestObject");
var testObject = new TestObject();
testObject.save({foo: "bar"}, {
success: function(object) {
$(".success").show();
},
error: function(model, error) {
$(".error").show();
}
});
</script>
</body>
</html>
However, when I try to serve that up using the HtmlService shown below, I get no response from Parse. Parse Core.html basically has all of the code I have above (the only thing I changed was to remove the CSS calls).
function doGet() {
var htmlPage = HtmlService.createTemplateFromFile('Parse Core.html')
.evaluate()
.setSandboxMode(HtmlService.SandboxMode.NATIVE)
.setTitle('Parse Core Test');
return htmlPage;
}
Link to ParseDb Library for Apps Script
Here is the key to add the library: MxhsVzdWH6ZQMWWeAA9tObPxhMjh3Sh48
Install that library and it allows you to use most of the same methods that were used by ScriptDb. As far as saving and querying go, they are almost identical. Make sure to read the library's notes on how to add the applicationId and restApiKey. It is a little different in that you can silo data by classes, which must be defined in the call to Parse.
Bruce here is leading the way on database connections for Apps Script. He has plenty of documentation on using Parse.com, and also his own DbConncection Drive that would allow you to use a number of back-end systems.
Excel Liberation - Bruce's Site.

HTMLUnit HtmlPage.getBody() returns null even though the response contains a <body> tag

I'm using HTMLUnit in Java to extract information from a website.
I ran into a strange phenomenon where the page is not fully parsed into the DOM tree.
After the following:
HtmlPage lineHours = (HtmlPage) _webClient.getTopLevelWindows().get(1).getEnclosedPage();
Watching the expression lineHours.asXml() results in the following (... marks omitted sensitive data):
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<script ...>
</script>
</head>
</html>
While printing lineHours.getWebResponse().getContentAsString() results in the following:
<html>
<head>
<script ...>
</script>
</head>
</html>
<body>
<div> ...
In short, the body tag is not parsed into the DOM tree, and therefore all XPath queries and helper methods such as HtmlPage.getBody() fail. In a regular browser the page renders well.
Any ideas?
Thanks
Tomer
This was eventually solved by parsing the DOM tree using a Xerces parser and retrieving the result from it.
