How to get all the nodes which are coming after a particular tag using Nokogiri - ruby

I want to fetch all the HTML tags that come after a particular tag. For example:
<html>
<body>
<p>one</p>
<u><p>Two</p></u>
<b><p>Three</p></b>
<p>Four</p>
<table>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
</table>
</body>
</html>
I want all the HTML tags that come after <u><p>Two</p></u>, using Nokogiri.
My result should be:
<b><p>Three</p></b>
<p>Four</p>
<table>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
</table>

The following-sibling XPath axis is what you want here. Your example isn’t valid HTML, and Nokogiri will change it when parsing it as HTML, which makes it hard to use for a demonstration. With this similar markup:
<html>
<body>
<p>one</p>
<p>Two</p>
<p>Three</p>
<p>Four</p>
<table>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
</table>
</body>
</html>
this XPath expression:
//p[.="Two"]/following-sibling::*
will select this:
<p>Three</p>
<p>Four</p>
<table>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
</table>
You might want to use node() instead of *, which will select text nodes (including whitespace-only nodes) as well as elements:
<p>Three</p>
<p>Four</p>
<table>
<tr><td>Five</td></tr>
<tr><td>Six</td></tr>
</table>
(There will be some more leading whitespace on each line if you do this; I've removed it here.)


How to Load Data From a Json Using Thymeleaf Template

I have a REST API that returns a JSON value as the output of a service call,
e.g. https://localhost:8080/getEmployees/loadAll
which returns the following JSON:
{
"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]
}
I need to load these JSON values into my Thymeleaf table.
Normally, a Spring controller can pass the values to the template as a list through the Model, and a template like the following renders them:
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="ISO-8859-1">
<title>Employee List</title>
</head>
<body>
<h1>Welcome</h1>
<br>
<h3>Employee List</h3>
<br />
<table border="1">
<tr>
<td>Employee First Name</td>
<td>Employee Last Name</td>
</tr>
<tr th:each="emp : ${empList}">
<td th:text="${emp.firstName}">First Name</td>
<td th:text="${emp.name}">Last Name</td>
</tr>
</table>
</body>
</html>
Is there a way to accomplish this with Thymeleaf using the JSON above?
You can do something like that using the following structure.
When you call the service
https://localhost:8080/getEmployees/loadAll
you will need to pass the employees data using model.addAttribute.
For instance, let's say you have the following method:
@RequestMapping(value="/getEmployees/loadAll")
String getAllEmployees(Model model) {
model.addAttribute("empList", <your service here that generates the data>);
return "pagenamehere";
}
The above method will only be executed when you make a call to the following URL: https://localhost:8080/getEmployees/loadAll
It adds your empList data as a model attribute. The returned string is the name of the page that will be loaded; use your own page containing the Thymeleaf code:
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="ISO-8859-1">
<title>Employee List</title>
</head>
<body>
<h1>Welcome</h1>
<br>
<h3>Employee List</h3>
<br />
<table border="1">
<tr>
<td>Employee First Name</td>
<td>Employee Last Name</td>
</tr>
<tr th:each="emp : ${empList}">
<td th:text="${emp.firstName}">First Name</td>
<td th:text="${emp.lastName}">Last Name</td>
</tr>
</table>
</body>
</html>
Now, thymeleaf will be able to display the given data.
I think you are a little confused. Thymeleaf templates are processed on the server side to generate HTML, so no Thymeleaf code reaches the client.
The JSON data from the API response, on the other hand, is obtained on the client side.
One option is to use JavaScript to load the API response data into an HTML table.
Another option is to modify the controller that renders the Thymeleaf template so that it fetches the JSON value itself. If you store that response in a list object (named empList in your example), you can add it to the controller response (a Model or ModelAndView object) as a template attribute.

Jmeter: Xpath to get text upto certain number of characters

From the snippet below I want to get the title as My status Report (ABCDEFGH12160916).
I have thousands of titles in my HTML.
//td[@class="dealertitle"]//text() -- this gets me
My status Report (ABCDEFGH12160916)
* Live, Billable, CRM *
I have also tried
//td[@class="dealertitle"]//text()//substring-before(text(),')')---
JMeter does not allow me to use substring-before. It says unknown node type
substring-before
Can someone please help me?
I want to get only this text, up to its end: My status Report (ABCDEFGH12160916)
<html>
<head>
<body>
<table class="secondhead" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="title">
My status Report (ABCDEFGH12160916)
<span style="font-size:14pt;color:#00FF00"> * Live, Billable, CRM *
</span>
</td>
</tr>
</tbody>
</table>
</body>
</head>
</html>
//td[@class="dealertitle"]//text()[1]
You have an extra / in your XPath query.
Your query:
//td[@class="dealertitle"]//text()
Correct query:
//td[@class="dealertitle"]/text()
Explanation: as per XPath Syntax article
/ Selects from the root node
// Selects nodes in the document from the current node that match the selection no matter where they are
See Using the XPath Extractor in JMeter guide for more details on using XPath for correlation in JMeter tests.

Jmeter: normalize-space to remove white spaces

How do I remove the extra spaces in the title? The expression below throws an error: unknown
node: normalize-space. This is the one I tried:
//td[@class="title"]/text()/normalize-space(.)
<html>
<head>
<body>
<table class="secondhead" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="title">
My status Report (ABCDEFGH12160916)
<span style="font-size:14pt;color:#00FF00"> * Live, Billable, CRM *
</span>
</td>
</tr>
</tbody>
</table>
</body>
</head>
</html>
As per XPath functions reference
fn:normalize-space(string)
fn:normalize-space()
Removes leading and trailing spaces from the specified string, and replaces all internal sequences of white space with one and returns the result. If there is no string argument it does the same on the current node
Example: normalize-space(' The XML ')
Result: 'The XML'
So you should be using the following expression instead:
normalize-space(//td[@class="title"]/text())
Check out Using the XPath Extractor in JMeter guide to learn more about dealing with XPath and JSON Path in JMeter

Nokogiri and tables

I am parsing a web page with a standard structure as follows:
<html>
<body>
<table>
<tbody>
<tr class="active">
<td>name1</td>
<td>name2</td>
<td>name3</td>
</tr>
</tbody>
</table>
</body>
</html>
For the life of me, I can't access the 'tbody' or 'tr' elements.
response = open('http://my_url')
node = Nokogiri::HTML(response).css('table')
puts node
Returns
#<Nokogiri::XML::Element:0x8294c08c name="table" attributes=[#<Nokogiri::XML::Attr:0x8294c014 name="id" value="beta-users">] children=[#<Nokogiri::XML::Text:0x82953bc0 "\n">]>
I have tried various tricks but can't seem to dig deeper down to a lower-level child than 'table'.
At best, I can get to the lowest-level Text object by using
node.children
but
node.children.text
returns "\n".
Despite searching for some hours, I am none the wiser about how to sort it out. Any thoughts?
There is a non-closed class value in your sample, it should be:
<html>
<body>
<table>
<tbody>
<tr class="active">
<td>name1</td>
<td>name2</td>
<td>name3</td>
</tr>
</tbody>
</table>
</body>
</html>
After correcting this, you can:
node = Nokogiri::HTML(response).css('table tbody tr td')
node.each {|child| puts child.text}
name1
name2
name3

Simple wkhtmltopdf conversion with framesets creating empty pdf

We need to convert/provide our html-based in-app HelpSystem to an on-disc pdf for the client to view outside of the application.
I'm trying to use wkhtmltopdf with a very basic file (3 frames with links to simple .html files) but getting an empty .pdf when I run the following from the command line:
wkhtmltopdf "C:\Program Files (x86)\wkhtmltopdf\index.html" "c:\delme\test.pdf"
I know frames are somewhat deprecated but it’s what I’ve got to deal with. Are the frames causing the empty pdf?
Index.html:
<html>
<head>
<title>Help</title>
</head>
<frameset cols="28%, 72%">
<frameset rows="8%, 92%">
<frame noresize="noresize" src="Buttons.html" name="UPPERLEFT" />
<frame noresize="noresize" src="mytest2.html" name="LOWERLEFT" />
</frameset>
<frame noresize="noresize" src="mytest.html" name="RIGHT" />
</frameset>
</html>
mytest.html:
<html>
<body>
<p>
<b>This text is bold</b>
</p>
<p>
<strong>This text is strong</strong>
</p>
<p>
<em>This text is emphasized</em>
</p>
<p>
<i>This text is italic</i>
</p>
<p>
<small>This text is small</small>
</p>
<p>This is
<sub>subscript</sub> and
<sup>superscript</sup></p>
</body>
</html>
mytest2.html:
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<h2>The blockquote Element</h2>
<p>The blockquote element specifies a section that is quoted from another source.</p>
<p>Here is a quote from WWF's website:</p>
<blockquote cite="http://www.worldwildlife.org/who/index.html">For 50 years, WWF has been protecting the future of nature. The
world’s leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United
States and close to 5 million globally.</blockquote>
<p>
<b>Note:</b> Browsers usually indent blockquote elements.</p>
<h2>The q Element</h2>
<p>The q element defines a short quotation.</p>
<p>WWF's goal is to:
<q>Build a future where people live in harmony with nature.</q> We hope they succeed.</p>
<p>
<b>Note:</b> Browsers insert quotation marks around the q element.</p>
</body>
</html>
buttons.html:
<html>
<body>
<center>
<table>
<tr>
<td>
<form method="link" action="mytest.html" target="LOWERLEFT">
<input type="submit" value="Contents" />
</form>
</td>
<td>
<form method="link" action="mytest2.html" target="LOWERLEFT">
<input type="submit" value="Index" />
</form>
</td>
</tr>
</table>
</center>
</body>
</html>
Taken from the official wkhtmltopdf issues area from a code project member’s answer; emphasis is mine:
wkhtmltopdf calculates the TOC based on the H* (e.g. H1, H2 and so on)
tags in the supplied documents. It does not recurse into frames and
iframes.. It will nest dependend on the number, to make sure that it
does the right thing, it is good to make sure that you only have
tags under a tag and not for some k larger
then 1. 2000+ files sounds like a lot. You might run out of memory
while converting the output. If it does not work for you.. you could
try using the switch to dump the outline to a xml file, to see what it
would but into a TOC.
