XPATH - add concatenation into multiple attributes


Here is my code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<link rel="alternate" hreflang="en" href="http://www.example.com/page-59.html"/>
<div id="" class="pgLinks">
«
1
<span class="paging pageDisplay">2</span>
When I run this query, it returns either the top URL on the page, "http://www.example.com/page-59.html", or, if a "1" link is present here:
1
it returns the URL from the href which is:
/example-text-2
The thing is I want the full URL:
http://www.example.com/example-text-2
I basically need to add a URL to the second part of this so it joins the second result if present, so it is something like this:
(//link[@hreflang='en'] | "SITE URL HERE" //div[@class='pgLinks']/a[.='1'])[last()]/@href
I have tried concat:
(//link[@hreflang='en'] | concat("http://www.example.com", //div[@class='pgLinks']/a[.='1']))[last()]/@href)
I have tried many other variations, including the pipe "|", but cannot figure it out.
Grateful for any help.

Assuming you only have XPath 1.0 support, you can use this expression:
concat(
    substring(
        concat(
            'http://www.example.com',
            //div[@class='pgLinks']/a[.='1']/@href
        ),
        1 div boolean(//div[@class='pgLinks']/a[.='1'])
    ),
    substring(
        //link[@hreflang='en']/@href,
        1 div not(//div[@class='pgLinks']/a[.='1'])
    )
)
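This is an application of an answer on implementing an if-else statement in XPath 1.0. When the condition is true, 1 div 1 is 1, so substring() returns the whole string; when it is false, 1 div 0 is Infinity, so substring() returns an empty string. Exactly one of the two substrings is therefore non-empty.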
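To see why the two substring() calls behave like an if-else, here is a minimal JavaScript sketch that imitates the relevant XPath 1.0 behaviour. The helpers xpathSubstring and xpathIfElse are made-up names for illustration, not part of any XPath library, and the emulation only covers the two start positions this idiom produces (1 and Infinity).

```javascript
// Hypothetical helper: imitates XPath 1.0 substring(s, start), where
// positions are 1-based and a character is kept when position >= round(start).
function xpathSubstring(s, start) {
  const from = Math.round(start);
  let out = "";
  for (let pos = 1; pos <= s.length; pos++) {
    if (pos >= from) out += s[pos - 1];
  }
  return out;
}

// Hypothetical helper: the if-else idiom from the answer.
// 1 / 1 is 1 (keep everything); 1 / 0 is Infinity (keep nothing).
function xpathIfElse(condition, thenValue, elseValue) {
  return (
    xpathSubstring(thenValue, 1 / Number(condition)) +
    xpathSubstring(elseValue, 1 / Number(!condition))
  );
}

const base = "http://www.example.com";
const pageOneHref = "/example-text-2"; // href of the "1" link, when present
const linkHref = "http://www.example.com/page-59.html"; // href of link[@hreflang='en']

// "1" link present: the concatenated URL wins.
const withPageOne = xpathIfElse(true, base + pageOneHref, linkHref);
// "1" link absent: concat() yields only the base, which is then discarded.
const withoutPageOne = xpathIfElse(false, base, linkHref);

console.log(withPageOne);
console.log(withoutPageOne);
```

In both calls one branch collapses to an empty string, so the outer concat() passes the other branch through unchanged.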

Related

<img> after <h1> does not validate as HTML 4.01 Strict

My markup is
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>klaymen - About</title>
</head>
<body>
<h1>Klaymen</h1>
<img src="resources/klaymen-about.jpg" width="200" alt="Klaymen's about picture">
</body>
When I test the document with this validator https://validator.w3.org/#validate_by_input I get the following error:
document type does not allow element "IMG" here; missing one of "P", "H1", "H2", "H3", "H4", "H5", "H6", "DIV", "ADDRESS" start-tag
Clearly I have an H1, and img is a flow content element, which is supposed to be allowed in this location, so what is the problem?

You can use a div container.
<div>
<img src="" alt="">
<span style="display: block">Text below the image</span>
</div>
This happens because, in HTML 4.01 Strict, <body> cannot directly contain an inline element such as <img>. Inline elements must be wrapped in a block-level element, so putting the image inside a <div> fixes the error.

Nokogiri Scraping Misses HTML

Nokogiri isn't grabbing anything beneath the iframe tag.
doc.search("iframe") returns only the iframe tag. doc.search("body.content-frame") returns empty. doc.errors returns empty also. Why isn't Nokogiri registering the HTML beneath the iframe? How can I grab it?
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body onunload="clearMyTimeInterval()">
<iframe id="content-frame" frameborder="0" src="/sportsbook/betting-lines/baseball/2014-08-21/?range=day" onload="javascript:checkLoadedFrame(this);" style="background-color: rgb(34, 34, 34); height: 1875px;" name="content-frame" scrolling="no" border="0">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body class="content-frame">
#ETC.......

That's because the contents of the iframe are not part of the page. In fact, they are in a completely different location (note the src attribute of the iframe). You'll have to fetch that content separately, which is how a browser would do it.

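Here is code that handles it with Mechanize (the URL is a placeholder):
require 'mechanize'
page = Mechanize.new.get "http://page_u_need"
# content fetches the document that the iframe's src points to
page.iframe_with(id: 'content-frame').content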
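As a rough sketch of what "fetching that content separately" involves, the iframe's relative src first has to be resolved against the URL of the page that contains it. The example below uses JavaScript's standard URL class rather than Ruby, and the page URL is a made-up placeholder because the question does not give the real one.

```javascript
// Assumed placeholder: the question does not state the real page URL.
const pageUrl = "http://www.example.com/sportsbook/";
// src attribute copied from the iframe in the question.
const iframeSrc = "/sportsbook/betting-lines/baseball/2014-08-21/?range=day";

// Resolve the relative src against the containing page, as a browser would.
const frameUrl = new URL(iframeSrc, pageUrl).href;
console.log(frameUrl);
```

The resolved URL is the address a second request would need to fetch in order to get the markup that appears under #document in the browser's inspector.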

document type does not allow element "div" here; assuming missing "object" start-tag

Error Line 18, Column 19: document type does not allow element "div" here; assuming missing "object" start-tag
Please see the page source below
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="keywords" content="Karnataka,Bangalore_Rural,Healthcare,Office_Assistant,Kerala,Ernakulam,IT_Hardware_Networking,Engineer,Sales___Marketing,Executive,Maharashtra,Mumbai_City,Retailing,Manager,Kollam,CRM_CallCentres_BPO_ITES_Med.Trans,Customer_Care,Hotel_Travel_Tourism_Airlines_Hospitality,Front_Office_Staff,Andhra_Pradesh,Hyderabad,IT_Software,Java_Developer,Pathanamthitta,Manufacturing_Industrial,Educational_Training,Teacher,Engineering_Projects"/>
<meta name="description" content="The best job oriented resume sharing system. Create and Publish your online resumes for FREE. Search and apply your dream jobs for FREE. Post your jobs for FREE."/>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<div id="fb-root"></div>

I believe you need to put the <div> inside the <body> section. In the source shown, it appears inside <head>, where a <div> is not allowed.

Javascript getElementById from parent window to child window

Hello and thank you for reading my post.
Here is what I basically want to do:
in a first HTML page ("parent.html"), there is a button;
when a user clicks the button, a new window pops up ("child.html")
AND the contents of a "div" element in the child window are updated.
The final action fails under Firefox and Chrome.
parent.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parent window document</title>
</head>
<body>
<input
type="button"
value="Open child window document"
onclick="openChildWindow()" />
<script type="text/javascript">
function openChildWindow()
{
var s_url = "http://localhost:8080/projectroot/child.html";
var s_name = "ChildWindowDocument";
var s_specs = "resizable=yes,scrollbars=yes,toolbar=0,status=0";
var childWnd = window.open(s_url, s_name, s_specs);
var div = childWnd.document.getElementById("child_wnd_doc_div_id");
div.innerHTML = "Hello from parent wnd";
}
</script>
</body>
</html>
child.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parent window document</title>
</head>
<body>
<div id="child_wnd_doc_div_id">child window</div>
</body>
</html>
IE9 => it works.
Firefox 13.0.1 => it doesn't work. Error message: "div is null".
Chrome 20.0.1132.47 m => doesn't work.
Do you understand that behaviour?
Can you help me make it work in these three cases?
Thank you and best regards.

I think the child window's document has not finished loading at the moment you try to access its elements, because window.open returns immediately. You can wait for the load event instead:
childWnd.onload = function() {
    var div = childWnd.document.getElementById("child_wnd_doc_div_id");
    div.innerHTML = "Hello from parent wnd";
};
You can also take a look at the MDN doc.
A better approach to the problem may be to do the changes in the 'child'. You can access the parent window with window.opener. But you should keep in mind that the parent window could be closed so you should consider some type of local storage (e.g. cookie).
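The child-side approach could look roughly like the sketch below. It is only an illustration: applyOpenerMessage is a made-up helper, and messageForChild is an assumed property that the parent page would have to set on its own window before opening the child. It also assumes both pages share the same origin, as they do on localhost in the question.

```javascript
// Hypothetical helper intended to run inside child.html once it has loaded.
// `win` is the child's window object; `messageForChild` is an assumed property
// that the parent page sets on its own window before calling window.open().
function applyOpenerMessage(win) {
  const opener = win.opener;
  // The parent may be missing or already closed, so check before reading from it.
  if (!opener || opener.closed) return false;
  const div = win.document.getElementById("child_wnd_doc_div_id");
  if (!div) return false;
  div.innerHTML = opener.messageForChild;
  return true;
}

// In child.html this would be wired up as:
//   window.onload = function () { applyOpenerMessage(window); };
```

Because the code runs in the child after its own load event, the div is guaranteed to exist, and the closed-parent case is handled explicitly instead of throwing.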

XPath: select nodes with explicit 'xmlns' attribute

Could anyone please provide an XPath expression which selects all nodes that have an explicit 'xmlns' attribute, e.g. <html xmlns="http://www.w3.org/1999/xhtml">? //*[@xmlns] does not work because (as it turned out) xmlns is not treated as an attribute by XPath.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<title>Информация по счетам, картам</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="cache-control" content="no-cache"/>
<meta http-equiv="pragma" content="no-cache"/>
.......
I need only 'html' node here.

The technically correct answer is that it's...
Not possible. You need to distinguish between the abstract document that the source text represents and the actual source text itself. XPath operates on the abstraction, not on the source text, and the location of the xmlns pseudo-attribute is only relevant in the latter.
However...
You could sort of fake it with the following XPath 2.0 expression:
//*[not(namespace-uri()=ancestor::*/namespace-uri())]
This selects any element that does not have an ancestor in the same namespace, which theoretically means that it selects all elements where the namespace is declared. However, it won't catch namespaces that are re-declared. For example, consider this document:
<html xmlns="http://www.w3.org/1999/xhtml">
<head/>
<body>
<p xmlns="http://something">
<p xmlns="http://something"/>
</p>
</body>
</html>
The expression above selects the html element and the first p. The second p has an ancestor in the same namespace, so it's not selected, even though it specifies an xmlns.
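The selection rule and its blind spot can be traced with a small JavaScript sketch. It does not run XPath: the tree is a hand-built stand-in for the example document, and selectNamespaceRoots is a made-up function that applies the same test (no ancestor shares the element's namespace URI).

```javascript
// Hand-built stand-in for the example document: name, namespace URI, children.
const XHTML = "http://www.w3.org/1999/xhtml";
const OTHER = "http://something";
const doc = {
  name: "html", ns: XHTML, children: [
    { name: "head", ns: XHTML, children: [] },
    { name: "body", ns: XHTML, children: [
      { name: "p", ns: OTHER, children: [
        { name: "p", ns: OTHER, children: [] },
      ] },
    ] },
  ],
};

// Hypothetical helper: keeps every element whose namespace URI does not
// occur on any of its ancestors, mirroring the XPath predicate.
function selectNamespaceRoots(node, ancestorNamespaces = [], found = []) {
  if (!ancestorNamespaces.includes(node.ns)) found.push(node);
  for (const child of node.children) {
    selectNamespaceRoots(child, [...ancestorNamespaces, node.ns], found);
  }
  return found;
}

const selectedNames = selectNamespaceRoots(doc).map((n) => n.name);
console.log(selectedNames);
```

Only html and the outer p are selected; the inner p is skipped even though its source text carries its own xmlns, which is the limitation described above.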

This should not be possible, because
<a xmlns="http://www.org/1"> <b/> </a>
is equivalent to
<a xmlns="http://www.org/1"> <b xmlns="http://www.org/1"/> </a>
