XPath: select nodes with explicit 'xmlns' attribute - xpath

Could anyone please provide an XPath expression that selects all nodes with an explicit 'xmlns' attribute, e.g. <html xmlns="http://www.w3.org/1999/xhtml">? //*[@xmlns] does not work because, as it turned out, XPath does not treat xmlns as an attribute.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<title>Информация по счетам, картам</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="cache-control" content="no-cache"/>
<meta http-equiv="pragma" content="no-cache"/>
.......
I need only 'html' node here.

The technically correct answer is that it's...
Not possible. You need to distinguish between the abstract document that the source text represents and the actual source text itself. XPath operates on the abstraction, not on the source text, and the location of the xmlns pseudo-attribute is only relevant in the latter.
However...
You could sort of fake it with the following XPath 2.0 expression:
//*[not(namespace-uri()=ancestor::*/namespace-uri())]
This selects any element that does not have an ancestor in the same namespace, which theoretically means that it selects all elements where the namespace is declared. However, it won't catch namespaces that are re-declared. For example, consider this document:
<html xmlns="http://www.w3.org/1999/xhtml">
<head/>
<body>
<p xmlns="http://something">
<p xmlns="http://something"/>
</p>
</body>
</html>
The expression above selects the html element and the first p. The second p has an ancestor in the same namespace, so it's not selected, even though it specifies an xmlns.
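Since the declaration site only exists at the source-text level, one way to recover it is to step outside XPath and listen to the parser's namespace events. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree is an acceptable parser for the input; the function name elements_declaring_default_ns is made up for illustration. It relies on iterparse reporting a "start-ns" event just before the "start" event of the element that carries the declaration.

```python
import io
import xml.etree.ElementTree as ET

# The example document from the answer above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head/>
<body>
<p xmlns="http://something">
<p xmlns="http://something"/>
</p>
</body>
</html>"""


def elements_declaring_default_ns(xml_text):
    """Return (tag, uri) for every element that carries its own xmlns="..."."""
    found = []
    pending = None  # default-namespace URI declared on the element about to start
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            if prefix == "":  # an empty prefix means a default (xmlns="...") declaration
                pending = uri
        elif pending is not None:  # "start" event for the declaring element
            found.append((payload.tag, pending))
            pending = None
    return found


for tag, uri in elements_declaring_default_ns(SAMPLE):
    print(tag, "declares", uri)
```

Unlike the XPath expression, this reports the html element and both p elements, because the parser emits a namespace event for every declaration, including redundant ones.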

This should not be possible, because
<a xmlns="http://www.org/1"> <b/> </a>
is equivalent to
<a xmlns="http://www.org/1"> <b xmlns="http://www.org/1"/> </a>
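The equivalence can be checked with any namespace-aware parser. A small sketch in Python using the standard-library xml.etree.ElementTree (the helper name qualified_names is an illustration only): both documents produce the same namespace-qualified element names, which is all that XPath gets to see.

```python
import xml.etree.ElementTree as ET

INHERITED = '<a xmlns="http://www.org/1"> <b/> </a>'
REDECLARED = '<a xmlns="http://www.org/1"> <b xmlns="http://www.org/1"/> </a>'


def qualified_names(xml_text):
    """List the namespace-qualified tag of every element in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Both calls yield the same list, so the parsed trees are indistinguishable.
print(qualified_names(INHERITED))
print(qualified_names(REDECLARED))
```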

Related

How can I show/hide a FreeMarker template (FTL)

I'm new to FreeMarker templates. In the example below, I want to show the <@greet person="${name}"!/> macro for only 10 seconds and then remove it. Any idea how I can do that?
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>${title} | Kweet</title>
</head>
<body>
<@greet person="${name}"!/>
<#include "/copyright_footer.html">
</body>
</html>
<#macro greet person color="black">
<font size="+2" color="${color}">Hello ${person}!</font>
</#macro>
You can do that with JavaScript, which you put into the template. It has nothing to do with FreeMarker, which only generates the page before it is sent to the browser.

<img> after <h1> does not validate as HTML 4.01 Strict

My markup is
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>klaymen - About</title>
</head>
<body>
<h1>Klaymen</h1>
<img src="resources/klaymen-about.jpg" width="200" alt="Klaymen's about picture">
</body>
When I test the document with this validator https://validator.w3.org/#validate_by_input I get the following error:
document type does not allow element "IMG" here; missing one of "P", "H1", "H2", "H3", "H4", "H5", "H6", "DIV", "ADDRESS" start-tag
Clearly I have an H1, and img is a flow content element which is supposed to be allowed in this location, so what is the problem?
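You can use a div container.
<div>
<img src="resources/klaymen-about.jpg" width="200" alt="Klaymen's about picture">
<span style="display: block">Text below the image</span>
</div>
This happens because in HTML 4.01 Strict the body may directly contain only block-level elements (plus script, ins and del). An inline element like <img> is therefore not allowed as a direct child of <body>, no matter what precedes it. Putting it inside a block element like <div> fixes the error.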

How do I repeat the form POST request programmatically

When I visit the site http://www.jetstar.com/au/en/home,
fill in the form and submit it,
the browser sends a POST request and is then redirected to a new HTML page showing the ticket prices.
I get the expected result in the response to the second (GET) request.
However, when I try to repeat the POST request with Ruby or Charles,
I get a 302 Found response instead.
I don't understand why.
Ruby
q_prams = {
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListFareTypes" =>"I",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$DropDownListMarketDay1" =>"9",
~~~
"pageToken" =>"sLkmnwXwAsY=",
"ControlGroupSearchView$AvailabilitySearchInputSearchView$fromCS" =>"yes"
}
res = RestClient.post 'https://booknow.jetstar.com/Search.aspx', q_prams
POST request params
ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListCurrency=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListFareTypes=I&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketDay1=18&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketDay2=1&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketDay3=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketMonth1=2015-6&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketMonth2=1968-1&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListMarketMonth3=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListPassengerType_ADT=1&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListPassengerType_CHD=0&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24DropDownListPassengerType_INFANT=0&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24RadioButtonMarketStructure=OneWay&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketDestination1=MEL&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketDestination2=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketDestination3=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketOrigin1=NAN&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketOrigin2=&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24TextBoxMarketOrigin3=&ControlGroupSearchView%24ButtonSubmit=&__VIEWSTATE=&culture=en-AU&date_picker=&go-booking=&pageToken=sLkmnwXwAsY%3D&ControlGroupSearchView%24AvailabilitySearchInputSearchView%24fromCS=yes&_pe_39b5379c652b_9df496572198=null&locale=en-AU
The first response (success), which I cannot reproduce programmatically:
<!doctype html><!--paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/--><!--[if lt IE 7 ]> <html lang="en" class="no-js ie6"> <![endif]--><!--[if IE 7 ]> <html lang="en" class="no-js ie7"> <![endif]--><!--[if IE 8 ]> <html lang="en" class="no-js ie8"> <![endif]--><!--[if IE 9 ]> <html lang="en" class="no-js ie9"> <![endif]--><!--[if (gt IE 9)|!(IE)]><!--><html lang="en" class="no-js"><!--<![endif]--><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<head class="SB">
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Jetstar Airways Cheap Flights, Low Fares all day everyday from the world's best Cheap Fare airline</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon">
<link rel="icon" href="favicon.ico" type="image/ico">
<link rel="SHORTCUT ICON" href="favicon.ico">
...
This is most probably because of CSRF verification. Sites use CSRF tokens to make sure that a form submission came from the same site.
In your case, you are submitting the form from a different source, so the verification fails.
If you want to do this, I recommend scraping the site with a library like Capybara, which loads the form and submits it the way a browser would.
read more about CSRF here
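If the failure really is token-related, the token has to be read from a freshly loaded form page (in the same session) rather than copied from an old request. The sketch below shows only that extraction step, in Python with the standard library; the same steps apply with RestClient. FORM_PAGE is a made-up stand-in for the real page, whose actual markup is not shown in the question, and HiddenFieldCollector and hidden_fields are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical stand-in for the form page; the real markup is an assumption.
FORM_PAGE = """
<form action="Search.aspx" method="post">
  <input type="hidden" name="__VIEWSTATE" value="">
  <input type="hidden" name="pageToken" value="sLkmnwXwAsY=">
  <input type="text" name="culture" value="en-AU">
</form>
"""


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html_text):
    collector = HiddenFieldCollector()
    collector.feed(html_text)
    collector.close()
    return collector.fields


# Merge the fresh hidden fields into the parameters you want to submit.
params = {"culture": "en-AU", **hidden_fields(FORM_PAGE)}
print(urlencode(params))
```

The encoded body would then be posted with the cookies received while loading the form page; whether that is sufficient for this particular site is not something the question lets us confirm.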

Nokogiri Scraping Misses HTML

Nokogiri isn't grabbing anything beneath the iframe tag.
doc.search("iframe") returns only the iframe tag. doc.search("body.content-frame") returns empty. doc.errors returns empty also. Why isn't Nokogiri registering the HTML beneath the iframe? How can I grab it?
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body onunload="clearMyTimeInterval()">
<iframe id="content-frame" frameborder="0" src="/sportsbook/betting-lines/baseball/2014-08-21/?range=day" onload="javascript:checkLoadedFrame(this);" style="background-color: rgb(34, 34, 34); height: 1875px;" name="content-frame" scrolling="no" border="0">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body class="content-frame">
#ETC.......
That's because the contents of the iframe are not part of the page. In fact, they are in a completely different location (note the src attribute of the iframe). You'll have to fetch that content separately, which is how a browser would do it.
Here is code that handles it with Mechanize (replace the URL with the real page):
require 'mechanize'
page = Mechanize.new.get "http://page_u_need"
page.iframe_with(id: 'content-frame').content
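The "fetch it separately" step boils down to reading the iframe's src and resolving it against the URL of the outer page. A minimal sketch of that resolution in Python with the standard library; the host example.com is a placeholder, since the question does not give the real page URL, and IframeSrcCollector and iframe_urls are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder URL for the outer page; the real one is not given in the question.
PAGE_URL = "http://example.com/sportsbook/"
PAGE_HTML = (
    '<body><iframe id="content-frame" '
    'src="/sportsbook/betting-lines/baseball/2014-08-21/?range=day">'
    "</iframe></body>"
)


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html_text):
    """Return absolute URLs for all iframes found in html_text."""
    collector = IframeSrcCollector()
    collector.feed(html_text)
    collector.close()
    return [urljoin(page_url, src) for src in collector.sources]


print(iframe_urls(PAGE_URL, PAGE_HTML))
```

Each resulting URL is a separate document that has to be requested and parsed on its own.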

document type does not allow element "div" here; assuming missing "object" start-tag

Error Line 18, Column 19: document type does not allow element "div" here; assuming missing "object" start-tag
Please see the page source below
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="keywords" content="Karnataka,Bangalore_Rural,Healthcare,Office_Assistant,Kerala,Ernakulam,IT_Hardware_Networking,Engineer,Sales___Marketing,Executive,Maharashtra,Mumbai_City,Retailing,Manager,Kollam,CRM_CallCentres_BPO_ITES_Med.Trans,Customer_Care,Hotel_Travel_Tourism_Airlines_Hospitality,Front_Office_Staff,Andhra_Pradesh,Hyderabad,IT_Software,Java_Developer,Pathanamthitta,Manufacturing_Industrial,Educational_Training,Teacher,Engineering_Projects"/>
<meta name="description" content="The best job oriented resume sharing system. Create and Publish your online resumes for FREE. Search and apply your dream jobs for FREE. Post your jobs for FREE."/>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<div id="fb-root"></div>
I believe you need to put the <div> inside the <body> element. In the source above it appears inside <head>, right after the <meta> tags, where a div is not allowed.
