XPath to find anchor with no text? - xpath

I have a couple of anchor tags which have no text, so no text here, how do I find those anchors with no text?
I've tried this but it's returning nothing:
global $post;
$doc = new DOMDocument();
$doc->loadHtml($post->post_content);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//a[string-length(.) = 0]');
foreach($nodes as $node) {
$remove_elements[] = $node->getAttribute('href');
}
return $remove_elements;
The html looks like this.
<br/>
<br/>
<br/>
<br/>

If you want to know if a node has no text(), you can use string-length() which excepts a node. In this case '.' is a reference to current element.
You can do
//a[string-length(.) = 0]

Related

HtmlAgilityPack - SelectSingleNode for descendants

I found that HtmlAgilityPack SelectSingleNode always starts from the first node of the original DOM. Is there an equivalent method to set its starting node ?
Sample html
<html>
<body>
Home
<div id="contentDiv">
<tr class="blueRow">
<td scope="row">target</td>
</tr>
</div>
</body>
</html>
Not working code
//Expected:iwantthis.com Actual:home.com,
string url = contentDiv.SelectSingleNode("//tr[#class='blueRow']")
.SelectSingleNode("//a") //What should this be ?
.GetAttributeValue("href", "");
I have to replace the code above with this:
var tds = contentDiv.SelectSingleNode("//tr[#class='blueRow']").Descendants("td");
string url = "";
foreach (HtmlNode td in tds)
{
if (td.Descendants("a").Any())
{
url= td.ChildNodes.First().GetAttributeValue("href", "");
}
}
I am using HtmlAgilityPack 1.7.4 on .Net Framework 4.6.2
The XPath you are using always starts at the root of the document. SelectSingleNode("//a") means start at the root of the document and find the first a anywhere in the document; that's why it grabs the Home link.
If you want to start from the current node, you should use the . selector. SelectSingleNode(".//a") would mean find the first a that is anywhere beneath the current node.
So your code would look like this:
string url = contentDiv.SelectSingleNode(".//tr[#class='blueRow']")
.SelectSingleNode(".//a")
.GetAttributeValue("href", "");

Get img src within html file in Laravel

How can i get the image if I was given a URL with lots of img src? I am using Laravel 5.2.
This takes too long to load given that I have 6,000 links. Thanks in advance.
$url="$products->product_url";
$html = file_get_contents($url);
$doc = new DOMDocument();
#$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
Firstable you need to find some kind of class of id of parent element that contain image that you need for example:
<body>
// bla-bla-bla
<div class="content">
<img src="img.png" />
</div>
</body>
You can do follow with ZendDomQuery:
$finder = new Zend_Dom_Query($html);
$node = $finder->query("div[class~='content'] img");

Search specific text on a webpage using xpath

I want to search specific text on a webpage using XPath.
<?php
$url = 'http://www.barringtonsports.com/browse/hockey_sticks/show/325/list';
$html = file_get_contents($url);
$doc = new DOMDocument();
#$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$found = $xpath->evaluate("//span[contains(text(),'blablabla')]");
if(!$found){
echo "NOT FOUND";
}
else{
echo "found";
}
?>
it always give the output found as the text blablabla is not in the webpage.
where is the problem?
Is my evalute expression correct for searching specific text?
evaluate will not return a boolean for a XPath expression that selects a node, you have to use:
$found = $xpath->evaluate("boolean(//span[contains(text(),'blablabla')])");

strict with CGI::AJAX

I have set of code for updating a password in the table, here I'm using CGI::AJAX module to update the password and get the popup screen on corresponding execution.When using that code with my application it is executing properly but I didn't get the output(means Perl subroutine is not called when JavaScript function to get use.password is not updated into table). I don't get any error either.
#!/usr/bin/perl -w
use strict;
use CGI;
use DBI;
use Data::Dumper;
my $p = new CGI qw(header start_html end_html h1 script link);
use Class::Accessor;
use CGI::Ajax;
my $create_newuser;
my $ajax = new CGI::Ajax('fetch_javaScript' => $create_newuser);
print $ajax->build_html($p,\&Show_html,{-charset=>'UTF-8', -expires=>'-1d'});
sub Show_html
{
my $html = <<EOHTML;
<html>
<body bgcolor="#D2B9D3">
<IMG src="karvy.jpg" ALT="image">
<form name='myForm'>
<center><table><tr><td>
<div style="width:400px;height:250px;border:3px solid black;">
<center><h4>Create New Password's</h4>
<p>&nbsp User Name</b>&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp<INPUT TYPE="text" NAME="user" id = "user" size = "15" maxlength = "15" tabindex = "1"/></p>
<p>&nbsp Password:</b>&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp<INPUT TYPE=PASSWORD NAME="newpassword" id = "newpassword" size = "15" maxlength = "15" tabindex = "1"/></p>
<p>&nbsp Re-Password:</b>&nbsp&nbsp&nbsp<INPUT TYPE=PASSWORD NAME="repassword" id = "repassword" size = "15" maxlength = "15" tabindex = "1"/></p>
<input type="submit" id="val" value="Submit" align="middle" method="GET" onclick="fetch_javaScript(['user','newpassword','repassword']);"/><INPUT TYPE="reset" name = "Reset" value = "Reset"/>
<p>Main Menu <A HREF = login.pl>click here</A>
</center>
</div>
</td></tr></table></center>
</form>
</body>
</html>
EOHTML
return $html;
}
$create_newuser =sub
{
my #input = $p->params('args');
my $user=$input[0];
my $password=$input[1];
my $repassword=$input[2];
my $DSN = q/dbi:ODBC:SQLSERVER/;
my $uid = q/123/;
my $pwd = q/123/;
my $DRIVER = "Freetds";
my $dbh = DBI->connect($DSN,$uid,$pwd) or die "Coudn't Connect SQL";
if ($user ne '')
{
if($password eq $repassword)
{
my $sth=$dbh->do("insert into rpt_account_information (user_id,username,password,user_status,is_admin) values(2,'".$user."','".$password."',1,1)");
my $value=$sth;
print $value,"\n";
if($value == 1)
{
print 'Your pass has benn changed.Return to the main page';
}
}
else
{
print "<script>alert('Password and Re-Password does not match')</script>";
}
}
else
{
print "<script>alert('Please Enter the User Name')</script>";
}
}
my $create_newuser;
my $ajax = new CGI::Ajax('fetch_javaScript' => $create_newuser);
...;
$create_newuser =sub { ... };
At the moment when you create a new CGI::Ajax object, the $create_newuser variable is still undef. Only much later do you assign a coderef to it.
You can either assign the $create_newuser before you create the CGI::Ajax:
my $create_newuser =sub { ... };
my $ajax = new CGI::Ajax('fetch_javaScript' => $create_newuser);
...;
Or you use a normal, named subroutine and pass a coderef.
my $ajax = new CGI::Ajax('fetch_javaScript' => \&create_newuser);
...;
sub create_newuser { ... }
Aside from this main error, your script has many more problems.
You should use strict instead of the -w option.
For debugging purposes only, use CGI::Carp 'fatalsToBrowser' and sometimes even with warningsToBrowser can be extremely helpful. Otherwise, keeping a close eye on the error logs is a must.
my $p = new CGI qw(header start_html end_html h1 script link) doesn't make any sense. my $p = CGI->new should be sufficient.
use Class::Accessor seems a bit random here.
The HTML in Show_html is careless. First, your heredocs allows variable interpolation and escape codes – it has the semantics of a double quoted string. Most of the time, you don't want that. Start a heredoc like <<'END_OF_HTML' to avoid interpolation etc.
Secondly, look at that tag soup you are producing! Here are some snippets that astonish me:
bgcolor="#D2B9D3", align="middle" – because CSS hasn't been invented yet.
<center> – because CSS hasn't been invented yet, and this element isn't deprecated at all.
<table><tr><td><div ... </div></td></tr></table> – because there is nothing wrong with a table containing a single cell. (For what? This isn't even for layout reasons!) This table cell contains a single div …
… which contains another center. Seriously, what is so great about unneccessary DOM elements that CSS isn't even an option.
style="width:400px;height:250px;border:3px solid black;" – because responsive design hasn't been invented yet.
<p> ... </b> – Oh, what delicious tag soup!
&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp – this isn't a typewriter, you know. Use CSS and proper markup for your layout. There is a difference between text containing whitespace, and empty areas in your layout.
tabindex = "1" … tabindex = "1" … tabindex = "1" – I don't think you know what tabindex does.
<A HREF = login.pl> – LOWERCASING OR QUOTING YOUR ATTRIBUTES IS FOR THE WEAK!!1
onclick="fetch_javaScript(['user','newpassword','repassword']);" – have you read the CGI::Ajax docs? This is not how it works: You need to define another argument with the ID of the element where the answer HTML is displayed.
In your create_newuser, you have an SQL injection vulnerability. Use placeholders to solve that. Instead of $sth->do("INSERT INTO ... VALUES('$foo')") use $sth->do('INSERT INTO ... VALUES(?)', $foo).
print ... – your Ajax handler shouldn't print output, instead it should return a HTML string, which then gets hooked into the DOM at the place your JS function specified. You want something like
use HTML::Entities;
sub create_newuser {
my ($user, $password, $repassword) = $p->params('args');
my ($e_user, $e_password) = map { encode_entities($_) } $user, $password;
# DON'T DO THIS, it is a joke
return "Hello <em>$e_user</em>, your password <code>$e_password</code> has been successfully transmitted in cleartext!";
}
and in your JS:
fetch_javaScript(['user','newpassword','repassword'], ['answer-element'], 'GET');
where your HTML document somewhere has a <div id="answer-element" />.

Parse HTML doc with HtmlAgilityPack-Xpath, RegExp

I try parse image url from html with HtmlAgilityPack. In html doc I have img tag :
<a class="css_foto" href="" title="Fotka: MyKe015">
<span>
<img src="http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6"
width="176" height="216" alt="Fotka: MyKe015" />
</span>
</a>
I need get from this img tag atribute src. I need this: http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6.
I know this:
Src atribute consist url, url start
with
http://213.215.107.125/fotky
I know value of alt atribute Url
have
variable lenght and also html doc
consist other img tags with url, which start with
http://213.215.107.125/fotky
I know alt attribute of img tag (Fotka: Myke015))
Any advance, I try many ways, but nothing works good.
Last I try this:
List<string> src;
var req = (HttpWebRequest)WebRequest.Create("http://pokec.azet.sk/myke015");
req.Method = "GET";
using (WebResponse odpoved = req.GetResponse())
{
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.Load(odpoved.GetResponseStream());
var nodes = htmlDoc.DocumentNode.SelectNodes("//img[#src]");
src = new List<string>(nodes.Count);
if (nodes != null)
{
foreach (var node in nodes)
{
if (node.Id != null)
src.Add(node.Id);
}
}
}
Your XPath selects the img nodes, not the src attributes belonging to them.
Instead of (selecting all image tags that have a src attribute):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img[#src]");
Use this (select the src attributes that are child nodes of all img elements):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img/#src");
This XPath 1.0 expression:
//a[#alt='Fotka: MyKe015']/#src

Resources