I want to parse a website using HtmlAgilityPack in an ASPX page.
Below is my code:
var html = @"http://test.com";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//table[@class='tableclass']//tr")
.Where(x => !x.Attributes["id"].Value.Contains("tableid"));
When this code is executed, all tr elements from the HTML table are returned.
Below is one of the returned rows:
<tr bgcolor="gray">
<td align="center" height="40">123</td>
<td align="center" width="56">
<div>
<img src="http://img.test.com/img.jpg" height="10" border="0" />
</div>
</td>
<td style="padding-left:3px;">THIS_1</td>
<td style="padding-left:3px;">THIS_2</td>
<td style="padding-left:3px;"><font color='red'>blah</font></td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
</tr>
I only want the InnerText of two td elements (THIS_1 and THIS_2).
Below is my code, which does not work:
foreach (var node in htmlNodes)
{
var str1 = node.ChildNodes["td"].InnerHtml;
var str2 = node.SelectNodes(".//td[@style='padding-left:3px;']");
}
I want to put THIS_1 in str1 and THIS_2 in str2.
Try getting the elements by index (XPath positions are 1-based). For example:
foreach (var node in htmlNodes)
{
var str1 = node.SelectSingleNode("td[3]").InnerText; // THIS_1
var str2 = node.SelectSingleNode("td[4]").InnerText; // THIS_2
}
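For anyone who wants to check the positional lookup outside of .NET, here is a minimal sketch in Python using only the standard library instead of HtmlAgilityPack. It assumes the row is well-formed XML (the trimmed copy below is), and the variable names simply mirror the answer above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the row from the question. Assumption: it is well-formed
# XML, which the standard-library parser requires.
row = ET.fromstring("""
<tr bgcolor="gray">
  <td align="center" height="40">123</td>
  <td align="center" width="56"><div><img src="http://img.test.com/img.jpg" height="10" border="0" /></div></td>
  <td style="padding-left:3px;">THIS_1</td>
  <td style="padding-left:3px;">THIS_2</td>
  <td style="padding-left:3px;"><font color='red'>blah</font></td>
  <td align="center">0</td>
</tr>
""")

# Positional predicates are 1-based: td[3] is the third <td> child of the row.
str1 = row.find("td[3]").text
str2 = row.find("td[4]").text

print(str1)
print(str2)
```

Selecting by position is fragile if the column order ever changes, so it is only a reasonable choice when the table layout is stable.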
I have a table with a td like this:
<td>
<span> Washington US <br>98101 Times Square</span>
</td>
I can get all the elements in the page, but I need to get those two values separately. If that isn't possible, I would like to somehow get just 98101 Times Square.
I have tried something like string(//tr[3]//td[2]), but all I get is the two pieces of text joined together.
You can select the text child nodes of the span element with span/text(). So, assuming your posted path selects the td containing the span you want, use //tr[3]//td[2]/span/text().
Here is a sample:
$html = <<<EOD
<html>
<body>
<table>
<tr>
<td>1</td>
</tr>
<tr>
<td>2</td>
</tr>
<tr>
<td>3,1</td>
<td>
<span> Washington US <br>98101 Times Square</span>
</td>
</tr>
</table>
</body>
</html>
EOD;
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//tr[3]//td[2]/span/text()');
foreach ($textNodes as $text) {
echo $text->textContent . "\n";
}
Outputs
Washington US
98101 Times Square
Try
td/span/node()[1]
and
td/span/node()[3]
Or
td/span/text()[1]
td/span/text()[2]
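The same split can be sketched in Python with the standard library. Note the assumptions: ElementTree has no text() step, so this uses its text/tail attributes instead, and the fragment is written with <br/> so that it is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment is well-formed XML, so <br> is written as <br/>.
cell = ET.fromstring(
    "<td><span> Washington US <br/>98101 Times Square</span></td>"
)
span = cell.find("span")

# The text before the first child element is span.text; the text that
# follows <br/> is stored as that element's tail.
first_part = (span.text or "").strip()
second_part = (span.find("br").tail or "").strip()

print(first_part)
print(second_part)
```

These two values correspond to text()[1] and text()[2] in the XPath versions above.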
I am developing a shopping cart website in MVC3 with Razor. Below is the view code, which shows multiple products; each product has a quantity text box and a submit button. In the controller, how can I tell which submit button the user clicked, and how can I read the text from the label?
<table>
<tr>
@foreach (System.Data.DataRow i in Model.dt.Rows )
{
using (@Html.BeginForm("addtocart", "Chocolatier",FormMethod.Post,null))
{
<tr><td> <img src=" @Url.Content("~/Content/abc.gif") " alt="Imge" height ="40" width ="40"/></td></tr>
foreach (System.Data.DataColumn j in Model.dt.Columns)
{
lbl = "lbl";
lbl = lbl+@cnt.ToString();
if(j.ToString().ToLower ().Equals ("name"))
{
<tr> <td style="width : 200px"><h5>@j.ToString()</h5><label id="@lbl" >@i[j].ToString()</label></td></tr>
}
if(j.ToString().ToLower ().Equals ("description"))
{
<tr> <td style="width : 200px"><h5>@j.ToString()</h5>@i[j].ToString())</td></tr>
}
if(j.ToString().ToLower ().Equals ("price"))
{
<tr> <td style="width : 200px"><h5>@j.ToString()</h5>@Html.LabelFor(m=>m.prod_price,i[j].ToString())</td></tr>
}
}
cnt += 1;
<td>Enter quantity : @Html.TextBoxFor(m=>m.prod_quantity)</td>
<tr><td><input type ="submit" value="Add to cart" id="@cnt"/></td></tr>
<tr> <td>__________________________________________________________________________</td></tr>
}
}
</tr>
</table>
I have an AJAX call attached to the click event of a pic inside a table row. Once the pic is clicked and the click event fires, I need to grab the first and second td elements from that row. I'm new to jQuery, so what I have below is my latest attempt (not working). The variables firstName and lastName both wind up being undefined after those lines are executed.
$('.checkErrors').click(function () {
var firstName = $(this).parent('tr td:first-child').val();
var lastName = $(this).parent('tr td:nth-child(2)').val();
$.ajax({
type: 'GET',
url: '@Url.Action("GetErrors","AgentTransmission")',
data: { term: $(this).attr('id') },
.
.
});
});
Here is a sample table row. The image in the last td element contains the .click event. I would like to grab the first two that contain the text "phinneas" and "ferbseven".
<tr>
<td>
phinneas
</td>
<td>
ferbseven
</td>
<td nowrap>
7735
</td>
<td>
Agent
</td>
<td>
SAF
 
07070900
</td>
<td>
6/5/2013 10:35:38 AM
</td>
<td>
DANTAK
</td>
<td class="errorPlus">
Error
</td>
<td>
Details
<span> | </span>
Edit
</td>
<td align=center id=2358>
<img src="/Content/images/magnify.gif" class="checkErrors" id=2358 alt="Program Details" />
</td>
</tr>
Use closest:
var $tr = $(this).closest('tr');
var firstName = $tr.find('td:first-child').text();
var lastName = $tr.find('td:nth-child(2)').text();
Also, you need to use text() instead of val() for td elements, since val() only makes sense on input controls.
First, only form elements support .val().
You are supposed to use .text() since it is a td.
Try using the :eq pseudo-selector:
var $tr = $(this).closest('tr');
$tr.find('td:eq(0)').text();
$tr.find('td:eq(1)').text();
To get their content, you can do this:
var cells = $(this).closest('td').siblings('td');
var firstName = cells.eq(0).text();
var lastName = cells.eq(1).text();
Those last two lines can also be:
var firstName = $(cells[0]).text();
var lastName = $(cells[1]).text();
How can I select two cells in a table based on span class?
My HTML looks like this.
What I want is to select the inner text of span class="store-name-span"
and span class="price":
<table class="list mixed zebra-striped">
<tbody>
<tr data-pris_typ="normal">
<td class="span4-5">
<span class="store-name-span">Electroworld</span>
<a data-drg="store-2641" class="drg-sidebar"></a>
</td>
<td class="span3 cell-bar">
<span class="chart-bar price" style="width:50px"></span>
<span class="price" title="Uppdaterad 2013-02-18 08:23">1 690:-</span>
</td>
</tr>
<tr data-pris_typ="normal">
<td class="span4-5">
<span class="store-name-span">Webhallen</span>
<a data-drg="store-113" class="drg-sidebar"></a>
</td>
<td class="span3 cell-bar">
<span class="chart-bar price" style="width:50px"></span>
<span class="price" title="Uppdaterad 2013-02-18 13:55">1 690:-</span>
</td>
</tr>
</tbody>
</table>
var Nodes = from x in doc2.DocumentNode.Descendants()
//where x.Attributes["class"].Value == "store-name-span"
where x.Name == "span" && x.Attributes["class"].Value == "store-name-span"
select x.InnerText;
I'd use xpath for this:
var nodes = doc.DocumentNode.SelectNodes("//span[@class='store-name-span' or @class='price']");
foreach (var node in nodes)
Console.WriteLine(node.InnerText);
By using LINQ:
var nodes = doc.DocumentNode.Descendants("span")
.Where(s =>
s.GetAttributeValue("class", null) == "store-name-span" ||
s.GetAttributeValue("class", null) == "price"
);
This will get you:
Electroworld
1 690:-
Webhallen
1 690:-
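As a quick way to see why the chart-bar span is not picked up, here is a rough Python equivalent of the LINQ filter, using only the standard library. The markup is a trimmed, well-formed copy of the question's table (an assumption, since the real page may not parse as XML), and wanted is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question.
doc = ET.fromstring("""
<table>
  <tr data-pris_typ="normal">
    <td><span class="store-name-span">Electroworld</span></td>
    <td>
      <span class="chart-bar price"></span>
      <span class="price">1 690:-</span>
    </td>
  </tr>
  <tr data-pris_typ="normal">
    <td><span class="store-name-span">Webhallen</span></td>
    <td>
      <span class="chart-bar price"></span>
      <span class="price">1 690:-</span>
    </td>
  </tr>
</table>
""")

# Exact comparison on the whole class attribute: "chart-bar price" is a
# different string from "price", so those spans are skipped.
wanted = {"store-name-span", "price"}
texts = [span.text for span in doc.iter("span") if span.get("class") in wanted]

for text in texts:
    print(text)
```

If a span ever carries extra classes (for example class="price sale"), an exact comparison like this one would miss it, so a token-based check would be needed in that case.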
In that particular HTML layout, you can do:
var items = doc.DocumentNode.SelectNodes("//tr[@data-pris_typ='normal']").Select(x => new
{
Store = x.SelectSingleNode(".//span[#class='store-name-span']").InnerText,
Price = x.SelectSingleNode(".//span[#class='price']").InnerText
});
items will then contain what you need. Each item is an anonymous type with Store and Price properties.
One important thing:
You might want to clean the fields (like Price) using HttpUtility.HtmlDecode(). To do that you will have to add a reference to the System.Web assembly.
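To illustrate what that decoding step does, here is a small Python sketch using html.unescape from the standard library as a stand-in for HttpUtility.HtmlDecode(). The raw value below is hypothetical: it assumes the price text contains a non-breaking-space entity, which the question's markup does not confirm.

```python
import html

# Hypothetical raw inner text, assuming the page encodes the space as &nbsp;.
raw_price = "1&nbsp;690:-"

# Decoding turns the entity into the actual non-breaking space character.
decoded = html.unescape(raw_price)

# Optionally normalise the non-breaking space to a plain space afterwards.
normalized = decoded.replace("\u00a0", " ")

print(normalized)
```

Without the decoding step, the entity text itself would end up in the extracted value.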
I would use a combination of querySelectorAll and reading innerHTML.
Query selectors work both when called globally (on document) and when called on a single element.
I have the following XML file:
<ab>
<![CDATA[
<table>
<tbody>
<tr>
<th>abcdef</th>
<th>Contact</th>
</tr>
<tr>
<p>
Home
</p>
</tr>
</tbody>
</table>
]]>
</ab>
I am still learning LINQ. I want to know if there is an easier way to find all a href = "/1/2/" tags inside the CDATA and remove them. As in the above example, it should just show Contact and Home and remove the
void Main()
{
XDocument doc = XDocument.Load("C:\\test.xml");
XDocument xdoc = XDocument.Parse(doc.ToString());
XNode node = xdoc.DescendantNodes().Single(x => x.NodeType == XmlNodeType.CDATA);
if (node.Parent != null)
{
string content = node.Parent.Value.Trim();
IEnumerable<XElement> elements =
XDocument.Parse(content).Descendants().Where(x =>
{
XAttribute xAttribute = x.Attribute("href");
return
xAttribute !=
null && xAttribute.Value == "/1/2";
});
// do something here
}
}
The contents of test.xml are the same as the XML file shown above.
I don't think LINQ is the best way to go about this problem. Personally, I would use a regular expression. Here is an example of how this could be done:
Example: Scanning for HREFs
In general, if you are doing any more intensive HTML processing, using an HTML parser is probably the best way to go, such as HtmlAgilityPack.
Regex sample code:
Regex hrefRegex = new Regex(@"href=""([^""]*"")", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string output = hrefRegex.Replace(input, new MatchEvaluator(m => string.Empty));
Hope this helps,
Ivan
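For comparison, here is a hedged Python sketch of the same regex idea. The content string is hypothetical, since the anchor markup is not visible in the XML above. The first substitution mirrors the pattern in the answer and strips only the href attribute; the second is an alternative (not part of the answer) that removes the a wrapper while keeping its text, assuming anchors are not nested.

```python
import re

# Hypothetical CDATA content; the <a> markup is an assumption for illustration.
content = '<tr><th><a href="/1/2/">Contact</a></th></tr>'

# Same idea as the answer's pattern: remove the href attribute only.
without_href = re.sub(r'href="[^"]*"', "", content, flags=re.IGNORECASE)

# Alternative: drop the <a ...>...</a> wrapper for this href, keep inner text.
unwrapped = re.sub(
    r'<a\b[^>]*\bhref="/1/2/"[^>]*>(.*?)</a>',
    r"\1",
    content,
    flags=re.IGNORECASE | re.DOTALL,
)

print(without_href)
print(unwrapped)
```

As noted above, a regex like this is brittle on real-world HTML, so an HTML parser is the safer choice for anything beyond a simple, predictable fragment.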