HTMLAgilityPack ChildNodes index works, named node does not - html-agility-pack

I am parsing an XML API response with HTMLAgilityPack. I am able to select the result items from the API call.
Then I loop through the items and want to write the ChildNodes to a table. When I select
ChildNodes by saying something like:
sItemId = dnItem.ChildNodes(0).innertext
I get the proper itemId result. But when I try:
sItemId = dnItem.ChildNodes("itemId").innertext
I get "Referenced object has a value of 'Nothing'."
I have tried "itemID[1]", "/itemId[1]" and a variety of other strings. I have also tried SelectSingleNode and ChildNodes.Item("itemId").innertext. The only approach that has worked is using the index.
The problem with using the index is that child elements are sometimes omitted from the results, which throws off the index.
Anybody know what I am doing wrong?

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(@webHTMLeditor.TextXhtml);
HtmlNodeCollection tableRows = htmlDoc.DocumentNode.SelectNodes("//tr");
for (int i = 0; i < tableRows.Count; i++)
{
    HtmlNode tr = tableRows[i];
    HtmlNode[] td = new HtmlNode[2];
    // Select the cells under this row by extending the row's own XPath.
    string xpath = tr.XPath + "//td";
    HtmlNodeCollection cellRows = tr.SelectNodes(@xpath);
    //td[0] = tr.ChildNodes[1];
    //td[1] = tr.ChildNodes[3];
    try
    {
        td[0] = cellRows[0];
        td[1] = cellRows[1];
    }
    catch (Exception)
    { }
    //etc
}
The code extracts data from a table row by row, and cell by cell within each row.
I used the existing XPath and altered it according to my needs.
Good luck!

Related

How to populate the table dynamically and correctly with Ajax?

I have a form where the user submits a query, and a Servlet that processes the query and returns the results as XML. I am trying to populate a table dynamically from this result via Ajax, using the code below.
var thead = $("<thead>");
var rowsTHead = $("<tr>");
var tbody = $("<tbody>");
var numberOfColumns;
$(xml).find("head").each(function(){
var variable = $(this).find("variable");
numberOfColumns = variable.length;
for (var i = 0; i < variable.length; i++){
var name = $(variable[i]).attr("name");
rowsTHead.append($("<th>").html(name));
}
});
thead.append(rowsTHead);
$(xml).find("result").each(function(){
var literal = $(this).find("literal");
var rowsTBody = $("<tr class=\"even\">");
literal.length = numberOfColumns;
for (var j = 0; j < literal.length; j++){
var tdBody = $("<td>");
tdBody.html($(literal[j]).text());
rowsTBody.append(tdBody);
}
tbody.append(rowsTBody);
});
$(".tablesorter").empty()
.append(thead)
.append(tbody);
This code worked perfectly until it was used with a UNION query. With a UNION, the returned XML looks like this: http://pastebin.com/y7hXK1Zy
As can be seen, this query has 4 variables: gn1, indication1, gn2, indication2.
The problem is that the values of all the variables are being written into the columns for gn1 and indication1.
I want the value of each variable to be written in its corresponding column. What should I change in my code to make this possible?
You need to respect the name value of each binding element and relate it back to the columns that you correctly built from parsing the head element. When you find "literal" directly, you skip the binding elements entirely. Instead, find "binding", use its name to look up which column it belongs to, and then find the "literal" element inside that binding for the actual value.
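A minimal sketch of that lookup, not a drop-in fix. It assumes each result contains binding elements that carry a name attribute matching a header variable and wrap a literal element; the pastebin content is not reproduced here, so that shape is an assumption. The helper names rowFromBindings and appendResultRows are hypothetical. Only the first function is plain JavaScript; the second is jQuery glue in the style of the question's code.

```javascript
// Place each binding's value in the column whose header has the same name.
// columns:  variable names in header order, e.g. ["gn1", "indication1", ...]
// bindings: array of { name, value } objects taken from one <result>
function rowFromBindings(columns, bindings) {
  // Start with one empty cell per column so missing bindings stay blank.
  var cells = columns.map(function () { return ""; });
  bindings.forEach(function (b) {
    var col = columns.indexOf(b.name);
    if (col !== -1) {
      cells[col] = b.value;
    }
  });
  return cells;
}

// Hypothetical jQuery glue. Assumed XML shape (not confirmed by the source):
// <result><binding name="gn2"><literal>...</literal></binding>...</result>
function appendResultRows(xml, columns, tbody) {
  $(xml).find("result").each(function () {
    var bindings = $(this).find("binding").map(function () {
      return {
        name: $(this).attr("name"),
        value: $(this).find("literal").text()
      };
    }).get();
    var tr = $("<tr class=\"even\">");
    rowFromBindings(columns, bindings).forEach(function (text) {
      tr.append($("<td>").text(text));
    });
    tbody.append(tr);
  });
}
```

To use it, the header loop would collect each variable name into the columns array while building the th cells, and the per-result loop would be replaced by a call to appendResultRows(xml, columns, tbody). Rows from the second half of the UNION then leave the gn1 and indication1 cells empty instead of filling them.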

Can I refer to items within a LINQ result set by index?

I'm trying to work with a LINQ result set of 4 tables retrieved with html agility pack. I'd like to process each one slightly differently by setting a variable for each (switch statement below), and then processing the rows within the table. The variable would ideally be the index for each of the tables in the set, 0 to 3, and would be used in the switch statement and to select the rows. I haven't been able to locate the index property, but I see it used in situations such as SelectChildNode.
My question is can I refer to items within a LINQ result set by index? My "ideal scenario" is the last commented out line. Thanks in advance.
var ratingsChgs = from table in htmlDoc.DocumentNode
.SelectNodes("//table[@class='calendar-table']")
.Cast<HtmlNode>()
select table;
String rtgChgType;
for (int ratingsChgTbl = 0; ratingsChgTbl < 4; ratingsChgTbl++)
{
switch (ratingsChgTbl)
{
case 0:
rtgChgType = "Upgrades";
break;
case 1:
rtgChgType = "Downgrades";
break;
case 2:
rtgChgType = "Coverage Initiated";
break;
case 3:
rtgChgType = "Coverage Reit/ Price Tgt Changed";
break;
//This is what I'd like to do.
var tblRowsByChgType = from row in ratingsChgs[ratingsChgTbl]
.SelectNodes("tr")
select row;
//Processing of returned rows.
}
}
ElementAt does what you're asking for. I don't recommend using it in your example, though, because each time you call it, your initial LINQ query will be executed. The easy fix is to have ratingsChgs be a List or Array.
You can also refactor out the switch statement. It is overkill when you only need to iterate through a list of items. Here is a possible solution:
var ratingsChgs = from table in htmlDoc.DocumentNode
                      .SelectNodes("//table[@class='calendar-table']")
                      .Cast<HtmlNode>()
                  select table;
var rtgChgTypeNames = new List<string>
{
    "Upgrades",
    "Downgrades",
    "Coverage Initiated",
    "Coverage Reit/ Price Tgt Changed"
};
// Pair each table with its change-type name by position.
var changeTypes = ratingsChgs.Zip(rtgChgTypeNames, (changeType, name) => new
{
    Name = name,
    Rows = changeType.SelectNodes("tr")
});
foreach (var changeType in changeTypes)
{
    var name = changeType.Name;
    var rows = changeType.Rows;
    //Processing of returned rows.
}
Also, why not store your rating change types in the HTML doc? It seems odd to have table information defined in the business logic.

LINQ query returning null results

I have the following code
nodes = data.Descendants(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Results")).Nodes();
System.Collections.Generic.IEnumerable<Result> res = new List<Result>();
if (nodes.Count() > 0)
{
var results = from uris in nodes
select new Result
{
URL =
((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Url")).Value,
Title =
((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Title")).Value,
Description =
((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Description")).Value,
DateTime =
((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}DateTime")).Value,
};
res = results;
}
Here Result is an object that has the URL, Title, Description, and DateTime variables defined.
This all works fine normally, but when a node in nodes doesn't contain a Description element (or at least I think that's what is causing it), the program hits the "res = results;" line and throws an 'object reference not set to...' error, highlighting the whole section right after "select new Result".
How do I fix this?
The simplest way is to cast to string instead of using the Value property. That way you'll end up with a null reference for the Description instead.
However, your code can also be made a lot nicer:
XNamespace ns = "http://schemas.microsoft.com/LiveSearch/2008/04/XML/web";
var results = data.Descendants(ns + "Results")
.Elements()
.Select(x => new Result
{
URL = (string) x.Element(ns + "Url"),
Title = (string) x.Element(ns + "Title"),
Description = (string) x.Element(ns + "Description"),
DateTime = (string) x.Element(ns + "DateTime")
})
.ToList();
See how much simpler that is? Techniques used:
Calling ToList() on an empty sequence gives you a list anyway
This way you'll only ever perform the query once; before, you were calling Count(), which would potentially have iterated over each node. In general, use Any() instead of Count() > 0, but this time just making the list unconditional is simpler.
Use the Elements() method to get child elements, rather than casting multiple times. (Your previous code would have thrown an exception if it had encountered any non-element nodes)
Use the implicit conversion from string to XNamespace
Use the +(XNamespace, string) operator to get an XName
If the Description element is not included you should test if this
((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Description"))
is not null before using Value. Try this code:
var results = from uris in nodes let des = ((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Description"))
select new Result
{
URL = ((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Url")).Value,
Title = ((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}Title")).Value,
Description = (des != null) ? des.Value : string.Empty,
DateTime = ((XElement)uris).Element(XName.Get("{http://schemas.microsoft.com/LiveSearch/2008/04/XML/web}DateTime")).Value,
};

LINQ: Field is not a reference field

I've got a list of IQueryable. I'm trying to split this list into an array of IQueryable matching on a certain field (say fieldnum) in the first list...
for example, if fieldnum == 1, it should go into array[1]. I'm using Where() to filter based on this field, it looks something like this:
var allItems = FillListofMyObjects();
var Filtered = new List<IQueryable<myObject>>(MAX+1);
for (var i = 1; i <= MAX; i++)
{
var sublist = allItems.Where(e => e.fieldnum == i);
if (sublist.Count() == 0) continue;
Filtered[i] = sublist;
}
However, I'm getting the error Field "t1.fieldnum" is not a reference field on the if line. Stepping through the debugger shows the error actually occurs on the line before (the Where() method), but either way, I don't know what I'm doing wrong.
I'm fairly new to LINQ, so if I'm doing this all wrong please let me know. Thanks!
Why don't you just use ToLookup?
var allItemsPerFieldNum = allItems.ToLookup(e => e.fieldnum);
Do you need to reevaluate the expression every time you get the values?
Why not use a dictionary?
var dictionary = allItems.ToDictionary(y => y.fieldnum);

Iterating Linq result set using indexers

Let's say I have this query:
var results = from row in db.Table select row;
How can I access this:
string name = results[0]["columnName"];
If you really want a particular index, you can use the Skip() method with First().
var rowOffset = 0;
var results = (from row in db.Table
select row).Skip(rowOffset).First()["columnName"];
But unless you are using a Where clause, I would really recommend using the indexer. The indexer is pretty much a direct reference, while the LINQ statement would be using the object's iterator.
Also don't forget you can do much more advanced stuff with LINQ.
var rowOffset = 0;
var pageLength = 10;
var results = (from row in db.Table
let colValue = row["columnname"]
where colValue != null
select colValue.ToString()
).Skip(rowOffset)
.Take(pageLength)
.ToArray();
var commaString = string.Join(", ", results);
If you specifically just want the zeroth element, you can use results.First()
results is an IEnumerable of rows, so you can iterate over it with a simple foreach.
foreach(var row in results)
{
string name = row["columnName"];
}
(from row in db.Table select row).First().columnName
