HTMLAgilityPack iterate all text nodes only

Here is an HTML snippet. All I want is to get only the text nodes and iterate over them. Please let me know how. Thanks.
<div>
<div>
Select your Age:
<select>
<option>0 to 10</option>
<option>20 and above</option>
</select>
</div>
<div>
Help/Hints:
<ul>
<li>This is required field.
<li>Make sure select the right age.
</ul>
Learn More
</div>
</div>
Result:
Select your Age:
0 to 10
20 and above
Help/Hints:
This is required field.
Make sure select the right age.
Learn More

Something like this:
HtmlDocument doc = new HtmlDocument();
doc.Load(yourHtmlFile);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']"))
{
Console.WriteLine(node.InnerText.Trim());
}
Will output this:
Select your Age:
0 to 10
20 and above
Help/Hints:
This is required field.
Make sure select the right age.
Learn More

I tested @Simon Mourier's answer on the Google home page and got lots of CSS and JavaScript, so I added an extra filter to remove it:
public string getBodyText(string html)
{
    string str = "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // Remove script & style nodes (needs "using System.Linq;")
    doc.DocumentNode.Descendants().Where( n => n.Name == "script" || n.Name == "style" ).ToList().ForEach(n => n.Remove());
    // Simon Mourier's Answer
    // SelectNodes returns null when nothing matches, so guard before iterating
    var textNodes = doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
    if (textNodes != null)
    {
        foreach (HtmlNode node in textNodes)
        {
            str += node.InnerText.Trim() + " ";
        }
    }
    return str;
}

Related

Cypress test: validate that a div tag ID contains a value

The HTML page contains nested div tags, and one of them has either ID = X or ID = Y, depending on the page.
<!-- example 1 -->
<body>
<div>
<div></div>
<div id=Y></div>
</div>
</body>
<!-- example 2 -->
<body>
<div>
<div></div>
<div></div>
<div>
<div id=X></div>
</div>
</div>
</body>
Each HTML page contains one of the examples above.
I want to write a common method that runs the matching code:
if a div tag with ID = X exists, run Xcode();
if a div tag with ID = Y exists, run Ycode();
We can iterate through all divs in your DOM and, using jQuery commands, determine whether each div has a certain id.
cy.get('div').each(($div) => {
  // id comparison is case-sensitive, so match the 'X' / 'Y' used in the markup
  if ($div.attr('id') === 'X') {
    // code to run if id = X
  } else if ($div.attr('id') === 'Y') {
    // code to run if id = Y
  } else {
    // code to run if neither
  }
})
If you end up with more than just two or three cases, I'd probably recommend using a switch statement instead of if/else.
Additionally, the above code will solve your problem, but it will probably be less performant than if you had a stable DOM that you knew either did or did not contain the div#X/div#Y element. Here we have to traverse every single div element. If we knew for certain that div#X existed, we could just do cy.get('div#X') and get on with the tests. You could shore this up by adding a specific data-cy attribute (although id should be sufficient) and, more importantly, by writing your tests to be deterministic and atomic.
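As a rough sketch of the switch suggestion, the dispatch could live in a small plain JavaScript function. The name runForId and the returned strings are hypothetical placeholders for whatever Xcode() and Ycode() actually do.

```javascript
// Hypothetical dispatcher: maps a div id to the code that should run for it.
function runForId(id) {
  switch (id) {
    case 'X':
      return 'ran X code'; // stand-in for Xcode()
    case 'Y':
      return 'ran Y code'; // stand-in for Ycode()
    default:
      return 'no matching id'; // divs without a relevant id
  }
}

console.log(runForId('X')); // ran X code
console.log(runForId(undefined)); // no matching id
```

Inside the .each() callback this would be called as runForId($div.attr('id')), which keeps the branching in one place as more ids are added.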
Go to cypress/support/commands.js file and write the following:
Cypress.Commands.add('runCode', (id) => {
  // .then() yields the matched element; it must be named to be used below
  cy.get('#' + id).then(($ele) => {
    if ($ele.attr('id') == 'X') {
      //Write the code when the id is X
    } else {
      //Write the code when the id is Y
    }
  })
})
Then in your test write:
cy.runCode("X") //When id is X
cy.runCode("Y") //When id is Y
jQuery allows multiple selectors.
Presuming either one or other ID exists,
cy.get('div[id="X"], div[id="Y"]')
.then($el => {
if ($el.attr('id') === 'X') {
Xcode()
}
if ($el.attr('id') === 'Y') {
Ycode()
}
})

check if there is a div that has the words `some text` and has the `i` tag

I have a class:
<div class = "abc def">
<i style="...."></i>
some text 2
</div>
<div class = "abc def">
<i></i>
some text
</div>
<div class = "abc def">
1 some text
</div>
How can I check whether there is a div that contains the words some text and also contains an i tag?
In this example, I should get the first and the second div. The third div doesn't have an i tag, so it should not be returned.
I think it should be:
elements = driver.findElements(By.xpath("//div[contains(text(), 'some text')]"));
if (elements.length > 0) {
for(var i = 0; i < elements.length; i++) {
if (elements[i].find('<i') != null) {
alert('the item: ' + i + 'is found');
}
}
}
The XPath expression more or less equals the natural language version:
//div[contains(., 'some text') and i]
If the <i/> tag may be contained within other elements, use .//i instead. In most cases you want to use . instead of text(): it joins all text nodes and scans the combined result, so <em>some</em> text would be matched, too.

Html Agility Pack extra <A> tag

The extra <A> in the following causes SelectNodes() to return too many elements. How can I remove the extra characters?
<DIV align=center><STRONG><A><A class=white
href="javascript: event_info = openWin('/events/search/index_results.cfm?action=plan&event_number=2013292001&cde_comp_group=CONF&cde_comp_type=&NEW_END_DATE1>=&key_stkhldr_event=&mixed_breed=N', 'eventinfo', 'width=800,height=600,toolbar=1,location=0>,directories=0,status=0,menuBar=0,scrollBars=1,resizable=1' ); event_info.focus()"><STRONG>Labrador
Retriever Club of the Piedmont</STRONG></A> </STRONG></DIV
>
You could select only those <a> tags that have, for example, the href attribute set:
var doc = new HtmlDocument();
doc.LoadHtml(html);
// @href is the XPath attribute test; .ToList() needs "using System.Linq;"
var anchors = doc.DocumentNode
    .SelectNodes("//a[@href]")
    .ToList();
foreach (var anchor in anchors)
{
    //process your node here
}

Working with ASP.NET Razor and HTML

I have a list of categories and subcategories that is passed from the controller to the view. I want them to be represented in the HTML as follows, but I don't know how to achieve this using foreach, a table, or something else.
EDIT : Code
public ActionResult Electronics()
{
var topCategories = pe.Categories.Where(category => category.ParentCategory.CategoryName == "Electronics").ToList();
//var catsAndSubs = pe.Categories.Include("ParentCategory").Where(c => c.ParentCategory.CategoryName == "Electronics");
return View(topCategories);
}
With this view code, I am just able to pull a vertical list.
@foreach (var cats in Model)
{
    <li>@cats.CategoryName</li>
    foreach (var subcats in cats.SubCategories)
    {
        <li>@subcats.CategoryName</li>
    }
}
When designing HTML mark-up it is very important to consider semantics. What meaning are you trying to convey? That doesn't look like tabular data to me so please don't put it in tables :P
Based on your wireframe above, the way I would probably structure this is like this:
<h1>Category Directory</h1>
<h2>Multimedia Projectors</h2>
<h2>Home Audio</h2>
<p>
Amplifiers, Speakers
</p>
Adjust the hX tags to reflect their position within the document's hierarchy. Remember to only ever have ONE h1 per page (or per <article> or <section> if using HTML5).
If instead you wind up turning this into something like a Superfish menu then this is the markup that you would use:
<nav id="category_menu">
<ul>
<li>
Multimedia Projectors
</li>
<li>
Home Audio
<ul>
<li>
Amplifiers
</li>
<li>
Speakers
</li>
</ul>
</li>
</ul>
</nav>
Edit
Your model is not suitable for creating your desired view. The relationship is bottom-up, but to construct the view conveniently you will want the relationships defined top-down. You need to start by converting the data model into a view model, such as:
class CategoryViewModel
{
    public string CategoryName { get; set; }
    // Sub-categories use the same view model type so Map can populate them
    public IList<CategoryViewModel> SubCategories { get; set; }
}
and to make this:
IList<CategoryViewModel> Map(IList<CategoryDataModel> dataModel)
{
    var model = new List<CategoryViewModel>();
    //Select the categories with no parent (these are the root categories)
    var rootDataCategories = dataModel.Where(x => x.ParentCategory == null);
    foreach (var dataCat in rootDataCategories)
    {
        //Select the sub-categories for this root category
        var children = dataModel
            .Where(x => x.ParentCategory != null && x.ParentCategory.CategoryName == dataCat.CategoryName)
            .Select(y => new CategoryViewModel() { CategoryName = y.CategoryName })
            .ToList();
        var viewCat = new CategoryViewModel()
        {
            CategoryName = dataCat.CategoryName,
            SubCategories = children
        };
        model.Add(viewCat);
    }
    return model;
}
Then your view:
<h1>Category Directory</h1>
@foreach(var category in Model)
{
    @Html.Partial("Category", category)
}
Category partial:
<h2>@Html.ActionLink(Model.CategoryName, "Detail", new { Model.CategoryName })</h2>
@if(Model.SubCategories.Count > 0)
{
    <p>
    @for (var i = 0; i < Model.SubCategories.Count; i++)
    {
        var subCat = Model.SubCategories[i];
        @Html.ActionLink(subCat.CategoryName, "Detail", new { subCat.CategoryName })
        if (i < Model.SubCategories.Count - 1)
        {
            <text>,</text>
        }
    }
    </p>
}
Note that my current solution only supports 2 levels of categories (as per your wireframe). It could however be easily extended to be recursive.
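A recursive version might look roughly like the sketch below. It is written in JavaScript purely for illustration (the answer's code is C#), and the flat record shape with a parent field, the buildTree name, and the extra "Tube Amplifiers" level are all assumptions rather than part of the original model. It also assumes category names are unique and the data has no cycles.

```javascript
// Hypothetical flat data: each record names its parent (null for root categories).
const flatCategories = [
  { name: 'Multimedia Projectors', parent: null },
  { name: 'Home Audio', parent: null },
  { name: 'Amplifiers', parent: 'Home Audio' },
  { name: 'Speakers', parent: 'Home Audio' },
  { name: 'Tube Amplifiers', parent: 'Amplifiers' } // assumed third level
];

// Collect the children of parentName, then recurse to build each child's subtree.
function buildTree(flat, parentName = null) {
  return flat
    .filter(c => c.parent === parentName)
    .map(c => ({ name: c.name, children: buildTree(flat, c.name) }));
}

const categoryTree = buildTree(flatCategories);
console.log(JSON.stringify(categoryTree, null, 2));
```

The same idea would carry over to the view: the Category partial would render itself for each entry in SubCategories instead of printing a flat comma-separated list.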

Sorting Divs With PrototypeJS

I am looking for some help in sorting divs with PrototypeJS. Here is what I have so far:
document.observe('dom:loaded',function(){
$$('.sortcol').invoke('observe', 'click', function() {
if (this.hasClassName('desc')) {
var desc = false;
this.removeClassName('desc');
} else {
var desc = true;
this.addClassName('desc');
}
var colname = this.className;
var contentid = this.up(2).id;
sortColumn(contentid,colname,desc);
});
});
function sortColumn(contentid,colname,desc) {
$$('#'+contentid).select('.'+colname).sort(function(a,b){
if (desc) {
return (a.text.toLowerCase() >= b.text.toLowerCase() ) ? -1 : 1;
} else {
return (a.text.toLowerCase() < b.text.toLowerCase() ) ? -1 : 1;
}
});
}
Example data:
<div id="contentbox_Users" class="userList">
<div class="userListHeader">
<div class="userListHeaderCell col1">First Name</div>
<div class="userListHeaderCell col2">Last Name</div>
</div>
<div id="contentbox_People">
<div class="userListRow">
<div class="userListCell col1">John</div>
<div class="userListCell col2">Smith</div>
</div>
<div class="userListRow">
<div class="userListCell col1">Bob</div>
<div class="userListCell col2">Ray</div>
</div>
<div class="userListRow">
<div class="userListCell col1">Fred</div>
<div class="userListCell col2">Jones</div>
</div>
</div>
</div>
Basically, when anything with the class "sortcol" is clicked, I want to sort by the clicked column's name (class). The first issue is that I need to get the class name correctly when there are multiple classes. The classes are all like col1, col2, etc. How would I find the correct class?
The second thing is changing sortColumn so that it keeps column data together (each row is wrapped by another div) and output the result, replacing the current data.
This needs to be done in prototypejs and I can't change the code to tables.
Thanks in advance for the help.
For the first part of your question, it would be much easier if the column name were its own attribute, like rel or data-*, but you say you cannot change the HTML. It is possible to pick out the likeliest class with a regex...
var colname = this.className.match(/\bcol\d+\b/).first()
But this is unnecessary if we assume every row has the same columns in the same order. This would be a safer assumption if a table were used.
var colnumber = this.up().childElements().indexOf(this);
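To show what the regex approach picks out of a multi-class string, here is a small plain JavaScript sketch. The helper name columnClass and the sample class strings are assumptions for illustration. It also guards against match returning null, which happens when no colN class is present.

```javascript
// Hypothetical helper: returns the first "colN" class in a className string, or null.
function columnClass(className) {
  const match = className.match(/\bcol\d+\b/);
  return match ? match[0] : null;
}

console.log(columnClass('userListHeaderCell col2 sortcol desc')); // col2
console.log(columnClass('userListHeaderCell sortcol')); // null
```

The word boundaries keep "sortcol" from matching, since "col" there is neither at the start of a word nor followed by digits.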
The second part of your question is easy, just sort the rows instead of the cells.
Your draft sortColumn function doesn't actually change the elements: select returns an array of element references, not their container, so you need to do something with the resulting array. Luckily, any append or insert of an element causes it to be removed from its parent first, so simply append them once more and they'll assume the correct order. No replacing is needed. I've seen libraries that bizarrely convert the elements to HTML, concatenate that, then reinsert it!?!
The following has been tested.
document.observe('dom:loaded',function() {
$$('.userListHeaderCell').invoke('observe', 'click', function() {
this.toggleClassName('desc');
var colnumber = this.up().childElements().indexOf(this);
var content = this.up(2); // use the element directly instead of its ID
sortColumn(content, colnumber, this.hasClassName('desc'));
});
});
function sortColumn(content, colnumber, desc) {
content.select('.userListRow').sort(function(a,b){
var atext = a.down(colnumber).innerHTML.stripTags().toLowerCase();
var btext = b.down(colnumber).innerHTML.stripTags().toLowerCase();
return atext.localeCompare(btext) * (desc ? -1 : 1);
}).each(Element.prototype.appendChild, content);
}
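The comparator can be tried without a DOM. The sketch below assumes each row is modeled as a plain array of cell text (on the page these would be the .userListRow elements), and the sortRows name is hypothetical. It only shows the ordering logic, not the re-appending step.

```javascript
// Hypothetical stand-in for the rows: one array of cell text per .userListRow.
const sampleRows = [['John', 'Smith'], ['Bob', 'Ray'], ['Fred', 'Jones']];

// Sort whole rows by one column, case-insensitively; desc flips the order.
function sortRows(rows, colnumber, desc) {
  return rows.slice().sort(function (a, b) {
    const atext = a[colnumber].toLowerCase();
    const btext = b[colnumber].toLowerCase();
    return atext.localeCompare(btext) * (desc ? -1 : 1);
  });
}

// First names ascending
console.log(sortRows(sampleRows, 0, false).map(r => r.join(' ')).join(', '));
// Bob Ray, Fred Jones, John Smith

// Last names descending
console.log(sortRows(sampleRows, 1, true).map(r => r.join(' ')).join(', '));
// John Smith, Bob Ray, Fred Jones
```

Because whole rows are sorted rather than individual cells, each first name stays with its last name.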
This to me seems like you are creating tabular data. So why not use a table? And once you use a table, there are many sorting scripts out there. A quick google came up with this one.

Resources