HtmlAgilityPack Div Class Contains String

I'm trying to scrape only article text from web pages. I have discovered that the article is always surrounded with div tags. Unfortunately the class of these div tags is slightly different for each web page. I looked into using XPath but I don't think it will work due to the different class names. Is there a way I can get all the div tags and then get the class?
Examples
<div class="entry_single">
<p>I recently traveled without my notebook for the first time in ages.</p>
</div>
<div class="entry-content-pagination">
<p>Ward 9 Ald. Steven Dove</p>
</div>

That'd be easier using LINQ.
foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    string className = div.GetAttributeValue("class", string.Empty);
    // do something with class name
}
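To make the "get every div, then inspect its class" idea concrete, here is a minimal sketch of the same logic in Python using only the standard library. It is not HtmlAgilityPack code; the names `DivClassCollector` and `divs_with_class_containing`, the `"entry"` substring, and the third sample div are assumptions made for illustration.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collects the class attribute of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A missing class becomes "", similar to passing string.Empty
            # as the default to GetAttributeValue in the answer above.
            self.div_classes.append(dict(attrs).get("class") or "")


def divs_with_class_containing(html, needle):
    """Return the class values of all divs whose class contains `needle`."""
    collector = DivClassCollector()
    collector.feed(html)
    return [c for c in collector.div_classes if needle in c]


# Hypothetical sample based on the two divs from the question.
sample = (
    '<div class="entry_single"><p>First article</p></div>'
    '<div class="entry-content-pagination"><p>Second article</p></div>'
    '<div class="sidebar"><p>Not an article</p></div>'
)
matches = divs_with_class_containing(sample, "entry")
print(matches)
```

A plain substring test like this is what lets one rule cover class names that differ slightly from page to page.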

Related

How can I add c# code within HTML tags in ASP.Net Core

I'm migrating my site from ASP.net Framework (4.7.2) to Asp.net Core (5). One issue that I can't seem to figure out is that in my original site I had c# in a few of my HTML tags to set the css class(es). For instance:
<div class="carousel-item propertyCarousel @if (firstImage) { <text>active</text> } @if (slideNumber > 2) { <text>bonus-image</text> } " data-slide-number="@slideNumber.ToString("D2")">
Because of tag helpers, asp complains about the code. So I disabled tag helpers in the _ViewImports.cshtml and it no longer complains, but then sometimes the code just doesn't work. For instance in the above example I never get a div with the 'active' class despite verifying the conditions are correct (i.e. that 'firstImage' is true for the first image).
Since the previous commenter did not really answer the question, I'm going to go ahead and say that you need to wrap the expression in parentheses for it to work.
This works because it's a plain variable:
<div class="@htmlClass"></div>
But when you need the result of an expression inside your HTML attribute, like this:
<div class="@myvar == true ? "active" : string.Empty"></div>
it does not work. What you should do is wrap it in parentheses like this:
<div class="@(myvar == true ? "active" : string.Empty)"></div>
This will output: <div class="active"> if the result of the expression was true.

Ckeditor in Drupal 8 : how to remove <span> tags if they don't have class attributes?

I'm using the "Allowed html tags" filter in Ckeditor - Drupal 8.
I want Ckeditor to keep <span> tags that have specific classes or IDs, and to remove them if they have no attributes.
For example :
Keep span: <span class="apple">text sample</span>
Keep span : <span id="fruit">text sample</span>
Remove span : <span>text sample</span> -> text sample
Currently, when I configure a text format, I have this code in the allowed tags field:
<p><sup><sub><span id class="apple"><a href !href accesskey id rel target title>
It keeps <span> with IDs or wanted classes, but I cannot get rid of the unwanted <span> with no attribute.
Is there any way to solve this problem with code input?
Thanks in advance,
Emilie
So here is the custom module I wrote to make it work and to get around this major bug in CKEDITOR :
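<?php
/**
 * Implements hook_editor_js_settings_alter().
 */
function MODULENAME_editor_js_settings_alter(array &$settings) {
  // Machine name of the text format whose editor settings are overridden.
  $format = 'machine_name_of_your_text_editor_profile';
  if (isset($settings['editor']['formats'][$format])) {
    // ACF rules are separated by ";". A leading "!" marks a required
    // attribute ([...]) or class ((...)).
    $settings['editor']['formats'][$format]['editorSettings']['allowedContent'] =
      'p sup h1 h2 h3; ' .
      'span[!id]; ' .
      'span(!foo); ' .
      'span(!bar); ' .
      'span(!jane); ' .
      'span(!doe);';
  }
}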
Result: spans are removed entirely if they have no ID, or if they use a class that is not mentioned in this list (foo, bar, jane or doe). You must declare every element you need to be displayed, because this config overwrites all previous inputs in the ACF field.
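The keep-or-drop decision described above can be modelled in a few lines. This is only a rough Python sketch of how the rules `span[!id]` and `span(!foo)` etc. are expected to behave, not CKEditor's actual filter; `filter_span` and `ALLOWED_SPAN_CLASSES` are hypothetical names.

```python
# Classes named in the span(!...) rules of the snippet above.
ALLOWED_SPAN_CLASSES = {"foo", "bar", "jane", "doe"}


def filter_span(attrs):
    """Return the attributes a span keeps, or None if the span is dropped.

    Simplified model: a span survives if it has an id or at least one
    allowed class; any other class is stripped.
    """
    kept = {}
    if attrs.get("id"):
        kept["id"] = attrs["id"]
    classes = [c for c in attrs.get("class", "").split()
               if c in ALLOWED_SPAN_CLASSES]
    if classes:
        kept["class"] = " ".join(classes)
    return kept or None


print(filter_span({"id": "fruit"}))        # kept because of the id
print(filter_span({"class": "foo other"})) # kept, but only with class "foo"
print(filter_span({}))                     # dropped: no id, no allowed class
```

In this model a dropped span means only the tag is removed; its text content would remain, as in the "Remove span" example from the question.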
For this solution, I was inspired by :
The ACF Custom doc : https://ckeditor.com/docs/ckeditor4/latest/examples/acfcustom.html
A thread about hook_editor_js_settings_alter : https://drupal.stackexchange.com/questions/268311/hook-editor-js-setting...
Note : Limit allowed HTML tags and correct faulty HTML filter (in /admin/config/content/formats) does not act consistently with the Ckeditor API. Only a part of the options are really implemented in this field, and uses of "!" don't work. This is why the solution provided uses "hook_editor_js_settings_alter".
function MODULENAME_editor_js_settings_alter(array &$settings) {
  $formats = ['basic_html', 'full_html'];
  foreach ($formats as $format) {
    $settings['editor']['formats'][$format]['editorSettings']['allowedContent']['span']['attributes'] = '!class';
  }
}
allowedContent is an array when loaded by Drupal. Instead of replacing it with a string, you can use the ACF rules to specify whether attributes are required. This allows the config from the UI to still apply.

How do I extract a class from this HTML with XPath

<div class="a-row a-spacing-micro" style="">
<i class="a-icon a-icon-star-medium a-star-medium-4"></i>
<a data-analytics="{"name":"Review.FullReview"}" class="a-size-base a-link-normal a-color-base review-title a-text-bold" href="/gp/cdp/member-reviews/A19123D9G66E0O/ref=pdp_new_read_full_review_link?ie=UTF8&page=1&sort_by=MostRecentReview#R1Z0A6K9CROFFV"> <span>Good Cheap Knee Pads</span>
</a>
</div>
I have this HTML that I am scraping with XPath. What XPath would I use to just return the class "a-star-medium-4"?
Thanks!
Jeff
If it's only for this specific HTML, you can extract the class name starting with a-star with this XPath:
substring(string(//i/@class),string-length(substring-before(string(//i/@class),'a-star')) +1)
When applied to your example HTML this returns a-star-medium-4.
As an explanation: string(//i/@class) returns the class attribute value a-icon a-icon-star-medium a-star-medium-4. To get only the class name starting with a-star, substring-before() first returns the part of the string that precedes a-star, string-length() measures that part, and substring() then returns everything from the following character onwards.
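The arithmetic in that expression can be checked outside XPath. The following Python sketch mirrors it with plain string operations and adds a token-based variant for class attributes where the wanted class is not the last one; both function names are hypothetical.

```python
def class_from_marker(class_attr, marker):
    """Mirror of the XPath above: drop everything before the first `marker`.

    Like substring-before(), the prefix is "" when the marker is absent,
    so the whole attribute value is returned in that case.
    """
    before = class_attr.partition(marker)[0] if marker in class_attr else ""
    # XPath substring() is 1-based (hence the "+1"); Python slices are 0-based.
    return class_attr[len(before):]


def class_token_starting_with(class_attr, prefix):
    """Return the first whitespace-separated class that starts with `prefix`."""
    return next((t for t in class_attr.split() if t.startswith(prefix)), "")


class_attr = "a-icon a-icon-star-medium a-star-medium-4"
print(class_from_marker(class_attr, "a-star"))
print(class_token_starting_with(class_attr, "a-star"))
```

For the HTML in the question both calls give a-star-medium-4. The first approach, like the XPath, returns everything from the marker to the end of the attribute, so it only isolates the class when that class comes last.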

How do I theme the title of a block in a view

I am using views and I have created a block. I need to add a span to Newest members.
I tried using the theme tpl files provided by Views. The topmost file is views-view.tpl.php, and I added a class, "vishaltest", to its first div. However, as you can see, that div starts a bit lower than what I want. How can I override this section?
<div class="block-title">Newest members</div>
the code:
<section id="block-views-5a3590205379433adabbd042516161b0" class="block block-views clearfix">
<div class="block-title">Newest members</div>
<div class="view view-recently-added-updated-profiles view-id-recently_added_updated_profiles view-display-id-newest_member view-dom-id-e8042a917bbe79ecf65705f5c8bda2a3 vishaltest">
I think you would have a better chance of accessing the block title through block--views-view.tpl.php, using the $block->subject variable.

Getting raw text using @Html.ActionLink in Razor / MVC3?

Given the following Html.ActionLink:
@Html.ActionLink(Model.dsResults.Tables[0].Rows[i]["title"].ToString(), "ItemLinkClick",
new { itemListID = @Model.dsResults.Tables[0].Rows[i]["ItemListID"], itemPosNum = i+1 }, ...
Data from the model contains HTML in the title field. However, I am unable to get that HTML to render: the markup is displayed encoded, i.e. underlined text shows up with a literal <u>....</u> around it.
I've tried Html.Raw in the text part of the ActionLink, but no go.
Any suggestions?
If you still want to use a helper to create an action link with raw HTML for the link text then I don't believe you can use Html.ActionLink. However, the answer to this stackoverflow question describes creating a helper which does this.
I would write the link HTML manually though and use the Url.Action helper which creates the URL which Html.ActionLink would have created:
<a href="@Url.Action("ItemLinkClick", new { itemListID = @Model.dsResults.Tables[0].Rows[i]["ItemListID"], itemPosNum = i+1 })">
@Html.Raw(Model.dsResults.Tables[0].Rows[i]["title"].ToString())
</a>
MvcHtmlString.Create should do the trick.
Using the actionlink below you do not need to pass html in the model. Let the css class or inline style determine how the href is decorated.
@Html.ActionLink(Model.dsResults.Tables[0].Rows[i]["title"].ToString(), "ItemLinkClick", "Controller", null, new { @class = "underline", style = "text-decoration: underline" })
Those are the cases where you should take the other path:
@{
    string title = Model.dsResults.Tables[0].Rows[i]["title"].ToString(),
           aHref = String.Format("/ItemLinkClick/itemListID={0}&itemPosNum={1}...",
                       Model.dsResults.Tables[0].Rows[i]["ItemListID"],
                       i+1);
}
<a href="@aHref">@Html.Raw(title)</a>
Remember that Razor helpers help you, but you can still do things the HTML way.
You could also use this:
<a class='btn btn-link'
href='/Mycontroler/MyAction/" + item.ID + "'
data-ajax='true'
data-ajax-method='Get'
data-ajax-mode='InsertionMode.Replace'
data-ajax-update='#Mymodal'>My Comments</a>
