HtmlAgilityPack: retrieve InnerText without the children's InnerText

I have
<p class="MyClass">
<span>Value:</span>
12345
</p>
I'd like to retrieve only 12345, if possible, thanks

Not sure this is the most elegant solution, but what about:
using HtmlAgilityPack;
using System;
using ScrapySharp.Extensions;
using System.Linq;
using HtmlAgilityPack.CssSelectors.NetCore;
namespace StackOverflow
{
class Program
{
static void Main(string[] args)
{
var doc = new HtmlDocument();
doc.LoadHtml(@"<p class='MyClass'>
<span>Value:</span>
12345
</p>");
//Using ScrapySharp.Extensions
var p = doc.DocumentNode.CssSelect("p")?.FirstOrDefault();
var span = p.CssSelect("span")?.FirstOrDefault();
Console.WriteLine(p.InnerText.Replace(span.InnerHtml, string.Empty)?.Trim());
//Using HtmlAgilityPack.CssSelectors.NetCore
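// The lambda parameter is named "node" so it does not clash with the local "p" declared above
var results = doc.QuerySelectorAll("p")?.Select(node => node.InnerText.Replace(node.QuerySelector("span").InnerHtml, string.Empty)?.Trim());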
foreach(var result in results)
Console.WriteLine(result);
}
}
}
P.S.: I am used to working with ScrapySharp in conjunction with HtmlAgilityPack, but I see that there is an HtmlAgilityPack.CssSelectors.NetCore package that may be the more common choice nowadays.
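An alternative that needs only HtmlAgilityPack itself is to read the paragraph's direct text-node children instead of removing the span's text from the full InnerText. The following is a minimal sketch, not a definitive implementation; the helper name GetOwnText is made up for illustration. It relies on HtmlNode.ChildNodes and HtmlNodeType.Text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class OwnTextSketch
{
    // Hypothetical helper (not part of HtmlAgilityPack): concatenates only the
    // direct text-node children of a node, skipping child elements such as <span>.
    public static string GetOwnText(HtmlNode node)
    {
        var parts = node.ChildNodes
            .Where(child => child.NodeType == HtmlNodeType.Text)
            .Select(child => child.InnerText);
        return string.Concat(parts).Trim();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<p class='MyClass'>
<span>Value:</span>
12345
</p>");
        HtmlNode p = doc.DocumentNode.SelectSingleNode("//p[@class='MyClass']");
        Console.WriteLine(GetOwnText(p));
    }
}
```

Unlike the Replace-based approach, this does not depend on the span's text being unique within the paragraph.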

Related

Does LINQ Select not work with IList?

I'm working with Visual Studio 2019 and it suggested a conversion for a foreach to Linq which then doesn't compile:
using System.Collections;
using System.Collections.Generic;
using System.Linq;
static void Main(string[] args)
{
var list = new List<object>();
list.Add("One");
list.Add("Two");
list.Add("Three");
// To make the point...
var iList = (IList)list;
// This doesn't compile and generates CS1061: IList does not contain a definition for Select.
// I thought Linq gave Select to list types?
var toStringList = iList.Select(s => s.ToString());
// The original foreach version:
var outputList = new List<string>();
foreach (var item in iList)
{
var itemToString = item.ToString();
outputList.Add(itemToString);
}
}
Linq is a set of extension methods on everything that implements IEnumerable<T>.
The non-generic IList doesn't implement that interface, so Linq won't work on it.
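If the non-generic IList cannot be avoided, Enumerable.Cast<T>() (or OfType<T>()) turns it into an IEnumerable<T>, after which Select is available. A minimal sketch under that assumption; the class and method names are made up for illustration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastSketch
{
    // Hypothetical helper: projects every item of a non-generic IList to its string form.
    public static List<string> ToStrings(IList items)
    {
        // Cast<object>() is defined on the non-generic IEnumerable,
        // so it bridges IList to IEnumerable<object>.
        return items.Cast<object>().Select(item => item.ToString()).ToList();
    }

    public static void Main()
    {
        IList iList = new List<object> { "One", "Two", "Three" };
        Console.WriteLine(string.Join(", ", ToStrings(iList)));
    }
}
```

Cast<T>() throws InvalidCastException for an element of the wrong type, while OfType<T>() silently skips such elements, so the choice depends on whether mixed content is expected.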
Visual Studio 2019 Professional with ReSharper doesn't suggest Select for me.

Scrape a table using ScrapySharp and HtmlAgilityPack

I am trying to scrape an economic calendar from a specific website. I have tried many times without success, and I don't know where I am going wrong. Can you help me, please?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Network;
using System.Diagnostics; // needed for Debug.Print
namespace Calendar
{
class Program
{
static void Main(string[] args)
{
var url = "https://www.fxstreet.com/economic-calendar";
var webGet = new HtmlWeb();
if (webGet.Load(url) is HtmlDocument document)
{
var nodes = document.DocumentNode.CssSelect("#fxst-calendartable tbody").ToList();
foreach (var node in nodes)
{
// event_time
Debug.Print(node.CssSelect("div div").Single().InnerText);
// event-title
Debug.Print(node.CssSelect("div a").Single().InnerText);
}
}
Console.WriteLine("");
Console.ReadLine();
}
}
}
What error are you getting?
If you want to publish the event names and times from the website, I am assuming you need to read the table.
You can do so using
HtmlNode tablebody = doc.DocumentNode.SelectSingleNode("//table[@class='fxs_c_table']/tbody");
foreach(HtmlNode tr in tablebody.SelectNodes("./tr[@class='fxs_c_row']"))
{
Console.WriteLine("\nTableRow: ");
foreach(HtmlNode td in tr.SelectNodes("./td"))
{
Console.WriteLine(td.SelectSingleNode("./span").InnerText);
}
}
Get hold of the table via its class attribute and then use the relevant XPath expressions to traverse its elements. Please post the error you are getting with your code.
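One thing to watch for with this traversal: in HtmlAgilityPack, SelectNodes returns null (not an empty collection) when nothing matches, and SelectSingleNode returns null as well. The sketch below runs the same XPath against a small inline document with null checks added. The sample markup and the ReadCells helper are made up for illustration and only assume the class names used above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableSketch
{
    // Hypothetical helper: collects the text of every cell in the matching rows.
    public static List<string> ReadCells(string html)
    {
        var cells = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[@class='fxs_c_table']/tbody/tr[@class='fxs_c_row']");
        if (rows == null) return cells; // nothing matched

        foreach (HtmlNode tr in rows)
        {
            HtmlNodeCollection tds = tr.SelectNodes("./td");
            if (tds == null) continue; // row without cells

            foreach (HtmlNode td in tds)
            {
                // Fall back to the cell itself when it has no <span> child.
                HtmlNode span = td.SelectSingleNode("./span");
                cells.Add((span ?? td).InnerText.Trim());
            }
        }
        return cells;
    }

    public static void Main()
    {
        // Made-up markup standing in for the real page.
        string html = @"<table class='fxs_c_table'><tbody>
<tr class='fxs_c_row'><td><span>12:30</span></td><td>CPI</td></tr>
</tbody></table>";
        foreach (string cell in ReadCells(html))
            Console.WriteLine(cell);
    }
}
```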

Populating Umbraco Contour forms using cookie data

We're currently using Umbraco version 7.1.4 assembly: 1.0.5261.28127 with Contour version 3.0.26
I'm trying to populate a Contour form with information pulled from a database, depending on a user cookie (the cookie holds the primary key for the record in the database).
To implement this I'm looking at writing a custom field type (well, a bunch of them, one for each data field) which examines the cookie, makes the DB request, and then populates the textbox with the value (user's name/address/etc.).
I've managed to add a custom setting to a control and have it display the value that's populated at design time, but I can't seem to amend that value at run time.
I'm happy to post the code if relevant, but my question is: am I barking up the wrong tree? Is this the best way to handle this, or would it even work?
Any pointers would be most welcome
Thanks
EDIT
Thanks Tim,
I've now managed to break it in such a way that it's not even rendering the controls (the debug message says the SVT value doesn't exist).
This should just populate the form with the current date/time, just to get something working.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Umbraco.Forms.Core;
using System.Web.UI.WebControls;
namespace Custom.FieldType
{
public class CustomTextfield : Umbraco.Forms.Core.FieldType
{
public CustomTextfield()
{
//Provider
this.Id = new Guid("b994bc8b-2c65-461d-bfba-43c4b3bd2915");
this.Name = "Custom Textfield";
this.Description = "Renders a html input fieldKey"; //FieldType
this.Icon = "textfield.png";
this.SVT = DateTime.Now.ToLongTimeString();
}
public System.Web.UI.WebControls.TextBox tb;
public List<Object> _value;
[Umbraco.Forms.Core.Attributes.Setting("SVT", description = "the SVT")]
public string SVT { get; set; }
public override WebControl Editor
{
get
{
tb.TextMode = System.Web.UI.WebControls.TextBoxMode.SingleLine;
tb.CssClass = "text gaudete";
if (_value.Count > 0)
tb.Text = _value[0].ToString();
SVT = DateTime.Now.ToLongTimeString();
tb.Text = tb.Text + SVT;
return tb;
}
set { base.Editor = value; }
}
public override List<Object> Values
{
get
{
if (tb.Text != "")
{
_value.Clear();
_value.Add(tb.Text);
}
return _value;
}
set { _value = value; }
}
public override string RenderPreview()
{
return
"<input type=\"text\" id=\"text-content\" class=\"text\" maxlength=\"500\" value=\"" + this.SVT + "\" />";
}
public override string RenderPreviewWithPrevalues(List<object> prevalues)
{
return RenderPreview();
}
public override bool SupportsRegex
{
get { return true; }
}
}
}
And the view is
@model Umbraco.Forms.Mvc.Models.FieldViewModel
@{
var widthSetting = Model.AdditionalSettings.FirstOrDefault(s => s.Key.Equals("Width"));
string width = (widthSetting == null) ? null : widthSetting.Value;
var textSetting = Model.AdditionalSettings.FirstOrDefault(s => s.Key.Equals("SVT"));
string widthTXT = (textSetting == null) ? null : textSetting.Value;
}
<input type="text" name="@Model.Name" id="@Model.Id" class="text" maxlength="500"
value="@{if(!string.IsNullOrEmpty(widthTXT)){<text>@(SVT)</text>}}"
@{if(Model.Mandatory || Model.Validate){<text>data-val="true"</text>}}
@{if (Model.Mandatory) {<text> data-val-required="@Model.RequiredErrorMessage"</text>}}
@{if (Model.Validate) {<text> data-val-regex="@Model.InvalidErrorMessage" data-regex="@Model.Regex"</text>}}
/>
The code is mostly cobbled together from online tutorials, which is why the naming is abysmal, but if I can get something to populate the text box on the client side then I can start refactoring (well, scrapping this demo version and writing a real one).
Thanks.
EDIT2
I was able to fix the error that stopped the view from loading, thanks to the pointer from Tim. The new view looks as follows:
@model Umbraco.Forms.Mvc.Models.FieldViewModel
@{
var textSetting = Model.AdditionalSettings.FirstOrDefault(s => s.Key.Equals("SVT"));
string widthTXT = (textSetting == null) ? null : textSetting.Value;
}
<input type="text" name="@Model.Name" id="@Model.Id" class="text" maxlength="500"
value="@{if(!string.IsNullOrEmpty(widthTXT)){<text>@(widthTXT)</text>}else{<text>Unknown</text>}}"
@{if(Model.Mandatory || Model.Validate){<text>data-val="true"</text>}}
@{if (Model.Mandatory) {<text> data-val-required="@Model.RequiredErrorMessage"</text>}}
@{if (Model.Validate) {<text> data-val-regex="@Model.InvalidErrorMessage" data-regex="@Model.Regex"</text>}}
/>
It just displays "Unknown" in the text box.
thanks again.

Where Linq Method returning elements that don't satisfy condition [closed]

I am invoking the Where method on a List and it is returning elements that don't satisfy my condition.
Here is my call to the Where method:
IEnumerable<MyObject> list = returnList.Where(p => p.MaxDate != null && p.MinDate != null);
I am expecting the "list" IEnumerable to contain only the objects that have both MaxDate and MinDate defined (not null)!
Instead, "list" ends up with the same results as my returnList, even though none of the items in "list" has MaxDate and MinDate defined (non-null). My Where condition was supposed to return no elements in that case, am I right?
Thank you very much in advance
EDIT2 (I added the namespaces I am using, maybe there is some bug with this):
Simple example:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
class MyObject
{
public DateTime? MinDate { get; set; }
public DateTime? MaxDate { get; set; }
public string Description{ get; set; }
}
static void Main(string[] args)
{
List<MyObject> lista = new List<MyObject>();
lista.Add(new MyObject { Description = "123" });
lista.Add(new MyObject { Description = "456" });
lista.Add(new MyObject { Description = "678" });
IEnumerable<MyObject> returnn = lista.Where(p => p.MinDate != null && p.MaxDate != null); //this list contains 3 elements and should have 0!! Microsoft bug???? I can't understand this!
}
}
}
returnList.Where(p => p.MaxDate.HasValue && p.MinDate.HasValue);
Working example:
https://dotnetfiddle.net/qQrjkC
Edit: even the != null check should work; you should test properly before downvoting.
Jesus, I am feeling so dumb right now. I was looking at the "source" field of the IEnumerable variable "returnn" instead of checking the actual Results View. I called ToList() and it returned no elements!
I am so sorry lol, maybe someone can close this question...
Thank you all for the efforts everyone! The problem was in front of the computer (me) LOL
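For anyone hitting the same confusion: Where is lazily evaluated, so the object it returns is a query that still holds a reference to the original list, and that is what shows up when inspecting it in the debugger. The filter only runs when the query is enumerated, for example by ToList(). A minimal sketch of the difference, reusing the shape of MyObject from the question (the class and method names are otherwise made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public DateTime? MinDate { get; set; }
    public DateTime? MaxDate { get; set; }
    public string Description { get; set; }
}

public static class DeferredSketch
{
    // Hypothetical helper: returns only the items with both dates set.
    public static List<MyObject> WithBothDates(List<MyObject> source)
    {
        // Nothing is filtered on this line; "query" only describes the filter.
        IEnumerable<MyObject> query = source.Where(p => p.MinDate != null && p.MaxDate != null);

        // ToList() enumerates the query and materializes the filtered items.
        return query.ToList();
    }

    public static void Main()
    {
        var lista = new List<MyObject>
        {
            new MyObject { Description = "123" },
            new MyObject { Description = "456" },
            new MyObject { Description = "678" }
        };
        Console.WriteLine(lista.Count);                // source list: 3 items
        Console.WriteLine(WithBothDates(lista).Count); // filtered result: 0 items
    }
}
```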

How do I get self-hosted WebApi to fetch linked content?

Noob question. I have a project with a self-hosted Web Api. I'm using the RazorEngine package so that I can serve up HTML pages using the views/razor scheme.
Within the HTML page there are links to .css, .JS, and images. How does the page get these embedded resources?
As I understand it, http://localhost:8080/api/home in the browser causes the project to 'call' the page at /Views/Home.html and pass through the Value object. This results in HTML appearing in the browser rather than the usual JSON/XML that you normally get with WebAPi.
For the page to retrieve the embedded JavaScript, I guess I would create another WebApi controller that would respond to the URL, but how do I get it to transmit the JavaScript file? I.e., how do I get it to look in a folder called 'Scripts' and not 'Views', not attempt to convert to HTML, and not bother with an associated model?
public class HomeController : ApiController
{
//http://localhost:8080/api/home
public Value GetValues()
{
return new Value() { Numbers = new int[] { 1, 2, 3 } };
}
}
[View("Home")]
public class Value
{
public int[] Numbers { get; set; }
}
home.cshtml...
<html>
<head>
<script src="/Scripts/script1.js"></script>
</head>
<body>
<img src="/Images/image1.png">
....
</body>
</html>
In case anyone else has this issue, this is how I did it in the end....
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Web.Http;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Diagnostics;
using WebApiContrib.Formatting.Html;
using System.IO;
using System.Net;
using System.Drawing;
using System.Resources;
using System.Reflection;
using System.Text.RegularExpressions;
namespace Owin_Test1.Controllers
{
public class PageResourcesController : ApiController
{
//
// An HTML page will have references to css, javascript and image files
// This method supplies these files to the browser
// These files are saved in the Visual Studio project as linked resources
// Make sure the resources are named correctly (and with the correct case), i.e.:
// <fileName> = <resourceName>.<fileExtension>
// http://localhost:8080/api/PageResources/<fileName>
// The fileExtension is used to determine how to extract & present the resource
// (Note, <filename> is the reference in the HTML page
// - it needn't be the same as the name of the actual file.)
//
public HttpResponseMessage Get(string filename)
{
String projectName = "Owin_Test1";
//Obtain the resource name and file extension
var matches = Regex.Matches(filename, @"^\s*(.+?)\.([^.]+)\s*$");
String resourceName = matches[0].Groups[1].ToString();
String fileExtension = matches[0].Groups[2].ToString().ToLower();
Debug.WriteLine("Resource: {0} {1}",
resourceName,
fileExtension);
//Get the resource
ResourceManager rm = new ResourceManager(
projectName + ".Properties.Resources",
typeof(Properties.Resources).Assembly);
Object resource = rm.GetObject(resourceName);
ImageConverter imageConverter = new ImageConverter();
byte[] resourceByteArray;
String contentType;
//Generate a byteArray and contentType for each type of resource
switch (fileExtension)
{
case "jpg":
case "jpeg":
resourceByteArray = (byte[])imageConverter.ConvertTo(resource, typeof(byte[]));
contentType = "image/jpeg";
break;
case "png":
resourceByteArray = (byte[])imageConverter.ConvertTo(resource, typeof(byte[]));
contentType = "image/png";
break;
case "css":
resourceByteArray = Encoding.UTF8.GetBytes((String)resource);
contentType = "text/css";
break;
case "js":
resourceByteArray = Encoding.UTF8.GetBytes((String)resource);
contentType = "application/javascript";
break;
case "html":
default:
resourceByteArray = Encoding.UTF8.GetBytes((String)resource);
contentType = "text/html";
break;
}
//Convert resource to a stream, package up and send on to the browser
MemoryStream dataStream = new MemoryStream(resourceByteArray);
HttpResponseMessage response = new HttpResponseMessage(HttpStatusCode.OK);
response.Content = new StreamContent(dataStream);
response.Content.Headers.ContentType = new MediaTypeHeaderValue(contentType);
return response;
}
}
}
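One caveat with the controller above: indexing matches[0] throws when the requested name has no extension, because the regular expression then produces no match. A possible guard is sketched below using Regex.Match and its Success property with the same pattern. The TrySplit helper is made up for illustration and is not part of the original controller; a caller could return a 404 response when it reports false.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameSketch
{
    // Hypothetical helper: splits "name.ext" into a resource name and a lower-cased extension.
    // Returns false instead of throwing when the input has no extension.
    public static bool TrySplit(string filename, out string resourceName, out string fileExtension)
    {
        resourceName = string.Empty;
        fileExtension = string.Empty;
        if (string.IsNullOrWhiteSpace(filename)) return false;

        // Same pattern as in the controller above.
        Match match = Regex.Match(filename, @"^\s*(.+?)\.([^.]+)\s*$");
        if (!match.Success) return false;

        resourceName = match.Groups[1].Value;
        fileExtension = match.Groups[2].Value.ToLowerInvariant();
        return true;
    }

    public static void Main()
    {
        foreach (string name in new[] { "script1.js", "image1.PNG", "noextension" })
        {
            if (TrySplit(name, out string resource, out string extension))
                Console.WriteLine("{0} -> {1} / {2}", name, resource, extension);
            else
                Console.WriteLine("{0} -> no extension found", name);
        }
    }
}
```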
