I'm working on a web scraping project with .NET Framework and the Html Agility Pack.
First, I wrote a method that parses the category list from https://www.gearbest.com, and it works fine.
Now I need to parse the products from each category in that list.
For example, there is an appliances category at https://www.gearbest.com/appliances-c_12245/, but when I run the method on it, it throws an error:
'The underlying connection was closed: An unexpected error occurred on a receive'
Here is my code:
public void Get_All_Categories()
{
    var html = @"https://www.gearbest.com/";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(html);
    var nodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div[1]/div/ul[2]/li[1]/ul/li//a/span/../@href");
    foreach (HtmlNode n in nodes)
    {
        Category c = new Category();
        c.Name = n.InnerText;
        c.CategoryLink = n.GetAttributeValue("href", string.Empty);
        categories.Add(c);
    }
}
This works fine.
public void Get_Product()
{
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
    var html = @"https://www.gearbest.com/appliances-c_12245/";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(html);
    var x = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"siteWrap\"]/div[1]/div[1]/div/div[3]/ul/li[1]/div/p[1]/a");
    Console.WriteLine(x.InnerText);
    Console.WriteLine("done");
}
But this method doesn't work; it throws the error quoted above.
How can I fix this?
P.S.: I have already seen some solutions about HTTPS handling, but they didn't work for me, maybe because I don't fully understand them.
I would appreciate any help. Thank you in advance.
Related
My team has been using VSTS for 8 months. Now our customer is asking for the "Repro Steps" of the work items in VSTS.
Is there any way to get the content of "Repro Steps" without the HTML formatting?
No, because the Repro Steps value is rich text that can contain images and other markup, so the data would not be accurate if it were returned without the HTML.
However, you can strip the HTML tags programmatically.
Simple code:
public static string StripHTML(string input)
{
    // Remove anything that looks like an HTML tag.
    return Regex.Replace(input, "<.*?>", String.Empty);
}
var u = new Uri("[collection URL]");
VssCredentials c = new VssCredentials(new Microsoft.VisualStudio.Services.Common.WindowsCredential(new NetworkCredential("[user name]", "[password]")));
var connection = new VssConnection(u, c);
var workitemClient = connection.GetClient<WorkItemTrackingHttpClient>();
var workitem = workitemClient.GetWorkItemAsync(96).Result;
object repoValue = workitem.Fields["Microsoft.VSTS.TCM.ReproSteps"];
string repoValueWithOutformat = StripHTML(repoValue.ToString());
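The regex above removes tags but leaves HTML entities such as &amp;quot; or &amp;amp; in the result, and text from adjacent elements runs together. Here is a minimal sketch of a slightly more forgiving variant, assuming plain text is the goal and that collapsing whitespace is acceptable. The names ReproStepsText and ToPlainText are hypothetical and not part of any VSTS API.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ReproStepsText
{
    // Hypothetical helper: strips tags, decodes entities, collapses whitespace.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so text from adjacent elements stays separated.
        string withoutTags = Regex.Replace(html, "<.*?>", " ", RegexOptions.Singleline);

        // Turn entities such as &amp; and &quot; back into their characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse runs of whitespace into single spaces and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

It would be called the same way as StripHTML, for example ReproStepsText.ToPlainText(repoValue.ToString()). A regex is still only an approximation of HTML parsing, so unusual markup may need a real parser.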
I'm importing test cases from an XML file into TFS 2010 and get an exception, but it gives no information about what exactly is invalid.
"Work item 0 is invalid and cannot be saved. Exception: 'TF237124: Work Item is not ready to save'."
How can I determine what is wrong with the data imported from the XML?
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using Microsoft.TeamFoundation.Server;
using Microsoft.TeamFoundation.WorkItemTracking.Client;

internal class Program
{
    // Input File
    private static TestLink testLink = new TestLink("E:\\dev\\TestLinkToTfs\\testsuites.xml");
    // Target TFS server
    private static Tfs tfs = new Tfs("http://host:8080/tfs/Test", "Test");

    private static void Main(string[] args)
    {
        // Import each test case (limited to the first one here).
        foreach (var testLinkTestCase in testLink.GetTestCases().Take(1).ToList())
        {
            var steps = testLinkTestCase.Descendants("step");
            var testCase = tfs.Project.TestCases.Create(tfs.Project.WitProject.WorkItemTypes["Test Case"]);
            testCase.Title = testLinkTestCase.Attribute("name").Value;
            var summary = testLinkTestCase.Descendants("summary").ToList();
            var issueId = TestLink.GetLinkedIssueId(summary);
            // Strip characters that are not allowed in area path names.
            var regEx = new Regex(@"[^a-zA-Z0-9 -]");
            var grandParentName = regEx.Replace(testLinkTestCase.Parent.Parent.Attribute("name").Value, string.Empty);
            var parentName = regEx.Replace(testLinkTestCase.Parent.Attribute("name").Value, string.Empty);
            var area = string.Format(@"Test\Test Cases\{0}\{1}", grandParentName, parentName);
            testCase.CustomFields["Assigned To"].Value = string.Empty;
            testCase.Area = area;
            Tfs.AddSteps(steps, testCase);
            testCase.Save();
        }
        Console.ReadKey();
    }
}
A work item ID of 0 means that the work item was created in code and has not been saved yet, and that some of its field values are not valid. Call
workItem.Validate();
before you save the work item, and then debug your code. It returns the exact fields that have invalid data.
I could be more helpful if you posted the code and the XML that you use for this.
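As a rough sketch of that suggestion: WorkItem.Validate() returns an ArrayList of the Field objects that failed validation, so they can be listed before saving. The helper name WorkItemDiagnostics.ReportInvalidFields is hypothetical, and the snippet assumes a reference to the TFS client assembly Microsoft.TeamFoundation.WorkItemTracking.Client.

```csharp
using System;
using System.Collections;
using Microsoft.TeamFoundation.WorkItemTracking.Client;

internal static class WorkItemDiagnostics
{
    // Hypothetical helper: prints every field that Validate() reports as invalid.
    // Returns true when the work item has no invalid fields.
    public static bool ReportInvalidFields(WorkItem workItem)
    {
        ArrayList invalidFields = workItem.Validate();

        foreach (Field field in invalidFields)
        {
            Console.WriteLine(
                "Invalid field '{0}' ({1}): status = {2}, value = '{3}'",
                field.Name, field.ReferenceName, field.Status, field.Value);
        }

        return invalidFields.Count == 0;
    }
}
```

In the import loop it might be called just before Save(), for example ReportInvalidFields(testCase.WorkItem), assuming the test case object exposes its underlying work item. The Status value usually indicates why a field was rejected, such as an empty required field or a value outside the allowed list.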
I decided to implement caching to improve the performance of the product pages.
Each page contains a large amount of the product's images.
I created the following code in a Razor view.
@{
    var productID = UrlData[0].AsInt();
    var cacheItemKey = "products";
    var cacheHit = true;
    var data = WebCache.Get(cacheItemKey);
    var db = Database.Open("adldb");
    if (data == null) {
        cacheHit = false;
    }
    if (cacheHit == false) {
        data = db.Query("SELECT * FROM Products WHERE ProductID = @0", productID).ToList();
        WebCache.Set(cacheItemKey, data, 1, false);
    }
}
I'm using the data with the following code:
@foreach (dynamic p in data)
{
    <a href="~/Products/View/@p.ProductID">
    <img src="~/Thumbnail/@p.ProductID"></a>
}
The caching code works well, but when I pass a new query string parameter (requesting a different version of the page), the browser shows the same result until the caching time expires.
How can I cache each version of the page separately?
Thanks
Oleg
A very simple approach might be to convert your key (productID) to a string and append it to the name of your cacheItemKey.
So you might consider changing the line:
var cacheItemKey = "products";
to read:
var cacheItemKey = "products" + productID.ToString();
This should produce the behavior you are looking for -- basically mimicking a VaryByParam setup.
ps. Please keep in mind I have not added any sort of defensive code, which you should do.
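A minimal sketch of what such defensive code might look like, assuming product IDs are always positive (AsInt() returns 0 when the URL segment cannot be parsed). The class ProductCacheKeys and its method ForProduct are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class ProductCacheKeys
{
    // Hypothetical helper: builds one cache key per product ID.
    public static string ForProduct(int productId)
    {
        // Assumption: valid product IDs are positive, so 0 signals a bad or missing URL value.
        if (productId <= 0)
        {
            throw new ArgumentOutOfRangeException("productId", "Product ID must be positive.");
        }

        // Invariant culture keeps the key identical regardless of server locale.
        return "products:" + productId.ToString(CultureInfo.InvariantCulture);
    }
}
```

In the view, the key line would then read var cacheItemKey = ProductCacheKeys.ForProduct(productID); so each product gets its own cache entry.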
Hope that helps.
Tool: Visual Studio 2010
Language: C#
I have just started learning Entity Framework and I'm stuck on a problem: whenever I use CODE# 1 it works fine, but whenever I use CODE# 2 I get the error posted below.
Title: InvalidOperationException was unhandled by user code
Error Message
"The EntityCollection has already been initialized. The InitializeRelatedCollection method should only be called to initialize a new EntityCollection during deserialization of an object graph."
//SchoolModel.Designer.cs
public EntityCollection<Course> Courses
{
    get
    { /* Blah blah code */ }
    set
    {
        if ((value != null))
        {
            // The statement below is the one Visual Studio points to as throwing the exception
            ((IEntityWithRelationships)this).RelationshipManager.InitializeRelatedCollection<Course>("SchoolModel.CourseInstructor", "Course", value);
        }
    }
}
CODE# 1:
List<string> list = new List<string>();
var prs = new Person();
using (var myEntity = new SchoolEntities())
{
    var result = myEntity.People;
    foreach (var ppl in result)
    {
        list.Add(ppl.PersonID + "," + ppl.FirstMidName);
    }
}
CODE# 2:
List<string> list = new List<string>();
List<Person> prsList = new List<Person>(); // the problem started when I began using this list
var prs = new Person();
using (var myEntity = new SchoolEntities())
{
    var result = myEntity.People;
    foreach (var ppl in result)
    {
        list.Add(ppl.PersonID + "," + ppl.FirstMidName);
        // New code which raised exceptions
        prs.PersonID = ppl.PersonID;
        prs.FirstMidName = ppl.FirstMidName;
        prs.LastName = ppl.LastName;
        prs.Courses = ppl.Courses;
        prsList.Add(prs);
        // New code end
    }
}
Database Diagram:
Entity Diagram:
P.S:
I followed the EF tutorial at http://www.asp.net/web-forms/tutorials/getting-started-with-ef/the-entity-framework-and-aspnet-getting-started-part-3, then deviated and started to play with it :)
I did find some questions about related errors, but my scenario is different.
You should not set an EntityCollection, as you do in prs.Courses = ppl.Courses. The collection has already been initialized (as the exception says). You only modify it by adding Course instances to it.
Also try moving the initialization of prs inside the foreach loop. As written, the same Person instance is overwritten and added to prsList on every iteration.
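The two suggestions can be sketched on plain objects. The Person and Course classes below are stand-ins written for this illustration, not the EF-generated entities, and PersonCopier.CopyAll is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Stand-in classes for illustration only; the real Person and Course are EF-generated entities.
public class Course
{
    public string Title { get; set; }
}

public class Person
{
    private readonly List<Course> courses = new List<Course>();

    public int PersonID { get; set; }
    public string FirstMidName { get; set; }
    public string LastName { get; set; }

    // Read-only collection property: callers add to it, they never replace it.
    public ICollection<Course> Courses
    {
        get { return courses; }
    }
}

public static class PersonCopier
{
    public static List<Person> CopyAll(IEnumerable<Person> source)
    {
        var copies = new List<Person>();
        foreach (var ppl in source)
        {
            // A new instance per iteration, so the list holds distinct objects.
            var prs = new Person
            {
                PersonID = ppl.PersonID,
                FirstMidName = ppl.FirstMidName,
                LastName = ppl.LastName
            };

            // Add courses one by one instead of assigning the whole collection.
            foreach (var course in ppl.Courses)
            {
                prs.Courses.Add(course);
            }

            copies.Add(prs);
        }
        return copies;
    }
}
```

With real EF entities, adding a tracked Course to another Person's collection may affect relationship state in the context, so if the copies are only needed for display, copying the scalar values into a plain class may be the simpler route.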
I am planning on creating a custom route using ASP.NET Web Pages by dynamically creating WebPage instances as follows:
IHttpHandler handler = System.Web.WebPages.WebPageHttpHandler.CreateFromVirtualPath("~/Default.cshtml");
How can I supply an object to the underlying WebPage object so that it becomes the web page's "Model"? In other words, I want to be able to write @Model.Firstname in the file Default.cshtml.
Any help will be greatly appreciated.
UPDATE
By modifying the answer by @Pranav, I was able to retrieve the underlying WebPage object using reflection:
public void ProcessRequest(HttpContext context)
{
    //var page = (WebPage) System.Web.WebPages.WebPageHttpHandler.CreateFromVirtualPath(this.virtualPath);
    var handler = System.Web.WebPages.WebPageHttpHandler.CreateFromVirtualPath(this.virtualPath);
    var field = handler.GetType().GetField("_webPage", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
    var page = field.GetValue(handler) as System.Web.WebPages.WebPage;
    var contextWrapper = new HttpContextWrapper(context);
    var pageContext = new WebPageContext(contextWrapper, page, context.Items[CURRENT_NODE]);
    page.ExecutePageHierarchy(pageContext, contextWrapper.Response.Output);
}
Unfortunately, this is not reliable, because it does not work in Medium Trust (BindingFlags.NonPublic is ignored if the application is not running in full trust). So while we have made significant progress, the solution is not yet complete.
Any suggestions will be greatly appreciated.
The Model property of a WebPage comes from the WebPageContext. To set a Model, you could create a WebPageContext with the right parameters:
var page = (WebPage)WebPageHttpHandler.CreateFromVirtualPath("~/Default.cshtml");
var httpContext = new HttpContextWrapper(HttpContext.Current);
var model = new { FirstName = "Foo", LastName = "Bar" };
var pageContext = new WebPageContext(httpContext, page, model);
page.ExecutePageHierarchy(pageContext, httpContext.Response.Output);
The model instance should now be available as a dynamic type to you in your page.