I'm working on a web-scraping project with .NET Framework and the Html Agility Pack.
First, I wrote a method that parses the category list from https://www.gearbest.com, and it works fine.
Now I need to parse the products from each category in that list.
For example, for the appliances category (https://www.gearbest.com/appliances-c_12245/), the method fails with this error:
'The underlying connection was closed: An unexpected error occurred on a receive'
Here is my code:
public void Get_All_Categories()
{
    var html = @"https://www.gearbest.com/";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(html);
    var nodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div[1]/div/ul[2]/li[1]/ul/li//a/span/../@href");
    foreach (HtmlNode n in nodes)
    {
        Category c = new Category();
        c.Name = n.InnerText;
        c.CategoryLink = n.GetAttributeValue("href", string.Empty);
        categories.Add(c);
    }
}
This is working pretty much fine.
public void Get_Product()
{
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
    var html = @"https://www.gearbest.com/appliances-c_12245/";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(html);
    var x = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"siteWrap\"]/div[1]/div[1]/div/div[3]/ul/li[1]/div/p[1]/a");
    Console.WriteLine(x.InnerText);
    Console.WriteLine("done");
}
But this method doesn't work; it throws the error above.
How can I fix this?
P.S.: I have already seen some solutions about HTTPS handling, but they didn't work for me, maybe because I don't understand them.
I would appreciate any help. Thank you in advance.
My team has been using VSTS for 8 months. Now our customer is asking for the "Repro Steps" of the work items in VSTS.
Is there any way to get the content of "Repro Steps" without the HTML formatting?
No, because the Repro Steps value is rich text that can contain images and other markup, so returning it without the HTML would not give the true value.
However, you can strip the HTML tags programmatically.
Sample code:
public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}
var u = new Uri("[collection URL]");
VssCredentials c = new VssCredentials(new Microsoft.VisualStudio.Services.Common.WindowsCredential(new NetworkCredential("[user name]", "[password]")));
var connection = new VssConnection(u, c);
var workitemClient = connection.GetClient<WorkItemTrackingHttpClient>();
var workitem = workitemClient.GetWorkItemAsync(96).Result;
object repoValue = workitem.Fields["Microsoft.VSTS.TCM.ReproSteps"];
string repoValueWithOutformat = StripHTML(repoValue.ToString());
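One caveat with the plain regex approach: it leaves HTML entities such as &amp;quot; in the text and glues adjacent paragraphs together. Below is a minimal sketch of a slightly more careful variant. The class name HtmlText and the method ToPlainText are hypothetical, and the sketch assumes the field holds reasonably simple markup; for arbitrary HTML a real parser is the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: turns a rich-text field value into plain text.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Treat <br> and closing block tags as line breaks so steps stay on separate lines.
        string text = Regex.Replace(html, @"<\s*(br|/p|/div|/li)\b[^>]*>", "\n", RegexOptions.IgnoreCase);

        // Remove every remaining tag.
        text = Regex.Replace(text, "<[^>]*>", string.Empty);

        // Decode entities such as &amp; and &quot;.
        text = WebUtility.HtmlDecode(text);

        return text.Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(HtmlText.ToPlainText("<div>Open the app</div><div>Click &quot;Save&quot;</div>"));
    }
}
```

With this helper, the last line of the sample above would become a call to HtmlText.ToPlainText(repoValue.ToString()).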
I am using the OData sample project at http://www.asp.net/web-api/overview/odata-support-in-aspnet-web-api/working-with-entity-relations. In the Get I want to be able to change the Filter in the QueryOptions of the EntitySetController:
public class ProductsController : EntitySetController<Product, int>
{
    ProductsContext _context = new ProductsContext();

    [Queryable(AllowedQueryOptions=AllowedQueryOptions.All)]
    public override IQueryable<Product> Get()
    {
        var products = QueryOptions.ApplyTo(_context.Products).Cast<Product>();
        return products.AsQueryable();
    }
}
I would like to be able to find properties that are specifically referred to. I can do this by parsing this.QueryOptions.Filter.RawValue for the property names but I cannot update the RawValue as it is read only. I can however create another instance of FilterQueryOption from the modified RawValue but I cannot assign it to this.QueryOptions.Filter as this is read only too.
I guess I could call the new filter's ApplyTo passing it _context.Products, but then I will need to separately call the ApplyTo of the other properties of QueryOptions like Skip and OrderBy. Is there a better solution than this?
Update
I tried the following:
public override IQueryable<Product> Get()
{
    IQueryable<Product> encryptedProducts = _context.Products;
    var filter = QueryOptions.Filter;
    if (filter != null && filter.RawValue.Contains("Name"))
    {
        var settings = new ODataQuerySettings();
        var originalFilter = filter.RawValue;
        var newFilter = ParseAndEncyptValue(originalFilter);
        filter = new FilterQueryOption(newFilter, QueryOptions.Context);
        encryptedProducts = filter.ApplyTo(encryptedProducts, settings).Cast<Product>();
        if (QueryOptions.OrderBy != null)
        {
            // Keep the ordered query; ApplyTo returns a new query rather than modifying its argument.
            encryptedProducts = QueryOptions.OrderBy.ApplyTo<Product>(encryptedProducts);
        }
    }
    else
    {
        encryptedProducts = QueryOptions.ApplyTo(encryptedProducts).Cast<Product>();
    }
    var unencryptedProducts = encryptedProducts.Decrypt().ToList();
    return unencryptedProducts.AsQueryable();
}
and it seems to be working up to a point. If I set a breakpoint I can see my products in the unencryptedProducts list, but when the method returns I don't get any items. I tried putting the [Queryable(AllowedQueryOptions=AllowedQueryOptions.All)] attribute back on, but it had no effect. Any ideas why I am not getting any items?
Update 2
I discovered that my query was being applied twice even though I am not using the Queryable attribute. This meant that even though I had items to return the List was being queried with the unencrypted value and therefore no values were being returned.
I tried using an ODataController instead:
public class ODriversController : ODataController
{
    //[Authorize()]
    //[Queryable(AllowedQueryOptions = AllowedQueryOptions.All)]
    public IQueryable<Products> Get(ODataQueryOptions options)
    {
and this worked! Does this indicate that there is a bug in EntitySetController?
You would probably need to regenerate the ODataQueryOptions to solve your issue. For example, if you want to add an $orderby option, you can do it like this:
string url = HttpContext.Current.Request.Url.AbsoluteUri;
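// Use '?' when the original URL has no query string yet, '&' otherwise.
url += (url.Contains("?") ? "&" : "?") + "$orderby=name";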
var request = new HttpRequestMessage(HttpMethod.Get, url);
ODataModelBuilder modelBuilder = new ODataConventionModelBuilder();
modelBuilder.EntitySet<Product>("Product");
var options = new ODataQueryOptions<Product>(new ODataQueryContext(modelBuilder.GetEdmModel(), typeof(Product)), request);
I decided to implement caching to improve the performance of the product pages.
Each page contains a large amount of the product's images.
I created the following code in a Razor view.
@{
    var productID = UrlData[0].AsInt();
    var cacheItemKey = "products";
    var cacheHit = true;
    var data = WebCache.Get(cacheItemKey);
    var db = Database.Open("adldb");
    if (data == null) {
        cacheHit = false;
    }
    if (cacheHit == false) {
        data = db.Query("SELECT * FROM Products WHERE ProductID = @0", productID).ToList();
        WebCache.Set(cacheItemKey, data, 1, false);
    }
}
I'm using the data with the following code:
@foreach (dynamic p in data)
{
    <a href="~/Products/View/@p.ProductID">
        <img src="~/Thumbnail/@p.ProductID"></a>
}
The caching code works well, but when I pass a different query string parameter (requesting another version of the page), the browser shows the same result for the whole caching period.
How can I cache each version of the page separately?
Thanks
Oleg
A very simple approach might be to convert your key (productID) to a string and append it to the name of your cacheItemKey.
So you might consider changing the line:
var cacheItemKey = "products";
to read:
var cacheItemKey = "products" + productID.ToString();
This should produce the behavior you are looking for -- basically mimicking a VaryByParam setup.
ps. Please keep in mind I have not added any sort of defensive code, which you should do.
Hope that helps.
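As for the defensive code mentioned above, here is a minimal sketch of one way to build the key. The class ProductCacheKeys and the method TryBuild are hypothetical names, and the sketch assumes that non-positive IDs never identify a real product, so such requests should bypass the cache entirely.

```csharp
using System;
using System.Globalization;

public static class ProductCacheKeys
{
    // Hypothetical helper: builds one cache key per product ID.
    // Returns false for IDs assumed to be invalid (zero or negative),
    // so callers can skip both the cache and the query.
    public static bool TryBuild(int productId, out string key)
    {
        if (productId <= 0)
        {
            key = null;
            return false;
        }

        // The separator keeps the prefix and the number visually distinct.
        key = "products:" + productId.ToString(CultureInfo.InvariantCulture);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        string key;
        if (ProductCacheKeys.TryBuild(42, out key))
        {
            Console.WriteLine(key);
        }
        Console.WriteLine(ProductCacheKeys.TryBuild(0, out key));
    }
}
```

In the Razor page, the resulting key would be passed to WebCache.Get and WebCache.Set in place of the fixed "products" string.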
I have this bit of code that does not work because Entity Framework doesn't recognize the CreateItemDC method. CreateItemDC is a private helper method that creates a data contract for the given Item entity. I use CreateItemDC throughout my service whenever I need to return an Item data contract, but I can't use it here. I can't materialize the sequence of ProjectItems into an array or enumerable first, because I would have to do that for every ProjectItem entity in my database: the query criteria are specified on the client and I don't have access to them here. Do I have any better options? It seems that RIA Services is not worth the trouble. I really wish I had used plain WCF for this project.
[Query]
public IQueryable<ProjectItemDC> GetProjectItems()
{
    return from projectItem in ObjectContext.ProjectItems
           select new ProjectItemDC
           {
               ID = projectItem.ID,
               LibraryItem = CreateItemDC(projectItem.LibraryItem),
               LibraryItemID = projectItem.LibraryItemID,
               ProjectID = projectItem.ProjectID,
               Quantity = projectItem.Quantity,
               Width = projectItem.Width,
               Height = projectItem.Height,
               Depth = projectItem.Depth,
               SheetMaterialID = projectItem.SheetMaterialID,
               BandingMaterialID = projectItem.BandingMaterialID,
               MaterialVolume = projectItem.MaterialVolume,
               MaterialWeight = projectItem.MaterialWeight
           };
}
P.S. I do love LINQ and E.F. though. :)
Well, if you want to go with plain WCF, you can, no problem. Just change the code to:
[Query(IsComposable=false)]
public IEnumerable<ProjectItemDC> GetProjectItems(string myParm1, string myParm2)
{
    // AsEnumerable() moves the projection into memory so that CreateItemDC can run;
    // the parentheses make ToArray() apply to the whole query, not to the initializer.
    return (from projectItem in ObjectContext.ProjectItems.AsEnumerable()
            select new ProjectItemDC
            {
                ID = projectItem.ID,
                LibraryItem = CreateItemDC(projectItem.LibraryItem),
                LibraryItemID = projectItem.LibraryItemID,
                ProjectID = projectItem.ProjectID,
                Quantity = projectItem.Quantity,
                Width = projectItem.Width,
                Height = projectItem.Height,
                Depth = projectItem.Depth,
                SheetMaterialID = projectItem.SheetMaterialID,
                BandingMaterialID = projectItem.BandingMaterialID,
                MaterialVolume = projectItem.MaterialVolume,
                MaterialWeight = projectItem.MaterialWeight
            }).ToArray();
}
Write your own filtering/sorting logic and you're done.
Yes, you lose the dynamic query capabilities of WCF RIA Services, but that is pretty much what you get with plain old WCF, isn't it?
If you do need WCF RIA dynamic sorting/filtering/grouping, you must take some additional steps that involve visiting the Expression that WCF RIA Services creates for you.
HTH
You can call ToArray() against ObjectContext.ProjectItems to force EF to load all the items, however, your query will no longer be composable on the client.
[Query]
public IQueryable<ProjectItemDC> GetProjectItems()
{
    // ToArray() loads every row into memory; AsQueryable() satisfies the declared return type.
    return (from projectItem in ObjectContext.ProjectItems.ToArray()
            select new ProjectItemDC
            {
                ID = projectItem.ID,
                LibraryItem = CreateItemDC(projectItem.LibraryItem),
                LibraryItemID = projectItem.LibraryItemID,
                ProjectID = projectItem.ProjectID,
                Quantity = projectItem.Quantity,
                Width = projectItem.Width,
                Height = projectItem.Height,
                Depth = projectItem.Depth,
                SheetMaterialID = projectItem.SheetMaterialID,
                BandingMaterialID = projectItem.BandingMaterialID,
                MaterialVolume = projectItem.MaterialVolume,
                MaterialWeight = projectItem.MaterialWeight
            }).AsQueryable();
}
Edit:
As mentioned in your comment, it gets all of the data out of the database at once which is not ideal. In order to create the LibraryItem with your private method, you cannot compose the query on the client. Instead, you should filter within the query method and then create the array.
[Query]
public IQueryable<ProjectItemDC> GetProjectItems(int id, string filter, object blah)
{
    var projectItems = ObjectContext.ProjectItems.Where(...).ToArray();
    return projectItems.Select(projectItem => new ProjectItemDC{...}).AsQueryable();
}
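To make the filter-then-project idea concrete, here is a small self-contained sketch. It is not the original service code: the types are simplified stand-ins, an in-memory list plays the role of ObjectContext.ProjectItems, and the projectId parameter and the LibraryItemName property are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the entity and data-contract types.
public class ProjectItem
{
    public int ID { get; set; }
    public int ProjectID { get; set; }
    public string LibraryItemName { get; set; }
}

public class ItemDC
{
    public string Name { get; set; }
}

public class ProjectItemDC
{
    public int ID { get; set; }
    public ItemDC LibraryItem { get; set; }
}

public static class ProjectItemQueries
{
    // Stand-in for the private helper that a query provider cannot translate.
    private static ItemDC CreateItemDC(string name)
    {
        return new ItemDC { Name = name };
    }

    public static IQueryable<ProjectItemDC> GetProjectItems(IQueryable<ProjectItem> source, int projectId)
    {
        // Step 1: filter while the query is still composable, so a real provider
        // could push the Where clause down to the database.
        ProjectItem[] rows = source.Where(p => p.ProjectID == projectId).ToArray();

        // Step 2: project in memory, where calling an ordinary method is allowed.
        return rows
            .Select(p => new ProjectItemDC { ID = p.ID, LibraryItem = CreateItemDC(p.LibraryItemName) })
            .AsQueryable();
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<ProjectItem>
        {
            new ProjectItem { ID = 1, ProjectID = 10, LibraryItemName = "Shelf" },
            new ProjectItem { ID = 2, ProjectID = 20, LibraryItemName = "Door" },
            new ProjectItem { ID = 3, ProjectID = 10, LibraryItemName = "Panel" }
        }.AsQueryable();

        foreach (ProjectItemDC dc in ProjectItemQueries.GetProjectItems(items, 10))
        {
            Console.WriteLine(dc.ID + ": " + dc.LibraryItem.Name);
        }
    }
}
```

Only the rows that pass the filter are materialized, so the helper runs on a small array instead of the whole table. Any further composition by the caller happens in memory, not in the database.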