Two methods for loading HTML from a URL? - html-agility-pack

To load HTML from a URL, I am using the method below:
public HtmlDocument DownloadSource(string url)
{
try
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(DownloadString(url));
return doc;
}
catch (Exception e)
{
if (Task.Error == null)
Task.Error = e;
Task.Status = TaskStatuses.Error;
Done = true;
return null;
}
}
But suddenly today the code above stopped working. I discovered another method, and it works correctly:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url.ToString());
Now I just want to know the difference between the two methods.

It seems the User-Agent header is now mandatory for that site.
Nothing is wrong with HtmlAgilityPack itself, but you need to change the DownloadString(url) method. If you inspect the request with Fiddler, you will see that the server responds with 403 Forbidden.
The solution is to add any User-Agent header to the request:
using HtmlAgilityPack;
using System;
using System.Net;
class Program
{
static void Main()
{
var doc = DownloadSource("https://videohive.net/item/inspired-slideshow/21544630");
Console.ReadKey();
}
public static HtmlDocument DownloadSource(string url)
{
try
{
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(DownloadString(url));
return doc;
}
catch (Exception e)
{
// exception handling here
}
return null;
}
static String DownloadString(String url)
{
// WebClient sends no User-Agent by default, so set one explicitly.
using (WebClient client = new WebClient())
{
client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x");
return client.DownloadString(url);
}
}
}
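As for the difference between the two methods: WebClient sends no User-Agent unless one is added, while HtmlWeb (as far as I know) sends a browser-like User-Agent by default, which would explain why web.Load kept working. The sketch below shows the same fix with HttpClient's request types instead of WebClient. That is a swapped-in API, not what the code above uses, and the helper name, URL and User-Agent string are placeholders. It only builds the request and does not send it.

```csharp
using System;
using System.Net.Http;

public static class UserAgentSketch
{
    // Hypothetical helper: builds a GET request that carries an explicit User-Agent header.
    public static HttpRequestMessage BuildRequest(string url, string userAgent)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        // ParseAdd validates the value and throws FormatException if it is malformed.
        request.Headers.UserAgent.ParseAdd(userAgent);
        return request;
    }

    public static void Main()
    {
        // Placeholder URL and User-Agent; nothing is sent over the network here.
        using (var request = BuildRequest("https://example.com/", "Mozilla/5.0 (compatible; ExampleClient/1.0)"))
        {
            Console.WriteLine(request.Headers.Contains("User-Agent"));
        }
    }
}
```

The resulting request can be passed to HttpClient.SendAsync, and the response body can then be handed to doc.LoadHtml exactly as before.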

Related

Get HttpHeaders from HttpRequestException?

I have a Web API. When the incoming request is not valid, the API sends back HttpStatusCode.BadRequest and also adds a CorrelationId to the response's HTTP headers. Something like below:
public class ValidateRequestAttribute : ActionFilterAttribute
{
public ValidateRequestAttribute()
{
}
public override void OnActionExecuting(ActionExecutingContext context)
{
if (context.ModelState.IsValid == false)
{
context.HttpContext.Response.StatusCode = (int)HttpStatusCode.BadRequest;
context.HttpContext.Response.Headers.Add("x-correlationid", "someid");
context.Result = new ContentResult()
{
Content = "bad request."
};
}
}
}
On the client side I'm using HttpClient to access the API. I am not sure how the client would retrieve the HttpStatusCode and the HTTP header here. Here is my client code:
public bool Process(string url)
{
bool result = false;
try
{
var content = Task.Run(async () => await _httpClient.GetStringAsync(url).ConfigureAwait(false)).Result;
}
catch (Exception ex)
{
if(ex is AggregateException)
{
var aggregateException = ex as AggregateException;
foreach(var innerException in aggregateException.InnerExceptions)
{
if (innerException is HttpRequestException)
{
var httpRequestException = innerException as HttpRequestException;
// how do i get StatusCode and HttpHeader values here??
}
}
}
}
return result;
}
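I have already gone through a related SO post, an MSDN article, and Stephen Cleary's article.
Even though it's recommended to go async all the way down, in this case the client and the API are disconnected from each other and the client is synchronous. Note that the client's Process method is a synchronous method.
Use GetAsync instead of GetStringAsync. GetStringAsync throws an HttpRequestException on a non-success status code and gives you no access to the response, whereas GetAsync returns the HttpResponseMessage with its status code and headers. Like this: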
public bool Process(string url)
{
var result = _httpClient.GetAsync(url).ConfigureAwait(false).GetAwaiter().GetResult();
if (result.StatusCode == HttpStatusCode.BadRequest)
{
IEnumerable<string> values;
if (result.Headers.TryGetValues("x-correlationid", out values))
{
// Should print out "someid"
Console.WriteLine(values.First());
}
}
return result.IsSuccessStatusCode;
}
Also note that .GetAwaiter().GetResult() is preferable to .Result because it rethrows the original exception instead of wrapping it in an AggregateException, which makes the code easier to work with.
If you want to read the response content as a string, just do:
var content = result.Content.ReadAsStringAsync().ConfigureAwait(false).GetAwaiter().GetResult();
If you want to make your code async, though, use the async/await keywords and drop the .GetAwaiter().GetResult() calls.
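For the async variant, a minimal sketch could look like the following. The stub handler is only a test double that stands in for the real API so the example runs without a server. The class and method names are made up for illustration, and the header name and value are the ones from the question.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Test double standing in for the real API: always answers 400 with a correlation id.
public sealed class BadRequestStubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.BadRequest)
        {
            Content = new StringContent("bad request.")
        };
        response.Headers.Add("x-correlationid", "someid");
        return Task.FromResult(response);
    }
}

public static class CorrelationSketch
{
    // Returns the correlation id of a 400 response, or null if the status
    // is anything else or the header is absent.
    public static async Task<string> GetCorrelationIdOnBadRequestAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url).ConfigureAwait(false))
        {
            if (response.StatusCode != HttpStatusCode.BadRequest)
                return null;
            return response.Headers.TryGetValues("x-correlationid", out var values)
                ? values.FirstOrDefault()
                : null;
        }
    }

    public static async Task Main()
    {
        using (var client = new HttpClient(new BadRequestStubHandler()))
        {
            string id = await GetCorrelationIdOnBadRequestAsync(client, "http://localhost/api/values");
            Console.WriteLine(id); // prints "someid"
        }
    }
}
```

In real code the HttpClient would be constructed without the stub handler. No exception handling is needed to read the status code, because GetAsync does not throw on a 400 response.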

Connect to Server using xamarin forms

I'm developing a Xamarin cross-platform application. I'm trying to connect to a server (http://test.net/login/clientlogin), and I need to send these fields (password = "xyz"; platform = iphone; (useremail) = "test#test.com";) along with the request, so that the server checks these parameters and returns XML. But I don't know how to add these fields to the request.
When I open the above URL (http://*****/login/clientlogin) I get a login screen with username, password and platform text fields.
Thanks in advance!
This should get you started, presuming you are adding the values as headers in the request:
public class TestClient
{
HttpClient client;
public TestClient(){
this.client = new HttpClient ();
}
public async Task AddHeadersAndGet(){
client.DefaultRequestHeaders.Add ("username", "whatevertheusernameis");
await this.GetAsync<WhateverObjectTypeYouAreReceiving> ("theurloftheservice");
}
public async Task<T> GetAsync<T>(string address){
HttpResponseMessage response = await client.GetAsync (address);
if (response.IsSuccessStatusCode) {
try {
var responseString = await response.Content.ReadAsStringAsync ();
// Serializer is a placeholder for whatever deserializer you use.
return Serializer.DeserializeObject<T> (responseString);
} catch (Exception) {
// Deserialization failed; fall through and return default(T).
}
}
// Non-success status code or unreadable body.
return default(T);
}
}
The key line for you is:
client.DefaultRequestHeaders.Add ("username", "whatevertheusernameis");
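If the server expects the fields as form data rather than as headers (a guess, based on the question mentioning a login form with text fields), a form-encoded POST may be closer to what is needed. The following is a minimal sketch under that assumption. The field names come from the question, while the class name, method names and sample values are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginFormSketch
{
    // Builds the form body. A list of pairs keeps the field order deterministic.
    public static FormUrlEncodedContent BuildLoginForm(string email, string password, string platform)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("useremail", email),
            new KeyValuePair<string, string>("password", password),
            new KeyValuePair<string, string>("platform", platform),
        };
        return new FormUrlEncodedContent(fields);
    }

    // Posts the form and returns the raw response body (XML, according to the question).
    public static async Task<string> LoginAsync(
        HttpClient client, string url, string email, string password, string platform)
    {
        using (var form = BuildLoginForm(email, password, platform))
        using (var response = await client.PostAsync(url, form).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }

    public static async Task Main()
    {
        // Shows the encoded body without contacting any server.
        using (var form = BuildLoginForm("user@example.com", "xyz", "iphone"))
        {
            Console.WriteLine(await form.ReadAsStringAsync());
        }
    }
}
```

FormUrlEncodedContent percent-encodes the values and sets the Content-Type header to application/x-www-form-urlencoded, so neither has to be done by hand.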

Xamarin http webservice issue

I'm trying to call a web service over HTTP. The issue is that when we post a wrong username and password, the login service throws an exception and the async call can't return any value.
A code snippet would help in diagnosing the problem.
However, a try/catch should let you catch the exception, prevent the application from crashing, and handle the error accordingly.
As seen in my sample code below, I cater for incorrect details and connectivity problems. I perform the async HTTP request, then parse the XML into my model, handling the exceptions accordingly:
try
{
var response = await WebRequestHelper.MakeAsyncRequest(url, content);
if (response.IsSuccessStatusCode == true)
{
Debug.WriteLine("Login Successful " + "result.IsSuccessStatusCode " + response.IsSuccessStatusCode);
var result = await response.Content.ReadAsStringAsync();
result = result.Replace("<xml>", "<LoginResult>").Replace("</xml>", "</LoginResult>");
loginResult = XMLHelper.FromXml<LoginResult>(result);
if (loginResult != null)
{
login.Type = ResultType.OK;
login.Result = loginResult;
}
else
{
login.Type = ResultType.WrongDetails;
}
}
else
{
Debug.WriteLine("Login Failed" + "result.IsSuccessStatusCode" + response.IsSuccessStatusCode);
login.Type = ResultType.WrongDetails;
}
}
catch (Exception ex)
{
login.Type = ResultType.ConnectivityProblem;
}
Web Request
public static async Task<HttpResponseMessage> MakeAsyncRequest(string url, Dictionary<string, string> content)
{
var httpClient = new HttpClient();
httpClient.Timeout = new TimeSpan(0, 5, 0);
httpClient.BaseAddress = new Uri(url);
// No Content-Type header is needed here: FormUrlEncodedContent sets application/x-www-form-urlencoded on the request body itself.
if (content == null)
{
content = new Dictionary<string, string>();
}
var encodedContent = new FormUrlEncodedContent(content);
var result = await httpClient.PostAsync(httpClient.BaseAddress, encodedContent);
return result;
}
I would recommend wrapping the response in a generic ServiceResponse where you can store the exceptions. Awaited calls can be placed inside try/catch blocks, so the standard exception-handling process can be followed.
E.g.:
public async Task<ServiceResponse<T>> PostAsync<T>(String address, object dto){
var content = Serializer.SerializeObject (dto);
var response = await client.PostAsync (
address,
new StringContent (content));
if (response.IsSuccessStatusCode) {
try {
var responseString = await response.Content.ReadAsStringAsync ();
return new ServiceResponse<T> (Serializer.DeserializeObject<T> (responseString),
response.StatusCode);
} catch (Exception ex) {
return new ServiceResponse<T> (response.StatusCode, ex);
}
} else {
return new ServiceResponse<T> (response.StatusCode);
}
}
With the ServiceResponse defined as:
public class ServiceResponse<T>
{
public HttpStatusCode StatusCode { get; set;}
public T Value { get; set;}
public String Content { get; set;}
public Exception Error {get;set;}
public ServiceResponse(T value, HttpStatusCode httpStatusCode){
this.Value = value;
this.StatusCode = httpStatusCode;
}
public ServiceResponse(HttpStatusCode httpStatusCode, Exception error = null){
this.StatusCode = httpStatusCode;
this.Error = error;
}
}
This will give you a clean way of managing all your HTTP responses and any errors that may occur.

MVC 3 GET Webservice and Response

I'm attempting to build a GET web service in which website 1 initiates a GET request to website 2, and website 2 responds with a list of objects. I'm using Json.NET to serialize and deserialize the list of objects.
I've put together a POST web service with the help of this question: WebService ASP.NET MVC 3 Send and Receive
But so far I've been unsuccessful at adapting that example to my new requirement.
Here is what I have so far on website 1:
public static List<ScientificFocusArea> ScientificFocusAreas()
{
string apiURL = "http://localhost:50328/Api/GetAPI";
//Make the GET request
ServicePointManager.ServerCertificateValidationCallback = (sender, certificate, chain, errors) => true;
//var bytes = Encoding.Default.GetBytes(body);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(apiURL);
Stream stream = null;
try
{
request.KeepAlive = false;
request.ContentType = "application/x-www-form-urlencoded";
request.Timeout = -1;
request.Method = "GET";
}
finally
{
if (stream != null)
{
stream.Flush();
stream.Close();
}
}
List<ScientificFocusArea> listSFA = WebService.GetResponse_ScientificFocusArea(request);
return listSFA;
}
public static List<ScientificFocusArea> GetResponse_ScientificFocusArea(HttpWebRequest request)
{
List<ScientificFocusArea> listSFA = new List<ScientificFocusArea>();
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (Stream responseStream = response.GetResponseStream())
{
if (response.StatusCode != HttpStatusCode.OK && response.StatusCode != HttpStatusCode.Created)
{
throw new HttpException((int)response.StatusCode, response.StatusDescription);
}
var end = string.Empty;
using (StreamReader reader = new StreamReader(responseStream))
{
end = reader.ReadToEnd();
reader.Close();
listSFA = JsonConvert.DeserializeObject<List<ScientificFocusArea>>(end);
}
response.Close();
}
}
return listSFA;
}
Then on website 2:
public class GetAPIController : Controller
{
//
// GET: /Api/GetAPI/
[AcceptVerbs(HttpVerbs.Get)]
public ActionResult GetScientificFocusAreas()
{
//Get list of SFAs
List<ScientificFocusArea> ListSFA = CreateList.ScientificFocusArea();
string json = JsonConvert.SerializeObject(ListSFA, Formatting.Indented);
//Send the serialized object.
return Json(json);
}
}
Also, on website 2, I've registered this route for the incoming request:
context.MapRoute(
"GetScientificFocusAreas",
"Api/GetAPI/",
new
{
controller = "GetAPI",
action = "GetScientificFocusAreas",
id = UrlParameter.Optional
}
);
I'm currently getting the error: The remote server returned an error: (404) Not Found.
Any help would be greatly appreciated.
The problem looks like a routing issue. I would start with RouteDebugger, a tool that shows which routes your URL is hitting.
The code I use for an HTTP GET is a bit different from what you have above. It's included below.
public T Get<T>(string url)
{
try
{
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
using (Stream responseStream = response.GetResponseStream())
{
if (response.StatusCode != HttpStatusCode.OK && response.StatusCode != HttpStatusCode.Created)
{
throw new HttpException((int)response.StatusCode, response.StatusDescription);
}
var end = string.Empty;
using (StreamReader reader = new StreamReader(responseStream))
{
end = reader.ReadToEnd();
reader.Close();
}
responseStream.Close();
response.Close();
JsonSerializer serializer = new JsonSerializer();
serializer.Binder = new DefaultSerializationBinder();
JsonReader jsonReader = new JsonTextReader(new StringReader(end));
T deserialize = serializer.Deserialize<T>(jsonReader);
return deserialize;
}
}
catch (Exception ex)
{
throw new ApiException(string.Format("An error occurred while trying to contact the API. URL: {0}", url), ex);
}
}
The other issue I see is in the GetScientificFocusAreas() method. The second line of the method already converts the objects to JSON, which is fine, but the last line passes that JSON string into the Json() method, which serializes it a second time. When using the JSON.Net library, return the string with the Content() method instead of Json() and set the content type to application/json.
The reason for using an external JSON converter rather than the built-in one is simply that the built-in converter has a few known issues. JSON.Net has been around for years and is solid.
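The double-serialization effect is easy to reproduce outside MVC. The sketch below uses System.Text.Json purely as a stand-in so that it runs without extra packages. It is not the serializer that Json() or JSON.Net uses, and the class name and sample values are made up, but the effect of serializing an already-serialized string is the same in kind.

```csharp
using System;
using System.Text.Json;

public static class DoubleEncodingSketch
{
    // Serializes the list once: the result is a JSON array.
    public static string SerializeOnce(string[] names) => JsonSerializer.Serialize(names);

    // Mimics handing an already-serialized string to a second serializer:
    // the JSON text is encoded again, this time as a single JSON string literal.
    public static string SerializeTwice(string[] names) => JsonSerializer.Serialize(SerializeOnce(names));

    public static void Main()
    {
        var names = new[] { "Genomics", "Imaging" };
        Console.WriteLine(SerializeOnce(names));   // a JSON array
        Console.WriteLine(SerializeTwice(names));  // a quoted, escaped string wrapping that array
    }
}
```

A client that deserializes the second form into a list fails, because what it receives is a string rather than an array. Returning the already-serialized text as raw content avoids the second pass.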

Downloading an mp3 fails with "The remote server returned an error: NotFound."

When I download two similar links, such as the first link (the one used in the code below) and http://files.sparklingclient.com/099_2010.07.09_WP7_Phones_In_The_Wild.mp3, both can be downloaded by IE. But when I download them on WP7, the latter downloads fine while the first shows the error "The remote server returned an error: NotFound."
I don't know why. Is the URL not suitable for WP7?
private void Button_Click(object sender, RoutedEventArgs e)
{
stringUri = "http://upload16.music.qzone.soso.com/30828161.mp3";
//stringUri = "http://files.sparklingclient.com/079_2009.08.20_ElementBinding.mp3";
Uri uri = new Uri(stringUri, UriKind.Absolute);
GetMusic(uri);
}
private void GetMusic(Uri uri)
{
request = WebRequest.Create(uri) as HttpWebRequest;
request.Method = "Post";
request.ContentType = "application/x-www-form-urlencoded;charset=UTF-8";
string header= request.Accept;
request.BeginGetResponse(new AsyncCallback(GetAsynResult),request);
}
void GetAsynResult(IAsyncResult result)
{
HttpWebResponse reponse = request.EndGetResponse(result) as HttpWebResponse;
if (reponse.StatusCode == HttpStatusCode.OK)
{
Stream stream=reponse.GetResponseStream();
SaveMusic(stream, "music");
ReadMusic("music");
Deployment.Current.Dispatcher.BeginInvoke(
() =>
{
me.AutoPlay = true;
me.Volume = 100;
songStream.Position = 0;
me.SetSource(songStream);
me.Play();
});
}
}
protected void SaveMusic(Stream stream,string name)
{
IsolatedStorageFile fileStorage = IsolatedStorageFile.GetUserStoreForApplication();
if (!fileStorage.DirectoryExists("Source/Music"))
{
fileStorage.CreateDirectory("Source/Music");
}
using (IsolatedStorageFileStream fileStream = IsolatedStorageFile.GetUserStoreForApplication().OpenFile("Source\\Music\\" + name + ".mp3", FileMode.Create))
{
byte[] bytes = new byte[stream.Length];
stream.Read(bytes, 0, bytes.Length);
fileStream.Write(bytes, 0, bytes.Length);
fileStream.Flush();
}
}
protected void ReadMusic(string name)
{
using (IsolatedStorageFile fileStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
songStream = null;
songStream = new IsolatedStorageFileStream("Source\\Music\\" + name + ".mp3", FileMode.Open, fileStorage);
}
}
Please try to change
request.Method = "Post"
to
request.Method = "Get"
If you are running into this problem on the emulator, have you tried running Fiddler? It will intercept the HTTP requests and you can see if the call being made to the server is the one you expect.
Remember to close/reopen the emulator after you start Fiddler so that it will pick up the proxy.
The NotFound response can also occur with bad SSL certificates. That doesn't appear to be related to your problem, but something to keep in mind.

Resources