404 Not Found when using HtmlUnit

I have the following code:
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.myland.co.il/%D7%9E%D7%97%D7%A9%D7%91-%D7%94%D7%A9%D7%A7%D7%99%D7%94");
The code fails with com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for http://www.myland.co.il/Scripts/swfobject_modified.js
I do see the HTML page I am interested in in the console output. Is there a way to suppress the exception and get an HtmlPage after all? The page loads correctly in a real browser.

Yes, you can use setThrowExceptionOnFailingStatusCode to ignore failing status codes, something like:
WebClient webClient = new WebClient();
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = webClient.getPage("http://www.myland.co.il/%D7%9E%D7%97%D7%A9%D7%91-%D7%94%D7%A9%D7%A7%D7%99%D7%94");
The default is true, which is what produces the error you're describing.
EDIT: Just in case you're running an old version: in HtmlUnit versions earlier than 2.11, setThrowExceptionOnFailingStatusCode is called on the WebClient itself rather than on the options object returned by getOptions(). In 2.11 or later, use getOptions() as shown above.

Related

WebRequest to Shoutcast throws "invalid or unrecognized response" exception

I'm trying to read metadata from a SHOUTcast stream using ASP.NET Web API on .NET Core 2.1. All I need is the headers, not the audio data.
I found out that SHOUTcast servers v2 and later provide a stats XML page. For compatibility reasons I need this to work with v1 too, and /7.html does not give the title and genre.
Here is the relevant code:
HttpWebRequest request = null;
HttpWebResponse response = null;
request = (HttpWebRequest)WebRequest.Create(server);
request.Headers.Clear();
request.Headers.Add("GET", "/ HTTP/1.0");
// needed to receive metadata information
request.Headers.Add("Icy-MetaData", "1");
request.UserAgent = "WinampMPEG/5.09";
response = (HttpWebResponse)request.GetResponse();
info.Title=response.Headers["icy-name"];
info.Genre=response.Headers["icy-genre"];
When I run this code on IIS Express, or publish and run on IIS, I get this error:
An unhandled exception occurred while processing the request.
HttpRequestException: The server returned an invalid or unrecognized response.
System.Net.Http.HttpConnection.ThrowInvalidHttpResponse()
WebException: The server returned an invalid or unrecognized response.
System.Net.HttpWebRequest.GetResponse()
I have also tried WebClient and StreamReader, but the issue is the same.
However, the same code works fine in a console application.
How can I get this to work through a Web API on IIS?
I had the same issue. I used HttpClient and PostAsync to fix it.
Note that I initially ran my code on .NET Framework rather than .NET Core, and it worked fine there. So I think HttpWebResponse is not compatible with .NET Core in some cases.
If using HttpClient still does not fix your issue, have a look at the links below. I hope it helps.
https://github.com/dotnet/corefx/issues/14897
https://github.com/dotnet/corefx/issues/30040

DownloadString works for HTTP but not for HTTPS

I have a WebClient call in my SSIS package that calls an API to get a JSON response:
using (var mySSISWebClient = new System.Net.WebClient())
{
mySSISWebClient.Headers[HttpRequestHeader.Accept] = "application/json";
var result = mySSISWebClient.DownloadString(jsonURLwithDate);
}
When I change the URL to use HTTPS, the package fails at the DownloadString call with the following error:
Download failed: The underlying connection was closed: An unexpected error occurred on a send.
It works fine when the URL uses HTTP. I am working on it in SSDT 2010. Please help me resolve this.
Thanks
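I think you might need to enable TLS 1.2 before making the call, by adding the following line:
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072;
3072 is the numeric value of SecurityProtocolType.Tls12. The cast lets the line compile even when the project targets an older .NET Framework (such as 4.0) whose enum has no Tls12 member. TLS 1.2 support itself still requires .NET Framework 4.5 or later to be installed.
This is based on the following article:
https://blogs.perficient.com/microsoft/2016/04/tsl-1-2-and-net-support/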

How to forward to an external url?

My Spring-based app is running at http://localhost. Another app is running at http://localhost:88. I need to achieve the following: when a user opens http://localhost/page, the content of http://localhost:88/content should be shown.
I assumed that I should use forwarding, as shown below:
@RequestMapping("/page")
public String handleUriPage() {
    return "forward:http://localhost:88/content";
}
But it seems that forwarding to an external URL doesn't work.
How can I achieve this behaviour with Spring?
Firstly, you specify that you want to show the content of "http://localhost:88/content" but you actually forward to "http://localhost:88" in your method.
Nevertheless, forward works with relative URLs only (served by other controllers of the same application), so you should use 'redirect:' instead.
Forward happens entirely on the server side: the Servlet container forwards the same request to the target URL, so the URL won't change in the address bar.
Redirect, on the other hand, makes the server respond with a 302 status and a Location header set to the new URL. The browser then makes a separate request to that URL, so the URL in the address bar changes.
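Here is a minimal, framework-free sketch of the redirect case (Java 11+). It assumes the JDK's built-in com.sun.net.httpserver server as a stand-in for the servlet container.
- Its /page handler does roughly what returning "redirect:http://localhost:88/content" makes Spring do: it answers 302 with a Location header and no body.
- A java.net.http.HttpClient configured not to follow redirects then shows the raw response a browser would act on.
- The class and method names are hypothetical, chosen only for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical demo class: shows what a client receives from a redirect.
class RedirectDemo {

    // Starts a throwaway server on a free local port. Its /page handler mimics
    // "redirect:": status 302, a Location header, and no response body.
    static HttpServer startServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/page", exchange -> {
            exchange.getResponseHeaders().set("Location", "http://localhost:88/content");
            exchange.sendResponseHeaders(302, -1); // -1 means "no body"
            exchange.close();
        });
        server.start();
        return server;
    }

    // Requests /page WITHOUT following redirects, so the raw 302 is visible.
    static HttpResponse<Void> fetchPage(int port) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/page")).build();
        return client.send(request, HttpResponse.BodyHandlers.discarding());
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            HttpResponse<Void> response = fetchPage(server.getAddress().getPort());
            System.out.println(response.statusCode());
            System.out.println(response.headers().firstValue("Location").orElse("(none)"));
        } finally {
            server.stop(0);
        }
    }
}
```

Running main prints 302 followed by http://localhost:88/content. A browser would then issue a second request to that Location, which is why the address bar changes.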
UPDATE: To return the content of the external page as if it were an internal one, I would write a separate controller method that requests the URL and simply returns its content. Something like the following:
@RequestMapping(value = "/external", produces = MediaType.TEXT_HTML_VALUE)
public void getExternalPage(@RequestParam("url") String url, HttpServletResponse response) throws IOException {
    // Apache HttpComponents 4.x client, closed automatically when the method returns
    try (CloseableHttpClient client = HttpClients.createDefault()) {
        HttpResponse externalResponse = client.execute(new HttpGet(url));
        response.setContentType("text/html");
        // Guava: copy the external page's body straight into our own response
        ByteStreams.copy(externalResponse.getEntity().getContent(), response.getOutputStream());
    }
}
Of course, there are many possible solutions. Here I used Apache HttpComponents HttpClient to make the request, and Google's Guava to copy the response from that request into the resulting one.
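The same relay can also be sketched with the JDK alone (Java 11+), without the Apache HttpClient and Guava dependencies.
- java.net.http.HttpClient makes the outgoing request.
- InputStream.transferTo plays the role of ByteStreams.copy.
- So that the sketch runs without Spring, the JDK's com.sun.net.httpserver server stands in for both applications.
- The /page and /content paths mirror the question. The class and method names, and the backend's HTML, are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one server fetches another server's page and relays it.
class ProxyDemo {
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();

    // Stands in for the app on port 88: serves a fixed HTML fragment at /content.
    static HttpServer startBackend() throws Exception {
        HttpServer backend = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        backend.createContext("/content", exchange -> {
            byte[] body = "<p>external content</p>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        backend.start();
        return backend;
    }

    // Front app: /page fetches targetUrl server-side and copies the body into its own response.
    static HttpServer startFrontend(String targetUrl) throws Exception {
        HttpServer front = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        front.createContext("/page", exchange -> {
            HttpRequest request = HttpRequest.newBuilder(URI.create(targetUrl)).build();
            try {
                HttpResponse<InputStream> upstream =
                        CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(upstream.statusCode(), 0); // 0 = chunked, length unknown
                try (InputStream in = upstream.body(); OutputStream out = exchange.getResponseBody()) {
                    in.transferTo(out); // JDK counterpart of Guava's ByteStreams.copy
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.sendResponseHeaders(502, -1);
                exchange.close();
            }
        });
        front.start();
        return front;
    }

    // Plain GET returning the response body as a string.
    static String get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer backend = startBackend();
        HttpServer front = startFrontend(
                "http://127.0.0.1:" + backend.getAddress().getPort() + "/content");
        try {
            System.out.println(get("http://127.0.0.1:" + front.getAddress().getPort() + "/page"));
        } finally {
            front.stop(0);
            backend.stop(0);
        }
    }
}
```

Fetching /page on the front server returns the backend's HTML, while the address the client asked for stays the same. In a Spring controller, the handler would write to HttpServletResponse.getOutputStream() instead of HttpExchange.getResponseBody().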
With the /external method in place, the return statement of your original handler would change to the following:
return "forward:/external?url=http%3A%2F%2Flocalhost%3A88%2Fcontent";
Note that the URL passed as a parameter must be URL-encoded.
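The encoded value in that return statement does not need to be written by hand. It can be built with java.net.URLEncoder, as in this minimal sketch. The forwardTo helper is hypothetical, and URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for building the forward target.
class EncodeTargetUrl {

    // Percent-encodes the target so ':' and '/' survive as a single query-parameter value.
    static String forwardTo(String targetUrl) {
        return "forward:/external?url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(forwardTo("http://localhost:88/content"));
        // prints forward:/external?url=http%3A%2F%2Flocalhost%3A88%2Fcontent
    }
}
```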

HtmlUnit does not process JSONP request

I am trying to crawl my GWT app with HtmlUnit, but for a certain page the desired content is not returned. The GWT page contains a dynamically added JavaScript that makes a JSONP request to a GAE server. I already debugged the server code and the breakpoint is hit, but by that time the HtmlUnit code has already finished and the returned content is incomplete.
I have tried almost all of the solutions suggested on Stack Overflow, without any success.
Here is the JSONP request:
http://30.tripstorekrabi.appspot.com/activity?&callback=__gwt_jsonp__.P0.onSuccess
On other pages I use exactly the same kind of call, and there it works fine.
Can anyone help me?
I found a workaround in my GWT code: the JSONP request is now executed in a deferred scheduled command:
Scheduler.get().scheduleDeferred(new ScheduledCommand() {
    @Override
    public void execute() {
        activityRegistry.loadActivities(new AsyncCallback<Result>() {
            @Override public void onFailure(Throwable caught) { /* error handling as before */ }
            @Override public void onSuccess(Result result) { /* render the activities as before */ }
        });
    }
});
Now HtmlUnit processes the JavaScript function and the desired content is shown.

Using WebClient.UploadStringAsync with GET data

I'm trying to use the WebClient.UploadStringAsync method to send some data to a server. It works fine when I send POST data, but when I use GET, it throws the error "An exception occurred during a WebClient request."
Here is my code:
WebClient client = new WebClient();
String data = "param1=value1&param2=value2";
client.UploadStringAsync(new Uri("http://somesite.com"), "GET", data);
Any idea what's going wrong?
Don't use UploadStringAsync for GET. DownloadStringAsync is designed specifically for that.
Don't use WebClient, because it is bound to the UI thread. Use HttpWebRequest instead.
Sending data in the body of a GET request breaks convention; GET parameters belong in the query string.
You might also want to take a look at HttpClient, which you can install via NuGet.
