Using WebClient with HtmlAgilityPack on WP7 to get HTML generated from JavaScript - windows-phone-7

I want to get the show-time schedule from
http://www.21cineplex.com/playnow/sherlock-holmes-a-game-of-shadows,2709.htm
First,
I tried using WebClient with HtmlAgilityPack and got as far as the "table-theater" table, but apparently the HTML is generated by JavaScript, so the table's innerHTML is empty.
public void LoadMovieShowTime(string MovieLink)
{
    WebClient MovieShowTimeclient = new WebClient();
    // Subscribe before starting the download; this is the safe ordering.
    MovieShowTimeclient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(MovieShowTimeclient_DownloadStringCompleted);
    MovieShowTimeclient.DownloadStringAsync(new Uri(MovieLink));
}
void MovieShowTimeclient_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(e.Result);
    var node = doc.DocumentNode.Descendants("div").First()
        .Elements("div").Skip(1).First()
        .Elements("div").Skip(1).First()
        .Element("div")
        // GetAttributeValue avoids a NullReferenceException on tables that have no class attribute.
        .Elements("table").FirstOrDefault(table => table.GetAttributeValue("class", "") == "table-theater");
}
Is it possible to get the data using WebClient on Windows Phone? Or is there another way to get it?
Second,
I tried to get the schedule from the mobile site, which is
http://m.21cineplex.com/gui.list_schedule?sid=&movie_id=11SHGO&find_by=1&order=1
but the response asks me to enable cookies. I'm new to this. I found that WebClient can be extended by overriding the web request's cookie handling, but I can't find any reference on how to use it.
Thanks for any reply and help :)

Just because the table is generated in JavaScript does not mean the WebBrowser control will not render it. Make sure that IsScriptEnabled is set to true, so that the JavaScript that renders the table is executed. You can then 'scrape' the results.

Related

Using HtmlUnit, is there a way to pause execution of Javascript, then resume?

While testing with HtmlUnit, I've come across a case where, on page load, it would be useful NOT to execute the JavaScript automatically, and instead have it wait until I tell it to start executing.
My specific use case is testing a page whose JavaScript runs some checks and then does a location replace to send the user on to another page. I want to check some headers that I'm returning for testing/validation, and then let the JS execute as usual.
My current thought is to pass a flag to the page when testing, which stops the JS from running automatically until I call a JS function from within the Java code via webClient.getJavaScriptEngine().execute().
Although this does not let you pause JavaScript before it is invoked, it may be worthwhile to use the WebConnectionWrapper class to inspect or modify the response data or outgoing requests, which effectively gives you a chance to execute your own code before the JavaScript is invoked.
An example usage of this is as follows:
try (final WebClient webClient = new WebClient()) {
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // set more options
    // create a WebConnectionWrapper with an (subclassed) getResponse() impl
    new WebConnectionWrapper(webClient) {
        public WebResponse getResponse(WebRequest request) throws IOException {
            WebResponse response = super.getResponse(request);
            if (request.getUrl().toExternalForm().contains("my_url")) {
                String content = response.getContentAsString();
                // intercept and/or change content
                WebResponseData data = new WebResponseData(content.getBytes(),
                        response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
                response = new WebResponse(data, request, response.getLoadTime());
            }
            return response;
        }
    };
    // use the client as usual
    HtmlPage page = webClient.getPage(uri);
}
The above code is from the official documentation here:
How to modify the outgoing request or incoming response?
The getResponse() method that you override is called for each request, and it lets you modify the WebResponse object before it is passed back to WebClient for further processing.
Sorry, but at the moment (version 2.43.0) we have no such option. Feel free to open an issue on GitHub for this.
I guess other test tools might also benefit from this function.

How get data from javascript using HtmlUnit?

How do I get data from JavaScript using HtmlUnit?
Title: total shoots
screen html code
public static void getElements() {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("some URL");
        final HtmlDivision div = page.getHtmlElementById("in-game-stats");
        System.out.println(div.getTextContent());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
what else ?
First of all, you have to find the script element. Because your script tag has no id attribute, something like 'page.getHtmlElementById' is not the right way to do it. HtmlUnit offers many different ways to find elements. As a starting point, have a look at the documentation (http://htmlunit.sourceforge.net/gettingStarted.html).
The next step is to get the JavaScript from the HtmlScript. If the script code is embedded inside the script tag, you can simply use asXml().
There is no direct method for processing JavaScript objects, but you can extract the tag's content and parse it yourself.
Use (HtmlElement).asText();
to extract data from the div tag.
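The "parse it yourself" step could look roughly like the following. This is only a minimal sketch using the Java standard library, not HtmlUnit API: it assumes the script text has already been obtained as a string (for example from asXml()) and that the value of interest is assigned to a plain integer. The class name, the variable name totalShots, and the sample script are hypothetical, since the page's actual markup is not shown in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScriptValueExtractor {

    // Looks for "<name> = <integer>" or "<name>: <integer>" in raw script text.
    static Optional<Integer> findInt(String scriptText, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\s*[=:]\\s*(-?\\d+)");
        Matcher m = p.matcher(scriptText);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical script content; a real page will look different.
        String script = "var stats = { totalShots: 14, onTarget: 6 };";
        System.out.println(findInt(script, "totalShots"));
    }
}
```

A regular expression is enough for a single flat value like this. If the script embeds a larger object literal, handing the extracted text to a JSON parser is likely to be more robust.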

Modify User-Agent for Windows Phone WebBrowser Control

We have a WebBrowser embedded in our Windows Phone 7x application. This WebBrowser is pointed at our web servers. We need to be able to differentiate between a request coming from the app and a request coming from the native browser (or a WebBrowser embedded in another app, for instance). To do this we'd like to modify the User-Agent of all HTTP requests coming from said WebBrowser.
However, I can't find a way to do this. My initial thought was simply to override the Navigate functions adding "additionalHeaders." Unfortunately the WebBrowser class is sealed, so that option wasn't an option at all. I've searched high and low for a property or handler that's exposed that I might be able to take advantage of to no avail.
So, in short, is there a way to modify the User-Agent for a WebBrowser for all outbound HTTP requests?
I know this question is old, but in case this is of use to anyone, you could always use this for the WebBrowser's navigating event:
void wb_Navigating(object sender, NavigatingEventArgs e)
{
    if (!e.Uri.ToString().Contains("!!!"))
    {
        e.Cancel = true;
        string url = e.Uri.ToString();
        if (url.Contains("?"))
            url = url + "&!!!";
        else
            url = url + "?!!!";
        wb.Navigate(new Uri(url), null, "User-Agent: " + "Your User Agent");
    }
}
You just add "!!!" to all the urls for navigations that have your custom user agent. If the URL doesn't contain "!!!", it is a request from a clicked link and the WebBrowser cancels the navigation, and re-navigates with your custom user agent and "!!!" in the query string.
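The decision logic described above can be looked at in isolation. The following is a small Java sketch of just the URL-tagging part, independent of the WebBrowser control; the class and method names are made up for illustration and are not part of any Windows Phone API.

```java
final class MarkerUrl {

    static final String MARKER = "!!!";

    // True when the URL has not been tagged yet, i.e. the navigation did not
    // come from our own re-navigation and still needs the custom header.
    static boolean needsRenavigation(String url) {
        return !url.contains(MARKER);
    }

    // Appends the marker as a query-string component.
    static String tag(String url) {
        return url + (url.contains("?") ? "&" : "?") + MARKER;
    }

    public static void main(String[] args) {
        System.out.println(tag("http://example.com/page"));
        System.out.println(tag("http://example.com/page?id=7"));
        System.out.println(needsRenavigation(tag("http://example.com/page")));
    }
}
```

Note that, like the handler above, this sketch ignores URL fragments: for a URL containing "#", the marker would end up inside the fragment rather than in the query string.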
I tried a similar approach to msbg, where you store the URL in memory to avoid double checking it, and avoid modifying it with !!!. However, that approach doesn't preserve POST data, so it won't help me.
List<string> recentlyRequestedUrls = new List<string>();
void wb_Navigating(object sender, NavigatingEventArgs e)
{
    if (!recentlyRequestedUrls.Contains(e.Uri.ToString()))
    {
        // new request, reinitiate it ourselves and save that we did to avoid infinite loop.
        e.Cancel = true;
        string url = e.Uri.ToString();
        recentlyRequestedUrls.Add(url);
        webBrowser1.Navigate(new Uri(url), null, "User-Agent: Your_User_Agent");
    }
}
Set the user agent through additional headers when invoking the Navigate method. Details here.

Failing to extract content by xpath using HtmlUnit

I'm trying to extract the title from this Maltese news page
http://www.maltarightnow.com/Default.asp?module=news&at=Inawgurat+%26%23289%3Bnien+%26%23289%3Bdid+f%27Marsalforn&t=a&aid=99839603&cid=19
using the following XPath
html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1
(Ain't pretty but this Xpath was generated by Chrome and makes sense since there's a lack of element Ids).
I'm extracting the title programatically using HTMLUnit in Java. Here's the code. I've extracted news content and article date using the same code (obviously with a different xpath).
public static void main (String[] args) {
    WebClient webClient = new WebClient();
    HtmlPage page = null;
    try {
        page = webClient.getPage("http://www.maltarightnow.com/?module=news&at=Inawgurat+%26%23289%3Bnien+%26%23289%3Bdid+f%27Marsalforn&t=a&aid=99839603&cid=19");
    } catch (FailingHttpStatusCodeException | IOException e) {
        // Do not swallow the error silently: page would stay null and fail below.
        e.printStackTrace();
        return;
    }
    String text = ((DomElement)page.getFirstByXPath("html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1")).asText();
    System.out.println(text);
}
However it's giving a null pointer for the mentioned xpath in
((DomElement)page.getFirstByXPath("html/body/table/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[1]/td[1]/table/tbody/tr/td/table/tbody/tr/td[2]/table[3]/tbody/tr[1]/td/h1")).asText();
The DomElement is not being found and I'm sure it's there, Chrome created the XPath after all.
What could be the cause of this?
Thanks in advance
It is not that easy. You should:
1. Look at the markup HtmlUnit actually builds, using HtmlPage.asXml().
2. Correct your XPath so that it matches whatever HtmlUnit output in the previous step.
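One possible reason for this kind of mismatch (an assumption here, not something confirmed for this particular page) is that a browser's developer tools report the DOM after implied elements such as tbody have been inserted, while the tree another parser builds may differ. The sketch below uses only the JDK's XML and XPath classes rather than HtmlUnit, and a made-up one-row table, to show how the same path can match one tree and miss another. The class name and the markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathMismatchDemo {

    // Parses well-formed markup and returns the text of the first node
    // matching the XPath, or null when nothing matches.
    static String firstText(String markup, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node node = (Node) xp.evaluate(xpath, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup as a parser might see it: no tbody element.
        String asParsed = "<html><body><table><tr><td><h1>Title</h1></td></tr></table></body></html>";
        String browserStylePath = "/html/body/table/tbody/tr/td/h1"; // as copied from a browser
        String correctedPath = "/html/body/table/tr/td/h1";          // adjusted to the real tree
        System.out.println(firstText(asParsed, browserStylePath));
        System.out.println(firstText(asParsed, correctedPath));
    }
}
```

The null check in firstText is the part worth carrying over: casting the result of getFirstByXPath and calling a method on it directly is what turns "no match" into a NullPointerException.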

BackgroundTransferRequest WP7

I am using the Background Transfer to upload Photographs to my Web Service. As the Photograph uploads can consume significant time and memory, I thought it might be a nice idea to use the background transfer request to accomplish this. After the photo is uploaded, I want to obtain the Id of the uploaded photo and then use it for post-processing. However, it turns out I can't do that in a background transfer request.
Per my understanding, Background Transfer works using the following logic ONLY:
You have to obtain the file you want to upload and then save/copy it to your app's Isolated Storage under the folder: shared/transfers. This is extremely important. Apparently, using file in a different location didn't work for me. Maybe it isn't the shared/transfers as much as it is a 'relative' path. But I would stick to the same conventions.
After you have saved the file in that location, your background request can be created based on that. It doesn't look like you can pass POST CONTENT other than the file contents, so any other parameters like file name, mime type etc. will need to be passed as QUERY String parameters only. I can understand this, but it would've been nice if I could pass both as POST Content. I don't think HTTP has a limitation on how this works.
Here is some code for creating a request using Hammock:
string url = App.ZineServiceAuthority + "articles/save-blob?ContainerName={0}&MimeType={1}&ZineId={2}&Notes={3}&IsPrivate={4}&FileName={5}";
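// Escape free-text values so that spaces and special characters do not break the query string.
url = String.Format(url, userId, "image/jpg", ZineId, Uri.EscapeDataString(txtStatus.Text), true, Uri.EscapeDataString(UploadFileName));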
var btr = new BackgroundTransferRequest(new Uri(url, UriKind.Absolute));
btr.TransferPreferences = TransferPreferences.AllowCellularAndBattery;
btr.Method = "POST";
btr.Headers.Add("token", IsolatedStorageHelper.GetTravzineToken());
btr.UploadLocation = new Uri(@"/shared\transfers/" + UploadFileName, UriKind.Relative);
btr.TransferStatusChanged += new EventHandler<BackgroundTransferEventArgs>(btr_TransferStatusChanged);
btr.TransferProgressChanged += new EventHandler<BackgroundTransferEventArgs>(btr_TransferProgressChanged);
BackgroundTransferService.Add(btr);
In my case, I am literally passing all the necessary parameters using the query string. On a successful save, my Web Service returns back the Id of the Photo I just uploaded. However:
There is NO way (at least none that I know of) to obtain and evaluate the RESPONSE. The Background Transfer Request event handlers do not expose a RESPONSE.
Here are my event handlers:
void btr_TransferProgressChanged(object sender, BackgroundTransferEventArgs e)
{
    bool isUploading = e.Request.TotalBytesToSend > 0;
    lblStatus.Text = isUploading ? "Uploading " + e.Request.BytesSent.ToString() + " sent" : "Done";
}
void btr_TransferStatusChanged(object sender, BackgroundTransferEventArgs e)
{
    if (e.Request.TransferStatus == TransferStatus.Completed)
    {
        using (IsolatedStorageFile iso =
            IsolatedStorageFile.GetUserStoreForApplication())
        {
            if (iso.FileExists(e.Request.UploadLocation.OriginalString))
                iso.DeleteFile(e.Request.UploadLocation.OriginalString);
        }
        BackgroundTransferService.Remove(e.Request);
        if (null != e.Request.TransferError)
        {
            MessageBox.Show(e.Request.TransferError.Message);
        }
        else
        {
            lblStatus.Text = "Done baby done";
        }
    }
}
So now my question is, how does anyone do any sort of POST Processing in such scenarios?
Can anyone please tell me the line of thought behind designing such an inflexible class?
Any thoughts on how I could get around this issue would be appreciated.
Also, does anyone have any working examples of a homegrown BackgroundTransfer?
I haven't tried it, but why not set a download location like this:
btr.DownloadLocation = new Uri("shared/transfers/myDownloadFile.html", UriKind.RelativeOrAbsolute);
btr.UploadLocation = new Uri("shared/transfers/myUploadFile.jpg", UriKind.RelativeOrAbsolute);
...
When the request has completed, read the file "myDownloadFile.html", where your response has been stored, and delete it afterwards.

Resources