I'm working on an application in C# that goes to a website and gets some content out of a table. It's working fine, but here is the problem: the table that I'm reading changes when I select a different value in a combobox. The XPath that I use always gets the table that is first shown on the website and I don't know how to get the other ones. I'm posting here everything I think is useful for you to help me.
The webpage is:
http://br.soccerway.com/national/brazil/serie-a/2012/regular-season/
xpath/C# code:
HtmlNodeCollection no2 = doc.DocumentNode
.SelectNodes("//*[@id='page_competition_1_block_competition_matches_summary_6']/div[2]/table/tbody/tr/td[@class='team team-a ' or @class='date no-repetition' or @class='score-time score' or @class='team team-b ']");
On the website, you have to click on the "Por semana de jogo" option, right above the scores, for the combobox to be visible.
I need to get all the scores from all the tables, not just the one that appears.
So when you select a game week from the drop down (or click the "anterior" or "proximo" links above the drop down), the JavaScript in the page makes a call to the server to get the data for the selected game week. It just sends a URL to the server via GET.
The data is returned in the form of a JSON object, and inside this object is the table HTML. This HTML is loaded into the DOM in the right place and presto, the browser displays the data for that week.
It is a bit of work to get this programmatically, but it can be done. What you can do is determine what the URL is for each week. Hopefully, most of the query strings are constant except for the week in question. So you will have a boilerplate URL that you tweak for the week you want, and send it off to the server. You get the JSON back and parse out the table HTML. Then, you're golden: you just feed that HTML into the Agility Pack and work with it as usual.
I did a little investigation, and using Chrome's Developer Tools, in the Network tab, I found that when I selected a game week, the URL that is sent off to the server looks like so (this is for week 14):
http://br.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6&callback_params=%7B%22page%22%3A%229%22%2C%22round_id%22%3A%2217449%22%2C%22outgroup%22%3A%22%22%2C%22view%22%3A%221%22%7D&action=changePage&params=%7B%22page%22%3A13%7D
(Note that you can also use other tools, such as Firebug in Firefox or Fiddler, to get the URL.)
By trying other weeks and comparing, it looks like (selected week - 1) appears near the end of the URL, in the params query string: "...%3A13...". So for week 15 you'd use "...%3A14...". Fortunately it looks like there is only one more area of difference among the URLs for different weeks, and it is in the callback_params query string. Unfortunately, I wasn't able to figure out how it connects to the selected week, but hopefully you can.
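As a rough illustration of the "boilerplate URL" idea, here is a minimal sketch in JavaScript; the same string building carries over to C#. The function name `buildWeekUrl` is made up for this example, and the `callbackParams` values are simply the ones observed in the week 14 URL above, so they may well need to change for other weeks.

```javascript
// Hypothetical helper: rebuilds the AJAX URL for a given game week.
// Assumption: params.page is (week - 1), as observed above.
// Assumption: callbackParams is copied from a captured request; how it
// relates to the selected week is not known.
function buildWeekUrl(week, callbackParams) {
  var base = 'http://br.soccerway.com/a/block_competition_matches_summary';
  var query = [
    'block_id=page_competition_1_block_competition_matches_summary_6',
    'callback_params=' + encodeURIComponent(JSON.stringify(callbackParams)),
    'action=changePage',
    'params=' + encodeURIComponent(JSON.stringify({ page: week - 1 }))
  ];
  return base + '?' + query.join('&');
}

// Values taken from the week 14 request shown above.
var week14 = buildWeekUrl(14, { page: '9', round_id: '17449', outgroup: '', view: '1' });
```

Looping over the week numbers and calling this once per week would give the list of URLs to request, provided the callback_params part can be worked out.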
So when you feed that URL into your browser, you get back the JSON block. If you search for "<table" and "/table>" you'll see the HTML that you want. In your C# code, you can just use a simple regular expression to parse it out of the JSON string:
// requires: using System.Text.RegularExpressions;
string json = "..."; // load the JSON string here
// Singleline lets '.' match newlines, so the pattern can span the whole table
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
Regex regx = new Regex( "(?<theTable><table.*/table>)", options );
Match match = regx.Match( json );
if ( match.Success ) {
    string tableHtml = match.Groups["theTable"].Value;
}
Feed the HTML string into the Agility Pack and you should be on your way.
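One caveat with matching against the raw JSON text is that the HTML may still carry JSON escapes (for example \" or \/). A possible alternative, sketched here in JavaScript, is to decode the JSON first and then look for the first string value that contains a table. The structure of the real response is not known here, so the sketch walks the whole object instead of assuming a key name; `findTableHtml` and `extractTable` are names invented for this example.

```javascript
// Recursively search a decoded JSON value for a string containing a <table>.
// Assumption: the table HTML lives in some string value of the response;
// no particular key name is assumed.
function findTableHtml(value) {
  if (typeof value === 'string') {
    var m = value.match(/<table[\s\S]*<\/table>/i);
    return m ? m[0] : null;
  }
  if (value && typeof value === 'object') {
    var keys = Object.keys(value); // works for arrays and plain objects
    for (var i = 0; i < keys.length; i++) {
      var found = findTableHtml(value[keys[i]]);
      if (found) { return found; }
    }
  }
  return null;
}

// Decode first, so JSON escapes are already resolved in the result.
function extractTable(jsonText) {
  return findTableHtml(JSON.parse(jsonText));
}
```

The same idea applies in C#: deserialize with a JSON library, then hand the decoded string to the Agility Pack.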
I am using SoapUI Pro 4.5.2 to read data from spreadsheets, put them into Soap requests to my web service, and get responses back to write to a spreadsheet. It's working.
I have two fields in the input data pertinent to my question:
Middle name is defined as a string of 4 characters, and as minOccurs = 1 and maxOccurs = 1.
Postal extension code (the 4-digit number that is optional after the normal 5-digit code) is defined as a string of 4 digits. This field is optional, so it is marked as minOccurs=0 and maxOccurs=1.
When I use the SoapUI interface to send a request, this works fine; if there's no value for middle name, SoapUI generates an empty tag and sends it (I guess because of the minOccurs=1). If there's no value for the postal extension code, it does not send any tag at all (I guess because of minOccurs=0).
When SoapUI reads data from an Excel spreadsheet, however, the response to the same data is an error indicating that the extended postal code value of '' is not legal, because it must be 4 digits. It appears that SoapUI generates an empty tag for the extended postal code when reading data from the spreadsheet, and sends it.
I found the "Remove Empty Content" option for SoapUI requests, which defaults to false. I set it to true, and now get a validation error indicating that middle name is required but not found. I'm guessing that the option removed all the empty content (reasonable enough), and middle name has to be there, even if empty, because of the minOccurs=1.
Do I have any way out of this tail-chasing problem? I suppose I'm looking for something like a conditional for the output of the postal extension code, so I can eliminate it if it's empty, even if reading values from the spreadsheet.
I am also curious if there are XSD fixes, but I greatly prefer a fix that doesn't involve changing the XSD -- that becomes a political matter.
EDIT FOR DETAIL:
To put input into the request: I have used the SoapUI UI to choose "properties" from the input spreadsheet for each of the input fields; when that's done, one ends up with values in the request fields like:
${SpreadsheetInput#FrstNm}
Where SpreadsheetInput is the name of the datasource step reading the spreadsheet, and FrstNm is one of the properties. I do this with the "Get Data" option off the popup menu you get by right-clicking the request input field, but there may be other ways.
So first your problem:
Remember that internally to SoapUI almost everything is a string. Doing something like:
<postCode>${SpreadsheetInput#PostCode}</postCode>
in your SOAP request, assuming PostCode is either blank or does not exist outright, will expand to:
<postCode></postCode>
and SoapUI will even optimize it to:
<postCode/>
Then your validation kicks in, which says you do not need to provide this element, but if you do, it had better be 4 characters long. The empty element above fails that check.
The solution:
You need to programmatically (meaning you will have to write Groovy code) create this node in your request. There are several ways to handle this. The quick and dirty way is a Groovy step that goes something like:
// Expand the spreadsheet property and strip surrounding whitespace
def postCode = context.expand('${SpreadsheetInput#PostCode}').trim()
if (postCode != null && postCode != '')
    testRunner.testCase.setPropertyValue("postCodeNode", "<postCode>" + postCode + "</postCode>")
else
    testRunner.testCase.setPropertyValue("postCodeNode", "")
Then in your request replace the original:
<postCode>${SpreadsheetInput#PostCode}</postCode>
with just:
${#TestCase#postCodeNode}
Notice, the XML node elements are part of the SoapUI property! Again: everything in SoapUI is just a plain string.
If you want something more hard-core, have a look at "dynamically create elements in a SoapUI request". This is mine.
I'm trying to use a button that opens an email with pre-populated information, but it requires values from page elements.
For example, P45_DATE holds the update date and P45_DATA holds the data.
I tried different element identifiers such as :, & or #, but whenever I use one, nothing after the first identifier comes through.
mailto:test@test.com&cc=someoneelse@test.com?Subject=Extension report for &P45_DATE. &body=Please see Extension below. &P45_DATA
Is this even possible?
Oracle 11g2
apex 4.2.5.00.08
many thanks
Depends on where you are defining this string and if you want client or session state values.
If as part of JavaScript expression, you might use something like:
'mailto:test@test.com?cc=someoneelse@test.com&Subject=Extension report for '+$v('P45_DATE')+'&body=Please see Extension below. '+$v('P45_DATA')
Bear in mind this isn't escaping the data. Also check to see if any errors are appearing in the JavaScript console.
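To illustrate the escaping point, here is a minimal sketch that percent-encodes the subject and body with `encodeURIComponent` before building the link. The helper name `buildMailto` is made up for this example, and it assumes the addresses are plain addresses that need no encoding. It also puts the first header after `?` and the later ones after `&`, which is the usual mailto form.

```javascript
// Hypothetical helper: builds a mailto link with escaped subject and body.
// Assumption: `to` and `cc` are simple addresses that need no encoding.
function buildMailto(to, cc, subject, body) {
  return 'mailto:' + to +
    '?cc=' + cc +
    '&subject=' + encodeURIComponent(subject) +
    '&body=' + encodeURIComponent(body);
}

// Example with literal values; in APEX the last two arguments could be
// built from $v('P45_DATE') and $v('P45_DATA') instead.
var link = buildMailto(
  'test@test.com',
  'someoneelse@test.com',
  'Extension report for 01/02/2014',
  'Please see Extension below. A & B'
);
```

Without the encoding, an `&` or a line break inside the page item value would be read as the start of a new header and cut the message short.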
In a nutshell, I'm using AJAX to pass through, say, 30 items in an infinite scroll/masonry setup. In my AJAX file, one of the lines requests the web URL + the unique key for the page + the Disqus hash tag at the end, to get the comment count for the page and display it on the index. E.g. it looks like this:
http://url.com?id=".$row['pkey']."#disqus_thread\">
For some reason, I'm getting mixed results. Sometimes, I will get results for all of the items as I scroll down and they load with masonry. Other times I will get no comment count and it will simply be blank for all of the items loaded via masonry. It's very sporadic and doesn't always pass the value through.
I know I could use the API and do some more complicated things, but for my purpose I'm simply looking to get a post count for the relevant URL and display it, that's all.
Any tips on why this sometimes works and other times does not? Everything else in my page seems in order.
Is there an existing library that would do this?
I want to be able to have code on the client side where the user chooses something, it makes a call to the server, and the server sends back "for this option, you need to have a text field called foo and a select field called bar with the following options, this one is selected, etc", and then the client side builds the next part of the form from that information. Or if they choose a different option, a different set of fields and values is returned from the server and populated on the screen. Also it might cascade, so after the first selection we need a select field with some options, and then depending on what they select in that select field the next field might be another select field or it might be a text input field.
Has anybody done anything like that? Is my best choice to have the AJAX call return some html that I just stuff into a div, or can I do it field by field and value by value?
If it matters, the back end is going to be written in Perl/MASON, and the front end will be using Javascript/JQuery/JQuery-UI.
I would use jQuery and submit AJAX calls to whatever backend system you choose. Have this backend system compute the necessary changes and return the info as JSON. Let jQuery parse it for you and append the necessary form elements. However, it seems like in a lot of use cases these decisions could be made on the client side without even talking to the server, just as we pre-validate form input before allowing posting to the server. I don't, however, have your requirements in front of me, so I am sure there is a reason you want to get the info back from the server.
P.S. please do not return pure html from the back end to the client....ever.
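The JSON-driven approach can be sketched roughly as follows. The shape of the field description (`type`, `name`, `value`, `options`, `selected`, `label`) is purely an assumption for illustration, as is the helper name `buildField`; the real format is whatever the back end and front end agree on.

```javascript
// Hypothetical helper: turns one field description from the server into a
// DOM element. `doc` defaults to the browser's document; it is a parameter
// only so the function can be exercised outside a browser.
function buildField(field, doc) {
  doc = doc || document;
  var el;
  if (field.type === 'select') {
    el = doc.createElement('select');
    (field.options || []).forEach(function (opt) {
      var o = doc.createElement('option');
      o.value = opt.value;
      o.textContent = opt.label; // set as text, so it is never parsed as markup
      if (opt.value === field.selected) { o.selected = true; }
      el.appendChild(o);
    });
  } else {
    // Anything that is not a select is treated as a text input here.
    el = doc.createElement('input');
    el.type = 'text';
    el.value = field.value || '';
  }
  el.name = field.name;
  return el;
}
```

With jQuery, the result could then be attached with something like `$('#someContainer').append(buildField(spec))`, where `#someContainer` is a placeholder id. A change handler on a generated select can issue the next AJAX call to get the cascading behaviour.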
Web programmer here - using AJAX (HTML, CSS, JavaScript, AJAX, PHP, MySQL), but for some reason Internet Explorer is acting up (surprise surprise).
AJAX is updating query results on the HTML page, via a PHP script that queries a MySQL Database.
Everything is working fine, except when I use Internet Explorer 8.0.
There are several php scripts, which allow for the data to be ordered according to certain criteria, and for testing purposes I have attached the mktime field (current time, in the format HH:MM:SS) to the beginning of the results for each query.
When I use IE, these times appear to remain constant, whereas with ALL other browsers these times are correct and display the current time.
I think the issue has something to do with caching or something along those lines anyway.
Any thoughts or suggestions welcome...
Here is an article on the caching issue.
If your request is a GET change it to a POST, this will prevent the results being cached.
GET requests are cached in IE; switch it to a POST request and it won't be cached anymore.
Instead of switching to POST, which can be ugly if you're not really using it to update or create content, you should append a random number to the query string, as in http://domain.com/ajax/some-request?r=123456. If this number is unique for every request you won't have caching problems.
What I have done is keep the "GET" and add a new dummy query parameter to the query string, as follows:
./BaseServlet?sname=3d_motor&calcdir=20110514&dummyParam=datetime
I set dummyParam to the value of a Date object in the JavaScript, so that every time the URL is generated the browser treats it as a new URL and fetches new (fresh) results.
var d = new Date();
url = url + '&dummyParam='+d.valueOf();
So instead of generating random numbers, this is an easy way!
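A slightly more general version of the same trick, as a sketch: the helper below picks `?` or `&` depending on whether the URL already has a query string. The name `addCacheBuster` and the optional `stamp` argument are made up for this example, and URLs containing a `#fragment` are not handled.

```javascript
// Hypothetical helper: appends a unique dummy parameter so the browser
// treats each request as a new URL and does not serve it from cache.
// `stamp` is optional and defaults to the current time in milliseconds.
function addCacheBuster(url, stamp) {
  var value = (stamp === undefined) ? new Date().valueOf() : stamp;
  var separator = url.indexOf('?') === -1 ? '?' : '&';
  return url + separator + 'dummyParam=' + value;
}

// Usage: wrap the URL just before sending the AJAX request.
var freshUrl = addCacheBuster('./BaseServlet?sname=3d_motor&calcdir=20110514');
```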