?_escaped_fragment_= - headless browser - ajax

what do I have to do to add a ?_escaped_fragment_= support to my server? I want google to be able to crawl through my ajax site. My hashes are already in #! form
But I have no idea how to tell my server that when I enter mywebsite.com/?_escaped_fragment_=section to my browser so the url mywebsite.com/section and it would be equal to mywebsite.com/#!
thanks

Simple answer - my method (soon to be used for a site with ca. 50,000 AJAX-generated URLs) is to have a node.js server using a headless environment (try zombie, phantomjs, or any other) to load the site, making sure it's able to execute javascript and read the DOM - then at runtime, if it's google requesting the fragment, fire a request to the node.js server, which loads the site, executes the javascript, waits for the response, and delivers back the HTML, which is output to the browser.
If that sounds like a lot of work - I'm about 90% finished on the code that does it all for you, where you'd simply drop one line of (PHP) code at the top of your site/app and it does the rest for you, using a remote node.js server.
The code will be open source so if you want to set it up yourself on a node server, you can - or if it's a PITA to set it up yourself, I'll probably have a live server up and running which your app/website would fire ?_escaped_fragment_ requests to, and get the html snapshot back. It also implements caching so that these are only requested once every X days.
Watch this space - just got a few kinks to work out, and it'll be on my site (josscrowcroft.com) and I'll put it in a github repo too.

Related

What is the proper way to benchmarking a simple PHP-MySQL website with automated test cases

I am a novice in the area of benchmarking and so would like to request for your guidance.
Problem: I have a test website developed in PHP and MySQL hosted in the localhost.
I need to perform the following set of activities:-
Login as a registered user
Download a PDF file
I wish to know how to load test the above activities in order? I need to check if at a particular instant, 'n' number of users are logged in and they download a pdf file, what would be the worst response time and related stats.
Steps I already did (Please correct me if I did something wrong here.):-
Used the apache benchmarking tool (ab) to load test the login authentication script page passing the username and password as parameters
(i.e., ab -n 1000 -c 100 -A username:password url_of_script.php)
I tested both for apache and nginx webservers (got comparatively better results in nginx)
But, I want to test if after login, the user performs some other activities, how can we use the ab (or some other) tool to assess the load.
Waiting for your responses. Thanks.
Create a PHP script using curl.
Use your Browser to login and and downlaod the PDF.
Before you do that, right click and select Inspect or Inspect Element. Go to the Network tab. Then start the login and download PDF process.
In the Network Tab look only at the request headers of each HTML page request. Filter out all the other requests (e.g. JS CSS images, media and etc.). You can use these headers as a guideline when setting up curl to do each request.
In FireFox you can edit the headers and resend. Go in to the edit mode and copy the request header.
In your curl requests use exactly the the same request the Browser used.
Curl reports all the stats on the request and response.

MVC 3 with IE, poor bundle performance

We have deployed an MVC 3 website on an IIS6 box.
Everything runs fine, but the performance is abysmal.
Can anyone help me understand
why am I getting 20 second response times to get a script bundle?
why bundled scripts are not cached by IE even if the Expires header is set?
The site is several times faster in Chrome (I have noticed the cache behaviour is correct), but we cannot force customers to use it.
Any help would be great. I'm kind of wondering if it's a server-side setting that's forcing the bundle recompilation each request, or if it's just IE acting like usual.
Edit: as per comments request, I'm including also the bundle request headers:
If you have different download times for a full reload between the two browsers it could be that you are doing intense computations with a client side framework like angularjs (I have seen big performance differences from highly complex angularjs apps between the two browsers).
If both your browsers show the same download time, it is either a network issue, or a server issue.
The IE caching could be a separate issue, break your problem into two parts - look for the cause of the slow downloads first.
All I can do now is suggest an approach to finding the issue.
Summary of what you know
It looks like you have:
Server sends an Expires header one year from now
When you reload the page (i.e. you don't force a full refresh using Ctrl+F5)
IE doesn't take any notice of the cache header, and when it sends it's new request it doesn't use If-Modified-Since or If-None-Match
Chrome behaves differently and respects the Expires and/or ETag response headers (it doesn't even make the request again for the bundle).
EDIT 1: You also seem to be saying (though it would be good to see a timeline from chrome) that Chrome downloads the files faster, implying it is not a server-side problem. Your latest comment states that Chrome's downloads are also slow. (end edit)
And you also seem to be saying that this behaviour is consistent (i.e. 100 requests in IE, and 100 requests in Chrome show the above behaviour with no deviations).
Approach
You should break this problem into two parts:
Why is the download so slow?
Is there a server-side performance problem? Look for common download times in IE and Chrome, and Firefox (it could be due to bundling/minification/compression on the server).
Is there a network connectivity issue (dropped packets, for instance)? Look for inconsistent download times, Start times, Request times, between requests in a given browser and the same behaviour across all browsers.
Is a script slowing down IE, but not Chrome (this is not uncommon, I maintain legacy sites where the scripts don't run well in IE but do in Chrome) - look at different profile results between browsers.
Why is the javascript not being cached in IE? Troubleshoot (1) first, then worry about this.
It is possible that the two are related, but approaching them separately will be a start. Number 1 is far easier to diagnose that 2, the top references to caching javascript in IE on the web are to prevent it in order to help with development.
Root cause diagnosis
EDIT 1 The first thing to do is try the site from a browser on the server, or very close to the server to see if you have a network issue. (end edit)
Tools like Fiddler, the browser developer tools, timeline and script profiler, and YSlow are your friend. Compare each of the following between Chrome and IE (and see what happens in Firefox as well) and spot the difference. Note: you may need to clear the browser cache between tests.
browser developer tools -> script profile: see if you have a slow running script in IE compared to Chrome
similar analysis in a tool like YSlow (look for comparisons between the two browsers, not script improvements)
request and response headers, and timeline from a normal (i.e. not full reload) page load
request and response headers, and timeline from a full page reload (Ctrl+F5)
Start and Request durations for every js file for a given browser, and between browsers (this may point to network issues)? I note that the Start and Request alone are taking 0.6s and 1s each in IE - that is very very poor performance.
5 requests, and 5 full reloads with cache clearing between (that is, don't chase a ghost - be consistent in your test methodology)
Download times should be no different between Chrome and IE with no scripts actually running so also add a control test. Assuming that your bundle files don't "do anything" (i.e. they contain functions that the page calls rather than kicking off long processes by themselves) then create a blank page in your site which references exactly the same javascript files - not just the bundle, but every single js reference.
With the control test you can compare pure download times and caching behaviour in IE to Chrome, without any client side javascript running (use the developer tools profiler to verify no scripts are running). If your bundle files do kick off long running things, just temporarily disable those things by putting return statements at the top of the script and concentrate only on the download into the browser.

Cross site scripting(XSS)

I am loading content from another page and depending on the content of page, changing content of my page and this is giving me cross site scripting issues.
When i use iframe, since the content is from other domain, content of iframe becomes inaccessible.
When i use ajax and try to inject the content as plain html code, XmlHttpRequest object throws permission denied exception due to cross site scripting.
When i use JSONP, such as getJSON in JQuery, it only supports GET protocol and it is not adequate for further processing.
I wonder what other options i can try. Heard that DOJO, GWT,Adobe Air do some XSS, but dont know which one is the best.
Thanks,
Ebe.
Without JSON-P, your only option is to run a proxy script on your own server that fetches the content from the external site and pipes it back to the browser.
The browser fetches the content from the script on your server, hence no cross-domain issues, but the script on your server dynamically fetches it from the external site.
There's an example of such a script in PHP here: http://www.daniweb.com/code/snippet494.html (NB. I haven't personally used it).
If you have control over both domains, take a look at EasyXDM. It's a library which wraps cross-browser quirks and provides an easy-to-use API for communicating in client script between different domains using the best available mechanism for that browser (e.g. postMessage if available, other mechanisms if not).
Caveat: you need to have control over both domains in order to make it work (where "control" means you can place static files on both of them). But you don't need any server-side code changes.
To add to what RichieHindle says, there are some good script (Python+Cron) that you can plonk on your server and it will check for changes to a POST/GET location and cache the changes on your server.
Either set your triggers low (once every 10 mins/ 1 per day) or you might get blacklisted from the target.
This way, a local cache won't incur the HTTP overhead on every AJAX call from the client.

Are there any diagnostics tools for troubleshooting content delivery with Opera Mini?

I have an application that I'm targeting a wide variety of devices and platforms. The application can render different HTML based upon the type of client. However due to the complexity of the application, it shares a considerable amount of JavaScript libraries that rely on a number of async and ajax method calls.
One of the targets for the application is Opera Mini. This "sort-of" works but it seems like sometimes when building up the specialized markup to send down to the Opera Mini JVM client it does not wait until the async calls are complete. Are there any techniques or tools to see what's going on with the Opera Server (not my application web server) Side processing of the page to determine what I can do to make this solid?
It would appear that after further investigation that the server side browser is fairly picky when it comes to CSS. I can't remember the exact problem, but as soon as I removed the stylesheet all content was displayed properly. At that point I slowly re-introduced the CSS and everythning came back online and worked as expected.
Your javascript will only be allowed a short time before it is aborted:
JavaScript running on the Mini server
will only run for a couple of seconds
before pausing, for resource
constraint reasons. This applies to
JavaScript run due to an event firing
e.g. onload, as well as code run
because of a user action.
~ http://dev.opera.com/articles/view/opera-mini-web-content-authoring-guidelines/#javascript
So the best would be to serve the least javascripty version of your site to the Opera Mini user-agent.
You can type server:source in the address bar once a page is loaded if you want to see the current DOM tree.
It's also possible to post that source to a script on your server using server:source?post=http://your.server.com/script. It will send three fields as a POST request: url, host and html. You can then make your script save it to a file.
(Answering an old question in case it helps someone.)

Reverse Engineer A Web Form

I have a web site which I download 2-3 MB of raw data from that then feeds into an ETL process to load it into my data mart. Unfortunately the data provider is the US Dept. of Ag (USDA) and they do not allow downloading via FTP. They require that I use a web form to select the elements I want, click through 2-3 screens and eventually click to download the file. I'd like to automate this download process. I am not a web developer but somehow it seems that I should be able to use some tool to tell me exactly what put/get/magic goes from the final request to the server. If I had a tool that said, "pass these parameters to this url and wait for a response" I could then hack something together in Perl to automate this process.
I realize that if I deconstructed all 5 of their pages and read through the JavaScript includes and tapped my heals together 3 times I could get this info from what I have access to. But I want a faster and more direct path that does not require me to manually parse all their JS.
Restatement of the final question: Is there a tool or method that will show clearly what the final request request sent from a web form was and how it was structured?
A tamperer's best friends (these are firefox extensions, you could also use something like Wireshark)
HTTPFox
Tamper Data
Best of luck
Use Fiddler2 as a proxy to see what is being passed back and forth. I've done this with success in other similar circumstances
Home page is here: http://www.fiddler2.com/fiddler2/
As with the other responses, except my tool of choice is Charles
What about using a web testing toolkit, like Watir and Ruby ?
Easy to fill in the forms.. just use the output..
Use WatiN and combine it with WatiN TestRecorder (Google for it)
It can "simulate" a user sitting in front of the browser punching in values which you can supply from your own C# code...

Resources