Crawling Ajax.request url directly ... permission error

I need to crawl a web board, which uses ajax for dynamic update/hide/show of comments without reloading the corresponding post.
I am blocked by this comment area.
In Ajax.request, url is specified with a path without host name like this :
new Ajax('/bbs/comment_db/load.php', {
    update : $('comment_result'),
    evalScripts : true,
    method : 'post',
    data : 'id=work_gallery&no=i7dg&sno='+npage+'&spl='+splno+'&mno='+cmx+'&ksearch='+$('ksearch').value,
    onComplete : function() {
        $('cmt_spinner').setStyle('display','none');
        try {
            $('cpn'+npage).setStyle('fontWeight','bold');
            $('cpf'+npage).setStyle('fontWeight','bold');
        } catch(err) {}
    }
}).request();
If I try to access the URL with the full host name, I just get the message "Permission Error":
new Ajax('http://host.name.com/bbs/comment_db/load.php', {
    update : $('comment_result'),
    evalScripts : true,
    method : 'post',
    data : 'id=work_gallery&no=i7dg&sno='+npage+'&spl='+splno+'&mno='+cmx+'&ksearch='+$('ksearch').value,
    onComplete : function() {
        $('cmt_spinner').setStyle('display','none');
        try {
            $('cpn'+npage).setStyle('fontWeight','bold');
            $('cpf'+npage).setStyle('fontWeight','bold');
        } catch(err) {}
    }
}).request();
So the full URL fails with that error.
The same happens even when I open the actual PHP URL in the web browser like this:
http://host.name.com/bbs/comment_db/load.php?'id=work_gallery&..'
I guess that the PHP script is restricted so that it can only be called from a URL on the same host.
Any ideas for crawling this data?
Thanks in advance.
-- Shin

Cross-site XMLHttpRequests are forbidden by most browsers. If you want to crawl different sites, you will need to do it in a server-side script.

As mentioned by darin, the XMLHttpRequest object (which is the essence of Ajax requests) has security restrictions on cross-site HTTP requests. I believe it's called the "Same Origin Policy for JavaScript".
While a working group within the W3C has proposed a new Access Control for Cross-Site Requests recommendation, the restriction remains in effect for most mainstream browsers.
I found some information on the Mozilla Developer Network that may provide a better explanation.
In your case, it appears that you are using the Prototype JavaScript framework, where Ajax.Request still uses the XMLHttpRequest object for its Ajax requests.

method:'post'
might well be your problem: the host serving the request likely rejects GET requests, which is all you can throw at it from a browser address bar. If this is what's happening, you'll need to find or install some sort of scripting tool capable of doing the job (Perl would be my choice, and unless you're running Windows, you'll already have that).
I do have to wonder whether what you're trying to do is legit, though: trawling other sites' comment databases isn't usually encouraged.

I would solve this by running a PHP script locally that will do the crawling from outside pages. That way jQuery doesn't have to go to an outside domain.
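A server-side script is not bound by the browser's same-origin policy, so it can send the same POST that the page's own script sends. Here is a minimal sketch in JavaScript for Node.js, assuming Node 18 or later (for the global fetch) and assuming the endpoint accepts the form-encoded fields shown in the question. The helper name buildCommentRequest and the values given for sno, spl, mno and ksearch are placeholders for illustration only:

```javascript
// Hypothetical helper: builds the POST request that the board's own
// script sends. Field names are copied from the question; the values
// passed for sno, spl, mno and ksearch below are placeholders.
function buildCommentRequest(host, fields) {
  const body = new URLSearchParams(fields).toString();
  return {
    url: host + '/bbs/comment_db/load.php',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: body,
    },
  };
}

const request = buildCommentRequest('http://host.name.com', {
  id: 'work_gallery',
  no: 'i7dg',
  sno: '1',
  spl: '0',
  mno: '0',
  ksearch: '',
});

// To actually send it (assumes Node 18+ and network access):
// fetch(request.url, request.options)
//   .then((res) => res.text())
//   .then((html) => console.log(html));
```

If the server also checks cookies or the Referer header, those would have to be added to the headers as well; that part depends on the site and is not shown here.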

Related

Curl works but ajax not working in Shopify private app

I have created a private app from my store and am trying to hit the https://API_KEY:PASS@STORE_NAME/admin/orders.json URL using ajax and curl. It works if I use curl but not with ajax. Can anyone explain what the issue is?
This might be a Cross origin problem. If you are using jQuery try to make an ajax call with dataType set to jsonp as shown here:
$.ajax("url", {
dataType: "jsonp",
success: function(data) {
console.log(data);
}
})
Like the other answer said, it's a cross origin problem (See CORS)
Best way to deal with it normally is Shopify App Proxy, but this isn't available to private apps, only custom apps. Best bet is to build a custom app and authenticate with OAuth2, assuming there's no other reason you've chosen to build a private app instead.
If the nature of your app permits the change to a custom app, the App Proxy will give you a {store-name}.myshopify.com/{resource} end point that will bypass the cross-origin issue, but forward the request to your remote server.
Also, when you're working with JS and something is not working, check the console, and share any errors. No one can really tell you why it's not working without seeing either the code, the error, or both, but this is a common enough stumbling block with AJAX since all this cross-origin security stuff got put into place that I'm 90% sure it's the answer.

Screen scraping and proxies using Ruby

I know there are several screen scraping threads on here but none of the answers quite satisfied me.
I am trying to scrape the HTML from an external web page using javascript. I am using $.ajax and everything should work fine. Here is my code:
$.ajax({
url: "my.url/path",
dataType: 'text',
success: function(data) {
var myVar = $.get(url);
alert(myVar);
}
});
The only problem is that it is looking for the specified url within my web server. How do I use a proxy to get to an external web page?
Due to cross-site scripting restrictions, you're going to have to pass the desired URL to a page on your server that will query the URL in question server-side and then return the results to you. Take a look at the thread below, incorporate that into your application, and have it return the source when that page is hit by your AJAX function.
How to get the HTML source of a webpage in Ruby
Using a GET request is going to be the easiest way to pass the URL of the page you want to fetch to your server, so you'll be able to call something like:
$.ajax("fetchPage.rb?url=" + encodeURIComponent("http://www.google.com")) // "url" is an assumed parameter name
Because you can't access the site in question directly from the server, you're going to have to route the server-side script's request through a proxy for it to work, which really depends on your setup. Take a look at the Proxy class in Ruby:
http://ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/Net/HTTP.html#method-c-Proxy

Cross domain javascript ajax request - status 200 OK but no response

Here is my situation:
I'm creating a widget that site admins can embed in their site, and the data is stored on my server. So the script basically has to make an ajax request to a PHP file on my server to update the database. Right? Right :)
The ajax request works fine when I run it on my local server, but it does not work when the PHP file is on my ONLINE server.
This is the code im using:
var url = "http://www.mydomain.net/ajax_php.php";
var params = "com=ins&id=1&mail=mymail@site.net";
http.async = true;
http.open("POST", url, true);
http.onreadystatechange = function() {
    if(http.readyState == 4 && http.status == 200) {
        //do my things here
        alert( http.responseText );
    }
}
http.send(params);
In firebug it shows: http://www.mydomain.net/ajax_php.php 200 OK X 600ms.
When I check the ajax responseText, I always get Status: 0.
Now my question is: can I do cross-domain ajax requests by default? Might this be a cross-domain ajax problem? Since it works when the requested file resides on my local server but DOESN'T work when the requested file is on another server, I'm thinking ajax requests to a remote server might be denied. Can you help me clear this up?
Thanks..
Cross-domain requests are not directly allowed. However, there is a commonly-used technique called JSONP that will allow you to avoid this restriction through the use of script tags. Basically, you create a callback function with a known name:
function receiveData(data) {
// ...
}
And then your server wraps JSON data in a function call, like this:
receiveData({"the": "data"});
And you "call" the cross-domain server by adding a script tag to your page. jQuery elegantly wraps all of this up in its ajax function.
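The server half of this can be sketched as follows. This is only an illustration of the wrapping step, not any particular framework's API; wrapJsonp is a hypothetical helper name, and the check on the callback name is an added precaution so that a caller cannot inject arbitrary script through it:

```javascript
// Hypothetical server-side helper: wraps data in a call to the callback
// named by the client, which is the body a JSONP endpoint returns.
function wrapJsonp(callbackName, data) {
  // Only accept a plain identifier so the client cannot inject script.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error('Invalid JSONP callback name');
  }
  return callbackName + '(' + JSON.stringify(data) + ');';
}

const responseBody = wrapJsonp('receiveData', { the: 'data' });
console.log(responseBody); // receiveData({"the":"data"});
```

The page then loads that response through a script tag, and the browser runs it as ordinary JavaScript, which calls receiveData with the data.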
Another technique that I've had to use at times is cross-document communication through iframes. You can have one window talk to another, even cross-domain, in a restricted manner through postMessage. Note that only recent browsers have this functionality, so that option is not viable in all cases without resorting to hackery.
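For the postMessage route, the receiving side should check where each message came from before trusting it. A minimal sketch, assuming the widget's iframe is served from http://www.mydomain.net as in the question; makeMessageHandler and the embedding site's address are hypothetical names for illustration:

```javascript
// Hypothetical factory: returns a "message" event handler that only
// accepts messages sent from one trusted origin.
function makeMessageHandler(trustedOrigin, onData) {
  return function (event) {
    if (event.origin !== trustedOrigin) {
      return false; // ignore messages from any other origin
    }
    onData(event.data);
    return true;
  };
}

// In the embedding page (browser only):
// window.addEventListener('message',
//   makeMessageHandler('http://www.mydomain.net', function (data) {
//     console.log('widget says', data);
//   }));
//
// In the iframe served from www.mydomain.net (browser only):
// window.parent.postMessage('saved', 'http://embedding.site.example');
```

The second argument to postMessage names the origin allowed to receive the message, so both directions are restricted.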
You're going to need to have your response sent back to your client via a JSONP call.
What you'll need to do is to have your request for data wrapped in a script tag. Your server will respond with your data wrapped in a function call. By downloading the script as an external resource, your browser will execute the script (just like adding a reference to an external JS file like jQuery) and pass the data to a known JS method. Your JS method will then take the data and do whatever you need to do with it.
Lots of steps involved. Using a library like jQuery provides a lot of support for this.
Hope this helps.

Issue with METHOD in prototype / Ajax.Request

I am trying to call yahoo api via Ajax to find current weather:
var query = "select * from weather.forecast where location in ('UKXX0085','UKXX0061','CAXX0518','CHXX0049') and u='c'";
var url = 'http://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent(query) +'&rnd=1344223&format=json&callback=jsonp1285353223470';
new Ajax.Request(url, {
method: 'get',
onComplete: function(transport) {
alert(transport.Status); // say 'null'
alert(transport.responseText); // say ''
}
});
I noticed that Firebug says OPTIONS instead of GET. What is it, and how can I force Prototype to use GET?
Here is functionality which i am trying to recreate.
And here is full URL which I am trying to access:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20location%20in%20(%27UKXX0085%27%2C%27UKXX0061%27%2C%27CAXX0518%27%2C%27CHXX0049%27)%20and%20u%3D%27c%27&rnd=1344223&format=json&callback=jsonp1285353223470
After hours of trying to debug the same issue myself, I came to the following conclusion.
I believe this happens because of XSS counter-measures in newer browsers.
You can find very detailed information about these new counter-measures here:
https://developer.mozilla.org/en/http_access_control
Basically, a site can specify how "careful" the browser should be about allowing scripts from other domains. If your site, or a site from which you're loading external JavaScript code, includes one of these pieces of "browser advice", newer browsers will react by enforcing a stronger XSS policy.
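For some reason, Prototype's Ajax.Request, under Firefox, ends up sending an OPTIONS request rather than a GET or POST. This looks like the "preflight" check described on the page above, which the browser sends on its own before a cross-domain request that carries custom headers, so Prototype itself may not be choosing OPTIONS; perhaps it simply has not been updated to handle these new security conditions.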
At least that was the conclusion in my case. Maybe this clue can help with your case...
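To see why an OPTIONS request can show up for a plain GET, here is a simplified sketch of the rule browsers apply when deciding whether to send a preflight. It is my own approximation of the rule, not code from Prototype or from any browser, and it ignores several edge cases in the specification. As far as I know, Prototype adds headers such as X-Requested-With by default, which would be enough to trigger it:

```javascript
// Simplified sketch of the CORS rule for when a browser sends an
// OPTIONS preflight before a cross-origin request. It ignores some
// edge cases covered by the specification.
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
const SIMPLE_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type'];
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
];

function needsPreflight(method, headers) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return true;
  return Object.keys(headers).some(function (name) {
    const lower = name.toLowerCase();
    if (!SIMPLE_HEADERS.includes(lower)) return true; // custom header
    if (lower === 'content-type') {
      const type = headers[name].split(';')[0].trim().toLowerCase();
      return !SIMPLE_CONTENT_TYPES.includes(type);
    }
    return false;
  });
}

console.log(needsPreflight('GET', {})); // false
console.log(needsPreflight('GET', { 'X-Requested-With': 'XMLHttpRequest' })); // true
```

Since the URL in the question already carries a callback parameter, loading it through a script tag (JSONP) avoids XMLHttpRequest, and therefore the preflight, altogether.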

NETWORK_ERROR: XMLHttpRequest Exception 101

I am getting this Error
NETWORK_ERROR: XMLHttpRequest Exception 101
when trying to get XML content from one site.
Here is my code:
var xmlhttp;
if(window.XMLHttpRequest) {
xmlhttp = new XMLHttpRequest();
}
if (xmlhttp==null) {
alert ("Your browser does not support XMLHTTP!");
return;
}
xmlhttp.onReadyStateChange=function() {
if(xmlhttp.readyState==4) {
var value =xmlhttp.responseXML;
alert(value);
}
}
xmlhttp.open("GET",url,false);
xmlhttp.send();
//alert(xmlhttp.responseXML);
}
xmlhttp.open("GET",url,false);
xmlhttp.send(null);
Does any one have a solution?
If the URL you provide is external to your server, and the remote server has not allowed you to send requests, you have permission problems. You cannot access data from another server with an XMLHttpRequest without that server explicitly allowing you to do so.
Update: Realizing this is now visible as an answer on Google, I tried to find some documentation on this error. That was surprisingly hard.
This article though, has some background info and steps to resolve. Specifically, it mentions this error here:
As long as the server is configured to allow requests from your web application's origin, XMLHttpRequest will work. Otherwise, an INVALID_ACCESS_ERR exception is thrown
An interpretation of INVALID_ACCESS_ERR seems to be what we're looking at here.
To solve this, the server that receives the request, must be configured to allow the origin. This is described in more details at Mozilla.
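What "configured to allow the origin" means in practice is that the server adds an Access-Control-Allow-Origin header to its response. A minimal sketch of that decision, independent of any particular web server; corsHeaders and the allow list are hypothetical names for illustration:

```javascript
// Hypothetical server-side helper: returns the response headers that
// let a browser page at requestOrigin read the response, or no CORS
// headers at all if that origin is not on the allow list.
function corsHeaders(requestOrigin, allowedOrigins) {
  if (!allowedOrigins.includes(requestOrigin)) {
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Vary': 'Origin', // the response differs per Origin, so caches must key on it
  };
}

const allowedOrigins = ['http://www.myserver.com'];
console.log(corsHeaders('http://www.myserver.com', allowedOrigins));
console.log(corsHeaders('http://other.example', allowedOrigins));
```

The browser sends the page's origin in the Origin request header; when the response lacks a matching Access-Control-Allow-Origin, the script is not allowed to read it.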
The restriction that you cannot access data from another server with a XMLHttpRequest can apply even if the url just implies a remote server.
So:
url = "http://www.myserver.com/webpage.html"
may fail,
but:
url = "/webpage.html"
may succeed, even if the request is being made from www.myserver.com
Request aborted because it was cached or previously requested? It seems the XMLHttpRequest Exception 101 error can be thrown for several reasons. I've found that it occurs when I send an XMLHttpRequest with the same URL more than once. Changing the URL by appending a cache-defeating nonsense string to the end allows the request to be repeated. (I wasn't intending to repeat the request, but events in the program caused it to happen and resulted in this exception.)
Not returning the correct responseText or responseXML in the event of a repeated request is a bug (probably in WebKit).
When this exception occurred, I did get an onload event with readyState==4 and the request object state=0 and responseText=="" and responseXML==null. This was a cross domain request, which the server permits.
This was on an Android 2.3.5 system which uses webKit/533.1
Anyone have documentation on what the exception is supposed to mean?
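The cache-defeating string mentioned above can be added with a small helper. This is a sketch that assumes an environment with the standard URL class (current browsers and Node.js; an old Android WebKit would need plain string concatenation instead). withCacheBuster and the parameter name "_" are arbitrary choices for illustration:

```javascript
// Hypothetical helper: appends a throwaway query parameter so that two
// requests for the same resource never share an identical URL.
function withCacheBuster(url, nonce) {
  const parsed = new URL(url);
  parsed.searchParams.set('_', String(nonce));
  return parsed.toString();
}

console.log(withCacheBuster('http://www.myserver.com/data.xml?a=1', 12345));
// http://www.myserver.com/data.xml?a=1&_=12345
```

In real use the nonce would typically be something like Date.now(), so each request gets a different value.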
Something like this happened to me when I returned incorrect XML (I put an attribute in the root node). In case this helps anyone.
xmlhttp.open("GET",url, true);
Set the async argument (the third parameter) to true.
I found a very nice article with two different solutions.
The first one uses jQuery and JSONP, and explains how simple it is.
The second approach redirects through a PHP call. Very simple and very nice.
http://mayten.com.ar/blog/42-ajax-cross-domain
Another modern method of solving this problem is Cross-Origin Resource Sharing (CORS).
HTML5 offers this feature. You can make your XMLHttpRequest as a CORS request, and if the target browser supports the feature (and the server allows your origin), you won't have any problems.
EDIT:
Additionally, I have to add that there are many reasons that can cause this issue.
Not only a cross-domain restriction, but also simply wrong settings in the web.config of your web service.
Example IIS (.NET):
To enable HTTP access from external sources (in my case a compiled PhoneGap app with a CORS request), you have to add this to your web.config:
<webServices>
    <protocols>
        <add name="HttpGet"/>
        <add name="HttpPost"/>
    </protocols>
</webServices>
Another scenario:
I had two web services running, one on port 80 and one on port 90. This also gave me an XMLHttpRequest error. I don't even know why :). Nevertheless, I think this can help many less experienced readers.
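The two-port scenario is most likely the same-origin policy again: an origin is the combination of scheme, host and port, so a service on port 90 is a different origin from a page on port 80 of the same machine. A small sketch that shows the comparison, using the standard URL class; sameOrigin is a hypothetical helper name and localhost stands in for the real host:

```javascript
// Hypothetical helper: two URLs share an origin only when scheme,
// host and port all match.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

console.log(sameOrigin('http://localhost/page.html', 'http://localhost:80/service')); // true
console.log(sameOrigin('http://localhost/page.html', 'http://localhost:90/service')); // false
```

So a page served from port 80 that calls the service on port 90 needs the same cross-origin permission as a call to a completely different host.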

Resources