I'm trying to work with the IUCN Red List web services API (here's an example output). Unfortunately I haven't been able to find any documentation other than this one-off Gist. It looks as though the API is constructing an HTML document rather than returning a data object, which isn't something I've encountered before. I also notice that in the example there is no ?callback=JSON_CALLBACK parameter in the URL, which I would expect when dealing with JSONP.
I've constructed an http request in AngularJS like so:
atRiskApp.controller('IucnController', ['$scope', '$routeParams', '$http', function ($scope, $routeParams, $http) {
    $scope.iucn = $routeParams.iucn; // pulling a number from the URL: ex. 22718591

    $scope.getIUCN = function () {
        var iucnUrl = 'http://api.iucnredlist.org/details/' + $scope.iucn + '/0.js';
        $http.jsonp(iucnUrl)
            .success(function (response) {
                console.log(response);
            })
            .error(function (response) {
                console.log(response);
            });
    };
}]);
Although the HTML document is being successfully passed to my app, I'm getting the following error message:
Uncaught SyntaxError: Unexpected token <
It seems like the app is expecting JavaScript and is instead getting an HTML document, which it apparently can't parse. I've tried adding a config object to the request based on the Angular docs, $http.jsonp({url: iucnUrl, responseType: 'text'}), without any luck.
My question is, how do I work with the returned HTML document, or am I way off track here?
The response from the API is an HTML document served with a .js extension.
On the page you linked to in your comment, I found some potentially useful information under the heading API Index.
You can actually get JSON for all levels of taxonomy, including your example Aneides aeneus. However, this JSON doesn't include all of the data from the HTML version, so it's not as useful. Hopefully this helps a little.
API Index (excerpt)
It is also possible to retrieve the row(s) of the index corresponding to an individual species:
http://iucn-redlist-api.heroku.com/index/species/panthera-leo.json
You can use dashes for spaces, as a convenient replacement for the standard URL escape, %20.
The HTML format contains direct links to the species account pages. The CSV and JSON formats include a species_id column which can be used to construct species account URLs as follows:
http://iucn-redlist-api.heroku.com/details/species_id/0
To use the index JSON in Web pages directly, you may need JSONP padding; use the “.js” extension and add a “callback” parameter with the name of the function to use.
http://iucn-redlist-api.heroku.com/index/genus/Dioscorea.js?callback=show
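For what it's worth, here is a minimal sketch of how the AngularJS request from your question could target that JSON index endpoint instead of the HTML details page. The URL pattern is taken from the excerpt above, and the response shape is an assumption, not verified against the live API; Angular substitutes its own callback name for JSON_CALLBACK.

atRiskApp.controller('IucnIndexController', ['$scope', '$http', function ($scope, $http) {
    $scope.getIndex = function (genus) {
        var indexUrl = 'http://iucn-redlist-api.heroku.com/index/genus/' + genus +
                       '.js?callback=JSON_CALLBACK';
        $http.jsonp(indexUrl)
            .success(function (response) {
                $scope.indexRows = response; // assumed: an array of index rows including a species_id field
            })
            .error(function (response) {
                console.log('JSONP request failed:', response);
            });
    };
}]);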
I skimmed the website and its sitemap and found no reference to a public API. All the output is HTML, so it makes sense that the JSONP handling can't parse it: it fails on the first < it encounters (as is apparent from your error).
First of all, I would contact the site admin and simply ask if there is an API that will give you XML, JSON, or some other object notation that's convenient to work with.
Then there's the scenario where his or her answer would be 'no':
Parsing HTML is not something to be taken lightly and certainly not something you would write yourself unless absolutely necessary.
Luckily, there are ways to get data out of HTML: jQuery.parseHTML(), pure ('vanilla') JavaScript approaches you can use from within AngularJS, and full-blown HTML parsing libraries such as HTML Agility Pack (for use in .NET), all of which can get you to the heart of the data within the DOM nodes you're trying to poke at.
There are many other libraries that might serve you better, but these examples will give you a good starting point to canvass the landscape of HTML parsing; see the sketch below. This will take some looking into, but it will be more than worth it.
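To make that concrete, here is a hedged sketch of the client-side route, assuming the raw markup has already been fetched as a string (for example via $http.get through a CORS-friendly proxy). The .species-name selector is a placeholder, not the real structure of the IUCN pages.

// jQuery.parseHTML() turns the raw string into an array of DOM nodes.
var nodes = jQuery.parseHTML(rawHtml);
var name = jQuery(nodes).find('.species-name').text(); // placeholder selector
console.log(name);

// The same idea in plain ('vanilla') JavaScript, using DOMParser:
var doc = new DOMParser().parseFromString(rawHtml, 'text/html');
var node = doc.querySelector('.species-name'); // placeholder selector
console.log(node ? node.textContent : 'not found');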
I have a column of links in Google Sheets. I want to tell if a page is producing an error message using IMPORTXML.
As an example, this works fine:
=importxml("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T", "//td/b")
i.e. it looks for td elements and pulls out the b elements (which are postal codes in Canada).
But this code that looks for the error message does not work:
=importxml("https://www.awwwards.com/error1/", "//div/h1" )
I want it to pull out the "THE PAGE YOU WERE LOOKING FOR DOESN'T EXIST." text on this page: https://www.awwwards.com/error1/
I'm getting a "Resource at URL not found" error. What could I be doing wrong? Thanks.
After a quick trial and error with the default formulae:
=IMPORTXML("https://www.awwwards.com/error1/", "//*")
=IMPORTHTML("https://www.awwwards.com/error1/", "table", 1)
=IMPORTHTML("https://www.awwwards.com/error1/", "list", 1)
=IMPORTDATA("https://www.awwwards.com/error1/")
it seems that the website cannot be scraped in Google Sheets by any of the regular formulae.
You want to retrieve the value of THE PAGE YOU WERE LOOKING FOR DOESN'T EXIST. from the URL of https://www.awwwards.com/error1/.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
I think that the page at your URL is an Error 404 (Not Found), so a status code of 404 is returned. Because of this, built-in functions like IMPORTXML might not be able to retrieve the HTML data.
So as one workaround, how about using a custom function with UrlFetchApp? When UrlFetchApp is used, the HTML data can be retrieved even when the status code is 404.
Sample script for custom function:
Please copy and paste the following script into the Spreadsheet's script editor, then put =SAMPLE("https://www.awwwards.com/error1") in a cell of the Spreadsheet. This runs the script.
function SAMPLE(url) {
  return UrlFetchApp
    .fetch(url, {muteHttpExceptions: true})  // don't throw on the 404 response
    .getContentText()                        // raw HTML of the error page
    .match(/<h1>([\w\s\S]+)<\/h1>/)[1]       // grab the text inside <h1>...</h1>
    .toUpperCase();
}
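Since your original goal was to flag which links in the column produce an error, a hedged variation could return the HTTP status code instead of the heading text; the function name STATUSCODE is just a placeholder. Put =STATUSCODE(A2) in a cell and check for 404.

function STATUSCODE(url) {
  return UrlFetchApp
    .fetch(url, {muteHttpExceptions: true}) // don't throw on 4xx/5xx responses
    .getResponseCode();                     // e.g. 200 for OK, 404 for Not Found
}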
Note:
This custom function is written for the URL https://www.awwwards.com/error1. When you use it for other URLs, it might not retrieve the expected results. Please be careful about this.
References:
Custom Functions in Google Sheets
fetch(url, params)
muteHttpExceptions: If true the fetch doesn't throw an exception if the response code indicates failure, and instead returns the HTTPResponse. The default is false.
match()
toUpperCase()
If this was not the direction you want, I apologize.
Using the reCAPTCHA JavaScript client (http://www.google.com/recaptcha/api.js?onload=vcRecaptchaApiLoaded&render=explicit), the automatic request (GET https://www.google.com/recaptcha/api2/userverify?k=XXXXXXXXX) after you select the correct images returns invalid JSON, which is afterwards sent back to Google, and their JSON validation fails. This worked as expected for months. Not anymore :(
Server-side library used: reCaptcha PHP (1.1)
Response
)]}'
["uvresp","03AHJ_Vuup5SJJ583dSIfezFl80dp2AlJ_rZpz3vqGWlOTmbjZqH8izwjruJASNhQI0tOOnmj2Pzg14xMw7dryeqVfTGhx6dg_x2i0PRA1ZeDyrBNn8DX-w5S262Zb3_ZWKj5JDBiqPVnXtNLGbyBxjd97VHanspJWU_-qKWLSWVKxLK6n3lm9Biw33oUCEiGA39GNa09Z6TSAEtnolQCf_LPPRWKoE_e50f2s5ZpUVG5GNVdX7qGBRwphTgcUhwOjA8uYTzmA9co3Jwk2KR5UQ0zzVRJzRZZTBuK9km3PE1WV05ACAwrJi29niDpVaRmpooAnIkHNgGyGBu7u3W7gU6YAHtwya8PYhdF__G_MMG8XpVFDTBa196hKD6hxw-E2PsxmoIQJrU1K89mmzNIh-xLNQ7KJvrBMzVf8A5FHyUQgL5UNDWVwSkWCdC_3swxBzi7R3p8VIrUtkIqJFA_GSAxy0cBRJ8J55Pfs5rzhfR8j-x1hGCzi_6vJrbrwfNesoLEB7GWJtElcljhBYvcDNzU_B_VJ7Sck-6i1Nd0qdmtSiCRZYNyaZ8uGLoDdNgCY-0Oi4802AlI26H7TjGBcKnr4gmaHXTNRf1W7x_3FV05DxVsTqeAlo8zGqmiVqcgmX64BbLK4fD9Xoait1_Lp5vK26fCaOQmGKF7CJaYPuxnX-zXgSkfZDCG6rs6xv1CfZQnIKD0W3Yz522VD4YdNfATb3FywhFWbZuxoBIt6vslZDlPXh2MYOkAmYfIPIo8WoWazMoLI_8iNBZPiMlRL0PS5aQiLSrvbf-sknMHhfM2MJYsfrQjC52aDRaHYdcZbY6Wxlhw0tQEknX8B47_DAQzCKkpoFsecO1eMHuInIykZ1l7TOdZMytI-NzGg21KeKAE8dK6ZWee0UEqDJvCkj5aH5TQcBA--ygbOS186bAptUP5n6WvORx1Nb2ZU_AF9fB23PJWH1xvB4gZoNDvhLmdpkE1Po9Lyim1P61E2rrgYjWgPRwT4jUo",1,120]
Any ideas?
Thanks
I think it's a duplicate of https://stackoverflow.com/a/36862268/2140017
And here is my answer from that link:
Actually, the value returned is not valid JSON, but it is parsed correctly by Google's API.
Is it a protection? I don't know, but if you look at the JavaScript, you can find this:
var jm=function(a,b,c,d,e,g,h,l,r){this.xl=a;this.$c=c||"GET";this.Ka=d;this.Gg=e||null;this.Td=m(h)?h:1;this.ye=0;this.xh=this.Nh=!1;this.uh=b;this.Mh=g;this.md=l||"";this.Zb=!!r;this.Wf=null};f=jm.prototype;f.getUrl=function(){return this.xl};f.ug=function(){return this.$c};f.Ca=function(){return this.Ka};f.fi=function(){return this.Zb};f.bi=function(){return this.md};var nm=function(){G.call(this);this.nj=new hm(0,mm,1,10,5E3);H(this,this.nj);this.ad=0};x(nm,G);var mm=new Nh;nm.prototype.send=function(a){return new Lc(function(b,c){var d=String(this.ad++);this.nj.send(d,a.Uf.toString(),a.ug(),a.Ca(),mm,void 0,u(function(a,d){var h=d.target;if(Xk(h)){var l=a.ml;h.B?(h=h.B.responseText,0==h.indexOf(")]}'\n")&&(h=h.substring(5)),h=Hg(h)):h=void 0;b(new l(h))}else c(new om(a))},this,a))},this)};var om=function(a){y.call(this);this.request=a};x(om,y);
especially take a look at:
var l=a.ml;h.B?(h=h.B.responseText,0==h.indexOf(")]}'\n")&&(h=h.substring(5)),h=Hg(h)):h=void 0;
The parser explicitly checks that the value begins with )]}' and strips it.
I suggest you just apply the same substring operation to the "json" string before parsing it.
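As a minimal sketch of that idea in JavaScript (assuming raw holds the body returned by the userverify request), you could strip the guard before parsing:

var body = raw;
if (body.indexOf(")]}'\n") === 0) {
    body = body.substring(5); // the same 5-character strip Google's own parser applies
}
var data = JSON.parse(body);  // e.g. ["uvresp", "<token>", 1, 120]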
I am rather new to NodeJS, so hopefully I am able to articulate my question(s) properly. My goal is to create a NodeJS application that will use the node-rest-client to GET data and asynchronously display it in HTML on the client side.
I have several node-rest-client methods created, and currently I am calling my GET data operation when a user navigates to the /getdata page. The response is successfully logged to the console, but I'm stumbling on the best method to dynamically populate this data in an HTML table on the /getdata page itself. I'd like to follow Node best practices, ensure durability under high user load, and ultimately make sure I'm not coding a piece of junk.
How can I bind data returned from my Express routes to the HTML front end?
Should I use separate "router.get" routes for each node-rest-method?
How can I bind a GET request to a button and have it GET new data when clicked?
Should I consider using socket.io, angularjs and ajax to pipe data from the server side to client side?
Thank you for reading.
This is an example of the route that is currently rendering the getdata page as well as calling my getDomains node-rest-client method. The page is rendering correctly, and the data returned by getDomains is successfully printed to the console; however, I'm having trouble getting the data piped to the /getdata page.
router.get('/getdata', function(req, res) {
    res.render('getdata', {title: 'This is the get data page'});
    console.log("Rendering:: Starting post requirement");

    args = {
        headers: {"Cookie": req.session.qcsession, "Accept": "application/xml"},
    };

    qcclient.methods.getDomains(args, function(data, response) {
        var theProjectsSTRING = JSON.stringify(data);
        var theProjectsJSON = JSON.parse(theProjectsSTRING);

        console.log('Processing JSON.Stringify on DATA');
        console.log(theProjectsSTRING);
        console.log('Processing JSON.Parse on theProjectsSTRING');
        console.log('');
        console.log('Parsing the array ' + theProjectsJSON.Domains.Domain[0].$.Name);
    });
});
I've started to experiment with creating several routes for my different node-rest-client methods that will use res.send to return the data, and then perhaps I could bind an AJAX call or use AngularJS to parse the data and display it to the user.
router.get('/domaindata', function(req, res) {
    var theProjectsSTRING;
    var theProjectsJSON;

    args = {
        headers: {"Cookie": req.session.qcsession, "Accept": "application/xml"},
    };

    qcclient.methods.getDomains(args, function(data, response) {
        //console.log(data);
        theProjectsSTRING = JSON.stringify(data);
        theProjectsJSON = JSON.parse(theProjectsSTRING);

        console.log('Processing JSON.Stringify on DATA');
        console.log(theProjectsSTRING);
        console.log('Processing JSON.Parse on theProjectsSTRING');
        console.log('');
        console.log('Parsing the array ' + theProjectsJSON.Domains.Domain[0].$.Name);

        res.send(theProjectsSTRING);
    });
});
I looked into your code. You are using res.render(..) and res.send(..). First of all, you should understand the basic request-response cycle: the request object gives us the values passed from the routes, and the response returns values back after doing some kind of processing on the request values. More specifically, in Express you will be using req.params, and req.body if values are passed through the body of the HTML.
So all response-related statements (res.send(..), res.json(..), res.jsonp(..), res.render(..)) should be at the end of your function(req, res) {...}, where you have no other processing left to be done; otherwise you will get errors.
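Applied to the /getdata route from your question, a minimal sketch (reusing the same qcclient call, and assuming the Domains.Domain shape from your own console output) would move res.render into the callback so the fetched data can be handed to the view:

router.get('/getdata', function (req, res) {
    var args = {
        headers: {"Cookie": req.session.qcsession, "Accept": "application/xml"}
    };

    qcclient.methods.getDomains(args, function (data, response) {
        var theProjects = JSON.parse(JSON.stringify(data));
        // Render only after the REST client has returned, and pass the data to the template.
        res.render('getdata', {
            title: 'This is the get data page',
            domains: theProjects.Domains.Domain // assumed shape, taken from your logging
        });
    });
});

The getdata template can then loop over domains with whatever view engine you are using to build the HTML table.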
As per modern web application development practice in JavaScript, frameworks such as Ruby on Rails, ExpressJS, Django, Play, etc. all work as a REST engine, and the front-end routing logic is written in JavaScript. If you are using AngularJS, then ngRoute and the open-source ui-router make the work really easy. If you look closely at some of the popular MEAN seed projects such as mean.io and mean.js, even they use ExpressJS as a REST engine, and AngularJS does the heavyweight job on the front end.
Very often you will be sending JSON data from the backend, so for that you can use res.json(..). To consume the data from your endpoints you can use the AngularJS ngResource service.
Let's take the simplest case: you have a GET /domaindata endpoint:
router.get('/domaindata',function(req,res){
..
..
res.json({somekey:'somevalue'});
});
On the front end you can access this using the AngularJS ngResource service:
var MyResource = $resource('/domaindata');
MyResource.query(function(results){
$scope.myValue = results;
//myValue variable is now bound to the view.
});
I would suggest you have a look at ui-router for front-end routing.
If you are looking for a sample implementation, you can look into this project, which I wrote some time back; it can also give you an overview of implementing login and session management using JSON Web Tokens.
There are a lot of things to understand; let me know if you need help with anything.
I am new to Django, but I am an advanced programmer in other frameworks.
What I intend to do:
Press a form button, triggering JavaScript that fires an Ajax request, which is processed by a Django view (it creates a file) that returns plain, simple JSON data (the name of the file), which is then appended as a link to a DOM element named 'downloads'.
What I achieved so far instead:
Press the button, triggering JS that fires an Ajax request, which is processed by a Django view (it creates a file) that returns the whole page, which is appended as a duplicate to the DOM element named 'downloads' (instead of simple JSON data).
Here is the extracted code from the corresponding Django view:
context = {
'filename': filename
}
data['filename'] = render_to_string(current_app+'/json_download_link.html', context)
return HttpResponse(json.dumps(data), content_type="application/json")
I tried several variants (like https://stackoverflow.com/a/2428119/850547), with and without a RequestContext object, and different rendering strategies... I am out of ideas now.
It seems to me that there is NO way to make Ajax requests without using a template in the response... :-/ (but I hope I am wrong)
But even then: why does Django return the main template (with the full DOM), which I have NOT passed to the context?
I just want JSON data - nothing more!
I hope my problem is understandable... if you need more information, let me know and I will add it.
EDIT:
For the upcoming questions: json_download_link.html looks like this:
Download
But I don't even want to use that!
The corresponding jQuery:
$.post(url, form_data)
.done(function(result){
$('#downloads').append(' Download CSV')
})
I don't understand your question. Of course you can make an Ajax request without using a template. If you don't want to use a template, don't use a template. If you just want to return JSON, then do that.
Without having any details of what's going wrong, I would imagine that your Ajax request is not hitting the view you think it is, but is going to the original full-page view. Try adding some logging in the view to see what's going on.
There is no need to return the full template. You can return parts of a template and render/append them on the front end.
A template can be as small as you want. For example, this is a template:
name.html
<p>My name is {{name}}</p>
You can return only this template with json.dumps() and append it on the front end.
What is your json_download_link.html?
Assuming example.csv is a string:
data = {}
data['filename'] = u'example.csv'
return HttpResponse(simplejson.dumps(data), content_type="application/json")
Is this what you are looking for?
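On the client side, a minimal sketch for consuming that JSON (assuming the response is {"filename": "example.csv"} and that the files are served under a hypothetical /downloads/ URL prefix) could build the link itself instead of receiving rendered HTML:

$.post(url, form_data)
    .done(function (result) {
        var link = $('<a>', {
            href: '/downloads/' + result.filename, // placeholder path, adjust to your URLconf
            text: 'Download CSV'
        });
        $('#downloads').append(link);
    });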
Google Voice has XML URLs, so I was wondering how somebody would pull the JSON part from the returned XML and parse it out to a page. Google Voice's search capability is busted right now, and I want to get access to my history. I'm thinking that a synchronous call to all of the pages up to the last known page number in my history should do it...
This may be your best bet...
Read about dataType conversion here: http://api.jquery.com/extending-ajax/
Particularly the section that says:
You can define converters "inline," inside the options of an ajax call. For example, the following code requests an XML document, then extracts relevant text from it, and parses it as "mydatatype":
$.ajax( url, {
dataType: "xml text mydatatype",
converters: {
"xml text": function( xmlValue ) {
// Extract relevant text from the xml document
return textValue;
}
}
});
I don't know if this exact code snippet will return the JSON content properly, but at the very least it should strip it out of the XML response (you may need to add additional code to parse the returned "textValue" as JSON, perhaps using the jQuery parseJSON method).
Maybe try:
$.ajax( url, {
    dataType: "xml text mydatatype",
    converters: {
        "xml text": $.parseJSON
    }
});
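If neither returns usable JSON on its own, here is a hedged sketch that combines the two ideas: one converter pulls the text out of the XML, and a second parses it as JSON. The json node name is a placeholder, since the real structure of the Google Voice XML isn't shown here.

$.ajax( url, {
    dataType: "xml text mydatatype",
    converters: {
        "xml text": function( xmlValue ) {
            // Pull the text content of the node assumed to hold the JSON payload.
            return $(xmlValue).find("json").text();
        },
        "text mydatatype": function( textValue ) {
            return $.parseJSON(textValue);
        }
    }
}).done(function( data ) {
    console.log(data); // data should now be a plain JavaScript object
});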
Hope this helps.
XML and JSON are not the same data types. You will likely have to process the data as XML, if that's the only type your data is returned as. If the URL has .xml, you might try changing it to .json to see if it returns a JSON data type.
If you give us more information (examples, URLs, etc), someone might be able to help you better.