A method for standardizing URI/URLs in Ruby in a forgiving manner - ruby

I'm trying to find a method to take a URI/URL string from a user and determine a working, canonical form (or failing if the resource isn't valid). Simultaneously, it should also verify that the URL exists. So we're checking for both valid "syntax" and also existence.
For instance, a string like google.com should be turned into http://www.google.com, and a string like google.com/insights should be turned into http://www.google.com/insights. A string like http://thiswebsitedoesntexistatall.com should return some sort of error or exception.
I believe a portion of the solution may likely be calling an HTTP get_response() method and following redirects until I get a 200 OK status.
It seems like the URI.parse() method is not forgiving of leaving off the http. I realize I could write a simple thing to try adding http in front, etc., but I was hoping there was some existing gem or little-known library function that would be really forgiving about URLs and canonicalize them for me.
Both the built in net/http and HTTParty seem to be too strict for what I'm looking for. Is there a nice way to do this?

There are some problems with what you're asking for:
A URL parser shouldn't assume the value passed in is HTTP, when FTP and many other protocols are equally valid. If you know the protocol is likely to be HTTP, then you need to add the protocol.
If you try to connect to a site and follow redirects until you get a 200 response, you've only proven that the URL resolves to a valid page of some sort. That 200 could be an error-page returned because the one you want is a dead link or invalid, or that the site is temporarily down. To prove otherwise means you have to have some intimate pre-knowledge about the page you're looking for, such as specific content to search for.
Assuming the URL is good after you follow the redirects, is not quite safe. Many sites add on all sorts of session data to the URL, so what could start as a simple and clean URL can resolve to a long and convoluted one.
I'd recommend you look at the Addressable::URI gem. It's much more full-featured than Ruby's URI. It won't make the decisions for you, but at least it will give you a more complete API and can rewrite/normalize URLs. Cleaning them up and/or determining if they are good is still left as an exercise for you.

Related

What HTTP Protocol can I use if I need to GET something from the server but I also need to send a requestbody?

I am using SpringBoot...
I can not use GET protocol and include a body, but I am not going to create or update anything on the server so I do not want to use POST or PUT, any other protocol that acts like a GET with body?
if you wonder what I need to send in that body it is an url parameter, like for example http://somewebsite.com/stuff/etc and I feel that putting this inside a request body is better than putting it as a requestparam
I can not use GET protocol and include a body, but I am not going to create or update anything on the server so I do not want to use POST or PUT, any other protocol that acts like a GET with body?
Your best bet, where suitable, would be to mimic how HTML forms work; which is to say having a family of resources with identifiers that are filled in by the client (in general, via URI templates -- often via query parameters as would happen with an HTML form).
When that's not appropriate: as of 2022-11, your best bet is POST. It's not a great answer (in particular, general purpose HTTP components won't know that the semantics of the request are safe), but it is the best option available of the registered methods.
POST serves many useful purposes in HTTP, including the general purpose of "
"this action isn’t worth standardizing." -- Roy Fielding, 2009
Eventually, the HTTPbis-wg will finalize the safe-method-with-a-body proposal, and at that point that will become a much better option than POST (for the cases that match the new semantics).

how to pass object on Spring's REST Template using get

I am using a Spring REST template to pull data using POST and everything is working fine.
ResponseEntity<MyObject> resp= restTemplate.postForEntity("url", inputParam, MyObject.class);
But now since I am not doing any POST operation, I want to change it to GET. I can do this by adding all input params as url parameters and do:
ResponseEntity<MyObject> resp= restTemplate.getForEntity("url",MyObject.class);
But the problem is, inputParam has alot of parameters, so preparing the url manually is not the best solution. Also GET requests have length restrictions.
Is there any other better solution for handling this?
First of all, I think your second line should say getForEntity().
Secondly, there are numerous URL builder class options if you google around (including ones from Spring). So, I would use a URL building class to prepare the URL rather than manually doing it yourself which can get messy.
https://www.baeldung.com/spring-uricomponentsbuilder
https://square.github.io/okhttp/3.x/okhttp/okhttp3/HttpUrl.html
https://docs.oracle.com/javaee/7/api/javax/ws/rs/core/UriBuilder.html
Length Restriction
There's a good SO entry here noting length restrictions of common browsers; so if its going through a browser then I'd stick to POST if you're potentially over the 2000 lower limit they suggest.
Technically there shouldn't be a limit according to https://www.w3.org/2001/tag/doc/get7#myths.
I think on a lot of back-end technologies there is no limit. So, if this is API-only and not going through a browser (like back-end to back-end) then you may be able to ignore those limits. I'd recommend looking into that specifically though and testing it with your back-end.
UniRest
Also, as a personal recommendation, I have found UniRest to be an amazingly useful REST client which makes most of my code much cleaner :). If you have time, maybe try giving that a shot.
http://unirest.io/java.html

Combining GET and POST in Sinatra (Ruby)

I am trying to make a RESTful api and have some function which needs credentials. For example say I'm writing a function which finds all nearby places within a certain radius, but only authorised users can use it.
One way to do it is to send it all using GET like so:
http://myapi.heroku.com/getNearbyPlaces?lon=12.343523&lat=56.123533&radius=30&username=john&password=blabla123
but obviously that's the worst possible way to do it.
Is it possible to instead move the username and password fields and embed them as POST variables over SSL, so the URL will only look like so:
https://myapi.heroku.com/getNearbyPlaces?lon=12.343523&lat=56.123533&radius=30
and the credentials will be sent encrypted.
How would I then in Sinatra and Ruby properly get at the GET and POST variables? Is this The Right Way To Do It? If not why not?
If you are really trying to create a restful API instead if some URL endpoints which happen to speak some HTTP dialect, you should stick to GET. It's even again in your path, so you seem to be pretty sure it's a get.
Instead of trying to hide the username and password in GET or POST parameters, you should instead use Basic authentication, which was invented especially for that purpose and is universally available in clients (and is available using convenience methods in Sinatra).
Also, if you are trying to use REST, you should embrace the concept of resources and resoiurce collections (which is implied by the R and E of REST). So you have a single URL like http://myapi.heroku.com/NearbyPlaces. If you GET there, you gather information about that resource, if you POST, you create a new resource, if you PUT yopu update n existing resource and if you DELETE, well, you delete it. What you should do before is th structure your object space into these resources and design your API around it.
Possibly, you could have a resource collection at http://myapi.heroku.com/places. Each place as a resource has a unique URL like http://myapi.heroku.com/places/123. New polaces can be created by POSTing to http://myapi.heroku.com/places. And nearby places could be gathered by GETing http://myapi.heroku.com/places/nearby?lon=12.343523&lat=56.123533&radius=30. hat call could return an Array or URLs to nearby places, e.g.
[
"http://myapi.heroku.com/places/123",
"http://myapi.heroku.com/places/17",
"http://myapi.heroku.com/places/42"
]
If you want to be truly discoverable, you might also embrace HATEOAS which constraints REST smentics in a way to allows API clients to "browse" through the API as a user with a browser would do. To allow this, you use Hyperlink inside your API which point to other resources, kind of like in the example above.
The params that are part of the url (namely lon, lat and radius) are known as query parameters, the user and password information that you want to send in your form are known as form parameters. In Sinatra both of these type of parameters are made available in the params hash of a controller.
So in Sinatra you would be able to access your lon parameter as params[:lon] and the user parameter as params[:user].
I suggest using basic or digest authentication and a plain GET request. In other words, your request should be "GET /places?lat=x&lon=x&radius=x" and you should let HTTP handle the authentication. If I understand your situation correctly, this is the ideal approach and will certainly be the most RESTful solution.
As an aside, your URI could be improved. Having verbs ("get") and query-like adjectives ("nearby") in your resource names is not really appropriate. In general, resources should be nouns (ie. "places", "person", "books"). See the example request I wrote above; "get" is redundant because you are using a GET request and "nearby" is redundant because you are already querying by location.

What does a Ajax call response like 'for (;;); { json data }' mean? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why do people put code like “throw 1; <dont be evil>” and “for(;;);” in front of json responses?
I found this kind of syntax being used on Facebook for Ajax calls. I'm confused on the for (;;); part in the beginning of response. What is it used for?
This is the call and response:
GET http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0
Response:
for (;;);{"t":"continue"}
I suspect the primary reason it's there is control. It forces you to retrieve the data via Ajax, not via JSON-P or similar (which uses script tags, and so would fail because that for loop is infinite), and thus ensures that the Same Origin Policy kicks in. This lets them control what documents can issue calls to the API — specifically, only documents that have the same origin as that API call, or ones that Facebook specifically grants access to via CORS (on browsers that support CORS). So you have to request the data via a mechanism where the browser will enforce the SOP, and you have to know about that preface and remove it before deserializing the data.
So yeah, it's about controlling (useful) access to that data.
Facebook has a ton of developers working internally on a lot of projects, and it is very common for someone to make a minor mistake; whether it be something as simple and serious as failing to escape data inserted into an HTML or SQL template or something as intricate and subtle as using eval (sometimes inefficient and arguably insecure) or JSON.parse (a compliant but not universally implemented extension) instead of a "known good" JSON decoder, it is important to figure out ways to easily enforce best practices on this developer population.
To face this challenge, Facebook has recently been going "all out" with internal projects designed to gracefully enforce these best practices, and to be honest the only explanation that truly makes sense for this specific case is just that: someone internally decided that all JSON parsing should go through a single implementation in their core library, and the best way to enforce that is for every single API response to get for(;;); automatically tacked on the front.
In so doing, a developer can't be "lazy": they will notice immediately if they use eval(), wonder what is up, and then realize their mistake and use the approved JSON API.
The other answers being provided seem to all fall into one of two categories:
misunderstanding JSONP, or
misunderstanding "JSON hijacking".
Those in the first category rely on the idea that an attacker can somehow make a request "using JSONP" to an API that doesn't support it. JSONP is a protocol that must be supported on both the server and the client: it requires the server to return something akin to myFunction({"t":"continue"}) such that the result is passed to a local function. You can't just "use JSONP" by accident.
Those in the second category are citing a very real vulnerability that has been described allowing a cross-site request forgery via tags to APIs that do not use JSONP (such as this one), allowing a form of "JSON hijacking". This is done by changing the Array/Object constructor, which allows one to access the information being returned from the server without a wrapping function.
However, that is simply not possible in this case: the reason it works at all is that a bare array (one possible result of many JSON APIs, such as the famous Gmail example) is a valid expression statement, which is not true of a bare object.
In fact, the syntax for objects defined by JSON (which includes quotation marks around the field names, as seen in this example) conflicts with the syntax for blocks, and therefore cannot be used at the top-level of a script.
js> {"t":"continue"}
typein:2: SyntaxError: invalid label:
typein:2: {"t":"continue"}
typein:2: ....^
For this example to be exploitable by way of Object() constructor remapping, it would require the API to have instead returned the object inside of a set of parentheses, making it valid JavaScript (but then not valid JSON).
js> ({"t":"continue"})
[object Object]
Now, it could be that this for(;;); prefix trick is only "accidentally" showing up in this example, and is in fact being returned by other internal Facebook APIs that are returning arrays; but in this case that should really be noted, as that would then be the "real" cause for why for(;;); is appearing in this specific snippet.
Well the for(;;); is an infinite loop (you can use Chrome's JavaScript console to run that code in a tab if you want, and then watch the CPU-usage in the task manager go through the roof until the browser kills the tab).
So I suspect that maybe it is being put there to frustrate anyone attempting to parse the response using eval or any other technique that executes the returned data.
To explain further, it used to be fairly commonplace to parse a bit of JSON-formatted data using JavaScript's eval() function, by doing something like:
var parsedJson = eval('(' + jsonString + ')');
...this is considered unsafe, however, as if for some reason your JSON-formatted data contains executable JavaScript code instead of (or in addition to) JSON-formatted data then that code will be executed by the eval(). This means that if you are talking with an untrusted server, or if someone compromises a trusted server, then they can run arbitrary code on your page.
Because of this, using things like eval() to parse JSON-formatted data is generally frowned upon, and the for(;;); statement in the Facebook JSON will prevent people from parsing the data that way. Anyone that tries will get an infinite loop. So essentially, it's like Facebook is trying to enforce that people work with its API in a way that doesn't leave them vulnerable to future exploits that try to hijack the Facebook API to use as a vector.
I'm a bit late and T.J. has basically solved the mystery, but I thought I'd share a great paper on this particular topic that has good examples and provides deeper insight into this mechanism.
These infinite loops are a countermeasure against "Javascript hijacking", a type of attack that gained public attention with an attack on Gmail that was published by Jeremiah Grossman.
The idea is as simple as beautiful: A lot of users tend to be logged in permanently in Gmail or Facebook. So what you do is you set up a site and in your malicious site's Javascript you override the object or array constructor:
function Object() {
//Make an Ajax request to your malicious site exposing the object data
}
then you include a <script> tag in that site such as
<script src="http://www.example.com/object.json"></script>
And finally you can read all about the JSON objects in your malicious server's logs.
As promised, the link to the paper.
This looks like a hack to prevent a CSRF attack. There are browser-specific ways to hook into object creation, so a malicious website could use do that first, and then have the following:
<script src="http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0" />
If there weren't an infinite loop before the JSON, an object would be created, since JSON can be eval()ed as javascript, and the hooks would detect it and sniff the object members.
Now if you visit that site from a browser, while logged into Facebook, it can get at your data as if it were you, and then send it back to its own server via e.g., an AJAX or javascript post.

Ajax and filenames - Best practices

I am using jQuery to call PHP files via the $.get method
function fetchDepartment(company_id)
{
$.get("ajax/fetchDepartment.php?sec=departments&company_id="+company_id, function(data){
$("#department_id").html(data);
});
}
What I am thinking is can I secure the filename even further?
Currently I have a global access check within the .php file that check if the user is logged in, if he can access this data etc.
But I am wondering if there are further steps I can take so a user couldn't see this filename, or what other steps you recommend to take.
Encoded requests
You could make the request details effectively invisible to the casual miscreant by encoding almost all of the URL and then decoding the request details server-side.
The request details would include the action you wish to perform plus the parameters relevant to that action.
All requests would be sent to a single URL, where a server-side process would decode the request details and perform the relevant action as required.
Example Original URL:
/ajax/delete.php?parameter1=foo&parameter2=bar
Request details:
action=delete&parameter1=foo&parameter2=bar
Encoded request details (encoded using base64):
YWN0aW9uPWRlbGV0ZSZwYXJhbWV0ZXIxPWZvbyZwYXJhbWV0ZXIyPWJhcg==
Encoded URL:
/ajax/?request=YWN0aW9uPWRlbGV0ZSZwYXJhbWV0ZXIxPWZvbyZwYXJhbWV0ZXIyPWJhcg==
I don't believe there is native functionality to encode to base64 in JavaScript, but it's far from impossible to find a suitable method or to write your own.
With obfuscated/minified client-side JavaScript it would be quite tricky for someone to determine how to make a request 'by hand'.
Hide implementation details
There are a number of practices you can follow to make your application less susceptible to attack through URL misuse.
Let's start with a URL of: ajax/fetchDepartment.php?sec=departments&company_id=99
There's no need to reveal what server-side technology you're using (PHP) nor, through the query string (sec, company_id), what the query string values actually mean.
Masking the server-side technology
Assuming you have index.php defined as a default, the following URLs are equivalent:
ajax/fetchDepartment.php?sec=departments&company_id=99
ajax/fetchDepartment/index.php?sec=departments&company_id=99
ajax/fetchDepartment/?sec=departments&company_id=99
The third URL does not reveal the server-side technology you're using. This limits the range of possible attacks. It also makes it easier for you to switch over to a different server-side technology without changing your URLs.
Hiding the meaning of request parameters
ajax/fetchDepartment/?sec=departments&company_id=99
ajax/99/departments/
The latter URL still conveys enough information to perform the request without revealing what the information means.
Whilst someone could still change the values, they won't know what they're changing. This will make it more difficult for an attacker to evaluate and understand the result of any URL changes they make.
Pretty much the only way you can obscure the URL for a certain piece of information from the user is by not loading it in through http. Maybe you can load a set of data on the calling page, or another page with a more generic url.

Resources