HTTP response splitting - validation

I'm trying to handle this possible exploit and am wondering about the best way to do it. Should I use Apache's Commons Validator and create a list of known allowed symbols, and validate against that?

From the Wikipedia article:
The generic solution is to URL-encode strings before inclusion into HTTP headers such as Location or Set-Cookie.
Typical examples of sanitization include casting to integer, or aggressive regular expression replacement. It is worth noting that although this is not a PHP specific problem, the PHP interpreter contains protection against this attack since version 4.4.2 and 5.1.2.
Edit
I'm tied into using JSPs with Java actions!
There don't appear to be any JSP-based protections for this attack vector - many descriptions on the web assume ASP or PHP - but this link describes a fairly platform-neutral way to approach the problem (JSP is used as an incidental example in it).
Basically, your first step is to identify the potentially hazardous characters (CRs, LFs, etc.) and then to remove them. I'm afraid this is about as robust a solution as you can hope for!
Solution
Validate input. Remove CRs and LFs (and all other hazardous characters) before embedding data into any HTTP response headers, particularly when setting cookies and redirecting. It is possible to use third-party products to defend against CR/LF injection and to test for the existence of such security holes before application deployment.
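As an illustration, here is a minimal sketch in Java of the validation described above (the class and method names are my own; it assumes the Servlet API is on the classpath, and URLEncoder.encode(String, Charset) needs Java 10 or later):
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.servlet.http.HttpServletResponse;

public final class HeaderSanitizer {

    // Remove CR, LF and all other control characters so user-supplied data
    // cannot terminate the header and inject a second response.
    public static String stripControlChars(String value) {
        return value.replaceAll("\\p{Cntrl}", "");
    }

    // The generic solution quoted from Wikipedia above: URL-encode the value
    // before inclusion in headers such as Location or Set-Cookie.
    public static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Sanitize before redirecting rather than trusting the raw parameter.
    public static void safeRedirect(HttpServletResponse response, String target) {
        response.setStatus(HttpServletResponse.SC_FOUND);
        response.setHeader("Location", stripControlChars(target));
    }
}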

Use PHP? ;)
According to Wikipedia and the PHP CHANGELOG, PHP's had protection against it in PHP4 since 4.4.2 and PHP5 since 5.1.2.
I've only skimmed it, but this might help. His examples are written in JSP.

OK, casting to an int is not much use when reading strings, and using a regex in every action which receives input from the browser could be messy. I'm looking for a more robust solution.

Related

Laravel 7 Sanitizing Form Data

I'm new to Laravel 7 and wondering if there's an out-of-the-box, elegant solution to sanitizing HTML form inputs? Or maybe a trusted third-party package I can download that you recommend? This is for data I will store in a database. Thanks for any help.
One recommended out-of-the-box way of sanitizing data is using the filter_var function that comes with PHP in conjunction with the different sanitize filters. By the way, this is also a cool way to validate input; take a look at the types of filters to find out more.
When working in Laravel projects, I like to use voku/portable-ascii library, because it's already a framework dependency. It is a nice assortment of functions to clean input, remove non-printable characters, and to generally transform any input into ASCII, complete with transliteration and whatnot. It's not always perfect, but usually good enough and gets the job done.
It always depends on what you want to sanitize, how, and why. In many situations you do not need to sanitize the input at all if you stick to the best practices. When working with Eloquent or the Query Builder, your data is automatically escaped through parameter binding, and on retrieval, when you output it e.g. via {{ $data }}, it will be properly escaped too.
There are some situations where you should be more cautious, especially if you are handling the raw user input yourself and probably passing it to the system in command line parameters, filenames or such. In those cases it is usually a good idea to be as restrictive as possible and as permissive as necessary. Sometimes a good old preg_replace('/[^0-9A-Z_-]/i', '', $subject) is just the right choice. If you want to be as permissive as possible, give the suggestions above a try.
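To illustrate the restrictive approach, here is the same whitelist expressed in Java (Java is used only for consistency with the other sketches in this digest; the class name is made up):
public class Whitelist {
    // Keep only the characters we explicitly allow and drop everything else,
    // a direct translation of the preg_replace one-liner above.
    public static String sanitize(String subject) {
        return subject.replaceAll("(?i)[^0-9A-Z_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("report 2024/Q1.pdf")); // prints "report2024Q1pdf"
    }
}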

Follow all links in JSON-LD API

Say I want to consume an API that returns JSON-LD and follow all the links. (I'm experimenting with the Hydra API-Demo, but it should work with all JSON-LD APIs, not only Hydra-based ones. Any good APIs out there that I should try?)
So I want to follow all the links, and my environment doesn't have native RDF support. Probably, I should first parse it with one of the libs and get it into expanded form with jsonld.expand(). Then I just grab all the values with the key @id. Is that the recommended way to do it, or am I missing some edge cases?
The purpose of the expansion API is to produce regular, context-free output (expanded form) for algorithmic processing, which is exactly what it sounds like you want to do. So, yes, you've got the right approach; you shouldn't be missing any edge cases as I've understood you. After you have the JSON-LD in expanded form, you can easily follow the @ids (and, if you also need to do some kind of analysis of the vocabulary/ontology, you can follow the properties, which will then be fully-expanded URLs).
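For example, here is a rough sketch using the jsonld-java library (an assumption on my part - jsonld.expand() above suggests the JavaScript library, but the idea is the same in any port). It expands a document and walks the result collecting @id values; the example document is made up:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.github.jsonldjava.core.JsonLdProcessor;
import com.github.jsonldjava.utils.JsonUtils;

public class LinkFollower {

    // Recursively collect every "@id" value from an expanded JSON-LD tree.
    static void collectIds(Object node, List<String> ids) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.get("@id") instanceof String) {
                ids.add((String) map.get("@id"));
            }
            for (Object value : map.values()) {
                collectIds(value, ids);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collectIds(item, ids);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A made-up document; a real client would fetch this from the API.
        String json = "{\"@context\": {\"knows\": {\"@id\": \"http://xmlns.com/foaf/0.1/knows\", \"@type\": \"@id\"}},"
                + " \"@id\": \"http://example.org/alice\","
                + " \"knows\": \"http://example.org/bob\"}";

        // Expansion produces regular, context-free output: every link is an "@id".
        List<Object> expanded = JsonLdProcessor.expand(JsonUtils.fromString(json));

        List<String> ids = new ArrayList<>();
        collectIds(expanded, ids);
        ids.forEach(System.out::println); // the URLs to dereference next
    }
}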

i18n server-side vs. client-side

Seeking some advice on two approaches to internationalization & localization. I have a web app using Spring MVC and Dojo, and I would like to support multiple languages. So, I could:
Use <spring:message> to generate the appropriate text on the server side using a properties file.
Use dojo/i18n to select the appropriate text on the client side using a js file.
And of course any combination of the two is also an option.
So, what are the pros and cons of each approach? When would you use one vs. the other?
The combination of these two approaches is the only reasonable answer.
Basically, you should try to stick to the server side, and only do client-side translation when it is really necessary (when there is no other way, e.g. for dynamically created controls).
Pros and cons? The main con of client-side string externalization is that you won't be able to translate everything correctly. That's because of context: the same English term might be translated in different ways depending on the context.
At the same time, you will often need to format messages (add parameters to your message tag), which in regular Java you would do by calling MessageFormat.format(). Theoretically, you could do that on the client side, but this is risky, to say the least. You won't have access to the original message parts (like dates, some data sources, whatever) and it might hurt translation correctness.
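For reference, a minimal sketch of what that server-side formatting looks like; the message pattern would normally live in a locale-specific properties file and is hard-coded here only for illustration:
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

public class ServerSideFormatting {
    public static void main(String[] args) {
        // In a real application this pattern comes from e.g. messages_en.properties,
        // and a translated variant of it from messages_de.properties.
        String pattern = "On {0,date,long}, {1} uploaded {2,number,integer} files.";

        MessageFormat format = new MessageFormat(pattern, Locale.US);
        String message = format.format(new Object[] { new Date(), "alice", 42 });
        System.out.println(message);
    }
}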
Formatting dates, numbers, etc. is more painful on the client side. It is possible with Dojo or jQuery Globalize, but the results might not be as good as they should be. Then again, Spring has problems with formatting dates anyway (it lacks a default locale-specific date/time designation; you may only choose from short, medium, long, and full, which to me is completely useless).
Another issue might be handling plural forms (non-English ones). Believe it or not, languages may have more than one plural form (depending on the quantity), and because of that translations might differ. I don't think Dojo handles it at all (however, I might be mistaken; some time has passed since I evaluated it). Spring won't handle it either, but you may build a custom solution based on ICU's PluralRules (or PluralFormat, if you're hard enough to learn its formatting and want to kill the translators at the same time).
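As a quick illustration of why this matters, here is a sketch using ICU4J's PluralRules (assuming the com.ibm.icu:icu4j artifact is on the classpath). Polish, for example, uses different plural forms for 1, 2-4, and 5+:
import com.ibm.icu.text.PluralRules;
import com.ibm.icu.util.ULocale;

public class PluralDemo {
    public static void main(String[] args) {
        PluralRules polish = PluralRules.forLocale(new ULocale("pl"));
        for (int n : new int[] { 1, 2, 5, 22, 25 }) {
            // select() returns the CLDR category ("one", "few", "many", ...),
            // which can be used as a key suffix when looking up a translation.
            System.out.println(n + " -> " + polish.select(n));
        }
    }
}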
To cut a long story short, doing I18n correctly is far from being easy and you'll get better support on the server side.
BTW, I remember Dojo as quite "heavy"; the library itself was over 1 MB... It might take a while to load, and your application might seem slow compared to others... That was one of the reasons I recommended Globalize rather than Dojo for our projects. It might not have as many features, but at least it seems lightweight.

Extracting a Hostname's TLD with a Regular Expression

Extracting an accurate representation of the top-level domain of a hostname is complicated by the fact that each top-level domain registry is free to make up its own policies regarding how domains are issued and what subdomains are defined. As there doesn't appear to be any standards body coordinating these policies, determining the actual TLD is a somewhat complicated affair.
Since web browsers assign cookies only to registered domains, and for security reasons must be vigilant about ensuring cookies cannot be assigned on a broader level, these browsers typically contain a database of all known TLDs in some form. I've found that Firefox has a fairly complete database:
http://hg.mozilla.org/mozilla-central/raw-file/3f91606bd115/netwerk/dns/effective_tld_names.dat
I have two specific questions:
Although it is fairly trivial to convert this listing into a regular expression, is there a gem or reference regexp that's a better solution than rolling your own? The tld gem only provides country-level info for the root-level domain.
Is there a better reference than the Firefox TLD listing? All of the local Google sites are correctly parsed by this specification, but that's hardly an exhaustive test.
If there's nothing out there, is anyone interested in a gem that performs this kind of operation? This sort of thing should be present in the URI module but is apparently missing.
Here's my take on converting this file into a usable Regexp in Ruby:
TLD_SPEC = Regexp.new(
  '[^\.]+\.(' + %q[
// ***** BEGIN LICENSE BLOCK *****
// ... (Rest of file)
  ].split(/\n/).collect do |line|
    # Strip "//" comments and trailing whitespace from each line of the .dat file.
    line.sub(%r[//.*], '').sub(/\s+$/, '')
  end.reject do |line|
    line.empty? # plain Ruby; the original used ActiveSupport's blank?
  end.collect do |s|
    # Escape each suffix; wildcard entries like *.uk become a [^\.]+ subpattern.
    Regexp.escape(s).sub(/^\\\*\\\./, '[^\.]+\.')
  end.join('|') + ')$'
)
You might want to look into using Addressable to see if that has what you need. It's got a lot more features than Ruby's default URI library. In particular, its template ability might help you.
From the docs:
Addressable is a replacement for the URI implementation that is part of Ruby's standard library. It more closely conforms to the relevant RFCs and adds support for IRIs. Additionally, it provides extensive support for URI templates.
With the recent opening of the new TLDs, it's going to be a nightmare for a while. The number of related questions on this topic shows how many people are trying to find a solution. Regex to match Domain.CCTLD recommends using a function to break the task down into smaller steps, and that is what I'd do. Trying to do this with a regex assumes you can do it all in one expression, which starts to smell like using a regex to parse XML or HTML. The target is too wiggly for a single pattern, or at least for a single maintainable pattern.
That answer mentions the public TLD list. Using the information there, you could quickly use Ruby's Regexp.escape and Regexp.union methods to build a reasonably good regex on the fly. It'd be nice if we had Perl's Regexp::Assemble module available to us, but we don't, so union will have to do. (See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for a way to work around this.)
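To sketch the idea (in Java, to stay consistent with the other examples in this digest; Ruby's Regexp.escape and Regexp.union do the escaping and joining in single calls):
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TldAlternation {
    public static void main(String[] args) {
        // A few entries standing in for the full public suffix list.
        List<String> suffixes = List.of("com", "co.uk", "com.au");

        // Escape each suffix and join with "|" to build the pattern on the fly.
        String alternation = suffixes.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));

        Pattern tld = Pattern.compile("[^.]+\\.(" + alternation + ")$");
        System.out.println(tld.matcher("www.example.co.uk").find()); // true
    }
}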
There is another flat-file db here at http://guava-libraries.googlecode.com/svn-history/r42/trunk/src/com/google/common/net/TldPatterns.java
Perhaps you could combine the two and upload the result somewhere like OData.org, GitHub, SourceForge, etc.
There's a gem called public-suffix-list which provides access to a more formalized version of the Mozilla listing.

What are the URL parameter naming conventions or standards to follow

Are there any naming conventions or standards for URL parameters to be followed? I generally use camel casing, like userId or itemNumber. As I am about to start a new project, I searched for whether there is anything on this, and could not find anything. I am not looking at this from the perspective of a language or framework, but more as a general web standard.
I recommend reading Cool URIs don't change by Tim Berners-Lee for an insight into this question. If you're using parameters in your URI, it might be better to rewrite them to reflect what the data actually means.
So instead of having the following:
/index.jsp?isbn=1234567890
/author-details.jsp?isbn=1234567890
/related.jsp?isbn=1234567890
You'd have
/isbn/1234567890/index
/isbn/1234567890/author-details
/isbn/1234567890/related
It creates a more obvious data structure, and means that if you change the platform architecture, your URIs don't change. Without the above structure,
/index.jsp?isbn=1234567890
becomes
/index.aspx?isbn=1234567890
which means all the links on your site are now broken.
In general, you should only use query strings when the user could reasonably expect the data they're retrieving to be generated, e.g. with a search. If you're using a query string to retrieve an unchanging resource from a database, then use URL-rewriting.
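For example, with Apache's mod_rewrite, the mapping from the clean URIs above back to the original JSP could look like this (a sketch only; the patterns and flags would need adjusting for a real site):
RewriteEngine On
# Serve the clean URI from the existing resource without exposing the query string.
RewriteRule ^/isbn/([0-9Xx]+)/index$ /index.jsp?isbn=$1 [PT,L]
RewriteRule ^/isbn/([0-9Xx]+)/author-details$ /author-details.jsp?isbn=$1 [PT,L]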
There are no standards that I'm aware of. Just be mindful of IE's URL length limit of 2,083 characters.
Standards for URIs are defined by RFC 2396.
Anything after the standardized portion of the URL is left to you.
You probably only want to follow a particular convention on your parameters based on the framework you use.
Most of the time you wouldn't even really care, because these are not under your control, but when they are, you probably want to at least be consistent and try to generate user-friendly bits that:
are short,
are easy to remember if they are meant to be directly accessed by users,
are case-insensitive (which may be hard depending on the server OS),
follow SEO guidelines and best practices; they may help you a lot.
I would say that cleanliness and user-friendliness are laudable goals to strive for when presenting URLs.
StackOverflow does a fairly good job of it.
I use lowercase. Depending on the technology you use, the query string is either treated as case-sensitive (e.g. PHP) or not (e.g. ASP). Using lowercase avoids possible confusion.
Like the other answers, I've not heard of any conventions.
The only "standard" I would adhere to is to use the more search engine friendly practice of using a URL rewriter.
There are no standards that I know of, and case shouldn't matter.
However within your application (website), you should stick to your own standards. For your own sanity if nothing else.
