Bug in Chrome, or Stupidity in User? Sanitising inputs on forms? - webforms

I've written a more detailed post about this on my blog at:
http://idisposable.co.uk/2010/07/chrome-are-you-sanitising-my-inputs-without-my-permission/
but basically, I have a string which is:
||abcdefg
hijklmn
opqrstu
vwxyz
||
I've added the pipes only to indicate where the string starts and ends; in particular, note the final carriage return on the last line.
I need to put this into a hidden form variable to post off to a supplier.
In basically any browser except Chrome, I get the following:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz
" />
but in chrome, it seems to apply a .Trim() or something else that gives me:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz" />
Notice it's cut off the last carriage return. These carriage returns (when Encoded) come up as %0A if that helps.
Basically, in any browser except chrome, the whole thing just works and I get the desired response from the third party. In Chrome, I get an 'invalid pareq' message (which suggests to me that those last carriage returns are important to the supplier).
Chrome version is 5.0.375.99
Am I going mad, or is this a bug?
Cheers,
Terry

You can't rely on form submission to preserve the exact character data you include in the value of a hidden field. I've had issues in the past with Firefox converting CRLF (\r\n) sequences into bare LFs, and your experience shows that Chrome's behaviour is similarly confusing.
And it turns out, it's not really a bug.
Remember that what you're supplying here is an HTML attribute value - strictly, the HTML 4 DTD defines the value attribute of the <input> element as of type CDATA. The HTML spec has this to say about CDATA attribute values:
User agents should interpret attribute values as follows:
Replace character entities with characters,
Ignore line feeds,
Replace each carriage return or tab with a single space.
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
So whitespace within the attribute value is subject to a number of user agent transformations - conforming browsers should apparently be discarding all your linefeeds, not only the trailing one - so Chrome's behaviour is indeed buggy, but in the opposite direction to the one you want.
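To make the quoted rules concrete, here is a minimal JavaScript sketch of what a user agent would do to the value if it followed them to the letter. The name normalizeCdataAttribute is hypothetical, entity replacement is left out, and no real browser is claimed to behave exactly like this.

```javascript
// Sketch of the HTML 4 CDATA attribute rules quoted above.
// normalizeCdataAttribute is a hypothetical name, not a browser API.
// Character entity replacement is deliberately left out.
function normalizeCdataAttribute(raw, trimEnds) {
  var value = raw
    .replace(/\n/g, '')        // "Ignore line feeds"
    .replace(/[\r\t]/g, ' ');  // "Replace each carriage return or tab with a single space"
  // "User agents may ignore leading and trailing white space"
  return trimEnds ? value.trim() : value;
}

// Every line feed disappears, not only the trailing one.
console.log(JSON.stringify(normalizeCdataAttribute('abcdefg\nhijklmn\n', false)));
// prints "abcdefghijklmn"
```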
However, note that the browser is also expected to replace character entities with characters - which suggests you ought to be able to encode your CRs and LFs as &#13; and &#10;, and even spaces as &#32;, eliminating any actual whitespace characters from your value field altogether.
However, browser compliance with these SGML parsing rules is, as you've found, patchy, so your mileage may certainly vary.
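As a rough illustration of that suggestion, the sketch below rewrites a value so that it contains no literal whitespace, only numeric character references. The function name encodeAttributeValue is hypothetical, and whether a given browser then preserves the characters is something to test rather than assume.

```javascript
// Hypothetical helper: replace whitespace and HTML-special characters
// with numeric character references (for example "\n" becomes "&#10;").
function encodeAttributeValue(value) {
  return value.replace(/[&"<>\r\n\t ]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

var encoded = encodeAttributeValue('abcdefg\nhijklmn\n');
console.log('<input type="hidden" id="pareqMsg" value="' + encoded + '" />');
```

The markup this prints has the whole value on one line, so there is no trailing literal line feed left for the parser to drop.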

Confirmed it here. It trims trailing CRLFs, they don't get parsed into the browser's DOM (I assume for all HTML attributes).
If you append CRLF with script, e.g.
var pareqMsg = document.forms[0]['pareqMsg'];
if (/\r\n$/.test(pareqMsg.value) == false)
    pareqMsg.value += '\r\n';
...they do get maintained and POSTed back to the server. Although the hidden <textarea> idea suggested by Gaby might be easier!

Normally you cannot type a newline into an input box from the keyboard, so perhaps Chrome enforces this even for values embedded through the attribute.
Try using a textarea (with display:none).

Related

How to Escape Double Quotes from Ruby Page Object text

In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double-quotes, but when the page object pulls the text from the page, it pulls some other characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these - I'm not sure where I could use Regexs and be able to escape these odd characters.
Honestly you are over complicating your validation.
I would recommend simplifying what you are trying to do, start by asking yourself: Is the part in quotes a critical part of your validation?
If it is, isolate it with a substring check such as text.include?("EMS")
If it is not, then you are probably doing too much work, only check for exactly what you need in validation:
text.start_with?("Please select a category other than the Default")
With respect to the actual issue you are having, on a technical level you have an encoding issue. Encode your result string with utf-8 before you pass it to your validation and you will be fine.
Good luck
It's pretty likely that something along the line encoded the string improperly. (A tipoff is the accented characters followed by ?.) It seems pretty likely that the quotes were converted to "smart quotes" somewhere. This table compares Windows-1252 to UTF-8:
Code point               Characters            UTF-8 bytes
Unicode   Windows-1252   Expected   Actual
-------   ------------   --------   ------     -----------
U+201C    0x93           “          â€œ        %E2 %80 %9C
U+201D    0x94           ”          â€         %E2 %80 %9D
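The "Actual" column can be reproduced with a short JavaScript sketch (JavaScript rather than Ruby, purely for illustration). The function name and the two-entry lookup table are assumptions made for this example; the table covers only the bytes needed here, not the whole Windows-1252 code page.

```javascript
// Partial Windows-1252 table: only the two high bytes this example needs.
// All other bytes are treated as Latin-1, which is a simplification.
var CP1252_SAMPLE = { 0x80: '\u20AC', 0x9C: '\u0153' };

// Hypothetical helper: encode text as UTF-8, then misread each byte
// as if it were a Windows-1252 character.
function misreadUtf8AsCp1252(text) {
  var bytes = new TextEncoder().encode(text);
  return Array.from(bytes, function (b) {
    return CP1252_SAMPLE[b] || String.fromCharCode(b);
  }).join('');
}

console.log(misreadUtf8AsCp1252('\u201C')); // left smart quote misread as three characters
```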
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want to have another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
For more debugging advice, see: 3 Steps to Fix Encoding Problems in Ruby and How to Get From Theyâ€™re to They’re.

First Name and Last Name Validation vs XSS Attack

My online research seems to show that firstnames and lastnames should not be heavily validated, to accommodate the variety of names out there. In fact, people have even advocated no validation altogether for the names. However, the possibility of xss attacks via the input fields make me worried. I checked the google naming guidelines, and they seem pretty relaxed and allow unicode characters as well as stuff like "%$#^&*...." !!
So, what would be the best approach to take, and how do I balance this out ?
ps - I don't intend to spark a debate here. I am genuinely confused and need help understanding the best approach to take !
Validation and XSS are two very different concepts. You cannot balance them. You cannot "sometimes allow XSS". You also do not want to allow input that does not make sense, or that you can't use. If you require an email for something, you can allow a user to enter "mailme at gmail dot com", but if you do not know how to parse this, then there is no point in allowing this as an input in the first place.
When you talk about validating a 'first name field', you ask yourself: "What kind of data do I want to accept in this field, and what kind of data do I not want to accept in this field?". I am not aware of a language where "%" can be part of a first name, so it is probably a safe bet to disallow this character. You have to tackle this problem alone, without even thinking about XSS. If a character, or a sequence of characters, does not make sense as a value for the thing you want to know, you should not include it. If a character does make sense to include, you should not decide otherwise because it has some special meaning.
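As a sketch of what such a rule might look like in code: the pattern below is only one hypothetical answer to "what do I want to accept" (letters and combining marks from any script, plus space, apostrophe, hyphen and full stop). The names NAME_PATTERN and isAcceptableName are made up for this example, and the rule says nothing about XSS.

```javascript
// Hypothetical validation rule for a name field.
// \p{L} matches any letter and \p{M} any combining mark (needs the "u" flag).
var NAME_PATTERN = /^[\p{L}\p{M}][\p{L}\p{M}' .-]*$/u;

function isAcceptableName(name) {
  return NAME_PATTERN.test(name.trim());
}

console.log(isAcceptableName("O'Brien")); // true
console.log(isAcceptableName('Bob%'));    // false
```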
XSS is the problem where incorrectly escaped (user) input is returned to the browser, allowing a possible attacker to load/run third-party scripts. It has nothing to do with validation. If the character "a" is potentially unsafe, would you disallow it from the first name field? The solution can be found in the definition: The problem exists if, and only if, the user input is incorrectly escaped.
Think about how you are going to send this data back to the user. I take as example an input field: <input value="" />, but if you were going to put it in a textarea for example, you would need to alter your data for that. Inserting it between a <div></div> tag would require something entirely different again, and inserting it inside a script that is in <script></script> tags would require something different than all the previous things. There is no one-size-fits-all-solution.
For the input field example, find out what characters have a special meaning in this input field. The delimiter of the value attribute (" in value="") is one of the characters that has a special meaning. If there are any other special characters, you find them in accompanying documentation. You have to escape such characters. Escaping is the act of removing the special meaning from a character. How you do that can be found in the accompanying documentation. In case of an input element in html, you'll need to turn the special character into its entity form (" would become &quot;). PHP provides built-in functions to do this, but you should always be wary of what such a function actually does and if this function actually gives the desired output for every use-case.
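For the value attribute case, a minimal JavaScript sketch of such escaping could look like this. The function name escapeForAttribute is hypothetical, it assumes the attribute is always written with quotes around the value, and it is not meant for other contexts such as script blocks.

```javascript
// Hypothetical helper for a quoted HTML attribute value.
// The ampersand is replaced first so later replacements are not double-escaped.
function escapeForAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var userInput = 'Tom" onfocus="alert(1)';
var html = '<input value="' + escapeForAttribute(userInput) + '" />';
console.log(html);
```

Because the double quote loses its special meaning, the input stays inside the value attribute instead of starting a new one.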
tl;dr There is no balance. You use validation on a field to get the data you actually want. If you want to present this data to the user, you have to escape the data for the special case where you want to display this data.
Example: Let's look at the following case. We have a textarea. We allow the characters a-z, <, >, (, ), {, }, / and ; in any order. If the textarea contains other characters we consider it invalid. If the textarea is valid, we put the characters in the textarea between <div> and </div> in the html document.
From the definition above, you can derive that asdf is a valid input and that <scri (random nonsense to bypass faulty proxy) pt>alert();</sc (more random nonsense) ript> is also a valid input, but 123 isn't. That is the definition. The logic that handles validation should flawlessly discriminate between those two things. You probably notice that the second valid input may provide a problem, but that is of no concern to the validate function. The validate function only checks if the text matches the description of what we consider valid input.
If the text in the textarea is valid, the definition says we should put it between the div tags. This is where you start worrying about XSS. There are some characters, namely the < and > character that have a special meaning in html. Because they are valid input, we should remove their special meaning when we insert them in the html. If the textarea is invalid, we can't do anything. We would display a descriptive error message how it should be improved.
The pseudo-implementation below shows what I try to explain above. In a real-life application that communicates with the server, the server should do validation too, but it should show how both concepts are separated and should allow you to test things.
$('#billy').on('click', function(e) {
  if (validate($('#txt').val())) {
    $('#status').text("The textarea is valid. The contents have been inserted as html in the page.");
    $('#result').html($('#txt').val());
  } else {
    $('#status').text("The textarea is INVALID. It contains characters we don't want.");
  }
});
$('#betty').on('click', function(e) {
  if (validate($('#txt').val())) {
    $('#status').text("The textarea is valid. The contents have been inserted as html in the page.");
    $('#result').html(escapeforhtml($('#txt').val()));
  } else {
    $('#status').text("The textarea is INVALID. It contains characters we don't want.");
  }
});
function validate(txt) {
  return txt.match(/^[a-z{}\/\(\)<>;]*$/);
}
//We know only a limited amount of characters can be inserted.
//From those, < and > are the only characters that have a special
//meaning.
function escapeforhtml(txt) {
  return txt.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<div id="status"></div>
<textarea id="txt" cols="60" rows="10"></textarea>
<br/>
<input type="button" id="billy" value="Do something">
<input type="button" id="betty" value="Do something while ESCAPING">
<p>Result:</p>
<div id="result"></div>

Processing form input in a Joomla component

I am creating a Joomla component and one of the pages contains a form with a text input for an email address.
When a < character is typed in the input field, that character and everything after is not showing up in the input.
I tried $_POST['field'] and JFactory::getApplication()->input->getCmd('field')
I also tried alternatives for getCmd like getVar, getString, etc. but no success.
E.g. John Doe <j.doe#mail.com> returns only John Doe.
When the < is left out, like John Doe j.doe#mail.com> the value is coming in correctly.
What can I do to also have the < character in the posted variable?
BTW. I had to use & lt; in this question to display it as I want it. This form suffers from the same problem!!
You actually need to set the filtering that you want when you grab the input. Otherwise, you will get some heavy filtering. (Typically, I will also lose # symbols.)
Replace this line:
JFactory::getApplication()->input->getCmd('field');
with this line:
JFactory::getApplication()->input->getRaw('field');
The name after the get part of the function is the filtering that you will use. Cmd strips everything but alphanumeric characters and ., -, and _. String will run through the html clean tags feature of Joomla and depending on your settings will clean out <>. (That usually doesn't happen for me, but my settings are generally pretty open to the point of no filtering on super admins and such.)
getRaw should definitely work, but note that there is no filtering at all, which can open security holes in your application.
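To show how much the Cmd filter described above removes, here is a rough JavaScript approximation of that description. It is not Joomla's code (Joomla is PHP), and the function name approxCmdFilter is hypothetical.

```javascript
// Approximation of the described Cmd filtering: keep only
// alphanumeric characters plus ".", "-" and "_".
function approxCmdFilter(value) {
  return value.replace(/[^A-Za-z0-9._-]/g, '');
}

console.log(approxCmdFilter('John Doe <j.doe#mail.com>'));
// prints JohnDoej.doemail.com
```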
The default text filter trims html from the input for your field. You should set the property
filter="raw"
in your form's manifest (xml) file, and then use getRaw() to retrieve the value. getCmd removes the non-alphanumeric characters.

Freemarker Interpolation stripping whitespace?

I seem to be having issues with leading/trailing spaces in textareas!
If the last user has typed values into a textarea with leading/trailing spaces across multiple lines, they all disappear with exception to one space in the beginning & end.
Example:
If the textbox had the following lines: (quotes present only to help illustrate spaces)
" 3.0"
" 2.2 "
"0.3 "
it would be saved in the backend as
"<textarea id=... > 3.0/n 2.2 /n0.3 </textarea>"
My template (for this part) is fairly straightforward (entire template, not as easy...): ${label} ${textField}
When I load up the values again, I notice getTextField() is properly getting the desired string, quoted earlier... But when I look at the html page it's showing
" 3.0"
"2.2"
"0.3 "
And of course when "View Sourcing" it doesn't have the string seen in getTextField()
What I've tried:
Ensure the backend has setWhitespaceStripping(false); set
Adding the <#ftl strip_whitespace=false>
Adding the <#nl> on the same line as ${textField}
No matter what I've tried, I'm not having luck keeping the spaces after the interpolation.
Any help would be very appreciated!
Maybe you are inside a <#compress>...</#compress> (or <@compress>...</@compress>) block. Those filter the whole output at runtime and reduce whitespace regardless of where it comes from. I recommend not using this directive. It makes the output somewhat smaller, but it has runtime overhead, and can corrupt output in cases like this.
FreeMarker interpolations don't remove whitespace from the inserted value, or change the value in any way. The exception is if you are lexically inside an <#escape ...>...</#escape> block, which will be applied automatically. But it's unlikely that you have an escaping expression that corrupts whitespace. To be sure, you can check if there's any <#escape ...> in the same template file (no need to check elsewhere, as it's not a runtime directive).
strip_whitespace and #nt are only removing white-space during parsing (that's before execution), so they are unrelated.
You can also check if the whitespace is still there in the inserted value before inserting like this:
${textField?replace(" ", "[S]")?replace("\n", "[N]")?replace("\t", "[T]")}
If you find that they were already removed, that probably means that they were removed before the value was put into the data-model. So then it wasn't FreeMarker.
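The same marker trick can be applied on the browser side to see what actually arrived in the textarea. This is a minimal JavaScript sketch; the function name showWhitespace is hypothetical, and in a real page you would pass it the textarea's value.

```javascript
// Make spaces, newlines and tabs visible, mirroring the
// FreeMarker ?replace chain shown above.
function showWhitespace(value) {
  return value
    .replace(/ /g, '[S]')
    .replace(/\n/g, '[N]')
    .replace(/\t/g, '[T]');
}

console.log(showWhitespace(' 3.0\n 2.2 \n0.3 '));
// prints [S]3.0[N][S]2.2[S][N]0.3[S]
```

Comparing this output with the template-side output shows at which stage the spaces go missing.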

Passing colons in query string in Apex

I have a link in an apex report which takes the user to different page, and it passes some values to the new page. The button is set to a url because there are too many items being passed, but I don't think that would matter anyway:
f?p=&APP_ID.:27:&SESSION.::&DEBUG.::P27_1,P27_2,P27_3,P27_4,P27_5:0,#1#,#2#,#3#,#NULL#
The #1#, etc. are columns being passed. Everything seems to work correctly except that the data being passed often contains a colon (:), which messes up Apex's built in colon structure by cutting off anything in the new page's item that happens after the colon (including the colon itself) as well as messing up any fields after that. For example: #2# has a colon in it, so P27_3, 4, and 5 will not be filled with values.
I've tried manually replacing the colon with a '%3a' (the url encoding for colon), but it doesn't seem to work.
Try using UTL_URL.ESCAPE() to escape URL special characters and UTL_URL.UNESCAPE() to un-escape them back.
You can also try APEX_UTIL.URL_ENCODE() but you need to use one or the other, i.e. either UTL or APEX_UTIL.
