Large form processing with ColdFusion via Ajax

I use Ajax (jQuery) quite often to send forms off for processing in ColdFusion. I send the form to a CFC, which returns results and errors back to the user through Ajax notifications.
The forms can be quite large (think full HTML pages with additional inputs) and require a lot of logic in the CFC to process correctly depending on what options were selected in the form.
Because each function within the CFC can be large (maybe 1200 lines of code), I am getting the dreaded "branch target offset too large for short" error from ColdFusion. To get around this I have moved some code into a .cfm file and used <cfinclude> to pull it back into the CFC, which 'solves' the problem, but I find it confusing to keep track of all the little snippets associated with a particular CFC. It may also be an inefficient way of working.
I would like to know how other ColdFusion users structure/handle processing forms using CFCs which do a lot of additional stuff while getting the form data into a database.
Some options I have thought of are:
Make 'shell' functions for Create, Update, Read, and Delete actions which don't have much code in them
Within the shell functions, cfinclude all the code snippets by sub-function from many other .cfm files to keep the code to a minimum (sketched below)
OR invoke other CFCs which do the sub-functions and pass in the form variables as arguments to them
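For example, the shell-function version would boil down to something like this (file and function names hypothetical):
<cffunction name="CreateArticle" access="remote" returntype="struct">
    <cfargument name="formData" type="struct" required="true">
    <cfset var result = StructNew()>
    <!--- each snippet reads ARGUMENTS and populates result --->
    <cfinclude template="snippets/createArticle_checkIfExistingArticle.cfm">
    <cfinclude template="snippets/createArticle_processBodytext.cfm">
    <cfinclude template="snippets/createArticle_insertImage.cfm">
    <cfreturn result>
</cffunction>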
The options above will end up with me having a structure like this (which I don't like):
Article.cfc (CreateArticle, EditArticle etc)
CreateArticle_InsertImage.cfc
CreateArticle_ProcessBodytext.cfc
CreateArticle_InsertUser.cfc
CreateArticle_CheckIfExistingArticle.cfc
EditArticle_UpdateImage.cfc
EditArticle_UpdateBodytext.cfc
EditArticle_CheckIfExistingArticle.cfc
I would end up creating a new CFC for each function rather than having them be classes/objects in their own right. They could be .cfm files instead (using <cfinclude>), but it seems weird to do it like this. Is there an alternative/better/standard way anyone knows of?

Large cfif/cfelseif chains or cfswitch/cfcase blocks can cause the branch offset error. We fixed this in some cases by just splitting them into individual cfif statements, which drastically reduces the amount of branching in your code.
Breaks:
<cfif ListFindNoCase("myString", trim(arguments.event_key)) GT 0>
<!--- myString code --->
<cfelseif ListFindNoCase("myOtherString", trim(arguments.event_key)) GT 0>
<!--- myOtherString code --->
</cfif>
Compiles:
<cfif ListFindNoCase("myString", trim(arguments.event_key)) GT 0>
<!--- myString code --->
</cfif>
<cfif ListFindNoCase("myOtherString", trim(arguments.event_key)) GT 0>
<!--- myOtherString code --->
</cfif>
You can move these conditional checks to separate functions, especially if you're ultimately just setting the value of a single variable.
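For example (function name and return values invented), the chain above could become:
<cffunction name="getEventResult" access="private" returntype="string">
    <cfargument name="event_key" type="string" required="true">
    <cfset var result = "">
    <cfif ListFindNoCase("myString", trim(arguments.event_key)) GT 0>
        <cfset result = "handled myString">
    </cfif>
    <cfif ListFindNoCase("myOtherString", trim(arguments.event_key)) GT 0>
        <cfset result = "handled myOtherString">
    </cfif>
    <cfreturn result>
</cffunction>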
If you're in a bind, just move the code to includes and break it down later. 1200 lines of code in a single function is nothing. My last gig had CFCs that were just piles of includes, extending other CFCs that were piles of includes, and each include could contain 20-30k lines of code, with functions that contained includes as well. The includes in the functions were often done specifically to address the branch-offset problem.
FWIW, I have a print out of a deep child object's meta-data dump on my wall as abstract art.

Related

Using cy.get() in Cypress to search for selectors with one of two properties

I'm using Cypress to write automated tests. This is for a codebase I have no control over, so I can't make any changes to the app itself.
I'm attempting to run a .each loop to run through a set of collapsible fields to verify the data in each of them is correct. The fields list medical problems and associated data. The issue is that there are two lists of fields, one for active problems, and one for resolved problems, where the only difference between them is the data-cy tags. Those are the only unique identifiers for these elements, so I have to use the data-cy tags to select these without selecting other elements in the same container.
I'd be able to run the exact same .each function on both sets of elements, but I currently can't due to the elements not having the same data-cy tags. Is it possible for me to have the Cypress .get call search for elements with one of two properties? Something like the following:
cy.get('[data-cy="problem-entry"]' OR '[data-cy="resolved-problem"]')
EDIT: Also, to clarify - I am currently able to get the test to behave by just duplicating the .each loop, once for each data-cy tag. Since the loop is several hundred lines of code, I want to remove the redundancy to clean this up a bit.
This is not a perfect fix, but it may work in your case: if these selectors are the only data-cy attributes with the word "problem" in their value, you could do something like this: cy.get('[data-cy*="problem"]'). This will select any element whose data-cy value contains the word "problem".
If that is not the case, I would like to address your "EDIT" message: you may put the whole code (several hundred lines) in a Cypress custom command and then call it, so instead of copying the code and calling it twice, you would just call your custom command twice (one line each).
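A minimal sketch of that custom-command approach (the command name is hypothetical):
// cypress/support/commands.js
Cypress.Commands.add('verifyProblems', (selector) => {
  cy.get(selector).each(($entry) => {
    // ...the several-hundred-line verification logic goes here...
    cy.wrap($entry).should('be.visible')
  })
})

// in the test - one line per list
cy.verifyProblems('[data-cy="problem-entry"]')
cy.verifyProblems('[data-cy="resolved-problem"]')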

Laravel: advice for showing big, unchanging outline

My Laravel app is a big family tree website. I have Person records (id, first, last, DOB, parent_family_id, etc.) and Family records (id, caption, mother_id, father_id, original_bool).
Everything's done now except for my last challenge: an outline view! I have a number of 'original families' (the earliest ones), and for each of them I'd like to be able to display their descendants all the way down to present day in an outline form (kids, those kids' families, those families' kids, etc).
So far I've written a few functions in the relevant controller:
get_kids_of_family($family)
get_families_person_made($person)
get_descendants($family, $results) - this will be a recursive function that calls get_kids_of_family, adds them to a results set (some gigantic array that I'll pass along?), then for each of them calls get_families_person_made, adds those to the results set, calls get_descendants on each of those families, and so on (sketched below)
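Roughly, the recursion I have in mind would look like this (relationship details assumed):
function get_descendants($family, array &$results)
{
    foreach ($this->get_kids_of_family($family) as $kid) {
        $results[] = $kid;
        foreach ($this->get_families_person_made($kid) as $next_family) {
            $results[] = $next_family;
            // recurse into each family the child went on to make
            $this->get_descendants($next_family, $results);
        }
    }
}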
Then it occurred to me: even if I can get this working, it'll be crunching through a lot of data, and that data rarely changes (weekly/monthly at most, with the discovery of a new person). Therefore I'm wondering if it makes sense to make/run a script to save the results someplace (like an XML or JSON file) and then somehow pass that file to my view for display.
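For example, with Laravel's cache that could look something like this (key name and TTL invented):
use Illuminate\Support\Facades\Cache;

// Rebuild at most weekly; call Cache::forget() on this key when a person is added.
$outline = Cache::remember('outline-family-'.$family->id, now()->addWeek(), function () use ($family) {
    $results = [];
    $this->get_descendants($family, $results);
    return $results;
});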
Thoughts? What's a good (but not too advanced) way to implement something like this in a Laravel app?
Thanks in advance for your help!

Purpose of web app input validation for security reasons

I often encounter advice for protecting a web application against a number of vulnerabilities, like SQL injection and other types of injection, by doing input validation.
It's sometimes even said to be the single most important technique.
Personally, I feel that input validation for security reasons is never necessary, and is better replaced with:
if possible, not mixing user input with a programming language at all (e.g. using parameterized SQL statements instead of concatenating input in the query strings)
escaping user input before mixing it with a programming or markup language (e.g. html escaping, javascript escaping, ...)
Of course, for a good UX it's best to catch input that would generate errors on the backend early, in the GUI, but that's another matter.
Am I missing something or is the only purpose to try to make up for mistakes against the above two rules?
Yes, you are generally correct.
A piece of data is only dangerous when "used". And it is only dangerous if it has special meaning in the context it is used.
For example, <script> is only dangerous if used in output to an HTML page.
Robert'); DROP TABLE Students;-- is only dangerous when used in a database query.
Generally, you want to make this data "safe" as late as possible - such as HTML encoding it when output to an HTML page, and parameterising it when inserting into a database. The big advantage of this is that when the data is later retrieved from these locations, it will be returned in its original, unsanitized format.
So if you have the value A&B O'Leary in an input field, it would be encoded like so:
<input type="hidden" value="A&amp;B O&#39;Leary" />
and if this is submitted to your application, your programming framework will automatically decode it for you back to A&B O'Leary. Same with your DB:
string name = "A&B O'Leary";
string sql = "INSERT INTO Customers (Name) VALUES (@Name)";
SqlCommand command = new SqlCommand(sql);
command.Parameters.AddWithValue("@Name", name);
Simples.
Additionally, if you then need to give the user any output in plain text, you should retrieve it from your DB and spit it out. Or in JavaScript - you just JavaScript entity encode (although this is best avoided for complexity reasons - I find it easier to secure if I only output to HTML and then read the values from the DOM).
If you'd HTML encoded it early, then to output to JavaScript/JSON you'd first have to convert it back and then hex entity encode it. It will get messy, some developers will forget they have to decode first, and you will have &amp;s everywhere.
You can use validation as an additional defence, but it should not be the first port of call. For example, if you are validating a UK postcode you would want to whitelist the alphanumeric characters in upper and lower case. Any other characters would be rejected or removed by your application. This can reduce the chances of SQLi or XSS occurring on your application, but the method falls down where you need inputs to include characters that have special meaning in your output context (" ' < > etc.). For example, if Stack Overflow did not allow characters such as these, questions and answers could not include code snippets, which would pretty much make the site useless.
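For illustration, the postcode whitelist above might look something like this (a sketch only - the pattern is simplified, not a full UK postcode grammar):
using System.Text.RegularExpressions;

// Whitelist: letters, digits and a space; reject everything else outright.
bool looksValid = Regex.IsMatch(postcode, @"^[A-Za-z0-9 ]{5,8}$");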
Not all SQL statements are parameterizable - for example, when you need to use dynamic identifiers (as opposed to literals). Even whitelisting can be hard; sometimes it needs to be dynamic.
Escaping XSS on output is a good idea. Until you forget to escape it on your admin dashboard too and they steal all your admin's cookies. Don't let XSS into your database.

HTML/XSS escape on input vs output

From everything I've seen, it seems like the convention for escaping html on user-entered content (for the purposes of preventing XSS) is to do it when rendering content. Most templating languages seem to do it by default, and I've come across things like this stackoverflow answer arguing that this logic is the job of the presentation layer.
So my question is, why is this the case? To me it seems cleaner to escape on input (i.e. form or model validation) so you can work under the assumption that anything in the database is safe to display on a page, for the following reasons:
Variety of output formats - for a modern web app, you may be using a combination of server-side HTML rendering, a JavaScript web app using AJAX/JSON, and a mobile app that receives JSON (and which may or may not have some webviews, which may be JavaScript apps or server-rendered HTML). So you have to deal with HTML escaping all over the place. But input will always get instantiated as a model (and validated) before being saved to the db, and your models can all inherit from the same base class.
You already have to be careful about input to prevent code-injection attacks (granted this is usually abstracted to the ORM or db cursor, but still), so why not also worry about html escaping here so you don't have to worry about anything security-related on output?
I would love to hear the arguments as to why HTML escaping on page render is preferred.
In addition to what has been written already:
Precisely because you have a variety of output formats, and you cannot guarantee that all of them will need HTML escaping. If you are serving data over a JSON API, you have no idea whether the client needs it for an HTML page or a text output (e.g. an email). Why should you force your client to unescape "Jack &amp; Jill" to get "Jack & Jill"?
You are corrupting your data by default.
When someone does a keyword search for 'amp', they get "Jack &amp; Jill". Why? Because you've corrupted your data.
Suppose one of the inputs is a URL: http://example.com/?x=1&y=2. You want to parse this URL and extract the y parameter if it exists. This silently fails, because your URL has been corrupted into http://example.com/?x=1&amp;y=2.
It's simply the wrong layer to do it - HTML-related stuff should not be mixed up with raw HTTP handling. The database shouldn't be storing things that are related to one possible output format.
XSS and SQL Injection are not the only security problems, there are issues for every output you deal with - such as filesystem (think extensions like '.php' that cause web servers to execute code) and SMTP (think newline characters), and any number of others. Thinking you can "deal with security on input and then forget about it" decreases security. Rather you should be delegating escaping to specific backends that don't trust their input data.
You shouldn't be doing HTML escaping "all over the place". You should be doing it exactly once for every output that needs it - just like with any escaping for any backend. For SQL, you should be doing SQL escaping once, same goes for SMTP etc. Usually, you won't be doing any escaping - you'll be using a library that handles it for you.
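For instance (a sketch in C#, values invented), that one encoding step happens only at the HTML boundary:
using System.Net;

string name = "Jack & Jill <script>";
// The stored/queried value stays raw; encoding happens once, at output time.
string html = "<p>" + WebUtility.HtmlEncode(name) + "</p>";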
If you are using sensible frameworks/libraries, this is not hard. I never manually apply SQL/SMTP/HTML escaping in my web apps, and I never have XSS/SQL injection vulnerabilities. If your method of building web pages requires you to remember to apply escaping, or end up with a vulnerability, you are doing it wrong.
Doing escaping at the form/http input level doesn't ensure safety, because nothing guarantees that data doesn't get into your database or system from another route. You've got to manually ensure that all inputs to your system are applying HTML escaping.
You may say that you don't have other inputs, but what if your system grows? It's often too late to go back and change your decision, because by this time you've got a ton of data, and may have compatibility with external interfaces e.g. public APIs to worry about, which are all expecting the data to be HTML escaped.
Even web inputs to the system are not safe, because often you have another layer of encoding applied, e.g. you might need base64-encoded input at some entry point. Your automatic HTML escaping will miss any HTML encoded within that data. So you will have to do HTML escaping again, remember to do it, and keep track of where you have done it.
I've expanded on these here: http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/
The original misconception
Do not confuse sanitization of output with validation.
While <script>alert(1);</script> is a perfectly valid username, it definitely must be escaped before showing on the website.
And yes, there is such a thing as "presentation logic", which is not related to "domain business logic". And said presentation logic is what the presentation layer deals with - the View instances in particular. In a well-written MVC, Views are full-blown objects (contrary to what RoR would try to tell you) which, when applied in a web context, juggle multiple templates.
About your reasons
Different output formats should be handled by different views. The rules and restrictions that govern HTML, XML, JSON and other formats are different in each case.
You always need to store the original input (sanitized to avoid injections, if you are not using prepared statements), because someone might need to edit it at some point.
And storing both the original and the XSS-safe "public" version is a waste. If you want to store sanitized output because it takes too many resources to sanitize it each time, then you are already pissing at the wrong tree. This is a case where you use a cache instead of polluting the database.

What is speculative parsing?

I've read that Firefox 3.5 has a new feature in its parser:
Improvements to the Gecko layout engine, including speculative parsing for faster content rendering.
Could you explain that in simple terms?
It's all to do with this entry in bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=364315
In that entry, Anders Holbøll suggested:
It seems that when encountering a script-tag that references an external file, the browser does not attempt to load any elements after the script-tag until the external script file is loaded. This makes sites that reference several or large javascript files slow.
...
Here file1.js will be loaded first, followed sequentially by file2.js. Then img1.gif, img2.gif and file3.js will be loaded concurrently. When file3.js has loaded completely, img3.gif will be loaded.
One might argue that since the js-files could contain for instance a line like "document.write('<!--');", there is no way of knowing if any of the content following a script-tag will ever be shown before the script has been executed. But I would assume that it is far more probable that the content would be shown than not. And these days it is quite common for pages to reference many external javascript files (ajax-libraries, statistics and advertising), which with the current behavior causes the page load to be serialized.
So essentially, the HTML parser continues reading through the HTML file and loading referenced resources, even if it is blocked from rendering due to a script.
It's called "speculative" because the script might do things like setting CSS properties like "display: none" or commenting out sections of the following HTML, making certain loads unnecessary... However, in the 95% use case, most of the references will be loaded, so the parser is usually guessing correctly.
I think it means that when the browser would normally block (for example for a script tag), it will continue to parse the HTML. It will not create an actual DOM until the missing pieces are loaded, but it will start fetching script files and stylesheets in the background.
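As a rough illustration (file names invented):
<script src="slow.js"></script>      <!-- parser must wait here to execute this -->
<script src="other.js"></script>     <!-- but a speculative parser starts fetching these -->
<img src="photo.gif">                <!--   while slow.js is still downloading -->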
