What's the purpose of tainting Ruby objects? - ruby

I'm aware of the possibility to mark untrusted objects as tainted, but what's the underlying purpose and why should I do it?

One tracks taint as a security precaution, in order to ensure that untrusted data isn't mistakenly used for calculations, transactions, or interpreted as code.
Tracking taint via a built-in language feature is clearer and more reliable than tracking it via coding conventions or relying on code review.
For example, input from the user can generally be considered 'untrusted' until it has been sanitized properly for insertion into the database. Once the input is marked as tainted, Ruby (when run at an elevated safe level) refuses to use it in sensitive operations until it has been explicitly untainted, which helps prevent a potential SQL injection attack.
For an example of an "ancient" (2005) coding practice that demonstrates how taint was tracked without such Perl and Ruby modules, read some good old Joel:
http://www.joelonsoftware.com/articles/Wrong.html
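As a rough sketch of what language-level taint tracking buys you, here is the same idea done by hand in Ruby. The names `TaintedString`, `sanitize`, and `build_query` are hypothetical and only for illustration; this is not Ruby's built-in `taint`/`untaint` API (which was removed in Ruby 3.2), and the quote-doubling sanitizer is a placeholder, not real SQL escaping.

```ruby
# Hypothetical wrapper that marks a value as untrusted.
# This is NOT Ruby's built-in taint flag, just a hand-rolled illustration.
class TaintedString
  def initialize(raw)
    @raw = raw
  end

  # The only way to get the value out: the caller must supply a sanitizer.
  def sanitize
    yield(@raw).freeze
  end

  # Using the tainted value as a plain string fails loudly.
  def to_s
    raise SecurityError, "tainted value used without sanitizing"
  end
end

# Hypothetical sensitive operation that refuses tainted input outright.
def build_query(name)
  raise SecurityError, "refusing tainted input" if name.is_a?(TaintedString)
  "SELECT * FROM users WHERE name = '#{name}'"
end

user_input = TaintedString.new("O'Brien")

# Placeholder sanitizer (doubles single quotes); assume a real one in practice.
clean = user_input.sanitize { |s| s.gsub("'", "''") }
query = build_query(clean)
```

Passing `user_input` straight to `build_query` raises `SecurityError`, which is the point: the mistake cannot go unnoticed the way it can with a naming convention.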

It used to be a pretty standard practice when writing CGIs in Perl. There is even a FAQ on it. The basic idea was that the run time could guarantee that you did not implicitly trust a tainted value.

Related

How can I pass Selenium WebDriver objects between separate Ruby processes?

I want to pass an instance of an object between two Ruby processes. Specifically, I want to pass an instance of a Selenium WebDriver from one process to another process. The reason I want to do this is because it takes a lot of time for Ruby to create this object, but I want it to be used by the other process.
I've found some related questions here and here that seem to point towards using DRb, but I've been unable to find any useful examples or sample code.
Is there a tool other than DRb that I should be using? Does anyone have an example similar to this that I could copy from?
It looks like you're going to have to use DRb, although the documentation for it seems to be lacking. There is however an interesting article here. You might also want to consider purchasing The dRuby Book by Masatoshi Seki to get a better idea of how to do this effectively.
Another option to investigate if you are not looking at simultaneous access, but you just want to send the object from one process to another, is to serialize (that is, encode in a way that Ruby can read) the object with YAML (for a human readable file) or Marshal (for a binary encoded file) and send it using a pipe. This was mentioned in another answer that has since been deleted.
Note that either of these solutions requires modifying the Selenium code heavily, since the objects you want to manipulate natively support neither copying nor simultaneous access.
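To make the serialize-and-pipe idea concrete, here is a minimal sketch using `Marshal` and `IO.pipe`. It deliberately uses a plain `Struct` called `SessionInfo` (a made-up stand-in with made-up values), since, as noted above, a real WebDriver instance is unlikely to marshal as-is. Both ends of the pipe live in one process here just to keep the example short; in practice the reader would be in the other process.

```ruby
# Hypothetical stand-in for a simple, serializable object.
SessionInfo = Struct.new(:url, :session_id)

reader, writer = IO.pipe
original = SessionInfo.new("http://example.test/session", "abc123")

# Sending side: write the binary-encoded object into the pipe.
Marshal.dump(original, writer)
writer.close

# Receiving side: rebuild an equal but distinct object from the bytes.
copy = Marshal.load(reader)
reader.close
```

The copy has the same field values as the original but is a separate object, so changes on one side are not seen on the other.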
TL;DR
Most queue or distributed processes are going to require some sort of serialization to work properly. If you want to pass objects rather than messages, then this will be a limiting factor in how you approach the problem.
DRb
I don't know if you can marshal a WebDriver object. If you can't, then DRb may be a good choice for your distributed Ruby programs because it supports DRbObject references for things that can't be marshaled. There are some examples provided in the DRb documentation.
Selenium Wire Protocol
Depending on what you're really trying to do, it may be worth taking a closer look at using the remote bindings for the Remote WebDriver client/server, or Selenium's JSON Wire Protocol as an alternative to passing objects between processes.
Other Alternatives: Fixtures, Factories, Stubs, and Mocks
Whether or not these work in your specific case will depend a lot on why you want to pass objects instead of simply driving the remote server. If it's largely an issue of how long it takes to build your object, then the serialization/de-serialization cycle may not necessarily be faster in all cases.
You might want to revisit why your object is so slow to create. If gathering and processing the data for it is what's taking too long, you can use some sort of test fixture or factory to trim that time, either by using a smaller set of fixed data, or using a pre-serialized object that's optimized for speed.
You might also consider whether you actually need real data or objects for your test at all. In many cases, you can speed up your tests a lot by stubbing methods or creating mock objects that will return the values you need for your integration tests without needing to perform expensive calculations or long-running operations.
There are certainly cases where you need to drive the full stack and perform acceptance tests on real data. Even then, you may be able to devise a set of fixture data that will take less time or memory to process. It's certainly worth at least thinking about.

`global` assertions?

Are there any languages that allow declaring global assertions, that is, assertions that should hold during the whole program execution? It would then be possible to write something like:
global assert (-10 < speed < 10);
and this assertion will be checked every time speed changes state?
Eiffel supports all the different kinds of contracts: precondition, postcondition, invariant... You may want to use that.
On the other hand, why do you have a global variable? Why don't you create a class which modifies the speed? That way, you can easily check your condition every time the value changes.
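A minimal sketch of that suggestion in Ruby, assuming a hypothetical `Speedometer` class that owns the value. Every write goes through one setter, so the condition from the question is checked on every change:

```ruby
# Hypothetical class that owns the speed value.
class Speedometer
  attr_reader :speed

  def initialize(speed = 0)
    self.speed = speed # go through the checked setter
  end

  # The only way to change speed; enforces -10 < speed < 10.
  def speed=(value)
    unless value > -10 && value < 10
      raise RangeError, "invariant violated: -10 < speed < 10 (got #{value})"
    end
    @speed = value
  end
end

meter = Speedometer.new
meter.speed = 9
```

Assigning `meter.speed = 10` raises `RangeError` and leaves the previous value in place.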
I'm not aware of any languages that truly do such a thing, and I would doubt that there exist any since it is something that is rather hard to implement and at the same time not something that a lot of people need.
It is often better to simply assert that the inputs are valid and that modifications are only done when allowed and in a defined, sane way. This removes the need for "global asserts".
You can get this effect "through the backdoor" in several ways, though none is truly elegant, and two are rather system-dependent:
If your language allows operator overloading (as C++ does, for example), you can make a class that overloads every operator which modifies the value. It is a considerable amount of work, though each piece is trivial, and you can do the assertions in there.
On pretty much every system, you can change the protection of memory pages that belong to your process. You could put the variable (and any other variables that you want to assert) separately and set the page to readonly. This will cause a segmentation fault when the value is written to, which you can catch (and verify that the assertion is true). Windows even makes this explicitly available via "guard pages" (which are really only "readonly pages in disguise").
Most modern processors support hardware breakpoints. Unless your program is to run on some very exotic platform, you can exploit these to have more fine-grained control in a similar way as by tampering with protections. See for example this article on another site, which describes how to do it under Windows on x86. This solution will require you to write a kind of "mini-debugger" and implies that you may possibly run into trouble when running your program under a real debugger.

A scripting engine for Ruby?

I am creating a Ruby on Rails website, and for one part it needs to be dynamic so that (sorta) trusted users can make parts of the website work differently. For this, I need a scripting language. In a sort of similar project in ASP.NET, I wrote my own scripting language/DSL. I cannot use that source code (written at work) though, and I don't want to make another scripting language if I don't have to.
So, what choices do I have? The scripting must be locked down and not be able to crash my server or anything. I'd really like if I could use Ruby as the scripting language, but it's not strictly necessary. Also, this scripting part will be called on almost every request for the website, sometimes more than once. So, speed is a factor.
I looked at the RubyLuaBridge but it is Alpha status and seems dead.
What choices for a scripting language do I have in a Ruby project?
Also, I will have full control over where this project is deployed (root access), so there are no real limits.
There's also Rufus-lua, though it's at version 0.1.0...
What about JRuby? You can use Java implementations of many scripting languages, such as JavaScript, Scheme, etc.
Well, since it hasn't been suggested yet, there's Locking Ruby In The Safe as described by the Pickaxe book. This allows you to use Ruby as the language without significant slowdown AFAIK.
This technique is intended to allow safe sandboxing of untrusted Ruby code and bug fixes and discussions are directed toward keeping it that way, but infinite loops and some other things still allow malicious users to peg the CPU. (e.g. this discussion maybe.)
What I don't know is how you return data that is inherently safe to use from outside the safe thread. A singleton object (for instance) can mimic whatever class and then do something dangerous when any method is called in the returning thread. I'm still googling around about it. (The Ruby Programming Language says that level 4 "Prevents metaprogramming methods" which would allow you to safely verify the class of a returned object, which I suppose would make results safe to use.)
Barring that, it might not be hard (*snrk*) to implement a Lisp-1 with dynamic scope since you already have a garbage collector.

Should you wrap 3rd party libraries that you adopt into your project? [closed]

A discussion I had with a colleague today.
He claims that whenever you use a 3rd party library, you should always write a wrapper for it, so that you can always change things later and accommodate things for your specific use.
I disagree with the word "always". The discussion arose regarding log4j, and I claimed that log4j has a well-tested and time-proven API and implementation, that everything thinkable can be configured a posteriori, and that there is nothing you should wrap. Even if you wanted to wrap it, there are proven wrappers like commons-logging and log5j.
Another example that we touched on in our discussion is Hibernate. I claimed that its API is too big to be wrapped. Furthermore, it has a layered API which lets you tweak its insides if you need to. My friend claimed that he still believes it should be wrapped, but he didn't do it because of the size of the API (this co-worker is much more of a veteran than I am on our current project).
I claimed this, and that wrapping should be done in specific cases:
you are not sure how the library will fit your needs
you will only use a small portion of a library (in which case you may only expose a part of its API).
you are not sure of the quality of the library's API or implementation.
I also maintained that sometimes you can wrap your code instead of the library. For example, putting your database related code in a DAO layer, instead of preemptively wrapping all of Hibernate.
Well, in the end this is not really a question, but your insights, experiences and opinions are highly appreciated.
It's a perfect example for YAGNI:
it is more work
it inflates your project
it may complicate your design
it has no immediate benefit
the scenario you write it for may never manifest
when it does, your wrapper most likely needs to be re-written completely because it is tied too closely to the concrete library you were using and the new one's API simply doesn't match yours.
Well, the obvious benefit is for switching technologies. If you have a library that becomes deprecated, and you want to switch, you may end up rewriting a lot of code to accommodate the change, whereas if it were wrapped, you'd have an easier time writing a new wrapper for the new lib, than changing all your code.
On the other hand, it would mean that you have to write a wrapper for every trivial library that you include, which is probably an unacceptable amount of overhead.
My industry is all about speed, so the only time I'd be able to justify writing a wrapper is if it was around some critical library that was likely to change dramatically on a regular basis. Or, more commonly, if I need to take a new library and shoehorn it into old code, which is an unfortunate reality.
It's definitely not an "always" situation. It's something that may be desirable. But the time isn't always going to be there, and, in the end, if writing a wrapper takes hours and the long term code library changes are going to be few, and trivial...Why bother?
No. Java architects/wannabes are too busy designing against imaginary changes.
With a modern IDE, it's a piece of cake to make the change when you do need it. Until then, keep it simple.
I agree with everything that's been said pretty much.
The only time wrapping third party code is useful (bar violating YAGNI) is for unit testing.
Mocking statics and so forth requires you to wrap the code, this is a valid reason to write wrappers for third party code.
In the case of logging code, it's not needed though.
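Here is a small sketch of wrapping for testability, shown in Ruby for brevity. `SystemClock`, `FixedClock`, and `Invoice` are made-up names; the "static" call being wrapped is `Time.now`, standing in for any global or third-party call that is awkward to mock directly.

```ruby
# Thin wrapper around a "static" call so that tests can substitute it.
class SystemClock
  def now
    Time.now
  end
end

# Test double with the same interface as SystemClock.
class FixedClock
  def initialize(time)
    @time = time
  end

  def now
    @time
  end
end

# Hypothetical application class; depends on the wrapper, not on Time.now.
class Invoice
  def initialize(clock = SystemClock.new)
    @clock = clock
  end

  def overdue?(due_date)
    @clock.now > due_date
  end
end

# In a test, time is pinned so the result is deterministic.
invoice = Invoice.new(FixedClock.new(Time.utc(2020, 1, 2)))
overdue = invoice.overdue?(Time.utc(2020, 1, 1))
```

The wrapper is only as wide as the one call the code actually needs, which keeps the cost well below that of wrapping a whole library.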
The problem here is partially the word 'wrapper', partially a false dichotomy, and partially a false distinction between the JDK and everything else.
The word 'wrapper'
Wrapping all of Hibernate, as you say, is a completely impractical enterprise.
Restricting the Hibernate dependencies to an identified, controlled, set of source files, on the other hand, may well be practical and achieve the same results.
The false dichotomy
The false dichotomy is the failure to recognize a third option: standards. If you use, say, JPA annotations, you can swap Hibernate for other things. If you are writing a web service and use JAX-WS annotations and JAX-B, you can swap between the JDK, CXF, Glassfish, or whatever.
The false distinction
Sure, the JDK changes slowly and is unlikely to die. But major open source packages also change slowly and are unlikely to die. Untold thousands of developers and projects use Hibernate. There's really no more risk of Hibernate disappearing or making radical incompatible API changes than there is of Java itself.
If the library you are planning to wrap is unique in its "access principles, metaphors and idioms" from other offerings in the same domain, then your wrapper is pretty much going to be similar to that library and won't do you any good if you one day switch to a different library since you will need a new wrapper.
If the library is accessed in a similar way to other libraries and the same wrapper can apply to these libraries, then they are probably written based on some existing standard and there is some common layer that already exists to access both of them.
I would only go with wrappers if I knew for sure that I would have to support multiple and substantially different libraries in production.
The main factor in deciding whether to wrap a library is the impact a library change will have on the code. When a library is only called from one class, the impact of changing the library will be minimal. If, on the other hand, a library is called from all classes, a wrapper is much more likely to be worthwhile.
Any uncertainty around the choice of 3rd party library should be flushed out at the beginning of the project using prototypes to test the scalability/suitability/whatever of the 3rd party library.
If you decide to go ahead and provide full de-coupling/abstraction support it should be costed up and ultimately approved by the project sponsor - ultimately it's a commercial decision as someone has to pay for it and the work required to do it (unless it's absolutely trivial, in which case the api is probably low risk anyway).
Generally an experienced architect will choose a technology that they can be reasonably confident with, have experience of, and are confident will last the lifetime of the app, OR else they will eliminate any risk in the decision early on in the project, thus removing any need to do this most of the time.
I'd tend to agree with most of your points. Using absolutes often gets you into trouble and saying you should "always" do something limits your flexibility. I'd add some more points to your list.
When you use wrapping code around a very common API, like Hibernate or log4j you make it more difficult to bring on new developers. New developers now have to learn a whole new API, where if you hadn't wrapped the code they would have been very familiar right away.
On the flip side of that, you also limit your developers' view into the API. Using an advanced feature of the API takes more time because you have to make sure that your wrapper is implemented in a way that can handle it.
Many of the wrapping layers I've seen also are very specific to the underlying implementation. So, if you write a log wrapper around log4j, you are thinking in log4j terms. If some new cool framework comes out, it may change the whole paradigm, so your wrapping code doesn't migrate as well as you had thought.
I'm definitely not saying wrapping code is always bad, but as you stated, there are a lot of factors you have to consider.
The purpose of wrapping even a well-tested and time-proven 3rd-party library is that you might decide to switch libraries at some point in the future. Wrapping it makes it easier to switch without changing any code in your core application. Only the wrapper needs to change.
If you're absolutely sure that you'll never (another absolute) use a different logging framework in your project, go ahead and skip the wrapper. Even having said that, I'd probably hold off on writing the wrapper until I knew I needed it, like the first time I need to switch.
This is kind of a funny question.
I've worked in systems where we've found showstopper bugs in libraries we were using, and which upstream was either no longer maintaining, or not interested in fixing. In a language like Java, you usually can't fix internal bugs from a wrapper. (Fortunately, if they're open-source, you can at least fix them yourself.) So it's no help here.
But I'm often working in a language where you can easily modify libraries at any time, without seeing or even having their source code -- I commonly add new methods to existing classes, for example. So in this case, there's no point in wrapping: just make the change you want.
Also, does your colleague draw the line at things called "libraries"? What about Java itself? Does he wrap built-in classes? Does he wrap the filesystem? The thread scheduler? The kernel? (That is, with his own wrappers -- in a sense, everything is a wrapper around the CPU, but it sounds like he's talking about wrappers in your source repo that are completely under your control.) I've had built-in functionality change or disappear when new versions of it appear. Java is not immune from this.
So the idea to always write a wrapper comes down to a bet. Assuming he's only wrapping third-party libraries, he seems to be implicitly betting that:
"first-party" functionality (like Java itself, the kernel, etc.) will never change
when "third-party" functionality changes, it will always be done in a way that can be fixed in a wrapper
Is that true in your case? I don't know. Of the medium-large Java projects I've done, it's rarely true for me. I wouldn't spend effort wrapping all third-party libraries, because it seems like a poor bet, but your situation is certainly different from mine.
There is one situation where you can wrap with good reason, namely if you need to test stuff and the default third party object is heavyweight. Then having an interface can really make a difference.
Note, this is not to replace the library, but to make it manageable where it doesn't matter much.
Wrapping a whole library is boilerplate, ineffective, and wrong in most cases. It can be done in a much cleverer way. I'd say that wrapping a library is appropriate mostly in the case of UI component libraries, and again, you have to be adding some additional core functionality of your own to all the components for this to be needed.
if too many modifications and additions are needed, this is most likely not the library you are looking for
if there is a moderate amount of additions and modifications, there are always the design patterns that come in handy in those cases. The Decorator pattern (which allows new/additional behaviour to be added to an existing object dynamically), for example, is rather suitable for most cases.
IDE search/replace and refactoring capabilities offer an easy way to change your code in all required places if some important change is needed and a wrapping object appears. (of course, unit-tests would be helpful here ;) )
In my experience the question becomes fairly moot if you're using abstractions sufficiently. Coupling to a library is just like coupling to any other interface. Thus you want to reduce accidental coupling and the scope of rewrite necessary if you need to swap out the implementation. Don't bind your application logic to some construct, but don't just form a bunch of stupid (literally) wrappers around something and expect to gain any benefit.
A wrapper doesn't usually gain you anything unless it's answering a specific purpose (such as polymorphizing a non-polymorphic construct). They often show up in refactoring, but I wouldn't recommend forming an architecture on them. There are a few exceptions of course, but there are with any principle.
This doesn't speak toward adapters. An adapter can be a pretty important component for when you want to actually alter the interface of a library and its use to be in line with architecture, code, or domain concepts in your project.
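To illustrate the difference, here is a minimal adapter sketch in Ruby. `VendorGeocoder` is a made-up class standing in for a third-party library, and `Location` and `GeocoderAdapter` are equally hypothetical. The adapter does not mirror the library's API one-to-one; it reshapes it into the terms the application uses.

```ruby
# Hypothetical third-party class, standing in for a real library.
class VendorGeocoder
  def lookup_coords(query_string)
    [52.52, 13.405] # canned result: [latitude, longitude]
  end
end

# Domain concept used by the rest of the application.
Location = Struct.new(:latitude, :longitude)

# Adapter: translates the vendor's interface into domain terms.
class GeocoderAdapter
  def initialize(vendor = VendorGeocoder.new)
    @vendor = vendor
  end

  def locate(address)
    lat, lng = @vendor.lookup_coords(address)
    Location.new(lat, lng)
  end
end

location = GeocoderAdapter.new.locate("Berlin")
```

Callers only ever see `Location`, so the bare array returned by the vendor never leaks into application logic.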
You should do it always, often, sometimes, rarely, or never. Not even your colleague does it always, but the instructive cases are always and never. Suppose that it is sometimes necessary. If you never wrapped a library, the worst consequence is that one day you discovered that it was necessary for a library that you had used all over the place. It would take you some time to wrap that library and to perform shotgun surgery on the clients. The question is whether that eventuality would take more time than habitually providing wrappers that are rarely necessary, but having never to perform the shotgun surgery.
My instinct is to appeal to the YAGNI (you ain't gonna need it) principle and opt for "rarely".
I would not wrap it as a one-to-one thing, but I would layer the app so that each part is replaceable as much as possible. The ISO OSI model works well for all types of software :-)

How "defensive" should my code be?

I was having a discussion with one of my colleagues about how defensive your code should be. I am all pro defensive programming but you have to know where to stop. We are working on a project that will be maintained by others, but this doesn't mean we have to check for ALL the crazy things a developer could do. Of course, you could do that but this will add a very big overhead to your code.
How do you know where to draw the line?
Anything a user enters directly or indirectly, you should always sanity-check. Beyond that, a few asserts here and there won't hurt, but you can't really do much about crazy programmers editing and breaking your code, anyway!-)
I tend to change the amount of defense I put in my code based on the language. Today I'm primarily working in C++ so my thoughts are drifting in that direction.
When working in C++ there cannot be enough defensive programming. I treat my code as if I'm guarding nuclear secrets and every other programmer is out to get them. Asserts, throws, compile-time error template hacks, argument validation, eliminating pointers, in-depth code reviews and general paranoia are all fair game. C++ is an evil wonderful language that I both love and severely mistrust.
I'm not a fan of the term "defensive programming". To me it suggests code like this:
void MakePayment( Account * a, const Payment * p ) {
    if ( a == 0 || p == 0 ) {
        return;
    }
    // payment logic here
}
This is wrong, wrong, wrong, but I must have seen it hundreds of times. The function should never have been called with null pointers in the first place, and it is utterly wrong to quietly accept them.
The correct approach here is debatable, but a minimal solution is to fail noisily, either by using an assert or by throwing an exception.
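For contrast, a sketch of the fail-noisily version, written in Ruby rather than C++. `Account`, `Payment`, and the balance arithmetic are placeholders for whatever the real payment logic is; the point is only that a nil argument raises instead of being quietly swallowed.

```ruby
# Hypothetical minimal types, only for the sake of the example.
Account = Struct.new(:balance)
Payment = Struct.new(:amount)

def make_payment(account, payment)
  # Fail loudly: a nil here is a bug in the caller, not a case to ignore.
  raise ArgumentError, "account must not be nil" if account.nil?
  raise ArgumentError, "payment must not be nil" if payment.nil?

  # Placeholder for the real payment logic.
  account.balance -= payment.amount
  account
end

account = Account.new(100)
make_payment(account, Payment.new(30))
```

Calling `make_payment(nil, Payment.new(30))` raises `ArgumentError` at the point of the mistake, rather than returning as if the payment had been made.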
Edit: I disagree with some other answers and comments here - I do not think that all functions should check their parameters (for many functions this is simply impossible). Instead, I believe that all functions should document the values that are acceptable and state that other values will result in undefined behaviour. This is the approach taken by the most successful and widely used libraries ever written - the C and C++ standard libraries.
And now let the downvotes begin...
I don't know that there's really any way to answer this. It's just something that you learn from experience. You just need to ask yourself how common a potential problem is likely to be and make a judgement call. Also consider that you don't necessarily have to always code defensively. Sometimes it's acceptable just to note any potential problems in your code's documentation.
Ultimately though, I think this is just something that a person has to follow their intuition on. There's no right or wrong way to do it.
If you're working on public APIs of a component then it's worth doing a good amount of parameter validation. This led me to have a habit of doing validation everywhere. That's a mistake. All that validation code never gets tested and potentially makes the system more complicated than it needs to be.
Now I prefer to validate by unit testing. Validation definitely happens for data coming from external sources, but not for calls from non-external developers.
I always Debug.Assert my assumptions.
My personal ideology: the defensiveness of a program should be proportional to the maximum naivety/ignorance of the potential user base.
Being defensive against developers consuming your API code is not that different from being defensive against regular users.
Check the parameters to make sure they are within appropriate bounds and of expected types
Verify that the number of API calls which could be made is within your Terms of Service. Generally called throttling, it usually only applies to web services and password checking functions.
Beyond that there's not much else to do except make sure your app recovers well in the event of a problem and that you always give ample information to the developer so that they understand what's going on.
Defensive programming is only one way of honouring a contract in a design-by-contract manner of coding.
The other two are
total programming and
nominal programming.
Of course you shouldn't defend yourself against every crazy thing a developer could do, but then you should state, using preconditions, in which context your code will do what is expected.
// precondition: par is so and so and so
function doSth(par)
{
    debug.assert(par is so and so and so)
    // do stuff with par
    return result
}
I think you have to bring in the question of whether you're creating tests as well. You should be defensive in your coding, but as pointed out by JaredPar -- I also believe it depends on the language you're using. If it's unmanaged code, then you should be extremely defensive. If it's managed, I believe you have a little bit of wiggleroom.
If you have tests, and some other developer tries to decimate your code, the tests will fail. But then again, it depends on test coverage on your code (if there is any).
I try to write code that is not just defensive but downright hostile. If something goes wrong and I can fix it, I will. If not, I throw or pass on the exception and make it someone else's problem. Anything that interacts with a physical device (file system, database connection, network connection) should be considered unreliable and prone to failure. Anticipating these failures and trapping them is critical.
Once you have this mindset, the key is to be consistent in your approach. Do you expect to hand back status codes to communicate problems in the call chain, or do you like exceptions? Mixed models will kill you, or at least drive you to drink. Heavily. If you are using someone else's API, then isolate these things into mechanisms that trap/report in terms you use. Use these wrapping interfaces.
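One way to read "trap/report in terms you use" is to catch the low-level failures at the boundary and re-raise them as a single error type of your own. A minimal Ruby sketch, assuming made-up names `StorageError` and `FileStore`, and a path that is assumed not to exist:

```ruby
# Application-level error type: callers only need to know about this one.
class StorageError < StandardError; end

# Hypothetical boundary class around the file system.
class FileStore
  def read(path)
    File.read(path)
  rescue SystemCallError, IOError => e
    # Translate low-level failures (missing file, permissions, I/O errors)
    # into the application's own vocabulary.
    raise StorageError, "could not read #{path}: #{e.message}"
  end
end

begin
  FileStore.new.read("/no/such/dir/missing.txt")
rescue StorageError => e
  message = e.message
end
```

The rest of the code base then handles one exception type consistently instead of mixing status codes with whatever each underlying API happens to throw.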
If the discussion here is how to code defensively against future (possibly malevolent or incompetent) maintainers, there is a limit to what you can do. Enforcing contracts through test coverage and liberal use of asserting your assumptions is probably the best you can do, and it should be done in a way that ideally doesn't clutter the code and make the job harder for the future non-evil maintainers of the code. Asserts are easy to read and understand and make it clear what the assumptions of a given piece of code are, so they're usually a great idea.
Coding defensively against user actions is another issue entirely, and the approach that I use is to think that the user is out to get me. Every input is examined as carefully as I can manage, and I make every effort to have my code fail safe - try not to persist any state that isn't rigorously vetted, correct where you can, exit gracefully if you cannot, etc. If you just think about all the bozo things that could be perpetrated on your code by outside agents, it gets you in the right mindset.
Coding defensively against other code, such as your platform or other modules, is exactly the same as users: they're out to get you. The OS is always going to swap out your thread at an inopportune time, networks are always going to go away at the wrong time, and in general, evil abounds around every corner. You don't need to code against every potential problem out there - the cost in maintenance might not be worth the increase in safety - but it sure doesn't hurt to think about it. And it usually doesn't hurt to explicitly comment in the code if there's a scenario you thought of but regard as unimportant for some reason.
Systems should have well designed boundaries where defensive checking happens. There should be a decision about where user input is validated (at what boundary) and where other potential defensive issues require checking (for example, third party integration points, publicly available APIs, rules engine interaction, or different units coded by different teams of programmers). More defensive checking than that violates DRY in many cases, and just adds maintenance cost for very little benefit.
That being said, there are certain points where you cannot be too paranoid. Potential for buffer overflows, data corruption and similar issues should be very rigorously defended against.
I recently had a scenario in which user input data was propagated through a remote facade interface, then a local facade interface, then some other class, to finally get to the method where it was actually used. I was asking myself a question: when should the value be validated? I added validation code only to the final class, where the value was actually used. Adding other validation code snippets in classes lying on the propagation path would be too defensive for me. One exception could be the remote facade, but I skipped it too.
Good question. I've flip-flopped between doing sanity checks and not doing them. It's a 50/50 situation; I'd probably take a middle ground where I would only "bullet-proof" routines that:
(a) are called from more than one place in the project
(b) have logic that is LIKELY to change
(c) cannot use default values
(d) cannot be 'failed' gracefully
Darknight
