collect stats on method usage in non-rails ruby project - ruby

I have a project where we keep our acceptance test code. It has about 1100 methods that I use for dealing with various aspects of the UI. The project is quite old, so I suspect that some methods are stale and never get used. Tests are run as RSpec tests. I would like to collect statistics on method usage so that I can delete the stale methods and list the top 20 or 30 that are in active use.
Any ideas how I can collect that data?
To give more detail: I have all of the supporting page-object methods in the lib directory. I run about 100 spec tests which call methods from these page objects. I want to collect stats on how often the methods from the lib/ directory get used.

You can try https://github.com/danmayer/coverband
It's a dynamic analysis tool, so it should get you the data you need.

You can try the deforest gem.
It's an easy-to-use gem that tracks the number of times your model methods are called and presents this data segmented as most used, medium used and least used. Exactly what you are looking for.
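If adding a gem is not an option, Ruby's built-in TracePoint API can give a rough call count for methods defined under a given directory. The following is only a minimal sketch: the class name MethodUsageCounter and its methods are hypothetical, and it assumes that filtering by source path prefix is enough to isolate the page-object code.

```ruby
# Hypothetical helper (not part of any gem): counts calls to Ruby-defined
# methods whose source file path starts with the given prefix.
class MethodUsageCounter
  attr_reader :counts

  def initialize(path_prefix)
    @path_prefix = path_prefix
    @counts = Hash.new(0)
    # :call fires once for every call to a method written in Ruby.
    @trace = TracePoint.new(:call) do |tp|
      next unless tp.path.start_with?(@path_prefix)
      @counts["#{tp.defined_class}##{tp.method_id}"] += 1
    end
  end

  def start
    @trace.enable
  end

  def stop
    @trace.disable
  end

  # Returns the n most frequently called methods as [name, count] pairs.
  def top(n)
    @counts.sort_by { |_name, count| -count }.first(n)
  end
end
```

One possible way to use it is to create the counter with a prefix such as File.expand_path("lib") + "/", call start in an RSpec before(:suite) hook, and print top(30) in an after(:suite) hook. Methods under lib/ that never show up in counts are candidates for deletion. Tracing every call slows the suite down, so this is something to run occasionally rather than on every build.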

Related

PHPUnit - Creating tests after development

I've watched and read a handful of tutorials on PHPUnit and Test Driven Development and have recently begun working with Laravel, which extends the PHPUnit framework with its TestCase class. All of these things make sense to me as far as creating tests while you develop. And I find Laravel's extensions particularly intuitive (especially with regard to testing controller routes).
However, I've recently been tasked with creating unit tests for a sizable app that's near completion. The app is built in CodeIgniter, and it was not built with any tests.
I find that I'm not entirely sure where to begin, or what steps to take in order to determine the tests I should create.
Should I be looking to test each controller method? Or do I need to break it down more than that? Admittedly, many of these controller methods are doing more than one task.
It is really difficult to write tests for an existing project. I suggest you start by writing tests for classes that do not depend on other classes. Then you can continue with classes that are coupled only to classes you have already tested. By repeating this process you will increase your test coverage step by step.
Also, don't forget that sometimes you will need to refactor your code to make it testable. You should improve the design of the code as you go. For example, if your controller methods do more than one task, divide each method into sub-methods and test each of those independently.
I also will suggest you to look at this question
You are in a bit of a tight spot, but here is what I would do in your situation. You need to refactor (ie. change) the existing code so that you end up with three types of functions.
The first type are those that deal with the outside world. By this I mean anything that talks to I/O, or your framework or your operating system or even libraries or code from stable modules. Basically everything that has a dependency on code that you can not, or may not change.
The second group of functions are where you transform or create data structures. The only thing they should know about are the data structures that they receive as parameters and the only way they communicate back is by changing those structures or by creating and populating a new structure.
The third group consists of co-ordinating functions which make the calls to the outside world functions, get their returned data structures and pass those structures to the transforming functions.
Your testing strategy is then as follows: the second group can be tested by creating fake data structures, passing them in and checking that the transforms were done correctly. The third group of co-ordinating functions can be tested by dependency injection and mocking to see that they call the outside world and transform functions correctly. Finally, the remaining group, the outside-world functions, should not be tested. For those you follow the maxim "make it so simple that there is obviously nothing wrong". See if you can keep each one to a single line of code. If you go over four lines of code for these then you are probably doing it wrong.
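The three groups can be sketched in a few lines. This is written in Ruby rather than PHP purely for illustration, and the function names (read_lines, count_words, word_report) are made up for the example; it is one possible shape, not a prescribed layout.

```ruby
# Group 1: outside world. Kept to a single line and left untested.
def read_lines(path)
  File.readlines(path, chomp: true)
end

# Group 2: pure transformation. Knows only about the data it is given.
def count_words(lines)
  lines.flat_map(&:split).tally
end

# Group 3: co-ordinator. Collaborators are injected so a test can
# replace the outside-world call with a fake.
def word_report(path, reader: method(:read_lines), transform: method(:count_words))
  transform.call(reader.call(path))
end
```

A test for count_words passes in a hand-built array of strings, while a test for word_report injects a lambda in place of reader and checks that it was called with the expected path. Neither test touches the file system.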
If you are completely new to TDD I do however strongly suggest that you first get used to doing it on green field projects/modules. I made a couple of false starts on unit testing because I tried to bolt it onto projects afterwards. TDD is really a joy when you finally grok it so it would not be good if you get discouraged early on because of a too steep learning curve.

How can I pass Selenium WebDriver objects between separate Ruby processes?

I want to pass an instance of an object between two Ruby processes. Specifically, I want to pass an instance of a Selenium WebDriver from one process to another process. The reason I want to do this is because it takes a lot of time for Ruby to create this object, but I want it to be used by the other process.
I've found some related questions here and here that seem to point towards using DRb, but I've been unable to find any useful examples or sample code.
Is there a tool other than DRb that I should be using? Does anyone have an example similar to this that I could copy from?
It looks like you're going to have to use DRb, although the documentation for it seems to be lacking. There is however an interesting article here. You might also want to consider purchasing The dRuby Book by Masatoshi Seki to get a better idea of how to do this effectively.
Another option to investigate, if you are not looking at simultaneous access but just want to send the object from one process to another, is to serialize the object (that is, encode it in a way that Ruby can read back) with YAML (for a human-readable file) or Marshal (for a binary-encoded file) and send it using a pipe. This was mentioned in another answer that has since been deleted.
Note that either of these solutions requires modifying the Selenium code heavily, since the objects you want to manipulate natively support neither copying nor simultaneous access.
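As a rough illustration of the serialize-and-pipe idea, the sketch below sends a plain data object through a pipe with Marshal. The Settings struct and the two helper names are hypothetical, and both ends of the pipe live in one process here only to keep the example short. An object that holds live sockets or IO handles cannot be dumped this way, which is why the note above about modifying the Selenium code applies.

```ruby
# Hypothetical plain-data object; a live driver would not serialize like this.
Settings = Struct.new(:browser, :timeout)

# Writes a binary Marshal dump of obj to the given IO.
def send_object(io, obj)
  Marshal.dump(obj, io)
end

# Reads one Marshal dump from the given IO and rebuilds the object.
def receive_object(io)
  Marshal.load(io)
end
```

With reader, writer = IO.pipe, one side calls send_object(writer, settings) and the other calls receive_object(reader); the receiver gets an equal copy, not the same instance.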
TL;DR
Most queues or distributed processes are going to require some sort of serialization to work properly. If you want to pass objects rather than messages, then this will be a limiting factor in how you approach the problem.
DRb
I don't know if you can marshal a WebDriver object. If you can't, then DRb may be a good choice for your distributed Ruby programs because it supports DRbObject references for things that can't be marshaled. There are some examples provided in the DRb documentation.
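A minimal DRb sketch looks like the following. SlowDriver is a hypothetical stand-in for an expensive object, and serve_driver and connect_driver are made-up helper names; whether a real WebDriver instance behaves well behind a DRb reference is not something this example establishes. The URI druby://localhost:0 is an assumption that lets the operating system pick a free port.

```ruby
require 'drb/drb'

# Hypothetical stand-in for an object that is expensive to create.
class SlowDriver
  def initialize
    @visited = []
  end

  # Records the URL and returns how many pages have been visited so far.
  def visit(url)
    @visited << url
    @visited.size
  end
end

# Server side: expose the object and return the URI clients should use.
def serve_driver(driver, uri = "druby://localhost:0")
  DRb.start_service(uri, driver)
  DRb.uri
end

# Client side: obtain a remote reference; method calls are forwarded
# to the object that lives in the server process.
def connect_driver(uri)
  DRbObject.new_with_uri(uri)
end
```

In a real setup the process that owns the object would call serve_driver and then DRb.thread.join to stay alive, and the second process would call connect_driver with the printed URI. The object itself never leaves the server process; only method calls and their marshalable arguments and results cross the wire.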
Selenium Wire Protocol
Depending on what you're really trying to do, it may be worth taking a closer look at using the remote bindings for the Remote WebDriver client/server, or Selenium's JSON Wire Protocol as an alternative to passing objects between processes.
Other Alternatives: Fixtures, Factories, Stubs, and Mocks
Whether or not these work in your specific case will depend a lot on why you want to pass objects instead of simply driving the remote server. If it's largely an issue of how long it takes to build your object, then the serialization/de-serialization cycle may not necessarily be faster in all cases.
You might want to revisit why your object is so slow to create. If gathering and processing the data for it is what's taking too long, you can use some sort of test fixture or factory to trim that time, either by using a smaller set of fixed data, or using a pre-serialized object that's optimized for speed.
You might also consider whether you actually need real data or objects for your test at all. In many cases, you can speed up your tests a lot by stubbing methods or creating mock objects that will return the values you need for your integration tests without needing to perform expensive calculations or long-running operations.
There are certainly cases where you need to drive the full stack and perform acceptance tests on real data. Even then, you may be able to devise a set of fixture data that will take less time or memory to process. It's certainly worth at least thinking about.

Analysis of messages sent (method invocations) within a ruby application?

Is there a tool that analyzes the messages that are sent to objects (i.e. method invocations) within a ruby application?
Ideally the tool would create a (GraphViz) diagram and be able to filter classes in the results (e.g. monitor only classes specific to the application instead of all classes like String, Array and the lot).
Unless you have dtrace support, ruby-prof is the next best thing.
As for graphing, you may have to use an auxiliary analysis package of some sort to get the kinds of results you want.
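For small cases, a caller-to-callee graph can also be collected by hand with TracePoint and written out as GraphViz DOT text. The sketch below is only an illustration: CallGraph is a hypothetical class, the filter block decides which classes are recorded, and it assumes single-threaded code.

```ruby
# Hypothetical recorder: builds caller -> callee edges for Ruby-defined
# methods whose defining class passes the filter block.
class CallGraph
  def initialize(&filter)
    @filter = filter || ->(_klass) { true }
    @edges = Hash.new(0)
    @stack = []
    @trace = TracePoint.new(:call, :return) do |tp|
      next unless @filter.call(tp.defined_class)
      name = "#{tp.defined_class}##{tp.method_id}"
      if tp.event == :call
        # The method on top of the stack is the caller of this one.
        @edges[[@stack.last, name]] += 1 if @stack.last
        @stack.push(name)
      else
        @stack.pop
      end
    end
  end

  # Traces only while the given block runs.
  def record
    @trace.enable
    yield
  ensure
    @trace.disable
  end

  # Renders the recorded edges as a DOT digraph with call counts as labels.
  def to_dot
    lines = @edges.map { |(from, to), n| %(  "#{from}" -> "#{to}" [label="#{n}"];) }
    (["digraph calls {"] + lines + ["}"]).join("\n")
  end
end
```

The output of to_dot can be saved to a file and rendered with the GraphViz dot command. Because filtered-out methods are skipped entirely, calls that pass through them do not appear as edges.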

Unit Testing highly interdependent code

So I have some challenging code I would like to refactor. The challenge is that it depends on database queries, EJB and JavaServer Faces. Not simultaneously, but close to it.
A good example would be a geocoder. Getting meaningful results depends on multiple queries to the DB, which in turn depend on the data entered and stored. The code might also reference other helper classes and look them up via the JSF framework.
What are the best strategies for testing this sort of code? Should I try to separate out my code as much as possible? Should I use mocking instead? What has worked for other people?
Well, the short answer is "yes".
You're going to need, first of all, to factor the code sufficiently to construct unit tests at all. What you're describing is too complicated for the usual unit test methods, and what you would get in any case is more like a higher-level acceptance test.
Now, as far as that factoring goes, you have several possible approaches, and you will probably use them all.
- Test the database queries themselves, using an external script.
- Construct an appropriate mock for the components directly accessing the DB, in order to see what happens against known results.
- Build unit tests using a JUnit-like framework for units of functionality.
- Examine the state of the art to see if you can usefully test the output HTML against unit tests.

Is it bad practice to run tests on a database instead of on fake repositories?

I know what the advantages are and I use fake data when I am working with more complex systems.
What if I am developing something simple and I can easily set up my environment in a real database and the data being accessed is so small that the access time is not a factor, and I am only running a few tests.
Is it still important to create fake data or can I forget the extra coding and skip right to the real thing?
When I said real database I do not mean a production database, I mean a test database, but using a real live DBMS and the same schema as the real database.
The reasons to use fake data instead of a real DB are:
- Speed. If your tests are slow you aren't going to run them. Mocking the DB can make your tests run much faster than they otherwise might.
- Control. Your tests need to be the sole source of your test data. When you use fake data, your tests choose which fakes you will be using. So there is no chance that your tests are spoiled because someone left the DB in an unfamiliar state.
- Order Independence. We want our tests to be runnable in any order at all. The input of one test should not depend on the output of another. When your tests control the test data, the tests can be independent of each other.
- Environment Independence. Your tests should be runnable in any environment. You should be able to run them while on the train, or in a plane, or at home, or at work. They should not depend on external services. When you use fake data, you don't need an external DB.
Now, if you are building a small little application, and by using a real DB (like MySQL) you can achieve the above goals, then by all means use the DB. I do. But make no mistake, as your application grows you will eventually be faced with the need to mock out the DB. That's OK, do it when you need to. YAGNI. Just make sure you DO do it WHEN you need to. If you let it go, you'll pay.
It sort of depends what you want to test. Often you want to test the actual logic in your code not the data in the database, so setting up a complete database just to run your tests is a waste of time.
Also consider the amount of work that goes into maintaining your tests and test database. Testing your code with a database often means you are testing your application as a whole instead of the different parts in isolation. This often results in a lot of work keeping both the database and tests in sync.
And the last problem is that the test should run in isolation so each test should either run on its own version of the database or leave it in exactly the same state as it was before the test ran. This includes the state after a failed test.
Having said that, if you really want to test on your database you can. There are tools that help setting up and tearing down a database, like dbunit.
I've seen people try to create unit tests like this, but almost always it turns out to be much more work than it is actually worth. Most abandoned the approach halfway through the project, and many abandoned TDD completely, thinking the experience carries over to unit testing in general.
So I would recommend keeping tests simple and isolated, and encapsulating your code well enough that it becomes possible to test it in isolation.
As long as the real DB does not get in your way, and you can go faster that way, I would be pragmatic and go for it.
In unit-test, the "test" is more important than the "unit".
I think it depends on whether your queries are fixed inside the repository (the better option, IMO), or whether the repository exposes composable queries; for example - if you have a repository method:
IQueryable<Customer> GetCustomers() {...}
Then your UI could request:
var foo = GetCustomers().Where(x=>SomeUnmappedFunction(x));
bool SomeUnmappedFunction(Customer customer) {
    return customer.RegionId == 12345 && customer.Name.StartsWith("foo");
}
This will pass for an object-based fake repo, but will fail for actual db implementations. Of course, you can nullify this by having the repository handle all queries internally (no external composition); for example:
Customer[] GetCustomers(int? regionId, string nameStartsWith, ...) {...}
Because this can't be composed, you can check the DB and the UI independently. With composable queries, you are forced to use integration tests throughout if you want it to be useful.
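The non-composable style can be sketched with an in-memory fake. The example is in Ruby rather than C# for consistency with the other examples here, and every name in it (Customer, FakeCustomerRepository, customer_names) is hypothetical. The point is that the caller can only pass parameters to a fixed query, so the fake and a DB-backed implementation are checked against the same narrow contract.

```ruby
Customer = Struct.new(:name, :region_id)

# In-memory fake exposing the same fixed query that a hypothetical
# DB-backed repository would expose. Callers cannot compose extra filters.
class FakeCustomerRepository
  def initialize(customers)
    @customers = customers
  end

  def customers(region_id: nil, name_starts_with: nil)
    @customers.select do |c|
      (region_id.nil? || c.region_id == region_id) &&
        (name_starts_with.nil? || c.name.start_with?(name_starts_with))
    end
  end
end

# Code under test depends only on the repository's fixed interface.
def customer_names(repo, region_id)
  repo.customers(region_id: region_id).map(&:name).sort
end
```

UI-level code such as customer_names can then be tested against the fake, while the real repository's query is tested separately against a database.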
It rather depends on whether the DB is automatically set up by the test, also whether the database is isolated from other developers.
At the moment it may not be a problem (e.g. only one developer). However (for manual database setup) setting up the database is an extra impediment for running tests, and this is a very bad thing.
If you're just writing a simple one-off application that you absolutely know will not grow, I think a lot of "best practices" just go right out the window.
You don't need to use DI/IOC or have unit tests or mock out your db access if all you're writing is a simple "Contact Us" form. However, where to draw the line between a "simple" app and a "complex" one is difficult.
In other words, use your best judgment as there is no hard-and-set answer to this.
It is OK to do that for this scenario, as long as you don't see them as "unit" tests. Those would be integration tests. You also want to consider whether you will be manually testing through the UI again and again, as you might just automate your smoke tests instead. Given that, you might even consider not doing the integration tests at all, and just work at the functional/UI test level (as those will already be covering the integration).
As others have pointed out, it is hard to draw the line between complex and non-complex, and you would usually know only when it is too late :(. If you are already used to doing them, I am sure you won't get much overhead. If that were not the case, you could learn from it :)
Assuming that you want to automate this, the most important thing is that you can programmatically generate your initial condition. It sounds like that's the case, and even better you're testing real world data.
However, there are a few drawbacks:
Your real database might not cover certain conditions in your code. If you have fake data, you can cause that behavior to happen.
And as you point out, you have a simple application; when it becomes less simple, you'll want to have tests that you can categorize as unit tests and system tests. The unit tests should target a simple piece of functionality, which will be much easier to do with fake data.
One advantage of fake repositories is that your regression / unit testing is consistent since you can expect the same results for the same queries. This makes it easier to build certain unit tests.
There are several disadvantages to testing against a production database if your code modifies data (that is, if it does more than run read-only queries):
- If you have an error in your code (which is probably why you're testing), you could end up breaking the production database. Even if you didn't break it.
- If the production database changes over time, and especially while your code is executing, you may lose track of the test data that you added and have a hard time cleaning it out of the database later.
- Production queries from other systems accessing the database may treat your test data as real data and this can corrupt results of important business processes somewhere down the road. For example, even if you marked your data with a certain flag or prefix, can you assure that anyone accessing the database will adhere to this schema?
Also, some databases are regulated by privacy laws, so depending on your contract and who owns the main DB, you may or may not be legally allowed to access real data.
If you need to run on a production database, I would recommend running on a copy, which you can easily create during off-peak hours.
If it's a really simple application, and you can't see it growing, I see no problem running your tests on a real DB. If, however, you think this application will grow, it's important that you account for that in your tests.
Keep everything as simple as you can, and if you require more flexible testing later on, make it so. Plan ahead though, because you don't want to have a huge application in 3 years that relies on old and hacky (for a large application) tests.
The downsides to running tests against your database are the lack of speed and the complexity of setting up your database state before running tests.
If you have control over this there is no problem in running the tests directly against the database; it's actually a good approach because it simulates your final product better than running against fake data. The key is to have a pragmatic approach and see best practice as guidelines and not rules.

Resources