How can I find the performance bottlenecks in my Ruby application? - ruby

I have written a Ruby application which parses lots of data from sources in different formats html, xml and csv files. How can I find out what areas of the code are taking the longest?
Is there any good resources on how to improve the performance of Ruby applications? Or do you have any performance coding standards you always follow?
For example do you always join your string with
output = String.new
output << part_one
output << part_two
output << '\n'
or would you use
output = "#{part_one}#{part_two}\n"

Well, there are certain well known practices like string concatenation is way slower than "#{value}" but in order to find out where you script is consuming most of its time or more time than required, you need to do profiling. There is a ruby gem called ruby-prof. The profiler can bring to your notice even those performance issues that may rarely occur. I have been using it a lot and find it very helpful. Here is some information about it directly from its official site
ruby-prof is a fast code profiler for
Ruby. Its features include:
Speed - it is a C extension and therefore many times faster than the
standard Ruby profiler.
Modes - Ruby prof can measure a number of different parameters,
including
call times, memory usage and object allocations.
Reports - can generate text and cross-referenced html reports
Flat Profiles - similar to the reports generated by the standard Ruby
profiler
Graph profiles - similar to GProf, these show how long a method runs,
which methods call it and which
methods it calls.
Call tree profiles - outputs results in the calltree format
suitable for the KCacheGrind profiling
tool.
Threads - supports profiling multiple threads simultaneously
Recursive calls - supports profiling recursive method calls

You can test the performance of individual sections of code with the standard Benchmark module.
You could also test your code on different implementations of Ruby (eg 1.9, Rubinius) and see if that speeds things up.
Of course if your problems are algorithmic in nature then there's not too much point in worrying about things like string concatenation speeds...

The answer to the string concatenation is here: https://web.archive.org/web/20090122123342/http://blog.cbciweb.com/2008/06/10/ruby-performance-use-double-quotes-vs-single-quotes

Besides what's written above I also recommend watching the Scaling Ruby screencast. It has some interesting tips&tricks on how to write faster Ruby code.

There is also a set of dTrace tools for Ruby you can use in the dTraceToolkit

Related

How to find out what implicit(s) are used in my scala code

Problem statement:
I read in multiple sources/articles that implicits drive up scala compilation time
I want to remove/reduce them to minimum possible to see what compilation time without them will look like (codebase is around 1000 files of various complexity based on scalaz & akka & slick)
I don't really know what kind of static analysis I can perform. Any liks/references to already existing tooling highly appreciated.
It is true that implicits can degrade compilation speed, especially for code that uses them for type-level computations. It is definitely worth it to measure their impact. Unfortunately it can be difficult to track down the culprits. There are tools that can help though:
Run scalac with -Ystatistics:typer to see how many tree nodes are processed during type-checking. E.g. you can check the number of ApplyToImplicitArgs and ApplyImplicitView relative to the total (and maybe compare this to another code base).
There is currently an effort by the Scala Center to improve the status quo hosted at scalacenter/scalac-profiling. It includes an sbt plugin that should be able to give you an idea about implicit search times, but it's still in its infancy (not published yet at the time of writing). I haven't tested it myself but you can still give it a try.
You can also compile with -Xlog-implicits, pipe the output to a file and analyse the logs. It will show a message for each implicit candidate that was considered but failed, complete with source position, search type and reason for the failure. Such failed searches are expensive. You can write a simple script with your favourite scripting language (why not Scala?) to summarize the data and even plot it with some nice graphics.
Aside: How to resolve a specific implicit instance?
Just use reify and good ol' println debugging:
import scala.collection.SortedSet
import scala.reflect.runtime.universe._
println(showCode(reify { SortedSet(1,2,3) }.tree))
// => SortedSet.apply(1, 2, 3)(Ordering.Int)
There is scalac profiling being done by scala center now: https://scalacenter.github.io/scalac-profiling/

Why is Crystal faster than Ruby?

I would very much like to know what exactly makes Crystal faster than Ruby while code is so similar. The short answer could be that it is compiled, and Ruby is interpreted, yet I would like to understand more about the language specifications.
I guess it's a combination of things:
Ruby is interpreted, and the interpreter could be improved. For example other interpreted languages like JS or Java have a very good VM and JIT compiler.
Many Ruby checks that are done at runtime, in Crystal are done at compile time. For example a simple method call in Ruby ends up in a method lookup. Even with a cache it won't beat a native function call. Or when Ruby decides to do different things based on the type of an argument, these checks are done at runtime. In Crystal they are known at compile time so those checks disappear. Without those checks the compiler can inline calls and do some pretty crazy stuff (thanks to LLVM). Or, for example, looking up an instance varaibles is a hash lookup in Ruby (as far as I know), while in Crystal it's just a memory indirection and load.
In Crystal we try to avoid extra memory allocations. For example to_s(io) writes to an IO instead of converting the object to a string in memory. Or we have tuples for fixed-sized arrays that are allocated on the stack. Or you can declare a type as a struct to avoid heap allocations.
Calls to C are done directly, without wrappers. Well, you could have a wrapper but that will be inlined by LLVM. In Ruby it always has to resolve a Ruby method first.
Probably there are many more reasons, but they are related.

How can I pass Selenium WebDriver objects between seperate Ruby processes?

I want to pass an instance of an object between two Ruby processes. Specifically, I want to pass an instance of a Selenium WebDriver from one process to another process. The reason I want to do this is because it takes a lot of time for Ruby to create this object, but I want it to be used by the other process.
I've found some related questions here and here that seem to point towards using DRb, but I've been unable to find any useful examples or sample code.
Is there a tool other than DRb that I should be using? Does anyone have an example similar to this that I could copy from?
It looks like you're going to have to use DRb, although the documentation for it seems to be lacking. There is however an interesting article here. You might also want to consider purchasing The dRuby Book by Masatoshi Seki to get a better idea of how to do this effectively.
Another option to investigate if you are not looking at simultaneous access, but you just want to send the object from one process to another, is to serialize (that is, encode in a way that Ruby can read) the object with YAML (for a human readable file) or Marshall (for a binary encoded file) and send it using a pipe. This was mentioned in another answer that has since been deleted.
Note that either of these solutions require modifying the Selenium code heavily since the objects you want to manipulate neither support copying, nor simultaneous access natively.
TL;DR
Most queue or distributed processes are going to require some sort of serialization to work properly. If you want to pass objects rather than messages, then this will a limiting factor in how you approach the problem.
DRb
I don't know if you can marshal a WebDriver object. If you can't, then DRb may be a good choice for your distributed Ruby programs because it supports DRbObject references for things that can't be marshaled. There are some examples provided in the DRb documentation.
Selenium Wire Protocol
Depending on what you're really trying to do, it may be worth taking a closer look at using the remote bindings for the Remote WebDriver client/server, or Selenium's JSON Wire Protocol as an alternative to passing objects between processes.
Other Alternatives: Fixtures, Factories, Stubs, and Mocks
Whether or not these work in your specific case will depend a lot on why you want to pass objects instead of simply driving the remote server. If it's largely an issue of how long it takes to build your object, then the serialization/de-serialization cycle may not necessarily be faster in all cases.
You might want to revisit why your object is so slow to create. If gathering and processing the data for it is what's taking too long, you can use some sort of test fixture or factory to trim that time, either by using a smaller set of fixed data, or using a pre-serialized object that's optimized for speed.
You might also consider whether you actually need real data or objects for your test at all. In many cases, you can speed up your tests a lot by stubbing methods or creating mock objects that will return the values you need for your integration tests without needing to perform expensive calculations or long-running operations.
There are certainly cases where you need to drive the full stack and perform acceptance tests on real data. Even then, you may be able to devise a set of fixture data that will take less time or memory to process. It's certainly worth at least thinking about.

Why Play! framework chose Groovy for template engine

From their website http://www.playframework.org/documentation/1.0/faq
"
The biggest CPU consumer in the Play stack at this time is the template engine based on Groovy. But as Play applications are easily scalable, it is not really a problem if you need to serve a very high traffic: you can balance the load between several servers. And we hope for a performance gain at this level with the new JDK7 and its better support of dynamic languages.
"
So there are no better choices? What about JSP?
JSP is not feasible as every JSP compiles to a Servlet and the servlet API provides things like the server side session which are not compatible with the RESTful paradigm. We don't want to go back to the dark ages of badly scalable server side sessions, back buttoning problems in the browser, reposts etc.
Japid templates are interesting, but they are not backed by a great community and perhaps didn't even exist at the time play was created (I don't know for sure though). I tried Japid as a replacement for the Groovy templates in my own application and found out in a JMeter test that the benefit would be only marginal, 10% to max. 25% improvement.
I guess in the end it all depends on your scalability requirements and the structure of your pages. I picked the 90% use case of my application and did the test. To me, the little improvement did not justify for the additional costs of the extra dependency (I like to keep dependencies to a minimum for maintainability).
Groovy templates are not bad or slow in general. Use typed variables wherever possible (instead of "def"), even in closures! Keep values of accessed properties in local variables. Do reasonable results paging. Then keep your fingers crossed that GSP might be able to run on groovy++ in the future and you're done ;)
To me, the question would not be why they used groovy in the views. That is, because I rather do miss it so much in the controller layer. Groovy would make coding the controller behaviour a lot easier IMHO.
First off, JSP was not a valid option for Play since it chose not to go down the Java EE route (which JSP is part of). Instead, Play chose to use Groovy as an intuitive, simple but powerful templating engine.
However, one of Play's greatest features is that it is a pluggable system, meaning that many parts of the core system can simply be replaced. This includes the template engine, and there are a couple that are already available.
The most popular is Japid. It claims to be 2-20x faster than the standard templating engine, and has been around for a while. For more info, see here http://www.playframework.org/modules/japid.
A second option is Cambridge, although this has only been out for a little while, but is reasonably active in the message boards (see https://groups.google.com/forum/?hl=en#!searchin/play-framework/cambridge/play-framework/IxSei-9BGq8/X-3pF5tWAKAJ).
I tend to stick to Groovy, as I like the way it works, and have not found it to be too bad in terms of performance, but every application is individual, so your own performance tests should lead you down your own particular path.
Yes there is Japid. Which is much, much faster.
http://www.playframework.org/modules/japid
I totally agree with the choice of ease over speed the Play Framework designers made here. My guess is that if the templating starts getting in the way in terms of performance, you can (and should!) measure the slow bits, and refactor them into fast tags where possible. With that, you're likely to save 80% of CPU by moving 20% into fast tags, leaving you with flexibility and adequate speed.
Having said that, I'm looking forward to an experiment I'm planning to see how well the new Scala templates (loosely "borrowed" from Razor.NET - awesome clean syntax) work with Java controllers/models. The Scala backend isn't there yet in terms of comparative ease, but the templating language certainly rocks.
I may be late to the party in 2016. Play 2 is out, and the JDK (not to mention the hardware) drastically improved. I am not using Play or Spring Boot, since my platform doesn't need them - just pure runtime text/HTML generation from templates.
First, when talking about Groovy templates, there is more than one. I use the original Groovy SimpleTemplateEngine for anything from emails to rich Web pages, whether most people nowadays favor the "advanced" MarkupTemplateEngine with its non-HTML builder syntax. I did not go that route because of the IDE (e.g. UntelliJ) support for JSPish HTML files with JavaScript - catching unclosed tags, braces, etc. Besides, how would you include JavaScript into the curly brace based "builder" style template?
Whichever you chose, both SimpleTemplateEngine and MarkupTemplateEngine statically compile their templates, even though the Groovy doc only mentions it for the latter. Why wouldn't it generates a class for the former? I didn't compare them against each other, but I'd expect SimpleTemplateEngine to be faster, since it is... well, simpler - doesn't translate any syntax into String concatenations with ifs and loops in between.
And it is indeed very fast. I was concerned about invoking it in a loop. Doesn't make any difference. There is no overhead, as the template is compiled once.
I use multiple small templates responsible for generating individual form control markup (HTML + JS) to generate a composite form, included in a higher-level container, included in another container, and so on, until the entire page is formed. Decomposing your view like that makes it, as you already guessed, modular, encapsulated, and "object-oriented" - composed of many individual MVC components building upon each other. Sort of like good old custom JSP tags, only evaluated at runtime and compatible with technologies like Spring Boot, if you cannot resist trendy resume-boosting stuff like that.
A test form with a 100 fields (all complex "smart" controls with encapsulated state management and behavior) renders in 150ms the first time, and then 10-14ms thereafter. In an IDE debugger on my memory-starved 4y.o. notebook. I also verified it is thread-safe, since Groovy never mentioned it explicitly. Why wouldn't it be, if it is compiled into a stateless Groovy class like any other? Call createTemplate() once, store the Template somewhere, and use it (call Template.make()) in your servlet or another concurrent context. Obviously I'll never have a 100-field form in a real application. Anyone who does, needs to reconsider his/her UX.
The performance is quite adequate. I'd even accept one second to render a 100-field page. Think of it, you don't need ultimate nanotrading or nuclear missile tracking performance in a Web app. If you do, pick Jamon: http://www.jamon.org/Overview.html, which generates a Java class, you'd normally write to concatenate Strings. I didn't test it, as I don't like extra compilation steps (automatically executed by Maven, but still). Compiled Groovy bytecode is good enough for me - compared to the compiled, yes, strongly-typed Java. The difference would be marginal unless you are doing something complex, which you shouldn't inside a template (see below). Playing with typed Groovy variables vs. def as suggested in this thread, only saved me a couple of milliseconds on that 100-template run.
Templates should not have much procedural logic (internal variables, ifs and loops) anyway - that's the controller's, not view's responsibility. That said, ifs and loops are a must for a template engine. Why would one use Handlebars/Mustache, if he/she can simply call String.replace()?
The rest of the template engines is also irrelevant. No String concatenation (e.g. Velocity, or Freemarker) or interpreted JS-based technology (e.g. Jade) would ever beat the most direct Jamon's approach performance-wise. And being a Java programmer, you want to use your favorite language/syntax: either directly (Jamon) or 90% close to Java, Groovy is (being a scripting-centric concise interpreted Java). I wouldn't comment on Scala - the matter of preference. Other than its allegedly "concise" syntax (less and less relevant with Java 8+) comes at a price. And only matters for complex loops. You do not want to write your entire app inside one template, like I already said. A couple of loops and up to ten if statements max.
And, like everyone mentioned, the intuitive syntax, and ease of use is the key. They drastically reduce the number of bugs. A good (additional) server costs a $1000, while developer salaries - to fix all of the bugs stemming form the complexity of marginal performance optimization, are 100 times higher.

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). Is also uses the Detour technique.
It stores every call (you must manual set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, but also far easier to get meaningful result. Might be worth trying to track down - eBay, maybe?

Resources