Performance-focused desktop-program: Ruby or Go? - ruby

I currently don't know either of the two languages. Design of a piece of software is close to complete.
What intrigues me about each:
Ruby: Enjoyable. Follows thought process. Made for humans.
Go: Good performance. Fast compile times.
I don't know about Ruby's performance. If it's a lot slower than Go, I'll go with the latter (talking about typical speed here).
I'll learn both eventually, but right now, this will determine which one first.
Update: It's a very basic image-editing program. Technical and especially perceived speed should be high. Startup time is especially important.

Sadly, neither language is appropriate for a desktop image editing program.
You haven't told us which desktop you have in mind, so I'll assume it's either Windows or Mac.
Ruby is not appropriate because it fails 2 of your requirements:
it has a terrible startup time because at startup it has to initialize a rather complicated VM, which involves loading quite a big part of its standard library
it's very slow (compared to C/Java/Go) doing the kind of computations that image processing entails
Go is statically linked and is compiled to machine code, so its startup time is excellent and the speed is close to C (i.e. it's the fastest language you can hope to choose after C/C++).
However, Go has no support whatsoever for writing Mac desktop apps (i.e. it has no bridge to Objective-C/Cocoa runtime) and the support for writing Windows desktop apps is extremely poor.
If you're doing Windows, the only languages that give you fast startup time are C/C++/Delphi. C# might have acceptable startup time and it's fast enough for the task (the very popular paint.net is written in C#, and you can find an old version of the code which is BSD-licensed and re-use a lot of it).
For Mac, I would recommend Objective C - it's the native language of the platform, best documented and with the best, free dev tools (XCode). You can use https://github.com/philippec/Pixen as a starting point.

You really need to give us some idea of what you consider to be good and bad performance, because it's a very subjective matter.
For example, people are usually willing to trade a certain amount of technical or perceived speed for a system that is easier to work with or develop. It also matters what you are trying to do. Each language has its own strengths and weaknesses. Ruby may be faster at some things than Go. Then again, if you really need speed, perhaps you should be looking at a language that is closer to the metal, such as C.
Sometimes though, requests for speed from users are subjective too. I once had a system that the users thought was taking too long to do a specific task. There was no way technically to speed it up, so I animated the "Processing ..." window. Because the users could now see something "happening" on the screen, they thought it was going faster. On a stop watch, it actually took a couple of seconds longer.

I think those languages are among the worst you can choose for a performance-critical application. I don't know much about Go, but Ruby is similar to Python (even slower), and Python is slow as hell. From what I've read, Go is much faster than Ruby, but is still something like two or three times slower than other programming languages... It depends on what you are trying to do, of course; e.g., I wouldn't choose either of them for real-time physics or anything like that.
http://shootout.alioth.debian.org/u32/performance.php?test=nbody
Why is go language so slow?
http://attractivechaos.github.com/plb/
I've been working with Python for a couple of years and it's really slow, and I'm sure you will hate it; Ruby is very similar to Python and slower still. Go is too new, so I don't really know much about it and can't tell.

Related

Program to measure small changes in reaction-time

I need some advice on writing a program that will be used as part of a psychology experiment. The program will track small changes in reaction time. The experimental subject will be asked to solve a series of very simple math problems (such as "2x4=" or "3+5="). The answer is always a single digit. The program will determine the time between the presentation of the problem and the keystroke that answers it. (Typical reaction times are on the order of 200-300 milliseconds.)
I'm not a professional programmer, but about twenty years ago, I took some courses in PL/I, Pascal, BASIC, and APL. Before I invest the time in writing the program, I'd like to know whether I can get away with using a programming package that runs under Windows 7 (this would be the easiest approach for me), or whether I should be looking at a real-time operating system. I've encountered conflicting opinions on this matter, and I was hoping to get some expert consensus.
I'm not relishing the thought of installing some sort of open-source Linux distribution that has real-time capabilities -- but if that's what it takes to get reliable data, then so be it.
Affect seems like it could save you the programming: http://ppw.kuleuven.be/leerpsy/affect4/index2.php. Concerning accuracy on a Windows machine, read this.

What makes Ruby slow? [closed]

Ruby is slow at certain things. But what parts of it are the most problematic?
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Has there been recent work that has significantly improved performance?
Ruby is slow. But what parts of it are the most problematic?
It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember variable names per context to allow for eval, so its frames and method calls are slower. Also it lacks a good JIT compiler currently, though MRI 1.9 has a bytecode compiler (which is better), and jruby compiles it down to java bytecode, which then (can) compile via the HotSpot JVM's JIT compiler, but it ends up being about the same speed as 1.9.
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
From some of the graphs at http://www.igvita.com/2009/06/13/profiling-ruby-with-googles-perftools/ I'd say it takes about 10%, which is quite a bit -- you can decrease that hit by increasing the malloc_limit in gc.c and recompiling.
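If you want to measure that hit yourself rather than trust the graphs, a rough sketch on 1.9+ (my example, with throwaway allocation-heavy work standing in for your real code) is to wrap the workload with GC::Profiler:

    # Rough way to see what share of a workload's wall time goes to GC (Ruby 1.9+).
    require 'benchmark'

    GC::Profiler.enable
    elapsed = Benchmark.realtime do
      100_000.times { Array.new(100) { |i| i.to_s } }   # allocation-heavy dummy work
    end
    puts "total: #{elapsed.round(3)}s"
    puts "in GC: #{GC::Profiler.total_time.round(3)}s"
    GC::Profiler.disable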
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?
Ruby 1.8 "didn't" implement basic math it implemented Numeric classes and you'd call things like Fixnum#+ Fixnum#/ once per call--which was slow. Ruby 1.9 cheats a bit by inlining some of the basic math ops.
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Things like eval are hard to implement efficiently, though much work can be done, I'm sure. The kicker for Ruby is that it has to accommodate somebody in another thread changing the definition of a class spontaneously, so it has to be very conservative.
Has there been recent work that has significantly improved performance?
1.9 is roughly a 2x speedup. It's also more space efficient. JRuby is constantly trying to improve speed-wise [and probably spends less time in the GC than MRI]. Besides that, I'm not aware of much except little hobby things I've been working on. Note also that 1.9's strings are at times slower because of encoding friendliness.
Ruby is very good for delivering solutions quickly. Less so for delivering quick solutions. It depends what kind of problem you're trying to solve. I'm reminded of the discussions on the old CompuServe MSBASIC forum in the early 90s: when asked which was faster for Windows development, VB or C, the usual answer was "VB, by about 6 months".
In its MRI 1.8 form, Ruby is - relatively - slow to perform some types of computationally-intensive tasks. Pretty much any interpreted language suffers in that way in comparison to most mainstream compiled languages.
The reasons are several: some fairly easily addressable (the primitive garbage collection in 1.8, for example), some less so.
1.9 addresses some of the issues, although it's probably going to be some time before it becomes generally available. Some of the other implementations that target pre-existing runtimes -- JRuby, IronRuby, MagLev for example -- have the potential to be significantly quicker.
Regarding mathematical performance, I wouldn't be surprised to see fairly slow throughput: it's part of the price you pay for arbitrary precision. Again, pick your problem. I've solved 70+ of the Project Euler problems in Ruby with almost no solution taking more than a minute to run. How fast do you need it to run, and how soon do you need it?
The most problematic part is "everyone".
Bonus points if that "everyone" didn't really use the language, ever.
Seriously, 1.9 is much faster and now on par with Python, and JRuby is faster than Jython.
Garbage collectors are everywhere; Java has one, for example, and it's faster than C++ at dynamic memory handling. Ruby isn't well suited to number crunching; but few languages are, so if you have computationally intensive parts in your program, in any language, you'd better rewrite them in C (Java is fast at math thanks to its primitive types, but it paid dearly for them -- they're clearly #1 among the ugliest parts of the language).
As for dynamic features: they aren't fast, but code without them in static languages can be even slower; for example, Java would use an XML config where Ruby would use a DSL, and that would likely be SLOWER since XML parsing is costly.
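For what that XML-vs-DSL point looks like in practice, here is a hypothetical configuration DSL (the names are made up for illustration): the "config file" is plain Ruby, so reading it is just executing it, with no XML parsing step at all.

    # Hypothetical config DSL: the block is ordinary Ruby, so "parsing" the
    # configuration is simply executing it.
    class Config
      attr_reader :settings

      def initialize(&block)
        @settings = {}
        instance_eval(&block)   # run the block in the context of this object
      end

      def set(key, value)
        @settings[key] = value
      end
    end

    config = Config.new do
      set :host, "localhost"
      set :port, 8080
    end

    config.settings   # => {:host=>"localhost", :port=>8080}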
Hmm - I worked on a project a few years ago where I scraped the barrel with Ruby performance, and I'm not sure much has changed since. Right now it's caveat emptor - you have to know not to do certain things, and frankly games / realtime applications would be one of them (since you mention OpenGL).
The culprit for killing interactive performance is the garbage collector - others here mention that Java and other environments have garbage collection too, but Ruby's has to stop the world to run. That is to say, it has to stop running your program, scan through every register and memory pointer from scratch, mark the memory that's still in use, and free the rest. The process can't be interrupted while this happens, and as you might have noticed, it can take hundreds of milliseconds.
Its frequency and length of execution are proportional to the number of objects you create and destroy, but unless you disable it altogether, you have no control. My experience was that there were several unsatisfactory strategies to smooth out my Ruby animation loop:
GC.disable / GC.enable around critical animation loops, with maybe an opportunistic GC.start to force a collection when it can't do any harm (sketched after this list). Because my target platform at the time was a 64MB Windows NT machine, this caused the system to run out of memory occasionally. But fundamentally it's a bad idea: unless you can pre-calculate how much memory you might need before doing this, you're risking memory exhaustion.
Reduce the number of objects you create so the GC has less work to do (reduces the frequency / length of its execution)
Rewrite your animation loop in C (a cop-out, but the one I went with!)
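A minimal sketch of the first strategy above (the frame functions are hypothetical stubs, defined only so the loop runs as-is): keep the collector out of the frame-critical section, then give it an explicit chance to run where a pause is harmless.

    def update_scene;    1000.times.map { |i| [i, i * 2] }; end  # allocation-heavy dummy work
    def render_frame;    sleep 0.016; end                        # pretend to draw at ~60 fps
    def between_scenes?; rand < 0.05; end                        # hypothetical "safe moment" check

    10.times do
      GC.disable                    # no collections during the frame-critical section
      update_scene
      render_frame
      GC.enable
      GC.start if between_scenes?   # force a collection only where a pause can't hurt
    end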
These days I would probably also see if JRuby would work as an alternative runtime, as I believe it relies on Java's more sophisticated garbage collector.
The other major performance issue I've found is basic I/O, from when I tried to write a TFTP server in Ruby a while back (yeah, I pick all the best languages for my performance-critical projects; this was just an experiment). The absolute simplest, tightest loop to simply respond to one UDP packet with another, containing the next piece of a file, must have been about 20x slower than the stock C version. I suspect there might have been some improvements to make there based around using low-level I/O (sysread etc.), but the slowness might just be in the fact that there is no low-level byte data type -- every little read is copied out into a String. This is just speculation though; I didn't take this project much further, but it warned me off relying on snappy I/O.
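For the record, the kind of loop I mean looked roughly like this (a from-memory sketch, not the original code; the port, block size and payload are arbitrary):

    # Respond to each UDP request with the next chunk of a payload -- the
    # simplest possible "serve a file over UDP" loop, not a real TFTP server.
    require 'socket'

    BLOCK = 512
    data  = File.exist?("payload.bin") ? File.binread("payload.bin") : "x" * 10_000
    sock  = UDPSocket.new
    sock.bind("127.0.0.1", 6969)                  # arbitrary test port

    loop do
      request, addr = sock.recvfrom(4)            # client sends a 4-byte block index
      n     = request.unpack1("N")                # big-endian unsigned 32-bit
      chunk = data.byteslice(n * BLOCK, BLOCK) || ""
      sock.send(chunk, 0, addr[3], addr[1])       # addr[3] = peer IP, addr[1] = peer port
    end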
The main recent speed increase, though I'm not fully up to date here, is that the virtual machine implementation was redone for 1.9, resulting in faster code execution. However I don't think the GC has changed, and I'm pretty sure there's nothing new on the I/O front. But I'm not fully up to date on bleeding-edge Ruby, so someone else might want to chip in here.
I assume that you're asking, "what particular techniques in Ruby tend to be slow."
One is object instantiation. If you are doing large amounts of it, you want to look at (reasonable) ways of reducing that, such as using the flyweight pattern, even if memory usage is not a problem. In one library that I reworked so it wasn't creating lots of very similar objects over and over again, I doubled the overall speed of the library.
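A small flyweight sketch of what I mean (illustrative names, not the actual library I reworked): hand back one shared, frozen instance per distinct key instead of allocating a fresh object for every identical request.

    class Color
      attr_reader :r, :g, :b
      @pool = {}                                   # class-level cache of shared instances

      def self.get(r, g, b)
        @pool[[r, g, b]] ||= new(r, g, b).freeze   # reuse an existing instance if we have one
      end

      def initialize(r, g, b)
        @r, @g, @b = r, g, b
      end
    end

    a = Color.get(255, 0, 0)
    b = Color.get(255, 0, 0)
    a.equal?(b)   # => true -- same object reused, no second allocation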
Steve Dekorte: "Writing a Mandelbrot set calculator in a high level language is like trying to run the Indy 500 in a bus."
http://www.dekorte.com/blog/blog.cgi?do=item&id=4047
I recommend learning various tools so you can use the right tool for the job. Matrix transformations, for instance, can be done efficiently through a high-level API that wraps tight loops of arithmetic-intensive computation. See the RubyInline gem for an example of embedding C or C++ code in a Ruby script.
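A sketch of how RubyInline is typically used (assuming the gem is installed and a C compiler is available; the factorial function is just a placeholder for your own hot loop):

    require 'inline'

    class FastMath
      inline do |builder|
        builder.c <<-'C'
          long factorial(int n) {
            long result = 1;
            int i;
            for (i = 2; i <= n; i++) result *= i;
            return result;
          }
        C
      end
    end

    puts FastMath.new.factorial(10)   # => 3628800, computed in compiled C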
There is also the Io language, which is much slower than Ruby, yet it is used to render movies efficiently at Pixar and outperforms raw C on vector arithmetic by using SIMD acceleration.
http://iolanguage.com
https://renderman.pixar.com/products/tools/it.html
http://iolanguage.com/scm/git/checkout/Io/docs/IoGuide.html#Primitives-Vector
Ruby 1.9.1 is about twice as fast as PHP, and a little bit faster than Perl, according to some benchmarks.
(Update: My source is this (screenshot). I don't know what his source is, though.)
Ruby is not slow. The old 1.8 is, but the current Ruby isn't.
Ruby is slow because it was designed to optimize the programmers experience, not the program's execution time. Slowness is just a symptom of that design decision. If you would prefer performance to pleasure, you should probably use a different language. Ruby's not for everything.
IMO, dynamic languages are all slow in general. They do at runtime what static languages do at compile time: syntax checking, interpreting, and things like type checking and conversion. This is inevitable, so Ruby is slower than C/C++/Java; correct me if I am wrong.

Does Wirth's law still hold true?

Adage made by Niklaus Wirth in 1995:
«Software is getting slower more rapidly than hardware becomes faster»
Do you think it's actually true?
How should you measure "speed" of software? By CPU cycles or rather by time you need to complete some task?
What about software that is actually getting faster and leaner (measured by CPU cycles and MB of RAM) and more responsive with new versions, like Firefox 3.0 compared with 2.0, Linux 2.6 compared with 2.4, or Ruby 1.9 compared to 1.8? Or completely new software that is an order of magnitude faster than the old stuff (like Google's V8 engine)? Doesn't that negate the law?
Yes I think it is true.
How do I measure the speed of software? Well, the time to complete tasks is a relevant indicator. As a user of software I do not care whether there are 2 or 16 cores in my machine. I want my OS to boot fast, my programs to start fast, and I absolutely do not want to wait for simple things like opening files. Software just has to feel fast.
So... when booting Windows Vista, I am not watching fast software.
Software / Frameworks often improve their performance. That's great but these are mostly minor changes. The exception proves the rule :)
In my opinion it is all about feeling. And it feels like computers were faster years ago. Of course I couldn't run current games and software on those old machines. But they were just faster :)
It's not that software becomes slower, it's that its complexity increases.
We now build upon many levels of abstraction.
When was the last time people on SO coded in assembly language?
Most never have and never will.
It is wrong. The correct version is:
Software is getting slower at the same rate as hardware becomes faster.
The reason is that this is mostly determined by human patience, which stays the same.
It also neglects to mention that the software of today does more than 30 years ago, even if we ignore eye candy.
In general, the law holds true. As you have stated, there are exceptions "that prove the rule". My brother recently installed Win3.1 on his 2GHz+ PC and it boots in a blink of an eye.
I guess there are many reasons why the law holds:
Many programmers entering the profession now have never had to consider systems with limited speed or resources, so they never really think about the performance of their code.
There's generally a higher importance on getting the code written for deadlines and performance tuning usually comes last after bug fixing / new features.
I find Firefox's lack of an immediate splash dialog annoying: it takes a while for the main window to appear after starting the application, and I'm never sure whether the click 'worked'. OpenOffice also suffers from this.
There are a few articles on the web about changing the perception of speed of software without changing the actual speed.
EDIT:
In addition to the above points, an example of the low importance given to efficiency is this site, or rather, most of the other Q&A sites. This site has always been developed to be fast and responsive, and it shows. Compare this to the other sites out there: I've found phpBB-based sites are flexible but slow. Google is another example of putting speed high up in importance (it even tells you how long the search took) -- compare with the other search engines that were around when Google started (now they're all fast, thanks to Google).
It takes a lot of effort, skill and experience to make fast code which is something I found many programmers lack.
From my own experience, I have to disagree with Wirth's law.
When I first approached a computer (in the '80s), the time to display a small still picture was perceptible. Today my computer can decode and display 1080p AVCHD movies in real time.
Another indicator is the frame rate of video games. Not long ago it used to be around 15 fps. Today 30 to 60 fps is not uncommon.
Quoting from a UX study:
The technological advancements of 21 years have placed modern PCs in a completely different league of varied capacities. But the “User Experience” has not changed much in two decades. Due to bloated code that has to incorporate hundreds of functions that average users don’t even know exist, let alone ever utilize, the software companies have weighed down our PCs to effectively neutralize their vast speed advantages.
Detailed comparison of UX on a vintage Mac and a modern Dual Core: http://hubpages.com/hub/_86_Mac_Plus_Vs_07_AMD_DualCore_You_Wont_Believe_Who_Wins
One of the issues of slow software is a result of most developers using very high end machines with multicore CPUs and loads of RAM as their primary workstation. As a result they don't notice performance issues easily.
Part of their daily activity should be running their code on the slower, more mainstream hardware that the expected clients will be using. This will show the real-world performance and allow them to focus on improving bottlenecks. Even running within a VM with limited resources can aid in this review.
Faster hardware shouldn't be an excuse for creating slow sloppy code, however it is.
My machine is getting slower and clunkier every day. I attribute most of the slowdown to running antivirus. When I want to speed up, I find that disabling the antivirus works wonders, although I am apprehensive like being in a seedy brothel.
I think that Wirth's law was largely caused by Moore's law: if your code ran slow, you'd just disregard it, since soon enough it would run fast enough anyway. Performance didn't matter.
Now that Moore's law has changed direction (more cores rather than faster CPUs), computers don't actually get much faster, so I'd expect performance to become a more important factor in software development (until a really good concurrent programming paradigm hits the mainstream, anyway). There's a limit to how slow software can be while still being useful, y'know.
Yes, software nowadays may be slower or faster, but you're not comparing like with like. The software now has so much more capability, and a lot more is expected of it.
Let's take PowerPoint as an example. If I created a slideshow with PowerPoint from the early nineties, I could have a slideshow with pretty colours fairly easily, nice text, etc. Now, it's a slideshow with moving graphics, fancy transitions, and nice images.
The point is, yes, software is slower, but it does more.
The same holds true of the people who use the software. Back in the 70s, to create a presentation you had to create your own transparencies, maybe even using a pen :-). Now, if you did the same thing, you'd be laughed out of the room. It takes the same time, but the quality is higher.
This (in my opinion) is why computers don't give you gains in productivity: you spend the same amount of time doing 'the job'. But if you use today's software, your results look more professional; you gain in quality of work.
Skizz and Dazmogan have it right.
On the one hand, when programmers try to make their software take as few cycles as possible, they succeed, and it is blindingly fast.
On the other hand, when they don't, which is most of the time, their interest in "Galloping Generality" uses up every available cycle and then some.
I do a lot of performance tuning. (My method of choice is random halting.) In nearly every case, the reason for the slowness is over-design of class and data structure.
Oddly enough, the reason usually given for excessively event-driven design and redundant data structures is "efficiency".
As Bompuis says, we build upon many layers of abstraction. That is exactly the problem.
Yes it holds true. You have given some prominent examples to counter the thesis, but bear in mind that these examples are developed by a big community of quite knowledgeable people, who are more or less aware of good practices in programming.
People working with the kernel are aware of different CPU's architectures, multicore issues, cache lines, etc. There is an interesting ongoing discussion about inclusion of hardware performance counters support in the mainline kernel. It is interesting from the 'political' point of view, as there is a conflict between the kernel people and people having much experience in performance monitoring.
People developing Firefox understand that the browser should be "lightweight" and fast in order to be popular, and to some extent they manage to do a good job.
New versions of software are supposed to be run on faster hardware in order to deliver the same user experience. But is the price justified? How can we assess whether the functionality was added in an efficient way?
But coming back to the main subject: many people, after finishing their studies, are not aware of the issues related to performance and concurrency (or even worse, they do not care). For quite a long time Moore's law provided a stable performance boost. Thus people wrote mediocre code and nobody even noticed that there was something wrong with inefficient algorithms, data structures or more low-level things.
Then some limitations came into play (thermal efficiency, for example) and it is no longer possible to get 'easy' speed for a few bucks. People who just depend on hardware performance improvements might get a cold shower. On the other hand, people who have in-depth knowledge of algorithms, data structures and concurrency issues (quite difficult to recruit these...) will continue to write good applications, and their value on the job market will increase.
Wirth's law should not only be interpreted literally; it is also about code bloat, violating the keep-it-simple-stupid rule, and people who waste the opportunity to use the 'faster' hardware.
Also if you happen to work in the area of HPC then these issues become quite obvious.
In some cases it is not true: the frame rate of games and the display/playing of multimedia content is far superior today than it was even a few years ago.
In several aggravatingly common cases, the law holds very, very true. When opening the "My Computer" window in Vista to see your drives and devices takes 10-15 seconds, it feels like we are going backward. I really don't want to start any controversy here but it was that as well as the huge difference in time needed to open Photoshop that drove me off of the Windows platform and on to the Mac. The point is that this slowdown in common tasks is serious enough to make me jump way out of my former comfort zone to get away from it.
I can't find the sense in it. Why is this sentence a law?
You can never compare software and hardware; they are too different.
Hardware is physical material, and software is written code.
The connection is only that software has to drive the hardware. After a step is executed in hardware, the software needs a completion signal so the next software instruction can be issued.
Why would I slow software down? We always try to make it faster!
Making hardware faster involves a lot of real, physical work (changing circuit modules or even physical parts of a computer).
It may make sense if Wirth means doing this within one computer (= one software and hardware system).
To get higher hardware speed you need to know how the hardware works: the number of parallel inputs and outputs at any moment and the frequency of possible switches per second. Last but not least, it's important that the different hardware boards run at the same frequency, or at frequencies related by an integer factor.
So perhaps the software may very easily slow down automatically if you change something in the hardware. Wirth was thinking much more in terms of hardware; he is one of the great inventors in computing from the German-speaking area.
The other way around is not easy. You have to know the system software of a computer very precisely to make the hardware faster by changing the software (= system software, machine programs). And if you use more layers, you have almost no direct influence on the speed of the hardware.
Perhaps this is the explanation of Wirth's way of thinking... I got it!

With so much system resources available, how sure are you your code is tuned?

With CPUs getting ever faster, hard disks spinning, bits flying around so quickly, and network speeds increasing as well, it's not as simple to tell bad code from good code as it used to be.
I remember a time when you could optimize a piece of code and undeniably perceive an improvement in performance. Those days are almost over. Instead, I guess we now have a set of rules that we follow like "Don't declare variables inside loops" etc. It's great to adhere to these so that you write good code by default. But how do you know it can't be improved even further without some tool?
Some may argue that a couple of nanoseconds won't really make that big a difference these days. The truth is, we are stuck with so many layers that you get a staggering effect.
I'm not saying we should optimize every little millisecond out of our code as that will be expensive and unfeasible. I believe we have to do our best, given our time constraints, to write efficient code as well.
I'm just interested to know what tools you use to profile and measure performance of code, if at all.
I think that optimization should be thought of not as looking at each line of code but as asking what the asymptotic complexity of your algorithm is. For example, bubble sort is probably one of the worst sorting algorithms you could use in terms of performance: it takes the longest. Quicksort and mergesort are faster at sorting and should always be used in preference to bubble sort.
If you keep optimization always in your mind when designing a solution to a problem, then you should be able to write readable code, which other developers will approve of. Also, if you are programming in a higher level language that will be compiled before it is run, remember that compilers make some awesome optimizations nowadays that you or I may not think of, and also (more importantly) do not have to worry about.
Stick with a good, low big-O, and it should be optimized pretty well. If you are working with millions of items or more in some dataset, then look for an O(log n) algorithm. They work great for large tasks and keep your code optimized.
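As a quick, hedged illustration of how much the big-O choice dominates line-level tweaks (my example, in Ruby): a deliberately naive O(n^2) bubble sort against the language's built-in O(n log n) sort on the same data.

    require 'benchmark'

    # Deliberately naive O(n^2) bubble sort.
    def bubble_sort(a)
      a = a.dup
      loop do
        swapped = false
        (a.length - 1).times do |i|
          if a[i] > a[i + 1]
            a[i], a[i + 1] = a[i + 1], a[i]
            swapped = true
          end
        end
        break unless swapped
      end
      a
    end

    data = Array.new(5_000) { rand(100_000) }
    Benchmark.bm(12) do |x|
      x.report("bubble sort") { bubble_sort(data) }
      x.report("Array#sort")  { data.sort }
    end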
Let the compilers work on the line by line code optimizations so you can focus on the solutions.
There are times that do warrant line by line optimizations, and if that is the case that you need that much speed, maybe you might want to look into assembly so that you can control every line that is written.
There's a big difference between "good" code and "fast" code. They aren't exactly separate from each other either, but "fast" code doesn't mean "good". Often times, "fast" actually means bad code because readability compromises must be made to make it fast.
The way I look at it, hardware is cheap, programmers are expensive. Unless there is a serious performance problem with some piece of code, you should never have to worry about speed. If there are performance problems, you'll notice them. Only when you notice the performance problem on good hardware should you have to worry about optimization (in my opinion)
If you reach the point where your code is slow but you can't figure out why, I'd use a profiler like ANT, or dotTrace if you're in the .NET world (I'm sure there are others out there for other platforms and languages). They're pretty useful, but I've only ever had one situation where I needed a profiler to identify the problem. Now that I know the issue, I won't need a profiler again to tell me it's a problem, because I'll never forget the amount of time I spent trying to optimize it.
This is absolutely a valid concern, but not for most developers. Most developers are concerned with getting a product that works to their employer. Optimized code is seldom a requirement.
The best way to make sure your code is fast is to benchmark or profile it. A lot of compiler optimizations create non-intuitive oddities in the performance of a programmer's code, so in the end measurement becomes essential.
In my experience, Rational Quantify has given me the best results in terms of code tuning. It is not free, but it is very fully featured and seems to have given me the most useful results.
In terms of free tools, check out gprof or oprofile, if you are on a Unix environment. They are not as good as some of the commercial tools, but can often point you in the right direction.
On a side note, I am almost always surprised at what profilers turn up the first time I use them. You can have intuition as to where code may be bottlenecking, and it can often be completely wrong.
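A tiny example of the sort of thing a measurement turns up that intuition misses (my own, in Ruby): two ways to build a string that look interchangeable but behave very differently once you time them.

    require 'benchmark'

    N = 50_000
    Benchmark.bm(12) do |x|
      x.report("+= (copy)") do
        s = ""
        N.times { s += "x" }    # allocates a brand-new String on every iteration
      end
      x.report("<< (append)") do
        s = ""
        N.times { s << "x" }    # mutates the same String in place
      end
    end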
Almost all code I write is plenty fast enough. On the rare occasions when it isn't, for C, C++, and Objective Caml I use the venerable gprof and the excellent valgrind with its superb visualizer kcachegrind (part of the KDE SDK; don't be fooled by the out-of-date code on sourceforge).
The MLton Standard ML compiler and the Glasgow Haskell Compiler both ship with excellent profilers.
I wish there were a better profiler for Lua.
Uh, a profiler maybe? There are ones available for almost all platforms and languages.

Image Recognition

I'd like to do some work with the nitty-gritty of computer imaging. I'm looking for a way to read single pixels of data, analyze them programmatically, and change them. What is the best language to use for this (Python, C++, Java...)? What is the best file format?
I don't want any super fancy software/APIs... I'm looking for the bare basics.
If you need speed (you'll probably always want speed with image processing) you definitely have to work with raw pixel data.
Java has some real disadvantages, as you cannot access memory directly, which makes pixel access quite slow compared to languages that can.
C++ is definitely the language of choice for production image processing. But you can, for example, also use C#, as it allows unsafe code in specific areas. (Take a look at the Scan0 pointer property of the BitmapData class.)
I've used C# successfully for image processing applications, and they are definitely much faster than their Java counterparts.
I would not use any scripting language or Java for such a purpose.
It's very easy to manipulate the large multi-dimensional or complex arrays of pixel information that make up pictures using high-level languages such as Python. There's a library called PIL (the Python Imaging Library) that is quite useful and will let you do general filters and transformations (change the brightness, soften, desaturate, crop, etc.) as well as manipulate the raw pixel data.
It is the easiest and simplest image library I've used to date and can be extended to do whatever it is you're interested in (edge detection in very little code, for example).
I studied Artificial Intelligence and Computer Vision, thus I know pretty well the kind of tools that are used in this field.
Basically: you can use whatever you want as long as you know how it works behind the scene.
Now depending on what you want to achieve, you can either use:
The C language, but you will lose a lot of time on debugging and memory management when implementing your algorithms. So theoretically this is the fastest language for this kind of job, but if your algorithms are not computationally efficient (in terms of complexity) or if you lose too much time debugging, it is clearly not worth it. I would advise first implementing your application in another language; you can always optimize small parts of your code with C bindings later.
Octave/MATLAB: a very efficient language, almost as efficient as C, and you can write very elegant and succinct algorithms. If you are into vectorization and matrix and linear operations, you should go with this. However, you won't be able to develop a whole application in this language; it's more focused on algorithms, but you can always develop an interface using another language later.
Python: an all-in-one, elegant and accessible language, used in gigantic large-scale applications such as Google and Facebook. You can do pretty much everything you want with Python, any kind of application. It will be perfectly suited if you want to make a full application (with client interaction and all, not only algorithms), or if you want to quickly draft a prototype using existing libraries, since Python has a very large set of high-quality libraries, like OpenCV. However, if you only want to write algorithms, you are better off with Octave/MATLAB.
The answer that was selected as a solution is very biased, and you should be careful about this kind of archaic comment.
Nowadays, hardware is cheaper than wetware (humans), and thus you should use languages in which you can produce results faster, even if it's at the cost of a few CPU cycles or some memory.
Also, a lot of people tend to think that as long as you implement your software in C/C++, you have reached the Holy Grail of speed: this is just not true. First, because algorithmic complexity matters a lot more than the language you are using (a bad algorithm will never beat a better algorithm, even if implemented in the slowest language in the universe), and secondly because high-level languages nowadays do a lot of caching and speed optimization for you, and this can make your program run even faster than in C/C++.
Of course, you can always do everything of the above in C/C++, but how much of your time are you willing to waste to reinvent the wheel?
Not only will C/C++ be faster, but most of the image processing sample code you find out there will be in C as well, so it will be easier to incorporate things you find.
If you are looking to do numerical work on your images (think matrices) and you are into Python, check out http://www.scipy.org/PyLab - this is basically the ability to do MATLAB-style work in Python; a buddy of mine swears by it.
(This might not apply for the OP who only wanted the bare basics -- but now that the speed issue was brought up, I do need to write this, just for the record.)
If you really need speed, it's better to forget about working on the pixel-by-pixel level, and rather see whether the operations that you need to perform could be vectorized. For example, for your C/C++ code you could use the excellent Intel IPP library (no, I don't work for Intel).
It depends a little on what you're trying to do.
If runtime speed is your issue, then C++ is the best way to go.
If speed of development is an issue, though, I would suggest looking at Java. You said that you wanted low-level manipulation of pixels, which Java will do for you. But the other thing that might be an issue is the handling of the various file formats. Java has some very nice APIs to deal with reading and writing various image formats to file (in particular the Java2D library; you can choose to ignore the higher levels of the API).
If you do go for the C++ option (or Python, come to think of it), I would again suggest the use of a library to get you over the startup issues of reading and writing files. I've previously had success with libgd.
What language do you know the best? To me, this is the real question.
If you're going to spend months and months learning one particular language, then there's no real advantage in using Python or Java just for their (to be proven) development speed.
I'm particularly proficient in C++ and I think that for this particular task I can be as speedy as a Java programmer, for example. With the aid of some good library (OpenCV comes to mind) you can create anything you need in a matter of a couple of lines of C++ code, really.
Short answer: C++ and OpenCV
Short answer? I'd say C++; you have far more flexibility in manipulating raw chunks of memory than in Python or Java.

Resources