Data structure used for implementing spreadsheets - data-structures

What is the data structure used by spreadsheet programs like MS Excel?

Maybe, probably, a sparse matrix:
http://en.wikipedia.org/wiki/Sparse_matrix
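As a hedged sketch (class and method names are hypothetical), a sparse spreadsheet grid can be stored as a dictionary keyed by (row, column), so empty cells cost nothing:

class SparseSheet:
    # Minimal sketch: only non-empty cells occupy memory.
    def __init__(self):
        self.cells = {}  # (row, col) -> value

    def set(self, row, col, value):
        if value in (None, ""):
            self.cells.pop((row, col), None)   # clearing a cell frees its slot
        else:
            self.cells[(row, col)] = value

    def get(self, row, col):
        return self.cells.get((row, col), "")  # empty cells read back as blank

sheet = SparseSheet()
sheet.set(0, 0, 42)
sheet.set(1048575, 16383, "far corner")        # cheap even at Excel-sized coordinates
print(sheet.get(0, 0), sheet.get(5, 5))        # prints: 42  (blank)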

While it is not exactly Excel, OpenOffice's spreadsheet program is open source. It's a fairly large code base, but nevertheless giving it a peek might give you a better understanding of how such an application is implemented:
http://contributing.openoffice.org/programming.html

I got this question in an interview today.
The answer they were looking for (because I asked the interviewer at the end) was to implement a class structure that used multiple object types: formulas, references, and numerics.
Not much more detail to offer, but suffice it to say that programs like Google Docs are much more interesting (to code) than they seem.

I expect it to use many.
For example, an AST to recalculate formulas (see this question).
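As a hedged, illustrative sketch (not how any real spreadsheet is written), the object model and the AST idea might combine like this: each cell holds either a literal value or a small expression tree that references other cells:

class Num:
    # Literal numeric cell content.
    def __init__(self, value):
        self.value = value
    def eval(self, sheet):
        return self.value

class Ref:
    # Reference to another cell, e.g. "A1".
    def __init__(self, key):
        self.key = key
    def eval(self, sheet):
        return sheet.value(self.key)

class Add:
    # Formula node: sum of two sub-expressions.
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self, sheet):
        return self.left.eval(sheet) + self.right.eval(sheet)

class Sheet:
    # Sparse store of cell name -> AST node (no cycle detection in this sketch).
    def __init__(self):
        self.cells = {}
    def set(self, key, node):
        self.cells[key] = node
    def value(self, key):
        return self.cells[key].eval(self)

s = Sheet()
s.set("A1", Num(2))
s.set("A2", Num(3))
s.set("A3", Add(Ref("A1"), Ref("A2")))  # the formula =A1+A2
print(s.value("A3"))                    # 5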

Possibly a multi-dimensional array.

Related

Recommendations for Fast Multipole Method implementation?

I'm interested in implementing the Fast Multipole Method to efficiently simulate a system of repulsive particles.
I've found a large collection of references discussing FMM, but none seem very approachable for non-mathematicians who want to fully understand the algorithm.
Can you recommend a ground-up reference that clearly explains the mathematics behind the process, and includes pseudocode exemplifying a proper implementation?
I am by no means an expert in FMM, but this Java implementation and introduction is the best source I've found so far for explaining it carefully and slowly. The paper is good at defining terms before using them, and the code at least is useful as a reference point. The math still gets hairy very quickly, but it is what it is :)
A pedestrian introduction to fast multipole methods is a close second. It doesn't explain the actual details of a working FMM implementation, but it's a good introduction to the basic ideas.
I like the short course on FMM. It begins with FMM in 1D, then uses the theory of complex variables to do FMM in 2D. And then there is the crazy 3D version, which uses the theory of spherical harmonic functions and which I guess can be very difficult for non-mathematicians. But if you only need FMM in 2D you should be fine.
Unfortunately, no pseudocode is given there.
But do you really need the accuracy of FMM? You might be fine with the Barnes-Hut algorithm.
After running into a similar issue to you, I ended up writing a fully-documented Python fast multipole method implementation, pybbfmm. I've also written a short, mathematics-free tutorial on how the method works. Together, I think they're substantially more accessible than any of the other presentations I could find.
(meta: Although this is effectively a linkpost, the OP is explicitly asking for a link. I've added what I think was missing from the last one - the name of the library - but I'm not sure how else to offer this answer except as a name and a link. Certainly it doesn't feel any more linkpost-y than the accepted answer. If this one gets deleted as well, I'll give up.)

Making a spell check utility

This idea just popped into my head, so I don't have any code to show for it, but I was curious to know the answer. How is spell check implemented on most major word processors? I'm most curious to know what kind of data structures would be used in the creation of such a utility. Also, references to algorithms would be nice answers as well.
For a basic guide in Python, have a look here.
Also, you might want to look at this past question.
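In the spirit of the Python guide linked above (a hedged sketch, not production code): keep known words in a set, and suggest corrections by generating every string one edit away from the misspelling and intersecting with the set. Real spell checkers often swap the set for a trie or other hash-based structure and rank candidates by word frequency.

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    # All strings one edit (delete, transpose, replace, insert) away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, dictionary):
    # Prefer the word itself if known, otherwise known words one edit away.
    if word in dictionary:
        return {word}
    return edits1(word) & dictionary or {word}

known_words = {"spell", "spill", "shell", "check"}
print(suggest("spel", known_words))  # {'spell'}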

What are algorithms and data structures in layman’s terms?

I currently work with PHP and Ruby on Rails as a web developer. My question is why would I need to know algorithms and data structures? Do I need to learn C, C++ or Java first? What are the practical benefits of knowing algorithms and data structures? What are algorithms and data structures in layman’s terms? (As you can tell unfortunately I have not done a CS course.)
Please provide as much information as possible and thank you in advance ;-)
Data structures are ways of storing stuff, just like you can put stuff in stacks, queues, heaps and buckets - you can do the same thing with data.
Algorithms are recipes or instructions, the quick start manual for your coffee maker is an algorithm to make coffee.
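To make the storage part concrete, here is a tiny toy illustration of two of those shapes in Python, a stack and a queue:

from collections import deque

stack = []                  # a stack: the last thing you put on is the first thing you take off
stack.append("plate 1")
stack.append("plate 2")
print(stack.pop())          # plate 2

queue = deque()             # a queue: the first in line is served first
queue.append("customer 1")
queue.append("customer 2")
print(queue.popleft())      # customer 1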
Algorithms are, quite simply, the steps by which you do something. For instance, the Coffee Maker Algorithm would run something like:
Turn on Coffee Maker
Grind Coffee Beans
Put in filter and place coffee in filter
Add Water
Start brewing process
Drink coffee
A data structure is a means by which we store information in an organized fashion. For further info, check out the Wikipedia Article.
An algorithm is a list of instructions and data structures are ways to represent information. If you're writing computer programs then you're already using algorithms and data structures even if you don't know what the words mean.
I think the biggest advantages in knowing standard algorithms and data structures are:
You can communicate with other programmers using a common language.
Other people will be able to understand your code once you've left.
You will also learn better methods for solving common problems. You could probably solve these problems eventually anyway even without knowing the standard way to do it, but you will spend a lot of time reinventing the wheel and it's unlikely your solutions will be as good as those that thousands of experts have worked on and improved over the years.
An algorithm is a sequence of well defined steps leading to the solution of a type of problem.
A data structure is a way to store and organize data to facilitate access and modifications.
The benefit of knowing standard algorithms and data structures is that they are mostly better than what you could develop yourself. They are the result of months or even years of work by people who are far more intelligent than the majority of programmers. Knowing a range of data structures and algorithms allows you to fit a problem roughly to a data structure and/or algorithm and tweak as required.
In the classic "cooking/baking equivalent", algorithms are recipes and data structures are your measuring cups, your baking sheets, your cookie cutters, mixing bowls and essentially any other tool you would be using (your cooker is your compiler/interpreter, though).
[Image of the book's cover] (source: mit.edu)
This book is the bible on algorithms. In general, data structures relate to how to organize your data to access it in memory, and algorithms are methods / small programs to resolve problems (ex: sorting a list).
The reason you should care is first to understand what can go wrong in your code; poorly implemented algorithms can perform very badly compared to "proven" ones. Knowing classic algorithms and what performance to expect from them helps in knowing how good your code can be, and whether you can/should improve it.
Then there is no need to reinvent the wheel, and rewrite a buggy or sub-optimal implementation of a well-known structure or algorithm.
An algorithm is a representation of the process involved in a computation.
If you wanted to add two numbers then the algorithm might go:
Get first number;
Get second number;
Add first number to second number;
Return result.
At its simplest, an algorithm is just a structured list of things to do - its use in computing is that it allows people to see the intent behind the code and makes logical (as opposed to syntactical) errors easier to spot.
e.g. if step three above said multiply instead of add then someone would be able to point out the error in the logic without having to debug code.
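A direct, if trivial, translation of those steps into Python (a hypothetical function, purely for illustration) shows how the intent stays visible in the code:

def add_two_numbers(first, second):
    # Get first number, get second number (here: passed in as arguments),
    # add first number to second number, return result.
    result = first + second   # writing a multiplication here would be the logic error from the example
    return result

print(add_two_numbers(2, 3))  # 5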
A data structure is a representation of how a system's data should be referenced. It might match a table structure exactly or may be de-normalised to make data access easier. At its simplest it should show how the entities in a system are related.
It is too large a topic to go into in detail but there are plenty of resources on the web.
Data structures are critical the second your software has more than a handful of users. Algorithms is a broad topic, and you'll want to study it if a good knowledge of data structures doesn't fix your performance problems.
You probably don't need a new programming language to benefit from data structures knowledge, though PHP (and other high level languages) will make a lot of it invisible to you, unless you know where to look. Java is my personal favorite learning language for stuff like this, but that's pretty subjective.
My question is why would I need to know algorithms and data structures?
If you are doing any non-trivial programming, it is a good idea to understand the class data structures and algorithms and their uses in order to avoid reinventing the wheel. For example, if you need to put an array of things in order, you need to understand the various ways of sorting, so that you can choose the most appropriate one for the task in hand. If you choose the wrong approach, you can end up with a program that is grossly inefficient in some circumstances.
Do I need to learn C, C++ or Java first?
You need to know how to program in some language in order to understand what the algorithms and data structures do.
What are the practical benefits of knowing algorithms and data structures?
The main practical benefits are:
to avoid having to reinvent the wheel all of the time,
to avoid the problem of square wheels.

How often do you use pseudocode in the real world?

Back in college, the only thing evangelized more than OOP in my curriculum was the use of pseudocode. Just like commenting (and other preached 'best practices'), I found that in crunch time pseudocode was often neglected. So my question is... who actually uses it a lot of the time? Or do you only use it when an algorithm is really hard to conceptualize entirely in your head? I'm interested in responses from everyone: wet-behind-the-ears junior developers to grizzled vets who were around back in the punch card days.
As for me personally, I mostly only use it for the difficult stuff.
I use it all the time. Any time I have to explain a design decision, I'll use it. Talking to non-technical staff, I'll use it. It has application not only for programming, but for explaining how anything is done.
Working with a team on multiple platforms (Java front-end with a COBOL backend, in this case) it's much easier to explain how a bit of code works using pseudocode than it is to show real code.
During design stage, pseudocode is especially useful because it helps you see the solution and whether or not it's feasible. I've seen some designs that looked very elegant, only to try to implement them and realize I couldn't even generate pseudocode. Turned out, the designer had never tried thinking about a theoretical implementation. Had he tried to write up some pseudocode representing his solution, I never would have had to waste 2 weeks trying to figure out why I couldn't get it to work.
I use pseudocode when away from a computer and only have paper and pen. It doesn't make much sense to worry about syntax for code that won't compile (can't compile paper).
I almost always use it nowadays when creating any non-trivial routines. I create the pseudocode as comments, and continue to expand it until I get to the point that I can just write the equivalent code below it. I have found this significantly speeds up development and reduces the "just write code" syndrome that often requires rewrites for things that weren't originally considered, since it forces you to think through the entire process before writing actual code. It also serves as a good base for code documentation after the code is written.
I and the other developers on my team use it all the time. In emails, on the whiteboard, or just in conversation. Pseudocode is taught to help you think the way you need to in order to program. If you really understand pseudocode you can catch on to almost any programming language, because the main difference between them all is syntax.
If I'm working out something complex, I use it a lot, but I use it as comments. For instance, I'll stub out the procedure, and put in each step I think I need to do. As I then write the code, I'll leave the comments: it says what I was trying to do.
procedure GetTextFromValidIndex (input int indexValue, output string textValue)
// initialize
// check to see if indexValue is within the acceptable range
// get min, max from db
// if indexValue not between min and max
// then return with an error
// find corresponding text in db based on indexValue
// return textValue
return "Not Written";
end procedure;
I've never, not even once, needed to write the pseudocode of a program before writing it.
However, occasionally I've had to write pseudocode after writing code, which usually happens when I'm trying to describe the high-level implementation of a program to get someone up to speed with new code in a short amount of time. And by "high-level implementation", I mean one line of pseudocode describes 50 or so lines of C#, for example:
Core dumps a bunch of XML files to a folder and runs the process.exe
executable with a few commandline parameters.
The process.exe reads each file
Each file is read line by line
Unique words are pulled out of the file stored in a database
File is deleted when it's finished processing
That kind of pseudocode is good enough to describe roughly 1000 lines of code, and good enough to accurately inform a newbie what the program is actually doing.
On many occasions when I don't know how to solve a problem, I actually find myself drawing my modules on a whiteboard in very high-level terms to get a clear picture of how they're interacting, drawing a prototype of a database schema, drawing a data structure (especially trees, graphs, arrays, etc.) to get a good handle on how to traverse and process it, etc.
I use it when explaining concepts. It helps to trim out the unnecessary bits of language so that examples only have the details pertinent to the question being asked.
I use it a fair amount on StackOverflow.
I don't use pseudocode as it is taught in school, and haven't in a very long time.
I do use English descriptions of algorithms when the logic is complex enough to warrant it; they're called "comments". ;-)
When explaining things to others, or working things out on paper, I use diagrams as much as possible - the simpler the better.
Steve McConnell's Code Complete, in its chapter 9, "The Pseudocode Programming Process", proposes an interesting approach: when writing a function longer than a few lines, use simple pseudocode (in the form of comments) to outline what the function/procedure needs to do before writing the actual code that does it. The pseudocode comments can then become actual comments in the body of the function.
I tend to use this for any function that does more than what can be quickly understood by looking at a screenful (max) of code. It works especially well if you are already used to separating your function body into code "paragraphs" - units of semantically related code separated by a blank line. Then the "pseudocode comments" work like "headers" for these paragraphs.
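A small sketch of what that looks like in practice (the function and file format are hypothetical): the comments were written first as pseudocode, then the code was filled in beneath each one:

def load_active_usernames(path):
    # Read the raw user records from disk.
    with open(path) as f:
        lines = f.read().splitlines()

    # Keep only the records flagged as active.
    active = [line for line in lines if line.endswith(",active")]

    # Return just the usernames, sorted for stable output.
    return sorted(line.split(",")[0] for line in active)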
PS: Some people may argue that "you shouldn't comment what, but why, and only when it's not trivial to understand for a reader who knows the language in question better than you". I generally agree with this, but I do make an exception for the PPP. The criteria for the presence and form of a comment shouldn't be set in stone, but should ultimately be governed by a wise, well-thought application of common sense anyway. If you find yourself refusing to try out a slight bend of a subjective "rule" just for the sake of it, you might need to step back and ask whether you're facing it critically enough.
I mostly use it for nutting out really complex code, or when explaining code either to other developers or to non-developers who understand the system.
I also use flow diagrams or UML-type diagrams when trying to do the above...
I generally use it when developing nested if-else statements, which can be confusing.
This way I don't need to go back and document it, since it's already been done.
Fairly rarely, although I often document a method before writing the body of it.
However, if I'm helping another developer with how to approach a problem, I'll often write an email with a pseudocode solution.
I don't use pseudocode at all.
I'm more comfortable with the syntax of C style languages than I am with Pseudocode.
What I do do quite frequently for design purposes is essentially a functional decomposition style of coding.
public void doBigJob( params )
{
    doTask1( params );
    doTask2( params );
    doTask3( params );
}

private void doTask1( params )
{
    doSubTask1_1( params );
    ...
}
Which, in an ideal world, would eventually turn into working code as methods become more and more trivial. However, in real life, there is a heck of a lot of refactoring and rethinking of design.
We find this works well enough, as we rarely come across an algorithm that is both incredibly complex and hard to code, and not better solved using UML or another modelling technique.
I never use it, and never have.
I always try to prototype in a real language when I need to do something complex, usually writing unit tests first to figure out what the code needs to do.

Do you write code to sort a list these days? [closed]

People in the Java/.NET world have frameworks which provide methods for sorting a list.
In CS, we all might have gone through Bubble/Insertion/Merge/Shell sorting algorithms.
Do you write any of it these days?
With frameworks in place, do you write code for sorting?
Do you think it makes sense to ask people to write code to sort in an interview? (other than for intern/junior developer requirement)
There are two pieces of code I write today in order to sort data
list.Sort();
enumerable.OrderBy(x => x); // Occasionally a different lambda is used
I work for a developer tools company, and as such I sometimes need to write new RTL types and routines. Sorting is something developers need, so it's something I sometimes need to write.
Don't forget, all that library code wasn't handed down from some mountain: some developer somewhere had to write it.
I don't write the sorting algorithm, but I have implemented IComparer in .NET for a few classes, which was kind of interesting the first couple of times.
I wouldn't write the code for sorting, given what is in the frameworks, in most cases. There should be an understanding of why a particular sorting strategy like quicksort is often used in frameworks like .NET.
I could see giving or being given a sorting question where some of the work is implementing the IComparer and understanding the different ways to sort a class. It would be a fairly easy thing to show someone a Bubble sort and ask, "Why wouldn't you want to do this in most applications?"
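For comparison, the rough Python analogue of the IComparer exercise (not .NET itself, just a sketch) is supplying a comparison function or key when sorting objects:

from functools import cmp_to_key

class Employee:
    def __init__(self, name, years):
        self.name, self.years = name, years

def by_seniority(a, b):
    # IComparer-style contract: negative, zero or positive result.
    return b.years - a.years          # more years of service sorts first

staff = [Employee("Ada", 3), Employee("Grace", 10), Employee("Linus", 7)]
staff.sort(key=cmp_to_key(by_seniority))
print([e.name for e in staff])        # ['Grace', 'Linus', 'Ada']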
I can say with 100% certainty that I haven't written one of the 'traditional' sort routines since leaving University. It's nice to know the theory behind it, but to apply them to real-world situations that can't be done by other means doesn't happen very often (at least from my experience...).
Only at an employer's interview/test =)
I wrote a merge sort when I had to sort multi-gigabyte files with a custom key comparison. I love merge sort - it's easy to comprehend, stable, and has a worst-case O(n log n) performance.
I've been looking for an excuse to try radix sort too. It's not as general purpose as most sorting algorithms, so there aren't going to be any libraries that provide it, but under the right circumstances it should be a good speedup.
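For anyone similarly curious, a compact least-significant-digit radix sort for non-negative integers looks roughly like this (an illustrative sketch, not a tuned library routine):

def radix_sort(nums, base=10):
    # Least-significant-digit radix sort: bucket by each digit, low to high.
    if not nums:
        return nums
    digit = 1
    largest = max(nums)
    while digit <= largest:
        buckets = [[] for _ in range(base)]
        for n in nums:
            buckets[(n // digit) % base].append(n)
        nums = [n for bucket in buckets for n in bucket]
        digit *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]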
Personally, I've not had a need to write my own sorting code for a while.
As far as interview questions go, it would weed out those who didn't pay attention during CS classes.
You could test API knowledge by asking how you would build Comparable (capital C) objects, or something along those lines.
The way I see it, just like many others fields of knowledge, programming also has a theoretical and a practical approach to it.
The field of "theoretical programming" is the one that gave us quicksort, Radix Sort, Djikstra's Algorithm and many other things absolutely necessary to the advance of computing.
The field of "practical programming" deals with the fact that the solutions created in "theoretical programming" should be easily accessible to all in a much easier way, so that the theoretical ideas can get many, many creative uses. This gave us high-level languages like Python and allowed pretty much any language to implement packed methods for the most basics operations like sorting or searching with a good enough performance to be fit for almost everyone.
One can't live without the other...
Most of us not needing to hand-code a sorting algorithm doesn't mean no one should.
I recently had to write a sort, of sorts.
I had a list of text entries. The ten most common had to show up according to the frequency at which they were selected; all other entries had to show up in alphabetical order.
It wasn't crazy hard to do, but I did have to write a sort to support it.
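A hedged sketch of that kind of two-tier ordering (the data and names are made up), done with sort keys rather than a hand-written sort:

from collections import Counter

def order_entries(entries, selections):
    # Ten most frequently selected entries first, by descending frequency;
    # everything else after that, alphabetically.
    freq = Counter(selections)
    top_ten = {entry for entry, _ in freq.most_common(10)}
    top = sorted(top_ten, key=lambda entry: -freq[entry])
    rest = sorted(entry for entry in entries if entry not in top_ten)
    return top + rest

entries = ["apple", "pear", "plum", "kiwi"]
selections = ["pear", "pear", "kiwi"]
print(order_entries(entries, selections))  # ['pear', 'kiwi', 'apple', 'plum']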
I've also had to sort objects whose elements aren't easily sorted with out-of-the-box code.
The same goes for searching... I had to walk a file and search statically sized records. When I found a record I had to move one record back, because I was inserting before it.
For the most part it was very simple and I merely pasted in a binary search. Some changes needed to be made to support the method of access, because I wasn't using an array that was actually in memory... Ah, crap... I could have treated it like a stream... See, now I want to go back and take a look...
Man, if someone asked me in an interview what the best sort algorithm was, and didn't understand immediately when I said 'timsort', I'd seriously reconsider if I wanted to work there.
Timsort
This describes an adaptive, stable, natural mergesort, modestly called timsort (hey, I earned it). It has supernatural performance on many kinds of partially ordered arrays (less than lg(N!) comparisons needed, and as few as N-1), yet as fast as Python's previous highly tuned samplesort hybrid on random arrays. In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs "intelligently". Everything else is complication for speed, and some hard-won measure of memory efficiency.
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Is timsort general-purpose or Python-specific?
I haven't really implemented a sort, except as coding exercise and to observe interesting features of a language (like how you can do quicksort on one line in Python).
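For the curious, the compact Python quicksort alluded to looks roughly like this (clear, but far less efficient than the built-in sort):

def qs(xs):
    # Tiny quicksort: pivot on the first element, recurse on the two partitions.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return qs([x for x in rest if x < pivot]) + [pivot] + qs([x for x in rest if x >= pivot])

print(qs([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]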
I think it's a valid question to ask in an interview because it reflects whether the developer thinks about these kinds of things... I feel it's important to know what that list.sort() is doing when you call it. It's just part of my theory that you should know the fundamentals behind everything as a programmer.
I never write anything for which there's a library routine. I haven't coded a sort in decades. Nor would I ever. With quicksort and timsort directly available, there's no reason to write a sort.
I note that SQL does sorting for me.
There are lots of things I don't write, sorts being just one of them.
I never write my own I/O drivers. (Although I have in the past.)
I never write my own graphics libraries. (Yes, I did this once, too, in the '80s)
I never write my own file system. (Avoided this.)
There is definitely no reason to code one anymore. I think it is important though to understand the efficiency of what you are using so that you can pick the best one for the data you are sorting.
Yes. Sometimes digging out Shell sort beats the builtin sort routine when your list is only expected to be at most a few tens of records.
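For reference, a Shell sort of the kind the answer means is only a handful of lines (a hedged sketch using the simple halving gap sequence):

def shell_sort(items):
    # In-place Shell sort: insertion sort over progressively smaller gaps.
    gap = len(items) // 2
    while gap > 0:
        for i in range(gap, len(items)):
            current = items[i]
            j = i
            while j >= gap and items[j - gap] > current:
                items[j] = items[j - gap]
                j -= gap
            items[j] = current
        gap //= 2
    return items

print(shell_sort([23, 4, 42, 15, 8, 16]))  # [4, 8, 15, 16, 23, 42]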

Resources