Purely functional soft heap - data-structures

Are there any implementations of a purely functional soft heap data structure in any language?

A quick search of the ACM digital library indicates that Chazelle's soft heap structure, despite being very interesting, has received relatively little study, and that persistent/functional soft heaps are thus an open research topic.
So I would say no, there are no known approaches for persistent soft heaps. Describing one would be a publishable result (it may boil down to adding copying where you would mutate the original structure, and identifying sharing opportunities).

The Haim Kaplan, Robert E. Tarjan, Uri Zwick paper describes but doesn't fully analyze purely functional variant. It can be found at:
http://phdtree.org/pdf/44150182-soft-heaps-simplified/

This project has Java code that might not be too terrible to translate to Scala... and then make it more functional.
https://github.com/lowasser/SoftSelect
But as noted previously the Purely Functional Data Structures book has Haskell code that may be easier to adopt to Soft Heaps, especially given the example Java code.
https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

Related

Is there a Strict Fibonacci Heap implementation available?

I would like to implement a strict fibonacci heap, but the structure is very complicated and it would be nice to have an example implamentation in any language. But I didn't find one yet.
It seems like the paper "Strict Fibonacci Heaps" from 2012 is the only source which describes this structure in detail. But in "A back-to-basics empirical study of priority queues" they measured it's practical performance so they had to implement the heap.
Is there any implementation publicly available?
In the paper that you cite (about the empirical study), the first reference is to their codebase: https://code.google.com/archive/p/priority-queue-testing/source/default/source
I also have an implementation for it, (also in C and still not fully completed), you can check it here: https://github.com/lucid-at-dream/citylife/blob/master/src/base-libs/data_structures/heap.c
Cheers!

Right approach to learning data structures

Maybe some of you will not find this question to be inappropriate in this forum but I sincerely need some guidance on this. I have been working on Algorithms and Data structures lately, for Algorithm I have been practicing problem solving on topcoder and codechef, which is helping me a lot in understanding algorithms. But most problems are focused on algorithm, and I still don't get a lot problems where I have to focus on which data structure to pick. So can anybody recommend some website or other tools that focus on developing right instincts for choosing Data structures and their implementation.
The first step is to implement the different data structures so you understand the operations they support. You should also make a table of them with the operations they support and their complexity. Take a book on algorithms and data structures and implement all the data structures in it and work through the problems.
Once you understand the data structures well, you'll gain much more from doing hard problems and looking at clever solutions. If you see a clever use of a data structure you know well, it'll typically be much more surprising to you and you'll remember the solution.
Another important point is that if you typically use a specific programming language, make sure you know what data structures are provided by its standard library and make sure you know what the standard (or documentation) says about their implementation (i.e. what complexity bounds different operations are guaranteed to have etc.).

Learning efficient algorithms

Up until now I've mostly concentrated on how to properly design code, make it as readable as possible and as maintainable as possible. So I alway chose to learn about the higher level details of programming, such as class interactions, API design, etc.
Algorithms I never really found particularly interesting. As a result, even though I can come up with a good design for my programs, and even if I can come up with a solution to a given problem it rarely is the most efficient.
Is there a particular way of thinking about problems that helps you come up with an as efficient solution as possible, or is it simple a matter of practice and/or memorizing?
Also, what online resources can you recommend that teach you various efficient algorithms for different problems?
Data dominates. If you design your program around the right abstract data structures (ADTs), you often get a clean design, the algorithms follow quite naturally and when performance is lacking, you should be able to "plug in" more efficient ones.
A strong background in maths and logic helps here, as it allows you to visualize your program at a high level as the interaction between functions, sets, graphs, sequences, etc. You then decide whether the sets need to be ordered (balanced BST, O(lg n) operations) or not (hash tables, O(1) operations), what operations need to supported on sequences (vector-like or list-like), etc.
If you want to learn some algorithms, get a good book such as Cormen et al. and try to implement the main data structures:
binary search trees
generic binary search trees (that work on more than just int or strings)
hash tables
priority queues/heaps
dynamic arrays
Introduction To Algorithms is a great book to get you thinking about efficiency of different algorithms/data structures.
The authors of the book also teach an algorithms course on MIT . You can find most lectures here
I would say that in coming up with good algorithms (which is actually part of good design IMHO), you have to develop a way of thinking. This is best done by studying algorithm design. By study I don't mean just knowing all the common algorithms covered in a textbook, but actually understanding how and why they work, and being able to apply the basic idea contained in them to actual problems you are trying to solve.
I would suggest reading a good book on algorithms (my favourite is CLRS). For an online resource I would recommend the series of articles in the TopCoder Algorithm Tutorials.
I do not understand why you would mention practice and memorization in the same breath. Memorization won't help you at all (you probably already know this), but practice is essential. If you cannot apply what you learned, its not really learning. You can practice at various online programming contest/puzzle sites like SPOJ, Project Euler and PythonChallenge.
Recommendations:
First of all i recommend the book "Intro to Algorithms, Second Edition By corman", great book has most(if not all) of the algorithms you will need. (Some of the more important topics are sorting-algorithms, shortest paths, dynamic programing, many data structures like bst, hash maps, heaps).
another great way to learn algorithms is http://ace.delos.com/usacogate, great practice after the begining.
To your questions you will just get used to write good fast running code, after a little practice you just wouldnt want to write un-efficient code.
While I think #larsmans is correct inasmuch that understanding logic and maths is a fast way to understanding how to choose useful ADTs for solving a given problem, studying existing solutions may be more instructive for those of us who struggle with those topics. In particular, reviewing code of established software (OSS) that solves some similar problem as the one you're interested in.
I find a particularly good method for this method of study is reviewing unit tests of such a project. Apache Lucene, for example, has a source control repository containing numerous examples. While it doesn't reveal the underlying algorithms, it helps trace to particular functionality that solves a certain problem. This leads to an opportunity for studying its innards - i.e. an interesting algorithm. In Lucene's case inverted indices come to mind.
While this does not guarantee the algorithm you discover is the best, it's likely one that's received a lot scrutiny and probably comes from project with an active mailing that may answer your questions. So it's a good resource for finding a solution that is probably better than what most of us would come up with on our own.

Formally verifying the correctness of an algorithm

First of all, is this only possible on algorithms which have no side effects?
Secondly, where could I learn about this process, any good books, articles, etc?
COQ is a proof assistant that produces correct ocaml output. It's pretty complicated though. I never got around to looking at it, but my coworker started and then stopped using it after two months. It was mostly because he wanted to get things done quicker, but if you need to verify an algorithm this might be a good idea.
Here is a course that uses COQ and talks about proving algorithms.
And here is a tutorial about writing academic papers in COQ.
It's generally a lot easier to verify/prove correctness when no side effects are involved, but it's not an absolute requirement.
You might want to look at some of the documentation for a formal specification language like Z. A formal specification isn't a proof itself, but is often the basis for one.
I think that verifying the correctness of an algorithm would be validating its conformance with a specification. There is a branch of theoretical Computer Science called Formal Methods which may be what you are looking for if you need to get as close to proof as you can. From wikipedia,
Formal Methods are a particular kind
of mathematically-based techniques for
the specification, development and
verification of software and hardware
systems
You will be able to find many learning resources and tools from the multitude of links on the linked Wikipedia page and from the Formal Methods wiki.
Usually proofs of correctness are very specific to the algorithm at hand.
However, there are several well known tricks that are used and re-used again. For example, with recursive algorithms you can use loop invariants.
Another common trick is reducing the original problem to a problem for which your algorithm's proof of correctness is easier to show, then either generalizing the easier problem or showing that the easier problem can be translated to a solution to the original problem. Here is a description.
If you have a particular algorithm in mind, you may do better in asking how to construct a proof for that algorithm rather than a general answer.
Buy these books: http://www.amazon.com/Science-Programming-Monographs-Computer/dp/0387964800
The Gries book, Scientific Programming is great stuff. Patient, thorough, complete.
Logic in Computer Science, by Huth and Ryan, gives a reasonably readable overview of modern systems for verifying systems. Once upon a time people talked about proving programs correct - with programming languages which may or may not have side effects. The impression I get from this book and elsewhere is that real applications are different - for instance proving that a protocol is correct, or that a chip's floating point unit can divide correctly, or that a lock-free routine for manipulating linked lists is correct.
ACM Computing Surveys Vol 41 Issue 4 (October 2009) is a special issue on software verification. It looks like you can get to at least one of the papers without an ACM account by searching for "Formal Methods: Practice and Experience".
The tool Frama-C, for which Elazar suggests a demo video in the comments, gives you a specification language, ACSL, for writing function contracts and various analyzers for verifying that a C function satisfies its contract and safety properties such as the absence of run-time errors.
An extended tutorial, ACSL by example, shows examples of actual C algorithms being specified and verified, and separates the side-effect-free functions from the effectful ones (the side-effect-free ones are considered easier and come first in the tutorial). This document is also interesting in that it was not written by the designers of the tools it describe, so it gives a fresher and more didactic look at these techniques.
If you are familiar with LISP then you should definitely check out ACL2: http://www.cs.utexas.edu/~moore/acl2/acl2-doc.html
Dijkstra's Discipline of Programming and his EWDs lay the foundation for formal verification as a science in programming. A simpler work is Wirth's Systematic Programming, which begins with the simple approach to using verification. Wirth uses pre-ISO Pascal for the language; Dijkstra uses an Algol-68-like formalism called Guarded (GCL). Formal verification has matured since Dijkstra and Hoare, but these older texts may still be a good starting point.
PVS tool developed by Stanford guys is a specification and verification system. I worked on it and found it very useful for Theoram Proving.
WRT (1), you will probably have to create a model of the algorithm in a way that "captures" the side-effects of the algorithm in a program variable intended to model such state-based side-effects.

Real world implementations of "classical algorithms"

I wonder how many of you have implemented one of computer science's "classical algorithms" like Dijkstra's algorithm or data structures (e.g. binary search trees) in a real world, not academic project?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
The library doesn't know what your problem domain is and won't be able to chose the correct algorithm to do the job. That is why I think it is important to know about them: then YOU can make the correct choice of algorithms to solve YOUR problem.
Knowing, or being able to understand these algorithms is important, these are the tools of your trade. It does not mean you have to be able to implement A* in an hour from memory. But you should be able to figure out what the advantages of using a red-black tree as opposed to a normal unbalanced tree are so you can decide if you need it or not. You need to be able to judge the fitness of an algorithm for solving your problem.
This might sound too school-masterish but these "classical algorithms" were not invented to give college students exam questions, they were invented to solve problems or improve on current solutions, just like the array, the linked list or the stack are building blocks to write a program so are some of these. Just like in math where you move from addition and subtraction to integration and differentiation, these are advanced techniques that will help you solve problems that are out there.
They might not be directly applicable to your problems or work situation but in the long run knowing of them will help you as a professional software engineer.
To answer your question, I did an implementation of A* recently for a game.
Is there a benefit to understanding your tools, rather than simply knowing that they exist?
Yes, of course there is. Taking a trivial example, don't you think there's a benefit to knowing what the difference is List (or your language's equivalent dynamic array implementation) and LinkedList (or your language's equivalent)? It's pretty important to know that one has constant random access time, while the other is linear. And one requires N copies if you insert a value in the middle of the sequence, while the other can do it in constant time.
Don't you think there's an advantage to understanding that the same sorting algorithm isn't always optimal? That for almost-sorted data, quicksort sucks, for example? Naively just calling Sort() and hoping for the best can become ridiculously expensive if you don't understand what's happening under the hood.
Of course there are a lot of algorithms you probably won't need, but even so, just understanding how they work may make it easier for yourself to come up with efficient algorithms to solve other, unrelated, problems.
Well, someone has to write the libraries. While working at a mapping software company, I implemented Dijkstra's, as well as binary search trees, b-trees, n-ary trees, bk-trees and hidden markov models.
Besides, if all you want is a single 'well known' algorithm, and you also want the freedom to specialise it and optimise it if it becomes critical to performance, including a whole library seems like a poor choice.
We use a home grown implementation of a p-random number generator from Knuth SemiNumeric as an aid in some statistical processing
In my previous workplace, which was an EDA company, we implemented versions of Prim and Dijsktra's algorithms, disjoint set data structures, A* search and more. All of these had real world significance. I believe this is dependent on problem domain - some domains are more algorithm-intensive and some less so.
Having said that, there is a fine line to walk - I see no business reason for re-implementing STL or Java Generics. In many cases, a standard library is better than "inventing a wheel". The more you are near your core application, the more it may be necessary to implement a textbook algorithm or data structure.
If you never work with performance-critical code, consider yourself lucky. However, I consider this scenario unrealistic. Performance problems could occur anywhere. And then it's necessary to know how to fix that problem. Obviously, merely knowing a few algorithm names isn't enough here – unless you want to implement them all and try them out one after the other.
No, knowing (at least some of) the inner workings of different algorithms is important for gauging their strengths and weaknesses and for analyzing how they would handle your situation.
Obviously, if there's a library already implementing exactly what you need, you're incredibly lucky. But let's face it, even if there is such a library, using it is often not completely straightforward (at the very least, interfaces and data representation often have to be adapted) so it's still good to know what to expect.
A* for a pac man clone. It took me weeks to really get but to this day I consider it a thing of beauty.
I've had to implement some of the classical algorithms from numerical analysis. It was easier to write my own than to connect to an existing library. Also, I've had to write variations on classical algorithms because the textbook case didn't fit my application.
For classical data structures, I nearly always use the standard libraries, such as STL for C++. The one time recently when I thought STL didn't have the structure I needed (a heap) I rolled my own, only to have someone point out almost immediately that I didn't need to do that.
Classical algorithms I have used in actual work:
A topological sort
A red-black tree (although I will
confess that I only had to implement
insertions for that application and
it only got used in a prototype).
This got used to implement an
'ordered dict' type structure in
Python.
A priority queue
State machines of various sorts
Probably one or two others I can't remember.
As to the second part of the question:
An understanding of how the algorithms work, their complexity and semantics gets used on a fairly regular basis. They also inform the design of systems. Occasionally one has to do things involving parsing or protocol handling, or some computation that's slightly clever. Having a working knowledge of what the algorithms do, how they work, how expensive they are and where one might find them lying around in library code goes a long way to knowing how to avoid reinventing the wheel poorly.
I use the Levenshtein distance algorithm to help implement a 'Did you mean [suggested word]?' feature in our website search.
Works quite well when combined with our 'tagging' system, which allows us to associate extra words (other than those in title/description/etc) with items in the database. \
It's not perfect by any means, but it's way better than most corporate site searches, if I don't say so myself ; )
Classical algorithms are usually associated with something glamorous, like games, or Web search, or scientific computation. However, I had to use some of the classical algorithms for a mere enterprise application.
I was building a metadata migration tool, and I had to use topological sort for dependency resolution, various forms of graph traversals for queries on metadata, and a modified variation of Tarjan's union-find datastructure to partition forest-like structured metadata to trees.
That was a really satisfying experience. Most of those algorithms were implemented before, but their implementations lacked something that I would need for my task. That's why It's important to understand their internals.

Resources