Why were algorithm names that duplicate FP notions reinvented in C++11?

These algorithms seem to correspond one-to-one with common notions in functional programming:
std::transform - map
std::remove_if - filter
std::accumulate - foldr
Why did the committee decide to introduce new names for these already established operations (as of 2011)? Are there references to the relevant technical proposals?
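For concreteness, the Haskell side of the correspondence I have in mind looks roughly like this (a sketch using only Prelude functions; strictly speaking, std::accumulate folds from the left, so foldl is the closer match):

    -- Haskell counterparts of the three algorithms named above.
    xs :: [Int]
    xs = [1, 2, 3, 4, 5]

    doubled :: [Int]
    doubled = map (* 2) xs      -- roughly std::transform

    odds :: [Int]
    odds = filter odd xs        -- roughly the inverse of std::remove_if
                                -- (filter keeps matches, remove_if drops them)

    total :: Int
    total = foldr (+) 0 xs      -- roughly std::accumulate (which folds from the left)

    main :: IO ()
    main = print (doubled, odds, total)   -- ([2,4,6,8,10],[1,3,5],15)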

std::map is already taken by a container.
std::remove_if is not a single function - it belongs to a set of similar functions: std::remove, std::remove_copy, std::remove_copy_if. It would be really strange to have filter instead of remove_if in this set.
I can't say much about accumulate and foldr, but in my opinion standard library names tend to be unabbreviated for the sake of clarity, and foldr is not a very clear name.

Do linked lists have a practical performance advantage in eager functional languages?

I'm aware that in lazy functional languages, linked lists take on generator-esque semantics, and that under an optimizing compiler, their overhead can be completely removed when they're not actually being used for storage.
But in eager functional languages, their use seems no less heavy, while optimizing them out seems more difficult. Is there a good performance reason that languages like Scheme use them over flat arrays as their primary sequence representation?
I'm aware of the obvious time complexity implications of using singly-linked lists; I'm more thinking about the in-practice performance consequences of using eager singly-linked lists as a primary sequence representation, compiler optimizations considered.
TL;DR: No performance advantage!
The first Lisp had the cons cell (the linked-list node) as its only data structure and used it for everything. Both Common Lisp and Scheme have vectors today, but for a functional style they are not a good match. With a linked list, each recursive step can add zero or more elements in front of an accumulator, and at the end you have a list built with sharing between the iterations. An operation might recurse more than once and produce several versions that all share the same tail. I would say sharing is the most important property of the linked list: if you write a minimax algorithm and store the state in a linked list, you can produce a new state without copying the parts that did not change.
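A minimal Haskell sketch of that sharing (the list contents are made up for illustration):

    -- Two versions of a list that share the same tail: consing onto the front
    -- copies nothing, so both lists reuse the cells of `base`.
    base :: [Int]
    base = [3, 4, 5]

    version1 :: [Int]
    version1 = 1 : base    -- [1,3,4,5]

    version2 :: [Int]
    version2 = 2 : base    -- [2,3,4,5]

    main :: IO ()
    main = print (version1, version2)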
Bjarne Stroustrup, the creator of C++, mentions in a talk that the penalty of having the data scattered in memory and taking twice the space, as in a linked list, is easily outweighed: a vector wins even if you insert in sorted order and have to shift half of its elements. Keep in mind he was discussing doubly linked lists and mutating inserts, but his point was that most of the time went into following the pointers linearly, taking a CPU cache miss at every step, just to find the correct spot, so for every O(n) search in a sorted sequence a vector does better.
If you have a program that does many inserts into positions other than the front of a list, then a tree is probably a better choice, and you can build one in CL and Scheme out of cons cells. In fact, all of Chris Okasaki's purely functional data structures can be implemented with cons. Most persistent ("immutable yet updatable") structures in Haskell are implemented in a similar way.
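As a small illustration, here is a persistent binary search tree sketched in Haskell; every insert shares all untouched subtrees with the previous version (a generic sketch, not taken from Okasaki's book):

    -- A persistent binary search tree; `insert` returns a new tree that shares
    -- all untouched subtrees with the old one.
    data Tree a = Leaf | Node (Tree a) a (Tree a)
      deriving Show

    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l v r)
      | x < v     = Node (insert x l) v r   -- r is shared, not copied
      | x > v     = Node l v (insert x r)   -- l is shared, not copied
      | otherwise = t                       -- already present: reuse the whole tree

    main :: IO ()
    main = print (insert 2 (insert 3 (insert 1 Leaf)))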
If you are suffering from performance problems in Scheme and, after profiling, you find that you should replace a linked-list operation with an array operation, nothing stands in the way of doing that. In the end, every choice of algorithm has pros and cons. Hard computations are hard in any language.
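In Haskell terms (Scheme's vectors play the same role), such a swap might look like the following sketch, assuming the array package that ships with GHC:

    import Data.Array

    -- One-time conversion of a list to an array, giving O(1) indexing afterwards.
    toArray :: [a] -> Array Int a
    toArray xs = listArray (0, length xs - 1) xs

    main :: IO ()
    main = print (toArray [10, 20, 30, 40] ! 2)   -- prints 30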

What does it mean to express this puzzle as a CSP?

What is meant by the following, for the attached image?
By labelling each cell with a variable, express the puzzle as a CSP. Hint:
recall that a CSP is composed of three parts.
I initially thought to just add a variable to each cell, like A, B, C, etc., and then constrain those cells, but I do not believe that is correct. I do not want the answer, just an explanation of what is required in terms of a CSP.
In my opinion, solving a CSP is best divided into two parts:
(1) State the constraints. This is called the modeling part, or the model.
(2) Search for solutions using enumeration predicates like labeling/2.
These parts are best kept separate by using a predicate, which we call the core relation, with the following properties:
It posts the constraints, i.e., it expresses part (1) above.
Its last argument is the list of variables that still need to be labeled.
By convention, its name ends with an underscore _.
Having this distinction in place allows you to:
try different search strategies without the need to recompile your code
reason about termination properties of the core relation in isolation from any concrete (and often very costly) search.
I can see how some instructors may decompose part (1) into:
1a. stating the domains of the variables, using for example in/2 constraints
1b. stating the other constraints that hold among the variables.
In my view this distinction is artificial, because in/2 constraints are constraints like all the other constraints in the modeling part; some instructors may teach them separately partly for historical reasons, dating back to a time when CSP systems were not as dynamic as they are now.
Nowadays, you can typically post additional domain restrictions any time you like and freely mix in/2 constraints with other constraints in any order.
So, the parts that are expected from you are likely: (a) state in/2 constraints, (b) state further constraints and (c) use enumeration predicates to search for concrete solutions. It also appears that you already have the right idea about how to solve this concrete CSP with this method.
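Purely for illustration, here is the same modeling/search separation sketched in Haskell on a made-up two-variable puzzle; in the actual exercise you would state the in/2 and other constraints in Prolog and then search with labeling/2:

    -- Toy CSP: variables A and B, each with domain 1..3, constraint A < B.
    -- The "model" (domains and constraints) is kept separate from the search.
    domain :: [Int]
    domain = [1, 2, 3]

    constraintsHold :: Int -> Int -> Bool
    constraintsHold a b = a < b

    -- The "search": enumerate the domains and keep assignments satisfying the model.
    solutions :: [(Int, Int)]
    solutions = [(a, b) | a <- domain, b <- domain, constraintsHold a b]

    main :: IO ()
    main = print solutions   -- [(1,2),(1,3),(2,3)]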

What are the advantages of Cons?

Many functional programming languages, such as Haskell and Scala, support and recommend the data constructor Cons (for lists like (1, (2, (3)))).
But what are its advantages? Such lists can neither be randomly accessed nor appended to in O(1).
Cons (shorthand for "construct") is not a data structure; it's the name of the operation for creating a cons cell. By linking several cells together a data structure can be built, in particular a linked list. The rest of this discussion assumes a linked list built with cons operations.
Although it's possible to prepend at the head in O(1), accessing an element by index is a costly operation that requires traversing all the elements before the one being accessed.
The advantages of a linked list? It's a functional data structure: cheap to create, or to recreate when modified; it allows sharing of nodes between several lists; and it works well with garbage collection. It's very flexible: with the right abstractions it can represent other, more complex data structures such as stacks, queues, trees and graphs. And there are many, many procedures written specifically for manipulating lists - for instance map, filter, fold, etc. - that make working with lists a joy. Finally, a list is a recursive data structure, and recursion (especially tail recursion) is the preferred way to solve problems in functional programming languages, so in these languages it's natural to have a recursive data structure as the main data structure.
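As a small illustration of that last point, here is a cons list built explicitly and consumed by hand-written recursion in Haskell (myLength is just the standard length spelled out):

    -- Building a list from cons cells and consuming it by pattern matching
    -- on the two constructors, [] and (:).
    myLength :: [a] -> Int
    myLength []       = 0
    myLength (_ : xs) = 1 + myLength xs

    main :: IO ()
    main = print (myLength (1 : 2 : 3 : []))   -- 3; the same list as [1,2,3]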
First of all, let's distinguish between "cons" as a nickname for the ML-style list data constructor usually called :: and what the nickname comes from, the original Lisp-style cons function.
In Lisps, cons cells are an all-purpose data structure not restricted to lists of homogeneous element type. The equivalent in ML-style languages would be nested pairs or 2-tuples, with the empty list represented by the "unit" type often written (). Óscar López gives a good overview of the utility of the Lisp cons, so I'll leave it at that.
In most ML-style languages the advantages of immutable cons lists are not too different from their use for lists in Lisps, trading off the flexibility of dynamic typing for the guarantees of static typing and the syntax of ML-style pattern matching.
In Haskell, however, the situation is rather different due to lazy evaluation. Constructors are lazy and pattern matching on them is one of the few ways to force evaluation, so in contrast to strictly-evaluated languages it is often the case that you should avoid tail recursion. Instead, by placing the recursive call in the tail of a list it becomes possible to compute each recursive call only when needed. If a lazily-generated list is processed with appropriately lazy functions like map or foldr, it becomes possible to construct and consume a large list in constant memory, with the tails being forced at the same rate the heads are abandoned for the GC to clean up.
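A tiny Haskell illustration of that behaviour (a sketch; any sufficiently lazy consumer would do):

    -- The list below is conceptually infinite, but `take 5` forces only five
    -- cons cells; nothing past the fifth element is ever computed.
    doubled :: [Integer]
    doubled = map (* 2) [1 ..]

    firstFive :: [Integer]
    firstFive = take 5 doubled

    main :: IO ()
    main = print firstFive   -- [2,4,6,8,10]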
A common perspective in Haskell is that a lazy cons list is not so much a data structure as it is a control structure--a reified loop that composes efficiently with other such loops.
That said, there are many cases where a cons list is not appropriate--such as when repeated random access is needed--and in those situations using lists is certainly not recommended.

Method for runtime comparison of two programs' objects

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?
In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.
The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):
1. For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... for A and similarly for B (b001, ...).
2. Let Na = the number of unique object names encountered in A, and similarly Nb for the number of objects in B.
3. Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries record a value for each object at each trigger (i.e. for each row, defined next).
4. For each assignment in A, the simplest approach is to capture the hash value of all Na items; of course, one can use LOCF (last observation carried forward) for the items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
5. Match entries in TableA and TableB via their hash values. Ideally, objects will arrive in the "vocabulary" in approximately the same order, so that order and hash value together allow one to identify the sequences of values.
6. Find discrepancies between A and B by locating where the sequences of hash values diverge for any matched object (a sketch of this step follows below).
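To make step 6 concrete, here is a rough Haskell sketch of locating the first divergence between two recorded traces; the names and the show-based "digest" are stand-ins for illustration, and a real implementation would use a proper hash and handle traces of unequal length:

    import Data.List (findIndex)

    -- A stand-in "digest" of a recorded value; a real implementation would hash it.
    type Digest = String

    digest :: Show a => a -> Digest
    digest = show

    -- One trace per program: the digest recorded at each assignment trigger.
    traceA, traceB :: [Digest]
    traceA = map digest ([1, 2, 3, 5] :: [Int])
    traceB = map digest ([1, 2, 4, 5] :: [Int])

    -- Index of the first trigger at which the two traces differ, if any
    -- (note that zipWith stops at the shorter trace).
    firstDivergence :: [Digest] -> [Digest] -> Maybe Int
    firstDivergence xs ys = findIndex id (zipWith (/=) xs ys)

    main :: IO ()
    main = print (firstDivergence traceA traceB)   -- Just 2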
Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge, though the impact is insignificant if the discrepancies are approximately at the machine tolerance level.
First: What is the name for this type of testing method and concept? An answer need not describe the method above exactly, but should reflect the class of methods for comparing objects from two (or more) different programs.
Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects - after all, two objects cannot be the same if they are massively different in size.
In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.
Edit 1: This paper is related in terms of comparing execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. the objects) than with the actual code that produces them. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.
Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.
Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.
(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)
The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.
[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]
Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.

Efficient Mutable Graph Representation in Prolog?

I would like to represent a mutable graph in Prolog in an efficient manner. I will be searching for subsets of the graph and replacing them with other subsets.
I've managed to get something working using the database as my 'graph storage'. For instance, I have:
:- dynamic step/2.
% step(Type, Name).
:- dynamic sequence/2.
% sequence(Step, NextStep).
I then use a few rules to retract subsets I've matched and replace them with new steps using assert. I'm really liking this method... it's easy to read and deal with, and I let Prolog do a lot of the heavy pattern-matching work.
The other way I know to represent graphs is using lists of nodes and adjacency connections. I've seen plenty of websites using this method, but I'm a bit hesitant because it's more overhead.
Execution time is important to me, as is ease-of-development for myself.
What are the pros/cons for either approach?
As usual: using the dynamic database gives you indexing, which may speed things up (on look-up) and slow you down (on asserting). In general, the dynamic database is not so good when you assert more often than you look up. Its main drawback, though, is that it significantly complicates testing and debugging, because you cannot test your predicates in isolation and need to keep the current implicit state of the database in mind.
Lists of nodes and adjacency connections are a good representation in many cases. A different representation I like a lot, especially if you need to store further attributes for nodes and edges, is to use one variable for each node and to attach variable attributes (get_attr/3 and put_attr/3 in SWI-Prolog) to store the edges on them, for example [edge_to(E1,N1), edge_to(E2,N2), ...], where the Ni are the variables representing other nodes (with their own attributes) and the Ei are also variables onto which you can attach further attributes to store additional information (weight, capacity etc.) about each edge if needed.
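For comparison, the plain adjacency-list shape (independent of the attributed-variables idea, and sketched here in Haskell rather than Prolog purely as an illustration) is just a map from each node to its neighbours:

    import qualified Data.Map as Map

    -- An immutable adjacency-list graph: each node maps to its list of neighbours.
    type Graph = Map.Map String [String]

    example :: Graph
    example = Map.fromList [("a", ["b", "c"]), ("b", ["c"]), ("c", [])]

    neighbours :: String -> Graph -> [String]
    neighbours n g = Map.findWithDefault [] n g

    main :: IO ()
    main = print (neighbours "a" example)   -- ["b","c"]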
Have you considered using SWI-Prolog's RDF database? http://www.swi-prolog.org/pldoc/package/semweb.html
As mat said, dynamic predicates have an extra cost.
If, however, you can construct the graph once and then don't need to change it, you can compile the predicate and it will be as fast as a normal static predicate.
Usually in SWI-Prolog the predicate lookup is done using hash tables on the first argument (they are resized in the case of dynamic predicates).
Another solution is association lists, where the cost of lookup etc. is O(log(n)).
After you understand how they work, you could easily write an interface for them if needed.
In the end, you can always use an SQL database and the ODBC interface to submit queries (although that sounds like overkill for the application you mention).

Resources