Why is there no peek! function for Clojure transient vectors? - data-structures

Clojure has transient analogs for some of its persistent data structures: vectors, maps, and sets. For vectors, there are pop! and conj! functions, analogous to pop and conj for persistent vectors, but there is no peek!.
Is there a technical reason that makes an efficient implementation of peek! impossible? Or is it just not necessary in most use cases for transient vectors? I can always do
(defn peek! [tvec] (get tvec (dec (count tvec))))
But it seems strange that there's no built-in solution.

That's really a design question best directed to the Clojure Google group, but FWIW, I did investigate peek / peek! a while ago, and providing peek! seems to be a simple matter of creating a new clojure.lang.ITransientStack interface to parallel clojure.lang.IPersistentStack and having transient vectors implement it.
My guess is that if such an interface is not already available (and used by transients), it's probably a matter of priorities. A single-threaded fast stack implementation is already available in Clojure in the form of java.util.Stack, so we're not missing out on that many features here; syntactic convenience and smooth conversion to persistent vectors will probably come as headway is made on Clojure-in-Clojure.
(Where the return on the effort invested is high, improvements to the Java side of Clojure make sense even if the ultimate goal is eventually to drop the relevant part of the Java codebase and replace it with an implementation in Clojure. Where expected returns are lower, it might make more sense to wait for protocols to be used more pervasively etc. The currently available set of functions for handling transients suffices for Clojure's own needs, and I'm not sure if there's ever been a call for peek! on the Google group -- as for #clojure, I remember one relevant conversation -- so the return is probably judged low... You could start a grassroots movement to change this, though. :-))
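For what it's worth, here is a minimal interop sketch (my own example, not from the discussion above) of using java.util.Stack from Clojure as that fast single-threaded stack:

;; java.util.Stack is a plain mutable stack; push/pop/peek are constant-time,
;; but it is not persistent like Clojure's own vectors
(def s (java.util.Stack.))
(.push s 1)
(.push s 2)
(.peek s)   ;=> 2, without removing it
(.pop s)    ;=> 2
(vec s)     ;=> [1], converting back to a persistent vector when done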

Related

Programming Chess in Functional programming

A year ago I programmed a chess AI using the alpha-beta pruning algorithm. This was relatively straightforward to do in C++. One of the main issues I considered while doing this was making my code efficient. I did this by having a data type I called a "game" that I passed around through the search tree built by the algorithm. To increase efficiency I never copied the "game" data type but rather mutated it, while keeping the information necessary to return it to its previous states.
Recently I have been reading about functional programming, and the concept of using only functions that do not change the state of the parameters they are passed appeals to me. I am wondering how I could use the functional programming paradigm while still taking the efficiency of the program into account.
In OOP the solution seems quite straightforward (which is what I implemented), while in functional programming it seems that copying data types is necessary, which decreases efficiency. Is it possible to use functional programming without this loss of efficiency?
In functional programming, data structures are not always copied completely. In many cases, only the part that changes needs to be copied, while the old parts can be referenced (since no mutation is allowed, this is safe).
The article on persistent data structures describes this in more detail.
Jephron's answer points out the important fact that only small parts of a persistent data structure need to get updated, thus the bigger part is shared between the old state and the new state.
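As an illustration, here is a minimal Clojure sketch (the board representation is made up for this example) of a persistent update that leaves the old state intact and shares most of its structure with the new one:

;; a hypothetical 8x8 board stored as a persistent vector of 64 squares
(def board  (vec (repeat 64 :empty)))
(def board' (assoc board 12 :knight))  ; only the path to index 12 is rebuilt

(get board 12)   ;=> :empty   (the old state is untouched)
(get board' 12)  ;=> :knight  (the new state sees the change)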
To be honest, this would still be slower than a mutation in most cases.
But immutable, persistent data structures have other advantages. Let's assume you have already completed the playing engine, and now you want to implement a history (for example, to allow the player to undo earlier moves). This is dead simple: just remember all states in a list. You'll find that you only need to touch a few functions to take a list of states instead of just the last state, and you're done. You don't need to worry about compromising your game engine --- there is no global variable or shared mutable state you could corrupt.
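A minimal Clojure sketch of that idea (make-move here is just a hypothetical stand-in for a real move function):

;; make-move is a hypothetical stand-in: it just records the move on the board
(defn make-move [board move]
  (assoc board :last-move move))

;; the history is a vector of states; the newest state sits at the end
(defn apply-move [history move]
  (conj history (make-move (peek history) move)))

;; undo is just dropping the latest state
(defn undo [history]
  (if (> (count history) 1) (pop history) history))

;; usage: (-> [{}] (apply-move :e4) (apply-move :e5) undo)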
Another advantage is being able to exploit the many CPU cores you probably have by employing parallelism. Needless to say, you can't let many tasks, threads, fibers or whatever operate on a single mutable data structure; that would just become a synchronization nightmare, and your code would probably even end up slower. There are simply no synchronization problems on immutable data, however, since it is read-only for all threads.
This could very well speed up your code in such a way that it dwarfs the C++ solution, even if "doing a move" on a functional data structure is much slower than on mutable data.
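For example, a minimal Clojure sketch (score-move is a hypothetical, pure evaluation function standing in for a real alpha-beta search); because the game state is immutable, the candidate moves can be scored in parallel without any locking:

;; score-move is a hypothetical, pure evaluation function
(defn score-move [game move]
  {:move move :score (rand)})   ; stand-in for a real alpha-beta search

;; every worker reads the same immutable game value, so no locks are needed
(defn best-move [game moves]
  (apply max-key :score (pmap #(score-move game %) moves)))

;; usage: (best-move {:board :initial} [:e4 :d4 :c4 :nf3])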
Here is an example of changing a board game (TTT) from single-threaded to parallel: https://dierk.gitbooks.io/fregegoodness/content/src/docs/asciidoc/incremental_episode4.html

Scheme efficiency structure

I was wondering whether Scheme interpreters are cheating to get better performance. As I understand it, the only real data structure in Scheme is the cons cell.
Obviously, a cons cell is good for building simple data structures like linked lists and trees, but I think it might make code slower in some cases, for example if you want to access the cadr of an object. It would get worse with a data structure with many more elements...
That said, maybe Scheme's car and cdr are so efficient that they're not much slower than, say, a register offset in C++.
I was wondering whether it is necessary to implement a special data structure that allocates a native memory block, something similar to using malloc. I'm talking about pure Scheme, not anything related to the FFI.
As I understand it, the only real data structure in Scheme is the cons cell.
That’s not true at all.
R5RS, R6RS, and R7RS Scheme all include vectors as well as pairs/lists, and R6RS and R7RS add bytevectors; these are the contiguous memory blocks you allude to.
Also, consider that Scheme is a minimal standard, and individual Scheme implementations tend to provide many more primitives than exist in the standard. These make it possible to do efficient I/O, for example, and many Schemes also provide an FFI to call C code if you want to do something that isn’t natively supported by the Scheme implementation you’re using.
It’s true that linked lists are a relatively poor general-purpose data structure, but they are simple and work alright for iteration from front to back, which is fairly common in functional programming. If you need random access, though, you’re right—using linked lists is a bad idea.
First off, there are many primitive types, many different compound types, and even user-defined types in Scheme.
In C++ the memory model and how values are stored are a crucial part of the standard. In Scheme the standard gives you no access to the language internals, but implementations can expose them in order to have a higher percentage of the implementation written in Scheme itself.
The standard doesn't dictate how an implementation chooses to store data, so even though many implementations imitate each other by embedding primitive values in the pointer and keeping every other value as an object on the heap, it doesn't need to be that way. Using pairs as the implementation of vectors (arrays in C++) would be pushing it, and it would make for a very unpopular implementation if not just a funny prank.
With R6RS you can define your own types with records, which are even extensible:
(define-record-type (node make-node node?)
  (fields
    (immutable value node-value)
    (immutable left node-left)
    (immutable right node-right)))
node? would be disjoint, so no values other than those made with the constructor make-node would answer #t, and a node has 3 fields instead of using two cons cells to store the same data.
Now C++ perhaps has the edge by default when it comes to storing elements of the same type in an array, but you can work around this in many ways, e.g. with the same trick you see in this video about optimizing Java for memory usage. I would start with good data modeling using records and worry about performance only when it becomes an issue.

When to use references versus types versus boxes and slices versus vectors as arguments and return types?

I've been working with Rust the past few days to build a new library (related to abstract algebra) and I'm struggling with some of the best practices of the language. For example, I implemented a longest common subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoided copying the data (T, which may not be easily copy-able, or may be big). When changing my algorithm to work with simpler &[T]'s, which I needed elsewhere in my code, I was forced to put the Copy type constraint in, since it needed to copy the T's and not just copy a reference.
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!

Modern data structures

I just realized all the data structures I regularly use are really old and really simple. Linked lists, hash tables, trees, and even the more complex variants such as VLists or RBTrees are all pretty old inventions.
Most of them were conceived for a serial, single CPU world and require adapting to work in parallel environments.
What kind of newer, better data structures do we have? Why are they not widely used?
I understand using a plain old linked list if you have to implement it yourself and prefer the simplicity, but given huge standard libraries like the STL and piles of third-party libraries like Guava or Boost, why am I still placing locks around hashes?
Don't we have potentially standard, hard-proven modern data structures that can actually replace the trusty old-timers?
There is nothing wrong with the old ones. A good way to keep flexibility is to separate concerns. Normal (old-style) data structures are concerned with how data is stored. Locking is a completely different concern, which should not be part of the data structure.
Locking is a potentially expensive operation, so if you can, you should lock multiple structures at once to optimize your code. That is, lock critical sections, not data structures. If you add locking directly to your data structures, you lose the possibility to optimize this way. It also introduces implicit synchronisation points that you may not want and cannot control.
This does not answer a different aspect of your question: why do we need locking at all? The answer is that sometimes there is just no way around it. You either need a lock somewhere, rely completely on atomic operations, or disallow mutation altogether.
The first option is problematic, as I pointed out above, because you lose potential for optimization and you get implicit synchronisation points.
Using only atomic operations in your data structures (i.e. lock-free structures) is still an area of open research and might not always be possible. I know of some lock-free structures, e.g. queues and lists, but I have never heard of a lock-free tree. Also, lock-free structures tend to become much more complicated and slower, so we still need better structures for thread-local data, and we can only add these to our data structure zoo.
Not having mutable data structures at all is, in my opinion, the best option of all. Mutability is often more of a hassle than it is worth. However, this is a concept from functional programming and only makes sense in such an environment. Functional programming is regarded as an esoteric concept by most programmers, and most languages that are actually used in production work mainly with non-functional concepts (this does not mean functional programming actually is more complicated or more error prone; it just reflects the current state of training among developers). In my opinion, functional programming will become more widespread once people notice that it solves their threading problems almost automatically. Several other languages are already borrowing from functional languages, so this is probably where we will find the next evolution of data structures.
If you want lock-free data structures, study persistent data structures. These are mostly popular in the functional programming world, but they are applicable in other domains as well. Most persistent data structures are variants of plain lists, trees, etc., but newer ones such as hash tries have surfaced in recent years.
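As a concrete illustration, here is a short Clojure sketch (my own example; Clojure's built-in maps are hash array mapped tries): a persistent map can be updated from many threads through a single lock-free reference, without ever locking the structure itself.

;; updates to a persistent map return new versions that share structure
;; with the old ones; an atom gives a single compare-and-swap reference
(def hits (atom {}))

(defn record-hit! [k]
  (swap! hits update k (fnil inc 0)))   ; retried CAS, no locks

;; many threads can record hits concurrently
(dorun (pmap record-hit! (repeatedly 1000 #(rand-int 10))))
(count @hits)   ;=> at most 10 distinct keys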

Does coding towards an interface rather than an implementation imply a performance hit?

In day to day programs I wouldn't even bother thinking about the possible performance hit for coding against interfaces rather than implementations. The advantages largely outweigh the cost. So please no generic advice on good OOP.
Nevertheless, in this post the designer of the XNA (game) platform gives, as his main argument for not designing his framework's core classes against interfaces, that it would imply a performance hit. Seeing that this is in the context of game development, where every fps possibly counts, I think it is a valid question to ask yourself.
Does anybody have any stats on that? I don't see a good way to test/measure this, as I don't know what implications I should bear in mind with such a game (graphics) object.
Coding to an interface is always going to be easier, simply because interfaces, if done right, are much simpler. It's palpably easier to write a correct program using an interface.
And as the old maxim goes, it's easier to make a correct program run fast than to make a fast program run correctly.
So program to the interface, get everything working and then do some profiling to help you meet whatever performance requirements you may have.
What Things Cost in Managed Code
"There does not appear to be a significant difference in the raw cost of a static call, instance call, virtual call, or interface call."
It depends on how much of your code gets inlined at compile time; inlining can increase performance ~5x.
It also takes longer to code to interfaces, because you have to write the contract (interface) and then the concrete implementation.
But doing things the right way always takes longer.
First, I'd say that the common conception is that programmers' time is usually more important, and working against an implementation will probably force much more work when the implementation changes.
Second, with a proper compiler/JIT, I would assume that working with an interface adds a ridiculously small amount of extra time compared to working against the implementation itself.
Moreover, techniques like templates can remove the interface overhead entirely.
Third to quote Knuth : "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
So I'd suggest coding well first, and only if you are sure that the interface is a problem would I consider changing it.
Also, I would assume that if this performance hit were real, most games wouldn't have used an OOP approach with C++; but that is not the case, and this article elaborates a bit on it.
It's hard to talk about tests in general terms; naturally a bad program may spend a lot of time in bad interfaces, but I doubt that is true for all programs, so you really should look at each particular program.
Interfaces generally imply a few hits to performance (this however may change depending on the language/runtime used):
Interface methods are usually implemented via a virtual call by the compiler. As another user points out, these cannot be inlined by the compiler, so you lose that potential gain. Additionally, they add a few instructions (jumps and memory accesses) at a minimum to get the proper PC in the code segment.
Interfaces, in a number of languages, also imply a graph and require a DAG (directed acyclic graph) to properly manage memory. In various languages/runtimes you can actually get a memory 'leak' in the managed environment by having a cyclic graph. This imposes great stress (obviously) on the garbage collector/memory in the system. Watch out for cyclic graphs!
Some languages use a COM style interface as their underlying interface, automatically calling AddRef/Release whenever the interface is assigned to a local, or passed by value to a function (used for life cycle management). These AddRef/Release calls can add up and be quite costly. Some languages have accounted for this and may allow you to pass an interface as 'const' which will not generate the AddRef/Release pair automatically cutting down on these calls.
Here is a small example of a cyclic graph where two interfaces reference each other and neither will be collected automatically, since their refcounts never drop to zero.
interface Parent {
Child c;
}
interface Child {
Parent p;
}
function createGraph() {
...
Parent p = ParentFactory::CreateParent();
Child c = ChildFactory::CreateChild();
p.c = c;
c.p = p;
... // do stuff here
// p has a reference to c and c has a reference to p.
// When the function goes out of scope and attempts to clean up the locals
// it will note that p has a refcount of 1 and c has a refcount of 1 so neither
// can be cleaned up (of course, this is depending on the language/runtime and
// if DAGS are allowed for interfaces). If you were to set c.p = null or
// p.c = null then the 2 interfaces will be released when the scope is cleaned up.
}
I think object lifetime and the number of instances you're creating will provide a coarse-grained answer.
If you're talking about something which will have thousands of instances, with short lifetimes, I would guess that's probably better done with a struct rather than a class, let alone a class implementing an interface.
For something more component-like, with low numbers of instances and moderate-to-long lifetime, I can't imagine it's going to make much difference.
IMO yes, but for a fundamental design reason far more subtle and complex than virtual dispatch or COM-like interface queries or object metadata required for runtime type information or anything like that. There is overhead associated with all of that but it depends a lot on the language and compiler(s) used, and also depends on whether the optimizer can eliminate such overhead at compile-time or link-time. Yet in my opinion there's a broader conceptual reason why coding to an interface implies (not guarantees) a performance hit:
Coding to an interface implies that there is a barrier between you and
the concrete data/memory you want to access and transform.
This is the primary reason I see. As a very simple example, let's say you have an abstract image interface. It fully abstracts away its concrete details like its pixel format. The problem here is that often the most efficient image operations need those concrete details. We can't implement our custom image filter with efficient SIMD instructions, for example, if we had to getPixel one at a time and setPixel one at a time while remaining oblivious to the underlying pixel format.
Of course the abstract image could try to provide all these operations, and those operations could be implemented very efficiently since they have access to the private, internal details of the concrete image which implements that interface, but that only holds up as long as the image interface provides everything the client would ever want to do with an image.
Often at some point an interface cannot hope to provide every function imaginable to the entire world, and so such interfaces, when faced with performance-critical concerns while simultaneously needing to fulfill a wide range of needs, will often leak their concrete details. The abstract image might still provide, say, a pointer to its underlying pixels through a pixels() method which largely defeats a lot of the purpose of coding to an interface, but often becomes a necessity in the most performance-critical areas.
Just in general a lot of the most efficient code often has to be written against very concrete details at some level, like code written specifically for single-precision floating-point, code written specifically for 32-bit RGBA images, code written specifically for GPU, specifically for AVX-512, specifically for mobile hardware, etc. So there's a fundamental barrier, at least with the tools we have so far, where we cannot abstract that all away and just code to an interface without an implied penalty.
Of course our lives would become so much easier if we could just write code, oblivious to all such concrete details like whether we're dealing with 32-bit SPFP or 64-bit DPFP, whether we're writing shaders on a limited mobile device or a high-end desktop, and have all of it be the most competitively efficient code out there. But we're far from that stage. Our current tools still often require us to write our performance-critical code against concrete details.
And lastly this is kind of an issue of granularity. Naturally if we have to work with things on a pixel-by-pixel basis, then any attempts to abstract away concrete details of a pixel could lead to a major performance penalty. But if we're expressing things at the image level like, "alpha blend these two images together", that could be a very negligible cost even if there's virtual dispatch overhead and so forth. So as we work towards higher-level code, often any implied performance penalty of coding to an interface diminishes to a point of becoming completely trivial. But there's always that need for the low-level code which does do things like process things on a pixel-by-pixel basis, looping through millions of them many times per frame, and there the cost of coding to an interface can carry a pretty substantial penalty, if only because it's hiding the concrete details necessary to write the most efficient implementation.
In my personal opinion, all the really heavy lifting when it comes to graphics is passed on to the GPU anyway. This frees up your CPU to do other things like program flow and logic. I am not sure whether there is a performance hit when programming to an interface, but thinking about the nature of games, they are not something that needs to be extensible. Maybe certain classes, but on the whole I wouldn't think that a game needs to be programmed with extensibility in mind. So go ahead, code the implementation.
it would imply a performance hit
The designer should be able to prove his opinion.
