I saw a SO question yesterday about implementing a classic linked list in Java. It was clearly an assignment from an undergraduate data structures class. It's easy to find questions and implementations for lists, trees, etc. in all languages.
I've been learning about Java lambdas and trying to use them at every opportunity to get the idiom under my fingers. This question made me wonder: How would I write a custom list or tree so I could use it in all the Java 8 lambda machinery?
All the examples I see use the built in collections. Those work for me. I'm more curious about how a professor teaching data structures ought to rethink their techniques to reflect lambdas and functional programming.
I started with an Iterator, but it doesn't appear to be fully featured.
Does anyone have any advice?
Exposing a stream view of arbitrary data structures is pretty easy. The key interface you have to implement is Spliterator, which, as the name suggests, combines two things -- sequential element access (iteration) and decomposition (splitting).
Once you have a Spliterator, you can turn that into a stream easily with StreamSupport.stream(). In fact, here's the stream() method from Collection (a default method that most collections just inherit):
default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}
All the real work is in the spliterator() method -- and there's a broad range of spliterator quality. The absolute minimum you need to implement is tryAdvance, but if that's all you implement, it will work sequentially and lose out on most of the stream optimizations. Look in the JDK sources (Arrays.stream(), IntStream.range()) for examples of how to do better.
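As a rough illustration of that minimum, here is a sketch of a classroom-style singly linked list that exposes a stream. SimpleLinkedList and its Node class are hypothetical names invented for this example, not JDK types. It assumes the list is not modified while a stream is running, and it leans on Spliterators.AbstractSpliterator so that only tryAdvance has to be written.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical classroom-style singly linked list; not a JDK class.
final class SimpleLinkedList<E> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private Node<E> head;
    private Node<E> tail;
    private int size;

    void add(E value) {
        Node<E> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Only tryAdvance is written here; trySplit is inherited from
    // AbstractSpliterator. Assumption: no modification during traversal.
    Spliterator<E> spliterator() {
        return new Spliterators.AbstractSpliterator<E>(
                size, Spliterator.ORDERED | Spliterator.SIZED) {
            private Node<E> current = head;

            @Override
            public boolean tryAdvance(Consumer<? super E> action) {
                if (current == null) {
                    return false;          // no more elements
                }
                action.accept(current.value);
                current = current.next;
                return true;
            }
        };
    }

    // Same shape as the stream() method quoted above.
    Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
}
```

With that in place, something like list.stream().map(String::length).collect(Collectors.toList()) should work on the custom list just as it does on the built-in collections.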
I'd look at http://www.javaslang.io for inspiration, a library that does exactly what you want to do: Implement custom lists, trees, etc. in a Java 8 manner.
It specifically doesn't closely couple with the JDK collections outside of importing/exporting methods, but re-implements all the immutable collection semantics that a Scala (or other FP language) developer would expect.
Sort is not trivial to implement and I can't find the module in either documentation or the autocomplete. Is it not supported yet?
Extending Alea GPU with sorting primitives is still pending but in the pipeline. In the meantime you would need to implement your own sorting kernels.
I was wondering whether boost::range or range_v3 will reconcile free functions and member functions in the same way that std::begin reconciles STL containers and C-like arrays (in terms of writing generic code, I mean)?
More particularly it would be convenient to me to call std::sort on a list that automatically calls the best possible implementation given by std::list::sort.
In the end, could member functions be seen as interfaces for their generic counterpart only (std::list::sort never called in client code)?
AFAIK, neither library you mention deals with this directly. There is a push to deal with this kind of thing more generally in C++17, including a proposal to make f(x) and x.f() equivalent, but as I mentioned in the comment above, I'm unclear if it will work with range-v3's algorithms.
I did notice an interesting comment in range-v3's sort.hpp: // TODO Forward iterators, like EoP?. So, perhaps Niebler does have ideas to support a more generic sort. ("EoP" is Elements of Programming by Alex Stepanov.)
One complication: A generic sort uses iterators to reorder values, while list::sort() reorders the links themselves. The distinction is important if you care what iterators point to after the sort, so you'd still need a way to select which sort you want. One could even argue that sort() should never call list::sort(), given the different semantics.
Are data structures like linked lists something that are purely academic for real programming or do you really use them? Are they things that are covered by generics so that you don't need to build them (assuming your language has generics)? I'm not debating the importance of understanding what they are, just the usage of them outside of academia. I ask from a front end web, backend database perspective. I'm sure someone somewhere builds these. I'm asking from my context.
Thank you.
EDIT: Are Generics so that you don't have to build linked lists and the like?
It will depend on the language and frameworks you're using. Most modern languages and frameworks won't make you reinvent these wheels. Instead, they'll provide things like List<T> or HashTable.
EDIT:
We probably use linked lists all the time, but don't realize it. We don't have to write implementations of linked lists on our own, because the frameworks we use have already written them for us.
You may also be getting confused about "generics". You may be referring to generic list classes like List<T>. This is just the same as the non-generic class List, but where the element is always of type T. It may be implemented as a linked list or, as with .NET's List<T>, backed by an array, but we don't have to care about that.
We also don't have to worry about allocation of physical memory, or how interrupts work, or how to create a file system. We have operating systems to do that for us. But we may be taught that information in school just the same.
Certainly. Many "List" implementations in modern languages are actually linked lists, sometimes in combination with arrays or hash tables for direct access (by index as opposed to iteration).
Linked lists (especially doubly linked lists) are very commonly used in "real-world" data structures.
I would dare to say every common language has a pre-built implementation of linked list, either as a language primitive, native template library (e.g. C++), native library (e.g. Java) or some 3rd party implementation (probably open-source).
That being said, several times in the past I wrote a linked list implementation from scratch myself when creating infrastructure code for complex data structures. Sometimes it's a good idea to have full control over the implementation, and sometimes you need to add a "twist" to the classic implementation for it to satisfy your specific requirement. There's no right or wrong when it comes to whether to code your own implementation, as long as you understand the alternatives and trade-offs. In most cases, and certainly in very modern languages like C# I would avoid it.
Another point is when you should use lists versus arrays/vectors or hash tables. From your question I understand you are aware of the trade-offs here so I won't go too much into it, but basically, if your main usage is traversing lists in order, and the list size may vary significantly, a list may be a viable option. Another consideration is the type of insertion. If a common use case is "inserting in the middle", then lists have a significant advantage over arrays/vectors. I can go on but this information is in the classic CS books :)
Clarification: My answer is language agnostic and does not relate specifically to Generics which to my understanding have a linked list implementation.
A singly-linked list is the only way to have a memory efficient immutable list which can be composed to "mutate" it. Look at how Erlang does it. It may be slightly slower than an array-backed list but it has very useful properties in multithreaded and purely-functional implementations.
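To make the structural-sharing idea concrete, here is a minimal sketch in Java (Erlang's own lists are built into the language, so this is only an analogy). ConsList is a hypothetical class invented for this example. Prepending allocates one new cell and reuses the existing list as its tail, so "mutated" versions share memory and the original never changes.

```java
// Hypothetical persistent (immutable) singly linked list; not a JDK class.
final class ConsList<E> {
    // A single shared sentinel represents the empty list.
    private static final ConsList<Object> EMPTY = new ConsList<>(null, null);

    final E head;
    final ConsList<E> tail;

    private ConsList(E head, ConsList<E> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <E> ConsList<E> empty() {
        return (ConsList<E>) EMPTY;
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    // O(1): allocates one cell and shares the whole existing list as the tail.
    ConsList<E> prepend(E value) {
        return new ConsList<>(value, this);
    }

    // O(n): walks the cells until the empty sentinel is reached.
    int size() {
        int n = 0;
        for (ConsList<E> cell = this; !cell.isEmpty(); cell = cell.tail) {
            n++;
        }
        return n;
    }
}
```

Because no cell is ever modified after construction, two threads can each prepend to the same base list without locking, and both results point at the same shared tail.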
Yes, there are real-world applications that use linked lists. I sometimes have to maintain a huge application that makes very heavy use of linked lists.
And yes, linked lists are included in just about any class library from C++/STL to .net.
And I wish it used arrays instead.
In the real world linked lists are SLOW because of things like paging and CPU cache size (linked lists tend to spread your data, which makes it more likely that you will need to access data from different areas of memory, and that is much slower on today's computers than using arrays that store all the data in one sequence).
Google "locality of reference" for more info.
Never used hand-made lists except for homework at university.
Depending on usage a linked list could be the best option. Deletes from the front of the list are much faster with a linked list than an array list.
In a Java program that I maintain, profiling showed that I could increase performance by moving from an ArrayList to a LinkedList for a List that had lots of deletes at the beginning.
I've been developing line of business applications (.NET) for years and I can only think of one instance where I've used a linked list, and even then I did not have to create the object.
This has just been my experience.
I would say it depends on the usage, in some cases they are quicker than typical random access containers.
Also I think they are used by some libraries as an underlying collection type, so what may look like a non-linked list might actually be one underneath.
In a C/C++ application I developed at my last company we used doubly linked lists all the time. They were essential to what we were doing, which was real-time 3D graphics.
Yes, all sorts of data structures are very useful in daily software development. In most languages that I know (C/C++/Python/Objective-C) there are frameworks that implement those data structures so that you don't have to reinvent the wheel.
And yes, data structures are not only for academics; they are very useful and you would not be able to write software without them (depending on what you do).
You use data structures in message queues, data maps, hash tables, keeping data ordered, fast access/removal/insertion and so on, depending on what needs to be done.
Yes, I do. It all depends on the situation. If I won't be storing a lot of data in them, or if the specific application needs a FIFO structure, I'll use them without a second thought because they are fast to implement.
However, in applications for other developers I know, there are times that a linked list would fit perfectly except that poor locality causes a lot of cache misses.
I cannot imagine many programs that don't deal with lists.
The minute you need to deal with more than one of something, lists in all forms and shapes become necessary, as you need somewhere to store these things. That list might be a singly/doubly linked list, an array, a set, a hashtable if you need to index your things based on a key, a priority queue if you need to sort it, etc.
Typically you'd store these lists in a database system, but somewhere you need to fetch them from the db, store them in your application and manipulate them, even if it's as simple as retrieving a little list of things you populate into a drop-down combobox.
These days, in languages such as C#, Python, Java and many more, you're usually abstracted away from having to implement your own lists. These languages come with a great many container abstractions you can store stuff in, either via standard libraries or built into the language.
You're still at an advantage learning these topics, e.g. if you're working with C# you'd want to know how an ArrayList works, and whether you'd choose ArrayList or something else depending on your need to add/insert/search/random index such a list.
People in the Java/.NET world have frameworks that provide methods for sorting a list.
In CS, we all might have gone through the Bubble/Insertion/Merge/Shell sorting algorithms.
Do you write any of them these days?
With frameworks in place, do you write code for sorting?
Do you think it makes sense to ask people to write code to sort in an interview? (other than for intern/junior developer requirement)
There are two pieces of code I write today in order to sort data
list.Sort();
enumerable.OrderBy(x => x); // Occasionally a different lambda is used
I work for a developer tools company, and as such I sometimes need to write new RTL types and routines. Sorting is something developers need, so it's something I sometimes need to write.
Don't forget, all that library code wasn't handed down from some mountain: some developer somewhere had to write it.
I don't write the sorting algorithm, but I have implemented IComparer in .NET for a few classes, which was kind of interesting the first couple of times.
I wouldn't write the code for sorting in most cases, given what is in the frameworks. There should be an understanding of why a particular sorting strategy like Quick sort is often used in frameworks like .NET.
I could see giving or being given a sorting question where some of the work is implementing the IComparer and understanding the different ways to sort a class. It would be a fairly easy thing to show someone a Bubble sort and ask, "Why wouldn't you want to do this in most applications?"
I can say with 100% certainty that I haven't written one of the 'traditional' sort routines since leaving University. It's nice to know the theory behind them, but a real-world situation that can't be handled by other means doesn't come up very often (at least in my experience...).
Only in an employer's interview/test =)
I wrote a merge sort when I had to sort multi-gigabyte files with a custom key comparison. I love merge sort - it's easy to comprehend, stable, and has a worst-case O(n log n) performance.
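For readers who have not written one since school, here is a minimal in-memory sketch of a stable top-down merge sort in Java that takes a custom key comparison. MergeSortSketch is a hypothetical name, and this is not the code from the answer: sorting multi-gigabyte files would need an external variant that sorts chunks and then merges them from disk, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; an in-memory sketch only.
final class MergeSortSketch {
    // Returns a new sorted list and leaves the input untouched.
    static <T> List<T> mergeSort(List<T> input, Comparator<? super T> cmp) {
        if (input.size() <= 1) {
            return new ArrayList<>(input);
        }
        int mid = input.size() / 2;
        List<T> left = mergeSort(input.subList(0, mid), cmp);
        List<T> right = mergeSort(input.subList(mid, input.size()), cmp);

        List<T> merged = new ArrayList<>(input.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // "<=" takes from the left half on ties, which keeps the sort stable.
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        // At most one of these loops copies anything.
        while (i < left.size()) {
            merged.add(left.get(i++));
        }
        while (j < right.size()) {
            merged.add(right.get(j++));
        }
        return merged;
    }
}
```

Each level of recursion does O(n) work in the merge and there are O(log n) levels, which is where the worst-case O(n log n) bound comes from.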
I've been looking for an excuse to try radix sort too. It's not as general purpose as most sorting algorithms, so standard libraries rarely provide it, but under the right circumstances it should be a good speedup.
Personally, I've not had a need to write my own sorting code for a while.
As far as interview questions go, it would weed out those who didn't pay attention during CS classes.
You could test API knowledge by asking how would you build Comparable (Capital C) objects, or something along those lines.
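As an illustration of what a candidate might write for that question, here is a small sketch in Java. Version is a hypothetical class made up for this example. It orders by major number first and minor number second, and keeps equals and hashCode consistent with compareTo.

```java
import java.util.Objects;

// Hypothetical value class used only to illustrate Comparable.
final class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Natural ordering: by major number, then by minor number.
    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    // Kept consistent with compareTo so sorted sets and maps behave as expected.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Version)) {
            return false;
        }
        Version v = (Version) o;
        return major == v.major && minor == v.minor;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor);
    }
}
```

Once the class implements Comparable, Collections.sort(list) and list.sort(null) can order it without a separate Comparator.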
The way I see it, just like many other fields of knowledge, programming also has a theoretical and a practical approach to it.
The field of "theoretical programming" is the one that gave us quicksort, Radix Sort, Dijkstra's Algorithm and many other things absolutely necessary to the advance of computing.
The field of "practical programming" deals with the fact that the solutions created in "theoretical programming" should be accessible to all in a much easier way, so that the theoretical ideas can get many, many creative uses. This gave us high-level languages like Python and allowed pretty much any language to ship prepackaged methods for the most basic operations like sorting or searching, with good enough performance to be fit for almost everyone.
One can't live without the other...
Most of us not needing to hand-code a sorting algorithm doesn't mean no one should.
I've recently had to write a sort, of sorts.
I had a list of text.. the ten most common had to show up according to the frequency at which they were selected. All other entries had to show up according to alpha sort.
It wasn't crazy hard to do but I did have to write a sort to support it.
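One reading of that requirement can be sketched in Java with two passes over library sorts rather than a hand-written algorithm. FrequencyThenAlpha, order, selectionCounts and topN are all hypothetical names, and the tie-breaking rule (alphabetical among equal counts) is an assumption, since the answer does not say how ties were handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: top-N entries by selection count, the rest alphabetically.
final class FrequencyThenAlpha {
    static List<String> order(Map<String, Integer> selectionCounts, int topN) {
        Comparator<String> alpha = Comparator.naturalOrder();
        // Highest count first; ties broken alphabetically (an assumption).
        Comparator<String> byCountDesc =
                Comparator.comparingInt((String s) -> selectionCounts.get(s)).reversed();

        List<String> ranked = new ArrayList<>(selectionCounts.keySet());
        ranked.sort(byCountDesc.thenComparing(alpha));

        int cut = Math.min(topN, ranked.size());
        List<String> result = new ArrayList<>(ranked.subList(0, cut));

        // Everything outside the top N is shown in plain alphabetical order.
        List<String> rest = new ArrayList<>(ranked.subList(cut, ranked.size()));
        rest.sort(alpha);
        result.addAll(rest);
        return result;
    }
}
```

For the case described above, the call would be order(counts, 10), where counts maps each entry to how often it was selected.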
I've also had to sort objects whose elements aren't easily sorted with out-of-the-box code.
Same goes for searching.. I had to walk a file and search statically sized records.. When I found a record I had to move one record back, because I was inserting before it.
For the most part it was very simple and I merely pasted in a binary search. Some changes needed to be made to support the method of access, because I wasn't using an array that was actually in memory.. Ah c&#p.. I could have treated it like a stream.. See now I want to go back and take a look..
Man, if someone asked me in an interview what the best sort algorithm was, and didn't understand immediately when I said 'timsort', I'd seriously reconsider if I wanted to work there.
Timsort
This describes an adaptive, stable, natural mergesort, modestly called timsort (hey, I earned it <wink>). It has supernatural performance on many kinds of partially ordered arrays (less than lg(N!) comparisons needed, and as few as N-1), yet as fast as Python's previous highly tuned samplesort hybrid on random arrays.
In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs "intelligently". Everything else is complication for speed, and some hard-won measure of memory efficiency.
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Is timsort general-purpose or Python-specific?
I haven't really implemented a sort, except as a coding exercise and to observe interesting features of a language (like how you can do quicksort on one line in Python).
I think it's a valid question to ask in an interview because it reflects whether the developer thinks about these kinds of things... I feel it's important to know what that list.sort() is doing when you call it. It's just part of my theory that you should know the fundamentals behind everything as a programmer.
I never write anything for which there's a library routine. I haven't coded a sort in decades. Nor would I ever. With quicksort and timsort directly available, there's no reason to write a sort.
I note that SQL does sorting for me.
There are lots of things I don't write, sorts being just one of them.
I never write my own I/O drivers. (Although I have in the past.)
I never write my own graphics libraries. (Yes, I did this once, too, in the '80s)
I never write my own file system. (Avoided this.)
There is definitely no reason to code one anymore. I think it is important though to understand the efficiency of what you are using so that you can pick the best one for the data you are sorting.
Yes. Sometimes digging out Shell sort beats the builtin sort routine when your list is only expected to be at most a few tens of records.
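For reference, a Shell sort really is only a few lines. The sketch below is in Java and uses the simple halving gap sequence; ShellSortSketch is a hypothetical name, and whether it actually beats the built-in routine on a few tens of records is something to measure for the data at hand rather than assume.

```java
// Hypothetical helper class; sorts an int array in place.
final class ShellSortSketch {
    static void shellSort(int[] a) {
        // Halving gap sequence: n/2, n/4, ..., 1. Other sequences exist.
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            // A gapped insertion sort for this gap size.
            for (int i = gap; i < a.length; i++) {
                int value = a[i];
                int j = i;
                // Shift larger gap-separated elements to the right.
                while (j >= gap && a[j - gap] > value) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = value;
            }
        }
    }
}
```

The final pass with gap 1 is an ordinary insertion sort, so the result is always fully sorted; the earlier passes just move elements most of the way to their final positions.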