Efficient Data Structures in Maple - performance

I'm working with a large amount of data in Maple and I need to know the most efficient way to store it. I started with lists, but I quickly learned how inefficient those are so I have since replaced them. Now I'm using a mixture of Arrays (for structures with a fixed length) and tables (for structures with variable length), but my code actually runs significantly slower than it did when I was only using lists.
So here are my questions:
What is the most efficient data structure to use in Maple for a static-length set of data? For a variable-length set?
Are there any "gotchas" I need to be aware of when using these structures as parameters in a recursive proc? If using Arrays or tables, does each one need to be copied for each iteration to avoid clobbering data?

I think I can wrap this one up now. I made a few performance improvements, mostly just small tweaks that only helped a bit, but I did manage a big improvement by removing as many instances of the copy command as I could (I used it on arrays and tables). It turns out this is what was causing my array/table implementation to be slower than my list-only implementation. But the code still didn't run as fast as I needed, so I re-wrote it in C#. That's probably not the best solution for "how to improve Maple efficiency", but it sure does run a lot faster now.
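For what it's worth, the trade-off behind that copy removal is the same in any language that passes mutable structures by reference (as Maple does with Arrays and tables): either pay for a full copy on every recursive call, or mutate the shared structure and undo the change while unwinding. Here is a minimal sketch of the two approaches, written in Ruby purely as illustration (the function and data are made up):

def search_with_copy(state, depth)
  return if depth.zero?
  local = state.dup                 # O(n) copy on every call adds up quickly
  local[depth % local.size] += 1
  search_with_copy(local, depth - 1)
end

def search_in_place(state, depth)
  return if depth.zero?
  i = depth % state.size
  state[i] += 1                     # mutate the shared structure...
  search_in_place(state, depth - 1)
  state[i] -= 1                     # ...and undo it while unwinding, so the caller's data is never clobbered
end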

Related

What are some examples of Sorting methods in Web development?

I am a TA for an algorithms class and we are doing a unit on sorting, and I wanted to include a discussion of quicksort. There are many good theoretical discussions of sorting methods on the web showing which one is better in which circumstances...
What are some real-life instances of quicksort I can give to my students, especially in the field of Web Development?
Does Django use quicksort?
Does React?
Does Leaflet use any kind of sort?
In fact, I don't really care about quicksort particularly. Any sorting method will do if I can point to a specific library that uses it. Thanks.
Why are my students learning sort? Why am I teaching this? I can think of academic or theoretical reasons... basically that we are constantly ordering things, either in their own right or as part of another algorithm. But how about for my students, who may never have to write their own sort function?
I'll answer the question "why do we learn how to write a sort function?" Why do we learn to write anything that's already given to us by a library? Hashes, lists, queues, trees... why learn to write any of them?
The most important reason is to appreciate their performance consequences and when to use which one. For example, Ruby Arrays supply a lot of built-in functionality. They're so well done and easy to use that it's easy to forget you're working with a list and write yourself a pile of molasses.
Look at this loop that finds a thing in a list and replaces it.
things.each { |thing|
  idx = thing.index(marker)
  thing[idx] = stuff
}
With no understanding of the underlying algorithms that seems perfectly sensible.
For each list in the list of things.
Find the item to replace.
Insert a new item in its place.
Two steps per thing. What could be simpler? And when they run it with a small amount of test data it's fine. When they put it into production with a real amount of data and having to do it thousands of times per second it's dog slow. Why? Without an appreciation for what all those methods are doing under the hood, they cannot know.
things.each { |thing|         # O(things)
  idx = thing.index(marker)   # O(thing)
  thing[idx] = stuff          # O(1)
}
Those deceptively simple-looking Array methods are their own hidden loops. In the worst case each one must scan the whole list. Loops inside loops make this multiplicatively slower: it's O(n*m). How slow? If things is 1000 items long, and each thing has 1000 items in it, that's... 1000 * 1000 or 1,000,000 operations!
And this isn't nearly the worst trouble students can get into; normally they write O(n!) loops. I actually find it hard to come up with an example, I'm so ingrained against writing them.
But that only becomes apparent after you throw a ton of data at it. While you're writing it, how can you know?
How can they make it faster? Without understanding the other options available and their performance characteristics, like hashes and sets and trees, they cannot know. An experienced programmer would make one immediate change to the data structure and turn things into a list of sets.
things.each { |thing|    # O(things)
  thing.delete(marker)   # O(1)
  thing.add(stuff)       # O(1)
}
This is much faster. Deleting and adding with an unordered set is O(1), so it's effectively free no matter how large thing gets. Now if things is 1000 items long, and each thing has 1000 items in it, that's 1000 operations. By using a more appropriate data structure I just sped up that loop by a factor of 1000. Really what I did was change it from O(n*m) to O(n).
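To make that concrete for students, here is a minimal benchmark sketch along those lines; the sizes and the marker and stuff values are arbitrary, and it assumes things has been built as Arrays in one case and Sets in the other:

require 'set'
require 'benchmark'

n = 1_000
things_as_arrays = Array.new(n) { (0...n).to_a }
things_as_sets   = Array.new(n) { (0...n).to_set }
marker, stuff = 500, -1

Benchmark.bm(8) do |b|
  b.report("arrays") do
    things_as_arrays.each { |thing| thing[thing.index(marker)] = stuff }
  end
  b.report("sets") do
    things_as_sets.each { |thing| thing.delete(marker); thing.add(stuff) }
  end
end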
Another good example is learning how to write a solid comparison function for multi-level data. Why is the Schwartzian transform fast? You can't appreciate that without understanding how sorting works.
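As a sketch of why the Schwartzian transform helps (File.size here stands in for any expensive key function; it's an assumption, not part of the original example): a plain sort with a block recomputes the key at every comparison, roughly O(n log n) times, while the transform computes each key exactly once, which is what Ruby's sort_by does for you internally.

files = Dir.glob("*")

# naive: File.size runs once per comparison, O(n log n) times
slow = files.sort { |a, b| File.size(a) <=> File.size(b) }

# Schwartzian transform: compute each key once, sort the pairs, strip the keys
fast = files.map { |f| [File.size(f), f] }.sort.map { |_size, f| f }

# which is effectively what sort_by does under the hood
fast = files.sort_by { |f| File.size(f) }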
You could simply be told these things, sorting is O(n log n), finding something in a list is O(n), and so on... but having to do it yourself gives you a visceral appreciation for what's going on under the hood. It makes you appreciate all the work a modern language does for you.
That said, there's little point in writing six different sort algorithms, or four different trees, or five different hash conflict resolution functions. Write one of each to appreciate them, then just learn about the rest so you know they exist and when to use them. 98% of the time the exact algorithm doesn't matter, but sometimes it's good to know that a merge sort might work better than a quick sort.
Because honestly, you're never going to write your own sort function. Or tree. Or hash. Or queue. And if you do, you probably shouldn't be. Unless you intend to be the 1% that writes the underlying libraries (like I do), if you're just going to write web apps and business logic, you don't need a full blown Computer Science education. Spend that time learning Software Engineering instead: testing, requirements, estimation, readability, communications, etc...
So when a student asks "why are we learning this stuff when it's all built into the language now?" (echoes of "why do I have to learn math when I have a calculator?") have them write their naive loop with their fancy methods. Shove big data at it and watch it slow to a crawl. Then write an efficient loop with a good selection of data structures and algorithms and show how it screams through the data. That's their answer.
NOTE: This is the original answer before the question was understood.
Most modern languages use quicksort as their default sort, but usually modified to avoid the O(n^2) worst case. Here's the BSD man page on their implementation of qsort_r(). Ruby uses qsort_r.
The qsort() and qsort_r() functions are an implementation of C.A.R. Hoare's "quicksort" algorithm, a variant of partition-exchange sorting; in particular, see D.E. Knuth's Algorithm Q. Quicksort takes O(N lg N) average time. This implementation uses median selection to avoid its O(N**2) worst-case behavior.
PHP also uses quicksort, though I don't know which particular implementation.
Perl uses its own implementation of quicksort by default. But you can also request a merge sort via the sort pragma.
In Perl versions 5.6 and earlier the quicksort algorithm was used to implement "sort()", but in Perl 5.8 a mergesort algorithm was also made available, mainly to guarantee worst case O(N log N) behaviour: the worst case of quicksort is O(N**2). In Perl 5.8 and later, quicksort defends against quadratic behaviour by shuffling large arrays before sorting.
Python since 2.3 uses Timsort and is guaranteed to be stable. Any software written in Python (Django) is likely to also use the default Timsort.
JavaScript, really the ECMAScript specification, does not say what type of sorting algorithm to use for Array.prototype.sort. It only says that it's not guaranteed to be stable. This means the particular sorting algorithm is left to the JavaScript implementation. Like Python, any JavaScript frameworks such as React or Leaflet are likely to use the built-in sort.
Visual Basic for Applications (VBA) comes with NO sorting algorithm. You have to write your own. This is a bizarre oversight for any language, but particularly one that's designed for business use and spreadsheets.
Almost any table is sorted. Most web apps are backed by a SQL database, and the actual sorting is performed inside that database; for example, the SQL query SELECT id, date, total FROM orders ORDER BY date DESC. This kind of sorting uses already-sorted database indexes, which are mostly implemented as B-trees (or data structures inspired by B-trees). But if data needs to be sorted on the fly, then I think quicksort is usually used.
Sorting, merging of sorted files and binary search in sorted files are often used in big data processing, analytics, ad dispatching, full-text search... Even Google results are sorted :)
Sometimes you don't need a full sort, but a partial sort or a min-heap, for example in Dijkstra's algorithm for finding the shortest path, which is used (or can be used, or I would use it :) ) for example in route planning (Google Maps).
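For example, picking the 10 shortest travel times out of a million candidates doesn't need a full O(N log N) sort; a partial sort only keeps track of the current best few. A small Ruby sketch with made-up data:

travel_times = Array.new(1_000_000) { rand(10_000) }

ten_best_full    = travel_times.sort.first(10)   # sorts all million values
ten_best_partial = travel_times.min(10)          # partial sort: same result, far less work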
As pointed out by Schwern, the sorting is almost always provided by the programming language or its implementation engine, and libraries / frameworks just use that algorithm, with a custom comparison function when they need to sort complex objects.
Now if your objective is to have a real-life example in the Web context, you could turn it around and use the "lack of" a sorting method in SVG, and make an exercise out of it. Unlike other DOM elements, an SVG container paints its children in the order they are appended, irrespective of any "z-index" equivalent. So to implement "z-index" functionality, you have to re-order the nodes yourself.
And to avoid just using a custom comparison function and relying on array.sort, you could add extra constraints, like stability, typically to preserve the current order of nodes with the same "z-index".
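Here is a sketch of what a solution to that exercise might look like, in Ruby for brevity (the node hashes and their :z / :label fields are invented): since Array#sort_by makes no stability promise, the node's current position is added as an explicit tiebreaker so shapes with the same "z-index" keep their relative order.

nodes = [
  { label: "lake",   z: 1 },
  { label: "road",   z: 2 },
  { label: "marker", z: 2 },
  { label: "park",   z: 1 },
]

ordered = nodes.each_with_index
               .sort_by { |node, i| [node[:z], i] }   # current index breaks ties
               .map(&:first)
# => lake, park, road, marker   ("road" stays ahead of "marker")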
Since you mention Leaflet, one of the frustrations with the pre-1.0 versions (e.g. 0.7.7) was that all vector shapes were appended into the same single SVG container, without any provided sorting functionality except for bringToFront / bringToBack.

What are appropriate applications for a linked (doubly as well) list?

I have a question about fundamentals in data structures.
I understand that an array's access time is faster than a linked list's: O(1) for an array vs O(N) for a linked list.
But a linked list beats an array at removing an element, since there is no shifting needed: O(N) for an array vs O(1) for a linked list.
So my understanding is that if the majority of operations on the data are deletes, then using a linked list is preferable.
But if the use case is:
delete elements but not too frequently
access ALL elements
Is there a clear winner? In the general case I understand that the downside of using the list is that accessing each node may touch a separate page, while an array has better locality.
But is this a theoretical or an actual concern that I should have?
And is the mixed type, i.e. building a linked list on top of an array (using extra index fields), a good idea?
Also, does my question depend on the language? I assume that shifting elements in an array has the same cost in all languages (at least asymptotically).
Singly-linked lists are very useful and can be better performance-wise relative to arrays if you are doing a lot of insertions/deletions, as opposed to pure referencing.
I haven't seen a good use for doubly-linked lists for decades.
I suppose there are some.
In terms of performance, never make decisions without understanding relative performance of your particular situation.
It's fairly common to see people asking about things that, comparatively speaking, are like getting a haircut to lose weight.
Before writing an app, I first ask if it should be compute-bound or IO-bound.
If IO-bound I try to make sure it actually is, by avoiding inefficiencies in IO, and keeping the processing straightforward.
If it should be compute-bound then I look at what its inner loop is likely to be, and try to make that swift.
Regardless, no matter how much I try, there will be (sometimes big) opportunities to make it go faster, and to find them I use this technique.
Whatever you do, don't just try to think it out or go back to your class notes.
Your problem is different from anyone else's, and so is the solution.
The problem with a list is not just the fragmentation, but mostly the data dependency. If you access every Nth element in an array you don't have locality, but the accesses may still go to memory in parallel since you know the address. In a list it depends on the data being retrieved, and therefore traversing a list effectively serializes your memory accesses, causing it to be much slower in practice. This of course is orthogonal to asymptotic complexities, and would harm you regardless of the size.

CUDA parallel sorting algorithm vs single thread sorting algorithms

I have a large amount of data which I need to sort: several million arrays, each with tens of thousands of values. What I'm wondering is the following:
Is it better to implement a parallel sorting algorithm on the GPU and run it across all the arrays,
OR
implement a single-threaded algorithm, like quicksort, and assign each GPU thread a different array?
Obviously speed is the most important factor. For a single-threaded sorting algorithm, memory is a limiting factor. I've already tried to implement a recursive quicksort, but it doesn't seem to work for large amounts of data, so I'm assuming there is a memory issue.
The data type to be sorted is long, so I don't believe a radix sort would be possible, since the binary representation of the numbers would be too long.
Any pointers would be appreciated.
Sorting is an operation that has received a lot of attention. Writing your own sort isn't advisable if you are interested in high performance. I would consider something like thrust, back40computing, moderngpu, or CUB for sorting on the GPU.
Most of the above will be handling an array at a time, using the full GPU to sort an array. There are techniques within thrust to do a vectorized sort which can handle multiple arrays "at once", and CUB may also be an option for doing a "per-thread" sort (let's say, "per thread block").
Generally I would say the same thing about CPU sorting code. Don't write your own.
EDIT: I guess one more comment. I would lean heavily towards the first approach you mention (i.e. not doing a sort per thread.) There are two related reasons for this:
Most of the fast sorting work has been done along the lines of your first method, not the second.
The GPU is generally better at being fast when the work is well adapted for SIMD or SIMT. This means we generally want each thread to be doing the same thing, minimizing branching and warp divergence. This is harder to achieve (I think) in the second case, where each thread appears to be following the same sequence but in fact data dependencies are causing "algorithm divergence". On the surface of it, you might wonder if the same criticism might be levelled at the first approach, but since the libraries I mention are written by experts, they are aware of how best to utilize the SIMT architecture. The thrust "vectorized sort" and CUB approaches will allow multiple sorts to be done per operation, while still taking advantage of the SIMT architecture.

Efficient storage of external index of strings

Say you have a large collection of n objects on disk, each with a variable-sized string. What are common, efficient practices for building an index of those objects using plain string comparison? Storing the whole strings in the index would be prohibitive in the long run due to size and I/O, but since disks have high latency, storing only references isn't a good idea either.
I've been thinking of using a B-tree-like design with tries, but can't find any database implementation using this approach. In fact, it's hard to find out how major databases implement indexes for strings (it probably gets lost in the vast results for SQL-level information).
TIA!
EDIT: changed title from "Efficient external sorting and searching of stored objects with large strings" to "Efficient storage of external index of strings."
A "prefix B-tree" or "simple prefix B-tree" would probably be helpful here.
A "simple prefix B-tree" is a bit simpler, just storing the shortest prefix that separates two items, without trying to eliminate redundancy within those prefixes (e.g. for 'astronomy' and 'azimuth', it would store just 'as' and 'az', but not try to keep from duplicating the 'a').
A "prefix B-tree" is close to what you've described -- something like a trie, but in a B-tree structure to give good characteristics when stored primarily on disk. Nonetheless, it's intended to remove (most of) the redundancy within the prefixes that form the index.
There is one other question: do you really need to traverse the records in order, or do you just need to look up a specified record quickly? If the latter is adequate, you might be able to use extendible hashing instead. Extendible hashing has been around (in a number of different forms) for a few decades, and still works pretty well. The general idea is fairly simple: hash the strings to create keys of fixed length, then create some sort of tree of those fixed-length pseudo-keys. As with (almost) any hash, you have to be prepared to deal with collisions. As with other hash tables, the details of the hashing and collision resolution vary (though probably not quite as much with extendible hashing as in-memory hashing).
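A rough sketch of that pseudo-key idea, showing only the hashing and bucket selection; directory growth and collision handling are omitted, and record_offset is a stand-in for wherever the object really lives on disk:

require 'digest'

DIRECTORY_BITS = 8   # a real implementation doubles the directory as buckets fill

# fixed-width pseudo-key: the first 32 bits of a hash of the variable-length string
def pseudo_key(str)
  Digest::SHA1.digest(str).unpack1("N")
end

# the leading bits of the pseudo-key select the bucket
def bucket_for(str)
  pseudo_key(str) >> (32 - DIRECTORY_BITS)
end

directory = Hash.new { |h, k| h[k] = [] }
%w[astronomy azimuth zebra].each_with_index do |name, record_offset|
  directory[bucket_for(name)] << [name, record_offset]
end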
As for real use, major DBMS and DBMS-like systems use all of the above. B-tree variants are probably the most common in the general purpose DBMS market (e.g. Oracle or MS SQL Server). Extendible hashing is used in a fair number of more-specialized products (e.g., Lotus Domino Server).
What are you doing with the objects?
If you're running a large system that needs low latency to handle lots of concurrent requests, then I'd store the objects in a database and have it take care of the sorting and indexing. This would be much simpler than implementing a B-tree from scratch and possibly having it be buggy.
DBMSs also have caching and various other features that might make your life easier.
Start by being clear about what you want. Do you want to sort them or index them? Sorting is likely to require moving at least some of the items on disk, but indexing would likely leave them where they are.
If you really want to sort them, Knuth's "The Art of Computer Programming", volume three, covers sorting and searching in about as much detail as you're likely to want.

Is sqrt still slow in Delphi 2009?

Is sqrt still slow in Delphi 2009?
Are the old tricks (lookup table, approx functions) still useful?
If you are dealing with a small set of really large numbers, then a lookup table will almost always be faster, while if you are dealing with a large set of small numbers, even a slow routine may be faster than maintaining a large table.
I looked in System.pas (where SQRT is located) and while there are a number of blocks marked as licensed from the Fastcode project, SQRT is not. In fact, it just makes an assembly call to FSQRT, so it most likely hasn't changed. So if it was comparatively slow at one point, then it most likely still is (although your CPU may be a lot faster and better optimized for it now...).
My recommendation is to look at your usage and test.
Many moons ago I had an app that computed distances in order to sort vectors. I realized that sorting by the un-square-rooted values gave the same order, so I skipped the sqrt altogether: I sorted by distance^2 and saved some time.
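That works because sqrt is monotonic, so comparing squared distances gives exactly the same order. A quick Ruby sketch with made-up vectors:

Vec = Struct.new(:x, :y)
vectors = Array.new(10_000) { Vec.new(rand(-100..100), rand(-100..100)) }

by_distance    = vectors.sort_by { |v| Math.sqrt(v.x ** 2 + v.y ** 2) }
by_distance_sq = vectors.sort_by { |v| v.x ** 2 + v.y ** 2 }   # same order, no sqrt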
My answer to questions of efficiency is to just apply the simplest method first, then optimize later. Making an operation faster when there's no need for it to be all that fast in the first place seems silly.
I digress, however, since my answer is about efficiency, not the historical aspects of your question.
