Is appending to lists in Standard ML O(n) time? - data-structures

I'm trying to insert elements at the end of a list, but I'm wondering whether that would be slow.
I know that appending an element in Scheme requires traversing the entire list, and thus takes O(N) time for a list of length N. Is that also true in ML?

Yes. This is true in all implementations of ML that I am aware of, though I could certainly believe it is possible to create an implementation for which that is not the case.
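For intuition, here is the append function written out by hand, a minimal sketch in Haskell (SML's @ operator has the same recursive shape): every element of the first list must be visited and re-consed, so reaching the end costs O(n).

    -- Appending rebuilds the entire first list: one new cons cell per
    -- element, hence O(n) for a list of length n.
    append :: [a] -> [a] -> [a]
    append []       ys = ys
    append (x : xs) ys = x : append xs ys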

Related

Is there an efficient way to iterate over an unsorted container in a specific order without sorting/copying/referencing the original container?

What I have in mind is a SortedIterator, which accepts a Less function, the same comparison that would be used to sort the container.
The brute force implementation would of course either keep a copy of the original elements, or keep references/pointers to the elements in the original list.
Is there an efficient way to iterate in a well-defined order without actually sorting the list? I'm asking this out of algorithmic curiosity, and I expect the answer to be no (or yes, with a big but). The question comes from a C++ mindset, but the premise is in fact quite general and language-agnostic.
If you want O(1) memory, O(n^2) complexity is the only way we know of to do it; otherwise we could improve the selection-sort algorithm the same way. Any other sorting mechanism relies on being able to restructure part of the array (merge sort sorts subarrays, qsort splits the array around a pivot, and so on).
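As a sketch of that O(1)-memory, O(n^2) selection-style iteration (in Haskell, assuming distinct elements for simplicity):

    -- Selection-style iteration: O(1) extra memory, O(n^2) time.
    -- Each pass scans for the smallest element strictly greater than
    -- the last one emitted. Assumes distinct elements.
    sortedIterate :: Ord a => [a] -> [a]
    sortedIterate xs = go Nothing
      where
        go lastE =
          case [e | e <- xs, maybe True (< e) lastE] of
            [] -> []
            ys -> let m = minimum ys in m : go (Just m)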
Now if you relax the memory constraint, you can do something a bit more efficient. For example, you could keep a heap containing the lowest x elements seen so far. After one O(N log x) pass you have x elements to hand to the iterator. For the next pass, restrict attention to elements greater than the last element you have emitted so far. You will need N/x passes to get everything. If x == 1, the solution is O(N^2); if x == N, the solution is O(N log N) (but with a larger constant than a typical qsort). If the data is on disk, I would set x to about as much RAM as you can spare, minus a few MB, so that large chunks can still be read from the drive.
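A sketch of the bounded-heap idea, using Haskell's Data.Set to stand in for the heap of the x lowest values (a Set discards duplicates, so distinct elements and x >= 1 are assumed; a real heap would not need that):

    import qualified Data.Set as Set

    -- One pass: collect the x smallest elements greater than lastEmitted.
    lowestChunk :: Ord a => Int -> Maybe a -> [a] -> [a]
    lowestChunk x lastEmitted = Set.toAscList . foldl step Set.empty
      where
        step acc e
          | maybe False (e <=) lastEmitted = acc    -- already emitted earlier
          | Set.size acc < x               = Set.insert e acc
          | e < Set.findMax acc            = Set.insert e (Set.deleteMax acc)
          | otherwise                      = acc

    -- Repeated passes emit everything in order: N/x passes of O(N log x).
    sortedPasses :: Ord a => Int -> [a] -> [a]
    sortedPasses x xs = go Nothing
      where
        go lastE = case lowestChunk x lastE xs of
                     []    -> []
                     chunk -> chunk ++ go (Just (last chunk))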

How to apply binary search O(log n) on a sorted linked list?

Recently I came across an interesting question about linked lists: a sorted singly linked list is given, and we have to search for an element in it.
The time complexity should not be more than O(log n). It seems we need to apply binary search on this linked list. But how? A linked list does not provide random access, so a naive attempt at binary search degrades to O(n): we need to find the length of the list and then walk to the middle.
Any ideas?
It is certainly not possible with a plain singly-linked list.
Sketch proof: to examine the last node of a singly-linked list, we must perform n-1 operations of following a "next" pointer [by induction on the fact that there is only one reference to the (k+1)th node, it lives in the kth node, and following it takes one operation]. For certain inputs it is necessary to examine the last node (specifically, when the searched-for element is equal to or greater than its value). Hence for certain inputs, the time required is proportional to n.
You either need more time, or a different data structure.
Note that you can do it in O(log n) comparisons with a binary search. It'll just take more time than that, so this fact is only of interest if comparisons are very much more expensive than list traversal.
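To make that concrete, here is a sketch of binary search over a plain list: the comparison count is O(log n), but the length computation and the drops add up to O(n) traversal, which is why this only pays off when comparisons dominate.

    -- Binary search on a plain list: O(log n) comparisons, O(n) traversal.
    searchList :: Ord a => a -> [a] -> Bool
    searchList key xs = go xs (length xs)
      where
        go _  0 = False
        go ys n
          | mid == key = True
          | mid <  key = go (drop (half + 1) ys) (n - half - 1)
          | otherwise  = go ys half
          where
            half = n `div` 2
            mid  = ys !! half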
You need to use a skip list. This is not possible with a plain linked list (and I would genuinely like to hear about it if it is).
On a linked list, binary search cannot achieve a complexity of O(log n), but the search can be sped up somewhat using the double-pointer method described in this paper: http://www.ijcsit.com/docs/Volume%205/vol5issue02/ijcsit20140502215.pdf
As noted, this is not in general possible. However, in a language like C, if the list nodes are contiguously allocated, it would be possible to treat the structure as an array of nodes.
Obviously, this only answers a trick-question variant of the problem, but then the problem is either an impossibility or a trick question.
Yes, it is possible in the Java language, as below:
Collections.<T>binarySearch(List<T> list, T key)
performs a binary search on any List. It works on ArrayList, on LinkedList, and on any other List. Note, however, that for lists that do not implement RandomAccess it falls back to an iterator-based search: O(log n) comparisons but O(n) link traversals, so it does not actually achieve O(log n) time on a LinkedList.
Use a map to represent the linked list: M[first element] = second element, M[second element] = third element, and so on. It is still a linked list, but because it is stored in a map, which is typically implemented as a balanced search tree, checking whether any element is present is a key lookup and takes O(log n).
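A sketch of what this amounts to, in Haskell. Note that the map finds elements by key equality via a balanced-tree lookup, not by a true binary search over list positions:

    import qualified Data.Map as Map

    -- The list 2 -> 5 -> 9 stored as successor entries in a tree map.
    -- Membership is a key lookup: O(log n) regardless of list position.
    nextOf :: Map.Map Int (Maybe Int)
    nextOf = Map.fromList [(2, Just 5), (5, Just 9), (9, Nothing)]

    containsElem :: Int -> Bool
    containsElem x = Map.member x nextOf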

Complexity in using Binary search and Trie

Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is OK, since I will be calling this function many times over different inputs.
Priorities: 1. speed, 2. memory.
I already know I can use (n is the number of words, m is the average length of the words):
1. a trie: time is O(log(n)), space (best case) is O(log(nm)), space (worst case) is O(nm).
2. load the complete list into memory, then binary search: time is O(log(n)), space is O(n*m).
I'm not sure about the complexity of the trie, so please correct me if it is wrong. Are there other good approaches?
It is O(m) time for the trie, and up to O(m log(n)) for the binary search (each of the O(log(n)) comparisons may inspect up to m characters). The space is asymptotically O(nm) for any reasonable method, which you can probably reduce in some cases using compression. The trie structure is, in theory, somewhat better on memory, but in practice it has devils hiding in the implementation details: the memory needed to store pointers, and potentially poor cache behaviour.
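For reference, a minimal word trie with O(m) lookup, sketched in Haskell; using Data.Map for the child edges is exactly the kind of pointer-overhead detail mentioned above:

    import qualified Data.Map as Map

    -- A minimal word trie: membership costs O(m) for a word of length m.
    data Trie = Trie { isWord :: Bool, children :: Map.Map Char Trie }

    emptyTrie :: Trie
    emptyTrie = Trie False Map.empty

    insertWord :: String -> Trie -> Trie
    insertWord []       (Trie _ cs) = Trie True cs
    insertWord (c : cs) (Trie w m)  =
      Trie w (Map.insert c (insertWord cs child) m)
      where child = Map.findWithDefault emptyTrie c m

    memberWord :: String -> Trie -> Bool
    memberWord []       t = isWord t
    memberWord (c : cs) t = maybe False (memberWord cs) (Map.lookup c (children t))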
There are other options for implementing a set structure - hashset and treeset are easy choices in most languages. I'd go for the hash set as it is efficient and simple.
I think a HashMap is perfectly fine for your case, since the expected time complexity for both put and get operations is O(1). It works fine even if you don't have a sorted list.
Preprocessing is ok since I will be calling this function many times over different inputs.
As food for thought, have you considered creating a set from the input data and then searching it by hash? It will take more time up front to build the set, but if the number of inputs is limited and you return to them, a set may be a good idea: with a good hash function, the "contains" operation is O(1).
I'd recommend a hashmap. You can find an extension to C++ for this in both VC and GCC.
Use a bloom filter. It is space efficient even for very large data and it is a fast rejection technique.
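A toy sketch of the idea in Haskell (hashWithSalt is from the hashable package; the m bit positions and k salted hashes are parameters you would tune). It may report false positives but never false negatives, so it works as a cheap rejection filter in front of the real word list:

    import qualified Data.IntSet as IntSet
    import Data.Hashable (Hashable, hashWithSalt)  -- hashable package

    type Bloom = IntSet.IntSet

    -- Set the k bit positions derived from k salted hashes of x.
    bloomAdd :: Hashable a => Int -> Int -> a -> Bloom -> Bloom
    bloomAdd m k x s =
      foldr (\salt -> IntSet.insert (hashWithSalt salt x `mod` m)) s [1 .. k]

    -- True means "possibly present"; False means "definitely absent".
    bloomMayContain :: Hashable a => Int -> Int -> a -> Bloom -> Bool
    bloomMayContain m k x s =
      all (\salt -> (hashWithSalt salt x `mod` m) `IntSet.member` s) [1 .. k]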

Fast element lookup for a functional language (Haskell)

Say we are traversing a graph and want to quickly determine if a node has been seen before or not. We have a few set preconditions.
Nodes have been marked with integer values 1..N
Graph is implemented with nodes having an adjacency list
Every integer value from 1..N occurs in the graph, which is of size N
Any ideas for doing this in a purely functional way? (No hash tables or arrays allowed.)
I want a data structure with two functions working on it: add (adds an encountered integer) and lookup (checks whether an integer has been added). Both should preferably take O(N) time in total, amortized over N operations.
Is this possible?
You can use a Data.Set. You add an element by creating a new set from the old one with insert and passing the new set around. You check whether an element is a member of the set with member. Both operations are O(log n).
Perhaps you could consider using a state monad to thread the set through the traversal.
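A minimal sketch of threading the set explicitly through a depth-first traversal (the state-monad version wraps the same plumbing; neighbours is an assumed adjacency function):

    import qualified Data.Set as Set

    -- Depth-first traversal that threads the visited set explicitly.
    dfs :: (Int -> [Int]) -> Int -> [Int]
    dfs neighbours start = go [start] Set.empty
      where
        go []       _    = []
        go (n : ns) seen
          | n `Set.member` seen = go ns seen
          | otherwise           = n : go (neighbours n ++ ns) (Set.insert n seen)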
Efficient element lookup in functional languages is quite hard. Data.Set (as shown above) is implemented as a balanced binary tree, which can be built in a purely functional way and provides lookup operations in O(log n). Hash tables (which aren't purely functional) would have O(1).
I believe that Data.BitSet might be O(n).
Take a look at Judy hash tables, if you don't mind wrapping your code in the IO monad.

How do I efficiently keep track of the smallest element in a collection?

In the vein of programming questions: suppose there's a collection of objects that can be compared to each other and sorted. What's the most efficient way to keep track of the smallest element in the collection as objects are added and the current smallest occasionally removed?
Using a min-heap is the best way.
http://en.wikipedia.org/wiki/Heap_(data_structure)
It is tailor-made for this application.
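For illustration, a sketch in Haskell using Data.Set as a stand-in for the heap: insert, find-min, and delete-min are all O(log n). (A Set discards duplicate elements, which a real min-heap would keep.)

    import qualified Data.Set as Set

    -- Tracking the smallest element with a balanced tree standing in
    -- for a heap: insert, find-min and delete-min are each O(log n).
    type MinTracker a = Set.Set a

    addElem :: Ord a => a -> MinTracker a -> MinTracker a
    addElem = Set.insert

    smallest :: MinTracker a -> Maybe a
    smallest = Set.lookupMin

    removeSmallest :: MinTracker a -> MinTracker a
    removeSmallest = Set.deleteMin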
If you need random insert and removal, the best way is probably a sorted array. Inserts and removals should be O(log(n)).
@Harpreet
That is not optimal. When an object is removed, erickson will have to scan the entire collection to find the new smallest.
You want to read up on binary search trees. MS has a good site to start down the path. But you may want to get a book like Introduction to Algorithms (Cormen, Leiserson, Rivest, Stein) if you want to dive deeper.
For occasional removes, a Fibonacci heap is even faster than the min-heap: insertion is O(1) and finding the min is also O(1), while removal is O(log(n)) amortized.
If you need random insert and removal, the best way is probably a sorted array. Inserts and removals should be O(log(n)).
Yes, but you will need to re-sort on each insert and (maybe) each deletion, which, as you stated, is O(log(n)).
With the solution proposed by Harpreet:
you have one O(n) pass at the beginning to find the smallest element;
inserts are O(1) thereafter (only one comparison against the already-known smallest element is needed);
deletes will be O(n), because you will need to re-find the smallest element (keep in mind Big O notation is worst case). You could also optimize by checking whether the element to be deleted is the (known) smallest, and if not, skipping the re-scan entirely.
So it depends: one of these algorithms is better for an insert-heavy use case with few deletes, but the other is more consistent overall. I think I would default to Harpreet's mechanism unless I knew that the smallest number would be removed often, because that exposes a weak point in the algorithm.
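A sketch of Harpreet's scheme as described, in Haskell (the names insertT and removeMinT are made up for illustration):

    import Data.List (delete)

    -- Cached-minimum scheme: O(1) insert (one comparison), O(n) only
    -- when the minimum itself is removed and must be re-found.
    data Tracked a = Tracked { items :: [a], cachedMin :: Maybe a }

    insertT :: Ord a => a -> Tracked a -> Tracked a
    insertT x (Tracked xs m) = Tracked (x : xs) (Just (maybe x (min x) m))

    removeMinT :: Ord a => Tracked a -> Tracked a
    removeMinT (Tracked xs (Just m)) =
      let xs' = delete m xs                                      -- O(n) removal
          m'  = if null xs' then Nothing else Just (minimum xs') -- O(n) re-find
      in Tracked xs' m'
    removeMinT t = t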
Harpreet:
the inserts into that would be linear since you have to move items for an insert.
Doesn't that depend on the implementation of the collection? If it acts like a linked list, inserts would be O(1) once the position is known, while if it is implemented like an array, they would be linear, as you stated.
Depends on which operations you need your container to support. A min-heap is the best if you might need to remove the min element at any given time, although several operations are nontrivial (amortized log(n) time in some cases).
However, if you only need to push/pop from the front/back, you can just use a mindeque which achieves amortized constant time for all operations (including findmin). You can do a scholar.google.com search to learn more about this structure. A friend and I recently collaborated to reach a much easier-to-understand and -to-implement version of a mindeque, as well. If this is what you're looking for I could post the details for you.
