What is the difference between a list and a multiset? - data-structures

From my understanding, both lists and multisets are collections of ordered values in which values can occur more than once. Is there any difference?

Yes, lists and multisets are different. Order matters in lists, and doesn't in multisets.
(list 1 2 3 2) != (list 2 1 3 2)
(multiset 1 2 2 3) == (multiset 1 3 2 2)

Besides order, each container has its own set of available methods, each with its own complexity.
For example, searching in a list is O(n) (you may have to check every element until you find the one you want). Searching in a tree-based multiset is O(log n); such a multiset is usually implemented as a red-black tree to meet this requirement.
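As a rough illustration of the ordering point, here is a small sketch in Python (my choice of language; the question does not name one). `collections.Counter` stands in for a multiset here. Note that `Counter` is hash-based rather than tree-based, so this shows only the equality semantics, not the O(log n) search of a tree-backed multiset.

```python
from collections import Counter

# Lists compare element by element, so order matters.
lists_equal = [1, 2, 3, 2] == [2, 1, 3, 2]

# A Counter maps each value to its multiplicity, so order is irrelevant.
multisets_equal = Counter([1, 2, 2, 3]) == Counter([1, 3, 2, 2])

# Multiplicity still matters: one 2 is not the same as two 2s.
different_counts_equal = Counter([1, 2, 3]) == Counter([1, 2, 2, 3])

print(lists_equal, multisets_equal, different_counts_equal)  # False True False
```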

Related

Iterating through every combination of elements with repetitions without generating whole set

I need to iterate over every possible combination of elements (with repetitions) up to n elements long.
I've found multiple solutions for this problem, but all of them recursively generate a collection of every possible combination and then iterate over it. While this works, for large element collections and combination sizes it results in heavy memory use. So I'm looking for a solution that would let me calculate the next combination from the previous one, knowing only the number of elements and the maximum size of a combination.
Is this even possible, and is there any particular algorithm that would work here?
Generate the combinations so that each combination is sorted. (This assumes the elements themselves can easily be placed in order. The precise ordering relationship is not important as long as it is a total order.)
Start with the combination consisting of n repetitions of the smallest element. To produce the next combination from any given combination:
Scan backwards until you find an element which is not the largest element. If you can't find one, you are done.
Replace that element and all following elements with the next larger element after that element.
If you want combinations of all lengths up to n, run that algorithm for each length up to n. Or start with a vector which contains empty slots and use the above algorithm with the understanding that the "next larger element" after an empty slot is the smallest element.
Example: length 3 of 3 values:
1 1 1
1 1 2
1 1 3
1 2 2
1 2 3
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
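The steps above can be sketched in Python as follows. This is a minimal sketch that assumes the elements are represented by the integers 1..num_values (map them back to your real elements by index); the function names are mine, not from any library.

```python
def next_combination(combo, largest):
    """Advance combo (a non-decreasing list of ints in 1..largest) in place.

    Returns False when combo was already the last combination.
    """
    # Scan backwards for an element that is not the largest element.
    i = len(combo) - 1
    while i >= 0 and combo[i] == largest:
        i -= 1
    if i < 0:
        return False  # every slot holds the largest element: done
    # Replace that element and all following ones with its successor.
    successor = combo[i] + 1
    for j in range(i, len(combo)):
        combo[j] = successor
    return True


def combinations_with_repetition(num_values, length):
    """Lazily yield every sorted combination; only one is held in memory."""
    combo = [1] * length  # start with the smallest element repeated
    while True:
        yield tuple(combo)
        if not next_combination(combo, num_values):
            return


for c in combinations_with_repetition(3, 3):
    print(*c)
```

For all lengths up to n, one option is to loop `for length in range(1, n + 1)` around the generator. Python's standard library also offers `itertools.combinations_with_replacement`, which is lazy as well and produces the same order for sorted input.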

How to display all ways to give change

As far as I know, counting every way to give change to a set sum and a starting till configuration is a classic Dynamic Programming problem.
I was wondering if there was a way to also display (or store) the actual change structures that could possibly amount to the given sum while preserving the DP complexity.
I have never seen this issue discussed, and I would like some pointers or a brief explanation of how this can be done or why it cannot be done.
DP for change problem has time complexity O(Sum * ValuesCount) and storage complexity O(Sum).
You can prepare extra data for this problem in the same time as the DP for change, but you need more storage, O(Sum * ValuesCount), and a lot of time to output all the variants, O(ChangeWaysCount).
To prepare data for way recovery, make a second array B of arrays (or lists). When you increment an element of the count array A from some previous element, add the value used to the corresponding element of B. At the end, unwind all the ways from the last element.
Example: values 1,2,3, sum 4
index  0    1    2      3        4
A      0    1    2      3        4
B      -    1    1 2    1 2 3    1 2 3
We start unwinding from B[4] elements:
1-1-1-1 (B[4]-B[3]-B[2]-B[1])
2-1-1 (B[4]-B[2]-B[1])
2-2 (B[4]-B[2])
3-1 (B[4]-B[1])
Note that I have used only ways with non-increasing values, to avoid permutation variants (such as counting both 1-3 and 3-1).
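Here is a minimal Python sketch of the A/B arrays and the unwinding step described above. It assumes the coin values are distinct positive integers, and it uses the conventional base case of one way to make sum 0 (the empty way); the function names are illustrative only.

```python
def change_tables(values, total):
    """Build ways (the count array A) and used (the array of lists B)."""
    ways = [0] * (total + 1)
    ways[0] = 1  # base case: one way to make 0, using no coins
    used = [[] for _ in range(total + 1)]
    for v in sorted(values):
        for s in range(v, total + 1):
            if ways[s - v] > 0:
                ways[s] += ways[s - v]
                used[s].append(v)  # v can be the last coin added to reach s
    return ways, used


def unwind(used, remaining, cap=None):
    """Yield each way as a non-increasing list of values."""
    if remaining == 0:
        yield []
        return
    for v in used[remaining]:
        # Only follow values no larger than the previous one, so that
        # permutations of the same way are not listed twice.
        if cap is None or v <= cap:
            for rest in unwind(used, remaining - v, v):
                yield [v] + rest


ways, used = change_tables([1, 2, 3], 4)
for way in unwind(used, 4):
    print("-".join(map(str, way)))
```

The tables cost O(Sum * ValuesCount) time and space, while the unwinding generator does work proportional to the output it produces, so the listing itself is what dominates when there are many ways.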

Is there a built-in data structure in Clojure supporting both duplicate elements and O(1) removal?

Recently I needed to implement a special set in Clojure which may have duplicate elements (i.e. a multiset), like
#{1 2 3 4 1 2}
What's more, I also need to remove an arbitrary element equal to a given value in O(1) time. For example, when I type
(my-remove #{1 1 2 2 3 4} 2)
it should return #{1 1 2 3 4} without looping through the whole set (or vector).
My question is: is there a built-in data structure in Clojure satisfying these two properties? If not, is there a proper alternative way to implement this? Thanks!
How about a map from each value to its count? Removing a value would then just mean decrementing its counter (and dropping the key once the count reaches zero).
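To make the counting idea concrete, here is a small sketch in Python rather than Clojure (an assumption on my part, purely for illustration), using `collections.Counter` as the value-to-count map. `remove_one` is a hypothetical helper name. It mutates the map in place, whereas a Clojure version would return an updated persistent map instead.

```python
from collections import Counter


def remove_one(counts, value):
    """Remove one occurrence of value; return False if it was absent.

    Each step is a hash lookup or update, so this is O(1) on average.
    """
    if counts[value] <= 0:  # a Counter reports 0 for missing keys
        return False
    counts[value] -= 1
    if counts[value] == 0:
        del counts[value]  # keep the map free of zero-count entries
    return True


bag = Counter([1, 1, 2, 2, 3, 4])
remove_one(bag, 2)
print(sorted(bag.elements()))  # [1, 1, 2, 3, 4]
```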

Find whether the following tree exists in the list of million of binary search trees

For example,
consider the following trees and check whether they exist in the list of BSTs.
5
/ \
4 6
/ \
1 3
3
/ \
2 4
How should I approach this problem?
Sort the list according to the root (if the roots are the same, then by the left node, and so on). For each query tree, do a binary search.
This works well if the number of queries is comparable to the number of elements in the list. Complexity: O((n + m) log n), where m is the number of queries and n is the number of elements in the list.
If the number of queries is small, brute-force searching is efficient.
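A minimal Python sketch of the sort-then-binary-search idea, assuming each tree is encoded as a nested tuple `(value, left, right)` with `()` for an empty subtree. That encoding is my assumption, chosen because tuples then compare by root first, then left subtree, then right subtree.

```python
from bisect import bisect_left

EMPTY = ()  # an empty subtree


def leaf(value):
    """Build a single-node tree."""
    return (value, EMPTY, EMPTY)


def contains_sorted(sorted_trees, query):
    """Binary search for query in a list sorted with sorted()."""
    i = bisect_left(sorted_trees, query)
    return i < len(sorted_trees) and sorted_trees[i] == query


trees = sorted([
    (5, leaf(4), leaf(6)),
    (3, leaf(2), leaf(4)),
    (3, EMPTY, leaf(4)),
])
print(contains_sorted(trees, (3, leaf(2), leaf(4))))  # True
print(contains_sorted(trees, (3, leaf(2), EMPTY)))    # False
```

Each comparison between two trees can itself take time proportional to their size in the worst case, so the log n factor counts tree comparisons rather than single steps.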
I'll put it up as an answer so people can make variations if they'd like.
A naive approach would be to just scan through the list, comparing node by node, and as soon as you see a difference between the two trees you're comparing, move on to the next tree in the list. That is O(N), where N is the total number of nodes.
The answer given to this question was to put all the trees of the list in a hash table, so that searching for a tree takes expected constant time per lookup (after hashing the query tree).
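The hash-table approach could look roughly like this in Python. The `Node` class and the `tree_key` helper are hypothetical names for illustration; the key is a nested tuple that captures both the values and the shape, so two trees get the same key only if they are identical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_key(node):
    """Turn a tree into a hashable nested tuple (None marks an empty subtree)."""
    if node is None:
        return None
    return (node.value, tree_key(node.left), tree_key(node.right))


def build_index(trees):
    """Hash every tree once, up front."""
    return {tree_key(t) for t in trees}


def contains(index, query):
    """Expected O(size of query) per lookup: build its key, then one probe."""
    return tree_key(query) in index


index = build_index([
    Node(3, Node(2), Node(4)),
    Node(5, Node(4), Node(6)),
])
print(contains(index, Node(3, Node(2), Node(4))))  # True
print(contains(index, Node(3, None, Node(4))))     # False
```

Since `tree_key` is recursive, very deep trees would need an iterative version to stay within Python's recursion limit.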

Sequentially Constructing Full B-Trees

If I have a sorted set of data that I want to store on disk in a way that is optimal for both sequential reads and random lookups, it seems that a B-Tree (or one of its variants) is a good choice, presuming this data set does not all fit in RAM.
The question is: can a full B-Tree be constructed from a sorted set of data without doing any page splits, so that the sorted data can be written to disk sequentially?
Constructing a "B+ tree" to those specifications is simple.
Choose your branching factor k.
Write the sorted data to a file. This is the leaf level.
To construct the next higher level, scan the current level and write out every kth item.
Stop when the current level has k items or fewer.
Example with k = 2:
0 1|2 3|4 5|6 7|8 9
0 2 |4 6 |8
0 4 |8
0 8
Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level, which is 0. Look at the interval in the next lowest level corresponding to 0:
0 4
Now 4:
4 6
Now 4 again:
4 5
Found it. In general, the jth item corresponds to items jk through (j+1)k-1 at the next level. You can also scan the leaf level linearly.
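The construction and lookup above can be sketched in Python like this. It is an in-memory sketch only: each level is a plain list standing in for a file written sequentially, and `build_levels` and `lookup` are illustrative names, not part of any library.

```python
from bisect import bisect_right


def build_levels(sorted_items, k):
    """Return [leaf level, ..., top level], each made of every kth item below."""
    levels = [list(sorted_items)]
    while len(levels[-1]) > k:
        levels.append(levels[-1][::k])  # keep items 0, k, 2k, ...
    return levels


def lookup(levels, k, key):
    """Descend from the top level to the leaves; True if key is stored."""
    top = levels[-1]
    j = bisect_right(top, key) - 1  # last item <= key in the top level
    if j < 0:
        return False  # key is smaller than everything stored
    for level in reversed(levels[:-1]):
        lo = j * k  # item j above covers items jk .. (j+1)k-1 here
        block = level[lo:lo + k]
        j = lo + bisect_right(block, key) - 1
    return levels[0][j] == key


levels = build_levels(range(10), 2)
for level in reversed(levels):
    print(level)
print(lookup(levels, 2, 5))   # True
print(lookup(levels, 2, 10))  # False
```

On disk, each block of k items would correspond to one page, so a lookup touches one page per level.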
We can make a B-tree in one pass, but it may not be the optimal storage method. Depending on how often you make sequential queries vs random access ones, it may be better to store it in sequence and use binary search to service a random access query.
That said: assume that each record in your B-tree holds (m - 1) keys (m > 2; the binary case is a bit different). We want all the leaves on the same level and all the internal nodes to have at least (m - 1) / 2 keys. We know that a full B-tree of height k has (m^k - 1) keys. Assume that we have n keys total to store. Let k be the smallest integer such that m^k - 1 > n. Now if 2 m^(k - 1) - 1 < n, we can completely fill up the inner nodes and distribute the rest of the keys evenly among the leaf nodes, each leaf node getting either the floor or the ceiling of (n + 1 - m^(k - 1))/m^(k - 1) keys. If we cannot do that, then we know that we have enough keys to fill all of the nodes at depth k - 1 at least halfway and store one key in each of the leaves.
Once we have decided the shape of our tree, we need only do an inorder traversal of the tree sequentially dropping keys into position as we go.
By optimal I mean that an in-order traversal of the data will always seek forward through the file (or mmapped region), and a random lookup takes a minimal number of seeks.

Resources