Maximum Collection - algorithm

Mark has a collection of N postage stamps. Each stamp belongs to some type, and types are enumerated as positive integers. More valuable stamps have a higher enumerated type.
On any particular day, E-bay lists several offers, each of which is represented as an unordered pair {A, B}, allowing its users to exchange stamps of type A for an equal number of stamps of type B. Mark can use such an offer to put up any number of stamps of type A on the website and get the same number of stamps of type B in return, or vice versa. Assume that any number of stamps Mark wants is always available on the site's exchange market. Each offer is open during only one day: Mark can't use it after this day, but he can use it several times during that day. If several offers are active during a given day, Mark can use them in any order.
Find the maximum possible value of his collection after going through (accepting or declining) all the offers. The value of Mark's collection is equal to the sum of the type numbers of all stamps in the collection.
How does dynamic programming lead to a solution for this problem? (Mark knows what offers will come in the future.)

I would maintain a table that gives, for each type, the maximum value that you can get for a member of that type using only the last N swaps.
To compute this for N=0, just put down the value of each type without swaps.
To compute this for N=i+1, look at the i-th swap and the table for N=i. The i-th swap refers to two entries in that table, which probably have different values. Because you can use the i-th swap, you can update the table by setting the lower of the two values equal to the higher one.
When you have a table taking into account all the swaps, you can sum up the values for the types that Mark is starting with to get the answer.
Example tables for the swaps {4, 5}, {5, 3}, {3, 1}, {1, 20}:
1 2 3 4 5 .. 20
20 2 3 4 5 .. 20
20 2 20 4 5 .. 20
20 2 20 4 20 .. 20
20 2 20 20 20 .. 20
Example for swaps {1, 5} and then {1, 20}
1 2 3 4 5 .. 20
20 2 3 4 5 .. 20
20 2 3 4 20 .. 20
Note that i=1 means taking account of the last swap possible, so we are working backwards as far as swaps are concerned. The final table reflects the fact that 5 can be swapped for 1 before 1 is swapped for 20. You can work out a schedule of which swaps to do when by looking at what swap is available at time i and which table entries change at that time.
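As a rough illustration (my own sketch, not part of the original answer), here is a minimal Java version of this backwards-table idea. It assumes types are small integers and treats the offers as a strict chronological sequence, as in the examples above; the class name, variable names and the starting collection are just illustrative:

public class MaxCollection {
    // best[t] = highest type reachable from type t using the remaining (later) offers.
    static long collectionValue(int maxType, int[][] offers, int[] collection) {
        int[] best = new int[maxType + 1];
        for (int t = 1; t <= maxType; t++) best[t] = t;          // N = 0: no swaps used

        // Work backwards: when we reach offer i, the table already reflects
        // everything the later offers can achieve, so the lower entry of the
        // pair is simply raised to the higher one.
        for (int i = offers.length - 1; i >= 0; i--) {
            int a = offers[i][0], b = offers[i][1];
            int better = Math.max(best[a], best[b]);
            best[a] = better;
            best[b] = better;
        }

        long total = 0;
        for (int stamp : collection) total += best[stamp];       // sum over Mark's stamps
        return total;
    }

    public static void main(String[] args) {
        // The swaps from the first example table: {4,5}, {5,3}, {3,1}, {1,20}.
        int[][] offers = {{4, 5}, {5, 3}, {3, 1}, {1, 20}};
        int[] collection = {4, 5};                               // hypothetical starting stamps
        System.out.println(collectionValue(20, offers, collection));   // prints 40
    }
}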

Dynamic programming means breaking a problem down into smaller subproblems. Your problem is well defined as a value-ordered collection of stamps of different types, so Value(T1) < Value(T2) < ... < Value(Tn).
Finding the maximum value of the collection will be determined by the opportunities to swap pairs of types. Of course, we only want to swap pairs when it will increase the total value of the collection.
Therefore, we define a simple swap operation: we swap whenever the collection contains stamps of the lower-valued type in the swap opportunity.
If sufficient opportunities of differing types are offered, then the collection could ultimately contain all stamps at the highest value.
My suggestion is to create a collection data structure, a simple conditioned swap function and perhaps an event queue which responds to swap events.
Dynamic Table
Take a look at this diagram, which shows how I would set up my data. The key is to start from the last row and work backwards computing the best deals, then move forward and take the best deal going forward.

Related

SHEETS: How do you sum children recursively into a parent row?

I have a simple google sheet where each row represents a node in a tree that holds a reference to its parent and some descriptor values about it. I would like to have a column that sums the child nodes beneath the current one.
e.g:
Node ID, Parent Node ID, Minimum Value, Self Value, Total Value
1, 0, 30, 10, 90
2, 1, 10, 20, 40
3, 1, 10, 20, 40
4, 2, 1, 10, 10
5, 3, 1, 10, 10
6, 3, 1, 10, 10
7, 2, 1, 10, 10
Where Self Value is statically defined, and Total Value represents Self Value + SUM(CHILDREN.Total Value). Do I need to re-organize the sheet to accomplish this, or am I missing the proper way to recursively sum up the child rows?
Introduction
tl;dr: this method works but is impractical for large, complex datasets.
This seems rather complicated for what Sheets or similar software is designed for; I think it would probably be much easier to solve in Apps Script (where I have no experience) or any other scripting language outside of Sheets.
Meanwhile, I have come up with a solution that works using only formulas in Sheets. It has some limitations, however: the two formulas have to be manually extended (details below), and it would be cumbersome to use for a dataset of very large depth (by depth I mean the maximum number of generations of child nodes).
I have reorganized the columns in your example dataset to make this easier to understand and added two more rows to test it better. I have removed the Minimum Value column since, per your question, it is not relevant to the expected results.
SelfValue, NodeID, Parent
10, 1, 0
20, 2, 1
20, 3, 1
10, 4, 2
10, 5, 3
10, 6, 3
10, 7, 2
5, 8, 7
5, 9, 8
Solution and explanation
My main idea was that it is relatively easy to calculate Total Value of a given node if we know its children in all generations (not just its immediate children, but also "grandchildren" and so on) and their Self Value.
In particular, to know Total Value of a node, we do not need to have explicitly calculated the Total Value of its immediate children.
I have not found a simple way to enumerate children from all generations for a given node. I have approached it by finding the parents of the parents, and so on, for all nodes instead. To do this, I have entered the following formula in D2 and then manually extended this formula across the next columns up to column H (the first column to show only empty values):
=ARRAYFORMULA(IFERROR(VLOOKUP(C2:C,$B$2:$C,2,false)))
I attempted to make it automatically fill multiple columns without manual extension, but this gave me a circular dependency error.
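For the example data above, the extended columns would then roughly contain each node's chain of ancestors (column C is the parent, column D the grandparent, and so on), going blank once the lookup reaches the non-existent node 0:
NodeID, C, D, E, F, G
1, 0
2, 1, 0
3, 1, 0
4, 2, 1, 0
5, 3, 1, 0
6, 3, 1, 0
7, 2, 1, 0
8, 7, 2, 1, 0
9, 8, 7, 2, 1, 0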
The next and final step is to calculate the Total Value of all nodes, now that we have a way to identify all of their children (in all generations). I entered the following formula in cell I2 and then manually extended it down across all rows:
=IFERROR(SUM(FILTER(A$2:A,BYROW(B$2:H,LAMBDA(row,NOT(ISERROR(MATCH(B2,row,0))))))))
This calculates the Total Value by adding the Self Value of all nodes for which the given node is a parent (in any generation), together with the Self Value of the given node itself.
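For instance, with the example data above, node 2 appears in the ancestor chains of nodes 4, 7, 8 and 9, so its Total Value is 20 + 10 + 10 + 5 + 5 = 50, while the root node 1 collects everything: 10 + 20 + 20 + 10 + 10 + 10 + 10 + 5 + 5 = 100.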
The range B$2:H has to be adapted, if the dataset is deeper and there are more columns filled with the first formula.
Here is the final result; I have colorized the cells where the two formulas are entered (green, yellow) and extended (light green, light yellow):
It seems it would be more efficient (fewer calculations in the background, a more responsive sheet) to use QUERY, but then all the columns C-H would need to be explicitly listed, like select sum(A) where B="&B2&" or C="&B2&" or ..., so constructing this formula and adapting it to a variable number of columns from the previous step becomes a problem in itself.
I attempted to make the formula automatically fill all rows (instead of manually extending it) by experimenting with ARRAYFORMULA or MAP(LAMBDA), but it either didn't work or exceeded the calculation limit.
Anyway, it would be interesting to see whether there is a simpler solution using only formulas. It surely could also be done more efficiently and elegantly in Apps Script.

Modified maximum cost after replacement

The initial cost of the items is provided by an array denoting the cost of each item.
These costs can be altered: we are allowed to change a cost according to the offer applicable on a given day (more clarification through the example below).
For example, the costs of the items are 1, 3, 5,
and a triple D, A, B represents that on day D an item having cost A can be changed to cost B, or vice versa.
Now, the following lines represent D, A, B:
1 4 3
1 1 3
2 6 22
3 5 8
On a given day, any offer can be applied any number of times to any number of items.
All details of the offers are provided in advance, so you can accept or decline each one depending on whether it leads to the maximum cost.
We have to obtain the maximum total cost of the items: the item having cost 1 is changed to 3, then the items having cost 3 are changed to 4, and the item having cost 5 is changed to 8 on day 3. The day-2 offer can't be applied, as no item has cost 6 or 22, and these costs can't be reached anyhow.
Hence, the final costs of the items are 4, 4, 8, so the maximum cost is 16.
How should one approach the solution when the data is large?
How to find maximum value in the first place?
Observations:
items are independent
One might just work out, for any single item, its possible sequences of values, possibly ignoring lots of offers
offers are partially ordered by day, and depend on (current) item value, not type
One might put together the possible sequences of all "end-of-day" values resulting from replacements, each of which may be used by many items (see the sketch below)
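As a rough sketch of these observations (my own illustration, not from the original posts), one could compute, for every value, the best value reachable by processing the offers from the last day backwards, iterating within a day because same-day offers can be chained in any order; class and variable names are hypothetical:

import java.util.*;

public class MaxCost {
    // Each offer is {day, a, b}.
    static long maxTotalCost(int[] items, int[][] offers) {
        // Group offers by day, keyed in ascending day order.
        TreeMap<Integer, List<int[]>> byDay = new TreeMap<>();
        for (int[] o : offers) byDay.computeIfAbsent(o[0], d -> new ArrayList<>()).add(o);

        // best.get(v) = best value reachable starting from v (default: v itself).
        Map<Integer, Integer> best = new HashMap<>();

        // Work backwards over the days: when we reach day d we already know what
        // any value can become using the later days.
        for (Integer day : byDay.descendingKeySet()) {
            List<int[]> todays = byDay.get(day);
            boolean changed = true;
            while (changed) {                       // same-day offers may chain in any order
                changed = false;
                for (int[] o : todays) {
                    int a = o[1], b = o[2];
                    int bestA = best.getOrDefault(a, a);
                    int bestB = best.getOrDefault(b, b);
                    int better = Math.max(bestA, bestB);
                    if (better > bestA) { best.put(a, better); changed = true; }
                    if (better > bestB) { best.put(b, better); changed = true; }
                }
            }
        }

        long total = 0;
        for (int c : items) total += best.getOrDefault(c, c);   // items are independent
        return total;
    }

    public static void main(String[] args) {
        int[] items = {1, 3, 5};
        int[][] offers = {{1, 4, 3}, {1, 1, 3}, {2, 6, 22}, {3, 5, 8}};
        System.out.println(maxTotalCost(items, offers));        // prints 16, as in the example
    }
}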
(Anyone is welcome to improve this post. Each OP (MetaD, for one) is welcome to answer her/his own question.)

How to display all ways to give change

As far as I know, counting every way to give change for a given sum and a given till configuration is a classic dynamic programming problem.
I was wondering if there was a way to also display (or store) the actual change structures that could possibly amount to the given sum while preserving the DP complexity.
I have never seen this issue discussed, and I would like some pointers or a brief explanation of how this can be done or why it cannot be done.
DP for the change problem has time complexity O(Sum * ValuesCount) and storage complexity O(Sum).
You can prepare extra data for this problem in the same time as the DP for change, but you need more storage, O(Sum * ValuesCount), and a lot of time to output all the variants, O(ChangeWaysCount).
To prepare the data for way recovery, make a second array B of arrays (or lists). Whenever you increment an element of the count array A from some previous element, add the value used to the corresponding element of B. At the end, unwind all the ways from the last element.
Example: values 1,2,3, sum 4
index   0   1     2        3           4
A       0   1     2        3           4
B       -   [1]   [1 2]    [1 2 3]     [1 2 3]
We start unwinding from B[4] elements:
1-1-1-1 (B[4]-B[3]-B[2]-B[1])
2-1-1 (B[4]-B[2]-B[1])
2-2 (B[4]-B[2])
3-1 (B[4]-B[1])
Note that I have used only ways with non-increasing values, to avoid permutation variants (e.g. counting both 1-3 and 3-1).
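A minimal Java sketch of this idea (my own illustration, not the answerer's code), assuming the coin values are processed in ascending order so that only non-increasing ways are produced:

import java.util.*;

public class ChangeWays {
    public static void main(String[] args) {
        int[] values = {1, 2, 3};                  // coin values, assumed sorted ascending
        int sum = 4;

        long[] a = new long[sum + 1];              // a[s] = number of ways to make s
        List<List<Integer>> b = new ArrayList<>(); // b[s] = values that can end a way for s
        for (int s = 0; s <= sum; s++) b.add(new ArrayList<>());
        a[0] = 1;

        for (int v : values) {                     // outer loop over values avoids permutations
            for (int s = v; s <= sum; s++) {
                if (a[s - v] > 0) {                // we increment a[s] from a[s - v] ...
                    a[s] += a[s - v];
                    b.get(s).add(v);               // ... so remember v as a possible last coin
                }
            }
        }

        unwind(sum, Integer.MAX_VALUE, new ArrayDeque<>(), b);
    }

    // Enumerate the ways with non-increasing coin values to avoid permutation variants.
    static void unwind(int s, int maxCoin, Deque<Integer> path, List<List<Integer>> b) {
        if (s == 0) {
            System.out.println(path);              // prints [1, 1, 1, 1], [2, 1, 1], [2, 2], [3, 1]
            return;
        }
        for (int v : b.get(s)) {
            if (v <= maxCoin) {
                path.addLast(v);
                unwind(s - v, v, path, b);
                path.removeLast();
            }
        }
    }
}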

Subset calculation of list of integers

I'm currently implementing an algorithm where one particular step requires me to calculate subsets in the following way.
Imagine I have sets (possibly millions of them) of integers, where each set could potentially contain around 1000 elements:
Set1: [1, 3, 7]
Set2: [1, 5, 8, 10]
Set3: [1, 3, 11, 14, 15]
...,
Set1000000: [1, 7, 10, 19]
Imagine a particular input set:
InputSet: [1, 7]
I now want to quickly determine of which of these sets the InputSet is a subset. In this particular case, it should return Set1 and Set1000000.
Now, brute-forcing it takes too much time. I could also parallelise via Map/Reduce, but I'm looking for a more intelligent solution. Also, to a certain extent, it should be memory-efficient. I already optimised the calculation by making use of BloomFilters to quickly eliminate sets of which the input set could never be a subset.
Any smart technique I'm missing out on?
Thanks!
Well, it seems that the bottleneck is the number of sets, so instead of finding a set by iterating over all of them, you could improve performance by mapping from elements to all the sets containing them, and then returning the sets that contain all the elements you searched for.
This is very similar to what is done for an AND query when searching an inverted index in the field of information retrieval.
In your example, you will have:
1 -> [set1, set2, set3, ..., set1000000]
3 -> [set1, set3]
5 -> [set2]
7 -> [set1, set1000000]
8 -> [set2]
...
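For illustration only (my own sketch, with hypothetical set names), building such an inverted index and answering the AND query in Java could look roughly like this:

import java.util.*;

public class InvertedIndex {
    public static void main(String[] args) {
        // A few of the sets from the question, keyed by name.
        Map<String, int[]> sets = new LinkedHashMap<>();
        sets.put("Set1",       new int[]{1, 3, 7});
        sets.put("Set2",       new int[]{1, 5, 8, 10});
        sets.put("Set3",       new int[]{1, 3, 11, 14, 15});
        sets.put("Set1000000", new int[]{1, 7, 10, 19});

        // Build the inverted index: element -> names of the sets containing it.
        Map<Integer, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, int[]> e : sets.entrySet())
            for (int x : e.getValue())
                index.computeIfAbsent(x, k -> new HashSet<>()).add(e.getKey());

        // AND query: intersect the posting lists of every element of the input set.
        int[] input = {1, 7};
        Set<String> result = new HashSet<>(index.getOrDefault(input[0], Set.of()));
        for (int i = 1; i < input.length; i++)
            result.retainAll(index.getOrDefault(input[i], Set.of()));

        System.out.println(result);   // contains Set1 and Set1000000
    }
}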
EDIT:
In an inverted index in IR, to save space we sometimes use d-gaps, meaning we store the offset between documents rather than the actual number. For example, [2,5,10] becomes [2,3,5]. Doing so and using delta encoding to represent the numbers tends to help a lot when it comes to space.
(Of course there is also a downside: you need to read the entire list in order to find whether a specific set/document is in it, and you cannot use binary search, but it is sometimes worth it, especially if it is the difference between fitting the index into RAM or not.)
How about storing a list of the sets which contain each number?
1 -- 1, 2, 3, 1000000
3 -- 1, 3
5 -- 2
etc.
Extending amit's solution, instead of storing the actual numbers, you could just store intervals and their associated sets.
For example, using an interval size of 5:
(1-5): [1, 2, 3, 1000000]
(6-10): [1, 2, 1000000]
(11-15): [3]
(16-20): [1000000]
In the case of (1,7) you should consider intervals (1-5) and (6-10) (which can be determined simply by knowing the size of the interval). Intersecting those lists gives you [1, 2, 1000000]. A binary search within those sets then shows that (1,7) exists only in Set1 and Set1000000.
Though you'll want to check the min and max values for each set to get a better idea of what the interval size should be. For example, 5 is probably a bad choice if the min and max values go from 1 to a million.
You should probably keep it so that a binary search can be used to check for values, so the interval size should be something like (min + max)/N, where 2N is the maximum number of values that will need to be binary searched in each set. For example, "does Set3 contain any values from 5 to 10?" is answered by finding the closest values to 5 (which is 3) and to 10 (which is 11); in this case, no, it does not. You would have to go through each set and do binary searches for the interval values that could be within the set. This means ensuring that you don't go searching for 100 when the set only goes up to 10.
You could also just store the range (min and max) of each set. However, the issue is that I suspect your numbers are going to be clustered, so that would not provide much benefit, although, as mentioned, it will probably be useful for determining how to set up the intervals.
It will still be troublesome to pick what range to use: too large and it will take a long time to build the data structure (1000 * million * log(N)); too small and you will start to run into space issues. The ideal size of the range is probably such that the number of sets related to each range is approximately equal, while also ensuring that the total number of ranges isn't too high.
Edit:
One benefit is that you don't actually need to store all intervals, just the ones you need. Although, if you have too many unused intervals, it might be wise to increase the interval size and split the current intervals to ensure that the search stays fast. This is especially true if processing time isn't a major issue.
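A rough Java sketch of this interval idea (my own illustration, using the example sets, hypothetical names, and an interval size of 5; the sets are assumed to be stored sorted so binary search works):

import java.util.*;

public class IntervalIndex {
    static final int INTERVAL = 5;

    static int intervalOf(int x) { return (x - 1) / INTERVAL; }   // (1-5) -> 0, (6-10) -> 1, ...

    public static void main(String[] args) {
        Map<String, int[]> sets = new LinkedHashMap<>();
        sets.put("Set1",       new int[]{1, 3, 7});
        sets.put("Set2",       new int[]{1, 5, 8, 10});
        sets.put("Set3",       new int[]{1, 3, 11, 14, 15});
        sets.put("Set1000000", new int[]{1, 7, 10, 19});

        // interval index -> names of sets with at least one element in that interval
        Map<Integer, Set<String>> byInterval = new HashMap<>();
        for (Map.Entry<String, int[]> e : sets.entrySet())
            for (int x : e.getValue())
                byInterval.computeIfAbsent(intervalOf(x), k -> new HashSet<>()).add(e.getKey());

        int[] input = {1, 7};

        // Step 1: candidates = sets whose intervals cover every input element.
        Set<String> candidates = new HashSet<>(byInterval.getOrDefault(intervalOf(input[0]), Set.of()));
        for (int i = 1; i < input.length; i++)
            candidates.retainAll(byInterval.getOrDefault(intervalOf(input[i]), Set.of()));

        // Step 2: confirm each candidate with a binary search per input element.
        for (String name : candidates) {
            int[] s = sets.get(name);
            boolean all = true;
            for (int x : input)
                if (Arrays.binarySearch(s, x) < 0) { all = false; break; }
            if (all) System.out.println(name + " contains the input set");   // Set1, Set1000000
        }
    }
}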
Start searching from the biggest number (7) of the input set and
eliminate the sets that don't contain it (Set1 and Set1000000 will be returned).
Then search for the other input elements (1) in the remaining sets.

Can I do better than binary search here?

I want to pick the top "range" of cards based upon a percentage. I have all my possible 2 card hands organized in an array in order of the strength of the hand, like so:
AA, KK, AKsuited, QQ, AKoff-suit ...
I had been picking the top 10% of hands by multiplying the length of the card array by the percentage, which would give me the index of the last card in the range. Then I would just make a copy of the sub-array:
Arrays.copyOfRange(cardArray, 0, 16);
However, I realize now that this is incorrect because there are more possible combinations of, say, Ace King off-suit - 12 combinations (i.e. an ace of one suit and a king of another suit) than there are combinations of, say, a pair of aces - 6 combinations.
When I pick the top 10% of hands therefore I want it to be based on the top 10% of hands in proportion to the total number of 2 cards combinations - 52 choose 2 = 1326.
I thought I could have an array of integers where each index held the combined total of all the combinations up to that point (each index would correspond to a hand from the original array). So the first few indices of the array would be:
6, 12, 16, 22
because there are 6 combinations of AA, 6 combinations of KK, 4 combinations of AKsuited, 6 combinations of QQ.
Then I could do a binary search, which runs in O(log n) time. In other words, I could multiply the total number of combinations (1326) by the percentage, search for the last cumulative value lower than or equal to this number, and that would give me the index into the original array that I need.
I wonder if there a way that I could do this in constant time instead?
As Groo suggested, if precomputation and memory overhead permits, it would be more efficient to create 6 copies of AA, 6 copies of KK, etc and store them into a sorted array. Then you could run your original algorithm on this properly weighted list.
This is best if the number of queries is large.
Otherwise, I don't think you can achieve constant time for each query. This is because the queries depend on the entire frequency distribution: you can't look at only a constant number of elements and determine whether you have the correct percentile.
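For illustration (my own sketch, showing only the first few hands named in the question; with the full 169 hand types the weighted list would have all 1326 entries), the weighted-array idea could look roughly like this in Java:

import java.util.*;

public class TopRange {
    public static void main(String[] args) {
        // Hands in strength order with their combination counts (from the question).
        String[] hands  = {"AA", "KK", "AKs", "QQ", "AKo" /* ... remaining hand types ... */};
        int[]    combos = {6,    6,    4,     6,    12    /* ... */};

        // Expand each hand according to its weight, so a plain index lookup
        // is already weighted by combination count.
        List<String> weighted = new ArrayList<>();
        for (int i = 0; i < hands.length; i++)
            for (int c = 0; c < combos[i]; c++) weighted.add(hands[i]);

        double percentage = 0.10;                                  // top 10%
        int cutoff = (int) Math.round(weighted.size() * percentage);

        // O(1) per query after the one-time setup; collapse back to distinct hands for display.
        Set<String> topRange = new LinkedHashSet<>(weighted.subList(0, cutoff));
        System.out.println(topRange);
    }
}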
I had a similar discussion here: Algorithm for picking thumbed-up items (basically what you want to do with your list of cards). As a comment to my answer, someone suggested a particular data structure: http://en.wikipedia.org/wiki/Fenwick_tree
Also, make sure your data structure will be able to provide efficient access to, say, the range between top 5% and 15% (not a coding-related tip though ;).
