An algorithm to calculate foreign residence without enumerating days?

The visa conditions for a country to which I travel frequently include this restriction:
"You may reside in [country] for a maximum of 90 days in any period of 180"
Given a tentative list of pairs of dates (entry and exit dates), is there an algorithm that can tell me, for each visit, whether I will be in or out of compliance, and by how many days?
Clearly, one way to do this would be to build a large array of individual days and then slide a 180-day window along it, counting residence days. But I'm wondering whether there is a more elegant method which doesn't involve building a great long list of days.

The usual algorithm for this is basically a greedy algorithm, though it could also be seen as a 1D dynamic programming algorithm. Rather than sliding the window one day at a time, you slide it one starting date at a time. Like so:
last_interval = 0
for first_interval = 0 to N-1:
    # include more intervals as long as they (partially) fit within 180 days after the first one
    while last_interval < N and A[last_interval].start - A[first_interval].start < 180:
        last_interval += 1
    # calculate the total number of days in these intervals, possibly clipping the last one
The need to clip the last interval makes it a bit less elegant than it could otherwise be: in similar algorithms, rather than summing the total each time, you add to it for added-on intervals (when incrementing last_interval) and subtract from it for left-behind intervals (when incrementing first_interval). You could do something kind of similar here with the second-to-last interval, but unless you're in a serious performance bind it's probably best not to.
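For reference, here is a minimal runnable C++ sketch of that approach (the Visit struct and names are mine, not from the answer), assuming the visits have already been converted to integer day numbers (the days() function in the next answer would do), don't overlap, and are sorted by entry date. For each visit it counts the residence days inside the 180-day window that starts on that visit's entry date, clipping the last visit when it sticks out of the window.

#include <iostream>
#include <vector>

// One visit, as inclusive integer day numbers.
struct Visit { int entry, exit; };

// For each visit i, count residence days inside the 180-day window that starts
// on that visit's entry date, clipping any visit that sticks out of the window.
std::vector<int> daysPerWindow(const std::vector<Visit>& v) {
    std::vector<int> result(v.size());
    std::size_t last = 0;
    int total = 0;                                   // full lengths of visits in [first, last)
    for (std::size_t first = 0; first < v.size(); ++first) {
        int windowEnd = v[first].entry + 180;        // exclusive end of the window
        while (last < v.size() && v[last].entry < windowEnd) {
            total += v[last].exit - v[last].entry + 1;
            ++last;
        }
        int clipped = total;
        if (v[last - 1].exit >= windowEnd)           // clip the last visit if needed
            clipped -= v[last - 1].exit - (windowEnd - 1);
        result[first] = clipped;
        total -= v[first].exit - v[first].entry + 1; // drop the leading visit for the next window
    }
    return result;
}

int main() {
    // Two visits: days 1-100 and days 150-244.
    std::vector<Visit> visits = {{1, 100}, {150, 244}};
    std::vector<int> perWindow = daysPerWindow(visits);
    for (std::size_t i = 0; i < visits.size(); ++i)
        std::cout << "window starting at visit " << i << ": " << perWindow[i] << " days\n";
    // Prints 131 and 95 - both over the 90-day limit.
    return 0;
}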

The following C++ code calculates the duration between two arbitrary dates no earlier than Jan 1, 1 A.D. in O(1) time. Is this what you're looking for?
#include <iostream>
using namespace std;
int days(int y, int m, int d) {
    int n[13] = {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    y += (m - 1) / 12; m = (m - 1) % 12 + 1;   // normalize month overflow...
    if (m < 1) { y--; m += 12; }               // ...and underflow
    int s = (y - 1) * 365 + (y - 1) / 4 - (y - 1) / 100 + (y - 1) / 400;  // days in complete years
    if ((y % 4 == 0 && y % 100) || y % 400 == 0) n[2]++;                  // leap year adjustment
    for (int i = 1; i < m; i++) s += n[i];     // days in complete months of this year
    return s + d;
}
int main() {
    cout << days(2017, 8, 14) - days(2005, 2, 28) << endl;
    return 0;
}
You can use the days() function to map all entry and exit dates to integers and then apply Sneftel's algorithm above.

Related

Memory-constrained coin changing for numbers up to one billion

I faced this problem in a training session. We are given N different values (N ≤ 100). Let's name this array A[N]; for this array A we are sure that 1 is in the array and A[i] ≤ 10^9. Secondly, we are given a number S where S ≤ 10^9.
Now we have to solve the classic coin problem with these values: we need to find the minimum number of elements that sum to exactly S. Every element from A can be used an infinite number of times.
Time limit: 1 sec
Memory limit: 256 MB
Example:
S = 1000, N = 10
A[] = {1,12,123,4,5,678,7,8,9,10}. The result is 10.
1000 = 678 + 123 + 123 + 12 + 12 + 12 + 12 + 12 + 12 + 4
What I have tried
I tried to solve this with the classic dynamic-programming coin-change technique, but it uses too much memory and gives a memory-limit-exceeded error.
I can't figure out what we should keep track of for these values. Thanks in advance.
Here are a couple of test cases that cannot be solved with the classic DP coin approach.
S = 1000000000 N = 100
1 373241370 973754081 826685384 491500595 765099032 823328348 462385937
251930295 819055757 641895809 106173894 898709067 513260292 548326059
741996520 959257789 328409680 411542100 329874568 352458265 609729300
389721366 313699758 383922849 104342783 224127933 99215674 37629322
230018005 33875545 767937253 763298440 781853694 420819727 794366283
178777428 881069368 595934934 321543015 27436140 280556657 851680043
318369090 364177373 431592761 487380596 428235724 134037293 372264778
267891476 218390453 550035096 220099490 71718497 860530411 175542466
548997466 884701071 774620807 118472853 432325205 795739616 266609698
242622150 433332316 150791955 691702017 803277687 323953978 521256141
174108096 412366100 813501388 642963957 415051728 740653706 68239387
982329783 619220557 861659596 303476058 85512863 72420422 645130771
228736228 367259743 400311288 105258339 628254036 495010223 40223395
110232856 856929227 25543992 957121494 359385967 533951841 449476607
134830774
OUTPUT FOR THIS TEST CASE: 5
S = 999865497 N = 7
1 267062069 637323855 219276511 404376890 528753603 199747292
OUTPUT FOR THIS TEST CASE: 1129042
S = 1000000000 N = 40
1 12 123 4 5 678 7 8 9 10 400 25 23 1000 67 98 33 46 79 896 11 112 1223 412
532 6781 17 18 19 170 1400 925 723 11000 607 983 313 486 739 896
OUTPUT FOR THIS TEST CASE: 90910
(NOTE: Updated and edited for clarity. Complexity Analysis added at the end.)
OK, here is my solution, including my fixes to the performance issues found by @PeterdeRivaz. I have tested this against all of the test cases provided in the question and the comments and it finishes all of them in under a second (well, 1.5s in one case), using primarily only the memory for the partial results cache (I'd guess about 16MB).
Rather than using the traditional DP solution (which is both too slow and requires too much memory), I use a Depth-First, Greedy-First combinatorial search with pruning using current best results. I was surprised (very) that this works as well as it does, but I still suspect that you could construct test sets that would take a worst-case exponential amount of time.
First there is a master function that is the only thing that calling code needs to call. It handles all of the setup and initialization and calls everything else. (all code is C#)
// Find the min# of coins for a specified sum
int CountChange(int targetSum, int[] coins)
{
// init the cache for (partial) memoization
PrevResultCache = new PartialResult[1048576];
// make sure the coins are sorted lowest to highest
Array.Sort(coins);
int curBest = targetSum;
int result = CountChange_r(targetSum, coins, coins.GetLength(0)-1, 0, ref curBest);
return result;
}
Because of the problem test-cases raised by @PeterdeRivaz I have also added a partial results cache to handle when there are large numbers in N[] that are close together.
Here is the code for the cache:
// implement a very simple cache for previous results of remainder counts
struct PartialResult
{
public int PartialSum;
public int CoinVal;
public int RemainingCount;
}
PartialResult[] PrevResultCache;
// checks the partial count cache for already calculated results
int PrevAddlCount(int currSum, int currCoinVal)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to get only the first 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// use it, as long as it's actually the same partial sum
// and the coin value is at least as large as the current coin
if ((prev.PartialSum == currSum) && (prev.CoinVal >= currCoinVal))
{
return prev.RemainingCount;
}
// otherwise flag as empty
return 0;
}
// add or overwrite a new value to the cache
void AddPartialCount(int currSum, int currCoinVal, int remainingCount)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to get only the first 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// only add if the Sum is different or the result is better
if ((prev.PartialSum != currSum)
|| (prev.CoinVal <= currCoinVal)
|| (prev.RemainingCount == 0)
|| (prev.RemainingCount >= remainingCount)
)
{
prev.PartialSum = currSum;
prev.CoinVal = currCoinVal;
prev.RemainingCount = remainingCount;
PrevResultCache[cacheAddr] = prev;
}
}
And here is the code for the recursive function that does the actual counting:
/*
* Find the minimum number of coins required totaling to a specific sum
* using a list of coin denominations passed.
*
* Memory Requirements: O(N) where N is the number of coin denominations
* (primarily for the stack)
*
* CPU requirements: O(Sqrt(S)*N) where S is the target Sum
* (Average, estimated. This is very hard to figure out.)
*/
int CountChange_r(int targetSum, int[] coins, int coinIdx, int curCount, ref int curBest)
{
int coinVal = coins[coinIdx];
int newCount = 0;
// check to see if we are at the end of the search tree (curIdx=0, coinVal=1)
// or we have reached the targetSum
if ((coinVal == 1) || (targetSum == 0))
{
// just use math get the final total for this path/combination
newCount = curCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// prune this whole branch, if it cannot possibly improve the curBest
int bestPossible = curCount + (targetSum / coinVal);
if (bestPossible >= curBest)
return bestPossible; //NOTE: this is a false answer, but it shouldn't matter
// because we should never use it.
// check the cache to see if a remainder-count for this partial sum
// already exists (and used coins at least as large as ours)
int prevRemCount = PrevAddlCount(targetSum, coinVal);
if (prevRemCount > 0)
{
// it exists, so use it
newCount = prevRemCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// always try the largest remaining coin first, starting with the
// maximum possible number of that coin (greedy-first searching)
newCount = curCount + targetSum;
for (int cnt = targetSum / coinVal; cnt >= 0; cnt--)
{
int tmpCount = CountChange_r(targetSum - (cnt * coinVal), coins, coinIdx - 1, curCount + cnt, ref curBest);
if (tmpCount < newCount) newCount = tmpCount;
}
// Add our new partial result to the cache
AddPartialCount(targetSum, coinVal, newCount - curCount);
return newCount;
}
Analysis:
Memory: Memory usage is pretty easy to determine for this algorithm. Basically there's only the partial-results cache and the stack. The cache is fixed at approx. 1 million entries times the size of each entry (3*4 bytes), so about 12MB. The stack is limited to O(N), so together, memory is clearly not a problem.
CPU: The run-time complexity of this algorithm starts out hard to determine and then gets harder, so please excuse me because there's a lot of hand-waving here. I tried to search for an analysis of just the brute-force problem (combinatorial search of sums of N*kn base values summing to S) but not much turned up. What little there was tended to say it was O(N^S), which is clearly too high. I think that a fairer estimate is O(N^(S/N)) or possibly O(N^(S/AVG(N)) or even O(N^(S/(Gmean(N))) where Gmean(N) is the geometric mean of the elements of N[]. This solution starts out with the brute-force combinatorial search and then improves it with two significant optimizations.
The first is the pruning of branches based on estimates of the best possible result for that branch versus the best result already found. If the best-case estimators were perfectly accurate and the work for branches was perfectly distributed, this would mean that if we find a result that is better than 90% of the other possible cases, then pruning would effectively eliminate 90% of the work from that point on. To make a long story short, the amount of work still remaining after pruning should shrink harmonically as the search progresses. Assuming that some kind of summing/integration should be applied to get a work total, this appears to me to work out to a logarithm of the original work. So let's call it O(Log(N^(S/N))), or O(N*Log(S/N)), which is pretty darn good. (Though O(N*Log(S/Gmean(N))) is probably more accurate.)
However, there are two obvious holes in this. First, the best-case estimators are not perfectly accurate and thus will not prune as effectively as assumed above, but this is somewhat counter-balanced by the greedy-first ordering of the branches, which gives the best chance of finding better solutions early in the search, which in turn increases the effectiveness of pruning.
The second problem is that the best-case estimator works better when the different values of N are far apart. Specifically, if |(S/n2 - S/n1)| > 1 for any 2 values in N, then it becomes almost perfectly effective. For values of N less than SQRT(S), even two adjacent values (k, k+1) are far enough apart that this rule applies. However, for increasing values above SQRT(S), a window opens up in which any number of N-values within that window will not be able to effectively prune each other. The size of this window is approximately K/SQRT(S). So if S=10^9 and K is around 10^6, this window will be almost 30 numbers wide. This means that N[] could contain 1 plus every number from 1000001 to 1000029 and the pruning optimization would provide almost no benefit.
To address this, I added the partial-results cache, which allows memoization of the most recent partial sums up to the target S. This takes advantage of the fact that when the N-values are close together, they will tend to have an extremely high number of duplicates in their sums. As best as I can figure, this effectiveness is approximately N times the J-th root of the problem size, where J = S/K and K is some measure of the average size of the N-values (Gmean(N) is probably the best estimate). If we apply this to the brute-force combinatorial search, assuming that pruning is ineffective, we get O((N^(S/Gmean(N)))^(1/Gmean(N))), which I think is also O(N^(S/(Gmean(N)^2))).
So, at this point take your pick. I know this is really sketchy, and even if it is correct, it is still very sensitive to the distribution of the N-values, so lots of variance.
[I've replaced the previous idea about bit operations because it seems to be too time consuming]
A bit of a crazy idea, and incomplete, but it may work.
Let's start by introducing f(n,s), which returns the number of combinations in which s can be composed from n coins.
Now, how is f(n+1,s) related to f(n,s)?
One possible way to calculate it is:
f(n+1,s)=sum[coin:coins]f(n,s-coin)
For example, if we have coins 1 and 3,
f(0,)=[1,0,0,0,0,0,0,0] - with zero coins we can have only zero sum
f(1,)=[0,1,0,1,0,0,0,0] - what we can have with one coin
f(2,)=[0,0,1,0,2,0,1,0] - what we can have with two coins
We can rewrite it a bit differently:
f(n+1,s)=sum[i=0..max]f(n,s-i)*a(i)
a(i)=1 if we have coin i and 0 otherwise
What we have here is convolution: f(n+1,)=conv(f(n,),a)
https://en.wikipedia.org/wiki/Convolution
Computing it as the definition suggests costs O(n^2).
But we can use the Fourier transform to reduce it to O(n*log n).
https://en.wikipedia.org/wiki/Convolution#Convolution_theorem
So now we have a more-or-less cheap way to find out which numbers are possible with n coins without going incrementally - just calculate the n-th power of F(a) and apply the inverse Fourier transform.
This allows us to do a kind of binary search, which can help handle cases when the answer is big.
As I said, the idea is incomplete - for now I have no idea how to combine the bit representation with Fourier transforms (to satisfy the memory constraint), or whether we will fit into 1 second on any "regular" CPU...
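To make the convolution step concrete, here is a small self-contained C++ sketch (my own, not the answer's code) that builds f(n+1,·) from f(n,·) straight from the definition. This is the naive O(S*|coins|) version, not the FFT-accelerated one the answer proposes, but running it with coins {1,3} reproduces the f(1,·) and f(2,·) rows shown above.

#include <iostream>
#include <vector>

// f(n+1, s) = sum over coins c of f(n, s - c), i.e. f(n+1, .) = conv(f(n, .), a)
// where a[c] = 1 iff c is a coin. An FFT-based convolution would bring each
// step down to O(S log S), as suggested above.
std::vector<long long> nextLayer(const std::vector<long long>& f, const std::vector<int>& coins) {
    std::vector<long long> g(f.size(), 0);
    for (std::size_t s = 0; s < f.size(); ++s) {
        if (f[s] == 0) continue;
        for (int c : coins)
            if (s + c < g.size()) g[s + c] += f[s];
    }
    return g;
}

int main() {
    std::vector<int> coins = {1, 3};
    std::vector<long long> f(8, 0);
    f[0] = 1;                                       // f(0,.): only the zero sum is reachable
    for (int n = 1; n <= 2; ++n) {
        f = nextLayer(f, coins);
        std::cout << "f(" << n << ",.) =";
        for (long long x : f) std::cout << ' ' << x;
        std::cout << '\n';                          // [0,1,0,1,0,0,0,0] then [0,0,1,0,2,0,1,0]
    }
    return 0;
}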

How to compute blot exposure in backgammon efficiently

I am trying to implement an algorithm for backgammon similar to td-gammon as described here.
As described in the paper, the initial version of td-gammon used only the raw board encoding in the feature space which created a good playing agent, but to get a world-class agent you need to add some pre-computed features associated with good play. One of the most important features turns out to be the blot exposure.
Blot exposure is defined here as:
For a given blot, the number of rolls out of 36 which would allow the opponent to hit the blot. The total blot exposure is the number of rolls out of 36 which would allow the opponent to hit any blot. Blot exposure depends on: (a) the locations of all enemy men in front of the blot; (b) the number and location of blocking points between the blot and the enemy men and (c) the number of enemy men on the bar, and the rolls which allow them to re-enter the board, since men on the bar must re-enter before blots can be hit.
I have tried various approaches to compute this feature efficiently but my computation is still too slow and I am not sure how to speed it up.
Keep in mind that the td-gammon approach evaluates every possible board position for a given dice roll, so each turn, for every player's dice roll, you would need to calculate this feature for every possible board position.
Some rough numbers: assuming there are approximately 30 board positions per turn and an average game lasts 50 turns, running 1,000,000 game simulations takes (x * 30 * 50 * 1,000,000) / (1000 * 60 * 60 * 24) days, where x is the number of milliseconds it takes to compute the feature. Putting x = 0.7 we get approximately 12 days to simulate 1,000,000 games.
I don't really know if that's reasonable timing but I feel there must be a significantly faster approach.
So here's what I've tried:
Approach 1 (By dice roll)
For every one of the 21 possible dice rolls, recursively check to see whether a hit occurs. Here's the main workhorse for this procedure:
private bool HitBlot(int[] dieValues, Checker.Color checkerColor, ref int depth)
{
Moves legalMovesOfDie = new Moves();
if (depth < dieValues.Length)
{
legalMovesOfDie = LegalMovesOfDie(dieValues[depth], checkerColor);
}
if (depth == dieValues.Length || legalMovesOfDie.Count == 0)
{
return false;
}
bool hitBlot = false;
foreach (Move m in legalMovesOfDie.List)
{
if (m.HitChecker == true)
{
return true;
}
board.ApplyMove(m);
depth++;
hitBlot = HitBlot(dieValues, checkerColor, ref depth);
board.UnapplyMove(m);
depth--;
if (hitBlot == true)
{
break;
}
}
return hitBlot;
}
What this function does is take as input an array of dice values (i.e. if the player rolls 1,1 the array would be [1,1,1,1]). The function then recursively checks to see if there is a hit and if so exits with true. The function LegalMovesOfDie computes the legal moves for that particular die value.
Approach 2 (By blot)
With this approach I first find all the blots and then for each blot I loop through every possible dice value and see if a hit occurs. The function is optimized so that once a dice value registers a hit I don't use it again for the next blot. It is also optimized to only consider moves that are in front of the blot. My code:
public int BlotExposure2(Checker.Color checkerColor)
{
if (DegreeOfContact() == 0 || CountBlots(checkerColor) == 0)
{
return 0;
}
List<Dice> unusedDice = Dice.GetAllDice();
List<int> blotPositions = BlotPositions(checkerColor);
int count = 0;
for(int i =0;i<blotPositions.Count;i++)
{
int blotPosition = blotPositions[i];
for (int j =unusedDice.Count-1; j>= 0;j--)
{
Dice dice = unusedDice[j];
Transitions transitions = new Transitions(this, dice);
bool hitBlot = transitions.HitBlot2(checkerColor, blotPosition);
if(hitBlot==true)
{
unusedDice.Remove(dice);
if (dice.ValuesEqual())
{
count = count + 1;
}
else
{
count = count + 2;
}
}
}
}
return count;
}
The method transitions.HitBlot2 takes a blotPosition parameter which ensures that only moves considered are those that are in front of the blot.
Both of these implementations were very slow and when I used a profiler I discovered that the recursion was the cause, so I then tried refactoring these as follows:
To use for loops instead of recursion (ugly code but it's much faster)
To use parallel.foreach so that instead of checking 1 dice value at a time I check these in parallel.
Here are the average timing results of my runs for 50,000 computations of the feature (note: the timings for each approach were taken on the same data):
Approach 1 using recursion: 2.28 ms per computation
Approach 2 using recursion: 1.1 ms per computation
Approach 1 using for loops: 1.02 ms per computation
Approach 2 using for loops: 0.57 ms per computation
Approach 1 using parallel.foreach: 0.75 ms per computation
Approach 2 using parallel.foreach: 0.75 ms per computation
I've found the timings to be quite volatile (maybe dependent on the random initialization of the neural network weights), but around 0.7 ms seems achievable, which, if you recall, leads to 12 days of training for 1,000,000 games.
My questions are: Does anyone know if this is reasonable? Is there a faster algorithm I am not aware of that can reduce training?
One last piece of info: I'm running on a fairly new machine: Intel Core(TM) i7-5500U CPU @ 2.40 GHz.
Any more info required please let me know and I will provide.
Thanks,
Ofir
Yes, calculating these features makes for really hairy code. Look at the GNU Backgammon code: find eval.c and look at lines 1008 to 1267. Yes, it's 260 lines of code. That code calculates the number of rolls that hit at least one checker, and also the number of rolls that hit at least two checkers. As you can see, the code is hairy.
If you find a better way to calculate this, please post your results. To improve, I think you have to look at the board representation. Can you represent the board in a different way that makes this calculation faster?

Subtract a number's digits from the number until it reaches 0

Can anyone help me with some algorithm for this problem?
We have a big number (19 digits) and, in a loop, we subtract one of the digits of that number from the number itself.
We continue to do this until the number reaches zero. We want to calculate the minimum number of subtractions that makes a given number reach zero.
The algorithm must respond fast, for a 19-digit number (up to 10^19), within two seconds. As an example, providing an input of 36 will give 7:
1. 36 - 6 = 30
2. 30 - 3 = 27
3. 27 - 7 = 20
4. 20 - 2 = 18
5. 18 - 8 = 10
6. 10 - 1 = 9
7. 9 - 9 = 0
Thank you.
The requirement for the minimum number of subtractions to reach zero makes this, I suspect, a very thorny problem, one that will require a great deal of backtracking over potential solutions, making it possibly too expensive for your time limitations.
But the first thing you should do is a sanity check. Since the largest digit is a 9, a 19-digit number will require about 10^18 subtractions to reach zero. Code up a simple program to continuously subtract 9 from 10^19 until it becomes less than ten. If you can't do that within the two seconds, you're in trouble.
By way of example, the following program (a):
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
    unsigned long long x = strtoull(argv[1], NULL, 10);
    x /= 1000000000;          /* one-billion divisor, see note below */
    while (x > 9)
        x -= 9;
    return x;
}
when run with the argument 10000000000000000000 (10^19), takes a second and a half of clock time (and CPU time, since it's all calculation) even at gcc's insane optimisation level of -O3:
real 0m1.531s
user 0m1.528s
sys 0m0.000s
And that's with the one-billion divisor just before the while loop, meaning the full number of iterations would take about 48 years.
So a brute force method isn't going to help here, what you need is some serious mathematical analysis which probably means you should post a similar question over at https://math.stackexchange.com/ and let the math geniuses have a shot.
(a) If you're wondering why I'm getting the value from the user rather than using a constant of 10000000000000000000ULL, it's to prevent gcc from calculating it at compile time and turning it into something like:
mov $1, %eax
Ditto for the return x, which prevents it from noticing that I don't use the final value of x and hence optimising the loop out of existence altogether.
I don't have a solution that can solve 19 digit numbers in 2 seconds. Not even close. But I did implement a couple of algorithms (including a dynamic programming algorithm that solves for the optimum), and gained some insight that I believe is interesting.
Greedy Algorithm
As a baseline, I implemented a greedy algorithm that simply picks the largest digit in each step:
#include <cstdint>

// Greedy: always subtract the largest digit of the current value.
uint64_t countGreedy(uint64_t inputVal) {
uint64_t remVal = inputVal;
uint64_t nStep = 0;
while (remVal > 0) {
uint64_t digitVal = remVal;
uint_fast8_t maxDigit = 0;
while (digitVal > 0) {
uint64_t nextDigitVal = digitVal / 10;
uint_fast8_t digit = digitVal - nextDigitVal * 10;
if (digit > maxDigit) {
maxDigit = digit;
}
digitVal = nextDigitVal;
}
remVal -= maxDigit;
++nStep;
}
return nStep;
}
Dynamic Programming Algorithm
The idea for this is that we can calculate the optimum incrementally. For a given value, we pick a digit, which adds one step to the optimum number of steps for the value with the digit subtracted.
With the target function (optimum number of steps) for a given value named optSteps(val), and the digits of the value named d_i, the following relationship holds:
optSteps(val) = 1 + min(optSteps(val - d_i))
This can be implemented with a dynamic programming algorithm. Since d_i is at most 9, we only need the previous 9 values to build on. In my implementation, I keep a circular buffer of 10 values:
static uint64_t countDynamic(uint64_t inputVal) {
uint64_t minSteps[10] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
uint_fast8_t digit0 = 0;
for (uint64_t val = 10; val <= inputVal; ++val) {
digit0 = val % 10;
uint64_t digitVal = val;
uint64_t minPrevStep = 0;
bool prevStepSet = false;
while (digitVal > 0) {
uint64_t nextDigitVal = digitVal / 10;
uint_fast8_t digit = digitVal - nextDigitVal * 10;
if (digit > 0) {
uint64_t prevStep = 0;
if (digit > digit0) {
prevStep = minSteps[10 + digit0 - digit];
} else {
prevStep = minSteps[digit0 - digit];
}
if (!prevStepSet || prevStep < minPrevStep) {
minPrevStep = prevStep;
prevStepSet = true;
}
}
digitVal = nextDigitVal;
}
minSteps[digit0] = minPrevStep + 1;
}
return minSteps[digit0];
}
Comparison of Results
This may be considered a surprise: I ran both algorithms on all values up to 1,000,000. The results are absolutely identical. This suggests that the greedy algorithm actually calculates the optimum.
I don't have a formal proof that this is indeed true for all possible values. It intuitively kind of makes sense to me. If in any given step, you choose a smaller digit than the maximum, you compromise the immediate progress with the goal of getting into a more favorable situation that allows you to catch up and pass the greedy approach. But in all the scenarios I thought about, the situation after taking a sub-optimal step just does not get significantly more favorable. It might make the next step bigger, but that is at most enough to get even again.
Complexity
While both algorithms look linear in the size of the value, they also loop over all digits in the value. Since the number of digits corresponds to log(n), I believe the complexity is O(n * log(n)).
I think it's possible to make it linear by keeping counts of the frequency of each digit, and modifying them incrementally. But I doubt it would actually be faster. It requires more logic, and turns a loop over all digits in the value (which is in the range of 2-19 for the values we are looking at) into a fixed loop over 10 possible digits.
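For what it's worth, here is a small sketch (my own illustration, not the answer's code) of what that incremental digit-count bookkeeping could look like: when val increases by 1, only the trailing 9s and one higher digit change, so the counts can be maintained in amortized constant time and the DP's inner loop becomes a fixed scan over the 10 possible digits.

#include <cstdint>
#include <iostream>

// Maintains how many times each digit occurs in the decimal representation of
// a value, updated incrementally as the value is incremented by 1.
struct DigitCounts {
    int count[10] = {0};

    explicit DigitCounts(uint64_t val) {
        if (val == 0) count[0] = 1;
        for (; val > 0; val /= 10) count[val % 10]++;
    }

    // Update the counts for the transition oldVal -> oldVal + 1.
    void increment(uint64_t oldVal) {
        if (oldVal == 0) { count[0]--; count[1]++; return; }   // "0" -> "1"
        uint64_t v = oldVal;
        while (v > 0 && v % 10 == 9) {          // trailing 9s roll over to 0s
            count[9]--; count[0]++;
            v /= 10;
        }
        if (v == 0) count[1]++;                 // all digits were 9: a new leading 1 appears
        else { int d = (int)(v % 10); count[d]--; count[d + 1]++; }
    }
};

int main() {
    DigitCounts dc(999);
    dc.increment(999);                          // counts now describe 1000
    for (int d = 0; d < 10; ++d)
        if (dc.count[d]) std::cout << d << " x " << dc.count[d] << "  ";  // 0 x 3  1 x 1
    std::cout << '\n';
    return 0;
}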
Runtimes
Not surprisingly, the greedy algorithm is faster to calculate a single value. For example, for value 1,000,000,000, the runtimes on my MacBook Pro are:
greedy: 3 seconds
dynamic: 36 seconds
On the other hand, the dynamic programming approach is obviously much faster at calculating all the values, since its incremental approach needs them as intermediate results anyway. For calculating all values from 10 to 1,000,000:
greedy: 19 minutes
dynamic: 0.03 seconds
As already shown in the runtimes above, the greedy algorithm handles input values of up to about 9 digits within the targeted runtime of 2 seconds. The implementations aren't really tuned, and it's certainly possible to squeeze out some more time, but it would be fractional improvements.
Ideas
As already explored in another answer, there's no chance of getting the result for 19 digit numbers in 2 seconds by subtracting digits one by one. Since we subtract at most 9 in each step, completing this for a value of 10^19 needs more than 10^18 steps. We mostly use computers that perform in the rough range of 10^9 operations/second, which suggests that it would take about 10^9 seconds.
Therefore, we need something that can take shortcuts. I can think of scenarios where that's possible, but haven't been able to generalize it to a full strategy so far.
For example, if your current value is 9999, you know that you can subtract 9 until you reach 9000. So you can calculate that you will make 112 steps ((9999 - 9000) / 9 + 1) where you subtract 9, which can be done in a few operations.
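As a tiny illustration of that kind of shortcut (my own sketch, under the assumption that the maximum digit m stays the maximum while the value remains at or above some threshold floorVal):

#include <cstdint>
#include <iostream>

// Number of subtractions of the (constant) maximum digit m performed while the
// value is still >= floorVal: for 9999 with m = 9 and floorVal = 9000 this is
// (9999 - 9000) / 9 + 1 = 112, as in the example above, computed with one
// division instead of 112 loop iterations.
uint64_t batchSteps(uint64_t val, uint64_t floorVal, uint64_t m) {
    return (val - floorVal) / m + 1;
}

int main() {
    std::cout << batchSteps(9999, 9000, 9) << '\n';   // 112
    return 0;
}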
As said in comments already, and agreeing with @paxdiablo's other answer, I'm not sure there is an algorithm to find the ideal solution without some backtracking; and the size of the number and the time constraint might be tough as well.
A general consideration though: you might want to find a way to decide between always subtracting the highest digit (which will decrease your current number by the largest possible amount, obviously), and looking at your current digits and subtracting whichever of those will give you the largest "new" digit.
Say, your current number only consists of digits between 0 and 5 – then you might be tempted to subtract the 5 to decrease your number by the highest possible value, and continue with the next step. If the last digit of your current number is 3 however, then you might want to subtract 4 instead – since that will give you 9 as new digit at the end of the number, instead of “only” 8 you would be getting if you subtracted 5.
Whereas if you already have a 2 and two 9s among your digits, and the last digit is a 1, then you might want to subtract the 9 anyway: you will still be left with the second 9 in the result (at least in most cases; in some edge cases it might get obliterated from the result as well). So subtracting the 2 instead would not have the advantage of giving you a "high" 9 that you would otherwise not have in the next step, and it would have the disadvantage of not lowering your number by as high an amount as subtracting the 9 would.
But every digit you subtract will not only affect the next step directly, but the following steps indirectly – so again, I doubt there is a way to always chose the ideal digit for the current step without any backtracking or similar measures.

Divide a group of people into two disjoint subgroups (of arbitrary size) and find some values

As we know from programming, sometimes a slight change in a problem can
significantly alter the form of its solution.
Firstly, I want to create a simple algorithm for solving the following problem and classify it using big-theta notation:
Divide a group of people into two disjoint subgroups
(of arbitrary size) such that the
difference in the total ages of the members of
the two subgroups is as large as possible.
Now I need to change the problem so that the desired
difference is as small as possible and classify
my approach to the problem.
Well, first of all I need to create the initial algorithm.
For that, should I do some kind of sorting in order to separate the teams, and how am I supposed to continue?
EDIT: for the first problem, we have ruled out the possibility of either set being empty. So all we have to do is a linear search to find the minimum age and put it in set B. Set A then has all the other ages except the one in set B, which is the minimum age. That gives the maximum possible difference between the total ages of the two sets.
The way you described the first problem, it is trivial: it only requires you to find the minimum element (in case each subgroup must contain at least 1 member); otherwise it is already solved.
The second problem can be solved recursively; the pseudocode would be:
// compute the sum of all elements of the array and store it in sum
min = sum;
globalVec = baseVec;

fun generate(baseVec, generatedVec, position, total)
    if (abs(sum - 2*total) < min) {   // check if this distribution is better
        min = abs(sum - 2*total);
        globalVec = generatedVec;
    }
    if (position >= baseVec.length()) return;
    else {
        // either consider the elem at position to be in the first group:
        generate(baseVec, generatedVec.pushback(baseVec[position]), position + 1, total + baseVec[position]);
        // or consider the elem at position to be in the second group:
        generate(baseVec, generatedVec, position + 1, total);
    }
And now just start the function with generate(baseVec, "", 0, 0), where "" stands for an empty vector.
The algo can be drastically improved by applying it to a sorted array, hence adding a test condition to stop branching, but the idea stays the same.
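For completeness, here is a runnable C++ version of that pseudocode (the names are mine; it tracks only the best achievable difference rather than the actual grouping):

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <vector>

// Try every way to put each person in group A or group B and keep the split
// whose total-age difference |sum - 2*totalA| is smallest.
void generate(const std::vector<int>& ages, std::size_t position,
              long long totalA, long long sum, long long& best) {
    best = std::min(best, std::llabs(sum - 2 * totalA));
    if (position == ages.size()) return;
    generate(ages, position + 1, totalA + ages[position], sum, best); // person goes to group A
    generate(ages, position + 1, totalA, sum, best);                  // person goes to group B
}

int main() {
    std::vector<int> ages = {34, 12, 5, 23, 18};    // example ages (made up)
    long long sum = std::accumulate(ages.begin(), ages.end(), 0LL);
    long long best = sum;
    generate(ages, 0, 0, sum, best);
    std::cout << "smallest possible difference: " << best << '\n';   // 0 here (34+12 vs 5+23+18)
    return 0;
}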

Algorithm interview from Google

I am a long time lurker, and just had an interview with Google where they asked me this question:
Various artists want to perform at the Royal Albert Hall and you are responsible for scheduling their concerts. Requests for performing at the Hall are accommodated on a first come, first served policy. Only one performance is possible per day and, moreover, there cannot be any concerts taking place within 5 days of each other.
Given a requested time d which is impossible (i.e. within 5 days of an already scheduled performance), give an O(log n)-time algorithm to find the next available day d2 (d2 > d).
I had no clue how to solve it, and now that the interview is over, I am dying to figure out how to solve it. Knowing how smart most of you folks are, I was wondering if you can give me a hand here. This is NOT for homework, or anything of that sort. I just want to learn how to solve it for future interviews. I tried asking follow up questions but he said that is all I can tell you.
You need a normal binary search tree of intervals of available dates. Just search for the interval containing d. If it does not exist, take the interval next (in-order) to the point where the search stopped.
Note: contiguous intervals must be fused together in a single node. For example: the available-dates intervals {2 - 15} and {16 - 23} should become {2 - 23}. This might happen if a concert reservation was cancelled.
Alternatively, a tree of non-available dates can be used instead, provided that contiguous non-available intervals are fused together.
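A minimal sketch of that idea, using std::map as the balanced BST and keying each free interval by its start day (the layout and names are mine, not the answer's):

#include <iostream>
#include <iterator>
#include <map>

// Free-date intervals [start, end], keyed by start, with contiguous intervals
// already fused and a final interval extending to "infinity". To answer a query
// for day d: find the last interval starting at or before d; if it contains d,
// then d itself is free, otherwise the next interval's start is the answer.
// Either way it is one O(log n) descent of the balanced tree.
int nextAvailable(const std::map<int, int>& freeIntervals, int d) {
    auto it = freeIntervals.upper_bound(d);        // first free interval starting after d
    if (it != freeIntervals.begin()) {
        auto prev = std::prev(it);
        if (d <= prev->second) return d;           // d lies inside a free interval
    }
    return it->first;                              // start of the next free interval
}

int main() {
    // Suppose the currently free days form the intervals [1,4], [16,24], [36, +inf).
    std::map<int, int> freeIntervals = {{1, 4}, {16, 24}, {36, 1 << 30}};
    std::cout << nextAvailable(freeIntervals, 12) << '\n';   // 16
    std::cout << nextAvailable(freeIntervals, 20) << '\n';   // 20
    return 0;
}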
Store the scheduled concerts in a binary search tree and find a feasible solution by doing a binary search.
Something like this:
FindDateAfter(tree, x):
    n = tree.root
    if n.date < x
        n = FindDateAfter(n.right, x)
    else if n.date > x and n.left.date < x
        return n
    return FindDateAfter(n.left, x)

FindGoodDay(tree, x):
    n = FindDateAfter(tree, x)
    while (n.date + 10 < n.right.date)
        n = FindDateAfter(n, n.date + 5)
    return n.date + 5
I've used a binary search tree (BST) that holds the ranges for valid free days that can be scheduled for performances.
One of the ranges must end with int.MaxValue, because we have an infinite number of days, so it can't be bounded.
The following code searches for the closest day to the requested day, and returns it.
The time complexity is O(H), where H is the tree height (usually H = log N, but it can become H = N in some cases).
The space complexity is the same as the time complexity.
public static int FindConcertTime(TreeNode<Tuple<int, int>> node, int reqDay)
{
// Not found!
if (node == null)
{
return -1;
}
Tuple<int, int> currRange = node.Value;
// Found range.
if (currRange.Item1 <= reqDay &&
currRange.Item2 >= reqDay)
{
// Return requested day.
return reqDay;
}
// Go left.
else if (currRange.Item1 > reqDay)
{
int suggestedDay = FindConcertTime(node.Left, reqDay);
// Didn't find appropriate range in left nodes, or found day
// is further than current option.
if (suggestedDay == -1 || suggestedDay > currRange.Item1)
{
// Return current option.
return currRange.Item1;
}
else
{
// Return suggested day.
return suggestedDay;
}
}
// Go right.
// Will always find because the right-most node has "int.MaxValue" as Item2.
else //if (currRange.Item2 < reqDay)
{
return FindConcertTime(node.Right, reqDay);
}
}
Store the number of used nights per year, quarter, and month. To find a free night, find the first year that is not fully booked, then the quarter within that year, then the month. Then check each of the nights in that month.
Irregularities in the calendar system make this a little tricky, so instead of using years and months you can apply the idea to units of 4 nights as a "month", 16 nights as a "quarter", and so on.
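A rough sketch of that idea with just the two levels mentioned above, blocks of 4 nights as "months" and blocks of 16 nights as "quarters" (the structure and names are my own, not the answer's):

#include <iostream>
#include <vector>

// Per-block counts of booked nights, so fully booked blocks can be skipped
// without looking at their individual nights.
struct Schedule {
    std::vector<int> night;     // 1 if that night is booked
    std::vector<int> month;     // booked nights per block of 4
    std::vector<int> quarter;   // booked nights per block of 16

    explicit Schedule(int horizon)
        : night(horizon, 0), month(horizon / 4 + 1, 0), quarter(horizon / 16 + 1, 0) {}

    void book(int n) { night[n] = 1; month[n / 4]++; quarter[n / 16]++; }

    // First free night, scanning coarse blocks first; -1 if everything is full.
    int firstFree() const {
        for (int q = 0; q < (int)quarter.size(); ++q) {
            if (quarter[q] == 16) continue;                             // whole quarter booked
            for (int m = q * 4; m < q * 4 + 4; ++m) {
                if (m >= (int)month.size() || month[m] == 4) continue;  // whole month booked
                for (int n = m * 4; n < m * 4 + 4 && n < (int)night.size(); ++n)
                    if (!night[n]) return n;
            }
        }
        return -1;
    }
};

int main() {
    Schedule s(64);
    for (int n = 0; n < 20; ++n) s.book(n);   // the first 20 nights are taken
    std::cout << s.firstFree() << '\n';       // 20
    return 0;
}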
Assume that at level 1 all schedule details are available.
Group the schedule into blocks of 16 days at level 2.
Group 16 level-2 blocks at level 3.
Group 16 level-3 blocks at level 4.
Depending on the number of days you want to cover, increase the number of levels.
Now search from the highest level down and do a binary search at the end.
Asymptotic complexity:
It describes how the runtime changes as the input grows.
Suppose we have an input string "abcd". If we traverse each character to find its length, the time taken is proportional to the number of characters in the string, say n. Thus O(n).
But if we store the length of the string "abcd" in a variable, then no matter how long the string is, we can still find its length by looking at the variable len (len = 4).
For example: return 23. No matter what the input is, the output is always 23.
Thus the complexity is O(1): the program runs in constant time with respect to the input size.
For O(log n), the number of operations grows logarithmically with the input size.
https://drive.google.com/file/d/0B7eUOnXKVyeERzdPUE8wYWFQZlk/view?usp=sharing
Observe the image at the link above. There you can see the bent (logarithmic) curve. For small inputs the difference is small, but as the input grows, an O(log n) algorithm takes far less time than a linear O(n) one.
There are also best-case and worst-case scenarios to consider, as in the example above.
You can also refer to this cheat sheet for algorithms: http://bigocheatsheet.com/
It was already mentioned above, but basically: keep it simple with a binary search tree. You know a balanced binary tree has O(log N) operations, so you already know what data structure you need to use.
All you have to do is come up with a tree node structure and use the binary-tree insertion algorithm to find the next available date:
A possible one:
The tree node has two attributes: d (date of the concert) and d+5 (end date for the blocking period of 5 days). Again to keep it simple, use a timestamp for the two date attributes.
Now it is trivial to find the next available date by using the in-order binary-tree insertion algorithm, starting from the initial condition root = null.
Why not try Union-Find? You can group each concert day plus the next 5 days into one set, and then perform a FIND on the given day, which would return the representative of that set, which would be your next available concert date.
If implemented using a tree, this gives an O(log n) time complexity.
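A small sketch of how that could look, following the answer's forward-only grouping and using a hash map with path compression so the day range does not have to be bounded (the code and names are my own interpretation):

#include <iostream>
#include <unordered_map>

// next_[d] points towards the first day at or after d that is still available.
// Booking a concert on day d marks d..d+5 as taken by linking them to d+6.
// find() with path compression plays the role of the FIND described above.
std::unordered_map<long long, long long> next_;

long long find(long long day) {
    auto it = next_.find(day);
    if (it == next_.end()) return day;         // day is free
    return it->second = find(it->second);      // path compression
}

void book(long long day) {
    for (long long d = day; d <= day + 5; ++d) // d and the 5 days after it
        next_[d] = day + 6;
}

int main() {
    book(10);
    book(16);
    std::cout << find(12) << '\n';             // 22: first day clear of both blocks
    std::cout << find(30) << '\n';             // 30: already free
    return 0;
}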
