Different time steps in Vensim do not produce the same result, how can this be solved? - systemdynamics

I am making a simple model in Vensim.
The model consists of a stock and an out-flow variable:
stock = INTEG(-"out-flow"), initial value: 2.5
out-flow = IF THEN ELSE( stock > 0, MIN(stock, 1), 0)
The simulation runs for 5 years (initial time = 0 and final time = 5) and the unit of time is Year.
I need the simulation to compute 64 steps every year, so the time step is set to 1/64 = 0.015625.
The result I get with this time step is not logical and is not what I expect; the desired result is only obtained by setting time step = 1.
As mentioned earlier, the simulation needs to run 64 steps every year, so a time step of 1 is of no use to me. How can I solve this problem?
Thanks in advance.

To work around this you can transform your unit of time from years into days, rounding the TIME STEP off to a whole number of days.
Before the transformation:
Units for Time = Year, INITIAL TIME = 0, FINAL TIME = 5, TIME STEP = 1
IF THEN ELSE( stock > 0, MIN(stock, 1), 0)
After the transformation:
Units for Time = Day, INITIAL TIME = 0, FINAL TIME = 1825, TIME STEP = 6
IF THEN ELSE( stock/365 > 0, MIN(stock/365, 1/365), 0)
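For context on why the time steps disagree in the first place, here is a minimal Python sketch of the model under Euler integration (Vensim's default method); the function name and printout are illustrative. With TIME STEP = 1 the discrete outflow drains the stock completely by t = 3, while with TIME STEP = 1/64 the run approaches the continuous solution, in which the stock decays exponentially once it falls below 1 and never quite reaches zero, so the two runs legitimately differ:

def simulate(dt, final_time=5.0, stock=2.5):
    # Euler integration of stock' = -IF THEN ELSE(stock > 0, MIN(stock, 1), 0)
    t = 0.0
    while t < final_time:
        outflow = min(stock, 1.0) if stock > 0 else 0.0
        stock += dt * (-outflow)
        t += dt
    return stock

print(simulate(1.0))       # 0.0: stock fully drained by t = 3
print(simulate(1.0 / 64))  # ~0.03: exponential tail once stock < 1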

Related

statsmodels SARIMAX predictions

I'm trying to understand how to verify an ARIMAX model for > 1 step ahead using statsmodels.
My understanding is that the results.get_prediction(start=, dynamic=) API does this, but I'm having trouble getting my head around how it works. My training data is indexed by a localised DateTimeIndex (tz='Australia/Sydney') at 15T freq. I want to predict a full day for '2019-02-04 00:00:00+1100' using one-step-ahead prediction up to '2019-02-04 06:00:00+1100', then the previous predicted endogenous values for the rest of the day.
Is the code below correct? It seems statsmodels converts the start to a Timestamp and treats dynamic as a multiple of the freq, so this should start the simulation using one-step-ahead prediction until 06:00 and then use the previous predicted endogenous values. The results don't look great, so I want to confirm it's a model issue rather than me having an incorrect diagnosis.
import matplotlib.pyplot as plt

dt = '2019-02-04'
predict = res.get_prediction(start='2019-02-04 00:00:00+11:00')
predict_dy = res.get_prediction(start='2019-02-04 00:00:00+11:00', dynamic=4*6)

fig = plt.figure(figsize=(10, 10))
ax = fig.gca()
y_train[dt].plot(ax=ax, style='o', label='Observed')
predict.predicted_mean[dt].plot(ax=ax, style='r--', label='One-step-ahead forecast')
predict_dy.predicted_mean[dt].plot(ax=ax, style='g', label='Dynamic forecast')
It seems statsmodels converts the start to a Timestamp
Yes, if you give it a string value, then it will attempt to map it to an index in your dataset (like a timestamp).
and treats dynamic as a multiple of the freq
But this is not correct. dynamic is an integer offset to start. So if dynamic=0, that means that dynamic prediction begins at start, whereas if dynamic=1, that means that dynamic prediction begins at start+1.
It's not quite clear to me what's going on in your example (or what you think is not great about the predictions you generated), so here is a worked example that may help explain how dynamic works. A couple of key points for this exercise:
I set all elements of endog to be equal to 1
This is an AR(1) model with parameter 0.5. That means that if we know y_t, then the prediction of y_{t+1} is equal to 0.5 * y_t.
Now, the example code is:
import numpy as np
import pandas as pd
import statsmodels.api as sm

ix = pd.date_range(start='2018-12-01', end='2019-01-31', freq='D')
endog = pd.Series(np.ones(len(ix)), index=ix)
mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0), concentrate_scale=True)
res = mod.smooth([0.5])

p1 = res.predict(start='January 1, 2019', end='January 5, 2019').rename('d=False')
p2 = res.predict(start='January 1, 2019', end='January 5, 2019', dynamic=0).rename('d=0')
p3 = res.predict(start='January 1, 2019', end='January 5, 2019', dynamic=1).rename('d=1')
print(pd.concat([p1, p2, p3], axis=1))
This gives:
            d=False      d=0     d=1
2019-01-01      0.5  0.50000  0.5000
2019-01-02      0.5  0.25000  0.5000
2019-01-03      0.5  0.12500  0.2500
2019-01-04      0.5  0.06250  0.1250
2019-01-05      0.5  0.03125  0.0625
The first column (d=False) is the default case, where dynamic=False. Here, all predictions are one-step-ahead predictions. Since I set every element of endog to 1 and we have an AR(1) model with parameter 0.5, all one-step-ahead predictions will be equal to 0.5 * 1 = 0.5.
In the second column (d=0), we specify that dynamic=0 so that dynamic prediction begins at the first prediction. This means that we do not use any endog data past start - 1 in forming our predictions, which in this case means we do not use any data past December 31, 2018 in making predictions. The first prediction will be equal to 0.5 times the observation on December 31, 2018, i.e. 0.5 * 1 = 0.5. Each subsequent prediction will be equal to 0.5 * the previous prediction, so the second prediction is 0.5 * 0.5 = 0.25, etc.
The third column (d=1) is like the second column, except that here dynamic=1 so that dynamic prediction begins at the second prediction. This means we do not use any endog data past start (i.e. past January 1, 2019).
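Applying this back to your question: with a 15-minute frequency, dynamic=4*6 = 24 does correspond to switching from one-step-ahead to dynamic prediction at 06:00 when start is midnight. Here is a hedged sketch of computing that offset explicitly from the two timestamps, reusing res from your code (the variable names are illustrative):

import pandas as pd

start = pd.Timestamp('2019-02-04 00:00:00+11:00')
switch = pd.Timestamp('2019-02-04 06:00:00+11:00')

# number of 15-minute steps between start and the switch to dynamic prediction
dynamic = (switch - start) // pd.Timedelta(minutes=15)  # 24

predict_dy = res.get_prediction(start=start, dynamic=dynamic)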

Design a random(5) using random(7)

Given a random number generator random(7) which generates the numbers 1 through 7 with equal probability (i.e., each number occurs with probability 1/7), we want to design a random(5) which generates 1 through 5 with equal probability (1/5).
One way: every time we run random(7), return only when it generates 1-5. If it is 6 or 7, run again until it is 1-5.
I am a little confused. The first question is:
How can we prove mathematically that each number occurs with probability 1/5?
For example, assume the probability of returning the number 1 is P(1). If B means 'the selected number is in 1-5' and A means 'select 1', then by conditional probability, P(1) = P(A|B) = P(AB) / P(B). Obviously P(B) is 5/7. But if P(1) = 1/5, P(AB) should be 1/7, why? I think P(A) = 1/7. Is something wrong in my reasoning?
The second question is: this method runs until random(7) returns something other than 6 or 7. What if it runs for a long time without returning 1-5? I know the chance is very, very small, but is there any way to prevent it?
Thanks!
The answer to your first question is given by basic conditional probability.
Let X be the random(7) number; then for any k in {1,2,3,4,5}:
P(X = k | X <= 5) = P(X = k) / P(X <= 5) = (1/7) / (5/7) = 1/5
This follows from the observation that the intersection of the two events X = k and X <= 5 is simply X = k.
The number of trials until the first success (where a success is getting a number <= 5) is a geometric random variable with p = 5/7. The expected number of trials is 1/p = 7/5 = 1.4, so you will get a success sooner rather than later in this setup. As @PeterWalser said in his answer, the chance of not quickly getting a number in the range 1-5 is vanishingly small.
For fun you can write a short script to investigate it. Here is one in Python:
from random import randint
from collections import Counter
def trials_needed():
    num = randint(1, 7)
    trial = 1
    while num > 5:
        num = randint(1, 7)
        trial += 1
    return trial

counts = Counter(trials_needed() for i in range(10**6))
for c, i in sorted(counts.items()):
    print(c, ":", i)
Output from a typical run:
1 : 714212
2 : 204141
3 : 58340
4 : 16515
5 : 4814
6 : 1456
7 : 347
8 : 133
9 : 28
10 : 10
11 : 4
Over 99% of the time, fewer than 5 trials are needed. More than 10 trials is extremely rare.
The probability of rolling a number n in 1..5 with a rnd(7) is 1/7 on each roll.
The chance of getting such a number on the first roll is 5/7; in the other 2/7 of first-roll cases, you need to roll again.
This results in a series when examining the probability that a certain number n in 1..5 is rolled:
p(n) = 1/7 + 2/7 * (1/7 + 2/7 * (1/7 + 2/7 * (...)))
This is a geometric series: p(n) = (1/7) * (1 + 2/7 + (2/7)^2 + ...) = (1/7) / (1 - 2/7) = 1/5, which is exactly the expected probability of rolling a specific number n in 1..5.
Second question: there is a chance that you need to roll eternally. The probability of having a result within x rolls is 1 - (2/7)^x, which quickly approaches 1, so you're very likely to get a result in just a few rolls, but there is no guarantee. The probability of still not having a result after a large number of rolls becomes smaller than the probability of Cthulhu swallowing the planet in the next 5 minutes, so it's not necessary to build in any prevention. If you absolutely must, then return 1 after n internal rolls; this only slightly skews the distribution of the random numbers produced.
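Putting both answers together, here is a minimal sketch of the rejection-sampling scheme, with the optional cap on rolls suggested above (the cap and its fallback value are illustrative and slightly skew the distribution):

from random import randint

def random5(max_rolls=None):
    # roll random(7) until the result is in 1..5 (rejection sampling)
    rolls = 0
    while True:
        num = randint(1, 7)  # stands in for random(7)
        rolls += 1
        if num <= 5:
            return num
        # optional escape hatch: give up after max_rolls and return a fixed
        # value, at the cost of a slightly non-uniform distribution
        if max_rolls is not None and rolls >= max_rolls:
            return 1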

Scheduling - Assigning jobs to the most efficient worker

This was asked by a friend of mine. I had no previous context, so I want to know what type of algorithm this problem belongs to. Any hints or suggestions will do.
Suppose we have a group of N workers working on a car assembly line. Each worker can do 3 types of work, with skills rated from 1 to 10. For example, Worker1's "paint surface" efficiency is rated 8, but his "assemble engine" efficiency is only rated 5.
The manager has a list of M jobs, each defined by start time, duration, job type, and importance, rated from 0 to 1. Each worker can only work on one job at a time, and one job can be worked by only one worker. How can the manager assign the jobs properly to get maximum output?
The output for a job = worker skill rating * job importance * duration.
For example, we have workers {w1, w2}
w1: paint_skill = 9, engine_skill = 8
w2: paint_skill = 10, engine_skill = 5
We have jobs {j1, j2}
j1: paint job, start_time = 0, duration = 10, importance = 0.5
j2: engine job, start_time = 3, duration = 10, importance = 0.9
We should assign w1 to j2, and w2 to j1. output = 8 * 10 * 0.9 + 10 * 10 * 0.5 = 72 + 50 = 122
A greedy solution that matches the next available worker with the next job is clearly sub-optimal; in the example it could have matched w1 to j1, which is not optimal.
An exhaustive brute-force solution would guarantee the best output, but would take exponentially more time to compute on large job lists.
How can this problem be approached?
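For reference, here is a minimal sketch of the exhaustive search mentioned above, just to make the objective concrete; all names are illustrative, and the search is exponential in the number of jobs:

from itertools import product

workers = {'w1': {'paint': 9, 'engine': 8},
           'w2': {'paint': 10, 'engine': 5}}
# jobs as (job_type, start, duration, importance)
jobs = [('paint', 0, 10, 0.5), ('engine', 3, 10, 0.9)]

def overlaps(a, b):
    # two jobs overlap if neither ends before the other starts
    return a[1] < b[1] + b[2] and b[1] < a[1] + a[2]

def best_output():
    best = 0.0
    names = list(workers)
    # try every assignment of a worker (or None) to each job
    for assign in product(names + [None], repeat=len(jobs)):
        # a worker cannot take two jobs that overlap in time
        ok = all(not (assign[i] is not None and assign[i] == assign[j]
                      and overlaps(jobs[i], jobs[j]))
                 for i in range(len(jobs)) for j in range(i + 1, len(jobs)))
        if ok:
            total = sum(workers[w][jt] * imp * dur
                        for w, (jt, _, dur, imp) in zip(assign, jobs)
                        if w is not None)
            best = max(best, total)
    return best

print(best_output())  # 122.0: w1 -> j2 (engine), w2 -> j1 (paint)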

Algorithm/Function about computing taxi fare [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 9 years ago.
Let's say that a taxi charges $3.10 for the first fifth of a mile. Then for each additional fifth of a mile it charges half a dollar. For every minute of waiting or delay it charges half a dollar. However, this delay charge is applied instead of the mileage charge for each minute during which the speed of the taxi is slower than the break-even point. The break-even point is the speed at which a fifth of a mile is driven in one minute. We are assuming the taxi goes at constant speed.
I am trying to write an algorithm or function that takes in the total distance and total time travelled by taxi and outputs the fare. However, I am having difficulty factoring in the delay charge.
So if the taxi is going at constant speed, then ideally it would travel x miles during the time given (constant speed * time interval).
If we subtract the actual miles travelled from this value, we get the number of "wasted" miles that could have been travelled but were not.
And then I lose my train of thought and I am not sure where to go from there. Any help/hint would be appreciated.
There is no single output to this problem when given only total_time and total_distance. I will show two instances resulting in the same total_time and total_distance but different total fares.
Instance 1:
1st min: travels 0.4 mile; fare = 3.1+0.5 = 3.6
2nd min: waits at a signal; fare = 0.5 for waiting a minute at a speed below the break-even point
3rd min: travels 0.8 mile; fare = 0.5*4 = 2
total_fare = 3.6 + 0.5 + 2 = 6.1
Instance 2:
1st min: travels 0.4 mile; fare = 3.1+0.5 = 3.6
2nd min: travels 0.4 mile; fare = 0.5*2 = 1
3rd min: travels 0.4 mile; fare = 0.5*2 = 1
total_fare = 3.6 + 1 + 1 = 5.6
However, in both cases total_distance = 1.2 mile and total_time = 3 min but the fares came out to be different.
If I understand correctly, this should work. If this isn't correct, please provide example inputs and outputs so I know when I have it right. The example code is in JavaScript.
Constants:
var baseRate = 3.10;
var mileRate = 0.50;
var minuteRate = 0.50;

function fare(miles, minutes) {
  var n = baseRate;
  // Under a fifth of a mile, so base rate only
  if (miles <= 1/5) {
    n = baseRate;
  }
  // At or above the break-even speed: the per-mile charge is added to the
  // base rate. We subtract the first 1/5 mile before charging.
  else if (miles >= minutes / 5) {
    miles -= 1/5;
    n += mileRate * miles;
  }
  // We went less than a fifth of a mile per minute, so charge the per-minute rate.
  else {
    n += minutes * minuteRate;
  }
  return n;
}
If you are calculating the charge at the very end of the trip, then you just need to calculate the overall speed and the nominal time based on the break-even rule.
For one scenario, let's say the trip took 30 mins for 10 miles.
t = 30m
d = 10mi
s = 10/30 = 1/3 mi/m which is > (1/5 mi/m)
There are no delays, so the cost is based on miles.
cost = $3.10 + (10mi - 1/5mi) * $0.50 = $8.00
For another scenario, let's say the trip took 60 mins for 10 miles.
t = 60m
d = 10mi
s = 10/60 = 1/6 mi/m which is < (1/5 mi/m)
The nominal time for this trip is 50 minutes (10 / (1/5)), so there is a 10m delay, added to the charge.
cost = $3.10 + (10mi - 1/5mi) * $0.50 + (10m * $0.50) = $13
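A minimal Python sketch of this end-of-trip calculation, under the same assumptions as the two scenarios above (note that both answers here charge $0.50 per additional mile; the question's statement of $0.50 per additional fifth of a mile would make the rate $2.50 per mile instead):

BASE = 3.10         # flag fall, covering the first fifth of a mile
MILE_RATE = 0.50    # per mile beyond the first fifth, as used in the scenarios
MINUTE_RATE = 0.50  # per minute of delay below the break-even speed
BREAK_EVEN = 1 / 5  # miles per minute

def fare(miles, minutes):
    cost = BASE + max(0.0, miles - 1 / 5) * MILE_RATE
    nominal = miles / BREAK_EVEN           # minutes the trip takes at break-even
    delay = max(0.0, minutes - nominal)    # extra minutes below break-even
    return cost + delay * MINUTE_RATE

print(fare(10, 30))  # 8.0: faster than break-even, mileage only
print(fare(10, 60))  # 13.0: 10 minutes of delay added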

Algorithm for nice graph labels for time/date axis?

I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familiar with Paul Heckbert's Nice Numbers algorithm.
I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks.
For example:
Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00...
Looking at a week: 1/1, 1/2, 1/3...
Looking at a month: 1/09, 2/09, 3/09...
The nice label ticks don't need to correspond to the first visible point, but they should be close to it.
Is anybody familiar with such an algorithm?
The 'nice numbers' article you linked to mentioned that
the nicest numbers in decimal are 1, 2, 5 and all power-of-10 multiples of these numbers
So I think for doing something similar with date/time you need to start by breaking down the component pieces in the same way. Take the nice factors of each type of interval:
If you're showing seconds or minutes use 1, 2, 3, 5, 10, 15, 30
(I skipped 6, 12, and 20 because they don't "feel" right).
If you're showing hours use 1, 2, 3, 4, 6, 8, 12
for days use 1, 2, 7
for weeks use 1, 2, 4 (13 and 26 fit the model but seem too odd to me)
for months use 1, 2, 3, 4, 6
for years use 1, 2, 5 and power-of-10 multiples
Now obviously this starts to break down as you get into larger amounts. Certainly you don't want to show 5 weeks' worth of minutes, even in "pretty" intervals of 30 minutes or something. On the other hand, when you only have 48 hours' worth, you don't want to show 1-day intervals. The trick, as you have already pointed out, is finding decent transition points.
Just on a hunch, I would say a reasonable crossover point would be about twice as much as the next interval. That would give you the following (min and max number of intervals shown afterwards):
use seconds if you have less than 2 minutes worth (1-120)
use minutes if you have less than 2 hours worth (2-120)
use hours if you have less than 2 days worth (2-48)
use days if you have less than 2 weeks worth (2-14)
use weeks if you have less than 2 months worth (2-8/9)
use months if you have less than 2 years worth (2-24)
otherwise use years (although you could continue with decades, centuries, etc if your ranges can be that long)
Unfortunately, our inconsistent time intervals mean that you end up with some cases that can have over a hundred intervals while others have at most 8 or 9. So you'll want to pick the size of your intervals such that you don't have more than 10-15 intervals at most (or fewer than 5, for that matter). Also, you could break from a strict definition of twice the next biggest interval if you think it's easy to keep track of. For instance, you could use hours up to 3 days (72 hours) and weeks up to 4 months. A little trial and error might be necessary.
So, to recap: choose the interval type based on the size of your range, then choose the interval size by picking one of the "nice" numbers that will leave you with between 5 and about 15 tick marks. Or, if you know and/or can control the actual number of pixels between tick marks, you could put upper and lower bounds on how many pixels are acceptable between ticks (if they are spaced too far apart the graph may be hard to read, but if there are too many ticks the graph will be cluttered and your labels may overlap).
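As a rough illustration of the crossover idea, here is a hedged Python sketch that first picks a unit from the visible range and then one of the "nice" step sizes listed above; the crossover bounds and step tables follow this answer, with months and years approximated:

from datetime import timedelta

UNITS = [  # (unit name, unit length, crossover bound for using this unit)
    ('second', timedelta(seconds=1), timedelta(minutes=2)),
    ('minute', timedelta(minutes=1), timedelta(hours=2)),
    ('hour',   timedelta(hours=1),   timedelta(days=2)),
    ('day',    timedelta(days=1),    timedelta(weeks=2)),
    ('week',   timedelta(weeks=1),   timedelta(days=61)),   # ~2 months
    ('month',  timedelta(days=30),   timedelta(days=730)),  # ~2 years
    ('year',   timedelta(days=365),  None),
]
NICE_STEPS = {
    'second': [1, 2, 3, 5, 10, 15, 30], 'minute': [1, 2, 3, 5, 10, 15, 30],
    'hour': [1, 2, 3, 4, 6, 8, 12], 'day': [1, 2, 7],
    'week': [1, 2, 4], 'month': [1, 2, 3, 4, 6], 'year': [1, 2, 5],
}

def choose_ticks(visible_range, max_ticks=15):
    # pick the unit from the visible range, then a step that keeps the
    # number of ticks at or below max_ticks
    for name, length, crossover in UNITS:
        if crossover is None or visible_range < crossover:
            for step in NICE_STEPS[name]:
                if visible_range / (length * step) <= max_ticks:
                    return name, step
            # very long ranges: extend years with power-of-10 multiples
            return name, NICE_STEPS[name][-1]

print(choose_ticks(timedelta(hours=36)))  # ('hour', 3): a tick every 3 hours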
Have a look at
http://tools.netsa.cert.org/netsa-python/doc/index.html
It has a nice.py (python/netsa/data/nice.py) which I think is stand-alone and should work fine.
Still no answer to this question... I'll throw my first idea in then! I assume you have the range of the visible axis.
This is probably how I would do it.
Rough pseudocode:
// quantify range
rangeLength = endOfVisiblePart - startOfVisiblePart;
// qualify range resolution
if (rangeLength < "1.5 days") {
    resolution = "day"; // it can be a number, e.g.: ..., 3 for day, 4 for week, ...
} else if (rangeLength < "9 days") {
    resolution = "week";
} else if (rangeLength < "35 days") {
    resolution = "month";
} // you can expand this in both directions to get from nanoseconds to geological eras if you wish
After that, it should (depending on what you have easy access to) be quite easy to determine the value for each nice label tick. Depending on the resolution, you format it differently: e.g., MM/DD for "week", MM:SS for "minute", etc., just like you said.
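For the formatting step, a tiny hedged sketch; the mapping from resolution to strftime pattern is illustrative, not prescriptive:

from datetime import datetime

# resolution -> label format; adjust to taste and locale
FORMATS = {
    'minute': '%H:%M',
    'hour':   '%H:%M',
    'day':    '%m/%d',
    'week':   '%m/%d',
    'month':  '%m/%y',
    'year':   '%Y',
}

def tick_label(ts, resolution):
    return ts.strftime(FORMATS[resolution])

print(tick_label(datetime(2019, 1, 1, 12, 0), 'hour'))  # 12:00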
[Edit - I expanded this a little more at http://www.acooke.org/cute/AutoScalin0.html ]
A naive extension of the "nice numbers" algorithm seems to work for base 12 and 60, which gives good intervals for hours and minutes. This is code I just hacked together:
from math import ceil, floor, log, log10

LIM10 = (10, [(1.5, 1), (3, 2), (7, 5)], [1, 2, 5])
LIM12 = (12, [(1.5, 1), (3, 2), (8, 6)], [1, 2, 6])
LIM60 = (60, [(1.5, 1), (20, 15), (40, 30)], [1, 15, 40])

def heckbert_d(lo, hi, ntick=5, limits=None):
    '''
    Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
    '''
    if limits is None:
        limits = LIM10
    (base, rfs, fs) = limits
    def nicenum(x, round):
        step = base ** floor(log(x) / log(base))
        f = float(x) / step
        nf = base
        if round:
            for (a, b) in rfs:
                if f < a:
                    nf = b
                    break
        else:
            for a in fs:
                if f <= a:
                    nf = a
                    break
        return nf * step
    delta = nicenum(hi - lo, False)
    return nicenum(delta / (ntick - 1), True)

def heckbert(lo, hi, ntick=5, limits=None):
    '''
    Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
    '''
    def _heckbert():
        d = heckbert_d(lo, hi, ntick=ntick, limits=limits)
        graphlo = floor(lo / d) * d
        graphhi = ceil(hi / d) * d
        fmt = '%' + '.%df' % max(-floor(log10(d)), 0)
        value = graphlo
        while value < graphhi + 0.5 * d:
            yield fmt % value
            value += d
    return list(_heckbert())
So, for example, if you want to display seconds from 0 to 60,
>>> heckbert(0, 60, limits=LIM60)
['0', '15', '30', '45', '60']
or hours from 0 to 5:
>>> heckbert(0, 5, limits=LIM12)
['0', '2', '4', '6']
I'd suggest you grab the source code to gnuplot or RRDtool (or even Flot) and examine how they approach this problem. The general case is likely to be N labels applied based on the width of your plot, with some kind of 'snapping' to the nearest 'nice' number.
Every time I've written such an algorithm (too many times, really), I've used a table of 'preferences', i.e., based on the time range on the plot, decide whether I'm using weeks, days, hours, minutes, etc. as the main axis points. I usually included some preferred formatting, as I rarely want to see the date for each minute I plot on the graph.
I'd be happy but surprised to find someone using a formula (like Heckbert does) to find 'nice', as the variation in time units between minutes, hours, days, and weeks is not that linear.
In theory you can also invert the concept: instead of putting your data at the center of the visualization, put your scale at the center.
When you know the start and end dates of your data, you can create a scale containing all the dates and place your data on that scale, like a fixed scale.
You can have scales of type year, month, day, hour, ... and limit the zooming to just these scales, which means removing the concept of free scaling.
The advantage is that you can easily show gaps in the dates. But if you have a lot of gaps, that can also become useless.
