I would like to create a game where the players create different products with different prices (call these offers), and I give them a certain number of customers (call this demand).
Now, I want an algorithm to decide each player's market share. Of course, I could just write my own right now using randomness, but before doing that I'd rather ask, because I'm sure a lot of people have tried to do this before me!
My question is not very precise, so your answer doesn't need to be precise either ;)
Thank you in advance!
It really depends on the variables you have set up and the kind of "market" you want to create. You could start with the simple formula below (which fundamentally reduces market share to a question of profit), and I'll explain what I mean by "kind of market" afterwards.
marketShare = totalCompanyProfit/allMoneyInProductCategory;
marketShare = ( (productSalePrice * demand)-(productManufactureCost * supply) ) / allMoneyInProductCategory;
It gets interesting here because the "kind of market" is determined by your definition of demand. For example, say the product is Ferraris and the market you are trying to simulate is the Republic of Congo, with a GDP of $189 per capita.
targetMarketSize = (percentOfFittingDemographic * totalPopulation)
percentWhoHateYourProduct = AVERAGE( ( ABS(productVariable1 - variableIdeal1) / variableIdeal1 ), ( ABS(productVariable2 - variableIdeal2) / variableIdeal2 ), etc )
demand = (targetMarketSize) * ( 1- percentWhoHateYourProduct )
percentOfFittingDemographic is the percentage of the population which fits into the demographic that would buy such a product (i.e. people with enough disposable income to afford a $100,000 car), which in the Congo example above could be something like 0.001.
The average of the absolute difference between each product attribute (productVariable) and its ideal (variableIdeal), divided by that ideal, gives the percentage of the population that will be turned off by the product not being what they want. Subtracting that from 1 gives the percentage of people who DO want to buy your product, and multiplying that by targetMarketSize gives the number of people who want to buy your product - i.e. demand. If the product is perfect, it becomes an average of zeros, and the whole target market becomes a user of the product.
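To make the demand part concrete, here is a minimal Python sketch of the formulas above; all numbers and attribute names (price, top_speed, the population figures) are invented purely for illustration:

# Minimal sketch of the demand formula above; every number here is made up.
total_population = 5_000_000
percent_of_fitting_demographic = 0.001        # people who could afford it at all
target_market_size = percent_of_fitting_demographic * total_population

# How far each product attribute is from what the market considers ideal.
product = {"price": 100_000, "top_speed": 180}
ideal   = {"price":  80_000, "top_speed": 200}

deviations = [abs(product[k] - ideal[k]) / ideal[k] for k in ideal]
percent_who_hate_product = sum(deviations) / len(deviations)

demand = target_market_size * (1 - percent_who_hate_product)
print(round(demand))                          # people who actually want to buy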
One could also add weighting to the average to say, for example, that the market prefers a lower price over a bigger screen size. To express that more of an attribute increases desire for the product in a population (e.g. instead of "one month of free service" you give away "6 months of free service", and people want it more), you could add it into the average with
percentLikesProductNow = 1 - e^(-1 * infinitelyLikedAttribute)
This goes from 0% at infinitelyLikedAttribute = 0 to about 99.995% at infinitelyLikedAttribute = 10 (i.e. within 0.005% of its limit), so you could play around and find a way to "scale" that attribute to lie roughly between 1 and 10. This does make some sense in real life, because there are products I would never have bought if they didn't have a free trial - for example, 3 free months of Verizon internet. I would probably have gone with Comcast otherwise, as I was only living there for 6 months, but saving 100 bucks was pretty big at the time. At the other extreme, however, if Verizon were to offer me 100 free years of internet, another 50 extra years on top of that (assuming it's not transferable, etc.) really doesn't add much to the attractiveness of the offer.
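A quick way to see those diminishing returns (the months-of-free-service values and the crude cap to the 0-10 range below are made up for illustration):

from math import exp

# 1 - e^(-x) saturates quickly: more free months help, but ever less so.
for months_free in (0, 1, 3, 6, 12, 100):
    scaled = min(months_free, 10)        # crude "scaling" to the 0-10 range
    print(months_free, round(1 - exp(-scaled), 5))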
You can always multiply all of these things by a random number as well, to give it maybe a +/- 15% variance and keep everyone guessing :)
I hope this was even remotely useful :)
I have this problem:
You need to develop a meal plan based on the data entered by the user. We have a database of meals and their prices (each meal is also marked as breakfast or not, because at breakfast we usually eat something different from lunch and dinner). The input is an amount of money (the currency is not important) and a number of days. As output, we must get a meal plan for the given number of days. Conditions:
The final price does not differ from the given one by more than 3%.
Meals must not repeat more than once every 5 days.
I found this inefficient solution: we compute an average price per day = amount of money / number of days. Then, until we reach the given number of days, we iterate through each breakfast, then lunch, then dinner (3 for loops, 2 of them nested), and if the price is not too different, we end the search and add this day to the result list. So the design now looks like this:
while (daysCounter < days) {
    for (/* breakfast */) {
        for (/* lunch */) {
            for (/* dinner */) {
            }
        }
    }
}
It looks scary, although there is not a lot of data (about 150 meals). I suspect there is a more efficient solution. I have also thought about dynamic programming, but so far I have no idea how to implement it.
Dynamic programming won't work, because a necessary part of your state is the meals from the last 5 days, and the number of possibilities for that is astronomical.
However, it is likely that there are many solutions, not just a few, that it is easy to find a working solution by being greedy, and that an existing solution can be improved fairly easily.
So I'd solve it like this. Replace days with an array of meals. Take the total you want to spend and divide it up among your meals in proportion to the median price of the options for that meal (breakfast is generally cheaper than dinner). Now add up those per-meal costs to get ideal running totals.
Then, for each meal, choose the option you have not had in the last 5 days that brings the running total of what has been spent as close as possible to the ideal running total. Choose all of your meals this way, one at a time.
This is the greedy approach. Normally, but not always, it will come fairly close to the target.
And now, up to a maximum of n tries or until you are within 3% of the target, pick a random meal, consider all options that are not eaten within the previous or next 5 days, and randomly pick one (if any such option exists) that brings the overall amount spent closer to the target.
(For a meal plan that is likely to vary more over a long period, I'd suggest trying simulated annealing. That will produce more interesting results, but it will also take longer.)
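As a rough illustration of the greedy pass described above (the random-improvement step and the simulated-annealing variant are left out), here is a Python sketch; it assumes each option is a (name, price) tuple and that there are always enough distinct options to satisfy the 5-day rule:

import statistics

def plan_meals(budget, days, breakfasts, mains):
    """Greedy pass only: for each meal slot, pick the option not used in the
    last 5 days that keeps the running spend closest to the running target."""
    # One slot per meal: breakfast, lunch and dinner for every day.
    slots = []
    for _ in range(days):
        slots += [breakfasts, mains, mains]

    # Split the budget across slots in proportion to the median price of each
    # slot's options, and turn that into ideal running totals.
    medians = [statistics.median(price for _, price in options) for options in slots]
    scale = budget / sum(medians)
    targets, running = [], 0.0
    for m in medians:
        running += m * scale
        targets.append(running)

    plan, spent = [], 0.0
    for target, options in zip(targets, slots):
        recent = {name for name, _ in plan[-15:]}            # 5 days * 3 meals
        candidates = [(n, p) for n, p in options if n not in recent]
        name, price = min(candidates, key=lambda c: abs(spent + c[1] - target))
        plan.append((name, price))
        spent += price
    return plan, spent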
My question is not about programming languages, but it is definitely about programming.
I have a model portfolio with shares:
Part    Code  Price, $  Number of shares in portfolio
23.80%  CSIQ  24.91     ?
18.90%  TSL   10.52     ?
11.20%  JKS   24.40     ?
10.70%  YGE    2.90     ?
35.40%  DQ    26.05     ?
I need to calculate the minimum number of shares of each stock to hold so that each stock's part of the portfolio equals its part in the model portfolio.
Just imagine that you want to purchase such a portfolio in the real world. How many of each stock should you buy to get the desired parts (shown in the model portfolio)? I can't buy a non-integer number of shares, and each part in the recalculated (after purchase) portfolio should equal the part in the model portfolio.
Example: I need to get a portfolio with 50.0% in Google ($500 per share) and 50.0% in Apple ($700 per share). The solution is 5 shares of Apple (total value $3,500) and 7 shares of Google (total value $3,500).
Let us expand on the approach devised in the comments.
The first step is to choose a share to be a reference point; this can be any of them, so we'll go with the first one, CSIQ. Let us say that we will purchase one share of it, so we now know that 23.8% of the portfolio is worth $24.91.
For the second share, this is now the problem we have:
Part    Code  Price, $  Number of shares in portfolio
23.80%  CSIQ  24.91     1
18.90%  TSL   10.52     ?
Since we know the value of a fraction of the portfolio, let us work out what the whole portfolio would be:
total_value = (100 / 23.8) * 24.91
= $104.663865546
That means the amount we can spend on TSL is:
tsl_value = 104.663865546 * (18.9/100)
= $19.781470588
We know how much a TSL share costs, so the exact ratio would require a non-integer number of shares:
share_amount = 19.781470588/10.52
= 1.880368
You can then go through each share in the same way, and end up with a portfolio in the desired ratios.
If you already own a number of shares in one stock, you can modify the algorithm but instead of starting with 1 share, you start with X shares - multiply everything by X and it will still work.
After you added the constraint that shares can only be purchased in integer amounts, I would suggest that you use the X multiplier approach above, coupled with rounding share amounts to the closest integer. As you increase X exponentially (10, 100, etc) your level of inaccuracy due to rounding will get progressively smaller.
As I suggested in the comments, you could build this in a spreadsheet first and determine the level of inaccuracy for inputs of X. Of course, if you plan to actually buy these shares, X is constrained by the amount of money you have; conversely if it is theoretical you can make it 6 or 7 figures and achieve good levels of accuracy.
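Here is a small Python sketch of that idea, using the figures from the question; build_portfolio, the rounding and the chosen values of X are just one illustrative way to do it:

# Scale the reference holding by X, compute the implied (fractional) number of
# each share, round to integers, and check how far the rounded portfolio
# drifts from the model weights.
model = [  # (code, target weight, price) from the question
    ("CSIQ", 0.238, 24.91),
    ("TSL",  0.189, 10.52),
    ("JKS",  0.112, 24.40),
    ("YGE",  0.107,  2.90),
    ("DQ",   0.354, 26.05),
]

def build_portfolio(x):
    _, ref_weight, ref_price = model[0]
    total_value = ref_price * x / ref_weight      # value implied by X reference shares
    counts = [round(total_value * w / p) for _, w, p in model]
    actual_value = sum(c * p for c, (_, _, p) in zip(counts, model))
    weights = [c * p / actual_value for c, (_, _, p) in zip(counts, model)]
    return counts, weights

for x in (1, 10, 100, 1000):
    counts, weights = build_portfolio(x)
    drift = max(abs(w - target) for w, (_, target, _) in zip(weights, model))
    print(x, counts, f"max weight error {drift:.4%}")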
I was just going through amazon.com, and an interesting thing that caught my eye is how they calculate best sellers in books.
I was thinking of writing a sample program to calculate this. Suppose I am calculating best sellers for the month: I would just sum the sales counts of the individual books and show the top 10. Is that OK, or am I missing something?
EDIT
One more interesting thing can happen: suppose one book with id1 sold 10 copies on the first day but has not sold since, while a book with id2 keeps selling 1 or 2 copies regularly. How would that affect the best-seller calculation? Thanks.
Sounds about right. Depends on how exactly you want to define it.
"best sellers" is the number of units sold.
Another way to do it, if you don't want to fix it to one month, is to have some decay function (like inverse-square decay, 1/t^2) and add the counts weighted by that function.
This way, even though you don't have a fixed time window, you look at both newcomers and old books. Your function should look something like this:
for a_book in books:
    score = 0
    for a_sale in sales[a_book]:
        # weight each sale by the inverse square of its age in days
        score += 1 / (days(now() - a_sale.time()) ** 2)  # pow 2
    scores[a_book] = score
I think you get the idea. You can try different functions, like dividing by exp(days) instead, or different powers. Experiment and see what makes sense for you.
There's this question, but it has nothing close to helping me out here.
I tried to find information about it on the internet, yet this subject is so swamped with articles on "how to win" and other unrelated stuff that I could barely find anything, and nothing worth posting here.
My question is how would I assure a payout of 95% over a year?
Theoretically, of course.
So far I can think of three obvious variables to consider in the calculation: the machine's payout term (a year in my case), the total taken in and the total paid out during that term.
Now I could simply pick a random number within the gap between the amounts taken in and paid out, and fix the slot results shown to the player, but I'm not sure this is how it's done.
This method sounds reasonable, although it involves building the slot results backwards...
I could also make a huge list of all possibilities, save them in a database in randomized order, and simply poll one of them each time.
This has many flaws - the biggest one is the huge list I would end up with (millions/billions of records).
I certainly hope this question will be marked with an "Answer" (:
You have to make reel strips instead of a huge database. Here is a brief example for a very basic 3-reel game containing 3 symbols:
Paytable:
3xA = 5
3xB = 10
3xC = 20
A reel strip is the sequence of symbols on each reel. For the calculations you only need the quantity of each symbol on each reel:
A = 3, 1, 1 (3 symbols on 1st reel, 1 symbol on 2nd, 1 symbol on 3rd reel)
B = 1, 1, 2
C = 1, 1, 1
Full cycle (total number of all possible combinations) is 5 * 3 * 4 = 60
Now you can calculate probability of each combination:
3xA = 3 * 1 * 1 / full cycle = 0.05
3xB = 1 * 1 * 2 / full cycle = 0.0333
3xC = 1 * 1 * 1 / full cycle = 0.0166
Then you can calculate the return for each combination:
3xA = 5 * 0.05 = 0.25 (25% from AAA)
3xB = 10 * 0.0333 = 0.333 (33.3% from BBB)
3xC = 20 * 0.0166 = 0.333 (33.3% from CCC)
Total return = 91.66%
Finally, you can shuffle the symbols on each reel to get the reel strips, e.g. "ABACA" for the 1st reel. Then pick a random number between 1 and the length of the strip, e.g. 1 to 5 for the 1st reel. This number gives the middle symbol; the upper and lower ones come from the neighbouring positions on the strip. If you picked a position at the edge of the strip, wrap around to the first or last symbol (it's a virtual reel). Then score the result.
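For reference, here is a short Python sketch that reproduces the numbers above (the symbol counts and the paytable are taken straight from this example):

from math import prod

# Per-reel symbol counts and the paytable from the example above.
reel_counts = {          # symbol -> count on (reel 1, reel 2, reel 3)
    "A": (3, 1, 1),
    "B": (1, 1, 2),
    "C": (1, 1, 1),
}
paytable = {"A": 5, "B": 10, "C": 20}   # prize for three of a kind

# Reel lengths and the full cycle (all possible stop combinations).
reel_lengths = [sum(counts[i] for counts in reel_counts.values()) for i in range(3)]
full_cycle = prod(reel_lengths)          # 5 * 3 * 4 = 60

rtp = 0.0
for symbol, counts in reel_counts.items():
    probability = prod(counts) / full_cycle
    rtp += paytable[symbol] * probability
    print(f"3x{symbol}: p={probability:.4f}, return={paytable[symbol] * probability:.4f}")
print(f"Total return: {rtp:.2%}")        # ~91.67%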
In real life you might want to have Wild-symbols, free spins and bonuses. They all are pretty complicated to describe in this answer.
In this sample the hit frequency is 10% (total combinations = 60 and prize combinations = 6). Most people use Excel to calculate this stuff; however, you may find some good tools for doing slot math.
Proper keywords for Google: PAR-sheet, "slot math can be fun" book.
For sweepstakes or Class-2 machines you can't use this stuff. You have to display a combination matching a given prize instead. This is a rather different task, so you may try to prepare a database storing the combinations sorted by prize amount.
Well, the first problem is with the keyword assure: if you are dealing with randomness, you cannot assure anything, unless you change the logic of the slot machine.
Consider the following algorithm, though. I think this style of thinking is more reliable than plotting graphs of averages to achieve 95%:
if (customer_able_to_win())
{
    calculate_how_to_win();
}
else
    no_win();
customer_able_to_win() checks your data log of how much you have taken in vs how much you have paid out; if you are under 95% payout, customer_able_to_win() returns true. In that case, calculate_how_to_win() calculates how much the customer could win based on your percentage. Let's choose a sampling period of 24 hours: if over the last 24 hours I've paid out 90% of the money I've taken in, then I can pay out up to another 5%. Let's give that 5% a number such as $100. So calculate_how_to_win() says I can pay out up to $100, and I would find a set of reels that pays out $100 or less, and that user could win. You could add a little randomness to it, but to ensure your 95% you'll have to have some other rules, such as a forced maximum payout if you get below, say, 80%, and so on.
If you change the algorithm a little by adding randomness to the mix, you will need more of these caveats. So, to make it APPEAR random to the user, you could do:
if (customer_able_to_win() && payout_percent() < 90%)
{
    calculate_how_to_win(); // up to 5% payout
}
else
    no_win();
With something like that, the machine will go on a losing streak after you hit 95% until you drop back to 90%, then it will go on a winning streak of random increments until you reach 95% again.
This isn't a full algorithm answer, but more of a direction on how to think about how the slot machine works.
I've always imagined this is the way slot machines work, especially with video poker, because the no_win() function could calculate how to lose but make it appear you were one card off, to tease you into thinking you were going to win - instead of dealing a 'fair' game where the randomness just happens to come out like that.
Think of the entire process as: first decide whether the player is going to win; if they are, decide how they win; if they are not, decide how they lose - instead of letting random number generators determine whether they win or not.
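To make that direction of thinking concrete, here is a rough Python sketch of a bank-based controller; the 95%/90% thresholds follow the description above, while the class name, the prize multipliers and the bookkeeping details are all invented:

import random

class PayoutController:
    def __init__(self, target=0.95, resume_at=0.90):
        self.target = target          # stop paying once the payout ratio exceeds this
        self.resume_at = resume_at    # start paying again once the ratio drops below this
        self.taken_in = 0.0
        self.paid_out = 0.0
        self.paying = True

    def ratio(self):
        return self.paid_out / self.taken_in if self.taken_in else 0.0

    def spin(self, bet):
        self.taken_in += bet
        # Hysteresis: losing streak above `target`, winning streak below `resume_at`.
        if self.paying and self.ratio() >= self.target:
            self.paying = False
        elif not self.paying and self.ratio() <= self.resume_at:
            self.paying = True
        if not self.paying:
            return 0.0
        # Pay out at most what keeps the lifetime ratio under the target.
        headroom = self.target * self.taken_in - self.paid_out
        win = min(headroom, bet * random.choice([0, 0, 1, 2, 5]))
        self.paid_out += win
        return win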
I worked many years ago for an internet casino in Australia, at the time the only one in the world regulated completely by a government body. The algorithms you speak of that produce "structured randomness" are obviously extremely complex, especially when you are talking about multiple lines in all directions, double up, pick the suit, multiple progressive jackpots and the like.
Our state's poker machine laws demand a payout of 97% of what goes in. For the regulator to be satisfied that our machine did this, they made us run 10 million mock turns of the machine and then wanted to see that our game paid out what the law states within the tiniest margin of error (we had many, many machines running a script to auto-play, simulating the clicks, for about a week before we hit the 10 million).
Anyhow, the algorithms you speak of are EXPENSIVE! They range from maybe $500k to several million per machine, so as you can understand, no one is going to hand them over for free, that's for sure. If you wanted a single-line machine it would be easy enough to do: just work out your symbols/cards and what pay structure you want for each, then distribute those payouts amongst the non-payouts until you hit your target figure. Obviously, the more options there are, the longer it will take to pay out at that rate; it may even pay out more early on. Hit frequency and prize size are also factors you may want to consider.
A simple way to do it, if you assume that people win a roughly constant number of times per time period:
Create a collection of all possible tumbler combinations with how much each one pays out.
The first time someone plays, in that time period, you can offer all combinations at equal probability.
If they win, take that amount off the total left for the time period, and remove from the available options any combination that would payout more than you have left.
Repeat with the reduced combinations until all the money is gone for that time period.
Reset and start again for the next time period.
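A minimal Python sketch of this approach, assuming combos is a pre-built list of (combination, payout) pairs:

import random

def play_period(combos, budget):
    # usage: for combo, win in play_period(all_combos, budget=10_000.0): ...
    remaining = budget
    while any(0 < payout <= remaining for _, payout in combos):
        # Only offer combinations the remaining budget can still cover.
        affordable = [(c, p) for c, p in combos if p <= remaining]
        combo, payout = random.choice(affordable)
        remaining -= payout
        yield combo, payout          # one spin result at a time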
What's the rationale behind the formula used in the hive_trend_mapper.py program of this Hadoop tutorial on calculating Wikipedia trends?
There are actually two components: a monthly trend and a daily trend. I'm going to focus on the daily trend, but similar questions apply to the monthly one.
In the daily trend, pageviews is an array of number of page views per day for this topic, one element per day, and total_pageviews is the sum of this array:
# pageviews for most recent day
y2 = pageviews[-1]
# pageviews for previous day
y1 = pageviews[-2]
# Simple baseline trend algorithm
slope = y2 - y1
trend = slope * log(1.0 +int(total_pageviews))
error = 1.0/sqrt(int(total_pageviews))
return trend, error
I know what it's doing superficially: it just looks at the change over the past day (slope) and scales it by the log of 1 + total_pageviews (log(1) == 0, so this scaling factor is non-negative). It can be seen as treating the month's total pageviews as a weight, but one tempered as it grows - this way, the total pageviews stop making a difference for things that are "popular enough", but at the same time big changes on insignificant topics don't get weighed as much.
But why do this? Why do we want to discount things that were initially unpopular? Shouldn't big deltas matter more for items that have a low constant popularity, and less for items that are already popular (for which the big deltas might fall well within a fraction of a standard deviation)? As a strawman, why not simply take y2-y1 and be done with it?
And what would the error be useful for? The tutorial doesn't really use it meaningfully again. Then again, it doesn't tell us how trend is used either - this is what's plotted in the end product, correct?
Where can I read up for a (preferably introductory) background on the theory here? Is there a name for this madness? Is this a textbook formula somewhere?
Thanks in advance for any answers (or discussion!).
As the in-line comment says, this is a simple "baseline trend algorithm", which basically means that before you compare the trends of two different pages, you have to establish a baseline. In many cases the mean value is used; it's straightforward if you plot the pageviews against the time axis. This method is widely used in monitoring water quality, air pollutants, etc., to detect any significant changes with respect to the baseline.
In the OP's case, the slope of pageviews is weighted by the log of total_pageviews. This effectively uses total_pageviews as a baseline correction for the slope. As Simon put it, this strikes a balance between two pages with very different total_pageviews.
For example, page A has a slope of 500 over 1,000,000 total pageviews, while page B has a slope of 1,000 over 1,000. The log basically means 1,000,000 is only twice as important as 1,000 (rather than 1,000 times). If you only consider the slope, A is less popular than B, but with the weight, the measure of popularity of A is now the same as B's. I think this is quite intuitive: although A's increase is only 500 pageviews, that's because it's saturating, so you still have to give it enough credit.
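Plugging those numbers into the tutorial's formula (using a natural log, as Python's math.log does) shows the balancing effect:

from math import log

trend_a = 500  * log(1 + 1_000_000)   # ~6907.8
trend_b = 1000 * log(1 + 1_000)       # ~6908.8
print(trend_a, trend_b)               # roughly equal despite very different slopes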
As for the error, I believe it comes from the (relative) standard error, which has a factor of 1/sqrt(n), where n is the number of data points. In the code, the error is equal to (1/sqrt(n)) * (1/sqrt(mean)). It roughly translates to: the more data points, the more accurate the trend. I don't see it as an exact mathematical formula, just a rough trend-analysis heuristic; anyway, the relative value is what matters in this context.
In summary, I believe it's just an empirical formula. More advanced treatments can be found in some biostatistics textbooks (this is very similar to monitoring the outbreak of a flu or the like).
The code implements statistics (in this case the "baseline trend"); you should read up on that and everything becomes clearer. Wikibooks has a good introduction.
The algorithm takes into account that new pages are by definition more unpopular than existing ones (because - for example - they are linked from relatively few other places) and suggests that those new pages will grow in popularity over time.
error is the error margin the system expects for its prognosis. The higher error is, the less likely the trend will continue as expected.
The reason for moderating the measure by the volume of clicks is not to penalise popular pages but to make sure that you can compare large and small changes with a single measure. If you just use y2 - y1, you will only ever see the click changes on large-volume pages. What this is trying to express is "significant" change: a 1,000-click change when you normally attract 100 clicks is really significant; a 1,000-click change when you attract 100,000 is less so. What this formula is trying to do is make both of these visible.
Try it out at a few different scales in Excel, you'll get a good view of how it operates.
Hope that helps.
Another way to look at it is this:
Suppose your page and my page are created on the same day; your page gets about ten million total views and mine about 1 million up to some point. Then suppose the slope at that point is a million for me and 0.5 million for you. If you just use the slope, I win, but your page already had more views per day at that point: yours was getting 5 million and mine 1 million, so that extra million on mine still makes it 2 million, while yours is at 5.5 million for that day. So maybe this scaling concept tries to adjust the results to show that your page is also good as a trend setter: its slope is smaller, but it was already more popular. And the scaling is only a log factor, so it doesn't seem too problematic to me.