I'll explain the matter.
Introduction
I have a database containing some transactions between people.
These transactions contain lots of data, but for the sake of this algorithm only three things are needed: the value of the transaction, the user giving money, and the user receiving money. (I'm a student trying to build a real-world project on my own for a group of friends.)
Steps
This is not real-time processing; it only needs to be done once or twice over a long period of time.
The input to this algorithm will be the simplified transaction list, and it will produce a map of what each user gave and received (e.g. from [Mark, Sophie, 20], [John, Mark, 30] to [Mark: -10, Sophie: -20, John: +30]).
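For the first step, something like this Python sketch is what I mean (the function name and sign convention are just mine, chosen to match the example above):
from collections import defaultdict
def net_balances(transactions):
    # transactions: iterable of (giver, receiver, amount)
    # positive balance = should get money back, negative = still has to pay
    balance = defaultdict(float)
    for giver, receiver, amount in transactions:
        balance[giver] += amount      # the giver has paid this much out
        balance[receiver] -= amount   # the receiver has to pay it back
    return dict(balance)
# net_balances([("Mark", "Sophie", 20), ("John", "Mark", 30)]) -> Mark: -10, Sophie: -20, John: +30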
After that I'd like to find the most efficient way to reset all the members of the dataset to 0 (in that case it would be Mark gives 10 to John and Sophie gives 20 to John, but for a larger number of transactions there will be optimal ways and sub-optimal ways).
What I thought
At first thought, a greedy way is to take the max and the min and "equalize" them; as a result, one (or, at best, both) should become 0, and then go on like that until there are no values left (the 0 values will be removed from the dataset so we don't loop over them, or there will be a check that a value is not 0; probably the former).
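Here is a rough Python sketch of that greedy idea, building on the net_balances map above (all the names are placeholders of mine; amounts are assumed to be exact, e.g. integers or cents):
def settle(balance):
    # repeatedly match the largest creditor with the largest debtor until everything is zero
    creditors = {user: amount for user, amount in balance.items() if amount > 0}
    debtors = {user: -amount for user, amount in balance.items() if amount < 0}
    payments = []  # list of (payer, payee, amount)
    while creditors and debtors:
        payee = max(creditors, key=creditors.get)
        payer = max(debtors, key=debtors.get)
        amount = min(creditors[payee], debtors[payer])
        payments.append((payer, payee, amount))
        creditors[payee] -= amount
        debtors[payer] -= amount
        if creditors[payee] == 0:
            del creditors[payee]
        if debtors[payer] == 0:
            del debtors[payer]
    return payments
# on the example above: [("Sophie", "John", 20), ("Mark", "John", 10)]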
I hope I explained it well; if not, feel free to ask for more details. Is there a better way to minimize the number of transactions needed to equalize the dataset?
Thanks all for your attention.
During my model setup on innovation diffusion, another little programming issue occurred to me in NetLogo. I would like to model that people are more likely to learn from people they are similar to. Therefore the model considers an ability value that is allocated to each agent:
[set ability random 20 ]
In the go procedure I then want them to compare their own ability value with the values from their linked neighbors.
So for example: ability of turtle1 = 5, ability of neighbor1 = 10, ability of neighbor2 = 4. Hence the (absolute) differences are [5, 1], and the turtle will learn more from neighbor2 than from neighbor1.
But I don't know how to approach the problem of asking each individual neighbor for the difference. As a first idea, I thought of doing it via a list variable like [difference1, ..., difference(n)].
So far I only have an aggregated approach using average values, but this is not really consistent with recent social learning theory and might gloss over situations in which the agent has many different neighbors but one who is quite similar to him:
ask turtles
[
  set ability random 20
  if any? link-neighbors
  [
    set ability-of-neighbor mean [ability] of link-neighbors   ;; same as sum / count, and the guard avoids dividing by zero for isolated turtles
    set neighbor-coefficient abs (ability - ability-of-neighbor)
    ;; the smaller the coefficient, the more similar the neighbors are
    ;; and the more the turtle learns from its neighbor(s)
  ]
]
Thank you again for your help and advice, and I really appreciate any comments.
Kind regards,
Moritz
I am having a bit of trouble getting my head around what you want, but here is a method of ranking link-neighbors.
let link-neighbor-rank sort-on [abs (ability - [ability] of myself)] link-neighbors
It produces a list of link-neighbors in ascending order of difference in ability.
If you only want the closest neighbor, use
let best min-one-of link-neighbors [abs (ability - [ability] of myself)]
I hope this helps.
I was just going through amazon.com, and an interesting thing that caught my eye is how they calculate best sellers in books.
I was thinking of writing a sample program to calculate this. Suppose I am calculating best sellers for the month: I would just sum the sales counts of the individual books and show the top 10. Is that OK, or am I missing something?
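To be concrete, this is roughly what I had in mind as a small Python sketch (monthly_sales is a hypothetical list of book IDs, one entry per copy sold this month):
from collections import Counter
def top_sellers(monthly_sales, n=10):
    return Counter(monthly_sales).most_common(n)  # [(book_id, copies_sold), ...] for the top n books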
EDIT
One more interesting thing can happen: suppose the book with id1 sold 10 copies on the first day but has not been sold since, while the book with id2 sells 1 or 2 copies regularly. How would that affect the best-seller calculation? Thanks.
Sounds about right. It depends on how exactly you want to define it.
The simplest definition of "best sellers" is just the number of units sold.
Another way to do it, if you don't want to fix it to one month, is to have some decay function (like square decay, 1/t^2) and add the counts weighted by that function.
This way, even though you don't have a fixed time window, you look at both newcomers and old books. Your function could look something like this:
from datetime import datetime
scores = {}
for a_book in books:
    scores[a_book] = 0
    for a_sale in sales[a_book]:
        days = (datetime.now() - a_sale.time()).days + 1  # +1 so today's sales don't divide by zero
        scores[a_book] += 1 / days ** 2  # quadratic decay
I think you get the idea. You can try different functions, like exp(-days) or different powers. Experiment and see what makes sense for you.
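For instance, to swap in an exponential decay, the scoring line above could become something like this (half_life is a made-up knob for how quickly old sales fade):
import math
half_life = 7  # days until a sale counts for half as much (arbitrary choice)
scores[a_book] += math.exp(-math.log(2) * days / half_life)  # replaces the 1 / days ** 2 term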
I would like to create a game where the players create different products with different prices (call them offers), and I give them a certain number of customers (call it demand).
Now, I want an algorithm to decide the market share of each player. Of course, I could just make up my own right now, using random numbers. But before doing that, I prefer to ask, because I'm sure a lot of people have already tried to do this before me!
My question is not really precise; that's because your answer doesn't need to be precise either ;)
Thank you in advance!
It really depends on the variables you have set up and the kind of "market" you want to create. You could start with the simple formula below (which fundamentally reduces market share to a question of profits), and I'll go through what I mean by "kind of market" after.
marketShare = totalCompanyProfit/allMoneyInProductCategory;
marketShare = ( (productSalePrice * demand)-(productManufactureCost * supply) ) / allMoneyInProductCategory;
It gets interesting here because the "kind of market" is determined by your definition of demand. For example, say the product was Ferraris, and the market you were trying to simulate was the Republic of Congo, with a GDP of $189 per capita.
targetMarketSize = (percentOfFittingDemographic * totalPopulation)
percentWhoHateYourProduct = AVERAGE( ( ABS(productVariable1 - variableIdeal1) / variableIdeal1 ), ( ABS(productVariable2 - variableIdeal2) / variableIdeal2 ), etc )
demand = (targetMarketSize) * ( 1- percentWhoHateYourProduct )
percentOfFittingDemographic is the percent of the population which fits into the demographic that would buy such a product (i.e. people with enough disposable income to afford a $100,000 car), which in the Congo example above could be something like 0.001.
The average of the absolute differences of certain product attributes (productVariable) from their ideals (variableIdeal), each divided by its ideal, gives the % of the population which is going to be turned off by the product not being what they want. Subtracting that from 1 gives the percent of people who DO want to buy your product, and multiplying that by targetMarketSize gives you the number of people who want to buy your product, i.e. demand. If the product is perfect, it becomes an average of 0s, and the whole target market becomes a user of the product.
One could also add weighting to the average to say, for example, that the market prefers a lower price over a bigger screen size. To express that more of one attribute always increases the desire for the product in a population (e.g. instead of "one month of free service" you give away "6 months of free service", and people want it more), you could add it into the average with
percentLikesProductNow = 1 - e^(-1 * infinitelyLikedAttribute)
This goes from 0% at infinitelyLikedAttribute = 0 to within about 0.005% of 100% at infinitelyLikedAttribute = 10, so you could play around and find a way to "scale" that attribute to lie roughly between 1 and 10. This does sort of make sense in real life, because there are products I would never have bought if they didn't have a free trial. For example: 3 free months of Verizon internet. I would probably have gone with Comcast otherwise, as I was only living there for 6 months, but saving 100 bucks was pretty big at the time. At the other extreme, however, if Verizon were to offer me 100 free years of internet, another 50 extra years on top of that (assuming it's not transferable, etc.) really doesn't add much more to the attractiveness of the offer.
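If it helps, here is a rough Python sketch gluing these pieces together; every name and number in it is invented for illustration, and I cap the "hate" average at 100% (the formulas above don't) just so demand can't go negative:
def percent_who_hate(product, ideals):
    # average relative distance of each attribute from its ideal, each capped at 1
    deviations = [min(abs(product[k] - ideals[k]) / ideals[k], 1.0) for k in ideals]
    return sum(deviations) / len(deviations)
def demand(total_population, percent_fitting_demographic, product, ideals):
    target_market_size = percent_fitting_demographic * total_population
    return target_market_size * (1 - percent_who_hate(product, ideals))
def market_share(company_profit, all_money_in_category):
    return company_profit / all_money_in_category
# toy numbers: a $100,000 car offered to a population of 1,000,000
ideals = {"price": 80000, "horsepower": 500}
product = {"price": 100000, "horsepower": 450}
print(demand(1000000, 0.001, product, ideals))  # -> 825.0 interested customers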
You can always multiply all of these things by a random factor as well, to maybe give it a +/- 15% variance, and keep everyone guessing :)
I hope this was even remotely useful :)
We have an auto-complete list that's populated when you send an email to someone, which is all well and good until the list gets really big and you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete.
I was thinking that some logic should be added so that the auto-complete results are sorted by some function of most recently contacted or most often contacted, rather than just alphabetical order.
What I want to know is whether there are any known good algorithms for this kind of search, or whether anyone has any suggestions.
I was thinking of just a point system, with something like: same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points, and last 6 months is 1 point. Then, for most often contacted: 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, and 2+ is 1. No real logic other than that those numbers "feel" about right.
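Something like this is what I have in mind, as a rough Python sketch (last_contacted and times_contacted are hypothetical fields on a contact record, and the buckets are just my numbers above):
from datetime import datetime, timedelta
def recency_points(last_contacted, now=None):
    age = (now or datetime.now()) - last_contacted
    if age <= timedelta(days=1):   return 5
    if age <= timedelta(days=3):   return 4
    if age <= timedelta(weeks=1):  return 3
    if age <= timedelta(days=30):  return 2
    if age <= timedelta(days=182): return 1
    return 0
def frequency_points(times_contacted):
    for threshold, points in [(25, 5), (15, 4), (10, 3), (5, 2), (2, 1)]:
        if times_contacted >= threshold:
            return points
    return 0
def score(contact):
    return recency_points(contact.last_contacted) + frequency_points(contact.times_contacted)
# matching addresses would then be shown as sorted(matches, key=score, reverse=True)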
Other than just arbitrarily picked numbers, does anyone have any input? Other numbers are also welcome if you can give a reason why you think they're better than mine.
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between, say, someone you talked to 80 times versus 30 times.
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list, such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.
It looks like the move to front implementation would best suit your needs.
EDIT: When an address is selected, add one to its frequency, and move it to the front of the group of nodes with the same weight (or (weight div x) for coarser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self-organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
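A rough Python sketch of that tweak, using a plain list instead of a linked list (the class and method names are made up):
class SelfOrganizingList:
    # addresses kept in descending frequency; ties are broken by most recent use first
    def __init__(self):
        self.entries = []  # [count, address] pairs, most relevant first
    def access(self, address):
        for i, entry in enumerate(self.entries):
            if entry[1] == address:
                self.entries.pop(i)
                break
        else:
            entry = [0, address]  # first time we see this address
        entry[0] += 1  # add one to its frequency
        # move it to the front of the group of entries with the same (new) count
        position = 0
        while position < len(self.entries) and self.entries[position][0] > entry[0]:
            position += 1
        self.entries.insert(position, entry)
    def suggestions(self, prefix):
        return [address for count, address in self.entries if address.startswith(prefix)]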
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. That means you have to have the entire email history available to you when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- this is what gives us good performance on average.
This kind of thing seems similar to what Firefox does when suggesting the site you are typing.
Unfortunately I don't know exactly how Firefox does it. The point system seems good as well; maybe you'll just need to balance your points :)
I'd go for something similar to:
NoM = Number of Mails
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you did not write to during the last month (this window could be changed) will have 0 points. You could then sort those by total NoM sent (since they are on the contact list :). They will be shown after the contacts with points > 0.
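A quick Python sketch of that score (sent_times is my made-up name for the list of timestamps of mails sent to contact X):
from datetime import datetime, timedelta
def nom(sent_times, days, now):
    # number of mails sent within the last `days` days
    return sum(1 for t in sent_times if now - t <= timedelta(days=days))
def points(sent_times, now=None):
    now = now or datetime.now()
    return nom(sent_times, 1, now) + 1/2 * nom(sent_times, 7, now) / 7 + 1/3 * nom(sent_times, 30, now) / 30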
It's just an idea; anyway, the point is to give different importance to the most-mailed and the most recently mailed contacts.
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Etc
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" works very well on its own. "Most recent" can be a real pain if you accidentally mistype something once. "Most used", on the other hand, doesn't evolve much over time if, for example, you had a lot of contact with somebody last year but your job has now changed.
Once you have the set of measurements you want to use, you could create an interactive application to test out different weights and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache; when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
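I don't have the paper's exact formula at hand, but the flavor of it can be sketched like this: score each address by a decayed sum of its past uses, so lambda = 0 reduces to a plain use count (pure frequency), while a strong decay is dominated by the most recent use (recency):
def combined_score(use_ticks, now_tick, lam):
    # decayed sum of past uses; use_ticks could be, e.g., values of a counter of emails sent so far
    return sum(2 ** (-lam * (now_tick - t)) for t in use_ticks)
# addresses would then be ordered by combined_score(history[address], current_tick, lam), highest first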
Even though an answer has already been chosen, I want to submit my approach for consideration and feedback.
I would account for frequency by incrementing a counter on each use, but by some larger-than-one value, like 10 (to retain some precision through the second step below).
I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo@bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency into one field, avoid the need to keep a detailed history to derive {last day, last week, last month}, and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.