I am working on a project where my task is to find the % price change over a given time frame for 100+ stocks in an efficient way.
The time frames are predefined and can only be 5 mins, 10 mins, 30 mins, 1 hour, 4 hours, 12 hours, or 24 hours.
Given a time frame, I need a way to efficiently figure out the % price change of all the stocks that I am tracking.
As per the current implementation, I am getting price data for those stocks every second and dumping the data to a price table.
I have another cron job which updates the % change of the stock based on the values in the price table every few seconds.
The solution is in a working state of sorts, but it is not efficient. Is there any data structure/algorithm that I can use to find the % change efficiently?
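For illustration, here is a minimal sketch of that lookup in Python (the in-memory per-second buffer and all names are my own assumptions, not part of the existing implementation). If the last 24 hours of per-second prices are kept in a fixed-length buffer per stock, the % change for any of the predefined windows becomes a single index lookup rather than a table scan:

```python
from collections import deque

# Predefined windows, expressed in seconds (prices arrive once per second).
WINDOWS = {"5m": 300, "10m": 600, "30m": 1800, "1h": 3600,
           "4h": 14400, "12h": 43200, "24h": 86400}

class PriceHistory:
    """Hypothetical per-stock buffer of the last 24 hours of per-second prices."""

    def __init__(self):
        # Newest price is appended on the right; the oldest falls off the left.
        self.prices = deque(maxlen=WINDOWS["24h"] + 1)

    def record(self, price):
        self.prices.append(price)

    def pct_change(self, window):
        seconds = WINDOWS[window]
        if len(self.prices) <= seconds:
            return None                      # not enough history yet
        old = self.prices[-seconds - 1]      # price `seconds` ago
        cur = self.prices[-1]                # latest price
        return (cur - old) / old * 100.0
```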
I am building an agent-based model for product usage. I am trying to develop a function to decide whether the user is using the product at a given time, while incorporating randomness.
So, say we know the user spends a total of 1 hour per day using the product, and we know the average distribution of this time (e.g., most used at 6-8pm).
How can I generate a set of usage/non-usage times (i.e., for each 10-minute block, is the user active or not) while ensuring that at the end of the day the total active time sums to one hour?
In most cases I would just run the distributor without concern for the total, and then at the end normalize by scaling proportionally to the target so that the total is 1 hour. However, I can't do that here because time blocks must be 10 minutes. I think this is a different question because I'm not really computing time ranges; I'm computing booleans to associate with different 10-minute time blocks (e.g., the user was/was not active during a given block).
Is there a standard way to do this?
I did some more thinking and figured it out, in case anyone else is looking at this.
The approach to take is this: You know the allowed number n of 10-minute time blocks for a given agent.
Iterate n times, and on each iteration select a time block out of the day subject to your activity distribution function.
The main point is to iterate over the number of time blocks you want to place, not over the entire day.
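A minimal sketch of that loop in Python (the 144 blocks per day, the example weights, and the re-draw on duplicates are my own assumptions):

```python
import random

BLOCKS_PER_DAY = 144                         # 24 h / 10 min

def pick_active_blocks(n_blocks, weights):
    """Pick n_blocks distinct 10-minute blocks, weighted by the activity
    distribution; a block that was already chosen is simply re-drawn."""
    chosen = set()
    while len(chosen) < n_blocks:
        block = random.choices(range(BLOCKS_PER_DAY), weights=weights, k=1)[0]
        chosen.add(block)
    return [b in chosen for b in range(BLOCKS_PER_DAY)]   # one boolean per block

# Example: 1 hour of usage = 6 blocks, with a peak around 18:00-20:00.
weights = [1.0] * BLOCKS_PER_DAY
for b in range(108, 120):                    # blocks covering 18:00-20:00
    weights[b] = 5.0
usage = pick_active_blocks(6, weights)
```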
I got a task at work to evenly schedule timed commercial items into pre-defined commercial breaks (containers).
Each campaign has a set of commercials, with or without a spreading order. I need to allow users to choose multiple campaigns and distribute all the commercials to best fit the breaks within a time window.
Example of Campaign A:
Item | Duration | # of times to schedule | Order
A1   | 15 sec   | 2                      | 1
A2   | 25 sec   | 2                      | 2
A3   | 30 sec   | 2                      | 3
Required outcome:
Each item should appear only once in a break, no repeating.
If there is a specific order, try to best fit while keeping that order. If there is no order, shuffle the items.
At the end of the process the breaks should contain an even amount of commercial time.
Ideal spread would fully fill all desired campaigns into the breaks.
For example: Campaign {Item,Duration,#ofTimes,Order}
Campaign A which has set {A1,15,2,1},{A2,25,2,2},{A3,10,1,3}
Campaign B which has set {B1,20,2,2},{B2,35,3,1},
Campaign C which has set {C1,10,1,1},{C2,15,2,3},{C3,15,1,2},{C4,40,1,4}
A client will choose to schedule those campaigns on a specific date that holds 5 breaks of 60 seconds each.
A good outcome would result in:
Container 1: {A1,15}{B2,35}{C1,10} total of 60 sec
Container 2: {C3,15}{A2,25}{B1,20} total of 60 sec
Container 3: {A3,10}{C2,15}{B2,35} total of 60 sec
Container 4: {C4,40}{B1,20} total of 60 sec
Container 5: {C2,15}{A3,10}{B2,35} total of 60 sec
Of course it's rarely that all will fit so perfectly in real-life examples.
There are so many combinations with a large number of items, and I'm not sure how to go about it. The order of items inside a break needs to be calculated dynamically so that the end result best fits all the items into the breaks.
If this architecture is poor and someone has a better idea (like giving priority to items over order and scheduling based on priority, or something similar), I'll be glad to hear it.
It seems like simulated annealing might be a good way to approach this problem. Just incorporate your constraints (keeping order, spreading evenly, and fitting into the 60-second frame) into the scoring function. Your random-neighbor function might simply swap two items with each other or move an item to a different frame.
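A minimal sketch of that idea in Python (the item representation, the scoring weights, and the cooling schedule are all assumptions of mine, not part of the question):

```python
import math
import random

BREAK_LEN = 60      # seconds per break

# An item is (campaign, name, duration, order); a solution is a list of breaks,
# each break being a list of items.

def score(breaks):
    s = 0.0
    for brk in breaks:
        total = sum(item[2] for item in brk)
        s -= abs(BREAK_LEN - total)                  # fill each break evenly
        names = [item[1] for item in brk]
        s -= 10 * (len(names) - len(set(names)))     # penalize repeats in a break
        for camp in {item[0] for item in brk}:
            orders = [item[3] for item in brk if item[0] == camp]
            if orders != sorted(orders):             # keep each campaign's order
                s -= 5
    return s

def neighbor(breaks):
    new = [list(b) for b in breaks]
    a, b = random.sample(range(len(new)), 2)
    if new[a] and random.random() < 0.5:             # move an item to another break
        new[b].append(new[a].pop(random.randrange(len(new[a]))))
    elif new[a] and new[b]:                          # or swap two items
        i, j = random.randrange(len(new[a])), random.randrange(len(new[b]))
        new[a][i], new[b][j] = new[b][j], new[a][i]
    return new

def anneal(breaks, steps=20000, t0=10.0):
    best = cur = breaks
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        cand = neighbor(cur)
        delta = score(cand) - score(cur)
        if delta > 0 or random.random() < math.exp(delta / temp):
            cur = cand
            if score(cur) > score(best):
                best = cur
    return best
```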
I have a database with stock values in a table, for example:
id - unique id for this entry
stockId - ticker symbol of the stock
value - price of the stock
timestamp - timestamp of that price
I would like to create separate arrays for timeframes of 24 hours, 7 days and 1 month from my database entries, each array containing datapoints for a stock chart.
For some stockIds, I have just a few data points per hour, for others it could be hundreds or thousands.
My question:
What is a good algorithm to "aggregate" the possibly many datapoints into a few? For example, for the 24-hour chart I would like to have at most 10 datapoints per hour. How do I handle exceptionally high/low values?
What is the common approach with regard to stock charts?
Thank you for reading!
Some options: (assuming 10 points per hour, i.e. one roughly every 6 minutes)
For every 6-minute period, pick the data point closest to the centre of the period.
For every 6-minute period, take the average of the points over that period.
For an hour period, find the maximum and minimum for each 4-minute period, then pick the 5 largest maxima and 5 smallest minima from these respective sets (4 minutes is somewhat arbitrarily chosen).
I originally thought to pick the 5 minimum points and the 5 maximum points such that each maximum point is at least 8 minutes apart, and similarly for minimum points.
The 8 minutes here is so we don't have all the points stacked up on each other. Why 8 minutes? At an even distribution, 60/5 = 12 minutes, so just moving a bit away from that gives us 8 minutes.
But, in terms of implementation, the 4-minute approach will be much simpler and should give similar results.
You'll have to see which one gives you the most desirable results. The last one is likely to give a decent indication of variation across the period, whereas the second one is likely to have a more stable graph. The first one can be a bit more erratic.
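A minimal sketch of the first two options in Python (the tuple layout of the points and the function name are my own assumptions):

```python
from statistics import mean

BUCKET = 6 * 60     # 6-minute buckets, i.e. 10 points per hour

def downsample(points, method="centre"):
    """`points` is a list of (timestamp_in_seconds, value) pairs.
    Returns one point per 6-minute bucket: either the point closest to the
    bucket centre or the bucket average."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts // BUCKET, []).append((ts, value))
    out = []
    for key in sorted(buckets):
        pts = buckets[key]
        centre = key * BUCKET + BUCKET / 2
        if method == "centre":
            out.append(min(pts, key=lambda p: abs(p[0] - centre)))
        else:                                      # "average"
            out.append((centre, mean(v for _, v in pts)))
    return out
```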
I have a dataset containing > 100,000 records where each record has a timestamp.
This dataset has been aggregated from several "controller" nodes which each collect their data from a set of children nodes. Each controller collects these records periodically (e.g. once every 5 minutes or once every 10 minutes), and it is the controller that applies the timestamp to the records.
E.g:
Controller One might have 20 records timestamped at time t, 23 records timestamped at time t + 5 minutes, 33 records at time t + 10 minutes.
Controller Two might have 30 records timestamped at time (t + 2 minutes) + 10 minutes, 32 records timestamped at time (t + 2 minutes) + 20 minutes, 41 records timestamped at time (t + 2 minutes) + 30 minutes etcetera.
Assume now that the only information you have is the set of all timestamps and a count of how many records appeared at each timestamp. That is to say, you don't know i) which sets of records were produced by which controller, ii) the collection interval of each controller, or iii) the total number of controllers. Is there an algorithm which can decompose the set of all timestamps into individual subsets such that the variance in the differences between consecutive (ordered) elements of each given subset is very close to 0, while adding any element from one subset i to another subset j would increase this variance? Keep in mind that, for this dataset, a single controller's "periodicity" could fluctuate by +/- a few seconds because of CPU timing, network latency, etc.
My ultimate objective here is to establish a) how many controllers there are and b) the sampling interval of each controller. So far I've been thinking about the problem in terms of periodic functions, so perhaps there are some decomposition methods from that area that could be useful.
The other point to make is that I don't need to know which controller each record came from, I just need to know the sampling interval of each controller. So e.g. if there were two controllers that both started sampling at time u, and one sampled at 5-minute intervals and the other at 50-minute intervals, it would be hard to separate the two at the 50-minute mark because 5 is a factor of 50. This doesn't matter, so long as I can garner enough information to work out the intervals of each controller despite these occasional overlaps.
One basic approach would be to perform an FFT decomposition (or, if you're feeling fancy, a periodogram) of the dataset and look for peaks in the resulting spectrum. This will give you a crude approximation of the periods of the controllers, and may even give you an estimate of their number (and by looking at the height of the peaks, it can tell you how many records were logged).
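A minimal sketch of that approach with NumPy (the 1-second binning and the choice of the top few spectral peaks are my own assumptions):

```python
import numpy as np

def dominant_periods(timestamps, counts, top=5):
    """Bin the record counts onto a regular 1-second grid, take an FFT of the
    resulting series, and return the periods of the strongest peaks (in seconds)."""
    t0, t1 = min(timestamps), max(timestamps)
    series = np.zeros(int(t1 - t0) + 1)
    for ts, c in zip(timestamps, counts):
        series[int(ts - t0)] += c
    series -= series.mean()                          # drop the DC component
    spectrum = np.abs(np.fft.rfft(series))
    freqs = np.fft.rfftfreq(len(series), d=1.0)      # cycles per second
    peaks = np.argsort(spectrum)[::-1][:top]
    peaks = peaks[freqs[peaks] > 0]                  # ignore the zero frequency
    return [1.0 / freqs[i] for i in peaks]
```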
Let's imagine that we have a high-traffic project (a tube site) which should provide sorting using these options (NOT in real time). The number of videos is about 200K and all information about the videos is stored in MySQL. The number of daily video views is about 1.5 million. As instruments we have the hard disk drive (text files), MySQL, and Redis.
Views
top viewed
top viewed last 24 hours
top viewed last 7 days
top viewed last 30 days
top rated last 365 days
How should I store such information?
The first idea is to log all visits to text files (a single file per hour, for example visits_20080101_00.log). At the beginning of each hour, calculate views per video for the previous hour and insert this information into MySQL. Then recalculate totals (for the last 24 hours) and update the statistics tables. At the beginning of every day we have to do the same, but recalculate for the last 7 days, last 30 days, and last 365 days. This method seems very poor to me because we have to store information about the last 365 days for each video to make correct calculations.
Are there any other good methods? Or should we perhaps choose different instruments for this?
Thank you.
If absolute precision is not important, you could summarize the information that is older than 2 units.
You would store the individual views for the last 1-2 hours, the hourly views (one value per hour) for the last 1-2 days, and the daily views (one value per day) further back.
"1-2" means that you store data until you have two full units, then summarize the earlier of them.