Replace every cell in a matrix with the average of adjacent cells - algorithm
Requirement: Must be done in-place.
For example:
Given matrix
1, 2, 3
4, 5, 6
7, 8, 9
Each cell should be replaced by the average of itself and its neighbors within its 3*3 window:
(1+2+4+5)/4, (2+1+3+4+5+6)/6, (3+2+6+5)/4
(1+2+5+4+7+8)/6, (1+2+3+4+5+6+7+8+9)/9, (2+3+5+6+8+9)/6
(4+5+7+8)/4, (4+5+6+7+8+9)/6, (5+6+8+9)/4
which is (all floating-point values truncated to int):
3,      3.5(3), 4          3, 3, 4
4.5(4), 5,      5.5(5)  => 4, 5, 5
6,      6.5(6), 7          6, 6, 7
I tried to just iterate over the matrix and update each cell, but I found this affects the later calculations:
Say I update the original 1 to 3; then when I try to update the original 2, the original 1 has already become 3.
Copying the original matrix and using it to calculate the averages is a workaround, but it seems wasteful. Could we achieve this without using that much extra space?
In most cases, you should just create a copy of the original matrix and use that for calculating the averages. Unless creating a copy of the matrix would use more memory than you have available, the overhead should be negligible.
If you have a really large matrix, you could use a "rolling" backup (for lack of a better term). Say you update the cells row by row and you are currently in row n. You don't need a backup of row n-2, as those cells are no longer relevant, nor of row n+1, because those cells still hold their original values. So you can just keep a backup of the previous and the current row. Whenever you advance to the next row, discard the backup of the previous row, make the backup of the current row the previous one, and create a backup of the new current row.
Some pseudo-code (not taking any edge cases into account):
previous = []  # or whatever works for the first row
for i in range(len(matrix)):
    current = copy(matrix[i])
    for k in range(len(matrix[i])):
        matrix[i][k] = (previous[k-1] + ... + current[k] + ... + matrix[i+1][k+1]) / 9
    previous = current
(You might also keep a backup of the next row, just so you can use only the backup rows for all the values instead of having to differentiate.)
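For reference, here is a runnable Python version of this rolling backup, with the edge cases handled by clamping the neighborhood to the matrix bounds (the function name is mine, and I use floor division to match the truncated int results in the question):

import copy

def average_in_place(matrix):
    rows, cols = len(matrix), len(matrix[0])
    previous = []  # original values of the row above (empty for the first row)
    for i in range(rows):
        current = matrix[i][:]  # backup of the row we are about to overwrite
        for k in range(cols):
            total = count = 0
            for ni in range(max(0, i - 1), min(rows, i + 2)):
                for nk in range(max(0, k - 1), min(cols, k + 2)):
                    if ni < i:
                        total += previous[nk]    # row above: already overwritten, use backup
                    elif ni == i:
                        total += current[nk]     # current row: use backup
                    else:
                        total += matrix[ni][nk]  # row below: still holds original values
                    count += 1
            matrix[i][k] = total // count
        previous = current

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
average_in_place(m)
print(m)  # [[3, 3, 4], [4, 5, 5], [6, 6, 7]]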
You must have some kind of cache for the result data so you can keep a reference to the original data. I don't think there is a way around it.
If the data set is large, you could optimize by using a smaller data buffer (like looking through a keyhole) and 'scrolling' the input matrix as you update it. In your case, you could use a buffer as small as 3x3.
It is a compromise between speed and space though. The smaller your buffer, the worse the performance will be.
To visualize the problem, starting from the top-left (0,0) of the dataset:
(result values are rounded down for simplicity)
First step: update first 4 cells (prime the buffer)
// Data Set // Data Viewport // Result Set
01,02,03,04,05 01,02,03 04,04,??
06,07,08,09,10 06,07,08 06,07,??
11,12,13,14,15 11,12,13 ??,??,??
16,17,18,19,20
21,22,23,24,25
Then, for each iteration:
(new values indicated with [xx])
++ update first column in Data Set from Result Set
// Data Set // Data Viewport // Result Set
[04],02,03,04,05 01,02,03 04,04,??
[06],07,08,09,10 06,07,08 06,07,??
11 ,12,13,14,15 11,12,13 ??,??,??
16 ,17,18,19,20
21 ,22,23,24,25
++ shift Data Viewport and Result Set right 1 column
// Data Set // Data Viewport // Result Set
[04],02,03,04,05 02,03,04 04,[03],??
[06],07,08,09,10 07,08,09 07,[08],??
11 ,12,13,14,15 12,13,14 ??, ?? ,??
16 ,17,18,19,20
21 ,22,23,24,25
++ update middle column of Result Set
// Data Set // Data Viewport // Result Set
[04],02,03,04,05 02,03,04 04,[05],??
[06],07,08,09,10 07,08,09 07,[08],??
11 ,12,13,14,15 12,13,14 ??, ?? ,??
16 ,17,18,19,20
21 ,22,23,24,25
At the following iteration, the data state would be:
// Data Set // Data Viewport // Result Set
04,[04],03,04,05 03,04,05 05,[06],??
06,[07],08,09,10 08,09,10 08,[09],??
11, 12 ,13,14,15 13,14,15 ??, ?? ,??
16, 17 ,18,19,20
21, 22 ,23,24,25
.. etc
Don't forget to handle the other edge cases.
*The Data Viewport representation is just for visualization. In code, the actual viewport would be the result buffer.
Related
Minimum number of additional weights required in order to weigh items in range from 1 to 100
Given a set of weights S = {w1, w2, w3} and a range of weights, we need to determine whether the weights in S can be used to balance every weight in the range. If not, we need to add the minimum number of additional weights to S so that all of the weights in the range can be balanced.
For example:
Range is 1 to 5
S = {4, 8, 9}
The item with weight 1 can be balanced by putting the item on the left pan along with the 8, and putting the 9 on the right pan:
1 + 8 = 9
3 + 9 = 8 + 4
4 = 4
5 + 8 = 9 + 4
But 2 can't be balanced using the weights {4, 8, 9}, so we need to add another weight. Adding a weight of 1 allows 2 to be balanced with
2 + 8 = 1 + 9
My question is: is there a mathematical algorithm that can be used to solve this problem?
There certainly are algorithms that would solve this. For clarity's sake, I'm assuming your use of the term "set" is the mathematical set, where all set elements are distinct, though this should not affect the code below all that much. Breaking the problem down into 2 parts:
(1) Determine if the provided set of weights can be arranged on the scale such that the required range of integer values is covered.
A solution to part (1), in Python. To run, call check_range(int, int, []), where the first two args are the integer bounds of the range, low/high respectively, and the 3rd arg is a list of the weights in set s:
def get_possible_offsets(s=[]):
    # the set "temp" will hold the possible offsets that we can create
    # by arranging the weights on the scale
    temp = set()
    # optionally, we don't need to add any of the weights, ergo add value 0 by default
    temp.add(0)
    # per every weight in the given set of weights
    for weight in s:
        # take an iterable snapshot of our set of possible offsets
        l = list(temp)
        # for each value in that list, add |i +/- w| for the weight value
        for i in l:
            temp.add(i + weight)
            temp.add(abs(i - weight))
        # and also add the weight by itself
        temp.add(weight)
    return temp

def check_range(r_low=0, r_high=1, s=[]):
    # get the set of offset values reachable using the provided set of weights
    possible_offsets = get_possible_offsets(s)
    # list to store the required weight values not available using the provided weights
    missing_offsets = []
    # for each required weight in the range, check if it exists in our possible offsets
    for i in range(r_low, r_high + 1):
        if i not in possible_offsets:
            missing_offsets.append(i)
    # if we're not missing any values from the required range, then we are done!
    if len(missing_offsets) == 0:
        print("Yes! The required range is covered by the provided weights.")
    else:
        print("Tragically, the following weight offsets are not covered:", missing_offsets)
(2) If (1) is false, then determine the minimum required additional weights to complete the required range.
For part (2) of the problem, I have not added a complete solution yet; however, we just need to take the missing_offsets list from the above code and boil it down to the additional weight values that could be included in the set of possible_offsets, as performed in these lines:
for i in l:
    temp.add(i + weight)
    temp.add(abs(i - weight))
This problem also sounds a lot like search-tree algos (though not binary), as well as combinatorics, so there are likely several efficient ways of calculating the desired output.
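For instance, with the example from the question (the comments show what the functions above print):
check_range(1, 5, [4, 8, 9])
# Tragically, the following weight offsets are not covered: [2]
check_range(1, 5, [1, 4, 8, 9])
# Yes! The required range is covered by the provided weights.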
The set of absolute differences between each side of the scale is our range. Let's enumerate them for S, aggregating each element in turn (add and subtract each element to each previously seen absolute difference, then add the element itself as a difference):
S: {4, 8, 9}
up to element S_0: 4
up to element S_1: 4, 12, 8
up to element S_2: 4, 12, 8, 13, 5, 21, 3, 1, 17, 9
Now let's order them:
1, 3, 4, 5, 8, 9, 12, 13, 17, 21
To cover our range, 1 to 5, we need to fill the gap between 1 and 3. Adding a 1 will add ±1 to every difference we can create. Would it not be the case that to cover any range, we would need to add ceil(k / 2) 1's, where k is the maximum gap in our range, when considering our enumerated differences? In this case, ceil(1 / 2) = one 1?
As ruakh commented below, this is not the case. Any lower range we can build can in fact be used to fill in gaps anywhere, and the coverage of the filled-in range can be applied again to growing ranges. For example:
{1, 2} covers 1 to 3
Now add 7 and we've increased our range to 1 - 10, by virtue of applying ±3 to 7
Now we can add 21 and achieve the range 21 ± 10!
This points to the possibility of overlapping subproblems.
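You can check that range-growing example with the get_possible_offsets function from the previous answer:
offsets = get_possible_offsets([1, 2, 7, 21])
print(all(n in offsets for n in range(1, 32)))  # True: every weight from 1 to 31 is balanceable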
Algorithm - How to place n texts on p bands so that the global accessing time is minimum. Each text has its own length
The problem sounds like this: we are given n texts and they are to be placed on p tapes/bands (I don't really know what the equivalent is in English, but I think you understand what I'm talking about). In order to read the text situated at position k on one of the bands, we have to read the texts at positions 1, 2, ..., k on that band. Each text has its own length. Now, we have to figure out a way of placing the texts on the p bands so that we get a global accessing time that is minimum. The global accessing time is calculated by adding up the total accessing times of all the bands.
The formula for calculating the total accessing time of a band holding n texts is:
sum from i=1 to n of [L(T1) + L(T2) + ... + L(Ti)]
where L(Ti) is the length of Ti, the text situated at position i on the respective band.
Here is an equivalent in "pseudocode" in case it helps:
// n - number of texts; Band[n] - array of texts
sum = 0; sum2 = 0;
for (int i = 0; i < n; i++) {
    sum = 0;
    for (int j = 0; j <= i; j++)
        sum = sum + Band[j].length;
    sum2 = sum2 + sum;
}
return sum2;
Here's an example to clarify the problem:
say p is 3, so we get 3 bands
say n is 9, so we get 9 texts, with lengths 2, 3, 4, 5, 6, 7, 8, 9, 10
and they are placed on the bands in the following way:
band-1: 2, 5, 8 -> total accessing time of band-1: 24
band-2: 3, 6, 9 -> total accessing time of band-2: 30
band-3: 4, 7, 10 -> total accessing time of band-3: 36
the global accessing time: 24 + 30 + 36 = 90
I'll refer to a text's position as the number of texts that appear after it on its tape; it also represents how many additional times the text will be read. Since you are solely interested in the sum of access times, what matters is not how the texts are grouped into tapes but what the position of each text is: switching 2 texts at the same position but on different tapes, for example, won't change the global access time. Switching 2 texts of different lengths at different positions will change the time, though; generally, longer texts should be placed at lower positions (closer to the end).
The algorithm can be greedy: go over the texts from the longest to the shortest and place each text in the last available spot on one of the tapes with the fewest texts in it. So if, for example, there are 10 texts and 5 tapes, then the 5 longest texts will be at the end of each tape and the 5 shortest texts will be at the beginning.
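Here is a minimal Python sketch of that greedy strategy (the names are mine; placing the longest texts in the last spots is equivalent to sorting the lengths in ascending order and dealing them round-robin onto the tapes):
def place_texts(lengths, p):
    # Shortest texts go to the front of the tapes, where they are re-read most often.
    tapes = [[] for _ in range(p)]
    for i, length in enumerate(sorted(lengths)):
        tapes[i % p].append(length)
    return tapes

def global_access_time(tapes):
    total = 0
    for tape in tapes:
        prefix = 0
        for length in tape:
            prefix += length  # time to reach and read this text
            total += prefix
    return total

tapes = place_texts([2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
print(tapes)                      # [[2, 5, 8], [3, 6, 9], [4, 7, 10]]
print(global_access_time(tapes))  # 24 + 30 + 36 = 90
This reproduces the placement from the question's example.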
Filling the holes in time series data
So I am trying to build one-factor models with stocks and indices in R. I have 30 stocks and 16 indices in total. They are all time series from 2013-01-01 to 2014-12-31; well, at least all my stocks are. All of my indices are missing some entries here and there. For example, all of my stocks' data have a length of 522, but one index has a length of 250, another 300, another 400, etc. They all start at 2013-01-01 and end at 2014-12-31, though. Because my index data has holes in it, I can't check correlations or build linear models with them; I can't do anything, basically. So I need to fill these holes. I am thinking about filling those holes with the mean, but I don't know how to do it. I am open to other ideas, of course. Can you help me? It is an important term project for me, so there is a lot on the line...
Edited based upon your comments (and to fix a mistake I made):
This is basic data management, and I'm surprised that you're being required to work with time series data without knowing how to merge() and how to create dataframes.
Create some fake date and value data with holes in the dates:
dFA <- data.frame(seq.Date(as.Date("2014-01-01"), as.Date("2014-02-28"), 3))
names(dFA) <- "date"
dFA$vals <- rnorm(nrow(dFA), 25, 5)
Create a dataframe of dates from the min value in dFA to the max value in dFA:
dFB <- as.data.frame(seq.Date(as.Date(min(dFA$date, na.rm = T), format = "%Y-%m-%d"),
                              as.Date(max(dFA$date, na.rm = T), format = "%Y-%m-%d"), 1))
names(dFB) <- "date"
Merge the two dataframes together:
tmp <- merge(dFB, dFA, by = "date", all = T)
Change NA values in tmp$vals to whatever you want:
tmp$vals[is.na(tmp$vals)] <- mean(dFA$vals)
head(tmp)
        date     vals
1 2014-01-01 18.48131
2 2014-01-02 24.16256
3 2014-01-03 24.16256
4 2014-01-04 28.78855
5 2014-01-05 24.16256
6 2014-01-06 24.16256
Original comment below:
The easiest way to fill in the holes is with merge(). Create a new data frame with one vector as a sequence of dates that spans the range of your original dataframe, and the other vector with whatever you're going to fill the holes (zeroes, means, whatever). Then just merge() the two together:
merge(dFB, dFA, by = [the column with the date values], all = TRUE)
Microsoft Excel cumulative calculation performance
I know 2 ways to calculate cumulative values in Excel.
1st method:
A       B
Value   Cumulative total
9       =A1
8       =B1+A2
7       =B2+A3
6       =B3+A4
2nd method:
A       B
Value   Cumulative total
9       =SUM($A$1:A1)
8       =SUM($A$1:A2)
7       =SUM($A$1:A3)
6       =SUM($A$1:A4)
2 questions:
Which method has better performance when the data set gets really big (say 100k rows)? The 1st method seems to have less overhead, because when adding a new value in column A (Value), the new cell in column B only needs to do "B(n-1)+A(n)". In the 2nd method, is Excel smart enough to do something similar, or will it add up to 100k rows from A1:A(n)?
What's the best way to calculate the cumulative values? I found the 2nd method is more popular, though I doubt its performance. The only upside to the 2nd method I can see now is that the formulas in the column B cells are more consistent; in the 1st method, the 1st cell in column B has to be determined in advance.
Test setup: the number sequence 9, 8, 7, 6, -9, -8, -7, -6; workbook set to manual calculation, triggered by the following code:
Sub ManualCalc()
    Dim R As Range
    Set R = Selection
    [F1] = Now()
    R.Worksheet.Calculate
    [F2] = Now()
    [F3] = ([F2] - [F1]) * 86400
End Sub
At 4,096 rows the calculation time is not measurable for both variants (0 seconds). At 65,536 rows your 1st method is still not measurable, while your 2nd method takes a bit less than 8 seconds on my laptop (Dell Latitude E6420, Win7, Office 2010 - average of 3 measurements each). For a high number of rows I would therefore prefer method 1.
Regarding your Q1... yes, it would add 100k sums over ever-growing ranges. Excel is not supposed to be smart; it's supposed to calculate whatever you ask it to calculate. If it did, it would be interpreting the intention of a set of formulas at runtime, which I'd regard as very dangerous!
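The scaling difference is easy to illustrate outside Excel as well. Here is a small Python sketch of the two behaviors (my own illustration of the recalculation cost, not of Excel internals):
def running_total(values):
    # 1st method: each cell adds one new value to the previous total -> O(n) work overall
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals

def sum_from_first_row(values):
    # 2nd method: each cell re-sums the ever-growing range A1:A(n) -> O(n^2) work overall
    return [sum(values[:i + 1]) for i in range(len(values))]

data = [9, 8, 7, 6, -9, -8, -7, -6]
print(running_total(data))       # [9, 17, 24, 30, 21, 13, 6, 0]
print(sum_from_first_row(data))  # same totals, quadratic cost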
Random number generator that fills an interval
How would you implement a random number generator that, given an interval, (randomly) generates all numbers in that interval, without any repetition? It should consume as little time and memory as possible.
Example in a just-invented C#-Ruby-ish pseudocode:
interval = new Interval(0,9)
rg = new RandomGenerator(interval);
count = interval.Count // equals 10
count.times.do{ print rg.GetNext() + " " }
This should output something like:
1 4 3 2 7 5 0 9 8 6
Fill an array with the interval, and then shuffle it. The standard way to shuffle an array of N elements is to pick a random number between 0 and N-1 (say R), and swap item[R] with item[N-1]. Then subtract one from N and repeat until you reach N = 1.
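In Python, that shuffle looks like this (a minimal sketch; the standard library's random.shuffle does the same job):
import random

def shuffled_interval(low, high):
    items = list(range(low, high + 1))
    n = len(items)
    while n > 1:
        r = random.randrange(n)  # random index in 0 .. n-1
        items[r], items[n - 1] = items[n - 1], items[r]  # fix one item at the tail
        n -= 1
    return items

print(" ".join(map(str, shuffled_interval(0, 9))))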
This has come up before. Try using a linear feedback shift register.
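To illustrate, a minimal Python sketch using a maximal-length 4-bit Galois LFSR; the tap mask 0x9 corresponds to the polynomial x^4 + x + 1, and values outside the requested interval are simply skipped:
def lfsr_sequence(nbits, taps):
    # A maximal-length Galois LFSR visits every value in 1 .. 2**nbits - 1 exactly once.
    state = 1
    for _ in range((1 << nbits) - 1):
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps

# Map the states 1..15 onto 0..14 and skip anything above 9.
print([v - 1 for v in lfsr_sequence(4, 0x9) if v - 1 <= 9])
# [0, 8, 6, 9, 4, 5, 2, 7, 3, 1]
Note the order is fixed for a given seed and tap mask; in practice you would randomize the seed, and pick the tap mask from a table of maximal-length polynomials for the width you need.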
One suggestion, but it's memory intensive: The generator builds a list of all numbers in the interval, then shuffles it.
A very efficient way to shuffle an array of numbers where each index is unique comes from image processing and is used when applying techniques like pixel-dissolve. Basically you start with an ordered 2D array and then shift columns and rows. Those permutations are, by the way, easy to implement; you can even have one exact method that will yield the resulting value at (x,y) after n permutations.
The basic technique, described on a 3x3 grid:
1) Start with an ordered list; each number may exist only once:
0 1 2
3 4 5
6 7 8
2) Pick a row/column you want to shuffle and advance it one step. In this case, I am shifting the second row one to the right:
0 1 2
5 3 4
6 7 8
3) Pick a row/column you want to shuffle... I shuffle the second column one down:
0 7 2
5 1 4
6 3 8
4) Pick... For instance, the first row, one to the right:
2 0 7
5 1 4
6 3 8
You can repeat these steps as often as you want. You can always apply this kind of transformation to a 1D array as well. So your result would now be [2, 0, 7, 5, 1, 4, 6, 3, 8].
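A small Python sketch of that technique (the name and step count are my own choices; the result is always a permutation, but how well it is mixed depends on how many random rotations you apply):
import random

def rotate_shuffle(grid, steps=100):
    rows, cols = len(grid), len(grid[0])
    for _ in range(steps):
        if random.random() < 0.5:
            r = random.randrange(rows)
            grid[r] = [grid[r][-1]] + grid[r][:-1]  # rotate a random row right by one
        else:
            c = random.randrange(cols)
            col = [grid[r][c] for r in range(rows)]
            col = [col[-1]] + col[:-1]              # rotate a random column down by one
            for r in range(rows):
                grid[r][c] = col[r]
    return grid

grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(rotate_shuffle(grid))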
An occasionally useful alternative to the shuffle approach is to use a subscriptable set container. At each step, choose a random number 0 <= n < count and extract the nth item from the set.
The main problem is that typical containers can't handle this efficiently. I have used it with bit-vectors, but it only works well if the largest possible member is reasonably small, due to the linear scanning of the bitvector needed to find the nth set bit.
99% of the time, the best approach is to shuffle as others have suggested.
EDIT
I missed the fact that a simple array is a good "set" data structure - don't ask me why, I've used it before. The "trick" is that you don't care whether the items in the array are sorted or not. At each step, you choose one randomly and extract it. To fill the empty slot (without having to shift an average half of your items one step down) you just move the current end item into the empty slot in constant time, then reduce the size of the array by one. For example...
class remaining_items_queue
{
  private:
    std::vector<int> m_Items;

  public:
    ...
    bool Extract (int &p_Item);  // return false if items already exhausted
};

bool remaining_items_queue::Extract (int &p_Item)
{
  if (m_Items.size () == 0)  return false;

  int l_Random = Random_Num (m_Items.size ());
    // Random_Num written to give 0 <= result < parameter

  p_Item = m_Items [l_Random];
  m_Items [l_Random] = m_Items.back ();
  m_Items.pop_back ();
  return true;
}
The trick is to get a random number generator that gives (with a reasonably even distribution) numbers in the range 0 to n-1, where n is potentially different each time. Most standard random generators give a fixed range. Although the following DOESN'T give an even distribution, it is often good enough...
int Random_Num (int p)
{
  return (std::rand () % p);
}
std::rand returns random values in the range 0 <= x <= RAND_MAX, where RAND_MAX is implementation defined.
Take all numbers in the interval and put them in a list/array.
Shuffle the list/array.
Loop over the list/array.
One way is to generate an ordered list (0-9 in your example), then use the random function to select an item from the list. Remove the item from the original list and add it to the tail of a new one. The process is finished when the original list is empty. Output the new list.
You can use a linear congruential generator with parameters chosen randomly, but such that it generates the full period. You need to be careful, because the quality of the random numbers may be bad, depending on the parameters.
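A minimal Python sketch, with one parameter choice (m = 16, a = 5, c = 3) that satisfies the Hull-Dobell conditions for a full period (c coprime to m; a - 1 divisible by every prime factor of m, and by 4 since m is); values outside the requested interval are skipped:
def full_period_lcg(m, a, c, seed=0):
    # Visits every value in 0 .. m-1 exactly once when (a, c, m)
    # satisfy the Hull-Dobell full-period conditions.
    x = seed
    for _ in range(m):
        yield x
        x = (a * x + c) % m

# Cover the interval 0..9 by running a full-period LCG modulo 16
# and skipping the values above 9.
print([v for v in full_period_lcg(16, 5, 3) if v <= 9])
# [0, 3, 2, 4, 7, 6, 1, 8, 5, 9]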