How can i make my rounds() function in MATLAB, be very precised when handling around 400 values at a time? - precision

I have made this function rounds() which rounds the values to the nearest multiple of 0.5, ie, rounds(2.685)=2.5 rounds(2.332)=2.5 rounds(2.7554)=3.0 rounds(2.245)=2.0 it works well in the way mentioned above, but while handling large number of values the precision drop. It does not give me the desired results. I have 30 functions, each of them computes 14 values which I pass in rounds() as a whole vector containing those 14 values. The results are like, for values for which it should return 3.0 such as rounds(2.7554) it only returns 2.5 and that affects my overall accuracy alot. Individually it works well for all values even 2.7554 returns 3.0 when i pass it to check its working. Can anyone tell me why this happens that while handling large number of values its performance decreases and also tell me the solution.
function [newVal] = rounds(x)
dist = mod(x, 0.5);
floorVal = x - dist;
if dist >=0.25
newVal = floorVal + 0.5;
else
newVal = floorVal;
end
end
The above is the rounds function and below i am showing how i have used it my functions.
if true
calc(1) = x+y;
calc(2) = x-y;
calc(3) = x*y+a;
.......
.......
.......
calc(14) = a+b*c+x;
calc = calc';
final_calc = rounds(calc);
end
Even in a single function the rounds function handles only 14 values at once still the result is not precised, while if i pass those same values individually it gives the correct output. Please solve this issue someone. Thanks in advance.

function [newVal] = rounds2(x)
newVal = round(x/0.5)*0.5;
end
This was something i was struggling for a long time but here I have now a perfect answer. This was happening because i was passing the vector expression dist >= 0.25 to if...end, incorrectly thinking that the if...end will be evaluated separately for each element of x(i). Hope that it helps some others too

Related

How do I repeat a random number

I've tried searching for help but I haven't found a solution yet, I'm trying to repeat math.random.
current code:
local ok = ""
for i = 0,10 do
local ok = ok..math.random(0,10)
end
print(ok)
no clue why it doesn't work, please help
Long answer
Even if the preferable answer is already given, just copying it will probably not lead to the solution you may expect or less future mistakes. So I decided to explain why your code fails and to fix it and also help better understand how DarkWiiPlayer's answer works (except for string.rep and string.gsub).
Issues
There are at least three issues in your code:
the math.random(m, n) function includes lower and the upper values
local declarations hide a same-name objects in outer scopes
math.random gives the same number sequence unless you set its seed with math.randomseed
See Detailed explanation section below for more.
Another point seems at least worth mentioning or suspicious to me, as I assume you might be puzzled by the result (it seems to me to reflect exactly the perspective of the C programmer, from which I also got to know Lua): the Lua for loop specifies start and end value, so both of these values are included.
Attempt to repair
Here I show how a version of your code that yields the same results as the answer you accepted: a sequence of 10 percent-encoded decimal digits.
-- this will change the seed value (but mind that its resolution is seconds)
math.randomseed(os.time())
-- initiate the (only) local variable we are working on later
local ok = ""
-- encode 10 random decimals (Lua's for-loop is one-based and inclusive)
for i = 1, 10 do
ok = ok ..
-- add fixed part
'%3' ..
-- concatenation operator implicitly converts number to string
math.random(0, 9) -- a random number in range [0..9]
end
print(ok)
Detailed explanation
This explanation makes heavily use of the assert function instead of adding print calls or comment what the output should be. In my opinion assert is the superior choice for illustrating expected behavior: The function guides us from one true statement - assert(true) - to the next, at the first miss - assert(false) - the program is exited.
Random ranges
The math library in Lua provides actually three random functions depending on the count of arguments you pass to it. Without arguments, the result is in the interval [0,1):
assert(math.random() >= 0)
assert(math.random() < 1)
the one-argument version returns a value between 1 and the argument:
assert(math.random(1) == 1)
assert(math.random(10) >= 1)
assert(math.random(10) <= 10)
the two-argument version explicitly specifies min and max values:
assert(math.random(2,2) == 2)
assert(math.random(0, 9) >= 0)
assert(math.random(0, 9) <= 9)
Hidden outer variable
In this example, we have two variables x of different type, the outer x is not accessible from the inner scope.
local x = ''
assert(type(x) == 'string')
do
local x = 0
assert(type(x) == 'number')
-- inner x changes type
x = x .. x
assert(x == '00')
end
assert(type(x) == 'string')
Predictable randomness
The first call to math.random() in a Lua program will return always the same number because the pseudorandom number generator (PRNG) starts at seed 1. So if you call math.randomseed(1), you'll reset the PRNG to its initial state.
r0 = math.random()
math.randomseed(1)
r1 = math.random()
assert(r0 == r1)
After calling math.randomseed(os.time()) calls to math.random() will return different sequences presuming that subsequent program starts differ at least by one second. See question Current time in milliseconds and its answers for more information about the resolutions of several Lua functions.
string.rep(".", 10):gsub(".", function() return "%3" .. math.random(0, 9) end)
That should give you what you want

Code running very slow

My code seems to run very slowly and I can't think of any way to make it faster. All my arrays have been preallocated. S is a large number of element (say 10000 element, for example). I know my code runs slowly because of the "for k=1:S" but i cant think of another way to perform this loop at a relatively fast speed. Can i please get help because it takes hours to run.
[M,~] = size(Sample2000_X);
[N,~] = size(Sample2000_Y);
[S,~] = size(Prediction_Point);
% Speed Preallocation
Distance = zeros(M,N);
Distance_Prediction = zeros(M,1);
for k=1:S
for i=1:M
for j=1:N
Distance(i,j) = sqrt(power((Sample2000_X(i)-Sample2000_X(j)),2)+power((Sample2000_Y(i)-Sample2000_Y(j)),2));
end
Distance_Prediction(i,1) = sqrt(power((Prediction_Point(k,1)-Sample2000_X(i)),2)+power((Prediction_Point(k,2)-Sample2000_Y(i)),2));
end
end
Thanks.
I realized the major problem was organization of my code. I was performing calculation in a loop where it was absolutely unnecessary. So i seperated the code in two blocks and it Works much faster.
for i=1:M
for j=1:N
Distance(i,j) = sqrt(power((Sample2000_X(i)-Sample2000_X(j)),2)+power((Sample2000_Y(i)-Sample2000_Y(j)),2));
end
end
for k=1:S
for i=1:M
Distance_Prediction(i,1) = sqrt(power((Prediction_Point(k,1)-Sample2000_X(i)),2)+power((Prediction_Point(k,2)-Sample2000_Y(i)),2));
end
end
Thanks to the community for the help.
Your matrix Distance does not depend on k, so you can easily calculate it outside the main for-loop, for instance using:
d = sqrt((repmat(Sample2000_X, [1,M]) - repmat(Sample2000_X', [M,1])).^2 + (repmat(Sample2000_Y, [1,N]) - repmat(Sample2000_Y', [N,1])).^2);
I assume M=N, because elsewise your code won't work. Next, you can calculate your Distance_Prediction matrix. It is rather strange that you calculate this inside the for-loop over k, because the matrix will be changed in every iteration without using it. Anyway, this will do exactly the same as your code:
for k=1:S
Distance_Prediction = sqrt((Sample2000_X - Prediction_Point(k,1)).^2 + (Sample2000_Y - Prediction_Point(k,1)).^2);
end

Returning multiple ints and passing them as multiple arguements in Lua

I have a function that takes a variable amount of ints as arguments.
thisFunction(1,1,1,2,2,2,2,3,4,4,7,4,2)
this function was given in a framework and I'd rather not change the code of the function or the .lua it is from. So I want a function that repeats a number for me a certain amount of times so this is less repetitive. Something that could work like this and achieve what was done above
thisFunction(repeatNum(1,3),repeatNum(2,4),3,repeatNum(4,2),7,4,2)
is this possible in Lua? I'm even comfortable with something like this:
thisFunction(repeatNum(1,3,2,4,3,1,4,2,7,1,4,1,2,1))
I think you're stuck with something along the lines of your second proposed solution, i.e.
thisFunction(repeatNum(1,3,2,4,3,1,4,2,7,1,4,1,2,1))
because if you use a function that returns multiple values in the middle of a list, it's adjusted so that it only returns one value. However, at the end of a list, the function does not have its return values adjusted.
You can code repeatNum as follows. It's not optimized and there's no error-checking. This works in Lua 5.1. If you're using 5.2, you'll need to make adjustments.
function repeatNum(...)
local results = {}
local n = #{...}
for i = 1,n,2 do
local val = select(i, ...)
local reps = select(i+1, ...)
for j = 1,reps do
table.insert(results, val)
end
end
return unpack(results)
end
I don't have 5.2 installed on this computer, but I believe the only change you need is to replace unpack with table.unpack.
I realise this question has been answered, but I wondered from a readability point of view if using tables to mark the repeats would be clearer, of course it's probably far less efficient.
function repeatnum(...)
local i = 0
local t = {...}
local tblO = {}
for j,v in ipairs(t) do
if type(v) == 'table' then
for k = 1,v[2] do
i = i + 1
tblO[i] = v[1]
end
else
i = i + 1
tblO[i] = v
end
end
return unpack(tblO)
end
print(repeatnum({1,3},{2,4},3,{4,2},7,4,2))

Removing a "row" from a structure array

This is similar to a question I asked before, but is slightly different:
So I have a very large structure array in matlab. Suppose, for argument's sake, to simplify the situation, suppose I have something like:
structure(1).name, structure(2).name, structure(3).name structure(1).returns, structure(2).returns, structure(3).returns (in my real program I have 647 structures)
Suppose further that structure(i).returns is a vector (very large vector, approximately 2,000,000 entries) and that a condition comes along where I want to delete the jth entry from structure(i).returns for all i. How do you do this? or rather, how do you do this reasonably fast? I have tried some things, but they are all insanely slow (I will show them in a second) so I was wondering if the community knew of faster ways to do this.
I have parsed my data two different ways; the first way had everything saved as cell arrays, but because things hadn't been working well for me I parsed the data again and placed everything as vectors.
What I'm actually doing is trying to delete NaN data, as well as all data in the same corresponding row of my data file, and then doing the very same thing after applying the Hampel filter. The relevant part of my code in this attempt is:
for i=numStock+1:-1:1
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
stock(i).return = sort(stock(i).return);
stock(i).returnLength = length(stock(i).return);
stock(i).medianReturn = median(stock(i).return);
stock(i).madReturn = mad(stock(i).return,1);
end;
for i=numStock:-1:1
for j = length(stock(i+1).volume):-1:1
if(isnan(stock(i+1).volume(j)))
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end
end
stock(i+1).volume = sort(stock(i+1).volume);
stock(i+1).volumeLength = length(stock(i+1).volume);
stock(i+1).medianVolume = median(stock(i+1).volume);
stock(i+1).madVolume = mad(stock(i+1).volume,1);
end;
for i=numStock+1:-1:1
for j=stock(i).returnLength:-1:1
if (abs(stock(i).return(j) - stock(i).medianReturn) > 3*stock(i).madReturn)
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end;
end;
end;
for i=numStock:-1:1
for j=stock(i+1).volumeLength:-1:1
if (abs(stock(i+1).volume(j) - stock(i+1).medianVolume) > 3*stock(i+1).madVolume)
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end;
end;
end;
However, this returns an error:
"Matrix index is out of range for deletion.
Error in Failure (line 110)
stock(k).return(j) = [];"
So instead I tried by parsing everything in as vectors. Then I decided to try and delete the appropriate entries in the vectors prior to building the structure array. This isn't returning an error, but it is very slow:
%% Delete bad data, Hampel Filter
% Delete bad entries
id=strcmp(returns,'');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
id=strcmp(returns,'C');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
% Convert returns from string to double
returns=cellfun(#str2double,returns);
sp500=cellfun(#str2double,sp500);
% Delete all data for which a return is not a number
nanid=isnan(returns);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Delete all data for which a volume is not a number
nanid=isnan(volume);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Apply the Hampel filter, and delete all data corresponding to
% observations deleted by the filter.
medianReturn = median(returns);
madReturn = mad(returns,1);
for i=length(returns):-1:1
if (abs(returns(i) - medianReturn) > 3*madReturn)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
medianVolume = median(volume);
madVolume = mad(volume,1);
for i=length(volume):-1:1
if (abs(volume(i) - medianVolume) > 3*madVolume)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
As I said, this is very slow, probably because I'm using a for loop on a very large data set; however, I'm not sure how else one would do this. Sorry for the gigantic post, but does anyone have a suggestion as to how I might go about doing what I'm asking in a reasonable way?
EDIT: I should add that getting the vector method to work is probably preferable, since my aim is to put all of the return vectors into a matrix and get all of the volume vectors into a matrix and perform PCA on them, and I'm not sure how I would do that using cell arrays (or even if princomp would work on cell arrays).
EDIT2: I have altered the code to match your suggestion (although I did decide to give up speed and keep with the for-loops to keep with the structure array, since reparsing this data will be way worse time-wise). The new code snipet is:
stock_return = zeros(numStock+1,length(stock(1).return));
for i=1:numStock+1
for j=1:length(stock(i).return)
stock_return(i,j) = stock(i).return(j);
end
end
stock_return = stock_return(~any(isnan(stock_return)), : );
This returns an Index exceeds matrix dimensions error, and I'm not sure why. Any suggestions?
I could not find a convenient way to handle structures, therefore I would restructure the code so that instead of structures it uses just arrays.
For example instead of stock(i).return(j) I would do stock_returns(i,j).
I show you on a part of your code how to get rid of for-loops.
Say we deal with this code:
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
Now, the deletion of columns with any NaN data goes like this:
stock_return = stock_return(:, ~any(isnan(stock_return)) );
As for the absolute difference from medianVolume, you can write a similar code:
% stock_return_length is a scalar
% stock_median_return is a column vector (eg. [1;2;3])
% stock_mad_return is also a column vector.
median_return = repmat(stock_median_return, stock_return_length, 1);
is_bad = abs(stock_return - median_return) > 3.* stock_mad_return;
stock_return = stock_return(:, ~any(is_bad));
Using a scalar for stock_return_length means of course that the return lengths are the same, but you implicitly assume it in your original code anyway.
The important point in my answer is using any. Logical indexing is not sufficient in itself, since in your original code you delete all the values if any of them is bad.
Reference to any: http://www.mathworks.co.uk/help/matlab/ref/any.html.
If you want to preserve the original structure, so you stick to stock(i).return, you can speed-up your code using essentially the same scheme but you can only get rid of one less for-loop, meaning that your program will be substantially slower.

Build fixed interval dataset from random interval dataset using stale data

Update: I've provided a brief analysis of the three answers at the bottom of the question text and explained my choices.
My Question: What is the most efficient method of building a fixed interval dataset from a random interval dataset using stale data?
Some background: The above is a common problem in statistics. Frequently, one has a sequence of observations occurring at random times. Call it Input. But one wants a sequence of observations occurring say, every 5 minutes. Call it Output. One of the most common methods to build this dataset is using stale data, i.e. set each observation in Output equal to the most recently occurring observation in Input.
So, here is some code to build example datasets:
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
Input = [InputTimeStamp, randn(TInput, 1)];
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)';
Output = [OutputTimeStamp, NaN(TOutput, 1)];
Both datasets start at close to midnight at the turn of the millennium. However, the timestamps in Input occur at random intervals while the timestamps in Output occur at fixed intervals. For simplicity, I have ensured that the first observation in Input always occurs before the first observation in Output. Feel free to make this assumption in any answers.
Currently, I solve the problem like this:
sMax = size(Output, 1);
tMax = size(Input, 1);
s = 1;
t = 2;
%#Loop over input data
while t <= tMax
if Input(t, 1) > Output(s, 1)
%#If current obs in Input occurs after current obs in output then set current obs in output equal to previous obs in input
Output(s, 2:end) = Input(t-1, 2:end);
s = s + 1;
%#Check if we've filled out all observations in output
if s > sMax
break
end
%#This step is necessary in case we need to use the same input observation twice in a row
t = t - 1;
end
t = t + 1;
if t > tMax
%#If all remaining observations in output occur after last observation in input, then use last obs in input for all remaining obs in output
Output(s:end, 2:end) = Input(end, 2:end);
break
end
end
Surely there is a more efficient, or at least, more elegant way to solve this problem? As I mentioned, this is a common problem in statistics. Perhaps Matlab has some in-built function I'm not aware of? Any help would be much appreciated as I use this routine a LOT for some large datasets.
THE ANSWERS: Hi all, I've analyzed the three answers, and as they stand, Angainor's is the best.
ChthonicDaemon's answer, while clearly the easiest to implement, is really slow. This is true even when the conversion to a timeseries object is done outside of the speed test. I'm guessing the resample function has a lot of overhead at the moment. I am running 2011b, so it is possible Mathworks have improved it in the intervening time. Also, this method needs an additional line for the case where Output ends more than one observation after Input.
Rody's answer runs only slightly slower than Angainor's (unsurprising given they both employ the histc approach), however, it seems to have some problems. First, the method of assigning the last observation in Output is not robust to the last observation in Input occurring after the last observation in Output. This is an easy fix. But there is a second problem which I think stems from having InputTimeStamp as the first input to histc instead of the OutputTimeStamp adopted by Angainor. The problem emerges if you change OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)'; to OutputTimeStamp = 730486.002 + (0:0.0001:TOutput * 0.0001 - 0.0001)'; when setting up the example inputs.
Angainor's appears robust to everything I threw at it, plus it was the fastest.
I did a lot of speed tests for different input specifications - the following numbers are fairly representative:
My naive loop: Elapsed time is 8.579535 seconds.
Angainor: Elapsed time is 0.661756 seconds.
Rody: Elapsed time is 0.913304 seconds.
ChthonicDaemon: Elapsed time is 22.916844 seconds.
I'm +1-ing Angainor's solution and marking the question solved.
This "stale data" approach is known as a zero order hold in signal and timeseries fields. Searching for this quickly brings up many solutions. If you have Matlab 2012b, this is all built in to the timeseries class by using the resample function, so you would simply do
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
InputData = randn(TInput, 1);
InputTimeSeries = timeseries(InputData, InputTimeStamp);
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001);
OutputTimeSeries = resample(InputTimeSeries, OutputTimeStamp, 'zoh'); % zoh stands for zero order hold
Here is my take on the problem. histc is the way to go:
% find Output timestamps in Input bins
N = histc(Output(:,1), Input(:,1));
% find counts in the non-empty bins
counts = N(find(N));
% find Input signal value associated with every bin
val = Input(find(N),2);
% now, replicate every entry entry in val
% as many times as specified in counts
index = zeros(1,sum(counts));
index(cumsum([1 counts(1:end-1)'])) = 1;
index = cumsum(index);
val_rep = val(index)
% finish the signal with last entry from Input, as needed
val_rep(end+1:size(Output,1)) = Input(end,2);
% done
Output(:,2) = val_rep;
I checked against your procedure for a few different input models (I changed the number of Output timestamps) and the results are the same. However, I am still not sure I understood your problem, so if something is wrong here let me know.

Resources