SAS: Using WHERE / IF Statement in SGPlot - syntax

Good afternoon,
I would like to define my parameters in my plot as opposed to generating a plot with all values.
For example, I want to show only the sale price of the data not exceeding $400,000. This syntax is not correct, but this is my attempt at it. Should I use the if, by, or where statement in this matter? Thank you!
proc sgplot data=mydata;
loess x = FirstFlrSF y = saleprice / group= OverallQual;
reg x = FirstFlrSF y = saleprice;
where saleprice =< 400000;
title "First Floor SF vs sales price"; run;

IF statements don't work in PROCs, but WHERE statements do. However, you have the comparison operator specified incorrectly: it's <= instead of =<. I always remember the order by saying it out loud: "less than or equal to".
proc sgplot data=sashelp.class;
scatter x=height y=weight;
where age <= 15;
run;quit;

Alternatively, the filter can be applied with the WHERE= data set option on the input data set instead of a separate WHERE statement:
proc sgplot data=mydata (where =(saleprice <= 400000));
loess x = FirstFlrSF y = saleprice / group= OverallQual;
reg x = FirstFlrSF y = saleprice;
title "First Floor SF vs sales price"; run;


speed up loop in matlab

I'm very new in MATLAB (this is my first script).
I wonder how I can speed up this loop. I don't know any toolboxes or 'tricks' as I'm a newbie. I coded it by instinct and it works, but it takes a really long time.
All the variables are either read with fread or are integers entered manually, so this is basically simple math, but I have no clue why it takes so long (maybe the nested loops?) or how to improve it, as I am more familiar with Python and, for example, multiprocessing.
Thanks a lot
X = 0;
Points = [0,0,0];
for i=1:nbLines
for j=1:nbPositions-1
if lDate(i)>posDate(j) && lDate(i)<=posDate(j+1)
weight = (lDate(i) - posDate(j)) / (posDate(j+1)- posDate(j));
X = posX(j)*(1-weight) + posX(j+1) * weight;
end
end
if X ~= 0
for j=1:nbScans
Y = - distance(i,j) / tan(angle(i,j));
Points = [Points;X, Y, distance(i,j)];
end
end
end
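X = 0;
Points = {[0,0,0]};   % cell array; each cell holds the rows produced for one line
count = 1;
for i = 1:nbLines
    % indices j with posDate(j) < lDate(i) <= posDate(j+1)
    id = find(lDate(i) > posDate(1:nbPositions-1) & lDate(i) <= posDate(2:nbPositions));
    if ~isempty(id)
        j = id(end);   % the original inner loop keeps the last match
        weight = (lDate(i) - posDate(j)) / (posDate(j+1) - posDate(j));
        X = posX(j)*(1-weight) + posX(j+1) * weight;
    end
    if X ~= 0
        k = 1:nbScans;
        count = count + 1;
        Y = -distance(i,k) ./ tan(angle(i,k));
        Points{count} = [repmat(X, nbScans, 1), Y(:), reshape(distance(i,k), [], 1)];
    end
end
Points = vertcat(Points{:});   % stack the per-line blocks into one matrix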
You have one issue with the given code. The line below:
Points = [Points; X, Y, distance(i,j)];
This will definitely slow down your code, because the array is reallocated and copied every time a row is appended. Preallocate the array before the loop and fill it by index; you should see a clear difference in speed.
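X = 0;
% Preallocate the largest size Points can reach: one initial row plus nbScans rows per line
Points = zeros(1 + nbLines*nbScans, 3);
count = 1;   % row 1 stays [0,0,0], as in the original code
for i = 1:nbLines
    for j = 1:nbPositions-1
        if lDate(i) > posDate(j) && lDate(i) <= posDate(j+1)
            weight = (lDate(i) - posDate(j)) / (posDate(j+1) - posDate(j));
            X = posX(j)*(1-weight) + posX(j+1) * weight;
        end
    end
    if X ~= 0
        for j = 1:nbScans
            count = count + 1;
            Y = -distance(i,j) / tan(angle(i,j));
            Points(count,:) = [X, Y, distance(i,j)];
        end
    end
end
Points = Points(1:count,:);   % drop the unused preallocated rows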
Note that your code only keeps the last matching value of X (and carries it over to later lines that have no match). Is this what you want?
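Since the question mentions being more familiar with Python, here is a rough sketch of the same interval search done with a binary search instead of a scan over every position. It assumes posDate is sorted in ascending order, which the question does not state; the names interpolate_x, pos_date and pos_x are made up for this illustration. Unlike the MATLAB loop, it returns None when there is no enclosing interval rather than reusing the previous X.

```python
from bisect import bisect_left


def interpolate_x(l_date, pos_date, pos_x):
    """Linearly interpolate pos_x at l_date.

    Assumes pos_date is sorted ascending (an assumption, not from the question).
    Returns None when l_date is not inside (pos_date[0], pos_date[-1]].
    """
    # bisect_left gives the first index whose value is >= l_date,
    # so j satisfies pos_date[j] < l_date <= pos_date[j + 1]
    j = bisect_left(pos_date, l_date) - 1
    if j < 0 or j >= len(pos_date) - 1:
        return None
    weight = (l_date - pos_date[j]) / (pos_date[j + 1] - pos_date[j])
    return pos_x[j] * (1 - weight) + pos_x[j + 1] * weight


if __name__ == "__main__":
    dates = [0, 10, 20]
    xs = [0.0, 100.0, 300.0]
    for d in (5, 10, 15, 25):
        print(d, interpolate_x(d, dates, xs))
```

The binary search makes each lookup logarithmic in the number of positions, whereas the inner for loop in the question is linear.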
Try parallelization: using "parfor" instead of "for" distributes the iterations across the available workers (this requires the Parallel Computing Toolbox).
parfor i=1:nbLines
% rest of code here
end

difference in results while using pixel value or int

While using MATLAB for image processing (specifically, improving an image with fuzzy logic) I found a really strange thing. My fuzzy function is correct; I tested it on random values, and the membership functions are basically simple linear functions.
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = -z/75 + 125/75;
end
end
where z is a value of a pixel (in grayscale). Now there is a really strange thing going on.
The line f = -z/75 + 125/75; gives very different results depending on what is used as input, where A is an image. For example, if I use a variable p = 99, the output of the function is 0.3467, as it should be, but if I use A(i,j) the result is f = 2. Since that is clearly impossible, I do not know where the problem is. I thought it might be related to the type of the variable, but if I change it to uint8 it stays the same... If you know what's going on, please let me know :)
1. Changed line:
f = (125/75) - (z/75);
After editing the third condition, the transformed image has no pixel values of 2. I'm not sure whether you intend to work with decimals. If decimals are necessary, converting the image with the im2double() function and scaling it up by a factor of 255 might suit your needs. See heading 3 for rounding details.
2. Reading in Image and Testing:
%Reading in the image and applying the function%
Image = imread("RGB_Image.png");
Greyscale_Image = rgb2gray(Image);
[Image_Height,Image_Width] = size(Greyscale_Image);
Transformed_Image = zeros(Image_Height,Image_Width);
for Row = 1: +1: Image_Height
for Column = 1: +1: Image_Width
Pixel_Value = Greyscale_Image(Row,Column);
[Transformed_Pixel_Value] = Udark(Pixel_Value);
Transformed_Image(Row,Column) = Transformed_Pixel_Value;
end
end
subplot(1,2,1); imshow(Greyscale_Image);
subplot(1,2,2); imshow(Transformed_Image);
%Checking that no transformed pixels falls in this impossible range%
Check = (Transformed_Image > (125/75)) & (Transformed_Image ~= 1);
Check_Flag = any(Check,'all');
%Function to transform pixel values%
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = (125/75) - (z/75);
end
end
3. Evaluating the Specifics of the Third Condition
Working with integers (uint8) forces every intermediate result to be rounded to the nearest integer. Any pixel value in the range [50,125] will therefore evaluate to 1 or 0.
f = (125/75) - (z/75);
If z = 99 (uint8),
z/75 = 1.32 → rounds to 1, and 125/75 - 1 ≈ 0.667 → rounds to 1
Using MATLAB version: R2019b
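To make the f = 2 result from the question easier to follow, here is a small Python sketch that imitates MATLAB's uint8 arithmetic. It assumes the integer rules are: compute in double precision, round to the nearest integer, then saturate to [0, 255]. The helper to_uint8 and the function names are invented for this illustration and are not MATLAB functions.

```python
import math


def to_uint8(value):
    """Imitate conversion to MATLAB uint8: round to nearest, saturate to [0, 255]."""
    rounded = math.floor(abs(value) + 0.5)
    if value < 0:
        rounded = -rounded
    return max(0, min(255, rounded))


def udark_original(z):
    """f = -z/75 + 125/75 with z of class uint8."""
    negated = to_uint8(-z)               # unary minus saturates to 0
    quotient = to_uint8(negated / 75)    # uint8 / double gives uint8
    return to_uint8(quotient + 125 / 75) # uint8 + double gives uint8


def udark_reordered(z):
    """f = (125/75) - (z/75) with z of class uint8."""
    quotient = to_uint8(z / 75)
    return to_uint8(125 / 75 - quotient)


def udark_double(z):
    """Same formula after converting the pixel to double first."""
    z = float(z)
    return -z / 75 + 125 / 75


if __name__ == "__main__":
    print(udark_original(99))           # 2
    print(udark_reordered(99))          # 1
    print(round(udark_double(99), 4))   # 0.3467
```

Under these assumptions the negation is the culprit: -z is clipped to 0 before the division, so the expression collapses to 125/75 rounded up to 2. Converting the pixel with double(z) before the arithmetic would avoid both the saturation and the rounding.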

Can i declare a local variable in a for loop?

for x = 1, 16 do
for y = 1, 16 do
local cntr = Center:new()
cntr.point = {x = 0.5 + x - 1, y = 0.5 + y - 1}
centerLookup[cntr.point] = cntr
table.insert(self.centers, cntr)
end
end
In the code above, centerLookup[point] is meant to look up the respective Center object by inputting a point location.
However, when I try to do this:
function neighbors(center, sqrtsize)
if center.point.y + 1 < sqrtsize then
local up = {x = center.point.x, y = center.point.y+1}
local centerup = centerLookup[up]
table.insert(center.neighbors, centerup)
end
end
centerup returns as a nil value
Idk if the problem is that I can't use a table as an index, but that is what I'm thinking.
Anybody know what's wrong here?
P.S. if it's helpful, centers start at 0.5 (so [0.5, 0.5] would be the first center, then [0.5, 1.5], etc.)
Thanks in advance!
This has nothing to do with local variables and everything to do with the fact that tables are compared by-reference and not by-value.
In Lua, tables are reference types that have their own identity. Even if two tables have the same contents, Lua does not consider them equal unless they are the exact same object.
To illustrate this, here is some sample code, and the printed values:
local tbl1 = {x = 0.5, y = 0.5}
local tbl2 = tbl1
local tbl3 = {x = 0.5, y = 0.5}
print(tbl1 == tbl2) -- true; tbl1 and tbl2 both reference the same table
print(tbl1 == tbl3) -- false; tbl1 and tbl3 reference different tables
local up = {x = center.point.x, y = center.point.y+1}
local centerup = centerLookup[up]
In this snippet, up is a completely new table with only one reference (the up variable itself). This new table won't be a key in your centerLookup table, even if a table key exists with the same contents.
cntr.point = {x = 0.5 + x - 1, y = 0.5 + y - 1}
centerLookup[cntr.point] = cntr
table.insert(self.centers, cntr)
In this snippet, you create a new table, and reference it in three different places: cntr.point, centerLookup as a key, and self.centers as a value. You presumably iterate through the self.centers array, and use the exact same table to look up items in the centerLookup table. However, if you were to use a table not in the self.centers array, it would not work.
Colonel Thirty Two explained why your code is not working as expected. I just want to add a quick solution:
function pointToKey(point)
return point.x .. "_" .. point.y
end
Use this function for lookup in both places
--setup centerLookup
centerLookup[pointToKey(cntr.point)] = cntr
--find point from lookup
local centerup = centerLookup[pointToKey(up)]
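The same identity-versus-value distinction can be sketched in Python, where instances of a plain class are hashed by identity, much like Lua tables used as keys. This is only an analogy; Point and point_to_key are names made up for the illustration.

```python
class Point:
    """Plain object: compared and hashed by identity, like a Lua table."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def point_to_key(point):
    """Build a value-based key, mirroring pointToKey from the answer above."""
    return f"{point.x}_{point.y}"


stored = Point(0.5, 1.5)
probe = Point(0.5, 1.5)   # same contents, different object

by_identity = {stored: "center"}
by_value = {point_to_key(stored): "center"}

print(by_identity.get(probe))              # None
print(by_value.get(point_to_key(probe)))   # center
```

Looking up with a freshly built object fails, while looking up with a key derived from its contents succeeds, which is what the string key achieves in the Lua version.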

Speeding up simulation of the Levy motion algorithm

Here is my little script for simulating Levy motion:
clear all;
clc; close all;
t = 0; T = 1000; I = T-t;
dT = T/I; t = 0:dT:T; tau = T/I;
alpha = 1.5;
sigma = dT^(1/alpha);
mu = 0; beta = 0;
N = 1000;
X = zeros(N, length(I));
for k=1:N
L = zeros(1,I);
for i = 1:I-1
L( (i + 1) * tau ) = L(i*tau) + stable2( alpha, beta, sigma, mu, 1);
end
X(k,1:length(L)) = L;
end
q = 0.1:0.1:0.9;
quant = qlines2(X, q, t(1:length(X)), tau);
hold all
for i = 1:length(quant)
plot( t, quant(i) * t.^(1/alpha), ':k' );
end
Where stable2 returns a stable random variable with given parameters (you may replace it with normrnd(mu, sigma) for this case, it's not crucial); qlines2 returns quantiles needed for plotting.
But I don't want to talk about math here. My problem is that this implementation is pretty slow, and I would like to speed it up. Unfortunately, computer science is not my main field. I have heard of methods like memoization and vectorization, and that there are a lot of other techniques, but I don't know how to use them.
For example, I'm pretty sure I should replace this filthy double for-loop somehow, but I'm not sure what to do instead.
EDIT: Maybe I should use (and learn...) another language (Python, C, any functional one)? I always thought that MATLAB/Octave was designed for numerical computation, but if I should change, then to which one?
The crucial bit is, as you said, the for loops. MATLAB does not like those, so vectorization is indeed the keyword (together with preallocating the space).
I just altered your for-loop section somewhat so that you do not have to reset L over and over again; instead we save all the Ls in a bigger matrix (I also eliminated the length(L) command).
L = zeros(N,I);
for k=1:N
for i = 1:I-1
L(k,(i + 1) * tau ) = L(k,i*tau) + normrnd(mu, sigma);
end
X(k,1:I) = L(k,1:I);
end
Now you can already see that X(k,1:I) = L(k,1:I); in the loop is redundant, and that also means we can switch the order of the loops. This is crucial: because the i-steps are recursive (each depends on the previous step), we cannot vectorize this loop; we can only vectorize the k-loop.
Your original code needed 9.3 seconds on my machine, and the new code still needs about the same time:
L = zeros(N,I);
for i = 1:I-1
for k=1:N
L(k,(i + 1) * tau ) = L(k,i*tau) + normrnd(mu, sigma);
end
end
X = L;
But now we can apply vectorization: instead of looping through all rows (the loop over k), we can eliminate this loop and do all rows at once.
L = zeros(N,I);
for i = 1:I-1
L(:,(i + 1) * tau ) = L(:,i*tau) + normrnd(mu, sigma); %<- this is not yet what you want, see comment below
end
X = L;
This code needs only 0.045 seconds on my machine. I hope you still get the same output, because I have no idea what you are calculating, but I also hope you can see how to go about vectorizing code.
PS: I just noticed that the last example now uses the same random number for the whole column, which is obviously not what you want. Instead you should generate a whole vector of random numbers, e.g.:
L = zeros(N,I);
for i = 1:I-1
L(:,(i + 1) * tau ) = L(:,i*tau) + normrnd(mu, sigma,N,1);
end
X = L;
PPS: Great question!
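One more observation, offered as a sketch rather than a drop-in replacement: the recursion L(i+1) = L(i) + increment is a running sum, so each path is the cumulative sum of its increments. The Python example below uses Gaussian increments as a stand-in for stable2 (the question allows that substitution); simulate_paths and its parameters are names invented for this illustration.

```python
import random
from itertools import accumulate


def simulate_paths(n_paths, n_steps, sigma, mu=0.0, seed=None):
    """Simulate random-walk paths that start at 0.

    Each path is the running sum of n_steps - 1 independent increments.
    Gaussian increments stand in for the stable2 draws of the question.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        increments = [rng.gauss(mu, sigma) for _ in range(n_steps - 1)]
        # accumulate(..., initial=0.0) yields 0, x1, x1 + x2, ...
        paths.append(list(accumulate(increments, initial=0.0)))
    return paths


if __name__ == "__main__":
    paths = simulate_paths(n_paths=3, n_steps=5, sigma=1.0, seed=0)
    for path in paths:
        print([round(value, 3) for value in path])
```

In MATLAB the corresponding idea would be to draw all increments as one matrix and apply cumsum along the time dimension, which removes the remaining loop over i.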

Convert Excel Column Number to Column Name in Matlab

I am using Excel 2007, which supports up to 16,384 columns. I would like to obtain the column name corresponding to a column number.
Currently, I am using the following code. However, this code only supports up to 256 columns. Any idea how to obtain the column name if the column number is greater than 256?
function loc = xlcolumn(column)
if isnumeric(column)
if column>256
error('Excel is limited to 256 columns! Enter an integer number <256');
end
letters = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
count = 0;
if column-26<=0
loc = char(letters(column));
else
while column-26>0
count = count + 1;
column = column - 26;
end
loc = [char(letters(count)) char(letters(column))];
end
else
letters = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
if size(column,2)==1
loc =findstr(column,letters);
elseif size(column,2)==2
loc1 =findstr(column(1),letters);
loc2 =findstr(column(2),letters);
loc = (26 + 26*loc1)-(26-loc2);
end
end
Thanks
As a diversion, here is an all function handle example, with (almost) no file-based functions required. This is based on the dec2base function, since Excel column names are (almost) base 26 numbers, with the frustrating difference that there are no "0" characters.
Note: this is probably a terrible idea overall, but it works. Better solutions are probably found elsewhere in the file exchange.
First, the one file based function that I couldn't get around, to perform arbitrary depth function composition.
function result = compose( fnHandles )
%COMPOSE Compose a set of functions
% COMPOSE({fnHandles}) returns a function handle consisting of the
% composition of the cell array of input function handles.
%
% For example, if F, G, and H are function handles with one input and
% one output, then:
% FNCOMPOSED = COMPOSE({F,G,H});
% y = FNCOMPOSED(x);
% is equivalent to
% y = F(G(H(x)));
if isempty(fnHandles)
    result = @(x)x;
elseif length(fnHandles)==1
    result = fnHandles{1};
else
    fnOuter = fnHandles{1};
    fnRemainder = compose(fnHandles(2:end));
    result = @(x)fnOuter(fnRemainder(x));
end
Then, the bizarre, contrived path to convert base26 values into the correct string
%Functions leading to "getNumeric", which creates a numeric, base26 array
remapUpper = @(rawBase)(rawBase + (rawBase>='A')*(-55)); %Map the letters 'A-P' to [10:25]
reMapLower = @(rawBase)(rawBase + (rawBase<'A')*(-48)); %Map characters '0123456789' to [0:9]
getRawBase = @(x)dec2base(x, 26);
getNumeric = @(x)remapUpper(reMapLower(getRawBase(x)));
%Functions leading to "correctNumeric"
% This replaces zeros with 26, and reduces the high values entry by 1.
% Similar to "borrowing" as we learned in longhand subtraction
borrowDownFrom = @(x, fromIndex) [x(1:(fromIndex-1)) (x(fromIndex)-1) (x(fromIndex+1)+26) (x((fromIndex+2):end))];
borrowToIfNeeded = @(x, toIndex) (x(toIndex)<=0)*borrowDownFrom(x,toIndex-1) + (x(toIndex)>0)*(x); %Ugly numeric switch
getAllConditionalBorrowFunctions = @(numeric)arrayfun(@(index)@(numeric)borrowToIfNeeded(numeric, index),(2:length(numeric)),'uniformoutput',false);
getComposedBorrowFunction = @(x)compose(getAllConditionalBorrowFunctions(x));
correctNumeric = @(x)feval(getComposedBorrowFunction(x),x);
%Function to replace numerics with letters, and remove leading '@' (leading
%zeros)
numeric2alpha = @(x)regexprep(char(x+'A'-1),'^@','');
%Compose complete function
num2ExcelName = @(x)arrayfun(@(x)numeric2alpha(correctNumeric(getNumeric(x))), x, 'uniformoutput',false)';
Now test using some stressing transitions:
>> num2ExcelName([1:5 23:28 700:704 727:729 1024:1026 1351:1355 16382:16384])
ans =
'A'
'B'
'C'
'D'
'E'
'W'
'X'
'Y'
'Z'
'AA'
'AB'
'ZX'
'ZY'
'ZZ'
'AAA'
'AAB'
'AAY'
'AAZ'
'ABA'
'AMJ'
'AMK'
'AML'
'AYY'
'AYZ'
'AZA'
'AZB'
'AZC'
'XFB'
'XFC'
'XFD'
This function I wrote works for any number of columns (until Excel runs out of columns). It just requires a column number input (e.g. 16368 will return a string 'XEN').
If the application of this concept is different than my function, it's important to note that a column of x number of A's begins every 26^(x-1) + 26^(x-2) + ... + 26^2 + 26 + 1. (e.g. 'AAA' begins on 26^2 + 26 + 1 = 703)
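function [col_str] = let_loc(num_loc)
% Convert a positive column number to its Excel column name.
test = 2;
old = 0;
x = 0;
% Count the letters: names with n letters start at 26^(n-1) + ... + 26 + 1
while test >= 1
    old = 26^x + old;
    test = num_loc/old;
    x = x + 1;
end
num_letters = x - 1;
col_str = blanks(num_letters);
for i = num_letters:-1:1
    % Peel off the last letter first. mod(num_loc - 1, 26) maps 1..26 to A..Z,
    % so multiples of 26 (e.g. 52 = 'AZ') give 'Z' instead of a zero digit.
    rem_digit = mod(num_loc - 1, 26);
    col_str(i) = char(65 + rem_digit);
    num_loc = (num_loc - 1 - rem_digit)/26;
end
end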
Hope this saves someone some time!
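For comparison, here is a compact Python sketch of the usual "base 26 without a zero digit" conversion in both directions, covering the two cases the question's xlcolumn function handles. The function names column_name and column_number are made up for this illustration.

```python
def column_name(number):
    """Convert a 1-based column number to an Excel column name."""
    if number < 1:
        raise ValueError("column number must be >= 1")
    letters = []
    while number > 0:
        # Subtracting 1 first shifts the digits from 1..26 to 0..25,
        # which is what makes 'Z' (26) come out right.
        number, remainder = divmod(number - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def column_number(name):
    """Convert an Excel column name back to its 1-based column number."""
    number = 0
    for letter in name.upper():
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


if __name__ == "__main__":
    for n in (1, 26, 27, 52, 702, 703, 16368, 16384):
        print(n, column_name(n), column_number(column_name(n)))
```

The boundary values (26, 52, 702) are worth checking in any implementation, because columns ending in Z are where a plain base-26 conversion goes wrong.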
