sas local maximum time series - time

I have a time series of precipitation data for 30 years (jahre) for a lot of stations (idgem). Each year has a value for a half month step like this:
idgem year jan1 jan2 feb1 feb2 ... dec1 dec2
1 1960 20 22 25 10 ... 32 30
1 1961 22 25 30 20 ... 30 25
[![example data][1]][1]
Now I want to find the local maxima, i.e. keep only the values that are bigger than both the previous and the next value. I tried it like this:
data test3;
set test2;
array maxi [24] jan1 jan2 feb1 feb2 mar1 mar2 apr1 apr2 may1 may2 jun1
jun2 jul1 jul2 aug1 aug2 sep1 sep2 oct1 oct2 nov1 nov2 dec1 dec2;
do i=1 to 24;
if maxi [i-1] < maxi [i]> maxi [i+1] then maxi [i] = maxi [i];
else maxi [i]=.;
end;
run;
I always get the error message "Array subscript out of range".
Any ideas how I can tell SAS to compare with the previous and next value and keep the compared value if it is bigger? And how do I compare, for example, the value of jan1 of 1961 to dec2 of 1960?
[1]: https://i.stack.imgur.com/CId0x.gif
I got help and now I have this solution:
data test3;
set test2;
array local [24];
do _n_ = 1 to 24;
local[_n_] = "" ;
end;
array maxi [24] jan1 jan2 feb1 feb2 mar1 mar2 apr1 apr2 may1 may2 jun1
jun2 jul1 jul2 aug1 aug2 sep1 sep2 oct1 oct2 nov1 nov2 dec1 dec2;
do _n_ = 2 to 23;
if (maxi [_n_-1] < maxi [_n_]) and (maxi [_n_]> maxi [_n_+1]) then
local [_n_] = maxi [_n_];
if maxi [_n_] = maxi [_n_-1] then local [_n_] = maxi [_n_];
if maxi [_n_] = maxi [_n_+1] then local [_n_] = maxi [_n_];
end;
do _n_ = 1;
if (maxi [_n_] > maxi [_n_+1]) and (maxi [_n_] > lag(dec2)) then local
[_n_] = maxi [_n_];
if (maxi [_n_] = maxi [_n_+1]) and (maxi [_n_] = lag(dec2)) then local
[_n_] = maxi [_n_];
end;
do _n_ = 24; ???
run;
With the LAG function I can compare jan1 with the dec2 value of the previous observation. But how can I compare dec2 with jan1 of the following observation?
Any ideas?

You can use the LAG() function to get the previous observation's value of the last element (DEC2) and a "lead" technique (a second SET statement that starts at the second observation) to get the next observation's value of the first element (JAN1). The MAXI array then has 26 elements and the new LOCAL maximum array has 24 elements, which prevents indexing outside of the array bounds.
data test3;
set test2;
set test2 (firstobs=2 keep=jan1 rename=(jan1=next)) test2(obs=1 drop=_all_);
array local [24];
previous = lag(dec2);
array maxi
previous
jan1 jan2 feb1 feb2 mar1 mar2 apr1 apr2 may1 may2 jun1 jun2
jul1 jul2 aug1 aug2 sep1 sep2 oct1 oct2 nov1 nov2 dec1 dec2
next
;
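do i=1 to dim(local);
/* keep the value only if it is bigger than both neighbours; a missing neighbour counts as smaller */
if maxi(i+1) > maxi(i) and maxi(i+1) > maxi(i+2) then local(i) = maxi(i+1);
end;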
drop i;
run;
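For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same padding idea. The function names, the use of `None` for dropped values, and treating a missing neighbour as smaller than any number (which mirrors how SAS orders missing values) are assumptions of this sketch, not part of the answer above.

```python
def local_maxima(values, previous=None, following=None):
    """Keep each value that is strictly larger than both neighbours, else None."""
    low = float("-inf")  # assumption: a missing neighbour compares as smaller
    padded = [low if previous is None else previous, *values,
              low if following is None else following]
    return [cur if prev < cur > nxt else None
            for prev, cur, nxt in zip(padded, padded[1:], padded[2:])]


def local_maxima_by_year(rows):
    """rows: one list of half-month values per year, in chronological order."""
    result = []
    for i, row in enumerate(rows):
        previous = rows[i - 1][-1] if i > 0 else None              # dec2 of the year before
        following = rows[i + 1][0] if i + 1 < len(rows) else None  # jan1 of the year after
        result.append(local_maxima(row, previous, following))
    return result


# Shortened to six values per year for illustration only
years = [[20, 22, 25, 10, 32, 30],
         [22, 25, 30, 20, 30, 25]]
print(local_maxima_by_year(years))
```

With several stations, the rows would have to be grouped per station first so that the previous and following values do not cross station boundaries.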

Related

Find linear trend up to the maximum value using awk

I have a datafile as below:
ifile.txt
-10 /
-9 /
-8 /
-7 3
-6 4
-5 13
-4 16
-3 17
-2 23
-1 26
0 29
1 32
2 35
3 38
4 41
5 40
6 35
7 30
8 25
9 /
10 /
Here "/" are the missing values. I would like to compute the linear trend up to the maximum value in the y-axis (i.e. up to the value "41" in 2nd column). So it should calculate the trend from the following data:
-7 3
-6 4
-5 13
-4 16
-3 17
-2 23
-1 26
0 29
1 32
2 35
3 38
4 41
The other (x, y) pairs should not be considered because they come after the maximum at (4, 41) and their y values are less than 41.
The following script is working fine for all values:
awk '!/\//{sx+=$1; sy+=$2; c++;
sxx+=$1*$1; sxy+=$1*$2}
END {det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")}' ifile.txt
But I am not able to restrict it to the values up to the maximum.
For the given example the result should be 3.486.
Updated based on your comments. I assumed your trend calculations were good and used them:
$ awk '
$2!="/" {
b1[++j]=$1 # buffer them up until or if used
b2[j]=$2
if(max=="" || $2>max) { # once a bigger than current max found
max=$2 # new champion
for(i=1;i<=j;i++) { # use all so far buffered values
# print b1[i], b2[i] # debug to see values used
sx+=b1[i] # Your code from here on
sy+=b2[i]
c++
sxx+=b1[i]*b1[i]
sxy+=b1[i]*b2[i]
}
j=0 # buffer reset
delete b1
delete b2
}
}
END {
det=c*sxx-sx*sx
print (det?(c*sxy-sx*sy)/det:"DIV0")
}' file
For data:
0 /
1 1
2 2
3 4
4 3
5 5
6 10
7 7
8 8
with debug print uncommented program would output:
1 1
2 2
3 4
4 3
5 5
6 10
1.51429
You can update the accumulated rows only when $2 > max and save the intermediate rows in variables until then, for example using associative arrays:
awk '
$2 == "/" {next}
$2 > max {
# update max if $2 > max
max = $2;
# add all elements of a1 to a and b1 to b
for (k in a1) {
a[k] = a1[k]; b[k] = b1[k]
}
# add the current row to a, b
a[NR] = $1; b[NR] = $2;
# reset a1, b1
delete a1; delete b1;
next;
}
# if $2 <= max, then set a1, b1
{ a1[NR] = $1; b1[NR] = $2 }
END{
for (k in a) {
#print k, a[k], b[k]
sx += a[k]; sy += b[k]; sxx += a[k]*a[k]; sxy += a[k]*b[k]; c++
}
det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")
}
' ifile.txt
#3.48601
Or calculate sx, sy etc directly instead of using arrays:
awk '
$2 == "/" {next}
$2 > max {
# update max if $2 > max
max = $2;
# add the current Row plus the cached values
sx += $1+sx1; sy += $2+sy1; sxx += $1*$1+sxx1; sxy += $1*$2+sxy1; c += 1+c1
# reset the cached variables
sx1 = 0; sy1 = 0; sxx1 = 0; sxy1 = 0; c1 = 0;
next;
}
# if $2 <= max, then calculate and cache the values
{ sx1 += $1; sy1 += $2; sxx1 += $1*$1; sxy1 += $1*$2; c1++ }
END{
det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")
}
' ifile.txt
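As a cross-check of the awk answers, here is a small Python sketch of the same calculation. The function name `slope_up_to_max` and the convention of passing `None` for the "/" entries are assumptions made for this illustration.

```python
def slope_up_to_max(points):
    """Least-squares slope of y on x, using rows up to the first maximum of y.

    points: (x, y) pairs in file order; y is None where the file has "/".
    """
    valid = [(x, y) for x, y in points if y is not None]
    if not valid:
        return None
    # max() returns the first maximal item, so ties resolve to the earliest peak
    peak = max(range(len(valid)), key=lambda i: valid[i][1])
    used = valid[:peak + 1]
    n = len(used)
    sx = sum(x for x, _ in used)
    sy = sum(y for _, y in used)
    sxx = sum(x * x for x, _ in used)
    sxy = sum(x * y for x, y in used)
    det = n * sxx - sx * sx
    return (n * sxy - sx * sy) / det if det else None


# The contents of ifile.txt from the question
ys = [None, None, None, 3, 4, 13, 16, 17, 23, 26, 29, 32, 35, 38, 41,
      40, 35, 30, 25, None, None]
points = list(zip(range(-10, 11), ys))
print(round(slope_up_to_max(points), 5))  # 3.48601
```

Unlike the streaming awk versions, this sketch reads all rows into memory first, which keeps the code short but would matter for very large files.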

Subset sum with maximum equal sums and without using all elements

You are given a set of integers and your task is the following: split them into 2 subsets with an equal sum in such way that these sums are maximal. You are allowed not to use all given integers, that's fine. If it's just impossible, report error somehow.
My approach is rather straightforward: at each step, we pick a single item, mark it as visited, update current sum and pick another item recursively. Finally, try skipping current element.
It works on simpler test cases, but it fails one:
T = 1
N = 25
Elements: 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5
One can run it as follows:
1 25 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5
I expect the sum to be equal to 239, but the algorithm fails to find such a solution.
I've ended up with the following code:
#include <iostream>
#include <unordered_set>
using namespace std;
unordered_set<uint64_t> visited;
const int max_N = 50;
int data[max_N];
int p1[max_N];
int p2[max_N];
int out1[max_N];
int out2[max_N];
int n1 = 0;
int n2 = 0;
int o1 = 0;
int o2 = 0;
int N = 0;
void max_sum(int16_t &sum_out, int16_t sum1 = 0, int16_t sum2 = 0, int idx = 0) {
if (idx < 0 || idx > N) return;
if (sum1 == sum2 && sum1 > sum_out) {
sum_out = sum1;
o1 = n1;
o2 = n2;
for(int i = 0; i < n1; ++i) {
out1[i] = p1[i];
}
for (int i = 0; i < n2; ++i) {
out2[i] = p2[i];
}
}
if (idx == N) return;
uint64_t key = (static_cast<uint64_t>(sum1) << 48) | (static_cast<uint64_t>(sum2) << 32) | idx;
if (visited.find(key) != visited.end()) return;
visited.insert(key);
p1[n1] = data[idx];
++n1;
max_sum(sum_out, sum1 + data[idx], sum2, idx + 1);
--n1;
p2[n2] = data[idx];
++n2;
max_sum(sum_out, sum1, sum2 + data[idx], idx + 1);
--n2;
max_sum(sum_out, sum1, sum2, idx + 1);
}
int main() {
int T = 0;
cin >> T;
for (int t = 1; t <= T; ++t) {
int16_t sum_out;
cin >> N;
for(int i = 0; i < N; ++i) {
cin >> data[i];
}
n1 = 0;
n2 = 0;
o1 = 0;
o2 = 0;
max_sum(sum_out);
int res = 0;
int res2 = 0;
for (int i = 0; i < o1; ++i) res += out1[i];
for (int i = 0; i < o2; ++i) res2 += out2[i];
if (res != res2) cerr << "ERROR: " << "res1 = " << res << "; res2 = " << res2 << '\n';
cout << "#" << t << " " << res << '\n';
visited.clear();
}
}
I have the following questions:
Could someone help me to troubleshoot the failing test? Are there any obvious problems?
How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.
Is there a better approach? Maybe using dynamic programming?
Another approach is to enumerate all the numbers in the range [1, 2^N - 2].
Let each bit position correspond to one element, iterate over all numbers in that range, and check each number as follows.
If a bit is set, put the corresponding element in set1, otherwise put it in set2; then check whether the sums of both sets are equal. This way you get all possible splits; if you want just one, break as soon as you find it. Note that this variant always assigns every element to one of the two sets and takes exponential time, so it is only practical for small N.
1) Could someone help me to troubleshoot the failing test? Are there any obvious problems?
The only issue I could see is that you have not set sum_out to 0.
When I tried running the program it seemed to work correctly for your test case.
2) How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.
See the answer to question 3
3) Is there a better approach? Maybe using dynamic programming?
You are currently keeping track of whether you have seen each combination of (sum of the first subset, sum of the second subset, position in the array).
If instead you keep track of the difference between the values then the complexity significantly reduces.
In particular, you can use dynamic programming to store an array A[diff] that for each value of the difference either stores -1 (to indicate that the difference is not reachable), or the greatest value of subset1 when the difference between subset1 and subset2 is exactly equal to diff.
You can then iterate over the entries in the input and update the array based on either assigning each element to subset1/subset2/ or not at all. (Note you need to make a new copy of the array when computing this update.)
In this form there is no need for unordered_set because you can simply use a plain C array. The two subsets are also interchangeable, so you only need to keep non-negative differences.
Example Python Code
from collections import defaultdict

data = list(map(int, "5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5".split()))
A = defaultdict(int)  # Map from difference to best value of subset sum 1
A[0] = 0  # We start with a difference of 0
for a in data:
    A2 = defaultdict(int)

    def add(s1, s2):
        # Store the state keyed by the non-negative difference s2 - s1
        if s1 > s2:
            s1, s2 = s2, s1
        d = s2 - s1
        if d in A2:
            A2[d] = max(A2[d], s1)
        else:
            A2[d] = s1

    for diff, sum1 in A.items():
        sum2 = sum1 + diff
        add(sum1, sum2)      # leave the element out
        add(sum1 + a, sum2)  # put it in subset 1
        add(sum1, sum2 + a)  # put it in subset 2
    A = A2
print(A[0])
This prints 239 as the answer.
For simplicity I haven't bothered with the optimization of using a linear array instead of the dictionary.
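The linear-array variant mentioned above could look roughly like the following sketch. The function name `max_equal_split`, the `-1` sentinel for unreachable differences, and sizing the array by the total of all values are choices made for this illustration rather than part of the original answer.

```python
def max_equal_split(values):
    """Largest sum s such that two disjoint subsets of values both sum to s (0 if none)."""
    total = sum(values)
    unreachable = -1
    # best[d] = largest sum of the smaller subset when the two sums differ by d
    best = [unreachable] * (total + 1)
    best[0] = 0
    for v in values:
        nxt = best[:]  # copying covers the "leave v out" choice
        for d, s in enumerate(best):
            if s == unreachable:
                continue
            # v goes to the larger subset: the difference grows
            if d + v <= total:
                nxt[d + v] = max(nxt[d + v], s)
            # v goes to the smaller subset
            if v <= d:
                nxt[d - v] = max(nxt[d - v], s + v)
            else:
                # the smaller subset overtakes the larger one
                nxt[v - d] = max(nxt[v - d], s + d)
        best = nxt
    return best[0]


data = [5, 27, 24, 12, 12, 2, 15, 25, 32, 21, 37, 29, 20, 9, 24, 35,
        26, 8, 31, 5, 25, 21, 28, 3, 5]
print(max_equal_split(data))
```

The running time is proportional to the number of elements times the sum of all values, and the same two-array update translates directly to plain C.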
A very different approach would be to use a constraint or mixed integer solver. Here is a possible formulation.
Let
x(i,g) = 1 if value v(i) belongs to group g
0 otherwise
The optimization model can look like:
max s
s = sum(i, x(i,g)*v(i)) for all g
sum(g, x(i,g)) <= 1 for all i
For two groups we get:
---- 31 VARIABLE s.L = 239.000
---- 31 VARIABLE x.L
g1 g2
i1 1
i2 1
i3 1
i4 1
i5 1
i6 1
i7 1
i8 1
i9 1
i10 1
i11 1
i12 1
i13 1
i14 1
i15 1
i16 1
i17 1
i18 1
i19 1
i20 1
i21 1
i22 1
i23 1
i25 1
We can easily do more groups. E.g. with 9 groups:
---- 31 VARIABLE s.L = 52.000
---- 31 VARIABLE x.L
g1 g2 g3 g4 g5 g6 g7 g8 g9
i2 1
i3 1
i4 1
i5 1
i6 1
i7 1
i8 1
i9 1
i10 1
i11 1
i12 1
i13 1
i14 1
i15 1
i16 1
i17 1
i19 1
i20 1
i21 1
i22 1
i23 1
i24 1
i25 1
If there is no solution, the solver will select zero elements in each group with a sum s=0.

How do you find which row and column a number belongs to in Floyd Triangle

How do you find which row and column a number belongs to in Floyd's triangle?
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55
For example,
33 is in the 8th row and 5th column (input 33 → output 8th row, 5th column)
46 is in the 10th row and 1st column
27 is in the 7th row and 6th column
Thank you so much in advance!
Note that the n-th row ends with the value n*(n+1)/2. So you can set up a quadratic equation and solve it to get the row number for a given number k:
n*(n+1)/2 = k
n^2 + n - 2*k = 0
D = 1 + 8*k
n_row = Ceil((-1 + Sqrt(D)) / 2) //round float value up
For example, for k=33 you can calculate
n_row = Ceil((-1 + Sqrt(265)) / 2) =
Ceil(7.639) =
8
Having n_row, find the last number of previous row and position of k in the current row
n_Column = 33 - n_row * (n_row - 1) / 2 =
33 - 28 =
5
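The closed form above can be sketched in Python with integer arithmetic only, which avoids floating-point rounding for large k. The function name `floyd_position` is made up for this example, and `math.isqrt` requires Python 3.8 or later.

```python
import math


def floyd_position(k):
    """Return (row, column) of k in Floyd's triangle, both 1-based."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Largest r whose row end r*(r+1)/2 does not exceed k
    r = (math.isqrt(8 * k + 1) - 1) // 2
    if r * (r + 1) // 2 < k:
        r += 1  # k lies past the end of row r, so it is in the next row
    col = k - r * (r - 1) // 2  # subtract the last number of the previous row
    return r, col


for k in (33, 46, 27):
    print(k, floyd_position(k))
```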
Pseudocode for alternative method of row finding:
sum = 0
row = 0
while sum < k do
row++
sum = sum + row
I think that this approach is somewhat more natural:
#include <iostream>
size_t getRow(size_t n)
{ // just count the rows, and when you meet the number, return the row
size_t row(0), k(1);
for (row = 1; row <= n; row++)
{
for (size_t j = 1; j <= row; j++)
{
if (k == n)
{
goto end;
}
k++;
}
}
end:return row;
}
size_t getCol(size_t n)
{ /* well, we have already found the row, so now knowing that every n-th row starts
with n(n-1)/2+1 and ends with n(n+1)/2, just count the columns and when you
meet the number (and that surely will happen), just return the column and you're done*/
size_t col(1);
size_t r = getRow(n);
for (size_t j = r * (r - 1) / 2+1; j <= r*(r+1)/2; j++)
{
if (j == n)
{
break;
}
col++;
}
return col;
}
int main()
{
size_t n;
std::cin >> n;
std::cout << "Number " << n << " lies in row " << getRow(n) << ", column " << getCol(n) << " of the Floyd's triangle.\n";
return 0;
}
In python this looks like this (if you don't want to use sqrt):
def rc(n):
    # rc(1) -> (1, 1); rc(33) -> (8, 5)
    assert n > 0 and int(n) == n
    total = 0
    row = 0
    while total < n:
        row += 1
        total += row
    col = n - total + row
    return row, col

SAS 9.3 Sorting Data by rows

I have an excel file that I imported into SAS that contains 3 variables and 3 observations.
All values are numbers.
24 12 47
99 30 14
50 5 41
Is there a way I can code so that each row is sorted in ascending order?
Result would be:
12 24 47
14 30 99
5 41 50
I need to do this for several excel files that contain huge number of variables and observations.
Thank You.
The simple way is to use CALL SORTN which sorts across rows.
data have;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
data have;
modify have;
call sortn(of _numeric_);
run;
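For comparison, the same row-wise sort can be written in a few lines of Python. This is only an illustrative sketch on the three sample rows, with the data typed in by hand rather than read from the Excel file.

```python
# Sample rows from the question
rows = [[24, 12, 47],
        [99, 30, 14],
        [50, 5, 41]]

# Sort each row independently in ascending order
sorted_rows = [sorted(row) for row in rows]
print(sorted_rows)  # [[12, 24, 47], [14, 30, 99], [5, 41, 50]]
```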
I would use a FCMP sort routine. FCMP functions and subroutines only allow temporary arrays to be passed to them for modification. So you have to assign the values into a temporary array, sort, and then reassign to the permanent variables.
Modify the code below for your number of columns and column names.
options cmplib=work.cmp;
proc fcmp outlib=work.cmp.fns;
subroutine qsort(arr[*],lo,hi);
outargs arr;
i = lo;
j = hi;
do while (i < hi);
pivot = arr[floor((lo+hi)/2)];
do while (i<=j);
do while (arr[i] < pivot);
i = i + 1;
end;
do while (arr[j] > pivot);
j = j - 1;
end;
if (i<=j) then do;
t = arr[i];
arr[i] = arr[j];
arr[j] = t;
i = i + 1;
j = j - 1;
end;
end;
if (lo < j) then
call qsort(arr,lo,j);
lo = i;
j = hi;
end;
endsub;
run;
quit;
data test;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
%let ncol=3;
%let cols = a b c;
data sorted;
set test;
array vars[&ncol] &cols;
/*Only temporary arrays can be passed to FCMP functions*/
array tmp[&ncol] _temporary_;
/*Assign to tmp*/
do i=1 to &ncol;
tmp[i] = vars[i];
end;
/*Sort*/
call qsort(tmp,1,&ncol);
/*Put back sorted values*/
do i=1 to &ncol;
vars[i] = tmp[i];
end;
drop i;
run;
Though there's a package SAS/IML designed specifically for manipulations with matrices (where, I believe, this task would be trivial), it still can be done with SAS Base using a couple of PROCs wrapped into macro loop.
data raw;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
proc transpose data=raw out=raw_t(drop=_:); run;
proc sql noprint;
select name into :vars separated by ' '
from sashelp.vcolumn
where libname='WORK' and memname='RAW_T';
quit;
%macro sort_rows;
%do i=1 %to %sysfunc(countw(&vars));
proc sort data=raw_t(keep=%scan(&vars,&i)) out=column;
by %scan(&vars,&i);
run;
data sortedrows;
%if &i>1 %then set sortedrows;;
set column;
run;
%end;
%mend sort_rows;
%sort_rows
proc transpose data=sortedrows out=sortedrows(drop=_:); run;
First, you transpose your original dataset.
Then you iterate through all columns (which were rows originally) one by one, sorting them and right-joining to each other.
And finally, transpose everything back.

Vectorize find function in a for loop

I have the following code which outputs the values of array1 that are less than or equal to each element of array2. The two arrays are not the same length. This for loop is pretty slow since the arrays are large (~500,000 elements). FYI, both arrays are always in ascending order.
Any help making this a vector operation and speeding it up would be appreciated.
I was considering some kind of multi-step process of interp1() with the 'nearest' option. Then finding where the corresponding outArray was larger than array2 and then fixing points somehow ... but I thought there had to be a better way.
array2 = [5 6 18 25];
array1 = [1 5 9 15 22 24 31];
outArray = nan(size(array2));
for a =1:numel(array2)
outArray(a) = array1(find(array1 <= array2(a),1,'last'));
end
returns:
outArray =
5 5 15 24
Here is one possible vectorization:
[~,idx] = max(cumsum(bsxfun(@le, array1', array2)));
outArray = array1(idx);
EDIT:
In recent editions, and thanks to JIT compilations, MATLAB has gotten pretty good at executing good old non-vectorized loops.
Below is some code similar to yours that takes advantage of the fact that the two arrays are sorted (thus if pos(a) = find(array1<=array2(a), 1, 'last') then we are guaranteed that pos(a+1) computed at the next iteration will be no less than the previous pos(a))
pos = 1;
idx = zeros(size(array2));
for a=1:numel(array2)
while pos <= numel(array1) && array1(pos) <= array2(a)
pos = pos + 1;
end
idx(a) = pos-1;
end
%idx(idx==0) = []; %# in case min(array2) < min(array1)
outArray = array1(idx);
Note: The commented line handles the case when the minimum value of array2 is less than the minimum value of array1 (i.e when find(array1<=array2(a)) is empty)
I performed a comparison between all the methods posted so far, and this is indeed the fastest one. The timings (performed using the TIMEIT function) for vectors of length N=5000 were:
0.097398 # your code
0.39127 # my first vectorized code
0.00043361 # my new code above
0.0016276 # Mohsen Nosratinia's code
and here are the timings for N=500000:
(? too-long) # your code
(out-of-mem) # my first vectorized code
0.051197 # my new code above
0.25206 # Mohsen Nosratinia's code
.. a pretty good improvement from the initial 10 minutes you reported down to 0.05 second!
Here is the test code if you want to reproduce the results:
function [t,v] = test_array_find()
%array2 = [5 6 18 25];
%array1 = [1 5 9 15 22 24 31];
N = 5000;
array1 = sort(randi([100 1e6], [1 N]));
array2 = sort(randi([min(array1) 1e6], [1 N]));
f = {...
@() func1(array1,array2); %# Aero Engy
@() func2(array1,array2); %# Amro
@() func3(array1,array2); %# Amro
@() func4(array1,array2); %# Mohsen Nosratinia
};
t = cellfun(@timeit, f);
v = cellfun(@feval, f, 'UniformOutput',false);
assert( isequal(v{:}) )
end
function outArray = func1(array1,array2)
%idx = arrayfun(@(a) find(array1<=a, 1, 'last'), array2);
idx = zeros(size(array2));
for a=1:numel(array2)
idx(a) = find(array1 <= array2(a), 1, 'last');
end
outArray = array1(idx);
end
function outArray = func2(array1,array2)
[~,idx] = max(cumsum(bsxfun(@le, array1', array2)));
outArray = array1(idx);
end
function outArray = func3(array1,array2)
pos = 1;
lastPos = numel(array1);
idx = zeros(size(array2));
for a=1:numel(array2)
while pos <= lastPos && array1(pos) <= array2(a)
pos = pos + 1;
end
idx(a) = pos-1;
end
%idx(idx==0) = []; %# in case min(array2) < min(array1)
outArray = array1(idx);
end
function outArray = func4(array1,array2)
[~,I] = sort([array1 array2]);
a1size = numel(array1);
J = find(I>a1size);
outArray = nan(size(array2));
for k=1:numel(J),
if I(J(k)-1)<=a1size,
outArray(k) = array1(I(J(k)-1));
else
outArray(k) = outArray(k-1);
end
end
end
One reason for it being slow is that you are comparing all elements in array1 with all elements in array2 so if they contain M and N elements, respectively, the complexity is O(M*N). However, since the arrays are already sorted there is a linear-time, O(M+N), solution for it
array2 = [5 6 18 25];
array1 = [1 5 9 15 22 24 31];
outArray = nan(size(array2));
k1 = 1;
n1 = numel(array1);
n2 = numel(array2);
ks = 1;
while ks <= n2 && array2(ks) < array1(1)
ks = ks + 1;
end
for k2=ks:n2
while k1 < n1 && array2(k2) >= array1(k1+1)
k1 = k1+1;
end
outArray(k2) = array1(k1);
end
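The same linear-time idea, written as a short Python sketch for readers who want to trace it by hand. The function name `last_less_equal` and returning `None` when no element of array1 qualifies are assumptions of this sketch, not part of the MATLAB answer.

```python
def last_less_equal(array1, array2):
    """For each element of sorted array2, the largest element of sorted array1 that is <= it."""
    out = []
    pos = 0  # number of array1 elements already known to be <= the current target
    for target in array2:
        # pos never moves backwards because both inputs are ascending
        while pos < len(array1) and array1[pos] <= target:
            pos += 1
        out.append(array1[pos - 1] if pos > 0 else None)
    return out


print(last_less_equal([1, 5, 9, 15, 22, 24, 31], [5, 6, 18, 25]))  # [5, 5, 15, 24]
```

Each index only moves forward, so the total work is proportional to the combined length of the two arrays.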
Here is a test case to measure the time it takes for each method to run for two arrays of length 500,000.
array2 = 1:500000;
array1 = array2-1;
tic
outArray1 = nan(size(array2));
k1 = 1;
n1 = numel(array1);
n2 = numel(array2);
ks = 1;
while ks <= n2 && array2(ks) < array1(1)
ks = ks + 1;
end
for k2=ks:n2
while k1 < n1 && array2(k2) >= array1(k1+1)
k1 = k1+1;
end
outArray1(k2) = array1(k1);
end
toc
tic
outArray2 = nan(size(array2));
for a =1:numel(array2)
outArray2(a) = array1(find(array1 <= array2(a),1,'last'));
end
toc
And the result is
Elapsed time is 0.067637 seconds.
Elapsed time is 418.458722 seconds.
NOTE: This was my initial solution and is the one that is benchmarked in Amro's answer. However, it is slower than the linear-time solution that I provided in my other answer.
One reason for it being slow is that you are comparing all elements in array1 with all elements in array2, so if they contain M and N elements the complexity is O(M*N). However, you can concatenate them and sort them together to get a faster algorithm with complexity O((M+N)*log2(M+N)). Here is one way of doing it:
array2 = [5 6 18 25];
array1 = [1 5 9 15 22 24 31];
[~,I] = sort([array1 array2]);
a1size = numel(array1);
J = find(I>a1size);
outArray = nan(size(array2));
for k=1:numel(J),
if I(J(k)-1)<=a1size,
outArray(k) = array1(I(J(k)-1));
else
outArray(k) = outArray(k-1);
end
end
disp(outArray)
% Test using original code
outArray = nan(size(array2));
for a =1:numel(array2)
outArray(a) = array1(find(array1 <= array2(a),1,'last'));
end
disp(outArray)
The concatenated array will be
>> [array1 array2]
ans =
1 5 9 15 22 24 31 5 6 18 25
and
>> [B,I] = sort([array1 array2])
B =
1 5 5 6 9 15 18 22 24 25 31
I =
1 2 8 9 3 4 10 5 6 11 7
It shows that in the sorted array B the first 5 comes from the second position in the concatenated array, the second 5 from the eighth position, and so on. So to find the largest element in array1 that is less than or equal to a given element in array2, we just need to go through all indices in I that are larger than the size of array1 (and therefore belong to array2) and step back to the closest index belonging to array1. J contains the positions of these elements in vector I:
>> J = find(I>a1size)
J =
3 4 7 10
Now the for loop goes through these indices and checks whether, in I, the index right before each index referenced from J belongs to array1 or not. If it belongs to array1, the loop retrieves its value from array1; otherwise it copies the value found for the previous index.
Note that both your code and this code fail if array2 contains an element that is smaller than the smallest element in array1.

Resources