I wish to run a series of multinomial logits (600ish per covariate of interest) and gather the z-statistics from each of these (I do not care about the order in which these are recorded).
These mlogits are run on a small piece of my data (sharing a group ID). The mlogits have a varying number of outcomes involved (n), and there will be (n - 1) z statistics to gather from each mlogit. Each mlogit takes the form: y = a + _b*x + \epsilon where y can take on between 2 and 9 values (in my data), although the mean is 3.7.
I believe the difficulty comes in pulling these z-stats out of the mlogit, as there is no way I know to directly call a matrix of z-stats. My solution is to construct the z-stats from the e(V) and e(b) matrices. For each iteration of the mlogit, I construct a matrix of z-stats; I then append this to the previous matrix of z-stats (thereby building a matrix of all of them calculated). Unfortunately, my code does not seem to do this properly.
The symptoms are as follows. The matrix mat_covariate includes many missing values (over half of the matrix values have been missing in the troubleshooting I have done). It also includes many zeroes (which are possible, but unlikely - especially at this rate, about 16%). As written, the code does not yet suppress the mlogits I run, and so I can go back and check what makes it into the matrix. At most one value from each mlogit is recorded, but these are often recorded multiple times. 40% of the mlogits had nothing recorded.
The relevant loop is below:
local counter = 1
forvalues i = 1/`times' {
preserve
keep if group_id==`i'
foreach covariate in `covariates' {
if `counter' == 1 {
mlogit class `covariate'
sum outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate' = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'[`j',1] = `z'
}
}
else {
mlogit class `covariate'
sum outcomes_n, meanonly
local max `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate'_temp = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'_temp[`j',1] = `z'
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
}
local counter = `counter'+1
restore
}
Some reasons for why I did some of the things in the loop. I believe these things work, but they are not my first instincts, and I am unclear why my first instinct does not work. If there's a simpler/more elegant way to solve them, that would be a nice bonus:
the main if/else (and the counter) is to solve the issue that I cannot define a matrix as a function of itself when it has not yet been defined.
I define a local for the max, and a separate one for the (max-1). The forvalues loop would not accept "1/(`max'-1) {" and I am unsure why.
I created some sample data that can be used to replicate this problem. Below is code for a .do file which sets up data, locals for the loop, the loop above, and demonstrates the symptoms by displaying the matrix in question:
clear all
version 14
//================== sample data: ==================
set obs 500
set seed 12345
gen id = _n
gen group_id = .
replace group_id = 1 if id <= 50
replace group_id = 2 if id <= 100 & missing(group_id)
replace group_id = 3 if id <= 150 & missing(group_id)
replace group_id = 4 if id <= 200 & missing(group_id)
replace group_id = 5 if id <= 250 & missing(group_id)
replace group_id = 6 if id <= 325 & missing(group_id)
replace group_id = 7 if id <= 400 & missing(group_id)
replace group_id = 8 if id <= 500 & missing(group_id)
gen temp_subgroup_id = .
replace temp_subgroup_id = floor((3)*runiform() + 2) if group_id < 6
replace temp_subgroup_id = floor((4)*runiform() + 2) if group_id < 8 & missing(temp_subgroup_id)
replace temp_subgroup_id = floor((5)*runiform() + 2) if missing(temp_subgroup_id)
egen subgroup_id = group(group_id temp_subgroup_id)
bysort subgroup_id : gen subgroup_size = _N
bysort group_id subgroup_id : gen tag = (_n == 1)
bysort group_id : egen outcomes_n = total(tag)
gen binary_x = floor(2*runiform())
//================== locals: ==================
local covariates binary_x
local times = 8
// times is equal to the number of group_ids
//================== loop in question: ==================
local counter = 1
forvalues i = 1/`times' {
preserve
keep if group_id==`i'
foreach covariate in `covariates' {
if `counter' == 1 {
mlogit subgroup_id `covariate'
sum outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate' = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'[`j',1] = `z'
}
}
else {
mlogit subgroup_id `covariate'
sum outcomes_n, meanonly
local max `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate'_temp = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'_temp[`j',1] = `z'
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
}
local counter = `counter' + 1
restore
}
//================== symptoms: ==================
matrix list mat_binary_x
I'm trying to figure out what is wrong in my code, but have been unable to find the issue (although I've found some other smaller errors, but none that have had an impact on the main problem - I would be unsurprised if there are multiple bugs).
Consider the simplest case when i == 1 and max_minus == 2:
preserve
keep if group_id == 1
summarize outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
mlogit subgroup_id binary_x
matrix V = e(V)
matrix b = e(b)
This produces the following:
. matrix list V
symmetric V[6,6]
1: 1: 2: 2: 3: 3:
o. o.
binary_x _cons binary_x _cons binary_x _cons
1:binary_x .46111111
1:_cons -.225 .225
2:o.binary_x 0 0 0
2:o._cons 0 0 0 0
3:binary_x .2111111 -.09999999 0 0 .47896825
3:_cons -.09999999 .09999999 0 0 -.24285714 .24285714
. matrix list b
b[1,6]
1: 1: 2: 2: 3: 3:
o. o.
binary_x _cons binary_x _cons binary_x _cons
y1 .10536052 -.22314364 0 0 .23889194 -.35667502
. local j = `max_minus'
. display "z = `= b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)'"
z = .
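The value of z is missing because e(b) is a 1 x 6 row vector, so
b[1+2*(`j'-1),1] (here b[3,1]) refers to a row that does not exist.
Note also that the entries for the base outcome (equation 2 here) are
omitted, so their coefficients and variances are 0. In other words, the
subscripts in your loops are not set up correctly and pick out the wrong elements.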
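As a language-neutral illustration (written in Python, not Stata), the sketch below rebuilds the z-statistics for binary_x from the e(b) row vector and the diagonal of e(V) listed above. The helper name `z_stats` and its parameters are hypothetical; it assumes two parameters per equation (the covariate and _cons) and treats a zero variance as the marker of the omitted base outcome.

```python
import math

# Values copied from the `matrix list b` and `matrix list V` output above.
b = [0.10536052, -0.22314364, 0.0, 0.0, 0.23889194, -0.35667502]
v_diag = [0.46111111, 0.225, 0.0, 0.0, 0.47896825, 0.24285714]


def z_stats(coefs, variances, params_per_eq=2, position=0):
    """Return b / sqrt(var) for one parameter in every non-omitted equation."""
    out = []
    # Walk along the COLUMNS of the row vector, one equation at a time.
    for col in range(position, len(coefs), params_per_eq):
        if variances[col] > 0:  # zero variance: omitted base outcome, skip it
            out.append(coefs[col] / math.sqrt(variances[col]))
    return out


z = z_stats(b, v_diag)
print(z)  # one z-statistic per non-base outcome
```

The point of the sketch is only the indexing: the coefficients are addressed by column, and the base outcome is skipped rather than assumed to sit at the end.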
I'm currently working on an OpenVibe Session in which I must program a Lua Script. My problem is generating a random table with 2 values: 1s and 2s. If the value in table is 1, then send Stimulus through output 1. And if it's 2, then through output 2.
My question is how I can generate in Lua code a table of 52 1s and 2s (44 1s and 8 2s which correspond to 85% 1s and 15% 2s) in a way that you have at least 3 1s before the next 2s? Somehow like this: 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2.
I'm not an expert in Lua, so any help would be much appreciated.
local get_table_52
do
local cached_C = {}
local function C(n, k)
local idx = n * 9 + k
local value = cached_C[idx]
if not value then
if k == 0 or k == n then
value = 1
else
value = C(n-1, k-1) + C(n-1, k)
end
cached_C[idx] = value
end
return value
end
function get_table_52()
local result = {}
for j = 1, 52 do
result[j] = 1
end
local r = math.random(C(28, 8))
local p = 29
for k = 8, 1, -1 do
local b = 0
repeat
r = r - b
p = p - 1
b = C(p - 1, k - 1)
until r <= b
result[p + k * 3] = 2
end
return result
end
end
Usage:
local t = get_table_52()
-- t contains 44 ones and 8 twos, there are at least 3 ones before next two
Here is the logic.
You have 8 2s. Before each 2 there is a string of 3 1s. That's 32 of your numbers.
Those 8 groups of 1112 separate 9 spots that the remaining 20 1s can go.
So your problem is to randomly distribute 20 1s to 9 random places. And then take that collection of numbers and write out your list. So in untested code from a non-Lua programmer:
-- Populate buckets: spread the 20 spare 1s over the 9 gaps
local buckets = {0, 0, 0, 0, 0, 0, 0, 0, 0}
for k = 1, 20 do
  local bucket = math.random(9)  -- uniform integer in 1..9
  buckets[bucket] = buckets[bucket] + 1
end
-- Turn that into an array (Lua tables are 1-based)
local result = {}
local i = 1
for bucket = 1, 9 do
  -- Put buckets[bucket] 1s in result
  for j = 1, buckets[bucket] do
    result[i] = 1
    i = i + 1
  end
  -- Add our separating 1112 after every gap except the last
  if bucket < 9 then
    result[i] = 1
    result[i + 1] = 1
    result[i + 2] = 1
    result[i + 3] = 2
    i = i + 4
  end
end
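For comparison, here is a minimal Python sketch of the same bucket idea. The function name `make_sequence` and its parameters are hypothetical. Note that throwing the spare 1s into gaps at random produces valid sequences, but it is not claimed to pick every valid sequence with equal probability.

```python
import random


def make_sequence(rng, twos=8, min_gap=3, total=52):
    """Build a list of 1s and 2s with at least `min_gap` 1s before every 2."""
    # Each 2 comes with its own mandatory run of 1s; the rest are spare.
    spare = total - twos * (min_gap + 1)
    buckets = [0] * (twos + 1)  # gaps before, between and after the 2s
    for _ in range(spare):
        buckets[rng.randrange(twos + 1)] += 1

    seq = []
    for gap in range(twos + 1):
        seq.extend([1] * buckets[gap])       # spare 1s for this gap
        if gap < twos:
            seq.extend([1] * min_gap + [2])  # mandatory 1s, then the 2
    return seq


print(make_sequence(random.Random()))
```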
I have data that consists of a number of chunks. I know that they come from some continuous curve, but they were later shifted in the y-direction. Now I want to shift them back to estimate the original curve. Some parts are not shifted but simply absent. To clarify the situation, dummy code that generates something similar is below (Matlab):
%% generate some dummy data
knots = rand(10,2);
% fix starting and stop points
knots = [[0,rand()];knots;[1,rand()]];
% sort knots
knots=unique(knots,'rows');
% generate dummy curve
dummyX = linspace(0,1,10^4);
dummyY = interp1(knots(:,1),knots(:,2),dummyX,'spline');
figure()
subplot(2,1,1)
plot(dummyX,dummyY)
%% Add offset and wipe some parts
% get borders of chunks
borders = [1;randi([1,numel(dummyX)],20,1);numel(dummyX)];
borders = unique(borders);
borders = [borders(1:end-1)+1,borders(2:end)];
borders(1) = 1;
% add ofsets or nans
offset = (rand(size(borders,1),1)-0.5)*5;
offset(randperm(numel(offset),floor(size(borders,1)/3)))=nan;
for iBorder = 1:size(borders,1)
idx = borders(iBorder,1): borders(iBorder,2);
dummyY(idx)=dummyY(idx)+offset(iBorder);
dummyY(idx([1,end]))=nan;
end
subplot(2,1,2)
plot(dummyX,dummyY)
The original curve is on top, the shifted one on the bottom. I tried to shift the chunks pairwise, minimizing the length of the cubic spline, but it did not work for me. I understand that it is impossible to recover exactly the same curve (I may lose some peaks).
Could you help me find the best shifts?
I had several ideas for this and played with overall curvature, arc length, etc. as well as mixed combinations. Turned out that a simple chi**2 works best. So it goes as simple as this:
Get some knots to fit every chunk with a given precision by splines
join everything
reduce knots to avoid very close knots in touching sets, those can result in large curvature.
use leastsq fit on entire set with splines on joined and reduced set of knots to find shifts.
In theory one could play with / modify:
spline order
min knot density
max knot density
how adjacent sets are dealt with
adding a knot to a large gap
etc.
(Note: On some random data splrep produced error messages. As those are mostly not very helpful, I can only say that this code is not 100% robust.)
Code is as follows
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d, splrep, splev
from scipy.optimize import fmin, leastsq

def reduce_knots( inList, dist ):
    # merge knots that are closer than dist into their mean
    outList=[]
    addList=[]
    for i in inList:
        try:
            if abs( i - addList[ -1 ] ) < dist:
                addList += [ i ]
            else:
                outList += [ addList ]
                addList = [ i ]
        except IndexError:### basically the first
            addList = [ i]
    outList += [ addList ]
    return [ sum( x ) / len( x ) for x in outList ]

def adaptive_knots( inX, inY, thresh=.005 ):
    # add knots until the spline fit of one chunk is below thresh
    ll = len( inX )
    sup = ll - 4
    assert sup > 3
    nN = 3
    test = True
    while test:
        # np.int was removed from NumPy; the builtin int is equivalent here
        testknots = np.linspace( 1, len( inX ) - 2, nN, dtype=int )
        testknots = [ inX[ x ] for x in testknots ]
        myTCK= splrep( inX , inY, t=testknots )
        newY = splev( inX , myTCK )
        chi2 = np.sum( ( newY - inY )**2 ) / ll
        if chi2 > thresh:
            nN += 1
            if nN > sup:
                test = False
        else:
            test = False
    return testknots

def global_residuals( shiftList, xBlocks, yBlocks, allTheKnots ):# everything shifted (1 is redundant by global offset) Blocks must be ordered an np.arrays
    localYBlocks = [ s + yList for s, yList in zip( shiftList, yBlocks ) ]
    allTheX = np.concatenate( xBlocks )
    allTheY = np.concatenate( localYBlocks )
    tck = splrep( allTheX, allTheY, t=allTheKnots )
    yList = splev( allTheX, tck )
    diff = yList - allTheY
    return diff
#~ np.random.seed( 28561 )
np.random.seed( 5561 )
#~ np.random.seed( 733437 )
### python way for test data
knots = np.random.rand( 8, 2 )
knots = np.array( sorted( [ [ 0, np.random.rand() ] ] + list( knots ) + [ [ 1, np.random.rand() ] ], key=lambda x: x[ 0 ] ) )
dummyX = np.linspace( 0, 1, int( 3e4 ) )  # the number of points must be an int
f = interp1d( knots[ :, 0 ], knots[ :, 1 ], 'cubic' )
dummyY = f( dummyX )  # interp1d objects accept whole arr
chunk = np.append( [ 0 ], np.append( np.sort( np.random.randint( 7, high=len( dummyX ) - 10 , size= 10, dtype=int ) ), len( dummyX ) ) )
xDataDict = dict()
yDataDict = dict()
allX = np.array( [] )
allY = np.array( [] )
allK = np.array( [] )
allS = []
for i, val in enumerate(chunk[ : -1 ] ):
    if np.random.rand() < .75: ## 25% of not appearing
        xDataDict[ i ] = dummyX[ val:chunk[ i + 1 ] ]
        realShift = 1.5 * ( 1 - 2 * np.random.rand() )
        allS += [ realShift ]
        yDataDict[ i ] = dummyY[ val:chunk[ i + 1 ] ] + realShift
        # np.float was removed from NumPy; the builtin float is equivalent here
        yDataDict[ i ] = np.fromiter( ( np.random.normal( scale=.05, loc=y ) for y in yDataDict[ i ] ), float )
        allX = np.append( allX, xDataDict[ i ] )
        allY = np.append( allY, yDataDict[ i ] )
### Plotting
fig = plt.figure()
ax = fig.add_subplot( 3, 1, 1 )
ax.plot( knots[ :, 0 ],knots[ :, 1 ], ls='', c='r', marker='o')
ax.plot( dummyX , dummyY, '--' )
for key in xDataDict.keys():
    ax.plot(xDataDict[ key ], yDataDict[ key ] )
    myKnots = adaptive_knots( xDataDict[ key ], yDataDict[ key ] )
    allK = np.append( allK, myKnots )
    myTCK = splrep( xDataDict[ key ], yDataDict[ key ], t=myKnots )
    ax.plot( xDataDict[ key ], splev( xDataDict[ key ] , myTCK ) )
myTCK = splrep( allX, allY, t=allK )
ax.plot( allX, splev( allX, myTCK ) )
for x in allK:
    ax.axvline( x=x, linestyle=':', color='#AAAAAA', linewidth=1 )

### now fitting
myXBlockList = []
myYBlockList = []
for key in sorted( xDataDict.keys() ):
    myXBlockList += [ xDataDict[ key ] ]
    myYBlockList += [ yDataDict[ key ] ]
#start values: align the end of each chunk with the start of the next
s = [ 0 ]
for i,y in enumerate( myYBlockList[ :-1 ] ):
    ds = myYBlockList[ i + 1 ][ 0 ] - y[ -1 ]
    s += [ -ds ]
startShift = np.cumsum( s )
allK = reduce_knots( allK, .01 )
sol, ierr = leastsq( global_residuals, x0=startShift, args=( myXBlockList, myYBlockList, allK ), maxfev=10000 )
sol = np.array(sol) - sol[ 0 ]
print( "solution: ", -sol )
print( "real: ", np.array( allS ) - allS[ 0 ] )

### Plotting solutions
bx = fig.add_subplot( 3, 1, 3, sharex=ax )
for x, y, s in zip( myXBlockList, myYBlockList, sol ):
    bx.plot( x, y + s )
localYBlocks = [ s + yList for s,yList in zip( sol, myYBlockList ) ]
allTheX = np.concatenate( myXBlockList )
allTheY = np.concatenate( localYBlocks )
tck = splrep( allTheX, allTheY, t=allK )
dx = allTheX[ 1 ] - allTheX[ 0 ]
testX = np.arange( allTheX[ 0 ], allTheX[ -1 ], dx )
finalyList = splev( testX, tck)
bx.plot( testX, finalyList , 'k--' )
mean = sum( dummyY ) / len( dummyY ) - sum( finalyList ) / len( finalyList )
bx.plot( dummyX, dummyY - mean, '--' )
for x in allK:
    bx.axvline( x=x, linestyle=':', color='#AAAAAA', linewidth=1 )
cx = fig.add_subplot( 3, 1, 2, sharex=ax )
for x, y, s in zip( myXBlockList, myYBlockList, startShift ):
    cx.plot( x, y + s )
plt.show()
For small gaps this works nicely on the test data
The upper graph shows the original spline and its knots as red dots. This generated the data. Moreover, it shows the noisy shifted chunks, the initial fitting knots as vertical lines and an according spline fit.
Mid graph shows the chunks shifted by the pre-calculated start values - aligned ends.
Lower graph shows original spline, fitted spline, reduced knot positions, and chunks shifted according to the fit solution.
Naturally, the larger the gaps the more the solution deviates from the original
...but still quite good.
This problem appeared in some regional contest for ICPC.
Given n numbers, you have to remove numbers between i to j such that remaining numbers have least average. You can't remove first and last numbers.
2 <= n <= 10^5
We had a discussion about it, and I am still not able to understand it. Somehow this problem can be converted to finding the contiguous subarray with maximum sum, and then it was solved with binary search in O(n log n).
I couldn't follow that solution during the discussion, and even after thinking about it a lot I still don't understand it.
Link to the original problem in case it's not clear: http://programmingteam.cc.gatech.edu/contest/Mercer14/problems/6.pdf
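One way the reduction mentioned in the question could work is sketched below. This is a sketch under assumptions, not the contest's reference solution: it assumes at least one number must be removed, that n >= 3, and the function name `min_average_after_removal` is made up for illustration. The idea: the remaining average is at most x exactly when the sum of (a[k] - x) over the kept elements is at most 0. The kept sum is the total minus the removed block, so it is smallest when the removed interior block has maximum sum (Kadane's algorithm). Feasibility is monotone in x, so x can be binary searched.

```python
def min_average_after_removal(a, iterations=60):
    """Smallest average after removing one non-empty interior block of a."""
    n = len(a)
    assert n >= 3, "need at least one removable interior element"

    def feasible(x):
        # Shift every value by x; an average <= x becomes a sum <= 0.
        b = [v - x for v in a]
        total = sum(b)
        # Kadane: maximum-sum non-empty block inside b[1 .. n-2].
        best = cur = b[1]
        for v in b[2:n - 1]:
            cur = max(v, cur + v)
            best = max(best, cur)
        return total - best <= 0

    lo, hi = float(min(a)), float(max(a))  # the answer lies in this range
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(min_average_after_removal([5, 5, 1, 7, 8, 2]))
```

Each feasibility check is O(n), and a fixed number of halvings gives the O(n log n)-style bound quoted in the question (with the log referring to the required precision).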
Here is an approach that I think might work:
Compute the partial averages from the left for all elements with a running update, which can be done in O(N): a_L(i) = (a_L(i-1)*(i-1) + a(i))/i, where a(i) is the i-th element
Do the same for the partial averages from the right: a_R(i) = (a_R(i+1)*(N-i) + a(i))/(N-i+1)
Find the minimum of both lists.
If the minimum is in the left partial averages (a_L), look for the minimum right to it in the a_R and the other way around if the minimum is found in a_R.
All parts take O(N). Thus, this would result in an O(N) algorithm. Though, it sounds a little bit simple and I might be missing something.
Edit: The original answer stopped in the middle for both lists, which is insufficient on second thought.
Actually, if the minima overlap, I believe, there is no interval to cut out. Here is a little Python implementation of the algorithm:
grades = [5, 5, 1, 7, 8, 2]
N = len(grades)
glob_avg = float(sum(grades))/float(N)
print('total average: {0}'.format(glob_avg))
avg_L = grades[:]
avg_R = grades[:]
minL = 0
minR = N-1
for i in range(1,N):
    avg_L[i] = float(avg_L[i-1]*i + grades[i])/float(i+1)
    if avg_L[i] <= avg_L[minL]:
        minL = i
    avg_R[N-i-1] = float(avg_R[N-i]*i + grades[N-i-1])/float(i+1)
    if avg_R[N-i-1] <= avg_R[minR]:
        minR = N-i-1
opti_avg = glob_avg
if minL < minR:
    first = minL+1
    last = minR
    opti_avg = (avg_L[first-1]*first + avg_R[last]*(N-last)) / float(N + first - last)
    print('')
    print('Interval to cut: {0} - {1}'.format(first,last))
    for pre in grades[:first]:
        print('{0}'.format(pre))
    for cut in grades[first:last]:
        print('X {0} X'.format(cut))
    for post in grades[last:]:
        print('{0}'.format(post))
else:
    print('NO interval found that would reduce the avg!')
print('')
print('--------------------------------------')
print('minimal avg: {0:0.3f}'.format(opti_avg))
print('--------------------------------------')
I would try checking each value above the global minimum, starting with largest.
You can add to left or right (whichever is largest), as long as the average is above the global average.
Keep a note of any minimums to remaining items.
For each item >= global average
While( average( selected) > global average
If average(un selected items) < best so far
Best so far = selected range
End
Add to selection largest of left and right
End while
End for
Only by finding sequences which are above the average will a minimum for unselected work.
Any item which has been considered as a list can be discounted
Had a go at implementing in Python :-
lst = [ -1, -1,1,-90,1,3,-1,-1,1,2,3,1,2,3,4,1, -1,-1];
First solution - really an exhaustive test - which allows me to verify correctness.
lbound = 0
ubound = len( lst)
print( ubound );
# from http://math.stackexchange.com/questions/106700/incremental-averageing
def Average( lst, lwr, upr, runAvg = 0, runCnt = 0 ):
    cnt = runCnt;
    avg = runAvg;
    for i in range( lwr, upr ):
        cnt = cnt + 1
        avg = float(avg) + (float(lst[i]) - avg)/cnt
    return (avg, cnt )
bestpos_l = 0
bestpos_u = 0
bestpos_avg = 0
best_cnt = 0
######################################################
# solution in O(N^2) - works always
for i in range( 1, len( lst ) - 1 ):
    for j in range( i+1, len(lst ) ):
        tpl = Average( lst, 0, i ) # get lower end
        res = Average( lst, j, len(lst), tpl[0], tpl[1] )
        if (best_cnt == 0 or
            (best_cnt < res[1] and res[0] == bestpos_avg ) or
            res[0] < bestpos_avg ):
            bestpos_l = i
            bestpos_u = j
            bestpos_avg = res[0]
            best_cnt = res[1]
            print( "better", i,j, res[0], res[1] )
print( "solution 1", bestpos_l, bestpos_u, bestpos_avg, best_cnt )
This came up with valid answers, but I hadn't appreciated, with the current data set, it doesn't really want the right hand side.
########################################################
# O(N)
#
# Try and minimize left/right sides.
#
# This doesn't work - it knows -90 is really good, but can't decide if to
# ignore -90 from the left, or the right, so does neither.
#
lower = []
upper = []
lower_avg = 0
best_lower = lst[0]
lower_i = 0
best_upper = lst[-1]
upper_avg = 0
upper_i = len(lst) -1
cnt = 0
length = len(lst)
for i in range( 0, length ):
    cnt = cnt + 1
    lower_avg = float( lower_avg) + ( float(lst[i]) - lower_avg)/cnt
    upper_avg = float( upper_avg) + ( float(lst[-(i+1)]) - upper_avg)/cnt
    upper.append( upper_avg )
    lower.append( lower_avg )
    if lower_avg <= best_lower:
        best_lower = lower_avg
        lower_i = i
    if upper_avg <= best_upper:
        best_upper = upper_avg
        upper_i = (len(lst) - (i+1))
if( lower_i + 1 > upper_i ):
    sol2 = Average( lst,0, len(lst ))
else:
    sol_tmp = Average( lst,0, lower_i+1 )
    sol2 = Average( lst, upper_i, len(lst),sol_tmp[0],sol_tmp[1] )
print( "solution 2", lower_i + 1, upper_i, sol2[0],sol2[1] )
The third solution was what I was trying to explain. My implementation is limited because :-
Couldn't find a good way of finding starting points. I wanted to start from the biggest elements, as they are most likely to reduce the average, but haven't got a good way of finding them.
Wasn't sure about the stability of keeping running-averages. Thought about removing items from the average by un-doing each numbers effect. Wasn't sure how this affected precision.
Was fairly sure that any interval which has already been checked can't contain a starting item. That would limit further work, but I am unsure how best to implement it (keeping O(xx) to a minimum).
Solution 3
#################################
## can we remove first / last? if so, this needs adjusting
def ChooseNext( lst, lwr, upr ):
    if lwr > 1 and upr < len(lst) -2:
        # both sides available.
        if lst[lwr-1] > lst[upr]:
            return -1
        else:
            return 1
    elif lwr > 1:
        return -1
    elif upr < len(lst) -2:
        return 1
    return 0
# Maximize average of data removed.
glbl_average = Average( lst, 0, len(lst) )
found = False
min_pos = 0
max_pos = 0
best_average = glbl_average[0]
for i in range(1, len(lst ) - 1):
    # ignore stuff below average.
    if lst[i]> glbl_average[0] or (found == False ):
        lwr = i
        upr = i+1
        cnt = 1 # number for average
        avg = lst[i]
        tmp = Average( lst, 0, lwr)
        lcl = Average( lst, upr, len(lst ), tmp[0], tmp[1] )
        if found == False or lcl[0] < best_average:
            best_average = lcl[0]
            min_pos = lwr
            max_pos = upr
            found = True
        # extend from interval (lwr,upr]
        choice = ChooseNext( lst, lwr, upr )
        while( choice != 0 ):
            if( choice == -1):
                new_lwr = lwr -1
                new_upr = upr
            else:
                new_lwr = lwr
                new_upr = upr + 1
            tmp = Average( lst, 0, new_lwr )
            lcl_best = Average( lst, new_upr, len(lst), tmp[0], tmp[1] )
            if( lcl_best[0] > glbl_average[0]):
                choice = 0
            else:
                lwr = new_lwr
                upr = new_upr
                if lcl_best[0] < best_average:
                    min_pos = lwr
                    max_pos = upr
                    best_average = lcl_best[0]
                choice = ChooseNext( lst, lwr, upr )
print( "solution 3", min_pos, max_pos, best_average )
A sequence of integers is called zigzag sequence if each of its elements is either strictly less or strictly greater than its neighbors.
Example : The sequence 4 2 3 1 5 3 forms a zigzag, but 7 3 5 5 2 and 3 8 6 4 5 don't.
For a given array of integers we need to find the length of its largest (contiguous) sub-array that forms a zigzag sequence.
Can this be done in O(N) ?
Currently my solution is O(N^2) which is just simply taking every two points and checking each possible sub-array if it satisfies the condition or not.
I claim that the overlap of any 2 zigzag sub-sequences has length at most 1.
Proof by contradiction:
Assume a_i .. a_j is the longest zigzag sub-sequence, and there is another zigzag sub-sequence b_m...b_n overlapping it.
Without loss of generality, let's say the overlapping part is
a_i ... a_k...a_j
--------b_m...b_k'...b_n
a_k = b_m, a_k+1 = b_m+1....a_j = b_k' where k'-m = j-k > 0 (at least 2 elements are overlapping)
Then they can merge to form a longer zig-zag sequence, contradiction.
This means the only case they can be overlapping each other is like
3 5 3 2 3 2 3
3 5 3 and 3 2 3 2 3 overlap at 1 element
This can still be solved in O(N), I believe: just greedily extend the zigzag whenever possible. If that fails, move the iterator 1 element back and treat it as a new zigzag starting point.
Keep a record of the latest and the longest zigzag lengths you have found.
Walk along the array and see if the current item belongs to (fits the definition of) a zigzag. Remember the last zigzag start, which is either the array's start or the first zigzag element after the most recent non-zigzag element. This and the current item define some zigzag subarray. When it turns out to be longer than the previously found one, store the new longest zigzag length. Proceed till the end of the array and you should complete the task in O(N).
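The walk described above can be sketched in Python as follows. This is one possible reading, not the answerer's own code: the function name `longest_zigzag` is hypothetical, and it assumes that two equal neighbours break the zigzag, so a run of equal values yields length 1.

```python
def longest_zigzag(a):
    """Length of the longest contiguous zigzag sub-array, in one pass."""
    n = len(a)
    if n < 2:
        return n
    best = 1
    cur = 1          # length of the zigzag ending at the current element
    prev_sign = 0    # sign of the previous step: -1, 0 or +1
    for i in range(1, n):
        diff = a[i] - a[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            cur = 1              # equal neighbours: start over at a[i]
        elif sign == -prev_sign:
            cur += 1             # direction flipped: the zigzag continues
        else:
            cur = 2              # same direction (or fresh start): restart at a[i-1]
        prev_sign = sign
        best = max(best, cur)
    return best


print(longest_zigzag([4, 2, 3, 1, 5, 3]))
print(longest_zigzag([7, 3, 5, 5, 2]))
```

Restarting at a[i-1] rather than a[i] is what the "move the iterator 1 element back" remark amounts to: two neighbouring zigzags may share one element.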
Sorry I use perl to write this.
#!/usr/bin/perl
@a = ( 5, 4, 2, 3, 1, 5, 3, 7, 3, 5, 5, 2, 3, 8, 6, 4, 5 );
$n = scalar @a;
$best_start = 0;
$best_end = 1;
$best_length = 2;
$start = 0;
$end = 1;
$direction = ($a[0] > $a[1]) ? 1 : ($a[0] < $a[1]) ? -1 : 0;
for($i=2; $i<$n; $i++) {
    # a trick here, same value make $new_direction = $direction
    $new_direction = ($a[$i-1] > $a[$i]) ? 1 : ($a[$i-1] < $a[$i]) ? -1 : $direction;
    print "$a[$i-1] > $a[$i] : direction $new_direction Vs $direction\n";
    if ($direction != $new_direction) {
        $end = $i;
    } else {
        $this_length = $end - $start + 1;
        if ($this_length > $best_length) {
            $best_start = $start;
            $best_end = $end;
            $best_length = $this_length;
        }
        $start = $i-1;
        $end = $i;
    }
    $direction = $new_direction;
}
$this_length = $end - $start + 1;
if ($this_length > $best_length) {
    $best_start = $start;
    $best_end = $end;
    $best_length = $this_length;
}
print "BEST $best_start to $best_end length $best_length\n";
for ($i=$best_start; $i <= $best_end; $i++) {
    print $a[$i], " ";
}
print "\n";
For each index i, you can find the smallest j such that the subarray with index j,j+1,...,i-1,i is a zigzag. This can be done in two phases:
Find the longest "increasing" zig zag (starts with a[1]>a[0]):
start = 0
increasing[0] = 0
sign = true
for (int i = 1; i < n; i ++) {
    if ((arr[i] > arr[i-1] && sign) || (arr[i] < arr[i-1] && !sign)) {
        increasing[i] = start
        sign = !sign
    } else if (arr[i-1] < arr[i]) { //increasing and started last element
        start = i-1
        sign = false
        increasing[i] = i-1
    } else { //started this element
        start = i
        sign = true
        increasing[i] = i
    }
}
Do similarly for "decreasing" zig-zag, and you can find for each index the "earliest" possible start for a zig-zag subarray.
From there, finding the maximal possible zig-zag is easy.
Since all operations are done in O(n), and you basically do one after the other, this is your complexity.
You can combine the both "increasing" and "decreasing" to one go:
start = 0
maxZigZagStart[0] = 0
sign = true
for (int i = 1; i < n; i ++) {
    if ((arr[i] > arr[i-1] && sign) || (arr[i] < arr[i-1] && !sign)) {
        maxZigZagStart[i] = start
        sign = !sign
    } else if (arr[i-1] > arr[i]) { //decreasing:
        start = i-1
        sign = false
        maxZigZagStart[i] = i-1
    } else if (arr[i-1] < arr[i]) { //increasing:
        start = i-1
        sign = true
        maxZigZagStart[i] = i-1
    } else { //equality
        start = i
        //guess it is increasing, if it is not - will be taken care of next iteration
        sign = true
        maxZigZagStart[i] = i
    }
}
You can see that you can even drop the maxZigZagStart aux array and store only the local maximal length instead.
A sketch of simple one-pass algorithm. Cmp compares neighbour elements, returning -1, 0, 1 for less, equal and greater cases.
Zigzag ends for cases of Cmp transitions:
0 0
-1 0
1 0
Zigzag ends and new series starts:
0 -1
0 1
-1 -1
1 1
Zigzag series continues for transitions
-1 1
1 -1
Algo:
Start = 0
LastCmp = - Compare(A[1], A[0]) //prepare to use the first element individually
MaxLen = 0
for i = 1 to N - 1 do
Cmp = Compare(A[i], A[i - 1]) //returns -1, 0, 1 for less, equal and greater cases
if Abs(Cmp - LastCmp) <> 2 then
//zigzag condition is violated, series ends, new series starts
MaxLen = Max(MaxLen, i - 1 - Start)
Start = i
//else series continues, nothing to do
LastCmp = Cmp
//check for ending zigzag
if LastCmp <> 0 then
MaxLen = Max(MaxLen, N - Start)
examples of output:
2 6 7 1 7 0 7 3 1 1 7 4
5 (7 1 7 0 7)
8 0 0 3 5 8
1
0 0 7 0
2
1 2 0 7 9
3
8 3 5 2
4
1 3 7 1 6 6
2
1 4 0 6 6 3 4 3 8 0 9 9
5
Let's consider the sequence 5 9 3 6 5 7 2 3 6 5 2 1 3 as an example. You have a condition which every internal element of the subsequence should satisfy (the element is strictly less or strictly greater than its neighbors). Let's compute this condition for every element of the whole sequence:
5 9 3 6 5 7 2 3 6 5 2 1 3
0 1 1 1 1 1 1 0 1 0 0 1 0
The condition is undefined for outermost elements because they have only one neighbor each. But I defined it as 0 for convenience.
The longest subsequence of 1's (9 3 6 5 7 2) is the internal part of the longest zigzag subsequence (5 9 3 6 5 7 2 3). So the algorithm is:
Find the longest subsequence of elements satisfying condition.
Add to it one element to each side.
The first step can be done in O(n) by the following algorithm:
max_length = 0
current_length = 0
for i from 2 to len(a) - 1:
    if a[i - 1] < a[i] > a[i + 1] or a[i - 1] > a[i] < a[i + 1]:
        current_length += 1
    else:
        max_length = max(max_length, current_length)
        current_length = 0
max_length = max(max_length, current_length)
The only special case is if the sequence total length is 0 or 1. Then the whole sequence would be the longest zigzag subsequence.
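A runnable Python sketch of this "mark the interior elements" idea follows. The function name `longest_zigzag_by_mask` is hypothetical. It adds one assumption that is not spelled out above: when no interior element satisfies the condition, the answer is 2 if any two neighbours differ and 1 otherwise, because two equal neighbours do not form a zigzag.

```python
def longest_zigzag_by_mask(a):
    """Longest contiguous zigzag via the longest run of strict local extrema."""
    n = len(a)
    if n < 2:
        return n  # an empty or single-element sequence is its own zigzag

    best_inner = 0  # longest run of interior elements satisfying the condition
    cur = 0
    for i in range(1, n - 1):  # 0-based interior indices
        if a[i - 1] < a[i] > a[i + 1] or a[i - 1] > a[i] < a[i + 1]:
            cur += 1
            best_inner = max(best_inner, cur)
        else:
            cur = 0

    if best_inner > 0:
        return best_inner + 2  # add one element on each side of the run
    # No interior extremum: at best a pair of distinct neighbours.
    return 2 if any(a[i] != a[i + 1] for i in range(n - 1)) else 1


print(longest_zigzag_by_mask([5, 9, 3, 6, 5, 7, 2, 3, 6, 5, 2, 1, 3]))
```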
#include <algorithm>  // std::max
#include <cstdio>     // scanf, printf
using namespace std ;
int main(){
    int t ; scanf("%d",&t) ;
    while(t--){
        int n ; scanf("%d",&n) ;
        int size1 = 1 , size2 = 1 , seq1 , seq2 , x ;
        bool flag1 = true , flag2 = true ;
        for(int i=1 ; i<=n ; i++){
            scanf("%d",&x) ;
            if( i== 1 )seq1 = seq2 = x ;
            else {
                // sequence 1 looks for a rise first
                if( flag1 ){
                    if( x>seq1){
                        size1++ ;
                        seq1 = x ;
                        flag1 = !flag1 ;
                    }
                    else if( x < seq1 )
                        seq1 = x ;
                }
                else{
                    if( x<seq1){
                        size1++ ;
                        seq1=x ;
                        flag1 = !flag1 ;
                    }
                    else if( x > seq1 )
                        seq1 = x ;
                }
                // sequence 2 looks for a fall first
                if( flag2 ){
                    if( x < seq2 ){
                        size2++ ;
                        seq2=x ;
                        flag2 = !flag2 ;
                    }
                    else if( x > seq2 )
                        seq2 = x ;
                }
                else {
                    if( x > seq2 ){
                        size2++ ;
                        seq2 = x ;
                        flag2 = !flag2 ;
                    }
                    else if( x < seq2 )
                        seq2 = x ;
                }
            }
        }
        printf("%d\n",max(size1,size2)) ;
    }
    return 0 ;
}