H2O Frame apply function on each row in python - h2o

I am looking for a method similar to the 'apply' function in pandas. I tried
my_H2Oframe.apply(lambda x: my_function(x), axis=1)
But this doesn't work.
ValueError: Unimplemented: op < my_function > not bound in H2OFrame
I found this question. It seems we can only use those functions that have already been defined by H2O. I think there must be a method similar to the apply function because this is a common operation. Does anyone have a solution?

There is no other apply-type method at the moment. The H2O apply method is supposed to be a close equivalent to pandas apply, but it is limited to certain operations such as addition (+), subtraction (-), division (/), etc. If you use a function that H2O doesn't support, you will get the error above.
Here are a few examples showing how the apply function can work (the first gets the mean of each column, the second returns a boolean column):
import h2o
h2o.init()
h2oframe = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
h2oframe.apply(lambda x: x.mean(), axis=0)
h2oframe.apply(lambda x: x['PSA'] > x['VOL'], axis=1)
And here is the current documentation on it:
apply(fun=None, axis=0):
Apply a lambda expression to an H2OFrame.
Parameters:
fun – a lambda expression to be applied per row or per column.
axis – 0 = apply to each column; 1 = apply to each row
Returns:
a new H2OFrame with the results of applying fun to the current frame.
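For comparison, the two H2O examples correspond to ordinary pandas apply calls. The sketch below uses a tiny hypothetical frame with made-up PSA and VOL values (not the real prostate.csv data), and my_function is a hypothetical placeholder for an arbitrary Python function. A common workaround, not part of the answer above, is to pull the data into the Python client with h2oframe.as_data_frame(), use pandas apply there, and convert back with h2o.H2OFrame(...); this only makes sense when the frame fits in local memory.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of prostate.csv (values are made up)
df = pd.DataFrame({"PSA": [1.4, 4.9, 12.0], "VOL": [0.0, 32.0, 3.5]})

# axis=0: the lambda receives each column, giving one mean per column
col_means = df.apply(lambda col: col.mean(), axis=0)

# axis=1: the lambda receives each row, giving one boolean per row
psa_gt_vol = df.apply(lambda row: row["PSA"] > row["VOL"], axis=1)


def my_function(row):
    """Hypothetical arbitrary Python function that H2O's apply cannot translate."""
    return row["PSA"] / (row["VOL"] + 1.0)


# pandas accepts any Python callable per row
custom = df.apply(my_function, axis=1)

# Assumed H2O round trip (needs a running H2O cluster, so left as comments):
# df = h2oframe.as_data_frame()
# result = h2o.H2OFrame(df.apply(my_function, axis=1).to_frame("result"))
```

The round trip trades H2O's distributed execution for pandas' flexibility, so for large frames it is worth first checking whether the computation can be expressed with the operations H2O's apply does support.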

Related

SARIMAX model in PyMC3

I would like to write the following SARIMAX(2,0,0)(2,0,0,12) model in PyMC3 to perform Bayesian estimation of its coefficients, but I cannot figure out how to start with the seasonal part.
Has anyone tried something like this?
import pymc3 as pm
import arviz as az
# data: 1-D array of observations
with pm.Model() as ar2:
    theta = pm.Normal("theta", 0.0, 1.0, shape=2)
    sigma = pm.HalfNormal("sigma", 3)
    likelihood = pm.AR("y", theta, sigma=sigma, observed=data)
    trace = pm.sample(
        1000,
        tune=2000,
        random_seed=13,
    )
    idata = az.from_pymc3(trace)
Although it would be best (e.g. for performance) to get an answer that uses PyMC3 exclusively, in case that does not exist yet, there is an alternative that combines the SARIMAX model in Statsmodels with PyMC3.
There are too many details to repeat a full answer here, but basically you wrap the log-likelihood and gradient methods of a Statsmodels SARIMAX model (as Theano Ops) so that PyMC3 can call them. Here is a link to an example Jupyter notebook that shows how to do this:
https://www.statsmodels.org/stable/examples/notebooks/generated/statespace_sarimax_pymc3.html
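On the seasonal part specifically: a SARIMAX(2,0,0)(2,0,0,12) model is an AR model whose lag polynomial is the product (1 - phi_1 L - phi_2 L^2)(1 - Phi_1 L^12 - Phi_2 L^24), so the seasonal terms add lags 12, 13, 14, 24, 25 and 26 with coefficients tied to the four free parameters. The numpy sketch below makes that expansion explicit and computes a Gaussian log-likelihood conditional on the first 26 observations. This is an illustration, not the notebook's code: statsmodels' SARIMAX evaluates a state-space (Kalman filter) likelihood instead, and to use this arithmetic inside a PyMC3 model it would have to be rewritten with theano tensor operations. The function names are hypothetical, and y must be longer than the largest lag.

```python
import numpy as np


def expanded_ar_coefs(phi, Phi, s=12):
    """Lag coefficients c with y_t = sum_k c[k] * y_{t-k} + eps_t (c[0] is 0).

    Expands (1 - phi_1 L - ...) * (1 - Phi_1 L^s - ...).
    """
    nonseasonal = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    seasonal = np.zeros(s * len(Phi) + 1)
    seasonal[0] = 1.0
    seasonal[s::s] = -np.asarray(Phi, dtype=float)
    # Multiplying lag polynomials is a convolution of their coefficient arrays
    product = np.convolve(nonseasonal, seasonal)
    coefs = -product
    coefs[0] = 0.0
    return coefs


def conditional_loglike(y, phi, Phi, sigma2, s=12):
    """Gaussian log-likelihood conditional on the first p observations."""
    y = np.asarray(y, dtype=float)
    c = expanded_ar_coefs(phi, Phi, s)
    p = len(c) - 1  # largest lag: 26 for (2,0,0)(2,0,0,12)
    # Row i of `lags` holds y[t-1], y[t-2], ..., y[t-p] for t = p + i
    lags = np.array([y[t - p:t][::-1] for t in range(p, len(y))])
    resid = y[p:] - lags @ c[1:]
    n = len(resid)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / sigma2)
```

For example, expanded_ar_coefs([0.5, 0.2], [0.3, 0.1]) is nonzero only at lags 1, 2, 12, 13, 14, 24, 25 and 26, which is the constrained AR(26) structure the seasonal specification implies.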
I'm not sure if you still need it, but expanding on cfulton's answer, here is how to fix the error in the statsmodels example (https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_pymc3.html, cell 8):
import pymc3 as pm
import theano.tensor as tt
# loglike, ndraws and nburn are defined in earlier cells of the notebook
with pm.Model():
    # Priors
    arL1 = pm.Uniform('ar.L1', -0.99, 0.99)
    maL1 = pm.Uniform('ma.L1', -0.99, 0.99)
    sigma2 = pm.InverseGamma('sigma2', 2, 4)
    # convert variables to tensor vectors
    # # this is wrong:
    # theta = tt.as_tensor_variable([arL1, maL1, sigma2])
    # # this is correct:
    theta = tt.as_tensor_variable([arL1, maL1, sigma2], 'v')
    # use a DensityDist (use a lambda function to "call" the Op)
    # # this is wrong:
    # pm.DensityDist('likelihood', lambda v: loglike(v), observed={'v': theta})
    # # this is correct:
    pm.DensityDist('likelihood', lambda v: loglike(v), observed=theta)
    # Draw samples
    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, cores=4)
I'm no pymc3/theano expert, but I think the error means that Theano has failed to associate the tensor's name with the values. If you define the name along with the values right at the beginning, it works.
I know it's not a direct answer to your question. Nevertheless, I hope it helps.

Is there a way to use range with Z3ints in z3py?

I'm relatively new to Z3 and am experimenting with it in Python. I've written a program that returns the order in which different actions are performed, each represented by a number. Z3 returns an integer representing the second at which the action starts.
Now I want to look at the model and see if there is a moment when nothing happens. To do this, I made a list containing only 0s, and I want to set the entries to 1 at the times when each action is being executed. For instance, if an action starts at the 5th second and takes 8 seconds to execute, indices 5 to 12 would be set to 1. Doing this for all the actions and then looking for 0s in the list would hopefully give me the instants where nothing happens.
The problem is: I would like to write something like this for coding the problem
list_for_check = [0]*total_time
m = s.model()
for action in actions:
    for index in range(m.evaluate(action.number), m.evaluate(action.number) + action.time_it_takes):
        list_for_check[index] = 1
But I get the error:
'IntNumRef' object cannot be interpreted as an integer
I understand that Z3 doesn't return normal ints or bools in its models, but writing
if m.evaluate(action.boolean):
works, so I'm assuming the if is overridden in some way, but this doesn't seem to be the case with range. So my question is: is there a way to use range with Z3 ints? Or is there another way to do this?
The problem might also be that action.time_it_takes is a Python integer, and adding a Z3 int to a "normal" int doesn't work (this is done in the second argument of range).
I've also tried using int(m.evaluate(action.number)), but it doesn't work.
Thanks in advance :)
When you call evaluate, it returns an IntNumRef, which is z3's internal representation of an integer value. You need to call its as_long() method to convert it to a Python int. Here's an example:
from z3 import *
s = Solver()
a = Int('a')
s.add(a > 4)
s.add(a < 7)
if s.check() == sat:
    m = s.model()
    print("a is %s" % m.evaluate(a))
    print("Iterating from a to a+5:")
    av = m.evaluate(a).as_long()
    for index in range(av, av + 5):
        print(index)
When I run this, I get:
a is 5
Iterating from a to a+5:
5
6
7
8
9
which is exactly what you're trying to achieve.
The method as_long() is defined here. Note that there are similar conversion functions from bit-vectors and rationals as well. You can search the z3py api using the interface at: https://z3prover.github.io/api/html/namespacez3py.html
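Putting the as_long() conversion together with the approach from the question, a minimal sketch of the idle-time check might look like the following. The function idle_seconds and its arguments are hypothetical; it assumes the start times have already been converted to Python ints, e.g. with [m.evaluate(action.number).as_long() for action in actions], so the function itself does not need z3.

```python
def idle_seconds(starts, durations, total_time):
    """Return the seconds in [0, total_time) during which no action is running.

    starts: Python ints, e.g. m.evaluate(action.number).as_long() per action
    durations: Python ints, e.g. action.time_it_takes per action
    """
    busy = [0] * total_time
    for start, duration in zip(starts, durations):
        # Same interval as in the question: start .. start + duration - 1,
        # clipped so an action running past total_time cannot raise IndexError
        for t in range(start, min(start + duration, total_time)):
            busy[t] = 1
    return [t for t, flag in enumerate(busy) if flag == 0]
```

For example, idle_seconds([5, 0], [8, 3], 15) marks seconds 0 to 2 and 5 to 12 as busy and returns [3, 4, 13, 14]. Clipping the upper bound is a design choice; if an action overrunning total_time indicates a modelling error, raising an exception may be preferable.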

Matlab : image region analyzer. Alternative for 'bwpropfilt'?

I'm running basic edge detection to detect window regions, based on this: http://www.mathworks.com/videos/edge-detection-with-matlab-119353.html
The edge detection works:
final_edge = edge(gray_I,'sobel');
BW_out = bwareaopen(imfill(final_edge,'holes'),20);
figure;
imshow(BW_out);
Now, when I get to the following code to filter the image based on its properties, it seems my MATLAB R2013a doesn't recognize the bwpropfilt function.
% imageRegionAnalyzer(BW);
% Filter image based on image properties
BW_out = bwpropfilt(BW_out,'Area', [400, 467]);
BW_out = bwpropfilt(BW_out,'Solidity',[0.5, 1]);
It says:
Undefined function 'bwpropfilt' for input arguments of type 'char'.
What can I use instead of bwpropfilt?
bwpropfilt simply looks at the corresponding attribute output by regionprops and keeps the objects whose values fall within the given range, filtering out those outside it. You can rewrite the algorithm by explicitly calling regionprops and creating a logical array to index into the structure, retaining only the objects whose value for the property you want to examine (the second input of bwpropfilt) lies within the right range (the third input of bwpropfilt). To reconstruct the image after filtering, use the column-major linear indices found in the PixelIdxList attribute: stack them all into a single vector and write to a new output image by setting all of these locations to true.
Specifically, you can use the following code to reproduce the last two lines of code you have shown:
% Run regionprops and get all properties
s = regionprops(BW_out, 'all');
%%% For the first line of code (the range [low, high] is inclusive)
values = [s.Area];
s = s(values >= 400 & values <= 467);
%%% For the second line of code
values = [s.Solidity];
s = s(values >= 0.5 & values <= 1);
% Stack column major indices
ind = vertcat(s.PixelIdxList);
% Create output image
final_out = false(size(BW_out));
final_out(ind) = true;
final_out contains the filtered image, retaining only the objects whose property values fall within the specified ranges.
Caution
The above logic only works for regionprops attributes that contain a single scalar value per region. If you examine the properties supported by bwpropfilt, you will see that this list is a subset of the full list found in regionprops. This makes sense: certain regionprops properties return a vector or a matrix per region, so filtering with a range becomes ambiguous when multiple values characterize a single region.
Minor Note
Out of curiosity, I opened up bwpropfilt to see how it is implemented, as I currently have MATLAB R2016a. Apart from some error checking, the above logic is essentially how bwpropfilt is implemented, so the code I wrote is in line with the logic of the function.

Using if conditions inside a TensorFlow graph

In the TensorFlow CIFAR-10 tutorial, cifar10_inputs.py (line 174) says you should randomize the order of the random_contrast and random_brightness operations for better data augmentation.
To do so, the first thing I think of is drawing a random variable p_order from the uniform distribution between 0 and 1, and doing:
if p_order > 0.5:
    distorted_image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
    distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
else:
    distorted_image = tf.image.random_brightness(image, max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
However, there are two possible ways to get p_order:
1) Using numpy, which dissatisfies me, as I want pure TF and TF discourages its users from mixing numpy and TensorFlow.
2) Using TF; however, p_order can only be evaluated in a tf.Session(), so I do not really know if I should do:
with tf.Session() as sess2:
    p_order_tensor = tf.random_uniform([1,], 0., 1.)
    p_order = float(p_order_tensor.eval())
All those operations are inside the body of a function that is run from another script, which has a different session/graph. Alternatively, I could pass the graph from the other script as an argument to this function, but I am confused.
Even the fact that TensorFlow functions like this one, or inference for example, seem to define the graph globally without explicitly returning it as an output is a bit hard for me to understand.
You can use tf.cond(pred, fn1, fn2, name=None) (see doc).
This function lets you use the boolean value of pred inside the TensorFlow graph (no need to call tensor.eval() or sess.run(), hence no need for a Session).
Here is an example of how to use it:
import tensorflow as tf  # TF 1.x API (tf.random_uniform, tf.cond)
# image: the input image tensor, defined earlier in the input pipeline
def fn1():
    distorted_image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
    distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
    return distorted_image
def fn2():
    distorted_image = tf.image.random_brightness(image, max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
    return distorted_image
# Uniform variable in [0,1)
p_order = tf.random_uniform(shape=[], minval=0., maxval=1., dtype=tf.float32)
pred = tf.less(p_order, 0.5)
distorted_image = tf.cond(pred, fn1, fn2)

Real/imaginary part of sympy complex matrix

Here is my problem.
I'm using SymPy and a complex matrix P (all elements of P are complex-valued).
I want to extract the real/imaginary part of the first row.
So I use the following code:
import sympy as sp
a, b, c, d = sp.symbols('a b c d')
P = sp.Matrix([ [a+sp.I*b, c-sp.I*d], [c-sp.I*d, a+sp.I*b] ])
Row = P.row(0)
Row.as_mutable()
Re_row = sp.re(Row)
Im_row = sp.im(Row)
But the code returns me the following error:
"AttributeError: ImmutableMatrix has no attribute as_coefficient."
The error occurs during sp.re(Row) and sp.im(Row)...
SymPy tells me that Row is an immutable matrix, but I specified that I want a mutable one...
So I'm at a dead end, and I don't have a solution...
Could someone please help me?
Thank you very much!
Most SymPy functions won't work if you pass a Matrix to them directly. You need to use the methods of the Matrix, or if there is no such method (as is the case here), use applyfunc:
In [34]: Row.applyfunc(re)
Out[34]: [re(a) - im(b) re(c) + im(d)]
In [35]: Row.applyfunc(im)
Out[35]: [re(b) + im(a) -re(d) + im(c)]
(I've defined a, b, c, and d as ordinary symbols here; if you declare them as real, the answer will come out much simpler.)
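To illustrate that last remark, here is a minimal sketch that declares the symbols as real, so applyfunc with sp.re and sp.im reduces each entry directly to its real and imaginary part:

```python
import sympy as sp

# Declaring the symbols real lets re() and im() simplify completely
a, b, c, d = sp.symbols('a b c d', real=True)
P = sp.Matrix([[a + sp.I*b, c - sp.I*d], [c - sp.I*d, a + sp.I*b]])

row = P.row(0)
# applyfunc maps the scalar functions over every entry of the row
re_row = row.applyfunc(sp.re)
im_row = row.applyfunc(sp.im)

print(re_row)  # Matrix([[a, c]])
print(im_row)  # Matrix([[b, -d]])
```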
