Is there any documentation about using Python preprocessing with H2O Steam? - h2o

The H2O Steam website says a Python preprocessing step can be bundled with the POJO as an optional .war, but I cannot find any step-by-step examples of doing this.
Where can I find more details about it? Or am I better off doing it in Java only?
The situation is that I have a Python preprocessing program, mainly using pandas for some data munging, that runs before calling H2O to train/score the model. I want to use
H2O Steam as the scoring engine. The website mentions I can wrap the Python code and the H2O POJO/MOJO file together as a .war file, so I can call it through a REST API. But I
cannot find examples or details about how to proceed. Also, do I need to include
Python libraries like pandas in the .war file, and if yes, how?

Related

Can I use some native Parquet file tools in Azure Logic App?

Hi there, I need to change the format of a Parquet file to CSV using only Logic App native tools. Is that even possible?
I researched similar issues and found how to use Azure Functions to change the format, but that's not a native Logic App tool.
There's a custom connector that will transform Parquet to JSON for you.
It will also allow you to perform filtering and sorting operations on the data before it is returned.
Documentation can be found here: https://www.statesolutions.com.au/parquet-to-json/

Uploading an image into Apache OpenWhisk

Currently, I'm running OpenWhisk in a standalone Docker container. I've added simple JavaScript functions as actions and invoked them. The input arguments are passed as --param or --param-file for JSON, and the output is always JSON.
Is it possible to upload a simple picture (JPEG, PNG), make a simple change to it, and return the image?
My plan is to run a Python shell from within the JavaScript file, where the Python script uses the Pillow library to, for example, rotate an image. I'm doing it this way because I wanted performance metrics with https://github.com/jthomas/openwhisk-metrics (which works only with JS files).
Serialization has been a very tedious process and highly inefficient.
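For reference, a rough sketch of what the Python side of that could look like, assuming the image travels in and out of the action as a base64-encoded string in the JSON arguments (the file name, the stdin/stdout hand-off, and the rotation step are illustrative placeholders, not code from the question):
# rotate.py - called from the JavaScript action as a child process; reads a
# base64-encoded image on stdin and writes the rotated image, base64-encoded
# again, to stdout so it can be embedded in the JSON response.
import base64
import io
import sys

from PIL import Image  # Pillow

raw = base64.b64decode(sys.stdin.read())
img = Image.open(io.BytesIO(raw))

rotated = img.rotate(90, expand=True)  # the "simple change" from the question

buf = io.BytesIO()
rotated.save(buf, format="PNG")
sys.stdout.write(base64.b64encode(buf.getvalue()).decode())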

Monetdbe Python UDF

Given that monetdbe is a Python package, I'm optimistic that Python user-defined functions are possible, but I haven't been able to find an example. I've tried this:
drop function every_other_letter;
create function every_other_letter(s string)
returns string
language python {
letters = "".join(s[::2])
return letters
};
select every_other_letter('foobarbaz');
I get this error:
ParseException:SQLparser:42000!Internal error while compiling statement: TypeException:user.p2_1[4]:'pyapi3.eval' library error in: Embedded Python 3 has not been enabled. Start server with --set embedded_py=3
Is there any way to set these flags in the embedded version?
LANGUAGE PYTHON UDFs are a nice development feature of MonetDB's server installation, but they require an additional Python module to be loaded, and there is currently no way to configure monetdbe to load that module.
However, assuming you have performance requirements in some production setting that are not met by the out-of-the-box SQL toolset in monetdbe, it makes more sense to implement a custom UDF extension written in C/C++. In a regular MonetDB server installation, the database server mserver5 can load an arbitrary extension module using the --loadmodule=<module> command-line option, but there is no equivalent monetdbe_option as of yet.
You might consider adding a feature request for this on monetdbe-python's GitHub repository.
However, there seems to be a functioning, undocumented workaround for adding UDF extensions to monetdbe. During its initialization, monetdbe attempts to load a set of hard-coded modules. One of those is a module named "udf". You can create your own implementation of this module and load it into monetdbe.
Creating a native UDF extension is outside the scope of this question and answer, but there is a nice, up-to-date tutorial for writing UDF extensions for MonetDB here. Following the steps described in that tutorial, you end up with an SQL function revstr that has a user-defined native implementation. The following Python script demonstrates its use:
from monetdbe import connect

with connect(autocommit=True) as con:
    cur = con.execute("select revstr('abcde')")
    result = cur.fetchall()
    print(result)
Make sure that the library containing your UDF extension is in the search path of the dynamic linker:
LD_LIBRARY_PATH=<path to directory containing lib_udf> python test.py

How to import a NiFi_Flow.json or convert to a template?

I've worked the whole day on a NiFi flow in a local Docker container. Once finished, I downloaded the flow as a JSON file and killed the container. I now want to import it into my NiFi instance on Kubernetes. Unfortunately, it seems that the way to go is using templates. So I guess the "download flow as JSON file" function is a one-way road? Or what is the purpose of this functionality?
Is there a way to convert this JSON to a template.xml? Otherwise I have to redo all my work.
You can upload the flow definition when creating the Process Group. Use the "Browse" icon.
You need NiFi Registry to import:
Good resource: https://community.cloudera.com/t5/Community-Articles/How-to-import-a-flow-to-NiFi-registry-in-CDP-Cloud/ta-p/308335
Personally, I'm not a fan of posts by Timothy Spann; they may be useful, but they lack a lot of explanation.
Summary:
Install NiFi Registry
Connect NiFi with the registry
Import the JSON file manually, or use the NiFi Toolkit or NiPyAPI to do it programmatically (a rough NiPyAPI sketch follows).
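For illustration only, a sketch of the programmatic route with NiPyAPI; the host URLs, bucket name, and flow name are placeholders, and the function names (nipyapi.versioning.get_registry_bucket, nipyapi.versioning.import_flow_version) should be checked against the NiPyAPI documentation for your version:
import nipyapi

# Point the client at the NiFi and NiFi Registry instances (placeholder URLs).
nipyapi.config.nifi_config.host = "https://nifi.example.com/nifi-api"
nipyapi.config.registry_config.host = "https://registry.example.com/nifi-registry-api"

# Import the exported flow definition into an existing Registry bucket;
# from there it can be pulled onto the canvas as a versioned Process Group.
bucket = nipyapi.versioning.get_registry_bucket("my-bucket")
nipyapi.versioning.import_flow_version(
    bucket_id=bucket.identifier,
    file_path="NiFi_Flow.json",
    flow_name="my-imported-flow",
)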

AWS Lambda: How To Upload & Test Code Using Python And Command Line

I am no longer able to edit my AWS lambda function using the inline editor because of the error, "Your inline editor code size is too large. Maximum size is 51200." Yet, I can't find a walk-through that explains how to do these things from localhost:
Upload a python script to Lambda
Supply "event" data to the script
View Lambda output
You'll need to create a deployment package for your code, which is just a zip archive but with a particular format. Instructions are in the AWS Python deployment documentation.
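For example, a minimal sketch of building such a package with Python's standard zipfile module (the file names are placeholders; real dependencies are usually pip-installed into the package directory first):
import zipfile

# The handler module must sit at the root of the archive.
with zipfile.ZipFile("deployment_package.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("lambda_function.py")
    # Any bundled dependencies are added the same way, preserving their paths.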
Your handler then receives the event data through its event parameter (the context object carries runtime metadata); starter information is in the AWS Python programming model documentation.
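And a rough sketch of the upload/invoke/inspect cycle from localhost with boto3 (the function name and event payload are placeholders):
import base64
import json

import boto3

client = boto3.client("lambda")

# 1. Upload the deployment package.
with open("deployment_package.zip", "rb") as f:
    client.update_function_code(FunctionName="my-function", ZipFile=f.read())

# 2. Invoke the function, supplying the event data as a JSON payload.
response = client.invoke(
    FunctionName="my-function",
    Payload=json.dumps({"key": "value"}).encode(),
    LogType="Tail",  # also return the last few KB of execution logs
)

# 3. View the return value and the captured log output.
print(json.load(response["Payload"]))
print(base64.b64decode(response["LogResult"]).decode())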
Side note: once your Lambda code starts to get larger, it's often handy to move to some sort of management framework. Several have been written for Lambda; I use Apex, which is written in Go but works for any Lambda code, though you might be more comfortable with Gordon (which has a great list of examples and is more active) or Kappa, both of which are written in Python.
