Feature Serving for h2o - h2o

I see h2o has model serving (as documented in "Productionizing H2O" at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#) but I don't see any reference to serving the feature pipeline.
Does h2o have a feature serving methodology? (for example, TensorFlow has feature serving via Apache Beam, which is served along with the model via TFServing.)

Related

Serving static files in a microservices environment?

What is the preferred way of serving static files for an application that is deployed in a microservices architecture (in production)?
Let's say for simplicity that I have 3 application servers and one load-balancer that forwards requests to these servers.
Should the load-balancer store the files and serve them immediately upon request? Or...
Should the load-balancer forward static files requests to the different application instances (each request to a different instance)?
Is there a best practice for this?
As stated in other comments/answers, there are a lot of ways to handle this. This largely depends on what you actually need (version control, access control, http headers, CDN).
Specifically to your question: if it were me, I wouldn't deploy these files on the load balancers, because a newer version of the static files would require downtime on the load balancers. Instead, I would build very simple Nginx/Caddy containers whose sole purpose is to serve these files, and have the LB route the traffic to those containers.
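To make the "dedicated static container" idea concrete, here is a minimal sketch using Python's standard library as a stand-in for Nginx or Caddy (a swapped-in technique for illustration only; a production setup would normally use one of those servers). The function name `make_static_server` and the default host and port are assumptions, not part of the answer above.

```python
import functools
import http.server


def make_static_server(directory, host="127.0.0.1", port=0):
    """Return an HTTP server whose only job is to serve files from `directory`.

    port=0 asks the OS for any free port; read it back from server_address.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


if __name__ == "__main__":
    # Hypothetical usage: serve ./static on port 8080; the load balancer
    # would route static-file paths to this process or container.
    server = make_static_server("static", host="0.0.0.0", port=8080)
    server.serve_forever()
```

Because the files live in their own process, a new version of the static content can be rolled out by replacing this container without touching the load balancer.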
Best practice would be to store it in a service meant for static content, such as blob storage (in the cloud this would be Azure Storage, S3, etc.). If necessary, leverage a CDN to improve latency and throughput to end users.
But as someone else commented, there are many ways to handle this depending on your particular requirements.

Mesos HTTP API vs Native API

I am trying to write a framework on top of Mesos and so far I was able to download Mesos for Ubuntu and start a master and slave on a single machine.
I want to build a Mesos framework using Python, should I use the HTTP API or the native API? What is the difference between them?
I was able to find no documentation on the Python native API, except for some examples.
The HTTP API has documentation but no examples on how to use it. Should I be building a web service if I choose to use the HTTP API?
You should use the HTTP API.
The native API is the easiest way to build a Mesos framework: just include the library in your project and implement the interfaces. However, it comes with some limitations:
The native API is no longer being extended; new features go only to the HTTP API, e.g. Maintenance Mode (MESOS-2063).
The native API requires mesoslib to be available on the system. This tightly couples the framework to the platform it runs on. With the HTTP API you can run your framework on any system, with no need to load mesoslib.
Documentation for the HTTP API exists here. It's language-agnostic, so there are no examples in Python, only raw HTTP requests. But there are some tutorials on how to use it. I can recommend one given by Marco Massenzi at MesosCon EU 2015:
Video
Code
Slides
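As a rough illustration of what "raw HTTP requests" means here, the sketch below builds (but does not send) a SUBSCRIBE call and parses the RecordIO-framed event stream that the master returns. It follows the scheduler HTTP API as I understand it from the Mesos documentation (POST to `/api/v1/scheduler`, events framed as a byte length, a newline, then that many bytes of JSON). The master URL, user, and framework name are placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request


def build_subscribe_request(master_url, user, framework_name):
    """Build (without sending) a SUBSCRIBE call for the scheduler HTTP API."""
    body = {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {"user": user, "name": framework_name},
        },
    }
    return urllib.request.Request(
        url=master_url.rstrip("/") + "/api/v1/scheduler",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def parse_recordio(stream):
    """Yield JSON events from a RecordIO stream: b'<length>\\n<length bytes>'."""
    while True:
        length_line = stream.readline()
        if not length_line:  # end of stream
            return
        length = int(length_line.strip())
        yield json.loads(stream.read(length))


if __name__ == "__main__":
    # Hypothetical usage against a local master; the connection stays open
    # and events (SUBSCRIBED, OFFERS, HEARTBEAT, ...) arrive on it.
    request = build_subscribe_request("http://localhost:5050", "root", "demo")
    with urllib.request.urlopen(request) as response:
        for event in parse_recordio(response):
            print(event["type"])
```

So no web service is needed: the framework is an HTTP client that keeps one long-lived subscription connection open and sends further calls as separate POST requests.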

Amazon AWS and usage model for S3 storage

There is this example on Amazon of a high-traffic web application. I noticed that they are using S3 as their content delivery method. I was wondering if I need to have a Web Server for the content delivery and a Web App for my application. I don't understand why they have 2 web servers and 2 web apps in the diagram.
And what is the best way to set up a website that serves images and static contents through S3 and the rest of the content through the regular storage.
My last question is, can I consider S3 as a main storage, reliable enough that I can only keep my static content there and don't have a normal storage as a backup ?
That is a very general diagram, specific diagrams will vary depending on the specifics of the overall architecture.
Having said that, I believe the Web Server represents something like Apache or Nginx and the App Server represents something like Rails, Rack Server, Unicorn, Gunicorn, Django, Sinatra, Flask, Jetty, Tomcat, etc. In some cases you can merge the Web Server and the App Server, for example by deploying Apache with Python mod_wsgi to run your Django app. (So it depends on the architecture.)
what is the best way to set up a website that serves images and static
contents through S3 and the rest of the content through the regular
storage.
There's no single best way, other than to point your dynamic content to your databases (SQL and NoSQL) and point your static files (images, CSS, jQuery code, etc.) to an S3 bucket. You can also use third-party modules depending on your application stack. For example, you can accomplish this in Django with the django-storages module. You can find similar modules for other app stacks like Rails.
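For the Django case, a settings fragment might look roughly like the following. This is a sketch under assumptions: it uses the `STORAGES` setting (available in Django 4.2 and later) with the `S3Boto3Storage` backend from django-storages, and it assumes django-storages and boto3 are installed. The bucket name and region are placeholders, and the exact settings may differ between versions, so check the django-storages documentation for yours.

```python
# settings.py (fragment) -- illustrative values only

INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "storages",  # django-storages
]

# Placeholder bucket and region; replace with your own.
AWS_STORAGE_BUCKET_NAME = "my-static-bucket"
AWS_S3_REGION_NAME = "us-east-1"

STORAGES = {
    # User-uploaded media files go to S3.
    "default": {"BACKEND": "storages.backends.s3boto3.S3Boto3Storage"},
    # `manage.py collectstatic` uploads CSS/JS/images to S3 as well.
    "staticfiles": {"BACKEND": "storages.backends.s3boto3.S3Boto3Storage"},
}
```

With this in place the application servers keep handling dynamic requests, while static and uploaded files are read from and written to the bucket.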
My last question is, can I consider S3 as a main storage, reliable
enough that I can only keep my static content there and don't have a
normal storage as a backup ?
S3 is pretty reliable: it is designed for 99.999999999% durability of your data. That goes down if you use their RRS (Reduced Redundancy Storage), but if you want to use it you probably want to back up your data in a non-RRS bucket anyway. In any case, if it's extremely critical data, you are free to back it up somewhere else as well.
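To get a feel for what a figure like 99.999999999% means, here is a small worked calculation. It assumes the figure is read as the probability that a single object survives one year and that objects are lost independently; both are simplifying assumptions for illustration, and the function names are made up.

```python
from fractions import Fraction

# 99.999999999% expressed exactly (eleven nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count, durability=DURABILITY):
    """Expected number of objects lost per year under the stated assumptions."""
    return object_count * (1 - durability)


def years_per_lost_object(object_count, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count, durability)


if __name__ == "__main__":
    n = 10_000_000
    print(expected_annual_losses(n))  # 1/10000 objects per year
    print(years_per_lost_object(n))   # 10000 years per lost object
```

In other words, with ten million stored objects you would expect to lose one about every ten thousand years under these assumptions, which is why a separate backup mainly protects against accidental deletion or account problems rather than storage failure.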
Notice in the diagram that they also recommend using CloudFront for your static files and this is especially useful if your users will be accessing your application from different geographical areas.
Hope this helps.

grails serve index.html from CDN

I would like my grails app to be deployed in the root of my domain:
www.example.com
instead of
www.example.com/myapp
However, www.example.com/index.html is going to be static (static HTML, images, etc.). I'm concerned about the performance of having the application server serve up the homepage. Can I configure my grails app / the CDN to serve index.html and its content, and have the application server handle the dynamic pages?
I am using grails 2.2.4
I will be using Amazon S3 + ElasticBeanstalk + CloudFront.
Or should I be worried about performance at all? I am new to grails but it's my understanding that static content should be served by the webserver (Apache). Since there is no apache, I'm looking for another option to keep the load off of the webserver. The CDN seems like a good idea.
You certainly can do that. My personal recommendation would be to keep your images on S3 and use CloudFront on top of that. Unless your static HTML itself is excessively large, I would recommend letting Grails be Grails and taking advantage of Grails Resources for your JS and CSS as typical Grails projects do, even if your index page won't be doing anything dynamic right now. The more you break from the Grails conventions, the more complex things like builds and continuous integration can become. Look at caching and minifying plugins, and performance will be very good.
As for deploying to the root "/" context, you can either do this by "prod war ROOT.war" for your Tomcat (or wherever) deployment OR you can build it as "whateverapp.war" and handle the routing rules with Apache mod_jk for more complex situations.
I've done probably a dozen Grails projects and use a very similar architecture now.
The simplest thing to do is to serve your entire domain from CloudFront and then serve the home page from your Grails app. You can configure CloudFront to cache requests to the home page so you will minimize the number of requests to Grails. You can map CloudFront directly to the ELB running in your Elastic Beanstalk environment.
The default Elastic Beanstalk configuration doesn't give you any way of serving static files from Apache; all the requests to Elastic Beanstalk are proxied to Tomcat. You can use advanced configuration to set this up though (using the .ebextensions mechanism).
You can also set up the Cache plugin to set up full page caching on the server side (I recommend using the Cache EhCache plugin as well). Combining server-side caching with CDN caching will get you a long way.
BTW, another good option for serving static content is to use S3 itself to serve pages. If you aren't doing anything too complicated it will save you the work of setting up and running a web server, although with Elastic Beanstalk there's not much to do.

Web analytics platform that can run on a load-balanced server cluster?

I'm using Piwik for my web-analytics, and recently I've discovered PHPFog/CloudControl as hosting providers that set up a load balanced, fully managed server for your applications to run on. Piwik requires certain directories to be writable in order to set configuration files, and this prevents me from using load-balancing to enhance my piwik response times.
Does anyone know of an analytics package like piwik (or maybe a different version of piwik) that supports load balancing?
Piwik supports load balancing! See the doc at: http://piwik.org/faq/new-to-piwik/#faq_134
