I'm new to machine learning and H2O tools, and I'd like to know whether there is a high-level H2O interface that allows us to plug new methods into a pipeline.
I know we can build models with the Flow interface and export them as POJO/MOJO. But how can I, for example, decide to use the kNN method as an imputation method for my data, when Flow only allows simple imputation like mean/mode?
You cannot add new methods to your pipeline in Flow. Flow is just a simple GUI that lets you do things like create train/test splits, train models, test models, and view some metrics. You'd have to use the R or Python client to create a pipeline of any sort.
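If you go the Python route, one hedged option (not an H2O built-in) is to round-trip the frame through pandas and use scikit-learn's KNNImputer for the kNN imputation step; the file name and column selection below are illustrative:

    # Minimal sketch: kNN imputation outside Flow, via the H2O Python client
    # plus scikit-learn. File name and column selection are illustrative.
    import h2o
    from sklearn.impute import KNNImputer

    h2o.init()

    hf = h2o.import_file("train.csv")    # load data into H2O
    df = hf.as_data_frame()              # move to pandas for custom imputation

    numeric_cols = df.select_dtypes("number").columns
    imputer = KNNImputer(n_neighbors=5)  # kNN-based imputation
    df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

    imputed_hf = h2o.H2OFrame(df)        # back to H2O for model training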
I have a model that actually consists of 150 models (it runs in a for loop).
To make this performant, I would like to split it into 150 separate models, so that for every request my server receives it sends 150 API requests, one to each model, and then combines the results (so the invocations run in parallel), essentially a map-reduce.
I thought about AWS SageMaker multi-model endpoints, but the documentation suggests that use case is better suited to serial invocation than to parallel or concurrent runs.
I also thought about creating a Lambda function that reads the model and scales accordingly (serverless), but that sounds very odd to me, as if I am missing SageMaker's use cases.
Thanks!
Are your models similarly sized? Concurrent requests should not be an issue as long as you choose an instance type to back the endpoint with an appropriate number of workers to handle them. Check out the Real-Time Inference section of the SageMaker Pricing page to see the different instance types you can use; I would suggest tuning the instance type along with the instance count to be able to handle your requests.
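If you do deploy 150 separate single-model endpoints, the fan-out/combine step is straightforward client-side. Here is a hedged sketch using boto3 and a thread pool; endpoint names, payload format, and the reduce step are all hypothetical:

    # Fan out one request to 150 single-model endpoints concurrently (boto3).
    # Endpoint names and payload format are hypothetical.
    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    runtime = boto3.client("sagemaker-runtime")
    endpoints = [f"my-model-{i}" for i in range(150)]   # hypothetical names

    def invoke(endpoint_name, payload):
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        return json.loads(response["Body"].read())

    def predict_all(payload):
        # Map step: invoke every model in parallel threads.
        with ThreadPoolExecutor(max_workers=50) as pool:
            results = list(pool.map(lambda e: invoke(e, payload), endpoints))
        # Reduce step: merge the 150 partial predictions (your own logic).
        return results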
I am trying to understand which of the following two options is the right approach and why.
Say we have a GetHotelInfo(hotel_id) API that is invoked from the Web down through the Controller.
The logic of the GetHotelInfo is:
Invoke GetHotelPropertyData() (Location, facilities…)
Invoke GetHotelPrice(hotel_id, dates…)
Invoke GetHotelReviews(hotel_id)
Once all results come back, process and merge the data and return 1 object that contains all relevant data of the hotel.
Option 1:
Create 3 different repositories (HotelPropertyRepo, HotelPriceRepo, HotelReviewsRepo)
Create a GetHotelInfo use case that will use these 3 repositories and return the final result.
Option 2:
Create 3 different repositories (HotelPropertyRepo, HotelPriceRepo, HotelReviewsRepo)
Create 3 different use cases (GetHotelPropertyDataUseCase, GetHotelPriceUseCase, GetHotelReviewsUseCase)
Create a GetHotelInfoUseCase that will orchestrate the previous 3 use cases. (It can also be a controller, but that's a different topic.)
Let's say that right now only GetHotelInfo is exposed to the Web, but maybe in the future I will expose some of the inner requests as well.
And would the answer be different if the actual logic of GetHotelInfo is not a combination of 3 endpoints but rather 10?
You can see a similar method (called Get()) in "Clean Architecture with GO" by Manato Kuroda.
Manato points out that:
following the Acyclic Dependencies Principle (ADP), dependencies only point inward in the circle, never outward, and there are no cycles.
the Controller and Presenter depend on the Use Case Input Port and Output Port, which are defined as interfaces, not as specific logic (the details). This is possible (without the outer layer knowing the details) thanks to the Dependency Inversion Principle (DIP).
That is why, in the example repository manakuro/golang-clean-architecture, Manato creates three directories for the Use Cases layer:
repository
presenter: in charge of the Output Port
interactor: in charge of the Input Port, with a set of methods implementing specific application business rules, depending on the repository and presenter interfaces.
You can adapt that example to your case, with GetHotelInfo declared first in the hotel_interactor.go file, depending on the specific business methods declared in hotel_repository, and with responses defined in hotel_presenter.
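The repository itself is Go; purely as an illustration of that layering (names are mine, not taken from the repo), the same split looks like this in Python:

    # Mirrors the interactor/repository/presenter split from the Go example.
    # All names are illustrative, not taken from the actual repository.
    from typing import Protocol

    class HotelRepository(Protocol):      # data-access interface
        def get_property_data(self, hotel_id: str) -> dict: ...

    class HotelPresenter(Protocol):       # Output Port
        def present(self, hotel: dict) -> dict: ...

    class HotelInteractor:                # Input Port implementation
        # Application business rules, depending only on interfaces (DIP).
        def __init__(self, repo: HotelRepository, presenter: HotelPresenter):
            self.repo = repo
            self.presenter = presenter

        def get_hotel_info(self, hotel_id: str) -> dict:
            data = self.repo.get_property_data(hotel_id)
            return self.presenter.present(data)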
Interactors (use case classes) are expected to call other interactors, so both approaches follow Clean Architecture principles.
However, the "maybe in the future" phrase goes against good design and architecture practices.
We can and should think in the most abstract way so that we favor reuse, but always keep things simple and avoid unnecessary complexity.
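To make the orchestration concrete, here is a minimal sketch of Option 2, with GetHotelInfoUseCase calling three inner use cases; class names are hypothetical, and asyncio.gather runs the three calls concurrently:

    # Option 2 sketch: a use case orchestrating three other use cases.
    # All names are hypothetical; asyncio.gather runs the calls in parallel.
    import asyncio

    class GetHotelInfoUseCase:
        def __init__(self, property_uc, price_uc, reviews_uc):
            self.property_uc = property_uc
            self.price_uc = price_uc
            self.reviews_uc = reviews_uc

        async def execute(self, hotel_id, dates):
            prop, price, reviews = await asyncio.gather(
                self.property_uc.execute(hotel_id),
                self.price_uc.execute(hotel_id, dates),
                self.reviews_uc.execute(hotel_id),
            )
            # Merge the three partial results into one response object.
            return {**prop, "price": price, "reviews": reviews}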
And would the answer be different if the actual logic of GetHotelInfo is not a combination of 3 endpoints but rather 10?
No, it would be the same. However, since you are designing APIs, if you ever need to combine dozens of endpoints you should start considering putting a GraphQL layer in front instead of adding complexity to the project.
"Clean" is not a well-defined term. Rather, you should be aiming to minimise the impact of change (adding or removing a service). And by "impact" I mean not only the cost and time factors but also the risk of introducing a regression (breaking a different part of the system that you're not meant to be touching).
To minimise the "impact of change" you would split these into separate services/bounded contexts and allow interaction only through events. The 'controller' would raise an event (on a shared bus) like 'hotel info request', and each separate service (property, price, and reviews) would respond independently and asynchronously (maybe on the same bus), leaving the controller to aggregate the results and return them to the client, which could be done after some period of time. If you code the result aggregator appropriately it would be possible to add new 'features' or remove existing ones completely independently of the others.
To improve on this you would then separate the read and write functionality of each context into its own context, each responding to appropriate events. This will allow you to optimise and scale the write function independently of the read function. We call this CQRS.
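As a hedged illustration of the aggregation side of that design (an in-process asyncio queue standing in for a real shared bus, and all names invented):

    # Toy sketch of the request-event + independent-responders idea.
    # asyncio.Queue stands in for a real bus (Kafka, RabbitMQ, ...).
    import asyncio

    class Bus:
        def __init__(self):
            self.replies = asyncio.Queue()
            self.handlers = []            # each bounded context registers here

        async def request(self, event):
            # Broadcast: every service handles the event independently.
            for handler in self.handlers:
                asyncio.create_task(handler(event, self.replies))

    async def aggregate_hotel_info(bus, hotel_id, expected=3, timeout=2.0):
        await bus.request({"type": "hotel.info.requested", "hotel_id": hotel_id})
        results = {}
        try:
            while len(results) < expected:
                reply = await asyncio.wait_for(bus.replies.get(), timeout)
                results[reply["source"]] = reply["payload"]
        except asyncio.TimeoutError:
            pass  # degrade gracefully: return whatever arrived in time
        return results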
I have the components of a trading robot. This follows the architecture of:
Data layer - streaming and formatting data
Model layer - updates model and writes events to event queue
Intelligence layer - fetch event, classify (buy, sell, null), filter, construct order (instrument, buy/sell, stop), write to order queue
Order layer - fetch event, choose size (or reject), place order, write order to DB
My question is:
What is the best design pattern to co-ordinate all components involved in each layer?
For a (simplified) example, I do not feel the following would be good practice:
Model M creates an instance of DataSource D
M creates an instance of Intelligence I
I creates an instance of Order O
The main point of the above being that everything instantiates everything else, so nothing operates independently (which also reduces redundancy).
But I also don't feel like one class which instantiates everything and manages the interactions is good practice.
Can anyone advise?
This is why people use IoC (Inversion of Control); it solves exactly this problem: https://en.wikipedia.org/wiki/Inversion_of_control
Look at your language/framework stack and search for an IoC library; it will likely fix most of your issues.
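As a minimal language-agnostic sketch of the idea (class internals are placeholders): each layer receives its collaborators instead of instantiating them, and a single composition root wires the graph:

    # IoC sketch: dependencies are injected, not constructed internally.
    # Class bodies are placeholders; queues decouple the layers.
    from queue import Queue

    class DataSource:
        def stream(self): ...

    class Model:
        def __init__(self, data_source, event_queue):
            self.data_source = data_source
            self.event_queue = event_queue    # writes model events here

    class Intelligence:
        def __init__(self, event_queue, order_queue):
            self.event_queue = event_queue    # reads events
            self.order_queue = order_queue    # writes constructed orders

    class OrderLayer:
        def __init__(self, order_queue, db):
            self.order_queue = order_queue    # reads orders
            self.db = db                      # records placed orders

    def build_app(db):
        # Composition root: the only place that knows the concrete graph.
        event_queue, order_queue = Queue(), Queue()
        model = Model(DataSource(), event_queue)
        intelligence = Intelligence(event_queue, order_queue)
        orders = OrderLayer(order_queue, db)
        return model, intelligence, orders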
Hi, I am trying to build an analytics engine to perform real-time analysis of the URLs/events used by clients, as well as to log API performance.
Following is the logic I am planning to implement:
1. Create a filter to intercept URLs.
2. Code the filter as a reusable JAR that contains the interception logic, using MVC interceptors.
3. The interceptor will produce and publish events to Kafka streams if the URL pattern is matched.
My question is whether this is the best approach to achieve this, or whether there is a better alternative, keeping in mind the high traffic flowing into the APIs.
If the filtering is done just a single message at a time, it could also be done in Kafka Connect using the new Single Message Transforms feature: https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
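For what it's worth, the intercept-and-publish idea itself is small. Here is a hedged sketch of the same pattern as a Python WSGI middleware with kafka-python (the question's actual stack is Java MVC interceptors; the topic name and URL pattern are illustrative):

    # Same intercept-and-publish idea as a WSGI middleware (kafka-python).
    # Topic name and URL pattern are illustrative.
    import json
    import re
    import time

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    PATTERN = re.compile(r"^/api/")          # which URLs to track

    class AnalyticsMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            path = environ.get("PATH_INFO", "")
            start = time.time()
            response = self.app(environ, start_response)
            if PATTERN.match(path):
                # Fire-and-forget: send() is asynchronous, so the hot path
                # stays fast. Latency here excludes body streaming time.
                producer.send("api-events", {
                    "url": path,
                    "latency_ms": (time.time() - start) * 1000,
                })
            return response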
I'd like to use Spring SM in an upcoming project that has very simple workflows: 3-4 states, rule-based transitions, and a limited number of actors.
The workflow is pretty fixed, so storing its definition in Java config is quite OK.
I'd prefer to use a state machine over a workflow engine, which comes with the whole machinery, but I couldn't find out whether there is a notion of an Actor.
Meaning, only one particular user (determined by a login string) can trigger a transition between states.
Also, can I run the same state machine definition in parallel? Is there a notion of an instance, like a process instance in workflow jargon?
Thanks,
Milan
An Actor with security is an interesting concept, but we don't have anything built in right now. I'd say this can be accomplished via Spring Security, e.g. https://spring.io/blog/2013/07/04/spring-security-java-config-preview-method-security/, and there's more in its reference docs.
I could try to think about whether there's something we could do to make this easier with Spring Security.
Parallel machines are on my todo list. It is a big topic, so it will take a while to implement. Follow https://github.com/spring-projects/spring-statemachine/issues/35 and other related tickets. That issue is the foundation for making distributed state machines.
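In the meantime, the "actor" idea can be expressed as a plain guard on each transition. This is a generic sketch of the concept (not the Spring Statemachine API, and all names are invented); each "process instance" is just a separate machine object built from the same shared definition:

    # Generic sketch of actor-guarded transitions (not the Spring SM API).
    # A guard checks the acting user's login before a transition fires.
    class StateMachine:
        def __init__(self, state, transitions):
            self.state = state
            # transitions: {(state, event): (target_state, allowed_login)}
            self.transitions = transitions

        def fire(self, event, actor):
            target, allowed = self.transitions[(self.state, event)]
            if actor != allowed:              # the actor guard
                raise PermissionError(f"{actor} may not trigger {event}")
            self.state = target

    # One shared definition, many independent instances:
    APPROVAL_FLOW = {("draft", "submit"): ("review", "alice"),
                     ("review", "approve"): ("done", "bob")}
    instance = StateMachine("draft", dict(APPROVAL_FLOW))
    instance.fire("submit", actor="alice")   # ok; anyone else is rejected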