Parallelisation over the cloud - parallel-processing

I have a routine named isIn(i), where i is an integer, which returns a number between 0 and 1.
My main routine is a for loop calling every isIn to find the best match.
Suppose now that I put this algorithm on a cloud service (e.g. OVH or Amazon).
What is the best programming language I should use? (given that it has to be well-known, I was thinking of Python).
What is the best way to efficiently parallelise this algorithm? (OS used? Communication protocol?).
Subsidiary question: how do I scale it efficiently? (for example if the main routine is called by many users)

Generally speaking, consider setting up an image that simply presents an RPC service. Python can do this fairly easily: have the image boot and launch a Python application that spins up a simple HTTP server and listens for JSON queries.
It parses each query, performs isIn(x), and returns the result as JSON.
Then set up an HTTP load balancer with a public interface to route requests dynamically across however many instances of the RPC service you deploy. You may also want to consider ways to dynamically grow the number of load balancers as well as the number of RPC instances.
This would be in line with how most cloud applications currently operate.
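A minimal sketch of such an RPC worker, using only Python's standard library. The body of isIn below is a placeholder (the question does not show it), and the JSON shape ({"i": ...} in, {"i": ..., "score": ...} out), the handler name and the port are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def isIn(i: int) -> float:
    """Placeholder for the asker's routine: maps an integer to a score in [0, 1)."""
    return ((i * 2654435761) % 2**32) / 2**32


def handle_query(body: bytes) -> bytes:
    """Parse a JSON query such as {"i": 7} and return the JSON-encoded result."""
    i = int(json.loads(body)["i"])
    return json.dumps({"i": i, "score": isIn(i)}).encode("utf-8")


class IsInHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = handle_query(self.rfile.read(length))
            status = 200
        except (KeyError, TypeError, ValueError):
            # Malformed JSON, a missing "i" key, or a non-integer value.
            message = {"error": 'expected a JSON object such as {"i": 7}'}
            payload = json.dumps(message).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run one RPC instance; a load balancer would sit in front of many of these."""
    ThreadingHTTPServer(("0.0.0.0", port), IsInHandler).serve_forever()
```

Calling serve() on each instance and pointing the load balancer at them gives the layout described above. Because each request is independent and the service keeps no state, adding instances should not require any change to the code.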

Related

What would be the right ZMQ Pattern?

I am trying to build a ZeroMQ pattern where,
There can be many clients connecting to a single server endpoint
Server will distribute incoming client tasks to available workers (will be mapped to the number of cores on the server)
These tasks are long running (in hours) and need to perform a lot of local I/O
During each task execution (iteration) there will be data/messages (potentially in order of [GB]s) sent back and forth between the client and the server worker
Client and server workers need to know if there are failures/errors on the peer side, so that they can recover (retry) or shutdown gracefully and try later
Based on the above, I presume that the ROUTER/DEALER pattern would be useful. PUB/SUB is discarded as I need to know if the peer fails.
I tried using various combinations of the ROUTER/DEALER pattern, but I am unable to ensure that multiple messages from a client reach the same worker within an iteration. I understand that I need to implement a broker/forwarder/device that routes the incoming messages to the right recipient/handler/worker, but I am unable to map the frontend and backend sockets in the broker. I am looking at the MajorDomo pattern, but I guess there has to be a simpler broker model that just routes the messages to the assigned worker (without really getting into services).
I am looking for some examples, if there are any or any guidance on what I may be missing. I am trying to build this in Golang.

Q : "What would be the right ZMQ Pattern?"
Given the combined requirements in items 1 - 5, I dare say the right choice is NOT to use any single one of ZeroMQ's standard, built-in communication archetype patterns, but rather to build a multi-layered, application-specific signalling and messaging infrastructure (M + N + 1 hot-standby, robust and self-resilient enough for your needs) that covers all your current application-level requirements and can be extended for future ones, as depicted here for a far simpler distributed-computing use case, where only a trivial remote SigKILL was implemented.
Yes, the best approach is to create (and maintain) your own formalised signalling that the application level can handle and act on: for example, heartbeating to detect dead workers, which permits re-instating failed jobs as soon as a failure is detected. Such jobs are most probably best re-located or re-scheduled to wherever resources are actually available at that moment rather than to statically pre-mapped ones, so additional telemetry signalling will help you decide where to re-instate them.
ZeroMQ is a fabulous framework, well suited to such complex signalling and messaging hierarchies, so your system architect's imagination is the only ceiling here.
ZeroMQ will take care of the rest and do all the hard work nicely and easily.
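The application-level bookkeeping this implies can be sketched independently of the transport. The class below is hypothetical and makes no ZeroMQ calls: it only shows how a broker might pin each client to one worker for the duration of a task and use heartbeats to detect dead workers. In a broker built on ROUTER sockets, the identity frame that ROUTER prepends to each incoming message could serve as the client and worker ids. The timeout value and the one-task-per-worker rule are assumptions.

```python
import time
from typing import Dict, List, Optional


class StickyRouter:
    """Broker-side bookkeeping: pin each client to one worker, drop silent workers."""

    def __init__(self, heartbeat_timeout: float = 3.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.last_seen: Dict[str, float] = {}  # worker id -> time of last heartbeat
        self.assigned: Dict[str, str] = {}     # client id -> worker id

    def heartbeat(self, worker: str, now: Optional[float] = None) -> None:
        """Record that a worker is alive (also registers a new worker)."""
        self.last_seen[worker] = time.monotonic() if now is None else now

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Forget workers that stopped heartbeating; return clients to retry."""
        now = time.monotonic() if now is None else now
        dead = {w for w, t in self.last_seen.items()
                if now - t > self.heartbeat_timeout}
        for worker in dead:
            del self.last_seen[worker]
        orphaned = [c for c, w in self.assigned.items() if w in dead]
        for client in orphaned:
            del self.assigned[client]
        return orphaned

    def route(self, client: str) -> Optional[str]:
        """Return this client's worker, assigning a free one on first contact."""
        if client in self.assigned:
            return self.assigned[client]
        busy = set(self.assigned.values())
        for worker in self.last_seen:
            if worker not in busy:
                self.assigned[client] = worker
                return worker
        return None  # no free worker: the caller should queue or reject the task

    def finish(self, client: str) -> None:
        """Release the worker once the client's task is complete."""
        self.assigned.pop(client, None)
```

Every message of an iteration is passed through route(), so it always reaches the same worker; the clients returned by expire() are the ones that need to be told to retry or shut down gracefully.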

How to communicate with external system

I'm trying to write logic (a JS script) that communicates with an external system. As far as I understand, the logic will be executed on every endorsing peer.
In that case, how can I avoid duplicate operations on the external system? For example, how do I increment a value in an external database? If I write logic in JS to increment the value, I think the value will be incremented once by each endorsing peer.
I'd appreciate any comments.

Firstly, currently the only way you can interact with external systems is using the experimental post API. This allows your Transaction Processor function to HTTP POST data to an external system and then to process the response.
Documentation here:
https://hyperledger.github.io/composer/integrating/call-out.html
You are correct in stating that if you have 4 peers, then the chain code container for each peer will run your logic, so you'd expect to see 4 calls to your HTTP service. This is required because each peer node is independent and Fabric must achieve consensus across the peers.
The external functions should therefore (ideally) be side-effect free "pure" functions (idempotent), meaning that for a given set of input parameters you always get the same set of output results.
Clearly a function that returns an incrementing integer doesn't fit this description! You probably need to rethink how you are structuring your problem to make it compatible with a decentralised blockchain-based approach.
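One way to make such an operation safe to repeat is to have the external system deduplicate by a request identifier. This is a sketch of the external service's side only, not of any Composer or Fabric API: the CounterService class is hypothetical, and it assumes every peer can send the same identifier for the same transaction, which you would need to verify for your setup.

```python
from typing import Dict


class CounterService:
    """Hypothetical external system that makes 'increment' safe to repeat."""

    def __init__(self) -> None:
        self.value = 0
        self.results: Dict[str, int] = {}  # request id -> result already returned

    def increment(self, request_id: str) -> int:
        # The first call for an id performs the side effect;
        # repeated calls with the same id replay the stored result.
        if request_id not in self.results:
            self.value += 1
            self.results[request_id] = self.value
        return self.results[request_id]


service = CounterService()
# Four peers executing the same transaction all send the same identifier.
answers = [service.increment("tx-001") for _ in range(4)]
```

Here the counter moves only once and all four calls receive the same answer, so the peers can still agree on the result.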

Is the mux in this golang socket.io example necessary?

In an app that I'm making, a user is always part of a 'game'. I'd like to set up a socket.io server to communicate with the users in a game. I'm planning to use go-socket.io (http://godoc.org/github.com/madari/go-socket.io), which defines the newSocketIO function to create a new socketio instance.
Instead of creating one socketio instance, I thought it might be possible to create a map from game IDs to socket.io instances, and configure each instance to listen on a URL that represents its game ID.
This way, I can use methods such as broadcast and broadcastExcept to broadcast to all players within a single game. However, I'd have to start a new goroutine for every game, and I don't know enough about their performance characteristics to know if this is scalable. The request rate for a single socketio instance will be very low: about 1/second at peak times, and the connection might be idle for tens of seconds at other times (except for heartbeats, and possibly other communication specified by the socket.io protocol).
Would I be better off creating 1 socket.io instance, and tracking which connections belong to which games?

"I'd have to start a new goroutine for every game, and I don't know enough about their performance characteristics to know if this is scalable"
Fire away, the Go scheduler is built to efficiently handle thousands and even millions of goroutines.
The default net/http server in the Go standard library spawns a goroutine for every client for instance.
Just remember to return from your goroutines once they're done working. Else you'll end up with a lot of stale ones.
"Would I be better off creating 1 socket.io instance, and tracking which connections belong to which games?"
I'm not involved in the project but if it follows Go's "get sh*t done" philosophy, then it shouldn't matter. You can find out what works better by profiling both approaches though.
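For comparison when profiling, the single-instance alternative amounts to a lookup table from game id to connections. The sketch below is library-agnostic Python rather than go-socket.io code; GameHub and the Send callable (a stand-in for whatever a connection's send method looks like) are hypothetical names.

```python
from typing import Callable, Dict, Optional

Send = Callable[[str], None]  # stand-in for a connection's send method


class GameHub:
    """One shared instance that tracks which connections belong to which game."""

    def __init__(self) -> None:
        self.games: Dict[str, Dict[str, Send]] = {}  # game id -> player id -> send

    def join(self, game_id: str, player_id: str, send: Send) -> None:
        self.games.setdefault(game_id, {})[player_id] = send

    def leave(self, game_id: str, player_id: str) -> None:
        players = self.games.get(game_id, {})
        players.pop(player_id, None)
        if not players:
            self.games.pop(game_id, None)  # drop empty games so the map stays small

    def broadcast(self, game_id: str, message: str,
                  except_player: Optional[str] = None) -> int:
        """Send to every player in one game; return how many were reached."""
        sent = 0
        for player_id, send in self.games.get(game_id, {}).items():
            if player_id != except_player:
                send(message)
                sent += 1
        return sent
```

The cost of this approach is one extra dictionary lookup per broadcast, which is why measuring both designs is more informative than guessing.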

Can Netty efficiently handle scores of outgoing connections as a client?

I'm creating a client-server relationship whereby a single client will be connected to an arbitrary number of servers using persistent TCP connections. The actual number of servers is as-of-yet undetermined, but the design goal is to shoot for 1000.
I found an example using direct Java NIO that nearly completely matches my mental model of how this could work:
http://drdobbs.com/jvm/184406242
In general, it opens up all of the channels and adds them to a single thread monitoring java.nio.channels.Selector. The use of the Selector, in particular, is what allows this to scale far better than using the standard thread-per-channel.
I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO. Unfortunately, I have not been able to determine how Netty would handle a case like this. That is, the examples and discussions I've found all tend to center around the server side, with accepting scores of concurrent connections.
But what about doing this from the client side? If I create a large number of channels and just wait on their events, how is Netty going to handle this at the back-end?

This isn't a direct answer to your question but I hope it is helpful nonetheless. Below, I describe a way for you to determine the answer that you are looking for. This is something that I recently did myself for an upcoming project.
Compared to OIO (Old IO) the asynchronous nature of the Netty framework and NIO will indeed provide much better memory and CPU usage characteristics for your application. The way buffers are handled in Netty will also be of benefit as it will help you to avoid copying byte buffers. The point is that all of the thread pool and NIO details will be handled for you allowing you to focus on your business logic. You mentioned the NIO Selector and you will benefit from that; the nice thing about Netty is that you get the benefits without having to worry about that implementation yourself because it is already done for you.
My understanding of the client side is that it is very similar to the server side and should provide you with commensurate performance gains (as long as your business logic doesn't introduce any performance issues).
My advice would be to throw together a prototype that more or less does what you want. Leave out any time consuming details and just add in the basic Netty handlers that you need to make something that works.
Then I would use jmeter to invoke your client and apply load to both the server and the client. Using something like jconsole or jvisualvm will show you the performance characteristics of the client and server under load. You could also try jprobe. You can add a listener in jmeter that will indicate the throughput. I would advise running jmeter in server mode, with the client on another machine and the server on yet another. This is a bit of up-front work, but if you decide to move forward you will have these tools ready to go for further testing as you proceed.
I suspect a decent Netty implementation that doesn't introduce any extraneous, poorly performing components will give you the performance characteristics you are looking for, but the only way to know for sure is to measure the system under the expected load.
You need to define what the expected load looks like and the desired performance characteristics under such load. Given these inputs you can measure your system to find out if it will meet your expectations. I personally don't think anyone can tell you if it will behave in the desired manner. You have to measure it. It's the only reliable way to know if the system can meet your needs.
"I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO."
This is the correct approach. You can try implementing your own NIO server and client but why do that when you have the benefit of a highly refined framework at your fingertips already?

Netty will use up to x worker threads that handle the work for you. Each worker thread will have one Selector that is used to register Channels to it. The number of used workers is configurable and by default 2 * cpu-count.
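The selector idea itself is not specific to Java. As an illustration only (this is Python's standard selectors module, not Netty), the sketch below shows a single thread waiting on many client-side connections at once, which is the job each worker thread's Selector does. The function name and the 5-second timeout are assumptions.

```python
import selectors
import socket
from typing import Dict


def collect_replies(connections: Dict[str, socket.socket]) -> Dict[str, bytes]:
    """Wait on many sockets with one selector; return the first data from each."""
    sel = selectors.DefaultSelector()
    for name, sock in connections.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
    replies: Dict[str, bytes] = {}
    while len(replies) < len(connections):
        events = sel.select(timeout=5)
        if not events:
            break  # nothing arrived within the timeout
        for key, _mask in events:
            # Only sockets that are actually readable are touched here,
            # so idle connections cost no thread and no polling.
            replies[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
    sel.close()
    return replies
```

Passing in a dictionary of a thousand connected sockets works the same way as passing in ten: the thread sleeps in select() until some of them have data, then handles just those.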

As you can see in the example from Netty's documentation (http://netty.io/docs/stable/guide/html/#start.9), you can control exactly the number of worker threads (meaning the number of underlying selectors) on the client side.
Netty solves a number of issues that are very hard to handle in a simple way, such as NIO with SSL, and it ships with a lot of default encoders/decoders (Zip, etc.).
I started using Netty a few weeks ago and it was quite quick to get into. I recommend downloading the project with all the example code inside; it contains a lot of documentation that cannot be found at the URL above.
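import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.*;
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

// Boss and worker thread pools (Netty 3 API); the worker threads own the Selectors.
ChannelFactory factory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool());
ClientBootstrap bootstrap = new ClientBootstrap(factory);
// Each new channel gets its own pipeline containing the application handler.
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        return Channels.pipeline(new TimeClientHandler());
    }
});
bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("keepAlive", true);
// connect() is asynchronous; call it once per server to open many channels.
bootstrap.connect(new InetSocketAddress(host, port));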
Good luck,
Renaud

Web crawler in Ruby: How to achieve the best performance?

I'm writing a web crawler that should be able to parse multiple pages at the same time. I use Nokogiri for parsing, which is quite good and handles all my tasks, but I don't know how to achieve better performance.
I use threads to make many open-uri requests at the same time, which makes the process quicker, but it still seems far from the potential I could get out of a single server. Should I use multiple processes? What are the limits on the threads and processes that can be launched by a single Ruby application?
In other words: how do I achieve the best performance in this case?

I really like Typhoeus and Hydra for handling multiple requests at once.
Typhoeus is the http client side, and Hydra is the part that handles multiple requests. The examples are good so go through them and see.

While it sounds like you're not looking for something quite so complex I found this thesis an interesting read awhile ago: Building blocks of a scalable webcrawler - Marc Seeger.
In terms of threading/process limits, Ruby has very low threading potential. Standard Ruby (MRI/YARV) and Rubinius don't support simultaneous thread execution, unless you use an extension specifically built to support it. Depending on how much of your performance trouble is in the IO and how much is in the processing, I would suggest using EventMachine.
Ruby works very well with multiple processes, however: as long as you've got a good manager/database for all the processes to communicate with, running multiple processes should scale as well as your processing power allows.

Another way is to use a combination of Nokogiri and IronWorker (IronMQ and IronCache).
See a full blog entry on the Topic here

We use a combination of ActiveMQ/Active Messaging, Event Machine, and multi-threading for this problem. We start off with a big list of URLs to fetch. We then break them down into batches of 100 URLs per batch. Each batch is then pushed into ActiveMQ. Then, we have an array of poller/consumer processes listening to the queue. These consumers can all be on one computer, or they can be spread across multiple computers. The array of consumers can grow arbitrarily large to support as much parallelism as we want. The consumers use Active Messaging, which is a nice Ruby integration with ActiveMQ.
When a consumer receives a message to process a batch of 100 URLs, it kicks off Event Machine to create a thread pool that can process multiple URLs in multiple threads. Like you, we use Nokogiri to process each URL.
So, there are three levels of parallelism:
1) Multiple concurrent requests per consumer process, supported by Event Machine and threads.
2) Multiple consumer processes per computer.
3) Multiple computers.
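The shape of this pipeline can be sketched in a few lines. The example is in Python rather than Ruby and is an in-process stand-in only: queue.Queue plays the role of ActiveMQ, threads play the role of the consumer processes, and fetch_and_parse is a placeholder for the real HTTP request and Nokogiri parsing. All function names and the pool sizes are assumptions.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def make_batches(urls, size=100):
    """Split the URL list into batches of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def fetch_and_parse(url):
    """Placeholder for the real HTTP request and HTML parsing."""
    return len(url)


def consumer(batches, results, threads_per_consumer=8):
    """Take batches off the queue until it is empty; fetch each batch concurrently."""
    with ThreadPoolExecutor(max_workers=threads_per_consumer) as pool:
        while True:
            try:
                batch = batches.get_nowait()
            except queue.Empty:
                return
            # Level 1: several concurrent requests inside one consumer.
            for url, parsed in zip(batch, pool.map(fetch_and_parse, batch)):
                results.put((url, parsed))


def crawl(urls, consumers=4):
    batches, results = queue.Queue(), queue.Queue()
    for batch in make_batches(urls):
        batches.put(batch)
    # Level 2: several consumers pulling from the same queue.
    workers = [threading.Thread(target=consumer, args=(batches, results))
               for _ in range(consumers)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    collected = {}
    while not results.empty():
        url, parsed = results.get()
        collected[url] = parsed
    return collected
```

With a real message broker in place of the in-process queue, the consumers no longer need to share a machine, which is what provides the third level.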

If you want something easy go for http://anemone.rubyforge.org/
If you want something fast, code something with eventmachine/em-http-request
I found redis to be a great multi purpose tool for queue management, caching and so on. You could also use specialized things like beanstalkd/active mq/... but at least in my use case, I didn't really find them to be a big advantage compared to redis.
The load on the backend system in particular could become a bottleneck, so choose your database carefully and pay attention to what you save.
