Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
How can I use several computers to create a faster environment? I have about 12 computers, each with 4 GB of RAM and a 2 GHz CPU. I need to run some time-consuming data transforms and would like to use the combined power of these machines. They are all running Windows Server 2003.
Basically, we have a large number of video files that we need to transform so our analysts can do their analysis. The problem is complicated by the fact that I can't tell you more about the project.
I moved it to: https://serverfault.com/questions/40615/is-it-possible-to-create-a-faster-computer-from-many-computers
Yes, it's called Grid Computing or more recently, Cloud Computing.
There are many programming toolkits available to distribute your operations across a network: everything from builds to database operations, to complex mathematics libraries, to special parallel programming languages.
There are solutions for every size product, from IBM and Oracle down to smaller vendors like Globus. And there are even open-source solutions, such as GridGain and NGrid (the latter is on SourceForge).
You will get speed only if you can split the job and run it on multiple computers in parallel. Can you do that with your data transform program? One thing I am aware of is that Amazon supports MapReduce. If you can express your data transform as a MapReduce problem, you can potentially leverage Amazon's cloud-based Hadoop service (Elastic MapReduce).
There's really no "out of the box" way to just combine multiple computers into one big computer in a generic way like that.
The idea here is distributed computing, and you would have to write a program (possibly using an existing framework) that would essentially split your data transform into smaller chunks, send those off to each of the other computers to process, then aggregate the results.
Whether this would work or not would depend on the nature of your problem - can it be split into multiple chunks that can be worked on independently or not?
If so, there are several existing frameworks out there that you could use to build such an application. Hadoop, for example, which uses MapReduce, would be a good place to start.
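To make the split/distribute/aggregate pattern concrete, here is a minimal Python sketch. The host names, the transform.exe command, and the remote_run launcher are placeholders (assumptions); in practice you would swap in whatever remote-execution mechanism fits your environment (PsExec on Windows, SSH, a job queue, or a framework like Hadoop).

    # Fan-out/fan-in sketch: split a list of video files across worker machines,
    # run a (hypothetical) transform on each, then collect the results locally.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    WORKERS = ["host01", "host02", "host03"]            # assumption: 12 hosts in reality
    FILES = ["clip_a.avi", "clip_b.avi", "clip_c.avi"]  # assumption: the work items

    def run_remote(job):
        """Run the transform for one file on one worker (placeholder command)."""
        host, path = job
        # Assumption: 'remote_run' stands in for PsExec/SSH/your own agent.
        cmd = ["remote_run", host, "transform.exe", path]
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    def main():
        # Round-robin the files over the workers and launch them in parallel.
        jobs = [(WORKERS[i % len(WORKERS)], f) for i, f in enumerate(FILES)]
        with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
            results = list(pool.map(run_remote, jobs))
        # Aggregate: here we just report success or failure per file.
        for (host, path), rc in zip(jobs, results):
            print(f"{path} on {host}: {'ok' if rc == 0 else 'failed'}")

    if __name__ == "__main__":
        main()

The point is the shape of the program, not the launcher: split the work, run the chunks concurrently, and collect the results in one place.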
Your program will probably need to be modified to take advantage of multiple machines.
One method of doing this is to use an implementation of MPI (possibly MS-MPI, since you're using Windows Server).
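If you go the MPI route, the scatter/gather structure looks roughly like the following mpi4py sketch (the file list and the transform function are placeholders); the same pattern applies when using MS-MPI from C or C++.

    # Minimal MPI scatter/gather sketch (run with e.g.: mpiexec -n 4 python transform_mpi.py)
    from mpi4py import MPI

    def transform(path):
        """Placeholder for the real data transform."""
        return f"processed {path}"

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        files = [f"clip_{i:03d}.avi" for i in range(16)]   # assumption: the work items
        chunks = [files[i::size] for i in range(size)]     # one chunk per rank
    else:
        chunks = None

    my_chunk = comm.scatter(chunks, root=0)                # each rank receives its share
    my_results = [transform(p) for p in my_chunk]          # do the local work
    all_results = comm.gather(my_results, root=0)          # collect everything on rank 0

    if rank == 0:
        flat = [r for part in all_results for r in part]
        print(f"{len(flat)} files processed across {size} ranks")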
Try looking at Condor. Their homepage is light on info, so check out the Wikipedia article first.
There are a variety of tools out there that can use a distributed network of computers.
An example is IncrediBuild.
Imagine a Beowulf cluster of these things!
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I want to learn how to get started with developing highly scalable server/client applications--more specifically for non-web-based/not-in-a-browser desktop clients. I figure that developing a very minimalistic chat application (roughly comparable to AIM/Skype) is a reasonable way to get started down such a path of learning about servers/clients and scalability.
I am unsure which programming language would be appropriate for this task considering my emphasis on scalability. Personally, the only languages I am interested in working with are Java, C#, and C/C++. As far as the server OS goes, I will be dealing with Linux, so C# in my case would imply Mono.
I suppose my specific interest boils down to what language to use on the server, since it is the infrastructure supporting the application which has to be highly scalable. I have heard mixed reviews of Java and C# server scalability. My intuition would suggest that they are both perfectly reasonable choices, but then I hear about others running into problems once they reach a certain threshold of application/user traffic. It is hard to know what to make of hearsay, but I do suppose that the lack of bare-metal support of these languages could hinder scalability at certain thresholds. When I hear about C/C++, I hear mention of the great Boost libraries (ex. such as Boost.Asio) offering the ultimate scalability. But then I am scared off when I hear that sockets in particular are much more complex to deal with in C/C++ than with other languages like Java/C#.
What is an effective way to get started in making highly scalable server-client applications such as a chat client? Of the ones which I have mentioned, which programming language is adequately suited for developing such applications? What other languages should I consider for such an application?
EDIT: the term "scale" most directly relates to scaling to serve a large number of users (perhaps tens or hundreds of thousands, maybe millions).
"Scale" - in which way has it to scale? Scaling with CPU cores, with users or with code base?
You could ask: Which language implementation is the fastest? Which language will handle a lot of requests without problems?
Whatever language implementation you choose, you will need strategies for building a distributed system. If you have to worry about speed, you should worry instead about being able to distribute your system across many machines.
If you want maximal scalability in terms of cores and non-blocking requests, go with Erlang. It will handle a shitload of traffic on the server side.
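Whichever language you choose, the core pattern for this kind of server is an event loop multiplexing many sockets rather than a thread per client. Purely as an illustration (in Python, not one of the languages listed above), a minimal non-blocking broadcast/chat server looks like this; the port and the one-message-per-line protocol are assumptions.

    # Minimal non-blocking broadcast server: one event loop handles all clients.
    import asyncio

    clients = set()  # StreamWriter objects for every connected client

    async def handle(reader, writer):
        clients.add(writer)
        try:
            while True:
                line = await reader.readline()
                if not line:               # empty bytes: the client disconnected
                    break
                for other in list(clients):
                    if other is not writer:
                        other.write(line)  # fan the message out to everyone else
                        await other.drain()
        finally:
            clients.discard(writer)
            writer.close()

    async def main():
        server = await asyncio.start_server(handle, "0.0.0.0", 9000)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())

The same single-event-loop idea is roughly what Erlang's runtime, Java NIO, C#'s async sockets, and Boost.Asio give you in their respective ecosystems.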
Each of the languages you mentioned will scale.
If you're serious about this, you should choose the language you know best and build it. You haven't even prototyped your idea yet and you're already concerning yourself with scale.
We could list many programs and websites written in each of the languages above that scale perfectly well (we could also list many that don't).
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 1 year ago.
This question answers part of my question, but not completely.
How do I run a script that manages this? Do I run it from my local filesystem? Where exactly do tools like mrjob or Dumbo come into the picture? Are there any other alternatives?
I am trying to run K-Means with Hadoop Streaming and Python, where the output of each iteration (a MapReduce job) becomes the input to the next iteration.
I do not have much experience, and any information should help me make this work. Thanks!
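For concreteness, the setup being described boils down to a local driver script that launches one Hadoop Streaming job per iteration and feeds each iteration's output back in as the next iteration's centroids. A minimal sketch; the streaming jar location, HDFS paths, and the mapper.py/reducer.py scripts that implement one K-Means step are all assumptions:

    # Local driver that chains Hadoop Streaming jobs for iterative K-Means.
    # Assumptions: hadoop is on PATH, mapper.py assigns points to the nearest
    # centroid read from centroids.txt, and reducer.py recomputes the centroids.
    import os
    import subprocess

    STREAMING_JAR = "/usr/lib/hadoop/hadoop-streaming.jar"  # assumption: install-specific
    INPUT = "/data/points"                                  # assumption: points on HDFS
    ITERATIONS = 10

    def run_iteration(i):
        output = f"/data/kmeans/iter_{i}"
        cmd = [
            "hadoop", "jar", STREAMING_JAR,
            "-files", "mapper.py,reducer.py,centroids.txt",  # ship code + current centroids
            "-mapper", "python mapper.py",
            "-reducer", "python reducer.py",
            "-input", INPUT,
            "-output", output,
        ]
        subprocess.run(cmd, check=True)
        return output

    def main():
        # Assumption: centroids.txt starts out holding the initial seed centroids.
        for i in range(ITERATIONS):
            output = run_iteration(i)
            # Pull the new centroids back locally so the next job ships them again.
            subprocess.run(["hadoop", "fs", "-getmerge", output, f"centroids_{i}.txt"],
                           check=True)
            os.replace(f"centroids_{i}.txt", "centroids.txt")

    if __name__ == "__main__":
        main()

This is essentially the niche that mrjob and Dumbo fill: they wrap the job submission, file shipping, and job chaining so that you write only the mapper/reducer logic in Python.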
If you are not tightly coupled to Python, then you have a very good option. There is a project from Cloudera called "Crunch" that allows us to create pipelines of MR jobs easily. It's a Java library that provides a framework for writing, testing, and running MapReduce pipelines, and is based on Google's FlumeJava library.
There is another non-Python option. GraphLab is an open-source project that produces free implementations of scalable machine-learning algorithms on multicore machines and clusters. A fast, scalable implementation of the K-Means++ algorithm is included in the package. See GraphLab for details.
GraphLab's clustering API can be found here.
This seems like a good application for Spark. It also has a streaming option. I was afraid it only worked with Scala, but it has a Python API, so it is definitely worth a try: it is not that difficult to use (at least the tutorials aren't), and it can scale to large workloads.
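As an example, Spark's Python API ships with a K-Means implementation in MLlib, so you do not have to chain MapReduce jobs by hand. A minimal sketch (the input path, point format, and k are assumptions):

    # Minimal PySpark K-Means sketch using the built-in MLlib implementation.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="kmeans-sketch")

    # Assumption: each line of the input file is a whitespace-separated point.
    points = (sc.textFile("hdfs:///data/points.txt")
                .map(lambda line: [float(x) for x in line.split()]))

    # MLlib runs the iterations internally instead of one MR job per iteration.
    model = KMeans.train(points, k=5, maxIterations=10)

    for i, center in enumerate(model.clusterCenters):
        print(f"cluster {i}: {center}")

    sc.stop()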
It should be possible to use GraphLab Create (in Python) running on Hadoop to do what you describe. The clustering toolkit can help implement the K-Means part. You can coordinate/script it from your local machine and use the graphlab.deploy API to run the job on Hadoop.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I would like to build a simple cross-browser multiplayer game (like chess or a card game) which will communicate to a server using sockets.
I have some intermediate knowledge of the Ruby language, but I'm not convinced that it is a good solution for a multi-client server, so I thought that Node.js or Socket.io might be a better one. I know that Java or C++ could be great for the job, but I am not comfortable with either of them, which is the reason I'm gravitating towards server-side JavaScript.
My question is, what do you think is the best solution for a project like this one? What might be the best server-side technology on which I will build an entire game and communication logic? Maybe some combination of them? Any comment regarding speed, server load, hosting solutions and development speed for each of the technologies will be greatly appreciated.
If you're comfy with JavaScript, you've got nothing to lose by giving node.js a go: the learning curve will be gentle. It's a pretty cool server tech.
The only disadvantage of Node.js is, of course, that it won't scale like Java. At all. This is often fine for web apps because you can throw a caching layer in front (a reverse proxy), which greatly mitigates this. I imagine this won't be reasonable for your application, since the game state will change too frequently.
Node.js can "scale", though, by spinning up more instances. If one server can easily accommodate more than one "game world", then this is straightforward. If you need to split a game world across multiple servers, then the servers must cooperate. Beware this scenario though; it's not as simple as it may first seem: it's called the "multi-master" problem and is one of the hungry internet monsters.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Are there reports or theses about the performance of Google App Engine or other cloud platforms?
I'm writing an article about how to choose an appropriate cloud platform and want to reference some test data.
A little work with Google may bring up some material that others have found. For instance the canonical resource for Azure benchmarking is here: http://azurescope.cloudapp.net/. However, there's not much comparative material as it really doesn't make sense.
Comparing cloud platforms solely on performance is like comparing apples with bananas with oranges. Each has its own qualities that make it appropriate for a particular kind of application.
For example, in broad terms, for multi-platform use where you have control of the underlying OS, go EC2; for a managed Windows application platform go Azure; or for a managed Java/Python platform choose App Engine. Once you've chosen the platform you can pretty much then pay for the performance you need.
Bear in mind too that "performance" means different things for different applications. The application I'm working on, for instance, relies heavily on SQL database performance. That will have a very different performance profile from (say) an application that uses a key-value pair storage system, or an application that's mostly static HTML.
So, in practice, there isn't much in the way of performance benchmarks out there, because every application is different.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
I was shocked to learn that the OpenMosix project has been closed. Can you suggest any similar free tool for Linux?
For those who don't know, OpenMosix is
a software package that turns networked computers running GNU/Linux into a cluster. It automatically balances the load between different nodes of the cluster, and nodes can join or leave the running cluster without disruption of the service. The load is spread out among nodes according to their connection and CPU speeds.
The nicest part is that you don't need to link your programs with any special libraries, nor do you need to modify your programs. Just "fork and forget".
Another nice (but not must have) feature is the fact that it doesn't have to be installed on dedicated computers, but can sit on various desktop computers in your organization/lab/home etc.
I'm aware of the names of several possible solutions (for example). I'm looking for personal experience and/or good reviews.
EDIT: Mosix, the predecessor of OpenMosix, used to be free (as in free beer). However, it now costs money.
I'm not sure how it compares feature-wise to OpenMosix, but Rocks is an open source cluster Linux distro.
From the website:
Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Hundreds of researchers from around the world have used Rocks to deploy their own cluster.
You may want to listen to this episode of FLOSS Weekly that is all about Rocks.
The closest similar free solution to the OpenMosix technology is Kerrighed.
Shamelessly ripped from the Beowulf mailing list:
OpenSSI, or Mosix if you don't need a fully open-source solution and you are a non-profit.
For a much more in-depth discussion check out this thread:
Beowulf - open mosix alternative
To help make this dead thread more useful: a more modern alternative is CRIU (Checkpoint/Restore In Userspace).
See for example:
https://chandanduttachowdhury.wordpress.com/2015/08/10/test-driving-criu-live-migrate-any-process-on-linux/
http://criu.org/
You might also consider containers such as Docker, either as well or instead.
E.g.
http://blog.circleci.com/checkpoint-and-restore-docker-container-with-criu/
I looked here to get an update, as I have not used OpenMosix since graduating. There is now a newer technology called "mesh computing" (and also the ether of Bitcoin), so processes must transport the means of getting their data to a suitable node in a secure manner, and then try to run in a fault-tolerant manner. I think the answer is the Hurd, which before the mesh was more of a pipe dream. I suggest going to https://www.gnu.org/software/hurd/hurd.html and pitching in if you have time. The mesh is upon us, and there is no access to anything except agent hosting on the mesh.