I've read the part of the guide that recommends creating one Context. My previous implementation of my application had multiple contexts that I created ad-hoc to get a subscription running. I've since changed it to using a single context for all subscriptions.
What are the drawbacks of creating multiple contexts and what use cases are there for doing so? The guide has this following blurb:
Getting the Context Right
ZeroMQ applications always start by creating a context, and then using that for creating sockets. In C, it’s the zmq_ctx_new() call. You should create and use exactly one context in your process. Technically, the context is the container for all sockets in a single process, and acts as the transport for inproc sockets, which are the fastest way to connect threads in one process. If at runtime a process has two contexts, these are like separate ZeroMQ instances. If that’s explicitly what you want, OK, but otherwise remember
Does this mean that it's just not as efficient to use multiple contexts, but it would still work?
Q : "What are the drawbacks of creating multiple contexts ... ?"
Resources consumed. Nothing else. The more Context()-instances one produces, the more memory-allocated & the more overhead-time was spent on doing that.
One-time add-on costs may represent a drawback - some people forget about the Amdahl's Law (and forget to account for setup & termination add-on costs there) where small amounts of "useful"-work may start to be expensive right due to the (for some, ... it may surprise how often & how many ... hidden) add-on costs in attempts to distribute/parallelise some part of the application workloads, yet need not bother you, if not entering low-latency or ultra-low latency domains. Run-time add-on overheads ( to maintain each of the Context()-instances internal work - yes, it works in the background, so it consumes some CPU-clocks even when doing nothing ) may start doing troubles, when numbers of semi-persistent instances grow higher ( also depends on CPU-microarchitecture & O/S & soft-real-time needs, if present )
Q : "What ... use cases are there for doing so?"
When good software architect designs the code for ultimate performance and tries to shave-off the last few nanoseconds, there we go.
Using well thought & smart-crafted specialised-Context()-engines, the resulting ZeroMQ performance may grow to almost the CPU/memory-I/O based limits. One may like to read more on relative-prioritisation, CPU-core-mappings and other high-performance tricks on doing this, in my evangelisations of ZeroMQ design-principles.
Q : "Does this mean that it's just not as efficient to use multiple contexts, but it would still work?"
The part "it would still work" is easier - it would, if not violating the O/S maximum number of threads permitted and if there is still RAM available to store the actual flow of the messages intended for out-of-platform delivery, which uses additional, O/S-specific, buffers - yes, additional SpaceDOMAIN and TimeDOMAIN add-on latency & latency jitter costs start to appear in doing that.
The Zero-Copy inproc:// TransportClass is capable of actually doing a pure in-memory flag-signalling of memory-mapped Zero-Copy message-data, that never moves. In specific cases, there can be zero-I/O-threads Context()-instances for such inproc://-only low-latency data-"flow" models, as the data is Zero-Copied and never "flow" ;o) ).
Q : "Why does ZeroMQ recommend..."
Well, this seems to be a part of the initial Pieter HINTJENS' & Martin SUSRIK's evangelisation of Zero-Sharing, Zero-blocking designs. That was an almost devilish anti-pattern to the Herd of Nerds, who lock/unlock "shared" resources and were suddenly put to a straight opposite ZeroMQ philosophy of designing smart behaviours (without a need to see under the hood).
The art of the Zen-of-Zero - never share, never block, never copy (if not in a need to do so) was astoundingly astonishing to Nerds, who could not initially realise the advantages thereof (as they were for decades typing in code that was hard to read, hard to rewrite, hard to debug, right due to the heaps of sharing-, locking- and blocking-introduced sections and that they/we were "proud-off" to be the Nerds, who "can", where not all our colleagues were able to decode/understand the less improve).
The "central", able to be globally shared Context()-instance was a sign of light for those, who started to read, learn and use the new paradigm.
After 12+ years this may seem arcane, yet the art of the Zen-of-Zero started with this pain (and a risk of an industry-wide "cultural", not a technical, rejection).
Until today, this was a brave step from both Pieter HINTJENS & Martin SUSTRIK.
Ultimate ~Respect!~ to the whole work they together undertook... for our learning their insights & chances to re-use them in BAU... without an eye-blink.
Great minds.
A program of mine that 'computes', on OS/X, shows a time like:
real 0m10.883s
user 0m6.924s
sys 0m3.957s
On a nearby Linux system, in contrast, I see:
real 0m7.480s
user 0m7.172s
sys 0m0.280s
To make matters worse, this situation arrived after rewriting one particular algorithm, and neither the new nor the old does any obvious system-call-ish stuff.
Some poking around with dapptrace and iprofiler failed to turn up anything. This is all 10.8.2, xcode 4.2. The code in question is C++.
The code here is not (yet) small enough to post here. I can say that the area where things changed has STL vectors and maps in it. However, dapptrace was not revealing.
procsystime, however, reveals that 'madvise' is the system call at the root of this evil.
I'm going to vote to close this and write a new question based on what I've learned.
"sys" time (short for "system") represents cpu time spent running kernel code. Obviously, your program is relying on some facility that is implemented in the OSX kernel that is not implemented in the Linux kernel. My guess is that you're using some library that has different implementations on the two respective platforms.
Another possibility might be the result of cpu time being accounted differently--they are entirely separate operating systems, after all. Their internal instrumentation is going to be different from each other.
Impossible to say for sure without more info about your program -- care to post any source code?
If you see mysterious sys time, procsystime is your friend. It provides a concise summary of system time usage.
Beware of free. On OS/X, free has a fondness for madvice. It might be a bug, it might be a feature, but code that runs lickety-split on Linux can run much more slowly on OS/X due to free calling madvise.
Basically, what I want to achieve is to run a program in an environment which will give any value asked by the program on basis criteria decided by me. Say e.g., games regularly query system about the time so as to execute animations, now if I have a control over the values of time that are passed on to the program I could control the speed of the animation (and perhaps clear some difficult rounds easily :P).
On similar grounds keystrokes, mouse movements could also be controlled (I have tried Java's Robot class, but did not find it satisfactory: slows the system {or perhaps my implementation was bad}, the commands are executed on currently focused programs and can't be targeted specifically).
Any graceful way of doing it or some pointers on achieving it will be highly appreciated.
I will start with the question and then proceed to explain the need:
Given a single C++ source code file which compiles well in modern g++ and uses nothing more than the standard library, can I set up a single-task operating system and run it there?
EDIT: Following the comment by Chris Lively, I would have better asked: What's the easiest way you can suggest to try to tweak linux into effectively giving me a single-tasking behavior.
Nevertheless, it seems like I did get a good answer although I did not phrase my question well enough. See the second paragraph in sarnold's answer regarding the scheduler.
Motivation:
In some programming competitions the communication between a contestant's program and the grading program involves a huge number of very short interactions.
Thus, using getrusage to measure the time spent by a contestant's program is inaccurate because getrusage works by sampling a process at constant intervals (usually around once per 10ms) which are too large compared to the duration of each interaction.
Another approach to timing would be to measure time before and after the program is run using something like *clock_gettime* and then subtract their values. We should also subtract the amount of time spent on I/O and this can be done be intercepting printf and scanf using something like LD_PRELOAD and accumulate the time spent in each of these functions by checking the time just before and just after each call to printf/scanf (it's OK to require contestants to use these functions only for I/O).
The method proposed in the last paragraph is ofcourse only true assuming that the contestant's program is the only program running which is why I want a single-tasking OS.
To run both the contestant's program and the grading program at the same time I would need a mechanism which, when one of these program tries to read input and blocks, runs the other program until it write enough output. I still consider this to be single tasking because the programs are not going to run at the same time. The "context switch" would occur when it is needed.
Note: I am aware of the fact that there are additional issues to timing such as CPU power management, but I would like to start by addressing the issue of context switches and multitasking.
First things first, what I think would suit your needs best would actually be a language interpreter -- one that you can instrument to keep track of "execution time" of the program in some purpose-made units like "mems" to represent memory accesses or "cycles" to represent the speeds of different instructions. Knuth's MMIX in his The Art of Computer Programming may provide exactly that functionality, though Python, Ruby, Java, Erlang, are all reasonable enough interpreters that can provide some instruction count / cost / memory access costs should you do enough re-writing. (But losing C++, obviously.)
Another approach that might work well for you -- if used with extreme care -- is to run your programming problems in the SCHED_FIFO or SCHED_RR real-time processing class. Programs run in one of these real-time priority classes will not yield for other processes on the system, allowing them to dominate all other tasks. (Be sure to run your sshd(8) and sh(1) in a higher real-time class to allow you to kill runaway tasks.)
I came across the following statement in Trapexit, an Erlang community website:
Erlang is a programming language used
to build massively scalable soft
real-time systems with requirements on
high availability.
Also I recall reading somewhere that Twitter switched from Ruby to Scala to address scalability problem.
Hence, I wonder what is the relation between a programming language and scalability?
I would think that scalability depends only on the system design, exception handling etc. Is it because of the way a language is implemented, the libraries, or some other reasons?
Hope for enlightenment. Thanks.
Erlang is highly optimized for a telecommunications environment, running at 5 9s uptime or so.
It contains a set of libraries called OTP, and it is possible to reload code into the application 'on the fly' without shutting down the application! In addition, there is a framework of supervisor modules and so on, so that when something fails, it gets automatically restarted, or else the failure can gradually work itself up the chain until it gets to a supervisor module that can deal with it.
That would be possible in other languages of course too. In C++, you can reload dlls on the fly, load plugsin. In Python you can reload modules. In C#, you can load code in on-the-fly, use reflection and so on.
It's just that that functionality is built in to Erlang, which means that:
it's more standard, any erlang developer knows how it works
less stuff to re-implement oneself
That said, there are some fundamental differences between languages, to the extent that some are interpreted, some run off bytecode, some are native compiled, so the performance, and the availability of type information and so on at runtime differs.
Python has a global interpreter lock around its runtime library so cannot make use of SMP.
Erlang only recently had changes added to take advantage of SMP.
Generally I would agree with you in that I feel that a significant difference is down to the built-in libraries rather than a fundamental difference between the languages themselves.
Ultimately I feel that any project that gets very large risks getting 'bogged down' no matter what language it is written in. As you say I feel architecture and design are pretty fundamental to scalability and choosing one language over another will not I feel magically give awesome scalability...
Erlang comes from another culture in thinking about reliability and how to achieve it. Understanding the culture is important, since Erlang code does not become fault-tolerant by magic just because its Erlang.
A fundamental idea is that high uptime does not only come from a very long mean-time-between-failures, it also comes from a very short mean-time-to-recovery, if a failure happened.
One then realize that one need automatic restarts when a failure is detected. And one realize that at the first detection of something not being quite right then one should "crash" to cause a restart. The recovery needs to be optimized, and the possible information losses need to be minimal.
This strategy is followed by many successful softwares, such as journaling filesystems or transaction-logging databases. But overwhelmingly, software tends to only consider the mean-time-between-failure and send messages to the system log about error-indications then try to keep on running until it is not possible anymore. Typically requiring human monitoring the system and manually reboot.
Most of these strategies are in the form of libraries in Erlang. The part that is a language feature is that processes can "link" and "monitor" each other. The first one is a bi-directional contract that "if you crash, then I get your crash message, which if not trapped will crash me", and the second is a "if you crash, i get a message about it".
Linking and monitoring are the mechanisms that the libraries use to make sure that other processes have not crashed (yet). Processes are organized into "supervision" trees. If a worker process in the tree fails, the supervisor will attempt to restart it, or all workers at the same level of that branch in the tree. If that fails it will escalate up, etc. If the top level supervisor gives up the application crashes and the virtual machine quits, at which point the system operator should make the computer restart.
The complete isolation between process heaps is another reason Erlang fares well. With few exceptions, it is not possible to "share values" between processes. This means that all processes are very self-contained and are often not affected by another process crashing. This property also holds between nodes in an Erlang cluster, so it is low-risk to handle a node failing out of the cluster. Replicate and send out change events rather than have a single point of failure.
The philosophies adopted by Erlang has many names, "fail fast", "crash-only system", "recovery oriented programming", "expose errors", "micro-restarts", "replication", ...
Erlang is a language designed with concurrency in mind. While most languages depend on the OS for multi-threading, concurrency is built into Erlang. Erlang programs can be made from thousands to millions of extremely lightweight processes that can run on a single processor, can run on a multicore processor, or can run on a network of processors. Erlang also has language level support for message passing between processes, fault-tolerance etc. The core of Erlang is a functional language and functional programming is the best paradigm for building concurrent systems.
In short, making a distributed, reliable and scalable system in Erlang is easy as it is a language designed specially for that purpose.
In short, the "language" primarily affects the vertical axii of scaling but not all aspects as you already eluded to in your question. Two things here:
1) Scalability needs to be defined in relation to a tangible metric. I propose money.
S = # of users / cost
Without an adequate definition, we will discussing this point ad vitam eternam. Using my proposed definition, it becomes easier to compare system implementations. For a system to be scalable (read: profitable), then:
Scalability grows with S
2) A system can be made to scale based on 2 primary axis:
a) Vertical
b) Horizontal
a) Vertical scaling relates to enhancing nodes in isolation i.e. bigger server, more RAM etc.
b) Horizontal scaling relates to enhancing a system by adding nodes. This process is more involving since it requires dealing with real world properties such as speed of light (latency), tolerance to partition, failures of many kinds etc.
(Node => physical separation, different "fate sharing" from another)
The term scalability is too often abused unfortunately.
Too many times folks confuse language with libraries & implementation. These are all different things. What makes a language a good fit for a particular system has often more to do with the support around the said language: libraries, development tools, efficiency of the implementation (i.e. memory footprint, performance of builtin functions etc.)
In the case of Erlang, it just happens to have been designed with real world constraints (e.g. distributed environment, failures, need for availability to meet liquidated damages exposure etc.) as input requirements.
Anyways, I could go on for too long here.
First you have to distinguish between languages and their implementations. For instance ruby language supports threads, but in the official implementation, the thread will not make use of multicore chips.
Then, a language/implementation/algorithm is often termed scalable when it supports parallel computation (for instance via multithread) AND if it exhibits a good speedup increase when the number of CPU goes up (see Amdahl Law).
Some languages like Erlang, Scala, Oz etc. have also syntax (or nice library) which help writing clear and nice parallel code.
In addition to the points made here about Erlang (Which I was not aware of) there is a sense in which some languages are more suited for scripting and smaller tasks.
Languages like ruby and python have some features which are great for prototyping and creativity but terrible for large scale projects. Arguably their best features are their lack of "formality", which hurts you in large projects.
For example, static typing is a hassle on small script-type things, and makes languages like java very verbose. But on a project with hundreds or thousands of classes you can easily see variable types. Compare this to maps and arrays that can hold heterogeneous collections, where as a consumer of a class you can't easily tell what kind of data it's holding. This kind of thing gets compounded as systems get larger. e.g. You can also do things that are really difficult to trace, like dynamically add bits to classes at runtime (which can be fun but is a nightmare if you're trying to figure out where a piece of data comes from) or call methods that raise exceptions without being forced by the compiler to declare the exception. Not that you couldn't solve these kinds of things with good design and disciplined programming - it's just harder to do.
As an extreme case, you could (performance issues aside) build a large system out of shell scripts, and you could probably deal with some of the issues of the messiness, lack of typing and global variables by being very strict and careful with coding and naming conventions ( in which case you'd sort of be creating a static typing system "by convention"), but it wouldn't be a fun exercise.
Twitter switched some parts of their architecture from Ruby to Scala because when they started they used the wrong tool for the job. They were using Ruby on Rails—which is highly optimised for building green field CRUD Web applications—to try to build a messaging system. AFAIK, they're still using Rails for the CRUD parts of Twitter e.g. creating a new user account, but have moved the messaging components to more suitable technologies.
Erlang is at its core based on asynchronous communication (both for co-located and distributed interactions), and that is the key to the scalability made possible by the platform. You can program with asynchronous communication on many platforms, but Erlang the language and the Erlang/OTP framework provides the structure to make it manageable - both technically and in your head. For instance: Without the isolation provided by erlang processes, you will shoot yourself in the foot. With the link/monitor mechanism you can react on failures sooner.