Debugging scheduling modules in Hadoop and Storm

I have been studying big data platforms such as Hadoop and Storm for a while, and I'm at the beginning of a research project in the field of scheduling/resource management. I wonder how the developers of these libraries debug the scheduling classes. Do they use a specific tool (an IDE), or do they rely only on logging and unit testing to check how the scheduler is running? Relying only on logging and testing seems too complicated to me, and I can't imagine how to test an entire scheduler and its subsystems together.
The question is: how can I debug my algorithm after implementing it and integrating it into the platform? Is there a sample somewhere that would help me understand how they work?
With tools (I use IntelliJ IDEA) I can debug user-level programs with no problem, but at the system level (scheduling and resource management classes) this method doesn't work!
Any ideas would be appreciated (for either Hadoop or Storm).

Related

Why do Java-based serverless functions have cold starts if the JVM uses a JIT compiler?

Late Friday night thoughts after reading through material on how Cloudflare's V8-based "no cold start" Workers function (in short, because of the V8 engine's just-in-time compilation of JavaScript code): I'm wondering why this no-cold-start type of serverless function seems to exist only for JavaScript.
Is this just because, architecturally, when AWS Lambda and Azure Functions were launched, they were designed as a kind of even more simplified Kubernetes model, where each function exists in its own container? I would assume that was a simpler way of keeping different clients' code separate than whatever magic sauce V8 isolates provide under the hood.
So, given that Java is compiled into bytecode for the JVM, which uses JIT compilation (optimising and compiling certain heavily used functions to machine code), is it also technically possible to have no-cold-start Java serverless functions, as long as there is some way to load each client's bytecode on the cloud provider's server as it is invoked?
What are the practical challenges to making this a reality? I'm no expert on all this, but I can imagine perhaps:
The compiled bytecode isn't designed to be loaded in this way - it expects to be the only code being executed in a JVM.
JVM optimisations aren't written to support loading multiple short-lived functions, and treat all loaded code as one massive program.
The JVM, once started, doesn't support loading additional bytecode.
In principle, you could probably develop a Java-centric serverless runtime in which individual functions are dynamically loaded on-demand, and you might be able to achieve pretty good cold-start time this way. However, there are two big reasons why this might not work as well as JavaScript:
While Java is designed for JIT compiling, it has not been optimized for startup time nearly as intensely as V8 has. Today, the JVM is most commonly used in large always-on servers, where startup speed is not that important. V8, on the other hand, has always focused on a browser environment where code is downloaded and executed while a user is waiting, so minimizing startup latency is critical. (It might actually be interesting to look at an alternative Java runtime like Android's Dalvik, which has had much more reason to prioritize startup speed. Maybe it could be the basis of a really fast Java serverless environment!)
Security. V8 and other JavaScript runtimes have been designed with hostile code in mind from the beginning, and have had a huge amount of security research done on them. Java tried to target this too, in the very early days, with "applets", but that usage of Java never caught on. These days, secure sandboxing is not a major concern of Java. Because of this, it is probably too risky to run multiple Java apps that don't trust each other within the same container. And so, you are back to starting a separate container for each application.
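The idea of loading individual functions on demand can be tried out with a small experiment. What follows is only a minimal sketch, not a serverless runtime: `IsolatedLoadingDemo`, `Function`, `TenantLoader` and the "tenant" terminology are hypothetical names made up for illustration. It loads the same bytecode twice into an already running JVM through two separate class loaders, so each copy keeps its own static state. Class-loader separation of this kind only keeps names and static state apart; it is not a security boundary, which is exactly the second concern raised above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.IntSupplier;

public class IsolatedLoadingDemo {

    /** Stand-in for one tenant's function; the static counter belongs to one loaded copy of the class. */
    public static class Function implements IntSupplier {
        private static int calls = 0;

        @Override
        public int getAsInt() {
            return ++calls;
        }
    }

    /** Class loader that defines its own copy of one class and delegates everything else to its parent. */
    static class TenantLoader extends ClassLoader {
        private final String className;
        private final byte[] bytecode;

        TenantLoader(String className, byte[] bytecode) {
            super(IsolatedLoadingDemo.class.getClassLoader());
            this.className = className;
            this.bytecode = bytecode;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(className)) {
                return super.loadClass(name, resolve); // JDK and shared classes come from the parent
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    // Define a fresh copy from raw bytes while the JVM is already running.
                    loaded = defineClass(name, bytecode, 0, bytecode.length);
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }
    }

    /** Reads the compiled class file of the given class from the classpath. */
    static byte[] readBytecode(Class<?> type) throws IOException {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    /** Loads a private copy of Function for one tenant and instantiates it. */
    static IntSupplier loadForTenant(byte[] bytecode) throws ReflectiveOperationException {
        ClassLoader loader = new TenantLoader(Function.class.getName(), bytecode);
        Class<?> copy = loader.loadClass(Function.class.getName());
        return (IntSupplier) copy.getDeclaredConstructor().newInstance();
    }

    /** Calls tenant A twice and tenant B once; returns the last value each one reported. */
    static int[] demo() {
        try {
            byte[] bytecode = readBytecode(Function.class);
            IntSupplier tenantA = loadForTenant(bytecode);
            IntSupplier tenantB = loadForTenant(bytecode);
            tenantA.getAsInt();
            int a = tenantA.getAsInt();
            int b = tenantB.getAsInt();
            return new int[] {a, b};
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("tenant A: " + counts[0] + ", tenant B: " + counts[1]);
    }
}
```

Tenant A's second call does not affect tenant B's counter, because each loader holds its own copy of the class. The sketch says nothing about startup latency or about protecting tenants from each other.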

SCA Solution in Large Distributed Control System

I want to propose a solution for building our large distributed control system. The current implementation is written in C++, and I need to rewrite it.
I have several questions:
The system should have a hot-plugin feature. Are there any OSGi implementations that support the C++ programming model?
Which ESB would be best with regard to real-time behaviour and flexible routing, given that a large volume of messages will be transferred quickly between nodes?
Since integration is very important in our system, which MOM can be used to build my ESB under the same real-time and flexible routing constraints?
Which open source SCA implementation is suitable for the C++ programming model?
I look forward to your answers!
Thanks very much!
If you require a C++ runtime, I would look at Trentino (http://trentino.sourceforge.net/), which is sponsored by Siemens.
There are a number of Java-based runtimes. One that supports dynamic deployment of contributions is Fabric3 (www.fabric3.org).

Is jeromq production ready?

I've used ZeroMQ in the past with JVM applications via the jzmq library. I am planning on using ZeroMQ on a new project where some of the services are implemented on the JVM. I just discovered jeromq, a pure Java implementation of ZeroMQ, and I would like to use it, mostly because it tracks ZeroMQ 3.x and removes the headache of dealing with jzmq. However, I can't tell from the repo page whether it is production ready. Does anyone have experience with jeromq in production?
As the author of the project, I'm a little bit biased.
The reason I made jeromq was that I also had some trouble deploying jzmq because of its JNI dependency.
The project has a short history, but it keeps improving thanks to feedback and contributions.
It is not a replacement for jzmq, though. Both projects are active and driven by a large community. You can get help from the community and contribute to the projects as well.
As of 3.0-SNAPSHOT, it has API-level compatibility. You can switch between jeromq and jzmq easily without changing your code.
Why not write a JNI layer that does all the interaction with 0MQ? This would put the problem in your own hands instead of hoping that some third-party library is mature enough or production ready.
That's what I'd do. The C/C++ API of ZeroMQ is IMHO the most mature of them and, as such, I think it would bring you the most benefit.
Writing a JNI layer is not hard either, so I think this would be a good way to go.

Quartz.NET vs Windows Scheduled Tasks. How different are they?

I'm looking for a comparison between Quartz.NET and Windows Scheduled Tasks.
How different are they? What are the pros and cons of each one? How do I choose which one to use?
TIA,
With Quartz.NET I could contrast some of the earlier points:
Code to write - You can express your intent in .NET language, write unit tests and debug the logic
Integration with event log - Common.Logging lets you write even to a database
Robust and reliable too
Even richer API
It's mostly a question of what you need. Windows Scheduled Tasks might give you all you need. But if you need clustering (distributed workers) or fine-grained control over triggering or misfire handling rules, you might like to check what Quartz.NET has to offer in these areas.
Take the simplest that fills your requirements, but abstract enough to allow change.
My gut reaction would be to try and get the integral WinScheduler to work with your needs first before installing yet another scheduler - reasoning:
no installation required - installed and enabled by default
no code to write - jobs expressed as metadata
integration with event log etc.
robust and reliable - good enough for MSFT, Google etc.
reasonably rich API - create jobs, check status etc.
integrated with remote management tools
security integration - run jobs in different credentials
monitoring tooling
Then reach for Quartz if it doesn't meet your needs. Quartz certainly has many of these features too, but resist adding yet another service to own and manage if you can.
One important distinction, for me, that is not included in the other answers is what gets executed by the scheduler.
Windows Task Scheduler can only run executable programs and scripts. The code written for use within Quartz can directly interact with your project's .NET components.
With Task Scheduler, you'll have to write a shell executable or script. Inside of that shell, you can interact with your project's components. While writing this shell code is not a difficult process, you do have to consider deploying the extra files.
If you anticipate adding more scheduled tasks over the lifetime of the project, you may end up needing to create additional executable shells or script files, which requires updates to the deployment process. With Quartz, you don't need these files, which reduces the total effort needed to create and deploy additional tasks.
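The difference can be sketched in code. Quartz.NET is a .NET library, so the following is only a Java analogy built on the standard `ScheduledExecutorService`, not Quartz.NET's API; `InProcessJobDemo`, `ReportService` and `runInProcess` are hypothetical names invented for the example. The point is that an in-process scheduler calls application objects directly, whereas a Task Scheduler job would need a separate executable or script that is deployed alongside the application and re-creates those objects itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InProcessJobDemo {

    /** Hypothetical application component that the scheduled job needs to call. */
    static class ReportService {
        private final AtomicInteger generated = new AtomicInteger();

        void generateReport() {
            generated.incrementAndGet();
        }

        int generatedCount() {
            return generated.get();
        }
    }

    /** Runs the job the given number of times inside this process and returns the report count. */
    static int runInProcess(int runs) {
        ReportService service = new ReportService();
        CountDownLatch done = new CountDownLatch(runs);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // The job is a plain lambda with direct access to the live service object.
            scheduler.scheduleAtFixedRate(() -> {
                if (done.getCount() > 0) {
                    service.generateReport();
                    done.countDown();
                }
            }, 0, 10, TimeUnit.MILLISECONDS);
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
        return service.generatedCount();
    }

    public static void main(String[] args) {
        System.out.println("reports generated: " + runInProcess(3));
        // An OS-level scheduler would instead start a separate program, for example
        // via a command line, and that program would have to build its own
        // ReportService before doing any work.
    }
}
```

Adding a second job in this model means adding another lambda or class to the same application; nothing new has to be deployed.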
Unfortunately, Quartz.NET job assemblies can't be updated without restarting the process/host/service. That's a pretty big one for some folks (including myself).
It's entirely possible to build a framework for jobs running under Task Scheduler. MEF-based assemblies can be called by a single console app, with everything managed via a configuration UI. Here's a popular managed wrapper:
https://github.com/dahall/taskscheduler
https://www.nuget.org/packages/TaskScheduler
I did enjoy my brief time working with Quartz.NET, but the restart requirement was too big a problem to overcome. Marko has done a great job with it over the years, and he's always been helpful and responsive. Perhaps someday the project will get multiple AppDomain support, which would address this. (That said, it promises to be a lot of work. Kudos to him and his contributors if they decide to take it on.)
To paraphrase Marko, if you need:
Clustering (distributed workers)
Fine-grained control over triggering or misfire handling rules
...then Quartz.NET is what you need.

SETI@home kind of frameworks

Are there any client-server frameworks similar to SETI@home available?
I have such a client-server model, where volunteers sign up as clients (agents or nodes, call them whatever you like) and donate their idle computing resources.
So I will need a framework to distribute and track the work units (or jobs) given to agents.
Is there any such framework available that I could use? That would save me the time of writing the job processing logic and so on.
Further, I hope the framework will also handle OS compatibility issues, agent binary updates, etc.
Please also give any other general suggestions on what I should investigate for such a distributed computing project.
Look at BOINC, which is a general framework for handling SETI style stuff.
Edit to expand: in fact, iirc BOINC is a spinoff of SETI. It'll probably handle all of your requirements.
