We are testing a load harness and we wrote it as shell scripts calling javascript files via the mongo shell. I just wanted to get your opinion as to whether running the algorithm this way might not give us correct results given that the mongo shell is for admin and relies on a javascript engine. Would running the algorithm written as an app in Java or some other language give us different results?
I guess the short question is will an algorithm that loads mongodb have performance differences when using the shell vs a driver. Also, anything we should keep in mind as we compare MMAP vs WiredTiger in 3.0.7?
Related
Late Friday night thoughts after reading through material on how Cloudflare's v8 based "no cold start" Workers function - in short, because of the V8 engine's Just-in-Time compiler of Javascript code - I'm wondering why this no cold start type of serverless functions seems to only exist for Javascript.
Is this just because architecturally when AWS Lambda / Azure Functions were launched, they were designed as a kind of even more simplified Kubernetes model, where each function exists in its own container? I would assume that was a simpler model of keeping different clients' code separate than whatever magic sauce v8 isolates provided under the hood.
So given Java is compiled into bytecode for the JVM, which uses JIT compilation (if it doesn't optimise and compile to machine code certain high usage functions), is it therefore also technically possible to have no cold start Java serverless functions? As long as there is some way to load in each client's bytecode as they are invoked, on the cloud provider's server.
What are the practical challenges for this to become a reality? I'm not a big expert on all this, but can imagine perhaps:
The compiled bytecode isn't designed to be loaded in this way - it expects to be the only code being executed in a JVM
JVM optimisations aren't written to support loading short-lived, multiple functions, and treats all code loaded in to be one massive program
JVM once started doesn't support loading additional bytecode.
In principle, you could probably develop a Java-centric serverless runtime in which individual functions are dynamically loaded on-demand, and you might be able to achieve pretty good cold-start time this way. However, there are two big reasons why this might not work as well as JavaScript:
While Java is designed for JIT compiling, it has not been optimized for startup time nearly as intensely as V8 has. Today, the JVM is most commonly used in large always-on servers, where startup speed is not that important. V8, on the other hand, has always focused on a browser environment where code is downloaded and executed while a user is waiting, so minimizing startup latency is critical. (It might actually be interesting to look at an alternative Java runtime like Android's Dalvik, which has had much more reason to prioritize startup speed. Maybe it could be the basis of a really fast Java serverless environment!)
Security. V8 and other JavaScript runtimes have been designed with hostile code in mind from the beginning, and have had a huge amount of security research done on them. Java tried to target this too, in the very early days, with "applets", but that usage of Java never caught on. These days, secure sandboxing is not a major concern of Java. Because of this, it is probably too risky to run multiple Java apps that don't trust each other within the same container. And so, you are back to starting a separate container for each application.
I have developed some XQuery scripts which I call in a chain via Saxon-CLI (bat files).
My problem is now that Saxon CLI is quite slow (because Java is slow, and Java on DotNet is even slower).
The problem is the startup time which takes some seconds (not the query execution itself). So my idea is to avoid creating new processes over and over again and just use one XSLT or XQuery process which loads the scripts and execute them.
But how to load & execute an XQuery-File in Saxon-XSLT? Is it possible?
Certainly, command-line scripts that involve firing up a new Java VM for each step are NOT the way to do it!
XProc is certainly a good candidate. But I have to confess I still do a lot of this in Ant: it's old but it works.
It's also possible to control a sequence of queries and transformations from within XSLT (you can invoke queries using a Saxon extension function, but it needs Saxon-PE or higher). I don't think that's the preferred way, but it's one less technology to learn about.
There are also quite a few pipeline processors out there such as Orbeon.
I'm looking to develop a program that detects calls to certain Windows API functions and simply records the calling process, call count, and hopefully their arguments, to later mark them as benign or malicious.
The GUI program API monitor is a good example of the functionality I'm trying to achieve. Ideally I would like to track each desired API function individually and get the caller PID and parameters when or after it is used, without user input. The program should be able to run on any windows 7 machine, but can be limited to 32bit applications.
I understand there are several methods of hooking a function, and from my understanding Microsoft detours implements one of these, but I don't know if its the one best suited to what I want to do. I've seen detours, easyhook, deviare API hook, and others mentioned on very old posts, but I have a hard time getting my head around the differences and features of each.
So my question is, given what I'm trying to do, what do you recommend and why?
For reference I'm an intermediate level programmer, but a beginner at Windows programming.
Thanks for your help
I'm part of Nektra Deviare API Hook team. Our hooking engine is used by a large number of companies all over the world, in different types of end-user products (e.g.: Anti Virus, Data Loss Prevention, AI, handicapped software, Data Classification, App virtualization).
Deviare-InProc is the MS Detours replacement and Deviare2 has built in all the RPC you need to hook another process and get the calls in your own process.
We continuously fix all reported issues. You can verify it in our GitHub:
https://github.com/nektra/Deviare2
https://github.com/nektra/Deviare-InProc
You can see Deviare2 running in Nektra's SpyStudio API Monitor.
Detours is an excellent software but very expensive (USD 10k). In addition to this, it completely lacks of support. It can be compared to Deviare InProc,
EasyHook used to be a good start point because it was the only free option. But, now Deviare2 family is open source and EasyHook has a lot of stability issues for the real world.
I would like to build an application that uses Genetic Programming to figure out what exactly the user is asking. It's a programming application for non-programmers. Basically the user feeds the application a bunch of examples, and from the examples the application will derive the rules required to build a new program for the user's own use/distribution.
I've built prototypes using linear regression but it could only solve simple problems. This week I experimented genetic programming using pyevolve and it worked much more brilliantly than I expected! However, I suspect it being written in pure python made it require dozens of seconds to solve an example, whereas in my application I only have at most a couple of seconds time.
I've been trying to find a more performant library that was as easy to use as pyevolve but cannot find a suitable one. I tried openBeagle but after getting an example running, and hours of poring through the documentation later, I still cannot find a way to actually pick an individual out of the "Vivarium". I've seen people recommend GAUL but that is a GPL library and will limit how I can license my future application. I've tried to download lil-gp but the ftp download links are locked by a university's login screen.
Since the application will be a Mac OS X cocoa application, I did not consider Java, C# or Matlab GP libraries.
As a developer of Open BEAGLE I still recommend you to use that library if you seek a fast GP library. Retrieving your best individual would actually be done by running a second program that parses the XML file that is logged at the end of the evolution. Otherwise, you can access it through the Vivarium.getHallOfFame() method and then sort it and access the first element with the HallOfFame.operator[]. The Member you'll get is a struct of the individual with the generation it was recorded and in what deme it was.
That way you can get access to the best individual that ever lived in your evolution.
If you have specific questions on Open BEAGLE I recommend you to ask them directly to the developer list, we usually answer very quickly.
Although, if you wish to try a very different library in Python I recommend you DEAP that allows a lot more flexibility than Pyevolve. Some GP examples run much faster under PyPy than Python.
If you ask the key developer of the GAUL project for permission to use an alternative license agreement, then he* is quite likely to agree.
*"he" is me.
For alot of the tasks I have to do I find myself having to choose between making the program using a Shell Script in Linux or a programming language such as Java or Groovy. Does anyone havce any experience about how I should choose one over the other and why?
Shell scripts are excellent for concise filesystem operations and scripting the combination of existing functionality in filters and command line tools via pipes.
When your needs are greater - whether in functionality, robustness, performance, efficiency etc - then you can move to a more full-featured language. They tend to offer some combination of:
type safety
more advanced containers
better control over variable lifetimes and memory usage
threading
advanced IPC like shared memory and TCP/IP
full binary I/O capabilities, memory mapped file access etc.
access to OS APIs and myriad powerful libraries
better readability and maintainability as the project size increases
support for more advanced programming paradigms: Object Orientation, Functional Programming, Generative Programming etc.
better error-checking before your program starts running, hence less dependent on test case coverage
#Tony provides an excellent list of pros and cons. I would add one more general point - shell scripts, because they are so handy, risk displaying the "there is nothing more lasting than a temporary solution" characteristic with all the attendant problems of maintenance when somebody else needs to use it.
Shell script is the most intuitive way to have a "glue" on your system. However, it does not have some useful concepts, like inheritance and modularization, that languages like Python (which is very used to "glue" systems too) have.
True, the use of language depends basically on the task you are trying to do. For the most cases I've worked, shell script worked well, although I'm using a lot of Python to accomplish system related tasks. I don't think that Java would be an alternative in this case. Probably Groovy would be, but not Java (I mean Java as a language, not Java as platform.)
In a sysadmin view, I think Python and Ruby are awesome languages. Not only because of dynamic typing and the lack of need to be compiled, but because tools like Fabric, Capistrano, Puppet, and a lot of others which makes a sysadmin's life a lot easier :-)