I have read that with dart your application can start up to 10x faster because of snapshots. Can anyone explain what it really is and how it works? In what kind of application would i be using snapshots?
Dart's Snapshots are like Smalltalk images in the sense that they allow nearly instant application startup. However, unlike Smalltalk images, Snapshots don't store the program state.
This is especially helpful in slower mobile devices because they are inherently slower and also restricted by memory much more than a desktop system. That reason and the fact that battery usage begs us to close unnecessary programs makes startup speed important.
Dart addresses this issue of slow startup with the heap snapshot feature, which is similar to Smalltalk's image system. The heap of an application is traversed and all objects are written to a simple file. Note: at the moment, the Dart distribution ships with a tool that fires up a Dart VM, loads an application's code, and just before calling main, it takes a snapshot of the heap. The Dart VM can use such a snapshot file to quickly load an application.
The snapshot feature is also used to serialize object graphs that are being sent between Dart Isolates (serialized with SnapshotWriter).
Currently I do not know of any way to initiating a snapshot or dealing with them. In the future, I would expect it to be possible to serve a snapshot file from the web server and having that processed by the browser Dart VM instantaneously.
The snapshot format itself is cross-platform meaning that it works between 32-bit, 64-bit machines and so forth. The format has been made so that it's quick to read into memory with a emphasis on minimizing extra work like pointer fixups.
Here's the source code for snapshot.cc: http://code.google.com/p/dart/source/browse/trunk/dart/runtime/vm/snapshot.cc
and the tests: http://code.google.com/p/dart/source/browse/trunk/dart/runtime/vm/snapshot_test.cc
So the reason why it can speed up an application startup by a factor of 10 is because it's not a bunch of source code like JavaScript that is send as-is and slowly processed afterwards.
And where would you like to use it? Anywhere you possibly can. On the server side, it's basically already happening for you (and doesn't matter really). but on the client-side, that's not possible yet. As I understand it, it will be possible to serve these snapshots to the browser for instant startup, but you really have to wait since it's not available as of now.
Related
Late Friday night thoughts after reading through material on how Cloudflare's v8 based "no cold start" Workers function - in short, because of the V8 engine's Just-in-Time compiler of Javascript code - I'm wondering why this no cold start type of serverless functions seems to only exist for Javascript.
Is this just because architecturally when AWS Lambda / Azure Functions were launched, they were designed as a kind of even more simplified Kubernetes model, where each function exists in its own container? I would assume that was a simpler model of keeping different clients' code separate than whatever magic sauce v8 isolates provided under the hood.
So given Java is compiled into bytecode for the JVM, which uses JIT compilation (if it doesn't optimise and compile to machine code certain high usage functions), is it therefore also technically possible to have no cold start Java serverless functions? As long as there is some way to load in each client's bytecode as they are invoked, on the cloud provider's server.
What are the practical challenges for this to become a reality? I'm not a big expert on all this, but can imagine perhaps:
The compiled bytecode isn't designed to be loaded in this way - it expects to be the only code being executed in a JVM
JVM optimisations aren't written to support loading short-lived, multiple functions, and treats all code loaded in to be one massive program
JVM once started doesn't support loading additional bytecode.
In principle, you could probably develop a Java-centric serverless runtime in which individual functions are dynamically loaded on-demand, and you might be able to achieve pretty good cold-start time this way. However, there are two big reasons why this might not work as well as JavaScript:
While Java is designed for JIT compiling, it has not been optimized for startup time nearly as intensely as V8 has. Today, the JVM is most commonly used in large always-on servers, where startup speed is not that important. V8, on the other hand, has always focused on a browser environment where code is downloaded and executed while a user is waiting, so minimizing startup latency is critical. (It might actually be interesting to look at an alternative Java runtime like Android's Dalvik, which has had much more reason to prioritize startup speed. Maybe it could be the basis of a really fast Java serverless environment!)
Security. V8 and other JavaScript runtimes have been designed with hostile code in mind from the beginning, and have had a huge amount of security research done on them. Java tried to target this too, in the very early days, with "applets", but that usage of Java never caught on. These days, secure sandboxing is not a major concern of Java. Because of this, it is probably too risky to run multiple Java apps that don't trust each other within the same container. And so, you are back to starting a separate container for each application.
I understand why cold starts happen (Byte code needs to be turned into machine code through JIT compilation). However with all the generated meta data available for binaries these days I do not understand why there isn't a simple tool that automatically takes the byte code and turns ALL PATHS THROUGH THE CODE (auto discovered) into machine code specific for that target platform. That would mean the first request through any path (assume a rest api) would be fast and not require any further Just In Time Compilation.
We can create an automation test suite or load test to JIT all the paths before allowing the machine into the load balancer rotation (good best practice anyway). We can also flip the "always on" setting in cloud hosting providers to keep the warmed application from getting evicted from memory (requiring the entire process over again). However, it seems like such an archaic process to still be having in 2020.
Why isn't there a tool that does this? What is the limitation that prevents us from using meta data, debug symbols and/or other means to understand how to generate machine code that is already warm and ready for users from the start?
So I have been asking some sharp minds around my professional network and no one seems to be able to point out exactly what limitation makes this so hard to do. However, I did get a few tools on my radar that do what i'm looking for to some level.
Crossgen appears to be the most promising but it's far from widely used among the many peers I've spoken to. Will have to take a closer look.
Also several do some sort of startup task that runs some class initialization and also register them as singletons. I wouldn't consider those much different then just running integration or load tests on the application.
Most programming languages have some form of native image compiler tool. It's up to you to use them if that is what you are looking to do.
Providers are supposed to give you a platform for your application and there is a certain amount of isolation and privacy you should expect from your provider. They should not go digging into your application to figure out all its "paths". That would be very invasive.
Plus "warming up" all paths would be an overly resource intensive process for a provider to be obligated to perform for every application they host.
I am trying to decide what application paradigm to use for my iOS app build with Appcelerator. Many people talk about the Tweetanium way as the best way, i.e. single context.
I think I am going to use that but I have a few questions about it.
Since I include all "windows" on the first page. Does that mean that it will have to load all windows in the application at app start?
Will this paradigm really be very fast and memory conservative compared to "normal" way of for example the Kitchensink?
What is the downside of using Tweetaniums way of doing things?
Is it suitable for complex apps?
Thankful for all input!
Short version: Yes :)
Longer version:
Multi-context apps (like the Kitchen Sink) are also fine generally speaking, but you run into the following two problems with larger apps:
1.) Sharing data between windows/contexts within an app
2.) Unsure when the code for a given window has been run
You can also (potentially) maintain a pointer to a UI object created in one context after the window associated with that context is closed, which under some circumstances can lead your app to leak memory. Single context is easier and will ultimately get you into less trouble. Also, if your app is large, be sure to only load scripts as you need them, and not all up front.
What do you do to increase startup speed (or to decrease startup time) of your Delphi app?
Other than application specific, is there a standard trick that always works?
Note: I'm not talking about fast algorithms or the likes. Only the performance increase at startup, in terms of speed.
In the project options, don't auto-create all of your forms up front. Create and free them as needed.
Try doing as little as possible in your main form's OnCreate event. Rather move some initialization to a different method and do it once the form is shown to the user. An indicator that the app is busy with a busy mouse cursor goes a long way.
Experiments done shows that if you take the exact same application and simply add a startup notification to it, users actually perceive that app as starting up faster!
Other than that you can do the usual things like exclude debug information and enable optimization in the compiler.
On top of that, don't auto create all your forms. Create them dynamically as you need them.
Well, as Argalatyr suggested I change my comment to a separate answer:
As an extension to the "don't auto create forms" answer (which will be quite effective by itself) I suggest to delay opening connections to databases, internet, COM servers and any peripheral device until you need it first.
Three things happen before your form is shown:
All 'initialization' blocks in all units are executed in "first seen" order.
All auto-created forms are created (loaded from DFM files and their OnCreate handler is called)
You main form is displayed (OnShow and OnActivate are called).
As other have pointed out, you should auto-create only small number of forms (especially if they are complicated forms with lots of component) and should not put lengthy processing in OnCreate events of those forms. If, by chance, your main form is very complicated, you should redesign it. One possibility is to split main form into multiple frames which are loaded on demand.
It's also possible that one of the initialization blocks is taking some time to execute. To verify, put a breakpoint on the first line of your program (main 'begin..end' block in the .dpr file) and start the program. All initialization block will be executed and then the breakpoint will stop the execution.
In a similar way you can step (F8) over the main program - you'll see how long it takes for each auto-created form to be created.
Display a splash screen, so people won't notice the long startup times :).
Fastest code - it's the code, that never runs. Quite obvious, really ;)
Deployment of the application can (and usually does!) happen in ways the developer may not have considered. In my experience this generates more performance issues than anyone would want.
A common bottleneck is file access - a configuration file, ini file that is required to launch the application can perform well on a developer machine, but perform abysmally in different deployment situations. Similarly, application logging can impede performance - whether for file access reasons or log file growth.
What I see so often are rich-client applications deployed in a Citrix environment, or on a shared network drive, where the infrastructure team decides that user temporary files or personal files be stored in a location that the application finds issues with, and this leads to performance or stability issues.
Another issue I often see affecting application performance is the method used to import and export data to files. Commonly in Delphi business applications I see export functions that work off DataSets - iterating and writing to file. Consider the method used to write to file, consider memory available, consider that the 'folder' being written to/read from may be local to the machine, or it may be on a remote server.
A developer may argue that these are installation issues, outside the scope of their concern. I usually see many cycles of developer analysis on this sort of issue before it is identified as an 'infrastructure issue'.
First thing to do is to clear auto
created forms list (look for Project
Options). Create forms on the fly
when needed, especially if the
application uses database connection
(datamodule) or forms that include
heavy use of controls.
Consider using form inheritance also
to decrease exe size (resource usage is mimized)
Decrease number of forms and merge similar or related functionality into single form
Put long running tasks (open database connections, connect to app server, etc) that have to be performed on startup in a thread. Any functionality that depends on these tasks are disabled until the thread is done.
It's a bit of a cheat, though. The main form comes up right away, but you're only giving the appearance of faster startup time.
Compress your executable and any dlls using something like ASPack or UPX. Decompression time is more than made up for by faster load time.
UPX was used as an example of how to load FireFox faster.
Note that there are downsides to exe compression.
This is just for the IDE, but Chris Hesick made a blog posting about increasing startup performance under the debugger.
Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) the difference seems to come from loading DLLs, when the DLLs' are in cached memory pages they load much faster. We tried using ClearMem to simulate rebooting (since its much less time consuming than actually rebooting) and got mixed results, on some machines it seemed to simulate a reboot very consistently and in some not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you delt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem which is storing pages of memory for the program, then you've got the OS caching the contents of files in memory, then you've got the on-disk buffer of sectors on the harddrive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then do a slerp of the whole file into memory in one read and parse their contents. Less disk seeks and time spend reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
#Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used, I really don't like that option (neither does Raymond).
One succesful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on them. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file lists all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the thing sthat PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment and deal cleaning up after bad builds and all that, but the payoff is typically 10+ or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to function order list, just group the code that will be called within the same sections:
#pragma code_seg(".startUp")
//...
#pragma code_seg
#pragma data_seg(".startUp")
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether function order list can specify global variables as well, but use this #pragma data_seg would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note, is that .NET 3.5SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN Cards) and that your app depends on certain other
services that require the network to come up. So profiling your application alone may not quite tell you this, but you should examine the dependencies for your application.
If your application is not very complicated, you can just copy all the executables to another directory, it should be similar to a reboot. (Cut and Paste seems not work, Windows is smart enough to know the files move to another folder is cached in the memory)