When I start a NodeJS app stored on my network drive, which requires mongoose (also in node_modules on the same network drive), it takes about 15-20 seconds for the app to start. (A test app on a local drive loads mongoose much faster.)
My network drive is on a 100 Mbit wired Ethernet network, and actual transfer speeds are around 80 Mbit/s in my tests.
The Mongoose package itself is only about 1.75 MB. Does it seriously load 150 MB of other modules? Or is something else going on that I can adjust or improve?
Thanks for any tips.
It's often latency, not bandwidth, that is the bottleneck when working across a network.
NodeJS, due to its modular nature, is inherently filesystem-heavy: a typical node_modules folder can contain thousands of tiny .js files, and accessing them requires a lot of round-trips to the filesystem, which can be very slow over a network.
To load just the mongoose dependencies, 452 .js files need to be accessed.
In addition, NodeJS uses heuristics when searching for .js files, since there are different ways in which a Node project can be structured. That requires listing a directory before reading a specific file (so an extra round-trip or two, per file).
To get a feel for it, try copying 1,000 small files over a network. The average throughput will be much lower than that of a single big file.
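To put numbers on that, here is a rough C# sketch of the experiment (the \\nas\share\copytest path is a placeholder for your own network share): it writes the same 16 MB once as a single file and once as 1,000 small files, and prints both timings. On a typical SMB share the small-file run is dramatically slower even though the byte count is identical.

using System;
using System.Diagnostics;
using System.IO;

class NetworkCopyBenchmark
{
    // Placeholder path: point this at a folder on your network share.
    const string Share = @"\\nas\share\copytest";

    static void Main()
    {
        Directory.CreateDirectory(Share);
        var payload = new byte[16 * 1024];          // 16 KB per small file
        new Random(42).NextBytes(payload);

        // One 16 MB file written in a single operation.
        var big = new byte[payload.Length * 1000];
        var sw = Stopwatch.StartNew();
        File.WriteAllBytes(Path.Combine(Share, "big.bin"), big);
        Console.WriteLine($"1 x 16 MB file    : {sw.ElapsedMilliseconds} ms");

        // The same 16 MB as 1,000 small files: every file adds an extra
        // open/write/close round-trip over the wire.
        sw.Restart();
        for (int i = 0; i < 1000; i++)
            File.WriteAllBytes(Path.Combine(Share, $"small_{i}.bin"), payload);
        Console.WriteLine($"1000 x 16 KB files: {sw.ElapsedMilliseconds} ms");
    }
}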
TL;DR - How can I stream content (specifically music and videos) from a cloud storage solution, like Google Drive, without having the entire file cached first? My goal is to create a Netflix/YouTube-esque experience with my movie/music library.
This seems to be an issue that many people are having, and many forum posts say that PlexCloud is the solution, but it isn't available anymore, so I want to find another way.
Essentially, I would like to free up space on my local machine, offloading my movies and music to the cloud. I would like these files to be available instantly from any of my devices.
The solutions I have come across so far are:
Google File Stream (or similar)
Expandrive
CloudMounter
These apps mount your cloud storage as a network drive and allow you to store files in the cloud with "instant" access. They sound great in principle, but the issue with all of them is that the entire file has to be cached before you can watch or listen. This defeats the whole purpose of having the files in the cloud, as every time you want to watch a video the entire file has to be cached first. That is very inconvenient for me: I have a rather slow internet connection and monthly transfer limits, and I'd have to wait until the file has been cached before I can start watching.
The closest I've got to making this work is with Kodi, but the interface is horrible on anything other than a TV. On desktop or mobile, it's useless! But, as far as functionality, the way it retrieves files is perfect. On their website, it says that it only caches up to ~60MB at a time, meaning you can start watching/listening instantly, and the file doesn't need to be cached in its entirety.
So my questions are:
Is there an alternative to Kodi that works on all major OSes, where files are instantly available and only a small portion of the file is cached at a time, the way YouTube and Netflix work?
Is it actually possible to play a video natively in the OS (in an app like VLC) before the entire video is stored on the local disk, either in storage or in cache?
If so, how would I go about doing this?
A few conditions for the solution:
I don't want to have to use the browser every time - A desktop/mobile app, Finder, or File Explorer is essential.
Ideally something that will run on Android TV, or at least is able to use Chromecast.
Files must be instantly accessible - nothing that will cache the entire file first (unless this is impossible due to how OS's work).
If possible, I would prefer NOT to have to go through a massively complicated setup with coding, terminal commands, or a dedicated server. The solution must use cloud storage, ideally with an app that works on the major OSes.
Thanks in advance for help and suggestions!
For my WiX project I am harvesting 4 directories via the pre-build event of Visual Studio, which results in about 160 MB of data and about 220 files, but the build process takes very long.
How can I speed that process up? I have one embedded media.cab file which holds all the files. Is it the size or the number of files that slows the process down? Or is it the harvesting with the heat tool in the pre-build event? Would it be faster with the HeatDirectory element?
Has anyone had any experience with speeding this up?
For us, the vast majority of the time was spent invoking light (for the linking phase).
light is very slow at compressing cabinets. Changing the DefaultCompressionLevel in the .wixproj from high to mszip (or low or none) helps a lot. However, our build was still too slow.
It turns out that light handles cabinets independently, and automatically links them on multiple threads by default. To take advantage of this parallelism, you need to generate multiple cabinets. In our .wxs Product element, we had the following that placed everything in a single cabinet:
<MediaTemplate EmbedCab="yes" />
It turns out you can use the MaximumUncompressedMediaSize attribute to declare the threshold (in MB) at which you want files to be automatically sharded into different .cab files:
<MediaTemplate EmbedCab="yes" MaximumUncompressedMediaSize="2" />
Now light was much faster, even with high compression, but still not fast enough for incremental builds (where only a few files change).
Back in the .wixproj, we can use the following to set up a cabinet cache, which is ideal for incremental builds where few files change and most cabinets don't need to be regenerated:
<CabinetCachePath>$(OutputPath)cabcache\</CabinetCachePath>
<ReuseCabinetCache>True</ReuseCabinetCache>
Suppressing validation also gives a nice speedup (light.exe spends about a third of its time validating the .msi by default). We activate this for debug builds:
<SuppressValidation>True</SuppressValidation>
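Putting those together, one possible shape for a Debug-only block in the .wixproj (the Condition is just a common MSBuild convention; the property names are the ones discussed above) is:

<PropertyGroup Condition=" '$(Configuration)' == 'Debug' ">
  <DefaultCompressionLevel>mszip</DefaultCompressionLevel>
  <CabinetCachePath>$(OutputPath)cabcache\</CabinetCachePath>
  <ReuseCabinetCache>True</ReuseCabinetCache>
  <SuppressValidation>True</SuppressValidation>
</PropertyGroup>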
With these changes, our final (incremental) build went from over a minute to a few seconds for a 32 MB .msi output, and a full rebuild stayed well under a minute, even with the high compression level.
WiX Help File: How To: Optimize build speed. In other words: 1) Cabinet reuse and 2) multi-threaded cab creation are built-in mechanisms in WiX to speed up builds.
Hardware: The inevitable "throw hardware at it". New SSD and NVMe disks are so much faster than older IDE drives that you might want to try them as another way to improve build speed and installation speed. Obvious yes, but very important. It can really improve the speed of development. See this answer.
Challenges with NVMe drives?: 1) They run hot, 2) they usually have limited capacity (size), 3) they might be more vulnerable than older 2.5" drives when used in laptops (I am not sure - keep in mind that some NVMe drives are soldered directly to the motherboard on laptops), 4) data rescue can be a bit challenging if you don't have good quality external enclosures (form factor etc...), 5) NVMe drives are said to wear out over time, 6) they are still somewhat pricey - especially the larger capacity ones - and there are further challenges for sure, but overall these drives are awesome.
Compression: You can try to compile your setup with a different compression level (for example none for debug builds). No compression makes builds faster. Here are illustrations for doing the opposite, setting higher compression (just use none instead of high for your purpose):
CompressionLevel: Msi two times larger than msm
MediaTemplate: How can I reduce the size of a 1GB MSI file using Orca?
A related answer on compression: What is the compression method used by MSI files?
Separate Setup: If you still go compressed, you could put prerequisites and merge modules in a separate setup to avoid compressing them for every build (or use release flags if you are in Installshield, or check the Preprocessor features in Wix).
External Source Files: I suppose you could use external source files if that's acceptable - then you don't have a lengthy compression operation taking place during the build, just a file copy (which keeps getting faster - especially with flash drives).
Shim: Another technique is to shim all the files you install to be 1 KB if what you are testing is the setup itself and its GUI and custom actions. It is then just a "shell" of a setup - which is a great way to test new custom actions to your setup. Many have written tools for this, but I don't have a link for you. There is always github.com to search.
Release Flags: Another way to save time is to use special release flags (Installshield only) to compile smaller versions of the setup you are working on at the moment (leaving out many features). WiX has similar possibilities via its preprocessor. More on WiX preprocessor practical use.
Debug Build: I usually use combinations of these techniques to make a debug build.
I normally use external source files when I experiment and add new features and keep rebuilding and installing the setup all the time.
Release flags to compile only part of the setup, cabinet reuse and release flags combined can save a lot of time depending on the size of your setup, the number of files and your hardware configuration.
Perhaps the most effective is a separate setup in my opinion (provided it is stable and not changing that often). Beware though: Wix to Install multiple Applications (the problems involved when it comes to splitting setups).
My take on it: go for a prerequisites-only separate setup. This is good also for Large Scale Deployment scenarios where corporate users want to use their own, standardized prerequisites and are annoyed with lots of embedded "junk" in a huge setup. A lot of package preparation time in large companies is spent taking out outdated runtimes and prerequisites. You can also deliver updates to these prerequisites without rebuilding your entire setup. Good de-coupling.
Links:
How can I speed up MSI package install and uninstall?
Simply put, don't harvest files. Please see my blog article: Dealing with very large number of files
The third downside is that your build will take A LOT longer to perform since it's not only creating your package but that it's also authoring and validating your component definitions.
I maintain an open source project on CodePlex called IsWiX. It contains project templates (scaffolding) and graphical designers to assist you in setting up and maintaining your WiX source. That said, it was designed around merge modules, which slows the build down a bit as the .MSM has to be built and then merged into the .MSI. Pure fragments would be faster if you are really concerned about pure speed. That said, I have many installers around 160 MB and it doesn't take long at all.
And of course don't forget about having a fast build machine. CPU, RAM and SSD disk I/O all contribute to fast generation of MSIs. For my consulting, I use Microsoft Visual Studio Online (VSO). I have a Core i7-2600k Hyper-V server with 32GB of ram and a Samsung 850evo SSD. My build server (VM) runs a TFS proxy server for local SCC caching.
For fun, on the above machine, I took 220 files from my system32 folder totaling 160 MB. It took 30 seconds to build the MSM and 30 seconds to build the MSI, for a total of 60 seconds. This is 'fast enough' for me. I would expect an MSI authored using only fragments to take 30 seconds.
So, I've done a bit of reading around the forums about AssetBundles and the Resources folder in Unity 3D, and I can't figure out the optimal solution for the problem I'm facing. Here's the problem:
I've got a program designed for standalone that loads "books" full of .png and .jpg images. The pages are, at the moment, the same every time the program starts. At the start of the scene for any "book", it loads all those images at once using www.texture and a path. I'm realizing now, however, that this is possibly a non-performant method for accessing things at runtime -- it's slow! Which means the user can't do anything for 5-20 seconds while the scene starts and the book's page images load up (on non-legendary computers). So, I can't figure out which of the three things would be the fastest:
1) Loading one asset bundle per book (say 20 textures @ 1 MB each).
2) Loading one asset bundle per page (1 MB each).
3) Either of the first two options, but loaded from the resources folder.
Which one would be faster, and why? I understand that asset bundles are packaged by Unity, but does this mean that the textures inside will be pre-compressed and easier on memory at load time? Does the Resources folder cause less load time? What gives? As I understand it, the Resources folder loads into a cache -- but is it the same cache that the standalone player uses normally? Or is this extra, unused space? I guess another issue is that I'm not sure what the difference is between loading things from memory and storing them in the cache.
Cheers, folks...
The Resource folders are bundled managed assets. That means they will be compressed by Unity, following the settings you apply in the IDE. They are therefore efficient to load at runtime. You can tailor the compression for each platform, which should further optimize performance.
We make extensive use of Resources.Load() to pull assets and it performs well on both desktop and mobile.
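For example, a minimal sketch (the folder and file names are made up) that loads one page texture from a Resources folder:

using UnityEngine;

public class PageLoader : MonoBehaviour
{
    void Start()
    {
        // The path is relative to any "Resources" folder and has no file extension.
        // "Books/Book01/page_01" is a hypothetical asset path.
        Texture2D page = Resources.Load<Texture2D>("Books/Book01/page_01");
        if (page != null)
        {
            GetComponent<Renderer>().material.mainTexture = page;
        }
    }
}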
There is also a special folder, called StreamingAssets, that you can use to put bundled un-managed assets. This is where we put the videos we want to play at runtime but don't want Unity to convert to the default ogg codec. On mobile these play in the native video player. You can also put images in there, and loading them is like using the WWW class: slow, because Unity needs to sanitize and compress the images at load time.
Loading via WWW is slower due to the overhead of processing the asset, as mentioned above, but you can pull data from a server or from outside the application "sandbox".
Only load what you need to display and implement a background process to fetch additional content while the user is busy going through the first pages of each book (see the sketch below). This avoids blocking the UI for too long.
Optimize the images to reduce file size. Use tinypng if you need transparent images, or stick to compressed JPGs.
Try using power-of-2 image dimensions where possible. This should speed up the runtime processing a little.
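Here is the sketch referred to above: a hedged example of background loading with Resources.LoadAsync, using made-up asset paths, that pulls the remaining pages in without blocking the UI:

using System.Collections;
using UnityEngine;

public class BookPreloader : MonoBehaviour
{
    public Texture2D[] pages = new Texture2D[20];

    IEnumerator Start()
    {
        // Load the first page synchronously so something is visible right away,
        // then stream the rest in the background.
        pages[0] = Resources.Load<Texture2D>("Books/Book01/page_01");

        for (int i = 1; i < pages.Length; i++)
        {
            ResourceRequest request =
                Resources.LoadAsync<Texture2D>("Books/Book01/page_" + (i + 1).ToString("00"));
            yield return request;                 // resumes when the load finishes
            pages[i] = (Texture2D)request.asset;
        }
    }
}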
Great answer from Jerome about Resources. To add some additional info for future searches regarding AssetBundles, here are two scenarios:
Your game is too big
You have a ton of textures, say, and your iOS game is above 100 MB -- meaning Apple will show a warning to users and prevent them from downloading over cellular. Resources won't help, because everything in that folder is bundled with the app.
Solution: Move the artwork you don't absolutely need on first-run into asset bundles. Build the bundles, upload them to a server somewhere, then download them at runtime as needed. Now your game is much smaller and won't have any scary warnings.
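As a rough sketch of that download-at-runtime flow (the URL and asset name are placeholders; this uses the UnityWebRequest API, while older Unity versions used WWW.LoadFromCacheOrDownload instead):

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BundleDownloader : MonoBehaviour
{
    // Placeholder URL for a bundle you built and uploaded yourself.
    const string BundleUrl = "https://example.com/bundles/book01_art";

    IEnumerator Start()
    {
        using (UnityWebRequest request =
                   UnityWebRequestAssetBundle.GetAssetBundle(BundleUrl))
        {
            yield return request.SendWebRequest();

            if (!string.IsNullOrEmpty(request.error))
            {
                Debug.LogError(request.error);
                yield break;
            }

            AssetBundle bundle = DownloadHandlerAssetBundle.GetContent(request);
            // "cover" is a placeholder asset name inside the bundle.
            Texture2D cover = bundle.LoadAsset<Texture2D>("cover");
        }
    }
}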
You need different versions of artwork for different platforms
Alternative scenario: you're developing for iPhone and iPad. For the same reasons as above, you shrink your artwork as much as possible to stay under the 100 MB limit on iPhone. But now the game looks terrible on iPad. What to do?
Solution: You create an asset bundle with two variants. One for phones with low res artwork, and one for tablets with high res artwork. In this case the asset bundles can be shipped with the game or sent to a server. At run-time you pick the correct variant and load from the asset bundle, getting the appropriate artwork without having to if/else everywhere.
With all that being said, asset bundles are more complicated to use, poorly documented, and Unity's demos don't work properly at times. So seriously evaluate whether you need them.
Whiteboard Overview
[Two whiteboard diagrams, "Internal" and "Global" (1000 x 750 px, ~130 kB JPEGs hosted on ImageShack), were included here.]
Additional Information
I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.
Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.
Possible Implementations
Unfortunately, it doesn't look like I can use off-the-shelf tools such as WinSCP, because some other logic needs to be intimately tied into the process.
I figure there are two simple ways for me to accomplish the above on the in-house side.
Method one (slow):
Walk the /Foo directory tree every N minutes.
Diff with previous tree using a combination of timestamps (can be faked by file copying tools, but not relevant in this case) and check-summation.
Merge changes with off-site FTP server.
Method two:
Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET).
Log changes.
Merge changes with off-site FTP server every N minutes.
I'll probably end up using something like the second method due to performance considerations.
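A minimal sketch of method two in C# with FileSystemWatcher (the D:\Foo path is hypothetical; a real implementation would also de-duplicate events and handle the watcher's internal buffer overflowing):

using System;
using System.Collections.Concurrent;
using System.IO;

class ChangeLogger
{
    // Queue of paths that changed since the last sync pass.
    static readonly ConcurrentQueue<string> PendingFiles = new ConcurrentQueue<string>();

    static void Main()
    {
        // D:\Foo stands in for the local path of the /Foo share.
        var watcher = new FileSystemWatcher(@"D:\Foo")
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
        };

        watcher.Changed += (s, e) => PendingFiles.Enqueue(e.FullPath);
        watcher.Created += (s, e) => PendingFiles.Enqueue(e.FullPath);
        watcher.Renamed += (s, e) => PendingFiles.Enqueue(e.FullPath);
        watcher.EnableRaisingEvents = true;

        // Every N minutes a timer or worker thread would drain PendingFiles,
        // de-duplicate the paths, and merge the changes with the off-site FTP server.
        Console.ReadLine();
    }
}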
Problem
Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.
While I'm transferring a file off-site, I effectively need to prevent the users from writing to the file (e.g., use CreateFile with FILE_SHARE_READ or something) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to the file and attempt to modify it while I'm still reading from it.
Possible Solution
The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.
The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
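A rough sketch of that snapshot copy (paths are made up): the source is opened with FileShare.Read, so other readers are unaffected and writers are blocked only for the brief duration of the copy.

using System.IO;

static class Snapshot
{
    // Copies a file to a snapshot location for later upload. Paths are hypothetical,
    // and the snapshot directory is assumed to exist.
    public static void Take(string sourcePath, string snapshotPath)
    {
        using (var source = new FileStream(sourcePath, FileMode.Open,
                                           FileAccess.Read, FileShare.Read))
        using (var target = File.Create(snapshotPath))
        {
            source.CopyTo(target);   // files are <= 20 MB, so this is near-instant
        }
        // The snapshot can now be uploaded off-site without disturbing the original.
    }
}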
This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.
One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.
Questions
This is the first time that I've had to deal with this type of problem, so perhaps I'm reading too much into it.
I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?
I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.
After the discussions in the comments my proposal would be like so:
Create a partition on your data server, about 5GB for safety.
Create a Windows Service project in C# that would monitor your data drive / location.
When a file has been modified, create a local copy of it, preserving the same directory structure, and place it on the new partition.
Create another service that would do the following:
Monitor bandwidth usage.
Monitor file creations on the temporary partition.
Transfer several files at a time (use threading) to your FTP server, abiding by the bandwidth available at the time, decreasing / increasing the worker threads depending on network traffic.
Remove the files from the partition that have successfully transferred.
So basically you have your drives:
C: Windows Installation
D: Share Storage
X: Temporary Partition
Then you would have the following services:
LocalMirrorService - Watches D: and copies to X: with the dir structure
TransferClientService - Moves files from X: to ftp server, removes from X:
It also uses multiple threads to move several files at once and monitors bandwidth (a rough sketch of the upload step follows below).
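Here is that rough sketch of the TransferClientService upload step, using FtpWebRequest (the server, credentials and paths are placeholders; the threading and bandwidth throttling are left out for brevity):

using System.IO;
using System.Net;

static class FtpUploader
{
    // Uploads one staged file from X: to the FTP server and removes the local copy
    // on success. All names here are placeholders.
    public static void UploadAndRemove(string localPath, string remoteUri)
    {
        var request = (FtpWebRequest)WebRequest.Create(remoteUri);   // e.g. "ftp://example.com/foo/doc.bin"
        request.Method = WebRequestMethods.Ftp.UploadFile;
        request.Credentials = new NetworkCredential("user", "password");

        using (Stream source = File.OpenRead(localPath))
        using (Stream target = request.GetRequestStream())
        {
            source.CopyTo(target);
        }

        using (var response = (FtpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == FtpStatusCode.ClosingData)
                File.Delete(localPath);          // transfer completed successfully
        }
    }
}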
I would bet that this is the idea you had in mind, but it seems like a reasonable approach as long as you're really good with your application development and you're able to create a solid system that handles most issues.
When a user edits a document in Microsoft Word, for instance, the file will change on the share and may be copied to X: even though the user is still working on it. Within Windows there should be an API to check whether the file handle is still open by the user; if it is, you can create a hook to watch for when the user actually closes the document, so that all their edits are complete, and only then migrate it to drive X:.
That being said, if the user is working on the document and their PC crashes for some reason, the document/file handle may not get released until the document is opened at a later date, which could cause issues.
For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.
rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...
Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!
I'm sure many have noticed that when you have a large application (i.e. something requiring a few MBs of DLLs) it loads much faster the second time than the first time.
The same happens if you read a large file in your application. It's read much faster after the first time.
What affects this? I suppose it's the hard-drive cache, or does the OS add some memory caching of its own?
What techniques do you use to speed-up the loading times of large applications and files?
Thanks in advance
Note: the question refers to Windows
Added: What affects the size of the OS cache? In some apps, files load slowly again after a minute or so -- does the cache fill up within a minute?
Two things can affect this. The first is hard-disk caching (done by the disk itself, which has little impact, and by the OS, which tends to have more impact). The second is that Windows (and other OSes) has little reason to unload DLLs when they're finished with, unless the memory is needed for something else. This is because DLLs can easily be shared between processes.
So DLLs have a habit of hanging around even after the applications that were using them disappear. If another application decides the DLL is needed, it's already in memory and just has to be mapped into the process's address space.
I've seen some application pre-load their required DLLs (usually called QuickStart, I think both MS Office and Adobe Reader do this) so that the perceived load times are better.
Windows's memory manager is actually pretty slick -- it services memory requests AND acts as the disk cache. With enough free memory on the system, lots of recently accessed files will reside in memory. Until the physical memory is needed, those DLLs will remain in cache, all courtesy of the Cache Manager.
As far as how to help, look into delay-loading your DLLs. You get the advantages of LoadLibrary only when you need it, but automatically, so you don't have to sprinkle LoadLibrary/GetProcAddress calls throughout your code. (Well, automatic in the sense that you just need to add a linker switch):
http://msdn.microsoft.com/en-us/library/yx9zd12s.aspx
Or you could pre-load like Office and others do (as mentioned above), but I personally hate that -- slows down the computer at initial boot up.
I see two possibilities:
Preload your libraries at system startup, as already mentioned. Office, OpenOffice and others do just that.
I am not a great fan of that solution: it makes your boot time longer and eats lots of memory.
Load your DLLs dynamically (see LoadLibrary) only when needed (a small sketch follows below). Unfortunately this is not possible with every DLL.
For example, why load at startup a DLL for exporting files in XYZ format when you are not sure it will ever be needed? Load it when the user actually selects that export format.
I have a dream where Adobe Acrobat uses this approach, instead of bogging me down with loads of plugins I never use every time I want to display a PDF file!
Depending on your needs, you might have to use both techniques: preload some big, heavily used libraries and load only specific plugins on demand...
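Here is the sketch mentioned above for the load-on-demand approach, shown from managed code via P/Invoke (the xyzexport.dll name and ExportDocument export are made up; in C or C++ you would call LoadLibrary/GetProcAddress directly):

using System;
using System.Runtime.InteropServices;

static class XyzExporter
{
    [DllImport("kernel32", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    delegate int ExportDocument(string path);    // hypothetical export signature

    static ExportDocument _export;

    // Only touches xyzexport.dll (a made-up plugin) the first time the user
    // actually picks the XYZ export format.
    public static int Export(string path)
    {
        if (_export == null)
        {
            IntPtr module = LoadLibrary("xyzexport.dll");
            if (module == IntPtr.Zero)
                throw new DllNotFoundException("xyzexport.dll");

            IntPtr proc = GetProcAddress(module, "ExportDocument");
            _export = (ExportDocument)Marshal.GetDelegateForFunctionPointer(
                proc, typeof(ExportDocument));
        }
        return _export(path);
    }
}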
One item that might be worth looking at is "rebasing". Each DLL has a preset "base" address that it prefers to be loaded at in memory. If an application loads the DLL at a different address (because the preferred one is not available), the DLL is loaded at the new address and "rebased". Roughly speaking, this means that parts of the DLL are patched on the fly. This only applies to native images, as opposed to .NET DLLs.
This really old MSDN article covers rebasing:
http://msdn.microsoft.com/en-us/library/ms810432.aspx
Not sure whether much of it still applies (it's a very old article)... but here's an enticing quote:
Prefer one large DLL over several small ones; make sure that the operating system does not need to search for the DLLs very long; and avoid many fixups if there is a chance that the DLL may be rebased by the operating system (or, alternatively, try to select your base addresses such that rebasing is unlikely).
Btw, if you're dealing with .NET then NGEN-ing your app/DLLs should help speed things up (NGEN = native image generation).
Yep, anything read in from the hard drive is cached, so it will load faster the second time. The basic assumption is that it's rare to use a large chunk of data from the HD only once and then discard it (this is usually a good assumption in practice). Typically it's the operating system (kernel) that implements the cache, taking up a chunk of RAM to do so, although modern hard drives also have some built-in cache of their own. (I once wrote a small kernel as an academic project; caching of HD data in memory was one of its features.)
One additional factor that affects program startup time is SuperFetch, a technology introduced with Windows Vista (Windows XP had a simpler prefetcher). Essentially it monitors disk access during program startup, recognizes file-access patterns, and then attempts to "bunch up" the required data for quicker access (e.g. by rearranging the data sequentially on disk according to its loading order).
As the others mentioned, generally speaking any read operation is likely to be cached by the Windows disk cache, and reused unless the memory is needed for other operations.
NGENing the assemblies might help with startup time; however, runtime performance might be affected (sometimes NGENed code is not as optimal as JIT-compiled code).
NGENing can be done in the background as well: http://blogs.msdn.com/davidnotario/archive/2005/04/27/412838.aspx
Here's another good article on NGen and performance: http://msdn.microsoft.com/en-us/magazine/cc163808.aspx
The system cache is used for anything that comes off disk. That includes file metadata, so if you are using applications that open a large number of files (say, directory scanners), then you can easily flush the cache if you also have applications running that eat up a lot of memory.
For the stuff I use, I prefer a small number of large files (>64 MB to 1 GB) and asynchronous, unbuffered I/O. And a good ol' defrag every once in a while.
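As a sketch of that pattern (the path and chunk size are arbitrary; true unbuffered I/O with FILE_FLAG_NO_BUFFERING needs aligned buffers and is not shown here), asynchronous sequential reads over a large file look roughly like this:

using System;
using System.IO;
using System.Threading.Tasks;

class BigFileReader
{
    static async Task Main()
    {
        const string path = @"C:\data\big.dat";      // hypothetical large file
        var buffer = new byte[4 * 1024 * 1024];      // read in 4 MB chunks
        long total = 0;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                   FileShare.Read, 1 << 20,
                   FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                total += read;                       // process the chunk here
            }
        }
        Console.WriteLine($"Read {total} bytes");
    }
}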