Stanford CoreNLP Java SDK for Android - stanford-nlp

I am looking to use the Stanford CoreNLP Java SDK in an Android app for text processing and for bucketing text under categories.
However, the CoreNLP SDK is 371 MB (version 3.7.0), which is not feasible for an Android app; my app is much smaller than that and cannot afford such a large footprint.
Is it possible to use only a subset of CoreNLP?

Could you describe what you want to do with Stanford CoreNLP? And what size do you need to get things down to? You can remove some code and resources, but the more you remove, the more functionality you will lose.
UPDATE: Something to keep in mind: the code and dependencies are ~10 MB. Most of that 350+ MB is resources that are only needed at run time. You could definitely cut down the 10 MB if you are only using a subset.
Couldn't you put the resources in expansion files, which allow up to 4 GB? You could still remove some resources that are not necessary.
Another serious problem is that Android apps appear to have a small RAM limit; it may be challenging to get CoreNLP under 30 MB of RAM usage.
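As an illustration of using only a subset, here is a minimal Java sketch (the class name and annotator list are just examples, not from the thread) that configures only the annotators it actually needs, so only their model files have to ship with the app:

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class MinimalPipeline {
    public static void main(String[] args) {
        // Request only the annotators you actually need; each extra
        // annotator pulls in its own (often large) model files.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Stanford CoreNLP, trimmed down for a small app.");
        pipeline.annotate(doc);
    }
}

With a pipeline like this you would only need to bundle the tokenizer and POS tagger models; the parser, NER, and coreference models that make up most of the 350+ MB could be left out.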

Related

Is Electron a good choice for an app dealing with a large amount of data?

I'm working on a web app to download, decrypt, and create a report of a user's data. This report could be > 100 GB, with individual files of up to 5 GB. The initial hope was to achieve this in the browser, but memory limitations, especially with a 5 GB file, have scrapped this idea. Instead, the new plan is to provide a standalone app to compile and download the report.
A suggestion has been put forward to use Electron. I'd like to know if this is viable? Or will Electron suffer from the same limitations as the browser?
I'll provide my own response based on experience.
I'd like to know if this is viable? Or will Electron suffer from the same limitations as the browser?
Electron has no such limits. It is perfectly capable of streaming large files and a large quantity of files. In our usage we validated it for 500 GB and 12 million individual files.
There are other limits. For example, if you add 100k+ files to a single zip, it takes a long time to extract, so other strategies may be required. However, this is not an Electron limitation.
One limitation of Electron that does matter is its lack of FIPS support, i.e. enterprise/government-level security. In our case its omission means we need to rewrite the client in C++. It was possible to get FIPS support on Linux, but only with a good amount of effort using BoringSSL, and as there are no BoringSSL releases for Windows, we switched to C++.

How to reduce overall analysis duration in Sonarqube Analysis?

My use case details are as follows.
SonarQube version: 4.5.2
RAM: 16 GB
Code base size: several GB
OS: Windows
Project languages: Java, JavaScript
Project type: multi-module
Analysis takes about half a day, 12-13 hours. I need help minimizing this.
There could be a couple of things to address here:
Network latency - for the version you're running, you want to make sure the machine that performs the analysis is as close as possible on the network to your database.
Database contention - in 4.5.2, the scanner talks directly to the database. If analyses of other projects are happening concurrently, they could interfere with each other. You can remove this problem by upgrading to the new LTS version, 5.6, which fully cuts the ties from the scanner to the database. In 5.6, analysis reports are generated by the SonarQube scanner and submitted to the server, where they're queued for processing and handled serially.
Project size - your project may just be too big to analyze successfully in a reasonable amount of time. This may or may not be the case, but you should give it honest consideration and potentially analyze components independently. Doing so has the knock-on benefit of not reanalyzing the whole thing when one file in one module changes. If you need to re-aggregate the results of the individual component analyses, you can do so with the Governance plugin ($).

Speed up Build-Process of WiX-Installer

For my WiX project I am harvesting 4 directories via the pre-build event of Visual Studio, which results in about 160 MB of data across about 220 files, but the build process takes very long.
How can I speed that process up? I have one embedded media.cab file which holds all the files. Is it the size or the number of files that slows the process down? Or is it the harvesting with the heat tool in the pre-build event? Would it be faster with the HeatDirectory element?
Has anyone had experience with speeding this up?
For us, the vast majority of the time was spent invoking light (for the linking phase).
light is very slow at compressing cabinets. Changing the DefaultCompressionLevel in the .wixproj from high to mszip (or low or none) helps a lot. However, our build was still too slow.
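For example, in the .wixproj (the mszip value here is just an illustration; low or none trade installer size for even faster debug builds):
<DefaultCompressionLevel>mszip</DefaultCompressionLevel>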
It turns out that light handles cabinets independently and, by default, builds them on multiple threads. To take advantage of this parallelism, you need to generate multiple cabinets. In our .wxs Product element, we had the following, which placed everything in a single cabinet:
<MediaTemplate EmbedCab="yes" />
It turns out you can use the MaximumUncompressedMediaSize attribute to declare the threshold (in MB) at which you want files to be automatically sharded into different .cab files:
<MediaTemplate EmbedCab="yes" MaximumUncompressedMediaSize="2" />
Now light was much faster, even with high compression, but still not fast enough for incremental builds (where only a few files change).
Back in the .wixproj, we can use the following to set up a cabinet cache, which is ideal for incremental builds where few files change and most cabinets don't need to be regenerated:
<CabinetCachePath>$(OutputPath)cabcache\</CabinetCachePath>
<ReuseCabinetCache>True</ReuseCabinetCache>
Suppressing validation also gives a nice speedup (light.exe spends about a third of its time validating the .msi by default). We activate this for debug builds:
<SuppressValidation>True</SuppressValidation>
With these changes, our final (incremental) build went from over a minute to a few seconds for a 32 MB .msi output, and a full rebuild stayed well under a minute, even with the high compression level.
WiX Help File: How To: Optimize build speed. In other words: 1) Cabinet reuse and 2) multi-threaded cab creation are built-in mechanisms in WiX to speed up builds.
Hardware: The inevitable "throw hardware at it". New SSD and NVMe disks are so much faster than older IDE drives that you might want to try them as another way to improve build speed and installation speed. Obvious yes, but very important. It can really improve the speed of development. See this answer.
Challenges with NVMe drives?: 1) They run hot, 2) they usually have limited capacity (size), 3) they might be more vulnerable than older 2.5" drives when used in laptops (I am not sure - keep in mind that some NVMe drives are soldered solid to the motherboard on laptops), 4) data rescue can be a bit challenging if you don't have good quality external enclosures (form factor etc...), 5) NVMe drives are said to burn out over time, 6) They are still somewhat pricey - especially the larger capacity ones, and there are further challenges for sure - but overall: these drives are awesome.
Compression: You can try to compile your setup with a different compression level (for example none for debug builds). No compression makes builds faster. Here are illustrations for doing the opposite, setting higher compression (just use none instead of high for your purpose):
CompressionLevel: Msi two times larger than msm
MediaTemplate: How can I reduce the size of a 1GB MSI file using Orca?
A related answer on compression: What is the compression method used by MSI files?
Separate Setup: If you still go compressed, you could put prerequisites and merge modules in a separate setup to avoid compressing them for every build (or use release flags if you are in InstallShield, or check the preprocessor features in WiX).
External Source Files: I suppose you could use external source files if that's acceptable - then you don't have a lengthy compression operation taking place during the build, just a file copy (which keeps getting faster - especially with flash drives).
Shim: Another technique is to shim all the files you install down to 1 KB if what you are testing is the setup itself, its GUI, and its custom actions. The result is just a "shell" of a setup - which is a great way to test new custom actions. Many have written tools for this, but I don't have a link for you; there is always github.com to search. A rough sketch of such a tool appears below, after these notes.
Release Flags: Another way to save time is to use special release flags (InstallShield only) to compile smaller versions of the setup you are working on at the moment (leaving out many features). WiX has similar possibilities via its preprocessor. More on practical WiX preprocessor use.
Debug Build: I usually use combinations of these techniques to make a debug build.
I normally use external source files when I experiment and add new features and keep rebuilding and installing the setup all the time.
Release flags to compile only part of the setup, combined with cabinet reuse, can save a lot of time depending on the size of your setup, the number of files, and your hardware configuration.
Perhaps the most effective is a separate setup in my opinion (provided it is stable and not changing that often). Beware though: Wix to Install multiple Applications (the problems involved when it comes to splitting setups).
My take on it: go for a prerequisites-only separate setup. This is good also for Large Scale Deployment scenarios where corporate users want to use their own, standardized prerequisites and are annoyed with lots of embedded "junk" in a huge setup. A lot of package preparation time in large companies is spent taking out outdated runtimes and prerequisites. You can also deliver updates to these prerequisites without rebuilding your entire setup. Good de-coupling.
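As a rough illustration of the shim technique mentioned above, here is a hypothetical Java sketch (the class name, arguments, and 1 KB stub size are made up for illustration) that mirrors the real payload directory as tiny stub files which a test build can harvest instead of the real payload:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ShimPayload {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);  // real payload directory
        Path target = Paths.get(args[1]);  // shimmed copy used by test builds
        byte[] stub = new byte[1024];      // every file becomes 1 KB of zeros

        try (Stream<Path> files = Files.walk(source)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.write(dest, stub);  // same file name, tiny contents
                }
            }
        }
    }
}

Pointing heat (or the HeatDirectory element) at the shimmed copy keeps the authoring identical while compression and linking become near-instant.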
Links:
How can I speed up MSI package install and uninstall?
Simply put, don't harvest files. Please see my blog article: Dealing with very large number of files
The third downside is that your build will take A LOT longer to perform, since it's not only creating your package but also authoring and validating your component definitions.
I maintain an open source project on CodePlex called IsWiX. It contains project templates (scaffolding) and graphical designers to assist you in setting up and maintaining your WiX source. That said, it was designed around merge modules, which slows the build down a bit, as the .MSM has to be built and then merged into the .MSI. Pure fragments would be faster if you are really concerned about raw speed. Still, I have many installers around 160 MB and they don't take long at all.
And of course don't forget about having a fast build machine. CPU, RAM, and SSD disk I/O all contribute to fast generation of MSIs. For my consulting, I use Microsoft Visual Studio Online (VSO). I have a Core i7-2600K Hyper-V server with 32 GB of RAM and a Samsung 850 EVO SSD. My build server (VM) runs a TFS proxy server for local SCC caching.
For fun, on the above machine, I took 220 files from my system32 folder totaling 160 MB. It took 30 seconds to build the MSM and 30 seconds to build the MSI, for a total of 60 seconds. This is 'fast enough' for me. I would expect an MSI authored using only fragments to take 30 seconds.

Unity 3D: Asset Bundles vs. Resources folder vs www.Texture

So, I've done a bit of reading around the forums about AssetBundles and the Resources folder in Unity 3D, and I can't figure out the optimal solution for the problem I'm facing. Here's the problem:
I've got a program designed for standalone that loads "books" full of .png and .jpg images. The pages are, at the moment, the same every time the program starts. At the start of the scene for any "book", it loads all those images at once using WWW.texture and a path. I'm realizing now, however, that this is possibly a non-performant method for accessing things at runtime -- it's slow! That means the user can't do anything for 5-20 seconds while the scene starts and the book's page images load up (on non-legendary computers). So, I can't figure out which of these three approaches would be the fastest:
1) Loading one asset bundle per book (say 20 textures at 1 MB each).
2) Loading one asset bundle per page (1 MB each).
3) Either of the first two options, but loaded from the Resources folder.
Which one would be faster, and why? I understand that asset bundles are packaged by Unity, but does this mean that the textures inside will be pre-compressed and easier on memory at load time? Does the Resources folder cause less load time? What gives? As I understand it, the Resources folder loads into a cache -- but is it the same cache that the standalone player uses normally? Or is this extra, unused space? I guess another issue is that I'm not sure what the difference is between loading things from memory and storing them in the cache.
Cheers, folks...
The Resources folder holds bundled managed assets. That means they will be compressed by Unity, following the settings you apply in the IDE, and they are therefore efficient to load at runtime. You can tailor the compression for each platform, which should further optimize performance.
We make extensive use of Resources.Load() to pull assets, and it performs well on both desktop and mobile.
There is also a special folder, called StreamingAssets, that you can use for bundled un-managed assets. This is where we put the videos we want to play at runtime but don't want Unity to convert to the default Ogg codec. On mobile, these play in the native video player. You can also put images in there, and loading them is like using the WWW class: slow, because Unity needs to sanitize and compress the images at load time.
Loading via WWW is slower due to the overhead of processing the asset, as mentioned above, but you can pull data from a server or from outside the application "sandbox".
Only load what you need to display and implement a background process to fetch additional content when the user is busy going through the first pages of each book. This would avoid blocking the UI too long.
Optimize the images to reduce the file size. Use tinypng if you need transparent images, or stick to compressed JPGs.
Try using power-of-2 image sizes where possible; this should speed up the runtime processing a little.
Great answer from Jerome about Resources. To add some additional info for future searches regarding AssetBundles, here are two scenarios:
Your game is too big
You have a ton of textures, say, and your iOS game is above 100 MB -- meaning Apple will show a warning to users and prevent them from downloading over cellular. Resources won't help, because everything in that folder is bundled with the app.
Solution: Move the artwork you don't absolutely need on first-run into asset bundles. Build the bundles, upload them to a server somewhere, then download them at runtime as needed. Now your game is much smaller and won't have any scary warnings.
You need different versions of artwork for different platforms
Alternative scenario: you're developing for iPhone and iPad. For the same reasons as above, you shrink your artwork as much as possible to stay under the 100 MB limit for iPhone. But now the game looks terrible on iPad. What to do?
Solution: You create an asset bundle with two variants. One for phones with low res artwork, and one for tablets with high res artwork. In this case the asset bundles can be shipped with the game or sent to a server. At run-time you pick the correct variant and load from the asset bundle, getting the appropriate artwork without having to if/else everywhere.
With all that being said, asset bundles are more complicated to use, poorly documented, and Unity's demos don't work properly at times. So seriously evaluate whether you need them.

Project is growing and growing, Xcode slowing down

My project is growing. It includes about 16 thousand .m4a (sound) files, because it's an app for learning languages with examples, but only a few classes and files containing code.
Since I've added those 16,000 files, working on this project is a PITA. Renaming any file takes time; compiling, building, and launching the app take a very long time. Of course I know that about 200 MB has to be transferred, but the problem is that the computer responds badly during that time.
Fortunately I have an SSD drive and 8 GB of RAM; I don't even want to think how long it would take on an HDD.
Is there any way to improve the performance?
I'll also be responsible for creating more than ten similar apps for other language pairs, and I would like to have all of them in one project and just play with targets. So if I don't do anything about performance now, there is a high probability that one day I'll throw this computer out the window of my house on the 2nd floor...
You can try downloading each .m4a from the web once you need it. That means the app will be thin when a user downloads it, but once a sound file has to be played, it will be downloaded from the web and saved to disk; the next time you have to play that file, play it from disk.
And yeah, Xcode has many problems - this is one of them...
I have solved the problem by creating an additional Core Data SQLite file that contains all of the resources, so the entity looks like:
name (NSString) - name of the file
data (NSData) - binary contents of the file
It works like a charm. Quick builds, quick debugging, just like before.
