Adobe AIR - Database vs XML for large static data - performance

I'm developing an application using Flash Builder / Flex for Adobe AIR. This application will be processing a large set of static text (100-200 MB) using a variable set of processing instructions. The target platforms will be iOS, Android and desktop.
The data set can either be one large XML file or be broken into a set of XML files of about 3 MB each; this will be decided at design time.
From your experience, would it be better to store the text in an Adobe AIR database or in a set of XML files for best performance (including speed and battery life)?
What other considerations should I take into account?

I quote one of my favourite bookmarks:
There are several different methods for persisting data in AIR applications:
Flat files
Local shared objects
EncryptedLocalStore
Object serialization
SQL database
Each of these methods has its own set of advantages and disadvantages (an explanation of which is beyond the scope of this article). One of the advantages of using a SQL database is that it helps to keep your application's memory footprint down, rather than loading a lot of data into memory from flat files. For example, if you store your application's data in a database, you can select only what you need, when you need it, then easily remove the data from memory when you're finished with it.
Source: http://www.adobe.com/devnet/air/articles/10_tips_building_on_air.html
I don't understand one thing: is EVERY file 100-200 MB in size, or is that the total size of ALL your files?

Related

Performance issue while using MicroStream

I just started learning MicroStream. After going through the examples published to the MicroStream GitHub repository, I wanted to test its performance with an application that deals with more data.
Application source code is available here.
Instructions to run the application and the problems I faced are available here.
To summarize, below are my observations:
While loading a file with 2.8+ million records, processing takes 5 minutes
While calculating statistics based on loaded data, application fails with an OutOfMemoryError
Why is MicroStream trying to load all the data (4 GB) into memory? Am I doing something wrong?
MicroStream is not like a traditional database; it starts from the concept that all data live in memory, and the object graph is persisted to disk (or other media) when you store it through the StorageManager.
In your case, all data are in one list, so accessing that list reads every record from disk. The Lazy reference is not useful the way you have used it, since it only guards access to that single list holding all the data.
Some optimizations you can introduce:
Split the data by vendorId or by day, using a Map<String, Lazy<List>> (see the sketch after this list).
When a Map value has been processed, remove it from memory again by clearing its lazy reference: https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
Increase the number of channels to optimize reading and writing the data; see https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
Don't store the object graph every 10,000 lines; store it just once at the end of loading.
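As a rough illustration of the first two points (and of storing only once at the end), here is a minimal sketch. The root class, the vendor keys and the String records are hypothetical placeholders for your own model; only the Lazy, EmbeddedStorage and store/clear calls are the actual MicroStream API:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import one.microstream.reference.Lazy;
import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class VendorDataSketch
{
    // Root of the object graph: records are grouped per vendor behind Lazy
    // references, so only the group currently being processed must be in memory.
    static class DataRoot
    {
        final Map<String, Lazy<List<String>>> recordsByVendor = new HashMap<>();
    }

    public static void main(final String[] args)
    {
        final DataRoot root = new DataRoot();
        final EmbeddedStorageManager storage = EmbeddedStorage.start(root);
        // (The channel count is raised via the storage configuration; see the docs linked above.)

        // Loading phase (placeholder data): one group per vendor, wrapped in Lazy.
        root.recordsByVendor.put("vendor-1", Lazy.Reference(List.of("record A", "record B")));
        root.recordsByVendor.put("vendor-2", Lazy.Reference(List.of("record C")));

        // Store the graph once after loading, not every 10,000 lines.
        storage.store(root.recordsByVendor);

        // Processing phase: touch one vendor at a time, then release it again.
        for (final Lazy<List<String>> group : root.recordsByVendor.values())
        {
            final List<String> records = group.get(); // loads just this group from storage
            records.forEach(System.out::println);     // stand-in for your statistics step
            group.clear();                            // lets the loaded group be garbage collected
        }

        storage.shutdown();
    }
}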
Hope this helps you solve the issues you have at the moment

Images storage performance react native (base64 vs uri path)

I have an app that creates reports with some data and images (minimum 1 image, maximum 6). These reports stay saved in my app until the user sends them to the API (which can happen the same day the report was registered, or a week later).
My question is: what's the proper way to store these images (I'm using Realm), saving the path (URI) or a base64 string? My current version keeps the base64 for these images (500-800 KB per image), and after my users send their reports to the API, I delete the base64 string.
I was developing a way to save the path to the image and then display it, but the URI returned by the image picker is temporary. So to do this, I need to copy the file to another place and then save the path. Doing that, though, I end up (for roughly 2 or 3 days) with two copies of each image stored on the phone (using storage).
So before I build all of this, I was wondering: will copying the image to another path and saving the path be more performant than saving a base64 string, or shouldn't it make much difference?
I try to avoid text-only answers; including code is best practice, but the question of storing images comes up frequently and isn't really covered in the documentation, so I thought it should be addressed at a high level.
Generally speaking, Realm is not a solution for storing blob-type data: images, PDFs, etc. There are a number of technical reasons for that, but most importantly, an image can go well beyond the capacity of a Realm field. Additionally, it can significantly impact performance (especially in a syncing use case).
If this is a local-only app, store the images on disk on the device and keep a reference to where they are (their path) in Realm. That will keep the app fast and responsive with a minimal footprint.
If this is a synced solution where you want to share images across devices or with other users, there are several cloud-based solutions to accommodate image storage; you then store a URL to the image in Realm.
One option, GridFS, is part of the MongoDB family of products (which also includes MongoDB Realm). Another option, a solid product we've leveraged for years, is Firebase Cloud Storage.
Now that I've made those statements, I'll backtrack just a bit and refer you to the article Realm Data and Partitioning Strategy Behind the WildAid O-FISH Mobile Apps, a fantastic write-up about implementing Realm in a real-world application and, in particular, how to deal with images.
Note that in that article they do store the images in Realm for a short time. However, one thing they left out (which was revealed in a forum post) is that the images are compressed to ensure they don't exceed the Realm field size limit.
I am not totally on board with general use of that technique, but it works for that specific use case.
One more note: the image sizes mentioned in the question are pretty small (500-800 KB), which is a tiny amount of data that would really not have an impact, so storing them in Realm as a data object would work fine. The caveat is future expansion: if you later decide to store larger images, it would require a complete rewrite of the code, so why not plan for that up front?

CFSpreadSheet functions using up memory for large data sets

We have a ColdFusion application that runs a large query (up to 100k rows) and then displays it in HTML. The UI then offers an Export button that triggers writing the report to an Excel spreadsheet in .xlsx format using the cfspreadsheet tag and spreadsheet functions, in particular spreadsheetSetCellValue for building out row/column values, and spreadsheetFormatRow and spreadsheetFormatCell for formatting. The ssObj is then served to the user as a download using:
<cfheader name="Content-Disposition" value="attachment; filename=OES_#sel_rtype#_#Dateformat(now(),"MMM-DD-YYYY")#.xlsx">
<cfcontent type="application/vnd-ms.excel" variable="#ssObj#" reset="true">
where ssObj is the spreadsheet object. We are seeing file sizes of about 5-10 MB.
However, the memory usage for creating this report and writing the file jumps by about 1 GB, and the compounding problem is that the memory is not released right away by the Java GC after the export completes. When multiple users run and export this type of report, memory keeps climbing until it reaches the allocated heap size and kills the server's performance to the point that it brings the server down. A reboot is usually necessary to clear it out.
Is this normal/expected behavior, or how should we be dealing with this issue? Is it possible to easily release the memory used by this operation on demand after the export has completed, so that others running the report readily get access to the freed-up space for their reports? Is this type of memory usage for a 5-10 MB file common with cfspreadsheet functions and writing the object out?
We have tried temporarily removing the expensive formatting functions, and still the memory usage is large for the creation and writing of the .xlsx file. We have also tried the spreadsheetAddRows approach and the cfspreadsheet action="write" query="queryname" tag, passing in a query object, but this too took up a lot of memory.
Why are these functions such memory hogs? What is the optimal way to generate Excel spreadsheet files without this out-of-memory issue?
I should add that the server is running in an Apache/Tomcat container on Windows and we are using CF2016.
How much memory do you have allocated to your CF instance?
How many instances are you running?
Why are you allowing anyone to view 100k records in HTML?
Why are you allowing anyone to export that much data on the fly?
We had issues of this sort (CF and memory) at my last job. Large file uploads consumed memory, large Excel exports consumed memory; it's just going to happen. As your application's user base grows, you'll hit a point where these memory-hogging requests kill the site for other users.
Start with your memory settings. You might get a boost across the board by doubling or tripling what the app is allotted. Also, make sure you're on the latest version of the supported JDK for your version of CF. That can make a huge difference too.
Large file uploads would impact the performance of the instance making the request. This meant that others on the same instance doing normal requests were waiting for those resources needlessly. We dedicated a pool of instances to only handle file uploads. Specific URLs were routed to these instances via a load balancer and the application was much happier for it.
That app also handled an insane amount of data and users constantly wanted "all of it". We had to force search results and certain data sets to reduce the amount shown on screen; the DB was quite happy with that decision. Data exports were moved to a queue so those large Excel files could be crafted outside of normal page requests. Maybe users got their data immediately, maybe they waited a while for a notification. Either way, the application performed better across the board.
Presumably a bit late for the OP, but since I ended up here, others might too. Whilst there is plenty of sound general memory-related advice in the other answer and comments here, I suspect the OP was actually hitting a genuine memory-leak bug that has been reported in the CF spreadsheet functions from CF11 through to CF2018.
When generating a spreadsheet object and serving it up with cfheader+cfcontent without writing it to disk, even with careful variable scoping, the memory never gets garbage collected. So if your app runs enough Excel exports using this method then it eventually maxes out memory and then maxes out CPU indefinitely, requiring a CF restart.
See https://tracker.adobe.com/#/view/CF-4199829 - I don't know if he's on SO but credit to Trevor Cotton for the bug report and this workaround:
Write the spreadsheet to a temporary file,
read the spreadsheet from the temporary file back into memory,
delete the temporary file,
stream the spreadsheet from memory to the user's browser.
So, given a spreadsheet object that was created in memory with spreadsheetNew() and never written to disk, this causes a memory leak:
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(arguments.theSheet)#" />
...but this does not:
<cfset local.tempFilePath = getTempDirectory()&CreateUUID()&arguments.filename />
<cfset spreadsheetWrite(arguments.theSheet, local.tempFilePath, "", true) />
<cfset local.theSheet = spreadsheetRead(local.tempFilePath) />
<cffile action="delete" file="#local.tempFilePath#" />
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(local.theSheet)#" />
It shouldn't be necessary, but Adobe don't appear to be in a hurry to fix this, and I've verified that this works for me in CF2016.

Using NSPersistentDocument to create 'Documents'

I would like to create an app that uses
Swift
CoreData
'Documents' which work in the standard macOS fashion [custom extension, a single 'file'/filewrapper containing all data relating to that document]
This does not appear possible. The documentation states very clearly that
NSPersistentDocument does not support some document behaviors:
File wrappers. [..]
which makes me think that the usual ways of dealing with images in Core Data - binary data with 'Allows External Storage', or saving them to a different location and storing the URL in the database - cannot be used with NSPersistentDocument. I want my users to be able to do the usual Finder operations on my 'file' (duplicate, move to external storage, restore from an external backup), and I need all my data to be in one single package.
The SQLite version of the file store results in the usual three-fold stack when saving - .sqlite, .sqlite-shm, .sqlite-wal - which is useless as a 'document'.
Is there a solution I have overlooked? (examples are very sparse; the Big Nerd Ranch sample does not solve this, either; neither Marcus Zarra nor Objc.io touch on NSPersistentDocument).
The only option that will work with NSPersistentDocument the way you want it is to store the images directly in the database. You need a Binary Data attribute on your entity, but you cannot turn on the Allows External Storage option.
If you turn on this option, Core Data will decide - depending on the size - whether to store the image directly in the database or in a hidden folder inside the folder where your document is located:
(I made the folder visible by pressing Cmd-Shift-. in the Finder.) The sample document is named Test 1.doof and it contains three images:
You can see that the hidden folder .Test 1_SUPPORT/EXTERNAL DATA contains two files, which are the two bigger images (1.3 MB and 494 KB). The third one with only 50 KB is stored inside Test 1.doof. If you move Test 1.doof into another folder, the hidden folder is left behind. Opening the file in that other folder leads to two missing images.
Storing the images inside the database is not that bad if you put the binary data into a separate entity with a one-to-one relation to the rest of the data, like so:
That way the image does not interfere with any search or sort operation. NSPersistentDocument gives you a lot of cool functionality for free, so you should use it anyway if possible.
Two additional remarks:
If you turn on Allows External Storage for an attribute, you do not have to care about URLs or where to store the images; Core Data does that for you (but not in a way that is useful for document-based apps).
These shm or wal files are temporary files that appear "sometimes", for databases without external storage as well. If they stick around, you can safely remove them once your app is closed.
If you want to put more than just a database in your document, then you should implement NSDocument instead of NSPersistentDocument. In that case you don't get built-in support for Core Data, but you can use your document as a container for multiple file types.
See also Is NSDocument and CoreData a possible combination, or is NSPersistentDocument the only way?

Clearing and freeing memory

I am developing a Windows application using C# .NET. It is in fact a plug-in that is installed into a DBMS. The purpose of this plug-in is to read all the records (a record is an object) in the DBMS that match the provided criteria and transfer them to my local file system as XML files. My problem is related to memory usage. Everything works fine, but each time I read a record it occupies memory, and after a certain limit the plug-in stops working because it runs out of memory.
I am dealing with around 10k-20k records (objects). Are there any memory-related methods in C# to clear the memory of each record as soon as it has been written to the XML file? I tried all the basic memory-handling methods like clear(), flush(), gc() and finalize(), but to no avail.
Please consider the following:
Record is an object; I cannot change this and use other, more efficient data structures.
Each time I read a record I write it to XML, and I repeat this again and again.
C# is a garbage-collected language. Therefore, to reclaim the memory used by an object, you need to make sure all references to it are removed so that it becomes eligible for collection. Specifically, this means you should remove the objects from any data structures that hold references to them once you're done doing whatever you need to do with them.
If you get a little more specific about what type of data structures you're using we can probably give a more specific answer.
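As a high-level sketch of that advice (written here in Java, but the C# shape is the same idea): write each record to the XML output as soon as you read it and keep no collection that holds on to it, so the garbage collector can reclaim every record once you have moved past it. The Record interface, its toXml() method and the iterator are hypothetical stand-ins for the DBMS plug-in API:

import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public class RecordExporter
{
    // Hypothetical stand-in for the DBMS record type.
    interface Record
    {
        String toXml();
    }

    // Writes records one at a time; nothing keeps a reference to a record
    // after it has been written, so each one becomes eligible for collection.
    public static void export(final Iterator<Record> records, final Path target) throws IOException
    {
        try (Writer out = Files.newBufferedWriter(target))
        {
            out.write("<records>\n");
            while (records.hasNext())
            {
                final Record record = records.next(); // only the current record is referenced
                out.write(record.toXml());
                out.write('\n');
                // 'record' goes out of scope here and is not added to any list or map
            }
            out.write("</records>\n");
        }
    }
}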
