dojo.io.script.get caching

I'm loading data using dojo.io.script.get. The size of each request can be big, and I need to issue a lot of them.
The question is: after the data is loaded and later discarded, is it cached by the browser?
In other words, when I load some data whose content is "myFunc('blah blah blah')", the browser will execute the myFunc function. What happens to the browser's memory after execution? If I load it, say, 100 times and the size of each string passed to myFunc is, say, 1 GB, will the browser run out of memory?
Thanks.
Andrei

One of the things I have learned about Dojo is that the source code is a great reference.
My quick inspection of dojo/io/script.js shows that there is some logic involving dead script tags and destroying script tags, so I would guess it protects against the memory build-up you mention. (Of course, you should always test this kind of thing yourself, just to be sure.)

Related

Is it safe to read concurrently from a pointer?

I'm working on an image uploader and want to concurrently resize the image to different sizes. Once I've read the file into a []byte, I pass a reference to that buffer to my resize functions, which run concurrently.
Is this safe? I'm thinking that passing a reference to a large file for the resize functions to read will save me memory, and the concurrency will save me time.
Thank you!
Read-only data is usually fine for concurrent access, but you have to be very careful when passing references (pointers, slices, maps and so on) around. Today maybe no one is modifying them while you're also reading, but tomorrow someone may be.
If this is a throwaway script, you'll be fine. But if it's part of a larger program, I'd recommend future-proofing your code by judiciously protecting concurrent access. In your case something like a reader-writer lock could be a good match - all the readers will be able to acquire the lock concurrently, so the performance impact is negligible. And then if you do decide in the future this data could be modified, you already have the proper groundwork laid down w.r.t. safety.
Don't forget to run your code with the race detector enabled.

CFSpreadSheet functions using up memory for large data sets

We have a ColdFusion application that runs a large query (up to 100k rows) and then displays it in HTML. The UI then offers an Export button that triggers writing the report to an Excel spreadsheet in .xlsx format using the cfspreadsheet tag and the spreadsheet functions, in particular spreadsheetSetCellValue for building out row/column values, and spreadsheetFormatRow and spreadsheetFormatCell for formatting. The ssObj is then streamed to the browser using:
<cfheader name="Content-Disposition" value="attachment; filename=OES_#sel_rtype#_#Dateformat(now(),'MMM-DD-YYYY')#.xlsx">
<cfcontent type="application/vnd.ms-excel" variable="#ssObj#" reset="true">
where ssObj is the spreadsheet object. We are seeing file sizes of about 5-10 MB.
However... the memory usage for creating this report and writing the file jumps up by about 1 GB. The compounding problem is that the memory is not released by the Java GC right away after the export completes. When we have multiple users running and exporting this type of report, the memory keeps climbing until it reaches the allocated heap size and kills the server's performance to the point that it brings down the server. A reboot is usually necessary to clear it out.
Is this normal/expected behavior or how should we be dealing with this issue? Is it possible to easily release the memory usage of this operation on demand after the export has completed, so that others running the report readily get access to the freed up space for their reports? Is this type of memory usage for a 5-10Mb file common with cfspreadsheet functions and writing the object out?
We have tried temporarily removing the expensive formatting functions, and still the memory usage is large for the creation and writing of the .xlsx file. We have also tried the spreadsheetAddRows approach and the cfspreadsheet action="write" query="queryname" tag, passing in a query object, but this too took up a lot of memory.
Why are these functions so memory hoggish? What is the optimal way to generate Excel SS files without this out of memory issue?
I should add that the server is running in an Apache/Tomcat container on Windows and we are using CF2016.
How much memory do you have allocated to your CF instance?
How many instances are you running?
Why are you allowing anyone to view 100k records in HTML?
Why are you allowing anyone to export that much data on the fly?
We had issues of this sort (CF and memory) at my last job. Large file uploads consumed memory, large Excel exports consumed memory - it's just going to happen. As your application's user base grows, you'll hit a point where these memory-hogging requests kill the site for other users.
Start with your memory settings. You might get a boost across the board by doubling or tripling what the app is allotted. Also, make sure you're on the latest version of the supported JDK for your version of CF. That can make a huge difference too.
Large file uploads would impact the performance of the instance making the request. This meant that others on the same instance doing normal requests were waiting for those resources needlessly. We dedicated a pool of instances to only handle file uploads. Specific URLs were routed to these instances via a load balancer and the application was much happier for it.
That app also handled an insane amount of data, and users constantly wanted "all of it". We had to limit search results and certain data sets to reduce the amount shown on screen. The DB was quite happy with that decision. Data exports were moved to a queue so that those large Excel files could be crafted outside of normal page requests. Maybe users got their data immediately, maybe they waited a while for a notification. Either way, the application performed better across the board.
Presumably this is a bit late for the OP, but since I ended up here, others might too. While there is plenty of sound general memory-related advice in the other answer and comments here, I suspect the OP was actually hitting a genuine memory-leak bug that has been reported in the CF spreadsheet functions from CF11 through CF2018.
When generating a spreadsheet object and serving it up with cfheader+cfcontent without writing it to disk, even with careful variable scoping, the memory never gets garbage collected. So if your app runs enough Excel exports using this method then it eventually maxes out memory and then maxes out CPU indefinitely, requiring a CF restart.
See https://tracker.adobe.com/#/view/CF-4199829 - I don't know if he's on SO but credit to Trevor Cotton for the bug report and this workaround:
Write spreadsheet to temporary file,
read spreadsheet from temporary file back into memory,
delete temporary file,
stream spreadsheet from memory to user's browser.
So given a spreadsheet object that was created in memory with spreadsheetNew() and never written to disk, this causes a memory leak:
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(arguments.theSheet)#" />
...but this does not:
<cfset local.tempFilePath = getTempDirectory()&CreateUUID()&arguments.filename />
<cfset spreadsheetWrite(arguments.theSheet, local.tempFilePath, "", true) />
<cfset local.theSheet = spreadsheetRead(local.tempFilePath) />
<cffile action="delete" file="#local.tempFilePath#" />
<cfheader name="Content-disposition" value="attachment;filename=#arguments.fileName#" />
<cfcontent type="application/vnd.ms-excel" variable = "#SpreadsheetReadBinary(local.theSheet)#" />
It shouldn't be necessary, but Adobe don't appear to be in a hurry to fix this, and I've verified that this works for me in CF2016.

Coldfusion/Railo: What's the most efficient way to output file contents - fileRead or include?

While I've always cached database calls and placed commonly used data into memory for faster access, I've been finding of late that simple processing and output of data can add a significant amount of time to page load, and thus I've been working on a template-caching component that saves parsed HTML either to a file or to memory for quicker inclusion on pages.
This is all working very well, reducing some page loads down to 10% of the uncached equivalent - however I find myself wondering what would be the most efficient way to output the content.
Currently I'm using fileRead to pull in the parsed HTML and save to a variable, which is output on the page.
This seems very fast, but I'm noticing the memory used by the Tomcat service gradually increasing - presumably because the fileRead operation is reading the contents into memory and, quite possibly, Tomcat isn't removing that data when it's finished.
(Side question: does anyone know a way I can interrogate the JVM memory and find details/stack traces of the objects that CF has created?)
Alternatively, I could use cfinclude to simply include the parsed HTML file. From all the information I can find it seems that the speed would be about the same - so would this method be more memory efficient?
I've had issues on the server before with memory usage crashing Tomcat, so keeping it down is quite important.
Is there anyone doing something similar that can give me the benefit of their experience?
cfinclude just includes the template into the one being compiled, whereas fileRead has to read it into memory first and then output it, so it is technically going to consume more memory. I don't expect the speed difference is much, but you can see the difference by just turning on debugging and checking the execution times.
The most efficient way would be to cache it with cachePut() and serve it from cacheGet(). What can be faster than fetching from RAM? Better yet, if it's the whole page, don't fetch it at all: send proper Expires headers, or smartly return 304 Not Modified.
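To make that concrete, here is a minimal sketch of the cachePut()/cacheGet() approach, assuming the parsed HTML is already available in a variable named renderedHtml and using a hypothetical "pageCache_" key scheme based on the request URL:
<cfscript>
// Minimal sketch: renderedHtml and the "pageCache_" key scheme are assumptions.
cacheKey = "pageCache_" & hash(cgi.script_name & "?" & cgi.query_string);
cachedHtml = cacheGet(cacheKey);
if (isNull(cachedHtml)) {
    cachedHtml = renderedHtml;
    // Keep the fragment in the default cache region for one hour.
    cachePut(cacheKey, cachedHtml, createTimeSpan(0, 1, 0, 0));
}
writeOutput(cachedHtml);
</cfscript>
On subsequent requests the fragment comes straight from RAM, so neither the file system nor the template parser is touched.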
It turns out that CFInclude actually compiles the (already rendered in this case) content into a class, which itself has overhead. The classes aren't unloaded (according to CFTracker) and as such, too many of these can cause permgen errors. FileRead() seems to be orders of magnitude more efficient, as all we're doing is inserting content into the output buffer.
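For comparison, a rough sketch of the two approaches being discussed; the cache path and file name are made up for illustration, and in practice you would use only one of the two:
<!--- Hypothetical path to a pre-rendered HTML fragment. --->
<cfset cachePath = expandPath("./cache/page_home.html") />
<!--- Approach A: fileRead() just drops the flat file into the output buffer. --->
<cfoutput>#fileRead(cachePath)#</cfoutput>
<!--- Approach B: cfinclude compiles the fragment into a class the first time it runs. --->
<cfinclude template="./cache/page_home.html" />
As noted above, approach B leaves a compiled class behind for every cached fragment, which is what can exhaust PermGen as the number of fragments grows, while approach A only costs the memory needed to hold the file contents while the request is rendered.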

Windows memory management: check if a page is in memory

Is there a way, in Windows, to check if a page is in memory or on disk (swap space)?
The reason I want to know this is to avoid causing a page fault: if the page is on disk, I will simply not access it.
There is no documented way that I am aware of for accomplishing this in user mode.
That said, it is possible to determine this in kernel mode, but it would involve inspecting the Page Table Entries, which belong to the Memory Manager - not something you would really want to do in any sort of production code.
What is the real problem you're trying to solve?
The whole point of Virtual Memory is to abstract this sort of thing away. If you are storing your own data in user land, put it in a data structure that supports caching and don't think about pages.
If you are writing code in kernel space, I know that in Linux you need to convert a memory address from a user-land address to a kernel-space one; then there are API calls in the VMM to get at the page_table_entry, and subsequently the page struct, from the address. Once that is done, you use logical operators to check for flags, one of which is "swapped". If you are trying to make something fast, though, traversing and messing with memory at the page level might not be the most efficient (or safest) thing to do.
More information is needed in order to provide a more complete answer.

caching snippets (modX)

I was simply cruising through the MODX options and I noticed the option to cache snippets. I was wondering what kind of effect this would have (downsides) on my site. I know that caching would improve the loading time of the site by keeping pages 'cached' after the first time and then only reloading the updates, but this all seems too good to be true. My question is simple: are there any downsides to caching snippets? Cheers, Marco.
Great question!
The first rule of MODX is: (almost) always cache. They've said so on their own blog.
As you said, the loading time will be lower. Let's get the basics out of the way first. When you choose to cache a page, the page with all its output is stored as a file in your cache folder. If you have a small and simple site, you might not see much difference between caching and not caching, but if you have a complex one with lots of chunks-in-chunks, snippets parsing chunks, etc., the difference is enormous. Some of the websites I've made go down 15-30 levels to parse the content in some sections. Loading all of this fresh from the database can take up to a couple of seconds, while loading a flat file takes only a few microseconds. There is a HUGE difference (remember that).
Now, you can cache both snippets and chunks - important to remember. You can also cache one chunk while leaving the next level uncached. Using MODX's brilliant markup, you can choose what to cache and what not to cache, but in general you want as much as possible cached.
You ask about the downsides. There are none, but there are a few cases where you can't use cached snippets/chunks. As mentioned earlier, the cached response is stored per page. That means that if you have a page (or URL, or whatever you want to call it) where you display different content based on, for example, GET parameters, you can't cache it: you can't cache a search result (because the content changes) or a page with pagination (?page=1, ?page=2, etc. would produce different output on the same page). Another case is when a snippet's output is random or different every time. Say you put a random quote in your header; this needs to be uncached, or you will just see the first random result every time. In all other cases, use caching.
Also remember that every time you save a change in the manager, the cache will be wiped. That means that if you, for example, display the latest news articles on your front page, this can still be cached, because it will not need to display different content until you add or edit a resource - and then the cache will be cleared.
To sum it all up: caching is GREAT and you should use it as much as possible. I usually make all my snippets/chunks cached, and if I run into problems, that is the first thing I check.
Using caching makes your webserver respond quicker (good for the user) and produces fewer queries to the database (good for you). All in all. Caching is a gift. Use it.
There are no downsides to caching, and honestly I wonder what made you think there were?
You should always cache everything you can - there's no point in having something be executed on every page load when it's exactly the same as before. By caching the output and the source, you bypass the need for processing time and improve performance.
Assuming MODX Revolution (2.x), all template tags you use can be called both cached and uncached.
Cached:
[[*pagetitle]]
[[snippet]]
[[$chunk]]
[[+placeholder]]
[[%lexicon]]
Uncached:
[[!*pagetitle]] - this is pointless
[[!snippet]]
[[!$chunk]]
[[!+placeholder]]
[[!%lexicon]]
In MODX Evolution (1.x) the tags are different and you don't have as much control.
Some time ago I wrote about caching in MODX Revolution on my blog and I strongly encourage you to check it out as it provides more insight into why and how to use caching effectively: https://www.markhamstra.com/modx/2011/10/caching-guidelines-for-modx-revolution/
(PS: If you have MODX specific questions, I'd suggest posting them on forums.modx.com - there's a larger MODX audience there that can help)
