Does iText 7 PdfReader support partial reading? - itext7

I'm in the process of moving from iText 5 to 7. We process huge PDF files, so parsing the entire PDF into memory is not at all desirable. In iText 5, there is a special constructor on PdfReader that forces 'partial mode'. Does iText 7 always parse the entire PDF, or does it effectively always use 'partial mode'?
Looking at the iText 7 source, it appears that PdfReader no longer caches document content. Instead, PdfDocument takes care of the caching. This means it should be possible to create a new PdfDocument for each page, which would have the same effect as the iText 5 'partial mode' in PdfReader.
If someone could confirm my thinking on that, I would appreciate it.

Partial (or I would rather call it lazy) reading mode is supported in iText 7 and it is active by default. This means objects are read/loaded as needed. Of course, some necessary things are read in any case (the cross-reference table, the catalog, etc., as well as nested direct objects).
Also, PdfObject has a release() method in iText 7, which frees that object from memory; the object will be read again from the source if it is needed later. But if you are using a lot of the high-level API, release() might not be that useful, and creating several PdfDocument instances might indeed be simpler and more effective.
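To illustrate that last option, here is a minimal sketch of the "one PdfDocument per chunk of pages" idea; the file path, chunk size, and the text-extraction step are placeholders, not part of the original answer:
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

public class ChunkedPdfProcessing {
    public static void main(String[] args) throws Exception {
        String src = "/path/to/huge.pdf"; // placeholder path
        int chunkSize = 100;              // placeholder chunk size

        // Open once just to count the pages.
        int total;
        try (PdfDocument probe = new PdfDocument(new PdfReader(src))) {
            total = probe.getNumberOfPages();
        }

        // Re-open the document for each chunk, so everything cached while
        // reading the previous chunk becomes eligible for garbage collection.
        for (int start = 1; start <= total; start += chunkSize) {
            int end = Math.min(start + chunkSize - 1, total);
            try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(src))) {
                for (int i = start; i <= end; i++) {
                    String text = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
                    // ... process the page content here ...
                }
            }
        }
    }
}
Whether re-opening per page or per chunk pays off depends on how much state your high-level processing keeps alive, so measure on a representative file.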
Important note: as the files are huge, they are probably located on disk, so it is very important to use the PdfReader(String) or PdfReader(File) constructors. Those take advantage of random-access reading. Otherwise, if you simply pass an InputStream, the stream is first read fully into memory and only then is the document constructed. That still saves some memory on the parsed data structures, but it keeps the whole source document in memory, which I believe is undesirable.
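A small sketch of that difference (the path is a placeholder):
import java.io.FileInputStream;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;

public class ReaderConstructors {
    public static void main(String[] args) throws Exception {
        String src = "/path/to/huge.pdf"; // placeholder path

        // Path-based constructor: random-access, lazy reading from disk.
        try (PdfDocument lazyDoc = new PdfDocument(new PdfReader(src))) {
            System.out.println("Pages: " + lazyDoc.getNumberOfPages());
        }

        // InputStream constructor: the whole file is buffered into memory
        // before parsing, which defeats the purpose for huge files.
        try (PdfDocument bufferedDoc = new PdfDocument(new PdfReader(new FileInputStream(src)))) {
            System.out.println("Pages: " + bufferedDoc.getNumberOfPages());
        }
    }
}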

Related

Does NSFileWrapper support lazy loading?

I am creating a NSDocument package that contains potentially hundreds of large files, so I don't want to read it all in when opening the document.
I've spent some time searching, but I can't find a definitive answer. Most people seem to think that NSFileWrapper loads all of the data into memory, but some indicate that it doesn't load data until you invoke -regularFileContents on a wrapper. (See Does NSFileWrapper load everything into memory? and Objective-C / Cocoa: Uploading Images, Working Memory, And Storage for examples.)
The documentation isn't entirely clear, but options like NSFileWrapperReadingImmediate and NSFileWrapperReadingWithoutMapping seem to suggest that it doesn't always read everything in.
I gather that NSFileWrapper supports incremental saving, only writing out sub-wrappers that have been replaced. So it'd be nice if it supports incremental loading too.
Is there a definitive answer?
NSFileWrapper loads lazily by default, unless you specify the NSFileWrapperReadingImmediate option. It will avoid reading a file into memory until something actually requests it.
As a debugging aid only, you can see whether a file has been loaded yet by examining:
[wrapper valueForKey:@"_contents"];
It gets filled in as NSData once the file is read from disk.

Clearing and freeing memory

I am developing a Windows application using C# .NET. It is in fact a plug-in that is installed into a DBMS. The purpose of this plug-in is to read all the records (a record is an object) in the DBMS that match the provided criteria and transfer them to my local file system as XML files. My problem is related to memory usage. Everything is working fine, but each time I read a record it occupies memory, and after a certain point the plug-in stops working because it runs out of memory.
I am dealing with around 10k-20k records (objects). Are there any memory-related methods in C# to free the memory of each record as soon as it has been written to the XML file? I tried all the basic memory-handling methods like clear(), flush(), gc(), and finalize(), but with no luck.
Please consider the following:
- A record is an object; I cannot change this and use other, more efficient data structures.
- Each time I read a record, I write it to XML, and repeat this again and again.
C# is a garbage collected language. Therefore, to reclaim memory used by an object, you need to make sure all references to that object are removed so that it is eligible for collection. Specifically, this means you should remove the objects from any data structures that are holding references to them after you're done doing whatever you need to do with them.
If you get a little more specific about what type of data structures you're using we can probably give a more specific answer.
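The question is about C#, but the principle is the same in any garbage-collected runtime. Here is a rough Java sketch under assumed names (DbRecord, toXml and the export method are made up for illustration); the C# version would look almost identical with a List<T> and a StreamWriter:
import java.io.FileWriter;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class RecordExporter {
    // Hypothetical stand-in for the DBMS record object.
    static class DbRecord {
        String id;
        String payload;
        String toXml() {
            return "<record id=\"" + id + "\">" + payload + "</record>";
        }
    }

    static void export(List<DbRecord> records, String path) throws IOException {
        try (FileWriter out = new FileWriter(path)) {
            // Drop the reference to each record as soon as it has been written,
            // so the garbage collector is free to reclaim it.
            Iterator<DbRecord> it = records.iterator();
            while (it.hasNext()) {
                out.write(it.next().toXml());
                it.remove();
            }
        }
    }
}
If the records come from a query cursor rather than a pre-built collection, simply avoid adding them to any collection at all and write each one as it arrives.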

Why does OpenURI treat files under 10kb in size as StringIO?

I fetch images with open-uri from a remote website and persist them on my local server within my Ruby on Rails application. Most of the images were shown without a problem, but some images just didn't show up.
After a very long debugging session I finally found out (thanks to this blog post) that the reason for this is that the Buffer class in the open-uri library treats files smaller than 10kb as StringIO objects instead of Tempfiles.
I managed to get around this problem by following the answer from Micah Winkelspecht to this StackOverflow question, where I put the following code within a file in my initializers:
require 'open-uri'
# Don't allow downloaded files to be created as StringIO. Force a tempfile to be created.
OpenURI::Buffer.send :remove_const, 'StringMax' if OpenURI::Buffer.const_defined?('StringMax')
OpenURI::Buffer.const_set 'StringMax', 0
This works as expected so far, but I keep wondering why they put this behaviour into the library in the first place. Does anybody know a specific reason why files under 10kb in size get treated as StringIO?
Since the above code effectively resets this behaviour globally for my entire application, I just want to make sure that I am not breaking anything else.
When doing network programming, you allocate a buffer of a reasonably large size and send and read units of data that fit in the buffer. However, when dealing with files (or sometimes things called BLOBs) you cannot assume that the data will fit into your buffer, so you need special handling for these large streams of data.
(Sometimes the units of data which fit into the buffer are called packets. However, packets are really a layer 4 thing, like frames are at layer 2. Since this is happening at layer 7, they might better be called messages.)
For replies larger than 10K, the open-uri library sets up the extra overhead of writing to a stream object (a tempfile). When the reply is under the StringMax size, it just keeps the string in memory, since it knows it can fit in the buffer.
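For what it's worth, here is a rough Java sketch of that threshold idea; the 10 KB constant mirrors open-uri's StringMax, and the class and method names are made up:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class SpillingBuffer {
    private static final int STRING_MAX = 10 * 1024; // mirrors open-uri's StringMax

    // Keep small payloads in memory; spill larger ones to a temp file.
    static InputStream buffer(InputStream in) throws IOException {
        byte[] head = in.readNBytes(STRING_MAX);
        if (head.length < STRING_MAX) {
            return new ByteArrayInputStream(head); // fits: no temp-file overhead
        }
        File tmp = File.createTempFile("download", ".bin");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(head);
            in.transferTo(out); // stream the rest straight to disk
        }
        return Files.newInputStream(tmp.toPath());
    }
}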

How to edit an XML File

Howdy,
I was wondering whether I can alter a single element in an XML file saved on a Windows Phone 7 device without having to serialize the whole file all over again.
As mentioned in previous answers, you can't save a fragment of XML on its own without saving the whole file. If file size was a big issue, then you could split the data into separate files (perhaps alphabetically; depends on the data), so that you're only saving a smaller dataset if you make a change.
As Matt mentioned, XML serialization does not provide the best performance on WP7 devices. Kevin Marshall has a great blog post detailing different serialization approaches and the performance of each. The fastest method is binary serialization, though there's nothing stopping you serializing the XML using the binary serialization approach.
In general, no - and this has little to do with Windows Phone 7 (although I don't know whether IsolatedStorageFileStream on WP7 even supports seeking).
I don't know of any mainstream filesystems with high level abstractions (such as those used by Java and C#) which allow you to delete or insert data in the middle of a file.
I suppose theoretically if you were happy to pad with whitespace, or never change the length of the data you're using, you could just overwrite the relevant bytes - but I don't think it would be a good idea at all. Very brittle and hard to work with.
Just go for overwriting the whole file.
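The question is about WP7/C#, but as a general illustration of the "just overwrite the whole file" approach, here is a minimal Java sketch (the file path and element name are placeholders; the C# equivalent with XDocument follows the same load/modify/save pattern):
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RewriteXmlFile {
    public static void main(String[] args) throws Exception {
        File file = new File("/path/to/data.xml"); // placeholder path

        // Load the whole document ...
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(file);

        // ... change the one element you care about ...
        NodeList items = doc.getElementsByTagName("item"); // placeholder element name
        if (items.getLength() > 0) {
            items.item(0).setTextContent("new value");
        }

        // ... and write the whole file back out.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(file));
    }
}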

storing huge amount of records into classic asp cache object is SLOW

We have some nasty legacy ASP that is performing like a dog, and I narrowed it down to the fact that we are trying to store 15K+ records in the application cache object. But that's not the killer: before it stores the record set, it converts the ADO stream to XML and then stores it. This conversion of the huge record set to XML spikes the CPU and causes all kinds of havoc for users while it's happening. Unfortunately we also do this XML conversion to read the cache a lot, causing site-wide performance problems.
I don't have the resources to convert everything to .NET, so that's out. I obviously need to use caching, but in this case the caching is hurting instead of helping. Is there a more efficient way to store this data than doing this XML conversion to/from the cache every time we read/update it?
Maybe you should take a look here: A Classic ASP Page Caching Object (cached version)
You can also consider storing an MSXML2.DOMDocument directly in that application variable and transforming it using MSXML2.XSLTemplate (4.0 or later):
' xmlStyle is an MSXML2.XSLTemplate (with its stylesheet already set);
' xmlDoc is the cached DOMDocument
Set oXSLProcessor = xmlStyle.CreateProcessor
With oXSLProcessor
    .Input = xmlDoc
    .Transform
    Response.Write .Output
End With
You may want to try Caprock Dictionary:
http://www.miniat.net/caprock-dictionary-object-component.asp
