Rewinding an NSInputStream based on an NSData

I have an NSData that I would like to read as an NSInputStream. This way I can have a consistent API for processing both files and in-memory data. As part of the processing, I would like to make sure that the stream begins with some set of bytes (if it doesn't, I need to process it differently). I'd like to avoid reading a whole file into memory if it's of the wrong type.
So I'm looking for either a way to rewind the stream, or a way to "peek" at the upcoming bytes without moving the read pointer. If the NSInputStream was created with a URL, I can use setProperty:forKey: with NSStreamFileCurrentOffsetKey, but bizarrely this does not work on an NSInputStream created from an NSData (even though you would presume this would have been even easier to implement than the file version). I can't close and reopen the stream to reset the read pointer either (this is explicitly not allowed by NSStream).
I can rework this problem using an NSData-only interface and -initWithContentsOfMappedFile, but I'd rather stay with the NSStream approach if I can.

I think I don't understand something here. An NSInputStream can take data from three places: a socket, an NSData object, or a file. You haven't said that you want to use a socket, which leaves the other two as your data sources. Also, the docs for NSStream say that only file-based streams are seekable (NSStream class overview, third paragraph).
Given that, I'd think that an NSData object would be a better choice. An NSData object will handle both files and bytes (which I think is what you mean by data in memory).
But you consider that and say that you'd prefer to stick with streams. Is there some other consideration here?
(Edit) Sorry, I should have made this a real answer. My answer for the issue you've described is that using NSData really is the right thing to do.
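For what it's worth, in the in-memory case the "peek" can happen on the NSData itself, before any stream is created. A minimal sketch of that idea (the GIF magic number here is just a stand-in for whatever signature you're actually checking):

#import <Foundation/Foundation.h>

static const uint8_t kExpectedMagic[4] = {0x47, 0x49, 0x46, 0x38}; // "GIF8", as an example

NSInputStream *streamForData(NSData *data) {
    uint8_t header[sizeof(kExpectedMagic)];
    if ([data length] < sizeof(header)) return nil;
    [data getBytes:header length:sizeof(header)];
    if (memcmp(header, kExpectedMagic, sizeof(header)) != 0)
        return nil; // wrong type: take the other processing path
    return [NSInputStream inputStreamWithData:data]; // stream starts at byte 0, nothing consumed
}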
If you prefer a different answer, then please give more details.

You can indeed seek in an NSInputStream that is reading a file:
BOOL samplePositionAccepted = [iStream setProperty:[NSNumber numberWithUnsignedLong:samplePosition] forKey:NSStreamFileCurrentOffsetKey];
I am not sure if this works for NSData though. (Sorry I haven't got enough points to write a comment yet...)
(Oops sorry, didn't see you already tried this...)

Related

Avoid blocking the main thread in a NSDraggingSession using a NSPasteboardItemDataProvider

In a Mac OS X app (Cocoa), I'm copying some images from my app to other apps using an NSDraggingSession. The NSDraggingItem makes use of an object that implements the NSPasteboardItemDataProvider protocol, to provide the data when the user drops it.
As I'm dealing with images, the types involved are: NSPasteboardTypePNG, kPasteboardTypeFileURLPromise, kUTTypeFileURL, com.adobe.photoshop-image and public.svg-image. These images are in a remote location, so before I can provide them to the pasteboard, I have to download them from the Internet.
I implement the method -pasteboard:item:provideDataForType: doing something like this:
If the type requested is kPasteboardTypeFileURLPromise, I get the paste location and build and set in the pasteboard the URL string with the location where the file is supposed to be written in the future.
If the type requested is kUTTypeFileURL, I download the file, specify a temporal location and write the downloaded file to that location. Then, I set in the pasteboard the URL string of the location.
If the type requested is one of the others, I download the file and set the plain NSData in the pasteboard.
All these operations are performed on the main thread, producing some lags that I want to get rid of.
I've tried to perform these operations on a background thread, and come back to the main thread to set the final data in the pasteboard, but this doesn't work because the method returns before the background work completes.
Does anyone know a way to achieve it?
Promises of pasteboard types are usually meant to be an alternative format of data that you already have, where you want to avoid the expense in time and memory of converting before it's necessary. I don't think it's really appropriate to use it to defer downloading any of the data, at all. For one thing, the download could fail when it's ultimately requested. For another, it could take an arbitrarily long time, as you're struggling with now.
So, I think you should download the data in advance. Either keep it in memory or save it to a temporary file. Use promised types, if appropriate, to deliver it in different forms, but have it on hand in advance.
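A sketch of what that leaves for the data-provider callback, assuming a hypothetical cachedImageData property that was filled by the download before the drag began:

- (void)pasteboard:(NSPasteboard *)pasteboard
              item:(NSPasteboardItem *)item
provideDataForType:(NSString *)type {
    // Nothing to download here: the bytes are already on hand,
    // so the promise is redeemed synchronously with no lag.
    if ([type isEqualToString:NSPasteboardTypePNG]) {
        [item setData:self.cachedImageData forType:type];
    }
}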

How to do Lazy Map deserialization in Haskell

Similar to this question by @Gabriel Gonzalez: How to do fast data deserialization in Haskell
I have a big Map full of Integers and Text that I serialized using cereal. The file is about 10M.
Every time I run my program I deserialize the whole thing just so I can look up a handful of the items. Deserialization takes about 500ms, which isn't a big deal, but I always seem to like profiling on Friday.
It seems wasteful to always deserialize 100k to 1M items when I only ever need a few of them.
I tried decodeLazy and also changing the map to a Data.Map.Lazy (not really understanding how a Map can be lazy, but OK, it's there), and this has no effect on the time, except maybe it's a little slower.
I'm wondering if there's something that can be a bit smarter, only loading and decoding what's necessary. Of course, a database like SQLite can be very large, but it only loads what it needs to complete a query. I'd like to find something like that, but without having to create a database schema.
Update
You know what would be great? Some fusion of Mongo with SQLite. Like you could have a JSON document database using flat-file storage ... and of course someone has done it https://github.com/hamiltop/MongoLiteDB ... in Ruby :(
Thought mmap might help. Tried the mmap library and segfaulted GHCi for the first time ever. No idea how I can even report that bug.
Tried the bytestring-mmap library and that works, but no performance improvement. Just replacing this:
ser <- BL.readFile cacheFile
with this:
ser <- unsafeMMapFile cacheFile -- from bytestring-mmap's System.IO.Posix.MMap.Lazy
Update 2
keyvaluehash may be just the ticket. Performance seems really good. But the API is strange and documentation is missing so it will take some experimenting.
Update 3: I'm an idiot
Clearly what I want here is not lazier deserialization of a Map. I want a key-value database, and there are several options available, like dvm, tokyo-cabinet, and this levelDB thing I've never seen before.
Keyvaluehash looks to be a native-Haskell key-value database, which I like, but I still don't know about the quality. For example, you can't ask the database for a list of all keys or all values (the only real operations are readKey, writeKey, and deleteKey), so if you need that you have to store it somewhere else. Another drawback is that you have to tell it a size when you create the database. I used a size of 20M so I'd have plenty of room, but the actual database it created occupies 266M. No idea why, since there isn't a line of documentation.
One way I've done this in the past is to just make a directory where each file is named by a serialized key. One can use unsafeInterleaveIO to "thunk" the deserialized contents of each read file, so that values are only forced on read...

How can I get the *original* data behind an NSImage?

I have an instance of NSImage that's been handed to me by an API whose implementation I don't control.
I would like to obtain the original data (NSData) from which that NSImage was created, without the data being converted to another representation/format (or otherwise "molested"). If the image was created from a file, I want the exact, byte-for-byte contents of the file, including all metadata, etc. If the image was created from some arbitrary NSData instance I want an exact, byte-for-byte-equivalent copy of that NSData.
To be pedantic (since this is the troublesome case I've come across), if the NSImage was created from an animated GIF, I need to get back an NSData that actually contains the original animated GIF, unmolested.
EDIT: I realize that this may not be strictly possible for all NSImages all the time; How about for the subset of images that were definitely created from files and/or data?
I have yet to figure out a way to do this. Anyone have any ideas?
I agree with Ken, and having a subset of conditions (I know it's a GIF read from a file) doesn't change anything. By the time you have an NSImage, a lot of things have already happened to the data. Cocoa doesn't like to hold a bunch of data in memory that it doesn't directly need. If you had the original CGImage (not one generated out of the NSImage), you might get really lucky and find the data you wanted in CGDataProviderCopyData, but even if that happened to work, there are no promises about it.
But thinking through how you might, if you happened to get incredibly lucky, try to make it work:
1. Get the list of representations with -representations.
2. Find the one that matches the original (hopefully there's just the one).
3. Get a CGImage from it with -CGImageForProposedRect:context:hints:. You probably want a rect that matches the size of the image, and I'd probably pass a hint of no interpolation.
4. Get the data provider with CGImageGetDataProvider.
5. Copy its data with CGDataProviderCopyData. (But I doubt this will be the actual original data, including metadata, byte-for-byte.)
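Strung together, the long shot looks something like this sketch (image is the NSImage in question, and the copied bytes will almost certainly be decoded pixels, not the original file):

NSImageRep *rep = [[image representations] firstObject]; // steps 1-2, assuming a single rep
NSRect rect = NSMakeRect(0, 0, image.size.width, image.size.height);
CGImageRef cgImage = [rep CGImageForProposedRect:&rect
                                         context:nil
                                           hints:@{NSImageHintInterpolation:
                                                   @(NSImageInterpolationNone)}];
if (cgImage) {
    CGDataProviderRef provider = CGImageGetDataProvider(cgImage); // not owned; don't release
    CFDataRef pixels = CGDataProviderCopyData(provider);          // decoded, not original, bytes
    // ...inspect pixels here...
    if (pixels) CFRelease(pixels);
}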
There are callbacks that will get you a direct byte pointer into the internal data of a CGDataProvider (like CGDataProviderGetBytePointerCallback), but I don't know of any way to request the list of callbacks from an existing CGDataProvider. That's typically something Quartz accesses, and we just pass the callbacks in during creation.
I strongly suspect this is impossible.
This is not possible.
For one thing, not all images are backed by data. Some may be procedural. For example, an image created using +imageWithSize:flipped:drawingHandler: takes a block which draws the image.
But, in any case, even CGImage converts the data on import, and that's about as low-level as the Mac frameworks get.

Does NSFileWrapper support lazy loading?

I am creating an NSDocument package that contains potentially hundreds of large files, so I don't want to read it all in when opening the document.
I've spent some time searching, but I can't find a definitive answer. Most people seem to think that NSFileWrapper loads all of the data into memory, but some indicate that it doesn't load data until you invoke -regularFileContents on a wrapper. (See Does NSFileWrapper load everything into memory? and Objective-C / Cocoa: Uploading Images, Working Memory, And Storage for examples.)
The documentation isn't entirely clear, but options like NSFileWrapperReadingImmediate and NSFileWrapperReadingWithoutMapping seem to suggest that it doesn't always read everything in.
I gather that NSFileWrapper supports incremental saving, only writing out sub-wrappers that have been replaced. So it'd be nice if it supports incremental loading too.
Is there a definitive answer?
NSFileWrapper loads lazily by default, unless you specify the NSFileWrapperReadingImmediate option. It will avoid reading a file into memory until something actually requests it.
As a debugging aid only, you can see whether a file has been loaded yet, by examining:
[wrapper valueForKey:@"_contents"];
It gets filled in as NSData once the file is read from disk.
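In other words, something like this sketch (packageURL and the child file name are placeholders) should only touch the disk on the last line:

NSError *error = nil;
NSFileWrapper *docWrapper =
    [[NSFileWrapper alloc] initWithURL:packageURL
                               options:0 // note: no NSFileWrapperReadingImmediate
                                 error:&error];
NSFileWrapper *child = [[docWrapper fileWrappers] objectForKey:@"big-asset.bin"];
NSData *bytes = [child regularFileContents]; // the file is read here, on demand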

iTunes XML Parsing in cocoa

I am developing an application in Cocoa. I need to parse an iTunes XML file of large size (about 25 MB). I am using the following code snippet now:
NSDictionary *itunesDatabase = [NSDictionary dictionaryWithContentsOfFile:itunesPath];
But this is a little bit slow.
Is there any faster method to load the entire file into a dictionary?
The reason you're having such slow performance is that NSDictionary reads everything into memory all at once. For a large iTunes library, this can take a long time and -- feel free to confirm this with Activity Monitor -- a metric assload of memory. (This is the precise technical term for that amount of memory.)
The alternative in these situations is to use a callback-based XML parser (generally known as a "SAX" parser). These parse XML documents one element at a time and call your callback methods. In Cocoa, the NSXMLParser class provides this functionality. You set your class as its delegate, call the parse method, and the parser starts calling the delegate methods as it reads tags, attributes, text, etc. in the XML file.
Now, this is obviously harder than just loading everything into an NSDictionary and walking the resulting tree of objects. You'll need to keep track of state information yourself. And you'll have to "build up" your objects progressively, so organizing your classes can be difficult.
However, you can ignore the XML you aren't interested in, and that saves a lot of memory. And, depending on what data you're getting out of iTunes, you may also be able to end the parsing as soon as you've gotten the data you need. Even if this does end up taking quite a while, at least you'll be able to show your user a progress bar or some other indication that your program is working, which is much better than just hanging for 10-20 seconds while NSDictionary loads a giant XML file.
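As a skeleton of the NSXMLParser approach (the class name is made up, and the early-abort condition is up to you; the iTunes plist XML mostly alternates <key> and value elements):

@interface LibraryParser : NSObject <NSXMLParserDelegate>
@property (nonatomic, retain) NSMutableString *currentText;
@end

@implementation LibraryParser
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    self.currentText = [NSMutableString string]; // reset the text accumulator
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.currentText appendString:string]; // text may arrive in several chunks
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    // Inspect elementName/self.currentText here, build up your own objects,
    // and call [parser abortParsing] once you have everything you need.
}
@end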
If you're able to use third-party frameworks, run, do not walk, to EyeTunes. (BSD license.) It's an abstraction layer around Apple Events for communicating with iTunes, and as such it doesn't parse the XML database directly (I think; it's been a while since I've used it), but you'll have get/set access to anything in the XML.
Try to use libxml:
http://www.cimgf.com/2008/08/18/cocoa-tutorial-libxml-and-xmlreader/
To minimize the peak memory footprint, create and drain an NSAutoreleasePool in your loop.
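For example, in pre-ARC code (matching the NSAutoreleasePool suggestion above; under ARC you'd use an @autoreleasepool block instead, and itemCount is a placeholder):

NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
for (NSUInteger i = 0; i < itemCount; i++) {
    // ...per-item parsing work that creates autoreleased temporaries...
    if (i % 1000 == 999) { // drain periodically, not on every iteration
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];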
