Modifying a large file in Cocoa

I'm trying to modify tiny chunks (32 bytes) of large (hundreds of MB) .wav audio files.
Currently I load the files using
[NSData dataWithContentsOfURL:]
modify the bytes, and save the file using
[data writeToURL:].
Is there a convenient way to modify a binary file without loading it into RAM?
Edit:
The following stdio functions work for me:
NSUInteger myOffset = 8;
const char *myBytes = myData.bytes;
NSUInteger myLength = myData.length;
FILE *file = fopen([[url path] cStringUsingEncoding:NSASCIIStringEncoding], "rb+");
assert(file);
fseek(file, myOffset, SEEK_CUR);
fwrite(myBytes, 1, myLength, file);
fclose(file);

Yes, you would use a lower-level approach such as fopen to avoid repeatedly loading/reloading the file via NSData (as you have found and mentioned in your update). This is the level I work at for audio file I/O.
If you want a Foundation type, you may want to try NSFileHandle.
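For illustration, a minimal, untested sketch of the NSFileHandle route (url, myOffset, and myData stand in for the URL, offset, and 32-byte chunk from the question's edit):
NSFileHandle *handle = [NSFileHandle fileHandleForUpdatingAtPath:[url path]];
if (handle == nil) {
    // could not open the file for reading and writing
    return;
}
[handle seekToFileOffset:myOffset];   // e.g. 8, as in the stdio example
[handle writeData:myData];            // overwrites [myData length] bytes in place
[handle closeFile];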

Related

Writing last N bytes to file opened with FILE_FLAG_NO_BUFFERING

When writing lots of sequential data to disk, I found it best to keep an internal 4 MB buffer and, when opening the file for writing, to specify FILE_FLAG_NO_BUFFERING, so that my internal buffer is actually used.
But that also creates a requirement to write in full sector blocks (512 bytes on my machine).
How do I write the last N<512 bytes to disk?
Is there some flag to WriteFile to allow this?
Do I pad them with extra NUL characters and then truncate the file size down to the correct value?
(With SetFileValidData or similar?)
For those wondering the reason for trying this approach: our application logs a lot. To handle this, a dedicated log thread exists, which formats and writes logs to disk. Also, if we log at the fullest detail, we might log more per second than the disk system can handle. (Usually noticed for customers with SAN systems that are not well tweaked.)
So the goal is to write as much of the log as possible, but also to notice when we start to overload the system and then hold back a bit, for example by reducing the detail of the logs.
Hence the idea to fill a big memory block and hand it to the OS in one go, hoping to reduce the overhead.
As the comments suggest, doing file writing this way is probably not the best solution for real-world situations. But if writing with FILE_FLAG_NO_BUFFERING is used,
SetFileInformationByHandle is the way to mark the file as shorter than a whole number of blocks.
int data_len = (int)str.size();
int len_last_block = data_len % BLOCKSIZE;
int padding_to_fill_block = (len_last_block == 0 ? 0 : BLOCKSIZE - len_last_block);
str.append(padding_to_fill_block, '\0');   // pad the tail up to a whole sector
DWORD bytes_written = 0;
::WriteFile(hFile, str.data(), data_len + padding_to_fill_block, &bytes_written, NULL);
m_filesize += bytes_written;
// Move the end-of-file marker back so the padding is not part of the file.
FILE_END_OF_FILE_INFO end_of_file_pos;
end_of_file_pos.EndOfFile.QuadPart = m_filesize - padding_to_fill_block;
if (!::SetFileInformationByHandle(hFile, FileEndOfFileInfo, &end_of_file_pos, sizeof(end_of_file_pos)))
{
    DWORD err = ::GetLastError();
}

Is it possible to open a file while allowing other processes to delete it?

It seems the default os.Open call allows other processes to write to the opened file, but not to delete it. Is it possible to enable deletion as well? In .NET this can be done using the FileShare.Delete flag; is there an analog in Go?
os.Open will get you a file descriptor with the O_RDONLY flag set; that means read-only. You can specify your own flags by using os.OpenFile; the available flags are listed below, with a short usage sketch after the list.
O_RDONLY int = syscall.O_RDONLY // open the file read-only.
O_WRONLY int = syscall.O_WRONLY // open the file write-only.
O_RDWR int = syscall.O_RDWR // open the file read-write.
O_APPEND int = syscall.O_APPEND // append data to the file when writing.
O_CREATE int = syscall.O_CREAT // create a new file if none exists.
O_EXCL int = syscall.O_EXCL // used with O_CREATE, file must not exist
O_SYNC int = syscall.O_SYNC // open for synchronous I/O.
O_TRUNC int = syscall.O_TRUNC // if possible, truncate file when opened.
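For illustration, a minimal sketch of os.OpenFile with or-ed flags (the file name, flag combination, and permission bits here are just examples):
package main

import (
	"log"
	"os"
)

func main() {
	// Open data.log read-write, creating it if it does not exist,
	// and appending on each write.
	f, err := os.OpenFile("data.log", os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := f.WriteString("hello\n"); err != nil {
		log.Fatal(err)
	}
}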
None of these modes, however, will allow you to have multiple writers on a single file. You can share the file descriptor by exec-ing or fork-ing but naively writing to the file from both processes will result in the OS deciding how to synchronise those writes -- which is almost never what you want.
Deleting a file while a process has a FD on it doesn't matter on unix-like systems. I'll go ahead and assume Windows won't like that, though.
Edit, given the windows tag and @Not_a_Golfer's excellent observations:
You should be able to pass syscall.FILE_SHARE_DELETE as a flag to os.OpenFile on Windows, if that is what solves your problem.
If you need to combine several flags you can do so by or-ing them together:
syscall.FILE_SHARE_DELETE | syscall.SOME_OTHER_FLAG | syscall.AND_A_THIRD_FLAG
(note, however, that it's up to you to build a coherent flag)
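If passing the sharing flag through os.OpenFile does not work out on your Go version, another (untested) sketch is to call the Windows CreateFile API through the syscall package yourself and wrap the resulting handle in an *os.File; the access and creation flags below are just one plausible combination:
package main

import (
	"os"
	"syscall"
)

// openSharedDelete opens name so that other processes may still delete or
// rename the file while we hold the handle (FILE_SHARE_DELETE). Windows-only.
func openSharedDelete(name string) (*os.File, error) {
	p, err := syscall.UTF16PtrFromString(name)
	if err != nil {
		return nil, err
	}
	h, err := syscall.CreateFile(
		p,
		syscall.GENERIC_READ,
		syscall.FILE_SHARE_READ|syscall.FILE_SHARE_WRITE|syscall.FILE_SHARE_DELETE,
		nil, // default security attributes
		syscall.OPEN_EXISTING,
		syscall.FILE_ATTRIBUTE_NORMAL,
		0)
	if err != nil {
		return nil, err
	}
	return os.NewFile(uintptr(h), name), nil
}

func main() {
	f, err := openSharedDelete(`C:\temp\example.log`)
	if err != nil {
		panic(err)
	}
	defer f.Close()
}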

How do I read a text file of about 2 GB? [duplicate]

This question already has answers here:
Text editor to open big (giant, huge, large) text files [closed]
(2 answers)
Closed 7 years ago.
I have a .txt file whose size is more than 2 GB. The problem is I cannot open it with Notepad, Notepad++ or any other editor programs.
Any solutions?
Try Glogg,
"the fast, smart log explorer".
I have opened a log file of around 2 GB with it, and the search is also very fast.
WordPad will open any text file no matter the size. However, it has limited capabilities as compared to a text editor.
Instead of loading / reading the complete file, you could use a tool to split the text file into smaller chunks. If you're using Linux, you could just use the split command (see this stackoverflow thread). For Windows, there are several tools available, like HJSplit (see this superuser thread).
I use UltraEdit to edit large files. The maximum size I have opened with UltraEdit was about 2.5 GB. UltraEdit also has a good hex editor in comparison to Notepad++.
EmEditor works quite well for me. It's shareware IIRC, but it doesn't stop working after the license expires.
I always use 010 Editor to open huge files. It can handle 2 GB easily; I have manipulated 50 GB files with 010 Editor :-)
It's commercial now, but it has a trial version.
If you only need to read the file, I can suggest Large Text File Viewer.
https://www.portablefreeware.com/?id=693
Also refer to this:
Text editor to open big (giant, huge, large) text files
Otherwise, if you would like to make your own tool, try this. I presume that you know the FileStream reader in C#:
const int kilobyte = 1024;
const int megabyte = 1024 * kilobyte;
const int gigabyte = 1024 * megabyte;
public void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    using (FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[gigabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);
        int bytesRead = fileStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            ProcessChunk(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, buffer.Length);
        }
    }
}

private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
}
Kindly refer to this:
http://www.codeproject.com/Questions/543821/ReadplusBytesplusfromplusLargeplusBinaryplusfilepl
Try Vim, Emacs (which has a low maximum buffer size limit if compiled in 32-bit mode), or hex tools.
There are quite a number of tools available for viewing large files.
http://download.cnet.com/Large-Text-File-Viewer/3000-2379_4-90541.html
This one, for instance.
However, I was successful viewing even larger files in Visual Studio. Though it took some time to load, it worked.
For reading and editing, Geany for Windows is another good option. I've run into limit issues with Notepad++, but not yet with Geany.

VisualC++ in memory Uncompression

I have a "stringstream" variable that stores some compressed binary data in gzip format.
I want to decompress this stringstream variable in memory.
First of all, for in-memory decompression of binary data in gzip format, what third-party library do you suggest using?
I noticed the zlib library for compression/decompression of the gzip and deflate formats.
However, the two functions handling decompression that zlib provides do not seem to meet my needs exactly:
int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
int gzread (gzFile file, voidp buf, unsigned len);
The first one (uncompress) requires me to know the length of the decompressed data in advance to properly allocate enough memory for storage. In my case, it is unknown.
On the other hand, the second one (gzread) takes a file as input, not a memory buffer.
What do you suggest for an "efficient" in-memory decompression using zlib or some other library?
Thanks.
There appear to be some decompression filters for gzip in the Boost library; this might be worth looking into:
http://www.boost.org/doc/libs/1_48_0/libs/iostreams/doc/classes/gzip.html
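For example, a rough, untested sketch of running a gzip'ed stringstream through the Boost.Iostreams gzip_decompressor entirely in memory (the function name here is made up):
#include <sstream>
#include <string>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>

// Decompress a gzip stream held in memory; the output size does not need to
// be known in advance, the filter chain grows the destination as it copies.
std::string decompress_gzip(std::stringstream &compressed)
{
    namespace io = boost::iostreams;
    std::stringstream decompressed;
    io::filtering_streambuf<io::input> in;
    in.push(io::gzip_decompressor());   // strips the gzip header/trailer
    in.push(compressed);                // source: the in-memory stringstream
    io::copy(in, decompressed);
    return decompressed.str();
}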

Leaking memory with Cocoa garbage collection

I've been beating my head against a wall trying to figure out how I had a memory leak in a garbage collected Cocoa app. (The memory usage in Activity Monitor would just grow and grow, and running the app using the GC Monitor instruments would also show an ever-growing graph.)
I eventually narrowed it down to a single pattern in my code. Data was being loaded into an NSData and then parsed by a C library (the data's bytes and length were passed into it). The C library has callbacks which would fire and return sub-string starting pointers and lengths (to avoid internal copying). However, for my purposes, I needed to turn them into NSStrings and keep them around awhile. I did this by using NSString's initWithBytes:length:encoding: method. I assumed that would copy the bytes and NSString would manage it appropriately, but something is going wrong because this leaks like crazy.
This code will "leak" or somehow trick the garbage collector:
- (void)meh
{
    NSData *data = [NSData dataWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"holmes" ofType:@"txt"]];
    const int substrLength = 80;
    for (const char *substr = [data bytes]; substr - (const char *)[data bytes] < [data length]; substr += substrLength) {
        NSString *cocoaString = [[NSString alloc] initWithBytes:substr length:substrLength encoding:NSUTF8StringEncoding];
        [cocoaString length];
    }
}
I can put this in a timer and just watch memory usage go up and up with Activity Monitor as well as with the GC Monitor instrument. (holmes.txt is 594 KB.)
This isn't the best code in the world, but it shows the problem. (I'm running 10.6, the project is targeted for 10.5 - if that matters). I read over the garbage collection docs and noticed a number of possible pitfalls, but I don't think I'm doing anything obviously against the rules here. Doesn't hurt to ask, though. Thanks!
Project zip
Here's a pic of the object graph just growing and growing:
This is an unfortunate edge case. Please file a bug (http://bugreport.apple.com/) and attach your excellent minimal example.
The problem is twofold:
The main event loop isn't running and, thus, the collector isn't triggered via MEL activity. This leaves the collector doing only its normal, threshold-based background collections.
The NSData object stores the bytes read from the file in a malloc'd buffer allocated from the malloc zone. Thus, the GC-accounted allocation -- the NSData object itself -- is really tiny, but it points to something really large (the malloc allocation). The end result is that the collector's threshold isn't hit and it doesn't collect. Obviously, improving this behavior is desirable, but it is a hard problem.
This is a very easy bug to reproduce in a micro-benchmark or in isolation. In practice, there is typically enough going on that this problem won't happen. However, there may be certain cases where it does become problematic.
Change your code to this and the collector will collect the data objects. Note that you shouldn't use collectExhaustively often -- it does eat CPU.
- (void)meh
{
    NSData *data = [NSData dataWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"holmes" ofType:@"txt"]];
    const int substrLength = 80;
    for (const char *substr = [data bytes]; substr - (const char *)[data bytes] < [data length]; substr += substrLength) {
        NSString *cocoaString = [[NSString alloc] initWithBytes:substr length:substrLength encoding:NSUTF8StringEncoding];
        [cocoaString length];
    }
    [data self];
    [[NSGarbageCollector defaultCollector] collectExhaustively];
}
The [data self] keeps the data object alive after the last reference to it.
