How do I read a text file of about 2 GB? [duplicate]

This question already has answers here:
Text editor to open big (giant, huge, large) text files [closed]
(2 answers)
Closed 7 years ago.
I have a .txt file larger than 2 GB. The problem is that I cannot open it with Notepad, Notepad++, or any other editor program.
Any solutions?

Try Glogg, the fast, smart log explorer.
I have opened a log file of around 2 GB with it, and searching is also very fast.

WordPad will open any text file no matter the size. However, it has limited capabilities compared to a full text editor.

Instead of loading / reading the complete file, you could use a tool to split the text file into smaller chunks. If you're using Linux, you can just use the split command (see this Stack Overflow thread). For Windows, there are several tools available, such as HJSplit (see this Super User thread).
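If you're curious what such a splitter does under the hood, here is a minimal sketch in C++. The 100 MB chunk size and the .partN naming are arbitrary choices for illustration, not what any of the tools above necessarily use:
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        std::fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    const size_t chunkSize = 100 * 1024 * 1024;  // 100 MB per chunk
    std::vector<char> buffer(1024 * 1024);       // 1 MB copy buffer

    std::FILE *in = std::fopen(argv[1], "rb");
    if (!in)
        return 1;

    std::FILE *out = nullptr;
    int part = 0;
    size_t written = chunkSize;                  // forces opening the first chunk
    size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), in)) > 0)
    {
        if (written >= chunkSize)                // current chunk is full
        {
            if (out)
                std::fclose(out);
            std::string name = std::string(argv[1]) + ".part" + std::to_string(part++);
            out = std::fopen(name.c_str(), "wb");
            if (!out)
            {
                std::fclose(in);
                return 1;
            }
            written = 0;
        }
        std::fwrite(buffer.data(), 1, n, out);
        written += n;
    }
    if (out)
        std::fclose(out);
    std::fclose(in);
    return 0;
}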

I use UltraEdit to edit large files. The largest file I have opened with UltraEdit was about 2.5 GB. UltraEdit also has a good hex editor compared to Notepad++.

EmEditor works quite well for me. It's shareware, IIRC, but it doesn't stop working after the license expires.

I always use 010 Editor to open huge files. It can handle 2 GB easily; I have manipulated 50 GB files with 010 Editor. :-)
It's commercial now, but it has a trial version.

If you only need to read the file, I can suggest Large Text File Viewer.
https://www.portablefreeware.com/?id=693
Also see this related question:
Text editor to open big (giant, huge, large) text files
If you would like to build your own tool instead, try the following. I presume you know the FileStream reader in C#.
const int kilobyte = 1024;
const int megabyte = 1024 * kilobyte;

public void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    using (FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read))
    {
        // A modest buffer is enough; reading the file in fixed-size chunks
        // avoids ever holding the whole file in memory.
        byte[] buffer = new byte[megabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);
        int bytesRead = fileStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            ProcessChunk(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, buffer.Length);
        }
    }
}

private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
}
For reference, see this CodeProject question:
http://www.codeproject.com/Questions/543821/ReadplusBytesplusfromplusLargeplusBinaryplusfilepl

Try Vim, Emacs (which has a low maximum buffer size if compiled in 32-bit mode), or hex tools.

There are quite a number of tools available for viewing large files.
http://download.cnet.com/Large-Text-File-Viewer/3000-2379_4-90541.html
This one, for instance.
However, I was also successful viewing larger files in Visual Studio. Though it took some time to load, it worked.

For reading and editing, Geany for Windows is another good option. I've run into limit issues with Notepad++, but not yet with Geany.

Related

Writing last N bytes to file opened with FILE_FLAG_NO_BUFFERING

When writing lots of sequential data to disk, I use an internal 4 MB buffer, and I open the file with FILE_FLAG_NO_BUFFERING so that my internal buffer is actually used.
But that also creates a requirement to write in full sector blocks (512 bytes on my machine).
How do I write the last N < 512 bytes to disk?
Is there some flag to WriteFile to allow this?
Do I pad them with extra NUL characters and then truncate the file size down to the correct value?
(With SetFileValidData or similar?)
For those wondering about the reason for trying this approach: our application logs a lot. To handle this, a dedicated log thread exists, which formats and writes logs to disk. Also, if we log with the fullest detail, we might log more per second than the disk system can handle. (This is usually noticed for customers with SAN systems that are not well tweaked.)
So the goal is to write as much log data as possible, but also to notice when we start to overload the system, and then hold back a bit, for example by reducing the detail of the logs.
Hence the idea to fill a big memory block and hand that to the OS, hoping to reduce the overhead.
As the comments suggest, doing file writing this way is probably not the best solution for real-world situations. But if FILE_FLAG_NO_BUFFERING is used, SetFileInformationByHandle is the way to mark the file as shorter than whole blocks.
// Pad the last partial block with NULs, write the full block, then trim
// the file back to its true length. Note that FILE_FLAG_NO_BUFFERING also
// requires the buffer address to be sector-aligned; std::string storage is
// not guaranteed to be, so a real implementation would use an aligned buffer.
int data_len = (int)str.size();
int len_last_block = data_len % BLOCKSIZE;
int padding_to_fill_block = (len_last_block == 0) ? 0 : (BLOCKSIZE - len_last_block);
str.append(padding_to_fill_block, '\0');

ULONG bytes_written = 0;
::WriteFile(hFile, str.data(), data_len + padding_to_fill_block, &bytes_written, NULL);
m_filesize += bytes_written;

FILE_END_OF_FILE_INFO end_of_file_pos;
end_of_file_pos.EndOfFile.QuadPart = m_filesize - padding_to_fill_block;
if (!::SetFileInformationByHandle(hFile, FileEndOfFileInfo, &end_of_file_pos, sizeof(end_of_file_pos)))
{
    DWORD err = ::GetLastError();
    // handle / log the error
}

Windows (ReFS, NTFS) file preallocation hint

Assume I have multiple processes writing large files (20 GB+). Each process writes its own file; assume a process writes x MB at a time, then does some processing and writes x MB again, and so on.
What happens is that this write pattern causes the files to be heavily fragmented, since the blocks of the different files are allocated consecutively in write order, and therefore end up interleaved on disk.
Of course it is easy to work around this issue by using SetEndOfFile to "preallocate" the file when it is opened and then set the correct size before it is closed. But an application accessing these files remotely, which is able to parse these in-progress files, then obviously sees zeroes at the end of the file and takes much longer to parse it.
I do not have control over this reading application, so I can't optimize it to take the zeros at the end into account.
Another dirty fix would be to run defragmentation more often, run Sysinternals' contig utility, or even implement a custom "defragmenter" that would process my files and consolidate their blocks.
A more drastic solution would be to implement a minifilter driver that reports a "fake" file size.
But obviously both solutions listed above are far from optimal. So I would like to know: is there a way to give the filesystem a file size hint so that it "reserves" consecutive space on the drive, but still reports the right file size to applications?
Otherwise, writing larger chunks at a time obviously helps with fragmentation, but it does not solve the issue.
EDIT:
Since the usefulness of SetEndOfFile in my case seems to be disputed, I made a small test:
LARGE_INTEGER size;
LARGE_INTEGER a;
char buf='A';
DWORD written=0;
DWORD tstart;
std::cout << "creating file\n";
tstart = GetTickCount();
HANDLE f = CreateFileA("e:\\test.dat", GENERIC_ALL, FILE_SHARE_READ, NULL, CREATE_ALWAYS, 0, NULL);
size.QuadPart = 100000000LL;
SetFilePointerEx(f, size, &a, FILE_BEGIN);
SetEndOfFile(f);
printf("file extended, elapsed: %d\n",GetTickCount()-tstart);
getchar();
printf("writing 'A' at the end\n");
tstart = GetTickCount();
SetFilePointer(f, -1, NULL, FILE_END);
WriteFile(f, &buf,1,&written,NULL);
printf("written: %d bytes, elapsed: %d\n",written,GetTickCount()-tstart);
When the application is executed and it waits for a keypress after SetEndOfFile, I examined the on-disk NTFS structures:
The image shows that NTFS has indeed allocated clusters for my file. However, the unnamed DATA attribute has a StreamDataSize of 0.
Sysinternals DiskView also confirms that clusters were allocated.
When pressing Enter to allow the test to continue (and after waiting for quite some time, since the file was created on a slow USB stick), the StreamDataSize field was updated.
Since I wrote 1 byte at the end, NTFS now really had to zero everything, so SetEndOfFile does indeed help with the issue that I am "fretting" about.
I would appreciate it very much that answers/comments also provide an official reference to back up the claims being made.
Oh and the test application outputs this in my case:
creating file
file extended, elapsed: 0
writing 'A' at the end
written: 1 bytes, elapsed: 21735
Also, for the sake of completeness, here is an example of how the DATA attribute looks when setting FileAllocationInfo (note that I created a new file for this picture):
Windows file systems maintain two public sizes for file data, which are reported in the FileStandardInformation:
AllocationSize - a file's allocation size in bytes, which is typically a multiple of the sector or cluster size.
EndOfFile - a file's absolute end of file position as a byte offset from the start of the file, which must be less than or equal to the allocation size.
Setting an end of file that exceeds the current allocation size implicitly extends the allocation. Setting an allocation size that's less than the current end of file implicitly truncates the end of file.
Starting with Windows Vista, we can manually extend the allocation size without modifying the end of file via SetFileInformationByHandle: FileAllocationInfo. You can use Sysinternals DiskView to verify that this allocates clusters for the file. When the file is closed, the allocation gets truncated to the current end of file.
If you don't mind using the NT API directly, you can also call NtSetInformationFile: FileAllocationInformation. Or even set the allocation size at creation via NtCreateFile.
FYI, there's also an internal ValidDataLength size, which must be less than or equal to the end of file. As a file grows, the clusters on disk are lazily initialized. Reading beyond the valid region returns zeros. Writing beyond the valid region extends it by initializing all clusters up to the write offset with zeros. This is typically where we might observe a performance cost when extending a file with random writes. We can set the FileValidDataLengthInformation to get around this (e.g. SetFileValidData), but it exposes uninitialized disk data and thus requires SeManageVolumePrivilege. An application that utilizes this feature should take care to open the file exclusively and ensure the file is secure in case the application or system crashes.
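To make the FileAllocationInfo route concrete, here is a minimal sketch, assuming a plain Win32 program and a hypothetical test.dat path. It illustrates the API call, not production error handling:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"test.dat", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    // Reserve ~100 MB of clusters without moving the end of file.
    FILE_ALLOCATION_INFO info;
    info.AllocationSize.QuadPart = 100000000LL;
    if (!SetFileInformationByHandle(h, FileAllocationInfo, &info, sizeof(info)))
        printf("SetFileInformationByHandle failed: %lu\n", GetLastError());

    // EndOfFile is still 0, so readers see an empty file. The extra
    // allocation is truncated back to the end of file when the handle closes.
    CloseHandle(h);
    return 0;
}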

How to create a big file to fill disk to the full on windows phone?

For example, I have a Lumia 920; its total space is 32 GB, and the available free space is 24 GB.
Now I want to create some files to fill the disk completely. How can I create 24 GB of files as quickly as possible? I tried, but it was very slow. :-(
As far as I know, one app (see the link below) can do that, but I really can't understand how it does it. Writing to isolated storage is very slow. Could you give me some advice?
http://www.windowsphone.com/en-us/store/app/%E7%BC%93%E5%AD%98%E6%B8%85%E7%90%86/b790919d-8ec8-40d8-b97a-10c466cedca8
You just need to create the file like this and write a single byte:
FileStream fs = new FileStream(@"c:\tmp\yourfilename", FileMode.CreateNew);
fs.Seek(2048L * 1024 * 1024, SeekOrigin.Begin);  // move 2 GB past the start
fs.WriteByte(0);                                  // writing here extends the file
fs.Close();
This example will create a 2 GB file with a single byte of content!

Modifying a large file in Cocoa

I'm trying to modify tiny chunks (32 bytes) of large (hundreds of MB) .wav audio files.
Currently I load the files using
[NSData dataWithContentsOfURL:]
modify the bytes, and save the file using
[data writeToURL:].
Is there a convenient way to modify a binary file without loading it into RAM?
edit:
The following stdio functions work for me:
NSUInteger myOffset = 8;                // offset of the chunk to modify
const char *myBytes = myData.bytes;     // the replacement bytes (an NSData)
NSUInteger myLength = myData.length;
FILE *file = fopen([[url path] cStringUsingEncoding:NSASCIIStringEncoding], "rb+");
assert(file);
fseek(file, myOffset, SEEK_SET);        // seek from the beginning of the file
fwrite(myBytes, 1, myLength, file);     // overwrite in place
fclose(file);
Yes, you would use a lower-level approach such as fopen to avoid repeatedly loading/reloading the file via NSData (as you found and mentioned in your update). This is the level I work at for audio file I/O.
If you want a Foundation type, you may want to try NSFileHandle.

MiniFilter Driver - modify a file bytes on IRP_MJ_CLOSE and IRP_MJ_CREATE

I'd like to change a file when it is closed and reverse the change when it is opened.
It's kind of like an encryption driver, except I don't want to encrypt the file.
I've created a new "Filter Driver: Filesystem Mini-Filter" project with WDK8 in Visual Studio 2012 and registered PreCreate, PostCreate, PreClose and PostClose as callback functions.
For example, on IRP_MJ_CLOSE of a file whose bytes are {72,101,108,108,111} ("Hello"), I want the file to look like this on the hard disk after the PostClose function:
{10,11,12,72,101,108,108,111}.
I suspect it is not as easy as just:
FLT_PREOP_CALLBACK_STATUS
PreClose (
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Flt_CompletionContext_Outptr_ PVOID *CompletionContext
    )
{
    //...
    //some if statement...
    {
        Data->Iopb->Parameters.Write.WriteBuffer = newBfr;
        Data->Iopb->Parameters.Write.Length = newLen;
    }
    //...
    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
}
I'd like some guidance on the subject.
Also, what is the best way to debug this? I haven't found a way to print to the Windows 7 debug output.
Thanks!
gfgqtmakia.
EDIT: I've read http://code.msdn.microsoft.com/windowshardware/swapBuffer-File-System-6b7e6e2d but I don't think it will help me, because it is for read/write, which I don't want to deal with.
EDIT2: Or maybe I should do my changes in PreCreate and PostClose, when the file is on the hard drive and not in the middle of an IRP, so that I won't need to deal with buffers "on the fly" but on the disk?
You will have to write something like swap buffers. Modifying file data in PostCreate/PreClose would not be a good idea.
A few reasons:
Firstly, in PostCreate/PreClose you shouldn't be accessing Data->Iopb->Parameters.Write.WriteBuffer. That is valid only in IRP_MJ_WRITE. You can use FltWriteFile to write data to the file, as sketched below.
The Windows kernel may not write file data to disk immediately in/after IRP_MJ_CLOSE. Think about the page cache.
There are many complexities, like paging I/O and direct I/O, that need to be taken care of properly.
Another major thing I notice is that you will also change the file size (as stated in your question, the actual data length is 5 bytes while you will update the data to 8 bytes). This is very difficult to manage. It is never recommended to change the file size in a minifilter/file system driver.
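For illustration only, a minimal sketch of calling FltWriteFile from a minifilter might look like the following. The WritePrefix helper and the hard-coded prefix bytes are hypothetical, and it only overwrites the first bytes in place; true insertion would mean rewriting the whole file, with all the size problems described above:
#include <fltKernel.h>

// Hypothetical helper: write three bytes at offset 0 of the target file.
VOID WritePrefix(_In_ PCFLT_RELATED_OBJECTS FltObjects)
{
    UCHAR prefix[3] = { 10, 11, 12 };
    LARGE_INTEGER offset;
    ULONG bytesWritten = 0;
    NTSTATUS status;

    offset.QuadPart = 0;
    status = FltWriteFile(FltObjects->Instance,
                          FltObjects->FileObject,
                          &offset,
                          sizeof(prefix),
                          prefix,
                          0,              // no special I/O flags
                          &bytesWritten,
                          NULL,           // no completion callback,
                          NULL);          // so the call is synchronous

    if (!NT_SUCCESS(status))
    {
        // DbgPrint output is visible in WinDbg or Sysinternals DebugView,
        // which also answers the "how do I print debug output" question.
        DbgPrint("FltWriteFile failed: 0x%08X\n", status);
    }
}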
