How to read a large text file on Windows? [closed]

I have a large server log file (~750 MB) which I can't open with either Notepad or Notepad++ (they both say the file is too large).
Can anyone suggest a program (for Windows) that will only read a small part of the file into memory at a time?
Or do I need to write my own app to parse this file?

Try Large Text File Viewer.
By the way, it is free :)
But I think you should ask this on serverfault.com instead.

If all you need is a tool for reading, then this thing will open the file instantly: http://www.readfileonline.com/

Use EmEditor; it's pretty good. I used it to open a file of more than 500 MB.

The integrated text viewer of Total Commander can open huge files (>10 GB) for viewing without any problems. It also provides different views, e.g. a hex view.

Definitely EditPad Lite!
It's extremely fast, not just when opening files: functions like "Replace All", trimming leading/trailing whitespace, or converting content to lowercase are also very fast.
And it is also very similar to Notepad++ ;)

I have been using BareTail for quite some time for viewing large logs (several GB); it works very well and is very fast. There is a free version and a commercial Pro version.
They say that it has:
Real-time file viewing
Optimised real-time viewing engine
View files of any size (> 2 GB)
Scroll to any point in the whole file instantly
View files over a network
Configurable line wrapping
Configurable TAB expansion
Configurable font, including spacing and offset to maximise use of screen space
Another alternative is Far Manager. Viewing a file of several GB is no problem (little memory footprint), but attempting to open the text file in editing mode might take several GB of RAM, so be aware of that. I am not aware of the file size limit that can be viewed/edited in Far.

UltraEdit will do the trick.

I just used less on top of Cygwin to read a 3GB file, though I ended up using grep to find what I needed in it.
(less is more, but better.)
See this answer for more details on less: https://stackoverflow.com/a/1343576/1005039

If you can code, write a console app. Here is the C# equivalent of what you're after.
You can do what you want with each line (split it, execute it, etc.):
// Requires: using System; using System.Data.SqlClient; using System.IO;
try
{
    using (var connection = new SqlConnection("XXXX"))
    {
        connection.Open();
        using (var command = new SqlCommand())
        {
            command.Connection = connection;
            // Create an instance of StreamReader to read from a file.
            // Reading line by line keeps memory usage constant no matter
            // how large the file is. The using statements also close the
            // StreamReader, command, and connection when done.
            using (StreamReader sr = new StreamReader("C:\\test.txt"))
            {
                string line;
                // Read and process lines from the file until the end of
                // the file is reached.
                while ((line = sr.ReadLine()) != null)
                {
                    Console.WriteLine(line);
                    command.CommandText = line;
                    command.ExecuteNonQuery();
                    Console.Write(" - DONE");
                }
            }
        }
    }
}
catch (Exception e)
{
    // Let the user know what went wrong.
    Console.WriteLine("The file could not be read:");
    Console.WriteLine(e.Message);
}

I hate to promote my own stuff (well, not really), but PowerPad can open very large files.
Otherwise, I'd recommend a hex editor.

While Large Text File Viewer works great for just looking at a large file (and is free!), if the file is either a delimited or fixed-width file, then you should check out File Query. Not only can it open a file of any size (I have personally opened a 280GB file, but it can go larger), but it lets you query the file as though it was in a database as well, finding out any sort of information you could want from it.
It is not free though, so it is more for people that work with large files a lot, but if you have a one-off problem, you can just use the 30-day trial for free.

GnuUtils for Windows make this easy as well. That package contains standard UNIX utils like cat, ls, and more. I am using cat filename | more to page through a huge file that Notepad++ can't open at all.

You should try TextPad, it can read a file of that size.
It's free to evaluate (you can evaluate indefinitely)


Is there a way to find the PDF version of a file in Xamarin? [duplicate]

I have a Windows .NET application that manages many PDF files. Some of the files are corrupt.
Two issues (I'll try to explain in my imperfect English... sorry):
1.)
How can I detect whether a PDF file is correct?
I want to read the header of the PDF and detect whether it is correct.
var okPDF = PDFCorrect(@"C:\temp\pdfile1.pdf");
2.)
How do I know whether a byte[] (byte array) of a file contains a PDF or not?
For example, for ZIP files, you could examine the first four bytes and see if they match the local header signature, i.e. in hex
50 4b 03 04
if (buffer[0] == 0x50 && buffer[1] == 0x4b && buffer[2] == 0x03 &&
buffer[3] == 0x04)
If you are loading it into a long, this is 0x04034b50 (by David Pierson).
I want the same for PDF files.
byte[] dataPDF = ...
var okPDF = PDFCorrect(dataPDF);
Any sample source code in .NET?
I check the PDF header like this:
public bool IsPDFHeader(string fileName)
{
    // Read the first five bytes; a well-formed PDF starts with
    // "%PDF-" (25 50 44 46 2D in hex), e.g. %PDF-1.0.
    byte[] buffer;
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(fs))
    {
        buffer = br.ReadBytes(5);
    }
    if (buffer.Length < 5)
        return false;
    var header = new ASCIIEncoding().GetString(buffer);
    return header.StartsWith("%PDF-");
}
a. Unfortunately, there is no easy way to determine whether a PDF file is corrupt. Usually, the problem files have a correct header, so the real causes of corruption lie elsewhere. A PDF file is effectively a dump of PDF objects. The file contains a reference table giving the exact byte offset of each object from the start of the file. So, most probably, corrupted files have broken offsets, or maybe some object is missing.
The best way to detect a corrupted file is to use a specialized PDF library.
There are lots of both free and commercial PDF libraries for .NET. You may simply try to load the PDF file with one of these libraries. iTextSharp would be a good choice.
b. According to the PDF reference, the header of a PDF file usually looks like %PDF−1.X (where X is a digit, currently from 0 to 7), and 99% of PDF files have such a header. However, there are some other kinds of headers which Acrobat Viewer accepts, and even the absence of a header isn't a real problem for PDF viewers. So, you shouldn't treat a file as corrupted just because it does not contain a header.
E.g., the header may appear somewhere within the first 1024 bytes of the file, or be in the form %!PS−Adobe−N.n PDF−M.m
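For illustration, here is a minimal C# sketch of such a relaxed header check, scanning the first 1024 bytes for the %PDF- marker (the method name and buffer size are my own choices, not part of any library):

    // Sketch: accept a header anywhere in the first 1024 bytes,
    // per the tolerant viewer behaviour described above.
    // Requires: using System.IO; using System.Text;
    public static bool HasPdfHeader(string fileName)
    {
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[1024];
            int read = fs.Read(buffer, 0, buffer.Length);
            string start = Encoding.ASCII.GetString(buffer, 0, read);
            return start.Contains("%PDF-");
        }
    }

A hit here only means the file claims to be a PDF; it says nothing about whether the rest of the file is intact.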
Just for your information I am a developer of the Docotic PDF library.
Well-behaving PDFs start with the first 9 bytes as %PDF-1.x plus a newline (where x is in 0..8). 1.x is supposed to give you the version of the PDF file format. The second line contains some binary bytes meant to help applications (editors) identify the PDF as a non-ASCII text file type.
However, you cannot trust this tag at all. There are lots of applications out there which use features from PDF-1.7 but claim to be PDF-1.4, thus misleading some viewers into spitting out invalid error messages. (Most likely these PDFs are the result of a mismanaged conversion of the file from a higher to a lower PDF version.)
There is no such section as a "header" in PDF (maybe the initial 9 bytes of %PDF-1.x are what you meant by "header"?). There may be a structure embedded inside the PDF for holding metadata, giving you info about Author, CreationDate, ModDate, Title and some other stuff.
My way to reliably check for PDF corruption
There is no other way to check for validity and un-corrupted-ness of a PDF than to render it.
A "cheap" and rather reliable way to check for such validity for me personally is to use Ghostscript.
However: you want this to happen fast and automatically, and you want to use the method programmatically or via a scripted approach to check many PDFs.
Here is the trick:
Don't let Ghostscript render the file to a display or to a real (image) file.
Use Ghostscript's nullpage device instead.
Here's an example commandline:
gswin32c.exe ^
-o nul ^
-sDEVICE=nullpage ^
-r36x36 ^
"c:/path to /input.pdf"
This example is for Windows; on Unix use gs instead of gswin32c.exe and -o /dev/null.
Using -o nul -sDEVICE=nullpage will not output any rendering result. But all the stderr and stdout output of Ghostscript's processing the input.pdf will still appear in your console. -r36x36 sets resolution to 36 dpi to speed up the check.
%errorlevel% (or $? on Linux) will be 0 for an uncorrupted file. It will be non-0 for corrupted files. And any warning or error messages appearing on stdout may help you to identify problems with the input.pdf.
There is no other way to check for a PDF file's corruption than to somehow render it...
Update: Meanwhile, not only %PDF-1.0, %PDF-1.1, %PDF-1.2, %PDF-1.3, %PDF-1.4, %PDF-1.5, %PDF-1.6, %PDF-1.7 and %PDF-1.8 are valid version indicators, but also %PDF-2.0.
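If you want to drive this check from .NET, something like the following rough sketch should work (assuming Ghostscript is installed and gswin32c.exe is on the PATH; the method name is mine):

    // Rough sketch: run Ghostscript's nullpage device and inspect the
    // exit code, exactly as the command line above does. Ghostscript's
    // own messages still appear on the console, as described above.
    // Use "gswin64c.exe" for a 64-bit build, or "gs" with "-o /dev/null"
    // on Unix. Requires: using System.Diagnostics;
    public static bool PdfRendersCleanly(string pdfPath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "gswin32c.exe",
            Arguments = "-o nul -sDEVICE=nullpage -r36x36 \"" + pdfPath + "\"",
            UseShellExecute = false
        };
        using (var gs = Process.Start(psi))
        {
            gs.WaitForExit();
            return gs.ExitCode == 0; // 0 means rendered without errors
        }
    }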
The first line of a PDF file is a header identifying the version of the PDF specification to which the file conforms: %PDF-1.0, %PDF-1.1, %PDF-1.2, %PDF-1.3, %PDF-1.4, etc.
You could check this by reading some bytes from the start of the file and seeing whether they match the header of a PDF file. See the PDF reference from Adobe for more details.
I don't have a .NET example for you (I haven't touched .NET in some years now), but even if I did, I'm not sure you can check that the complete content of the file is valid. The header might be OK but the rest of the file might be messed up (as you said yourself, some files are corrupt).
You could use iTextSharp to open and attempt to parse the file (e.g. try and extract text from it) but that's probably overkill. You should also be aware that it's GNU Affero GPL unless you purchase a commercial licence.
Checking the header is tricky. Some of the code above simply won't work, since not all PDFs start with %PDF. Some PDFs that open correctly in a viewer start with a BOM marker; others start like this:
------------e56a47d13b73819f84d36ee6a94183
Content-Disposition: form-data; name="par"
...etc
So checking only for "%PDF" at the very start will not work.
What I do is:
1. Validate the extension.
2. Open the PDF file, read the header (first line), and check that it contains the string "%PDF-".
3. Check that the file specifies a page count by searching for occurrences of "/Page" (a PDF file should always have at least 1 page); see the sketch after this answer.
As suggested earlier you can also use a library to read the file:
Reading PDF File Using iTextSharp
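A C# sketch of those three checks (deliberately crude; the method name and details are mine, not a library API):

    // Sketch of the three checks above. Crude by design: a serious
    // validator should parse the file with a real PDF library instead.
    // Requires: using System; using System.IO; using System.Text;
    public static bool LooksLikePdf(string path)
    {
        // 1. Validate the extension.
        if (!path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            return false;

        // Reading everything as ASCII is fine for locating ASCII markers,
        // but avoid this shortcut for very large files.
        string content = File.ReadAllText(path, Encoding.ASCII);

        // 2. The header should appear near the start (a tolerant check,
        //    since BOMs or MIME prefixes may precede it).
        int headerPos = content.IndexOf("%PDF-", StringComparison.Ordinal);
        if (headerPos < 0 || headerPos > 1024)
            return false;

        // 3. A well-formed PDF has at least one page object.
        return content.Contains("/Page");
    }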

PowerShell and Adobe OCR [closed]

We have many PDF files. They are all unlocked, and they have text, pictures, etc. Every time, we have to open a file in Adobe and OCR it manually. I was thinking maybe there is a better way to do this with PowerShell. If not, we'll have to do it by hand for over 1000 files, and more are coming. But thank you for your answer.
Peggy
After looking into it a bit more, I discovered a command-line tool that you can use in tandem with PowerShell. It's called tesseract. For Windows and Linux, download the prebuilt binaries. For macOS, you need to use MacPorts or Homebrew.
You'll want to do something like this:
# Using Get-ChildItem's -Include parameter to filter file types
# requires the target path to end in an asterisk. Using just an
# asterisk as the path makes it target the current directory.
foreach ($pdf in (Get-ChildItem * -Include *.pdf))
{
    # An array isn't needed; it's just good for arranging arguments
    tesseract @(
        # INPUT:
        $pdf
        # OUTPUT:
        "$($pdf.Directory)\{OCR} $($pdf.Name)"
        # LANGUAGE:
        '-l','eng'
    )
    # The directory is included in the output path so that you can
    # change Get-ChildItem's target without adjusting the argument
}
Or, without the fluff:
foreach ($pdf in (Get-ChildItem * -Include *.pdf))
{
    tesseract $pdf "$($pdf.Directory)\{OCR} $($pdf.Name)" -l eng
}
Granted, I haven't actually tested tesseract myself, but I did read other Q&A pages to derive the appropriate command. Let me know if there are any issues.
Your question is a bit unclear. There is a way to OCR images using PowerShell, such as using this function, and you can convert PDFs to images using this function (it does require ImageMagick, which is available here; there are portable options if you don't want to install anything). This would effectively allow you to search PDF files that haven't been OCR'd.
However, in terms of directly editing the PDF files with PowerShell to make them into OCR'd PDFs, while PowerShell functionality might help you automate the process, you would first need to find a program that can do that sort of thing from the command line. The PDFs would also have to all be unlocked so that editing them would even be possible (though there are ways to circumvent PDF locks to unlock them).
Unfortunately, I don't really know of any programs that can do that. Maybe it's possible with some advanced Ghostscript parameters, but I haven't looked into it. It is certainly not going to be easy!

How do I delete data/characters from a file? [duplicate]

This question already has answers here:
How do I insert and delete some characters in the middle of a file?
(4 answers)
I'm writing a program to edit a txt file.
But I found that the Windows API WriteFile can only add data/characters to a file; it cannot delete data from it.
The only solution I've come up with is to read the whole file into a buffer using ReadFile, shift the data in a loop, and then replace the old file with the new one. But I think this will probably make my program really slow.
Can anyone help, please?
Thanks.
If you're trying to delete from the end of the file it can be very fast with truncate() and ftruncate().
Where are you trying to delete the data from? If it's from the middle, you'll have to do some copying: if the file contains "ABCDEFG" and you want to delete "DEF", use fseek() to get to "G", copy it into a buffer, fseek() back to where "D" was, and write() the buffer there. Then truncate the file to the correct size with ftruncate().
If this really becomes a performance issue for you, you'll want to either design your file in a way that accounts for this or use a database of some kind. You may also want to use memory-mapped files, but usually this is better done by a database that someone else wrote instead of reinventing the wheel.
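The same shift-and-truncate idea, sketched in C# (FileStream seeks play the role of fseek(), and SetLength() truncates; the method name and chunk size are mine):

    // Sketch: remove 'count' bytes starting at 'offset' by shifting the
    // tail of the file left in chunks, then truncating the duplicated tail.
    // Requires: using System.IO;
    public static void RemoveBytes(string path, long offset, long count)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[64 * 1024];
            long readPos = offset + count;   // first byte after the removed range
            long writePos = offset;          // where that byte should land
            int read;
            while (true)
            {
                fs.Position = readPos;
                read = fs.Read(buffer, 0, buffer.Length);
                if (read <= 0) break;
                fs.Position = writePos;
                fs.Write(buffer, 0, read);
                readPos += read;
                writePos += read;
            }
            fs.SetLength(fs.Length - count); // drop the now-duplicated tail
        }
    }

As the next answer explains, deleting near the start of a large file still means rewriting almost everything after it.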
Files are linear streams of data. If you want to remove content from a file, you must rewrite all the content of the file that follows the part you have removed. So, unless the content to be removed is at the end of the file, you will need to perform some writing. In the worst-case scenario, in order to remove the first byte of a file, you need to rewrite the entire file apart from the byte that you removed.
FWIW, Raymond Chen wrote a nice article on this subject: How do I delete bytes from the beginning of a file?

Great tools to find and replace in files? [closed]

I'm switching from a Windows PHP-specific editor to VIM, on the philosophy of "use one editor for everything and learn it really well."
However, one feature I liked in my PHP editor was its "find and replace" capability. I could approach things two ways:
Just find. Search all files in a project for a string, see all the occurrences listed, and click to dive into that file at that line.
Blindly replace all occurrences of "foo" with "bar".
And of course I could use the GUI to say what types of files, whether to look in subfolders, whether it was case sensitive, etc.
I'm trying to approximate this ability now, and trying to piece it together with bash is pretty tedious. Doable, but tedious.
Does anybody know any great tools for things like this, for Linux and/or Windows? (I would really prefer a GUI if possible.) Or failing that, a bash script that does the job well? (If it would list file names and line numbers and show code snippets, that would be great.)
Try sed. For example:
sed -i -e 's/foo/bar/g' myfile.txt
Vim has multi-file search built in using the command :vimgrep (or :grep to use an external grep program - this is the only option prior to Vim 7).
:vimgrep will search through files for a regex and load a list of matches into a buffer - you can then either navigate the list of results visually in the buffer or with the :cnext and :cprev commands. It also supports searching through directory trees with the ** wildcard. e.g.
:vimgrep "^Foo.*Bar" **/*.txt
to search for lines starting with Foo and containing Bar in any .txt file under the current directory.
:vimgrep uses the 'quickfix' buffer to store its results. There is also :lvimgrep which uses a local buffer that is specific to the window you are using.
Vim does not support multi-file replace out of the box, but there are plugins that will do that too on vim.org.
I don't get why you can't do this with VIM.
Just Find
/Foo
Highlights all instances of Foo in the file and you can do what you want.
Blindly Replace
:% s/Foo/Bar/g
Obviously this is just the tip of the iceberg. You have lots of flexibility of the scope of your search and full regex support for your term. It might not work exactly like your former editor, but I think your original 'use one editor' idea is a valid one.
Notepad++ allows me to search and replace in an entire folder (and subfolders), with regex support.
You can use perl in command prompt to replace text in files.
perl -p -i".backup" -e "s/foo/bar/g" test.txt
Since you are looking for a GUI tool, I generally use the following two tools. Both of them have great functionality, including wildcard matching, regex, file-type filters, etc. Both of them display useful information about the hits in files, like filename and line numbers.
Visual Studio: fast yet powerful. I use it if the number of files is huge (say, tens of thousands...)
PSPad: lightweight. A good thing about find/replace in PSPad is that it organizes hits in different files in a tree hierarchy, which is very clear.
There are a number of tools that you can use to make things easier. Firstly, to search all the files in the project from vim you can use :grep like so:
:grep 'Function1' myproject/
This essentially runs a grep and lets you quickly jump from/to locations where it has been found.
Ctags is a tool that finds declarations in your code and then allows vim to jump to these declarations. To do this, run ctags and then place your cursor over a function call and then use Ctrl-]. Here is a link with some more ctags information:
http://www.davedevelopment.co.uk/2006/03/13/vim-ctags-and-php-5/
I don't know if it is an option for you, but if you load all your files into vim with
vim *.php
then you can
:set hidden
:argdo %s/foo/bar/g => will execute the substitute command in all opened buffers
:wall => will write all opened buffers
Or, instead of loading all your files into vim, try :help vimgrep and a combination of :help argdo and :help argadd
For Windows, I think that grepWin is hard to beat -- a GUI to a powerful and flexible grep tool for Windows. It searches and replaces, knows about regular expressions, that sort of stuff.
Look into sed... a powerful command-line tool that should accomplish most of what you're looking for... it supports regex, so your find/replace is quite easy.
(man sed)
Notepad++ has support for syntax highlighting in many languages and supports find and replace across all open files with regex and basic \n \r \t support.
The command grep -rn "search terms" * will search for the specified terms in all files (including those in sub-directories) and will return matching lines including file name and line number. Armed with this info, it is easy to jump to a particular file/line in VIM.
As was mentioned before, sed is extremely powerful for doing find-and-replace.
You can run both of these tools from inside VIM as well.
Some developers I currently work with swear by TextPad. It has a UI and also supports regexes -- everything you're looking for and more.
A very useful search tool is ack. (Ubuntu refers to it as "ack-grep" in the repositories and man pages.)
The short version of what it does is a combination of find and grep that's more powerful and intelligent than that pair.

Free alternative(s) to PowerGREP [closed]

First of all, great praise goes out to PowerGREP. It's a great program.
But it's not free. Some of its options I'm looking for:
Being able to use .NET regexps (or similar) to find things in a filtered list of files through subdirectories.
Replacing that stuff with other regexps.
Being able to jump to that part of the file in some sort of editor.
Non commandline.
Being able to copy the results / filename and occurrences of the text.
Low overhead would also be nice, so not too many dependencies, etc.
And I need it on Windows.
I would suggest trying the new dnGrep. It's a .NET application that provides grep-like functionality and has almost all the features you specified.
Here are its features:
Shell integration (ability to search from Windows Explorer)
Plain text/regex/XPath search (including case-insensitive search)
Phonetic search (using the Bitap and Needleman-Wunsch algorithms)
File move/copy/delete actions
Search inside archives (via plug-ins)
Search Microsoft Word documents (via plug-ins)
Search PDF documents (via plug-ins)
Undo functionality
Optional integration with a text editor (like Notepad++)
Bookmarks (ability to save regex searches for the future)
Pattern test form
Search result highlighting
Search result preview
Does not require installation (can be run from a USB drive)
Feature-wise nothing even comes close to PowerGREP, so the question is, how many compromises are you willing to make? I agree that PowerGREP's price tag is a bit steep (not that I have ever regretted a single penny I spent on it), so perhaps something cheaper might do?
UltraEdit is an excellent text editor with very good regex support. It supports Perl-style regular expressions, and you can do find/replace operations in multiple (optionally pre-filtered) files with it. I'd say it can do everything you want to do according to your question.
RegexBuddy, apart from being the best regex editor/debugger on the market, also has limited GREP functionality, allowing search/replace in (pre-filtered) subdirectories. It's also not free, but considerably less expensive than PowerGREP, and its regex engine has all the features you could ask for (the current version even introduced recursive regexes, and the extremely useful ability to translate regexes between flavors). Big pluses here are the ability to do a non-destructive preview of all operations, and to have backups automatically created of all files that are modified during a grep.
I use GrepWin extensively during development and on production servers - it doesn't support all the features you specify, but it gets the job done (your mileage may vary).
For a fast-loading, fast-executing program used only to find (no search and replace), I've found BareGrep to be pretty good. It does subdirectories.
You might have a look at this:
Open Source PowerGREP Alternatives
Currently there are six alternatives to PowerGREP.
Get Cygwin for a bunch of free alternatives!
grep, sed, awk, perl, python... the list goes on.
But, oops! You want to stick to a GUI.
I always wonder at how people wrap a GUI around things like grep and get cash for that!
WinGrep seems to be free, though, and yet packs quite a punch.
Windows Grep is designed for searching plain-ASCII text files, such as program source, HTML, RTF and batch files, but it can also search binary files such as word processor documents, databases, spreadsheets and executables.
I do not know PowerGREP, but grepWin lets you search regexes in directories.
You can get GNU grep or Gawk.
