Printing a sample from a huge file - bash

I have a rather large text corpus, of which I would like to check a few lines to see if the format is correct (and to just generally get some idea of its contents). Is there a simple one-liner that can be used to print just the first few lines of a huge text file?
Personally I'm using PowerShell, but answers are appreciated for bash and several other shells.

In PowerShell:
Get-Content c:\filename.txt -TotalCount 3  # just the first 3 lines

The first N lines
head -n N filename
The first N bytes
dd if=filename bs=1 count=N
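For example, to peek at the first five lines and the first 100 bytes (the file name corpus.txt is just a placeholder):
head -n 5 corpus.txt                          # print the first 5 lines
dd if=corpus.txt bs=1 count=100 2>/dev/null   # print the first 100 bytes, discarding dd's status chatter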

$ head yourfile.txt
I'm certain there's an equivalent in PowerShell. I mean, there must be, right?
Edit: Yep. Windows equivalent of the 'tail' command

You can use less. It's very efficient with large files. And, if you need to see more* you can continue paging through the file.
*"less is more"

Related

Is it possible to write a bash script that changes the current directory to a random one?

Of course there would be little to no point to this, but the idea struck me today and I haven't really seen anything on it. I suppose it could be a good exercise on efficiency, especially if you consider every directory under (/).
First idea that comes to mind:
Redirect a recursive ls command to a file, use wc to count the number of lines, then generate a random integer and use it to pick a line from the file you just created.
ls -R / > file_list.txt
count=$(wc -l < file_list.txt)
etc. You can get it from here.
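A minimal sketch along those lines, using find instead of ls -R so that each line of the list is a full directory path (the file and variable names are just placeholders):
find / -type d 2>/dev/null > file_list.txt    # one full directory path per line; unreadable dirs are skipped
count=$(wc -l < file_list.txt)                # number of candidate directories
pick=$(( RANDOM % count + 1 ))                # random line number (note: $RANDOM tops out at 32767)
cd "$(sed -n "${pick}p" file_list.txt)"       # change to the directory named on that line
Remember that a cd inside a script only changes the script's own working directory; to affect your interactive shell you would need to source the script or wrap this in a shell function.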

Any way to see the number of lines in a file without opening the file?

I have a rather large text file (~45MB) and I want to know how many lines total it has. Since the file is so large, it takes too long to open with a text editor and check manually that way. I am wondering if there is a shell command/shell script (I'd prefer a tcsh answer since that's what I use) or any other way to "quickly" (meaning, more quickly than opening the file and checking out the end) determine how many lines the text file has?
I am in a *nix environment.
wc -l filename
It won't be fast, since it has to read the entire file to count the lines. But there's no other way, since Unix doesn't keep track of that anywhere.
Use wc (word count, which has a "lines" mode):
lines=$(wc -l < file.txt)   # redirecting the file in keeps the filename out of the output
echo "$lines"

Scanning text file line by line and reading line with unix shell script

I know there are a lot of different threads already like this, but nothing I can find seems to explain well enough exactly what I'm trying to do.
Basically I want to have a shell script that just goes through a text file, line by line, and searches for the words "Error" or "Exception". Whenever it comes across those words it would record the line number so I can later shoot the text file off in an email with the problem lines.
I've seen a lot of stuff that explains how to loop through a text file line by line, but I don't understand how I can run a regular expression on that line, because I'm not sure exactly how to use regular expressions with a shell script and also what variable each line is being stored in...
If anybody can clarify these things for me I would really appreciate it.
There are numerous tools that automatically loop through files. I would suggest a simple solution like:
grep -inE 'error|exception' logfile > /tmp/logSearch.$$
if [[ -s /tmp/logSearch.$$ ]]; then
    # mailx needs a recipient address; you@example.com is just a placeholder
    mailx -s "errors in Log" you@example.com < /tmp/logSearch.$$
fi
/bin/rm /tmp/logSearch.$$
Use man grep to understand the options I'm supplying: -i ignores case, -n prefixes each match with its line number, and -E enables extended regular expressions.
From `man bash`:
-s file
    True if file exists and has a size greater than zero.
I hope this helps.
You need to look into using the grep command. With it you can search for a specific string, output the line number and the line itself, and do much much more.
Here is a link to a site with practical examples of using the command: http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
Point #15 in the article may be of special interest to you.
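For the part of the question about looping line by line: in bash, each line read lands in whatever variable you name in read, and [[ ... =~ ... ]] runs an extended regular expression against it. A minimal sketch (the file name and regex are placeholders):
lineno=0
re='[Ee]rror|[Ee]xception'
while IFS= read -r line; do          # each line of the file lands in $line
    lineno=$((lineno + 1))
    if [[ $line =~ $re ]]; then      # test the current line against the regex
        echo "$lineno: $line"        # record the line number and the offending line
    fi
done < logfile
In practice the grep -in one-liner above does the same job far more efficiently.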

Do I need to generate a second file to sort a file?

I want to sort a bunch of files. I can do
sort file.txt > foo.txt
mv foo.txt file.txt
but do I need this second file?
(I tried sort file.txt > file.txt of course, but then I just ended up with an empty file.)
Try:
sort -o file.txt file.txt
See http://ss64.com/bash/sort.html
`-o OUTPUT-FILE'
Write output to OUTPUT-FILE instead of standard output. If
OUTPUT-FILE is one of the input files, `sort' copies it to a
temporary file before sorting and writing the output to
OUTPUT-FILE.
The philosophy of classic Unix tools like sort is that you can build pipelines with them. Every little tool reads from STDIN and writes to STDOUT, so the next tool down the pipe can read the output of the first as its input and act on it.
So I'd say that this is a feature and not a bug.
Please also read about Pipes, Redirection, and Filters in the very nice book by ESR.
Because you're writing back to the same file, you'll always hit the problem that the redirection opens (and truncates) the output file before sort has even started reading the original. So yes, you need to use a separate file.
Now, having said that, there are ways to buffer the whole file into the pipe stream first, but generally you wouldn't want to do that, although it is possible if you write something to do it. You'd be inserting special tools at the beginning and the end of the pipeline to do the buffering. Bash, however, will open the output file too soon if you use its > redirect.
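One concrete way to do that buffering, assuming the moreutils package is installed, is sponge, which soaks up all of its input before opening and writing the output file:
sort file.txt | sponge file.txt   # sponge reads everything from the pipe first, then overwrites file.txt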
Yes, you do need a second file! The command
sort file.txt > file.txt
would have bash set up the redirection of stdout before it starts executing sort. This is a certain way to clobber your input file.
If you want to sort many files, try:
cat *.txt | sort > result.txt
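Note that sort also accepts multiple input files directly, so the cat is not strictly needed:
sort *.txt > result.txt   # sort reads and merges all the .txt files itself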
If you are sorting fixed-length records in a single file, then a sort algorithm can swap records in place within the file. There are a few algorithms available; your choice depends on how random the file's contents are. Generally, quicksort tends to swap the fewest records and usually finishes first compared to other sorting algorithms.

Loop through a directory with Grep (newbie)

I'm trying to loop through the current directory that the script resides in, which has a bunch of files that end with _list.txt. I would like to grep each file name, assign it to a variable, execute some additional commands, and then move on to the next file until there are no more _list.txt files to be processed.
I assume I want something like:
while file_name=`grep "*_list.txt" *`
do
Some more code
done
But this doesn't work as expected. Any suggestions of how to accomplish this newbie task?
Thanks in advance.
If I understand your problem correctly, you don't need grep. You can just do:
for file in *_list.txt
do
# use "$file" here, e.g. echo "$file"
done
grep is one of the most useful Unix commands and well worth learning thoroughly; see some useful examples here. As far as your current requirement goes, I think the following code will be useful:
for file in *.*
do
echo "Happy Programming"
done
In place of *.* you can use other glob patterns (note these are shell globs, not regular expressions). For more such useful examples, see First Time Linux, or read all the grep options in your terminal using man grep.
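If the additional commands you mention involve searching inside each file, a rough sketch combining the loop with grep might look like this (the pattern and output file name are just placeholders):
for file in *_list.txt
do
    echo "Processing $file"
    grep -n 'somepattern' "$file" >> matches.txt   # collect matching lines, with line numbers, from each file
done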

Resources