How do I read the first line of a file using cat?

You don't need cat.
head -1 file
will work fine.

You don't; use head instead.
head -n 1 file.txt

There are many different ways:
sed -n 1p file
head -n 1 file
awk 'NR==1' file
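All three will keep reading after line 1; for large files you can tell sed and awk to stop early (a small tweak on the same commands, not part of the original answer):
sed -n '1p;q' file
awk 'NR==1{print; exit}' file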

You could use cat file.txt | head -1, but it would probably be better to use head directly, as in head -1 file.txt.

This may not be possible with cat. Is there a reason you have to use cat?
If you simply need to do it with a bash command, this should work for you:
head -n 1 file.txt

This may not be possible with cat alone, but if you don't want to use head, this works:
cat <file> | awk 'NR == 1'

I'm surprised that this question has been around as long as it has, and nobody has provided the pre-mapfile built-in approach yet.
IFS= read -r first_line <file
...puts the first line of the file into the variable first_line (expanded with "$first_line"), easy as that.
Moreover, because read is built into bash and this usage requires no subshell, it's significantly more efficient than approaches involving subprocesses such as head or awk.
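As a quick illustrative sketch (the file name is a placeholder), this also handles the empty-file case, since read returns non-zero when there is no line to read:
if IFS= read -r first_line < file.txt; then
    printf '%s\n' "$first_line"
else
    echo "file.txt is empty" >&2
fi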

You don't need any external command if you have bash v4+:
< file.txt mapfile -n1 && echo "${MAPFILE[0]}"
or if you really want cat:
cat file.txt | mapfile -n1 && echo "${MAPFILE[0]}"
:)
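A note on the cat version (my addition): in default bash every pipeline stage runs in a subshell, so MAPFILE is lost when the pipeline ends. In a script you can work around that with the lastpipe option (needs bash 4.2+ and job control disabled, which is the default for scripts):
shopt -s lastpipe
cat file.txt | mapfile -n 1
echo "${MAPFILE[0]}"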

Use the command below to get the first row from a CSV file or any other file format.
head -1 FileName.csv

There are plenty of good answers to this question already. Just gonna drop another one into the basket, in case you wish to do it with lolcat:
lolcat FileName.csv | head -n 1

Adding one more obnoxious alternative to the list:
perl -pe'$.<=1||last' file
# or
perl -pe'$.<=1||last' < file
# or
cat file | perl -pe'$.<=1||last'

Related

"grep"ing first 12 of last 24 character from a line

I am trying to extract the "first 12 of the last 24 characters" from a line, i.e.,
for a line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-0".
(I have not written the code, so don't curse me for its formatting.)
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but is there any more readable way of doing this, rather than counting dots?
EDIT
Thanks to both of you;
so I have sed, awk, grep, and bash.
I will run this in a loop over hundreds of files,
so can you also suggest which one is the most efficient with respect to time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a python solution may also be helpful:
python -c 'import sys; print("\n".join([a[-24:-13] for a in sys.stdin]))' < file
0.41207E-09
I'm not sure your example data and question match up so just change the values in the {n} quantifier accordingly.
The simplest is using pure bash:
echo "${str:(-24):12}"
OR awk can also do that:
awk '{print substr($0, length($0)-23, 12)}' <<< "$str"
OUTPUT:
0.41207E-09
EDIT: For using bash solution on a file:
while IFS= read -r l; do echo "${l:(-24):12}"; done < file
Another one, less efficient, but it has the advantage of making you discover new tools:
echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get the first 12 characters of the last 24 characters of a line:
awk '{s = substr($0, length($0)-23); print substr(s, 1, 12)}' myfile.txt
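On the efficiency question: the honest answer is to time the candidates on your own data, since relative speed depends on the tool versions and file sizes involved. A rough harness (the file* glob is a placeholder for your hundreds of files), using the commands from the answers above:
time for f in file*; do sed -r 's/.*(.{11}).{12}/\1/' "$f"; done > /dev/null
time for f in file*; do grep -Po '.{11}(?=.{12}$)' "$f"; done > /dev/null
time for f in file*; do awk '{print substr($0, length($0)-23, 12)}' "$f"; done > /dev/null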

Read lines between two lines specified by their line number

How would I go about reading all lines between two specific lines?
Let's say line 23 is where I want to start, and line 56 is the last line to read, but it is not the end of the file.
How would I go about reading lines 23 through 56? I will be outputting them to another file.
By row number like that is quite easy with awk:
awk 'NR >= 23 && NR <= 56'
And either way, sed makes it fun.
sed '23,56!d'
Or for a pattern,
sed '/start/,/end/!d'
I would go for sed, but a head/tail combination is possible as well:
head -n 56 file | tail -n $((56-23))
Well - I'm pretty sure there is an off-by-one-error inside. I'm going to find it. :)
Update:
Haha - know your errors, I found it:
head -n 56 file | tail -n $((56-23+1))
Sed can do that:
$ sed -n 23,56p yourfile
EDIT: as commenters pointed out, making sed stop processing after the last line of the interval makes it perform as fast as a head-tail combination. So the most optimal way of getting the lines would be:
$ sed -n '23,56p;57q' yourfile
But performance will greatly depend on the file you're processing, the interval, and lots of other factors. So in case you're developing some script to be run frequently on known data, testing all three methods mentioned in the answers (sed, awk, head-tail) would be a good idea.
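For instance (line numbers and file name as in the question; the harness itself is my sketch):
time sed -n '23,56p;57q' yourfile > /dev/null
time awk 'NR>56{exit}; NR>=23' yourfile > /dev/null
time head -n 56 yourfile | tail -n 34 > /dev/null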
Use sed. This should do it:
sed -n '23,56p' file > out.txt
This might work for you:
sed '1,22d;56q' file
or this:
sed '23,56!d;56q' file
or this:
awk 'NR>56{exit};NR==23,NR==56' file
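Combining a range with an early exit and the redirection the question asks for (file names are placeholders):
awk 'NR >= 23 { print } NR == 56 { exit }' input.txt > output.txt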

Output specific line huge text file

I have a 300 MB SQL dump that gives me an error on a specific line.
But that line is in the middle of the file. What is the best approach?
head -n middleLine dump.sql > output?
Or can I output only the line I need?
You could use sed -n -e 123456p your.dump to print line 123456
If the file is long, consider using
sed -n 'X{p;q}' file
Where X is the line number. It will stop reading the file after reaching that line.
If sed is too slow for your taste, you may also use:
head -n "$DESIRED_LINE" "$THE_FILE" | tail -n 1
You can use sed:
sed -n "x p" dump.sql
where x is the line number.
This might work for you:
sed 'X!d;q' file
where X is the line number.
This can also be done with Perl:
perl -wnl -e '$. == 4444444 and print and exit;' FILENAME.sql
4444444 being the line number you wish to print.
You can also try awk like:
awk 'NR==YOUR_LINE_NO{print; exit}' file_name
(the exit keeps awk from reading the rest of a huge file)
If you know a phrase on that line, I would use grep. If the phrase is "errortext", use:
$ grep "errortext" dump.sql
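A small extension of the grep idea (my addition): -n prefixes each match with its line number, which is handy when chasing an error in a dump:
grep -n "errortext" dump.sql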

Reading a specific line of a file

What is the best way (better performance) to read a specific line of a file? Currently, I'm using the following command line:
head -line_number file_name | tail -1
P.S.: preferably using shell tools.
You could use sed.
# print line number 10
$ sed -n '10p' file_name
$ sed '10!d' file_name
$ sed '10q;d' file_name
#print 10th line
awk NR==10 file_name
awk -v linenum=10 'NR == linenum {print; exit}' file
If you know the lines are the same length, then a program could directly index to that line without reading all the preceding ones: something like od might be able to do that, or you could code it up in half a dozen lines in almost any language. Look for a function called seek() or fseek().
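In the shell, dd can do that seek directly. A sketch assuming fixed-length records of REC bytes each, newline included (both values are hypothetical):
REC=80   # record length in bytes, including the newline
N=10     # 1-based line number to fetch
dd if=file bs="$REC" skip=$((N - 1)) count=1 2>/dev/null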
Otherwise, perhaps...
tail -n +N file | head -n 1
...as this asks tail to skip to the Nth line, and fewer lines are put needlessly through the pipe than with your head-to-tail solution.
ruby -ne '$.==10 and (print; exit)' file
I've tried it a couple of times, to avoid the file cache, and found that head + tail was quick but ruby was the fastest:
$ wc -l myfile.txt
920391 myfile.txt
$ time awk NR==334227 myfile.txt
my_searched_line
real 0m14.963s
user 0m1.235s
sys 0m0.126s
$ time head -334227 myfile.txt |tail -1
my_searched_line
real 0m5.524s
user 0m0.569s
sys 0m0.725s
$ time sed '334227!d' myfile
my_searched_line
real 0m12.565s
user 0m0.814s
sys 0m0.398s
$ time ruby -ne '$.==334227 and (print; exit)' myfile
my_searched_line
real 0m0.750s
user 0m0.568s
sys 0m0.179s

How can I remove the first line of a text file using bash/sed script?

I need to repeatedly remove the first line from a huge text file using a bash script.
Right now I am using sed -i -e "1d" $FILE - but it takes around a minute to do the deletion.
Is there a more efficient way to accomplish this?
Try tail:
tail -n +2 "$FILE"
-n x: Just print the last x lines. tail -n 5 would give you the last 5 lines of the input. The + sign kind of inverts the argument and makes tail print anything but the first x-1 lines. tail -n +1 would print the whole file, tail -n +2 everything but the first line, etc.
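A quick way to see the behaviour (my example):
seq 3 | tail -n +2
This prints 2 and 3, i.e. everything but the first line.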
GNU tail is much faster than sed. tail is also available on BSD and the -n +2 flag is consistent across both tools. Check the FreeBSD or OS X man pages for more.
The BSD version can be much slower than sed, though. I wonder how they managed that; tail should just read a file line by line while sed does pretty complex operations involving interpreting a script, applying regular expressions and the like.
Note: You may be tempted to use
# THIS WILL GIVE YOU AN EMPTY FILE!
tail -n +2 "$FILE" > "$FILE"
but this will give you an empty file. The reason is that the redirection (>) happens before tail is invoked by the shell:
Shell truncates file $FILE
Shell creates a new process for tail
Shell redirects stdout of the tail process to $FILE
tail reads from the now empty $FILE
If you want to remove the first line inside the file, you should use:
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"
The && will make sure that the file doesn't get overwritten when there is a problem.
You can use -i to update the file without using the '>' operator. The following command will delete the first line from the file and save the result back to the file (it uses a temp file behind the scenes):
sed -i '1d' filename
For those who are on SunOS which is non-GNU, the following code will help:
sed '1d' test.dat > tmp.dat
You can easily do this with:
sed 1d filename > filename_without_first_line
on the command line; or to remove the first line of a file permanently, use the in-place mode of sed with the -i flag:
sed -i 1d <filename>
No, that's about as efficient as you're going to get. You could write a C program which could do the job a little faster (less startup time and processing arguments) but it will probably tend towards the same speed as sed as files get large (and I assume they're large if it's taking a minute).
But your question suffers from the same problem as so many others in that it pre-supposes the solution. If you were to tell us in detail what you're trying to do rather than how, we may be able to suggest a better option.
For example, if this is a file A that some other program B processes, one solution would be to not strip off the first line, but modify program B to process it differently.
Let's say all your programs append to this file A and program B currently reads and processes the first line before deleting it.
You could re-engineer program B so that it didn't try to delete the first line but maintains a persistent (probably file-based) offset into the file A so that, next time it runs, it could seek to that offset, process the line there, and update the offset.
Then, at a quiet time (midnight?), it could do special processing of file A to delete all lines currently processed and set the offset back to 0.
It will certainly be faster for a program to open and seek a file rather than open and rewrite. This discussion assumes you have control over program B, of course. I don't know if that's the case but there may be other possible solutions if you provide further information.
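As a very rough sketch of that offset idea (all names are hypothetical, and it assumes a single-byte encoding so that ${#line} counts bytes):
offset=$(cat A.offset 2>/dev/null || echo 0)
line=$(tail -c +"$((offset + 1))" A | head -n 1)
# ... process "$line" here ...
echo "$((offset + ${#line} + 1))" > A.offset   # +1 for the newline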
The sponge util avoids the need for juggling a temp file:
tail -n +2 "$FILE" | sponge "$FILE"
If you want to modify the file in place, you could always use the original ed instead of its streaming successor sed:
ed "$FILE" <<<$'1d\nwq\n'
The ed command was the original UNIX text editor, before there were even full-screen terminals, much less graphical workstations. The ex editor, best known as what you're using when typing at the colon prompt in vi, is an extended version of ed, so many of the same commands work. While ed is meant to be used interactively, it can also be used in batch mode by sending a string of commands to it, which is what this solution does.
The sequence <<<$'1d\nwq\n' takes advantage of modern shells' support for here-strings (<<<) and ANSI quotes ($'...') to feed input to the ed command consisting of two lines: 1d, which deletes line 1, and then wq, which writes the file back out to disk and then quits the editing session.
As Pax said, you probably aren't going to get any faster than this. The reason is that there are almost no filesystems that support truncating from the beginning of the file so this is going to be an O(n) operation where n is the size of the file. What you can do much faster though is overwrite the first line with the same number of bytes (maybe with spaces or a comment) which might work for you depending on exactly what you are trying to do (what is that by the way?).
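A sketch of that overwrite idea (my own; it assumes blanking the first line is acceptable to whatever reads the file):
len=$(head -n 1 "$FILE" | wc -c)   # bytes in the first line, newline included
printf '%*s' "$((len - 1))" '' | dd of="$FILE" conv=notrunc 2>/dev/null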
You can edit the files in place: Just use perl's -i flag, like this:
perl -ni -e 'print unless $. == 1' filename.txt
This makes the first line disappear, as you ask. Perl will need to read and copy the entire file, but it arranges for the output to be saved under the name of the original file.
This shows all the lines except the first line:
tail -n +2 textfile.txt
Could use vim to do this:
vim -u NONE +'1d' +'wq!' /tmp/test.txt
This should be faster, since vim won't read the whole file while processing.
How about using csplit?
man csplit
csplit -k file 1 '{1}'
This one-liner will do it:
echo "$(tail -n +2 "$FILE")" > "$FILE"
It works because the command substitution is fully evaluated (tail reads the whole file) before the shell truncates "$FILE" for the redirection, hence no need for a temp file.
Since it sounds like I can't speed up the deletion, I think a good approach might be to process the file in batches like this (file1 contains lines of SQL code):
while [ -s file1 ]; do
    head -n 1000 file1 > file2
    # process file2 here ...
    sed -i -e '1,1000d' file1
done
The drawback of this is that if the program gets killed in the middle (or if there's some bad sql in there - causing the "process" part to die or lock up), there will be lines that are either skipped, or processed twice.
tail +2 path/to/your/file
works for me, no need to specify the -n flag. For reasons, see Aaron's answer.
You can use the sed command to delete arbitrary lines by line number.
# create a multi-line text file
printf '%s\n' '1. first' '2. second' '3. third' > file.txt
deleting lines and printing to stdout
$ sed '1d' file.txt
2. second
3. third
$ sed '2d' file.txt
1. first
3. third
$ sed '3d' file.txt
1. first
2. second
# delete multiple lines
$ sed '1,2d' file.txt
3. third
# delete the last line
$ sed '$d' file.txt
1. first
2. second
use the -i option to edit the file in-place
$ cat file.txt
1. first
2. second
3. third
$ sed -i '1d' file.txt
$ cat file.txt
2. second
3. third
If what you are looking to do is recover after failure, you could just build up a file that has what you've done so far.
if [[ -f $tmpf ]]; then
    rm -f "$tmpf"
fi
while IFS= read -r line; do
    # process line ...
    echo "$line" >> "$tmpf"
done < "$srcf"
Based on 3 other answers, I came up with this syntax that works perfectly in my macOS bash shell:
line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)
Test case:
~> printf "Line #%2d\n" {1..3} > list.txt
~> cat list.txt
Line # 1
Line # 2
Line # 3
~> line=$(head -n1 list.txt && echo "$(tail -n +2 list.txt)" > list.txt)
~> echo $line
Line # 1
~> cat list.txt
Line # 2
Line # 3
Would using tail on N-1 lines and directing that into a file, followed by removing the old file and renaming the new file to the old name, do the job?
If I were doing this programmatically, I would read through the file and remember the file offset after reading each line, so I could seek back to that position to read the file with one less line in it.
