wc -l gives the wrong result - bash

I got a wrong result from the wc -l command. After a long :( investigation I found the core of the problem; here is a reproduction:
$ echo "line with end" > file
$ echo -n "line without end" >>file
$ wc -l file
1 file
There are two lines here, but the last one is missing its trailing "\n". Is there an easy solution?

For wc, a line is something that ends with the "\n" character. One solution is to grep the lines instead, because grep does not require the terminating newline.
e.g.
$ grep -c . file #count the lines containing at least one character
2
The above will not count empty lines. If you want them counted too, use:
$ grep -c '^' file #count the beginnings of the lines
2

From the man page of wc:
-l, --lines
print the newline counts
From the man page of echo:
-n do not output the trailing newline
So your file contains only one newline character, and thus wc -l shows 1.
You can use the following awk command to count lines:
awk 'END{print NR}' file
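If you prefer to stay with wc -l, a minimal sketch is to add one to its count whenever the last byte of the file is not a newline. The function name count_lines is hypothetical. It relies on tail -c 1 printing the last byte and on command substitution stripping a trailing newline, so the substitution is empty exactly when the file ends in "\n" (or is empty).

```bash
# count_lines FILE: hypothetical helper that prints the number of lines in FILE,
# also counting a final line that lacks the terminating newline.
count_lines() {
    # wc -l counts newline characters only; $((...)) strips BSD wc's padding.
    n=$(($(wc -l < "$1")))
    # Empty when the last byte is "\n" or the file is empty, non-empty otherwise.
    if [ -n "$(tail -c 1 "$1")" ]; then
        n=$((n + 1))
    fi
    echo "$n"
}
```

With the file from the question, count_lines file prints 2, while wc -l file still prints 1.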

Related

Why is wc -l counting lines incorrectly? [duplicate]

I have a text file that is over 60MB in size. It has entries on 5105043 lines, but wc -l gives only 5105042, which is one less than the actual count. Does anyone have any idea why this is happening?
Is this common when the file is large?
The last line does not end with a newline.
One trick to get the result you want would be:
sed -n '=' <yourfile> | wc -l
This tells sed to print just the line number of each line in your file, which wc then counts. There are probably better solutions, but this works.
The last line in your file is probably missing a newline ending. IIRC, wc -l merely counts the number of newline characters in the file.
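If you run cat -A file.txt | tail, does your last line end with a dollar sign ($)? cat -A marks the end of every newline-terminated line with $, so a last line without one has no trailing newline.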
EDIT:
Assuming the last line in your file is lacking a newline character, you can append a newline character to correct it like this:
printf "\n" >> file.txt
The results of wc -l should now be consistent.
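Running that printf a second time would add an empty line. A minimal sketch that appends the newline only when it is actually missing, assuming a shell with tail -c available; the function name ensure_trailing_newline is hypothetical:

```bash
# ensure_trailing_newline FILE: hypothetical helper that appends "\n" to FILE
# only if its last byte is not already a newline, so running it repeatedly
# never adds blank lines. An empty file is left untouched.
ensure_trailing_newline() {
    # $(tail -c 1 ...) is empty when the file ends with "\n" or is empty.
    if [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Usage: ensure_trailing_newline file.txt, after which wc -l file.txt reports every line.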
60 MB is a fairly big file, but for smaller files one option could be:
cat -n file.txt
OR
cat -n file.txt | cut -f1 | tail -1

How to determine if a line contains a character in bash?

I would like to make a bash script that takes 2 arguments, file1 and file2, and copies all lines from file1 that contain the letter b into file2. I have found a way to determine if a string contains the letter:
if [[ $string == *"b"* ]]; then
echo "It's there!"
fi
I just can't figure out how to apply this code to my problem and run through each line of an arbitrary file.
In the course description I found that this problem can be solved using head -n, tail -n, cat, echo, wc -c, wc -l, wc -w, if, case, and test, but we don't have to limit ourselves to just these commands.
This is the reason why grep has been invented:
grep "b" file1.txt >>file2.txt
(This copies all lines from file1.txt, containing the character b, to file2.txt)
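Since the course hints at if/test rather than grep, here is a minimal bash sketch of the loop the question asks about, reusing its [[ $string == *"b"* ]] test on each line. The function name copy_b_lines is hypothetical, and unlike the >> in the grep answer it overwrites file2 instead of appending to it.

```bash
# copy_b_lines SRC DEST: hypothetical helper that writes every line of SRC
# containing the letter "b" to DEST (DEST is overwritten).
copy_b_lines() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # [[ -n $line ]] fallback also processes a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == *"b"* ]]; then
            printf '%s\n' "$line"
        fi
    done < "$1" > "$2"
}
```

Usage: copy_b_lines file1 file2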

Delete the last 3 lines of my txt file with bash? [duplicate]

I want to remove some n lines from the end of a file. Can this be done using sed?
For example, to remove lines 2 to 4, I can use
$ sed '2,4d' file
But I don't know the line numbers. I can delete the last line using
$ sed '$d' file
but I want to know how to remove n lines from the end. Please let me know how to do that using sed or some other method.
I don't know about sed, but it can be done with head:
head -n -2 myfile.txt
If hardcoding n is an option, you can use sequential calls to sed. For instance, to delete the last three lines, delete the last line three times:
sed '$d' file | sed '$d' | sed '$d'
From the sed one-liners:
# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
Seems to be what you are looking for.
A fun and simple sed and tac solution:
n=4
tac file.txt | sed "1,$n{d}" | tac
NOTE
Double quotes are needed for the shell to expand the $n variable in the sed command; inside single quotes, no interpolation is performed.
tac is cat reversed; see man 1 tac.
The {} in sed are there to separate $n from d (otherwise the shell would try to expand the nonexistent $nd variable).
Use sed, but let the shell do the math, with the goal being to use the d command by giving a range (to remove the last 23 lines):
sed -i "$(($(wc -l < file)-22)),\$d" file
To break it down, from the inside out:
$(wc -l < file)
Gives the number of lines of the file: say 2196
We want to remove the last 23 lines, so for the left side of the range:
$((2196-22))
Gives: 2174
Thus the original sed after shell interpretation is:
sed -i '2174,$d' file
With -i doing an in-place edit, the file now has 2173 lines.
If you want to save the result into a new file instead, drop -i (with -i, sed writes back to file and nothing reaches the redirection):
sed '2174,$d' file > outputfile
You could use head for this.
Use
$ head --lines=-N file > new_file
where N is the number of lines you want to remove from the file.
The contents of the original file minus the last N lines are now in new_file
Just for completeness I would like to add my solution.
I ended up doing this with the standard ed:
ed -s sometextfile <<< $'-2,$d\nwq'
This deletes the last 3 lines using in-place editing (after reading the file, ed's current line is the last one, so -2 is the third line from the end), although it does use a temporary file in /tmp!
To truncate very large files truly in place, there is the truncate command.
It doesn't know about lines, but tail + wc can convert lines to bytes:
file=bigone.log
lines=3
truncate -s -"$(tail -n "$lines" "$file" | wc -c)" "$file"
There is an obvious race condition if the file is being written at the same time.
In this case it may be better to use head, which counts bytes from the beginning of the file (mind the disk I/O), so the file is always truncated on a line boundary (possibly removing more lines than expected if the file is actively being written):
truncate -s "$(head -n -"$lines" "$file" | wc -c)" "$file"
A handy one-liner if you fail a login attempt by typing your password in place of your username:
truncate -s $(head -n -5 /var/log/secure | wc -c) /var/log/secure
This might work for you (GNU sed):
sed ':a;$!N;1,4ba;P;$d;D' file
Most of the above answers seem to require GNU commands/extensions:
$ head -n -2 myfile.txt
-2: Badly formed number
For a slightly more portable solution:
perl -ne 'push(@fifo,$_);print shift(@fifo) if @fifo > 10;'
OR
perl -ne 'push(@buf,$_);END{print @buf[0 ... $#buf-10]}'
OR
awk '{buf[NR-1]=$0;}END{ for ( i=0; i < (NR-10); i++){ print buf[i];} }'
where 10 is n.
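The awk version above keeps the whole file in memory until END. A minimal sketch of a variant that holds only the last n lines in a ring buffer indexed by NR % n, assuming any POSIX awk; the wrapper name drop_last_awk is hypothetical:

```bash
# drop_last_awk N [FILE...]: hypothetical wrapper that prints all but the last
# N lines of its input while buffering at most N lines.
drop_last_awk() {
    local n=$1
    shift
    awk -v n="$n" '
        n == 0 { print; next }          # nothing to drop, avoid modulo by zero
        NR > n { print buf[NR % n] }    # the line from N records ago is safe to print
        { buf[NR % n] = $0 }            # overwrite that slot with the current line
    ' "$@"
}
```

Usage: drop_last_awk 10 myfile.txt, or as a filter: some_command | drop_last_awk 10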
With the answers here you'd have already learnt that sed is not the best tool for this application.
However, I do think there is a way to do this using sed; the idea is to append N lines to the hold space until you are able to read without hitting EOF. When EOF is hit, print the contents of the hold space and quit.
sed -e '$!{N;N;N;N;N;N;H;}' -e x
The sed command above will omit last 5 lines.
It can be done in 3 steps:
a) Count the number of lines in the file you want to edit:
n=$(wc -l < myfile)
b) Subtract from that number the number of lines to delete; this is the number of lines to keep:
x=$((n-3))
c) Tell sed to delete from the line after that ($x + 1) to the end:
sed "$((x+1)),\$d" myfile
You can get the total count of lines with wc -l <file> and use
head -n <total lines - lines to remove> <file>
Try the following command, where n is the number of lines to remove (tail -r is available on BSD and macOS; on GNU systems use tac instead):
tail -r file_name | sed '1,nd' | tail -r
This will remove the last 3 lines from file:
for i in $(seq 1 3); do sed -i '$d' file; done;
I prefer this solution:
head -$(gcalctool -s $(cat file | wc -l)-N) file
where N is the number of lines to remove.
sed -n ':pre
1,4 {N;b pre
}
:cycle
$!{P;N;D;b cycle
}' YourFile
POSIX version
To delete the last 4 lines:
$ nl -b a file | sort -k1,1nr | sed '1, 4 d' | sort -k1,1n | sed 's/^ *[0-9]*\t//'
I came up with this, where n is a shell variable holding the number of lines you want to delete:
count=$(wc -l < file)
lines=$((count - n))
head -n "$lines" file > temp.txt
mv temp.txt file
rm -f temp.txt
It's a little roundabout, but I think it's easy to follow.
Count up the number of lines in the main file
Subtract the number of lines you want to remove from the count
Print out the number of lines you want to keep and store in a temp file
Replace the main file with the temp file
Remove the temp file
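The same steps can be sketched as a function (the name drop_last_lines is hypothetical) that uses mktemp for the temporary file and only overwrites the original when the earlier steps succeed. Copying back with cat rather than mv keeps the original file's permissions, since mktemp creates its file with mode 600. Like any wc -l based approach, it assumes the file ends with a newline.

```bash
# drop_last_lines N FILE: hypothetical helper that removes the last N lines of
# FILE by counting, subtracting, keeping the head, and replacing the file.
drop_last_lines() {
    local n=$1 file=$2 count keep tmp
    count=$(wc -l < "$file") || return 1
    keep=$((count - n))
    if [ "$keep" -lt 0 ]; then
        keep=0          # asked to drop more lines than the file has
    fi
    tmp=$(mktemp) || return 1
    # Some head implementations may reject -n 0, so "keep nothing" is separate.
    if [ "$keep" -gt 0 ]; then
        head -n "$keep" "$file" > "$tmp"
    else
        : > "$tmp"
    fi &&
    cat "$tmp" > "$file"   # overwrite in place, keeping FILE's permissions
    rm -f "$tmp"
}
```

Usage: drop_last_lines 3 myfile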
For deleting the last N lines of a file, you can use the same concept as
$ sed '2,4d' file
combined with the tail command to reverse the file. If N is 5:
$ tail -r file | sed '1,5d' | tail -r > file.new && mv file.new file
This also works where head -n -5 file does not (for example, on macOS).
#!/bin/sh
echo 'Enter the file name : '
read -r filename
echo 'Enter the number of lines from the end that need to be deleted :'
read -r n
# Subtract 1 so the range below starts at the first line to delete
m=$((n - 1))
# Calculate the length of the file
len=$(wc -l < "$filename")
# Calculate the first line that must be deleted
lennew=$((len - m))
sed "$lennew,\$d" "$filename"
A solution similar to https://stackoverflow.com/a/24298204/1221137 but with in-place editing and without a hardcoded number of lines:
n=4
seq $n | xargs -i sed -i -e '$d' my_file
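In Docker, GNU head works for this; write to a temporary file first, because redirecting straight back onto file_path truncates it before head can read it:
head --lines=-N file_path > file_path.tmp && mv file_path.tmp file_path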
Say you have several lines:
$ cat <<EOF > 20lines.txt
> 1
> 2
> 3
[snip]
> 18
> 19
> 20
> EOF
Then you can grab:
# leave last 15 out
$ head -n5 20lines.txt
1
2
3
4
5
# skip first 14
$ tail -n +15 20lines.txt
15
16
17
18
19
20
POSIX-compliant solution using ex / vi, in the vein of @Michel's solution above.
@Michel's ed example uses "not-POSIX" Here-Strings.
Change the $-1 to $-2, $-3, and so on to remove more lines up to the EOF ($), or just feed the lines you want to (d)elete. You could also use ex to count line numbers or do any other Unix stuff.
Given the file:
cat > sometextfile <<EOF
one
two
three
four
five
EOF
Executing:
ex -s sometextfile <<'EOF'
$-1,$d
%p
wq!
EOF
Returns:
one
two
three
This uses POSIX Here-Docs, so it is really easy to modify, especially using set -o vi with a POSIX /bin/sh.
While on the subject, the "ex personality" of "vim" should be fine, but YMMV.
This will remove the last 10 lines
sed -n -e :a -e '1,10!{P;N;D;};N;ba'

Command to count occurrences of a word in an entire file

I am trying to count the occurrences of a word in a file.
If the word occurs multiple times in a line, it is counted as 1.
The following command gives me a count, but it is wrong when a line has multiple occurrences of the word:
grep -c "word" filename.txt
Is there a one-liner?
You can use grep -o to show the exact matches and then count them:
grep -o "word" filename.txt | wc -l
Test
$ cat a
hello hello how are you
hello i am fine
but
this is another hello
$ grep -c "hello" a # Normal `grep -c` fails
3
$ grep -o "hello" a
hello
hello
hello
hello
$ grep -o "hello" a | wc -l # grep -o solves it!
4
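Note that grep -o "hello" also matches inside longer words such as "hellos". If only whole-word matches should count, a sketch adds -w (and -F so the word is taken literally). -o and -w are supported by GNU and BSD grep but are not part of POSIX; the function name count_word is hypothetical.

```bash
# count_word WORD FILE: hypothetical helper that counts every whole-word
# occurrence of WORD in FILE. grep -o prints each match on its own line,
# -w rejects matches inside longer words, -F treats WORD as a fixed string,
# and wc -l counts the printed matches.
count_word() {
    grep -owF -- "$1" "$2" | wc -l
}
```

With the file a above plus a line "say hellos", count_word hello a still reports 4, whereas grep -o "hello" a | wc -l would report 5.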
Set RS in awk for a shorter one.
awk 'END{print NR-1}' RS="word" file
GNU awk allows it to be done in a single command, using a regular expression as the record separator so that each word becomes its own record:
awk -v w="word" '$1==w{n++} END{print n+0}' RS=' |\n' file
cat file | tr ' ' '\n' | grep -c word
This assumes that all words in the file have spaces between the words. If there's punctuation concatenating the word to itself, or otherwise no spaces on a single line between the word and itself, they'll count as one.
grep word filename.txt | wc -l
grep prints the lines that match, then wc -l counts them (like grep -c, this counts matching lines rather than individual occurrences)

bash echo number of lines of file given in a bash variable without the file name

I have the following three constructs in a bash script:
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE)
echo $NUMOFLINES" lines"
echo $(wc -l $JAVA_TAGS_FILE)" lines"
echo "$(wc -l $JAVA_TAGS_FILE) lines"
And they all produce identical output when the script is run:
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
I.e. the name of the file is also echoed (which I don't want). Why do these scriptlets fail, and how should I output a clean:
121711 lines
?
An Example Using Your Own Data
You can avoid having your filename embedded in the NUMOFLINES variable by using redirection from JAVA_TAGS_FILE, rather than passing the filename as an argument to wc. For example:
NUMOFLINES=$(wc -l < "$JAVA_TAGS_FILE")
Explanation: Use Pipes or Redirection to Avoid Filenames in Output
The wc utility will not print the name of the file in its output if input is taken from a pipe or redirection operator. Consider these various examples:
# wc shows filename when the file is an argument
$ wc -l /etc/passwd
41 /etc/passwd
# filename is ignored when piped in on standard input
$ cat /etc/passwd | wc -l
41
# unusual redirection, but wc still ignores the filename
$ < /etc/passwd wc -l
41
# typical redirection, taking standard input from a file
$ wc -l < /etc/passwd
41
As you can see, the only time wc will print the filename is when it's passed as an argument, rather than as data on standard input. In some cases, you may want the filename to be printed, so it's useful to understand when it will be displayed.
wc can't get the filename if you don't give it one.
wc -l < "$JAVA_TAGS_FILE"
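On macOS and other BSD systems, wc pads the count with leading spaces even when reading standard input, so echo "$(wc -l < file) lines" can print something like "      41 lines". A minimal sketch that normalizes the number with arithmetic expansion; the function name print_line_count is hypothetical:

```bash
# print_line_count FILE: hypothetical helper that prints "<count> lines"
# without the filename. Redirecting FILE to standard input keeps its name out
# of wc's output, and $((...)) drops any leading padding from the count.
print_line_count() {
    local count
    count=$(wc -l < "$1") || return 1
    echo "$((count)) lines"
}
```

Usage: print_line_count "$JAVA_TAGS_FILE"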
You can also use awk:
awk 'END {print NR,"lines"}' filename
Or
awk 'END {print NR}' filename
(works on Mac, and probably other Unixes)
Actually there is a problem with the wc approach: it does not count the last line if it does not end with a newline character.
Use this instead
nbLines=$(cat -n file.txt | tail -n 1 | cut -f1 | xargs)
or even better (thanks gniourf_gniourf):
nblines=$(grep -c '' file.txt)
Note: The awk approach by chilicuil also works.
It's very simple:
NUMOFLINES=$(cat "$JAVA_TAGS_FILE" | wc -l)
or
NUMOFLINES=$(wc -l "$JAVA_TAGS_FILE" | awk '{print $1}')
I normally use the 'back tick' form of command substitution, with redirection so the filename is not included:
export NUM_LINES=`wc -l < filename`
Note that the 'tick' is the 'back tick', i.e. `, not the normal single quote.
