Grep the first number in a line [duplicate] - bash

I have a file where each line contains some numbers followed by a filename:
2 20 3 file1.txt
93 21 42 file2.txt
52 10 12 file3.txt
How do I use grep, awk, or some other command to give me just the first number of each line, so that it displays only:
2
93
52
Thanks.

So many ways to do this. Here are some (assuming the input file is gash.txt):
awk '{print $1}' gash.txt
or using pure bash:
while read -r num rest
do
    echo "$num"
done < gash.txt
or using "classic" sed:
sed 's/[ \t]*\([0-9]\{1,\}\).*/\1/' gash.txt
or using ERE with sed:
sed -E 's/[ \t]*([0-9]+).*/\1/' gash.txt
Using cut is problematic because we don't know if the whitespace is spaces or tabs.
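If you do want cut, one workaround (a sketch that first normalizes the whitespace with tr) is:
$ tr -s ' \t' ' ' < gash.txt | sed 's/^ //' | cut -d' ' -f1
2
93
52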
By the way, if you also want the sum of the first-column values:
awk '{total+=$1} END{print "Total:",total}' gash.txt
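With the sample data above (2 + 93 + 52), that prints:
Total: 147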

You can use this:
grep -oE '^\s*[0-9]+' filename
Note that this keeps any leading whitespace in the match; with plain grep -E I don't see a way to strip it, so you may be better off accepting the awk answer.
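If your grep supports PCRE (-P in GNU grep), however, the \K trick shown in a later answer here can drop the leading whitespace from the match:
grep -oP '^\s*\K[0-9]+' filename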

You can use awk
awk '{print $1}' file

Related

How to only read the last line from a text file [duplicate]

I am working on a tool project. I need to grab the last line from a file and assign it to a variable. This is what I have tried:
line=$(head -n $NF input_file)
echo $line
Maybe I could read the file in reverse then use
line=$(head -n $1 input_file)
echo $line
Any ideas are welcome.
Use tail ;)
line=$(tail -n 1 input_file)
echo "$line"
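A quick sanity check with a throwaway file (input_file is just a stand-in name here):
$ printf 'first\nmiddle\nlast\n' > input_file
$ line=$(tail -n 1 input_file)
$ echo "$line"
last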
A combination of tac and awk. The benefit of this approach is that we do NOT need to read the complete Input_file: awk prints the first line tac emits and exits immediately.
tac Input_file | awk '{print;exit}'
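head -n 1 does the same job here, since it too stops after the first line tac emits:
tac Input_file | head -n 1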
With sed or awk:
sed -n '$p' file
sed '$!d' file
awk 'END{print}' file
However, tail is still the right tool to do the job.

How can I read a file from last line to first line in shell? [duplicate]

I would like to read a file from the last line to the first line.
For example:
ant
ball
cat
(and so on, up to 50 lines)
I need to print it backwards.
Solution 1: using tac:
tac Input_file
Solution 2: in case you don't have tac on your system:
sed '1!G;h;$!d' Input_file
Solution 3: another sed variant:
sed -n '1!G;h;$p' Input_file
Solution 4: using awk:
awk '{a[FNR]=$0} END{for(i=FNR;i>=1;i--){print a[i]}}' Input_file
Solution 5: a perl solution:
perl -e 'print reverse <>' Input_file
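Any of these can be sanity-checked against the example data; with tac, for instance:
$ printf 'ant\nball\ncat\n' | tac
cat
ball
ant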
Use tac [file], which is like cat but prints lines in reverse order.
From the tac manual page:
NAME
tac - concatenate and print files in reverse
SYNOPSIS
tac [OPTION]... [FILE]...
$ echo -e 1\\n2 |
awk '{b=$0 (b==""?"":ORS b)}END{print b}'
2
1
Explained:
$ awk '{b=$0 (b==""?"":ORS b)} # buffer records to the beginning of b
END{print b}' # after all records were buffered print b

How can I extract the 11th and 12th characters of a grep match in Bash?

I have a text file called temp.txt that consists of 3 serial numbers.
AB400-251429-0014
AA200-251429-0028
AD200-251430-0046
The 11th and 12th characters in the serial number correspond to the week. I want to extract this number for each unit and do something with it (but for this example just echo it). I have the following code:
while read line; do
week=` grep S[ABD][42]00 $line | cut -c11-12 `
echo $week
done < temp.txt
It looks like it's not working, as cut seems to expect a filename and treats each serial number as one. Is there an alternative way to do this?
The problem is not with cut but with grep, which expects a filename but gets the line contents instead. Also, the expression doesn't match the IDs: they don't start with S followed by A, B, or D.
You can process lines in bash without starting a subshell:
while read line ; do
echo 11th and 12th characters are: "${line:10:2}".
done < temp.txt
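Run against the temp.txt from the question, this prints:
11th and 12th characters are: 29.
11th and 12th characters are: 29.
11th and 12th characters are: 30.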
Your original approach is still possible:
week=$( echo "$line" | grep 'S[ABD][42]00' | cut -c11-12 )
Note that for non-matching lines, $week would be empty.
You can also try:
grep -oP '.{10}\K..' filename
which, for your input, prints:
29
29
30
\K means a variable-length look-behind. In other words, grep looks for the pattern before \K but does not include it in the result.
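For example, on a single serial number (assuming GNU grep, since -P is a GNU extension):
$ echo 'AB400-251429-0014' | grep -oP '.{10}\K..'
29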
More precise selection of the lines:
grep -oP '[ABD][42]00-.{4}\K..' # or more precise
grep -oP '^\w[ABD][42]00-.{4}\K..' # or even more
grep -oP '^[A-Z][ABD][42]00-.{4}\K..' # or
grep -oP '^[A-Z][ABD][42]00-\d{4}\K..' # or
Each prints the same output as above, but selects the interesting lines more precisely. :)
I would use this simple awk
awk '{print substr($0,11,2)}' text.file
29
29
30
To get it into an array that you can use later:
results=($(awk '{print substr($0,11,2)}' text.file))
echo "${results[#]}"
29 29 30
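On bash 4 or newer, mapfile does the same thing while avoiding the word splitting and globbing of the unquoted $(...) form:
mapfile -t results < <(awk '{print substr($0,11,2)}' text.file)
echo "${results[@]}"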
TL;DR
Looping in Bash is pretty inefficient, especially when reading a file one line at a time. You can get what you want faster and more effectively by using grep to select only the interesting lines, or by using awk to avoid calling cut in a separate pipelined process.
GNU Grep and Cut Solution
$ grep '[[:alpha:]][ABD][42]' temp.txt | cut -c11,12
29
29
30
Awk Solutions
# Note: FPAT is a GNU awk extension (adopted by some other awks),
# so this may not work with a strictly POSIX awk.
$ awk -v NF=1 -v FPAT=. '/[[:alpha:]][ABD][42]00/ { print $11 $12 }' temp.txt
29
29
30
# A more elegant solution, as noted by @rici, that works with GNU awk,
# and possibly others.
$ gawk -v FS= '/[[:alpha:]][ABD][42]00/ { print $11 $12 }' temp.txt
29
29
30
Store the Results in a Bash Array
Either way, you can store the results of your match in a Bash array to use later. For example:
$ results=(`grep '[[:alpha:]][ABD][42]00' temp.txt | cut -c11,12`)
$ echo "${results[#]}"
29 29 30

"grep"ing first 12 of last 24 character from a line

I am trying to extract the "first 12 of the last 24 characters" from a line, i.e., for the line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-0".
(I have not written the code, so don't curse me for its formatting.)
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but is there a more readable way of doing this, rather than counting dots?
EDIT
Thanks to both of you; I have sed, awk, grep, and bash available. I will run this in a loop over hundreds of files, so can you also suggest which one is most efficient with respect to time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a Python (2) solution may also be helpful:
python -c 'import sys;print "\n".join([a[-24:-13] for a in sys.stdin])' < file
0.41207E-09
I'm not sure your example data and question match up, so just change the values in the {n} quantifiers accordingly.
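Since the EDIT asks about efficiency: the honest answer is to time each candidate on one of your own files. A sketch (bigfile is a hypothetical stand-in for your data):
$ time sed -r 's/.*(.{11}).{12}/\1/' bigfile > /dev/null
$ time grep -Po '.{11}(?=.{12}$)' bigfile > /dev/null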
The simplest is pure bash:
echo "${str:(-24):12}"
Or awk can also do it:
awk '{print substr($0, length($0)-23, 12)}' <<< "$str"
OUTPUT:
0.41207E-09
EDIT: To use the bash solution on a file:
while read -r l; do echo "${l:(-24):12}"; done < file
Another one, less efficient, but it has the advantage of making you discover new tools:
echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get the first 12 of the last 24 characters of a line, taking the last 24 first and then the first 12 of those:
awk '{s=substr($0, length($0)-23); print substr(s, 1, 12)}' myfile.txt

How can I delete every Xth line in a text file?

Consider a text file with scientific data, e.g.:
5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01
How can I easily delete, for instance, every second line, or 9 out of every 10 lines in the file? Is this possible with, for example, a bash script?
Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.
This is easy to accomplish with awk.
Remove every other line:
awk 'NR % 2 == 0' file > newfile
Remove every 10th line:
awk 'NR % 10 != 0' file > newfile
The NR variable in awk is the current line (record) number. Anything outside of { } in awk is a pattern (a condition), and the default action for lines that match is to print them.
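The modulus test generalizes nicely; for example, to keep only every nth line, with n passed in via -v (a small sketch):
awk -v n=10 'NR % n == 0' file > newfile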
How about perl?
perl -n -e '$.%10==0&&print' # print every 10th line
You could possibly do it with sed, e.g.
sed -n -e 'p;N;d;' file # print every other line, starting with line 1
If you have GNU sed it's pretty easy
sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
Try something like:
awk 'NR%3==0{print $0}' file
This will print one line in three. Or:
awk 'NR%10<9{print $0}' file
will print 9 lines out of ten.
This might work for you (GNU sed):
seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100
You can use awk and a shell script. Awk can be difficult, but...
This will delete the specific lines you tell it to delete:
nawk -f awkfile.awk [filename]
awkfile.awk contents
BEGIN {
    # default list of line numbers to delete
    if (!lines) lines = "3 4 7 8"
    # make each number a key of the linesA array
    n = split(lines, lA, FS)
    for (i = 1; i <= n; i++)
        linesA[lA[i]]
}
# print only the lines whose number is not in the delete list
!(FNR in linesA)
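You can pass a different list in with -v instead of editing the default; for example (plain awk shown, nawk behaves the same):
$ seq 10 | awk -v lines="3 4 7 8" -f awkfile.awk
1
2
5
6
9
10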
Also, I can't remember whether vim comes with standard Ubuntu or not. If not, get it.
Then open the file with vim
vim [filename]
Then type:
:%!awk NR\%2
This will delete every other line. Just change the 2 to another integer for a different frequency.
