Cut from column to end of line - bash

I'm having a bit of an issue cutting the output up from egrep. I have output like:
From: First Last
From: First Last
From: First Last
I want to cut out the "From: " (essentially leaving the "First Last").
I tried
cut -d ":" -f 7
but the output is just a bunch of blank lines.
I would appreciate any help.
Here's the full code that I am trying to use if it helps:
egrep '^From:' $file | cut -d ":" -f 7
NOTE: I've already tested the egrep portion of the code and it works as expected.

The cut command lines in your question specify colon-separated fields and that you want the output to consist only of field 7; since there is no 7th field in your input, the result you're getting isn't what you intend.
Since the "From:" prefix appears to be identical across all lines, you can simply cut from the 7th character onward:
egrep '^From:' $file | cut -c7-
and get the result you intend.
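To see the difference side by side, here is a minimal sketch; the sample line is a stand-in for the egrep output, and the variable names are only for illustration:

```shell
# Hypothetical sample line standing in for one line of egrep output.
line='From: First Last'

# Only two colon-separated fields exist, so field 7 comes back empty.
by_field=$(printf '%s\n' "$line" | cut -d ':' -f 7)

# "From: " occupies characters 1-6, so the name starts at character 7.
by_char=$(printf '%s\n' "$line" | cut -c7-)

printf 'field 7:  [%s]\n' "$by_field"
printf 'chars 7-: [%s]\n' "$by_char"
```

This only works because every line has the same 6-character prefix; a prefix of varying length would need a field-based or pattern-based approach instead.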

You were really close.
I think you only need to replace ":" with " " as the separator and change the field from "7" to "2-", like this:
cut -d " " -f 2-
I tested it and it works well.

The -f argument selects which fields to print. Since there is only one : in the line, there are only two fields. So changing -f 7 to -f 2- will give you what you want, albeit with a leading space.
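A small sketch of the leading-space issue, assuming the same "From: First Last" sample line; piping through a second cut is just one possible way to drop the space:

```shell
# Hypothetical sample line.
line='From: First Last'

# Field 2 onward, split on ":"; the space after the colon is kept.
with_space=$(printf '%s\n' "$line" | cut -d ':' -f 2-)

# Drop the first character (the space) with a second cut.
trimmed=$(printf '%s\n' "$line" | cut -d ':' -f 2- | cut -c2-)

printf '[%s]\n' "$with_space"
printf '[%s]\n' "$trimmed"
```

Using 2- rather than 2 also keeps any later colons that happen to appear in the rest of the line.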

You can combine the egrep and cut parts into one command with sed:
sed -n 's/^From: //gp' $file
sed -n turns off printing by default, and then I am using p in the sed command explicitly to print the lines I want.
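As a quick check of the -n plus p behaviour, here is a sketch against a throwaway file; the file contents are made up for the demonstration:

```shell
# Build a small temporary file with matching and non-matching lines.
file=$(mktemp)
printf '%s\n' 'From: First Last' 'Subject: hello' 'From: Other Person' > "$file"

# -n suppresses automatic printing; the p flag prints only the lines
# where the substitution actually happened.
names=$(sed -n 's/^From: //p' "$file")

printf '%s\n' "$names"
rm -f "$file"
```

The g flag is left out here because a pattern anchored with ^ can match at most once per line.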

You can use sed:
sed 's/^From: *//'
OR awk:
awk -F ': *' '$1=="From"{print $2}'
OR grep -oP
grep -oP '^From: *\K.*'

Here is a Bash one-liner:
grep ^From file.txt | while read -r -a cols; do echo "${cols[@]:1}"; done
See: Handling positional parameters at wiki.bash-hackers.org

cut itself is a very handy tool in bash
cut -d (delimiter character) -f (fields that you want as output)
a single field is given directly as -f 3 ,
range of fields can be selected as -f 5-9
So in this particular case the code would be
egrep '^From:' $file | cut -d\  -f 2-3
The delimiter here is a space, escaped with a \ (note the two spaces after the backslash: the first is the delimiter itself, the second separates the options).
-f 1 corresponds to "From:" and 2-3 corresponds to "First Last".
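To illustrate how a closed range such as 2-3 differs from an open one such as 2-, here is a sketch with a hypothetical three-word name:

```shell
# Hypothetical line with three words after the prefix.
line='From: First Middle Last'

# A closed range stops at field 3 and silently drops the rest.
closed=$(printf '%s\n' "$line" | cut -d ' ' -f 2-3)

# An open range runs to the end of the line.
open=$(printf '%s\n' "$line" | cut -d ' ' -f 2-)

printf '%s\n' "$closed"
printf '%s\n' "$open"
```

If the names can have more than two parts, the open range is probably the safer choice.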

Related

How to extract text by unspecified spaces

I'm trying to extract usernames from a text file in one-per-line format, and from my research it seems like the only way to do it is with commands that split on spaces. Here's the format:
1 user 3
2 fusrfff 4
3 usrf 12
The only problem is because all of the users are different I can't define a static space amount. There's also the fact the UIDs (first numbers) go from 1-40k. There's a bunch of other information after the user group number too. Can anyone point me in the right direction? Thanks.
awk does not care about the amount of space between fields:
awk '{print $2}' your_file.txt
If you want to go with bash only, read does not care either:
while read -r uid username other_stuff; do
printf '%s\n' "$username"
done < your_file.txt
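A self-contained sketch of the read approach, with made-up rows that use uneven spacing; the here-document stands in for your_file.txt:

```shell
# Collect the second column regardless of how many spaces separate fields.
users=''
while read -r uid username rest; do
    # read splits on runs of whitespace, so the spacing does not matter.
    users="$users$username "
done <<'EOF'
1    user      3
2  fusrfff 4
3 usrf    12
EOF

printf '%s\n' "$users"
```

The last variable (rest here) soaks up everything after the second field, so extra columns are harmless.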
First replace runs of spaces by one space. You can use sed -E 's/ +/ /g' or
tr -s " " < file.txt| cut -d" " -f2
This is using sed
$ cat file.txt | sed "s/  */ /g" | cut -d' ' -f2
user
fusrfff
usrf

How to print the last column of a row only using "grep" and "cut" bash command

I need to parse the line written bold below:
line="eth1 Link encap:Ethernet HWaddr 11:11:11:11:11:11"
This line may have more words unexpectedly such as
line="eth1 Link encap:Ethernet Extra HWaddr 11:11:11:11:11:11"
So, for parsing the MAC address correctly, I need to parse the line accordingly with a bash command.
echo $line | cut -d' ' -f5 works for the first line, while echo $line | cut -d' ' -f6 works for the second. So, I need to parse only the last column of the line.
However, because of the device restriction, I can only use grep and cut command. Not sed, awk, rev,reverse, etc.
With grep:
echo $line | grep -o -E '[^ ]+$'
With cut, a solution can be made with an extra computation based on the word count, assuming the delimiter is a space:
nw=$(echo $line | wc -w)
echo $line | cut -d ' ' -f$nw-
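Wrapped in a function, the word-count trick might look like the sketch below. The function name last_field is made up, and it assumes words are separated by exactly one space, since cut treats every space as a separate delimiter:

```shell
# last_field: print the last space-separated word of its argument
# (hypothetical helper; assumes single spaces between words).
last_field() {
    # wc -w may pad its output with spaces on some systems;
    # arithmetic expansion normalises it to a bare number.
    nw=$(( $(printf '%s\n' "$1" | wc -w) ))
    printf '%s\n' "$1" | cut -d ' ' -f "${nw}-"
}

last_field 'eth1 Link encap:Ethernet HWaddr 11:11:11:11:11:11'
last_field 'eth1 Link encap:Ethernet Extra HWaddr 11:11:11:11:11:11'
```

Both calls should print the MAC address, whether or not the extra word is present.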
If the MAC address is the last sequence of characters after a space, you can remove the longest match of "* " (asterisk and a space) pattern using pure Bash:
echo "${line##* }"
You can also extract the last 17 characters from the string:
echo "${line: -17}"
If you want a strict match at the end of the line (due to .*):
echo $(expr match "$line" '.*\(\([a-zA-Z0-9]\{2\}\:\)\{5\}[a-zA-Z0-9]\{2\}\)')
Using GNU grep:
grep -o -P '(?:[a-zA-Z0-9]{2}:){5}[a-zA-Z0-9]{2}' <<< "$line"
In the latter case, you may want to add the $ anchor for the end of the line. Of course, you don't have to use here string. You may want to use a pipe instead: echo "$line" | grep -o -P ....
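For the pure-shell variant, here is a short sketch showing that the ##* expansion does not care how many words precede the MAC address; the two sample lines are taken from the question:

```shell
short='eth1 Link encap:Ethernet HWaddr 11:11:11:11:11:11'
long='eth1 Link encap:Ethernet Extra HWaddr 11:11:11:11:11:11'

# "##* " strips the longest prefix ending in a space,
# leaving only the text after the last space.
mac_short=${short##* }
mac_long=${long##* }

printf '%s\n' "$mac_short" "$mac_long"
```

This form of parameter expansion is part of POSIX sh, so it should also work on devices where only a minimal shell is available.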

printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
That's the code I have so far but it only prints the first character of each line.
Thanks
To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
Printing the last two words is not possible with cut, AFAIK, because it can only count from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
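A quick sketch of the NF-based form on lines of different lengths; the input lines are invented for the example:

```shell
# NF is the number of fields on the current line, so $(NF-1) and $NF
# are the last two words however long the line is.
out=$(printf '%s\n' 'one two three four' 'a b' | awk '{print $(NF-1), $NF}')

printf '%s\n' "$out"
```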
you can try
awk '{print $1}' your_file
read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
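Both variants can be tried against a throwaway file, as in this sketch; the file contents are made up:

```shell
file=$(mktemp)
printf '%s\n' 'alpha beta gamma' 'delta epsilon' > "$file"

# A single read consumes only the first line; "_" collects the leftovers.
read -r first _ < "$file"

# The loop form repeats the same read for every line.
all=$(while read -r word _; do printf '%s\n' "$word"; done < "$file")

printf '%s\n' "$first"
printf '%s\n' "$all"
rm -f "$file"
```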
You can use:
cut -d\  -f1 file
Where:
-d is the delimiter (here using \ to escape a space)
-f is the field selector
Notice that there are two spaces after the \: the first is the escaped delimiter, the second separates the options.
-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

How to remove the last character from a bash grep output

COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2`
outputs something like this
"Abc Inc";
What I want to do is I want to remove the trailing ";" as well. How can i do that? I am a beginner to bash. Any thoughts or suggestions would be helpful.
This will remove the last character contained in your COMPANY_NAME var, regardless of whether it is a semicolon:
echo "$COMPANY_NAME" | rev | cut -c 2- | rev
I'd use sed 's/;$//'. eg:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | sed 's/;$//'`
foo="hello world"
echo ${foo%?}
hello worl
I'd use head --bytes -N, or head -c-N for short (GNU head). Because the output of cut ends with a newline, which counts as a byte, dropping the semicolon takes -2 rather than -1:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | head --bytes -2`
head outputs only the beginning of a stream or file. Typically it counts lines, but it can be made to count characters/bytes instead. head --bytes 10 will output the first ten bytes, but head --bytes -10 will output everything except the last ten.
NB: you may have issues if the final character is multi-byte, but a semicolon isn't.
I'd recommend this solution over sed or cut because
It's exactly what head was designed to do, thus less command-line options and an easier-to-read command
It saves you having to think about regular expressions, which are cool/powerful but often overkill
It saves your machine having to think about regular expressions, so will be imperceptibly faster
I believe the cleanest way to strip a single character from a string with bash is:
echo ${COMPANY_NAME:: -1}
but I haven't been able to embed the grep piece within the curly braces, so your particular task becomes a two-liner:
COMPANY_NAME=$(grep "company_name" file.txt); COMPANY_NAME=${COMPANY_NAME:: -1}
This will strip any character, semicolon or not, but can get rid of the semicolon specifically, too.
To remove ALL semicolons, wherever they may fall (note the double slash; a single slash removes only the first one):
echo ${COMPANY_NAME//;/}
To remove only a semicolon at the end:
echo ${COMPANY_NAME%;}
Or, to remove multiple semicolons from the end (this needs extglob, because a plain %%; still removes only one):
shopt -s extglob; echo ${COMPANY_NAME%%+(;)}
For great detail and more on this approach, The Linux Documentation Project covers a lot of ground at http://tldp.org/LDP/abs/html/string-manipulation.html
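The difference between stripping a specific trailing character and stripping any trailing character can be sketched like this; the values are hypothetical, and only POSIX expansions are used:

```shell
with_semi='"Abc Inc";'
without_semi='"Abc Inc"'

# %; removes a trailing semicolon only if one is present.
a=${with_semi%;}
b=${without_semi%;}

# %? removes the last character whatever it is.
c=${with_semi%?}
d=${without_semi%?}

printf '%s\n' "$a" "$b" "$c" "$d"
```

The %; form is the safer one when the semicolon may or may not be there, since it leaves other values untouched.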
Using sed, if you don't know what the last character actually is:
$ grep company_name file.txt | cut -d '=' -f2 | sed 's/.$//'
"Abc Inc"
Don't abuse cats. Did you know that grep can read files, too?
The canonical approach would be this:
grep "company_name" file.txt | cut -d '=' -f 2 | sed -e 's/;$//'
The smarter approach would use a single perl or awk statement, which can filter and transform in one step. For example, something like this:
COMPANY_NAME=$( perl -ne '/company_name=(.*);/ && print $1' file.txt )
You don't have to chain so many tools. A single awk command does the job:
COMPANY_NAME=$(awk -F"=" '/company_name/{gsub(/;$/,"",$2) ;print $2}' file.txt)
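Here is a sketch of the single-awk approach run against a temporary file; the file contents are invented, and sub is used instead of gsub because an anchored pattern can match only once:

```shell
file=$(mktemp)
printf '%s\n' 'other_key=1;' 'company_name="Abc Inc";' > "$file"

# Split on "=", keep only lines mentioning company_name,
# strip a trailing semicolon from the value, and print it.
COMPANY_NAME=$(awk -F'=' '/company_name/ { sub(/;$/, "", $2); print $2 }' "$file")

printf '%s\n' "$COMPANY_NAME"
rm -f "$file"
```

Note that a value containing "=" itself would be cut short, since only the second field is printed.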
In Bash using only one external utility:
IFS='= ' read -r discard COMPANY_NAME <<< $(grep "company_name" file.txt)
COMPANY_NAME=${COMPANY_NAME/%?}
Assuming the quotation marks are actually part of the output, couldn't you just use the -o switch to return everything between the quote marks?
COMPANY_NAME="\"ABC Inc\";"; echo "$COMPANY_NAME" | grep -o "\".*\""
you can strip the beginnings and ends of a string by N characters using this bash construct, as someone said already
$ fred=abcdefg.rpm
$ echo ${fred:1:-4}
bcdefg
HOWEVER, this is not supported in older versions of bash (negative lengths need bash 4.2 or later), as I discovered just now while writing a script for a Red Hat EL6 install process. This is the sole reason for posting here.
A hacky way to achieve this is to use sed with extended regex like this:
$ fred=abcdefg.rpm
$ echo $fred | sed -re 's/^.(.*)....$/\1/g'
bcdefg
Some refinements to answer above. To remove more than one char you add multiple question marks. For example, to remove last two chars from variable $SRC_IP_MSG, you can use:
SRC_IP_MSG=${SRC_IP_MSG%??}
cat file.txt | grep "company_name" | cut -d '=' -f 2 | cut -d ';' -f 1
I am not finding that sed 's/;$//' works. It doesn't trim anything, though I'm wondering whether it's because the character I'm trying to trim off happens to be a "$". What does work for me is sed 's/.\{1\}$//'.

How to make the 'cut' command treat same sequential delimiters as one?

I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk
awk '{ printf $4; }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know if there is any way to deal with cut and several delimiters natively?
Try:
tr -s ' ' <text.txt | cut -d ' ' -f4
From the tr man page:
-s, --squeeze-repeats replace each input sequence of a repeated character
that is listed in SET1 with a single occurrence
of that character
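A minimal before-and-after sketch, using a made-up line with uneven spacing:

```shell
# Hypothetical column-aligned line with runs of spaces.
line='this   is     line     1 more text'

# Without squeezing, every single space starts a new (often empty) field.
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

# tr -s collapses each run of spaces to one before cut sees the line.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 4)

printf 'naive:    [%s]\n' "$naive"
printf 'squeezed: [%s]\n' "$squeezed"
```

Lines that begin with spaces would still produce an empty first field after squeezing, so the field numbers shift by one in that case.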
As you comment in your question, awk is really the way to go. To use cut is possible together with tr -s to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are at the Test section.
tr | cut
tr -s ' ' < file | cut -d' ' -f4
awk
awk '{print $4}' file
bash
while read -r _ _ _ myfield _
do
echo "fourth field: $myfield"
done < file
sed
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Tests
Given this file, let's test the commands:
$ cat a
this is line 1 more text
this is line 2 more text
this is line 3 more text
this is line 4 more text
tr | cut
$ cut -d' ' -f4 a
is
# it does not show what we want!
$ tr -s ' ' < a | cut -d' ' -f4
1
2 # this makes it!
3
4
$
awk
$ awk '{print $4}' a
1
2
3
4
bash
This reads the fields sequentially. Using _ as the variable name marks a field as a throwaway ("junk variable") that we want to ignore. This way, the 4th field ends up in its own variable, no matter how many spaces separate the fields.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
sed
This matches three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then it captures everything up to the next space as the 4th field, which is finally printed with \2.
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
shortest/friendliest solution
After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts for "cut on steroids".
cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.
One example, out of many, addressing this particular question:
$ cat text.txt
0 1 2 3
0 1 2 3 4
$ cuts 2 text.txt
2
2
cuts supports:
auto-detection of most common field-delimiters in files (+ ability to override defaults)
multi-char, mixed-char, and regex matched delimiters
extracting columns from multiple files with mixed delimiters
offsets from end of line (using negative numbers) in addition to start of line
automatic side-by-side pasting of columns (no need to invoke paste separately)
support for field reordering
a config file where users can change their personal preferences
great emphasis on user friendliness & minimalist required typing
and much more. None of which is provided by standard cut.
See also: https://stackoverflow.com/a/24543231/1296044
Source and documentation (free software): http://arielf.github.io/cuts/
This Perl one-liner shows how closely Perl is related to awk:
perl -lane 'print $F[3]' text.txt
However, the @F autosplit array starts at index $F[0] while awk fields start with $1
With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
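The empty-field behaviour is easy to see with a passwd-style record; the entry below is made up for the illustration:

```shell
# Hypothetical /etc/passwd-style entry with an empty fifth (comment) field.
entry='alice:x:1000:1000::/home/alice:/bin/sh'

# Two colons in a row mean an empty field, not one wide delimiter.
comment=$(printf '%s\n' "$entry" | cut -d ':' -f 5)
home=$(printf '%s\n' "$entry" | cut -d ':' -f 6)

printf 'comment: [%s]\n' "$comment"
printf 'home:    [%s]\n' "$home"
```

For this kind of data the behaviour is exactly what you want: if consecutive delimiters were merged, the home directory would wrongly show up as field 5.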

Resources