Remove certain characters or keywords from a TXT file in bash - bash

I was wondering if there is a way to remove certain keywords from a text file. Say I have a large file with lines saying
My name is John
My name is Peter
My name is Joe
Would there be a way to remove "My name is" without removing the entire line? Could this be done with grep somehow? I tried to find a solution but pretty much all of the ones I came across simply focus on deleting entire lines. Even if I could delete the text up until a certain column, that would fix my issue.

You need a text processing tool like sed or awk to do this, but not grep.
Try this:
sed 's/My name is//g' file
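A minimal sketch of the same sed idea that also consumes the space after the phrase, so the output has no leading blank. The function name strip_phrase and the sample input are illustrative assumptions, not part of the original post.

```shell
# Hypothetical helper: delete a leading "My name is " (including the
# trailing space) from every line read on standard input.
strip_phrase() {
    sed 's/^My name is //'
}

# Illustrative input; prints "John" and "Peter" on separate lines.
printf '%s\n' 'My name is John' 'My name is Peter' | strip_phrase
```

Anchoring the pattern with ^ limits the substitution to the start of the line, so the g flag is not needed here.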
EDIT
Purpose of grep:
$ man grep | grep -A2 DESCRIPTION
DESCRIPTION
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a
match to the given PATTERN. By default, grep prints the matching lines.

With GNU grep:
grep -Po "My name is\K.*" file
Output with a leading white space:
John
Peter
Joe
-P: Interpret PATTERN as a Perl regular expression
-o: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
\K: Remove matched part before \K.

Try one more simple grep:
grep -o '[^ ]*$' Input_file
-o prints only the matched part of the line; the regex matches the text from the last space to the end of the line.

An awk solution which first skips empty lines and then prints the last field.
awk '!/^$/{print $NF}' file
John
Peter
Joe

Using cut:
cut -d' ' -f4 input_file
GNU cut features a complement option, used to remove the area specified with -f. If the input_file had surnames such as "My name is John Doe", the previous code would print "John", and this would print "John Doe":
cut --complement -d' ' -f1-3 input_file
The cut binary is also the smallest of these utilities on disk (file size is only a rough proxy for memory use):
# these numbers will vary by *nix version and distro...
wc -c `which cut sed awk grep` | head -n -1 | sort -n
43224 /usr/bin/cut
109000 /bin/sed
215360 /bin/grep
662240 /usr/bin/awk
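If GNU cut is not available, an open-ended field range should give the same result as --complement for this case. The sketch below uses an illustrative sample line with a surname; the variable names are assumptions made for the example.

```shell
# Illustrative sample line with a surname (not from the original file).
line='My name is John Doe'

# -f4 keeps only the fourth space-separated field.
first_only=$(printf '%s\n' "$line" | cut -d' ' -f4)

# -f4- keeps the fourth field and everything after it, which matches
# what --complement -f1-3 produces, without needing GNU cut.
full_name=$(printf '%s\n' "$line" | cut -d' ' -f4-)

# Prints "John" and then "John Doe".
printf '%s\n' "$first_only" "$full_name"
```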

Related

shell script cut from variables

The file is like this
aaa&123
bbb&234
ccc&345
aaa&456
aaa$567
bbb&678
I want to output (lines that contain "aaa", and only the text after &):
123
456
I want to do it in a shell script. I am considering the following code:
#!/bin/bash
raw=$(grep 'aaa' 1.txt)
var=$(cut -f2 -d"&" "$raw")
echo $var
It gives me an error like
cut: aaa&123
aaa&456
aaa$567: No such file or directory
How can I fix it? And how can I cut (or grep, or similar) from an existing variable?
Many thanks!
With GNU grep:
grep -oP 'aaa&\K.*' file
Output:
123
456
\K: discard everything matched so far (here aaa&) from the reported match, so only the text after it is printed
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
Cyrus has my vote. An awk alternative if GNU grep is not available:
awk -F'&' 'NF==2 && $1 ~ /aaa/ {print $2}' file
Using & as the field separator, for lines with 2 fields (i.e. & must be present) and the first field contains "aaa", print the 2nd field.
The error in your script is that you are passing the grep output to cut as if it were a filename. What you want is this:
grep 'aaa.*&' file | cut -d'&' -f2
The pattern means "aaa appears before an &"
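To cut from an existing variable, as the question asks, one option is to send the variable's contents to cut on standard input, since cut reads stdin when no file is named. A minimal sketch, with the question's sample data held in a variable called data (an assumption made so the example is self-contained):

```shell
# Sample data from the question, held in a variable instead of 1.txt.
data='aaa&123
bbb&234
ccc&345
aaa&456
aaa$567
bbb&678'

# Keep only the lines where "aaa" appears before an "&".
raw=$(printf '%s\n' "$data" | grep 'aaa.*&')

# Pipe the variable into cut rather than passing it as a filename argument.
var=$(printf '%s\n' "$raw" | cut -d'&' -f2)

# Quote the expansion so the line breaks survive; prints 123 and 456.
echo "$var"
```

In bash, a here-string (cut -d'&' -f2 <<< "$raw") does the same thing without the printf pipe.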

bash, text file remove all text in each line before the last space

I have a file with a format like this:
First Last UID
First Middle Last UID
Basically, some names have middle names (and sometimes more than one middle name). I just want a file that only has UIDs.
Is there a sed or awk command I can run that removes everything before the last space?
awk
Print the last field of each line using awk.
The last field is indexed using the NF variable, which contains the number of fields on each line. We reference it with a dollar sign, so the resulting one-liner is simple.
awk '{ print $NF }' file
rs, cat & tail
Another way is to transpose the content of the file, then grab the last line and transpose again (this is fairly easy to see).
The resulting pipe is:
cat file | rs -T | tail -n1 | rs -T
cut & rev
Using cut and rev we could also achieve this goal by reversing the lines, cutting the first field and then reverse it again.
rev file | cut -d ' ' -f1 | rev
sed
Using sed we simply remove everything up to the last space with the regex ^.* [^ ]*$. This regex matches the beginning of the line ^, followed by any sequence of chars .* and a space. The rest is a sequence of non-spaces [^ ]* up to the end of the line $. The sed one-liner is:
sed 's/^.* \([^ ]*\)$/\1/' file
Where we capture the last part (in between \( and \)) and sub it back in for the entire line. \1 means the first group caught, which is the last field.
Notes
As Ed Norton cleverly pointed out we could simply not catch the group and remove the former part of the regex. This can be as easily achieved as
sed 's/.* //' file
Which is remarkably less complicated and more elegant.
For more information see man sed and man awk.
Using grep:
$ grep -o '[^[:blank:]]*$' file
UID
UID
-o tells grep to print only the matching part. The regex [^[:blank:]]*$ matches the last word on the line.
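The same result can be sketched with shell parameter expansion alone, with no external tools. The helper name last_word and the numeric UIDs are illustrative assumptions. A read loop like this is usually slower than awk or sed on large files, so it is best suited to small inputs or to values already held in variables.

```shell
# Hypothetical helper: print the last space-separated word of each input
# line using only shell parameter expansion.
last_word() {
    while IFS= read -r line; do
        # ${line##* } removes the longest prefix that ends in a space.
        printf '%s\n' "${line##* }"
    done
}

# Illustrative input; prints 1001 and 1002 on separate lines.
printf '%s\n' 'First Last 1001' 'First Middle Last 1002' | last_word
```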

Delete first characters off of a line in a file with awk or grep

I'm attempting to remove a certain pattern from a line, but not the entire line itself. An example would be:
Original:
user=dannyBoy
Desired:
dannyBoy
I have a file that is full of lines like that, so I was wondering how I would be able to cut a specific part of the text off, whether that be just removing the first five characters from the list or searching for the pattern "user=" and removing it.
There are many ways to do this:
cut -d'=' -f2- file
sed 's/^[^=]*=//' file
awk -F= '{print $2}' file #if just one = is present
cut sets a delimiter (-d'=') and then prints all the fields starting from the 2nd one (-f2-).
sed looks for all the content from the beginning up to and including the first = and removes it.
awk sets = as field separator and prints the second field.
Using ex:
echo user=dannyBoy | ex -s +"norm df=" +%p -cq! /dev/stdin
where ex is equivalent to vi -e/vim -e. It executes the vi command df= (delete up to and including the next =), then prints the buffer (%p).
If you have multiple lines like that, it is simpler to use substitution:
ex -s +"%s/^.*=//g" +%p -cq! foo.txt
To edit file in place, change -cq! to -cwq.
The command below deletes the first 5 characters:
$ echo "user=dannyboy" | cut -c 6-
You can use it on a file with cut -c 6- inputfilename as well.
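When the line is already in a shell variable, parameter expansion covers both approaches mentioned in the question (removing a known prefix, or removing a fixed number of characters). A minimal sketch; the variable names are assumptions made for illustration.

```shell
line='user=dannyBoy'

# ${line#user=} removes the literal prefix "user=" if it is present.
by_prefix=${line#user=}

# ${line#*=} removes the shortest prefix ending in "=", whatever the key is.
by_delim=${line#*=}

# ${line#?????} removes exactly five characters (each ? matches one).
by_count=${line#?????}

# All three print dannyBoy.
printf '%s\n' "$by_prefix" "$by_delim" "$by_count"
```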

Extract all characters after a match - shell script

I need to extract all characters after a pattern match.
For example,
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I can't use Perl or Jython for this because of some restrictions.
I tried with grep, but this is as far as I got:
echo "NAME=John" |grep -o -P '=.{0,}'
You were pretty close:
grep -oP '(?<=\w=)\w+' file
does the job.
Explanation
It looks for a word that follows word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ means: match only \w+ following word=. More info in "Regex tutorial - Lookahead" and in the explanation by sudo_O.
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16
One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
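In sed, we remove everything from the beginning of the line up to the = and print the rest.
In awk, we split the line into two parts separated by =; $0=$2 then replaces the whole line with the second part, and the line is printed because the assignment evaluates as true. Note that a line whose value is empty or 0 will not be printed, since the assignment is then false.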
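If neither grep -P nor awk is an option, a plain shell read loop can split each line at the first =. This is only a sketch; the helper name values_after_equals is hypothetical.

```shell
# Hypothetical helper: print the value part of KEY=VALUE lines.
# With IFS set to "=", read puts the text before the first "=" in key
# and the remainder of the line in value.
values_after_equals() {
    while IFS='=' read -r key value; do
        printf '%s\n' "$value"
    done
}

# Sample lines from the question; prints John and 16.
printf '%s\n' 'NAME=John' 'Age=16' | values_after_equals
```

Unlike the awk $0=$2 form, this prints a line for every input line, including ones whose value is 0.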

Display all fields except the last

I have a file as shown below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line. The delimiter is . and the number of fields is not constant.
Can anybody help me with an awk or sed solution? I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexp; on some versions of sed it is -E, so check man sed. If your version of sed has no such flag, just escape the parentheses:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution does a greedy match up to the last . and captures everything before it, then replaces the whole line with the captured part (n-1 fields). Use the -i option if you want the changes to be written back to the file.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution just simply prints n-1 fields, to store the changes back to the file use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
With cut on the reversed string
cat yourFile | rev | cut -d "." -f 2- | rev
If you want to keep the trailing ".", use the command below:
awk '{gsub(/[^.]*$/,"");print}' your_file
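For completeness, the last field can also be dropped with shell parameter expansion, which may be handy when the value is already in a variable. A minimal sketch using the question's sample lines; the helper name drop_last_field is an assumption.

```shell
# Hypothetical helper: drop the last dot-separated field from each line.
drop_last_field() {
    while IFS= read -r line; do
        # ${line%.*} removes the shortest suffix that starts with a dot.
        printf '%s\n' "${line%.*}"
    done
}

# Prints 1.2.3.4, sanma.nam and c.d.b on separate lines.
printf '%s\n' '1.2.3.4.ask' 'sanma.nam.sam' 'c.d.b.test' | drop_last_field
```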
