Delete first characters off of a line in a file with awk or grep - bash

I'm attempting to remove a certain pattern from a line, but not the entire line itself. An example would be:
Original:
user=dannyBoy
Desired:
dannyBoy
I have a file that is full of lines like that, so I was wondering how I would be able to cut a specific part of the text off, whether that be just removing the first five characters from the list or searching for the pattern "user=" and removing it.

There are many ways to do this:
cut -d'=' -f2- file
sed 's/^[^=]*=//' file
awk -F= '{print $2}' file #if just one = is present
cut sets a delimiter (-d'=') and then prints all the fields starting from the 2nd one (-f2-).
sed matches everything from the beginning of the line up to and including the first = and removes it.
awk sets = as the field separator and prints the second field.
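The three approaches only differ when a value itself contains an =. A minimal sketch, assuming made-up sample data written to a temporary file (the sed expression here is the form that also consumes the =):

```shell
# Made-up sample data; the second value contains an extra "=".
tmp=$(mktemp)
printf '%s\n' 'user=dannyBoy' 'user=a=b' > "$tmp"

cut_out=$(cut -d'=' -f2- "$tmp")        # keeps everything after the first "="
sed_out=$(sed 's/^[^=]*=//' "$tmp")     # deletes up to and including the first "="
awk_out=$(awk -F= '{print $2}' "$tmp")  # prints only the second field

printf 'cut:\n%s\nsed:\n%s\nawk:\n%s\n' "$cut_out" "$sed_out" "$awk_out"
rm -f "$tmp"
```

For user=a=b, cut and sed both keep a=b, while awk prints only a, which is why the awk form is safe only when a single = is present.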

Using ex:
echo user=dannyBoy | ex -s +"norm df=" +%p -cq! /dev/stdin
where ex is equivalent to vi -e/vim -e. It runs the vi normal-mode command df= (delete up to and including the next =) and then prints the buffer (%p).
If you have multiple lines like that, a substitution is simpler:
ex -s +"%s/^.*=//g" +%p -cq! foo.txt
To edit file in place, change -cq! to -cwq.

The command below deletes the first 5 characters:
$ echo "user=dannyboy" | cut -c 6-
You can use it on a file with cut -c 6- inputfilename as well.
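Counting characters removes the first five characters of every line, whether or not they spell user=. If only lines that really carry that prefix should change, anchoring a sed substitution on the literal prefix is one option. A small sketch with made-up input:

```shell
# Made-up input: the second line does not start with "user=".
input=$(printf '%s\n' 'user=dannyboy' 'admin=root')

by_count=$(printf '%s\n' "$input" | cut -c 6-)          # drops 5 characters from every line
by_prefix=$(printf '%s\n' "$input" | sed 's/^user=//')  # only touches lines starting with user=

printf '%s\n' "$by_count"
printf '%s\n' "$by_prefix"
```

With cut, admin=root is truncated to =root; with the anchored sed it is left alone.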

Related

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes.
I have this in a file
version: '8.x-1.0-alpha1'
and I would like the output to look like this (the version numbers can vary):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I would also like to remove 8.x-, so I edited the command to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command works on Linux but not on a Mac. How can I change it so that it also works on a Mac?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"
If you just want that piece of information from the file, and nothing else, you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F"'": Here we tell awk to set the field separator FS to a single quote '. This means that every line is split into fields $1, $2, ..., $NF, with a ' between each pair of fields. We can now reference these fields with $1 for the first field, $2 for the second, and so on up to $NF, where NF is the total number of fields on the line.
/version/{print $2}: This is the condition-action pair.
condition: /version/: The condition reads: if a substring of the current record/line matches the regular expression /version/, then do the action. Here this simply means: if the current line contains the substring version.
action: {print $2}: If the condition is satisfied, print the second field. In this case, the second field is what the OP requests.
There are now several things that can be done.
Improve the condition to /^version:/ && NF==3, which reads: if the current line starts with the substring version: and has 3 fields, then do the action.
If you only want the first occurrence, you can tell awk to exit immediately after the first match by updating the action to {print $2; exit}.
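Both refinements can be combined in one command. A minimal sketch, assuming made-up input with two version lines and one unrelated quoted line; the condition is written as /^version:/ to match the sample line, which has no space before the colon:

```shell
# Made-up file content for illustration.
data=$(printf '%s\n' "name: 'demo'" "version: '8.x-1.0-alpha1'" "version: '8.x-2.0'")

# -F"'" splits on single quotes. The condition keeps only lines that start
# with "version:" and split into exactly 3 fields; exit stops at the first hit.
first_version=$(printf '%s\n' "$data" | awk -F"'" '/^version:/ && NF==3 {print $2; exit}')

printf '%s\n' "$first_version"
```

The name line is skipped by the anchored pattern, and the second version line is never reached because of exit.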
I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.
Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything until the next '
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
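The \s shorthand is a GNU extension and is not guaranteed to be understood by other sed implementations, such as the BSD sed shipped with macOS. A sketch of the same substitution using the POSIX character class [[:space:]] instead, which should behave the same on both:

```shell
line="version: '8.x-1.0-alpha1'"

# [[:space:]]* replaces \s*; the rest is the same substitution.
full=$(printf '%s\n' "$line" | sed -n "s#version:[[:space:]]*'\(.*\)'#\1#p")

# Variant that also drops the leading "8.x-" (the dot is escaped so it matches literally).
short=$(printf '%s\n' "$line" | sed -n "s#version:[[:space:]]*'8\.x-\(.*\)'#\1#p")

printf '%s\n%s\n' "$full" "$short"
```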
When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2
grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print things you don't want: it prints the quoted part of every line that contains 2 single quotes ('). And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

bash, text file remove all text in each line before the last space

I have a file with a format like this:
First Last UID
First Middle Last UID
Basically, some names have middle names (and sometimes more than one middle name). I just want a file that only has UIDs.
Is there a sed or awk command I can run that removes everything before the last space?
awk
Print the last field of each line using awk.
The last field is indexed using the NF variable which contains the number of fields for each line. We index it using a dollar sign, the resulting one-liner is easy.
awk '{ print $NF }' file
rs, cat & tail
Another way is to transpose the content of the file, then grab the last line and transpose again (this is fairly easy to see).
The resulting pipe is:
cat file | rs -T | tail -n1 | rs -T
cut & rev
Using cut and rev we could also achieve this goal by reversing the lines, cutting the first field and then reverse it again.
rev file | cut -d ' ' -f1 | rev
sed
Using sed we simply remove everything up to the last space with the regex ^.* [^ ]*$. This regex matches the beginning of the line ^, followed by any sequence of chars .* and a space. The rest is a sequence of non-spaces [^ ]* up to the end of the line $. The sed one-liner is:
sed 's/^.* \([^ ]*\)$/\1/' file
Where we capture the last part (in between \( and \)) and sub it back in for the entire line. \1 means the first group caught, which is the last field.
Notes
As Ed Norton cleverly pointed out we could simply not catch the group and remove the former part of the regex. This can be as easily achieved as
sed 's/.* //' file
Which is remarkably less complicated and more elegant.
For more information see man sed and man awk.
Using grep:
$ grep -o '[^[:blank:]]*$' file
UID
UID
-o tells grep to print only the matching part. The regex [^[:blank:]]*$ matches the last word on the line.
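As a quick sanity check, the awk, sed and grep variants can be run side by side. A sketch assuming made-up names and numeric UIDs, with single spaces and no trailing whitespace:

```shell
# Made-up names; the second line has a middle name.
names=$(printf '%s\n' 'First Last 1001' 'First Middle Last 1002')

awk_ids=$(printf '%s\n' "$names" | awk '{ print $NF }')          # last field
sed_ids=$(printf '%s\n' "$names" | sed 's/.* //')                # delete through the last space
grep_ids=$(printf '%s\n' "$names" | grep -o '[^[:blank:]]*$')    # trailing run of non-blanks

printf '%s\n' "$awk_ids"
```

On this input all three produce the same two UIDs. They can diverge on lines with trailing spaces, where awk still returns the last field but sed 's/.* //' leaves an empty line.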

printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
That's the code I have so far, but it only prints the first character of each line.
Thanks
To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
Printing the last two words is not possible with cut alone, AFAIK, because it can only count from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
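A short sketch of both commands on made-up input, assuming words separated by exactly one space as in the question:

```shell
text=$(printf '%s\n' 'alpha beta gamma' 'delta epsilon')

# cut counts fields from the left, using a single space as the delimiter.
first=$(printf '%s\n' "$text" | cut -d' ' -f1)

# awk can count from the right via NF, the number of fields on the line.
last_two=$(printf '%s\n' "$text" | awk '{print $(NF-1), $NF}')

printf '%s\n' "$first" "$last_two"
```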
you can try
awk '{print $1}' your_file
read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
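The two read variants can be tried on a throwaway file. This is only a sketch: the file contents are made up, the variable rest plays the role of _ above, and -r is added so that backslashes in the input are not interpreted:

```shell
tmp=$(mktemp)
printf '%s\n' 'alpha beta gamma' 'delta epsilon' > "$tmp"

# First word of the first line only; "rest" soaks up the remainder of the line.
read -r word rest < "$tmp"

# First word of every line.
all_words=$(while read -r w rest; do printf '%s\n' "$w"; done < "$tmp")

printf '%s\n' "$word" "$all_words"
rm -f "$tmp"
```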
You can use:
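cut -d\  -f1 file
Where:
-d is the delimiter (here using \ followed by a space to pass a literal space)
-f is the field selector
Notice that there are two spaces after the \: the first is the escaped delimiter and the second separates it from -f1.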
-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

Display all fields except the last

I have a file as show below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line, the delimiter is . and the number of fields are not constant.
Can anybody help me with an awk or sed to find out the solution. I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexp; in some versions of sed it is -E instead, so check man sed. If your version of sed doesn't have a flag for this, just escape the parentheses:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution does a greedy match up to the last . and captures everything before it, then replaces the whole line with only the captured part (n-1 fields). Use the -i option if you want the changes to be stored back to the file.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution just simply prints n-1 fields, to store the changes back to the file use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
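The two sed approaches can be compared directly. A sketch on the sample data from the question, plus one made-up line without any dot to show that both leave such a line untouched:

```shell
lines=$(printf '%s\n' '1.2.3.4.ask' 'sanma.nam.sam' 'nodots')

# Anchored at the end: delete the final dot and whatever follows it.
anchored=$(printf '%s\n' "$lines" | sed 's/\.[^.]*$//')

# Greedy capture: \(.*\) takes as much as it can, so \. lands on the last dot.
greedy=$(printf '%s\n' "$lines" | sed 's/\(.*\)\..*/\1/')

printf '%s\n' "$anchored"
```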
With cut on the reversed string
rev yourFile | cut -d "." -f 2- | rev
If you want to keep the trailing ".", use the following:
awk '{gsub(/[^\.]*$/,"");print}' your_file

Delete all lines beginning with a # from a file

All of the lines with comments in a file begin with #. How can I delete all of the lines (and only those lines) which begin with #? Other lines containing #, but not at the beginning of the line should be ignored.
This can be done with a sed one-liner:
sed '/^#/d'
This says, "find all lines that start with # and delete them, leaving everything else."
I'm a little surprised nobody has suggested the most obvious solution:
grep -v '^#' filename
This solves the problem as stated.
But note that a common convention is for everything from a # to the end of a line to be treated as a comment:
sed 's/#.*$//' filename
though that treats, for example, a # character within a string literal as the beginning of a comment (which may or may not be relevant for your case) (and it leaves empty lines).
A line starting with arbitrary whitespace followed by # might also be treated as a comment:
grep -v '^ *#' filename
if whitespace is only spaces, or
grep -v '^[  ]*#' filename
where the two spaces are actually a space followed by a literal tab character (type "control-v tab").
For all these commands, omit the filename argument to read from standard input (e.g., as part of a pipe).
The opposite of Raymond's solution:
sed -n '/^#/!p'
"don't print anything, except for lines that DON'T start with #"
You can edit your file directly with
sed -i '/^#/ d' filename
If you also want to delete comment lines that start with some whitespace, use
sed -i '/^\s*#/ d' filename
Usually you want to keep the first line of your script if it is a shebang, so sed should not delete lines starting with #!. It should also delete lines that contain only a hash and no text. Putting it all together:
sed -i '/^\s*\(#[^!].*\|#$\)/d' filename
To be compatible with all sed variants, you need to add a backup extension to the -i option:
sed -i.bak '/^[[:space:]]*#/ d' "$file"
rm -f "$file.bak"
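The shebang-preserving expression above relies on GNU extensions (\s and \| in a basic regular expression). A sketch of one way to get the same effect with only POSIX constructs, by splitting the alternation into two -e expressions; it writes to standard output, and the input is made up:

```shell
script=$(printf '%s\n' '#!/bin/sh' '# comment' '#' 'echo hi' '  # indented')

# First expression: "#" followed by any character except "!" (keeps the shebang).
# Second expression: a line consisting only of optional whitespace and a bare "#".
kept=$(printf '%s\n' "$script" |
  sed -e '/^[[:space:]]*#[^!]/d' -e '/^[[:space:]]*#$/d')

printf '%s\n' "$kept"
```

Only the shebang line and echo hi remain.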
You can use the following for an awk solution:
awk '!/^#/' inputfile
This answer builds upon the earlier answer by Keith.
grep -Ev "^[[:blank:]]*#" should filter out comment lines.
grep -Ev "^[[:blank:]]*(#|$)" should filter out both comments and empty lines, as is frequently useful.
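The difference between the filters is easiest to see on a small file. A sketch with made-up content that contains a full-line comment, an indented comment, a # inside a string, and an empty line:

```shell
tmp=$(mktemp)
printf '%s\n' '# full-line comment' 'echo "a # b"' '   # indented comment' '' 'echo done' > "$tmp"

strict=$(grep -v '^#' "$tmp")                      # only lines whose first character is #
loose=$(grep -v '^[[:blank:]]*#' "$tmp")           # also comments preceded by blanks
no_blank=$(grep -Ev '^[[:blank:]]*(#|$)' "$tmp")   # comments and empty lines

printf '%s\n' "$no_blank"
rm -f "$tmp"
```

In all three cases the line echo "a # b" survives, because the # is not at the start of the line.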
For information about [:blank:] and other character classes, refer to https://en.wikipedia.org/wiki/Regular_expression#Character_classes.
If you want to delete from the file starting with a specific word, then do this:
grep -v '^pattern' currentFileName > newFileName && mv newFileName currentFileName
This removes all the lines starting with the pattern by writing the remaining content into a new file and then renaming that file over the source/current file.
You also might want to remove empty lines as well
sed -E '/(^$|^#)/d' inputfile
Delete all empty lines and also all lines starting with a # after any spaces:
sed -E '/^$|^\s*#/d' inputfile
For example, all 3 of the following lines are deleted (the leading numbers are just line numbers, not file content):
1. # first comment
2.
3. # second comment
After testing the command above, you can use option -i to edit the input file in place.
Just this!
Here it is with a loop over all files with some extension:
for i in *.filename_extension # let the shell expand the glob instead of parsing ls output
do
echo "$i"
sed -i '/^#/d' "$i"
done

Resources