I want to use grep to move the last 7 characters of each line to another file [duplicate] - bash

This question already has answers here:
Is there a cleaner way of getting the last N characters of every line?
(4 answers)
Closed 2 years ago.
I have a text file with thousands of lines. The last 7 characters on each line are a mix of letters and numbers (eg AAP8945 or GGR6645). I want to save these in a separate file.
Excuse the noob question, but I can't work it out.

With GNU grep
Assuming you have GNU grep:
grep -o -E '.{7}$' input > output
The -o option means 'output only what matches' (rather than the whole line). This is the key feature which makes it possible to use grep for the job. Without support for -o (or an equivalent option), grep is the wrong tool for the job.
The -E option enables extended regular expressions, so .{7} matches any character exactly 7 times and $ anchors the match at the end of the line.
Without GNU grep
If you don't have GNU grep (or a compatible grep with the -o option or equivalent), then you can use sed instead (GNU or any other variant):
sed -e 's/.*\(.\{7\}\)$/\1/' input > output
This matches the start of the line (.*) and captures the last 7 characters (\(…\)) of the line; it replaces the whole with the captured part, and prints the result. If your variant of sed has extended regular expressions (usually -E or sometimes -r), then:
sed -E -e 's/.*(.{7})$/\1/' input > output
The difference is in the number of backslashes needed.
Both of those will print any short lines in their entirety. If those should be omitted, use:
sed -n -e 's/.*\(.\{7\}\)$/\1/p' input > output
sed -n -E -e 's/.*(.{7})$/\1/p' input > output
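To see how the variants above treat short lines, here is a minimal sketch. The sample data is made up for illustration (two 10-character lines plus one line shorter than 7 characters), and the variable names are hypothetical.

```shell
# Hypothetical sample: two long lines and one line shorter than 7 characters.
sample=$(printf '%s\n' 'xxxAAP8945' 'yyyGGR6645' 'short')

# grep -o prints only the matched text; the short line does not match at all.
with_grep=$(printf '%s\n' "$sample" | grep -o -E '.{7}$')

# sed without -n leaves non-matching lines untouched and still prints them.
with_sed=$(printf '%s\n' "$sample" | sed -e 's/.*\(.\{7\}\)$/\1/')

# sed -n with the p flag prints only lines where the substitution happened.
with_sed_n=$(printf '%s\n' "$sample" | sed -n -e 's/.*\(.\{7\}\)$/\1/p')

printf 'grep -o:\n%s\n' "$with_grep"
printf 'sed:\n%s\n' "$with_sed"
printf 'sed -n:\n%s\n' "$with_sed_n"
```

The grep -o and sed -n forms should agree with each other; only the plain sed form lets the short line through.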

grep -Eo '.{7}$'
Or without grep:
rev input|cut -c -7|rev >output
The double rev is necessary because cut cannot count character positions from the right end of a line.

Related

Text processing in bash - extracting information between multiple HTML tags and outputting it into CSV format [duplicate]

I can't figure how to tell sed dot match new line:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is an option for this.
You can use h/H(hold), g/G(get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
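The hold-space one-liner above is terse, so here is the same script with each step annotated. It is a sketch rather than a canonical recipe; the variable name is hypothetical, and a ; is added before the closing } for portability across sed variants.

```shell
# 1h     : on line 1, copy the pattern space into the hold space
# 1!H    : on every later line, append a newline plus that line to the hold space
# ${...} : on the last line, g copies the hold space back into the pattern space,
#          then s///p runs on the whole text and prints the result
joined=$(printf 'one\ntwo\nthree\n' | sed -n '1h;1!H;${g;s/one.*two/one/p;}')

printf '%s\n' "$joined"
```

By the time the substitution runs, the pattern space holds all three lines separated by newlines, so .* can span the line break.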
Maybe you should try vim
:%s/one\_.*two/one/g
If you use GNU sed, a plain . can match any character, including line break characters. The documentation describes it as:
.
Matches any character, including newline.
All you need to use is a -z option:
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the input record separator to the NUL character, so a text file that contains no NUL bytes is read as a whole rather than line by line (-0777 is the explicit slurp mode); -i enables in-place file modification; the s flag makes . match any character, including line breaks; and .*? matches any 0 or more characters, as few as possible, because *? is non-greedy. The -CSD -Mutf8 part makes sure your input is decoded and your output is re-encoded correctly.
You can use python this way:
$ echo -e "one\ntwo\nthree" | python3 -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print(s, end="")'
one
three
$
This reads all of Python's standard input (sys.stdin.read()), then substitutes "one" for "one.*two" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (end="" prevents print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including the \n) using a dot . but usually it has already stripped the \n off as part of the cycle, so it is no longer present in the pattern space to be matched.
Only certain commands (N,H and G) preserve newlines in the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
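A small sketch of the G and N commands described above, using made-up input and hypothetical variable names. Note that command substitution strips trailing newlines, so the final blank line produced by sed G is not visible in the captured value.

```shell
# G appends a newline plus the (empty) hold space after every line.
doubled=$(printf 'a\nb\n' | sed G)

# N appends the next line to the pattern space, joined by a newline,
# which the substitution can then match as \n.
paired=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

printf '%s\n' "$doubled"
printf '%s\n' "$paired"
```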
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates line by line, so there is no option that makes the dot match a newline directly. However, there are some tricks that achieve the same effect. You can use a loop (of sorts) to put all the text into the pattern space and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newline and all other characters. See the example below:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
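The :a;N;$!ba loop can be easier to follow when each command is written as its own -e expression. The sketch below uses it to join lines with commas instead of deleting text; that task and the variable name are chosen only for illustration.

```shell
# :a    defines a label named a
# N     appends the next input line to the pattern space (with a newline between)
# $!ba  branches back to a unless the last line has been read
# After the loop the pattern space holds the whole input, so s/\n/,/g sees every newline.
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

printf '%s\n' "$joined"
```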

How to grep information?

What I have:
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
I'd like to filter this text and finally get the following list as .txt file:
user653434
user9659333
It's important to get the names without "#" sign.
Thx for help ;)
Using grep -P (requires GNU grep):
$ grep -oP '(?<=#)\w+' File
user653434
user9659333
-o tells grep to print only the match.
-P tells grep to use Perl-style regular expressions.
(?<=#) tells grep that # must precede the match but the # is not included in the match.
\w+ matches one or more word characters. This is what grep will print.
To change the file in place with grep:
grep -oP '(?<=#)\w+' File >tmp && mv tmp File
Using sed
$ sed -En 's/^#([[:alnum:]]+).*/\1/p' File
user653434
user9659333
And, to change the file in place:
sed -En -i.bak 's/^#([[:alnum:]]+).*/\1/p' File
-E tells sed to use the extended form of regular expressions. This reduces the need to use escapes.
-n tells sed not to print anything unless we explicitly ask it to.
-i.bak tells sed to change the file in place while leaving a backup file with the extension .bak.
The leading s in s/^#([[:alnum:]]+).*/\1/p tells sed that we are using a substitute command. The command has the typical form s/old/new/ where old is a regular expression and sed replaces old with new. The trailing p is an option to the substitute command: the p tells sed to print the resulting line.
In our case, the old part is ^#([[:alnum:]]+).*. Starting from the beginning of the line, ^, this matches # followed by one or more alphanumeric characters, ([[:alnum:]]+), followed by anything at all, .*. Because the alphanumeric characters are placed in parens, this is saved as a group, denoted \1.
The new part of the substitute command is just \1, the alphanumeric characters from above which comprise the user name.
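As a quick check of the sed explanation above, the sketch below runs the same command on made-up input that also contains a # in the middle of a line. The extra line and the unanchored variant are assumptions added for illustration, not part of the original question.

```shell
# Hypothetical input: one name at the start of a line, one in the middle.
sample=$(printf '%s\n' 'test' 'more text' '#user653434 text and so' 'see #other here')

# Anchored with ^: only a # at the very start of the line counts.
anchored=$(printf '%s\n' "$sample" | sed -En 's/^#([[:alnum:]]+).*/\1/p')

# Unanchored: a leading .* lets the # appear anywhere on the line
# (being greedy, it picks the last # when a line has several).
anywhere=$(printf '%s\n' "$sample" | sed -En 's/.*#([[:alnum:]]+).*/\1/p')

printf '%s\n' "$anchored"
printf '%s\n' "$anywhere"
```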
With GNU grep:
grep -Po '^#\K[^ ]*' file
Output:
user653434
user9659333
See: The Stack Overflow Regular Expressions FAQ

sed combine two search and replace [duplicate]

This question already has answers here:
Combining two sed commands
(2 answers)
Closed 1 year ago.
I am currently making a command that grabs information from iwconfig, greps a certain line, cuts a portion and then runs two sed search and replace functions so I can pipe its output elsewhere. The command currently is as follows:
iwconfig wlan0 | grep ESSID | cut -c32-50 | sed 's/ //g' | sed 's/"//g'
The output comes out as intended, removing whitespace and "'s, but I am wondering if there is a way to condense my search and replace into a single command, preferably with an and / or operator. Is there a way to do this? And how would the sed command be written if so? Thanks!
You haven't shown what iwconfig produces in your case, but, on my system, the following successfully extracts the ESSID:
iwconfig wlan0 | sed -n 's/.*ESSID://p'
If there really are spaces and quotes that need to be removed, then try:
iwconfig wlan0 | sed -n 's/[ "]//g; s/.*ESSID://p'
How it works
-n
This tells sed not to print any line unless we explicitly ask it to.
s/[ "]//g
This removes spaces and double-quotes.
s/.*ESSID://p
This removes everything up to and including ESSID:. If a substitution is made, meaning that this line contains ESSID:, then print it.
Example
$ echo '"something" ESSID:"my id"' | sed -n 's/[ "]//g; s/.*ESSID://p'
myid
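For comparison, here are three ways to write the two substitutions, which should all give the same result. The input line is a made-up stand-in for iwconfig output, not real output from the tool, and the variable names are hypothetical.

```shell
# Hypothetical stand-in for the relevant part of an iwconfig line.
raw='          ESSID:"my id"  '

# Original approach: two separate sed processes.
two_cmds=$(printf '%s\n' "$raw" | sed 's/ //g' | sed 's/"//g')

# One sed process, two expressions.
one_script=$(printf '%s\n' "$raw" | sed -e 's/ //g' -e 's/"//g')

# One expression: a bracket expression matches either character.
one_class=$(printf '%s\n' "$raw" | sed 's/[ "]//g')

printf '%s\n' "$two_cmds" "$one_script" "$one_class"
```

The bracket expression is probably the closest thing to the "or operator" the question asks about, and it does not rely on the GNU-only \| syntax.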
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
sed 's/ \|"//g'
should work
With GNU awk for gensub():
iwconfig wlan0 | awk '/ESSID/{print gensub(/[ "]/,"","g",substr($0,32,19))}'
There MAY be a simpler method but without sample input/output (i.e. output from iwconfig and what you want the script to output) I'm not going to guess...

Grep for URL parsing - bash script programming

I am trying to learn some bash scripting and I can't understand how to use grep to split a URL, for example:
blabla1.com
blabla2.gov
blabla3.fr
I just want to keep com, gov and fr (without the '.' character) and ignore what's before the '.'.
Thanks in advance ..
Grep is a tool for matching text. You need something else if you want to transform text. If you have the values in question in a bash variable, then what you ask is pretty easy:
authority=blabla.com
# Here's the important bit:
domain=${authority/*./}
echo $domain
The funny syntax in the middle evaluates to the result of a pattern substitution on the value of the variable authority.
If you're trying to do this on lines of a file, then the sed program is your friend:
sed 's/.*\.//' < input.file
This is again a pattern substitution, but sed uses regular expression patterns, whereas bash uses shell glob patterns.
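A related shell-only sketch: the POSIX prefix-removal expansion ${var##pattern} strips the longest leading match of a glob pattern, which gives the same result here without the bash-specific ${var/pattern/} form. The loop and the tlds variable are made up for illustration.

```shell
tlds=""
for authority in blabla1.com blabla2.gov blabla3.fr; do
  # ${authority##*.} removes the longest prefix matching the glob "*.",
  # that is, everything up to and including the last dot.
  tlds="$tlds${tlds:+ }${authority##*.}"
done

printf '%s\n' "$tlds"
```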
grep -E -o '[^.]+$' < input
-o instructs grep to print only the matching part of the line
-E switches on extended regexp which is needed for + quantifier
[^.]+$ means one or more characters that are not a dot, anchored at the end of the line
Try this way:
grep -o -E '[a-z]{2,3}\b' input > output
-o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
$ cat input
blabla1.com
blabla2.gov
blabla3.fr
$ cat output
com
gov
fr
$ cut -d. -f2 file
com
gov
fr
If that's not all you need, post some more truly representative input and expected output so we can help you find the right solution.
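One case where more representative input matters: cut -d. -f2 picks the second dot-separated field, which is only the last one when a name contains exactly one dot. The sketch below compares it with awk's $NF (the last field) on a made-up host name with two dots; the awk variant is an alternative added here, not something from the answers above.

```shell
# Hypothetical input: one name with a single dot, one with two.
hosts=$(printf '%s\n' 'blabla1.com' 'www.blabla2.gov')

# Second field: correct only when there is exactly one dot.
second_field=$(printf '%s\n' "$hosts" | cut -d. -f2)

# Last field: works regardless of how many dots precede it.
last_field=$(printf '%s\n' "$hosts" | awk -F. '{ print $NF }')

printf 'cut:\n%s\n' "$second_field"
printf 'awk:\n%s\n' "$last_field"
```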

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match on the second\nthird sequence of characters, i.e. something containing a newline within it.
Since grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
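One possible simpler approach, swapping in awk instead of the grep/perl/sed pipeline: remember the previous line and print the pair when "second" is immediately followed by "third". This is a sketch that assumes exact whole-line matches; the sample input and the variable name are made up.

```shell
# prev holds the previous input line; the pair is printed only when
# a line equal to "third" directly follows a line equal to "second".
pair=$(printf '%s\n' first second third second other |
  awk 'prev == "second" && $0 == "third" { print prev; print } { prev = $0 }')

printf '%s\n' "$pair"
```

The later "second" in the sample is followed by "other", so it is not printed.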
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
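To illustrate the downside mentioned above, this sketch compares plain grep -v with grep -vx, where the POSIX -x option requires the pattern to match the whole line. The sample input, including the line first-class, is made up.

```shell
sample=$(printf '%s\n' first second first-class third)

# Drops every line that contains "first" anywhere.
loose=$(printf '%s\n' "$sample" | grep -v first)

# Drops only lines that are exactly "first".
exact=$(printf '%s\n' "$sample" | grep -vx first)

printf 'grep -v:\n%s\n' "$loose"
printf 'grep -vx:\n%s\n' "$exact"
```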
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get last two lines
tail -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
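A short sketch of selecting lines by position on the three-line sample from the question. It uses tail -n with an explicit count, tail -n +2 to start output at line 2, and sed -n with a line range; the variable names are hypothetical.

```shell
sample=$(printf '%s\n' first second third)

# Last two lines.
last_two=$(printf '%s\n' "$sample" | tail -n 2)

# Everything from line 2 onward, i.e. all but the first line.
skip_first=$(printf '%s\n' "$sample" | tail -n +2)

# Lines 2 through 3 only.
range=$(printf '%s\n' "$sample" | sed -n '2,3p')

printf '%s\n' "$last_two"
```

On this particular input all three commands select the same two lines.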
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line-oriented. You're going to have to use Perl, sed, or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
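A runnable sketch of that pipeline on the sample from the question, reading from a pipe instead of a file. It assumes a grep that supports the -A and -B context options, as GNU and BSD grep do; the variable name is hypothetical.

```shell
# -A1 keeps one line after each "second"; -B1 keeps one line before each "third".
pair=$(printf '%s\n' first second third | grep -A1 'second' | grep -B1 'third')

printf '%s\n' "$pair"
```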
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This will print the matching line plus one line before and after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources