How to use grep to extract a text between two patterns - bash

I am working on a bash script.
There is a file whose contents look like this:
Hello /hi/12349/Jane?
Hello /hi/123?=Jane/
Hello /hey/123450/Jane
Hello /hi/123/Jane
I want to extract the digits between "Hello /hi/" and the next "/", but only when nothing except digits appears between them.
So in this case the result I want is:
12349
123
I have tried this:
cat file.txt | grep -o -P '(?>=Hello \/hi\/).*(?=\/)'
But my attempt printed everything after "Hello /hi" :(

You can use sed for that:
sed -nE 's|^Hello /hi/([0-9][0-9]*)/.*|\1|p' file
12349
123

With GNU grep:
grep -Po '(?<=^Hello /hi/)[0-9]+(?=/)' file.txt
Output:
12349
123
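If GNU grep's -P option is not available, a minimal pure-bash sketch can apply the same rule with [[ =~ ]] and BASH_REMATCH. It assumes bash, and the function name extract_hi_ids is hypothetical:

```shell
#!/usr/bin/env bash
# Print the digits between "Hello /hi/" and the next "/", but only when
# that field is made of digits alone (extract_hi_ids is a hypothetical name).
extract_hi_ids() {
  local line
  # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
  local re='^Hello /hi/([0-9]+)/'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

# Sample input from the question; prints 12349 and 123.
printf '%s\n' 'Hello /hi/12349/Jane?' 'Hello /hi/123?=Jane/' \
  'Hello /hey/123450/Jane' 'Hello /hi/123/Jane' | extract_hi_ids
```

Usage: extract_hi_ids < file.txt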

Extracting things seems like a case to use a Stream EDitor.
sed -n '/^Hello \/hi\/\([0-9][0-9]*\)\/.*/s//\1/p' file.txt

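Another possibility is an extended regex (-E)...
grep -o -E "[^][a-zA-Z /?=]{1,8}" file.txt
...which uses a bracket expression of unwanted characters: the leading ^ negates it, so the pattern matches runs of 1 to 8 characters that are not ], [, a letter, a space, /, ? or =. Because it checks neither the "Hello /hi/" prefix nor the closing "/", it also prints 123450 and the 123 from the second line.
Output: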
12349
123
123450
123

Related

How do I replace text using a variable in a shell script

I have a variable with a bunch of data.
text="ABCDEFGHIJK"
and a file, garbage.txt, that contains:
//iiuhdsfiuhdsihf]sdiuhdfoidsoijsf
What I would like to do is replace the ] character in the file with the contents of text. I've tried using sed, but I keep getting odd errors.
The output should be:
//iiuhdsfiuhdsihfABCDEFGHIJKsdiuhdfoidsoijsf
You just need to escape the ] character with a \ in the regex:
text="ABCDEFGHIJK"
sed "s/\(.*\)\]\(.*\)/\1$text\2/" file > file.changed
or, for in-place editing:
sed -i "s/\(.*\)\]\(.*\)/\1$text\2/" file
Test:
sed "s/\(.*\)\]\(.*\)/\1$text\2/" <<< "iiuhdsfiuhdsihf]sdiuhdfoidsoijsf"
# output => iiuhdsfiuhdsihfABCDEFGHIJKsdiuhdfoidsoijsf
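One caveat with putting $text straight into the replacement: if the variable contains /, & or \, sed treats those characters specially. A minimal sketch that escapes them first, assuming the value is a single line; the helper name escape_sed_repl is hypothetical:

```shell
#!/usr/bin/env bash
# Escape \, / and & so they are taken literally in the replacement of s///.
# The | delimiter keeps [\/&] intact as a bracket expression holding \, / and &.
escape_sed_repl() {
  printf '%s' "$1" | sed 's|[\/&]|\\&|g'
}

text='A/B&C'
safe=$(escape_sed_repl "$text")
# ] is literal outside a bracket expression, so the pattern needs no escaping.
printf '%s\n' 'iiuhdsfiuhdsihf]sdiuhdfoidsoijsf' | sed "s/]/$safe/"
```

Without the escaping, the / in A/B&C would end the s command early and sed would report an error.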
There is always the pure bash way, which should also work on your OS X:
filevar=$(cat file)
echo "${filevar/]/$text}" # replaces the first occurrence
OR
echo "${filevar//]/$text}" # replaces all occurrences
In my bash I don't even have to escape ].
By the way, doesn't the simple sed work?
$ a="AA"
$ echo "garbage.txt //iiuhdsfiuhdsihf]sdiuhdfoidsoijsf" |sed "s/]/$a/g"
garbage.txt //iiuhdsfiuhdsihfAAsdiuhdfoidsoijsf
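To see the difference between the single-slash and double-slash parameter-expansion forms from the bash answer above, here is a small self-contained sketch. It assumes bash, and an inline string with two ] characters stands in for the contents of file:

```shell
#!/usr/bin/env bash
text="ABCDEFGHIJK"
filevar='a]b]c'              # stands in for $(cat file)

first=${filevar/]/$text}     # replaces only the first "]"
all=${filevar//]/$text}      # replaces every "]"
printf '%s\n' "$first" "$all"
```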

How to use sed in pipelines

I have a file, and I need to replace one word in it.
I can find this word with a pipeline of several greps:
cat file | <many times grep here>
For example:
$ cat test.cpp | grep -o "\-\-window [0-9]* \"" | grep -o "[0-9]*"
71303214
How can I change the resulting number in this pipeline?
You can use look-ahead and look-behind as follows:
grep -Po '(?<=\-\-window )\d*(?= \")' file
It looks for digits (\d*) between --window_ and _" (_ stands for a space).
If you want to replace with sed, use:
sed 's/--window \([0-9]*\) \"/\1/g' file
It looks for --window_digits_" and replaces the whole match with just the digits (_ stands for a space).
Update
If you want to replace it with another number, do:
sed 's/--window [0-9]* \"/new_number/g' file
Or you can even use a bash variable if you use double quotes (with single quotes, the variable would not be expanded).
sed "s/--window [0-9]* \"/$new_number/g" file
Test
$ cat a
hello --window 234234 " a
hello --window 234234a " a
hello this is another thing
$ grep -Po '(?<=\-\-window )\d*(?= \")' a
234234
$ sed 's/--window \([0-9]*\) \"/\1/g' a
hello 234234 a
hello --window 234234a " a
hello this is another thing
$ sed 's/--window [0-9]* \"/XXX/g' a
hello XXX a
hello --window 234234a " a
hello this is another thing
$ number=22
$ sed "s/\(--window \)[0-9]*\( \"\)/\1$number\2/g" a
hello --window 22 " a
hello --window 234234a " a
hello this is another thing
sed -n '/--window [0-9]* \"/ {
s/[^[:digit:]]//gp
}' file
sed reads the file directly (so there is no need for a cat pipe before it)
-n suppresses output unless it is explicitly requested (the p flag of the s command)
the first pattern between // selects the lines (it is a basic regex, so some metacharacters such as \ / . * must be escaped to be matched literally)
the s command then extracts the digits by deleting every other character on the line, and prints the result if a substitution occurred
As you can see, grep is better in this case (more readable and efficient).
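Deleting every non-digit keeps all the digits on the matched line, not only the window id; on the #define line shown in the next answer, the 1 from "click 1" would be appended. A minimal sketch that captures just the number after --window instead (POSIX basic regex; the sample line is the #define from this thread):

```shell
#!/usr/bin/env sh
# Sample line taken from the #define shown in this thread.
line='#define CLICK(x,y) system("xdotool mousemove --window 71303214 " #x" "#y " click 1");'

# Capture only the digits between "--window " and ' "';
# -n with the p flag prints only lines where the substitution happened.
printf '%s\n' "$line" | sed -n 's/.*--window \([0-9][0-9]*\) ".*/\1/p'   # 71303214
```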
I THINK all you're asking for is:
$ cat file
#define CLICK(x,y) system("xdotool mousemove --window 71303214 " #x" "#y " click 1");
$ var="888999"; sed "s/\(.*--window \)[^ ]*/\1$var/" file
#define CLICK(x,y) system("xdotool mousemove --window 888999 " #x" "#y " click 1");
If that's not it, update your question to show some representative sample input and expected output.

Replace substring with sed

I have a string like this:
test:blabla
With sed, I want to replace what comes after the ':' with something else.
I can manage to replace a word, but not the one after the ':'. I searched the internet for an answer, but I didn't find anything.
Any help?
Use: sed 's/:.*/:replaceword/'
$ echo test:blabla | sed 's/:.*/:replaceword/'
test:replaceword
Or for the situation test test:blabla test where you only want to replace the word following : use sed 's/:[^ ]*/:replaceword/':
$ echo "test test:blabla test" | sed 's/:[^ ]*/:replaceword/'
test test:replaceword test
# Use the g flag for multiple matches on a line
$ echo "test test:blabla test test:blah2" | sed 's/:[^ ]*/:replaceword/g'
test test:replaceword test test:replaceword
> echo $SER2
test:blabla
> echo $SER2 | sed 's/\([^:]*:\).*/\1replace/g'
test:replace
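When the string is already in a shell variable, parameter expansion can do the same job without starting sed. A minimal sketch reusing the SER2 name from the answer above; note that, unlike the sed command, it appends ":replaceword" even when the string contains no colon:

```shell
#!/usr/bin/env bash
SER2='test:blabla'
# ${SER2%%:*} removes the longest suffix matching ":*", leaving "test";
# the colon and the new word are then appended.
new="${SER2%%:*}:replaceword"
printf '%s\n' "$new"    # test:replaceword
```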

Extract a substring from lines using grep, awk, sed, etc.

I have a file with many lines like:
lily weisy
I want to extract www.youtube.com/user/airuike and lily weisy, and then I also want to separate airuike from www.youtube.com/user/.
So I want to get three strings: www.youtube.com/user/airuike, airuike and lily weisy.
How can I achieve this? Thanks.
Do this:
sed -e 's/.*href="\([^"]*\)".*>\([^<]*\)<.*/link:\1 name:\2/' < data
will give you the first part. But I'm not sure what you are doing with it after this.
Since this is HTML, and HTML should be parsed with an HTML parser rather than with grep/sed/awk, you could use the pattern-matching feature of my tool, Xidel.
xidel yourfile.html -e '<a class="yt-uix-sessionlink yt-user-name " dir="ltr">{$link := #href, $user := substring-after($link, "www.youtube.com/user/"), $name:=text()}</a>*'
Or if you want a CSV like result:
xidel yourfile.html -e '<a class="yt-uix-sessionlink yt-user-name " dir="ltr">{string-join((#href, substring-after(#href, "www.youtube.com/user/"), text()), ", ")}</a>*' --hide-variable-names
It is a pity that you also want the airuike string; otherwise it could be as simple as
xidel yourfile.html -e '{$name}*'
(You were also supposed to be able to use xidel '{$name}*', but it seems I haven't thought the syntax through: a single error check breaks everything.)
$ awk '{split($0,a,/(["<>]|:\/\/)/); u=a[4]; sub(/.*\//,"",a[4]); print u,a[4],a[12]}' file
www.youtube.com/user/airuike airuike lily weisy
I think something like this should work:
while IFS= read -r line
do
  href=$(printf '%s\n' "$line" | grep -o 'http[^"]*')
  user=$(printf '%s\n' "$href" | grep -o '[^/]*$')
  text=$(printf '%s\n' "$line" | grep -o '[^>]*</a>$' | grep -o '^[^<]*')
  echo "href: $href"
  echo "user: $user"
  echo "text: $text"
done < yourfile
Regular expressions basics: http://en.wikipedia.org/wiki/Regular_expression#POSIX_Basic_Regular_Expressions
Upd: checked and fixed
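The same three fields can also be pulled out in bash without spawning grep for every line. The exact markup from the question is not shown, so this minimal sketch assumes a hypothetical anchor tag that has an href and link text; a regex like this only copes with simple, regular lines, which is why the Xidel answer argues for a real HTML parser:

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sample line: the real markup from the question is not shown.
line='<a href="http://www.youtube.com/user/airuike" dir="ltr">lily weisy</a>'

re='href="https?://([^"]*)"[^>]*>([^<]*)</a>'
if [[ $line =~ $re ]]; then
  link=${BASH_REMATCH[1]}   # www.youtube.com/user/airuike
  name=${BASH_REMATCH[2]}   # lily weisy
  user=${link##*/}          # everything after the last "/": airuike
  printf '%s\n' "$link" "$user" "$name"
fi
```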

How to concatenate stdin and a string?

How do I concatenate stdin to a string, like this?
echo "input" | COMMAND "string"
and get
inputstring
A bit hacky, but this might be the shortest way to do what you asked in the question (use a pipe to pass the stdout of echo "input" as stdin to another process/command):
echo "input" | awk '{print $1"string"}'
Output:
inputstring
What task are you exactly trying to accomplish? More context can get you more direction on a better solution.
Update - responding to comment:
@NoamRoss
The more idiomatic way of doing what you want is then:
echo 'http://dx.doi.org/'"$(pbpaste)"
The $(...) syntax is called command substitution. In short, it executes the enclosed commands in a new subshell and substitutes their stdout output at the place where $(...) was invoked in the parent shell. So you would get, in effect:
echo 'http://dx.doi.org/'"rsif.2012.0125"
Use cat - to read from stdin, and put it in $() to throw away the trailing newline:
echo input | COMMAND "$(cat -)string"
However, why not drop the pipe and capture the output of the left side in a command substitution:
COMMAND "$(echo input)string"
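A quick self-contained check of the $(cat -) trick, with printf standing in for the hypothetical COMMAND: cat - copies stdin, and the command substitution strips only the trailing newlines, so with multi-line input the string attaches to the last line only.

```shell
#!/usr/bin/env bash
# $(cat -) runs inside the pipeline's right-hand side, so it reads the pipe.
result=$(echo input | printf '%s\n' "$(cat -)string")
printf '%s\n' "$result"    # inputstring
```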
I often use pipes, so this tends to be an easy way to add a prefix and a suffix to stdin:
echo -n "my standard in" | cat <(echo -n "prefix... ") - <(echo " ...suffix")
prefix... my standard in ...suffix
There are several ways to accomplish this; I personally think the best is:
echo input | while IFS= read -r line; do echo "$line string"; done
Another is to substitute "$" (the end-of-line anchor) with " string" in a sed command:
echo input | sed "s/$/ string/g"
Why do I prefer the former? Because it appends the string to each line of stdin as soon as that line arrives. For example, with the following command:
(echo input_one; sleep 5; echo input_two) | while IFS= read -r line; do echo "$line string"; done
you immediately get the first output:
input_one string
and then after 5 seconds you get the other echo:
input_two string
On the other hand, sed may buffer its output (GNU sed does so when its stdout is a pipe rather than a terminal), in which case the command
(echo input_one; sleep 5; echo input_two) | sed "s/$/ string/g"
outputs both lines
input_one string
input_two string
after 5 seconds.
This can be very useful when you call functions that take a long time to complete and you want to see their output continuously.
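A minimal sketch of the read-loop approach written a little more defensively: IFS= and -r keep leading whitespace and backslashes intact, printf avoids echo's option parsing, and a last line without a trailing newline is still handled. The function name append_string is hypothetical:

```shell
#!/usr/bin/env bash
# Append a suffix to every line of stdin as soon as that line arrives.
append_string() {
  local suffix=$1 line
  # "|| [[ -n $line ]]" also processes a final line that lacks a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s %s\n' "$line" "$suffix"
  done
}

# input_one is printed right away, input_two about a second later.
(echo input_one; sleep 1; echo input_two) | append_string string
```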
You can do it with sed:
seq 5 | sed '$a\6'
seq 5 | sed '$ s/.*/\0 6/'
In your example:
echo input | sed 's/.*/\0string/'
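The \0 used above for "the whole match" is a GNU sed extension rather than POSIX; the portable spelling is &. A minimal sketch of the same two substitutions written with &:

```shell
#!/usr/bin/env sh
# & in the replacement stands for the entire matched text (POSIX sed).
echo input | sed 's/.*/&string/'    # inputstring
seq 5 | sed '$ s/.*/& 6/'           # last line becomes "5 6"
```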
I know this is a few years late, but you can accomplish this with the xargs -J option:
echo "input" | xargs -J "%" echo "%" "string"
And since it is xargs, you can do this on multiple lines of a file at once. If the file 'names' has three lines, like:
Adam
Bob
Charlie
You could do:
cat names | xargs -n 1 -J "%" echo "I like" "%" "because he is nice"
Also works:
seq -w 0 100 | xargs -I {} echo "string "{}
This generates strings like:
string 000
string 001
string 002
string 003
string 004
...
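-J is an option of BSD xargs (as shipped on macOS); GNU xargs does not provide it. A minimal sketch of the names example using -I, which both implementations support. -I already runs the command once per input line, so -n 1 is not needed; printf stands in for the names file:

```shell
#!/usr/bin/env sh
# printf stands in for the three-line 'names' file.
printf '%s\n' Adam Bob Charlie |
  xargs -I % echo "I like" % "because he is nice"
```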
The command you posted would take the string "input" and use it as COMMAND's stdin stream, which would not produce the result you are looking for unless COMMAND first printed the contents of its stdin and then printed its command-line arguments.
It seems that what you want is closer to command substitution.
http://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html#Command-Substitution
With command substitution you can have a commandline like this:
echo input `COMMAND "string"`
This will first run COMMAND with "string" as its argument, and then expand the output of that command onto the line, replacing what's between the ‘`’ characters.
cat would be my choice: ls | cat - <(echo new line)
With Perl:
echo "input" | perl -ne 'print "prefix $_"'
Output:
prefix input
A solution using sd (basically a modern sed; much easier to use IMO):
# replace '$' (end of string marker) with 'Ipsum'
# the `e` flag disables multi-line matching (treats all lines as one)
$ echo "Lorem" | sd --flags e '$' 'Ipsum'
Lorem
Ipsum  # no newline here
You might observe that Ipsum appears on a new line, and the output is missing a \n. The reason is echo's output ends in a \n, and you didn't tell sd to add a new \n. sd is technically correct because it's doing exactly what you are asking it to do and nothing else.
However this may not be what you want, so instead you can do this:
# replace '\n$' (new line, immediately followed by end of string) by 'Ipsum\n'
# don't forget to re-add the `\n` that you removed (if you want it)
$ echo "Lorem" | sd --flags e '\n$' 'Ipsum\n'
LoremIpsum
If you have a multi-line string, but you want to append to the end of each individual line:
$ ls
bar baz foo
$ ls | sd '\n' '/file\n'
bar/file
baz/file
foo/file
I want to prepend a "set" statement to my SQL script before running it.
So I echo the "set" statement and pipe it to cat. cat takes two arguments, stdin (written as "-") and my SQL file, and joins both into one output. Next I pass the result to the mysql command to run it as a script.
echo "set @ZERO_PRODUCTS_DISPLAY='$ZERO_PRODUCTS_DISPLAY';" | cat - sql/test_parameter.sql | mysql
P.S. The MySQL login and password are stored in the .my.cnf file.
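An alternative sketch that avoids the extra cat -: a brace group writes the SET line and then the file to a single stdout stream. To keep it self-contained, the output is captured instead of being piped into mysql, and the variable value and temporary file are placeholders standing in for the real ones:

```shell
#!/usr/bin/env bash
# Placeholder value and a temporary stand-in for sql/test_parameter.sql.
ZERO_PRODUCTS_DISPLAY=0
sql_file=$(mktemp)
printf 'SELECT @ZERO_PRODUCTS_DISPLAY;\n' > "$sql_file"

# In real use the group would be piped straight into mysql:
#   { printf ...; cat sql/test_parameter.sql; } | mysql
combined=$( {
  printf "SET @ZERO_PRODUCTS_DISPLAY='%s';\n" "$ZERO_PRODUCTS_DISPLAY"
  cat "$sql_file"
} )
rm -f "$sql_file"
printf '%s\n' "$combined"
```

As in the original, the value is pasted into the SQL text unescaped, so it should not contain a single quote.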
