How to extract specific lines from a file in bash?

In a shell script, I want to extract part of each line in a file that starts with a specific pattern.
For example, I want the strings from the lines that start with hello:
hi to_RAm
hello to_Hari
hello to_kumar
bye to_lilly
The output should be:
to_Hari
to_kumar
Can anyone help me?

sed is the most appropriate tool:
sed -n 's/^hello //p'

Use grep:
grep ^hello file | awk '{print $2}'
^ matches lines that start with "hello". This assumes you want to print the second word.
If you want to print all words except the first (this leaves a leading space on each line):
grep ^hello file | awk '{$1=""; print $0}'

You could use GNU grep's perl-compatible regexes and use a lookbehind:
grep -oP '(?<=hello ).*'
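To try this against the sample from the question, here is a minimal sketch in a single awk call. The temporary file stands in for the asker's file, and the variable names input and result are only illustrative.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' 'hi to_RAm' 'hello to_Hari' 'hello to_kumar' 'bye to_lilly' > "$input"

# sub() returns the number of replacements it made, so awk prints only
# the lines where the "hello " prefix was actually removed.
result=$(awk 'sub(/^hello /, "")' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

This keeps everything after the prefix, so it behaves like the sed answer rather than the "second word only" variant.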

Related

Extract tokens from log files in unix

I have a directory containing log files.
We are interested in a particular log line which goes like 'xxxxxxxxx|platform=SUN|.......|orderId=ABCDEG|........'
We have to extract all similar lines from the log files in this directory,and print out the token 'ABCDEG'.
Duplication is acceptable.
How do we achieve this with a single unix command operation?
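sed -rn '/platform=.*orderId=/s/.*orderId=([^|]+).*/\1/p' *
From all lines containing both platform= and orderId= (/platform=.*orderId=/), take the sequence of non-| characters (([^|]+)) after orderId=. The -n option together with the p flag prints only the lines where the substitution was made; without them, every non-matching line would be printed unchanged as well.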
awk -F'|' '$2=="platform=SUN"{sub(/orderId=/,"", $4); print $4}' logFile*
output
ABCDEG
IHTH
grep -rP "\|platform=SUN\|.*(?<=\|orderId=)" | sed 's/.*platform=SUN.*orderId=//' | sed 's/|.*//'
$ str='xxxxxxxxx|platform=SUN|.......|orderId=ABCDEG|........'
$ grep -Po 'platform=SUN.*orderId=\K[^|]*' <<< "$str"
ABCDEG
This requires Perl compatible regular expressions (-P); -o retains just the match. \K is variable length look-behind: "match the stuff to the left of it, but don't include it in the matched string".
From the logs directory you could run the following command:
sed -n '/platform=SUN/p' * | sed 's#.*orderId=\([^|]*\)|.*$#\1#'
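If the position of the orderId field can vary between lines, scanning every |-separated field may be more robust than a fixed field number. A rough sketch, assuming the token always directly follows "orderId=" inside its own field; the second sample line and the names logs and tokens are made up for illustration.

```shell
# The sample line from the question plus one invented line for another platform.
logs=$(mktemp)
printf '%s\n' \
  'xxxxxxxxx|platform=SUN|.......|orderId=ABCDEG|........' \
  'yyyyyyyyy|platform=HP|.......|orderId=ZZZZZZ|........' > "$logs"

# Keep only lines that contain the literal field |platform=SUN|, then
# print whatever follows "orderId=" (8 characters) in the field that starts with it.
tokens=$(awk -F'|' 'index($0, "|platform=SUN|") {
  for (i = 1; i <= NF; i++)
    if (index($i, "orderId=") == 1) print substr($i, 9)
}' "$logs")
printf '%s\n' "$tokens"

rm -f "$logs"
```

To run it over the whole directory, replace "$logs" with * or logFile* as in the answers above.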

shell script cut from variables

The file is like this
aaa&123
bbb&234
ccc&345
aaa&456
aaa$567
bbb&678
I want to output the text after & from the lines that contain "aaa" and an &:
123
456
I want to do it in a shell script.
Consider the following code:
#!/bin/bash
raw=$(grep 'aaa' 1.txt)
var=$(cut -f2 -d"&" "$raw")
echo $var
It gives me an error like
cut: aaa&123
aaa&456
aaa$567: No such file or directory
How do I fix it? And how do I cut (or grep, or use another tool) from an existing variable?
Many thanks!
With GNU grep:
grep -oP 'aaa&\K.*' file
Output:
123
456
\K: discard everything matched so far, so only the text matched after \K is reported
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
Cyrus has my vote. An awk alternative if GNU grep is not available:
awk -F'&' 'NF==2 && $1 ~ /aaa/ {print $2}' file
Using & as the field separator, for lines with 2 fields (i.e. & must be present) and the first field contains "aaa", print the 2nd field.
The error in your script is that you pass the grep output to cut as if it were a filename. What you want is this:
grep 'aaa.*&' file | cut -d'&' -f2
The pattern means "aaa appears before an &"
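For the second part of the question, cutting from an existing variable: cut reads standard input when no file is named, so the variable can be piped into it. A minimal sketch, where data stands in for the contents of 1.txt and the other variable names follow the question:

```shell
# Sample data from the question, held in a variable instead of a file.
data=$(printf '%s\n' 'aaa&123' 'bbb&234' 'ccc&345' 'aaa&456' 'aaa$567' 'bbb&678')

# Feed the variable to grep and cut on standard input rather than
# passing it as a filename argument.
raw=$(printf '%s\n' "$data" | grep '^aaa&')
var=$(printf '%s\n' "$raw" | cut -d'&' -f2)
printf '%s\n' "$var"
```

In bash, a here-string does the same job: cut -d'&' -f2 <<< "$raw".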

Reading numbers from a text line in bash shell

I'm trying to write a bash shell script, that opens a certain file CATALOG.dat, containing the following lines, made of both characters and numbers:
event_0133_pk.gz
event_0291_pk.gz
event_0298_pk.gz
event_0356_pk.gz
event_0501_pk.gz
What I wanna do is print the numbers (only the numbers) inside a new file NUMBERS.dat, using something like > ./NUMBERS.dat, to get:
0133
0291
0298
0356
0501
My problem is: how do I extract the numbers from the text lines? Is there something to make the script read just the number as a variable, like event_0%d_pk.gz in C/C++?
A grep solution:
grep -oP '[0-9]+' CATALOG.dat >NUMBERS.dat
A sed solution:
sed 's/[^0-9]//g' CATALOG.dat >NUMBERS.dat
And an awk solution:
awk -F"[^0-9]+" '{print $2}' CATALOG.dat >NUMBERS.dat
There are many ways that you can achieve your result. One way would be to use awk:
awk -F_ '{print $2}' CATALOG.dat > NUMBERS.dat
This sets the field separator to an underscore, then prints the second field which contains the numbers.
Awk
awk 'gsub(/[^[:digit:]]/,"")' infile
Bash
while IFS= read -r line; do echo "${line//[!0-9]/}"; done < infile
tr
tr -cd '[:digit:]\n' <infile
You can use grep command to extract the number part.
grep -oP '(?<=_)\d+(?=_)' CATALOG.dat
gives output as
0133
0291
0298
0356
0501
Or, more simply:
grep -oP '\d+' CATALOG.dat
You don't need perl mode in grep for this. BREs can do this.
grep -o '[[:digit:]]\+' CATALOG.dat > NUMBERS.dat
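The shell has no direct equivalent of a scanf-style event_%d_pk.gz pattern, but parameter expansion can strip a known prefix and suffix without calling an external tool. A minimal sketch, assuming every line has exactly the form event_NNNN_pk.gz; the temporary directory and the variable names are only for illustration.

```shell
# Build a small CATALOG.dat in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' event_0133_pk.gz event_0291_pk.gz event_0298_pk.gz > "$workdir/CATALOG.dat"

while IFS= read -r line; do
  num=${line#event_}    # strip the leading "event_"
  num=${num%_pk.gz}     # strip the trailing "_pk.gz"
  printf '%s\n' "$num"
done < "$workdir/CATALOG.dat" > "$workdir/NUMBERS.dat"

numbers=$(cat "$workdir/NUMBERS.dat")
printf '%s\n' "$numbers"
rm -rf "$workdir"
```

Because the value stays a string, leading zeros such as 0133 are preserved.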

Text Manipulation using sed or AWK

I get the following result in my script when I run it against my services. The result differs depending on the service, but the text pattern is similar to the one shown below. The result of my script is assigned to var1. I need to extract data from this variable:
$var1=HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to strip the name of the service list from $var1, so that the end result is printed on separate lines as follows:
svc1
svc2
svc3
svc4
svc5
svc6
Can you please help with this?
Regards
Using sed and grep:
sed 's/[^ ]* :\|,//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes each run of non-space characters that is followed by " :", as well as every comma. grep then outputs the remaining services one per line.
Using gnu grep and gnu sed:
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect!
As far as I'm aware this will work on any grep:
echo "$var1" | grep -o 'svc[0-9]\+'
Matches "svc" followed by one or more digits. You can also enable the "highly experimental" Perl regexp mode with -P, which means you can use the \d digit character class and don't have to escape the + any more:
grep -Po 'svc\d+' <<<"$var1"
In bash you can use <<< (a Here String) which supplies "$var1" to grep on the standard input.
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
This would be a good job for awk:
awk -F': ' '{n=split($2,a,", "); for (i=1; i<=n; i++) print a[i]}'
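If the service names do not all start with "svc", another option for the single-line variable is to split it into words and drop the words that are not services. A rough sketch, assuming every list name contains a * and no service name does; services is just an illustrative variable name.

```shell
var1='HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6'

# One word per line, then drop the list names (they contain "*") and the
# lone ":" separators, and finally delete the trailing commas.
services=$(printf '%s\n' "$var1" | tr -s ' ' '\n' | grep -v -e '[*]' -e '^:$' | tr -d ',')
printf '%s\n' "$services"
```

This handles any number of services per list, since each word is treated independently.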

unix bash - extract a line from file

need your help!!! I tried looking for this but to no avail.
How can I achieve the following using bash?
I've a flat file called "cube.mdl" that contains:
[...]
bla bla bla bla lots of lines above
Cube 8007841 "BILA_" MdcFile "BILA_CO_PM_MKT_BR_CUBE.mdc"
bla bla bla more lines below
[...]
I need to open that file, look for the word "MdcFile" and get the string that follows between quotes, which would be BILA_CO_PM_MKT_BR_CUBE.mdc
I know AWK or grep are powerful enough to do this in one line, but I couldn't find an example that could help me do it on my own.
Thanks in advance!
JMA
You can use:
grep -o -P "MdcFile.*" cube.mdl | awk -F\" '{ print $2 }'
This uses grep's regex to return only MdcFile and everything after it on the current line. Then awk uses " as the field delimiter and prints only the second field, which is your quoted string, returned without the quotes.
The option -o, --only-matching returns only the text that matches, and -P, --perl-regexp specifies that the pattern is a Perl regular expression. Some versions of grep do not have these options. The OP's version does not include them, but the following appears to work for him instead:
grep "MdcFile.*" cube.mdl | awk -F\" '{ print $2 }'
grep MdcFile cube.mdl | awk '{print $5}'
would do it, assuming there's no spaces in any of those bits to throw off the position count.
This might do it.
sed -n 's/.*MdcFile "\([^"]*\)".*/\1/p' INPUTFILE
Use awk for the whole thing with " as record separator:
awk -v RS='"' '/MdcFile/ { getline; print }' cube.mdl
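A variant that keeps awk's normal line-by-line records and instead splits each line on the quote character might look like this. It is only a sketch: it assumes the file name is the quoted string immediately after the word MdcFile, and the sample file and the names mdl and mdc are illustrative.

```shell
# A cut-down stand-in for cube.mdl.
mdl=$(mktemp)
printf '%s\n' \
  'bla bla bla bla lots of lines above' \
  'Cube 8007841 "BILA_" MdcFile "BILA_CO_PM_MKT_BR_CUBE.mdc"' \
  'bla bla bla more lines below' > "$mdl"

# With " as the field separator, quoted strings land in the even fields.
# Print the field that follows the one ending in "MdcFile".
mdc=$(awk -F'"' '{
  for (i = 1; i < NF; i++)
    if ($i ~ /MdcFile *$/) print $(i + 1)
}' "$mdl")
printf '%s\n' "$mdc"

rm -f "$mdl"
```

Unlike printing a fixed field number, this does not depend on how many quoted strings come before MdcFile on the line.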
