Extracting a part of lines matching a pattern

I have a configuration file and need to parse out some values using bash
Ex. Inside config.txt
some_var= Not_needed
tests= spec1.rb spec2.rb spec3.rb
some_other_var= Also_not_needed
Basically I just need to get "spec1.rb spec2.rb spec3.rb" WITHOUT all the other lines and "tests=" removed from the line.
I have this and it works, but I'm hoping there's a much more simple way to do this.
while IFS= read -r run_line; do
    if [[ $run_line =~ ^tests= ]]; then
        echo "FOUND"
        all_selected_specs=$(echo "${run_line}" | sed 's/^tests= //')
    fi
done <"${config_file}"
echo "${all_selected_specs}"

all_selected_specs=$(awk -F '= ' '$1=="tests" {print $2}' "$config_file")
Using a field separator of "= ", look for lines where the first field is tests and print the second field.

This should work too
grep "^tests=" "${config_file}" | sed -e "s/^tests= //"

How about grep and cut?
all_selected_specs=$(grep "^tests=" "$config_file" | cut -d= -f2-)

Try:
all_selected_specs=$(awk '/^tests/{sub(/.*= /,"");print}' Input_file)
This searches for lines that start with tests, replaces everything up to and including "= " with the empty string so that only the spec values remain, and prints the result. The $(awk ...) command substitution then saves that output in the variable.
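If you would rather avoid external tools entirely, the same extraction can be sketched in pure bash with parameter expansion. This is only a sketch under assumptions: get_config_value is a made-up helper name, and it assumes each setting sits on its own line as "key= value" with at most one space after the equals sign.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first "key= value" line in a file.
get_config_value() {
  local key=$1 file=$2 line
  while IFS= read -r line; do
    if [[ $line == "$key="* ]]; then
      line=${line#"$key="}         # drop the literal "key=" prefix
      printf '%s\n' "${line# }"    # drop one optional leading space
      return 0
    fi
  done < "$file"
  return 1                         # key not found
}

# Sample data copied from the question, written to a temporary file.
config_file=$(mktemp)
printf '%s\n' 'some_var= Not_needed' \
              'tests= spec1.rb spec2.rb spec3.rb' \
              'some_other_var= Also_not_needed' > "$config_file"

all_selected_specs=$(get_config_value tests "$config_file")
echo "$all_selected_specs"
rm -f "$config_file"
```

The function returns a non-zero status when the key is missing, so a caller can tell "empty value" apart from "no such line".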

Related

How to get values in a line while looping line by line in a file (shell script)

I have a file which looks like this (file.txt)
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop through this line by line and extract the key values.
So the result should look like this:
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote code like this to loop line by line and extract the value between {"key":" and ","rule":, because the key value sits between these two patterns.
while read p; do
echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. Can someone help me figure this out? Thanks in advance.

Your sample input is almost valid json. You could tweak it to make it valid and then extract the values with jq with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid json, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt

For the shown data, you can try this awk:
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568

With the given example you can simply use
cut -d'"' -f4 file.txt

Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (and capture groups without need to escape sequences)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ","rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print
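The capture-group breakdown above carries over to bash's own regex matching, which may be handy if you are already looping over the file. A minimal sketch, assuming the same two assumptions as this answer; extract_keys is a hypothetical function name and the sample file is made up for the demo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between "key":" and ","rule" for each
# line that contains both markers; all other lines are skipped.
extract_keys() {
  local line re='"key":"([^"]*)","rule"'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"   # first (and only) capture group
    fi
  done < "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
EOF

keys=$(extract_keys "$sample")
echo "$keys"
rm -f "$sample"
```

Keeping the regex in a variable avoids quoting problems inside [[ ... =~ ... ]].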

How to grep a specific pattern before match?

I'm currently working on multiple configuration files which use the following format:
[Stanza1]
action.script=1
action.ping=0
action.lookup=1
action.notable.param=0
action.script.filename=script.pl
[Stanza2]
action.script=0
action.ping=0
action.lookup=1
[Stanza3]
action.script=1
action.ping=0
action.lookup=0
action.script.filename=script.pl
I want to know which stanzas include "action.script.filename=script.pl", so the expected result would be
[Stanza1]
[Stanza3]
Using something like:
grep -B 10 "action.script.filename=script.pl" file
doesn't work for cases where the stanza name is more than 10 lines before the match, and proves quite cumbersome to use.
Any suggestions on how to do this?

The following sed command would do the trick :
sed -n '/^\[/h;/^action\.script\.filename=script\.pl$/{x;p}'
When it encounters a line that starts with "[", it stores it into its hold buffer. When it encounters a "action.script.filename=script.pl" line, it prints the content of the hold buffer.
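To see the hold-buffer idea on a small file, here is a sketch with made-up sample data in the question's format. Note one deliberate change, which is my assumption and not part of the command above: it uses g (copy the hold space into the pattern space) instead of x (swap the two), so the saved header is still in the hold space if the same stanza matches a second time.

```shell
#!/usr/bin/env bash
# Hypothetical sample file in the same format as the question.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[Stanza1]
action.script=1
action.script.filename=script.pl
[Stanza2]
action.script=0
[Stanza3]
action.script.filename=script.pl
EOF

# h: on a header line, copy it into the hold space.
# g: on a matching setting, copy the hold space over the pattern space.
# p: print the pattern space (now the saved header).
stanzas=$(sed -n '/^\[/h;/^action\.script\.filename=script\.pl$/{g;p;}' "$conf")
echo "$stanzas"
rm -f "$conf"
```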

I'm not sure this can be done purely with grep. I would recommend a small bash script:
while IFS= read -r line
do
    if [[ $line =~ ^\[ ]]; then
        # save stanza for later
        stanza=$line
    fi
    if [[ $line == "action.script.filename=script.pl" ]]; then
        echo "$stanza"
    fi
done < file

With awk
$ awk '/action\.script\.filename=script\.pl/{print h} /^\[/{h=$0}' ip.txt
[Stanza1]
[Stanza3]
/^\[/ lines starting with [ character, you can also use something like /Stanza/ as long as it uniquely identifies header lines
h=$0 for such lines, save the content ($0) to variable h
/action\.script\.filename=script\.pl/ if input line matches the given search criteria
print h print the value of h variable
if you are matching whole line, then you can also use string match $0 == "action.script.filename=script.pl" instead of regex match
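The whole-line string comparison mentioned in the last point could look roughly like this. It is a sketch only: find_stanzas, its argument order, and the awk variable names target and header are my own choices, and it assumes the setting you are looking for occupies an entire line.

```shell
#!/usr/bin/env bash
# find_stanzas SETTING FILE
# Hypothetical helper: print the header of every stanza that contains a line
# exactly equal to SETTING.
find_stanzas() {
  awk -v target="$1" '
    /^\[/        { header = $0; next }   # remember the most recent header
    $0 == target { print header }        # exact string match, no regex escaping
  ' "$2"
}

sample=$(mktemp)
printf '%s\n' '[Stanza1]' 'action.ping=0' 'action.script.filename=script.pl' \
              '[Stanza2]' 'action.ping=0' \
              '[Stanza3]' 'action.script.filename=script.pl' > "$sample"
find_stanzas 'action.script.filename=script.pl' "$sample"
rm -f "$sample"
```

Because the comparison is a plain string test, the dots in the setting name do not need escaping.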

This line of code works for me
grep '^\[Stanza\|^action.script.filename=script.pl$' fileName | grep -B1 'action.script.filename=script.pl' | grep -v 'action.script.filename=script.pl\|\-\-'
Explanation:
grep '^\[Stanza\|^action.script.filename=script.pl$' fileName
matches either [Stanza]* lines or action.script.filename=script.pl ones. Output is something like this
[Stanza1]
action.script.filename=script.pl
[Stanza2]
[Stanza3]
action.script.filename=script.pl
Adding this filter | grep -B1 'action.script.filename=script.pl' will result in this
[Stanza1]
action.script.filename=script.pl
--
[Stanza3]
action.script.filename=script.pl
Now you just need to clean the output from unwanted parts
| grep -v 'action.script.filename=script.pl\|\-\-'
This is the final output
[Stanza1]
[Stanza3]

awk '/^\[.*\]$/{stanza=$0;next} /action.script.filename=script.pl/{print stanza}' filename
[Stanza1]
[Stanza3]
You can store each stanza in a variable called stanza and move to next line. Whenever you see the string action.script.filename=script.pl , print the variable stanza.

Bash, cut word with dot character from string

I have a string:
Log for: squid.log.2017.11.13
I need to cut out squid.log. so that I see:
Log for: 2017.11.13
I tried to cut
echo "Log for: squid.log.2017.11.13" | cut -d'.' -f3-5
But I ended up with:
2017.11.13
How can I get the result I want?

You can use sed to cut the unwanted part:
echo "Log for: squid.log.2017.11.13" | sed 's/squid\.log\.//'

Use sed to remove the part you don't want:
echo "Log for: squid.log.2017.11.13" | sed 's/squid\.log\.//'

awk to the rescue! A non-standard approach to break the monotony: define the text to be removed as the field separator, then let awk rebuild and print the input line.
$ echo Log for: squid.log.2017.11.13 | awk -F' squid\\.log\\.' '{$1=$1}1'
Log for: 2017.11.13

This solution is a bit more reusable than the previous ones offered:
awk '/^Log/{ split($3,x,"."); print $1" "$2" "x[length(x)-2]"."x[length(x)-1]"."x[length(x)] };'
This looks for all lines starting with Log, then grabs the 3rd column, which contains squid.log.2017.11.13, and uses the split built-in to break the string up into array x with . as the delimiter. Once we have our array x, we know that the last 3 values will always be the date, and this will work regardless of the rest of the string (even if squid.log was something different); we can use the length built-in to make sure we only get the last three elements.
Then we just print our reformatted string print $1" "$2" "x[length(x)-2]"."x[length(x)-1]"."x[length(x)] - reinserting the .'s in the appropriate places since they were stripped by using them as the split delimiter.
Output:
Log for: 2017.11.13
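If the line is already in a shell variable, bash can also drop the unwanted text without calling sed or awk. A minimal sketch, assuming bash and assuming the text to remove is a fixed string; remove_text is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the first occurrence of a literal string.
remove_text() {
  local line=$1 unwanted=$2
  # Quoting the pattern makes bash treat it literally, not as a glob.
  printf '%s\n' "${line/"$unwanted"/}"
}

remove_text 'Log for: squid.log.2017.11.13' 'squid.log.'
```

This only removes the first occurrence; ${line//"$unwanted"/} would remove all of them.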

How to use sed to extract a string [duplicate]

I need to extract a number from the output of a command: cmd. The output is type: 1000
So my question is how to execute the command, store its output in a variable and extract 1000 in a shell script. Also how do you store the extracted string in a variable?

This question has been answered in pieces here before. It would be something like this:
line=$(sed -n '2p' myfile)
echo "$line"
if echo "$line" | grep -q 'type: 1000'; then
echo "It's there!"
fi
Store output of sed into a variable
String contains in Bash
EDIT: sed is very limited, you would need to use bash, perl or awk for what you need.

This is a typical use case for grep:
output=$(cmd | grep -o '[0-9]\+')
You can write the output of a command or even a pipeline of commands into a shell variable using so called command substitution:
variable=$(cmd);
In the comments it emerged that the output of cmd contains more lines than just type: 1000. In that case I would suggest sed:
output=$(cmd | sed -n 's/.*type: \([0-9]\+\).*/\1/p')
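Putting both halves of the question together (store the output, then store the extracted number), a pure-bash sketch might look like this. The cmd function below is only a stand-in I made up so the example runs; replace it with the real command. It assumes the output contains the text "type: " followed by digits.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command; it prints several lines.
cmd() { printf '%s\n' 'name: demo' 'type: 1000' 'size: 42'; }

raw=$(cmd)                     # command substitution: keep the full output
re='type: ([0-9]+)'
number=''
if [[ $raw =~ $re ]]; then
  number=${BASH_REMATCH[1]}    # keep only the digits after "type: "
fi
echo "$number"
```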

You tagged your question as sed but your question description does not restrict other tools, so here's a solution using awk.
output=`cmd | awk -F': ' '/type: [0-9]+/{print $2}'`
Alternatively, you can use the newer $( ) syntax. Some find it preferable, and it can be nested conveniently without escaping backticks.
output=$(cmd | awk -F': ' '/type: [0-9]+/{print $2}')

If the output is rigidly restricted to "type: " followed by a number, you can just use cut.
var=$(echo 'type: 1000' | cut -f 2 -d ' ')
Obviously you'll have to pipe the output of your command to cut, I'm using echo as a demo.
In addition, I'd use grep and then cut if the string you are searching is more complex. If we assume there can be all kind of numbers in the text, but only one occurrence of "type: " followed by a number, you can use the command:
>> var=$(echo "hello 12 type: 1000 foo 1001" | grep -oE "type: [0-9]+" | cut -f 2 -d ' ')
>> echo $var
1000

You can use the | operator to send the output of one command to another, like so:
printf ' 1\n 2\n 3\n' | grep "2"
This sends the three lines " 1", " 2" and " 3" to the grep command, which prints the line containing 2. It sounds like you might want to do something like:
cmd | grep "type"

Here is a plain sed solution that uses a regular expression to find the number in your string:
cmd | sed 's/^.*type: \([0-9]\+\)/\1/g'
^ means from the start
.* can be any character (also none)
\([0-9]\+\) are numbers (minimum one character)
\1 refers to the first capture group (the only one in this case) and is used as the replacement for everything the pattern matched
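One thing the breakdown above implies: anything after the number is not part of the match, so it stays in the output, and lines without "type: " are printed unchanged. A sketch of a stricter variant, with extract_type as a made-up name: it adds a trailing .* to consume the rest of the line, uses -n with p to print only matching lines, and writes [0-9][0-9]* in place of [0-9]\+ so it does not rely on a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read lines on stdin, print only the number that
# follows "type: " on lines that contain it.
extract_type() {
  sed -n 's/^.*type: \([0-9][0-9]*\).*$/\1/p'
}

printf '%s\n' 'header line' 'foo type: 1000 bar' | extract_type
```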

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note the last 6 characters: d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...

This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
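Splitting every character into its own field with FS="" is not behaviour POSIX specifies, so it may not work in every awk. As a hedged alternative sketch, the same path can be built with substr, which any POSIX awk should support; hash_to_path is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: for each hash on stdin, print aa/bb/cc/hash where
# aa, bb, cc are the last three two-character pairs of the hash.
hash_to_path() {
  awk '{
    n = length($0)
    print substr($0, n-5, 2) "/" substr($0, n-3, 2) "/" substr($0, n-1, 2) "/" $0
  }'
}

echo '13febd65d65112badd0aa90a15d84032' | hash_to_path
```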

With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
With GNU sed you can simplify the pattern further using the -r option, so you no longer need to escape {} and (). Using ~ as the regex delimiter lets you use the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply, the pattern matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
and replaces it with
\2/\3/\4/\1

You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
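A variation on the same idea, sketched with bash substring expansion and an input check. The function name cache_subpath and the decision to reject anything that is not 32 lowercase hex characters are my assumptions, not something the question requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print aa/bb/cc/hash for a 32-character hex MD5 hash,
# or return non-zero if the argument does not look like one.
cache_subpath() {
  local hash=$1
  [[ $hash =~ ^[0-9a-f]{32}$ ]] || return 1
  # A negative offset counts from the end; the space before "-" is required.
  printf '%s/%s/%s/%s\n' "${hash: -6:2}" "${hash: -4:2}" "${hash: -2:2}" "$hash"
}

cache_subpath 13febd65d65112badd0aa90a15d84032
```

Prefix the result with the cache directory, for example /path/to/nginx/cache/, to get the full location.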

Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
Assuming the input is a valid MD5 string and nothing else.

First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=$(echo "${md5:0-2:2}")
p2=$(echo "${md5:0-4:2}")
p1=$(echo "${md5:0-6:2}")
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
