Extract and display multiple strings in a single line - bash

I have a single line and I want to extract and display (from bash) every complete string that starts with specific characters.
Single line to filter:
"ABC-3324545/":{"acc":"fff"},"ABC-652123/":{"acc":"sss"},"ABC-15642/":{"acc":"rrr"}...
Prefix to search for in the strings: ABC-
Display needed:
ABC-3324545
ABC-652123
ABC-15642
I think I need to combine several commands such as grep, awk, sed, etc., but unfortunately I have had no success :(
curl -H "Token: xxxx" $URL | grep -o 'ABC-'
returns
ABC-
ABC-
ABC-
curl -H "Token: xxxx" $URL | awk -F "PKI-" '{ print $1; }'
...doesn't match what I want to do.
Any ideas, please?

Data file:
$ cat abc.dat
"ABC-3324545/":{"acc":"fff"},"ABC-652123/":{"acc":"sss"},"ABC-15642/":{"acc":"rrr"}...
"DEF-3324545/":{"acc":"fff"},"DEF-652123/":{"acc":"sss"},"DEF-15642/":{"acc":"rrr"}...
Assuming the desired string a) starts with ABC- and b) ends before the next /, one grep idea:
$ grep -o "ABC-[^/]*" abc.dat
ABC-3324545
ABC-652123
ABC-15642
Here [^/]* matches any run of characters other than /, i.e., everything up to (but not including) the next /.
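The same pattern works on piped input, so it can sit directly after the curl call from the question. A minimal sketch, assuming the response has the shape shown above; extract_ids is a hypothetical helper name and the sample line stands in for the real curl output:

```shell
#!/bin/bash
# Sample line standing in for the curl response (shape copied from the question).
line='"ABC-3324545/":{"acc":"fff"},"ABC-652123/":{"acc":"sss"},"ABC-15642/":{"acc":"rrr"}'

# extract_ids (hypothetical helper): read stdin and print every match of
# "ABC-" followed by any characters up to, but not including, the next "/".
extract_ids() {
  grep -o 'ABC-[^/]*'
}

printf '%s\n' "$line" | extract_ids
```

With the real data this would be used as: curl -H "Token: xxxx" "$URL" | extract_ids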

Related

How to convert piped/awk output to string/variable

I'm trying to create a bash function that automatically updates a cli tool. So far I've managed to get this:
update_cli_tool () {
# the following will automatically be redirected to .../releases/tag/vX.X.X
# there I get the location from the header, and remove it to get the full url
latest_release_url=$(curl -i https://github.com/.../releases/latest | grep location: | awk -F 'location: ' '{print $2}')
# to get the version, I get the 8th element from the url .../releases/tag/vX.X.X
latest_release_version=$(echo "$latest_release_url" | awk -F '/' '{print $8}')
# this is where it breaks
# the first part just replaces the "tag" with "download" in the url
full_url="${latest_release_url/tag/download}/.../${latest_release_version}.zip"
echo "$full_url" # or curl $full_url, also fails
}
Expected output: https://github.com/.../download/vX.X.X/vX.X.X.zip
Actual output: -.zip-.../.../releases/download/vX.X.X
When I just echo "latest_release_url: $latest_release_url" (same for version), it prints it correctly, but not when I use the above mentioned flow. When I hardcode the ..._url and ..._version, the full_url works fine. So my guess is I have to somehow capture the output and convert it to a string? Or perhaps concatenate it another way?
Note: I've also used ..._url=`curl -i ...` (with backticks instead of $(...)), but this gave me the same results.
The HTTP headers that curl -i prints use \r\n line endings. The stray carriage return at the end of the url variable is what trips you up. You can see it with printf '%q\n' "$latest_release_url".
Try this:
latest_release_url=$(
curl --silent -i https://github.com/.../releases/latest \
| awk -v RS='\r\n' '$1 == "location:" {print $2}'
)
Then the rest of the script should look right.
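If you would rather keep the original grep/awk pipeline, the carriage return can also be removed afterwards with a parameter expansion. A minimal sketch, not the answer's method: the header value is a made-up placeholder (example.com, owner, tool and v1.2.3 are all assumptions) and no network call is made:

```shell
#!/bin/bash
# Simulated "location:" header line as curl -i would deliver it,
# including the trailing carriage return. The URL is a placeholder.
header=$'location: https://example.com/owner/tool/releases/tag/v1.2.3\r'

url=${header#location: }   # drop the header name
url=${url%$'\r'}           # drop the trailing carriage return, if present
version=${url##*/}         # everything after the last slash

# Same substitution as in the question, on the cleaned-up values.
full_url="${url/tag/download}/${version}.zip"
printf '%s\n' "$full_url"
```

Without the second expansion, the \r stays in the middle of full_url and the terminal moves the cursor back to the start of the line, which is why the printed output looks scrambled.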

Bash regex: get value in conf file preceded by string with dot

I have to get my db credentials from this configuration file:
# Database settings
Aisse.LocalHost=localhost
Aisse.LocalDataBase=mydb
Aisse.LocalPort=5432
Aisse.LocalUser=myuser
Aisse.LocalPasswd=mypwd
# My other app settings
Aisse.NumDir=../../data/Num
Aisse.NumMobil=3000
# Log settings
#Aisse.Trace_AppliTpv=blabla1.tra
#Aisse.Trace_AppliCmp=blabla2.tra
#Aisse.Trace_AppliClt=blabla3.tra
#Aisse.Trace_LocalDataBase=blabla4.tra
In particular, I want to get the value mydb from line
Aisse.LocalDataBase=mydb
So far, I have developed this
mydbname=$(echo "$my_conf_file.conf" | grep "LocalDataBase=" | sed "s/LocalDataBase=//g" )
that returns
mydb #Aisse.Trace_blabla4.tra
That would be fine if it did not also return the commented-out line.
Then I also tried
mydbname=$(echo "$my_conf_file.conf" | grep "Aisse.LocalDataBase=" | sed "s/LocalDataBase=//g" )
which returns an empty string.
How can I get only the value that is preceded by the string "Aisse.LocalDataBase=" ?
Using sed
$ mydbname=$(sed -n 's/Aisse\.LocalDataBase=//p' input_file)
$ echo "$mydbname"
mydb
I'm afraid your question is incomplete:
You mention that you want the line containing "LocalDataBase", but not the commented-out one. Let's start with that:
A line which contains "LocalDataBase":
grep "LocalDataBase" conf.conf.txt
A line which contains "LocalDataBase" but which does not start with a hash:
grep "LocalDataBase" conf.conf.txt | grep -v "^ *#"
What does grep -v "^ *#" do?
It means: don't show (-v) the lines that match:
^ : the start of the line
 * : zero or more space characters
# : a hash character
Once you have your line, you need to work with it:
You only need the part behind the equality sign, so let's use that sign as a delimiter and show the second column:
cut -d '=' -f 2
All together:
grep "LocalDataBase" conf.conf.txt | grep -v "^ *#" | cut -d '=' -f 2
Are we there yet?
No, because it's possible that somebody has put some comment behind your entry, something like:
LocalDataBase=mydb #some information
In order to prevent that, you need to cut that comment too, which you can do in a similar way as before: this time you use the hash character as a delimiter and you show the first column:
grep "LocalDataBase" conf.conf.txt | grep -v "^ *#" | cut -d '=' -f 2 | cut -d '#' -f 1
Have fun.
You may use this sed:
mydbname=$(sed -n 's/^[^#][^=]*LocalDataBase=//p' file)
echo "$mydbname"
mydb
RegEx Details:
^: Start
[^#]: Matches any character other than #
[^=]*: Matches 0 or more of any character that is not =
LocalDataBase=: Matches text LocalDataBase=
You can use
mydbname=$(sed -n 's/^Aisse\.LocalDataBase=\(.*\)/\1/p' file)
If there can be leading whitespace you can add [[:blank:]]* after ^:
mydbname=$(sed -n 's/^[[:blank:]]*Aisse\.LocalDataBase=\(.*\)/\1/p' file)
See this online demo:
#!/bin/bash
s='# Database settings
Aisse.LocalHost=localhost
Aisse.LocalDataBase=mydb
Aisse.LocalPort=5432
Aisse.LocalUser=myuser
Aisse.LocalPasswd=mypwd
# My other app settings
Aisse.NumDir=../../data/Num
Aisse.NumMobil=3000
# Log settings
#Aisse.Trace_AppliTpv=blabla1.tra
#Aisse.Trace_AppliCmp=blabla2.tra
#Aisse.Trace_AppliClt=blabla3.tra
#Aisse.Trace_LocalDataBase=blabla4.tra'
sed -n 's/^Aisse\.LocalDataBase=\(.*\)/\1/p' <<< "$s"
Output:
mydb
Details:
-n - suppresses default line output in sed
^[[:blank:]]*Aisse\.LocalDataBase=\(.*\) - a regex that matches the start of the line (^), then zero or more blank characters ([[:blank:]]*), then the string Aisse.LocalDataBase=, and then captures the rest of the line into Group 1
\1 - replaces the whole match with the value of Group 1
p - prints the result of the successful substitution.
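The ideas from the answers above (exact key match, skip commented-out lines, drop a trailing comment) can be combined in one small function. This is only a sketch under assumptions: get_conf_value is a hypothetical helper name, keys are assumed to contain no = sign, and values are assumed to contain no # character:

```shell
#!/bin/bash
# get_conf_value KEY FILE (hypothetical helper, not from the answers above):
# print the value of an exact KEY=value line, ignoring commented-out lines
# and any trailing "# comment".
get_conf_value() {
  awk -F= -v key="$1" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim blanks around the key
    }
    name == key {
      value = substr($0, index($0, "=") + 1) # everything after the first "="
      sub(/[ \t]*#.*/, "", value)            # drop a trailing comment
      sub(/[ \t]+$/, "", value)              # drop trailing blanks
      print value
    }
  ' "$2"
}
```

Usage would look like mydbname=$(get_conf_value Aisse.LocalDataBase my_conf_file.conf). Because the key is compared as a plain string rather than as a regex, the dot in Aisse.LocalDataBase needs no escaping, and #Aisse.Trace_LocalDataBase never matches.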

Identify "$" that is immediately followed by only alphabet/alphanumeric words

"$" should not be immediately followed by digits [0-9]. It should only show the
"$" that is immediately followed by an alphabetic or alphanumeric word.
Input: dirname $0/../bin/$12JAVA_INV/$FILE12NAME
Output: $FILE12NAME
grep -o '[$][a-zA-z_]*'
Using this I'm receiving an output as: $ $ $FILENAME
You're getting $ in the result because * means to match zero or more of the preceding pattern. $0 matches because it has a $ followed by 0 letters.
If you want at least 1 letter, use + instead, it means one or more.
But if you want to be able to match $FILE12NAME, you also need to allow digits after the first character. So use:
grep -i -o '\$[a-z_][a-z_0-9]*'
This matches $, followed by a letter or underscore, followed by zero or more letters, underscores, or numbers.
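To check the pattern against the input from the question, here is a short sketch; find_vars is a hypothetical helper name used only for this illustration:

```shell
#!/bin/bash
# find_vars (hypothetical helper): read stdin and print every "$" that is
# followed by a letter or underscore, plus any further letters, digits or
# underscores. "$0" and "$12JAVA_INV" are skipped because a digit follows "$".
find_vars() {
  grep -i -o '\$[a-z_][a-z_0-9]*'
}

printf '%s\n' 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | find_vars
```

The single quotes matter: they keep the shell from expanding $0 and the other names before grep sees them.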
It looks like you want:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | awk '{print $NF}' FS=/
$FILE12NAME
But if you really want to parse it the way you describe, you could do either of:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -e 's/.*\(\$[^0-9]\)/\1/'
$FILE12NAME
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -E 's/.*(\$[^0-9])/\1/'
$FILE12NAME

Multi-line grep with positive and negative filtering

I need to grep for a multi-line string that doesn't include one string, but does include others. This is what I'm searching for in some HTML files:
<not-this>
<this> . . . </this>
</not-this>
In other words, I want to find files that contain <this> and </this> on the same line, where that line is not surrounded by <not-this> tags on the lines before and/or after. Here is some shorthand logic for what I want to do:
grep 'this' && '/this' && !('not-this')
I've seen answers with the following...
grep -Er -C 2 '.*this.*this.*' . | grep -Ev 'not-this'
...but this just erases the line(s) containing the "not" portion, and displays the other lines. What I'd like is for it to not pull those results at all if "not-this" is found within a line or two of "this".
Is there a way to accomplish this?
P.S. I'm using Ubuntu and gnome-terminal.
It sounds like an awk script might work better here:
$ cat input.txt
<not-this>
<this>BAD! DO NOT PRINT!</this>
</not-this>
<yes-this>
<this>YES! PRINT ME!</this>
</yes-this>
$ cat not-this.awk
BEGIN {
notThis=0
}
/<not-this>/ {notThis=1}
/<\/not-this>/ {notThis=0}
/<this>.*<\/this>/ {if (notThis==0) print}
$ awk -f not-this.awk input.txt
<this>YES! PRINT ME!</this>
Or, if you'd prefer, you can squeeze this awk script onto one long line:
$ awk 'BEGIN {notThis=0} /<not-this>/ {notThis=1} /<\/not-this>/ {notThis=0} /<this>.*<\/this>/ {if (notThis==0) print}' input.txt
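Since the question asks for files rather than lines, the same state machine can print file names instead. This is a sketch under assumptions, not part of the answer above: list_clean_files is a hypothetical name, and a file is listed as soon as it has at least one <this>...</this> line outside a <not-this> block, even if it also has one inside:

```shell
#!/bin/bash
# list_clean_files FILE... (hypothetical helper): print the name of every
# file that has a <this>...</this> line outside a <not-this> block.
list_clean_files() {
  awk '
    FNR == 1       { notThis = 0 }   # reset the flag at the start of each file
    /<not-this>/   { notThis = 1 }
    /<\/not-this>/ { notThis = 0 }
    /<this>.*<\/this>/ && !notThis && !(FILENAME in seen) {
      seen[FILENAME] = 1             # report each file only once
      print FILENAME
    }
  ' "$@"
}

# Small demonstration with two throwaway files.
demo=$(mktemp -d)
printf '%s\n' '<not-this>' '<this>BAD! DO NOT PRINT!</this>' '</not-this>' > "$demo/bad.html"
printf '%s\n' '<yes-this>' '<this>YES! PRINT ME!</this>' '</yes-this>' > "$demo/good.html"
list_clean_files "$demo"/*.html
```

For a whole directory tree, the function could be fed by find, for example: find . -name '*.html' -exec awk ... {} +, with the awk program above in place of the dots.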

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note that the last 6 characters of the hash (d8 40 32) appear as directories in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk command does it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to the empty string, so that every character becomes its own field (this is how gawk behaves; POSIX leaves an empty FS unspecified). OFS="/" sets the output field separator to /, which print puts between its arguments.
print ... $(NF-1)$NF, $0 prints the penultimate and the last field joined together, then the whole string. Each comma in the print statement is replaced by the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
With GNU sed you can simplify the pattern using the -r option (extended regular expressions), so you no longer need to escape {} and (). Using ~ as the regex delimiter lets you write the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply, the pattern matches:
(all (n-5 - n-4) (n-3 - n-2) (n-1 - n-0))
and replaces it with
\2/\3/\4/\1
where \1 is the whole hash and \2, \3 and \4 are its last three character pairs.
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
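Wrapped in a function, the same BASH_REMATCH approach can also reject input that is not a hash. A minimal sketch: cache_path is a hypothetical helper name, the base directory is the placeholder from the question, and the check assumes a 32-character lowercase hex MD5 with a 2:2:2 directory layout:

```shell
#!/bin/bash
# cache_path HASH [BASE] (hypothetical helper): print BASE/xx/yy/zz/HASH,
# where xx, yy, zz are the last three character pairs of HASH.
cache_path() {
  local hash=$1 base=${2:-/path/to/nginx/cache}
  # 26 leading hex characters plus three captured pairs = 32 characters.
  if [[ $hash =~ ^[0-9a-f]{26}([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$ ]]; then
    printf '%s/%s/%s/%s/%s\n' "$base" \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "$hash"
  else
    printf 'not a 32-character lowercase hex string: %s\n' "$hash" >&2
    return 1
  fi
}

cache_path 13febd65d65112badd0aa90a15d84032
```

This needs no external process, which may matter if the function is called for many hashes in a loop.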
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
This assumes the input is a valid MD5 string and nothing else.
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5:0-2:2}
p2=${md5:0-4:2}
p1=${md5:0-6:2}
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
