Bash: get a substring with a length of 8 chars

I have the following input:
line="before,myinput1,after"
myinput1 can also be first or last, for example: line="myinput1,after" or line="before,myinput1"
I'm trying to get only the myinput1 value (which can change). I tried this:
echo "$line" | grep -o -E ',.{0,7}.,'
which returned ,myinput1,. The issue is that it doesn't work if the value is first or last, because the comma is missing.
Is there any other way to do that?

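Using grep, a regex for 8 characters (assuming you only want an 8-character string) is \w{8}. Using the alternation operator |, the three cases needed (start of line, end of line and somewhere in the middle of the line) can be expressed as:
echo "$line" | grep -Eo ',\w{8},|^\w{8},|,\w{8}$'
The match still includes the adjacent comma(s). \w is supported by GNU grep but is not part of POSIX ERE, and egrep is an obsolescent alias for grep -E.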
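Since that pattern returns the commas as part of the match, a grep-only alternative is to split the line into fields first and then keep only the fields that are exactly 8 characters long. This is a sketch, not part of the answer above; it treats "8 characters" as any 8 non-comma characters, and the function name eight_char_fields is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every comma-separated field that is exactly 8 characters long.
eight_char_fields() {
    # First grep: emit each run of non-comma characters on its own line.
    # Second grep: -x requires the whole line to match, so only 8-character fields survive.
    printf '%s\n' "$1" | grep -oE '[^,]+' | grep -xE '.{8}'
}

eight_char_fields "before,myinput1,after"   # myinput1
eight_char_fields "myinput1,after"          # myinput1
eight_char_fields "before,myinput1"         # myinput1
```

If no field has 8 characters, the function prints nothing and returns grep's non-zero status.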

To catch fields of 8 characters in a comma delimited string, you can use awk:
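printf '%s' "$line" | awk -v RS=, 'length()==8'
-v RS=, sets the record separator to a comma, so each field becomes its own record.
length() returns the length of the current record; since the pattern has no action, matching records are printed.
printf '%s' is used instead of a here-string (<<< "$line"), because the here-string appends a newline, so a last field such as myinput1 would be 9 characters long and never match.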

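With bash (the subshell keeps the IFS change local, and set -f stops the fields from being glob-expanded):
(IFS=','; set -f; set -- $line; for i; do [ "${#i}" -eq 8 ] && echo "$i"; done)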
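The same idea can be written without changing IFS for the whole subshell and without relying on an unquoted expansion, by reading the fields into an array with read -ra. A sketch, assuming bash (not POSIX sh) and a line without embedded newlines; fields and f are illustrative names.

```shell
#!/usr/bin/env bash
line="before,myinput1,after"

# Split on commas into an array; IFS=',' applies only to this read command.
IFS=',' read -ra fields <<< "$line"

# Print every field whose length is exactly 8 characters.
for f in "${fields[@]}"; do
    if (( ${#f} == 8 )); then
        printf '%s\n' "$f"
    fi
done
```

Because read stops at the newline that the here-string adds, the last field is not affected by it.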

Related

How to get a number with variable number of digits from a string in a file using bash script?

I have the following file:
APP_VERSION.ts
export const APP_VERSION = 1;
This is the only content of that file, and the APP_VERSION variable will be incremented as needed.
So, the APP_VERSION could be a single digit number or multiple digit number, like 15 or 999, etc.
I need to use that value in one of my bash scripts.
use-app-version.sh
APP_VERSION=`cat src/constants/APP_VERSION.ts`
echo $APP_VERSION
I know I can read it with cat. But how can I parse that string so I can get exactly the APP_VERSION value, whether it's 1 or 999, for example.
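sed -En 's/^.*APP_VERSION[^[:digit:]]*([[:digit:]]+).*$/\1/p' src/constants/APP_VERSION.ts
Using sed, capture the digits that follow APP_VERSION in a group (the parentheses), replace the whole line with that group (\1) and print it; -n suppresses all other output. Matching the non-digits explicitly with [^[:digit:]]* matters: a greedy .* in front of the digits would swallow all but the last one, so 15 would come out as 5.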
You may use this awk:
app_ver=$(awk -F '[[:blank:];=]+' '$(NF-2) == "APP_VERSION" {print $(NF-1)}' src/constants/APP_VERSION.ts)
echo "$app_ver"
1
You can chain several commands to remove everything else:
APP_VERSION=$(cat src/constants/APP_VERSION.ts | awk -F '=' '{print $2}' | tr -d ' ' | tr -d ';')
1 - cat outputs the whole file content
2 - awk prints everything after '='
3 - tr -d ' ' removes the spaces
4 - tr -d ';' removes the ;
A simple grep should be enough:
APP_VERSION=$(grep --text -Eo '[0-9]+' src/constants/APP_VERSION.ts)
With bash only:
APP_VERSION=$(< src/constants/APP_VERSION.ts)
APP_VERSION=${APP_VERSION%;}
APP_VERSION=${APP_VERSION/*= }
Line 2 removes the trailing ';', line 3 removes everything up to and including "= ".
Alternatively, you could set APP_VERSION as an array, take 5th element, and remove trailing ';'.
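The array alternative might look like the sketch below. It assumes the file holds exactly the one line export const APP_VERSION = <number>; so the number is always the 5th word (index 4). A temporary file stands in for src/constants/APP_VERSION.ts so the example runs on its own.

```shell
#!/usr/bin/env bash
# Stand-in for src/constants/APP_VERSION.ts (assumption: same one-line format).
file=$(mktemp)
printf 'export const APP_VERSION = 999;\n' > "$file"

# Read the line into an array of whitespace-separated words:
# export(0) const(1) APP_VERSION(2) =(3) 999;(4)
read -ra words < "$file"

# Take the 5th element (index 4) and strip the trailing ';'.
APP_VERSION=${words[4]%;}
echo "$APP_VERSION"   # 999

rm -f "$file"
```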
Or, another solution, using IFS:
IFS='=;' read a APP_VERSION < src/constants/APP_VERSION.ts
In this version, the space will remain before version number.
Assuming that the task can be rephrased to "extract the digits from a file", there are a few options:
Delete all characters that aren't digits with tr:
version=$(tr -cd '[:digit:]' < infile)
Use grep to match all digits and retain nothing but the match:
version=$(grep -Eo '[[:digit:]]+' infile)
Read file into string and delete all non-digits with just Bash:
contents=$(< infile)
version=${contents//[![:digit:]]}

Display First and Last string entries stored in a variable

I have a variable MyVar with values stored in it. For example:
MyVar="123, 234, 345, 456"
Each entry in the variable is separated by a comma, as in the example above.
I want to be able to pick the first and last entry from this variable, i.e. 123 and 456 respectively.
Any idea how I can achieve this from the terminal?
Thanks!
Using bash substring removal:
$ echo ${MyVar##*,}
456
$ echo ${MyVar%%,*}
123
Also:
$ echo ${MyVar/,*,/,}
123, 456
More for example here:
https://tldp.org/LDP/abs/html/parameter-substitution.html
Edit: The above expects the substrings to be separated by commas only. See the comments, where @costaparas demonstrates a case with ", " (a comma followed by a space).
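With the ", " separator from the question, ${MyVar##*,} actually leaves a leading space (" 456"); the unquoted echo hides it because word splitting drops it. Including the space in the pattern avoids that. A sketch, assuming entries are always separated by exactly one comma and one space:

```shell
#!/usr/bin/env bash
MyVar="123, 234, 345, 456"

# Remove the longest prefix ending in ", " -> last entry.
last=${MyVar##*, }
# Remove the longest suffix starting at "," -> first entry.
first=${MyVar%%,*}

printf '[%s] [%s]\n' "$first" "$last"   # [123] [456]
```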
Try using sed:
MyVar="123, 234, 345, 456"
first=$(echo "$MyVar" | sed 's/,.*//')
last=$(echo "$MyVar" | sed 's/.*, //')
echo $first $last
Explanation:
To obtain the first string, we replace everything after and including the first comma with nothing (an empty string).
To obtain the last string, we replace everything up to and including the last ", " with nothing (an empty string).
Using bash array:
IFS=', ' arr=($MyVar)
echo ${arr[0]} ${arr[-1]}
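Where ${arr[0]} and ${arr[-1]} are your first and last values respectively. Negative indices require bash 4.2 or later. Because the line contains only assignments, IFS=', ' stays in effect for the rest of the shell session.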
You could also read the values into an array and then retrieve the first and last elements; the other values stay saved in the array in case you need them later in the program.
IFS=', ' read -r -a array <<< "$MyVar"
echo "${array[0]}"
123
echo "${array[-1]}"
456
Awk alternative:
awk -F "(, )" '{ print $1" - "$NF }' <<< "$MyVar"
Set the field separator to a comma followed by a space. Print the first field and the last field ($NF) with " - " in between.

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
Longest match of "* " from the front:
echo "${A##* }"
abc.123
Shortest match of " *" from the end:
echo "${A% *}"
Some variable has value
Shortest match of ".*" from the end:
echo "${A%.*}"
Some variable has value abc
Longest match of " *" from the end:
echo "${A%% *}"
Some
Read more in "Shell Parameter Expansion" in the Bash Reference Manual.
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
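To see all four forms side by side, here is a quick sketch run against the same string; the comments show what each expansion leaves behind.

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"

echo "${A#* }"    # variable has value abc.123   (shortest leading "* " removed)
echo "${A##* }"   # abc.123                      (longest leading "* " removed)
echo "${A% *}"    # Some variable has value      (shortest trailing " *" removed)
echo "${A%% *}"   # Some                         (longest trailing " *" removed)
```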
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
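One way to sidestep the my.path/noext pitfall is to take the file name first and only then strip the extension. A small sketch; the function name stem is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a path's file name without directories or its last extension.
stem() {
    local name=${1##*/}          # strip leading directories first
    printf '%s\n' "${name%.*}"   # then strip the last ".ext", if any
}

stem archive.tar.gz    # archive.tar
stem my.path/noext     # noext
stem /usr/bin/git      # git
```

Hidden files such as .bashrc still reduce to an empty string, since their only dot is treated as the start of an extension.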
How do you know where the value begins? If it always starts at the 5th word, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on most Unix-like systems (rev is not in POSIX, but ships with Linux and macOS). Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding pattern (which is [^ ])", and the $ means "match the end of the line". So this matches the last word after the last space through to the end of the line, i.e. abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the default IFS (Internal Field Separator) characters, which include the space:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What makes this pattern so useful is that it also lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
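Since the offset and length in ${array[@]:offset:length} are evaluated as arithmetic expressions, the two helper variables can also be folded into a single expansion. A sketch that rebuilds A_array so it runs on its own:

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
IFS=" " read -r -a A_array <<< "$A"

# All words except the last: the length is the arithmetic expression ${#A_array[@]}-1.
echo "${A_array[@]:0:${#A_array[@]}-1}"   # Some variable has value
```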
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Or with Perl:
echo "Some variable has value abc.123" | perl -nE'say $1 if /(\S+)$/'

Is there a way to format the width of a substring within a string in a bash/sh script?

I have to format the width of a substring within a string using a bash script, but without using tokens or loops. A single character between two colons should be prepended by a 0 in order to match the standard width of 2 for each field.
For e.g
from:
6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3
to
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
How can I do this?
sed -r 's/\<([0-9a-f])\>/0\1/g'
Search and replace with a regex. \< and \> (a GNU sed extension) match word boundaries, so [0-9a-f] only matches a single hex digit standing alone between colons.
$ sed -r 's/\<([0-9a-f])\>/0\1/g' <<< "6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
awk -F: -v OFS=: '{for(i=1;i<=NF;i++) if(length($i)==1)gsub($i,"0&",$i)}1' file
Output:
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
This splits the line into fields separated by ':'. If a field is only one character long, it is replaced by "0" followed by that character, and the trailing 1 prints the line, rebuilt with ':' as the output separator.
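Because gsub() treats the field's own text as a regular expression, a slightly more direct variant simply prepends the "0" by assignment. A sketch with the question's sample string passed via a here-string instead of a file:

```shell
#!/usr/bin/env bash
s="6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"

# Split on ':', prepend "0" to one-character fields by plain assignment
# (no regex involved), and let awk rebuild the record with OFS=":".
out=$(awk -F: -v OFS=: '{ for (i = 1; i <= NF; i++) if (length($i) == 1) $i = "0" $i } 1' <<< "$s")
echo "$out"
```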
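Bash solution (run in a subshell so IFS is only changed locally; note that it prints a trailing ':' and no final newline):
(IFS=:; for i in $string; do echo -n "0$i:" | tail -c 3; done)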
With the input
str="6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
you can add a '0' to all tokens and remove those that are unwanted:
sed -r 's/0([0-9a-f]{2})/\1/g' <<< "0${str//:/:0}"
That doesn't feel right, making errors and repairing them.
A better alternative is
out=$(IFS=:; printf '%2s:' $str | tr ' ' '0')
echo "${out%:}"

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note the last 6 characters: d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to the empty string, so every character becomes a separate field (gawk and mawk support this; POSIX leaves it unspecified). OFS="/" sets the output field separator to /, which print inserts between its arguments.
print ... $(NF-1)$NF, $0 prints the penultimate and last characters glued together, then the whole string. Each comma in the print list is replaced by OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
With GNU sed you can simplify the pattern using the -r option, so you no longer need to escape {} and (). Using ~ as the delimiter of the s command lets you use the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply, the pattern captures the whole hash as group 1 and the last three pairs of characters as groups 2, 3 and 4:
(all (n-5 n-4) (n-3 n-2) (n-1 n))
and replaces it with
\2/\3/\4/\1
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
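Wrapped in a small function, the regex approach might look like the sketch below. The name nginx_cache_path, the base directory argument and the check that the input is a 32-character lowercase hex string are illustrative additions; the d8/40/32 layout follows the question's description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: nginx_cache_path BASE_DIR MD5_HASH
# Prints BASE_DIR/<chars 27-28>/<chars 29-30>/<chars 31-32>/MD5_HASH.
nginx_cache_path() {
    local base=$1 hash=$2
    # 26 leading hex chars, then the last three pairs captured separately.
    local re='^[0-9a-f]{26}([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$'
    if [[ $hash =~ $re ]]; then
        # ${base%/} drops a trailing slash so the path never contains "//".
        printf '%s/%s/%s/%s/%s\n' "${base%/}" \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "$hash"
    else
        printf 'not an MD5 hash: %s\n' "$hash" >&2
        return 1
    fi
}

nginx_cache_path /path/to/nginx/cache 13febd65d65112badd0aa90a15d84032
```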
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
This assumes the input is a valid MD5 hash and nothing else.
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5: -2:2}
p2=${md5: -4:2}
p1=${md5: -6:2}
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache is configured with levels=2:2:2.
