Cutting string into different types of variables - bash

Full script:
snapshot_details=`az snapshot show -n $snapshot_name -g $resource_group --query \[diskSizeGb,location,tags\] -o json`
echo $snapshot_details
IFS='",][' read -r -a array <<< $snapshot_details
echo ${array[@]}
IFS=' ' read -r -a array1 <<< ${array[@]}
echo ${array1[0]} #size
echo ${array1[1]} #location
How can I break this into 3 different variables:
a=5
b=eastus2
c={ "name": "20190912123307" "namespace": "aj-ssd" "pvc": "poc-ssd" }
and is there any easier way to parse c so that I can easily traverse over all the keys and values?
Output of the above script:
[ 5, "eastus2", { "name": "20190912123307", "namespace": "ajain-ssd", "pvc": "azure-poc-ssd" } ]
5 eastus2 { name : 20190912123307 namespace : ajain-ssd pvc : azure-poc-ssd }
5
eastus2

A JSON parser, such as jq, should always be used when splitting out items from a JSON array in bash. Line-oriented tools (such as awk) cannot correctly handle JSON escaping: if you had a value containing a tab, newline, or literal quote, it would be emitted incorrectly.
Consider the following code, runnable exactly as-is even by people not having your az command:
snapshot_details_json='[ 5, "eastus2", { "name": "20190912123307", "namespace": "ajain-ssd", "pvc": "azure-poc-ssd" } ]'
{ read -r diskSizeGb && read -r location && read -r tags; } < <(jq -cr '.[]' <<<"$snapshot_details_json")
# show that we really got the content
echo "diskSizeGb=$diskSizeGb"
echo "location=$location"
echo "tags=$tags"
...which emits as output:
diskSizeGb=5
location=eastus2
tags={"name":"20190912123307","namespace":"ajain-ssd","pvc":"azure-poc-ssd"}
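The last part of the question asked how to traverse the keys and values of c. jq can iterate the tags object directly; a minimal sketch using to_entries, reusing the sample JSON above:
while IFS=$'\t' read -r key value; do
  echo "key=$key value=$value"
done < <(jq -r '.[2] | to_entries[] | [.key, .value] | @tsv' <<<"$snapshot_details_json")
This prints one key/value pair per line (name, namespace, pvc); @tsv keeps the splitting safe even if a value contains spaces.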

Bash can do this with the awk command:
To extract the 5:
awk -F " " '{ print $1 }'
To extract eastus2:
awk -F "\"" '{ print $2 }'
To extract the last string:
awk -F "{" '{ print "{" $2 }'
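For example, applied to the single-line output shown in the question (a quick sketch; the sample line is copied from the question's own echo output):
line='5 "eastus2" { "name": "20190912123307" "namespace": "aj-ssd" "pvc": "poc-ssd" }'
a=$(echo "$line" | awk -F " " '{ print $1 }')      # a=5
b=$(echo "$line" | awk -F "\"" '{ print $2 }')     # b=eastus2
c=$(echo "$line" | awk -F "{" '{ print "{" $2 }')  # c={ "name": ... }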
To explain quickly:
awk -F " " '{ print $1 }'
-F sets a delimiter; here we set space as the delimiter.
Then we ask awk to print the first field ($1), i.e. everything before the first delimiter.
The slightly more complex one:
awk -F "{" '{ print "{" $2 }'
Here we set { as the delimiter. Since printing $2 alone would drop the opening bracket, we manually re-print it (print "{" $2).

It will not be nice in Bash, but this should work if your input format does not vary (including no {, } or spaces inside the key/value pairs):
S='5 "eastus2" { "name": "20190912123307" "namespace": "aj-ssd" "pvc": "poc-ssd" }'
a=`echo "$S" | awk '{print $1}'`
b=`echo "$S" | awk '{print $2}' | sed -e 's/\"//g'`
c=`echo "$S" | awk '{$1=$2=""; print $0}'`
echo "$a"
echo "$b"
echo "$c"
elems=`echo "$c" | sed -e 's/{//' | sed -e 's/}//' | sed -e 's/: //g'`
echo $elems
for e in $elems
do
kv=`echo "$e" | sed -e 's/\"\"/ /' | sed -e 's/\"//g'`
key=`echo "$kv" | awk '{print $1}'`
value=`echo "$kv" | awk '{print $2}'`
echo "key:$key; value:$value"
done
The idea in the iteration over key/value pairs is to:
(1) remove the space (and colon) between each key and its corresponding value, so that each key/value pair appears as one item;
(2) inside the loop, change the delimiter between key and value (which is now "") to a space and remove the double quotes (variable kv);
(3) extract the key and value as the first and second items of kv.
EDIT:
Beware of filename wildcard expansion: the unquoted expansions (echo $elems, for e in $elems) are subject to globbing, so disable it with set -f or quote where possible.
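If you also want to traverse all the keys and values afterwards, the same loop could fill a bash associative array (bash 4+); a minimal sketch under the same fragile-format assumptions as above:
declare -A tags
for e in $elems
do
  kv=$(echo "$e" | sed -e 's/""/ /' -e 's/"//g')
  key=$(echo "$kv" | awk '{print $1}')
  value=$(echo "$kv" | awk '{print $2}')
  tags["$key"]="$value"
done
for k in "${!tags[@]}"
do
  echo "key:$k; value:${tags[$k]}"
done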

Related

How to change words with the same words but with a number at the back - bash

I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a command line of arguments that is neither a directory nor a file
file.csv BRB1 REZ3 SYO2
What I want to do is match the capital words on that line against the file's second column, and then take the nth letter of the corresponding first-column word, where n is the number at the end of the capital word,
and the output should then be
umo
I know that I can get over the line with
for i in "${@:2}"
do
words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split it into an array a. For each record in file.csv, iterate over this array; if the second field of the current record matches the first three characters of the current array value, then extract the target character from the first field of the current record and append it to a variable. Print the aggregated variable at the end.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],1,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, 'BEGIN{split(arr,a," ")} \
{for (v in a) { \
if ($2 == substr(a[v],1,3)) { \
n=substr(a[v],length(a[v]),1); \
w=w""substr($1,n,1);
}
}
} END{print w}' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
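For comparison, the same idea can be sketched in pure bash (bash 4+), assuming the argument format from the question (capital letters followed by a trailing number); the script and variable names here are illustrative:
#!/bin/bash
declare -A pos
for arg in "${@:2}"; do
  pos[${arg%%[0-9]*}]=${arg##*[A-Z]}   # e.g. pos[BRB]=1
done
out=""
while IFS=, read -r word code; do
  n=${pos[$code]:-}
  [[ -n $n ]] && out+=${word:n-1:1}    # take the nth letter (1-based)
done < "$1"
echo "$out"
Invoked as ./script.sh file.csv BRB1 REZ3 SYO2, this also prints umo.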
This is a way using sed.
Build a sed program from the command arguments, then use it to convert the lines.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'   # prepend a space (shifts positions by one); this Te clears the substitution flag
for i in "${@:2}"; do
  # turn an argument like BRB1 into a substitution such as: s/.\{1\}\(.\).*,BRB$/\1/;
  pat+=$(echo $i | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'   # no match: skip; match: append to hold space; at the end, join and print
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${@:2}";
do
pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
index=$(echo "$i" | grep -oE '[0-9]+')
pattern_string+=$(echo "$pattern_words|")
idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line
do
line_pattern=$(echo $line | grep -oE $pattern_string)
[[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" && echo $line | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < $filename
First, find the capital-letter word in each argument and record its corresponding index.
Then construct the whole pattern string by joining the words with |.
Finally, iterate over every line, match it against the pattern string, and pick the letter at the recorded index.
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2

How to grab fields in inverted commas

I have a text file which contains the following lines:
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
How can I grab the two fields jeffrey and 90 days from between the inverted commas and save them in variables?
If awk is an option, you could save an array and then save the elements as individual variables.
$ IFS="\"" read -ra var <<< $(awk -F, '/jeffrey/{ print $1, $NF }' input_file)
$ var2="${var[3]}"
$ echo "$var2"
90 days
$ var1="${var[1]}"
$ echo "$var1"
jeffrey
while read -r line; do # read in line by line
name=$(echo $line | awk -F, ' { print $1} ' | sed 's/"//g') # grab first col and strip "
expire=$(echo $line | awk -F, ' { print $3} '| sed 's/"//g') # grab third col and strip "
echo "$name" "$expire" # do your business
done < yourfile.txt
IFS=","
arr=( $(cat txt | head -2 | tail -1 | cut -d, -f 1,3 | tr -d '"') )
echo "${arr[0]}"
echo "${arr[1]}"
The result is stored in an array; you can access the elements by index.
Maybe the method below, using the sed and awk commands, will help you:
#!/bin/sh
username=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $1}' | tr -d '"')
echo "$username"
expires_in=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $3}' | tr -d '"')
echo "$expires_in"
Output :
jeffrey
90 days
Note:
The method above works only if the username is distinct.
As far as I know, usernames are not duplicated.
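As one more variation (a sketch, assuming the same three-column quoted CSV as above), awk can strip the quotes itself so that a single read captures both values:
read -r name expire < <(awk -F'","' '/jeffrey/{gsub(/"/,""); print $1, $3}' input_file)
echo "$name"    # jeffrey
echo "$expire"  # 90 days
Here -F'","' splits on the quote-comma-quote sequence, and read assigns the first word to name and the remainder of the line to expire.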

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq -r '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq -r '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file) }' $1
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS = "\"" }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
You can start optimizing by replacing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
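For illustration, the replacement applied to the sample line from the question:
echo '{"captureTime": "1534303617.738","ua": "..."}' | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
prints 1534303617.738, which is then fed to date as before.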

How to get output of grep in a single line in shell script?

Here is a script which reads words from the file replaced.txt and displays the output with each word on its own line, but I want to display all the output on a single line.
#!/bin/sh
echo
echo "Enter the word to be translated"
read a
IFS=" " # Set the field separator
set $a # Breaks the string into $1, $2, ...
for a # a for loop by default loop through $1, $2, ...
do
{
b= grep "$a" replaced.txt | cut -f 2 -d" "
}
done
Content of "replaced.txt" file is given below:
hllo HELLO
m AM
rshbh RISHABH
jn JAIN
hw HOW
ws WAS
ur YOUR
dy DAY
The linked question isn't appropriate to what I asked; I just need help putting the output of the script on a single line.
Your entire script can be replaced by:
#!/bin/bash
echo
read -r -p "Enter the words to be translated: " a
echo $(printf "%s\n" $a | grep -Ff - replaced.txt | cut -f 2 -d ' ')
No need for a loop.
The echo with an unquoted argument removes embedded newlines and replaces each sequence of multiple spaces and/or tabs with one space.
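For illustration, a hypothetical session using the replaced.txt contents from the question:
$ read -r -p "Enter the words to be translated: " a
Enter the words to be translated: hllo hw ws ur dy
$ echo $(printf "%s\n" $a | grep -Ff - replaced.txt | cut -f 2 -d ' ')
HELLO HOW WAS YOUR DAY
Note that grep prints matches in the file's order, not the order the words were entered.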
One hackish-but-simple way to remove trailing newlines from the output of a command is to wrap it in printf %s "$(...) ". That is, you can change this:
b= grep "$a" replaced.txt | cut -f 2 -d" "
to this:
printf %s "$(grep "$a" replaced.txt | cut -f 2 -d" ") "
and add an echo command after the loop completes.
The $(...) notation sets up a "command substitution": the command grep "$a" replaced.txt | cut -f 2 -d" " is run in a subshell, and its output, minus any trailing newlines, is substituted into the argument-list. So, for example, if the command outputs DAY, then the above is equivalent to this:
printf %s "DAY "
(The printf %s ... notation is equivalent to echo -n ... — it outputs a string without adding a trailing newline — except that its behavior is more portably consistent, and it won't misbehave if the string you want to print happens to start with -n or -e or whatnot.)
You can also use
awk 'BEGIN { OFS=": "; ORS=" "; } NF >= 2 { print $2; }'
in a pipe in place of the cut.

Length of a specific field, and showing the record in a much easier way

My goal is to find out the length of the second field and if the length is more than five characters, then I need to show the entire record using shell scripts/command.
echo "From the csv file"
cat latency.csv |
while read line
do
latency=`echo $line | cut -d"," -f2 | tr -d " "`
length=$(echo ${#latency})
if [ $length -gt 5 ]
then
echo $line
fi
done
There is nothing wrong with my code, but being UNIX/Linux, I thought there should be a simpler way of doing such things.
Is there one such simpler method?
awk -F, 'length($2)>5' file
This should work.
Updated (strips spaces from the second field before comparing):
awk -F, '{a=$0;gsub(/ /,"",$2);if(length($2)>5)print a}' file
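To illustrate, with a hypothetical latency.csv (the values below are made up):
$ cat latency.csv
id1, 12.345 ,ok
id2, 9.1 ,ok
$ awk -F, '{a=$0;gsub(/ /,"",$2);if(length($2)>5)print a}' latency.csv
id1, 12.345 ,ok
Only the first record is printed, because its second field is longer than five characters once spaces are removed.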
awk -F, '{
t = $2
gsub(/ /, x, t)
if (length(t) > 5)
print
}' latency.csv
Or:
perl -F, -ane'
print if
$F[1] =~ tr/ //dc > 5
' latency.csv
