Split string into array after specific delimiter and newline - bash

string="name: Destination Administrator
description: Manage the destination configurations, certificates and subaccount trust.
readOnly:
roleReferences:
- roleTemplateAppId: destination-xsappname!b62
roleTemplateName: Destination_Administrator
name: Destination Administrator"
I have the above string where each line is delimited by a newline character, and I would like to create an array with two columns from everything after the "-", as below:
Col1 col2
roleTemplateAppId destination-xsappname!b62
roleTemplateName Destination_Administrator
name Destination Administrator
I tried the following, but it does not return the correct array:
IFS='- ' read -r -a arrstring <<< "$string"
echo "${arrstring[1]}"

Assumptions:
OP is unable to use a yaml parser (per Léa's comment)
the input is guaranteed to have \n line endings (within the data)
the - only shows up in the one location (as depicted in OP's sample input); otherwise we need a better definition of where to start parsing the data
we're interested in parsing everything that follows the -
data is to be parsed based on a : delimiter, with the first field serving as the index in an associative array, while the 2nd field will be the value stored in the array
leading/trailing spaces to be removed from array indexes and values
One sed idea for pulling out just the lines we're interested in:
$ sed -n '/- /,${s/-//;p}' <<< "${string}"
roleTemplateAppId: destination-xsappname!b62
roleTemplateName: Destination_Administrator
name: Destination Administrator
Adding a few more bits to strip off leading/trailing spaces:
$ sed -n '/- /,${s/-//;s/^[ ]*//;s/[ ]*$//;s/[ ]*:[ ]*/:/;p}' <<< "${string}"
roleTemplateAppId:destination-xsappname!b62
roleTemplateName:Destination_Administrator
name:Destination Administrator
From here we'll feed this to a while loop where we'll populate the associative array
unset arrstring
declare -A arrstring # declare as an associative array
while IFS=':' read -r index value
do
arrstring["${index}"]="${value}"
done < <(sed -n '/- /,${s/-//;s/^[ ]*//;s/[ ]*$//;s/[ ]*:[ ]*/:/;p}' <<< "${string}")
Leaving us with:
$ typeset -p arrstring
declare -A arrstring=([roleTemplateAppId]="destination-xsappname!b62" [name]="Destination Administrator" [roleTemplateName]="Destination_Administrator" )
$ for i in "${!arrstring[@]}"
do
echo "$i : ${arrstring[$i]}"
done
roleTemplateAppId : destination-xsappname!b62
name : Destination Administrator
roleTemplateName : Destination_Administrator
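Individual values can then be looked up directly by key, for example:
$ echo "${arrstring[roleTemplateAppId]}"
destination-xsappname!b62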

How do I add whitespaces to a string while filling it up in a for-loop in Bash?

I have a string as follows:
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
I want to extract the first two directories and save them as another string like this:
"applications/dbt applications/dbt applications/dataform applications/dataform"
But while filling up the second string, it's being saved as
applications/dbtapplications/dbtapplications/dataformapplications/dataform
What I tried:
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
arr=($files)
#extracting the first two directories and saving it to a new string
for i in ${arr[@]}; do files2+=$(echo "$i" | cut -d/ -f 1-2); done
echo $files2
files2 echoes the following
applications/dbtapplications/dbtapplications/dataformapplications/dataform
Reusing your code as much as possible:
(assuming we only need to strip the last path component):
arr=( applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml )
#extracting the first two directories and saving it to a new string
for file in "${arr[@]}"; do
files2+="${file%/*} "
done
echo "$files2"
applications/dbt applications/dbt applications/dataform applications/dataform
You could use a for loop as requested
for dir in ${files};
do file2+=$(printf '%s ' "${dir%/*}")
done
which will give output
$ echo "$file2"
applications/dbt applications/dbt applications/dataform applications/dataform
However, it would be much easier with sed
$ sed -E 's~([^/]*/[^/]*)[^ ]*~\1~g' <<< $files
applications/dbt applications/dbt applications/dataform applications/dataform
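To capture the result into a variable rather than just printing it, the same sed call can be wrapped in a command substitution, something like:
$ files2=$(sed -E 's~([^/]*/[^/]*)[^ ]*~\1~g' <<< "$files")
$ echo "$files2"
applications/dbt applications/dbt applications/dataform applications/dataform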
Convert the string into an array first, assuming there are no white/blank/newline characters embedded in your strings/path names. Something like
#!/usr/bin/env bash
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
mapfile -t array <<< "${files// /$'\n'}"
Now check the value of the array
declare -p array
Output
declare -a array=([0]="applications/dbt/Dockerfile" [1]="applications/dbt/cloudbuild.yaml" [2]="applications/dataform/Dockerfile" [3]="applications/dataform/cloudbuild.yaml")
Remove the last /-component from each path name in the array.
new_array=("${array[@]%/*}")
Check the new value
declare -p new_array
Output
declare -a new_array=([0]="applications/dbt" [1]="applications/dbt" [2]="applications/dataform" [3]="applications/dataform")
Now the value is an array; assign it to a variable or do whatever you like with it. As mentioned in the comment section, use an array from the start.
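For example, to join it back into one space-separated string like the one asked for in the question:
files2="${new_array[*]}"
declare -p files2
Output
declare -- files2="applications/dbt applications/dbt applications/dataform applications/dataform"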
Assign the first two directories/paths to a variable (an odd requirement):
new_var="${new_array[@]::2}"
declare -p new_var
Output
declare -- new_var="applications/dbt applications/dbt"

Reading CSV file in Shell Scripting

I am trying to read values from a CSV file dynamically based on the header. Here's what my input files can look like.
File 1:
name,city,age
john,New York,20
jane,London,30
or
File 2:
name,age,city,country
john,20,New York,USA
jane,30,London,England
I may not be following the best way to accomplish this but I tried the following code.
#!/bin/bash
{
read -r line
line=`tr ',' ' ' <<< $line`
while IFS=, read -r `$line`
do
echo $name
echo $city
echo $age
done
} < file.txt
I am expecting the above code to read the values of the header as the variable names. I know that the order of columns can differ between input files, but I expect the files to have name, city and age columns. Is this the right approach? If so, what is the fix for the above code, which fails with the error "line 7: name: command not found"?
The issue is caused by the backticks. Bash will evaluate the contents and replace the backticks with the output from the command it just evaluated.
You can simply use the variable after the read command to achieve what you want:
#!/bin/bash
{
read -r line
line=`tr ',' ' ' <<< $line`
echo "$line"
while IFS=, read -r $line ; do
echo "person: $name -- $city -- $age"
done
} < file.txt
Some notes on your code:
The backtick syntax is legacy syntax; it is now preferred to use $(...) to evaluate commands. The new syntax is more flexible.
You can enable automatic script failure with set -euo pipefail. This will make your script stop if it encounters an error.
Your code is currently very sensitive to invalid header data:
with a file like
n ame,age,city,country
john,20,New York,USA
jane,30,London,England
your script (or rather the version in the beginning of my answer) will run without errors but with invalid output.
It is also good practice to quote variables to prevent unwanted splitting.
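As a quick illustration of the splitting that the quoting advice guards against (with a made-up value):
line="New York"
printf '<%s>\n' $line     # unquoted: splits into <New> and <York>
printf '<%s>\n' "$line"   # quoted: stays as one word, <New York>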
To make it much more robust, you can change it as follows:
#!/bin/bash
set -euo pipefail
# -e and -o pipefail will make the script exit
# in case of command failure (or piped command failure)
# -u will exit in case a variable is undefined
# (in you case, if the header is invalid)
{
read -r line
readarray -d, -t header < <(printf "%s" "$line")
# using an array allows detecting whether one of the header entries
# contains an invalid character
# the printf is needed because bash would add a newline to the
# command input if using a here-string (<<<).
while IFS=, read -r "${header[@]}" ; do
echo "$name"
echo "$city"
echo "$age"
done
} < file.txt
A slightly different approach can let awk handle the field separation and ordering of the desired output given either of the input files. Below awk stores the desired output order in the f[] (field) array set in the BEGIN rule. Then on the first line in a file (FNR==1) the array a[] is deleted and filled with the headings from the current file. At that point you just loop over the field names in-order in the f[] array and output the corresponding field from the current line, e.g.
awk -F, '
BEGIN { f[1]="name"; f[2]="city"; f[3]="age" } # desired order
FNR==1 { # on first line read header
delete a # clear a array
for (i=1; i<=NF; i++) # loop over headings
a[$i] = i # index by heading, val is field no.
next # skip to next record
}
{
print "" # optional newline between outputs
for (i=1; i<=3; i++) # loop over desired field order
if (f[i] in a) # validate field in a array
print $a[f[i]] # output fields value
}
' file1 file2
Example Use/Output
In your case with the content you show in file1 and file2, you would have:
$ awk -F, '
> BEGIN { f[1]="name"; f[2]="city"; f[3]="age" } # desired order
> FNR==1 { # on first line read header
> delete a # clear a array
> for (i=1; i<=NF; i++) # loop over headings
> a[$i] = i # index by heading, val is field no.
> next # skip to next record
> }
> {
> print "" # optional newline between outputs
> for (i=1; i<=3; i++) # loop over desired field order
> if (f[i] in a) # validate field in a array
> print $a[f[i]] # output fields value
> }
> ' file1 file2
john
New York
20
jane
London
30
john
New York
20
jane
London
30
Where both files are read and handled identically despite having different field orderings. Let me know if you have further questions.
If using Bash version ≥ 4.2, it is possible to use an associative array to capture an arbitrary number of fields with their names as keys:
#!/usr/bin/env bash
# Associative array to store column names as keys and their values
declare -A fields
# Array to store column names by index
declare -a column_name
# Array to store row's values
declare -a line
# Commands block consuming CSV input
{
# Read first line to capture column names
IFS=, read -r -a column_name
# Process records
while IFS=, read -r -a line; do
# Store column values to corresponding field name
for ((i=0; i<${#column_name[@]}; i++)); do
# Fills fields' associative array
fields["${column_name[i]}"]="${line[i]}"
done
# Dump fields for debug|demo purpose
# Processing of each captured value could go there instead
declare -p fields
done
} < file.txt
Sample output with file 2
declare -A fields=([country]="USA" [city]="New York" [age]="20" [name]="john" )
declare -A fields=([country]="England" [city]="London" [age]="30" [name]="jane" )
For older Bash versions, without associative arrays, use indexed column names instead:
#!/usr/bin/env bash
# Array to store column names by index
declare -a column_name
# Array to store values for a line
declare -a value
# Commands block consuming CSV input
{
# Read first line to capture column names
IFS=, read -r -a column_name
# Process records
while IFS=, read -r -a value; do
# Print record separator
printf -- '--------------------------------------------------\n'
# Print captured field name and value
for ((i=0; i<"${#column_name[#]}"; i++)); do
printf '%-18s: %s\n' "${column_name[i]}" "${value[i]}"
done
done
} < file.txt
Output:
--------------------------------------------------
name : john
age : 20
city : New York
country : USA
--------------------------------------------------
name : jane
age : 30
city : London
country : England

How to avoid the read command splitting user input (a string) on spaces

I wrote a bash script to read multiple inputs from the user
Here is the command:
read -a choice
In this way, I can put all the inputs in the choice variable as an array so that I can extract them using an index.
The problem is that when one of the inputs, which is a string, has spaces in it, like
user1 google.com "login: myLogin\npassword: myPassword"
the read command will split the quoted string into 3 words. How can I stop this from happening?
bash doesn't process quotes in user input. The only thing I can think of is to use eval to execute an array assignment.
IFS= read -r input
eval "choice=($input)"
Unfortunately this is dangerous -- if the input contains executable code, it will be executed by eval.
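A minimal, contrived demonstration of that risk (do not do this with untrusted input):
input='user1 $(echo INJECTED >&2) "quoted value"'
eval "choice=($input)"   # the embedded command substitution runs here
declare -p choice        # INJECTED has already been written to stderr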
You can use a tab instead of space as a field delimiter. For instance :
$ IFS=$'\t' read -a choice
value1 value2 a value with many words ## This is typed
$ echo ${choice[2]}
a value with many words
Regards!
Given the risk of using eval, and the fact that the input seems to have only two types of tokens (unquoted and quoted), consider using a scripting engine to put the text into a format that is easy to read.
It's not clear from the example what other quoting rules are used. The example assumes 'standard' escapes that can be processed with bash's @E parameter transformation.
The following uses a Perl one-liner to generate TAB-delimited tokens (hopefully raw tabs cannot be part of the input; another character could be used instead).
input='user1 google.com "login: myLogin\npassword: myPassword"'
tsv_input=$(perl -e '$_ = " $ARGV[0]" ; print $2 // $3, "\t" while ( /\s+("([^"]*)"|(\S*))/g) ;' "$input")
IFS=$'\t' read -d '' id domain values <<< $(echo -e "${tsv_input@E}")
Or using a function to get more readable code
function data_to_tsv {
# Translate to TSV
local tsv_input=$(perl -e '$_ = " $ARGV[0]" ; print $2 // $3, "\t" while ( /\s+("([^"]*)"|(\S*))/g) ;' "$1")
# Process escapes
echo -n "${tsv_input@E}"
}
input='user1 google.com "login: myLogin\npassword: myPassword"'
IFS=$'\t' read -d '' id domain values <<< $(data_to_tsv "$input")

Generate a column for each file matching a glob

I'm having difficulties with something that sounds relatively simple. I have a few data files with single values in them as shown below:
data1.txt:
100
data2.txt:
200
data3.txt:
300
I have another file called header.txt, and it's a template file that contains the header as shown below:
Data_1 Data2 Data3
- - -
I'm trying to add the data from the data*.txt files to the last line of Master.txt
The desired output would be something like this:
Data_1 Data2 Data3
- - -
100 200 300
I'm actively working this so I'm not sure where to begin. This doesn't need to be implemented in pure shell -- use of standard UNIX tools such as awk or sed is entirely reasonable.
paste is the key tool:
#!/bin/bash
exec >>Master.txt
cat header.txt
paste $'-d\n' data1.txt data2.txt data3.txt |
while read line1
do
read line2
read line3
printf '%-10s %-10s %-10s\n' "$line1" "$line2" "$line3"
done
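If the fixed-width formatting is not needed, the same idea can be sketched more briefly, letting paste's default tab delimiter line the columns up:
{ cat header.txt; paste data1.txt data2.txt data3.txt; } >> Master.txt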
As a native-bash implementation:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0+ needed" >&2; exit 1;; esac
declare -A keys=( ) # define an associative array (a string->string map)
for f in data*.txt; do # iterate over data*.txt files
name=${f%.txt} # for each, remove the ".txt" extension to get our name...
keys[${name^}]=$(<"$f") # capitalize the first letter, and read the file to get the value
done
{ # start a group so we can redirect output just once
printf '%s\t' "${!keys[#]}"; echo # first line: keys in our associative array
printf '%s\t' "${keys[#]//*/-}"; echo # second line: convert values to dashes
printf '%s\t' "${keys[#]}"; echo # third line: print the values unmodified
} >>Master.txt # all the above with output redirected to Master.txt
Most of the magic here is performed by parameter expansions:
${f%.txt} trims the .txt extension from the end of $f
${name^} capitalizes the first letter of $name
"${keys[#]}" expands to all values in the array named keys
"${keys[#]//*/-} replaces * (everything) in each key with the fixed string -.
"${!keys[#]}" expands to the names of entries in the associative array keys.

Bash command to read a line based on the parameters I pass - perform column-based lookups

I have a file links.txt:
1 a.sh
3 b.sh
6 c.sh
4 d.sh
So, if I pass 1,4 as parameters to another file (master.sh), a.sh and d.sh should be stored in a variable.
sed '3!d' would print the 3rd line, but not the line that starts with 3. For that, you need sed '/^3 /!d'. The problem is you can't combine them for more lines, as this means "Delete everything that doesn't start with a 3", which means all other lines will be missed. So, use sed -n '/^3 /p' instead, i.e. don't print by default and tell sed what lines to print, not what lines to delete.
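For example, with the links.txt shown above:
$ sed '3!d' links.txt
6 c.sh
$ sed -n '/^3 /p' links.txt
3 b.sh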
You can loop over the argument and create a sed script from them that prints the lines, then run sed using this output:
#!/bin/bash
file=$1
shift
for id in "$@" ; do
echo "/^$id /p"
done | sed -nf- "$file"
Run as script.sh filename 3 4.
If you want to remove the id from the output, you can either use
cut -f2 -d' '
or you can modify the generated sed script to do the work
echo "/^$id /s/.* //p"
i.e. only print if the substitution was successful.
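Putting the two pieces together, the generated-script variant with the id stripped would look something like this sketch:
#!/bin/bash
file=$1
shift
for id in "$@" ; do
echo "/^$id /s/.* //p"
done | sed -nf- "$file"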
This loops through each argument and greps for it in the links file. The result is piped into cut where we specify the delimiter as a space with -d flag and the field number as 2 with -f flag. Finally this is appended to the array called files.
links="links.txt"
files=()
for arg in $@; do
files=("${files[@]}" `grep "^$arg" "$links" | cut -d" " -f2`)
done;
echo ${files[@]}
Usage:
$ ./master.sh 1 4
a.sh d.sh
Edit:
As pointed out by mklement0, the solution above reads the file once per arg. The following first builds the pattern then reads the file just once.
links="links.txt"
pattern="^$1\s"
for arg in ${@:2}; do
pattern+="|^$arg\s"
done
files=$(grep -E "$pattern" "$links" | cut -d" " -f2)
echo ${files[@]}
Usage:
$ ./master.sh 1 4
a.sh d.sh
Here is another example with grep and cut:
#!/bin/bash
for line in $(grep "$1\|$2" links.txt|cut -d' ' -f2)
do
echo $line
done
Example of usage:
./master.sh 1 4
a.sh
d.sh
Why not just store the values and call them at will:
items=()
while read -r num file
do
items[num]="$file"
done<links.txt
for arg
do
echo "${items[arg]}"
done
Now you can use the items array any time you like :)
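For example, called the same way as the other answers (assuming the script is saved as master.sh and links.txt is in the current directory):
$ ./master.sh 1 4
a.sh
d.sh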
The following awk solution:
preserves the argument order; that is, the results reflect the order in which the lookup values were specified (as opposed to the order in which the lookup values happen to occur in the file).
If that is not important (i.e., if outputting the results in file order is acceptable), the readarray technique below can be combined with this one-liner, which is a generalized variant of Panta's answer:
grep -f <(printf "^%s\n" "$#") links.txt | cut -d' ' -f2-
performs well, because the input file is only read once; the only requirement is that all key-value pairs fit into memory as a whole (as a single associative Awk array (dictionary)).
works with any lookup values that don't have embedded whitespace.
Similarly, the assumption is that the output column values (containing values such as a.sh in the sample input) have no embedded whitespace. awk doesn't handle quoted fields well, so more work would be needed.
#!/bin/bash
readarray -t files < <(
awk -v idList="$*" '
BEGIN { count=split(idList, idArr); for (i in idArr) idDict[idArr[i]]++ }
$1 in idDict { idDict[$1] = $2 }
END { for (i=1; i<=count; ++i) print idDict[idArr[i]] }
' links.txt
)
# Print results.
printf '%s\n' "${files[#]}"
readarray -t files reads stdin input (<) line by line into array variable files.
Note: readarray requires Bash v4+; on Bash 3.x, such as on macOS, replace this part with
IFS=$'\n' read -d '' -ra files
<(...) is a Bash process substitution that, loosely speaking, presents the output from the enclosed command as if it were a (self-deleting) temporary file.
This technique allows readarray to run in the current shell (as opposed to a subshell if a pipeline had been used), which is necessary for the files variable to remain defined in the remainder of the script.
The awk command breaks down as follows:
-v idList="$*" passes the space-separated list of all command-line arguments as a single string to Awk variable idList.
Note that this assumes that the arguments have no embedded spaces, which is indeed the case here and also generally the case with identifiers.
BEGIN { ... } is only executed once, before the individual lines are processed:
split(idList, idArr) splits the input ID list into an array by whitespace and stores the result in idArr.
for (i in idArr) idDict[idArr[i]]++ then converts the (conceptually regular) array into associative array idDict (dictionary), whose keys are the input IDs - this enables efficient lookup by ID later, and also allows storing the lookup result for each ID.
$1 in idDict { idDict[$1] = $2 } is processed for every input line:
Pattern $1 in idDict returns true if the line's first whitespace-separated field ($1) - e.g., 6 - is among the keys (in) of associative array idDict, and, if so, executes the associated action ({...}).
Action { idDict[$1] = $2 } then assigns the second field ($2) - e.g., c.sh - to the idDict entry for key $1.
END { ... } is executed once, after all input lines have been processed:
for (i=1; i<=count; ++i) print idDict[idArr[i]] loops over all input IDs in order and prints each ID's lookup result, which is the value of the dictionary entry with that ID.
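For example, with the links.txt shown above and the script saved as master.sh, the output follows the order of the arguments rather than the order of the lines in the file:
$ ./master.sh 4 1
d.sh
a.sh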
