How to create a dictionary from a text file in bash? - bash

I want to create a dictionary in bash from a text file which looks like this:
H96400275|A
H96400276|B
H96400265|C
H96400286|D
Basically I want a dictionary like this from this file file.txt:
KEYS VALUES
H96400275 = A
H96400276 = B
H96400265 = C
H96400286 = D
I created following script:
#!/bin/bash
declare -a dictionary
while read line; do
key=$(echo $line | cut -d "|" -f1)
data=$(echo $line | cut -d "|" -f2)
dictionary[$key]="$data"
done < file.txt
echo ${dictionary[H96400275]}
However, this does not print A, rather it prints D. Can you please help ?

Associative arrays (dictionaries in your terms) are declared using -A, not -a. For indexed arrays (those declared with -a), bash evaluates the subscript as an arithmetic expression. Here the subscript (the value of $key, and the literal H96400275) is read as the name of an unset variable, which evaluates to 0; so you're overwriting dictionary[0] on every iteration and then asking for its value, and thus D is printed.
To make the script more efficient, you can use read with a custom IFS and avoid the calls to cut. E.g.:
declare -A dict
while IFS='|' read -r key value; do
dict[$key]=$value
done < file
echo "${dict[H96400275]}"
See Bash Reference Manual § 6.7 Arrays.
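To see the difference between the two declarations side by side, here is a minimal sketch. It assumes Bash 4 or later, keeps the question's sample data in a variable rather than in file.txt, and the array names indexed and assoc are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample data from the question, held in a variable instead of file.txt.
input=$'H96400275|A\nH96400276|B\nH96400265|C\nH96400286|D'

declare -a indexed   # indexed array: subscripts are evaluated arithmetically
declare -A assoc     # associative array: subscripts are plain strings

while IFS='|' read -r key value; do
    indexed[$key]=$value   # H96400275 names an unset variable, so this is indexed[0]
    assoc[$key]=$value     # stored under the string key itself
done <<< "$input"

echo "indexed: ${indexed[H96400275]}"   # prints "indexed: D"
echo "assoc: ${assoc[H96400275]}"       # prints "assoc: A"
echo "sizes: ${#indexed[@]} vs ${#assoc[@]}"   # prints "sizes: 1 vs 4"
```

The indexed array ends up with a single element because every assignment lands on index 0.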

The only problem is that you have to use -A instead of -a:
-a Each name is an indexed array variable (see Arrays above).
-A Each name is an **associative** array variable (see Arrays above).

What you want is called an associative array. To declare one, use the command:
declare -A dictionary

Related

how to assign each of multiple lines in a file as different variable?

this is probably a very simple question. I looked at other answers but couldn't come up with a solution. I have a 365 line date file. file as below,
01-01-2000
02-01-2000
I need to read this file line by line and assign each day to a separate variable. like this,
d001=01-01-2000
d002=02-01-2000
I tried while read loops but couldn't get them to work. Assigning them one by one takes a lot of time. How can I do it quickly?
Trying to create individually named variables is a waste of time and isn't really supported. Better to use an associative array:
#!/bin/bash
declare -A array
while read -r line; do
printf -v key 'd%03d' $((++c))
array[$key]=$line
done < file
Output
for i in "${!array[@]}"; do echo "key=$i value=${array[$i]}"; done
key=d001 value=01-01-2000
key=d002 value=02-01-2000
Assumptions:
an array is acceptable
array index should start with 1
Sample input:
$ cat sample.dat
01-01-2000
02-01-2000
03-01-2000
04-01-2000
05-01-2000
One bash/mapfile option:
unset d # make sure variable is not currently in use
mapfile -t -O1 d < sample.dat # load each line from file into separate array location
This generates:
$ typeset -p d
declare -a d=([1]="01-01-2000" [2]="02-01-2000" [3]="03-01-2000" [4]="04-01-2000" [5]="05-01-2000")
$ for i in "${!d[@]}"; do echo "d[$i] = ${d[i]}"; done
d[1] = 01-01-2000
d[2] = 02-01-2000
d[3] = 03-01-2000
d[4] = 04-01-2000
d[5] = 05-01-2000
In OP's code, references to $d001 now become ${d[1]}.
A quick one-liner would be:
eval $(awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file)
eval makes the shell variables known inside your script or shell. Use echo $d000 to show the first one of the newly defined variables. There should be no shell special characters (like * and $) inside your_file. Remove eval $() to see the result of the awk command. The \" quoted %s is to allow spaces in the variable values. If you don't have any spaces in your_file you can remove the \" before and after %s.
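If separately named variables really are required, one way to avoid eval is printf -v, which assigns to a variable whose name is computed at run time. This is only a sketch: the sample dates sit in a variable instead of your_file, and numbering starts at d001 as in the question rather than d000.

```shell
#!/usr/bin/env bash
# Three sample dates standing in for the 365-line file.
dates=$'01-01-2000\n02-01-2000\n03-01-2000'

n=0
while IFS= read -r line; do
    n=$((n + 1))
    printf -v name 'd%03d' "$n"      # build the variable name: d001, d002, ...
    printf -v "$name" '%s' "$line"   # assign the line to that variable, no eval involved
done <<< "$dates"

echo "$d001"   # prints 01-01-2000
echo "$d003"   # prints 03-01-2000
```

Because the value is never re-parsed by the shell, characters such as * and $ in the input are stored literally.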

Bash: Issue when iterating string with lines [duplicate]

I have a JSON data as follows in data.json file
[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]
I want to iterate through the array and extract the value for each element in a loop. I saw jq. I find it difficult to use it to iterate. How can I do that?
Just use a filter that would return each item in the array. Then loop over the results, just make sure you use the compact output option (-c) so each result is put on a single line and is treated as one item in the loop.
jq -c '.[]' input.json | while read i; do
# do stuff with $i
done
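One thing to watch with the pipe form: in bash the loop body runs in a subshell, so variables set inside it are gone once the loop ends. The sketch below contrasts it with process substitution. To stay runnable without jq installed, a hypothetical helper emit_items stands in for jq -c '.[]' and prints two compact JSON objects.

```shell
#!/usr/bin/env bash
# Stand-in for `jq -c '.[]' data.json`: one compact JSON object per line.
emit_items() {
    printf '%s\n' \
        '{"original_name":"pdf_convert","changed_name":"pdf_convert_1"}' \
        '{"original_name":"video_encode","changed_name":"video_encode_1"}'
}

piped=0
emit_items | while IFS= read -r item; do
    piped=$((piped + 1))     # runs in a subshell; the increment is lost afterwards
done

direct=0
while IFS= read -r item; do
    direct=$((direct + 1))   # runs in the current shell; the count survives
done < <(emit_items)

echo "piped=$piped direct=$direct"   # prints "piped=0 direct=2"
```

This assumes the default non-interactive settings, where the lastpipe shell option is off.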
By leveraging the power of Bash arrays, you can do something like:
# read each item in the JSON array to an item in the Bash array
readarray -t my_array < <(jq --compact-output '.[]' input.json)
# iterate through the Bash array
for item in "${my_array[@]}"; do
original_name=$(jq --raw-output '.original_name' <<< "$item")
changed_name=$(jq --raw-output '.changed_name' <<< "$item")
# do your stuff
done
jq has a shell formatting option: @sh.
You can use the following to format your json data as shell parameters:
cat data.json | jq '. | map([.original_name, .changed_name])' | jq @sh
The output will look like:
"'pdf_convert' 'pdf_convert_1'"
"'video_encode' 'video_encode_1'",
"'video_transcode' 'video_transcode_1'"
To process each row, we need to do a couple of things:
Set the bash for-loop to read the entire row, rather than stopping at the first space (default behavior).
Strip the enclosing double-quotes off of each row, so each value can be passed as a parameter to the function which processes each row.
To read the entire row on each iteration of the bash for-loop, set the IFS variable, as described in this answer.
To strip off the double-quotes, we'll run it through the bash shell interpreter using xargs:
stripped=$(echo $original | xargs echo)
Putting it all together, we have:
#!/bin/bash
function processRow() {
original_name=$1
changed_name=$2
# TODO
}
IFS=$'\n' # Each iteration of the for loop should read until we find an end-of-line
for row in $(cat data.json | jq '. | map([.original_name, .changed_name])' | jq @sh)
do
# Run the row through the shell interpreter to remove enclosing double-quotes
stripped=$(echo $row | xargs echo)
# Call our function to process the row
# eval must be used to interpret the spaces in $stripped as separating arguments
eval processRow $stripped
done
unset IFS # Return IFS to its original value
From Iterate over json array of dates in bash (has whitespace)
items=$(echo "$JSON_Content" | jq -c -r '.[]')
for item in ${items[@]}; do
echo $item
# whatever you are trying to do ...
done
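Note that the unquoted expansion in a loop like this splits on spaces as well as newlines when IFS has its default value. A small sketch of the effect, with a two-line string standing in for the jq output (the values are invented for illustration) and mapfile as one way to keep each line whole:

```shell
#!/usr/bin/env bash
# Stand-in for the output of `jq -c -r '.[]'`: two lines, each containing a space.
items=$'2020-01-01 10:00\n2020-01-02 11:30'

split_count=0
for item in $items; do          # unquoted: split on spaces as well as newlines
    split_count=$((split_count + 1))
done

mapfile -t lines <<< "$items"   # one array element per input line

echo "word-split pieces: $split_count, lines: ${#lines[@]}"
# prints "word-split pieces: 4, lines: 2"
```

Iterating with for item in "${lines[@]}" then visits each line exactly once.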
Try building it around this example:
Example:
jq '[foreach .[] as $item ([[],[]]; if $item == null then [[],.[0]] else [(.[0] + [$item]),[]] end; if $item == null then .[1] else empty end)]'
Input [1,2,3,4,null,"a","b",null]
Output [[1,2,3,4],["a","b"]]
None of the answers here worked for me, out-of-the-box.
What did work was a combination of a few:
projectList=$(echo "$projRes" | jq -c '.projects[]')
IFS=$'\n' # Read till newline
for project in ${projectList[@]}; do
projectId=$(jq '.id' <<< "$project")
projectName=$(jq -r '.name' <<< "$project")
...
done
unset IFS
NOTE: I'm not using the same data as the question does, in this example assume projRes is the output from an API that gives us a JSON list of projects, eg:
{
"projects": [
{"id":1,"name":"Project"},
... // array of projects
]
}
An earlier answer in this thread suggested using jq's foreach, but that may be much more complicated than needed, especially given the stated task. Specifically, foreach (and reduce) are intended for certain cases where you need to accumulate results.
In many cases (including some cases where eventually a reduction step is necessary), it's better to use .[] or map(_). The latter is just another way of writing [.[] | _] so if you are going to use jq, it's really useful to understand that .[] simply creates a stream of values.
For example, [1,2,3] | .[] produces a stream of the three values.
To take a simple map-reduce example, suppose you want to find the maximum length of an array of strings. One solution would be [ .[] | length] | max.
Here is a simple example that works in zsh:
DOMAINS='["google","amazon"]'
arr=$(echo $DOMAINS | jq -c '.[]')
for d in $arr; do
printf "Here is your domain: ${d}\n"
done
I stopped using jq and started using jp, since JMESpath is the same language as used by the --query argument of my cloud service and I find it difficult to juggle both languages at once. You can quickly learn the basics of JMESpath expressions here: https://jmespath.org/tutorial.html
Since you didn't specifically ask for a jq answer but instead, an approach to iterating JSON in bash, I think it's an appropriate answer.
Style points:
I use backticks and those have fallen out of fashion. You can substitute with another command substitution operator.
I use cat to pipe the input contents into the command. Yes, you can also specify the filename as a parameter, but I find this distracting because it breaks my left-to-right reading of the sequence of operations. Of course you can update this from my style to yours.
set -u has no function in this solution, but is important if you are fiddling with bash to get something to work. It makes the shell treat a reference to an unset variable as an error, so a misspelled variable name is caught instead of silently expanding to an empty string.
Here's how I do it:
#!/bin/bash
set -u
# exploit the JMESpath length() function to get a count of list elements to iterate
export COUNT=`cat data.json | jp "length( [*] )"`
# The `seq` command produces the sequence `0 1 2` for our indexes
# The $(( )) operator in bash produces an arithmetic result ($COUNT minus one)
for i in `seq 0 $((COUNT - 1))` ; do
# The list elements in JMESpath are zero-indexed
echo "Here is element $i:"
cat data.json | jp "[$i]"
# Add or replace whatever operation you like here.
done
Now, it would also be a common use case to pull the original JSON data from an online API and not from a local file. In that case, I use a slightly modified technique of caching the full result in a variable:
#!/bin/bash
set -u
# cache the JSON content in a stack variable, downloading it only once
export DATA=`api --profile foo compute instance list --query "bar"`
export COUNT=`echo "$DATA" | jp "length( [*] )"`
for i in `seq 0 $((COUNT - 1))` ; do
echo "Here is element $i:"
echo "$DATA" | jp "[$i]"
done
This second example has the added benefit that if the data is changing rapidly, you are guaranteed to have a consistent count between the elements you are iterating through, and the elements in the iterated data.
This is what I have done so far
arr=$(echo "$array" | jq -c -r '.[]')
for item in ${arr[@]}; do
original_name=$(echo $item | jq -r '.original_name')
changed_name=$(echo $item | jq -r '.changed_name')
echo $original_name $changed_name
done

How do I add whitespace to a string while filling it up in a for-loop in Bash?

Have a string as follows:
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
I want to extract the first two directories of each path and save them as another string, like this:
"applications/dbt applications/dbt applications/dataform applications/dataform"
But while filling up the second string, it's being saved as
applications/dbtapplications/dbtapplications/dataformapplications/dataform
What I tried:
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
arr=($files)
#extracting the first two directories and saving it to a new string
for i in ${arr[@]}; do files2+=$(echo "$i" | cut -d/ -f 1-2); done
echo $files2
files2 echoes the following
applications/dbtapplications/dbtapplications/dataformapplications/dataform
Reusing your code as much as possible:
(assuming to only remove the last right part):
arr=( applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml )
#extracting the first two directories and saving it to a new string
for file in "${arr[@]}"; do
files2+="${file%/*} "
done
echo "$files2"
applications/dbt applications/dbt applications/dataform applications/dataform
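The reason the question's loop glued everything together is that += appends with no separator, and command substitution strips the trailing newline from the cut output. A short sketch contrasting the two behaviours, using two of the paths and illustrative variable names glued and spaced:

```shell
#!/usr/bin/env bash
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml"
read -r -a arr <<< "$files"   # split the string into an array on whitespace

glued=""
spaced=""
for f in "${arr[@]}"; do
    glued+=${f%/*}            # no separator: results run together
    spaced+="${f%/*} "        # explicit trailing space after each entry
done
spaced=${spaced% }            # drop the one trailing space left by the loop

echo "$glued"    # prints applications/dbtapplications/dbt
echo "$spaced"   # prints applications/dbt applications/dbt
```

Like the answer above, this removes only the last path component, which matches "the first two directories" for paths of this depth.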
You could use a for loop as requested
for dir in ${files};
do file2+=$(printf '%s ' "${dir%/*}")
done
which will give output
$ echo "$file2"
applications/dbt applications/dbt applications/dataform applications/dataform
However, it would be much easier with sed
$ sed -E 's~([^/]*/[^/]*)[^ ]*~\1~g' <<< $files
applications/dbt applications/dbt applications/dataform applications/dataform
Convert the string into an array first, assuming there is no whitespace (blanks or newlines) embedded in your path names. Something like:
#!/usr/bin/env bash
files="applications/dbt/Dockerfile applications/dbt/cloudbuild.yaml applications/dataform/Dockerfile applications/dataform/cloudbuild.yaml"
mapfile -t array <<< "${files// /$'\n'}"
Now check the value of the array
declare -p array
Output
declare -a array=([0]="applications/dbt/Dockerfile" [1]="applications/dbt/cloudbuild.yaml" [2]="applications/dataform/Dockerfile" [3]="applications/dataform/cloudbuild.yaml")
Strip the last path component (everything from the final / onward) from each element of the array.
new_array=("${array[@]%/*}")
Check the new value
declare -p new_array
Output
declare -a new_array=([0]="applications/dbt" [1]="applications/dbt" [2]="applications/dataform" [3]="applications/dataform")
Now the value is an array, assign it to a variable or do what ever you like with it. Like what was mentioned in the comment section. Use an array from the start.
Assign the first 2 directories/path in a variable (weird requirement)
new_var="${new_array[@]::2}"
declare -p new_var
Output
declare -- new_var="applications/dbt applications/dbt"

Dynamically creating associative arrays in bash

I have a variable ($OUTPUT) that contains the following name / value pairs:
member_id=4611686018429783292
platform=Xbox
platform_id=1
character_id=2305843009264966985
period_dt=2020-11-25 20:31:14.923158 UTC
mode=all Crucible modes
mode_id=5
activities_entered=18
activities_won=10
activities_lost=8
assists=103
kills=233
average_kill_distance=15.729613
total_kill_distance=3665
seconds_played=8535
deaths=118
average_lifespan=71.72269
total_lifespan=8463.277
opponents_defeated=336
efficiency=2.8474576
kills_deaths_ratio=1.9745762
kills_deaths_assists=2.411017
suicides=1
precision_kills=76
best_single_game_kills=-1
Each line ends with \n.
I want to loop through them, parse them into an associative array, and then access the values in the array by the variable names:
while read line
do
key=${line%%=*}
value=${line#*=}
echo $key=$value
data[$key]="$value"
done < <(echo "$OUTPUT")
#this always prints the last value
echo ${data['seconds_played']}
This seems to work, i.e. key/value print the right values, but when I try to pull any values from the array, it always returns the last value (in this case -1).
I feel like I'm missing something obvious, but have been banging my head against it for a couple of hours.
UPDATE: My particular issue is that I'm running a version of bash (3.2.57 on OSX) that doesn't support associative arrays. I'll mark the correct answer below.
Without declare -A data, data is an ordinary indexed array. In an indexed array, the subscript first undergoes the usual expansions and is then evaluated as an arithmetic expression, in which unset variables evaluate to 0. So data[$key] becomes data[seconds_played]; the variable seconds_played is not defined, so that is data[0]. You are effectively setting data[0]=something on every iteration.
Add declare -A data and it "should work". You could also just:
declare -A data
while IFS== read -r key value; do
data["$key"]="$value"
done <<<"$OUTPUT"
Try declaring data as an associative array before populating it, eg:
$ typeset -A data # declare as an associative array
$ while read line
do
key=${line%%=*}
value=${line#*=}
echo $key=$value
data[$key]="$value"
done <<< "${OUTPUT}"
$ typeset -p data
declare -A data=([mode]="all Crucible modes" [period_dt]="2020-11-25 20:31:14.923158 UTC" [deaths]="118" [best_single_game_kills]="-1" [efficiency]="2.8474576" [precision_kills]="76" [activities_entered]="18" [seconds_played]="8535" [total_lifespan]="8463.277" [average_lifespan]="71.72269" [character_id]="2305843009264966985" [kills]="233" [activities_won]="10" [average_kill_distance]="15.729613" [activities_lost]="8" [mode_id]="5" [assists]="103" [suicides]="1" [total_kill_distance]="3665" [platform]="Xbox" [kills_deaths_ratio]="1.9745762" [platform_id]="1" [kills_deaths_assists]="2.411017" [opponents_defeated]="336" [member_id]="4611686018429783292" )
$ echo "${data['seconds_played']}"
8535
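For the Bash 3.2 case mentioned in the question's update, where declare -A is not available, one fallback is to skip the array and scan the text for a key on demand. This is a rough sketch, not a drop-in replacement: lookup is a made-up helper name, only a few of the pairs are included, and each call rereads $OUTPUT, which is fine for small inputs.

```shell
#!/usr/bin/env bash
# A few of the name/value pairs from the question.
OUTPUT=$'platform=Xbox\nmode=all Crucible modes\nseconds_played=8535\nbest_single_game_kills=-1'

# lookup KEY: print the value of the first "KEY=value" line in $OUTPUT.
# Returns non-zero if the key is not present.
lookup() {
    local line
    while IFS= read -r line; do
        if [ "${line%%=*}" = "$1" ]; then
            printf '%s\n' "${line#*=}"
            return 0
        fi
    done <<< "$OUTPUT"
    return 1
}

lookup seconds_played   # prints 8535
lookup mode             # prints all Crucible modes
```

It relies only on here-strings, local, and parameter expansion, none of which need associative array support.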

Open file with two columns and dynamically create variables

I'm wondering if anyone can help. I've not managed to find much in the way of examples and I'm not sure where to start coding wise either.
I have a file with the following contents...
VarA=/path/to/a
VarB=/path/to/b
VarC=/path/to/c
VarD=description of program
...
The columns are delimited by the '=' and some of the items in the 2nd column may contain gaps as they aren't just paths.
Ideally I'd love to open this in my script once and store the first column as the variable and the second as the value, for example...
echo $VarA
...
/path/to/a
echo $VarB
...
/path/to/b
Is this possible or am I living in a fairy land?
Thanks
You might be able to use the following loop:
while IFS== read -r name value; do
declare "$name=$value"
done < file.txt
Note, though, that a line like foo="3 5" would include the quotes in the value of the variable foo.
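Since declare rejects names that are not valid shell identifiers, it may be worth checking each name before assigning. A minimal sketch of that check, with the sample lines held in a variable instead of file.txt and an extra invalid line (bad-name) added purely for illustration:

```shell
#!/usr/bin/env bash
# Sample lines from the question plus one deliberately invalid name.
config=$'VarA=/path/to/a\nVarD=description of program\nbad-name=ignored'

while IFS='=' read -r name value; do
    # Accept only valid identifiers: a letter or underscore, then letters, digits, underscores.
    if [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        declare "$name=$value"
    else
        echo "skipping invalid name: $name" >&2
    fi
done <<< "$config"

echo "$VarA"   # prints /path/to/a
echo "$VarD"   # prints description of program
```

Note that declare creates a local variable when it runs inside a function, so this loop is meant to sit at the top level of the script.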
A minus sign or a special character isn't allowed in a variable name in Unix.
You may consider using BASH associative array for storing key and value together:
# declare an associative array
declare -A arr
# read file and populate the associative array
while IFS== read -r k v; do
arr["$k"]="$v"
done < file
# check output of our array
declare -p arr
declare -A arr='([VarA]="/path/to/a" [VarC]="/path/to/c" [VarB]="/path/to/b" [VarD]="description of program" )'
What about source my-file? It won't work with spaces though, but will work for what you've shared. This is an example:
reut@reut-home:~$ cat src
test=123
test2=abc/def
reut@reut-home:~$ echo $test $test2
reut@reut-home:~$ source src
reut@reut-home:~$ echo $test $test2
123 abc/def
