How to parse an output enclosed with braces? - shell

So, I am writing a shell script and I am running a command that gives me an output like:
{"a":"some_text","b":some_other_text","c":"even_more_text"}
Now, I am not sure how to parse it. I basically need the value of "c", i.e. "even_more_text", in a variable, but nothing I have found on the internet has worked so far. TIA.

The output you pasted here is not valid JSON; you can check it with https://jsonformatter.curiousconcept.com/. The opening double quote is missing in "some_other_text". Once you add it, you can easily parse the output with jq:
./your_script.sh | jq -r ".c"

Related

Create new bash var from value of dict bash var

My environment created a variable that looks like this:
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
EDIT by Ed Morton: per the OPs comment below, this is what (s)he is trying to describe above as the sample input:
$ SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
$ echo "$SM_TRAINING_ENV"
{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
How can I create a new bash variable that is equal to the value of SM_TRAINING_ENV["hyperparameters"]["model_dir"]?
For completeness, I was trying simple things like echo ${SM_TRAINING_ENV} | jq . and kept getting errors with everything I tried.
Edit: I've been informed that this value isn't proper JSON, so I'm rewording the question. I think the environment sets it to the value of a Python dictionary, so jq seems unusable. Removed the json tag. Maybe this is a job for awk?
It looks like I can match the value I want if I assume the structure doesn't change with the regex pattern s3.*?model, but not sure how to set a regex pattern to a new variable.
First, you need to quote the JSON value so that the double quotes will be included in the value.
SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
Then you can use the jq utility to extract the value you want.
new_var=$(echo "$SM_TRAINING_ENV" | jq -r '.hyperparameters.model_dir')
This doesn't really index but it works if order is always the same:
NEW_VAR=$(echo "$SM_TRAINING_ENV" | grep -oP 's3.*?model' | head -1)  # the lazy .*? needs -P, which is GNU grep only
Would much prefer something not dependent on order though.
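On the order-dependence concern, a minimal sketch, assuming jq is installed and the value is valid JSON: jq looks the key up by path, so its position in the object does not matter. The two strings below are trimmed, hypothetical stand-ins for SM_TRAINING_ENV with the same data in a different key order.

```shell
# Trimmed, hypothetical stand-ins for SM_TRAINING_ENV; same data, different key order.
env_a='{"hyperparameters":{"model_dir":"s3://bucket/model","str_param":"bla"},"model_dir":"/opt/ml/model"}'
env_b='{"model_dir":"/opt/ml/model","hyperparameters":{"str_param":"bla","model_dir":"s3://bucket/model"}}'

# -r prints the raw string without the surrounding JSON quotes.
var_a=$(printf '%s\n' "$env_a" | jq -r '.hyperparameters.model_dir')
var_b=$(printf '%s\n' "$env_b" | jq -r '.hyperparameters.model_dir')

# Both lookups yield the nested value, not the top-level model_dir.
printf '%s\n' "$var_a" "$var_b"
```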

Cutting in bash everything around a string

I have exported a lot of JSON files, almost 100.
They are basically huge one-liners.
Each file has "uid":"random".
How to cut only "uid":"random" part for all files from the master directory?
I've tried with grep {} but can't workaround the quotes on the uid and the value.
Also, I tried to cut it but still the quotes are the problem.
Line structure:
..."title":"Random title","uid":"r4nd0muid","version":X},"overwrite": true}
Don't use anything but JSON-aware tools for processing JSON files, and please post a proper sample for testing. That said, to work around the quotes on the uid and the value in the posted string:
$ grep -o \"uid\":\"[^\"]*\" foodata
"uid":"r4nd0muid"
Basically "uid":"[^"]*", i.e. after "uid":" match every non-" character and then the closing ".
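To run the same pattern over every file in a directory, something along these lines may do. It is only a sketch: the directory and the two sample files are hypothetical, created here so the example is self-contained.

```shell
# Throwaway directory with two hypothetical one-line exports.
dir=$(mktemp -d)
printf '%s\n' '{"title":"Random title","uid":"r4nd0muid","version":1}' > "$dir/a.json"
printf '%s\n' '{"title":"Other title","uid":"0th3ruid","version":2}' > "$dir/b.json"

# -o prints only the matching part, -h leaves out the file name prefix.
uids=$(grep -oh '"uid":"[^"]*"' "$dir"/*.json)
printf '%s\n' "$uids"

rm -rf "$dir"
```

Single-quoting the whole pattern avoids having to backslash-escape each double quote.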
It's probably a lot easier to use jq which is a search and transformation tool for JSON files.
Given the file test.json:
{
"uuid": "whatever"
}
You can extract just the uuid field with:
jq '.uuid' test.json
# output: "whatever"

Parse past all occurrences of a delimiter

I am new to Bash, but hoping this is simple to do. I have the following couple lines of code:
LOCATION='C:\\proj\\myproject\\node_modules\\protractor\\node_modules\\webdriver-manager\\selenium\\chromedriver_2.29.exe'
FILENAME=${LOCATION}
How do I parse past all the backslashes, go to the end of the path, extract the file name and assign it to $FILENAME (in this case 'chromedriver_2.29.exe') ?
This should do the trick:
FILENAME=${LOCATION##*'\\'}
See the section on shell parameter expansion in the Bash manual for details.
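As a quick check of the expansion above: ## strips the longest prefix that matches the pattern, here everything up to and including the last pair of backslashes. The DIRNAME line is not part of the answer; it is added for illustration and uses % to strip the shortest matching suffix instead.

```shell
LOCATION='C:\\proj\\myproject\\node_modules\\protractor\\node_modules\\webdriver-manager\\selenium\\chromedriver_2.29.exe'

# Longest prefix ending in two literal backslashes is removed, leaving the file name.
FILENAME=${LOCATION##*'\\'}

# Illustrative addition: shortest suffix starting with two backslashes is removed,
# leaving the directory part.
DIRNAME=${LOCATION%'\\'*}

printf '%s\n' "$FILENAME"
printf '%s\n' "$DIRNAME"
```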

Running sed ON a variable in bash script

Apologies for a seemingly inane question. But I have spent the whole day trying to figure it out and it drives me up the walls. I'm trying to write a seemingly simple bash script that would take a list of files in the directory from ls, replace part of the file names using sed, get unique names from the list and pass them onto some command. Like so:
inputs=`ls *.ext`
echo $inputs
test1_R1.ext test1_R2.ext test2_R1.ext test2_R2.ext
Now I would like to put it through sed to replace 1.ext and 2.ext with * to get test1_R* etc. Then I'd like to remove resulting duplicates by running sort -u to arrive to the following $outputs variable:
echo $outputs
test1_R* test2_R*
And pass this onto a command, like so
cat $outputs
I can do something like this in a command line:
ls *.ext | sed s/..ext/\*/g | sort -u
But if I try to assign the above to a variable in the script, it just returns the output from ls. I have tried several ways to do it: including the whole pipe in the script; running each command separately, assigning its output to a variable and passing that variable to the next command; and writing the outputs to files and passing the file to the next command. So far none of this has achieved what I aimed for. I think my problem lies (apart from general cluelessness around bash scripting) in my inability to run sed on a variable within a script. There seems to be a lot of advice around on how to pass variables into the pattern or replacement string in sed, but it all seems to take files as input. I understand that this might not be the proper way of doing it anyway, so I would really appreciate it if someone could suggest an elegant way to achieve what I'm trying to do.
Many thanks!
Update 2/06/2014
Hi Barmar, thanks for your answer. Can't say it solved the problem, but it helped pin-pointing it. Seems like the problem is in me using the asterisk. I have to say, I'm very puzzled. The actual file names I've got are:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
If I use the code you suggested, which seems to me the right way to do it:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/\*/g' | sort -u)
Sed doesn't seem to do anything and I'm getting the output of ls:
test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz
Now if I replace that backslash with anything else, sed works, but it also returns whatever character I put in front of (or after) the asterisk:
ins=$(ls *.fastq.gz | sed 's/..fastq.gz/"*/g' | sort -u)
test1_R"* test2_R"*
That's odd enough, but surely I can just put an "R" in front of the asterisk and then match the R in the search pattern, right? Wrong! Whichever way I do that ('s/R..fastq.gz/R*/g', 's/...fastq.gz/R*/g' or 's/[A-Z]..fastq.gz/R*/g'), I'm back to the original names! And even if I end up with something like test1_RR* test2_RR* and try to run it through sed again to replace "_R" with "_" or "RR" with "R", I have no luck and I'm back to the original names. And yet I can replace the rest of the file name with no problem, just not in a way that gives me the test1_R* I need.
I have a feeling I should be escaping that * in some very clever way, but nothing I've tried seems to work. Thanks again for your help!
This is how you capture the result of the whole pipeline in a variable:
var=$(ls *.ext | sed s/..ext/\*/g | sort -u)
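One likely explanation for the symptom described in the update, offered as an observation about shell behaviour rather than something the answer states: sed does its job, but when the variable is later expanded without quotes (echo $outputs), the shell treats test1_R* as a glob and expands it back to the matching file names. A sketch in a throwaway directory, with file names taken from the question and the sed pattern tightened slightly as an assumption:

```shell
# Throwaway directory containing the file names from the question.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch test1_R1.ext test1_R2.ext test2_R1.ext test2_R2.ext

# Replace the last character before .ext, and the extension, with a literal *.
outputs=$(ls *.ext | sed 's/.\.ext$/*/' | sort -u)

quoted=$(echo "$outputs")    # quoted: the patterns stay literal
unquoted=$(echo $outputs)    # unquoted: the shell expands them back to file names

echo "quoted:   $quoted"
echo "unquoted: $unquoted"

cd - >/dev/null && rm -rf "$dir"
```

This would also be consistent with the "* experiment in the question: test1_R"* matches no file, so it survives the unquoted echo unchanged. For cat $outputs the unquoted expansion is what you want, since cat needs the real file names.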

Bash variable character replacement ends up to an empty string or a command not valid

I am working on a shell script to retrieve variable content from a JSON file via JQ. The JSON file is in string format (no matter whether this is a real string or a number) and to retrieve the variable in my bash script I did something like this
my_domain=$(cat /vagrant/data_bags/config.json | jq ."app"[0]."domain")
The above code, once echoed, results in "mydomain" with a leading and a trailing quote sign. I thought this was normal behaviour for the echo command. However, when I concatenate my variable into another shell command, the system raises an error. For instance, the following command
cp /vagrant/public_html/index.php "/var/www/"+$my_domain+"/index.php"
fails with the following error
cp: cannot create regular file `/var/www/+"mydomain"+/index.php': No such file or directory
At this stage, I wasn't able to identify whether it's me doing the wrong concatenation with the plus sign or the variable is effectively including the quotes that in any case will end up generating an error.
I have tried to replace the quotes in my variable, but I ended up getting the system raising a "Command not found" error.
Can somebody suggest what am I doing wrong?
+ is not used for string concatenation in bash (or perl, or php). Just:
cp /vagrant/public_html/index.php "/var/www/$my_domain/index.php"
Embedding a variable inside a double-quoted text string is known as interpolation, and is one of the reasons why we need the $ prefix, to indicate that this is a variable. Interpolation is specifically not done inside single quoted strings.
Braces ${my_domain} are not required because the / directory separators are not valid characters in a variable name, so there is no ambiguity.
For example:
var='thing'
echo "Give me your ${var}s" # Correct, appends an 's' after 'thing'
echo "Give me your $vars" # incorrect, looks for a variable called vars.
If a variable (like 'vars') does not exist then, by default, the shell will not complain; it will just expand to an empty string. Braces (curly brackets) are needed more often in C shell (csh or tcsh) because of additional syntax for modifying variables, which involves special trailing characters.
You don't need + to concatenate strings in bash; change your command to
cp /vagrant/public_html/index.php "/var/www/${my_domain}/index.php"
My problem was related not only to the wrong concatenation but also to jq, which returned the value parsed from the JSON file wrapped in quotes.
To stop jq from doing this, add the --raw-output (-r) option when calling jq.
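A small sketch of the difference, assuming jq is installed; the JSON string is a hypothetical stand-in for the contents of config.json:

```shell
# Hypothetical stand-in for /vagrant/data_bags/config.json.
json='{"app":[{"domain":"mydomain"}]}'

# Default output is JSON-encoded, so the string keeps its double quotes.
quoted=$(printf '%s\n' "$json" | jq '.app[0].domain')

# -r (--raw-output) prints the bare string.
raw=$(printf '%s\n' "$json" | jq -r '.app[0].domain')

# The raw value can be interpolated straight into a path.
target="/var/www/$raw/index.php"

printf '%s\n' "$quoted" "$raw" "$target"
```

Quoting the jq filter in single quotes also keeps the shell from interpreting the brackets.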
