I tried accessing the object using both the generic object index and the shorthand version of it but it appears that the generic object index won't work. Can someone explain to me why?
I have the following json file jsonsample.txt
{
"id": "v4cw72hf3",
"output": {
"url": "//srv01.cloudconvert.com/download/~ugl5vnrpfO",
"downloads": 0
},
}
The jq manual explains the generic syntax:
Generic Object Index: .[<string>]
You can also look up fields of an object using syntax like .["foo"] (.foo above is a shorthand version of this, but only for identifier-like strings).
and I tried two ways to access the url field
jq '.["output"].["url"]' jsonsample.txt
jq .output.url jsonsample.txt
But the first one doesn't give me the desired results
#Result for the first line
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.["output"].["url"]
jq: 1 compile error
shell returned 3
#Results for the second line
"//srv01.cloudconvert.com/download/~ugl5vnrpfO"
The input is not quite valid JSON, so the following assumes it has been fixed.
The basic form for a pipeline of array and/or object accessors is
.[<string-or-integer>] | .[<string-or-integer>] | ...
So you'd be safe with .["output"]|.["url"]
Certain abbreviations are allowed, but different versions of jq differ in the details. However, it's generally safe to remove an interior |., i.e. one would expect
.["output"]["url"]
to work, as indeed it does going back at least to jq 1.3.
The restricted .foo.bar notation is also supported going back at least to jq 1.3.
jq 1.4 added support for unrestricted dot-string notation wherein the key name can be any valid JSON string (i.e. with quotation marks), e.g.
."foo with space"."bar with space"
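As a quick check of the forms discussed above, the following sketch runs each of them against the corrected sample (trailing comma removed). The variable names are illustrative only, and the last form assumes a jq version with dot-string support, as noted above.

```shell
# Corrected sample input: the trailing comma after the "output" object is removed.
json='{"id":"v4cw72hf3","output":{"url":"//srv01.cloudconvert.com/download/~ugl5vnrpfO","downloads":0}}'

# Explicit pipeline of generic object indexes.
piped=$(printf '%s\n' "$json" | jq -r '.["output"] | .["url"]')

# Same thing with the interior "|." removed.
chained=$(printf '%s\n' "$json" | jq -r '.["output"]["url"]')

# Shorthand for identifier-like keys.
short=$(printf '%s\n' "$json" | jq -r '.output.url')

# Dot-string notation (key given as a quoted JSON string).
dotstr=$(printf '%s\n' "$json" | jq -r '."output"."url"')

printf '%s\n' "$piped" "$chained" "$short" "$dotstr"
```

All four variables should hold the same URL; -r is used only so that the values are printed without JSON quotes.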
My environment created a variable that looks like this:
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
EDIT by Ed Morton: per the OP's comment below, this is what (s)he is trying to describe above as the sample input:
$ SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
$ echo "$SM_TRAINING_ENV"
{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
How can I create a new bash variable that is equal to the value of SM_TRAINING_ENV["hyperparameters"]["model_dir"]?
For completeness, I was trying simple things like echo ${SM_TRAINING_ENV} | jq . and kept getting errors with everything I tried.
Edit: I've been informed that this value isn't a proper json, so rewording the question. I think the environment sets it to the value of a python dictionary, so jq seems not usable. Removed json tag. Maybe this is a job for awk?
It looks like I can match the value I want if I assume the structure doesn't change with the regex pattern s3.*?model, but not sure how to set a regex pattern to a new variable.
First, you need to quote the JSON value so that the double quotes will be included in the value.
SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
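Then you can use the jq utility to extract the value you want. The -r option prints the raw string, so the surrounding JSON quotes do not end up in the variable.
new_var=$(echo "$SM_TRAINING_ENV" | jq -r '.hyperparameters.model_dir')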
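To see what actually ends up in the variable, the sketch below compares jq's default output with its -r (raw output) mode. The JSON here is a shortened, hypothetical stand-in for the full SM_TRAINING_ENV value, and the variable names quoted and model_dir are illustrative.

```shell
# Shortened, hypothetical stand-in for the full SM_TRAINING_ENV value.
SM_TRAINING_ENV='{"hyperparameters":{"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"model_dir":"/opt/ml/model"}'

# Default output is a JSON string, so the double quotes are part of the value.
quoted=$(printf '%s\n' "$SM_TRAINING_ENV" | jq '.hyperparameters.model_dir')

# With -r, jq prints the raw string without the JSON quotes.
model_dir=$(printf '%s\n' "$SM_TRAINING_ENV" | jq -r '.hyperparameters.model_dir')

printf '%s\n' "$quoted" "$model_dir"
```

Because the lookup goes through the key path, it picks the nested hyperparameters.model_dir and not the top-level model_dir, regardless of key order.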
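This doesn't really index into the structure, but it works as long as the order of the keys stays the same. Note that the non-greedy .*? is not part of POSIX extended regular expressions, so whether the match stops at the first "model" depends on the grep implementation:
NEW_VAR=$(echo "$SM_TRAINING_ENV" | grep -Eo 's3.*?model' | head -1)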
Would much prefer something not dependent on order though.
I have exported a lot of JSON files, almost 100.
They are basically huge one-liners.
Each file has "uid":"random".
How can I extract only the "uid":"random" part from every file in the master directory?
I've tried grep {}, but I can't work around the quotes around uid and its value.
Also, I tried to cut it but still the quotes are the problem.
Line structure:
..."title":"Random title","uid":"r4nd0muid","version":X},"overwrite": true}
Don't use anything but JSON-aware tools to process JSON files, and post a proper sample for testing. That said, to work around the quotes around uid and its value in the posted string:
$ grep -o \"uid\":\"[^\"]*\" foodata
"uid":"r4nd0muid"
Basically, the pattern is "uid":"[^"]*": after "uid":" it matches any run of non-" characters, followed by the closing ".
It's probably a lot easier to use jq which is a search and transformation tool for JSON files.
Given the file test.json:
{
"uuid": "whatever"
}
You can extract just the uuid field with:
jq '.uuid' test.json
# output: "whatever"
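For the many-files case in the question, a rough sketch along the same lines might look like the following. It assumes the uid sits somewhere inside a nested object; the "dashboard" wrapper key, the temporary directory and the file names are hypothetical, since the question only shows the tail of a line.

```shell
# Hypothetical stand-in for the master directory, holding two one-line exports.
dir=$(mktemp -d)
printf '%s\n' '{"dashboard":{"title":"Random title","uid":"r4nd0muid","version":1},"overwrite": true}' > "$dir/a.json"
printf '%s\n' '{"dashboard":{"title":"Other title","uid":"0th3ruid","version":2},"overwrite": true}' > "$dir/b.json"

# ".." walks every value in the document; keep objects, read their "uid" key,
# and print it only when it is a string (objects without a uid yield null).
uids=$(jq -r '.. | objects | .uid | strings' "$dir"/*.json)
printf '%s\n' "$uids"

# Clean up the temporary directory.
rm -rf "$dir"
```

This prints every string stored under a "uid" key at any depth, so if the files contain several such keys, a fixed path such as .dashboard.uid (again, a hypothetical path) would be more precise.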
So, I am writing a shell script and I am running a command that gives me an output like:
{"a":"some_text","b":some_other_text","c":"even_more_text"}
Now, I am not sure how to parse it. I basically need the value of "c", i.e. "even_more_text", in a variable, but nothing I have found on the internet has worked yet. TIA.
The output you pasted here is not valid JSON (you can check it with https://jsonformatter.curiousconcept.com/). The opening double quote of "some_other_text" is missing. Once you add it, you can easily parse the output with jq:
./your_script.sh | jq -r ".c"
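Since the question asks for the value in a variable, a minimal sketch using command substitution could look like this. The produce_output function is a hypothetical stand-in for the real command, emitting the corrected JSON, and c_value is an illustrative name.

```shell
# Hypothetical stand-in for the command that prints the (corrected) JSON.
produce_output() {
  printf '%s\n' '{"a":"some_text","b":"some_other_text","c":"even_more_text"}'
}

# -r strips the JSON quotes, so the variable holds the bare string.
c_value=$(produce_output | jq -r '.c')
printf '%s\n' "$c_value"
```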
I have a (not so complicated) json file and I need to extract its contents using bash. I want to use jq for the processing, it should be straightforward. The problem is that I'm getting a weird error in the processing that I don't know how to solve (because I don't know what is causing it).
A minimal sample causing me problems:
{
"E23763": {
"data": "information"
}
}
If I just run jq to pretty-print it, it works:
$ cat test.json | jq .
{
"E23763": {
"data": "information"
}
}
But if I try to extract the first field, it fails cryptically:
$ cat test.json | jq .E23763
jq: error: Invalid numeric literal at EOF at line 1, column 7 (while parsing '.E23763') at <top-level>, line 1:
.E23763
jq: 1 compile error
The expected result would have been:
{
"data": "information"
}
Has anyone run into a similar issue? Why is it complaining about a numeric literal when it is really looking at a string?
Quotation didn't seem to matter here, same error.
Please refer to this issue on GitHub; many of the responses posted there might help with your problem: https://github.com/stedolan/jq/issues/1526
I'll post one of the solutions here however:
jq '.["E23763"]' test.json
Another solution, as suggested by @Inian, is:
jq '."E23763"' test.json
This one does without the []. In this case it was the correct solution, but try both nonetheless.
Basically the parser is buggy and treats .E as the beginning of a number.
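Both workarounds can be checked against the minimal sample with a short sketch like the one below. The variable names are illustrative, and -c is used only to get compact one-line output that is easy to compare.

```shell
# The minimal sample from the question, on one line.
json='{"E23763":{"data":"information"}}'

# Generic object index: the key is an ordinary quoted string inside [].
by_index=$(printf '%s\n' "$json" | jq -c '.["E23763"]')

# Dot-string notation: the key is quoted, so it is not read as a number.
by_string=$(printf '%s\n' "$json" | jq -c '."E23763"')

printf '%s\n' "$by_index" "$by_string"
```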