Cutting in bash everything around a string - bash

I have exported a lot of JSON files, almost 100.
They are basically huge one-liners.
Each file has "uid":"random".
How can I extract only the "uid":"random" part from every file in the master directory?
I've tried grep {}, but I can't work around the quotes around uid and its value.
I also tried cut, but the quotes are still the problem.
Line structure:
..."title":"Random title","uid":"r4nd0muid","version":X},"overwrite": true}

Don't use anything but JSON-aware tools for processing JSON files, and post a proper sample for testing. That said, to work around the quotes on the uid and the value in the posted string:
$ grep -o \"uid\":\"[^\"]*\" foodata
"uid":"r4nd0muid"
Basically the pattern is "uid":"[^"]*", i.e. after "uid":" match every character that is not a " and then the closing ".
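To apply the same pattern to every exported file at once, something like the following could work. It is only a sketch: the temporary directory, the file names, and the "dashboard" wrapper key are invented stand-ins for the real master directory and its exports.

```shell
# Hypothetical layout: a throwaway "master" directory with two one-line JSON exports.
master=$(mktemp -d)
printf '%s\n' '{"dashboard":{"title":"Random title","uid":"r4nd0muid","version":1},"overwrite": true}' > "$master/a.json"
printf '%s\n' '{"dashboard":{"title":"Other title","uid":"0th3ruid","version":2},"overwrite": true}' > "$master/b.json"

# -o prints only the matching part, -h suppresses the file name prefix.
# Single quotes keep the shell from touching the double quotes in the pattern.
uids=$(grep -oh '"uid":"[^"]*"' "$master"/*.json)
printf '%s\n' "$uids"
```

Dropping -h makes grep prefix each match with its file name, which helps when you need to know where a uid came from.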

It's probably a lot easier to use jq which is a search and transformation tool for JSON files.
Given the file test.json:
{
"uuid": "whatever"
}
You can extract just the uuid field with:
jq '.uuid' test.json
# output: "whatever"

Related

Adding quotes to variating characters in bash

I am trying to use sed to add double quotes around everything between a matched pattern and the comma that ends it. At the moment I am extracting the following data from Cloudflare and trying to convert it to line protocol:
count=24043,clientIP=x.x.x.x,clientRequestPath=/abc/abs/abc.php
count=3935,clientIP=y.y.y.y,clientRequestPath=/abc/abc/abc/abc.html
count=3698,clientIP=z.z.z.z,clientRequestPath=/abc/abc/abc/abc.html
I have already converted the JSON output to this format with a series of sed commands; however, I cannot work out how to put the values of clientIP and clientRequestPath in double quotes.
My expected output is:
count=24043,clientIP="x.x.x.x",clientRequestPath="/abc/abs/abc.php"
count=3935,clientIP="y.y.y.y",clientRequestPath="/abc/abc/abc/abc.html"
count=3698,clientIP="z.z.z.z",clientRequestPath="/abc/abc/abc/abc.html"
This data will be imported into InfluxDB. count will be a float, whilst clientIP and clientRequestPath will be strings, which is why they need to be in double quotes; at the moment I am getting errors because they aren't.
Can anyone provide a suitable sed command to do this?
This might work for you (GNU sed):
sed -E 's/=([^0-9][^,]*)/="\1"/g' file
Globally enclose in double quotes any string that follows an = and does not begin with a digit, up to the next comma.
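One caveat with the pattern above: it skips any value that starts with a digit, so a real address such as 203.0.113.7 would stay unquoted. A minimal sketch that keys on the field names instead, assuming only clientIP and clientRequestPath need quoting (the numeric address is made up for illustration):

```shell
# Sample lines; the numeric IP is a made-up documentation address.
input='count=24043,clientIP=203.0.113.7,clientRequestPath=/abc/abs/abc.php
count=3935,clientIP=y.y.y.y,clientRequestPath=/abc/abc/abc/abc.html'

# Quote the values of the two named keys, whatever character they start with.
# \1 is the key name, \2 is everything up to the next comma (or end of line).
quoted=$(printf '%s\n' "$input" |
  sed -E 's/(clientIP|clientRequestPath)=([^,]*)/\1="\2"/g')
printf '%s\n' "$quoted"
```

Naming the keys leaves count untouched regardless of its value, at the cost of having to list every string field explicitly.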
Here is a solution that uses a sed script file, which allows several operations on a source file.
Assuming your source data is in a file "from.dat", create a sed script containing the commands:
cat script.sed
s/clientIP=/clientIP="/
s/,clientRequestPath=\(.*\)$/",clientRequestPath="\1"/
Run the script on the data file, redirecting the output to the file "to.dat":
sed -f script.sed from.dat > to.dat
cat to.dat (only showing one line)
count=24043,clientIP="x.x.x.x",clientRequestPath="/abc/abs/abc.php"

Converting a TXT file with double quotes to a pipe-delimited format using sed

I'm trying to convert TXT files into pipe-delimited text files.
Let's say I have a file called sample.csv:
aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z
I'd like to convert this into an output that looks like this:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
Now after tons of searching, I have come the closest using this sed command:
sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\|/g;s/"([^"]+)"$/\1/;s/\v/"/g'
However, the output that I received was:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|pppqqq|rrr" sss|ttt,"uuu|Z
Where the expected for the 9th column should have been ppp"qqq" but the result removed the double quotes and what I got was pppqqq.
I have been playing around with this for a while, but to no avail.
Any help regarding this would be highly appreciated.
As suggested in the comments, sed or any other Unix text tool is not recommended for this kind of complex CSV string. It is much better to use a dedicated CSV parser, like this in PHP:
$s = 'aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z';
echo implode('|', str_getcsv($s));
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|nnnooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
The problem with sample.csv is that it mixes non-quoted fields (containing quotes) with fully quoted fields (that should be treated as such).
You can't have both at the same time. Either all fields are (treated as) unquoted and quotes are preserved, or all fields containing a quote (or separator) are fully quoted and the quotes inside are escaped with another quote.
So, sample.csv should become:
"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z
to give you the desired result (using a csv parser):
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
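The quoting rule described above can be sketched in shell. The helper name csv_quote is made up for illustration; it doubles every embedded quote and then wraps the whole field in quotes, which is the form a CSV parser expects:

```shell
# Hypothetical helper: turn a raw field into a fully quoted CSV field.
csv_quote() {
  # Double each embedded double quote, then wrap the result in quotes.
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

csv_quote 'ppp"qqq"'; echo   # prints "ppp""qqq"""
csv_quote 'ttt,"uuu'; echo   # prints "ttt,""uuu"
```

Both results match the corresponding fields in the corrected sample.csv line.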
I had the same problem.
I got the right result with https://www.papaparse.com/demo
It is FOSS on GitHub, so you can check how it works.
With the source of [ "aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z ]
the result appears in the browser console (screenshot: https://i.stack.imgur.com/OB5OM.png).

Create new bash var from value of dict bash var

My environment created a variable that looks like this:
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
EDIT by Ed Morton: per the OP's comment below, this is what (s)he is trying to describe above as the sample input:
$ SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
$ echo "$SM_TRAINING_ENV"
{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}
How can I create a new bash variable that is equal to the value of SM_TRAINING_ENV["hyperparameters"]["model_dir"]?
For completeness, I was trying simple things like echo ${SM_TRAINING_ENV} | jq . and kept getting errors with everything I tried.
Edit: I've been informed that this value isn't proper JSON, so I am rewording the question. I think the environment sets it to the value of a Python dictionary, so jq does not seem usable. Removed the json tag. Maybe this is a job for awk?
It looks like I can match the value I want with the regex pattern s3.*?model if I assume the structure doesn't change, but I am not sure how to assign the regex match to a new variable.
First, you need to quote the JSON value so that the double quotes will be included in the value.
SM_TRAINING_ENV='{"additional_framework_parameters":{},"channel_input_dirs":{"training":"/opt/ml/input/data/training"},"current_host":"algo-1","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1"],"hyperparameters":{"bool_param":true,"float_param":1.25,"int_param":5,"model_dir":"s3://bucket/detection/prefix/testing-2019-04-06-02-24-20-194/model","str_param":"bla"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"training":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"testing-2019-04-06-02-24-20-194","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://bucket/prefix/testing-2019-04-06-02-24-20-194/source/sourcedir.tar.gz","module_name":"launcher.sh","network_interface_name":"ethwe","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"ethwe"},"user_entry_point":"launcher.sh"}'
Then you can use the jq utility to extract the value you want (-r prints the raw string, without the surrounding JSON quotes).
new_var=$(echo "$SM_TRAINING_ENV" | jq -r '.hyperparameters.model_dir')
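This doesn't really index into the structure, but it works as long as the order stays the same. The lazy quantifier .*? is not supported by every grep, so this version stops at the closing quote instead:
NEW_VAR=$(echo "$SM_TRAINING_ENV" | grep -o 's3://[^"]*/model' | head -n 1)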
Would much prefer something not dependent on order though.
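A lookup by key path does not depend on the order of the keys. The following is a hedged sketch, assuming jq is installed and the variable holds valid JSON; the value used here is a shortened, made-up stand-in for the real SM_TRAINING_ENV.

```shell
# Shortened, made-up stand-in for SM_TRAINING_ENV; only the keys used below matter.
SM_TRAINING_ENV='{"hosts":["algo-1"],"hyperparameters":{"int_param":5,"model_dir":"s3://bucket/prefix/model"},"model_dir":"/opt/ml/model"}'

if command -v jq >/dev/null 2>&1; then
  # -r emits the raw string; -e makes jq exit non-zero when the result is null or false.
  if model_dir=$(printf '%s' "$SM_TRAINING_ENV" | jq -r -e '.hyperparameters.model_dir'); then
    echo "model_dir=$model_dir"
  else
    echo "hyperparameters.model_dir not found" >&2
  fi
else
  echo "jq is not installed" >&2
fi
```

Because the path names both levels, the top-level model_dir key is never confused with the one under hyperparameters.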

How to parse an output enclosed with braces?

So, I am writing a shell script and I am running a command that gives me an output like:
{"a":"some_text","b":some_other_text","c":"even_more_text"}
Now, I am not sure how to parse it. I basically need the value of "c", i.e. "even_more_text", in a variable, but the results I found on the internet have not worked so far. TIA.
The output you pasted here is not valid JSON (you can check it with https://jsonformatter.curiousconcept.com/). The opening double quote of "some_other_text" is missing. If you add it, you can easily parse the output with jq:
./your_script.sh | jq -r ".c"

Dynamically build string path for curl capath

Curl has the option to add capath as one of its arguments.
This argument can contain one path or several paths in this format:
curl --capath /certs/path1:/certs/path2:/certs/path3 https://domain.com
Is it possible to use curl's capath argument with subfolders by giving only the root dir, such as /certs/ ?
If not, I would like to build a string that automatically expands to this
format: /certs/path1:/certs/path2:/certs/path3
When I echo this command:
echo /certs/*
/certs/path1 /certs/path2 /certs/path3
required output:
/certs/path1:/certs/path2:/certs/path3
The idea is to have some automatic expansion method that does this without sed, awk, or another external tool.
Something like this:
curl --capath /certs/*{:} https://domain.com
would automatically result in:
curl --capath /certs/path1:/certs/path2:/certs/path3 https://domain.com
Unfortunately, I don't see any way to do it without an external program.
Well, you could do
path=
for s in /certs/*
do
path+="$s:"
done
path=${path%:}  # drop the trailing colon
But I don't think this is what you are looking for.
The point is that the shell expands the wildcard character '*' into a list of strings, which echo prints separated by spaces.
Otherwise, just capture it in a variable:
var=$(echo /certs/* | tr ' ' ':')
Hope this will answer. And if someone can find a real solution, then I want to know it too =)
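One approach that may fit the "no external tool" requirement is to let the shell join the expanded names itself: "$*" joins the positional parameters with the first character of IFS. The sketch below assumes directory names without newlines; the helper name join_colon and the temporary directory standing in for /certs are made up for illustration.

```shell
# Hypothetical helper: join all arguments with ':' using only shell built-ins.
join_colon() (
  # The function body runs in a subshell, so the IFS change does not leak out.
  IFS=:
  printf '%s\n' "$*"
)

# Stand-in for /certs so the sketch runs anywhere.
certs=$(mktemp -d)
mkdir "$certs/path1" "$certs/path2" "$certs/path3"

capath=$(join_colon "$certs"/*)
echo "$capath"
# With real data this could be passed on as: curl --capath "$capath" https://domain.com
```

Unlike the tr variant, this keeps names that contain spaces intact, because the glob results are never split on whitespace.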

Resources