How to insert a line after a match using sed (or awk) in a formatted JSON file? - macos

{
"id": "a1234567-89ab-cdef-0123-456789abcdef",
"properties": {
...
"my_id": "c1234567-89ab-cdef-0123-456789abcdef",
...
}
Given the above in a file, I want to be able to perform a match (including the 4 leading spaces) on my_id and then append a new line "my_value": "abcd",. The desired output would look like this:
{
"id": "a1234567-89ab-cdef-0123-456789abcdef",
"properties": {
...
"my_id": "c1234567-89ab-cdef-0123-456789abcdef",
"my_value": "abcd",
...
}
Using examples online, I'm unable to get the command to work. Here is an example of something I have tried: sed '/.*"my_id".*/a "my_value": "abcd",' test.json, for which I receive the following error: command a expects \ followed by text.
What is the correct way to structure this command?

Using any awk:
$ awk -v new='"my_value": "abcd",' '{print} sub(/"my_id":.*/,""){print $0 new}' file
{
"id": "a1234567-89ab-cdef-0123-456789abcdef",
"properties": {
...
"my_id": "c1234567-89ab-cdef-0123-456789abcdef",
"my_value": "abcd",
...
}
The above will print the new line using whatever indent the existing "my_id" line has, it doesn't assume/hard-code any indent, e.g. 4 blanks.
I'm using this:
sub(/"my_id":.*/,""){print $0 new}
instead of the briefer:
sub(/"my_id":.*/,new)
so it won't break if the new string contains any backreference chars such as &.

awk procedure with passed argument for insert value
The following awk procedure allows for 'abcd' to be passed as an argument for insertion (allowing it to be set in a bash script if required).
awk -v insertVal="abcd" '/"my_id":/{$0=$0"\n \"my_value\": \""insertVal"\","} {print}' dat.txt
explanation
The required insertion string ('abcd' in this case) is passed as an argument using the -v variable switch followed by a variable name and value: insertVal="abcd".
The first awk action block has a pattern condition to only act on lines containing the target-line string (in this case "my_id":). When a line with that pattern is found, the line is extended with a new line mark \n, the required four spaces to start the next line, the specified key named "my_value", and the value associated with the key, passed by argument as the variable named insertVal ("abcd"), and the final , character. Note the need to escape the " quotes to render them.
The final awk block, prints the current line (whether or not it was modified).
test
The procedure was tested on Mac Terminal using GNU Awk 5.2.0.
The output generated (from the input data saved to a file named dat.txt) is:
{
"id": "a1234567-89ab-cdef-0123-456789abcdef",
"properties": {
...
"my_id": "c1234567-89ab-cdef-0123-456789abcdef",
"my_value": "abcd",
...
}

Using sed
$ sed -e '/my_id/{p;s/id.*"/value": "abcd"/' -e '}' input_file
{
"id": "a1234567-89ab-cdef-0123-456789abcdef",
"properties": {
...
"my_id": "c1234567-89ab-cdef-0123-456789abcdef",
"my_value": "abcd",
...
}

With your shown samples and attempts please try following GNU awk code. Where newVal is an awk variable having new value in it. Using match function in GNU awk where I have used regex (.*)("my_id": "[^"]*",)(.*) which creates 3 capturing groups and saves values into an array named arr. Then printing values as per requirement.
awk -v newVal='"my_value": "abcd",' -v RS= '
match($0,/(.*)("my_id": "[^"]*",)(.*)/,arr){
print arr[1] arr[2] newVal arr[3]
}
' Input_file

This might work for you (GNU sed):
sed '/"my-id".*/p;s//"my-value": "abcd"/' file
Match on "my-id" and print that line, then substitute the additional line.

Related

How to use sed command to replace value in json file

My json file looks like this:
"parameters": {
"$connections": {
"value": {
"azureblob": {
"connectionId": "/subscriptions/2b06d50xxxxxedd021/resourceGroups/Reource1005/providers/Microsoft.Web/connections/azureblob",
"connectionName": "azureblob",
"connectionProperties": {
"authentication": {
"type": "ManagedServiceIdentity"
}
},
"id": "/subscriptions/2b06d502-3axxxxxxedd021/providers/Microsoft.Web/locations/eastasia/managedApis/azureblob"
},
"office365": {
"connectionId": "/subscriptions/2b06d502xxxxxc8-5a8939edd021/resourceGroups/Reource1005/providers/Microsoft.Web/connections/office365",
"connectionName": "office365",
"id": "/subscriptions/2b06d50xxxxxx939edd021/providers/Microsoft.Web/locations/eastasia/managedApis/office365"
}
}
}
}
}
I want to use sed command to replace the string in connectionId, currently my script is as follows:
script: 'sed -e ''/connectionId/c\ \"connectionId\" : \"/subscriptions/2b06d50xxxxb-92c8-5a8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/azureblob\",'' "$(System.DefaultWorkingDirectory)/function-app-actions/templates/copycode.json"'
This script can replace the strings in both connectionIds in the json file with "Resourcetest", that's what I want to make the strings in the second connectionId replace with other values, how can I do that?
I'm new to sed commands, any insight is appreciated。
Edit:
I just want to replace "Resource1005" in both connectionId strings in the json file with "Resourcetest", but I need other content in the connectionIds string to keep the previous value
So my expected output should look like this:
"connectionId": "/subscriptions/2b06d502-3axxxx8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/azureblob"
"connectionId": "/subscriptions/2b06d502-3axxxx8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/office365"
If I use the script I mentioned above, it does replace the two Resource1005s, but the other values in the string are also replaced with the same (I just want to replace the Resource1005 value)
1st solution: With your shown samples and attempts, please try following GNU awk code. This will print only edited lines(as per shown samples) in output with substituting Resource1005 with Resourcetest in values.
awk -v RS='[[:space:]]+"connectionId": "[^"]*' '
RT{
sub(/\n+[[:space:]]+/,"",RT)
sub(/\/Resource1005\//,"/Resourcetest/",RT)
print RT
}
' Input_file
2nd solution: With sed you can try following sed code.
sed -nE 's/(^[[:space:]]+"connectionId": ".*)\/Resource1005\/(.*)/\1\/Resourcetest\/\2/p' Input_file
Common practice is to create template files and change them with sed or something else. Like this for example:
cat template.json
...
"office365": {
"connectionId": "__CONNECTIONID__",
"connectionName": "office365",
"id": "__ID__"
}
...
sed 's|__CONNECTIONID__|/some/path|; s|__ID__|/some/other/path|' template.json > new.json

Extract json value on regex on bash script

How can i get the values inner depends in bash script?
manifest.py
# Commented lines
{
'category': 'Sales/Subscription',
'depends': [
'sale_subscription',
'sale_timesheet',
],
'auto_install': True,
}
Expected response:
sale_subscription sale_timesheet
The major problem is linebreak, i have already tried | grep depends but i can not get the sale_timesheet value.
Im trying to add this values comming from files into a var, like:
DOWNLOADED_DEPS=($(ls -A $DOWNLOADED_APPS | while read -r file; do cat $DOWNLOADED_APPS/$file/__manifest__.py | [get depends value])
Example updated.
If this is your JSON file:
{
"category": "Sales/Subscription",
"depends": [
"sale_subscription",
"sale_timesheet"
],
"auto_install": true
}
You can get the desired result using jq like this:
jq -r '.depends | join(" ")' YOURFILE.json
This uses .depends to extract the value from the depends field, pipes it to join(" ") to join the array with a single space in between, and uses -r for raw (unquoted) output.
If it is not a json file and only string then you can use below Regex to find the values. If it's json file then you can use other methods like Thomas suggested.
^'depends':\s*(?:\[\s*)(.*?)(?:\])$
demo
you can use egrep for this as follows:
% egrep -M '^\'depends\':\s*(?:\[\s*)(.*?)(?:\])$' pathTo\jsonFile.txt
you can read about grep
As #Thomas has pointed out in a comment, the OPs input data is not in JSON format:
$ cat manifest.py
# Commented lines // comments not allowed in JSON
{
'category': 'Sales/Subscription', // single quotes should be replaced by double quotes
'depends': [
'sale_subscription',
'sale_timesheet', // trailing comma at end of section not allowed
],
'auto_install': True, // trailing comma issue; should be lower case "true"
}
And while the title of the question mentions regex, there is no sign of a regex in the question. I'll leave a regex based solution for someone else to come up with and instead ...
One (quite verbose) awk solution based on the input looking exactly like what's in the question:
$ awk -F"'" ' # use single quote as field separator
/depends/ { printme=1 ; next } # if we see the string "depends" then set printme=1
printme && /]/ { printme=0 ; next} # if printme=1 and line contains a right bracket then set printme=0
printme { printf pfx $2; pfx=" " } # if printme=1 then print a prefix + field #2;
# first time around pfx is undefined;
# subsequent passes will find pfx set to a space;
# since using "printf" with no "\n" in sight, all output will stay on a single line
END { print "" } # add a linefeed on the end of our output
' json.dat
This generates:
sale_subscription sale_timesheet

How to extract text between two patterns with sed/awk

I know this has been asked 1000 times here, but I read a lot of similar questions and still did not manage to find the right way to do this. I need to extract a number from a line that looks like this:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
Expected output:
2034.2
This version number will not always be the same, but the rest of the line should.
I have tried working with sed but I am new to this and failed:
sed -e 's/version":[\(.*\),"description/\1/'
output:
sed: -e expression #1, char 35: unterminated `s' command
I think the issue is that there are too many special characters involved in the line and I did not write the command very well.
Since it's JSON, use should use JSON aware tools for processing it. If you prefer, for example, awk, the way is to use GNU awk's JSON extension. This is a small how-to.
First download and compile appropriate versions of GNU awk, Gawkextlib and gawk-json. That's pretty straightforward, actually, just ./configure and make. Then, write some code:
awk '
#load "json" # enable json extension
{
lines=lines $0 # read json file records and buffer to var lines
if(json_fromJSON(lines,data)==1) { # once the json is complete
for(i in data["info"]["version"]) # that seems to be an array so all elements
print data["info"]["version"][i] # are outputed
lines="" # once done with the first json object
} # reset the var for more lines
}' file
Output this time:
2034.2
Explained a bit more:
The JSON file structure can vary from one line to multiple lines, for example:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
or:
{
"version": "4.9.123M",
"info": {
"version": [
2034.2
],
"description": ""
},
"status": "OK"
}
so we need to buffer the JSON lines with lines=lines $0 until there is a whole valid object in variable lines. We use the extension function json_fromJSON() to determine that validity in if(json_fromJSON(lines,data)==1). While validated the object gets disentangled and stored to array data. For this particular object the structure of the array is:
data["version"]="4.9.123M"
data["info"]["version"][1]="2034.2"
data["info"]["description"]=""
data["status"]="OK"
We could examine the object and produce some output of it with this recursive array scanning function:
awk '
#load "json"
function scan(a,p, q) { # a is array, p path to it, q is qnd *
if(isarray(a))
for(i in a) {
q=p (p==""?"":"->") i
scan(a[i],q)
}
else
print p ":" a
}
{
lines=lines $0
if(json_fromJSON(lines,data)==1)
scan(data) #
}' file.json
Output:
status:OK
version:4.9.123M
info->version->1:2034.2
info->description:
*) quick'n dirty
Here is a brief example of how to output JSON from an array: https://stackoverflow.com/a/58109715/4162356
If the version is always enclosed in [] and no other [ or ] is present in a line ,you can try this logic
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo $STR | awk -F'[' '{print $2}' | awk -F']' '{print $1}'
Simplest Way
Try grep when want to extract simple texts
echo "{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}"| grep -o "\[.*\]" | sed -e 's/\[\|\]//g'
This should do:
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo "$STR" | awk -F'[][]' '{print $2}'
2034.2

Text replace in a file, on 5h line, from position 18 to position 145

I have this text file:
{
"name": "",
"auth": true,
"username": "rtorrent",
"password": "d5275b68305438499f9660b38980d6cef7ea97001efe873328de1d76838bc5bd15c99df8b432ba6fdcacbff82e3f3c4829d34589cf43236468d0d0b0a3500c1e"
}
Now, I want to be able to replace the d5275b68305438499f9660b38980d6cef7ea97001efe873328de1d76838bc5bd15c99df8b432ba6fdcacbff82e3f3c4829d34589cf43236468d0d0b0a3500c1e using sed for example. (The string has always the exact same length, but the values can be different)
I've tried this using sed:
sed -i 5s/./new-string/18 file.json
That basically replaces text, on the 5th line, starting with position 18. I want to be able to replace the text, exactly starting with position 18 and up to position 154, strictly what's inside the "". The command above will cut the ", at the end of the file and if it's run multiple times, the string becomes every time longer and longer.
Any help is really appreciated.
You can use for example awk for it:
$ awk -v var="new_string" 'NR==5{print substr($0,1,17) var substr($0,146);next}1' file
{
"name": "",
"auth": true,
"username": "rtorrent",
"password": "new_string"
}
but there are better tools for changing a value in a JSON, jq for example:
$ jq '.password="new_string"' file
{
"name": "",
"auth": true,
"username": "rtorrent",
"password": "new_string"
}
Edit: When passing a shell variable $var to awk and jq:
$ var="new_string"
$ awk -v var="$var" 'NR==5{print substr($0,1,17) var substr($0,146);next}1' file
and
$ jq --arg var "$var" '.password=$var'
Edit2: There is always sed:
$ sed -i "5s/\"[^\"]*\"/\"$var\"/2" file

Bash/*NIX: split a file into multiple files on a substring

Variants of this question have been asked and answered before, but I find that my sed/grep/awk skills are far too rudimentary to work from those to a custom solution since I hardly ever work in shell scripts.
I have a rather large (100K+ lines) text file in which each line defines a GeoJSON object, each such object including a property called "county" (there are, all told, 100 different counties). Here's a snippet:
{"type": "Feature", "properties": {"county":"ALAMANCE", "vBLA": 0, "vWHI": 4, "vDEM": 0, "vREP": 2, "vUNA": 2, "vTOT": 4}, "geometry": {"type":"Polygon","coordinates":[[[-79.537429,35.843303],[-79.542428,35.843303],[-79.542428,35.848302],[-79.537429,35.848302],[-79.537429,35.843303]]]}},
{"type": "Feature", "properties": {"county":"NEW HANOVER", "vBLA": 0, "vWHI": 0, "vDEM": 0, "vREP": 0, "vUNA": 0, "vTOT": 0}, "geometry": {"type":"Polygon","coordinates":[[[-79.532429,35.843303],[-79.537428,35.843303],[-79.537428,35.848302],[-79.532429,35.848302],[-79.532429,35.843303]]]}},
{"type": "Feature", "properties": {"county":"ALAMANCE", "vBLA": 0, "vWHI": 0, "vDEM": 0, "vREP": 0, "vUNA": 0, "vTOT": 0}, "geometry": {"type":"Polygon","coordinates":[[[-79.527429,35.843303],[-79.532428,35.843303],[-79.532428,35.848302],[-79.527429,35.848302],[-79.527429,35.843303]]]}},
I need to split this into 100 separate files, each containing one county's GeoJSONs, and each named xxxx_bins_2016.json (where xxxx is the county's name). I'd also like the final character (comma) at the end of each such file to go away.
I'm doing this in Mac OSX, if that matters. I hope to learn a lot by studying any solutions you could suggest, so if you feel like taking the time to explain the 'why' as well as the 'what' that would be fantastic. Thanks!
EDITED to make clear that there are different county names, some of them two-word names.
jq can kind of do this; it can group the input and output one line of text per group. The shell then takes care of writing each line to an appropriately named file. jq itself doesn't really have the ability to open files for writing that would allow you to do this in a single process.
jq -Rn -c '[inputs[:-1]|fromjson] | group_by(.properties.county)[]' tmp.json |
while IFS= read -r line; do
county=$(jq -r '.[0].properties.county' <<< $line)
jq -r '.[]' <<< "$line" > "$county.txt"
done
[inputs[:-1]|fromjson] reads each line of your file as a string, strips the trailing comma, then parses the line as JSON and wraps the lines into a single array. The resulting array is sorted and grouped by county name, then written to standard output, one group per line.
The shell loop reads each line, extracts the county name from the first element of the group with a call to jq, then uses jq again to write each element of the group to the appropriate file, again one element per line.
(A quick look at https://github.com/stedolan/jq/issues doesn't appear to show any requests yet for an output function that would let you open and write to a file from inside a jq filter. I'm thinking of something like
jq -Rn '... | group_by(.properties.county) | output("\(.properties.county).txt")' tmp.json
without the need for the shell loop.)
If using string parsing rather than proper JSON parsing to extract the county name is acceptable - brittle in general, but would work in this simple case - consider Sam Tolton's GNU awk answer, which has the potential to be by far the simplest and fastest solution.
To complement chepner's excellent answer with a variation that focuses on performance:
jq -Rrn '[inputs[:-1]|fromjson] | .properties.county + "|" + (.|tostring)' file |
awk -F'|' '{ print $2 > ($1 "_bins_2016.json") }'
Shell loops are avoided altogether, which should speed up the operation.
The general idea is:
Use jq to trim the trailing , from each input line, interpret the trimmed string as JSON, extract the county name, then output the trimmed JSON strings prepended with the county name and a distinct separator, |.
Use an awk command to split each line into the prepended county name and the trimmed JSON string, which allows awk to easily construct the output filename and write the JSON string to it.
Note: The awk command keeps all output files open until the script has finished, which means that, in your case, 100 output files will be open simultaneously - a number that shouldn't be a problem, however.
In cases where it is a problem, you can use the following variation, in which jq first sorts the lines by county name, which then allows awk to immediately close the previous output field whenever the next county is reached in the input:
jq -Rrn '
[inputs[:-1]|fromjson] | sort_by(.properties.county)[] |
.properties.county + "|" + (.|tostring)
' file |
awk -F'|' '
prevCounty != $1 { if (outFile) close(outFile); outFile = $1 "_bins_2016.json" }
{ print $2 > outFile; prevCounty = $1 }
'
A simpler version of chepner's answer:
while IFS= read -r line
do
countyName=$(jq --raw-output '.properties.county' <<<"${line: : -1}")
jq <<< "${line: : -1}" >> "$countyName"_bins_2016.json
done<file
The idea is to filter the county name using a jq filter after stripping the , from each line of your input file. Then the line is passed to jq as plain stream to produce a JSON file in prettified format.
If you are from a relatively older version of bash (< 4.0) use "${line%?}" over "${line: : -1}"
For example with the change above, one of your county becomes,
cat ALAMANCE_bins_2016.json
{
"type": "Feature",
"properties": {
"county": "ALAMANCE",
"vBLA": 0,
"vWHI": 0,
"vDEM": 0,
"vREP": 0,
"vUNA": 0,
"vTOT": 0
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-79.527429,
35.843303
],
[
-79.532428,
35.843303
],
[
-79.532428,
35.848302
],
[
-79.527429,
35.848302
],
[
-79.527429,
35.843303
]
]
]
}
}
Note: The current solution could be performance intensive as reading file line by line is an expensive operation, and equally invoking jq for each of the lines.
This will do what you want minus getting rid of the last comma:-
gawk 'match($0, /"county":"([^"]+)/, array){ print >array[1]"_bins_2016.json" }' INPUT_FILE
This will output files in the current path with a filename in the format COUNTRY NAME_bins_2016.json.
The script goes line by line and uses a regex to match the exact term "country":" followed by 1 or more characters that aren't a ". It captures the characters within the quotes and then uses it as part of the filename to append the current line to.
To remove the trailing comma from all .json files in the current path you could use:-
sed -i '$ s/,$//' *.json
If you were certain that the last char was always a comma, a faster solution would be to use truncate:-
truncate -s-1 *.json
Last part taken from this answer: https://stackoverflow.com/a/40568723/1453798
Here is a quickie script that will do the job. It has the virtue of working on most systems without having to install any other tools.
IFS=$'\n'
counties=( $( sed 's/^.*"county":"//;s/".*$//' counties.txt ) )
unset IFS
for county in "${!counties[#]}"
do
county="${counties[$i]}"
filename="$county".out.txt
echo "'$filename'"
grep "\"$county\"" counties.txt > "$filename"
done
The setting of IFS to \n allows the array elements to contain spaces. The sed command strips off all the text up to the start of the county name and all the text after it. The for loop is the form that allows iterating over the array. Finally, the grep command needs to have double quotes around the search string so that counties that are substrings of other counties don't accidentally get put into the wrong file.
See this section of the GNU BASH Reference Manual for more info.

Resources