I would like to get the values of the name fields of the following text, using sed, awk, grep or similar.
{
"cast": [
{
"character": "",
"credit_id": "52532e3119c29579400012b5",
"gender": null,
"id": 23629,
"name": "Brian O'Halloran",
"order": 0,
"profile_path": "/eJsLxovTdcm6QK9PDB2pCe5FMqK.jpg"
},
{
"character": "",
"credit_id": "52532e3119c2957940001315",
"gender": null,
"id": 19302,
"name": "Jason Mewes",
"order": 1,
"profile_path": "/so3nT2vgSCZMO2QXDVHF3ubxaFX.jpg"
},
{
"character": "",
"credit_id": "52532e3119c295794000133d",
"gender": null,
"id": 23630,
"name": "Jeff Anderson",
"order": 2,
"profile_path": "/vjt5WhpJAx0jxvmiGc5PAOBzzb7.jpg"
},
{
"character": "Silent Bob",
"credit_id": "52532e3219c2957940001359",
"gender": null,
"id": 19303,
"name": "Kevin Smith",
"order": 4,
"profile_path": "/3XXThSMqHQgQFjM4bMJ25U1EJTj.jpg"
}
],
"crew": [
{
"credit_id": "55425dbe9251410efa000094",
"department": "Visual Effects",
"gender": null,
"id": 1419667,
"job": "Animation Manager",
"name": "Richard J. Gasparian",
"profile_path": null
},
{
"credit_id": "5544521dc3a3680ce60037e8",
"department": "Art",
"gender": null,
"id": 1450356,
"job": "Background Designer",
"name": "Tristin Cole",
"profile_path": null
},
{
"credit_id": "554a142dc3a3683c84001851",
"department": "Art",
"gender": null,
"id": 1447432,
"job": "Background Designer",
"name": "Nadia Vurbenova",
"profile_path": null
},
{
"credit_id": "554bcd2b9251414692002c9b",
"department": "Production",
"gender": null,
"id": 1447493,
"job": "Location Manager",
"name": "Simon Rodgers",
"profile_path": null
},
{
"credit_id": "52532e3219c29579400013cd",
"department": "Production",
"gender": null,
"id": 19303,
"job": "Executive Producer",
"name": "Kevin Smith",
"profile_path": "/3XXThSMqHQgQFjM4bMJ25U1EJTj.jpg"
},
{
"credit_id": "52532e3319c2957940001405",
"department": "Production",
"gender": null,
"id": 59839,
"job": "Producer",
"name": "Harvey Weinstein",
"profile_path": "/k4UCnh7n0r5CEjq30gAl6QCfF9g.jpg"
},
{
"credit_id": "52532e3319c29579400014a7",
"department": "Production",
"gender": null,
"id": 1307,
"job": "Producer",
"name": "Bob Weinstein",
"profile_path": "/oe5Oxp034UOubnvZqqhurp6a1EP.jpg"
}
],
"id": 2
}
jq is the right tool for processing JSON data.
To get all the name values as a single array:
jq '[.cast[], .crew[] | .name]' file
The output:
[
"Brian O'Halloran",
"Jason Mewes",
"Jeff Anderson",
"Kevin Smith",
"Richard J. Gasparian",
"Tristin Cole",
"Nadia Vurbenova",
"Simon Rodgers",
"Kevin Smith",
"Harvey Weinstein",
"Bob Weinstein"
]
To get the names as a stream of separate strings instead of one array, use the following:
jq '.cast[], .crew[] | .name' file
The output:
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
To get a list of unquoted strings, add the -r (--raw-output) option:
jq -r '.cast[], .crew[] | .name' file
Another jq approach:
jq '.[]?|.[]?|.name' file
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
Note: the ? suffix (see the jq manual) suppresses errors, so .[]? produces no output and no error when . is not an array or an object, as is the case for the top-level "id": 2 here.
Hello buddy, with awk you can do it like this (splitting on ": " so that names containing spaces are printed in full):
awk -F': ' '/"name"/{gsub(/[",]/, "", $2); print $2}' yourFile.txt
Best regards!! ;)
Here are two more awk solutions you could try.
1st approach: in case you need to keep the double quotes around the name values.
awk -F'[:,]' '/name/{sub(/^ +/,"",$2);print $2}' Input_file
2nd approach: in case you need only the name values, without the quotes.
awk -F'[":,]' '/name/{print $5}' Input_file
Explanation of 1st approach: colon and comma are set as the field separators. For every line that contains the string name, the leading spaces of the 2nd field are removed and the 2nd field is printed.
Explanation of 2nd approach: double quote, colon and comma are set as the field separators. For every line that contains the string name, the 5th field is printed.
You should parse the JSON file rather than use a regex.
You can do that with Ruby:
$ ruby -0777 -r json -lane '
d=JSON.parse($_)
(d["cast"]+d["crew"]).each { |x| p x["name"] }' json
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
Or if you want to separate cast from crew:
$ ruby -0777 -r json -lane '
d=JSON.parse($_)
%w(cast crew).each {|w|
puts "#{w.capitalize}:"
(d[w]).each { |x| puts "\t#{x["name"]}" }}' json
Cast:
Brian O'Halloran
Jason Mewes
Jeff Anderson
Kevin Smith
Crew:
Richard J. Gasparian
Tristin Cole
Nadia Vurbenova
Simon Rodgers
Kevin Smith
Harvey Weinstein
Bob Weinstein
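If Ruby is not at hand, the same "parse it, don't regex it" idea can be sketched in Python. This is only an illustration: the SAMPLE string is a trimmed-down copy of the question's data, and the names() helper is a name made up for this example.

```python
import json

# Trimmed-down sample with the same shape as the data in the question.
SAMPLE = """
{
  "cast": [
    {"id": 23629, "name": "Brian O'Halloran", "order": 0},
    {"id": 19303, "name": "Kevin Smith", "order": 4}
  ],
  "crew": [
    {"id": 1419667, "job": "Animation Manager", "name": "Richard J. Gasparian"},
    {"id": 19303, "job": "Executive Producer", "name": "Kevin Smith"}
  ],
  "id": 2
}
"""


def names(doc, sections=("cast", "crew")):
    """Return the "name" value of every entry in the given top-level arrays."""
    return [entry["name"] for key in sections for entry in doc.get(key, [])]


doc = json.loads(SAMPLE)
for name in names(doc):
    print(name)
```

Passing sections=("cast",) restricts the result to the cast, which mirrors the cast/crew split shown above.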
I am new to NiFi. I have a requirement where we get multiple JSON inputs with different header names. I have to parse the JSON and insert it into different tables based on the header value.
I'm not sure how to use the RouteOnContent or EvaluateJsonPath processors for this.
Input 1
{
"Location": [
{
"country": "US",
"division": "Central",
"region": "Big South",
"locationID": 1015,
"location_name": "Hattiesburg, MS (XF)",
"location_type": "RETAIL",
"location_sub_type": "COS",
"store_type": "",
"planned_open_date": "",
"planned_close_date": "",
"actual_open_date": "2017-07-26",
"actual_close_date": "",
"new_store_flag": "",
"address1": "2100 Lincoln Road",
"address2": "",
"city": "Hattiesburg",
"state": "MS",
"zip": 39402,
"include_for_planning": "Y"
},
{
"country": "US",
"division": "Central",
"region": "Big South",
"locationID": 1028,
"location_name": "Laurel, MS",
"location_type": "RETAIL",
"location_sub_type": "COS",
"store_type": "",
"planned_open_date": "",
"planned_close_date": "",
"actual_open_date": "",
"actual_close_date": "",
"new_store_flag": "",
"address1": "1225 5th street",
"address2": "",
"city": "Laurel",
"state": "MS",
"zip": 39440,
"include_for_planning": "Y"
}
]
}
Input 2
{
"Item": [
{
"npi_code": "NEW",
"cifa_category": "XM",
"o9_category": "Accessories"
},
{
"npi_code": "NEW",
"cifa_category": "XM0",
"o9_category": "Accessories"
}
]
}
Use the website https://jsonpath.com/ to work out the right JSONPath expression. What you could potentially do is branch on which path matches: if the content has a $.Item[*].npi_code, then do X, and if it has a $.Location[*].country, then do Y.
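To make the branching idea concrete, here is a rough sketch in plain Python. It is not NiFi configuration, only the routing logic; the table names in TABLE_FOR_HEADER and the route() helper are hypothetical, and the two payloads are shortened versions of Input 1 and Input 2.

```python
import json

# Hypothetical mapping from top-level header name to destination table.
TABLE_FOR_HEADER = {"Location": "location_table", "Item": "item_table"}


def route(payload):
    """Return (table, rows) for a JSON payload, chosen by its top-level header."""
    doc = json.loads(payload)
    for header, table in TABLE_FOR_HEADER.items():
        if header in doc:
            return table, doc[header]
    raise ValueError("unknown header(s): " + ", ".join(doc))


# Shortened versions of the two inputs from the question.
location_input = '{"Location": [{"country": "US", "locationID": 1015, "city": "Hattiesburg"}]}'
item_input = '{"Item": [{"npi_code": "NEW", "cifa_category": "XM", "o9_category": "Accessories"}]}'

for payload in (location_input, item_input):
    table, rows = route(payload)
    print(table, len(rows))
```

In a flow, the returned table name would play the role of the routing attribute and the rows would be what gets inserted.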
First of all, sorry for my English, I'm French.
I'm working on a script, which retrieves tags and links from M3U files to store them into variables.
M3U:
#EXTM3U
#EXTINF:-1 tvg-id="TFX.fr" tvg-name="TFX" tvg-country="FR;AD;BE;LU;MC;CH" tvg-language="French" tvg-logo="http://www.exemple.com/image.jpg" group-title="",TFX (720p)
https://tfx-hls-live-ssl.tf1.fr/tfx/1/hls/live_2328.m3u8
script:
#!/bin/bash
tags='#EXTINF:-1 tvg-id="TFX.fr" tvg-name="TFX" tvg-country="FR;AD;BE;LU;MC;CH" tvg-language="French" tvg-logo="http://www.exemple.com/image.jpg" group-title="Fiction",TFX (720p)'
get_chno="$(echo "$tags" | grep -o 'tvg-chno="[^"]*' | cut -d '"' -f2)"
get_id="$(echo "$tags" | grep -o 'tvg-id="[^"]*' | cut -d '"' -f2)"
get_logo="$(echo "$tags" | grep -o 'tvg-logo="[^"]*' | cut -d '"' -f2)"
get_grp_title="$(echo "$tags" | grep -o 'group-title="[^"]*' | cut -d '"' -f2)"
get_title="$(echo "$tags" | grep -o ',[^*]*' | cut -d ',' -f2)"
get_name="$(echo "$tags" | grep -o 'tvg-name="[^"]*' | cut -d '"' -f2)"
get_country="$(echo "$tags" | grep -o 'tvg-country="[^"]*' | cut -d '"' -f2)"
get_language="$(echo "$tags" | grep -o 'tvg-language="[^"]*' | cut -d '"' -f2)"
echo -e "chno:\n $get_chno"
echo -e "id:\n $get_id"
echo -e "logo:\n $get_logo"
echo -e "grp 1:\n $get_grp_title"
echo -e "title:\n $get_title"
echo -e "name:\n $get_name"
echo -e "country:\n $get_country"
echo -e "lang:\n $get_language"
I would like to store these variables in a json file.
This json will be used to rebuild another playlist.
#EXTM3U
#EXTINF:-1 tvg-id="TFX.fr" tvg-name="TFX" tvg-country="FR;AD;BE;LU;MC;CH" tvg-language="French" tvg-logo="http://www.exemple.com/image.jpg" group-title="",TFX (720p)
https://tfx-hls-live-ssl.tf1.fr/tfx/1/hls/live_2328.m3u8
#EXTINF:-1 tvg-id="TFX.fr" tvg-name="TFX" tvg-country="FR;AD;BE;LU;MC;CH" tvg-language="French" tvg-logo="http://127.0.0.1/img/image.jpg" group-title="",TFX (local)
http://127.0.0.1:1234/tfx/live.m3u8
The JSON file contains multiple arrays and multiple objects, like this:
{
"Channels": [
{
"name": "TFX",
"old_name": "NT1",
"logo": "http://www.exemple.com/image.jpg",
"category": "Fiction",
"urls": {
"Official": [
{
"server_name": "TF1",
"IP_address": "8.8.8.8",
"url": "tfx-hls-live-ssl.tf1.fr",
"port": "",
"https_port": "443",
"path": "tfx/1/hls/",
"file_name": "live_2328",
"extension": ".m3u8",
"full_url": "https://tfx-hls-live-ssl.tf1.fr/tfx/1/hls/live_2328.m3u8"
}
],
"Xtream_Servers": [
{
"server_name": "local",
"user_name": "rickey",
"stream_id": "11",
"category_name": "Fiction",
"category_id": "12"
}
]
},
"languages": [
{
"code": "fr",
"name": "Français"
}
],
"countries": [
{
"code": "fr",
"name": "France"
},
{
"code": "be",
"name": "Belgium"
}
],
"tvg": {
"id": "TFX.fr",
"name": "TFX",
"url": ""
}
},
{
"name": "France 2",
"old_name": "",
"logo": "http://www.exemple.com/image.jpg",
"category": "Général",
"urls": {
"Official": [
{
"server_name": "France TV",
"IP_address": "8.8.8.8",
"url": "france2.fr",
"port": "",
"https_port": "443",
"path": "live/",
"file_name": "Playlist",
"extension": ".m3u8",
"full_url": "https://france2.fr/live/Playlist.m3u8"
}
],
"Xtream_Servers": [
{
"server_name": "localhost",
"user_name": "rickey",
"stream_id": "2",
"category_name": "Général",
"category_id": "10"
}
]
},
"languages": [
{
"code": "fr",
"name": "Français"
}
],
"countries": [
{
"code": "fr",
"name": "France"
},
{
"code": "be",
"name": "Belgique"
}
],
"tvg": {
"id": "France2.fr",
"name": "France 2",
"url": ""
}
},
{
"name": "M6",
"old_name": "",
"logo": "http://www.exemple.com/image.jpg",
"category": "Général",
"urls": {
"Official": [
{
"server_name": "6Play",
"IP_address": "8.8.8.8",
"url": "6play.fr",
"port": "",
"https_port": "443",
"path": "live/",
"file_name": "Playlist",
"extension": ".m3u8",
"full_url": "https://6play.fr/M6/live/Playlist.m3u8"
}
],
"Xtream_Servers": [
{
"server_name": "localhost",
"user_name": "rickey",
"stream_id": "6",
"category_name": "Général",
"category_id": "10"
}
]
},
"languages": [
{
"code": "fr",
"name": "Français"
}
],
"countries": [
{
"code": "fr",
"name": "France"
},
{
"code": "be",
"name": "Belgique"
}
],
"tvg": {
"id": "France2.fr",
"name": "France 2",
"url": ""
}
}
],
"Third_Party": {
"Xtream_Servers": [
{
"server_name": "local",
"url": "192.168.1.100",
"port": "8080",
"https_port": "8082",
"server_protocol": "http",
"rtmp_port": "12345",
"Users_list": [
{
"username": "rickey",
"password": "azerty01",
"created_at": "",
"exp_date": "",
"is_trial": "0",
"last_check": "",
"max_connections": "3",
"allowed_output_formats": [
"m3u8",
"ts",
"rtmp"
]
}
]
},
{
"server_name": "localhost",
"url": "127.0.0.1",
"port": "8080",
"https_port": "8082",
"server_protocol": "http",
"rtmp_port": "12345",
"Users_list": [
{
"username": "rickey123",
"password": "azerty321",
"created_at": "",
"exp_date": "",
"is_trial": "0",
"last_check": "",
"max_connections": "3",
"allowed_output_formats": [
"m3u8",
"ts",
"rtmp"
]
},
{
"username": "guest",
"password": "guest01",
"created_at": "",
"exp_date": "",
"is_trial": "1",
"last_check": "",
"max_connections": "1",
"allowed_output_formats": [
"ts"
]
}
]
}
]
}
}
First question: Is it a crappy json?
To add to or modify this file, the script needs the index of the entry (I think; if you have any other ideas, I'm interested...)
cat File.json | jq '.Channels | to_entries[]'
output:
{
"key": 0,
"value": {
"name": "TFX",
"old_name": "NT1",
2nd question:
How do I get the key (0 in this case) of the entry whose "name" has a given value, so that I can store it in a variable afterwards (to avoid duplicates)?
key_="$(cat file.json | jq ????????? search="name": "$get_name" ???? .key)"
echo $key_
"0"
key_2="$(cat file.json | jq ????????? search="name": "$get_url" ???? .key)"
echo $key_2
"0"
if [[ $key_ == $key_2 ]]; then
Chan_Name="$(cat $1 | jq '.Channels[$key_].name)"
Echo $Chan_Name
"TFX"
jq '.[] ????? += {???? , ??? }' file.json | sponge file.json
fi
Last question (most important):
How do I find and modify these f*** objects when the script does not know any of the key values of the objects / arrays?!
I've been looking for 2 days, my brain is liquid.
Thank you. :)
Edit 1 :
I've found a partial solution to replace value:
{
"name": "TFX",
"old_name": "NT1",
"logo": "http://www.exemple.com/image.jpg",
"category": "Fiction",
with:
cat file.json | jq -C '(.Channels[] | select(.name=="TFX").category="test")'
output:
{
"name": "TFX",
"old_name": "NT1",
"logo": "http://www.exemple.com/image.jpg",
"category": "test",
"urls": {
but the enclosing {"Channels": [ is missing from the output. :/
jq -C '(.Channels[] | select(.name=="TFX").category="test")'
You were so close - just one misplaced parenthesis:
jq '(.Channels[] | select(.name=="TFX")) .category="test"'
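For comparison, here is a minimal Python sketch of the same two steps asked about in the question: finding the index of a channel by its name, and changing one field while keeping the whole document (including the enclosing "Channels" array). SAMPLE is a cut-down version of the question's file, and channel_index() and set_category() are names invented for this example.

```python
import json

# Cut-down version of the file from the question.
SAMPLE = """
{"Channels": [
  {"name": "TFX", "old_name": "NT1", "category": "Fiction"},
  {"name": "France 2", "old_name": "", "category": "Général"}
]}
"""


def channel_index(doc, name):
    """Return the position of the first channel called `name`, or None."""
    for index, channel in enumerate(doc["Channels"]):
        if channel.get("name") == name:
            return index
    return None


def set_category(doc, name, category):
    """Update every matching channel in place and return the whole document."""
    for channel in doc["Channels"]:
        if channel.get("name") == name:
            channel["category"] = category
    return doc


doc = json.loads(SAMPLE)
print(channel_index(doc, "TFX"))
updated = set_category(doc, "TFX", "test")
print(json.dumps(updated, ensure_ascii=False, indent=2))
```

The point is the same as with the moved parenthesis: the selection only decides which objects are touched, while the thing that gets returned is the full document.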
I see that jq can calculate addition as simply as jq 'map(.duration) | add' but I've got a more complex command and I can't figure out how to perform this add at the end of it.
I'm starting with data like this:
{
"object": "list",
"data": [
{
"id": "in_1HW85aFGUwFHXzvl8wJbW7V7",
"object": "invoice",
"account_country": "US",
"customer_name": "clientOne",
"date": 1601244686,
"livemode": true,
"metadata": {},
"paid": true,
"status": "paid",
"total": 49500
},
{
"id": "in_1HJlIZFGUwFHXzvlWqhegRkf",
"object": "invoice",
"account_country": "US",
"customer_name": "clientTwo",
"date": 1598297143,
"livemode": true,
"metadata": {},
"paid": true,
"status": "paid",
"total": 51000
},
{
"id": "in_1HJkg5FGUwFHXzvlYp2uC63C",
"object": "invoice",
"account_country": "US",
"customer_name": "clientThree",
"date": 1598294757,
"livemode": true,
"metadata": {},
"paid": true,
"status": "paid",
"total": 57000
},
{
"id": "in_1H8B0pFGUwFHXzvlU6nrOm6I",
"object": "invoice",
"account_country": "US",
"customer_name": "clientThree",
"date": 1595536051,
"livemode": true,
"metadata": {},
"paid": true,
"status": "paid",
"total": 20000
}
],
"has_more": true,
"url": "/v1/invoices"
}
and my jq command looks like:
cat example-data.json |
jq -C '[.data[]
| {invoice_id: .id, client: .customer_name, date: .date | strftime("%Y-%m-%d"), amount: .total, status: .status}
| .amount = "$" + (.amount/100|tostring)]
| sort_by(.date)'
which nicely gives me output like:
[
{
"invoice_id": "in_1H8B0pFGUwFHXzvlU6nrOm6I",
"client": "clientThree",
"date": "2020-07-23",
"amount": "$200",
"status": "paid"
},
{
"invoice_id": "in_1HJlIZFGUwFHXzvlWqhegRkf",
"client": "clientTwo",
"date": "2020-08-24",
"amount": "$510",
"status": "paid"
},
{
"invoice_id": "in_1HJkg5FGUwFHXzvlYp2uC63C",
"client": "clientThree",
"date": "2020-08-24",
"amount": "$570",
"status": "paid"
},
{
"invoice_id": "in_1HW85aFGUwFHXzvl8wJbW7V7",
"client": "clientOne",
"date": "2020-09-27",
"amount": "$495",
"status": "paid"
}
]
and I want to add a sum/total at the end of that, something like Total: $1775, so that the entire output would look like this:
[
{
"invoice_id": "in_1H8B0pFGUwFHXzvlU6nrOm6I",
"client": "clientThree",
"date": "2020-07-23",
"amount": "$200",
"status": "paid"
},
{
"invoice_id": "in_1HJlIZFGUwFHXzvlWqhegRkf",
"client": "clientTwo",
"date": "2020-08-24",
"amount": "$510",
"status": "paid"
},
{
"invoice_id": "in_1HJkg5FGUwFHXzvlYp2uC63C",
"client": "clientThree",
"date": "2020-08-24",
"amount": "$570",
"status": "paid"
},
{
"invoice_id": "in_1HW85aFGUwFHXzvl8wJbW7V7",
"client": "clientOne",
"date": "2020-09-27",
"amount": "$495",
"status": "paid"
}
]
Total: $1775
Is there a neat/tidy way to enhance this jq command to achieve this?
Or even, since I'm invoking this in a shell script, a dirty/ugly way with bash?
If any of your output is going to be raw, you need to pass -r; it'll just be ignored for data items that aren't strings.
Anyhow -- if you write (expr1, expr2), then your input will be passed through both expressions. Thus:
jq -Cr '
([.data[]
| {invoice_id: .id,
client: .customer_name,
date: .date | strftime("%Y-%m-%d"),
amount: .total,
status: .status}
| .amount = "$" + (.amount/100|tostring)
] | sort_by(.date)),
"Total: $\([.data[] | .total] | add | . / 100)"
'
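Since the question also allows for doing this outside jq, here is a rough Python sketch of the same report. It assumes the input has the shape shown in the question (SAMPLE keeps only the fields that are used); todollar() and report() are names chosen for this example, and dates are formatted in UTC, as jq's strftime does.

```python
import json
from datetime import datetime, timezone

# The four invoices from the question, reduced to the fields that are used.
SAMPLE = """
{"object": "list", "data": [
  {"id": "in_1HW85aFGUwFHXzvl8wJbW7V7", "customer_name": "clientOne", "date": 1601244686, "status": "paid", "total": 49500},
  {"id": "in_1HJlIZFGUwFHXzvlWqhegRkf", "customer_name": "clientTwo", "date": 1598297143, "status": "paid", "total": 51000},
  {"id": "in_1HJkg5FGUwFHXzvlYp2uC63C", "customer_name": "clientThree", "date": 1598294757, "status": "paid", "total": 57000},
  {"id": "in_1H8B0pFGUwFHXzvlU6nrOm6I", "customer_name": "clientThree", "date": 1595536051, "status": "paid", "total": 20000}
]}
"""


def todollar(amount):
    """Format "$495" for whole amounts and e.g. "$495.5" otherwise."""
    if amount == int(amount):
        return "$%d" % amount
    return "$%s" % amount


def report(doc):
    """Return (rows sorted by date, total line) for an invoice list."""
    rows = [
        {
            "invoice_id": inv["id"],
            "client": inv["customer_name"],
            "date": datetime.fromtimestamp(inv["date"], tz=timezone.utc).strftime("%Y-%m-%d"),
            "amount": todollar(inv["total"] / 100),
            "status": inv["status"],
        }
        for inv in doc["data"]
    ]
    rows.sort(key=lambda row: row["date"])  # stable sort, like sort_by(.date)
    total_cents = sum(inv["total"] for inv in doc["data"])
    return rows, "Total: " + todollar(total_cents / 100)


rows, total_line = report(json.loads(SAMPLE))
print(json.dumps(rows, indent=2))
print(total_line)
```

Summing the raw cents before converting avoids having to parse the "$..." strings back into numbers.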
In case you decide after all to emit valid JSON, here is a modular answer to the question that makes it easy to formulate alternative approaches, and which postpones the conversion of .amount to dollars for efficiency:
def todollar:
"$" + tostring;
def json:
[.data[]
| {invoice_id: .id,
client: .customer_name,
date: .date | strftime("%Y-%m-%d"),
amount: (.total/100),
status: .status} ]
| sort_by(.date) ;
json
| map_values(.amount |= todollar),
"Total: " + (map(.amount) | add | todollar)
As noted elsewhere, you will probably want to use the -r command-line option.
I'm currently downloading a ton of jira issues to generate a report. Currently the 'full data' file has a ton of individual records like this:
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
However, because I'm downloading multiple versions and appending to the same file I end up with this kind of structure.
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
What I want to do is count the number of records with a specific date, and then do the same while looping from a start date to an end date, using jq.
But, I can't figure out how to:
Flatten the records so that they are one array not two
Strip the T09:01:23.000+0200 from the closedDate value
Count the number of objects with a specific date value such as 2014-10-13
You have multiple independent inputs. To combine them in any meaningful way, you'll have to slurp the input (the -s option), which presents all the inputs together as one array of inputs. You can then merge them into a single array by adding them.
Since the dates are all in a certain fixed format, you can take substrings of the dates.
"2014-10-13T09:01:23.000+0200"[:10] -> "2014-10-13"
Given that, you can then filter by the date you want and count using the length filter.
add | map(select(.closedDate[:10]=="2014-10-13")) | length
e.g.,
$ cat input.json
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
$ jq -s 'add | map(select(.closedDate[:10]=="2014-10-13")) | length' input.json
1
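The same three steps (read several top-level arrays from one file, flatten them, count by date prefix) can be sketched in Python as well. This is an illustration under the assumption that the file looks like the sample above; SAMPLE keeps only two fields per record, and read_concatenated() and count_closed_on() are names made up for this example.

```python
import json
from collections import Counter

# Two top-level arrays in a row, as in the appended download file.
SAMPLE = """
[
  {"key": "645", "closedDate": "2014-10-03T09:01:23.000+0200"}
]
[
  {"key": "552", "closedDate": "2014-10-13T09:01:23.000+0200"}
]
"""


def read_concatenated(text):
    """Yield each top-level JSON value from a string that holds several in a row."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def count_closed_on(records, day):
    """Count records whose closedDate starts with the given YYYY-MM-DD day."""
    return sum(1 for rec in records if rec["closedDate"][:10] == day)


# Flatten the arrays into one list, like `jq -s add`.
records = [rec for chunk in read_concatenated(SAMPLE) for rec in chunk]
print(count_closed_on(records, "2014-10-13"))

# One pass over the data gives the counts for every day at once.
per_day = Counter(rec["closedDate"][:10] for rec in records)
print(per_day)
```

For a date range, looking each day up in per_day is cheaper than re-scanning the records once per day; a Counter returns 0 for days with no closed issues.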
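For questions 1 and 2:
$ echo -e "[\n$(sed '/^[][]$/d;/closedDate/s/\(T[^"]*\)//g' json)\n]" > flat-json
Note that this wraps all the objects in one pair of brackets but does not insert commas between them, so flat-json is fine for grep but is not strictly valid JSON.
To count the records for a specific day:
$ grep "closedDate" flat-json | grep -c "2014-10-13"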
I have a json file of the following format:
[
{
"organization": "ABC",
"type": "School",
"contact": "Joe Schmo",
"contact_title": "Principal",
"mailing_address": "123 Main Street, Anytown, MA",
"phone": "214-555-5430",
"fax": "214-555-5444"
},
{
"organization": "XYZ",
"type": "School",
"contact": "John Doe",
"contact_title": "Asst Principal",
"mailing_address": "123 Main Street, Anycity, TX",
"phone": "512-555-5430",
"fax": "512-555-5444"
},
.
.
.
.
]
I want to duplicate the line starting with "organization" and then add it back to the file twice after replacing "organization" with "company" and "long name". I want to keep the original line too.
The output I want is:
[
{
"organization": "ABC",
"company": "ABC",
"long name": "ABC",
"type": "School",
"contact": "Joe Schmo",
"contact_title": "Principal",
"mailing_address": "123 Main Street, Anytown, MA",
"phone": "214-555-5430",
"fax": "214-555-5444"
},
{
"organization": "XYZ",
"company": "XYZ",
"long name": "XYZ",
"type": "School",
"contact": "John Doe",
"contact_title": "Asst Principal",
"mailing_address": "123 Main Street, Anycity, TX",
"phone": "512-555-5430",
"fax": "512-555-5444"
},
.
.
.
.
]
awk or sed solutions preferred.
Here is one way:
sed '/organization/p;s/organization/company/p;s/company/long name/' file
Here is another:
awk '$1~/organization/{print $0;sub(/organization/,"company");print $0;sub(/company/,"long name")}1' file
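The question prefers awk or sed, but for completeness here is a small Python sketch that does the same thing on the parsed JSON rather than on text lines, so it does not depend on how the file is laid out. SAMPLE is a shortened copy of the question's data and add_aliases() is a name invented for this example.

```python
import json

# Shortened copy of the input from the question.
SAMPLE = """
[
  {"organization": "ABC", "type": "School", "contact": "Joe Schmo"},
  {"organization": "XYZ", "type": "School", "contact": "John Doe"}
]
"""


def add_aliases(record, source="organization", aliases=("company", "long name")):
    """Return a copy of record with the alias keys inserted right after source."""
    out = {}
    for key, value in record.items():
        out[key] = value
        if key == source:
            # Insert the copies immediately after the original key.
            for alias in aliases:
                out[alias] = value
    return out


records = [add_aliases(rec) for rec in json.loads(SAMPLE)]
print(json.dumps(records, indent=2))
```

Because Python dicts keep insertion order, the new keys land directly under "organization", matching the desired output.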