I have 52 JSON files (r$i.json), each containing 25 results (0 to 24). I'd like to create a JSON file with a special name for each of these results. The name would be composed from the content of each result: YYYYMMDDHHMMSS_company_jobtitle.json
The command generating the names works fine:
#!/bin/bash
for ((j=0;j<=24;j++))
do
datein=$(jq <"r1.json" ".results[$j].date" | sed 's/"//g')
dateout=$(date -d "${datein}" +"%Y%m%d%H%M%S")
company=$(jq <"r1.json" ".results[$j].company" | sed 's/,//g;s/"//g;s/ //g')
job=$(jq <"r1.json" ".results[$j].jobtitle" | sed 's/,//g;s/"//g;s/ //g')
jq <"r1.json" ".results[$j]" > ${dateout}_${company}_${job}.json
done
Now when I replace r1 with r$i and add ((i=1;i<=52;j++)), it doesn't work... So I guess my problem comes from the nested loop syntax in jq...
r1.json looks like this:
{
  "radius" : 25,
  "totalResults" : 1329,
  "results" : [
    {
      "jobtitle" : "job1",
      "company" : "company1",
      "date" : "Sun, 01 Sep 2015 07:59:58 GMT"
    },
    {
      "jobtitle" : "job2",
      "company" : "company2",
      "date" : "Sun, 02 Sep 2015 07:59:58 GMT"
    },
    ...
    {
      "jobtitle" : "job25",
      "company" : "company25",
      "date" : "Sun, 25 Sep 2015 07:59:58 GMT"
    }
  ]
}
You should respect the bash syntax in your for loops:
for (( i=0; i<5; i++ ))
((i=1,i<=52,j++)) won't work; use ; instead of ,.
1) You wrote that your i-loop used ((i=1;i<=52;j++)); that should be ((i=1; i<=52; i++)).
2) We can't see exactly what you did with respect to r1 and r$i, so if (1) doesn't resolve your difficulty, maybe you should double-check that what you did is actually what is needed. Should you change "> $outputname" to ">> $outputname"?
3) I suspect that rather than using s/"//g, it might be better to use the -r option of jq; you might also consider avoiding sed altogether (jq 1.5 has sub and gsub functions). See the sketch after this list.
4) As I said, it would be better to get rid of all the backticks.
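To illustrate point (3), here is a minimal sketch of the question's inner loop rewritten that way; it assumes jq 1.5+ (for gsub) and the .results[].date/.company/.jobtitle layout shown in the question:
#!/bin/bash
# Sketch only: -r makes jq emit raw strings (no surrounding quotes to strip),
# and gsub("[, ]"; "") removes commas and spaces in one pass, so sed is not needed.
for ((j=0; j<=24; j++)); do
datein=$(jq -r ".results[$j].date" r1.json)
dateout=$(date -d "$datein" +"%Y%m%d%H%M%S")
company=$(jq -r ".results[$j].company | gsub(\"[, ]\"; \"\")" r1.json)
job=$(jq -r ".results[$j].jobtitle | gsub(\"[, ]\"; \"\")" r1.json)
jq ".results[$j]" r1.json > "${dateout}_${company}_${job}.json"
done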
Finally I found the solution, and my issue didn't come from jq but from the syntax I was using for nested loops... Here it is:
for ((i=1;i<=kdr;i++))   # kdr = the number of r$i.json files (52 here)
do
for ((j=0;j<=24;j++))
do
datein=$(jq <"r$i.json" ".results[$j].date" | sed 's/"//g')
dateout=$(date -d "${datein}" +"%Y%m%d%H%M%S")
company=$(jq <"r$i.json" ".results[$j].company" | sed 's/,//g;s/"//g;s/ //g')
job=$(jq <"r$i.json" ".results[$j].jobtitle" | sed 's/,//g;s/"//g;s/ //g')
jq <"r$i.json" ".results[$j]" > ${dateout}_${company}_${job}.json
done
done
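For reference, here is an alternative sketch that lets jq do the inner iteration instead of a bash counter; it assumes jq 1.5+ and GNU date:
# Sketch only: jq -c emits one compact JSON object per line,
# and each object is re-queried to build its filename.
for ((i=1; i<=52; i++)); do
jq -c '.results[]' "r$i.json" | while IFS= read -r obj; do
datein=$(jq -r '.date' <<< "$obj")
company=$(jq -r '.company | gsub("[, ]"; "")' <<< "$obj")
job=$(jq -r '.jobtitle | gsub("[, ]"; "")' <<< "$obj")
dateout=$(date -d "$datein" +"%Y%m%d%H%M%S")
printf '%s\n' "$obj" > "${dateout}_${company}_${job}.json"
done
done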
I was trying to solve one of my old assignments. I am literally stuck on this one. Can anyone help me?
There is a file called "datafile". This file has names of some friends and their ages. But unfortunately, the names are not in the correct format. They should be
lastname,firstname
but by mistake they are
firstname,lastname
The task is to write a shell script called fix_datafile to correct the problem and sort the names alphabetically. The corrected file is called datafile.fix.
Please make sure the original structure of the file is kept untouched.
The following is a sample of the datafile.fix file:
#personal information
#******** Name ********* ***** age *****
Alexanderovich,Franklin 47
Amber,Christine 54
Applesum,Franky 33
Attaboal,Arman 18
Balad,George 38
Balad,Sam 19
Balsamic,Shery 22
Bojack,Steven 33
Chantell,Alex 60
Doyle,Jefry 45
Farland,Pamela 40
Handerman,jimmy 23
Kashman,Jenifer 25
Kasting,Ellen 33
Lorux,Allen 29
Mathis,Johny 26
Maxter,Jefry 31
Newton,Gerisha 40
Osama,Franklin 33
Osana,Gabriel 61
Oxnard,George 20
Palomar,Frank 24
Plomer,Susan 29
Poolank,John 31
Rochester,Benjami 40
Stanock,Verona 38
Tenesik,Gabriel 29
Whelsh,Elsa 21
If you can use awk (I suppose you can), then here's a script which does what you need:
#!/bin/bash
RESULT_FILE_NAME="datafile.fix"
head -4 datafile > "$RESULT_FILE_NAME"
tail -n +5 datafile | awk -F"[, ]" '{if(!$2){print ""}else{print($2","$1, $3)}}' >> "$RESULT_FILE_NAME"
Passing -F"[, ]" allows awk to split columns both by , and space and all that remains is just print columns in a needed format. The downsides are that we should use if statement to preserve empty lines and file header also should be treated separately.
Another option is using sed:
sed -E 's/([a-zA-Z]+),([a-zA-Z]+) ([0-9]+)/\2,\1 \3/g' datafile > datafile.fix
The downside is that it requires a regex that is not as obvious as the awk syntax.
awk -F'[, ]' '
!/^$/ && !/^#/ {
first=$1;
last=$2;
map[last][first]=last","first" "$3
}
END {
PROCINFO["sorted_in"]="#ind_str_asc";
for (i in map) {
for (j in map[i])
{
print map[i][j]
}
}
}' datafile > datafile.fix
One liner:
awk -F'[, ]' '!/^$/ && !/^#/ { first=$1; last=$2; map[last][first]=last","first" "$3 } END { PROCINFO["sorted_in"]="#ind_str_asc"; for (i in map) { for (j in map[i]) { print map[i][j] } } }' datafile > datafile.fix
A solution completely in gawk.
Set the field separator to both , and space. Then ignore any lines that are empty or start with #. Capture the first and last names from the delimited fields and create a two-dimensional array called map, indexed by last and then first name, with the value being the corrected lastname,firstname line. At the end, set the sort order to string-ascending indices and loop through the array, printing the names in order as requested.
Completely in bash:
re="^[[:space:]]*([^#]([[:space:]]|[[:alpha:]])+),(([[:space:]]|[[:alpha:]])*[[:alpha:]]) *([[:digit:]]+)"
while IFS= read -r line
do
if [[ ${line} =~ $re ]]
then
echo "${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}"
else
echo "${line}"
fi
done < datafile > datafile.fix
The core of this is to capture, using bash regex matching (the =~ operator of the [[ command), parenthesized groupings, and the BASH_REMATCH array: the name before the comma (([^#]([[:space:]]|[[:alpha:]])+)), the name after the comma ((([[:space:]]|[[:alpha:]])*[[:alpha:]])), and the age ( *([[:digit:]]+)). The first-name regex is constructed so as to exclude comments, and the last-name regex is constructed so as to handle multiple spaces before the age without including them in the name. Commented lines, with or without leading spaces (handled by ^[[:space:]]* plus the [^#] exclusion), and lines without a comma are passed through unchanged. Either first names or last names may have internal spaces. Once the last name and first name are isolated, it is easy to print them in reverse order followed by the age (echo "${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}"). Note that the letter/space groupings are themselves counted as matches, which is why we skip indices 2 and 4.
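Here is a tiny demo of that indexing, with re as defined above and a made-up sample line:
line="Franklin,Alexanderovich 47"
if [[ $line =~ $re ]]; then
echo "${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}"
fi
# prints: Alexanderovich,Franklin 47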
I have tried using awk and sed. Try if this works:
sed 's/ /,/g' datafile | awk -F "," '{print $2,$1,$3}' | sed 's/ /,/' | sed 's/^,//' | sort -u > datafile.fix
Let's assume $commandGetEvents is an array of JSON objects. I use the following command to extract the event ID, which is a number from 1 to 65, and store it in currentEventId. Now let's assume I have another variable called startedEventId which holds the value I'm looking for, which is 22.
Here's an example of the data $commandGetEvents contains.
[
{
"eventId": 22,
"Name" : "Bob"
"Activity" : "Eat Food"
"startedEventId" : 15
},
{
"eventId": 21,
"Name" : "Smith"
"Activity" : "Ride a bike"
"startedEventId" : 13
},
{
"eventId": 20,
"Name" : "Tony"
"Activity": "Print paper"
"startedEventId" : 10
},
]
eventId is the unique identifier of the JSON object, and startedEventId is the identifier of the JSON object that caused the current one to take place.
currentEventId=$(jq ".[$index].eventId" <<< ${commandGetEvents})
startedEventid=$(jq ".[${eventCounter}].startedEventId" <<< $commandGetEvents)
When I echo both statements in a while loop, I get the following output:
currentEventId = 1
startedEventId = 22
currentEventId = 2
startedEventId = 22
currentEventId = 3
startedEventId = 22
The while loop continues until all elements of currentEventId are exhausted.
My problem is when I compare both statements like this:
if [[ ${startedId} -eq ${currentEventId} ]] ;
then
echo "Equal"
fi
I get the following error message:
line 90: [[: 22: syntax error: operand expected (error token is "22")
The provided "JSON" is invalid as JSON. Please fix it.
When using jq at the bash command line, it's almost always best to enclose the jq program in single quotation marks; bash shell variables can be passed in using --arg or --argjson. Consider for example the following snippet, which assumes commandGetEvents is valid JSON along the lines suggested in the question:
index=0
currentEventId=$(jq --argjson index $index '.[$index].eventId' <<< ${commandGetEvents})
echo index=$index
echo currentEventId=$currentEventId
The part of the question involving eventCounter is somewhat obscure, but it looks like the example given immediately above will serve as a guide.
Rather than using bash constructs to iterate through the JSON array, it would almost certainly be better to use jq's support for iteration and selection. For example:
jq '.[] | select(.eventId == 22)' <<< ${commandGetEvents}
yields:
{
"eventId": 22,
"Name": "Bob",
"Activity": "Eat Food",
"startedEventId": 15
}
So if you just want the startedEventId value (or values) corresponding to .eventId == 22, you could write:
jq '.[] | select(.eventId == 22) | .startedEventId' <<< ${commandGetEvents}
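And to make the bash comparison from the question work, here is a small sketch combining the ideas above (the target variable is made up for illustration; -r makes jq emit a bare number, which is what [[ ... -eq ... ]] needs):
target=22   # hypothetical: the eventId we are looking for
startedEventId=$(jq -r --argjson id "$target" '.[] | select(.eventId == $id) | .startedEventId' <<< "$commandGetEvents")
if [[ $startedEventId -eq $currentEventId ]]; then
echo "Equal"
fi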
I have the following line:
for custom_field in $(jq -r '.issues[] | .fields.created' 1.json); do
echo $custom_field
done
Output:
2018-03-06T21:24:41.000+0000
2018-03-06T22:48:47.000+0000
How do I compare the current datetime with each output, and print "old" if it's older than 3 hours?
Given input like
{ "issues":
[{"id": 1, "fields": {"created": "2018-03-06T21:24:41.000+0000"}},
{"id": 2, "fields": {"created": "2018-03-06T22:48:47.000+0000"}},
{"id": 3, "fields": {"created": "2018-03-09T22:48:47.000+0000"}}]}
you can use the built-in date manipulation functions to print the old records with something like
jq -r '(now-3600*3) as $when | .issues[] |
select(.fields.created | strptime("%Y-%m-%dT%H:%M:%S.000+0000") | mktime < $when) |
[.id, .fields.created, "old"]' 1.json
where the last line probably needs tweaking to produce exactly the output you want.
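For instance, one possible tweak, a sketch that prints tab-separated raw values instead of JSON arrays:
jq -r '(now-3600*3) as $when | .issues[] |
select(.fields.created | strptime("%Y-%m-%dT%H:%M:%S.000+0000") | mktime < $when) |
"\(.id)\t\(.fields.created)\told"' 1.json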
It is much easier to convert to seconds first and then compare against a moment three hours in the past.
The example below converts to epoch seconds and prints old when the timestamp is older than that.
date_in_seconds=$(date -d "$custom_field" +"%s")
three_hours_ago=$(date -d '3 hours ago' +"%s")
[ "$date_in_seconds" -lt "$three_hours_ago" ] && echo old
For non-GNU versions of the date command you can use the following:
date_in_seconds=$(date -j -f '%Y-%m-%d %H:%M:%S' "2016-02-22 20:22:14" '+%s')
Keep in mind that 32-bit Unix time will roll over on 19 Jan 2038.
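Putting the two answers together, here is a sketch of the full loop (assuming GNU date can parse the timestamp format shown in the question):
# Sketch only: compare each created timestamp against "now minus 3 hours".
now_minus_3h=$(date -d '3 hours ago' +"%s")
for custom_field in $(jq -r '.issues[] | .fields.created' 1.json); do
date_in_seconds=$(date -d "$custom_field" +"%s")
[ "$date_in_seconds" -lt "$now_minus_3h" ] && echo "$custom_field old"
done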
I feel this is probably something that is fairly easy to do. I have googled it and searched the questions here, but I can't find the answer. Maybe this is because I am not asking the right question.
I want to take the numerical value of a variable from a netcdf file and use it in a mathematical operation in my bash script. I have tried:
a=5*"./myfile.nc"
a=5*./myfile.nc
echo $a
I am getting, in both cases:
./myfile.nc: Permission denied
1*/home/cohara/RainfallData/rainfall_increase/5%over10years.nc
ncdump prints all the information contained in the file to the terminal, but how do I select the variable I want for use in the script?
Here is the ncdump output. It is the value of var61 that I want to extract:
{
dimensions:
lon = 1 ;
lat = 1 ;
height = 1 ;
time = UNLIMITED ; // (1 currently)
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
double height(height) ;
height:standard_name = "height" ;
height:long_name = "height" ;
height:units = "m" ;
height:positive = "up" ;
height:axis = "Z" ;
double time(time) ;
time:standard_name = "time" ;
time:units = "year as %Y.%f" ;
time:calendar = "proleptic_gregorian" ;
float var61(time, height, lat, lon) ;
var61:table = 1 ;
var61:_FillValue = -9.e+33f ;
// global attributes:
:CDI = "Climate Data Interface version 1.5.5 (http://code.zmaw.de/projects/cdi)" ;
:Conventions = "CF-1.0" ;
:history = "Tue Apr 08 12:40:41 2014: cdo divc,10 /home/cohara/RainfallData/rainfall_increase/historical_5%.nc /home/cohara/RainfallData/rainfall_increase/5%in10parts.nc\n",
"Tue Apr 08 12:40:41 2014: cdo mulc,5 /home/cohara/RainfallData/rainfall_increase/historical_1%.nc /home/cohara/RainfallData/rainfall_increase/historical_5%.nc\n",
"Tue Apr 08 12:40:41 2014: cdo divc,100 /home/cohara/RainfallData/rainfall_increase/historical_mean.nc /home/cohara/RainfallData/rainfall_increase/historical_1%.nc\n",
"Tue Apr 08 12:40:41 2014: cdo timmean /home/cohara/RainfallData/rainfall_increase/historical.nc /home/cohara/RainfallData/rainfall_increase/historical_mean.nc\n",
"Fri Mar 21 10:16:32 2014: cdo splityear ./1981-2076.nc ./1981-2076_\n",
"Tue Mar 04 14:18:04 2014: cdo settaxis,1981-01-01,00:00:00,1year ./RainfallData/timsum_1981.nc ./RainfallData/settaxis.nc\n",
"Tue Mar 04 13:55:00 2014: cdo timsum /home/cohara/RainfallData/fldmean_1981.nc /home/cohara/RainfallData/timsum_1981.nc\n",
"Tue Mar 04 13:45:48 2014: cdo fldmean /home/cohara/RainfallData/corrected_precip_1981.nc /home/cohara/RainfallData/fldmean_1981.nc\n",
"Tue Mar 04 13:43:26 2014: cdo divc,10 /home/cohara/RainfallData/PRECIP_1981.nc /home/cohara/RainfallData/corrected_precip_1981.nc\n",
"Mon Jul 15 13:16:49 2013: cdo cat WD2_1981m10grid.csv.nc WD2_1981m11grid.csv.nc WD2_1981m12grid.csv.nc WD2_1981m1grid.csv.nc WD2_1981m2grid.csv.nc WD2_1981m3grid.csv.nc WD2_1981m4grid.csv.nc WD2_1981m5grid.csv.nc WD2_1981m6grid.csv.nc WD2_1981m7grid.csv.nc WD2_1981m8grid.csv.nc WD2_1981m9grid.csv.nc PRECIP_1981.nc\n",
"Fri Jul 12 09:11:31 2013: cdo -f nc copy WD2_1981m10grid.csv.grb WD2_1981m10grid.csv.nc" ;
:CDO = "Climate Data Operators version 1.5.5 (http://code.zmaw.de/projects/cdo)" ;
data:
lon = 0 ;
lat = 0 ;
height = 0 ;
time = 2012 ;
var61 =
5.293939 ;
}
Thanks in advance for any help.
Ciara
Try this:
a=$(ncdump myfile.nc | grep "var61:_FillValue" | sed -e "s/.*= //;s/ .*//")
explanation:
the pipe | sends the output from one program to the input of the other.
grep deletes every line that does not match the given string
sed does some regexp magic; it has two parts separated by ";"
the first part matches everything up to "= " and replaces it with nothing
the second part matches everything from the next " " onward and replaces it with nothing
I don't have a proper netcdf file to test right now, but it might become easier using something like ncdump -v var61 myfile.nc.
Edit:
If you want the answer to be 5.293939, use
a=$(ncdump myfile.nc |sed -z -e "s/.* var61 =\n //;s/ .*//")
I think I searched for the wrong value above.
Alternative:
a=$(ncdump myfile.nc |awk '/var61 =/ {nextline=NR+1}{if(NR==nextline){print $1}}')
It works like this:
statement1:
/var61 =/ searches for the string between the slashes
NR contains the line number. nextline is set to the next line number
statement2
when NR equals nextline, then
print the first word in the line
I usually use the -v option on ncdump in case the file is very large, as this is faster.
This should pick up the value of the variable:
myvar=$(ncdump -v var61 tm.nc | tail -n 2 | head -n 1 | awk '{print $1;}')
I just use head and tail to chop off the netcdf file header and end as I find it easier to understand and remember!
Note this will only work with a variable that has a single value though, (which seems to be the case in your question and example file), since the tail command is picking up the last two lines. If you want to pick up the first entry of a variable array then this will require modification.
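If you do need the first entry of an array-valued variable, here is a hedged sketch of one such modification (it assumes ncdump lists the values comma-separated after the "var61 =" line in the data section):
# Sketch only: read the line after "var61 =", strip commas/semicolons,
# and print the first value.
first=$(ncdump -v var61 tm.nc | awk '/var61 =/ {getline; gsub(/[,;]/,""); print $1; exit}')
echo "$first"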
I don't know too much about bash scripting and I'm trying to develop a bash script to do these operations:
I have a lot of .txt files in the same directory.
Every .txt file follows this structure:
file1.txt:
<name>first operation</name>
<operation>21</operation>
<StartTime>1292435633</StartTime>
<EndTime>1292435640</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>1292435646</StartTime>
<EndTime>1292435650</EndTime>
I want to search every <StartTime> line and convert it to a standard date/time format (not a unix timestamp) while preserving the structure, e.g. <StartTime>2010-12-15 22:52</StartTime>. Could this be a search/replace job for sed? I think I could use this function that I found: date --utc --date "1970-01-01 $1 sec" "+%Y-%m-%d %T"
I want to do the same with the <EndTime> tag.
I should do this for all *.txt files in a directory.
I tried using sed but got unwanted results. As I said, I don't know much about bash scripting, so any help would be appreciated.
Thank you for your help!
Regards
sed is incapable of doing date conversions; instead I would recommend a more appropriate tool like awk (note that strftime below is a GNU awk function):
echo '<StartTime>1292435633</StartTime>' | awk '{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
}
{print}'
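For the sample line above, this should print something like:
<StartTime>2010-12-15 17:53:53</StartTime>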
If your input files have one tag per line, as in your structure example, it should work flawlessly.
If you need to repeat the operation for every .txt file, just use a shell for loop:
for file in *.txt; do
awk '/^<[^>]*Time>/{
match($0,/[0-9]+/);
t = strftime("%F %T",substr($0,RSTART,RLENGTH),1);
sub(/[0-9]+/,t)
} 1' "$file" >"$file.new"
# mv "$file.new" "$file"
done
In comparison to the previous code, I have made two minor changes:
added the condition /^<[^>]*Time>/ that checks whether the current line starts with <StartTime> or <EndTime>
converted {print} to the shorter 1
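As a quick aside, the bare 1 works because awk prints the current line whenever a pattern is true and has no action attached; for example:
echo hello | awk '1'
# prints: hello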
If the files ending with .new contain the result you were expecting, you can uncomment the line containing mv.
Using grep:
while IFS= read -r line; do
if [[ $line == *"<StartTime>"* || $line == *"<EndTime>"* ]]; then
n=$(echo "$line" | grep -Po '(?<=(>)).*(?=<)')
line=${line/$n/$(date -d "@$n")}
fi
echo "$line" >> file1.new.txt
done < file1.txt
$ cat file1.new.txt
<name>first operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:53:53 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:00 CET 2010</EndTime>
<name>second operation</name>
<operation>21</operation>
<StartTime>Wed Dec 15 18:54:06 CET 2010</StartTime>
<EndTime>Wed Dec 15 18:54:10 CET 2010</EndTime>