NetCDF time dimension missing (CDO)

I downloaded a NetCDF file containing 4-5 variables, but it has only two dimensions (lat and lon).
Time is missing, which means I cannot merge timesteps or do anything useful.
Is there any way to fix this, ideally using CDO?
There are 100 NetCDF files (all without a time dimension) and I want to merge them along a time dimension.

Let me expand a bit on the answer by @Ales:
ncecat is a command-line tool in the NCO package that concatenates multiple NetCDF files lacking a record (often "time") dimension. The concatenated file gets a new record dimension, which is named 'record' by default.
Say you have a list of NetCDF files such as A.nc, B.nc, and C.nc, all with dimensions lat x lon. You can concatenate them with ncecat A.nc B.nc C.nc -O new_file.nc. The resulting new_file.nc will have dimensions record x lat x lon, where record has a size of 3 (the number of files you concatenated). The -O option is not strictly necessary; it just overwrites new_file.nc (or whatever you want to call the output) if it already exists.

You can use the command:
ncecat -O -u time in.nc out.nc
to create a new record dimension named time from scratch.
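Putting that together for the 100 files in the question, a minimal sketch could look like this; the input file pattern and the start date/step passed to CDO's settaxis are placeholders you would adapt to your data, and the second step is only needed if you also want actual time values attached to the new dimension:
ncecat -O -u time in_*.nc merged.nc
cdo settaxis,2000-01-01,00:00:00,1day merged.nc merged_with_time.nc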


How to write a for loop to run a regression on multiple files contained in the same folder?

I need to develop a script to run a simple OLS on multiple csv files stored in the same folder.
All have the same column names, and the regression will always be based upon the same columns ("x_var" and "y_var").
The code below is used to read in the csvs and name the resulting dataframes.
## Read in files from folder
file.List <- list.files(pattern = "*.csv")
for (i in 1:length(file.List)) {
  assign(gsub(".csv", "", file.List[i]), read.csv(file.List[i]))
}
However, after this (very initial!) stage I've got a bit lost.
Each dataframe has the same 7 columns: a, b, c, d, x_var, e, y_var.
I need to run a simple OLS using lm(x_var ~ y_var, data = dataframe) on each dataframe and plot the result, and I assumed a for loop would be the best option, but I am not too sure how to do so.
After each regression is run I want to extract the coefficients/R² etc. into a csv and save the plot separately.
I tried the below, but it has gone very wrong (and is not working at all):
list <- list(gsub(" finalSIRTAnalysis.csv","", file.List))
for(i in length(file.List))
{
lm(x_var ~ y_var, data = [i])
}
I can't even make a start on this and need some advice, if anyone has any good ideas (such as creating an external function first).
I am not sure whether the function lm can compute results from multiple data sources at once; try merging the data first. I have a similar issue: I have 5k files and it is computationally impossible to merge them all. But maybe this answer can help you:
https://stackoverflow.com/a/63770065/14744492
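For what it's worth, a minimal R sketch of the loop the question describes might look like the following. It assumes y_var is the response (swap the formula if x_var is), and the plot and summary file names are made up for illustration:
file.List <- list.files(pattern = "*.csv")
results <- data.frame()
for (f in file.List) {
  df  <- read.csv(f)
  fit <- lm(y_var ~ x_var, data = df)            # one OLS fit per file
  results <- rbind(results,
                   data.frame(file      = f,
                              intercept = coef(fit)[[1]],
                              slope     = coef(fit)[[2]],
                              r.squared = summary(fit)$r.squared))
  png(paste0(gsub(".csv", "", f), "_plot.png"))  # one plot per input file
  plot(df$x_var, df$y_var)
  abline(fit)
  dev.off()
}
write.csv(results, "regression_summary.csv", row.names = FALSE)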

Extracting data from text file in AMPL without adding indexes

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...
I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows, j in 1..ncols} := x[(i-1)*ncols + j];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.
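If you do go the preprocessing route, a rough Python sketch of the conversion might look like this (matrix.txt, matrix.dat and the parameter name m are made-up names; it assumes whitespace-separated values and skips the t=... header lines):
# convert_matrix.py - turn a whitespace-separated matrix into an AMPL param table
rows = []
with open("matrix.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("t="):   # skip blank lines and "t=1"-style headers
            continue
        rows.append(line.split())

ncols = len(rows[0])
with open("matrix.dat", "w") as out:
    # AMPL data table: "param m: <column indices> :=" followed by one row per line.
    out.write("param m: " + " ".join(str(j) for j in range(1, ncols + 1)) + " :=\n")
    for i, row in enumerate(rows, start=1):
        out.write(str(i) + " " + " ".join(row) + "\n")
    out.write(";\n")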

Access files inside folders, MATLAB on macOS

I have about 300 files I would like to access and import into MATLAB; each of these files sits in its own folder, so there are 300 folders.
The first file lies in the directory users/matt/Documents/folder_1 with the filename line.csv, the second in users/matt/Documents/folder_2 with the filename line.csv, and so on.
I would like to import the data from the 300 line.csv files into MATLAB so I can take the average value. Is this possible? I am using macOS, by the way.
I know what to do with each .csv file, but I have no clue how to access them all efficiently.
This should work. All we are doing is generating the path string for every file using sprintf and the loop index i, then reading each csv file with csvread and storing the data in a cell array.
data = cell(1, 300); % Preallocate the cell array.
for i = 1:300        % Loop 300 times.
    % Full path pointing to the csv file.
    file_path = sprintf('users/matt/Documents/folder_%d/line.csv', i);
    % Read data from csv and store it in a cell array.
    data{i} = csvread(file_path);
end
% Do your computations here.
% ...
Remember to replace the 300 by the actual number of folders that you have.
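To get the average the question asks about, one option (assuming every line.csv is purely numeric, so csvread returns a numeric matrix) is to pool all the values after the loop:
% Flatten each matrix to a column vector, stack them, and average.
all_values   = cellfun(@(x) x(:), data, 'UniformOutput', false);
overall_mean = mean(vertcat(all_values{:}));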

Increment Serial Number using EXIF

I am using ExifTool to change the camera body serial number to a unique serial number for each image in a group of several hundred images. The camera body serial number is being used as a second place to store the image's serial number, in addition to the IPTC field where it already lives, because it takes a little more effort to remove.
The serial number is in the format ###-###-####-####, where the last four digits are the part to increment. The first three groups of digits do not change within a batch; I only need to increment that last group of digits.
EXAMPLE
If I have 100 images in my first batch, they would be numbered:
811-010-5469-0001, 811-010-5469-0002, 811-010-5469-0003 ... 811-010-5469-0100
I can successfully drag a group of images onto my ExifTool Shortcut that has the values
exiftool(-SerialNumber='001-001-0001-0001')
and it will change the EXIF SerialNumber tag on the images, but I have not been able to work out what to add so that it increments for each image.
I have tried variations on the below without success:
exiftool(-SerialNumber+=001-001-0001-0001)
exiftool(-SerialNumber+='001-001-0001-0001')
I realize that ExifTool is most likely treating the first line as numbers being subtracted and the second line as a string. I have also tried:
exiftool(-SerialNumber+='1')
exiftool(-SerialNumber+=1)
just to see if I can even get it to increment with a basic, single digit number. This also has not worked.
Maybe this cannot be incremented this way and I need to use ExifTool from the command line. If so, I am still learning the command line/PowerShell (Windows), am weak in this area, and would appreciate some pointers on getting started if that is the route I need to take. I am not afraid to use the command line; I would just need a bit more hand-holding than normal for a starting point. I am also learning Linux and could do this project from there, but again I would need a bit more hand-holding to get it done.
I program in PHP, JavaScript and other languages, so code is not foreign to me; I just lack experience writing it for the command line.
If further clarification is needed, please let me know in the comments.
Your help and guidance are appreciated!
You'll probably have to go to the command line rather than rely upon drag and drop, as this command relies upon ExifTool's advanced formatting feature.
Exiftool "-SerialNumber<001-001-0001-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
If you want to be more general purpose and to use the original serial number in the file, you could use
Exiftool "-SerialNumber<${SerialNumber}-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
This will just add the file count to the end of the current serial number in the image, though if you have images from multiple cameras in the same directory, that could get messy.
As for using the command line, you just need to rename the ExifTool executable to remove the commands in the parentheses, and then either move it somewhere in your command line's path or use the full path to ExifTool.
As for clarification on your previous attempts, the += option is used with numbers and with lists. The SerialNumber tag is usually a string, though that could depend upon where it's being written to.
If I understand your question correctly, something like this should work:
1..100 | % {
    $sn = '811-010-5469-{0:D4}' -f $_
    # apply $sn
}
or like this (if you iterate over files):
$i = 1
Get-ChildItem 'C:\some\folder' -File | % {
    $sn = '811-010-5469-{0:D4}' -f $i
    # update EXIF data of current file with $sn
    $i++
}
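To tie the two answers together, here is a sketch of how the PowerShell loop could call ExifTool for each file; the folder path and file filter are placeholders, exiftool is assumed to be on the PATH, and -overwrite_original is optional (drop it if you want ExifTool to keep backup copies):
$i = 1
Get-ChildItem 'C:\some\folder' -Filter *.jpg -File | % {
    $sn = '811-010-5469-{0:D4}' -f $i
    # Write the serial number into the current file's EXIF data.
    & exiftool -overwrite_original "-SerialNumber=$sn" $_.FullName
    $i++
}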

How to fetch all records using NCBI Batch Entrez

I have over 200,000 accessions in a flat file, for which I need to retrieve the relevant entries from NCBI.
I use Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez) to do the job. But encountered several problems:
The initial file was split into multiple sub-files, each containing 4000 lines. But it seems Batch Entrez has some size limitation on the returned file. For example, if the first 1000 accessions each have tens of thousands of lines and together reach the size limit, then the remaining 3000 accessions will be rejected and won't be searched.
One possible solution is to split the file into even more sub-files and search them individually, but this requires too much manual effort.
So I am just wondering if there is any other solution, or any code could be used.
Thanks in advance
Your problem sounds like a good fit for a Bio* toolkit. This is a solution using BioSmalltalk:
| giList gbReader |
giList := (BioObject openFullFileNamed: 'd:\Batch_entrez_1.txt') contents lines.
gbReader := BioNCBIGenBankReader new.
gbReader
    genBankRecordsFrom: 'nuccore'
    format: #setModeXML
    uids: giList.
(BioGBSeqCollection newFromXMLCollection: gbReader searchResults)
    collect: [:e | BioParser
        tokenizeNcbiXmlBlast: e contents
        nodes: #('GBAuthor' 'GBSeq_definition')]
To execute/debug the script, just select it and a right-click will open the Smalltalk world-menu.
The API automatically splits and fetches your accession list (contained, in the script, in Batch_entrez_1.txt) while respecting the NCBI Entrez post limits to avoid penalties.
The result format is XML (an "easy" format to parse or to filter for specific fields), although it could be any of the retrieval modes supported by Entrez; for example, setting #setModeText will answer an ASN.1 representation. Replace 'nuccore' with the database you want to query. Finally, choose the fields you are interested in: in the script I have chosen 'GBAuthor' and 'GBSeq_definition', but you are free to pick any of the available nodes.
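If you would rather script it yourself, a rough sketch using Biopython's Entrez module could look like this; the e-mail address, batch size and file names are placeholders, and NCBI's usage limits on request rate still apply:
from Bio import Entrez  # Biopython

Entrez.email = "you@example.com"   # NCBI requires a contact address

# Read the accessions, one per line.
with open("Batch_entrez_1.txt") as f:
    accessions = [line.strip() for line in f if line.strip()]

batch_size = 200                   # fetch in modest chunks instead of all at once
with open("records.gb", "w") as out:
    for start in range(0, len(accessions), batch_size):
        chunk = accessions[start:start + batch_size]
        handle = Entrez.efetch(db="nuccore", id=",".join(chunk),
                               rettype="gb", retmode="text")
        out.write(handle.read())
        handle.close()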
