xarray appending or rewriting a existing nc file - url-rewriting

Can xarray append or rewrite an existing NetCDF file? I need to update my data (NetCDF format) every year. I want to rewrite or append an existing NetCDF file by using xarray. However, I need to read all data into the memory and concatenate them, then overwrite the NC file. It consumes too much memory and less efficient. Is there any possibility that rewrites the file just like python library of NETCDF4?

The xarray documentation here:
https://xarray.pydata.org/en/stable/io.html#reading-and-writing-files
States the following:
It is possible to append or overwrite netCDF variables using the
mode='a' argument. When using this option, all variables in the
dataset will be written to the original netCDF file, regardless if
they exist in the original dataset.
You can add a new variable, but you cannot add data to an existing variable without re-creating it.
See also this (duplicate) question: Is it possible to append to an xarray.Dataset?
It's from a few years back, but things haven't changed. Unfortunately, you'd have to use the netCDF4 package directly instead.

Related

Rules for file extensions?

Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific programm. I'm assuming my anti virus programm won't be too happy if I give it the name "pleasetrustme.exe"... Is it gerally allowed to use those extensions? And what about the lesser known ones, like ".arw"?
You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control stuff like how the files are displayed in Windows Explorer and what happens when you double click on it.
I wrote some code which reads and writes a byte pattern that is only
understood by that specific programm.
A file extension is only an indication of what type of data will be inside, never a guarantee that certain data formatted in a specific way will be inside the file.
For your own specific data structure it is of course always best to choose an extension that is not already in use for other file formats (or use a general extension like .dat or .bin maybe). This also has the advantage of being able to use an own icon without it being overwritten by other software using the same extension - or the other way around.
But maybe even more important when creating a custom (binary?) file format, is to provide a magic number as the first bytes of that file, maybe followed by a file header structure containing a version number etc. That way your own software can first check the header data to make sure it's the right type and version (for example: anyone could rename any file type to your extension, so your program needs to have a way to do some checks inside the file before reading the remaining data).

Handle single files while extracting tar.gz

I am having a huge .tgz file which is further structured inside like this:
./RandomFoldername1/file1
./RandomFoldername1/file2
./RandomFoldername2/file1
./RandomFoldername2/file2
etc
What I want to do is having each individual file extracted to standard output so that I can pipe it afterwards to another command. While doing this, I also need to get the RandomFoldername name and file name so that I can deal with them properly from within the second command.
Till now the options I have are
to either extract all of the tarball and deal with the structured files that I will be having, which is not an option since the extracted tar doesn't fit into the hard drive
Make a loop that pattern match each file and extract one file at time. This option although that solves the problem, is too slow because the tarball is sweeped each time for only one file.
While searching on how to solve this, I've started to fear that there is no better alternative to this.
Using tar the tool I don't believe you have any other options.
Using a tar library for some language of your choice should allow you to do what you want though as it should let you iterate over the entries in the tarball one-by-one and allow you to extract/pipe/etc. each file one-by-one as necessary.

Writing hash information to file and reloading it automatically on program startup?

I wrote a little program that creates a hash called movies. Then I can add, update, delete, and display all current movies in the hash by typing the title.
Instead of having it start a new hash each time and save anything added to a file, and, when updated or deleted, update or delete the key, value pair from the file, I want the program to auto-load the file on startup and create it if it doesn't exist.
I have no idea how to go about doing this.
After reading a lot of the comments I have decided that maybe I should do this with SQL instead, seems like a much better approach!
You can't store Ruby objects directly on the disk; you will first need to convert them to some sequence of bytes (i.e. a string). This is called serialization, and there are several different ways to do it and several different formats the data could be in. I think I would recommend JSON, but you might also want to try YAML or Marshal.
Any of those libraries will allow you to convert your hash into a string and allow you to convert that same string back into a hash. Then you can use Ruby's File class to save and load that string from the disk.
This should get you pointed in the right direction. From here you can search for more specific things like "how do I convert a hash to JSON" or "how do I write a string to a file".
You have the ability to marshal your code in a few ways.
YAML if you would like to use a gem, or JSON. There is also a built in Marshal
RI tells us:
Marshal
(from ruby site)
----------------------------------------------------------------------------- The marshaling library converts collections of Ruby objects into a
byte stream, allowing them to be stored outside the currently active
script. This data may subsequently be read and the original objects
reconstituted.
Marshaled data has major and minor version numbers stored along with
the object information. In normal use, marshaling can only load data
written with the same major version number and an equal or lower minor
version number. If Ruby's ``verbose'' flag is set (normally using -d,
-v, -w, or --verbose) the major and minor numbers must match exactly. Marshal versioning is independent of Ruby's version numbers. You can
extract the version by reading the first two bytes of marshaled data.
And I will leave it at that for Marshal. But there is a bit more documentation there.
You can also use IO#puts to write to a file, and then modify that file to load later, which I use sometimes for config settings. Why use YAML or another external source, when Ruby is easy enough to have a user modify? You use YAML when it needs to be more generally accessible, as the Tin Man points out.
For example this file is the sample file, but is intended for interactive editing (with constraints, of course) but it is simply valid Ruby. And it gets read by a Ruby program, and is a valid object (in this case a Hash stored in a constant.)

How do I delete data/characters from a file? [duplicate]

This question already has answers here:
How do I insert and delete some characters in the middle of a file?
(4 answers)
Closed 9 years ago.
I'm writing a program to edit a txt file.
But I found that the windows API WriteFile can only add data/characters to a file, but not deleting data from files.
The only solution I've come up is to read the whole file into a buffer using ReadFile, and then use a loop to shift the data one by one, then replace the old file with the new file. But I think this will probably make my program really slow.
Can anyone help please
thanks.
If you're trying to delete from the end of the file it can be very fast with truncate() and ftruncate().
Where are you trying to delete the data from? If it's from the middle, you'll have to use fseek(): If the file contains "ABCDEFG", and you want to delete "DEF", use fseek() to get to G, copy "G" into a buffer, fseek to where "C" is, then write() what's there. Then truncate the file to the correct size with ftruncate().
If this really becomes a performance issue for you, you'll want to either design your file in a way that accounts for this or use a database of some kind. You may also want to use memory-mapped files, but usually this is better done by a database that someone else wrote instead of reinventing the wheel.
Files are linear streams of data. If you want to remove content from a file, you must re-write all the content of the file that follows the part that you have remove. So, unless the content to be removed is at the end of the file, you will need to perform some writing. In the worst case scenario, in order to remove the first byte of a file, you need to re-write the entire file apart from the byte that you removed.
FWIW, Raymond Chen wrote a nice article on this subject: How do I delete bytes from the beginning of a file?

What is the best way to edit the middle of an existing flat file?

I have tool that creates variables for a simulation. The current workflow involves hand copying those variables into the simulation input file. The input file is a standard flat file, i.e. not binary or XML. I would like to automate the addition of the variables to the flat input file.
The variables copy over existing variables in the file, e.g.
New Variables:
Length 10
Height 20
Depth 30
Old Variables:
...
Weight 100
Age 20
Length 10
Height 20
Depth 30
...
Would like to have the old variables copy over the new variable. They are 200 lines into the flat input file.
Thanks for any insights.
P.S. This is on Windows.
If you're stuck using flat, then you're stuck using the old fashioned way of updating them: read from original, write to temp file, either write the original row or change the data and then write that. To add data, write it to the temp file at the appropriate point; to delete data, simply don't copy it from the original file.
Finally, close both files and rename the temp file to the original file name.
Alternatively, it might be time to think about a little database.
For something like this I'd be looking at a simple template engine. You'd have a base template with predefined marker tokens instead of variable values and then just pass the values required to your engine along with the template and it will spit out the resultant file, all present and correct. There are a number of Open Source template engines available in Java that would meet your needs, I imagine such things are also available in your language of choice. You could even roll your own without too much difficulty.
Note that under Unix, one would probably look at using mmap() because you can then use functions such as memmove() to move the data around and add new data or truncate() the result if the file is then smaller (you may also want to use truncate() to grow the file).
Under MS-Windows, you have the MapViewOfFileEx() function to do the same thing. The API is different, though,
and I'm not exactly sure what happens or how to grow/shrink the file (MSDN also includes a truncate()-like function and maybe that works).
Of course, it's important to use memcpy() or memmove() properly to not overwrite the wrong data or go outside the buffer.

Resources