GeoTrellis: Create an Attribute Store for a Cloud-Optimized GeoTIFF hosted outside of GeoTrellis - geotiff

The strategic performance advantage of the Cloud-Optimized GeoTIFF (COG) is the ability to retrieve raster data for a given extent by pulling only the overviews and a byte range from a remote resource.
In Python, the vsicurl and gdal.Warp abstractions make it possible to do this with just a URL and an extent:
from osgeo import gdal

vsicurl_url = '/vsicurl/' + url_to_cog
gdal.Warp(output_file,
          vsicurl_url,
          dstSRS='EPSG:4326',
          cutlineDSName=jsonFileSliceAoi,
          cropToCutline=True)
The newly-minted COG Spark Examples explain how to arrive at a Raster[Tile] using an AttributeStore created as a result of tiling an RDD in a previous step:
//tiling an RDD and writing out the catalog
...
// Create the reader instance to query tiles stored as a Structured COG Layer
val reader = FileCOGLayerReader(attributeStore)
// Read layer at the max persisted zoom level
// In fact it can be any zoom level in the [0, zoom] range
val layer: TileLayerRDD[SpatialKey] = reader.read[SpatialKey, Tile](LayerId("example_cog_layer", zoom))
// Stitch the layer into a single raster
val raster: Raster[Tile] = layer.stitch
The examples, release notes, and docs for COG support in GeoTrellis all confirm that there is support for tiling data and making it available for clients to consume as a COG. Does GeoTrellis also support the ability to act as a client?
How do you create a FileCOGLayerReader if you don't have a pre-existing catalog, but do have a URL that supports range requests?

We have two COG-related concepts at the moment:
The first one is a GeoTrellis COG Layer - very similar to a GeoTrellis Avro layer, but instead of storing separate Avro tiles we store tiles as segments of a TIFF (or several TIFFs). To prepare such a catalog you have to go through an ingest process to structure the data and reformat it into the GeoTrellis-ready layout. The result is a layer that consists of partial pyramids, where each partial pyramid is represented by a set of COGs.
The second one is a GeoTrellis Unstructured COG Layer: https://github.com/locationtech/geotrellis/blob/master/doc-examples/src/main/scala/geotrellis/doc/examples/spark/COGSparkExamples.scala#L114
The latter lets you collect metadata about your dataset, however you like, into (extent, URI) tuples, and provides an interface to query them. Get familiar with the example linked above and let me know if that works for you.
Btw, RasterFoundry uses unstructured COG layers for their tile server.
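The unstructured-layer idea is mostly language-agnostic: an index of (extent, URI) tuples plus an intersection query. A minimal sketch in Python (all names and URIs here are hypothetical, not GeoTrellis API):

```python
from typing import List, NamedTuple

class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Extent") -> bool:
        # Extents intersect unless one lies entirely to one side of the other
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

# The "layer" is just metadata collected up front: (extent, URI) tuples
catalog = [
    (Extent(0.0, 0.0, 10.0, 10.0), "https://example.com/cogs/a.tif"),
    (Extent(10.0, 0.0, 20.0, 10.0), "https://example.com/cogs/b.tif"),
]

def query(layer, aoi: Extent) -> List[str]:
    """Return the URIs of all COGs whose extent intersects the area of interest."""
    return [uri for extent, uri in layer if extent.intersects(aoi)]
```

A real reader would then issue range requests against the returned URIs; the unstructured COG layer plays roughly the role of catalog-plus-query here.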

Related

Which nova-compute libvirt images_type is better for me, raw or qcow2, when the Glance image's disk_format is raw?

We want to add some compute nodes that will support creating instances with local disks, and I have no idea how to choose the images_type for those instances. One key point: we are after high disk I/O performance.
Glance image info (this is set by our team and cannot be changed):
Field              Value
container_format   bare
disk_format        raw
In nova-compute's nova.conf, which type should I choose, and why?
[libvirt]
#images_type = qcow2
#images_type = raw

Image storage performance in React Native (base64 vs URI path)

I have an app that creates reports with some data and images (min 1 image, max 6). These reports stay saved in my app until the user sends them to the API (which can happen the same day the report was registered, or a week later).
My question is: what's the proper way to store these images (I'm using Realm), saving the path (URI) or a base64 string? My current version keeps the base64 for these images (500-800 KB per image), and after my users send their reports to the API, I delete the base64 string.
I was developing a way to save the path to the image and then display it. But the URI returned by image-picker is temporary, so I'd need to copy the file somewhere permanent and save that path. Doing that, though, I'd have each image stored twice on the phone (for two or three days), using extra memory.
So before I build all this, I was wondering: will copying the image to another path and saving the path be more performant than storing the base64 string on the phone, or shouldn't it make much difference?
I try to avoid text-only answers; including code is best practice, but the question of storing images comes up frequently and isn't really covered in the documentation, so I thought it should be addressed at a high level.
Generally speaking, Realm is not a solution for storing blob-type data - images, PDFs, etc. There are a number of technical reasons for that, but most importantly, an image can go well beyond the capacity of a Realm field. Additionally, it can significantly impact performance (especially in a syncing use case).
If this is a local-only app, store the images on disk on the device and keep a reference to where they are stored (their path) in Realm. That will keep the app fast and responsive with a minimal footprint.
If this is a synced solution where you want to share images across devices or with other users, there are several cloud-based solutions that accommodate image storage; you then store a URL to the image in Realm.
One option is part of the MongoDB family of products (which also includes MongoDB Realm) called GridFS. Another option, a solid product we've leveraged for years, is Firebase Cloud Storage.
Now that I've made those statements, I'll backtrack just a bit and refer you to the article Realm Data and Partitioning Strategy Behind the WildAid O-FISH Mobile Apps, a fantastic piece about implementing Realm in a real-world application, and in particular how to deal with images.
In that article, note they do store the images in Realm for a short time. However, one thing they left out of that (which was revealed in a forum post) is that the images are compressed to ensure they don't go above the Realm field size limit.
I am not totally on board with general use of that technique but it works for that specific use case.
One more note: the image sizes mentioned in the question are pretty small (500-800 KB), a tiny amount of data that would really not have an impact, so storing them in Realm as a data object would work fine. The caveat is future expansion: if you decide later to store larger images, it would require a complete rewrite of the code - so why not plan for that up front?
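To put a number on the base64-versus-path trade-off: base64 produces 4 output characters for every 3 input bytes, so a 600 KB image becomes roughly 800 KB of text in the database, before any parsing cost. A quick illustration (shown in Python for brevity; the arithmetic is the same in React Native):

```python
import base64

raw = b"\x00" * 600_000             # stand-in for a ~600 KB image
encoded = base64.b64encode(raw)      # what would sit in the DB as text

overhead = len(encoded) / len(raw)   # 4 characters per 3 bytes -> ~1.33x
```

Storing only a file path keeps the database row a few dozen bytes regardless of image size, which is why the path approach scales better.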

Why is ELT more suitable for stream processing?

See https://youtu.be/2pgaQIitxiQ at 55:30 and then 55:50.
It suggests that Extract Load Transform (ELT) is more suitable for batch processing (at 55:30) and for stream processing (at 55:50).
I understand that the idea is that as the stream data comes in, we load it and then transform it.
But in the case of batch processing, isn't it the same concept? The data comes in, we load it, then process it as a batch.
Historically, the target DB would not have the power/capabilities to effectively transform data and therefore a dedicated transformation server was used - so we'd extract data from the source, transform it using the dedicated server and load it into the target DB. (ETL)
As the capabilities of DB servers (especially with the rise of cloud DBs) have improved, the use of a dedicated transformation server is no longer needed - so we can load directly to the target DB and transform the data there. (ELT)
In relation to your specific question, whether it makes more sense to do some or all transformations in the stream or in the target DB depends on the type of transformation and the capabilities of the stream and DB systems. There is no right or wrong answer.
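The ETL/ELT distinction can be sketched with a toy example, using in-memory SQLite as a stand-in for the target DB (the table and column names are made up):

```python
import sqlite3

# Toy source records with messy values
rows = [("alice", "  GOLD "), ("bob", "silver")]

# ETL: transform in a separate step (application code), then load the cleaned result
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE customers (name TEXT, tier TEXT)")
cleaned = [(name, tier.strip().lower()) for name, tier in rows]  # transform outside the DB
etl_db.executemany("INSERT INTO customers VALUES (?, ?)", cleaned)

# ELT: load the raw data as-is, then transform inside the DB with SQL
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE raw_customers (name TEXT, tier TEXT)")
elt_db.executemany("INSERT INTO raw_customers VALUES (?, ?)", rows)
elt_db.execute(
    "CREATE TABLE customers AS "
    "SELECT name, lower(trim(tier)) AS tier FROM raw_customers"
)
```

Both paths end with the same cleaned table; ELT simply defers the transformation work to the database engine, which is the shift the answer above describes.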

h2o subset to "bestofFamily"

AutoML makes two stacked ensembles: one that includes "all" models, and one that is a subset, the "best of family".
Is there a way to programmatically save the component models and the stacked-ensemble aggregator to disk, so that the "best of family" ensemble, treated as a standalone black box, can be stored, reloaded, and used without requiring literally 1000 less valuable learners to exist in the same space?
If so, how do I do that?
While AutoML is running, everything stays in memory (nothing is saved to disk unless you explicitly save one of the models, or use an option that saves an object to disk).
If you just want the "Best of Family" stacked ensemble, all you have to do is save that binary model. When you save a stacked ensemble, it saves all the required pieces (base models and metalearner) for you. Then you can re-load it later for use with another H2O cluster when you're ready to make predictions (just make sure, if you are saving a binary model, that you use the same version of H2O later on).
Python Example:
bestoffamily = h2o.get_model('StackedEnsemble_BestOfFamily_0_AutoML_20171121_012135')
h2o.save_model(bestoffamily, path = "/home/users/me/mymodel")
R Example:
bestoffamily <- h2o.getModel('StackedEnsemble_BestOfFamily_0_AutoML_20171121_012135')
h2o.saveModel(bestoffamily, path = "/home/users/me/mymodel")
Later on, you re-load the stacked ensemble into memory using h2o.load_model() in Python or h2o.loadModel() in R.
Alternatively, instead of using an H2O binary model, which requires an H2O cluster to be running at prediction time, you can use a MOJO model (different model format). It's a bit more work to use MOJOs, though they are faster and designed for production use. If you want to save a MOJO model instead, then you can use h2o.save_mojo() in Python or h2o.saveMojo() in R.

GeoServer: Set sequence of layers to be rendered in a WMS

How do I configure the sequence of layers in my GeoServer workspace such that when I read the layers via WMS in, say, Tableau or QGIS, the list of layers available to be checked is in the sequence I need?
In ArcGIS Server, this can be easily set by aligning the layers in ArcMap before publishing it to the ArcGIS Server.
However, I can't seem to find such a configuration in GeoServer Admin.
Thanks!
The ordering of the layers is solely at the control of the client. GeoServer just provides a list of layers that the client can pick and display in any order it likes.
If you need to combine certain layers in a specific order you could use a LayerGroup.
Got a solution for this.
I found out that the order in which layers are listed on the client can be set by prefixing the layers' "Name" field so they sort in the desired order.
E.g., "1 Coastline", "2 Waterbodies", "3 Roads" will display Coastline as the first layer, followed by Waterbodies and Roads.
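This trick relies on the client sorting layer names alphabetically (lexicographically). A quick sketch of the behavior, with hypothetical layer names:

```python
# Layer names as a client might receive them from a GetCapabilities response
layers = ["3 Roads", "1 Coastline", "2 Waterbodies"]
ordered = sorted(layers)  # a client sorting alphabetically shows them in order

# Caveat: plain numeric prefixes sort lexicographically, so with ten or more
# layers "10 ..." would come before "2 ..."; zero-padding avoids that
padded = sorted(["02 Waterbodies", "10 Contours", "01 Coastline"])
```

So if the workspace may ever hold ten or more layers, prefixes like "01", "02" are the safer naming scheme.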
