I have a Golang application running in Kubernetes which needs to persist a single string value outside of its memory. In other words, if the application is redeployed or the pod is restarted, the value is not lost. I also need to be able to read and write this value from Go regularly.
What is a good way to do this?
So far I've thought about:
ConfigMap: Would this be considered misuse to utilize a config map, and are they even persistent?
PersistentVolume: This seems appropriate, but can I store a single value or a file with this, rather than setting up an entire database?
Thank you!
In Kubernetes, you have the following options to store data outside the Pod (or, more precisely, to share data between Pods).
Persistent Volume: a shared filesystem, you share data as files
ConfigMap/Secret: Kubernetes-native shared objects; you use the Kubernetes API to store data. Kubernetes uses etcd under the hood, and its consensus algorithm is applied to every data change, so the data needs to be small and the performance is not great
3rd-party tool: Hazelcast, Redis, TiKV; you use their client (or REST API) to store your values
I'm not sure about your exact use case, but I'd start with a third-party tool; they are very simple to deploy with a Helm chart or an Operator. Another option is a PersistentVolume. ConfigMap/Secret I'd treat as a last resort.
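That said, if a ConfigMap turns out to be acceptable for your case (small value, infrequent writes), reading and writing it from Go is straightforward with client-go. A minimal sketch, assuming a ConfigMap named app-state with a key value in the default namespace, and a ServiceAccount with RBAC permission to get/update ConfigMaps:

```go
// Minimal sketch: persist a single string in a ConfigMap via client-go.
// ConfigMap name ("app-state"), namespace ("default"), and key ("value")
// are placeholders; adjust to your setup.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	cmClient := clientset.CoreV1().ConfigMaps("default")
	ctx := context.Background()

	// Read the current value.
	cm, err := cmClient.Get(ctx, "app-state", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("current value:", cm.Data["value"])

	// Write a new value back.
	if cm.Data == nil {
		cm.Data = map[string]string{}
	}
	cm.Data["value"] = "new-value"
	if _, err := cmClient.Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
}
```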
Related
I have a Go application running in a Kubernetes cluster which needs to read files from a large MapR cluster. The two clusters are separate and the Kubernetes cluster does not permit us to use the CSI driver. All I can do is run userspace apps in Docker containers inside Kubernetes pods and I am given maprtickets to connect to the MapR cluster.
I'm able to use the com.mapr.hadoop maprfs jar to write a Java app which is able to connect and read files using a maprticket, but we need to integrate this into a Go app, which, ideally, shouldn't require a Java sidecar process.
This is a good question because it highlights the way that some environments impose limits that violate the assumptions external software may hold.
And just for reference, MapR was acquired by HPE so a MapR cluster is now an HPE Ezmeral Data Fabric cluster. I am still training myself to say that.
Anyway, the accepted method for a generic program in language X to communicate with the Ezmeral Data Fabric (the filesystem formerly known as MapR FS) is to mount the file system and just talk to it using file APIs like open/read/write and such. This applies to Go, Python, C, Julia or whatever. Inside Kubernetes, the normal way to do this mount is to use a CSI driver that has some kind of operator working in the background. That operator isn't particularly magical ... it just does what is needful. In the case of data fabric, the operator mounts the data fabric using NFS or FUSE and then bind mounts[1] part of that into the pod's awareness.
But this question is cool because it precludes all of that. If you can't install an operator, then this other stuff is just a dead letter.
There are four alternative approaches that may work.
NFS mounts were included in Kubernetes as a native capability before the CSI plugin approach was standardized. It might still be possible to use that on a very vanilla Kubernetes cluster and that could give access to the data cluster.
It is possible to integrate a container into your pod that does the necessary FUSE mount in an unprivileged way. This will be kind of painful because you would have to tease apart the FUSE driver from the data fabric install and get it to work. That would let you see the data fabric inside the pod. Even then, there is no guarantee Kubernetes or the OS will allow this to work.
There is an unpublished Go file system client that uses the low-level data fabric API directly. We don't yet release that separately. For more information on that, folks should ping me directly (my contact info is everywhere ... email to ted.dunning hpe.com or gmail.com works)
The data fabric allows you to access data via S3. With the 7.0 release of Ezmeral Data Fabric, this capability is heavily revamped to give massive performance especially since you can scale up the number of gateways essentially without limit (I have heard numbers like 3-5GB/s per stateless connection to a gateway, but YMMV). This will require the least futzing and should give plenty of performance. You can even access files as if they were S3 objects.
[1] https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount#:~:text=A%20bind%20mount%20is%20an,the%20same%20as%20the%20original.
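For the S3 route described above, any S3-compatible client should work against the object gateway. A rough Go sketch using the minio-go client; the endpoint, credentials, bucket, and object key below are placeholders, not values from this answer:

```go
// Sketch: read a file from an S3-compatible gateway with minio-go.
// Endpoint, credentials, bucket, and object key are assumed placeholders.
package main

import (
	"context"
	"fmt"
	"io"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3-gateway.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		panic(err)
	}

	obj, err := client.GetObject(context.Background(), "my-bucket", "path/to/file.csv", minio.GetObjectOptions{})
	if err != nil {
		panic(err)
	}
	defer obj.Close()

	data, err := io.ReadAll(obj)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```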
My use case:
A single-node, out-of-memory "big dict" (or "big map"). The total size is too large for memory, e.g. 20 GB, but is fine for a single node's disk. Due to the total size, it's unwieldy with a single-file solution like SQLite. I also want easy cloud backups, so I want manageable file sizes. It needs to be a series of size-controllable files managed by the tool in a user-transparent way. Further, it should be embedded, i.e. a simple lib, no client/server.
Long story short, I picked RocksDB.
Now for new requirements or nice-to-haves: I want to use a cloud blobstore as the ultimate storage. For example, a couple of levels of hot caches would reside in memory or on local disk with a configurable total size; beyond that, reads and writes go to a cloud blob store.
After the initial creation of the dataset, the usage is mainly reads. I don't care much about "distributed", multiple-machines-competing-to-write kinds of complexities.
I don't see that RocksDB has this option. There's rocksdb-cloud, which appears to be in "internal dev" mode with no end-user documentation whatsoever.
Questions:
Is my use case reasonable? Would a cloud KV store (like GCP Firestore?) plus a naive flat in-memory cache have a similar effect?
How can I do this with RocksDB? Or with any alternative?
Thanks.
RocksDB allows you to define your own FileSystem or Env, in which you can implement the interaction layer with whatever special filesystem you want. So it's possible, but you need to implement or define the integration layer with the cloud KV store yourself. (Running RocksDB on HDFS is an example; it defines its own Env.)
Comment to @jayZhuang; too long for a comment.
It looks like the code is modular in decent ways, but I can hardly say it "supports" cloud storage, because that requires forking and hacking the code itself. More reasonable for the end user would be an extension or plugin from the outside: basically "give me a few auth arguments and the location of the storage, and I do the rest". Covering the few major blob stores should be a modest effort.
For me, I'm using RocksDB from Python via a barely maintained RocksDB Python client. (There are no actively maintained options.) I have nice Python utilities for cloud blobstores, but I'm sure there's no way to let RocksDB use those from Python via an inactive RocksDB Python client package. Although I am able to write C++ extensions for Python, that would require digging into both RocksDB and the blobstore in C++. It's not something I'll take on.
Thanks for the pointers. Do you know of any other examples closer to the end user?
My app uses a Google Cloud Firestore instance. Among the data my app manages there is some classical data (strings, numbers, ...): no problem with that; Firestore handles these use cases easily.
But my app also needs to consume images that are linked to the other data.
So I'm looking for the right solution to manage images: I tried to use the "reference" field type in my Firestore instance, but I'm not sure that's the right way...
Is there another solution outside Firestore?
What about Google Cloud Filestore? It seems to be available only from App Engine or a VM...
Disclosure: I work on the Firebase team at Google.
When I want to use both structured and unstructured data in my application, I use Cloud Firestore for the structured data, and Cloud Storage for the unstructured data. I use both of these through their Firebase SDKs, so that I can access the data and files directly from within my application code, or from server-side code (typically running in Cloud Functions).
There is no built-in reference type between Firestore and Storage, so you'll need to manage that yourself. I usually store either the path to the image in Firestore, or the download URL of the image. The choice between these two mostly depends on whether I want the file to be publicly accessible, or whether access needs to be controlled more tightly.
Since there is no managed relationship between Firestore and Storage (or any other Firebase/Google Cloud Platform services), you'll need to manage this yourself. This means that you'll need to write the related data (like the path above), check for its integrity when reading it (and handle corrupt data gracefully), and consider periodically running a script that removes/fixes up corrupt data.
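To make that concrete, here is a rough Go sketch of the pattern: upload the image to Cloud Storage and store only its path on the Firestore document. The bucket name, project ID, collection, and field names are assumptions for illustration:

```go
// Sketch: store an image in Cloud Storage and keep its path in Firestore.
// Bucket, project ID, collection, and field names are assumed placeholders.
package main

import (
	"context"
	"io"
	"os"

	"cloud.google.com/go/firestore"
	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	// 1. Upload the image bytes to Cloud Storage.
	stClient, err := storage.NewClient(ctx)
	if err != nil {
		panic(err)
	}
	defer stClient.Close()

	img, err := os.Open("photo.png")
	if err != nil {
		panic(err)
	}
	defer img.Close()

	const imagePath = "images/photo.png"
	w := stClient.Bucket("my-app-images").Object(imagePath).NewWriter(ctx)
	if _, err := io.Copy(w, img); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}

	// 2. Save the path (not the bytes) on the Firestore document.
	fsClient, err := firestore.NewClient(ctx, "my-project-id")
	if err != nil {
		panic(err)
	}
	defer fsClient.Close()

	if _, err := fsClient.Collection("items").Doc("item-123").Set(ctx, map[string]interface{}{
		"imagePath": imagePath,
	}, firestore.MergeAll); err != nil {
		panic(err)
	}
}
```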
I am new to Prometheus, so I am not sure whether high availability is part of Prometheus's data store (TSDB). I am not looking at something like having two Prometheus server instances scraping data from the same exporter, as that has a high chance of producing two TSDB stores that are out of sync.
It really depends on your requirements.
Do you need highly available alerting on your metrics? Prometheus can do that.
Do you need a highly available monitoring system that contains the last few hours of data for operational triage? Two Prometheus instances are pretty good for that too.
Do you need long-term storage of time-series data? Prometheus is not designed to accomplish this on its own. You can use the remote write functionality of Prometheus to ship data to another TSDB that supports redundant storage (InfluxDB and ClickHouse are pretty promising here), but you are on the hook for de-duping data. Alternatively, consider Cortex.
For a Kubernetes setup using kube-prometheus (prometheus-operator), you can configure this through the chart's values, and including Thanos would help in this situation.
There is a prometheus-postgresql-adapter that allows you to use PostgreSQL / TimescaleDB as remote storage. The adapter enables multiple Prometheus instances (an HA setup) to write to a single remote store, so you have one source of truth. Recently, I published a blog post about it: [How to manage Prometheus high-availability with PostgreSQL + TimescaleDB](https://blog.timescale.com/blog/prometheus-ha-postgresql-8de68d19b6f5/).
Disclaimer: I am one of the engineers behind the adapter
We're considering using Consul's key-value store to enhance our configuration management tool (Ansible, at the moment). We're hoping to use it to solve two related problems:
Preventing scatter: Some items (namely: passwords, certificates etc) are scattered across our configuration files. Updating them requires manual search-and-replace which can be tiresome.
Ease of update: rather than edit-and-commit configuration changes into git, we could use Consul to store those items that change often.
We're looking for a set of recommendations on how to use/integrate Consul (or similar tools) for dynamic configurations. Naturally, there is no one answer, but a set of useful practices here. I'll provide a few approaches in my answer, but I'd like to hear additional ideas.
We've been tinkering with Consul as a key-value store for a while but I think the most interesting use comes with Consul Template and using that to update configuration on the fly.
I think the end state we're probably moving towards is to use Ansible to configure a base image of things we know are slow-changing plus configure Consul Template, then AMI this (the first two steps probably done via Packer), and then deploy into auto-scaling groups in AWS using Terraform (which we already use for provisioning).
Then we will use Consul's key-value store to change properties that Consul Template will then propagate across a cluster of instances. We also intend to have instances register themselves in Consul which will also affect configuration on other instances such as load balancing members on Apache/NGINX configurations or lists of unicast addressable members for clustering.
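For reading or writing those keys from application code (rather than through Consul Template), the official Consul Go API client is small and convenient. A quick sketch; the key path and value are made-up examples:

```go
// Sketch: put and get a Consul KV entry with the official Go API client.
// The key path "config/app/feature-flag" and its value are made-up examples.
package main

import (
	"fmt"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default address 127.0.0.1:8500).
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		panic(err)
	}
	kv := client.KV()

	// Write a value under the example key path.
	if _, err := kv.Put(&consulapi.KVPair{Key: "config/app/feature-flag", Value: []byte("on")}, nil); err != nil {
		panic(err)
	}

	// Read it back; consul-template or Ansible's consul_kv lookup would see the same key.
	pair, _, err := kv.Get("config/app/feature-flag", nil)
	if err != nil {
		panic(err)
	}
	if pair != nil {
		fmt.Printf("%s = %s\n", pair.Key, string(pair.Value))
	}
}
```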
On a slightly related note, and as mentioned by mahnve, Vault is a pretty nice add on to Consul for storing secrets. We're already using it for pretty static secrets but intend to start using some of the dynamic secret generation which allows you to request short lived API or SSH keys which can be tracked and revoked.
To mitigate #1 I'd suggest looking into HashiCorp's Vault, https://www.vaultproject.io/, which is a tool to handle secrets and can use Consul as a backend.
We've yet to do this, but are thinking about integrating consul into our Ansible plays. Ansible recently added a lookup option from consul:
https://github.com/ansible/ansible/blob/devel/test/integration/roles/test_consul_kv/tasks/main.yml#L70
```yaml
- debug: msg='key contains {{ item }}'
  with_consul_kv:
    - 'key/to/retrieve'
```
So we could directly populate our plays with values from Consul.
Another approach we're considering is to utilize Consul's templating tool and template entire configuration files after Ansible plants them on our hosts.