I've found several circumstances where storing additional metadata about a specific service would be convenient, but custom fields don't seem to be supported in the services API (only the basic id, name, address, and port). For example, a database name or a load-balancer weighting.
I'm curious about the design decision: is there a best practice this evangelizes, or is this perhaps a future enhancement that could be made?
I understand that one could use the KV store for the extra info, but it seems more convenient to bundle like information together rather than make multiple Consul lookups.
Metadata should go into the KV store. There are use cases like the ones you describe, but Consul is designed for the 95% most common use cases (the actual words of Armon Dadgar, a Consul principal engineer). Arbitrary metadata lives just fine in the KV store.
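As a rough sketch of what that can look like (the key layout below is just illustrative, not a Consul convention):

```
# store extra attributes under a per-service prefix (key names are arbitrary)
consul kv put services/billing-db/database_name billing
consul kv put services/billing-db/lb_weight 20

# read them back alongside the normal service lookup
consul kv get services/billing-db/lb_weight
# or via the HTTP API; ?raw returns the plain value instead of base64-encoded JSON
curl http://127.0.0.1:8500/v1/kv/services/billing-db/lb_weight?raw
```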
Related
I'm currently working on a traditional monolith application, but I am in the process of breaking it up into Spring microservices managed by Kubernetes. The application allows uploading and downloading of large files, and these files are normally stored on the host filesystem. I'm wondering what would be the most viable method of persisting these files in a microservice architecture?
You have a bunch of different options; Googling your question will turn up many answers for any budget and taste. Basically, you'd want highly available object storage like AWS S3. You could set up your own dedicated server to store these files if you wanted to cut costs, but then you'd have to worry about backups and availability yourself. If you need low-latency access to these files, you'd want to put them behind a CDN as well.
We are mostly on-prem and ended up using NFS. It's the path of least resistance, but probably not the most performant, and making it highly available is tough. If you have the chance, I agree with Denis Pshenov that an S3-like system, for example MinIO, might be a better alternative.
Maybe you should have a look at the Rook project (https://rook.io/). It's easy to set up and provides different kinds of storage and persistence technologies to your cloud-native applications.
There are many places to store your data. It depends on the budget you are able to spend (holding duplicate data also means more storage, which costs money) and, mostly, on your business requirements:
Is all data needed at all times?
Are there geo/region-related cases?
How fast do read/write operations need to be?
Do things need to be cached?
Stateful or stateless?
Are there operational requirements? How should this be maintained?
...
Apart from this, your microservices should not know where the data is actually stored. In Kubernetes you can use Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/) that can link to storage from your cloud provider or something else. The microservice should just mount the volume and be able to treat it like local files.
Note that cloud-provider storage services already include solutions for scaling, concurrency, etc., so I would probably use a single blob storage under the hood.
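As a minimal sketch of what that looks like (names, sizes, and the image are placeholders; the storage class and access mode depend on your cluster and provider):

```yaml
# PersistentVolumeClaim: ask the cluster for storage (name and size are placeholders)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-uploads
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
# Pod (in practice a Deployment) mounting the claim like a local directory
apiVersion: v1
kind: Pod
metadata:
  name: upload-service
spec:
  containers:
    - name: app
      image: example/upload-service:latest   # placeholder image
      volumeMounts:
        - name: uploads
          mountPath: /var/data/uploads
  volumes:
    - name: uploads
      persistentVolumeClaim:
        claimName: file-uploads
```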
However, it has to be said that there is a trend toward understanding a microservice as a package of data and logic coupled together, and toward accepting duplicated data, which leads to better scalability.
See for more information:
http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/
https://github.com/katopz/best-practices/blob/master/best-practices-for-building-a-microservice-architecture.md#stateless-service-instances
https://12factor.net/backing-services
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html
I am new to Prometheus, so I am not sure whether high availability is part of Prometheus's data store (TSDB). I am not looking at something like having two Prometheus server instances scraping the same exporters, as that has a high chance of producing two TSDBs that are out of sync.
It really depends on your requirements.
Do you need highly available alerting on your metrics? Prometheus can do that.
Do you need a highly available monitoring system that contains the last few hours of data for operational triage? Two Prometheus instances are pretty good for that too.
Do you need long-term storage of time-series data? Prometheus is not designed to accomplish this on its own. You can either use Prometheus's remote write functionality to ship data to another TSDB that supports redundant storage (InfluxDB and ClickHouse are pretty promising here), though you are then on the hook for de-duplicating data, or alternatively consider Cortex.
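A minimal sketch of the Prometheus side of remote write (the URL is a placeholder for whichever TSDB or adapter endpoint you choose):

```yaml
# prometheus.yml: ship samples to an external TSDB via remote write
# (placeholder URL; the real path depends on the receiving system)
remote_write:
  - url: "http://remote-tsdb.example.com/api/v1/write"
```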
For a Kubernetes setup using kube-prometheus (prometheus-operator), you can configure it through the chart values, and including Thanos would help in this situation.
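Roughly, enabling the Thanos sidecar through the chart values might look like this; the exact key paths differ between chart versions, so treat this as an assumption to verify against your chart's documentation:

```yaml
# values.yaml sketch for prometheus-operator / kube-prometheus-stack
# (key paths and the image tag are assumptions; check your chart version's docs)
prometheus:
  prometheusSpec:
    thanos:
      image: quay.io/thanos/thanos:v0.28.0   # placeholder version
```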
There is prometheus-postgresql-adapter, which allows you to use PostgreSQL / TimescaleDB as remote storage. The adapter enables multiple Prometheus instances (an HA setup) to write to a single remote store, so you have one source of truth. I recently published a blog post about it: [How to manage Prometheus high-availability with PostgreSQL + TimescaleDB](https://blog.timescale.com/blog/prometheus-ha-postgresql-8de68d19b6f5/).
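A sketch of the HA-pair wiring, assuming the adapter's default listen port of 9201 and its /write and /read endpoints (please verify those defaults against the adapter's README):

```yaml
# prometheus.yml on *both* Prometheus replicas: they write to the same adapter,
# which handles de-duplication into one PostgreSQL/TimescaleDB store
# (port and paths are assumed defaults; check the adapter's README)
remote_write:
  - url: "http://postgresql-adapter.example.com:9201/write"
remote_read:
  - url: "http://postgresql-adapter.example.com:9201/read"
```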
Disclaimer: I am one of the engineers behind the adapter
We're considering using Consul's key-value store to enhance our configuration management tool (Ansible, at the moment). We're hoping to use it to solve two related problems:
Preventing scatter: Some items (namely passwords, certificates, etc.) are scattered across our configuration files. Updating them requires manual search-and-replace, which can be tiresome.
Ease of update: rather than editing and committing configuration changes into git, we could use Consul to store those items that change often.
We're looking for a set of recommendations on how to use/integrate Consul (or similar tools) for dynamic configurations. Naturally, there is no one answer, but a set of useful practices here. I'll provide a few approaches in my answer, but I'd like to hear additional ideas.
We've been tinkering with Consul as a key-value store for a while, but I think the most interesting use comes with Consul Template, using that to update configuration on the fly.
I think the end state we're moving towards is to use Ansible to configure a base image with the things we know are slow-changing plus Consul Template; then bake this into an AMI (these first two steps probably done via Packer); and then deploy into Auto Scaling groups in AWS using Terraform (which we already use for provisioning).
Then we will use Consul's key-value store to change properties, which Consul Template will propagate across a cluster of instances. We also intend to have instances register themselves in Consul, which will in turn affect configuration on other instances, such as load-balancing members in Apache/NGINX configurations or lists of unicast-addressable members for clustering.
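As a rough sketch of how that propagation works (service name, file paths, and reload command are placeholders, not our actual setup):

```
# upstreams.ctmpl: rendered by consul-template; "web" is a placeholder service name
upstream backend {
{{- range service "web" }}
  server {{ .Address }}:{{ .Port }};
{{- end }}
}
```

consul-template watches Consul and rewrites the file whenever membership or a watched key changes, then runs the reload command:

```
consul-template -template "upstreams.ctmpl:/etc/nginx/conf.d/upstreams.conf:nginx -s reload"
```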
On a slightly related note, and as mentioned by mahnve, Vault is a pretty nice add-on to Consul for storing secrets. We're already using it for fairly static secrets, but intend to start using some of the dynamic secret generation, which allows you to request short-lived API or SSH keys that can be tracked and revoked.
To mitigate #1 I'd suggest looking into HashiCorp's Vault (https://www.vaultproject.io/), a tool for handling secrets that can use Consul as a backend.
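For a flavour of the workflow, a minimal sketch (paths and field names are placeholders; the kv subcommand applies to the KV secrets engine in recent Vault versions, while older ones use the generic write/read commands):

```
# store and read a secret in Vault's KV engine (path/field are placeholders)
vault kv put secret/myapp/db password=s3cr3t
vault kv get secret/myapp/db

# on older Vault versions the equivalent is:
vault write secret/myapp/db password=s3cr3t
vault read secret/myapp/db
```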
We've yet to do this, but we're thinking about integrating Consul into our Ansible plays. Ansible recently added a lookup plugin for Consul:
https://github.com/ansible/ansible/blob/devel/test/integration/roles/test_consul_kv/tasks/main.yml#L70
```yaml
- debug: msg='key contains {{item}}'
  with_consul_kv:
    - 'key/to/retrieve'
```
So we could directly populate our plays with values from Consul.
Another approach we're considering is to use Consul's templating tool and template entire configuration files after Ansible plants them on our hosts.
I would like to store user profile information. After researching a bit online, I am confused between the following options:
Use an LDAP server (example: OpenDJ). I can write Java clients that interact with the LDAP server using LDAP APIs.
Store the user profile in a database as a JSON document (for example, in Elasticsearch). The NoSQL database can then index the documents to improve lookup time.
What are the factors that I should keep in mind before selecting one of the approaches?
For a start, if you are storing passwords, then using LDAP is a no-brainer IMO. See http://smart421.com/smart-identity-and-fraud/why-bother-with-an-ldap-anyway/.
Otherwise I would recommend you do a PoC with each solution (do not forget to add indexes for OpenDJ, and you may also use Rest2LDAP) and see how they fit your needs. Both products are open source, so it's easy to get started.
If your user population is a known group that may already have accounts in an existing LDAP repository, or where user account information needs to be shared between systems, then it makes sense to use and add on to the existing LDAP repository.
If you are starting out from scratch and have mainly external, unknown users who have no other interaction with your infrastructure but this one application, then LDAP is not a good choice IMO because of the overhead you take on for creating and managing the server. In that case a lightweight JSON approach seems better suited (even though the L in LDAP stands for "lightweight").
The number of expected users is less of a consideration; you need to tread carefully with very large populations in either scenario.
See this question as well for additional insights: Reasons to store users' data in LDAP instead of RDBMS.
I am trying to build a prototype of Elasticsearch as a Service. I have thought of two different approaches and I'd like to get opinions on one or the other implementation:
A single installation of Elasticsearch, with a proxy layer on top to add user validation (HTTP basic authentication plus a user account to validate the usage).
This approach would be relatively straightforward; the main challenge would be configuring the cluster properly to handle the load, as well as the permissions, so that there are no data leaks and users don't have access to the cluster management APIs.
Use Docker containers and have one instance of Elasticsearch for each user. In this case I would provide the isolation via Linux containers (Docker). I'd still need to manage authentication.
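For illustration, spinning up a per-user instance could look roughly like this (container name, host port, image tag, and the single-node setting are placeholders/assumptions that depend on the Elasticsearch version offered):

```
# one container per user; name, port mapping, and version are placeholders
docker run -d --name es-user-42 -p 9242:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.9
```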
It probably would be good to implement both, play around and see how things behave. Any opinions about pros and cons of each approach?
Thanks!
Disclaimer: I am the founder of the Elasticsearch service provider Facetflow, which currently offers shared clusters.
I think that both approaches have merit, but maybe suited for different types of customers.
Looking at other SaaS providers, like MongoDB provider MongoLab, they essentially ended up offering both setups (although not using Docker).
So, pros and cons as I see them:
Shared Cluster
Most Elasticsearch as a Service providers operate this way.
Pros:
Far more affordable for the majority of users just looking for good search and analytics.
Simpler maintenance, fewer clusters for you to monitor
Potentially fewer versions of Elasticsearch to integrate with. If you need to communicate with other systems (which you do) or write your own plugins (we did, for authentication, silos, entitlements, stats, etc.), fewer versions will be far easier to maintain.
Cons:
Noisy neighbours have to be monitored and you have to scale and relocate indices to handle this.
Users have to choose from a limited list of versions of Elasticsearch, usually a single version.
Users don't get full cluster admin control.
Private Clusters using Docker
One provider that works this way is Found.
Pros:
Users could potentially be able to deploy a variety of versions of Elasticsearch
Users can have complete cluster admin access
Noisy neighbours don't affect their cluster, less manual intervention from you
Cons:
Complex monitoring and support. If people can do whatever they want (shut down the cluster over the API), you have to be clear about where your responsibility as a provider ends, and about what wakes you up at night.
Complex integration with multiple versions, see shared cluster pros.
More expensive since you have to allocate resources that might not always be used.