oc debug on a StatefulSet results in PVC errors

I'm running a StatefulSet in OpenShift 4.3 which does not start properly. I suspect permission issues, but that's not directly relevant to the question. I'm having problems getting a debug container to start.
I run the command to create the StatefulSet and the other relevant objects. The pod created for the StatefulSet (I'm only running one replica at the moment) crashes, which I expect. Then I issue the command oc debug statefulset/[ss-name] and I get an error saying that the primary container is invalid:
* spec.containers[0].volumeMounts[0].name: Not found: "volume"
The volume does exist, though: it's called 'volume' and it is created successfully when I start up the StatefulSet.
I'm sure I'm just missing something about how the debug pod is created, but I'm not sure what. I can't find anything on Google suggesting that I would need to create a separate PVC for the debug pod or anything. What am I missing?

Okay, I figured out the issue here. When you start up a debug pod, it's a standalone pod of its own and is NOT part of the StatefulSet. That's why it couldn't find the volume: the volume was created as part of the StatefulSet, and creating a debug pod creates nothing except the pod itself, with none of the other StatefulSet trappings.
I was able to start the debug pod by removing the section where it attempted to mount the volume, and instead having that folder use the ephemeral storage that's local to the pod (since I didn't care what happened to the data on it anyway).
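As a minimal sketch of that workaround (the StatefulSet name myapp is hypothetical; oc debug -o yaml prints the debug pod definition instead of starting it, so the PVC mount can be swapped for ephemeral storage before the pod is created):
oc debug statefulset/myapp -o yaml > debug-pod.yaml
# In debug-pod.yaml, replace the persistentVolumeClaim entry under 'volumes'
# with ephemeral storage, keeping the same volume name, e.g.:
#   volumes:
#   - name: volume
#     emptyDir: {}
oc create -f debug-pod.yaml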

How to troubleshoot DDEV DB container healthcheck timeout

When I want to start my DDEV project, a container gets stuck while being created:
Container ddev-oszimt-lf12a-v2-db Started
Error Message:
Failed waiting for web/db containers to become ready: db container failed: log=, err=health check timed out after 2m0s: labels map[com.ddev.site-name:oszimt-lf12a-v2 com.docker.compose.service:db] timed out without becoming healthy, status=
It's an error I have also had with some other projects.
The error log contains no information about this.
What could the problem be, and how do I fix it?
This isn't a very good place to troubleshoot problems with specific projects; our Discord channel or the DDEV issue queue is much better.
But I'll try to give you some ideas about how to study and debug this.
Go to the Troubleshooting section of the docs. Work through it step by step.
As it says there, try the simplest possible project and see what the results are.
If the problem is particular to one project, see if you can remove customizations like .ddev/docker-compose.*.yaml files and config.*.yaml files, and non-standard things in the config.yaml file.
To find out what causes the healthcheck timeout, see the docs on this exact problem; in your case, the db container is timing out. So first, run ddev logs -s db to see if something happened, and second, run docker inspect --format "{{json .State.Health }}" ddev-<projectname>-db.
For more help, you'll need to provide more information, such as your OS, Docker provider, etc. The easiest way to do that is to run ddev debug test, capture the output, put it in a gist on gist.github.com, and then come over to Discord with a link to that.
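For example, with the project name from the error above (oszimt-lf12a-v2), those diagnostic steps look like this:
# Check the db container's logs for a crash or misconfiguration
ddev logs -s db
# Inspect the Docker healthcheck state of the db container directly
docker inspect --format "{{json .State.Health }}" ddev-oszimt-lf12a-v2-db
# Gather environment details to share when asking for help
ddev debug test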

Couldn’t acquire exclusive lock on DB at ‘/eventstore/db’

I'm trying to install EventStore on Ubuntu 20.04, but every time I run eventstored --what-if (as root, as a normal user, or with sudo) I get the following error message: Couldn't acquire exclusive lock on DB at '/eventstore/db'.
I tried many things:
ensuring that the eventstore user and group were the owners of the folder
reinstalling eventstore
rebooting the server
stopping the process with systemctl stop eventstore and starting it back again
I also tried launching the service first (as root, with sudo, or as a normal user) before using eventstored --what-if.
I can't figure out why I keep getting this message, as if many instances of eventstore were launched at the same time.
EDIT:
Here is my config file (/etc/eventstore/eventstore.conf):
# Paths
Db: /eventstore/db
Index: /eventstore/index
Log: /eventstore/logs
# Certificates configuration
CertificateFile: /etc/eventstore/certs/cert.crt
CertificatePrivateKeyFile: /etc/eventstore/certs/privkey.key
TrustedRootCertificatesPath: /etc/ssl/certs
CertificateReservedNodeCommonName: "*.mathob-jehanno.com"
# Network configuration
IntIp: 37.187.2.103
ExtIp: 37.187.2.103
IntHostAdvertiseAs: mathob-jehanno.com
ExtHostAdvertiseAs: mathob-jehanno.com
HttpPort: 2113
IntTcpPort: 1112
EnableExternalTcp: false
EnableAtomPubOverHTTP: false
# Projections configuration
RunProjections: None
This happened to me previously. I was running v20 without supplying the necessary settings; the certificates were missing, for example. The server crashed because of this, but the last message you see is Couldn't acquire exclusive lock on DB at '/eventstore/db'. Look closely and you may see that it's only a warning, and that the real reason for the crash is mentioned earlier, in the stack trace of the original error.
OK, so first of all, the comments helped a lot:
this error message follows another one which gives more detail about what the problem is.
One thing to know is that eventstored --what-if is supposed to be run while the service is not running, so you need to stop the service first (systemctl stop eventstore).
I then changed the paths to the db, index, and log files to match the default values (this saved me from some permission errors).
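Putting that together, a minimal sketch of the sequence that worked (assuming the packaged systemd unit is named eventstore, as above):
# --what-if must run while the service is stopped, since a running instance
# already holds the exclusive lock on the DB directory
sudo systemctl stop eventstore
# prints the effective configuration and exits; any error reported before the
# lock message points at the real problem (e.g. missing certificates)
sudo eventstored --what-if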

How to edit internal files without running the container

MariaDB 10.3 was installed as a Docker container on a Mac, and the collation-server value in the /etc/mysql/my.cnf file was modified.
After the modification, I tried to restart the container, but when I entered the 'docker ps -a' command, the status was displayed as Exited (1).
So I entered docker logs [container name], and the following error was displayed:
the setting parameter had been incorrectly written as 'collection-server=utf8_unicode_ci'.
So the container did not run.
I've looked at several approaches, but I can't find a way to modify the internal files without running the container.
I know that you shouldn't tamper with files inside a Docker container.
My question may sound like 'How do I edit a file inside the computer without turning on the computer?', but I don't think the answer is to delete the container and create a new one.
Of course, deleting the container and installing a new one would save time and may be the simplest method. But I thought about it in a different way.
If a company actually operating this Docker container made the same mistake as me and could not run the container, it would be a very serious mistake.
Because of that, I think there must be a way.
I would like advice on how to solve this.
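For reference, one common technique not mentioned in the thread: docker cp also works on stopped containers, so the broken file can be copied to the host, fixed, and copied back. The container name mariadb below is hypothetical.
# Copy the config out of the stopped container
docker cp mariadb:/etc/mysql/my.cnf ./my.cnf
# Edit ./my.cnf locally, e.g. collection-server -> collation-server
docker cp ./my.cnf mariadb:/etc/mysql/my.cnf
docker start mariadb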

How do I prevent access to a mounted secret file?

I have a Spring Boot app which loads a YAML file at startup containing an encryption key that it needs to decrypt properties it receives from Spring config.
Said YAML file is mounted as a k8s secret file at etc/config/springconfig.yaml.
If my Spring Boot app is running, I can still open a shell and view the YAML file with "docker exec -it 123456 sh". How can I prevent anyone from being able to view the encryption key?
You need to restrict access to the Docker daemon. If you are running a Kubernetes cluster, access to the nodes where one could execute docker exec ... should be heavily restricted.
You can delete that file once your process has fully started, given that your app doesn't need to read from it again.
OR,
You can set those properties via --env-file, and your app should read them from the environment instead. But if there is still a possibility of someone logging in to that container, they can read environment variables too.
OR,
Set those properties in the JVM rather than the system environment by using -D. Spring can read properties from the JVM environment too.
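A minimal sketch of that last option (the property name encrypt.key is made up):
# Pass the key as a JVM system property instead of a system environment variable
java -Dencrypt.key=... -jar app.jar
Spring resolves JVM system properties like any other property source, e.g. via @Value("${encrypt.key}").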
In general, the problem is even worse than simple access to the Docker daemon. Even if you prohibit SSH to worker nodes and no one can use the Docker daemon directly, there is still a way to read the secret.
If anyone in the namespace has access to create pods (which means the ability to create deployments/statefulsets/daemonsets/jobs/cronjobs and so on), they can easily create a pod, mount the secret inside it, and simply read it. Even someone with only the ability to patch pods/deployments and so on can potentially read all the secrets in the namespace. There is no way to escape that.
For me, that's the biggest security flaw in Kubernetes. That's why you must be very careful about granting access to create and patch pods/deployments and so on. Always limit access to the namespace, always exclude secrets from RBAC rules, and always try to avoid granting the ability to create pods.
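As an illustration, a minimal sketch of a namespace-scoped role that deliberately leaves out secrets (the role and namespace names are hypothetical):
# Grants read-only access to pods and deployments; no rule covers secrets,
# and no verb allows creating or patching pods
kubectl create role app-viewer --verb=get,list,watch \
  --resource=pods --resource=deployments -n team-a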
A possibility is to use Sysdig Falco (https://sysdig.com/opensource/falco/). This tool watches pod events and can take action when a shell is started in your container. A typical action would be to kill the container immediately, so that reading the secret cannot occur; Kubernetes will then restart the container to avoid service interruption.
Note that you must also forbid access to the node itself to prevent direct Docker daemon access.
You can try mounting the secret as an environment variable. Once your application grabs the secret on startup, it can then unset that variable, rendering the secret inaccessible from then on.

Mesos framework stays inactive due to "Authentication failed: EOF"

I'm currently trying to deploy Eremetic (version 0.28.0) on top of Marathon, using the configuration provided as an example. I was actually able to deploy it once, but suddenly, after trying to redeploy it, the framework stays inactive.
By inspecting the logs, I noticed constant attempts to connect to some service that apparently never succeed because of an authentication problem.
2017/08/14 12:30:45 Connected to [REDACTED_MESOS_MASTER_ADDRESS]
2017/08/14 12:30:45 Authentication failed: EOF
It looks like the service returning the error is ZooKeeper, and more precisely the error can be traced back to this line in the Go ZooKeeper library. ZooKeeper itself seems to work, however: I've tried querying it directly with zkCli and running a small Spark job (where the Mesos master is given with a zk:// URL), and everything works.
Unfortunately I'm not able to diagnose the problem further. What could it be?
It turned out to be a configuration problem. The master URL was simply wrong, and this is just how the error was reported.
