spring-data-mongodb/k8s "Database name must not contain slashes, dots, spaces, quotes, or dollar signs"

I'm at a real loss on this one. I've been attempting to get my application running with a replica set in Kubernetes for a while. I'm setting: spring.data.mongodb.uri=${MYAPP_MONGODB:mongodb://localhost:27017/myapp}
in application.properties and using Spring Data to access my objects.
Locally, using a local MongoDB container, it works fine; even if I point the env var at my remote databases, I can connect to them and work just fine. But when I put the value of MYAPP_MONGODB into a k8s secret, I get the quoted error from the title when the container boots. The value is like this:
mongodb://myuser:mypasswd@1.1.1.1:27017,2.2.2.2:27017,3.3.3.3:27017,4.4.4.4:27017,5.5.5.5:27017/myapp
I reviewed the source and am still baffled as to why this is happening. Pulling the secret from the k8s environment, it is correct.
Any help is much appreciated!

It sounds like your secret in k8s might be set up incorrectly. I would try uploading your secrets again and decoding them to make sure they are correct. Watch out for stray line breaks :)
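A quick way to check, as a sketch (the secret and key names below are assumptions): pull the value back out of the cluster, decode it, and inspect the raw bytes for a trailing newline, which is easy to introduce by piping echo without -n into base64.
kubectl get secret myapp-secrets -o jsonpath='{.data.MYAPP_MONGODB}' | base64 --decode | od -c | tail -2
Recreating the secret with --from-literal sidesteps the newline trap entirely:
kubectl create secret generic myapp-secrets --from-literal=MYAPP_MONGODB='mongodb://...'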

Related

How do I prevent access to a mounted secret file?

I have a spring boot app which loads a yaml file at startup containing an encryption key that it needs to decrypt properties it receives from spring config.
Said yaml file is mounted as a k8s secret file at etc/config/springconfig.yaml
While my Spring Boot app is running, I can still open a shell and view the yaml file with "docker exec -it 123456 sh". How can I prevent anyone from being able to view the encryption key?
You need to restrict access to the Docker daemon. If you are running a Kubernetes cluster, access to the nodes where one could execute docker exec ... should be heavily restricted.
You can delete that file once your process has fully started, given your app doesn't need to read from it again.
OR,
You can set those properties via --env-file, and your app should then read from the environment. But if someone can still log in to that container, they can read environment variables too.
OR,
Set those properties in the JVM rather than the system environment, using -D flags. Spring can read properties from the JVM environment too.
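A minimal sketch of that last option (the property name and jar name are made up for illustration):
java -Dencryption.key=s3cr3t -jar myapp.jar
Spring resolves ${encryption.key} from JVM system properties just as it would from an environment variable, e.g. via @Value("${encryption.key}").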
In general, the problem is even worse than simple access to the Docker daemon. Even if you prohibit SSH to worker nodes and no one can use the Docker daemon directly, there is still a way to read the secret.
If anyone in the namespace has access to create pods (which means the ability to create deployments/statefulsets/daemonsets/jobs/cronjobs and so on), they can easily create a pod, mount the secret inside it, and simply read it. Even someone who only has the ability to patch pods/deployments and so on can potentially read all secrets in the namespace. There is no way to escape that.
For me that's the biggest security flaw in Kubernetes, and that's why you must be very careful about granting access to create and patch pods/deployments and so on. Always limit access to the namespace, always exclude secrets from RBAC rules, and always try to avoid handing out pod-creation capability.
A possibility is to use Sysdig Falco (https://sysdig.com/opensource/falco/). This tool watches pod events and can take action when a shell is started in your container. A typical action would be to immediately kill the container so the secret cannot be read; Kubernetes will then restart the container to avoid service interruption.
Note that you must still forbid access to the node itself to prevent direct Docker daemon access.
You can try mounting the secret as an environment variable. Once your application grabs the secret on startup, it can unset that variable, rendering the secret inaccessible from then on.
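The idea, sketched in Python for brevity (an illustration, not the asker's Spring Boot stack; on the JVM there is no portable way to unset your own environment variable):
import os
secret = os.environ.pop("MY_SECRET", None)  # read once, then drop from this process's environment
Note that on Linux /proc/&lt;pid&gt;/environ still shows the initial environment, so this narrows the window rather than closing it.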

How do you use s3a with spark 2.1.0 on aws us-east-2?

Background
I have been working on getting a flexible setup for myself to use spark on aws with docker swarm mode. The docker image I have been using is configured to use the latest spark, which at the time is 2.1.0 with Hadoop 2.7.3, and is available at jupyter/pyspark-notebook.
This is working, and I have been going through to test out the various connectivity paths I plan to use. The issue I came across is uncertainty around the correct way to interact with s3. I have followed the trail on how to provide the dependencies for spark to connect to data on aws s3 using the s3a protocol vs the s3n protocol.
I finally came across the hadoop aws guide and thought I was following the instructions for providing the configuration. However, I was still receiving the 400 Bad Request error, as seen in this question, which describes how to fix it by defining the endpoint, which I had already done.
Being on us-east-2 put me far enough off the standard configuration that I was uncertain whether I had a problem with the jar files. To eliminate the region issue, I set things back up in the regular us-east-1 region, and was finally able to connect with s3a. So I have narrowed the problem down to the region, but thought I was doing everything required to operate in the other region.
Question
What is the correct way to use the configuration variables for hadoop in spark to use us-east-2?
Note: This example uses local execution mode to simplify things.
import os
import pyspark
I can see in the notebook console that these packages download after the context is created; adding them took me from being completely broken to getting the Bad Request error.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'
conf = pyspark.SparkConf().setMaster('local[1]')  # SparkConf() takes no master argument; set it via setMaster
sc = pyspark.SparkContext(conf=conf)
sql = pyspark.SQLContext(sc)
For the aws config, I tried both the method below and, using just the conf above, the equivalent conf.set("spark.hadoop.fs.<config_string>", <config_value>) pattern, except that way I set the values on conf before creating the spark context (a sketch follows the block below).
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.access.key", access_id)
hadoop_conf.set("fs.s3a.secret.key", access_key)
One thing to note, is that I also tried an alternative endpoint for us-east-2 of s3-us-east-2.amazonaws.com.
I then read some parquet data off of s3.
df = sql.read.parquet('s3a://bucket-name/parquet-data-name')
df.limit(10).toPandas()
Again, after moving the EC2 instance to us-east-1 and commenting out the endpoint config, the above works for me. It seems like the endpoint config isn't being used for some reason.
us-east-2 is a V4-auth-only S3 region, so, as you attempted, the fs.s3a.endpoint value must be set.
If it's not being picked up, then assume the config you are setting isn't the one being used to access the bucket. Know that Hadoop caches filesystem instances by URI, even when the config changes. The first attempt to access a filesystem fixes the config, even when it's lacking auth details.
Some tactics
set the value in spark-defaults.conf
using the config you've just created, try to explicitly load the filesystem: a call to FileSystem.get(new URI("s3a://bucket-name/parquet-data-name"), myConf) will return the bucket with that config (unless it's already there). I don't know how to make that call in .py though (see the sketch below).
set the property "fs.s3a.impl.disable.cache" to true to bypass the cache before the get command
Adding more diagnostics on BadAuth errors, along with a wiki page, is a feature listed for S3A phase III. If you were to add it, along with a test, I can review it and get it in.

Using redis with heroku

This is my first time using redis, and the only reason I am using it is that I'm trying out an autocomplete search tutorial. The tutorial works perfectly in development but I'm having trouble setting up redis for heroku.
I already followed these steps on the heroku docs for setting up redis but when I run heroku run rake db:seed I get Redis::CannotConnectError: Error connecting to Redis on 127.0.0.1:6379 (Errno::ECONNREFUSED)
I'm not very familiar with heroku so if you guys need any more information let me know.
Edit
I've completed the initializer steps shown here and when I run heroku config:get REDISCLOUD_URL the result is exactly the same as the Redis Cloud URL under the config vars section of my Heroku settings.
Following the documentation, I then set up config/initializers/redis.rb like so:
if ENV["REDISCLOUD_URL"]
$redis = Redis.new(:url => ENV["REDISCLOUD_URL"])
end
Just to check, I tried substituting the actual Redis Cloud URL inside the if block instead of just the REDISCLOUD_URL variable, but that didn't work. My error message hasn't changed when I try to seed the heroku db.
It’s not enough to just create a $redis variable that points to the installed Redis server; you also need to tell Soulmate about it, otherwise it will default to localhost.
From the Soulmate README you should be able to do something like this in an initializer (instead of your current redis.rb initializer, which you won’t need unless you are using Redis somewhere else in your app):
if ENV["REDISCLOUD_URL"]
Soulmate.redis = ENV["REDISCLOUD_URL"]
end
Looking at the Soulmate source, an easier way may be to set the REDIS_URL environment variable to the Redis url, either instead of or as well as REDISCLOUD_URL, as it looks like Soulmate checks this before falling back to localhost.
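If you go the REDIS_URL route, copying the add-on's value across is a one-liner (assuming the Redis Cloud add-on already set REDISCLOUD_URL, as your config:get output suggests):
heroku config:set REDIS_URL="$(heroku config:get REDISCLOUD_URL)"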
Your code is trying to connect to a local Redis instance instead of the one from Redis Cloud - make sure you've completed the initializer step as described in order to resolve this.

How to use DATABSE_URL in Play 2.0 (Scala) for local connection to PostgreSQL 9.1 & Heroku?

I have developed a first web application with Play! Framework 2.0.3, using a local PostgreSQL 9.1 on OSX (Lion 10.7.4). I started with my database connection defined in conf/application.conf (relative to the application's directory) with
db.default.driver=org.postgresql.Driver
db.default.url="jdbc:postgresql://localhost/fotoplay"
db.default.user=foo
db.default.password=bar
(username and password have been changed before posting). This works for writing and testing. I now want to deploy to Heroku. I have set up the Procfile (in the application's directory) with a single line:
web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS} -DapplyEvolutions.default=false -Ddb.default.driver=org.postgresql.Driver -Ddb.default.url=$DATABASE_URL
I have exported DATABASE_URL in .bash_profile so that I can echo $DATABASE_URL at the system command line and get
postgres://foo:bar@localhost/photoplay
At Heroku, I have set up an instance of postgresql. I'm not sure how to write a database evolution to populate heroku, so I turned off evolutions and populated my database manually. My current evolutions build tables and populate them with some initial data from csv files, so I would need to place the csv files somewhere accessible to heroku. At least for a first pass, I don't want to tackle this issue just yet. But as a result of the manual steps, I have a populated database on Heroku.
With this configuration, my application runs correctly locally. However, I am not using DATABASE_URL, which seems wrong. I pushed this to Heroku and it is not able to connect to the database. The error is:
Caused by: org.postgresql.util.PSQLException:
FATAL: password authentication failed for user "foo"
If I remove the username & password, I break my local configuration. I should be able to use DATABASE_URL instead of the older syntax, but I don't quite know how to do that. I don't think I can figure this out experimentally (wandering high-dimensional spaces is hopeless, and there are several possible configuration settings that might be involved & are subject to typos). Any guidance on how I should set up application.conf would be appreciated.
Worth a try: in your application.conf file, change your db.default.url parameter in order to include the username & password in the URL, and remove the db.default.user and db.default.password parameters:
db.default.url="postgres://foo:bar#localhost/fotoplay"

Windows - Private hosts file for a certain environment

I've an application running on a dev server and connecting to a dev-db hosting an oracle instance.
Now I'm deploying the app on a prod/prod-db machine.
Since the dev-db url is hardcoded inside the java code, the just-copied binaries still point to dev-db. As a quick workaround I added a line to the Windows hosts file on prod so that dev-db now points to the prod-db IP address. It works, but I'm not very satisfied with this global-scope solution.
I was wondering if there is a way to make a hosts file "private" to a certain environment, i.e. only valid in the scope of my running application.
No, there's no way to do this, and it's a bad approach anyway.
You should instead fix the real problem, which is the hard-coding of the address inside your java code. Put such things in a properties file, and use a different properties file for production.
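A minimal sketch of that approach in Java (file names, keys, and the Oracle connect string are all made up for illustration):
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class DbConfig {
    public static void main(String[] args) throws Exception {
        // Pick the environment via a system property instead of hard-coding the host:
        // java -Dapp.env=prod ... loads db-prod.properties; the default is db-dev.properties.
        String env = System.getProperty("app.env", "dev");
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get("db-" + env + ".properties"))) {
            props.load(in);
        }
        // db-prod.properties would contain e.g.: db.url=jdbc:oracle:thin:@prod-db:1521:ORCL
        System.out.println("Connecting to " + props.getProperty("db.url"));
    }
}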
