Can a person add CORS headers using the ELB Application Load Balancer (sitting in front of Solr)?

We have a number of EC2 instances running Solr, which we've used in the past through another application. We would like to move towards allowing users (via web browser) to access Solr directly.
Without something "in front" of Solr this is a security risk, so we have opted to use ELB (specifically the Application Load Balancer) as a simple, maintenance-free way of preventing certain requests from reaching Solr (i.e. preventing the public from deleting or otherwise modifying the documents in Solr).
This worked great, but we realize that we need to deal with the CORS issue. In other words, we need to add the appropriate headers to responses to requests that come in from a browser. I have not yet seen a way of doing this with the Application Load Balancer, but am wondering if it is possible somehow. If it is not, I would love a recommendation for the easiest and least complicated way of adding these headers. We really, really hate to add something like nginx in front of Solr, because then we've got additional redundancy to deal with, more servers, etc.
Thank you!

There is not much I can find on CORS for the ALB either; I remember that when I used Elastic Beanstalk with an ELB, I had to add CORS support directly in my Java application.
Having said that, I can find a lot of articles on how to set up CORS in Solr itself.
Could that be an option for you?
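For reference, "CORS support" concretely means the responses Solr sends back (including the OPTIONS preflight) carry the Access-Control-* headers the browser checks. Once you have configured it on the Solr side, a quick sanity check could look like this minimal sketch; the Solr URL and Origin value are hypothetical placeholders, not your actual setup:

```python
# Quick check that a Solr endpoint now returns the CORS headers a browser
# expects. The Solr URL and Origin value are hypothetical placeholders.
import requests

SOLR_SELECT = "http://solr.internal.example.com:8983/solr/mycore/select"  # placeholder

# Simulate the browser's preflight request.
preflight = requests.options(
    SOLR_SELECT,
    headers={
        "Origin": "https://app.example.com",          # placeholder origin
        "Access-Control-Request-Method": "GET",
    },
)

for name in ("Access-Control-Allow-Origin",
             "Access-Control-Allow-Methods",
             "Access-Control-Allow-Headers"):
    print(name, "=>", preflight.headers.get(name))
```

If those headers come back from Solr itself, the ALB can stay a plain pass-through and only needs its listener rules to block the unwanted methods and paths.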

Related

How to remotely connect to a local elasticsearch server - in a secure way ofc

I have been playing around with creating a web app that uses Elasticsearch to perform queries. Currently everything is in production, thus on the local host; let's say Elasticsearch runs at 123.123.123.123:9200. All fun and games, but once the web application (React) is finished, it should be able to send its queries to that local Elasticsearch database.
I have been reading around on how to get this done in a proper and, most of all, secure way. A summary of what I've found so far:
"First off, exposing an Elasticsearch node directly to the internet without protections in front of it is usually bad, bad news." (see here: Accessing elasticsearch from a public domain name or IP).
Another interesting blog I found: https://code972.com/blog/2017/01/dont-be-ransacked-securing-your-elasticsearch-cluster-properly-107.
The problem with the above-mentioned sources is that they are a bit older, so I am not sure whether the advice is still up to date.
Therefore the following questions:
Is nginx sufficient to act as a secure middleman, passing queries from end users to Elasticsearch?
What is the difference, at that point, compared with writing a backend for the React application (e.g. using Node and Express)?
What is the added value, taking into account Elasticsearch's built-in security (usernames, passwords, API keys, certificates, HTTPS, ...)?
I am also reading a lot about using a VPN or tunneling. I have the impression that these solutions are geared more towards a corporate/collaborative approach. Say I am running my front end on a live server: I can use tunneling to show my work to colleagues or my employer. A VPN would be more realistic for allowing employees (wish I had them, I'm just a CS student) to access, for example, the database within my private network - say an employee needs to reach Kibana to adapt something like an API key (just making something up here); he/she could use a VPN connection for that.
Thank you so much for helping me clarify the above-mentioned points!
TLS, authorisation and access control are free for the Elastic Stack, and have been for a while. I'd start by looking at the docs, as that's an easy way to secure your cluster natively.
As for nginx: it can be useful for rate limiting or for blocking specific queries, for example, but it's another thing to configure and maintain.
From a client point of view it only really matters if you are using the official Elasticsearch clients and you put nginx in between in a way that changes how the API responds to the client (e.g. path rewrites, rate limiting).
The built-in security is free, it's native, and it's easy to manage via Kibana.
Regarding a VPN or tunnelling: I'd follow the docs to secure Elasticsearch first and see if you need that at some point in the future. It would be handled outside Elasticsearch anyway, and you'd still want to secure Elasticsearch itself.
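To make the first point concrete, here is a minimal sketch of connecting to a natively secured cluster with the official Python client; the endpoint, credentials and CA certificate path are hypothetical placeholders, not values from your setup:

```python
# Minimal sketch of connecting to a TLS/auth-secured cluster with the
# official Elasticsearch Python client. The endpoint, credentials and CA
# certificate path are hypothetical placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://123.123.123.123:9200",
    basic_auth=("webapp_user", "change-me"),          # or api_key="..."
    ca_certs="/etc/elasticsearch/certs/http_ca.crt",  # the cluster's CA cert
)

# Your backend (rather than the browser) would run queries on the user's behalf:
resp = es.search(index="documents", query={"match": {"title": "example"}})
print(resp["hits"]["total"])
```

The important part is that the credentials live in your backend, not in the React bundle shipped to the browser.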
The problem with exposing Elasticsearch nodes directly to the internet is, in principle, the larger attack surface. You should follow the rule of exposing the least possible "surface" of your system to the internet.
A good practice is to hide from the internet whatever doesn't need to be there, even if it is well protected. Any exposed service typically starts receiving cyber attacks within ~20 minutes (see a showcase).
So I suggest you set up a private network, such as a traditional VPN or an SDP product such as Shieldoo Mesh.

Caching proxy solutions for internet bound HTTPS traffic

Sorry if this is inappropriate for SO, but I wasn't sure where best to ask this question!
Background:
Running applications on EC2 Container Service (ECS) inside an AWS VPC.
There's the potential to move the functions making the requests to Lambda functions in the near future (3-6 months).
What I'm functionally looking to achieve:
Cache responses from HTTPS traffic to specific URL patterns (e.g. subdomain.example.com) for specified periods (e.g. 7 days).
We're hitting API limits for the free/paid services we call and are looking to inject a layer that handles duplicate requests transparently; unfortunately this isn't easy to handle at the application layer.
Apply this at the VPC level (e.g. at the Internet Gateway?) or at the ECS service level - not too fussed which one.
Ideally this would be transparent to the application itself, but I'm guessing the fact that it's HTTPS traffic may throw a spanner in the works. I was initially thinking this might be possible at the Internet Gateway level, but I assume that doesn't have easy access to request headers.
Potential solutions:
Squid? (https://aws.amazon.com/articles/6463473546098546)
Linkerd? Seems like something should be possible with this (we also don't have a unified service discovery approach at the moment, so this might kill two birds with one stone).
Any suggestions would be greatly appreciated!
Alex
PS. As you can probably tell I'm a little out of my depth in this one, sorry if I'm mixing patterns/solutions!
If I understand your question correctly, you want to cache certain responses from the free/paid third-party APIs you call. I'm wondering whether you're looking for a solution that works inside your VPC, or if it's fine for the solution to sit outside of it.
If you're OK with a solution running outside of your VPC, CloudFront might be worth looking into. CloudFront can act as a caching layer for content from any origin, even when the origin connection uses HTTPS. It is even possible to use signed URLs or signed cookies with CloudFront to restrict unwanted access, if that's what you're going for.
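If you do end up wanting signed URLs, a rough sketch of generating one with botocore's CloudFrontSigner is below; the distribution domain, key pair ID, private key file and expiry are hypothetical placeholders:

```python
# Sketch of generating a CloudFront signed URL to restrict access to cached
# content. The distribution domain, key pair ID and private key file are
# hypothetical placeholders.
from datetime import datetime, timedelta

import rsa                                    # pip install rsa
from botocore.signers import CloudFrontSigner


def rsa_signer(message):
    # Sign the CloudFront policy with the private key matching the key pair.
    with open("cloudfront_private_key.pem", "rb") as key_file:
        private_key = rsa.PrivateKey.load_pkcs1(key_file.read())
    return rsa.sign(message, private_key, "SHA-1")  # CloudFront expects SHA-1 here


signer = CloudFrontSigner("APKAEXAMPLEKEYID", rsa_signer)  # placeholder key pair ID

signed_url = signer.generate_presigned_url(
    "https://d1234example.cloudfront.net/api/some-resource",  # placeholder distribution
    date_less_than=datetime.utcnow() + timedelta(days=7),     # link validity window
)
print(signed_url)
```

For plain caching you don't need signing at all; the distribution's TTL settings and the Cache-Control headers returned by the origin control how long responses are kept.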

Good default tomcat server.xml for AWS ElasticBeanstalk

My setup is a WAR deployed through Elastic Beanstalk to a Tomcat 7 / Java 7 environment. I'm doing basic HTML with servlets, and REST. Nothing fancy.
I would like to replace the default server.xml for Tomcat 7 / Java 7 under my Elastic Beanstalk environment.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers.html
I'm a bit confused.
I'm looking for reasonable performance tuning numbers for the parameters there.
I'm looking for good defaults for security as well.
Should I touch the AJP connector (every request goes to a servlet)? If so, what should I configure?
Does this setup have Apache as a front end, or do HTTP requests go directly to Tomcat?
My instance is choking after a relatively low number of concurrent users, with ~9% CPU and plenty of DB connections to spare. Am I jumping to conclusions with server.xml?
Thanks.
AJP is not necessary, since you mention that most requests are servlet-based. You would use AJP in case you have more static content to serve.
In most cases the performance tuning needs to happen at the web front end. Below are my suggestions (the sketch after this list shows a quick way to check the first few against a running environment):
Use gzip compression for web content.
Make your pages cacheable by using cache-related HTTP headers (ETag, Expires, Cache-Control). Doing so reduces the number of unnecessary HTTP requests.
Minify JS and CSS to reduce their size.
Check whether you are getting a lot of traffic from web crawlers. If you are, reuse their sessions with the Crawler_Session_Manager_Valve.
Index the key tables of your database.
Make sure you are using DB connection pooling instead of opening a new connection for every request.
Avoid unnecessary redirects and conditional-request round trips (302, 304).
In case you are looking for a good book to help you optimize your website, see High Performance Web Sites from O'Reilly.
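Here is that sketch; the URL is a hypothetical placeholder for your Elastic Beanstalk endpoint, and it only verifies response headers, not server.xml values:

```python
# Quick check of compression, caching headers and redirects for a deployed
# app. The URL is a hypothetical placeholder for your environment.
import requests

URL = "http://my-env.elasticbeanstalk.example.com/"  # placeholder

resp = requests.get(URL, headers={"Accept-Encoding": "gzip"}, allow_redirects=False)

print("Status:          ", resp.status_code)                      # 3xx here means a redirect
print("Content-Encoding:", resp.headers.get("Content-Encoding"))  # expect 'gzip'
print("ETag:            ", resp.headers.get("ETag"))
print("Cache-Control:   ", resp.headers.get("Cache-Control"))
print("Expires:         ", resp.headers.get("Expires"))
```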

How to serve images from Riak

I have a bunch of markdown documents in Riak, which I'm exposing via a small Sinatra API with basic search functionality etc.
Each document has an associated image, also stored in Riak (in a different bucket). I'd like to have a client app display the documents alongside their associated images - so I need some way to make the images available, but as I'm only ever going to be requesting them by key it seems wasteful to serve them via a Sinatra app as I'm doing with the documents.
However I'm uneasy with serving them directly from Riak, because a) even using nginx to limit the acceptable requests, I worry about exposing more functionality than we want to, and b) Riak throws a 403 for any request where the referrer is set, so by default using a direct-to-Riak URL as the src of an img tag doesn't work.
So my question is: what's a good approach to take for serving the images? Add another endpoint to the Sinatra app? Serve them direct from Riak using some nginx wizardry that is currently beyond me? Or some other approach I haven't considered yet? This would ideally use Ruby, as that's what the team I'm working with is most comfortable with.
Not sure if this question might be better suited to Server Fault - if so I'll move it over.
You're right to be concerned about exposing Riak to any direct connectivity. Until 2.0 arrives early next year, there is no security in the system (although the 403 for requests with a referrer is a security mechanism to protect against XSS), and even with security, exposing any database directly to the Internet invites disaster.
I've not done anything with nginx, but all you'd really need to use it properly, I'd think, would be a few features:
Ability to restrict requests to GET
Ability to restrict (or rewrite) requests to the proper bucket
Ability to strip out all HTTP headers that Riak includes in its result (which, since nginx is a proxy server and not a straight load balancer, seems like it should be straightforward)
Assuming that your images are the only content in that bucket, nginx feels like a reasonable choice here.
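If the nginx route feels like too much wizardry, the "another endpoint" option from your question boils down to enforcing those same restrictions in a few lines of application code. A rough sketch follows (in Python/Flask purely for illustration, since I don't know your Sinatra code; the Riak host and bucket path are hypothetical placeholders):

```python
# Rough sketch of an app-level image endpoint enforcing the restrictions
# listed above: GET only, a single fixed bucket, and only the content type
# passed through. The Riak host and bucket path are hypothetical placeholders.
import requests
from flask import Flask, Response, abort

app = Flask(__name__)

RIAK_IMAGES = "http://127.0.0.1:8098/riak/images"  # placeholder images bucket URL


@app.route("/images/<key>", methods=["GET"])
def serve_image(key):
    riak_resp = requests.get(f"{RIAK_IMAGES}/{key}")
    if riak_resp.status_code != 200:
        abort(404)
    # Return the body with only a Content-Type header; Riak's own headers are dropped.
    return Response(
        riak_resp.content,
        content_type=riak_resp.headers.get("Content-Type", "application/octet-stream"),
    )
```

The Sinatra version would be the same handful of lines; the point is just that only GET against a single bucket is reachable and none of Riak's headers leak through.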

Should I make my CouchDB database server public-facing?

I'm new to CouchDB and am trying to understand how to use it properly. I'm coming from MongoDB, where I would always write a web layer and put it in front of Mongo so that I could let users access the data inside it, and so on. In fact, this is how I've used every database for every web site I've ever written. Looking at Couch, I see that its native API is HTTP and that it has built-in things like OAuth support, and other features that hint to me that perhaps I should no longer have my code layer sitting in front of Couch, but instead write views and the like and just give my users accounts on Couch directly. I'm thinking in terms of an HTTP-based API for a site of mine, or something users would consume my data through. Opening up Couch like this seems odd to me, though. Is OAuth, in Couch's sense, meant more for remote access by software that I'd write and run "officially" inside my own network, or is it literally meant for end users?
I know there may be things that can only be done through a code layer on top of CouchDB, for example if I wanted additional, non-database-related things to happen during API requests. So, thinking along those lines, I suspect I will still need a code layer anyway.
Dealer's choice.
Nodejitsu has a great writeup on this sort of topic here.
Not knowing your application specifics I'll take a broad approach...
Back-end
If you want to prevent users from ever seeing your database, keep it on the back end. You can pipe everything through something like Node.js and present only what the user needs to see; they'll never know anything about the database.
See Resource View Presenter
Front-end
If you are not concerned about data security, you can host an entire app on CouchDB; see CouchApp. This approach has the benefit of using the replication mechanism to control publishing your site/data. The drawback here is that you will almost certainly run into some technical limitations that will require moving CouchDB closer to the backend.
Bl-end
Have the app server present the interface and the client pull the data from the database separately. This gives the most flexibility but can be a bag of hurt because even with good design this could lead to supportability and scalability issues.
My recommendation
Use CouchDB on the backend. If you need mobile clients to synchronize then use a secondary DB publicly exposed for this purpose and selectively sync this data to wherever it needs to go.
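As a rough illustration of that last point, CouchDB's _replicate endpoint can push a filtered subset of an internal database into a separate, publicly exposed one. A minimal sketch, with hypothetical hosts, database names, credentials and filter name:

```python
# Sketch of selectively replicating documents from an internal CouchDB into
# a secondary, publicly exposed database. Hosts, database names, credentials
# and the filter function name are hypothetical placeholders.
import requests

ADMIN_AUTH = ("admin", "change-me")  # placeholder credentials

resp = requests.post(
    "http://internal-couch:5984/_replicate",
    auth=ADMIN_AUTH,
    json={
        "source": "http://internal-couch:5984/app_data",
        "target": "http://public-couch:5984/public_data",
        "filter": "sync/public_docs",  # design-document filter that passes only public docs
        "continuous": True,            # keep the public copy in sync
    },
)
print(resp.status_code, resp.json())
```

The filter (or, on newer CouchDB versions, a selector) is what keeps anything non-public out of the exposed database.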
Simply put, no.
There's no way to secure Couch properly on a public-facing site. There's no way to discriminate access at a fine-grained enough level: if someone has access to any of the data, they have access to all of the data.
Not all data on a site is meant for public consumption, save for the most trivial of sites.
