rsyslog rewrite hostname before relay

I am setting up rsyslog in a multitenant environment to relay to a central server. Because it is multitenanted, I would like to prefix the hostname with a customer-specific string on the first rsyslog server before relaying on to the central server. I had planned to set the prefix manually; however, the prefix is already configured in another file on the server, and if it could be gathered from that file, that would be even better.
Because the first server will be relaying from multiple hosts, the prepend has to be a dynamic rewrite that includes the original hostname rather than a hard-coded overwrite of the same hostname for all entries, which I've seen in some examples.
Ideally, what I am trying to do is summarised by the following pseudocode:
ruleset(name="myrule") {
    set $hostname = "<prefix>-%HOSTNAME%"
    action(type="omfwd" target="remote-ip")
}
I will be responsible for both the intermediate relay and the central server, and I have full control of both layers, but each relay can host multiple customers, so I don't think the rewrite can be done on the central server. Each customer is connected via a dedicated interface, so I was planning to attach a separate ruleset to the input configured for each interface, with each ruleset carrying the customer-specific prefix. For this reason, I think the config needs to be on the relay, but if there's a different way, I am willing to try anything that meets the end goal of making events customer-identifiable.
The reason for wanting the hostname rewrite is that it is in line with how other tools are configured in the environment, and it is highly desirable to keep a homogeneous setup. However, another method may be considered if this one is not technically feasible.
What is the correct way to do this?
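For what it's worth, one way to express this without touching the hostname property at all is to prepend the prefix in a forwarding template bound to each customer's ruleset. The following is a rough, untested sketch only; cust1, 192.0.2.10 and 10.1.1.1 are placeholders, and it assumes the relay receives each customer over UDP on a dedicated interface address (swap in imtcp/imptcp as needed):

template(name="cust1-fwd" type="string"
         string="<%PRI%>%TIMESTAMP% cust1-%HOSTNAME% %syslogtag%%msg%\n")

ruleset(name="cust1") {
    action(type="omfwd" target="192.0.2.10" port="514" protocol="tcp" template="cust1-fwd")
}

input(type="imudp" port="514" address="10.1.1.1" ruleset="cust1")

The template simply mirrors the traditional forwarding format with the customer prefix inserted before %HOSTNAME%, so the original hostname is preserved rather than overwritten.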

Related

How to remotely connect to a local elasticsearch server - in a secure way ofc

I have been playing around with creating a web app that uses Elasticsearch to perform queries. Currently everything runs locally; let's say Elasticsearch runs at 123.123.123.123:9200. All fun and games, but once the web application (React) is finished, it should be able to send its queries to that local Elasticsearch database.
I have been reading around on how to get this done in a proper and, most of all, secure way. The summary of it all so far:
"First off, exposing an Elasticsearch node directly to the internet without protections in front of it is usually bad, bad news." (see here: Accessing elasticsearch from a public domain name or IP).
Another interesting blog I found: https://code972.com/blog/2017/01/dont-be-ransacked-securing-your-elasticsearch-cluster-properly-107.
The problem with the above-mentioned sources is that they are a bit older, and thus I am not sure whether they are up to date.
Therefore the following questions:
Is nginx sufficient to act as a secure middleman, passing the queries from the end-users to elastic?
What is the difference at that point with writing a backend into the react application (e.g. using node and express)?
What is the added value taking into account the built-in security from elasticsearch (usernames, password, apikey, certificates, https,...)?
I am also reading a lot about using a VPN or tunneling. I have the impression that these solutions are more geared towards a corporate, collaborative approach. Say I am running my front end on a live server: I could use tunneling to show my work to colleagues or my employer. A VPN seems more realistic for allowing employees (wish I had them - I'm just a CS student here) to access, for example, the database within my private network; if an employee needed to access Kibana to adapt something, say an API key (just making something up here), he or she could use a VPN connection for that.
Thank you so much for helping me clarify the above-mentioned points!
TLS, authorisation and access control are free for the Elastic Stack, and have been for a while. I'd start by looking at the docs, as it's an easy way to natively secure your cluster.
As for nginx, it can be useful for rate limiting or for blocking specific queries, for example; however, it's another thing to configure and maintain.
From a client point of view it would really only matter if you are using the official Elasticsearch clients and nginx changes the way the API responds to the client (e.g. path rewrites, rate limiting).
It's free, it's native, and it's easy to manage via Kibana.
I'd follow the docs to secure Elasticsearch and then see if you need this at some point in the future; this would be handled outside Elasticsearch anyway, and you'd still want to secure Elasticsearch itself.
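To make the first point concrete, the native security features are enabled in elasticsearch.yml. A minimal, illustrative sketch only (the keystore paths are placeholders, and newer 8.x releases turn most of this on by default):

xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: http.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

Passwords for the built-in users are then set with the password setup/reset tools that ship with Elasticsearch, and the React app talks to the cluster over HTTPS with an API key or a dedicated user.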
The problem with exposing Elasticsearch nodes directly to the internet is, in principle, the larger attack surface. You should follow the rule of exposing the least possible "surface" of your system to the internet.
A good practice is to hide from the internet whatever doesn't need to be there, even if it is well protected; it takes roughly 20 minutes for any exposed service to start receiving attacks (see a showcase).
So I suggest you set up a private network, such as a traditional VPN or an SDP product such as Shieldoo Mesh.

HAProxy - routing to backend IP based on URL /path?

I'm trying to use HAProxy as a dynamic proxy for backend hosts based on partial /path regex match. The use case is routing from an HTTPS frontend to a large number of nodes that come and go frequently, without maintaining an explicit mapping of /path to server hostnames.
Specifically in this case the nodes are members of an Amazon EMR cluster, and I'd like to reverse-proxy/rewrite HTTP requests like:
<haproxy>/emr/ip-99-88-77-66:4040 -> 99.88.77.66:4040
<haproxy>/emr/ip-55-44-33-22/ganglia -> 55.44.33.22/ganglia
<haproxy>/emr/ip-11-11-11-11:8088/cluster/nodes -> 11.11.11.11:8088/cluster/nodes
...and so on, dynamically.
That is: parse the path beginning at /emr and proxy requests to the IP captured by the regex:
emr\/ip-(\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3})(.*)
Is this possible with HAProxy? I know it's probably not the right tool for the job, but if possible (even non-performant) I'd like to use the tooling we already have in place.
tl;dr basically nginx proxy_pass, but with HAProxy and plucking a backend IP from the url.
Thanks!
Yes, it's possible by using URL filters in HAProxy; see the link below for more details.
https://fossies.org/linux/haproxy/doc/internals/filters.txt
Yes, this can be done. I would recommend using ACLs, along with round-robin balancing and health checks, which let HAProxy verify that an instance is up before routing to it. That way, the system will only route to service instances that are up and running.
In addition, this also lets you cycle instances in and out constantly, for example if your AWS instance costs change relative to other providers you may have, and allows you to load balance with maximum cost savings in mind.
Yes, this is possible. Check the official manual:
Using ACLs and fetching samples
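As a very rough, untested sketch of what that could look like (HAProxy 2.x assumed; the bind line, cert path and names are placeholders), the node segment can be picked apart with converters and the destination set per request:

frontend https-in
    mode http
    bind :443 ssl crt /etc/haproxy/site.pem
    acl is_emr path_beg /emr/ip-
    use_backend emr_dynamic if is_emr

backend emr_dynamic
    mode http
    # grab the "ip-99-88-77-66:4040" segment of the path
    http-request set-var(txn.node) path,field(3,/)
    # turn "ip-99-88-77-66" into "99.88.77.66" and use it as the destination address
    http-request set-dst var(txn.node),field(1,:),regsub(^ip-,),regsub(-,.,g)
    # use the explicit port when one is present; requests without a port need separate handling
    http-request set-dst-port var(txn.node),field(2,:) if { var(txn.node) -m sub : }
    # strip the /emr/ip-... prefix before forwarding (a bare /emr/ip-... path would need a default of /)
    http-request replace-path ^/emr/[^/]+(.*) \1
    # 0.0.0.0:0 tells HAProxy to use the destination address/port set above
    server node 0.0.0.0:0

This trades health checks and connection reuse for flexibility, which fits the "even non-performant" caveat in the question.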

Firewall review using ELK

I'm looking for a way to perform an automated, centralized firewall review using the ELK stack. I believe it's a nice tool to achieve this, specifically with Kibana.
By using certain data, I would like to enrich the firewall ruleset.
The firewall rules can be exported in CSV format, and the rest of the data can be adapted to the same format.
The data will be comprised of:
Firewall rules: Source, Destination, Services, Action (drop or accept)
Inventory of all IPs: IP address(es), support groups, status of the device (dismissed or deployed)
Addressing scheme of each office: Site, belonging subnet(s)
I have imported all the data, but I have absolutely no idea how to achieve the correlation between the firewall rules, the related IPs from the inventory, and the addressing of the offices. It would be nice to view a firewall rule and see whether it belongs to a specific site. Bear in mind that a single firewall rule might have multiple entries (e.g. 10.0.0.0/8 -> 10.0.0.1, 10.0.0.2, etc.).
The final objective of this review is to aggregate rules (wherever possible), optimizing operability and security.
Did anybody find themselves in the same situation? If so, how did you solve it?
I'd appreciate any input you might have.
I figured it out.
Basically, I wrote some Python scripts which correlate (offline) the information in the rules CSV with the inventory CSV, and the rules CSV with the environments/subnets CSV.
After all this, some manual intervention is still required to achieve what I'm asking.
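In case it helps anyone, a minimal sketch of that kind of offline correlation is below. The file names and column headers ("Site", "Subnet", "Source", "Destination") are assumptions about the CSV exports, not a known format:

import csv
import ipaddress

def load_sites(path):
    """Load the office addressing scheme as (site, subnet) pairs."""
    sites = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            sites.append((row["Site"], ipaddress.ip_network(row["Subnet"])))
    return sites

def site_of(entry, sites):
    """Return the site whose subnet contains an IP or network from a rule field."""
    net = ipaddress.ip_network(entry.strip(), strict=False)
    for site, subnet in sites:
        if net.version == subnet.version and net.subnet_of(subnet):
            return site
    return "unknown"

def annotate_rules(rules_path, sites):
    """Yield each firewall rule with source/destination sites attached."""
    with open(rules_path, newline="") as f:
        for rule in csv.DictReader(f):
            # a single rule field may hold several comma-separated entries
            src = {site_of(e, sites) for e in rule["Source"].split(",")}
            dst = {site_of(e, sites) for e in rule["Destination"].split(",")}
            yield {**rule, "SourceSites": ";".join(sorted(src)), "DestSites": ";".join(sorted(dst))}

if __name__ == "__main__":
    sites = load_sites("sites.csv")
    for annotated in annotate_rules("firewall_rules.csv", sites):
        print(annotated)

The same annotated rows can be written back to CSV and re-imported into Elasticsearch so the site fields are available in Kibana.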

ElasticSearch replication home/server

I am running a local ElasticSearch server from my own home, but would like access to the content from outside. Since I am on a dynamic IP and besides that do not feel comfortable opening up ports to the outside, I would like to rent a VPS somewhere, setup ElasticSearch and let this server be a read only copy of the one I have at home.
As I understand it, this should be possible - however I have been unsuccessful at creating any usable version that lets another server be a read-only version of my home ES-server.
Can anyone point me to a piece of information, or write a guide, that would help me set this up? I am fairly familiar with using ES; however, my setup skills are still shaky.
As I understand it, this should be possible
It might be possible with some workarounds, but it's definitely not built for that:
One cluster needs to be in one physical region; mainly because of latency and the stability of the network connection.
There are no read-only versions. You could only allow read access to a node (via a reverse proxy or the security plugin), but that's only a workaround.
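For completeness, the reverse-proxy workaround mentioned above could look roughly like this on the VPS: an nginx front end that only passes read requests to the node it sits in front of. Untested sketch, with placeholder names and certificate paths:

server {
    listen 443 ssl;
    server_name es.example.com;
    ssl_certificate     /etc/nginx/es.crt;
    ssl_certificate_key /etc/nginx/es.key;

    location / {
        # only allow GET (and HEAD); note that searches sent as POST bodies
        # would need an explicit exception if you rely on them
        limit_except GET {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
    }
}

It restricts access, but it does nothing about getting the data from home to the VPS in the first place, which is the harder part of the question.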

1 A-record for every subdomain (10000+); any potential issues? Any other solution?

Most solutions I've read here for supporting subdomain-per-user at the DNS level are to point everything to one IP using *.domain.com.
It is an easy and simple solution, but what if I want to point the first 1000 registered users to serverA and the next 1000 registered users to serverB? This is our preferred approach for keeping software and hardware clustering costs down.
(diagram quoted from the MS IIS site: http://learn.iis.net/file.axd?i=1101)
The most logical solution seems to be one A record per subdomain in the zone data files. BIND doesn't seem to have any size limit on zone files; they are only restricted by available memory.
However, my team is worried about the latency of getting a new subdomain up and ready, since creating one consists of inserting a new A record and restarting the DNS server.
Is the performance of restarting the DNS server something we should worry about?
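For reference, one A record per subdomain in a BIND zone file would look like this (illustrative values only):

$ORIGIN domain.com.
user0001    IN  A    192.0.2.10
user0002    IN  A    192.0.2.10
user0003    IN  A    203.0.113.20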
Thank you in advance.
UPDATE:
It seems most of you suggest using a reverse proxy setup instead:
(diagram of ARR, IIS7's reverse proxy solution: http://learn.iis.net/file.axd?i=1102)
However, here are the CONS I can see:
single point of failure
cannot strategically set up servers in different locations based on IP geolocation.
Use the wildcard DNS entry, then use load balancing to distribute the load between servers, regardless of what client they are.
While you're at it, skip the URL rewriting step and have your application determine which account it is based on the URL as entered (you can just as easily determine what X is in X.domain.com as in domain.com?user=X).
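(For illustration only: in whatever backend serves the app, extracting X from the Host header is a one-liner; Python is used here purely as an example.)

# hypothetical helper: pull the account name out of a Host header like "alice.domain.com:443"
def account_from_host(host: str) -> str:
    return host.split(":")[0].split(".")[0]

print(account_from_host("alice.domain.com"))  # prints "alice"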
EDIT:
Based on your additional info, you may want to develop a "broker" that stores which clients are to access which servers. Make that public facing then draw from the resources associated with the client stored with the broker. Your front-end can be load balanced, then you can grab from the file/db servers based on who they are.
The front-end proxy with a wild-card DNS entry really is the way to go with this. It's how big sites like LiveJournal work.
Note that this is not just a TCP-layer load balancer - there are plenty of solutions that will examine the host part of the URL to figure out which back-end server to forward the query to. You can easily do it with Apache running on a low-spec server with suitable configuration.
The proxy ensures that each user's session always goes to the right back-end server and most any session handling methods will just keep on working.
Also the proxy needn't be a single point of failure. It's perfectly possible and pretty easy to run two or more front-end proxies in a redundant configuration (to avoid failure) or even to have them share the load (to avoid stress).
I'd also second John Sheehan's suggestion that the application just look at the left-hand part of the URL to determine which user's content to display.
If using Apache for the back-end, see this post too for info about how to configure it.
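As a rough illustration of that kind of host-based routing with Apache acting as the front-end proxy (untested; the map file and the .internal backend names are placeholders), mod_rewrite plus mod_proxy can pick the backend from the left-most label of the Host header:

# /etc/apache2/usermap.txt maps account -> backend, e.g. "alice serverA.internal"
RewriteMap usermap txt:/etc/apache2/usermap.txt

RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com$ [NC]
# proxy to the mapped server for this account, falling back to serverA.internal
RewriteRule ^/(.*)$ http://${usermap:%1|serverA.internal}/$1 [P]

The map file can be regenerated whenever accounts are assigned to servers, which also gives you the "first 1000 on serverA, next 1000 on serverB" split from the question.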
If you use tinydns, you don't need to restart the name server when you modify its database, and it should not be a bottleneck because it is generally very fast. I don't know whether it performs well with 10000+ entries, though it would surprise me if it didn't.
http://cr.yp.to/djbdns.html
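For reference, each A record in tinydns's data file is a single line like the following (example values), so adding a subdomain is just an append plus a rebuild of data.cdb:

+user1001.domain.com:203.0.113.25:300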
