How to assign IP ranges to Google Dataflow instances? - elasticsearch

I need to move data from Google BigQuery to Elasticsearch instances. For that I have created a Python Dataflow job to copy a BigQuery table to Elasticsearch. The problem is that IP-based restrictions were recently added to the Elasticsearch instances, so they only allow access from specific IP ranges.
So how can I identify or assign the IP ranges of my Dataflow workers when I use the "DataflowRunner" option?
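For reference, a minimal sketch of the kind of job described here, reading rows from BigQuery and indexing them into Elasticsearch; the table, bucket, cluster URL, and index names are placeholders, and the DoFn assumes the elasticsearch Python client is installed on the workers:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from elasticsearch import Elasticsearch


    class IndexToES(beam.DoFn):
        """Indexes each BigQuery row (a dict) into Elasticsearch."""

        def __init__(self, es_host, index_name):
            self.es_host = es_host
            self.index_name = index_name

        def setup(self):
            # One client per worker process, created after deserialization.
            self.es = Elasticsearch([self.es_host])

        def process(self, row):
            self.es.index(index=self.index_name, body=dict(row))


    def run():
        options = PipelineOptions(
            runner='DataflowRunner',
            project='my-project',                 # placeholder project id
            region='us-central1',
            temp_location='gs://my-bucket/tmp',   # placeholder bucket
        )
        with beam.Pipeline(options=options) as p:
            (p
             | 'ReadBQ' >> beam.io.ReadFromBigQuery(table='my-project:my_dataset.my_table')
             | 'IndexToES' >> beam.ParDo(IndexToES('https://es.example.com:9200', 'my-index')))


    if __name__ == '__main__':
        run()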

In the pipeline options you can set the network and the subnetwork you want to use. Each VPC network contains subnets, each with a defined IP range. By creating a subnet with the IP range you need and setting that subnet in the pipeline options, you can assign an IP range to your workers.
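A minimal sketch of what that looks like with the Beam Python SDK (the project, region, VPC, and subnetwork names are placeholders):

    from apache_beam.options.pipeline_options import PipelineOptions

    # Workers get their addresses from the subnet's IP range, so point the job
    # at a subnet whose range is allowed by the Elasticsearch IP restrictions.
    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',
        '--region=us-central1',
        '--network=my-vpc',
        '--subnetwork=regions/us-central1/subnetworks/my-allowed-subnet',
    ])

The subnetwork can also be given as a full resource URL (https://www.googleapis.com/compute/v1/projects/<project>/regions/<region>/subnetworks/<subnet>).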

Related

Adding a new field in Metricbeat when it sends data to Elastic Cloud

The current situation is that all logs from different clusters are mixed in the same index.
I am using Metricbeat, which runs as a DaemonSet in each cluster. We want a shared set of dashboards where we can list the clusters, so that whoever views a dashboard can use the same dashboard to view metrics for different clusters. Is there any way to do this in Kibana?
With Controls in Kibana I am able to create a drop-down for the user to select from, but I want to add a new field that is unique for each cluster. Can we add such a field so that I can filter logs using the Controls option?
Please suggest if there is any solution for this.
This could be done in two ways:
https://www.elastic.co/guide/en/beats/metricbeat/current/add-fields.html
Please refer to the link and add the field using the add_fields processor in metricbeat.yml.
I used the second option: under the output.elasticsearch part of the metricbeat.yml file, add:
fields_under_root: true
fields:
  <custom-field-name>: value
If a dynamic value is required, add it in extraEnvs and then fetch it using ${name-used-in-extraEnvs}.

How can I send StormCrawler content to multiple Elasticsearch indices, based on host?

I currently have a successful StormCrawler instance crawling about 20 sites, and indexing the content to one Elasticsearch index. Is it possible, either in ES or via StormCrawler, to send each host's content to its own unique content index?
Out of curiosity: why do you need to do that? Having one index per host seems rather wasteful. You can filter the results based on a field like host if you want to provide results for a particular host.
To answer your question, there is no direct way of doing it currently, as the IndexerBolt is connected to one index only. You could declare one IndexerBolt per index you need and add a custom bolt to fan out based on the value of the host metadata, but this is not dynamic and rather heavy-handed. There might be a way of doing it using pipelines in ES, but I'm not sure.
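To illustrate the filtering approach mentioned above (a hedged sketch; the index name and the host field are assumptions about how the content is mapped), a per-host query against the single index could look like:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])   # placeholder cluster address

    # Restrict search results to one host instead of splitting content across indices.
    resp = es.search(
        index="content",                            # the single content index
        body={"query": {"term": {"host": "www.example.com"}}},
    )
    print(resp["hits"]["hits"])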

Issue while querying network topology key space in multi region cluster

I have set up a Cassandra cluster with 3 nodes on 3 different EC2 instances. Each instance is in a different availability zone, though the datacenter is the same.
I am using EC2MultiRegionSnitch. Below are my cassandra.yaml configuration details:
listen_address: private IP of the node
broadcast_address: public IP of the node
seeds: public IP of one node
While querying a keyspace that uses NetworkTopologyStrategy I am getting the error "not enough replicas available for query at consistency ONE". The RF for this keyspace is 3.
Queries on keyspaces that use SimpleStrategy are working perfectly fine.

Elastalert filter to detect network scanning

I use elastalert to alert from elasticsearch data and I would like to add an alert for network and port scanning from external addresses. In my elasticsearch cluster I have firewall data that shows connections from Internet addresses to my corporate Internet facing device IP addresses. I would like to detect and alert on IPs that are scanning my IPs and base it on some minimum threshold of what's being targeted. For example, the threshold could be a minimum of 'X' number of scanned hosts or TCP/UDP ports in a 5 minute period.
Is such a query possible? If so, please advise how I could construct an elastalert filter to do this.
Update: I'm wondering if the approaches described here could be used to solve this? How to set up percolator to return when an aggregation value hits a certain threshold?
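Not a full elastalert rule, but as a sketch of the aggregation-threshold idea (the index pattern, field names, and threshold are assumptions about your firewall data), the underlying query could look like this with the Python client:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])        # placeholder cluster address

    # For each source IP seen in the last 5 minutes, count how many
    # distinct destination ports it touched.
    resp = es.search(
        index="firewall-*",                              # placeholder index pattern
        body={
            "size": 0,
            "query": {"range": {"@timestamp": {"gte": "now-5m"}}},
            "aggs": {
                "by_source": {
                    "terms": {"field": "source.ip", "size": 1000},
                    "aggs": {
                        "distinct_ports": {"cardinality": {"field": "destination.port"}}
                    },
                }
            },
        },
    )

    THRESHOLD = 50  # assumed minimum number of distinct ports to treat as a scan
    for bucket in resp["aggregations"]["by_source"]["buckets"]:
        if bucket["distinct_ports"]["value"] >= THRESHOLD:
            print(bucket["key"], bucket["distinct_ports"]["value"])

If I remember correctly, elastalert's cardinality rule type implements a similar per-timeframe threshold on a single field, which may be a simpler starting point than percolators.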

Filtering Graphite metrics by server

I've recently done a lot of research into Graphite with statsD instrumentation. With the help of our developer operations team we managed to get multiple servers reporting metrics to Graphite and combine all the metrics. This is partially what we are looking for; however, I want to filter metric collection by server rather than having all the metrics averaged together. The purpose of this is to monitor metrics collection on a per-server basis, as many of our stats could also be used to visualize server uptime and performance. I haven't been able to find anything about how this might be achieved in my research, other than maybe some trickery with the aggregation rules.
You should include the server name as the first path component of the metric name being emitted. When naming metrics, Graphite separates the metric name into path components, using "." as the delimiter between path components. For example, you may want to use a naming schema like:
<data_center>_<environment>_<role>_<node_id>.gauges.cpu.idle_pct
This will cause each server to be listed as a separate category on http://graphite_hostname.com/dashboard/
If you need to perform aggregations across servers, you can do that at the graphite layer, or you could emit the same metric under two different names: one metric name that has the first path component as the server name, and one metric name that has the first path component as a value that is shared across all servers you want that metric aggregated across.
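A minimal sketch of how each server could emit metrics under its own name with the Python statsd client (the statsD host and the path components are placeholders):

    import socket
    import statsd  # the 'statsd' Python client

    # The first path component identifies this server, e.g.
    # "dc1_prod_web_web01.gauges.cpu.idle_pct"
    node_id = socket.gethostname().replace('.', '_')
    client = statsd.StatsClient('graphite.example.com', 8125,
                                prefix='dc1_prod_web_' + node_id)

    client.gauge('gauges.cpu.idle_pct', 87.5)

Cross-server aggregation can then be done at the Graphite layer with a wildcard, e.g. averageSeries(dc1_prod_web_*.gauges.cpu.idle_pct).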
