Elastic SIEM - alerting and correlation - elasticsearch

I was asked to research how a very basic SIEM could be built with the Elastic Stack.
I managed to set up a stack with Elasticsearch, Kibana and Beats, but now: how can I write correlation rules, like: if someone failed to log in 10 times in the last 3 minutes - ALERT. Or if there is unusual port-scanning activity (detect nmap activity) - ALERT. How can this be done using only free options?

Elastic's free and open license allows the use of detections.
Machine learning is a paid feature, but correlations (EQL) and normal detections (query) can be built. You also get to use the Kibana interface to triage the resulting signals into cases.
https://www.elastic.co/subscriptions
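For a concrete sense of what the "failed login" rule boils down to: in the Kibana Security app this is a threshold-style detection, and the same logic can be prototyped as a plain aggregation query. A minimal sketch with the official Python client, where the index pattern and the ECS field names are assumptions that have to match whatever your Beats actually ship:

```python
# Minimal sketch, not the detection engine itself: count authentication failures
# per user over the last 3 minutes and print an alert for anyone with 10 or more.
# Index pattern and field names (ECS) are assumptions - adjust to your data.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/auth for your stack

resp = es.search(
    index="logs-*",                          # assumption: Beats write ECS data here
    size=0,
    query={
        "bool": {
            "filter": [
                {"term": {"event.category": "authentication"}},
                {"term": {"event.outcome": "failure"}},
                {"range": {"@timestamp": {"gte": "now-3m"}}},
            ]
        }
    },
    aggs={
        "by_user": {
            "terms": {"field": "user.name", "min_doc_count": 10}
        }
    },
)

for bucket in resp["aggregations"]["by_user"]["buckets"]:
    print(f"ALERT: {bucket['key']} failed to log in {bucket['doc_count']} times in 3 minutes")
```

The port-scan case is usually detected by counting distinct destination ports per source over a short window - the kind of correlation the free detection rule types (EQL or threshold) can express.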

Related

Kibana - How to count number of error logs and the type of error

I monitor our team's project error logs in Kibana and report them, like: from yesterday to today there have been 50 errors, 20 of them IP authentication and 30 host errors... or something like that.
I wanted to automate this process: counting the number of errors and their types and posting them to Slack (similar to Microsoft Teams). I was looking at web scraping with Python to extract those error logs, but that doesn't quite look like what I'm after.
How would you go about this?
Build a Watcher for that.
Query your stuff by timeframe, do the aggregations by "error category" & count your numbers, schedule the Watcher to fire at the frequency you're comfortable with, and send the results directly to Slack (connector is provided out of the box).
How to do it:
https://www.elastic.co/guide/en/elasticsearch/reference/current/watcher-api-put-watch.html
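To make the shape of such a watch concrete, here is a hedged sketch submitted through the Python client. The index pattern, the error.type field, the schedule and the Slack channel are placeholders, and the slack action assumes a webhook has already been configured for Watcher in Elasticsearch:

```python
# Hedged sketch of a daily error-summary watch posted straight to Slack.
# Index pattern, the "error.type" keyword field, and the Slack channel are assumptions;
# the slack action also requires a webhook configured for Watcher on the ES side.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

watch = {
    "trigger": {"schedule": {"interval": "24h"}},           # fire once a day
    "input": {
        "search": {
            "request": {
                "indices": ["app-logs-*"],                   # assumption
                "body": {
                    "size": 0,
                    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},
                    "aggs": {
                        "by_category": {"terms": {"field": "error.type"}}  # assumption
                    },
                },
            }
        }
    },
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
    "actions": {
        "notify_slack": {
            "slack": {
                "message": {
                    "to": ["#error-reports"],                # assumption
                    "text": "Errors in the last 24h: {{ctx.payload.hits.total}}. "
                            "Breakdown: {{#ctx.payload.aggregations.by_category.buckets}}"
                            "{{key}}={{doc_count}} {{/ctx.payload.aggregations.by_category.buckets}}",
                }
            }
        }
    },
}

# Older clients take body=, newer ones accept the watch fields as keyword arguments.
es.watcher.put_watch(id="daily-error-summary", body=watch)
```

The mustache section in the message text iterates over the aggregation buckets, so the Slack post carries the per-category counts directly.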

Ubuntu server CPU utilisation increasing very quickly after installing ELK

I installed Elasticsearch, Logstash and Kibana on an Ubuntu server. Before starting these services the CPU utilization was less than 5%, and within a minute of starting them it crossed 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.
There is not enough information in your question to give you a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up period that consumes higher CPU until all services are registered and finish booting. On a fairly small machine this can consume more CPU and take longer to settle.
Folder write permissions. If any of the ELK components fails because of restricted access to directories it needs (for logging, creating subfolders for sincedb files, and so on), it can end up in an endless retry loop that consumes high CPU.
Connection issues. ES should be the first component to start; if it fails, Kibana and Logstash will keep trying to connect to ES until the connection succeeds, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read a file referenced in the configuration, or if your parsing is bad or excessive (for example, the first "match" in the filter section covers the least common case), it can consume high CPU.
For further investigation:
I suggest not starting all of them together. Start ES first; if everything goes well, start Kibana, and lastly Logstash.
Check the logs of all the ELK components for error messages, failures, etc.
For a better answer I would need the YAML configuration of all 3 components (ES, Kibana, Logstash) and the Logstash pipeline configuration file.
I would recommend analysing the CPU cycles consumed by each of the Elasticsearch, Logstash and Kibana processes.
Check specifically which of these processes is consuming the most memory/CPU, for example via the top command (a rough script for this is sketched below).
Start only ES first and allow it to settle, with the node fully started, before starting Kibana and then Logstash.
Send me the logs for each and I can assist if there are any errors.
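A rough way to get that per-process breakdown without watching top interactively - this assumes psutil is installed and matches processes by their command line, since Elasticsearch and Logstash show up as java and Kibana as node:

```python
# Rough per-process check, equivalent to eyeballing top: which ELK process is
# burning CPU/memory? Assumes psutil is installed (pip install psutil).
# Elasticsearch and Logstash run as "java" and Kibana as "node", so match on cmdline.
import time
import psutil

KEYWORDS = ("elasticsearch", "logstash", "kibana")

procs = []
for p in psutil.process_iter(["pid", "name", "cmdline"]):
    cmd = " ".join(p.info["cmdline"] or []).lower()
    if any(k in cmd for k in KEYWORDS):
        p.cpu_percent(None)          # prime the per-process CPU counter
        procs.append(p)

time.sleep(5)                        # sample over 5 seconds

for p in procs:
    try:
        print(p.info["pid"], p.info["name"],
              f"cpu={p.cpu_percent(None):.1f}%",
              f"rss={p.memory_info().rss / 2**20:.0f} MiB")
    except psutil.NoSuchProcess:
        pass
```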

How to enable monitoring in Oracle Service Bus 11g?

I have been looking to enable monitoring in OSB 11g, but I am not exactly sure how to achieve this.
Thanks
It depends on what you mean by "monitoring" as there are many different kinds and a lot depends on your functional requirements around monitoring too.
Monitoring can be:
* Proactive (when you actively look for patterns - preferably automatically, but possibly manually - and detect issues before they occur, or get alerted immediately after they occur)
* Reactive (when you are trying to debug an issue after it has occurred)
Monitoring can also be:
* Technical - check for signs of timeouts, long-running invocations, etc. Technical monitoring can be at:
  - Application level (OSB-specific in your case)
  - Platform level (application server/JVM/operating system - after all, for OSB monitoring to work, you need to ensure/monitor that OSB itself is running!)
* Functional (often involves explicit logging from your code but can be correlated with technical patterns - e.g. the number of invocations of a particular API/service might indicate the number of orders). Functional monitoring can also include SLA monitoring.
Finally, in the Oracle Service bus:
* You can enable monitoring at the individual service level (via the Operations tab under each service, or via scripting in WLST)
* The monitoring above can be combined with rules to alert on specific scenarios (such as SLA breaches)
* You can use specific log entries within your pipelines and then monitor those at runtime
There is a lot more you can do to "monitor" services depending on what is relevant for your services. Although OSB monitoring can be performed via various consoles (/sbconsole or /em in 12c), a lot of good monitoring combines these features into well designed alerts so that you are always on top of potential problems. You can reach this stage by constantly observing your system's behaviour and then improving/tweaking your monitoring solution(s).
This is a good document to read to start:
https://docs.oracle.com/cd/E29542_01/admin.1111/e15867/monitoring_ops.htm#OSBAG472
HTH.

How to design a monitor algorithm to detect a surge of traffic?

I am now developing a monitoring system to detect traffic surges per host/URI. The detailed procedure is to read the real-time nginx access log, compute the total traffic for each host, and report any traffic surge.
For example, per 1-minute window, if the traffic for a host reaches 1000/minute, my algorithm should find it. But:
my monitoring system has many hosts to monitor at the same time
the activity of hosts varies widely from host to host
the activity of each single host varies widely over time during a day.
So just configuring a threshold for each host is not a good solution. Could anyone give me some suggestions? What I want is an auto-detection method that alarms on a sudden/unexpected traffic surge based on each host's per-minute traffic statistics, requires little pre-configuration, and is more or less intelligent.
Baron Schwartz, author of the Percona Toolkit and the High Performance MySQL book and cofounder of VividCortex, recently released a free book on this subject: Anomaly Detection for Monitoring.
I've heard him speak at several conferences. He's very impressive. (Many of those talks are on YouTube.)
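For a concrete starting point, here is a minimal sketch of one common low-configuration technique from that anomaly-detection literature: keep a per-host exponentially weighted moving average (and variance) of the per-minute count and alert when a new value sits several standard deviations above the baseline. The alpha, threshold and warm-up values are illustrative, not tuned:

```python
# Minimal sketch: per-host EWMA baseline with a deviation-based alert.
# ALPHA, THRESHOLD and MIN_SAMPLES are illustrative values, not tuned.
from collections import defaultdict
import math

ALPHA = 0.1        # smoothing factor: higher adapts faster but forgets sooner
THRESHOLD = 3.0    # alert when the count is this many std deviations above the mean
MIN_SAMPLES = 30   # collect some history for a host before alerting at all

state = defaultdict(lambda: {"mean": 0.0, "var": 0.0, "n": 0})

def observe(host: str, count: int) -> bool:
    """Feed one per-minute request count for a host; return True if it looks like a surge."""
    s = state[host]
    surge = False
    if s["n"] >= MIN_SAMPLES:
        std = math.sqrt(s["var"])
        if std > 0 and (count - s["mean"]) / std > THRESHOLD:
            surge = True
    # update the running EWMA estimates of mean and variance
    diff = count - s["mean"]
    s["mean"] += ALPHA * diff
    s["var"] = (1 - ALPHA) * (s["var"] + ALPHA * diff * diff)
    s["n"] += 1
    return surge

# usage: call once per minute per host, after aggregating the nginx access log
if observe("example.com", 1250):
    print("ALERT: traffic surge on example.com")
```

Because the baseline adapts per host and over the day, this avoids hand-configuring a fixed threshold for every host, at the cost of two global knobs (alpha and the deviation threshold).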

Best ways to diagnose elasticsearch issues?

The question is a little broad, but I feel there is no one place that helps systematically diagnose Elasticsearch issues. The broad categories could be:
1) Client
  a) Query errors
  b) Incorrect query results
  c) Unexplained behaviors
2) Server
  a) Setup issues
  b) Performance issues
  c) Critical errors
  d) Unexplained behaviors
An example for 1)a) would be: log the query string on the server (a reference to how to enable that logging would be nice), install the inquisitor plugin (link to GitHub) and run the query string yourself, etc.
Your question is very broad and to be honest I am not sure I can fully answer it, however I will tell you how we monitor and manage our cluster.
1 - We log query logs and slow query logs to graylog2 (it uses es under the hood) so we can easily see, report, and alert on all logging from our cluster. We can also view slow queries that have occurred.
2 - We send ES stats to statsd and then graph that information in Graphite. This way we can see things like cluster state, query counts, indexing counts, JVM stats, disk I/O, etc., all parsed from the ES stats API and sent to statsd (a rough sketch of this follows the list).
3 - we use fabric scripts to deploy/upgrade the cluster and manage plugin installation
4 - we use jenkins and jmeter to run occasional performance tests against the cluster (are we getting slower over time, does the cluster deployment work?)
5 - we use bigdesk and head plugins to keep an eye on the cluster and explore how it is doing.
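As a rough illustration of item 2 - which stats are worth graphing, the statsd address and the metric names are all assumptions - pulling a few numbers from the cluster health and nodes stats APIs and pushing them as gauges can be as simple as:

```python
# Rough sketch: pull a few cluster/node stats from Elasticsearch and push them to
# statsd as gauges. Hosts, metric names and the chosen stats are assumptions.
import requests   # pip install requests
import statsd     # pip install statsd

ES = "http://localhost:9200"
client = statsd.StatsClient("localhost", 8125, prefix="elasticsearch")

health = requests.get(f"{ES}/_cluster/health").json()
client.gauge("cluster.active_shards", health["active_shards"])

nodes = requests.get(f"{ES}/_nodes/stats/jvm,indices").json()
for node in nodes["nodes"].values():
    name = node["name"]
    client.gauge(f"node.{name}.jvm.heap_used_percent", node["jvm"]["mem"]["heap_used_percent"])
    client.gauge(f"node.{name}.indexing.index_total", node["indices"]["indexing"]["index_total"])
    client.gauge(f"node.{name}.search.query_total", node["indices"]["search"]["query_total"])
```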
