I'm trying out the Uptime feature in Kibana. I downloaded Heartbeat and ran it with the default settings, and it works fine.
However, when I try to add more monitors under heartbeat.monitors in heartbeat.yml, I run into an error.
Below is the default configuration, and it runs fine.
heartbeat.yml
# Configure monitors inline
heartbeat.monitors:
- type: http
  # List of urls to query
  urls: ["http://localhost:9200"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
However, when I add the following, I get an error.
# Configure monitors inline
heartbeat.monitors:
- type: http
  # List of urls to query
  urls: ["http://localhost:9200"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
- type: icmp                 <------ When I try to add tcp or icmp,
    schedule: '@every 10s'   <------ I get an error. I am doing something
    hosts: ["localhost"]     <------ wrong. How can I add more monitors?
PS C:\Program Files\Heartbeat> Start-Service heartbeat
Start-Service : Service 'heartbeat (heartbeat)' cannot be started due to the following error: Cannot start service heartbeat on computer '.'.
At line:1 char:1
+ Start-Service heartbeat
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OpenError: (System.ServiceProcess.ServiceController:ServiceController) [Start-Service], ServiceCommandException
+ FullyQualifiedErrorId : CouldNotStartService,Microsoft.PowerShell.Commands.StartServiceCommand
When I erase what I wanted to add, it works fine. How can I add more monitors in heartbeat.yml?
I strongly believe that it's an indentation issue in the YAML file.
Look at your icmp monitor:
- type: icmp                 <------ When I try to add tcp or icmp,
    schedule: '@every 10s'   <------ I get an error. I am doing something
    hosts: ["localhost"]     <------ wrong. How can I add more monitors?
There is extra whitespace before the schedule and hosts settings.
Now look at the default monitor:
heartbeat.monitors:
- type: http
  # List of urls to query
  urls: ["http://localhost:9200"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
Align the settings exactly under the type field and run it again.
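For reference, here is a sketch of the full corrected monitors section, combining the values from the question with the alignment fix (the icmp monitor's settings sit exactly under its type field, just like the http monitor's settings):

heartbeat.monitors:
- type: http
  urls: ["http://localhost:9200"]
  schedule: '@every 10s'
- type: icmp
  # aligned under "type", at the same two-space indent as the http settings
  schedule: '@every 10s'
  hosts: ["localhost"]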
Related
I am using Filebeat to push my logs to Elasticsearch through Logstash, and the setup was working fine for me before. Now I am getting a "Failed to publish events" error.
filebeat | 2020-06-20T06:26:03.832969730Z 2020-06-20T06:26:03.832Z INFO log/harvester.go:254 Harvester started for file: /logs/app-service.log
filebeat | 2020-06-20T06:26:04.837664519Z 2020-06-20T06:26:04.837Z ERROR logstash/async.go:256 Failed to publish events caused by: write tcp YY.YY.YY.YY:40912->XX.XX.XX.XX:5044: write: connection reset by peer
filebeat | 2020-06-20T06:26:05.970506599Z 2020-06-20T06:26:05.970Z ERROR pipeline/output.go:121 Failed to publish events: write tcp YY.YY.YY.YY:40912->XX.XX.XX.XX:5044: write: connection reset by peer
filebeat | 2020-06-20T06:26:05.970749223Z 2020-06-20T06:26:05.970Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://xx.com:5044))
filebeat | 2020-06-20T06:26:05.972790871Z 2020-06-20T06:26:05.972Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://xx.com:5044)) established
Logstash pipeline
02-beats-input.conf
input {
  beats {
    port => 5044
  }
}
10-syslog-filter.conf
filter {
  json {
    source => "message"
  }
}
30-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "index-%{+YYYY.MM.dd}"
  }
}
Filebeat configuration
Sharing my filebeat config at /usr/share/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /logs/*

#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["xx.com:5044"]
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
When I do telnet xx.xx 5044, this is what I see in the terminal:
Trying X.X.X.X...
Connected to xx.xx.
Escape character is '^]'
I had the same problem. Here are some steps which could help you find the root of your problem.
First I tested this way: Filebeat (localhost) -> Logstash (localhost) -> Elasticsearch -> Kibana, with each service on the same machine.
My /etc/logstash/conf.d/config.conf:
input {
  beats {
    port => 5044
    ssl => false
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
Here I deliberately disabled SSL (in my case it was the main cause of the issue, even though the certificates were correct, strangely enough).
After that, don't forget to restart Logstash and test with the sudo filebeat -e command.
If everything is OK, you won't see the 'connection reset by peer' error.
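For the localhost test, the matching Filebeat output section would look something like this (a sketch mirroring the Logstash input above; SSL is simply left unconfigured on the Filebeat side to match ssl => false):

output.logstash:
  # Point at the local Logstash beats input on port 5044
  hosts: ["localhost:5044"]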
I had the same problem. Starting filebeat as a sudo user worked for me.
sudo ./filebeat -e
I made some changes to the input plugin config, such as specifying ssl => false, but it did not work without starting Filebeat as a sudo-privileged user or as root.
To start Filebeat as a sudo user, the filebeat.yml file must be owned by root. Change ownership of the whole Filebeat folder to a sudo-privileged user:
sudo chown -R some_sudo_user:some_group filebeat-7.15.0-linux-x86_64/
and then change ownership of the config file back to root:
chown root filebeat.yml
I created a playbook to reboot my remote servers. I use wait_for to wait for the remote servers to come up before I continue. So I have the following code:
---
- hosts: hostName
  tasks:
    - name: reboot
      shell: reboot
      async: 1
      poll: 0

    - name: wait for server to come up
      local_action: wait_for
      args:
        host: hostName
        port: 22
        state: started
        delay: 10
        timeout: 600
My target server was up about 5 minutes after the reboot was initiated. However, the playbook got stuck at this task until it timed out and generated an error.
My questions are:
1. How does wait_for work here? Does it send an ssh connection request to the target host and time out if it cannot connect to the target host after 600 seconds? Or does it keep pinging the target host till it times out?
2. What could be the problem I am having?
You'll be better off using wait_for_connection in this case. For example, given the play is running against - hosts: hostName:
- name: Wait 600 seconds, but only start checking after 10 seconds
  wait_for_connection:
    delay: 10
    timeout: 600
Q: How does wait_for work here?
A: wait_for is waiting for a port to become available.
Q: Does it send the ssh connection request to the target host and time out if it cannot connect to the target host after 600 seconds?
A: No. It's testing the port.
Q: Does it keep pinging the target host till it times out?
A: No. It tries to create a socket. See wait_for.py
s = socket.create_connection((host, port), connect_timeout)
Q: What could be the problem I am having?
A: It's not clear from the data available. Do not run wait_for as local_action. Make sure the host rebooted successfully.
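Putting it together, a minimal sketch of the reboot-and-wait pattern with wait_for_connection, reusing the host and timings from the original playbook:

---
- hosts: hostName
  tasks:
    - name: reboot
      shell: reboot
      async: 1
      poll: 0

    - name: Wait 600 seconds, but only start checking after 10 seconds
      wait_for_connection:
        delay: 10
        timeout: 600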
I have an Elastic stack (version 7.3.0) configured, with a Heartbeat set up to ping my different hosts.
The config file of my monitor looks like this:
- type: icmp
  name: icmp_monitor
  schedule: '@every 5s'
  hosts:
    - machine1.domain.com # Machine 1
    - machine2.domain.com # Machine 2
    - machine3.domain.com # Machine 3
Is there a way to give the hosts an "alias" in the configuration file?
In my organisation the server hostnames are not very meaningful; it would be great, for example, to be able to specify that machine1.domain.com is the main MongoDB server.
The example on the documentation page shows that you can set host names in the hosts section/key; there they specify "myhost", so I assume it is possible to define any name you want.
Elasticsearch, however, is not responsible for aliasing/resolving hostnames. That is a task for your OS.
If your Heartbeat runs on a Linux machine, I would set the aliases in /etc/hosts like
192.168.1.X mongodb-main
and would set the alias in the monitor config like
- type: icmp
  name: icmp_monitor
  schedule: '@every 5s'
  hosts:
    - mongodb-main
and see if heartbeat accepts it and can resolve the alias/hostname.
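Another option worth trying (a sketch of an assumption on my part, not something from the docs quoted above) is to split the list into one monitor per host and use the monitor-level name field, which your config already sets, as the human-readable label:

- type: icmp
  name: MongoDB main server   # shown as the monitor name instead of the raw hostname
  schedule: '@every 5s'
  hosts:
    - machine1.domain.com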
I just want to run Packetbeat and get packet sniffs from MySQL, with the output going to a file or the console, so that I don't need the Elastic stack.
I tried to run it, but nothing was output:
root@localhost~: packetbeat -c packetbeat.yml
root@localhost~:
The following is my config file:
procs:
  enabled: true
  monitored:
    - process: mysqld
      cmdline_grep: mysqld

output:
  ### Console output
  console:
    # Pretty print json event
    pretty: false
How can I do that?
Packetbeat works by capturing the network traffic that MySQL creates, so you also need to configure which device to capture the traffic from and which TCP ports MySQL is running on. For example:
interfaces:
  device: any

protocols:
  mysql:
    ports: [3306]

procs:
  enabled: true
  monitored:
    - process: mysqld
      cmdline_grep: mysqld

output:
  ### Console output
  console:
    # Pretty print json event
    pretty: false
Your console output configuration looks good to me. You can also output to rotating files, if you prefer.
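If you prefer the file route, here is a minimal sketch of the rotating-file output; the path and rotation values are illustrative assumptions, not from your config:

output:
  file:
    # Directory and base filename for the JSON events (illustrative values)
    path: "/tmp/packetbeat"
    filename: packetbeat
    # Rotate after ~10 MB, keeping at most 7 files
    rotate_every_kb: 10000
    number_of_files: 7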
It seems that many developers trying to move from non-scaled apps (like the DIY cartridge) to scaled versions of their apps, myself included, are having trouble configuring their cartridges to interact properly with the default HAProxy configuration created by OpenShift, and getting their start and stop action hooks to handle the scaling portions of their app. Most often this is because we're new and don't quite understand what OpenShift's default HAProxy configuration does...
HAProxy's default configuration
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.* /var/log/haproxy.log
    #
    #log 127.0.0.1 local2

    maxconn 256

    # turn on stats unix socket
    stats socket /var/lib/openshift/{app's ssh username}/haproxy//run/stats level admin

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    #option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 128

listen stats 127.2.31.131:8080
    mode http
    stats enable
    stats uri /

listen express 127.2.31.130:8080
    cookie GEAR insert indirect nocache
    option httpchk GET /
    balance leastconn
    server local-gear 127.2.31.129:8080 check fall 2 rise 3 inter 2000 cookie local-{app's ssh username}
Often it seems like both sides of the application are up and running, but HAProxy isn't sending HTTP requests where we'd expect. And from numerous questions asked about OpenShift we know that this line:
option httpchk GET /
is HAProxy's sanity check to make sure the app is working. But oftentimes, whether that line is edited or removed, we'll still get something like this in HAProxy's logs:
[WARNING] 240/150442 (404099) : Server express/local-gear is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 240/150442 (404099) : proxy 'express' has no server available!
Yet inside the gears we often have our apps listening on $OPENSHIFT_CARTNAME_IP and $OPENSHIFT_CARTNAME_PORT, and we'll see they've started, and are sometimes rejecting the sanity check:
ERROR [DST 127.2.31.129 sid=1] SHOUTcast 1 client connection rejected. Stream not available as there is no source connected. Agent: `'
A cut-and-dried manifest, like the one from the DIY cartridge:
Name: hws
Cartridge-Short-Name: HWS
Display-Name: Hello World of Scaling Apps
Description: A Scaling App on Openshift
Version: '0.1'
License: ASL 2.0
License-Url: http://www.apache.org/licenses/LICENSE-2.0.txt
Cartridge-Version: 0.0.10
Compatible-Versions:
- 0.0.10
Cartridge-Vendor: you
Vendor: you
Categories:
- web_framework
- experimental
Website:
Help-Topics:
  Getting Started: urltosomeinfo
Provides:
- hws-0.1
- hws
Publishes:
Subscribes:
  set-env:
    Type: ENV:*
    Required: false
Scaling:
  Min: 1
  Max: -1
Group-Overrides:
- components:
  - web-proxy
Endpoints:
- Private-IP-Name: IP
  Private-Port-Name: PORT
  Private-Port: 8080
  Public-Port-Name: PROXY_PORT
  Protocols:
  - http
  - ws
  Options:
    primary: true
  Mappings:
  - Frontend: ''
    Backend: ''
    Options:
      websocket: true
  - Frontend: "/health"
    Backend: ''
    Options:
      health: true
Start Hook (inside bin/control or in .openshift/action_hooks)
RESPONSE=$(curl -o /dev/null --silent --head --write-out '%{http_code}\n' "http://${OPENSHIFT_APP_DNS}:80")
echo "${RESPONSE}" > ${OPENSHIFT_DIY_LOG_DIR}/checkserver.log
echo "${RESPONSE}"
if [ "${RESPONSE}" -eq "503" ]
then
    nohup ${OPENSHIFT_REPO_DIR}/diy/serverexec ${OPENSHIFT_REPO_DIR}/diy/startfromscratch.conf > ${OPENSHIFT_DIY_LOG_DIR}/server.log 2>&1 &
else
    nohup ${OPENSHIFT_REPO_DIR}/diy/serverexec ${OPENSHIFT_REPO_DIR}/diy/secondorfollowinggear.conf > ${OPENSHIFT_DIY_LOG_DIR}/server.log 2>&1 &
fi
Stop Hook (inside bin/control or in .openshift/action_hooks)
kill `ps -ef | grep serverexec | grep -v grep | awk '{ print $2 }'` > /dev/null 2>&1
exit 0
The helpful questions for new developers:
Avoiding a killer sanity check
Is there a way of configuring the app using the manifest.yml to avoid these collisions? Or, vice versa, is there a little tweak to the default HAProxy configuration so that the app will run on appname-appdomain.rhcloud.com:80/ without returning 503 errors?
Setting up more convenient access to the app
My SHOUTcast example, as hinted by the error, works so long as I'm streaming to it first. What additional parts of the manifest and HAProxy configuration would let a user connect directly (from an external URL) to the first gear's port 80, as opposed to port-forwarding into the app all the time?
Making sure the app starts and stops as if it weren't scaled
Lastly, many non-scaled applications have a quick and easy script to start up and shut down, because it seems OpenShift accounts for the fact that the app has to have the first gear running. How would a stop action hook be adjusted to run through and stop all the gears? What would have to be added to the start action hook to get the first gear back online with all of its components (not just HAProxy)?