Configuring autodiscover in heartbeat with Kubernetes - elasticsearch

I have a Kubernetes cluster and I am trying to deploy and configure heartbeat to monitor services.
When heartbeat starts, I see this in the logs:
2019-08-28T15:31:19.116Z ERROR kubernetes/util.go:90 kubernetes: Querying for pod failed with error: kubernetes api: Failure 404 pods "ip-172-20-64-197" not found
This is my autodiscover config:
heartbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            equals:
              kubernetes.namespace: myappnamespace
          config:
            - type: http
              enabled: true
              urls: ["${data.host}:${data.port}"]
              schedule: "@every 1s"
I also see the following in the logs, which likewise suggests that autodiscover isn't working properly:
2019-08-28T15:31:19.117Z DEBUG [kubernetes] kubernetes/watcher.go:189 Got 0 items from the resource sync
2019-08-28T15:31:19.117Z DEBUG [kubernetes] kubernetes/watcher.go:194 Done syncing 0 items from the resource sync
I've tried all sorts of combinations of the above configuration but to no avail. Can anyone point me in the right direction? Thanks.

Related

"No logs found" in grafana

I installed Loki, Grafana, and Promtail, and all three are running. On http://localhost:9080/targets, Ready is True, but the logs are not displayed in Grafana; the Explore section shows "No logs found".
promtail-local-config.yaml:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ward_workstation
          agent: promtail
          __path__: D:/LOGs/*log
loki-local-config.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
How can I solve this problem?
Perhaps you are using Loki on Windows? In your Promtail varlogs job, the path "D:/LOGs/*log" is obviously wrong: you cannot access the Windows filesystem from your Docker container directly.
You should mount the Windows directory into the container like this:
promtail:
  image: grafana/promtail:2.5.0
  volumes:
    - D:/LOGs:/var/log
  command: -config.file=/etc/promtail/config.yml
  networks:
    - loki
Then everything will be OK.
Note that inside the Promtail container the scrape config must now use the container-side path of that mount rather than the Windows path; adjust both sides so the volume mapping and __path__ match, as in the sketch below.
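A minimal sketch of the matching Promtail scrape config, assuming the D:/LOGs:/var/log volume mapping above (the job and label names are simply the ones from the question):
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ward_workstation
          agent: promtail
          __path__: /var/log/*log   # container-side path of the D:/LOGs mount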
Here is some general advice on how to debug Loki, going by the question's title:
(1) Check the Promtail logs
If you discover errors such as error sending batch, you need to fix your Promtail configuration (see the sketch after this answer):
level=warn ts=2022-10-12T16:26:20.667560426Z caller=client.go:369 component=client host=monitor:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp: lookup *Loki* on 10.96.0.10:53: no such host"
(2) Open the Promtail config page and check whether Promtail has read your given configuration: http://localhost:3101/config
(3) Open the Promtail targets page http://localhost:3101/targets and check:
- whether your service is listed as Ready
- whether the log file contains the expected contents and is readable by Promtail. If you're using Docker or Kubernetes, log into the Promtail container and try to read the log file manually.
On the questioner's specific problem: the questioner said the services are shown as Ready on the targets page, so I recommend checking (1) the Promtail configuration and (3b) access to the log files (as Frank suggested).
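For step (1), a common cause of the dial tcp: lookup ... no such host error shown above is a client URL whose hostname the Promtail container cannot resolve. A minimal sketch of the fix, assuming Loki runs as a Compose service named loki on the same Docker network as Promtail:
clients:
  - url: http://loki:3100/loki/api/v1/push   # 'loki' must be resolvable from the Promtail container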

OpenTelemetry Export Traces to Elastic APM and Elastic OpenDistro

I am trying to instrument my Python app (Django based) to be able to push transaction traces to Elastic APM, which I can later view using Trace Analytics in Open Distro Elasticsearch.
I have tried the following.
Method 1:
pip install opentelemetry-exporter-otlp
Then, in the manage.py file, I added the following code to send traces directly to Elastic APM:
span_exporter = OTLPSpanExporter(
    endpoint="http://localhost:8200",
    insecure=True
)
When I run the code I get the following error:
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 2s.
Method 2:
I tried using the OpenTelemetry Collector in between, since Method 1 didn't work.
I configured my collector in the following way:
extensions:
  memory_ballast:
    size_mib: 512
  zpages:
    endpoint: 0.0.0.0:55679
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 4G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s
exporters:
  logging:
    logLevel: debug
  otlp/elastic:
    endpoint: "198.19.11.22:8200"
    insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
  extensions: [memory_ballast, zpages]
And I configured my code to send traces to the collector like this:
span_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",
    insecure=True
)
Once I start the program, I get the following error in the collector logs -
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
    go.opentelemetry.io/collector@v0.35.0/exporter/exporterhelper/queued_retry.go:304
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
    go.opentelemetry.io/collector@v0.35.0/exporter/exporterhelper/traces.go:116
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
    go.opentelemetry.io/collector@v0.35.0/exporter/exporterhelper/queued_retry.go:155
go.opentelemetry.io/collector/exporter/exporterhelper/internal.ConsumerFunc.Consume
    go.opentelemetry.io/collector@v0.35.0/exporter/exporterhelper/internal/bounded_queue.go:103
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*BoundedQueue).StartConsumersWithFactory.func1
    go.opentelemetry.io/collector@v0.35.0/exporter/exporterhelper/internal/bounded_queue.go:82
2022-01-05T17:36:55.349Z error exporterhelper/queued_retry.go:304 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "name": "otlp/elastic", "error": "max elapsed time expired failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed", "dropped_items": 1}
What am I possibly missing here?
NOTE: I am using the latest versions of the OpenTelemetry SDK and API packages and the latest version of the Collector.
Okay, so the way to get traces into the Open Distro version of Elasticsearch is to avoid using APM itself.
Open Distro provides a tool called Data Prepper, which must be used to send the data (traces) from the OTel Collector to Open Distro Elasticsearch.
Here is the configuration I did for the Otel-Collector to send data to Data Prepper:
... # other configurations like receivers, etc.
exporters:
  logging:
    logLevel: debug
  otlp/data-prepper:
    endpoint: "http://<DATA_PREPPER_HOST>:21890"
    tls:
      insecure: true
... # other configurations like pipelines, etc.
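For completeness, here is a sketch of the pipelines part referenced above, wiring the OTLP receiver to this exporter. It reuses the receiver and processor names from the collector config in the question and is an assumption about the rest of the setup rather than something quoted from it:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/data-prepper]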
And this is how I configured Data Prepper to receive data from the Collector and send it to Elasticsearch:
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - elasticsearch:
        hosts: [ "http://<ELASTIC_HOST>:9200" ]
        trace_analytics_raw: true
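To sketch the flow: entry-pipeline accepts the OTLP/gRPC data from the collector (the otlp/data-prepper exporter above points at port 21890) and hands the spans to raw-pipeline, which runs otel_trace_raw_prepper and writes the result to Elasticsearch. As I understand it, trace_analytics_raw: true is what makes the resulting indices readable by the Trace Analytics view in Open Distro.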

Filebeat's GCP module keeps getting a hash config error

I am currently trying to forward GCP Cloud Logging to Filebeat, to be forwarded on to Elasticsearch, following this documentation, with the GCP module settings on Filebeat set up according to this documentation.
Currently I am only trying to forward audit logs, so my gcp.yml module config is as follows:
- module: gcp
  vpcflow:
    enabled: false
    var.project_id: my-gcp-project-id
    var.topic: gcp-vpc-flowlogs
    var.subscription_name: filebeat-gcp-vpc-flowlogs-sub
    var.credentials_file: ${path.config}/gcp-service-account-xyz.json
    #var.internal_networks: [ "private" ]
  firewall:
    enabled: false
    var.project_id: my-gcp-project-id
    var.topic: gcp-vpc-firewall
    var.subscription_name: filebeat-gcp-firewall-sub
    var.credentials_file: ${path.config}/gcp-service-account-xyz.json
    #var.internal_networks: [ "private" ]
  audit:
    enabled: true
    var.project_id: <my prod name>
    var.topic: sample_topic
    var.subscription_name: filebeat-gcp-audit
    var.credentials_file: ${path.config}/<something>.<something>
When I run sudo filebeat setup I keep getting this error
2021-05-21T09:02:25.232Z ERROR cfgfile/reload.go:258 Error loading configuration files: 1 error: Unable to hash given config: missing field accessing '0.firewall' (source:'/etc/filebeat/modules.d/gcp.yml')
Although I can start the service, I don't seem to see any logs forwarded from GCP's Cloud Logging Pub/Sub topic to Elasticsearch.
Help or tips on best practice would also be appreciated.
Update
If I follow the docs here instead, I get the same error, but for audit.

Packetbeat does not add Kubernetes metadata

I've started a minikube (using Kubernetes 1.18.3) to test out ECK, and specifically Packetbeat. The minikube profile is called "packetbeat" (important, as that's the hostname for the VirtualBox VM as well) and I followed the ECK quickstart to get it up and running. Elasticsearch (single node) and Kibana are running fine, and Packetbeat is gathering flows as well; however, I'm unable to make it add the Kubernetes metadata to the fields.
I'm working in the default namespace and created a ClusterRoleBinding to view for the default ServiceAccount in the namespace. This works; if I do not do that, Packetbeat reports it is unable to list the Pods on the API server.
This is the Beat config I'm using to make ECK deploy packetbeat:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: packetbeat
spec:
  type: packetbeat
  version: 7.9.0
  elasticsearchRef:
    name: quickstart
  kibanaRef:
    name: kibana
  config:
    packetbeat.interfaces.device: any
    packetbeat.protocols:
      - type: http
        ports: [80, 8000, 8080, 9200]
      - type: tls
        ports: [443]
    packetbeat.flows:
      timeout: 30s
      period: 10s
    processors:
      - add_kubernetes_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        terminationGracePeriodSeconds: 30
        hostNetwork: true
        automountServiceAccountToken: true # some older Beat versions are depending on this settings presence in k8s context
        dnsPolicy: ClusterFirstWithHostNet
        containers:
          - name: packetbeat
            securityContext:
              runAsUser: 0
              capabilities:
                add:
                  - NET_ADMIN
(This is mostly a slightly modified example from the ECK examples page.) However, this is not working at all. I tried it with "add_kubernetes_metadata: {}" first, but that errors with the message:
2020-08-19T14:23:38.550Z ERROR [kubernetes] kubernetes/util.go:117 kubernetes: Querying for pod failed with error: pods "packetbeat" not found {"libbeat.processor": "add_kubernetes_metadata"}
This message goes away when I add the "host: packetbeat". I'm no longer getting an error now, but I'm not getting the Kubernetes metadata either. I'm mostly interested in the namespace tag, but I'm not getting any. I do not see any additional errors in the log and it just reports monitoring details every 30 seconds at the moment.
What am I doing wrong? Any more information I can provide to help me debug this?
So the docs are just unclear. Although they do not explicitly state it, you do need to add indexers and matchers. My understanding was that there are "default" ones (since you can disable those), but that does not seem to be the case. Adding the indexers and matchers as per the example in the docs makes the Kubernetes metadata part of the data; a sketch follows below.
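For reference, a minimal sketch of what that processor section could look like with explicit indexers and matchers, following the pattern from the add_kubernetes_metadata docs. The host value is the one from the question, and the lookup fields are an assumption; pick fields that actually exist in your flow events:
processors:
  - add_kubernetes_metadata:
      host: packetbeat               # hostname of the minikube node, as in the question
      indexers:
        - ip_port:
      matchers:
        - fields:
            lookup_fields: ["destination.ip", "server.ip"]   # assumed lookup fields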

state_replicaset/state_replicaset.go:98 error making http request: Get kube-state-metrics:8080/metrics: lookup kube-state-metrics on IP:53: no such host

We are trying to start Metricbeat on a Typhoon Kubernetes cluster, but after startup it's not able to get some pod-specific events, like restarts, because of the error below.
The corresponding metricbeat.yaml snippet:
# State metrics from kube-state-metrics service:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_replicaset
    - state_statefulset
    - state_pod
    - state_container
    - state_cronjob
    - state_resourcequota
    - state_service
    - state_persistentvolume
    - state_persistentvolumeclaim
    - state_storageclass
    # Uncomment this to get k8s events:
    #- event
  period: 10s
  hosts: ["kube-state-metrics:8080"]
The error we are facing:
2020-07-01T10:31:02.486Z ERROR [kubernetes.state_statefulset] state_statefulset/state_statefulset.go:97 error making http request: Get http://kube-state-metrics:8080/metrics: lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:02.611Z WARN [transport] transport/tcp.go:52 DNS lookup failure "kube-state-metrics": lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:02.611Z INFO module/wrapper.go:259 Error fetching data for metricset kubernetes.state_node: error doing HTTP request to fetch 'state_node' Metricset data: error making http request: Get http://kube-state-metrics:8080/metrics: lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:03.313Z ERROR process_summary/process_summary.go:102 Unknown or unexpected state <P> for process with pid 19
2020-07-01T10:31:03.313Z ERROR process_summary/process_summary.go:102 Unknown or unexpected state <P> for process with pid 20
I can add any other info that's needed.
Make sure you have kube-state-metrics deployed in your cluster in the kube-system namespace to make this work; Metricbeat does not come with it by default.
Please refer to this for detailed deployment instructions.
If your kube-state-metrics is deployed to another namespace, the short name cannot be resolved from Metricbeat's namespace. E.g. we have kube-state-metrics deployed to the monitoring namespace:
$ kubectl get pods -A | grep kube-state-metrics
monitoring kube-state-metrics-765c7c7f95-v7mmp 3/3 Running 17 10d
You could set the hosts option to the full name, including the namespace, like this:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_replicaset
    - state_statefulset
    - state_pod
    - state_container
    - state_cronjob
    - state_resourcequota
    - state_service
    - state_persistentvolume
    - state_persistentvolumeclaim
    - state_storageclass
  hosts: ["kube-state-metrics.<your_namespace>:8080"]
