I am new to Loki. I have created an alert in Loki, but I don't see any notification in Alertmanager. Loki is working fine (collecting logs), and so is Alertmanager (it receives alerts from other sources), but the alerts from Loki don't get pushed to Alertmanager.
Loki config:
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h  # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
ruler:
  storage:
    type: local
    local:
      directory: etc/loki/rules
  rule_path: /etc/loki/
  alertmanager_url: http://171.11.3.160:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
Docker-compose Loki:
loki:
  image: grafana/loki:2.0.0
  container_name: loki
  ports:
    - "3100:3100"
  volumes:
    - ./loki/etc/local-config.yaml:/etc/loki/local-config.yaml
    - ./loki/etc/rules/rules.yaml:/etc/loki/rules/rules.yaml
  command:
    - '--config.file=/etc/loki/local-config.yaml'
Loki rules:
groups:
  - name: rate-alerting
    rules:
      - alert: HighLogRate
        expr: |
          count_over_time(({job="grafana"})[1m]) >= 0
        for: 1m
Does anybody know what's the problem?
I got it working at last.
Below is my ruler config:
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rulestorage
  rule_path: /etc/loki/rules
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true
Then I created the following directories:
/etc/loki/rulestorage/fake
/etc/loki/rules/fake
Copied alert_rules.yaml under /etc/loki/rulestorage/fake.
Gave full permissions to the loki user on /etc/loki/rulestorage/fake.
Boom
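For completeness, a docker-compose sketch matching this layout might look like the following. The host-side paths are my assumptions, not something from the original setup; the `fake` directory is the tenant directory Loki uses when `auth_enabled: false`:

```yaml
loki:
  image: grafana/loki:2.0.0
  container_name: loki
  ports:
    - "3100:3100"
  volumes:
    # rule storage the ruler loads rules from
    # (rule files live under the "fake" tenant subdirectory)
    - ./loki/etc/rulestorage:/etc/loki/rulestorage
    # scratch path the ruler writes evaluated rules to
    - ./loki/etc/rules:/etc/loki/rules
    - ./loki/etc/local-config.yaml:/etc/loki/local-config.yaml
  command:
    - '--config.file=/etc/loki/local-config.yaml'
```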
The config looks good, similar to mine. I would troubleshoot it with the following steps:
Exec into the Docker container and check that the rules file is not empty: cat /etc/loki/rules/rules.yaml
Check the logs of Loki. When the rules are loaded properly, logs like this will pop up:
level=info ts=2021-05-06T11:18:33.355446729Z caller=module_service.go:58 msg=initialising module=ruler
level=info ts=2021-05-06T11:18:33.355538059Z caller=ruler.go:400 msg="ruler up and running"
level=info ts=2021-05-06T11:18:33.356584674Z caller=mapper.go:139 msg="updating rule file" file=/data/loki/loki-stack-alerting-rules.yaml
At runtime Loki also logs info messages about your rule. This is the one I am running, slightly shortened; notice status=200 and the non-empty total_bytes=...:
level=info
ts=...
caller=metrics.go:83
org_id=...
traceID=...
latency=fast
query="sum(rate({component=\"kube-apiserver\"} |~ \"stderr F E.*failed calling webhook \\\"webhook.openpolicyagent.org\\\". an error on the server.*has prevented the request from succeeding\"[1m])) > 1"
query_type=metric
range_type=instant
length=0s
step=0s
duration=9.028961ms
status=200
throughput=40MB
total_bytes=365kB
Then make sure you can access Alertmanager at http://171.11.3.160:9093 from the Loki container without any issues (there can be a networking problem, or you may have set up basic authentication, etc.).
If the rule you set up (which you can test from the Grafana Explore window) exceeds the threshold you set for 1 minute, the alert should show up in Alertmanager. It will most likely be ungrouped, as you didn't add any labels to it.
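As a sketch of that last point, a rule file with labels and annotations would let Alertmanager group and route the alert properly. The label names, the annotation text, and the threshold of 100 are placeholders I chose (your current `>= 0` fires on every evaluation):

```yaml
groups:
  - name: rate-alerting
    rules:
      - alert: HighLogRate
        expr: |
          count_over_time({job="grafana"}[1m]) >= 100
        for: 1m
        # labels let Alertmanager group and route the alert
        labels:
          severity: warning
          team: observability
        # annotations show up in the notification body
        annotations:
          summary: "High log rate for job {{ $labels.job }}"
```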
I'm using Amazon OpenSearch with the engine OpenSearch 1.2.
I was working on setting up APM with the following details
Service 1 - Tomcat Application running on an EC2 server that accesses an RDS database. The server is behind a load balancer with a sub-domain mapped to it.
I added a file - setenv.sh file in the tomcat/bin folder with the following content
#!/bin/sh
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:<PATH_TO_JAVA_AGENT>"
export OTEL_METRICS_EXPORTER=none
export OTEL_EXPORTER_OTLP_ENDPOINT=http://<OTEL_COLLECTOR_SERVER_IP>:4317
export OTEL_RESOURCE_ATTRIBUTES=service.name=<SERVICE_NAME>
export OTEL_INSTRUMENTATION_COMMON_PEER_SERVICE_MAPPING=<RDS_HOST_ENDPOINT>=Database-Service
OTEL Java Agent for collecting traces from the application
OTEL Collector and Data Prepper running on another server with the following configuration
docker-compose.yml
version: "3.7"
services:
  data-prepper:
    restart: unless-stopped
    image: opensearchproject/data-prepper:1
    volumes:
      - ./pipelines.yaml:/usr/share/data-prepper/pipelines.yaml
      - ./data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml
    networks:
      - apm_net
  otel-collector:
    restart: unless-stopped
    image: otel/opentelemetry-collector:0.55.0
    command: [ "--config=/etc/otel-collector-config.yml" ]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"
    depends_on:
      - data-prepper
    networks:
      - apm_net
data-prepper-config.yaml
ssl: false
otel-collector-config.yml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  otlp/data-prepper:
    endpoint: http://data-prepper:21890
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
pipelines.yaml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ <AWS OPENSEARCH HOST> ]
        # IAM signing
        aws_sigv4: true
        aws_region: <AWS_REGION>
        index_type: "trace-analytics-raw"
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: [ <AWS OPENSEARCH HOST> ]
        # IAM signing
        aws_sigv4: true
        aws_region: <AWS_REGION>
        index_type: "trace-analytics-service-map"
Data Prepper is authenticated via fine-grained access control with the all_access role, and I'm able to see the otel resources, such as indexes and index templates, generated when running it.
With the above setup running, I'm able to see traces from the application in the Trace Analytics dashboard of OpenSearch, and upon clicking on individual traces, I'm able to see a pie chart with one service. I don't see any errors in either the otel-collector or Data Prepper. Also, in the Data Prepper logs, I see records being sent to the otel service map.
However, the services tab of Trace Analytics remains empty and the otel service map index also remains empty.
I have been unable to figure out the reason behind this even after going through the documentation and any help is appreciated!
I installed Loki, Grafana and Promtail, and all three are running. On http://localhost:9080/targets, Ready is True, but the logs are not displayed in Grafana; the Explore section shows "No logs found".
promtail-local-config-yaml:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ward_workstation
          agent: promtail
          __path__: D:/LOGs/*log
loki-local-config.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
How can I solve this problem?
Perhaps you are running Loki on Windows?
In your Promtail varlogs job, the path "D:/LOGs/*log" is wrong: you cannot access Windows files from your Docker container directly.
You should mount your Windows directory into your container like this:
promtail:
  image: grafana/promtail:2.5.0
  volumes:
    - D:/LOGs:/var/log
  command: -config.file=/etc/promtail/config.yml
  networks:
    - loki
Then everything will be OK.
Note that the __path__ in your Promtail config must then refer to the mount point inside the container, not the Windows path; you can adjust both to make a match.
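For example, assuming the D:/LOGs:/var/log mount above, the matching scrape config inside the Promtail container would point at the Linux-side path (label values here are just illustrative):

```yaml
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          # must match the container-side mount point,
          # not the Windows path D:/LOGs
          __path__: /var/log/*log
```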
Here's some general advice on how to debug Loki, following the question's title:
(1) Check the Promtail logs
If you discover errors such as "error sending batch", you need to fix your Promtail configuration:
level=warn ts=2022-10-12T16:26:20.667560426Z caller=client.go:369 component=client host=monitor:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://loki:3100/loki/api/v1/push\": dial tcp: lookup *Loki* on 10.96.0.10:53: no such host"
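In a case like that, the `clients` section of the Promtail config must point at a hostname that actually resolves from where Promtail runs; the hostname below is an assumption (e.g. the docker-compose service name of Loki):

```yaml
clients:
  # use a hostname/IP that resolves from the Promtail container;
  # "loki" here assumes a compose service of that name on a shared network
  - url: http://loki:3100/loki/api/v1/push
```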
(2) Open the Promtail config page http://localhost:3101/config and check whether Promtail has read your given configuration.
(3) Open the Promtail targets page http://localhost:3101/targets and check:
whether your service is listed as Ready;
whether the log file contains the wanted contents and is readable by Promtail. If you're using Docker or Kubernetes, I would log into the Promtail container and try to read the log file manually.
Regarding the questioner's specific problem:
The questioner said that the services are shown as Ready on the targets page, so I recommend checking (1) the Promtail configuration and (3b) access to the log files (as Frank suggested).
I am trying to instrument my Python app (Django based) to push transaction traces to Elastic APM, which I can later view using Trace Analytics in Open Distro Elasticsearch.
I have tried the following
Method 1:
pip install opentelemetry-exporter-otlp
Then, in the manage.py file, I added the following code to send traces directly to Elastic APM:
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

span_exporter = OTLPSpanExporter(
    endpoint="http://localhost:8200",
    insecure=True
)
When I run the code I get the following error:
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 2s.
Method 2:
Since method 1 didn't work, I tried putting the OpenTelemetry Collector in between.
I configured my collector in the following way:
extensions:
  memory_ballast:
    size_mib: 512
  zpages:
    endpoint: 0.0.0.0:55679
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 4G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s
exporters:
  logging:
    logLevel: debug
  otlp/elastic:
    endpoint: "198.19.11.22:8200"
    insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
  extensions: [memory_ballast, zpages]
And I configured my code to send traces to the collector like this:
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

span_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",
    insecure=True
)
Once I start the program, I get the following error in the collector logs:
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector#v0.35.0/exporter/exporterhelper/queued_retry.go:304
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
go.opentelemetry.io/collector#v0.35.0/exporter/exporterhelper/traces.go:116
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/collector#v0.35.0/exporter/exporterhelper/queued_retry.go:155
go.opentelemetry.io/collector/exporter/exporterhelper/internal.ConsumerFunc.Consume
go.opentelemetry.io/collector#v0.35.0/exporter/exporterhelper/internal/bounded_queue.go:103
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*BoundedQueue).StartConsumersWithFactory.func1
go.opentelemetry.io/collector#v0.35.0/exporter/exporterhelper/internal/bounded_queue.go:82
2022-01-05T17:36:55.349Z error exporterhelper/queued_retry.go:304 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "name": "otlp/elastic", "error": "max elapsed time expired failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed", "dropped_items": 1}
What am I possibly missing here?
NOTE: I am using the latest versions of the opentelemetry SDK and APIs, and the latest version of the collector.
Okay, so the way to get traces out of the Open Distro version of Elasticsearch is:
Avoid using APM itself.
Open Distro provides a tool called Data Prepper, which must be used to send data (traces) from the OTel Collector to Open Distro Elasticsearch.
Here is the configuration I did for the Otel-Collector to send data to Data Prepper:
... # other configurations like receivers, etc.
exporters:
  logging:
    logLevel: debug
  otlp/data-prepper:
    endpoint: "http://<DATA_PREPPER_HOST>:21890"
    tls:
      insecure: true
... # Other configurations like pipelines, etc.
And this is how I configured Data Prepper to receive data from Collector and send it to Elastic
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - elasticsearch:
        hosts: [ "http://<ELASTIC_HOST>:9200" ]
        trace_analytics_raw: true
I have a very simple test setup. Data flow is as follows:
sample.log -> Promtail -> Loki -> Grafana
I am using this log file from microsoft: sample log file download link
I have promtail config as follows:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: C:\Users\user\Desktop\tmp\positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: testing_logging_a_log_file
    static_configs:
      - targets:
          - localhost
        labels:
          job: testing_logging_a_log_file_labels_job_what_even_is_this
          host: testing_for_signs_of_life_probably_my_computer_name
          __path__: C:\Users\user\Desktop\sample.log
  - job_name: testing_logging_a_log_file_with_no_timestamp_test_2
    static_configs:
      - targets:
          - localhost
        labels:
          job: actor_v2
          host: ez_change
          __path__: C:\Users\user\Desktop\Actors_2.txt
Loki config:
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0
schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h
storage_config:
  boltdb:
    directory: C:\Users\user\Desktop\tmp\loki\index
  filesystem:
    directory: C:\Users\user\Desktop\tmp\loki\chunks
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
The sample files are read properly the first time. I can query WARN logs with: {host="testing_for_signs_of_life_probably_my_computer_name"} |= "WARN"
The problem arises when I manually add a new log line to the sample.log file (to emulate log lines being written to the file):
2012-02-03 20:11:56 SampleClass3 [WARN] missing id 42334089511
This new line is not visible in Grafana. Is there any particular config I need to know about to make this happen?
It was a problem with the network: if you remove the Loki port mapping and don't configure any network, you can access it by putting http://loki:3100 in your Grafana panel.
Yes, it's weird: when I append a line to an existing log file, it can't be seen in Grafana Explore. But try doing it again, appending one more line, and now the previous line shows up in Grafana.
This happens when you use Notepad; it works well with Notepad++.
I am following this tutorial to set up an ELK stack (VPS B) that will receive Docker/docker-compose container logs (VPS A) using Filebeat as a forwarder; my diagram is shown below.
So far, I have managed to get all the interfaces working with green ticks. However, there are still some remaining issues that I am not able to understand, so I would appreciate it if someone could help me out a bit.
My main issue is that I don't get any Docker/docker-compose logs from VPS A into the Filebeat server of VPS B; nevertheless, I do get other logs from VPS A, such as rsyslog and authentication logs, on the Filebeat server of VPS B. I have configured my docker-compose file to forward the logs using rsyslog as the logging driver, and Filebeat then forwards that syslog to VPS B. At this point I do see logs from the Docker daemon itself, such as virtual interfaces going up/down and processes starting, but not the "debug" logs of the containers themselves.
The configuration of the Filebeat client on VPS A looks like this:
root@VPSA:/etc/filebeat# cat filebeat.yml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/auth.log
        - /var/log/syslog
        # - /var/log/*.log
      input_type: log
      document_type: syslog
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["ipVPSB:5044"]
    bulk_max_size: 2048
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
shipper:
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
  level: debug
One of the docker-compose logging-driver sections looks like this:
redis:
  logging:
    driver: syslog
    options:
      syslog-facility: user
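One thing worth noting (an addition of mine, not from the original setup): the Docker syslog driver also accepts a `tag` option, which makes it easier to tell container logs apart from the daemon's own messages once they land in syslog:

```yaml
redis:
  logging:
    driver: syslog
    options:
      syslog-facility: user
      # tag each message with the container name so it is
      # distinguishable from docker-daemon lines in /var/log/syslog
      tag: "{{.Name}}"
```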
Finally, I would like to ask whether it is possible to forward the logs natively from docker-compose to the Filebeat client on VPS A (the red arrow in the diagram), so that it can forward them to VPS B.
Thank you very much,
Regards!
The issue turned out to be in Filebeat on VPS A: since it collects data from syslog, it has to start after syslog is already running!
Updating rc.d made it work:
sudo update-rc.d filebeat defaults 95 10
My filebeat.yml, if someone needs it:
root@VPSA:/etc/filebeat# cat filebeat.yml
filebeat:
  prospectors:
    -
      paths:
        # - /var/log/auth.log
        - /var/log/syslog
        # - /var/log/*.log
      input_type: log
      ignore_older: 24h
      scan_frequency: 10s
      document_type: syslog
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["ipVPSB:5044"]
    bulk_max_size: 2048
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
shipper:
logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: /var/log/mybeat
    name: mybeat.log
    keepfiles: 7
    rotateeverybytes: 10485760 # = 10MB
Regards