Unable to use elasticsearch sink connector (kafka-connect) - elasticsearch

I'm currently trying to start an Elasticsearch sink connector on a Kafka Connect cluster (distributed mode).
This cluster is deployed in Kubernetes using the Helm charts provided by Confluent, with some tweaks.
Here are the relevant parts.
From values.yaml:
configurationOverrides:
  "plugin.path": "/usr/share/java,/usr/share/confluent-hub-components"
  "key.converter": "org.apache.kafka.connect.storage.StringConverter"
  "value.converter": "org.apache.kafka.connect.json.JsonConverter"
  "key.converter.schemas.enable": "false"
  "value.converter.schemas.enable": "false"
  "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter"
  "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter"
  "config.storage.replication.factor": "3"
  "offset.storage.replication.factor": "3"
  "status.storage.replication.factor": "3"
  "security.protocol": SASL_SSL
  "sasl.mechanism": SCRAM-SHA-256
And for the Kubernetes cluster part:
releases:
  - name: kafka-connect
    tillerless: true
    tillerNamespace: qa3-search
    chart: ../charts/cp-kafka-connect
    namespace: qa3-search
    values:
      - replicaCount: 2
      - configurationOverrides:
          config.storage.topic: kafkaconnectKApp_connect-config_private_json
          offset.storage.topic: kafkaconnectKApp_connect-offsets_private_json
          status.storage.topic: kafkaconnectKApp_connect-statuses_private_json
          connect.producer.client_id: "connect-worker-producerID"
          groupId: "kafka-connect-group-ID"
          log4j.root.loglevel: "INFO"
          bootstrap_servers: "SASL_SSL://SOME_ACCESSIBLE_URL:9094"
          client.security.protocol: SASL_SSL
          client.sasl.mechanism: SCRAM-SHA-256
      - prometheus:
          jmx:
            enabled: false
      - ingress:
          enabled: true
          hosts:
            - host: kafka-connect.qa3.k8s.XXX.lan
              paths:
                - /
      - cp-schema-registry:
          url: "https://SOME_ACCESSIBLE_URL"
Then I load the Elasticsearch sink connector like so:
curl -X POST -H 'Content-Type: application/json' http://kafka-connect.qa3.k8s.XXX.lan/connectors -d '{
  "name": "similarads3",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "consumer.interceptor.classes": "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor",
    "topics": "SOME_TOPIC_THAT_EXIST",
    "topic.index.map": "SOME_TOPIC_THAT_EXIST:test_similar3",
    "connection.url": "http://vqa38:9200",
    "batch.size": 1,
    "type.name": "similads",
    "key.ignore": true,
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "SOME_ACCESSIBLE_URL",
    "schema.ignore": true
  }
}' -vvv
Moreover, I'm loading the user and password for broker authentication via environment variables, and I'm fairly sure it connects with the right ACLs...
What is troubling me is that no index is created when the connector starts, and there is no error whatsoever in kafka-connect's logs... It just says that everything has started:
Starting connectors and tasks using config offset 68
When running a curl on /connectors/similarads3/status, everything is running, without errors.
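(For reference, the status check is just the Connect REST API, along these lines:)

curl -s http://kafka-connect.qa3.k8s.XXX.lan/connectors/similarads3/status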
So it seems like I overlooked something, but I can't figure out what is missing.
When I check the consumer lag on this particular topic, it seems like no messages were ever consumed.
If there is not enough information, I'm able to provide more.
Does anyone have an idea?
EDIT: I should have mentioned that I tried configuring it with a topic that does not exist: again, no error in the logs. (I don't know how to interpret this.)
EDIT 2: This issue is solved.
We actually found the issue, and it appears that I did overlook something: in order to read from a topic protected by ACLs, you have to provide the SASL configuration for both the connector and the sink consumer.
So simply duplicating the configuration with the consumer. prefix fixed the problem.
However, I'm still surprised that nothing in the logs points to this.
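For reference, a sketch of what that duplication can look like in the worker's configurationOverrides (the consumer. prefix is how Kafka Connect passes settings to sink consumers; the JAAS line is illustrative, since in our case the credentials actually come from environment variables):

configurationOverrides:
  "security.protocol": SASL_SSL
  "sasl.mechanism": SCRAM-SHA-256
  # duplicated for the sink's consumer
  "consumer.security.protocol": SASL_SSL
  "consumer.sasl.mechanism": SCRAM-SHA-256
  "consumer.sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"someUser\" password=\"somePassword\";"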

We had issues trying to use the topic.index.map property. Even if you get it working, there is a note in the docs that it is deprecated:
topic.index.map
This option is now deprecated. A future version may remove it completely. Please use single message transforms, such as RegexRouter, to map topic names to index names.
I'd try using the RegexRouter transform to accomplish this instead:
"transforms": "renameTopicToIndex",
"transforms.renameTopicToIndex.type": "org.apache.kafka.connect.transforms.RegexRouter"
"transforms.renameTopicToIndex.regex": ".*"
"transforms.renameTopicToIndex.replacement": "test_similar3"

Related

Spring RabbitMQ convertAndSend is not working properly

I am using this code to queue data into RabbitMQ: https://www.javainuse.com/spring/spring-boot-rabbitmq-hello-world
I configured the following properties correctly to match the RabbitMQ configuration:
Host
Username
Password
Exchange
Routing key
Queue
But RabbitMQSender#send, i.e. rabbitTemplate.convertAndSend(exchange, routingkey, company);, is not queuing any data into RabbitMQ, and at the same time it's not returning any error.
I tried changing the username or password to an incorrect one and got not_authorized, so the connection with the correct username/password/queue/exchange/routing key seems fine, but it's not doing anything.
I tried sending an event via curl and it works correctly; the event is queued in RabbitMQ:
curl -v -u username:pwd -H "Accept: application/json" -H "Content-Type: application/json" -X POST -d '{
  "properties": {},
  "routing_key": "my-routingkey",
  "payload": "hi",
  "payload_encoding": "string"
}' localhost:15672/api/exchanges/%2F/my-exchange/publish
Does Spring's RabbitTemplate#convertAndSend call this API (localhost:15672/api/exchanges/%2F/my-exchange/publish) behind the scenes?
If not, what do I need to change in my code?
I was trying to queue events into a remote RabbitMQ server that was not configured properly in Kubernetes: it was missing the storage field (storage: 10Gi) and RabbitMQ was failing silently...
spec:
  replicas: 1
  image: rabbitmq:3.10.7-management
  persistence:
    storageClassName: managed-csi
    storage: 10Gi
Please check whether an exchange with the correct name has been created.
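As a side note, silent publishes (wrong exchange, unroutable routing key, broker-side problems) can be made visible by enabling publisher confirms and returns. A minimal sketch, assuming Spring Boot 2.4+ with spring-boot-starter-amqp (older Spring AMQP versions use setReturnCallback instead of setReturnsCallback):

// application.properties (assumed):
//   spring.rabbitmq.publisher-confirm-type=correlated
//   spring.rabbitmq.publisher-returns=true
//   spring.rabbitmq.template.mandatory=true

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitDebugConfig {

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        // Broker ack/nack for every published message.
        template.setConfirmCallback((correlation, ack, cause) ->
                System.out.println("publisher confirm: ack=" + ack + ", cause=" + cause));
        // Messages the broker could not route (e.g. wrong exchange or routing key).
        template.setReturnsCallback(returned ->
                System.out.println("returned: replyText=" + returned.getReplyText()
                        + ", exchange=" + returned.getExchange()
                        + ", routingKey=" + returned.getRoutingKey()));
        return template;
    }
}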

How to Use a Custom Ingest Pipeline with a Filebeat Module

How do I use a custom ingest pipeline with a Filebeat module? In my case, I'm using the apache module.
According to multiple sources, this is supposedly configurable via output.elasticsearch.pipeline / output.elasticsearch.pipelines[pipeline]. Sources follow:
https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#pipelines-option-es
https://stackoverflow.com/a/58726519/1026263
However, after many attempts at different permutations, I have never been able to influence which ingest pipeline is used by Filebeat; it always uses the module's stock ingest pipeline.
This is just one of the many attempts:
filebeat.config:
filebeat.modules:
  - module: apache
    access:
      enabled: true
      var.paths: ["/var/log/apache2/custom_access*"]
    error:
      enabled: true
      var.paths: ["/var/log/apache2/custom_error*"]
filebeat.config.modules:
  reload.enabled: true
  reload.period: 5s
output.elasticsearch:
  hosts: ["${ELASTICSEARCH_URL}"]
  pipeline: "apache_with_optional_x_forwarded_for"
Running filebeat with debug (-d "*") shows the following, which, I assume, demonstrates that my specification has been ignored. (I can also tell by the resulting docs in Elasticsearch that my custom pipeline was sidestepped.)
2021-12-16T23:23:47.464Z DEBUG [processors] processing/processors.go:203 Publish event: {
  "@timestamp": "2021-12-16T23:23:47.464Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.10.2",
    "pipeline": "filebeat-7.10.2-apache-access-pipeline"
  },
I have tried this in both Filebeat v6.8 and v7.10 (in the docker.elastic.co/beats/filebeat docker images).
This is similar to these threads, which never had a satisfactory conclusion:
How to use custom ingest pipelines with docker autodiscover
How to specify pipeline for Filebeat Nginx module?
Well, according to this PR on the beats repository, to override the module pipeline you need to specify the custom pipeline in the input configuration, not on the output.
Try this:
filebeat.modules:
  - module: apache
    access:
      enabled: true
      input.pipeline: your-custom-pipeline
      var.paths: ["/var/log/apache2/custom_access*"]
    error:
      enabled: true
      input.pipeline: your-custom-pipeline
      var.paths: ["/var/log/apache2/custom_error*"]

Filebeat's GCP Module keep getting hash config error

I am currently trying to forward GCP's Cloud Logging to Filebeat, to be forwarded on to Elasticsearch, following this documentation, with the GCP module settings on Filebeat configured according to this documentation.
Currently I am only trying to forward audit logs, so my gcp.yml module config is as follows:
- module: gcp
  vpcflow:
    enabled: false
    var.project_id: my-gcp-project-id
    var.topic: gcp-vpc-flowlogs
    var.subscription_name: filebeat-gcp-vpc-flowlogs-sub
    var.credentials_file: ${path.config}/gcp-service-account-xyz.json
    #var.internal_networks: [ "private" ]
  firewall:
    enabled: false
    var.project_id: my-gcp-project-id
    var.topic: gcp-vpc-firewall
    var.subscription_name: filebeat-gcp-firewall-sub
    var.credentials_file: ${path.config}/gcp-service-account-xyz.json
    #var.internal_networks: [ "private" ]
  audit:
    enabled: true
    var.project_id: <my prod name>
    var.topic: sample_topic
    var.subscription_name: filebeat-gcp-audit
    var.credentials_file: ${path.config}/<something>.<something>
When I run sudo filebeat setup I keep getting this error
2021-05-21T09:02:25.232Z ERROR cfgfile/reload.go:258 Error loading configuration files: 1 error: Unable to hash given config: missing field accessing '0.firewall' (source:'/etc/filebeat/modules.d/gcp.yml')
I can start the service, but I don't see any logs forwarded from GCP's Cloud Logging Pub/Sub topic to Elasticsearch.
Help or tips on best practices would also be appreciated.
Update
If I follow the docs here instead, I get the same error, but for audit.

How to decode JSON in ElasticSearch load pipeline

I set up Elasticsearch on AWS and I am trying to load application logs into it. The twist is that each application log entry is in JSON format, like:
{"EventType":"MVC:GET:example:6741/Common/GetIdleTimeOut","StartDate":"2021-03-01T20:46:06.1207053Z","EndDate":"2021-03-01","Duration":5,"Action":{"TraceId":"80001266-0000-ac00-b63f-84710c7967bb","HttpMethod":"GET","FormVariables":null,"UserName":"ZZZTHMXXN"} ...}
So, I am trying to unwrap it. The Filebeat docs suggest there is a decode_json_fields processor; however, I am getting the message field in Kibana as a single JSON string; nothing is unwrapped.
I am new to Elasticsearch, but I am not going to use that as an excuse not to do analysis first, only as an explanation that I am not sure which information is helpful for answering the question.
Here is filebeat.yml:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/opt/logs/**/*.json
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - decode_json_fields:
      fields: ["message"]
output.logstash:
  hosts: ["localhost:5044"]
And here is Logstash configuration file:
input {
  beats {
    port => "5044"
  }
}
output {
  elasticsearch {
    hosts => ["https://search-blah-blah.us-west-2.es.amazonaws.com:443"]
    ssl => true
    user => "user"
    password => "password"
    index => "my-logs"
    ilm_enabled => false
  }
}
I am still trying to understand the filtering and grok parts of Logstash, but it seems that it should work the way it is. Also, I am not sure where the actual messages tag comes from (probably from Logstash or Filebeat), but that seems irrelevant as well.
UPDATE: The AWS documentation doesn't give an example of loading directly through Filebeat, without Logstash.
If I don't use logstash (just FileBeat) and have the following section in filebeat.yml:
output.elasticsearch:
  hosts: ["https://search-bla-bla.us-west-2.es.amazonaws.com:443"]
  protocol: "https"
  #index: "mylogs"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "username"
  password: "password"
I am getting the following errors:
If I use index: "mylogs" - setup.template.name and setup.template.pattern have to be set if index name is modified
And if I don't use index (where would it go in ES then?) -
Failed to connect to backoff(elasticsearch(https://search-bla-bla.us-west-2.es.amazonaws.com:443)): Connection marked as failed because the onConnect callback failed: cannot retrieve the elasticsearch license from the /_license endpoint, Filebeat requires the default distribution of Elasticsearch. Please make the endpoint accessible to Filebeat so it can verify the license.: unauthorized access, could not connect to the xpack endpoint, verify your credentials
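(As an aside, the first error message spells out its own fix; a minimal filebeat.yml sketch, where the index and template names are just examples:)

output.elasticsearch:
  hosts: ["https://search-bla-bla.us-west-2.es.amazonaws.com:443"]
  index: "mylogs-%{+yyyy.MM.dd}"
setup.template.name: "mylogs"
setup.template.pattern: "mylogs-*"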
If transmitting via Logstash works in general, add a filter block as Val proposed in the comments and use the json plugin/filter: elastic.co/guide/en/logstash/current/plugins-filters-json.html; it automatically parses the JSON into Elasticsearch fields.
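A minimal sketch of that filter block, assuming the JSON string arrives in the default message field:

filter {
  json {
    source => "message"
  }
}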

None of the configured nodes are available issue with spring boot

Hi friends, I am developing a Spring Boot project with Elasticsearch. I have set up Elasticsearch on my local machine and installed the Head plugin in it. My Elasticsearch setup is correct and shows a green status.
My application-dev.yml file in my project is as follows:
server:
  port: 8080
liquibase:
  context: dev
spring:
  profiles:
    active: dev
  datasource:
    dataSourceClassName: org.h2.jdbcx.JdbcDataSource
    url: jdbc:h2:mem:jhipster;DB_CLOSE_DELAY=-1
    databaseName:
    serverName:
    username:
    password:
  jpa:
    database-platform: com.aquevix.demo.domain.util.FixedH2Dialect
    database: H2
    openInView: false
    show_sql: true
    generate-ddl: false
    hibernate:
      ddl-auto: none
      naming-strategy: org.hibernate.cfg.EJB3NamingStrategy
    properties:
      hibernate.cache.use_second_level_cache: true
      hibernate.cache.use_query_cache: false
      hibernate.generate_statistics: true
      hibernate.cache.region.factory_class: org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory
  data:
    elasticsearch:
      cluster-name: elasticsearch
      cluster-nodes: localhost:9200
  messages:
    cache-seconds: 1
  thymeleaf:
    mode: XHTML
    cache: false
  activemq:
    broker-url: tcp://localhost:61616
metrics:
  jmx.enabled: true
  spark:
    enabled: false
    host: localhost
    port: 9999
  graphite:
    enabled: false
    host: localhost
    port: 2003
    prefix: TestApollo
cache:
  timeToLiveSeconds: 3600
  ehcache:
    maxBytesLocalHeap: 16M
The Elasticsearch service is running on my machine. When I try to save an entity, my code first saves it in MySQL and then in Elasticsearch using an Elasticsearch repository, but on saving the entity into Elasticsearch it throws this error:
Hibernate: insert into EMPLOYEE (id, rollno) values (null, ?)
[ERROR] com.aquevix.demo.aop.logging.LoggingAspect - Exception in com.aquevix.demo.web.rest.EmployeeResource.create() with cause = null
org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: []
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:298) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:214) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:105) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:94) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.client.transport.TransportClient.index(TransportClient.java:331) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(IndexRequestBuilder.java:313) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:91) ~[elasticsearch-1.3.2.jar:na]
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:65) ~[elasticsearch-1.3.2.jar:na]
at org.springframework.data.elasticsearch.core.ElasticsearchTemplate.index(ElasticsearchTemplate.java:431) ~[spring-data-elasticsearch-1.1.3.RELEASE.jar:na]
at org.springframework.data.elasticsearch.repository.support.AbstractElasticsearchRepository.save(AbstractElasticsearchRepository.java:138) ~[spring-data-elasticsearch-1.1.3.RELEASE.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_51]
I have also used port 9300 instead of 9200, but nothing is working. I have tried everything but could not find a solution. Please help!
I have found the solution: ES 2.0 was not working correctly for me, so I re-installed ES 1.7.3 and now it is working in my case. Complete details here!
I had the same problem as you, also using JHipster. As mentioned, one possible solution is to downgrade your Elasticsearch instance, but if you don't want to downgrade, here is what worked for me:
Update Spring Boot to the latest version (> 1.4.0.RC1)
Configure ElasticsearchTemplate manually instead of using autoconfiguration.
If you need more information, please have a look at this post:
http://ignaciosuay.com/how-to-connect-spring-boot-to-elasticsearch-2-x-x/
I encountered this error, and for me, the reason was that I was using the incorrect cluster name.
Steps to troubleshoot this error:
Make sure that Spring Data Elasticsearch is compatible with the Elasticsearch version that you intend to use. There is a table in the project's README that maps Spring Data Elasticsearch versions to Elasticsearch versions:
https://github.com/spring-projects/spring-data-elasticsearch#quick-start
In my case, I am using Spring Data Elasticsearch 3.0.7. According to the table, I need to use Elasticsearch 5.5.0, but I have found that Spring Data Elasticsearch 3.0.7 appears to be compatible with Elasticsearch 5.6.x as well.
Make sure that the spring.data.elasticsearch.cluster-nodes property specifies the port your Elasticsearch cluster uses for the native Elasticsearch transport protocol.
By default, Elasticsearch listens on two ports, 9200 and 9300. Port 9200 is for communication using the RESTful API. Port 9300 is for communication using the transport protocol:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html
The Java client that Spring Data Elasticsearch uses expects to communicate using the transport protocol (9300 by default).
Make sure that the spring.data.elasticsearch.cluster-name property specifies the correct cluster name.
If you do not specifically set this property, then the default is "elasticsearch".
You can look up the Elasticsearch cluster name using the RESTful API:
curl -XGET 'http://localhost:9200/?pretty'
This command will print something similar to:
{
  "name" : "XXXXXXX",
  "cluster_name" : "some_cluster_name",
  "cluster_uuid" : "XXXXXXXXXXXXXXXXXXXXXX",
  "version" : {
    "number" : "5.6.10",
    "build_hash" : "b727a60",
    "build_date" : "2018-06-06T15:48:34.860Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
Make sure to set the value of the spring.data.elasticsearch.cluster-name property to the same string shown for "cluster_name".
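Putting those two checks together, the relevant properties end up looking roughly like this (values are examples; 9300 is the default transport port):

spring:
  data:
    elasticsearch:
      cluster-name: some_cluster_name   # must match "cluster_name" from the curl output
      cluster-nodes: localhost:9300     # transport port, not the 9200 REST port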
You seem to be using JHipster (a wonderful toolset, if I may add), which uses:
org.springframework.boot:spring-boot-starter-data-elasticsearch:1.3.3.RELEASE
This only works with Elasticsearch BELOW 2.0, so just install Elasticsearch 1.7.3 and run your code.
