Is it possible to SET settings while logging in using clickhouse-client? - clickhouse

I want to add a new setting to the ClickHouse codebase.
Now, after making the changes and compiling ClickHouse, I want to test it.
Can I set that setting during authentication using clickhouse-client?
E.g. let's say there is a setting named max_concurrent_queries_for_user:
./clickhouse-client --port 6667 --send_logs_level=trace SET max_concurrent_queries_for_user=100
I can log in like this, but I'm not sure if the setting is applied or not.

clickhouse-client has a rich set of options.
To get a full list of available options run the command:
clickhouse-client --help
Main options:
...
--max_concurrent_queries_for_user arg The maximum number of concurrent
requests per user.
--insert_deduplicate arg For INSERT queries in the replicated
table, specifies that deduplication of
inserting blocks should be performed
...
The --max_concurrent_queries_for_user option sets "the maximum number of concurrent requests per user", so the setting can be passed directly on the command line instead of running SET after login.
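Assuming your new setting is registered so that clickhouse-client exposes it as a command-line option (the names below follow the question's example), a quick way to test it is to pass it at connection time and then confirm it from system.settings:

./clickhouse-client --port 6667 --send_logs_level=trace --max_concurrent_queries_for_user=100

-- inside the client, verify the value was applied and marked as changed
SELECT name, value, changed
FROM system.settings
WHERE name = 'max_concurrent_queries_for_user';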

Related

Debezium MongoDB connector does not perform initial snapshot

I am using MongoDB atlas with a sharded replica set cluster, with the Debezium MongoDB connector as described in the documentation.
This is what my current config looks like (running a standalone setup):
name=dev-mongodb
connector.class=io.debezium.connector.mongodb.MongoDbConnector
tasks.max=4
mongodb.hosts=<some-url>.mongodb.net:27017
mongodb.name=mongodb
mongodb.user=<admin_user>
mongodb.password=<admin_user_pw>
database.include.list=<list_of_databases>
database.history.kafka.bootstrap.servers=<list_of_aws_msk_brokers>
database.history.kafka.topic=mongodb.history
include.schema.changes=true
mongodb.ssl.enabled=true
I can receive CDC events in the Kafka topic, but the initial snapshot that the documentation describes is never made. I have tried with a different mongodb.name, resulting in an entirely different set of topics being created and used, but with the same outcome.
The MongoDB oplog has ~2M rows; the Kafka topics have hardly a few thousand messages in total.
On further digging, it seems the connector records an offset for the last position in the oplog. Is it possible to reset this offset?
It sounds to me like you're using the same connector name in your multiple deployments, which means that despite changing the configuration and trying to reset the connector's state, it continues to find the prior offsets and restores the oplog position.
There are two alternatives:
Create a new connector with a completely different connector name.
Manually clear the offsets for the connector.
A lot of users prefer the first option simply because it is the easiest. Kafka records a connector's offsets based on the connector's name, so simply adjusting the name of a connector tells Kafka that the connector is completely brand new, and it won't find any persisted offsets to restore.
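For example (the new name here is hypothetical), a one-line change in the properties file from the question gives the connector a brand-new identity, so no prior offsets are found:

# was: name=dev-mongodb; everything else can stay the same
name=dev-mongodb-v2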
The second option is a bit more involved, because you first need to locate the Kafka topic that stores the offsets. This is typically connect-offsets by default, but the name can be overridden. Once you know the topic, you should shut down all connectors that are using it; adjusting this topic while a connector is using it can lead to unexpected behavior.
Using the kafkacat tool, you'll want to run the following, which assumes the default connect-offsets topic name, so adjust that accordingly:
$ kafkacat -b localhost:9092 -t connect-offsets -C -f '\nKey (%K bytes): %k
Value (%S bytes): %s
Timestamp: %T
Partition: %p
Offset: %o\n'
This will generate some output, and it's important to take note of both the "Key" and the "Partition". In order to reset the offsets, you're going to want to effectively write a NULL (or tombstone) into the topic using the correct "Key" and "Partition" values.
Assuming the above provided this output:
% Reached end of topic connect-offsets [0] at offset 0
% Reached end of topic connect-offsets [1] at offset 0
[…]
Key (52 bytes): ["source-file-01",{"filename":"/data/testdata.txt"}]
Value (15 bytes): {"position":87}
Timestamp: 1565859303551
Partition: 20
Offset: 0
[…]
You would want to execute the following command:
$ echo '["source-file-01",{"filename":"/data/testdata.txt"}]#' | \
kafkacat -b localhost:9092 -t connect-offsets -P -Z -K# -p 20
In the echo statement, we specify the key followed by the key separator #, which is defined by the kafkacat argument -K#; the -Z option sends the empty value as NULL. The -p argument specifies the partition, and it's important that the key and partition be set correctly.
After this is done, you can safely restart the connectors that used that offset topic, and you should see the connector act like it's a brand-new deployment.
Be mindful that if you are working with a connector that uses a database history topic, such as MySQL, SQL Server, or Oracle, the database history topic will need to be cleared as well.
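For example, assuming the standard Kafka CLI tools on the broker host and the history topic name from the question's config, clearing it could be as simple as deleting the topic (it is recreated when the connector restarts):

$ kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic mongodb.history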
As I said earlier, however, it's just simpler to redeploy the connector under a new name and avoid all the Kafka topic magic needed to arrive at the same outcome.

How do I monitor ClickHouse instances effectively (GCP)?

I want to build some sort of materialized view on the system.merges, system.metrics, and system.asynchronous_metrics tables, so I get a time-series view of system health (memory consumption, etc.).
How is this possible? I tried with system.merges, but all I get are the currently running merges.
You can use the following variants:
Export metrics via the graphite protocol to ClickHouse itself:
turn on graphite export: https://clickhouse.tech/docs/en/operations/server_settings/settings/#server_settings-graphite
use https://github.com/lomik/graphite-clickhouse to store the exported data back in ClickHouse
a complete Vagrant demo stand is available here: https://github.com/Slach/clickhouse-metrics-grafana/
Use the undocumented system.metric_log table:
look at https://github.com/ClickHouse/ClickHouse/issues/6363, https://github.com/ClickHouse/ClickHouse/search?q=metric_log, and https://github.com/ClickHouse/ClickHouse/blob/master/dbms/programs/server/config.d/metric_log.xml
turn on system.metric_log in /etc/clickhouse-server/config.d/metric_log.xml:
<yandex>
    <metric_log>
        <database>system</database>
        <table>metric_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <collect_interval_milliseconds>1000</collect_interval_milliseconds>
    </metric_log>
</yandex>
Be careful: according to https://github.com/ClickHouse/ClickHouse/blob/master/dbms/src/Interpreters/MetricLog.cpp#L18, system.asynchronous_metrics doesn't flush into system.metric_log.
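Once metric_log is enabled, a plain query over it gives the time-series view asked about. The metric columns are version-dependent, so check DESCRIBE TABLE system.metric_log first; the two columns below are examples:

SELECT
    event_time,
    CurrentMetric_MemoryTracking,  -- currently tracked memory
    ProfileEvent_Query             -- queries started during the interval
FROM system.metric_log
WHERE event_date = today()
ORDER BY event_time DESC
LIMIT 10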

Oracle Listener utility and batch file

I have 3 listeners on the same database server. The listeners are: listener_cympp1 (1522), listener_cymap1 (1523), listener_cympd1 (1524).
How can I automatically change the name of the listener log with a batch file when the log is over 100 MB?
Which syntax can I use to set the listener name in this command? Executing "lsnrctl set current_listener listener_cympp1" first doesn't help.
If I understand correctly, you want to rotate the listener log files at a size of 100 MB, and in addition you ask how to explicitly specify the log file name of each listener.
You can do this without a batch or shell script: the listener can be configured to rotate its log based on file size and number of files.
You can find all the relevant information in the Oracle documentation for the listener configuration parameters here.
The parameters of interest are:
TRACE_DIRECTORY_listener_name
TRACE_FILE_listener_name
TRACE_FILELEN_listener_name
TRACE_FILENO_listener_name
You can set the parameter values per listener.
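A sketch of the corresponding listener.ora entries for one of the listeners (directory, file name, and file count are illustrative; TRACE_FILELEN is given in KB):

# listener.ora -- illustrative values for listener_cympp1
TRACE_DIRECTORY_listener_cympp1 = /u01/app/oracle/network/trace
TRACE_FILE_listener_cympp1 = listener_cympp1
TRACE_FILELEN_listener_cympp1 = 102400   # 100 MB per trace file, in KB
TRACE_FILENO_listener_cympp1 = 5         # rotate across 5 files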

How to check whether vnodes are disabled on a Hadoop node

Linking back to this question:
Why not enable virtual node in an Hadoop node?
I'm running a mixed 3-node cluster with 2 Cassandra nodes and 1 analytics node, and I disabled virtual nodes by generating 3 tokens with the utility provided by DataStax Enterprise.
But when I run the 'nodetool status' command, I still see 256 tokens on each node, and when a MapReduce job is created, it creates 257 mappers and takes a very long time to execute a query over small data.
So my specific questions are:
Is the virtual node setting still not disabled? How can I verify whether it's disabled?
If it's disabled, then why are 257 mappers still created for each job? Is there a different configuration for that?
1) It's not disabled. You can tell because it still says 256 tokens in nodetool status.
To disable vnodes you need to make sure you change the num_tokens variable in cassandra.yaml:
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
# num_tokens: 256 << Make sure this line is commented out
# initial_token allows you to specify tokens manually. While you can use it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes to legacy clusters
# that do not have vnodes enabled.
initial_token: << Your generated token goes here
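After updating cassandra.yaml on every node and restarting them, nodetool status should report a single token per node. Abbreviated, illustrative output:

$ nodetool status
--  Address    Load     Tokens  Owns    Host ID                               Rack
UN  10.0.0.1   1.2 GB   1       33.3%   ...                                   rack1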

How to do load testing for 100 concurrent user logins with unique usernames and passwords using JMeter?

My test plan scenario is to do load testing for 100 concurrent users logging in to a website.
I have created a Thread Group with Number of Threads set to 100.
Created a CSV file which contains the login details of 100 users (unique usernames and passwords).
Under the sign-in sampler, added a "User Parameters" pre-processor (Thread Group -> PreProcessors). Added variables using the __CSVRead function, which reads values from the file test.csv.
Selected the login sampler and changed the values of userid and password to ${A} and ${B}.
Is this the right way to do it, or is there an alternative way to achieve this?
If this works for you and behaves as you expect, that's enough.
But it looks like CSV Data Set Config is more suitable and easier to use for multi-user scenarios than the __CSVRead function:
Thread Group
    Number of Threads: N // count of your test threads (users)
    Loop Count: 1
    CSV Data Set Config
        Filename: [path to your csv-file with usernames / passwords]
        Variable Names: username,pwd // extracted values can be referred to as ${username}, ${pwd}
        Recycle on EOF? False
        Stop thread on EOF? True
        Sharing mode: Current thread group
    . . .
    HTTP Request // your http call
    . . .
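The CSV file itself is just one user per line, with the columns in the same order as Variable Names (hypothetical test.csv):

user001,secret001
user002,secret002
user003,secret003

In the HTTP login request, the credentials are then referenced as ${username} and ${pwd}.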
As per documentation:
The function is not suitable for use with large files, as the entire
file is stored in memory. For larger files, use CSV Data Set Config
element or StringFromFile.
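For comparison, the __CSVRead approach from the question wires up like this: read column 0 and column 1 of the current row, with a trailing next call to advance the file pointer to the following row once both columns have been read:

userid:   ${__CSVRead(test.csv,0)}
password: ${__CSVRead(test.csv,1)}${__CSVRead(test.csv,next)}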
Pretty detailed guides are available here:
How to use a CSV file with JMeter
Using CSV DATA SET CONFIG
