How to get GroupID in Golang Kafka 10? - go

I am using Kafka 10.0 and https://github.com/Shopify/sarama.
I am trying to get the offset of the latest message that a consumer processed.
To do so I've found the method NewOffsetManagerFromClient(group string, client Client), which requires the group name.
How do I get the consumer group name?
offsets := make(map[int32]int64)
config := sarama.NewConfig()
config.Consumer.Offsets.CommitInterval = 200 * time.Millisecond
config.Version = sarama.V0_10_0_0
// config.Consumer.Offsets.Initial = sarama.OffsetNewest
cli, _ := sarama.NewClient(kafkaHost, config)
defer cli.Close()
offsetManager, _ := sarama.NewOffsetManagerFromClient(group, cli)
for _, partition := range partitions {
    partitionOffsetManager, _ := offsetManager.ManagePartition(topic, partition)
    offset, _ := partitionOffsetManager.NextOffset()
    offsets[partition] = offset
}
return offsets
I created a consumer with
consumer, err := sarama.NewConsumer(connections, config)
but I do not know how to create a consumer group and get its group name.

You are attempting to create your own offset manager to find current offsets:
offsetManager, _ := sarama.NewOffsetManagerFromClient(group, cli)
The consumer that was consuming your topic's messages committed its offsets under a specific group id. Use that same group id here.

You can use any string as the group id. Have a look at the example from the sarama GoDoc:
// Start a new consumer group
group, err := NewConsumerGroupFromClient("my-group", client)
if err != nil {
    panic(err)
}
defer func() { _ = group.Close() }()
Any string will do; just make sure that every consumer that should join the same group uses the same group id.
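For instance, here is a minimal sketch of reading back, via an offset manager, the offsets that consumers in a group have committed. The group id "my-group", the broker address, topic and partition list are placeholder assumptions; substitute your own values.
package main

import (
    "fmt"
    "log"

    "github.com/Shopify/sarama"
)

func main() {
    // Placeholders -- substitute your own brokers, topic and partitions.
    brokers := []string{"localhost:9092"}
    topic := "my-topic"
    partitions := []int32{0}

    config := sarama.NewConfig()
    config.Version = sarama.V0_10_0_0

    client, err := sarama.NewClient(brokers, config)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // The group id is just a string you choose; it must match the id the
    // consuming application committed its offsets under.
    offsetManager, err := sarama.NewOffsetManagerFromClient("my-group", client)
    if err != nil {
        log.Fatal(err)
    }
    defer offsetManager.Close()

    for _, partition := range partitions {
        pom, err := offsetManager.ManagePartition(topic, partition)
        if err != nil {
            log.Fatal(err)
        }
        offset, _ := pom.NextOffset() // next offset this group would consume
        fmt.Printf("partition %d: next offset %d\n", partition, offset)
        pom.Close()
    }
}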

Related

Large batch Inserts Using Snowflake & Go

I am retrieving payloads from a REST API that I then want to insert into a Snowflake table.
My current process is to use the Snowflake DB connection and iterate over a slice of structs (which contain my data from the API). However, this doesn't seem efficient or optimal. Everything loads successfully, but I am trying to figure out how to optimize a large number of inserts for potentially thousands of records. Perhaps there needs to be a separate channel for insertions instead of inserting synchronously?
General code flow:
import (
    "database/sql"
    "fmt"
    "sync"
    "time"

    _ "github.com/snowflakedb/gosnowflake"
)

func ETL() {
    var wg sync.WaitGroup
    ch := make(chan []*Response)
    defer close(ch)

    // Create requests to the API
    for _, req := range requests {
        // All of this flows fine without issue
        wg.Add(1)
        go func(request Request) {
            defer wg.Done()
            resp, _ := request.Get()
            ch <- resp
        }(req)
    }

    // Connect to Snowflake -- this is not a problem
    connString := fmt.Sprintf(config...)
    db, _ := sql.Open("snowflake", connString)
    defer db.Close()

    // Collect responses from our channel
    results := make([][]*Response, len(requests))
    for i := range results {
        results[i] = <-ch
        for _, res := range results[i] {
            // transform just flattens my structs into entries that I would like
            // to insert into Snowflake. This is not a bottleneck.
            entries := transform(res)
            // Load the flattened entries into Snowflake using the db connection.
            if err := load(entries, db); err != nil {
                fmt.Println(err)
            }
        }
    }
}
type Entry struct {
    field1     string
    field2     string
    statusCode int
}

func load(entries []*Entry, db *sql.DB) error {
    start := time.Now()
    for i, entry := range entries {
        fmt.Printf("Loading entry %d\n", i)
        stmt := `INSERT INTO tbl (field1, field2, updated_date, status_code)
                 VALUES (?, ?, CURRENT_TIMESTAMP(), ?)`
        _, err := db.Exec(stmt, entry.field1, entry.field2, entry.statusCode)
        if err != nil {
            fmt.Println(err)
            return err
        }
    }
    fmt.Println("Load time: ", time.Since(start))
    return nil
}
Instead of INSERTing individual rows, collect the rows into files; each time you push one of these files to S3/GCS/Azure, it will be loaded immediately (a Go-side sketch of writing such files follows the pipe definition below).
I wrote a post detailing these steps:
https://medium.com/snowflake/lightweight-batch-streaming-to-snowflake-with-gcp-pub-sub-1790ab76da31
With the appropriate storage integration, this would auto-ingest the files:
create pipe temp.public.meetup202011_pipe
  auto_ingest = true
  integration = temp_meetup202011_pubsub_int
  as
  copy into temp.public.meetup202011_rsvps
  from @temp_fhoffa_gcs_meetup;
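On the Go side, that mostly means writing batches of entries to files instead of issuing per-row INSERTs. Here is a minimal sketch, assuming newline-delimited JSON and a writeBatchFile helper I introduce for illustration; the upload of the resulting file to the bucket watched by the pipe is left to your cloud SDK (or a Snowflake PUT for an internal stage):
package main

import (
    "encoding/json"
    "fmt"
    "os"
    "time"
)

// Entry mirrors the struct from the question (fields exported so encoding/json can serialize them).
type Entry struct {
    Field1     string `json:"field1"`
    Field2     string `json:"field2"`
    StatusCode int    `json:"status_code"`
}

// writeBatchFile writes one batch of entries as newline-delimited JSON into dir.
// Pushing the file to the bucket the pipe watches is what triggers the auto-ingest.
func writeBatchFile(dir string, entries []*Entry) (string, error) {
    name := fmt.Sprintf("%s/entries-%d.ndjson", dir, time.Now().UnixNano())
    f, err := os.Create(name)
    if err != nil {
        return "", err
    }
    defer f.Close()

    enc := json.NewEncoder(f) // Encode writes one JSON document per line
    for _, e := range entries {
        if err := enc.Encode(e); err != nil {
            return "", err
        }
    }
    return name, nil
}

func main() {
    batch := []*Entry{{Field1: "a", Field2: "b", StatusCode: 200}}
    file, err := writeBatchFile(os.TempDir(), batch)
    if err != nil {
        panic(err)
    }
    fmt.Println("wrote", file) // now push this file to S3/GCS/Azure
}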
Also check these considerations:
https://www.snowflake.com/blog/best-practices-for-data-ingestion/
Soon: If you want to send individual rows and ingest them in real time into Snowflake - that's in development (https://www.snowflake.com/blog/snowflake-streaming-now-hiring-help-design-and-build-the-future-of-big-data-and-stream-processing/).

Kafka logs in go service

I need Kafka consumer logs for debugging. I do the following:
chanLogs := make(chan confluentkafka.LogEvent)
go func() {
    for {
        logEv := <-chanLogs
        logger.Debug("KAFKA: " + logEv.String())
    }
}()
configMap["go.logs.channel.enable"] = true
configMap["go.logs.channel"] = chanLogs
consumer, err := confluentkafka.NewConsumer(&configMap)
err = consumer.SubscribeTopics(Topics, nil)
And I never get a line. I tried it with the kafka channel (consumer.Logs()) with the same result. What am I doing wrong?
UPD
In the initial post I had set the wrong parameter name. The correct one is go.logs.channel.enable. But sometimes this still doesn't work.
As described in the doc, you should enable that feature:
go.logs.channel.enable (bool, false) - Forward log to Logs() channel.
go.logs.channel (chan kafka.LogEvent, nil) - Forward logs to application-provided channel instead of Logs(). Requires go.logs.channel.enable=true.
So change your code like this:
configMap["go.logs.channel"] = chanLogs
configMap["go.logs.channel.enable"] = true
consumer, err := confluentkafka.NewConsumer(&configMap)
See also the doc, or the sample in the code repo here
The solution was to add
configMap["debug"] = "all"
I found it here
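Putting the pieces together, here is a minimal sketch that enables both the log channel and librdkafka's debug output. The broker address, group id and topic name are placeholder assumptions, and the import path shown is for confluent-kafka-go v1:
package main

import (
    "log"

    confluentkafka "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    chanLogs := make(chan confluentkafka.LogEvent, 100)

    configMap := confluentkafka.ConfigMap{
        "bootstrap.servers":      "localhost:9092", // assumption: adjust to your brokers
        "group.id":               "my-group",       // assumption
        "go.logs.channel.enable": true,             // forward librdkafka logs to a channel
        "go.logs.channel":        chanLogs,         // ...this channel, instead of consumer.Logs()
        "debug":                  "all",            // make librdkafka emit debug-level logs
    }

    consumer, err := confluentkafka.NewConsumer(&configMap)
    if err != nil {
        log.Fatal(err)
    }
    defer consumer.Close()

    // Drain the log channel in the background.
    go func() {
        for logEv := range chanLogs {
            log.Println("KAFKA: " + logEv.String())
        }
    }()

    if err := consumer.SubscribeTopics([]string{"my-topic"}, nil); err != nil { // assumption: topic name
        log.Fatal(err)
    }
    // ... poll/read messages as usual
}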

Streaming values from websocket, determining if I am lagging behind processing the data

I am connecting to a websocket that streams live stock trades.
I have to read the prices, perform calculations on the fly and based on these calculations make another API call e.g. buy or sell.
I want to ensure my calculations/processing doesn't slow down my ability to stream in all the live data.
What is a good design pattern to follow for this type of problem?
Is there a way to log/warn in my system to know if I am falling behind?
Falling behind means: the websocket is sending price data faster than I can process it, so I lag further and further behind.
While doing the c.ReadJSON and then passing the message to my channel, there might be a delay from deserializing the JSON.
Once I am reading from the channel and processing (calculating formulas and sending another API request to buy/sell), this adds further delay.
How can I prevent lags/delays and also monitor if indeed there is a delay?
func main() {
    c, _, err := websocket.DefaultDialer.Dial("wss://socket.example.com/stocks", nil)
    if err != nil {
        panic(err)
    }
    defer c.Close()

    // Buffered channel to account for bursts or spikes in data:
    chanMessages := make(chan interface{}, 10000)

    // Read messages off the buffered queue:
    go func() {
        for msgBytes := range chanMessages {
            logrus.Info("Message Bytes: ", msgBytes)
        }
    }()

    // As little logic as possible in the reader loop:
    for {
        var msg interface{}
        err := c.ReadJSON(&msg)
        if err != nil {
            panic(err)
        }
        chanMessages <- msg
    }
}
You can read bytes, pass them to the channel, and use other goroutines to do conversion.
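For example, here is a minimal sketch of that split using the same gorilla/websocket connection style as the question; the Trade struct, the worker count of 4 and the URL are placeholder assumptions:
package main

import (
    "encoding/json"
    "log"

    "github.com/gorilla/websocket"
)

// Trade is a hypothetical message shape; adjust it to the feed you consume.
type Trade struct {
    Symbol string  `json:"symbol"`
    Price  float64 `json:"price"`
}

func main() {
    c, _, err := websocket.DefaultDialer.Dial("wss://socket.example.com/stocks", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    rawMessages := make(chan []byte, 10000)

    // Several workers decode and process in parallel so the reader never blocks on JSON.
    for w := 0; w < 4; w++ {
        go func() {
            for raw := range rawMessages {
                var t Trade
                if err := json.Unmarshal(raw, &t); err != nil {
                    log.Println("decode error:", err)
                    continue
                }
                // calculations / buy-sell calls go here
            }
        }()
    }

    // Reader loop: only read raw bytes and hand them off.
    for {
        _, raw, err := c.ReadMessage()
        if err != nil {
            log.Fatal(err)
        }
        rawMessages <- raw
    }
}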
I worked on a similar crypto market bot. Instead of creating a large buffered channel, I created a buffered channel with a capacity of 1 and used a select statement to send the socket data to the channel.
Here is an example:
var wg sync.WaitGroup
msg := make(chan []byte, 1)

wg.Add(1)
go func() {
    defer wg.Done()
    for data := range msg {
        // decode and process data
    }
}()

for {
    _, data, err := c.ReadMessage()
    if err != nil {
        log.Println("read error: ", err)
        return
    }
    select {
    case msg <- data: // in case channel is free
    default: // if not, next time will try again with latest data
    }
}
This ensures that you get the latest data when you are ready to process it.
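For the monitoring part of the question, one simple signal that you are falling behind is the fill level of the buffered channel (len versus cap). A minimal sketch, meant to be dropped into the main function from the question next to the existing reader goroutine; it reuses chanMessages and logrus from that code and additionally needs the time package:
// Warn whenever the buffer is more than 80% full -- a sign that processing
// is not keeping up with the websocket reader.
go func() {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    for range ticker.C {
        depth, capacity := len(chanMessages), cap(chanMessages)
        if depth > capacity*8/10 {
            logrus.Warnf("falling behind: %d of %d buffered messages pending", depth, capacity)
        }
    }
}()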

Golang segmentio/kafka-go consumer not working

I am using segmentio/kafka-go to connect to Kafka.
// to produce messages
topic := "my-topic"
partition := 0
conn, _ := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", topic, partition)
conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
conn.WriteMessages(
    kafka.Message{Value: []byte("one!")},
    kafka.Message{Value: []byte("two!")},
    kafka.Message{Value: []byte("three!")},
)
conn.Close()
I am able to produce into my Kafka server using this code.
// to consume messages
topic := "my-topic"
partition := 0
conn, _ := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", topic, partition)
conn.SetReadDeadline(time.Now().Add(10 * time.Second))
batch := conn.ReadBatch(10e3, 1e6) // fetch 10KB min, 1MB max
b := make([]byte, 10e3) // 10KB max per message
for {
    _, err := batch.Read(b)
    if err != nil {
        // err -> "invalid codec"
        break
    }
    fmt.Println(string(b))
}
batch.Close()
conn.Close()
But I am unable to consume using the above code. I am getting the error invalid codec. What can be the reason?
In case relevant, I tweaked the minimum batch size to 1 so that it tries to consume something.
Just a guess:
try adding an import to load compression codecs, in case your topics use compression.
import _ "github.com/segmentio/kafka-go/snappy"
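For instance, here is the consuming code from the question as a minimal sketch with the codec registered via the blank import; kafka-go also provides analogous subpackages for other codecs (gzip, lz4, zstd) if your topic uses one of those instead:
package main

import (
    "context"
    "fmt"
    "time"

    "github.com/segmentio/kafka-go"
    _ "github.com/segmentio/kafka-go/snappy" // registers the snappy codec
)

func main() {
    topic := "my-topic"
    partition := 0

    conn, err := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", topic, partition)
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    conn.SetReadDeadline(time.Now().Add(10 * time.Second))
    batch := conn.ReadBatch(10e3, 1e6) // fetch 10KB min, 1MB max
    defer batch.Close()

    b := make([]byte, 10e3) // 10KB max per message
    for {
        n, err := batch.Read(b)
        if err != nil {
            break
        }
        fmt.Println(string(b[:n]))
    }
}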

Insert thousand nodes to neo4j using golang

I am importing data into neo4j using neoism, and I have issues importing big data: 1000 nodes take about 8s. Here is the part of the code that imports 100 nodes.
It is quite basic code that needs improvement. Can anyone help me improve it?
var wg sync.WaitGroup
for _, itemProps := range items {
    wg.Add(1)
    go func(i interface{}) {
        s := time.Now()
        cypher := neoism.CypherQuery{
            Statement: fmt.Sprintf(`
                CREATE (%v)
                SET i = {Props}
                RETURN i
            `, ItemLabel),
            Parameters: neoism.Props{"Props": i},
        }
        if err := database.ExecuteCypherQuery(cypher); err != nil {
            utils.Error(fmt.Sprintf("error ImportItemsNeo4j! %v", err))
            wg.Done()
            return
        }
        utils.Info(fmt.Sprintf("import Item success! took: %v", time.Since(s)))
        wg.Done()
    }(itemProps)
}
wg.Wait()
AFAIK neoism still uses the old APIs; you should use cq instead: https://github.com/go-cq/cq
You should also batch your creates, i.e. either send multiple statements per request (e.g. 100 statements per request), or, even better, send a list of parameters to a single Cypher query, e.g. where {data} is [{id:1},{id:2},...]:
UNWIND {data} as props
CREATE (n:Label) SET n = props
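Translated back to the neoism types from the question, a batched import could look roughly like the sketch below; database.ExecuteCypherQuery and items are the asker's own helper and data, the :Item label stands in for ItemLabel, and the []interface{} signature is an assumption.
// Send the whole slice as one parameter so Cypher creates the nodes in a
// single round trip instead of one goroutine and one request per item.
func importItemsBatched(items []interface{}) error {
    cypher := neoism.CypherQuery{
        Statement: `
            UNWIND {data} AS props
            CREATE (n:Item)
            SET n = props
        `,
        Parameters: neoism.Props{"data": items},
    }
    return database.ExecuteCypherQuery(cypher)
}
For very large imports, chunk items into slices of a few thousand per query so a single request does not grow too large.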
