Write data come from concurrent requests to Kafka - performance

I'm building an event collector, it will receive a http request like http://collector.me/?uuid=abc123&product=D3F4&metric=view then write request parameters to Apache Kafka topic, now I use Plug, Cowboy and KafkaEx.
defmodule Collector.Router do
import Plug.Conn
def init(opts) do
opts
end
def call(conn, _opts) do
conn = fetch_query_params(conn)
KafkaEx.produce("test", 0, "#{inspect conn.query_params}")
conn
|> put_resp_content_type("text/plain")
|> send_resp(200, "OK")
end
end
AFAIK, Cowboy spawns a new process for each request, so I think write to Kafka in the call function is a proper way because it's easy to create hundreds of thousands of processes in Elixir. But I wonder if this is the right way to do? Do I need a queue before write to Kafka or something like that? My goal is handle as much concurrent requests as possible.
Thanks.

Consider using the Confluent Kafka REST Proxy because then you might not need to write any server side code.
https://github.com/confluentinc/kafka-rest
Worst case is you might need to rewrite the incoming URL into a properly formatted HTTP POST with JSON data and the right HTTP header for Content-Type. This can be done with and application load balancer or a basic reverse Proxy like haproxy or nginx.

Related

How do I expose my own Elixir websocket using WebSockex

I see the basic example for Elixir's WebSockex library here but it doesn't really explain how I can expose my own websocket to the internet. This question has answers which explain how to chat to an existing websocket externally, but I want to expose my own websocket. I'm actually using websockex as part of a Phoenix application, so perhaps bits of Phoenix might help here?
I obviously know the ip:port combo of my phoenix application so given these, how do I expose a websockex websocket on that ip:port? In other words, what should I pass as the URL? in this basic example code:
defmodule WebSocketExample do
use WebSockex
def start_link(url, state) do
WebSockex.start_link(url, __MODULE__, state)
end
def handle_frame({type, msg}, state) do
IO.puts "Received Message - Type: #{inspect type} -- Message: #{inspect msg}"
{:ok, state}
end
def handle_cast({:send, {type, msg} = frame}, state) do
IO.puts "Sending #{type} frame with payload: #{msg}"
{:reply, frame, state}
end
end
Please note that I need to expose a raw websocket, not a Phoenix channel, as the consumer doesn't understand Phoenix channels. If Phoenix can expose a raw websocket then I'll consider that a solution too.
If neither Phoenix nor WebSockex can help, what are my options?
Websockex is a client library, I don't think it has any code for exposing a websocket. Since you're already using phoenix, you probably can do what you need with phoenix channels.
If you're on cowboy (and you probably are, since it's the default), then you can also use it to expose a raw websocket. However, it requires some fiddling with routing. You will need to replace YourAppWeb.Endpoint with a manual configuration of cowboy:
{
Plug.Cowboy,
scheme: :http,
plug: YourAppWeb.Endpoint,
options: endpoint_options(),
dispatch: [
_: [
# Dispatch paths beginning with /ws to a websocket handler
{"/ws/[...]", YourApp.WebsocketHandler, []},
# Dispatch other paths to the phoenix endpoint
{:_, Plug.Cowboy.Handler, {YourAppWeb.Endpoint, endpoint_options()}}
]
]
}
I have honestly only done this with raw plug, so you might need to convert the endpoint to be a Plug instead of a Phoenix.Endpoint. Then, you need to implement YourApp.WebsocketHandler to conform to cowboy's API and perform a websocket upgrade (and handle sending/receiving messages), as described in cowboy docs. You can also see this gist for a more fleshed-out example.
WebSockex implements many callbacks, including, but not limited to WebSockex.handle_connect/2. It holds WebSockex.Conn in a state and passes it to all callbacks.
WebSockex.Conn is a plain old good struct, having socket field.
So from any callback (I’d do it from WebSockex.handle_connect/2) you might share this socket with the process which needs it and use it then from there.
Also, you can borrow some internals and check how the connection is being created.
You’ll see it uses WebSockex.Conn.new/2 that returns an initialized connection, that, in turn, holds a socket. In that case, you’ll be obliged to supervise the process that holds the socket manually.
The power of OSS is all answers are one mouse click far from questions.

Is it possible to use http4k to stream long reponses?

I would like to use http4k to stream a long response. I plan to use Content-type: multipart/x-mixed-replace so I push data to the client quite endlessly. In http4k, we have typealias HttpHandler = (Request) -> Response. But my handler cannot return a response because it is not a limited document that I want to return but an endless stream. Is this means that I should use something else for what I want?
If you're pulling from another HTTP source, you can use the streaming body mode on one of the various HTTP client modules (Apache/OkHttp/Jetty will work).
Alternatively if you're generating the content yourself or streaming from a database, you'll have to start a Thread and handle it that way. There's an example of how to do this in the source code in a test case that is used to prove the various clients can do streaming.
https://github.com/http4k/http4k/blob/master/http4k-core/src/test/kotlin/org/http4k/streaming/StreamingContract.kt
Could be that websocket is what you need?
https://www.http4k.org/blog/typesafe_websockets/
So you can have an endless stream of event (e.g you need to push a feed).

Is this example tcp socket programming sequence of events safe?

I plan on having two services.
HTTP REST service written in Ruby
JSON RPC service written in Go
The Ruby service will open a TCP socket connection to a Go JSON RPC service. It'll do this for each incoming HTTP request it receives. It will send some data over the socket to the Go service and that service will subsequently send back the corresponding data back down the socket.
Go code
The Go service go would look something like this (simplified):
srv := new(service.App) // this would expose a Process method
rpc.Register(srv)
listener, err := net.Listen("tcp", ":8080")
if err != nil {
// handle error
}
for {
conn, err := listener.Accept()
if err != nil {
// handle error
}
go jsonrpc.ServeConn(conn)
}
Notice we serve the incoming connection using a goroutine, so we can handle requests concurrently.
Ruby code
Below is a simple snippet of Ruby code that demonstrates (in theory) the way I would send data to the Go service:
require "socket"
require "json"
socket = TCPSocket.new "localhost", "8080"
b = {
:method => "App.Process",
:params => [{ :Config => JSON.generate({ :foo => :bar }) }],
:id => "0"
}
socket.write(JSON.dump(b))
response = JSON.load socket.readline
My concern is: will this be a safe sequence of events?
I'm not asking if this will be 'thread safe', because i'm not worried about manipulating shared memory across the go routines. I'm more concerned around whether my Ruby HTTP service will get back the data it's expecting?
If I have two parallel requests coming into my HTTP Service (or maybe the Ruby app is hosted behind a load balancer and so different instances of the HTTP service is handling multiple requests), then I could have instance A send the message Foo to the Go service; while instance B sends the message Bar.
The business logic inside the Go service will return different responses depending on its input so I want to be sure that Ruby instance A gets back the correct response for Foo, and B gets back the correct response for Bar.
I assume a socket connection is more like a queue in that if instance A makes a request to the Go service first and then B does, but B is quicker responding for whatever reason, then the Go service will write the response for B to the socket and instance A of the Ruby app will end up reading in the wrong socket data (this is obviously just one possible scenario considering that I could get lucky and have instance B read the socket data before instance A does).
Solutions?
I'm not sure if there is simple solution to this problem. Unless I don't use a TCP socket or RPC and instead rely on standard HTTP in the Go service. But I wanted the performance and less overhead of TCP.
I'm worried the design could get more complicated by maybe having to implement an external queue as a way of synchronising the responses with the Ruby service.
It maybe because the nature of my Ruby service is fundamentally synchronous (HTTP response/request) that I have no option but to switch to HTTP for the Go service.
But wanted to double check with the community first just in case I'm missing something obvious.
Yes this is safe if you create a new connection every time.
That said there are latent issues with your approach:
TCP connections are rather expensive to establish, so you probably want to re-use connections with a connection pool
If you make too many simultaneous requests you will exhaust ports/open file descriptors which will cause your program to crash
You don't have any timeouts in place, so it's possible to end up with orphaned TCP connections which never complete (either because of something bad on the Go side, or network problems)
I think you'd be better off using HTTP (despite the overhead) since libraries are already written to cope with these problems. HTTP is also much more debuggable since you can just curl an endpoint to test it.
Personally I'd probably go with gRPC.

Ruby tcpserver client server

I have an application that I am coding to have the logging info be sent over tcpsocket to a server and have a monitor client connect to the server to view the logging data . So far i am able to get to the stage where the info is sent to the server however I need some thoughts on how to go about the next stage. Using Ruby tcpsever what methodologies can I use to have the server resend the incoming data to a client? How can I have data stored across threads?
require "socket"
server_socket = TCPServer.new('localhost', 2200)
loop do
# Create a new thread for each connection.
Thread.start(server_socket.accept) do |session|
# check if received is viewer request
line = session.gets
if line =~ /viewer/
#filter = line[/\:(.*?)\:/]
session.puts "Listining for #{filter}"
loop do
if (log = ### need input here from logging app ###)
# Show if filter is set for all or cli matches to filter
if #filter == ':all:' || log =~ /\:(.*?)\:/
session.puts log
end
# Read trace_viewer input.
if session.gets =~ /quit/
# close the connections
session.puts "Closing connection. Bye!"
session.close
break
end
end
end
end
end
end
With clients connecting to the server it sounds like a typical client/server configuration, with some clients sending data, and others requesting it. Is there any reason you don't use a standard HTTP client/server, i.e., a web server, instead of reinventing the wheel? Use Sinatra or Padrino, or even Rails and you'll be mostly finished.
How can I have data stored across threads?
Ruby's Thread module includes Queue, which is good for moving data around between threads. The document page has an example which should help.
The same ideas for using a queue would apply to using tables. I'd recommend using a database to act as a queue for your use, rather than do it in memory. A power outage, or app crash will lose all the data if it's an in-memory queue and the clients haven't retrieved everything. Writing and reading a database means the data would survive such problems.
Keep the schema simple, provide a reasonable index on it, and it should be fast enough for most uses. You'll need some housekeeping code to keep the database clean, but that's easy using SQL, or an ORM like Sequel or ActiveRecord.

What's the fastest way for a true sinatra(ruby/rack) after_filter?

Okay it's a simple task. After I render html to the client I want to execute a db call with information from the request.
I am using sinatra because it's a lightweight microframework, but really i up for anything in ruby, if it's faster/easier(Rack?). I just want to get the url and redirect the client somewhere else based on the url.
So how does one go about with rack/sinatra a real after_filter. And by after_filter I mean after response is sent to the client. Or is that just not dooable without threads?
I forked sinatra and added after filters, but there is no way to flush the response, even send_data which is suppose to stream files(which is obviously for binary) waits for the after_filters.
I've seen this question: Multipart-response-in-ruby but the answer is for rails. And i am not sure if it really flushes the response to the client and then allows for processing afterwards.
Rack::Callbacks has some before and after callbacks but even those look like they would run before the response is ever sent to the client here's Rack::Callbacks implementation(added comment):
def call(env)
#before.each {|c| c.call(env) }
response = #app.call(env)
#after.each {|c| c.call(env) }
response
#i am guessing when this method returns then the response is sent to the client.
end
So i know I could call a background task through the shell with rake. But it would be nice not to have too... Also there is NeverBlock but is that good for executing a separate process without delaying the response or would it still make the app wait as whole(i think it would)?
I know this is a lot, but in short it's simple after_filter that really runs after the response is sent in ruby/sinatra/rack.
Thanks for reading or answering my question! :-)
Modified run_later port to rails to do the trick the file is available here:
http://github.com/pmamediagroup/sinatra_run_later/tree/master

Resources