Why zmq ( inproc:// )-connection's order matters, unlike for ( tcp:// )? - zeromq

When launching a zmq server and client, in any random order, communicating over the tcp:// transport-class, they are smart enough to connect/reconnect regardless of the order.
However, when trying to run the same over the inproc:// transport-class, I see that it works only if the client starts after the server. How can we avoid this?
MCVE-code :
Here are some kotlin MCVE-code examples, to reproduce the claim (this is a modified version of the well known weather example)
server.kt - run this to run the server standalone
package sandbox.zmq
import org.zeromq.ZMQ
import org.zeromq.ZMQ.Context
import sandbox.util.Util.sout
import java.util.*
fun main(args: Array<String>) {
server(
context = ZMQ.context(1),
// publishTo = "tcp://localhost:5556"
publishTo = "tcp://localhost:5557"
)
}
fun server(context: Context, publishTo: String) {
val publisher = context.socket(ZMQ.PUB)
publisher.bind(publishTo)
// Initialize random number generator
val srandom = Random(System.currentTimeMillis())
while (!Thread.currentThread().isInterrupted) {
// Get values that will fool the boss
val zipcode: Int
val temperature: Int
val relhumidity: Int
zipcode = 10000 + srandom.nextInt(10)
temperature = srandom.nextInt(215) - 80 + 1
relhumidity = srandom.nextInt(50) + 10 + 1
// Send message to all subscribers
val update = String.format("%05d %d %d", zipcode, temperature, relhumidity)
println("server >> $update")
publisher.send(update, 0)
Thread.sleep(500)
}
publisher.close()
context.term()
}
client.kt - run this for the client standalone
package sandbox.zmq
import org.zeromq.ZMQ
import org.zeromq.ZMQ.Context
import java.util.*
fun main(args: Array<String>) {
client(
context = ZMQ.context(1),
readFrom = "tcp://localhost:5557"
)
}
fun client(context: Context, readFrom: String) {
// Socket to talk to server
println("Collecting updates from weather server")
val subscriber = context.socket(ZMQ.SUB)
// subscriber.connect("tcp://localhost:");
subscriber.connect(readFrom)
// Subscribe to zipcode, default is NYC, 10001
subscriber.subscribe("".toByteArray())
// Process 100 updates
var update_nbr: Int
var total_temp: Long = 0
update_nbr = 0
while (update_nbr < 10000) {
// Use trim to remove the tailing '0' character
val string = subscriber.recvStr(0).trim { it <= ' ' }
println("client << $string")
val sscanf = StringTokenizer(string, " ")
val zipcode = Integer.valueOf(sscanf.nextToken())
val temperature = Integer.valueOf(sscanf.nextToken())
val relhumidity = Integer.valueOf(sscanf.nextToken())
total_temp += temperature.toLong()
update_nbr++
}
subscriber.close()
}
inproc.kt - run this and modify which sample is called for the inproc:// scenarios
package sandbox.zmq
import org.zeromq.ZMQ
import kotlin.concurrent.thread
fun main(args: Array<String>) {
// clientFirst()
clientLast()
}
fun println(string: String) {
System.out.println("${Thread.currentThread().name} : $string")
}
fun clientFirst() {
val context = ZMQ.context(1)
val client = thread {
client(
context = context,
readFrom = "inproc://backend"
)
}
// use this to maintain order
Thread.sleep(10)
val server = thread {
server(
context = context,
publishTo = "inproc://backend"
)
}
readLine()
client.interrupt()
server.interrupt()
}
fun clientLast() {
val context = ZMQ.context(1)
val server = thread {
server(
context = context,
publishTo = "inproc://backend"
)
}
// use this to maintain order
Thread.sleep(10)
val client = thread {
client(
context = context,
readFrom = "inproc://backend"
)
}
readLine()
client.interrupt()
server.interrupt()
}

Why zmq inproc:// connection order matters, unlike for tcp://?
Well, this is a by-design behaviour
Given the native ZeroMQ API warns about this by-design behaviour ( since ever ), the issue is not a problem, but an intended property.
Plus one additional property has to be also met:
The name [ meant an_endpoint_name in .connect("inproc://<_an_endpoint_name_>")] must have been previously created by assigning it to at least one socketwithin the same ØMQ context as the socket being connected.
Newer versions of the native ZeroMQ API ( post 4.0 ), if indeed deployed under one's respective language binding / wrapper, may allow to release the former of these requirements:
Since version 4.0 the order of zmq_bind() and zmq_connect() does not matter just like for the tcp transport type.
How can we avoid this?
Well, a much harder part ...
if not already got an easy way above the ZeroMQ native API v4.2+, one may roll up one's sleeves and either re-factor the pre-4.x language wrapper / binding, so as to make the engine get there, or, may be, test if Martin SUSTRIK's second lovely child, the nanomsg could fit the scene for achieving this.

Related

How to Read Files in Flink FlatMapFunction

I am building a Flink pipeline and based on live input data need to read records from archive files in a RichFlatMapFunction (e.g. each day I want to read files from the previous day and week). I'm wondering what is the best way to do that?
I could use the Hadoop APIs directly, so that is what I'm trying next.
That would be something like this:
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
class LoadHistory(
var basePath: String,
var pathTemplate: String,
) extends RichFlatMapFunction[(TypeAlias.GridId, TypeAlias.Timestamp), ArchiveRecord] {
// see
// https://programmerall.com/article/34422316834/
// https://stackoverflow.com/questions/37085528/hadoop-with-binary-files
// https://data-flair.training/blogs/hdfs-data-read-operation
val fileSystem = FileSystem.get(new conf.Configuration())
def formatPath(pathTemplate: String, gridId: TypeAlias.GridId, archiveDate: TypeAlias.Timestamp): String = ???
override def flatMap(value: (TypeAlias.GridId, TypeAlias.Timestamp), out: Collector[ArchiveRecord]): Unit = {
val pathStr = formatPath(pathTemplate, value._1, value._2)
val path = new Path(pathStr)
if (!fileSystem.exists(path)) {
return
}
val in: FSDataInputStream = fileSystem.open(path)
if (pathStr.endsWith(".protobuf")) {
// TODO read file
} else {
assert(pathStr.endsWith(".lz4"))
// TODO read file
}
}
}
I'm new with Hadoop, so I figure I'll need to configure it before reading data from cloud storage (e.g. replace new Configuration() with something meaningful). I know Flink uses Hadoop to read files internally, so I am wondering if I can access the configuration or configured HadoopFileSystem object being used by Flink at runtime.
Previously I tried starting a Flink batch job inside the FlatMapFunction (ending with env.collect), but it seems to have resulted in thread-locking (job 2 won't start until job 1 is done).
I dug into the Flink source code a little bit and found a way to get an initialized org.apache.flink.core.fs.FileSystem object from a org.apache.flink.core.fs.Path. Then that can be used to read the files:
import org.apache.flink.core.fs.Path;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FSDataInputStream;
class LoadHistory(
var basePath: String,
var pathTemplate: String,
) extends RichFlatMapFunction[(TypeAlias.GridId, TypeAlias.Timestamp), ArchiveRecord] {
val fileSystem = new Path(basePath).getFileSystem()
def formatPath(gridId: TypeAlias.GridId, archiveDate: TypeAlias.Timestamp): String = ???
override def flatMap(value: (TypeAlias.GridId, TypeAlias.Timestamp), out: Collector[ArchiveRecord]): Unit = {
val pathStr = formatPath(value._1, value._2)
val path = new Path(pathStr)
if (!fileSystem.exists(path)) {
return
}
val in: FSDataInputStream = fileSystem.open(path)
if (pathStr.endsWith(".protobuf")) {
// TODO read file
} else {
assert(pathStr.endsWith(".lz4"))
// TODO read file
}
}
}

Capturing ElasticsearchSink Exceptions in Flink

I've recently been encountering some issues that I've noticed in the logs
of my Flink job that handles writing to an Elasticsearch index. I was
hoping to leverage some of the metrics that Flink exposes (or piggyback on
them) to update metric counters when I encounter specific kinds of errors.
val builder = ElasticsearchSink.Builder(...)
builder.setFailureHandler { actionRequest, throwable, _, _ ->
// Log error here (and update metrics via metricGroup.counter(...)
}
return builder.build()
Currently, I don't have any "context" when the callback for the setFailureHandler occurs, and while I can log it, ideally I'd like to expose a metric to track how frequently this is occurring:
builder.setFailureHandler ( actionRequest, throwable, _, _ ->
elasticExceptionsCounter.inc()
}
One additional wrinkle here is that my specific scenario relies on dynamically creating and handling these sinks via a router like the following:
class DynamicElasticsearchSink<ElementT, RouteT, SinkT : ElasticsearchSinkBase<ElementT, out AutoCloseable>>(
private val sinkRouter: ElasticsearchSinkRouter<ElementT, RouteT, SinkT>
) : RichSinkFunction<ElementT>(), CheckpointedFunction {
// Store a reference to all of the current routes
private val sinkRoutes: MutableMap<RouteT, SinkT> = ConcurrentHashMap()
private lateinit var configuration: Configuration
override fun open(parameters: Configuration) {
configuration = parameters
}
override fun invoke(value: ElementT, context: SinkFunction.Context) {
val route = sinkRouter.getRoute(value)
var sink = sinkRoutes[route]
if (sink == null) {
// Build a new sink for this key and cache it for later use based on incoming records
sink = sinkRouter.createSink(route, value)
sink.runtimeContext = runtimeContext
sink.open(configuration)
sinkRoutes[route] = sink
}
sink.invoke(value, context)
}
// Omitted for brevity
}
and the sinkRouter.createSink() looks like the following:
override fun createSink(cacheKey: String, element: JsonObject): ElasticsearchSink<JsonObject> {
return buildSinkFromRoute(element)
}
private fun buildSinkFromRoute(element: JsonObject): ElasticsearchSink<JsonObject> {
val builder = ElasticsearchSink.Builder(
buildHostsFromElement(element),
ElasticsearchRoutingFunction()
)
// Various configuration omitted for brevity
builder.setFailureHandler { actionRequest, throwable, _, _ ->
// Here's where I'd like to capture the failures and record them as metrics
}
return builder.build()
}
Is there a way to support this currently or what options are available for handing this?

Springboot Kickstart GraphQL - Metrics for number of requests per resolver

I am currently trying in SpringBoot GraphQL kickstart to track the number of times each resolver method is called. To be more specific, I want to exactly how many times the methods of my GraphQLResolver<T> are called. This would have two utilities:
Track if the deprecated resolvers are still used
Know which fields are the most used, in order to optimize the database queries for those
To do so, I implemented a really weird and not-so-clean way using schema directive wiring.
#Component
class ResolverUsageCountInstrumentation(
private val meterRegistry: MeterRegistry
) : SchemaDirectiveWiring {
private val callsRecordingMap = ConcurrentHashMap<String, Int>()
override fun onField(environment: SchemaDirectiveWiringEnvironment<GraphQLFieldDefinition>): GraphQLFieldDefinition {
val fieldContainer = environment.fieldsContainer
val fieldDefinition = environment.fieldDefinition
val currentDF = environment.codeRegistry.getDataFetcher(fieldContainer, fieldDefinition)
if (currentDF.javaClass.name != "graphql.kickstart.tools.resolver.MethodFieldResolverDataFetcher") {
return fieldDefinition
}
val signature = getMethodSignature(unwrappedDF)
callsRecordingMap[signature] = 0
val newDF = DataFetcherFactories.wrapDataFetcher(currentDF) { dfe: DataFetchingEnvironment, value: Any? ->
callsRecordingMap.computeIfPresent(signature) { _, current: Int -> current + 1 }
value
}
environment.codeRegistry.dataFetcher(fieldContainer, fieldDefinition, newDF)
return fieldDefinition
}
private fun getMethodSignature(currentDF: DataFetcher<*>): String {
val method = getFieldVal(currentDF, "resolverMethod", true) as Method // nonapi.io.github.classgraph.utils.ReflectionUtils
return "${method.declaringClass.name}#${method.name}"
}
}
This technique does the work, but has the big disadvantage of not working if the data fetcher is wrapped. Along with that, it's not really clean at all. I'm wondering, would there be a better way to do this?
Thank you!

Kotlin Coroutines validate running on same Dispatcher

I have a custom Scope that is using a single thread as it's Dispatcher.
private val jsDispatcher = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
private val jsScope = CoroutineScope(jsDispatcher + SupervisorJob() + CoroutineName("JS-Thread"))
Let's assume I have a code block that uses the above scope to launch a new coroutine and call multiple suspend methods
jsScope.launch {
sampleMethod()
sampleMethod2()
sampleMethod3()
}
I need to validate and throw an exception if one of the above sample methods is not running on the above JS thread
private suspend fun sampleMethod() = coroutineScope {
//Implement me
validateThread()
}
How can this be enforced?
You can check the current thread name in your method:
private suspend fun sampleMethod() = coroutineScope {
assert(Thread.currentThread().name == "js-thread") // Doesn't work!
}
However, newSingleThreadExecutor uses DefaultThreadFactory which produces thread names like pool-N-thread-M which cannot really be validated because you don't know M or N. I see two solutions here:
Take advantage of the fact that you have a single thread and change its name as soon as you create the executor:
runBlocking {
jsScope.launch {
Thread.currentThread().name = "js-thread"
}
}
Pass a custom thread factory: Executors.newSingleThreadExecutor(MyThreadFactory("js-thread"))
private class MyThreadFactory(private val name: String) : ThreadFactory {
private val group: ThreadGroup
private val threadNumber = AtomicInteger(1)
init {
val s = System.getSecurityManager()
group = if (s != null) {
s.threadGroup
} else {
Thread.currentThread().threadGroup
}
}
override fun newThread(r: Runnable): Thread {
val t = Thread(group, r, "$name-${threadNumber.getAndIncrement()}", 0)
if (t.isDaemon) {
t.isDaemon = false
}
if (t.priority != Thread.NORM_PRIORITY) {
t.priority = Thread.NORM_PRIORITY
}
return t
}
}
Code was adapted from DefaultThreadFactory. Guava and apache-commons also provide utility methods to do the same. This has the advantage that it works for any thread pool, not just single-threaded.
After some research, I took a look at the withContext() implementation and the answer to my question was right there.
Taken from the withContext() implementation, this is how to check if current coroutine context is on same dispatcher as other context/scope
if (newContext[ContinuationInterceptor] === oldContext[ContinuationInterceptor]) {
// same dispatcher
}

Akka HTTP REST API for producing to Kafka Performance

I'm building an API with Akka that should produce to a Kafka bus. I have been load testing the application using Gatling. Noticed that when more than 1000 users are created in Gatling, the API starts to struggle. On average, about 170 requests per second are handled, which seems like very little to me.
The API's main entry point is this:
import akka.actor.{Props, ActorSystem}
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.pattern.ask
import akka.http.scaladsl.server.Directives
import akka.http.scaladsl.unmarshalling.Unmarshaller
import akka.stream.ActorMaterializer
import com.typesafe.config.{Config, ConfigFactory}
import play.api.libs.json.{JsObject, Json}
import scala.concurrent.{Future, ExecutionContext}
import akka.http.scaladsl.server.Directives._
import akka.util.Timeout
import scala.concurrent.duration._
import ExecutionContext.Implicits.global
case class PostMsg(msg:JsObject)
case object PostSuccess
case class PostFailure(msg:String)
class Msgapi(conf:Config) {
implicit val um:Unmarshaller[HttpEntity, JsObject] = {
Unmarshaller.byteStringUnmarshaller.mapWithCharset { (data, charset) =>
Json.parse(data.toArray).asInstanceOf[JsObject]
}
}
implicit val system = ActorSystem("MsgApi")
implicit val timeout = Timeout(5 seconds)
implicit val materializer = ActorMaterializer()
val router = system.actorOf(Props(new RouterActor(conf)))
val route = {
path("msg") {
post {
entity(as[JsObject]) {obj =>
if(!obj.keys.contains("key1") || !obj.keys.contains("key2") || !obj.keys.contains("key3")){
complete{
HttpResponse(status=StatusCodes.BadRequest, entity="Invalid json provided. Required fields: key1, key2, key3 \n")
}
} else {
onSuccess(router ? PostMsg(obj)){
case PostSuccess => {
complete{
Future{
HttpResponse(status = StatusCodes.OK, entity = "Post success")
}
}
}
case PostFailure(msg) =>{
complete{
Future{
HttpResponse(status = StatusCodes.InternalServerError, entity=msg)
}
}
}
case _ => {
complete{
Future{
HttpResponse(status = StatusCodes.InternalServerError, entity = "Unknown Server error occurred.")
}
}
}
}
}
}
}
}
}
def run():Unit = {
Http().bindAndHandle(route, interface = conf.getString("http.host"), port = conf.getInt("http.port"))
}
}
object RunMsgapi {
def main(Args: Array[String]):Unit = {
val conf = ConfigFactory.load()
val api = new Msgapi(conf)
api.run()
}
}
The router actor is as follows:
import akka.actor.{ActorSystem, Props, Actor}
import akka.http.scaladsl.server.RequestContext
import akka.routing.{Router, SmallestMailboxRoutingLogic, ActorRefRoutee}
import com.typesafe.config.Config
import play.api.libs.json.JsObject
class RouterActor(conf:Config) extends Actor{
val router = {
val routees = Vector.tabulate(conf.getInt("kafka.producer-number"))(n => {
val r = context.system.actorOf(Props(new KafkaProducerActor(conf, n )))
ActorRefRoutee(r)
})
Router(SmallestMailboxRoutingLogic(), routees)
}
def receive = {
case PostMsg(msg) => {
router.route(PostMsg(msg), sender())
}
}
}
And finally, the kafka producer actor:
import akka.actor.Actor
import java.util.Properties
import com.typesafe.config.Config
import kafka.message.NoCompressionCodec
import kafka.utils.Logging
import org.apache.kafka.clients.producer._
import play.api.libs.json.JsObject
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future, Await}
import ExecutionContext.Implicits.global
import scala.concurrent.{Future, Await}
import scala.util.{Failure, Success}
class KafkaProducerActor(conf:Config, id:Int) extends Actor with Logging {
var topic: String = conf.getString("kafka.topic")
val codec = NoCompressionCodec.codec
val props = new Properties()
props.put("bootstrap.servers", conf.getString("kafka.bootstrap-servers"))
props.put("acks", conf.getString("kafka.acks"))
props.put("retries", conf.getString("kafka.retries"))
props.put("batch.size", conf.getString("kafka.batch-size"))
props.put("linger.ms", conf.getString("kafka.linger-ms"))
props.put("buffer.memory", conf.getString("kafka.buffer-memory"))
props.put("key.serializer", conf.getString("kafka.key-serializer"))
props.put("value.serializer", conf.getString("kafka.value-serializer"))
val producer = new KafkaProducer[String, String](props)
def receive = {
case PostMsg(msg) => {
// push the msg to Kafka
try{
val res = Future{
producer.send(new ProducerRecord[String, String](topic, msg.toString()))
}
val result = Await.result(res, 1 second).get()
sender ! PostSuccess
} catch{
case e: Exception => {
println(e.printStackTrace())
sender ! PostFailure("Kafka push error")
}
}
}
}
}
The idea being that in application.conf I can easily specify how many producers there should be, allowing better horizontal scaling.
Now, however, it seems that the api or router is actually the bottleneck. As a test, I disabled the Kafka producing code, and replaced it with a simple: sender ! PostSuccess. With 3000 users in Gatling, I still had 6% of requests failing due to timeouts, which seems like a very long time to me.
The Gatling test I am executing is the following:
import io.gatling.core.Predef._ // 2
import io.gatling.http.Predef._
import scala.concurrent.duration._
class BasicSimulation extends Simulation { // 3
val httpConf = http // 4
.baseURL("http://localhost:8080") // 5
.acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8") // 6
.doNotTrackHeader("1")
.acceptLanguageHeader("en-US,en;q=0.5")
.acceptEncodingHeader("gzip, deflate")
.userAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0")
.header("Content-Type", "application/json")
val scn = scenario("MsgLoadTest")
.repeat(100)(
pace(2 seconds)
.exec(http("request_1")
.post("/msg").body(StringBody("""{ "key1":"something", "key2": "somethingElse", "key3":2222}""")).asJSON)
)
setUp( // 11
scn.inject(rampUsers(3000) over (5 seconds)) // 12
).protocols(httpConf) // 13
}
update
Following some pointers from cmbaxter, I tried some things (see discussion in comments), and profiled the application using visualvm during the gatling load test. I don't quite know how to interpret these results though. It seems that a lot of time is spent in the ThreadPoolExecutor, but this might be ok?
Two screenshots from the profiling are below:
To exclude the Kafka producer, I removed the logic from the Actor. I was still getting performance issues. So, as a final test, I reworked the API to simply give a direct answer when a POST came in:
val route = {
path("msg") {
post {
entity(as[String]) { obj =>
complete(
HttpResponse(status = StatusCodes.OK, entity = "OK")
)
}
}
}
}
and I implemented the same route in Spray, to compare performance. The results were clear. Akka HTTP (at least in this current test setup) does not come close to Spray's performance. Perhaps there is some tweaking that can be done for Akka HTTP? I have attached two screenshots of response time graphs for 3000 concurrent users in Gatling, making a post request.
Akka HTTP
Spray
I would eliminate the KafkaProducerActor and router completely and call a Scala wrapped version of producer.send directly. Why create a possible bottleneck if not necessary? I could very well imagine the global execution context or the actor system becoming a bottleneck in your current setup.
Something like this should do the trick:
class KafkaScalaProducer(val producer : KafkaProducer[String, String](props)) {
def send(topic: String, msg : String) : Future[RecordMetadata] = {
val promise = Promise[RecordMetadata]()
try {
producer.send(new ProducerRecord[String, String](topic, msg), new Callback {
override def onCompletion(md : RecordMetadata, e : java.lang.Exception) {
if (md == null) promise.success(md)
else promise.failure(e)
}
})
} catch {
case e : BufferExhaustedException => promise.failure(e)
case e : KafkaException => promise.failure(e)
}
promise.future
}
def close = producer.close
}
(note: I have not actually tried this code. It should be interpreted as pseudo-code)
I would then simply transform the result of the future to a HttpResponse.
After that it's a question of tweaking configuration. Your bottleneck is now either the Kafka Producer or Akka Http.

Resources