Capturing ElasticsearchSink Exceptions in Flink

I've recently noticed some errors in the logs of my Flink job that handles
writing to an Elasticsearch index. I was hoping to leverage some of the
metrics that Flink exposes (or piggyback on them) to update metric counters
when I encounter specific kinds of errors:
val builder = ElasticsearchSink.Builder(...)
builder.setFailureHandler { actionRequest, throwable, _, _ ->
    // Log error here (and update metrics via metricGroup.counter(...))
}
return builder.build()
Currently, I don't have any "context" when the setFailureHandler callback occurs, and while I can log it, ideally I'd like to expose a metric to track how frequently this is occurring:
builder.setFailureHandler { actionRequest, throwable, _, _ ->
    elasticExceptionsCounter.inc()
}
One additional wrinkle here is that my specific scenario relies on dynamically creating and handling these sinks via a router like the following:
class DynamicElasticsearchSink<ElementT, RouteT, SinkT : ElasticsearchSinkBase<ElementT, out AutoCloseable>>(
    private val sinkRouter: ElasticsearchSinkRouter<ElementT, RouteT, SinkT>
) : RichSinkFunction<ElementT>(), CheckpointedFunction {
    // Store a reference to all of the current routes
    private val sinkRoutes: MutableMap<RouteT, SinkT> = ConcurrentHashMap()
    private lateinit var configuration: Configuration

    override fun open(parameters: Configuration) {
        configuration = parameters
    }

    override fun invoke(value: ElementT, context: SinkFunction.Context) {
        val route = sinkRouter.getRoute(value)
        var sink = sinkRoutes[route]
        if (sink == null) {
            // Build a new sink for this key and cache it for later use based on incoming records
            sink = sinkRouter.createSink(route, value)
            sink.runtimeContext = runtimeContext
            sink.open(configuration)
            sinkRoutes[route] = sink
        }
        sink.invoke(value, context)
    }

    // Omitted for brevity
}
and the sinkRouter.createSink() looks like the following:
override fun createSink(cacheKey: String, element: JsonObject): ElasticsearchSink<JsonObject> {
    return buildSinkFromRoute(element)
}

private fun buildSinkFromRoute(element: JsonObject): ElasticsearchSink<JsonObject> {
    val builder = ElasticsearchSink.Builder(
        buildHostsFromElement(element),
        ElasticsearchRoutingFunction()
    )

    // Various configuration omitted for brevity

    builder.setFailureHandler { actionRequest, throwable, _, _ ->
        // Here's where I'd like to capture the failures and record them as metrics
    }

    return builder.build()
}
Is there a way to support this currently, or what options are available for handling this?
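For what it's worth, one direction I've been sketching (untested, and the counter name and the extra router parameter below are placeholders): since the dynamic wrapper is a RichSinkFunction, a counter could be registered from its RuntimeContext in open() and handed to the router so each generated failure handler can close over it:
// Untested sketch, in DynamicElasticsearchSink: register the counter in open()
// and pass it to the router when building each sink.
private lateinit var elasticExceptionsCounter: Counter

override fun open(parameters: Configuration) {
    configuration = parameters
    // "elasticExceptions" is a placeholder metric name
    elasticExceptionsCounter = runtimeContext.metricGroup.counter("elasticExceptions")
}

// Hypothetical router change: accept the counter so the failure handler can capture it
override fun createSink(cacheKey: String, element: JsonObject, counter: Counter): ElasticsearchSink<JsonObject> {
    val builder = ElasticsearchSink.Builder(
        buildHostsFromElement(element),
        ElasticsearchRoutingFunction()
    )
    builder.setFailureHandler { actionRequest, throwable, _, _ ->
        counter.inc()
    }
    return builder.build()
}
I haven't verified whether the failure handler's serializability requirements are happy with a captured Counter, so I'm treating this as a starting point rather than a working answer.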

Related

How to add arguments to an org.gradle.api.Action?

I'm writing a custom Gradle plugin that registers an extension and uses Actions for configuring it so you can use it like
myExtension {
    info {
        someConfig("a") {
            // config for someConfig["a"]
        }
        someConfig("b") {
            // config for someConfig["b"]
        }
    }
}
Based on how many someConfig blocks you define, the plugin creates new Gradle tasks dynamically. I've seen some scenarios in which it'd be useful to configure all someConfig blocks with the same values, so instead of writing
myExtension {
    info {
        someConfig("a") {
            someValue = x
        }
        someConfig("b") {
            someValue = x
        }
    }
}
I was thinking of providing something like
myExtension {
    info {
        someConfig("a")
        someConfig("b")
        allConfigs { configName ->
            someValue = x
        }
    }
}
When trying to code it, I've used an objectFactory to create the SomeConfig instances so the Action can configure the extension values:
val allMyConfigs = mutableMapOf<String, SomeConfig>()

fun someConfig(configName: String, init: Action<in SomeConfig>) {
    val myConfig = allMyConfigs.computeIfAbsent(configName) {
        objectFactory.newInstance(SomeConfig::class.java, configName)
    }
    init.execute(myConfig)
}
but since Action is effectively a (SomeConfig) -> Unit (i.e. a function receiving a SomeConfig instance), I'm unable to pass the config name (a String) as a parameter.
Is there a way to create an Action that receives an extra parameter and keeps the SomeConfig as its delegate, so it stays compatible with both build.gradle and build.gradle.kts files?
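One workaround I've been considering (just a sketch; the name property and the allConfigs function below are hypothetical) is to make the config name part of SomeConfig itself rather than an extra Action parameter, so a plain Action<SomeConfig> can still read it:
import javax.inject.Inject
import org.gradle.api.Action

// Sketch: expose the name on SomeConfig so any Action<SomeConfig> can read it.
abstract class SomeConfig @Inject constructor(val name: String) {
    var someValue: String? = null
}

// Hypothetical allConfigs: apply the same Action to every config registered so far.
fun allConfigs(action: Action<in SomeConfig>) {
    allMyConfigs.values.forEach(action::execute)
}
From both build.gradle and build.gradle.kts the delegate stays SomeConfig, and the name is just a property on it; if someConfig entries can be declared after allConfigs, the action could also be stored and replayed inside someConfig().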

Springboot Kickstart GraphQL - Metrics for number of requests per resolver

I am currently trying, in Spring Boot GraphQL Kickstart, to track the number of times each resolver method is called. To be more specific, I want to know exactly how many times the methods of my GraphQLResolver<T> are called. This would have two uses:
Track if the deprecated resolvers are still used
Know which fields are the most used, in order to optimize the database queries for those
To do so, I implemented a really weird and not-so-clean way using schema directive wiring.
@Component
class ResolverUsageCountInstrumentation(
    private val meterRegistry: MeterRegistry
) : SchemaDirectiveWiring {
    private val callsRecordingMap = ConcurrentHashMap<String, Int>()

    override fun onField(environment: SchemaDirectiveWiringEnvironment<GraphQLFieldDefinition>): GraphQLFieldDefinition {
        val fieldContainer = environment.fieldsContainer
        val fieldDefinition = environment.fieldDefinition
        val currentDF = environment.codeRegistry.getDataFetcher(fieldContainer, fieldDefinition)
        if (currentDF.javaClass.name != "graphql.kickstart.tools.resolver.MethodFieldResolverDataFetcher") {
            return fieldDefinition
        }
        val signature = getMethodSignature(currentDF)
        callsRecordingMap[signature] = 0
        val newDF = DataFetcherFactories.wrapDataFetcher(currentDF) { dfe: DataFetchingEnvironment, value: Any? ->
            callsRecordingMap.computeIfPresent(signature) { _, current: Int -> current + 1 }
            value
        }
        environment.codeRegistry.dataFetcher(fieldContainer, fieldDefinition, newDF)
        return fieldDefinition
    }

    private fun getMethodSignature(currentDF: DataFetcher<*>): String {
        val method = getFieldVal(currentDF, "resolverMethod", true) as Method // nonapi.io.github.classgraph.utils.ReflectionUtils
        return "${method.declaringClass.name}#${method.name}"
    }
}
This technique does the job, but it has the big disadvantage of not working if the data fetcher is wrapped. On top of that, it's not really clean at all. I'm wondering, would there be a better way to do this?
Thank you!
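For what it's worth, since a MeterRegistry is already injected here, one small improvement (a sketch; the metric name graphql.resolver.calls is a placeholder) would be to publish the counts as tagged Micrometer counters instead of keeping them in the map:
// Sketch: increment a tagged Micrometer counter instead of a ConcurrentHashMap entry.
val newDF = DataFetcherFactories.wrapDataFetcher(currentDF) { _: DataFetchingEnvironment, value: Any? ->
    meterRegistry.counter("graphql.resolver.calls", "method", signature).increment()
    value
}
That way the numbers show up wherever the registry is exported (for example, the actuator metrics endpoint) rather than living only inside the instrumentation class.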

Kotlin Coroutines validate running on same Dispatcher

I have a custom scope that is using a single thread as its dispatcher.
private val jsDispatcher = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
private val jsScope = CoroutineScope(jsDispatcher + SupervisorJob() + CoroutineName("JS-Thread"))
Let's assume I have a code block that uses the above scope to launch a new coroutine and call multiple suspend methods
jsScope.launch {
    sampleMethod()
    sampleMethod2()
    sampleMethod3()
}
I need to validate and throw an exception if one of the above sample methods is not running on the above JS thread
private suspend fun sampleMethod() = coroutineScope {
    //Implement me
    validateThread()
}
How can this be enforced?
You can check the current thread name in your method:
private suspend fun sampleMethod() = coroutineScope {
    assert(Thread.currentThread().name == "js-thread") // Doesn't work!
}
However, newSingleThreadExecutor uses DefaultThreadFactory, which produces thread names like pool-N-thread-M that cannot really be validated because you don't know M or N. I see two solutions here:
Take advantage of the fact that you have a single thread and change its name as soon as you create the executor:
runBlocking {
    jsScope.launch {
        Thread.currentThread().name = "js-thread"
    }
}
Pass a custom thread factory: Executors.newSingleThreadExecutor(MyThreadFactory("js-thread"))
private class MyThreadFactory(private val name: String) : ThreadFactory {
    private val group: ThreadGroup
    private val threadNumber = AtomicInteger(1)

    init {
        val s = System.getSecurityManager()
        group = if (s != null) {
            s.threadGroup
        } else {
            Thread.currentThread().threadGroup
        }
    }

    override fun newThread(r: Runnable): Thread {
        val t = Thread(group, r, "$name-${threadNumber.getAndIncrement()}", 0)
        if (t.isDaemon) {
            t.isDaemon = false
        }
        if (t.priority != Thread.NORM_PRIORITY) {
            t.priority = Thread.NORM_PRIORITY
        }
        return t
    }
}
The code was adapted from DefaultThreadFactory. Guava and Apache Commons also provide utility methods to do the same. This has the advantage that it works for any thread pool, not just a single-threaded one.
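For example, with Guava's ThreadFactoryBuilder (a small sketch; the name pattern is arbitrary):
import com.google.common.util.concurrent.ThreadFactoryBuilder
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher

// The single pool thread gets a predictable name ("js-thread-0").
val jsDispatcher = Executors.newSingleThreadExecutor(
    ThreadFactoryBuilder().setNameFormat("js-thread-%d").build()
).asCoroutineDispatcher()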
After some research, I took a look at the withContext() implementation and the answer to my question was right there.
Taken from the withContext() implementation, this is how to check whether the current coroutine context is on the same dispatcher as another context/scope:
if (newContext[ContinuationInterceptor] === oldContext[ContinuationInterceptor]) {
    // same dispatcher
}
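Applied to the original question, a minimal validateThread() along those lines could look like this (a sketch; it assumes the jsDispatcher value is visible from the function):
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.coroutineContext

// Sketch: compare the current coroutine's interceptor (its dispatcher)
// against the expected dispatcher and fail fast if they differ.
private suspend fun validateThread() {
    val current = coroutineContext[ContinuationInterceptor]
    check(current === jsDispatcher) {
        "Expected to run on the JS dispatcher but was on $current"
    }
}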

How to create a new chat room with Play Framework WebSocket?

I tried the chat example with WebSocket in Play Framework 2.6.x. It works fine. Now, for the real application, I need to create multiple chat rooms based on user requests, and users will be able to access different chat rooms with an id or something. I think it is related to creating a new flow for each room. The related code is here:
private val (chatSink, chatSource) = {
  val source = MergeHub.source[WSMessage]
    .log("source")
    .map { msg =>
      try {
        val json = Json.parse(msg)
        inputSanitizer.sanText((json \ "msg").as[String])
      } catch {
        case e: Exception =>
          println(">>" + msg)
          "Malfunction client"
      }
    }
    .recoverWithRetries(-1, { case _: Exception => Source.empty })
  val sink = BroadcastHub.sink[WSMessage]
  source.toMat(sink)(Keep.both).run()
}

private val userFlow: Flow[WSMessage, WSMessage, _] = {
  Flow.fromSinkAndSource(chatSink, chatSource)
}
But I really don't know how to create a new flow with an id and access it later. Can anyone help me with this?
I finally figured it out. Posting the solution here in case anyone has similar problems.
My solution is to use the AsyncCacheApi to store Flows in the cache with keys, and to generate a new Flow when necessary instead of creating just one Sink and Source:
val chatRoom = cache.get[Flow[WSMessage, WSMessage, _]](s"id=$id")
chatRoom.map { room =>
  val flow = if (room.nonEmpty) room.get else createNewFlow
  cache.set(s"id=$id", flow)
  Right(flow)
}
def createNewFlow: Flow[WSMessage, WSMessage, _] = {
  val (chatSink, chatSource) = {
    val source = MergeHub.source[WSMessage]
      .map { msg =>
        try {
          inputSanitizer.sanitize(msg)
        } catch {
          case e: Exception =>
            println(">>" + msg)
            "Malfunction client"
        }
      }
      .recoverWithRetries(-1, { case _: Exception => Source.empty })
    val sink = BroadcastHub.sink[WSMessage]
    source.toMat(sink)(Keep.both).run()
  }
  Flow.fromSinkAndSource(chatSink, chatSource)
}

In Spring Webflux how to go from an `OutputStream` to a `Flux<DataBuffer>`?

I'm building a tarball dynamically, and would like to stream it back directly, which should be 100% possible with a .tar.gz.
The code below is the closest thing I could get to a DataBuffer after lots of googling. Basically, I need something that implements an OutputStream and provides, or publishes, to a Flux<DataBuffer> so that I can return that from my method and have streaming output, instead of buffering the entire tarball in RAM (which I'm pretty sure is what is happening here). I'm using Apache Commons Compress, which has a wonderful API, but it's all OutputStream based.
I suppose another way to do it would be to directly write to the response, but I don't think that would be properly reactive? Not sure how to get an OutputStream out of some sort of Response object either.
This is Kotlin, by the way, on Spring Boot 2.0:
@GetMapping("/cookbook.tar.gz", "/cookbook")
fun getCookbook(): Mono<DefaultDataBuffer> {
    log.info("Creating tarball of cookbooks: ${soloConfig.cookbookPaths}")

    val transformation = Mono.just(soloConfig.cookbookPaths.stream()
        .toList()
        .flatMap {
            Files.walk(Paths.get(it)).map(Path::toFile).toList()
        })
        .map { files ->
            //Will make one giant databuffer... but oh well? TODO: maybe use some kind of chunking.
            val buffer = DefaultDataBufferFactory().allocateBuffer()
            val outputBufferStream = buffer.asOutputStream()

            //Transform my list of stuff into an archiveOutputStream
            TarArchiveOutputStream(GzipCompressorOutputStream(outputBufferStream)).use { taos ->
                taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU)
                log.info("files to compress: ${files}")
                for (file in files) {
                    if (file.isFile) {
                        val entry = "cookbooks/" + file.name
                        log.info("Adding ${entry} to tarball")
                        taos.putArchiveEntry(TarArchiveEntry(file, entry))
                        FileInputStream(file).use { fis ->
                            fis.copyTo(taos) //Copy that stuff!
                        }
                        taos.closeArchiveEntry()
                    }
                }
            }
            buffer
        }
    return transformation
}
I puzzled through this and have an effective solution: you implement an OutputStream, take those bytes, and publish them into a stream. Be sure to override close() and send an onComplete. Works great!
@RestController
class SoloController(
    val soloConfig: SoloConfig
) {
    val log = KotlinLogging.logger { }

    @GetMapping("/cookbooks.tar.gz", "/cookbooks")
    fun streamCookbook(serverHttpResponse: ServerHttpResponse): Flux<DataBuffer> {
        log.info("Creating tarball of cookbooks: ${soloConfig.cookbookPaths}")
        val publishingOutputStream = PublishingOutputStream(serverHttpResponse.bufferFactory())

        //Needs to set up cookbook path as a parent directory, and then do `cookbooks/$cookbook_path/<all files>` for each cookbook path given
        Flux.just(soloConfig.cookbookPaths.stream().toList())
            .doOnNext { paths ->
                //Transform my list of stuff into an archiveOutputStream
                TarArchiveOutputStream(GzipCompressorOutputStream(publishingOutputStream)).use { taos ->
                    taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU)
                    paths.forEach { cookbookDir ->
                        if (Paths.get(cookbookDir).toFile().isDirectory) {
                            val cookbookDirFile = Paths.get(cookbookDir).toFile()
                            val directoryName = cookbookDirFile.name
                            val entryStart = "cookbooks/${directoryName}"
                            val files = Files.walk(cookbookDirFile.toPath()).map(Path::toFile).toList()
                            log.info("${files.size} files to compress")
                            for (file in files) {
                                if (file.isFile) {
                                    val relativePath = file.toRelativeString(cookbookDirFile)
                                    val entry = "$entryStart/$relativePath"
                                    taos.putArchiveEntry(TarArchiveEntry(file, entry))
                                    FileInputStream(file).use { fis ->
                                        fis.copyTo(taos) //Copy that stuff!
                                    }
                                    taos.closeArchiveEntry()
                                }
                            }
                        }
                    }
                }
            }
            .subscribeOn(Schedulers.parallel())
            .doOnComplete {
                publishingOutputStream.close()
            }
            .subscribe()

        return publishingOutputStream.publisher
    }

    class PublishingOutputStream(bufferFactory: DataBufferFactory) : OutputStream() {
        val publisher: UnicastProcessor<DataBuffer> = UnicastProcessor.create(Queues.unbounded<DataBuffer>().get())
        private val bufferPublisher: UnicastProcessor<Byte> = UnicastProcessor.create(Queues.unbounded<Byte>().get())

        init {
            bufferPublisher
                .bufferTimeout(4096, Duration.ofMillis(100))
                .doOnNext { intList ->
                    val buffer = bufferFactory.allocateBuffer(intList.size)
                    buffer.write(intList.toByteArray())
                    publisher.onNext(buffer)
                }
                .doOnComplete {
                    publisher.onComplete()
                }
                .subscribeOn(Schedulers.newSingle("publisherThread"))
                .subscribe()
        }

        override fun write(b: Int) {
            bufferPublisher.onNext(b.toByte())
        }

        override fun close() {
            bufferPublisher.onComplete() //which should trigger the clean up of the whole thing
        }
    }
}
