kafka-connect-bigquery: Regex based syntax in "topics" - apache-kafka-connect

I use the kafka-connect-bigquery connector.
Is it possible to use a regular expression in "topics"?
For example, I have two topics:
mysql.database.table1
mysql.database.table2
and I want to add both to the connector at once:
"topics": "mysql.database.*"
Thanks

You can whitelist Kafka topics by regex by replacing the topics property with topics.regex.
Example:
"topics.regex":"mysql.database.*"

Related

Multiline values in application.yml for spring.cloud.function.definition

We have a Spring Boot project using the deprecated @StreamListener, and now we are switching to Spring Cloud Stream functional Kafka binders.
The problem is that this service connects to multiple Kafka topics, and our single-line spring.cloud.function.definition: topicA;topicB;topicC;...;topicN is becoming very long.
I would like to know how to use YAML capabilities such as multi-line values (the | or > operators) in Spring's application.yml, but I haven't found anything similar in the documentation.
My goal would be something such as
spring.cloud.function.definition: |
  topicA;
  topicB;
  topicC;
  ...;
  topicN
Thanks
There are multiple ways to represent multi-line values in YAML. Details can be found in How do I break a string in YAML over multiple lines?, and most of them are supported by Spring and can be used in application.yml.
Using multiline for function definitions in Spring Cloud Stream
All of the above approaches work in Spring, but in the end it depends on how the resulting value is used. In some cases the newline character is preserved, and in other cases it is replaced with a space.
From what I can see, spring.cloud.function.definition is parsed in org.springframework.cloud.stream.function.FunctionConfiguration, and the logic doesn't expect any whitespace.
The only approach that works in this case is a double-quoted scalar with escaped newlines:
spring.cloud.function.definition: "topicA;topicB;topicC;\
...;topicN"
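In nested application.yml form, the same escaped-newline scalar would look something like this (topic names are placeholders; the trailing backslash removes the line break, and the indentation of the continuation lines is stripped by YAML's double-quote folding rules):

```yaml
spring:
  cloud:
    function:
      definition: "topicA;\
        topicB;\
        topicC"
```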

Grok filters for the below log message

I am working on writing a pattern for this specific line through a grok filter:
"NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down"
Can someone help me with this, please?
This is the regex I came up with:
[A-Z]\w+[-][A-Z][a-z]\w+[]\w+[a-zA-Z][-]\w+[a-zA-Z0-9][.][A-Za-z]\w+[.][A-Za-z]\w+[.][a-z]\w+/[0-9]\d+[][A-Za-z]\w+
but I want it to work for "NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down" as well as without the chn in the hostname:
NOTIFICATION-Interface_IF-asdasdsf01.asdfasp.com/1074_Down
Thanks in advance.
Which data would you like to capture?
Using this kind of pattern, it is possible to capture multiple fields:
%{WORD:type}-%{WORD:interface}-%{GREEDYDATA:client}
This will produce:
type = "NOTIFICATION"
interface = "Interface_IF"
client = "asdasdsf01.chn.asdfasp.com/1074_Down"
Don't hesitate to test and refine the pattern using the grok debugger. See here the native patterns from Logstash.
PS: your first regex can be simplified: ([A-Z]+)\-([A-Za-z]+)_([A-Za-z]+)\-(.*)\/([0-9]+)_([A-Za-z]+). Depending on your delimiters, it can be made even simpler.
You can test it using this link.
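As a quick sanity check, the simplified regex can be tried against both variants of the sample line using plain Python re (outside of grok; the backslash escapes before - and / are unnecessary here):

```python
import re

# Simplified pattern from the answer above
pattern = re.compile(r"([A-Z]+)-([A-Za-z]+)_([A-Za-z]+)-(.*)/([0-9]+)_([A-Za-z]+)")

samples = [
    "NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down",
    "NOTIFICATION-Interface_IF-asdasdsf01.asdfasp.com/1074_Down",  # no "chn"
]

for line in samples:
    m = pattern.fullmatch(line)
    # e.g. ('NOTIFICATION', 'Interface', 'IF', 'asdasdsf01.chn.asdfasp.com', '1074', 'Down')
    print(m.groups())
```

Because the (.*) group is greedy and there is a single /, the hostname is captured whole whether or not it contains the chn label.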

Logstash grok pattern field not appearing in Kibana

I have recently been investigating ELK as a potential logging/monitoring solution. I have the stack set up and working, and I am starting to filter logs via grok.
Is it possible to have a specific part of your grok pattern appear as a field in Kibana?
For example, take the following pattern:
SAMSLOG %{HOUR}:%{MINUTE}:%{SECOND} \[%{USERNAME:user}\] - %{JAVALOGMESSAGE}
I was hoping (and from what I have read) that "user" would become an available field in Kibana that I am able to search/filter the results on. Have I completely misunderstood, or am I missing a vital link in the chain?
Full Grok pattern:
multiline {
  patterns_dir => "/home/samuel/logstash/grok.patterns"
  pattern => "(^%{SAMSLOG})"
  negate => true
  what => "previous"
}
Thank you,
Sam
Yes, the whole "magic" of logstash is to take the unstructured data and make structured fields from it. So, your basic premise is correct.
What you're missing is that multiline{} is a filter that is used to combine several input lines into one event; that's basically all it does. The "pattern" field there is used to identify when a new line should be started.
To make fields out of an event, you would need to use the grok{} filter.
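A sketch of how a grok{} filter might sit alongside the existing multiline filter (the "message" field name is Logstash's default, and the SAMSLOG pattern file is assumed from the question):

```
filter {
  multiline {
    patterns_dir => "/home/samuel/logstash/grok.patterns"
    pattern => "(^%{SAMSLOG})"
    negate => true
    what => "previous"
  }
  grok {
    patterns_dir => "/home/samuel/logstash/grok.patterns"
    match => { "message" => "%{SAMSLOG}" }
  }
}
```

With the grok{} filter in place, the %{USERNAME:user} capture in SAMSLOG becomes a "user" field on each event, which is then searchable in Kibana.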

how to implement complex pattern matching in Spring batch using PatternMatchingCompositeLineMapper

How can we implement pattern matching in Spring Batch? I am using org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper.
I learned that I can only use ? or * here to create my pattern.
My requirement is as follows:
I have a fixed-length record file, and in each record the two characters at the 35th and 36th positions give the record type.
For example, in the record below, "05" is the record type at the 35th and 36th positions, and the total length of the record is 400.
0000001131444444444444445589868444050MarketsABNAKKAAAAKKKA05568551456...........
I tried to write a regular expression, but it does not work; only two special characters can be used, * and ?.
In that case I can only write something like this:
??????????????????????????????????05?????????????..................
but it does not seem to be a good solution.
Please suggest how I can write this solution. Thanks a lot in advance for your help.
The PatternMatchingCompositeLineMapper uses an instance of org.springframework.batch.support.PatternMatcher to do the matching. It's important to note that PatternMatcher does not use true regular expressions. It uses something closer to ant patterns (the code is actually lifted from AntPathMatcher in Spring Core).
That being said, you have two options:
1. Use a pattern like the one you are referring to (since there is no shorthand way to specify the number of ? characters to check, as there is in regular expressions).
2. Create your own composite LineMapper implementation that uses regular expressions to do the mapping.
For the record, if you choose option 2, contributing it back would be appreciated!
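To illustrate the core idea of option 2 in language-neutral terms, here is a sketch (in Python, not Spring Batch) of dispatching on the record type at the 35th-36th positions via a regex-keyed handler table; the handler names are hypothetical:

```python
import re

# Record type sits at the 35th-36th positions (0-based slice [34:36])
RECORD_TYPE_SLICE = slice(34, 36)

# Hypothetical regex-to-handler table, standing in for PatternMatcher's ?/* patterns
handlers = {
    re.compile(r"05"): lambda rec: ("type05", rec),
    re.compile(r"10"): lambda rec: ("type10", rec),
}

def map_line(record: str):
    """Pick the first handler whose regex matches the record type."""
    record_type = record[RECORD_TYPE_SLICE]
    for pattern, handler in handlers.items():
        if pattern.fullmatch(record_type):
            return handler(record)
    raise ValueError(f"no mapper for record type {record_type!r}")

sample = "0000001131444444444444445589868444050Markets"
print(map_line(sample)[0])  # the "05" handler is selected
```

A custom composite LineMapper in Java would do the same thing: extract the fixed substring, match it against regex keys, and delegate to the corresponding FieldSetMapper.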

Yahoo Pipes: filter items in a feed based on words in a text file

I have a pipe that filters an RSS feed and removes any item that contains "stopwords" that I've chosen. Currently I've manually created a filter for each stopword in the pipe editor, but the more logical way is to read these from a file. I've figured out how to read the stopwords out of the text file, but how do I apply the filter operator to the feed, once for every stopword?
The documentation states explicitly that operators can't be applied within the loop construct, but hopefully I'm missing something here.
You're not missing anything - the filter operator can't go in a loop.
Your best bet might be to generate a regex out of the stopwords and filter using that, e.g. generate a string like (word1|word2|word3|...|wordN).
You may have to escape any special characters. Also, I'm not sure how long a regex can be, so you might have to chunk it over multiple filter rules.
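The suggested approach can be sketched outside Pipes with Python's re (the stopword list is illustrative):

```python
import re

stopwords = ["word1", "word2", "c++"]  # "c++" shows why escaping matters

# Escape each stopword, then join them into a single alternation
pattern = re.compile("(" + "|".join(re.escape(w) for w in stopwords) + ")")

titles = ["an item about word2", "harmless item"]
kept = [t for t in titles if not pattern.search(t)]
print(kept)  # only items matching no stopword survive
```

re.escape handles the "odd characters" concern: without it, a stopword like c++ would be an invalid (or wrong) regex fragment.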
In addition to Gavin Brock's answer, the following Yahoo Pipe filters the feed items (title, description, link and author) according to multiple stopwords:
Pipes Info
Pipes Edit
Pipes Demo
Inputs
_render=rss
feed=http://example.com/feed.rss
stopwords=word1-word2-word3