How can I implement pattern matching in Spring Batch? I am using org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper.
I learned that only ? and * can be used to build its patterns.
My requirement is as follows:
I have a fixed-length record file, and each record has a two-character record type at positions 35 and 36.
For example, in the record below the record type "05" is at positions 35 and 36, and the total record length is 400.
0000001131444444444444445589868444050MarketsABNAKKAAAAKKKA05568551456...........
I tried to write a regular expression, but it does not work; I learned that only two special characters can be used, * and ?.
In that case, the best I can write is something like this:
??????????????????????????????????05?????????????..................
but that does not seem like a good solution.
Please suggest how I can solve this. Thanks in advance for your help.
The PatternMatchingCompositeLineMapper uses an instance of org.springframework.batch.support.PatternMatcher to do the matching. It's important to note that PatternMatcher does not use true regular expressions. It uses something closer to Ant-style patterns (the code is actually lifted from AntPathMatcher in Spring Core).
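That being said, you have two options:
Use a pattern like the one you describe. There is no shorthand for repeating ? a fixed number of times (as {34} does in regular expressions), so you need 34 ? characters followed by 05. Because * matches any run of characters, a single trailing * can cover the rest of the record instead of spelling it out with more ? characters.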
Create your own composite LineMapper implementation that uses regular expressions to do the mapping.
For the record, if you choose option 2, contributing it back would be appreciated!
My ASG (Auto Scaling group) names are created with this name format:
"digital-microservice-app1-20220627062026999600000001"
"digital-microservice-app2-20220627062026999600000001"
"digital-microservice-app3-20220627062026999600000001"
How can I search for all ASG names starting with "digital-microservice-" using Ruby?
Is it possible to search ASG names based on a string?
Use Anchored Regular Expressions When You Need Post-Processing of Matches
String methods are generally faster than comparable regular expressions, so they are the better choice for a simple prefix check. However, regular expressions offer capabilities that would otherwise take many additional parsing and transformation steps if you ever need to match something more complex than a simple prefix, or to capture parts of the match for further processing. For example, the following Regexp does the same thing as String#start_with?:
asg_names = [
"digital-microservice-app1-20220627062026999600000001",
"digital-microservice-app2-20220627062026999600000001",
"digital-microservice-app3-20220627062026999600000001",
"analog-microservice-app4-20220627062026999600000001"
]
# this will match all your digital apps, and exclude
# the one starting with "analog"
asg_names.grep /^digital-microservice-/
#=>
["digital-microservice-app1-20220627062026999600000001",
"digital-microservice-app2-20220627062026999600000001",
"digital-microservice-app3-20220627062026999600000001"]
Example of When Regexp Helps with Post-Processing
Unlike String methods, a Regexp lets you use capture groups and related features to work with specific portions of the match, which would otherwise require additional String processing. Named captures, pre- and post-match variables, sub-expressions, and similar features are not strictly needed to solve the problem as originally posted, but they can replace several parsing and transformation steps and may simplify the next steps in your processing workflow.
As a trivial example, consider the following pattern, which extracts the app type prefix (here, digital) and the app name, and then transforms each match within a block into a String suitable for logging or user-facing output, producing an Array of Strings:
asg_names.grep(/^digital-microservice-\K(app\d)+/) do
"#{$`.tr(?-, ?\s)}found for: #{$1}".capitalize
end
#=>
["Digital microservice found for: app1",
"Digital microservice found for: app2",
"Digital microservice found for: app3"]
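Along the same lines, here is a small sketch using named captures. It assumes every name follows the <type>-microservice-<app>-<digits> layout shown in the question; the constant ASG_NAME and the hash keys are illustrative only and are not part of any AWS API. filter_map requires Ruby 2.7 or later.

```ruby
# Sketch: assumes names look like "<type>-microservice-<app>-<digits>".
# ASG_NAME is a hypothetical constant used for illustration only.
ASG_NAME = /\A(?<type>[a-z]+)-microservice-(?<app>app\d+)-(?<stamp>\d+)\z/

asg_names = [
  "digital-microservice-app1-20220627062026999600000001",
  "digital-microservice-app2-20220627062026999600000001",
  "digital-microservice-app3-20220627062026999600000001",
  "analog-microservice-app4-20220627062026999600000001"
]

# Keep only the "digital" names and pull out the app name and numeric
# suffix by name rather than by position.
digital_apps = asg_names.filter_map do |name|
  m = ASG_NAME.match(name)
  { app: m[:app], stamp: m[:stamp] } if m && m[:type] == "digital"
end

p digital_apps
```

On Ruby versions older than 2.7, map followed by compact gives the same result.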
Go with String methods if you don't need the additional Regexp features to solve your problem, but consider regular expressions if you're doing something more complex with your matches.
Input
a = ["digital-microservice-app1-20220627062026999600000001",
"digital-microservice-app2-20220627062026999600000001",
"digital-microservice-app3-20220627062026999600000001",
"digita-microservice-app3-20220627062026999600000001"
]
Code
p a.filter { |x| x.start_with?('digital-microservice') }
Output
["digital-microservice-app1-20220627062026999600000001", "digital-microservice-app2-20220627062026999600000001", "digital-microservice-app3-20220627062026999600000001"]
I have a very simple use case where I need to add an NER annotation to a sequence of two words where the first word is optional.
For example, I need to annotate both "net income" and "income" as the same NE type.
With ordinary regular expressions the following expression works:
([Nn]et\s)?[Ii]ncome
However, in RegexNER it does not work.
The effect that the above regex has in RegexNER is that the word "income" is annotated in both sequences, but the word "net" is not annotated in the sequence "net income", which is not the result that I need.
That is sort of expected, knowing that RegexNER matches a sequence of regular expressions over a sequence of tokens, not a single regular expression over a single string.
However, the following syntax does not work either:
([Nn]et)? [Ii]ncome
The effect that this expression has is that the sequence "net income" is annotated entirely, but just "income" is not annotated at all.
This is unexpected, since this seems like a very simple use case.
I tried different ways to denote the initial token as a group and also tried different quantifiers; it still does not work.
Any help with making the first token optional will be appreciated.
Let me answer my own question. This is not a direct solution, it's a workaround.
The following expression will work, but only with TokensRegex, not with RegexNER:
/[Nn]et/? /[Ii]ncome/
I am not sure why this is the case; perhaps RegexNER does not support token-level quantifiers the way TokensRegex does.
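To make the token-level semantics concrete, here is a toy Ruby matcher. It is not CoreNLP's implementation, and the names Element, NET_INCOME, match_end, and find_spans are hypothetical. Each pattern element is a regular expression that must match one whole token, and an optional element is consumed if it matches and skipped otherwise, which is roughly how /[Nn]et/? /[Ii]ncome/ behaves in TokensRegex.

```ruby
# Toy token-sequence matcher for illustration only (not CoreNLP).
# Each element must match one whole token; optional elements may be skipped.
Element = Struct.new(:regex, :optional)

# Hypothetical equivalent of the TokensRegex pattern /[Nn]et/? /[Ii]ncome/
NET_INCOME = [
  Element.new(/\A[Nn]et\z/, true),
  Element.new(/\A[Ii]ncome\z/, false)
].freeze

# Returns the index just past a match starting at +start+, or nil.
# Optional elements are taken greedily, without backtracking.
def match_end(tokens, start, pattern)
  pos = start
  pattern.each do |el|
    if pos < tokens.length && el.regex.match?(tokens[pos])
      pos += 1
    elsif !el.optional
      return nil
    end
  end
  pos
end

# Scans the tokens left to right and returns each matched phrase.
def find_spans(tokens, pattern)
  spans = []
  i = 0
  while i < tokens.length
    stop = match_end(tokens, i, pattern)
    if stop && stop > i
      spans << tokens[i...stop].join(" ")
      i = stop
    else
      i += 1
    end
  end
  spans
end

tokens = %w[Net income rose while income tax fell]
p find_spans(tokens, NET_INCOME)
# prints ["Net income", "income"]
```

Because the optional element is taken greedily without backtracking, this toy would miss matches in patterns where an optional token could also satisfy the element after it.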
We have a fairly complex regular expression that checks a string's structure.
I wonder if there is an easy way to find out which character in the string causes the regular expression not to match.
For example,
string.match(reg_exp).get_position_which_fails
Basically, the idea is to get the "position" of the state machine at the point where it gave up.
Here is an example of the regular expression:
/^[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?$/
The short answer is: No.
The long answer is that a regular expression engine is a complicated state machine that may explore several different possible paths (a backtracking engine tries them one after another), so there is no single position where it "gave up". There's no way of getting a partial match out of a regular expression without constructing a regular expression that explicitly allows partial matches.
If you want to allow partial matches, either re-engineer your expression to support them, or write a parser that steps through the string using a more manual method.
You could try generating such a parser automatically with Ragel if you have a particularly difficult expression.
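As a sketch of the approach of writing a parser that steps through the string, the method below hand-codes the rules of the example expression and reports the index of the first offending character. It assumes the whole string is the subject (\A and \z rather than the line anchors ^ and $), and the names first_failure, EDGE_FORBIDDEN, INNER_FORBIDDEN, MAX_LENGTH, and REFERENCE are illustrative. The rules it encodes are: 1 to 255 characters, no control characters or line/paragraph separators anywhere, and no separator characters (which include spaces) at either end.

```ruby
# Hand-coded version of
#   \A[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?\z
# that reports *where* a string fails instead of only whether it matches.
EDGE_FORBIDDEN  = /[\p{Cc}\p{Z}]/          # rule for the first and last character
INNER_FORBIDDEN = /[\p{Cc}\p{Zl}\p{Zp}]/   # rule for characters in between
MAX_LENGTH = 255                           # 1 + 253 + 1

# Returns the index of the first character that breaks the rules,
# 0 for an empty string, or nil if the string is valid.
def first_failure(str)
  return 0 if str.empty?          # at least one character is required
  last = str.length - 1
  str.each_char.with_index do |ch, i|
    return i if i >= MAX_LENGTH   # everything from here on is too long
    rule = (i.zero? || i == last) ? EDGE_FORBIDDEN : INNER_FORBIDDEN
    return i if rule.match?(ch)
  end
  nil
end

# Compare with the original expression on a few sample strings.
REFERENCE = /\A[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?\z/
["valid name", " leading", "trailing ", "tab\there", "a" * 256].each do |s|
  puts "#{s[0, 12].inspect} regex=#{REFERENCE.match?(s)} first_failure=#{first_failure(s).inspect}"
end
```

The trade-off is that the rules now live in two places, so a change to the regular expression has to be mirrored by hand; the comparison loop is one way to catch drift.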
I was wondering if there is a shorter way to write an XPath query that finds all links whose href values contain at least one of several search values.
What I currently have is the following:
//a[contains(@href, 'value1') or contains(@href, 'value2')]
But it seems quite ugly, especially if I were to have more values.
First of all, in many cases you have to live with the "ugliness" or long-windedness of expressions if only XPath 1.0 is at your disposal. Elegance is something introduced with version 2.0, I'd daresay.
But there might be ways to improve your expression: Is there a regularity to the href attributes you'd like to find? For instance, if it is sufficient as a rule to say that the said href attribute values must start with "value", then the expression could be
//a[starts-with(@href,'value')]
I know that "value1" and "value2" are most probably not your actual attribute values but there might be something else that uniquely identifies the group of a elements you're after. Post your HTML input if this is something you want us to help you with.
Personally, I do not find your expression ugly. There is just one or operator and the expression is quite short and readable. I take
if I were to have more values.
to mean that currently, there are only two attribute values you are interested in and that your question therefore is a theoretical one.
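If you are limited to XPath 1.0 and the list of values does grow, one option is to generate the or-chain instead of typing it. The Ruby helper below is a hypothetical sketch (href_contains_any is not a library function). It assumes no search value contains a single quote, since XPath 1.0 string literals have no escape mechanism, and it raises an error if one does.

```ruby
# Hypothetical helper: builds
#   //a[contains(@href, 'v1') or contains(@href, 'v2') ...]
# for any number of search values (valid XPath 1.0).
def href_contains_any(values)
  raise ArgumentError, "at least one value is required" if values.empty?

  tests = values.map do |v|
    # XPath 1.0 literals cannot escape quotes, so reject values containing one.
    raise ArgumentError, "unsupported single quote in #{v.inspect}" if v.include?("'")
    "contains(@href, '#{v}')"
  end
  "//a[#{tests.join(' or ')}]"
end

puts href_contains_any(%w[value1 value2 value3])
# prints //a[contains(@href, 'value1') or contains(@href, 'value2') or contains(@href, 'value3')]
```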
If you're using XPath 2 and want exact matches rather than matches that merely contain part of a search value, you can shorten it to
//a[@href = ('value1', 'value2')]
This syntax wouldn't work for contains(), because its second argument may hold at most one value (zero or one item), not a sequence.
In XPath 2 you could also use
//a[some $s in ('value1', 'value2') satisfies contains(@href, $s)]
or
//a[matches(@href, "value1|value2")]
Template.ParseGlob("*.html") //fetches all html files from current directory.
Template.ParseGlob("**/*.html") //Seems to only fetch at one level depth
I'm not looking for a "Walk" solution; I just want to know if this is possible. I don't quite understand what "pattern" this expects. If I could get an explanation of the pattern syntax used by ParseGlob, that would be great too.
The code in text/template/helper.go says:
// The pattern is processed by filepath.Glob and must match at least one file.
filepath.Glob() says that "the syntax of patterns is the same as in Match"
Match returns true if name matches the shell file name pattern.
The implementation of Match() doesn't seem to treat '**' differently; it only considers '*' as matching any sequence of non-Separator characters.
That would mean '**' is equivalent to '*', which in turn would explain why the match works at one level depth only.
So, since ParseGlob can't load templates recursively, we have to use the path/filepath.Walk function. This approach also gives more flexibility.
https://gist.github.com/logrusorgru/abd846adb521a6fb39c7405f32fec0cf