Nifi regex on attributes [closed] - apache-nifi

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
I have flowfiles with filenames like this:
xxx2019xxx.txt
where xxx are letters. I want to extract the year (2019, or any other four-digit number) from the filename. It seems to me that the Expression Language's regex functions, like matches(...), just return a boolean value. Any ideas how to extract the year?
Thank you and best regards.

You can add a dynamic property (click the + icon on the top right of the "Properties" tab of the UpdateAttribute processor). Name it "extractedYear" or whatever you like. The value of this property should be an Expression Language statement like:
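${filename:replaceAll('.*(\d{4}).*', '$1')}
This replaces the whole matched pattern (anything + 4 digits + anything) with the first capture group (the 4 digits) and stores the result in the new attribute; the existing filename attribute is not modified. Note that replaceAll takes a regular expression, whereas replace performs a literal string replacement.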
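The pattern itself can be checked outside NiFi. The following is a minimal Ruby sketch of the same capture-group replacement, not NiFi code; it assumes Ruby's regex engine behaves like NiFi's Java engine for this simple pattern, and extract_year is a hypothetical helper name.

```ruby
# Hypothetical helper: replace the whole string with the first capture group.
# Not NiFi Expression Language, only an illustration of the regex.
def extract_year(filename)
  filename.sub(/.*(\d{4}).*/, '\1')
end

puts extract_year("abc2019def.txt")  # prints 2019
puts extract_year("abc.txt")         # prints abc.txt (no match, string unchanged)
```

If the filename contains no four-digit run, nothing matches and the original value comes back unchanged, so a downstream check may be needed.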

Related

Ruby Regex for string "and/or" as exact match [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 months ago.
I am trying to figure out the Ruby Regex for the exact string "and/or". For example, let's say I have a name variable that is "Elvin and/or Jafarli"
name = "Elvin and/or Jafarli"
and I want to split the name based on the string "and/or". How is that done in Ruby?
This is the final result I am looking for:
name.split(some_regex) results in ["Elvin", "Jafarli"]
** UPDATE **
This is the current regex that exists in the system
names.split(/ (?i)(?:and|or) /)
What I want to do is update the regex so that it also splits on an exact string match of "and/or", as in "Elvin and/or Jafarli".
Add another alternative with |, and escape the delimiter:
names.split(/ (?i)(?:and|or|and\/or) /)
or use the alternative regex literal form:
names.split(%r{ (?i)(?:and|or|and/or) })
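A quick way to check the extended pattern is sketched below. It is an illustration under assumptions, not the system's actual code: it uses the /i flag instead of the inline (?i), lists the longest alternative first for readability, and SEPARATOR and split_names are hypothetical names.

```ruby
# Separator pattern: a space, one of the connectives, a space (case-insensitive).
SEPARATOR = %r{ (?:and/or|and|or) }i

# Hypothetical helper wrapping String#split with the pattern above.
def split_names(names)
  names.split(SEPARATOR)
end

p split_names("Elvin and/or Jafarli")  # => ["Elvin", "Jafarli"]
p split_names("Elvin AND Jafarli")     # => ["Elvin", "Jafarli"]
p split_names("Sandor Clegane")        # => ["Sandor Clegane"]
```

The surrounding spaces in the pattern keep names that merely contain "and" or "or", such as "Sandor", from being split.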
I feel like there must be a catch. This seems too easy.
irb(main):001:0> name = "Elvin and/or Jafarli"
=> "Elvin and/or Jafarli"
irb(main):002:0> name.split /\s+and\/or\s+/
=> ["Elvin", "Jafarli"]
Remember to escape the / in the regular expression and account for the surrounding whitespace. \s+ specifies one or more whitespace characters.

Get selected value from string, from point to point [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Is it possible in Ruby to take a name in the exact format "Doe,Jon" and get only the first name, "Jon"? The name can of course always be different. I was wondering whether it is possible to take everything from the "," separator to the end of the string. If so, how?
Thanks for your help.
So let's examine some of the solutions that were given to you in the comments.
Split
"Doe,Jon".split(',').last
# or a bit more verbose
parts = "Doe,Jon".split(',') # ["Doe", "Jon"]
name = parts.last # "Jon"
String#split splits a string into an array. It uses the parameter "," as the separator. Array#last returns the last item from an array.
Gsub
"Doe,Jon".gsub(/.*,/, '')
String#gsub substitutes the part that matches the Regular Expression (/.*,/) with the substitution value ("").
The regexp matches everything (.*) up to (and including) the comma. And the replacement is an empty string, essentially deleting the part that matches the regexp.
Note that you could/should probably have an anchor to make the regexp more strict (/\A.*,/)
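A minimal sketch of the anchored variant, assuming a single name per string; first_name is a hypothetical helper. Since an anchored pattern can match at most once, String#sub is enough here.

```ruby
# Hypothetical helper: drop everything from the start of the string
# up to and including the last comma on the first line.
def first_name(full)
  full.sub(/\A.*,/, '')
end

puts first_name("Doe,Jon")  # prints Jon
puts first_name("Jon")      # prints Jon (no comma, string unchanged)
```

With multi-line input the difference shows: the unanchored gsub strips the part before the comma on every line, while the anchored version only touches the first line.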
Slice
String#slice creates a substring given a range. -1 is a shortcut for the last element.
String#index finds the index of a character inside a String.
"Doe,Jon".slice(("Doe,Jon".index(',')+1)..-1)
# or more verbose
full = "Doe,Jon"
index_of_comma = full.index(',') # => 3
index_after_comma = index_of_comma + 1
name = full.slice(index_after_comma..-1) # => "Jon"
CSV
CSV (Comma Separated Values) is a format where multiple values are separated by a comma (or other separation character).
require "csv"
CSV.parse("Doe,Jon")[0][1]
This treats the name as CSV data and accesses the first row of data ([0]). From that row it then accesses the second element ([1]), which is the name.
Now what?
There are usually multiple ways to reach a goal. And it's up to you to pick a way. I'd go with the first one. To me it is easy to read and understand its purpose.

How to match strings, ignoring certain characters [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I have to match pairs of strings, ignoring spaces " " and hyphens "-". I want to regard the following pairs as identical.
"2,3 chloro benzene" and "2,3 chlorobenzene"
"4'3',2-dinitrotoluene" and "4'3',2-di nitro toluene"
Due to the spaces, I cannot match them. How can I do that? I am not sure how to do it in Ruby.
Use String#delete to delete unwanted chars and normalize the two strings before comparing them, as shown below:
s1 = "2,3 chloro-benzene"
s2 = "2,3 chlorobenzene"
s1.delete(" -") == s2.delete(" -")
#=> true
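The same normalization can be wrapped in a small predicate and run against the pairs from the question. This is a sketch; same_compound? is a hypothetical name.

```ruby
# Hypothetical helper: compare two names after removing spaces and hyphens.
# In String#delete, a trailing "-" in the character set is a literal hyphen.
def same_compound?(a, b)
  a.delete(" -") == b.delete(" -")
end

p same_compound?("2,3 chloro benzene", "2,3 chlorobenzene")              # => true
p same_compound?("4'3',2-dinitrotoluene", "4'3',2-di nitro toluene")     # => true
p same_compound?("2,3 chlorobenzene", "2,3 chlorotoluene")               # => false
```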

Regex issue with a string and all numbers [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
In Ruby, I would like to create a regular expression that matches the following:
building/liberty-green/6d
(the word building and some number somewhere after it)
Currently, I have /building/ and need to add \d (any digit) to it, but I don't know how.
You need /building\/[\w-]+\/\w+/. For example:
irb(main):001:0> /building\/[\w-]+\/\w+/.match("building/liberty-green/6d")
=> #<MatchData "building/liberty-green/6d">
That expression will match any string that:
Contains building/
Followed by one or more word characters or dashes (e.g. foo-bar, foo, bar-1)
Followed by a /
Followed by one or more word characters (e.g. foo, 6d, 12345)
Note that \w includes digits, so the pattern allows digits but does not require one. The pattern is also unanchored, so it can match in the middle of a longer string.
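If the requirement is literally "the word building and at least one digit somewhere after it", a looser pattern would also work. This is a sketch under that reading of the question; building_with_digit? is a hypothetical helper.

```ruby
# Hypothetical helper: true when "building/" is followed, anywhere later
# on the same line, by at least one digit.
def building_with_digit?(path)
  %r{building/.*\d}.match?(path)
end

p building_with_digit?("building/liberty-green/6d")   # => true
p building_with_digit?("building/liberty-green/abc")  # => false
p building_with_digit?("garden/liberty-green/6d")     # => false
```

The %r{...} literal avoids having to escape the forward slash.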

Regex for string separated by pipe character [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm having some problems coming up with a regexp that matches class1, class2, and class3 in the following string (ideally I could have n number of words separated by pipes, as the number of classes passed to my method is not constant)
class1|class2|class3 path/to/resource
I have the following matcher which returns only class1. Bonus points to whomever can find me a matcher for the resource path as well.
Edit
Thank you very much for all the help - points all around!
Assuming you are confident that your input will be well formed, my advice would be to split your string by both the pipe character and space. For example:
components = "class1|class2|class3 path/to/resource".split(/[ \|]/)
You would then have an array containing the n class names followed by the path to your resource, which you can pop off the end:
resource_path = components.pop
classes = components
EDIT: The original title of this question suggested the OP is using Ruby, hence my answer.
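The split-then-pop idea can also be written with a splat assignment. A minimal sketch, assuming well-formed input as stated above; parse_spec is a hypothetical name.

```ruby
# Hypothetical helper: split on pipe or space, then separate the
# trailing resource path from the class names.
def parse_spec(input)
  *classes, resource_path = input.split(/[ |]/)
  [classes, resource_path]
end

classes, resource_path = parse_spec("class1|class2|class3 path/to/resource")
p classes        # => ["class1", "class2", "class3"]
p resource_path  # => "path/to/resource"
```

Inside a character class the pipe is a literal character, so it does not need to be escaped.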
\w+(\|\w+)*\s+\w+(\/\w+)*
I assumed that the names of your classes consist of one or more word characters. Adjust if they're more restricted than that. For instance, use class\d+ for numbered classes only.
We have a class name, followed by any number of [a pipe followed by a class name]. Then we have one or more spaces, followed by basically the same thing, but this time using slashes instead of pipes.
I've escaped both the pipe and the slash with a backslash.
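To pull the two parts out of a match rather than just validate the string, the same pattern can be given named groups. This is a sketch that adds \A and \z anchors and group names that are not in the expression above; SPEC and parse_with_regex are hypothetical names.

```ruby
# Same structure as the expression above, anchored and with named groups.
SPEC = %r{\A(?<classes>\w+(?:\|\w+)*)\s+(?<path>\w+(?:/\w+)*)\z}

# Hypothetical helper: returns [classes, path], or nil for malformed input.
def parse_with_regex(input)
  m = SPEC.match(input)
  return nil unless m
  [m[:classes].split('|'), m[:path]]
end

p parse_with_regex("class1|class2|class3 path/to/resource")
# => [["class1", "class2", "class3"], "path/to/resource"]
p parse_with_regex("class1||class2 path")  # => nil
```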
string = "class1|class2|class3 path/to/resource".split(%r{[| ]})
=> ["class1", "class2", "class3", "path/to/resource"]
I would just do two splits:
string = 'class1|class2|class3 path/to/resource'
p string.split.first.split('|') #=> ["class1", "class2", "class3"]
If you want to use regex with the input you provided, this will extract your classes and path:
([\w/]+)\|? ?
INPUT
class1|class2|class3 path/to/resource
OUTPUT
class1
class2
class3
path/to/resource
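In Ruby, one way to apply that expression repeatedly is String#scan. A sketch, assuming the input shown above; because the pattern has a capture group, scan returns one single-element array per match, hence the flatten.

```ruby
input = "class1|class2|class3 path/to/resource"

# Each match captures a run of word characters or slashes, then
# consumes an optional pipe and an optional space.
tokens = input.scan(%r{([\w/]+)\|? ?}).flatten

p tokens  # => ["class1", "class2", "class3", "path/to/resource"]
```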
