I am getting three attributes instead of one with the NiFi ExtractText processor

I am trying to extract attributes from a file whose lines have the format NUMBER/TEXT, for example:
9999, text
I am creating an attribute named number with the regular expression (\d{4})
But instead of one attribute number, I am getting 3 attributes number, number0 and number1.
What am I doing wrong?
Thank you beforehand!

Use the regular expression \d{4} without the parentheses (the capturing group). It then returns only one attribute, number.0
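
To see why a capturing group multiplies the results, here is a rough Python model of the naming scheme. It assumes the processor stores the whole match under a `.0` suffix, each capturing group under `.N`, and the first group under the bare property name. The function `extract_attributes` is hypothetical and is not NiFi code.

```python
import re

def extract_attributes(name, pattern, text):
    """Model (an assumption, not NiFi's implementation) of attribute naming."""
    match = re.search(pattern, text)
    if match is None:
        return {}
    # Group 0 is always the whole match.
    attrs = {f"{name}.0": match.group(0)}
    # One extra attribute per capturing group in the pattern.
    for i in range(1, match.re.groups + 1):
        attrs[f"{name}.{i}"] = match.group(i)
    # The bare name mirrors the first capturing group, if there is one.
    if match.re.groups >= 1:
        attrs[name] = match.group(1)
    return attrs

with_group = extract_attributes("number", r"(\d{4})", "9999, text")
without_group = extract_attributes("number", r"\d{4}", "9999, text")
print(sorted(with_group))     # ['number', 'number.0', 'number.1']
print(sorted(without_group))  # ['number.0']
```

In this model all three attributes from the first call hold the same value, 9999, so any one of them can be used downstream.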

Related

How to extract only a particular number from HTML response

This is what my HTML value looks like. Please note that there is a carriage return and a space after q1. I only want to extract the number next to q, which here is 1.
<span id="sqa">q1
?</span>
I am using the XPath Extractor, and it gives me the complete value, as can be seen in the debug sampler output. The name of my variable is answer_value. I only want the 1, which is a dynamic number here, not the q, because only this number is used in the subsequent request.
The XPath query I am using is //span[@id="sqa"], and it gives me the value below. I am not sure whether I can split this value in XPath or whether I need to use a JMeter split function to do that.
answer_value=q1 ?
Just add a Regular Expression Extractor and configure it to extract a number from the answer_value JMeter Variable.
Place it below the XPath Extractor and configure like this:
You might also want to apply the XPath normalize-space() function to remove any line breaks:
normalize-space(//span[@id="sqa"]/text())
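
The two steps (normalize the whitespace, then pull out the digits) can be sketched in plain Python. The pattern `q(\d+)` is an assumption, since the extractor configuration from the original answer is not available here, and `extract_number` is a hypothetical helper name.

```python
import re

def extract_number(raw):
    # Collapse whitespace runs and trim the ends, roughly like normalize-space().
    normalized = " ".join(raw.split())
    # Capture the digits that directly follow the letter q (assumed pattern).
    match = re.search(r"q(\d+)", normalized)
    return match.group(1) if match else None

answer_value = "q1\n ?"  # the value produced by the XPath extractor
print(extract_number(answer_value))  # 1
```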

NiFi: change text in FlowFile (Python or ...)

I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor; it looks like
So I have to delete any text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
# it returns "ZPLR_CHDN_UPN_ECN....."
But how do I do it correctly?
P.S. Or maybe use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?
If you simply want to split on a bang (!) and only keep the text after it, then you could achieve this with a SplitContent configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries, so you may want to consider looking into ConsumeKafkaRecord and utilising NiFi's Record capabilities to interpret and manipulate the data more intelligently.
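
The first two downsides are easy to reproduce outside NiFi. A small Python sketch, using a hypothetical sample payload (the real message content is not shown in the question):

```python
def split_on_bang(content):
    # Mimics splitting on the byte sequence "!" without keeping it.
    return content.split("!")

# Hypothetical payload; the real message content is not shown in the question.
single = '{"Tagname":"prefix!ZPLR_CHDN"}'
fragments = split_on_bang(single)
print(fragments[1])  # ZPLR_CHDN"}   <- the trailing "} is still attached

# With more than one "!" there are more than two fragments,
# so "fragment 2" is no longer everything after the first bang.
multiple = '{"Tagname":"a!b!c"}'
print(split_on_bang(multiple))  # ['{"Tagname":"a', 'b', 'c"}']
```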
On scripting, there are some great cookbooks for learning to script in NiFi. Start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!')
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
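
The effect of this property on a single record can be sketched in Python. This models only the transformation, not NiFi itself. The sample record is hypothetical, and returning the value unchanged when the search string is absent is an assumption about substringAfter.

```python
import json

def substring_after(value, search):
    # Everything after the first occurrence of `search`.
    # Returning the input unchanged when `search` is absent is an assumption.
    head, sep, tail = value.partition(search)
    return tail if sep else value

def update_tagname(line):
    record = json.loads(line)  # one JSON object is one Record
    record["Tagname"] = substring_after(record["Tagname"], "!")
    return json.dumps(record)

# Hypothetical record; the real payload is not shown in the question.
print(update_tagname('{"Tagname": "prefix!ZPLR_CHDN"}'))  # {"Tagname": "ZPLR_CHDN"}
```

Because the value is parsed as JSON first, no stray `"}` is left behind, unlike the raw text split.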
If your goal is to get the string between ! and "}, use ReplaceText with the regular expression (.*)!(.*)"} and replace the entire content with the second capture group.
Please note that this regular expression may not be the best fit for your case, but I believe a regular expression can solve your problem.
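
A quick way to check what this expression keeps is to try it in Python. The sample content is hypothetical, and `\2` plays the role of the second capture group in the replacement.

```python
import re

def keep_after_bang(content):
    # Replace the whole match with the second capture group.
    return re.sub(r'(.*)!(.*)"}', r"\2", content)

# Hypothetical content; the real payload is not shown in the question.
print(keep_after_bang('{"Tagname":"prefix!ZPLR_CHDN"}'))  # ZPLR_CHDN

# The first group is greedy, so with several "!" only the text after the last one is kept.
print(keep_after_bang('{"Tagname":"a!b!c"}'))  # c
```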

How to combine several XPaths into one in order to extract text from a web page?

I have a few Xpaths as below:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p
//*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="2555ab30-bb84-11ea-9e8b-277e7f6208b2"]/div/div/div[1]/div/p
//*[@id="7e100250-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[@id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
All of the above are used to extract text from a single web page, since the text is located in different viewports, but I wish to find a single XPath to extract the text for all of them. Is it possible to use 'and' and multiple IDs to extract all of it through one XPath?
Any other suggestions would be appreciated.
You can use the or operator for the last four.
And the merge-nodes operator | to add the first one.
So to select all five paths with one expression, use the following:
//*[@id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p | //*[@id="729c0860-a71d-11ea-b994-53a3e91a35c2" or @id="2555ab30-bb84-11ea-9e8b-277e7f6208b2" or @id="7e100250-a71d-11ea-b994-53a3e91a35c2" or @id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
A shorter and more generic solution could be:
(//div/div/div[1]/div/p|//div/p)[parent::*[string-length(@id)=36 and substring(@id,24,1)="-"]]
The first part, in (), specifies the end of the path. Since the @id attributes all have the same length, we test that length inside the predicate. We also verify the presence of a - at a specific position with substring.
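
The id test inside that predicate can be read as a plain string check. A minimal Python sketch (the function name is hypothetical), keeping in mind that XPath's substring() counts from 1 while Python indexes from 0:

```python
def looks_like_target_id(id_value):
    # string-length(@id)=36 and substring(@id,24,1)="-"
    # XPath position 24 corresponds to Python index 23.
    return len(id_value) == 36 and id_value[23] == "-"

ids = [
    "904735f0-bb82-11ea-a473-6d0f51688222",
    "729c0860-a71d-11ea-b994-53a3e91a35c2",
    "2555ab30-bb84-11ea-9e8b-277e7f6208b2",
    "7e100250-a71d-11ea-b994-53a3e91a35c2",
    "811727d0-a71d-11ea-b994-53a3e91a35c2",
]
print(all(looks_like_target_id(i) for i in ids))  # True
print(looks_like_target_id("sqa"))                # False
```

The check is loose by design: any 36-character id with a dash in that position passes, so it is worth confirming that no unrelated element on the page has such an id.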

How to extract complete Product name using regex or x-path

I have an Html page which contains many script tags and inside each script tag I have a structure like:
<script>window.pagedata={listItems:[{"name": "Multi-Warna Lembut Silikon Casing Ponsel Untuk Apple iPhone 11 Case 11 Pro Max Tidak Berbau dan Tidak Beracun Casing iPhone 11 pro-Max"}]}</script>
My goal is to extract all the name from this script tag using a regex or a x-path in JMeter.
You can extract the name using the regular expression below. Please note that if your requirement is to extract any one product name from the response (assuming it contains many product names), you can set Match No. to "0", which selects a match at random. If you need the product name from the first occurrence, set Match No. to "1".
Regular Expression : \{"name": "(.*?)"\}
If your goal is to extract all names, then please use "-1" as the Match No. The variables will then be ${name_1}, ${name_2}, etc.
You can use the Regular Expression Extractor post-processor with this regex.
Add a Regular Expression Extractor as a child of the request which returns the names you're looking for.
Configure it as follows:
Name of created variable: anything meaningful, e.g. name
Regular Expression: {listItems:\[{"name":\s*"(.+?)"
Template: $1$
Match No: -1
That's it, you should be able to access the extracted values as ${name_1}, ${name_2}, etc.
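
To check the pattern outside JMeter, the "all matches" behaviour of Match No. -1 can be approximated with Python's re.findall. The response body below is a shortened, hypothetical stand-in with two script tags, and the dictionary only imitates the ${name_1}, ${name_2} naming.

```python
import re

# Hypothetical response with two script tags (names shortened for readability).
response = (
    '<script>window.pagedata={listItems:[{"name": "Casing iPhone 11 pro-Max"}]}</script>'
    '<script>window.pagedata={listItems:[{"name": "Second product"}]}</script>'
)

# The first regular expression from the answers above; the lazy group stops at the closing "}.
names = re.findall(r'\{"name": "(.*?)"\}', response)

# Imitate the numbered variables created when all matches are requested.
variables = {f"name_{i}": value for i, value in enumerate(names, start=1)}
print(variables)
```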
More information: JMeter Regular Expressions

How to Select Only Alphanumeric characters from a string in Datastage?

I am facing a problem with my data: a column contains characters other than alphanumerics. For example, in the Name column: Ravicᅩhandr¬an (¬ᅩ○`), and there are many characters like these. I need a result like Ravichandran. How can I achieve this? Is there any way to remove them in a Transformer stage?
I tried the Convert function in a Transformer stage, but the problem with Convert is that I do not know these unknown characters in advance; the ones shown above are just an example.
My requirement is that everything other than alphanumeric characters must be removed, and the rest of the string should stay the same.
How can I get this done?
The following Convert function can be used in a Transformer stage to remove any kind of unknown/special characters from the column.
Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', Column_Name1),'',Column_Name1)
Example: Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', to_txm.SourceCode),'',to_txm.SourceCode)
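
The nested call works in two passes: the inner Convert deletes every allowed character, leaving only the unwanted ones, and the outer Convert then deletes exactly those unwanted characters from the original value. A Python sketch of the idea, where `convert` is a simplified stand-in for the DataStage function rather than a full reimplementation:

```python
import string

# Same allowed set as in the expression above (note the trailing space).
ALLOWED = string.ascii_uppercase + string.ascii_lowercase + string.digits + " "

def convert(from_chars, to_chars, text):
    # Simplified stand-in: map each char of from_chars to the char at the same
    # position in to_chars, or delete it when to_chars has no such position.
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) not in table:
            table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(table)

def keep_allowed(text):
    unwanted = convert(ALLOWED, "", text)  # inner call: only the odd characters remain
    return convert(unwanted, "", text)     # outer call: delete those from the original

print(keep_allowed("Ravicᅩhandr¬an"))  # Ravichandran
```

Because the space is part of the allowed set, spaces inside the value are preserved.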
