How to Select Only Alphanumeric characters from a string in Datastage? - internationalization

I have a problem with my data: one column contains characters other than alphanumerics. For example, in the Name column: Ravicᅩhandr¬an (many characters like ¬ᅩ○` appear). I need a result like Ravichandran. How can I achieve this? Is there a way to remove them in the Transformer stage?
I tried the Convert function in the Transformer stage, but the problem with Convert is that I don't know in advance which unknown characters will appear; the ones shown above are just an example.
My requirement is that everything other than alphanumeric characters must be removed, and the rest of the string should stay the same.
How can I get this done?

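The following nested Convert expression can be used in the Transformer stage to remove any unknown or special characters from a column. The inner Convert deletes every allowed character (letters, digits, and space) from the value, which leaves only the unwanted characters; the outer Convert then deletes exactly those characters from the original value.
Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', Column_Name1),'',Column_Name1)
Ex: Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', to_txm.SourceCode),'',to_txm.SourceCode)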
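To see why the nested call works, here is a rough Python sketch that emulates Convert. It is not DataStage code: the helper names convert and keep_alphanumeric are made up for illustration, and it assumes that a character in the from-list with no counterpart in the to-list is deleted, which is what makes the empty second argument act as "delete".

```python
ALLOWED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 "


def convert(from_chars: str, to_chars: str, text: str) -> str:
    """Rough stand-in for Convert(from, to, expression) (assumed semantics)."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        # First occurrence wins; characters with no partner in to_chars are deleted.
        mapping.setdefault(ch, to_chars[i] if i < len(to_chars) else "")
    return "".join(mapping.get(ch, ch) for ch in text)


def keep_alphanumeric(value: str) -> str:
    # Inner call: strip the allowed characters, leaving only the unwanted ones.
    unwanted = convert(ALLOWED, "", value)
    # Outer call: delete exactly those unwanted characters from the original.
    return convert(unwanted, "", value)


print(keep_alphanumeric("Ravicᅩhandr¬an"))  # Ravichandran
```

Because the allowed list ends with a space, spaces in the value are preserved; drop it from the list if they should be removed too.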

Related

NiFi: change text in FlowFile (Python or ...)

I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor, it seems like
So I have to delete any text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
#it return "ZPLR_CHDN_UPN_ECN....."
But how do I do it right?
P.S. Or maybe use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?
If you simply want to split on a bang (!) and only keep the text after it, then you could achieve this with a SplitContent configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries, so you may want to consider ConsumeKafkaRecord and NiFi's Record capabilities to interpret and manipulate the data more intelligently.
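The downsides listed above are easy to reproduce with plain string handling. The sketch below is ordinary Python, not an ExecuteScript body, and the sample line is hypothetical (the original input is not shown); only the "ZPLR_CHDN_UPN_ECN" fragment comes from the question.

```python
def after_first_bang(text: str) -> str:
    # partition splits on the first "!" only; the third element is the remainder.
    _before, sep, after = text.partition("!")
    # If there is no "!" at all, return the text unchanged.
    return after if sep else text


sample = '{"Tagname":"SOME.PREFIX!ZPLR_CHDN_UPN_ECN"}'  # hypothetical input

print(after_first_bang(sample))   # ZPLR_CHDN_UPN_ECN"}
print(len("a!b!c".split("!")))    # 3
```

The first result still carries the trailing "} from the JSON, and a value with two bangs splits into three pieces, so a rule that keeps only the second fragment would drop part of the text.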
On scripting, there are some great cookbooks for learning to script in NiFi. Start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!' )
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
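As a rough illustration of what this property does to a single record, here is the same transformation in plain Python. This is only a sketch, not NiFi code: the sample record is hypothetical, and the helper substring_after assumes the RecordPath function returns the text after the first occurrence of the search string and leaves the value unchanged when the search string is absent.

```python
import json


def substring_after(value: str, search: str) -> str:
    """Text after the first occurrence of search; unchanged if search is absent."""
    _before, sep, rest = value.partition(search)
    return rest if sep else value


# Hypothetical record; only the Tagname key is taken from the question.
line = '{"Tagname": "SOME.PREFIX!ZPLR_CHDN_UPN_ECN", "Value": 1}'

record = json.loads(line)
# Equivalent of the dynamic property: Name /Tagname, Value substringAfter(/Tagname, '!')
record["Tagname"] = substring_after(record["Tagname"], "!")

print(json.dumps(record))  # {"Tagname": "ZPLR_CHDN_UPN_ECN", "Value": 1}
```

Because the value is parsed as JSON first, the other keys are untouched and no stray "} is left behind.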
If your goal is to get the string between ! and "}, use ReplaceText with the regex (.*)!(.*)"}, capture the second group, and replace the entire content with it (a replacement value of $2).
Please note that this regular expression may not be the best one for your case, but I believe you can solve your problem with a regular expression.
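The behaviour of that pattern can be checked outside NiFi. The sketch below uses Python's re module as a stand-in for the Java regex engine (the two agree for this simple pattern, though Python writes the back-reference as \2); the sample content is hypothetical.

```python
import re

PATTERN = re.compile(r'(.*)!(.*)"}')

content = '{"Tagname":"SOME.PREFIX!ZPLR_CHDN_UPN_ECN"}'  # hypothetical input

# Replace the whole match with the second capture group.
print(PATTERN.sub(r"\2", content))    # ZPLR_CHDN_UPN_ECN

# The first (.*) is greedy, so with several bangs only the text after the LAST one is kept.
print(PATTERN.sub(r"\2", 'a!b!c"}'))  # c
```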

how to make Google sheets REGEXREPLACE return empty string if it fails?

I'm using the REGEXREPLACE function in Google Sheets to extract some text contained in SKUs. If it doesn't find the text, it seems to return the entire original string. Is it possible to make it return an empty string instead? I'm having problems with this because it doesn't actually raise an error, so using ISERROR doesn't work.
Example: my SKU should contain 5 separate groups delimited by the underscore character '_'. In this example the last groups are missing, so it returns the entire original string.
LDSS0107_SS-WH_5
=REGEXREPLACE($A3,"[^_]+_[^_]+_[^_]+_[^_]+_(.*)","$1")
It fails to find the fifth group, which is correct, but I need it to give me an empty string when it fails. At present it gives me the whole original string. Any ideas?
Perhaps the solution would be to add the missing groups:
=REGEXREPLACE($A1&REPT("_ ",4-(LEN($A1)-LEN(SUBSTITUTE($A1,"_","")))),"[^_]+_[^_]+_[^_]+_[^_]+_(.*)","$1")
This returns a space as the result for a missing group. If you don't want that, wrap the formula in the TRIM function.
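The padding idea can be traced step by step in Python. This is only a sketch of the logic, not a Sheets formula: re.sub stands in for REGEXREPLACE (both leave the input unchanged when nothing matches), and the function name fifth_group is made up for illustration.

```python
import re

PATTERN = re.compile(r"[^_]+_[^_]+_[^_]+_[^_]+_(.*)")


def fifth_group(sku: str) -> str:
    # Same count as LEN(A1) - LEN(SUBSTITUTE(A1, "_", "")) in the formula.
    missing = 4 - sku.count("_")
    # Same as A1 & REPT("_ ", missing): pad so that five groups always exist.
    padded = sku + "_ " * max(missing, 0)
    # Keep only the fifth group, then drop the padding space (TRIM).
    return PATTERN.sub(r"\1", padded).strip()


print(repr(fifth_group("LDSS0107_SS-WH_5")))  # ''
print(repr(fifth_group("A_B_C_D_E")))         # 'E'
```

Without the padding, the pattern simply does not match "LDSS0107_SS-WH_5", which is why the original formula returns the whole string.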

Formatting numbers with thousand separator as empty string on amcharts4

I need to change the chart's number format so that numbers stop looking like 1,525 (comma separator) and start looking like 1 525 (a space as the thousands separator). I also need a dot as the decimal separator, but only if the number has a decimal part, like 1 525.4
The closest number format I was able to find for amCharts 4 is
chart.numberFormatter.numberFormat = '#,###.#';
Any ideas?
After some research I found a solution: you have to use locales.
This line of code helped me a lot:
chart.language.locale = am4lang_[locale];
For the space separator I used am4lang_ru_RU.
By the way, if you need your own formatting for numbers, strings, and so on, you can create your own locale for that.

How to replace '$' from string in pig?

We know that to replace a character we can use the REPLACE function, as below:
RELATION = FOREACH data GENERATE REPLACE(string,'a','b');
The statement above replaces every 'a' with 'b'.
But how can I replace the dollar sign ($)? In Pig, '$' denotes a column position. For example, I want to remove '$' from a string like '$1234.56' and get '1234.56' as output.
RELATION = FOREACH data GENERATE REPLACE(string,'$','');
But this does not work for me.
Can anyone please help? Thanks in advance.
Using Unicode:
REPLACE(string,'\u0024','')
It can be helpful to look at how regular expressions work in Java, for instance: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
In your particular case, you can use the following:
REPLACE(string, '[$]', '')
For increased flexibility (when dealing with other currency types, for instance), it might be a good idea to remove all non-numeric characters except '.'. In that case, use:
REPLACE(string, '[^\\d.]', '')
This worked for me: (triple backslashes)
REPLACE(string,'\\\$','')
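The reason a bare '$' fails is that the pattern is treated as a regular expression, where $ is the end-of-input anchor rather than a literal character. The sketch below shows the same effect with Python's re module, used here only as a stand-in for the Java regex engine behind Pig's REPLACE; the sample strings are illustrative.

```python
import re

price = "$1234.56"

# Bare $ is an anchor: it matches the empty position at the end, so nothing is removed.
print(re.sub("$", "", price))                 # $1234.56

# Inside a character class, or escaped, $ is a literal dollar sign.
print(re.sub("[$]", "", price))               # 1234.56
print(re.sub(r"\$", "", price))               # 1234.56

# Keep only digits and the decimal point, whatever the currency marker is.
print(re.sub(r"[^\d.]", "", "USD 1,234.56"))  # 1234.56
```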

How to create a cfcollection / verity collection with a UTF-8 character in the name?

I'd just like to be able to use a UTF-8 character in the name of the collection. We base our code logic on the names of the collections, which are related to a given company. This new company has the abbreviation XØZ3, and both the CF Administrator and cfcollection seem to have issues with the ø in the collection name.
The errors presented are:
Unable to create collection peoplexscvdocsXØZ3.
Unable to create collection peoplexscvdocsxøz3.
An error occurred while creating the collection: com.verity.api.administration.ConfigurationException: Fail to create the index. (-6220)
If Verity doesn't accept UTF-8 and there isn't a workaround, I guess you'll have to:
have 2 fields, one with an ASCII-based version of the character and one with the HTML/XML version of the character
pass the ASCII version of the characters through when searching the collection to match
So you'd have:
plaintext: XOZ3
XMLText: X&#216;Z3
And a function that takes Ø and changes it to O when searching Verity on the plaintext field, and returns the matching XMLText field.
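The two-field idea can be sketched as follows. This is Python rather than CFML, purely to show the mapping logic; the function names and the fallback table are hypothetical, and the table would need an entry for every non-ASCII character you expect.

```python
# Hypothetical fallback table: Ø has no Unicode decomposition, so it is mapped by hand.
ASCII_FALLBACK = str.maketrans({"Ø": "O", "ø": "o"})


def to_plaintext(name: str) -> str:
    # ASCII-only version, used for the collection name and for searching.
    return name.translate(ASCII_FALLBACK)


def to_xml_text(name: str) -> str:
    # Non-ASCII characters become numeric character references such as &#216;.
    return name.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_plaintext("XØZ3"))  # XOZ3
print(to_xml_text("XØZ3"))   # X&#216;Z3
```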

Resources