LUIS issues with special characters - botframework

(TEXT) is converted to ( TEXT ) in LUIS when we identify an entity name.
Issues with special characters.
Refer to the image below:
Here monthly iq dashboard hospitalists is converted to reportname --> "monthly iq dashboard ( hospitalists )" in Entities. So when we use this entity in Bot Framework, we run into problems comparing it to the actual report name stored in the metadata database.

(TEXT) is converted to ( TEXT ) in LUIS when we identify an entity name. Issues with special characters.
The issue you reported seems to be that whitespace is added when some special characters are used. I reproduced the issue on my side, and I found that similar issues have been reported by others:
LUIS inserts whitespace in utterances when punctuation present causing entity getting incorrectly parsed
LUIS cannot take care of special characters
when we use this entity in bot framework we are facing issues while comparing to actual report name stored in Metadata (database)
To solve it, as Nicolas R and NiteLordz mentioned in comments, you can try to handle that in your code. And to remove whitespace from ( hospitalists ), the following regex would be helpful.
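// Match a parenthesized word padded with whitespace, e.g. "( hospitalists )".
Regex regex = new Regex(@"\(\s\w*\s\)");
// Remove the spaces inside each match, giving "(hospitalists)".
input = regex.Replace(input, m => m.Value.Replace(" ", ""));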
Note: I can reproduce the issue, and the same issue appears when processing input such as a URL that contains / and . characters.
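Because the same whitespace insertion shows up around other punctuation, an alternative to patching each pattern is to compare on a normalized key. Here is a minimal sketch in JavaScript, assuming the stored report names can be normalized the same way at lookup time. `comparisonKey` and `findReport` are hypothetical names, not part of LUIS or the Bot Framework SDK.

```javascript
// Hypothetical helper: build a comparison key that ignores whitespace and case,
// so "monthly iq dashboard ( hospitalists )" and
// "Monthly IQ Dashboard (Hospitalists)" map to the same string.
function comparisonKey(text) {
  return text.replace(/\s+/g, "").toLowerCase();
}

// Hypothetical lookup against report names loaded from the metadata store.
// Returns the stored name, or null when nothing matches.
function findReport(entityValue, reportNames) {
  const wanted = comparisonKey(entityValue);
  return reportNames.find((name) => comparisonKey(name) === wanted) ?? null;
}

const reports = ["Monthly IQ Dashboard (Hospitalists)", "Weekly Summary"];
console.log(findReport("monthly iq dashboard ( hospitalists )", reports));
// Monthly IQ Dashboard (Hospitalists)
```

The trade-off is that two stored names differing only in whitespace or case would collide, so this only fits if report names are unique under that normalization.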

Related

UTF8mb4 unicode breaking MariaDB JDBC driver

I have some product names that include unicode characters
⚠️📷PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
A query in HeidiSQL shows it fine.
I set up MariaDB fresh this morning, having moved from MySQL, but when records are retrieved through a ColdFusion query using the MariaDB JDBC driver I get
java.lang.StringIndexOutOfBoundsException: begin 0, end 80, length 74
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalString(TextRowProtocol.java:238)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getString(SelectResultSet.java:948)
The productname field collation is utf8mb4_unicode_520_ci, I've tried a few options. I've tried to set this at table and database level where it let me.
The JDBC connection string in ColdFusion admin is jdbc:mysql://localhost:3307/usedlens?useUnicode=true&characterEncoding=UTF-8
I note that on the live production database, where MariaDB was used from the beginning, I don't have this trouble, but there the default charset is latin1 and the same record is stored in the database as
????PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
Here's how we've been stripping high ASCII characters while retaining any characters that may be salvaged:
string function ASCIINormalize(string inputString=""){
return createObject( 'java', 'java.text.Normalizer' ).normalize( javacast("string", arguments.inputString) , createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') ).replaceAll('\p{InCombiningDiacriticalMarks}+','').replaceAll('[^\p{ASCII}]+','');
}
productname = ASCIINormalize(productname);
/*
Comparisons using java UDF versus reReplace regex:
"ABC Café ’test" (note: High ASCII non-normal whitespace characters used.)
ASCIINormalize = "ABC Cafe test"
reReplace = "ABC Caf test"
"čeština"
ASCIINormalize = "cestina"
reReplace = "etina"
"Häuser Bäume Höfe Gärten"
ASCIINormalize = "Hauser Baume Hofe Garten"
reReplace = "Huser Bume Hfe Grten"
*/
This is due to a sequence of high ASCII characters that form emojis. I encountered similar issues when exporting MSSQL data to a UTF-8 file to be converted to Excel using a 3rd party tool. In this case, the database and file were correct, but the 3rd party tool would crash when encountering emoji characters.
Our approach to this was to convert emojis to their aliases so that information wasn't lost in the process. (If you strip high ASCII characters, you may lose some context.) To sanitize emojis to use aliases, I wrote cf-emoji-java, a ColdFusion component (CFC) that leverages emoji-java (a JAR file) to convert emojis to their 7-bit ASCII-safe aliases.
emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕'); // I like :pizza:
Since...
I'm not really in the business of supporting emojis
My data is just product names targeted at UK, Europe and the United States for the foreseeable future
I don't want to have to go through the same trouble with production (already defaulted to latin1_swedish_ci)
I decided to..
Match production, so I set the database, table, and fields to latin1_swedish_ci with help from
How to change the CHARACTER SET (and COLLATION) throughout a database?
and strip non ASCII characters in the product name
== edit don't do this, it takes out too many useful characters ==
<cfset productname = reReplace(productname, "[^\x20-\x7E]", "", "ALL")>

Performing updates using alternate key in Dynamics 365 WebAPI

Can anyone help, as I believe someone has already faced the issue I'm having.
I have a custom entity (alssc_anglesector) with an alternate key (alssc_name)
"alssc_ANGLESector@odata.bind": "/alssc_anglesectors(alssc_name='Air')",
"alssc_ANGLESector@odata.bind": "/alssc_anglesectors(alssc_name='Water Auth/Company')",
When I create an account and use the first bind with "Air", it works fine, but when I use the second one with "Auth/Company", I get this response:
"message": "Bad Request – Error in query syntax.",
"type": "Microsoft.OData.ODataException",
"stacktrace": " at Microsoft.OData.UriParser.ODataPathParser.ExtractSegmentIdentifierAndParenthesisExpression(String segmentText, String& identifier, String& parenthesisExpression)
I have also tried to encode it:
"alssc_ANGLESector@odata.bind": "/alssc_anglesectors(alssc_name=\u0027Water Auth\u002FCompany\u0027)",
but the end result was the same.
I haven't been able to get past this. Any ideas or suggestions?
Could it be a bug in the D365 Web API?
As per the answer in the comment thread: This request is unsupported because it contains Unicode characters
Unicode characters in key value
If the data within a field that is used in an alternate key will contain one of the following characters <,>,*,%,&,:,/,\ then update or upsert (PATCH) actions will not work.
The suggestion from the Microsoft Docs is to create another field (such as a code, or a simplified name) that does not contain these characters
https://learn.microsoft.com/en-us/powerapps/maker/common-data-service/define-alternate-keys-reference-records#unicode-characters-in-key-value
In this case you have to replace 'Water Auth/Company' with 'Water Auth%2FCompany' because of the '/' special character.
I hope it works.
Ugur
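The substitution suggested above can be sketched as a small helper. This only illustrates the percent-encoding step: `encodeKeyValue` and `bindPath` are hypothetical names, and whether the Web API accepts the encoded value for a given operation is an assumption to verify, since the documentation quoted earlier recommends a separate key field instead.

```javascript
// Characters the documentation lists as problematic in alternate key values.
const RESERVED = /[<>*%&:\/\\]/g;

// Hypothetical helper: percent-encode only the listed characters,
// leaving everything else (including spaces) untouched.
function encodeKeyValue(value) {
  return value.replace(RESERVED, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  );
}

// Hypothetical helper: build the path used as the value of an @odata.bind property.
// Single quotes inside keyValue are not handled in this sketch.
function bindPath(entitySet, keyName, keyValue) {
  return `/${entitySet}(${keyName}='${encodeKeyValue(keyValue)}')`;
}

console.log(bindPath("alssc_anglesectors", "alssc_name", "Water Auth/Company"));
// /alssc_anglesectors(alssc_name='Water Auth%2FCompany')
```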

Not able to add new intent in LUIS with ":"

My whole application has intents whose names contain ":". But now, when I try to add a new intent, I get the error "BadArgument: Intent and entity name cannot contain the character ":" or "$" "
Welcome to Stack Overflow. Unfortunately, these special characters should not be used in intent names because they are reserved for other uses. An example of how the colon is used is in entity roles in patterns. My recommendation is to rename your intents. I also recommend storing the intent names in your application as constant string resources so that the values can be easily changed.
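Keeping the intent names in one place might look like the following sketch. The intent names and the `:` to `_` renaming scheme are made up for illustration; the only constraint taken from the error message is that names must not contain `:` or `$`.

```javascript
// Hypothetical intent names, renamed from the "Order:Create" style to "Order_Create".
const Intents = Object.freeze({
  ORDER_CREATE: "Order_Create",
  ORDER_CANCEL: "Order_Cancel",
});

// Map a legacy name to its replacement by swapping the disallowed characters.
function renameLegacyIntent(name) {
  return name.replace(/[:$]/g, "_");
}

// Check a candidate name against the rule quoted in the error message.
function isAllowedIntentName(name) {
  return !/[:$]/.test(name);
}

console.log(renameLegacyIntent("Order:Create") === Intents.ORDER_CREATE); // true
```

With the names centralized, the bot code compares against `Intents.ORDER_CREATE` rather than a string literal, so a later rename touches one file.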

How to fix different intent getting identified when input contains special characters

In my LUIS application I have a 'Greeting' intent. The intent identified for 'hi' is 'Greeting' but for 'hi.......' some other intent is identified.
After training the 'hi.......' as 'Greeting' it gets identified as 'Greeting' correctly. There are some other variants too with special characters which need to be trained to make it work.
How do I get these inputs identified as Greeting without training on every variant with special characters?
This is being used in Microsoft Bot Framework v3 in C#
You can either train your LUIS model with all possible variations that include special characters or you can strip out all of the special characters before you send it to LUIS. I would recommend the latter. Here is an example of how you would do that in Node.
// Keep only ASCII letters and spaces; everything else is removed before the text reaches LUIS.
turnContext.activity.text = turnContext.activity.text.replace(/[^a-zA-Z ]/g, "");
Hope this helps!
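One caveat: the `[^a-zA-Z ]` character class also drops digits and any non-English letters. If those matter for intent detection, a variant along these lines keeps letters and digits from any script and removes only punctuation and symbols. This is a sketch; `stripSpecialCharacters` is a hypothetical helper, and it assumes a runtime that supports Unicode property escapes in regular expressions (Node 10 or later).

```javascript
// Hypothetical helper: remove everything except letters, combining marks,
// digits, and whitespace, then collapse the runs of whitespace left behind.
function stripSpecialCharacters(text) {
  return text
    .replace(/[^\p{L}\p{M}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripSpecialCharacters("hi......."));        // hi
console.log(stripSpecialCharacters("  what's   up?!  ")); // whats up
```

In a bot, the result would be assigned back to the activity text before the LUIS recognizer runs, the same way as in the one-liner above.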

Rules for field names in ElasticSearch 6?

Currently all that I can find online is:
must not start with underscore "_"
must not contain comma ","
must not contain hash mark "#"
usage of point "." is discouraged but possible
field names must not be longer than 255
But it seems that these are the rules for ElasticSearch 5 and older versions.
I did some experiments and found:
using dots (.) may result in various kinds of errors, e.g. illegal_state_exception, array_index_out_of_bounds_exception, but sometimes it's legal
empty strings are not allowed (illegal_argument_exception)
leading underscores, commas, hash marks seem to be legal in ElasticSearch 6
field names can be longer than 255 (but perhaps there's a new limit?)
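Until an official rule set turns up, one conservative option is to check names against the older, stricter rules plus the empty-string case observed above. The sketch below encodes only the rules listed in this question, not anything taken from Elasticsearch documentation; `checkFieldName` is a hypothetical helper, and counting the 255 limit in characters rather than bytes is an assumption.

```javascript
// Hypothetical checker encoding only the rules listed in this question.
// Returns a list of problems; an empty list means the name passes.
function checkFieldName(name) {
  const problems = [];
  if (name.length === 0) problems.push("empty");
  if (name.startsWith("_")) problems.push("starts with underscore");
  if (name.includes(",")) problems.push("contains comma");
  if (name.includes("#")) problems.push("contains hash mark");
  if (name.includes(".")) problems.push("contains dot (discouraged)");
  // Assumption: the 255 limit is counted in characters, not bytes.
  if (name.length > 255) problems.push("longer than 255");
  return problems;
}

console.log(checkFieldName("user_name").length);   // 0
console.log(checkFieldName("_id.raw").join("; "));
// starts with underscore; contains dot (discouraged)
```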
I wonder whether there's an official document for this? Am I just being blind?
We are currently planning an upgrade from 5.6.5 to 6.2.x.
I'm looking for evidence to support the worrying comment "...as underscores in field names will not be allowed" mentioned in Breaking Changes for Watcher in 6.0.0-alpha2.
I've been unable to find any additional evidence that underscores are now verboten. I'll open a support case referencing this question to get an official response on this.

Resources