VB function that strips accents from a string - vbscript

I'm trying to have a function which takes a string and returns the same string without any accented letters. Instead, the accented letters should return the same letter without the accent. This function is not working:
function StripAccents(str)
accent = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
noaccent = "EEEEUUIIAAOOCeeeeuuiiaaooc"
currentChar = ""
result = ""
k = 0
o = 0
FOR k = 1 TO len(str)
currentChar = mid(str,k, 1)
o = InStr(accent, currentChar)
IF o > 0 THEN
result = result & mid(noaccent,k,1)
ELSE
result = result & currentChar
END IF
NEXT
StripAccents = result
End function
testStr = "Test : à é À É ç"
response.write(StripAccents(testStr))
This is the result using the above:
Test : E E Eu EE E

Disregarding possible encoding problems - you must change
result = result & mid(noaccent,k,1)
to
result = result & mid(noaccent,o,1)

I tried the example code with the correction added
Then I added more characters
Giving:
accent = "àèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛáéíóúÁÉÍÓÚðÐýÝãñõÃÑÕšŠžŽçÇåÅøØ"
noaccent = "aeiouAEIOUaeiouAEIOUaeiouAEIOUaeiouAEIOUdDyYanoANOsSzZcCaAoO"
Now I realised that there are a few more to deal with, namely
æ
Æ
ß
These need converting first using a simple replace them with ae AE and ss
Then it works fine other than it is important to not have <%#LANGUAGE="VBSCRIPT" CODEPAGE="65001"%> or similar in the code
However having meta charset="UTF-8" in the header is not a big issue, it converts fine.
So if the code is needed on the page with <%#LANGUAGE="VBSCRIPT" CODEPAGE="65001"%> in it, I do not know any answer to that
Thanks for the code greener, very useful for dealing with the common diacriticals :-)

You should probably do a decomposition normalization first (NFD). I think you could do this in VBA using a call to the WinAPI function NormalizeString (https://msdn.microsoft.com/en-us/library/windows/desktop/dd319093(v=vs.85).aspx). Then, you could remove the accent code points.

Related

I wrote a code to update the Lettering of the first name in Zoho but it's not working

Here's the deluge script to capitalize the first letter of the sentence and make the other letters small that isn't working:
a = zoho.crm.getRecordById("Contacts",input.ID);
d = a.get("First_Name");
firstChar = d.subString(0,1);
otherChars = d.removeFirstOccurence(firstChar);
Name = firstChar.toUppercase() + otherChars.toLowerCase();
mp = map();
mp.put("First_Name",d);
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
info Name;
info b;
I tried capitalizing the first letter of the alphabet and make the other letters small. But it isn't working as expected.
Try using concat
Name = firstChar.toUppercase().concat( otherChars.toLowerCase() );
Try removing the double-quotes from the Name value in the the following statement. The reason is that Name is a variable holding the case-adjusted name, but "Name" is the string "Name".
From:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
To
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":Name});

VBScript: How can I trim a number to 4 decimal places but not round it?

My code currently looks like this:
FormatNumber((CDbl(0.05935)),4)
The returned value is 0.0594 rather than 0.0593 which is what I need.
You can try parsing this number to string then trimming it and again parsing back to float.
example:
v = 100.0097
x = Str$(v) ' Gives " 100.0097"
//This adds a leading space for positive numbers
or
x = CStr(v) ' Gives "100.0097"
and then trim it as your need
finalstr = LEFT(variable, (LEN(variable)-4))
then parse it to float
finaltrimed = CDbl(finalstr)

hidden space in excel

I tried almost all the methods (CLEAN,TRIM,SUBSTITUTE) trying to remove the character hiding in the beginning and the end of a text. In my case, I downloaded the bill of material report from oracle ERP and found that the item codes are a victim of hidden characters.
After so many findings, I was able to trace which character is hidden and found out that it's a question mark'?' (via VBA code in another thread) both at the front and the end. You can take this item code‭: ‭11301-21‬
If you paste the above into your excel and see its length =LEN(), you can understand my problem much better.
I need a good solution for this problem. Therefore please help!
Thank you very much in advance.
Thanks to Gary's Student, because his answer inspired me.
Also, I used this answer for this code.
This function will clean every single char of your data, so it should work for you. You need 2 functions: 1 to clean the Unicode chars, and other one to clean your item codes_
Public Function CLEAN_ITEM_CODE(ByRef ThisCell As Range) As String
If ThisCell.Count > 1 Or ThisCell.Count < 1 Then
CLEAN_ITEM_CODE = "Only single cells allowed"
Exit Function
End If
Dim ZZ As Byte
For ZZ = 1 To Len(ThisCell.Value) Step 1
CLEAN_ITEM_CODE = CLEAN_ITEM_CODE & GetStrippedText(Mid(ThisCell.Value, ZZ, 1))
Next ZZ
End Function
Private Function GetStrippedText(txt As String) As String
If txt = "–" Then
GetStrippedText = "–"
Else
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
regEx.Pattern = "[^\u0000-\u007F]"
GetStrippedText = regEx.Replace(txt, "")
End If
End Function
And this is what i get using it as formula in Excel. Note the difference in the Len of strings:
Hope this helps
You have characters that look like a space character, but are not. They are UniCode 8236 & 8237.
Just replace them with a space character (ASCII 32).
EDIT#1:
Based on the string in your post, the following VBA macro will replace UniCode characters 8236 amd 8237 with simple space characters:
Sub Kleanup()
Dim N1 As Long, N2 As Long
Dim Bad1 As String, Bad2 As String
N1 = 8237
Bad1 = ChrW(N1)
N2 = 8236
Bad2 = ChrW(N2)
Cells.Replace what:=Bad1, replacement:=" ", lookat:=xlPart
Cells.Replace what:=Bad2, replacement:=" ", lookat:=xlPart
End Sub

How do I use Base64 and exclude special characters +/= from string?

I am currently using Ruby's 'base64' but the strings that are created have special characters like /+= .
How do I remove these and still make sure that my decode works in the future?
Essentially I want alphanumeric to be used.
Rather than invent something new, I'd use Base64.urlsafe_encode64 (and its counterpart Base64.urlsafe_decode64) which is basically base64 with + and / replaced with - and _. This conforms to rfc 4648 so should be widely understandable
If you want alphanumeric, I think it is better and is practical to use base 36. Ruby has built-in encoding/decoding up to base 36 (26 letters and 10 numbers).
123456.to_s(36)
# => "qglj"
"qglj".to_i(36)
# => 123456
class Integer
Base62_digits = [*("0".."9"), *("a".."z"), *("A".."Z")]
def base_62
return "0" if zero?
sign = self < 0 ? "-" : ""
n, res = self.abs, ""
while n > 0
n, units = n.divmod(62)
res = Base62_digits[units] + res
end
sign + res
end
end
p 124.base_62 # => "20"
This could be adapted to handle lower bases, but it may be sufficient as is.

format string (postcode) in ruby

I need to re-format a list of UK postcodes and have started with the following to strip whitespace and capitalize:
postcode.upcase.gsub(/\s/,'')
I now need to change the postcode so the new postcode will be in a format that will match the following regexp:
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$
I would be grateful of any assistance.
If this standards doc is to be believed (and Wikipedia concurs), formatting a valid post code for output is straightforward: the last three characters are the second part, everything before is the first part!
So assuming you have a valid postcode, without any pre-embedded space, you just need
def format_post_code(pc)
pc.strip.sub(/([A-Z0-9]+)([A-Z0-9]{3})/, '\1 \2')
end
If you want to validate an input post code first, then the regex you gave looks like a good starting point. Perhaps something like this?
NORMAL_POSTCODE_RE = /^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[A-HJKS-UW0-9]?)\s*([0-9][ABD-HJLN-UW-Z]{2})$/i
GIROBANK_POSTCODE_RE = /^GIR\s*0AA$/i
def format_post_code(pc)
return pc.strip.upcase.sub(NORMAL_POSTCODE_RE, '\1 \2') if pc =~ NORMAL_POSTCODE_RE
return 'GIR 0AA' if pc =~ GIROBANK_POSTCODE_RE
end
Note that I removed the '0-9' part of the first character, which appears unnecessary according to the sources I quoted. I also changed the alpha sets to match the first-cited document. It's still not perfect: a code of the format 'AAA ANN' validates, for example, and I think a more complex RE is probably required.
I think this might cover it (constructed in stages for easier fixing!)
A1 = "[A-PR-UWYZ]"
A2 = "[A-HK-Y]"
A34 = "[A-HJKS-UW]" # assume rule for alpha in fourth char is same as for third
A5 = "[ABD-HJLN-UW-Z]"
N = "[0-9]"
AANN = A1 + A2 + N + N # the six possible first-part combos
AANA = A1 + A2 + N + A34
ANA = A1 + N + A34
ANN = A1 + N + N
AAN = A1 + A2 + N
AN = A1 + N
PART_ONE = [AANN, AANA, ANA, ANN, AAN, AN].join('|')
PART_TWO = N + A5 + A5
NORMAL_POSTCODE_RE = Regexp.new("^(#{PART_ONE})[ ]*(#{PART_TWO})$", Regexp::IGNORECASE)
UK Postcodes aren't consistent, but they are finite - you might be better with a look-up table.
Reformat or pattern match? I suspect the latter, although upcasing it first is a good idea.
Before we proceed though I would point out that you are stripping spaces but your regex contains " {1,2}" which is "one or two space characters". As you have already stripped whitespace you've already caused all to fail the match.
Given a post code as input we can check whether it matches the regex using =~
Here we create some example post codes (taken from the wikipedia page), and test each one against the regex:
post_codes = ["M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB", "bad one", "cc93h29r2"]
r = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/
post_codes.each do |pc|
# pc =~ r will return something true if we have a match (specifically the integer of first match position)
# We use !! to display it as true|false
puts "#{pc}: #{!!(pc =~ r)}"
end
M1 1AA: true
M60 1NW: true
CR2 6XH: true
DN55 1PT: true
W1A 1HQ: true
EC1A 1BB: true
bad one: false
cc93h29r2: false

Resources