I'm trying to write a function that takes a string and returns the same string without any accented letters; each accented letter should be replaced by the same letter without the accent. This function is not working:
function StripAccents(str)
    accent = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
    noaccent = "EEEEUUIIAAOOCeeeeuuiiaaooc"
    currentChar = ""
    result = ""
    k = 0
    o = 0
    FOR k = 1 TO len(str)
        currentChar = mid(str,k, 1)
        o = InStr(accent, currentChar)
        IF o > 0 THEN
            result = result & mid(noaccent,k,1)
        ELSE
            result = result & currentChar
        END IF
    NEXT
    StripAccents = result
End function
testStr = "Test : à é À É ç"
response.write(StripAccents(testStr))
This is the result using the above:
Test : E E Eu EE E
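Disregarding possible encoding problems, you must change
result = result & mid(noaccent,k,1)
to
result = result & mid(noaccent,o,1)
because o is the position of the character within accent, whereas k is its position within the input string.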
I tried the example code with the correction applied, then added more characters, giving:
accent = "àèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛáéíóúÁÉÍÓÚðÐýÝãñõÃÑÕšŠžŽçÇåÅøØ"
noaccent = "aeiouAEIOUaeiouAEIOUaeiouAEIOUaeiouAEIOUdDyYanoANOsSzZcCaAoO"
I then realised there are a few more to deal with, namely:
æ
Æ
ß
These need converting first, with a simple Replace to ae, AE and ss respectively.
After that it works fine, except that it is important not to have <%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%> or similar in the code.
However, having meta charset="UTF-8" in the header is not a big issue; it converts fine.
So if the code is needed on a page with <%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%> in it, I don't know the answer to that.
Thanks for the code, greener; very useful for dealing with the common diacritics :-)
You should probably apply a canonical decomposition (Unicode normalization form NFD) first. I think you could do this in VBA with a call to the WinAPI function NormalizeString (https://msdn.microsoft.com/en-us/library/windows/desktop/dd319093(v=vs.85).aspx). You could then remove the combining accent code points.
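The decompose-then-strip idea can be illustrated in Ruby (used here purely for illustration; this is not the WinAPI NormalizeString route, and strip_accents is a hypothetical helper name). String#unicode_normalize(:nfd) splits a character such as à into a plus a combining grave accent, and the \p{Mn} regex class (non-spacing marks) then deletes the combining marks:

```ruby
# Decompose each character (NFD) so accents become separate combining marks,
# then delete every combining mark (Unicode general category Mn).
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("Test : à é À É ç")  # => Test : a e A E c
```

Characters with no canonical decomposition, such as æ, ß, ð and ø, pass through unchanged, so they still need an explicit mapping like the one in the earlier answer.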
Here's my Deluge script, which is meant to capitalize the first letter and make the other letters lowercase, but isn't working:
a = zoho.crm.getRecordById("Contacts",input.ID);
d = a.get("First_Name");
firstChar = d.subString(0,1);
otherChars = d.removeFirstOccurence(firstChar);
Name = firstChar.toUppercase() + otherChars.toLowerCase();
mp = map();
mp.put("First_Name",d);
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
info Name;
info b;
I tried to capitalize the first letter and make the other letters lowercase, but it isn't working as expected.
Try using concat:
Name = firstChar.toUppercase().concat( otherChars.toLowerCase() );
Try removing the double quotes around Name in the following statement. Name is a variable holding the case-adjusted name, whereas "Name" is just the literal string "Name".
From:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":"Name"});
To:
b = zoho.crm.updateRecord("Contacts", Name,{"First_Name":Name});
Also note that the second argument of zoho.crm.updateRecord is the record ID, so you probably want input.ID there rather than Name.
My code currently looks like this:
FormatNumber((CDbl(0.05935)),4)
The returned value is 0.0594 (rounded) rather than 0.0593 (truncated), which is what I need.
You can try converting the number to a string, trimming it, and then converting it back to a floating-point number.
Example:
v = 100.0097
x = Str$(v) ' Gives " 100.0097"
' Str$ adds a leading space for positive numbers
or
x = CStr(v) ' Gives "100.0097"
and then trim it as needed (here the last four characters are dropped; adjust the count to suit):
finalstr = Left(x, Len(x) - 4)
then convert it back to a number:
finalTrimmed = CDbl(finalstr)
I tried almost every method (CLEAN, TRIM, SUBSTITUTE) to remove the characters hidden at the beginning and end of a text value. In my case, I downloaded the bill of materials report from Oracle ERP and found that the item codes contain hidden characters.
After so many findings, I was able to trace which character is hidden and found out that it's a question mark'?' (via VBA code in another thread) both at the front and the end. You can take this item code: 11301-21
If you paste it into Excel and check its length with =LEN(), you will understand my problem much better.
I need a good solution for this problem, so please help!
Thank you very much in advance.
Thanks to Gary's Student, because his answer inspired me.
Also, I used this answer for this code.
This function cleans every single character of your data, so it should work for you. You need two functions: one to clean the Unicode characters, and another to clean your item codes:
Public Function CLEAN_ITEM_CODE(ByRef ThisCell As Range) As String
    ' Accept exactly one cell
    If ThisCell.Count <> 1 Then
        CLEAN_ITEM_CODE = "Only single cells allowed"
        Exit Function
    End If
    ' Long rather than Byte, so text longer than 255 characters doesn't overflow
    Dim ZZ As Long
    For ZZ = 1 To Len(ThisCell.Value)
        ' Rebuild the text one character at a time, dropping non-ASCII characters
        CLEAN_ITEM_CODE = CLEAN_ITEM_CODE & GetStrippedText(Mid(ThisCell.Value, ZZ, 1))
    Next ZZ
End Function

Private Function GetStrippedText(txt As String) As String
    If txt = "–" Then
        ' Keep the en dash even though it is outside the ASCII range
        GetStrippedText = "–"
    Else
        Dim regEx As Object
        Set regEx = CreateObject("vbscript.regexp")
        ' Match any character outside U+0000-U+007F, i.e. anything non-ASCII
        regEx.Pattern = "[^\u0000-\u007F]"
        GetStrippedText = regEx.Replace(txt, "")
    End If
End Function
And this is what I get using it as a formula in Excel. Note the difference in the LEN of the strings:
Hope this helps
You have characters that look like spaces (or are invisible) but are not. They are Unicode 8236 and 8237 (U+202C POP DIRECTIONAL FORMATTING and U+202D LEFT-TO-RIGHT OVERRIDE).
Just replace them with a space character (ASCII 32).
EDIT#1:
Based on the string in your post, the following VBA macro will replace Unicode characters 8236 and 8237 with simple space characters:
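Sub Kleanup()
    Dim N1 As Long, N2 As Long
    Dim Bad1 As String, Bad2 As String
    N1 = 8237            ' U+202D LEFT-TO-RIGHT OVERRIDE
    Bad1 = ChrW(N1)
    N2 = 8236            ' U+202C POP DIRECTIONAL FORMATTING
    Bad2 = ChrW(N2)
    ' Replace both characters wherever they occur in the active sheet's cells
    Cells.Replace what:=Bad1, replacement:=" ", lookat:=xlPart
    Cells.Replace what:=Bad2, replacement:=" ", lookat:=xlPart
End Sub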
I am currently using Ruby's base64 library, but the strings it creates contain special characters like /, + and =.
How do I remove these and still make sure that decoding works in the future?
Essentially, I want only alphanumeric characters to be used.
Rather than invent something new, I'd use Base64.urlsafe_encode64 (and its counterpart Base64.urlsafe_decode64), which is basically Base64 with + and / replaced by - and _. This conforms to RFC 4648, so it should be widely understood.
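A minimal sketch of the URL-safe variant, assuming Ruby 2.3 or later (where urlsafe_encode64 accepts padding: false and urlsafe_decode64 accepts unpadded input). The three input bytes are chosen so that standard Base64 produces a /:

```ruby
require "base64"

# Arbitrary input; these three bytes encode to "Pz4/" in standard Base64
data = "?>?"

standard = Base64.strict_encode64(data)                   # => "Pz4/"
url_safe = Base64.urlsafe_encode64(data, padding: false)  # => "Pz4_"

# Decoding reverses the character substitution; missing "=" padding is tolerated
restored = Base64.urlsafe_decode64(url_safe)
puts restored == data  # => true
```

Note that the output can still contain - and _, so it is URL-safe rather than strictly alphanumeric.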
If you want purely alphanumeric output, I think base 36 is the practical choice: Ruby's Integer#to_s and String#to_i support bases up to 36 (10 digits plus 26 letters).
1234567.to_s(36)
# => "qglj"
"qglj".to_i(36)
# => 1234567
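Base 36 works on integers, so to encode arbitrary text one option is to treat its bytes as a single big integer. This is a sketch under that assumption; to_base36 and from_base36 are hypothetical helper names, and leading zero bytes (and the empty string) do not survive the round trip:

```ruby
# Encode text as lowercase alphanumerics by reading its bytes as one
# big unsigned integer (via a hex string) and printing it in base 36.
def to_base36(str)
  str.unpack1("H*").to_i(16).to_s(36)
end

# Reverse the process: base 36 -> integer -> hex -> raw bytes.
def from_base36(encoded)
  hex = encoded.to_i(36).to_s(16)
  hex = "0" + hex if hex.length.odd?  # pack("H*") needs whole bytes
  [hex].pack("H*").force_encoding(Encoding::UTF_8)
end

encoded = to_base36("héllo")
puts from_base36(encoded)  # => héllo
```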
Alternatively, base 62 (digits plus lower- and upper-case letters) can be implemented by extending Integer:
class Integer
  Base62_digits = [*("0".."9"), *("a".."z"), *("A".."Z")]
  def base_62
    return "0" if zero?
    sign = self < 0 ? "-" : ""
    n, res = self.abs, ""
    while n > 0
      n, units = n.divmod(62)
      res = Base62_digits[units] + res
    end
    sign + res
  end
end
p 124.base_62 # => "20"
This could be adapted to handle lower bases, but it may be sufficient as is.
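The base_62 method above only encodes. A possible inverse, assuming the same digit order (0-9, a-z, A-Z), is sketched below; decode_base62 and the constants are hypothetical names, not part of the answer:

```ruby
# Same digit order as Integer#base_62: 0-9, then a-z, then A-Z.
BASE62_DIGITS = [*("0".."9"), *("a".."z"), *("A".."Z")]
BASE62_VALUES = BASE62_DIGITS.each_with_index.to_h  # {"0"=>0, ..., "Z"=>61}

# Turns e.g. "20" back into 124. Raises KeyError on characters
# outside the base-62 alphabet.
def decode_base62(str)
  negative = str.start_with?("-")
  digits = negative ? str[1..-1] : str
  value = digits.each_char.reduce(0) { |acc, ch| acc * 62 + BASE62_VALUES.fetch(ch) }
  negative ? -value : value
end

p decode_base62("20")  # => 124
```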
I need to re-format a list of UK postcodes and have started with the following to strip whitespace and capitalize:
postcode.upcase.gsub(/\s/,'')
I now need to reformat each postcode so that it matches the following regexp:
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$
I would be grateful for any assistance.
If this standards doc is to be believed (and Wikipedia concurs), formatting a valid post code for output is straightforward: the last three characters are the second part, everything before is the first part!
So, assuming you have a valid postcode without any embedded space, you just need:
def format_post_code(pc)
  pc.strip.sub(/([A-Z0-9]+)([A-Z0-9]{3})/, '\1 \2')
end
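A short usage sketch combining this with the question's own clean-up step (upcase, then delete all whitespace); the input list is made up for illustration:

```ruby
# format_post_code as in the answer: the last three characters form
# the second part, everything before them the first part.
def format_post_code(pc)
  pc.strip.sub(/([A-Z0-9]+)([A-Z0-9]{3})/, '\1 \2')
end

# Hypothetical raw input with inconsistent case and spacing
raw = ["m1 1aa", " ec1a1bb ", "DN55  1PT"]
formatted = raw.map { |postcode| format_post_code(postcode.upcase.gsub(/\s/, "")) }
p formatted  # => ["M1 1AA", "EC1A 1BB", "DN55 1PT"]
```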
If you want to validate an input post code first, then the regex you gave looks like a good starting point. Perhaps something like this?
NORMAL_POSTCODE_RE = /^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[A-HJKS-UW0-9]?)\s*([0-9][ABD-HJLN-UW-Z]{2})$/i
GIROBANK_POSTCODE_RE = /^GIR\s*0AA$/i
def format_post_code(pc)
  # Normalise first, so surrounding whitespace doesn't defeat the anchors
  pc = pc.strip.upcase
  return pc.sub(NORMAL_POSTCODE_RE, '\1 \2') if pc =~ NORMAL_POSTCODE_RE
  return 'GIR 0AA' if pc =~ GIROBANK_POSTCODE_RE
end
Note that I removed the '0-9' part of the first character, which appears unnecessary according to the sources I quoted. I also changed the alpha sets to match the first-cited document. It's still not perfect: a code of the format 'AAA NAA' validates, for example, and I think a more complex RE is probably required.
I think this might cover it (constructed in stages for easier fixing!)
A1 = "[A-PR-UWYZ]"
A2 = "[A-HK-Y]"
A34 = "[A-HJKS-UW]" # assume rule for alpha in fourth char is same as for third
A5 = "[ABD-HJLN-UW-Z]"
N = "[0-9]"
AANN = A1 + A2 + N + N # the six possible first-part combos
AANA = A1 + A2 + N + A34
ANA = A1 + N + A34
ANN = A1 + N + N
AAN = A1 + A2 + N
AN = A1 + N
PART_ONE = [AANN, AANA, ANA, ANN, AAN, AN].join('|')
PART_TWO = N + A5 + A5
NORMAL_POSTCODE_RE = Regexp.new("^(#{PART_ONE})[ ]*(#{PART_TWO})$", Regexp::IGNORECASE)
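A sketch of how the staged regex might be used to validate and format in one step (format_post_code returning nil for invalid input is an assumption, mirroring the previous answer). Unlike the earlier single regex, a three-letter first part such as "ABC" is now rejected:

```ruby
# Building blocks from the staged construction above
A1  = "[A-PR-UWYZ]"
A2  = "[A-HK-Y]"
A34 = "[A-HJKS-UW]"
A5  = "[ABD-HJLN-UW-Z]"
N   = "[0-9]"
PART_ONE = [A1 + A2 + N + N, A1 + A2 + N + A34, A1 + N + A34,
            A1 + N + N, A1 + A2 + N, A1 + N].join("|")
PART_TWO = N + A5 + A5
NORMAL_POSTCODE_RE = Regexp.new("^(#{PART_ONE})[ ]*(#{PART_TWO})$", Regexp::IGNORECASE)

# Returns the postcode with a single space inserted, or nil if it doesn't validate
def format_post_code(pc)
  m = NORMAL_POSTCODE_RE.match(pc.strip.upcase)
  m && "#{m[1]} #{m[2]}"
end

p format_post_code("ec1a1bb")  # => "EC1A 1BB"
p format_post_code("ABC 1AB")  # => nil
```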
UK postcodes aren't consistent, but they are finite; you might be better off with a look-up table.
Reformat or pattern match? I suspect the latter, although upcasing it first is a good idea.
Before we proceed, though, I would point out that you are stripping spaces but your regex contains " {1,2}", which means "one or two space characters". Since you have already stripped all whitespace, every postcode will fail to match.
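One way to keep the question's regex unchanged is to collapse each run of whitespace to a single space instead of deleting it. This is a sketch; normalise is a hypothetical helper, and it cannot help input that arrived with no space at all (for that, the space has to be re-inserted as in the formatting answer):

```ruby
# The regex from the question, unchanged
r = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/

# Upcase, trim the ends, and squash internal whitespace runs to one space,
# so the " {1,2}" part of the regex can still match.
def normalise(postcode)
  postcode.upcase.strip.gsub(/\s+/, " ")
end

p normalise("  m1   1aa ")        # => "M1 1AA"
p !!(normalise("  m1   1aa ") =~ r)  # => true
p !!(normalise("m11aa") =~ r)        # => false
```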
Given a postcode as input, we can check whether it matches the regex using =~.
Here we create some example post codes (taken from the wikipedia page), and test each one against the regex:
post_codes = ["M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB", "bad one", "cc93h29r2"]
r = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/
post_codes.each do |pc|
  # pc =~ r returns the index of the match (truthy) if there is one, otherwise nil
  # We use !! to display it as true|false
  puts "#{pc}: #{!!(pc =~ r)}"
end
M1 1AA: true
M60 1NW: true
CR2 6XH: true
DN55 1PT: true
W1A 1HQ: true
EC1A 1BB: true
bad one: false
cc93h29r2: false