UTF-8 value of a character in ColdFusion? - utf-8

In ColdFusion I can determine the ASCII value of a character by using asc().
How do I determine the UTF-8 value of a character?

<cfscript>
x = "漢"; // 3 bytes
// bytes of unicode character, a.k.a. String.getBytes("UTF-8")
bytes = charsetDecode(x, "UTF-8");
writeDump(bytes); // -26-68-94
// convert the 3 bytes to Hex
hex = binaryEncode(bytes, "HEX");
writeDump(hex); // E6BCA2
// convert the Hex to Dec
dec = inputBaseN(hex, 16);
writeDump(dec); // 15121570
// asc() uses the UCS-2 representation: 漢 = Hex 6F22 = Dec 28450
asc = asc(x);
writeDump(asc); // 28450
</cfscript>
UCS-2 is fixed at 2 bytes per character, so it cannot represent every Unicode character (code points above U+FFFF need 4 bytes in UTF-8 or UTF-16). But what are you actually trying to achieve here?
Note: If you run this example and get more than 3 bytes returned, make sure CF picks up the file as UTF-8 (with BOM).
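For comparison, here is a minimal sketch of the same steps in Python (not ColdFusion, purely as a cross-check of the numbers above). The variable names are made up for illustration.

```python
ch = "漢"

# UTF-8 bytes of the character (3 bytes for this one)
utf8 = ch.encode("utf-8")

# Java/ColdFusion show bytes as signed values, so map 128..255 to -128..-1
signed = [b - 256 if b > 127 else b for b in utf8]

# The same bytes as a hex string, and that hex string read as one number
hex_str = utf8.hex().upper()
as_int = int.from_bytes(utf8, "big")

# The code point, which is what asc() returns for this character
code_point = ord(ch)

print(signed)      # [-26, -68, -94]
print(hex_str)     # E6BCA2
print(as_int)      # 15121570
print(code_point)  # 28450
```

Note that the "UTF-8 value" (15121570) is just the encoded bytes read as one big-endian number, while the code point (28450) is independent of any encoding.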

Related

String of bits to byte. Value out of range - converting error java

I need to convert a set of bits that is represented as a String. The length of my String is a multiple of 8, so I can split it into sub-strings of 8 bits each. Then I have to convert these sub-strings to bytes and print them in HEX. For example:
String seq = "0100000010000110";
The seq is much longer, but this is not the topic. Below you can see two sub-strings from the seq. And with one of them I have trouble, why?
String s_ok = "01000000"; //this value is OK to convert
String s_error = "10000110"; //this is not OK to convert but in HEX it is 86 in DEC 134
byte nByte = Byte.parseByte(s_ok, 2);
System.out.println(nByte);
try {
    byte bByte = Byte.parseByte(s_error, 2);
    System.out.println(bByte);
} catch (Exception e) {
    System.out.println(e); //Value out of range. Value:"10000110" Radix:2
}
int in=Integer.parseInt(s_error, 2);
System.out.println("s_error set of bits in DEC - "+in + " and now in HEX - "+Integer.toHexString((byte)in)); //s_error set of bits in DEC - 134 and now in HEX - ffffff86
I can't understand why there is an error; a calculator has no problem converting 10000110. So I tried Integer, and I get ffffff86 instead of simply 86.
Please help with: why does this happen, and how can I avoid the issue?
Well, I found how to avoid ffffff:
System.out.println("s_error set of bits in DEC - "+in + " and now in HEX - "+Integer.toHexString((byte)in & 0xFF));
0xFF was added. The bad thing is that I still don't know where those ffffff came from, and it is not clear to me what I've actually done. Is it some kind of byte multiplication or is it masking? I'm lost.
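A rough sketch of what is going on, written in Python to imitate Java's behaviour (the helper names to_signed_byte and java_to_hex_string are invented for this illustration). A Java byte is signed and holds -128 to 127, so 134 is out of range for Byte.parseByte. Casting 134 to byte keeps the low 8 bits and reads them as -122; when that byte is widened back to a 32-bit int the sign bit is copied into the upper 24 bits, which is where ffffff comes from. The & 0xFF is a mask that clears those upper bits again.

```python
def to_signed_byte(value):
    """Keep the low 8 bits and read them as a signed byte, like Java's (byte) cast."""
    low = value & 0xFF
    return low - 256 if low >= 128 else low


def java_to_hex_string(value):
    """Hex digits of a 32-bit two's-complement int, imitating Integer.toHexString."""
    return format(value & 0xFFFFFFFF, "x")


n = int("10000110", 2)           # 134, too large for a signed byte
b = to_signed_byte(n)            # -122 after the (byte) cast

sign_extended = java_to_hex_string(b)       # byte widened to int, sign bit copied
masked = java_to_hex_string(b & 0xFF)       # upper 24 bits cleared by the mask

print(n, b)            # 134 -122
print(sign_extended)   # ffffff86
print(masked)          # 86
```

So it is masking, not multiplication: 0xFF has only the lowest 8 bits set, and the bitwise AND keeps exactly those bits.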

Weird output characters (Chinese characters) when using Ruby to read / write CSV

I'm trying to print the first 5 lines from a set of large (>500MB) csv files into small headers in order to inspect the content more easily.
I'm using Ruby code to do this but am getting each line padded out with extra Chinese characters, like this:
week_num type ID location total_qty A_qty B_qty count਍㌀㐀ऀ猀漀爀琀愀戀氀攀ऀ㄀㤀㜀ऀ䐀䔀开伀渀氀礀ऀ㔀㐀㜀㈀ ㌀ऀ㔀㐀㜀㈀ ㌀ऀ ऀ㤀㄀㈀㔀㌀ഀ
44 small 14 A 907859 907859 0 550360਍㐀㄀ऀ猀漀爀琀愀戀氀攀ऀ㐀㈀㄀ऀ䐀䔀开伀渀氀礀ऀ㌀ ㈀㄀㜀㐀ऀ㌀ ㈀㄀
The first few lines of input file are like so:
week_num type ID location total_qty A_qty B_qty count
34 small 197 A 547203 547203 0 91253
44 small 14 A 907859 907859 0 550360
41 small 421 A 302174 302174 0 18198
The strange characters appear to be Line 1 and Line 3 of the data.
Here's my Ruby code:
num_lines=ARGV[0]
fh = File.open(file_in,"r")
fw = File.open(file_out,"w")
until (line=fh.gets).nil? or num_lines==0
fw.puts line if outflag
num_lines = num_lines-1
end
Any idea what's going on and what I can do to simply stop at the line end character?
Looking at input/output files in hex (useful suggestion by @user1934428)
Input file - each character looks to be two bytes.
Output file - notice the NULL (00) between each single byte character...
Ruby version 1.9.1
The problem is an encoding mismatch which is happening because the encoding is not explicitly specified in the read and write parts of the code. Read the input csv as a binary file "rb" with utf-16le encoding. Write the output in the same format.
num_lines = ARGV[0].to_i # ARGV values are strings, so convert before counting down
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in,"rb:utf-16le")
fw = File.open(file_out,"wb:utf-16le")
until (line=fh.gets).nil? or num_lines==0
  fw.puts line
  num_lines = num_lines-1
end
fh.close
fw.close
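To see why misread UTF-16LE text turns into Chinese-looking characters, here is a small sketch in Python (not Ruby, purely for illustration). It assumes that a single stray byte ends up in front of the text so that every 2-byte unit is off by one; that is a guess at the mechanism, not something confirmed in the question.

```python
text = "34\t"

# In UTF-16LE every ASCII character becomes <byte> 00
raw = text.encode("utf-16-le")
print(raw.hex(" "))   # 33 00 34 00 09 00

# Assumed mishap: one extra byte in front shifts the pairing by one,
# so each unit is now read as 00 <byte> instead of <byte> 00
shifted = b"\x00" + raw[:-1]
misread = shifted.decode("utf-16-le")

print([hex(ord(c)) for c in misread])   # ['0x3300', '0x3400', '0x900']
print(raw.decode("utf-16-le") == text)  # True when the alignment is intact
```

U+3300, U+3400 and U+0900 are exactly the "㌀㐀ऀ" that appears in the garbled output in place of "34" and the tab, which is consistent with the bytes being handled without regard to the 2-byte encoding.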
Useful references:
Working with encodings in Ruby 1.9
CSV encodings
Determining the encoding of a CSV file

Decrypting REG_NONE value in Registry

http://i.stack.imgur.com/xaP9s.jpg
Referring to the screenshot above as I'm not able to attach screenshot,
I want to convert the Filesize value which is in Hex to a String i.e. human readable format
The actual decimal value is 5.85 MB
When I convert it, I do not get the expected value of 5.85.
Can anyone suggest how to convert these values?
I have a set of these hex values and want to convert them into a human readable format.
Each pair of hexadecimal digits represents one byte, and the least significant byte is placed on the left:
0x00 -> 0
0xbb -> 187
0x5d -> 93
0*256^0 + 187*256^1 + 93*256^2 + 0*256^3 + 0*256^4 + 0*256^5 + 0*256^6 + 0*256^7
= 6142720
6142720 / 1024^2
= 5.85815
This storage format is called little-endian: https://en.wikipedia.org/wiki/Little-endian#Little-endian
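The arithmetic above can be checked with a short Python sketch. The hex string below is an assumption reconstructed from the worked sum (00 bb 5d followed by five zero bytes); the actual registry value is only visible in the screenshot.

```python
# Assumed raw value, inferred from the sum above: 00 bb 5d 00 00 00 00 00
raw = bytes.fromhex("00bb5d0000000000")

# Manual version of the sum: byte i contributes value * 256^i
manual = sum(b * 256 ** i for i, b in enumerate(raw))

# Built-in little-endian conversion gives the same number
size_bytes = int.from_bytes(raw, "little")

# Bytes to megabytes (1 MB taken as 1024^2 bytes, as in the answer)
size_mb = size_bytes / 1024 ** 2

print(manual)              # 6142720
print(size_bytes)          # 6142720
print(round(size_mb, 5))   # 5.85815
```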

Is byte 0xFF valid in a UTF-8 encoded string?

Can a UTF-8 string contain the byte 0xFF (255)?
No. It is specifically forbidden by the spec.
In UTF-8, a one-byte sequence covers the code points U+0000 through U+007F, and the high bit of that single byte is always 0.
The bytes 0xFE and 0xFF are not valid anywhere in UTF-8.
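A quick way to convince yourself is to let a strict decoder try it. This is a minimal sketch in Python; is_valid_utf8 is a made-up helper name, and it relies on Python's built-in UTF-8 decoder rejecting malformed input.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xff"))                # False
print(is_valid_utf8(b"abc\xfe"))             # False
print(is_valid_utf8("漢".encode("utf-8")))   # True

# Even the highest code point, U+10FFFF, never needs a byte above 0xF4
highest = chr(0x10FFFF).encode("utf-8")
print(highest.hex(" "))                      # f4 8f bf bf
```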

Limiting the size of vbstring to 10240 bytes in VB6 [duplicate]

Possible Duplicates:
How to declare variable containing character limiting to 1000 bytes in vb6
“Object variable or With block variable not set” runtime error in VB6
Exact duplicate of the asker's own question: How to declare variable containing character limiting to 1000 bytes in vb6
How do I declare the size of a string variable as 10240 bytes in VB6?
Try
Dim s As String * 5120
' Gives 10240 bytes, as pointed out by KristoferA
This ensures the string is ALWAYS 5120 characters long; if it holds fewer, it is padded with spaces, e.g.
Dim s As String * 10
s = "Hello"
Debug.Print "[" & s & "]"
gives
[Hello     ]
10240 bytes* or characters*?
Dim strFoo As String * 5120 ' 10240 bytes
Dim strFoo As String * 10240 ' 10240 characters
(* = VB6 strings are unicode, so each character in a string takes 2 bytes)
This is the syntax for a fixed-length string of 5120 characters, which is 10240 bytes. The value will always have 5120 characters - trailing spaces will be added, or excess characters truncated. VB6 strings are Unicode (UTF-16) and therefore each character has 2 bytes.
Dim s As String * 5120 ' 5120 characters, 10240 bytes
It's not clear whether you are dealing with binary data rather than text. The Byte data type is better for binary data.
Dim byt(0 To 10239) As Byte ' an array of 10240 bytes
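The characters-versus-bytes arithmetic can be illustrated with a small Python sketch (not VB6). The helper fixed_length is hypothetical and only imitates assigning to a fixed-length string: pad with spaces or truncate. UTF-16LE is used to stand in for VB6's 2-bytes-per-character storage.

```python
def fixed_length(value, width):
    """Imitate assignment to a fixed-length string: truncate or pad with spaces."""
    return value[:width].ljust(width)


s = fixed_length("Hello", 10)
print("[" + s + "]")   # [Hello     ]

# 5120 characters at 2 bytes each come to 10240 bytes
big = fixed_length("Hello", 5120)
byte_count = len(big.encode("utf-16-le"))
print(len(big), byte_count)   # 5120 10240

# Excess characters are cut off
print(fixed_length("Hello, world", 5))   # Hello
```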
