Determine length in bytes of CLOB/NCLOB with multibyte charset - Oracle

I'm working with an Oracle database and want to determine the length in bytes of an NCLOB using a multibyte charset (UTF-8).
LENGTHB() does not support CLOBs or NCLOBs with a multibyte charset. I could convert the NCLOB to a BLOB and then get its length, but isn't there a better way to do this?

Oracle stores CLOBs in UTF-16 (or possibly your national character set?). Each character is stored as two bytes.
Here's how you can see the raw data:
SQL> CREATE TABLE test_clob (c CLOB, v VARCHAR2(10 CHAR), nv NVARCHAR2(10));
Table created
SQL> INSERT INTO test_clob VALUES ('0123456789', '0123456789', '0123456789');
1 row inserted
SQL> SELECT dbms_rowid.rowid_relative_fno(ROWID),
2 dbms_rowid.rowid_block_number(ROWID)
3 FROM test_clob;
DBMS_ROWID.ROWID_RELATIVE_FNO( DBMS_ROWID.ROWID_BLOCK_NUMBER(
------------------------------ ------------------------------
13 94314
SQL> alter system dump datafile 13 block 94314;
System altered
Navigate to your USER_DUMP_DEST directory and open the trace file; you should see something like this:
col 0: [56]
00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 00 03 64 c6 a5 00 24 09 00 00
00 00 00 00 14 00 00 00 00 00 01 00 30 00 31 00 32 00 33 00 34 00 35 00 36
00 37 00 38 00 39
LOB
Locator:
Length: 84(56)
Version: 1
Byte Length: 2
LobID: 00.00.00.01.00.00.03.64.c6.a5
Flags[ 0x02 0x0c 0x80 0x00 ]:
Type: CLOB
[...]
Inline data[20]
[...]
col 1: [10] 30 31 32 33 34 35 36 37 38 39
col 2: [20] 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39
As you can see, the CLOB (column 0) is composed of a few header bytes plus the same raw byte data as the UTF-16 NVARCHAR2 column.
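For comparison, the same 20 bytes dumped for column 2 can be reproduced by UTF-16BE-encoding the test string; a quick Python check outside the database:
# '0123456789' encoded as UTF-16BE gives exactly the 20 bytes dumped for col 2
print('0123456789'.encode('utf-16-be').hex(' '))
# 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39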
As such I think you will have to convert your CLOB to UTF-8 to determine its length in this character set.
Here's an example I've used with DBMS_LOB.CONVERTTOBLOB:
SQL> DECLARE
  2    l_clob CLOB := 'abcdéfghij'; -- the é will take two bytes!
  3    l_blob BLOB;
  4    l_dest_offset NUMBER := 1;
  5    l_src_offset NUMBER := 1;
  6    l_lang_context NUMBER := 0;
  7    l_warning NUMBER;
  8  BEGIN
  9    dbms_lob.createtemporary(l_blob, FALSE, dbms_lob.call);
 10    dbms_lob.converttoblob(dest_lob     => l_blob,
 11                           src_clob     => l_clob,
 12                           amount       => dbms_lob.lobmaxsize,
 13                           dest_offset  => l_dest_offset,
 14                           src_offset   => l_src_offset,
 15                           blob_csid    => nls_charset_id('AL32UTF8'),
 16                           lang_context => l_lang_context,
 17                           warning      => l_warning);
 18    dbms_output.put_line('byte length:'||dbms_lob.getlength(l_blob));
 19    dbms_lob.freetemporary(l_blob);
 20  END;
 21  /
byte length:11
PL/SQL procedure successfully completed
You can convert to any character set by passing its name to the function nls_charset_id.
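As a quick cross-check outside the database, the UTF-8 byte length of the same test string can be computed directly; a minimal Python sketch:
s = 'abcdéfghij'
print(len(s))                  # 10 characters
print(len(s.encode('utf-8')))  # 11 bytes, matching the DBMS_LOB.GETLENGTH result above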

Related

Parse the following iBeacon packet

I am trying to parse this iBeacon packet received by scanning through an HCI socket:
b'\x01\x03\x00\x18\xbe\x99m\xf3\x14\x1e\x02\x01\x1a\x1a\xffL\x00\x02\x15e\xec\xe2\x90\xc7\xdbM\xd0\xb8\x1aV\xa6-b 2\x00\x00\x00\x02\xc5\xcc'
hex format 01 03 00 18 be 99 6d f3 14 1e 02 01 1a 1a ff 4c 00 02 15 65 ec e2 90 c7 db 4d d0 b8 1a 56 a6 2d 62 20 32 00 00 00 02 c5 cc
The parameters after applying the parser are:
'UUID': '65ece290c7db4dd0b81a56a62d622032', 'MAJOR': '0000', 'MINOR': '0002', 'TX': -59, 'RSSI': -60
I am not sure if the RSSI portion of this parsing is right.
Referring to this answer https://stackoverflow.com/a/19040616/10355673,
the last byte of the beacon advertising packet is the TX power value.
So how do we get the RSSI value? Here, I have taken RSSI to be cc and TX to be c5. Is this correct?
There are flag headers before the manufacturer advertisement sequence shown below, but you really don't care about the flags. Here are the bytes you care about:
ff # manufacturer adv type
4c 00 # apple Bluetooth company code
02 15 # iBeacon type code
65 ec e2 90 c7 db 4d d0 b8 1a 56 a6 2d 62 20 32 # proximity uuid
00 00 # major
00 02 # minor
c5 # measured power (tx power)
cc # crc
Proximity UUID: 65ece290-c7db-4dd0-b81a-56a62d622032,
Major: 0,
Minor: 2,
Measured Power: -59 dBm
The RSSI is not part of the transmitted packet, but a measurement taken by the receiver based on the strength of the signal. It will typically be a slightly different value for each packet that is received. You get this value from an API on a mobile device or embedded system that fetches it from the Bluetooth chip.
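For illustration, here is a minimal Python sketch of that layout applied to the hex dump in the question. The fixed offsets are an assumption based on the byte breakdown above; a robust parser would walk the AD structures instead:
packet = bytes.fromhex(
    "01 03 00 18 be 99 6d f3 14 1e 02 01 1a 1a ff 4c 00 02 15"
    " 65 ec e2 90 c7 db 4d d0 b8 1a 56 a6 2d 62 20 32"
    " 00 00 00 02 c5 cc")
i = packet.index(bytes.fromhex("ff4c000215"))               # manufacturer adv type + Apple code + iBeacon type
uuid = packet[i + 5:i + 21].hex()                           # 16-byte proximity UUID
major = int.from_bytes(packet[i + 21:i + 23], "big")        # 0
minor = int.from_bytes(packet[i + 23:i + 25], "big")        # 2
tx = int.from_bytes(packet[i + 25:i + 26], "big", signed=True)  # c5 -> -59 dBm (two's complement)
print(uuid, major, minor, tx)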

What is the algorithm behind Cassandra's token function?

The Token function in my driver doesn't support a composite partition key, but it works very well with a single partition key: it takes a binary (byte) form as input, passes it to the murmur3 hash function, extracts the signed 64-bit (little-endian) integer -- the token -- from the murmur3 result, and ignores any extra binary buffer.
So my hope is to generate the binary for a composite partition key and then pass it to murmur3 as usual. An algorithm or bitwise operations would be really helpful, or at least a source in any programming language.
I don't mean the murmur3 part, only the token side that converts/mixes the composite partition key and outputs raw bytes in binary form.
Take a look at the drivers, since they have to generate the token to find the correct coordinator: https://github.com/datastax/java-driver/blob/8be7570a3c7fbba773ae2581bbf26e8196e7d6fb/driver-core/src/main/java/com/datastax/driver/core/Token.java#L112
It's slightly different from the typical murmur3 due to a bug when it was made and the inability to change it without breaking existing clusters. So I would recommend copying it from them, or better yet, using the existing drivers to find the token.
Finally I found a solution to my question. The algorithm to compute the token for a composite partition key:
PRIMARY KEY ((text, int)) -> the partition key is a composite partition key (text, int).
Example: a row with composite partition key ('hello', 1)
Applying the algorithm:
1- Lay out each component of the composite partition key in its serialized big-endian form:
first_component = 'hello' -> 68 65 6c 6c 6f
sec_component = 1 -> 00 00 00 01
68 65 6c 6c 6f 00 00 00 01
2- Prepend the two-byte (big-endian) length of each component:
first_component = 'hello', length = 5 -> 00 05 68 65 6c 6c 6f
sec_component = 1, length = 4 -> 00 04 00 00 00 01
00 05 68 65 6c 6c 6f 00 04 00 00 00 01
3- Append a zero byte after each component:
first_component = 'hello' -> 00 05 68 65 6c 6c 6f 00
sec_component = 1 -> 00 04 00 00 00 01 00
4- Result:
00 05 68 65 6c 6c 6f 00 00 04 00 00 00 01 00
Now pass the result in whatever binary form your murmur3 function understands (make sure it's the Cassandra variant).
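A minimal Python sketch of that serialization (the function name is mine; the text component is assumed to be UTF-8 encoded and the int component a 4-byte big-endian value, as in the example above):
import struct

def composite_partition_key_bytes(components):
    # For each component: 2-byte big-endian length + component bytes + one 0x00 byte
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

key = composite_partition_key_bytes([b"hello", struct.pack(">i", 1)])
print(key.hex(" "))  # 00 05 68 65 6c 6c 6f 00 00 04 00 00 00 01 00
That byte string is what you then feed to the Cassandra variant of murmur3 to get the 64-bit signed token.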

Re-attempting BASIC 6502 N-byte integer addition?

I initially asked for help and wrote a BASIC program in the 6502 PET emulator which added two n-byte integers. However, the feedback I got was that it was simply adding two 16-bit integers (not adding n-byte integers).
Can anyone help me understand this feedback by looking at my code, and point me in the right direction to make a program that adds two n-byte integers?
Thank You for the collaboration!
Documentation:
Adds two n-byte integers using absolute indexed addressing. The addends begin at memory locations $0700 and $0800, and the answer is at $0900. The byte length of the integers is at $0600 (max 256).
Machine Code:
18 a2 00 ac 00 06 bd 00
07 7d 00 08 9d 00 09 e8
00 88 00 d0
Op Codes, Documentation, Variables:
A1 = $0600
B1 = $0700
B2 = $0800
Z1 = $0900
[START] = $0500
CLC 18 // clear carry
LDX #0 A2 00 // loads x with 0
LDY A1 AC 00 06 // loads length on Y
loop: LDA B1, x BD 00 07 // load first operand
ADC B2, x 7D 00 08 // adds second operand
STA Z1, x 9D 00 09 // store result
INX E8 00 // go to next byte
DEY 88 00 // count how many are left
BNE loop D0 // do more if needed
It looked to me like your code does what you claim -- it adds two N-byte operands in little-endian byte order. I vaguely remembered the various addressing modes of the 6502 from my misspent youth and the code seems fine. X is used to index the current byte of the two numbers, Y is a counter for the length of the operands in bytes, and you loop over those bytes, stored at addresses 0x0700 and 0x0800, and write the result at address 0x0900.
Rather than get the Commodore 64 out of the attic and try it out, I used an online virtual 6502 simulator. On this site we can set the memory address and load the byte values in. They even link to a page to assemble opcodes too. So, setting memory location 0x0600 to "04" and both 0x0700 and 0x0800 to "04 03 02 01", we should see this code add these two 32-bit values (0x01020304 + 0x01020304 == 0x02040608).
Stepping through the code by clicking on the PC register, setting it to 0x0500 and then single-stepping, we see there is a bug in your machine code. After INX, which compiles to E8, we hit a spurious 0x00 value (BRK) which terminates the program. The corrected code below runs to completion and the expected value can be seen by reading the memory at 0x0900.
0000 CLC 18
0001 LDX #$00 A2 00
0003 LDY $0600 AC 00 06
0006 LOOP: LDA $0700,X BD 00 07
0009 ADC $0800,X 7D 00 08
000C STA $0900,X 9D 00 09
000F INX E8
0010 DEY 88
0011 BNE LOOP: D0 F3
Memory dump:
:0900 08 06 04 02 00 00 00 00
:0908 00 00 00 00 00 00 00 00
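To double-check the expected result independently of the simulator, the byte-wise add-with-carry loop can be modelled in a few lines of Python (a sketch of the same little-endian walk the assembly performs):
def add_n_bytes(a, b):
    # little-endian byte-wise addition with carry, mirroring CLC + ADC in the loop
    carry, out = 0, []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & 0xFF)
        carry = s >> 8
    return out

# 0x01020304 + 0x01020304, operands stored little-endian as in the simulator memory
print(add_n_bytes([0x04, 0x03, 0x02, 0x01], [0x04, 0x03, 0x02, 0x01]))
# [8, 6, 4, 2] -> 08 06 04 02 at $0900, i.e. 0x02040608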

Method to calculate 'mechListMIC' for SPNEGO GSS-API(NTLMSSP_AUTH) accept-completed(0) state

I am trying to learn about and implement an SMB2 server. I am very interested in learning how GSS-API (NTLMSSP, NTLMSSP_AUTH) works internally, so I am experimenting with my own GSS-API component. I read the description of mechListMIC in RFC 4178 & RFC 2478, but I couldn't understand how to calculate the mechListMIC for the 'SessionSetup Response, Unknown message type' response.
Actually, I can generate the mechListMIC for the negTokenInit phase of the 'NegotiateProtocol Response'. But the problem is, when the client sends the 'SessionSetup Request, NTLMSSP_AUTH, User: Domain\Administrator, Unknown message type' request, I can't understand how it generates 'mechListMIC: 01 00 00 00 78 1E E9 4A DB 99 7F E9 00 00 00 00', and how I should send the response back in 'SessionSetup Response, Unknown message type' with the corresponding mechListMIC based on the previous SessionSetup Request.
I tried with the following info:
SMB2.CSessionSetup.securityBlob.GSSAPI.InitialContextToken.InnerContextToken.SpnegoToken.NegTokenInit.MechTypes , hex data = 30 0C 06 0A 2B 06 01 04 01 82 37 02 02 0A
AND
SMB2.CSessionSetup.securityBlob.GSSAPI.NegotiationToken.NegTokenResp.MechListMic, hex data = 01 00 00 00 78 1E E9 4A DB 99 7F E9 00 00 00 00
SecBufferDesc SignBufferDesc;
SecBuffer SignBuffers[2];
// buffers holding the hex data shown above
BYTE mechTypes[] = { 0x30, 0x0C, 0x06, 0x0A, 0x2B, 0x06, 0x01, 0x04, 0x01, 0x82, 0x37, 0x02, 0x02, 0x0A };
BYTE mechListMic[] = { 0x01, 0x00, 0x00, 0x00, 0x78, 0x1E, 0xE9, 0x4A, 0xDB, 0x99, 0x7F, 0xE9, 0x00, 0x00, 0x00, 0x00 };
SignBufferDesc.ulVersion = SECBUFFER_VERSION; // SECBUFFER_VERSION = 0
SignBufferDesc.cBuffers = 2;
SignBufferDesc.pBuffers = SignBuffers;
SignBuffers[0].pvBuffer = mechTypes; // DER-encoded mechTypes list
SignBuffers[0].cbBuffer = sizeof(mechTypes);
SignBuffers[0].BufferType = SECBUFFER_DATA; // SECBUFFER_DATA = 1
SignBuffers[1].pvBuffer = mechListMic; // MIC / signature bytes
SignBuffers[1].cbBuffer = sizeof(mechListMic);
SignBuffers[1].BufferType = SECBUFFER_TOKEN; // SECBUFFER_TOKEN = 2
Can anyone please tell me what information I need to use inside the HMAC-MD5(key, data) algorithm to generate the mechListMIC for the SessionSetup Response, and how?
If it is possible to create a step-by-step example using my test case to calculate mechListMIC for ‘SessionSetup Response, Unknown message type’ response, that would be very helpful for me. Please let me know if you need any further information.
Thanks,
Shishir
Please find the answer on the MSDN site:
http://social.msdn.microsoft.com/Forums/gu-IN/os_fileservices/thread/d00b4e1a-077b-4620-99c7-da7bf86d5212
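For what it's worth, the MIC shown in the capture has the shape of an NTLMSSP message signature as described in MS-NLMP: a 4-byte version (1), an 8-byte HMAC-MD5 checksum, and a 4-byte sequence number (0 here), computed over the DER-encoded mechTypes list. Below is a rough Python sketch of that computation, assuming extended session security; the function name is mine and exported_session_key is assumed to be the NTLM session key established in the AUTHENTICATE exchange, so treat it as a starting point rather than a verified recipe:
import hashlib, hmac, struct

def ntlm_mechlist_mic(exported_session_key, mech_types_der, seq_num=0, server_side=False):
    # Signing-key derivation per MS-NLMP: the client signs with the client-to-server
    # constant, the server with the server-to-client one.
    magic = (b"session key to server-to-client signing key magic constant\x00"
             if server_side else
             b"session key to client-to-server signing key magic constant\x00")
    sign_key = hashlib.md5(exported_session_key + magic).digest()
    seq = struct.pack("<I", seq_num)
    checksum = hmac.new(sign_key, seq + mech_types_der, hashlib.md5).digest()[:8]
    # NTLMSSP_MESSAGE_SIGNATURE: Version (1) || Checksum (8) || SeqNum (4)
    return struct.pack("<I", 1) + checksum + seq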

What's going on with this byte array?

I have a byte array:
00 01 00 00 00 12 81 00 00 01 00 C8 00 00 00 00 00 08 5C 9F 4F A5 09 45 D4 CE
It is read via a StreamReader using UTF-8 encoding:
// Note: I can't change this code, too many components depend on it.
using (StreamReader streamReader =
    new StreamReader(responseStream, Encoding.UTF8, false))
{
    string streamData = streamReader.ReadToEnd();
    if (requestData.Callback != null)
    {
        requestData.Callback(response, streamData);
    }
}
When that function runs I get the following returned to me (I converted it back to a byte array):
00 01 00 00 00 12 EF BF BD 00 00 01 00 EF BF BD 00 00 00 00 00 08 5C EF BF BD 4F EF BF BD 09 45 EF BF BD
Somehow I need to take what's returned to me and get it back to the right encoding and the right byte array, but I've tried a lot.
Please be aware I'm working with the limited WP7 API.
Hopefully you guys can help.
Thanks!
Update for help...
If I do the following code, it's almost right; the only thing that is wrong is that the 5th-to-last byte gets split out.
byte[] writeBuf1 = System.Text.Encoding.UTF8.GetBytes(data);
string buf1string = System.Text.Encoding.BigEndianUnicode.GetString(writeBuf1, 0, writeBuf1.Length);
byte[] writeBuf = System.Text.Encoding.BigEndianUnicode.GetBytes(buf1string);
The original byte array is not valid UTF-8. The StreamReader therefore replaces each invalid byte sequence with the replacement character U+FFFD. When that character gets encoded back to UTF-8, the result is the byte sequence EF BF BD. You cannot reconstruct the original byte values from the string because the information is completely lost.
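The lossy replacement is easy to reproduce outside .NET; a small Python sketch of the same decode/re-encode round trip:
data = bytes.fromhex(
    "00 01 00 00 00 12 81 00 00 01 00 C8 00"
    " 00 00 00 00 08 5C 9F 4F A5 09 45 D4 CE")
text = data.decode("utf-8", errors="replace")  # roughly what the StreamReader does
print(text.encode("utf-8").hex(" "))
# the invalid bytes (81, C8, 9F, A5, D4, CE) come back as "ef bf bd" (U+FFFD);
# the exact number of replacements can differ between decoders, but the original bytes are gone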
