How to represent 11111111 as a byte in Java

When I write that 0b11111111 is a byte in Java, the compiler says "cannot convert int to byte," which is because, as I understand it, 0b11111111 = 255, and bytes in Java are signed and go from -128 to 127. But if a byte is just 8 bits of data, isn't 11111111 8 bits? I know 11111111 could be an integer, but in my situation it must be represented as a byte, because it must be sent to a file in byte form. So how do I send a byte with the bits 11111111 to a file? (By the way, this is my question.) Also, when I try printing the binary value of -1, I get 11111111111111111111111111111111. Why is that? I don't really understand how signed bytes work.

You need to cast the value to a byte:
byte b = (byte) 0b11111111;
The reason you need the cast is that 0b11111111 is an int literal (with a decimal value of 255) and it's outside the range of valid byte values (-128 to +127).

Java allows hex literals, but not binary. You can declare a byte with the binary value of 11111111 using this:
byte myByte = (byte) 0xFF;
You can use hex literals to store binary data in ints and longs as well.
Edit: you actually can have binary literals in Java 7 and up, my bad.
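Putting both answers together, here is a minimal sketch that casts the literal, shows why the cast value prints as -1 and why Integer.toBinaryString(-1) is thirty-two 1 bits, and writes the byte to a file (the file name out.bin is just an example):
import java.io.FileOutputStream;
import java.io.IOException;

public class ByteDemo {
    public static void main(String[] args) throws IOException {
        byte b = (byte) 0b11111111;   // bits 11111111, value -1 as a signed byte

        // Printing the byte as a number shows -1, because Java bytes are signed.
        System.out.println(b);                                 // -1

        // toBinaryString takes an int, so the byte is sign-extended first,
        // which is why -1 shows up as thirty-two 1 bits.
        System.out.println(Integer.toBinaryString(b));         // 32 ones
        System.out.println(Integer.toBinaryString(b & 0xFF));  // 11111111

        // Writing the byte to a file stores exactly the 8 bits 11111111.
        try (FileOutputStream out = new FileOutputStream("out.bin")) {
            out.write(b);
        }
    }
}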

Related

How do I convert an integer to a byte in Go

I'm generating a random number in the range of 65 to 90 (which corresponds to the byte values of uppercase ASCII characters). The random number generator returns an integer value and I want to convert it to a byte.
When I say I want to convert the integer to a byte, I don't mean the byte representation of the number - i.e. I don't mean int 66 becoming byte [54 54]. I mean, if the RNG returns the integer 66, I want a byte with the value 66 (which would correspond to an uppercase B).
Use the byte() conversion to convert an integer to a byte:
var n int = 66
b := byte(n) // b is a byte
fmt.Printf("%c %d\n", b, b) // prints B 66
You can also convert any of those integers to the corresponding character by simply doing character := string(asciiNum), where asciiNum is the integer you've generated; character will then be the character whose value corresponds to the generated int.

How the MKI$ and CVI functions work

I am working in GW-BASIC and want to know how CVI("aa") returns 24929. Does it convert each character to its ASCII code? The codes for "aa" would then be 97 and 97.
CVI converts between a GW-BASIC integer and its internal representation in bytes. That internal representation is a 16-bit little-endian signed integer, so that the value you find is the same as ASC("a") + 256*ASC("a"), which is 97 + 256*97, which is 24929.
MKI$ is the opposite operation of CVI, so that MKI$(24929) returns the string "aa".
The 'byte reversal' is a consequence of the little endianness of GW-BASIC's internal representation of integers: the leftmost byte of the representation is the least significant byte, whereas in hexadecimal notation you would write the most significant byte on the left.
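The same arithmetic written out in Java, purely as an illustration of the little-endian layout (cvi and mki here are hypothetical helpers named after the GW-BASIC functions, not an existing API):
public class CviDemo {
    // Decode a 2-character string as a 16-bit little-endian signed integer,
    // mirroring what CVI does with GW-BASIC's internal representation.
    static short cvi(String s) {
        int lo = s.charAt(0);   // leftmost character holds the least significant byte
        int hi = s.charAt(1);   // second character holds the most significant byte
        return (short) (lo | (hi << 8));
    }

    // Encode a 16-bit integer back into a 2-character string, mirroring MKI$.
    static String mki(short value) {
        char lo = (char) (value & 0xFF);
        char hi = (char) ((value >> 8) & 0xFF);
        return "" + lo + hi;
    }

    public static void main(String[] args) {
        System.out.println(cvi("aa"));            // 97 + 256*97 = 24929
        System.out.println(mki((short) 24929));   // aa
    }
}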

C++ Output a Byte Clarification

From textbook:
So I know a byte has 8 bits, and a right bit-shift adds zero bits on the left and pops bits off the right.
But how is that used in the above example to output a byte?
I would've expected:
putchar(b >> 8)
putchar(b >> 7)
putchar(b >> 6)
etc.
Since I assumed putchar outputs the popped-off bits?
putchar prints the ASCII character corresponding to the integer it is given.
putchar(0x41) converts the integer 0x41 into an unsigned char (one byte in size) and prints the ASCII character corresponding to 0x41 (which is 'A').
The key thing to realize here is that putchar only looks at the lower 8 bits, i.e. putchar(0x41) and putchar(0xffffff41) do the same thing.
Now let's look at what happens when you pass something to your function above.
outbyte(0x41424344);
First it shifts b right by 24 bits and then calls putchar on that value:
0x41424344 >> 24; // 0x00000041
putchar(0x00000041); // A
Then it shifts b right by 16 bits and then calls putchar on that value:
0x41424344 >> 16; // 0x00004142
putchar(0x00004142); // only the low byte 0x42 is used, so this prints B
etc.
Here it is in action: http://ideone.com/3xeFSx
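The original textbook snippet is not reproduced above, but from this walkthrough it presumably shifts right by 24, 16, 8, and 0 bits and calls putchar each time. Here is that idea as a sketch in Java, where System.out.write(int), like putchar, only uses the low 8 bits of its argument:
public class OutByteDemo {
    // Write the four bytes of a 32-bit int, most significant byte first.
    static void outbyte(int b) {
        System.out.write(b >> 24);
        System.out.write(b >> 16);
        System.out.write(b >> 8);
        System.out.write(b);
        System.out.flush();   // write(int) does not flush on its own
    }

    public static void main(String[] args) {
        outbyte(0x41424344);   // prints ABCD
        System.out.println();
    }
}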

Conversion from int to 4-byte unsigned int

I'm trying to convert an int to a 4-byte unsigned int. I was reading a post and I'm not sure what the comparison does. I know that 0xFF is a mask, but I'm not sure when to use the & and when to use the |. Let's use the number 6 as an example, where the least significant byte is 6. Why would we do
6 & 0xFF // I know that we are comparing it to 11111111
and when do we use the OR operator? I'm still not sure how to use & or |.
x & 0xFF will set all bits of x to zero except for the last byte (which stays the same). If you had used a bitwise OR (|) instead, it would leave all the bits of x set and force every bit of the last byte to 1.
Typically, the comparison will be something like (x & 0xFF) == x. This checks that the three high bytes of x are all 0.
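A small Java illustration of the difference (the values here are arbitrary examples):
public class MaskDemo {
    public static void main(String[] args) {
        int x = 0x12345678;

        // AND with 0xFF keeps only the lowest byte and zeroes everything else.
        System.out.printf("%08x%n", x & 0xFF);   // 00000078

        // OR with 0xFF keeps every bit of x and forces the lowest byte to all ones.
        System.out.printf("%08x%n", x | 0xFF);   // 123456ff

        // The typical check: does the value already fit in a single byte?
        System.out.println((6 & 0xFF) == 6);                    // true
        System.out.println((0x12345678 & 0xFF) == 0x12345678);  // false
    }
}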

Convert character from UTF-8 to ISO-8859-1 manually

I have the character "ö". If I look in this UTF-8 table I see it has the hex value F6. If I look in the Unicode table I see that "ö" has the indices E0 and 16. If I add both I get the hex value F6 for the code point. This is the binary value 1111 0110.
1) How do I get from the hex value F6 to the indices E0 and 16?
2) I don't know how to get from F6 to the two bytes C3 B6 ...
Because I didn't get the expected results, I tried to go the other way. "ö" is represented in ISO-8859-1 as "Ã¶". In the UTF-8 table I can see that "Ã" has the decimal value 195 and "¶" has the decimal value 182. Converted to bits this is 1100 0011 1011 0110.
Process:
1. Look in a table and get the Unicode code point for the character "ö". Calculated from the indices E0 and 16, you get U+00F6.
2. According to the algorithm posted by wildplasser, you can calculate the UTF-8 encoded bytes C3 and B6.
3. In binary form this is 1100 0011 1011 0110, which corresponds to the decimal values 195 and 182.
4. If these values are interpreted as ISO 8859-1 (only 1 byte each), you get "Ã¶".
PS: I also found this link, which shows the values from step 2.
The pages you are using are confusing you somewhat. Neither your "UTF-8 table" nor your "Unicode table" is giving you the value of the code point in UTF-8. They are both simply listing the Unicode value of the characters.
In Unicode, every character ("code point") has a unique number assigned to it. The character ö is assigned the code point U+00F6, which is F6 in hexadecimal, and 246 in decimal.
UTF-8 is a representation of Unicode, using a sequence of between one and four bytes per Unicode code point. The transformation from 32-bit Unicode code points to UTF-8 byte sequences is described in that article - it is pretty simple to do, once you get used to it. Of course, computers do it all the time, but you can do it with a pencil and paper easily, and in your head with a bit of practice.
If you do that transformation, you will see that U+00F6 transforms to the UTF-8 sequence C3 B6, or 1100 0011 1011 0110 in binary, which is why that is the UTF-8 representation of ö.
The other half of your question is about ISO-8859-1. This is a character encoding commonly called "Latin-1". The numeric values of the Latin-1 encoding are the same as the first 256 code points in Unicode, thus ö is F6 in Latin-1.
Once you have converted between UTF-8 and standard Unicode code points (UTF-32), it should be trivial to get the Latin-1 encoding. However, not all UTF-8 sequences / Unicode characters have corresponding Latin-1 characters.
See the excellent article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for a better understanding of character encodings and transformations between them.
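Worked out by hand for ö, following the transformation described above: the code point U+00F6 is 1111 0110, or 000 1111 0110 written as 11 bits. Split that into a high group of 5 bits (00011) and a low group of 6 bits (110110), then prefix the groups with 110 and 10 respectively. That gives 1100 0011 and 1011 0110, i.e. the bytes C3 B6.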
unsigned cha_latin2utf8(unsigned char *dst, unsigned cha)
{
    if (cha < 0x80) { *dst = cha; return 1; }
    /* all 11-bit code points (0x0 -- 0x7ff)
    ** fit within a 2-byte utf8 sequence:
    ** first byte  = 110 + xxxxx  := 0xc0 + (char >> 6)  MSB
    ** second byte = 10 + xxxxxx  := 0x80 + (char & 63)  LSB
    */
    *dst++ = 0xc0 | ((cha >> 6) & 0x1f); /* 2+1+5 bits */
    *dst++ = 0x80 | (cha & 0x3f);        /* 1+1+6 bits */
    return 2; /* number of bytes produced */
}
To test it:
#include <stdio.h>

/* declaration of the function above, so this test compiles on its own */
unsigned cha_latin2utf8(unsigned char *dst, unsigned cha);

int main(void)
{
    unsigned char buff[12];
    cha_latin2utf8(buff, 0xf6);
    fprintf(stdout, "%02x %02x\n"
        , (unsigned) buff[0] & 0xff
        , (unsigned) buff[1] & 0xff);
    return 0;
}
The result:
c3 b6
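For the direction the question title actually asks about (UTF-8 back to ISO-8859-1), the decoding just undoes those steps: take the low 5 bits of the lead byte and the low 6 bits of the continuation byte. Here is a sketch in Java that only handles the one- and two-byte sequences which can have an ISO-8859-1 equivalent:
public class Utf8ToLatin1 {
    // Decode a one- or two-byte UTF-8 sequence to its code point and
    // check that it fits in ISO-8859-1 (code points 0x00 to 0xFF).
    static int utf8ToLatin1(int b1, int b2) {
        int codePoint;
        if (b1 < 0x80) {
            codePoint = b1;                  // single byte: plain ASCII
        } else {
            codePoint = ((b1 & 0x1F) << 6)   // low 5 bits of the lead byte
                      | (b2 & 0x3F);         // low 6 bits of the continuation byte
        }
        if (codePoint > 0xFF) {
            throw new IllegalArgumentException("no ISO-8859-1 equivalent");
        }
        return codePoint;
    }

    public static void main(String[] args) {
        System.out.printf("%02x%n", utf8ToLatin1(0xC3, 0xB6));   // f6, i.e. ö
    }
}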
