String from NSData fails using UTF8 but succeeds using ASCII

I am scanning some barcodes and decoding them to Swift strings. The specific scanner provides an object that holds the information I need to build an NSData:
let rawData = decodedData.getData() // UnsafeMutablePointer<UInt8>
let rawDataSize = decodedData.getDataSize() // UInt32
let data = NSData(bytes: rawData, length: Int(rawDataSize)) // NSData
I then decode this into a string:
let string = NSString(data: data, encoding: NSUTF8StringEncoding) as? String
I find that certain barcodes return nil when decoding unless I switch to NSASCIIStringEncoding:
let string = NSString(data: data, encoding: NSASCIIStringEncoding) as? String
My understanding of string encoding is limited, but I was under the impression that any ASCII string could be decoded as UTF8 since ASCII is a subset of UTF8. Is this accurate?
If so, what else might be causing this issue?

The problem is that not every sequence of bytes is valid when interpreted as UTF-8. For example, a single byte with the value 0xff (255) is never valid in UTF-8. The ASCII decoder, on the other hand, may be accepting every byte value, even though strictly speaking ASCII only defines the values 0 through 127.
You'd better take a good look at the data and find out what encoding it actually uses. And if it is just arbitrary bytes, then please do NOT convert them to a string.
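To make the point about invalid byte sequences concrete, here is a minimal sketch in Python rather than Swift. The helper name try_decode and the sample byte strings are made up for illustration. Note that Python's ASCII codec is strict and rejects bytes above 0x7F, so it does not reproduce whatever leniency NSASCIIStringEncoding showed in the question; Latin-1 stands in here as an encoding that accepts every byte value.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)  # error handling is strict by default
    except UnicodeDecodeError:
        return None


ascii_only = b"ABC-123"    # every byte is below 0x80
high_byte = b"ABC\xff123"  # 0xFF never occurs in well-formed UTF-8

# Pure ASCII input decodes identically under both encodings.
print(try_decode(ascii_only, "ascii"), try_decode(ascii_only, "utf-8"))

# A stray 0xFF makes the whole buffer invalid as UTF-8.
print(try_decode(high_byte, "utf-8"))

# Latin-1 maps every byte value to some character, so it never fails,
# but that does not mean the result is what the barcode intended.
print(repr(try_decode(high_byte, "latin-1")))
```

If the scanner data decodes only under a permissive single-byte encoding, that is a hint that the payload is either in a different encoding or not text at all.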

Related

Get string with base-16 (hex) rendering of the bytes of an ASCII string

E.g.
input := "Office"
want := "4f6666696365" // Note: this is a string!!
I know that string literals are stored in UTF-8 already.
What is the easiest way to convert this to a string in UTF-8 representation?
Calling EncodeRune on each character seems too cumbersome.
What you're looking for is a string that contains the hex representation of your input string. That is not UTF-8. (Any string that's valid ASCII is also valid UTF-8.)
In any case, this is how to do what you want:
want := fmt.Sprintf("%x", []byte(input))
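For comparison, the same idea (encode the text to bytes, then render the bytes as hex) can be sketched in Python. The function name hex_of_ascii is hypothetical and used only for this illustration.

```python
def hex_of_ascii(text: str) -> str:
    """Render the bytes of an ASCII string as lowercase hex digits."""
    # encode() raises UnicodeEncodeError if text contains non-ASCII characters
    return text.encode("ascii").hex()


print(hex_of_ascii("Office"))

# bytes.fromhex reverses the rendering and recovers the original bytes
print(bytes.fromhex(hex_of_ascii("Office")).decode("ascii"))
```

The result is an ordinary string of hex digits, which happens to be valid ASCII and therefore valid UTF-8 as well.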

byte[] to String and String to byte[]

I know this has been covered a lot here, but I couldn't solve my problem yet:
I read bytes from a Parcelable object and save them in a byte[], then I unmarshall them back into an object, and it all works fine. But I have to send the bytes as a String, so I have to convert the bytes to a String and then back again.
I thought it would work as follows:
byte[] bytes = p1.marshall(); //get my object as bytes
String str = bytes.toString();
byte[] someBytes = str.getBytes();
But it doesn't work when I call p2.unmarshall(someBytes, 0, someBytes.length) with someBytes, whereas p2.unmarshall(bytes, 0, bytes.length) with bytes works fine. How can I convert bytes to a String correctly?
You've got three problems here:
You're calling toString() on byte[], which is just going to give you something like "[B@15db9742"
You're assuming you can just convert a byte array into text with no specific conversion, and not lose data
You're calling getBytes() without specifying the character encoding, which is almost always a mistake.
In this case, you should just use base64 - that's almost always the right thing to do when converting arbitrary binary data to text. (If you were actually trying to decode encoded text, you should use new String(bytes, charset), but that's not the case here.)
So, using android.util.Base64:
String str = Base64.encodeToString(bytes, Base64.DEFAULT);
byte[] someBytes = Base64.decode(str, Base64.DEFAULT);
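The difference between a Base64 round trip and a naive text round trip can be sketched as follows, in Python rather than Android Java. The function names and the test payload are assumptions made for this illustration only.

```python
import base64


def bytes_to_text(raw: bytes) -> str:
    """Encode arbitrary bytes as Base64 text that is safe to send as a string."""
    return base64.b64encode(raw).decode("ascii")


def text_to_bytes(text: str) -> bytes:
    """Recover the original bytes from Base64 text."""
    return base64.b64decode(text)


def lossy_round_trip(raw: bytes) -> bytes:
    """Decode bytes as if they were UTF-8 text, then encode them again."""
    # invalid sequences are replaced with U+FFFD, so information is lost
    return raw.decode("utf-8", errors="replace").encode("utf-8")


payload = bytes(range(256))  # stand-in for marshalled binary data

print(text_to_bytes(bytes_to_text(payload)) == payload)  # Base64 is lossless
print(lossy_round_trip(payload) == payload)              # text decoding is not
```

Base64 output uses only ASCII characters, which is why it survives transport through any text channel.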

How to convert utf-8 encoded string to Turkish characters in Xcode?

I have a web service in PHP and I encoded the string in UTF-8 like this:
$str_output = mb_convert_encoding("MATEMATİK", "UTF-8");
$data_array = array('name' => $str_output);
echo json_encode($data_array);
I get this string from the web service in Xcode: MATEMAT\u00ddK
I couldn't convert this string to a Turkish string.
My json_dictionary is like this
2014-01-08 16:17:22.274 test_app[6432:70b] {
name = "MATEMAT\U00ddK";
}
I tried this encoding method, but it didn't work for me
NSString * name = [json_dictionary objectForKey:@"name"];
NSString * correctString = [NSString stringWithCString:[baslik cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSWindowsCP1254StringEncoding];
I got null.
If I use NSUTF8StringEncoding, I get:
MATEMATÝK
I also tried NSISOLatin1StringEncoding, NSISOLatin2StringEncoding ...
Thanks...
iOS is correctly decoding the \u00dd when you use NSUTF8StringEncoding (which is what you should be using). That's LATIN CAPITAL LETTER Y WITH ACUTE. The letter you want is LATIN CAPITAL LETTER I WITH DOT ABOVE, which is \u0130.
That suggests the problem is on your PHP side. If I had to guess, I'd suspect that the İ in your source file is not itself in the encoding that PHP expects. You may need to pass the "from" encoding to mb_convert_encoding, depending on what encoding your editor is using.
I would strongly recommend that you stay in UTF-8 entirely if possible, and avoid creating a CP1254 (Turkish) string at all. UTF-8 is capable of encoding all the characters you need. In that case, you may be able to avoid the mb_convert_encoding entirely.
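One way the \u00dd could arise is sketched below in Python. It assumes, as the guess above does, that the source file stores İ as the single CP1254 byte 0xDD and that this byte is then interpreted as Latin-1. Whether that is what actually happened in the asker's PHP setup is not confirmed; the function name is hypothetical.

```python
import json


def misread_cp1254_as_latin1(text: str) -> str:
    """Simulate bytes written as CP1254 being interpreted as Latin-1."""
    return text.encode("cp1254").decode("latin-1")


name = "MATEMAT\u0130K"  # U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE

# In CP1254 the dotted capital I is the single byte 0xDD ...
print(name.encode("cp1254"))

# ... and in Latin-1 the byte 0xDD is Y WITH ACUTE (U+00DD).
wrong = misread_cp1254_as_latin1(name)
print(json.dumps(wrong))  # what the web service sent

# Staying in UTF-8 from end to end keeps the intended code point.
print(json.dumps(name))
```

Under that assumption the iOS side is decoding exactly what it was given, and the fix belongs where the string is first read.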

java.lang.NumberFormatException or java.nio.BufferUnderflowException when transforming bytes

I played around with some String -> byte -> binary code and I want my code to work for any byte[] array. Currently it only works for (I am not sure) ASCII?
Chinese doesn't work:
String message =" 汉语";
playingWithFire(message.getBytes());
whereas String wow = "WOW..."; works :( I want it to work for all UTF-8 input. Any pointers on how I can do it?
//thanks
public static byte[] playingWithFire(byte[] bytes){
    byte[] newbytes = null;
    newbytes = new byte[bytes.length];
    for(int i = 0; i < bytes.length; i++){
        String tempStringByte = String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');
        StringBuffer newByteBrf = null;
        newByteBrf = new StringBuffer();
        for(int x = 0; x < tempStringByte.length(); x++){
            newByteBrf.append(tempStringByte.charAt(x));
        }
        /*short a = Short.parseShort(newByteBrf.toString(), 2);
        ByteBuffer bytesads = ByteBuffer.allocate(2).putShort(a);
        newbytes[i] = bytesads.get();
        cause: java.nio.BufferUnderflowException
        */
        //cause: java.lang.NumberFormatException: Value out of range.
        newbytes[i] = Byte.parseByte(newByteBrf.toString(), 2);
    }
    return newbytes;
}
message.getBytes() in your case is trying to convert Chinese Unicode characters to bytes using the default character set on your computer. If it's a western charset, it's going to be wrong.
Notice that String.getBytes() has another form with String.getBytes(String) where the string is the name of a character encoding that is used to convert the chars of the string to bytes.
The char type will hold Unicode. The byte type only holds raw bits in groups of 8.
So, to convert a Unicode string to bytes encoded as UTF-16 you would use this code:
String message =" 汉语";
byte[] utf16Bytes = message.getBytes("utf-16");
Substitute the name of any encoding that you want to use.
Similarly, the new String(byte[], String) constructor takes an array of bytes encoded in some fashion and, given the name of the encoding, converts those bytes to Unicode characters.
For example: If you want to convert those bytes, which were encoded as utf-16 above, back to a String (which has Unicode chars in it):
String newMessage = new String(utf16Bytes, "utf-16");
Since I don't know what you mean by "binary code" above, I can't go much farther. As I see it, the Unicode chars have a binary code inside them that represents the characters one-by-one. Also the byte array has a binary code in it that represents the characters with a many-bytes-to-one-character representation. If you want to encrypt the byte array somehow, use a standard, proven encryption method and proven, time-tested procedures to secure the contents.
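If "binary code" simply means the 8-bit pattern of each byte, the round trip can be sketched in Python as below. The helper names are hypothetical. The sketch also hints at why the Java version appears to fail only for non-ASCII text: the UTF-8 bytes of Chinese characters have values above 127, and Java's Byte.parseByte only accepts values from -128 to 127, so a pattern such as 11100110 (230) is out of range until it is mapped to its signed equivalent.

```python
def to_bit_strings(raw: bytes) -> list:
    """Render each byte as an 8-character string of 0s and 1s."""
    return [format(b, "08b") for b in raw]


def from_bit_strings(bits: list) -> bytes:
    """Parse 8-character bit strings back into the original bytes."""
    return bytes(int(s, 2) for s in bits)


def as_signed_byte(value: int) -> int:
    """Map an unsigned value 0..255 to the signed range -128..127."""
    return value - 256 if value > 127 else value


raw = " 汉语".encode("utf-8")  # explicit encoding, no platform default
bits = to_bit_strings(raw)

print(bits)
print(from_bit_strings(bits).decode("utf-8"))  # round trip restores the text

# ASCII text stays below 128; the Chinese text does not.
print(max(b"WOW..."), max(raw))
print(as_signed_byte(int("11100110", 2)))
```

Naming the encoding explicitly on both the encode and the decode side is what keeps the round trip independent of the machine's default charset.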

Character generated from SHA-512 hash does not get saved to database

I'm hashing a password using SHA512. I'm using Entity Framework Code-First for my ORM.
Hashing Algorithm
public static string CreateSHA512Hash(string pwd, string salt)
{
    string saltAndPwd = String.Concat(pwd, salt);
    var ae = new ASCIIEncoding();
    byte[] hashValue, messageBytes = ae.GetBytes(saltAndPwd);
    var sHhash = new SHA512Managed();
    hashValue = sHhash.ComputeHash(messageBytes);
    sHhash.Dispose();
    return ae.GetString(hashValue);
}
Code for generating salt:
//Generate a cryptographic random number.
var rng = new RNGCryptoServiceProvider();
var buff = new byte[size];
rng.GetBytes(buff);
rng.Dispose();
// Return a Base64 string representation of the random number.
return Convert.ToBase64String(buff);
Problem:
For some reason, the hash function sometimes generates a character after which nothing is saved to the database. In this case the character is \0 (I'm not sure whether other characters do this too).
E.g. Password: testuser. Salt: uvq5i4CfMcOMjKPkwhhqxw==
Hash generated: ????j???7?o\0?dE??????:???s?x??u?',Vj?mNB??c???4H???vF\bd?T? (copied during debug mode in Visual Studio).
But EF actually saves ????j???7?o to the database. If I try to use the text visualizer in debug mode, it cuts it off as well. As you may notice, it gets cut off right at the \0. All I could find about it is that it's a null character.
Question
How can I save this null character in the database using Entity Framework Code-First? If this can't be saved, how can I prevent the SHA512 from generating these characters for me?
I recommend encoding the hash with Base64 before saving. On the other hand, encoding the salt with Base64 before adding to the password sounds strange.
A SHA-512 hash does not generate characters, it generates bytes. If you want a character string, as opposed to a byte array, you need to convert the bytes into a character format. As @wRAR has suggested, Base64 is one common way to do it, or else you could just use a hex string.
What you should probably do:
Return the array of bytes for the SHA512 hash not a string.
Use a BINARY(64) database column to hold your hash value.
Why your method doesn't work:
These ASCII strings are NULL terminated
NULL is as you said \0
SHA512 creates a byte array and any byte can be NULL
To answer your specific question, do what wRAR suggested above:
return Convert.ToBase64String(hashValue);
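The same approach can be sketched in Python instead of C#. The function names are hypothetical, and encoding the password and salt as UTF-8 is an assumption of this sketch (the original code uses ASCIIEncoding). The digest is kept as bytes and only converted to Base64 or hex when a text column has to hold it.

```python
import base64
import hashlib


def sha512_digest(password: str, salt: str) -> bytes:
    """Hash password + salt and return the raw 64-byte digest."""
    return hashlib.sha512((password + salt).encode("utf-8")).digest()


def digest_as_base64(digest: bytes) -> str:
    """Text form for a character column: 88 Base64 characters for SHA-512."""
    return base64.b64encode(digest).decode("ascii")


def digest_as_hex(digest: bytes) -> str:
    """Alternative text form: 128 hex digits for SHA-512."""
    return digest.hex()


digest = sha512_digest("testuser", "uvq5i4CfMcOMjKPkwhhqxw==")
print(len(digest))               # raw bytes, may contain 0x00 anywhere
print(digest_as_base64(digest))  # safe to store as text
print(digest_as_hex(digest))
```

Neither text form can contain a \0 character, so nothing is truncated on the way to the database.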
