How do I emit a UTF-8 message (such as Chinese text) in the code below?
pragma solidity ^0.7.2;
modifier buyerOnly() {
    require(
        msg.sender == buyer,
        "For buyer ONLY" //<<==utf8?
    );
    _;
}
It throws an error when the message is in Chinese.
I think you are asking how to insert unicode characters into a string.
Disclaimer: I don't know Chinese, but I used Google Translate to convert "For buyer only" to Simplified Chinese as this: 仅适用于买方 . Then I used an online tool to convert this string into the \u and \x escape sequences below.
Per solidity docs for string encoding of literals:
\xNN takes a hex value and inserts the appropriate byte, while \uNNNN takes a Unicode codepoint and inserts an UTF-8 sequence.
So instead of trying this:
modifier buyerOnly() {
    require(
        msg.sender == buyer,
        "For buyer ONLY" //<<==utf8?
    );
    _;
}
Try this (each \u escape names a Unicode code point, which the compiler encodes as UTF-8):
modifier buyerOnly() {
    require(
        msg.sender == buyer,
        "\u4ec5\u9002\u7528\u4e8e\u4e70\u65b9"
    );
    _;
}
Or this (insert UTF-8 bytes individually):
modifier buyerOnly() {
    require(
        msg.sender == buyer,
        "\xe4\xbb\x85\xe9\x80\x82\xe7\x94\xa8\xe4\xba\x8e\xe4\xb9\xb0\xe6\x96\xb9"
    );
    _;
}
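You can use a unicode string literal:
string public saudacao = unicode"Olá sou uma variável";
You can use the unicode"" prefix. Since Solidity 0.7.0, ordinary string literals may contain only printable ASCII and escape sequences, so text with non-ASCII characters needs this prefix.
Example: string public saudacao = unicode"Olá sou uma variável 🦄";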
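To double-check that the \u and \x forms in the first answer describe the same text, here is a minimal Python sketch (Python is used only for illustration; the helper names to_u_escapes and to_x_escapes are hypothetical). It rebuilds both escape styles from the original string, assuming every character lies in the Basic Multilingual Plane so that a four-digit \u escape is enough.

```python
def to_u_escapes(text: str) -> str:
    """Return text as \\uNNNN escapes (assumes BMP characters only)."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"{ch!r} does not fit in a 4-digit \\u escape")
        parts.append(f"\\u{code:04x}")
    return "".join(parts)


def to_x_escapes(text: str) -> str:
    """Return the UTF-8 bytes of text as \\xNN escapes."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


message = "仅适用于买方"
u_form = to_u_escapes(message)  # one escape per code point
x_form = to_x_escapes(message)  # one escape per UTF-8 byte
print(u_form)
print(x_form)
```

Each Chinese character here becomes one \u escape or three \x escapes, because these code points need three bytes each in UTF-8.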
My application is written in C++11 and uses Qt5. In this application, I need to store UTF-8 text in a Windows-1250 encoded file.
I tried the following two approaches, and both work except for the Romanian 'ș' and 'ț' characters :(
1.
auto data = QStringList() << ... <some texts here>;
QTextStream outStream(&destFile);
outStream.setCodec(QTextCodec::codecForName("Windows-1250"));
foreach (auto qstr, data)
{
outStream << qstr << EOL_CODE;
}
2.
auto data = QStringList() << ... <some texts here>;
auto *codec = QTextCodec::codecForName("Windows-1250");
foreach (auto qstr, data)
{
const QByteArray encodedString = codec->fromUnicode(qstr);
destFile.write(encodedString);
}
In the case of the 'ț' character (UTF-8 bytes 0xC8 0x9B), instead of the expected value 0xFE, the character is encoded and stored as 0x3F (a question mark), which is unexpected.
So I am looking for any help, experience, or examples regarding text re-encoding.
Best regards,
Do not confuse ț with ţ. The former is what is in your post, the latter is what's actually supported by Windows-1250.
The character ț from your post is T-comma, U+021B, LATIN SMALL LETTER T WITH COMMA BELOW, however:
This letter was not part of the early Unicode versions, which is why Ţ (T-cedilla, available from version 1.1.0, June 1993) is often used in digital texts in Romanian.
The character referred to is ţ, U+0163, LATIN SMALL LETTER T WITH CEDILLA (emphasis mine):
In early versions of Unicode, the Romanian letter Ț (T-comma) was considered a glyph variant of Ţ, and therefore was not present in the Unicode Standard. It is also not present in the Windows-1250 (Central Europe) code page.
The story of ş and ș, being S-cedilla and S-comma is analogous.
If you must encode to the archaic Windows-1250 code page, I'd suggest replacing the comma variants with the cedilla variants (both lowercase and uppercase) before encoding. I think Romanians will understand :)
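The substitution suggested above can be sketched as follows. This is an illustration in Python rather than Qt, using Python's built-in cp1250 codec; the names COMMA_TO_CEDILLA and encode_cp1250 are hypothetical. The same four-entry mapping could be applied to a QString before calling the codec.

```python
# Map the comma-below letters to their cedilla look-alikes,
# which are the only variants present in Windows-1250.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015e",  # Ș -> Ş
    "\u0219": "\u015f",  # ș -> ş
    "\u021a": "\u0162",  # Ț -> Ţ
    "\u021b": "\u0163",  # ț -> ţ
})


def encode_cp1250(text: str) -> bytes:
    """Swap comma-below letters for cedilla ones, then encode as Windows-1250."""
    return text.translate(COMMA_TO_CEDILLA).encode("cp1250")


# Without the substitution, the unmappable character turns into '?' (0x3F).
print("ț".encode("cp1250", errors="replace"))
# With the substitution, it is stored as the expected byte 0xFE.
print(encode_cp1250("ț"))
```

Note that the substitution is lossy: when the file is read back, the text contains the cedilla letters, not the original comma-below ones.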
Team,
I am not able to use the Java 7 "Underscores in Numeric Literals" feature to read input from the user and print it in the same format as declared. Can you help with that? Or is this feature incomplete?
Scanner input = new Scanner( System.in );
int x = 1_00_000;
System.out.print( "Enter numeric literals with underscores: " ); //2_00_000
x = input.nextInt(); //java.util.InputMismatchException
System.out.println(x); // Prints in normal format, but want to be in 2_00_000.
NOTE: In Eclipse, I am able to replace the value of a numeric literal with an underscored numeric literal at runtime. This may be a hack, but being able to input an underscored numeric literal at runtime is a needed feature, right?
http://www.eclipse.org/jdt/ui/r3_8/Java7news/whats-new-java-7.html#miscellaneous
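If you want to keep the underscores, you can read the input as a String. Underscores are only part of the literal syntax in source code and are dropped by the compiler, so Scanner.nextInt() does not accept them: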
Scanner input = new Scanner( System.in );
System.out.print( "Enter numeric literals with underscores: " ); //2_00_000
String stringLiterals = input.nextLine();
System.out.println(stringLiterals); // Prints 2_00_000.
I have a web service in PHP, and I encode the string in UTF-8 like this:
$str_output = mb_convert_encoding("MATEMATİK", "UTF-8");
$data_array = array('name' => $str_output);
echo json_encode($data_array);
I get this string from the web service in Xcode: MATEMAT\u00ddK
I couldn't convert this string to the correct Turkish string.
My json_dictionary looks like this:
2014-01-08 16:17:22.274 test_app[6432:70b] {
name = "MATEMAT\U00ddK";
}
I tried this encoding method, but it didn't work for me:
NSString * name = [json_dictionary objectForKey:@"name"];
NSString * correctString = [NSString stringWithCString:[baslik cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSWindowsCP1254StringEncoding];
I got null.
If I use NSUTF8StringEncoding, I get:
MATEMATÝK
I also tried NSISOLatin1StringEncoding, NSISOLatin2StringEncoding ...
Thanks...
iOS is correctly decoding the \u00dd when you use NSUTF8StringEncoding (which is what you should be using). That's LATIN CAPITAL LETTER Y WITH ACUTE. The letter you want is LATIN CAPITAL LETTER I WITH DOT ABOVE, which is \u0130.
That suggests the problem is on your PHP side. If I had to guess, I'd suspect that the İ in your source file is not itself in the encoding that PHP expects. You may need to pass the "from" encoding to mb_convert_encoding, depending on what encoding your editor is using.
I would strongly recommend that you stay in UTF-8 entirely if possible, and avoid creating a CP1254 (Turkish) string at all. UTF-8 is capable of encoding all the characters you need. In that case, you may be able to avoid the mb_convert_encoding entirely.
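One scenario consistent with the symptoms can be reproduced in a few lines. This is a sketch in Python rather than PHP, and it rests on an assumption that is not confirmed by the question: that the source file was saved as Windows-1254, where İ is the single byte 0xDD, and that this byte was then treated as Latin-1, where 0xDD is Ý.

```python
import json

original = "MATEMATİK"

# Assumed failure mode: bytes written as Windows-1254, read back as Latin-1.
misread = original.encode("cp1254").decode("latin-1")
print(misread)              # the Ý variant seen on the iOS side
print(json.dumps(misread))  # JSON with the wrong escape, \u00dd

# What the service would emit if the string were genuinely UTF-8 throughout.
print(json.dumps(original))  # JSON with the intended escape, \u0130

# Reversing the assumed misreading recovers the original text.
repaired = misread.encode("latin-1").decode("cp1254")
print(repaired)
```

If this is indeed what happened, saving the PHP file as UTF-8 (or naming the real source encoding as the "from" argument) should make the service emit \u0130 instead of \u00dd.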
I am currently learning C++/CLI and I want to convert a character to its decimal ASCII code and vice versa (for example, 'A' = 65).
In Java, this can be achieved with a simple type cast:
char ascci = 'A';
char retrieveASCII =' ';
int decimalValue;
decimalValue = (int)ascci;
retrieveASCII = (char)decimalValue;
Apparently this method does not work in C++/CLI. Here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32:: Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any idea how to solve this problem?
Characters in a TextBox::Text property are in a System::String type. Therefore, they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string only has those characters, you can convert to an ASCII encoding without losing any of them. Otherwise, you'd have to have a strategy of omitting or substituting characters or throwing an exception.
The ASCII character set has one encoding in current use. It represents all of its characters in one byte each.
// using ::System::Text;
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
Decimal is, of course, a representation of a number. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could be like this:
const auto explanation = "The encoding (in decimal) "
+ "for the first character in ASCII is "
+ decimalValue;
Note the use of auto. I have omitted the types of the variables because the compiler can figure them out. It allows the code to be more focused on concepts rather than boilerplate. Also, I used const because I don't believe the value of "variables" should be varied. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is just by coincidence. If it had been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have functionality very similar to the .NET String and Encoding classes. (Encoding is the proper term, though.) They both use the Unicode character set and UTF-16 encoding for strings.
More like Java than you think:
String^ words = "ABCDEFG";
Char first = words[0];
String^ retrieveASCII;
int decimalValue = (int)first;
retrieveASCII = decimalValue.ToString();
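For comparison, the same round trip in a language-neutral form: the sketch below uses Python, where ord and chr convert between a character and its code point, and encoding to ASCII gives the byte value discussed in the first answer. The variable names mirror the question and are otherwise arbitrary.

```python
words = "ABCDEFG"

# Character -> number: the code point of the first character.
decimal_value = ord(words[0])

# Number -> character: the reverse conversion.
retrieved = chr(decimal_value)

# Going through an explicit ASCII encoding yields the same value,
# and fails loudly for characters outside ASCII.
ascii_bytes = words[:1].encode("ascii")

print(decimal_value, retrieved, ascii_bytes[0])
```

For characters in the ASCII range the code point and the ASCII byte value coincide, which is why the plain cast and the explicit encoding give the same number.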
Currently in L4 (Laravel 4) you can't generate a slug from a Cyrillic string. In L3 (Laravel 3) there was an ASCII array for that. Where and how can I add this array, or the ability to create a slug from a Cyrillic string?
EDIT
The library https://github.com/cocur/slugify is a good option, but I decided to build a custom Slug library for L4 from the L3 methods and ASCII array. Now I have a working slug maker in L4, just like in L3.
You can install this library (https://github.com/cocur/slugify) via Composer and use it.
It's super easy to install and use.
I faced this problem when working with Arabic, so I wrote the following function, which solved it for me.
function make_slug($string = null, $separator = "-") {
    if (is_null($string)) {
        return "";
    }
    // Remove spaces from the beginning and from the end of the string
    $string = trim($string);
    // Lower case everything
    // using mb_strtolower() is important for non-Latin UTF-8 strings | more info: http://goo.gl/QL2tzK
    $string = mb_strtolower($string, "UTF-8");
    // Make alphanumeric (removes all other characters)
    // this makes the string safe, especially when used as part of a URL
    // this keeps Latin characters and Arabic characters as well
    $string = preg_replace("/[^a-z0-9_\s-ءاأإآؤئبتثجحخدذرزسشصضطظعغفقكلمنهويةى]/u", "", $string);
    // Collapse multiple dashes or whitespaces into a single space
    $string = preg_replace("/[\s-]+/", " ", $string);
    // Convert whitespaces and underscores to the given separator
    $string = preg_replace("/[\s_]/", $separator, $string);
    return $string;
}
This function solves the problem only for Arabic. To handle Cyrillic or any other language, add that language's characters beside, or instead of, the existing Arabic characters (ءاأإآؤئبتثجحخدذرزسشصضطظعغفقكلمنهويةى) in the character class.
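Instead of listing every alphabet by hand, a Unicode-aware character class can keep letters from any script. The sketch below shows that variation in Python, not PHP, and is not the function above: it relies on the fact that \w in Python's re module matches Unicode letters and digits for str patterns. Combining marks are not covered by \w, so scripts that depend on them would need extra handling.

```python
import re


def make_slug(text: str, separator: str = "-") -> str:
    """Build a slug that keeps letters and digits from any script."""
    text = text.strip().lower()
    # Drop everything that is not a word character, whitespace, or hyphen.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace, underscores, and hyphens into one separator.
    text = re.sub(r"[\s_-]+", lambda _match: separator, text)
    # Remove separators left over at either end.
    return text.strip(separator)


print(make_slug("  Привет, мир!  "))
print(make_slug("Hello   World_2"))
```

This keeps the original characters in the slug, as the function above does for Arabic, instead of transliterating them to ASCII the way the L3 array did.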