We have a piece of cross-platform code that uses wide strings. All our string constants are wide strings and we need to use CFSTR() on some of them. We use these macros to get rid of L from wide strings:
// strip leading L"..." from wide string macros
// expand macro, e.g. turn WIDE_STRING (#define WIDE_STRING L"...") into L"..."
# define WIDE2NARROW(WideMacro) REMOVE_L(WideMacro)
// L"..." -> REM_L"..."
# define REMOVE_L(WideString) REM_##WideString
// REM_L"..." -> "..."
# define REM_L
This works on both Windows and Linux. Not on Mac – we get the following error:
“error: pasting "REM_" and "L"qm"" does not give a valid preprocessing token”
Mac example:
#define TRANSLATIONS_DIR_BASE_NAME L"Translations"
#define TRANSLATIONS_FILE_NAME_EXTENSION L"qm"
CFURLRef appUrlRef = CFBundleCopyResourceURL( CFBundleGetMainBundle()
, macTranslationFileName
, CFSTR(WIDE2NARROW(TRANSLATIONS_FILE_NAME_EXTENSION))
, CFSTR(WIDE2NARROW(TRANSLATIONS_DIR_BASE_NAME))
);
Any ideas?
Tokenization happens before macro expansion, so by the time the ## operator runs, L"qm" is already a single wide string literal token. That means you are pasting REM_ onto a string literal (and not onto the letter L), and the result, REM_L"qm", is not a single valid preprocessing token, which C99 does not allow.
How can I add an angle symbol to a string to put in a TMemo?
I can add a degree symbol easily enough based on its octal value from the extended ASCII table:
String deg = "\260"; // 260 (0xB0) is the octal value of the degree sign in Latin-1/Windows-1252; \272 (0xBA) is the masculine ordinal indicator
Form1->Memo1->Lines->Add("My angle = 90" + deg);
But if I try to use the escape sequence for the angle symbol (\u2220), I get a compiler error, W8114 Character represented by universal-character-name \u2220 cannot be represented in the current ansi locale:
UnicodeString deg = "\u2220";
Form1->Memo1->Lines->Add("My angle = 90" + deg);
Just for clarity, below is the symbol I'm after. I can just use the # if I have to; I'm just wondering if this is possible without gnashing of teeth. My target for this test was Win32, but I'll want it to work on iOS and Android too.
p.s. This table is handy to see the codes.
After following Rob's answer I've got it working, but on iOS the angle symbol is offset below the baseline of the other text. On Win32 it is tiny. It looks good on Android. I'll report it as a bug to Embarcadero, albeit a minor one.
Here is the code I used based on Rob's comments:
UnicodeString szDeg;
UnicodeString szAng;
szAng.SetLength(1);
szDeg.SetLength(1);
*(szAng.c_str()) = 0x2220;
*(szDeg.c_str()) = 0x00B0; // U+00B0 DEGREE SIGN (U+00BA is the masculine ordinal indicator)
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
Here is how it looks when I explicitly set the TMemo font to Courier New:
Here is the final code I'm using after Remy's replies:
UnicodeString szAng = _D("\u2220");
UnicodeString szDeg = _D("\u00B0"); // U+00B0 DEGREE SIGN (U+00BA is the masculine ordinal indicator)
Form1->Memo1->Lines->Add("1: " + FormatFloat("##,###0.0",myPhasors.M1)+ szAng + FormatFloat("###0.0",myPhasors.A1) + szDeg);
The compiler error is because you are using a narrow ANSI string literal, and \u2220 does not fit in a char. Use a Unicode string literal instead:
UnicodeString deg = _D("\u2220");
The RTL's _D() macro prefixes the literal with either the L or u prefix depending on whether UnicodeString uses wchar_t (Windows only) or char16_t (other platforms) for its character data.
The error indicates some kind of code range failure, which you ought to be able to avoid. Try setting the character code directly:
UnicodeString szDeg;
UnicodeString szMessage;
szDeg.SetLength(1);
*(szDeg.c_str())=0x2220; // U+2220 ANGLE (0x2022 would be the bullet character)
szMessage=UnicodeString(L"My angle = 90 ")+szDeg;
Form1->Memo1->Lines->Add(szMessage);
A trivial implementation:
extern crate unicode_width;

fn main() {
    let prompt = "\x1b[1;32m>>\x1b[0m ";
    println!("{}", unicode_width::UnicodeWidthStr::width(prompt));
}
returns 12 but 3 is expected.
I would also be happy to use a crate that already does this, if there is one.
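You're not going to get the displayed width of a string containing escape sequences from a Unicode width calculation, simply because the escape sequences are consumed by the terminal rather than printed, and the width calculation knows nothing about that.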
If you control the content of the string, you could calculate the width by
copying the string to a temporary variable
replacing the escape sequences with empty strings, e.g., matching the pattern that starts with \x1b, allows any combination of [, ], <, >, =, ?, ; or decimal digits, and ends with a "final" character in the range @ to ~
measuring the length of what (if anything) is left.
In your example
let prompt = "\x1b[1;32m>>\x1b[0m ";
only ">> " would be left to measure.
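As a rough illustration of those steps, here is a minimal sketch. It is written in Python rather than Rust only to show the pattern; the names CSI_PATTERN and visible_length are made up for this example, and it assumes the string contains only CSI-style sequences (ESC [ ... final byte) and characters that each occupy one column.

```python
import re

# ESC [ followed by parameter bytes (0-9 : ; < = > ?), optional
# intermediate bytes (space through /), and one final byte (@ through ~).
# Hypothetical name, used only for this sketch.
CSI_PATTERN = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_length(text: str) -> int:
    """Return the number of characters left after removing CSI sequences."""
    stripped = CSI_PATTERN.sub("", text)  # work on a copy, not the original
    return len(stripped)


prompt = "\x1b[1;32m>>\x1b[0m "
print(visible_length(prompt))  # 3
```

If the remaining text may contain wide or zero-width characters, the stripped string would have to go through a proper width function (such as the one in unicode_width) instead of a plain length.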
For patterns... you would start here: Regex
Further reading:
crate Regex
17.3 Strings, Rust by Example
I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal the following ways: Starting character is the blank `` `` which has the value 0. or Starting character is the blank :literal:` ` which has the value 0. but there are a few problems with how these end up working:
1. reST syntax objects to whitespace immediately inside the literal, so the literal doesn't get recognized.
2. The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
3. The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
4. In both 2 and 3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
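The wrapping problem in the last point can be reproduced outside Sphinx with the standard library's textwrap. This is only a sketch under assumptions: the width of 35 is picked so that the line boundary lands inside the literal, and it calls plain textwrap.wrap rather than the wrapper class Sphinx's text writer actually uses.

```python
import textwrap

# The sentence as it would look with the space kept inside the inline literal.
sentence = "Starting character is the blank `` `` which has the value 0."

# Width chosen (for this illustration only) so the break falls at the space
# between the two pairs of backticks.
lines = textwrap.wrap(sentence, width=35)

for line in lines:
    print(repr(line))
```

With this width the wrapper treats the space inside the literal like any other whitespace, so it breaks there and drops the space from the end of the line.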
I feel like I'm missing something; is there a good way to handle this?
Try using the unicode character codes. If I understand your question, this should work.
Here is a "|space|" and a non-breaking space (|nbspc|)
.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space
You should see:
Here is a “ ” and a non-breaking space ( )
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.
There are two main components to the approach:
introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
modify the text-wrapper Sphinx uses so that it won't break at the space.
Here's the code:
class TextWrapperDeux(TextWrapper):
    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not between backticks
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re

def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        character = nodes.unicodedata.lookup(text)
    except KeyError:
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name at line %d' % (text, lineno))
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    app = inliner.document.settings.env.app
    describe_char = "(U+%05X %s)" % (ord(character), text)
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []

def setup(app):
    app.add_role('char', char_role)
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.
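The lookup-and-describe step of the role can be tried on its own, without Sphinx or docutils. The sketch below is not the role itself: describe_char is a hypothetical helper name, and it only mirrors the string formatting used above to produce the plaintext form.

```python
import unicodedata


def describe_char(name: str) -> str:
    """Build the plaintext description for a character given by Unicode name.

    Hypothetical helper for illustration; raises KeyError for unknown names,
    which is the case the role reports as an error.
    """
    character = unicodedata.lookup(name)
    return "char:`%s`(U+%05X %s)" % (character, ord(character), name)


print(describe_char("SPACE"))  # char:` `(U+00020 SPACE)
```

Because the argument is a Unicode name rather than the character itself, the markup never has to contain a bare space inside a literal.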
I am currently learning C++/CLI and I want to convert a character to its ASCII code in decimal and vice versa (for example, 'A' = 65).
In Java, this can be achieved with a simple type cast:
char ascci = 'A';
char retrieveASCII =' ';
int decimalValue;
decimalValue = (int)ascci;
retrieveASCII = (char)decimalValue;
Apparently this method does not work in C++/CLI. Here is my code:
String^ words = "ABCDEFG";
String^ getChars;
String^ retrieveASCII;
int decimalValue;
getChars = words->Substring(0, 1);
decimalValue = Int32::Parse(getChars);
retrieveASCII = decimalValue.ToString();
I am getting this error:
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Input string was not in a correct format.
Any idea how to solve this problem?
Characters in a TextBox::Text property are in a System::String type. Therefore, they are Unicode characters. By design, the Unicode character set includes all of the ASCII characters. So, if the string only has those characters, you can convert to an ASCII encoding without losing any of them. Otherwise, you'd have to have a strategy of omitting or substituting characters or throwing an exception.
The ASCII character set has one encoding in current use. It represents all of its characters in one byte each.
// using ::System::Text;
const auto asciiBytes = Encoding::ASCII->GetBytes(words->Substring(0,1));
const auto decimalValue = asciiBytes[0]; // the length is 1 as explained above
const auto retrieveASCII = Encoding::ASCII->GetString(asciiBytes);
Decimal is, of course, a representation of a number. I don't see where you are using decimal except in your explanation. If you did want to use it in code, it could be like this:
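// String::Format is used here because adding narrow string literals with + is pointer arithmetic in C++
const auto explanation = String::Format(
    "The encoding (in decimal) for the first character in ASCII is {0}",
    decimalValue);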
Note the use of auto. I have omitted the types of the variables because the compiler can figure them out. It allows the code to be more focused on concepts rather than boilerplate. Also, I used const because I don't believe the value of "variables" should be varied. Neither of these is required.
BTW, all of this applies to Java, too. If your Java code works, it is just by coincidence. If it had been written properly, it would have been easy to translate to .NET. Java's String and Charset classes have very similar functionality to the .NET String and Encoding classes. (Encoding is the proper term, though.) They both use the Unicode character set and UTF-16 encoding for strings.
More like Java than you think
String^ words = "ABCDEFG";
Char first = words[0];
String^ retrieveASCII;
int decimalValue = (int)first;
retrieveASCII = decimalValue.ToString();
So far I have only seen things like '<', but never 'abc' or "abc", in a yacc file.
a:
b '<' c;
Are the latter two valid at all?
'abc' is accepted as a character constant: when you write a constant like this, the compiler/preprocessor simply drops the last character. Sometimes you will instead get a "character constants must be one or two characters long" compile-time error in ANSI C. If your compiler does not report that error, assume it has dropped the trailing 'c' from 'abc'.
so
char ch='abc' ; // is actually equi. to ch = 'ab'
but when the value is bound, only ch = 'a' is used. That is why 'abc' is syntactically correct but semantically wrong as a character. (I wrote C because we use the c89 tool, i.e. POSIX C, for compiling yacc and lex inputs.)
Also, yylex() works on characters as its basic functional unit, not on strings (anything inside double quotes). So "abc" is not a valid character, and not even a character that can be matched against yylex()'s input.
(yylex() accepts a string of tokens,
for example "10+20",
with the grammar [[:digit:]]+ [-+*/%] [[:digit:]]+
and the tokens 1, 0, +, 2, 0.
The tokens lex can identify by default, without a grammar being specified, are
10 as a number,
+ as a char, and
20 as a number again,
so it will match the grammar specified above.)
You can also specify a string in the rules section to match against, like
^"I am" means match any input line starting with "I am" (written as ["I am"] it would be a character class, not a string)
"I am" matches only input that is exactly the string "I am"; it won't match "I am Swapnil # vikas.ghode#gmail.com"