Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
"\u5546\u54c1\u7f16\u53f7" is displayed as "商品编号".
"\u5546\u54c1\u7f16\u53f7" # => "商品编号"
What is the character encoding in "\u5546\u54c1\u7f16\u53f7"? How can I convert "商品编号" to "\u5546\u54c1\u7f16\u53f7"?
The \uHHHH (where HHHH is in hex) notation is simply a way to reference Unicode characters by number. This is usually used when:
You don't know how to get things like 商 out of your keyboard.
You're working in an environment that can't display all the Unicode that you need.
When you say "\u5546\u54c1\u7f16\u53f7" and see "商品编号", it simply means that you're working in a modern terminal that is Unicode aware and has a good font.
In most cases it shouldn't matter which representation you use; it all ends up as the same bytes inside the machine. However, if you must get the \u version for some reason, then you can say things like this (assuming that your encoding starts out right):
ascii_friendly = str.chars.map { |c| '\u%4.4x' % c.ord }.join
Then when you print ascii_friendly to the screen, a file, or say a JSON stream, you'll see things like
\u5546\u54c1\u7f16\u53f7
Note that the \u5546 in there is not the single Unicode 商, it is the six characters \, u, 5, 5, 4, and 6. If your target is JSON, then the \u escapes will be interpreted properly when the JSON is parsed but if your target is anything else, it will just see the six characters rather than the single Unicode character you're looking for.
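As a rough sketch of the round trip described above, the following converts a string to its \u form and back. The helper names `to_u_escapes` and `from_u_escapes` are made up for illustration, and the code assumes every character fits in four hex digits (U+FFFF or below).

```ruby
require 'json'

# Hypothetical helper names; they are not part of the original answer.
def to_u_escapes(str)
  # Each character becomes a backslash, "u", and four lowercase hex digits.
  # Assumption: all characters are at or below U+FFFF.
  str.each_char.map { |c| format('\u%04x', c.ord) }.join
end

def from_u_escapes(escaped)
  # Turn each six-character \uHHHH sequence back into a single character.
  escaped.gsub(/\\u(\h{4})/) { [$1.hex].pack('U') }
end

original = "商品编号"
escaped  = to_u_escapes(original)

puts escaped                              # \u5546\u54c1\u7f16\u53f7
puts escaped.length                       # 24 (six plain characters per escape)
puts from_u_escapes(escaped) == original  # true

# A JSON parser interprets the escapes; the array wrapper keeps this
# working on older JSON versions that reject a bare top-level string.
puts JSON.parse("[\"#{escaped}\"]").first == original  # true
```

The length of 24 is the point of the note above: the escaped string holds four six-character sequences, not four Unicode characters.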
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I am having trouble remembering which one of the parameter expansions ${var%subst} or ${var#subst} remove from the front and which one from the back of the string. Example:
$ var=/a/b/c
$ echo dirname=${var#/*} filename=${var%%*/}
dirname=a/b/c filename=/a/b/c # Wrong!
$ echo dirname=${var%/*} filename=${var##*/}
dirname=/a/b filename=c
I always mix them up and end up either writing some test commands or checking the manual. It's easy to remember that %% removes more than %, because %% is a longer string than % (2 characters vs. 1 character), and the same goes for ## vs. #. But I always mix up % and #.
Is there a mnemonic for remembering which of % and # removes from which end of the string?
Percent symbols % always come last in numbers (e.g. 86%), so they remove from the end.
The hash symbols # start comments, so they remove from the start.
Remember only one of them; the other is its opposite.
# A shebang starts with a hash, so # removes from the start of the string up to the pattern. If you remember this, you know what the other one does, IMHO :)
For keyboards of US English layout, # is on the left and % on the right.
I use a figurative representation of the symbols:
%: Looks like a pair of scissors to cut the right part of a text strip while you hold the scissors in your right hand and the strip in your left hand.
#: Looks like an eraser you use to erase the left part of a text strip while you hold the right part with your right hand and the eraser in your left hand.
//: This bash specific text-replace, looks like the cuts in a strip to re-assemble or edit in-between parts.
And I thought I was alone with this problem.
My mnemonic is that since percent sign (%) looks like a slash surrounded by a couple of circles (°/o), put the slashes alongside ie. %/ - also, that also looks like it says o lol (°/o/).
With the asterisk in place (°/o/*) it also looks like a guyperson holding an orb and a BZZRT in hishir raised hands (can not be unseen).
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I have to match pairs of strings, ignoring spaces " " and hyphens "-". I want to regard the following pairs as identical.
"2,3 chloro benzene" and "2,3 chlorobenzene"
"4'3',2-dinitrotoluene" and "4'3',2-di nitro toluene"
Due to the spaces, I cannot match them. How can I do that? I am not sure how to do it in Ruby.
Use String#delete to delete unwanted chars and normalize the two strings before comparing them, as shown below:
s1 = "2,3 chloro-benzene"
s2 = "2,3 chlorobenzene"
s1.delete(" -") == s2.delete(" -")
#=> true
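Applied to both pairs from the question, the same idea might look like the sketch below. The method name `same_ignoring_spaces_and_hyphens?` is a hypothetical helper, not something from the answer.

```ruby
# Hypothetical helper: strip spaces and hyphens from both strings, then compare.
def same_ignoring_spaces_and_hyphens?(a, b)
  # In String#delete, a trailing "-" is a literal hyphen, not a range marker.
  a.delete(" -") == b.delete(" -")
end

pairs = [
  ["2,3 chloro benzene", "2,3 chlorobenzene"],
  ["4'3',2-dinitrotoluene", "4'3',2-di nitro toluene"]
]

pairs.each do |a, b|
  puts same_ignoring_spaces_and_hyphens?(a, b)  # true for both pairs
end

# Strings that differ in anything other than spaces or hyphens still differ.
puts same_ignoring_spaces_and_hyphens?("2,3 chlorobenzene", "2,4 chlorobenzene")  # false
```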
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
The symbol is: ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ
What's so special about this symbol and where did it come from?
What can be done to validate against such input? Or even better, how can such symbols be displayed properly (i.e. not letting them overlap over other elements) ?
Well, since it seems not to be as trivial for others as I thought, here is my answer.
These are called Combining Diacritical Marks.
To give you an example, you can write ä directly as a single character, or as a plain a followed by the combining diaeresis (U+0308), which also renders as "ä".
Now you can mess up with that signs like here: "ä̈̈̈̈̈̈", here I entered: ä̈̈̈̈̈̈
To protect yourself against such "Unicode" attacks, you could limit the number of combining characters that are allowed to follow one another. I cannot give you an exact example since your tags don't give a hint about your server-side language. If you have a plain English website, you might try to limit input to ASCII characters only. However, I would not recommend that, since I would then not be allowed to sign with my name :-)
I would just limit the number of consecutive combining characters. That might be done with a regex.
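Since the server-side language is unknown, here is only a sketch of such a regex check in Ruby. The limit of 2 marks and the names `MAX_MARKS`, `excessive_marks?` and `clamp_marks` are assumptions for illustration; a real limit depends on the languages you need to accept.

```ruby
MAX_MARKS = 2  # assumed limit; legitimate text rarely stacks many marks

# \p{M} matches any Unicode combining mark (diaeresis, sukun, ...).
TOO_MANY_MARKS = /\p{M}{#{MAX_MARKS + 1},}/
EXTRA_MARKS    = /(\p{M}{#{MAX_MARKS}})\p{M}+/

# Hypothetical validator: true if more than MAX_MARKS marks appear in a row.
def excessive_marks?(text)
  text.match?(TOO_MANY_MARKS)
end

# Hypothetical sanitizer: keep the first MAX_MARKS marks of each run, drop the rest.
def clamp_marks(text)
  text.gsub(EXTRA_MARKS, '\1')
end

stacked = "\u0624" + "\u0652" * 161  # waw with hamza above plus 161 sukun marks
normal  = "a\u0308"                  # "a" plus one combining diaeresis

puts excessive_marks?(stacked)    # true
puts excessive_marks?(normal)     # false
puts clamp_marks(stacked).length  # 3 (the letter plus two marks)
```

Rejecting the input and clamping it are two different policies; which one fits depends on whether you would rather refuse a submission or silently clean it.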
If you just want to keep the Unicode characters from "breaking out" of their container, try using style="overflow:auto", which seems to constrain how they are rendered.
I just copied the symbol to SQL Server and Visual Studio and found that the symbol got converted to
So it looks like the combination of ْ (which looks like an Arabic symbol)symbol which the browser is not able to recognize.
The symbol is Arabic Hamza symbol.
Also the same symbol is interpreted correctly by IE.
So it looks like some browsers are not able to recognize the symbol.
EDIT:
To validate such input, you can usually apply some sort of validation (for example, restricting the user to ASCII characters only) in a language like JavaScript or PHP, which lets you limit the input to the characters of your choice.
Or even better, how can such symbols be displayed properly
If the browser cannot render the symbol as the one you have shown then as a workaround you can put some limit on those characters like put them inside a div with overflow:auto but that would not be a good solution. A better one would be to use a validation script.
It is strange that on screen you see only one character, followed by a line drawn from nowhere.
But when inspected with Chrome, it is actually a sequence of characters: the first has Unicode code point 1572, followed by 161 characters with code point 1618 that draw the line. After that there is code point 32 (the same in ASCII), a space.
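The same inspection can be sketched in Ruby. The string below is rebuilt from the code points listed above rather than pasted from the page, so it is an assumption that it matches the original symbol exactly.

```ruby
# The symbol rebuilt from its reported code points:
# 1572 (U+0624), then 161 times 1618 (U+0652), then a space (32).
symbol = "\u0624" + "\u0652" * 161 + " "

# Count how often each code point occurs, in order of first appearance.
counts = symbol.codepoints.each_with_object(Hash.new(0)) { |cp, h| h[cp] += 1 }

counts.each do |cp, n|
  puts format("U+%04X (decimal %d) x %d", cp, cp, n)
end
# U+0624 (decimal 1572) x 1
# U+0652 (decimal 1618) x 161
# U+0020 (decimal 32) x 1
```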
I am not sure if parsing your symbols in Javascript is gonna be helpful but here is a script that does that:
var text = 'your symbol goes here',
    // inside a character class, | would be a literal pipe, so it is left out
    regex1 = /[\u0624\u0652]/g,
    result;
// note that the symbol consists of the letter and the repeated diacritics;
// to remove the symbol completely:
result = text.replace(regex1, '');
Here is a way to see what kind of characters are included in the symbol and how these chars made it looked very weird (it’s using javascript regex):
https://regex101.com/r/yW4aM8/3
You may wanna use the meta tag charset=UTF-8 to render the entire symbol correctly on all browsers rather than only on IE. I would say the only reason your symbol looks weird is that the diacritics (the repeated chars) are not used correctly; otherwise, the chars included are all legit. I wouldn't really be surprised if this symbol is just someone trying to misuse a form input or something for the same effect.
The symbol is using pure Arabic characters, and just for you to know the range of this language’s characters in the unicode are as follows (javascript regex) and available at unicode.org:
/[\u0600-\u06FF]/g
/[\u0600-\u06FF]/g.exec('text here');
// it's advised that you wrap the Arabic words in spans to control and show them correctly, do the following:
'text includes arabic words'.replace(/(?:([\u0600-\u06FF]+))/g, '<span class="xyz">$1</span>');
and the css would be:
.xyz { unicode-bidi: bidi-override; }
I hope that helps a bit.
good luck.
$ echo -n ؤْْ | recode utf8..dump
UCS2 Mne Description
0624 wH arabic letter waw with hamza above
0652 0+ arabic sukun
0652 0+ arabic sukun
0652 0+ arabic sukun
[...lots of repeated lines...]
0652 0+ arabic sukun
That's the arabic waw (w) with a lot of diacritics: 1 hamza (precomposed as the character waw with hamza above) and about 160 repeated sukun diacritics.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Is there an easy way to take what I write to the write window and log it to a file? Or do I need to separately create an array of chars manually and open a file to write the char[] to? I would love to be able to write to file using regular expressions at the very least but I'm not finding much helpful info from the docs.
Looks like writeToLogEx(char format[], ...) can do what I want but it outputs to a Logging Block in the measurement setup. So I'll have some header and footer data that I don't want as well as CAN traffic if I don't put up a channel block.
Vector's Example:
char timeBuffer[64];
getLocalTimeString(timeBuffer);
writeToLogEx("===> %s",timeBuffer);
Format Specifier Options:
"%ld","%d" decimal display
"%lx","%x" hexadecimal display
"%lX","%X" hexadecimal display (upper case)
"%lu","%u" unsigned display
"%lo","%o" octal display
"%s" display a string
"%g","%lf" floating point display
"%c" display a character
"%%" display %-character
"%I64d" decimal display of a 64 bit value
"%I64x" hexadecimal display of a 64 bit value
"%I64X" hexadecimal display of a 64 bit value (upper case)
"%I64u" unsigned display of a 64 bit value
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I would like to start a new line after every 66 characters for any file that is input into a Ruby script.
some_string.insert( 66, "\n" )
puts some_string
shows that a new line starts after the 66th character but I need it to happen after each 66th character. In other words, each line should be 66 characters long (except possibly the last).
I'm sure it involves a regex but I've tried various with insert, scan, gsub and cannot get it to work.
I'm new to Ruby and programming and this is the first thing I've tried outside of a tutorial. Thanks for the information, all.
You could do something like this:
<your_string>.scan(/.{1,66}/).join("\n")
It will basically split <your_string> at every 66th character and then re-join it by adding the \n between each part.
Or this variation to not split words in half (the $ alternative keeps the final chunk, which has no trailing space):
<your_string>.scan(/.{1,66}(?: |$)/).join("\n")
some_string.gsub(/.{66}/, "\\0\n")
If you're interested in exploring an answer that doesn't use RegEx, try something like:
a = "Your string goes here"
d = 66
Array(0..a.length/d).collect {|j| a[j*d..(j+1)*d-1]}.join("\n")
The RegEx is likely faster, but this uses the Array constructor, .collect and .join, so it might be an interesting learning exercise. The first part generates an array of chunk indices based on the number of chunks (a.length/d). The collect gathers the substrings into an array. The body of the collect generates substrings by ranges on the original string, and the join puts it back together with "\n" separators.
Use the following to split the string into an array of strings of length 66 and join those strings with a newline character.
some_string.scan(/.{1,66}/).join("\n")
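To see the scan-and-join approach at work, here is a small sketch that uses a width of 10 instead of 66 so the output is easy to check by eye. The method names `wrap_fixed` and `wrap_words` are hypothetical, and `wrap_words` assumes no single word is longer than the width.

```ruby
# Hypothetical helper: break the text after every `width` characters.
def wrap_fixed(text, width = 66)
  # scan returns chunks of 1..width characters; join puts "\n" between them.
  text.scan(/.{1,#{width}}/).join("\n")
end

# Hypothetical helper: break only at spaces so words stay whole.
# Assumption: no word exceeds `width`; longer words would lose characters.
def wrap_words(text, width = 66)
  text.scan(/.{1,#{width}}(?: |$)/).map(&:rstrip).join("\n")
end

puts wrap_fixed("abcdefghij" * 2 + "xyz", 10)
# abcdefghij
# abcdefghij
# xyz

puts wrap_words("the quick brown fox", 10)
# the quick
# brown fox
```

Because `.` does not match a newline, existing line breaks in a file act as chunk boundaries as well, so each input line is wrapped on its own.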