View special chars in Sublime Text - sublimetext

I am using both Notepad2 and Sublime Text 3, and I prefer ST3 over Notepad2 because it has a lot of great features. One thing I really miss, though, is the ability to view special characters in a log file.
If I have a logfile with this one line in it (<null> is the HEX char 0x00):
ERROR: Received invalid data string [<null><null>e<null><null>test</null>]
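If I open it in Notepad2, the line is shown as text, with each null byte displayed inline as a special-character marker.
If I open it in ST3, the file is shown in a HEX view instead.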
Is it possible to get the same view in ST3 as in Notepad2, so I can see the special characters?

I just found this option which can be set in the User Settings:
// Files containing null bytes are opened as hexadecimal by default
"enable_hexadecimal_encoding": false
This gives exactly what I wanted: ST3 now opens the file as text, with the null bytes shown inline as in Notepad2.

I've been using the HexViewer package:
https://sublime.wbond.net/packages/HexViewer
But it does not map \0 to NUL, which may cause alignment issues (unless your font has a fixed-width NUL glyph).
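For comparison with the Notepad2 view, here is a minimal Python sketch (not part of Sublime Text or HexViewer) that renders a raw log line with each ASCII control byte replaced by its standard abbreviation, so every NUL stays visible and takes a fixed width. The `show_controls` name, the decision to leave tab, LF and CR untouched, and the sample line (modeled on the one in the question) are assumptions for illustration.

```python
# Standard ASCII abbreviations for control codes 0x00-0x1F, in code order.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

KEEP_AS_IS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def show_controls(data: bytes) -> str:
    """Render bytes as text, replacing control bytes with <NAME> markers."""
    parts = []
    for b in data:
        if b < 0x20 and b not in KEEP_AS_IS:
            parts.append(f"<{CONTROL_NAMES[b]}>")
        elif b == 0x7F:
            parts.append("<DEL>")
        else:
            # latin-1 maps every byte 0x00-0xFF to exactly one character.
            parts.append(bytes([b]).decode("latin-1"))
    return "".join(parts)


if __name__ == "__main__":
    line = b"ERROR: Received invalid data string [\x00\x00e\x00\x00test\x00]"
    print(show_controls(line))
```

Running it prints `ERROR: Received invalid data string [<NUL><NUL>e<NUL><NUL>test<NUL>]`. Decoding as latin-1 keeps the sketch byte-for-byte simple, but it will garble non-ASCII UTF-8 text, so it is only suited to mostly-ASCII logs.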

Related

Why is the facepalm emoji 🤦‍♀️ followed by U+200D and ♀?

I am trying to send UTF-8 symbols via a serial device to a browser and display them. I found that when I type the facepalm emoji 🤦‍♀️ (on Windows 10, via Win+.), it is followed by U+200D and ♀ characters. Other emojis don't have that; I checked with the "View non-printable unicode characters" tool. I also found that if you type it in Notepad, it shows ♀; if you type it in the browser address bar, the ♀ is invisible, but pressing Backspace deletes it; and if you type it in an HTML text input, a single Backspace deletes the whole emoji. Why is that?
Emoji sequences use more than one code point to signify variations (the entries below may or may not look different from each other, depending on your browser):
🤦 PERSON FACEPALMING U+1F926
🤦‍♂️ MAN FACEPALMING U+1F926 U+200D U+2642 U+FE0F
🤦‍♀️ WOMAN FACEPALMING U+1F926 U+200D U+2640 U+FE0F
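To inspect such a sequence directly, a small Python sketch using the standard `unicodedata` module lists each code point of the woman-facepalming sequence with its formal Unicode name; the `describe` helper is hypothetical. These formal names come from the Unicode Character Database, so they differ from the CLDR short names used in the list above.

```python
import unicodedata

# WOMAN FACEPALMING: person facepalming + ZWJ + female sign + VS16.
WOMAN_FACEPALMING = "\U0001F926\u200D\u2640\uFE0F"


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(WOMAN_FACEPALMING):
    print(line)
```

Python's `len()` counts code points, so `len(WOMAN_FACEPALMING)` is 4 even though the sequence renders as one emoji. The ZERO WIDTH JOINER (U+200D) glues the person and the ♀ sign together, and VARIATION SELECTOR-16 (U+FE0F) requests emoji-style presentation.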
References:
Emoji List, v13.1, No. 260-262
Full Emoji List, v13.1, No. 260-262 (with browser-specific images)
Unicode® Standard Annex #29, Unicode Text Segmentation
Some editors and browsers handle these sequences better than others: they may not show the differences between all variations, or may not recognize the latest Unicode specification and newer emojis.
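The different Backspace behavior comes down to what each program treats as one "character". An HTML text input deletes a whole user-perceived character (an extended grapheme cluster in the sense of UAX #29), while a field that deletes one code point at a time, as the address bar appears to, removes only the trailing part of the sequence. The Python sketch below approximates clustering with deliberately simplified rules (a ZWJ joins its neighbors; variation selectors, skin-tone modifiers and marks with a non-zero combining class attach to the previous character). It is an assumption for illustration, not a full UAX #29 implementation, and the helper names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"


def is_extender(ch: str) -> bool:
    """True for characters that attach to the preceding one in this simplified model."""
    cp = ord(ch)
    return (
        0xFE00 <= cp <= 0xFE0F             # variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF        # emoji skin-tone modifiers
        or unicodedata.combining(ch) != 0  # marks with a non-zero combining class
    )


def rough_clusters(text: str) -> list[str]:
    """Split text into approximate user-perceived characters (NOT full UAX #29)."""
    clusters: list[str] = []
    for ch in text:
        joins_previous = bool(clusters) and (
            ch == ZWJ or is_extender(ch) or clusters[-1].endswith(ZWJ)
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def backspace_cluster(text: str) -> str:
    """Delete the last approximate cluster, like an HTML text input."""
    return "".join(rough_clusters(text)[:-1])


def backspace_code_point(text: str) -> str:
    """Delete only the last code point."""
    return text[:-1]
```

With `s = "ok " + "\U0001F926\u200D\u2640\uFE0F"`, `backspace_cluster(s)` leaves `"ok "`, whereas `backspace_code_point(s)` removes only the invisible U+FE0F and leaves the rest of the sequence in place. For real text processing, a library that implements UAX #29 in full is the safer choice.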

How to insert/copy+paste unicode whitespace into a text file using editors like Textmate?

I am trying to create a test CSV file for a file-cleaning script that is supposed to normalize all whitespace to the "normal"/"regular" space character. The idea is to insert a bunch of these oddball whitespace characters at various locations in the test file.
Here are some sites that show these various oddball whitespace characters:
https://en.wikipedia.org/wiki/Whitespace_character
http://jkorpela.fi/chars/spaces.html
I've tried copying and pasting from sources like those websites, but the characters always seem to paste in as normal spaces in TextMate. It could be that I am not copying what I think I am copying. In the past I've been able to copy and paste special/Unicode characters into TextMate when I could clearly see what I was copying, but with whitespace characters I can't confirm it because I can't see them. So I am not sure whether the problem is where I am copying from, or whether TextMate converts them to normal spaces when I paste.
If it is easier to do this with TextEdit (the built-in editor) or nano (a command-line editor), I could use those. And if there is a better way than copying and pasting to get these characters into TextMate, that would be an option too.
I am on a MacBook Pro running macOS High Sierra.
If you have LibreOffice installed, you can use its spreadsheet application (Calc) to create these: enter the hexadecimal code point in one cell, then convert it in another cell with
=UNICHAR(HEX2DEC(cell_ref_to_first_cell))
This is far less confusing, and you can save the spreadsheet, complete with comments, as a handy reference. Then you can simply copy and paste the cell containing the Unicode character when required.
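If a script is acceptable instead of copy and paste, the same idea as =UNICHAR(HEX2DEC(...)) can be sketched in Python: turn each hex code point into a character with `chr(int(code, 16))` and write it into a CSV file with an explicit UTF-8 encoding, so nothing passes through a clipboard that might turn it into a normal space. The list of code points, the `build_rows`/`write_test_csv` names and the file name are illustrative assumptions.

```python
import csv
from typing import Iterable

# Hex code points of a few "oddball" spaces (an illustrative selection).
ODD_SPACES = ("00A0", "2002", "2003", "2009", "200A", "202F", "205F", "3000")


def build_rows(codes: Iterable[str]) -> list[list[str]]:
    """One row per code point, with the character embedded between two words."""
    rows = [["code_point", "sample"]]
    for code in codes:
        ch = chr(int(code, 16))  # same idea as =UNICHAR(HEX2DEC(code)) in Calc
        rows.append([f"U+{code}", f"left{ch}right"])
    return rows


def write_test_csv(path: str, codes: Iterable[str] = ODD_SPACES) -> None:
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(build_rows(codes))
```

Calling `write_test_csv("whitespace_test.csv")` writes one sample row per code point; opening the result in TextMate and using the Unicode bundle commands is a quick way to confirm the characters survived.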
If you’re using TextMate, various functions provided by the Unicode bundle could be helpful here (install via Preferences → Bundles → Unicode).
With this bundle installed, you can use Insert Unicode Character (⌃⌥⌘I) to insert a character by name. Search for “space” to get a list of all space characters, then click the desired character; the full name of each character is shown on hover.
Of course, once inserted, all the space characters look almost identical. To identify them, use Show Unicode Name(s) (⌃⌥⌘U, then 6). This displays a tooltip showing the Unicode name of the character directly before the cursor (or the names of all selected characters, if a selection is active).
Also have a look at Show Character Inventory (press ⌃⌥⌘U and then select the command from the popup menu): This provides a convenient overview of all the characters in your document (or in the selected text, if a selection is active).
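Outside TextMate, a rough equivalent of Show Character Inventory for this specific task is a short Python sketch that counts every whitespace character other than the plain ASCII space and reports it by code point and name. The `whitespace_inventory` and `inventory_file` helpers are hypothetical; characters without a name in the Unicode database (such as tab and newline) are reported as `<control>`.

```python
import unicodedata
from collections import Counter


def whitespace_inventory(text: str) -> Counter:
    """Count whitespace characters other than U+0020, keyed by 'U+XXXX NAME'."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isspace() and ch != " ":
            counts[f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}"] += 1
    return counts


def inventory_file(path: str) -> Counter:
    # Default newline handling translates \r\n to \n while reading.
    with open(path, encoding="utf-8") as f:
        return whitespace_inventory(f.read())
```

Running `inventory_file` on the test CSV before and after the cleaning script gives a quick check that every oddball space was actually normalized.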

Notepad++ convert leading spaces to tabs upon entry

Very close to the reverse of this question. I prefer coding with 2-space indentation, but I need files indented with tabs to match the project convention. Ideally, I would like Notepad++ to automatically convert every 2 spaces I type into a tab character, with the editor configured to a tab size of 2.
A possible manual way of doing this is Edit->Blank Operations->Space to TAB, but that converts all of my spaces to tabs, even single spaces such as those between function arguments, not just leading spaces.
In the best-case scenario, I'd get the formatting style described in this question while typing only spaces and letting the editor take care of the rest.
I'm on Notepad++ 6.0, but willing to upgrade if that helps.
To complete Ari Okkonen's answer, here is a workaround for the problem Sergii Zaskaleta raised in a comment: mixed tabs and spaces at the beginning of a line.
Settings->Preferences->Tab Settings->Tab size: 2 (if not already set)
Edit->Blank Operations->Space to TAB (Leading)
Select the block of lines that still mixes spaces and tabs, then press [Tab] and [Shift]+[Tab] to add and remove a tab on each line. In the process, the leading spaces are converted to tabs.
A manual way that seems to work (tested in Notepad++ v6.8.3): after editing the file, and before saving, try:
Settings->Preferences->Tab Settings->Tab size: 2 (if not already)
Edit->Blank Operations->Space to TAB (Leading)
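The "Space to TAB (Leading)" idea, including the mixed tabs-and-spaces case above, can be sketched in Python: measure the indentation width in columns (a tab advances to the next tab stop), then rebuild it from tabs plus any leftover spaces, leaving everything after the indentation untouched. This illustrates the technique with a tab size of 2; it is not Notepad++'s actual implementation, and the function names are hypothetical.

```python
def leading_spaces_to_tabs(line: str, tab_size: int = 2) -> str:
    """Rebuild the indentation of one line from tabs, leaving the rest alone."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    # Width of the indentation in columns; a tab jumps to the next tab stop.
    width = 0
    for ch in indent:
        if ch == "\t":
            width = (width // tab_size + 1) * tab_size
        else:
            width += 1
    tabs, spaces = divmod(width, tab_size)
    # Leftover spaces (fewer than tab_size) stay as spaces after the tabs.
    return "\t" * tabs + " " * spaces + body


def convert_text(text: str, tab_size: int = 2) -> str:
    """Apply leading_spaces_to_tabs to every line, keeping the line endings."""
    return "".join(
        leading_spaces_to_tabs(line, tab_size)
        for line in text.splitlines(keepends=True)
    )
```

Because only the indentation is rebuilt, single spaces between function arguments are never touched, which is exactly the problem with the plain Space to TAB command described in the question.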

Control Characters and How OSes/Text Editors Interpret Them?

I was going through some material about control characters, especially the newline character (I'll focus on that). After reading http://en.wikipedia.org/wiki/Control_characters, I learned that the line terminator is \n on Unix but \r\n on Windows. Now I wonder how the OS comes into the picture when interpreting ASCII codes. I was under the impression that when we type a given character on the keyboard, every OS sends the same bits, and the editor interprets those bits and displays the corresponding character. That understanding seems to be wrong, because different bits are sent on Unix (\n) and Windows (\r\n) when we press ENTER (the newline terminator). My new understanding is that if we press ENTER on different operating systems (say Unix and Windows), different bits are sent to the editor, and it is the text editor's responsibility to show the typed text on a new line, taking the underlying OS into account. Please let me know whether this understanding is correct, as it will help me understand other basics too.
My next question: if the above is correct, why do different operating systems treat some control characters differently when they treat all other characters the same? Is it because specific bit patterns are already reserved in a specific OS?
How an application treats keyboard input varies a bit, actually. When you press return the application is under no obligation to actually generate LF or CR+LF anywhere. E.g. it might decide to just end the current paragraph object and start a new one (e.g. in a word processor). If it's a Windows text editor then it will probably just write CR+LF into the file, while on Unix it just writes an LF.
The keyboard itself is very, very far removed from what you see on the screen or even on the disk. Input goes through scan codes, keyboard layouts and other transformations before it ends up as text or markup somewhere.
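To illustrate that LF versus CR+LF is decided when a program writes the file, rather than by the keyboard, here is a Python sketch: the program works with "\n" internally, and `io.TextIOWrapper` (the text layer that `open()` builds on) translates it to whichever line terminator is requested on output. The `encode_lines` helper is hypothetical.

```python
import io


def encode_lines(text: str, newline: str) -> bytes:
    """Write text through a TextIOWrapper and return the bytes that would hit the file."""
    buffer = io.BytesIO()
    wrapper = io.TextIOWrapper(buffer, encoding="ascii", newline=newline)
    wrapper.write(text)  # each "\n" is translated to `newline` on output
    wrapper.flush()
    data = buffer.getvalue()
    wrapper.detach()  # release the buffer without closing it
    return data


text = "first line\nsecond line\n"
print(encode_lines(text, "\n"))    # Unix-style: LF (0x0A) only
print(encode_lines(text, "\r\n"))  # Windows-style: CR (0x0D) followed by LF
```

The same in-memory string produces two different byte sequences, which matches the answer above: the application, not the keyboard or the character set, chooses which control characters end a line in the file.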

How to display Unicode control characters in the Visual Studio text visualizer?

I get a text string from a service that contains Unicode control characters
(e.g. \u202B or \u202A, among others, for Arabic language support).
But while debugging, I can't see them in the default text visualizer, and I need to display such characters to determine which ones my text contains. The text visualizer has a "show all characters" checkbox, but it doesn't work as I expect.
Any suggestions?
Thanks in advance
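Those are explicit directional formatting codes: U+202A is LEFT-TO-RIGHT EMBEDDING (LRE) and U+202B is RIGHT-TO-LEFT EMBEDDING (RLE). Each starts a run of text that should be laid out in the given direction (for example, a left-to-right run inside right-to-left text) until a matching U+202C POP DIRECTIONAL FORMATTING (PDF) or the end of the paragraph.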
http://unicode.org/reports/tr9/#Directional_Formatting_Codes
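Independent of the Visual Studio visualizer, one way to see which of these invisible characters a string contains is to replace every format character (Unicode general category Cf, which includes LRE, RLE and PDF) with a visible escape before displaying it. A minimal Python sketch; the `reveal_format_chars` name and the bracketed output format are assumptions, and the same idea can be ported to whatever language the service client uses.

```python
import unicodedata


def reveal_format_chars(text: str) -> str:
    """Replace invisible format characters (category Cf) with a visible escape and name."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"[\\u{ord(ch):04X} {unicodedata.name(ch, '?')}]")
        else:
            out.append(ch)
    return "".join(out)


sample = "\u202Babc\u202C ok"
print(reveal_format_chars(sample))
```

Because the check is by category rather than by a fixed list, it also reveals other invisible formatting characters such as ZERO WIDTH JOINER or the newer bidi isolates, not only the two codes mentioned in the question.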
