Ignore non-UTF-8 characters in Beyond Compare

My project includes some units of measurement that are expressed with non-UTF-8 characters, like the squared symbol. Most editors display these as the following character: �.
I am comparing parts of the source code with Beyond Compare, and I would like to ignore the cases where these symbols appear. I tried these two solutions:
Beyond Compare - ignore certain text strings?
How do I make Beyond Compare ignore certain differences while comparing versions of Delphi Form Files
but in both cases the differences are still marked in red (? vs � or ² vs �). How can I fix that?

If the characters are unprintable, you can define them as unimportant text in Beyond Compare 4's Text Compare by using their hex value.
As an example, assume the character is superscript 2, the squared symbol, with hex value 0x00B2.
Load files in the Text Compare.
Click the Rules toolbar button (referee icon).
In the Importance tab, click Edit Grammar.
In the Grammar tab, click +.
Element name: Squared
Text matching: \x{00B2}
Check Regular Expression
Click OK.
Click OK.
In the Grammar element list, uncheck Squared to make it unimportant.
Click OK.
If View | Ignore Unimportant Text is turned on, differences matching Squared will show as a match (black). If it is turned off, differences matching Squared will show in blue.
In the instructions above, the regular expression \x{nnnn} matches the character with hex value nnnn.
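To find the hex value to put in \x{nnnn}, and to see why the squared symbol turns into � in some editors, here is a small Python sketch (Python is used only for illustration; it is not part of Beyond Compare). It assumes the files are saved in Latin-1/Windows-1252, where ² is the single byte 0xB2; decoded as UTF-8 that byte is invalid and becomes U+FFFD, the replacement character �. The helper names are hypothetical, and Python writes the code points as \u00B2 and \uFFFD rather than Beyond Compare's \x{00B2} form.

```python
import re

# In Latin-1 / Windows-1252, "²" is the single byte 0xB2. That byte is not
# valid on its own in UTF-8, so a UTF-8 decoder replaces it with U+FFFD ("�").
raw = b"area: 10 m\xb2"
as_latin1 = raw.decode("latin-1")                  # "area: 10 m²"
as_utf8 = raw.decode("utf-8", errors="replace")    # "area: 10 m�"


def code_point(ch: str) -> str:
    """Return the hex digits to use in a Beyond Compare \\x{nnnn} pattern."""
    return f"{ord(ch):04X}"


# In Python the same code points are written \u00B2 and \uFFFD.
IGNORABLE = re.compile("[\u00B2\uFFFD]")


def same_ignoring_squared(a: str, b: str) -> bool:
    """Compare two strings after dropping "²" and "�" from both."""
    return IGNORABLE.sub("", a) == IGNORABLE.sub("", b)


print(code_point(as_latin1[-1]))                   # 00B2
print(code_point(as_utf8[-1]))                     # FFFD
print(same_ignoring_squared(as_latin1, as_utf8))   # True
```

If one side shows ² and the other �, the two sides contain different code points (00B2 and FFFD), so both would need to be covered by unimportant grammar elements; this has not been checked against Beyond Compare itself.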
References:
Unicode Character Superscript 2
Define Unimportant Text in Beyond Compare
Beyond Compare Help - Regular Expression Reference

Related

Beyond Compare: ignore comments on one side only

I found some instructions on how to show or hide differences in comments using Beyond Compare. However, most of the answers show how to mark comments as important or unimportant text; that is, if a portion of code is commented on both sides, they check whether the comments differ.
I would like to ignore differences where only one side of the comparison is commented out. In other words, if I have
# # line1
# line2
on one side and
# line1
line2
I would like both lines to be marked as "unimportant differences" (if the text is indeed the same; otherwise they should be marked as differences).
Beyond Compare will only compare text if it is of the same grammar element type. If one side is regular code and the other side is a comment, it will always mark it as an important difference.
To make regular text on one side and the same text commented on the other side show as a match, you'll need to edit the definition of a comment in the file format.
To edit a format, open Tools > File Formats.
Select the format that matches your files.
Go to the Grammar tab.
Select the Comment grammar element, which might be defined as # to end of line.
Click the Edit (gear) button.
Set the Category radio button to Basic.
Text matching: ^#\s
Check Regular expression.
Click OK, then Save.
The updated file format will treat # followed by a whitespace character as an unimportant comment; the remaining text in the line will be treated as regular text and compared to the other side.
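A rough Python model of what the edited format does, assuming a simple line-by-line comparison: the ^#\s prefix is dropped as unimportant, and the rest of the line is compared as ordinary text. This only approximates Beyond Compare's grammar engine; the function names are made up for illustration.

```python
import re

# Models the edited Comment element: "#" plus one whitespace character at the
# start of a line is unimportant; the rest of the line is ordinary text.
UNIMPORTANT_PREFIX = re.compile(r"^#\s")


def important_text(line: str) -> str:
    """Drop the unimportant "# " prefix, if present."""
    return UNIMPORTANT_PREFIX.sub("", line, count=1)


def classify(left: str, right: str) -> str:
    if left == right:
        return "match"
    if important_text(left) == important_text(right):
        return "unimportant difference"
    return "important difference"


print(classify("# line2", "line2"))  # unimportant difference
print(classify("# line2", "line3"))  # important difference
```

Note that # without a following whitespace character (for example #line2) is not matched by ^#\s, so it stays part of the compared text.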

How do I get Beyond Compare to ignore non-text characters

My problem is twofold. I'm using Beyond Compare integrated with Visual Studio 2015 as my compare/merge tool. In my list of 'Pending Changes' there is a file that, when I choose 'Right Click -> Source Control -> Compare with Latest Version', shows no differences in the Text Compare.
However, a Hex Compare of the same file does show a difference:
the file has somehow gained the bytes EF BB BF at the start, and Beyond Compare marks this difference with a red bar in the left-side window.
On other occasions, I've seen files in 'Pending Changes' with 0D 0A at the end, which is apparently a (Windows CR LF) newline, but again Beyond Compare doesn't show this as a difference in the Text Compare (I've seen Git GUI show it as a difference in the past).
How can I get Beyond Compare to ignore changes like these, which don't show up in the Text Compare, when it considers a particular file a 'pending change', so that I don't see it in the 'Pending Changes' window in the first place?
OR, if that's not at all possible,
How can I get Beyond Compare to show these changes in the default text compare so I can undo them easily?
To make the extra newline character show as a difference in the Text Compare:
Click the Rules toolbar button (referee icon).
In the Importance tab, check Compare line endings (PC/Mac/Unix).
To make it the default for new Text Compare sessions, change the dropdown from Use for this view only to Also update session defaults before you click OK.
If you turn on View > Visible Whitespace, the extra newline character will show as a red difference. When this setting is on, a Windows-style newline on one side and a Unix-style newline on the other side will also show as a difference.
As AdrianHHH said, the EF BB BF is a UTF-8 byte order mark. It isn't possible to add or remove a BOM in the Text Compare. In the Hex Compare, it is possible to delete the BOM from a file.
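Outside Beyond Compare, a short Python script can confirm which invisible bytes changed. This sketch checks for the UTF-8 BOM (codecs.BOM_UTF8, i.e. EF BB BF) and counts CR LF (0D 0A) line endings; describe and strip_bom are hypothetical helper names. The last line suggests why the Text Compare shows no difference: once the BOM is consumed by decoding (here with Python's utf-8-sig codec), the text itself is identical.

```python
import codecs


def describe(data: bytes) -> dict:
    """Report details that the Text Compare does not show by default."""
    return {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),  # EF BB BF
        "crlf_lines": data.count(b"\r\n"),             # 0D 0A
        "ends_with_newline": data.endswith(b"\n"),
    }


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, leaving the remaining bytes untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


original = b"int x = 1;\r\n"
changed = codecs.BOM_UTF8 + original

print(describe(changed))   # {'utf8_bom': True, 'crlf_lines': 1, 'ends_with_newline': True}
print(strip_bom(changed) == original)                           # True
# Decoding with "utf-8-sig" consumes the BOM, so the text itself is identical.
print(changed.decode("utf-8-sig") == original.decode("utf-8"))  # True
```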

Notepad++ convert leading spaces to tabs upon entry

This is very close to the reverse of this question. I prefer coding with 2-space indentation, but I need files indented with tabs to match the project convention. Ideally, I would like Notepad++ to automatically convert 2 spaces to a tab character as I type, with the editor configured to a tab length of 2.
A possible manual way of doing this is Edit->Blank Operations->Space to TAB, but this converts all of my spaces to tabs, even single spaces such as those between function arguments, not just leading spaces.
In the best case, I'd get the formatting style described in this question, but by typing just spaces and letting the editor take care of the rest.
I'm on Notepad++ 6.0, but I'm willing to upgrade if that helps.
To complete Ari Okkonen's answer, here is a workaround for the problem Sergii Zaskaleta pointed out: mixed tabs and spaces at the beginning of a line.
Settings->Preferences->Tab Settings->Tab size: 2 (if not already)
Edit->Blank Operations->Space to TAB (Leading)
Select a block of lines that have mixed leading spaces and tabs. Press [Tab] and then [Shift]+[Tab] to add and remove a tab on each line; in the process, the leading spaces are converted to tabs.
A manual way that seems to work (tested in Notepad++ v6.8.3): after editing the file and before saving, try:
Settings->Preferences->Tab Settings->Tab size: 2 (if not already)
Edit->Blank Operations->Space to TAB (Leading)
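For reference, here is a Python sketch of what "Space to TAB (Leading)" is meant to achieve with a tab size of 2, including the mixed tabs-and-spaces case. This is an assumed model, not Notepad++'s actual implementation: the indentation is measured in visual columns, whole tab stops become tabs, and any leftover column stays a space.

```python
TAB_SIZE = 2  # Settings->Preferences->Tab Settings->Tab size: 2


def leading_spaces_to_tabs(line: str, tab_size: int = TAB_SIZE) -> str:
    """Rewrite only the leading whitespace of `line` as tabs.

    The indentation is measured in visual columns first, so mixed
    tabs and spaces are handled. Columns that do not fill a whole tab
    stop stay as spaces; whitespace after the first non-blank
    character is left untouched.
    """
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    column = 0
    for ch in indent:
        if ch == "\t":
            column += tab_size - column % tab_size  # advance to the next tab stop
        else:
            column += 1
    tabs, spaces = divmod(column, tab_size)
    return "\t" * tabs + " " * spaces + body


for sample in ["    f(a, b)", "\t  g(c)", " h()"]:
    print(repr(leading_spaces_to_tabs(sample)))
```

The spaces inside f(a, b) survive because only the prefix before the first non-blank character is rewritten, which is exactly the distinction the question draws between leading spaces and spaces between function arguments.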

Beyond Compare 3.3.10 ignores checkboxes for "leading whitespace" and "embedded whitespace"

I would expect Beyond Compare to ignore differences between tabs and spaces if, in Session Settings > Importance tab, I check the boxes labeled Leading Whitespace and Embedded Whitespace while comparing text files with the default format. However, the differences are not ignored whether the boxes are checked or unchecked.
What am I missing?
The checkboxes there control what's important to the comparison. Whitespace will be important if they're checked and unimportant if they're unchecked. They only affect text that doesn't match something else in the grammar, though. If you're comparing C++ code, for example, and the whitespace occurs at the end of a comment line, it will be classified as a comment instead.
Assuming it's classified as "unimportant" correctly, BC will still show it as a difference, but in blue rather than red. You can hide unimportant differences using the View->Ignore Unimportant Differences menu item, which makes them appear with the matching coloring and filters them as matches.
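A simplified Python model of the behavior described above, assuming the whitespace isn't claimed by another grammar element such as a comment: whitespace categories whose boxes are unchecked are normalized away before comparing, so a tabs-vs-spaces difference is colored blue, or shown as a match when unimportant differences are ignored. The function names and the collapse-to-one-space rule for embedded whitespace are assumptions for illustration, not Beyond Compare's algorithm.

```python
import re


def important_part(line: str, leading_important: bool, embedded_important: bool) -> str:
    """Keep only the whitespace that the Importance checkboxes mark as important."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    if not embedded_important:
        # Tabs vs spaces (and run length) inside the line stop mattering,
        # but "a b" vs "ab" is still a difference in this simplified model.
        body = re.sub(r"[ \t]+", " ", body)
    return (indent if leading_important else "") + body


def color(left: str, right: str, leading_important: bool = False,
          embedded_important: bool = False, ignore_unimportant: bool = False) -> str:
    """Approximate BC's coloring: black = match, blue = unimportant, red = important."""
    if left == right:
        return "black"
    if (important_part(left, leading_important, embedded_important)
            == important_part(right, leading_important, embedded_important)):
        return "black" if ignore_unimportant else "blue"
    return "red"


print(color("\tfoo(a, b)", "  foo(a,\tb)"))                           # blue
print(color("\tfoo(a, b)", "  foo(a,\tb)", ignore_unimportant=True))  # black
print(color("\tfoo(a, b)", "  foo(a, b)", leading_important=True))    # red
```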
If you're still having trouble, you'll have better luck getting support if you email support@scootersoftware.com or post in our support forums at http://www.scootersoftware.com/vbulletin/ with a bit more information.
Go to Tools -> File Formats -> Grammar. Add a grammar item (the + button) and check the Regular expression check box.
There you can add a regex matching the items you want and then mark them as important or unimportant.
In general, this is very useful when you have some differences that are not important to you.

Conventions for the behavior of double or triple "click to select text" features?

Almost any mature program that involves text implements "double click to select the word" and, in some cases, "triple click to select additional stuff like an entire line" as a feature. I find these features useful but they are often inconsistent between programs.
For example, some programs' double-click does not select the space after a word, but most do. Some recognize the - character as the end of a word; others do not. SO selects the entire paragraph when I triple-click while writing this post, VS Web Developer 2005 has no triple-click support, and UltraEdit-32 selects one line on a triple-click. We could list innumerable inconsistencies in how double- and triple-click pattern matching is implemented across programs.
I am concerned about how to implement this behavior in my program if no convention has been established for how the pattern matching should work.
My question is: does a convention (or conventions, maybe an MS or Linux one?) exist that dictates how these features should behave for the end user? If so, what are they?
I don’t believe there is a standard at the level of specification you want, and there probably shouldn’t be. Apple’s Human Interface Guidelines are the most complete. With respect to selecting content (as opposed to controls or discrete data objects), they say:
Double-clicking is most commonly used as a shortcut for other actions, such as… to select a word. Triple-clicking selects the next logical unit, as defined by the application. In a word-processing document, triple-clicking in a word selects the paragraph containing the word…. Double-clicking within a word selects the word. The selection should provide “smart” behavior; if the user deletes the selected word, for example, the space after the word should also be deleted… In some contexts—in a programming language, for example—it may be appropriate to allow users to select both the left and right parentheses (or braces or brackets) in a pair, as well as all the characters between them, by double-clicking either one of them.” (p115-116)
Apple is quite specific about what characters are and aren’t included in a word.
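The bracket behavior in the Apple quote can be sketched in Python as a depth counter that scans forward from an opening bracket or backward from a closing one. The function name is hypothetical, and the sketch does not skip brackets inside strings or comments, which a real code editor would need to handle.

```python
from typing import Optional, Tuple

PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def select_bracket_pair(text: str, index: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) spanning the bracket pair when `index` is on a bracket.

    Scans forward from an opening bracket or backward from a closing one,
    counting nesting depth. Returns None if `index` is not on a bracket or
    the bracket is unbalanced.
    """
    ch = text[index]
    if ch in PAIRS:
        open_ch, close_ch, step = ch, PAIRS[ch], 1
    elif ch in CLOSERS:
        open_ch, close_ch, step = CLOSERS[ch], ch, -1
    else:
        return None
    depth = 0
    i = index
    while 0 <= i < len(text):
        if text[i] == open_ch:
            depth += step
        elif text[i] == close_ch:
            depth -= step
        if depth == 0:
            return (index, i + 1) if step == 1 else (i, index + 1)
        i += step
    return None


code = "f(a, (b), c)"
start, end = select_bracket_pair(code, 1)
print(code[start:end])  # (a, (b), c)
```

Double-clicking either the opening parenthesis (index 1) or the closing one (index 11) yields the same range, which matches the "either one of them" wording in the guideline.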
Microsoft’s Windows User Experience Interaction Guidelines say:
For some types of selectable objects, each click expands the effect of the click. For example, single-clicking in a text box sets the input location, double-clicking selects a word, and triple-clicking selects a sentence or paragraph. (p430)
Java Swing Look and Feel Design Guidelines say:
Double-clicking (clicking a mouse button twice in rapid succession without moving the mouse) is used to select larger units (for example, to select a word in a text field)…. Triple-clicking (clicking a mouse button three times in rapid succession without moving the mouse) is used to select even larger units (for instance, to select an entire line in a text field)…. A triple click in a line of text deselects any existing selection and selects the line.
The GNOME Human Interface Guidelines don’t say much about what double- and triple-clicking should do.
This gives you the freedom to choose whatever is best for your users. Double- and triple-clicking are expert shortcuts, so their behavior should aim to maximize efficiency. Consider why the user is selecting something and design to make that task easiest and fastest.
For example, the rationale behind including the trailing space when double-clicking a word is apparently that users usually select a word in order to copy or move it to another position in the text. Including the trailing space automatically keeps the user from having to delete a leftover extra space at the source and add a word-separating space at the destination.
Likewise, if users are selecting a line of code or a paragraph to copy or move it somewhere else, then you probably want to include the newline characters so the user isn’t left with an empty line at the source and forced to add a newline manually at the destination (assuming they didn’t want to combine the line or paragraph with another one).
If selection is for something other than copying and moving text in sentences, then none of this may apply and you don’t necessarily want to include trailing spaces or newlines. That’s why there shouldn’t be a standard.
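The trade-offs above can be made explicit as options. This Python sketch (hypothetical function names; words are simplified to runs of letters, digits, and underscores rather than the locale-aware rules Apple specifies) returns the (start, end) range a double- or triple-click would select, with the trailing space and newline controlled by flags.

```python
import re
from typing import Tuple

WORD = re.compile(r"\w+")  # simplified word rule: letters, digits, underscore


def select_word(text: str, index: int, include_trailing_space: bool = True) -> Tuple[int, int]:
    """Return the (start, end) range a double-click at `index` would select."""
    for m in WORD.finditer(text):
        if m.start() <= index < m.end():
            start, end = m.span()
            if include_trailing_space and end < len(text) and text[end] == " ":
                end += 1  # take the single space after the word
            return start, end
    # Not on a word character: select just the character under the pointer.
    return index, min(index + 1, len(text))


def select_line(text: str, index: int, include_newline: bool = True) -> Tuple[int, int]:
    """Return the (start, end) range a triple-click at `index` would select."""
    start = text.rfind("\n", 0, index) + 1
    end = text.find("\n", index)
    if end == -1:
        end = len(text)
    elif include_newline:
        end += 1
    return start, end


text = "hello world\nsecond line\n"
print(select_word(text, 1))          # (0, 6)
print(select_word(text, 1, False))   # (0, 5)
print(select_line(text, 14))         # (12, 24)
print(select_line(text, 14, False))  # (12, 23)
```

Turning both flags off gives the "no trailing space, no trailing newline" behavior some users prefer for copying between programs, so the choice can be made per context rather than fixed.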
An alternative is to do what Apple calls Intelligent Cut and Paste (see the Human Interface Guidelines), or Microsoft Word’s Smart Cut and Paste, where spaces, newlines, and other adjustments are worked out algorithmically when cutting, copying, pasting, and deleting, not when selecting.
In my perfect world I would have it work like this.
Double-clicking a word selects only the word (a word according to the grammar rules of the locale), with no trailing space (this makes copying between programs easier, so I don't need to remove spaces after pasting)
If I delete the selected word, my text editor is aware of my content and removes any leftover extra spaces
A triple-click selects a line with no trailing newline. (A paragraph is a long line that has been wrapped.)
In Windows, Linux, and OS X, double-click selects the word under the cursor, and triple-click selects the entire line of text (single line only, i.e., wrapped line).
While looking for answers, I came up with an alternative solution.
I like to write code or commands in a text file and copy them to the shell prompt without the trailing \n:
1. Use Notepad.
2. Surround each line with ().
3. Use Ctrl + double-click.
Fine...
