How can I refactor an existing source code file to normalize all use of tab? - refactoring

Sometimes I find myself editing a C source file which sees both use of tab as four spaces, and regular tab.
Is there any tool that attempts to parse the file and "normalize" this, i.e. convert all occurrences of four spaces to regular tab, or all occurrences of tab to four spaces, to keep it consistent?
I assume something like this can be done even with just a simple vim one-liner?

There's :retab and :retab! which can help, but there are caveats.
It's easier if you're using spaces for indentation, then just set 'expandtab' and execute :retab, then all your tabs will be converted to spaces at the appropriate tab stops (which default to 8.) That's easy and there are no traps in this method!
If you want to use 4 space indentation, then keep 'expandtab' enabled and set 'softtabstop' to 4. (Avoid modifying the 'tabstop' option, it should always stay at 8.)
If you want to do the inverse and convert to tabs instead, you could set 'noexpandtab' and then use :retab! (which will also look at sequences of spaces and try to convert them back to tabs.) The main problem with this approach is that it won't just consider indentation for conversion, but also sequences of spaces in the middle of lines, which can cause the operation to affect strings inside your code, which would be highly undesirable.
Perhaps a better approach for replacing spaces with tabs for indentation is to use the following substitute command:
:%s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)#
Yeah it's a mouthful... It's matching whitespace at the beginning of the lines, then using the indent() function to find the total indentation (that function calculates indentation taking tab stops in consideration), then dividing that by the 'tabstop' to decide how many tabs and how many spaces a specific line needs.
If this command works for you, you might want to consider adding a mapping or :command for it, to keep it handy. For example:
command! -range=% Retab <line1>,<line2>s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)
This also allows you to "Retab" a range of the file, including one you select with a visual selection.
Finally, one last alternative to :retab is that to ask Vim to "reformat" your code completely, using the = command, which will use the current 'indentexpr' or other indentation configurations such as 'cindent' to completely reindent the block. That typically respects your 'noexpandtab' and 'smarttabstop' options, so it use tabs and spaces for indentation consistently. The downside of this approach is that it will completely reformat your code, including changing indentation in places. The upside is that it typically has a semantic understanding of the language and will be able to take that in consideration when reindenting the code block.

Related

asciidoc: Is there a way to get around the problem of lines beginning with [colour] attributes ending with ] not displaying in asciidoc?

My target asciidoc text is this:
[red]#Some prompt[x]# Make sure the option is [checked]
But it won't display in asciidoc
On further investigation, I found that any line beginning with a [colour] in square brackets, and ending in a right-bracket is similarly not displayed.
Now, in this case, I've got around the problem by putting the whole prompt section in bold, like this:
*[red]#Some prompt[x]#* Make sure the option is [checked]
but this is not ideal. Adding a period after the final close bracket \] also AVOIDS the problem - but in my use case I didn't like it.
I'd like to know if there is a better way. So far I've tried:
Escaping the leading open bracket \[
Escaping the final close bracket \]
Removing the [x] in the middle, thinging the additional brackets in the middle may influence the outcome
but none of these has worked.
So my question is:
Is there a way to get around the problem of lines beginning with [colour] attributes ending with ] not displaying in asciidoc?
It seems to me that a line which begins with an opening bracket and ends with a closing bracket is being interpreted as a block attribute line.
There are a number of ways you can mitigate this.
Use a character replacement attribute. There are many built-in attributes, or you can easily define your own.
For example:
[.red]#Some prompt[x]# Make sure the option is [checked{endsb}
Use one of the inline pass-through syntaxes, for example ++:
[.red]#Some prompt[x]# Make sure the option is [checked++]++
Prevent the first opening bracket from being the first character of the line. Also, uses a built-in attribute, and the markup needs to be changed to unconstrained.
For example:
{empty}[.red]##Some prompt[x]## Make sure the option is [checked]

Terminal overwriting same line when too long

In my terminal, when I'm typing over the end of a line, rather than start a new line, my new characters overwrite the beginning of the same line.
I have seen many StackOverflow questions on this topic, but none of them have helped me. Most have something to do with improperly bracketed colors, but as far as I can tell, my PS1 looks fine.
Here it is below, generated using bash -x:
PS1='\[\033[01;32m\]\w \[\033[1;36m\]☔︎ \[\033[00m\] '
Yes, that is in fact an umbrella with rain; I have my Bash prompt update with the weather using a script I wrote.
EDIT:
My BashWeather script actually can put any one of a few weather characters, so it would be great if we could solve for all of these, or come up with some other solution:
☂☃☽☀︎☔︎
If the umbrella with rain is particularly problematic, I can change that to the regular umbrella without issue.
The symbol being printed ☔︎ consists of two Unicode codepoints: U+2614 (UMBRELLA WITH RAIN DROPS) and U+FE0E (VARIATION SELECTOR-15). The second of these is a zero-length qualifier, which is intended to enforce "text style", as opposed to "emoji style", on the preceding symbol. If you're viewing this with a font can distinguish the two styles, the following might be the emoji version: ☔︉ Otherwise, you can see a table of text and emoji variants in Working Group document N4182 (the umbrella is near the top of page 3).
In theory, U+FE0E should be recognized as a zero-length codepoint, like any other combining character. However, it will not hurt to surround the variant selector in PS1 with the "non-printing" escape sequence \[…\].
It's a bit awkward to paste an isolated variant selector directly into a file, so I'd recommend using bash's unicode-escape feature:
WEATHERCHAR=$'\u2614\[\ufe0e\]'
#...
PS1=...${WEATHERCHAR}...
Note that \[ and \] are interpreted before parameter expansion, so WEATHERCHAR as defined above cannot be dynamically inserted into the prompt. An alternative would be to make the dynamically-inserted character just the $'\u2614' umbrella (or whatever), and insert the $'\[\ufe0e\]' in the prompt template along with the terminal color codes, etc.
Of course, it is entirely possible that the variant indicator isn't needed at all. It certainly makes no useful difference on my Ubuntu system, where the terminal font I use (Deja Vu Sans Mono) renders both variants with a box around the umbrella, which is simply distracting, while the fonts used in my browser seem to render the umbrella identically with and without variants. But YMMV.
This almost works for me, so should probably not be considered a complete solution. This is a stripped down prompt that consists of only an umbrella and a space:
PS1='\342\230\[\224\357\270\] '
I use the octal escapes for the UTF-8 encoding of the umbrella character, putting the last three bytes inside \[...\] so that bash doesn't think they take up space on the screen. I initially put the last four bytes in, but at least in my terminal, there is a display error where the umbrella is followed by an extra character (the question-mark-in-a-diamond glyph for missing characters), so the umbrella really does occupy two spaces.
This could be an issue with bash and 5-byte UTF-8 sequences; using a character with a 4-byte UTF-encoding poses no problem:
# U+10400 DESERET CAPITAL LETTER LONG I
# (looks like a lowercase delta)
PS1='\360\220\220\200 '

Block Indent Regex

I'm having problems about a regexp.
I'm trying to implement a regex to select just the tab indent blocks, but i cant find a way of make it work:
Example:
INDENT(1)
INDENT(2)
CONTENT(a)
CONTENT(b)
INDENT(3)
CONTENT(c)
So I need blocks like:
INDENT(2)
CONTENT(a)
CONTENT(b)
AND
INDENT(3)
CONTENT(c)
How I can do this?
really tks, its almost that, here is my original need:
table
tr
td
"joao"
"joao"
td
"marcos"
I need separated "td" blocks, could i adapt your example to that?
It depends on exactly what you are trying to do, but maybe something like this:
^(\t+)(\S.*)\n(?:\1\t.*\n)*
Working example: http://www.rubular.com/r/qj3WSWK9JR
The pattern searches for:
^(\t+)(\S.*)\n - a line that begins with a tab (I've also captured the first line in a group, just to see the effect), followed by
(?:\1\t.*\n)* - lines with more tabs.
Similarly, you can use ^( +)(\S.*)\n(?:\1 .*\n)* for spaces (example). Mixing spaces and tabs may be a little problematic though.
For the updated question, consider using ^(\t{2,})(\S.*)\n(?:\1\t.*\n)*, for at least 2 tabs at the beginning of the line.
You could use the following regex to get the groups...
[^\s]*.*\r\n(?:\s+.*\r*\n*)*
this requires that your lines not begin with white space for the beginning of the blocks.

How to escape unicode characters in bash prompt correctly

I have a specific method for my bash prompt, let's say it looks like this:
CHAR="༇ "
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
To explain the above, I'm builidng my bash prompt by executing a function stored in a string, which was a decision made as the result of this question. Let's pretend like it works fine, because it does, except when unicode characters get involved
I am trying to find the proper way to escape a unicode character, because right now it messes with the bash line length. An easy way to test if it's broken is to type a long command, execute it, press CTRL-R and type to find it, and then pressing CTRL-A CTRL-E to jump to the beginning / end of the line. If the text gets garbled then it's not working.
I have tried several things to properly escape the unicode character in the function string, but nothing seems to be working.
Special characters like this work:
COLOR_BLUE=$(tput sgr0 && tput setaf 6)
my_function="
prompt="\\[\$COLOR_BLUE\\] \"
echo -e \$prompt"
Which is the main reason I made the prompt a function string. That escape sequence does NOT mess with the line length, it's just the unicode character.
The \[...\] sequence says to ignore this part of the string completely, which is useful when your prompt contains a zero-length sequence, such as a control sequence which changes the text color or the title bar, say. But in this case, you are printing a character, so the length of it is not zero. Perhaps you could work around this by, say, using a no-op escape sequence to fool Bash into calculating the correct line length, but it sounds like that way lies madness.
The correct solution would be for the line length calculations in Bash to correctly grok UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without the \[...\] sequence?
Edit: The following implements the solution I propose in the comments below. The cursor position is saved, then two spaces are printed, outside of \[...\], then the cursor position is restored, and the Unicode character is printed on top of the two spaces. This assumes a fixed font width, with double width for the Unicode character.
PS1='\['"`tput sc`"'\] \['"`tput rc`"'༇ \] \$ '
At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic] testing.
In the interest of transparency and legibility, I have ignored the requirement to have the prompt's functionality inside a function, and the color coding; this just changes the prompt to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex needs.
#tripleee wins it, posting the final solution here because it's a pain to post code in comments:
CHAR="༇"
my_function="
prompt=\" \\[`tput sc`\\] \\[`tput rc`\\]\\[\$CHAR\\] \"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
The trick as pointed out in #tripleee's link is the use of the commands tput sc and tput rc which save and then restore the cursor position. The code is effectively saving the cursor position, printing two spaces for width, restoring the cursor position to before the spaces, then printing the special character so that the width of the line is from the two spaces, not the character.
(Not the answer to your problem, but some pointers and general experience related to your issue.)
I see the behaviour you describe about cmd-line editing (Ctrl-R, ... Cntrl-A Ctrl-E ...) all the time, even without unicode chars.
At one work-site, I spent the time to figure out the diff between the terminals interpretation of the TERM setting VS the TERM definition used by the OS (well, stty I suppose).
NOW, when I have this problem, I escape out of my current attempt to edit the line, bring the line up again, and then immediately go to the 'vi' mode, which opens the vi editor. (press just the 'v' char, right?). All the ease of use of a full-fledged session of vi; why go with less ;-)?
Looking again at your problem description, when you say
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
That is just a string definition, right? and I'm assuming your simplifying the problem definition by assuming this is the output of your my_function. It seems very likely in the steps of creating the function definition, calling the function AND using the values returned are a lot of opportunities for shell-quoting to not work the way you want it to.
If you edit your question to include the my_function definition, and its complete use (reducing your function to just what is causing the problem), it may be easier for others to help with this too. Finally, do you use set -vx regularly? It can help show how/wnen/what of variable expansions, you may find something there.
Failing all of those, look at Orielly termcap & terminfo. You may need to look at the man page for your local systems stty and related cmds AND you may do well to look for user groups specific to you Linux system (I'm assuming you use a Linux variant).
I hope this helps.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^, but not replace it.
I can then find the first characters on a line to replace, but can not do it in such a way with keeping it intact.
Unfortunately I don´t have access to a tool like cut since I am on a windows machine...so is there any way to do what I want with just regexp?
Use notepad++. It offers a way to record an sequence of actions which then can be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.

Resources