Why are my line-endings still wrong according to CodeSniffer?


I made a git hook that checks my code style before commit. It passes the staged files to CodeSniffer. I use the PSR-2 code style which means newlines should be \n even on Windows. However even after changing PhpStorm settings and git settings it still gives me the error that the newlines are \r\n. Why does this happen?
PhpStorm
Searching for \r\n with regex on does not return instances, so I believe the problem to lie with git.
.gitconfig
[core]
autocrlf = false
editor = \"C:/Program Files (x86)/GitExtensions/GitExtensions.exe\" fileeditor
eol = lf
[user]
name = Thomas Moors
email = thomas.moors@*****.nl
[merge]
tool = kdiff3
[mergetool "kdiff3"]
path = C:/Program Files (x86)/KDiff3/kdiff3.exe
[diff]
guitool = kdiff3
[difftool "kdiff3"]
path = C:/Program Files (x86)/KDiff3/kdiff3.exe
The error
Transcription:
FILE: ...\Thomas\Documents\example-live\laravel\app\Models\DoMdokUser.php
----------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
----------------------------------------------------------------------
1 | ERROR | [x] End of line character is invalid; expected "\n" but
| | found "\r\n"
----------------------------------------------------------------------
PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY
----------------------------------------------------------------------
Time: 105ms; Memory: 4Mb
Fix the error before commit please
edit: using another editor (n++) the problem does seem to lie with phpstorm. Replacing \r\n fixes the problem. So why does PHPStorm not work properly?

Searching for \r\n with regex on does not return instances, so I believe the problem to lie with git.
Not necessarily with git.
The IDE (PhpStorm) keeps all lines in memory with a normalized line separator (\n / LF). When you save the file, it replaces them with the line separator that was detected when the file was opened. This also means that if a file happens to have mixed line endings (e.g. both CRLF and LF), it will use only one style after saving (e.g. only LF).
The normalized line ending makes regex searches and replacements simpler: there is no need to worry about what the actual separator is.
Now -- the Code Style settings page -- as you can see in that hint below the field, it says: "Applied to new files". This means that this setting does NOT affect existing files in any way.
To change line ending for a particular existing file: open the file and either change it via appropriate section in Status Bar .. or via File | Line Separators
So why does PhpStorm not work properly?
It works properly -- you just happen to not know how to change the line separator style.
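Because the IDE search runs on the normalized in-memory text, it cannot tell you which separator is on disk. A minimal sketch for checking the raw bytes outside the IDE follows; the function names `count_line_endings` and `to_lf` are made up for this illustration and are not part of PhpStorm, git, or CodeSniffer.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes not preceded by CR
    cr = data.count(b"\r") - crlf   # CR bytes not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def to_lf(data: bytes) -> bytes:
    """Rewrite every CRLF pair as LF, leaving lone CR bytes untouched."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical file content with mixed line endings.
sample = b"<?php\r\necho 1;\r\necho 2;\n"
print(count_line_endings(sample))
print(count_line_endings(to_lf(sample)))
```

To inspect a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not translate the line endings itself.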

Related

Override .gitattributes text=auto in Windows

This is pretty unintuitive:
C:\python-tdl\examples\termbox>git config core.autocrlf
false
C:\python-tdl\examples\termbox>git commit termbox.py
warning: LF will be replaced by CRLF in examples/termbox/termbox.py.
The file will have its original line endings in your working directory.
warning: LF will be replaced by CRLF in examples/termbox/termbox.py.
The file will have its original line endings in your working directory.
warning: LF will be replaced by CRLF in examples/termbox/termbox.py.
The file will have its original line endings in your working directory.
Aborting commit due to empty commit message.
According to various media with core.autocrlf=false there should be no linefeed conversion at all.
In project root I discovered .gitattributes with the line:
# Auto detect text files and perform LF normalization
* text=auto
If I comment it out, the warning goes away. The question: how can I override this .gitattributes setting automatically?

.gitattributes overrides all config settings, so it really can't be overridden; it is the "overrider," so to speak. While you can simply remove the line, this will cause inconsistent behavior on other developers' machines if they have core.autocrlf=true. So the best bet would be to add the following line to .gitattributes: * -text. This will disable CRLF processing for all files.

At least in modern versions of git, .git/info/attributes (or $GIT_DIR/info/attributes) overrides .gitattributes for local configuration.
Use * !text to use the value of core.autocrlf, or * -text to force no conversion.
See the documentation for gitattributes and the text attribute.
Also note: core.eol, the eol attribute

Adding new lang to ctags does not work

I am trying to add the .volt extension to the ctags language map, but it keeps ignoring .volt files. This is the content of my .ctags file:
--recurse=yes
--tag-relative=yes
--exclude=*.git*
--exclude=.DS_Store
--langmap=html:+.volt
When I do ctags --list-maps I will see .volt files being included in HTML:
HTML *.htm *.html *.volt
But still, when I run ctags, it completely ignores .volt files. What am I doing wrong here?

The reasons for the unexpected behavior are most likely:
You are not using the latest version 5.8 of Exuberant Ctags, but a version before 5.6.
Your .ctags file has --langmap=html:+.volt at the end of the file with no line termination.
Read the full story below on why I think those 2 reasons result in the unexpected behavior of Ctags on your computer.
I looked into your problem on Windows, first using the older version 5.5.4 of Exuberant Ctags installed with the text editor UltraEdit, and later also with version 5.8 downloaded directly from the Exuberant Ctags project page.
I created a copy of one of my HTML projects: one *.html file in the parent directory of the test project, three *.html files in a subdirectory, and two more files in the same subdirectory that were copies of two of those three *.html files with the file extension changed from html to volt.
Next I created a ctags.conf file in the parent directory of the project and copied the few lines you posted into it. Additionally I inserted a line with --verbose at the top, as this is useful when looking for problems like this.
Finally I copied ctags.exe (first v5.5.4, later v5.8) into the test project directory as well, just to make it easier to run from the command line.
I opened a command prompt window in the test project directory and executed
ctags.exe -f test.tag --options=ctags.conf
The verbose output showed that both *.volt files were opened for processing, and the created test.tag also contained all the tags from the 2 *.volt files, the same tags as for the 2 *.html files they were copied from.
So what could be the problem?
I'm not only familiar with HTML; my main job is programming in C/C++. Therefore I know about a mistake often made in C source code when reading text files: wrong handling of text files with no line termination on the last line of the file.
And I know that some text editors, like gedit on Linux, position the caret on Ctrl+End at the beginning of the line below the last line in the file, even when the last line of the file has no line termination. In this case the caret should be positioned at the end of the text on the last line instead of at the beginning of a line beyond the real end of the file. This behavior, wrong from my point of view, lets a user of the text editor think that the text file has a line termination on its last line even if that is not true.
So I thought that you had perhaps appended --langmap=html:+.volt at the end of the file without a line termination, and that ctags.exe does not evaluate the line in this case because of poorly written text file parsing in its source code. Therefore I removed the line termination in ctags.conf from the last line, which now contained only --langmap=html:+.volt
I executed the same command line as before and, aha, both *.volt files were ignored because of unknown language.
At this point I downloaded version 5.8 of Ctags for Windows and copied it into the test project directory, replacing the executable of version 5.5.4.
I executed the command line again with the unmodified ctags.conf. Both *.volt files were processed by Ctags and test.tag again contained the tags from both *.volt files.
Appending a line termination to the last line of ctags.conf again and executing the command line once more did not produce different output. So this bug, ignoring the last line of the options file if it has no line termination, is fixed in version 5.8 of Ctags.
I searched the Change Notes of Exuberant Ctags and found, in the change notes block for ctags-5.6 (Mon May 29 2006):
Fixed problem reading last line of list file (-L) without final newline.
This confirms what I thought and observed. And of course the problem existed not only when reading the list file, but also when reading other text files such as the options file, or C and Java files, as the next line in the change notes says:
Fixed infinite loop that could occur on files without final newline [C, Java].
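The class of bug described above can be sketched in a few lines. This is an illustration only, not the actual Ctags source: `read_options_naive` and `read_options_fixed` are hypothetical names, and the real parser is written in C.

```python
def read_options_naive(text: str) -> list:
    """Buggy reader: a line is only emitted when its newline is seen."""
    lines, current = [], ""
    for ch in text:
        if ch == "\n":
            lines.append(current)
            current = ""
        else:
            current += ch
    return lines  # anything still held in `current` is silently dropped


def read_options_fixed(text: str) -> list:
    """Same reader, plus a flush of a final line that has no terminator."""
    lines = read_options_naive(text)
    tail = text.rsplit("\n", 1)[-1]  # text after the last newline, if any
    if tail:
        lines.append(tail)
    return lines


# Options file whose last line has no line termination.
conf = "--recurse=yes\n--langmap=html:+.volt"
print(read_options_naive(conf))
print(read_options_fixed(conf))
```

With the naive reader the --langmap line never reaches the option handling, which matches the symptom of *.volt files being skipped as an unknown language.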

If the ctags binary is really universal ctags you need to put/link your config file here (man ctags-universal -> FILES):
~/.ctags.d/my-config.ctags
File extension .ctags is relevant.
In my case, I needed ctags to support the arduino (.ino) file type. Add --langmap=c++:+.ino to ~/.ctags.d/local.ctags (it only symlinks to ~/.ctags really).
Check:
ctags --list-maps | grep C++
C++ *.c++ *.cc *.cp *.cpp *.cxx *.h *.h++ *.hh *.hp *.hpp *.hxx *.inl *.C *.H *.CPP *.CXX *.ino
[...]
Notice *.ino at the end of the line listing known extensions.

How to ignore Icon? in git

While trying to set up a Dropbox folder with git, I saw an "Icon\r" file which was not created by me. I tried to ignore it in the ~/.gitignore file, but adding Icon\r, Icon\r\r, or Icon? doesn't work at all.

You can use vim as well.
1. vim .gitignore
2. in a new line write Icon, then
3. press ctrl+v and then press Enter
4. repeat step 3
5. save and exit (shortcut: ZZ)
Now you should have Icon^M^M and it's done :)
For a smarter use you could add it to your gitignore global config file in ~/.gitignore_global.

(This improves on the original answer, following a suggestion by robotspacer, according to hidn's explanation.)
The Icon? is the file of OS X folder icon. The "?" is a special character for double carriage return (\r\r).
To tell git to ignore it, open a terminal and navigate to your repository folder. Then type:
printf "Icon\r\r" >> .gitignore
If the file does not exist, it will be created and Icon\r\r will be its one line. If the file does exist, the line Icon\r\r will be appended to it.
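For readers who prefer a script to a shell one-liner, here is a minimal sketch of the same append. The helper name `append_icon_pattern` is made up for this example. Unlike the printf command above, it also writes a terminating newline and starts a new line first if the existing file does not end with one; both of those are assumptions of the sketch, not part of the original answer.

```python
import tempfile
from pathlib import Path


def append_icon_pattern(gitignore: Path) -> None:
    """Append the Icon pattern (Icon, CR, CR) as its own line."""
    existing = gitignore.read_bytes() if gitignore.exists() else b""
    # Start on a fresh line if the file does not already end with one.
    prefix = b"" if not existing or existing.endswith(b"\n") else b"\n"
    # Binary append mode: the bytes are written without newline translation.
    with open(gitignore, "ab") as fh:
        fh.write(prefix + b"Icon\r\r\n")


# Demonstration in a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".gitignore"
    append_icon_pattern(target)
    print(target.read_bytes())
```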

"Icon[\r]" is probably a better alternative.
In vim, you just put Icon[^M], which is Icon[ followed by CtrlV, Enter then ].
The problem with "Icon\r\r" is EOL conversion.
The whole line is actually "Icon\r\r\n", counting line ending. Based on your setup, CRLF may be converted to LF on commit, so your repo will actually have "Icon\r\n". Say you sync the changes to another repo. You will get "Icon\r\n" in that working directory, which ignores Icon but not Icon^M. If you further edit .gitignore and commit it there, you will end up with "Icon\n" - completely losing \r.
I encountered this in a project where some develop on OS X while some on Windows. By using brackets to separate \r and the line ending, I don't have to repeat \r twice and I don't worry about EOL conversion.
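The erosion described above can be reproduced on plain bytes. The sketch below models only the commit-side CRLF to LF conversion with a simple replace; `crlf_to_lf` is a hypothetical stand-in, not how git implements it.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Stand-in for commit-side normalization: each CRLF pair becomes LF."""
    return data.replace(b"\r\n", b"\n")


plain = b"Icon\r\r\n"        # the pattern line Icon\r\r with an LF line ending
bracketed = b"Icon[\r]\r\n"  # the pattern line Icon[\r] with a CRLF line ending

for line in (plain, bracketed):
    once = crlf_to_lf(line)
    twice = crlf_to_lf(once)
    print(repr(line), "->", repr(once), "->", repr(twice))
```

The first line loses one \r per conversion and ends up as a bare Icon, while the bracketed form keeps its \r because that byte is never directly followed by the line ending.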

The best place for this is in your global gitignore configuration file. You can create this file, access it, and then edit per the following steps:
>> git config --global core.excludesfile ~/.gitignore_global
>> vim ~/.gitignore_global
press i to enter insert mode
type Icon on a new line
while on the same line, ctrl + v, enter, ctrl + v, enter
press esc, then shift + ; then type wq then hit enter

Regarding Naming (and Quoting) Things: First, more people would benefit from knowing that ANSI-C Quoting can be used to unambiguously match the macOS icon file. Both Icon$'\r' and $'Icon\r' work in Bash and Zsh and, I hope, in most other modern shells, such as Fish.
Keep Your .gitignore Editable: While I'm impressed by the byte-level manipulation offered by other answers here, these methods are brittle in practice. Simply put, programmers tend to use text editors, and many of these editors are configured to alter line endings when saving a file. (For example, see this VS Code discussion about line ending normalization.)
Do you want your careful byte editing undone by your editor? Of course not. So perhaps you find it practical and convenient to configure your editor so that it doesn't affect line endings. You might look into (a) editor-specific configuration settings; or (b) cross-editor configuration (i.e. EditorConfig).
But this gets complex and messy. If you want a simpler, more flexible way, use this in your .gitignore file:
# .gitignore
Icon?
![iI]con[_a-zA-Z0-9]
Explanation for the patterns:
Use Icon? because the gitignore format does not support \r as an escape code.
Use [iI] because Git can be case sensitive.
Use [_a-zA-Z0-9] to catch many common ASCII characters; you may want to broaden this.
You can test that your gitignore patterns are working as expected with:
git check-ignore -v *
For example, for testing, with these files in a directory:
-rw-r--r--@ Icon?
-rw-r--r-- icon8
drwxr-xr-x icons
-rw-r--r-- iconography
... the result of git check-ignore -v * is:
/Users/abc/.gitignore:3:Icon? "Icon\r"
/Users/abc/.gitignore:4:![iI]con[_a-zA-Z0-9] icon_
/Users/abc/.gitignore:4:![iI]con[_a-zA-Z0-9] icons
This is what you want.
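The two patterns can also be tried out away from a repository. The sketch below uses Python's fnmatch module as an approximation of shell-glob matching; git uses its own matcher, so treat this as a rough model. The constants and the helper `is_ignored` are invented for this example, and the "match the first pattern, unless the second re-includes it" logic is a simplification of git's last-match-wins rule.

```python
from fnmatch import fnmatchcase

IGNORE = "Icon?"                 # any single character after "Icon"
KEEP = "[iI]con[_a-zA-Z0-9]"     # the negated pattern, without the leading "!"


def is_ignored(name: str) -> bool:
    """Ignored if the name matches IGNORE and is not re-included by KEEP."""
    return fnmatchcase(name, IGNORE) and not fnmatchcase(name, KEEP)


for name in ["Icon\r", "Icons", "icons", "icon8", "iconography"]:
    print(repr(name), is_ignored(name))
```

Only the name ending in a carriage return comes out as ignored. "Icons" matches the first pattern but is rescued by the second, and the lowercase names never match the case-sensitive "Icon?" at all.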
Long Term Recommendation: This problem would be trivial to fix if Git supported the \r escape in .gitignore files. One could simply write:
# .gitignore
Icon[\r]
So I suggest we engage with the Git community and try to make this happen.
(If you do want to wade in and suggest a patch to Git, be sure to read first.)
References
From the gitignore documentation:
Otherwise, Git treats the pattern as a shell glob: "*" matches anything except "/", "?" matches any one character except "/" and "[]" matches one character in a selected range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.
Please see This linuxize.com article for good examples of the square bracket syntax and negation syntax in .gitignore files.
For those that want to dig deep and see how pattern matching has changed over time in the Git source code, you can run this search for uses of fnmatch in the git repository on GitHub.

The Icon? is the file of the OS X folder icon. It turns out that the ? stands for a carriage return (\r). So I used Ruby to add the line to the .gitignore file. Open a terminal and navigate to the home folder, then:
> irb
>> f = File.open(".gitignore", "a+") #<File:.gitignore>
>> f.write("Icon\r\r") # outputs an integer
>> f.close
>> exit

For me this worked in TextMate: Icon<CR><CR>. The <CR> is a carriage return character, which is at ctrl-alt-return on the keyboard. You can also find it in the standard Character Viewer app searching for cr. Please note that the <CR> is an invisible character, so it's only visible if the editor is set up to show them.

I'm posting an updated answer because the one above didn't work for me, whereas simply adding Icon? to my .gitignore did. If you look at the file name in Finder, that is actually how it is displayed.

Icon[\r] did not work for me. I had to use the following in .gitignore...
Icon*
I also added Icon* to my Settings > Core > Ignored Names in Atom...
.git, .hg, .svn, .DS_Store, ._*, Thumbs.db, desktop.ini, Icon*

Add Icon? to your .gitignore file and save it. It should do the job.
Icon?

To avoid wasting time on such trivial issues, I recommend using gibo.
gibo dump macOS >> .gitignore
The result:
### Generated by gibo (https://github.com/simonwhitaker/gibo)
### https://raw.github.com/github/gitignore/e5323759e387ba347a9d50f8b0ddd16502eb71d4/Global/macOS.gitignore
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

Ensure newline at EOF in RubyMine

How does one enforce a newline at end of file in RubyMine (v 4.5.3, on Mac OS)?
e.g., similar to Sublime Text 2

Enable the "Ensure blank line before end of file on Save" option in the Editor settings.

In RubyMine 6 or above (on Mac OSX):
Enable 'Ensure line feed at file end on Save' option in the Editor settings.

RubyMine 2021.2
"Ensure every saved file ends with a line break"

RubyMine alongside many other editors supports the EditorConfig standard for basic configuration.
You can enforce newlines at the end of every file by placing a file named .editorconfig at the root of your project:
# top-most EditorConfig file
root = true
# Unix-style newlines with a newline ending every file
[*]
end_of_line = lf
insert_final_newline = true
It's possible to disable the setting for specific file types or filenames.
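As a rough illustration of what insert_final_newline = true asks an editor to do on save, here is a minimal sketch. The function `ensure_final_newline` is a hypothetical helper, not RubyMine's or EditorConfig's implementation, and leaving empty content untouched is a choice made for this sketch.

```python
def ensure_final_newline(text: str, eol: str = "\n") -> str:
    """Append eol when non-empty text does not already end with a newline."""
    if text and not text.endswith("\n"):
        return text + eol
    return text


print(repr(ensure_final_newline("puts 'hi'")))
print(repr(ensure_final_newline("puts 'hi'\n")))
```

The check uses "\n" so that content ending in either LF or CRLF is left alone.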

In RubyMine 2016.2.1 you have to go to Preferences > Editor > General > Other > Ensure line feed at file end on Save.

Get encoding of a file in Windows

This isn't really a programming question, but is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I can write a little C# app, but I wanted to know if there is something already built in.

Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".
Whatever the default-selected encoding is, that is what your current encoding is for the file.
If it is UTF-8, you can change it to ANSI and click save to change the encoding (or vice versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a onetime export, so Notepad fit the bill for me.
FYI: From my understanding I think "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicdoe

If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.

The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files; directory
Debug; directory
duration.h; ASCII C++ program text, with CRLF line terminators
ipch; directory
main.cpp; ASCII C program text, with CRLF line terminators
Precision.txt; ASCII text, with CRLF line terminators
Release; directory
Speed.txt; ASCII text, with CRLF line terminators
SquareRoot.sdf; data
SquareRoot.sln; UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj; XML document text
SquareRoot.vcxproj; XML document text
SquareRoot.vcxproj.filters; XML document text
SquareRoot.vcxproj.user; XML document text
squarerootmethods.h; ASCII C program text, with CRLF line terminators
UpgradeLog.XML; XML document text
C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files; binary
Debug; binary
duration.h; us-ascii
ipch; binary
main.cpp; us-ascii
Precision.txt; us-ascii
Release; binary
Speed.txt; us-ascii
SquareRoot.sdf; binary
SquareRoot.sln; utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj; us-ascii
SquareRoot.vcxproj; utf-8
SquareRoot.vcxproj.filters; utf-8
SquareRoot.vcxproj.user; utf-8
squarerootmethods.h; us-ascii
UpgradeLog.XML; us-ascii

Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
EXE can be found here

Install git (on Windows you have to use the git bash console). Type:
file --mime-encoding *
for all files in the current directory, or
file --mime-encoding */*
for the files in the immediate subdirectories (the */* glob goes only one level deep)

Here's my take how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as this method only works on text files (specifically Unicode files), and defaults to ascii when no BOM is present (like most text editors, the default would be UTF8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
$bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)
if(!$bytes) { return 'utf8' }
switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
'^efbbbf' { return 'utf8' }
'^2b2f76' { return 'utf7' }
'^fffe' { return 'unicode' }
'^feff' { return 'bigendianunicode' }
'^0000feff' { return 'utf32' }
default { return 'ascii' }
}
}
dir ~\Documents\WindowsPowershell -File |
select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (i.e. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)
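The same BOM lookup can be sketched with the constants from Python's standard codecs module. This is an illustration of the technique, not a port of the gist above: the table `BOMS` and the function `detect_bom` are names chosen here, and returning None when no BOM is present (instead of guessing a default) is a deliberate difference.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes):
    """Return an encoding name if head starts with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfhello"))
print(detect_bom(b"\xff\xfeh\x00i\x00"))
print(detect_bom(b"plain text"))
```

Reading only the first four bytes of a file (`open(path, "rb").read(4)`) is enough input. One caveat: a UTF-16 LE file whose first character is NUL would be reported as UTF-32 LE, so treat the result as a hint.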

A simple solution might be opening the file in Firefox.
Drag and drop the file into firefox
Press Ctrl+I to open the page info
and the text encoding will appear on the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.

I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from powershell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
And used like: file.exe --mime-encoding *. You must include .exe in the command for PS alias to work.
But if you don't customize your PowerShell profile.ps1 I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0
and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from powershell; and many other OS CLI commands that are "hidden by default" by powershell, *shrug*.

You can simply check that by opening git bash in the file's location and then running the command file -i file_name
example
user filesData
$ file -i data.csv
data.csv: text/csv; charset=utf-8

Some C code here for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM,
UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document.
For all other encodings, you have to trust heuristics based on statistics.
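The "reliable" part of that statement can be sketched without any heuristics: ASCII and UTF-8 can be confirmed or ruled out simply by attempting a strict decode. This is a minimal illustration, not the C code from the linked page; `classify` is a hypothetical name, and anything that is neither ASCII nor valid UTF-8 is reported as "unknown" rather than guessed.

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown' based on strict decoding."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"  # would need statistical heuristics from here on


print(classify(b"hello"))
print(classify("héllo".encode("utf-8")))
print(classify("héllo".encode("latin-1")))
```

Note that every ASCII file is also valid UTF-8, so the order of the two checks matters if you want to tell them apart.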
EDIT:
A powershell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (boms).
# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
begin {
# set .net current directory
[Environment]::CurrentDirectory = (pwd).path
}
process {
$reader = [System.IO.StreamReader]::new($filename,
[System.Text.Encoding]::default,$true)
$peek = $reader.Peek()
$encoding = $reader.currentencoding
$reader.close()
[pscustomobject]@{Name=split-path $filename -leaf
BodyName=$encoding.BodyName
EncodingName=$encoding.EncodingName}
}
.\get-encoding chinese8.txt
Name BodyName EncodingName
---- -------- ------------
chinese8.txt utf-8 Unicode (UTF-8)
get-childitem -file | .\get-encoding

Looking for a Node.js/npm solution? Try encoding-checker:
npm install -g encoding-checker
Usage
Usage: encoding-checker [-p pattern] [-i encoding] [-v]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--pattern, -p, -d [default: "*"]
--ignore-encoding, -i [default: ""]
--verbose, -v [default: false]
Examples
Get encoding of all files in current directory:
encoding-checker
Return encoding of all md files in current directory:
encoding-checker -p "*.md"
Get encoding of all files in current directory and its subfolders (will take quite some time for huge folders; seemingly unresponsive):
encoding-checker -p "**"
For more examples refer to the npm documentation or the official repository.

Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).

The only way that I have found to do this is VIM or Notepad++.

EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.
