Bash programming (Cygwin): Illegal Character ^M [duplicate] - windows

This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
Closed 3 years ago.
I have a problem with a character. I think it's a conversion problem between DOS and Unix.
I have a variable that is a float value.
When I print it with the echo command I get:
0.495959
But when I try to perform an operation on that value with the bc command (I am not sure how to write the bc command):
echo $mean *1000 |bc
I get:
(standard_in) 1 : illegal character: ^M
I have already used the dos2unix command on my .sh file.
I think it's because my variable has the ^M character (not printed with the echo command).
How can I eliminate this error?

I don't have Cygwin handy, but in regular Bash, you can use the tr -d command to strip out specified characters, and you can use the $'...' notation to specify weird characters in a command-line argument (it's like a normal single-quoted string, except that it supports C/Java/Perl/etc.-like escape sequences). So, this:
echo "$mean" * 1000 | tr -d $'\r' | bc
will strip out carriage-returns on the way from echo to bc.
You might actually want to run this:
mean=$(echo "$mean" | tr -d $'\r')
which will modify $mean to strip out any carriage-returns inside, and then you won't have to worry about it in later commands that use it.
(Though it's also worth taking a look at the code that sets $mean to begin with. How does $mean end up having a carriage-return in it, anyway? Maybe you can fix that.)

This works:
${mean/^M/}
You can get ^M by typing Ctrl-V followed by Ctrl-M. Or, alternatively:
${mean/$(printf "\r")/}
The benefit of this method compared to @ruakh's is that here you are using bash built-ins only. The first form will also be faster than the second, since the second runs printf in a subshell.
If you just want to "unixize" $mean:
mean="${mean/^M/}"
Edit: There's yet another way:
${mean/$'\r'/}
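A quick sanity check of the $'\r' form (bash-specific; the value here is a made-up example):

```shell
#!/bin/bash
v=$'0.495959\r'         # a value with a trailing CR, 9 characters long
clean=${v/$'\r'/}       # built-in substitution, no subshell involved
echo "${#v} ${#clean}"  # -> 9 8
```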

Running Windows stuff in cygwin has one nasty side-effect as you found out - capturing the output of Windows programs in a cygwin bash variable will also capture the CR output by the program.
Judicious use of d2u avoids the issue - for example,
runtime="`mediainfo --Inform='Video;%Duration%' ${movie} | d2u`"
(Without the d2u, ${runtime} would have a CR tacked on the end, which causes the problem you saw when you feed it to 'bc' for example.)

Maybe you should just save your script in UNIX format instead of DOS.

Try this:
echo `echo $mean` *1000 |bc
If echo really isn't printing it, it should work.

^M is a carriage return character that is used in Windows, along with the newline (\n) character, to indicate the end of a line. However, that is not how it is done in the UNIX world, so bash doesn't treat it as a special character and it breaks the syntax. What you need to do is remove that character using one of many methods; the dos2unix tool can come in handy, for example.

As others have pointed out, this is a Windows line ending issue. There are many ways to fix the problem, but the question is why did this happen in the first place.
I can see this happening in several places:
This is a WINDOWS environment variable that was set when Cygwin started up. Sometimes these variables get a CRLF on the end of them. You mentioned this was a particular issue with this one variable, but you didn't specify where it was set.
You edited this file using a Windows text editor like Notepad or WordPad.
Never use a text editor to edit a program. Use a program editor. If you like VI, download VIM which is available on Windows and comes on Cygwin (and all other Unix-based platforms). If VIM isn't for you, try the more graphically based Notepad++. Both of these editors handle end of line issues, and can create scripts with Unix line endings in Windows or files with Windows line endings in Cygwin.
If you use VIM, you can do the following to change line endings and to set them:
To see the line ending in the current file, type :set ff? while in command mode.
To set the line ending for Unix, type :set ff=unix while in command mode.
To set the line ending for Windows, type :set ff=dos while in command mode.
If you use Notepad++
You can go into the Edit-->EOL Conversion menu item to see what your current line ending setting is (it's the one not highlighted) and change it.
To have Notepad++ use Unix line endings as the default, go into the Settings-->Preferences menu item. In the Dialog box, select the New Document/Default Directory tab. In the Format section which is part of the New Document section, select the line ending you want. WARNING: Do not select Mac as an option. That doesn't even work on Macs. If you have a Mac, select Unix.

Related

unable to solve "syntax error near unexpected token `fi'" - hidden control characters (CR) / Unicode whitespace

I am new to bash scripting and I'm just trying out new things and getting to grips with it.
Basically I am writing a small script to store the content of a file in a variable and then use that variable in an if statement.
Step by step, I have figured out the ways to store variables and then store the content of files as variables. I am now working on if statements.
My test if statement is very VERY basic. I just wanted to grasp the syntax before moving on to more complicated if statements for my program.
My if statement is:
if [ "test" = "test" ]
then
echo "This is the same"
fi
Simple, right? However, when I run the script I am getting the error:
syntax error near unexpected token `fi'
I have tried a number of things from this site as well as others, but I am still getting this error and I am unsure what is wrong. Could it be an issue on my computer stopping the script from running?
Edit for full code. Note I also deleted all the commented-out code and just used the if statement; I'm still getting the same error.
#!/bin/bash
#this stores a simple variable with the content of the file testy1.txt
#DATA=$(<testy1.txt)
#This echos out the stored variable
#echo $DATA
#simple if statement
if [ "test" = "test" ]
then
echo "has value"
fi
If a script looks OK (you've balanced all your quotes and parentheses, as well as backticks) but issues an error message, this may be caused by funny characters, i.e. characters that are not displayed, such as carriage returns, vertical tabs and others. To diagnose these, examine your script with
od -c script.sh
and look for \r, \v or other unexpected characters. You can get rid of \r for example with the dos2unix script.sh command.
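For example, on a throwaway file with DOS line endings (the path is illustrative):

```shell
#!/bin/sh
printf 'if true\r\nthen\r\n' > /tmp/crlf-demo.sh  # fabricate a CRLF file
od -c /tmp/crlf-demo.sh    # each line ends in \r \n instead of just \n
rm /tmp/crlf-demo.sh
```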
BTW, what operating system and editor are you using?
To complement Jens's helpful answer, which explains the symptoms well and offers a utility-based solution (dos2unix): sometimes installing a third-party utility is undesired, so here's a solution based on the standard tr utility:
tr -d '\r' < script > script.tmp && mv script.tmp script
This removes all \r (CR) characters from the input, saves the output to a temporary file, and then replaces the original file.
While this blindly removes \r instances even if they're not part of \r\n (CRLF) pairs, it's usually safe to assume that \r instances indeed only occur as part of such pairs.
Solutions with other standard utilities (awk, sed) are possible too - see this answer of mine.
If your sed implementation offers the -i option for in-place updating, it may be the simpler choice.
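For instance, with GNU sed (an assumption; BSD/macOS sed spells the option -i ''):

```shell
#!/bin/sh
printf 'echo hi\r\n' > /tmp/demo.sh  # a script with a CRLF ending
sed -i 's/\r$//' /tmp/demo.sh        # strip the trailing CR in place
sh /tmp/demo.sh                      # -> hi
rm /tmp/demo.sh
```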
To diagnose the problem I suggest using cat -v script, as its output is easy to parse visually: if you see ^M (which represents \r) at the end of the output lines, you know you're dealing with a file with Windows line endings.
Why Your Script Failed So Obscurely
Normally, a shell script that mistakenly has Windows-style CRLF line endings, \r\n, (rather than the required Unix-style LF-only endings, \n) and starts with shebang line #!/bin/bash fails in a manner that does indicate the cause of the problem:
/bin/bash^M: bad interpreter
as a quick SO search can attest. The ^M indicates that the CR was considered part of the interpreter path, which obviously fails.
(If, by contrast, the script's shebang line is env-based, such as #!/usr/bin/env bash, the error message differs, but still points to the cause: env: bash\r: No such file or directory)
The reason you did not see this problem is that you're running in the Windows Unix-emulation environment Cygwin, which - unlike on Unix - allows a shebang line to end in CRLF (presumably to also support invoking other interpreters on Windows that do expect CRLF endings).
The CRLF problem therefore didn't surface until later in your script, and the fact that you had no empty lines after the shebang line further obfuscated the problem:
An empty CRLF-terminated line would cause Bash (4.x) to complain as follows: bash: line <n>: $'\r': command not found, because Bash tries to execute the CR as a command (since it doesn't recognize it as part of the line ending).
The comment lines directly following the shebang lines are unproblematic, because a comment line ending in CR is still syntactically valid.
Finally, the if statement broke the command, in an obscure manner:
If your file were to end with a line break, as is usually the case, you would have gotten syntax error: unexpected end of file:
The line-ending then and if tokens are seen as then\r and if\r (i.e., the CR is appended) by Bash, and are therefore not recognized as keywords. Bash therefore never sees the end of the if compound command, and complains about encountering the end of the file before seeing the if statement completed.
Since your file did not end with a line break, you got syntax error near unexpected token 'fi':
The final fi, due to not being followed by a CR, is recognized as a keyword by Bash, whereas the preceding then wasn't (as explained). In this case, Bash therefore saw keyword fi before ever seeing then, and complained about this out-of-place occurrence of fi.
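The failure mode described above is easy to reproduce without Cygwin (the file name below is arbitrary):

```shell
#!/bin/bash
# Write the asker's if block with CRLF endings and no final newline,
# then run it: bash reports the same "unexpected token `fi'" error.
printf 'if [ "test" = "test" ]\r\nthen\r\necho same\r\nfi' > /tmp/crlf-if.sh
bash /tmp/crlf-if.sh
rm /tmp/crlf-if.sh
```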
Optional Background Information
Shell scripts that look OK but break due to characters that are either invisible or only look the same as the required characters are a common problem that usually has one of the following causes:
Problem A: The file has Windows-style CRLF (\r\n) line endings rather than Unix-style LF-only (\n) line endings - which is the case here.
Copying a file from a Windows machine or using an editor that saves files with CRLF sequences are among the possible causes.
Problem B: The file has non-ASCII Unicode whitespace and punctuation that looks like regular whitespace, but is technically distinct.
A common cause is source code copied from websites that use non-ASCII whitespace and punctuation for formatting code for display purposes;
an example is use of the no-break space Unicode character (U+00A0; UTF-8 encoding 0xc2 0xa0), which is visually indistinguishable from a normal (ASCII) space (U+0020).
Diagnosing the Problem
The following cat command visualizes:
all normally invisible ASCII control characters, such as \r as ^M.
all non-ASCII characters (assuming the now prevalent UTF-8 encoding), such as the non-break space Unicode char. as M-BM- .
^M is an example of caret notation, which is not obvious, especially with multi-byte characters, but, beyond ^M, it's usually not necessary to know exactly what the notation stands for - you just need to note if the ^<letter> sequences are present at all (problem A), or are present in unexpected places (problem B).
The last point is important: non-ASCII characters can be a legitimate part of source code, such as in string literals and comments. They're only a problem if they're used in place of ASCII punctuation.
LC_ALL=C cat -v script
Note: If you're using GNU utilities, the LC_ALL=C prefix is optional.
Solutions to Problem A: translating line endings from CRLF to LF-only
For solutions based on standard or usually-available-by-default utilities (tr, awk, sed, perl), see this answer of mine.
A more robust and convenient option is the widely used dos2unix utility, if it is already installed (typically, it is not), or installing it is an option.
How you install it depends on your platform; e.g.:
on Ubuntu: sudo apt-get install dos2unix
on macOS, with Homebrew installed: brew install dos2unix
dos2unix script would convert the line endings to LF and update file script in place.
Note that dos2unix also offers additional features, such as changing the character encoding of a file.
Solutions to Problem B: translating Unicode punctuation to ASCII punctuation
Note: By punctuation I mean both whitespace and characters such as -
The challenge in this case is that only Unicode punctuation should be targeted, whereas other non-ASCII characters should be left alone; thus, use of character-transcoding utilities such as iconv is not an option.
nws is a utility (that I wrote) that offers a Unicode-punctuation-to-ASCII-punctuation translation mode while leaving non-punctuation Unicode chars. alone; e.g.:
nws -i --ascii script # translate Unicode punct. to ASCII, update file 'script' in place
Installation:
If you have Node.js installed, install it by simply running [sudo] npm install -g nws-cli, which will place nws in your path.
Otherwise: See the manual installation instructions.
nws has several other functions focused on whitespace handling, including CRLF-to-LF and vice-versa translations (--lf, --crlf).
syntax error near unexpected token `fi'
means that the if statement was not opened and closed correctly; you need to check from the beginning that every if, for, or while statement is opened and closed correctly.
Don't forget to add at the beginning of the script:
#!/bin/bash
Here, "$test" should be a variable that stores the content of that file.
if [ "$test" = "test" ]; then
echo "This is the same"
fi

Why is this error happening in bash?

Using cygwin, I have
currentFold="`dirname $0`"
echo ${currentFold}...
This outputs ...gdrive/c/ instead of /cygdrive/c/...
Why is that ?
Your script is stored in DOS format, with a carriage return followed by linefeed (sometimes written "\r\n") at the end of each line; unix uses just a linefeed ("\n") at the end of lines, and so bash is mistaking the carriage return for part of the command. When it sees
currentFold="`dirname $0`"\r
it dutifully sets currentFold to "/cygdrive/c/\r", and when it sees
echo ${currentFold}...\r
it prints "/cygdrive/c/\r...\r". The final carriage return doesn't really matter, but the one in the middle means that the "..." gets printed on top of the "/cy", and you wind up with "...gdrive/c/".
Solution: convert the script to unix format; I believe you'll have the dos2unix command available for this, but you might have to look around for alternatives. In a pinch, you can use
perl -pi -e 's/\r\n?/\n/g' /path/to/script
(see http://www.commandlinefu.com/commands/view/5088/convert-files-from-dos-line-endings-to-unix-line-endings). Then switch to a text editor that saves in unix format rather than DOS.
I would like to add to Gordon Davisson's answer.
I am also using Cygwin. In my case this happened because my Git for Windows was configured to Checkout Windows-style, commit Unix-style line endings.
This is the default option, and it was breaking all my cloned shell scripts.
I reran my Git setup and changed it to Checkout as-is, commit Unix-style line endings, which prevented the problem from happening at all.
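The same setting can also be changed from the command line; core.autocrlf is the relevant key (shown here against a scratch config file so nothing real is touched):

```shell
#!/bin/sh
# "input" = convert CRLF to LF on commit, leave files untouched on
# checkout -- the safest choice for shell scripts used under Cygwin.
git config -f /tmp/gitconfig-demo core.autocrlf input
git config -f /tmp/gitconfig-demo core.autocrlf   # -> input
rm /tmp/gitconfig-demo
```

To change your real global setting, drop the -f option: git config --global core.autocrlf input.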

Weird behavior in mac os x terminal

I started using this code from Mark Dotto (http://markdotto.com/2013/01/13/improved-terminal-hotness/) to make my terminal a bit sexier.
I just copied the code without editing it, so in my .bash_profile I added:
export PS1='\[\e[0:35m⌘\e[m \e[0:36m\w/\e[m \e[0:33m`git branch 2> /dev/null | grep -e ^* | sed -E s/^\\\\\*\ \(.+\)$/\(\\\\\1\)\ /`\e[m\]'
Everything's working, but there is a weird thing: when I type 3 characters or fewer and then hit backspace, it deletes everything, even the information on the left (the path and git branch).
That would be okay, but the problem is that when I keep typing after that, the command I started typing is still there (but hidden).
I guess that's hard to picture, so I'll try to show it with some code:
# this is what my prompt looks like
~/my/path/ (branch) |
# I start typing a command
~/my/path/ (branch) ls|
# now I hit backspace once
|
# everything is removed
# but if I type something else then hit return
git st|
# it throws an error as the `l` from the previous command is still here
-bash: lgit: command not found
I have absolutely no idea how this bash_profile works; can anybody help? Thanks
there appears to be some incorrect syntax in your PS1 variable that's causing some unexpected errors. try this revision instead:
export PS1='\[\e[36m\]\w \[\e[33m\]`git branch 2> /dev/null | grep -e ^* | sed -E s/^\\\\\*\ \(.+\)$/\(\\\\\1\)\ /` \[\e[0m\]'
(note: i left the git ... grep ... sed pipeline alone and only edited the parts related to the prompt itself.)
edit - take out the 0: parts and the colors actually work. (i.e. \[\e[36m\] instead of \[\e[0:36m\])
and here's a breakdown of what's going on there:
\[\e[36m\] - this block sets a foreground text color (light blue/tealish)
\w - current working directory
\[\e[33m\] - sets a different text color (yellow)
git ... grep ... sed - retrieves your current git branch
\[\e[0m\] - resets the text color to white so you're not typing commands in yellow
if you don't care about colors, prompts are a fairly trivial thing. the color blocks make it a bit more complex, and (as you've seen) error prone.
First of all: Make sure you are using the BASH shell.
I am on Mountain Lion on a MacBook and the PS1 command sort of, kind of works. My prompt looks like this:
⌘ ~/SVN-Precommit-Kitchen-Sink-Hook.git/ (master) _
I guess the question is what do you want your prompt to do. BASH prompts can embed a whole bunch of escape sequences that can do all sorts of neat things that in Kornshell would take a wee bit of hacking.
Type man bash on the command line, and find the PROMPTING heading. You should see something like this:
When executing interactively, bash displays the primary prompt PS1 when it is ready to read a command, and the secondary prompt PS2 when it needs more input to complete a command. Bash allows these prompt strings to be customized by inserting a number of backslash-escaped special characters that are decoded as follows:
\a an ASCII bell character (07)
\d the date in "Weekday Month Date" format (e.g., "Tue May 26")
\D{format}
the format is passed to strftime(3) and the result is inserted into the prompt string;
an empty format results in a locale-specific time representation. The braces are
required
\e an ASCII escape character (033)
\h the hostname up to the first `.'
\H the hostname
\j the number of jobs currently managed by the shell
\l the basename of the shell's terminal device name
\n newline
\r carriage return
\s the name of the shell, the basename of $0 (the portion following the final slash)
\t the current time in 24-hour HH:MM:SS format
\T the current time in 12-hour HH:MM:SS format
\@ the current time in 12-hour am/pm format
\A the current time in 24-hour HH:MM format
\u the username of the current user
\v the version of bash (e.g., 2.00)
\V the release of bash, version + patch level (e.g., 2.00.0)
\w the current working directory, with $HOME abbreviated with a tilde
\W the basename of the current working directory, with $HOME abbreviated with a tilde
\! the history number of this command
\# the command number of this command
\$ if the effective UID is 0, a #, otherwise a $
\nnn the character corresponding to the octal number nnn
\\ a backslash
\[ begin a sequence of non-printing characters, which could be used to embed a terminal
control sequence into the prompt
\] end a sequence of non-printing characters
Let's take a simple prompt. I want to display my user name, the system I'm on, and my current directory. I can set PS1 like this:
PS1="\u#\h:\w$ "
This will give me:
david#davebook:~$ _
The \u is my user name (david), the \h is my machine name (davebook), and the \w displays the current directory I'm in relation to my $HOME directory.
You can also embed commands in the prompt too:
PS1="\$(date) \u#\h:\w$ "
Now the date and time will be embedded in my prompt:
Fri Feb 1 09:45:53 EST 2013 david#DaveBook:~
Sort of silly (I should have formatted the date; besides, BASH already has built-in sequences for the date), but you get the idea.
I recommend that you build your own damn prompt. If you're a git user, and you're using the command line comfortably, you can probably make a nice prompt yourself that looks the way you want. You can use the \$(command) syntax to include commands that get executed each time the prompt is displayed. You can use ANSI escape codes to color different parts of your prompt, or make them do fantastic stuff.
Build your prompt slowly and bit-by-bit. Create a shell script that will set PS1, and source it in like this:
$ echo "PS='\u#\h:\w\$ " > prompt.sh
$ chmod a+x prompt.sh
$ . prompt.sh
Then, add more and more features to your prompt until you get it to work the way you want.
Personally, I avoid over fancy prompts simply because they tend to fall apart sometime when you least expect it. For example, I use VI sequences for editing, and that prompt simply falls completely apart whenever I try to edit my command line.
Fancy prompts remind me of programs like Talking Moose which are really cool for the first few minutes, then start getting really, really annoying after that.

Using Vim search and replace from within a Bash script

I have a number of files (more than a hundred) that I want to process using Vim. A sample of the files’ contents is as follows:
xyz.csv /home/user/mydocs/abc.txt
/home/user/waves/wav.wav , user_wav.wav
I want this to be replaced by:
xyz.csv /var/lib/mydir/abc.txt
/var/sounds/wav.wav , wav.wav
In each of the files, the changes I need to make are the same. My questions are:
Can I use Vim search and replace functionality by calling it from within a Bash script?
If so, how do I go about it?
P.S. I have searched StackOverflow for similar questions and found some answers using ex scripts, etc. I want to know how I can call an ex script from within a bash script.
While vim is quite powerful, this is not something I would normally use vim for. It can be done using a combination of common command line utilities instead.
I've assumed that the blank line in your example above is actually blank and does not contain spaces or any other whitespace characters. You can use the following to do what you want.
sed -e "s,/home/user/mydocs,/var/lib/mydir," -e "s,/home/user/waves,/var/sounds," -e "/^$/d" file1
You can use that command together with find and a for loop to do this for a bunch of files:
for file in `find . -maxdepth 1 -type f`
do
sed -e "s,/home/user/mydocs,/var/lib/mydir," -e "s,/home/user/waves,/var/sounds," -e "/^$/d" $file
done
In the for loop, the find command limits the output to all files in the current directory (including dot files), each line of find's output is assigned to the file variable, and the sed command shown earlier is run on it. Note that as written this prints the transformed text to standard output; to update the files themselves, redirect to a temporary file or use sed's -i option where available.
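A variant that edits the files in place, assuming GNU sed's -i option (and restricting the search to *.txt purely as an illustration):

```shell
#!/bin/sh
# Rewrite each matching file directly, including the ", user_" fixup
# from the question's sample data.
find . -maxdepth 1 -type f -name '*.txt' -exec \
    sed -i -e 's,/home/user/mydocs,/var/lib/mydir,' \
           -e 's,/home/user/waves,/var/sounds,' \
           -e 's/, user_/, /' {} +
```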
This is how you'd invoke an ed script from bash:
ed filename <<END
/^$/d
%s|/home/user/mydocs|/var/lib/mydir|
%s|/home/user/waves|/var/sounds|
%s|, user_|, |
w
q
END
To answer with vim, you can do
vim -e 'bufdo!%s:\(xyz.csv \)/home/user/mydocs/\(abc.txt\n\)\n.*:\1/var/lib/mydir/\2/var/sounds/wav.wav , wav.wav:' -e 'xa' FILES
Note: I have assumed that the second line is statically replaced, as it appeared in the question.
If you don't like writing long lines in your script, you can create a file like:
s/FOO/BAR/
" several replacement and other commands
w " write the file
bd " if you want to
Then do:
vim -e "buffdo!source /your_scriptfile" -e "x" FILES
HTH
If all the editing consists in a series of substitutions, the most
idiomatic way of accomplishing it using Vim would be the following.
Open all the target files at once:
vim *.txt
Run the substitution commands on the loaded files:
:argdo %s#/home/user/mydocs#/var/lib/mydir#
:argdo %s#/home/user/waves#/var/sounds#
:argdo %s#, \zsuser_##
...
If changes are correctly made, save the files:
:wall
If the editing you want to automate could not be expressed only
in substitutions, record a macro and run it via the :normal
command:
:argdo normal! @z
(Here z is the name of the macro to be run.)
Lastly, if the editing should be performed from time to time and
needs to be stored in a script, try using the approach described
in the answer to a similar question.
Answer
Most Vim users are aware of the % range for running a command over the whole document in the current buffer. Most modern versions of Vim (i.e. 6.x+) also support running commands on regex searches, like so:
:/regex/substitute/match/replace/ # as vim command line
+"/reges/s/match/replace" # as bash cli parameter
This breaks down into vim doing the following,
search for regex and put the cursor at start of found point
call internal substitute function (see :help s or :help substitute) [ could be other commands ]
match string with regex for substitution
replace with new string value
Effectively it operates the same as the :global command.
Notes
The command after the regex search can be any command, including '!shell command' filter commands.
Reference Help
:help global
:help substitute
:help filter

KornShell (ksh) wraparound

Okay, I am sure this is simple but it is driving me nuts. I recently went to work on a program where I have had to step back in time a bit and use Redhat 9. When I'm typing on the command line from a standard xterm running KornShell (ksh), and I reach the end of the line the screen slides to the right (cutting off the left side of my command) instead of wrapping the text around to a new line. This makes things difficult for me because I can't easily copy and paste from the previous command straight from the command line. I have to look at the history and paste the command from there. In case you are wondering, I do a lot of command-line awk scripts that cause the line to get quite long.
Is there a way to force the command line to wrap instead of shifting visibility to the right side of the command I am typing?
I have poured through man page options with no luck.
I'm running:
XFree86 4.2.99.903(174)
KSH 5.2.14.
Thanks.
Did you do man ksh?
You want to do a set -o multiline.
Excerpt from man ksh:
multiline:
The built-in editors will use multiple lines on the screen for
lines that are longer than the width of the screen. This may not
work for all terminals.
eval $(resize) should do it.
If possible, try to break the command down to multiple lines by adding \
ie:
$ mycommand -a foo \
-f bar \
-c dif
The simple answer is:
$ set -o multiline
ksh earlier than 5.12, like the ksh shipped with NetBSD 6.1, doesn't have this option. You will have to turn off current Interactive Input Line Editing mode, which is usually emacs:
$ set +o emacs
This turns off a lot of features altogether, like tab completion or the use of the up-arrow key to recall the previous command.
If you decide to get used to emacs mode somehow, remember ^a goes to the beginning of the line (the "Home" key won't work) and ^e goes to the end.
I don't know of a way of forcing the shell to wrap, but I would ask why you'd be writing lines that long. With awk scripts, I simply wrap the script in single quotes, and then break the lines where I want. It only gets tricky if you need single quotes in the script -- and diabolical if you need both single and double quotes. Actually, the rule is simple enough: use single quotes to wrap the whole script, and when you want a single quote in the script, write '\''. The first quote terminates the previous single-quoted string; the backslash-single quote yields a single quote; and the last single quote starts a new single quoted string. It really gets hairy if you need to escape those characters for an eval or something similar.
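The single-quote rule described above looks like this in practice (the awk program itself is only an illustration):

```shell
#!/bin/sh
# A multi-line awk script wrapped in single quotes; each embedded
# single quote is written as the four-character sequence '\''
printf 'a b\nc d\n' | awk '
{
    print "field 1 is '\''" $1 "'\''"
}
'
# -> field 1 is 'a'
# -> field 1 is 'c'
```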
The other question is - why not launch into an editor. Since I'm a die-hard vim nutcase (ok - I've been using vi for over 20 years, so it is easier for me than the alternatives), I have Korn shell set to vi mode (set -o vi), and can do escape-v to launch the editor on whatever I've typed.
This is kind of a pragmatic answer, but when that's an issue for me I usually do something like:
strings ~/.history | grep COMMAND
or
strings ~/.history | tail
(The history file has a little bit of binary data in it, hence 'strings')
