Append to end of each line with sed on Windows? - windows

I am using sed on Windows (the GNU port).
I execute:
$> sed "s/$/./" < /data.txt
And get:
.ne
.wo
.hree
But expect:
one.
two.
three.
The following works though I don't think it should. The way I read it is "replace the last character of the line with a period." I'm afraid it won't work consistently when used elsewhere. The intent isn't to replace the last character with a period but to append a period.
$> sed "s/.$/./" < /data.txt
I am not sure if the file encoding or something specific to windows is causing the issues I'm having or if it's just lack of experience with sed. Ideas?

hexdump -C sheds some light:
$ sed 's/$/./' < t.dos | hexdump -C
00000000 6f 6e 65 0d 2e 0a 74 77 6f 0d 2e 0a 74 68 72 65 |one...two...thre|
00000010 65 0d 2e 0a |e...|
00000014
There, 2e is the dot, the 0d before it is the carriage return (\r), and the 0a after it is the newline (\n). In other words, sed treats \n alone as the end of the line rather than \r\n together, so the \r is still part of the line; sed puts the dot after it, then adds back the newline as usual.
I think this does what you want, but it's not exactly pretty:
$ sed 's/.$/.\r/' < t.dos | hexdump -C
00000000 6f 6e 65 2e 0d 0a 74 77 6f 2e 0d 0a 74 68 72 65 |one...two...thre|
00000010 65 2e 0d 0a |e...|
00000014
The above is not so good, because it only works if the input is in DOS format; on a Unix-format file it would overwrite the last character of each line and add a stray \r. A better solution might be to first strip any \r and add them back manually later, like this:
$ tr -d '\r' < t.dos | sed -e 's/$/.\r/' | hexdump -C
00000000 6f 6e 65 2e 0d 0a 74 77 6f 2e 0d 0a 74 68 72 65 |one...two...thre|
00000010 65 2e 0d 0a |e...|
00000014
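As a possible alternative to the two-step pipeline, a single expression can insert the dot before an optional trailing \r, so the same command handles both DOS and Unix input. This is only a sketch: it assumes GNU sed (the \r escape in a regex is a GNU extension), and the function name append_dot is made up for illustration.

```shell
# Hypothetical helper: append a dot to every line read from stdin.
# Assumes GNU sed, which understands \r inside a regex.
append_dot() {
  # \r\{0,1\}$ matches an optional CR at the end of the line;
  # & re-inserts whatever was matched, so the CR (if any) stays last.
  sed 's/\r\{0,1\}$/.&/'
}

# One DOS-style line and one Unix-style line, dumped byte by byte.
printf 'one\r\ntwo\n' | append_dot | od -c
```

Unlike the tr-based pipeline, this keeps each line's original ending rather than forcing \r\n on every line.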

Related

Append to a string in bash

I'm trying to get a download URL using curl and awk and want to append something to that URL afterwards.
Here is a snippet of my code:
IMAGE=$(curl -I -s https://downloads.raspberrypi.org/raspbian_lite_latest | awk '/Location/ {print $2}')
CHECKSUM="$IMAGE.sha256"
echo $IMAGE
echo $CHECKSUM
What I get is that the appended suffix somehow replaces part of the beginning of the URL:
https://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2018-11-15/2018-11-13-raspbian-stretch-lite.zip
.sha256/downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2018-11-15/2018-11-13-raspbian-stretch-lite.zip
I'm a bit helpless, because the following works as expected:
A="https""://abc.org/a_b/a.zip" # looks weird, but full URLs are not allowed here
B="$A.sha256"
echo $B
What am I doing wrong?
When you hexdump your string, you see that it uses Windows line endings (with a carriage return):
echo $IMAGE | hexdump -C
00000000 68 74 74 70 73 3a 2f 2f 64 6f 77 6e 6c 6f 61 64 |https://download|
00000010 73 2e 72 61 73 70 62 65 72 72 79 70 69 2e 6f 72 |s.raspberrypi.or|
00000020 67 2f 72 61 73 70 62 69 61 6e 5f 6c 69 74 65 2f |g/raspbian_lite/|
00000030 69 6d 61 67 65 73 2f 72 61 73 70 62 69 61 6e 5f |images/raspbian_|
00000040 6c 69 74 65 2d 32 30 31 38 2d 31 31 2d 31 35 2f |lite-2018-11-15/|
00000050 32 30 31 38 2d 31 31 2d 31 33 2d 72 61 73 70 62 |2018-11-13-raspb|
00000060 69 61 6e 2d 73 74 72 65 74 63 68 2d 6c 69 74 65 |ian-stretch-lite|
00000070 2e 7a 69 70 0d 0a |.zip..|
00000076
To fix that, use
IMAGE=$(curl -I -s https://downloads.raspberrypi.org/raspbian_lite_latest | awk '/Location/ {print $2}' | tr -d "\r")
The problem apparently is that your $IMAGE ends in a trailing '\r' (carriage return). So you have actually appended ".sha256", as you expected, to "something\r", giving "something\r.sha256", which when echoed means: something, cursor back to the beginning of the line, .sha256. Long story short, strip that '\r'. E.g.:
IMAGE=$(curl -I -s https://downloads.raspberrypi.org/raspbian_lite_latest | awk '/Location/ {sub(/\r$/, "", $2); print $2}')
Since you are using bash, you can use substring replacement, i.e. replace the \r in the IMAGE variable:
$ CHECKSUM="${IMAGE/$'\r'/}.sha256"
$ echo $CHECKSUM
https://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2018-11-15/2018-11-13-raspbian-stretch-lite.zip.sha256
or prepare for it in the awk part by setting the record separator RS:
... | awk -v RS="\r?\n" '/Location/ {print $2}'
Tested with gawk, mawk and original-awk. Surprisingly busybox awk removed it by itself:
$ echo -e \\r | busybox awk '{print $1}' | hexdump -C
00000000 0a |.|
but for example:
$ echo -e \\r | gawk '{print $1}' | hexdump -C
00000000 0d 0a |..|
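The stripping step can also be sketched in isolation, without a network call. The example below is a minimal illustration, not the code from the answers: the helper name strip_trailing_cr and the example.org URL are made up, and the header value is simulated. It uses the POSIX suffix-removal expansion so that it also runs under plain sh; in bash, ${1%$'\r'} would be the equivalent spelling.

```shell
# A literal carriage return; command substitution only strips newlines.
CR=$(printf '\r')

# Hypothetical helper: print its argument without one trailing CR.
strip_trailing_cr() {
  # ${1%"$CR"} removes a single CR from the end, if there is one.
  printf '%s\n' "${1%"$CR"}"
}

# Simulated header value; a real one would come from curl as in the question.
IMAGE="https://example.org/a_b/a.zip$CR"
CHECKSUM="$(strip_trailing_cr "$IMAGE").sha256"
printf '%s\n' "$CHECKSUM"
```

Removing only a trailing \r, rather than every \r, leaves the rest of the value untouched.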

Shell script running different on MacOS and Linux

I'm trying to run my shell script on Linux (Ubuntu).
It's running correctly on MacOS, but on Ubuntu it doesn't.
#!/usr/bin/env bash
while true
do
node Client/request -t 10.9.2.4 -p 4400 --flood
done
Ubuntu output this error for running: sh myScript.sh:
Syntax error: end of file unexpected (expecting "do")
Why is there any difference between them, since both of them are run by Bash? How can I avoid future errors caused by their differences?
I tried cat yourscript.sh | tr -d '\r' >> yournewscript.sh, as a related question suggested, and also while [ true ].
The command hexdump -C util/runner.sh result is:
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 65 6e 76 20 62 |#!/usr/bin/env b|
00000010 61 73 68 0d 0a 0d 0a 77 68 69 6c 65 20 5b 20 74 |ash....while [ t|
00000020 72 75 65 20 5d 0d 0a 64 6f 0d 0a 20 20 20 6e 6f |rue ]..do.. no|
00000030 64 65 20 43 6c 69 65 6e 74 2f 72 65 71 75 65 73 |de Client/reques|
00000040 74 20 2d 74 20 31 39 32 2e 31 36 38 2e 30 2e 34 |t -t 192.168.0.4|
00000050 31 20 2d 70 20 34 34 30 30 20 2d 2d 66 6c 6f 6f |1 -p 4400 --floo|
00000060 64 0d 0a 64 6f 6e 65 0d 0a |d..done..|
00000069
The shebang #! line at the top of your file says that this is a bash script. But then you run your script with sh myScript.sh, and therefore with the sh shell.
The sh shell is not the same as the bash shell on Ubuntu (there, sh is dash).
To avoid this problem in the future, you should call shell scripts through their shebang line. Also prefer bash over sh, because the bash shell is more convenient and standardized (IMHO). For the script to be directly callable, you have to set the executable flag, like this:
chmod +x yournewscript.sh
This has to be done only once (it's not necessary to do this on every call.)
Then you can just call the script directly:
./yournewscript.sh
and it will be interpreted by whatever command is present in the first line of the script.
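Separately from the choice of shell, the hexdump in the question shows 0d 0a pairs, i.e. Windows line endings. A quick way to check a script for carriage returns and to write a cleaned copy might look like the sketch below; the helper names has_cr and strip_cr are made up for illustration.

```shell
# Hypothetical helper: succeed if the file contains a carriage return.
has_cr() {
  grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: write a copy of $1 without carriage returns to $2.
strip_cr() {
  # '>' truncates the target; '>>' would append to an older copy.
  tr -d '\r' < "$1" > "$2"
}
```

A possible use is has_cr myScript.sh && strip_cr myScript.sh myScript.unix.sh. Note that the command tried in the question used >>, which appends to yournewscript.sh if that file already exists.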

tcsh if/then statement gives error

I'm trying to do a simple tcsh script to look for a folder, then navigate to it if it exists. The statement evaluates properly, but if it evaluates false, I get an error "then: then/endif not found". If it evaluates true, no problem. Where am I going wrong?
#!/bin/tcsh
set icmanagedir = ""
set workspace = `find -maxdepth 1 -name "*$user*" | sort -r | head -n1`
if ($icmanagedir != "" && $workspace != "") then
setenv WORKSPACE_DIR `readlink -f $workspace`
echo "Navigating to workspace" $WORKSPACE_DIR
cd $WORKSPACE_DIR
endif
($icmanagedir is initialized elsewhere, but I get the error regardless of which variable is empty)
The problem is that tcsh needs to have every line end in a newline, including the last line; it uses the newline as the "line termination character", and if it's missing it errors out.
You can use a hex editor/viewer to check if the file ends with a newline:
$ hexdump -C x.tcsh
00000000 69 66 20 28 22 78 22 20 3d 20 22 78 22 29 20 74 |if ("x" = "x") t|
00000010 68 65 6e 0a 09 65 63 68 6f 20 78 0a 65 6e 64 69 |hen..echo x.endi|
00000020 66 |f|
Here the last character is f (0x66), not a newline. A correct file has 0x0a as the last character (represented by a .):
$ hexdump -C x.tcsh
00000000 69 66 20 28 22 78 22 20 3d 20 22 78 22 29 20 74 |if ("x" = "x") t|
00000010 68 65 6e 0a 09 65 63 68 6f 20 78 0a 65 6e 64 69 |hen..echo x.endi|
00000020 66 0a |f.|
Ending the last line in a file with a newline is a common UNIX idiom, and some shell tools expect this. See What's the point in adding a new line to the end of a file? for some more info on this.
Most UNIX editors (such as Vim, Nano, Emacs, etc.) do this by default. Some editors and IDEs don't, but almost all of them have a setting that enables it.
The best solution is to enable this setting in your editor. If you can't do this then adding a blank line at the end also solves your problem.
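If a hex viewer is not at hand, the same check can be scripted. The following is a minimal sketch with made-up helper names (ends_with_newline, fix_final_newline); it relies on command substitution dropping a trailing newline, so it also reports an empty file as fine.

```shell
# Hypothetical helper: succeed if the last byte of the file is a newline.
ends_with_newline() {
  # $(...) strips a trailing newline, so an empty result means the
  # last byte was a newline (or the file is empty).
  [ -z "$(tail -c 1 "$1")" ]
}

# Hypothetical helper: append a newline only when one is missing.
fix_final_newline() {
  ends_with_newline "$1" || printf '\n' >> "$1"
}
```

Running fix_final_newline x.tcsh twice is harmless, because the second call finds the newline already present.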

Change the locale setting to make sed work correctly, but why?

The following is a bash file I wrote to convert all C++ style(//) comments in a C file to C style(/**/).
#!/bin/bash
lang=`echo $LANG`
# It's necessary to change the locale setting. I don't know why.
export LANG=C
# Can comment the following statement if there is not dos2unix command.
dos2unix -q $1
sed -i -e 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;' $1
export LANG=$lang
It works. But I found a problem I cannot explain. By default, my locale setting is en_US.UTF-8. And in my C code, there are comments written in Chinese, such as
// some english 一些中文注释
If I don't change the locale setting, i.e., do not run the statement export LANG=C, I'll get
/* some english */一些中文注释
instead of
/* some english 一些中文注释*/
I don't know why. I just found a solution by trial and error.
After reading Jonathan Leffler's answer, I think I made a mistake that led to some misunderstanding. In the question, those Chinese words were typed in Google Chrome and were not the actual words in my C file. 一些中文注释 just means some Chinese comments.
Now I typed // some english 一些中文注释 in Visual C++ 6.0 on Windows XP and copied the C file to Debian. Then I just ran sed -i -e 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;' $1 and got
/* some english 一些 */中文注释
I think the different character encodings (GB18030, GBK, UTF-8?) cause the different results.
The following are the results I got on Debian:
~/sandbox$ uname -a
Linux xyt-dev 2.6.30-1-686 #1 SMP Sat Aug 15 19:11:58 UTC 2009 i686 GNU/Linux
~/sandbox$ echo $LANG
en_US.UTF-8
~/sandbox$ cat tt.c | od -c -t x1
0000000 / / s o m e e n g l i s h
2f 2f 20 73 6f 6d 65 20 65 6e 67 6c 69 73 68 20
0000020 322 273 320 251 326 320 316 304 327 242 312 315
d2 bb d0 a9 d6 d0 ce c4 d7 a2 ca cd
0000034
~/sandbox$ ./convert_comment_style_cpp2c.sh tt.c
~/sandbox$ cat tt.c | od -c -t x1
0000000 / * s o m e e n g l i s h
2f 2a 20 20 73 6f 6d 65 20 65 6e 67 6c 69 73 68
0000020 322 273 320 251 * / 326 320 316 304 327 242 312 315
20 d2 bb d0 a9 20 2a 2f d6 d0 ce c4 d7 a2 ca cd
0000040
~/sandbox$
I think these Chinese characters are encoded with 2 bytes each (Unicode).
Here is another example:
~/sandbox$ cat tt.c | od -c -t x1
0000000 / / I n W i n d o w : 250 250 ?
2f 2f 20 49 6e 57 69 6e 64 6f 77 3a 20 a8 a8 3f
0000020 1 ?
31 3f
0000022
~/sandbox$ ./convert_comment_style_cpp2c.sh tt.c
~/sandbox$ cat tt.c | od -c -t x1
0000000 / * I n W i n d o w : *
2f 2a 20 20 49 6e 57 69 6e 64 6f 77 3a 20 20 2a
0000020 / 250 250 ? 1 ?
2f a8 a8 3f 31 3f
Which platform are you working on? Your sed script works fine on MacOS X without changing locale. The Linux terminal was less happy with the Chinese characters, but it is not set up to use UTF-8. Moreover, a hex dump of the string that it did get contained a zero byte 0x00 where the Chinese started, which might lead to the confusion. (I note that your regex adds a space before the comment text if it starts // with a space.)
MacOS X (10.6.8)
The 'odx' command used here is a hex-dump program.
$ echo "// some english 一些中文注释" > x3.utf8
$ odx x3.utf8
0x0000: 2F 2F 20 73 6F 6D 65 20 65 6E 67 6C 69 73 68 20 // some english
0x0010: E4 B8 80 E4 BA 9B E4 B8 AD E6 96 87 E6 B3 A8 E9 ................
0x0020: 87 8A 0A ...
0x0023:
$ utf8-unicode x3.utf8
0x2F = U+002F
0x2F = U+002F
0x20 = U+0020
0x73 = U+0073
0x6F = U+006F
0x6D = U+006D
0x65 = U+0065
0x20 = U+0020
0x65 = U+0065
0x6E = U+006E
0x67 = U+0067
0x6C = U+006C
0x69 = U+0069
0x73 = U+0073
0x68 = U+0068
0x20 = U+0020
0xE4 0xB8 0x80 = U+4E00
0xE4 0xBA 0x9B = U+4E9B
0xE4 0xB8 0xAD = U+4E2D
0xE6 0x96 0x87 = U+6587
0xE6 0xB3 0xA8 = U+6CE8
0xE9 0x87 0x8A = U+91CA
0x0A = U+000A
$ sed 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;' x3.utf8
/* some english 一些中文注释 */
$
All of which looks clean and tidy.
Linux (RHEL 5)
I copied the x3.utf8 file to a Linux box, and dumped it. Then I ran the sed script on it, and all seemed OK:
$ odx x3.utf8
0x0000: 2F 2F 20 73 6F 6D 65 20 65 6E 67 6C 69 73 68 20 // some english
0x0010: E4 B8 80 E4 BA 9B E4 B8 AD E6 96 87 E6 B3 A8 E9 ................
0x0020: 87 8A 0A ...
0x0023:
$ sed 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;' x3.utf8 | odx
0x0000: 2F 2A 20 20 73 6F 6D 65 20 65 6E 67 6C 69 73 68 /* some english
0x0010: 20 E4 B8 80 E4 BA 9B E4 B8 AD E6 96 87 E6 B3 A8 ...............
0x0020: E9 87 8A 20 2A 2F 0A ... */.
0x0027:
$
So far, so good. I also tried:
$ echo $LANG
en_US.UTF-8
$ echo $LC_CTYPE
$ env | grep LC_
$ bash --version
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.
$ cat x3.utf8
// some english 一些中文注释
$ echo $(<x3.utf8)
// some english 一些中文注释
$ sed 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;' x3.utf8
/* some english 一些中文注释 */
$
So, the terminal is nominally working in UTF-8 after all, and it certainly seems to display the data OK.
However, if I echo the string at the terminal, it gets into a tizzy. When I cut'n'pasted the string to the Linux terminal, it said:
$ echo "// some english d8d^G:
> "
// some english d8d:
$
and beeped.
$ echo "// some english d8d^G:
> " | odx
0x0000: 2F 2F 20 73 6F 6D 65 20 65 6E 67 6C 69 73 68 20 // some english
0x0010: 64 38 64 07 3A 0A 0A d8d.:..
0x0017:
$
I'm not quite sure what to make of that. I think it means that something in the input side of bash is having some problems, but I'm not quite sure. I also am getting slightly inconsistent results. The first time I tried it, I got:
$ cat > xxx
's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;'
// some english d8^#d:^[d8-f^Gf3(i^G
$ odx xxx
0x0000: 27 73 3B 5E 5C 28 5B 5B 3A 62 6C 61 6E 6B 3A 5D 's;^\([[:blank:]
0x0010: 5D 2A 5C 29 2F 2F 5C 28 2E 2A 5C 29 3B 5C 31 2F ]*\)//\(.*\);\1/
0x0020: 2A 20 5C 32 20 2A 2F 3B 27 0A 2F 2F 20 73 6F 6D * \2 */;'.// som
0x0030: 65 20 65 6E 67 6C 69 73 68 20 64 38 00 64 3A 1B e english d8.d:.
0x0040: 64 38 2D 66 07 66 33 28 69 07 0A 0A d8-f.f3(i...
0x004C:
$
And in that hex dump, you can see a 0x00 byte (offset 0x003C). That appears at the position where you got the end comment, and a null there could confuse sed; but the whole input is such a mess it is hard to know what to make of it.
Okay, here's the correct answer...
The GNU regular expression library (regex) doesn't match everything when you put a . in your expression. Yup, I know how braindead that sounds.
The problem comes from the word "character". Reasonable people will say that everything in sed's input file is characters, and even in your case they are perfectly correct. But regex has been programmed to require that the input be correctly formed characters of the current locale's character set (UTF-8); if they are correctly formed characters of the Windows character set instead (UTF-16), they are not "characters".
So as . only matches "characters" it doesn't match your characters.
If you used the regex //.*$, ie: pinned it to the end of the line it wouldn't match at all because there's something that's not a "character" between the // and the end of the line.
And no you can't do anything like //\(.\|[^.]\)*$, it's just impossible to match those characters without switching to the C locale.
This will also, sometimes, destroy 8-bit transparency; ie: a binary piped through sed will get corrupted even if no changes are made.
Fortunately the C locale still uses the reasonable interpretation so anything that's not a perfectly correctly formatted ASCII-68 character is still a "character".
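Since only sed needs the C locale, the save-and-restore of LANG in the question's script could probably be avoided by setting the locale for that one command. The sketch below assumes this is acceptable for the file at hand; the function name cpp2c_comments is made up, and the sample bytes are the first two non-ASCII characters from the od dump in the question. LC_ALL is used because it takes precedence over both LANG and LC_CTYPE.

```shell
# Hypothetical helper: convert // comments to /* */ on stdin, treating
# the input as raw bytes by running sed in the C locale.
cpp2c_comments() {
  # The variable assignment applies to this sed invocation only.
  LC_ALL=C sed 's;^\([[:blank:]]*\)//\(.*\);\1/* \2 */;'
}

# Sample line ending in non-UTF-8 bytes, written as octal escapes.
printf '// some english \322\273\320\251\n' | cpp2c_comments | od -c
```

With the bytes treated individually, . matches each of them and the closing */ lands after the last byte of the comment.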

Using UNIX linebreaks in Windows Ruby

Is it possible to tell Ruby in Windows to use only \n instead of \r\n?
I'm having an issue where a file is being saved with \r\n and it is causing it to not function properly. Is there a setting somewhere I can change to fix this?
The simple attack:
File.open("foo.txt", "w") do |fd|
fd.write "this\nis\na\ntest\n"
end
And when I open this in hexedit:
00000000 74 68 69 73 0A 69 73 0A 61 0A 74 65 73 74 0A
                     ^^       ^^    ^^             ^^
                     \n       \n    \n             \n

Resources