Bash: difference between cat and echo - bash

This is file.txt (without an end-of-line for the last line):
foo:bar:baz:qux:quux
one:two:tree:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
the quick brown fox jumps over the lazy dog
File read.sh
while read -r line
do
echo $line
done < file.txt
This is what I tried in the terminal:
./read.sh
Output:
foo:bar:baz:qux:quux
one:two:tree:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
Why doesn't read.sh show the last end of line like cat file.txt does?

Because there is no newline at the end of file.txt. You can check this with od:
$ od -c file.txt
0000000 f o o : b a r : b a z : q u x :
0000020 q u u x \n o n e : t w o : t r e
0000040 e : f o u r : f i v e : s i x :
0000060 s e v e n \n a l p h a : b e t a
0000100 : g a m m a : d e l t a : e p s
0000120 i l o n : z e t a : e t a : t e
0000140 t a : i o t a : k a p p a : l a
0000160 m b d a : m u \n t h e q u i c
0000200 k b r o w n f o x j u m p
0000220 s o v e r t h e l a z y
0000240 d o g
There is no \n at the end of the file.
echo, on the other hand, always appends a newline to whatever it prints (unless you pass -n), so every line the loop does read is printed with one.
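If you only want to know whether the last byte is a newline, you do not need the whole od dump. The following is a minimal sketch; the function name ends_with_newline is made up for this example. It relies on command substitution stripping a trailing newline, so the captured text is empty exactly when the last byte is \n.

```shell
# Hypothetical helper: succeed if the file named in $1 is non-empty
# and its last byte is a newline.
ends_with_newline() {
    [ -s "$1" ] || return 1
    # tail -c 1 prints the last byte; $(...) drops a trailing newline,
    # so the result is empty only when that byte was "\n".
    [ -z "$(tail -c 1 "$1")" ]
}

if ends_with_newline file.txt; then
    echo "file.txt ends with a newline"
else
    echo "file.txt has no trailing newline (or is empty or missing)"
fi
```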

Other answers are right, there is simply no newline character in the end of your file.txt.
Most text editors will end a file with a newline automatically, even nano does that. But your file was generated by a script, right?
To reproduce this behavior all you have to do is:
echo -n 'hello world' >> file.txt
-n flag tells echo not to output the trailing newline.
Also, if you want your read code to work, you can use this:
while read -r line
do
printf "%s\n" "$line"
done < file.txt
[[ -n $line ]] && printf '%s' "$line"
This works because read still places the final, unterminated line into the variable, but it returns false, which ends the while loop. The test after the loop then prints whatever is left in line.

Your input file doesn't end in a newline.
cat file simply copies the file contents to standard output. It operates by characters, not lines, so it doesn't care if the file ends in a newline or not. But if it doesn't end in a newline, it won't add one to the output.
read -r line will read a line into the variable. It will only report success if the line ends in a newline. If the last line of the input doesn't end in newline, it reports an error, as if EOF had been reached. So the loop terminates when it tries to read the last line, instead of returning that line. That's why the script never displays the line beginning with the quick brown fox.
In general, Unix text-file programs are only defined to work on text files that end in newline. Their treatment of the last line if it doesn't have a newline is not usually specified.
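The behaviour of read described above leads to a common idiom: keep looping while read succeeds or while it left something in the variable. This is only a sketch; the function name print_all_lines and the throwaway demo file are assumptions made for illustration.

```shell
# Hypothetical helper: print every line of the file named in $1,
# including a final line that is not terminated by a newline.
print_all_lines() {
    # read fails on the unterminated last line but still fills "line",
    # so the extra test lets the loop body run one more time.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Demo on a temporary file whose last line has no newline.
demo=$(mktemp)
printf 'foo\nbar' > "$demo"
print_all_lines "$demo"
rm -f "$demo"
```

IFS= keeps leading and trailing whitespace on each line, and -r stops read from interpreting backslashes.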

Your file.txt does not contain a newline at the end of the last line, so cat does not print one either.
Note that read.sh does not display the last line at all. In read.sh, read waits for a complete line of input, and since the last line is not terminated by a newline, read returns a non-zero status and the loop body never runs for that line.


When performing sed command, some tabs in a file are converted to a single whitespace

Background
I have a .xyz file from which I need to remove a specific set of lines, as well as do some text replacements. I have a separate .txt file that contains a list of integers corresponding to the line numbers that need to be removed, and another for the lines that need replacing.
Just as a preemptive TL;DR: the tabs in my input file that happen to have one extra whitespace (because they justify to a certain position regardless of one extra whitespace) end up being converted to a single whitespace in the output file.
The removal file is called atomremove.txt and looks as follows. The other file is structured similarly.
14
13
11
10
4
The xyz file from which I need to remove lines looks something like this.
24
Comment block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
B 17.99018 17.98940 2.24243
C 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
H 20.41328 14.02079 2.31959
H 22.06640 8.65013 2.27145
C 19.33725 17.20040 2.26894
H 13.96336 17.42048 2.19342
H 21.69450 3.68090 2.22196
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
C 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
My Code
I am successful in doing the line removal, and the replacements, although the output is not as expected. It appears to replace some of the tabs with the whitespace, specifically for lines that have a 'y' coordinate with only 5 decimals. I am going to share the resulting output first, and then my code.
Here is the output
19
Comment Block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
Here is my code.
atomstorefile="./extract_internal/atomremove.txt"
atomchangefile="./extract_internal/atomchange.txt"
temp="temp.txt"
tempp="tempp.txt"
temppp="temppp.txt"
filestoreloc="./"$basefilename"_xyzoutputs/chops"
#get number of files in directory and set a loop for that # of files
numfiles=$( ls "./"$basefilename"_xyzoutputs/splits" | wc -l )
numfiles=$(( numfiles/2 ))
counter=1
while [ $counter -lt $(( numfiles + 1 )) ];
do
    #set a loop for each split half
    splithalf=1
    while [ $splithalf -lt 3 ];
    do
        #storing the xyz file in a temp file for edits (non destructive)
        cat ./"$basefilename"_xyzoutputs/splits/split"$splithalf"-geometry$counter.xyz > $temp
        #changing specified atoms
        while read line;
        do
            line=$(( line + 2 ))
            sed -i "${line}s/C/H/" $temp
        done < $atomchangefile
        # removing specified atoms
        while read line;
        do
            line=$(( line + 2 ))
            sed -i "${line}d" $temp
        done < $atomstorefile
        remainatoms=$( wc -l $temp | awk '{print $1}' )
        remainatoms=$(( remainatoms - 2 ))
        tail -n $remainatoms $temp > $tempp
        echo $remainatoms > "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
        echo Comment Block >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
        cat $tempp >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
        splithalf=$(( splithalf + 1 ))
    done
    counter=$(( counter + 1 ))
done
I am sure the solution is simple. Any insight into what is causing this issue would be very appreciated.
Not sure what you are doing, but your file can be fixed using the column -t < filename command.
Example :
❯ cat test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯ column -t < test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯
The reason you wreck your whitespace is that you need to quote your strings. But a much superior solution is to refactor all of this monumentally overcomplicated shell script to a simple sed or Awk script.
Assuming the line numbers all indicate line numbers in the original input file, try this.
tmp=$(mktemp -t atomtmpXXXXXXXXX) || exit
trap 'rm -f "$tmp"' ERR EXIT
( sed 's%$%s/C/H/%' extract_internal/atomchange.txt
  sed 's%$%d%' extract_internal/atomremove.txt ) >"$tmp"
ls -l "$tmp"; nl "$tmp" # debugging
for file in "$basefilename"_xyzoutputs/splits/*; do
    # no space after "dst=", or the shell would try to run the path as a command
    dst="$basefilename"_xyzoutputs/chops/${file#*/splits/}
    sed -f "$tmp" "$file" >"$dst"
done
This combines the two input files into a new sed script (remarkably, by way of sed); the debugging line lets you inspect the result (probably remove it once you understand how this works).
Your question doesn't really explain how the input files relate to the output files so I had to guess a bit. One of the important changes is to avoid sed -i when you are not modifying an existing file; but above all, definitely avoid repeatedly overwriting the same file with sed -i.
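The loop in the question adds 2 to every number it reads, to skip the atom-count line and the comment line, whereas the script above treats the numbers as plain file line numbers. If the numbers are atom indices, the generated sed script needs the same offset. Here is a minimal sketch of that variant; the function name build_sed_script is hypothetical, and the offset of 2 is taken from the question's own code.

```shell
# Hypothetical helper: print a sed script on stdout.
#   $1 = file listing atom indices whose first "C" becomes "H"
#   $2 = file listing atom indices to delete
# Each index is shifted by 2 to skip the count line and the comment line.
# NF skips blank lines in the list files.
build_sed_script() {
    awk 'NF { printf "%ds/C/H/\n", $1 + 2 }' "$1"
    awk 'NF { printf "%dd\n", $1 + 2 }' "$2"
}
```

It would be used as build_sed_script extract_internal/atomchange.txt extract_internal/atomremove.txt >"$tmp" in place of the two sed calls. Because sed addresses refer to input line numbers, the order of the delete commands does not matter. The atom count on the first line still has to be rewritten separately, as the question's script does.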

For and If in Awk scripts (bash)

My bash command is
awk -f code.txt input.txt
This is my code
{z=0
for(i=2;i<17;i++)
if ($i=="y")
z++
print $1 " " z}
This is my input
AaA y n y n y n n n y n n n n n y
BbB n y y n n n n n n n n n n n n
My output should be
AaA 5
BbB 2
Yet it is
AaA 4
BbB 2
After messing around with the code, it seems it doesn't register the last symbol of a line.
{z=0
for(i=2;i<18;i++)
print $i
print $1 " " z}
When I run this it outputs all y/n, so the problem must be somewhere in the if-statement.
It may be the case that your input file has MS-DOS line endings (CRLF). The last symbol will be read as y<CR>. To check whether this is true, on a Linux system you can run
hexdump -C input.txt
The problem seemed to be in CRLF. You can fix that in an awk loop with
sub(/\r/,"",$NF)
i.e. by removing the CR (replacing it with nothing) from the last field. You could also use sub or gsub to count the y occurrences:
$ awk '$0=$1 OFS gsub(FS "y","&")' file
AaA 5
BbB 2
This way the \r does not matter as we just replace y with itself (&).
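Another option is to strip the carriage returns before awk ever sees them and keep the original counting loop. This is a sketch only; the function name count_y and the demo data are made up, and the loop runs to NF rather than a hard-coded field count.

```shell
# Hypothetical helper: count "y" fields per line of the file named in $1,
# ignoring any carriage returns left over from CRLF line endings.
count_y() {
    tr -d '\r' < "$1" |
        awk '{ z = 0; for (i = 2; i <= NF; i++) if ($i == "y") z++; print $1, z }'
}

# Demo on a temporary file with CRLF line endings.
demo=$(mktemp)
printf 'AaA y n y\r\nBbB n n y\r\n' > "$demo"
count_y "$demo"
rm -f "$demo"
```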

How to get line break in E-Mail from shellscript?

There is a shell script (bash) that checks a CSV file for lines that don't match a pattern and sends a mail with the offending lines. That works fine, but when the offending lines are combined, Linux gives a \r as the line break, and in the e-mail there is no line break. So I tried to send \r\n as the line break, but this has no effect; perl or bash deletes the \n newline.
Here is a minimal working script as example:
SUBJECT="Error while parse CSV"
TO="rcpt#domain.tld"
wrongLines=$(perl -ne 'print "Row $.: $_\r\n" if not /^00[1-9]\d{4,}$/' $file)
MESSAGE="Error while parse following Lines, pattern dont match: \r\n $wrongLines"
echo $MESSAGE |od -c
The output of od is:
0000000 E r r o r w h i l e p a r s
0000020 e f o l l o w i n g L i n e
0000040 s , p a t t e r n d o n t
0000060 m a t c h : \ r \ n R o w
0000100 2 : 4 9 2 7 8 3 8 7 4 3 \r R
0000120 o w 3 : 4 8 2 3 2 8 9 7 3 8
0000140 \r \n
0000143
But why is the \n between the rows deleted in the od output? I also tried \x0D\x0A instead of \r\n, but this doesn't help either. Any suggestions?
Your problem is that you're not using quotes!
Look:
$ a="A multi-line
input
variable"
$ echo $a
A multi-line input variable
$ echo "$a"
A multi-line
input
variable
$
Without quotes, you'll be victim of word splitting and filename expansion (not illustrated in the example above).
Also, adding \r or \n (that is, verbatim backslash followed by r or n) is not going to help at all.
Conclusion: Quote every variable expansion! always! (unless you really mean a glob pattern — in which case you will also add a comment in the code to explain why you purposely didn't quote the expansion).
Side note: don't use upper case variable names!
It is recommended you use lower-case names for your own parameters so as not to confuse them with the all-uppercase variable names used by Bash internal variables and environment variables.
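Putting the advice above together, here is a minimal sketch of building the message with printf and quoting every expansion. The function name build_message and the sample rows are assumptions for illustration; the real script would pass in the perl output instead.

```shell
# Hypothetical helper: build the mail body from the offending lines in $1.
# printf interprets "\n" in its format string, unlike a plain assignment.
build_message() {
    printf 'Error while parsing the following lines, pattern does not match:\n%s\n' "$1"
}

# Sample data standing in for the perl output (two rows, real newlines).
wrong_lines=$(printf 'Row 2: 4927838743\nRow 3: 4823289738')
message=$(build_message "$wrong_lines")

# Quoted expansion keeps the newlines intact.
printf '%s\n' "$message"
```

The quoted "$message" is what preserves the line breaks; with an unquoted $message, word splitting would join the rows with single spaces again.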

Add space between every letter

How can I add spaces between every character or symbol within a UTF-8 document? E.g. 123hello! becomes 1 2 3 h e l l o !.
I have BASH, OpenOffice.org, and gedit, if any of those can do that.
I don't care if it sometimes leaves extra spaces in places (e.g. 2 or 3 spaces in a single place is no problem).
Shortest sed version
sed 's/./& /g'
Output
$ echo '123hello!' | sed 's/./& /g'
1 2 3 h e l l o !
Obligatory awk version
awk '$1=$1' FS= OFS=" "
Output
$ echo '123hello!' | awk '$1=$1' FS= OFS=" "
1 2 3 h e l l o !
sed(1) can do this:
$ sed -e 's/\(.\)/\1 /g' < /etc/passwd
r o o t : x : 0 : 0 : r o o t : / r o o t : / b i n / b a s h
d a e m o n : x : 1 : 1 : d a e m o n : / u s r / s b i n : / b i n / s h
It works well on e.g. UTF-8 encoded Japanese content:
$ file japanese
japanese: UTF-8 Unicode text
$ sed -e 's/\(.\)/\1 /g' < japanese
E X I F 中 の 画 像 回 転 情 報 対 応 に よ り 、 一 部 画 像 ( 特 に 『
$
sed is OK, but this is pure bash:
string=hello
string_new=
for ((i=0; i<${#string}; i++)); do
    string_new+="${string:$i:1} "
done
echo "$string_new"
Since you have bash, I will assume that you have access to sed. The following command line will do what you wish.
$ sed -e 's:\(.\):\1 :g' < input.txt > output.txt
I like these solutions because they do not have a trailing space like the rest here.
GNU awk:
echo 123hello! | awk NF=NF FS=
GNU awk:
echo 123hello! | awk NF=NF FPAT=.
POSIX awk:
echo 123hello! | awk '{while(a=substr($0,++b,1))printf b-1?FS a:a}'
This might work for you:
echo '1 23h ello ! ' | sed 's/\s*/ /g;s/^\s*\(.*\S\)\s*$/\1/;l'
1 2 3 h e l l o !$
1 2 3 h e l l o !
In retrospect a far better solution:
sed 's/\B/ /g' file
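This inserts a space at every position that is not a word boundary (\B is a GNU sed extension), for example between two adjacent letters or digits. Note that it does not separate a letter from adjacent punctuation, because that position is a word boundary.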
string='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
echo ${string} | sed -r 's/(.{1})/\1 /g'
Pure POSIX Shell version:
addspace() {
    __addspace_str="$1"
    while [ -n "${__addspace_str#?}" ]; do
        printf '%c ' "$__addspace_str"
        __addspace_str="${__addspace_str#?}"
    done
    printf '%c' "$__addspace_str"
}
Or if you need to put it in a variable:
addspace_var() {
    addspace_result=""
    __addspace_str="$1"
    while [ -n "${__addspace_str#?}" ]; do
        addspace_result="$addspace_result${__addspace_str%${__addspace_str#?}} "
        __addspace_str="${__addspace_str#?}"
    done
    addspace_result="$addspace_result$__addspace_str"
}
addspace_var abc
echo "$addspace_result"
Tested with dash, ksh, zsh, bash (+ bash --posix), and busybox ash.
Explanation
${x#?}
This parameter expansion removes the first character of x. ${x#...} in general removes a prefix given by a pattern, and ? matches any single character.
printf '%c ' "$str"
The %c format parameter transforms the string argument into its first character, so the full format string '%c ' prints the first character of the string followed by a space. Note that if the string was empty this would cause issues, but we already checked that it wasn't before, so it's fine. To print the first character safely in any situation we can use '%.1s', but I like living dangerously :3
${x%${x#?}}
This is an alternate way to get the first character of the string. We already know that ${x#?} is all but the first character. Well, ${x%...} removes ... from the end of x, so ${x%${x#?}} removes all but the first character from the end of x, leaving only the first one.
__prefixed_variable_names
POSIX doesn't define local, so to avoid variable conflicts it's safer to create unique names that are unlikely to clobber each other. I am starting to experiment using M4 to generate unique names while not having to destroy my code every time but it's probably overkill for people who don't use shell as much as me.
[ -n "${str#?}" ]
Why not just [ -n "$str" ]? It's to avoid the dreaded trailing space; it's also why we have a little statement guy at the bottom there outside the loop. The loop goes until the string is one character long, then we finish outside of it so we can append this last character without adding a space.
When should I use this?
This is good for small inputs in long-running loops, since it avoids the overhead of calling an external process, but for larger inputs it starts lagging behind fast, especially the var version. (I fault the ${x%${x#?}} trick.)
Benchmark Commands
# addspace
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace \"$input\" >/dev/null; done"
# addspace_var
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace_var \"$input\" >/dev/null; done"
# sed for comparison
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do echo \"$input\" | sed 's/./& /g' >/dev/null; done"
Input Length = 3
        addspace     addspace_var    sed
real    0m0,106s     0m0,106s        0m10,651s
user    0m0,077s     0m0,075s        0m9,349s
sys     0m0,029s     0m0,031s        0m3,030s
Input Length = 200
        addspace     addspace_var    sed
real    0m6,050s     0m47,115s       0m11,049s
user    0m5,557s     0m46,919s       0m9,727s
sys     0m0,488s     0m0,068s        0m3,085s
Input Length = 1000
        addspace     addspace_var    sed
real    0m55,989s    TBD             0m11,534s
user    0m53,560s    TBD             0m10,214s
sys     0m2,428s     TBD             0m2,975s
(Yeah, I was waiting a bit for that last var one.)
In situations like this you can simply check the length of the input and call the appropriate function for maximum performance.
addspace() {
    if [ ${#1} -lt 100 ]; then
        addspace_builtins "$1"
    else
        addspace_process "$1"
    fi
}
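For completeness, here is one way the two helpers behind that dispatcher could look. This is a sketch under assumptions: the helper bodies are not given in the answer, so the builtin variant reuses the loop from above (with the safer '%.1s'), the external variant uses sed, and the threshold of 100 is simply the one from the snippet.

```shell
# Assumed body: the pure-shell loop from above, printing one character at a time.
addspace_builtins() {
    __addspace_str="$1"
    while [ -n "${__addspace_str#?}" ]; do
        printf '%.1s ' "$__addspace_str"
        __addspace_str="${__addspace_str#?}"
    done
    printf '%.1s' "$__addspace_str"
}

# Assumed body: let sed do the work, then drop the trailing space it leaves.
# Unlike the builtin variant, this one ends its output with a newline.
addspace_process() {
    printf '%s\n' "$1" | sed 's/./& /g; s/ $//'
}

# Dispatcher: short inputs stay in the shell, long ones go to sed.
addspace() {
    if [ "${#1}" -lt 100 ]; then
        addspace_builtins "$1"
    else
        addspace_process "$1"
    fi
}

addspace '123hello!'
echo
```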

OSX - "sort" by 1st character only issue, tried -k 1.1,1.1

I'm on OSX 10.6.8
I'm having some issues sorting a text file by the first character.
I'm concatenating three files into one and need the final result sorted by the first alphabetical letter.
Each file has lines that look like this:
A025-001
A118-001
A118-002
B657-001
D316-001
So the file after concatenation via "cat" looks like this:
A025-001
....
A025-001 (where file 2 was appended)
....
A025-001 (where file 3 was appended)
I've tried "sort -k 1.1,1.1 result.txt > sortedresult.txt" and with a large amount of other options in the man page: i,b,f,s (just guessing in hopes that I may have found the right one)
I need all the entries to be put next to each other:
A025-001
A025-001
B.......
B.......
D.......
Hopefully, someone more knowledgeable than I can help me solve this problem.
Thanks
Update: the data files themselves aren't working well with unix tools. If I cat the results file, only a few lines are shown, of many. Opening them in "vim" shows a bunch of ^M characters. It seems as if sort is not going through the whole file.
There's a column header at the top, with fields in quotations, tab-separated e.g. "Product" \t "Category" \t
The rest of the data is tab-separated but without quotations.
sample od -c:
0000000 " P r o d u c t N u m b e r "
0000020 \t " L o o k u p A t t r i b u
0000040 t e 1 G r o u p " \t " L o o
0000060 k u p A t t r i b u t e 1
0000100 N a m e " \t " L o o k u p A t
0000120 t r i b u t e 1 V a l u e "
0000140 \t " L o o k u p A t t r i b u
0000160 t e 1 V a l u e I m a g e
0000200 " \t " L o o k u p A t t r i b
Here's some of the data (not the column header):
0000660 " \n A 0 2 5 - 0 0 1 \t F a c e t
0000700 \t F a c e t C o l o r \t B l u e
0000720 \t C C D D D D \t O P T I O N \t \r
Does anyone know why it is doing this?
Update #2: The files were exported out of FileMaker as ASCII. You'll see a lot of extra tabs, just ignore those, once we get this figured out I'll sed them out. Here is the entire file along with a hexdump and od -c of the file: pastebin.com/UzaUgG6C
Looking at the pastebin, it seems FileMaker is terminating the column headers with \n and separating your records with \r. You need to normalize your line endings first.
cat result.txt | tr '\r' '\n' | sort
I think the problem is just the line endings. The ^M characters are carriage returns. UNIX tools generally expect newlines, and no carriage returns. Try the answers to this question or try running mac2unix if you have it.
Try
sort -k1.1,1.2 result.txt > sortedresult.txt
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
You should try simply:
cat file1.txt file2.txt file3.txt | sort > result.txt
Using -k 1.1,1.1 on its own makes little difference here: lines whose first characters are equal are then compared as whole lines, so the result matches a plain sort.
To make the sort stable, that is, to keep entries that share the same first character in their original relative order, add the -s switch to -k 1.1,1.1.
cat file1.txt file2.txt file3.txt | sort -s -k 1.1,1.1 > result.txt
I think this is the solution you need.
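The line-ending fix and the stable first-character sort from the answers above can be combined. This is a rough sketch, assuming records are separated by \r as in the pastebin; the function name sort_by_first_char and the demo data are made up, and LC_ALL=C is added only to make the ordering predictable.

```shell
# Hypothetical helper: normalise CR line endings in the file named in $1,
# drop empty lines, then sort stably on the first character only.
sort_by_first_char() {
    tr '\r' '\n' < "$1" | sed '/^$/d' | LC_ALL=C sort -s -k 1.1,1.1
}

# Demo on a temporary file whose records are separated by carriage returns.
demo=$(mktemp)
printf 'B657-001\rA118-002\rA025-001\r' > "$demo"
sort_by_first_char "$demo"
rm -f "$demo"
```

With -s, the two A lines keep their input order (A118-002 before A025-001), which shows that only the first character is being compared.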
