There is a shell script (bash) that checks a CSV file for lines that don't match a pattern and sends a mail with the wrong lines. That works fine, but when the wrong lines are combined, Linux gives a \r as the line break, and in the e-mail there is no line break. So I tried to send \r\n as the line break, but this has no effect: perl or bash deletes the \n newline.
Here is a minimal working example:
SUBJECT="Error while parse CSV"
TO="rcpt@domain.tld"
wrongLines=$(perl -ne 'print "Row $.: $_\r\n" if not /^00[1-9]\d{4,}$/' $file)
MESSAGE="Error while parse following Lines, pattern dont match: \r\n $wrongLines"
echo $MESSAGE |od -c
The output of od is:
0000000 E r r o r w h i l e p a r s
0000020 e f o l l o w i n g L i n e
0000040 s , p a t t e r n d o n t
0000060 m a t c h : \ r \ n R o w
0000100 2 : 4 9 2 7 8 3 8 7 4 3 \r R
0000120 o w 3 : 4 8 2 3 2 8 9 7 3 8
0000140 \r \n
0000143
But why is the \n between the rows deleted in the od output? I also tried \x0D\x0A instead of \r\n, but that didn't help either. Any suggestions?
Your problem is that you're not using quotes!
Look:
$ a="A multi-line
input
variable"
$ echo $a
A multi-line input variable
$ echo "$a"
A multi-line
input
variable
$
Without quotes, you'll be victim of word splitting and filename expansion (not illustrated in the example above).
Also, adding \r or \n (that is, verbatim backslash followed by r or n) is not going to help at all.
Conclusion: quote every variable expansion! Always! (Unless you really mean a glob pattern, in which case you should also add a comment in the code explaining why you purposely didn't quote the expansion.)
Side note: don't use upper case variable names!
It is recommended you use lower-case names for your own parameters so as not to confuse them with the all-uppercase variable names used by Bash internal variables and environment variables.
In dist_train.sh from mmdetection3d, what does ${@:3} do on the last line?
I can't understand its bash syntax.
#!/usr/bin/env bash
CONFIG=$1
GPUS=$2
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-29500}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch \
--nnodes=$NNODES \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--nproc_per_node=$GPUS \
--master_port=$PORT \
$(dirname "$0")/train.py \
$CONFIG \
--seed 0 \
--launcher pytorch ${@:3}
It is standard parameter expansion:
${parameter:offset}
${parameter:offset:length}
This is referred to as Substring Expansion. It expands to up to
length characters of the value of parameter starting at the character
specified by offset. If parameter is @, an indexed array
subscripted by @ or *, or an associative array name, the
results differ as described below. If length is omitted, it expands
to the substring of the value of parameter starting at the character
specified by offset and extending to the end of the value. length
and offset are arithmetic expressions (see Shell Arithmetic).
[...]
If parameter is @, the result is length positional parameters
beginning at offset. A negative offset is taken relative to one
greater than the greatest positional parameter, so an offset of -1
evaluates to the last positional parameter. It is an expansion error
if length evaluates to a number less than zero.
The following examples illustrate substring expansion using positional parameters:
$ set -- 1 2 3 4 5 6 7 8 9 0 a b c d e f g h
$ echo ${@:7}
7 8 9 0 a b c d e f g h
$ echo ${@:7:0}
$ echo ${@:7:2}
7 8
$ echo ${@:7:-2}
bash: -2: substring expression < 0
$ echo ${@: -7:2}
b c
$ echo ${@:0}
./bash 1 2 3 4 5 6 7 8 9 0 a b c d e f g h
$ echo ${@:0:2}
./bash 1
$ echo ${@: -7:0}
Per the Bash Hackers wiki on the Positional Parameters syntax, ${@:3} means every script argument starting at the third argument.
In other words, the ${@:3} syntax means "all arguments EXCEPT the first and second". A similar SO question exists from which you can infer the same conclusion.
A contrived example:
foo() {
echo "${@:3}"
}
foo a b c d e f g h i
# prints c d e f g h i
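A small sketch that makes the word boundaries visible: show_rest and the arguments passed to it are hypothetical, but the expansion is the same one dist_train.sh uses. Quoting "${@:3}" keeps an argument that contains a space as a single word.

```shell
#!/usr/bin/env bash
# Sketch: forward every argument after the first two, as dist_train.sh does.
show_rest() {
  # "${@:3}" expands to positional parameters 3 through $#, each as a separate word.
  printf '<%s>\n' "${@:3}"
}

# Hypothetical arguments: a config file, a GPU count, then extra options.
show_rest config.py 8 --opt "a b" --seed 1
```

This prints <--opt>, <a b>, <--seed> and <1> on separate lines. With the unquoted ${@:3} used in dist_train.sh, "a b" would be word-split into <a> and <b>.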
Great question.
In bash this is a form of parameter expansion. In this case the parameter is $@, which represents all the arguments received by the script (or function).
Using the colon : means that you want to expand only a subset of $@ (for an ordinary variable, the same syntax gives a substring).
So in this instance you're saying: give me all the incoming parameters, but start from the 3rd one.
Background
Just as a preemptive TL;DR: the tabs in my input file that happen to have one extra space (because they align to a certain position regardless of one extra space) end up being converted to a single space in the output file.
I have a .xyz file from which I need to remove a specific set of lines, and in which I also need to make some text replacements. I have a separate .txt file that contains a list of integers corresponding to the line numbers that need to be removed, and another for the lines that need replacing. The removal file is called atomremove.txt and looks as follows; the other file is structured similarly.
14
13
11
10
4
The xyz file from which I need to remove lines will look something like this.
24
Comment block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
B 17.99018 17.98940 2.24243
C 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
H 20.41328 14.02079 2.31959
H 22.06640 8.65013 2.27145
C 19.33725 17.20040 2.26894
H 13.96336 17.42048 2.19342
H 21.69450 3.68090 2.22196
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
C 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
My Code
I succeed in doing the line removal and the replacements, but the output is not as expected. Some of the tabs appear to be replaced with a single space, specifically on lines that have a 'y' coordinate with only 5 decimals. I am going to share the resulting output first, and then my code.
Here is the output
19
Comment Block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
Here is my code.
atomstorefile="./extract_internal/atomremove.txt"
atomchangefile="./extract_internal/atomchange.txt"
temp="temp.txt"
tempp="tempp.txt"
temppp="temppp.txt"
filestoreloc="./"$basefilename"_xyzoutputs/chops"
#get number of files in directory and set a loop for that # of files
numfiles=$( ls "./"$basefilename"_xyzoutputs/splits" | wc -l )
numfiles=$(( numfiles/2 ))
counter=1
while [ $counter -lt $(( numfiles + 1 )) ];
do
#set a loop for each split half
splithalf=1
while [ $splithalf -lt 3 ];
do
#storing the xyz file in a temp file for edits (non destructive)
cat ./"$basefilename"_xyzoutputs/splits/split"$splithalf"-geometry$counter.xyz > $temp
#changin specified atoms
while read line;
do
line=$(( line + 2 ))
sed -i "${line}s/C/H/" $temp
done < $atomchangefile
# removing specified atoms
while read line;
do
line=$(( line + 2 ))
sed -i "${line}d" $temp
done < $atomstorefile
remainatoms=$( wc -l $temp | awk '{print $1}' )
remainatoms=$(( remainatoms - 2 ))
tail -n $remainatoms $temp > $tempp
echo $remainatoms > "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
echo Comment Block >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
cat $tempp >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
splithalf=$(( splithalf + 1 ))
done
counter=$(( counter + 1 ))
done
I am sure the solution is simple. Any insight into what is causing this issue would be very appreciated.
Not sure what you are doing, but your file can be fixed with the column -t < filename command.
Example:
❯ cat test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯ column -t < test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯
The reason you wreck your whitespace is that you don't quote your strings. But a much better solution is to refactor this monumentally overcomplicated shell script into a simple sed or Awk script.
Assuming the line numbers all indicate line numbers in the original input file, try this.
tmp=$(mktemp -t atomtmpXXXXXXXXX) || exit
trap 'rm -f "$tmp"' ERR EXIT
( sed 's%$%s/C/H/%' extract_internal/atomchange.txt
sed 's%$%d%' extract_internal/atomremove.txt ) >"$tmp"
ls -l "$tmp"; nl "$tmp" # debugging
for file in "$basefilename"_xyzoutputs/splits/*; do
dst="$basefilename"_xyzoutputs/chops/${file#*/splits/}  # no space after =, or the path is run as a command
sed -f "$tmp" "$file" >"$dst"
done
This combines the two input files into a new sed script (remarkably, by way of sed); the debugging line lets you inspect the result (probably remove it once you understand how this works).
Your question doesn't really explain how the input files relate to the output files so I had to guess a bit. One of the important changes is to avoid sed -i when you are not modifying an existing file; but above all, definitely avoid repeatedly overwriting the same file with sed -i.
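For the Awk route mentioned above, here is a rough sketch. It assumes, as the question's loop does with line + 2, that each number in the list files counts atoms, so atom N sits on line N+2 of the .xyz file; chop_xyz is a hypothetical helper name. Because the kept lines are printed from $0 without touching individual fields, the original tabs and spaces are left alone, and the atom count on the first line is recomputed.

```shell
#!/usr/bin/env bash
# Sketch: do the C to H changes, the deletions and the new atom count in one awk pass.
# Assumption (taken from the question's loop): list entry N refers to atom N,
# which sits on line N+2 of the .xyz file, after the count and comment lines.
chop_xyz() {
  local change_list=$1 remove_list=$2 xyz=$3
  awk '
    FILENAME == ARGV[1] { change[$1 + 2] = 1; next }  # atoms to turn from C into H
    FILENAME == ARGV[2] { remove[$1 + 2] = 1; next }  # atoms to drop
    FNR == 1 { next }                                 # old atom count, recomputed in END
    FNR == 2 { comment = $0; next }                   # keep the comment line unchanged
    (FNR in remove) { next }
    (FNR in change) { sub(/C/, "H") }                 # same effect as sed s/C/H/ on that line
    { kept[++n] = $0 }                                # $0 is never split and rebuilt, so tabs survive
    END {
      print n + 0
      print comment
      for (i = 1; i <= n; i++) print kept[i]
    }
  ' "$change_list" "$remove_list" "$xyz"
}
```

It could replace the body of the loop above, e.g. chop_xyz extract_internal/atomchange.txt extract_internal/atomremove.txt "$file" >"$dst".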
Why are newlines unaffected by the following code?
echo "line 1" > /tmp/xxx
echo "line 2" >> /tmp/xxx
echo "line 3" >> /tmp/xxx
sed -e 's/\n/\000/g' /tmp/xxx | od -xc
results in:
0000000 696c 656e 3120 6c0a 6e69 2065 0a32 696c
l i n e 1 \n l i n e 2 \n l i
0000020 656e 3320 000a
n e 3 \n
0000025
Why are newlines unaffected by the following code?
Because the newline is never read into the pattern space: the newline character delimits lines and is not part of the line that was read.
From POSIX sed:
In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. [...]
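If the goal is to replace every newline with a NUL byte, a byte-oriented tool sidesteps the problem. This sketch swaps in tr instead of sed and feeds it the same three lines as the question; in the od -c output each \n should appear as \0.

```shell
#!/usr/bin/env bash
# Sketch: tr works on raw bytes, not lines, so it does see the newline characters
# that sed strips off before filling the pattern space.
printf 'line 1\nline 2\nline 3\n' | tr '\n' '\000' | od -c
```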
This is file.txt (without an end-of-line for the last line):
foo:bar:baz:qux:quux
one:two:tree:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
the quick brown fox jumps over the lazy dog
File read.sh
while read -r line
do
echo $line
done < file.txt
This is what I tried in the terminal:
./read.sh
Output:
foo:bar:baz:qux:quux
one:two:tree:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
Why doesn't read.sh show the last end of line like cat file.txt does?
Because there is no end-of-line at the end of file.txt. You can see this if you run:
$ od -c file.txt
0000000 f o o : b a r : b a z : q u x :
0000020 q u u x \n o n e : t w o : t r e
0000040 e : f o u r : f i v e : s i x :
0000060 s e v e n \n a l p h a : b e t a
0000100 : g a m m a : d e l t a : e p s
0000120 i l o n : z e t a : e t a : t e
0000140 t a : i o t a : k a p p a : l a
0000160 m b d a : m u \n t h e q u i c
0000200 k b r o w n f o x j u m p
0000220 s o v e r t h e l a z y
0000240 d o g
There is no \n at the end of the file.
echo, on the other hand, always appends a newline to the message it prints (unless told not to, e.g. with -n).
Other answers are right: there is simply no newline character at the end of your file.txt.
Most text editors will end a file with a newline automatically, even nano does that. But your file was generated by a script, right?
To reproduce this behavior all you have to do is:
echo -n 'hello world' >> file.txt
The -n flag tells echo not to output the trailing newline.
Also, if you want your read code to work, you can use this:
while read -r line
do
printf "%s\n" "$line"
done < file.txt
[[ -n $line ]] && printf '%s' "$line"
This works because read still places the last line into the variable, but it also returns false (a non-zero status), which ends the while loop; the [[ -n $line ]] test after the loop then prints that leftover line.
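Another common way to write the same thing, as a sketch (print_lines is a hypothetical helper name): adding || [[ -n $line ]] to the loop condition lets the body run once more for a final line that has no newline, and IFS= keeps leading and trailing whitespace intact. Unlike the version above, it also prints a newline after that last line.

```shell
#!/usr/bin/env bash
# Print every line of the file named in $1, including a final line that has no
# trailing newline. read returns false at EOF but still fills $line, so the
# "|| [[ -n $line ]]" test lets the loop body run once more for that last line.
print_lines() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "$line"
  done < "$1"
}

# usage: print_lines file.txt
```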
Your input file doesn't end in a newline.
cat file simply copies the file contents to standard output. It operates by characters, not lines, so it doesn't care if the file ends in a newline or not. But if it doesn't end in a newline, it won't add one to the output.
read -r line reads a line into the variable. It only reports success if the line ends in a newline. If the last line of the input doesn't end in a newline, it reports failure, as if EOF had been reached. So the loop terminates when it reads the last line, without running the body for that line. That's why the script never displays the line beginning with the quick brown fox.
In general, Unix text-file programs are only defined to work on text files that end in newline. Their treatment of the last line if it doesn't have a newline is not usually specified.
Your file.txt does not contain a newline at the end of the last line, hence cat does not output one either.
Note that read.sh does not display the last line at all: read is waiting for a complete line of input, and since the last line is not terminated by a newline, read returns failure and the loop body never runs for it.
I'm on OS X 10.6.8.
I'm having some issues sorting a text file by the first character.
I'm concatenating three files into one and need the final result sorted by the first alphabetical letter.
Each file has lines that look like this:
A025-001
A118-001
A118-002
B657-001
D316-001
So the file after concatenation via "cat" looks like this:
A025-001
....
A025-001 (where file 2 was appended)
....
A025-001 (where file 3 was appended)
I've tried "sort -k 1.1,1.1 result.txt > sortedresult.txt", along with a large number of other options from the man page: i, b, f, s (just guessing in the hope that I might hit the right one).
I need all the entries to be put next to each other:
A025-001
A025-001
B.......
B.......
D.......
Hopefully, someone more knowledgeable than I can help me solve this problem.
Thanks
Update: the data files themselves aren't working well with Unix tools. If I cat the results file, only a few of the many lines are shown. Opening them in vim shows a bunch of ^M characters. It seems as if sort is not going through the whole file.
There's a column header at the top, with fields in quotation marks, tab-separated, e.g. "Product" \t "Category" \t
The rest of the data is tab-separated but without quotation marks.
sample od -c:
0000000 " P r o d u c t N u m b e r "
0000020 \t " L o o k u p A t t r i b u
0000040 t e 1 G r o u p " \t " L o o
0000060 k u p A t t r i b u t e 1
0000100 N a m e " \t " L o o k u p A t
0000120 t r i b u t e 1 V a l u e "
0000140 \t " L o o k u p A t t r i b u
0000160 t e 1 V a l u e I m a g e
0000200 " \t " L o o k u p A t t r i b
Here's some of the data (not the column header):
0000660 " \n A 0 2 5 - 0 0 1 \t F a c e t
0000700 \t F a c e t C o l o r \t B l u e
0000720 \t C C D D D D \t O P T I O N \t \r
Does anyone know why it is doing this?
Update #2: The files were exported out of FileMaker as ASCII. You'll see a lot of extra tabs, just ignore those, once we get this figured out I'll sed them out. Here is the entire file along with a hexdump and od -c of the file: pastebin.com/UzaUgG6C
Looking at the pastebin, it seems FileMaker is terminating the column headers with \n and separating your records with \r. You need to normalize your line endings first.
tr '\r' '\n' < result.txt | sort
I think the problem is just the line endings. The ^M characters are carriage returns. UNIX tools generally expect newlines, and no carriage returns. Try the answers to this question or try running mac2unix if you have it.
Try
sort -k1.1,1.2 result.txt > sortedresult.txt
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
You should try simply:
cat file1.txt file2.txt file3.txt | sort > result.txt
Using -k 1.1,1.1 on its own will not make any difference, as there is only one field.
To make the sort stable, so that entries whose first characters are the same keep their relative order, you might use the -s switch together with -k 1.1,1.1.
cat file1.txt file2.txt file3.txt | sort -s -k 1.1,1.1 > result.txt
I think this is the solution you need.
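To see what -s changes, here is a small sketch with made-up lines whose first letters tie. Without -s, sort breaks ties by comparing whole lines; with -s that last-resort comparison is disabled, so lines with the same first character stay in input order.

```shell
#!/usr/bin/env bash
# Sketch: compare sort with and without -s when the key is only the first character.
# The four sample lines are made up so that ties on the first letter are easy to see.
sample=$'B2\nA9\nB1\nA1'

echo "without -s:"
sort -k 1.1,1.1 <<< "$sample"      # ties broken by whole-line comparison: A1 A9 B1 B2
echo "with -s:"
sort -s -k 1.1,1.1 <<< "$sample"   # ties keep input order: A9 A1 B2 B1
```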