Remove carriage return end of variable - bash

I'm getting really strange output from this program. What is the carriage return doing, and how do I remove it? Why is the closing single quote missing from the end, and why is the letter "T" missing? How can I write code to correct this?
Code I'm using:
#!/bin/bash
export DATABASE_LIST="/opt/halogen/crontab/etc/db_stat_list.cfg"
export v3=""
while read -r USERID ORACLE_SID2
do
v3="This is '${ORACLE_SID2}' "
echo $v3
done < <(tac $DATABASE_LIST)
Output:
'his is 'OT1SL80
'his is 'OT1SL010
The file I'm reading from is not corrupt and is a small one with two lines:
[oracle@ot1sldbm001v test2]$ cat /opt/halogen/crontab/etc/db_stat_list.cfg
asp_dba/dba OT1SL010
asp_dba/dba OT1SL80
Thank you

Your DATABASE_LIST file is in DOS/Windows format, with carriage return + linefeed at the end of each line. Unix uses just linefeed as a line terminator, so unix tools treat the carriage return as part of the content of the line. You can keep this from being a problem by telling the read command to treat the carriage return as whitespace (like spaces, tabs, etc), since read automatically removes whitespace from the beginning and end of lines:
...
while IFS="$IFS"$'\r' read -r USERID ORACLE_SID2
...
Note that since this assignment to IFS (which basically lists the whitespace characters) is a prefix to the read command, it only applies to that one command and doesn't have to be set back to normal afterward.
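As a minimal sketch of that fix, the snippet below builds a throwaway CRLF file and reads it back with the carriage return added to IFS for the read command only. The sample data and the function name read_sids are made up for illustration; the carriage return is produced with printf so the sketch does not depend on bash's $'\r' quoting, which is equivalent.

```bash
#!/bin/bash
# Sketch only: the sample data and the function name read_sids are hypothetical.
cr=$(printf '\r')   # a literal carriage return; same as $'\r' in bash

read_sids() {
    # $1: path of a "USERID SID" list that may have CRLF line endings.
    # CR is added to IFS for the read command only, so read treats it as a
    # field delimiter and drops it from the end of the line.
    while IFS="$IFS$cr" read -r userid sid; do
        printf "This is '%s'\n" "$sid"
    done < "$1"
}

cfg=$(mktemp)
printf 'asp_dba/dba OT1SL010\r\nasp_dba/dba OT1SL80\r\n' > "$cfg"
read_sids "$cfg"
rm -f "$cfg"
```

With the carriage return gone, the closing quote stays at the end of each line instead of being drawn over the first column.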

Is there a way to use bash to get specific text content of a .eml?

Total noob here with both bash and working with .eml files, so bear with me...
I have a folder with many saved .eml files, and I want a bash script (if this is not possible with bash, I'm willing to use python, or zsh, or maybe perl--never used perl before, but it may be good to learn) that will print the email content after a line containing a specific textual phrase, and before the next empty line.
I also want this script to combine consecutive lines ending in "=". (Lines that do not end with an "=" sign should continue printing on a new line.)
All of my testing with .txt files that I create manually works fine, but when I use an actual .eml file, things stop working.
Here is a portion of a sample .eml file:
(.eml file continues above)
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
testing
StartLine (This is where stuff begins)
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo=
Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo L=
oooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loo=
oooooooooooooooooooooonnnnnnnnnggggg.
This is where things should stop (no more printing)
Don=92t print me please!
Don=92t print me please!
Don=92t print me please!
[This message is from an external sender.]
(.eml file continues below)
I want the script to output:
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg.
Here is my script so far:
#!/bin/bash
files="/Users/username/Desktop/emails/*"
specifictext="StartLine"
for f in $files
do
begin=false
previous=""
while read -r line
do
if [[ -z "$line" ]] #this doesn't seem to be working right
then
begin=false
fi
if [[ "$begin" = true ]]
then
if [[ "${line:0-1}" = "=" ]] #this also doesn't appear to be working
then
previous=$previous"${line::${#line}-1}"
else
echo $previous$line
fi
fi
if [[ $line = "$specifictext"* ]]
then
begin=true
fi
done < "$f"
done
This will successfully skip everything up to and including the line containing $specifictext, but then it will print off the entire remainder of each email instead of stopping at the next empty line. Like this:
$ ./printeml.sh
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo=
Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo L=
oooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loo=
oooooooooooooooooooooonnnnnnnnnggggg.
This is where things should stop (no more printing)
Don=92t print me please!
Don=92t print me please!
Don=92t print me please!
[This message is from an external sender.]
(continues printing remainder of .eml)
As you can see above, the other issue I'm having is that I wanted to combine lines with "=" signs at the end, but that is not working. All the testing I do with test files works fine, except when I use an actual .eml file. I think this is an issue with hidden characters in .eml files, but I'm not really sure how that works.
I'm using bash version 3.2.57(1) on MacOS 12.4.
Both of your problems stem from the fact that the .eml file is using Windows line endings (really, MIME line endings; the specification is designed for transmission over the TELNET protocol and thus dictates the use of CRLF instead of bare LF). Bash doesn't understand those, and sees the carriage return as an ordinary character that happens to be the last character of every line. So the blank lines are really single-character lines containing a carriage return, and the lines ending in an = really end in = followed by a carriage return ($'=\r'). When you check the last character, you're getting the carriage return, which of course is never =.
But that's just part of the problem. You could convert the file to UNIX line endings (though it wouldn't be a valid .eml file at that point) or account for the CRs in your code. However, the trailing equal sign for continued lines is just one part of the "quoted-printable" encoding scheme that the Content-Transfer-Encoding header tells you the message body is using. Another thing you may run into is that Q-P messages cannot legally contain any characters outside the ASCII range, but must use =xx with two hex digits to represent such characters. Any Windows-1252 character whose code point is > 127 will be replaced by =xx with the code in hexadecimal, as will any literal equal sign, which becomes =3D. That is why the sample shows Don=92t: 0x92 is the Windows-1252 right single quotation mark.
So you should ideally be using some library that understands MIME messages rather than trying to roll your own code to do bits and pieces of the decoding. Perhaps a Perl script using the MIME::Parser module would be appropriate? Or you could use the Python answers given to this question.
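If you only need the "account for the CRs in your code" route, a rough sketch might look like the following. The function name extract_block and its marker parameter are hypothetical. It strips the trailing carriage return from every line and joins soft line breaks, but it does not decode =xx escapes, so it is no substitute for a real MIME parser.

```bash
#!/bin/bash
# Sketch only: extract_block and its marker argument are hypothetical names.
# Reads a message on stdin and prints the lines between the first line that
# starts with the marker and the next empty line, joining "=" soft breaks.
extract_block() {
    marker=$1
    begin=false
    previous=
    cr=$(printf '\r')
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}              # drop the CR of a CRLF line ending
        if [ "$begin" = true ]; then
            [ -z "$line" ] && break     # the first blank line ends the block
            case $line in
                *=) previous=$previous${line%=} ;;               # soft break: keep joining
                *)  printf '%s\n' "$previous$line"; previous= ;; # complete line
            esac
        fi
        case $line in
            "$marker"*) begin=true ;;   # start printing after the marker line
        esac
    done
}
```

It could be called as extract_block StartLine < message.eml, where message.eml stands for one of your files. It avoids anything newer than bash 3.2.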

sort -o appends newline to end of file - why?

I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?
Example:
words.txt looks like
apple
cookie
salmon
I then run printf "\norange" >> words.txt; sort words.txt -o words.txt
I use printf rather than echo figuring that'll avoid the newline, but the file then reads
apple
cookie
orange
salmon
#newline here
If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file with no newline, i.e.:
apple
cookie
salmon
orange
This behavior is explicitly defined in the POSIX specification for sort:
The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.
A UNIX "text file" is only valid if all lines end in newlines, as also defined in the POSIX standard:
Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
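One way to watch this happen is to look at the last byte of the file before and after sorting. This is only an illustrative sketch using a temporary file with the question's word list; the variable names are made up.

```bash
#!/bin/bash
# Sketch only: temporary file and variable names are hypothetical.
f=$(mktemp)
printf 'apple\ncookie\nsalmon\norange' > "$f"      # no newline after "orange"

before=$(tail -c 1 "$f" | od -An -c | tr -d ' ')   # last byte before sorting
sort -o "$f" "$f"
after=$(tail -c 1 "$f" | od -An -c | tr -d ' ')    # last byte after sorting

printf 'last byte before: %s\n' "$before"
printf 'last byte after:  %s\n' "$after"
rm -f "$f"
```

Before the sort the last byte is the "e" of orange; afterwards od reports \n, the newline that sort added to the incomplete last line.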
Think about what you are asking sort to do.
You are asking it "take all the lines, and sort them in order."
You've given it a file containing four lines, which it splits into the following strings:
"apple\n"
"cookie\n"
"salmon\n"
"orange"
It sorts these for you dutifully:
"apple\n"
"cookie\n"
"orange"
"salmon\n"
And it then outputs them as a single string:
"apple
cookie
orangesalmon
"
That is almost certainly exactly what you do not want.
So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:
"apple\n"
"cookie\n"
"orange\n"
"salmon\n"
So when it then outputs them as a single string, it looks a lot better:
"apple
cookie
orange
salmon
"
You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:
How can I remove the last character of a file in unix?
But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.
Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.
So newlines are line-terminators, not line-separators.
There is a coding style where prints and echos are done with the newline leading. This is wrong for many reasons, including creating malformed text files, and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.
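Put together, appending a word in the newline-terminated style could be sketched like this. The helper names ends_with_newline and add_word are hypothetical. The first one repairs a file that was left without a final newline, so the new word never gets glued onto the previous line.

```bash
#!/bin/bash
# Sketch only: ends_with_newline and add_word are hypothetical helper names.

ends_with_newline() {
    # Command substitution strips a trailing newline, so the result is empty
    # when the last byte is a newline (or when the file is empty).
    [ -z "$(tail -c 1 "$1")" ]
}

add_word() {
    # $1: word list file, $2: word to add
    ends_with_newline "$1" || printf '\n' >> "$1"   # complete a dangling last line
    printf '%s\n' "$2" >> "$1"                      # newline terminates the new line
    sort -o "$1" "$1"
}
```

For example, add_word words.txt orange leaves words.txt sorted, with every line, including the last, ending in a newline.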

In ksh, why does the last (empty) line of my multi-line string disappear when saving it in a variable?

While implementing a script, I am facing the following issue:
when putting the multi-line result of a command into a variable, the last (empty) line of my multi-line string seems to disappear.
This line is "empty", but I cannot afford to lose the newline it contains (because I am concatenating blocks of code saved in a DB and containing "\n" characters into a human-readable string. If I lose some of the "\n", I will lose part of my code indentation).
Here is the code to illustrate my issue :
test="A
B
";
test2=`echo "$test"`;
echo "||$test2||";
This returns
||A
B||
while I was expecting :
||A
B
||
--> the last (empty) line has disappeared... and a newline is thus missing in my human-readable code.
This issue only occurs when the last line of my multi-line string is empty...
Do you know why this last line disappears, and how I can ensure my last empty line is saved in my multi-line string variable?
Note that I can of course not use the easiest solution
test2="$test";
because the complete process is rather :
test="^A\n\nB\n^"
test2="`echo "$test" | sed -e 's/\^//g'`";
but I tried to simplify the issue the most I could.
Command substitutions always trim trailing newlines -- that's in accordance with design and specification. If you don't want that, you can append a fixed sigil character to your output and trim it, such that the newlines you want to preserve are before the sigil:
test="A
B
"
test_wip=$(printf '%sEND' "$test")
test2=${test_wip%END}
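The same idea works when you cannot change the command itself: print the sentinel with a second command inside the same substitution. The sketch below uses only POSIX shell syntax; the variable names plain and kept, and the single-character sentinel x, are arbitrary choices for illustration.

```bash
#!/bin/bash
# Sketch only: variable names and the sentinel "x" are arbitrary choices.
test='A
B
'
plain=$(printf '%s' "$test")             # trailing newline is stripped
kept=$(printf '%s' "$test"; printf x)    # sentinel "x" protects the newline
kept=${kept%x}                           # remove the sentinel again

printf '||%s||\n' "$plain"
printf '||%s||\n' "$kept"
```

The first printf shows B|| on one line, while the second puts the closing || on a line of its own, because the final newline survived.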
Instead of trying to work around the issues that arise from assigning the output from echo to a variable (eg, stripping of trailing \n's), consider using ksh's built in string processing in this case, eg:
$ test="^A\n\nB\n^"
$ test2="${test//^}"
$ echo "||${test2}||"
||A
B
||
//^ : remove all ^ characters

Bash script sourcing config file but can't use vars in arithmetic

This is killing me. I have a config file, "myconfig.cfg", with the following content:
SOME_VAR=2
echo "I LOVE THIS"
Then I have a script that I'm trying to run, that sources the config file in order to use the settings in there as variables. I can print them out fine, but when I try to put one into a numeric variable for use in something like a "seq " command, I get this weird "invalid arithmetic operator" error.
Here's the script:
#!/bin/bash
source ./myconfig.cfg
echo "SOME_VAR=${SOME_VAR}"
let someVarNum=${SOME_VAR}
echo "someVarNum=${someVarNum}"
And here's the output:
I LOVE THIS
SOME_VAR=2
")syntax error: invalid arithmetic operator (error token is "
someVarNum=
I've tried countless things that theoretically shouldn't make a difference, and, surprise, they don't. I simply can't figure it out. If I simply take the line "SOME_VAR=2" and put it directly into the script, everything's fine. I'm guessing I'll have to read in the config file line by line, split the strings by "=", and find+create the variables I want to use manually.
The error is precisely as indicated in a comment by @TomFenech. The first line (and possibly all the lines) in myconfig.cfg is terminated with a Windows CR-LF line ending. Bash considers CR to be an ordinary character (not whitespace), so it will set SOME_VAR to the two-character string 2CR. (CR is the character with hex code 0x0D. You could see that if you display the file with a hex-dumper: hd myconfig.cfg.)
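If hd is not available, od -c is a widely available alternative. The sketch below does not touch your file; it simulates the value that SOME_VAR ends up with and shows the hidden character.

```bash
#!/bin/bash
# Sketch only: simulates the value instead of reading the real config file.
SOME_VAR=$(printf '2\r')                # what sourcing a CRLF file produces
printf 'length: %s\n' "${#SOME_VAR}"    # two characters, not one
printf '%s' "$SOME_VAR" | od -c         # od -c displays the CR as \r
```

The same od -c call on the config file itself would show \r \n at the end of each line.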
The let command performs arithmetic on numbers. It also considers the CR to be an ordinary character, but it is neither a digit nor an operator so it complains. Unfortunately, it does not make any attempt to sanitize the display of the character in the error message, so the carriage return is displayed between the two " symbols. Consequently, the end of the error message overwrites the beginning.
Don't create Unix files with a Windows text editor. Or use a utility like dos2unix to fix them once you copy them to the Unix machine.
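If dos2unix is not installed, tr can stand in for it. The following is a sketch under that assumption: it checks a config file for carriage returns and sources a cleaned temporary copy. The helper name has_cr and the temporary files are made up for the example, and tr -d '\r' removes every CR in the file, not only those at line ends.

```bash
#!/bin/bash
# Sketch only: has_cr and the temporary files are hypothetical.
cr=$(printf '\r')
has_cr() { grep -q "$cr" "$1"; }        # true if the file contains any CR

cfg=$(mktemp)
printf 'SOME_VAR=2\r\n' > "$cfg"        # simulate a config saved with CRLF

if has_cr "$cfg"; then
    clean=$(mktemp)
    tr -d '\r' < "$cfg" > "$clean"      # strip all carriage returns
    . "$clean"
    rm -f "$clean"
else
    . "$cfg"
fi
rm -f "$cfg"

someVarNum=$((SOME_VAR + 0))            # arithmetic now sees a plain number
echo "someVarNum=$someVarNum"
```

Fixing the file once, at the point where it arrives on the Unix machine, is still the cleaner option.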

bash parameter expansion changes original string

I have inherited a shell script. One of the things it does is parse a list of filenames. For every filename in the list it runs the following command:
fs_item="`echo ${fs_item%/}`"
This command (apart from doing its job, which in this case, I think, is to remove everything after the last slash) replaces runs of spaces in the filename with one space:
in: aa    bbbb   ccc
out: aa bbbb ccc
From this point filename is broken.
So, the question is: can I somehow tell bash not to replace spaces?
Get rid of the backticks and the echo command. It is worse than useless in this situation because it adds nothing, and causes the problem you are trying to solve here.
fs_item="${fs_item%/}"
Is the echo really necessary? You could simply remove it:
fs_item="${fs_item%/}"
If your actual problem is something different, and you cannot get rid of the echo (or some other command invocation), adding some quotes should work:
fs_item="`echo \"${fs_item%/}\"`"
The spaces vanish when running the backticked echo command. The internal field separator includes the space character, so words separated by a sequence of one or more spaces will be passed as separate arguments to echo. Then, echo just prints its arguments separated by a single space.
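The three variants can be compared side by side. The sample value and variable names below are made up. Note in passing that ${fs_item%/} only removes one trailing slash, if there is one.

```bash
#!/bin/bash
# Sketch only: the sample value and variable names are hypothetical.
fs_item='aa   bbbb  ccc/'

collapsed=$(echo ${fs_item%/})     # unquoted: split into words, rejoined by echo
intact=${fs_item%/}                # plain expansion: no echo, no word splitting
quoted=$(echo "${fs_item%/}")      # quoted: echo receives one single argument

printf '[%s]\n' "$collapsed" "$intact" "$quoted"
```

Only the first variant loses the repeated spaces; the other two keep the name intact and just drop the trailing slash.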
Since we're on the internal field separator subject, changing the IFS should also work (but usually has other possibly undesirable effects elsewhere in your script):
IFS=$'\n'
This sets the internal field separator to the newline character. After this, the spaces are no longer considered to be separators for lists. The echo command will receive just one argument (unless you have file names with the newline character in them) and spaces will stay intact.
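If you do go the IFS route, one way to limit the side effects is to change IFS only inside the command substitution, which runs in a subshell. This is a sketch with a made-up sample value; the unquoted expansion is still subject to pathname expansion, so quoting remains the more robust fix.

```bash
#!/bin/bash
# Sketch only: the sample value and variable names are hypothetical.
fs_item='aa   bbbb  ccc/'
nl='
'                                        # a literal newline, like $'\n' in bash

# IFS is changed inside the substitution's subshell only.
result=$(IFS=$nl; echo ${fs_item%/})

printf '[%s]\n' "$result"
```

The IFS of the surrounding script is untouched after this line, so the rest of the script keeps splitting on spaces as before.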
Try setting IFS to something else, e.g. IFS=","
