Weird txt behavior

I have a CentOS server. I cloned a GitHub repository, and there is a .txt file in that repository that contains one line. For some reason, this happens:
[root@0-0-0-0 Some]# cat some.txt
some text[root@0-0-0-0 Some]#
Also, while read i; do echo "$i"; done < some.txt doesn't see that line. What could cause this, and how can I avoid it? If I edit the file with vim, adding a new line and then deleting that new line (so it still contains only one line), it starts to work properly.

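The text file has no newline character at the end of it. Some programs will treat it as a valid text file whose last line doesn't happen to end in a newline. Others will treat it as invalid, and may ignore the last line (which isn't considered a "line" because it's not terminated as one). bash's built-in read command behaves like the second group, at least by default: it does store the unterminated text in the variable, but it returns a non-zero status because it hit end-of-file, so a while read loop exits before its body runs for that last line.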
vim's default behavior is to quietly add a newline to the end of a file if you modify and save it.
You can add a newline to a file that lacks one by editing it with vim (or another editor that behaves similarly), or by adding it from the shell:
echo '' >> some.txt
In general, it's a good idea to ensure that text files end in a newline character in the first place, at least if they're intended to be used on UNIX-like systems.
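If you can't control how the file was written, a minimal bash sketch of both workarounds follows, assuming bash and a tail that supports -c. The function names read_all_lines and ensure_trailing_newline are hypothetical helpers, not standard commands. The extra || [ -n "$line" ] test lets the loop body run once more for a final line that read stored but reported as a failure.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every line of a file, including a final line
# that has no trailing newline. read still fills "$line" at end-of-file but
# returns non-zero, so the [ -n "$line" ] check keeps that last line.
read_all_lines() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done < "$1"
}

# Hypothetical helper: append a newline only if the file is non-empty and its
# last byte is not already a newline. Command substitution strips a trailing
# newline, so $(tail -c 1 file) is empty exactly when the last byte is "\n".
ensure_trailing_newline() {
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}
```

For example, read_all_lines some.txt prints "some text" even before the file is fixed, and ensure_trailing_newline some.txt is safe to run repeatedly because it only adds a newline when one is missing.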

Related

Is there a way to use bash to get specific text content of a .eml?

Total noob here with both bash and .eml files, so bear with me...
I have a folder with many saved .eml files, and I want a bash script (if this is not possible with bash, I'm willing to use Python, zsh, or maybe Perl; I've never used Perl before, but it may be good to learn) that will print the email content after a line containing a specific phrase, up to the next empty line.
I also want this script to combine consecutive lines ending in "=". (Lines that do not end with an "=" sign should continue printing on a new line.)
All of my testing with .txt files that I create manually works fine, but when I use an actual .eml file, things stop working.
Here is a portion of a sample .eml file:
(.eml file continues above)
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
testing
StartLine (This is where stuff begins)
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo=
Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo L=
oooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loo=
oooooooooooooooooooooonnnnnnnnnggggg.
This is where things should stop (no more printing)
Don=92t print me please!
Don=92t print me please!
Don=92t print me please!
[This message is from an external sender.]
(.eml file continues below)
I want the script to output:
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loooooooooooooooooooooooonnnnnnnnnggggg.
Here is my script so far:
#!/bin/bash
files="/Users/username/Desktop/emails/*"
specifictext="StartLine"
for f in $files
do
    begin=false
    previous=""
    while read -r line
    do
        if [[ -z "$line" ]] #this doesn't seem to be working right
        then
            begin=false
        fi
        if [[ "$begin" = true ]]
        then
            if [[ "${line:0-1}" = "=" ]] #this also doesn't appear to be working
            then
                previous=$previous"${line::${#line}-1}"
            else
                echo $previous$line
            fi
        fi
        if [[ $line = "$specifictext"* ]]
        then
            begin=true
        fi
    done < "$f"
done
This successfully skips everything up to and including the line containing $specifictext, but then it prints out the entire remainder of each email instead of stopping at the next empty line. Like this:
$ ./printeml.sh
This is a line that should be printed.
This is a long line that should be printed. Soooooooooooooooooooooooooooooo=
Loooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo L=
oooooooooooooooooooooooonnnnnnnnnggggg. Soooooooooooooooooooooooooooooo Loo=
oooooooooooooooooooooonnnnnnnnnggggg.
This is where things should stop (no more printing)
Don=92t print me please!
Don=92t print me please!
Don=92t print me please!
[This message is from an external sender.]
(continues printing remainder of .eml)
As you can see above, the other issue I'm having is that I want to combine lines that end with "=" signs, but that is not working either. All the testing I do with test files works fine, except when I use an actual .eml file. I think this is an issue with hidden characters in .eml files, but I'm not really sure how that works.
I'm using bash version 3.2.57(1) on MacOS 12.4.

Both of your problems stem from the fact that the .eml file is using Windows line endings (really, MIME line endings; the specification is designed for transmission over the TELNET protocol and thus dictates the use of CRLF instead of bare LF). Bash doesn't understand those, and sees the carriage return as an ordinary character that happens to be the last character of every line. So the blank lines are really single-character lines containing a carriage return, and the lines ending in an = really end in = followed by a carriage return ($'=\r'). When you check the last character, you're getting the carriage return, which of course is never =.
But that's just part of the problem. You could convert the file to UNIX line endings (though it wouldn't be a valid .eml file at that point) or account for the CRs in your code. However, the trailing equal sign for continued lines is just one part of the "quoted-printable" encoding scheme that the Content-Transfer-Encoding header tells you the message body is using. Another thing you may run into is that Q-P messages cannot legally contain any characters outside the ASCII range, but must use =xx with two hex digits to represent such characters. Any Windows-1252 characters whose code point is > 127 will be replaced by =xx with the code in hexadecimal – as will any literal equal signs, which become =3D.
So you should ideally be using some library that understands MIME messages rather than trying to roll your own code to do bits and pieces of the decoding. Perhaps a Perl script using the MIME::Parser module would be appropriate? Or you could use the Python answers given to this question.
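If you do want to stay in bash and account for the CRs, the loop could look roughly like the sketch below. It is a minimal sketch assuming bash 3.2 (the version in the question); extract_block is a hypothetical function name. It strips the trailing CR from each line, stops at the first truly empty line, and joins soft line breaks, but it deliberately does not decode =XX escapes such as =92, which is exactly why a real MIME library is the better choice.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the lines after the first line that starts with
# $2, up to the next empty line, joining quoted-printable soft line breaks
# (a line ending in "="). Handles CRLF or LF endings; =XX escapes are NOT decoded.
extract_block() {
  local file=$1 marker=$2 begin=false joined="" line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}              # drop the CR of a CRLF line ending
    if [ "$begin" = true ]; then
      [ -z "$line" ] && break       # a genuinely empty line ends the block
      if [ "${line: -1}" = "=" ]; then
        joined+=${line%=}           # soft line break: keep accumulating
      else
        printf '%s\n' "$joined$line"
        joined=""
      fi
    elif [[ $line == "$marker"* ]]; then
      begin=true
    fi
  done < "$file"
}
```

Usage with the paths from the question: for f in /Users/username/Desktop/emails/*; do extract_block "$f" StartLine; done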

sort -o appends newline to end of file - why?

I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?
Example:
words.txt looks like
apple
cookie
salmon
I then run printf "\norange" >> words.txt; sort words.txt -o words.txt
I use printf rather than echo figuring that'll avoid the newline, but the file then reads
apple
cookie
orange
salmon
#newline here
If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file with no newline, i.e.:
apple
cookie
salmon
orange

This behavior is explicitly defined in the POSIX specification for sort:
The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.
This is because a UNIX "text file" is only valid if all of its lines end in newlines, as also defined in the POSIX standard:
Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
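A quick way to observe this behavior, assuming a POSIX sort and od; sorted.txt is just a scratch file name chosen for illustration:

```shell
# Give sort an input whose last line ("apple") has no terminating newline.
printf 'salmon\napple' | sort > sorted.txt

cat sorted.txt                    # apple, then salmon
tail -c 1 sorted.txt | od -An -c  # the final byte is \n: sort completed the line
```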

Think about what you are asking sort to do.
You are asking it "take all the lines, and sort them in order."
You've given it a file containing four lines, which it splits into the following strings:
"apple\n"
"cookie\n"
"salmon\n"
"orange"
It sorts these for you dutifully:
"apple\n"
"cookie\n"
"orange"
"salmon\n"
And it then outputs them as a single string:
"apple
cookie
orangesalmon
"
That is almost certainly exactly what you do not want.
So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:
"apple\n"
"cookie\n"
"orange\n"
"salmon\n"
So when it then outputs them as a single string, it looks a lot better:
"apple
cookie
orange
salmon
"
You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:
How can I remove the last character of a file in unix?
But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.
Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.
So newlines are line-terminators, not line-separators.
There is a coding style where prints and echos are done with the newline leading. This is wrong for many reasons, including creating malformed text files, and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.
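A minimal sketch of the terminator style applied to the question's word list, assuming a POSIX sort. It puts -o before the file operand, which is the order POSIX specifies (GNU sort also accepts the question's order), and POSIX allows the -o file to be one of the input files:

```shell
# Start from a list where every line, including the last, ends in a newline.
printf '%s\n' apple cookie salmon > words.txt

# Write the word first, then its terminating newline (not "\norange").
printf '%s\n' orange >> words.txt

# Sort in place; the output file may safely be the same as the input file.
sort -o words.txt words.txt
cat words.txt
```

Because every line already ends in a newline, sort has nothing to "fix", and the file looks the same before and after sorting apart from the order.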

Newlines in shell script variable not being replaced properly

Situation: Using a shell script (bash/ksh), there is a message that should be shown in the console log, and subsequently sent via email.
Problem: There are newline characters in the message.
Example below:
ErrMsg="File names must be unique. Please correct and rerun.
Duplicate names are listed below:
File 1.txt
File 1.txt
File 2.txt
File 2.txt
File 2.txt"
echo "${ErrMsg}"
# OK. After showing the message in the console log, send an email
Question: How can these newline characters be translated into HTML line breaks for the email?
Constraint: We must use HTML email. Downstream processes (such as Microsoft Outlook) are too inconsistent for anything else to be of use. Simple text email is usually a good choice, but off the table for this situation.
To be clear, the newlines do not need to be completely removed, but HTML line breaks must be inserted wherever there is a newline character.
This question is being asked because I have already attempted to use several commands, such as sed, tr, and awk with varying degrees of success.

TL;DR: The following snippet will do the job:
ErrMsg=`echo "$ErrMsg"|awk 1 ORS='<br/>'`
Just make sure there are double quotes around the variable when using echo.
This turned out to be a tricky situation. Some notes of explanation are below.
Using sed
Turns out, sed reads through input line by line, which makes finding and replacing those newlines somewhat outside the norm. There were several clever tricks that appeared to work, but I felt they were far too complicated to apply appropriately to this rather simple situation.
Using tr
According to this answer, the tr command should work. Unfortunately, tr only translates character by character: it maps each character in the first set to the character at the same position in the second set, so it cannot replace a newline with a multi-character string. I am limited to translating the newline into a space or some other single character.
For the following:
ErrMsg="Line 1
Line 2
"
ErrMsg=`printf '%s' "$ErrMsg" | tr '\n' 'BREAK'`
# You might expect:
# "Line 1BREAKLine 2BREAK"
# But instead you get:
# "Line 1BLine 2B"
echo "${ErrMsg}"
Using awk
Using awk according to this answer initially appeared to work, but due to some other circumstances with echo there was a subtle problem. The solution is noted in this forum.
You must have double quotes around your variable; otherwise the unquoted expansion is split into words and echo rejoins them with single spaces, so every newline is lost. (Of course, awk will still receive a newline at the end, because echo adds one after its output.)
This snippet is good: (line breaks in the middle are preserved and replaced correctly)
ErrMsg=`echo "$ErrMsg"|awk 1 ORS='<br/>'`
This snippet is bad: (word splitting turns the newlines into spaces, leaving only one line break, at the end)
ErrMsg=`echo $ErrMsg|awk 1 ORS='<br/>'`
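If you'd rather avoid the echo/awk pipeline entirely, bash parameter expansion can do the substitution without any external command. This is a swapped-in technique, not the one described above; it should also work in ksh93, which supports the same ${var//pattern/replacement} and $'\n' syntax, but that is an assumption I haven't tested here. Unlike awk 1 ORS='<br/>', it does not add a <br/> after the final line. HtmlMsg is a hypothetical variable name, and the message is a shortened copy of the one in the question.

```shell
ErrMsg="File names must be unique. Please correct and rerun.
Duplicate names are listed below:
File 1.txt
File 2.txt"

# Replace every newline character with an HTML line break.
# No quotes are needed on the right-hand side of an assignment.
HtmlMsg=${ErrMsg//$'\n'/<br/>}
printf '%s\n' "$HtmlMsg"
```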

You can wrap your message in HTML using <pre>, something like
<pre>
${ErrMsg}
and more.
</pre>
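One caveat with <pre>: if the message could ever contain <, > or &, the mail client will interpret them as HTML. A sketch that escapes those three characters with sed before building the body with a heredoc follows; the sample message, EscapedMsg and HtmlBody are hypothetical, and how the body is handed to your mail command is left out.

```shell
# Hypothetical message containing characters that are special in HTML.
ErrMsg='Names must match <a-z> & be unique.
File 1.txt'

# Escape & first, so the & in the later entities is not escaped again.
EscapedMsg=$(printf '%s\n' "$ErrMsg" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')

# The unquoted heredoc expands ${EscapedMsg} and keeps its newlines inside <pre>.
HtmlBody=$(cat <<EOF
<pre>
${EscapedMsg}
</pre>
EOF
)
printf '%s\n' "$HtmlBody"
```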

How to edit file content using zsh terminal?

I created an empty directory in zsh and added a file called hello.rb by doing the following:
echo 'Hello, world.' >hello.rb
If I want to make changes to this file using the terminal, what's the proper way of doing it without opening the file itself in, let's say, TextEditor?
I want to be able to make changes to the file hello.rb strictly by using my zsh terminal. Is this at all possible?

Zsh is not a terminal but a shell. The terminal is the window in which the shell executes. The shell is the text program that prompts you for commands and executes them.
If you want to edit the file within the terminal, then using vim, nano, emacs -nw or any other text-mode text editor will do it. They are not Zsh commands, but external commands that you can call from Zsh or from any other shell.
If you want to edit the file within Zsh itself, use zed. You will need to run the following once (for example, in ~/.zshrc):
autoload zed
and then you can edit hello.rb using:
zed hello.rb
(exit and save with Control-j)

You have already created and edited the file.
To edit it again, you can use >> to append to it.
For example
echo "\nAnd you too!\n" >> hello.rb
This edits the file by appending the additional string to it.
Of course, depending on your use and definition of "changing" a file, this is the simplest way to do so using the shell.
Normally, though, you would probably want to use a terminal editor.
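Note that echo "\n..." only produces newlines in shells whose echo interprets backslash escapes: zsh's echo does by default, but bash's builtin echo does not, so in bash the file would get a literal \n. A sketch of the same append using printf, which expands \n in its format string the same way in zsh, bash and sh:

```shell
# Recreate the file from the question.
printf '%s\n' 'Hello, world.' > hello.rb

# Append an empty line followed by the new text; printf expands \n itself.
printf '\n%s\n' 'And you too!' >> hello.rb

cat hello.rb
```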

Zed is a great answer, but to be even more stripped down (a level of editing that even a script can do): zsh can handle all 256 characters/byte values (including null) in variables. This means you can edit almost any kind of file data, line by line or chunk by chunk, directly from the command line. This is approximately what zed/vared does. If you have a current version with all the standard modules included, it is a great benefit to have zsh/mapfile or zsh/system loaded so that you can capture the characters that command substitution leaves out, such as trailing newlines (zed uses $(<$file) to read a file into memory). Here is an example of a way you could use this variable manipulation method:
% typeset -T Buffer buffer $'\n'
% typeset -T Edit edit $'\n'
It is most common to use newline to divide a text file one wishes to edit.
This handy feature will make zsh give you full access to one line or a range of lines at a time without unintentionally messing with the data.
% zmodload zsh/mapfile
% Buffer=$mapfile[path/to/file]
Here, I use the handy mapfile module because it loads the contents of a file byte-for-byte. Alternatively, you can use % Buffer="$(<path/to/file)", like zed does, but the trailing newlines will always be removed, and unwanted word splitting is possible with a typo or a different environment, so the simplicity of the module's method is best. When finished, save the changes by simply assigning the $Buffer value back to $mapfile[path/to/file], or use a more classic command like printf '%s' $Buffer >path/to/file (this writes the string exactly, byte-for-byte, so any newlines or formatting you added will be written).
You transfer the lines between Buffer and Edit using the tied arrays as follows. Remember, however, that in its simplest form, assigning one array to another drops elements that are completely empty (one \n \n two \n three becomes one \n two \n three). You can suppress this empty-element removal by quoting the input array and adding an '@' symbol to its index, "$buffer[@]", if using the whole array, or by adding the '@' flag if using a range of the array, "${(@)buffer[2,50]}". Preserving empty lines can be a bit troublesome to type, but these multiple arrays should only be needed in a script or function, since from the command line you can just edit one line at a time with buffer[54]="echo This is a newly written line."
% edit=($buffer[50,70])
...
% buffer[50,70]=($edit)
This is standard Zsh syntax, which means that in the ... area you can edit and manipulate the $edit array of lines or the $Edit scalar block of text all you want, including adding or removing lines. When you add the lines back into $buffer, they replace the specified block of lines (50-70), and the other array elements automatically shift to accommodate the reintegrated lines. Because the array adjusts dynamically like this, you can also insert a new line like this: buffer[40]=("new string as new line" "$buffer[40]"). This inserts it before the given index, while swapping the order of the elements ("$buffer[40]" "new string as new line") inserts the new line after the given index. Either way, all following elements, including totally empty ones, move to their current index plus one.
If you wanted to re-write the zed function to use this method in some complex way like: newzed /path/to/file [start-line] [end-line], that would be great and handy too.
Before I leave, I wanted to mention one thing about using vared directly: once you have these commands typed at the interactive terminal, you may find it frustrating that you can't use Enter to insert or append new lines. I found that with my terminal and Zsh version, ESC-ENTER worked well, but I don't know about older versions (if my memory is right, a Mac usually ships with a not-most-recent version). If that doesn't work, you may have to do some documentation digging to learn how to set up your ZLE (Zsh Line Editor, a component of Zsh) or get a newer version of Zsh.
Also, some other shells may index a scalar variable by byte, because in ASCII and C a byte is the same as a character, but Zsh supports UTF-8 and will index a scalar string by UTF-8 character unless you turn off the shell option multibyte (on by default). Turning it off helps if you need the old byte-as-character indexing when manipulating each line.
Finally, if you have a version of Zsh that for whatever reason was not compiled with zsh/mapfile or zsh/system, you can achieve a similar effect using a number of options to the read builtin, like <path/to/file |read -u 0 -k $[5 * 2**20] -r -s Buffer ||(($#Buffer)). As you can see here, you have to make the read length big enough to accommodate the file's size or it will leave off part of the file, and the read return code will nearly always be an error because it cannot read the full length requested. We fix this with ||(($#Buffer)), but this builtin was simply not meant to handle large-scale byte manipulation efficiently, so what you see is what you can get from it.

Programmatically delete all text between 2 characters in osx terminal

I have a thousand .txt files:
1.txt
2.txt
3.txt
In each file, I have several tags mixed into my text:
{somethinghere...blablabla} then the text I want to keep, then again {somethinghere...blablabla}
I'm not very practiced with the macOS command line. Can someone help me write a command that opens each file, parses it, and deletes all text enclosed between "{" and "}"?
To be clear:
First of all, I need to open each file, then parse the text. When the loop finds a "{", it starts deleting until it finds a "}". When it is done parsing, it saves and closes the file. That's what I need to do.

$ sed -i.bak -e 's#{[^}]*}##g' *.txt
-i.bak makes a backup copy of each modified file. If you don't want backups, use -i '' on macOS (with a space, so the empty extension is a separate argument; -i'' collapses to plain -i, and BSD sed would then take the next argument as the extension). On Linux, plain -i is enough.
In substitutions, the delimiter can be a character other than /; here I chose #, so: s#<REGEX>#<REPLACEMENT># (the basic form for substitutions is s///).
In the regex, we search for a literal {, then any characters except } with [^}]. * means 0 or more occurrences. Last, we search for the closing }, and we replace the matching part with nothing, so it deletes what matched.
The g modifier at the end means all matches on a line are replaced, not just the first one.
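Before running it on a thousand files, it may be worth trying the command on a scratch copy. This sketch assumes every {...} tag opens and closes on the same line: sed works line by line, so a tag spanning a line break is left untouched. The sample file contents are made up for illustration.

```shell
# Scratch directory with two sample files.
workdir=$(mktemp -d)
printf '%s\n' '{tag one} keep this {tag two} and this' > "$workdir/1.txt"
printf '%s\n' 'no tags here' > "$workdir/2.txt"

# -i.bak (suffix attached) works with both GNU sed (Linux) and BSD sed (macOS).
sed -i.bak -e 's#{[^}]*}##g' "$workdir"/*.txt

cat "$workdir/1.txt"   # " keep this  and this" (surrounding spaces remain)
ls "$workdir"          # 1.txt.bak and 2.txt.bak hold the originals
```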
