handle self-deleting console outputs in unix\shell\bash - bash

I want to download a huge file from s3 to a server so I'm using nohup.
Problem is, the process outputs "self-updating reports", which are cool in the terminal but are horrible when written to a file as a single long line.
My questions are:
How should I handle this output to avoid an 84MB text file of just one line?
Given that I have such a file, how can I read its "bottom line" effectively?
thanks!

The "self-updating" text is just a carriage return character, which when printed to the terminal moves the cursor back to the beginning of line, allowing you to overwrite any previous text on the line.
To remove everything up through the last carriage return character,
awk '{ sub(/.*\r/, ""); }1' file
or the same with any regex tool (sed 's/.*\r//' file would work if your sed recognizes the nonstandard \r escape).
Some tools have an option to turn off progress updates; if you are using the standard aws s3 cp, try adding the option --no-progress.
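For the second question (reading the "bottom line" of a file you already have), here is a minimal sketch. It assumes the last progress update fits within the final 4096 bytes of the log; the function name and the chunk size are illustrative, not part of any tool.

```shell
# Hypothetical helper: print only the most recent progress update from a log
# whose updates are separated by carriage returns.
last_update() {
    # Read just the tail of the file so an 84MB log is not scanned in full,
    # turn every carriage return into a newline, and keep the last
    # non-empty line.
    tail -c 4096 "$1" | tr '\r' '\n' | awk 'NF { last = $0 } END { print last }'
}
```

Usage would look like `last_update nohup.out`, assuming nohup.out is where the output went.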

Related

Why is there no such file or directory_profile?

I am using Windows and MobaXterm.
I created a .bash_profile file in the ~ directory and the following line
alias sbp="source ~/.bash_profile"
is the only code in that file.
However, when I was trying to do sbp, I got an error.
This works on my Mac and it used to work on my old Windows computer (but that one has some water damage so it broke down). Why does this not work now?
Thanks in advance!
From the way that error message is garbled I'm pretty sure that the .bash_profile file you created has DOS/Windows-style line endings, consisting of a carriage return character followed by a newline character. Unix tools expect unix-style line endings consisting of just a newline; if they see DOS/Windows-style endings, they'll treat the carriage return as part of the content of the line. In this case, bash will treat the carriage return as part of the alias definition, and therefore part of the filename to source. Try running alias sbp | cat -vt to print the alias with invisible characters shown; my guess is it'll print alias sbp='source ~/.bash_profile^M' (where the ^M is cat -vt's way of representing the carriage return).
Solution: convert the file to unix format, and either switch to a text editor that knows how to save in unix format, or change your settings in the current editor to do it. For conversion, there are a number of semi-standard tools like dos2unix and fromdos. If you don't have any of those, this answer has some other options.
BTW, the reason the error message is garbled is that the CR gets printed as part of the error message, and the terminal treats that as an instruction to go back to the beginning of the line; it then prints the rest of the message over top of the beginning of the message. It's a little like this:
-bash: /home/dir/path/.bash_profile
: No such file or directory
...but with the second line printed over the first, so it comes out as:
: No such file or directory_profile
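If neither dos2unix nor fromdos is installed, the check and the conversion can be sketched with standard tools. The helper names below are made up for illustration, and the in-place rewrite assumes mktemp is available.

```shell
# Hypothetical helper: succeed if FILE has at least one line ending in CR.
has_crlf() {
    cr=$(printf '\r')            # a literal carriage return
    grep -q "${cr}\$" "$1"
}

# Hypothetical helper: rewrite FILE in place with all carriage returns removed.
to_unix() {
    tmp=$(mktemp) || return 1
    # Write the cleaned copy to a temp file first, then copy it back with cat
    # so the original file keeps its permissions.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

For example: `has_crlf ~/.bash_profile && to_unix ~/.bash_profile`. The alias has to be redefined afterwards (for instance by opening a new shell), since the old definition still contains the carriage return.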

In shell script, colon (:) is being treated as an operator for variable creation

I have following snippet:
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
the output is:
:80ps://example.com
How can I escape the colon here? I also tried:
url="${host}\:${port}"
but it did not work.
Expected output is:
https://example.com:80
You've most likely run into what I call the Linefeed-Limbo.
If I copy the code you provided from StackOverflow and run it on my machine (bash version 4.4.19(1)), then it outputs correctly
user@host:~$ cat script.sh
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
user@host:~$ bash script.sh
https://example.com:80
What is Linefeed-Limbo?
Different operating systems use different ASCII symbols to represent when a new line occurs in a text, such as a script. This Wikipedia article gives a good introduction.
As you can see, Unix and Unix-like systems use the single character \n, also called a "Line Feed". Windows, as well as other systems, use \r\n, so a "carriage return" followed by a "line feed".
What happens is this: when you write a script on Windows in an editor such as Notepad, what gets saved is host="example.com"\r\n. When you copy this file to Linux, Linux interprets the \r as if it were part of the script, since only \n is considered a new line. And indeed, when I change my newline style to DOS-style, I get the exact output you get.
How can I fix this?
You have several options to fix this issue.
Converting the script (with dos2unix)
Since all you need to do is replace every instance of \r\n with \n, you could use any text-editing software you want. However, if you like simple solutions, then dos2unix (and its sister unix2dos) might be what you are looking for:
user@host:~$ dos2unix script.sh
dos2unix: converting file script.sh to Unix format...
That's it. Run your file now and you will see it behaves well.
Encoding the source-file correctly
By using a more advanced text editor such as Notepad++, you can define which style of newline you would like to use.
By changing the newline-type to whichever system you intend to run your script on, you will not run into any problems like this anymore.
Bonus round: Why does it output :80ps://example.com?
To understand why your output is like this, you have to look at what your script is doing, and what \r means.
Try thinking of your terminal as an old-fashioned typewriter. Returning the carriage means you start writing on the left again. Making a "new line" means sliding the paper. These two things are separate, and I think that's why some systems decided to use these two characters together as a logical "new line".
But I digress. Let's look at the first line, host="https://example.com"\r.
What this means when printed is "Print https://example.com, then put the carriage back at the start". When you then print :80\r, it doesn't start after ".com"; it starts at the beginning of the line, because that's where you (unknowingly) told the cursor to go. It then overwrites the first few characters, so ":80ps://example.com" ends up on the screen. Keep in mind that after 80 you again placed a carriage return, so any further text would overwrite the beginning of the line again.
It works for me. Try removing the carriage returns from the variables and then try again.
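To see the overwrite effect without a Windows-edited file, here is a small sketch that simulates it. The values are the ones from the question; the carriage returns are added by hand with printf.

```shell
# Simulate variables read from a script saved with CRLF line endings.
# $(...) strips trailing newlines only, so the carriage return is kept.
host=$(printf 'https://example.com\r')
port=$(printf '80\r')
url="${host}:${port}"

# cat -v shows each carriage return as ^M instead of letting the
# terminal act on it.
printf '%s\n' "$url" | cat -v
# prints: https://example.com^M:80^M
```

Printing the same variable with a plain echo in a terminal shows the garbled :80ps://example.com, because the terminal obeys each carriage return.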
new_host=$(echo "$host" | tr -d '\r')
new_port=$(echo "$port" | tr -d '\r')
new_url="${new_host}:${new_port}"

How to get rid of bash control characters by evaluating them?

I have an output file (namely a log from screen) containing several control characters. Inside the screen, I have programs running that use control characters to refresh certain lines (examples would be top or anything printing progress bars).
I would like to output a tail of this file using PHP. If I simply read in that file and echo its contents (either using PHP functions or by calling tail), the output is messy and much longer than these last lines, as it also includes things that have been overwritten. If I instead run tail on the command line, it returns just what I want because the terminal evaluates the control characters.
So my question is: Is there a way to evaluate the control characters, getting the output that a terminal would show me, in a way that I could then use elsewhere (e.g., write to a file)?
@5gon12eder's answer got rid of some control characters (thanks for that!) but it did not handle the carriage return part that was even more important to me.
I figured out that I could just delete anything from the beginning of a line to the last carriage return inside that line and simply keep everything after that, so here is my sed command accomplishing that:
sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g'
The output can then be further cleaned using @5gon12eder's answer:
cat screenlog.0 | sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g' | sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g'
Combined, this looks exactly like I wanted.
I'm not sure what you mean by “evaluating” the control characters but you could remove them easily.
Here is an example using sed but if you are already using PHP, its internal regex processing functionality seems more appropriate. The command
$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat
will dump the contents of file.dat to standard output with all ANSI escape sequences removed. (And I'm pretty sure that nothing else is removed except if your file contains invalid escape sequences in which case the operation is ill-defined anyway.)
Here is a little demo:
$ echo -e "This is\033[31m a \033[umessy \033[46mstring.\033[0m" > file.dat
$ cat file.dat
# The output of the above command is not shown to protect small children
# that might be browsing this site.
$ reset # your terminal
$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat
This is a messy string.
The less program has some more advanced logic built in to selectively replace some escape sequences. Read the man page for the relevant options.
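The two steps discussed in this thread (keep only the text after the last carriage return on each line, then strip ANSI escape sequences) can also be sketched as a single awk pass. This is an approximation of what a terminal would display, not a full terminal emulation: cursor-movement sequences are simply deleted rather than evaluated. The function name is made up for illustration.

```shell
# Hypothetical helper: approximate what a terminal would show for a log
# that contains carriage returns and ANSI escape sequences.
clean_log() {
    awk '{
        sub(/\r$/, "")                       # drop a trailing CR (CRLF line ending)
        sub(/.*\r/, "")                      # keep only the text after the last CR
        gsub(/\033\[[0-9?;]*[a-zA-Z]/, "")   # remove ANSI CSI escape sequences
        print
    }' "$@"
}
```

For example, `clean_log screenlog.0 | tail -n 20` would give the last 20 cleaned lines, which PHP could then read.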

Grep with windows, words created in text file without spaces

I am using grep to take all the four-letter words out of a dictionary text file and place them into a new text file.
This command works on Unix, but on Windows it does not.
I need one word per line; on Windows it gives me all the words, but piled together without spaces.
This is the grep command I'm using:
grep "^[a-z]\{4\}$" dictionaryfilename > outputfilename
I believe it has something to do with a difference in newline characters between Unix and Windows?
Anyway, I'm not sure how to fix this on Windows. Could someone please help?
Thanks a lot :)
You probably have a UNIX-formatted text file (newlines without carriage returns), which looks like one big line in Windows; grep just deals in whatever the system says is 'a line', so it has little to do with the problem.
Try converting the file from LF to CRLF and see if you get better results.
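One way to do the LF-to-CRLF conversion, if unix2dos is not at hand, is a short awk filter. This is a sketch; the function name is hypothetical.

```shell
# Hypothetical helper: write each input line with a CRLF ending.
to_crlf() {
    # Strip any CR already present so that running the filter twice
    # does not produce \r\r\n, then terminate the line with \r\n.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
```

For example, `to_crlf outputfilename > outputfilename.txt` would produce a copy that Windows editors show as one word per line (the second file name is only an example).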

sed does not work from a script file on Windows

I wrote a simple sed command like this:
s/==/EQU/
When I run it on the command line:
sed 's/==/EQU' filename
it works well and replaces '==' with 'EQU'. But when I put the command in a script file named replace.sed and run it this way:
sed -f replace.sed filename
there is an error that says:
sed: file replace.sed line 1: unknown option to 's'
Is there a problem with my script file replace.sed when it runs on Windows?
The unknown option is almost invariably a rogue character after the trailing / (which is missing from your command line version, by the way, so it should complain about an unterminated command).
Have a look at your replace.sed again. You may have a funny character at the end, which could include the ' if you forgot to delete it, or even a CTRL-M DOS-style line ending, though CygWin seems to handle this okay. You haven't specified which sed you're using (that may help).
Okay, based on your edit, it looks like one of my scattergun of suggestions was right :-) You had CTRL-M at the end of the line because of the CR/LF line endings:
At the end of each line in the *.sed file there was a 'CR\LF' pair, and that was the problem, but you cannot see it by default. I used Notepad to delete them manually, which fixed the problem. But I have not found a way to delete them automatically, or to avoid that newline style when editing a new text file in Windows.
You may want to get your hands on a more powerful editor like Notepad++ or gVim (my favourite) but, in fact, you do have a tool that can get rid of those characters :-) It's called sed.
sed 's/\015//g' replace.sed >replace2.sed
should get rid of all the CR characters from your file and give you a replace2.sed that you can use for your real job.
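Whether a given sed understands an escape such as \015 may vary between implementations. A variant that does not depend on sed's escape handling is sketched below: the shell builds a literal carriage return and passes it to sed. The function name is made up for illustration.

```shell
# Hypothetical helper: strip a carriage return from the end of each line.
strip_cr_eol() {
    cr=$(printf '\r')        # literal CR, so sed needs no escape support
    sed "s/${cr}\$//" "$@"   # the pattern sed sees is: CR followed by end of line
}
```

Used as `strip_cr_eol replace.sed > replace2.sed`, this gives the same kind of cleaned script file as the command above.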
