How to get rid of bash control characters by evaluating them? - bash

I have an output file (namely a log from screen) containing several control characters. Inside the screen, I have programs running that use control characters to refresh certain lines (examples would be top or anything printing progress bars).
I would like to output a tail of this file using PHP. If I simply read in that file and echo its contents (either using PHP functions or by calling tail), the output is messy and much longer than just these last lines, as it also includes things that have been overwritten. If I instead run tail on the command line, it returns just what I want because the terminal evaluates the control characters.
So my question is: Is there a way to evaluate the control characters, getting the output that a terminal would show me, in a way that I could then use elsewhere (e.g., write to a file)?

@5gon12eder's answer got rid of some control characters (thanks for that!) but it did not handle the carriage return part that was even more important to me.
I figured out that I could just delete anything from the beginning of a line to the last carriage return inside that line and simply keep everything after that, so here is my sed command accomplishing that:
sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g'
The output can then be further cleaned using @5gon12eder's answer:
cat screenlog.0 | sed 's/^.*\r\([^\r]\+\)\r\?$/\1\r/g' | sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g'
Combined, this looks exactly like I wanted.
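For reuse from PHP or another script, the two steps can be wrapped in one function. The following is a minimal sketch, not a terminal emulator: the name render_log is made up for illustration, and the CR and ESC characters are built with printf so the commands do not depend on sed understanding \r or \x1B. It keeps only the text after the last carriage return on each line, so a short update drawn over a longer one will not look exactly as it would in a terminal.

```shell
#!/bin/sh
# render_log is a hypothetical helper name, not part of any existing tool.
CR=$(printf '\r')
ESC=$(printf '\033')

# render_log [FILE...]: reads the named files, or standard input if none.
render_log() {
    # 1. drop a trailing carriage return (e.g. from CRLF line endings)
    # 2. drop everything up to and including the last remaining carriage return
    # 3. drop ANSI escape sequences of the form ESC [ ... letter
    sed -e "s/${CR}\$//" \
        -e "s/.*${CR}//" \
        -e "s/${ESC}\[[0-9?;]*[a-zA-Z]//g" "$@"
}

printf 'loading 10%%\rloading 50%%\r\033[32mdone\033[0m\r\n' | render_log    # prints: done
```

A call such as tail -n 20 screenlog.0 | render_log would then give the cleaned tail.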

I'm not sure what you mean by “evaluating” the control characters but you could remove them easily.
Here is an example using sed but if you are already using PHP, its internal regex processing functionality seems more appropriate. The command
$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat
will dump the contents of file.dat to standard output with all ANSI escape sequences removed. (And I'm pretty sure that nothing else is removed except if your file contains invalid escape sequences in which case the operation is ill-defined anyway.)
Here is a little demo:
$ echo -e "This is\033[31m a \033[umessy \033[46mstring.\033[0m" > file.dat
$ cat file.dat
# The output of the above command is not shown to protect small children
# that might be browsing this site.
$ reset # your terminal
$ sed 's,\x1B\[[0-9?;]*[a-zA-Z],,g' file.dat
This is a messy string.
The less program has some more advanced logic built in to selectively replace some escape sequences. Read the man page for the relevant options.

Related

handle self-deleting console outputs in unix\shell\bash

I want to download a huge file from s3 to a server so I'm using nohup.
Problem is, the process outputs "self-updating reports", which are cool in the terminal but are horrible when written to a file as a single long line.
My questions are:
How should I handle this output to avoid a 84MB text file of just one line?
Given that I have such a file, how can I read its "bottom line" effectively?
thanks!
The "self-updating" text is just a carriage return character, which when printed to the terminal moves the cursor back to the beginning of line, allowing you to overwrite any previous text on the line.
To remove everything up through the last carriage return character,
awk '{ sub(/.*\r/, ""); }1' file
or the same with any regex tool (sed 's/.*\r//' file would work if your sed recognizes the nonstandard \r escape).
Some tools have an option to turn off progress updates; if you are using the standard aws s3 cp, try adding the option --no-progress.
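For the second question, there is no need to read the whole 84MB line, since the current status sits at the very end of the file. A rough sketch, assuming the latest report fits in the last 512 bytes (an arbitrary window chosen here) and using last_report as a made-up function name:

```shell
#!/bin/sh
# last_report FILE: print the most recent carriage-return-delimited update.
last_report() {
    # Read only the end of the file, turn each CR into a newline,
    # and keep the final line.
    tail -c 512 "$1" | tr '\r' '\n' | tail -n 1
}

# Example: simulate a progress log that was written as a single line.
log=$(mktemp)
printf 'Completed 1 MiB\rCompleted 2 MiB\rCompleted 3 MiB\r' > "$log"
last_report "$log"    # prints: Completed 3 MiB
rm -f "$log"
```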

Pipe command in Bash

The pipe command shows its results properly. But when I try to use it with cat or >, it doesn't show the output.
I have tried running the command with different spacing, but it didn't help.
sort spiderman.txt | cat > superman.txt
sort spiderman.txt | > superman.txt
In the first command above, cat does not show any output (the contents of superman.txt are not displayed); however, if I run the cat command separately, it shows the contents.
In the second command, nothing happens to superman.txt.
Ideally, it should have replaced all the contents of superman.txt with the sorted contents of spiderman.txt, but nothing happens.
If you're trying simple output redirection you shouldn't pipe (|), just redirect (>):
sort spiderman.txt > superman.txt
If you want to show the content as well as redirect to a file - perhaps what you're looking for is tee?
sort spiderman.txt | tee superman.txt
Description:
The tee utility copies standard input to standard output, making a copy in zero or more files. The output is unbuffered.
> superman.txt (with no command) is processed as follows:
superman.txt is opened for writing and truncated.
The output redirection is removed from the current command.
Since there is nothing left, the empty command is treated as having run and exited successfully. Nothing actually reads from the pipe or writes to superman.txt.
cat is necessary as a command which does read from standard input and writes to standard output.
It sometimes seems a little odd to me that more shells don't provide a minimal built-in that simply copies input to output with no frills, to avoid otherwise having to fork and exec cat. (I should say "no" rather than "more", as I'm not aware of any shell that does. zsh might, if I bothered to search through the documentation to find it.)
(Some shells will optimize away an extra fork when processing a command line; bash is not one of them, though. It forks once to create a process for the write end of the pipe, then forks again to run cat. I believe ksh would simply exec cat directly instead of unnecessarily forking, in which case a built-in cat is less necessary.)
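The three variants discussed in these answers can be compared side by side. This sketch assumes a Bourne-style shell such as bash and uses a temporary directory with made-up file contents, so nothing in the current directory is touched.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'venom\nelectro\nmysterio\n' > "$workdir/spiderman.txt"

# Redirection with no command: the file is created (or truncated) and stays empty.
# "|| true" only guards against a non-zero status if sort is killed by SIGPIPE
# in a shell that runs with pipefail enabled.
sort "$workdir/spiderman.txt" | > "$workdir/empty.txt" || true

# Plain redirection: the sorted lines go to the file and nothing is displayed.
sort "$workdir/spiderman.txt" > "$workdir/superman.txt"

# tee: the sorted lines are displayed and also written to the file.
sort "$workdir/spiderman.txt" | tee "$workdir/tee.txt"
```

After running it, empty.txt has a size of zero, while superman.txt and tee.txt hold the same sorted lines.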

In shell script, colon (:) is being treated as an operator for variable creation

I have the following snippet:
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
the output is:
:80ps://example.com
How can I escape the colon here? I also tried:
url="${host}\:${port}"
but it did not work.
Expected output is:
https://example.com:80
You've most likely run into what I call the Linefeed-Limbo.
If I copy the code you provided from StackOverflow and run it on my machine (bash version 4.4.19(1)), then it outputs correctly
user@host:~$ cat script.sh
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
user@host:~$ bash script.sh
https://example.com:80
What is Linefeed-Limbo?
Different operating systems use different ASCII symbols to represent when a new line occurs in a text, such as a script. This Wikipedia article gives a good introduction.
As you can see, Unix and Unix-like systems use the single character \n, also called a "Line Feed". Windows, as well as other systems, use \r\n, so a "carriage return" followed by a "line feed".
What happens is that when you write a script on Windows in an editor such as Notepad, what you actually write is host="example.com"\r\n. When you copy this file to Linux, Linux interprets the \r as part of the script, since only \n is considered a new line. And indeed, when I change my newline style to DOS-style, I get the exact output you get.
How can I fix this?
You have several options to fix this issue.
Converting the script (with dos2unix)
Since all you need to do is replace every instance of \r\n with \n, you could use any text-editing software you want. However, if you like simple solutions, then dos2unix (and its sister unix2dos) might be what you are looking for:
user@host:~$ dos2unix script.sh
dos2unix: converting file script.sh to Unix format...
That's it. Run your file now and you will see it behaves well.
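If dos2unix is not installed, the same conversion can be approximated with sed. This is a sketch under a few assumptions: the name strip_cr is invented here, the carriage return is generated with printf so the command does not rely on sed understanding \r, and the result is written to a new file instead of being edited in place.

```shell
#!/bin/sh
CR=$(printf '\r')

# strip_cr INPUT OUTPUT: remove one trailing carriage return from every line.
strip_cr() {
    sed "s/${CR}\$//" "$1" > "$2"
}

# Example: build a script with DOS line endings, convert it, and run it.
dir=$(mktemp -d)
printf 'host="https://example.com"\r\nport="80"\r\nurl="${host}:${port}"\r\necho $url\r\n' > "$dir/script.sh"
strip_cr "$dir/script.sh" "$dir/script.unix.sh"
sh "$dir/script.unix.sh"    # prints: https://example.com:80
```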
Encoding the source-file correctly
By using a more advanced text editor such as Notepad++, you can define which style of newline you would like to use.
By changing the newline-type to whichever system you intend to run your script on, you will not run into any problems like this anymore.
Bonus round: Why does it output :80ps://example.com?
To understand why your output is like this, you have to look at what your script is doing, and what \r means.
Try thinking of your terminal as an old-fashioned typewriter. Returning the carriage means you start writing on the left again. Making a "new line" means sliding the paper. These two things are separate, and I think that's why some systems decided to use these two characters as a logical "new line".
But I digress. Let's look at the first line, host="https://example.com"\r.
What this means when printed is "Print https://example.com, then put the carriage back at the start". When you then print :80\r, it doesn't start after ".com"; it starts at the beginning of the line, because that's where you (unknowingly) told the cursor to go. It then overwrites the first few characters, resulting in ":80ps://example.com" being displayed. Keep in mind that after 80, you again placed a carriage return symbol, so any new text you would have written ends up overwriting the beginning again.
It works for me. Try removing the carriage returns from the variables and then run it again.
new_host=$(echo "$host" | tr -d '\r')
new_port=$(echo "$port" | tr -d '\r')
new_url="${new_host}:${new_port}"
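The overwrite effect described above can be reproduced and inspected without a Windows editor. In this sketch the carriage returns are added on purpose, od -c is used to make them visible, and a parameter expansion strips them without spawning tr. The variable names are only illustrative.

```shell
#!/bin/sh
CR=$(printf '\r')

# Simulate values read from a script that was saved with DOS line endings.
host="https://example.com${CR}"
port="80${CR}"
url="${host}:${port}"

# od -c shows the otherwise invisible \r characters in the value.
printf '%s\n' "$url" | od -c

# ${var%"$CR"} removes a single trailing carriage return, if one is present.
clean_host=${host%"$CR"}
clean_port=${port%"$CR"}
clean_url="${clean_host}:${clean_port}"
printf '%s\n' "$clean_url"    # prints: https://example.com:80
```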

Bash - control external command's output

I'm writing a bash script to make DVD authoring more automated (but, mainly, so that I can learn some more bash scripting) and I'm trying to find out if it's possible to control how an external command presents its output.
For instance, the output from ffmpeg is a load of (to me) irrelevant cruft about options, libraries, streams, progress and so on.
What I really want is to be able to select for display only the lines with the input and output filenames and then to display the progress on the same line each time. Similarly for mkisofs and wodim.
I've tried Googling for this and am beginning to suspect that either it's not possible or nobody's thought of it before (or, possibly, that it's so obvious that nobody thinks it necessary to say how :-) ).
Many thanks, in advance,
David Shaw
You want to use grep and pipes. They are your friends. You want to pipe the output of the ffmpeg into grep and have it output only lines containing the text you want.
Assuming you have the input and output file names as command-line arguments $1 and $2 to your shell script, you might try something like
ffmpeg .... | grep "$1\|$2"
            ^         ^
            |         +-- escape and OR character
            +-- pipe character
The '\|' is an escape and an OR character for regular expressions. The OR '|' is also the pipe character so you have to escape that.
This will output only lines that contain the files you are looking for.
This assumes all output is via stdout. If ffmpeg is outputting text via stderr then you will need to add some redirects at the end of ffmpeg line to redirect those back to stdout.
EDIT: I used the wrong quotes in the first example. Use double quotes or it won't expand the parameters $1 and $2
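The stderr case can be sketched as follows. Because ffmpeg may not be installed everywhere, the example uses fake_encoder, a hypothetical stand-in whose messages are invented and only loosely resemble encoder output; filter_output is likewise a made-up name. The sketch also assumes the progress line is redrawn with carriage returns, so these are turned into newlines before grep sees them, and it matches the file names as fixed strings rather than as regular expressions.

```shell
#!/bin/sh
# Hypothetical stand-in for a chatty encoder: it writes everything to stderr
# and redraws its progress line with carriage returns.
fake_encoder() {
    {
        printf 'built with some compiler\n'
        printf "Input #0, from '%s':\n" "$1"
        printf "Output #0, to '%s':\n" "$2"
        printf 'frame=  10\rframe=  20\rframe=  30\n'
    } >&2
}

# filter_output IN OUT: keep only the lines that mention either file name.
filter_output() {
    tr '\r' '\n' | grep -F -e "$1" -e "$2"
}

# 2>&1 sends stderr into the pipe so that grep can see it.
fake_encoder in.mpg out.vob 2>&1 | filter_output in.mpg out.vob
```

With the real tool, the fake_encoder call would be replaced by the actual command line, keeping the 2>&1 before the pipe.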

Is there a script I can run to remove all the hard (carriage) returns in a .txt file?

I have a .txt file (Mac OS X Snow Leopard) that has a lot of text. At the end of a paragraph, there is a hard return that moves the next paragraph onto another line. This is causing some issues with what I want to do to get the content into my db, so I am wondering if there is any way I can remove the hard returns. Is there some sort of script I can run? I am really hoping I don't have to go through and manually take the hard returns out.
To recap, here is what it looks like now:
This is some text. Text is what this is.
And then this is the next paragraph that is on a different line.
And this is what I would like to get to:
This is some text. Text is what this is. And then this is the next paragraph that is on a different line.
For all several thousand lines in my .txt file.
Thanks!
EDIT:
The text I am dealing with in my txt file is actually HTML:
 <span class="text">1 </span> THis is where my text is<br/>
And when I run the cat command in terminal like mentioned below, only the first is there. Everything else is missing...
In a terminal:
cat myfile.txt | tr -d '\r' > file2.txt
There's probably a more efficient way to do this, since the "tr -d '\r'" is the active ingredient, but that's the idea.
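A version without the extra cat process reads the file through input redirection. Since deleting the returns glues the last word of one paragraph to the first word of the next, this sketch also shows a variant that turns each return into a space. It assumes the file uses bare carriage returns as line endings; the function names are invented for illustration.

```shell
#!/bin/sh
# delete_cr FILE: same effect as the cat | tr pipeline, minus the cat.
delete_cr() {
    tr -d '\r' < "$1"
}

# cr_to_space FILE: replace each carriage return with a space instead.
cr_to_space() {
    tr '\r' ' ' < "$1"
}

# Example with a small sample file that uses bare carriage returns.
sample=$(mktemp)
printf 'This is some text.\rAnd then the next paragraph.\r' > "$sample"
cr_to_space "$sample" > "$sample.joined"
```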
I normally just use an editor with good Regular Expression support. TextWrangler is great.
An end of line in TextWrangler is \r, so to remove it, just search for \r and replace it with a space. TBH, I always wondered how it handles CRLF-encoded files, but somehow it works.
I believe you can do this with AppleScript. Unfortunately, I'm not familiar with it; however, the following should help you accomplish this (it's for a different problem, but it will lead you in the right direction): http://macscripter.net/viewtopic.php?id=18762
Alternatively if you didn't want to do this with Applescript and have Excel installed (or access to a machine with it) then the following should help: http://www.mrexcel.com/forum/showthread.php?t=474054
In a Linux terminal, cat file.txt | tr -d "\r\n" > newfile.txt will do. Modify the \r\n part to remove the desired characters.
