Saving shell command output in python script to text file - terminal

I managed to call and use a terminal command in my Python script, and it works.
But now I am trying to save the 'output' result from this command into a text file, and I am getting errors.
This is my initial code, which works correctly:
import os
os.system("someCmds --proj ARCH --all")
The code that I used when I tried to save the outputs into a text file:
import os
import sys
sys.stdout=open("output.txt","w")
command = os.system("someCmds --proj ARCH --all")
print command
sys.stdout.close()
However, I got the following error: ValueError: I/O operation on closed file
I did close the file, since that is what I found suggested online. So where am I wrong?

The Python programming part of this is more appropriate for stackoverflow.com. However, there's a Unix-oriented component as well.
Every process has three well-known file descriptors. While they have names—stdin, stdout, and stderr—they are actually known by their file numbers, which are 0, 1, and 2 respectively. To run a program and capture its stdout—its file-number-1 output—you must connect that process's file-descriptor-1 to a file or pipe.
In the command-line shells, there is syntax for this:
prog > file
runs program prog, in a process whose stdout is connected to an open file descriptor that has been moved into descriptor-slot-1. Actually making all that happen at the system-call level is complicated:
1. Open the file. This results in an open file descriptor, whose number is probably higher than 2, since zero-through-two are your own stdin, stdout, and stderr.
2. Use the fork system call or one of its variants: this makes a clone of yourself.
3. In the clone (but not in the original), move the file descriptor from wherever it is now to file descriptor 1. Then close the original higher-numbered descriptor (and any others that you have open that you do not wish to pass on).
4. Use one of the exec family of calls to terminate yourself (the clone) but, in the process, replace everything with the program prog. This maintains all your open file descriptors, including the one pointing to the file or pipe you moved in step 3. Once the exec succeeds, you no longer exist and cannot do anything. (If the exec fails, report the failure and exit.)
5. Back in the original (parent) process, wait for the clone-child to exec and then finish, or to fail, or whatever happens.
Python being what it is, this multi-step sequence is all wrapped up for you, with fancy error checking, in the subprocess module. Instead of using os.system, you can use subprocess.Popen. It's designed to work with pipes—which are actually more difficult than files—so if you really want to redirect to a file, rather than simply reading the program's output through a pipe, you will want to open the file first, then invoke subprocess.Popen.
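A minimal sketch of that file-redirect approach, reusing the command from the question (the command name and arguments are assumed from the question, not verified): open the file in your own process first, then pass it as the stdout= argument so that its descriptor ends up in slot 1 of the child.

import subprocess

# Open the destination file ourselves, then hand it to the child as its stdout.
with open("output.txt", "w") as outfile:
    proc = subprocess.Popen(["someCmds", "--proj", "ARCH", "--all"], stdout=outfile)
    proc.wait()  # wait for the child to finish writing before the file is closed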
In any case, altering your Python process's sys.stdout is not helpful, as sys.stdout is a Python data structure quite independent of the underlying file descriptor system. By opening a Python stream, you do obtain a file descriptor (as well as a Python data structure), but it is not file-descriptor-number-1. The underlying file descriptor number, whatever it is, has to be moved to the slot-1 position after the fork call, in the clone. Even if you use subprocess.Popen, the only descriptor Python will move, post-fork, is the one you pass as the stdout= argument.
(Subprocess's Popen accepts any of these:
a Python stream: Python uses stream.fileno() to get the descriptor number, or
an integer: Python assumes that this represents an open file descriptor that should be moved to fd 0, 1, or 2 depending on whether this is an argument to stdin=, stdout=, or stderr=, or
the special values subprocess.PIPE or, for stderr=, subprocess.STDOUT: these tell it that Python should create a pipe, or re-use the previously-created stdout pipe for the special stderr=subprocess.STDOUT case.
The library is pretty fancy and knows how to report, with a Python traceback, a failure to exec, or various other failures that occur in the child. It does this using another auxiliary pipe with close-on-exec. EOF on this pipe means the exec succeeded; otherwise the data arriving on this extra pipe include the failure, converted to a byte stream using the pickle module.)
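For comparison, a minimal sketch of the pipe route (same assumed command as above): let Popen create the pipe, then read the child's output back into the script rather than into a file.

import subprocess

proc = subprocess.Popen(["someCmds", "--proj", "ARCH", "--all"],
                        stdout=subprocess.PIPE,    # Python creates a pipe for fd 1
                        stderr=subprocess.STDOUT)  # fold fd 2 into the same pipe
out, _ = proc.communicate()  # read until EOF, then reap the child
print(out.decode())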

Related

stdout and stderr are out-of-order when a bash script is run from Atom

I tried to write a little program that lists a nonexistent directory and then echoes done, in a .sh file:
#!/bin/bash
ls notexist
echo 'done'
But my console outputs done on the first line, before the error message for listing the nonexistent directory:
done
ls: notexist: No such file or directory
I don't think bash creates a thread automatically for each line of code, does it? I'm using terminal in macOS Big Sur.
Edit: I'm accessing terminal indirectly from the script package of the Atom text editor in macOS Big Sur. The error goes away if I run code directly in console via ./file.sh.
If we look at the source code to the Atom script plugin, the problem becomes clear:
It creates a BufferedProcess with separate stdout and stderr callbacks (using them, among other things, to determine whether any output has been written to each of these streams).
Implementing this requires stdout and stderr to be directed to different FIFOs. This means that, unlike a typical terminal (where both stdout and stderr share a single output channel, so content appears in exactly the order it was written), there is no strict guarantee that content will be processed through those callbacks in the same order it was written.
As a workaround, you can add exec 2>&1 to your script to put all content on stdout, or exec >&2 to put all content on stderr. Ideally, if the script plugin doesn't need to track the two streams separately, it would do this itself, and put a callback only on the single stream to which all content has been redirected.
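To see the mechanism outside of Atom, here is a hedged Python sketch (an illustration of the effect, not the plugin's actual code): with two separate pipes the reader has no way to recover the original interleaving, whereas folding stderr into stdout preserves it.

import subprocess

cmd = ["sh", "-c", "ls notexist; echo done"]

# Separate pipes: stdout and stderr arrive on two different channels,
# so the reader cannot tell which write happened first.
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

# Merged: both streams share one pipe, so the original ordering is preserved.
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
merged, _ = p.communicate()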

Why go programs print output to terminal screen but not /dev/stderr?

As I see in the source of golang,
the log package will by default print output to os.Stderr, which is
Stderr = NewFile(uintptr(syscall.Stderr), "/dev/stderr")
So why, when I run this program in my terminal with the command go run main.go,
is the output printed to the terminal screen and not to /dev/stderr?
// main.go
package main

import "log"

func main() {
	log.Println("this is my first log")
}
In standard Unix/Linux terminals, both stdout and stderr are connected to the terminal so the output goes there.
Here's a shell snippet to clarify this:
$ echo "joe" >> /dev/stderr
joe
Even though we echoed "joe" to something that looks like a file, it gets emitted to the screen. Replace /dev/stderr with /tmp/foo and you won't see the output on the screen (though it will be appended to the file /tmp/foo)
In Go you can specifically select which stream to output to by passing it to functions like fmt.Fprintf in its first argument.
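(A rough Python analogue of the same idea, purely for illustration: the caller picks the destination stream explicitly, and on a terminal both end up on the screen.)

import sys

print("normal output", file=sys.stdout)  # written to file descriptor 1
print("diagnostics", file=sys.stderr)    # written to file descriptor 2; both appear on a terminal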
Well, several things are going on here.
First, on a UNIX-like system (and you appear to be on a Linux-based one), the environment in which each user-space program runs includes the concept of the so-called "standard I/O streams": each program started by the OS and given control automatically has three file descriptors opened and available, representing the standard input stream, the standard output stream, and the standard error stream.
Second, typically (but not always) the spawned program inherits these streams from its parent program. For the case of an interactive shell running in a terminal (or a terminal emulator), that parent program is the shell, and so the standard I/O streams of the spawned program are inherited from the shell.
The shell's standard I/O streams, in turn, are naturally connected to the terminal it runs in: that's why it's possible to input data to the shell and read what it prints back. You actually type into the terminal, not into the shell; it's the terminal which delivers that data to the shell; the case for the shell's output is just the reverse.
Third, /dev/stderr is a Linux-specific "hack": a virtual device meaning "whatever my stderr is connected to".
That is, when a process opens that special file, it gets back a file descriptor connected to whatever the process' stderr is already connected to.
Fourth, let's grok the code example you cited:
NewFile(uintptr(syscall.Stderr), "/dev/stderr")
Here, a call to os.NewFile is made, receiving two arguments.
To cite its documentation:
$ go doc os.NewFile
func NewFile(fd uintptr, name string) *File
NewFile returns a new File with the given file descriptor and name. The returned value will be nil if fd is not a valid file descriptor.
<…>
OK, so this function takes a raw kernel-level file descriptor and the name of the file it is supposed to have been opened on.
That latter bit is crucial: the OS kernel itself is (almost) oblivious to what sort of stream a file descriptor actually represents, at least as far as its public API is concerned.
So, when NewFile is called by the log package to obtain an instance of *os.File for the program's standard error stream,
it does not open the file "/dev/stderr" (even though it exists);
it merely uses its name, since os.NewFile requires one.
It could have used "" there with much the same effect, except for error reporting: if something failed when using the resulting *os.File, the error output would not include the name "/dev/stderr".
The syscall.Stderr value is merely the number of the file descriptor connected to the standard error stream.
On UNIX-compatible kernels it's always 2; you can run go doc syscall.Stderr and see for yourself.
To recap,
The call NewFile(...) you referred to does not open any files;
it merely wraps an already open file descriptor, connected to the standard error stream of the current process, into a value of type *os.File, which is used throughout the os package for I/O on files (a rough Python analogue is sketched after this recap);
On Linux, the special virtual device file /dev/stderr does really exist but it has nothing to do with what's happening here.
When you run a program in an interactive shell without using any I/O redirection, the standard streams of the created process are connected to the same "sinks and sources" as those of the shell. And they, in turn, are most of the time connected to the terminal which hosts the shell.
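(A rough Python analogue of what NewFile is doing here, purely for illustration: wrap the descriptor that is already open instead of opening /dev/stderr.)

import os

# fd 2 is already open and connected to the standard error stream;
# wrap it in a file object without ever touching /dev/stderr.
err = os.fdopen(2, "w", closefd=False)
err.write("sent via the existing descriptor\n")
err.flush()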
Now I urge you to fetch an introductory book on the design of UNIX-like operating systems and read it.

Why won't an external executable run without manual input from the terminal command line?

I am currently writing a Python script that will pipe some RNA sequences (strings) into a UNIX executable, which, after processing them, will then send the output back into my Python script for further processing. I am doing this with the subprocess module.
However, in order for the executable to run, it must also have some additional arguments provided to it. Using the subprocess module, I have been trying to run:
import subprocess
seq= "acgtgagtag"
output= subprocess.Popen(["./DNAanalyzer", seq])
Despite having my environment variables set properly, the executables running without problem from the command line of the terminal, and the subprocess module functioning normally (e.g. subprocess.Popen(["ls"]) works just fine), the Unix executable prints out the same output:
Failed to open input file acgtgagtag.in
Requesting input manually.
There are a few other Unix executables in this package, and all of them behave the same way. I even tried to create a simple text file containing the sequence and specify it as the input in both the Python script as well as within the command line, but the executables only want manual input.
I have looked through the package's manual, but it does not mention why the executables can ostensibly be only run through the command line. Because I have limited experience with this module (and Python in general), can anybody indicate what the best approach to this problem would be?
Popen() is actually a constructor for an object, that object representing the child process which directly runs the executable. But because I didn't set a standard input or output (stdin and stdout), they default to None, which means the child simply inherits my script's own streams instead of being connected to it through pipes.
What I should have done is pass subprocess.PIPE to signify to the Popen object that I want to pipe input and output between my program and the process.
Additionally, the environment variables of the script (in the main shell) were not the same as the environment variables of the subshell, and these specific executables needed certain environment variables in order to function (in this case, it was the path to the parameter files in its package). This was done in the following fashion:
import subprocess as sb

seq = "acgtgagtag"
# The executable needs this variable; note that env= replaces the child's entire environment.
my_env = {"BIOPACKAGEPATH": "/Users/Bobmcbobson/Documents/Biopackage/"}
p = sb.Popen(['biopackage/bin/DNAanalyzer'],
             stdin=sb.PIPE, stdout=sb.PIPE, env=my_env)
strb = (seq + '\n').encode('utf-8')
data = p.communicate(input=strb)
After creating the Popen object, we send it a formatted input string using communicate(). The output can now be read, and further processed in whatever way in the script.
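For instance, a small follow-on sketch under the same assumptions: communicate() returns a (stdout, stderr) tuple of bytes, which can be decoded and parsed.

stdout_data, stderr_data = data
result = stdout_data.decode('utf-8')  # the analyzer's output, ready for further processing
print(result)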

Bash script - run process & send to background if good, or else

I need to start up a Golang web server and leave it running in the background from a bash script. If the script in question is syntactically correct (as it will be most of the time), this is simply a matter of issuing a
go run /path/to/index.go &
However, I have to allow for the possibility that index.go is somehow erroneous. I should explain that in Golang this can happen for something as "trivial" as importing a package that you then fail to use. In this case the go run /path/to/index.go bit will return an error message. In the terminal this would be something along the lines of
index.go:4:10: expected...
What I need to be able to do is to somehow change that command above so I can funnel any error messages into a file for examination at a later stage. I tried variants on go run /path/to/index.go >> errors.txt with the terminating & in different positions but to no avail.
I suspect that there is a bash way to do this by altering the priority of evaluation of the command via some judiciously used braces/brackets etc. However, that is way beyond my bash capabilities. I would be most obliged to anyone who might be able to help.
Update
A few minutes later... After a few more experiments I have found that this works
go run /path/to/index.go &> errors.txt &
Quite apart from the fact that I don't actually understand why it works, there remains the issue that it produces a 0-byte errors.txt file when the command runs to completion without Golang throwing up any error messages. Can someone shed light on what is going on and how it might be improved?
Taken from man bash.
Redirecting Standard Output and Standard Error
This construct allows both the standard output (file descriptor 1) and the standard error output (file descriptor 2) to be redirected to the file whose name is the expansion of word.
There are two formats for redirecting standard output and standard error:
&>word
and
>&word
Of the two forms, the first is preferred. This is semantically equivalent to
>word 2>&1
Appending Standard Output and Standard Error
This construct allows both the standard output (file descriptor 1) and the standard error output (file descriptor 2) to be appended to the file whose name is the expansion of word.
The format for appending standard output and standard error is:
&>>word
This is semantically equivalent to
>>word 2>&1
Narūnas K's answer covers why the &> redirection works.
The reason why the file is created anyway is because the shell creates the file before it even runs the command in question.
You can see this by trying no-such-command > file.out and seeing that even though the shell errors because no-such-command doesn't exist the file gets created (using &> on that test will get the shell's error in the file).
This is why you can't do things like sed 'pattern' file > file to edit a file in place.

When would piping work - does application have to adhere to some standard format? What is stdin and stdout in Unix?

I am using a program that allows me to do
echo "Something" | app outputilfe
But a similar program doesn't do that (and it's a bash script that runs java -jar internally).
Both works with
app input output
This leads me to this question: why do some programs support this and some don't?
I am basically trying to understand, in a larger sense, how programs inter-operate so fluently in *nix, the idea behind it, and what stdin and stdout are in simple layman's terms.
A simple way of writing a program that takes an input file and writes an output file is:
Write the code in such a manner that the first two positional arguments get interpreted as input and output strings, where the input should be a file that is available in the file system and the output is a string naming the file where it is going to write back the binary data.
But this is not how it is. It seems I can stream it. That's a real paradigm shift for me. I believe it's the file descriptor abstraction that makes it possible? That is, you normally write code to expect an FD as a positional argument and not the real file name strings? Which in turn means the output file gets opened and the fd is sent to the program once I execute the command in bash?
It can read from the terminal and send its display to the screen or to an application. What makes this possible? I think there is some concept of file descriptors that I am missing here.
Do applications 'talk' in terms of file descriptors and not file names as strings? In Unix everything is a file, and does that mean an FD is used?
A few other related reads:
http://en.wikipedia.org/wiki/Pipeline_(Unix)
What is a simple explanation for how pipes work in BASH?
confused about stdin, stdout and stderr?
Here's a very non-technical description of a relatively technical topic:
A file descriptor, in Unix parlance, is a small number that identifies a given file or file-like thingy. So let's talk about file-like-thingies in the Unix sense.
What's a Unix file-like-thingy? It's something that you can read from and/or write to. So standard files that live in a directory on your hard disk certainly can qualify as files. So can your terminal session – you can type into it, so it can be read, and you can read output printed on it. So can, for that matter, network sockets. So can (and we'll talk about this more) pipes.
In many cases, an application will read its data from one (or more) file descriptors, and write its results to one (or more) file descriptors. From the point of view of the core code of the application, it doesn't really care which file descriptors it's using, or what they're "hooked up" to. (Caveat: different file descriptors can be hooked up to file-like-thingies with different capabilities, like read-only-ness; I'm ignoring this deliberately for now.) So I might have a trivial program which looks like (ignoring error checking):
#include <unistd.h>   /* for read() and write() */

void zcrew_up_zpelling(int in_fd, int out_fd) {
    char c;
    while (read(in_fd, &c, 1) > 0) {
        if (c == 's') c = 'z';
        write(out_fd, &c, 1);
    }
}
Don't worry too much about what this code does (please!); instead, just notice that it's copying-and-modifying from one file descriptor to another.
So, what file descriptors are actually used here? Well, that's up to the code that calls zcrew_up_zpelling(). There are, however, some vague conventions. Many programs that need a single source of input default to using stdin as the file descriptor they'll read from; many programs that need a single source of output default to using stdout as the file descriptor they'll write to. Many of these programs provide ways to use a different file descriptor instead, often one hooked up to a named file.
Let's write a program like this:
#include <fcntl.h>    /* for open() and the O_* flags */

int main(int argc, char **argv) {
    int in_fd = 0;   // Descriptor of standard input
    int out_fd = 1;  // Descriptor of standard output
    if (argc >= 2) in_fd = open(argv[1], O_RDONLY);
    if (argc >= 3) out_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    zcrew_up_zpelling(in_fd, out_fd);
    return 0;
}
So, let's run our program:
./our_program
Hmm, it's waiting for input. We didn't pass any arguments, so it's just using stdin and stdout. What if we type "Using stdin and stdout"?
Uzing ztdin and ztdout
Interesting. Let's try something different. First, we create a file containing "Hello worlds" named, let's say, hello.txt.
./our_program hello.txt
What do we get?
Hello worldz
And one more run:
./our_program hello.txt output.txt
Our program returns immediately, but creates a file called output.txt containing... our output!
Deep breath. At this point, I'm hoping that I've successfully explained how a program is able to have behavior independent of the type of file-like-thingy hooked up to a file descriptor, and also to choose what file-like-thingy gets hooked up.
What about that pipe thing I mentioned? What about streaming? Why does it work when I say:
echo Tessting | ./our_program | grep -o z | wc -l
Well, each of these programs follows some form of the conventions above. our_program, as we know, by default reads from stdin and writes to stdout. grep does the same thing. wc by default reads from stdin, but by default writes to stdout -- it likes to live at the end of pipelines. And echo doesn't read from a file descriptor at all (it just reads arguments, like we did in main()), but writes to stdout, so likes to live at the front of streams.
How does this all work? Well, to get much deeper we have to talk about the shell. The shell is the program that starts other command-line programs, and it gets to choose what the file descriptors are already hooked up to when a program starts. Those magic numbers of 0 and 1 for stdin and stdout we used earlier? That's a Unix convention, and the shell hooks up a file-like-thingy to each of those file descriptors before starting your program. When the shell sees you asking for a pipeline by entering a command with | characters, it hooks the stdout of one program directly into the stdin of the next program, using a file-like-thingy called a pipe. A file-like-thingy pipe, just like a plumbing pipe, takes whatever is put in one end and puts it out the other.
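As a hedged illustration of what the shell does when it sees those | characters, here is roughly the same hookup done by hand with Python's subprocess module (program names taken from the pipeline above): each process's stdout pipe becomes the next process's stdin.

import subprocess

# echo Tessting | ./our_program | grep -o z | wc -l, wired up manually
p1 = subprocess.Popen(["echo", "Tessting"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["./our_program"], stdin=p1.stdout, stdout=subprocess.PIPE)
p3 = subprocess.Popen(["grep", "-o", "z"], stdin=p2.stdout, stdout=subprocess.PIPE)
p4 = subprocess.Popen(["wc", "-l"], stdin=p3.stdout, stdout=subprocess.PIPE)
# Close our copies of the intermediate pipe ends so each producer sees EOF/SIGPIPE correctly.
p1.stdout.close(); p2.stdout.close(); p3.stdout.close()
print(p4.communicate()[0].decode())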
So, we've combined three things:
Code that deals with file descriptors, without worrying about what they're hooked to
Conventions for default file descriptors to use for normal tasks
The shell's ability to set up a program's file descriptors to "pipe" to other programs'
Together, these give us the ability to write programs that "play nice" with streaming and pipelines, without each program having to understand where it sits in the pipeline and what's happening around it.

Resources