how to send text to a process in a shell script? - shell

So I have a Linux program that runs in a while(true) loop, which waits for user input, process it and print result to stdout.
I want to write a shell script that open this program, feed it lines from a txt file, one line at a time and save the program output for each line to a file.
So I want to know if there is any command for:
- open a program
- send text to a process
- receive output from that program
Many thanks.

It sounds like you want something like this:
cat file | while read line; do
answer=$(echo "$line" | prog)
done
This will run a new instance of prog for each line. The line will be the standard input of prog and the output will be put in the variable answer for your script to further process.
Some people object to the "cat file |" as this creates a process where you don't really need one. You can also use file redirection by putting it after the done:
while read line; do
answer=$(echo "$line" | prog)
done < file

Have you looked at pipes and redirections ? You can use pipes to feed input from one program into another. You can use redirection to send contents of files to programs, and/or write output to files.

I assume you want a script written in bash.
To open a file you just need to type a name of it.
To send a text to a program you either pass it through | or with < (take input from file)
To receive output you use > to redirect output to some file or >> to redirect as well but append the results instead of truncating the file
To achieve what you want in bash, you could write:
#/bin/bash
cat input_file | xargs -l1 -i{} your_program {} >> output_file
This calls your_program for each line from input_file and appends results to output_file

Related

How to run a command through a text file with different number of arguments on each line

I have a text file similar to the one below (much longer). I'm trying to do a lookup for each of these IP addresses using the host command. Do you know how I could do this in the order of the text file (entire first line, then the second line, etc.)?
I tried using this, but it did not execute correctly:
while read in; do host "$in"; done < inputfile.txt > outputfile.txt
Input text file:
10.10.999.200 10.11.223.334 10.55.555.555
10.12.238.222 10.52.212.212
10.12.238.222 10.14.217.232
10.23.212.212 10.19.301.305 10.12.345.678
Set the spaces to newlines and pipe each IP to xargs to process.
tr ' ' '\n' < inputfile.txt | xargs -IX host X > outputfile.txt
I would do it this way:
cat file | while read line
do
echo "$line"
done
This way can see line by line. However, if your file is huge, it will take long time because every time you read file in shell, program open, read, close file. In such case you have to use AWK

Duplicate stdin to stdout

I am looking for a bash one-liner that duplicates stdin to stdout without interleaving. The only solution I have found so far is to use tee, but that does produced interleaved output. What do I mean by this:
If e.g. a file f reads
a
b
I would like to execute
cat f | HERE_BE_COMMAND
to obtain
a
b
a
b
If I use tee - as the command, the output typically looks something like
a
a
b
b
Any suggestions for a clean solution?
Clarification
The cat f command is just an example of where the input can come from. In reality, it is a command that can (should) only be executed once. I also want to refrain from using temporary files, as the processed data is sort of sensitive and temporary files are always error-prone when the executed command gets interrupted. Furthermore, I am not interested in a solution that involves additional scripts (as stated above, it should be a one-liner) or preparatory commands that need to be executed prior to the actual duplication command.
Solution 1:
<command_which_produces_output> | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
In this way, you're saving the content from the standard input in a (choose a better name please), and then echo'ing twice.
Notice $(</dev/stdin) is a similar but more efficient way to do $(cat /dev/stdin).
Solution 2:
Use tee in the following way:
<command_which_produces_output> | tee >(echo "$(</dev/stdin)")
Here, you're firstly writing to the standard output (that's what tee does), and also writing to a FIFO file created by process substitution:
>(echo "$(</dev/stdin)")
See for example the file it creates in my system:
$ echo >(echo "$(</dev/stdin)")
/dev/fd/63
Now, the echo "$(</dev/stdin)" part is just the way I found to firstly read the entire file before printing it. It echo'es the content read from the process substitution's standard input, but once all the input is read (not like cat that prints line by line).
Store the second input in a temp file.
cat f | tee /tmp/showlater
cat /tmp/showlater
rm /tmp/showlater
Update:
As shown in the comments (#j.a.) the solution above will need to be adjusted into the OP's real needs. Calling will be easier in a function and what do you want to do with errors in your initial commands and in the tee/cat/rm ?
I recommend tee /dev/stdout.
cat f | tee /dev/stdout
One possible solution I found is the following awk command:
awk '{d[NR] = $0} END {for (i=1;i<=NR;i++) print d[i]; for (i=1;i<=NR;i++) print d[i]}'
However, I feel there must be a more "canonical" way of doing this using.
a simple bash script ?
But this will store all the stdin, why not store the output to a file a read the file both if you need ?
full=""
while read line
do
echo "$line"
full="$full$line\n"
done
printf $full
The best way would be to store the output in a file and show it later on. Using tee has the advantage of showing the output as it comes:
if tmpfile=$(mktemp); then
commands | tee "$tmpfile"
cat "$tmpfile"
rm "$tmpfile"
else
echo "Error creating temporary file" >&2
exit 1
fi
If the amount of output is limited, you can do this:
output=$(commands); echo "$output$output"

using cat in a bash script is very slow

I have very big text files(~50,000) over which i have to do some text processing. Basically run multiple grep commands.
When i run it manually it returns in an instant , but when i do the same in a bash script - it takes a lot of time. What am i doing wrong in below bash script. I pass the names of files as command line arguments to script
Example Input data :
BUSINESS^GFR^GNevil
PERSONAL^GUK^GSheila
Output that should come in a file - BUSINESS^GFR^GNevil
It starts printing out the whole file on the terminal after quite some while. How do i suppress the same?
#!/bin/bash
cat $2 | grep BUSINESS
Do NOT use cat with program that can read file itself.
It slows thing down and you lose functionality:
grep BUSINESS test | grep '^GFR|^GDE'
Or you can do like this with awk
awk '/BUSINESS/ && /^GFR|^GDE/' test

How would I run a unix command in a loop with variable data?

I'd like to run a unix command in a loop, replacing a variable for each iteration and then store the output into a file.
I'll be grabbing the HTTP headers of a series of URL's using curl -I and then I want each instance outputted to a new line of a file.
I know
I could store the output with | cat or redirect it into a file with >, but how would I run the loop?
I have a file with a list of URL's one per line (or I could comma separate them, alternatively).
You can write:
while IFS= read -r url ; do
curl -I "$url"
done < urls-to-query.txt > retrieved-headers.txt
(using the built-in read command, which reads a line from standard input — in this case redirected from urls-to-query.txt — and saves it to a variable — in this case $url).
Given a list of URLs in a file:
http://url1.com
http://url2.com
You could run
cat inputfile | xargs curl -I >> outputfile
That would read each line of the input file and append the results for each row into the outputfile

Best way to modify a file when using pipes?

I often have shell programming tasks where I run into this pattern:
cat file | some_script > file
This is unsafe - cat may not have read in the entire file before some_script starts writing to it. I don't really want to write the result to a temporary file (its slow, and I don't want the added complication of thinking up a unique new name).
Perhaps, there is there is a standard shell command that will buffer a whole stream until EOF is reached? Something like:
cat file | bufferUntilEOF | script > file
Ideas?
Like many others, I like to use temporary files. I use the shell process-id as part of the temporary name so that if multiple copies of the script are running at the same time, they won't conflict. Finally, I then only overwrite the original file if the script succeeds (using boolean operator short-circuiting - it's a little dense but very nice for simple command lines). Putting that all together, it would look like:
some_script < file > smscrpt.$$ && mv smscrpt.$$ file
This will leave the temporary file if the command fails. If you want to clean up on error, you can change that to:
some_script < file > smscrpt.$$ && mv smscrpt.$$ file || rm smscrpt.$$
BTW, I got rid of the poor use of cat and replaced it with input redirection.
You're looking for sponge.
Using a temporary file is the correct solution here. When you use a redirection like '>', it is handled by the shell, and no matter how many commands are in your pipeline, the shell is free to delete and overwrite the output file before any command is executed (during pipeline setup).
Another option is just to read the file into a variable:
file_contents=$(cat file)
echo "$file_contents" | script1 | script2 > file
Using mktemp(1) or tempfile(1) saves you the expense of having to think up unique filename.
In response to the OP's question above about using sponge without external dependencies, and building on #D.Shawley's answer, you can have the effect of sponge with only a dependency on gawk, which is not uncommon on Unix or Unix-like systems:
cat foo | gawk -voutfn=foo '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}'
The check for NR>0 is to truncate the input file.
To use this in a shell script, change -voutfn=foo to -voutfn="$1" or whatever syntax your shell uses for filename arguments. For example:
#!/bin/bash
cat "$1" | gawk -voutfn="$1" '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}'
Note that, unlike real sponge, this may be limited to the size of RAM. sponge actually buffers in a temporary file if necessary.
Using a temporary file is IMO better than attempting to buffer the data in the pipeline.
It almost defeats the purpose of pipelines to buffer them.
I think you need to use mktemp. Something like this will work:
FILE=example-input.txt
TMP=`mktemp`
some_script <"$FILE" >"$TMP"
mv "$TMP" "$FILE"
I think that the best way is to use a temp file. However, if you want another approach, you can use something like awk to buffer up the input into memory before your application starts receiving input. The following script will buffer the all of the input into the lines array before it starts to output it to the next consumer in the pipeline.
{ lines[NR] = $0; }
END {
for (line_no=1; line_no<=NR; ++line_no) {
print lines[line_no];
}
}
You can collapse it into a one-liner if you want:
cat file | awk '{lines[NR]=$0;} END {for(i=1;i<=NR;++i) print lines[i];}' > file
With all of that, I would still recommend using a temporary file for the output and then overwriting the original file with it.

Resources