How to process lines read from standard input in a UNIX shell script? - shell

I am stuck on this problem:
I wrote a shell script that reads a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as input to another operation in the script, but I don't know how to store the file's name in a variable.
The script takes a file from stdin and then runs an awk operation on that same file. Say I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
I also found a way to write it like this:
Min=-1
while read line; do
    n=$(echo $line | awk -F$delim '{print NF}')
    if [ $Min -eq -1 ] || [ $n -lt $Min ]; then
        Min=$n
    fi
done
but it takes a very long time to run; it seems awk takes most of the time.
So how can I improve this?

/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So cat /dev/stdin will give you all of the input from your file, and you can avoid using the input filename at all.
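For example, a minimal sketch (the awk body is just the question's placeholder): anywhere the script expects a file name, /dev/stdin can stand in for it, so no filename variable is needed:
#!/bin/sh
# no "read file" needed; /dev/stdin already names the redirected input
awk '...' /dev/stdin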
Now, to answer the question :) Recursively follow the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r() {
    l=`readlink $1`
    if [ $? -ne 0 ]
    then
        echo $1
    else
        r $l
    fi
}
filename=`r /dev/stdin`
echo $filename
UPD:
On Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]

You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument; its input will then automatically be the wrapping script's stdin
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
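A minimal sketch of that all-awk version, reusing the minimum-field-count logic and the delim variable from the question (both are assumptions about what the full script really does):
#!/bin/sh
# awk reads the same stdin the script was given via ./script < filename
awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }'
Invoked exactly as before (./script < filename), awk inherits the redirected file as its standard input and makes a single pass over it.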
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it builds the tmpfile line by line, reopening it to append every time.

Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
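If you want to support both calling styles, one common pattern is to fall back to standard input when no argument is given (a sketch, not part of the answer above; it relies on /dev/stdin existing, as it does on Linux):
#!/bin/sh
# use the first argument if present, otherwise read the data from stdin
filename=${1:-/dev/stdin}
awk '...' < "$filename"
Then both ./script filename and ./script < filename work.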

How about invoking the script differently: pipe the contents of your file into your script as follows. The standard output of cat filename then becomes the standard input of your script, and in this case of the awk command inside it.
Say I have the file Names.data and the script showNames.sh; I execute them as follows:
cat Names.data | ./showNames.sh
Contents of the file Names.data:
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of the script showNames.sh:
#!/bin/bash
# whatever awk commands you need
awk '{ print }'

Well, I finally found this way to solve my problem, although it takes several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file, so that after reading from stdin the tmpfile is the same as the input file.

Related

shell script to execute command multiple times reading values from input file

I have an input file, input.txt, and I want to run a command that reads two values from it. Let us assume a source name and a destination name should be read from input.txt, and the same command should be iterated thousands of times based on input.txt.
Also, the output of the command for each pair is to be stored in a separate log. Is this possible with a single input file, or do we need to use two files for source and destination? Please provide a shell script to achieve this, as I am not good at shell scripting. I tried the below, which is not working.
while read i j; do
    command $i $j
done > output.txt
Sure. Suppose this is input.txt:
source1.txt dest1.txt
source2.txt dest2.txt
...
And you want to do this:
command source1.txt dest1.txt
command source2.txt dest2.txt
...
Here's a way:
while read i o; do
    command $i $o
done < input.txt
This assumes that the command command is already set up to read from its first argument and write to its second. If command instead prints to stdout (i.e., to the terminal screen), then replace command $i $o with command $i > $o. This also assumes that there are no spaces or funny characters in input.txt.
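A slightly more defensive variant (just a sketch): -r keeps backslashes literal and the quotes stop the shell from glob-expanding or re-splitting the two fields, although whitespace inside the first field still cannot be handled this way:
while read -r i o; do
    command "$i" "$o"
done < input.txt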
There is also a way that will be significantly faster if your input.txt has e.g. millions of lines or more:
awk '{printf "command %s\n", $0}' input.txt | sh
Or, if you must use command $i > $o:
awk '{printf "command %s > %s\n", $1, $2}' input.txt | sh
This method reads lines from input.txt and prints command source1.txt dest1.txt for the first line, command source2.txt dest2.txt for the second, etc... Then it "pipes" (|) those commands to sh, which executes them.
For error handling in command, try:
while read i o; do
    command $i $o || command2 $i $o >> command2.log
done < input.txt 2> error.log
Or:
done < input.txt > error.log 2>&1
(One of these will work better, depending on whether command and command2 print their errors to stdout(1) or stderr(2).)
Say you want different outputs in different files, then one log file per command and one error file per command:
while read i o; do
    command $i $o 2>"$i$o.err" >"$i$o.log"
done < input.txt
Error and log in the same file: stderr is redirected into the log along with stdout thanks to 2>&1 (note that the 2>&1 must come after the redirection to the file):
while read i o; do
    command $i $o >"$i$o.log" 2>&1
done < input.txt
You can also have everything in the same file output.log:
echo "" > output.log
while read i o; do
    command $i $o >> output.log 2>&1
done < input.txt

Save multiple variables from bash script to text file

I have written a simple bash script to count the number of lines in a collection of text files, and I store each line count in a variable using a for loop. I would like to print each variable to the same text file, so that I can access all the line counts at once from one file.
My code is:
for f in *Daily.txt; do
    lines=$(cat $f | wc -l);
    lines=$(($num_lines -1));
    echo $lines > /destdrive/linesTally2014.txt;
done
When I run this, the only output I receive is for the final file, not for all the other files.
If anyone could help me with this I would really appreciate it. I am new to bash scripting, so please excuse this novice question.
You create the file on each iteration. Move the I/O redirection after the done. Use:
for f in *Daily.txt
do
    echo $(( $(wc -l < $f) - 1))
done > /destdrive/linesTally2014.txt
This avoids the variable; if you have a need for it, you can use a fixed version of the original code (use $lines throughout, instead of using $num_lines once). Note that the code in the question has a UUoC (Useless Use of cat) that this version avoids.
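For reference, a fixed version of the original loop along those lines might look like this (a sketch that keeps the asker's subtract-one step and moves the redirection after done):
for f in *Daily.txt
do
    lines=$(wc -l < "$f")      # no cat needed
    lines=$((lines - 1))
    echo "$lines"
done > /destdrive/linesTally2014.txt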
You can avoid the loop with
wc -l *Daily.txt | awk '{ print $1 }' > /destdrive/linesTally2014.txt
or (when you want 1 less)
wc -l *Daily.txt | awk '{ print $1 - 1 }' > /destdrive/linesTally2014.txt
Note that with more than one file wc also prints a final "total" line, which you may want to filter out.
The above suggestions are probably better, but the problem you're having with your script is your use of the > for redirection, which overwrites the file. Use >> and it will append to the file.
echo $lines >> /destdrive/linesTally2014.txt

Passing input to sed, and sed info to a string

I have a list of files (~1000), one file per line, in a text file named files.txt.
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to suppress printing by default, then prints line number $i. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
There are plenty of ways you could do this by loading the file into an array as well, but those would take more memory and perform less well. I'd use the one-liners if you can. :)
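For completeness, the array approach mentioned above might look like this in bash 4 or later (a sketch using the same $i as the one-liners; not the recommended route):
mapfile -t lines < files.txt        # slurp the whole file into an array
printf '%s\n' "${lines[i-1]}"       # arrays are 0-based, so line $i is index i-1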
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read file
do
    ((i++))
    cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: the <<- form only strips leading tabs, so if you indent the here-document body or the EOF line, the indentation must be tabs, not spaces.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
    echo
    sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
    echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

pipe tail output into another script

I am trying to pipe the output of a tail command into another bash script to process:
tail -n +1 -f your_log_file | myscript.sh
However, when I run it, the $1 parameter (inside myscript.sh) never gets set. What am I missing? How do I pipe the output to be the input parameter of the script?
PS - I want tail to run forever and continue piping each individual line into the script.
Edit
For now the entire contents of myscript.sh are:
echo $1;
Generally, here is one way to handle standard input to a script:
#!/bin/bash
while read line; do
    echo $line
done
That is a very rough bash equivalent to cat. It does demonstrate a key fact: each command inside the script inherits its standard input from the shell, so you don't really need to do anything special to get access to the data coming in. read takes its input from the shell, which (in your case) is getting its input from the tail process connected to it via the pipe.
As another example, consider this script; we'll call it 'mygrep.sh'.
#!/bin/bash
grep "$1"
Now the pipeline
some-text-producing-command | ./mygrep.sh bob
behaves identically to
some-text-producing-command | grep bob
$1 is set if you call your script like this:
./myscript.sh foo
Then $1 has the value "foo".
The positional parameters and standard input are separate; you could do this
tail -n +1 -f your_log_file | myscript.sh foo
Now standard input is still coming from the tail process, and $1 is still set to 'foo'.
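So a hypothetical myscript.sh could use both at once, a positional argument plus the piped-in lines (the label prefix here is made up purely for illustration):
#!/bin/bash
label=$1                       # from the command line: ./myscript.sh foo
while IFS= read -r line; do    # from stdin: the lines produced by tail
    echo "$label: $line"
done
Run as tail -n +1 -f your_log_file | ./myscript.sh foo, it prefixes every new log line with foo.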
Perhaps you were confused with awk?
tail -n +1 -f your_log_file | awk '{
print $1
}'
would print the first column from the output of the tail command.
In the shell, a similar effect can be achieved with:
tail -n +1 -f your_log_file | while read first junk; do
    echo "$first"
done
Alternatively, you could put the whole while ... done loop inside myscript.sh
Piping connects the output (stdout) of one process to the input (stdin) of another process. stdin is not the same thing as the arguments sent to a process when it starts.
What you want to do is convert the lines in the output of your first process into arguments for the second process. This is exactly what the xargs command is for.
All you need to do is put xargs between the tail command and your script, telling it to run the script once per line (here -I{} turns each whole line into the script's first argument):
tail -n +1 -f your_log_file | xargs -I{} ./myscript.sh {}

reading a file line by line from a shell script

I am facing a problem with a bash shell script. The script is supposed to execute another shell script (./script here), and the output of that script is redirected to a file (tmp). Then the file should be read line by line, and for each line the same script (./script) should be executed with the line as its argument, with the result stored in a file (tmp1). Eventually these results should be appended to the first file (tmp).
I am pasting my script below:
./script $1 $2 > tmp
cat tmp | while read a
do
    ./script $a $2 >> tmp1
done
cat tmp1 | while read line
do
    ./script $line $2 >> tmp
done
I get the following error when I execute the script: "./script: line 11: syntax error: unexpected end of file"
Can anyone please help me out with this?
Thanks a lot in advance.
The script file has DOS/Windows line endings. Try this command, then run your script:
dos2unix ./script
You're probably editing the file using a Windows editor and that's adding \r (0x0d) to the end of each line. This can be removed by using dos2unix.
The shell splits on whitespace normally. You can make it split on newlines by writing, towards the top,
IFS='
'
However, I think what you're trying to do might not be appropriate for shell, and might be better served by Perl or Ruby.
Lose all the cats! They are unnecessary. And I suppose summ_tmp is an existing file?
#!/bin/bash
set -x
./wiki $1 $2 > tmp
while read -r a
do
    ./wiki $a $2 >> tmp1
done < summ_tmp
while read -r line
do
    ./wiki $line $2 >> tmp
done < tmp1
With what you are doing, you might want to refactor your "./script" to eliminate unnecessary steps. If it's not too long, show what your "./script" does. Show your desired output and examples of relevant input files where possible.
Put set -x in your script (wiki and ./script) to help you debug.
Alternatively you could use xargs - this executes a command on every line in a file, which is exactly what you want.
You can replace
cat tmp1 | while read line
do
    ./script $line $2 >> tmp
done
with
xargs <tmp1 -n1 -IXXX ./script XXX $2 >>tmp
-n1 means read one line at a time from the input,
-IXXX means substitute XXX with the line that was read in - the default is to append it to the end of the command line.
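The first loop (the one that reads tmp) can be replaced following the same pattern, if you like (a sketch mirroring the line above):
xargs <tmp -n1 -IXXX ./script XXX $2 >>tmp1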
