reading a file line by line from a shell script - bash

I am facing a problem in a bash shell script. This script is supposed to execute another shell script (./script here), with its output redirected to a file (tmp). Then the file should be read line by line, and for each line the same script (./script) should be executed with that line as its argument, with the results stored in a file (tmp1). Eventually these results should be appended to the first file (tmp).
I am pasting my script below:
./script $1 $2 > tmp
cat tmp | while read a
do
./script $a $2 >> tmp1
done
cat tmp1 | while read line
do
./script $line $2 >> tmp
done
I get the following error when I execute the script: "./script: line 11: syntax error: unexpected end of file"
Can anyone please help me out with this?
Thanks a lot in advance.

The script file has DOS/Windows line endings. Try this command, then run your script:
dos2unix ./script
You're probably editing the file using a Windows editor and that's adding \r (0x0d) to the end of each line. This can be removed by using dos2unix.
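If dos2unix isn't installed, the same check and fix can be done with standard tools (a sketch; cat -v displays a carriage return as ^M at the end of each line):
cat -v ./script | head -n 3
tr -d '\r' < ./script > script.fixed && mv script.fixed ./script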

The shell splits on whitespace normally. You can make it split on newlines by writing, towards the top,
IFS='
'
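In bash specifically, the same assignment can be written on one line with ANSI-C quoting:
IFS=$'\n'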
However, I think what you're trying to do might not be appropriate for shell, and might be better served by Perl or Ruby.

Lose all the cats! They are unnecessary. And I suppose summ_tmp is an existing file?
#!/bin/bash
set -x
./wiki "$1" "$2" > tmp
while read -r a
do
./wiki "$a" "$2" >> tmp1
done < summ_tmp
while read -r line
do
./wiki "$line" "$2" >> tmp
done < tmp1
With what you are doing, you might want to refactor your "./script" to eliminate unnecessary steps. If it's not too long, show what your "./script" does. Show your desired output, and show examples of relevant input files where possible.
Put set -x in your script (wiki and ./script) to help you debug.

Alternatively you could use xargs - this executes a command on every line in a file, which is exactly what you want.
You can replace
cat tmp1 | while read line
do
./script $line $2 >> tmp
done
with
xargs <tmp1 -n1 -IXXX ./script XXX $2 >>tmp
-n1 means pass one argument (here, one line) per command invocation,
-IXXX means substitute XXX with the line that was read in; without -I, the default is to append the line to the end of the command line.
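To preview the commands before actually running anything, a handy trick is to put echo in front of the command (a sketch using the same files):
xargs <tmp1 -n1 -IXXX echo ./script XXX $2
This prints each constructed command line instead of executing it.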

Related

Passing input to sed, and sed info to a string

I have a list of files (~1000), one file per line, in a text file named 'files.txt'.
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I invoke this script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .C macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed passes -n to avoid printing by default, then prints the ${i}th line. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
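For completeness, that array approach might look like this in bash (mapfile requires bash 4+; as noted, it reads the whole file into memory):
mapfile -t lines < files.txt
b=${lines[i-1]}    # arrays are zero-indexed, so line $i is element i-1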
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
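If n is simply the number of lines in files.txt, it can be computed on the fly (a sketch using wc -l):
for i in $(seq 1 $(wc -l < files.txt)); do ./MakeMacro.sh "$i"; done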
i=0
while read file
do
((i++))
cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: with <<-, any indentation of the here-document body and the closing EOF line must use tabs, not spaces.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue, but I doubt it will be.
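For reference, that one-liner as the entire script, invoked the same way as before (a sketch; e.g. ./MakeMacro.sh 2 writes MyMacro_2.C):
#!/bin/sh
# write MyMacro_$1.C from line $1 of files.txt, all in one awk call
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt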

bash script prepending ? to file name

I am using the below script. When I have it echo $f as shown below, it gives the correct result:
#!/bin/bash
var="\/home\/"
while read p; do
f=$(echo $p | sed "s/${var}/\\n/g")
f=${f%.sliced.bam}.fastq
echo $f
~/bin/samtools view $p | awk '{print "#"$1"\n"$10"\n+\n"$11}' > $f
./run.sh $f ${f%.fastq}
rm ${f%.sliced.bam}.fastq
done < $1
I get the output as expected
test.fastq
But the file being created by awk > $f has the name
?test.fastq
Note that the overall goal here is to run this loop on every file listed in a file of absolute paths, but write locally (which is what the sed call is for).
Edit: run directly on the command line (without variables), the samtools | awk line runs correctly.
Awk cannot possibly have anything to do with your problem. The shell is completely responsible for file redirection, so f MUST have a weird character in it.
Most likely whatever you are sending to this script has a special character in it (e.g. perhaps a UTF-8 character, while your terminal shows ASCII only). When you do the echo, the shell doesn't know how to display the char and probably just shows it as whitespace, and when it goes through ls (which might be doing things like colorization) it combines in a strange way and ends up showing the ?.
Oh wait... why are you putting a newline into the filename with sed? That is probably your problem. Try just:
sed "s/${var}//g"

How to process lines read from standard input in a UNIX shell script?

I get stuck by this problem:
I wrote a shell script that gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as an input to another operation in the script, however I don't know how to store this file's name in a variable.
The script takes a file from stdin as its argument and then does an awk operation on the file itself. Say if I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
n=$(echo $line | awk -F$delim '{print NF}')
if [ $Min -eq -1 ] || [ $n -lt $Min ];then
Min=$n
fi
done
but it takes a very, very long time to process; it seems the awk calls take most of the time.
So how can I improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So, writing cat /dev/stdin will give you all the input from your file, and you can avoid using the input filename at all.
Now to answer the question :) Recursively read the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r(){
l=`readlink "$1"`
if [ $? -ne 0 ]
then
echo "$1"
else
r "$l"
fi
}
filename=`r /dev/stdin`
echo $filename
Update:
on Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
Update 2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is because it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is because it generates the tmpfile line by line, reopening it to append every time.
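Concretely, the whole loop from the question reduces to a single awk reading stdin (a sketch, assuming $delim holds the field delimiter as in the original):
#!/bin/sh
# print the minimum field count over all input lines
awk -F"$delim" 'NR == 1 || NF < min { min = NF } END { print min }'
Invoked exactly as before: ./script < filename.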
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In the script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the contents of your file into your script, so that the standard output of cat filename becomes standard input to your script (actually, in this case, to the awk command).
For example, with a file Names.data and a script showNames.sh, execute as follows:
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of script showNames.sh:
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well, I finally found this way to solve my problem, although it takes several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file, so that after reading from stdin the tmpfile is the same as the input file.

How to stream input line by line in UNIX, perform commands on each line, and immediately return output?

I am trying to do something very simple, but solving it the way I want would help me with many other commands as well.
I want to read a file line by line in UNIX and perform commands on them, in this case character count. For an entire file, I would just use:
wc -m
However, I want this per line of input. What is the simplest, shortest way to stream a file line by line for manipulation by UNIX commands? I ask because in this situation I want wc -m per line, but future applications will use completely different commands.
Also, I want to avoid perl and awk! I already know how to do this with those tools, but am looking for alternate methods.
Thanks!
EDIT: Thanks for the link to the other question, but after looking at their 4 answers, I don't see a solution to my exact quandary.
Given the following input:
cat test.txt
This is the first line.
This is the second, longer line.
This is short.
My Final line that is much longer than the first couple of lines.
I want to plug it through some code that will read it line by line and perform a command on each line, immediately returning the result.
Some code which does wc -m on each line and returns the output:
23
32
14
65
Or some code which does cut -d " " -f 1 on each line and returns the output:
This
This
This
My
Hopefully this makes things a bit clearer. Thanks again for any suggestions!
You can use echo "${#line}" to get the length of a string. Reading the file with a while read... loop will do the rest:
$ cat file
hello
my name
is fedor
qui
$ while read line; do echo "${#line}"; done < file
5
7
8
3
0
In a nicer format:
while read line
do
echo "${#line}"
done < file
Your best bet for line-by-line processing is a while read loop; the idiom to use to preserve the lines exactly is:
while IFS= read -r line; do
# process "$line"
done
Failing to use IFS= will lose leading whitespace. Failing to use read -r means some backslash sequences will be interpreted by bash and not kept verbatim in the variable.
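Combining that idiom with an arbitrary per-line command gives the general pattern the question asks for (wc -m shown here; any filter can be substituted):
while IFS= read -r line; do
printf '%s' "$line" | wc -m    # character count, without the trailing newline
done < test.txt
printf '%s' omits the trailing newline, so the count matches the characters visible on the line.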
I think your quandary can be restated:
I have a line of text. How do I treat it like a file?
bash has two features that can answer that:
for commands like wc that can read from stdin, use a here-string:
wc -m <<< "$line"
for commands that require a file (I can't think of one off the top of my head), use a process substitution:
wc -m <(echo "$line")
Example:
$ line="foo bar baz"
$ wc -m <<<"$line"
12
$ wc -m <(echo "$line")
12 /dev/fd/63
p.s.
I notice the char count includes the implicit trailing newline. To remove that, use printf without a newline in the format string
$ wc -m <(printf %s "$line")
11 /dev/fd/63
$ wc -m < <(printf %s "$line")
11

Reading a file line by line in ksh

We use a package called Autosys, which has some commands specific to it. I have a list of variables which I would like to pass, one by one, as arguments to one of the Autosys commands.
For example, one such variable is var1; using var1 I would like to launch a command something like this:
autosys_showJobHistory.sh var1
Now when I launch the command written below, it gives me the desired output.
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if I put var1 in a file, say Test.txt, and launch the same command using cat, it gives me nothing. I have the impression that the autosys_showJobHistory.sh command does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What am I doing wrong in the second command?
Wrote all of below, and then noticed your grep statement.
Recall that ksh doesn't support .. as an indicator for 'expand this range of values' (I assume that's your intent). It's also made ambiguous by your not quoting the arguments to grep. If you were using syntax that the shell would convert, then you wouldn't really know what regexp is being sent to grep. It's always better to quote arguments, unless you know for sure that you need the unquoted values. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any char' operator '.', OR do you want to match only a period char? If you want to match only a period, then you need to escape it, like \. .
Finally, if any of the files you're processing were created on a Windows machine and then transferred to Unix/Linux, it is very likely that the line endings (Ctrl-M Ctrl-J, i.e. \r\n) are causing you problems. Clean up your PC-based files (or anything that was sent via ftp) with dos2unix file [file2 ...].
If the above doesn't help, You'll have to "divide and conquer" to debug your problem.
When I did the following tests, I got the expected output
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to cause comment, is your use of the cat command in this context, which will bring you the UUOC award. That can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or by using a matched pair to track the status on just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep 1[1..6]:[0..9][0..9] \
| grep 24.12.2012 \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, the print -u2 -- .... (u2 = stderr, -- ends option processing for print).
Now you can make sure no extra space or tab chars are creeping in, by looking at that output.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see which step is causing this to break, right? So does the autosys script by itself still produce the intermediate output you're expecting? Then does autosys + 1 grep produce output as expected, + 2 greps, + tail? You should be able to see easily where you're losing your output.
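Putting those recommendations together, a sketch of the corrected loop (quoted variable, bracketed ranges, escaped dots, no cat):
while read -r line; do
autosys_showJobHistory.sh "${line}" \
| grep '1[1-6]:[0-9][0-9]' \
| grep '24\.12\.2012' \
| tail -1
done < Test.txt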
IHTH
