Issue while processing a line using awk in unix - shell

I am running the realpath command on each line of a file. Two sample lines of the file are:
$HOME:1:2
$HOME:1:2 3
I am expecting the output of the above two lines, after running my command, to be:
/home/mjain8:1:2
/home/mjain8:1:2 3
The awk command I am running is:
awk 'BEGIN{cmd="realpath "}{cmd$0|getline;print $0;}' FS=':' OFS=':'
Now, when I run the command on the first line it works fine and gives me the desired output. But for line 2 of the file (shown above) the output is /home/mjain8:1:2 (and NOT /home/mjain8:1:2 3); that is, the output only contains the part of the line before the space.
Can someone please point out what I am doing wrong? Also, in case you have a suggestion to use any other command, please let me know that too. I have been struggling to do this with awk for the last 2 days.
I want to make it portable so that it runs on as many shells as possible.

With the shell's while loop it will be much simpler; could you please try the following. It worked fine for me.
while IFS=':' read -r path rest
do
    real=$(realpath "$path")
    echo "$real:$rest"
done < "Input_file"
The code above first stores the realpath command's output in the real variable and then prints it along with the rest variable. In case you want to print them directly, as per tripleee's comment, use the following instead:
while IFS=':' read -r path rest
do
    echo "$(realpath "$path"):$rest"
done < "Input_file"

With a Perl one-liner also, you could do it easily:
> export HOME=/home/mjain8
> cat home.txt
$HOME:1:2
$HOME:1:2 3
> perl -F: -lane ' {$F[0]=$ENV{HOME} ;print join(":",@F) } ' home.txt
/home/mjain8:1:2
/home/mjain8:1:2 3
> perl -F: -lane ' {$F[0]=$ENV{HOME} if $F[0]=~/\$HOME/;print join(":",@F) } ' home.txt # if you need to explicitly check that it is HOME
/home/mjain8:1:2
/home/mjain8:1:2 3
>
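The same substitution translates to awk as well, since the question asked for it. A sketch along the lines of the Perl answer (no realpath call; it just replaces a literal $HOME token in the first field with the value from the environment):
awk -F: -v OFS=: -v home="$HOME" '$1 == "$HOME" { $1 = home } 1' home.txt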

Related

give a file without changing the name in script [duplicate]

At the beginning I have a file.txt, which contains several pieces of information that I extract using the grep commands you see in the script.
What I want is to give the script the file I want instead of file.txt, but without changing the file name in the script each time. For example, if the file is named Me.txt, I don't want to have to go into the script and write Me.txt in each grep command, especially if I have dozens of commands.
Is there a way to do this?
#!/bin/bash
grep teste file.txt > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote file.txt > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse file.txt > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
Using a parameter, as many contributors here suggested, is of course the obvious approach, and the one usually taken in such a case, so I want to extend this idea:
If you do it naively as
filename=$1
you have to supply the name on every invocation. You can improve on this by providing a default value for the case where the parameter is missing:
filename=${1:-file.txt}
But sometimes you are in a situation where, for some time (while working on a specific task), you need the same filename over and over, and the default value happens not to be the one you need. Another possibility to pass information to a program is via the environment. If you set the filename by
filename=${MOOFOO:-file.txt}
it means that - assuming your script is called myscript.sh - if you invoke your script by
MOOFOO=myfile.txt myscript.sh
it uses myfile.txt, while if you call it by
myscript.sh
it uses the default file.txt. You can also set MOOFOO in your shell, as
export MOOFOO=myfile.txt
and then, even a lone execution of
myscript.sh
will use myfile.txt instead of the default file.txt.
The most flexible approach is to combine both, and this is what I often do in such situations. If you write in your script
filename=${1:-${MOOFOO:-file.txt}}
it takes the name from the 1st parameter, but if there is no parameter, takes it from the variable MOOFOO, and if this variable is also undefined, uses file.txt as the last fallback.
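A minimal sketch of a complete script using this combined fallback (myscript.sh, MOOFOO, and file.txt as named above):
#!/bin/bash
# precedence: command line parameter, then MOOFOO from the environment,
# then the hard-coded default
filename=${1:-${MOOFOO:-file.txt}}
echo "processing: $filename"
Invoked as ./myscript.sh Me.txt it uses Me.txt; as MOOFOO=Me.txt ./myscript.sh it also uses Me.txt; as a plain ./myscript.sh it falls back to file.txt.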
You should pass the filename as a command line parameter so that you can call your script like so:
script <filename>
Inside the script, you can access the command line parameters in the variables $1, $2,.... The variable $# contains the number of command line parameters passed to the script, and the variable $0 contains the path of the script itself.
As with all variables, you can choose to put the variable name in curly brackets which has advantages sometimes: ${1}, ${2}, ...
#!/bin/bash
if [ $# = 1 ]; then
    filename=${1}
else
    echo "USAGE: $(basename ${0}) <filename>"
    exit 1
fi
grep teste "${filename}" > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote "${filename}" > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse "${filename}" > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
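Saved under a hypothetical name such as myscript.sh, the guard triggers when no argument is given:
$ ./myscript.sh
USAGE: myscript.sh <filename>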
By the way, you don't need two different files to achieve what you want; you can just pipe the output of grep straight into awk, e.g.:
grep teste "${filename}" | awk '{print $2}' > test.txt
but then again, awk can do the regex match itself, reducing it all to just one command:
awk '/teste/ {print $2}' "${filename}" > test.txt
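Taking this one step further (a sketch, not part of the original answer): since awk can redirect output to several files, all three extractions can run in a single pass over the input:
awk '
    /teste/   { print $2 > "test.txt" }
    /remote/  { print $3 > "remote.txt" }
    /adresse/ { print $2 > "adresse.txt" }
' "${filename}"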

Cannot grep a file from one line number to another in shell script

I am unable to grep a file from within a shell script. Below is the code:
#!/bin/bash
startline6=`cat /root/storelinenumber/linestitch6.txt`
endline6="$(wc -l < /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08)"
awk 'NR>=$startline6 && NR<=$endline6' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 | grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt
The awk command works on a standalone basis, though, when the start and end line numbers are given manually.
There was an issue with the syntax. The last line was modified to
awk 'NR>='"$startline9"' && NR<='"$endline9" /mnt/logs/arcfilechunk-aim-stitch9.log | grep -C 100 'Error in downloading indivisual chunk' >> /root/storelinenumber/error.txt
which solved the issue.
You have your attempted variable expansions within single quotes, meaning that they won't actually be expanded.
When passing shell variables into awk, I prefer them to be actual first-class awk variables so I don't have to worry about that sort of stuff:
awk -v stl="$startline6" -v endl="$endline6" 'NR>=stl && NR<=endl' ...
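Spelled out against the command from the question (a sketch; same log path and grep stage as above):
awk -v stl="$startline6" -v endl="$endline6" 'NR>=stl && NR<=endl' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 | grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt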

Passing input to sed, and sed info to a string

I have a list of files (~1000), one file per line, in my text file named 'files.txt'.
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I invoke the script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
Your script uses a sed one-liner to fetch the line. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to avoid printing by default, then prints the $i'th line. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
    b=$(sed -ne "${i}{p;q;}" files.txt)
else
    echo "ERROR: invalid line number" >&2; exit 1
fi
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read -r file
do
    ((i++))
    cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: if you indent the here-document, you need tab indents on the body and the EOF line (<<- strips tabs only, not spaces). The EOF delimiter must be left unquoted here so that $file is expanded.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

How to process lines which are read from standard input in a UNIX shell script?

I am stuck on this problem:
I wrote a shell script and it gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as an input to another operation in the script; however, I don't know how to store this file's name in a variable.
It is a script that takes a file from stdin and then does an awk operation on the file itself. Say, if I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
    n=$(echo $line | awk -F$delim '{print NF}')
    if [ $Min -eq -1 ] || [ $n -lt $Min ]; then
        Min=$n
    fi
done
but it takes a very, very long time to process; it seems awk takes much of the time.
So how can I improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So, writing cat /dev/stdin will give you all the input from your file, and you can avoid using the input filename at all.
Now, the answer to the question :) Recursively follow the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r() {
    l=`readlink $1`
    if [ $? -ne 0 ]
    then
        echo $1
    else
        r $l
    fi
}
filename=`r /dev/stdin`
echo $filename
UPD:
in Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it generates the tmpfile line by line, reopening it to append every time.
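Concretely, the whole measurement can be one awk process reading the script's own stdin. A sketch reusing the min-of-NF logic from the question (assuming $delim is set earlier in the script):
#!/bin/sh
# a single awk handles every line: no per-line process spawning, no tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }')
echo "$Min"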
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the contents of your file into your script, so that the standard output of cat filename becomes the standard input of your script (in this case, of the awk command).
For example, with a file named Names.data and a script showNames.sh, execute it as follows:
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of the script showNames.sh:
#!/bin/bash
# whatever awk commands you need
awk '{ print }'
Well, I finally found this way to solve my problem, although it takes several seconds:
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
It just appends each line to a temporary file, so that after reading from stdin the tmpfile is the same as the input file.

reading a file line by line from a shell script

I am facing a problem in a bash shell script. This script is supposed to execute another shell script (./script here) and redirect its output to a file (tmp). Then the file should be read line by line, and for each line the same script (./script) should be executed with the line as its argument, with the result stored in a file (tmp1). Eventually these results should be appended to the first file (tmp).
I am pasting my script below:
./script $1 $2 > tmp
cat tmp | while read a
do
    ./script $a $2 >> tmp1
done
cat tmp1 | while read line
do
    ./script $line $2 >> tmp
done
I get the following error when I execute the script: "./script: line 11: syntax error: unexpected end of file".
Can anyone please help me out with this?
Thanks a lot in advance.
The script file has DOS/Windows line endings. Try this command, then run your script:
dos2unix ./script
You're probably editing the file using a Windows editor and that's adding \r (0x0d) to the end of each line. This can be removed by using dos2unix.
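To confirm the diagnosis before converting, and to fix it where dos2unix is not installed, something like this should work (cat -A and sed -i as in GNU coreutils/sed):
cat -A ./script | head -n 3   # CRLF lines show a trailing ^M$
sed -i 's/\r$//' ./script     # strip the carriage returns in place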
The shell splits on whitespace normally. You can make it split on newlines by writing, towards the top,
IFS='
'
However, I think what you're trying to do might not be appropriate for shell, and might be better served by Perl or Ruby.
Lose all the cats! They are unnecessary. And I suppose summ_tmp is an existing file?
#!/bin/bash
set -x
./wiki $1 $2 > tmp
while read -r a
do
    ./wiki $a $2 >> tmp1
done < summ_tmp
while read -r line
do
    ./wiki $line $2 >> tmp
done < tmp1
With what you are doing, you might want to refactor your ./script to eliminate unnecessary steps. If it's not too long, show what your ./script does. Show your desired output and examples of relevant input files where possible.
Put set -x in your script (wiki and ./script) to help you debug.
Alternatively you could use xargs - this executes a command on every line in a file, which is exactly what you want.
You can replace
cat tmp1 | while read line
do
    ./script $line $2 >> tmp
done
with
xargs <tmp1 -n1 -IXXX ./script XXX $2 >>tmp
-n1 means read one line at a time from the input,
-IXXX means substitute XXX with the line that was read in - the default is to append it to the end of the command line.
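A quick illustration of the -I substitution (GNU xargs; -I already implies one invocation per input line):
$ printf 'a\nb\n' | xargs -IXXX echo "line: XXX"
line: a
line: b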
