Cannot grep a file from one line number to another in shell script - bash

I am unable to grep a file from within a shell script I have written. Below is the code:
#!/bin/bash
startline6=`cat /root/storelinenumber/linestitch6.txt`
endline6="$(wc -l < /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08)"
awk 'NR>=$startline6 && NR<=$endline6' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 | grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt
The awk command works on its own, though, when the start and end line numbers are given manually.

There was an issue with the syntax. The last line was modified to
awk 'NR>='"$startline9"' && NR<='"$endline9"'' /mnt/logs/arcfilechunk-aim-stitch9.log | grep -C 100 'Error in downloading indivisual chunk' >> /root/storelinenumber/error.txt
It solved the issue.

You have your attempted variable expansions within single quotes, meaning that they won't actually be expanded.
When passing shell variables into awk, I prefer them to be actual first-class awk variables so I don't have to worry about that sort of stuff:
awk -vstl=$startline6 -vendl=$endline6 'NR>=stl && NR<=endl ...
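Put together for the original command, a minimal sketch of that approach (reusing the paths and the grep from the question) might look like:
#!/bin/bash
startline6=$(cat /root/storelinenumber/linestitch6.txt)
endline6=$(wc -l < /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08)
# pass the shell values in as real awk variables, then grep the selected range
awk -v stl="$startline6" -v endl="$endline6" 'NR >= stl && NR <= endl' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 | grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt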

Related

How to add string to cat result in bash?

The file info has a certain line starting with myline. I'm trying to pass it to a script like this:
bash myscript `cat info | grep myline`
This works well; the script gets "myline" as its first argument. But now I want to add a "w" at the end of that. I tried
bash myscript `cat info | grep myline`w
This is already problematic: the script gets "wyline" as its first argument.
The next step is that I actually want an if statement deciding whether to add the w or not. I tried this:
bash myscript `cat info | grep myline``[ "condition" == "condition"] && echo "w"`
This behaves the same way: the script gets "wyline" as its first argument.
So I have two questions:
1) How can I fix the "wyline" result to get the desired "mylinew"?
2) Is there a better way to write this if statement after the cat?
Do not use backticks; use $(...) instead (see the Bash Hackers wiki page on obsolete and deprecated syntax).
cat file | grep is a useless use of cat (see the Useless Use of Cat Award). Just use grep file.
Just quote the result and add w:
myscript "$(grep myline info)w"
You can add a trailing w to the last line of input with sed:
myscript "$(grep myline info | sed '$s/$/w/')"
I would advise to always quote your variable expansions.
Script gets "wyline" as first argument.
Your input file has DOS line endings. Inspect the output with cat -v, hexdump -C, or xxd. Use dos2unix to remove the carriage return characters.
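For example, to make the stray carriage returns visible and then strip them (a sketch; the temp-file shuffle is just one way to rewrite the file):
cat -v info                                          # DOS-style lines show a trailing ^M
tr -d '\r' < info > info.unix && mv info.unix info   # or simply: dos2unix info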

AWK NR Variable Syntax Issue

I am new to AWK and trying to write some code to delete all files in a directory apart from the newest N.
My code works if I use a hard coded number instead of a variable.
Works:
delete=`ls -t | awk 'NR>3'`
echo $delete
Does Not Work:
amount=3
delete=`ls -t | awk 'NR>$amount'`
echo $delete
I know the problem lies somewhere with the bash variable not being recognised as an awk variable; however, I do not know how to correct it.
I have tried variations of the code to fix this, such as below, but I am still at a loss.
amount=3
delete=`ls -t | awk -v VAR=${amount} 'NR>VAR'`
echo $delete
Could you please advise what the correct code is?
Shells don't expand anything inside single quotes.
Either:
amount=3
delete=$(ls -t | awk "NR>$amount")
or:
amount=3
delete=$(ls -t | awk -v amount=$amount 'NR > amount')
Be aware that parsing the output of ls is fraught if your file names are not limited to the portable file name character set. Spaces, newlines, and other special characters in the file name can wreck the parsing.
The simplest fix is to use double quotes instead of single. Single quotes prevent the shell from interpolating the variable $amount in the quoted string.
amount=3
ls -t | awk "NR>$amount"
I would not use a variable to capture the result. If you do use one, you need to quote it properly when interpolating it.
amount=3
delete=$(ls -t | awk -v VAR="$amount" 'NR>VAR')
echo "$delete"
Note that this is basically identical to your second attempt, which should have worked, modulo the string quoting issues.
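Putting it back in the context of the original goal (keep the newest three files, delete the rest), a minimal sketch, with the usual caveat that it assumes file names without newlines:
amount=3
ls -t | awk -v keep="$amount" 'NR > keep' | while IFS= read -r file; do
    rm -- "$file"
done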

Use sed substitution from different files

Okay, I am a newbie to Unix scripting. I was given the task to find a temporary work around for this:
cat /directory/filename1.xml |sed -e "s/ABCXYZ/${c}/g" > /directory/filename2.xml
$c is a variable from a sqlplus count query. I totally understand how this sed command is working. But here is where I am stuck. I am storing the count associated with the variable in another file called filename3 as count[$c] where $c is replaced with a number. So my question is how can I update this sed command to substitute ABCXYZ with the count from file3?
UPDATE: In case anyone has a similar issue, I got mine to work using:
rm /directory/folder/variablefilename.dat
echo $c >> /directory/folder/variablefilename.dat
d=$(grep [0-9] /directory/folder/variablefilename.dat)
sed -e "s/ABC123/${d}/g" /directory/folder/inputfile.xml >> /directory/folder/outputfile.xml
Thank you to Kaz for pointing me in the right direction.
Store the count in filename3 using the syntax c=number. Then you can source the file as a shell script:
. /filename3 # get c variable
sed -e "s/ABCXYZ/${c}/g" /directory/filename1.xml > /directory/filename2.xml
If you can't change the format of filename3, you can write a shell function which scrapes the number out of that file and sets the c variable. Or you can scrape the number out with an external program like grep, and then interpolate its output into a variable assignment using command substitution: $(command arg ...) syntax.
Suppose we can rely on file3 to contain exactly one line of the form count[42]. Then we can just extract the digits with grep -o:
c=$(grep -E -o '[0-9]+' filename3)
sed -e "s/ABCXYZ/$c/g" /directory/filename1.xml > /directory/filename2.xml
The c variable can be eliminated, of course; you can stick the $(grep ...) into the sed command line in place of $c.
A file which contains numerous instances of syntax like count[42] for various variables could be transformed into a set of shell variable assignments using sed, and then sourced into the current shell to make those assignments happen:
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' filename3 > vars.sh
$ . ./vars.sh
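As a worked illustration, if filename3 held count[42] plus a second, purely hypothetical total[7] entry, the sed pass would emit plain assignments ready to be sourced:
$ cat filename3
count[42]
total[7]
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' filename3
count=42
total=7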
You can use sed like this:
sed -r "s/ABCXYZ/$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' filename3)/g" /directory/filename1.xml
The outer expression is double quoted, which lets the shell run the command substitution below; it finds the number inside count[$c] in filename3 and uses it as the replacement:
$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' filename3)

How to process lines read from standard input in a UNIX shell script?

I am stuck on this problem:
I wrote a shell script that gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as input to another operation in the script, but I don't know how to store this file's name in a variable.
It is a script that takes a file from stdin as its argument and then does an awk operation on this file itself. Say I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
    n=$(echo $line | awk -F$delim '{print NF}')
    if [ $Min -eq -1 ] || [ $n -lt $Min ]; then
        Min=$n
    fi
done
but it takes a very, very long time to process; it seems awk takes much of the time.
So how can I improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So writing cat /dev/stdin will give you all the input from your file, and you can avoid using the input filename at all.
Now for the answer to the question :) Recursively resolve the links, beginning at /dev/stdin, and you will get the filename. Bash code:
# follow the symlink chain until readlink fails, i.e. we have reached a real name
r() {
    l=$(readlink "$1")
    if [ $? -ne 0 ]
    then
        echo "$1"   # not a link: this is the actual name
    else
        r "$l"      # still a link: keep following
    fi
}
filename=$(r /dev/stdin)
echo "$filename"
UPD:
On Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it generates the tmpfile line by line, reopening it to append every time.
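In other words, the whole loop can collapse into a single awk reading the script's stdin; a minimal sketch, assuming $delim is already set as in the question:
Min=$(awk -F"$delim" 'NR == 1 || NF < min { min = NF } END { print min }')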
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the contents of your file into your script as follows (the standard output of cat filename now becomes standard input to your script, or actually, in this case, to the awk command).
For example, I have the file Names.data and the script showNames.sh, executed as follows:
cat Names.data | ./showNames.sh
Contents of the file Names.data:
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of the script showNames.sh:
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well, I finally found this way to solve my problem, although it takes several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file so that, after reading from stdin, the tmpfile is the same as the input file.

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
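If the goal is simply to make an existing script fail when any line has the wrong count, one awk call is enough; a sketch, where expected=5 and input.csv are placeholders for the real column count and file:
expected=5
awk -F, -v n="$expected" 'NF - 1 != n { print "bad comma count on line " NR; exit 1 }' input.csv || exit 1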
To count the number of times a comma appears, you can use something like awk:
string="line of input from CSV file"
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
What worked better for me than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
    echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
    count=${line//[^,]}
    echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" < old_file > new_file
