In a shell script, how to print a line if the previous and the next line are blank

I have a file like
abc
1234567890
0987654321

cde

fgh

ijk
1234567890
0987654321
I need to write a script that extracts the lines with a blank line before and after; in the example the output should be:
cde
fgh
I guess that awk or sed could do the work but I wasn't able to make them work. Any help?

Here is a plain bash solution:
#!/bin/bash
file=path-to-your-file
amt=$(sed -n '$=' "$file")               # total number of lines
i=0
while :
do
    ((i++))
    if [ "$i" -ge "$amt" ]; then         # the last line has no line after it
        break
    fi
    if [ "$i" -ne 1 ]; then              # the first line has no line before it
        emp=$(sed "$((i-1))!d" "$file")      # previous line
        if [ -z "$emp" ]; then
            emp=$(sed "$((i+1))!d" "$file")  # next line
            if [ -z "$emp" ]; then
                sed "$i!d" "$file" >> extracted  # keep the current line
            fi
        fi
    fi
done
Note that this reruns sed over the whole file several times per line, so it is slow on large files; the awk answer below does the job in a single pass.

With awk:
awk '
BEGIN{
    RS=""       # paragraph mode: records are blank-line-separated blocks
    FS="\n"     # each line of a block is one field
}
NF==1' file
(A record with exactly one field is a line surrounded by blank lines, so NF==1 selects just those; the default action prints them.)
Prints:
cde
fgh

Very simple solution:
cat "myfile.txt" | grep -A 1 '^$' | grep -B 1 '^$' | grep -v -e '^--$' | grep -v '^$'
assuming "--" is the default group separator
you may get rid of the group separator by other means, such as the
--group-separator="" or --no-group-separator options,
but that depends on the grep variant (BSD, GNU, OSX...)
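For example, with GNU grep (a sketch; BSD/OSX grep lacks these options) the separator can be suppressed at the source, which makes the '^--$' filter unnecessary:
grep --no-group-separator -A 1 '^$' myfile.txt | grep --no-group-separator -B 1 '^$' | grep -v '^$'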

Related

Removing current line of a file

I'm facing something that looks easy, but I can't find the answer:
The goal of this function is to remove all the lines that contain 3 commas ',':
while read line; do
    COUNT=$(echo "$line" | grep -o "\," | wc -l)
    if [ $COUNT -ne 3 ]; then
        remove line    # <- pseudocode: this is the part I can't figure out
    fi
done < tmp.txt
I can't figure out how to remove the current line, can you help me?
I extract this tmp.txt from a larger file with grep; if the data were in a variable instead of tmp.txt, would it be the same?
while read line; do
    COUNT=$(echo "$line" | grep -o "\," | wc -l)
    if [ $COUNT -ne 3 ]; then
        remove line
    fi
done <<< "$toto"
Thanks in advance
A sed-only solution:
sed '/^\([^,]*,\)\{3\}[^,]*$/d' infile
This deletes all lines in which the comma character , occurs exactly 3 times.
Or using awk:
awk -F, 'NF!=4' infile
Or both read from a variable.
sed '/^\([^,]*,\)\{3\}[^,]*$/d' <<<"$variable"
awk -F, 'NF!=4' <<<"$variable"
A simple awk solution
awk 'gsub(/,/,",")!=3' file
gsub replaces the pattern with the specified string and returns the number of substitutions made.
Here we replace , with , (a no-op), so gsub returns the number of commas in the line; lines where that count is not 3 are printed.
Example :
Input file
hello this line has 1 ,
This line, has, 3 ,
This line, has, 4 , commas , Thanks
Output
$ awk 'gsub(/,/,",")!=3' file
hello this line has 1 ,
This line, has, 4 , commas , Thanks
I would have done it the other way around:
while read line; do
    COUNT=$(echo "$line" | grep -o "\," | wc -l)
    if [ $COUNT -eq 3 ]; then
        echo "$line" >> "$tempofile"
    fi
done < tmp.txt
If the line matches, keep it; otherwise move on to the next line.
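To make the removal take effect, replace the original file afterwards (a sketch; assumes $tempofile was set to a writable path beforehand):
mv "$tempofile" tmp.txt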
This simple command removes all the lines that contain a 3 anywhere (note it matches a literal 3, not a comma count):
$ awk '!/3/' file_name

How to copy to a file only the data that is not already present in that file, in shell script / bash?

I have to put some data in a file, and the data should be unique.
Suppose in file1 I have the following data:
ABC
XYZ
PQR
and now if I want to add MNO, DES, ABC, it should only copy "MNO" and "DES", as "ABC" is already present.
file1 should look like
ABC
XYZ
PQR
MNO
DES
(ABC should be there only once.)
Easiest way: this should add the non-matching lines to f1
diff -c f1 f2|grep ^+|awk -F '+ ' '{print $NF}' >> f1
or if '+ ' is going to be part of the actual text:
diff -c f1 f2|grep ^+|awk -F '+ ' '{ for(i=2;i<=NF;i++)print $i}' >> f1
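A more robust alternative to parsing diff output (a sketch; assumes bash for the process substitution, and that the order of the appended lines doesn't matter) is comm on sorted copies, where -13 keeps only the lines unique to f2:
comm -13 <(sort f1) <(sort f2) >> f1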
Shell script way:
I have a compare script that compares line counts/lengths etc., but for your requirement I think the part below should do the job....
input:
$ cat f1
ABC
XYZ
PQR
$ cat f2
MNO
DES
ABC
output after running the script:
$ ./compareCopy f1 f2
-----------------------------------------------------
comparing f1 f2
-----------------------------------------------------
Lines check - DONE
$ cat f1
ABC
XYZ
PQR
DES
MNO
#!/bin/sh
if [ $# -ne 2 ]; then
    echo
    echo "Requires arguments from command prompt"
    echo "Usage: compareCopy <file1> <file2>"
    echo
    exit 1
fi
proc="compareCopy"
# sort files for line-by-line compare
sort "$1" > file1.tmp
sort "$2" > file2.tmp
echo "-----------------------------------------------------"
echo " comparing $1 $2" | tee ${proc}_compare.result
echo "-----------------------------------------------------"
file1_lines=$(wc -l < "$1")
# check each line
x=1
while [ "$x" -le "$file1_lines" ]
do
    f1_line=$(sed -n "${x}p" file1.tmp)
    f2_line=$(sed -n "${x}p" file2.tmp)
    if [ "$f1_line" != "$f2_line" ]; then
        echo "line number $x doesn't match in $1 and $2" >> ${proc}_compare.result
        echo "$1 line: $f1_line" >> ${proc}_compare.result
        echo "$2 line: $f2_line" >> ${proc}_compare.result
        # so add this line to file 1
        echo "$f2_line" >> "$1"
    fi
    x=$((x + 1))
done
rm -f file1.tmp file2.tmp
echo "Lines check - DONE" | tee -a ${proc}_compare.result
Use fgrep:
fgrep -vf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
which fetches all lines of file2 that are not in file1 and appends the result to file1.
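Assuming file2 holds the new lines MNO, DES, ABC from the question, the selection step alone would give:
$ fgrep -vf file1 file2
MNO
DES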
You may want to take a look at this post: grep -f maximum number of patterns?
Perl one-liner
file one:
1
2
3
file two:
1
4
3
Print only unique lines:
perl -lne 'print if ++$n{ $_ } == 1 ' file_one.txt file_two.txt
Or
perl -lne 'print unless ++$n{ $_ } ' file_one.txt file_two.txt
output
1
2
3
4
The natural way:
sort -u File1 File2 >Temp && mv Temp File1
The tricky way if the files are already sorted:
comm File1 File2 | awk '{$1=$1};1' >Temp && mv Temp File1
(comm emits three tab-indented columns: lines only in File1, lines only in File2, and common lines; the awk '{$1=$1};1' rebuilds each line to strip that leading indentation.)

How to pass a filename through a variable to be read by awk

Good day,
I was wondering how to pass the filename to awk as a variable, so that awk can read it.
So far I have done:
echo file1 > Aenumerar
echo file2 >> Aenumerar
echo file3 >> Aenumerar
AE=`grep -c '' Aenumerar`
r=1
while [ $r -le $AE ]; do
lista=`awk "NR==$r {print $0}" Aenumerar`
AEList=`grep -c '' $lista`
s=1
while [ $s -le $AEList ]; do
word=`awk -v var=$s 'NR==var {print $1}' $lista`
echo $word
let "s = s + 1"
done
let "r = r + 1"
done
Thanks so much in advance for any clue or other simple way to do it with bash command line
Instead of:
awk "NR==$r {print $0}" Aenumerar
You need to use:
awk -v r="$r" 'NR==r' Aenumerar
Judging by what you've posted, you don't actually need all the NR stuff; you can replace your whole script with this:
while IFS= read -r lista ; do
awk '{print $1}' "$lista"
done < Aenumerar
(This will print the first field of each line in each of file1, file2, file3. I think that's what you're trying to do?)
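If you'd rather not loop at all, GNU xargs can hand the filenames straight to a single awk invocation (a sketch; -a and -d are GNU extensions, not available in BSD xargs):
xargs -a Aenumerar -d '\n' awk '{print $1}'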

How to properly parse this scenario in a simple bash script?

I have a file where each key-value pair takes a new line. There is a possibility of having multiple values for each key. I want to return a list of all pairs that have a "special" key, where "special" is defined as some function.
For example, if "special" is defined as a key that somewhere has a value of 100:
A 100
B 400
A hello
B world
C 100
I would return
A 100
A hello
C 100
How to do this in bash?
#!/bin/bash
special=100
awk -v s="$special" '
{
    a[$1,$2]             # remember every key-value pair
    if ($2 ~ s)
        k[$1]            # mark this key as special
}
END {
    for (key in k)
        for (pair in a) {
            split(pair, b, SUBSEP)
            if (b[1] == key)
                print b[1], b[2]
        }
}' ./infile
Proof of Concept
$ special=100; echo -e "A 100\nB 400\nA hello\nB world\nC 100" | awk -v s=$special '{a[$1,$2];if($2 ~ s)k[$1]}END{for(key in k)for(pair in a){split(pair,b,SUBSEP); if(b[1] == key)print b[1],b[2]}}'
A hello
A 100
C 100
This would also work (the first grep collects the lines ending in the special value, sed strips the value to leave the keys, and the second grep prints every line starting with one of those keys):
id=`grep "\<$special\>$" yourfile | sed -e "s/$special//"`
[ -z "$id" ] || grep "^$id" yourfile
Returns:
If special=100
A 100
A hello
C 100
If special="hello"
A 100
A hello
If special="A"
(nothing)
If special="ello"
(nothing)
Notes
drop the \< \> if you want a partial match
add | uniq at the end if there is a possibility of multiple occurrences of the same pair (A 100, A 100, ...) that you don't want in your output (note that uniq only collapses adjacent duplicates)
***** script *****
#!/bin/bash
grep " $1" data.txt | cut -d ' ' -f1 | grep -f /dev/fd/0 data.txt
result:
./test.sh 100
A 100
A hello
C 100
***** inline *****
the first grep pattern must contain the 'special' value preceded by a space ' '; the final grep -f /dev/fd/0 then reads the extracted keys from its standard input as the pattern list:
grep " 100" data.txt | cut -d ' ' -f1 | grep -f /dev/fd/0 data.txt
A 100
A hello
C 100
awk -v special="100" '$2==special{a[$1]}($1 in a)' file
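This one-pass version misses pairs that appear before the key's first special value (e.g. if "A hello" preceded "A 100"). A two-pass variant (a sketch; reads the file twice) collects the special keys first, then prints every matching pair in file order:
awk -v special="100" 'NR==FNR{if($2==special)k[$1];next}($1 in k)' file file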
Whew! My bash was incredibly rusty! Hope this helps:
FILE=$1
IFS=$'\n' # Internal Field Separator: split on newlines, not whitespace
FIND="100"
KEEP=""
for line in `cat $FILE`; do
    key=`echo $line | cut -d ' ' -f1`
    value=`echo $line | cut -d ' ' -f2`
    echo "$key = $value"
    if [ "$value" == "$FIND" ]; then
        KEEP="$key $KEEP"
    fi
done
echo "Keys to keep: $KEEP"
# You can now do whatever you want with those keys.
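For instance, to print every pair for the kept keys (a sketch; IFS must be reset to a space first, since $KEEP is a space-separated list, and the keys are assumed to contain no regex metacharacters):
IFS=' '
for k in $KEEP; do
    grep "^$k " "$FILE"
done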

How to verify information using standard linux/unix filters?

I have the following data in a Tab delimited file:
_ DATA _
Col1 Col2 Col3 Col4 Col5
blah1 blah2 blah3 4 someotherText
blahA blahZ blahJ 2 someotherText1
blahB blahT blahT 7 someotherText2
blahC blahQ blahL 10 someotherText3
I want to make sure that the data in the 4th column of this file is always an integer. I know how to do this in perl:
Read each line, Store value of 4th column in a variable
check if that variable is an integer
if above is true, continue the loop
else break out of the loop with message saying file data not correct
But how would I do this in a shell script using standard linux/unix filter? My guess would be to use grep, but I am not sure how?
cut -f4 data | LANG=C grep -q '[^0-9]' && echo invalid
LANG=C for speed
-q to quit at the first match, which matters in a possibly long file
If you need to strip the first (header) line, use tail -n+2, or you could get hacky and use:
cut -f4 data | LANG=C sed -n '1b;/[^0-9]/{s/.*/invalid/p;q}'
(the 1b skips the header line)
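Wrapped in a test for use in a script (a sketch combining the header strip with the check):
if tail -n+2 data | cut -f4 | LANG=C grep -q '[^0-9]'; then
    echo "file data not correct"
fi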
awk is the tool most naturally suited for parsing by columns:
awk '{if ($4 !~ /^[0-9]+$/) { print "Error! Column 4 is not an integer:"; print $0; exit 1}}' data.txt
As you get more complex with your error detection, you'll probably want to put the awk script in a file and invoke it with awk -f verify.awk data.txt.
Edit: in the form you'd put into verify.awk:
{
    if ($4 !~ /^[0-9]+$/) {
        print "Error! Column 4 is not an integer:"
        print $0
        exit 1
    }
}
Note that I've made awk exit with a non-zero code, so that you can easily check it in your calling script with something like this in bash:
if awk -f verify.awk data.txt; then
# action for success
else
# action for failure
fi
You could use grep, but it doesn't inherently recognize columns. You'd be stuck writing patterns to match the columns.
awk is what you need.
I can't upvote yet, but I would upvote Jefromi's answer if I could.
Sometimes you need it in bash only, because tr, cut & awk behave differently on Linux/Solaris/AIX/BSD/etc.:
while read a b c d e ; do [[ "$d" =~ ^[0-9]+$ ]] || echo "$a: $d is not a number" ; done < data
Edited....
#!/bin/bash
isdigit ()
{
    [ $# -eq 1 ] || return 1
    case $1 in
        *[!0-9]*|"") return 1 ;;   # empty or contains a non-digit
        *)           return 0 ;;   # all digits
    esac
}
while read -r line
do
    col=($line)        # split the line into an array on whitespace
    digit=${col[3]}    # the 4th column
    if isdigit "$digit"
    then
        echo "hey, we got a digit $digit"
    else
        echo "err, no digit $digit"
    fi
done
Use this in a script foo.sh and run it like ./foo.sh < data.txt
See tldp.org for more info
Pure Bash:
linenum=1; while read line; do field=($line); if ((linenum>1)); then [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] && echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer"; fi; ((linenum++)); done < data.txt
To stop at the first error, add a break:
linenum=1; while read line; do field=($line); if ((linenum>1)); then [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] && echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer" && break; fi; ((linenum++)); done < data.txt
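The same logic is easier to follow spread over lines (a sketch of the one-liner above):
linenum=1
while read -r line; do
    field=($line)                 # split the line on whitespace
    if ((linenum > 1)); then      # skip the header line
        if [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]]; then
            echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer"
            break
        fi
    fi
    ((linenum++))
done < data.txt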
cut -f 4 filename
will return the fourth field of each line to stdout.
Hopefully that's a good start, because it's been a long time since I had to do any major shell scripting.
Mind, this may well not be the most efficient compared to iterating through the file with something like perl.
tail +2 x.x | sort -n -k 4 | head -1 | cut -f 4 | egrep "^[0-9]+$"
if [ "$?" == "0" ]
then
echo "file is ok";
fi
tail +2 gives you all but the first line (since your sample has a header); modern implementations spell it tail -n +2
sort -n -k 4 sorts the file numerically on the 4th column, letters will rise to the top.
head -1 gives you the first line of the file
cut -f 4 gives you the 4th column, of the first line
egrep "^[0-9]+$" checks if the value is a number (integers in this case).
If egrep finds nothing, $? is 1, otherwise it's 0.
There's also:
if [ `tail +2 x.x | wc -l` == `tail +2 x.x | cut -f 4 | egrep "^[0-9]+$" | wc -l` ]; then
    echo "file is ok"
fi
This will be faster, requiring two simple scans through the file, but it's not a single pipeline.
#OP, use awk
awk '$4+0<=0{print "not ok";exit}' file
(this assumes the integers are positive; a legitimate 0 or a negative value in column 4 would also be reported as "not ok")
