Skip lines starting with a pattern and delete lines where the third column is greater than a value - shell

I have file with following format :
Qil
Lop
A D E
a 1 10
b 2 21
c 3 22
d 4 5
3 5 9
I need to skip lines that start with the pattern 'Qil', 'Lop', or 'A D E', drop lines where the third column has a value greater than 10, and save the result in two different files with the formats shown below.
Example output files :
Output file 1
Qil
Lop
A D E
a 1 10
d 4 5
3 5 9
Output file 2
a
d
3
My code :
while read -r line; if [[ $line == "A" ]] ||[[ $line == "Q" ]]||[[ $line == "L" ]] ; then
awk '$2 < "11" { print $0 }' test.txt
awk '$2 < "11" { print $1 }' test1.txt
done < input.file

Could you please try the following.
awk '
/^Qil$|^Lop$|^A D E$/{
  val=(val?val ORS:"")$0
  next
}
$3<=10{
  if(!flag){
    print val > "file1"
    flag=1
  }
  print > "file1"
  if(!a[$1]++){
    print $1 > "file2"
  }
}' Input_file
This will create 2 output files named file1 and file2 as per OP's requirements.
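As a quick sanity check, here is the program run end-to-end against the sample input from the question (file names Input_file, file1, and file2 as in the answer; the awk body is the answer's, lightly reformatted):

```shell
# Recreate the sample input from the question
cat > Input_file <<'EOF'
Qil
Lop
A D E
a 1 10
b 2 21
c 3 22
d 4 5
3 5 9
EOF

# Buffer the header lines in val, then for every data line with $3 <= 10,
# emit the buffered header once plus the line to file1, and the
# (deduplicated) first field to file2.
awk '
/^Qil$|^Lop$|^A D E$/ { val = (val ? val ORS : "") $0; next }
$3 <= 10 {
    if (!flag) { print val > "file1"; flag = 1 }
    print > "file1"
    if (!a[$1]++) print $1 > "file2"
}' Input_file

cat file1   # Qil / Lop / A D E / a 1 10 / d 4 5 / 3 5 9
cat file2   # a / d / 3
```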

This can be done in a single awk:
awk '$1 !~ /^[QLA]/ && $3 <= 10' file
a 1 10
d 4 5
3 5 9
If you want to print only the first column then use:
awk '$1 !~ /^[QLA]/ && $3 <= 10 { print $1 }' file
a
d
3


Add x^2 to every "nonzero" coefficient with sed/awk

I have to write, as simply as possible, a script or command which has to use awk and/or sed.
Input file:
23 12 0 33
3 4 19
1st line n=3
2nd line n=2
Each line of the file is a string of numbers. Each number is a coefficient, and we have to append x^n to it, where n is the number's distance from the end of the line (the count of spaces between it and the last number; there is no space after the last number in each line). If a coefficient is 0 we have to skip that term.
So for that input we will have output like:
23x^3+12x^2+33
3x^2+4x+19
Please help me to write a short script solving that problem. Thank you so much for your time and all the help :)
My idea:
linescount=$(cat numbers|wc -l)
linecounter=1
While[linecounter<=linescount];
do
i=0
for i in spaces=$(cat numbers|sed 1p | sed " " )
do
sed -i 's/ /x^spaces/g'
i=($(i=i-1))
done
linecounter=($(linecounter=linecounter-1))
done
The following awk may help you with this too.
awk '{for(i=1;i<=NF;i++){if($i!="" && $i){val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} else {pointer++}};if(val){print val};val=""} pointer==NF{print;} {pointer=""}' Input_file
Adding a non-one-liner form of the solution here too.
awk '
{
for(i=1;i<=NF;i++){
if($i!="" && $i){
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))}
else {
pointer++}};
if(val) {
print val};
val=""
}
pointer==NF {
print}
{
pointer=""
}
' Input_file
EDIT: Adding an explanation here too, for the OP's and all readers' better understanding.
awk '
{
for(i=1;i<=NF;i++){ ##Starting a for loop from variable 1 to till the value of NF here.
if($i!="" && $i){ ##checking if variable i value is NOT NULL then do following.
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} ##creating variable val here and putting conditions here if val is NULL then
##simply take value of that field else concatenate the value of val with its
##last value. Second condition is to check if last field of line is there then
##keep it like that else it is second last then print "x" along with it else keep
##that "x^" field_number-1 with it.
else { ##If a field is NULL in current line then come here.
pointer++}}; ##Increment the value of variable named pointer here with 1 each time it comes here.
if(val) { ##checking if variable named val is NOT NULL here then do following.
print val}; ##Print the value of variable val here.
val="" ##Nullifying the variable val here.
}
pointer==NF { ##checking condition if pointer value is same as NF then do following.
print} ##Print the current line then; it seems the whole line contains only zeros.
{
pointer="" ##Nullifying the value of pointer here.
}
' Input_file ##Mentioning Input_file name here.
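As a quick check, re-running the answer's program (unchanged) on the sample input from the question reproduces the expected output:

```shell
# Sample input from the question
printf '23 12 0 33\n3 4 19\n' > Input_file

# The answer's one-liner
awk '{for(i=1;i<=NF;i++){if($i!="" && $i){val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} else {pointer++}};if(val){print val};val=""} pointer==NF{print} {pointer=""}' Input_file
# 23x^3+12x^2+33
# 3x^2+4x+19
```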
Offering a Perl solution since it has some higher-level constructs than bash that make the code a little simpler:
use strict;
use warnings;
use feature qw(say);
while (my $line = readline(*DATA)) {
    chomp($line);
    my $degree = () = $line =~ / /g;
    my @coefficients = split / /, $line;
    my @terms;
    while ($degree >= 0) {
        my $coefficient = shift @coefficients;
        next if $coefficient == 0;
        push @terms, $degree > 1
            ? "${coefficient}x^$degree"
            : $degree > 0
            ? "${coefficient}x"
            : $coefficient;
    }
    continue {
        $degree--;
    }
    say join '+', @terms;
}
__DATA__
23 12 0 33
3 4 19
Example output:
hunter@eros  ~  perl test.pl
23x^3+12x^2+33
3x^2+4x+19
Any information you want on any of the builtin functions used above (readline, chomp, push, shift, split, say, and join) can be found in perldoc with perldoc -f <function-name>.
$ cat a.awk
function print_term(i) {
# Don't print zero terms:
if (!$i) return;
# Print a "+" unless this is the first term:
if (!first) { printf " + " }
# If it's the last term, just print the number:
if (i == NF) printf "%d", $i
# Leave the coefficient blank if it's 1:
coef = ($i == 1 ? "" : $i)
# If it's the penultimate term, just print an 'x' (not x^1):
if (i == NF-1) printf "%sx", coef
# Print a higher-order term:
if (i < NF-1) printf "%sx^%s", coef, NF - i
first = 0
}
{
first = 1
# print all the terms:
for (i=1; i<=NF; ++i) print_term(i)
# If we never printed any terms, print a "0":
print first ? 0 : ""
}
Example input and output:
$ cat file
23 12 0 33
3 4 19
0 0 0
0 1 0 1
17
$ awk -f a.awk file
23x^3 + 12x^2 + 33
3x^2 + 4x + 19
0
x^2 + 1
17
$ cat ip.txt
23 12 0 33
3 4 19
5 3 0
34 01 02
$ # mapping each element except last to add x^n
$ # -a option will auto-split input on whitespace, content in @F array
$ # $#F will give index of last element (indexing starts at 0)
$ # $i>0 condition check to prevent x^0 for last element
$ perl -lane '$i=$#F; print join "+", map {$i>0 ? $_."x^".$i-- : $_} @F' ip.txt
23x^3+12x^2+0x^1+33
3x^2+4x^1+19
5x^2+3x^1+0
34x^2+01x^1+02
$ # with post processing
$ perl -lape '$i=$#F; $_ = join "+", map {$i>0 ? $_."x^".$i-- : $_} @F;
s/\+0(x\^\d+)?\b|x\K\^1\b//g' ip.txt
23x^3+12x^2+33
3x^2+4x+19
5x^2+3x
34x^2+01x+02
One possibility is:
#!/usr/bin/env bash
line=1
linemax=$(grep -oEc '(( |^)[0-9]+)+' inputFile)
while [ $line -le $linemax ]; do
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
i=0
while [ $i -lt $degree ]; do
if [ ${coeffs[$i]} -ne 0 ]; then
if [ $(($degree-$i-1)) -gt 1 ]; then
echo -n "${coeffs[$i]}x^$(($degree-$i-1))+"
elif [ $(($degree-$i-1)) -eq 1 ]; then
echo -n "${coeffs[$i]}x"
else
echo -n "${coeffs[$i]}"
fi
fi
((i++))
done
echo
((line++))
done
The most important lines are:
# Gets degree of the equation
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
# Saves coefficients in an array
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
Here, grep -oE '(( |^)[0-9]+)+' finds lines containing only numbers (see edit). grep -oE ' +' - ... | cut -d : -f 1 | uniq -c counts the number of coefficients per line as explained in this question.
Edit: An improved regex for capturing lines with only numbers is
grep -E '(( |^)[0-9]+)+' inputfile | grep -v '[a-zA-Z]'
sed -r "s/(.*) (.*) (.*) (.*)/\1x^3+\2x^2+\3x+\4/; \
s/(.*) (.*) (.*)/\1x^2+\2x+\3/; \
s/\+0x(^.)?\+/+/g; \
s/^0x\^.[+]//g; \
s/\+0$//g;" koeffs.txt
Line 1: Handle 4 elements
Line 2: Handle 3 elements
Line 3: Handle 0 in the middle
Line 4: Handle 0 at start
Line 5: Handle 0 at end
Here is a more bashy, less sedy answer, which I think is more readable than the sed one:
#!/bin/bash
#
# 0 4 12 => 12x^3
# 2 4 12 => 12x
# 3 4 12 => 12
term () {
p=$1
leng=$2
fac=$3
pot=$((leng - 1 - p))
case $pot in
0) echo -n '+'${fac} ;;
1) echo -n '+'${fac}x ;;
*) echo -n '+'${fac}x^$pot ;;
esac
}
handleArray () {
# mapfile puts a counter into the array, starting with 0 for the 1st
# get rid of it!
shift
coeffs=($*)
# echo ${coeffs[@]}
cnt=0
len=${#coeffs[@]}
while (( cnt < len ))
do
if [[ ${coeffs[$cnt]} != 0 ]]
then
term $cnt $len ${coeffs[$cnt]}
fi
((cnt++))
done
echo # -e '\n' # extra line for dbg, together w. line 5 of the function.
}
mapfile -n 0 -c 1 -C handleArray < ./koeffs.txt coeffs | sed -r "s/^\++//;s/\++$//;"
The mapfile reads data and produces an array. See help mapfile for a brief syntax introduction.
We need some counting to know which power to raise to. Meanwhile we try to get rid of 0-terms.
In the end I use sed to remove leading and trailing plusses.
sh solution
while read line ; do
set -- $line
while test $1 ; do
i=$(($#-1))
case $1 in
0) ;;
*) case $i in
0) j="" ;;
1) j="x" ;;
*) j="x^$i" ;;
esac
result="$result$1$j+";;
esac
shift
done
echo "${result%+}"
result=""
done < infile
$ cat tst.awk
{
out = sep = ""
for (i=1; i<=NF; i++) {
if ($i != 0) {
pwr = NF - i
if ( pwr == 0 ) { sfx = "" }
else if ( pwr == 1 ) { sfx = "x" }
else { sfx = "x^" pwr }
out = out sep $i sfx
sep = "+"
}
}
print out
}
$ awk -f tst.awk file
23x^3+12x^2+33
3x^2+4x+19
First, my test set:
$ cat file
23 12 0 33
3 4 19
0 1 2
2 1 0
Then the awk script:
$ awk 'BEGIN{OFS="+"}{for(i=1;i<=NF;i++)$i=$i (NF-i?"x^" NF-i:"");gsub(/(^|\+)0(x\^[0-9]+)?/,"");sub(/^\+/,"")}1' file
23x^3+12x^2+33
3x^2+4x^1+19
1x^1+2
2x^2+1x^1
And an explanation:
$ awk '
BEGIN {
OFS="+" # separate with a + (negative values
} # would be dealt with in gsub
{
for(i=1;i<=NF;i++) # process all components
$i=$i (NF-i?"x^" NF-i:"") # add x and exponent
gsub(/(^|\+)0(x\^[0-9]+)?/,"") # clean 0s and leftover +s
sub(/^\+/,"") # remove leading + if first component was 0
}1' file # output
This might work for you (GNU sed);)
sed -r ':a;/^\S+$/!bb;s/0x\^[^+]+\+//g;s/\^1\+/+/;s/\+0$//;b;:b;h;s/\S+$//;s/\S+\s+/a/g;s/^/cba/;:c;s/(.)(.)\2\2\2\2\2\2\2\2\2\2/\1\1\2/;tc;s/([a-z])\1\1\1\1\1\1\1\1\1/9/;s/([a-z])\1\1\1\1\1\1\1\1/8/;s/([a-z])\1\1\1\1\1\1\1/7/;s/([a-z])\1\1\1\1\1\1/6/;s/([a-z])\1\1\1\1\1/5/;s/([a-z])\1\1\1\1/4/;s/([a-z])\1\1\1/3/;s/([a-z])\1\1/2/;s/([a-z])\1/1/;s/[a-z]/0/g;s/^0+//;G;s/(.*)\n(\S+)\s+/\2x^\1+/;ba' file
This is not a serious solution!
Shows how sed can count, kudos goes to Greg Ubben back in 1989 when he wrote wc in sed!

UPDATED: Bash + Awk: Print first X (dynamic) columns and always the last column

#file test.txt
a b c 5
d e f g h 7
gg jj 2
Say X = 3; I need the output like this:
#file out.txt
a b c 5
d e f 7
gg jj 2
NOT this:
a b c 5
d e f 7
gg jj 2 2 <--- WRONG
I've gotten to this stage:
cat test.txt | awk ' { print $1" "$2" "$3" "NF } '
If you're unsure of the total number of fields, then one option would be to use a loop:
awk '{ for (i = 1; i <= 3 && i < NF; ++i) printf "%s ", $i; print $NF }' file
The loop can be avoided by using a ternary:
awk '{ print $1, $2, (NF > 3 ? $3 OFS $NF : $3) }' file
This is slightly more verbose than the approach suggested by 123, but it means you aren't left with trailing whitespace on the lines with three fields. OFS is the Output Field Separator (a space by default), which is what print inserts between fields when you use a ,.
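A quick run against the question's sample data confirms the three-field line keeps its own last column (file name test.txt assumed):

```shell
# Sample input from the question
printf 'a b c 5\nd e f g h 7\ngg jj 2\n' > test.txt

# Print the first three fields, plus $NF only when there are more than three
awk '{ print $1, $2, (NF > 3 ? $3 OFS $NF : $3) }' test.txt
# a b c 5
# d e f 7
# gg jj 2
```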
Use $ combined with NF:
cat test.txt | awk ' { print $1" "$2" "$3" "$NF } '

Find a number of a file in a range of numbers of another file

I have these two input files:
file1
1 982444
1 46658343
3 15498261
2 238295146
21 47423507
X 110961739
17 7490379
13 31850803
13 31850989
file2
1 982400 982480
1 46658345 46658350
2 14 109
2 5000 9000
2 238295000 238295560
X 110961739 120000000
17 7490200 8900005
And this is my desired output:
Desired output:
1 982444
2 238295146
X 110961739
17 7490379
This is what I want: find the column 1 element of file1 in column 1 of file2. If the number is the same, take the number in column 2 of file1 and check whether it is included in the range given by columns 2 and 3 of file2. If it is included, print the line of file1 to the output.
It may be a little confusing to understand, but I'm doing my best. I have tried some things but I'm far from the solution, and any help will be really appreciated. In bash, awk or perl please.
Thanks in advance,
Just using awk. The solution doesn't loop through file1 repeatedly.
#!/usr/bin/awk -f
NR == FNR {
# I'm processing file2 since NR still matches FNR
# I'd store the ranges from it on a[] and b[]
# x[] acts as a counter to the number of range pairs stored that's specific to $1
i = ++x[$1]
a[$1, i] = $2
b[$1, i] = $3
# Skip to next record; Do not allow the next block to process a record from file2.
next
}
{
# I'm processing file1 since NR is already greater than FNR
# Let's get the index for the last range first then go down until we reach 0.
# Nothing would happen as well if i evaluates to nothing i.e. $1 doesn't have a range for it.
for (i = x[$1]; i; --i) {
if ($2 >= a[$1, i] && $2 <= b[$1, i]) {
# I find that $2 is within range. Now print it.
print
# We're done so let's skip to the next record.
next
}
}
}
Usage:
awk -f script.awk file2 file1
Output:
1 982444
2 238295146
X 110961739
17 7490379
A similar approach using Bash (version 4.0 or newer):
#!/bin/bash
FILE1=$1 FILE2=$2
declare -A A B X
while read F1 F2 F3; do
(( I = ++X[$F1] ))
A["$F1|$I"]=$F2
B["$F1|$I"]=$F3
done < "$FILE2"
while read -r LINE; do
read F1 F2 <<< "$LINE"
for (( I = X[$F1]; I; --I )); do
if (( F2 >= A["$F1|$I"] && F2 <= B["$F1|$I"] )); then
echo "$LINE"
break
fi
done
done < "$FILE1"
Usage:
bash script.sh file1 file2
Let's mix bash and awk:
while read col min max
do
awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1
done < f2
Explanation
For each line of file2, read the min and the max, together with the value of the first column.
Given these values, check file1 for lines having the same first column and a second column within the range specified by file2.
Test
$ while read col min max; do awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1; done < f2
1 982444
2 238295146
X 110961739
17 7490379
Pure bash, based on Fedorqui's solution:
#!/bin/bash
while read col_2 min max
do
while read col_1 val
do
(( col_1 == col_2 && ( min <= val && val <= max ) )) && echo $col_1 $val
done < file1
done < file2
cut -d' ' -f1 input2 | sed 's/^/^/;s/$/\\s/' | \
grep -f - <(cat input2 input1) | sort -n -k1 -k3 | \
awk 'NF==3 {
split(a,b,",");
for (v in b)
if ($2 <= b[v] && $3 >= b[v])
print $1, b[v];
if ($1 != p) a=""}
NF==2 {p=$1;a=a","$2}'
Produces:
X 110961739
1 982444
2 238295146
17 7490379
Here's a Perl solution. It could be much faster but less concise if I built a hash out of file2, but this should be fine.
use strict;
use warnings;
use autodie;
my @bounds = do {
open my $fh, '<', 'file2';
map [ split ], <$fh>;
};
open my $fh, '<', 'file1';
while (my $line = <$fh>) {
my ($key, $val) = split ' ', $line;
for my $bound (@bounds) {
next unless $key eq $bound->[0] and $val >= $bound->[1] and $val <= $bound->[2];
print $line;
last;
}
}
Output:
1 982444
2 238295146
X 110961739
17 7490379

printing variable number lines to output

I would like to have a script to modify some large text files (100k records) such that, for every record, the number of lines created in the output equals the difference between columns 3 and 2 of the input line. In the output I want to print the record name (column 1), and a step-wise walk between the numbers contained in columns 2 and 3.
Sample trivial input could be (tab-separated data, if it makes a difference)
a 3 5
b 10 14
with the desired output (again, ideally tab-separated)
a 3 4
a 4 5
b 10 11
b 11 12
b 12 13
b 13 14
It's a challenge sadly beyond my (very) limited abilities.
Can anyone provide a solution to the problem, or point me in the right direction? In an ideal world I would be able to be integrate this into a bash script, but I'll take anything that works!
Bash solution:
while read h f t ; do
for ((i=f; i<t; i++)) ; do
printf "%s\t%d\t%d\n" $h $i $((i+1))
done
done < input.txt
Perl solution:
perl -lape '$_ = join "\n", map join("\t", $F[0], $_, $_ + 1), $F[1] .. $F[2] - 1' input.txt
awk -F '\t' -v OFS='\t' '
$2 >= $3 {print; next}
{for (i=$2; i<$3; i++) print $1, i, i+1}
' filename
With awk:
awk '$3!=$2 { while (($3 - $2) > 1) { print $1,$2,$2+1 ; $2++} }1' inputfile
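A quick run on the sample input; note the rebuilt lines come out space-separated because awk's default OFS is a space (add -v OFS='\t' if you need tabs):

```shell
# Tab-separated sample input from the question
printf 'a\t3\t5\nb\t10\t14\n' > inputfile

# Print intermediate steps in the while loop; the trailing 1 prints the
# (modified) line itself as the final step
awk '$3!=$2 { while (($3 - $2) > 1) { print $1,$2,$2+1 ; $2++} }1' inputfile
# a 3 4
# a 4 5
# b 10 11
# b 11 12
# b 12 13
# b 13 14
```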
Fully POSIX, and no unneeded loop variables:
$ while read h f t; do
while test $f -lt $t; do
printf "%s\t%d\t%d\n" "$h" $f $((++f))
done
done < input.txt
a 3 4
a 4 5
b 10 11
b 11 12
b 12 13
b 13 14

awk with nested if else

I have tab-delimited, two-column data. I want to produce a third column based on a condition applied to the second column.
If the second column is not equal to zero, it should print columns 1 and 2 and the ratio col1/col2 as the third column.
If column two is zero and column one is more than 15, it should print col1 and col2 with the value of col1 in col 3; otherwise (when col1 <= 15 and col2 is 0) it should print col1, col2, and 0.
for example, for a file like this
1 2
4 5
6 7
14 0
18 0
the output should be
1 2 0.5
4 5 0.8
6 7 0.85
14 0 0
18 0 18
What I have tried:
awk '{if ($2!=0) print $1 "\t" $2 "\t" $1/$2; elseif($2>15) print $1 "\t" $2 "\t" $1 ; else print $1 "\t" $2 "\t" $2}'<tags| head
Obviously I am doing something wrong, please help me in getting the above code right.
Thank you
Slightly different way:
awk '{if($2!=0) $3=$1/$2; else if($1>15) $3=$1; else $3=0}1' OFS='\t' file
Determined by the order of the if clause:
awk '{$3=0} $1>15{$3=$1} $2{$3=$1/$2}1' OFS='\t' file
or the cryptic version:
awk '{$3=$2?$1/$2:$1>15?$1:0}1' OFS='\t' file
A funny but maybe unreadable :) one-liner:
awk '{$0=$2?$1FS$2FS$1/$2:$1>15?$1FS$2FS$1:$1FS$2FS"0"}1' file
Short explanation:
a = boolean ? first : second
This means: assign var a; if the boolean is true, use the value first, otherwise use the value second.
I set `$0 = $2? FOO : BAR`
FOO part: $1 FS $2 FS $1/$2
BAR part: $1>15? FOO2 : BAR2
FOO2 part: $1 FS $2 FS $1
BAR2 part: $1 FS $2 FS "0"
finally, print $0
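A quick run on the question's sample data; note that 6/7 is rendered with awk's default CONVFMT (%.6g) as 0.857143, not the rounded 0.85 shown in the question:

```shell
# Tab-separated sample input from the question
printf '1\t2\n4\t5\n6\t7\n14\t0\n18\t0\n' > file

# Rebuild $0 with the computed third column appended; the trailing 1 prints it
awk '{$0=$2?$1FS$2FS$1/$2:$1>15?$1FS$2FS$1:$1FS$2FS"0"}1' file
# 1 2 0.5
# 4 5 0.8
# 6 7 0.857143
# 14 0 0
# 18 0 18
```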
Problem in your code:
Change elseif to else if, and check $1 against 15, not $2; then your one-liner works too.
Here's another alternative:
awk '!$2 { $3 = $1>15 ? $1 : 0 } $2 { $3 = $1/$2 } 1' OFS='\t' CONVFMT='%.2g'
Output:
1 2 0.5
4 5 0.8
6 7 0.86
14 0 0
18 0 18
awk '{$3=$1>15 && $2==0?$1:$1<=15 && $2==0?0:$1/$2}1' your_file
