Search for a value in a file and remove subsequent lines - shell

I'm developing a shell script but I am stuck with the below part.
I have the file sample.txt:
S.No Sub1 Sub2
1 100 200
2 100 200
3 100 200
4 100 200
5 100 200
6 100 200
7 100 200
I want to search the S.No column in sample.txt. For example, if I'm searching for the value 5, I need only the rows up to and including 5; I don't want the rows after it, where the value in S.No is larger than 5.
The output, output.txt, must look like this:
S.No Sub1 Sub2
1 100 200
2 100 200
3 100 200
4 100 200
5 100 200

Print the first line and any other line where the first field is less than or equal to 5:
$ awk 'NR==1||$1<=5' file
S.No Sub1 Sub2
1 100 200
2 100 200
3 100 200
4 100 200
5 100 200

Using perl:
perl -ane 'print if $. == 1 || $F[0] <= 5' file

And the sed solution:
n=5
sed "/^${n}[[:space:]]/q" filename
The sed q command exits after printing the current line.

The suggested awk relies on column 1 being sorted numerically. A generic awk that fulfills the question title would be:
gawk -v p=5 '$1==p {print; exit} {print}' sample.txt
However, in this situation, sed is better IMO. Use sed -i to modify the input file in place.
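To see why the sort order matters, here is a small sketch that runs both awk approaches on an unsorted copy of the data. The sample rows and the variable names by_value and by_match are made up for illustration.

```shell
# Hypothetical unsorted input: S.No 5 appears before S.No 2.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
S.No Sub1 Sub2
1 100 200
5 100 200
2 100 200
EOF

# Value filter: keeps the header and every row whose first field is <= 5,
# so the row that follows the match is still printed.
by_value=$(awk 'NR==1 || $1<=5' "$tmp")

# Stop at first match: prints each line and exits right after the row
# whose first field equals p.
by_match=$(awk -v p=5 '{print} $1==p {exit}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$by_match"
```

With sorted input the two give the same result; they only differ when a smaller S.No follows the searched value.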

sed '6q' sample.txt > output.txt

Related

Replacing the value of specific field in a table-like string stored as bash variable

I am looking for a way to replace (with 0) a specific value (1043252782) in a "table-like" string stored as a bash variable. The output of echo "$var" looks like this:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 1043252782
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
After the replacement echo "$var" should look like this:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
Is there a way to do this without saving the content of $var to a file and directly manipulating it within the bash (shell script)?
Maybe with awk? I can select the value in the 10th field of the second record with awk and pattern matching ("7 Seek_Error_Rate ....") like this:
echo "$var" | awk '/^ 7/{print $10}'
Maybe there is some way to do it with awk (or another CLI tool) that replaces the value and stores the result back into $var? Also, the value changes over time, but the structure remains the same (some record at the 10th field).
You can change a specific string directly in the shell:
var=${var/1043252782/0}
To replace the final number on the second line, you could use awk or sed:
var=$(awk 'NR==2 { sub(/[0-9]+$/,0) }1' <<<"$var")
var=$(sed '2s/[0-9][0-9]*$/0/' <<<"$var")
If you don't know which line it will be, you can match a known string:
var=$(awk '/Seek_Error_Rate/{ sub(/[0-9]+$/,0) }1' <<<"$var")
var=$(sed '/Seek_Error_Rate/s/[0-9][0-9]*$/0/' <<<"$var")
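Since the question mentions the 10th field, one might also assign to that field directly. The sketch below compares that with sub() on made-up padded data (the two-space padding is an assumption about the real output, and by_field and by_sub are illustrative names). It pipes the variable through printf instead of using a here-string so that it also runs in a plain POSIX shell.

```shell
# Sample data with padded columns (padding is assumed for illustration).
var='  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x000f  090  060  045  Pre-fail  Always  -  1043252782'

# Assigning to $10 makes awk rebuild the record using OFS (a single space),
# so the padding on the modified line is lost.
by_field=$(printf '%s\n' "$var" | awk '$2 == "Seek_Error_Rate" { $10 = 0 } 1')

# sub() edits the text of the line in place, so the padding is kept.
by_sub=$(printf '%s\n' "$var" | awk '$2 == "Seek_Error_Rate" { sub(/[0-9]+$/, "0") } 1')

printf '%s\n' "$by_field"
printf '%s\n' "$by_sub"
```

If the column alignment matters, the sub() form is probably the safer choice; the field assignment is shorter when it does not.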
You can use a here-string to feed the variable as input to awk.
Use sub() to perform a regular expression replacement.
var=$(awk '{sub(/1043252782$/, "0")}1' <<<"$var")
Using sed
$ var=$(sed '/1043252782$/s//0/' <<< "$var")
$ echo "$var"
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
If you don't want to ruin the formatting of tabs and spaces:
{m,g}awk NF=NF FS=' 1043252782$' OFS=' 0'
Output:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
Or do the whole input in one single shot:
awk NF=NF FS=' 1043252782\n' OFS=' 0\n' RS='^$' ORS=
This might work too, but I'm not too well versed in the side effects of a blank RS:
awk NF=NF FS=' 1043252782\n' OFS=' 0\n' RS=

Handling ties when ranking in bash

Let's say I have a list of numbers that are already sorted as below
100
222
343
423
423
500
What I want is to create a rank field such that same values are assigned the same rank
100 1
222 2
343 3
423 4
423 4
500 5
I have been using the following piece of code to mimic a rank field
awk '{print $0, NR}' file
That gives me the output below, but it's technically a row number.
100 1
222 2
343 3
423 4
423 5
500 6
How do I go about this? I am an absolute beginner in bash so would really appreciate if you could add a little explanation for learning sake.
That's a job for awk:
$ awk '{if($0!=p)++r;print $0,r;p=$0}' file
Output:
100 1
222 2
343 3
423 4
423 4
500 5
Explained:
$ awk '{ # using awk
if($0!=p) # if the value does not equal the previous value
++r # increase the rank
print $0,r # output value and rank
p=$0 # store value for next round
}' file
Could you please try the following:
awk 'prev==$0{--count} {print $0,++count;prev=$0}' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk code from here.
prev==$0 ##If variable prev equals the current line, do the following.
{
--count ##Decrease count by 1 so the increment below leaves the rank unchanged.
}
{
print $0,++count ##Printing the current line and the incremented count.
prev=$0 ##Storing the current line in prev for the next comparison.
}
' Input_file ##Mentioning Input_file name here.
Another awk:
$ awk '{print $1, a[$1]=a[$1]?a[$1]:++c}' file
100 1
222 2
343 3
423 4
423 4
500 5
Here the file does not need to be sorted; for example, after adding a new 423 at the end of the file:
$ awk '{print $1, a[$1]=a[$1]?a[$1]:++c}' file
100 1
222 2
343 3
423 4
423 4
500 5
423 4
Increment the rank counter c for each new value observed; otherwise use the value already registered in a for that key. Since c starts at zero, it is pre-incremented. This uses the same rank value for the same key regardless of its position.
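One caveat with the array approach on unsorted input is that numbers are handed out in order of first appearance, not in numeric order. A minimal sketch of the difference, using made-up input and the illustrative names first_seen and ranked:

```shell
# Hypothetical unsorted input.
nums=$(printf '%s\n' 300 100 300 200)

# First-appearance numbering: 300 is seen first, so it gets 1.
first_seen=$(printf '%s\n' "$nums" |
    awk '{ print $1, (a[$1] = a[$1] ? a[$1] : ++c) }')

# Sort numerically first, then bump the rank whenever the value changes.
ranked=$(printf '%s\n' "$nums" | sort -n |
    awk 'NR == 1 || $1 != prev { ++r } { print $1, r; prev = $1 }')

printf '%s\n' "$first_seen"
printf '%s\n' "$ranked"
```

If the rank is supposed to reflect the size of the value, sorting first is likely what you want; if it is only a group id, the array version is enough.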

process second column if first column matches

I just want the second column to be multiplied by exp(3) if the first column matches the parameter I define.
cat inputfile.i
100 2
200 3
300 1
100 5
200 2
300 3
I want the output to be:
100 2
200 60.25
300 1
100 5
200 40.17
300 3
I tried this code:
awk ' $1 == "200" {print $2*exp(3)}' inputfile
but nothing actually shows
You are not printing the unmatched lines. Also, you don't need to quote numbers.
$ awk '$1==200{$2*=exp(3)}1' file
100 2
200 60.2566
300 1
100 5
200 40.1711
300 3
Is there a difference between inputfile.i and inputfile?
Anyway, here is my solution for you:
awk '$1 == 200 {printf "%s %.2f\n",$1,$2*exp(3)};$1 != 200 {print $0}' inputfile.i
100 2
200 60.26
300 1
100 5
200 40.17
300 3
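Because the question talks about a parameter "I define", here is a sketch that passes the match value and the number format in with -v instead of hard-coding them. The variable names key and fmt are illustrative, not from the original answers.

```shell
tmp=$(mktemp)
printf '%s\n' '100 2' '200 3' '300 1' '100 5' '200 2' '300 3' > "$tmp"

# key: first-column value to match; fmt: printf-style format for the result.
# Matching rows get their second field replaced; the trailing 1 prints every row.
scaled=$(awk -v key=200 -v fmt='%.2f' '
    $1 == key { $2 = sprintf(fmt, $2 * exp(3)) }
    1
' "$tmp")

rm -f "$tmp"
printf '%s\n' "$scaled"
```

Note that assigning to $2 rebuilds the line with single spaces between fields, which is harmless for this input.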

Divide column values of different files by a constant then output one minus the other

I have two files of the form
file1:
#fileheader1
0 123
1 456
2 789
3 999
4 112
5 131
6 415
etc.
file2:
#fileheader2
0 442
1 232
2 542
3 559
4 888
5 231
6 322
etc.
How can I take the second column of each, divide it by a value, then subtract one from the other, and output a new third file with the new values?
I want the output file to have the form
#outputheader
0 123/c-442/k
1 456/c-232/k
2 789/c-542/k
etc.
where c and k are numbers I can plug into the script
I have seen this question: subtract columns from different files with awk
But I don't know how to use awk to do this by myself, does anyone know how to do this or could explain what is going on in the linked question so I can try to modify it?
I'd write:
awk -v c=10 -v k=20 ' ;# pass values to awk variables
/^#/ {next} ;# skip headers
FNR==NR {val[$1]=$2; next} ;# store values from file1
$1 in val {print $1, (val[$1]/c - $2/k)} ;# perform the calc and print
' file1 file2
output
0 -9.8
1 34
2 51.8
3 71.95
4 -33.2
5 1.55
6 25.4
etc. 0
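The question also asks for a header line and a third output file. A minimal sketch of that, assuming the header text is literally #outputheader and using a temporary directory with shortened sample files (the names dir and result are made up):

```shell
dir=$(mktemp -d)
printf '%s\n' '#fileheader1' '0 123' '1 456' '2 789' > "$dir/file1"
printf '%s\n' '#fileheader2' '0 442' '1 232' '2 542' > "$dir/file2"

awk -v c=10 -v k=20 '
    BEGIN { print "#outputheader" }          # write the new header once
    /^#/ { next }                            # skip the input headers
    FNR == NR { val[$1] = $2; next }         # first file: remember column 2 by key
    $1 in val { print $1, val[$1] / c - $2 / k }   # second file: compute and print
' "$dir/file1" "$dir/file2" > "$dir/file3"

result=$(cat "$dir/file3")
rm -rf "$dir"
printf '%s\n' "$result"
```

FNR == NR is only true while awk is reading the first file, which is how the script tells the two inputs apart.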

Assigning divisions and subdivisions (A.1 and A.1.1) to data in a file

I have an example input of the following data in a file.
start end
chr1 100 300
chr2 200 400
The "start" and "end" indicate the length of the region. So, for "chr1" the region length is 200. For "chr2" the length is 200.
I assigned each "chr" region a "name" using awk '{print $0 "\tA." NR}' to produce:
start end name
chr1 100 300 A.1
chr2 200 400 A.2
What I want to do next is to break chr1 into 2 parts by splitting the region into lengths of 100 each, and name the parts A.1.1 and A.1.2 (to indicate that they used to be one part that was split into two). The same goes for "chr2", so that they look like this:
start end name
chr1 100 200 A.1.1
chr1 201 300 A.1.2
chr2 200 300 A.2.1
chr2 301 400 A.2.2
So, my question is about the very last part. Would it be possible to use awk, or something that can work with awk (since I already use awk for the first part), to solve this? If so, how would you do that?
Thanks for the help guys.
Using the following input:
chr1 100 300
chr2 200 400
I have kept the script simple so that you can follow what exactly is being done. You can bypass the intermediate step you are doing as the following will get that done.
awk -v OFS="\t" '
{
offset = 0;
range = int(($3-$2)/100);
start = $2;
end = $3;
for (iter=1; iter<=range; iter++) {
print $1, start+offset, (iter==range?end:start+100), "A."NR"."iter;
offset = 1;
start+=100
}
}' file
chr1 100 200 A.1.1
chr1 201 300 A.1.2
chr2 200 300 A.2.1
chr2 301 400 A.2.2
For every line we reset offset to 0 and take start and end from columns 2 and 3. We calculate range, the number of parts, from start and end. We then loop range times, printing column 1, the start of the part, the end of the part (start+100), and the letter A followed by the line number and the iteration number.
After the first iteration we set offset to 1 so that the next part does not begin at the end of the previous one.
The ternary test (iter==range?end:start+100) checks whether we are on the last part; if so, we use the end number. This handles lengths that are not a multiple of 100, such as chr1 100 350, where the last part runs to 350. Note that a region shorter than 100, such as chr1 100 150, gives range 0 and prints nothing.
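As a variation on the loop above, the sketch below walks each region in steps and caps the last part at the region end, so a region shorter than one step still yields a single part. This is a different design choice from the script above (a remainder becomes its own part instead of being merged into the last one); the step variable and the sample rows chr2 200 350 and chr3 100 150 are assumptions for illustration.

```shell
regions=$(printf '%s\n' 'chr1 100 300' 'chr2 200 350' 'chr3 100 150')

split_regions=$(printf '%s\n' "$regions" | awk -v OFS='\t' -v step=100 '
{
    part = 0
    # lo walks from the region start in increments of step.
    for (lo = $2; lo < $3; lo += step) {
        part++
        # Cap the end of the part at the end of the region.
        hi = (lo + step < $3) ? lo + step : $3
        # Every part after the first starts one past the previous end.
        print $1, (part == 1 ? lo : lo + 1), hi, "A." NR "." part
    }
}')

printf '%s\n' "$split_regions"
```

For chr1 100 300 this prints the same two parts as the script above.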
$ awk '$1!=prev{++cnt} {print $0 "\tA." cnt "." ++seen[$1]; prev=$1}' file
chr1 100 200 A.1.1
chr1 201 300 A.1.2
chr2 200 300 A.2.1
chr2 301 400 A.2.2

Resources