Bash function to output a line that contains a searched string, with caveats - bash

TL;DR: grep for a string in a file and output the match that is closest before a given line number.
I'm trying to write a bash command line function to find a line within a file that contains a given string.
I don't know bash well enough to write it, though I've been trying.
The inputs to the function are:
The string to search.
The file to search.
A line number; the match for the string will be somewhere before this line.
The match will definitely exist within the file.
There may be multiple matches for the string within the file, but the line that I'm interested in is the one closest
to the line number, searching in reverse. For example, if the passed line number is 100 and there are matches on
lines 5, 30, and 77, then I'm interested in the match at line 77.
I want the output of the function to be the whole line that
contains the searched string. Example:
some foo value
some bar value
some zee value
and the search was for 'bar'. I want the output to be:
some bar value

head -n "$num" "$file" | tac | grep -m 1 -- "$string"
head -n to get first n lines
tac will reverse the order of lines returned by head to put the closest match at the top
grep -m 1 will return only the first line matching the string
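Since the question asks for a function, the pipeline above could be wrapped roughly as follows. This is a minimal sketch: the name `last_match_before` and the argument order are my own choices, `tac` assumes GNU coreutils, and `-F` (treat the search string literally rather than as a regex) is an assumption about what is wanted.

```shell
# last_match_before STRING FILE NUM  (hypothetical name and argument order)
# Prints the match for STRING closest to line NUM, looking only at the
# first NUM lines of FILE.
last_match_before() {
    # head keeps lines 1..NUM, tac reverses them so the nearest match
    # comes first, and grep -m 1 stops after that first match.
    head -n "$3" "$2" | tac | grep -m 1 -F -- "$1"
}
```

It would be called as `last_match_before bar file.txt 100`. Note that `head -n` includes line NUM itself, so pass NUM minus one if the match must be strictly before that line.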

Related

Unix command to cut and check a specific column in a file [duplicate]

I'm a novice using grep/egrep/awk and have not wrapped my head around regular expressions (bonus: a link to an introduction to regex for someone who has zero programming experience would be great).
My question revolves around matching a number range within a flat file. I have values which are ten digits. Telephone numbers...
I'm attempting to match a contiguous range of numbers, for example:
5551212041 through 5551212050 (41, 42, 43, 44, 45, 46, 47, 48, 49, and 50).
I have been using grep to match the first nine values like this:
grep 555121204[1-9]
The next step is to grep for the final number:
grep 5551212050
I believe I have not found the right regex to do this with a single grep.
Try the grep command below, which uses the -P (Perl-compatible regex) option:
grep -P '55512120(?:4[1-9]|50)' file
OR
grep -E '555121204[1-9]|5551212050' file
This prints the lines that contain a number in the range 5551212041 to 5551212050.
If you want to print only the number, add the -o option to the above grep command:
grep -oP '55512120(?:4[1-9]|50)' file
Example:
$ cat file
bar foo
5551212040 Don't match
5551212041 Match
5551212050 Match
foo bar
$ grep -P '55512120(?:4[1-9]|50)' file
5551212041 Match
5551212050 Match
For the general case, where the number range is not easy to express as a regex, Awk is probably better, as it has proper support for arithmetic.
awk '(($1 > 123) && ($1 < 1024)) || (($1 > 2048) && ($1 < 65536))' file
This prints the entire matching line; if you only want to print the second field, add { print $2 } etc.
You can learn enough Awk to figure this out on your own with a good tutorial and 30 minutes; see the Stack Overflow awk tag info page for pointers.
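Applied to the telephone-number range from the question, the arithmetic approach might look like the sketch below. The helper name `in_range` and its argument order are hypothetical, and it assumes the number is the first whitespace-separated field on each line.

```shell
# in_range FILE LOW HIGH  (hypothetical helper)
# Prints every line of FILE whose first field, read as a number,
# lies between LOW and HIGH inclusive.
in_range() {
    # Adding 0 forces a numeric comparison; non-numeric fields evaluate to 0,
    # so they only match if the requested range itself includes 0.
    awk -v lo="$2" -v hi="$3" '($1 + 0) >= (lo + 0) && ($1 + 0) <= (hi + 0)' "$1"
}
```

For example, `in_range file 5551212041 5551212050` covers the same range as the two grep patterns, without having to express it as a regex.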

search a string from file which contains only specific symbol and number in unix through Grep command

I want to find the lines in a file that contain only "+", digits, and letters. If a line contains any other character, the whole line should be discarded.
A1264
13255
1255+*
*6_54
54789+
Output should be
A1264
13255
54789+
Records 3 and 4 should not appear, as they contain other characters as well.
You can try something like:
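grep -E '^[a-zA-Z0-9+]+$'
This accepts only the letters a to z (lower and upper case), digits, and the + sign. Inside a bracket expression + needs no backslash; a backslash there would itself be accepted as a literal character.
If you need to allow other symbols, add them to the bracket expression: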
# grep -E '^[a-fA-F0-9©]+$' a1
A1264
13255
54789©
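The same whole-line filter can also be written with `-x` and a POSIX character class, which avoids spelling out the anchors and the letter ranges. This is only a sketch: the function name `only_alnum_plus` is hypothetical, and what `[[:alnum:]]` covers beyond ASCII depends on the locale.

```shell
# only_alnum_plus [FILE...]  (hypothetical helper; reads stdin if no file)
# Keeps lines made up solely of letters, digits, and "+".
only_alnum_plus() {
    # -x requires the pattern to match the entire line; inside the brackets
    # the + is a literal character and needs no backslash.
    grep -xE '[[:alnum:]+]+' "$@"
}
```

With the sample records from the question, `printf '%s\n' A1264 13255 '1255+*' '*6_54' '54789+' | only_alnum_plus` keeps A1264, 13255, and 54789+.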

Replace k-th to n-th characters in 1st line and last line using bash?

I want to replace some characters in the header and footer of a file. Say I want to replace the 5th to 9th characters: how do I do it? I need to use bash or a shell command.
I want to do something like this:
s="abcdabcd"
s=s[0]+"12"+s[3:]
>a12dabcd
I have the start and end of the span to replace and a replacement string of exactly the right length. I want to put the generated replacement back into the file.
Example:
I have this header:
HEADER 22aabbccdd23aabbccdd
I get these start and end indices : 2,10
I get this string: xyz56789
I want this: HEADER 22xyz5678923aabbccdd
to replace the existing 1st line in the file.
This can be done with Perl:
perl -i -lpe 'if ($. == 1 || eof) { substr($_, 1, 2) = "12" }' input.txt
-i: modify file in place
-l: automatically strip newlines from input and add them back on output
-p: iterate over lines of the input file and print them back out
-e CODE: what to do for each line
First we check whether the current line number ($.) is 1 (i.e. we're processing the first line of the file) or we have reached the end of the file (i.e. the line currently being processed is the last line of the file). If the condition is true, we take the substring of the current line ($_) starting from offset 1 of length 2 and set it to "12".
Simply with sed:
input.txt:
$ cat input.txt
22aabbccdd23aabbccdd
asasdfsdfd234234234234
$ sed -Ei '1 s/(..).{8}/\1xyz56789/' input.txt
Result:
22xyz5678923aabbccdd
asasdfsdfd234234234234
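To take the start and end indices and the replacement string as parameters, as the question describes, the sed approach could be generalised along these lines. This is a sketch under assumptions: `replace_span` is a hypothetical name, START and END are read as the 0-based, end-exclusive indices of the question's example (2,10), and the replacement is assumed not to contain `/`, `&`, or backslashes, which are special to sed.

```shell
# replace_span FILE START END REPLACEMENT  (hypothetical helper)
# Rewrites characters START..END-1 of the first and last line of FILE
# and prints the result to standard output.
replace_span() {
    start=$2
    len=$(( $3 - $2 ))
    # Lines that are neither the first nor the last branch straight to the
    # end of the script and are printed untouched, so the substitution
    # only ever sees the header and the footer.
    sed -E -e '1!{' -e '$!b' -e '}' \
        -e "s/^(.{$start}).{$len}/\\1$4/" "$1"
}
```

For the example header, `replace_span input.txt 2 10 xyz56789` turns 22aabbccdd23aabbccdd into 22xyz5678923aabbccdd. To write the result back, redirect to a temporary file and move it over the original.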

'sed' replace last pattern and delete other patterns

I want to replace only the last occurrence of "delay" with "ens_delay" in my file and delete all the earlier occurrences:
Input file:
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
delay=''
delay=''
delay=''
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Output file: (expected value)
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Here is my first command, but it only works if delay is on the last line:
sed -e '$,/delay/ s/delay/ens_delay/'
My second command deletes every line containing "delay"; even an "ens_delay" line would be deleted:
sed -i '/delay/d'
Thank you
This might work for you (GNU sed):
sed '/^delay=/,$!b;/^delay=/!H;//{x;s/^[^\n]*\n\?//;/./p;x;h};$!d;x;s/^/ens_/' file
Lines before the first line beginning delay= should be printed as normal. Otherwise, a line beginning delay= is stored in the hold space and subsequent lines that do not begin delay= are appended to it. Should the hold space already contain such lines, the first line is deleted and the remaining lines printed before the hold space is replaced by the current line. At the end of the file, the first line of the hold space is amended to prepend the string ens_ and then the whole of the hold space is printed.
You cannot do this kind of thing with sed. There is no way in sed to "look forward" and tell if there are more matches to the pattern. You can kind of look back, but that won't be sufficient to solve this problem.
This perl script will solve it:
#!/usr/bin/perl
use strict;
use warnings;
my ($seek, $replacement, $last, @new) = (shift, shift, 0);
open(my $fh, shift) or die $!;
my @l = <$fh>;
close($fh) or die $!;
foreach (reverse @l){
    if(/$seek/){
        if ($last++ == 0){
            s/$seek/$replacement/;
        } else {
            next;
        }
    }
    unshift(@new, $_);
}
print join "", @new;
Call like:
./script delay= ens_delay= inputfile
I chose to entirely eliminate lines which you intended to delete rather than collapse them in to a single blank line. If that is really required then it's a bit more complicated: the first such line in any consecutive set (or rather the last such) must be pushed on to the output list and you have to track whether this has just been done so you know whether to push the next time, too.
You could also solve this problem with awk, python, or any number of other languages. Just not sed.
Have this monster:
sed -e "1,$(expr $(sed -n '/^delay=/=' your_file.txt | tail -1) - 1)"'s/^delay=.*$//' \
-e 's/^delay=/ens_delay=/' your_file.txt
Here:
sed -n '/^delay=/=' your_file.txt | tail -1 returns the line number of the last occurrence of the pattern (let's name it X)
expr is used to compute X-1
"1,X-1"'[command]' means "perform this command between the first line and line X-1, inclusive" (I used double quotes to let the shell expansion happen)
's/^delay=.*$//' is the said [command]
-e 's/^delay=/ens_delay=/' is the next expression to perform (it only takes effect on the last delay= line)
Output:
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_hsm_backup_notification='YES'
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_hsm_backup_notification='YES'
If you want to delete the lines instead of leaving them blank:
sed -e "1,$(expr $(sed -n '/^delay=/=' your_file.txt | tail -1) - 1)"'{/^delay=.*$/d}' \
-e 's/^delay=/ens_delay=/' your_file.txt
As was mentioned elsewhere, sed can't know which occurrence of a substring is the last one. But awk can keep track of things in arrays. For example, the following will delete all duplicate assignments, as well as making your substitution:
awk 'BEGIN{FS=OFS="="} $1=="delay"{$1="ens_delay"} !($1 in a){o[i++]=$1} {a[$1]=$0} END{for(x=0;x<i;x++) printf "%s\n",a[o[x]]}' inputfile
Or, broken out for easier reading/comments:
BEGIN {
    FS=OFS="="              # set the field separator, to help isolate the left hand side
}
$1=="delay" {
    $1="ens_delay"          # your field substitution
}
!($1 in a) {
    o[i++]=$1               # if we haven't seen this variable, record its position (starting at 0)
}
{
    a[$1]=$0                # record the value of the last-seen occurrence of this variable
}
END {
    for (x=0;x<i;x++)       # step through the array,
        printf "%s\n",a[o[x]]   # printing the last-seen values, in the order
}                           # their variable was first seen in the input file.
You might not care about the order of the variables. If so, the following might be simpler:
awk 'BEGIN{FS=OFS="="} $1=="delay"{$1="ens_delay"} {o[$1]=$0} END{for(i in o) printf "%s\n", o[i]}' inputfile
This simply stores the last-seen line in an array whose key is the variable name, then prints out the content of the array in an unknown order.
Assuming I understand your specifications properly, this should do what you need. Given infile x,
$: last=$( grep -n delay x|tail -1|sed 's/:.*//' )
This greps the file for all lines containing delay and returns them with the line number prepended, followed by a colon. The tail -1 grabs the last of those lines, ignoring all the others. sed 's/:.*//' strips the colon and the actual line content, leaving only the number (here it was 14).
That all evaluates out to assign 14 as $last.
$: sed '/delay/ { '$last'!d; '$last' s/delay/ens_delay/; }' x
alpha_notify_teta=''
alpha_notify_check='YES'
text='CRDS'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
alpha_orange='YES'
alpha_orange_interval='300'
alpha_notification_level='ALL'
expression='YES'
ens_delay='9'
textfileooooop=''
alpha_enable='YES'
alpha_hostnames=''
Apologies for the ugly concatenation. It builds the script from the value of $last, so that sed sees this:
$: sed '/delay/ { 14!d; 14 s/delay/ens_delay/; }' x
sed reads leading numbers as line selectors. Here is what this script does:
First, sed automatically prints lines unless told not to, so by default it would just print every line. The script modifies that.
/delay/ {...} is a pattern-based record selector. It will apply the commands between the {} to all lines that match /delay/, which is why it doesn't need another grep - it handles that itself. Inside the curlies, the script does two things.
First, 14!d says (only if this line has delay, which it will) that if the line number is 14, do not (the !) delete the record. Since all the other lines with delay won't be line 14 (or whatever value of the last one the earlier command created), those will get deleted, which automatically restarts the cycle and reads the next record.
Second, if the line number is 14, then it won't delete, and so will progress to the s/delay/ens_delay/ which updates your value.
For all lines that don't match /delay/, sed just prints them as-is.
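The same two-step idea (first find the line number of the last match, then delete the other matches and rename that one) can also be sketched as a single awk call that reads the file twice. The function name `rename_last_delay` is hypothetical, and the pattern is anchored as `^delay=` on the assumption that only assignments at the start of a line should count.

```shell
# rename_last_delay FILE  (hypothetical helper)
# Drops every "delay=" line except the last one, which becomes "ens_delay=".
rename_last_delay() {
    # The file is passed twice: NR == FNR is true only during the first read.
    awk '
        NR == FNR { if (/^delay=/) last = FNR; next }  # pass 1: remember the last match
        /^delay=/ {
            if (FNR != last) next                      # pass 2: drop earlier matches
            sub(/^delay=/, "ens_delay=")               # rename the last one
        }
        { print }
    ' "$1" "$1"
}
```

Because the line number is worked out inside awk, no shell variable has to be spliced into the script.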

awk sub() of a substring by position

if I have the following:
>ID_10_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
(i.e. a fasta file!)
I want to be able to locate a substring based on the position (2nd element of the first line, i.e. 10) and take n positions around it, i.e. 5 positions:
EFGHIJKLMNO
and then substitute the position of interest with the 4th element of line 1 - i.e. X:
EFGHIXKLMNO
I can locate the substring, which is fine...but I am having trouble using the elements of line 1 to make the substitution in line 2. I have the following code:
#!/bin/bash
awk '
/>/{split($0,M,"_")}
!/^>/{split($1,N,"")
print M[1]"_"M[2]"_"M[3]"_"M[4]"\n"substr($1,M[2]-5,10)}
' $1
which gets me my substring.
Could someone help with my logic here to make the substitution. I gather I can use the sub() function and call the substring directly. My thinking is to use:
sub(regex/position,replacement,target)
which in my example would translate as:
sub(N[2],N[4],substr($1,M[2]-5,10))
Trying this results in
awk: cmd. line:5: print sub(M[2],M[4],substr($1,M[2]-10,20))}
awk: cmd. line:5: ^ sub third parameter is not a changeable object
So it seems I cannot call the substring explicitly, and I also have doubts about being able to use the position elements in the regex parameter.
Could someone help me with my code to form a general solution? My input is
>ID_10_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
and desired output is:
EFGHIXKLMNO
where I will have many inputs in the same file.
It must also hold true that, although I am looking for a substring consisting of 5 positions either side of the position given in line 1, if the position in line 1 is < 5, the substitution must be made in the specified position i.e.
>ID_2_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
AXCDEFG
It would be nice (but not essential) if the final substring is always a certain length, i.e. if I have specified a substring of 10 but the substitution is in position 2 as above, 8 characters are selected after the substitution to complete a substring of length 10.
Thanks
This awk script produces your desired output:
awk -F_ '/^>/{p=$2;s=$NF;next}{print substr($0,p-5,5) s substr($0,p+1,5)}' file
The first block saves your position p and replacement character s. The second prints the 5 characters before p, the replacement character, then the 5 characters after p.
Demo:
$ cat file
>ID_10_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
$ awk -F_ '/^>/{p=$2;s=$NF;next}{print substr($0,p-5,5) s substr($0,p+1,5)}' file
EFGHIXKLMNO
Here's an updated version of the code to deal with positions that are closer than 5 characters away from the start or end of the line. As it's slightly longer, I've used a script rather than a one-liner for clarity. You can run it like awk -f script.awk file:
BEGIN { FS="_" }
/^>/ {
    p=$2; c=$NF; next
}
{
    # s is the start of an 11-character window that contains position p
    if (p-5<1) s=1
    else if (p+5>length($0)) s=length($0)-10
    else s=p-5
    # characters before p, the replacement, then the characters after p
    print substr($0,s,p-s) c substr($0,p+1,10-p+s)
}
Testing it out:
$ cat file
>ID_2_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
>ID_10_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
>ID_22_J_X
ABCDEFGHIJKLMNOPQRSTUVQXYZ
$ awk -f script.awk file
AXCDEFGHIJK
EFGHIXKLMNO
PQRSTUXQXYZ
