Extract lines between A and (B or C), containing D - bash

I need to extract text between A and (B or C) patterns, that contains D pattern inside.
For example I have a file and need to extract all between "proc sql" and ("quit" or "run"), containing "index" inside.
proc sql
bla-bla-bla
index=10;
quit
proc sql
bla-bla-bla
quit;
proc sql
index=10;
run
Needed output:
proc sql
bla-bla-bla
index=10;
quit
proc sql
index=10;
run
By now I have such solution:
perl -0777 -lne 'print for grep /\bindex\b/i, /^proc sql.*?quit.*?\n/mgs' file
But it only extracts between "proc sql" and "quit" (NOT "quit" or "run"), containing "index". I don't know how to add OR operation.
If you can propose alternative awk/sed/grep solution - would be nice.

This will do as you ask. It accumulates into $block all the lines between the start and end patterns. When the end pattern is reached it prints the block if it contains index
use strict;
use warnings;
my $block;
while ( <DATA> ) {
my $state = /^proc sql\b/ .. /^(?:quit|run)\b/;
$block .= $_ if $state;
if ( $state =~ /E/ ) {
print $block, "\n" if $block =~ /^index=/m;
$block = '';
}
}
__DATA__
proc sql
bla-bla-bla
index=10;
quit
proc sql
bla-bla-bla
quit;
proc sql
index=10;
run
output
proc sql
bla-bla-bla
index=10;
quit
proc sql
index=10;
run

Given that your input file is named input.txt, this will solve it in awk:
awk 'BEGIN {
procDetected = 0;
indexDetected = 0;
}
/proc/ {
buffer = "";
indexDetected = 0;
procDetected = 1;
}
/index/ {
indexDetected = 1;
}
{
if (procDetected) {
# Add the line to the buffer.
buffer = buffer $0 "\n";
}
}
/run/ || /quit/ {
if (procDetected && indexDetected) {
print buffer;
}
procDetected = 0;
indexDetected = 0;
}' input.txt

Related

explode comma from an array value in codeigniter

I want to explode coma from an array value.
My code is.
$to_ids_string = "";
$to_id = $this->input->post('to');
for ($r = 0; $r < count($this->input->post('to')); $r++) {
if ($to_ids_string != "") {
$to_ids_string = $to_ids_string . "," . $to_id[$r];
} else {
$to_ids_string = $to_id[$r];
}
}
echo $to_ids_string;
$a = explode(',', $to_ids_string);
foreach ($a as $item) {
echo("<li>$item</li>");
exit;
}
when i echo $to_ids_string it will return 2,3 but when i loop in foreach it only return 2 not show 3.
Because of your exit, if you use exit like that, then it is the end of your program and it doesn't echo anymore.
You forget to remove exit; from foreach loop. When you write exit, execution of your code stops. Hence you are not getting desired output.
Happens due to exit.
Please remove exit from your code.

Alogrithm in using perl to find the value in array - Absolutely Interview Questions

I am asked to do the perl program to find a value(from user input) in array. If matched "its ok". If not matched, then check within the value in the index[0] to index[1] ... index[n]. So then if the value matched to the between two elements then report which is near to these elements might be index[0] or index[1].
Let you explain.
Given array : 10 15 20 25 30;
Get the value from user : 14 (eg.)
Hence 14 matched with in the two elements that is 10(array[0]) - 15(array[1])
Ultimately the check point is do not use more than one for loop and never use the while loop. You need to check one for loop and many of if conditions.
I got the output by which I did here is:
use strict;
use warnings;
my #arr1 = qw(10 15 20 25 30);
my $in = <STDIN>;
chomp($in);
if(grep /$in/, #arr1)
{ } #print "S: $in\n"; }
else
{
for(my $i=0; $i<scalar(#arr1); $i++)
{
my $j = $i + 1;
if($in > $arr1[$i] && $in < $arr1[$j])
{
#print "SN: $arr1[$i]\t$arr1[$j]\n";
my ($inc, $dec) = "0";
my $chk1 = $arr1[$i] + 1;
AGAIN1:
if($in == $chk1)
{ }
else
{ $chk1++; $inc++; goto AGAIN1; }
my $chk2 = $arr1[$j] - 1;
AGAIN2:
if($in == $chk2){ }
else
{ $chk2--; $dec++; goto AGAIN2; }
if($inc > $dec)
{ print "Matched value nearest to $arr1[$j]\n"; }
elsif($inc < $dec)
{ print "Matched value nearest to $arr1[$i]\n"; }
}
}
}
However my question is there a way in algorithm?. Hence if someone can help on this one and it would be appreciated.
Thanks in advance.
You seem determined to make this as complicated as possible :-)
Your specification isn't completely clear, but I think this does what you want:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my #array = qw[10 15 20 25 30];
chomp(my $in = <STDIN>);
if ($in < $array[0]) {
say "$in is less than first element in the array";
exit;
}
if ($in > $array[-1]) {
say "$in is greater than last element in the array";
exit;
}
for (0 .. $#array) {
if ($in == $array[$_]) {
say "$in is in the array";
exit;
}
if ($in < $array[$_]) {
if ($in - $array[$_ - 1] < $array[$_] - $in) {
say "$in is closest to $array[$_ - 1]";
} else {
say "$in is closest to $array[$_]";
}
exit;
}
}
say "Shouldn't get here!";
Using the helper functions any and reduce from the core module List::Util and the built in abs.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce any/;
my #arr1 = qw(10 15 20 25 30);
chomp(my $in = <STDIN>);
if (any {$in == $_} #arr1) {
print "$in is in the array\n";
}
else {
my $i = reduce { abs($in - $arr1[$a]) > abs($in - $arr1[$b]) ? $b : $a} 0 .. $#arr1;
print "$in is closest to $arr1[$i]\n";
}

Open Reading Frame program not printing Amino Acid Sequences

I am working on a program that will be able to read a gene sequence and give me the Open Reading Frames (ORF) and then the protein sequence of each ORF. I have already gotten the code to work for finding the ORFs- but no amino acids will print. I am using Perl on my Mac.
I would like to get the code to tell me the string of amino acids produced from the open reading frames.
Here is my code:
#!/usr/bin/perl
#ORF_Find.txt -> finds long orfs in a DNA sequence
open(CHROM, "chr03.txt"); #Open file chr03.txt containing yeastchrom. 3
$DNA = ""; #start with empty DNA sequence
$header = <CHROM>; #get header of sequence
#Read line from file, join to end of $DNA, repeat until end of file
while ($current_line = <CHROM>)
{
chomp($current_line); #remove newline from end of current_line
$DNA= $DNA . $current_line;
}
#length of DNA sequence
$DNA_length = length($DNA);
#flag for ORF finder
$inORF=0;
#number of ORFs found
$numORFs = 0;
#minimum length
$minimum_codons =100;
#search each reading frame
for ($frame =0; $frame<3; $frame++)
{
print "\nFinding ORFs in frame: +" . ($frame + 1) . "\n";
#search for sequence match and print position of match if found
for ($i =frame; $i<=($DNA_length-3);$i += 3)
{
#get current codon from sequence
$codon= substr ($DNA, $i, 3);
#if not in orf search for ATG, else search for stop codon
if ($inORF == 0)
{
#if current codon is ATG, start ORF
if ($codon eq "ATG")
{
$inORF = 1;
$ORF_length = 1;
$ORF_start = $i;
}
}
elsif($inORF ==1)
{
#if current codon is a stop codon, end ORF
if ($codon eq "TGA" || $codon eq "TAG" || $codon eq "TAA")
{
#if ORF has at least min number of codons,print location
if ($ORF_length >= $minimum_codons)
{
print "FOUND ORF AT POSITION $ORF_start,";
print "length = $ORF_length\n";
$numORFs++;
}
#reset ORF variables
$inORF = 0;
$ORF_length = 0;
}
else
{
#increase length of ORF by one codon
$ORF_length++;
}
}
}
}
#change T to U
$DNA =~ s/T/U/g;
#search each ORF
for ($i=$ORF_start; $i<=($ORF_length-3); $i+=3)
{
#get codon from each ORF
$aa_codon= substr($DNA, $i, 3);
#find amino acids
foreach ($aa_codon eq "ATG")
{
print ("M") #METHIONINE
}
foreach ($aa_codon =~/UU[UC]/)
{
print ("F") #PHENYLALANINE
}
foreach ($aa_codon =~/UU[AG]/ || $aa_codon=~/CU[UCAG]/)
{
print ("L"); #LEUCINE
}
foreach ($aa_codon =~/AU[UAC]/)
{
print ("I"); #ISOLEUCINE
}
foreach ($aa_codon =~/GU[UACG]/)
{
print ("V"); #VALINE
}
foreach ($aa_codon =~/UC[UCAG]/ || $aa_codon=~/AG[UC]/)
{
print ("S"); #SERINE
}
foreach ($aa_codon =~/CC[UCAG]/)
{
print ("P"); #PROLINE
}
foreach ($aa_codon =~/AC[UCAG]/)
{
print ("T"); #THREONINE
}
foreach ($aa_codon =~/GC[UCAG]/)
{
print ("A"); #ALANINE
}
foreach ($aa_codon =~/UA[UC]/)
{
print ("Y"); #TYROSINE
}
foreach ($aa_codon =~/CA[UC]/)
{
print ("H"); #HISTIDINE
}
foreach ($aa_codon =~/CA[AG]/)
{
print ("G"); #GLUTAMINE
}
foreach ($aa_codon =~/AA[UC]/)
{
print ("N"); #ASPARAGINE
}
foreach ($aa_codon =~/AA[AG]/)
{
print ("K"); #LYSINE
}
foreach ($aa_codon =~/GA[UC]/)
{
print ("D"); #ASPARTIC ACID
}
foreach ($aa_codon =~/GA[AG]/)
{
print ("E"); #GLUTAMIC ACID
}
foreach ($aa_codon =~/UG[UC]/)
{
print ("C"); #CYSTINE
}
foreach ($aa_codon eq "UGG")
{
print ("W"); #TRYPTOPHAN
}
foreach ($aa_codon =~/AG[AG]/ || $aa_codon =~/CG[UCAG]/)
{
print ("R"); #ARGININE
}
foreach ($aa_codon =~/GG[UCAG]/)
{
print ("G"); #GLYCINE
}
foreach ($aa_codon =~/UA[AG]/|| $aa_codon eq "UGA")
{
print ("*") #STOP
}
}
#if no ORFS found, print message
if ($numORFs ==0)
{
print ("NO ORFS FOUND\n");
}
else
{
print ("\n$num_ORFs ORFS WERE FOUND\n");
}
First, this question would probably be more appropriate for a forum such as seqAnswers or BioStars. That aside, writing your own 6-frame translation script is a complex task, especially if you want to account for IUPAC ambiguous nucleotides. There are already lots of scripts and tools out there that do this. Probably the easiest suggestion I can make is to use one of the existing tools. Try mine, for example:
https://github.com/hepcat72/sixFrameTranslation/archive/master.zip
My script wasn't public until just now. I have opened it up so that you can use it. Just run it to get a usage.
Other than that, if you want to get your version running properly, the first thing you can do is change your she-bang to:
#!/usr/bin/perl -w
Note the -w. Then, add this line to the top of your script:
use strict;
It will help you debug issues such as the missing dollar sign in one of your for loops:
for ($i =frame; $i<=($DNA_length-3);$i += 3)
It should be:
for ($i =$frame; $i<=($DNA_length-3);$i += 3)
And BTW, it doesn't matter that you're running perl on your Mac. It's just perl. "Mac perl" was a project to create a perl environment back in the pre-OS-X days.

How to handle error thrown by module in perl

I am using the module DBD::Oracle in perl to insert xml contents into oracle 11 g instance. While inserting some of the documents in my sample set the script fails as the module returns Unsupported named object type for bind parameter. I would like to handle this error and make the loop iteration to go on.
following is my code,
use strict;
use warnings;
use DBI;
use DBD::Oracle qw(:ora_session_modes);
use DBD::Oracle qw(:ora_types);
die("USAGE: $0 <input_directory>") unless ($#ARGV == 0);
my $directory=$ARGV[0];
my $dbh = DBI->connect('dbi:Oraclle:dbname',"username", "pass");
my $SQL;
opendir(IMD, $directory) || die ("Cannot open directory");
my #listOfFiles= readdir(IMD);
closedir(IMD);
my $xmltype_string;
my $xml;
my $i = 1;
foreach my $file(#listOfFiles)
{
unless($file eq '.' or $file eq '..')
{
print "inserting File no. $i \t $file .... \n";
{
local $/=undef;
open (FILE , "<" , "$directory/$file" );
$xml=<FILE>;
close (FILE);
}
$SQL="insert into sampleTable values ( :ind, :xml)";
my $sth =$dbh-> prepare($SQL);
$sth->bind_param(":xml" , $xml , { ora_type => ORA_XMLTYPE});
$sth->bind_param(":ind" , $i);
$sth-> execute();
$i++;
}
}
Am getting the error in bind param.
Error handling is usually done via the Try::Tiny module:
use Try::Tiny;
try {
something_that_could_die();
}
catch {
handle_error($_);
}
finally {
do_something_either_way();
}; # ← trailing semicolon not optional.
Both catch and finally are optional.

Why does password input in Perl using Win32::Console require pressing Enter twice?

I am fairly new to Perl and am having a hard time grasping the behavior of the following password input code snippet:
use Win32::Console;
my $StdIn = new Win32::Console(STD_INPUT_HANDLE);
my $Password = "";
$StdIn->Mode(ENABLE_PROCESSED_INPUT());
local $| = 1;
print "Enter Password: ";
while (my $Data = $StdIn->InputChar(1)) {
if ("\r" eq $Data ) {
last;
}
elsif ("\ch" eq $Data ) {
if ( "" ne chop( $Password )) {
print "\ch \ch";
}
next;
}
$Password .=$Data;
print "*";
}
while (my $Data = $StdIn->InputChar(1)) {
print "\nShow password? [y/n] ";
if ("n" eq $Data) {
last;
}
elsif ("y" eq $Data) {
print "\nPassword: $Password\n";
last;
}
}
Basically what happens is that the script prompts the user for a password and displays * for every character input as expected but requires Enter to be pressed twice to accept the input. However, if I delete the second while loop (or replace with a print $password statement) the input only requires one press of Enter.
I have also noticed that in the second while loop, which prompts the user to enter y or n (without needing to press Enter) if the user enters 'y' then the line Show password? [y/n] is repeated before displaying the password.
Some insight on this behavior would be appreciated.
The first Enter gets you out of the first while loop. The second while loop then waits for another character before displaying the prompt. You should display the prompt before asking for another character (and display it only once).
Breaking things into subroutines helps build on basic blocks.
use strict; use warnings;
use Win32::Console;
run();
sub run {
my $StdIn = Win32::Console->new(STD_INPUT_HANDLE);
$StdIn->Mode(ENABLE_PROCESSED_INPUT);
my $Password = prompt_password($StdIn, "Enter Password: ", '*');
if ( prompt_echo($StdIn, "\nShow password? [y/n] ") ) {
print "\nPassword = $Password\n"
}
return;
}
sub prompt_password {
my ($handle, $prompt, $mask) = #_;
my ($Password);
local $| = 1;
print $prompt;
$handle->Flush;
while (my $Data = $handle->InputChar(1)) {
last if "\r" eq $Data;
if ("\ch" eq $Data ) {
if ( "" ne chop( $Password )) {
print "\ch \ch";
}
next;
}
$Password .= $Data;
print $mask;
}
return $Password;
}
sub prompt_echo {
my ($handle, $prompt) = #_;
local $| = 1;
print $prompt;
$handle->Flush;
while (my $Data = $handle->InputChar(1)) {
return if "n" eq $Data;
return 1 if "y" eq $Data;
}
return;
}

Resources