How to take out certain elements from a pdb file - bash

I am trying to take out certain columns from a pdb file. I already have taken out all lines that start out with ATOM in my code. For some reason my sub functions are not working and I do not know where or how to call them.
My code is:
open (FILE, $ARGV[0])
    or die "Could not open file\n";
my @newlines;
while ( my $line = <FILE> ) {
    if ($line =~ m/^ATOM.*/) {
        push @newlines, $line;
    }
}
my $atomcount = @newlines;
#print "@newlines\n";
#print "$atomcount\n";
##############################################################
#This function will take out the element from each line
#The element is from column 77 and contains one or two letters
sub atomfreq {
    foreach my $record1(@newlines) {
        my $element = substr($record1, 76, 2);
        print "$element\n";
        return;
    }
}
################################################################
#This function will take out the residue name from each line
#The element is from column 18 and contains 3 letters
sub resfreq {
    foreach my $record2(@newlines) {
        my $residue = substr($record2, 17, 3);
        print "$residue\n";
        return;
    }
}

As @Ossip already said, you simply need to call your functions:
sub atomfreq {
...
}
sub resfreq {
...
}
atomfreq();
resfreq();
But I'm not sure whether these functions do what you intended, because the comments imply that they should print every $residue and $element from the @newlines array. You've put a return statement inside the foreach loop, which immediately returns from the whole function (and its loop), so only the first $residue or $element is printed. Because the functions aren't supposed to return anything, you can just drop that statement:
sub atomfreq {
    foreach my $record1(@newlines) {
        my $element = substr($record1, 76, 2);
        print "$element\n";
    }
}
sub resfreq {
    foreach my $record2(@newlines) {
        my $residue = substr($record2, 17, 3);
        print "$residue\n";
    }
}
atomfreq();
resfreq();
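Since the subs are called atomfreq and resfreq, the eventual goal may be a frequency count rather than a plain listing; that is only a guess on my part. Here is a rough sketch of the same fixed-column slicing, written in Python rather than Perl. It assumes the records follow the usual PDB column layout (residue name in columns 18-20, element in columns 77-78). The function names and the padded sample records are invented for illustration.

```python
from collections import Counter

def atom_records(lines):
    """Keep only the lines that start with ATOM."""
    return [line for line in lines if line.startswith("ATOM")]

def element_counts(records):
    """Count element symbols taken from columns 77-78 (0-based slice 76:78)."""
    return Counter(record[76:78].strip() for record in records)

def residue_counts(records):
    """Count residue names taken from columns 18-20 (0-based slice 17:20)."""
    return Counter(record[17:20].strip() for record in records)

def fake_record(residue, element):
    # Hypothetical helper: pads a record so both fields land in the right columns.
    return ("ATOM".ljust(17) + residue).ljust(76) + element.rjust(2)

lines = [
    fake_record("MET", "N"),
    fake_record("MET", "C"),
    "HETATM (ignored, does not start with ATOM)",
    fake_record("GLY", "C"),
]
records = atom_records(lines)
print(element_counts(records))
print(residue_counts(records))
```

The slices mirror substr($record, 76, 2) and substr($record, 17, 3) from the Perl code; strip() removes the padding around one-letter elements.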

You can just call them right under your other code like this:
atomfreq();
resfreq();

Related

Algorithm using Perl to find a value in an array - Absolutely Interview Questions

I have been asked to write a Perl program that finds a value (from user input) in an array. If the value matches an element, that is fine. If it does not match, the program should find the pair of adjacent elements (index[0] and index[1], and so on up to index[n]) that the value falls between, and report which of those two elements it is nearest to.
Let me explain.
Given array: 10 15 20 25 30
Value from the user: 14 (for example)
Here 14 falls between two elements, 10 (array[0]) and 15 (array[1]).
The constraint is that I may not use more than one for loop and must never use a while loop; the solution should be a single for loop with as many if conditions as needed.
Here is the code that gives me the expected output:
use strict;
use warnings;
my @arr1 = qw(10 15 20 25 30);
my $in = <STDIN>;
chomp($in);
if(grep /$in/, @arr1)
{ } #print "S: $in\n"; }
else
{
    for(my $i=0; $i<scalar(@arr1); $i++)
    {
        my $j = $i + 1;
        if($in > $arr1[$i] && $in < $arr1[$j])
        {
            #print "SN: $arr1[$i]\t$arr1[$j]\n";
            my ($inc, $dec) = "0";
            my $chk1 = $arr1[$i] + 1;
            AGAIN1:
            if($in == $chk1)
            { }
            else
            { $chk1++; $inc++; goto AGAIN1; }
            my $chk2 = $arr1[$j] - 1;
            AGAIN2:
            if($in == $chk2){ }
            else
            { $chk2--; $dec++; goto AGAIN2; }
            if($inc > $dec)
            { print "Matched value nearest to $arr1[$j]\n"; }
            elsif($inc < $dec)
            { print "Matched value nearest to $arr1[$i]\n"; }
        }
    }
}
However, my question is: is there a better algorithm for this? Any help would be appreciated.
Thanks in advance.
You seem determined to make this as complicated as possible :-)
Your specification isn't completely clear, but I think this does what you want:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my @array = qw[10 15 20 25 30];
chomp(my $in = <STDIN>);
if ($in < $array[0]) {
    say "$in is less than first element in the array";
    exit;
}
if ($in > $array[-1]) {
    say "$in is greater than last element in the array";
    exit;
}
for (0 .. $#array) {
    if ($in == $array[$_]) {
        say "$in is in the array";
        exit;
    }
    if ($in < $array[$_]) {
        if ($in - $array[$_ - 1] < $array[$_] - $in) {
            say "$in is closest to $array[$_ - 1]";
        } else {
            say "$in is closest to $array[$_]";
        }
        exit;
    }
}
say "Shouldn't get here!";
Here is a version using the helper functions any and reduce from the core module List::Util and the built-in abs:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce any/;
my @arr1 = qw(10 15 20 25 30);
chomp(my $in = <STDIN>);
if (any {$in == $_} @arr1) {
    print "$in is in the array\n";
}
else {
    my $i = reduce { abs($in - $arr1[$a]) > abs($in - $arr1[$b]) ? $b : $a} 0 .. $#arr1;
    print "$in is closest to $arr1[$i]\n";
}
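If the array is known to be sorted, as in the example, a binary search avoids scanning every element. This is a different technique from both answers above, sketched in Python rather than Perl. The function name nearest is made up, and the tie-breaking rule (a tie goes to the larger neighbour) is an assumption chosen to match the first answer's else branch.

```python
from bisect import bisect_left

def nearest(sorted_values, x):
    """Return the element of sorted_values closest to x (ties go to the larger one)."""
    i = bisect_left(sorted_values, x)      # first index whose element is >= x
    if i == 0:
        return sorted_values[0]            # x is at or below the first element
    if i == len(sorted_values):
        return sorted_values[-1]           # x is above the last element
    before, after = sorted_values[i - 1], sorted_values[i]
    return before if x - before < after - x else after

values = [10, 15, 20, 25, 30]
for query in (14, 12, 20, 5, 40):
    print(query, "is closest to", nearest(values, query))
```

An exact match needs no special case: when x is in the list, after equals x and the distance to it is zero.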

Open Reading Frame program not printing Amino Acid Sequences

I am working on a program that will be able to read a gene sequence and give me the Open Reading Frames (ORFs) and then the protein sequence of each ORF. I have already gotten the code to work for finding the ORFs, but no amino acids will print. I am using Perl on my Mac.
I would like to get the code to tell me the string of amino acids produced from the open reading frames.
Here is my code:
#!/usr/bin/perl
#ORF_Find.txt -> finds long orfs in a DNA sequence
open(CHROM, "chr03.txt"); #Open file chr03.txt containing yeastchrom. 3
$DNA = ""; #start with empty DNA sequence
$header = <CHROM>; #get header of sequence
#Read line from file, join to end of $DNA, repeat until end of file
while ($current_line = <CHROM>)
{
chomp($current_line); #remove newline from end of current_line
$DNA= $DNA . $current_line;
}
#length of DNA sequence
$DNA_length = length($DNA);
#flag for ORF finder
$inORF=0;
#number of ORFs found
$numORFs = 0;
#minimum length
$minimum_codons =100;
#search each reading frame
for ($frame =0; $frame<3; $frame++)
{
print "\nFinding ORFs in frame: +" . ($frame + 1) . "\n";
#search for sequence match and print position of match if found
for ($i =frame; $i<=($DNA_length-3);$i += 3)
{
#get current codon from sequence
$codon= substr ($DNA, $i, 3);
#if not in orf search for ATG, else search for stop codon
if ($inORF == 0)
{
#if current codon is ATG, start ORF
if ($codon eq "ATG")
{
$inORF = 1;
$ORF_length = 1;
$ORF_start = $i;
}
}
elsif($inORF ==1)
{
#if current codon is a stop codon, end ORF
if ($codon eq "TGA" || $codon eq "TAG" || $codon eq "TAA")
{
#if ORF has at least min number of codons,print location
if ($ORF_length >= $minimum_codons)
{
print "FOUND ORF AT POSITION $ORF_start,";
print "length = $ORF_length\n";
$numORFs++;
}
#reset ORF variables
$inORF = 0;
$ORF_length = 0;
}
else
{
#increase length of ORF by one codon
$ORF_length++;
}
}
}
}
#change T to U
$DNA =~ s/T/U/g;
#search each ORF
for ($i=$ORF_start; $i<=($ORF_length-3); $i+=3)
{
#get codon from each ORF
$aa_codon= substr($DNA, $i, 3);
#find amino acids
foreach ($aa_codon eq "ATG")
{
print ("M") #METHIONINE
}
foreach ($aa_codon =~/UU[UC]/)
{
print ("F") #PHENYLALANINE
}
foreach ($aa_codon =~/UU[AG]/ || $aa_codon=~/CU[UCAG]/)
{
print ("L"); #LEUCINE
}
foreach ($aa_codon =~/AU[UAC]/)
{
print ("I"); #ISOLEUCINE
}
foreach ($aa_codon =~/GU[UACG]/)
{
print ("V"); #VALINE
}
foreach ($aa_codon =~/UC[UCAG]/ || $aa_codon=~/AG[UC]/)
{
print ("S"); #SERINE
}
foreach ($aa_codon =~/CC[UCAG]/)
{
print ("P"); #PROLINE
}
foreach ($aa_codon =~/AC[UCAG]/)
{
print ("T"); #THREONINE
}
foreach ($aa_codon =~/GC[UCAG]/)
{
print ("A"); #ALANINE
}
foreach ($aa_codon =~/UA[UC]/)
{
print ("Y"); #TYROSINE
}
foreach ($aa_codon =~/CA[UC]/)
{
print ("H"); #HISTIDINE
}
foreach ($aa_codon =~/CA[AG]/)
{
print ("G"); #GLUTAMINE
}
foreach ($aa_codon =~/AA[UC]/)
{
print ("N"); #ASPARAGINE
}
foreach ($aa_codon =~/AA[AG]/)
{
print ("K"); #LYSINE
}
foreach ($aa_codon =~/GA[UC]/)
{
print ("D"); #ASPARTIC ACID
}
foreach ($aa_codon =~/GA[AG]/)
{
print ("E"); #GLUTAMIC ACID
}
foreach ($aa_codon =~/UG[UC]/)
{
print ("C"); #CYSTINE
}
foreach ($aa_codon eq "UGG")
{
print ("W"); #TRYPTOPHAN
}
foreach ($aa_codon =~/AG[AG]/ || $aa_codon =~/CG[UCAG]/)
{
print ("R"); #ARGININE
}
foreach ($aa_codon =~/GG[UCAG]/)
{
print ("G"); #GLYCINE
}
foreach ($aa_codon =~/UA[AG]/|| $aa_codon eq "UGA")
{
print ("*") #STOP
}
}
#if no ORFS found, print message
if ($numORFs ==0)
{
print ("NO ORFS FOUND\n");
}
else
{
print ("\n$num_ORFs ORFS WERE FOUND\n");
}
First, this question would probably be more appropriate for a forum such as seqAnswers or BioStars. That aside, writing your own 6-frame translation script is a complex task, especially if you want to account for IUPAC ambiguous nucleotides. There are already lots of scripts and tools out there that do this, so probably the easiest suggestion I can make is to use one of them. Try mine, for example:
https://github.com/hepcat72/sixFrameTranslation/archive/master.zip
My script wasn't public until just now; I have opened it up so that you can use it. Just run it to see its usage message.
Other than that, if you want to get your version running properly, the first thing you can do is change your shebang line to:
#!/usr/bin/perl -w
Note the -w. Then, add this line to the top of your script:
use strict;
It will help you debug issues such as the missing dollar sign in one of your for loops:
for ($i =frame; $i<=($DNA_length-3);$i += 3)
It should be:
for ($i =$frame; $i<=($DNA_length-3);$i += 3)
And BTW, it doesn't matter that you're running perl on your Mac. It's just perl. "Mac perl" was a project to create a perl environment back in the pre-OS-X days.
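As far as I can tell, the translation loop in the question also runs only once, after the search has finished, and it compares a sequence position ($i, starting at $ORF_start) against a codon count ($ORF_length), so it has little chance of walking any ORF. One way around that is to translate each ORF at the moment its stop codon is found. Here is a minimal sketch of that idea in Python rather than Perl, covering the three forward frames only and assuming the standard genetic code. The names find_orfs and CODON_TABLE are mine, and unknown codons are shown as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Return (frame, start, codon_count, protein) for each ORF in the forward frames."""
    orfs = []
    for frame in range(3):
        start = None                                  # None means "not inside an ORF"
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i                         # open a new ORF at this ATG
            elif codon in STOPS:
                codon_count = (i - start) // 3        # codons before the stop
                if codon_count >= min_codons:
                    protein = "".join(
                        CODON_TABLE.get(dna[j:j + 3], "X")
                        for j in range(start, i, 3)
                    )
                    orfs.append((frame + 1, start, codon_count, protein))
                start = None                          # close the ORF
    return orfs

for frame, start, count, protein in find_orfs("ATGGCCTTTTAA", min_codons=1):
    print(f"frame +{frame}, position {start}, {count} codons: {protein}")
```

A lookup table keyed on DNA codons also removes the need for the T-to-U substitution and the long chain of regex tests.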

Unable to increment last 2 digit of variable declared in file using script

I have the file given below:
elix554bx.xayybol.42> vi setup.REVISION
# Revision information
setenv RSTATE R24C01
setenv CREVISION X3
exit
My requirement is to read RSTATE from the file, increment the last 2 digits of RSTATE, and write the result back to the same setup.REVISION file.
Can you please suggest how to do this?
If you're using vim, then you can use the sequence:
/RSTATE/
$<C-a>:x
The first line is followed by a return and searches for RSTATE. The second line jumps to the end of the line and uses Control-a (shown as <C-a> above, and in the vim documentation) to increment the number. Repeat as often as you want to increment the number. The :x is also followed by a return and saves the file.
The only tricky bit is that the leading 0 on the number makes vim think the number is in octal, not decimal. You can override that by using :set nrformats= followed by return to turn off octal and hex; the default value is nrformats=octal,hex.
You can learn an awful lot about vim from the book Practical Vim: Edit Text at the Speed of Thought by Drew Neil. This information comes from Tip 10 in chapter 2.
Here's an awk one-liner type solution:
awk '{
    if ( $0 ~ /RSTATE/ ) {
        # match() sets RSTART and RLENGTH for the trailing run of digits
        match( $0, "[0-9]+$" );
        sub( "[0-9]+$",
             sprintf( "%0" RLENGTH "d", substr($0, RSTART, RLENGTH) + 1 ),
             $0 );
        print; next;
    } else { print };
}' setup.REVISION > tmp$$
mv tmp$$ setup.REVISION
Returns:
setenv RSTATE R24C02
setenv CREVISION X3
exit
This will handle transitions from two to three to more digits appropriately.
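For comparison, the same zero-padded increment can be sketched in Python. This assumes the line has the form "setenv RSTATE <value>" with the digits at the very end of the value. The function name bump_rstate is made up, and the sketch works on a string rather than rewriting the file.

```python
import re

# Group 1: everything up to the trailing digits; group 2: the trailing digits.
RSTATE_LINE = re.compile(r"^(setenv[ \t]+RSTATE[ \t]+\S*?)(\d+)[ \t]*$", re.MULTILINE)

def bump_rstate(text):
    """Increment the trailing digits of the RSTATE value, keeping the zero padding."""
    def repl(match):
        digits = match.group(2)
        bumped = str(int(digits) + 1).zfill(len(digits))  # 01 -> 02, 99 -> 100
        return match.group(1) + bumped
    return RSTATE_LINE.sub(repl, text)

sample = "# Revision information\nsetenv RSTATE R24C01\nsetenv CREVISION X3\nexit\n"
print(bump_rstate(sample))
```

Only the RSTATE line is touched; the CREVISION line keeps its X3 because the pattern is anchored on the RSTATE keyword.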
I wrote a class for you.
using System;
using System.IO;

class Reader
{
    // Returns the value that follows the RSTATE keyword, or "" if it is not found.
    public string ReadRs(string fileWithPath)
    {
        string keyword = "RSTATE";
        string rs = "";
        if (File.Exists(fileWithPath))
        {
            StreamReader reader = File.OpenText(fileWithPath);
            try
            {
                string line;
                // ReadLine() returns null at end of file, which ends the loop.
                while ((line = reader.ReadLine()) != null)
                {
                    int index = line.IndexOf(keyword);
                    if (index >= 0)
                    {
                        // Everything after the keyword, without surrounding spaces.
                        rs = line.Substring(index + keyword.Length).Trim();
                        break;
                    }
                }
            }
            catch (IOException)
            {
                //Error
            }
            finally
            {
                reader.Close();
            }
        }
        return rs;
    }
    // Returns the last two digits of the state, or -1 if they cannot be parsed.
    public int GetLastTwoDigits(string rsState)
    {
        int digits = -1;
        if (rsState.Length < 2)
        {
            return digits;
        }
        try
        {
            int length = rsState.Length;
            //Get the last two digits of the rsstate
            digits = Int32.Parse(rsState.Substring(length - 2, 2));
        }
        catch (FormatException)
        {
            //Format Error
            digits = -1;
        }
        return digits;
    }
}
You can use it like this:
Reader reader = new Reader();
string rsstate = reader.ReadRs("C://test.txt");
int digits = reader.GetLastTwoDigits(rsstate);

how to sort a list in Vala using custom Comparator

I'm trying to get a directory listing and sort it into last modified time order using Vala.
I've got the directory listing part into a List<FileInfo>.
But I cannot figure out how to sort the list.
This is done via the sort(CompareFunc<G> compare_func) method of the List class. See the GLib.List documentation for more details.
A basic example for strings would be:
list.sort((a,b) => {
return a.ascii_casecmp(b);
});
The return value of the function passed to sort() follows the same convention as the comparison function of ISO C90 qsort(3):
The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.
As you're interested in modification time, the FileAttribute you're looking for is TIME_MODIFIED, which you would get by calling the appropriate get_attribute_* method of FileInfo.
static int main (string[] args) {
var directory = File.new_for_path ("/var/db/pkg");
var glib_list = new GLib.List<FileInfo> ();
try {
var enumerator = directory.enumerate_children (FileAttribute.TIME_MODIFIED, FileQueryInfoFlags.NOFOLLOW_SYMLINKS);
FileInfo file_info;
while ((file_info = enumerator.next_file()) != null) {
glib_list.append(file_info);
}
} catch(Error e) {
stderr.printf ("Error: %s\n", e.message);
}
// Lets sort it.
CompareFunc<FileInfo> my_compare_func = (a, b) => {
long c = a.get_modification_time().tv_sec;
long d = b.get_modification_time().tv_sec;
return (int) (c > d) - (int) (c < d);
};
glib_list.sort(my_compare_func);
foreach (FileInfo file_info in glib_list) {
stdout.printf ("%s\n", file_info.get_name());
}
return 0;
}
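The (c > d) - (c < d) expression in my_compare_func is a compact way to produce -1, 0 or 1. To illustrate that three-way comparator contract outside Vala, here is a small Python sketch. The (name, mtime) tuples are invented sample data standing in for FileInfo objects, and compare_mtime is a made-up name.

```python
from functools import cmp_to_key

def compare_mtime(a, b):
    """Three-way comparison on the modification time stored at index 1."""
    # True and False behave as 1 and 0, so this yields -1, 0 or 1.
    return (a[1] > b[1]) - (a[1] < b[1])

# Hypothetical directory entries: (file name, modification time in seconds).
entries = [("b.txt", 300), ("a.txt", 100), ("c.txt", 200)]

# cmp_to_key adapts a qsort-style comparator for use with sorted().
oldest_first = sorted(entries, key=cmp_to_key(compare_mtime))
print([name for name, _ in oldest_first])
```

Swapping the two arguments inside the comparator reverses the order, which gives newest-first listings.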

How can this perl sub be optimised for speed?

The following perl sub is used to store arrays of hashes.
Each hash to be stored is first checked for uniqueness using a given key, if a hash exists on the array with the same key value then it's not stored.
How can this perl sub be optimised for speed?
Example use:
my @members;
...
$member= {};
$member->{'name'}='James';
hpush('name', \@members,$member);
The sub:
sub hpush {
    # push a set of key value pairs onto an array as a hash, if the key doesn't already exist
    if (@_ != 3) {
        print STDERR "hpush requires 3 args, ".@_." given\n";
        return;
    }
    my $uniq = shift;
    my $rarray = shift;
    my $rhash = shift;
    my $hash = ();
    #print "\nHash:\n";
    for my $key ( keys %{$rhash} ) {
        my $valuea = $rhash->{$key};
        #print "key: $key\n";
        #print "key=>value: $key => $valuea\n";
        $hash->{ $key} = $valuea;
    }
    #print "\nCurrent Array:\n";
    for my $node (@{$rarray}) {
        #print "node: $node \n";
        for my $key ( keys %{$node} ) {
            my $valueb = $node->{$key};
            #print "key=>value: $key => $valueb\n";
            if ($key eq $uniq) {
                #print "key=>value: $key => $valueb\n";
                if (($valueb =~ m/^[0-9]+$/) && ($hash->{$key} == $valueb)) {
                    #print "Not pushing i $key -> $valueb\n";
                    return;
                } elsif ($hash->{$key} eq $valueb) {
                    #print "Not pushing s $key -> $valueb\n";
                    return;
                }
            }
        }
    }
    push @{$rarray}, $hash;
    #print "Pushed\n";
}
Note that the Perl code isn't mine, and I'm a Perl beginner.
This code is rather... not very efficient. First, it copies $rhash to $hash with a for loop, for some reason. Then it loops through every hash key of each node instead of simply looking up the one key it cares about. Then it does two equivalent checks, apparently in an attempt to distinguish numbers from non-numbers and select the appropriate comparison (== or eq). This is all unnecessary.
The code below should be roughly equivalent. I've trimmed it down hard: each call still scans the array once, but it now does a single hash lookup and a single comparison per element.
use strict;
use warnings;
hpush('name', \@members,$member);
sub hpush {
    my ($uniq, $rarray, $rhash) = @_;
    for my $node (@{$rarray}) {
        if (exists $node->{$uniq}) {
            return if ($node->{$uniq} eq $rhash->{$uniq});
        }
    }
    push @{$rarray}, $rhash;
}
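If the array grows large, the linear scan itself becomes the cost, and keeping an index of the values already stored avoids it. That is a different technique from the trimmed sub above, sketched here in Python rather than Perl; in Perl the same idea would use a %seen hash. The class name UniqueRows is made up, and the sketch assumes every row contains the unique key.

```python
class UniqueRows:
    """A list of dicts that rejects rows whose key value was already stored."""

    def __init__(self, key):
        self.key = key
        self.rows = []
        self._seen = set()        # values of `key` stored so far

    def push(self, row):
        """Append row unless its key value is already present; report success."""
        value = str(row[self.key])    # compare as strings, like Perl's eq
        if value in self._seen:
            return False
        self._seen.add(value)
        self.rows.append(row)
        return True

members = UniqueRows("name")
print(members.push({"name": "James"}))   # first James is stored
print(members.push({"name": "James"}))   # duplicate is rejected
print(members.push({"name": "Anna"}))
print(len(members.rows))
```

The trade-off is extra memory for the set, plus the need to route every insertion through push so that the index stays in step with the list.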
