Python - Validation of input arguments - validation

I would like to validate the input argument against the supported/accepted value. Let's say pt is an input argument and user can insert the value as a single text ("apple") or multiple one ( "banana apple").
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--pt", nargs='+', type=str,help="Supported pt: apple,banana,milk,pear")
args = parser.parse_args()
input=args.pt
print(input)
==============
[root#localhost home]# python3 test_args.py --pt "apple banana"
['apple banana']
[root#localhost home]#
Any idea how to achieve this?
Moreover, I want to convert this args.pt into a list, so that it can be utlized in a for loop.
BR, Ashish

Related

Change or personalize OID

Finally managed to have my RPI computer module to work with SNMP.
I have a script running that gives me one of my parameters and if I use query using SNMP I get the info back.
pi#raspberrypi:~ $ snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects | grep snmp_status
NET-SNMP-EXTEND-MIB::nsExtendCommand."snmp_status" = STRING: /home/pi/BDC/snmp_status.py
NET-SNMP-EXTEND-MIB::nsExtendArgs."snmp_status" = STRING:
NET-SNMP-EXTEND-MIB::nsExtendInput."snmp_status" = STRING:
NET-SNMP-EXTEND-MIB::nsExtendCacheTime."snmp_status" = INTEGER: 5
NET-SNMP-EXTEND-MIB::nsExtendExecType."snmp_status" = INTEGER: exec(1)
NET-SNMP-EXTEND-MIB::nsExtendRunType."snmp_status" = INTEGER: run-on-read(1)
NET-SNMP-EXTEND-MIB::nsExtendStorage."snmp_status" = INTEGER: permanent(4)
NET-SNMP-EXTEND-MIB::nsExtendStatus."snmp_status" = INTEGER: active(1)
NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."snmp_status" = STRING: 0
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."snmp_status" = STRING: 0
NET-SNMP-EXTEND-MIB::nsExtendOutNumLines."snmp_status" = INTEGER: 1
NET-SNMP-EXTEND-MIB::nsExtendResult."snmp_status" = INTEGER: 0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."snmp_status".1 = STRING: 0
If my unit is in alarm replies with
NET-SNMP-EXTEND-MIB::nsExtendOutLine."snmp_status".1 = STRING: 1
if not in alarm replies with
NET-SNMP-EXTEND-MIB::nsExtendOutLine."snmp_status".1 = STRING: 0
This status is stored in a file and it's parsed to the SNMP using a python script.
Now... next question.
The SNMP server gives me the following OID
.1.3.6.1.4.1.8072.1.3.2.3.1.2.11.115.110.109.112.95.115.116.97.116.117.115
and for each parameter it gives me one very different IOD.
How can I change this for something more easy... like the ones we see on MIB files?
If you are doing it in the command line, use
snmptranslate -m NET-SNMP-EXTEND-MIB .1.3.6.1.4.1.8072.1.3.2.3.1.2.11.115.110.109.112.95.115.116.97.116.117.115
To do it purely programmatically (i.e. without parsing command line output), you will need a way to parse the MIB files. I think such tools probably exist in Python, but I've never used them myself.
More often, I hard-code constants for the OIDs that I'm interested in, and manually inspect the MIB to know how to decode the index for each object. The OID you gave is an instance of NET-SNMP-EXTEND-MIB::nsExtendOutputFull, which belongs to nsExtendOutput1Entry. Normally the *Entry types will have an INDEX field telling you which field is used as the index of that table. In this case, it has an AUGMENTS field instead, which points you to nsExtendConfigEntry. The INDEX fornsExtendConfigEntry is nsExtendToken, which has a type of DisplayString (basically an OCTET STRING that is limited to human-readable characters).
Here's an example of how I would do this in Python -- you'll need pip install snmp:
from snmp.types import OID, OctetString
nsExtendOutputFull = OID.parse(".1.3.6.1.4.1.8072.1.3.2.3.1.2")
oid = OID.parse(".1.3.6.1.4.1.8072.1.3.2.3.1.2.11.115.110.109.112.95.115.116.97.116.117.115")
nsExtendToken = oid.extractIndex(nsExtendOutputFull, OctetString)
print(f"Index = {nsExtendToken}")
Here's the output:
Index = OctetString(b'snmp_status')

YAML mapping order not preserved when using alias and yamlordereddictloader loader

I want to load a YAML file into Python as an OrderedDict. I am using yamlordereddictloader to preserve ordering.
However, I notice that the aliased object is placed "too soon" in the OrderedDict in the output.
How can I preserve the order of this mapping when read into Python, ideally as an OrderedDict? Is it possible to achieve this result without writing some custom parsing?
Notes:
I'm not particularly concerned with the method used, as long as the end result is the same.
Using sequences instead of mappings is problematic because they can result in nested output, and I can't simply flatten everything (some nestedness is appropriate).
When I try to just use !!omap, I cannot seem to merge the aliased mapping (d1.dt) into the d2 mapping.
I'm in Python 3.6, if I don't use this loader or !!omap order is not preserved (apparently contrary to the top 'Update' here: https://stackoverflow.com/a/21912744/2343633)
import yaml
import yamlordereddictloader
yaml_file = """
d1:
id:
nm1: val1
dt: &dt
nm2: val2
nm3: val3
d2: # expect nm4, nm2, nm3
nm4: val4
<<: *dt
"""
out = yaml.load(yaml_file, Loader=yamlordereddictloader.Loader)
keys = [x for x in out['d2']]
print(keys) # ['nm2', 'nm3', 'nm4']
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"
Is it possible to achieve this result without writing some custom parsing?
Yes. You need to override the method flatten_mapping from SafeConstructor. Here's a basic working example:
import yaml
import yamlordereddictloader
from yaml.constructor import *
from yaml.reader import *
from yaml.parser import *
from yaml.resolver import *
from yaml.composer import *
from yaml.scanner import *
from yaml.nodes import *
class MyLoader(yamlordereddictloader.Loader):
def __init__(self, stream):
yamlordereddictloader.Loader.__init__(self, stream)
# taken from here and reengineered to keep order:
# https://github.com/yaml/pyyaml/blob/5.3.1/lib/yaml/constructor.py#L207
def flatten_mapping(self, node):
merged = []
def merge_from(node):
if not isinstance(node, MappingNode):
raise yaml.ConstructorError("while constructing a mapping",
node.start_mark, "expected mapping for merging, but found %s" %
node.id, node.start_mark)
self.flatten_mapping(node)
merged.extend(node.value)
for index in range(len(node.value)):
key_node, value_node = node.value[index]
if key_node.tag == u'tag:yaml.org,2002:merge':
if isinstance(value_node, SequenceNode):
for subnode in value_node.value:
merge_from(subnode)
else:
merge_from(value_node)
else:
if key_node.tag == u'tag:yaml.org,2002:value':
key_node.tag = u'tag:yaml.org,2002:str'
merged.append((key_node, value_node))
node.value = merged
yaml_file = """
d1:
id:
nm1: val1
dt: &dt
nm2: val2
nm3: val3
d2: # expect nm4, nm2, nm3
nm4: val4
<<: *dt
"""
out = yaml.load(yaml_file, Loader=MyLoader)
keys = [x for x in out['d2']]
print(keys)
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"
This has not the best performance as it basically copies all key-value pairs from all mappings once each during loading, but it's working. Performance enhancement is left as an exercise for the reader :).

Error remove over-representative sequences : TypeError: coercing to Unicode: need string or buffer, NoneType found

Hi I am running this python script to remove over-representative sequences from my fastq files, but I keep getting the error. I am new to bioinfomatics and have been following a fixed set of pipeline for sequence assembly. I wanted to remove over-representative sequences with this script
python /home/TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py -1 R1_1.fq -2 R1_2.fq
**Here is the error
Traceback (most recent call last):
File "TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py", line 46, in
leftseqs=ParseFastqcLog(opts.l_fastqc)
File "TranscriptomeAssemblyTools/RemoveFastqcOverrepSequenceReads.py", line 33, in ParseFastqcLog
with open(fastqclog) as fp:
TypeError: coercing to Unicode: need string or buffer, NoneType found**
Here is the script :
import sys
import gzip
from os.path import basename
import argparse
import re
from itertools import izip,izip_longest
def seqsmatch(overreplist,read):
flag=False
if overreplist!=[]:
for seq in overreplist:
if seq in read:
flag=True
break
return flag
def get_input_streams(r1file,r2file):
if r1file[-2:]=='gz':
r1handle=gzip.open(r1file,'rb')
r2handle=gzip.open(r2file,'rb')
else:
r1handle=open(r1file,'r')
r2handle=open(r2file,'r')
return r1handle,r2handle
def FastqIterate(iterable,fillvalue=None):
"Grab one 4-line fastq read at a time"
args = [iter(iterable)] * 4
return izip_longest(fillvalue=fillvalue, *args)
def ParseFastqcLog(fastqclog):
with open(fastqclog) as fp:
for result in re.findall('Overrepresented sequences(.*?)END_MODULE', fp.read(), re.S):
seqs=([i.split('\t')[0] for i in result.split('\n')[2:-1]])
return seqs
if __name__=="__main__":
parser = argparse.ArgumentParser(description="options for removing reads with over-represented sequences")
parser.add_argument('-1','--left_reads',dest='leftreads',type=str,help='R1 fastq file')
parser.add_argument('-2','--right_reads',dest='rightreads',type=str,help='R2 fastq file')
parser.add_argument('-fql','--fastqc_left',dest='l_fastqc',type=str,help='fastqc text file for R1')
parser.add_argument('-fqr','--fastqc_right',dest='r_fastqc',type=str,help='fastqc text file for R2')
opts = parser.parse_args()
leftseqs=ParseFastqcLog(opts.l_fastqc)
rightseqs=ParseFastqcLog(opts.r_fastqc)
r1_out=open('rmoverrep_'+basename(opts.leftreads).replace('.gz',''),'w')
r2_out=open('rmoverrep_'+basename(opts.rightreads).replace('.gz',''),'w')
r1_stream,r2_stream=get_input_streams(opts.leftreads,opts.rightreads)
counter=0
failcounter=0
with r1_stream as f1, r2_stream as f2:
R1=FastqIterate(f1)
R2=FastqIterate(f2)
for entry in R1:
counter+=1
if counter%100000==0:
print "%s reads processed" % counter
head1,seq1,placeholder1,qual1=[i.strip() for i in entry]
head2,seq2,placeholder2,qual2=[j.strip() for j in R2.next()]
flagleft,flagright=seqsmatch(leftseqs,seq1),seqsmatch(rightseqs,seq2)
if True not in (flagleft,flagright):
r1_out.write('%s\n' % '\n'.join([head1,seq1,'+',qual1]))
r2_out.write('%s\n' % '\n'.join([head2,seq2,'+',qual2]))
else:
failcounter+=1
print 'total # of reads evaluated = %s' % counter
print 'number of reads retained = %s' % (counter-failcounter)
print 'number of PE reads filtered = %s' % failcounter
r1_out.close()
r2_out.close()
Maybe you already solved it, I had the same error but now is running well.
Hope this help
(1) Files we need:
usage: RemoveFastqcOverrepSequenceReads.py [-h] [-1 LEFTREADS] [-2 RIGHTREADS] [-fql L_FASTQC] [-fqr R_FASTQC
(2) Specify fastqc_data.text files that are in the fastqc output, unzip the output directory
'-fql','--fastqc_left',dest='l_fastqc',type=str,help='fastqc text file for R1'
'-fqr','--fastqc_right',dest='r_fastqc',type=str,help='fastqc text file for R2'
(3) Keep the reads and the fastqc_data text in the same directory
(4) Specify the path location before each file
python RemoveFastqcOverrepSequenceReads.py
-1 ./bicho.fq.1.gz -2./bicho.fq.2.gz
-fql ./fastqc_data_bicho_1.txt -fqr ./fastqc_data_bicho_2.txt
(5) run! :)

ValueError: not enough values to pack (expected 2, got 1) syntax error

Hello I am taking an example from learn python the hard way and working to understand python. I have already caught one error in the book and realized you need () to print out strings. So maybe there could be some more mistakes in the syntax. when I ran this file I received an error that said
ValueError: not enough values to pack (expected 2, got 1) syntax error. And another general syntax error. So I can't set prompt to raw_input.
rom sys import argv
script, user_name = argv
prompt = '>'
print ("Hi %s, I'm the $s script.") % user_name, script
print ("I'd like to ask you a few questions.")
print ("Do you like %s?") % user_name
likes = raw_input(prompt)
print ("Where do you live %s") % user_name
lives = raw_input(prompt)
print """
(Alright, so you said %r about liking me.
You live in %r. Not sure where that is.
And you have a %r computer. Nice)
"""& (likes, lives, computer)
Is there a typo in the code example? $s instead of %s in the format string?
I suspect that your actual code has this:
print ("Hi %s, I'm the %s script.") % user_name, script
And you meant to write this:
print "Hi %s, I'm the %s script." % (user_name, script)
That is, print a single string which is the result of applying the format string to two arguments.

error in writing to a file

I have written a python script that calls unix sort using subprocess module. I am trying to sort a table based on two columns(2 and 6). Here is what I have done
sort_bt=open("sort_blast.txt",'w+')
sort_file_cmd="sort -k2,2 -k6,6n {0}".format(tab.name)
subprocess.call(sort_file_cmd,stdout=sort_bt,shell=True)
The output file however contains an incomplete line which produces an error when I parse the table but when I checked the entry in the input file given to sort the line looks perfect. I guess there is some problem when sort tries to write the result to the file specified but I am not sure how to solve it though.
The line looks like this in the input file
gi|191252805|ref|NM_001128633.1| Homo sapiens RIMS binding protein 3C (RIMBP3C), mRNA gnl|BL_ORD_ID|4614 gi|124487059|ref|NP_001074857.1| RIMS-binding protein 2 [Mus musculus] 103 2877 3176 846 941 1.0102e-07 138.0
In output file however only gi|19125 is printed. How do I solve this?
Any help will be appreciated.
Ram
Using subprocess to call an external sorting tool seems quite silly considering that python has a built in method for sorting items.
Looking at your sample data, it appears to be structured data, with a | delimiter. Here's how you could open that file, and iterate over the results in python in a sorted manner:
def custom_sorter(first, second):
""" A Custom Sort function which compares items
based on the value in the 2nd and 6th columns. """
# First, we break the line into a list
first_items, second_items = first.split(u'|'), second.split(u'|') # Split on the pipe character.
if len(first_items) >= 6 and len(second_items) >= 6:
# We have enough items to compare
if (first_items[1], first_items[5]) > (second_items[1], second_items[5]):
return 1
elif (first_items[1], first_items[5]) < (second_items[1], second_items[5]):
return -1
else: # They are the same
return 0 # Order doesn't matter then
else:
return 0
with open(src_file_path, 'r') as src_file:
data = src_file.read() # Read in the src file all at once. Hope the file isn't too big!
with open(dst_sorted_file_path, 'w+') as dst_sorted_file:
for line in sorted(data.splitlines(), cmp = custom_sorter): # Sort the data on the fly
dst_sorted_file.write(line) # Write the line to the dst_file.
FYI, this code may need some jiggling. I didn't test it too well.
What you see is probably the result of trying to write to the file from multiple processes simultaneously.
To emulate: sort -k2,2 -k6,6n ${tabname} > sort_blast.txt command in Python:
from subprocess import check_call
with open("sort_blast.txt",'wb') as output_file:
check_call("sort -k2,2 -k6,6n".split() + [tab.name], stdout=output_file)
You can write it in pure Python e.g., for a small input file:
def custom_key(line):
fields = line.split() # split line on any whitespace
return fields[1], float(fields[5]) # Python uses zero-based indexing
with open(tab.name) as input_file, open("sort_blast.txt", 'w') as output_file:
L = input_file.read().splitlines() # read from the input file
L.sort(key=custom_key) # sort it
output_file.write("\n".join(L)) # write to the output file
If you need to sort a file that does not fit in memory; see Sorting text file by using Python

Resources