Strict searching against two different files - windows

I have two questions regarding the following code:
import subprocess
macSource1 = (r"\\Server\path\name\here\dhcp-dump.txt")
macSource2 = (r"\\Server\path\name\here\dhcp-dump-ops.txt")
with open (r"specific-pcs.txt") as file:
line = []
for line in file:
pcName = line.strip().upper()
with open (macSource1) as source1, open (macSource2) as source2:
items = []
for items in source1:
if pcName in items:
items_split = items.rstrip("\n").split('\t')
ip = items_split[0]
mac = items_split[4]
mac2 = ':'.join(s.encode('hex') for s in mac.decode('hex')).lower() # Puts the :'s between the pairs.
print mac2
print pcName
print ip
Firstly, as you can see, the script is searching for the contents of "specific-pcs.txt" against the contents of macSource1 to get various details. How do I get it to search against BOTH macSource1 & 2 (as the details could be in either file)??
And secondly, I need to have a stricter matching process as at the moment a machine called 'itroom02' will not only find it's own details, but also provide the details for another machine called '2nd-itroom02'. How would I get that?
Many thanks for your assistance in advance!
Chris.

Perhaps you should restructure it a bit more like this:
macSources = [ r"\\Server\path\name\here\dhcp-dump.txt",
r"\\Server\path\name\here\dhcp-dump-ops.txt" ]
with open (r"specific-pcs.txt") as file:
for line in file:
# ....
for target in macSources:
with open (target) as source:
for items in source:
# ....
There's no need to do e.g. line = [] immediately before you do for line in ...:.
As far as the "stricter matching" goes, since you don't give examples of the format of your files, I can only guess - but you might want to try something like if items_split[1] == pcName: after you've done the split, instead of the if pcName in items: before you split (assuming the name is in the second column - adjust accordingly if not).

Related

How to use entrezpy and Biopython Entrez libraries to access ClinVar data from genomic position of variant

[Disclaimer: I have published this question 3 weeks ago in biostars, with no answers yet. I really would like to get some ideas/discussion to find a solution, so I post also here.
biostars post link: https://www.biostars.org/p/447413/]
For one of my projects of my PhD, I would like to access all variants, found in ClinVar db, that are in the same genomic position as the variant in each row of the input GSVar file. The language constraint is Python.
Up to now I have used entrezpy module: entrezpy.esearch.esearcher. Please see more for entrezpy at: https://entrezpy.readthedocs.io/en/master/
From the entrezpy docs I have followed this guide to access UIDs using the genomic position of a variant: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html in code:
# first get UIDs for clinvar records of the same position
# credits: credits: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html
chr = variants["chr"].split("chr")[1]
start, end = str(variants["start"]), str(variants["end"])
es = entrezpy.esearch.esearcher.Esearcher('esearcher', self.entrez_email)
genomic_pos = chr + "[chr]" + " AND " + start + ":" + end # + "[chrpos37]"
entrez_query = es.inquire(
{'db': 'clinvar',
'term': genomic_pos,
'retmax': 100000,
'retstart': 0,
'rettype': 'uilist'}) # 'usehistory': False
entrez_uids = entrez_query.get_result().uids
Then I have used Entrez from BioPython to get the available ClinVar records:
# process each VariationArchive of each UID
handle = Entrez.efetch(db='clinvar', id=current_entrez_uids, rettype='vcv')
clinvar_records = {}
tree = ET.parse(handle)
root = tree.getroot()
This approach is working. However, I have two main drawbacks:
entrezpy fulls up my log file recording all interaction with Entrez making the log file too big to be read by the hospital collaborator, who is variant curator.
entrezpy function, entrez_query.get_result().uids, will return all UIDs retrieved so far from all the requests (say a request for each variant in GSvar), thus this space inefficient retrieval. That is the entrez_uids list will quickly grow a lot as I process all variants from a GSVar file. The simple solution that I have implenented is to check which UIDs are new from the current request and then keep only those for Entrez.fetch(). However, I still need to keep all seen UIDs, from previous variants in order to be able to know which is the new UIDs. I do this in code by:
# first snippet's first lines go here
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file += current_entrez_uids
Does anyone have suggestion(s) on how to address these two presented drawbacks?

Assign File.readlines[n] to variable

I'm reading a text file which has instructions on each line. I want to assign the text on each line to it's own variable. When I do this, the value returned is nil but when I output the value of readlines[n] it is correct.
e.g.
# Using the variable (incorrect result)
puts current_zone_size
>
e.g.
# Using readlines after variable assignment (incorrect result)
current_zone_size = instructions.readlines[0]
instructions.readlines[0]
>
e.g.
# Using readlines (correct result)
instructions.readlines[0]
> 8 10
This is my code:
instructions = File.open("operator-input.txt", "r")
current_zone_size = instructions.readlines[0]
rover_init_location_orientation = instructions.readlines[1]
rover_movements = instructions.readlines[2]
This is the text in the file being read:
8 10
1 2 E
MMLMRMMRRMML
Edit:
Is the file being closed? Is this the reason I can't assign values from File.readlines[n] to variables if I'm not doing the variable assignment from within a block?
Also, the file will only ever have three lines which is why I'm not using a loop to read the lines.
IO#readlines reads all the lines in the file. It should not come as a surprise that, in order to read all the lines in the file, it has to read the entire file.
So, where is the file pointer after you read the entire file? It is at the end of the file.
What happens if you call IO#readlines the second time, when the file pointer is still at the end of the file? It will start reading at the position of the file pointer, which means it will read an empty file.
Therefore, if you want to do it the way you are doing it, you need to reset the file pointer to the beginning of the file every time you call IO#readlines:
instructions = File.open('operator-input.txt', 'r')
current_zone_size = instructions.readlines[0]
instructions.pos = 0
rover_init_location_orientation = instructions.readlines[1]
instructions.pos = 0
rover_movements = instructions.readlines[2]
Note also that you are leaking resources: you never close the file, so it will only by closed at the earliest by Ruby when the instructions variable gets out of scope and the File instance gets garbage-collected, and at the latest by the OS automatically when your Ruby process exits, which may be a long time later. So, your code should rather be:
instructions = File.open('operator-input.txt', 'r')
current_zone_size = instructions.readlines[0]
instructions.pos = 0
rover_init_location_orientation = instructions.readlines[1]
instructions.pos = 0
rover_movements = instructions.readlines[2]
instructions.close
In general, it is much better to use the block form of File::open, which closes the file handle automatically for you at the end of the block, and also ensures that this happens even in the case of complex control flow, errors, or exceptions:
File.open('operator-input.txt', 'r') do |instructions|
current_zone_size = instructions.readlines[0]
instructions.pos = 0
rover_init_location_orientation = instructions.readlines[1]
instructions.pos = 0
rover_movements = instructions.readlines[2]
end
Note, however, that what you want to do is horribly inefficient: you read the entire file, then take the first line, throw the rest away. Then you read the entire file again, take the second line, throw the rest away. Then you read the entire file again, take the third line, throw the rest away.
It makes much more sense to read the entire file once and then take the lines you need. Something like this:
File.open('operator-input.txt', 'r') do |instructions|
current_zone_size, rover_init_location_orientation, rover_movements =
instructions.readlines
end
However, in the case where all you do is open the file, read all lines, then immediately close it again, you should rather use the IO::readlines method instead of IO#readlines, since it does all three things for you in one call:
current_zone_size, rover_init_location_orientation, rover_movements =
File.readlines('operator-input.txt')
I ended up reading all the lines at once, now I'm able to set each variable outside of a block. Like this:
instructions = File.readlines "operator-input.txt"
current_zone_size = instructions[0]
rover_init_location_orientation = instructions[1]
rover_movements = instructions[2]
e.g.
puts current_zone_size
> 8 10

Ruby Simple Read/Write File (Copy File)

I am practicing Ruby, and I am trying to copy contents from file "from" to file "to". can you tell me where I did it wrong?
thanks !
from = "1.txt"
to = "2.txt"
data = open(from).read
out = open(to, 'w')
out.write(data)
out.close
data.close
Maybe I am missing the point, but I think writing it like so is more 'ruby'
from = "1.txt"
to = "2.txt"
contents = File.open(from, 'r').read
File.open(to, 'w').write(contents)
Personally, however, I like to use the Operating systems terminal to do File operations like so. Here is an example on linux.
from = "1.txt"
to = "2.txt"
system("cp #{from} #{to}")
And for Windows I believe you would use..
from = "1.txt"
to = "2.txt"
system("copy #{from} #{to}")
Finally, if you were needing the output of the command for some sort of logging or other reason, I would use backticks.
#A nice one liner
`cp 1.txt 2.txt`
Here is the system and backtick methods documentation.
http://ruby-doc.org/core-1.9.3/Kernel.html
You can't perform data.close — data.class would show you that you have a String, and .close is not a valid String method. By opening from the way you chose to, you lost the File reference after using it with your read. One way to fix that would be:
from = "1.txt"
to = "2.txt"
infile = open(from) # Retain the File reference
data = infile.read # Use it to do the read
out = open(to, 'w')
out.write(data)
out.close
infile.close # And finally, close it

Import Multiple Images With Unknown Names

I need to import multiple images (10.000) in Matlab (2013b) from a subdirectory of the predefined directory of Matlab.
I don't know the exact names of the images.
I tried this:
file = dir('C:\Users\user\Documents\MATLAB\train');
NF = length(file);
for k = 1 : NF
img = imread(fullfile('C:\Users\user\Documents\MATLAB\train', file(k).name));
end
But it throws this error though I ran it with the Admin privileges:
Error using imread (line 347)
Can't open file "C:\Users\user\Documents\MATLAB\train\." for reading;
you may not have read permission.
The "dir" command returns the virtual directory elements "." (self directory) and ".." parent, as your error message shows.
A simple fix is to use a more specific dir call, based on your image types, perhaps:
file = dir('C:\Users\user\Documents\MATLAB\train\*.jpg');
Check the output of dir. The first two "files" are . and .., which is similar to the behaviour of the windows dir command.
file = dir('C:\Users\user\Documents\MATLAB\train');
NF = length(file);
for k = 3 : NF
img = imread(fullfile('C:\Users\user\Documents\MATLAB\train', file(k).name));
end
In R2013b you would have to do
file = dir('C:\Users\user\Documents\MATLAB\train\*.jpg');
If you have R2014b with the Computer Vision System Toolbox then you can use imageSet:
images = imageSet('C:\Users\user\Documents\MATLAB\train\');
This will create an object containing paths to all image files in the train directory, regardless of format. Then you can read the i-th image like this:
im = read(images, i);

'File path' use causing program exit in Python 3

I have downloaded a set of html files and saved the file paths which I saved them to in a .txt file. It has each path on a new line. I wanted to look at the first file in the list and then itterate through the whole list, opening the files and extracting data before going on to the next file.
My code works fine with a single path put in directly (for the first file) as:
path = r'C:\path\to\file.html'
and works if I itterate through the text file using:
file_list_fp = r'C:\path\to\file_with_pathlist.txt'
with open(file_list_fp, 'r') as file_list:
for filepath in file_list:
pathend = filepath.find('\n')
path = file[:pathend]
q = open(path, 'r').read()
but it fails when I try getting a single path using either:
with open(file_list_fp, 'r') as file_list:
path_n = file_list.readline()
end = path_n.find('\n')
path_bad1 = path_n[:end]
or:
with open(file_list_fp, 'r') as file_list:
path_bad2 = file_list.readline().split('\n')[0]
With these two my code exits just after that point. I can't figure out why. Any pointers very welcome. (I'm using Python 3.3.1 on windows.)

Resources