I want to obtain a ping-like response from a Windows network location that has a Distributed File System architecture e.g.
path = r'\\path\to\some\shared\folder_x'
delay = ping_func(path)
print delay # return response in milliseconds ?
Once I have the host computer's name, I can easily ping the location.
I can determine the host name for folder_x by looking at the DFS tab in Windows Explorer.
How can I do this programmatically in Python?
Since you are using Windows, you can always install pywin32 and WMI to get the WMI functions. The code below should help you connect to the remote DFS. I can't test it, as I don't have Windows or DFS.
import wmi
c = wmi.WMI (ip, user="user", password="pwd")
for share in c.Win32_Share (Type=0):
    print share.Caption, share.Path
    for session in share.associators (
        print " ", session.UserName, session.ActiveTime
I've been able to directly call the NetDfsGetInfo function using Python's "ctypes" module.
Some stumbling points I had were understanding the C++/Python interface and variable marshalling; that's what dfs.argtypes helps with.
The C++ calls return their structures by placing pointers into a buffer you supply to the call. Using byref, you match the function prototype's LPBYTE *Buffer parameter.
Processing the output requires defining a "Structure" that matches the data the function returns, in this case DFS_INFO_3. The Python "buffer" variable is cast to a pointer to DFS_INFO_3, and the ctypes.Structure subclass defines the field names and the types the struct is built from. You can then access the fields by attribute name, e.g. dfs_info.EntryPath.
A pointer to a variable-length array (DFS_STORAGE_INFO) is returned too, and it can be accessed with the normal Python storage[i] syntax.
import ctypes as ct
from ctypes import wintypes as win

dfs = ct.windll.netapi32.NetDfsGetInfo
# Matches NetDfsGetInfo(DfsEntryPath, ServerName, ShareName, Level, Buffer)
dfs.argtypes = [
    win.LPWSTR,
    win.LPWSTR,
    win.LPWSTR,
    win.DWORD,
    ct.POINTER(win.LPBYTE),
]


class DFS_STORAGE_INFO(ct.Structure):
    """Contains information about a DFS root or link target in a DFS namespace."""

    _fields_ = [  # noqa: WPS120
        ("State", win.ULONG),
        ("ServerName", win.LPWSTR),
        ("ShareName", win.LPWSTR),
    ]


class DFS_INFO_3(ct.Structure):  # noqa: WPS114
    """Contains information about a Distributed File System (DFS) root or link."""

    _fields_ = [  # noqa: WPS120
        ("EntryPath", win.LPWSTR),
        ("Comment", win.LPWSTR),
        ("State", win.DWORD),
        ("NumberOfStorages", win.DWORD),
        # pointer to the variable-length array of link targets
        ("Storage", ct.POINTER(DFS_STORAGE_INFO)),
    ]


# ----- Function call -----
buffer = win.LPBYTE()  # allocate a LPBYTE type buffer to be used for return pointer
dret = dfs(r"\\something.else\here", None, None, 3, ct.byref(buffer))
if dret != 0:  # a non-zero return value is a Windows error code
    raise ct.WinError(dret)
# specify that buffer now points to a DFS_INFO_3 struct
dfs_info = ct.cast(buffer, ct.POINTER(DFS_INFO_3)).contents
for i in range(dfs_info.NumberOfStorages):
    storage = dfs_info.Storage[i]
    print(storage.ServerName, storage.ShareName)
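Once a server name is known (for example storage.ServerName from the loop above), a ping-like delay can be approximated by timing a TCP connection. This is only a sketch under assumptions: tcp_delay_ms is a hypothetical helper, port 445 (SMB) is assumed to be reachable on the file server, and a TCP handshake is not the same measurement as an ICMP ping.

```python
import socket
import time


def tcp_delay_ms(host, port=445, timeout=2.0):
    """Return the time in milliseconds needed to open a TCP connection.

    Hypothetical helper: it times name resolution plus the TCP handshake,
    not an ICMP echo. Raises OSError if the host cannot be reached
    within `timeout` seconds.
    """
    start = time.perf_counter()
    # create_connection resolves the name and performs the TCP handshake
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0
```

It could then be called as delay = tcp_delay_ms(storage.ServerName) for each storage entry.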
When preparing lectures, or conference presentations with beamer, I usually use layered drawings. Then for graphics included in consecutive slides ("frames" in beamer), I simply use different sets of layers.
For graphics created in IPE, I have created a dedicated expallviews.lua script.
Unfortunately, for graphics created with diagrams.net, run locally as drawio-desktop, no such automated export of individual layers exists. The only way is to manually select the visible layers in the GUI and then export the consecutive drawings to a set of PDF files.
Is there a more convenient method to solve that problem?
The described problem has been reported in issues 405 and 737 in the drawio-desktop repository.
After reviewing those issues, I found a method that changes the visibility of layers automatically (instead of manually via the GUI) and exports the resulting drawings to a set of PDF files. The method is described in a comment on issue 405. It uses a simple Python script:
"""
This script modifies the visibility of layers in the XML
file with diagram generated by drawio.
It works around the problem of lack of a possibility to export
only the selected layers from the CLI version of drawio.
Written by Wojciech M. Zabolotny 6.10.2022
(wzab01<at>gmail.com or wojciech.zabolotny<at>pw.edu.pl)
The code is published under LGPL V2 license
"""
from lxml import etree as let
import xml.etree.ElementTree as et
import xml.parsers.expat as pe
from io import StringIO
import os
import sys
import shutil
import zlib
import argparse
PARSER = argparse.ArgumentParser()
PARSER.add_argument("--layers", help="Selected layers, \"all\", comma separated list of integers or integer ranges like \"0-3,6,7\"", default="all")
PARSER.add_argument("--layer_prefix", help="Layer name prefix", default="Layer_")
PARSER.add_argument("--outfile", help="Output file", default="output.drawio")
PARSER.add_argument("--infile", help="Input file", default="input.drawio")
ARGS = PARSER.parse_args()
# Find all elements with 'value' starting with the layer prefix.
# Return tuples with the element and the rest of 'value' after the prefix.
def find_layers(el_start):
    res = []
    for el in el_start:
        val = el.get('value')
        if val is not None:
            if val.find(ARGS.layer_prefix) == 0:
                # This is a layer element. Add it, and its name
                # after the prefix to the list.
                res.append((el, val[len(ARGS.layer_prefix):]))
                continue
        # If it is not a layer element, scan its children
        res.extend(find_layers(el))
    return res
# Analyse the list of visible layers, and create the list
# of layers that should be visible. Customize this part
# if you want a more sophisticated method for selection
# of layers.
# Now only "all", comma separated list of integers
# or ranges of integers are supported.
def build_visible_list(layers):
    if layers == "all":
        return layers
    res = []
    for lay in layers.split(','):
        # Is it a range?
        s = lay.find("-")
        if s > 0:
            # This is a range
            first = int(lay[:s])
            last = int(lay[(s+1):])
            res.extend(range(first, last + 1))
        else:
            # This is a single layer number
            res.append(int(lay))
    return res
def is_visible(layer_tuple, visible_list):
    if visible_list == "all":
        return True
    if int(layer_tuple[1]) in visible_list:
        return True
    return False
INFILENAME = ARGS.infile
try:
    with open(INFILENAME, "r") as infile:
        EL_ROOT = et.fromstring(infile.read())
except et.ParseError as perr:
    # Handle the parsing error
    ROW, COL = perr.position
    print(
        "Parsing error "
        + str(perr.code)
        + "("
        + pe.ErrorString(perr.code)
        + ") in column "
        + str(COL)
        + " of the line "
        + str(ROW)
        + " of the file "
        + INFILENAME
    )
    sys.exit(1)
visible_list = build_visible_list(ARGS.layers)
layers = find_layers(EL_ROOT)
for layer_tuple in layers:
    if is_visible(layer_tuple, visible_list):
        print("set "+layer_tuple[1]+" to visible")
        layer_tuple[0].attrib['visible'] = "1"
    else:
        print("set "+layer_tuple[1]+" to invisible")
        layer_tuple[0].attrib['visible'] = "0"
# Now write the modified file
OUTFILENAME = ARGS.outfile
t = et.ElementTree(EL_ROOT)
with open(OUTFILENAME, 'w') as f:
    t.write(f, encoding='unicode')
The maintained version of that script, together with a demonstration of its use, is also available in my GitHub repository.
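The core idea of the script can be tried on a tiny in-memory document. The following is a simplified sketch, not the author's script: SAMPLE is a made-up stand-in for a .drawio file, parse_layers and set_visibility are hypothetical names, and unlike find_layers it visits every element with iter() rather than stopping at layer elements.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a .drawio file; real files carry more attributes.
SAMPLE = """<mxGraphModel><root>
<mxCell id="0"/>
<mxCell id="1" value="Layer_0" parent="0"/>
<mxCell id="2" value="Layer_1" parent="0"/>
<mxCell id="3" value="Layer_2" parent="0"/>
</root></mxGraphModel>"""


def parse_layers(spec):
    """Turn a spec such as "0-3,6,7" into a set of integers."""
    selected = set()
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        if sep:
            # "a-b" is an inclusive range
            selected.update(range(int(first), int(last) + 1))
        else:
            selected.add(int(first))
    return selected


def set_visibility(root, spec, prefix="Layer_"):
    """Set visible="1" or "0" on every element whose value starts with prefix."""
    wanted = parse_layers(spec)
    for el in root.iter():
        value = el.get("value", "")
        if value.startswith(prefix):
            number = int(value[len(prefix):])
            el.set("visible", "1" if number in wanted else "0")
    return root


root = set_visibility(ET.fromstring(SAMPLE), "0-1")
states = {el.get("value"): el.get("visible")
          for el in root.iter("mxCell") if el.get("value")}
print(states)
```

Running it once per slide with a different layer specification, and exporting each result, mirrors what the script does for real files.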
[Disclaimer: I posted this question 3 weeks ago on Biostars and have received no answers yet. I would really like to get some ideas and discussion towards a solution, so I am also posting it here.
Biostars post link: https://www.biostars.org/p/447413/]
For one of my PhD projects, I would like to access all variants in the ClinVar database that are at the same genomic position as the variant in each row of an input GSvar file. The language constraint is Python.
Up to now I have used the entrezpy module, specifically entrezpy.esearch.esearcher. For more on entrezpy, see: https://entrezpy.readthedocs.io/en/master/
Following the entrezpy docs, I used this guide to retrieve UIDs by the genomic position of a variant: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html. The code:
# first get UIDs for clinvar records of the same position
# credits: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html
chr = variants["chr"].split("chr")[1]
start, end = str(variants["start"]), str(variants["end"])
es = entrezpy.esearch.esearcher.Esearcher('esearcher', self.entrez_email)
genomic_pos = chr + "[chr]" + " AND " + start + ":" + end # + "[chrpos37]"
entrez_query = es.inquire(
    {'db': 'clinvar',
     'term': genomic_pos,
     'retmax': 100000,
     'retstart': 0,
     'rettype': 'uilist'})  # 'usehistory': False
entrez_uids = entrez_query.get_result().uids
Then I used Entrez from Biopython to get the available ClinVar records:
# process each VariationArchive of each UID
handle = Entrez.efetch(db='clinvar', id=current_entrez_uids, rettype='vcv')
clinvar_records = {}
tree = ET.parse(handle)
root = tree.getroot()
This approach works. However, it has two main drawbacks:
1. entrezpy fills up my log file by recording every interaction with Entrez, which makes the log file too big to be read by our hospital collaborator, who is the variant curator.
2. The entrezpy call entrez_query.get_result().uids returns all UIDs retrieved so far across all requests (one request per variant in the GSvar file), so the retrieval is space-inefficient: the entrez_uids list grows quickly as I process the variants of a GSvar file. The simple solution I have implemented is to check which UIDs are new in the current request and pass only those to Entrez.efetch(). However, I still need to keep all UIDs seen for previous variants in order to know which UIDs are new. I do this in code as follows:
# first snippet's first lines go here
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file += current_entrez_uids
Does anyone have suggestion(s) on how to address these two presented drawbacks?
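For the second drawback, the bookkeeping of already-seen UIDs can be sketched with a set, which makes each membership test constant time on average instead of scanning a growing list. This is only an illustration: UidTracker and new_uids are hypothetical names, not part of entrezpy or Biopython, and the sketch does not change what entrezpy itself returns.

```python
class UidTracker:
    """Remember UIDs seen so far and report only the new ones."""

    def __init__(self):
        self._seen = set()

    def new_uids(self, uids):
        """Return the UIDs not seen before, in their original order."""
        fresh = []
        for uid in uids:
            if uid not in self._seen:
                self._seen.add(uid)
                fresh.append(uid)
        return fresh


tracker = UidTracker()
# Simulated results of two consecutive requests; the second repeats the first.
first = tracker.new_uids([101, 102, 103])
second = tracker.new_uids([101, 102, 103, 104, 105])
print(first, second)
```

The list returned by new_uids plays the role of current_entrez_uids in the snippet above.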
Is it possible to read PDF/audio/video files (unstructured data) using Apache Spark?
For example, I have thousands of PDF invoices and I want to read data from them and perform some analytics on it. What steps must I take to process unstructured data?
Yes, it is. Use sparkContext.binaryFiles to load the files in binary format, then use map to convert each value to some other format, for example by parsing the binary content with Apache Tika or Apache POI.
val rawFile = sparkContext.binaryFiles(...)
val ready = rawFile.map { case (path, stream) => /* parse the stream here with another framework */ }
Importantly, the parsing must be done with another framework, such as those mentioned above. The function passed to map receives a (path, PortableDataStream) pair; calling open() on the stream yields a DataInputStream.
We had a scenario where we needed to use a custom decryption algorithm on the input files. We didn't want to rewrite that code in Scala or Python. Python-Spark code follows:
import socket
import subprocess

from pyspark import SparkContext, SparkConf, HiveContext, AccumulatorParam


def decryptUncompressAndParseFile(filePathAndContents):
    '''each line of the file becomes an RDD record'''
    global acc_errCount, acc_errLog
    proc = subprocess.Popen(['custom_decrypt_program','--decrypt'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (unzippedData, err) = proc.communicate(input=filePathAndContents[1])
    if len(err) > 0:  # problem reading the file
        acc_errCount.add(1)
        acc_errLog.add('Error: '+str(err)+' in file: '+filePathAndContents[0]+
                       ', on host: '+ socket.gethostname()+' return code:'+str(proc.returncode))
        return []  # this is okay with flatMap
    records = list()
    iterLines = iter(unzippedData.splitlines())
    for line in iterLines:
        #sys.stderr.write('Line: '+str(line)+'\n')
        values = [x.strip() for x in line.split('|')]
        records.append( (... extract data as appropriate from values into this tuple ...) )
    return records


class StringAccumulator(AccumulatorParam):
    ''' custom accumulator to hold strings '''
    def zero(self,initValue=""):
        return initValue
    def addInPlace(self,str1,str2):
        return str1.strip()+'\n'+str2.strip()


def main():
    # sc, sqlContext, args and ourSchema() are assumed to be set up elsewhere
    global acc_errCount, acc_errLog
    acc_errCount = sc.accumulator(0)
    acc_errLog = sc.accumulator('',StringAccumulator())
    binaryFileTup = sc.binaryFiles(args.inputDir)
    # use flatMap instead of map, to handle corrupt files
    linesRdd = binaryFileTup.flatMap(decryptUncompressAndParseFile, True)
    df = sqlContext.createDataFrame(linesRdd, ourSchema())
The custom string accumulator was very useful in identifying corrupt input files.
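The per-file function can be exercised without a cluster, since in PySpark binaryFiles yields (path, bytes) pairs. Below is a minimal sketch, assuming pipe-delimited UTF-8 text and no decryption step; parse_file, the sample paths and the three-field record layout are all hypothetical.

```python
def parse_file(path_and_contents):
    """Split one (path, bytes) pair into a list of tuples of stripped fields.

    Returns an empty list for undecodable input, so that flatMap simply
    contributes no records for that file.
    """
    path, contents = path_and_contents
    try:
        text = contents.decode("utf-8")
    except UnicodeDecodeError:
        return []  # corrupt file: no records
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        records.append(tuple(field.strip() for field in line.split("|")))
    return records


# Made-up inputs standing in for what binaryFiles would deliver.
good = ("hdfs://example/invoice-1.txt",
        b"1001 | ACME | 25.50\n1002 | Initech | 13.00\n")
bad = ("hdfs://example/broken.bin", b"\xf3\xff")
print(parse_file(good))
print(parse_file(bad))
```

With a SparkContext sc, such a function would be applied as sc.binaryFiles(input_dir).flatMap(parse_file), where input_dir is a placeholder for the input location.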
I would like to retrieve the text of a specific window. Using
twapi::get_window_text $handle
I get the caption of the window. But how can I get the actual content ? In C++ I was using
How can I use these raw Windows API functions from Tcl? For EM_GETLINE, for example, I have to specify the number of lines to be fetched and the buffer in which they are stored.
Could someone show me how to use raw Windows API functions from Tcl, or point me to a site where I can find examples? Thanks
You can send messages with Twapi's raw API. I'm not familiar with the exact details of how this message works, but you probably know that better than I do:
package require twapi
proc get_richedit_text {hwnd line} {
    set MAX_LEN 0x0100
    # You have to look up this value in the header.
    set EM_GETLINE 0x00C4
    set bufsize [expr {2 * ($MAX_LEN + 1)}]
    # yes, twapi has malloc.
    set szbuf [twapi::malloc $bufsize]
    # catch everything, so we can free the buffer.
    catch {
        # set the first word to the size. Whatever a word is.
        # I assume it is an int (type 1), but if it is an int64, use type 5, wchar is 3.
        # arguments to Twapi_WriteMemory: type pointer(void*) offset bufferlen value
        twapi::Twapi_WriteMemory 1 $szbuf 0 $bufsize $MAX_LEN
        # send the message. You don't have SendMessage, only SendMessageTimeout
        set ressize [twapi::SendMessageTimeout $hwnd $EM_GETLINE $line [twapi::pointer_to_address $szbuf] 0x0008 1000]
        return [twapi::Twapi_ReadMemory 3 $szbuf 0 [expr {$ressize * 2}]]
    } res opt
    # free the buffer.
    twapi::free $szbuf
    return -options $opt $res
}
I used some internal/undocumented twapi API, the only documentation is twapi's source code.
Today I worked on a simple script that checksums files with all available hashlib algorithms (md5, sha1, ...). I wrote and debugged it with Python 2, but when I decided to port it to Python 3 it just wouldn't work. The funny thing is that it works for small files but not for big files. I thought there was a problem with the way I was buffering the file, but the error message makes me think it is related to the way I am doing the hexdigest. Here is a copy of my entire script, so feel free to copy it, use it and help me figure out what the problem is. The error I get when checksumming a 250 MB file is
"'utf-8' codec can't decode byte 0xf3 in position 10: invalid continuation byte"
I googled it, but couldn't find anything that fixes it. Also, if you see ways to optimize it, please let me know. My main goal is to make it work 100% in Python 3. Thanks
import hashlib
import argparse

def hashFile(algorithm = "md5", filepaths=[], blockSize=4096):
    algorithmType = getattr(hashlib, algorithm.lower())() #Default: hashlib.md5()
    #Open file and extract data in chunks
    for path in filepaths:
        try:
            with open(path) as f:
                while True:
                    dataChunk = f.read(blockSize)
                    if not dataChunk:
                        break
                    algorithmType.update(dataChunk.encode())
                yield algorithmType.hexdigest()
        except Exception as e:
            print (e)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filepaths', nargs="+", help='Specified the path of the file(s) to hash')
    parser.add_argument('-a', '--algorithm', action='store', dest='algorithm', default="md5",
                        help='Specifies what algorithm to use ("md5", "sha1", "sha224", "sha384", "sha512")')
    arguments = parser.parse_args()
    algo = arguments.algorithm
    if algo.lower() in ("md5", "sha1", "sha224", "sha384", "sha512"):
        for hashValue in hashFile(algo, arguments.filepaths):
            print (hashValue)
    else:
        print ("Algorithm {0} is not available in this script".format(algo))

if __name__ == "__main__":
    main()
Here is the code that works in Python 2. I include it in case you want to use it without having to modify the one above.
import hashlib
import argparse

def hashFile(algorithm = "md5", filepaths=[], blockSize=4096):
    '''
    Hashes a file. In order to reduce the amount of memory used by the script, it hashes the file in chunks instead of putting
    the whole file in memory
    '''
    algorithmType = hashlib.new(algorithm) #getattr(hashlib, algorithm.lower())() #Default: hashlib.md5()
    #Open file and extract data in chunks
    for path in filepaths:
        try:
            with open(path, mode = 'rb') as f:
                while True:
                    dataChunk = f.read(blockSize)
                    if not dataChunk:
                        break
                    algorithmType.update(dataChunk)
                yield algorithmType.hexdigest()
        except Exception as e:
            print e

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filepaths', nargs="+", help='Specified the path of the file(s) to hash')
    parser.add_argument('-a', '--algorithm', action='store', dest='algorithm', default="md5",
                        help='Specifies what algorithm to use ("md5", "sha1", "sha224", "sha384", "sha512")')
    arguments = parser.parse_args()
    #Call generator function to yield hash value
    algo = arguments.algorithm
    if algo.lower() in ("md5", "sha1", "sha224", "sha384", "sha512"):
        for hashValue in hashFile(algo, arguments.filepaths):
            print hashValue
    else:
        print "Algorithm {0} is not available in this script".format(algo)

if __name__ == "__main__":
    main()
I haven't tried it in Python 3, but I get the same error in Python 2.7.5 for binary files (the only difference is that mine is with the ascii codec). Instead of encoding the data chunks, open the file directly in binary mode:
with open(path, 'rb') as f:
    while True:
        dataChunk = f.read(blockSize)
        if not dataChunk:
            break
        algorithmType.update(dataChunk)
    yield algorithmType.hexdigest()
Apart from that, I'd use the method hashlib.new instead of getattr, and hashlib.algorithms_available to check if the argument is valid.
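The two suggestions can be combined as in the sketch below. It is one possible arrangement rather than a drop-in replacement: hash_files and its block_size default are hypothetical, and it deliberately creates a fresh hash object for every file so that each digest covers only that file. Variable-length algorithms such as shake_128 need a length argument for hexdigest and are not handled here.

```python
import hashlib


def hash_files(algorithm, filepaths, block_size=65536):
    """Yield (path, hexdigest) for each file, reading it in binary chunks."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("Unsupported algorithm: {0}".format(algorithm))
    for path in filepaths:
        hasher = hashlib.new(algorithm)  # fresh hash object per file
        with open(path, "rb") as f:
            # iter() with a sentinel stops at the empty bytes object (EOF)
            for chunk in iter(lambda: f.read(block_size), b""):
                hasher.update(chunk)
        yield path, hasher.hexdigest()
```

Because the file is opened in binary mode, read returns bytes and no decoding or encoding step is involved, which is what avoids the codec error.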