python win32com FileSystemObject failed on getting huge folder - windows

My test code is:
#!/usr/bin/env python
import win32com.client
def GetFolderSizeQuick(target_folder):
    fso = win32com.client.Dispatch("Scripting.FileSystemObject")
    fobj = fso.GetFolder(target_folder)
    return fobj.size
print(GetFolderSizeQuick("d:/pytools"))
print(GetFolderSizeQuick("d:/cygwin"))
The result is:
D:\>python a.py
160659697
Traceback (most recent call last):
  File "a.py", line 10, in <module>
    print(GetFolderSizeQuick("d:/cygwin"))
  File "a.py", line 7, in GetFolderSizeQuick
    return fobj.size
  File "D:\Applications\Python33\lib\site-packages\win32com\client\dynamic.py", line 511, in __getattr__
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
pywintypes.com_error: (-2147352567, '发生意外。', (0, None, None, None, 0, -2146828218), None)
The first call to GetFolderSizeQuick, on the d:/pytools folder, works; that folder is about 153 MB. But the second call fails. The folder d:/cygwin is about 12.6 GB. (The localized message '发生意外。' in the traceback means "Exception occurred.")
I am working on Windows 7 with the 32-bit version of Python 3.3.0, so I suspect the problem is whether the result is stored as a 32-bit or 64-bit value: a 32-bit int cannot hold a size of 12.6 GB.
What is the real problem here, and how can I fix it?

That's neither a directory-size problem nor a 32/64-bit problem.
It's not even a Python 2 vs. Python 3 problem.
Your error translates to "No access allowed!"
The simplest way to test this is to create a directory that only the owner is allowed to read and on which all others have no rights at all. Then pass this directory as input: you'll get the same error, even if the directory is empty. A good example is the local "C:\System Volume Information".
Digging a little deeper:
The error codes reported by Python are signed, whereas Microsoft documents them as unsigned, so they need to be converted before a lookup gives anything useful. Kudos to EB and Tim Peters, whose threads show the conversion; using it as in the example below, you'll get searchable error codes.
import win32com.client
import pywintypes
def get_folder_size(target_folder):
    fso = win32com.client.Dispatch("Scripting.FileSystemObject")
    fobj = fso.GetFolder(target_folder)
    return fobj.size
if __name__ == '__main__':
    try:
        get_folder_size('c:/system volume information')
    except pywintypes.com_error as e:
        print(e)  # debug, have to see which indices
        # e.args[0] is the outer error code, e.args[2][5] the code inside the exception info
        print(hex(e.args[0] + 2**32), hex(e.args[2][5] + 2**32))
Now search for both of the hex codes; the second one should lead to a lot of "you are not allowed to..." questions and answers.
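To make the conversion concrete without needing a COM call, here is a minimal sketch that applies it to the two codes from the traceback in the question. The helper name to_unsigned_hex is made up for this illustration; masking with 0xFFFFFFFF gives the same result as adding 2**32 to a negative 32-bit value.

```python
def to_unsigned_hex(code):
    """Return a signed 32-bit COM error code in the unsigned hex form used for lookups."""
    # Masking to 32 bits is equivalent to adding 2**32 when the code is negative.
    return hex(code & 0xFFFFFFFF)

# Both values are copied from the traceback in the question.
print(to_unsigned_hex(-2147352567))  # 0x80020009
print(to_unsigned_hex(-2146828218))  # 0x800a0046
# The low 16 bits of the inner code carry the component's own error number.
print(-2146828218 & 0xFFFF)          # 70
```

0x80020009 is the generic "exception occurred" result (DISP_E_EXCEPTION), so the inner code is the informative one: its low word, 70, is the scripting runtime's "Permission denied" error.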


Some .txt files will open with the wrong encoding

I have been working on a project for a while now, and I just reached another big step! However, for some .txt files that my program creates, it will give me this message:
File was loaded in the wrong encoding: 'UTF-8'
Most of the .txt files are fine, but it gives me this error for others at the top (I can still read them). Here is my code:
from socket import *
import codecs
import subprocess
ipa = '192.168.1.' # These are the first 3 digits of the IP addresses that the program looks for.
def is_up(adr):
    s = socket(AF_INET, SOCK_STREAM)
    s.settimeout(0.01)
    if not s.connect_ex((adr, 135)):
        s.close()
        return 1
    else:
        s.close()
def main():
    for i in range(1, 256):
        adr = ipa + str(i)
        if is_up(adr):
            with codecs.open("" + getfqdn(adr) + ".txt", "w+", 'utf-8-sig') as f:
                subprocess.run('ipconfig | findstr /i "ipv4"', stdout=f, shell=True, check=True)
                subprocess.run('wmic/node:'+adr+' product get name, version, vendor', stdout=f, shell=True, check=True)
main()
# Most code provided by Ashish Jain
Unfortunately I don't think I'm allowed to say exactly which files are giving me trouble, because I might be distributing information that someone can use for malicious intent.
Since your script only writes to the files, there's no reason to open them in w+ mode, which also enables reading. Opening the files in w mode is enough.
Furthermore, the commands that your script runs evidently do not output utf-8-sig-encoded text, hence the error. In most cases, writing with the default encoding (by not specifying one) will suffice.
Lastly, you're missing a space between wmic and /node: in the second command you run.
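If relying on the default encoding is not enough, a different approach (not what the script above does) is to capture each command's raw bytes, decode them explicitly, and write everything out in a single known encoding. The sketch below assumes Python 3.5+; run_and_save and its parameters are names invented for this example, and the right encoding for any given Windows tool is something you would have to check yourself.

```python
import locale
import subprocess

def run_and_save(command, path, encoding=None, shell=False):
    """Run a command, decode its output explicitly, and save it as UTF-8."""
    # Assumption: unless told otherwise, the child writes in the locale's
    # preferred encoding. Pass encoding=... if a tool uses something else.
    encoding = encoding or locale.getpreferredencoding(False)
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            shell=shell, check=True)
    # errors="replace" keeps going on undecodable bytes instead of raising.
    text = result.stdout.decode(encoding, errors="replace")
    # newline="" writes the line endings exactly as the command produced them.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text
```

Because the file is always written as UTF-8, an editor never has to guess, whatever the commands emitted. To append the output of a second command to the same file, you could collect the returned text from each call and write it once.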

Python multiprocessing stdin input

All code written and tested on python 3.4 windows 7.
I was designing a console app and had a need to use stdin from command-line (win os) to issue commands and to change the operating mode of the program. The program depends on multiprocessing to deal with cpu bound loads to spread to multiple processors.
I am using stdout to monitor that status and some basic return information and stdin to issue commands to load different sub-processes based on the returned console information.
This is where I found a problem. I could not get the multiprocessing module to accept stdin input, although stdout was working just fine. I then found some related help on Stack Overflow, so I tested it and found that with the threading module this all works great, except that all output to stdout is paused until stdin is cycled, due to the GIL lock with stdin blocking.
I will say I have been successful with a workaround implemented with msvcrt.kbhit(). However, I can't help but wonder if there is some sort of bug in the multiprocessing feature that keeps stdin from reading any data. I tried numerous ways and nothing worked when using multiprocessing. I even attempted to use Queues, but I did not try pools or any other methods from multiprocessing.
I also did not try this on my linux machine since I was focusing on trying to get it to work.
Here is simplified test code that does not function as intended (reminder this was written in Python 3.4 - win7):
import sys
import time
from multiprocessing import Process
def function1():
    while True:
        print("Function 1")
        time.sleep(1.33)
def function2():
    while True:
        print("Function 2")
        c = sys.stdin.read(1) # Does not appear to be waiting for read before continuing loop.
        sys.stdout.write(c) #nothing in 'c'
        sys.stdout.write(".") #checking to see if it works at all.
        print(str(c)) #trying something else, still nothing in 'c'
        time.sleep(1.66)
if __name__ == "__main__":
    p1 = Process(target=function1)
    p2 = Process(target=function2)
    p1.start()
    p2.start()
Hopefully someone can shed light on whether this is intended functionality, if I didn't implement it correctly, or some other useful bit of information.
Thanks.
When you take a look at Python's implementation of multiprocessing.Process._bootstrap(), you will see this:
if sys.stdin is not None:
    try:
        sys.stdin.close()
        sys.stdin = open(os.devnull)
    except (OSError, ValueError):
        pass
You can also confirm this by using:
>>> import sys
>>> import multiprocessing
>>> def func():
...     print(sys.stdin)
...
>>> p = multiprocessing.Process(target=func)
>>> p.start()
>>> <_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>
And reading from os.devnull immediately returns an empty result:
>>> import os
>>> f = open(os.devnull)
>>> f.read(1)
''
You can work around this by using open(0). The documentation for open() says:
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.)
And about file descriptor 0:
File descriptors are small integers corresponding to a file that has been opened by the current process. For example, standard input is usually file descriptor 0, standard output is 1, and standard error is 2:
>>> def func():
...     sys.stdin = open(0)
...     print(sys.stdin)
...     c = sys.stdin.read(1)
...     print('Got', c)
...
>>> multiprocessing.Process(target=func).start()
>>> <_io.TextIOWrapper name=0 mode='r' encoding='UTF-8'>
Got a
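If reopening file descriptor 0 in the child does not suit your setup, another route (a different technique from open(0)) is to let only the parent read stdin and forward each command to the child through a multiprocessing.Queue. This is a rough sketch under that assumption; worker, forward_lines, main and the STOP sentinel are names made up for the example, and the "handled ..." reply stands in for whatever the child would really do.

```python
import multiprocessing
import sys

STOP = None  # sentinel marking the end of a queue's traffic

def worker(commands, results):
    """Child side: consume commands from a queue until STOP arrives."""
    while True:
        command = commands.get()
        if command is STOP:
            results.put(STOP)  # tell the parent there are no more replies
            break
        results.put("handled " + command)

def forward_lines(stream, commands):
    """Parent side: read lines from a stream and queue the non-empty ones."""
    for line in stream:
        line = line.strip()
        if line:
            commands.put(line)
    commands.put(STOP)  # end of input reached

def main():
    commands = multiprocessing.Queue()
    results = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(commands, results))
    child.start()
    forward_lines(sys.stdin, commands)  # only the parent touches stdin
    # Drain the replies before joining so the child is never blocked on a full queue.
    for reply in iter(results.get, STOP):
        print(reply)
    child.join()
```

To run it, call main() from under an if __name__ == "__main__": guard, which multiprocessing requires on Windows. Note that this version only prints replies after stdin reaches end of file; an interactive tool would read replies in a separate thread or poll the results queue.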

sublimeREPL for python not showing complete Traceback

I'm trying to run my Python code using SublimeREPL's "Python - RUN current file" command.
It works fine if my program has no problems, but when it does, it doesn't show the complete traceback (I don't get to see the "Repl Closed" message), and the output is not even consistent. Below are two runs of exactly the same file (I'm not posting images because Stack Overflow doesn't allow me to, as I'm new):
First Run:
------- Ford Fulkerson -------
Traceback (most recent call last):
  File "Ford-Fulkerson.py", line 282, in <module>
    D = FordFulkersonGeneral(G, ['A'], ['E'], None, restricciones)
  File "Ford-Fulkerson.py", line 71, in FordFulk|
Second Run:
------- Ford Fulkerson -------
Traceback (most recent call last):
  File "Ford-Fulkerson.py", line 282, in <module>
    D = FordFulkersonGeneral(G, ['A'], ['E'
I was using Anaconda's (64-bit) Python distribution. Then I changed to a regular (32-bit) Python install (and made sure the Windows path was all right), and even there it's not working.
If I run my code from the Windows terminal I get the full traceback (the actual error is not important; I know how to fix it):
------- Ford Fulkerson -------
Traceback (most recent call last):
  File "Ford-Fulkerson.py", line 282, in <module>
    D = FordFulkersonGeneral(G, ['A'], ['E'], None, restricciones)
  File "Ford-Fulkerson.py", line 71, in FordFulkersonGeneral
    G.deleteNode(v)
  File "C:\Users\myusername\Documents\Learning\Anßlisis de Redes\Ford-Fulkerson\mvr_graph.py", line 196, in deleteNode
    self.nodes[node].delete(n)
AttributeError: 'dict' object has no attribute 'delete'
Edit:
I've found the answer by posting this question. The problem was in the path of the file: it contains an accent in the word "Análisis". I changed that and now it's working.
It used to work when I had my OS language set to Spanish. I set my new installation to English and now it was giving me trouble. I really didn't expect that, shame on you Windows x(.
I don't really know the protocol, so I will just leave this question here in case anyone else runs into this obscure thing.

Win32com Save PDF to XML with Acrobat Pro X > com_error "-2147467263, 'Not implemented'" [duplicate]

Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Windows XP SP3
Python 2.7 pywin32-218
Adobe Acrobat X 10.0.0
I want to use Python to automate Acrobat Pro to export a PDF to XML. I already tried it manually using the 'Save As' dialog box from the running program and now want to do it via a Python script. I have read many pages including parts of the Adobe SDK, SDK Forum, VB Forums and am having no luck.
I read Blish's problem here: "Not implemented" Exception when using pywin32 to control Adobe Acrobat
And this page: timgolden python/win32_how_do_i/generate-a-static-com-proxy.html
I am missing something. My code is:
import win32com.client
import win32com.client.makepy
win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')
adobe = win32com.client.DispatchEx('AcroExch.App')
avDoc = win32com.client.DispatchEx('AcroExch.AVDoc')
avDoc.Open('C:\Documents and Settings\PC\Desktop\a_PDF.pdf', 'C:\Documents and Settings\PC\Desktop')
pdDoc = avDoc.GetPDDoc()
jObject = pdDoc.GetJSObject()
jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")
The full error is:
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")
  File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 511, in __getattr__
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
I'm guessing it has to do with makepy, but I don't understand how to use it in my code.
I pulled this line from my code and got the same error when I ran it:
win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')
I then changed these two lines from 'DispatchEx' to 'Dispatch' and got the same error:
adobe = win32com.client.Dispatch('AcroExch.App')
avDoc = win32com.client.Dispatch('AcroExch.AVDoc')
When I run the Dispatches by themselves and then call them back I get:
>>> adobe = win32com.client.DispatchEx('AcroExch.App')
>>> adobe
<win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroApp instance at 0x18787784>
>>> avDoc = win32com.client.Dispatch('AcroExch.AVDoc')
>>> avDoc
<win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroAVDoc instance at 0x20365224>
Does this mean I should make only one call to Dispatch? I pulled:
adobe = win32com.client.Dispatch('AcroExch.App')
and got the same error.
This Adobe site says:
AVDoc
Product availability: Acrobat, Reader
Platform availability: Macintosh, Windows, UNIX
Syntax
typedef struct _t_AVDoc* AVDoc;
A view of a PDF document in a window. There is one AVDoc per displayed document. Unlike a PDDoc, an AVDoc has a window associated with it.
acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/API_References/Acrobat_API_Reference/AV_Layer/AVDoc.html#AVDocSaveParams
The PDDoc page says:
A PDDoc object represents a PDF document. There is a correspondence between a PDDoc and an ASFile. Also, every AVDoc has an associated PDDoc, although a PDDoc may not be associated with an AVDoc.
/9.1/Acrobat9_1_HTMLHelp/API_References/Acrobat_API_Reference/PD_Layer/PDDoc.html
I tried the following code and also got the same error:
import win32com.client
import win32com.client.makepy
pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')
pdDoc.Open('C:\Documents and Settings\PC\Desktop\a_PDF.pdf')
jObject = pdDoc.GetJSObject()
jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")
Same error if I change:
pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')
to
pdDoc = win32com.client.gencache.EnsureDispatch('AcroExch.PDDoc')
like here: win32com.client.Dispatch works but not win32com.client.gencache.EnsureDispatch
user2993272, you were almost there: with just one more line, the code you have should work.
I'm going to attempt to answer in the same spirit as your question and provide as much detail as I can.
This thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html
I admit that the post is not the easiest to find (perhaps Google scores it low based on the age of the content?).
Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
For completeness, this piece of code should get the job done and not require you to manually patch dynamic.py (snippet should run pretty much out of the box):
# Gets all files under ROOT_INPUT_PATH matching INPUT_FILE_EXTENSION and tries to extract text from them into ROOT_OUTPUT_PATH, using the same filename as the input file but with the extension replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT
import winerror
# try importing scandir and, if found, use it, as it is considerably faster than the stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk
import fnmatch
import sys
import os
ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"
def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat
    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert(ret) # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?
    pdDoc = avDoc.GetPDDoc()
    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))
    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()
    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")
    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files is reached (for example 50 PDFs)
    del pdDoc
if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>
    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'
    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]
    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))
    # Magic piece of code that should get everything working for you!
    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (no "global" statement is needed at module level; append() mutates the list that win32com.client.dynamic uses)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)
    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)
I have tested this on WinPython x64 2.7.6.3, Acrobat X Pro

Using windows command line to run python script with passing of url argument

I am writing this Python program which extracts some information from a webpage and I am required to run it using the windows command line. But I could not even print the original html page as a string. I am using Python 2.7
Here is my Python script:
#sys.py
import sys
import urllib
url = sys.argv[1]
f = urllib.urlopen(url)
print f.read()
When I try to run it from windows command line with: C...>sys.py "www.marinetraffic.com/ais/shipdetails.aspx?mmsi=311389000"
Errors appear as follows:
Traceback (most recent call last):
  File "C:\...\sys.py", line 14, in <module>
    f = urllib.urlopen(url)
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 463, in open_file
    return self.open_local_file(url)
  File "C:\Python27\lib\urllib.py", line 87, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the path specified: 'www.marinetraffic.com\\ais\\shipdetails.aspx?mmsi=311389000'
There should not be any problem with the Python set up under the windows environment because I can still print out the sys.argv list as the arguments are passed in the command line.
Is it the problem with the 'urllib' library?
Is there any another way to run this using windows command line?
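I think the problem is the way you specify your URL: it needs to have the http:// part at the start. Without a scheme, urllib.urlopen treats the argument as a local file path, which is why the traceback ends in open_local_file.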
It works for me when I type
python sys.py http://www.google.com/
but fails with
python sys.py www.google.com
(Note that I am using linux with python 2.7 but I think it may be the same problem for you)
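If you would rather have the script tolerate a missing scheme than rely on whoever types the command, a small guard in front of the urlopen call is one option. This is only a sketch: ensure_scheme is a hypothetical helper, and defaulting to http is an assumption that may not suit every site.

```python
def ensure_scheme(url, default_scheme="http"):
    """Prefix a URL with a scheme when the caller left it out."""
    # Anything that already contains "://" is passed through unchanged.
    if "://" in url:
        return url
    return default_scheme + "://" + url

print(ensure_scheme("www.google.com"))          # http://www.google.com
print(ensure_scheme("http://www.google.com/"))  # http://www.google.com/
```

In the script above you would then call urllib.urlopen(ensure_scheme(sys.argv[1])) instead of passing the raw argument.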
