Google Drive API: list files with no parent - google-api

The files in the Google domain that I administer have gotten into a bad state; there are thousands of files residing in the root directory. I want to identify these files and move them to a folder underneath "My Drive".
When I use the API to list the parents for one of these orphaned files, the result is an empty array. To determine if a file is orphaned, I can iterate over all the files in my domain, and request the list of parents for each. If the list is empty, I know that the file is orphaned.
But this is hideously slow.
Is there any way to use the Drive API to search for files that have no parents?
The "parents" field for the q parameter doesn't seem to be useful for this, as it's only possible to specify that the parents list contains some ID.
Update:
I'm trying to find a quick way to locate items that are truly at the root of the document hierarchy. That is, they are siblings of "My Drive", not children of "My Drive".
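For reference, the brute-force check amounts to something like the following (a sketch using the Drive v3 Python client for brevity; service stands for an authorized Drive service object, while my actual code is the Java equivalent):
page_token = None
while True:
    resp = service.files().list(
        fields='nextPageToken, files(id, name, parents)',
        pageToken=page_token,
    ).execute()
    for f in resp.get('files', []):
        # A true orphan has no 'parents' at all: it is a sibling of
        # "My Drive", not a child of it.
        if not f.get('parents'):
            print('Orphan:', f['name'], f['id'])
    page_token = resp.get('nextPageToken')
    if not page_token:
        break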

In Java:
List<File> result = new ArrayList<File>();
Files.List request = drive.files().list();
request.setQ("'root' in parents");
FileList files = request.execute();
for (com.google.api.services.drive.model.File element : files.getItems()) {
    System.out.println(element.getTitle());
}
'root' is the parent folder if the file or folder sits directly in the root of My Drive.

Brute force, but simple, and it works:
do {
    try {
        FileList files = request.execute();
        for (File f : files.getItems()) {
            // An orphan has an empty parents list.
            if (f.getParents().size() == 0) {
                System.out.println("Orphan found:\t" + f.getTitle());
                orphans.add(f);
            }
        }
        request.setPageToken(files.getNextPageToken());
    } catch (IOException e) {
        System.out.println("An error occurred: " + e);
        request.setPageToken(null);
    }
} while (request.getPageToken() != null
        && request.getPageToken().length() > 0);

The documentation recommends the following query: is:unorganized owner:me. (Note that this is a query for the Drive web UI search box, not for the API's q parameter.)

The premise is:
List all files.
If a file has no 'parents' field, it is an orphan.
So the script deletes it.
Before you start, you need:
To create an OAuth client ID.
To add the '../auth/drive' scope to your OAuth client and validate your app with Google, so that you have delete permissions.
Ready-to-copy-paste demo:
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']


def callback(request_id, response, exception):
    if exception:
        print("Exception:", exception)


def main():
    """Shows basic usage of the Drive v3 API to delete orphan files."""

    # --- CHECK CREDENTIALS ---
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    # --- OPEN CONNECTION ---
    service = build('drive', 'v3', credentials=creds)

    page_token = ""
    orphans = []
    page_size = 100
    batch_counter = 0

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    while True:
        # List one page of files.
        r = service.files().list(pageToken=page_token,
                                 pageSize=page_size,
                                 fields="nextPageToken, files").execute()
        page_token = r.get('nextPageToken')
        files = r.get('files', [])
        # Filter orphans.
        # NOTE: if a file has no 'parents' field, it is an orphan.
        for file in files:
            if 'parents' in file:
                print("File with a parent found.")
            else:
                print("Orphan file found.")
                orphans.append(file['id'])
        # Exit condition
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    while len(orphans) > 0:
        # Recompute the batch size on every round, so the last (smaller)
        # batch does not index past the end of the list.
        batch_size = min(len(orphans), 100)
        batch = service.new_batch_http_request(callback=callback)
        for i in range(batch_size):
            print("File with id {0} queued for deletion.".format(orphans[0]))
            batch.add(service.files().delete(fileId=orphans[0]))
            del orphans[0]
        batch.execute()
        batch_counter += 1
        print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                             batch_size))


if __name__ == '__main__':
    main()
This method won't delete files in the root directory, as those have the 'root' value in their 'parents' field. If not all of your orphan files get listed, it means they are already being deleted automatically by Google; that process might take up to 24 hours.

Adreian Lopez, thanks for your script. It really saved me a lot of manual work. Below are the steps that I followed to implement your script:
1. Created the folder c:\temp\pythonscript\
2. Created an OAuth 2.0 client ID using https://console.cloud.google.com/apis/credentials and downloaded the credentials file to the c:\temp\pythonscript\ folder.
3. Renamed the downloaded client_secret_#######-#############.apps.googleusercontent.com.json to credentials.json
4. Copied Adreian Lopez's Python script and saved it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py
5. Went to the "Microsoft Store" on Windows 10 and installed Python 3.8.
6. Opened the Command Prompt and entered: cd c:\temp\pythonscript\
7. Ran pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
8. Ran python deleteGoogleDriveOrphanFiles.py and followed the steps on the screen to create c:\temp\pythonscript\token.pickle and start deleting the orphan files. This step can take quite a while.
9. Verified https://one.google.com/u/1/storage
10. Reran step 8 as necessary.

Try using this in your query:
'root' in parents
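For example, with the v3 Python client (a minimal sketch; service is assumed to be an authorized Drive service object, as in the script above):
results = service.files().list(
    q="'root' in parents",
    fields="nextPageToken, files(id, name)").execute()
for f in results.get('files', []):
    print(f['name'], f['id'])
Note that this lists children of My Drive itself; truly orphaned files, which have no parents at all, will not match it.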

Related

Google API to find the storage size of the drive

I'm looking for a Google API to get the size of the drive, but I can't find anything. The code to delete a user's email using the Google API is provided below. Similarly, I need to know the size of the user's drive. Could someone please assist me? Is there a way to get the drive's size via an API? Thanks.
from __future__ import print_function
import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/admin.directory.group',
          'https://www.googleapis.com/auth/admin.directory.user']


def main():
    """Shows basic usage of the Admin SDK Directory API.
    Prints the emails and names of the first 10 users in the domain.
    """
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    service = build('admin', 'directory_v1', credentials=creds)
    return service


def test():
    # user = service.users().get(userKey="user1@matador.csun.edu").execute()
    # members = service.groups().list(domain='my.csun.edu',
    #                                 userKey=user['primaryEmail'],
    #                                 pageToken=None, maxResults=500).execute()
    # print(user)
    # Call the Admin SDK Directory API
    print('Getting the first 10 users in the domain')
    results = service.users().list(customer='my_customer', maxResults=10,
                                   orderBy='email').execute()
    print(results)
    users = results.get('users', [])
    if not users:
        print('No users in the domain.')
    else:
        print('Users:')
        for user in users:
            print(user)
            # print(dir(user))
            # print(u'{0} ({1})'.format(user['primaryEmail'],
            #                           user['name']['fullName']))


def del_user(user):
    try:
        service.users().delete(userKey=user).execute()
        print("Deleted!")
    except Exception:
        print("User doesn't exist!")


if __name__ == '__main__':
    service = main()
    nameExt = '23'
    # with open('NewGmailInProd/gmailUser' + nameExt + '.txt') as fileToRead:
    # with open('NewGmailInProd/test.txt') as fileToRead:
    #     emails = fileToRead.readlines()
    emails = ['user1@matador.csun.edu']
    for email in emails:
        del_user(email.strip())
The Google Drive API has an endpoint called about.get. This endpoint returns a lot of information about a user's Drive account.
One of the things it returns is the user's storage quota.
You appear to be trying to go through the Admin SDK Directory API; that just gives you access to administer Workspace, and it's not going to give you anything for Drive.
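For example, a minimal sketch with the Python client (assuming creds is obtained through the same InstalledAppFlow/token dance as in the question, but with a Drive scope such as https://www.googleapis.com/auth/drive.readonly rather than the Admin SDK scopes):
from googleapiclient.discovery import build

# `creds` is assumed to be an authorized credentials object (see the
# question's token.json / InstalledAppFlow boilerplate).
drive = build('drive', 'v3', credentials=creds)
about = drive.about().get(fields='storageQuota').execute()
quota = about['storageQuota']
print('Limit:', quota.get('limit'))  # total bytes; absent for unlimited plans
print('Usage:', quota.get('usage'))  # total bytes used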

How to get email message in outlook account using IMAP?

I tried a sample program for getting email messages from an Outlook account using IMAP. The account has 20 folders, and the program fetches all email messages except from three folders (Contacts, Calendar, Tasks); for those it gets no data and throws a server error. How do I fix this error?
Code
import imaplib
import pprint
import email
import base64
import json
import re
import os
import fileinput

imap_host = 'outlook.office365.com'
imap_user = 'XXXXXXXXXXX'
imap_pass = 'XXXXXXXXXXXXX'
count = 0
file_path = 'geek.txt'

# Connect to the host using SSL.
imap = imaplib.IMAP4_SSL(imap_host, 993)
# Log in to the server.
l = imap.login(imap_user, imap_pass)
# Regex to extract flags, delimiter and mailbox name from a LIST response.
list_response_pattern = re.compile(
    r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')
# Get the list of synced folders.
list_data = imap.list()

# If the local storage is empty, sync all folder details.
print(os.stat(file_path).st_size)
if os.stat(file_path).st_size == 0:
    # Iterate over the synced folders.
    for i in list_data[1]:
        # Get the folder name.
        sample = re.findall(r'"\/"(.*)', i.decode("utf-8"))
        # Get the message ids.
        try:
            print("message")
            print(sample[0].lstrip().strip('"'))
            data = imap.select(sample[0].lstrip())
            search_resp, search_data = imap.search(None, "ALL")
            match = list_response_pattern.match(i.decode("utf-8"))
            flags, delimiter, mailbox_name = match.groups()
            print("1")
            print(mailbox_name)
            mailbox_name = mailbox_name.strip('"')
            print(mailbox_name)
        except Exception as e:
            print(e)
            continue
        # Get the current status of the folder.
        current_status = imap.status(
            '"{}"'.format(mailbox_name),
            '(MESSAGES RECENT UIDNEXT UIDVALIDITY UNSEEN)',
        )
        print(current_status)
        # Get each message using its UID and message id.
        msg_ids = search_data[0].split()
        print("total count: ", len(msg_ids))
        for i in msg_ids:
            print("$$$$$$$$$$$$$$$$$$$$")
            print("Message Ids: ", i)
            count = count + 1
            fetch_resp, fetch_UID = imap.fetch(i, 'UID')
            print("Fetch UID: ", fetch_UID)
            day = bytes(str(fetch_UID[0].split()[2]).split("'")[1]
                        .split(')')[0], 'utf-8')
            print("ID: ", day)
            fetch_resp, fetch_mdg = imap.uid('fetch', day, '(RFC822)')
            print(fetch_mdg)
            print("$$$$$$$$$$$$$$$$$$$$$")
            email_msg = fetch_mdg[0][1]
            if email_msg and isinstance(email_msg, str):
                try:
                    email_msg = email.message_from_string(email_msg)
                except:
                    email_msg = None
            elif email_msg and isinstance(email_msg, bytes):
                try:
                    email_msg = email.message_from_bytes(email_msg)
                except:
                    email_msg = None
            print("*********************************")
            print("Count: ", count)
            print("UID: ", day)
            print(mailbox_name)
            print(email_msg['To'])
            print(email_msg['From'])
            print(email_msg['subject'])
            print(email_msg)
            print("*********************************")
        # Store the folder details in the file.
        status_details = current_status[1][0].decode("utf-8")
        status_details = status_details.split('(')[1].split(')')[0].split(' ')
        print(status_details)
        if len(msg_ids) == 0:
            json1 = json.dumps({'total_count': int(status_details[1]),
                                'UID': 0,
                                'UIDNext': int(status_details[5]),
                                'UIDValidity': int(status_details[7]),
                                'Folder name': mailbox_name})
        else:
            json1 = json.dumps({'total_count': int(status_details[1]),
                                'UID': int(day),
                                'UIDNext': int(status_details[5]),
                                'UIDValidity': int(status_details[7]),
                                'Folder name': mailbox_name})
        file = open(file_path, 'a')
        file.write(json1)
        file.write("\n")
print('hi')
Response
$$$$$$$$$$$$$$$$$$$$
Message Ids: b'3'
Fetch UID: [b'3 (UID 11)']
ID: b'11'
[(b'3 (RFC822 {757}', b'MIME-Version: 1.0\r\nContent-Type: text/plain; charset="us-ascii"\r\nFrom: Microsoft Exchange Server\r\nTo: "\r\nSubject: Retrieval using the IMAP4 protocol failed for the following message:\r\n 11\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nThe server couldn\'t retrieve the following message:\r\n\r\nSubject: "Test email Sync 3"\r\nFrom: "Imap Testing" ("/O=3DEXCHANGELABS/OU=3DEXCHANGE ADMINISTRATIVE GROUP=\r\n (FYDIBOHF23SPDLT)/CN=3DRECIPIENTS/CN=3DEBF2483D9A0145A59A48B829B12A45E4-MA=\r\nILBOX1")\r\nSent date: 5/6/2020 2:02:59 AM\r\n\r\nThe message hasn\'t been deleted. You might be able to view it using either =\r\nOutlook or Outlook Web App. You can also contact the sender to find out wha=\r\nt the message says.=\r\n'), b' UID 11 FLAGS (\\Seen))']
$$$$$$$$$$$$$$$$$$$$$
Server Error
Subject: Retrieval using the IMAP4 protocol failed for the following message:
7
Content-Transfer-Encoding: quoted-printable
The server couldn't retrieve the following message:
Subject: "Testing"
Sent date: 5/6/2020 2:01:54 AM
The message hasn't been deleted. You might be able to view it using either =
Outlook or Outlook Web App. You can also contact the sender to find out wha=
t the message says.=
I have around 20 folders. I iterate over them one by one, get the current status of each folder, and store it in the sample file; that works successfully. But when I try to print the email messages, some folders (Contacts, Calendar, Tasks) give the response shown above.

How to find Knowledge base ID (kbid) for QnAMaker?

I am trying to integrate a QnAMaker knowledge base with the Azure Bot Service.
I am unable to find the knowledge base id on the QnAMaker portal.
How do I find the kbid in the QnA portal?
The knowledge base id can be located in Settings under "Deployment details" in your knowledge base. It is the GUID that is nestled between "knowledgebases" and "generateAnswer" in the POST example shown there.
Hope this helps!
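If you'd rather pull the GUID out of that endpoint programmatically, here is a small sketch (the endpoint string below is a hypothetical example of what Deployment details shows):
import re

# Hypothetical example of the POST line shown under Settings > Deployment details.
endpoint = 'POST /knowledgebases/12345678-90ab-cdef-1234-567890abcdef/generateAnswer'
m = re.search(r'knowledgebases/([0-9a-fA-F-]{36})/generateAnswer', endpoint)
if m:
    print(m.group(1))  # the knowledge base id (kbid)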
You can also use Python to get this; take a look at the following code.
That is useful if you want to write a program that fetches the kb ids dynamically.
import http.client
import json
import os
import sys
import time
import urllib.parse

# Represents the various elements used to create the HTTP request path
# for QnA Maker operations.
# Replace these with a valid resource name and subscription key.
host = '<your-resource-name>.cognitiveservices.azure.com'
subscription_key = '<QnA-Key>'
get_kb_method = '/qnamaker/v4.0/knowledgebases/'

try:
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/json'
    }
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", get_kb_method, None, headers)
    response = conn.getresponse()
    data = response.read().decode("UTF-8")
    if len(data) > 0:
        result = json.loads(data)
        # print(json.dumps(result, sort_keys=True, indent=2))
        KB_id = result["knowledgebases"][0]["id"]
        print(response.status)
        print(KB_id)
except Exception:
    print("Unexpected error:", sys.exc_info()[0])
    print("Unexpected error:", sys.exc_info()[1])

cherrypy: how to get all active sessions (storage_type = "file")

My aim is to track all logged-in users on a website powered by CherryPy.
With sessions stored in RAM (tools.sessions.storage_type = "ram"), I can get the information through:
cherrypy.session.cache.values()
But with sessions stored in a file (tools.sessions.storage_type = "file"), trying to do the same, I get:
AttributeError: 'FileSession' object has no attribute 'cache'
How can I access the information stored in the session files?
EDIT:
The proposition of Andrew Kloos is to load the session files from the directory (given by tools.sessions.storage_path) and un-pickle them.
This works in most cases, but sometimes one of the files is still locked and unpickling fails.
On the other hand, I find it hard to believe that there is a session object for the current session (namely cherrypy.session) but no available object for the other sessions, so that one is obliged to go through the session files...
Ok, looking at the cherrypy/lib/sessions file, I see that getting the session values runs this load function...
def _load(self, path=None):
    if path is None:
        path = self._get_file_path()
    try:
        f = open(path, "rb")
        try:
            return pickle.load(f)
        finally:
            f.close()
    except (IOError, EOFError):
        return None
So you just need to mimic that, but also loop through all the session files in the sessions folder. Try something like this...
import cherrypy
from cherrypy._cpcompat import pickle
import os

class HelloWorld(object):
    @cherrypy.expose
    def asdf(self):
        sessions_dir = os.path.abspath('sessions')
        # Loop through all the files in the sessions folder.
        for FileName in os.listdir(sessions_dir):
            # **EDIT** skip the lock files, they are not pickled sessions
            if FileName.find('.lock') == -1:
                f = open(os.path.join(sessions_dir, FileName), "rb")
                sessiondata = pickle.load(f)
                f.close()
                print(sessiondata[0]['FirstName'])
        # **EDIT**
        cherrypy.session['FirstName'] = 'adsdf'
        return 'hi'

cherrypy.config.update({
    'tools.sessions.on': True,
    'tools.sessions.storage_type': 'file',
    'tools.sessions.storage_path': 'sessions'
})
cherrypy.quickstart(HelloWorld())
Hope this helps!
I'll just give you a simple snippet using Python:
import os

sessions = os.listdir('./tmp/sessions')
sessions = [s for s in sessions if '.lock' not in s]
First, you list the session files in the directory.
Then, you filter out the lock files.
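To actually read what's inside the remaining files, you can un-pickle each one (assuming the default file session format, where the pickled value is a (data, expiration_time) tuple, as in the answer above):
import os
import pickle

session_dir = './tmp/sessions'
for name in os.listdir(session_dir):
    if '.lock' in name:
        continue  # skip lock files
    with open(os.path.join(session_dir, name), 'rb') as f:
        data, expiration_time = pickle.load(f)
    print(name, data)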

CVSNT: from a tag to a list of file names and revisions

I have a project with sources under the control of CVSNT.
I need a list of source file names and revisions belonging to a certain tag.
For example:
the tag MYTAG is:
myproject/main.cpp 1.5.2.3
myproject/myclass.h 1.5.2.1
I know that with cvs log -rMYTAG > log.txt I get all the information I need in log.txt and can then filter it to build my list, but is there any utility which already does what I need?
Here's a Python script that does this:
import sys, os, os.path
import re, string

def runCvs(args):
    f_in, f_out, f_err = os.popen3('cvs ' + string.join(args))
    out = f_out.read()
    err = f_err.read()
    f_out.close()
    f_err.close()
    code = f_in.close()
    if not code: code = 0
    return code, out, err

class RevDumper:
    def parseFile(self, rex, filelog):
        m = rex.search(filelog)
        if m:
            print '%s\t%s' % (m.group(1), m.group(2))

    def filterOutput(self, logoutput, repoprefix):
        rex = re.compile('^={77}$', re.MULTILINE)
        files = rex.split(logoutput)
        rex = re.compile(
            'RCS file: %s(.*),v[^=]+selected revisions: [^0][^=]+revision ([0-9\.]+)'
            % repoprefix, re.MULTILINE)
        for file in files:
            self.parseFile(rex, file)

    def getInfo(self, tag, module, repoprefix):
        # Remove the -S if you're using an older version of CVS.
        args = ['-Q', '-z9', 'rlog', '-S', '-N', '-r' + tag, module]
        code, out, err = runCvs(args)
        if code == 0:
            self.filterOutput(out, repoprefix)
        else:
            sys.stderr.write('CVS returned %d\n%s\n' % (code, err))

if len(sys.argv) > 2:
    tag = sys.argv[1]
    module = sys.argv[2]
    if len(sys.argv) > 3:
        repoprefix = sys.argv[3]
    else:
        repoprefix = ''
    RevDumper().getInfo(tag, module, repoprefix)
else:
    sys.stderr.write('Syntax: %s TAG MODULE [REPOPREFIX]' % os.path.basename(sys.argv[0]))
Note that you either have to have a CVSROOT environment variable set or run this from inside a working copy checked out from the repository you want to query.
Also, the file names displayed are based on the "RCS File" property of the rlog output, i.e. they still contain the repository prefix. If you want to filter that out you can specify a third argument, e.g. when your CVSROOT is something like sspi:server:/cvsrepo then you would call this like:
ListCvsTagRevisions.py MyTag MyModule /cvsrepo/
Hope this helps.
Note: If you need a script that lists the revisions currently in your working copy, see the edit history of this answer.
