How to upload jpg to google drive using google drive api? - google-api

I'm trying to upload a jpg file to Google Drive using the API, but I'm not having much luck. Although the code runs without errors, the "image" saved in my Google Drive is untitled and doesn't actually contain any data.
Here's how I'm doing it right now in Python:
post_body = "grant_type=refresh_token&client_id={}&client_secret={}&refresh_token={}".format(client_id, client_secret, refresh_token)
r = requests.post(refresh_url, data=post_body, headers={"Content-Type" : "application/x-www-form-urlencoded"})
r_json = json.loads(r.text)
access_token = r_json["access_token"]
media = MediaFileUpload(filename, mimetype="image/jpeg", resumable=True)
body = {
    "name": filename,
    "mimeType": "image/jpeg"
}
drive_url = "https://www.googleapis.com/upload/drive/v3/files?uploadType=media"
drive_r = requests.post(drive_url, data=body, headers={"Authorization": "Bearer " + access_token, "Content-type": "image/jpeg"})
When I print drive_r.text, the response I'm getting back is this:
{
"kind": "drive#file",
"id": "1Vt4gP***************",
"name": "Untitled",
"mimeType": "image/jpeg"
}

From your script, I understand that you want to upload a file to Google Drive without using the googleapis client library for Python. In this case, I would like to propose the following modifications.
Modification points:
In your script, the data from the file is not included in the request body.
You use uploadType=media, but it seems that you also want to include the file metadata. In this case, please use uploadType=multipart.
Pattern 1:
If the file you want to upload is smaller than 5 MB, you can use the following script; it uses uploadType=multipart.
Modified script:
import json
import requests
access_token = r_json["access_token"] # This is your script for retrieving the access token.
filename = '###' # Please set the filename with the path.
para = {"name": filename}
files = {
    'data': ('metadata', json.dumps(para), 'application/json'),
    'file': open(filename, "rb")
}
r = requests.post(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
    headers={"Authorization": "Bearer " + access_token},
    files=files
)
print(r.text)
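Note that para = {"name": filename} uses whatever string is in filename as the name shown in Drive, so if filename contains a directory path, that full path becomes the file name. If you only want the base name, a small adjustment (using the standard library os module) would be:
import os
para = {"name": os.path.basename(filename)}  # keep only the file name, drop the directories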
Pattern 2:
If the file you want to upload is larger than 5 MB, you can use the following script; it uses uploadType=resumable.
Modified script:
import json
import os
import requests
access_token = r_json["access_token"] # This is your script for retrieving the access token.
filename = '###' # Please set the filename with the path.
filesize = os.path.getsize(filename)
params = {
    "name": filename,
    "mimeType": "image/jpeg"
}
r1 = requests.post(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
    headers={"Authorization": "Bearer " + access_token, "Content-Type": "application/json"},
    data=json.dumps(params)
)
r2 = requests.put(
    r1.headers['Location'],
    headers={"Content-Range": "bytes 0-" + str(filesize - 1) + "/" + str(filesize)},
    data=open(filename, 'rb')
)
print(r2.text)
Note:
These sample scripts assume that your access token can be used for uploading files to Google Drive.
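If you are unsure whether the token actually carries a Drive scope, a minimal sketch of one way to check it is to call Google's standard tokeninfo endpoint:
import requests
# Inspect the access token; the "scope" field should include a Drive scope such as
# https://www.googleapis.com/auth/drive or https://www.googleapis.com/auth/drive.file
info = requests.get(
    "https://www.googleapis.com/oauth2/v3/tokeninfo",
    params={"access_token": access_token}
).json()
print(info.get("scope"))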
Reference:
Upload file data

Related

How to save user data to database instead of a pickle or a json file when trying to post videos on YouTube using Django and data v3 api

I'm trying to upload videos to YouTube using Django and MSSQL, and I want to store the user data in the database so that I can log in from multiple accounts and post videos.
The official documentation provided by YouTube implements a file-based flow: after login, all the user data gets saved to a file. I don't want to store any data in a file, as relying on files instead of the DB would be a risk and not good practice. So how can I bypass this step, save the data directly to the DB, and retrieve it when I want to post videos from a specific account?
In short, I want to replace the pickle file implementation with storing the credentials in the database.
Here's my code
def youtubeAuthenticate():
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "client_secrets.json"
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(client_secrets_file, SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)
    return build(api_service_name, api_version, credentials=creds)
@api_view(['GET', 'POST'])
def postVideoYT(request):
    youtube = youtubeAuthenticate()
    print('yt', youtube)
    try:
        initialize_upload(youtube, request.data)
    except HttpError as e:
        print("An HTTP error %d occurred:\n%s" % (e.resp.status, e.content))
    return Response("Hello")
def initialize_upload(youtube, options):
    print('options', options)
    print("title", options['title'])
    # tags = None
    # if options.keywords:
    #     tags = options.keywords.split(",")
    body = dict(
        snippet=dict(
            title=options['title'],
            description=options['description'],
            tags=options['keywords'],
            categoryId=options['categoryId']
        ),
        status=dict(
            privacyStatus=options['privacyStatus']
        )
    )
    # Call the API's videos.insert method to create and upload the video.
    insert_request = youtube.videos().insert(
        part=",".join(body.keys()),
        body=body,
        media_body=MediaFileUpload(options['file'], chunksize=-1, resumable=True)
    )
    path = pathlib.Path(options['file'])
    ext = path.suffix
    getSize = os.path.getsize(options['file'])
    resumable_upload(insert_request, ext, getSize)
# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request, ext, getSize):
    response = None
    error = None
    retry = 0
    while response is None:
        try:
            print("Uploading file...")
            status, response = insert_request.next_chunk()
            if response is not None:
                respData = response
                if 'id' in response:
                    print("Video id '%s' was successfully uploaded." % response['id'])
                else:
                    exit("The upload failed with an unexpected response: %s" % response)
        except HttpError as e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS as e:
            error = "A retriable error occurred: %s" % e
        if error is not None:
            print(error)
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")
            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print("Sleeping %f seconds and then retrying..." % sleep_seconds)
            time.sleep(sleep_seconds)
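The question asks how to replace the pickle file with database storage. A rough sketch of one way to do that (the StoredCredentials model and user_id field below are hypothetical names, not part of the original code) is to persist the JSON that google-auth credentials can serialize, and rebuild the credentials from it later:
# models.py (hypothetical model holding the serialized credentials per account)
from django.db import models

class StoredCredentials(models.Model):
    user_id = models.CharField(max_length=255, unique=True)
    credentials_json = models.TextField()

# used in place of pickle.dump/pickle.load in youtubeAuthenticate
import json
from google.oauth2.credentials import Credentials

def save_creds(user_id, creds):
    # store the serialized credentials as a JSON string in the database
    StoredCredentials.objects.update_or_create(
        user_id=user_id,
        defaults={"credentials_json": creds.to_json()},
    )

def load_creds(user_id, scopes):
    # rebuild a Credentials object from the stored JSON, or return None if absent
    row = StoredCredentials.objects.filter(user_id=user_id).first()
    if row is None:
        return None
    return Credentials.from_authorized_user_info(json.loads(row.credentials_json), scopes)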

Scrapes Emails from a list of URLs saved in CSV - BeautifulSoup

I am trying to parse through a list of URLs saved in CSV format to scrape email addresses. However, the code below only manages to fetch email addresses from a single website. I need advice on how to modify the code to loop through the list and save the outcome (the list of emails) to a CSV file.
import requests
import re
import csv
from bs4 import BeautifulSoup

allLinks = [];mails=[]
with open(r'url.csv', newline='') as csvfile:
    urls = csv.reader(csvfile, delimiter=' ', quotechar='|')
    links = []
    for url in urls:
        response = requests.get(url)
        soup=BeautifulSoup(response.text,'html.parser')
        links = [a.attrs.get('href') for a in soup.select('a[href]') ]
    allLinks=set(links)

def findMails(soup):
    for name in soup.find_all('a'):
        if(name is not None):
            emailText=name.text
            match=bool(re.match('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$',emailText))
            if('@' in emailText and match==True):
                emailText=emailText.replace(" ",'').replace('\r','')
                emailText=emailText.replace('\n','').replace('\t','')
                if(len(mails)==0)or(emailText not in mails):
                    print(emailText)
                    mails.append(emailText)

for link in allLinks:
    if(link.startswith("http") or link.startswith("www")):
        r=requests.get(link)
        data=r.text
        soup=BeautifulSoup(data,'html.parser')
        findMails(soup)
    else:
        newurl=url+link
        r=requests.get(newurl)
        data=r.text
        soup=BeautifulSoup(data,'html.parser')
        findMails(soup)

mails=set(mails)
if(len(mails)==0):
    print("NO MAILS FOUND")
You are overwriting links when you want to add to it.
allLinks = [];mails=[]
urls = ['https://www.nus.edu.sg/', 'http://gwiconsulting.com/']
links = []
for url in urls:
    response = requests.get(url)
    soup=BeautifulSoup(response.text,'html.parser')
    links += [a.attrs.get('href') for a in soup.select('a[href]') ]
allLinks=set(links)
At the end, loop over your mails and write them to a CSV file:
import csv

with open("emails.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['Email'])
    for mail in mails:
        w.writerow([mail])  # wrap the address in a list so it is written as one cell
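If you would rather keep reading the URLs from url.csv instead of hard-coding them, a minimal sketch (assuming the file has one URL per row in the first column):
import csv

with open('url.csv', newline='') as csvfile:
    # take the first column of each non-empty row as a URL string
    urls = [row[0].strip() for row in csv.reader(csvfile) if row]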

How to find Knowledge base ID (kbid) for QnAMaker?

I am trying to integrate a QnA Maker knowledge base with Azure Bot Service.
I am unable to find the knowledge base ID on the QnA Maker portal.
How do I find the kbid in the QnA Maker portal?
The Knowledge Base ID can be located in Settings under "Deployment details" in your knowledge base. It is the GUID nestled between "knowledgebases" and "generateAnswer" in the sample POST request shown there.
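For reference, the snippet under Deployment details looks roughly like the following (all values are placeholders); the GUID in the path between "knowledgebases" and "generateAnswer" is the knowledge base ID:
POST /knowledgebases/<your-kb-id>/generateAnswer
Host: https://<your-resource-name>.azurewebsites.net/qnamaker
Authorization: EndpointKey <your-endpoint-key>
{"question":"<your question>"}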
Hope this helps!
You can also use Python to get this; take a look at the following code.
That is useful if you want to write a program that fetches the KB IDs dynamically.
import http.client, os, urllib.parse, json, time, sys

# Represents the various elements used to create the HTTP request path for QnA Maker operations.
# Replace this with a valid subscription key.
# User host = '<your-resource-name>.cognitiveservices.azure.com'
host = '<your-resource-name>.cognitiveservices.azure.com'
subscription_key = '<QnA-Key>'
get_kb_method = '/qnamaker/v4.0/knowledgebases/'

try:
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/json'
    }
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", get_kb_method, None, headers)
    response = conn.getresponse()
    data = response.read().decode("UTF-8")
    result = None
    if len(data) > 0:
        result = json.loads(data)
        # print(json.dumps(result, sort_keys=True, indent=2))
    # Note: status code 204 means success.
    KB_id = result["knowledgebases"][0]["id"]
    print(response.status)
    print(KB_id)
except:
    print("Unexpected error:", sys.exc_info()[0])
    print("Unexpected error:", sys.exc_info()[1])

Tweeting images programmatically

I have a business requirement for the project that I'm working on to allow users to print, email and share an image on Facebook and Twitter. The first three are simple whereas I'm finding it impossible to find a succinct example of how to post a tweet with an image using only client side scripting. I've seen various solutions using the Twitter API and almost all of them are PHP based. Surely this can't be that difficult.
This example uses the TwitterAPI python library.
from TwitterAPI import TwitterAPI
TWEET_TEXT = 'some tweet text'
IMAGE_PATH = './some_image.png'
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_TOKEN_KEY = ''
ACCESS_TOKEN_SECRET = ''
api = TwitterAPI(CONSUMER_KEY,CONSUMER_SECRET,ACCESS_TOKEN_KEY,ACCESS_TOKEN_SECRET)
# STEP 1 - upload image
file = open(IMAGE_PATH, 'rb')
data = file.read()
r = api.request('media/upload', None, {'media': data})
print('UPLOAD MEDIA SUCCESS' if r.status_code == 200 else 'UPLOAD MEDIA FAILURE')
# STEP 2 - post tweet with a reference to uploaded image
if r.status_code == 200:
    media_id = r.json()['media_id']
    r = api.request('statuses/update', {'status': TWEET_TEXT, 'media_ids': media_id})
    print('UPDATE STATUS SUCCESS' if r.status_code == 200 else 'UPDATE STATUS FAILURE')
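As a side note (not part of the original answer), statuses/update accepts media_ids as a comma-separated list, so attaching several previously uploaded images could look roughly like this:
# hypothetical: media ids collected from several media/upload calls
media_ids = ','.join(str(m) for m in uploaded_media_ids)
r = api.request('statuses/update', {'status': TWEET_TEXT, 'media_ids': media_ids})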

Google Drive API: list files with no parent

The files in the Google domain that I administer have gotten into a bad state; there are thousands of files residing in the root directory. I want to identify these files and move them to a folder underneath "My Drive".
When I use the API to list the parents for one of these orphaned files, the result is an empty array. To determine if a file is orphaned, I can iterate over all the files in my domain, and request the list of parents for each. If the list is empty, I know that the file is orphaned.
But this is hideously slow.
Is there any way to use the Drive API to search for files that have no parents?
The "parents" field for the q parameter doesn't seem to be useful for this, as it's only possible to specify that the parents list contains some ID.
Update:
I'm trying to find a quick way to locate items that are truly at the root of the document hierarchy. That is, they are siblings of "My Drive", not children of "My Drive".
In Java:
List<File> result = new ArrayList<File>();
Files.List request = drive.files().list();
request.setQ("'root'" + " in parents");
FileList files = null;
files = request.execute();
for (com.google.api.services.drive.model.File element : files.getItems()) {
    System.out.println(element.getTitle());
}
'root' is the parent folder if the file or folder is located directly in the root.
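A rough equivalent with the Drive v3 Python client, assuming an already authorized service object (this lists children of the root folder, not true orphans):
# list files whose parent is the root folder
results = service.files().list(
    q="'root' in parents",
    fields="nextPageToken, files(id, name)"
).execute()
for f in results.get('files', []):
    print(f['name'])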
Brute force, but simple, and it works:
do {
    try {
        FileList files = request.execute();
        for (File f : files.getItems()) {
            if (f.getParents().size() == 0) {
                System.out.println("Orphan found:\t" + f.getTitle());
                orphans.add(f);
            }
        }
        request.setPageToken(files.getNextPageToken());
    } catch (IOException e) {
        System.out.println("An error occurred: " + e);
        request.setPageToken(null);
    }
} while (request.getPageToken() != null
        && request.getPageToken().length() > 0);
The documentation recommends the following query: is:unorganized owner:me.
The premise is:
List all files.
If a file has no 'parents' field, it means it's an orphan file.
So the script deletes them.
Before you start, you need:
To create an OAuth client ID.
To add the '../auth/drive' scope to your OAuth client ID and validate your app with Google, so that you have delete permissions.
Ready-to-copy-paste demo:
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

def callback(request_id, response, exception):
    if exception:
        print("Exception:", exception)

def main():
    """
    Description:
    Shows basic usage of the Drive v3 API to delete orphan files.
    """

    """ --- CHECK CREDENTIALS --- """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    """ --- OPEN CONNECTION --- """
    service = build('drive', 'v3', credentials=creds)

    page_token = ""
    files = None
    orphans = []
    page_size = 100
    batch_counter = 0

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    while (True):
        # List
        r = service.files().list(pageToken=page_token,
                                 pageSize=page_size,
                                 fields="nextPageToken, files"
                                 ).execute()
        page_token = r.get('nextPageToken')
        files = r.get('files', [])
        # Filter orphans
        # NOTE: (If the file has no 'parents' field, it means it's an orphan)
        for file in files:
            try:
                if file['parents']:
                    print("File with a parent found.")
            except Exception as e:
                print("Orphan file found.")
                orphans.append(file['id'])
        # Exit condition
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    while(len(orphans) > 0):
        # Recompute the batch size on every pass so the final, smaller batch
        # does not run past the end of the list.
        batch_size = min(len(orphans), 100)
        batch = service.new_batch_http_request(callback=callback)
        for i in range(batch_size):
            print("File with id {0} queued for deletion.".format(orphans[0]))
            batch.add(service.files().delete(fileId=orphans[0]))
            del orphans[0]
        batch.execute()
        batch_counter += 1
        print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                             batch_size))

if __name__ == '__main__':
    main()
This method won't delete files in the root directory, as those have 'root' as the value of the 'parents' field. If not all of your orphan files are listed, it means they are being automatically deleted by Google; that process might take up to 24 hours.
Adreian Lopez, thanks for your script. It really saved me a lot of manual work. Below are the steps I followed to implement your script:
Created the folder c:\temp\pythonscript\
Created an OAuth 2.0 Client ID using https://console.cloud.google.com/apis/credentials and downloaded the credentials file to the c:\temp\pythonscript\ folder.
Renamed the downloaded client_secret_#######-#############.apps.googleusercontent.com.json file to credentials.json
Copied Adreian Lopez's Python script and saved it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py
Went to the "Microsoft Store" on Windows 10 and installed Python 3.8
Opened the Command Prompt and ran: cd c:\temp\pythonscript\
Ran pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
Ran python deleteGoogleDriveOrphanFiles.py and followed the steps on the screen to create c:\temp\pythonscript\token.pickle and start deleting the orphan files. This step can take quite a while.
Verified the storage usage at https://one.google.com/u/1/storage
Reran step 8 as necessary.
Try to use this in your query:
'root' in parents
