cherrypy: how to get all active sessions (storage_type = "file")

My aim is to track all logged-in users on a website powered by cherrypy.
With sessions stored in RAM (tools.sessions.storage_type = "ram"), I can get the information through:
cherrypy.session.cache.values()
But with sessions stored in files (tools.sessions.storage_type = "file"), trying to do the same, I get:
AttributeError: 'FileSession' object has no attribute 'cache'
How can I access the information stored in the session files?
EDIT:
Andrew Kloos's suggestion is to load the session files from the directory (given by tools.sessions.storage_path) and un-pickle them.
This works in most cases, but sometimes one of the files is still LOCKED and unpickling fails.
On the other hand, I find it hard to believe that there is a session object for the current session (namely cherrypy.session) but no available object for the other sessions, and that one is obliged to go through the session files...

OK, looking at cherrypy/lib/sessions I see that getting the session values runs this _load function...
def _load(self, path=None):
    if path is None:
        path = self._get_file_path()
    try:
        f = open(path, "rb")
        try:
            return pickle.load(f)
        finally:
            f.close()
    except (IOError, EOFError):
        return None
So you just need to mimic that but also loop through all the sessions in the session file folder. Try something like this...
import cherrypy
from cherrypy._cpcompat import pickle
import os

class HelloWorld(object):
    @cherrypy.expose
    def asdf(self):
        # loop through all the files in the sessions folder
        sessions_dir = os.path.abspath(os.path.dirname('sessions')) + '/sessions'
        for FileName in os.listdir(sessions_dir):
            # **EDIT** skip the lock files
            if FileName.find('.lock') == -1:
                with open(sessions_dir + '/' + FileName, "rb") as f:
                    sessiondata = pickle.load(f)
                print(sessiondata[0]['FirstName'])
        # **EDIT**
        cherrypy.session['FirstName'] = 'adsdf'
        return 'hi'

cherrypy.config.update({
    'tools.sessions.on': True,
    'tools.sessions.storage_type': 'file',
    'tools.sessions.storage_path': 'sessions'
})
cherrypy.quickstart(HelloWorld())
Hope this helps!

I'll just give you a couple of simple lines of Python.
import os
sessions = os.listdir('./tmp/sessions')
sessions = filter(lambda session: '.lock' not in session, sessions)
First, you list the session files in the directory.
Then, you filter out the lock files.
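Building on that, here is a minimal sketch that also reads each session (assuming tools.sessions.storage_path is ./tmp/sessions and that, as in the _load function shown above, each file holds a pickled (data, expiration) tuple):
import os
import pickle

SESSIONS_DIR = './tmp/sessions'  # assumed value of tools.sessions.storage_path

for name in os.listdir(SESSIONS_DIR):
    if '.lock' in name:
        continue  # skip lock files, they are not pickled session data
    with open(os.path.join(SESSIONS_DIR, name), 'rb') as f:
        try:
            data, expiration = pickle.load(f)  # FileSession stores (data, expiration time)
        except (EOFError, pickle.UnpicklingError):
            continue  # the file may be locked or mid-write; skip it
    print(name, data)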

Related

Mongo db won't connect with my code or write to the database

This is my code. I'm trying to get it so that when a user runs the add command, it stores their id and the number of times they have used the command, but this isn't working. Please can someone help?
num = 0

@client.command()
async def add(ctx):
    global num
    num += 1
    await ctx.send('non')
    mongo_url = "mongodb+=true&w=majority"
    cluster = MongoClient(mongo_url)
    db = cluster["mongo_url "]
    collection = db["mongo_url "]
    ping_cm = {"bank": num}
    collection.insert_one(ping_cm)
I assume your mongo_url token is incorrect; it should contain your name, password and the db that you are storing to. You are currently using your token instead of your bank name (whatever that is called), for example:
db = cluster["mongo_url "]  # This has been set to your token, your mongo_url, which won't do anything
You have used "bank" in other parts of your code, which is really confusing, but I assume that's what you want to access; this will then store it in different rows for each user id who uses the command:
num = 0

@client.command()
async def add(ctx):
    global num
    num += 1
    await ctx.send('non')
    mongo_url = "YOUR_MONGO_DATABASE_URL"
    cluster = MongoClient(mongo_url)
    db = cluster["bank"]
    collection = db["bank"]
    ping_cm = {"bank": num}
    collection.insert_one(ping_cm)
    await ctx.channel.send("Bank Updated!")
Make sure you are providing your mongo url properly, otherwise the code won't work at all. It should look like this, for example (EXAMPLE ONLY):
mongo_url = "mongodb+srv://name:password@bank.9999000.mongodb.net/bank?retryWrites=true&w=majority"  # EXAMPLE
You can get the URL when you go to the database you want to connect to, then click manage > db_url and copy it where I have included "YOUR_MONGO_DATABASE_URL". That should work if it is correct.
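If the goal is really to keep a count per user id (as the question describes), a hedged sketch along these lines might be closer to what you want; it assumes pymongo and the existing discord.py client from your bot, and the "bank", "user_id" and "count" names are illustrative only:
@client.command()
async def add(ctx):
    cluster = MongoClient("YOUR_MONGO_DATABASE_URL")
    collection = cluster["bank"]["bank"]
    # keep one document per user id and increment its counter atomically
    collection.update_one(
        {"user_id": ctx.author.id},
        {"$inc": {"count": 1}},
        upsert=True,
    )
    await ctx.send("Bank Updated!")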

TypeError: Object of type RowProxy is not JSON serializable - Flask

I am using SQLAlchemy to query the database from my Flask web application via an engine. I run a SELECT query and call fetchall() on the ResultProxy that is returned, which gives me a list of RowProxy objects, and then I store that list in the session.
Here is my code:
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from flask import Flask, session

engine = create_engine(os.environ.get('DATABASE_URL'))
db = scoped_session(sessionmaker(bind=engine))

app = Flask(__name__)
app.secret_key = os.environ.get('SECRET_KEY')

@app.route('/')
def index():
    session['list'] = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()
    print(session['list'])
    return "<h1>hello world</h1>"

if __name__ == "__main__":
    app.run(debug=True)
Here is the output:
[('Steve Jobs', 'Walter Isaacson', 2011), ('Legend', 'Marie Lu', 2011), ('Hit List', 'Laurell K. Hamilton', 2011), ('Born at Midnight', 'C.C. Hunter', 2011)]
Traceback (most recent call last):
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type RowProxy is not JSON serializable
The session item stores the data, as I can see in the output, but "hello world" is not rendered.
If I replace the session variable with an ordinary variable, say x, then it seems to work.
But I think I need to use sessions so that my application can be used simultaneously by two users to display different things. So, how can I use sessions in this case, or is there another way?
Any help will be appreciated as I am new to Flask and web-development.
From what I understand, the Flask session object acts as a Python dictionary; however, its values must be JSON serializable. In this case, just like the error suggests, the RowProxy object returned by fetchall() is not JSON serializable.
A solution to this problem would be to instead store the result of your query as a dictionary (which is JSON serializable).
It looks like the result of your query is returning a list of tuples so we can do the following:
res = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()

user_books = {}
index = 0
for entry in res:
    user_books[index] = {'title': res[index][0],
                         'author': res[index][1],
                         'year': res[index][2],
                         }
    index += 1

session['list'] = user_books
A word of caution, however: if you were to use the title of the book as the key, two books with the same title would overwrite each other's information, so consider using a unique id as the key.
Also note that the dictionary construction above would only work for the query you already have - if you added another column to the select statement you would have to edit the code to include the extra column information.
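If you would rather not hard-code the column names, a small sketch like this should also work (assuming SQLAlchemy 1.x, where RowProxy exposes keys()):
res = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()
# build one plain dict per row from the column names; plain dicts are JSON serializable
session['list'] = [dict(zip(row.keys(), row)) for row in res]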

Python: Opening auto-generated file

As part of my larger program, I want to create a logfile with the current time & date as part of the title. I can create it as follows:
malwareLog = open(datetime.datetime.now().strftime("%Y%m%d - %H.%M " + pcName + " Malware scan log.txt"), "w+")
Now, my app is going to call a number of other functions, so I'll need to open the file, write some output to it and close the file, several times. It doesn't seem to work if I simply go:
malwareLog.open(malwareLog, "a+")
or similar. So how should I open a dynamically created txt file that I don't know the actual filename for...?
When you create the malwareLog file object, it has a name attribute which contains the file name.
Here's an example (my test is your malwareLog):
import random
test = open(str(random.randint(0,999999))+".txt", "w+")
test.write("hello ")
test.close()
test = open(test.name, "a+")
test.write("world!")
test.close()
with open(test.name, "r") as f: print(f.read())
You can also store the file name in a variable before or after creating the file.
# Before:
file_name = "123"
malwareLog = open(file_name, "w")
# After:
malwareLog = open(str(random.randint(0, 999999)) + ".txt", "w")
file_name = malwareLog.name
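Either way, once the name is stored you can reopen the log wherever you need it. A minimal sketch, assuming the same naming scheme as the question (pcName is a hypothetical machine name):
import datetime

pcName = "PC01"  # hypothetical machine name, as in the question
file_name = datetime.datetime.now().strftime("%Y%m%d - %H.%M ") + pcName + " Malware scan log.txt"

def log(message):
    # reopen the same file in append mode each time something needs writing
    with open(file_name, "a") as malwareLog:
        malwareLog.write(message + "\n")

log("Scan started")
log("Scan finished")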

Google Drive API: list files with no parent

The files in Google domain that I administer have gotten into a bad state; there are thousands of files residing in the root directory. I want to identify these files and move them to a folder underneath "My Drive".
When I use the API to list the parents for one of these orphaned files, the result is an empty array. To determine if a file is orphaned, I can iterate over all the files in my domain, and request the list of parents for each. If the list is empty, I know that the file is orphaned.
But this is hideously slow.
Is there any way to use the Drive API to search for files that have no parents?
The "parents" field for the q parameter doesn't seem to be useful for this, as it's only possible to specify that the parents list contains some ID.
Update:
I'm trying to find a quick way to locate items that are truly at the root of the document hierarchy. That is, they are siblings of "My Drive", not children of "My Drive".
In Java:
List<File> result = new ArrayList<File>();
Files.List request = drive.files().list();
request.setQ("'root'" + " in parents");
FileList files = null;
files = request.execute();
for (com.google.api.services.drive.model.File element : files.getItems()) {
    System.out.println(element.getTitle());
}
'root' is the parent folder, if the file or folder is in the root
Brute force, but simple, and it works...
do {
    try {
        FileList files = request.execute();
        for (File f : files.getItems()) {
            if (f.getParents().size() == 0) {
                System.out.println("Orphan found:\t" + f.getTitle());
                orphans.add(f);
            }
        }
        request.setPageToken(files.getNextPageToken());
    } catch (IOException e) {
        System.out.println("An error occurred: " + e);
        request.setPageToken(null);
    }
} while (request.getPageToken() != null
        && request.getPageToken().length() > 0);
The documentation recommends the following query: is:unorganized owner:me.
The premise is:
List all files.
If a file has no 'parents' field, it is an orphan file.
So, the script deletes them.
Before you start you need:
To create an OAuth id.
To add the '../auth/drive' permissions to your OAuth id and validate your app against Google, so you have delete permissions.
Ready-to-copy-paste demo:
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

def callback(request_id, response, exception):
    if exception:
        print("Exception:", exception)

def main():
    """
    Description:
        Shows basic usage of the Drive v3 API to delete orphan files.
    """

    """ --- CHECK CREDENTIALS --- """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    """ --- OPEN CONNECTION --- """
    service = build('drive', 'v3', credentials=creds)

    page_token = ""
    files = None
    orphans = []
    page_size = 100
    batch_counter = 0

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    while True:
        # List
        r = service.files().list(pageToken=page_token,
                                 pageSize=page_size,
                                 fields="nextPageToken, files"
                                 ).execute()
        page_token = r.get('nextPageToken')
        files = r.get('files', [])
        # Filter orphans
        # NOTE: (If the file has no 'parents' field, it means it's orphan)
        for file in files:
            try:
                if file['parents']:
                    print("File with a parent found.")
            except Exception as e:
                print("Orphan file found.")
                orphans.append(file['id'])
        # Exit condition
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    while len(orphans) > 0:
        # work through the orphans in batches of at most 100 deletions
        batch_size = min(len(orphans), 100)
        batch = service.new_batch_http_request(callback=callback)
        for i in range(batch_size):
            print("File with id {0} queued for deletion.".format(orphans[0]))
            batch.add(service.files().delete(fileId=orphans[0]))
            del orphans[0]
        batch.execute()
        batch_counter += 1
        print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                             batch_size))

if __name__ == '__main__':
    main()
This method won't delete files in the root directory, as they have the 'root' value for the field 'parents'. If not all of your orphan files are listed, it means they are being automatically deleted by Google. This process might take up to 24h.
Adreian Lopez, thanks for your script. It really saved me a lot of manual work. Below are the steps that I followed to implement your script:
Create a c:\temp\pythonscript\ folder.
Create an OAuth 2.0 Client ID using https://console.cloud.google.com/apis/credentials and download the credentials file to the c:\temp\pythonscript\ folder.
Rename the downloaded client_secret_#######-#############.apps.googleusercontent.com.json to credentials.json.
Copy Adreian Lopez's Python script and save it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py.
Go to "Microsoft Store" on Windows 10 and install Python 3.8.
Open the Command Prompt and enter: cd c:\temp\pythonscript\
Run pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
Run python deleteGoogleDriveOrphanFiles.py and follow the steps on the screen to create the c:\temp\pythonscript\token.pickle file and start deleting the orphan files. This step can take quite a while.
Verify https://one.google.com/u/1/storage
Rerun step 8 as necessary.
Try to use this in your query:
'root' in parents
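For example, with the Drive v3 Python client (assuming a service object built as in the script above; the trashed = false filter is an extra assumption, added to skip trashed items), a minimal sketch listing children of the root folder is:
# list files whose parent is the root folder ("My Drive" itself)
results = service.files().list(
    q="'root' in parents and trashed = false",
    fields="nextPageToken, files(id, name)",
    pageSize=100,
).execute()
for f in results.get('files', []):
    print(f['name'], f['id'])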

Access session cookie in scrapy spiders

I am trying to access the session cookie within a spider. I first log in to a social network in a spider:
def parse(self, response):
    return [FormRequest.from_response(response,
                                      formname='login_form',
                                      formdata={'email': '...', 'pass': '...'},
                                      callback=self.after_login)]
In after_login, I would like to access the session cookies, in order to pass them to another module (selenium here) to further process the page with an authenticated session.
I would like something like this:
def after_login(self, response):
    # process response
    .....
    # access the cookies of that session to access another URL in the
    # same domain with the authenticated session.
    # Something like:
    session_cookies = XXX.get_session_cookies()
    data = another_function(url, cookies)
Unfortunately, response.cookies does not return the session cookies.
How can I get the session cookies? I was looking at the cookies middleware: scrapy.contrib.downloadermiddleware.cookies and scrapy.http.cookies, but there doesn't seem to be any straightforward way to access the session cookies.
Some more details about my original question:
Unfortunately, I used your idea but I didn't see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print out the cookies! These are exactly the cookies that I want to grab.
So here is what I am doing:
The after_login(self, response) method receives the response variable after proper authentication, and then I access a URL with the session data:
def after_login(self, response):
    # testing to see if I can get the session cookies
    cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
    cookieJar.extract_cookies(response, response.request)
    cookies_test = cookieJar._cookies
    print "cookies - test:", cookies_test

    # URL access with authenticated session
    url = "http://site.org/?id=XXXX"
    request = Request(url=url, callback=self.get_pict)
    return [request]
As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:
cookies - test: {}
2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........
So I would like to get a dictionary containing the keys xxx, yyy etc with the corresponding values.
Thanks :)
A classic example is having a login server, which provides a new session id after a successful login. This new session id should be used with another request.
Here is the code picked up from source which seems to work for me.
print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
Code:
def check_logged(self, response):
    tmpCookie = response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
    print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
    cookieHolder = dict(SESSION_ID=tmpCookie)

    #print response.body
    if "my name" in response.body:
        yield Request(url="<<new url for another server>>",
                      cookies=cookieHolder,
                      callback=self."<<another function here>>")
    else:
        print "login failed"
        return
Maybe this is overkill, but I don't know how you are going to use those cookies, so it might be useful (an excerpt from real code; adapt it to your case):
from scrapy.http.cookies import CookieJar

class MySpider(BaseSpider):
    def parse(self, response):
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        request = Request(nextPageLink, callback=self.parse2,
                          meta={'dont_merge_cookies': True, 'cookie_jar': cookieJar})
        cookieJar.add_cookie_header(request)  # apply Set-Cookie ourselves
CookieJar has some useful methods.
If you still don't see the cookies - maybe they are not there?
UPDATE:
Looking at CookiesMiddleware code:
class CookiesMiddleware(object):
    def _debug_cookie(self, request, spider):
        if self.debug:
            cl = request.headers.getlist('Cookie')
            if cl:
                msg = "Sending cookies to: %s" % request + os.linesep
                msg += os.linesep.join("Cookie: %s" % c for c in cl)
                log.msg(msg, spider=spider, level=log.DEBUG)
So, try request.headers.getlist('Cookie')
This works for me:
response.request.headers.get('Cookie')
It seems to return all the cookies that were introduced by the middleware in the request, session cookies or otherwise.
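If you then want those as a dictionary (the xxx, yyy keys asked about above), a small sketch that splits the raw header should do; note that in recent Scrapy versions the header value is a bytes object, hence the decode (an assumption about your Scrapy version):
raw = response.request.headers.get('Cookie') or b''
cookie_dict = {}
for pair in raw.decode('utf-8').split('; '):
    if '=' in pair:
        name, _, value = pair.partition('=')
        cookie_dict[name] = value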
As of 2021 (Scrapy 2.5.1), this is still not particularly straightforward. But you can access downloader middlewares (like CookiesMiddleware) from within a spider via self.crawler.engine.downloader:
def after_login(self, response):
    downloader_middlewares = self.crawler.engine.downloader.middleware.middlewares
    cookies_mw = next(iter(mw for mw in downloader_middlewares if isinstance(mw, CookiesMiddleware)))
    jar = cookies_mw.jars[response.meta.get('cookiejar')].jar
    cookies_list = [vars(cookie) for domain in jar._cookies.values() for path in domain.values() for cookie in path.values()]
    # or
    cookies_dict = {cookie.name: cookie.value for domain in jar._cookies.values() for path in domain.values() for cookie in path.values()}
    ...
Both output formats above can be passed to other requests using the cookies parameter.
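For instance, a minimal usage sketch (the URL and callback below are placeholders) would be:
yield scrapy.Request(
    "https://example.com/private-page",  # placeholder URL on the same domain
    cookies=cookies_dict,                # reuse the authenticated session cookies
    callback=self.parse_private,         # hypothetical callback
)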
