TypeError: Object of type RowProxy is not JSON serializable - Flask - session

I am using SQLAlchemy to query the database from my Flask web application via an engine. After the SELECT query I call fetchall() on the returned ResultProxy, which gives a list of RowProxy objects, and then I store that list in the session.
Here is my code:
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from flask import Flask, session

engine = create_engine(os.environ.get('DATABASE_URL'))
db = scoped_session(sessionmaker(bind=engine))

app = Flask(__name__)
app.secret_key = os.environ.get('SECRET_KEY')

@app.route('/')
def index():
    session['list'] = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()
    print(session['list'])
    return "<h1>hello world</h1>"

if __name__ == "__main__":
    app.run(debug=True)
Here is the output:
[('Steve Jobs', 'Walter Isaacson', 2011), ('Legend', 'Marie Lu', 2011), ('Hit List', 'Laurell K. Hamilton', 2011), ('Born at Midnight', 'C.C. Hunter', 2011)]
Traceback (most recent call last):
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "C:\Users\avise\AppData\Local\Programs\Python\Python38\Lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type RowProxy is not JSON serializable
The session item stores the data, as I can see in the output, but "hello world" is not rendered.
If I replace the session variable with an ordinary variable, say x, it seems to work.
But I think I need to use sessions so that my application can be used simultaneously by two users to display different things. So how can I use sessions in this case, or is there another way?
Any help will be appreciated, as I am new to Flask and web development.

From what I understand, the Flask session object acts like a Python dictionary; however, its values must be JSON serializable. In this case, just as the error suggests, the RowProxy objects returned by fetchall() are not JSON serializable.
A solution is to store the result of your query in the session as a dictionary instead (which is JSON serializable).
It looks like your query returns a list of tuples, so we can do the following:
res = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()

user_books = {}
for index, entry in enumerate(res):
    user_books[index] = {'title': entry[0],
                         'author': entry[1],
                         'year': entry[2],
                         }

session['list'] = user_books
A word of caution, however: if you keyed the dictionary by the book's title rather than a numeric index, two books with the same title would overwrite each other, so consider using a unique id as the key.
Also note that the dictionary construction above only works for the query you already have; if you added another column to the SELECT statement, you would have to edit the code to include the extra column.
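Depending on your SQLAlchemy version, you may also be able to convert each row to a dict directly, which avoids hard-coding the column names. A minimal sketch (assuming SQLAlchemy 1.x, where dict(row) works on a RowProxy; in 1.4/2.0 you would use dict(row._mapping) instead):
res = db.execute("SELECT title,author,year FROM books WHERE year = 2011 LIMIT 4").fetchall()
# Each RowProxy exposes its column names, so dict() gives {'title': ..., 'author': ..., 'year': ...}
session['list'] = [dict(row) for row in res]
A list of plain dicts is JSON serializable, so Flask can store it in the session without the TypeError.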

Related

call DjangoRestFramework DetailAPIView() from another view

Solution from @brian-destura below.
The DRF test client does not work, but django.test.Client does. Odd(?), because it's a DRF APIView being called.
from django.test import Client
client = Client()
result = client.get('/api/place/6873947')
print(result.json())
I have a DRF DetailAPIView() that returns a complex serializer JSON response to external API queries, so in the browser and via curl etc., http://localhost:8000/api/place/6873947/ returns a big JSON object. All good. The URL entry in the 'api' app looks like this:
path('place/<int:pk>/', views.PlaceDetailAPIView.as_view(), name='place-detail'),
I need to use that in another, function-based view, so first I tried using both django.test.Client and rest_framework.test.APIClient, e.g.
from rest_framework.test import APIClient
from django.urls import reverse
client = APIClient()
url = '/api/place/6873947/'
res = client.get(url)
That gets an empty result. With django Client:
from django.test import Client
c=Client()
Then
res = c.get('/api/place?pk=6873947')
and
res = c.get('/api/place/', {'pk': 6873947})
Both return "as_view() takes 1 positional argument but 2 were given"
I've tried other approaches in my IDE, picked up in StackOverflow, starting with
from api.views import PlaceDetailAPIView
pid = 6873947
from django.test import Client
from django.http import HttpRequest
from places.models import Place
request = HttpRequest()
request.method='GET'
request.GET = {"pk": pid}
Then
res = PlaceDetailAPIView.as_view({"pk": pid})
"as_view() takes 1 positional argument but 2 were given"
res = PlaceDetailAPIView.as_view()(request=request)
"Expected view PlaceDetailAPIView to be called with a URL keyword argument named "pk". Fix your URL conf, or set the .lookup_field attribute on the view correctly"
res = PlaceDetailAPIView.as_view()(request=request._request)
"HttpRequest' object has no attribute '_request"
I must be missing something basic, but hours of thrashing has gotten me nowhere - ideas?
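For reference, the usual way to call a DRF class-based view directly from Python (not taken from the thread above, just a sketch) is to build a request with rest_framework.test.APIRequestFactory and pass pk as a keyword argument to the view call, mirroring the <int:pk> URL kwarg:
from rest_framework.test import APIRequestFactory
from api.views import PlaceDetailAPIView  # import path assumed from the question

factory = APIRequestFactory()
request = factory.get('/api/place/6873947/')
# pk must be passed as a keyword argument; that is what the URLconf normally supplies
response = PlaceDetailAPIView.as_view()(request, pk=6873947)
response.render()  # render the response before reading its content
print(response.data)
This avoids both errors above: as_view() itself is called with no extra positional arguments, and pk reaches the view as a URL keyword argument.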

gspread data does not appear in Google Sheet

I'm trying to write sensor data to a Google Sheet. I was able to write to this same sheet a year or so ago, but now that I'm active on this project again I can't get it to work. I believe the OAuth flow has changed, and I've updated my code for that change.
With the code below I get no errors, yet no data is entered in the Google Sheet. Also, if I look at Google Sheets, the "last opened" date does not reflect the time my program would/should be writing to that sheet.
I've tried numerous variations and I'm just stuck. Any suggestions would be appreciated.
#!/usr/bin/python3
#-- developed with Python 3.4.2

# External Resources
import time
import sys
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import traceback

# Initialize gspread
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('MyGoogleCode.json', scope)
client = gspread.authorize(credentials)

# Start loop ________________________________________________________________
samplecount = 1
while True:
    data_time = (time.strftime("%Y-%m-%d %H:%M:%S"))
    row = ([samplecount, data_time])
    # Append to Google sheet
    try:
        if credentials is None or credentials.invalid:
            credentials.refresh(httplib2.Http())
        GoogleDataFile = client.open('DataLogger')
        #wks = GoogleDataFile.get_worksheet(1)
        wks = GoogleDataFile.get_worksheet(1)
        wks.append_row([samplecount, data_time])
        print("worksheets", GoogleDataFile.worksheets())  # prints ID for both sheets
    except Exception as e:
        traceback.print_exc()
    print("samplecount ", samplecount, row)
    samplecount += 1
    time.sleep(5)
I found my issue. I changed three things to get gspread working:
1. Downloaded a newly created JSON file (probably did not need this step).
2. With the target worksheet open in Chrome, I "shared" it with the email address found in the JSON file.
3. In the Google Developers Console, I enabled the "Drive API".
However, the code in the original post will not refresh the token; it stops working after 60 minutes.
The code that works (as of July 2017) is below.
The code writes to a Google Sheet named "DataLogger".
It writes to the sheet shown as Sheet2 in the Google view.
The only unique information is the name of the JSON file.
Hope this helps others.
Jon
#!/usr/bin/python3
# -- developed with Python 3.4.2
#
# External Resources __________________________________________________________
import time
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import traceback

# Initialize gspread credentials
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('MyjsonFile.json', scope)
headers = gspread.httpsession.HTTPSession(headers={'Connection': 'Keep-Alive'})
client = gspread.Client(auth=credentials, http_session=headers)
client.login()
workbook = client.open("DataLogger")
wksheet = workbook.get_worksheet(1)

# Start loop ________________________________________________________________
samplecount = 1
while True:
    data_time = (time.strftime("%Y-%m-%d %H:%M:%S"))
    row_data = [samplecount, data_time]
    if credentials.access_token_expired:
        client.login()
    wksheet.append_row(row_data)
    print("Number of rows in out worksheet ", wksheet.row_count)
    print("samplecount ", samplecount, row_data)
    print()
    samplecount += 1
    time.sleep(16*60)
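For anyone on a current gspread release, the service-account helper handles authorization and token refresh internally, so the login/refresh bookkeeping above is no longer needed. A rough sketch (this assumes gspread 3.6 or newer; the file and sheet names are the same ones used above):
import time
import gspread

# gspread.service_account() reads the service-account JSON key and refreshes tokens itself
gc = gspread.service_account(filename='MyjsonFile.json')
wksheet = gc.open("DataLogger").get_worksheet(1)

samplecount = 1
while True:
    wksheet.append_row([samplecount, time.strftime("%Y-%m-%d %H:%M:%S")])
    samplecount += 1
    time.sleep(16 * 60)
The sheet still has to be shared with the service account's email address, exactly as in step 2 above.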

Streaming to HBase with pyspark

There is a fair amount of info online about bulk loading to HBase with Spark streaming using Scala (these two were particularly useful) and some info for Java, but there seems to be a lack of info for doing it with PySpark. So my questions are:
How can data be bulk loaded into HBase using PySpark?
Most examples in any language only show a single column per row being upserted. How can I upsert multiple columns per row?
The code I currently have is as follows:
if __name__ == "__main__":
context = SparkContext(appName="PythonHBaseBulkLoader")
streamingContext = StreamingContext(context, 5)
stream = streamingContext.textFileStream("file:///test/input");
stream.foreachRDD(bulk_load)
streamingContext.start()
streamingContext.awaitTermination()
What I need help with is the bulk load function
def bulk_load(rdd):
    # ???
I've made some progress previously, with many and various errors (as documented here and here)
So after much trial and error, I present here the best I have come up with. It works well and successfully bulk loads data (using either Puts or HFiles). I am perfectly willing to believe that it is not the best method, so any comments/other answers are welcome. This assumes you're using a CSV for your data.
Bulk loading with Puts
By far the easiest way to bulk load, this simply creates a Put request for each cell in the CSV and queues them up to HBase.
def bulk_load(rdd):
    # Your configuration will likely be different. Insert your own quorum and parent node and table name
    conf = {"hbase.zookeeper.quorum": "localhost:2181",
            "zookeeper.znode.parent": "/hbase-unsecure",
            "hbase.mapred.outputtable": "Test",
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}

    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

    load_rdd = (rdd.flatMap(lambda line: line.split("\n"))  # Split the input into individual lines
                   .flatMap(csv_to_key_value))              # Convert each CSV line to key-value pairs
    load_rdd.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv)
The function csv_to_key_value is where the magic happens:
def csv_to_key_value(row):
    cols = row.split(",")  # Split on commas
    # Each cell is a tuple of (key, [key, column-family, column-descriptor, value]),
    # hard-coded here for a row key plus three value columns
    result = ((cols[0], [cols[0], "f1", "c1", cols[1]]),
              (cols[0], [cols[0], "f2", "c2", cols[2]]),
              (cols[0], [cols[0], "f3", "c3", cols[3]]))
    return result
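If your rows have a different number of columns, a generalized version might look like the sketch below (my own assumption, keeping the one family/qualifier pair per value column used above):
def csv_to_key_value(row):
    # Emit one (key, [key, family, qualifier, value]) tuple per value column,
    # numbering the families and qualifiers f1/c1, f2/c2, ... as in the hard-coded version
    cols = row.split(",")
    key = cols[0]
    return [(key, [key, "f%d" % i, "c%d" % i, value])
            for i, value in enumerate(cols[1:], start=1)]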
The value converter we defined earlier will convert these tuples into HBase Puts.
Bulk loading with HFiles
Bulk loading with HFiles is more efficient: rather than a Put request for each cell, an HFile is written directly and the RegionServer is simply told to point to the new HFile. This will use Py4J, so before the Python code we have to write a small Java program:
import py4j.GatewayServer;
import org.apache.hadoop.hbase.*;

public class GatewayApplication {

    public static void main(String[] args)
    {
        GatewayApplication app = new GatewayApplication();
        GatewayServer server = new GatewayServer(app);
        server.start();
    }
}
Compile this, and run it. Leave it running as long as your streaming is happening. Now update bulk_load as follows:
from py4j.java_gateway import JavaGateway

def bulk_load(rdd):
    # The output class changes, everything else stays
    conf = {"hbase.zookeeper.quorum": "localhost:2181",
            "zookeeper.znode.parent": "/hbase-unsecure",
            "hbase.mapred.outputtable": "Test",
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}  # "org.apache.hadoop.hbase.client.Put"

    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

    load_rdd = rdd.flatMap(lambda line: line.split("\n"))\
                  .flatMap(csv_to_key_value)\
                  .sortByKey(True)
    # Don't process empty RDDs
    if not load_rdd.isEmpty():
        # saveAsNewAPIHadoopDataset changes to saveAsNewAPIHadoopFile
        load_rdd.saveAsNewAPIHadoopFile("file:///tmp/hfiles" + startTime,
                                        "org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2",
                                        conf=conf,
                                        keyConverter=keyConv,
                                        valueConverter=valueConv)
        # The file has now been written, but HBase doesn't know about it
        # Get a link to Py4J
        gateway = JavaGateway()
        # Convert conf to a fully fledged Configuration type
        config = dict_to_conf(conf)
        # Set up our HTable
        htable = gateway.jvm.org.apache.hadoop.hbase.client.HTable(config, "Test")
        # Set up our path
        path = gateway.jvm.org.apache.hadoop.fs.Path("/tmp/hfiles" + startTime)
        # Get a bulk loader
        loader = gateway.jvm.org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles(config)
        # Load the HFile
        loader.doBulkLoad(path, htable)
    else:
        print("Nothing to process")
Finally, the fairly straightforward dict_to_conf:
def dict_to_conf(conf):
    gateway = JavaGateway()
    config = gateway.jvm.org.apache.hadoop.conf.Configuration()
    for key, value in conf.items():
        config.set(key, value)
    return config
As you can see, bulk loading with HFiles is more complex than using Puts, but depending on your data load it is probably worth it, since once you get it working it's not that difficult.
One last note on something that caught me off guard: HFiles expect the data they receive to be written in lexicographical order. This is not always guaranteed, especially since "10" < "9". If you have designed your key to be unique, then this is easy to fix:
load_rdd = rdd.flatMap(lambda line: line.split("\n"))\
              .flatMap(csv_to_key_value)\
              .sortByKey(True)  # Sort in ascending order

cherrypy: how to get all active sessions (storage_type = "file")

My aim is to track all logged-in users on a website powered by CherryPy.
With sessions stored in RAM (tools.sessions.storage_type = "ram"), I can get that information through:
cherrypy.session.cache.values()
But with sessions stored in files (tools.sessions.storage_type = "file"), trying to do the same, I get:
AttributeError: 'FileSession' object has no attribute 'cache'
How can I access the information stored in the session files?
EDIT:
Andrew Kloos's suggestion is to load the session files from the storage directory (given by tools.sessions.storage_path) and un-pickle them.
This works in most cases, but sometimes one of the files is still locked and unpickling fails.
On the other hand, I find it hard to believe that there is an object for the current session (namely cherrypy.session) but no available object for the other sessions, and that one is obliged to go through the session files...
OK, looking at the cherrypy/lib/sessions file, I see that getting the session values runs this _load function...
def _load(self, path=None):
    if path is None:
        path = self._get_file_path()
    try:
        f = open(path, "rb")
        try:
            return pickle.load(f)
        finally:
            f.close()
    except (IOError, EOFError):
        return None
So you just need to mimic that but also loop through all the sessions in the session file folder. Try something like this...
import cherrypy
from cherrypy._cpcompat import pickle
import os

class HelloWorld(object):
    @cherrypy.expose
    def asdf(self):
        # loop through all the files in the sessions folder
        for FileName in os.listdir(os.path.abspath(os.path.dirname('sessions')) + '/sessions'):
            # **EDIT**
            if FileName.find('.lock') == -1:
                f = open(os.path.abspath(os.path.dirname('sessions')) + '/sessions/' + FileName, "rb")
                sessiondata = pickle.load(f)
                print(sessiondata[0]['FirstName'])
        # **EDIT**
        cherrypy.session['FirstName'] = 'adsdf'
        return 'hi'

cherrypy.config.update({
    'tools.sessions.on': True,
    'tools.sessions.storage_type': 'file',
    'tools.sessions.storage_path': 'sessions'
})

cherrypy.quickstart(HelloWorld())
Hope this helps!
I'll just give you a couple of simple lines of Python.
sessions = os.listdir('./tmp/sessions')
sessions = filter(lambda session: '.lock' not in session, sessions)
First, you list the session files in the directory.
Then, you filter out the lock files.
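Putting the two answers together, a small helper along these lines could gather every active session dict (a sketch; it assumes the default pickled file format, where each file holds a (data, expiration_time) tuple, and a storage_path of 'sessions'):
import os
import pickle

def load_all_sessions(storage_path='sessions'):
    # Un-pickle every non-locked session file and return the session data dicts
    sessions = []
    for name in os.listdir(storage_path):
        if name.endswith('.lock'):
            continue  # skip lock files held by in-flight requests
        with open(os.path.join(storage_path, name), 'rb') as f:
            try:
                data = pickle.load(f)  # (session_data, expiration_time)
            except (EOFError, pickle.UnpicklingError):
                continue  # file was mid-write; ignore it this pass
        sessions.append(data[0])
    return sessions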

How to make 'perishable' link in Tornado

I want to make a link that is valid for only 24 hours, for a validation purpose. So my question is simple: how do I make the link valid for only that time? My idea so far:
Get the epoch time.
Make a link using only this value: something.com/time/1359380374
When the user clicks on the link, extract this value and compare it with the current time.
I have heard about hash values, but why use them? We can't get the time back out of a hash value (invert the process), so how is this done?
Your best bet is to have the user's email sent as an argument and then query the database to see whether their link has expired:
Query when the link is requested: update users set locked_stamp = now();
Request URL: http://yourdomain.com/?email=useremail
Query: select true from users where email = '$email' and locked_stamp > now() - interval 1 hour limit 1
Result: you have a person requesting within the hour with email $email.
I have a script that uses base64 to encode the timestamp... but it's not secure by any means.
import tornado.ioloop
import tornado.web
import base64, re, time
import sys

def get_time():
    """Method used to get the current time in b64"""
    return base64.b64encode(str(int(time.mktime(time.localtime()))))

class WebHandler(tornado.web.RequestHandler):
    def get(self, _time):
        timecheck = base64.b64decode(_time)
        try:
            # require it to be all digits
            assert re.match('^\d+$', timecheck) is not None
            # Must be within 1 hour: greater than 1 hour ago and less than now
            assert int(timecheck) > int(time.mktime(time.localtime())) - 3600 and \
                   int(timecheck) < int(time.mktime(time.localtime()))
        except AssertionError:
            raise tornado.web.HTTPError(401, 'Woops! Unauthorized.')
        else:
            self.write('Pass')

# Route
application = tornado.web.Application([
    (r"/([^\/]+)/?", WebHandler),
])

if __name__ == "__main__":
    application.listen(8889)
    tornado.ioloop.IOLoop.instance().start()
Tornado can sign a value for you the same way it sets secure cookies:
signed_message = self.create_signed_value(secret, name, value)
Then you can check it:
message = self.decode_signed_value(secret, name, value, max_age_days=31, clock=None,min_version=None)
Secret should be a long random number, but you only need one per app. min_version could be DEFAULT_SIGNED_VALUE_VERSION (which is currently 2).
Don't roll your own solution. Use the one in the library. It's there. It works.
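A minimal sketch of what that looks like in practice (the handler names and URL layout here are made up, and the secret must come from your own settings): the link embeds a signed token whose age Tornado checks for you via max_age_days.
import tornado.ioloop
import tornado.web

SECRET = "replace-with-a-long-random-secret"

class MakeLinkHandler(tornado.web.RequestHandler):
    def get(self):
        # Sign the value; the signature embeds a timestamp
        token = self.create_signed_value("validate", "user-42")
        self.write("https://example.com/validate/" + token.decode())

class ValidateHandler(tornado.web.RequestHandler):
    def get(self, token):
        # Returns None if the token is older than one day or has been tampered with
        value = tornado.web.decode_signed_value(SECRET, "validate", token, max_age_days=1)
        if value is None:
            raise tornado.web.HTTPError(401, "Link expired or invalid")
        self.write("Validated " + value.decode())

application = tornado.web.Application([
    (r"/make", MakeLinkHandler),
    (r"/validate/(.+)", ValidateHandler),
], cookie_secret=SECRET)

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()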

Resources