I have an importer that takes a list of emails and saves them into a Postgres database. Here is a snippet of code within a tableless importer class:
query_temporary_table = "CREATE TEMPORARY TABLE subscriber_imports (email CHARACTER VARYING(255)) ON COMMIT DROP;"
query_copy = "COPY subscriber_imports(email) FROM STDIN WITH CSV;"
query_delete = "DELETE FROM subscriber_imports WHERE email IN (SELECT email FROM subscribers WHERE suppressed_at IS NOT NULL OR list_id = #{list.id}) RETURNING email;"
query_insert = "INSERT INTO subscribers(email, list_id, created_at, updated_at) SELECT email, #{list.id}, NOW(), NOW() FROM subscriber_imports RETURNING id;"
conn = ActiveRecord::Base.connection_pool.checkout
conn.transaction do
  raw = conn.raw_connection
  raw.exec(query_temporary_table)
  raw.exec(query_copy)
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
  result_delete = raw.exec(query_delete)
  result_insert = raw.exec(query_insert)
  ActiveRecord::Base.connection_pool.checkin(conn)
  {
    deleted: result_delete.count,
    inserted: result_insert.count,
    updated: 0
  }
end
The issue I am having is that when I try to upload I get an exception:
PG::ERROR: another command is already in progress: ROLLBACK
This is all done in one action; the only other queries I am making are for user validation, and I have a DB mutex preventing overlapping imports. This query worked fine until my latest push, which updated my pg gem from 0.13.2 to 0.14.1 (along with other "unrelated" code).
The error initially appeared on our staging server, but I was then able to reproduce it locally, and I am out of ideas.
If I need to be more clear with my question, let me know.
Thanks
Found my own answer, and it might be useful if anyone hits the same issue when importing loads of data using COPY.
An exception was being thrown within the CSV.read() block, and although I do catch it, I was not ending the COPY correctly:
begin
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
ensure
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
end
This block ensures that the COPY command completes even when a row raises. I also added this at the end to release the connection back into the pool without disrupting the flow of a successful import:
rescue
  ActiveRecord::Base.connection_pool.checkin(conn)
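For reference, here is how the pieces fit together after the fix. This is a minimal sketch, assuming the whole flow lives in one method (so the bare rescue below is method-level); the method name import_subscribers is made up for illustration, and the query strings, csv and list are as defined above:

def import_subscribers
  conn = ActiveRecord::Base.connection_pool.checkout
  conn.transaction do
    raw = conn.raw_connection
    raw.exec(query_temporary_table)
    raw.exec(query_copy)
    begin
      CSV.read(csv.path, headers: true).each do |row|
        raw.put_copy_data row['email'] + "\n" unless row.nil?
      end
    ensure
      # always finish the COPY and drain pending results, even if a row raises
      raw.put_copy_end
      while res = raw.get_result do; end
    end
    result_delete = raw.exec(query_delete)
    result_insert = raw.exec(query_insert)
    ActiveRecord::Base.connection_pool.checkin(conn)
    { deleted: result_delete.count, inserted: result_insert.count, updated: 0 }
  end
rescue
  # the transaction has already rolled back; just return the connection to the pool
  ActiveRecord::Base.connection_pool.checkin(conn)
end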
So... I've been coding a GUI that shows a player's currency. The DataStore API works perfectly, but the LocalScript doesn't (it's local because otherwise it would update for everyone each time any player's currency changes, which would be a mess and the opposite of what I want).
And well... sometimes it loads the currency into the GUI, but other times it just stays on the original text, "Label", instead of my current currency (4600).
Here are screenshots as proof:
What normally happens and should always happen: the GUI shows my currency.
What sometimes happens and shouldn't: the label stays on "Label".
Here's the script. I've tried putting waits at the start, but the original code is inside the while true do loop:
wait(game.Players.LocalPlayer:WaitForChild("Data")
wait(game.Players.LocalPlayer.Data:WaitForChild("Bells"))
while true do
    script.Parent.TextLabel.Text = game.Players.LocalPlayer:WaitForChild("Data"):WaitForChild("Bells").Value
    wait() -- the wait keeps the loop from locking up and stopping the whole script
end
Well... if you want to check that the data is really in the player, here's the server script; it requires a module (DataStore2):
--[Animal Crossing Roblox Edition Data Store]--
--Bryan99354--
--Module not mine--
--Made with an AlvinBlox tutorial--

--·.·.*[Get Data Store, do not erase]*.·.·--
local DataStore2 = require(1936396537)

--[Default Values]--
local DefaultValue_Bells = 300
local DefaultValue_CustomClothes = 0

--[Data Store Functions]--
game.Players.PlayerAdded:Connect(function(player)
    --[Data stores]--
    local BellsDataStore = DataStore2("Bells", player)
    local Data = Instance.new("Folder", player)
    Data.Name = "Data"
    Bells = Instance.new("IntValue", Data)
    Bells.Name = "Bells"
    local CustomClothesDataStore = DataStore2("CustomClothes", player)
    local CustomClothes = Instance.new("IntValue", Data)
    CustomClothes.Name = "CustomClothes"

    local function CustomClothesUpdate(UpdatedValue)
        CustomClothes.Value = CustomClothesDataStore:Get(UpdatedValue)
    end
    local function BellsUpdate(UpdatedValue)
        Bells.Value = BellsDataStore:Get(UpdatedValue)
    end

    BellsUpdate(DefaultValue_Bells)
    CustomClothesUpdate(DefaultValue_CustomClothes)

    BellsDataStore:OnUpdate(BellsUpdate)
    CustomClothesDataStore:OnUpdate(CustomClothesUpdate)
end)

--[test and reference functions]--
workspace.TestDevPointGiver.ClickDetector.MouseClick:Connect(function(player)
    local BellsDataStore = DataStore2("Bells", player)
    BellsDataStore:Increment(50, DefaultValue_Bells)
end)
workspace.TestDevCustomClothesGiver.ClickDetector.MouseClick:Connect(function(player)
    local CustomClothesDataStore = DataStore2("CustomClothes", player)
    CustomClothesDataStore:Increment(50, DefaultValue_CustomClothes)
end)
The code that creates "Data" and "Bells" is located under the --[Data stores]-- comment.
The only script that has the issue is the short one, for no apparent reason :<
I hope that you can help me :3
@Night94 I tried your script, but it also failed sometimes.
The syntax in your LocalScript is a little off with the waits. With that fixed, it works every time. Also, I would use an event handler instead of updating the value with a loop:
game.Players.LocalPlayer:WaitForChild("Data"):WaitForChild("Bells").Changed:Connect(function(value)
script.Parent.TextLabel.Text = value
end)
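For the waits in the original LocalScript, a possible correction (my reading of "with that fixed", not the answer's exact code) is to drop the wait() wrappers entirely, since WaitForChild already yields until the child exists:

local data = game.Players.LocalPlayer:WaitForChild("Data")
local bells = data:WaitForChild("Bells")

-- set the label once on load, then let the Changed handler keep it current
script.Parent.TextLabel.Text = bells.Value
bells.Changed:Connect(function(value)
    script.Parent.TextLabel.Text = value
end)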
Printing the contents of a variable gives me a bunch of data.
I want to access part of that data, but get an error.
I'm using Viewpoint::EWS and am successfully accessing the data I need.
calendaritems = folder.find_items({:folder_id => folder.folder_id, :calendar_view => {:start_date => sd.rfc3339(), :end_date => ed.rfc3339()}})
calendaritems.each do |event|
...
end
Printing the variable "event", I can see the data I need: "date_time_stamp" (or "appointment_reply_time").
#<Viewpoint::EWS::Types::CalendarItem:0x00005652b332dfa0
#ews_item=
:date_time_stamp=>{:text=>"2019-03-18T12:01:49Z"},
:appointment_reply_time=>{:text=>"2019-03-18T13:01:55+01:00"},
However, trying to access it using "event.date_time_stamp" (or "event.appointment_reply_time") leads to the error:
undefined method `date_time_stamp' for <Viewpoint::EWS::Types::CalendarItem:0x00005622f83c3d38> (NoMethodError)
Here's the code:
calendaritems = folder.find_items({:folder_id => folder.folder_id, :calendar_view => {:start_date => sd.rfc3339(), :end_date => ed.rfc3339()}})
calendaritems.each do |event|
  if event.recurring?
    puts "#{event.date_time_stamp} | #{(event.start - event.date_time_stamp).to_i} | #{event.organizer.email_address}"
    if (event.start - event.date_time_stamp).to_i == reminderDays
      executeSomething()
    end
  end
end
I'm looking through recurring appointments for a resource within a week. Since those are silently dropped after a year, the plan is to set up a system that reminds people this will happen, so they can rebook the resource.
At first I tried using the creation date of the appointment (event.date_time_created), which works as expected, but then I noticed that people can update their appointments, thus resetting the one-year timer.
That's why I also need the date of the last update.
The debug output you supplied shows that the event variable has an ews_item attribute, which holds a hash with a :date_time_stamp key, so try event.ews_item[:date_time_stamp].
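Based on the inspect output in the question, the value under that key is itself a hash with a :text entry, so as a sketch (assuming that structure) you would dig one level deeper and parse the string into a time:

require 'time'

# each attribute in ews_item looks like { :text => "2019-03-18T12:01:49Z" },
# so fetch the string first, then parse it into a Time object
stamp_text = event.ews_item[:date_time_stamp][:text]
date_time_stamp = Time.parse(stamp_text)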
I'm using Airflow for some ETL tasks, and in some stages I would like to use temporary tables (mostly to keep the code and data objects self-contained and to avoid using a lot of metadata tables).
Using the Postgres connection in Airflow and the PostgresOperator, the behaviour I found was: each execution of a PostgresOperator gets a new connection (or session, if you prefer) to the database. In other words: we lose all temporary objects created by the previous task in the DAG.
To emulate a simple example, I use this code (do not run it, just look at the objects):
import os
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

default_args = {
    'owner': 'airflow'
    ,'depends_on_past': False
    ,'start_date': datetime(2018, 6, 13)
    ,'retries': 3
    ,'retry_delay': timedelta(minutes=5)
}
dag = DAG(
    'refresh_views'
    , default_args=default_args)
# Create database workflow
drop_exist_temporary_view = "DROP TABLE IF EXISTS temporary_table_to_be_used;"
create_temporary_view = """
CREATE TEMPORARY TABLE temporary_table_to_be_used AS
SELECT relname AS views
,CASE WHEN relispopulated = 'true' THEN 1 ELSE 0 END AS relispopulated
,CAST(reltuples AS INT) AS reltuples
FROM pg_class
WHERE relname = 'some_view'
ORDER BY reltuples ASC;"""
use_temporary_view = """
DO $$
DECLARE
    is_populated integer := (SELECT relispopulated FROM temporary_table_to_be_used WHERE views LIKE '%<<some_name>>%');
    view_to_refresh varchar := 'some_view';
    start_time timestamp;
BEGIN
    start_time := clock_timestamp();
    IF is_populated = 0 THEN
        EXECUTE 'REFRESH MATERIALIZED VIEW ' || view_to_refresh || ' WITH DATA;';
    ELSE
        EXECUTE 'REFRESH MATERIALIZED VIEW CONCURRENTLY ' || view_to_refresh || ' WITH DATA;';
    END IF;
END;
$$ LANGUAGE plpgsql;
"""
# Objects to be executed
drop_exist_temporary_view = PostgresOperator(
    task_id='drop_exist_temporary_view',
    sql=drop_exist_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
create_temporary_view = PostgresOperator(
    task_id='create_temporary_view',
    sql=create_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
use_temporary_view = PostgresOperator(
    task_id='use_temporary_view',
    sql=use_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
# Data workflow
drop_exist_temporary_view >> create_temporary_view >> use_temporary_view
At the end of execution, I receive the following message:
[2018-06-14 15:26:44,807] {base_task_runner.py:95} INFO - Subtask: psycopg2.ProgrammingError: relation "temporary_table_to_be_used" does not exist
Does anyone know if Airflow has some way to retain the same connection to the database? I think it would save a lot of work creating and maintaining several objects in the database.
You can retain the connection to the database by building a custom operator that leverages the PostgresHook to keep a single connection open while you perform a set of SQL operations.
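As a rough, untested sketch (the class name and the sql_statements argument are made up for illustration), such an operator could run every statement on the single connection the hook hands back:

from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class MultiStatementPostgresOperator(BaseOperator):
    """Runs several SQL statements on one connection, so temporary
    tables created by earlier statements stay visible to later ones."""

    @apply_defaults
    def __init__(self, sql_statements, postgres_conn_id='postgres_default',
                 *args, **kwargs):
        super(MultiStatementPostgresOperator, self).__init__(*args, **kwargs)
        self.sql_statements = sql_statements
        self.postgres_conn_id = postgres_conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
        conn = hook.get_conn()  # one psycopg2 connection == one session
        cursor = conn.cursor()
        for statement in self.sql_statements:
            cursor.execute(statement)
        conn.commit()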
You may find some examples in contrib on incubator-airflow or in Airflow-Plugins.
Another option is to persist this temporary data to XComs. That lets you keep the metadata with the task in which it was created, which may also help with troubleshooting down the road.
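For completeness, a minimal sketch of the XCom route (the task ids and the key name are made up for illustration, and dag is the DAG object from the question):

from airflow.operators.python_operator import PythonOperator

def push_result(**context):
    # store a small piece of data under an illustrative key
    context['ti'].xcom_push(key='relispopulated', value=1)

def pull_result(**context):
    # fetch it back in a downstream task
    value = context['ti'].xcom_pull(task_ids='push_result_task', key='relispopulated')
    print(value)

push_task = PythonOperator(task_id='push_result_task', python_callable=push_result,
                           provide_context=True, dag=dag)
pull_task = PythonOperator(task_id='pull_result_task', python_callable=pull_result,
                           provide_context=True, dag=dag)
push_task >> pull_task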
When performing a COPY command, some information is printed, like:
INFO: Load into table '<table>' completed, 22666 record(s) loaded successfully.
INFO: Load into table '<table>' completed, 1 record(s) could not be loaded. Check 'stl_load_errors' system table for details.
And I need to identify the failing records.
Thus I need two things:
Determine when there are failing rows: right now the message is only printed on screen, and I don't know how to get at it in code.
Determine which rows failed.
One way to do that would be to access the query identifier that is visible in the stl_load_errors table, but I have no clue how to access it from code.
(I currently use the pg gem to connect to Redshift.)
stl_load_errors is a table in Redshift that (as you may have guessed already) includes all the errors that happen when loading into Redshift. So you can query it by doing something like:
SELECT * FROM stl_load_errors
Now, to answer your questions, use the following snippet:
database = PG.connect(redshift)

begin
  query = "COPY %s (%s) FROM 's3://%s/%s' CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s' CSV GZIP" %
    [ table, columns, s3_bucket, s3_key, access_key_id, secret_access_key ]
  database.exec(query)
  puts 'File successfully imported'
rescue PG::InternalError
  res = database.exec("SELECT line_number, colname, err_reason FROM pg_catalog.stl_load_errors WHERE filename = 's3://#{s3_bucket}/#{s3_key}'")
  res.each do |row|
    puts "Importing failed:\n> Line %s\n> Column: %s\n> Reason: %s" % row.values_at('line_number', 'colname', 'err_reason')
  end
end
That should output all the information you need; recall that variables like redshift, table, columns, s3_bucket, s3_key, access_key_id, and secret_access_key depend on your configuration.
UPDATE:
To answer your comment below, more specifically, you could use a query like this:
"SELECT lines_scanned FROM pg_catalog.stl_load_commits WHERE filename = 's3://#{s3_bucket}/#{s3_key}' AND errors = -1"
I'm currently hosting a simple Ruby script that stores URLs and scores, saving them to a YAML file. However, I'd like to save to a PostgreSQL database instead, since the YAML file is deleted every time I restart the app. Here's the error I'm getting on Heroku:
could not connect to server: No such file or directory (PG::ConnectionBad)
Here's an example script that works locally, but throws me the above error in Heroku:
require 'pg'

conn = PG.connect( dbname: 'template1' )
res1 = conn.exec('SELECT * from pg_database where datname = $1', ['words'])
if res1.ntuples == 1 # db exists
  # do nothing
else
  conn.exec('CREATE DATABASE words')
  words_conn = PGconn.connect( :dbname => 'words')
  words_conn.exec("create table top (url varchar, score integer);")
  words_conn.exec("INSERT INTO top (url, score) VALUES ('http://apple.com', 1);")
end
Thanks in advance for any help or suggestions!
Assuming you have created a Postgres database using the Heroku toolchain via heroku addons:add heroku-postgresql:dev (or the plan of your choice), you should have a DATABASE_URL environment variable that contains your connection string. You can check it locally through heroku config.
Using the pg gem (docs: http://deveiate.org/code/pg/PG/Connection.html), and modifying the example from there to suit:
require 'pg'
# source the connection string from the DATABASE_URL environmental variable
conn = PG::Connection.new(ENV['DATABASE_URL'])
res = conn.exec("CREATE TABLE top (url varchar, score integer);")
Update: A slightly more complete example for the purposes of error handling:
conn = PG::Connection.new(ENV['TEST_DATABASE_URL'])

begin
  # Ensures the table is created if it doesn't exist
  res = conn.exec("CREATE TABLE IF NOT EXISTS top (url varchar, score integer);")
  res.result_status
rescue PG::Error => pg_error
  puts "Table creation failed: #{pg_error.message}"
end