UPSERT in Memsql from another table - insert

I am trying this query to insert some records from a table to another one,when recods are not already exsiting in the target table, but I am getting the following error, what is the best query to UPSERT in memsql from another table?
Query:
INSERT INTO ema.device_set
(segment_0, segment_1, segment_2, segment_3, segment_4, last_updated)
SELECT tmp.segment_0, tmp.segment_1, tmp.segment_2, tmp.segment_3, tmp.segment_4, tmp.last_updated
FROM ema.tmp_device_set tmp
WHERE NOT EXISTS (
SELECT *
FROM ema.device_set tab
WHERE tmp.segment_0 = tab.segment_0 and tmp.segment_1 = tab.segment_1 and tmp.segment_2 = tab.segment_2 and tmp.segment_3 = tab.segment_3 and tmp.segment_4 = tab.segment_4
);
error:
Partition has no master instance or Leaf Error: The database will be available to query in 2 seconds after recovery from disk is finished.

That error message means your nodes are down or recovering from disk. It has nothing to do with the specific UPSERT you are trying to do.

Check to make sure your query is not in any violations of the MemSQL INSERT...SELECT rules shown at the following link.
https://docs.memsql.com/docs/insert

Related

Perform Incremental Load On Qlik Sense

New to Qlik Sense.
I would like to perform incremental insert, update and delete. Through research i managed to write this script
//This fetches deleted records
SELECT `sale_detail_auto_id`
FROM `iprocure_ods.deleted_records` as dr
INNER JOIN `iprocure_ods.saledetail` sd ON sd.sale_detail_auto_id = dr.identifier AND dr.type = 2
WHERE dr.delete_date > TIMESTAMP('$(vSaleTransactionsRunTime)');
//This fetches new and updated records
[sale_transactions]:
SELECT *
FROM `iprocure_edw.sale_transactions`
WHERE `server_update_date` > TIMESTAMP('$(vSaleTransactionsRunTime)');
Concatenate([sale_transactions])
LOAD *
FROM [lib://qlikPath/saletransactions.qvd] (qvd) Where Not Exists(`sale_detail_auto_id`);
//This part updates runtime dates
MaxUpdateDate:
LOAD Timestamp(MAX(`server_update_date`), '$(TimestampFormat)') As maxServerUpdateDate
FROM [lib://qlikPath/saletransactions.qvd] (qvd);
Let vSaleTransactionsRunTime = peek('maxServerUpdateDate', 0, MaxUpdateDate);
DROP Table MaxUpdateDate;
New and update records works fine. The problem is with the deleted records are replaced with empty column except sale_detail_auto_id column.
How can i fetch data from saletransactions.qvd that are not in deleted records?
In first SELECT you select sale_detail_auto_id fields which is also exists under the same field name in new and updated records, so then you see deleted ids together with new ones. You need to rename that column to avoid conflict.
Please use AS, for example:
sale_detail_auto_id` AS `deleted_sale_detail_auto_id`
and then in EXISTS use that field:
Where Not Exists(deleted_sale_detail_auto_id, sale_detail_auto_id);
UPDATED:
Additionally I think it doesn't make sense to store deleted ids in data model so you can name that table:
[TEMP_deleted_ids]
SELECT sale_detail_auto_id` AS `deleted_sale_detail_auto_id`
and then in the end of the script remove it:
DROP Table [TEMP_deleted_ids];

How to read hbase current and previous versions data from Hive

I want to read all the data from hbase table using Hive.
Should be able to read all the Current and previous versions data from hbase
You can specify the number of version you get for Scan and Get and it will retrieve them:
HTable cTable = new HTable(TableName);
Get res = new Get(Bytes.toBytes(key));
//set no. of version that you want to fetch.
res.setMaxVersions(verNo); <--
Result fetchRow = cTable.get(res);
NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long,byte[] >>> allVersions = fetchRow.getMap();
Note: Versioning is by default disabled while creating table.
So, you need to enable it.
create 'employee',{NAME=>"myname",Versions=>2},'office' //Here versioning is enabled for column "myname" as "2" and no versioning for column "office"
describe 'employee' // show you versioning information.
alter 'employee',NAME=>'office',VERSIONS =>4 // Alter
// Put and scan the table - it will show new and old value
put 'employee','1','myname:name','Jigyasa1'
put 'employee','1','myname:name','Jigyasa2'
put 'employee','1','office:name','Add1'
put 'employee','1','office:name','Add2'
scan 'employee',{VERSIONS=>10}
For Hbase-hive integration follow the ref. link :
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

Drupal 7 | Query through multiple node references

Let me start with structure first:
[main_node]->field_reference_to_sub_node->[sub_node]->field_ref_to_sub_sub_node->[sub_sub_node]
[sub_sub_node]->field_type = ['wrong_type', 'right_type']
How to efficiently query all [sub_sub_node] ids with right_type, referenced by main_node (which is current opened node)?
Doing node_load on foreach seems a bit of overkill for this. Anybody has some better solutions? Greatly appreciated!
If you want to directly query the table of the fields:
$query = db_select('node', 'n')->fields('n_sub_subnode', array('nid'));
$query->innerJoin('table_for_field_reference_to_sub_node', 'subnode', "n.nid = subnode.entity_id AND subnode.entity_type='node'");
$query->innerJoin('node', 'n_subnode', 'subnode.subnode_target_id = n_subnode.nid');
$query->innerJoin('table_for_field_ref_to_sub_sub_node', 'sub_subnode', "n_subnode.nid = sub_subnode.entity_id AND sub_subnode.entity_type='node'");
$query->innerJoin('node', 'n_sub_subnode', 'sub_subnode.sub_subnode_target_id = n_sub_subnode.nid');
$query->innerJoin('table_for_field_type', 'field_type', "n_sub_subnode.nid = field_type.entity_id AND field_type.entity_type='node'");
$query->condition('n.nid', 'your_main_node_nid');
$query->condition('field_type.field_type_value', 'right_type');
Here is the explanation of each line:
$query = db_select('node', 'n')->fields('n_sub_subnode', array('nid'));
We start by querying the base node table, with the alias 'n'. This is the table used for the 'main_node'. The node ids which will be returned will be however from another alias (n_sub_subnode), you will see it below.
$query->innerJoin('table_for_field_reference_to_sub_node', 'subnode', "n.nid = subnode.entity_id AND subnode.entity_type='node'");
The first join is with the table of the field_reference_to_sub_node field, so you have to replace this with the actual name of the table. This is how we will get the references to the subnodes.
$query->innerJoin('node', 'n_subnode', 'subnode.subnode_target_id = n_subnode.nid');
A join back to the node table for the subnodes. You have to replace the 'subnode_target_id' with the actual field for the target id from the field_reference_to_sub_node table. The main purpose of this join is to make sure there are valid nodes in the subnode field.
$query->innerJoin('table_for_field_ref_to_sub_sub_node', 'sub_subnode', "n_subnode.nid = sub_subnode.entity_id AND sub_subnode.entity_type='node'");
The join to the table that contains references to the sub_sub_node, so you have to replace the 'table_for_field_ref_to_sub_sub_node' with the actual name of the table. This is how we get the references to the sub_sub_nodes.
$query->innerJoin('node', 'n_sub_subnode', 'sub_subnode.sub_subnode_target_id = n_sub_subnode.nid');
The join back to the node table for the sub_sub_nodes, to make sure we have valid references. You have to replace the 'sub_subnode_target_id' with the actual field for the target id from the 'field_ref_to_sub_sub_node' table.
$query->innerJoin('table_for_field_type', 'field_type', "n_sub_subnode.nid = field_type.entity_id AND field_type.entity_type='node'");
And we can now finally join the table with the field_type information. You have to replace the 'table_for_field_type' with the actual name of the table.
$query->condition('n.nid', 'your_main_node_nid');
You can put now a condition for the main node id if you want.
$query->condition('field_type.field_type_value', 'right_type');
And the condition for the field type. You have to replace the 'field_type_value' with the actual name of the table field for the value.
Of course, if you are really sure that you always have valid references, you can skip the joins to the node table and directly join the field tables using the target id and the entity_id fields (basically the target_id from on field table has to be the entity_id for the next one).
I really hope I do not have typos, so please check the queries carefully.

BigQuery - Check if table already exists

I have a dataset in BigQuery. This dataset contains multiple tables.
I am doing the following steps programmatically using the BigQuery API:
Querying the tables in the dataset - Since my response is too large, I am enabling allowLargeResults parameter and diverting my response to a destination table.
I am then exporting the data from the destination table to a GCS bucket.
Requirements:
Suppose my process fails at Step 2, I would like to re-run this step.
But before I re-run, I would like to check/verify that the specific destination table named 'xyz' already exists in the dataset.
If it exists, I would like to re-run step 2.
If it does not exist, I would like to do foo.
How can I do this?
Thanks in advance.
Alex F's solution works on v0.27, but will not work on later versions. In order to migrate to v0.28+, the below solution will work.
from google.cloud import bigquery
project_nm = 'gc_project_nm'
dataset_nm = 'ds_nm'
table_nm = 'tbl_nm'
client = bigquery.Client(project_nm)
dataset = client.dataset(dataset_nm)
table_ref = dataset.table(table_nm)
def if_tbl_exists(client, table_ref):
from google.cloud.exceptions import NotFound
try:
client.get_table(table_ref)
return True
except NotFound:
return False
if_tbl_exists(client, table_ref)
Here is a python snippet that will tell whether a table exists (deleting it in the process--careful!):
def doesTableExist(project_id, dataset_id, table_id):
bq.tables().delete(
projectId=project_id,
datasetId=dataset_id,
tableId=table_id).execute()
return False
Alternately, if you'd prefer not deleting the table in the process, you could try:
def doesTableExist(project_id, dataset_id, table_id):
try:
bq.tables().get(
projectId=project_id,
datasetId=dataset_id,
tableId=table_id).execute()
return True
except HttpError, err
if err.resp.status <> 404:
raise
return False
If you want to know where bq came from, you can call build_bq_client from here: http://code.google.com/p/bigquery-e2e/source/browse/samples/ch12/auth.py
In general, if you're using this to test whether you should run a job that will modify the table, it can be a good idea to just do the job anyway, and use WRITE_TRUNCATE as a write disposition.
Another approach can be to create a predictable job id, and retry the job with that id. If the job already exists, the job already ran (you might want to double check to make sure the job didn't fail, however).
Enjoy:
def doesTableExist(bigquery, project_id, dataset_id, table_id):
try:
bigquery.tables().get(
projectId=project_id,
datasetId=dataset_id,
tableId=table_id).execute()
return True
except Exception as err:
if err.resp.status != 404:
raise
return False
There is an edit in exception.
you can use exists() now to check if dataset exists same with table
BigQuery exist documentation
recently big query introduced so called scripting statements that can be quite a game changer for some flows.
check them out here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
Now for example to check if table exists you can use something like this:
sql = """
BEGIN
IF EXISTS(SELECT 1 from `YOUR_PROJECT.YOUR_DATASET.YOUR_TABLE) THEN
SELECT 'table_found';
END IF;
EXCEPTION WHEN ERROR THEN
# you can print your own message like above or return error message
# however google says not to rely on error message structure as it may change
select ##error.message;
END;
"""
With my_bigquery being an instance of class google.cloud.bigquery.Client (already authentified and associated to a project):
my_bigquery.dataset(dataset_name).table(table_name).exists() # returns boolean
It does an API call to test for the existence of the table via a GET request
Source: https://googlecloudplatform.github.io/google-cloud-python/0.24.0/bigquery-table.html#google.cloud.bigquery.table.Table.exists
It works for me using 0.27 of the Google Bigquery Python module
Inline SQL Alternative
tarheel's answer is probably the most correct at this point in time
but I was considering the comment from Ivan above that "404 could also mean the resource is not there for a bunch of reasons", so here is a solution that should always successfully run a metadata query and return a result.
It's not the fastest, because it always has to run the query, bigquery has overhead for small queries
A trick I've seen previously is to query information_schema for a (table) object, and union that to a fake query that ensures a record is always returned even if the the object doesn't. There's also a LIMIT 1 and an ordering to ensure the single record returned represents the table, if it does exist. See the SQL in the code below.
In spite of doc claims that Bigquery standard SQL is ISO compliant, they don't support information_schema, but they do have __table_summary__
dataset is required because you can't query __table_summary__ without specifying dataset
dataset is not a parameter in the SQL because you can't parameterize object names without sql injection issues (apart from with the magical _TABLE_SUFFIX, see https://cloud.google.com/bigquery/docs/querying-wildcard-tables )
#!/usr/bin/env python
"""
Inline SQL way to check a table exists in Bigquery
e.g.
print(table_exists(dataset_name='<dataset_goes_here>', table_name='<real_table_name'))
True
print(table_exists(dataset_name='<dataset_goes_here>', table_name='imaginary_table_name'))
False
"""
from __future__ import print_function
from google.cloud import bigquery
def table_exists(dataset_name, table_name):
client = bigquery.Client()
query = """
SELECT table_exists FROM
(
SELECT true as table_exists, 1 as ordering
FROM __TABLES_SUMMARY__ WHERE table_id = #table_name
UNION ALL
SELECT false as table_exists, 2 as ordering
) ORDER by ordering LIMIT 1"""
query_params = [bigquery.ScalarQueryParameter('table_name', 'STRING', table_name)]
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
if dataset_name is not None:
dataset_ref = client.dataset(dataset_name)
job_config.default_dataset = dataset_ref
query_job = client.query(
query,
job_config=job_config
)
results = query_job.result()
for row in results:
# There is only one row because LIMIT 1 in the SQL
return row.table_exists

Linq Contains issue: cannot formulate the equivalent of 'WHERE IN' query

In the table ReservationWorkerPeriods there are records of all workers that are planned to work on a given period on any possible machine.
The additional table WorkerOnMachineOnConstructionSite contains columns workerId, MachineId and ConstructionSiteId.
From the table ReservationWorkerPeriods I would like to retrieve just workers who work on selected machine.
In order to retrieve just relevant records from WorkerOnMachineOnConstructionSite table I have written the following code:
var relevantWorkerOnMachineOnConstructionSite = (from cswm in currentConstructionSiteSchedule.ContrustionSiteWorkerOnMachine
where cswm.MachineId == machineId
select cswm).ToList();
workerOnMachineOnConstructionSite = relevantWorkerOnMachineOnConstructionSite as List<ContrustionSiteWorkerOnMachine>;
These records are also used in the application so I don't want to bypass the above code even if is possible to directly retrieve just workerPeriods for workers who work on selected machine. Anyway I haven't figured out how it is possible to retrieve the relevant workerPeriods once we know which userIDs are relevant.
I have tried the following code:
var userIDs = from w in workerOnMachineOnConstructionSite select new {w.WorkerId};
List<ReservationWorkerPeriods> workerPeriods = currentConstructionSiteSchedule.ReservationWorkerPeriods.ToList();
allocatedWorkers = workerPeriods.Where(wp => userIDs.Contains(wp.WorkerId));
but it seems to be incorrect and don't know how to fix it. Does anyone know what is the problem and how it is possible to retrieve just records which contain userIDs from the list?
Currently, you are constructing an anonymous object on the fly, with one property. You'll want to grab the id directly with (note the missing curly braces):
var userIDs = from w in workerOnMachineOnConstructionSite select w.WorkerId;
Also, in such cases, don't call ToList on it - the variable userIDs just contains the query, not the result. If you use that variable in a further query, the provider can translate it to a single sql query.

Resources