How many rows can we update at a time using a Documentum UPDATE DQL query? - documentum 6.5

While updating a large number of rows (100,000 rows at a time) in a Documentum table, I get the error below.
Error :
[DM_SESSION_E_NON_EXIST_OBJ]error: "The object identified by SET_PUSH_OBJECT_STATUS does not exist."; ERRORCODE: 100; NEXT: null
How many rows can we update at a time using a Documentum UPDATE DQL query?

Your error is caused by an incorrect implementation in DFC, which limits the maximum length of DQL queries to ~63,800 bytes.

AFAIK the underlying database sets the limits. If you run the query in DA or e.g. FME dqMan, you can see the generated SQL that is run.
If you run into problems it might be due to inconsistencies in the repository; in that case you might want to run the Consistency Checker job to ensure your repo is in good shape before doing large updates.
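If the limit you are hitting is the query length (for example a long list of object IDs in an IN clause) rather than a row count, a common workaround is to split the update into smaller batches. A minimal DQL sketch, assuming the rows can be partitioned by an attribute such as r_creation_date (the type, attribute, and date values here are illustrative):

UPDATE dm_document OBJECTS
SET subject = 'processed'
WHERE r_creation_date >= DATE('01/01/2017','mm/dd/yyyy')
AND r_creation_date < DATE('02/01/2017','mm/dd/yyyy')

Run one statement per range until all rows are covered; each statement stays well under the query-length limit.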

Related

Impala query with LIMIT 0

Being a production support team member, I investigate issues with various Impala queries. While researching an issue, I saw a team submit an Impala query with LIMIT 0, which obviously does not return any rows, and then submit it again without LIMIT 0, which gives them a result. I guess they submit these queries from IBM DataStage. Before I question them why they do so, I wanted to check what could be a reason for someone to run a query with LIMIT 0. Is it just to check the syntax or the connection with Impala? I see a similar question discussed here in the context of SQL, but thought to ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note: LIMIT will process all the data and then apply the limit clause.
LIMIT 0 is mostly used to -
check if the syntax of the SQL is correct. Impala does fetch all the records before applying the limit, so the SQL is completely validated. Some systems use this to check the SQL they generated automatically before actually applying it on the server (see the sketch after this list).
limit fetching lots of rows from a huge table or data set every time you run a SQL statement.
create an empty table using the structure of another table without copying its storage format, configuration, etc.
avoid burdening Hue (or any interface that interacts with Impala): all the data will be processed but none will be returned.
performance test - this gives you a rough idea of the run time of the SQL. I say rough because it is not the actual time to complete, but an estimated time to complete the SQL.
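For illustration, a minimal sketch of the syntax-check and empty-table uses; the table and column names (sales, order_id, amount) are hypothetical:

-- validates the SQL text and the connection, but returns no rows
SELECT order_id, amount FROM sales LIMIT 0;

-- creates an empty table with the same column structure (Impala CTAS)
CREATE TABLE sales_empty AS SELECT * FROM sales LIMIT 0;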

Does the number of columns in a Vertica table impact query performance?

We are working with a Vertica 8.1 table containing 500 columns and 100,000 rows.
The following query takes around 1.5 seconds to execute, even when using the vsql client directly on one of the Vertica cluster nodes (to eliminate any network latency issue):
SELECT COUNT(*) FROM MY_TABLE WHERE COL_132 IS NOT NULL and COL_26 = 'anotherValue'
But when checking the query_requests table, the request_duration_ms is only 98 ms, and the resource_acquisitions table doesn't indicate any delay in resource acquisition. I can't understand where the rest of the time is spent.
If I then export to a new table only the columns used by the query, and run the query on this new, smaller, table, I get a blazing fast response, even though the query_requests table still tells me the request_duration_ms is around 98 ms.
So it seems that the number of columns in the table impacts the execution time of queries, even if most of these columns are not referenced. Am I wrong? If so, why is it so?
Thanks in advance
It sounds like your query is running against the (default) superprojection that includes all columns. Even though Vertica is a columnar database (with associated compression and encoding), your query is probably still touching more data than it needs to.
You can create projections to optimize your queries. A projection contains a subset of columns; if one is available that has all the columns your query needs, then the query uses that instead of the superprojection. (It's a little more complicated than that, because physical location is also a factor, but that's the basic idea.) You can use the Database Designer to create some initial projections based on your schema and sample queries, and iteratively improve it over time.
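For example, a minimal sketch of a query-specific projection for the query above, reusing the table and column names from the question (the projection name is illustrative):

CREATE PROJECTION MY_TABLE_COL26_132 AS
SELECT COL_26, COL_132
FROM MY_TABLE
ORDER BY COL_26;

-- populate the new projection with the existing data
SELECT REFRESH('MY_TABLE');

Once refreshed, the optimizer can answer the COUNT(*) query from this two-column projection instead of scanning the 500-column superprojection.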
I was running Vertica 8.1.0-1; it seems the issue was a Vertica bug in the query planning phase causing a performance degradation. It was solved in versions >= 8.1.1:
[https://my.vertica.com/docs/ReleaseNotes/8.1.x/Vertica_8.1.x_Release_Notes.htm]
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.

How to obtain the count of updated records from Hive

I am working with the Hive driver, where executeUpdate() does not return the affected record count. Is there an alternative way in which this can be obtained? We need the affected record count for further processing.
If I am not mistaken, Hive does not show (or even know?!) the number of updated records, so extracting this directly is likely not going to work.
Workaround
First run a count query using the exact same WHERE clause and log the result (see the sketch below).
Then do the actual update
Naturally this incurs significant overhead.
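A minimal sketch of that workaround, assuming a transactional (ACID) table; the table and column names (orders, status) are hypothetical:

-- 1. count the rows the update will touch, using the exact same WHERE clause
SELECT COUNT(*) FROM orders WHERE status = 'PENDING';

-- 2. then run the actual update
UPDATE orders SET status = 'PROCESSED' WHERE status = 'PENDING';

Note the two numbers only match if nothing else modifies the matching rows between the two statements.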

What will happen when inserting a row during a long-running query

I am writing some data loading code that pulls data from a large, slow table in an oracle database. I have read-only access to the data, and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered you all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image from that data, your query will run to completion; if for any reason the undo information has been overwritten, your query will fail with the dreaded ORA-1555: Snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within the one transaction, we may see two different result sets. If that is a problem (I think not in your case) we need to switch from Read Committed to Serializable isolation.
The Concepts Manual covers Concurrency and Consistency in great depth.
So, to answer your question: take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Although doing this means you're issuing two queries in the same transaction, which means you do need Serializable isolation after all!
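A minimal sketch of that approach; source_table and the :previous_watermark bind variable are hypothetical, created_ts is the timestamp column mentioned above:

-- must be the first statement of the transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- 1. capture the new watermark before kicking off the long select
SELECT MAX(created_ts) FROM source_table;

-- 2. pull everything newer than the previous run's watermark
SELECT * FROM source_table WHERE created_ts > :previous_watermark;

COMMIT;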

ORDER BY making the application very slow using Oracle

In my application I need to generate a report of the transaction history for all clients. I use Oracle 12c for my application and have 300k clients. The client table is related to the client details and transaction history tables. I have written a query to show the transaction history per month; it returns nearly 20 million records.
SELECT C.CLIENT_ID, CD.CLIENT_NAME, ...... FROM CLIENT C, CLIENT_DETAILS CD,
TRANSACTION_HISTORY TH
--Condition part
ORDER BY C.CLIENT_ID
These 3 tables have the right indexes, which work fine. But when fetching data using ORDER BY to show the customers in order, this query takes 8 hours to execute in the batch process.
I have analysed the cost of the query: it is 80085. When I remove the ORDER BY from the query, the cost drops to 200, so I have removed the ORDER BY for now. But I need to show the customers in order, and I cannot use LIMIT. Is there any way to overcome this?
You can try indexing the client ID in the table, which would speed up fetching the data in some order (a sketch follows below).
You can use the link for reference: link
Hope this helps
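A minimal sketch, reusing the TRANSACTION_HISTORY table from the question (the index name is illustrative); whether the optimizer can actually use the index to avoid the sort depends on the full query:

CREATE INDEX TH_CLIENT_ID_IX ON TRANSACTION_HISTORY (CLIENT_ID);

-- check whether the plan still contains a SORT ORDER BY step
EXPLAIN PLAN FOR
SELECT TH.CLIENT_ID FROM TRANSACTION_HISTORY TH ORDER BY TH.CLIENT_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);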
