Read dead tuple data - Greenplum

Is it possible to read the dead tuples from a table using the xmin or xmax column, or by changing any catalog parameters? Is there any way to read the old data that was updated?
Thanks in Advance

There is a hidden debugging variable --
I'm not going to post it here, because I believe that it's dangerous.
Google: greenplum+show+deleted
It's only good until VACUUM runs, after which those tuples are physically removed.
This debugging GUC is for viewing only -- if you mess with the xmin, xmax, ctid, ... trying to undelete something, you are certainly going to cause corruption.
You can, however, export those rows without the hidden columns, then re-import.
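For reference, the xmin, xmax and ctid system columns can be selected from any heap table without any special setting, and a plain CREATE TABLE AS is enough to copy the visible rows out without them. A minimal sketch, assuming a hypothetical table called mytable:
-- System columns are never included in "*", so they must be listed explicitly:
SELECT xmin, xmax, ctid, t.* FROM mytable t;
-- Salvage the rows into a fresh table instead of touching the system columns:
CREATE TABLE mytable_salvage AS SELECT * FROM mytable;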

Related

When would I need to modify a knowledge module in ODI?

I have come across an ODI project that has a lot of user-defined KMs, and I don't understand why they were modified. Is there any particular scenario where the existing KMs don't work?
There are a lot of reasons to write your own KMs or modify the existing ones, for example:
write logs to your own paths/tables;
read metadata from flex fields (metadata such as default values for some columns, the base table name used for temporary tables, the type of load: full/incremental, etc.);
perform transformation/staging steps that differ from those of the standard KMs;
customize your CKM: create an error table where you can see the rows in error, a table with the correct results, and so on;
when modifying a KM, you may want to name the temporary tables according to your own standard, and so on.
The benefit of writing KMs is that the limit is your imagination (or almost). You can do plenty of things. The standard KMs are very good, but there are moments when you reach their limits, and that is when you should create your own.
Hope that this helps you.

Postgres tsvector_update_trigger sometimes takes minutes

I have configured free text search on a table in my postgres database. Pretty simple stuff, with firstname, lastname and email. This works well and is fast.
I do, however, sometimes experience long delays when inserting a new entry into the table: the insert keeps running for minutes and also generates huge WAL files (we use the WAL files for replication).
Is there anything I need to be aware of with my free text index? Like Postgres perhaps restructuring it for performance reasons? My index is currently around 400 MB.
Thanks in advance!
Christian
Given the size of the WAL files, I suspect you are right that it is an index update/rebalancing that is causing the issue. However I have to wonder what else is going on.
I would recommend against storing tsvectors in separate columns. A better way is to run an index on to_tsvector()'s output. You can have multiple indexes for multiple languages if you need. So instead of a trigger that takes, say, a field called description and stores the tsvector in desc_tsvector, I would recommend just doing:
CREATE INDEX mytable_description_tsvector_idx ON mytable (to_tsvector('english', description));  -- the two-argument form is required in an index expression; to_tsvector(text) is not IMMUTABLE
Now, if you need a consistent search interface across a whole table, there are more elegant ways of doing this using "table methods."
In general the functional index approach has fewer issues associated with it than anything else.
A second thing you should be aware of is partial indexes. If you need to, you can index only the records of interest. For example, if most of my queries only check the last year, I can:
CREATE INDEX mytable_description_tsvector_recent_idx ON mytable (to_tsvector('english', description))
WHERE created_at > '2012-01-01';  -- the predicate must be immutable too, so use a fixed cutoff date (example value) rather than now()
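For either index to be used, the query has to repeat the same to_tsvector() expression (and, for the partial index, a filter that implies its predicate). A minimal sketch with a placeholder search term:
SELECT * FROM mytable
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'some & term')
AND created_at > '2012-01-01';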

What would make a FoxPro memo table lose its records?

I have an old FoxPro database that I work with. The database can be about 100 MB in size, and due to corruption and an index issue, all of a sudden the new table (the table after corruption) is about 4 KB in size.
I understand that the data is corrupted, but why would the data disappear?
If any FoxPro experts could tell me why the data is missing, I would really appreciate it.
BTW: FoxPro is still very fast compared to a lot of the bells and whistles in databases out there.
The last data truncation/error occurred after a power outage and the data is just gone. The file size decreased to 4 KB.
Maybe CHR(0) in the corruption, though I wouldn't expect the file to shrink unless you also did something to rewrite the file. Maybe PACK?
A DBF file has a header followed by data. If the header is corrupted, it loses track of where the data is.
I have had instances in the past where Windows has misreported the physical size of a FoxPro table, reporting one file to be BIGGER than it actually was and another SMALLER than it actually was.
The data MAY actually still be there; the trick would be getting FoxPro to recognise that there are more records in the table than are recorded in the table header.
Questions:
Have you packed the table?
Have you tried one of the table recovery tools, like DBF Recovery, on the file?
If the answer is no to both of the above, then it may be worth a try!
Good luck

How slow are cursors really and what would be better alternatives?

I have been reading that cursors are pretty slow and that one should avoid them unless out of options. I am trying to optimize my stored procedures, and one of them uses a cursor. It is called frequently by my application, with a lot of users (20,000) and rows to update. I was thinking maybe I should use something else as an alternative.
All I want is to get a list of records and then operate on each one depending on its row values. So for example, say we have:
Employee - Id, Name, BenefitId, StartDate, EndDate
Based on BenefitId I need to do different calculations using the dates between StartDate and EndDate and update the employee details. I am just making up this contrived example to give an idea of my situation.
What are your thoughts? Are there better alternatives to cursors, like temp tables or user-defined functions? When should you really opt for them, or should we never be using cursors? Thanks everyone for your help.
I once changed a stored procedure from cursors to set based logic. Running time went from 8 hours to 22 seconds. That's the kind of difference we're talking about.
Instead of taking a different action one record at a time, make several passes over the data: update and set field1 = A where field2 is X, then update and set field1 = B where field2 is Y, and so on.
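For the Employee example in the question, that per-row logic can often collapse into one UPDATE with a CASE expression. This is only a sketch: the target column AccruedAmount, the benefit IDs and the per-benefit formulas are all made up for illustration.
-- Hypothetical set-based replacement for a cursor loop:
UPDATE Employee
SET AccruedAmount =
    CASE BenefitId
        WHEN 1 THEN DATEDIFF(day, StartDate, EndDate) * 1.0
        WHEN 2 THEN DATEDIFF(month, StartDate, EndDate) * 25.0
        ELSE 0
    END
WHERE EndDate >= StartDate;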
I've changed out cursors and moved from over 24 hours of processing time to less than a minute.
To help you see how to rewrite your proc with set-based logic, read this:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
A cursor does row-by-row processing, or "Row By Agonizing Row" if your name is Jeff Moden.
This is just one example of how to do set-based SQL programming as opposed to RBAR, but it depends ultimately on what your cursor is doing.
Also, have a look at this on StackOverflow:
RBAR vs. Set based programming for SQL
First off, it sounds like you are mixing some business logic into your stored procs. That's generally something you want to avoid. A better solution would be to have a middle-tier layer that encapsulates that business logic; that way your data layer remains purely data.
To answer your original question, it really depends on what you are using the cursors for. In some cases you can use a table variable or a temp table. You have to remember to clean up temp tables, though, so I would suggest using table variables whenever possible. Sometimes, however, there is just no way around using cursors. Maybe the original DBAs didn't normalize enough (or normalized too much) and you are forced to use a cursor to traverse multiple tables without any foreign key relationships.
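As a quick illustration of the two options mentioned above (the column list is just a placeholder):
-- Table variable: scoped to the batch/procedure, no explicit cleanup needed.
DECLARE @Work TABLE (Id INT PRIMARY KEY, BenefitId INT);
-- Temp table: lives until it is dropped or the session ends.
CREATE TABLE #Work (Id INT PRIMARY KEY, BenefitId INT);
DROP TABLE #Work;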

Should I store reference data in my application memory, or in the database?

I am faced with the choice where to store some reference data (essentially drop down values) for my application. This data will not change (or if it does, I am fine with needing to restart the application), and will be frequently accessed as part of an AJAX autocomplete widget (so there may be several queries against this data by one user filling out one field).
Suppose each record looks something like this:
category
effective_date
expiration_date
field_A
field_B
field_C
field_D
The autocomplete query will need to check the input string against 4 fields in each record and discrete parameters against the category and effective/expiration dates, so if this were a SQL query, it would have a where clause that looks something like:
... WHERE category = ?
AND effective_date < ?
AND expiration_date > ?
AND (colA LIKE ? OR colB LIKE ? OR colC LIKE ? OR colD LIKE ?)
I feel like this might be a rather inefficient query, but I suppose I don't know enough about how databases optimize their indexes, etc. I do know that a lot of really smart people work really hard to make database engines really fast at this exact type of thing.
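A supporting index for a query like this would presumably look something like the sketch below (ref_data is a placeholder name for the table above); note that LIKE with a leading wildcard ('%abc%') generally cannot use a plain B-tree index, while prefix searches ('abc%') can:
CREATE INDEX ref_data_cat_dates_idx ON ref_data (category, effective_date, expiration_date);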
The alternative I see is to store it in my application memory. I could have a list of these records for each category and then iterate over each record in the category to see if the filter criteria are met. This is definitely O(n), since I need to examine every record in the category.
Has anyone faced a similar choice? Do you have any insight to offer?
EDIT: Thanks for the insight, folks. Sending the entire data set down to the client is not really an option, since the data set is so large (several MB).
Definitely cache it in memory if it's not changing during the lifetime of the application. You're right, you don't want to be going back to the database for each call, because it's completely unnecessary.
There can be debate about exactly how much to cache on the server (I tend to cache as little as possible until I really need to), but for information that will not change and will be accessed repeatedly, you should almost always cache it in the Application object.
Given the number of directions you're coming at this data from (filtering on 6 or more columns), I'm not sure how much more you'll be able to optimize the information in memory. The first thing I would try is to store it in a list in the Application object and query it using LINQ-to-objects. Or, if there is one field that is used significantly more than the others, try using a Dictionary keyed on that field instead of a list. If performance continues to be a problem, try storing it in a DataSet and setting indexes on it (but of course you lose some code simplicity and maintainability that way).
I do not think there is a one size fits all answer to your question. Depending on the data size and usage patterns the answer will vary. More than that the answer may change over time.
This is why in my development I built an intermediate layer which allows me to change how the caching is done through configuration (with no code changes). Every once in a while we analyze various stats (cache hit ratio, etc.) and decide whether we want to change the cache behavior.
BTW, there is also a third layer: you can push your static data to the browser and cache it there too.
Can you just hard-wire it into the program (as long as you stick to DRY)? Changing it only requires a rebuild.
