Deleting exact duplicate records in an MS Access DB? - ms-access-2013

I have a database which imports linked data tables. Obviously, with linked tables I cannot change the design of the data tables. However, there are many duplicates in the data table that I want to use, and my aim is to run a query that deletes all but one of the duplicates in the table. Is there a way of doing this?
Any support would be appreciated.
Chris

To omit the duplicates you could conceivably create a SELECT DISTINCT query to return all the fields from the linked table, and then use that query (instead of the linked table) as a basis for your other queries.
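For example, a minimal sketch, assuming a linked table named tblImport with made-up field names:

```sql
-- Saved query (e.g. qryImportDistinct): returns one row per unique
-- combination of the listed fields. Table and field names are placeholders.
SELECT DISTINCT Field1, Field2, Field3
FROM tblImport;
```

Base your other queries on qryImportDistinct instead of tblImport, and the duplicates disappear from every downstream result without touching the linked table.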

Related

Is there an advantage to using a local index on a partitioned table in Oracle?

I assume the answer is "no" in this scenario, but I figured I'd ask and see if there was something I was missing:
I have an Oracle table which is partitioned for ease of data loading: data is loaded into six separate tables and then partition-switched into the main table. The only thing differentiating these loading tables is the source of the data, so each one has a distinct value in a datasource column, which is used to partition the main table. We occasionally have ad hoc queries which look at this datasource in the main table, but the standard reports querying this table ignore the column entirely. Nothing inserts, updates, or deletes individual records in this table, so there's no concern about maintaining indexes.
In this case, is there any reason to use local indexes instead of global ones?
A local index makes a lot of sense if you use partitioning for performance reasons.
If your queries always contain the partition key, then Oracle will only scan that specific partition (this is known as "partition pruning").
If you then have additional conditions that would benefit from an index lookup, the database only needs to check the local index, which is much smaller than a global index, so the lookup will be faster.
In your case, if you never (or almost never) include the partition key in the queries, you are right that the local index wouldn't be helpful.
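As a sketch of the trade-off (table, column, and partition names below are made up, not from the question):

```sql
-- List-partitioned main table, one partition per data source.
CREATE TABLE main_data (
    id          NUMBER,
    datasource  VARCHAR2(10),
    payload     VARCHAR2(100)
)
PARTITION BY LIST (datasource) (
    PARTITION p_src1 VALUES ('SRC1'),
    PARTITION p_src2 VALUES ('SRC2')
);

-- LOCAL: one index segment per partition; it pays off only when the
-- query also filters on DATASOURCE so pruning narrows the scan:
CREATE INDEX main_data_payload_ix ON main_data (payload) LOCAL;

-- GLOBAL (the default; just omit LOCAL): a single index across all
-- partitions; this is the one that serves reports ignoring DATASOURCE.
-- CREATE INDEX main_data_payload_ix ON main_data (payload);

-- With the LOCAL index, this query prunes to p_src1 and probes only
-- that partition's index segment:
SELECT * FROM main_data
 WHERE datasource = 'SRC1' AND payload = 'x';
```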

Joining Tables in Kibana

Suppose I have a huge database (table a) about employees in a certain department, which includes the employee name in addition to many other fields. Now, in a different database (or a different table, say table b), I have only two entries: the employee name and his ID. But this table (b) contains entries not only for one department but for the whole company. The raw format for both tables is text files, so I parse them with Logstash into Elasticsearch and then visualize the results with Kibana.
Now, after I created several visualizations from table (a) in Kibana where the x-axis shows the employee name, I realize it would be nice to have the employee IDs instead. Since I know I have this information in table (b), I am searching for some way to tell Kibana to translate the employee names in the graphs generated from table (a) into employee IDs based on table (b). My questions are as follows:
1) Is there a way to do this directly in Kibana? If yes, can we do it if each table is saved in a separate index, or do we have to save them both in the same index?
2) If this cannot be done directly in Kibana and has to be done when indexing the data, is there a way to still parse both text files separately with Logstash?
I know Elasticsearch is a non-relational database and therefore is not designed for SQL-like functionality (joins). However, there should be an equivalent or a workaround. This is just a simple use case, but of course the generic question is how to correlate data from different sources. Otherwise Elasticsearch would honestly not be that powerful.
Similar questions have been asked and answered.
Basically the answer is no, you can't do joins in Kibana; you have to do them at indexing time. Space is cheap and Elasticsearch handles duplicated data nicely, so just create any fields you need to display at indexing time.
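For reference, what is being asked for here is the kind of join that is trivial in SQL but unavailable in Elasticsearch at query time; the result of something like the sketch below (table and column names invented) is what you would precompute and index:

```sql
-- Denormalize before indexing: attach the company-wide employee ID
-- (table b) to every departmental record (table a).
SELECT a.*, b.employee_id
FROM dept_records a
JOIN employees    b ON b.employee_name = a.employee_name;
```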
You might want to give Kibi a try.
Unfortunately, the only answer I know of is to either write your own plug-in or, as we have had to do, downgrade to ES 2.4.1 and install Kibi
(https://siren.solutions/new-release-siren-join-2-4-1-compatible-with-es-2-4-1/)
and then install the Kibi join plugin
(http://siren.solutions/relational-joins-for-elasticsearch-the-siren-join-plugin/).
This will allow you to get the joins you would expect from a relational DB.

Check all table columns for a value

OK, tricky question: I am trying to figure out where a database schema is storing a particular pointer. I know the pointer value; I just don't know what table or what column it is in. The pointer is 123123123. How do I check all table columns to see if any of them have that value?
Thanks.
In H2 you can use fulltext search, but then you would need to add all the tables to the search scope and index them.
If you only need to index primary keys, that might be better, but you still need to come up with individual FT_CREATE_INDEX() calls for each table. You can automate this with several languages or with ETL tools (like Scriptella).
If you have enough disk space, you could dump the SQL from your DB and use a viewer for big files like glogg.
The advantage of the first solution is that it needs no external tools, but you have to work out a specific indexing script for any existing or new table. The second solution is a one-time fix.
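A minimal sketch of the fulltext route in H2 (ORDERS is a made-up table name; you would repeat the FT_CREATE_INDEX call for every table you want searched):

```sql
-- Register and initialize H2's native fulltext support once per database.
CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();

-- NULL column list = index all columns of the table; repeat per table.
CALL FT_CREATE_INDEX('PUBLIC', 'ORDERS', NULL);

-- 0, 0 = no row limit, no offset; each hit identifies the table and
-- keys of a matching row.
SELECT * FROM FT_SEARCH('123123123', 0, 0);
```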
I use SQL Search from RedGate. It's free and it helps you find any text anywhere in the database.
https://www.red-gate.com/products/

Compound rowkey in Azure Table storage

I want to move some of my Azure SQL tables to Table storage. As far as I understand, I can save everything in the same table, separating it using PartitionKey and keeping it unique within each partition using RowKey.
Now, I have a table with a compound key:
ParentId: (uniqueidentifier)
ReportTime: (datetime)
I also understand that RowKeys have to be strings. Will I need to combine these into a single string? Or can I combine multiple keys some other way? Do I need to make a new key, perhaps?
Any help is appreciated.
UPDATE
My idea is to take data from several (three for now) database tables and put it in the same storage table, separating it with the partition key.
I will query using the ParentId and a WeekNumber (another column). This table has about 1 million rows that are deleted from the DB weekly. My two other tables have about 6 million and 3.5 million rows.
This question is pretty broad and there is no right answer.
The specific question: can you use compound keys with Azure Table storage? Yes, you can do that. But it involves manually serializing / deserializing your object's properties. You can achieve that by overriding the TableEntity's ReadEntity and WriteEntity methods. Check this detailed blog post on how you can override these methods to use your own custom serialization/deserialization.
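As a hedged sketch of the simplest route, flattening the compound key into strings on the SQL side before migration (dbo.Reports and the formats below are assumptions, not from the question):

```sql
-- T-SQL against the existing Azure SQL table: PartitionKey and RowKey
-- must both be strings, so cast/format the compound key accordingly.
SELECT
    CAST(ParentId AS NVARCHAR(36))          AS PartitionKey,
    -- A fixed-width, sortable timestamp keeps RowKey ordering sane.
    FORMAT(ReportTime, 'yyyyMMddHHmmssfff') AS RowKey
FROM dbo.Reports;
```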
I will further discuss my view on your broader question.
First of all, why do you want to put data from 3 (SQL) tables into one Azure table? Just have 3 Azure tables.
Second, as Fabrizio points out, think about how you are going to query the records, because the Windows Azure Table service has only one index, and that is the PartitionKey + RowKey pair of properties (columns). If you are pretty sure you will mostly query data by a known PartitionKey and RowKey, then Azure Tables suits you perfectly! However, you say that your combination for the RowKey is ParentId + WeekNumber! That means a record is uniquely identified by this combination! If that is true, then you are even more ready to go.
Next, you say you are going to delete records every week! You should know that the DELETE operation acts on a single entity. You can use Entity Group Transactions to DELETE multiple entities at once, but there are limits: (a) all entities in a batch operation must have the same PartitionKey, (b) the maximum number of entities per batch is 100, and (c) the maximum size of a batch operation is 4 MB. Say you have 1M records, as you state. In order to delete them, you first have to retrieve them in groups of 100, then delete them in groups of 100. In the best possible case that is 10k operations for retrieval and 10k operations for deletion. Even if it only costs 0.002 USD, think about the time it takes to execute those 20k operations against a REST API.
Since you have to delete entities on a regular basis, fixed to a WeekNumber let's say, I suggest that you create your tables dynamically and include the week number in the table name. That way you will achieve:
Even better partitioning of information
Easier and more granular backup / delete of information
Deleting millions of entities requires just one operation: delete the table.
There is no single solution to your problem. Yes, you can use ParentId as PartitionKey and ReportTime as RowKey (or invert the assignment). But the two main questions are: how do you query your data, and with what conditions? And how much data do you store: 1,000 items, 1 million items, 1,000 million items? Total storage usage is important, but it's also very important to consider the number of transactions you will generate against the storage.

IOT vs Heap in Oracle. Help me make a choice

I've read a lot of information about IOTs, and now my head is spinning...
Please help me resolve this question.
I have a table with this structure:
ID (PK); ID_DRUG_NAME (a); ID_FROM (b); ID_PROVIDER (c); DELETED;
The data in this table is never deleted, only marked as removed.
Many queries use ID; other queries use a,b or a,c or a,b,c.
I want to recreate this table using ORGANIZATION INDEX.
How beneficial will it be?
How do I correctly create the primary key and the indexes?
What pitfalls will I run into?
Index-organized tables (IOTs) are best used when there is a single access path. You've identified two different lead columns, so an IOT is probably not a good choice.
The issue here is that, if you make it an IOT, you have to choose one of the two columns (ID or ID_DRUG_NAME) that you'll frequently be filtering on as the index. Theoretically, you could still add a secondary index on an IOT, but it's almost always a bad idea: an IOT with a secondary index typically performs worse than if the secondary index didn't exist, even when querying against the column in that secondary index.
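A minimal sketch of the two options under discussion (column types are assumptions; the column names come from the question):

```sql
-- Option 1: index-organized table keyed on ID. Rows live inside the
-- primary key's B-tree, so lookups by ID avoid a separate table access,
-- but every other access path suffers.
CREATE TABLE drug_orders_iot (
    id            NUMBER PRIMARY KEY,
    id_drug_name  NUMBER,   -- (a)
    id_from       NUMBER,   -- (b)
    id_provider   NUMBER,   -- (c)
    deleted       NUMBER(1)
) ORGANIZATION INDEX;

-- Option 2 (likely better here): a plain heap table plus one composite
-- index. Queries on a,b and a,b,c use its leading columns directly;
-- a,c can still range-scan on a and filter on c.
CREATE TABLE drug_orders (
    id            NUMBER PRIMARY KEY,
    id_drug_name  NUMBER,
    id_from       NUMBER,
    id_provider   NUMBER,
    deleted       NUMBER(1)
);
CREATE INDEX drug_orders_abc_ix
    ON drug_orders (id_drug_name, id_from, id_provider);
```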
