Selecting unique rows from a table having duplicate values in only one column - vertica

I have a table containing zip codes and city areas. The data is such that multiple zip codes can refer to the same area. It currently looks like this.
ZIP | CITY AREA
1 | A
2 | B
2 | A
3 | C
3 | A
4 | D
I want to remove the duplicate occurrences so that the table looks like this:
ZIP | CITY AREA
1 | A
2 | B
3 | C
4 | D
I don't mind which city area is mapped to a zip code, but I am unable to remove the duplicates using DISTINCT.
I understand the question is a simple one, but I am new to SQL and any help would be greatly appreciated.

Try using GROUP BY, which is supported by Vertica:
SELECT city_area, MIN(ZIP) AS ZIP
FROM yourTable
GROUP BY city_area;
Your sample data, while brief, seems to imply that you want to select the smallest ZIP value for each city area, should a city area have more than one record.
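To see the effect of the GROUP BY on the sample rows, here is a minimal sketch. It runs the same kind of query against an in-memory SQLite database rather than Vertica (an assumption made only so the example is self-contained), and the alias min_zip and the function name dedupe_by_city_area are illustrative, not from the original answer.

```python
import sqlite3

# Sample rows from the question: (zip, city_area).
ROWS = [(1, "A"), (2, "B"), (2, "A"), (3, "C"), (3, "A"), (4, "D")]


def dedupe_by_city_area(rows):
    """Return one (zip, city_area) pair per city area, keeping the smallest ZIP."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
    conn.execute("CREATE TABLE yourTable (zip INTEGER, city_area TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT MIN(zip) AS min_zip, city_area "
        "FROM yourTable "
        "GROUP BY city_area "
        "ORDER BY min_zip"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for zip_code, area in dedupe_by_city_area(ROWS):
        print(zip_code, area)
```

On this sample every ZIP and every city area ends up appearing once. With other data, grouping by city area can still leave a ZIP repeated, in which case grouping by ZIP and taking MIN(city_area) may be closer to what is wanted.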

Related

How to show different groups in same matrix where no parent or child relationship

I am tasked with reproducing a spreadsheet in an SSRS report to save hours of Excel spreadsheet work. I have done all the calculations and got them into a single dataset; however, I am not able to work out how to display them in the same table/matrix.
My spreadsheet looks like this:
Column B is a text column used to describe what the figures in each group are showing. Col C is 'Region' grouping.
I have got this far with my matrix - grouping by region and month. This gives me rows 3 to 8 incl of the spreadsheet.
But I am not able to work out how to add the next group of data (rows 9 to 12 in the spreadsheet) to the matrix. Each group of figures would use an expression that pulls from a different field, so only a single dataset is needed. I still want each group to use the region and month exactly as the top group does. There is no parent or child relationship between the labels in col B of the spreadsheet.
I have tried adding an adjacent group below but it is still trying to keep it as part of the top group.
Is this at all possible?
Do I need to have 6 different matrices, placed together, with the month names hidden in the bottom 5?
This is an extract of the data results. The top group counts the unique customer IDs, the second group counts the unique sale IDs, the 3rd group totals the net sale value, the 4th group totals the profit value, the 5th group divides the total sales by the number of customers, and the 6th group divides the total sales by the number of sales.
It looks like you will need 6 separate tablixes, amending the aggregate function and field for each one.

Store a tree structure as a table

Let me describe the structure of one record in pseudo-code:
Record
    UserName
    E-mail
    Items[8]
        ItemPropertyA
        ItemPropertyB
        ItemPropertyC
        ItemPropertyD
        ItemPropertyE
There are 1-8 items in a record and exactly 5 properties in each item. I need to store many such records as an (Excel) table, and I want it to be human-readable if possible. The straightforward approach is to put the items and properties in 8 * 5 = 40 columns, but this is difficult to review. I'm going to place a JSON array of properties in each cell (one cell per item), using as many cells in each row as needed. I'm just curious about other tree-to-table possibilities.
There is an alternative to 40 separate columns (some of which may be unused if there are fewer than 8 items in a record). You can use database style normalized records:
SHEET 1
RecordId  UserName  Email
1         Bobby     bobby@example.com
2         Susan     sueb@example.com
SHEET 2
RecordId  ItemId  PropertyA  PropertyB  PropertyC  PropertyD  PropertyE
1         1       Chocolate  Electric   Round      Silver     Hebrew
1         2       Raspberry  Steam      Trapezoid  Brass      Esperanto
1         3       Durian     Gravity    Bezier     Titanium   Bahasa Melayu
2         1       Vanilla    Solar      Rhombus    Copper     Pashto
Of course you could normalize even further and have just a single Property column, but the above seems perhaps enough when you know each item has exactly the same five properties.
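As a rough illustration of the two-sheet layout, the sketch below flattens nested records into two lists of rows that could then be written to separate sheets. The dictionary keys and the function name normalize are assumptions made for the example; the sample values are the ones from the sheets above.

```python
def normalize(records):
    """Split nested records into two flat tables (lists of row tuples)."""
    sheet1 = [("RecordId", "UserName", "Email")]
    sheet2 = [("RecordId", "ItemId", "PropertyA", "PropertyB",
               "PropertyC", "PropertyD", "PropertyE")]
    for record_id, record in enumerate(records, start=1):
        sheet1.append((record_id, record["UserName"], record["Email"]))
        # ItemId restarts at 1 for every record, as in SHEET 2.
        for item_id, props in enumerate(record["Items"], start=1):
            sheet2.append((record_id, item_id, *props))
    return sheet1, sheet2


# Hypothetical in-memory form of the records (1-8 items, 5 properties each).
RECORDS = [
    {"UserName": "Bobby", "Email": "bobby@example.com", "Items": [
        ("Chocolate", "Electric", "Round", "Silver", "Hebrew"),
        ("Raspberry", "Steam", "Trapezoid", "Brass", "Esperanto"),
        ("Durian", "Gravity", "Bezier", "Titanium", "Bahasa Melayu"),
    ]},
    {"UserName": "Susan", "Email": "sueb@example.com", "Items": [
        ("Vanilla", "Solar", "Rhombus", "Copper", "Pashto"),
    ]},
]

if __name__ == "__main__":
    users, items = normalize(RECORDS)
    for row in users + items:
        print(row)
```

RecordId is what ties an item row back to its user row, so a reviewer can filter SHEET 2 by RecordId to see all items of one record.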

OpenOffice Calc move only unique values to new column

I looked around for a bit and didn't see any question quite like the one I have. I have a sheet with over 80k values in column A. What I need is to remove every occurrence of any duplicated value. If the value 5 appears more than once, I don't want the value at all. For example, if I have something like this:
A
1
2
2
3
4
3
I ONLY want the values of 1 and 4, because they only appear once. I'd like every other value deleted, or to have only the values like 1 and 4 appear in another column.
Any help is greatly appreciated.
Work on a copy, because the following deletes rows from the source data. In B1, enter (adjust 90000 to suit):
=COUNTIF(A$1:A$90000;A1)>1
and copy down to suit. Filter A:B, select TRUE in column B (it may display as 1, depending on the cell format), and delete the selected rows. Then change the filter to select All.
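If the column can be exported, the same rule (keep a value only when it occurs exactly once) can be sketched outside the spreadsheet. This is only an illustration of the logic behind the COUNTIF test; the function name values_appearing_once is made up for the example.

```python
from collections import Counter


def values_appearing_once(values):
    """Keep only the values that occur exactly once, in their original order."""
    counts = Counter(values)  # plays the role of COUNTIF over the whole column
    return [value for value in values if counts[value] == 1]


if __name__ == "__main__":
    column_a = [1, 2, 2, 3, 4, 3]  # the sample column from the question
    print(values_appearing_once(column_a))
```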

SSRS / Visual Studio: Distribute data into columns based on another field

Is it possible to easily distribute data ('subject's in my case) into different columns based on the value of another field ('block' in my case) so I could have a kind of timetabling grid report, i.e.
if my data looks like:
Subject | Block
----------
English | A
French | B
Science | C
----------
x | A
y | B
z | C
How might I produce a table / matrix that looks like:
Block A | Block B | Block C
English | French | Science
x | y | z
(forgive the formatting!)
I can't help thinking this must be straightforward, but I can't seem to find the appropriate technique. Something like a pivot, but listing rather than aggregating values? I thought maybe filtered columns, but that doesn't seem very efficient. Many thanks for any advice!
Using the following as a basis
https://stackoverflow.com/a/9007678/2311633
(I have copied the relevant sections so the complete answer is on this page...)
You can create a horizontally expanding table by:
First create a Tablix by dragging the Matrix Report Item onto the design surface. The Tablix will have a RowGroup and a ColumnGroup by default.
Delete the Row Group by right-clicking on it and selecting "Delete Group". In the Delete Group prompt, choose to delete just the group (not the related rows and columns; you'll probably want these as the left-hand labels for your rows).
At this point right click the Column Group and "Add Group -> Child Group...". Keep adding child groups for each of the rows you require. For each child group select 'Group by' and choose each of the series you wish to display on each individual row.
I am unable to post images at this time, but I have been able to recreate what you have requested above. Once I'm able to, I can post screenshots for further clarification if required.
Update
Alternatively, if you are able to edit the SQL source, you could add another field that defines a row number for each item. Using ROW_NUMBER() with PARTITION BY, you could add a new column such as
SELECT ROW_NUMBER() OVER(PARTITION BY [Block] ORDER BY [Block]) as rownum
Then you could just create a simple Matrix as shown here https://www.flickr.com/photos/135805284@N08/20883237722/
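The row-number idea can be tried outside SSRS with a small sketch. It uses an in-memory SQLite database (window functions need SQLite 3.25 or later) instead of SQL Server, and it does the pivot in SQL with CASE expressions, whereas in the report the Matrix would do that step. The table name timetable, the column aliases, and ordering by subject within each block are assumptions made so the output is deterministic.

```python
import sqlite3

# Sample rows from the question: (subject, block).
ROWS = [("English", "A"), ("French", "B"), ("Science", "C"),
        ("x", "A"), ("y", "B"), ("z", "C")]

QUERY = """
WITH numbered AS (
    SELECT subject, block,
           ROW_NUMBER() OVER (PARTITION BY block ORDER BY subject) AS rownum
    FROM timetable
)
SELECT MAX(CASE WHEN block = 'A' THEN subject END) AS block_a,
       MAX(CASE WHEN block = 'B' THEN subject END) AS block_b,
       MAX(CASE WHEN block = 'C' THEN subject END) AS block_c
FROM numbered
GROUP BY rownum
ORDER BY rownum
"""


def timetable_grid(rows):
    """Return one output row per rownum, with one column per block."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timetable (subject TEXT, block TEXT)")
    conn.executemany("INSERT INTO timetable VALUES (?, ?)", rows)
    grid = conn.execute(QUERY).fetchall()
    conn.close()
    return grid


if __name__ == "__main__":
    for row in timetable_grid(ROWS):
        print(" | ".join(row))
```

In the report itself only the numbered query is needed: group the Matrix rows on rownum and the columns on Block, and put Subject in the data cell.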

Matching Similar but not same columns

I have 2 tables, say A and B. I need to update the state column in A based on the city in B.
B holds the actual lookup data.
Both A and B have a City column.
City in A is rather junky data, with values like Atlanta, Atlanta Georgia, Atlanta-Georgia, Atlanta,Georgia
etc.
City in B is just Atlanta.
I need to compare the two city columns and update state in A.
SELECT DISTINCT b.state FROM A, B WHERE INSTR(A.city ,TRIM(UPPER(B.CITY))) >0
The above select matches most of them but not all. Can someone help me out, please?
Thanks
Can you please list some examples that are left out by the above SQL? Thanks.
Secondly, try the SOUNDEX function and see if that works.
Cheers
V
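Without the missed examples it is hard to say what goes wrong, but one possible cause is case: the query upper-cases B.CITY while leaving A.city as it is, so "atlanta-georgia" would not contain "ATLANTA". Below is a minimal sketch of a substring match with both sides upper-cased. It assumes that guess is right, uses an in-memory SQLite database instead of the asker's database, and the table contents and the function name fill_states are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: messy cities in A, clean lookup rows in B.
A_CITIES = ["Atlanta", "Atlanta Georgia", "atlanta-georgia",
            "ATLANTA,Georgia", "Boston MA"]
B_LOOKUP = [("Atlanta", "GA"), ("Boston", "MA")]


def fill_states(a_cities, b_lookup):
    """Set A.state from B wherever B's city occurs inside A's city, ignoring case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (city TEXT, state TEXT)")
    conn.execute("CREATE TABLE b (city TEXT, state TEXT)")
    conn.executemany("INSERT INTO a (city) VALUES (?)", [(c,) for c in a_cities])
    conn.executemany("INSERT INTO b VALUES (?, ?)", b_lookup)
    # Upper-case BOTH sides before the substring test; rows with no match get NULL.
    conn.execute(
        "UPDATE a SET state = ("
        "  SELECT b.state FROM b"
        "  WHERE instr(upper(a.city), upper(trim(b.city))) > 0"
        ")"
    )
    states = [row[0] for row in conn.execute("SELECT state FROM a ORDER BY rowid")]
    conn.close()
    return states


if __name__ == "__main__":
    print(fill_states(A_CITIES, B_LOOKUP))
```

Substring matching can also pick the wrong lookup row when one city name is contained in another, so the results are worth spot-checking before updating real data.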

Resources