I have an Access Database where I have a defined query.
This query has three fields: "Category", "Points", and "Place".
Multiple records can have the same Category entry.
What I now need to do is group the query records by "Category".
Within each of these subgroups, order by "Points" descending.
Fill in "Place" within the subgroup, starting from "1st", then "2nd", and so on.
Repeat for the next group until there are no more groups.
Maybe it will be easier if I say that the purpose is to assign awards (1st award, 2nd, 3rd (and so on) in different age groups according to points scored by a player).
I am a .NET programmer so I immediately thought of LINQ, but there is no LINQ here :/
Is there any better way than doing separate SQL requests per category and then sorting by Points?
OK, so here is the recordset I have (the Place column is shown already filled in; originally it's empty, and this is how it should be filled):
ID   CATEGORY   POINTS   PLACE
------------------------------
1    CAT1       8        2
1    CAT1       10       1
1    CAT2       12       2
1    CAT2       11       3
1    CAT2       13       1
1    CAT3       12       1
1    CAT3       9        3
1    CAT3       3        4
1    CAT3       10       2
I don't know up front how many categories there are.
Not sure what you mean by separate SQL requests. In VBA, you could use a single SQL statement to fill a temp table with your data (plus an empty "Place" column), then open a recordset on a similar query (like yours, but against the temp table, with "ORDER BY Category, Points DESC" added). Then loop through that recordset and assign the place (using recordset.Edit and .Update, etc.). (But if you're a .NET programmer, I'm probably not telling you anything you don't already know...)
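If you would rather avoid the loop entirely, here is a minimal sketch of a pure-SQL alternative (MyQuery is an assumed name for your saved query). Access has no ROW_NUMBER(), but a correlated COUNT of higher-scoring rows in the same category yields the same ranking (tied scores share a place):

-- MyQuery is a hypothetical name for the saved query described above
SELECT t1.Category, t1.Points,
       (SELECT COUNT(*) + 1
        FROM MyQuery AS t2
        WHERE t2.Category = t1.Category
          AND t2.Points > t1.Points) AS Place
FROM MyQuery AS t1
ORDER BY t1.Category, t1.Points DESC;

The computed Place would then still be written back with the VBA loop described above, since Access will not accept a correlated subquery like this directly inside an UPDATE.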
After reading the documentation thoroughly, I found that I would have to do a lot of sorting and grouping by hand, so I eventually decided it was not worth it and switched to writing a desktop C# app which operated on this database. From there everything was obvious and a lot faster.
I need to get sales figures from open orders, sorted by code. The items are separated in the stock table by lot number (for traceability reasons) but the lot numbers do not appear in the orders table. The only link between the 2 tables is the part number.
When my query
SELECT code, SUM(qty*price) AS Sales
FROM orders INNER JOIN stock ON orders.partno = stock.partno
GROUP BY code
started returning strange results (very high sales figures for a given code), I changed it to
SELECT DISTINCT orders.partno, stock.lot, stock.code
FROM orders INNER JOIN stock ON orders.partno = stock.partno
and noticed that if several lots of a given part are in stock they are all returned
Part1 LotA code
Part1 LotB code
Part1 LotC code
which means that if a customer orders 300 units of Part1, my query returns 900 and my sales figure is multiplied by 3.
How can I work around that?
It must be noted that I do not work from a database but from a group of tables, the structures of which can sometimes be whimsical.
You should really use table.column or alias.column reference when writing queries. As your question stands, we do not know which table the PRICE comes from... the parts table or the lots table. If you are dealing with inventory tracking such as FIFO or LIFO method accounting, you must have an association to the lot table for inventory being tracked/sold.
Now, why are you getting large numbers? That is because of a Cartesian result: for each record in one table joined to another, the join returns however many matching records exist on the other side.
So, if you have an order with one line item and there is only one matching row in a products table, that is a simple 1:1 ratio. But your STOCK table can have multiple records for the exact same part number, so the same original order line item is returned once for EACH LOT ENTRY in the stock table. For your one item with three lots, that is a 1:3 result.
I know this is important from a cost-of-goods sold basis, hence your need to know which "lot" it is joined to so you only get that one specific record for proper pricing.
If however, you do have a generic product table of everything you sell, and that table has a generic common price no matter which "lot" was used for the sale, I would join to that table instead for your report. But you will still have the accounting issue of inventory, cost-of-goods, etc.
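As a hedged sketch of the workaround (assuming qty and price both come from the orders table, and each part number maps to a single code in stock; the aliases are made up), collapse stock to one row per part before joining:

-- assumes qty/price live in orders and code is unique per part in stock
SELECT s.code, SUM(o.qty * o.price) AS Sales
FROM orders AS o
INNER JOIN (SELECT DISTINCT partno, code FROM stock) AS s
        ON o.partno = s.partno
GROUP BY s.code;

The derived table removes the lot-level duplication, so each order line is counted exactly once. If the price actually lives at the lot level, this shortcut does not apply and you will need the lot association after all.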
newparts_calc
if (([MonthToDateQuery].[G/L Account] = 4200) and ([Query1].[G_L_Group] = 'NEW')) then ([Credit Amount] - [Debit Amount]) else (0)
Data Item1
total([newparts_calc])
I need Data Item1 to return newparts_calc values only.
So, for example, in the 1st row Data Item1 should be 8,540.8, but it is 34,163.2.
What's wrong? How do I fix it?
REVISED QUESTION
I apologize for not making sense on the original question.
I have many of these calcs that I'm trying to gather and put on a crosstab. I want to see sales by month (rows) and part category (columns).
[Query2] is the one shown in picture above.
It joins [MonthToDateQuery] AND [Query1]
The join is on 'Invoice' and the cardinality is 1..1 = 1..1.
[MonthToDateQuery] is based on the package I'm working in (General Ledger). It supplies the G/L entries for each sales G/L account.
[Query1] is a SQL query I brought in to be able to break out categories even further than the G/L group.
For example, G/L account 4300 is Rebuilt. However, I needed to break it out even further, to see Rebuilt-Production and Rebuilt-New. I can do that with the G/L group.
I saw that my G/L account ledger entries referenced the invoice number, so that's how I tied in my SQL.
So, as you can see from the table below (which is the tabular data view from the query), I need a total. I have tried plugging newparts_calc into my crosstab and setting aggregation to Total, but the numbers still don't seem right. I don't think I have something set up as it should be.
All the calcs I'm doing are based on single or multiple G/L accounts and single or multiple G/L groups.
Any advice?
As you can see, the problem seems to be duplicate invoice numbers.
How can I fix this?
A couple of things come to mind:
- Set the processing order to 2.
- Since your calc is always a multiple and you are joining two queries, you may need to check your cardinality. Sometimes it helps to add derived queries to ensure you are working with the correct grain.
I'm obviously missing something, but if you want
I need Data Item1 to return newparts_calc values only.
just use newparts_calc, without total()? That would give you the proper value for row 1 :-)
If you need a running total for days (a sum of the values for previous days), you should use a running_total function.
At a guess, one of your two queries is returning multiple rows for each invoice, which will cause this double counting. Look at the output of the two queries and see if that's happening. If so, then you just need to work out how to collapse that down to one row per invoice.
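If the SQL side is the one at a lower grain, one hedged fix (the table and column names here are hypothetical stand-ins for your SQL query's source) is to wrap it in a derived query that collapses to one row per invoice before Cognos joins the two:

-- InvoiceCategories is a hypothetical name for the SQL query's source table
SELECT Invoice, MAX(G_L_Group) AS G_L_Group
FROM InvoiceCategories
GROUP BY Invoice;

MAX is only safe if each invoice really belongs to a single G/L group; if invoices legitimately span groups, you need to decide on the correct grain rather than aggregating it away.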
Per your new question: the underlying data has got to be causing the issue. It's clearly not 1:1 (note that even though this is your stated cardinality, Cognos does not enforce 1:1). Invoice number is not unique; G/L Group is at a lower level.
Suppose I have a large (300-500k) collection of text documents stored in the relational database. Each document can belong to one or more (up to six) categories. I need users to be able to randomly select documents in a specific category so that a single entity is never repeated, much like how StumbleUpon works.
I don't really see a way to implement this with slow NOT IN queries given a large number of users and documents, so I figured I might need to implement some custom data structure for this purpose. Perhaps there is already a paper describing some algorithm that might be adapted to my needs?
Currently I'm considering the following approach:
Read all the entries from the database.
Create a linked-list-based index for each category from the IDs of the documents belonging to that category, and shuffle it.
Create a Bloom filter containing all of the entries viewed by a particular user.
Traverse the index using the iterator, using the Bloom filter to skip already-viewed items.
If you track, via a table, which entries the user has seen... try this. I'm going to use MySQL because that's the quickest example I can think of, but the gist should be clear.
On a link being 'used'...
insert into viewed (userid, url_id) values ("jj", 123)
On looking for a link...
select p.url_id
from pages p
  left join viewed v
    on v.url_id = p.url_id
   and v.userid = "jj"  -- restrict the join to this user's views, not everyone's
where v.url_id is null
order by rand()
limit 1
This causes the database to do a one-for-one join, and you're limiting your query to return only one entry that the user has not seen yet.
Just a suggestion.
Edit: It is possible to make this one operation, but then there's no guarantee that the URL will actually be delivered to the user.
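One possible reading of that edit, as a hedged single-statement sketch (MySQL materializes the SELECT internally, so reading from viewed while inserting into it is allowed): the row is marked as viewed at selection time, whether or not the page is successfully shown.

-- mark a random unseen page as viewed in one statement
insert into viewed (userid, url_id)
select "jj", p.url_id
from pages p
  left join viewed v
    on v.url_id = p.url_id
   and v.userid = "jj"
where v.url_id is null
order by rand()
limit 1

You would then select the just-inserted row (or capture the chosen url_id in application code) to actually serve it.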
It depends on how users get their random entries.
Option 1:
A user pages through some entities and stops after a couple of them. For example, the user sees the current random entity, then moves to the next one, reads it, continues like that a couple of times, and that's it.
The next time this user (or another) gets an entity from this category, the set of already-viewed entities is cleared, and you may return an already-viewed entity.
For that option I would recommend saving a (hash) set of already-viewed entity IDs, and every time the user asks for a random entity, randomly choosing one from the DB and checking that it is not already in the set.
Because the set is so small and your data is so big, the chance of getting an already-viewed ID is so small that this takes O(1) expected time.
Option 2:
A user pages through the entities, and the viewed entities are shared between all users and persist across visits to your page.
In that case you will probably exhaust all the entities in each category, and storing all the viewed entities, plus checking whether an entity has been viewed, will take some time.
For that option I would get all the IDs for the category, shuffle them, and store them in a linked list. When you want a random not-yet-viewed entity, just take the head of the list and delete it (O(1)).
I assume that for any given <user, category> pair, the number of documents viewed is pretty small relative to the total number of documents available in that category.
So can you just store indexed triples <user, category, document> indicating which documents have been viewed, and then just take an optimistic approach with respect to randomly selected documents? In the vast majority of cases, the randomly selected document will be unread by the user. And you can check quickly because the triples are indexed.
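As a hedged sketch (the table, column names, and literal values are all hypothetical), the triple store and the optimistic check might look like this:

-- hypothetical schema for the <user, category, document> triples
CREATE TABLE viewed_docs (
  userid     INT NOT NULL,
  categoryid INT NOT NULL,
  documentid INT NOT NULL,
  PRIMARY KEY (userid, categoryid, documentid)
);

-- After randomly picking candidate document 1234 for user 42 in category 3,
-- this is an index seek, not a scan:
SELECT 1 FROM viewed_docs
WHERE userid = 42 AND categoryid = 3 AND documentid = 1234;

If the check finds a row, pick another random candidate and retry; given the assumption above, retries will be rare.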
I would opt for a pseudorandom approach:
1.) Determine number of elements in category to be viewed (SELECT COUNT(*) WHERE ...)
2.) Pick a random number in range 1 ... count.
3.) Select a single document (SELECT * FROM ... WHERE [same as when counting] ORDER BY [generate stable order]). Depending on the SQL dialect in use, there are different clauses that can be used to retrieve only the part of the result set you want (MySQL's LIMIT clause, SQL Server's TOP clause, etc.).
If the number of documents is large, the chance of serving the same user the same document twice is negligibly small. Using the scheme described above, you don't have to store any state information at all.
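A hedged sketch of the three steps in MySQL (documents, categoryid, and documentid are hypothetical names):

-- step 1: get n
SELECT COUNT(*) FROM documents WHERE categoryid = 3;

-- step 2 happens in application code: pick r uniformly from 0 .. n-1.
-- step 3, with r = 1234 for example:
SELECT *
FROM documents
WHERE categoryid = 3
ORDER BY documentid  -- a stable order
LIMIT 1 OFFSET 1234;

Note that MySQL still walks past all the skipped rows for a large OFFSET, so on very big categories this step itself can get slow.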
You may want to consider a nosql solution like Apache Cassandra. These seem to be ideally suited to your needs. There are many ways to design the algorithm you need in an environment where you can easily add new columns to a table (column family) on the fly, with excellent support for a very sparsely populated table.
edit: one of many possible solutions below:
Create a CF (column family, i.e. table) for each category (creating these on the fly is quite easy).
Add a row to each category CF for each document belonging to the category.
Whenever a user hits a document, you add a column named for that user to the document's row and set it to true. Obviously this table will be huge, with millions of columns and probably quite sparsely populated, but no problem: reading it is still constant time.
Now finding a new document for a user in a category is simply a matter of selecting any row where that user's column is null.
You should get constant-time writes and reads, amazing scalability, etc., if you can accept Cassandra's "eventually consistent" model (i.e., it is not mission-critical that a user never gets a duplicate document).
I've solved something similar in the past by indexing the relational database into a document-oriented form using Apache Lucene. This was before the recent rise of NoSQL servers, and is basically the same thing, but it's still a valid alternative approach.
You would create a Lucene Document for each of your texts with a textId (relational database id) field and multi valued categoryId and userId fields. Populate the categoryId field appropriately. When a user reads a text, add their id to the userId field. A simple query will return the set of documents with a given categoryId and without a given userId - pick one randomly and display it.
Store a user's past X selections in a cookie or something.
Return the last selections to the server with the user's new criteria.
Randomly choose one of the texts satisfying the criteria until it is not a member of the user's last X selections.
Return this choice of text and update the list of the last X selections.
I would experiment to find the best value of X, but I have in mind something like an X of, say, 16?
I'm working on a VB6 program that connects to a SQL Server 2008 R2 database. In the past I have always used the MSFlexGrid control and populated it manually. Now, however, the guy who is paying me for this wants me to use data-bound grids instead, which forces me to use the MSHFlexGrid control because I'm using ADO and not DAO. So, I have two questions...
First, how would I move a column in a MSHFlexGrid? For example, if I wanted the third column to appear as the sixth column in the grid, is there a simple single line of code that would do that?
Second, believe it or not, I've never had to do anything in a grid other than display the data, as-is, from a recordset. Now, however, I have a recordset with some fields that contain just ID numbers referring to records in other tables - for example, a field containing an ID number that refers to a record in the Customers table, instead of the field containing the customer's name. What is the easiest way to have that column show customer names instead of customer ID numbers from the recordset? I thought I read somewhere that there's a way to embed a SQL command in an MSHFlexGrid column, but if there is, I wouldn't know how to do it. Is this possible, or is there a simpler way to do it?
TIA,
Kevin
The column order would typically be handled by your SELECT statement.
Say you have a Pies table that has a FruitID foreign key related to the FruitID in a Fruits table:
SELECT PieID AS ID, Pie, Fruit FROM Pies LEFT OUTER JOIN Fruits
ON Pies.FruitID = Fruits.FruitID
This returns 3 items: ID, Pie, and Fruit in that order.
Moving columns after the query/display operation is rarely needed, but yes, ColPosition can be used for that.
Wow! VB6.... Back to the future! :-)
You can move Columns using the ColPosition Property.
This article shows how you could setup the grid to display hierarchical data.
If you just want to display the customer name on the same line as the main data then that is doable as well by just creating the proper SQL for your data source. For that matter you can control the column order the same way as well.
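For instance, a hedged sketch with hypothetical table and column names: bind the grid's ADO recordset to a query like this, and the name column appears in place of the ID, with the column order following the SELECT list:

-- Orders/Customers and these columns are hypothetical examples
SELECT o.OrderID, c.CustomerName, o.OrderDate
FROM Orders AS o
LEFT OUTER JOIN Customers AS c
  ON o.CustomerID = c.CustomerID

The LEFT OUTER JOIN keeps order rows visible even when the customer lookup fails, which is usually what you want in a grid.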
Now, how about considering upgrading to .Net? Just kidding..... No, I'm not. OK. I am, maybe. :-)
I have a course search engine and when I try to do a search, it takes too long to show search results. You can try to do a search here
http://76.12.87.164/cpd/testperformance.cfm
At that page you can also see the database tables and indexes, if any.
I'm not using Stored Procedures - the queries are inline using ColdFusion.
I think I need to create some indexes but I'm not sure what kind (clustered, non-clustered) and on what columns.
Thanks
You need to create indexes on columns that appear in your WHERE clauses. There are a few exceptions to that rule:
If the column only has one or two unique values (the canonical example of this is "gender" - with only "Male" and "Female" the possible values, there is no point to an index here). Generally, you want an index that will be able to restrict the rows that need to be processed by a significant number (for example, an index that only reduces the search space by 50% is not worth it, but one that reduces it by 99% is).
If you are searching for x LIKE '%something', then there is no point in an index. If you think of an index as specifying a particular order for rows, then sorting by x when you're searching for "%something" is useless: you're going to have to scan all rows anyway.
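To illustrate with a table from this question (SQL Server syntax; the index name is made up, and this assumes name is an indexable varchar rather than a text column):

-- IX_cpd_COURSES_name is a made-up index name
CREATE NONCLUSTERED INDEX IX_cpd_COURSES_name ON cpd_COURSES (name);

-- Can use the index (prefix match):
SELECT * FROM cpd_COURSES WHERE name LIKE 'accounting%';

-- Cannot use the index (leading wildcard forces a scan):
SELECT * FROM cpd_COURSES WHERE name LIKE '%accounting%';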
So let's take a look at the case where you're searching for "keyword 'accounting'". According to your result page, the SQL that this generates is:
SELECT
*
FROM (
SELECT TOP 10
ROW_NUMBER() OVER (ORDER BY sq.name) AS Row,
sq.*
FROM (
SELECT
c.*,
p.providername,
p.school,
p.website,
p.type
FROM
cpd_COURSES c, cpd_PROVIDERS p
WHERE
c.providerid = p.providerid AND
c.activatedYN = 'Y' AND
(
c.name like '%accounting%' OR
c.title like '%accounting%' OR
c.keywords like '%accounting%'
)
) sq
) AS temp
WHERE
Row >= 1 AND Row <= 10
In this case, I will assume that cpd_COURSES.providerid is a foreign key to cpd_PROVIDERS.providerid, in which case the cpd_PROVIDERS side is already indexed (it's the primary key). Note, though, that SQL Server does not automatically index the foreign-key side, so an index on cpd_COURSES.providerid can still help the join.
Additionally, the activatedYN column is a Y/N column and (according to my rule above about restricting the search space significantly) such a column should not be indexed, either.
Finally, because searching with a x LIKE '%accounting%' query, you don't need an index on name, title or keywords either - because it would never be used.
So the main thing you need to do in this case is make sure that cpd_COURSES.providerid actually is a foreign key for cpd_PROVIDERS.providerid.
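A hedged sketch of that (SQL Server; the constraint and index names are made up, and this assumes providerid is already the primary key of cpd_PROVIDERS):

-- FK_cpd_COURSES_cpd_PROVIDERS is a made-up constraint name
ALTER TABLE cpd_COURSES
  ADD CONSTRAINT FK_cpd_COURSES_cpd_PROVIDERS
  FOREIGN KEY (providerid) REFERENCES cpd_PROVIDERS (providerid);

-- The foreign key alone does not create an index on cpd_COURSES,
-- so add one for the join:
CREATE NONCLUSTERED INDEX IX_cpd_COURSES_providerid
  ON cpd_COURSES (providerid);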
SQL Server Specific
Because you're using SQL Server, Management Studio has a number of tools to help you decide where you need to put indexes. If you use the "Index Tuning Wizard", it is actually usually pretty good at telling you what will give you good performance improvements. You just cut'n'paste your query into it, and it'll come back with recommendations for indexes to add.
You still need to be a little bit careful with the indexes that you add, because the more indexes you have, the slower INSERTs and UPDATEs will be. So sometimes you'll need to consolidate indexes, or just ignore them altogether if they don't give enough of a performance benefit. Some judgement is required.
Is this the real live database data? 52,000 records is a very small table, relatively speaking, for what SQL 2005 can deal with.
I wonder how much RAM is allocated to the SQL server, or what sort of disk the database is on. An IDE or even SATA hard disk can't give the same performance as a 15K RPM SAS disk, and it would be nice if there was sufficient RAM to cache the bulk of the frequently accessed data.
Having said all that, I feel the " (c.name like '%accounting%' OR c.title like '%accounting%' OR c.keywords like '%accounting%') " clause is problematic.
Could you create a separate Course_Keywords table, with two columns, "courseid" and "keyword" (varchar(24) should be sufficient for the longest keyword?), and a composite clustered index on courseid+keyword?
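A hedged sketch of that suggestion (SQL Server; the names follow the proposal above):

CREATE TABLE Course_Keywords (
  courseid INT         NOT NULL,
  keyword  VARCHAR(24) NOT NULL
);

CREATE CLUSTERED INDEX IX_Course_Keywords
  ON Course_Keywords (courseid, keyword);

-- For the search itself, an index leading with keyword lets the lookup seek:
CREATE NONCLUSTERED INDEX IX_Course_Keywords_kw
  ON Course_Keywords (keyword, courseid);

-- Searching then becomes an exact match instead of LIKE '%...%':
SELECT courseid FROM Course_Keywords WHERE keyword = 'accounting';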
Then, to make the UI even more friendly, use AJAX to apply keyword validation & auto-completion when people type words into the keywords input field. This gives you the behind-the-scenes benefit of having an exact keyword to search for, removing the need for pattern-matching with the LIKE operator...
Using CF9? Try using Solr full text search instead of %xxx%?
You'll want to create indexes on the fields you search by. An index is a secondary list of your records presorted by the indexed fields.
Think of an old-fashioned printed Yellow Pages - if you want to look up a person by their last name, the phone book is already sorted that way - Last Name is the clustered index field. If you wanted to find the phone numbers of people named Jennifer, or the person with the phone number 867-5309, you'd have to search through every entry, and it would take a long time. If there were an index in the back with all the phone numbers or first names listed in order, along with the page in the phone book where each person is listed, it would be a lot faster. These would be the nonclustered indexes.
I would try changing your IN statements to an EXISTS query to see if you get better performance on the zip code lookup. My experience is that IN statements work great for small lists, but as the lists get larger you get better performance out of EXISTS, since the query engine stops searching as soon as it finds the first matching value.
<CFIF zipcodes is not "">
EXISTS (
SELECT zipcode
FROM cpd_CODES_ZIPCODES
WHERE zipcode = p.zipcode
AND 3963 * (ACOS((SIN(#getzipcodeinfo.latitude#/57.2958) * SIN(latitude/57.2958)) +
(COS(#getzipcodeinfo.latitude#/57.2958) * COS(latitude/57.2958) *
COS(longitude/57.2958 - #getzipcodeinfo.longitude#/57.2958)))) <= #radius#
)
</CFIF>