I've got a database setup that is a bit on the complicated side, with several many-many tables.
I'm trying to generate an XML document from this data. There's a bit of checking, like if a name is not defined in one language try to get the name from another language (instead of showing null)
The problem I have that there are a lot of queries within loops.
Are there any guidelines for this, like what stuff to stay away from and what to use, to improve the performance?
cfoutput cfloop cfquery ?
If the looping logic is basically doing data processing, eg: based on the values from the first query, deciding what to go back to the database with for the next query, the best thing you can do for performance is to take all that logic out of your CF code, and put it into the DB. Use the DB for data processing, use CF for handling the data once it's been processed, and converting it into output.
The only time CF should be doing data manipulation is if you need to process data from differing sources: eg the database, some remote service, the file system, a different database, etc. Basically only if the database can't do the data processing itself should you be involving ColdFusion.
Regarding, " like if a name is not defined in one language try to get the name from another language (instead of showing null)".
You should be able to do this in your query. Pretty much every db out there has a coalesce function. They all support case constructs as well. You just have to pick the most appropriate method for your situation.
Related
I have data from multiple sources - a combination of Excel (table and non table), csv and, sometimes, even a tsv.
I create queries for each data source and then I am bringing them together one step at a time or, actually, it's two steps: merge and then expand to bring in the fields I want for each data source.
This doesn't feel very efficient and I think that maybe I should be just joining everything together in the Data Model. The problem when I did that was that I couldn't then find a way to write a single query to access all the different fields spread across the different data sources.
If it were Access, I'd have no trouble creating a single query one I'd created all my relationships between my tables.
I feel as though I'm missing something: How can I build a single query out of the data model?
Hoping my question is clear. It feels like something that should be easy to do but I can't home in on it with a Google search.
It is never a good idea to push the heavy lifting downstream in Power Query. If you can, work with database views, not full tables, use a modular approach (several smaller queries that you then connect in the data model), filter early, remove unneeded columns etc.
The more work that has to be performed on data you don't really need, the slower the query will be. Please take a look at this article and this one, the latter one having a comprehensive list for Best Practices (you can also just do a search for that term, there are plenty).
In terms of creating a query from the data model, conceptually that makes little sense, as you could conceivably create circular references galore.
So I was thinking... Imagine you have to write a program that would represent a schedule of a whole college.
That schedule has several dimensions (e.g.):
time
location
indivitual(s) attending it
lecturer(s)
subject
You would have to be able to display the schedule from several standpoints:
everything held in one location in certain timeframe
everything attended by individual in certain timeframe
everything lecturered by a certain lecturer in certain timeframe
etc.
How would you save such data, and yet keep the ability to view it from different angles?
Only way I could think of was to save it in every form you might need it:
E.g. you have folder "students" and in it each student has a file and it contains when and why and where he has to be. However, you also have a folder "locations" and each location has a file which contains who and why and when has to be there. The more angles you have, the more size-per-info ratio increases.
But that seems highly inefficinet, spacewise.
Is there any other way?
My knowledge of Javascript is 0, but I wonder if such things would be possible with it, even in this space inefficient form.
If not that, I wonder if it would work in any other standard (C++, C#, Java, etc.) language, primarily in Java...
EDIT: Could this be done by using MySQL database?
Basically, you are trying to first store data and then present it under different views.
SQL databases were made exactly for that: from one side you build a schema and instantiate it in a database to store your data (the language is called Data Definition Language, DDL), then you make requests on it with the query language (SQL), what you call "views". There are even "views" objects in SQL databases to build these views Inside the database (rather than having to the code of the request in the user code).
MySQL can do that for sure, note that it is possible to compile some SQL engine for Javascript (SQLite for example) and use local web store to store the data.
There is another aspect to your question: optimization of the queries. While SQL can do most of the request job for your views. It is sometimes preferred to create actual copies of the requests results in so called "datamarts" (this is called de-normalizing a request), so that the hard work of selecting or computing aggregate/groups functions and so on is done once per period of time (imagine that a specific view changes only on Monday), then requesters just have to read these results. It is important in this case to separate at least semantically what is primary data from what is secondary data (and for performance/user rights reasons, physical separation is often a good idea).
Note that as you cited MySQL, I wrote about SQL but mostly any database technology could do that what you searched to do (hierarchical, object oriented, XML...) as long as the particular implementation that you use is flexible enough for your data and requests.
So in short:
I would use a SQL database to store the data
make appropriate views / requests
if I need huge request performance, make appropriate de-normalized data available
the language is not important there, any will do
I am developing an internal web application that needs a back end. The data stored is not really RDBMS type. Currently it is in XML document fashion that the application parses (XQuery) to display html tables and other type of fields.
It is likely that I will have a few more different types of XML documents and CSV(comma separated values) coming up. Given the scenario, I can always back the data up with a mySQL database, breaking the process that generates XML or CSV to insert straight in to database.
Is no-sql database a good choice in this scenario? or mySQL is still better? I do not see any need for clustering/high availability/distributed processing scenarios.
Define "better".
I think the choice should be made based on how relational (MySQL) or document-based (NoSQL) your data is.
A good way to know is to analyze typical use cases. Better yet, write two prototypes and measure.
Looking for a bit of advice on how to optimise one of our projects. We have a ASP.NET/C# system that retrieves data from a SQL2008 data and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create a XPO Data Source on the fly. However, I can't find too much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler), the results are often surprising.
If you are constructing the SQL string on the fly, are you using .Net best practices such as String.Concat("str1", "str2") instead of "str1" + "str2". Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 mins, you might need to run a service to update this data automatically.) so that you are only hitting one database. New connections to databases are quiet expensive.
Have you looked at the query plans for the SQL that you are running. Today, I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10000 times depending on some conditions).
Just a warning if you do use LINQ. I have seen some developers who have decided to use LINQ write more inefficient code because they did not know what they are doing (pulling 36,000 records when they needed to check for 1 for example). This things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
Cheers,
Stu
As far as I understand you are talking about so called server mode when all data manipulations are done on the DB server instead of them to the web server and processing them there. In this mode grid works very fast with data sources that can contain hundreds thousands records. If you want to use this mode, you should either create the corresponding LINQ classes or XPO classes. If you decide to use LINQ based server mode, the LINQServerModeDataSource provides the Selecting event which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope, this information will be helpful to you.
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once and retreiving only subsets at a time, then you should be OK.
But, if you're only doing an ExecuteQuery() against an SQL connection to get a dataset back I don't see how Linq or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database you need to profile the different SELECT statements you're running against it, examine the query plan and identify the places where things are slowing down. You might want to start by using the SQL Server Profiler, but if you have a good DBA, sometimes just looking at the query plan (which you can get from Management Studio) is usually enough.
I have a request for a feature to be able to save a user's search for later.
Right now I'm building LINQ statements on the fly based on what the user has specified.
So I started wondering, is there an easy way for me to simply take the query that the user built, and persist it somewhere, preferably my database, so that I can retrieve it later?
Is there some way of persisting the query as XML or perhaps JSON, and then reconstituting the query later?
Never done this before, but I've had this idea:
Rather than having the query run against your database directly, if you were to have it run against an OData endpoint, you could conceivably extract the URL that is generated as the query string, and save that URL for later use. Since OData has a well-though-out spec already, you would be able to profit from other people's labor.
I'd go with a domain-specific object here even if such goodies did exist -- what happens when you save serialized queries in LINQ and your underlying model changes, invalidating everyone's saved queries. Using your own data format should shield you from this to some extent.
Take a look at the Expression class. This will allow you to pre-compile a query. Although persisting this for later use to the DB for better performance is questionable.
I'm writing this as I watch this presentation at PDC10. Just after the 1-hour mark, he shows how he's built a JSON serializer for expression trees. You might find that interesting.