SQL TUNNING TIPS
sql is the heart of a Relational DB system. The first consideration when writing an sql statement is that it return a correct result. The second is that it be the most efficient for a given situation.
Following are some general tips that often increase sql statement efficiency. Being general they may not apply to a particular scenario. Any code comparisons are greatly simplified concept examples - not efficiency test cases.
Remember that processing sql is a sequence of Parse (syntax check and object resolution), Execution (required reads and writes), and Fetch (row results retrieved, listed, sorted, and returned). sql "tuning" consists, quite simply, of reducing one or more of them.
Note: generally Parse is the greatest time and resource hog. Parse overhead can be minimised by the use of Procedures, Functions, Packages, Views, etc.
Remember that the RDBMS Optimiser (internal RDBMS code designed to execute queries most efficiently) may transform your statement, i.e. a sub-query back into a join, for reasons that the RDBMS deems appropriate. As query complexity rises optimisation choices increase and the risk of the Optimiser making wrong decisions substantially increases. Because optimisation is part of overall response time writing efficient queries is critical. Testing queries (and variations) prior to putting them into operation is best practice.
Also remember that there can be a disconnect between efficient sql and comprehensible sql. Always document your code.
* One: only "tune" sql after code is confirmed as working correctly.
* Two: ensure repeated sql statements are written absolutely identically to facilate efficient reuse: re-parsing can often be avoided for each subsequent use.
Writing best practices: all sql verbs in upper-case i.e. SELECT; separate all words with a single space; all sql verbs begin on a new line; sql verbs aligned right or left within the initial verb; set and maintain a table alias standard; use table aliases and when a query involves more than one table prefix all column names with their aliases. Whatever you do, be consistent.
* Three: code the query as simply as possible i.e. no unnecessary columns are selected, no unnecessary GROUP BY or ORDER BY.
* Four: it is the same or faster to SELECT by actual column name(s). The larger the table the more likely the savings.
Use:
SELECT customer_id, last_name, first_name, street, city FROM customer; Rather than:
SELECT * FROM customer;
* Five: do not perform operations on DB objects referenced in the WHERE clause:
Use:
SELECT client, date, amount FROM sales WHERE amount > 0;
Rather than:
SELECT client, date, amount FROM sales WHERE amount!= 0;
* Six: avoid a HAVING clause in SELECT statements - it only filters selected rows after all the rows have been returned. Use HAVING only when summary operations applied to columns will be restricted by the clause. A WHERE clause may be more efficient.
Use:
SELECT city FROM country WHERE city!= 'Vancouver' AND city!= 'Toronto'; GROUP BY city;
Rather than:
SELECT city FROM country GROUP BY city HAVING city!= 'Vancouver' AND city!= 'Toronto';
* Seven: when writing a sub-query (a SELECT statement within the WHERE or HAVING clause of another sql statement):
-- use a correlated (refers to at least one value from the outer query) sub-query when the return is relatively small and/or other criteria are efficient i.e. if the tables within the sub-query have efficient indexes.
-- use a noncorrelated (does not refer to the outer query) sub-query when dealing with large tables from which you expect a large return (many rows) and/or if the tables within the sub-query do not have efficient indexes.
-- ensure that multiple sub-queries are in the most efficient order.
-- remember that rewriting a sub-query as a join can sometimes increase efficiency.
* Eight: minimise the number of table lookups especially if there are sub-query SELECTs or multicolumn UPDATEs.
* Nine: when doing multiple table joins consider the benefits/costs for each of EXISTS, IN, and table joins. Depending on your data one or another may be faster.
Note: IN is usually the slowest.
Note: when most of the filter criteria are in the sub-query IN may be more efficient; when most of the filter criteria are in the parent-query EXISTS may be more efficient.
* Ten: where possible use EXISTS rather than DISTINCT.
* Eleven: where possible use a non-column expression (putting the column on one side of the operator and all the other values on the other). Non-column expressions are often processed earlier thereby speeding the query.
Use:
WHERE SALES < 1000/(1 + n);
Rather than:
WHERE SALES + (n * SALES) < 1000;
* Twelve: the most efficient method for storing large binary objects, i.e. multimedia objects, is to place them in the file system and place a pointer in the DB.
No comments:
Post a Comment