Monday, September 13, 2010

PL/SQL, Deterministic Functions, and Function-Based Indexes

User-defined functions provide a tremendous way to create consistent, modular, and re-usable code used to format output and process data within SQL. Unfortunately, when PL/SQL functions are used in SQL statements, performance often suffers due to context switches incurred as Oracle moves between the SQL and PL/SQL engines of the database. To improve performance, function-based indexes can be created, which the optimizer can use in lieu of executing the PL/SQL function for each row. In order for a function-based index to be created with a user-defined function, the function must be identified as DETERMINISTIC.

To illustrate, we will use a user-defined function to solve a problem encountered by many of our clients that use a specific Advancement product where dates are stored as character strings in the database. In this product, the "date" column is mandatory; however, there may be situations when the end user does not know the date value to be entered. In that case, as much information is entered as possible. This may lead to zeroed out values for unknown date components. For example, if the year is not known, but the month and day is, the end user may enter a value of '00000905' (for September 5th with an unknown year). If no date component is known, the end user may enter placeholder values such as all zeros or a space. To further complicate matters, since the date is a character string and validation at the database level is not possible, there may be occurrences of invalid date formats entered.

Note: Of course storing dates as character strings is not good practice, we are simply using this as an example of overcoming a problem with the use of user-defined functions.

To begin, we will create a test table with "date" values stored as character strings. The expected "good" format for the date values should be 'YYYYMMDD'.

CREATE TABLE date_test
( date_string VARCHAR2(8) );

Now a few rows will be inserted into the table.

INSERT INTO date_test VALUES ('00000000');
INSERT INTO date_test VALUES (' ');
INSERT INTO date_test VALUES ('19920821') -- Valid format;
INSERT INTO date_test VALUES ('08011999');
INSERT INTO date_test VALUES ('00000810');

Additional rows are added to illustrate performance strain encountered with user-defined functions and context switching.

INSERT INTO date_test SELECT * FROM date_test;
/
/
/
/
/
/
/
/
/
/
/
/
/

COMMIT;

Next, a function is created to test for valid dates. The function tries to perform an explicit conversion using TO_DATE. If the conversion fails, the date is not valid and an 'N' is returned; otherwise, a 'Y' is returned.

CREATE OR REPLACE FUNCTION is_date
 ( p_string  IN VARCHAR2
 , p_format  IN VARCHAR2 DEFAULT 'YYYYMMDD' )
RETURN VARCHAR2 IS
  tmp  DATE;
BEGIN
  tmp := TO_DATE(p_string, p_format);
  RETURN 'Y';
EXCEPTION
  WHEN OTHERS THEN
    RETURN 'N';
END is_date;
/

SHO ERR

Take a look at the number of rows in the table, along with the result of the IS_DATE function for each distinct value.

SELECT COUNT(*) FROM date_test;

  COUNT(*)
----------
     81920

COL is_date FORMAT A7
SELECT DISTINCT date_string, is_date
FROM ( 
       SELECT date_string
            , is_date(date_string) AS is_date 
         FROM date_test 
     );

DATE_STR IS_DATE
-------- -------
00000000 N
08011999 N
00000810 N
19920821 Y
         N 
Next, we will use the IS_DATE function to find only rows with a valid date format. Examine the execution plan of the query. Notice the execution plan makes use of a full table scan.

EXPLAIN PLAN FOR
SELECT * FROM date_test
WHERE is_date(date_string) = 'Y';

SELECT * FROM TABLE(dbms_xplan.display());
---------------------------------------
| Id  | Operation         | Name      |
---------------------------------------
|   0 | SELECT STATEMENT  |           |
|*  1 |  TABLE ACCESS FULL| DATE_TEST |
--------------------------------------- 
Attempt to create a function-based index to improve performance. Unfortunately, a user-defined function cannot be used in a function-based index unless the function is determined to be DETERMINISTIC.

CREATE INDEX date_test_fbi ON date_test(is_date(date_string));
*
ERROR at line 1:
ORA-30553: The function is not deterministic
Change the function to be 'DETERMINISTIC' to allow an FBI to be created.

CREATE OR REPLACE FUNCTION is_date
 ( p_string  IN VARCHAR2
 , p_format  IN VARCHAR2 DEFAULT 'YYYYMMDD' )
RETURN VARCHAR2 DETERMINISTIC IS
  tmp  DATE;
BEGIN
  tmp := TO_DATE(p_string, p_format);
  RETURN 'Y';
EXCEPTION
  WHEN OTHERS THEN
    RETURN 'N';
END is_date;
/

SHO ERR
Attempt to create the index again. This time the index will be created successfully.

CREATE INDEX date_test_fbi ON date_test(is_date(date_string));
Re-examine the execution plan of the query executed earlier to make sure the index is being used.

EXPLAIN PLAN FOR
SELECT * FROM date_test
WHERE is_date(date_string) = 'Y';

SELECT * FROM TABLE(dbms_xplan.display());
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DATE_TEST     |
|*  2 |   INDEX RANGE SCAN          | DATE_TEST_FBI |
----------------------------------------------------- 
Now that the index is used, the function provides a much improved performance boost, as the function does not have to be executed, thus allowing the PL/SQL logic to be contained within a single function (rather than trying to valid dates directly in SQL) in an efficient way.

Our next blog will illustrate the difference between DETERMINISTIC and NON-DETERMINISTIC functions.

No comments:

Post a Comment

Followers