I am supporting an application that runs on Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production.
We have a table that stores hourly data that uses triggers call a package to sync to a monthly table that stores a total amount.
When the hourly table is updated, the on update trigger saves the old amount to a table package variable in this format: primary key fields, amount.
TYPE HOURLY_CHANGE_TAB IS TABLE OF HOURLY_REC INDEX BY BINARY_INTEGER;TYPE HOURLY_REC IS RECORD ( KEY NUMBER(10), SUB_KEY NUMBER, MONTH VARCHAR2(6), AMOUNT NUMBER);
Amount is set to -old_amount. It then adds the new amount to amount. It does this for each hour record that gets updated. So if there are 120 hours in a month, it would subtract the old amount 120 times from the variable, and add the new amount 120 times to get the total change. An after statement trigger or manual call would update the months record with this total and clear the package variable. When we are updating a large number of hours, we set a flag to disable the month update and then manually call the update method once at the end for performance. Regardless, the issue has already happened by the time this method gets called.
When I update a deal that runs for multiple years, the total amount is off by new amount - old amount in a random number of months. So if old amount was 100 per hour and new amount is 10 per hour then the amount we push to the month record is -90 what it should be. The majority of the months would be correct. I have seen zero, one, two, and four months broken after each update / rollback testing run. Each time I run the exact same update and then rollback after checking for the error.
I logged every time the package variable changes to a new table and saw the following:
KEY and SUB_KEY are keys in this example. Month is the monthly record our hour is part of. The amount column shows the amount to be added to the package variable. Index is the index of the table variable that we are updating for this row. Starting amount it the value of the amount column at the start of logging the update. Calc Amount is what the package variable should store after we add the amount to the starting amount. Ending Amount is the value retrieved from the package variable after the update is saved. Row index is the incrementing unique sequence in the logging table to keep track of the order in which the updates ran. The correct amount is a window function summing the amount ordered by row_index, partitioned by the keys, to show what the package variable should store on every row.
Notice what happens between rows 2 and 3. The ending amount on 2 is the correct value of 540. The starting amount on 3 is 550, shifted by new volume(20) - old volume(10). This relationship held true for an update of 100 to 10, the shifted amount was -90, not 10. And yet there is no gap in the row index. We go straight from 16188 to 16189 without a gap. I created a copy of the package and removed everything but the method to update the package variable. There is no other path to update the table variable except via the method that is logged. The package variable is defined within the package body so no other package can update it, and I deleted all other methods in the package but the one to update it. I can't fathom how this variable is changing between the end of one method and the start of the next.
CREATE OR REPLACE PACKAGE BODY TEST_PKG AS PKG_HOUR_COUNT NUMBER := 0; HOUR_TAB HOURLY_CHANGE_TAB ;...
Again, the number of months affected by this issue is random, and they are all broken by the same amount, one month isn't offsetting another by some off by one error.
Note that I have endeavored to reduce the following code to a minimum version of the code. Table names and columns have been changed. If I attempt to run the trigger code in a loop versus running an update the issue does not happen.
I was able to create a new version of the package, delete everything not related to these calls and reproduce the issue and was successful. I tried to create a mini copy of the table with just the rows I needed and may have been able to reproduce the issue once. I know that's vague but it hasn't worked since on the mini copied table.
SUB_KEY is not used in the package variable since it only defines the hours in this table so all hours with the same KEY and SUB_KEy will be in the same month, regardless of SUB_SUB_KEY.
Table Definition:
CREATE TABLE ORDER_HOURS(KEY NUMBER(12,0) NOT NULL ENABLE, SUB_KEY NUMBER(6,0) NOT NULL ENABLE, SUB_SUB_KEY NUMBER(3,0) NOT NULL ENABLE, GMT_TIME DATE NOT NULL, DAY DATE NOT NULL ENABLE, HOUR VARCHAR2(4 BYTE) NOT NULL ENABLE, AMOUNT NUMBER(10,3) NOT NULL ENABLE, CONFIRMED_FLAG VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL ENABLE, CONSTRAINT HOUR_PK PRIMARY KEY (KEY, SUB_KEY, SUB_SUB_KEY, GMT_TIME) USING INDEX ) PARTITION BY RANGE ("DAY") INTERVAL (NUMTOYMINTERVAL(3,'MONTH')) ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')), PARTITION p1 VALUES LESS THAN (TO_DATE('1-1-2008', 'DD-MM-YYYY')), PARTITION p2 VALUES LESS THAN (TO_DATE('1-7-2009', 'DD-MM-YYYY')), PARTITION p3 VALUES LESS THAN (TO_DATE('1-1-2010', 'DD-MM-YYYY')) );
Trigger:
CREATE OR REPLACE TRIGGER UPDATE_HOUR BEFORE UPDATE OF AMOUNT ON ORDER_HOURS FOR EACH ROW BEGIN IF :NEW.CONFIRMED_FLAG = 'Y' AND :OLD.CONFIRMED_FLAG = 'Y' THEN RAISE_APPLICATION_ERROR(-20914, 'You cannot Update.'); ELSE IF :NEW.CONFIRMED_FLAG = 'A' THEN :NEW.CONFIRMED_FLAG := 'Y'; END IF; IF TEST_PKG.GET_UPDATE_HOURS_FLAG = 'Y' THEN TEST_PKG.UPDATE_HOURS(:OLD.KEY, :NEW.KEY, :OLD.SUB_KEY, :NEW.SUB_KEY, :OLD.SUB_SUB_KEY, :NEW.SUB_SUB_KEY, :OLD.DAY, :NEW.DAY, :OLD.AMOUNT, :NEW.AMOUNT); END IF; END IF; END;/
Logging Table:
CREATE TABLE HOURS_DATA(KEY number,SUB_KEY number,MONTH VARCHAR2(4000),AMOUNT NUMBER,VARIABLE_INDEX NUMBER,STARTING_AMOUNT NUMBER,CALC_AMOUNT NUMBER,ENDING_AMOUNT NUMBER,ROW_INDEX NUMBER,CREATE_DATE DATE);
Trimed down package, only portion remaining that touches HOUR_TAB:
CREATE OR REPLACE PACKAGE TEST_PKG ASUPDATE_HOURS_FLAG VARCHAR2(1) DEFAULT 'Y';FUNCTION GET_UPDATE_HOURS_FLAG RETURN VARCHAR2;PROCEDURE UPDATE_HOURS( OLD_KEY NUMBER, NEW_KEY NUMBER, OLD_SUB_KEY NUMBER, NEW_SUB_KEY NUMBER, OLD_SUB_SUB_KEY NUMBER, NEW_SUB_SUB_KEY NUMBER, OLD_DAY DATE, NEW_DAY DATE, OLD_AMOUNT NUMBER, NEW_AMOUNT NUMBER); PROCEDURE UPDATE_HOUR_TAB ( KEY NUMBER, SUB_KEY NUMBER, MONTH VARCHAR2, AMOUNT NUMBER);PROCEDURE FLUSH_HOUR_TAB;TYPE HOURLY_REC IS RECORD ( KEY NUMBER(10), SUB_KEY NUMBER, MONTH VARCHAR2(6), AMOUNT NUMBER); TYPE HOURLY_CHANGE_TAB IS TABLE OF HOURLY_REC INDEX BY BINARY_INTEGER; END TEST_PKG;/CREATE OR REPLACE PACKAGE BODY TEST_PKG AS PKG_HOUR_COUNT NUMBER := 0; HOUR_TAB HOURLY_CHANGE_TAB ;FUNCTION GET_UPDATE_HOURS_FLAG RETURN VARCHAR2 ISBEGIN RETURN UPDATE_HOURS_FLAG;END GET_UPDATE_HOURS_FLAG;PROCEDURE UPDATE_HOUR_TAB ( KEY NUMBER, SUB_KEY NUMBER, MONTH VARCHAR2, AMOUNT NUMBER) IS CNT NUMBER; STARTING_AMOUNT number := 0; CALC_AMOUNT number := 0;BEGIN CNT := HOUR_TAB.FIRST; WHILE CNT IS NOT NULL LOOP EXIT WHEN HOUR_TAB(CNT).KEY = KEY AND HOUR_TAB(CNT).SUB_KEY = SUB_KEY AND HOUR_TAB(CNT).MONTH = MONTH; CNT := HOUR_TAB.NEXT(CNT); END LOOP; IF CNT IS NULL THEN PKG_HOUR_COUNT := PKG_HOUR_COUNT + 1; HOUR_TAB(PKG_HOUR_COUNT).KEY := KEY; HOUR_TAB(PKG_HOUR_COUNT).SUB_KEY := SUB_KEY; HOUR_TAB(PKG_HOUR_COUNT).MONTH := MONTH; HOUR_TAB(PKG_HOUR_COUNT).AMOUNT := AMOUNT; ELSE STARTING_AMOUNT := HOUR_TAB(CNT).AMOUNT; CALC_AMOUNT := HOUR_TAB(CNT).AMOUNT + AMOUNT; HOUR_TAB(CNT).AMOUNT := HOUR_TAB(CNT).AMOUNT + AMOUNT; END IF; IF CNT IS NULL THEN CNT := PKG_HOUR_COUNT; END IF; INSERT INTO HOURS_DATA VALUES(KEY, SUB_KEY, MONTH, AMOUNT, CNT, STARTING_AMOUNT, CALC_AMOUNT, HOUR_TAB(CNT).AMOUNT, (SELECT COUNT(*) FROM HOURS_DATA), SYSDATE);END UPDATE_HOUR_TAB;PROCEDURE UPDATE_HOURS( OLD_KEY NUMBER, NEW_KEY NUMBER, OLD_SUB_KEY NUMBER, NEW_SUB_KEY NUMBER, OLD_SUB_SUB_KEY NUMBER, NEW_SUB_SUB_KEY NUMBER, OLD_DAY DATE, NEW_DAY DATE, OLD_AMOUNT NUMBER, NEW_AMOUNT NUMBER) ISBEGIN UPDATE_HOUR_TAB ( OLD_KEY, OLD_SUB_KEY, TO_CHAR(OLD_DAY,'YYYYMM'), -OLD_AMOUNT); UPDATE_HOUR_TAB ( NEW_KEY, NEW_SUB_KEY, TO_CHAR(NEW_DAY,'YYYYMM'), NEW_AMOUNT); END UPDATE_HOURS;PROCEDURE FLUSH_HOUR_TAB IS CNT NUMBER;BEGIN CNT := HOUR_TAB.FIRST; WHILE CNT IS NOT NULL LOOP CNT := HOUR_TAB.NEXT(CNT); --UPDATE MONTHS, DATA ALREADY BROKEN AT THIS POINT END LOOP; HOUR_TAB.DELETE; PKG_HOUR_COUNT := 0;END FLUSH_HOUR_TAB;END TEST_PKG;/
Insert Script for test data:
DECLARE ORDER_KEY NUMBER := 10; SUB_KEY NUMBER := 0; SUB_SUB_KEY NUMBER := 1; V_START_TIME TIMESTAMP WITH TIME ZONE := '01-FEB-19 09.00.00 AM UTC'; V_END_TIME TIMESTAMP WITH TIME ZONE := '31-DEC-22 08.00.00 AM UTC'; V_CURRENT_TIME TIMESTAMP WITH TIME ZONE; V_DAY DATE; V_HOUR NUMBER(2);BEGIN V_CURRENT_TIME := V_START_TIME; WHILE V_CURRENT_TIME <= V_END_TIME LOOP --DBMS_OUTPUT.PUT_LINE(V_CURRENT_TIME); V_DAY := CAST(V_CURRENT_TIME AT TIME ZONE ('US/PACIFIC') AS DATE); V_HOUR := TO_CHAR(V_DAY, 'HH24') + 1; --EXCLUDE WEEKENDS IF MOD(TO_CHAR(V_DAY, 'J'), 7) + 1 NOT IN(6, 7) THEN --DBMS_OUTPUT.PUT_LINE(V_DAY); INSERT INTO ORDER_HOURS VALUES(ORDER_KEY, SUB_KEY, SUB_SUB_KEY, V_CURRENT_TIME, TRUNC(V_DAY), V_HOUR, 10, 'N'); END IF; V_CURRENT_TIME := V_CURRENT_TIME + INTERVAL '1' HOUR;END LOOP; COMMIT;END;/
Update I ran - only causes error on original table:
DECLARE P_AMOUNT NUMBER := 20; P_KEY NUMBER := 10;BEGINFOR UPDATE_HOUR IN(SELECT * FROM ORDER_HOURS WHERE KEY = P_KEY AND DAY >= '1-MAR-20' ) LOOP UPDATE ORDER_HOURS SET AMOUNT = P_AMOUNT WHERE KEY = P_KEY AND DAY = UPDATE_HOUR.DAY AND HOUR = UPDATE_HOUR.HOUR AND (AMOUNT <> P_AMOUNT);--Running the trigger code directly has never worked-- IF :NEW.CONFIRMED_FLAG = 'Y' AND :OLD.CONFIRMED_FLAG = 'Y'-- THEN RAISE_APPLICATION_ERROR(-20914, 'You cannot Update.');-- ELSE-- IF :NEW.CONFIRMED_FLAG = 'A' THEN-- :NEW.CONFIRMED_FLAG := 'Y';-- END IF;-- IF TEST_PKG.GET_UPDATE_HOURS_FLAG = 'Y' THEN-- TEST_PKG.UPDATE_HOURS(:OLD.KEY, :NEW.KEY, :OLD.SUB_KEY, :NEW.SUB_KEY, :OLD.SUB_SUB_KEY, :NEW.SUB_SUB_KEY, :OLD.DAY, :NEW.DAY, :OLD.AMOUNT, :NEW.AMOUNT);-- END IF;-- END IF; END LOOP; --DOES THE MONTHLY UPDATE BASED ON PACKAGE VARIABLE AND CLEARS THE VARIABLE TEST_PKG.FLUSH_HOUR_TAB;END;/
Code to check logs for errors:
SELECT HD.*FROM (SELECT HD.*, SUM(AMOUNT) OVER(PARTITION BY KEY, SUB_KEY, MONTH ORDER BY ROW_INDEX) CORRECT_AMOUNT, COUNT(*) OVER(PARTITION BY SUB_KEY) SEQ_COUNTFROM HOURS_DATA HD) HDWHERE HD.ENDING_AMOUNT <> CORRECT_AMOUNTORDER BY ROW_INDEX;
I have checked to see if there are more rows than there should be with an the same index or sub_key but the number of sub_key rows and index_rows is always the same. Right now my only hypothesis is a bug in Oracle. I am happy to provide more data or code if needed, but since I have not been able to duplicate this on another table, I am not sure how to help someone else reproduce it themselves.
The Order Hours table has 294,574,145 rows, filling 138 partitions, and using 7045 MB for data. When I run the attached code against the mini table ORDER_HOUR, it does not error out. When I only change the table name to the actual table with all the data, I do get the error. This makes me suspect that this is a bug with Oracle around triggers on large partitioned tables since the only difference is the table the trigger runs against. This makes it more difficult for someone to test against and try to reproduce if the issue is table specific. Any suggestions around cause or additional traces to run is appreciated.
While I may be able to solve this by changing how we update the monthly table, or update the month table after every update to an hour despite the performance cost(there are triggers on the month table that do a monthly cost calculation so each update is expensive), this issue is strange enough that I would like to solve it if possible.
Thank you for your help and patience reading through this wall of text.