Monday, June 14, 2021

TRACKING CHANGES IN POSTGRESQL

Database security is an essential issue in any database application. Especially when critical data is stored, it is important to know who changed which data, when, and how. To track changes made to tables in PostgreSQL, you can write yourself a generic changelog trigger. The easiest way to do that is to write a generic PL/pgSQL function and use it for all tables in the system. As PostgreSQL provides good support for stored procedures, this is definitely not hard to do.

Creating a table to store the history

First of all, we need a table to store those changes. For a quick prototype, we can use the following table structure.

CREATE SCHEMA logging;

CREATE TABLE logging.t_history (
        id              serial,
        tstamp          timestamp       DEFAULT now(),
        schemaname      text,
        tabname         text,
        operation       text,
        who             text            DEFAULT current_user,
        new_val         json,
        old_val         json
);

For the sake of simplicity, we did not use enumerated types or similar means to store the data more efficiently.
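If you do want something more compact, one option is an enumerated type for the operation column. This is purely a sketch; the names op_type and t_history_compact are illustrative, and the trigger function would then have to cast explicitly, e.g. TG_OP::logging.op_type:

-- illustrative only: an enum instead of free-form text for the operation
CREATE TYPE logging.op_type AS ENUM ('INSERT', 'UPDATE', 'DELETE');

CREATE TABLE logging.t_history_compact (
        id              serial,
        tstamp          timestamp       DEFAULT now(),
        schemaname      text,
        tabname         text,
        operation       logging.op_type,
        who             text            DEFAULT current_user,
        new_val         json,
        old_val         json
);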

The point of this table is to keep track of all changes made to a table: we want to know which operation took place on which row. Whenever a row is inserted, it is visible to the trigger procedure as NEW; a deleted row is visible as OLD, and an update provides both.

The backbone of this infrastructure

Taking this into account, we can come up with the following code:

CREATE FUNCTION change_trigger() RETURNS trigger AS $$
        BEGIN
                -- TG_OP identifies the operation; TG_TABLE_NAME and
                -- TG_TABLE_SCHEMA identify the table that fired the trigger
                IF      TG_OP = 'INSERT'
                THEN
                        INSERT INTO logging.t_history (tabname, schemaname, operation, new_val)
                                VALUES (TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(NEW));
                        RETURN NEW;
                ELSIF   TG_OP = 'UPDATE'
                THEN
                        INSERT INTO logging.t_history (tabname, schemaname, operation, new_val, old_val)
                                VALUES (TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP,
                                        row_to_json(NEW), row_to_json(OLD));
                        RETURN NEW;
                ELSIF   TG_OP = 'DELETE'
                THEN
                        INSERT INTO logging.t_history (tabname, schemaname, operation, old_val)
                                VALUES (TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD));
                        -- returning OLD allows the DELETE to proceed
                        RETURN OLD;
                END IF;
        END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

The interesting part here is that the trigger is completely generic: the row_to_json function encodes any changed row as a JSON document, so we can use the very same trigger function for countless tables. NEW and OLD contain the rows affected by the respective operation.
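If you want to attach the trigger to many tables at once, a DO block along the following lines should do the trick. This is just a sketch: it assumes all relevant tables live in the public schema, and the trigger name t_change is arbitrary:

DO $$
DECLARE
        t text;
BEGIN
        -- create the changelog trigger on every table in the public schema
        FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
        LOOP
                EXECUTE format('CREATE TRIGGER t_change
                                        BEFORE INSERT OR UPDATE OR DELETE ON %I
                                        FOR EACH ROW EXECUTE PROCEDURE change_trigger()', t);
        END LOOP;
END;
$$;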

Once we have the backbone in place, we can test things using a simple table. Here is an example:

CREATE TABLE t_trig (id int, name text);

Once we have created our table, we can deploy a trigger doing the real work:

CREATE TRIGGER t BEFORE INSERT OR UPDATE OR DELETE ON t_trig
        FOR EACH ROW EXECUTE PROCEDURE change_trigger();

Let us test the trigger and see what happens:

INSERT INTO t_trig VALUES (1, 'hans');
UPDATE t_trig SET id = 10 * id, name = 'paul';

Our history table will contain all changes we have made to the underlying tables:

-[ RECORD 1 ]--------------------------
id         | 1
tstamp     | 2013-12-02 12:13:18.069316
schemaname | public
operation  | INSERT
who        | hs
tabname    | t_trig
new_val    | {"id":1,"name":"hans"}
old_val    |
-[ RECORD 2 ]--------------------------
id         | 2
tstamp     | 2013-12-02 12:13:18.069316
schemaname | public
operation  | UPDATE
who        | hs
tabname    | t_trig
new_val    | {"id":10,"name":"paul"}
old_val    | {"id":1,"name":"hans"}
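Since the row images are stored as json, they can be queried with PostgreSQL's JSON operators (available for the json type as of PostgreSQL 9.3). A small example, extracting the old and new name from each history entry:

SELECT  tstamp, who, operation,
        old_val ->> 'name' AS old_name,
        new_val ->> 'name' AS new_name
FROM    logging.t_history
WHERE   tabname = 't_trig'
ORDER BY id;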

Security considerations

If we assume that we use this mechanism to track changes in order to protect our data, we have to make sure that somebody changing the data cannot tamper with the log as well. Therefore we have to take precautions to prevent exactly that. One way to do this is to mark our trigger function as SECURITY DEFINER. This means that the function is executed not with the privileges of the user making the change, but with the privileges of the user who owns the function. If the superuser deploys the trigger, we are protected against malicious changes to the log.
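On top of that, it makes sense to ensure that ordinary users cannot read or modify the log directly. A minimal sketch, assuming the logging schema and the trigger function are owned by a superuser:

-- ordinary users get no direct access to the log
REVOKE ALL ON SCHEMA logging FROM PUBLIC;
REVOKE ALL ON logging.t_history FROM PUBLIC;

-- no grants are needed for writing: the SECURITY DEFINER trigger
-- function inserts into logging.t_history with its owner's privileges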
