Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check "Candidates to convert btree index into hash index" #543

Open
mfvanek opened this issue Dec 17, 2024 · 0 comments
Open

Add a check "Candidates to convert btree index into hash index" #543

mfvanek opened this issue Dec 17, 2024 · 0 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@mfvanek
Copy link
Owner

mfvanek commented Dec 17, 2024

Take a look at

DO $$
DECLARE
    count bigint;
    unique_count bigint;
    row record;
BEGIN
    CREATE TEMPORARY TABLE hash_index_candidates (
        table_name text,
        index_name text, -- b-tree index to migrate
        column_name text,
        unique_ratio decimal,
        btree_index_size text,
        potential_hash_index_size text, -- assumption based value
        count bigint
    ); -- stores temporary results for the current session
    FOR row IN (
        SELECT
            t.table_name AS table_name,
            t.index_name AS index_name,
            t.column_name AS column_name,
            pg_size_pretty(t.size) AS index_size
        FROM
            (
                SELECT
                    tab.relname AS table_name,
                    cls.relname AS index_name,
                    pa.attname AS column_name,
                    pg_relation_size(cls.relname :: text) AS size
                FROM pg_stat_user_indexes idx
                JOIN pg_index pi ON pi.indexrelid = idx.indexrelid
                JOIN pg_class cls ON cls.oid = idx.indexrelid
                JOIN pg_class tab ON tab.oid = idx.relid
                JOIN pg_am am ON am.oid = cls.relam
                JOIN pg_attribute pa ON cls.oid = pa.attrelid
                JOIN pg_type pt ON pa.atttypid = pt.oid
                WHERE amname = 'btree' -- b-tree type
                AND indnatts = 1 -- index covers single column
                AND indislive = true -- check index "health"
                AND indisvalid = true -- check index "health"
                AND indpred IS NULL -- index has no conditions
                AND typname IN ('varchar', 'text', 'char')
                AND indisprimary = false -- index is not primary
                AND typlen = -1 -- type has unlimited bytes length
                AND indisunique = false -- index is not unique
            ) t
        WHERE t.size > 104857600 -- index size is more than 100MB
    ) LOOP
        EXECUTE format('SELECT COUNT(%I) FROM %I', row.column_name, row.table_name ) INTO count;
        EXECUTE format('SELECT COUNT(DISTINCT %I) FROM %I', row.column_name, row.table_name) INTO unique_count;
        INSERT INTO hash_index_candidates
        VALUES
        (
            row.table_name,
            row.index_name,
            row.column_name,
            round(unique_count :: decimal / (count :: decimal + 1) * 100, 2),
            row.index_size,
            pg_size_pretty(count / 300 * 8192), -- assumes 300 tuples per hash bucket
            count
        );
    END LOOP;
END
$$ LANGUAGE plpgsql;

SELECT * FROM hash_index_candidates WHERE unique_ratio > 95; -- fetch results

First, we should check if the candidate satisfies the rule of thumb we came up with here:

Values have high cardinality. Ideally, their hash codes have high cardinality as well;
Values are queried only with equality operators;
Values lengths should be at least 25 characters long.
The most important part is the usage of the index. If you’re confident that your field under the index is queried by equality operators only, then go ahead and migrate.

@mfvanek mfvanek added enhancement New feature or request help wanted Extra attention is needed labels Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant