Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata cache corruption / crash when updating with same topic, different topic ID #4964

Open
kwdubuc opened this issue Feb 9, 2025 · 1 comment
Labels

Comments

@kwdubuc
Copy link

kwdubuc commented Feb 9, 2025

When rd_kafka_metadata_cache_topic_update() is called with the same topic but a different internal topic ID as an existing metadata cache entry, the metadata cache AVL indexed by topic_id will retain a stale pointer to a now-freed metadata cache entry. This causes corruption and, usually in short order, crashes.

Suppose we start with the metadata cache containing a single entry at address a1, with {topic, topic_id} being {"t1", 1}. rd_kafka_metadata_cache_insert() is called with topic "t1" and topic_id 2. Follow the code path through rd_kafka_metadata_cache_insert():

        rkmce = rd_tmpabuf_alloc(&tbuf, sizeof(*rkmce));

        rkmce->rkmce_mtopic = *mtopic;

        rkmce->rkmce_metadata_internal_topic = *metadata_internal_topic;

rkmce is the new metadata cache entry, with {topic, topic_id} being {"t1", 2}.

        /* Insert (and replace existing) entry. */
        old = RD_AVL_INSERT(&rk->rk_metadata_cache.rkmc_avl, rkmce,
                            rkmce_avlnode);

old is a1, containing {"t1", 1}. old has been removed from the AVL indexed by topic; rkmce is inserted in the AVL indexed by topic.

        /* Insert (and replace existing) entry into the AVL tree sorted
         * by topic id. */
        if (!RD_KAFKA_UUID_IS_ZERO(
                rkmce->rkmce_metadata_internal_topic.topic_id)) {
                /* If topic id isn't zero insert cache entry into this tree */
                old_by_id = RD_AVL_INSERT(&rk->rk_metadata_cache.rkmc_avl_by_id,
                                          rkmce, rkmce_avlnode_by_id);

rkmce is inserted in the AVL indexed by topic ID. Since there was no previous metadata cache entry with ID 2, old_by_id is null. The AVL indexed by topic ID continues to have an entry for ID 1, still pointing to a1.

        } else if (old && !RD_KAFKA_UUID_IS_ZERO(
                              old->rkmce_metadata_internal_topic.topic_id)) {
                /* If it had a topic id, remove it from the tree */
                RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
        }

This is the else clause to the preceding if, so it does not run.

        if (old) {
                /* Delete and free old cache entry */
                rd_kafka_metadata_cache_delete(rk, old, 0);
        }

The last parameter to rd_kafka_metadata_cache_delete() is unlink_avl, whether it is to unlink the cache entry from the AVLs. Since unlink_avl is 0, rd_kafka_metadata_cache_delete() doesn't update the AVLs and just frees old / a1.

At this point, the landmine has been armed: the AVL indexed by topic ID has a stale pointer to freed memory a1. Depending on when and how that memory is reallocated, a variety of ill-effects can occur. Most of these ill-effects are crashes at inconsistent locations, but usually related to metadata cache operations.

        if (old_by_id && old_by_id != old) {
                /* If there was a different cache entry in this tree,
                 * remove and free it. */
                RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl, old_by_id);
                rd_kafka_metadata_cache_delete(rk, old_by_id, 0);
        }

Since old_by_id is null, this code doesn't run.

I believe that the bug is that this:

        } else if (old && !RD_KAFKA_UUID_IS_ZERO(
                              old->rkmce_metadata_internal_topic.topic_id)) {
                /* If it had a topic id, remove it from the tree */
                RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
        }

should not be an else. Instead it should be:

        if (old && !RD_KAFKA_UUID_IS_ZERO(
                old->rkmce_metadata_internal_topic.topic_id) &&
            rd_kafka_Uuid_cmp(rkmce->rkmce_metadata_internal_topic.topic_id,
                old->rkmce_metadata_internal_topic.topic_id) != 0) {
                /* If it had a different topic id, remove it from the tree */
                RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);
        }

That is, if old had a topic ID which was different from rkmce, remove old from the AVL indexed by topic ID. If old has the same topic ID as rkmce, then from the preceding if body, old_by_id will be equal to old and properly handled by the existing code.

This seems to solve the problem in my testing.

This was found using librdkafka v2.6.0, but, as far as I can see, it continues to exist on the most recent main.

I suspect this is also the root cause of issues #4904 and #4907.

@emasab
Copy link
Contributor

emasab commented Feb 10, 2025

Thanks @kwdubuc for the investigation on this bug, thanks also @GerKr and @marcin-krystianc for their help in #4778 and @marcin-krystianc for #4864 PR.
I think this preposed fix is correct because if in #4864 it removes from rkmc_avl_by_id without checking that the id is different, it'll remove the entry that was just inserted at:

                /* If topic id isn't zero insert cache entry into this tree */
                old_by_id = RD_AVL_INSERT(&rk->rk_metadata_cache.rkmc_avl_by_id,
                                          rkmce, rkmce_avlnode_by_id);

code in #4864 is:

        if (old && !RD_KAFKA_UUID_IS_ZERO(
                              old->rkmce_metadata_internal_topic.topic_id)) {
                /* If it had a topic id, remove it from the tree */
                RD_AVL_REMOVE_ELM(&rk->rk_metadata_cache.rkmc_avl_by_id, old);

@emasab emasab added the bug label Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants