Re: Freeze Break request: save and then truncate the mailman bounceevnt table

Neal Gompa via infrastructure <infrastructure@xxxxxxxxxxxxxxxxxxxxxxx> · Tue, 8 Apr 2025 20:03:15 -0400



On Tue, Apr 8, 2025 at 5:42 PM Kevin Fenzi via infrastructure
<infrastructure@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Greetings.
>
> We have had several applications crashing (resultsdb,
> resultsdb_ci_listener) or being slow (bodhi) of late.
>
> I did some digging today and discovered that db01 is pretty saturated on
> I/O. This means all the apps that use db01 are fighting i/o and
> returning things slower than they should.
>
> On looking more, it was mailman that was using the vast amount of the
> i/o. I of course thought at first that it was crawlers, but it is not.
>
> Instead it seems to be the bounce processor.
> This processor wakes up every few minutes and does a query for any
> bounces in the bounceevent table that are processed = 'false'.
> If it finds any, it processes them.
>
> However, that table is now 50GB and contains 152167015 rows
> (all of them pretty much processed = 'True').
>
> From the logs (which logs slow queries), an example:
>
> 2025-04-08 21:32:40.510 GMT [7073] LOG:  duration: 267423.928 ms  plan:
>         Query Text: SELECT bounceevent.id AS bounceevent_id, bounceevent.list_id AS bounceevent_
> list_id, bounceevent.email AS bounceevent_email, bounceevent.timestamp AS bounceevent_timestamp,
>  bounceevent.message_id AS bounceevent_message_id, bounceevent.context AS bounceevent_context, b
> ounceevent.processed AS bounceevent_processed
>         FROM bounceevent
>         WHERE bounceevent.processed = false
>         Gather  (cost=1000.00..7441540.83 rows=1 width=137)
>           Workers Planned: 2
>           ->  Parallel Seq Scan on bounceevent  (cost=0.00..7440540.73 rows=1 width=137)
>                 Filter: (NOT processed)
>
> Yes, thats 267seconds to process that query, all the time hammering I/O
> because the table is too large to cache well.
>
> This all pointed me to find this 7 year old bug report:
> https://gitlab.com/mailman/mailman/-/issues/343
> Hopefully abompard finds it a fun blast from the past. :)
>
> Anyhow, a quick fix I think would be:
>
> * Save a copy of the latest database dump that should have that table
> backed up.
> * 'truncate bounceevent' to wipe it
>
> Thoughts? +1s? counter proposals?
>
> I'd like to do this so the other db01 users stop having problems.
>

+1

(Maybe this should be automated and run once a year automatically?)


-- 
真実はいつも一つ！/ Always, there's only one truth!
-- 
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue