Incoming events failing after seeing "evil" events in auth chain
conduit.rory.gay
has a room that's effectively broken for incoming federation, causing a large amount of log spam and possibly performance degradation.
The room is !mefQhZzgTaxNCNzAeK:kde.org
(#kde:kde.org
). The error messages look like this in the log:
WARN http_request{otel.name="/_matrix/federation/v1/send/:transaction_id" path=/_matrix/federation/v1/send/:transaction_id method=PUT}:run_ruma_handler{auth.user=None auth.device=None auth.servername=Some("fedora.im") auth.appservice_id=None}:handle_incoming_pdu{origin="fedora.im" event_id="$Aq-G08tEA2Eg2T16mwVTQD_aZlBlzltiEPm7hJ4jk5Y" room_id="!mefQhZzgTaxNCNzAeK:kde.org"}: grapevine::service::rooms::event_handler: Prev event $V6vOVpB89ZnniQLBkzVmeI20BdvJrWSxuGxm9yIz2qU failed: M_FORBIDDEN: Evil event in db
This log message is coming from this check, which triggers when auth chain resolution encounters an event with the wrong room id.
With logging changes, we were able to identify which event was triggering this:
WARN http_request{otel.name="/_matrix/federation/v1/send/:transaction_id" path=/_matrix/federation/v1/send/:transaction_id method=PUT}:run_ruma_handler{auth.user=None auth.device=None auth.servername=Some("tchncs.de") auth.appservice_id=None}:handle_incoming_pdu{origin="tchncs.de" event_id="$0dp9lZUnMj4lf9ENlDNSCMqtGJfMJpOcRWEXXokKjn0" room_id="!mefQhZzgTaxNCNzAeK:kde.org"}:upgrade_outlier_to_timeline_pdu{incoming_pdu=$SklI7dJgB1iD5B48bpFKpQWg0uSzozJthryMhOXAl48 create_event=$m6CVKKSDaFgHGYovw0hEqZOOaFHWMw1YlQZ-xloGsuc}:resolve_state{room_id="!mefQhZzgTaxNCNzAeK:kde.org"}:get_auth_chain{room_id="!mefQhZzgTaxNCNzAeK:kde.org" starting_events=["$YwZINFW0bAoOc4i9xEWwhhwu2G3DODiNA2xQuaY6SdM", "$jcpTU3hGQFpRNGLgBwhlBvkWBlj8vJjHlP36XQZLbaM", "$5Y9nDyI1ztYnc6pSDcWR1gDwd4b6n1uIKuZ_7LjzmVk", "$2TAIv9q4EsVqod6XFCzdAqO2aGlazr6CK8lEfzaq2AM", "$02SbBt9MFJ_fJyipuVznuAlQ05imIHADgJmZj_64BPM", "..."]}:get_auth_chain_inner{room_id="!mefQhZzgTaxNCNzAeK:kde.org" event_id="$U6r8_7zO5z537y8LUWMyUaDbtQyqsk37KjAfWTeSzhE"}: grapevine::service::rooms::auth_chain: Event referenced in auth chain has incorrect room id bad_room_id=!JmUbgVFsJPolpfQhiT:opensuse.org
{
"auth_events": [
"$fqdS2YfH3w05gla1NAXgRlUX1cv9_6EQ-XEiOjEmfL8",
"$ek3suTeLrQqiotaJL05bjGXQfi36mdqk2a9RFOQxwUA",
"$BW5jZ0yVL45MRORS13vR-bjUy1vYf0z1CFipzyiDg3Y",
"$48X-6mZMz7BA4kKa2VK7k9LifvoxDVo9BzTVEL9WTiU"
],
"content": {
"displayname": "appservice",
"membership": "join"
},
"depth": 10220,
"event_id": "$U6r8_7zO5z537y8LUWMyUaDbtQyqsk37KjAfWTeSzhE",
"hashes": {
"sha256": "B5Gt7s8s3/KkdsHtz3Q0Eerp3gqmw9vFxB9h2Xgy/GQ"
},
"origin": "libera.chat",
"origin_server_ts": 1634119843658,
"prev_events": [
"$uhDPWM33oo6pIYcYpdNE9bihvzanM6RndvvNmEvRjsk"
],
"prev_state": [],
"room_id": "!JmUbgVFsJPolpfQhiT:opensuse.org",
"sender": "@appservice:libera.chat",
"signatures": {
"libera.chat": {
"ed25519:t4fjCr": "/ksGD2p1Aas4Cl0hTthoLl4WYtXuqw17zRG7/YCvLpMBr9F4h1VX1RCrCG3NO+7TR7b3th2fhRH9JI/AMQmSBQ"
}
},
"state_key": "@appservice:libera.chat",
"type": "m.room.member",
"unsigned": {
"age": 72437213661,
"replaces_state": "$fqdS2YfH3w05gla1NAXgRlUX1cv9_6EQ-XEiOjEmfL8"
}
}
This event appears to be part of the starting set passed to get_auth_chain
from resolve_state
. There are two calls made, one with the state at the incoming event as the starting set, and the other with the current state in the room from the db as the starting set. From the surrounding log messages, we determined that the state for the incoming event was being determined by calling the /state_ids
endpoint on matrix.org (here). We manually queried this endpoint, and the bad event was not part of the response, so we believe that it must be present in the current state for the room in our database.
Important context is that conduit.rory.gay
was running conduwuit at the time that both rooms (!mefQhZzgTaxNCNzAeK:kde.org
and !JmUbgVFsJPolpfQhiT:opensuse.org
) where joined. It's possible that this is a db corruption bug only present in conduwuit, similar to #27.