A question was asked on Stack Overflow about handling dead letters in RabbitMQ. At its core, the question is about how to handle what are known as “poison messages” – messages that are problematic and cannot be processed for some reason.
The person asking wants to know how to deal with these bad messages once they have been sent to a dead-letter queue. The queue stacks up with messages while the code to handle them is fixed – but what happens when the fix is ready? How do you re-process the messages through the right queue and consumer?
The Original Question
TL;DR: I need to “replay” dead letter messages back into their original queues once I’ve fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and try to decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the “x-death” headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to? And what about searching the dead letter queue? What if I know that a message (let’s say which is encoded in JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect which I know will allow message with PacketId: 1234 to successfully process now. I could also write a one-off program for this I suppose.
I certainly can’t be the first one to encounter these problems and I’m wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn’t really come up with much. The closest thing I could find were shovels but that doesn’t really seem like the right tool for the job.
There’s a lot of text in the question, but it can be boiled down to a few simple things – summed up in the question’s own “TL;DR” at the start.
The TL;DR Answer
In the middle of the second paragraph, this person asks,
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
Generally speaking, yes.
There are other options, depending on the exact scenario. However, there is no built-in poison message or dead-letter handling strategy in RabbitMQ. It provides the mechanism to recognize and isolate poison messages via dead-letter exchanges, but it does not give guidance or solutions on handling the messages once they are in the dead letter queue.
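A one-off program like this first has to work out where each message came from. As a minimal sketch, assuming the message headers arrive as a plain dictionary and that the “x-death” header is a list of entries with a “queue” field, ordered most recent first (my reading of the RabbitMQ documentation), a helper could look like this. The function name is hypothetical.

```python
def original_queue(headers):
    """Return the queue a message was first dead-lettered from, or None."""
    deaths = (headers or {}).get("x-death") or []
    if not deaths:
        return None
    # Assumption: entries are sorted most-recent-first, so the oldest
    # dead-lettering event (the original queue) is the last entry.
    return deaths[-1].get("queue")


if __name__ == "__main__":
    sample = {
        "x-death": [
            {"queue": "orders.retry", "reason": "expired", "count": 1},
            {"queue": "orders", "reason": "rejected", "count": 2},
        ]
    }
    print(original_queue(sample))
```

A message that has never been dead-lettered has no “x-death” header at all, so the helper returns None rather than guessing a target.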
One Solution: Automatic Retry
Sometimes a message is dead lettered not because of your code or a problem with your system. For example, an external system may be down or unreachable. In cases like this, an automatic retry of a message can be a useful solution for a dead letter.
The best way to achieve this will vary with your needs, but a common solution is to set up a delay for the retry. Using the delayed message exchange plugin as your dead letter exchange lets you hold a message for a set time before it is retried, in the hope that the external service is back up by then.
But this only automates the retries on an interval, and you may not have fixed the problem before the retries happen.
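Because blind retries can loop forever, it usually helps to cap them and to back off between attempts. The sketch below is one way to compute the delay, assuming each “x-death” entry carries a “count” field; the function names and the default limits are illustrative, not anything RabbitMQ provides. The delayed message exchange plugin reads its delay from an “x-delay” header in milliseconds, and how that header gets set on a dead-lettered message depends on your setup.

```python
def total_deaths(headers):
    """Sum the 'count' fields across all x-death entries."""
    deaths = (headers or {}).get("x-death") or []
    return sum(int(d.get("count", 1)) for d in deaths)


def next_delay_ms(headers, base_ms=1000, max_ms=60000, max_attempts=5):
    """Return the delay before the next retry, or None to stop retrying."""
    attempts = total_deaths(headers)
    if attempts >= max_attempts:
        return None  # give up; leave the message for manual handling
    # Exponential backoff: base, 2x base, 4x base, ... capped at max_ms.
    return min(base_ms * (2 ** attempts), max_ms)
```

When the function returns None, the message has used up its retries and is a candidate for the manual handling described in the rest of this post.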
The Needs Of The System
In the case of the question being asked, creating an app to handle the dead letters is likely the best way to go, for several reasons.
- The need to examine messages individually and decide what to do with them
- The need to find specific messages in a queue – which isn’t directly possible in RabbitMQ
- The need to wait for an unknown period of time, before retrying, while things are fixed
None of these requirements is difficult on its own – but doing them from within RabbitMQ, in an automated manner, may be difficult if not impossible. At least, it may not be possible all the time.
Examine Messages Individually
One of the stated needs is to look at messages in the dead letter queue and individually determine what should be done with them. There may be a lot of reasons for this – finding messages that have bad or missing data, identifying duplicates that can be thrown away, sending messages off for analysis to see what can be fixed, etc.
It’s possible to use a standard consumer for some of this work, provided the examination can be automated.
For example, say you have previously fixed a bug and are dealing with a lot of messages that need to be re-processed. It may make sense to write a small app that reads the dead letters from the queue and re-publishes them to the original queue, assuming they meet some criteria. Messages that don’t meet the criteria could be sent to another queue for manual intervention (which would require its own app to read and work with the messages).
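Here is a rough sketch of that small app. It assumes a channel object shaped like pika’s BlockingChannel (basic_get, basic_publish and basic_ack), but it is written against that shape only, so no broker library is imported. The queue names and the PacketId rule are placeholders for whatever your own criteria are.

```python
import json


def meets_criteria(body):
    """Hypothetical rule: replay only JSON objects that carry a PacketId."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and "PacketId" in payload


def first_dead_lettered_from(properties):
    """Read the original queue from x-death (assumed most-recent-first)."""
    headers = getattr(properties, "headers", None) or {}
    deaths = headers.get("x-death") or []
    return deaths[-1].get("queue") if deaths else None


def drain_dead_letters(channel, dlq="dead-letters", manual_queue="manual-review"):
    """Move each message in dlq to its original queue or to manual_queue."""
    replayed = parked = 0
    while True:
        method, properties, body = channel.basic_get(queue=dlq, auto_ack=False)
        if method is None:  # the dead letter queue is empty
            break
        target = first_dead_lettered_from(properties)
        if target is not None and meets_criteria(body):
            replayed += 1
        else:
            target = manual_queue
            parked += 1
        # The default exchange ("") routes by queue name.
        channel.basic_publish(exchange="", routing_key=target,
                              body=body, properties=properties)
        # Ack only after publishing, so a crash duplicates rather than loses.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return replayed, parked
```

Publishing before acknowledging trades possible duplicates for not losing messages; without publisher confirms even that is not a hard guarantee, so treat this as a starting point rather than a finished tool.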
If you can automate the “examine messages individually” process, you should. But this is likely not possible early in a project’s life, as the ability to do this typically means problems have already been solved and the solution can be automated. When you’re early in the life of an app, you’ll likely run into problems that don’t have a solution yet, and you’ll need to manually sort through the messages to deal with them.
Find Message By Some Attribute
RabbitMQ is a true queue – first in, first out. You don’t get random access, seek or search in its queues. You get the next message in the queue, and that’s all you’ll ever get. Because of this, features like searching for certain messages while ignoring others are not really possible.
This means you’ll need a database to store the messages, and you’ll need to pull the messages out of the queue so they can be searched and matched correctly. Because you’re pulling the messages out of the queue, you’ll need to ensure you get all the header info from the message so that you can re-publish to the correct queue when the time comes.
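To make that concrete, here is a small sketch that uses SQLite from the Python standard library as the database. The table layout and function names are assumptions for illustration. It stores the body alongside the full headers and the original queue, then searches JSON bodies for an attribute such as PacketId.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store for dead letters pulled off the queue."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS dead_letters ("
        " id INTEGER PRIMARY KEY,"
        " original_queue TEXT,"
        " headers TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return db


def save_dead_letter(db, headers, body):
    """Persist one message; keep all headers so it can be re-published."""
    deaths = (headers or {}).get("x-death") or []
    # Assumption: x-death is most-recent-first, so the last entry is the origin.
    queue = deaths[-1].get("queue") if deaths else None
    cur = db.execute(
        "INSERT INTO dead_letters (original_queue, headers, body)"
        " VALUES (?, ?, ?)",
        # default=str keeps non-JSON header values (e.g. timestamps) storable.
        (queue, json.dumps(headers or {}, default=str), body),
    )
    db.commit()
    return cur.lastrowid


def find_by_attribute(db, name, value):
    """Return (id, original_queue, body) for JSON bodies where name == value."""
    matches = []
    rows = db.execute(
        "SELECT id, original_queue, body FROM dead_letters ORDER BY id"
    )
    for row_id, queue, body in rows:
        try:
            payload = json.loads(body)
        except ValueError:
            continue  # not JSON; cannot match an attribute
        if isinstance(payload, dict) and payload.get(name) == value:
            matches.append((row_id, queue, body))
    return matches
```

In a real consumer you would acknowledge the message to RabbitMQ only after the insert has been committed, so that a crash cannot drop a dead letter on the floor.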
But the time to re-publish messages is not always known.
Re-Publish At An Unknown Future Time
It may be OK to leave the messages stuck in the queue while you fix the problem. I’ve done this numerous times and with great success. Fix the problem, redeploy the code and the queue starts draining again.
But this only works if you are dealing with a single message type in your queue.
If you are dead-lettering multiple message types or messages from multiple consumer types, and sending them all to the same dead letter queue, you’re going to need a more intelligent way to handle the messages.
In some scenarios, you will need an app that can read the dead letters and re-publish them when a user says to.
Not Inventing Anything
In the original question, the asker notes that this can’t be a unique problem:
I certainly can’t be the first one to encounter these problems and I’m wondering if anyone else has already solved them.
And certainly this person is not the first to have these problems!
There are many solutions to this problem, with only a few possible options listed here. There are even some off-the-shelf software solutions that have strategies and solutions for these problems baked in. I think NServiceBus (or the SaaS version of it) has this built in, and MassTransit has features for this as well.
There are certainly other people who have run into this, and have solved it in a myriad of ways. The availability of these features comes down to the library and language you are using, though.
Because of this, it often comes down to writing code to handle the dead letters. That code may require a database so you can search and select specific messages, get rid of some, etc.
Whatever your language and library, you should automate the solution to a given problem. Until you have a working solution, though, you’ll likely need to build an app to handle your dead letter queue appropriately.