Keeping data in sync between the (authoritative) database & the
(non-authoritative) search index is one of the more difficult problems when
using Haystack. Even frequently running the
update_index management command
still introduces lag between when the data is stored & when it’s available
for searching.
A solution to this is to incorporate Django’s signals (specifically
django.db.models.signals.post_save & django.db.models.signals.post_delete),
which then trigger individual updates to the search index, keeping the two in
near-perfect sync.
Older versions of Haystack (pre-v2.0) tied the
SearchIndex directly to the
signals, which caused occasional conflicts with third-party applications.
To solve this, starting with Haystack v2.0, the concept of a
SignalProcessor has been introduced. In its simplest form, the
SignalProcessor listens to whatever signals are set up & can be configured to
then trigger the updates without having to change any SearchIndex code.
Incorporating a SignalProcessor into your setup will
increase the overall load (CPU & perhaps I/O depending on configuration).
You will need to capacity plan for this & ensure you can make the tradeoff
of more real-time results for increased load.
The default setup is configured to use the
haystack.signals.BaseSignalProcessor class, which includes all the
underlying code necessary to handle individual updates/deletes, BUT DOES NOT
HOOK UP THE SIGNALS.
This means that, by default, NO ACTION IS TAKEN BY HAYSTACK when a model is
saved or deleted. The BaseSignalProcessor.setup &
BaseSignalProcessor.teardown methods are both empty to prevent anything
from being set up at initialization time.
This usage is configured very simply (again, by default) with the
HAYSTACK_SIGNAL_PROCESSOR setting. An example of manually setting this
would look like:
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.BaseSignalProcessor'
This class forms an excellent base if you’d like to override/extend for more advanced behavior. Which leads us to...
The other included
SignalProcessor is the
haystack.signals.RealtimeSignalProcessor class. It is an extremely thin
extension of the
BaseSignalProcessor class, differing only in that
it implements the
setup/teardown methods, tying ANY Model
save/delete to the signal processor.
If the model has an associated SearchIndex, the RealtimeSignalProcessor
will then trigger an update/delete of that model instance within the search
index.
Configuration looks like:
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
This causes all
SearchIndex classes to work in a realtime fashion.
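The difference between the two processors comes down to whether setup connects anything. The sketch below models that with plain-Python stand-ins: ToySignal, ToyBaseProcessor, and ToyRealtimeProcessor are hypothetical names invented for illustration, not Haystack or Django classes, and the real processors update a search backend rather than appending to a list.

```python
class ToySignal:
    """Hypothetical stand-in for a Django signal such as post_save."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def disconnect(self, receiver):
        self.receivers.remove(receiver)

    def send(self, sender, instance):
        # Call every connected receiver with the saved/deleted instance.
        for receiver in list(self.receivers):
            receiver(sender=sender, instance=instance)


post_save = ToySignal()
post_delete = ToySignal()


class ToyBaseProcessor:
    """Has the handlers, but setup/teardown are empty, so nothing is hooked up."""

    def __init__(self):
        self.log = []
        self.setup()

    def setup(self):
        pass

    def teardown(self):
        pass

    def handle_save(self, sender, instance, **kwargs):
        self.log.append(("update", instance))

    def handle_delete(self, sender, instance, **kwargs):
        self.log.append(("remove", instance))


class ToyRealtimeProcessor(ToyBaseProcessor):
    """Differs only in that setup/teardown connect and disconnect the handlers."""

    def setup(self):
        post_save.connect(self.handle_save)
        post_delete.connect(self.handle_delete)

    def teardown(self):
        post_save.disconnect(self.handle_save)
        post_delete.disconnect(self.handle_delete)


base = ToyBaseProcessor()
realtime = ToyRealtimeProcessor()
post_save.send(sender="Note", instance="note-1")
post_delete.send(sender="Note", instance="note-1")
realtime.teardown()
# After teardown, further saves no longer reach the realtime processor.
post_save.send(sender="Note", instance="note-2")
```

Only the realtime stand-in records the first save and delete; the base stand-in never sees them because its setup connected nothing.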
These updates happen in-process, so if a request-response cycle is involved, the user may have to sit & wait for indexing to be completed. Since this wait can be undesirable, especially under load, you may wish to look into queued search options. See the Haystack-Related Applications documentation for existing options.
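One way to picture a queued setup: the signal handler only records what changed, and a separate worker applies the changes later. The sketch below uses the standard library’s queue.Queue for this. QueuedProcessorSketch, drain, and the apply_update/apply_remove callables are hypothetical names for illustration, not part of Haystack, and a production setup would normally rely on a persistent task queue rather than an in-memory one.

```python
import queue


class QueuedProcessorSketch:
    """Hypothetical processor that defers index work instead of doing it in-process."""

    def __init__(self):
        self.pending = queue.Queue()

    def handle_save(self, sender, instance, **kwargs):
        # Cheap: just remember that this instance needs (re)indexing.
        self.pending.put(("update", sender, instance))

    def handle_delete(self, sender, instance, **kwargs):
        self.pending.put(("remove", sender, instance))

    def drain(self, apply_update, apply_remove):
        """Apply all pending actions; intended to run outside the request cycle."""
        processed = 0
        while True:
            try:
                action, sender, instance = self.pending.get_nowait()
            except queue.Empty:
                return processed
            if action == "update":
                apply_update(sender, instance)
            else:
                apply_remove(sender, instance)
            processed += 1


processor = QueuedProcessorSketch()
processor.handle_save(sender="Note", instance="note-1")
processor.handle_delete(sender="Note", instance="note-2")

indexed, removed = [], []
count = processor.drain(
    apply_update=lambda sender, instance: indexed.append(instance),
    apply_remove=lambda sender, instance: removed.append(instance),
)
```

The handlers return immediately, so the request is not held up; the cost is that results lag until the worker drains the queue.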
The BaseSignalProcessor & RealtimeSignalProcessor classes are fairly
simple/straightforward to customize or extend. Rather than forking Haystack to
implement your modifications, you should create your own subclass within your
codebase (anywhere that’s importable is usually fine, though you should avoid
models.py files).
For instance, if you only wanted
User saves to be realtime, deferring all
other updates to the management commands, you’d implement the following code:
    from django.contrib.auth.models import User
    from django.db import models
    from haystack import signals


    class UserOnlySignalProcessor(signals.BaseSignalProcessor):
        def setup(self):
            # Listen only to the ``User`` model.
            models.signals.post_save.connect(self.handle_save, sender=User)
            models.signals.post_delete.connect(self.handle_delete, sender=User)

        def teardown(self):
            # Disconnect only for the ``User`` model.
            models.signals.post_save.disconnect(self.handle_save, sender=User)
            models.signals.post_delete.disconnect(self.handle_delete, sender=User)
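A custom processor is selected with the same HAYSTACK_SIGNAL_PROCESSOR setting used for the built-in classes. The dotted path below is an assumption: it supposes the subclass lives in a hypothetical module named myapp/signals.py, so adjust it to wherever your class is importable from.

```python
# settings.py
# ``myapp.signals`` is a hypothetical module path; point this at your own subclass.
HAYSTACK_SIGNAL_PROCESSOR = 'myapp.signals.UserOnlySignalProcessor'
```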
For other customizations (modifying how saves/deletes should work), you’ll need
to override/extend the
handle_save/handle_delete methods. The source code
is your best option for referring to how things currently work on your version
of Haystack.
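As a sketch of that kind of override, the subclass below skips indexing for instances flagged as drafts and falls back to the parent behavior otherwise. StandInProcessor is a simplified stand-in written for this example, not Haystack’s BaseSignalProcessor, and is_draft is a hypothetical attribute. The handle_save(sender, instance, **kwargs) signature matches how the handler is connected to post_save in the example above, but check the source of your Haystack version before relying on it.

```python
from types import SimpleNamespace


class StandInProcessor:
    """Simplified stand-in for a base processor; records which objects it indexed."""

    def __init__(self):
        self.updated = []

    def handle_save(self, sender, instance, **kwargs):
        self.updated.append(instance.pk)


class SkipDraftsProcessor(StandInProcessor):
    def handle_save(self, sender, instance, **kwargs):
        # ``is_draft`` is a hypothetical flag; instances without it are indexed.
        if getattr(instance, "is_draft", False):
            return
        super().handle_save(sender, instance, **kwargs)


processor = SkipDraftsProcessor()
processor.handle_save(sender="Note", instance=SimpleNamespace(pk=1, is_draft=False))
processor.handle_save(sender="Note", instance=SimpleNamespace(pk=2, is_draft=True))
processor.handle_save(sender="Note", instance=SimpleNamespace(pk=3))
```

Calling the parent method for the non-filtered case keeps the subclass small and lets it pick up whatever the base class does in your installed version.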