Leaking memory like a pro in Python

For all my Python zealotry, I never claimed it was perfect. In fact, there are a number of places you can shoot yourself in the foot if you aren’t careful. But I was kind of bummed about this one:

import numpy as np

class MainScreen(object):

    def __init__(self, cats):
        self.cats = cats
        self._signal = None

    def turn_on(self):
        if self._signal is None:
            signal = Signal(self)
            self._signal = signal
        return self._signal

class Signal(object):

    def __init__(self, main_screen):
        # well we want to know which screen we came from!
        self.main_screen = main_screen

    def we_get(self):
        return True

for i in xrange(50):
    # each array is ~40 MB (10000 * 500 * 8 bytes); the
    # MainScreen <-> Signal reference cycle keeps it alive
    # even after the loop variables are rebound
    cats = np.empty((10000, 500), dtype=float)
    cats.fill(0)
    main_screen = MainScreen(cats)
    signal = main_screen.turn_on()
    # all your base...

Try it.

EDIT: I’ve been informed by an astute reader that the GC is actually capable of dealing with the above case. Yang Zhang says “The collector actually works with cycles, including this case. (Actually, most objects are managed via reference-counting; the gc is specifically for cyclic references.) The problem is that it’s triggered by # allocations, not # bytes. In this case, each cats array is large enough that you need several GB of allocations before the threshold is triggered. Try adding gc.collect() or gc.get_count() in the loop (and inspect gc.get_threshold()).”
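To see Yang's point in isolation, here's a minimal sketch (using a hypothetical `Node` class, not the code above): the cyclic collector does reclaim reference cycles when it runs, and its trigger is an allocation count, not a byte count.

```python
import gc

class Node(object):
    def __init__(self):
        self.other = None

# Build a two-object reference cycle, then drop every outside reference.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b

# Reference counting alone can't reclaim the cycle, but the cyclic
# collector can. collect() returns how many unreachable objects it found.
unreachable = gc.collect()

# The collector fires based on allocation *counts*, not bytes:
thresholds = gc.get_threshold()  # allocation-count thresholds, e.g. (700, 10, 10)
```

A handful of huge arrays never comes close to the generation-0 allocation threshold, which is why the loop above can chew through gigabytes before a collection happens on its own.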

All is not lost, however. This pattern can be implemented using weak references:

import weakref

class MainScreen(object):

    def __init__(self, cats):
        self.cats = cats
        self._signal = None

    def turn_on(self):
        # keep a strong reference in a local so the Signal can't be
        # collected before we hand it back to the caller; this also
        # recreates the Signal if the old weakref has gone dead
        signal = self._signal() if self._signal is not None else None
        if signal is None:
            signal = Signal(self)
            self._signal = weakref.ref(signal)
        return signal
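The effect is that the screen no longer participates in a cycle: the Signal points strongly at its screen, but the screen only points weakly back. A self-contained sketch (hypothetical `Owner`/`Child` names standing in for `MainScreen`/`Signal`; the "reclaimed immediately" behavior relies on CPython's reference counting):

```python
import weakref

class Owner(object):
    pass

class Child(object):
    def __init__(self, owner):
        self.owner = owner  # strong reference: Child -> Owner

owner = Owner()
child = Child(owner)
owner.child_ref = weakref.ref(child)  # weak reference: Owner -> Child, no cycle

still_here = owner.child_ref() is child  # True while the caller holds child
del child                                # caller drops its only strong reference...
now_gone = owner.child_ref() is None     # ...and the Child is reclaimed at once
```

Once the caller lets go of the Signal, nothing keeps it (or the multi-megabyte array hanging off its screen, if the screen itself is also dropped) alive, so reference counting cleans up without waiting for the cyclic collector.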