Python applications do leak memory. Not due to Python itself, but due to application bugs. Though recent versions of Python have a true garbage collector that breaks cyclical references, you may still leak a lot of memory by keeping object references in forgotten corners of your code.
Another common cause of memory leaks is the presence of a __del__ method in a class, which (in Python versions before 3.4) prevents the garbage collector from breaking cycles that involve instances of that class. The uncollected object then keeps references to others, which keep references to still others, and suddenly 90% of your object pool cannot go away.
Unfortunately, my application was leaking so much memory this way that it became sluggish within half an hour of use. So I had to hunt down which objects were not being freed, and why. I managed to improve the situation a lot by breaking references "manually" (setting all references to other objects to None when the class had an explicit unload method), until I found the real culprit: three classes that had __del__ methods.
The technique I put together (with the help of a lot of Googling) exploits some features of the garbage collector module (gc):
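Here is a minimal sketch of what breaking references "manually" can look like. The names Document, View and freed_without_gc are made up for illustration and do not come from my application. It assumes CPython (reference counting) and Python 3, and uses weakref only to observe whether the object is really gone.

```python
import gc
import weakref


class View(object):
    """Hypothetical object that points back to its owner."""
    def __init__(self, owner):
        self.owner = owner


class Document(object):
    """Hypothetical owner; forms the cycle Document -> View -> Document."""
    def __init__(self):
        self.view = View(self)

    def unload(self):
        # Break the cycle by hand: drop the references in both directions.
        if self.view is not None:
            self.view.owner = None
        self.view = None


def freed_without_gc(use_unload):
    """Return True if a Document is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector out of the experiment
    try:
        doc = Document()
        probe = weakref.ref(doc)  # a weak reference does not keep doc alive
        if use_unload:
            doc.unload()
        del doc
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle left behind by the experiment
```

Calling freed_without_gc(True) shows the object going away as soon as the last name is deleted, while freed_without_gc(False) shows it lingering until the cycle collector gets a chance to run.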
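import gc
# Force a full collection so that collectable cycles are not reported
gc.collect()
# Snapshot of every object currently tracked by the collector
objects = gc.get_objects()
# Remember the id() of each one, for identity-based membership tests later
objects_id = {}
for o in objects:
    objects_id[id(o)] = True
# Uncollectable objects, if any, are now listed in gc.garbage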
In this code, I force a garbage collection so that collectable cyclic references are out of the picture, and then I get the complete pool of live objects tracked by the collector. There will be several thousand of them at minimum, since nearly everything is there: functions, methods, modules, instances, lists, dictionaries, etc. The objects_id dictionary records the id() of each one, so that objects created later by the introspection itself can be recognized and ignored.
The gc.garbage list contains the objects that gc could not collect because it did not know how to break the cycle of references. This typically happens when a class has a __del__ method, which means the developer was supposed to clear the references explicitly but didn't. It is a very good place to begin searching for leaks.
But my application was also keeping objects alive through true (non-cyclical) references, and I needed to find out who was referring to those leaked objects. To do that, I wrote the following code:
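The snapshot also lends itself to a quick "what is growing?" check. The sketch below is not part of my original code: the helper names count_live_objects and growth are hypothetical, and it is written for Python 3. It only relies on gc.collect(), gc.get_objects() and collections.Counter.

```python
import gc
from collections import Counter


def count_live_objects():
    """Return a Counter mapping type name -> number of gc-tracked objects."""
    gc.collect()  # drop collectable cycles first, as in the snippet above
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, top=10):
    """Return (type name, increase) pairs for the types that grew the most."""
    diff = Counter(after)
    diff.subtract(before)
    return [(name, n) for name, n in diff.most_common(top) if n > 0]
```

A typical use would be to take one count, exercise the application for a while, take a second count and print growth(before, after). Types whose count keeps climbing between runs are the first suspects.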
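A small sketch for summarizing what sits in gc.garbage follows. The names summarize_garbage, Node and demo are hypothetical. One caveat I am sure of: since Python 3.4 (PEP 442), objects with a __del__ method no longer end up in gc.garbage, so on a modern interpreter the list is normally empty. To have something to show, the demo sets gc.DEBUG_SAVEALL, which makes the collector store every unreachable object in gc.garbage instead of freeing it. That is a swapped-in way of populating the list, not what happened in my application.

```python
import gc


def summarize_garbage():
    """Return (type name, defines __del__) for every object in gc.garbage."""
    return [(type(o).__name__, hasattr(type(o), "__del__"))
            for o in gc.garbage]


class Node(object):
    """Hypothetical class with a finalizer, used only for the demo."""
    def __del__(self):
        pass


def demo():
    """Build a cycle of Node objects and report what lands in gc.garbage."""
    gc.collect()  # start from a clean state
    old_flags = gc.get_debug()
    # Assumption for the demo: DEBUG_SAVEALL keeps unreachable objects
    # in gc.garbage, so there is something to inspect on Python 3.
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle between the two nodes
        del a, b
        gc.collect()
        return summarize_garbage()
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]  # release the saved objects
```

Entries whose second field is True are the ones worth looking at first, since they belong to classes that define __del__.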
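# l is the list of suspect objects and verbose the verbosity level;
# both are defined elsewhere, and show_referrers() is defined below
for o in l:
    print(o)
    if verbose >= 2:
        # identity test, since "o in gc.garbage" would go through __eq__
        if any(o is g for g in gc.garbage):
            print(o)
            print(" In gc.garbage (possible cause: "
                  "presence of __del__ method)")
        else:
            cold_trail, lines = show_referrers(o, [id(o)], 1)
            for line in lines:
                print(line)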
from types import ModuleType, FunctionType
try:
    from types import BufferType   # Python 2 only
except ImportError:
    BufferType = memoryview        # assumption: closest Python 3 counterpart

def show_referrers(initial_object, backrefs, level):
    """Return (cold_trail, lines) describing who refers to initial_object."""
    cold_trail = True
    lines = []
    for o in gc.get_referrers(initial_object):
        bump = 0
        if id(o) in backrefs:
            # cyclical reference to an object of the trail
            continue
        elif id(o) not in objects_id:
            # object created within this very routine
            continue
        if isinstance(o, (type, ModuleType, FunctionType)):
            # dead end, but at least we are 100% sure
            # this trail does not lead to a cycle
            #
            # lines.append(" "*(level+1) + str(type(o)) + \
            #              " " + str(o)[0:80])
            cold_trail = False
            continue
        if isinstance(o, BufferType):
            # uninteresting to print, but must be followed
            pass
        else:
            lines.append(" "*(level+1) + str(type(o)) + " " +
                         str(o)[0:80])
            bump = 1
        if len(backrefs) < 8:
            # follow the trail one level up, at most 8 levels deep
            backrefs_new = backrefs[:]
            backrefs_new.append(id(o))
            referrers_are_cold_trails, referrers_lines = \
                show_referrers(o, backrefs_new, level+bump)
            lines.extend(referrers_lines)
            cold_trail = cold_trail and referrers_are_cold_trails
    if cold_trail:
        # our introspection was worthless because
        # it only led to cyclical refs
        lines = []
    return (cold_trail, lines)
It is centered around the gc.get_referrers() function, which returns the objects that hold references to a given object. Since the primary reference is most likely held by a list or a dictionary, we then need to find who refers to that list or dict, and so on.
Of course, one object may be referred to by many others, and some references end up being part of a cycle. Those cycles can be ignored because, if they were the only problem, GC would have done away with the object (except in the __del__ cases). What keeps the object alive is a non-cyclic reference. So my code tries to detect and ignore referral paths that lead only to cycles, calling each one a "cold trail".
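To make the "who refers to that list, and so on" walk concrete, here is a two-step sketch in Python 3. Everything in it (Leaked, registry, referrers_of_type) is hypothetical and only stands in for a real forgotten corner. The only gc call used is gc.get_referrers().

```python
import gc


class Leaked(object):
    """Hypothetical stand-in for an object that should have been freed."""


def referrers_of_type(obj, kind):
    """Return the gc-tracked referrers of obj that are instances of kind."""
    return [r for r in gc.get_referrers(obj) if isinstance(r, kind)]


registry = {"items": []}          # the "forgotten corner" holding the reference
leaked = Leaked()
registry["items"].append(leaked)

# Step 1: who refers to the object? Among others, the list inside registry.
step1 = referrers_of_type(leaked, list)

# Step 2: who refers to that list? Among others, the registry dict itself.
step2 = referrers_of_type(registry["items"], dict)
```

Each step usually returns more than the one referrer you are after (module namespaces, for instance), which is why the real routine has to filter and recurse.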
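The claim that a pure cycle is harmless, while one outside reference keeps the whole thing alive, can be checked with a small experiment. This is a sketch for Python 3 with hypothetical names (Node, make_cycle, survives_collection), and weakref is used only as a probe.

```python
import gc
import weakref


class Node(object):
    """Hypothetical object used to build reference cycles."""
    def __init__(self):
        self.peer = None


def make_cycle():
    """Return one node of a two-node reference cycle."""
    a, b = Node(), Node()
    a.peer, b.peer = b, a
    return a


def survives_collection(keep_outside_reference):
    """Return True if a cycle member is still alive after gc.collect()."""
    holder = []  # plays the role of the non-cyclic referrer
    node = make_cycle()
    probe = weakref.ref(node)  # observes node without keeping it alive
    if keep_outside_reference:
        holder.append(node)
    del node
    gc.collect()
    return probe() is not None
```

With no outside reference the cycle is collected, so that trail is "cold". With the list holding one node, both nodes survive, and the list is the referrer worth reporting.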
When the ultimate referrer to an object is a module or a function, it may or may not be helpful to print it. In my case, it was not, so I commented out the code that annotates such objects. If no referrer to the object is listed, try enabling this annotation too.
I used object IDs in objects_id and backrefs since "object in list" may give wrong answers if some involved class implements a custom __eq__. And, due to the low-level nature of those operations, I felt more comfortable using IDs, as if they were C++ pointers.
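The __eq__ pitfall is easy to reproduce. In this sketch (Python 3), AlwaysEqual and contains_by_identity are hypothetical names. Keep in mind that id() values are only unique among objects that are alive at the same time, so the objects being compared must be kept alive while their IDs are in use.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

    __hash__ = object.__hash__  # keep instances hashable on Python 3


def contains_by_identity(obj, seen_ids):
    """Membership test that ignores __eq__ and compares identities only."""
    return id(obj) in seen_ids


tracked = AlwaysEqual()
stranger = AlwaysEqual()
trail = [tracked]
trail_ids = set(id(o) for o in trail)

# "in" on a list falls back to ==, so the custom __eq__ gives a false match
false_positive = stranger in trail
# comparing ids tells the two distinct objects apart
by_identity = contains_by_identity(stranger, trail_ids)
```

Here false_positive ends up True even though stranger was never put in the list, while the id-based test correctly reports that it is not there.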