PYTHON MEMORY LEAK INVESTIGATION

I. Overview of memory leaks in Python

A memory leak shows up as a gradual increase in the physical RAM usage of a process. RAM usage can be observed through the VIRT and RES values (as reported by top), which are the virtual memory usage and the physical (resident) memory usage respectively. However, only the RES value is taken into account when considering a memory leak scenario. There are many possible causes of this behavior; some of them are listed below:
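A minimal sketch of reading these two values from inside the process, assuming a Linux host: /proc/<pid>/status reports VmSize (which corresponds to VIRT) and VmRSS (which corresponds to RES) in kB. The function names are illustrative and not part of the service.

```python
import os


def parse_memory_fields(status_text):
    """Extract VmSize (VIRT) and VmRSS (RES), in kB, from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(':')
        if key in ('VmSize', 'VmRSS'):
            # Values look like "\t  123456 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


def current_memory(pid=None):
    """Return {'VmSize': kB, 'VmRSS': kB} for a process (Linux only, assumption)."""
    pid = pid or os.getpid()
    with open('/proc/%d/status' % pid) as f:
        return parse_memory_fields(f.read())
```

Logging current_memory()['VmRSS'] periodically from the daemon gives a RES trend without relying on an external tool.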

II. Python memory leak deep-dive

  1. Memory Fragmentation:

On a 32-bit OS, the answer is YES. The main problem is that 2^32 bytes (4 GiB) is the largest amount of virtual memory (VM, reported as VIRT) that the OS can give a process when it requests memory. In addition, in Python all objects, and the references between them, are stored on the heap, which is used for dynamic memory allocation. Together, these constraints can fragment memory (in fact, the heap) in the following way. The process may still have free contiguous memory inside its VM, but only in small blocks. Those small blocks cannot satisfy a large memory request, so more memory has to be requested from the OS. When that memory is released, it is split up into smaller blocks, so when the next large request arrives, new memory again has to be requested from the OS. Likewise, if a request arrives and the heap has no contiguous free block that is large enough, the request cannot be served from the memory the process already holds. Repeated many times in a long-running process, together with the block splitting described above, this cycle can eventually fragment the heap.

In addition, Python by default runs on CPython, which never moves objects around in memory and therefore cannot compact the heap to avoid fragmentation. One solution would be to replace CPython with another Python implementation, but that is really risky and takes a lot of effort.

On a 64-bit OS, the answer is NO. The problem above is specific to 32-bit systems; on a 64-bit OS the virtual address space can theoretically reach 2^64 bytes. Because the Python interpreter also manages its memory internally, fragmentation could still happen on a 64-bit OS, but exhausting the address space this way would take on the order of thousands of years of running, not seconds or minutes.
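The arithmetic behind the 32-bit versus 64-bit argument can be checked from Python itself. This sketch (function names are illustrative) reads the pointer width of the running interpreter with struct.calcsize('P') and converts a pointer width into a theoretical address-space size. Note that current x86-64 processors typically implement only 48-bit virtual addresses (256 TiB), which is still far beyond the 4 GiB available to a 32-bit process.

```python
import struct


def pointer_bits():
    """Width of the running interpreter in bits, from the size of a C pointer ('P')."""
    return struct.calcsize('P') * 8


def address_space_gib(bits):
    """Theoretical virtual address space for a given pointer width, in GiB (2**30 bytes)."""
    return 2 ** bits / float(2 ** 30)


if __name__ == '__main__':
    bits = pointer_bits()
    print('%d-bit interpreter, up to %.0f GiB of virtual address space'
          % (bits, address_space_gib(bits)))
```

For a 32-bit build this reports 4 GiB; for a 64-bit build, 2^34 GiB (16 EiB) in theory.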

Moreover, if fragmentation were the cause, in the worst case we would expect VIRT to fill up, gradually or sharply, while the real physical memory usage (RES) stays almost unchanged. In our case, both VIRT and RES increase, and over a long running period the process reaches its virtual memory capacity.
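The reasoning above amounts to a simple decision rule over VIRT/RES samples. A hypothetical sketch, assuming the samples are (virt_kb, res_kb) pairs collected periodically (for example from top) and that a change of up to tolerance_kb is treated as flat; the threshold is an arbitrary assumption, not a measured value.

```python
def classify_growth(samples, tolerance_kb=1024):
    """Classify how VIRT and RES changed between the first and last sample.

    samples: list of (virt_kb, res_kb) tuples in time order.
    tolerance_kb: hypothetical threshold; a change at or below it counts as flat.
    """
    (virt_start, res_start), (virt_end, res_end) = samples[0], samples[-1]
    virt_grew = virt_end - virt_start > tolerance_kb
    res_grew = res_end - res_start > tolerance_kb
    if virt_grew and not res_grew:
        return 'virt-only'  # the pattern expected from fragmentation
    if virt_grew and res_grew:
        return 'both'       # the pattern observed in our service
    if res_grew:
        return 'res-only'
    return 'flat'
```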

  2. Race conditions between multiple threads:

Race conditions could, in principle, increase memory usage dramatically. Here, however, the GIL (Global Interpreter Lock) helps: it ensures that only one thread executes Python bytecode at a time. The GIL alone does not make compound operations such as x += 1 atomic, so the stronger guarantee comes from our design: each thread in our service source code has its own tasks, and the threads neither overwrite nor reference the same variables or values at the same time. This avoids race conditions between the worker threads.
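If threads ever do need to share mutable state, a lock is the usual guard, because a read-modify-write such as value += 1 spans several bytecodes and a thread switch can occur in the middle. A minimal sketch; SafeCounter and run_workers are hypothetical names, not code from the service.

```python
import threading


class SafeCounter(object):
    """A shared counter guarded by a lock; the GIL alone does not make += atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # only one thread performs the read-modify-write at a time
            self.value += 1


def run_workers(n_threads=4, increments=10000):
    counter = SafeCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```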

  3. SQLAlchemy session:

As we know, a Session object is not thread-safe; it is meant to be used by one thread at a time (thread-local). In some processes, multiple threads may access MySQL through SQLAlchemy, possibly at the same time. To manage sessions across threads, it is recommended to use scoped_session, which does one simple task: it holds the underlying Session object and hands it to whoever asks for it (here, the worker thread), keeping a separate Session for each thread. By using a scoped_session object, threads never share the same Session concurrently, even when they access MySQL at the same time.
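A minimal sketch of the per-thread behavior of scoped_session. Assumptions: an in-memory SQLite URL stands in for the service's MySQL URL, and no query is issued, so no connection is opened; sessions_per_thread is a hypothetical helper.

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# Assumption: in-memory SQLite replaces the real MySQL URL for this sketch.
engine = create_engine('sqlite://')
Session = scoped_session(sessionmaker(bind=engine))


def sessions_per_thread(n_threads=3):
    """Return {thread_index: (session, same_object_on_second_call)}."""
    results = {}

    def worker(index):
        first = Session()   # the registry creates a Session for this thread
        second = Session()  # ...and returns the same one on later calls
        results[index] = (first, first is second)
        Session.remove()    # close and discard this thread's Session when done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread sees one stable Session, and different threads get different Session objects. Calling Session.remove() when a unit of work ends is the cleanup step SQLAlchemy's documentation recommends for scoped sessions.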

In addition, SQLAlchemy is a widely used, standard object-relational mapper (ORM) for Python, and we use it as our MySQL access layer. A memory leak caused by SQLAlchemy itself therefore seems unlikely.

  4. Code quality with respect to memory management in Python:

One of the challenges in writing large-scale Python programs is keeping memory usage as small as possible. Python manages memory internally, and every object is tracked by a reference counting system. CPython frees an object as soon as its reference count drops to zero; however, the freed memory usually goes back to Python's own allocator for reuse rather than straight back to the OS, so RES does not necessarily shrink.
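A small CPython-specific illustration of reference counting with sys.getrefcount(). The call reports one extra reference for its own argument, so only the differences between readings are meaningful; the function name is illustrative.

```python
import sys


def refcount_changes():
    """Show how binding and deleting a second name changes the reference count."""
    data = [1, 2, 3]
    before = sys.getrefcount(data)      # includes the temporary argument reference
    alias = data                        # a second name now refers to the same list
    with_alias = sys.getrefcount(data)
    del alias                           # removes the name, not the list
    after_del = sys.getrefcount(data)
    return before, with_alias, after_del
```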

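In Python programming, a recursive function can also cause a memory leak. An object's memory is freed only when its reference count drops to zero. A recursive function may create objects that refer to each other (e.g. foo.x → bar, bar.y → foo); the reference counts are then held up by the circular link and never reach zero on their own. CPython's cyclic garbage collector (the gc module) can free most such cycles, but in Python 2.7, which our service uses, cycles containing objects with a __del__ method are never freed and end up in gc.garbage instead.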
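A minimal sketch of the foo.x → bar, bar.y → foo cycle; Node and the helper names are hypothetical. With automatic collection disabled, reference counting alone leaves the cycle alive, and an explicit gc.collect() frees it. This targets CPython 3.4 and later, where cycles are collected even if their objects define __del__; under Python 2.7 such objects would stay in gc.garbage.

```python
import gc
import weakref


class Node(object):
    pass


def make_cycle():
    foo, bar = Node(), Node()
    foo.x = bar               # foo -> bar
    bar.y = foo               # bar -> foo: a reference cycle
    return weakref.ref(foo)   # a weak reference does not keep foo alive


def cycle_survives_refcounting():
    gc.disable()              # rely on reference counting only
    try:
        ref = make_cycle()
        alive_without_gc = ref() is not None
        gc.collect()          # an explicit collection still runs while disabled
        alive_after_gc = ref() is not None
    finally:
        gc.enable()
    return alive_without_gc, alive_after_gc
```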

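Besides, del removes a name, not an object. When we bind a variable to an object and then delete the variable (e.g. b = object1; del b), only the reference b is removed; the object itself stays in memory as long as any other reference (object1 here, or an entry in a long-lived list, dict, or cache) still points to it. Objects kept alive by such forgotten references add to the overall memory usage.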

III. Memory leak investigation schedule

  1. Overview:

Our service is multithreaded, and there are many interactions among its threads. It also uses SQLAlchemy to access MySQL and calls the OpenStack Nova service to retrieve node information. The process runs as a daemon and uses Python 2.7.3.

  2. Investigation tools:

IV. Result

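import logging
import time

LOG = logging.getLogger(__name__)


def run(self):
    LOG.info('%s starting', self)
    generator = self._generator(self._client, self._queue)
    try:
        for _ in generator:
            # self._interval is a hypothetical polling period in seconds.
            next_time = time.time() + self._interval
            # Do something here
            now = time.time()
            # self._stop is assumed to be a threading.Event; wait() sleeps for
            # the rest of the period and returns True once the event is set.
            if self._stop.wait(max(0, next_time - now)):
                break
    finally:
        LOG.info('%s stopping', self)
        # Closing the generator runs its cleanup and releases its frame.
        generator.close()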
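Closing the generator in run() matters for memory: a generator suspended at yield keeps its frame, and every object that frame references, alive until it is exhausted, closed, or garbage collected. A minimal sketch with a hypothetical producer generator shows that close() runs the generator's finally block on demand.

```python
def producer(log):
    """Hypothetical generator that holds a resource while it is suspended."""
    try:
        while True:
            yield 'item'
    finally:
        # Runs when the consumer calls close(), releasing what the frame holds.
        log.append('cleaned up')


def consume_and_stop(n):
    log = []
    gen = producer(log)
    for i, _ in enumerate(gen):
        if i + 1 == n:
            break
    before_close = list(log)  # generator is suspended; cleanup has not run yet
    gen.close()               # raises GeneratorExit inside, so the finally block runs
    return before_close, log
```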

9/1/2016

VietStack team