Pivot Algorithm For Generating Self-avoiding Chain, Using Python and Cython

Nov 13, 2014

Pivot algorithm is best Monte Carlo algorithm known so far used for generating canonical ensemble of self-avoiding random walks (ﬁxed number of steps). Originally it is for the random walk on a lattice, but it also can be modiﬁed for continuous random walk. Recently I encountered a problem where I need to generate self-avoiding chain conﬁgurations.

The simplest version of this algorithm breaks into following steps:

Prepare an initial conﬁguration of a $N$ steps walk on lattice (equivalent to a $N$ monomer chain)
Randomly pick a site along the chain as pivot site
Randomly pick a side (right to the pivot site or left to it), the chain on this side is used for the next step.
Randomly apply a rotate operation on the part of the chain we choose at the above step.
After the rotation, check the overlap between the rotated part of the chain and the rest part of the chain. Accept the new conﬁguration if there is no overlap and restart from 2th step. Reject the conﬁguration and repeat from 2th step if there are overlaps. For random walks on a 3D cubic lattice, there are only 9 distinct rotation operations.

Some references on Pivot algorithm

[Lal(1969)]^[1]: The original paper of pivot algorithm.
[Madras and Sokal(1988)]^[2]: The most cited paper on this ﬁeld. For the ﬁrst time, they showed pivot algorithm is extrememly efﬁcient contradicted to the intuition.
[Clisby(2010)]^[3]: The author developed a new more efﬁcient inplementation of pivot algorithm and calculate the critical exponent $\nu$ , which is also the Flory exponent for polymer chain in bad solvent.

Python Implementation Using Numpy and Scipy #

The implement of this algorithm in Python is very straightforward. You can download raw ﬁle.

import numpy as np
import timeit
from scipy.spatial.distance import cdist

# define a dot product function used for the rotate operation
def v_dot(a):return lambda b: np.dot(a,b)

class lattice_SAW:
    def __init__(self,N,l0):
        self.N = N
        self.l0 = l0
        # initial configuration. Usually we just use a straight chain as inital configuration
        self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0]
        self.state = self.init_state.copy()

        # define a rotation matrix
        # 9 possible rotations: 3 axes * 3 possible rotate angles(90,180,270)
        self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]
        ,[[1,0,0],[0,0,1],[0,-1,0]],[[0,0,1],[0,1,0],[-1,0,0]]
        ,[[-1,0,0],[0,1,0],[0,0,-1]],[[0,0,-1],[0,1,0],[-1,0,0]]
        ,[[0,-1,0],[1,0,0],[0,0,1]],[[-1,0,0],[0,-1,0],[0,0,1]]
        ,[[0,1,0],[-1,0,0],[0,0,1]]])

    # define pivot algorithm process where t is the number of successful steps
    def walk(self,t):
        acpt = 0
        # while loop until the number of successful step up to t
        while acpt <= t:
            pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
            pick_side = np.random.choice([-1,1]) # pick a side

            if pick_side == 1:
                old_chain = self.state[0:pick_pivot+1]
                temp_chain = self.state[pick_pivot+1:]
            else:
                old_chain = self.state[pick_pivot:]
                temp_chain = self.state[0:pick_pivot]

            # pick a symmetry operator
            symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
            # new chain after symmetry operator
            new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]

            # use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
            overlap = cdist(new_chain,old_chain)
            overlap = overlap.flatten()

            # determinte whether the new state is accepted or rejected
            if len(np.nonzero(overlap)[0]) != len(overlap):
                continue
            else:
                if pick_side == 1:
                    self.state = np.concatenate((old_chain,new_chain),axis=0)
                elif pick_side == -1:
                    self.state = np.concatenate((new_chain,old_chain),axis=0)
                    acpt += 1

        # place the center of mass of the chain on the origin
        self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))

N = 100 # number of monomers(number of steps)
l0 = 1 # bond length(step length)
t = 1000 # number of pivot steps

chain = lattice_SAW(N,l0)

%timeit chain.walk(t)
1 loops, best of 3: 2.61 s per loop

Above code performs a 100 monomer chain with 1000 successful pivot steps. However even with numpy and the built-in function cdist of scipy, the code is still too slow for large number of random walk steps.

Cython Implementation #

When come to the loops, Python can be very slow. In many complex situations, even numpy and scipy is not that helpful. For instance in this case, in order to determine the overlaps, we need to have a nested loop over two sets of sites (monomers). In the above code, I use built-in function cdist of scipy to do this, which is already highly optimized. But actually we don’t have to complete the loops, because we can stop the search if we encounter one overlap. However I can’t think of a natural numpy or scipy way to do this efﬁciently due to the conditional break. Here is where [Cython]^[4] can be extrememly useful. Cython can translate your python code to C and translate your C or C++ code to a Python module so you can directly import your C/C++ code in Python. To do that, ﬁrst we just handwrite our pivot algorithm using plain C++ code.

#include <math.h>
using namespace std;
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t){
... // pivot algorithm codes here
}

Name the ﬁle c_lattice_SAW.cpp. Here we deﬁne a function called c_lattice_SAW. Where chain is the array storing the coordinates of monomers, N is the number of monomers, l0 is the bond length, t is the number of successful steps.

C++11 library random is used here in order to use Mersenne twister RNG directly.
The C++ code in this case is not a complete program. It doesn’t have main function.

The whole C++ code can be found in this gist. Beside our plain C code, we also need a header ﬁle c_lattice_SAW.h.

void c_lattice_SAW(double* chain, int N, double l0, int ve, int t);

If you don’t want to handwrite a C code, another way to use Cython is to write plain Cython program whose syntax is very much Python-like. But in that way, how to get high quality random numbers efﬁciently is a problem. Usually there are several ways to get random numbers in Cython

Use Python module random.This method will be very slow if random number is generated in a big loop because generated C code must call a Python module everytime which is a lot of overhead time.
Use numpy to generate many random numbers in advance. This will require large amount of memory and also in many cases, the total number of random numbers needed is not known before the computation.
Use C function rand() from standard library stdlib in Cython. rand() is not a very good RNG. In a scientiﬁc computation like Monte Carlo simulation, this is not good way to get random numbers.
Wrap a good C-implemented RNG using Cython. This can be a good way. Currently I have found two ways to do this: 1. [Use numpy randomkit]^[5] 2. [Use GSL library]^[6]
Handwrite C or C++ code using random library or other external library and use Cython to wrap the code. This will give the optimal performance, but comes with more complicated and less readable code.

What I did in this post is the last method.

Now we need to make a .pyx ﬁle that will handle the C code in Cython and deﬁne a python function to use our C code. Give the .pyx a different name like lattice_SAW.pyx

import cython
import numpy as np
cimport numpy as np

cdef extern from "c_lattice_SAW.h":
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t)

@cython.boundscheck(False)
@cython.wraparound(False)

def lattice_SAW(int N, double l0, int ve, int t):
    cdef np.ndarray[double,ndim = 1,mode="c"] chain = np.zeros(N*3)
    c_lattice_SAW(&chain[0],N,l0,ve,t)
    return chain

Compile our C code to generate a shared library which can be imported into Python as a module. To do that, we use Python distutils package. Make a ﬁle named setup.py.

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

setup(
cmdclass = {'build_ext':build_ext},
            ext_modules = [Extension("lattice_SAW",
            sources = ["lattice_SAW.pyx","c_lattice_SAW.cpp"],
            extra_compile_args=['-std=c++11','-stdlib=libc++'],
            language="c++",
            include_dirs = [numpy.get_include()])],
)

Instead of normal arguments, we also have extra_compile_args here. This is because in the C++ code, we use library random which is new in C++11. On Mac, -std=c++11 and -stdlib=libc++ need to be added to tell the compilers to support C++11 and use libc++ as the standard library. On Linux system, just -std=c++11 is enough.

If cimport numpy is used, then the setting include_dirs = [numpy.get_include()])] need to be added

Then in terminal we do,

On Linux

python setup.py build_ext --inplace

or on Mac OS

clang++ python setup.py build_ext --inplace

clang++ tell the python use clang compiler not gcc because apparently the version of gcc shipped with OS X doesn’t support C++11.

If the compilation goes successfully, then a .so library ﬁle is generated. Now we can import our module in Python in that working directory

import lattice_SAW
import numpy

%timeit lattice_SAW.lattice_SAW(100,1,1,1000)
100 loops, best of 3: 5.97 ms per loop

That is 437 times faster than our Numpy/Scipy way!