Pivot Algorithm For Generating Self-avoiding Chain, Using Python and Cython

Pivot al­go­rithm is best Monte Carlo al­go­rithm known so far used for gen­er­at­ing canon­i­cal en­sem­ble of self-avoid­ing ran­dom walks (fixed num­ber of steps). Originally it is for the ran­dom walk on a lat­tice, but it also can be mod­i­fied for con­tin­u­ous ran­dom walk. Recently I en­coun­tered a prob­lem where I need to gen­er­ate self-avoid­ing chain con­fig­u­ra­tions.

The sim­plest ver­sion of this al­go­rithm breaks into fol­low­ing steps:

  • Prepare an ini­tial con­fig­u­ra­tion of a NN steps walk on lat­tice (equivalent to a NN monomer chain)
  • Randomly pick a site along the chain as pivot site
  • Randomly pick a side (right to the pivot site or left to it), the chain on this side is used for the next step.
  • Randomly ap­ply a ro­tate op­er­a­tion on the part of the chain we choose at the above step.
  • After the ro­ta­tion, check the over­lap be­tween the ro­tated part of the chain and the rest part of the chain. Accept the new con­fig­u­ra­tion if there is no over­lap and restart from 2th step. Reject the con­fig­u­ra­tion and re­peat from 2th step if there are over­laps. For ran­dom walks on a 3D cu­bic lat­tice, there are only 9 dis­tinct ro­ta­tion op­er­a­tions.

Some ref­er­ences on Pivot al­go­rithm

  • [Lal(1969)][1]: The orig­i­nal pa­per of pivot al­go­rithm.
  • [Madras and Sokal(1988)][2]: The most cited pa­per on this field. For the first time, they showed pivot al­go­rithm is ex­trememly ef­fi­cient con­tra­dicted to the in­tu­ition.
  • [Clisby(2010)][3]: The au­thor de­vel­oped a new more ef­fi­cient in­ple­men­ta­tion of pivot al­go­rithm and cal­cu­late the crit­i­cal ex­po­nent ν\nu, which is also the Flory ex­po­nent for poly­mer chain in bad sol­vent.

Python Implementation Using Numpy and Scipy #

The im­ple­ment of this al­go­rithm in Python is very straight­for­ward. You can down­load raw file.

import numpy as np
import timeit
from scipy.spatial.distance import cdist

# define a dot product function used for the rotate operation
def v_dot(a):return lambda b: np.dot(a,b)

class lattice_SAW:
    def __init__(self,N,l0):
        self.N = N
        self.l0 = l0
        # initial configuration. Usually we just use a straight chain as inital configuration
        self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0]
        self.state = self.init_state.copy()

        # define a rotation matrix
        # 9 possible rotations: 3 axes * 3 possible rotate angles(90,180,270)
        self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]

    # define pivot algorithm process where t is the number of successful steps
    def walk(self,t):
        acpt = 0
        # while loop until the number of successful step up to t
        while acpt <= t:
            pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
            pick_side = np.random.choice([-1,1]) # pick a side

            if pick_side == 1:
                old_chain = self.state[0:pick_pivot+1]
                temp_chain = self.state[pick_pivot+1:]
                old_chain = self.state[pick_pivot:]
                temp_chain = self.state[0:pick_pivot]

            # pick a symmetry operator
            symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
            # new chain after symmetry operator
            new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]

            # use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
            overlap = cdist(new_chain,old_chain)
            overlap = overlap.flatten()

            # determinte whether the new state is accepted or rejected
            if len(np.nonzero(overlap)[0]) != len(overlap):
                if pick_side == 1:
                    self.state = np.concatenate((old_chain,new_chain),axis=0)
                elif pick_side == -1:
                    self.state = np.concatenate((new_chain,old_chain),axis=0)
                    acpt += 1

        # place the center of mass of the chain on the origin
        self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))

N = 100 # number of monomers(number of steps)
l0 = 1 # bond length(step length)
t = 1000 # number of pivot steps

chain = lattice_SAW(N,l0)

%timeit chain.walk(t)
1 loops, best of 3: 2.61 s per loop

Above code per­forms a 100 monomer chain with 1000 suc­cess­ful pivot steps. However even with numpy and the built-in func­tion cdist of scipy, the code is still too slow for large num­ber of ran­dom walk steps.

Cython Implementation #

When come to the loops, Python can be very slow. In many com­plex sit­u­a­tions, even numpy and scipy is not that help­ful. For in­stance in this case, in or­der to de­ter­mine the over­laps, we need to have a nested loop over two sets of sites (monomers). In the above code, I use built-in func­tion cdist of scipy to do this, which is al­ready highly op­ti­mized. But ac­tu­ally we don’t have to com­plete the loops, be­cause we can stop the search if we en­counter one over­lap. However I can’t think of a nat­ural numpy or scipy way to do this ef­fi­ciently due to the con­di­tional break. Here is where [Cython][4] can be ex­trememly use­ful. Cython can trans­late your python code to C and trans­late your C or C++ code to a Python mod­ule so you can di­rectly import your C/C++ code in Python. To do that, first we just hand­write our pivot al­go­rithm us­ing plain C++ code.

#include <math.h>
using namespace std;
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t){
... // pivot algorithm codes here

Name the file c_lattice_SAW.cpp. Here we de­fine a func­tion called c_lattice_SAW. Where chain is the ar­ray stor­ing the co­or­di­nates of monomers, N is the num­ber of monomers, l0 is the bond length, t is the num­ber of suc­cess­ful steps.

  • C++11 li­brary ran­dom is used here in or­der to use Mersenne twister RNG di­rectly.
  • The C++ code in this case is not a com­plete pro­gram. It does­n’t have main func­tion.

The whole C++ code can be found in this gist. Beside our plain C code, we also need a header file c_lattice_SAW.h.

void c_lattice_SAW(double* chain, int N, double l0, int ve, int t);

If you don’t want to hand­write a C code, an­other way to use Cython is to write plain Cython pro­gram whose syn­tax is very much Python-like. But in that way, how to get high qual­ity ran­dom num­bers ef­fi­ciently is a prob­lem. Usually there are sev­eral ways to get ran­dom num­bers in Cython

  • Use Python mod­ule random.This method will be very slow if ran­dom num­ber is gen­er­ated in a big loop be­cause gen­er­ated C code must call a Python mod­ule every­time which is a lot of over­head time.

  • Use numpy to gen­er­ate many ran­dom num­bers in ad­vance. This will re­quire large amount of mem­ory and also in many cases, the to­tal num­ber of ran­dom num­bers needed is not known be­fore the com­pu­ta­tion.

  • Use C func­tion rand() from stan­dard li­brary stdlib in Cython. rand() is not a very good RNG. In a sci­en­tific com­pu­ta­tion like Monte Carlo sim­u­la­tion, this is not good way to get ran­dom num­bers.

  • Wrap a good C-implemented RNG us­ing Cython. This can be a good way. Currently I have found two ways to do this: 1. [Use numpy randomkit][5] 2. [Use GSL li­brary][6]

  • Handwrite C or C++ code us­ing random li­brary or other ex­ter­nal li­brary and use Cython to wrap the code. This will give the op­ti­mal per­for­mance, but comes with more com­pli­cated and less read­able code.

What I did in this post is the last method.

Now we need to make a .pyx file that will han­dle the C code in Cython and de­fine a python func­tion to use our C code. Give the .pyx a dif­fer­ent name like lattice_SAW.pyx

import cython
import numpy as np
cimport numpy as np

cdef extern from "c_lattice_SAW.h":
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t)


def lattice_SAW(int N, double l0, int ve, int t):
    cdef np.ndarray[double,ndim = 1,mode="c"] chain = np.zeros(N*3)
    return chain

Compile our C code to gen­er­ate a shared li­brary which can be im­ported into Python as a mod­ule. To do that, we use Python distutils pack­age. Make a file named setup.py.

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

cmdclass = {'build_ext':build_ext},
            ext_modules = [Extension("lattice_SAW",
            sources = ["lattice_SAW.pyx","c_lattice_SAW.cpp"],
            include_dirs = [numpy.get_include()])],

Instead of nor­mal ar­gu­ments, we also have extra_compile_args here. This is be­cause in the C++ code, we use li­brary random which is new in C++11. On Mac, -std=c++11 and -stdlib=libc++ need to be added to tell the com­pil­ers to sup­port C++11 and use libc++ as the stan­dard li­brary. On Linux sys­tem, just -std=c++11 is enough.

If cimport numpy is used, then the set­ting include_dirs = [numpy.get_include()])] need to be added

Then in ter­mi­nal we do,

On Linux

python setup.py build_ext --inplace

or on Mac OS

clang++ python setup.py build_ext --inplace

clang++ tell the python use clang com­piler not gcc be­cause ap­par­ently the ver­sion of gcc shipped with OS X does­n’t sup­port C++11.

If the com­pi­la­tion goes suc­cess­fully, then a .so li­brary file is gen­er­ated. Now we can im­port our mod­ule in Python in that work­ing di­rec­tory

import lattice_SAW
import numpy

%timeit lattice_SAW.lattice_SAW(100,1,1,1000)
100 loops, best of 3: 5.97 ms per loop

That is 437 times faster than our Numpy/Scipy way!

  1. http://​www.tand­fon­line.com/​doi/​abs/​10.1080/​00268976900100781#.VGLB­S­fld­V8E ↩︎

  2. http://​link.springer.com/​ar­ti­cle/​10.1007/​BF01022990 ↩︎

  3. http://​jour­nals.aps.org/​prl/​ab­stract/​10.1103/​Phys­RevLett.104.055702 ↩︎

  4. http://​cython.org/ ↩︎

  5. https://​groups.google.com/​fo­rum/#!​msg/​cython-users/​9UG­Mi_b3tVo/​7m­N­r87E3ZIAJ ↩︎

  6. http://​pyin­sci.blogspot.com/​2010/​12/​ef­fic­cient-mcmc-in-python.html ↩︎