When optimizing a web app, the key is finding the bottleneck (where most of the time is spent). In other types of apps, the target can be memory or CPU usage instead. This optimization should not be done too early: there is no need to optimize an unused or soon-to-be-refactored piece of code.

Werkzeug has a built-in middleware that can profile a request with Python's cProfile. It lets you follow the exact execution graph of a request.
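To make the idea concrete, here is a minimal sketch of what a profiling WSGI middleware does in principle, using only the standard library. It is not Werkzeug's implementation: `SimpleProfilerMiddleware` and `hello_app` are hypothetical names made up for this illustration.

```python
import cProfile
import io
import pstats


class SimpleProfilerMiddleware:
    """Hypothetical, simplified stand-in for a WSGI profiling middleware."""

    def __init__(self, wsgi_app, stream, limit=30):
        self.wsgi_app = wsgi_app
        self.stream = stream  # where the report is written
        self.limit = limit    # maximum number of rows in the report

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # Run the wrapped app under the profiler. The body is consumed here
        # so that lazily generated responses are profiled as well.
        body = profiler.runcall(
            lambda: list(self.wsgi_app(environ, start_response)))
        self.stream.write("PATH: %r\n" % environ.get("PATH_INFO", ""))
        stats = pstats.Stats(profiler, stream=self.stream)
        stats.sort_stats("time", "calls").print_stats(self.limit)
        return body


def hello_app(environ, start_response):
    """Tiny WSGI app used only for this demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


report = io.StringIO()
app = SimpleProfilerMiddleware(hello_app, stream=report)
body = app({"PATH_INFO": "/my/path"}, lambda status, headers: None)
print(report.getvalue())
```

Every request goes through `cProfile.Profile.runcall`, which is why a middleware of this kind slows the app down noticeably.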

To enable this middleware, just wrap your app with it:

# On Werkzeug 1.0+, import from werkzeug.middleware.profiler instead
from werkzeug.contrib.profiler import ProfilerMiddleware
from myapp import app  # This is your Flask app
app.wsgi_app = ProfilerMiddleware(app.wsgi_app)  # Wrap the WSGI callable
app.run(debug=True)    # Standard run call

Now each request will be profiled, so make sure to remove this middleware once you are done optimizing, because it drastically degrades your app's response time!
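One way to avoid leaving the profiler on by accident is to enable it only on demand. The sketch below assumes an environment variable named `PROFILE_REQUESTS`; that name, like the `maybe_profile` helper, is made up for this example and is not part of Werkzeug or Flask.

```python
import os


def maybe_profile(wsgi_app, middleware, environ=os.environ):
    """Wrap wsgi_app with middleware only if PROFILE_REQUESTS=1 is set.

    PROFILE_REQUESTS is a hypothetical variable name chosen for this sketch.
    """
    if environ.get("PROFILE_REQUESTS") == "1":
        return middleware(wsgi_app)
    return wsgi_app


def dummy_app(environ, start_response):
    """Placeholder WSGI app, used only to exercise the helper."""
    start_response("200 OK", [])
    return [b""]


def tagging_middleware(wsgi_app):
    """Placeholder middleware that just marks the app as wrapped."""
    return ("profiled", wsgi_app)


unwrapped = maybe_profile(dummy_app, tagging_middleware, environ={})
wrapped = maybe_profile(
    dummy_app, tagging_middleware, environ={"PROFILE_REQUESTS": "1"})
print(unwrapped is dummy_app, wrapped[0])
```

In a real app the call would look like `app.wsgi_app = maybe_profile(app.wsgi_app, ProfilerMiddleware)`, so that profiling stays off unless the variable is set.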

Without any further options, the cProfile output is printed to stdout:

PATH: '/my/path'
         87052 function calls (81433 primitive calls) in 0.203 seconds

   Ordered by: internal time, call count
   List reduced from 711 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4523    0.088    0.000    0.088    0.000 {method 'recv' of '_socket.socket' objects}
  2088/26    0.011    0.000    0.022    0.001 /opt/lib/python2.7/site-packages/schema.py:104(validate)
      161    0.011    0.000    0.105    0.001 /opt/lib/python2.7/socket.py:406(readline)
       38    0.003    0.000    0.003    0.000 /opt/lib/python2.7/json/decoder.py:371(raw_decode)
     4661    0.003    0.000    0.003    0.000 {method 'write' of 'cStringIO.StringO' objects}

     ...
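
To read such a report: `tottime` is the time spent in the function body itself, `cumtime` also includes everything it calls, and an `ncalls` value like `2088/26` means 2088 calls in total, of which 26 were primitive (non-recursive). The report above is sorted by `tottime`; the sketch below, which profiles a made-up `slow_sum` function, shows how the standard `pstats` module can sort by `cumtime` instead.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Toy workload, used only to have something to profile."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.runcall(slow_sum, 10000)

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# "cumulative" sorts by cumtime (the function plus everything it calls);
# "time" would sort by tottime (the function body only).
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Sorting by cumulative time tends to surface the high-level entry points, while sorting by internal time points at the hot leaf functions (here, the socket `recv` calls).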
 

It’s a good start, but this output is not easy to read, and the call graph is hard to follow. For C/C++ apps, callgrind (the reference call-graph profiler) has a much nicer third-party graphical viewer called kcachegrind (or qcachegrind for the Qt version). It can be installed on Mac via brew. Here is what a debug session looks like:

[Screenshot: qcachegrind]

The problem, which is not really one, is that the profiling data must be in callgrind format, not cProfile format. This is easily solved with a small Python script called pyprof2calltree (just pip install it in your virtualenv).

The Werkzeug middleware must be initialized with the profile_dir option so that each profiling session is stored as a file (beware: the specified directory must exist, or an error is raised). The cProfile output can then be converted using pyprof2calltree:

$> pyprof2calltree \
        -i GET.my.path.000230ms.1450347684.prof \
        -o callgrind.GET.my.path.000230ms.1450347684.prof
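
The two steps above (storing profiles in an existing directory, then converting one) can be sketched in Python. This is only an illustration: the helper names are made up, the file name layout is inferred from the single example above, and the `.prof` files created here are empty stand-ins for real profiles.

```python
import re
import tempfile
from pathlib import Path

# Assumed file name layout, inferred from the example above:
#   METHOD.path.parts.000230ms.1450347684.prof
PROF_NAME = re.compile(r"\.(\d+)ms\.\d+\.prof$")


def ensure_profile_dir(path):
    """Create the profile directory if it is missing and return it."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    return path


def elapsed_ms(filename):
    """Return the elapsed milliseconds encoded in a profile file name."""
    match = PROF_NAME.search(filename)
    if match is None:
        raise ValueError("unexpected profile file name: %r" % filename)
    return int(match.group(1))


def slowest(profile_dir, count=3):
    """Return the profile files of the slowest requests, slowest first."""
    files = [p for p in Path(profile_dir).glob("*.prof")
             if not p.name.startswith("callgrind.")]  # skip converted files
    return sorted(files, key=lambda p: elapsed_ms(p.name), reverse=True)[:count]


def conversion_command(prof_path):
    """Build the pyprof2calltree command line for one profile file."""
    prof_path = Path(prof_path)
    output = prof_path.with_name("callgrind." + prof_path.name)
    return ["pyprof2calltree", "-i", str(prof_path), "-o", str(output)]


# Hypothetical location; this directory is what profile_dir would point to.
profile_dir = ensure_profile_dir(Path(tempfile.mkdtemp()) / "profiles")
for name in ("GET.my.path.000230ms.1450347684.prof",
             "GET.other.000012ms.1450347690.prof"):
    (profile_dir / name).touch()  # empty stand-ins for real profiles

for prof in slowest(profile_dir, count=1):
    print(" ".join(conversion_command(prof)))
```

With real profile files and pyprof2calltree installed, each command list could be passed to `subprocess.run(cmd, check=True)`.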

The generated callgrind.* file can now be opened with kcachegrind. To get call-graph plotting to work, the dot binary (part of Graphviz) must be installed.

You may run into $PATH issues, as I did on Mac, when opening kcachegrind directly from the Finder. To solve this, just launch it from a terminal.
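
A quick way to check whether dot is visible from a given environment is to look it up on the current PATH. This is a small sketch using the standard library; `find_tool` is a hypothetical helper name.

```python
import os
import shutil


def find_tool(name, path=None):
    """Return the full path of an executable, or None if it is not on PATH."""
    return shutil.which(name, path=path)


dot_path = find_tool("dot")
if dot_path is None:
    print("dot not found; PATH is", os.environ.get("PATH", ""))
else:
    print("dot found at", dot_path)
```

If this finds dot when run from a terminal, an app launched from that same terminal should inherit a PATH that includes it.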

Check out the kcachegrind documentation and the callgrind documentation for more details on usage.