profiler

paddle.utils.profiler.profiler(state, sorted_key=None, profile_path='/tmp/profile', tracer_option='Default')

The profiler interface. Unlike fluid.profiler.cuda_profiler, this profiler can profile both CPU and GPU programs.

Parameters

state (str) – The profiling state; must be one of 'CPU', 'GPU' or 'All'. 'CPU' profiles only the CPU; 'GPU' profiles both CPU and GPU; 'All' profiles both CPU and GPU and also generates a timeline.
sorted_key (str, optional) – The order in which profiling results are listed; must be one of None, 'calls', 'total', 'max', 'min' or 'ave'. Default is None, which lists the results in the order of each event's first end time. The other keys sort in descending order: 'calls' by number of calls, 'total' by total execution time, 'max' by maximum execution time, 'min' by minimum execution time, and 'ave' by average execution time.
profile_path (str, optional) – If state == 'All', the generated timeline is written to profile_path. Default is '/tmp/profile'.
tracer_option (str, optional) – Controls the level of detail in the printed report; must be one of 'Default', 'OpDetail' or 'AllOpDetail'. 'Default' prints profiling results per op type; 'OpDetail' additionally breaks each op type down into stages such as compute and data transform; 'AllOpDetail' prints the same breakdown as 'OpDetail', but per op name instead of per op type. Default is 'Default'.

Raises

ValueError – If state is not one of 'CPU', 'GPU' or 'All', or if sorted_key is neither None nor one of 'calls', 'total', 'max', 'min' or 'ave'.
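
The documented argument rules can be restated in plain Python. The sketch below is hypothetical: check_profiler_args is not a Paddle function. It only mirrors what this page states, namely the valid state and sorted_key values, the ValueError conditions, and what each state profiles.

```python
# Hypothetical helper, NOT a Paddle API: restates the documented argument rules.
VALID_STATES = ("CPU", "GPU", "All")
VALID_SORTED_KEYS = (None, "calls", "total", "max", "min", "ave")


def check_profiler_args(state, sorted_key=None):
    """Raise ValueError for invalid arguments; otherwise describe what gets profiled."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    if sorted_key not in VALID_SORTED_KEYS:
        raise ValueError(f"sorted_key must be one of {VALID_SORTED_KEYS}, got {sorted_key!r}")
    return {
        "cpu": True,                     # every state profiles the CPU
        "gpu": state in ("GPU", "All"),  # 'GPU' and 'All' also profile the GPU
        "timeline": state == "All",      # only 'All' writes a timeline to profile_path
    }


print(check_profiler_args("All", "total"))
try:
    check_profiler_args("TPU")
except ValueError as err:
    print("ValueError:", err)
```

The real function may perform additional checks (for example on tracer_option); this sketch covers only the conditions listed under Raises.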

Examples

import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler

# The fluid graph APIs below need static-graph mode; Paddle 2.x starts in dynamic mode.
paddle.enable_static()

epochs = 8
dshape = [4, 3, 28, 28]
data = fluid.data(name='data', shape=[None, 3, 28, 28], dtype='float32')
conv = fluid.layers.conv2d(data, 20, 3, stride=[1, 1], padding=[1, 1])

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Profile the CPU only, sort the report by total time, use the default tracer level.
# The report is printed when the with block exits.
with profiler.profiler('CPU', 'total', '/tmp/profile', 'Default'):
    for i in range(epochs):
        x = np.random.random(dshape).astype('float32')
        exe.run(fluid.default_main_program(), feed={'data': x})
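
Because profiler is used in a with statement, profiling covers exactly the code inside the block. The sketch below shows a start/stop context manager built with contextlib.contextmanager, assuming the stop step sits in a finally clause so it also runs if the block raises. profiling_session and its log argument are hypothetical stand-ins for illustration, not Paddle's implementation.

```python
from contextlib import contextmanager


@contextmanager
def profiling_session(state, sorted_key=None, log=None):
    """Hypothetical start/stop profiler context manager (not Paddle code)."""
    log = [] if log is None else log
    log.append(("start", state))          # stand-in for enabling the profiler
    try:
        yield                             # the profiled code runs here
    finally:
        log.append(("stop", sorted_key))  # stand-in for printing the report


events = []
with profiling_session("CPU", "total", log=events):
    for step in range(3):
        events.append(("run", step))
print(events)
```

Putting the stop step in finally keeps the profiler from being left enabled when the profiled loop fails.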

Example results:

#### 1) sorted_key = 'total', 'calls', 'max', 'min', 'ave' ####
# The reports for these five sorted_key values differ only in the line
# "Sorted by ... in descending order in the same thread."
# This is because, in this example, the rows are already in non-increasing
# order in all five columns.
------------------------->     Profiling Report     <-------------------------

Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
#Sorted by number of calls in descending order in the same thread
#Sorted by number of max in descending order in the same thread
#Sorted by number of min in descending order in the same thread
#Sorted by number of avg in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d             8           129.406     0.304303    127.076     16.1758     0.983319
thread0::elementwise_add    8           2.11865     0.193486    0.525592    0.264832    0.016099
thread0::feed               8           0.076649    0.006834    0.024616    0.00958112  0.000582432

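The derived columns of this report can be checked by hand. In these numbers, Ave. equals Total / Calls, and Ratio. equals each event's Total divided by the sum of Total over all listed events. The sketch below recomputes both from the table above; this reading of the columns is inferred from the figures, not from a formula documented on this page.

```python
# Rows copied from the sorted_key='total' report:
# (event, calls, total_ms, reported_ave, reported_ratio)
rows = [
    ("thread0::conv2d", 8, 129.406, 16.1758, 0.983319),
    ("thread0::elementwise_add", 8, 2.11865, 0.264832, 0.016099),
    ("thread0::feed", 8, 0.076649, 0.00958112, 0.000582432),
]

grand_total = sum(total for _, _, total, _, _ in rows)
for name, calls, total, ave, ratio in rows:
    computed_ave = total / calls          # Ave. = Total / Calls
    computed_ratio = total / grand_total  # Ratio. = share of the summed Total
    print(f"{name:26s} ave={computed_ave:.6g} (reported {ave}) "
          f"ratio={computed_ratio:.6g} (reported {ratio})")
```

The same relationship holds for the second report below.
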
#### 2) sorted_key = None ####
# Results are printed in the order of each op's first end time,
# so the printed order is feed -> conv2d -> elementwise_add.
------------------------->     Profiling Report     <-------------------------

Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::feed               8           0.077419    0.006608    0.023349    0.00967738  0.00775934
thread0::conv2d             8           7.93456     0.291385    5.63342     0.99182     0.795243
thread0::elementwise_add    8           1.96555     0.191884    0.518004    0.245693    0.196998
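
The sorted_key rules can be illustrated with the numbers from both reports. In the sketch below, order_events is hypothetical, not a Paddle function. It sorts rows in descending order of the chosen column, and for sorted_key=None it keeps the rows in recorded order, standing in for first-end-time order. It confirms that every key gives the same order for the first report, while the second report's default order differs from a sort by total time.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the report columns (thread prefix dropped).
Event = namedtuple("Event", "name calls total min max ave")

SORT_KEYS = ("calls", "total", "max", "min", "ave")


def order_events(events, sorted_key=None):
    """Return event names in report order (hypothetical sketch, not Paddle's implementation)."""
    if sorted_key is None:
        # Default: keep the order in which the events were recorded (first end time).
        return [e.name for e in events]
    if sorted_key not in SORT_KEYS:
        raise ValueError(f"unknown sorted_key {sorted_key!r}")
    # Descending order; Python's sort is stable, so ties keep their recorded order.
    ranked = sorted(events, key=lambda e: getattr(e, sorted_key), reverse=True)
    return [e.name for e in ranked]


# First report, rows as printed: every column is already non-increasing.
report1 = [
    Event("conv2d", 8, 129.406, 0.304303, 127.076, 16.1758),
    Event("elementwise_add", 8, 2.11865, 0.193486, 0.525592, 0.264832),
    Event("feed", 8, 0.076649, 0.006834, 0.024616, 0.00958112),
]
# Second report, rows in first-end-time order.
report2 = [
    Event("feed", 8, 0.077419, 0.006608, 0.023349, 0.00967738),
    Event("conv2d", 8, 7.93456, 0.291385, 5.63342, 0.99182),
    Event("elementwise_add", 8, 1.96555, 0.191884, 0.518004, 0.245693),
]

for key in SORT_KEYS:
    print(key, order_events(report1, key))
print("None ", order_events(report2))
print("total", order_events(report2, "total"))
```

Sorting by calls relies on the stable sort here, because all three events have 8 calls.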