profiler
- paddle.utils.profiler.profiler(state, sorted_key=None, profile_path='/tmp/profile', tracer_option='Default')
-
The profiler interface. Unlike fluid.profiler.cuda_profiler, this profiler can profile both CPU and GPU programs.
- Parameters
-
state (str) – The profiling state, which must be one of ‘CPU’, ‘GPU’ or ‘All’. ‘CPU’ profiles the CPU only; ‘GPU’ profiles both CPU and GPU; ‘All’ profiles both CPU and GPU and also generates a timeline.
sorted_key (str, optional) – The order in which profiling results are printed, which must be one of None, ‘calls’, ‘total’, ‘max’, ‘min’ or ‘ave’. The default is None, which prints the results in order of each event's first end time. ‘calls’ sorts by the number of calls, ‘total’ by the total execution time, ‘max’ by the maximum execution time, ‘min’ by the minimum execution time, and ‘ave’ by the average execution time.
profile_path (str, optional) – The path the timeline is written to when state == ‘All’. The default is /tmp/profile.
tracer_option (str, optional) – The profiling level, which must be one of ‘Default’, ‘OpDetail’ or ‘AllOpDetail’, and which controls how detailed the printed results are. ‘Default’ prints profiling results per op type. ‘OpDetail’ prints detailed results per op type, broken down into stages such as compute and data transform. ‘AllOpDetail’ prints the same detailed breakdown as ‘OpDetail’, but per op name.
- Raises
-
ValueError – If state is not one of ‘CPU’, ‘GPU’ or ‘All’, or if sorted_key is not None and not one of ‘calls’, ‘total’, ‘max’, ‘min’ or ‘ave’.
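The sorted_key options and the ValueError condition above can be sketched in plain Python. This is a hypothetical illustration, not Paddle's implementation: the function name `summarize`, the constant `VALID_SORTED_KEYS`, and the sample durations are all assumptions made for the example. It aggregates per-event durations into the same statistics the report shows and orders them in descending order by the chosen key.

```python
# Hypothetical sketch; names and data are illustrative, not part of Paddle.
VALID_SORTED_KEYS = (None, "calls", "total", "max", "min", "ave")


def summarize(durations_by_event, sorted_key=None):
    """Aggregate per-event durations (ms) and order them by sorted_key."""
    if sorted_key not in VALID_SORTED_KEYS:
        raise ValueError(
            f"sorted_key must be one of {VALID_SORTED_KEYS}, got {sorted_key!r}"
        )
    rows = []
    for name, durations in durations_by_event.items():
        total = sum(durations)
        rows.append({
            "event": name,
            "calls": len(durations),
            "total": total,
            "max": max(durations),
            "min": min(durations),
            "ave": total / len(durations),
        })
    if sorted_key is None:
        # Keep input order; this stands in for "first end time" ordering.
        return rows
    # Descending order, as stated in the report header.
    return sorted(rows, key=lambda r: r[sorted_key], reverse=True)


# Made-up durations, for illustration only.
sample = {
    "feed": [0.01, 0.02],
    "conv2d": [5.0, 1.0, 1.5],
    "elementwise_add": [0.3, 0.2, 0.25, 0.25],
}
by_total = [r["event"] for r in summarize(sample, "total")]
by_calls = [r["event"] for r in summarize(sample, "calls")]
print(by_total)
print(by_calls)
```

With this sample data the two keys give different orders, because the event with the largest total time is not the one with the most calls.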
Examples
    import paddle.fluid as fluid
    import paddle.fluid.profiler as profiler
    import numpy as np

    epoc = 8
    dshape = [4, 3, 28, 28]
    data = fluid.data(name='data', shape=[None, 3, 28, 28], dtype='float32')
    conv = fluid.layers.conv2d(data, 20, 3, stride=[1, 1], padding=[1, 1])

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    with profiler.profiler('CPU', 'total', '/tmp/profile', 'Default') as prof:
        for i in range(epoc):
            input = np.random.random(dshape).astype('float32')
            exe.run(fluid.default_main_program(), feed={'data': input})
Example results:
    #### 1) sorted_key = 'total', 'calls', 'max', 'min', 'ave' ####
    # The only difference in 5 sorted_key results is the following sentence:
    # "Sorted by number of xxx in descending order in the same thread."
    # The reason is that in this example, above 5 columns are already sorted.
    ------------------------->     Profiling Report     <-------------------------

    Place: CPU
    Time unit: ms
    Sorted by total time in descending order in the same thread
    #Sorted by number of calls in descending order in the same thread
    #Sorted by number of max in descending order in the same thread
    #Sorted by number of min in descending order in the same thread
    #Sorted by number of avg in descending order in the same thread

    Event                       Calls    Total       Min.        Max.        Ave.          Ratio.
    thread0::conv2d             8        129.406     0.304303    127.076     16.1758       0.983319
    thread0::elementwise_add    8        2.11865     0.193486    0.525592    0.264832      0.016099
    thread0::feed               8        0.076649    0.006834    0.024616    0.00958112    0.000582432

    #### 2) sorted_key = None ####
    # Since the profiling results are printed in the order of first end time of Ops,
    # the printed order is feed->conv2d->elementwise_add
    ------------------------->     Profiling Report     <-------------------------

    Place: CPU
    Time unit: ms
    Sorted by event first end time in descending order in the same thread

    Event                       Calls    Total       Min.        Max.        Ave.          Ratio.
    thread0::feed               8        0.077419    0.006608    0.023349    0.00967738    0.00775934
    thread0::conv2d             8        7.93456     0.291385    5.63342     0.99182       0.795243
    thread0::elementwise_add    8        1.96555     0.191884    0.518004    0.245693      0.196998
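As a reading aid for the report columns, the sample numbers are consistent with Ave. being Total divided by Calls and Ratio. being an event's Total divided by the sum of all Totals in the report. This relationship is inferred from the figures in the sorted_key = 'total' report, not taken from Paddle's source, so treat the sketch below as an assumption checked against that one example.

```python
# Rows copied from the sorted_key='total' sample report: (event, calls, total_ms).
rows = [
    ("thread0::conv2d", 8, 129.406),
    ("thread0::elementwise_add", 8, 2.11865),
    ("thread0::feed", 8, 0.076649),
]

# Sum of the Total column across all events in the report.
grand_total = sum(total for _, _, total in rows)

derived = {}
for event, calls, total in rows:
    ave = total / calls          # assumed definition of the Ave. column
    ratio = total / grand_total  # assumed definition of the Ratio. column
    derived[event] = (ave, ratio)
    print(f"{event:<28}{ave:<14.6g}{ratio:<14.6g}")
```

Small differences in the last digit against the report are expected, since the printed Total values are already rounded.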