There has always been debate about heap allocation versus stack allocation performance in C/C++ code (and in other programming languages). We all know that stack allocation means the assembly just needs to adjust the stack pointer and that's it. However, in the case of the heap there is a lot more going on: the memory allocator needs to ask the kernel for free pages, partition those pages properly, deal with fragmentation over time, and so on. In a word, a lot of work. But there are very few benchmarks, if any, available on the web that really compare the two techniques and show how much they actually affect execution speed.
Now it is time for a real benchmark of stack allocation versus heap allocation. The test scenario is the following:
- In a loop, allocate memory in 10 chunks, repeated for several different total sizes.
- Fill the chunks with some data.
- Read back the data and produce some output (to prevent the compiler from optimizing away work whose results are never used).
- Free memory.
The following tests are performed:
- Using gcc -O1 optimizations, heap vs stack without data init.
- Using gcc -O1 optimizations, heap vs stack with data init.
- Using gcc -O2 optimizations, heap vs stack without data init.
- Using gcc -O2 optimizations, heap vs stack with data init.
In the test cases without memory initialization, the chunk pointer addresses are summed and printed on stderr, so that the compiler cannot optimize the allocations away as unused.
The data initialization is done with memcpy. Typical memory block usage is usually not byte-by-byte access, but something bulkier: memory copies, integer or long stores, i.e. word-size access to memory. Thus the test case with initialization copies a pre-initialized memory block of random data into the target buffer. After the copy, every 10th byte is read.
The testing package uses the Enduro/X high-performance middleware standard library for result plotting and time measurements. The actual code used for testing can be found here. The code is pure C. Stack-based allocation with different sizes is done using the C99 variable-length array feature.
The test cases were executed on an Intel(R) Core(TM) i7-6600U CPU, Linux 64-bit, kernel 4.15.0-50-generic, with the Spectre and Meltdown patches disabled. For your hardware setup, you might want to adjust the LOOPS variable in build.sh before running, due to different machine speeds: a machine that is too fast will give inconsistent measurements, while a machine that is too slow might take an "infinite" time to produce results.
Heap vs Stack, -O1, no data init
Heap vs Stack, -O1, with data init
Heap vs Stack, -O2, no data init
Heap vs Stack, -O2, with data init
In conclusion, we see that without data initialization, pure stack allocation is much faster. The difference is about 4 seconds over 10M loops of 10 allocation attempts each. Looking at a single malloc+free pair, that works out to about 0.4 seconds per 10 million iterations. But keep in mind that 10 million is a very, very large number of operations to process.
Continuing, we see that with -O2 and data initialization, the gap between stack and heap allocation is much smaller, due to the fact that the CPU is far busier with data processing than with allocation procedures. Even so, on 1 million loops with 10 allocation+init+free attempts each, the difference is about 8% at a 60KB total processing size, which is also not a big show stopper.
Thus, to conclude: there is a difference between stack and heap allocation strategies, but in real processing life the effect on execution speed is low. And if the choice is between stack allocation followed by a copy of the data into another block (forced by the stack allocation's lifetime boundaries), it is much more effective to use heap allocation and pass the pointer to other functions or program blocks instead of copying.
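That final recommendation can be illustrated with a small sketch contrasting the two styles (function names are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Heap variant: one allocation, ownership of the pointer is simply
 * handed to the caller, no copy needed. */
static char *produce_heap(size_t len)
{
    char *buf = malloc(len);
    if (buf != NULL)
    {
        memset(buf, 'x', len);    /* produce the data in place */
    }
    return buf;                   /* caller frees when done */
}

/* Stack variant: the VLA dies when the function returns, so the
 * data must be copied out across the lifetime boundary. */
static void produce_stack(char *out, size_t len)
{
    char tmp[len];                /* C99 VLA, gone on return */
    memset(tmp, 'x', len);
    memcpy(out, tmp, len);        /* extra copy forced by the lifetime */
}
```

The stack variant pays for the allocation speed advantage with an extra memcpy, which is exactly the trade-off the benchmark results suggest is not worth it.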