Stack allocation vs heap allocation – performance benchmark

Intro

It has always been debate about heap allocation or stack allocation performance in C/C++ code (or in other programming language). We all know that stack allocation means that assembly just needs to increment stack pointer and that’s it. How ever in case of heap, there is lot more going on. The memory allocator needs to ask kernel for some free page, needs to partition the page properly, with time fragmentation may occur, etc. Thus with one word to say, lot of work. But there are very few if any benchmarks available on the web for real comparison about the results with one or another technique and how really it affects that execution speed.

Now time is come for some real benchmark for stack allocation versus heap allocation. The test scenario is following:

  • In the loop allocate memory blocks with 10 different chunks with different total sizes.
  • Fill the chunks with some data.
  • Measure the data and provide some output (to avoid compiler optimizations for doing nothing).
  • Free memory.

Following tests are performed:

  • Using gcc -O1 optimizations, heap vs stack with out data init.
  • Using gcc -O1 optimizations, heap vs stack with data init.
  • Using gcc -O2 optimizations, heap vs stack with out data init.
  • Using gcc -O2 optimizations, heap vs stack with data init.

In test cases with out memory initialization, the memory pointer addresses are summed and printed on stderr, so that compiler would optimize the allocations because of no further usage.

The data initialization is done by doing memcpy. The typical memory block usage usually is not byte access, but some more bulky, like mem copy, integer, long, etc i.e. word size access to memory. Thus the test case with initialization uses pre-initialized memory block with random data which is copied to target buffer. After the copy every 10th byte is read.

The testing package uses Enduro/X high performance middleware standard library for results plotting and time measurements. The actual code used for testing can be found here. The code is pure C. The stack based allocation with different sizes is done by using C99 variable length array feature.

Test results

Particular test cases was executed on Intel(R) Core(TM) i7-6600U CPU, Linux 64 bit, 4.15.0-50-generic, Spectre and Meltdown patches disabled. For your hardware setup, you might want to adjust LOOPS variable in build.sh before running, due to different machine speed. As too fast machine will give inconsistent measurements. To slow machine might give “infinite” wait time for results.

Heap vs Stack, -O1, no data init

noinit1

Heap VS Stack allocation benchmark, milliseconds per 10 million allocations of mixed 10 block sizes

Heap vs Stack, -O1, with data init

filled1

1M memory allocations with 10 differently sized chunks 

Heap vs Stack, -O2, no data init

noinit2

10M allocations with 10 mixed memory blocks 

Heap vs Stack, -O2, with data init

filled2

1M allocations with 10 mixed blocks

Conclusions

In conclusions we see that pure with out data init, stack allocation is much faster. Difference is about 4 seconds on 10M loops over 10 allocation attempts. So if checking on single pair of malloc+free, this will be about 0.4 seconds. But needs to take into account that 10 millions, is very very big number to process.

To continue we see that with -O2 filled data initialization shows much smaller gap between stack and heap allocation, due to fact that CPU is much more busy with data processing than allocation procedures. But any way on 1 million loops with 10 allocation+init+free attempts, we see that difference is about 8% at 60KB total processing size. Which also also not a big show stopper.

Thus to conclude, there is difference between stack and heap allocation strategies, but in real processing life, the affect on processing speed is low. In case if there is choice in doing stack allocation and afterwards doing copy of data to other data block due to stack allocation life cycle boundaries, much more effective would be to use heap allocation and pass the pointer to other function program blocks instead of copy.

2 thoughts on “Stack allocation vs heap allocation – performance benchmark

  1. Do u think accessing(not allocating) would be the same speed in both the case ? I mean chances of stack variable present in L1/L2 cache is absolutely nil, but if the heap variable which was already allocated might be or might not be present in those caches. Lets take a case of nothing is present in those cache line, both are pointing to a memory location( stack at some address and heap at some address) which would take same amount of CPU cycles right ?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s