Memory leak detect in uncertain cases – advanced techniques by JNI Example

Just wanted to share my experience with endurox-java development with is binding for Enduro/X Real Time Microservices middleware. During the long running test cases, it is been found that processes basically are leaking the memory. But Valgrind does not report that memory is leaked. Thus next step if you are sure that there is leak, you may use “–show-reachable=yes” parameter, to track down similar places/sizes where possibly leaked (decided by you) memory could be allocated.

In this case author found that it is somewhere in libjvm.so. This may be a call from JNI back to Java VM, as “jmap -histo 3928” did not show any object leak. So what have been seen is that some object of 312 bytes somewhere periodically is allocated (the number of allocs are similar to IPC call count):

==3928== 312 bytes in 1 blocks are possibly lost in loss record 15,259 of 15,611
==3928== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3928== by 0x58F4FEC: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x52F9538: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x52F95F4: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x56E5FA9: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x56A2CE6: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x56A5685: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x566435F: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x599D38E: ??? (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so)
==3928== by 0x8979584: ???
==3928== by 0x783D389C7: ???
==3928== by 0x8A21FA7: ???

Thus to trace further, as the Enduro/X Libaries are using debug logging at certain steps, it would be helpful print some debug message when memory is allocated, thus by context is process logs, it could be possible to detect region in code where the leaking occurs. Thanks to GNU C Library, it is possible to install malloc/realloc/free hooks to do so. Thus author does following – updates java loader – Enduro/X Java Linker (a compiler for building java apps as stand alone binaries which basically boots a C code which loads a Java class from byte array and starts to execute it) to init a a hook for a malloc:

/* save the original pointer... */
__thread void * old_malloc_hook;

static void *
my_malloc_hook (size_t size, const void *caller)
{
void *result;
nstd_tls_t *tls;
/* Restore all old hooks */
__malloc_hook = old_malloc_hook;

/* Call recursively */
result = malloc (size);

/* Save underlying hooks */
old_malloc_hook = __malloc_hook;
/* NOTE!!!! THIS MUST NOT use gettimeofday() as it will deadlock! */
NDRX_LOG(log_error, "malloc (%d) returns %p", (int)size, result);

/* Restore our own hooks */
__malloc_hook = my_malloc_hook;

return result;
}

static void
my_init (void)
{
old_malloc_hook = __malloc_hook;
__malloc_hook = my_malloc_hook;
}

And thus now in logs got something more pointing out… in logs:

N:NDRX:5:d5d3db3a: 2961:7f14cece9840:001:00000000:000000000:er_translate:Buffer.c:0332:Allocating [org/endurox/TypedUbf] class
N:NDRX:5:d5d3db3a: 2961:7f14cece9840:001:00000000:000000000:er_translate:Buffer.c:0352:About to NewObject(org/endurox/TypedUbf)
N:NDRX:2:d5d3db3a: 2961:7f14cece9840:001:00000000:000000000:_malloc_hook:ds/exj.c:0523:malloc (312) returns 0x100cb40
N:NDRX:5:d5d3db3a: 2961:7f14cece9840:001:00000000:000000000:er_translate:Buffer.c:0363:NewObject() done

WHOLLA, seems like we do some malloc before creating a new object of 312 bytes… So what is it? if we check further the logs, we see that 312 bytes are allocated at intervals where java env object calls are made from C. Thus something strange happens… And the thing is.. The Enduro/X Java service call dispatcher is fully started from C side, and code like this one (fixed already):

...
NDRX_LOG(log_debug, "Allocating [%s] class", clazz);
    
bclz = (*env)->FindClass(env, clazz);
NDRX_LOG(log_debug, "About to NewObject(%s)", clazz);
    
ret = (*env)->NewObject(env, bclz, mid, ctx_obj, (jboolean)finalize, 
            (jlong)data, len);

NDRX_LOG(log_debug, "NewObject() done");

/* line added after leak found...: */
(*env)->DeleteLocalRef( env, bclz);

...

 

Still generates local references for “class definitions” to C and thus they must be made free, by:

/* line added after leak found...: */
(*env)->DeleteLocalRef( env, bclz);

Thus to conclude, the valgrind still accessible printing feature might hint you the size of leaked objects. The Glib C __malloc_hook feature, gives further options to understand in log sequence where problem occurs, the with some brainstorming it is possible to more closely understand where the leaks occur.

Leave a comment