The previous fix also had a conceptual error: it allowed the per-thread map to
be modified concurrently, because map::operator[] (unlike map::find) inserts a
new element with that key if it is not found.
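
As an illustration of the insertion behavior, here is a minimal sketch that
assumes the map behaves like std::map. The function names are hypothetical and
are not taken from the patched code.

```cpp
#include <cstddef>
#include <map>

// Hypothetical helper: size of an empty std::map after a subscript miss.
std::size_t size_after_subscript_miss() {
    std::map<int, int> m;
    m[42];             // key absent: operator[] inserts a value-initialized element
    return m.size();   // the lookup itself modified the container
}

// Hypothetical helper: size of an empty std::map after a find() miss.
std::size_t size_after_find_miss() {
    std::map<int, int> m;
    (void)m.find(42);  // key absent: returns m.end(), container is unchanged
    return m.size();
}
```

A lookup through operator[] is therefore a write, and needs the same
protection as any other modification of a shared map.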
Also, this new fix uses a recursive mutex (also available in C++11) because the
same thread may acquire the lock again while already holding it, for example
during destruction, when a TLS entry queries the current thread for logging as
part of its destructor.
While cleaning up TLS resources, some destructors might still query for the
thread which is currently being shut down; logging is a very common case.
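
The re-entrant case can be sketched as follows. This is only an illustration
under assumed names (g_tls_mutex, current_thread_tag, LoggingEntry and
shutdown_thread_entries are all hypothetical), not the code from this patch.

```cpp
#include <mutex>
#include <string>
#include <vector>

std::recursive_mutex g_tls_mutex;   // hypothetical lock guarding the TLS data
std::vector<std::string> g_log;     // stand-in for a logger

// Hypothetical "query the current thread" call; it takes the lock.
int current_thread_tag() {
    std::lock_guard<std::recursive_mutex> guard(g_tls_mutex);
    return 7;
}

// A TLS entry that logs from its destructor.
struct LoggingEntry {
    ~LoggingEntry() {
        g_log.push_back("destroyed on thread " +
                        std::to_string(current_thread_tag()));
    }
};

// Cleanup runs with the lock held. The entry is destroyed before the guard,
// so its destructor locks the mutex a second time on the same thread.
void shutdown_thread_entries() {
    std::lock_guard<std::recursive_mutex> guard(g_tls_mutex);
    LoggingEntry entry;
}
```

With std::mutex in place of std::recursive_mutex, the second lock on the same
thread would be undefined behavior and typically deadlocks.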
This patch causes the TLS implementation to use the lock only when the data
has not been created yet, improving performance and fixing that deadlock.
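
A rough sketch of the "lock only on first creation" idea follows. It assumes
a thread_local pointer as the lock-free fast path, which may differ from how
the patched implementation stores its data. All names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <thread>

struct ThreadData { int value = 0; };   // hypothetical per-thread payload

std::recursive_mutex g_registry_mutex;
// Shared registry that owns the data; guarded by g_registry_mutex.
std::map<std::thread::id, std::unique_ptr<ThreadData>> g_registry;
int g_lock_acquisitions = 0;            // counts slow-path entries; guarded too

// Assumption: only the owning thread ever reads or writes this pointer.
thread_local ThreadData* t_data = nullptr;

ThreadData& current_thread_data() {
    if (t_data != nullptr) {
        return *t_data;                 // fast path: data exists, no lock taken
    }
    // Slow path: first access on this thread, so the shared map is modified.
    std::lock_guard<std::recursive_mutex> guard(g_registry_mutex);
    ++g_lock_acquisitions;
    std::unique_ptr<ThreadData> created(new ThreadData());
    t_data = created.get();
    g_registry[std::this_thread::get_id()] = std::move(created);
    return *t_data;
}
```

In this sketch, repeated calls on the same thread return the same object and
take the lock only once, on the first call.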