While attempting to reproduce on 2.0.8, I produced a similar but distinct issue. When a write fails on a connection, Connection#writeHandler calls defunct on the Connection. In Connection#defunct, if the host for that connection is considered down, PooledConnection#notifyOwnerWhenDefunct is called. This triggers the HostConnectionPool to close its transport, which closes all of its connections. Closing each connection requires acquiring the write lock on that connection's channel.
The issue arises when multiple writes fail concurrently on separate connections to the same host. In that situation, each netty worker holds the writeLock for its own channel with the failed write, so when it goes to close the other channels it cannot acquire the writeLocks still held by the other worker(s).
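The two-worker lock cycle described above can be sketched with plain java.util.concurrent locks (all class and method names here are hypothetical stand-ins, not actual driver code). Each worker holds its own channel's writeLock while trying to close every other channel; tryLock with a timeout is used in place of a blocking lock() so the demo terminates instead of actually deadlocking:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class WriteLockDeadlockDemo {
    // Stand-ins for the per-channel writeLock of two connections to the same host.
    static final ReentrantLock[] writeLocks = { new ReentrantLock(), new ReentrantLock() };
    static final CyclicBarrier barrier = new CyclicBarrier(2);

    // Simulates closing the pool's other channels, which needs each channel's writeLock.
    static boolean closePeerChannels(int self) throws InterruptedException {
        for (int i = 0; i < writeLocks.length; i++) {
            if (i == self) continue;
            // A blocking lock() here would hang forever; tryLock lets the demo finish.
            if (!writeLocks[i].tryLock(500, TimeUnit.MILLISECONDS)) return false;
            writeLocks[i].unlock();
        }
        return true;
    }

    static int runScenario() throws Exception {
        AtomicInteger stuck = new AtomicInteger();
        Thread[] workers = new Thread[2];
        for (int w = 0; w < 2; w++) {
            final int self = w;
            workers[w] = new Thread(() -> {
                writeLocks[self].lock();       // failed write: worker holds its own channel's lock
                try {
                    barrier.await();           // both workers now hold their writeLocks
                    if (!closePeerChannels(self)) stuck.incrementAndGet();
                    barrier.await();           // neither releases until both attempts have timed out
                } catch (Exception e) {
                    throw new RuntimeException(e);
                } finally {
                    writeLocks[self].unlock();
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) t.join();
        return stuck.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("workers unable to close peer channels: " + runScenario());
        // → workers unable to close peer channels: 2
    }
}
```

With real blocking lock acquisition, both workers would wait on each other indefinitely, which matches the hang observed in the thread dumps.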
I verified via a heap dump that the two channel.writeLock objects belonged to different channels to the same destination host.
I was able to produce this by disabling a network interface belonging to a CCM node while running a stress test.
3-node CCM cluster running Cassandra 2.0.8
I can reproduce this rather quickly (typically within 0-5 minutes) on a 3-node cluster with a targeted test scenario that simultaneously injects connection resets on multiple connections between the client and a particular host, on both the 2.0.8 and 2.1.3 driver versions. I could not reproduce with the fix on the 2.0 and 2.1 branches; marking as resolved.