Each time the driver writes into the socket, it allocates a new `Buffer`, which means a `malloc(3)` call by Node.js for buffers larger than 8KB (so-called non-pooled buffers). See:
The allocation could be avoided by reusing the buffer whenever the messages fit into it. This way coalesced message chunks would be copied into the reused buffer. As a further improvement, encoders could write directly into the reused buffer instead of allocating a buffer per message (although this is tricky to implement, as such a change would impact multiple modules and might be fragile in terms of maintenance).
This optimization may give around a 10-15% throughput improvement in write-heavy scenarios, but that obviously has to be measured, as something on the server side or in the client may be the bottleneck here.
Another nice-to-have optimization here might be replacing the shift & push approach for the internal queue and batch array with a single slice operation per batch. The following microbenchmark compares the performance of each approach: https://github.com/puzpuzpuz/microbenchmarks/blob/master/src/slice-vs-push.js
On Node.js v14.8.0 (Ubuntu 18.04) it gives the following output:
```
$ node src/slice-vs-push.js
slice x 819,592 ops/sec ±1.42% (87 runs sampled)
shift and push x 166,645 ops/sec ±0.17% (96 runs sampled)
```
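For illustration, here is a minimal sketch of the two draining strategies being compared (function names and the batch size are made up for the example):

```javascript
const BATCH_SIZE = 4;

// shift & push: each shift() moves all remaining queue elements down by one,
// so draining a batch costs O(batch * queueLength) element moves.
function drainShiftPush(queue) {
  const batch = [];
  while (batch.length < BATCH_SIZE && queue.length > 0) {
    batch.push(queue.shift());
  }
  return batch; // queue is mutated in place
}

// slice: one copy for the batch, one for the remainder of the queue.
function drainSlice(queue) {
  const batch = queue.slice(0, BATCH_SIZE);
  const rest = queue.slice(BATCH_SIZE);
  return [batch, rest];
}
```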
`Buffer.concat()` may also use the internal `Buffer` pool, just like `Buffer.allocUnsafe()` does: https://nodejs.org/api/buffer.html#buffer_static_method_buffer_concat_list_totallength

Also, the driver uses `Buffer.allocUnsafe()` extensively, which also returns a slice from the pool.
> `Buffer.concat()` uses the internal buffer pool

Yes, that's correct. However, that "pool" only amortizes buffer allocation by allocating the pool buffer in batches. By default, the pool is used only if the newly allocated buffer is <=4KB, which may not be the case for the coalesced buffer allocated by the driver (which is constrained by 8KB).
In any case, the current implementation assumes a certain allocation rate that could be reduced significantly. Whether that improves the end throughput of the driver in write-heavy scenarios or not is an open question. I just wanted to describe the potential optimization.
> buffer is <=4KB
Good info, definitely worth investigating.