Reuse coalescing buffer during writes

Description

Each time the driver writes to the socket, it allocates a new `Buffer`, which means a `malloc(3)` call by Node.js for buffers larger than 4KB (so-called non-pooled buffers). See:
https://github.com/datastax/nodejs-driver/blob/master/lib/writers.js#L271

The allocation could be avoided by reusing a single buffer whenever the messages fit into it. Coalesced message chunks would then be copied into the reused buffer. As a further improvement, encoders could write directly into the reused buffer instead of allocating a buffer per message (although this is tricky to implement, as such a change would impact multiple modules and might be fragile in terms of maintenance).
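A minimal sketch of the reuse idea, not the driver's actual code. `CoalescingBuffer`, `coalesce` and the default capacity are hypothetical names and values chosen for illustration:

```javascript
// Hypothetical helper (not part of the driver): copies queued chunks into one
// preallocated buffer when they fit, and falls back to Buffer.concat() otherwise.
class CoalescingBuffer {
  constructor(capacity = 8192) {
    // Allocated once and reused for every batch that fits.
    this.buffer = Buffer.allocUnsafe(capacity);
  }

  coalesce(chunks) {
    let total = 0;
    for (const chunk of chunks) {
      total += chunk.length;
    }
    if (total > this.buffer.length) {
      // Oversized batch: allocate a fresh buffer, as happens today.
      return Buffer.concat(chunks, total);
    }
    let offset = 0;
    for (const chunk of chunks) {
      // Buffer#copy returns the number of bytes copied.
      offset += chunk.copy(this.buffer, offset);
    }
    // A view over the shared memory; no new allocation of the payload.
    return this.buffer.subarray(0, total);
  }
}
```

One caveat with this sketch: the returned view aliases the shared buffer, so it must not be overwritten until the socket has finished with the previous write (for example, after the `write()` callback fires).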

This optimization may give around a 10-15% throughput improvement in write-heavy scenarios, but that has to be measured, since something on the server side or in the client may be the bottleneck here.

Another nice-to-have optimization here might be replacing the shift-and-push approach for the internal queue and batch array with a single slice operation per batch. The following microbenchmark compares the performance of each approach: https://github.com/puzpuzpuz/microbenchmarks/blob/master/src/slice-vs-push.js

On Node.js v14.8.0 (Ubuntu 18.04) it gives the following output:
```
$ node src/slice-vs-push.js
slice x 819,592 ops/sec ±1.42% (87 runs sampled)
shift and push x 166,645 ops/sec ±0.17% (96 runs sampled)
```
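For readers who have not opened the benchmark, the two approaches look roughly like this. This is an illustrative sketch only; the function names, the `start` cursor and the return shape are assumptions and are not taken from the driver or the benchmark source:

```javascript
// Shift-and-push: moves items one at a time and mutates the queue on each step.
function takeBatchShiftPush(queue, batchSize) {
  const batch = [];
  while (batch.length < batchSize && queue.length > 0) {
    batch.push(queue.shift());
  }
  return batch;
}

// Slice: copies the whole batch in one call and advances a read cursor
// instead of mutating the underlying array.
function takeBatchSlice(items, start, batchSize) {
  const end = Math.min(start + batchSize, items.length);
  return { batch: items.slice(start, end), next: end };
}
```

The slice variant leaves consumed items in the array, so a real queue would also need to drop or recycle them at some point.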

Environment

None

Pull Requests

None

Activity

Jorge Bay Gondra
October 26, 2020, 8:19 AM

Hi!

Buffer.concat() uses the internal buffer pool: https://nodejs.org/api/buffer.html#buffer_static_method_buffer_concat_list_totallength

Buffer.concat() may also use the internal Buffer pool like Buffer.allocUnsafe() does.

Also, the driver uses Buffer.allocUnsafe() extensively, which also returns a slice from the pool.

Andrey Pechkurov
October 26, 2020, 8:49 AM

Buffer.concat() uses the internal buffer pool

Yes, that’s correct. However, that “pool” only amortizes buffer allocation by allocating the pool buffer in batches. By default, the “pool” is used only if the newly allocated buffer is <=4KB, which may not be the case for the coalesced buffer allocated by the driver (which is constrained by 8KB).
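A quick way to check this locally, as a rough sketch: `isPooled` is a hypothetical helper, and it relies on the assumption that a pooled `Buffer` is a view into a larger shared `ArrayBuffer`, while a non-pooled one owns an `ArrayBuffer` of exactly its own size. The sizes are arbitrary values well below and well above half of `Buffer.poolSize`:

```javascript
// Heuristic check (an assumption, not an official API): a pooled Buffer is
// backed by an ArrayBuffer larger than the Buffer itself.
function isPooled(buf) {
  return buf.buffer.byteLength > buf.length;
}

// About 1KB in total: small enough to be served from the pool by default.
const small = Buffer.concat([Buffer.alloc(512), Buffer.alloc(512)]);
// About 6KB in total: above half of the default pool size.
const large = Buffer.concat([Buffer.alloc(3000), Buffer.alloc(3000)]);

console.log('poolSize:', Buffer.poolSize);
console.log('1KB concat pooled:', isPooled(small));
console.log('6KB concat pooled:', isPooled(large));
```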

In any case, the current implementation implies a certain allocation rate, which could be reduced significantly. Whether that improves the end throughput of the driver in write-heavy scenarios is an open question. I just wanted to describe the potential optimization.

Jorge Bay Gondra
October 26, 2020, 8:57 AM

buffer is <=4KB

TIL!

Good info, definitely worth investigating.

Assignee

Unassigned

Reporter

Andrey Pechkurov

Reviewer

None

Fix versions

None

Labels

Components

None

PM Priority

None

Pull Request

None

Priority

Minor