Add driver-side write batching of native frames

Description

See CASSANDRA-5663.

The relevant code in Cassandra is in Message.java (look for the Flusher class).

Environment

None

Pull Requests

None

Activity

Ariel Weisberg
February 5, 2015, 5:19 PM

I tried this out with mixed success. As usual you only see a benefit if you are using no delay. I will experiment with larger windows for coalescing messages to see if that yields an improvement beyond just no delay.
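To make the idea of a coalescing window concrete, here is a minimal sketch in plain Java. It is not the patch under test: the class name `WindowedCoalescer`, the `nextBatch` method, and the `windowNanos` and `maxBatch` parameters are all hypothetical, and a `BlockingQueue` of strings stands in for the queue of outgoing frames. The writer waits for a first frame, then keeps collecting frames until the window expires or the batch is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of window-based coalescing; names are illustrative only. */
public class WindowedCoalescer {

    /**
     * Blocks for the first frame, then waits up to windowNanos for more,
     * stopping early once maxBatch frames have been collected.
     */
    static List<String> nextBatch(BlockingQueue<String> queue, long windowNanos, int maxBatch)
            throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());                       // wait for the first frame
        long deadline = System.nanoTime() + windowNanos;
        while (batch.size() < maxBatch) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break;                                 // window expired
            }
            String next = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break;                                 // nothing arrived within the window
            }
            batch.add(next);
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("frame-1");
        queue.add("frame-2");
        queue.add("frame-3");
        List<String> batch = nextBatch(queue, TimeUnit.MILLISECONDS.toNanos(50), 16);
        System.out.println("batch size: " + batch.size());
    }
}
```

A larger window trades latency for fewer, larger writes: a lone frame is held for up to the full window before it is sent.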

If I am reading correctly, the default is to not have no delay enabled between client and server. The client has a config option for it, but I don't see any place where the server enables no delay to clients for Netty.

I am a little leery of the buffer reference counting in Netty. The lifecycle is not clear to me and I don't see where some/all are being discarded. It doesn't seem to leak though.

I tested on EC2 with c3.8xlarge instances in the same placement group running Ubuntu 14.04. 5 client nodes and 1 server node.

I used https://github.com/aweisberg/cassandra/commit/8f51b8aafb8c56e4c93691a940a63785fc0ab445 for the server and stress client.

I change the following on instances

I start the clients with

This is a very small dataset of small keys designed to factor out messaging overhead. Caches weren't enabled.

I started by testing the default configuration of the stress client which doesn't enable TCP no delay.

500 client threads per client node

w/Coalescing | wo/Coalescing
290076 | 291898
280125 | 288428

16 client threads per client node

w/Coalescing | wo/Coalescing
109605 | 115520
109121 | 114010

Enabling TCP no delay

500 client threads per client node

w/Coalescing | wo/Coalescing
288909 | 246410
293079 | 248919
293571 |

16 client threads per client node

w/Coalescing | wo/Coalescing
128019 | 116200
128607 | 116577
 | 123755

Ariel Weisberg
February 10, 2015, 4:04 PM

There was a request for a comparison with Thrift at low concurrency.

By default the server is enabling no delay going back to the client. The client is not enabling no delay going to the server by default. Coalescing doesn't do much if no delay isn't enabled. When I say I disabled coalescing I mean I disabled it on both client and server. If the thread writing to the socket wakes up and finds multiple messages it will coalesce them, but it will not wait to write and flush to the socket.
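The opportunistic behavior described above (coalesce whatever is already queued, but never wait for more) can be sketched in plain Java as follows. This is an illustration only, not the Cassandra `Flusher` or the driver code: `CoalescingSketch`, `CountingSink`, and `writeBatch` are hypothetical names, and the sink merely counts flushes in place of a real socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of opportunistic coalescing; names are illustrative only. */
public class CoalescingSketch {

    /** Stand-in for a socket: records frames and counts flushes. */
    static class CountingSink {
        final List<String> written = new ArrayList<>();
        int flushes = 0;

        void write(String frame) { written.add(frame); }
        void flush() { flushes++; }
    }

    /**
     * Blocks until at least one frame is queued, then drains whatever else
     * is already waiting and flushes once. It never waits for more frames.
     */
    static int writeBatch(BlockingQueue<String> queue, CountingSink sink)
            throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());   // wait for the first frame
        queue.drainTo(batch);      // take anything already queued, without blocking
        for (String frame : batch) {
            sink.write(frame);
        }
        sink.flush();              // one flush for the whole batch
        return batch.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("frame-1");
        queue.add("frame-2");
        queue.add("frame-3");
        CountingSink sink = new CountingSink();
        int n = writeBatch(queue, sink);
        System.out.println(n + " frames, " + sink.flushes + " flush");  // prints "3 frames, 1 flush"
    }
}
```

With a single frame queued, this degenerates to one write and one flush, so it adds no latency at low concurrency.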

TL;DR Thrift is faster, more so at lower concurrency. I am going to put together a benchmark of just the IO frameworks and try a few different things.

4 threads
With coalescing, no delay on server, no no delay on client
9873
9542

With no coalescing client and server, no no delay on server, no no delay on client
8018
8206

With no coalescing client and server, no delay on server, no delay on client
9890
9551

Thrift sync
12795
12602

Thrift HSHA
5802
5827

100 threads
With coalescing, no delay on server, no no delay on client
73524
71993

Thrift HSHA
66440
65699

Thrift sync
89914
89777

With no coalescing client and server, no no delay on server, no no delay on client
65376
66647

With no coalescing client and server, no delay on server, no delay on client
76798
77610

Olivier Michallat
March 20, 2015, 2:00 PM

Thanks. We're finally starting to work on Netty 4 integration in the driver, and realized that this ticket is almost mandatory since using writeAndFlush is inefficient with many small messages. I've included your patch on this pull request: https://github.com/datastax/java-driver/pull/301

We're also going to look at making TCP_NODELAY configurable, defaulting to true.
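As a rough illustration of a configurable no-delay setting that defaults to true, the sketch below uses plain `java.net.Socket` so that it stays self-contained. The `NoDelaySketch` and `SocketConfig` names are hypothetical and do not reflect the driver's actual options API. In a Netty-based client the corresponding setting would be the `ChannelOption.TCP_NODELAY` channel option rather than a direct socket call.

```java
import java.io.IOException;
import java.net.Socket;

/** Hypothetical sketch of a configurable TCP_NODELAY setting; names are illustrative only. */
public class NoDelaySketch {

    /** Illustrative option holder; the field name and default are assumptions. */
    static class SocketConfig {
        boolean tcpNoDelay = true;   // default proposed in the comment above
    }

    /** Creates an unconnected socket with TCP_NODELAY set from the config. */
    static Socket newSocket(SocketConfig config) throws IOException {
        Socket socket = new Socket();
        socket.setTcpNoDelay(config.tcpNoDelay);   // true disables Nagle's algorithm
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newSocket(new SocketConfig())) {
            System.out.println("TCP_NODELAY = " + socket.getTcpNoDelay());
        }
    }
}
```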

Fixed

Assignee

Olivier Michallat

Reporter

Olivier Michallat

Labels

None

PM Priority

None

Affects versions

None

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Priority

Major