Add means to measure request sizes

Description

https://datastax-oss.atlassian.net/browse/PYTHON-284

Many users new to Cassandra struggle to understand the size of a given request. They often incorrectly assume it is equivalent to the size of their source data (a text file or another database). My suggestions are as follows:
1. Provide a metrics summary from the session object that stores the average request size over, say, the last 5 minutes. This will probably be suitable for the proof-of-concept or data-model-design phase. It will also help DBAs diagnose client-side problems more readily.
2. On a given statement, provide the ability to retrieve a 'parameter size'. This would not give completely accurate numbers, but it may be enough to aid data modeling, since it could become the unit of measure for performance numbers.
3. Log a WARN for requests over a given size, and make that size configurable.
Numbers 1 and 3 would be aided by implementing the query listeners already discussed for Python Extension Support by Jon Haddad in internal documents.
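
As a rough illustration of the suggestions above, here is a minimal sketch in plain Python. It is not part of the driver: `estimate_parameter_size`, `RequestSizeTracker`, the 8-byte size assumed for numbers, and the 1 MiB default threshold are all hypothetical choices made for this example. The estimate counts only the bound parameter values, so it approximates the request payload rather than the exact network frame.

```python
import logging
import time
from collections import deque

log = logging.getLogger("request_size")


def estimate_parameter_size(params):
    """Rough byte count of bound parameters; not the exact wire size."""
    total = 0
    for value in params:
        if value is None:
            continue
        if isinstance(value, (bytes, bytearray)):
            total += len(value)
        elif isinstance(value, str):
            total += len(value.encode("utf-8"))
        elif isinstance(value, (int, float)):
            total += 8  # assumption: treat every number as an 8-byte value
        else:
            total += len(repr(value).encode("utf-8"))  # crude fallback
    return total


class RequestSizeTracker:
    """Keeps request sizes for a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=300, warn_threshold_bytes=1024 * 1024,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.warn_threshold_bytes = warn_threshold_bytes
        self._clock = clock
        self._samples = deque()  # (timestamp, size) pairs, oldest first
        self._total = 0

    def _evict(self, now):
        # Drop samples that have fallen out of the window.
        while self._samples and now - self._samples[0][0] > self.window_seconds:
            _, size = self._samples.popleft()
            self._total -= size

    def record(self, params):
        """Record one request and warn if it exceeds the threshold."""
        size = estimate_parameter_size(params)
        now = self._clock()
        self._evict(now)
        self._samples.append((now, size))
        self._total += size
        if size > self.warn_threshold_bytes:
            log.warning("request of ~%d bytes exceeds threshold of %d bytes",
                        size, self.warn_threshold_bytes)
        return size

    def average(self):
        """Average request size over the window, or 0.0 if it is empty."""
        self._evict(self._clock())
        if not self._samples:
            return 0.0
        return self._total / len(self._samples)
```

A caller would invoke `tracker.record(parameters)` next to each execute call and read `tracker.average()` when reporting. How this would be wired into the session (for example through query listeners) is left open here.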

Environment

None

Pull Requests

None

Activity

Olivier Michallat
April 13, 2015, 8:22 AM

By request size do you refer to the size of the network frame, or the storage size server-side?

Ryan Svihla
April 13, 2015, 4:15 PM

Definitely not the server side; that is a totally different problem set (with different variables).

Ultimately, the core desire here is to give end users some way to grasp that transactions per second will not be the same for all sizes of writes and reads. It is common for someone new to Cassandra to grossly misunderstand how large their requests are. This leads to upset users who wonder why they can get 20k TPS per node with Cassandra stress but only 1k TPS per node with their own writes, which are 2MB each but which they believe to be much smaller.

I believe the goal is a common unit of measure that demonstrates a cost when running session.execute. If we can solve that well, whatever the unit of measure, we'll be much further along in having users who understand what they're doing, who fully grasp the costs of certain data model and query approaches relative to others, and who see why it may be desirable to shrink their giant write model.
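
To make the cost comparison concrete, here is a small arithmetic sketch. The 20k TPS, 1k TPS, and 2MB figures come from the comment above; the 1 KiB size for the stress-style writes is purely an assumption for illustration, and `bytes_per_second` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB


def bytes_per_second(requests_per_second, request_size_bytes):
    """Payload volume moved per second for a given request rate and size."""
    return requests_per_second * request_size_bytes


# Assumption: stress-style writes of about 1 KiB each at 20k TPS.
stress_like = bytes_per_second(20_000, 1 * KIB)
# 2 MiB writes at 1k TPS, as in the comment above.
large_writes = bytes_per_second(1_000, 2 * MIB)

print(stress_like / MIB)            # MiB per second for the small writes
print(large_writes / MIB)           # MiB per second for the large writes
print(large_writes / stress_like)   # how many times more bytes per second
```

Under that assumption, the "slower" 1k TPS workload is moving roughly a hundred times more bytes per second than the 20k TPS one, which is the kind of comparison a byte-based unit of measure would make visible.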

Fixed

Assignee

Unassigned

Reporter

Ryan Svihla

Labels

None

PM Priority

A

Affects versions

None

Fix versions

Pull Request

None

Doc Impact

None

Size

None

External issue ID

None

Sprint

Java 4.x

Priority

Major