Buffer size and sorting data

In previous articles (Buffer size and CPU time, and Buffer size and reading data) we’ve looked at the performance gains that can be realised by increasing the buffer size and, for writing data, the number of buffers used.

The literature, for example Windows Server Configuration and Tuning for Optimal Server Performance from SAS®, points to one area we haven’t tested: the impact of buffer size on sorting data.

In some tests, larger (greater than 65532 bytes) BUFSIZE increased elapsed time during memory intensive tasks, such as sorting.

Initial “feel” is that larger buffers do make sorting slower, but we need to gather evidence and, if there is performance degradation, see if it can be mitigated by tweaking the SORTSIZE setting.
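Before changing anything, it helps to record the values the session starts with. A minimal sketch, assuming only the standard GETOPTION function; the values printed will depend on your installation:

```sas
/* Write the current settings to the log before overriding them */
%put BUFSIZE=%sysfunc(getoption(bufsize));
%put SORTSIZE=%sysfunc(getoption(sortsize));
%put MEMSIZE=%sysfunc(getoption(memsize));
```

MEMSIZE is included only as context, since the memory available to the session limits how far SORTSIZE can usefully be raised.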

Initial findings, based on 10 million observations (obsexp = 7) and a single run of this code:

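options bufsize=&buffer sortsize=&sort;

/* Build 10**&obsexp observations with a random, zero-padded character key */
data temp(keep=i label);
  do i = 1 to 10**&obsexp;
    label = put(int(ranuni(-1)*10**&obsexp), z&obsexp..);
    output;
  end;
run;

/* Sort on the random key; this is the step being timed */
proc sort data=temp;
  by label;
run;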

The results surprised me.
The effect on CPU time seems to be largely independent of buffer size. What I find counter-intuitive is that the sort gets slower as SORTSIZE increases.

I’ll update this page once further research has been carried out, for example with larger numbers of observations and with more repetitions.
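
One way the repeated runs could be set up is to wrap the test in a macro and call it for each combination of settings. This is only a sketch: the macro name sorttest, its parameters, and the BUFSIZE and SORTSIZE values in the calls are illustrative assumptions, not the settings used for the findings above.

```sas
/* FULLSTIMER writes detailed CPU and elapsed time for each step to the log */
options fullstimer;

/* Hypothetical driver: sorttest and its parameter names are illustrative */
%macro sorttest(buffer=, sort=, obsexp=7, reps=1);
  %local r;
  options bufsize=&buffer sortsize=&sort;
  %do r = 1 %to &reps;
    /* Label each run in the log so timings can be matched to settings */
    %put NOTE: run &r of &reps bufsize=&buffer sortsize=&sort obsexp=&obsexp;

    /* Rebuild the data each time so every sort starts from unsorted input */
    data temp(keep=i label);
      do i = 1 to 10**&obsexp;
        label = put(int(ranuni(-1)*10**&obsexp), z&obsexp..);
        output;
      end;
    run;

    proc sort data=temp;
      by label;
    run;
  %end;
%mend sorttest;

/* Example calls; the values are placeholders to be replaced with the ones under test */
%sorttest(buffer=4k,  sort=64m,  reps=3)
%sorttest(buffer=64k, sort=64m,  reps=3)
%sorttest(buffer=64k, sort=256m, reps=3)
```

Because ranuni(-1) seeds from the clock, each repetition sorts different data, so timings should be compared as averages rather than run by run.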


Previous: Buffer size and reading data  
