Leader

Tuesday, 13 August 2013

Alternative function: nanmean (multi-dimensional)

Download
nanmean3.m



Multi-dimensional means
This function upgrades the alternative NANMEAN function here: nanmean2.m). This version can takes the mean of data containing NaNs, along any specified dimension.

This version basically performs 3 steps; the data is premuted so that the dimension specified is moved to dimension 1 (rows). The mean is then peformed on each column in turn, while ignoring the NaNs, using the same process as in nanmean2.m. The resulting row of means is then permuted back to the dimension it came from.

There are other ways of doing this, however, this method provides a nice example of permuting and reshaping data in multidimensional matrixes, which can be tricky to understand.




Code run-through
Let's assume we have a 4x4x4x4 variable called data, which contains NaN values, and that we want to take an average along dimension 3.

>>m=nanmean3(data,3)

First the script checks the number of inputs. In this case there are two inputs "data" and "dim=3", but if there's was only one, it assumes dim=1 (rows).
if nargin == 1
    dim = 1;
end

The total number of dimensions is found using the NDIMS function.
dims=ndims(data);

If dim>1, the matrix is permuted so that the 'target' dimension (dim) is moved to dimension 1. This is done by rotating the dimensions up to and including the 'target' dimension over the next few lines of code:

dim_vec=1:dims;
Creates vector of the dimensions, for example, for our 4 dimensional matrix 

>>dim_vec =
[1 2 3 4].

target_dims=dim:-1:1;
Creates a new vector where the 'target' dimension (3) is first.

>>target_dims = 
[3 2 1]

This new vector is then used to replace first indexes of dim_vec, up to and including the 'target' dimension .

dim_vec(1:length(target_dims))=target_dims;

>>dim_vec = 
[3 2 1 4]

The permutation is performed using the new dim_vec.

r_data=permute(data,[dim_vec]);

The target dimension has now been rotated to the first dimension (and everything after it has been left alone). So now, the command r_data(:,1) returns the same data as data(1,1,:,1) would.

In order to perform the nanmean, a similar technique is used as in nanmean2.m. However, to simplify using the ISNAN function, r_data is linearised in to one column first. This isn't a problem because we know the length of each column in r_data, so we can cut it back up correctly later. In the previous version of the script a for loop was used to step along each column and take the average, in this version a for loop is used to grab a chunk of a linear vector (each consecutive chunk constituting a column of r_data), and the average is calculated in the same as namean2.m...

r_data_size=size(r_data);
Gets the size of r_data for later use. In this case
>>size(r_data) = 
[4, 4, 4, 4]


r_data_lin=r_data(:);
Creates a version of r_data that is 1xnumel(r_data) long; it's effectively a 1 column list of r_data's contents.

step=length(r_data(:,1));
ind=[0-(step-1), 0];

step (in this example step=4) gets the column length from r_data, which is used to advance ind in the for loop. ind contains the start and end indexes for the 'chunk' being averaged. It starts as [-3, 0] because step is added each time though through the loop (including the first time), making the the first chunk [-3, 0] + 3 = [1, 4]. So the first time through the loop r_data_lin(1:4) is used, the next time r_data_lin(5:8), and so on.

range=[1:length(r_data_lin)/length(r_data(:,1))];

range contains the number of operations that need to be performed in the for loop. This the length of r_data_lin divided by the length of the target dim (which is now in the columns, ie. length(r_data(:,1)). This is because one nanmean is required per column. i is used as an index to store the calculated means in to the correct place in the variable means, which is preallocated before the loop.

means=zeros(1,max(range));
for i=range
    ind=ind+step;
    col=r_data_lin(ind,1);
    m = mean(col(~isnan(col)));
    means(i) = m;
end

The for loop itself performs the nanmean by grabbing the chuck defined by ind (col=r_data_lin(ind,1);), and by taking the mean of that chunk, ignoring the values that are NaNs (m = mean(col(~isnan(col)))) (see nanmean2.m).

After all the means have been calculated, the variable means is reshaped using r_data_size, but ignoring the first value, which represents the number of rows in each column, as it's just been compressed by taking the mean (in this example, from 4 per column values to 1 value per column).
means=reshape(means,[1, r_data_size(2:end)]);

Finally, the means variable is permuted back to the same arrangement as the original data variable, using dim_vec.

if dim>1
    m=permute(means,dim_vec);
else
    m=means;
end

Final code
function m=nanmean3(data,dim)

if nargin == 1
    dim = 1;
end

dims=ndims(data);

if dim>1
    dim_vec=1:dims;
    target_dims=dim:-1:1;
    dim_vec(1:length(target_dims))=target_dims;
    
    r_data=permute(data,[dim_vec]);
else 
    r_data=data;
end

r_data_size=size(r_data);
r_data_lin=r_data(:);

step=length(r_data(:,1));
ind=[0-(step-1), 0];
range=[1:length(r_data_lin)/length(r_data(:,1))];
means=zeros(1,max(range));
for i= range
    ind=ind+step;
    col=r_data_lin(ind,1);
    m = mean(col(~isnan(col)));
    means(i) = m;
end
means=reshape(means,[1, r_data_size(2:end)]);

if dim>1
    m=permute(means,dim_vec);
else
    m=means;
end

No comments:

AdSense