Leader

Saturday, 6 April 2013

Alternative function: nanmean

Updated version (multi-dimensional): nanmean3.m

Download
nanmean2.m

Averaging that, but not that
One of Matlab's features is it’s ability to deal with NaN (Not-A-Number) values in data. But if you don’t have the statistics toolbox, it lacks the essential function NANMEAN, which simply returns the mean of a set of values containing a NaN. Using the MEAN function on data containing a NaN returns a NaN as the mean; because you’re supposed to buy the statistics toolbox, you cheapskate.

Anyway, try it.

>>excitingdata = [1; 2; 3; 4; NaN]
excitingdata =
1
2
3
4
NaN

You can probably calculate the mean yourself, but the MEAN function can’t.


>>mean(excitingdata)
ans = NaN


So we have to do it the long winded way. We can use the ISNAN function to create an index of the data excluding the N.


>>gooddata_index = ~isnan(excitingdata)


ISNAN returns a 1 if the “number” at a location is a NaN and a 0 if it’s a real number. So using the logical operator “not” or ‘~’ before it asks the opposite - is the number at a location not a NaN? When this is true, a 1 is returned, and when it isn’t (ie. it is a NaN), a 0 is returned. For example:


>>gooddata_index = ~isnan(excitingdata)

gooddata_index =

1

1

1
1
0

We can now use this index to extract the non-NaN numbers from excitingdata; where the index is a 1, the number at that location is used, when the index is a 0, it isn’t.

>>gooddata = excitingdata(gooddata_index)
ans =
1
2
3
4

The mean of this data can now be calculated using MEAN.

>>mean(gooddata)
ans=2.5000

The above four sections of code can be combined in to one line as follows:

>>mean(excitingdata(~isnan(excitingdata)))
ans=2.5000

Remember ~isnan(excitingdata) is the index of non-NaN values, which is inserted straight into excitingdata to extract just the non-NaN values, which in turn are passed straight to the MEAN function.

Note that the above steps will not work for a two-dimensional matrix, but the NANMEAN2 function will.


Code run-through
The NANMEAN2 function above follows the explanation to calculate the mean but adds flexibility to deal with matrices. It accepts two inputs; data should contain the data to calculate the mean for, and dim can be 1 or 2, which tells NANMEAN2 which dimension to take the mean along.

The first bit of code uses the NARGIN function to check the number of input arguments. If dim is not specified (ie. when the number of input arguments = 1) , it defaults to dim = 1, which means the mean is take for each column:


if nargin == 1
    dim = 1;
end

The calculation is then performed in the next section of code:

if dim == 1 %(column)
    dim_length = length(data(1,:));
    means = zeros(1,dim_length);
    for i = 1:dim_length
        col = data(:,i);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end
elseif dim == 2 %(rows)
    dim_length = length(data(:,1));
    means = zeros(dim_length,1);
    for i = 1:dim_length
        col = data(i,:);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end
else
    disp('Error')
end

The first if statement checks the value of dim. If dim = 2, it performs the code between the elseif and else statements, if dim doesn't equal 1 or 2, it displays an error and doesn't do anything. If dim = 1 it executes the code in the first part of the if statement:

    dim_length = length(data(1,:));
    means = zeros(1,dim_length);
    for i = 1:dim_length
        col = data(:,i);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end

In this section of code dim_length = length(data(1,:)); calculates the number of columns in data. The next line, means = zeros(1,dim_length); then preallocates an output vector with zeros ready to store the calculated mean for each column. Preallocating means the output vector won't grow inside the for loop (this is good) and also means that we can be specific with where each value is put in each loop of the for loop, rather than just appending to the end each time (this is good too).

The actual calculation of the mean is performed in the for loop, once for each column. for i = 1:dim_length creates a vector of values for the for loop to use. col = data(:,i); extracts a single column of data to use in this iteration of the for loop. The first value of i is 1, so the first time through the for loop col = data(:,1) extracts the first column. The second time through the loop col = data(:,2) extracts the second column, and so on.

m = mean(col(~isnan(col))); gets the non-NaN values from the column  and takes the mean (as per the explanation).  means(i) = m; then stores the mean in the output vector (which is a row containing a separate mean for each column), at index i, 

The second part of the main if statement does exactly the same as above, but for rows:


elseif dim == 2 %(rows)
    dim_length = length(data(:,1));
    means = zeros(dim_length,1);
    for i = 1:dim_length
        col = data(i,:);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end

Note the differences in the expressions dim_length = length(data(:,1));, means = zeros(dim_length,1); and col = data(i,:);. Everything is done along the row, dimension.

This function won't work for a greater than 2D matrices, but can be expanded to do so, if needed.

Final code
function means = nanmean2(data,dim)

if nargin == 1
    dim = 1;
end

if dim == 1 %(columns)
    dim_length = length(data(1,:));
    means = zeros(1,dim_length);
    for i = 1:dim_length
        col = data(:,i);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end
elseif dim == 2 %(rows)
    dim_length = length(data(:,1));
    means = zeros(dim_length,1);
    for i = 1:dim_length
        col = data(i,:);
        m = mean(col(~isnan(col)));
        means(i) = m;
    end
else
    disp('Error')
end

3 comments:

Laura Jean said...

Thanks for the function! In the 3D version,
LIne 33 needs to change to something like:

col=r_data_lin(ind(1):ind(2));

Thanks again,
Laura

Extramarks said...
This comment has been removed by the author.
Extramarks said...

Thanks a lot man!
~ Cheapskate

AdSense