Leader

Friday 27 February 2015

Simple model fitting: Solar panel profit

Here's an example of how to use Matlab to create a basic predictive model. It'll cover;
  • Choosing an appropriate model based on our current understanding of the system, and what reasonable assumptions we can make about it
  • Using curve fitting to fit the model to the data we already have to refine the coefficients for the model
  • Qualitative and basic quantitative (but not statistical) analysis of the model, and the assumptions made.

The data used will be real income data for a 2.8 kWh solar panel system located in the UK. We have three years worth of data - let's work out how much we can expect to earn over the 25 year period of the original Feed-in Tariff scheme.

The first thing to understand is that all models require assumptions to be made. There has to be logical reason underlying these assumptions, or the model will lack predictive power. Matlab can not do this for you - it can certainly help - but brute force can't replace sensible reasoning.





So what specific things do we need to assume to be true while modelling solar panel income? Many factors, obviously, but for simplicities sake, let's consider the main two.

  1. Solar panel income varies with the weather.
  2. Solar panel efficiency declines with time

We can consider these two points as essentially true. But how do we use them the model the data? We need to make assumptions about them...

  1. We don't know the quantitative relationship between season and income, nor what the weather will be like in the future. Let's assume that the data we already have is an accurate enough, average, representation of the weather that can also predict the seasonal changes for each year in the future.
  2. We don't know how solar panel efficiency declines (it might be possible to look this up), but let's just try and make a reasonable assumption. The performance decline could be linear? Or Exponential? Or something else?

Let's look at the data and see what we can make of it so far (each data point = 1/4 year).

k=[390 400 200 190 380 410 200 150 340 390 180 160];  
k=k'; 
step=1/4;
x=0+step : step : 3; % 0-3 years in 1/4 steps
x=x'; 

scatter(x,k) 
ylabel('Income, £')
xlabel('Year')
axis([min(x)-step, max(x)+step, 0, 500])



As a general rule, it's best to try modelling a system with as few coefficients as possible (think Occam's razor), it's unnecessary and dangerous to add unneeded complexity. With this in mind, the seasonal variation is clear from the data - a sine wave will fit that quite nicely (whereas a linear or exponential polynomial clearly won't do it justice). The rate of efficiency decline over this time period less clear, though. Possibly it's not exponential, as it doesn't appear to drop off very quickly. Initially, with lack of evidence to the contrary, we should first try and use the simplest term, a linear polynomial, to model the decay.

Generating a sine wave
So we've decided to model our data with one sine wave and a linear polynomial. Sine waves are defined by three coefficients, amplitude, frequency and phase. To plot one in Matlab, first we need to generate a vector of points to calculate the wave over:

n = 20; 
t = 0 : 1/n : 1-1/n 

This will generate a vector from 0 to almost 1 (-1 step) in steps of 1/n, where n will control the number of points (resolution) of the y data calculated. Next, to calculate the sine wave:

amp = 1;
freq = 2;
phase = deg2rad(130); % Matlab uses rads by default, I like degrees
y = amp * sin(2*pi*t*freq-phase);
plot(t,y) 

Increasing n above would increase the resolution and smoothness of the curve.

Making the model
Now we need to do two things. Create the equation for our model and find coefficients that fit the data we have so far.

Let's combine the linear polynomial and sine equations to make the model:

 y = yearAmp*sine(2*pi*x*yearFreq-yearPhase)  ...
+ periodAmp*x+B

Here the sine wave will fit the income oscillation over each year. The periodAmp*x term of the linear polynomial will control the rate of decline and the +B term will shift the whole curve on y.

Let's define the equation in Matlab, there's special syntax for this .

F=@(yearAmp,yearFreq,yearPhase, ... 
periodAmp, B, x) ...
yearAmp*sin(2*pi*x*yearFreq+yearPhase) ...
+ periodAmp*x + B;

Basically, the equation is stored in a new "anonymous" function (an anonymous function is a function stored in a variable in the workspace rather than in a .m file.). The @ symbol indicates we're defining a function and its inputs are defined next in the brackets (a,b,c, ... etc). Note we need x, the independent variable, to be the last input as we're going to use this function with the FIT function. The rest is the equation as above (without the y = part). ie. NewFunction=@(input1, unput2) function code.

So next, we need to try to fit this model (stored in the variable F) to our data. Rather than let Matlab do this blindly, we can estimate the parameters first - this is an important step, with 4 coefficients Matlab has a huge parameter space to explore, it'll come down to chance if it can find a good fit in a limited number of iterations (as the first coefficient values will be guessed randomly by Matlab if they're not specified).

We can use the data we have to guide us. In one year, income seems to vary by about £100 between the best and worst months. One cycle of this wave should take 1 year. The first we have are high, so the phase of the wave isn't zero.

yearAmp = 100; 
yearFreq = 1; 
yearPhase = deg2rad(-90);

For the entire period, things are more speculative. Let's hazard a guess at how much the efficiency will reduce income by the end of the 25 years... ~50%? ~£200?.

periodAmp = -200/25;

Finally, the last term, B. We need this to place the data in the absolute correct place on the y axis. A good estimate is the average of the first years income.

B = mean(k(1:4));

Now let's use curve fitting to fit this model to the data (in k) that we have - pass the x and y data to the FIT function, along with our model F and the estimates of our start points for each coefficient.

fObj = fit(x, k, F, ... 
'StartPoint', [yearAmp, yearFreq, yearPhase, periodAmp, B]); 

This creates a fit object that contains our model and Matlab's refined estimate of all 5 coefficients, note that the startpoints need to be specified in the order that we defined in the function .
Let's plot this with the data and check it looks sensible.

close all
hPlot=plot(fObj)
hold on
scatter(x,k);
ylabel('Income, £') 
xlabel('Year')
axis([min(x)-step, max(x)+step, 0, 500])




And let's take a look at the coefficients Matlab has found for us, these are stored in fObj.coefficientName

disp(fObj)

General model:
     fObj(x) = yearAmp*sin(2*pi*x*yearFreq+yearPhase)+periodAmp*x+B
     Coefficients (with 95% confidence bounds):
       yearAmp =         141  (128.5, 153.5)
       yearFreq =      0.9917  (0.9755, 1.008)
       yearPhase =     -0.8441  (-1.026, -0.662)
       periodAmp =      -13.12  (-23.34, -2.89)
       B =       303.6  (285, 322.3)

Note that these value output as a vectors using the CONFINT and COEFFVALUES functions, which makes accessing the values with numerical indexing possible.

coeffvalues(fObj)

ans =

  140.9957    0.9917   -0.8441  -13.1165  303.6258

confint(fObj)

ans =

  128.4937    0.9755   -1.0261  -23.3433  284.9791
  153.4977    1.0078   -0.6620   -2.8897  322.2726


Anyway, all look fairly similar to our "reasonable" estimations, which is good. And even without any statistical analysis it's clear that the model explains most of the variation in the data we have so far.

Predicting the future
Next we can use this model to predict the future. Let's create a new x axis for the whole 25 years we're interested in, to make the curve smoother we can generate at more than 4 points per year. can do this with more than 4 points per year, for a smoother curve (let's add *10).

res = 10;
step=1/(4*res);
x2=0+1/4 : step : 25;
x2=x2';

Now let's evaluate our model (remember, it's in the fit object, fObj, with the coefficients). Pass it to FEVAL with the x axis.

y = feval(fObj,x2) 

Done. Future predicted.

But is it accurate? As a basic quantitative test of the predictive power of the model, let's calculate the cumulative income over 25 years (bearing in mind there are now a lot more points on the xaxis, and we only want to take the cumulative sum of 4 points per year).

y2 = cumsum(y) / numel(y)*4*25

Now let's plot everything and see what it looks all like.

close all 
subplot(2,1,1), 
hPlot=plot(x2,y)
ylabel('Income, £') 
xlabel('Year')
hold on
scatter(x(1:numel(k)),k)
subplot(2,1,2), 
hPlot=plot(x2,y2)
ylabel('Cumulative income, £')
xlabel('Year')





Testing the assumptions
So how good is our model? Well the final cumulative income over 25 years looks a little low at around £14,000 - looking at cumulative income curve, it oscillates towards the end! That's not supposed to happen - it's because for later years, income becomes negative in the months with poorer weather. We're only considering income in this mode, which has a minimum of 0 in bad weather; it appears one of our assumptions is violating this rule.

The linear term modelling the decay lacks the power to describe the complexity in the system accurately, so let's try adding a coefficient, periodRate (ie. rate of change of periodAmp), to make the polynomial exponential.

y = yearAmp*sine(2*pi*x*yearFreq-yearPhase)  ...
+ periodAmp*x^periodRate+B


F=@(yearAmp,yearFreq,yearPhase, ... 
periodAmp,periodRate,B, x) ...
yearAmp*sin(2*pi*x*yearFreq+yearPhase) ...
+ periodAmp*x.^periodRate + B;

periodRate=1;

 fObj = fit(x, k, F, ... 
'StartPoint', [yearAmp,yearFreq,yearPhase, ... 
periodAmp,periodRate,B]); 

res = 10;
step=1/(4*res);
x2=0+1/4 : step : 25;
x2=x2';
y = feval(fObj,x2) 

y2 = cumsum(y) / numel(y)*4*25

close all 
subplot(2,1,1), 
hPlot(1)=plot(x2,y)
ylabel('Income, £') 
xlabel('Year')
hold on
scatter(x(1:numel(k)),k)
subplot(2,1,2), 
hPlot(2)=plot(x2,y2)
ylabel('Cumulative income, £')
xlabel('Year')



disp(fObj)

     General model:
     fObj(x) = yearAmp*sin(2*pi*x*yearFreq+yearPhase)+periodAmp*x.^periodRate+B
     Coefficients (with 95% confidence bounds):
       yearAmp =       127.3  (105.9, 148.7)
       yearFreq =      0.9936  (0.961, 1.026)
       yearPhase =     -0.8754  (-1.253, -0.4978)
       periodAmp =   1.551e+04  (-3.507e+07, 3.511e+07)
       periodRate =   -0.001293  (-2.926, 2.923)

       B =  -1.522e+04  (-3.51e+07, 3.507e+07)

That's better. The cumulative income is around £25,000 after 25 years, which is reasonable. We're still a bit limited with our prediction about the efficiency decay, but we can't overcome this problem without more data. 25 years of data should provide the best prediction for the cumulative income after 25 years...

Other things to explore
  • Try changing a predicted coefficient to something outlandish and refitting to see why we need to make reasonable estimates before fitting.
  • Try refining the fits further by supplying the fit with upper and lower limits based on known physical constraints, along with the start points.
  • How can we be sure we've really selected the best model? What statistical tests could we apply?


No comments:

AdSense