Implementation of the SPlus coding for the FLDA procedure
==========================================================

1) INSTALLATION
===============+

Click on the SPlus code link and save the file in your SPlus
directory. Start up Splus and type 

source("filename")

where filename is replaced by the name you saved the flda code under. This
should load in 11 functions plus one test data set as described below.


2) THE FUNCTIONS
=================

The FLDA algorithm is fit using 10 functions :

* fldafit : This is the main function. It first initializes the parameters
by calling fldainit. It then enters a loop which iteratively performs the
M step (by calling Mstep), the E step (by calling Estep) and calculating
the new observed log likelihood (by calling calcobslike). It exits this
loop when the likelihood has converged. Finally it imposes the constraint
(7) from the paper on the parameters by calling constraints.

* fldainit : This function initializes the parameters lambda.zero, Lambda,
alpha, Theta, sigma and D. It also provides initial estimates of the
expected value of gamma_ij and gamma_ij * gamma_ij^T. This initialization
is achieved by projecting the observed data onto spline bases and using the
estimated spline coefficients to estimate the various parameters. Finally
the function produces the spline basis matrix S_ij for each curve.

* Mstep : This function performs the M step where the parameters are
calculated given current expected values for the gammas. This is achieved
by calling step1 (which calculates sigma and D) and step2 (which
calculates lambda.zero, Lambda, alpha and Theta).

* step1 : This function calculates sigma and D given current estimates for
the other parameters and the gammas. 

* step2 : This function calculates lambda.zero, Lambda, alpha and Theta
given current estimates for the other parameters and the gammas. It
achieves this by performing a loop where first Theta and lambda.zero are
estimated assuming alpha and Lambda are fixed (by calling step2a) and then
alpha and Lambda are estimated assuming Theta and lambda.zero are fixed
(by calling step2b). This loop continues until all parameters have
converged.

* step2a : This function calculates Theta and Lambda.zero assuming all other
parameters are fixed. Theta is calculated by holding all but the jth
column constant, maximizing (using least squares) the jth column and
iterating through all columns until there is no further change.

* step2b : This function calculates alpha and Lambda assuming all other
parameters are fixed. Again a loop is performed where first alpha is fit
assuming Lambda fixed and then Lambda is fit assuming alpha fixed. In
calculating Lambda a secondary loop is used in a similar manner to that
for fitting Theta.

* Estep : This function performs the E step where the expected value of 
gamma_ij and gamma_ij * gamma_ij^T is calculated, given the current
estimated parameters, for use in the M step.

* calcobslike : This function calculates the observed log likelihood (up
to constants) given current parameter estimates.

* constraints : This function enforces the constraint (7) from the paper
on the parameters. This means that the alphas can be interpreted as the
number of standard deviations that the groups are apart etc.

In addition to the 10 fitting functions there is a prediction function.

* fldapred : This function produces the alpha hats used to provide a low
dimensional pictorial representation of each curve. It also produces a
class prediction for each curve. It takes as input the fit from fldafit
and the original data (for training error rates) or a new data set (for
test predictions).


3) IMPLEMENTATION
==================

To fit the FLDA algorithm after installing the functions one should call
fldafit. 

fldafit - Required Arguments

The only required argument is a list containing the data. This list must
have the following elements :

* y : A vector of the raw data for each curve. The data should be ordered
by class so observations from the first class should go first etc. 

* curve : A vector of the same length as y denoting which curve each
observation is from. For example if the first curve has 4 observations the
first four elements of this vector should be 1's etc. The curves are
numbered from 1 to m_1 + ... + m_K where m_i is the number of elements in
the ith class and K is the number of classes.

* timeindex : Also of the same length as y. This vector records the time
that each observation was observed. This is done relative to the grid you
choose (see optional arguments below). For example if you use a grid of
101 equally spaced time points between age 10 years and 20 years and the
first observation for the first curve is at age 12 years the first element
of timeindex would be 21 etc.

* class : A vector of length m_1 + ... + m_K. The first m_1 elements
should be 1's etc through to the last m_K elements which should be K's.

For example suppose you have two classes each with 2 curves with the
following data.

Class 1 Curve 1
Observations at times 12.2, 13.6 and 18.0 with values of 1.3, 1.5, 1.9

Class 1 Curve 2
Observations at times 10.0 and 20.0 with values of 4.5, 7.0

Class 2 Curve 1
Observations at times 11.0, 13.5 and 19.1 with values of 1.3, 6.0, 1.6

Class 2 Curve 2
Observations at times 13.0, 13.3, 14.7, 15.2 with values of 1.0, 5.6, 5.0, 5.0

Then y = 1.3, 1.5, 1.9, 4.5, 7.0, 1.3, 6.0, 1.6, 1.0, 5.6, 5.0, 5.0
curve = 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4
timeindex = 23, 37, 81, 1, 101, 11, 36, 92, 31, 34, 48, 53 (assuming the
above grid)
and class = 1, 1, 2, 2


fldafit - Optional Arguments

q : This is the dimension of the spline basis. The spline basis will be a
natural cubic spline with q-2 equally spaced knots. Default is q=5.

h : This is the dimension of alpha. The default is h=1. All the results in
the paper use h=1 because interpretation is easier. For h>1 the covariance
structures are non-diagonal so visual interpretation of differences
between the alpha hats is difficult. In any case h must be less than K.

p : This is the rank constraint on the gammas as outlined in section
5.1. If p=q then a full rank matrix is fit. For p<q there is a rank
constraint. This can produce superior fits for sparsely observed
curves. Some experimentation may be required to find the best value for
p. Default is p=1.

tol : This gives the minimum relative change before the various loops in
the functions will exit. The default is 0.001 i.e. the criteria (log
likelihood or sum of squares) must change be less than 0.1% of its
previous value before the loop will exit.

maxit : The maximum number of iterations that fldafit will perform. If the
parameters have not converged by this point the algorithm will stop and
return the current estimates. Default = 50.

pert : This has to do with the initialization. fldainit tries to fit a
spline to each individual curve based on that curves observations so that
the parameters can be initialized in a sensible way. However, if any of
the curves have not been observed at enough points it will not be possible
to fit a spline to that curve. pert is a small adjustment (similar to
performing ridge regression) which shrinks the estimates towards zero and
allows even sparse curves to be fit. The default is pert = 0.1. If all
curves have been observed at many time points then pert can be set to
zero.

grid : This is a fine grid of time points that should cover all time
periods that curves have been observed at. For example if curves have been
measured between ages 10 and 20 in monthly increments. Then grid should
consist of 121 numbers equally spaced between 10 and 20 eg seq(10, 20,
length = 121). This is used in conjunction with the timeindex
vector. Default is grid = seq(0.01, 1, length = 100).


fldafit - Output

The output from this function will be a list of the following elements

parameters : A list of all the fitted parameters i.e. lambda.zero, alpha,
Lambda, Theta, sigma (note sigma is actually sigma squared) and D.

vars : Expected values of gamma and gamma * gamma^T.

S : All the spline basis matrices combined together.

FullS : The spline basis matrix over the entire grid.

like : The final log likelihood (up to constants).


fldapred - Required Arguments

fit : This is the fit from fldafit.

data : This is either your original data if you want to calculate training
error rates and produce alpha hats etc for your curves. Or a new data set
if you want to produce predictions for new curves etc. For new data y,
curve and timeindex should be included but obviously class is not required.


fldapred - Optional Arguments

None


fldapred - Output

alpha.hat : These are the low dimensional representations of each
curve. They can be directly plotted to provide a visual guide as to class
separation..

Calpha : The Covariance matrix for each alpha.hat. If h=1 so the
alpha.hats are scalars taking the square root of this will give standard
error estimates for the alpha.hats. The plots from the paper can be
reproduced by plotting alpha.hat on the X axis and sqr(Calpha) on the
Y-axis.

class.pred : The FLDA class prediction for each curve.

distance : A matrix of the distance, relative to the inverse covariance of
the alpha hats, of alpha.hat from the alpha for each class. Each curve is
classified to the class with lowest distance.
  

4) TESTING THE FUNCTIONS OUT
=============================

Included with the above mentioned functions is a test data set to make
sure everything is working OK. After installing the functions type the
following command line.

test.fit <- fldafit(testdata, p = 2, pert = 0.1, maxit=7)  

If everything is working OK you should get the following output

[1] "Iteration  1 , LogLikelihood = 371.494"
[1] "Iteration  2 , LogLikelihood = 532.435"
[1] "Iteration  3 , LogLikelihood = 669.016"
[1] "Iteration  4 , LogLikelihood = 748.791"
[1] "Iteration  5 , LogLikelihood = 782.725"
[1] "Iteration  6 , LogLikelihood = 802.381"
[1] "Iteration  7 , LogLikelihood = 819.077"

Try typing 

test.fit$parameters$alpha

and you should get

          [,1]
[1,] -27.07486
[2,] -14.31322
[3,]  41.38809

To produce the alpha hats used for 1 and 2 dimensional representations of
the curves as well as class predictions and estimated variances of the
alpha hats we use the function test.pred. Try typing 

test.pred <- fldapred(test.fit,testdata)

and then 

test.pred$class.pred

You should get the following output

[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

These correspond to the original three classes so the training error is
zero (this is an easy data set).

You should now be ready to start on your own data.

Note that this code was written using Splus 5 and may experience problems
on older versions.

Please email any bugs you discover to gareth@usc.edu. I would also
appreciate hearing about successful implementations of the FLDA procedure.