Implementation of the SPlus coding for the FLDA procedure ========================================================== 1) INSTALLATION ===============+ Click on the SPlus code link and save the file in your SPlus directory. Start up Splus and type source("filename") where filename is replaced by the name you saved the flda code under. This should load in 11 functions plus one test data set as described below. 2) THE FUNCTIONS ================= The FLDA algorithm is fit using 10 functions : * fldafit : This is the main function. It first initializes the parameters by calling fldainit. It then enters a loop which iteratively performs the M step (by calling Mstep), the E step (by calling Estep) and calculating the new observed log likelihood (by calling calcobslike). It exits this loop when the likelihood has converged. Finally it imposes the constraint (7) from the paper on the parameters by calling constraints. * fldainit : This function initializes the parameters lambda.zero, Lambda, alpha, Theta, sigma and D. It also provides initial estimates of the expected value of gamma_ij and gamma_ij * gamma_ij^T. This initialization is achieved by projecting the observed data onto spline bases and using the estimated spline coefficients to estimate the various parameters. Finally the function produces the spline basis matrix S_ij for each curve. * Mstep : This function performs the M step where the parameters are calculated given current expected values for the gammas. This is achieved by calling step1 (which calculates sigma and D) and step2 (which calculates lambda.zero, Lambda, alpha and Theta). * step1 : This function calculates sigma and D given current estimates for the other parameters and the gammas. * step2 : This function calculates lambda.zero, Lambda, alpha and Theta given current estimates for the other parameters and the gammas. It achieves this by performing a loop where first Theta and lambda.zero are estimated assuming alpha and Lambda are fixed (by calling step2a) and then alpha and Lambda are estimated assuming Theta and lambda.zero are fixed (by calling step2b). This loop continues until all parameters have converged. * step2a : This function calculates Theta and Lambda.zero assuming all other parameters are fixed. Theta is calculated by holding all but the jth column constant, maximizing (using least squares) the jth column and iterating through all columns until there is no further change. * step2b : This function calculates alpha and Lambda assuming all other parameters are fixed. Again a loop is performed where first alpha is fit assuming Lambda fixed and then Lambda is fit assuming alpha fixed. In calculating Lambda a secondary loop is used in a similar manner to that for fitting Theta. * Estep : This function performs the E step where the expected value of gamma_ij and gamma_ij * gamma_ij^T is calculated, given the current estimated parameters, for use in the M step. * calcobslike : This function calculates the observed log likelihood (up to constants) given current parameter estimates. * constraints : This function enforces the constraint (7) from the paper on the parameters. This means that the alphas can be interpreted as the number of standard deviations that the groups are apart etc. In addition to the 10 fitting functions there is a prediction function. * fldapred : This function produces the alpha hats used to provide a low dimensional pictorial representation of each curve. It also produces a class prediction for each curve. It takes as input the fit from fldafit and the original data (for training error rates) or a new data set (for test predictions). 3) IMPLEMENTATION ================== To fit the FLDA algorithm after installing the functions one should call fldafit. fldafit - Required Arguments The only required argument is a list containing the data. This list must have the following elements : * y : A vector of the raw data for each curve. The data should be ordered by class so observations from the first class should go first etc. * curve : A vector of the same length as y denoting which curve each observation is from. For example if the first curve has 4 observations the first four elements of this vector should be 1's etc. The curves are numbered from 1 to m_1 + ... + m_K where m_i is the number of elements in the ith class and K is the number of classes. * timeindex : Also of the same length as y. This vector records the time that each observation was observed. This is done relative to the grid you choose (see optional arguments below). For example if you use a grid of 101 equally spaced time points between age 10 years and 20 years and the first observation for the first curve is at age 12 years the first element of timeindex would be 21 etc. * class : A vector of length m_1 + ... + m_K. The first m_1 elements should be 1's etc through to the last m_K elements which should be K's. For example suppose you have two classes each with 2 curves with the following data. Class 1 Curve 1 Observations at times 12.2, 13.6 and 18.0 with values of 1.3, 1.5, 1.9 Class 1 Curve 2 Observations at times 10.0 and 20.0 with values of 4.5, 7.0 Class 2 Curve 1 Observations at times 11.0, 13.5 and 19.1 with values of 1.3, 6.0, 1.6 Class 2 Curve 2 Observations at times 13.0, 13.3, 14.7, 15.2 with values of 1.0, 5.6, 5.0, 5.0 Then y = 1.3, 1.5, 1.9, 4.5, 7.0, 1.3, 6.0, 1.6, 1.0, 5.6, 5.0, 5.0 curve = 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 timeindex = 23, 37, 81, 1, 101, 11, 36, 92, 31, 34, 48, 53 (assuming the above grid) and class = 1, 1, 2, 2 fldafit - Optional Arguments q : This is the dimension of the spline basis. The spline basis will be a natural cubic spline with q-2 equally spaced knots. Default is q=5. h : This is the dimension of alpha. The default is h=1. All the results in the paper use h=1 because interpretation is easier. For h>1 the covariance structures are non-diagonal so visual interpretation of differences between the alpha hats is difficult. In any case h must be less than K. p : This is the rank constraint on the gammas as outlined in section 5.1. If p=q then a full rank matrix is fit. For p