2003 Volume 3 Issue 1 Pages 30-45
Among samples analyzed for gene expression, those that are incorrectly labeled or likely contaminated tend to show markedly different expression patterns. Such samples should be treated as outliers, because they can degrade the selection of informative genes for sample classification. We developed a method based on Akaike's Information Criterion (AIC) to detect them. The method does not require a significance level to be chosen, so decisions about which samples are outlying can be made objectively. We applied it to the public microarray data of Alon et al. (1999) and found that some of the detected outlying samples coincided with samples considered likely to be contaminated. Applying the method improved the discrimination of informative genes between tumor and normal tissues, and excluding the detected outliers yielded higher classification accuracy. Detecting outlying samples before sample classification is therefore essential, and the method described here provides a useful check.
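For reference, the standard definition of AIC, and the general way it replaces a significance test when choosing the number of outliers, can be written as below. This is a generic sketch, not the authors' specific criterion: the abstract does not state the likelihood model or parameter count used, so the per-model quantities here (the maximized likelihood of a model that treats s samples as outliers and that model's parameter count) are placeholders.

```latex
% Standard AIC: \hat{L} is the maximized likelihood, k the number of free parameters.
\mathrm{AIC} = -2 \ln \hat{L} + 2k

% Placeholder: \hat{L}_s and k_s belong to a candidate model that designates
% s samples as outliers (s = 0 means "no outliers").
\mathrm{AIC}(s) = -2 \ln \hat{L}_s + 2 k_s, \qquad s = 0, 1, \dots, s_{\max}

% The number of outliers is the candidate with the smallest AIC;
% no significance level \alpha enters this comparison.
\hat{s} = \operatorname*{arg\,min}_{0 \le s \le s_{\max}} \mathrm{AIC}(s)
```

Because candidates are ranked only by their AIC values, no rejection threshold has to be fixed in advance. This is the sense in which an AIC-based procedure is "free from a significance level".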