Data Science: Similarity Least Squares (SLSTM) + physics + Statistical Design of Experiment (DOE)

            A New Paradigm for Analysis and Management of Complexity


HCV project*
Breast cancer
Biliary Cirrhosis
MicroArray QC

Welcome to the home page of SLSTM technology, created by Data Scientist James Minor.
See Impact$$ for a sample of documented successes of SLSTM projects.

"Better clinical prognostics mean effective healthcare at lower cost!", SLSguy.

"Quality is the survival probability of a product or service (process) in a competitive market", SLSguy, 1994
"Quality supports business, but statistics drives quality!", SLSguy.

*Completed sites. Big news on the Diabetes site. Publications are noted on the HCV clinical and MicroArray QC sites.

SLSTM was originally called SMILES in the 1970s.
Please refresh your View (F5 key) to update all links on web pages.


SLSTM concept, a combination of physics and statistics 

In a data matrix consider each row (profile or pattern) as a physical object.
Each column becomes a property of the objects. Two rows (objects) are similar if their property values are "near" to each other. Statistical inference based on object similarity is the original SLSTM concept. The concept was first publicly referenced in support of environmental research: 
Miller, C., Filkin, D., Owens, A., Steed, J., and Jesson, P. (1981), “Two-D Model of Stratospheric Chemistry and Transport”, Journal of Geophysical Research, 86 (C12), 12039-12065.
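
The idea of inference from object similarity can be illustrated with a minimal sketch. This is not the SLSTM algorithm itself, whose details are not given here; it is a generic similarity-weighted average, and the function names, the Gaussian-style weighting, and the `scale` parameter are all assumptions made for illustration.

```python
import math

def distance(row_a, row_b):
    """Euclidean distance between two rows (objects) of a data matrix."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))

def similarity(row_a, row_b, scale=1.0):
    """Map distance to a similarity in (0, 1]; identical rows score 1.
    The exponential form and `scale` are illustrative assumptions."""
    d = distance(row_a, row_b)
    return math.exp(-(d / scale) ** 2)

def predict(query, rows, responses, scale=1.0):
    """Estimate the response at `query` as a similarity-weighted
    average of the known responses (nearby rows dominate)."""
    weights = [similarity(query, r, scale) for r in rows]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, responses)) / total

# Each row is an object; each column is one of its properties.
rows = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0]]
responses = [10.0, 12.0, 50.0]

# The query sits between the first two rows and far from the third,
# so the estimate is driven almost entirely by the first two responses.
estimate = predict([1.05, 2.05], rows, responses)
```

In this sketch the distant third row receives a negligible weight, which is the sense in which "near" rows carry the inference.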

Design of Experiment (DOE)

A statistical model is trained on data. DOE ensures that the training data produce a stable model by specifying a minimal set of critical locations in the functional domain. Only a stable model is suitable for validation.
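
As one concrete illustration of choosing locations in the domain, the sketch below builds a two-level full factorial design, a standard textbook DOE. It is offered as an example of the general idea, not as the design used in SLSTM projects; the function names are assumptions.

```python
from itertools import product

def full_factorial(n_factors):
    """All 2**n_factors runs with each factor at coded level -1 or +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]

def column_dot(design, i, j):
    """Dot product of design columns i and j across all runs."""
    return sum(run[i] * run[j] for run in design)

# Three factors give eight runs, one at each corner of the cube.
design = full_factorial(3)

# Distinct columns are orthogonal (dot product 0), so the effect of
# each factor can be estimated independently of the others.
balance_check = column_dot(design, 0, 1)
```

Orthogonal columns are one reason such designs yield stable least-squares estimates: no factor's effect is confounded with another's.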

DOE references for non-statisticians: Box, Hunter, and Hunter, "Statistics for Experimenters", and/or the ECHIP system of Bob Wheeler.

Support Points, a consequence of the SLSTM concept

Support points are a set of key locations in the function's domain that enable an SLSTM model to represent the function adequately, given good data coverage. "Super" support points (super vectors) can do this with a minimal number of locations and minimal data coverage. This reduces model complexity and enhances all forms of prediction, including extrapolation and sparse interpolation.
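
How support points are actually selected is not described here. Purely as an illustration of picking a small set of locations that covers a domain, the sketch below uses a generic greedy "farthest point" heuristic; the heuristic, the function name, and the sample points are all assumptions and should not be read as the SLSTM procedure.

```python
import math

def farthest_point_subset(points, k):
    """Greedy maximin selection (illustrative heuristic): start from the
    first point, then repeatedly add the point that is farthest from
    every point chosen so far. Returns indices into `points`."""
    chosen = [0]
    # Distance from each point to its nearest chosen point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen

# Six candidate locations in a unit square; two are near the origin.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0),
          (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

# Four locations chosen to spread across the domain.
support = farthest_point_subset(points, 4)
```

With these sample points the heuristic selects the four corners and skips the near-duplicate at (0.1, 0.0), which shows how a few well-placed locations can stand in for the full set.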

Validation, model performance in real time

What is the true error rate of the model? 
The model must be stable. Model instability compromises realistic validation.

What is the best error rate possible given the set of data variables?
The SLSTM system appears to approach this limit.
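
One common way to estimate a model's error rate on unseen data is leave-one-out cross-validation. The sketch below applies it to a simple nearest-neighbor classifier; the classifier, the function names, and the toy data are assumptions chosen for illustration and are not the SLSTM validation procedure.

```python
import math

def nearest_label(query, rows, labels):
    """Label of the training row closest to `query`."""
    best = min(range(len(rows)), key=lambda i: math.dist(query, rows[i]))
    return labels[best]

def leave_one_out_error(rows, labels):
    """Hold out each row in turn, predict it from the remaining rows,
    and return the fraction of held-out rows that are misclassified."""
    errors = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_label(rows[i], train_rows, train_labels) != labels[i]:
            errors += 1
    return errors / len(rows)

# Toy data: two well-separated groups of three objects each.
rows = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
labels = ["a", "a", "a", "b", "b", "b"]

error_rate = leave_one_out_error(rows, labels)
```

Because every prediction is made on a row the model has not seen, the estimate is only meaningful when the model changes little as single rows are removed, which is the stability requirement stated above.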




Copyright of James M Minor, July 4, 2004.
For problems or questions regarding this web site, contact email
Last updated: June 20, 2013.