Class Cluster

java.lang.Object
  |
  +--Cluster

public class Cluster
extends java.lang.Object

Class Cluster.java - Cluster the current ProtPlot data. This work was produced by Peter Lemkin of the National Cancer Institute, an agency of the United States Government and Djamel Medjahed (SAIC-Frederick). As a work of the United States Government there is no associated copyright. It is offered as open source software under the Mozilla Public License (version 1.1) subject to the limitations noted in the accompanying LEGAL file.

Version:
$Date: 2002/10/14 20:17:17 $ $Revision: $
Author:
P. Lemkin (NCI), Djamel Medjahed (SAIC), NCI-Frederick, Frederick, MD
See Also:
ProtPlot Home

This work was derived from MAExplorer under the Mozilla 1.1 Open Source Public License by Peter Lemkin of the National Cancer Institute, an agency of the United States Government subject to the limitations noted in the accompanying LEGAL file. See licence info on http://maexplorer.sourceforge.org/


Field Summary
static boolean allowClusterReportUpdatesFlag
          flag set if clustering is enabled for the current cluster method
static int CLUST_KMEANS_PROTEINS_METHOD
          K-means proteins clustering method
static int CLUST_SIMILAR_PROTEINS_METHOD
          Similar proteins clustering method
 boolean clusterDataReadyFlag
          flag that indicates that the cluster data exists
static int clusterMethod
          Cluster method computation method using the CLUST_xxxx values
static int DIST_AVG_ALL_DATA
          Jain method 4 for computing distance using missing values using average of all data in the database.
static int DIST_FISHER_LOW_VALUES
          method for computing distance using the Fisher clustering low values distance metric that maps low or missing values counts to a low value (i.e. a count of 1).
static int DIST_WEIGHT
          Jain method 3 for computing distance using missing values by weighting
(package private)  float distIJ
          current distance between mPidI and mPidJ
static int distMode
          distance computation method using the DIST_xxxx values
(package private)  float[] distSimilarCluster
          distances [0:mMasterPid-1] sorted by prp.mPidSimilarCluster[] list
static float distThreshold
          Distance threshold for similarity clustering
(package private)  float[] dMean
          mean distances for all samples [0:nPRPs-1]
static float MAX_DISTANCE
          value to use if no samples exist
(package private)  float[][] mExprVector
          Master Protein List [0:nMasterPids-1][0:nSamples-1] of protein expression values from all PRP DBs
(package private)  int mPidI
          current mPid i
(package private)  int mPidJ
          current mPid J
(package private)  int nBoth
          # of samples in both mPidI and mPidJ in current expr profile sample list
(package private)  int nMissing
          # of missing samples in either mPidI or mPidJ in current expr profile sample list
(package private)  int nPRPs
          # of samples in entire DB
(package private)  int nSamples
          # of samples in current expr profile sample list
static boolean popupClusterWindowFlag
          flag set to pop up the cluster control window GUI
private static ProtPlot prp
          instance of ProtPlot
 java.lang.String simClustTitle
          title for current similar cluster string
 ShowStringPopup similarRptPopup
          the instance of the cluster report.
private static UtilPRP util
          Utility instance
 
Constructor Summary
Cluster(ProtPlot prp)
          Cluster() - constructor to setup globals for clustering
 
Method Summary
 float calcDistIJ(int mPidI, int mPidJ)
          calcDistIJ() - compute the distance between mPidI and mPidJ using the distMode method.
 void calcMeanDistIJ()
          calcMeanDistIJ() - compute mean distances for all samples
 boolean cluster()
          cluster() - cluster if allowClusterReportUpdatesFlag enabled using the current clusterMethod.
 boolean clusterKMeansProteins()
          clusterKMeansProteins() - cluster proteins passing the K-means filter.
 boolean clusterKMeansProteins(boolean popupClusterWindowFlag)
          clusterKMeansProteins() - cluster proteins passing the filter similar to the current protein.
 boolean clusterSimilarProteins()
          clusterSimilarProteins() - cluster proteins passing the filter similar to the current protein.
 boolean clusterSimilarProteins(boolean popupClusterWindowFlag)
          clusterSimilarProteins() - cluster proteins passing the filter similar to the current protein.
 float distFisherLowValues(int mPidI, int mPidJ)
          distFisherLowValues() - compute the Fisher clustering low values distance metric by mapping low or missing values counts to a low value (i.e. a count of 1).
 float distMissing(int mPidI, int mPidJ)
          distMissing() - return average missing distance between mPidI and mPidJ.
 float distMissingIJ(int s, int mPidI, int mPidJ)
          distMissingIJ() - compute sample s distance between Esi and Esj for mPidI and mPidJ.
 float distMissingMean(int mPidI, int mPidJ)
          distMissingMean() - return average missing distance between mPidI and mPidJ.
 float distMissingMeanIJ(int s, int mPidI, int mPidJ)
          distMissingMeanIJ() - compute sample s distance between Esi and Esj for mPidI and mPidJ.
 boolean isSampleInExprProfileList(int s)
          isSampleInExprProfileList() - test if sample s is in current Expr Profile list of samples
 void saveCluster()
          saveCluster() - save current clustered proteins into prp.savedClusterMPIDs[]
static float setClusterMethod(int clustMethod)
          setClusterMethod() - set method to use for clustering
static int setDistMode(int distanceMode)
          setDistMode() - set the distance computation method to use for clustering
static float setDistThreshold(float distThr)
          setDistThreshold() - set the distance threshold to use for clustering
 java.lang.String similarProteinsReportStr()
          similarProteinsReportStr() - compute report string of similar-cluster of filtered proteins
static boolean updateSeedProtein(int newSeedMPID)
          updateSeedProtein() - update the seed protein mPid used for clustering and recluster if enabled
 void updateSimilarProteinsReport()
          updateSimilarProteinsReport() - compute report string of similar-cluster of filtered proteins [TODO] This will update the Cluster Report window text area.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

MAX_DISTANCE

public static final float MAX_DISTANCE
value to use if no samples exist

DIST_WEIGHT

public static final int DIST_WEIGHT
Jain method 3 for computing distance using missing values by weighting

DIST_AVG_ALL_DATA

public static final int DIST_AVG_ALL_DATA
Jain method 4 for computing distance using missing values using average of all data in the database.

DIST_FISHER_LOW_VALUES

public static final int DIST_FISHER_LOW_VALUES
method for computing distance using the Fisher clustering low values distance metric that maps low or missing values counts to a low value (i.e. a count of 1). NOTE: this requires the actual counts data.

CLUST_SIMILAR_PROTEINS_METHOD

public static final int CLUST_SIMILAR_PROTEINS_METHOD
Similar proteins clustering method

CLUST_KMEANS_PROTEINS_METHOD

public static final int CLUST_KMEANS_PROTEINS_METHOD
K-means proteins clustering method

prp

private static ProtPlot prp
instance of ProtPlot

util

private static UtilPRP util
Utility instance

mExprVector

float[][] mExprVector
Master Protein List [0:nMasterPids-1][0:nSamples-1] of protein expression values from all PRP DBs

nPRPs

int nPRPs
# of samples in entire DB

nSamples

int nSamples
# of samples in current expr profile sample list

nMissing

int nMissing
# of missing samples in either mPidI or mPidJ in current expr profile sample list

nBoth

int nBoth
# of samples in both mPidI and mPidJ in current expr profile sample list

mPidI

int mPidI
current mPid i

mPidJ

int mPidJ
current mPid J

distIJ

float distIJ
current distance between mPidI and mPidJ

dMean

float[] dMean
mean distances for all samples [0:nPRPs-1]

distSimilarCluster

float[] distSimilarCluster
distances [0:mMasterPid-1] sorted by prp.mPidSimilarCluster[] list

simClustTitle

public java.lang.String simClustTitle
title for current similar cluster string

clusterDataReadyFlag

public boolean clusterDataReadyFlag
flag that indicates that the cluster data exists

allowClusterReportUpdatesFlag

public static boolean allowClusterReportUpdatesFlag
flag set if clustering is enabled for the current cluster method

popupClusterWindowFlag

public static boolean popupClusterWindowFlag
flag set to pop up the cluster control window GUI

distThreshold

public static float distThreshold
Distance threshold for similarity clustering

distMode

public static int distMode
distance computation method using the DIST_xxxx values

clusterMethod

public static int clusterMethod
Cluster method computation method using the CLUST_xxxx values

similarRptPopup

public ShowStringPopup similarRptPopup
the instance of the cluster report. We make this public since we may update it from elsewhere.

Constructor Detail

Cluster

public Cluster(ProtPlot prp)
Cluster() - constructor to setup globals for clustering
Parameters:
prp - is the instance of ProtPlot

Method Detail

setDistMode

public static int setDistMode(int distanceMode)
setDistMode() - set the distance computation method to use for clustering
Parameters:
distanceMode - is method to use
Returns:
the mode if successful, else -1

setDistThreshold

public static float setDistThreshold(float distThr)
setDistThreshold() - set the distance threshold to use for clustering
Parameters:
distThr - is the distance threshold to use if > 0.0
Returns:
the threshold if successful, else the threshold is set to MAX_DISTANCE_SQ
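
The validation behaviour described for setDistMode() and setDistThreshold() could look roughly like the sketch below. It is not the ProtPlot source: the class name ClusterSettingsSketch, the numeric values of the DIST_xxxx constants, and the value of MAX_DISTANCE_SQ are all assumptions made for illustration only.

```java
/** Illustrative stand-in for the static setters; not the ProtPlot implementation. */
final class ClusterSettingsSketch {
    // Hypothetical constant values; the real values are not given in this page.
    static final int DIST_WEIGHT = 1;
    static final int DIST_AVG_ALL_DATA = 2;
    static final int DIST_FISHER_LOW_VALUES = 3;
    static final float MAX_DISTANCE_SQ = Float.MAX_VALUE; // placeholder value

    static int distMode = DIST_WEIGHT;
    static float distThreshold = MAX_DISTANCE_SQ;

    /** Accept only known DIST_xxxx values; return the mode, or -1 if rejected. */
    static int setDistMode(int distanceMode) {
        switch (distanceMode) {
            case DIST_WEIGHT:
            case DIST_AVG_ALL_DATA:
            case DIST_FISHER_LOW_VALUES:
                distMode = distanceMode;
                return distMode;
            default:
                return -1; // leave the current mode unchanged
        }
    }

    /** Use distThr if it is > 0.0, otherwise fall back to MAX_DISTANCE_SQ. */
    static float setDistThreshold(float distThr) {
        distThreshold = (distThr > 0.0f) ? distThr : MAX_DISTANCE_SQ;
        return distThreshold;
    }

    public static void main(String[] args) {
        System.out.println(setDistMode(DIST_AVG_ALL_DATA));
        System.out.println(setDistMode(99));
        System.out.println(setDistThreshold(0.5f));
    }
}
```

Returning a sentinel (-1) instead of throwing matches the "Returns:" text above; callers are expected to check the result.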

updateSeedProtein

public static boolean updateSeedProtein(int newSeedMPID)
updateSeedProtein() - update the seed protein mPid used for clustering and recluster if enabled
Parameters:
newSeedMPID - is the new seed mPid protein
Returns:
true if successful

setClusterMethod

public static float setClusterMethod(int clustMethod)
setClusterMethod() - set method to use for clustering
Parameters:
clustMethod - is the cluster method to use, given as one of the CLUST_xxxx values (CLUST_SIMILAR_PROTEINS_METHOD, CLUST_KMEANS_PROTEINS_METHOD)
Returns:
the clusterMethod if successful, else -1

cluster

public boolean cluster()
cluster() - cluster if allowClusterReportUpdatesFlag enabled using the current clusterMethod.

calcDistIJ

public float calcDistIJ(int mPidI,
                        int mPidJ)
calcDistIJ() - compute the distance between mPidI and mPidJ using the distMode method.
Parameters:
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above.

calcMeanDistIJ

public void calcMeanDistIJ()
calcMeanDistIJ() - compute mean distances for all samples
                         n   i-1
 dMean[s]= (2/n(n-1)) * Sum  Sum | esi-esj | 
                        i=1  j=0

 See A. K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall,
 NJ, 1988. Section 2.2.3 Missing Data, page 20 for discussion of this 
 algorithm.
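
A minimal sketch of the formula above, assuming esi denotes the expression value of protein i in sample s, that the array is laid out as [protein][sample] like mExprVector, and that the double sum is meant to run over all unordered pairs among n proteins. The class and method names are hypothetical, and no special handling of missing values is attempted here.

```java
/** Illustrative only: per-sample mean of pairwise absolute differences. */
final class MeanDistanceSketch {

    /**
     * expr[p][s] is the expression value of protein p in sample s.
     * Returns dMean[s] = (2 / (n(n-1))) * sum over pairs i > j of |e_si - e_sj|.
     */
    static float[] meanDistances(float[][] expr, int nSamples) {
        int n = expr.length;
        float[] dMean = new float[nSamples];
        if (n < 2) {
            return dMean; // no pairs to average
        }
        double scale = 2.0 / ((double) n * (n - 1));
        for (int s = 0; s < nSamples; s++) {
            double sum = 0.0;
            for (int i = 1; i < n; i++) {
                for (int j = 0; j < i; j++) {
                    sum += Math.abs(expr[i][s] - expr[j][s]);
                }
            }
            dMean[s] = (float) (sum * scale);
        }
        return dMean;
    }

    public static void main(String[] args) {
        float[][] expr = { {1f, 10f}, {3f, 10f}, {5f, 40f} };
        float[] dMean = meanDistances(expr, 2);
        System.out.println(dMean[0] + " " + dMean[1]);
    }
}
```

The factor 2/(n(n-1)) is the reciprocal of the number of unordered pairs, so each dMean[s] is a plain average over pairs.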

distMissingIJ

public float distMissingIJ(int s,
                           int mPidI,
                           int mPidJ)
distMissingIJ() - compute sample s distance between Esi and Esj for mPidI and mPidJ.
 If Esi or Esj is missing (i.e. 0.0), then return -1.0
 else return |Esi - Esj|.

 See A. K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall,
 NJ, 1988. Section 2.2.3 Missing Data, page 19 for (method 3) discussion of
 this algorithm.
Parameters:
s - is the sample index
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above, -1.0F if bad data.
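
The per-sample rule can be sketched as follows. This is an illustration under the stated convention that a value of 0.0 marks missing data; the class name and the explicit expr parameter are assumptions (the real method reads its data from instance fields).

```java
/** Illustrative only: per-sample distance with a missing-data sentinel. */
final class SampleDistanceSketch {

    /**
     * expr[p][s] is the expression value of protein p in sample s.
     * Returns |Esi - Esj|, or -1.0F if either value is missing (0.0).
     */
    static float distMissingIJ(float[][] expr, int s, int mPidI, int mPidJ) {
        float esi = expr[mPidI][s];
        float esj = expr[mPidJ][s];
        if (esi == 0.0f || esj == 0.0f) {
            return -1.0f; // signal "bad data" to the caller
        }
        return Math.abs(esi - esj);
    }

    public static void main(String[] args) {
        float[][] expr = { {1f, 0f}, {3f, 5f} };
        System.out.println(distMissingIJ(expr, 0, 0, 1));
        System.out.println(distMissingIJ(expr, 1, 0, 1));
    }
}
```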

distMissing

public float distMissing(int mPidI,
                         int mPidJ)
distMissing() - return average missing distance between mPidI and mPidJ.
 dist(i,j) = (nSamples/nBoth) SUM dist(s,i,j)
                              s in EP list

 See A. K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall,
 NJ, 1988. Section 2.2.3 Missing Data, page 20 (method 3) for discussion of
 this algorithm.
Parameters:
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above.
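
One way to read the weighted sum above: samples where either value is missing are skipped, and the total is rescaled by nSamples/nBoth to compensate. The sketch below assumes the expression profile (EP) list is an array of sample indices, that 0.0 marks a missing value, and that MAX_DISTANCE is returned when no sample has both values; the value used for MAX_DISTANCE is a placeholder, and the class name is hypothetical.

```java
/** Illustrative only: Jain and Dubes "method 3" style weighted distance. */
final class MissingDistanceSketch {
    static final float MAX_DISTANCE = Float.MAX_VALUE; // placeholder value

    /** expr[p][s] is the expression value of protein p in sample s. */
    static float distMissing(float[][] expr, int[] epSamples, int mPidI, int mPidJ) {
        int nSamples = epSamples.length;
        int nBoth = 0;   // samples where both proteins have data
        float sum = 0.0f;
        for (int s : epSamples) {
            float esi = expr[mPidI][s];
            float esj = expr[mPidJ][s];
            if (esi == 0.0f || esj == 0.0f) {
                continue; // missing in one or both: contributes nothing
            }
            sum += Math.abs(esi - esj);
            nBoth++;
        }
        if (nBoth == 0) {
            return MAX_DISTANCE; // no usable samples
        }
        return ((float) nSamples / nBoth) * sum;
    }

    public static void main(String[] args) {
        float[][] expr = { {1f, 0f, 4f, 2f}, {3f, 5f, 0f, 2f} };
        System.out.println(distMissing(expr, new int[] {0, 1, 2, 3}, 0, 1));
    }
}
```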

distMissingMeanIJ

public float distMissingMeanIJ(int s,
                               int mPidI,
                               int mPidJ)
distMissingMeanIJ() - compute sample s distance between Esi and Esj for mPidI and mPidJ.
 If Esi or Esj is missing (i.e. 0.0), then return dMean
 else return (Esi - Esj).

 See A. K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall,
 NJ, 1988. Section 2.2.3 Missing Data, page 20 (method 4) for discussion 
 of this algorithm.
Parameters:
s - is the sample index
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above.

distMissingMean

public float distMissingMean(int mPidI,
                             int mPidJ)
distMissingMean() - return average missing distance between mPidI and mPidJ.
 dist(i,j) = (nSamples/nBoth) SUM distMean(s,dMean,i,j)
                              s in EP list

 See A. K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall,
 NJ, 1988. Section 2.2.3 Missing Data, page 20 (method 4) for discussion of
 this algorithm.
Parameters:
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above.
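
The mean-substitution variant can be sketched in the same style. The code follows the formula literally, including the nSamples/nBoth factor, and makes several assumptions: the per-sample term uses the absolute difference, dMean is indexed by sample, 0.0 marks a missing value, and MAX_DISTANCE (placeholder value) is returned when nBoth is zero. Class and parameter names are hypothetical.

```java
/** Illustrative only: Jain and Dubes "method 4" style distance with mean substitution. */
final class MeanSubstitutionDistanceSketch {
    static final float MAX_DISTANCE = Float.MAX_VALUE; // placeholder value

    /** Per-sample term: dMean[s] if either value is missing, else |Esi - Esj|. */
    static float distMissingMeanIJ(float[][] expr, float[] dMean, int s, int mPidI, int mPidJ) {
        float esi = expr[mPidI][s];
        float esj = expr[mPidJ][s];
        if (esi == 0.0f || esj == 0.0f) {
            return dMean[s];
        }
        return Math.abs(esi - esj);
    }

    /** dist(i,j) = (nSamples/nBoth) * sum over EP samples of the per-sample term. */
    static float distMissingMean(float[][] expr, float[] dMean, int[] epSamples,
                                 int mPidI, int mPidJ) {
        int nBoth = 0;
        float sum = 0.0f;
        for (int s : epSamples) {
            if (expr[mPidI][s] != 0.0f && expr[mPidJ][s] != 0.0f) {
                nBoth++;
            }
            sum += distMissingMeanIJ(expr, dMean, s, mPidI, mPidJ);
        }
        if (nBoth == 0) {
            return MAX_DISTANCE;
        }
        return ((float) epSamples.length / nBoth) * sum;
    }

    public static void main(String[] args) {
        float[][] expr = { {1f, 0f, 4f}, {3f, 5f, 0f} };
        float[] dMean = { 9f, 2f, 3f };
        System.out.println(distMissingMean(expr, dMean, new int[] {0, 1, 2}, 0, 1));
    }
}
```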

distFisherLowValues

public float distFisherLowValues(int mPidI,
                                 int mPidJ)
distFisherLowValues() - compute the Fisher clustering low values distance metric by mapping low or missing values counts to a low value (i.e. a count of 1). [TODO] Note this requires the actual counts data (not available as of V0.29.2).
Parameters:
mPidI - is protein mPid # i
mPidJ - is protein mPid # j
Returns:
distance computed as above.

isSampleInExprProfileList

public boolean isSampleInExprProfileList(int s)
isSampleInExprProfileList() - test if sample s is in current Expr Profile list of samples
Parameters:
s - sample to test
Returns:
true if s is in the current expr profile sample list.

clusterKMeansProteins

public boolean clusterKMeansProteins(boolean popupClusterWindowFlag)
clusterKMeansProteins() - cluster proteins passing the filter similar to the current protein. Cluster all filtered genes, prp.mPidFiltered[0:prp.nMpidFiltered-1], by distance from curMPID. If successful, it builds list prp.mPidSimilarCluster[0:prp.nMpidSimilarCluster-1].
Returns:
true if successful
[TODO] see clusterKMeansProteins() for notes

clusterKMeansProteins

public boolean clusterKMeansProteins()
clusterKMeansProteins() - cluster proteins passing the K-means filter. Cluster all filtered genes, prp.mPidFiltered[0:prp.nMpidFiltered-1], by K-means for the number of clusters specified by a slider. If successful, it builds list prp.mPidSimilarCluster[0:prp.nMpidSimilarCluster-1].
Returns:
true if successful
[TODO] 1) implement # of clusters slider, 2) implement K-means, etc. 3) need to add data structure to support cluster identification

clusterSimilarProteins

public boolean clusterSimilarProteins(boolean popupClusterWindowFlag)
clusterSimilarProteins() - cluster proteins passing the filter similar to the current protein. Cluster all filtered genes, prp.mPidFiltered[0:prp.nMpidFiltered-1], by distance from curMPID. If successful, it builds list prp.mPidSimilarCluster[0:prp.nMpidSimilarCluster-1].
Returns:
true if successful

clusterSimilarProteins

public boolean clusterSimilarProteins()
clusterSimilarProteins() - cluster proteins passing the filter similar to the current protein. Cluster all filtered genes, prp.mPidFiltered[0:prp.nMpidFiltered-1], by distance from curMPID. If successful, it builds list prp.mPidSimilarCluster[0:prp.nMpidSimilarCluster-1].
Returns:
true if successful
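
The similar-proteins step can be pictured as a threshold filter around a seed protein followed by a sort on distance. The sketch below is an assumption-laden illustration, not the ProtPlot code: it uses a plain mean absolute difference as the distance (the real class selects a metric via distMode), keeps the seed itself if it is in the filtered list, and returns the member list instead of writing to prp.mPidSimilarCluster[]. All names other than those quoted from this page are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: cluster filtered proteins by distance from a seed protein. */
final class SimilarClusterSketch {

    /** Simple stand-in metric: mean absolute difference across samples. */
    static float meanAbsDiff(float[] a, float[] b) {
        float sum = 0.0f;
        for (int s = 0; s < a.length; s++) {
            sum += Math.abs(a[s] - b[s]);
        }
        return a.length == 0 ? 0.0f : sum / a.length;
    }

    /**
     * Returns the mPids from mPidFiltered whose distance to seedMPid is
     * at most distThreshold, ordered by increasing distance.
     */
    static int[] clusterSimilar(float[][] expr, int[] mPidFiltered,
                                int seedMPid, float distThreshold) {
        float[] distToSeed = new float[expr.length];
        List<Integer> hits = new ArrayList<>();
        for (int mPid : mPidFiltered) {
            distToSeed[mPid] = meanAbsDiff(expr[seedMPid], expr[mPid]);
            if (distToSeed[mPid] <= distThreshold) {
                hits.add(mPid);
            }
        }
        // List.sort is stable, so ties keep their filtered-list order.
        hits.sort(Comparator.comparingDouble(m -> distToSeed[m]));
        int[] cluster = new int[hits.size()];
        for (int k = 0; k < cluster.length; k++) {
            cluster[k] = hits.get(k);
        }
        return cluster;
    }

    public static void main(String[] args) {
        float[][] expr = { {1f, 1f}, {1f, 2f}, {5f, 5f}, {2f, 1f} };
        int[] cluster = clusterSimilar(expr, new int[] {0, 1, 2, 3}, 0, 1.0f);
        System.out.println(java.util.Arrays.toString(cluster));
    }
}
```

Keeping the distances alongside the sorted member list mirrors the relationship between distSimilarCluster[] and prp.mPidSimilarCluster[] described in the field summary.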

updateSimilarProteinsReport

public void updateSimilarProteinsReport()
updateSimilarProteinsReport() - compute report string of similar-cluster of filtered proteins [TODO] This will update the Cluster Report window text area. Until that is finished, it will generate a new similar cluster report window.

similarProteinsReportStr

public java.lang.String similarProteinsReportStr()
similarProteinsReportStr() - compute report string of similar-cluster of filtered proteins
Returns:
report string if successful, else null

saveCluster

public void saveCluster()
saveCluster() - save current clustered proteins into prp.savedClusterMPIDs[]