Format of .prp ProtPlot data files

The ProtPlot data is contained in a set of tissue- and histologic-specific .prp files. The set of .prp files constituting the ProtPlot database is included when you download ProtPlot. A .prp file is named using the following convention:

   {tissue name}_{histologic state}_tot.prp
where:
   {tissue name} can be, for example: brain, prostate, or pancreas. See the
            database file 'tissueNamesFile.txt' for a list of tissue names.
   {histologic state} can be a diagnosis category: normal, precancer, or cancer.
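
As a rough illustration of the naming convention, the following Python sketch splits a file name into its tissue name and histologic state. The function name parse_prp_filename is hypothetical and is not part of ProtPlot. The sketch assumes that the histologic state contains no underscore, so the last underscore before '_tot.prp' separates the two parts.

```python
from pathlib import PurePath

SUFFIX = "_tot.prp"  # fixed ending from the naming convention


def parse_prp_filename(path):
    """Return (tissue name, histologic state) for a .prp file name.

    Assumption: the histologic state has no underscore, so the split
    is made at the last underscore that precedes the '_tot.prp' suffix.
    """
    name = PurePath(path).name
    if not name.lower().endswith(SUFFIX):
        raise ValueError(f"not a ProtPlot .prp file name: {name!r}")
    stem = name[: -len(SUFFIX)]
    tissue, sep, state = stem.rpartition("_")
    if not sep or not tissue or not state:
        raise ValueError(f"cannot split tissue and state in: {name!r}")
    return tissue, state


print(parse_prp_filename("prostate_cancer_tot.prp"))
```

Checking the tissue name against 'tissueNamesFile.txt' is left out here, since the layout of that file is not described in this document.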
A .prp file is tab-delimited (the quotes shown in the example below are added here for clarity and are not part of the file). The first row is the tab-delimited list of field names; each following row holds the corresponding tab-delimited data. The order of the columns is not important. Additional columns may be included in the files, but a column is ignored if its name does not match any of the field names or alternate names in the list below.

The data loaded at initial startup must have the following fields: (pI, Mw, SP-ID, SP-ACC, expression, tissue). Data added later using (File menu | Use new PRP data file with this working database) requires only (SP-ID, expression), since the rest of the missing data is taken from the Master Protein Index entry.

'pI'	'Molecular Mass' 'SP-ACC' 'SP-ID'         'MaxESTexpr' 'Tissue' 'Family'
6.74	 31544            O00108   AQP3_HUMAN      0.044334972  30       1
5.05     44106            P08727   K1CS_HUMAN      0.152709348  30       1
9.58      9330            P42677   RS27_HUMAN      0.004975124  30       1
                 . . .
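
A file with this layout can be read with any tab-delimited parser. The following is a minimal Python sketch that uses the standard csv module; the function name read_prp is hypothetical and this is not ProtPlot's own reader. It keeps every value as text and does not interpret field names, so unknown columns are simply carried along.

```python
import csv
import io


def read_prp(stream):
    """Read tab-delimited .prp text: first row is field names, rest is data.

    Returns (header, rows), where each row is a dict keyed by field name.
    """
    reader = csv.reader(stream, delimiter="\t")
    first = next(reader, None)
    if first is None:
        raise ValueError("empty .prp data: no field-name row")
    header = [name.strip() for name in first]
    rows = []
    for fields in reader:
        if not any(field.strip() for field in fields):
            continue  # skip blank lines
        rows.append({name: value.strip() for name, value in zip(header, fields)})
    return header, rows


# The first two data rows of the example above, written with real tabs.
sample = (
    "pI\tMolecular Mass\tSP-ACC\tSP-ID\tMaxESTexpr\tTissue\tFamily\n"
    "6.74\t31544\tO00108\tAQP3_HUMAN\t0.044334972\t30\t1\n"
    "5.05\t44106\tP08727\tK1CS_HUMAN\t0.152709348\t30\t1\n"
)
header, rows = read_prp(io.StringIO(sample))
print(rows[0]["SP-ID"], float(rows[0]["MaxESTexpr"]))
```

To read a file on disk, the same function could be given an open file object, for example open(path, newline="").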

where the following lists the fields (names are case-independent) and their alternate names:

  1. 'pI' is the estimated isoelectric point of the protein. It is a decimal number (e.g., 4.61). Alternate names: (pI, pIe)
  2. 'Molecular Mass' is the molecular mass in Daltons (not kiloDaltons!). Alternate names: (Mw, Molecular Mass)
  3. 'SP-ACC' is the SwissProt Accession number. Alternate names: (SP_ACC, SP-ACC, SPACC, "SwissProt Acc")
  4. 'SP-ID' is the SwissProt ID. Alternate names: (SP_ID, SP-ID, SPID, "SwissProt ID")
  5. 'GB-ID' is the GenBank ID (optional field). Alternate names: (GB_ID, GB-ID, GBID, "GenBank ID")
  6. 'MaxESTexpr' is the derived expression in the range of 0.0 to 1.0. Missing proteins are not entered. (NOTE: in the master protein index computed across all samples, an expression of 0.0 in a protein's expression profile indicates that there is no protein for that particular tissue.) See the discussion on how MaxESTexpr is computed for ProtPlot. Alternate names: (MaxESTexpr, "Max EST expr", estExpr, expr, expression).
  7. 'Tissue' specifies the tissue(s) that constitute the sample. Alternate names: (Tissue, "Tissue Name"). It is either:
    1. a tissue number from the 'tissueNamesFile.txt' file, or
    2. a hexadecimal bit pattern of several tissue numbers (e.g., a mixture of tissues), represented by the sum of (2**tissue(i)) over the set of n tissues.
  8. 'Family' specifies the protein families that the protein belongs to. It is a hexadecimal bit pattern of several protein family numbers from the 'familyNamesFile.txt' file. The mixture of families is the sum of (2**family(i)) over the set of n families. Alternate names: (Family, "Family Name", "Protein Family", "Protein Family Name"). [FUTURE] (This data is optional).
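
The field-name matching and the bit-pattern encoding described above can be sketched in Python as follows. This is an illustration under stated assumptions, not ProtPlot's implementation: the function names and the canonical names chosen as dictionary keys are hypothetical, and the alias table covers only part of the alternate names. How ProtPlot tells a single tissue number apart from a bit pattern, and whether it writes a '0x' prefix, is not stated in this document, so the sketch handles only the bit-pattern case.

```python
# Hypothetical canonical names mapped to lower-cased alternate names.
ALIASES = {
    "pI": ("pi", "pie"),
    "Mw": ("mw", "molecular mass"),
    "SP-ACC": ("sp_acc", "sp-acc", "spacc", "swissprot acc"),
    "SP-ID": ("sp_id", "sp-id", "spid", "swissprot id"),
    "GB-ID": ("gb_id", "gb-id", "gbid", "genbank id"),
    "expression": ("maxestexpr", "max est expr", "estexpr", "expr", "expression"),
    "Tissue": ("tissue", "tissue name"),
    "Family": ("family", "family name", "protein family", "protein family name"),
}
LOOKUP = {alias: canon for canon, names in ALIASES.items() for alias in names}


def canonical_field(name):
    """Map a column name to its canonical name, ignoring case and quotes.

    Returns None for unknown columns, which a reader would then ignore.
    """
    key = name.strip().strip("'\"").strip().lower()
    return LOOKUP.get(key)


def encode_bit_pattern(numbers):
    """Sum 2**n over the distinct numbers and return it as hexadecimal text."""
    return format(sum(1 << n for n in set(numbers)), "x")


def decode_bit_pattern(text):
    """Return the sorted list of numbers whose bits are set in a hex pattern."""
    value = int(text.strip(), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(canonical_field("Molecular Mass"))  # Mw
print(encode_bit_pattern([1, 3]))         # 2**1 + 2**3 = 10, hex 'a'
print(decode_bit_pattern("a"))            # [1, 3]
```

A reader built on this sketch could check that the canonical names required at startup (pI, Mw, SP-ID, SP-ACC, expression, Tissue) are all present in the header row before loading the data.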


Djamel Medjahed, LMT, SAIC-Frederick
Peter Lemkin, LECB, NCI-Frederick

Revised: 08-26-2004