ClusterFinder: The Simplified Solution for Metabolomics Data Analysis

ClusterFinder
ClusterFinder (CF) is a data analysis software tool for targeted and non-targeted metabolomics. It processes isotopically enriched metabolomics data built around 13C at 93-97% and 2-5% abundance (typically 95% and 5% 13C). CF's suppression-correction algorithms correct for most of the variance (e.g., instrumentation error, injector, source), and its normalization algorithms then correct for a large part of the remaining variance. ClusterFinder outputs three values: 1) the raw (suppressed) value observed; 2) a suppression-corrected value; and 3) a normalized (suppression-corrected and normalized) value.

ClusterFinder™ is provided with any kit purchase and was developed to support the analysis of IROA LC-MS data files generated using the IROA TruQuant Yeast Extract Semi-targeted QC Workflow Kit.

ClusterFinder metabolomics data analysis benefits:

  • Accurate compound formula ID for MS alone; complete ID with the addition of SWATH, or IM, even at low concentrations
  • Unique IROA patterns discriminate peaks of biological origin from artifacts, allowing the removal of false data
  • All IROA-based fragments have the IROA ratio pattern derived from their parent peaks and can be identified using the “peak correlation” ClusterFinder module
  • Suppression-corrected measurements for significantly better quantitation
  • Reproducibility: Dual MSTUS sample normalization to a universal standard for complete comparability
  • Batch-to-batch correction
  • Ensures high-level daily instrument QC for accurate and reproducible results
  • ClusterFinder builds libraries, IDs/quantitates compounds, and normalizes data. View the Protocol and ClusterFinder videos to see how IROA can be used for your Metabolomics data analysis workflows.
  • ClusterFinder: Your Solution to a Streamlined Metabolomic Workflow!

Suppression Correction Algorithm

The unique IROA labeling pattern ensures that the monoisotopic peaks and the carbon envelope of the associated isotopic peaks (M-1, etc.) can be detected during LC-MS. The carbon envelope differentiates the IROA-IS from natural abundance peaks and is used to identify compounds of interest and exclude artifacts that may otherwise look similar.

The IROA-IS is a true Internal Standard and can be spiked into any natural abundance experimental sample (cells, tissue biopsy, plant material, blood, etc.). All the IS peaks can then be easily identified by the ClusterFinder software from their characteristic M-1 peak and associated carbon envelope. This pattern provides enough information for complete identification and quantitation of samples without the need for chromatographic baseline correction.

Fundamental to the IROA concepts (and inherent in the name Isotopic Ratio Outlier Analysis) is the fact that the ratio of the C-12 envelope to the C-13 envelope is unaffected by suppression even though both the C-12 and C-13 isotopomeric sets may be strongly suppressed.  This has afforded a mechanism for suppression correction that has been built into ClusterFinder.
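
As a small illustration of this point, the sketch below applies the same fractional signal loss to both envelopes and shows that their ratio does not change. It is not ClusterFinder code: the function names and the numbers are hypothetical, and suppression is modelled as a simple multiplicative loss that acts equally on both envelopes, as described above.

```python
def envelope_ratio(c12_signal: float, c13_signal: float) -> float:
    """Ratio of the summed C12 envelope to the summed C13 envelope."""
    if c13_signal <= 0:
        raise ValueError("C13 envelope signal must be positive")
    return c12_signal / c13_signal


def suppress(signal: float, fraction_lost: float) -> float:
    """Hypothetical suppression model: the same fraction of signal is lost."""
    return signal * (1.0 - fraction_lost)


# Hypothetical unsuppressed envelope signals for one compound.
true_c12, true_c13 = 8000.0, 4000.0

for loss in (0.0, 0.5, 0.75):
    c12 = suppress(true_c12, loss)
    c13 = suppress(true_c13, loss)
    # Both envelopes shrink, but their ratio stays the same.
    print(f"loss={loss:.2f}  C12={c12:.0f}  C13={c13:.0f}  "
          f"ratio={envelope_ratio(c12, c13):.2f}")
```

Under this assumption the absolute signals fall as suppression increases while the printed ratio is identical in every row, which is the property the suppression correction relies on.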

The suppression-correction algorithm starts by determining the true ratio of the total number of molecules in the C12 and C13 envelopes, and then multiplies this ratio by the “unsuppressed value” of the molecule at hand. This value can be obtained in several ways:

    1. The “least unsuppressed value” can be the largest C13 value seen within an experiment for each compound. This value can then be used within an experiment and will recover the effects of suppression as seen in the experiment.
    2. The “least unsuppressed value” may be the largest C13 value seen in a sample that is otherwise a blank (i.e., containing no sample).
    3. A more accurate and enduring value for the “unsuppressed value” can be determined in a specific experiment in which the internal standard solution is serially diluted and analyzed in the absence of any other sample.
    4. If there is a quantitatively accurate determination of the concentration of the internal standard for the peak in question, then the ratio may simply be multiplied by this concentration to obtain an approximately accurate concentration value.

Note that options 1 and 2 require no data beyond the experiment itself, while options 3 and 4 rely on experimentally established quantitative values that, once established, may be used at any time, provided that an identical aliquot of the same internal standard is used to re-solvate the dried sample. Options 3 and 4 are therefore solutions that can be reused over longer periods of time, across many experiments, or even in very large studies with hundreds or thousands of samples.
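
The arithmetic behind the options above can be sketched as follows. This is an illustrative reading of the text, not the ClusterFinder implementation: the function names, the sample values, and the 20 µM internal standard concentration are all hypothetical.

```python
from __future__ import annotations


def reference_from_experiment(c13_values: list[float]) -> float:
    """Option 1: the largest C13 value seen for a compound in the experiment."""
    return max(c13_values)


def suppression_corrected(c12: float, c13: float, reference_c13: float) -> float:
    """Multiply the C12/C13 ratio by the reference ("unsuppressed") C13 value."""
    if c13 <= 0:
        raise ValueError("C13 envelope signal must be positive")
    return (c12 / c13) * reference_c13


def concentration_from_ratio(c12: float, c13: float, is_concentration: float) -> float:
    """Option 4: multiply the ratio by a known internal standard concentration."""
    if c13 <= 0:
        raise ValueError("C13 envelope signal must be positive")
    return (c12 / c13) * is_concentration


# Hypothetical (C12, C13) envelope signals for one compound in three samples.
samples = [(500.0, 1000.0), (300.0, 400.0), (90.0, 200.0)]

reference = reference_from_experiment([c13 for _, c13 in samples])
corrected = [suppression_corrected(c12, c13, reference) for c12, c13 in samples]
print(reference, corrected)

# Option 4 with an assumed internal standard concentration of 20 (e.g. µM).
print(concentration_from_ratio(300.0, 400.0, 20.0))
```

In the second sample the C13 signal is well below the reference, which indicates suppression; scaling the ratio by the reference value recovers a C12 value on the same footing as the least suppressed sample.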

Dual MSTUS Sample Normalization Algorithm

Once suppression is corrected, a Dual MSTUS algorithm is employed to provide a very accurate mechanism for normalizing samples against sample-to-sample variance.

The MS Total Usable Signal (MSTUS) normalization algorithm (Warrack et al., 2009) assumes that the overall chemical composition of all samples is close enough that the sum of all the verifiable compounds for comparable samples will be “reasonably” constant. According to this theory, if the sum of these verifiable compounds differs, it is more likely that this is due to differences in the physical size (density or a similar property) of the original samples. The algorithm was devised to normalize urine samples, which often show very different concentrations or dilutions, but it has been shown to be equally effective for most classes of solid or liquid samples. In MSTUS, a “Normalization Factor” (NF) is developed for every sample such that, when the area under the curve (AUC) of each peak is multiplied by it, the AUCs sum to a “common value.” The “common value” is chosen arbitrarily by the experimentalist in every experiment. In the IROA Workflow, we have considered and implemented several key features that strengthen the MSTUS normalization procedure, in a modification we call “Dual MSTUS”:

  1. All verified peaks are, by definition, peaks that are found in both the IS and the sample, i.e., they show up in both the C12 envelope and the C13 envelope of a compound’s isotopolog ladder in both the LTRS and experimental samples. This is a very rigorous test of biological relevance.
  2. All the C13 peaks for each compound represent the same amount of material in each sample because they are derived by addition of equal aliquots of the internal standard.
  3. Because the C12 and C13 envelopes are suppressed equally, the amplitudes of both are corrected for suppression losses using the same correction factor.
  4. Each sample is represented by two equally important MSTUS values: the C12 MSTUS (the sum of all C12 envelopes) and the C13 MSTUS (the sum of all C13 envelopes).
  5. The NF is developed to make the C12 MSTUS equal to the C13 MSTUS (which we know to be constant).

Because the C13 MSTUS is always present at the same concentration, this normalization not only removes any arbitrariness from the NF but also means that every suppression-corrected and normalized sample can be directly compared to any other similarly treated sample, i.e., one that has the same amount of IS in it.
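
A minimal sketch of the Dual MSTUS idea described above, assuming suppression-corrected envelope values are already available per compound. The function names, compound names, and numbers are hypothetical and are not taken from ClusterFinder.

```python
from __future__ import annotations


def dual_mstus_factor(c12: dict[str, float], c13: dict[str, float]) -> float:
    """NF that makes the sample's C12 MSTUS equal to its C13 MSTUS.

    Only verified compounds (present in both envelopes) are summed.
    """
    verified = c12.keys() & c13.keys()
    c12_mstus = sum(c12[name] for name in verified)
    c13_mstus = sum(c13[name] for name in verified)
    if c12_mstus <= 0:
        raise ValueError("C12 MSTUS must be positive")
    return c13_mstus / c12_mstus


def normalize(c12: dict[str, float], nf: float) -> dict[str, float]:
    """Multiply every C12 envelope value by the normalization factor."""
    return {name: value * nf for name, value in c12.items()}


# The same IS aliquot is added to every sample, so the C13 side is constant.
internal_standard = {"alanine": 1000.0, "citrate": 2000.0, "glucose": 3000.0}

# Two hypothetical samples with the same composition at different dilutions.
sample_a = {"alanine": 500.0, "citrate": 1500.0, "glucose": 1000.0}
sample_b = {"alanine": 1000.0, "citrate": 3000.0, "glucose": 2000.0}

for label, sample in (("A", sample_a), ("B", sample_b)):
    nf = dual_mstus_factor(sample, internal_standard)
    print(label, nf, normalize(sample, nf))
```

Because the target of the normalization is the constant C13 MSTUS rather than an arbitrary common value, the two dilutions end up with identical normalized values in this example.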

There is one additional aspect of this algorithm to consider: in practice, some samples have more compounds than others. Since the rules of Dual MSTUS require that every C12 envelope must have a C13 envelope equivalent, the NF is a simple relative measurement; however, if you want to directly compare two or more samples, the comparison should use only those compounds that are common to all the samples you wish to compare. In reality this is not much of a restriction, as the IROA software algorithms satisfactorily create non-sparse datasets, but it is nevertheless a factor worth considering when comparing samples.
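
One way to apply this restriction before comparing samples is to keep only the compounds present in every sample. The helper below is a hypothetical sketch and not part of ClusterFinder; it assumes each sample is a mapping from compound name to its normalized value.

```python
from __future__ import annotations


def restrict_to_common(
    samples: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Keep only the compounds that are present in every sample."""
    if not samples:
        return {}
    common = set.intersection(*(set(values) for values in samples.values()))
    return {
        name: {compound: values[compound] for compound in sorted(common)}
        for name, values in samples.items()
    }


# Hypothetical normalized values; "lactate" is missing from sample_2.
normalized = {
    "sample_1": {"alanine": 1.2, "citrate": 0.8, "lactate": 2.1},
    "sample_2": {"alanine": 1.0, "citrate": 0.9},
}

print(restrict_to_common(normalized))
```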

Warrack BM, Hnatyshyn S, Ott K-H, Reily MD, Sanders M, Zhang H, and Drexler DM, “Normalization strategies for metabonomic analysis of urine samples,” J Chrom B 877 (2009) 547-552, doi:10.1016/j.jchromb.2009.01.007.