Table S1: Search time trials using Ullmann algorithm


Supplementary Materials

Table S1: Performance of test functional group searches using the Ullmann algorithm. Although the Ullmann algorithm finds all subgraphs for a given compound and functional group pairing, the time needed is prohibitive for functional group searches against a large number of database compounds or for large compounds and large functional groups. For example, searching for the carboxylic acid functional group within reasonably large structures takes roughly 14 minutes for deoxycorticosterone and over one hour for cortexolone.

Table S1: Search time trials using Ullmann algorithm (all times in ms)

| Database compound | Carboxylic Acid | Epoxide | Alkene | Alcohol |
|---|---|---|---|---|
| Deoxycytidine | 145828 | 1548 | 2.36 | 2422 |
| R-3-Hydroxybutyric Acid | 1992 | 62 | 0.13 | 144 |
| 2-Hydroxybutyric Acid | 2275 | 62 | 0.12 | 142 |
| Deoxyuridine | 210620 | 1608 | 2.21 | 2340 |
| 1-Methylhistidine | 7651 | 256 | 0.91 | 473 |
| Cortexolone | 4239178 | 25635 | 41.41 | 38455 |
| 2-Methoxyestrone | 955437 | 11167 | 22.97 | 15019 |
| Deoxycorticosterone | 828605 | 12080 | 24.81 | 11490 |
| 1,3-Diaminopropane | 0.09 | 0.08 | 0.06 | 0.11 |
| 2-Ketobutyric Acid | 1375 | 42 | 0.09 | 86 |

Table S2: Performance of test functional group searches using CASS with no Short Circuiting. Our algorithm finds all functional groups considerably faster than the original Ullmann algorithm. The time needed to find a particular functional group increases with the number of atoms in both the functional group and the database compound, but it is most closely related to the number of possible atom-to-atom mappings between the functional group and the database compound. Small functional groups show nearly no increase in time because the number of possible mappings does not increase very quickly. This pseudo-linear performance only holds for values of m lower than approximately 150, but beyond that the search remains sufficiently fast to allow for efficient functional group searching in all database compounds (Figure 9E).

Table S2: Search time trials using CASS with no Short Circuiting (all times in ms)

| Database compound | Carboxylic Acid | Epoxide | Alkene | Alcohol |
|---|---|---|---|---|
| Deoxycytidine | 0.48 | 0.49 | 0.54 | 0.49 |
| R-3-Hydroxybutyric Acid | 0.27 | 0.16 | 0.11 | 0.24 |
| 2-Hydroxybutyric Acid | 0.27 | 0.16 | 0.11 | 0.24 |
| Deoxyuridine | 0.56 | 0.55 | 0.53 | 0.46 |
| 1-Methylhistidine | 0.36 | 0.16 | 0.33 | 0.22 |
| Cortexolone | 1.07 | 0.83 | 2.72 | 1.05 |
| 2-Methoxyestrone | 0.83 | 0.72 | 2.21 | 0.60 |
| Deoxycorticosterone | 0.90 | 0.89 | 2.61 | 0.60 |
| 1,3-Diaminopropane | 0.03 | 0.01 | 0.07 | 0.02 |
| 2-Ketobutyric Acid | 0.23 | 0.13 | 0.11 | 0.17 |

Table S3: Performance of test functional group searches using CASS with Short Circuiting. This algorithm terminates when the first proper mapping is found, which allows for significant relative and absolute time savings for some database compound and functional group pairings compared to our algorithm with short-circuiting disabled. For large database compounds and/or large functional groups, the time savings can be significant, as is the case for 2-Methoxyestrone and Alkene. The time savings depend on the time needed to find the first valid mapping relative to the time needed to complete the enumeration procedure: the earlier the first valid mapping is found in the enumeration process, the greater the time savings. Therefore, short circuiting is most effective when the algorithm happens to find a valid mapping early in the enumeration process or when there are numerous instances of the functional group.

Table S3: Search time trials using CASS with Short Circuiting (all times in ms)

| Database compound | Carboxylic Acid | Epoxide | Alkene | Alcohol |
|---|---|---|---|---|
| Deoxycytidine | 0.48 | 0.49 | 0.23 | 0.29 |
| R-3-Hydroxybutyric Acid | 0.19 | 0.16 | 0.11 | 0.12 |
| 2-Hydroxybutyric Acid | 0.19 | 0.16 | 0.12 | 0.03 |
| Deoxyuridine | 0.55 | 0.55 | 0.22 | 0.29 |
| 1-Methylhistidine | 0.20 | 0.16 | 0.12 | 0.06 |
| Cortexolone | 1.07 | 0.83 | 1.27 | 0.28 |
| 2-Methoxyestrone | 0.82 | 0.73 | 0.03 | 0.06 |
| Deoxycorticosterone | 0.90 | 0.89 | 2.6 | 0.25 |
| 1,3-Diaminopropane | 0.03 | 0.01 | 0.08 | 0.02 |
| 2-Ketobutyric Acid | 0.10 | 0.13 | 0.11 | 0.10 |

Table S4: Performance comparison for stereoisomerism using CASS, with and without short circuiting. For non-stereoisomeric compounds, no valid mapping exists and therefore both algorithms must exhaust all possible enumerations before termination. As a result, both algorithms performed identically when the two compounds were non-stereoisomeric. However, when stereoisomerism was present, the short-circuiting algorithm terminated early, saving considerable time. The extent of time savings from short-circuiting increases as the compounds become larger, allowing for the detection of stereoisomerism in a larger set of compounds than would be feasible with the non-short-circuiting algorithm.
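The difference between exhaustive enumeration (Table S2) and short circuiting (Table S3) can be illustrated with a minimal backtracking matcher. This is only a sketch of the general idea and is not the CASS implementation: the graph representation, the function name `find_mappings`, and the toy diol (with carbon-bound hydrogens omitted for brevity) are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (element symbol per atom, set of undirected bonds).
Graph = Tuple[List[str], Set[Tuple[int, int]]]


def bonded(bonds: Set[Tuple[int, int]], a: int, b: int) -> bool:
    """True if atoms a and b share a bond, in either stored orientation."""
    return (a, b) in bonds or (b, a) in bonds


def find_mappings(pattern: Graph, target: Graph, short_circuit: bool = False) -> List[List[int]]:
    """Enumerate mappings of pattern atoms onto target atoms by backtracking.

    With short_circuit=True the search stops at the first valid mapping;
    otherwise every valid mapping is enumerated.
    """
    p_elems, p_bonds = pattern
    t_elems, t_bonds = target
    results: List[List[int]] = []

    def extend(mapping: List[int]) -> bool:
        i = len(mapping)  # index of the next pattern atom to place
        if i == len(p_elems):
            results.append(list(mapping))
            return short_circuit  # True tells the callers to stop searching
        for cand in range(len(t_elems)):
            # Each target atom is used once, and element types must agree.
            if cand in mapping or t_elems[cand] != p_elems[i]:
                continue
            # Every pattern bond to an already placed atom must exist in the target.
            if all(bonded(t_bonds, mapping[j], cand)
                   for j in range(i) if bonded(p_bonds, j, i)):
                mapping.append(cand)
                stop = extend(mapping)
                mapping.pop()
                if stop:
                    return True
        return False

    extend([])
    return results


# Toy example: an O-H pattern searched in a two-carbon diol skeleton.
hydroxyl: Graph = (["O", "H"], {(0, 1)})
diol: Graph = (["C", "C", "O", "O", "H", "H"],
               {(0, 1), (0, 2), (1, 3), (2, 4), (3, 5)})

all_hits = find_mappings(hydroxyl, diol)                       # both O-H groups
first_hit = find_mappings(hydroxyl, diol, short_circuit=True)  # stops after one
```

When no valid mapping exists, both modes visit the same candidates, which is consistent with the identical timings reported for pairings without a match.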

Table S4: Comparison of CASS with and without short circuiting for stereoisomerism testing.

| Number of atoms in compound pair | Short-circuit time (seconds) | Non-short circuit time (seconds) | Stereoisomer Detected Y/N |
|---|---|---|---|
| 10 | 0.001 | 0.001 | Y |
| 10 | 0.001 | 0.001 | N |
| 23 | 0.003 | 0.01 | Y |
| 23 | 0.002 | 0.002 | N |
| 29 | 0.005 | 0.038 | Y |
| 29 | 0.001 | 0.001 | N |
| 46 | 0.01 | 0.077 | Y |
| 46 | 0.002 | 0.002 | N |
| 61 | 0.023 | 0.427 | Y |
| 61 | 0.02 | 0.02 | N |
| 66 | 0.005 | 0.004 | N |
| 75 | 0.135 | 0.134 | N |
| 88 | 0.966 | 0.966 | N |
| 93 | 0.015 | 0.014 | N |
| 158 | 0.102 | 0.101 | N |
| 158 | 0.613 | 0.616 | N |
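As a worked reading of Table S4, the short sketch below computes the ratio of non-short-circuit time to short-circuit time for the rows where a stereoisomer was detected. The values are copied from the table; the function name `speedups` and the one-decimal rounding are choices made here for illustration.

```python
# Rows from Table S4: (atoms in pair, short-circuit s, non-short-circuit s, detected)
rows = [
    (10, 0.001, 0.001, "Y"), (10, 0.001, 0.001, "N"),
    (23, 0.003, 0.01, "Y"), (23, 0.002, 0.002, "N"),
    (29, 0.005, 0.038, "Y"), (29, 0.001, 0.001, "N"),
    (46, 0.01, 0.077, "Y"), (46, 0.002, 0.002, "N"),
    (61, 0.023, 0.427, "Y"), (61, 0.02, 0.02, "N"),
    (66, 0.005, 0.004, "N"), (75, 0.135, 0.134, "N"),
    (88, 0.966, 0.966, "N"), (93, 0.015, 0.014, "N"),
    (158, 0.102, 0.101, "N"), (158, 0.613, 0.616, "N"),
]


def speedups(table):
    """Return (atom count, non-short-circuit / short-circuit ratio) for detected pairs."""
    return [(n, round(full / sc, 1)) for n, sc, full, found in table if found == "Y"]


for atoms, ratio in speedups(rows):
    print(f"{atoms} atoms: {ratio}x faster with short circuiting")
```

The ratio grows from 1.0 for the 10-atom pair to 18.6 for the 61-atom pair, in line with the caption's statement that savings increase with compound size.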

Table S5: Functional groups comprising optimal strategies for the combined HMDB and KEGG database and their performance in percent unambiguous compounds. A) Stoichiometric analysis offers the best performance of all strategies and provides very good results with only three functional groups. However, stoichiometric adduct formation is likely impossible to ensure.

Table S5 A: Functional groups comprising best stoichiometric strategies for combined database
Strategy: Best strategy of 3, Best strategy of 5, Best strategy of 10, Best strategy of 15
Group Only + + + Super Super Only

Best strategy of 3:
1. (30.35%)
2. (31.03%)
3. (31.06%)
4. (30.62%)
5. (30.62%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (26.41%)

Best strategy of 5:
1. Primary Alcohol, (33.12%)
2. Alkene, Dialkylether, Enol, Methyl, Secondary Alcohol (34.78%)
3. Phenol, Secondary Alcohol (33.93%)
4. Alkene, Dialkylether, Enol, Methyl, Secondary Alcohol (34.25%)
5. Alkene, Enol, Methyl,, Super Ether (34.29%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (27.08%)

Best strategy of 10:
1. 1,2-diphenol, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Primary Alcohol,, Tertiary Alcohol (35.7%)
2. Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Alcohol, Secondary Amine (37.73%)
3. 6-Heterocycle, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine (36.91%)
4. 1,2-diol, 6-Heterocycle, Alkene, Dialkylether, Enol, Enolether, Alcohol, Secondary Amine (37.32%)
5. 1,2-diol, 6-Heterocycle, Alkene, Enol, Enolether, Ketone, Methyl,, Secondary Amine, Super Ether (37.37%)
6. Algorithm terminated due to performance cutoff

Best strategy of 15:
1. 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Imine, Ketone, Methyl, Phenol, Primary Amine, Alcohol, Secondary Amine, Tertiary Alcohol (36.67%)
2. 1,2-diol, 5-Heterocycle, Aldehyde, Alkene, Dialkylether, Enamine, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine (38.77%)
3. 1,2-diol, 1,2-diphenol, 5-Heterocycle, 6-Heterocycle, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine (38.06%)
4. 1,2-diol, 1,2-diphenol, 5-Heterocycle, 6-Heterocycle, Alkene, Alpha-aminoacid, Dialkylether, Enamine, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine (38.37%)
5. 1,2-diol, 1,2-diphenol, 6-Heterocycle, Alkene, Alpha-aminoacid, Enamine, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Super Carboxylic Acid Derivative, Super Ether (38.48%)
6. Algorithm terminated due to performance cutoff

Table S5 B) For non-stoichiometric strategy analysis, alcohols and amines have a greater impact earlier in the strategy analysis than in stoichiometric strategies. Furthermore, functional groups such as methyl groups and alkenes contribute much less to performance in a non-stoichiometric environment than in a stoichiometric one.

Table S5 B: Functional groups comprising best non-stoichiometric strategies for combined database
Strategy: Best strategy of 3, Best strategy of 5, Best strategy of 10, Best strategy of 15
Group Only + + + Super Super Only

Best strategy of 3:
1. Alkene, Ketone, (23.18%)
2. Ketone, Methyl, (22.94%)
3. Alkene, Ketone, (23.29%)
4. Ketone, Methyl, (22.61%)
5. Ketone, Super Carboxylic Acid Derivative, Super Hydroxyl (22.7%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (22.22%)

Best strategy of 5:
1. Primary Alcohol, (26.5%)
2. Enol, Ketone, Methyl, Primary Alcohol, (25.99%)
3. Primary Alcohol, (26.3%)
4. Enol, Enolether, Ketone, Methyl, Secondary Alcohol (25.68%)
5. Ketone, Methyl,, Super Carboxylic Acid Derivative, Super Hydroxyl (25.78%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (23.06%)

Best strategy of 10:
1. Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Alcohol,, Tertiary Alcohol (30.51%)
2. Carboxylic Acid Ester, Dialkylether, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (30.51%)
3. 5-Heterocycle, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Primary Alcohol,, Tertiary Alcohol (30.86%)
4. 5-Heterocycle, Carboxylic Acid Ester, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (30.53%)
5. 5-Heterocycle, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Super Carboxylic Acid Derivative, Super Ether, Tertiary Alcohol (30.54%)
6. Algorithm terminated due to performance cutoff

Best strategy of 15:
1. 1,2-diol, 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Alcohol, Secondary Amine, Tertiary Alcohol (32.1%)
2. 1,2-diol, Aldehyde, Alkene, Carboxylic Acid Ester, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (32.66%)
3. 5-Heterocycle, 6-Heterocycle, Aldehyde, Alkene, Alkylarylethermol, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Alcohol, Secondary Amine, Tertiary Alcohol (32.89%)
4. 1,2-diol, 5-Heterocycle, 6-Heterocycle, Aldehyde, Alkene, Carboxylic Acid Ester, Enamine, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (32.45%)
5. 1,2-diol, 5-Heterocycle, 6-Heterocycle, Aldehyde, Alkene, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine, Super Carboxylic Acid Derivative, Super Ether, Tertiary Alcohol (32.55%)
6. Algorithm terminated due to performance cutoff

Table S5 C) The pseudostoichiometric strategies for distinct functional groups are very similar to those that performed well in stoichiometric strategies. When subgraphs and overlapping can be detected, pseudostoichiometric strategies differ from non-stoichiometric and stoichiometric strategies until more functional groups are added. Given the difficulty of ensuring stoichiometric adduct formation, these strategies will likely be most efficacious in the wet lab environment.

Table S5 C: Functional groups comprising best pseudostoichiometric strategies for combined database
Strategy: Best strategy of 3, Best strategy of 5, Best strategy of 10, Best strategy of 15
Group Only + + + Super Super Only

Best strategy of 3:
1. (28.37%)
2. Dialkylether, Methyl, Secondary Alcohol (27.35%)
3. (28.7%)
4. Dialkylether, Methyl, Secondary Alcohol (27.25%)
5. Ketone, Methyl, Super Hydroxyl (27.74%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (25.51%)

Best strategy of 5:
1. Primary Alcohol, (31.68%)
2. Alkene, Dialkylether, Enol, Methyl, Secondary Alcohol (31.3%)
3. Phenol, Secondary Alcohol (32.1%)
4. Alkene, Dialkylether, Enol, Methyl, Secondary Alcohol (31.08%)
5. Enolether, Ketone, Methyl,, Super Hydroxyl (31.29%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (26.18%)

Best strategy of 10:
1. 1,2-diphenol, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Primary Alcohol,, Tertiary Alcohol (34.74%)
2. Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine (35.71%)
3. 5-Heterocycle, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Alcohol, Secondary Amine (35.62%)
4. 5-Heterocycle, Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine (35.56%)
5. 5-Heterocycle, Alkene, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine, Super Ether (35.61%)
6. Algorithm terminated due to performance cutoff

Best strategy of 15:
1. 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Imine, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (35.83%)
2. 1,2-diol, Aldehyde, Alkene, Carboxylic Acid Ester, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (37.18%)
3. 1,2-diphenol, 5-Heterocycle, 6-Heterocycle, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Alcohol, Secondary Amine, Tertiary Alcohol (36.89%)
4. 1,2-diol, 1,2-diphenol, 5-Heterocycle, 6-Heterocycle, Alkene, Alpha-aminoacid, Dialkylether, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (36.89%)
5. 1,2-diol, 1,2-diphenol, 6-Heterocycle, Alkene, Alpha-aminoacid, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Super Carboxylic Acid Derivative, Super Ether, Tertiary Alcohol (37.03%)
6. Algorithm terminated due to performance cutoff

Table S6: Functional groups comprising optimal strategies for the KEGG database and their performance in percent unambiguous formulas. A) As seen with the combined database, stoichiometric strategies allow for the best increases in percent unambiguous formulas. Additionally, the differences between the databases manifest themselves most clearly in the increased diversity of functional groups comprising the best performing strategy of three groups.

Table S6 A: Functional groups comprising best stoichiometric strategies for KEGG only
Strategy: Best strategy of 3, Best strategy of 5, Best strategy of 10, Best strategy of 15
Only + + + + + + + Super Super Only

Best strategy of 3:
1. Alkene, Ketone, Methyl (61.63%)
2. (62%)
3. (62.24%)
4. (61.77%)
5. Methyl, Secondary Alcohol, Super Hydroxyl (61.94%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (56.9%)

Best strategy of 5:
1. Alkene, Carboxylic Acid, Alcohol (65.14%)
2. Alkene, Dialkylether, Enol, Methyl, (66.65%)
3. Phenol, (66.16%)
4. Alkene, Dialkylether, Enol, Methyl, (66.5%)
5. Alkene, Enol, Methyl,, Super Ether (66.69%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (58.1%)

Best strategy of 10:
1. Aldehyde, Alkene, Carboxylic Acid, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (68.15%)
2. Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Alcohol, Secondary Amine (70.82%)
3. 1,2-diol, 6-Heterocycle, Aldehyde, Alkene, Carboxylic Acid, Ketone, Methyl, Phenol,, Secondary Amine (69.76%)
4. 1,2-diol, Alkene, Alpha-aminoacid, Dialkylether, Enol, Enolether, Ketone, Methyl,, Secondary Amine (70.55%)
5. 1,2-diol, Alkene, Enol, Enolether, Ketone, Methyl,, Secondary Amine, Super Carboxylic Acid Derivative, Super Hydroxyl (70.87%)
6. Algorithm terminated due to performance cutoff

Best strategy of 15:
1. 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Imine, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (69.31%)
2. 1,2-diol, Aldehyde, Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol, Tertiary Amine (72.21%)
3. 1,2-diol, 1,2-diphenol, 5-Heterocycle, 6-Heterocycle, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Secondary Amide, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol, (71.16%)
4. 1,2-diol, 6-Heterocycle, Aldehyde, Alkene, Alpha-aminoacid, Dialkylether, Enamine, Enol, Enolether, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine (71.91%)
5. 1,2-diol, 1,2-diphenol, 6-Heterocycle, Alkene, Alpha-aminoacid, Enamine, Enol, Enolether, Ketone, Methyl, Phenol,, Secondary Amine, Super Carboxylic Acid Derivative, Super Hydroxyl (72.11%)
6. Algorithm terminated due to performance cutoff

Table S6 B) The optimal non-stoichiometric strategies are similar to the stoichiometric strategies but allow for less disambiguation of database compounds. The performance difference between stoichiometric and non-stoichiometric strategies decreases as the number of functional groups within each strategy increases. Additionally, ketones, secondary alcohols and dialkyl ethers perform relatively better non-stoichiometrically than stoichiometrically, providing better performance in some strategies than alkenes or methyl groups.

Table S6 B: Functional groups comprising best non-stoichiometric strategies for KEGG only
Strategy: Top 3, Top 5, Top 10, Top 15
Only + + + + + + + Super Super Only

Top 3:
1. Alkene, Ketone, Methyl (49%)
2. Ketone, Methyl, Secondary Alcohol (51.75%)
3. Alkene, Ketone, Secondary Alcohol (51.8%)
4. Dialkylether, Ketone, Secondary Alcohol (51.32%)
5. Ketone, Super Carboxylic Acid Derivative, Super Hydroxyl (52.03%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (51.18%)

Top 5:
1. Alkene, Carboxylic Acid, Alcohol (54.76%)
2. Dialkylether, Enol, Ketone, Methyl, (57.03%)
3. Alkene, Carboxylic Acid, Alcohol (56.66%)
4. Dialkylether, Enol, Ketone, Methyl, (56.51%)
5. Alcohol, Super Carboxylic Acid Derivative, Super Ether (56.84%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (52.79%)

Top 10:
1. Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Primary Alcohol,, Tertiary Alcohol (61.54%)
2. Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (63.33%)
3. Alkene, Alkylarylethermol, Carboxylic Acid, Carboxylic Acid Ester, Ketone, Methyl, Phenol, Alcohol, Tertiary Alcohol (62.51%)
4. 5-Heterocycle, Carboxylic Acid, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine, Tertiary Alcohol (62.73%)
5. Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine, Super Carboxylic Acid Derivative, Super Ether, Tertiary Alcohol (63.27%)
6. Algorithm terminated due to performance cutoff

Top 15:
1. 1,2-diol, 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (64.03%)
2. 1,2-diol, Aldehyde, Carboxylic Acid, Carboxylic Acid Ester, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol, Tertiary Amine (65.85%)
3. 5-Heterocycle, Aldehyde, Alkene, Alkylarylethermol, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Enamine, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (65.19%)
4. 1,2-diol, 5-Heterocycle, 6-Heterocycle, Aldehyde, Carboxylic Acid, Carboxylic Acid Ester, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Tertiary Alcohol, Tertiary Amine (65.13%)
5. 1,2-diol, 5-Heterocycle, 6-Heterocycle, Aldehyde, Enamine, Enol, Enolether, Ketone, Methyl, Primary Alcohol,, Secondary Amine, Super Carboxylic Acid Derivative, Super Ether, Tertiary Alcohol (65.42%)
6. Algorithm terminated due to performance cutoff

Table S6 C) The optimal pseudostoichiometric strategies provide nearly the same percent of disambiguation as the stoichiometric strategies, but with slightly different functional groups comprising each strategy until the number of functional groups within a strategy becomes larger than five. Pseudostoichiometric adduct formation provides much better performance than non-stoichiometric adduct formation and is an achievable goal for CS-tagging strategies.

Table S6 C: Functional groups comprising best pseudostoichiometric strategies for KEGG only

Strategy: Top 3, Top 5, Top 10, Top 15
Only + + + + + + + Super Super Only

Top 3:
1. Alkene, Ketone, Methyl (59.32%)
2. Dialkylether, Methyl, Secondary Alcohol (58.43%)
3. (59.43%)
4. Dialkylether, Methyl, Secondary Alcohol (58.1%)
5. Methyl, Super Carboxylic Acid Derivative, Super Hydroxyl (58.99%)
6. Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (55.9%)

Top 5:
1. Alkene, Carboxylic Acid, Alcohol (63.33%)
2. Dialkylether, Enol, Ketone, Methyl, (63.42%)
3. Phenol, (64.09%)
4. Dialkylether, Enol, Ketone, Methyl, (63.59%)
5. Enol, Methyl, Secondary Alcohol, Super Carboxylic Acid Derivative, Super Ether (63.8%)
6. Super Amine, Super Carboxylic Acid Amide, Super Carboxylic Acid Derivative, Super Ether, Super Hydroxyl (57.11%)

Top 10:
1. Aldehyde, Alkene, Carboxylic Acid, Ketone, Methyl, Phenol, Alcohol, Secondary Amine, Tertiary Alcohol (66.82%)
2. Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Alcohol, Secondary Amine (68.88%)
3. 5-Heterocycle, Aldehyde, Alkene, Carboxylic Acid, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine (68.15%)
4. 1,2-diol, Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine (68.59%)
5. 1,2-diol, Alkene, Enol, Enolether, Alcohol, Secondary Amine, Super Carboxylic Acid Derivative, Super Hydroxyl (68.89%)
6. Algorithm terminated due to performance cutoff

Top 15:
1. 1,2-diol, 1,2-diphenol, Aldehyde, Alkene, Carboxylic Acid, Carboxylic Acid Ester, Dialkylether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (68.13%)
2. 1,2-diol, Aldehyde, Alkene, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Primary Amine, Primary Alcohol,, Secondary Amine, Tertiary Alcohol, Tertiary Amine (70.65%)
3. 1,2-diphenol, 5-Heterocycle, Aldehyde, Alkene, Alkylarylethermol, Alpha-aminoacid, Carboxylic Acid, Enamine, Ketone, Methyl, Phenol, Primary Alcohol,, Secondary Amine, Tertiary Alcohol (69.72%)
4. 1,2-diol, 6-Heterocycle, Aldehyde, Alkene, Alpha-aminoacid, Dialkylether, Enol, Enolether, Ketone, Methyl, Phenol, Alcohol, Secondary Amine, Tertiary Alcohol (70.27%)
5. 1,2-diol, 1,2-diphenol, 6-Heterocycle, Alkene, Alpha-aminoacid, Enamine, Enol, Enolether, Ketone, Methyl, Alcohol, Secondary Amine, Super Carboxylic Acid Derivative, Super Hydroxyl (70.56%)
6. Algorithm terminated due to performance cutoff
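The percentages in Tables S5 and S6 can be read with the help of a small sketch. It rests on an interpretation that the captions do not spell out: a compound is treated as unambiguous when no other database entry shares both its molecular formula and its functional group signature, a stoichiometric signature records how many copies of each group are present, and a non-stoichiometric signature records only presence or absence. The function name, the data layout, and the five-compound toy database are hypothetical and are not taken from the HMDB or KEGG analysis; the pseudostoichiometric case is not modelled.

```python
from collections import Counter


def percent_unambiguous(compounds, strategy, stoichiometric=True):
    """Percent of compounds whose (formula, group signature) pair is unique."""
    def signature(compound):
        counts = compound["groups"]
        values = tuple(
            counts.get(g, 0) if stoichiometric else int(counts.get(g, 0) > 0)
            for g in strategy
        )
        return (compound["formula"], values)

    tally = Counter(signature(c) for c in compounds)
    unique = sum(1 for c in compounds if tally[signature(c)] == 1)
    return 100.0 * unique / len(compounds)


# Toy database (illustrative only): three C3H8O2 isomers and two C2H6O isomers.
toy_db = [
    {"name": "1,3-propanediol", "formula": "C3H8O2",
     "groups": {"Primary Alcohol": 2}},
    {"name": "1,2-propanediol", "formula": "C3H8O2",
     "groups": {"Primary Alcohol": 1, "Secondary Alcohol": 1}},
    {"name": "2-methoxyethanol", "formula": "C3H8O2",
     "groups": {"Primary Alcohol": 1, "Dialkylether": 1}},
    {"name": "ethanol", "formula": "C2H6O",
     "groups": {"Primary Alcohol": 1}},
    {"name": "dimethyl ether", "formula": "C2H6O",
     "groups": {"Dialkylether": 1}},
]

strategy = ["Primary Alcohol"]
stoich = percent_unambiguous(toy_db, strategy, stoichiometric=True)
non_stoich = percent_unambiguous(toy_db, strategy, stoichiometric=False)
print(stoich, non_stoich)
```

In this toy case, counting primary alcohols separates 1,3-propanediol from its two isomers, whereas presence or absence alone cannot, which mirrors the gap between the stoichiometric and non-stoichiometric columns.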

Figure S1: Isomeric compound distribution by molecular formula within the HMDB and KEGG. Within each database, all isomeric compounds and their respective formulas were determined. The number of isomeric compounds mapping to each formula was then calculated, and the number of formulas with a specific number of isomeric compounds was determined and plotted (e.g. two formulas have 27 isomeric compounds). Bins marked * represent bins with one molecular formula; these could not be plotted due to the log transform. A) In the HMDB, a large number of formulas have over 40 compounds that map to them and a significant portion have over 100 compounds. Since the HMDB contains many similar entries that are structural isomers of one another (e.g. the many lipids in the HMDB), many of the compounds that map to the same formula are very similar in bonded structure, making disambiguation of all compounds mapping to the formula very difficult. B) The distribution of isomers in KEGG differs significantly from the HMDB. In KEGG, all but three formulae have 35 or fewer compounds that map to them. Isomers in KEGG are mapped to a relatively larger number of formulae, making it easier to disambiguate them. Additionally, KEGG does not have as many lipid entries as the HMDB, reducing a source of compounds that are very difficult to disambiguate.

Figure S2: Time needed to find all instances of selected functional groups in the HMDB. A and C) Unlike the time needed to find all instances of alkenes in the HMDB, which was neatly polynomial, the R² values for figures A and C show some deviation from the behavior observed when searching for alkenes. Unlike the alkene group, which contains only carbon, these functional groups contain multiple element types that must be matched by CASS in order to find a valid instance of the functional group. When multiple element types are considered, there are separate search spaces for each element type within the functional group, because each element type in the functional group is only tested against the same element type in the database compound when wild card atoms are not considered. Since certain element types are more common than others, the sizes of these search spaces differ, and if it is determined that there is no valid mapping for one element type, the algorithm terminates, as a valid mapping of the whole functional group is impossible. Therefore, we see two polynomial curves in all three of the trials for these functional groups. Although both alcohols and carboxylic acids contain hydrogens as well as oxygens and carbons, there is essentially always at least one valid hydrogen mapping given the ubiquity of hydrogen in the metabolome. Due to this effect, the scatter plots are not fit well by a single polynomial fit line, resulting in smaller R² values. B) Although the same effect occurs when searching for carboxylic acid functional groups, the larger values of m minimize this effect.
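The early termination described for Figure S2 can be sketched as a per-element pre-check. This is a simplification written for illustration and is not the CASS code: it only compares element counts, whereas the text describes separate search spaces per element type. The function names and the atom lists are assumptions.

```python
from collections import Counter
from typing import Dict, List


def may_contain(group_elements: List[str], compound_elements: List[str]) -> bool:
    """False when some element type of the group cannot be mapped at all.

    A compound with fewer atoms of an element than the functional group
    requires cannot hold a valid mapping, so the search can stop at once.
    """
    need = Counter(group_elements)
    have = Counter(compound_elements)
    return all(have[element] >= count for element, count in need.items())


def candidates_per_element(group_elements: List[str],
                           compound_elements: List[str]) -> Dict[str, int]:
    """Number of compound atoms available to each element type in the group."""
    have = Counter(compound_elements)
    return {element: have[element] for element in sorted(set(group_elements))}


# Carboxylic acid group written as a list of its atoms: C, two O, one H.
carboxylic_acid = ["C", "O", "O", "H"]

# Atom lists derived from molecular formulas.
diaminopropane = ["C"] * 3 + ["H"] * 10 + ["N"] * 2   # 1,3-diaminopropane, C3H10N2
ketobutyric_acid = ["C"] * 4 + ["H"] * 6 + ["O"] * 3  # 2-ketobutyric acid, C4H6O3

print(may_contain(carboxylic_acid, diaminopropane))    # no oxygen available
print(may_contain(carboxylic_acid, ketobutyric_acid))  # search must proceed
print(candidates_per_element(carboxylic_acid, ketobutyric_acid))
```

Compounds that fail the pre-check cost almost nothing to search, while compounds that pass require full enumeration, which is one way two separate timing curves could arise for groups containing a less common element such as oxygen.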