ChemAgent supports bidirectional conversion between different chemical representations including IUPAC names, SMILES notation, and molecular formulas.

Overview

The name conversion tasks handle:
  • IUPAC ↔ SMILES conversions
  • SMILES → Molecular Formula conversions
  • IUPAC → Molecular Formula conversions
All SMILES strings are automatically canonicalized to ensure consistency.

IUPAC to SMILES

Convert IUPAC chemical names to SMILES notation.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> .
Always wrap IUPAC names in <IUPAC> ... </IUPAC> tags for proper processing.
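A small helper can keep the tag wrapping consistent across queries. The sketch below is illustrative only: `wrap_in_tag` and `iupac_to_smiles_query` are hypothetical names, not part of LLM4Chem, and the prompt wording is copied from the example above.

```python
def wrap_in_tag(tag: str, value: str) -> str:
    """Wrap a chemical identifier in <TAG> ... </TAG> markers (hypothetical helper)."""
    return f"<{tag}> {value.strip()} </{tag}>"


def iupac_to_smiles_query(name: str) -> str:
    # Same wording as the example query above; only the name changes.
    return f"Could you provide the SMILES for {wrap_in_tag('IUPAC', name)}?"


print(iupac_to_smiles_query("4-ethyl-4-methyloxolan-2-one"))
```

The printed string can be passed to generator.generate() unchanged.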

SMILES to IUPAC

Translate SMILES notation into systematic IUPAC names.
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Translate the given SMILES formula of a molecule <SMILES> CCC(C)C1CNCCCNC1 </SMILES> into its IUPAC name."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <IUPAC> 3-butan-2-yl-1,5-diazocane </IUPAC>

SMILES to Molecular Formula

Determine the molecular formula from a SMILES string.
# Reuses the generator created in the previous examples
query = "Given the SMILES representation <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>, what would be its molecular formula?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: It is <MOLFORMULA> C7H15Cl2N2OPS </MOLFORMULA> .

IUPAC to Molecular Formula

Extract molecular formulas directly from IUPAC names.
# Reuses the generator created in the previous examples
query = "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <MOLFORMULA> C15H11NO </MOLFORMULA>
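When comparing a returned formula against an expected one, it can help to compare element counts rather than raw strings, since element order may differ. The following is a minimal sketch using only the standard library. `parse_formula` and `same_formula` are hypothetical helpers, not part of ChemAgent, and they assume a flat formula without parentheses, charges, or isotopes.

```python
import re
from collections import Counter

# An element symbol (one uppercase letter, optional lowercase) and an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms per element in a flat formula such as 'C15H11NO'."""
    counts = Counter()
    for symbol, number in _ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def same_formula(a: str, b: str) -> bool:
    # Two formulas match when every element has the same count.
    return parse_formula(a) == parse_formula(b)


print(dict(parse_formula("C7H15Cl2N2OPS")))
print(same_formula("C15H11NO", "C15H11ON"))
```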

Automatic Canonicalization

ChemAgent automatically canonicalizes SMILES strings to ensure consistent representations.

How It Works

The canonicalization process (LLM4Chem/utils/smiles_canonicalization.py:64):
  1. Parses the SMILES string using RDKit
  2. Removes atom mapping numbers
  3. Standardizes stereochemistry
  4. Applies Kekulization (optional)
  5. Generates canonical SMILES
from LLM4Chem.generation import LlaSMolGeneration

# Input SMILES (may or may not already be in canonical form)
query = "What is the IUPAC name of <SMILES> C1CCOC1 </SMILES>?"

# The SMILES is automatically canonicalized before processing
generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
result = generator.generate(query, canonicalize_smiles=True)
Canonicalization can be disabled by setting canonicalize_smiles=False in the generate() method, but this is not recommended for most use cases.
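To make step 2 of the list above concrete, here is a string-level sketch of removing atom mapping numbers. It is not the code in LLM4Chem/utils/smiles_canonicalization.py. According to the steps above, the library works on the RDKit-parsed molecule, and `strip_atom_maps` is a hypothetical name used only for illustration. The sketch relies on the SMILES convention that a map number is written as `:<digits>` directly before the closing bracket of an atom.

```python
import re

# A map number is ":<digits>" immediately before the "]" that closes a bracket atom.
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(smiles: str) -> str:
    """Remove atom map numbers from bracket atoms, leaving everything else as-is."""
    return _ATOM_MAP.sub("", smiles)


print(strip_atom_maps("[CH3:1][OH:2]"))
print(strip_atom_maps("C1CCOC1"))
```

The result still contains bracket atoms such as `[CH3]`. Producing the final canonical string is the job of the later steps.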

Tag Format

Input Tags

  • <SMILES> ... </SMILES>
  • <IUPAC> ... </IUPAC>

Output Tags

  • <MOLFORMULA> ... </MOLFORMULA>
  • <SMILES> ... </SMILES>
  • <IUPAC> ... </IUPAC>

Auto-Processing

  • SMILES canonicalization
  • Tag extraction
  • Validation
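Because answers arrive wrapped in the output tags listed above, downstream code usually needs to pull the tagged value out of the surrounding sentence. A minimal sketch with a regular expression follows. `extract_tag` is a hypothetical helper, not a ChemAgent function, and the sample string is the output shown in the IUPAC to SMILES example.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag> ... </tag> pair, or None if absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


sample = "Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> ."
print(extract_tag(sample, "SMILES"))
print(extract_tag(sample, "IUPAC"))
```

Returning None for a missing tag lets the caller separate "no answer" from an answer that happens to be an empty string.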

Common Patterns

Multiple Conversions

queries = [
    "Convert <IUPAC> ethanol </IUPAC> to SMILES",
    "What is the molecular formula of <SMILES> CCO </SMILES>?",
    "Give me the IUPAC name for <SMILES> C1=CC=CC=C1 </SMILES>"
]

for query in queries:
    result = generator.generate(query)
    print(f"Query: {query}")
    print(f"Result: {result[0]['output'][0]}\n")

With Validation

from plan_execute_agent.chem_tools import validate_smiles_rdkit

# Generate SMILES
query = "Convert <IUPAC> benzene </IUPAC> to SMILES"
result = generator.generate(query)

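# Extract the SMILES between the tags, then validate it
output = result[0]['output'][0]
if '<SMILES>' in output and '</SMILES>' in output:
    smiles = output.split('<SMILES>')[1].split('</SMILES>')[0].strip()
    validation = validate_smiles_rdkit.invoke({"smiles_string": smiles})
    print(f"Valid: {validation['valid']}")
else:
    print("No <SMILES> tag found in the model output")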

Error Handling

If the input is invalid or the conversion fails, the output may contain an error message or no tagged result at all, so check the output before using it:
query = "What is the SMILES for <IUPAC> invalidchemicalname123 </IUPAC>?"
result = generator.generate(query)
# The model will attempt to process but may return an error or empty result
For best results, ensure chemical names are spelled correctly and use standard IUPAC nomenclature.
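One defensive pattern is to treat a missing or empty tagged answer as a failed conversion. The sketch below assumes the result shape used throughout this page (`result[0]['output'][0]`). `convert_or_none` is a hypothetical wrapper, and `fake_generate` is a stand-in for generator.generate so the example runs without loading the model.

```python
import re
from typing import Callable, Optional


def convert_or_none(generate: Callable[[str], list], query: str, tag: str) -> Optional[str]:
    """Run a query and return the tagged answer, or None if nothing usable came back."""
    try:
        text = generate(query)[0]["output"][0]
    except (IndexError, KeyError, TypeError):
        return None  # result did not have the expected shape
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return answer or None


# Stand-in for generator.generate, used only to make the sketch runnable.
def fake_generate(query: str) -> list:
    if "invalidchemicalname123" in query:
        return [{"output": [""]}]
    return [{"output": ["Of course. It's <SMILES> CCO </SMILES> ."]}]


print(convert_or_none(fake_generate, "What is the SMILES for <IUPAC> ethanol </IUPAC>?", "SMILES"))
print(convert_or_none(fake_generate, "What is the SMILES for <IUPAC> invalidchemicalname123 </IUPAC>?", "SMILES"))
```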
