# Name Conversion

> Convert between IUPAC names, SMILES, and molecular formulas

ChemAgent converts between chemical representations: IUPAC names and SMILES notation in both directions, and either of them to a molecular formula.

## Overview

The name conversion tasks handle:

* IUPAC ↔ SMILES conversions
* SMILES → Molecular Formula conversions
* IUPAC → Molecular Formula conversions

By default, all SMILES strings are canonicalized to ensure consistent representations.

## IUPAC to SMILES

Convert IUPAC chemical names to SMILES notation.

<CodeGroup>
  ```python Basic Usage theme={null}
  from LLM4Chem.generation import LlaSMolGeneration

  generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

  query = "Could you provide the SMILES for <IUPAC> 4-ethyl-4-methyloxolan-2-one </IUPAC>?"
  result = generator.generate(query)
  print(result[0]['output'][0])
  # Output: Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> .
  ```

  ```python With Agent theme={null}
  import asyncio
  from plan_execute_agent.rdkit_agent import process_input

  query = "What is the SMILES representation of <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
  result, completed, attempts, _, _, _ = asyncio.run(process_input(query))
  print(result)
  ```
</CodeGroup>

<Tip>
  Always wrap IUPAC names in `<IUPAC> ... </IUPAC>` tags for proper processing.
</Tip>
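
A small helper can keep the tag format consistent when building prompts. The sketch below is illustrative only: `wrap_tag` and `iupac_to_smiles_query` are hypothetical names and are not part of ChemAgent or LLM4Chem. The prompt wording is copied from the basic usage example.

```python
def wrap_tag(kind: str, value: str) -> str:
    """Wrap a value in tags, e.g. <IUPAC> name </IUPAC>."""
    # Hypothetical helper: the spacing inside the tags mirrors the examples above.
    return f"<{kind}> {value.strip()} </{kind}>"


def iupac_to_smiles_query(name: str) -> str:
    """Build an IUPAC-to-SMILES prompt with the name wrapped in <IUPAC> tags."""
    return f"Could you provide the SMILES for {wrap_tag('IUPAC', name)}?"


print(iupac_to_smiles_query("4-ethyl-4-methyloxolan-2-one"))
```

The resulting string can be passed to `generator.generate()` like any hand-written query.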

## SMILES to IUPAC

Translate SMILES notation into systematic IUPAC names.

```python theme={null}
from LLM4Chem.generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')

query = "Translate the given SMILES formula of a molecule <SMILES> CCC(C)C1CNCCCNC1 </SMILES> into its IUPAC name."
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <IUPAC> 3-butan-2-yl-1,5-diazocane </IUPAC>
```

## SMILES to Molecular Formula

Determine the molecular formula from a SMILES string.

```python theme={null}
# Reuses the `generator` created in the examples above
query = "Given the SMILES representation <SMILES> S=P1(N(CCCl)CCCl)NCCCO1 </SMILES>, what would be its molecular formula?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: It is <MOLFORMULA> C7H15Cl2N2OPS </MOLFORMULA> .
```

## IUPAC to Molecular Formula

Extract molecular formulas directly from IUPAC names.

```python theme={null}
# Reuses the `generator` created in the examples above
query = "What is the molecular formula of the compound with this IUPAC name <IUPAC> 2,5-diphenyl-1,3-oxazole </IUPAC>?"
result = generator.generate(query)
print(result[0]['output'][0])
# Output: <MOLFORMULA> C15H11NO </MOLFORMULA>
```
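
Because a formula can be reached from either an IUPAC name or a SMILES string, comparing the two answers is a cheap consistency check. The sketch below parses a flat formula into element counts so that two formulas can be compared regardless of element order. `parse_formula` is a hypothetical helper, not a ChemAgent function, and it assumes formulas without brackets, charges, or isotopes.

```python
import re

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict:
    """Count atoms in a flat formula such as 'C15H11NO' (hypothetical helper)."""
    formula = formula.strip()
    if not formula:
        raise ValueError("empty formula")
    counts = {}
    pos = 0
    for match in _ELEMENT.finditer(formula):
        if match.start() != pos:
            # A gap between matches means the string contains unsupported characters.
            raise ValueError(f"unexpected text at position {pos}: {formula!r}")
        symbol, number = match.group(1), match.group(2)
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"unexpected text at position {pos}: {formula!r}")
    return counts


# Two spellings of the same composition compare equal once parsed.
print(parse_formula("C15H11NO") == parse_formula("C15H11ON"))
```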

## Automatic Canonicalization

ChemAgent automatically canonicalizes SMILES strings to ensure consistent representations.

### How It Works

The canonicalization process (`LLM4Chem/utils/smiles_canonicalization.py:64`):

1. **Parses** the SMILES string using RDKit
2. **Removes** atom mapping numbers
3. **Standardizes** stereochemistry
4. **Applies** Kekulization (optional)
5. **Generates** canonical SMILES

```python theme={null}
from LLM4Chem.generation import LlaSMolGeneration

# Non-canonical input: RDKit writes this ring (tetrahydrofuran) canonically as C1CCOC1
query = "What is the IUPAC name of <SMILES> O1CCCC1 </SMILES>?"

# The SMILES is automatically canonicalized before processing
generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
result = generator.generate(query, canonicalize_smiles=True)
```

<Note>
  Canonicalization can be disabled by setting `canonicalize_smiles=False` in the `generate()` method, but this is not recommended for most use cases.
</Note>
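
The steps listed under How It Works can be approximated with plain RDKit calls. This is a minimal sketch under stated assumptions, not the code in `smiles_canonicalization.py`: `canonicalize_sketch` is a hypothetical name, and reading "standardizes stereochemistry" as "keep stereo information in the isomeric SMILES" is an assumption.

```python
from typing import Optional

from rdkit import Chem


def canonicalize_sketch(smiles: str, kekulize: bool = False) -> Optional[str]:
    """Approximate the listed steps with RDKit (hypothetical helper)."""
    # 1. Parse; RDKit returns None for SMILES it cannot interpret.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # 2. Remove atom mapping numbers such as the ":1" in "[CH3:1]".
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)
    # 3-5. Keep stereo information, optionally write Kekule bonds,
    #      and emit the canonical SMILES string.
    return Chem.MolToSmiles(
        mol, isomericSmiles=True, kekuleSmiles=kekulize, canonical=True
    )


print(canonicalize_sketch("OCC"))
print(canonicalize_sketch("[CH3:1][OH:2]"))
print(canonicalize_sketch("C1CC("))
```

Returning `None` for unparseable input lets the caller decide whether to skip the query or report an error.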

## Tag Format

<CardGroup cols={3}>
  <Card title="Input Tags" icon="code">
    * `<SMILES> ... </SMILES>`
    * `<IUPAC> ... </IUPAC>`
  </Card>

  <Card title="Output Tags" icon="brackets-curly">
    * `<MOLFORMULA> ... </MOLFORMULA>`
    * `<SMILES> ... </SMILES>`
    * `<IUPAC> ... </IUPAC>`
  </Card>

  <Card title="Auto-Processing" icon="wand-magic-sparkles">
    SMILES canonicalization

    Tag extraction

    Validation
  </Card>
</CardGroup>
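
Model responses embed the answer inside these tags, so client code usually needs to pull the tagged value out of the surrounding sentence. The sketch below shows one way to do that with a regular expression; `extract_tag` is a hypothetical helper, not ChemAgent's own tag extraction.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <TAG> ... </TAG> pair, or None if absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


response = "Of course. It's <SMILES> CCC1(C)COC(=O)C1 </SMILES> ."
print(extract_tag(response, "SMILES"))
print(extract_tag(response, "IUPAC"))
```

Returning `None` when the tag is missing avoids the `IndexError` that chained `str.split` calls raise on untagged responses.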

## Common Patterns

### Multiple Conversions

```python theme={null}
queries = [
    "Convert <IUPAC> ethanol </IUPAC> to SMILES",
    "What is the molecular formula of <SMILES> CCO </SMILES>?",
    "Give me the IUPAC name for <SMILES> C1=CC=CC=C1 </SMILES>"
]

for query in queries:
    result = generator.generate(query)
    print(f"Query: {query}")
    print(f"Result: {result[0]['output'][0]}\n")
```

### With Validation

```python theme={null}
from plan_execute_agent.chem_tools import validate_smiles_rdkit

# Generate SMILES
query = "Convert <IUPAC> benzene </IUPAC> to SMILES"
result = generator.generate(query)

# Extract and validate
smiles = result[0]['output'][0].split('<SMILES>')[1].split('</SMILES>')[0].strip()
validation = validate_smiles_rdkit.invoke({"smiles_string": smiles})
print(f"Valid: {validation['valid']}")
```

## Error Handling

Invalid input does not reliably raise an exception. The model still attempts the conversion and may return an error or an empty result, so check the output before using it:

```python theme={null}
query = "What is the SMILES for <IUPAC> invalidchemicalname123 </IUPAC>?"
result = generator.generate(query)
# The model will attempt to process but may return an error or empty result
```
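
Since a failed conversion may surface as an exception, an empty list, or an empty string, a defensive wrapper can normalize all of these to `None`. The sketch below assumes only the `result[0]['output'][0]` shape used throughout this page; `first_output_or_none` and the stub generators are hypothetical and stand in for `generator.generate`.

```python
from typing import Any, Callable, Optional


def first_output_or_none(generate: Callable[[str], Any], query: str) -> Optional[str]:
    """Return the first generated string, or None if generation failed or was empty."""
    try:
        result = generate(query)
    except Exception:
        # Broad catch is deliberate in this sketch: any failure counts as "no answer".
        return None
    try:
        text = result[0]["output"][0]
    except (IndexError, KeyError, TypeError):
        # The result did not have the expected list/dict shape.
        return None
    if not isinstance(text, str) or not text.strip():
        return None
    return text


# Stub generators used only to exercise the wrapper without loading a model.
def empty_generate(query: str) -> list:
    return [{"output": []}]


def ok_generate(query: str) -> list:
    return [{"output": ["<SMILES> CCO </SMILES>"]}]


print(first_output_or_none(empty_generate, "any query"))
print(first_output_or_none(ok_generate, "any query"))
```

With a loaded model, pass `generator.generate` as the first argument.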

<Tip>
  For best results, ensure chemical names are spelled correctly and use standard IUPAC nomenclature.
</Tip>

## See Also

* [SMILES Validation](/guides/smiles-validation) - Validate SMILES strings
* [Molecule Operations](/guides/molecule-operations) - Generate and describe molecules
* [API Reference](/api-reference/generation) - Full API documentation
