Chemistry Validate
POST /science/v1/chemistry/validate
What it does
Send any chemical notation — SMILES, InChI, or MDL Molfile (V2000/V3000) — and get back a complete validation report in milliseconds. Powered by RDKit, the industry-standard cheminformatics toolkit.
What you get back
| Output | Tier | What it gives you |
|---|---|---|
| Canonical SMILES | Free | Normalized, deterministic SMILES string — safe for dedup, indexing, and downstream pipelines. |
| InChI | Free | IUPAC International Chemical Identifier for cross-database lookup. |
| Molecular descriptors | Free | Formula, exact weight, atom/bond counts, TPSA, cLogP, rotatable bonds, aromaticity, stereochemistry. |
| Lipinski Ro5 | Free | Drug-likeness check — pass/fail on all four Lipinski rules with actual values. |
| InChI Key | Premium | 27-character hash for exact structure matching and registry dedup. |
| Molblock | Premium | Full MDL Molblock with 2D coordinates — ready for structure export and visualization. |
Invalid molecules are not errors
If the input is chemically invalid, the API still returns a successful response with data.valid: false. You get a structured reason explaining why it failed, plus fix recommendations. This makes it safe to use in form validation, user input pipelines, and batch processing without error handling for expected invalid inputs.
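As a minimal sketch of what this looks like on the client side, the function below turns a parsed response into a one-line message without any exception handling for invalid structures. The field names (data.valid, data.error.message, and a top-level recommendations array) are taken from the sample responses on this page; summarize_validation is a hypothetical helper, not part of the API.

```python
def summarize_validation(response: dict) -> str:
    """Turn a parsed validate response into a short, user-facing message."""
    data = response.get("data", {})
    if data.get("valid"):
        # Prefer the canonical form; fall back to the echoed input.
        return f"OK: {data.get('canonicalSmiles', data.get('input', ''))}"
    # An invalid structure arrives as a normal response, so we only read fields.
    reason = data.get("error", {}).get("message", "unknown reason")
    fixes = [r["message"] for r in response.get("recommendations", []) if "message" in r]
    hint = f" Suggested fix: {fixes[0]}" if fixes else ""
    return f"Invalid: {reason}.{hint}"


# Hand-copied from the invalid-molecule sample response in the Examples section.
invalid = {
    "success": True,
    "data": {
        "valid": False,
        "format": "smiles",
        "input": "CC(=O",
        "error": {"message": "SMILES Parse Error: unclosed ring or branch"},
    },
    "recommendations": [
        {
            "type": "fix",
            "title": "Input fix",
            "message": "Unbalanced parentheses: 1 opening '(' without matching ')'.",
            "priority": "high",
        }
    ],
}
print(summarize_validation(invalid))
```

Transport-level failures (timeouts, authentication errors, rate limits) are a separate concern and would still need their own handling.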
Why use it?
- Clean your data before it breaks downstream. Garbage SMILES in a database causes silent failures in docking, retrosynthesis, and ML pipelines. Validate early, fix cheap.
- Canonical = deterministic. The same molecule written 10 different ways produces the same canonical SMILES every time. Use it as a primary key, dedup identifier, or cache key.
- Real-time form validation. Build input fields that validate chemistry notation as users type — with human-readable error messages, not stack traces.
- Drug-likeness screening. Lipinski Rule of Five check built in. Filter compound libraries before expensive experiments.
- Actionable fix suggestions. Invalid SMILES? The API analyzes common mistakes (unbalanced parentheses, valence errors, unclosed rings) and tells you exactly what to fix.
- No RDKit installation required. Skip the painful C++ build, Python bindings, and WASM setup. One HTTP call, any language.
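To illustrate the deduplication point, here is a small sketch that groups already-fetched responses by data.canonicalSmiles. The responses list is a hand-written stand-in for real API output (the second entry is aspirin written in a different atom order, with the canonical value assumed to match the first), and group_by_canonical is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_canonical(responses: list) -> dict:
    """Map each canonical SMILES to the original inputs that produced it."""
    groups = defaultdict(list)
    for resp in responses:
        data = resp.get("data", {})
        # Skip invalid molecules: they carry no canonical form.
        if data.get("valid") and "canonicalSmiles" in data:
            groups[data["canonicalSmiles"]].append(data["input"])
    return dict(groups)


# Stand-in responses, not real API output.
responses = [
    {"success": True, "data": {"valid": True,
                               "input": "CC(=O)OC1=CC=CC=C1C(=O)O",
                               "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O"}},
    {"success": True, "data": {"valid": True,
                               "input": "OC(=O)c1ccccc1OC(C)=O",
                               "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O"}},
    {"success": True, "data": {"valid": False, "input": "CC(=O"}},
]

for canonical, inputs in group_by_canonical(responses).items():
    print(canonical, "<-", inputs)
```

Any key with more than one input marks a duplicate structure that was written differently.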
Examples
Validate Aspirin (SMILES)
curl -X POST https://api.creatornode.io/science/v1/chemistry/validate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "input": "CC(=O)OC1=CC=CC=C1C(=O)O"
  }'
Response: valid molecule
{
  "success": true,
  "data": {
    "valid": true,
    "format": "smiles",
    "input": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O",
    "inchi": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
    "metadata": {
      "molecularFormula": "C9H8O4",
      "exactMolecularWeight": 180.042,
      "numAtoms": 21, "numBonds": 21,
      "numRings": 1, "numHeavyAtoms": 13,
      "numHBA": 4, "numHBD": 1,
      "TPSA": 63.6, "cLogP": 1.31,
      "numRotatableBonds": 3,
      "numAromaticRings": 1, "hasStereo": false
    },
    "drugLikeness": {
      "lipinskiRo5": true, "violations": 0,
      "rules": [
        { "rule": "MW ≤ 500", "passed": true, "value": 180.04 },
        { "rule": "cLogP ≤ 5", "passed": true, "value": 1.31 },
        { "rule": "HBD ≤ 5", "passed": true, "value": 1 },
        { "rule": "HBA ≤ 10", "passed": true, "value": 4 }
      ]
    }
  },
  "meta": {
    "requestId": "abc-123",
    "processingTimeMs": 12,
    "rdkitVersion": "2025.3.4"
  }
}
Response: invalid molecule
{
  "success": true,
  "data": {
    "valid": false,
    "format": "smiles",
    "input": "CC(=O",
    "error": {
      "message": "SMILES Parse Error: unclosed ring or branch"
    }
  },
  "recommendations": [
    {
      "type": "fix",
      "title": "Input fix",
      "message": "Unbalanced parentheses: 1 opening '(' without matching ')'.",
      "priority": "high"
    }
  ]
}
InChI input with premium outputs
{
  "input": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
  "format": "inchi",
  "options": {
    "inchiKey": true,
    "molblock": true
  }
}
On the free tier, the inchiKey and molblock options are silently ignored. Upgrade to Premium to include them in the response.
Supported input formats
| Format | Example | When to use |
|---|---|---|
| SMILES | CC(=O)Oc1ccccc1C(=O)O | Most common. Compact, human-readable line notation. |
| InChI | InChI=1S/C9H8O4/... | Cross-database lookup. Standard identifier from IUPAC. |
| Molfile | (multi-line V2000/V3000) | Structure files from ChemDraw, Marvin, or database exports. |
💡 Auto-detection: You don't need to specify format. The API detects it from the input: InChI strings start with InChI=, Molfiles contain V2000/V3000, and everything else is treated as SMILES.
Options reference
| Option | Type | Default | What it does |
|---|---|---|---|
| canonicalSmiles | boolean | true | Include canonical (normalized) SMILES in output. |
| inchi | boolean | true | Include InChI string in output. |
| inchiKey | boolean | false | Include InChI Key (27-char hash). Premium only. |
| molblock | boolean | false | Include MDL Molblock with 2D coords. Premium only. |
| descriptors | boolean | true | Include molecular descriptors (weight, TPSA, cLogP, etc.). |
| sanitize | boolean | true | Run RDKit sanitization (valence check). Set to false for molecules with non-standard valence. |
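One way to keep request bodies consistent with the options table is to build them through a small helper that rejects misspelled option names before anything is sent. The sketch below only constructs the JSON string and never calls the endpoint; build_request_body is a hypothetical helper, and leaving out options that equal their documented default is an assumption that the server applies those defaults.

```python
import json
from typing import Optional

# Option names and defaults copied from the options table.
DEFAULT_OPTIONS = {
    "canonicalSmiles": True,
    "inchi": True,
    "inchiKey": False,
    "molblock": False,
    "descriptors": True,
    "sanitize": True,
}


def build_request_body(structure: str, fmt: Optional[str] = None, **options: bool) -> str:
    """Return the JSON body for a validate request as a string."""
    unknown = set(options) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    body = {"input": structure}
    if fmt is not None:
        # Omit "format" entirely to rely on auto-detection.
        body["format"] = fmt
    # Send only the options that differ from the documented defaults.
    overrides = {k: v for k, v in options.items() if v != DEFAULT_OPTIONS[k]}
    if overrides:
        body["options"] = overrides
    return json.dumps(body)


print(build_request_body("CC(=O)OC1=CC=CC=C1C(=O)O"))
print(build_request_body("CC(=O", sanitize=False))
```

The resulting string can be passed as the -d payload of the curl command shown in the Examples section.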
Tips & tricks
💡 Tip: For the full request/response schema, OpenAPI spec, and the interactive demo endpoint, see the API Reference.
- Let format auto-detection work for you. The format field defaults to "auto" and correctly identifies SMILES, InChI, and Molfile. Only set it explicitly if you know the format and want to skip detection.
- Use canonical SMILES as your primary key. Multiple SMILES strings can represent the same molecule. Canonical SMILES eliminates duplicates: store it, index it, compare it.
- Valence errors? Try sanitize: false. Some molecules (vancomycin fragments, metal complexes) have atoms that violate standard valence rules. Setting "options": { "sanitize": false } lets RDKit parse them anyway and produce a canonical SMILES. The API will suggest this automatically when it detects a valence error.
- Batch with confidence. The API returns valid: false as a normal response, not an error. You can process hundreds of molecules in a loop without try/catch around invalid structures; just check data.valid.
- Read the fix recommendations. When validation fails, the recommendations array contains specific, actionable fixes (unbalanced parentheses, unclosed rings, valence issues), not generic error messages.
- Lipinski filtering at scale. Use drugLikeness.lipinskiRo5 to filter compound libraries. Molecules with more than 1 violation are unlikely to be orally bioavailable.
- Input limits are in bytes, not characters. In UTF-8, non-ASCII characters in a Molfile take more than 1 byte each. Free tier: 200 bytes, Premium: 4,000 bytes.
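The batch and Lipinski tips can be combined into a simple library filter. The sketch below works on already-parsed responses and uses the data.valid and data.drugLikeness fields shown in the sample response; passes_ro5_filter and its max_violations threshold are hypothetical names, and falling back to counting failed rules when violations is absent is an assumption rather than documented behaviour.

```python
def passes_ro5_filter(response: dict, max_violations: int = 1) -> bool:
    """Keep valid molecules with at most `max_violations` Lipinski violations."""
    data = response.get("data", {})
    if not data.get("valid"):
        return False  # invalid input: nothing to screen
    drug = data.get("drugLikeness")
    if drug is None:
        return False  # no drug-likeness data in this response
    violations = drug.get("violations")
    if violations is None:
        # Assumed fallback: count the rules that did not pass.
        violations = sum(1 for rule in drug.get("rules", []) if not rule.get("passed"))
    return violations <= max_violations


# Trimmed stand-ins for API responses, not real output.
library = [
    {"data": {"valid": True, "input": "A", "drugLikeness": {"lipinskiRo5": True, "violations": 0}}},
    {"data": {"valid": True, "input": "B", "drugLikeness": {"lipinskiRo5": False, "violations": 2}}},
    {"data": {"valid": False, "input": "C"}},
]

kept = [r["data"]["input"] for r in library if passes_ro5_filter(r)]
print(kept)
```

Setting max_violations=0 gives a stricter screen that keeps only molecules passing all four rules.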
Cost & Limits
| Feature | Detail |
|---|---|
| Credit cost | 1 credit per request |
| Input formats | SMILES, InChI, MDL Molfile (V2000/V3000) |
| Free outputs | canonicalSmiles, inchi, metadata, drugLikeness |
| Premium outputs | inchiKey, molblock |
| Engine | RDKit |
Tier Limits
| Limit | Free | Premium |
|---|---|---|
| Max input size | 200 bytes | 4,000 bytes |
| InChI Key | No | Yes |
| Molblock export | No | Yes |
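Because the size limit is counted in bytes, a pre-flight check should measure the UTF-8 encoded length rather than the character count. The following sketch assumes the limit applies to the value of the input field alone (not to the whole JSON body); fits_tier and TIER_LIMITS_BYTES are hypothetical names, with the numbers copied from the Tier Limits table.

```python
# Limits copied from the Tier Limits table, in bytes.
TIER_LIMITS_BYTES = {"free": 200, "premium": 4000}


def input_size_bytes(structure: str) -> int:
    """Size of the input once encoded as UTF-8."""
    return len(structure.encode("utf-8"))


def fits_tier(structure: str, tier: str = "free") -> bool:
    """True if the encoded input is within the tier's byte limit."""
    return input_size_bytes(structure) <= TIER_LIMITS_BYTES[tier]


ascii_input = "C" * 200      # 200 characters, 200 bytes
accented_input = "é" * 150   # 150 characters, 300 bytes in UTF-8

print(fits_tier(ascii_input))                 # within the free limit
print(fits_tier(accented_input))              # over the free limit despite fewer characters
print(fits_tier(accented_input, "premium"))   # within the premium limit
```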