Chemistry Compare

POST /science/v1/chemistry/compare

What it does

Send two molecules in any supported notation — SMILES, InChI, or MDL Molfile — and find out how they relate. Powered by RDKit Morgan fingerprints and substructure matching.

Comparison modes

Mode	Tier	What it tells you
identity	Free	Are these the same molecule? Compares canonical SMILES — handles atom reordering, aromatic/Kekulé differences, and notation variants.
similarity	Free	How similar are they? Tanimoto coefficient (0–1) on Morgan circular fingerprints, with human-readable interpretation.
substructure	Premium	Is molecule A inside molecule B (or vice versa)? Graph-based substructure matching for scaffold/fragment detection.
full	Premium	All of the above in a single request.

Invalid molecules are not errors

If one or both molecules fail to parse, the API still returns HTTP 200 with data.parseErrors describing what went wrong — consistent with how Chemistry Validate handles invalid inputs. No comparison is performed, but the response is structured and safe for pipelines.
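Because a parse failure still arrives as HTTP 200, a client has to inspect the body instead of the status code. The following is a minimal sketch of that check, not the official client. It assumes error entries carry `index`, `hint`, and `hintCode` (names taken from the tips on this page); the sample payload, including the `hintCode` value, is hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def split_compare_response(body: dict[str, Any]) -> tuple[Optional[dict[str, Any]], list[dict[str, Any]]]:
    """Return (comparison_data, parse_errors) from a decoded Compare response.

    Assumes parse failures are listed under data.parseErrors and that no
    comparison was performed when that list is non-empty.
    """
    data = body.get("data") or {}
    errors = list(data.get("parseErrors") or [])
    if errors:
        return None, errors
    return data, []


# Hypothetical body for a request whose second molecule failed to parse.
# The exact shape and the hintCode value are assumptions for illustration.
bad = {
    "success": True,
    "data": {
        "parseErrors": [
            {"index": 1, "hint": "Unclosed ring", "hintCode": "UNCLOSED_RING"}
        ]
    },
}
result, errors = split_compare_response(bad)
for err in errors:
    print(f"molecule {err['index']}: {err['hint']} ({err['hintCode']})")
```

Returning the errors as data keeps a batch job moving: a bad row can be logged and skipped without a try/except around the HTTP call.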

Why use it?

Dedup compound databases. Identity mode tells you if two differently-written SMILES are actually the same molecule. Use it as a batch dedup step before expensive downstream processing.
Rank drug candidates by similarity. Generate variants → compare each to a reference molecule → sort by Tanimoto score. The interpretation label ("very similar", "similar", "moderately similar", "dissimilar", etc.) is ready for reports without manual threshold logic.
Scaffold matching. Does this compound contain a benzene ring? A sulfonamide group? Substructure mode answers fragment-in-molecule questions that regular string matching can't.
Cross-format comparison. Compare an InChI string against a SMILES string without pre-conversion. The API normalizes both inputs before comparing.
No RDKit installation required. Molecular fingerprinting and substructure search without C++ builds, Python bindings, or WASM setup. One HTTP call.
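The ranking workflow described above (compare each variant to a reference, then sort by Tanimoto score) could be sketched as follows. The `compare` argument is a hypothetical callable that POSTs a request body to this endpoint and returns the decoded JSON; it is injected so the sketch stays independent of any particular HTTP library.

```python
from __future__ import annotations

from typing import Callable, Iterable


def rank_by_similarity(
    reference: str,
    candidates: Iterable[str],
    compare: Callable[[dict], dict],
) -> list[tuple[str, float, str]]:
    """Compare each candidate to the reference and sort by Tanimoto, best first.

    `compare` is a hypothetical helper: it sends the body to the Compare
    endpoint and returns the parsed JSON response.
    """
    ranked = []
    for candidate in candidates:
        body = {
            "molecules": [{"input": reference}, {"input": candidate}],
            "mode": "similarity",
        }
        data = compare(body).get("data", {})
        sim = data.get("similarity")
        if sim is None:
            # No similarity block, e.g. the candidate failed to parse.
            continue
        ranked.append((candidate, sim["tanimoto"], sim["interpretation"]))
    return sorted(ranked, key=lambda row: row[1], reverse=True)
```

Candidates without a similarity block are skipped here; a real pipeline might prefer to collect them together with their `parseErrors` for review.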

Examples

Identity check — same molecule, different notation

curl -X POST https://api.creatornode.io/science/v1/chemistry/compare \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "molecules": [
      { "input": "CC(=O)OC1=CC=CC=C1C(=O)O" },
      { "input": "CC(=O)Oc1ccccc1C(=O)O" }
    ],
    "mode": "identity"
  }'

Response — identical

{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" }
    ],
    "identity": {
      "identical": true
    }
  },
  "meta": {
    "requestId": "abc-123",
    "processingTimeMs": 8,
    "rdkitVersion": "2025.3.4"
  }
}
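For callers who prefer Python over curl, the same identity request can be built with the standard library alone. This is a sketch under the assumption that the endpoint accepts exactly the headers and body shown in the curl example; `build_compare_request` and `send` are illustrative names, not part of any official SDK.

```python
import json
import urllib.request

API_URL = "https://api.creatornode.io/science/v1/chemistry/compare"


def build_compare_request(mol_a: str, mol_b: str, mode: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"molecules": [{"input": mol_a}, {"input": mol_b}], "mode": mode}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


req = build_compare_request(
    "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, Kekulé form
    "CC(=O)Oc1ccccc1C(=O)O",     # aspirin, aromatic form
    "identity",
    "YOUR_KEY",
)
# response = send(req)  # network call; needs a real API key
```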

Similarity — Aspirin vs Ibuprofen

{
  "molecules": [
    { "input": "CC(=O)Oc1ccccc1C(=O)O" },
    { "input": "CC(C)Cc1ccc(cc1)C(C)C(=O)O" }
  ],
  "mode": "similarity"
}

Response — similarity score

{
  "success": true,
  "data": {
    "molecules": [
      { "canonicalSmiles": "CC(=O)Oc1ccccc1C(=O)O", "format": "smiles" },
      { "canonicalSmiles": "CC(C)Cc1ccc(C(C)C(=O)O)cc1", "format": "smiles" }
    ],
    "similarity": {
      "tanimoto": 0.29,
      "fingerprintRadius": 2,
      "fingerprintBits": 2048,
      "interpretation": "unrelated"
    }
  },
  "meta": {
    "requestId": "def-456",
    "processingTimeMs": 10,
    "rdkitVersion": "2025.3.4"
  }
}

Full mode — everything at once (Premium)

{
  "molecules": [
    { "input": "c1ccccc1" },
    { "input": "c1ccc(cc1)O" }
  ],
  "mode": "full"
}

Returns identity, similarity, and substructure blocks in a single response. Requires a Premium API key.

Comparison modes in detail

Mode	Algorithm	Output
identity	Canonical SMILES string equality	`identical: true/false`
similarity	Morgan fingerprint → Tanimoto coefficient	`tanimoto: 0.0–1.0` + interpretation label
substructure	RDKit `get_substruct_match()` graph search	`aInB`, `bInA`, `relationship`
full	All three combined	All output blocks in one response
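The substructure output combines two directional checks into one `relationship` label. How the API derives that label is not documented here; the sketch below shows one plausible mapping, assuming `aInB` and `bInA` are booleans and using the four relationship values listed under Tips & tricks.

```python
def relationship(a_in_b: bool, b_in_a: bool) -> str:
    """Collapse two directional substructure results into a single label.

    Assumed mapping, not confirmed by the API docs: both directions
    matching means "mutual", neither means "none".
    """
    if a_in_b and b_in_a:
        return "mutual"
    if a_in_b:
        return "a_in_b"
    if b_in_a:
        return "b_in_a"
    return "none"


# Benzene (A) is a fragment of phenol (B), but not the other way round.
print(relationship(True, False))  # a_in_b
```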

Similarity interpretation thresholds

Tanimoto range	Interpretation	Meaning
≥ 0.95	identical	Essentially the same structure
≥ 0.85 and < 0.95	very similar	Close analogs, minor substituent differences
≥ 0.70 and < 0.85	similar	Same scaffold family, moderate variation
≥ 0.50 and < 0.70	moderately similar	Shared substructure elements
≥ 0.30 and < 0.50	dissimilar	Limited structural overlap
< 0.30	unrelated	Different molecular classes
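The API already returns the label, so client code rarely needs this, but reproducing the table locally can be useful for tests or for relabeling stored scores. A minimal sketch, assuming each band includes its lower bound:

```python
def interpret_tanimoto(score: float) -> str:
    """Map a Tanimoto coefficient to the interpretation labels in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Tanimoto must be within [0, 1], got {score}")
    # Lower bounds, checked from the highest band down.
    bands = [
        (0.95, "identical"),
        (0.85, "very similar"),
        (0.70, "similar"),
        (0.50, "moderately similar"),
        (0.30, "dissimilar"),
    ]
    for lower, label in bands:
        if score >= lower:
            return label
    return "unrelated"


print(interpret_tanimoto(0.29))  # unrelated
print(interpret_tanimoto(0.72))  # similar
```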

Fingerprint options

Option	Type	Default	What it does
`fingerprintRadius`	number (1–4)	`2`	Morgan fingerprint radius. Higher = more extended features captured.
`fingerprintBits`	1024 | 2048	`2048`	Bit vector length. 2048 is standard; 1024 is faster but slightly less precise.

💡 Auto-detection: You don't need to specify format on each molecule. The API detects it from the input: InChI strings start with InChI=, Molfiles contain V2000/V3000, everything else is treated as SMILES.
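The detection rule above is simple enough to mirror on the client, for example to log which format each input will be treated as. This sketch follows the stated heuristic only; the server's actual logic may be stricter, and the `"inchi"` and `"molfile"` labels are assumed by analogy with the `"smiles"` value seen in responses.

```python
def detect_format(text: str) -> str:
    """Guess the notation the same way the auto-detection note describes."""
    if text.strip().startswith("InChI="):
        return "inchi"
    if "V2000" in text or "V3000" in text:
        return "molfile"
    # Everything else is treated as SMILES.
    return "smiles"


print(detect_format("InChI=1S/CH4/h1H4"))      # inchi
print(detect_format("CC(=O)Oc1ccccc1C(=O)O"))  # smiles
```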

Tips & tricks

💡 Tip: For the full request/response schema, OpenAPI spec, and the interactive demo endpoint, see the API Reference.

Let format auto-detection work for you. Both molecules default to format: "auto". The API detects SMILES, InChI, and Molfile automatically — you can even compare an InChI against a SMILES.
Tanimoto interpretation guide. The interpretation field saves you from hardcoding thresholds: ≥ 0.95 is identical, ≥ 0.85 very similar, ≥ 0.7 similar, ≥ 0.5 moderately similar, ≥ 0.3 dissimilar, < 0.3 unrelated.
Fingerprint radius matters. The default radius of 2 is standard for drug-likeness comparisons. Increase to 3 or 4 to capture more extended molecular features — useful for distinguishing closely related analogs.
Substructure for fragment screening. Use "mode": "substructure" to check if a pharmacophore fragment is present in a larger molecule. The relationship field tells you the direction: "a_in_b", "b_in_a", "mutual", or "none".
Parse errors don't crash your pipeline. If one molecule is invalid, you get a parseErrors array with the index of the failing molecule plus a hint / hintCode to fix the input: no HTTP error, no exception, just structured data.
Input limits are in bytes, not characters. In UTF-8, every non-ASCII character takes more than 1 byte, and Molfiles are far longer than the equivalent SMILES, so measure the encoded length. Free tier: 200 bytes per molecule, Premium: 4,000 bytes.
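A pre-flight size check avoids sending inputs that would be rejected. The sketch below measures the UTF-8 encoded length against the documented per-molecule limits; treating the limit as inclusive is an assumption.

```python
# Per-molecule limits from the tier table, in bytes.
LIMITS = {"free": 200, "premium": 4000}


def fits_limit(molecule: str, tier: str = "free") -> bool:
    """Return True if the UTF-8 encoded input is within the tier's byte limit."""
    return len(molecule.encode("utf-8")) <= LIMITS[tier]


print(fits_limit("CC(=O)Oc1ccccc1C(=O)O"))  # True
print(fits_limit("C" * 201))                # False
print(fits_limit("C" * 201, "premium"))     # True
```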

Cost & Limits

Feature	Detail
Credit cost	3 credits per request
Input formats	SMILES, InChI, MDL Molfile (V2000/V3000)
Free modes	identity, similarity
Premium modes	substructure, full
Engine	RDKit

Tier Limits

Limit	Free	Premium
Max input size (per molecule)	200 bytes	4,000 bytes
Identity mode	Yes	Yes
Similarity mode	Yes	Yes
Substructure mode	No	Yes
Full mode	No	Yes

Other Endpoints

POST /science/v1/chemistry/validate

Chemistry Validate

Validate chemical notation (SMILES, InChI, or MDL Molfile) and extract molecular properties.