Advanced ligand preparation#

Parameterization of RDKit molecules is controlled by several parameters that can be used when creating an instance of MoleculePreparation. Many of such parameters are exposed to the command line script, mk_prepare_ligand.py. Here, we cover the most relevant options, and how to use them both from command line and Python scripts. Familiarity with Python and with the command line is required.

Configuring MoleculePreparation#

From Python#

An instance of MoleculePreparation is configured by passing any number of optional parameters during initialization:

from meeko import MoleculePreparation

mk_prep = MoleculePreparation(
    merge_these_atom_types=("H"),
    charge_model="gasteiger",
)

Here we are passing two parameters, one that will merge atoms typed “H” into their parent atom, and another to use Gasteiger charges. Both are the default parameter, so we would have gotten the same configuration by passing zero parameters: mk_prep = MoleculePreparation().

Parameters can be in a dictionary and use the from_config constructor:

config_dict = {"merge_these_atom_types": (), "charge_model": "gasteiger"}
mk_prep = MoleculePreapration.form_config(config_dict)

The configuration dictionary can be written to a JSON file:

import json

with open("my_config.json", "w") as f:
    json.dump(config_dict, f)

From command line#

The command line script mk_prepare_ligand.py can read a JSON file using the -c or --config_file option. The contents of a JSON file corresponding to the Python example above is:

{"merge_these_atom_types": ["H"], "charge_model": "gasteiger"}

Passing the filename to the script with option -c:

mk_prepare_ligand.py -i mol.sdf -c my_config.json

Here, mol.sdf is some file in the current working directory.

Alternatively, each parameter can be passed directly:

mk_prepare_ligand.py -i mol.sdf --charge_model gasteiger --merge_these_atom_types H

Merging hydrogens (or not)#

By default, atoms of type H are merged. Merging consists of adding their partial charge to the parent atom, and setting the is_ignore attribute to True in the instance of MoleculeSetup, which in turn prevents the atom from being written to PDBQT. Since the atoms are merged based on their atom type, changing the atom typing can change the set of atoms that is merged.

Option name is merge_these_atom_types. To prevent merging of any atom types set the variable to an empty tuple in Python:

mk_prep = MoleculePreparation(merge_these_atom_types=())

or pass no parameters in command line

mk_prepare_ligand.py -i mol.sdf --merge_these_atom_types

Modifying atom types#

Atom typing relies on SMARTS patterns to identify chemical substructures. AutoDock4 atom types are set by default. The easiest way to modify typing is to add new SMARTS that will superseed the existing ones. For example, let’s assume we want to type hydrogens bound to aromatic carbons as HX. By default, hydrogens bound to carbon are typed H. A SMARTS pattern to matches hydrogen bound to carbon is "[H][c]". From command line:

mk_prepare_ligand.py --add_atom_types '[{"smarts": "[H]c", "atype": "HX"}]' -i mol.sdf

We pass a JSON string to --add_atom_types that is a list of dictionaries. Each dictionary has a "smarts" and "atype" key, and an optional "IDX" key that can be used to specify a list of atom indices (0-based) of the atoms in the SMARTS string that will be typed. By default IDX = [0].

The equivalent from Python is:

mk_prep = MoleculePreparation(
    add_atom_types=[{"smarts": "c[H]", "atype": "HX", "IDX": [1]}],
)

Note that we swapped the order of the atoms in the SMARTS, and are now explicitly defining the "IDX" key to type the second atom in the SMARTS.

The full set of atom types can also be specified. This can only be done from Python or by passing the equivalent configuration JSON file to mk_prepare_ligand.py. The easiest way to do so, is to put all SMARTS in a JSON file. See the default file for an example, it is located at meeko/data/params/ad4_types.json. The IDX key can be used as described above. Entries are matched in the order they appear in the file, the last SMARTS pattern that matches an atom is the one that determines the atom type. Then the filename can be passed to option -p/--load_atom_params.

Rigidifying bonds#

By default, single bonds are made rotatable except bonds in rings and amide bonds. Thioamide and amidine bonds are also not rotatable. Tertiary amides with non-equivalent substituents on the nitrogen are still made rotatable, which often leads to unreasonable geometries, but is necessary to visit both amide rotamers during docking.

Here, we configure Meeko to make single bonds in some conjugated systems rigid, as defined by the SMARTS "C=CC=C", and rigidify all amide bonds matched by "[CX3](=O)[NX3]", which includes tertiary amides but not thioamides or amidines:

mk_prepare_ligand.py\
  --rigidify_bonds_smarts "C=CC=C"\
  --rigidify_bonds_indices 2 3\
  --rigidify_bonds_smarts "[CX3](=O)[NX3]"\
  --rigidify_bonds_indices 1 3\
  -i mol.sdf

The equivalent code in Python to initialize the molecule preparator is:

mk_prep = MoleculePreparation(
    rigidify_bonds_smarts = ["C=CC=C", "[CX3](=O)[NX3]"],
    rigidify_bonds_indices = [(1, 2), (0, 2)],

)

The indices are the indices of the atoms in the SMARTS strings. Note that we use 0-based indices from the Python API, but 1-based indices from the command line script. In a future version of Meeko we may use 0-based indices everywhere.