Preparation


This tutorial will explain how to properly prepare input files for biceps using Python. Each experimental observable requires its own method from the biceps.Preparation class to prepare the input data (datapoints of theory and experiment). For example, biceps.Preparation.prep_noe will be used to prepare data of NOE distances and biceps.Preparation.prep_J for scalar coupling constants. Our goal is to prepare input data for the next step, which is constructing the ensemble of experimental restraints. All data from this tutorial can be found from this work (DOI:10.1002/jcc.23738).


[1]:
import mdtraj as md
import numpy as np
import pandas as pd
import biceps
BICePs - Bayesian Inference of Conformational Populations, Version 2.0
Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.

In this tutorial, we have two experimental observables: (1) J couplings for small molecules and (2) NMR nuclear Overhauser effect (NOE) (pairwise distances). First we need to perform conformational clustering on our MD simulations data. In this case, 100 metastable states are clustered. Now we need to prepare prior knowledge we learned from computational simulations. Normally, we used potential energy for each metastable state. In the original work, Zhou et al did Quantum Mechanical (QM) calculations to refine each state and B3LYP energy was used as priors in BICePs calculation. Instructions of QM calculations are beyond the scope of this tutorial. Alternatively, we can estimate the potential energy using \(U = -ln(P)\) where \(P\) is the normalized population for each state. We also have built-in functions (toolbox) to conduct this conversion.

Next, we need to compute pairwise distances and J coupling constants for each clustered state. To compute pairwise distances, we recommend to use MDTraj (biceps.toolbox.compute_distances) which is free to download. To compare simulated conformational ensembles to experimental NOE measurements, we normally computed \(<r^{-6}>^{-1/6}\). For convenience in this tutorial, we assume the cluster center of each state is representative enough and simply compute pairwise distances for the cluster center conformation. In practice, we still recommend users to compute ensemble-averaged distance.

[2]:
# Compute model_data for NOE
data_dir = "../../datasets/cineromycin_B/"
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*")
nstates = len(states)
ind=data_dir+'atom_indice_noe.txt'
outdir = "NOE/"
biceps.toolbox.mkdir(outdir)
model_data_NOE = biceps.toolbox.compute_distances(states, ind, outdir)

Next, we need to convert computed distance to Pandas DataFrame. The default file format for saving prepared input data is fmt="pickle". biceps currently supports *.csv and *.pkl file formats and possibly other input/output file formats from Pandas.

[3]:
# Now using biceps Preparation submodule
model_data_NOE = str(outdir+"*.txt")
exp_data_NOE = data_dir+"noe_distance.txt"
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_noe(exp_data_NOE, model_data_NOE, indices=ind, write_as="csv", verbose=False)

Now, let’s take a look at what’s inside the newly generated files.

[4]:
pd.options.display.max_columns = 12
pd.options.display.max_rows = 10
pd.read_csv('J_NOE/0.noe', index_col=0)
[4]:
exp model restraint_index atom_index1 res1 atom_name1 atom_index2 res2 atom_name2
0 2.5 2.864866 1 23 UNK1 H3 25 UNK1 H5
1 2.5 4.400116 2 23 UNK1 H3 27 UNK1 H7
2 2.5 4.045793 3 25 UNK1 H5 27 UNK1 H7
3 2.5 3.481033 4 24 UNK1 H4 26 UNK1 H6
4 2.5 2.566673 5 25 UNK1 H5 26 UNK1 H6
... ... ... ... ... ... ... ... ... ...
28 2.5 3.837284 14 36 UNK1 H16 39 UNK1 H19
29 2.5 2.601645 14 36 UNK1 H16 40 UNK1 H20
30 2.5 2.810059 14 37 UNK1 H17 38 UNK1 H18
31 2.5 4.223544 14 37 UNK1 H17 39 UNK1 H19
32 2.5 3.634443 14 37 UNK1 H17 40 UNK1 H20

33 rows × 9 columns

Now let’s move on to J couplings. Model predictions of coupling constants from dihedral angles θ were obtained from Karplus relations chosen depending on the relevant stereochemistry.

Again, we need to convert computed J coupling constants to Pandas DataFrame.

[5]:
# Compute model_data forJ coupling
ind = np.load(data_dir+'ind.npy') # atom indices of J coupling constants

outdir = "J/"
biceps.toolbox.mkdir(outdir)

# Karplus relations for each dihedral angles
karplus_key=np.loadtxt(data_dir+'Karplus.txt', dtype=str)
print('Karplus relations', karplus_key)
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*.fixed.pdb")
biceps.toolbox.compute_nonaa_scalar_coupling(states=states,
        indices=ind, karplus_key=karplus_key, outdir=outdir)
Karplus relations ['Allylic' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH'
 'Karplus_HH' 'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O'
 'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O']

Next, we need to convert J coupling to Pandas DataFrame. Let’s use fmt="pickle" to show off how we can import various file formats for different restraints…

[6]:
indices = data_dir+'atom_indice_J.txt'
exp_data_J = data_dir+'exp_Jcoupling.txt'
model_data_J = data_dir+"J_coupling/*.txt"

# Now using biceps Preparation submodule
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_J(exp_data=exp_data_J, model_data=model_data_J,
                   indices=indices, write_as="pickle", verbose=False)

Now, let’s take a look at what’s inside the newly generated files.

[7]:
pd.read_pickle('J_NOE/0.J')
[7]:
exp model restraint_index atom_index1 res1 atom_name1 ... atom_index3 res3 atom_name3 atom_index4 res4 atom_name4
0 5.0 3.279128 1 25 UNK1 H5 ... 10 UNK1 C7 26 UNK1 H6
1 1.5 13.336870 2 30 UNK1 H10 ... 15 UNK1 C12 32 UNK1 H12
2 1.5 0.600845 3 31 UNK1 H11 ... 15 UNK1 C12 32 UNK1 H12
3 6.0 13.877749 4 32 UNK1 H12 ... 20 UNK1 C16 38 UNK1 H18
4 6.0 3.339372 4 32 UNK1 H12 ... 20 UNK1 C16 39 UNK1 H19
5 6.0 1.828765 4 32 UNK1 H12 ... 20 UNK1 C16 40 UNK1 H20
6 1.5 0.464738 5 32 UNK1 H12 ... 16 UNK1 C13 33 UNK1 H13
7 7.0 2.956672 6 33 UNK1 H13 ... 21 UNK1 C17 35 UNK1 H15
8 7.0 10.805575 6 33 UNK1 H13 ... 21 UNK1 C17 36 UNK1 H16
9 7.0 1.232210 6 33 UNK1 H13 ... 21 UNK1 C17 37 UNK1 H17

10 rows × 15 columns

Conclusion

From this tutorial, we learned how to use the biceps.Preparation class to prepare input files. Currently, BICePs supports the following restraints:

NOE, J couplings, Chemical Shifts, Protection Factors and more to come…

[8]:
print(biceps.toolbox.list_possible_restraints())
['Restraint_cs', 'Restraint_J', 'Restraint_noe', 'Restraint_pf']

In the example above, we showed how to deal with NOE and J couplings (non-natural amino acids).

Chemical shifts can be computed using different algorithm. We recommend to use Shiftx2 which is also available in MDTraj library. We offer methods in biceps.toolbox to easily compute model data (chemical shifts, J coupling, NOE distances).

Protection factors is a special observables which asks for extra work. We provide a full example that includes HDX (protection factor) restraints in biceps.

Now that the input files are ready, we can move on to constructing the conformational Ensemble and applying our restraints.

# NOTE: The following cell is for pretty notebook rendering

[14]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../../../theme.css", "r").read()
    return HTML(styles)
css_styling()
[14]: