Preparation¶
This tutorial will explain how to properly prepare input files for biceps
using Python. Each experimental observable requires its own method from the biceps.Preparation
class to prepare the input data (datapoints of theory and experiment). For example, biceps.Preparation.prep_noe
will be used to prepare data of NOE distances and biceps.Preparation.prep_J
for scalar coupling constants. Our goal is to prepare input data for the next step, which is constructing the ensemble of
experimental restraints. All data from this tutorial can be found from this work (DOI:10.1002/jcc.23738).
[1]:
import mdtraj as md
import numpy as np
import pandas as pd
import biceps
BICePs - Bayesian Inference of Conformational Populations, Version 2.0
Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate. The estimate presumes the use of many statistically independent samples. Tests should be performed to assess whether this condition is satisfied. Be cautious in the interpretation of the data.
In this tutorial, we have two experimental observables: (1) J couplings for small molecules and (2) NMR nuclear Overhauser effect (NOE) (pairwise distances). First we need to perform conformational clustering on our MD simulations data. In this case, 100 metastable states are clustered. Now we need to prepare prior knowledge we learned from computational simulations. Normally, we used potential energy for each metastable state. In the original work, Zhou et al did Quantum Mechanical (QM) calculations to refine each state and B3LYP energy was used as priors in BICePs calculation. Instructions of QM calculations are beyond the scope of this tutorial. Alternatively, we can estimate the potential energy using \(U = -ln(P)\) where \(P\) is the normalized population for each state. We also have built-in functions (toolbox) to conduct this conversion.
Next, we need to compute pairwise distances and J coupling constants for each clustered state. To compute pairwise distances, we recommend to use MDTraj (biceps.toolbox.compute_distances
) which is free to download. To compare simulated conformational ensembles to experimental NOE measurements, we normally computed \(<r^{-6}>^{-1/6}\). For convenience in this tutorial, we assume the cluster center of each state is representative enough and simply compute pairwise
distances for the cluster center conformation. In practice, we still recommend users to compute ensemble-averaged distance.
[2]:
# Compute model_data for NOE
data_dir = "../../datasets/cineromycin_B/"
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*")
nstates = len(states)
ind=data_dir+'atom_indice_noe.txt'
outdir = "NOE/"
biceps.toolbox.mkdir(outdir)
model_data_NOE = biceps.toolbox.compute_distances(states, ind, outdir)
Next, we need to convert computed distance to Pandas DataFrame. The default file format for saving prepared input data is fmt="pickle"
. biceps
currently supports *.csv
and *.pkl
file formats and possibly other input/output file formats from Pandas.
[3]:
# Now using biceps Preparation submodule
model_data_NOE = str(outdir+"*.txt")
exp_data_NOE = data_dir+"noe_distance.txt"
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_noe(exp_data_NOE, model_data_NOE, indices=ind, write_as="csv", verbose=False)
Now, let’s take a look at what’s inside the newly generated files.
[4]:
pd.options.display.max_columns = 12
pd.options.display.max_rows = 10
pd.read_csv('J_NOE/0.noe', index_col=0)
[4]:
exp | model | restraint_index | atom_index1 | res1 | atom_name1 | atom_index2 | res2 | atom_name2 | |
---|---|---|---|---|---|---|---|---|---|
0 | 2.5 | 2.864866 | 1 | 23 | UNK1 | H3 | 25 | UNK1 | H5 |
1 | 2.5 | 4.400116 | 2 | 23 | UNK1 | H3 | 27 | UNK1 | H7 |
2 | 2.5 | 4.045793 | 3 | 25 | UNK1 | H5 | 27 | UNK1 | H7 |
3 | 2.5 | 3.481033 | 4 | 24 | UNK1 | H4 | 26 | UNK1 | H6 |
4 | 2.5 | 2.566673 | 5 | 25 | UNK1 | H5 | 26 | UNK1 | H6 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
28 | 2.5 | 3.837284 | 14 | 36 | UNK1 | H16 | 39 | UNK1 | H19 |
29 | 2.5 | 2.601645 | 14 | 36 | UNK1 | H16 | 40 | UNK1 | H20 |
30 | 2.5 | 2.810059 | 14 | 37 | UNK1 | H17 | 38 | UNK1 | H18 |
31 | 2.5 | 4.223544 | 14 | 37 | UNK1 | H17 | 39 | UNK1 | H19 |
32 | 2.5 | 3.634443 | 14 | 37 | UNK1 | H17 | 40 | UNK1 | H20 |
33 rows × 9 columns
Now let’s move on to J couplings. Model predictions of coupling constants from dihedral angles θ were obtained from Karplus relations chosen depending on the relevant stereochemistry.
Again, we need to convert computed J coupling constants to Pandas DataFrame.
[5]:
# Compute model_data forJ coupling
ind = np.load(data_dir+'ind.npy') # atom indices of J coupling constants
outdir = "J/"
biceps.toolbox.mkdir(outdir)
# Karplus relations for each dihedral angles
karplus_key=np.loadtxt(data_dir+'Karplus.txt', dtype=str)
print('Karplus relations', karplus_key)
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*.fixed.pdb")
biceps.toolbox.compute_nonaa_scalar_coupling(states=states,
indices=ind, karplus_key=karplus_key, outdir=outdir)
Karplus relations ['Allylic' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH'
'Karplus_HH' 'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O'
'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O']
Next, we need to convert J coupling to Pandas DataFrame. Let’s use fmt="pickle"
to show off how we can import various file formats for different restraints…
[6]:
indices = data_dir+'atom_indice_J.txt'
exp_data_J = data_dir+'exp_Jcoupling.txt'
model_data_J = data_dir+"J_coupling/*.txt"
# Now using biceps Preparation submodule
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_J(exp_data=exp_data_J, model_data=model_data_J,
indices=indices, write_as="pickle", verbose=False)
Now, let’s take a look at what’s inside the newly generated files.
[7]:
pd.read_pickle('J_NOE/0.J')
[7]:
exp | model | restraint_index | atom_index1 | res1 | atom_name1 | ... | atom_index3 | res3 | atom_name3 | atom_index4 | res4 | atom_name4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | 3.279128 | 1 | 25 | UNK1 | H5 | ... | 10 | UNK1 | C7 | 26 | UNK1 | H6 |
1 | 1.5 | 13.336870 | 2 | 30 | UNK1 | H10 | ... | 15 | UNK1 | C12 | 32 | UNK1 | H12 |
2 | 1.5 | 0.600845 | 3 | 31 | UNK1 | H11 | ... | 15 | UNK1 | C12 | 32 | UNK1 | H12 |
3 | 6.0 | 13.877749 | 4 | 32 | UNK1 | H12 | ... | 20 | UNK1 | C16 | 38 | UNK1 | H18 |
4 | 6.0 | 3.339372 | 4 | 32 | UNK1 | H12 | ... | 20 | UNK1 | C16 | 39 | UNK1 | H19 |
5 | 6.0 | 1.828765 | 4 | 32 | UNK1 | H12 | ... | 20 | UNK1 | C16 | 40 | UNK1 | H20 |
6 | 1.5 | 0.464738 | 5 | 32 | UNK1 | H12 | ... | 16 | UNK1 | C13 | 33 | UNK1 | H13 |
7 | 7.0 | 2.956672 | 6 | 33 | UNK1 | H13 | ... | 21 | UNK1 | C17 | 35 | UNK1 | H15 |
8 | 7.0 | 10.805575 | 6 | 33 | UNK1 | H13 | ... | 21 | UNK1 | C17 | 36 | UNK1 | H16 |
9 | 7.0 | 1.232210 | 6 | 33 | UNK1 | H13 | ... | 21 | UNK1 | C17 | 37 | UNK1 | H17 |
10 rows × 15 columns
Conclusion
From this tutorial, we learned how to use the biceps.Preparation
class to prepare input files. Currently, BICePs supports the following restraints:
NOE, J couplings, Chemical Shifts, Protection Factors and more to come…
[8]:
print(biceps.toolbox.list_possible_restraints())
['Restraint_cs', 'Restraint_J', 'Restraint_noe', 'Restraint_pf']
In the example above, we showed how to deal with NOE and J couplings (non-natural amino acids).
Chemical shifts can be computed using different algorithm. We recommend to use Shiftx2 which is also available in MDTraj library. We offer methods in biceps.toolbox
to easily compute model data (chemical shifts, J coupling, NOE distances).
Protection factors is a special observables which asks for extra work. We provide a full example that includes HDX (protection factor) restraints in biceps.
Now that the input files are ready, we can move on to constructing the conformational Ensemble and applying our restraints.
# NOTE: The following cell is for pretty notebook rendering
[14]:
from IPython.core.display import HTML
def css_styling():
styles = open("../../../theme.css", "r").read()
return HTML(styles)
css_styling()
[14]: