Preparation¶

This tutorial will explain how to properly prepare input files for biceps using Python. Each experimental observable requires its own method from the biceps.Preparation class to prepare the input data (datapoints of theory and experiment). For example, biceps.Preparation.prep_noe will be used to prepare data of NOE distances and biceps.Preparation.prep_J for scalar coupling constants. Our goal is to prepare input data for the next step, which is constructing the ensemble of experimental restraints. All data from this tutorial can be found from this work (DOI:10.1002/jcc.23738).

[1]:

import mdtraj as md
import numpy as np
import pandas as pd
import biceps

BICePs - Bayesian Inference of Conformational Populations, Version 2.0

Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.

In this tutorial, we have two experimental observables: (1) J couplings for small molecules and (2) NMR nuclear Overhauser effect (NOE) (pairwise distances). First we need to perform conformational clustering on our MD simulations data. In this case, 100 metastable states are clustered. Now we need to prepare prior knowledge we learned from computational simulations. Normally, we used potential energy for each metastable state. In the original work, Zhou et al did Quantum Mechanical (QM) calculations to refine each state and B3LYP energy was used as priors in BICePs calculation. Instructions of QM calculations are beyond the scope of this tutorial. Alternatively, we can estimate the potential energy using \(U = -ln(P)\) where \(P\) is the normalized population for each state. We also have built-in functions (toolbox) to conduct this conversion.

Next, we need to compute pairwise distances and J coupling constants for each clustered state. To compute pairwise distances, we recommend to use MDTraj (biceps.toolbox.compute_distances) which is free to download. To compare simulated conformational ensembles to experimental NOE measurements, we normally computed \(<r^{-6}>^{-1/6}\). For convenience in this tutorial, we assume the cluster center of each state is representative enough and simply compute pairwise distances for the cluster center conformation. In practice, we still recommend users to compute ensemble-averaged distance.

[2]:

# Compute model_data for NOE
data_dir = "../../datasets/cineromycin_B/"
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*")
nstates = len(states)
ind=data_dir+'atom_indice_noe.txt'
outdir = "NOE/"
biceps.toolbox.mkdir(outdir)
model_data_NOE = biceps.toolbox.compute_distances(states, ind, outdir)

Next, we need to convert computed distance to Pandas DataFrame. The default file format for saving prepared input data is fmt="pickle". biceps currently supports *.csv and *.pkl file formats and possibly other input/output file formats from Pandas.

[3]:

# Now using biceps Preparation submodule
model_data_NOE = str(outdir+"*.txt")
exp_data_NOE = data_dir+"noe_distance.txt"
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_noe(exp_data_NOE, model_data_NOE, indices=ind, write_as="csv", verbose=False)

Now, let’s take a look at what’s inside the newly generated files.

[4]:

pd.options.display.max_columns = 12
pd.options.display.max_rows = 10
pd.read_csv('J_NOE/0.noe', index_col=0)

[4]:

	exp	model	restraint_index	atom_index1	res1	atom_name1	atom_index2	res2	atom_name2
0	2.5	2.864866	1	23	UNK1	H3	25	UNK1	H5
1	2.5	4.400116	2	23	UNK1	H3	27	UNK1	H7
2	2.5	4.045793	3	25	UNK1	H5	27	UNK1	H7
3	2.5	3.481033	4	24	UNK1	H4	26	UNK1	H6
4	2.5	2.566673	5	25	UNK1	H5	26	UNK1	H6
...	...	...	...	...	...	...	...	...	...
28	2.5	3.837284	14	36	UNK1	H16	39	UNK1	H19
29	2.5	2.601645	14	36	UNK1	H16	40	UNK1	H20
30	2.5	2.810059	14	37	UNK1	H17	38	UNK1	H18
31	2.5	4.223544	14	37	UNK1	H17	39	UNK1	H19
32	2.5	3.634443	14	37	UNK1	H17	40	UNK1	H20

33 rows × 9 columns

Now let’s move on to J couplings. Model predictions of coupling constants from dihedral angles θ were obtained from Karplus relations chosen depending on the relevant stereochemistry.

Again, we need to convert computed J coupling constants to Pandas DataFrame.

[5]:

# Compute model_data forJ coupling
ind = np.load(data_dir+'ind.npy') # atom indices of J coupling constants

outdir = "J/"
biceps.toolbox.mkdir(outdir)

# Karplus relations for each dihedral angles
karplus_key=np.loadtxt(data_dir+'Karplus.txt', dtype=str)
print('Karplus relations', karplus_key)
states = biceps.toolbox.get_files(data_dir+"cineromycinB_pdbs/*.fixed.pdb")
biceps.toolbox.compute_nonaa_scalar_coupling(states=states,
        indices=ind, karplus_key=karplus_key, outdir=outdir)

Karplus relations ['Allylic' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH' 'Karplus_HH'
 'Karplus_HH' 'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O'
 'Karplus_antiperiplanar_O' 'Karplus_antiperiplanar_O']

Next, we need to convert J coupling to Pandas DataFrame. Let’s use fmt="pickle" to show off how we can import various file formats for different restraints…

[6]:

indices = data_dir+'atom_indice_J.txt'
exp_data_J = data_dir+'exp_Jcoupling.txt'
model_data_J = data_dir+"J_coupling/*.txt"

# Now using biceps Preparation submodule
preparation = biceps.Preparation(nstates, top_file=states[0], outdir="J_NOE")
preparation.prepare_J(exp_data=exp_data_J, model_data=model_data_J,
                   indices=indices, write_as="pickle", verbose=False)

Now, let’s take a look at what’s inside the newly generated files.

[7]:

pd.read_pickle('J_NOE/0.J')

[7]:

	exp	model	restraint_index	atom_index1	res1	atom_name1	...	atom_index3	res3	atom_name3	atom_index4	res4	atom_name4
0	5.0	3.279128	1	25	UNK1	H5	...	10	UNK1	C7	26	UNK1	H6
1	1.5	13.336870	2	30	UNK1	H10	...	15	UNK1	C12	32	UNK1	H12
2	1.5	0.600845	3	31	UNK1	H11	...	15	UNK1	C12	32	UNK1	H12
3	6.0	13.877749	4	32	UNK1	H12	...	20	UNK1	C16	38	UNK1	H18
4	6.0	3.339372	4	32	UNK1	H12	...	20	UNK1	C16	39	UNK1	H19
5	6.0	1.828765	4	32	UNK1	H12	...	20	UNK1	C16	40	UNK1	H20
6	1.5	0.464738	5	32	UNK1	H12	...	16	UNK1	C13	33	UNK1	H13
7	7.0	2.956672	6	33	UNK1	H13	...	21	UNK1	C17	35	UNK1	H15
8	7.0	10.805575	6	33	UNK1	H13	...	21	UNK1	C17	36	UNK1	H16
9	7.0	1.232210	6	33	UNK1	H13	...	21	UNK1	C17	37	UNK1	H17

10 rows × 15 columns

Conclusion

From this tutorial, we learned how to use the biceps.Preparation class to prepare input files. Currently, BICePs supports the following restraints:

NOE, J couplings, Chemical Shifts, Protection Factors and more to come…

[8]:

print(biceps.toolbox.list_possible_restraints())

['Restraint_cs', 'Restraint_J', 'Restraint_noe', 'Restraint_pf']

In the example above, we showed how to deal with NOE and J couplings (non-natural amino acids).

Chemical shifts can be computed using different algorithm. We recommend to use Shiftx2 which is also available in MDTraj library. We offer methods in biceps.toolbox to easily compute model data (chemical shifts, J coupling, NOE distances).

Protection factors is a special observables which asks for extra work. We provide a full example that includes HDX (protection factor) restraints in biceps.

Now that the input files are ready, we can move on to constructing the conformational Ensemble and applying our restraints.

# NOTE: The following cell is for pretty notebook rendering

[14]:

from IPython.core.display import HTML
def css_styling():
    styles = open("../../../theme.css", "r").read()
    return HTML(styles)
css_styling()

[14]: