Cineromycin B¶

Summary:

In our previous work (DOI: 10.1002/jcc.23738), we determine solution-state conformational populations of the 14-membered macrocycle cineromycin B (O-desmethyl albocycline), using a combination of previously published sparse Nuclear Magnetic Resonance (NMR) observables and replica exchange molecular dynamic/Quantum mechanics (QM)-refined conformational ensembles. Cineromycin B is a 14-membered macrolide antibiotic that has become increasingly known for having activity against methicillin-resistant Staphylococcus Aureus (MRSA). In this example, we show how to calculate the consistency of computational modeling with experiment, and the relative importance of reference potentials and other model parameters. We will walk through the process of setting up a biceps script. You are welcome to take this as a template for you and make necessary modification to get it work for you. All input files used in this tutorial are prepared from this tutorial.

To convert this Jupyter Notebook into a script use the following command:

$ jupyter nbconvert --to python cineromycinB.ipynb

First, we need to import source code for BICePs classes. More information of these classes can be found here.

[3]:

import numpy as np
import pandas as pd
import biceps
# NOTE: timeseries module Warning is from MBAR

[4]:

# What possible experimental data does biceps accept?
print(f"Possible data restraints: {biceps.toolbox.list_possible_restraints()}")
print(f"Possible input data extensions: {biceps.toolbox.list_possible_extensions()}")

Possible data restraints: ['Restraint_cs', 'Restraint_J', 'Restraint_noe', 'Restraint_pf']
Possible input data extensions: ['H', 'Ca', 'N', 'J', 'noe', 'pf']

Now we need to specify input files and output files folder.

[5]:

####### Data and Output Directories #######
energies = np.loadtxt('cineromycin_B/cineromycinB_QMenergies.dat')*627.509  # convert from hartrees to kcal/mol
energies = energies/0.5959   # convert to reduced free energies F = f/kT
energies -= energies.min()  # set ground state to zero, just in case

# REQUIRED: specify directory of input data (BICePs readable format)
input_data = biceps.toolbox.sort_data('cineromycin_B/J_NOE')

# REQUIRED: specify outcome directory of BICePs sampling
outdir = 'results'
# Make a new directory if we have to
biceps.toolbox.mkdir(outdir)

Next, we need to specify number of steps for BICePs sampling. We recommend to run at least 1M steps for converged Monte Carlo samplings. In BICePs, we also prepare functions of checking convergence of Monte Carlo. More information can be found here.

[6]:

# REQUIRED: number of MCMC steps for each lambda
nsteps = 100000 # 100 k steps for testing
#nsteps = 10000000 # 10 M steps for production

We need to specify what lambda values we want to use in BICePs samplings. Briefly, lambda values are similar to the parameters used in free energy perturbation (FEP) and has effect on the BICePs score. The lambda values represent how much prior information from computational modeling is included in BICePs sampling (1.0 means all, 0.0 means none). As we explained in this work, one can consider BICePs score as the relative free energy change between different models. More lambda values will increase the samplings for multistate Bennett acceptance ratio (MBAR) predictions in free energy change and populations. However more lambda values also will slow down the whole process of BICePs (as more samplings need to run), so balancing the accuracy and efficiency is important. To successfully finish a BICePs sampling, lambda values of 0.0 and 1.0 are necessary. Based on our experience, the lambda values of 0.0,0.5,1.0 are good enough for BICePs sampling.

[7]:

# REQUIRED: specify how many lambdas to sample (more lambdas will provide higher
# accuracy but slower the whole process, lambda=0.0 and 1.0 are necessary)
n_lambdas = 2
lambda_values = np.linspace(0.0, 1.0, n_lambdas)
print(lambda_values)

[0. 1.]

Print what experimental observables are included in BICePs sampling. This is optional and provides a chance for double check before running BICePs sampling.

In this tutorial, we used both J couplings and NOE (pairwise distances) in BICePs sampling.

[8]:

# OPTIONAL but RECOMMENDED: print experimental restraints included (a chance for double check)
print(f"Input data: {biceps.toolbox.list_extensions(input_data)}")

Input data: ['J', 'noe']

Another key parameter for BICePs set-up is the type of reference potential for each experimental observables. More information of reference potential can be found here.

Three reference potentials are supported in BICePs: uniform (‘uniform’), exponential (‘exp’), Gaussian (‘gau’).

As we found in previous research, exponential reference potential is useful in most cases. Some higher level task may require more in reference potential selection (e.g force field parametrization).

(Note: It will be helpful to print out what is the order of experimental observables included in BICePs sampling as shown above.)

Users can specify reference potential to use for each experimental observable will be in the same order as input_data. If you want, you can specify nuisance parameters (uncertainty in experiments) range but it’s not required. Our default parameters range is broad enough. “gamma” is a scaling factor converting distances to NOE intensity in experiments. It is only used when NOE restraint is used in BICePs sampling.

[9]:

parameters = [
        {"ref": 'uniform', "sigma": (0.05, 20.0, 1.02)},
        {"ref": 'exponential', "sigma": (0.05, 5.0, 1.02), "gamma": (0.2, 5.0, 1.02)}
        ]
pd.DataFrame(parameters)

[9]:

	ref	sigma	gamma
0	uniform	(0.05, 20.0, 1.02)	NaN
1	exponential	(0.05, 5.0, 1.02)	(0.2, 5.0, 1.02)

Now we need to set up BICePs samplings and feed arguments using variables we defined above. In most cases, you don’t need to modify this part as long as you defined all these parameters shown above.

[10]:

####### MCMC Simulations #######
for lam in lambda_values:
    print(f"lambda: {lam}")
    ensemble = biceps.Ensemble(lam, energies)
    ensemble.initialize_restraints(input_data, parameters)
    sampler = biceps.PosteriorSampler(ensemble)
    sampler.sample(nsteps=nsteps, print_freq=10000, verbose=True)
    sampler.traj.process_results(f"{outdir}/traj_lambda{lam}.npz")

lambda: 0.0
Step            State   Para Indices            Avg Energy      Acceptance (%)
0               [54]    [151, 116, 81]          123.122         100.00  True
10000           [27]    [246, 127, 99]          6.384           66.61   True
20000           [87]    [209, 131, 97]          5.411           66.99   True
30000           [77]    [229, 135, 102]         7.863           67.57   False
40000           [45]    [250, 124, 97]          7.009           67.86   True
50000           [87]    [209, 141, 96]          5.762           66.97   True
60000           [87]    [212, 144, 99]          5.530           66.58   True
70000           [32]    [220, 142, 91]          7.669           66.80   True
80000           [58]    [239, 150, 98]          8.864           67.51   False
90000           [21]    [249, 125, 93]          6.046           67.99   False

Accepted 68.364 %


Accepted [ ...Nuisance paramters..., state] %
Accepted [32.502 29.963 29.963  5.899] %

lambda: 1.0
Step            State   Para Indices            Avg Energy      Acceptance (%)
0               [77]    [150, 116, 81]          123.708         100.00  True
10000           [38]    [227, 155, 99]          9.126           63.02   False
20000           [65]    [249, 144, 92]          6.959           63.90   True
30000           [65]    [243, 131, 90]          5.939           63.73   False
40000           [39]    [275, 137, 94]          8.569           63.88   False
50000           [90]    [242, 142, 95]          6.631           64.19   True
60000           [65]    [248, 129, 92]          5.834           64.26   False
70000           [65]    [232, 129, 88]          6.763           64.28   True
80000           [65]    [222, 136, 95]          7.673           64.22   True
90000           [38]    [213, 134, 96]          8.254           64.19   True

Accepted 64.298 %


Accepted [ ...Nuisance paramters..., state] %
Accepted [32.632 30.075 30.075  1.591] %

Note that all output trajectories and images will be placed in the previously defined ``outdir`` as ‘results_ref_normal’…

Also note that the user can quickly check to see if our MCMC trajectories have converged using our ``biceps.convergence`` module. See `this tutorial <https://biceps.readthedocs.io/en/latest/examples/Tutorials/Convergence/convergence.html>`__ for an example and more details.

Then, we need to run functions in Analysis to get predicted populations of each conformational states and BICePs score. A figure of posterior distribution of populations and nuisance parameters will be plotted as well.

The output files include: population information (“populations.dat”), figure of sampled parameters distribution (“BICePs.pdf”), BICePs score information (“BS.dat”).

[15]:

############ MBAR and Figures ###########
%matplotlib inline
# Let's do analysis using MBAR algorithm and plot figures
A = biceps.Analysis(outdir=f"{outdir}", nstates=len(energies))
fig = A.plot()

not all state sampled, these states [ 0  1  3  4  5  7  8  9 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27
 28 29 34 36 40 41 42 44 48 49 51 52 53 54 55 56 57 60 61 62 63 64 66 69
 71 73 74 75 76 78 81 82 83 86 87 88 89 95 96 97 98 99] are not sampled

/Users/rr/github/biceps/docs/examples/full_examples/cineromycinB/biceps/Analysis.py:246: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  for g in np.array(traj[0][4]):
/Users/rr/github/biceps/docs/examples/full_examples/cineromycinB/biceps/Analysis.py:250: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  rest_type = np.array(rest_type)
/Users/rr/github/biceps/docs/examples/full_examples/cineromycinB/biceps/Analysis.py:246: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  for g in np.array(traj[0][4]):
/Users/rr/github/biceps/docs/examples/full_examples/cineromycinB/biceps/Analysis.py:250: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  rest_type = np.array(rest_type)

../../../_images/examples_full_examples_cineromycinB_CineromycinB_20_2.png

../../../_images/examples_full_examples_cineromycinB_CineromycinB_20_3.png

Now, let’s take a look at the output figure above (“BICePs.pdf”).

The top left panel shows the population of each states in the presence (y-axis) and absense (x-axis) of computational modeling prior information. You can find some states (e.g state 38, 59) get awarded after including computational prior information. If you check this work you will see how misleading the results will be if we only use experimental restraints without computational prior information. The other three panels show the distribution of nuisance parameters in the presence or absence of computational modeling information. It may not be clear in this example due to the limit of finite sampling, but based on our experience, including prior information from our simulations will lower the nuisance parameters than only using experimental restraints.

Next, let’s take a look at the populations file.

There are 100 rows corresponding to 100 clustered states. Four columns corresponding to populations of each state (row) for 2 lambda values (first 2 columns) and population change (last 2 columns).

[16]:

import pandas as pd
pops = np.loadtxt(outdir+'/populations.dat')
df = pd.DataFrame(pops)
print(df)

           0             1         2             3
0   0.013974  2.559152e-05  0.003707  6.883350e-06
1   0.001994  6.042650e-06  0.001408  4.276702e-06
2   0.006086  9.142238e-04  0.002294  3.463398e-04
3   0.000000  0.000000e+00       NaN           NaN
4   0.000000  0.000000e+00       NaN           NaN
..       ...           ...       ...           ...
95  0.007000  1.918854e-11  0.002636  7.275900e-12
96  0.000991  9.009883e-06  0.000990  9.013945e-06
97  0.001994  6.065073e-06  0.001408  4.292571e-06
98  0.001000  1.475099e-11  0.000999  1.475777e-11
99  0.000000  0.000000e+00       NaN           NaN

[100 rows x 4 columns]

# NOTE: The following cell is for pretty notebook rendering

[17]:

from IPython.core.display import HTML
def css_styling():
    styles = open("../../../theme.css", "r").read()
    return HTML(styles)
css_styling()

[17]: