Ensemble


This tutorial shows the user how to appropriately use the biceps.Ensemble class to construct the ensemble and apply data restraints that were prepared from the previous tutorial (Preparation). Please note that in order to compute the relative free energies, one must perform sampling for at least two lambda values.


[24]:
import numpy as np
import pandas as pd
import biceps
[25]:
print(f"Possible input data extensions: {biceps.toolbox.list_possible_extensions()}")
Possible input data extensions: ['H', 'Ca', 'N', 'J', 'noe', 'pf']
[26]:
####### Data and Output Directories #######
energies = np.loadtxt('cineromycin_B/cineromycinB_QMenergies.dat')*627.509  # convert from hartrees to kcal/mol
energies = energies/0.5959   # convert to reduced free energies F = f/kT
energies -= energies.min()  # set ground state to zero, just in case

# Point to directory that contains input files
#input_data = biceps.toolbox.sort_data('cineromycin_B/J_NOE')
input_data = biceps.toolbox.sort_data("J_NOE")
print(f"Input data: {biceps.toolbox.list_extensions(input_data)}")

# Make a new directory if we have to
outdir = 'results'
biceps.toolbox.mkdir(outdir)
Input data: ['J', 'noe']

Another key parameter for BICePs set-up is the type of reference potential for each experimental observables. More information of reference potential can be found here.

Three reference potentials are supported in BICePs: uniform (‘uniform’), exponential (‘exp’), Gaussian (‘gau’).

As we found in previous research, exponential reference potential is useful in most cases. Some higher level task may require more in reference potential selection (e.g force field parametrization).

(Note: It will be helpful to print out what is the order of experimental observables included in BICePs sampling as shown above.)

The order of the parameters below must follow the order of biceps.toolbox.list_extensions(data). Therefore, our parameters will be a list of dictionaries e.g., \(\text{[{'J'}, {'noe'}]}\). Recall, in the last section we saved J coupling files as *.pkl files and NOE distances as *.csv files. If the default (``*.pkl`` files) is not being used, then we need to specify this inside the corresponding dictionary…

[27]:
n_lambdas = 2
lambda_values = np.linspace(0.0, 1.0, n_lambdas)
options = [
        dict(ref='uniform', sigma=(0.05, 20.0, 1.02), file_fmt="pickle"),
        dict(ref='exponential', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.01), file_fmt="csv")
        ]
pd.DataFrame(options)
[27]:
ref sigma file_fmt gamma
0 uniform (0.05, 20.0, 1.02) pickle NaN
1 exponential (0.05, 5.0, 1.02) csv (0.2, 5.0, 1.01)

Let’s print out the allowed \(\sigma_{J}\) space when sigma=(0.05, 20.0, 1.02).

[28]:
allowed_sigma = np.exp(np.arange(np.log(0.05), np.log(20.0), np.log(1.02)))
print(f"allowed_sigma[:5] = {allowed_sigma[:5]}")
print(f"allowed_sigma[-5:] = {allowed_sigma[-5:]}")
print(f"len(allowed_sigma) = {len(allowed_sigma)}")
allowed_sigma[:5] = [0.05       0.051      0.05202    0.0530604  0.05412161]
allowed_sigma[-5:] = [18.27347693 18.63894647 19.0117254  19.39195991 19.77979911]
len(allowed_sigma) = 303

Quick note on lambda values:

We need to specify what lambda value(s) we want to use in BICePs samplings. Briefly, lambda values are similar to the parameters used in free energy perturbation (FEP) and has effect on the BICePs score. The lambda values represent how much prior information from computational modeling is included in BICePs sampling (1.0 means all, 0.0 means none). As we explained in this work, one can consider BICePs score as the relative free energy change between different models. More lambda values will increase the samplings for multistate Bennett acceptance ratio (MBAR) predictions in free energy change and populations. However more lambda values also will slow down the whole process of BICePs (as more samplings need to run), so balancing the accuracy and efficiency is important. To successfully finish a BICePs sampling, lambda values of 0.0 and 1.0 are necessary. Based on our experience, three lambda values of 0.0,0.5,1.0 are suggested.

[29]:
for lam in lambda_values:
    print(f"lambda: {lam}")
    ensemble = biceps.Ensemble(lam, energies)
    ensemble.initialize_restraints(input_data, options)
    # Save each ensemble as a pickle file
    print(f"Saving ensemble_{lam}.pkl ...")
    biceps.toolbox.save_object(ensemble, outdir+"/ensemble_%s.pkl"%lam)
lambda: 0.0
Saving ensemble_0.0.pkl ...
lambda: 1.0
Saving ensemble_1.0.pkl ...

Let’s take a look at the ensemble (lam=1.0)…

The ensemble consists of a list of 2 restraint objects for each state. Here we are showing the first 10 states.

[30]:
print(ensemble.to_list()[:10])
[[<biceps.Restraint.Restraint_J object at 0x110f38880>, <biceps.Restraint.Restraint_noe object at 0x103f937c0>], [<biceps.Restraint.Restraint_J object at 0x105149a00>, <biceps.Restraint.Restraint_noe object at 0x10500c1c0>], [<biceps.Restraint.Restraint_J object at 0x10500c8e0>, <biceps.Restraint.Restraint_noe object at 0x103f93760>], [<biceps.Restraint.Restraint_J object at 0x105269c40>, <biceps.Restraint.Restraint_noe object at 0x1374a1df0>], [<biceps.Restraint.Restraint_J object at 0x1374a1730>, <biceps.Restraint.Restraint_noe object at 0x1374a14f0>], [<biceps.Restraint.Restraint_J object at 0x1374a1130>, <biceps.Restraint.Restraint_noe object at 0x10527df40>], [<biceps.Restraint.Restraint_J object at 0x1374a1be0>, <biceps.Restraint.Restraint_noe object at 0x10527d580>], [<biceps.Restraint.Restraint_J object at 0x10527de50>, <biceps.Restraint.Restraint_noe object at 0x13740b850>], [<biceps.Restraint.Restraint_J object at 0x10527d4c0>, <biceps.Restraint.Restraint_noe object at 0x13740bbe0>], [<biceps.Restraint.Restraint_J object at 0x13740b4c0>, <biceps.Restraint.Restraint_noe object at 0x137330070>]]

Conclusion

In this tutorial, we explained how to construct an ensemble (for each lambda) of restraints for each state, which we saved as a pickle file. In the next tutorial, PosteriorSampler we will Sample the posterior distribution by using the biceps.PosteriorSampler class.

# NOTE: The following cell is for pretty notebook rendering

[31]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../../../theme.css", "r").read()
    return HTML(styles)
css_styling()
[31]: