API Reference

Preparation

class biceps.Restraint.Preparation(nstates=0, top_file=None, outdir=None)

A class to prepare input_data for the biceps.Ensemble.initialize_restraints method.

Parameters

nstates (int) – number of conformational states
top_file (str) – relative path to the structure topology file
outdir (str) – relative path for output files

Ensemble

class biceps.Ensemble(lam, energies, debug=False)

Container class for biceps.Restraint.Restraint objects.

Parameters

lam (float) – lambda value to scale energies
energies (np.ndarray) – numpy array of energies for each state

Ensemble.initialize_restraints(input_data, options=None)

Initialize corresponding biceps.Restraint.Restraint classes based on experimental observables from input_data for each conformational state.

Parameters

input_data (list of str) – a sorted collection of filenames (files contain exp (experimental) and model (theoretical) observables)
options (list of dict) – a list with one dictionary per restraint; each dictionary's keys match biceps.Restraint.Restraint keyword arguments.

# In general, one dict of keyword arguments per restraint:
options = [dict(**kwargs), ..., dict(**kwargs)]
# More specifically, for J and NOE data restraints, respectively:
options = [dict(ref='uniform', sigma=(0.05, 20.0, 1.02)),
           dict(ref='exp', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.02))]

Tip

See the following parent biceps.Restraint.Restraint and child class methods for the full list of keyword arguments (**kwargs) accepted for each restraint in options:

biceps.Restraint.Restraint_cs.init_restraint

biceps.Restraint.Restraint_J.init_restraint

biceps.Restraint.Restraint_noe.init_restraint

biceps.Restraint.Restraint_pf.init_restraint

Print possible restraints with: biceps.toolbox.list_possible_restraints

Print possible extensions with: biceps.toolbox.list_possible_extensions

Ensemble.to_list()

Converts the Ensemble object to a list.

Returns: collection of biceps.Restraint.Restraint objects
Return type: list

Restraint

class biceps.Restraint.Restraint(ref='uniform', sigma=[0.05, 20.0, 1.02], use_global_ref_sigma=True, verbose=False)

The parent biceps.Restraint.Restraint class.

Parameters

ref (str) – reference potential, e.g., “uniform”, “exp”, or “gau”. If None, the default reference potential for the given experimental observable is used.
sigma (list) – (sigma_min, sigma_max, dsigma)
use_global_ref_sigma (bool) – (defaults to True)
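The (min, max, spacing) triples used for sigma (and for gamma, which the NOE entry describes as “in log space”) can be read as a geometric grid. The sketch below assumes that reading; log_spaced_grid is a hypothetical helper, not part of biceps, and the library's own grid construction may differ in detail.

```python
import numpy as np

def log_spaced_grid(vmin, vmax, dv):
    """Geometric grid vmin, vmin*dv, vmin*dv**2, ... below vmax (hypothetical helper)."""
    # Assumption: equal steps of ln(dv) in log space, excluding vmax itself.
    return np.exp(np.arange(np.log(vmin), np.log(vmax), np.log(dv)))

allowed_sigma = log_spaced_grid(0.05, 20.0, 1.02)  # sigma=(sigma_min, sigma_max, dsigma)
```

Each grid point is 2% larger than the previous one, so the grid is dense at small sigma and sparse at large sigma.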

Restraint_cs.init_restraint(data, energy, extension='H', weight=1, file_fmt='pickle', verbose=False)

Initialize the chemical shift restraints for each experimental and theoretical observable given data.

Parameters

data (str) – filename of data
energy (float) – The (reduced) free energy of the conformation
extension (str) – “H”, “Ca”, or “N”
weight (float) – weight for restraint

Restraint_J.init_restraint(data, energy, extension='J', weight=1, file_fmt='pickle', verbose=False)

Initialize the scalar coupling constant restraints for each exp (experimental) and model (theoretical) observable given data.

Parameters

data (str) – filename of data
energy (float) – The (reduced) free energy of the conformation
weight (float) – weight for restraint

Restraint_noe.init_restraint(data, energy, extension='noe', weight=1, file_fmt='pickle', verbose=False, log_normal=False, gamma=[0.2, 10.0, 1.01])

Initialize the NOE distance restraints for each experimental and theoretical observable given data. When log_normal=True, the modified sum of squared errors \(\chi_{d}^{2}(X)=\sum_{j} w_{j}( \ln ( r_{j}(X) / \gamma' r_{j}^{exp} ))^{2}\) is used.

Parameters

data (str) – filename of data
energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation
weight (float) – weight for restraint
log_normal (bool) – use log normal distribution
gamma (list) – [gamma_min, gamma_max, dgamma] in log space
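A minimal NumPy sketch of the log-normal sum of squared errors above, reading the denominator as \(\gamma' r_{j}^{exp}\). chi2_log_normal is a hypothetical helper for illustration, not the biceps implementation; it shows that \(\gamma'\) absorbs a uniform scale factor between model and experimental distances.

```python
import numpy as np

def chi2_log_normal(r_model, r_exp, weights, gamma_prime):
    """Log-normal sum of squared errors for NOE distances (illustrative sketch)."""
    r_model = np.asarray(r_model, dtype=float)  # r_j(X), model distances
    r_exp = np.asarray(r_exp, dtype=float)      # r_j^exp, experimental distances
    weights = np.asarray(weights, dtype=float)  # w_j
    # sum_j w_j * (ln(r_j(X) / (gamma' * r_j^exp)))^2
    return float(np.sum(weights * np.log(r_model / (gamma_prime * r_exp)) ** 2))

# Model distances uniformly 2x the experimental ones: gamma' = 2 cancels the scale.
chi2 = chi2_log_normal([4.0, 6.0], [2.0, 3.0], [1.0, 1.0], gamma_prime=2.0)
```

Because the error is taken on a log scale, over- and under-estimating a distance by the same factor is penalized equally.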

Restraint_pf.init_restraint(data, energy, precomputed=False, pf_prior=None, Ncs_fi=None, Nhs_fi=None, beta_c=(0.05, 0.25, 0.01), beta_h=(0.0, 5.2, 0.2), beta_0=(-10.0, 0.0, 0.2), xcs=(5.0, 8.5, 0.5), xhs=(2.0, 2.7, 0.1), bs=(15.0, 16.0, 1.0), extension='pf', weight=1, file_fmt='pickle', states=None, verbose=False)

Initialize protection factor restraints for each exp (experimental) and model (theoretical) observable given data.

Parameters

data (str) – filename of data
energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation
weight (float) – weight for restraint
beta_c (list) – [min, max, spacing]
beta_h (list) – [min, max, spacing]
beta_0 (list) – [min, max, spacing]
xcs (list) – [min, max, spacing]
xhs (list) – [min, max, spacing]
bs (list) – [min, max, spacing]

PosteriorSampler

class biceps.PosteriorSampler(ensemble, freq_write_traj=100.0, freq_save_traj=100.0, verbose=False)

A class to perform posterior sampling of conformational populations.

Parameters

ensemble (object) – a biceps.Ensemble object
freq_write_traj (int) – the frequency (in steps) to write the MCMC trajectory
freq_print (int) – the frequency (in steps) to print status
freq_save_traj (int) – the frequency (in steps) to store the MCMC trajectory

PosteriorSampler.neglogP(states, parameters, parameter_indices)

Return -ln P of the current configuration.

Parameters

states (list) – the new conformational states being sampled in PosteriorSampler.sample
parameters (list) – a list of the new parameters for each of the restraints
parameter_indices (list) – parameter indices that correspond to each restraint

PosteriorSampler.sample(nsteps, burn=0, print_freq=1000, verbose=False, progress=True)

Perform nsteps steps of posterior sampling, where Monte Carlo moves are accepted or rejected according to the Metropolis criterion. Energies are computed via neglogP.

Parameters

nsteps (int) – the number of steps of sampling
burn (int) – the number of burn-in steps
print_freq (int) – the frequency of printing to the screen
verbose (bool) – control over verbosity
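The Metropolis step described above can be sketched for a toy set of discrete states, taking the energy to be \(-\ln P\) as neglogP does. toy_sample and metropolis_accept are hypothetical names, and the sketch ignores the nuisance parameters that the real sampler also samples; it only illustrates the acceptance rule and the burn-in.

```python
import numpy as np

def metropolis_accept(E_old, E_new, rng):
    """Metropolis criterion for energies E = -ln P (illustrative sketch)."""
    dE = E_new - E_old
    # Downhill moves are always accepted; uphill moves with probability exp(-dE).
    return dE <= 0 or rng.random() < np.exp(-dE)

def toy_sample(energies, nsteps, burn=0, seed=0):
    """Hypothetical toy sampler over discrete states, not biceps.PosteriorSampler."""
    rng = np.random.default_rng(seed)
    nstates = len(energies)
    state = 0
    trace = []
    for step in range(nsteps):
        proposal = int(rng.integers(nstates))  # symmetric proposal: any state, uniformly
        if metropolis_accept(energies[state], energies[proposal], rng):
            state = proposal
        if step >= burn:  # discard the first `burn` steps
            trace.append(state)
    return np.array(trace)

p = np.array([0.5, 0.3, 0.2])  # made-up target populations
trace = toy_sample(-np.log(p), nsteps=100_000, burn=1_000)
populations = np.bincount(trace, minlength=p.size) / trace.size  # approximately p
```

Because the proposal is symmetric, the visit frequencies converge to \(P \propto e^{-E}\), which is why energies defined as \(-\ln P\) recover the target populations.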

Tip

Set verbose=False when using multiprocessing.

class biceps.PosteriorSamplingTrajectory(ensemble, sampler, nreplicas)

A container class to store and perform operations on the trajectories of sampling runs.

Parameters

ensemble (list) – ensemble of biceps.Restraint.Restraint objects
nreplicas (int) – number of replicas

PosteriorSamplingTrajectory.process_results(filename=None)

Process the trajectory, computing sampling statistics and ensemble-averaged NMR observables.

The trajectory is saved in NumPy's compressed npz format, which has three benefits: 1) it uses a standard Python library (NumPy), 2) it stores several arrays in a single compact binary file, and 3) it is significantly smaller than many other formats.

Parameters: filename (str) – relative path and filename for MCMC trajectory

Tip

The trajectory file can be converted to a pandas DataFrame (pickle file) with biceps.toolbox.npz_to_DataFrame.

Trajectory information
Key	Short Description
`rest_type`	list of strings for each restraint type.
`ref`	list of strings for each reference potential type
`allowed_parameters`	list of numpy arrays containing the allowed range of nuisance parameters with shape (m,n)
`sampled_parameters`	list of numpy arrays containing the counts of nuisance parameters sampled for each restraint with shape (m,n)
`trajectory_headers`	e.g., [step, energy, accept, state, [nuisance parameter index]]
`trajectory`	list of values—see `trajectory_headers`
`sep_accept`	list of separated acceptance ratios with shape (n+1,)
`traces`	list of sampled nuisance parameters with shape (n)
`state_trace`	list of sampled conformational state index

n is the number of allowed parameters
m is the number of restraints
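As a sketch of reading such a file, the block below writes a small, made-up npz archive with three of the keys listed above and estimates state populations from state_trace by counting visits. The key names come from the table; the values, the in-memory buffer, and the population estimate are illustrative assumptions. If a key holds a ragged list of arrays (saved as an object array), np.load needs allow_pickle=True.

```python
import io
import numpy as np

# Made-up toy trajectory using a few of the keys from the table above.
buf = io.BytesIO()  # stands in for an npz file on disk
np.savez_compressed(
    buf,
    rest_type=np.array(["noe", "J"]),          # hypothetical values
    sep_accept=np.array([0.41, 0.37, 0.52]),   # hypothetical values
    state_trace=np.array([0, 2, 2, 1, 2, 0, 2, 2]),
)
buf.seek(0)

with np.load(buf) as traj:  # add allow_pickle=True for object arrays
    files = list(traj.files)
    state_trace = traj["state_trace"]

# Fraction of sampled steps spent in each state.
nstates = 3
populations = np.bincount(state_trace, minlength=nstates) / state_trace.size
# -> 0.25, 0.125, 0.625 for states 0, 1, 2
```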

Analysis

class biceps.Analysis(outdir, nstates=0, precheck=True, BSdir='BS.dat', popdir='populations.dat', picfile='BICePs.pdf', verbose=False)

A class to perform analysis and plot figures.

Parameters

nstates (int) – number of conformational states
outdir (str) – relative path to glob ‘*.npz’ trajectories (analysis files and figures will be placed inside this directory)
precheck (bool) – find all the states that have not been sampled, if any
BSdir (str) – relative path for BICePs score file name
popdir (str) – relative path for BICePs reweighted populations file name
picfile (str) – relative path for BICePs figure

Analysis.plot(plottype='hist', figname='BICePs.pdf', figsize=None, label_fontsize=12, legend_fontsize=10)

Plot figures for population and sampled nuisance parameters.

Parameters: show (bool) – show the plot in Jupyter Notebook.

Convergence

biceps.convergence.exponential_fit(autocorrelation, exp_function='single', v0=None, verbose=False)

Calls single_exp_decay (‘single’) or double_exp_decay (‘double’) to fit an exponential function to an autocorrelation curve. See scipy.optimize.curve_fit for more details.

Parameters

autocorrelation (np.ndarray) – the autocorrelation of some timeseries
exp_function (str) – ‘single’ or ‘double’ (default ‘single’)
v0 (list) – initial guesses for the exponential fit. The default for ‘single’ is v0=[0.0, 1.0, 4000.] = [a0, a1, tau1], where \(f(x) = a_{0} + a_{1}\exp(-x/\tau_{1})\), and the default for ‘double’ is v0=[0.0, 0.9, 0.1, 4000., 200.0] = [a0, a1, a2, tau1, tau2], where \(f(x) = a_{0} + a_{1}\exp(-x/\tau_{1}) + a_{2}\exp(-x/\tau_{2})\).

Returns

the y-values of the fitted curve.

Return type

yfit(np.ndarray)
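A minimal sketch of the ‘single’ fit with scipy.optimize.curve_fit on a synthetic, noise-free autocorrelation curve with a known decay time. single_exp_model is a hypothetical stand-in for the single_exp_decay helper mentioned above, and p0 plays the role of v0; the starting guess is chosen for the toy data, not taken from the biceps defaults.

```python
import numpy as np
from scipy.optimize import curve_fit

def single_exp_model(x, a0, a1, tau1):
    # f(x) = a0 + a1 * exp(-x / tau1), the 'single' form above
    return a0 + a1 * np.exp(-x / tau1)

# Synthetic, noise-free "autocorrelation" with a known decay time of 50 lag steps.
x = np.arange(500, dtype=float)
autocorrelation = single_exp_model(x, 0.0, 1.0, 50.0)

# p0 plays the role of v0 = [a0, a1, tau1]; this guess suits the toy data only.
popt, pcov = curve_fit(single_exp_model, x, autocorrelation, p0=[0.0, 1.0, 40.0])
a0, a1, tau1 = popt
yfit = single_exp_model(x, *popt)  # y-values of the fitted curve
```

On real, noisy autocorrelations the fitted tau1 depends on the starting guess and the lag range, which is why v0 is exposed as a parameter.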

biceps.convergence.compute_autocorrelation_curves(data, max_tau, normalize=True)

Calculates the autocorrelation for a list of arrays, where each array is a separate time-series.

Parameters

data (list) – list of separate timeseries
max_tau (int) – the upper bound of the autocorrelation lag time
normalize (bool) – if True, normalize each curve by its zero-lag value, as in g

Returns: np.ndarray

biceps.convergence.g(f, max_tau=10000, normalize=True)

Calculate the autocorrelation function for a time series f(t).

Parameters

f (np.ndarray) – a 1D numpy array containing the time series f(t)
max_tau (int) – the maximum autocorrelation time to consider.
normalize (bool) – if True, return g(tau)/g[0]

Returns: np.ndarray of shape (max_tau+1,) containing g(tau)
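The autocorrelation function can be sketched directly from its definition, \(g(\tau) = \langle (f(t)-\bar{f})(f(t+\tau)-\bar{f}) \rangle_t\), normalized by \(g(0)\) when requested. This is a plain NumPy illustration under that assumed definition (mean-subtracted, averaged over the available overlap); autocorr_sketch is hypothetical, and biceps.convergence.g may differ in details such as the averaging convention.

```python
import numpy as np

def autocorr_sketch(f, max_tau=100, normalize=True):
    """g(tau) for tau = 0..max_tau from a 1D time series (illustrative sketch)."""
    f = np.asarray(f, dtype=float)
    n = f.size
    if max_tau >= n:
        raise ValueError("max_tau must be smaller than the length of the series")
    dy = f - f.mean()  # fluctuations about the mean
    # Average the lagged products over the n - tau overlapping pairs.
    g = np.array([np.mean(dy[: n - tau] * dy[tau:]) for tau in range(max_tau + 1)])
    if normalize:
        g = g / g[0]  # g(tau)/g(0), so g[0] == 1
    return g

# An alternating series is perfectly anti-correlated at odd lags.
g = autocorr_sketch([1, -1] * 50, max_tau=2)  # [1.0, -1.0, 1.0]
```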

biceps.convergence.compute_autocorrelation_time(autocorrelations)

Computes the autocorrelation time \(\tau_{auto} = \int C_{\tau} d\tau\)

Parameters: autocorrelations (np.ndarray) – an array containing the autocorrelations for each time-series.

Returns: np.ndarray
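For curves sampled at unit lag spacing, the integral can be approximated with the trapezoidal rule. autocorrelation_time below is a hypothetical helper that applies this rule to each row of a 2D array; it is not the biceps implementation. On exact exponentials \(C(\tau)=e^{-\tau/T}\) it should return values close to T.

```python
import numpy as np

def autocorrelation_time(autocorrelations):
    """tau_auto = integral of C(tau) d(tau), trapezoidal rule, unit lag spacing (sketch)."""
    C = np.atleast_2d(np.asarray(autocorrelations, dtype=float))  # one curve per row
    return 0.5 * (C[:, 0] + C[:, -1]) + C[:, 1:-1].sum(axis=1)

# Exact exponential curves C(tau) = exp(-tau / T) with T = 10 and T = 25.
tau = np.arange(1001, dtype=float)
curves = np.vstack([np.exp(-tau / 10.0), np.exp(-tau / 25.0)])
tau_auto = autocorrelation_time(curves)  # close to [10, 25]
```

The small excess over T (about 0.008 for T = 10) is the trapezoidal discretization error at unit spacing.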

biceps.convergence.get_blocks(data, nblocks=5)

Method used to partition data into blocks. The data is a list of arrays, where each array is a separate time-series or autocorrelation.

Parameters: data (list) – list of separate timeseries
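A sketch of the partitioning step, assuming each series is split into nblocks contiguous, nearly equal pieces with numpy.array_split, which spreads any remainder over the first blocks. get_blocks_sketch is hypothetical; the library function may instead drop leftover points.

```python
import numpy as np

def get_blocks_sketch(data, nblocks=5):
    """Split each time series in `data` into nblocks contiguous pieces (hypothetical helper)."""
    # np.array_split allows uneven splits: the first len % nblocks pieces get one extra point.
    return [np.array_split(np.asarray(series), nblocks) for series in data]

data = [np.arange(10), np.arange(12)]
blocks = get_blocks_sketch(data, nblocks=5)
# blocks[0]: five pieces of length 2; blocks[1]: lengths 3, 3, 2, 2, 2
```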

biceps.convergence.compute_JSD(T1, T2, T_total, ind, allowed_parameters)

Compute the Jensen–Shannon divergence (JSD) for a given part of the trajectory.

\(JSD = H(P_{comb}) - \pi_{1}H(P_{1}) - \pi_{2}H(P_{2})\), where \(P_{comb}\) is the combined data (\(P_{1} \cup P_{2}\)), \(H\) is the Shannon entropy of distribution \(P_{i}\), and \(\pi_{i}\) is the weight for the probability distribution \(P_{i}\). \(H(P_{i}) = -\sum \frac{r_{i}}{N_{i}}\ln\frac{r_{i}}{N_{i}}\), where \(r_{i}\) and \(N_{i}\) represent the number of times a specific parameter index was sampled and the total number of samples of the parameter, respectively.

Variables

T1, T2, T_total – part 1, part 2, and total (part 1 + part 2) of the trajectory
rest_type – experimental restraint type
allowed_parameters – range of allowed nuisance parameters

Returns: Jensen–Shannon divergence
Return type: float
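The entropy-based JSD formula above can be sketched from histogram counts of a nuisance-parameter index. The weights \(\pi_{i}\) are assumed here to be the sample fractions \(N_{i}/(N_{1}+N_{2})\), and the combined histogram is the sum of the two count arrays; jsd_from_counts and shannon_entropy are hypothetical helpers, not the biceps implementation.

```python
import numpy as np

def shannon_entropy(counts):
    """H(P) = -sum (r/N) ln(r/N) over bins with nonzero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def jsd_from_counts(counts1, counts2):
    """JSD between two sampled histograms over the same parameter indices (sketch)."""
    counts1 = np.asarray(counts1, dtype=float)
    counts2 = np.asarray(counts2, dtype=float)
    n1, n2 = counts1.sum(), counts2.sum()
    pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)  # assumed weights: sample fractions
    combined = counts1 + counts2               # histogram of P1 ∪ P2
    return (shannon_entropy(combined)
            - pi1 * shannon_entropy(counts1)
            - pi2 * shannon_entropy(counts2))

jsd_disjoint = jsd_from_counts([10, 0], [0, 10])  # ln 2
```

Identical (or proportional) histograms give 0, and two halves that never visit the same index give the maximum value, \(\ln 2\), for equal weights.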

class biceps.Convergence(traj=None, filename=None, outdir='./', verbose=False)

Convergence submodule for BICePs.

Parameters

filename (str) – relative path and filename to MCMC trajectory (NumPy npz file)
outdir (str) – relative path for output files

Convergence.plot_traces(figname='traj_traces.png', xlim=None)

Plot trajectory traces.

Parameters: xlim (tuple) – matplotlib x-axis limits

Convergence.plot_auto_curve(xlim=None, figname='autocorrelation_curve.png', std_x=None, std_y=None)

Plot the autocorrelation curve. This function saves a figure of the autocorrelation with error bars at the 95% confidence interval (\(\tau_{auto}\) is rounded to the nearest integer).

Parameters

xlim (tuple) – matplotlib x-axis limits
std_x (np.ndarray) –
std_y (np.ndarray) –

Convergence.plot_block_avg(nblock, r_max, figname='block_avg.png')

Plot block averages.

Parameters

nblock (int) – the number of partitions in the time series
r_max (np.ndarray) – maximum sampled parameters for each restraint
figname (str) – figure name without a path (the output directory is handled internally)

Convergence.get_autocorrelation_curves(method='auto', nblocks=5, maxtau=10000, plot_traces=False)

Compute the autocorrelation function for a time series f(t), partition the data into the specified number of blocks, and plot the autocorrelation curve. Saves a figure of the autocorrelation curves for each restraint.

Parameters

method (str) – method for computing autocorrelation time; “block-avg-auto” or “exp” or “auto”
nblocks (int) – number of blocks to split up the trajectory
maxtau (int) – the upper bound of autocorrelation lag time
plot_traces (bool) – if True, also plot the trajectory traces

Convergence.process(nblock=5, nfold=10, nround=100, savefile=True, block_avg=False, normalize=True)

Process the trajectory and run compute_JSD(), plot_JSD_conv(), and plot_JSD_distribution(). If block_avg=True, block averaging is also performed and plot_block_avg() is called.

Parameters

nblock (int) – the number of partitions in the time series
nfold (int) – the number of partitions in the shuffled (subsampled) trajectory
nround (int) – the number of rounds of bootstrapping when computing JSDs
savefile (bool) –
block_avg (bool) – use block averaging
verbose (bool) – verbosity

toolbox

biceps.toolbox.sort_data(dataFiles)

Sorts the data into lists by file extension. The data can be located in various directories, so provide a list of paths where the data can be found. Examples of file extensions: {.noe, .J, .cs_H, .cs_Ha}.

Parameters: dataFiles (list) – list of strings where the data can be found
Raises: ValueError – if the data directory does not exist

>>> biceps.toolbox.sort_data(dataFiles)

biceps.toolbox.get_files(path)

Returns a naturally sorted list of the files globbed from the given path. The sorting handles decimals and multiple numbers separated by characters (see https://pypi.org/project/natsort/).

Parameters: path (str) – path (glob pattern) of the files to sort
Returns: sorted list
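The natsort link refers to natural ordering, where embedded numbers compare numerically, so state10 sorts after state2. The sketch below is a simplified stand-in using only the standard library (re and sorted); unlike natsort, it handles integer runs only, not decimals. natural_key is a hypothetical helper.

```python
import re

def natural_key(name):
    """Sort key that compares digit runs as integers (hypothetical helper)."""
    # re.split with a capturing group alternates text and digit runs, so
    # positions line up and only like types are ever compared.
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]

files = ["state10.noe", "state2.noe", "state1.noe"]
ordered = sorted(files, key=natural_key)  # ['state1.noe', 'state2.noe', 'state10.noe']
# With globbing: sorted(glob.glob("*.noe"), key=natural_key)
```

A plain sorted(files) would instead return state1, state10, state2, because strings compare character by character.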

biceps.toolbox.list_res(input_data)

Determine the ordering of the experimental restraints that will be included in sampling.

Parameters: input_data (list) – see biceps.Ensemble.initialize_restraints

>>> biceps.toolbox.list_res(input_data)

biceps.toolbox.list_possible_restraints()

Returns a list of all possible restraint classes in Restraint.py.

>>> biceps.toolbox.list_possible_restraints()

biceps.toolbox.list_extensions(input_data)

Determine the ordering of the experimental restraints that will be included in sampling.

Parameters: input_data (list) – see biceps.Ensemble.initialize_restraints

>>> biceps.toolbox.list_extensions(input_data)

biceps.toolbox.list_possible_extensions()

Returns a list of all possible input data file extensions.

>>> biceps.toolbox.list_possible_extensions()

biceps.toolbox.npz_to_DataFrame(file, out_filename='traj_lambda0.00.pkl', verbose=False)

Converts a NumPy compressed (npz) file to a pandas DataFrame (*.pkl).

>>> biceps.toolbox.npz_to_DataFrame(file, out_filename="traj_lambda0.00.pkl")

biceps.toolbox.save_object(obj, filename)

Saves a Python object as a pickle file.

Parameters

obj (object) – Python object
filename (str) – relative path for output

>>> biceps.toolbox.save_object(obj, filename)
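A minimal standard-library sketch of saving and reloading an object as a pickle file. save_object_sketch and load_object_sketch are hypothetical helpers showing the pickle round trip, not the biceps source; only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

def save_object_sketch(obj, filename):
    """Write obj to filename as a pickle file (hypothetical helper)."""
    with open(filename, "wb") as fh:
        pickle.dump(obj, fh)

def load_object_sketch(filename):
    """Read back an object written by save_object_sketch."""
    with open(filename, "rb") as fh:
        return pickle.load(fh)

path = os.path.join(tempfile.mkdtemp(), "example.pkl")  # throwaway location
save_object_sketch({"lam": 0.0, "nstates": 3}, path)
restored = load_object_sketch(path)
```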