API Reference

Preparation

class biceps.Restraint.Preparation(nstates=0, top_file=None, outdir=None)

A class to prepare input_data for the biceps.Ensemble.initialize_restraints method.

Parameters
  • nstates (int) – number of conformational states

  • top_file (str) – relative path to the structure topology file

  • outdir (str) – relative path for output files

Ensemble

class biceps.Ensemble(lam, energies, debug=False)

Container class for biceps.Restraint.Restraint objects.

Parameters
  • lam (float) – lambda value to scale energies

  • energies (np.ndarray) – numpy array of energies for each state
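
This page does not spell out how biceps applies lam. As a minimal sketch, assume the scaled energies act as Boltzmann weights, p_i ∝ exp(-lam·E_i) for reduced energies E_i. Under that assumption, lam = 0 gives a uniform prior and lam = 1 gives the full energy-based prior. The helper name scaled_boltzmann_populations is hypothetical and is not part of biceps.

```python
import math

def scaled_boltzmann_populations(energies, lam):
    """Hypothetical helper (not biceps code): populations p_i proportional to exp(-lam * E_i)."""
    scaled = [lam * e for e in energies]
    # Shift by the minimum scaled energy so exp() cannot overflow.
    e_min = min(scaled)
    weights = [math.exp(-(e - e_min)) for e in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# lam = 0 ignores the energies entirely, giving a uniform prior.
uniform = scaled_boltzmann_populations([0.0, 1.0, 5.0], lam=0.0)
```
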

Ensemble.initialize_restraints(input_data, options=None)

Initialize corresponding biceps.Restraint.Restraint classes based on experimental observables from input_data for each conformational state.

Parameters
  • input_data (list of str) – a sorted collection of filenames (files contain exp (experimental) and model (theoretical) observables)

  • options (list of dict) – a list with one dictionary per restraint; each dictionary's keys match biceps.Restraint.Restraint parameters.

# In general, one dict of keyword arguments per restraint, passed as options:
# parameters = [dict(**kwargs), ..., dict(**kwargs)]
# More specifically, for J and NOE data restraints, respectively:
parameters = [dict(ref='uniform', sigma=(0.05, 20.0, 1.02)),
              dict(ref='exp', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.02))]

Tip

See the following parent biceps.Restraint.Restraint and child class methods for the full list of keyword arguments (**kwargs) for each restraint used inside parameters:

biceps.Restraint.Restraint_cs.init_restraint

biceps.Restraint.Restraint_J.init_restraint

biceps.Restraint.Restraint_noe.init_restraint

biceps.Restraint.Restraint_pf.init_restraint

Print possible restraints with: biceps.toolbox.list_possible_restraints

Print possible extensions with: biceps.toolbox.list_possible_extensions

Ensemble.to_list()

Converts the Ensemble object to a list.

Returns

collection of biceps.Restraint.Restraint objects

Return type

list

Restraint

class biceps.Restraint.Restraint(ref='uniform', sigma=[0.05, 20.0, 1.02], use_global_ref_sigma=True, verbose=False)

The parent biceps.Restraint.Restraint class.

Parameters
  • ref (str) – reference potential, e.g., “uniform”, “exp”, “gau”. If None, the default reference potential will be used for a given experimental observable

  • sigma (list) – (sigma_min, sigma_max, dsigma)

  • use_global_ref_sigma (bool) – (defaults to True)
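
The (sigma_min, sigma_max, dsigma) triple describes a grid of allowed sigma values. The default dsigma of 1.02 looks like a ratio rather than an increment, and the NOE gamma range is described as "in log space". Both suggest a geometric grid, though that is an assumption. The hypothetical helper below sketches such a grid and is not the biceps code.

```python
import math

def log_spaced_grid(v_min, v_max, dv):
    """Hypothetical helper: geometric grid v_min, v_min*dv, v_min*dv**2, ... (all < v_max)."""
    log_step = math.log(dv)
    # Same element count as an arange over [log(v_min), log(v_max)) with step log(dv).
    n = math.ceil((math.log(v_max) - math.log(v_min)) / log_step)
    return [math.exp(math.log(v_min) + k * log_step) for k in range(n)]

allowed_sigma = log_spaced_grid(0.05, 20.0, 1.02)
```

A geometric grid spends as many points between 0.05 and 0.5 as between 2 and 20. That suits a scale parameter whose plausible values span orders of magnitude.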

Restraint_cs.init_restraint(data, energy, extension='H', weight=1, file_fmt='pickle', verbose=False)

Initialize the chemical shift restraints for each experimental and theoretical observable given data.

Parameters
  • data (str) – filename of data

  • energy (float) – The (reduced) free energy of the conformation

  • extension (str) – “H”, “Ca”, “N”

  • weight (float) – weight for restraint

Restraint_J.init_restraint(data, energy, extension='J', weight=1, file_fmt='pickle', verbose=False)

Initialize the scalar coupling constant restraints for each exp (experimental) and model (theoretical) observable given data.

Parameters
  • data (str) – filename of data

  • energy (float) – The (reduced) free energy of the conformation

  • weight (float) – weight for restraint

Restraint_noe.init_restraint(data, energy, extension='noe', weight=1, file_fmt='pickle', verbose=False, log_normal=False, gamma=[0.2, 10.0, 1.01])

Initialize the NOE distance restraints for each experimental and theoretical observable given data. When log_normal=True, the modified sum of squared errors is used: \(\chi_{d}^{2}(X)=\sum_{j} w_{j}\left( \ln \left( r_{j}(X) / \gamma' r_{j}^{exp} \right)\right)^{2}\)

Parameters
  • data (str) – filename of data

  • energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation

  • weight (float) – weight for restraint

  • log_normal (bool) – use log normal distribution

  • gamma (list) – [gamma_min, gamma_max, dgamma] in log space
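
The log-normal error term can be written directly in code. This sketch covers the formula only. γ' is passed in as a plain number, and the function name and argument layout are hypothetical, not the biceps implementation.

```python
import math

def log_normal_sse(r_model, r_exp, weights, gamma):
    """Hypothetical helper: sum_j w_j * (ln(r_j(X) / (gamma * r_j^exp)))**2."""
    return sum(
        w * math.log(r_mod / (gamma * r_obs)) ** 2
        for r_mod, r_obs, w in zip(r_model, r_exp, weights)
    )
```

Because the error is measured on a log scale, a model distance that is twice the (γ'-scaled) experimental value is penalized exactly as much as one that is half of it.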

Restraint_pf.init_restraint(data, energy, precomputed=False, pf_prior=None, Ncs_fi=None, Nhs_fi=None, beta_c=(0.05, 0.25, 0.01), beta_h=(0.0, 5.2, 0.2), beta_0=(-10.0, 0.0, 0.2), xcs=(5.0, 8.5, 0.5), xhs=(2.0, 2.7, 0.1), bs=(15.0, 16.0, 1.0), extension='pf', weight=1, file_fmt='pickle', states=None, verbose=False)

Initialize protection factor restraints for each exp (experimental) and model (theoretical) observable given data.

Parameters
  • data (str) – filename of data

  • energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation

  • weight (float) – weight for restraint

  • beta_c (list) – [min, max, spacing]

  • beta_h (list) – [min, max, spacing]

  • beta_0 (list) – [min, max, spacing]

  • xcs (list) – [min, max, spacing]

  • xhs (list) – [min, max, spacing]

  • bs (list) – [min, max, spacing]
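
Unlike sigma, some of these ranges include negative values (beta_0), so the [min, max, spacing] grids here cannot be geometric. The sketch below assumes a linear grid with numpy.arange-style semantics, where the upper bound is excluded. Whether biceps includes the upper bound is not stated on this page, and the helper name is hypothetical.

```python
import math

def linear_grid(v_min, v_max, spacing):
    """Hypothetical helper: v_min, v_min + spacing, ... (all < v_max), like numpy.arange."""
    n = math.ceil((v_max - v_min) / spacing)
    return [v_min + k * spacing for k in range(n)]

xcs_grid = linear_grid(5.0, 8.5, 0.5)   # default xcs range
bs_grid = linear_grid(15.0, 16.0, 1.0)  # default bs range: a single value
```

As with numpy.arange, a non-integer (max - min)/spacing in floating point can add or drop the last point. Spacings that are exact binary fractions, such as 0.5, avoid that issue.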

PosteriorSampler

class biceps.PosteriorSampler(ensemble, freq_write_traj=100.0, freq_save_traj=100.0, verbose=False)

A class to perform posterior sampling of conformational populations.

Parameters
  • ensemble (object) – a biceps.Ensemble object

  • freq_write_traj (int) – the frequency (in steps) to write the MCMC trajectory

  • freq_print (int) – the frequency (in steps) to print status

  • freq_save_traj (int) – the frequency (in steps) to store the MCMC trajectory

PosteriorSampler.neglogP(states, parameters, parameter_indices)

Return -ln P of the current configuration.

Parameters
  • states (list) – the new conformational states being sampled in PosteriorSampler.sample

  • parameters (list) – a list of the new parameters for each of the restraints

  • parameter_indices (list) – parameter indices that correspond to each restraint

PosteriorSampler.sample(nsteps, burn=0, print_freq=1000, verbose=False, progress=True)

Perform nsteps steps of posterior sampling, where Monte Carlo moves are accepted or rejected according to the Metropolis criterion. Energies are computed via neglogP.

Parameters
  • nsteps (int) – the number of steps of sampling

  • burn (int) – the number of steps to burn

  • print_freq (int) – the frequency of printing to the screen

  • verbose (bool) – control over verbosity
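
The toy sampler below illustrates the Metropolis criterion that sample applies. It samples discrete conformational states only and has no nuisance parameters. The function neg_log_p stands in for neglogP. The names and the uniform proposal are assumptions for illustration, not the biceps internals.

```python
import math
import random

def metropolis_states(neg_log_p, nstates, nsteps, burn=0, seed=0):
    """Toy Metropolis sampler over discrete states (hypothetical; not biceps code).

    neg_log_p(state) returns -ln P for that state, playing the role of neglogP.
    Returns per-state visit counts after burn-in and the overall acceptance ratio.
    """
    rng = random.Random(seed)
    state = rng.randrange(nstates)
    energy = neg_log_p(state)
    counts = [0] * nstates
    accepted = 0
    for step in range(nsteps):
        trial = rng.randrange(nstates)  # symmetric (uniform) proposal
        trial_energy = neg_log_p(trial)
        # Metropolis criterion: accept with probability min(1, exp(-(E_trial - E_current))).
        if trial_energy <= energy or rng.random() < math.exp(energy - trial_energy):
            state, energy = trial, trial_energy
            accepted += 1
        if step >= burn:
            counts[state] += 1
    return counts, accepted / nsteps

energies = [0.0, math.log(3.0)]
counts, acceptance = metropolis_states(lambda s: energies[s], 2, 20000, burn=1000)
```

With -ln P = [0, ln 3], the fraction of post-burn-in steps spent in state 0 should approach 0.75.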

Tip

Set verbose=False when using multiprocessing.

class biceps.PosteriorSamplingTrajectory(ensemble, sampler, nreplicas)

A container class to store and perform operations on the trajectories of sampling runs.

PosteriorSamplingTrajectory.process_results(filename=None)

Process the trajectory, computing sampling statistics and ensemble-averaged NMR observables.

Benefits of the NumPy compressed (npz) format: (1) it relies on a standard Python library (NumPy), (2) it writes several arrays into a single compact binary file, and (3) the files are significantly smaller than with many other formats.

Parameters

filename (str) – relative path and filename for MCMC trajectory

Tip

It is possible to convert the trajectory file to a Pandas DataFrame (pickle file) with the following: biceps.toolbox.npz_to_DataFrame

Trajectory information

Key                  Short description
rest_type            list of strings for each restraint type
ref                  list of strings for each reference potential type
allowed_parameters   list of NumPy arrays containing the allowed range of nuisance parameters, shape (m, n)
sampled_parameters   list of NumPy arrays containing the counts of nuisance parameters sampled for each restraint, shape (m, n)
trajectory_headers   e.g., [step, energy, accept, state, [nuisance parameter index]]
trajectory           list of values; see trajectory_headers
sep_accept           list of separated acceptance ratios, shape (n+1,)
traces               list of sampled nuisance parameters, shape (n,)
state_trace          list of sampled conformational state indices

where n is the number of allowed parameters and m is the number of restraints.

Analysis

class biceps.Analysis(outdir, nstates=0, precheck=True, BSdir='BS.dat', popdir='populations.dat', picfile='BICePs.pdf', verbose=False)

A class to perform analysis and plot figures.

Parameters
  • nstates (int) – number of conformational states

  • outdir (str) – relative path to glob ‘*.npz’ trajectories (analysis files and figures will be placed inside this directory)

  • precheck (bool) – find all the states that have not been sampled, if any

  • BSdir (str) – relative path for BICePs score file name

  • popdir (str) – relative path for BICePs reweighted populations file name

  • picfile (str) – relative path for BICePs figure

Analysis.plot(plottype='hist', figname='BICePs.pdf', figsize=None, label_fontsize=12, legend_fontsize=10)

Plot figures for population and sampled nuisance parameters.

Parameters

show (bool) – show the plot in Jupyter Notebook.

Convergence

biceps.convergence.exponential_fit(autocorrelation, exp_function='single', v0=None, verbose=False)

Calls on single_exp_decay (‘single’) or double_exp_decay (‘double’) for an exponential fit of an autocorrelation curve. See scipy.optimize.curve_fit for more details.

Parameters
  • autocorrelation (np.ndarray) – the autocorrelation of some timeseries

  • exp_function (str) – ‘single’ or ‘double’ (default ‘single’)

  • v0 (list) – Initial conditions for exponential fitting. Default for ‘single’ is v0=[0.0, 1.0, 4000.]=[a0, a1, tau1] where \(f(x) = a_{0} + a_{1}\exp(-x/\tau_{1})\) and default for ‘double’ is v0=[0.0, 0.9, 0.1, 4000., 200.0]=[a0, a1, a2, tau1, tau2] where \(f(x) = a_{0} + a_{1}\exp(-x/\tau_{1}) + a_{2}\exp(-x/\tau_{2})\)

Returns

the y-values of the fitted curve.

Return type

np.ndarray
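
The two decay models from the v0 description are written out below as plain functions, with parameters in the same order as v0. They are deliberately not named single_exp_decay and double_exp_decay, because they are a sketch of the formulas rather than the biceps functions themselves.

```python
import math

def single_exp(x, a0, a1, tau1):
    """f(x) = a0 + a1 * exp(-x / tau1)"""
    return a0 + a1 * math.exp(-x / tau1)

def double_exp(x, a0, a1, a2, tau1, tau2):
    """f(x) = a0 + a1 * exp(-x / tau1) + a2 * exp(-x / tau2)"""
    return a0 + a1 * math.exp(-x / tau1) + a2 * math.exp(-x / tau2)
```

Both functions take x first, followed by the fit parameters. That matches the f(x, *params) form scipy.optimize.curve_fit expects, with the defaults above as p0. To fit NumPy arrays, replace math.exp with numpy.exp, since math.exp accepts only scalars.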

biceps.convergence.compute_autocorrelation_curves(data, max_tau, normalize=True)

Calculates the autocorrelation for a list of arrays, where each array is a separate time-series.

Parameters
  • data (list) – list of separate timeseries

  • max_tau (int) – the upper bound of autocorrelation lag time

  • normalize (bool) – whether to normalize the autocorrelation curves

Returns: np.ndarray

biceps.convergence.g(f, max_tau=10000, normalize=True)

Calculate the autocorrelation function for a time series f(t).

Parameters
  • f (np.ndarray) – a 1D numpy array containing the time series f(t)

  • max_tau (int) – the maximum autocorrelation time to consider

  • normalize (bool) – if True, return g(tau)/g[0]

Returns: np.ndarray – a NumPy array of size (max_tau+1,) containing g(tau)
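
One common estimator of g(τ) is sketched below, written without NumPy for clarity. It subtracts the mean and averages lagged products over the overlapping part of the series. Whether biceps.convergence.g uses exactly this estimator, rather than an FFT-based one for example, is an assumption.

```python
def autocorrelation(f, max_tau, normalize=True):
    """Estimate g(tau) = <(f(t) - mean)(f(t + tau) - mean)> for tau = 0..max_tau."""
    n = len(f)
    if not 0 <= max_tau < n:
        raise ValueError("max_tau must satisfy 0 <= max_tau < len(f)")
    mean = sum(f) / n
    dev = [x - mean for x in f]
    g = []
    for tau in range(max_tau + 1):
        npairs = n - tau  # number of (t, t + tau) pairs inside the series
        g.append(sum(dev[t] * dev[t + tau] for t in range(npairs)) / npairs)
    if normalize:
        # A constant series has g(0) = 0 and cannot be normalized.
        g = [value / g[0] for value in g]
    return g

alternating = [1.0, -1.0] * 50
acf = autocorrelation(alternating, max_tau=3)
```

For a perfectly alternating series, the normalized curve flips sign at every lag.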

biceps.convergence.compute_autocorrelation_time(autocorrelations)

Computes the autocorrelation time \(\tau_{auto} = \int C_{\tau} d\tau\)

Parameters

autocorrelations (np.ndarray) – an array containing the autocorrelations for each time-series.

Returns: np.ndarray
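
In practice the integral is evaluated on a discretely sampled, normalized autocorrelation curve. The sketch below uses the trapezoid rule. That is one reasonable discretization, not necessarily the one biceps uses, and the function name is hypothetical.

```python
import math

def integrated_autocorrelation_time(acf, dt=1.0):
    """Trapezoid-rule estimate of tau_auto = integral of C(tau) dtau from a sampled ACF."""
    if len(acf) < 2:
        return 0.0
    # dt * (C_0/2 + C_1 + ... + C_{n-2} + C_{n-1}/2)
    return dt * (sum(acf) - 0.5 * (acf[0] + acf[-1]))

decay = [math.exp(-k / 10.0) for k in range(200)]
tau_auto = integrated_autocorrelation_time(decay)
```

For C(τ) = exp(-τ/10) sampled at unit lag, the estimate is close to 10, the decay time of the curve.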

biceps.convergence.get_blocks(data, nblocks=5)

Method used to partition data into blocks. The data is a list of arrays, where each array is a separate time-series or autocorrelation.

Parameters

data (list) – list of separate timeseries
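
Partitioning one time series into contiguous, equal-sized blocks can be sketched as follows. The helper is hypothetical. It drops trailing samples that do not fill a whole block, which is an assumption, since biceps may distribute them differently.

```python
def split_into_blocks(series, nblocks=5):
    """Hypothetical helper: split one time series into nblocks equal, contiguous blocks."""
    if nblocks < 1:
        raise ValueError("nblocks must be >= 1")
    size = len(series) // nblocks
    # Samples beyond nblocks * size are dropped.
    return [series[i * size:(i + 1) * size] for i in range(nblocks)]

# Applied to a list of separate time series, one block list per series:
data = [list(range(10)), list(range(11))]
blocks = [split_into_blocks(ts, nblocks=5) for ts in data]
```
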

biceps.convergence.compute_JSD(T1, T2, T_total, ind, allowed_parameters)

Compute the Jensen–Shannon divergence (JSD) for a given part of the trajectory.

\(JSD = H(P_{comb}) - \pi_{1} H(P_{1}) - \pi_{2} H(P_{2})\), where \(P_{comb}\) is the combined data (\(P_{1} \cup P_{2}\)), \(H\) is the Shannon entropy of distribution \(P_{i}\), and \(\pi_{i}\) is the weight for the probability distribution \(P_{i}\). \(H(P_{i}) = -\sum \frac{r_{i}}{N_{i}} \ln\frac{r_{i}}{N_{i}}\), where \(r_{i}\) and \(N_{i}\) represent the number of times a specific parameter index was sampled and the total number of samples of the parameter, respectively.

Variables
  • T1, T2, T_total – part 1, part 2, and the total (part 1 + part 2) of the trajectory

  • rest_type – experimental restraint type

  • allowed_parameters – range of allowed nuisance parameters

Returns

Jensen–Shannon divergence

Return type

float
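
The JSD formula can be evaluated from histogram counts of sampled parameter indices. The sketch below assumes π_i = N_i/(N_1 + N_2), so each part is weighted by its share of the samples. The function names are hypothetical.

```python
import math

def shannon_entropy(counts):
    """H(P) = -sum (r/N) ln(r/N) over parameter indices with r > 0."""
    n = sum(counts)
    return -sum((r / n) * math.log(r / n) for r in counts if r > 0)

def jsd_from_counts(counts1, counts2):
    """JSD = H(P_comb) - pi_1 H(P_1) - pi_2 H(P_2), with pi_i = N_i / (N_1 + N_2)."""
    n1, n2 = sum(counts1), sum(counts2)
    combined = [a + b for a, b in zip(counts1, counts2)]
    pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return (shannon_entropy(combined)
            - pi1 * shannon_entropy(counts1)
            - pi2 * shannon_entropy(counts2))
```

Identical histograms give a JSD of 0. Two halves that never visit the same parameter index give the maximum of ln 2.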

class biceps.Convergence(traj=None, filename=None, outdir='./', verbose=False)

Convergence submodule for BICePs.

Parameters
  • filename (str) – relative path and filename to MCMC trajectory (NumPy npz file)

  • outdir (str) – relative path for output files

Convergence.plot_traces(figname='traj_traces.png', xlim=None)

Plot trajectory traces.

Parameters

xlim (tuple) – matplotlib x-axis limits

Convergence.plot_auto_curve(xlim=None, figname='autocorrelation_curve.png', std_x=None, std_y=None)

Plot the autocorrelation curve. This function saves a figure of the autocorrelation with error bars at the 95% confidence interval (\(\tau_{auto}\) is rounded to the nearest integer).

Parameters
  • xlim (tuple) – matplotlib x-axis limits

  • std_x (np.ndarray) –

  • std_y (np.ndarray) –

Convergence.plot_block_avg(nblock, r_max, figname='block_avg.png')

Plot block averages.

Parameters
  • nblock (int) – the number of partitions in the time series

  • r_max (np.ndarray) – maximum sampled parameters for each restraint

  • figname (str) – figure filename without a relative path (the path is handled internally)

Convergence.get_autocorrelation_curves(method='auto', nblocks=5, maxtau=10000, plot_traces=False)

Compute the autocorrelation function for a time series f(t), partition the data into the specified number of blocks, and plot the autocorrelation curve. Saves a figure of autocorrelation curves for each restraint.

Parameters
  • method (str) – method for computing autocorrelation time; “block-avg-auto” or “exp” or “auto”

  • nblocks (int) – number of blocks to split up the trajectory

  • maxtau (int) – the upper bound of autocorrelation lag time

  • plot_traces (bool) – if True, also plot the trajectory traces

Convergence.process(nblock=5, nfold=10, nround=100, savefile=True, block_avg=False, normalize=True)

Process the trajectory and execute compute_JSD() with plot_JSD_conv() and plot_JSD_distribution(). If block_avg=True, block averaging is also performed and plot_block_avg() is called.

Parameters
  • nblock (int) – the number of partitions in the time series

  • nfold (int) – the number of partitions in the shuffled (subsampled) trajectory

  • nround (int) – the number of rounds of bootstrapping when computing JSDs

  • savefile (bool) –

  • block_avg (bool) – use block averaging

  • verbose (bool) – verbosity

toolbox

biceps.toolbox.sort_data(dataFiles)

Sort the data into lists by file extension. The data can be located in various directories; provide a list of paths where the data can be found. Some examples of file extensions: {.noe, .J, .cs_H, .cs_Ha}.

Parameters

dataFiles (list) – list of strings where the data can be found

Raises

ValueError – if the data directory does not exist

>>> biceps.toolbox.sort_data(dataFiles)

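
Grouping files by extension, the core of what sort_data describes, can be sketched with the standard library. This hypothetical helper does not check that directories exist (biceps raises ValueError in that case). It also uses plain lexicographic sorting rather than natural sorting.

```python
import os
from collections import defaultdict

def group_by_extension(paths):
    """Hypothetical helper: map each file extension to the sorted list of paths that carry it."""
    groups = defaultdict(list)
    for path in paths:
        ext = os.path.splitext(path)[1]  # e.g. ".noe", ".J", ".cs_H"
        groups[ext].append(path)
    return {ext: sorted(files) for ext, files in groups.items()}

grouped = group_by_extension(["state1/1.noe", "state0/0.J", "state0/0.noe"])
```
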
biceps.toolbox.get_files(path)

Return a naturally sorted list of the files globbed from the given path. The sorting handles decimals and multiple numbers separated by other characters (see https://pypi.org/project/natsort/).

Parameters

path (str) – path (glob pattern) from which files are collected

Returns

sorted list
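
get_files relies on natsort for natural ordering. The sketch below is a minimal stand-in that splits names into text and decimal-number chunks, so "traj2" sorts before "traj10". The function names are hypothetical. natsort's natsorted covers many more cases.

```python
import glob
import re

_NUMBER = re.compile(r"(\d+(?:\.\d+)?)")

def natural_key(name):
    """Split a name into text and numeric chunks so that numbers compare by value."""
    parts = _NUMBER.split(name)
    # re.split with a capturing group puts the captured numbers at odd indices.
    return [float(p) if i % 2 else p for i, p in enumerate(parts)]

def get_sorted_files(pattern):
    """Hypothetical stand-in for get_files: glob the pattern, then sort naturally."""
    return sorted(glob.glob(pattern), key=natural_key)

ordered = sorted(["traj10.npz", "traj2.npz", "traj1.5.npz"], key=natural_key)
```
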

biceps.toolbox.list_res(input_data)

Determine the ordering of the experimental restraints that will be included in sampling.

Parameters

input_data (list) – see biceps.Ensemble.initialize_restraints

>>> biceps.toolbox.list_res(input_data)

biceps.toolbox.list_possible_restraints()

Return a list of all possible restraint classes in Restraint.py.

>>> biceps.toolbox.list_possible_restraints()

biceps.toolbox.list_extensions(input_data)

Determine the ordering of the experimental restraints that will be included in sampling.

Parameters

input_data (list) – see biceps.Ensemble.initialize_restraints

>>> biceps.toolbox.list_extensions(input_data)

biceps.toolbox.list_possible_extensions()

Return a list of all possible input data file extensions.

>>> biceps.toolbox.list_possible_extensions()

biceps.toolbox.npz_to_DataFrame(file, out_filename='traj_lambda0.00.pkl', verbose=False)

Convert a NumPy compressed (.npz) file to a Pandas DataFrame saved as a pickle file (*.pkl).

>>> biceps.toolbox.npz_to_DataFrame(file, out_filename="traj_lambda0.00.pkl")

biceps.toolbox.save_object(obj, filename)

Saves a Python object as a pickle file.

Parameters
  • obj (object) – Python object

  • filename (str) – relative path for output

>>> biceps.toolbox.save_object(obj, filename)
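
save_object is described as writing a pickle file. The sketch below is a minimal stand-in using the standard pickle module, with a matching loader; both function names are hypothetical. Only unpickle files from trusted sources, because pickle.load can execute arbitrary code.

```python
import os
import pickle
import tempfile

def save_object_sketch(obj, filename):
    """Pickle obj to filename (hypothetical stand-in for biceps.toolbox.save_object)."""
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_object_sketch(filename):
    """Load an object written by save_object_sketch. Only unpickle files you trust."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)

# Usage: round-trip a small object through a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "obj.pkl")
save_object_sketch({"nstates": 3, "ref": ["uniform", "exp"]}, demo_path)
restored = load_object_sketch(demo_path)
```
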