API Reference¶
Preparation¶
- class biceps.Restraint.Preparation(nstates=0, top_file=None, outdir=None)¶
A class to prepare input_data for the
biceps.Ensemble.initialize_restraints
method.
- Parameters
nstates (int) – number of conformational states
top_file (str) – relative path to the structure topology file
outdir (str) – relative path for output files
Ensemble¶
- class biceps.Ensemble(lam, energies, debug=False)¶
Container class for
biceps.Restraint.Restraint
objects.
- Parameters
lam (float) – lambda value to scale energies
energies (np.ndarray) – numpy array of energies for each state
- Ensemble.initialize_restraints(input_data, options=None)¶
Initialize corresponding
biceps.Restraint.Restraint
classes based on experimental observables from input_data for each conformational state.
- Parameters
input_data (list of str) – a sorted collection of filenames (files contain exp (experimental) and model (theoretical) observables)
options (list of dict) – one dictionary per restraint, with keys that match
biceps.Restraint.Restraint
parameters and values that hold the settings for that restraint.
# In general:
parameters = [dict(**kwargs),...,dict(**kwargs)]
# More specifically, for J and NOE data restraints, respectively:
parameters = [dict(ref='uniform', sigma=(0.05, 20.0, 1.02)),
              dict(ref='exp', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.02))]
Tip
See the following parent
biceps.Restraint.Restraint
and child class methods for the full list of keyword arguments (**kwargs) for each restraint used inside parameters:
biceps.Restraint.Restraint_cs.init_restraint
biceps.Restraint.Restraint_J.init_restraint
biceps.Restraint.Restraint_noe.init_restraint
biceps.Restraint.Restraint_pf.init_restraint
Print possible restraints with:
biceps.toolbox.list_possible_restraints
Print possible extensions with:
biceps.toolbox.list_possible_extensions
- Ensemble.to_list()¶
Converts the
Ensemble
class to a list.
- Returns
collection of
biceps.Restraint.Restraint
objects
- Return type
list
Restraint¶
- class biceps.Restraint.Restraint(ref='uniform', sigma=[0.05, 20.0, 1.02], use_global_ref_sigma=True, verbose=False)¶
The parent
biceps.Restraint.Restraint
class.
- Parameters
ref (str) – reference potential, e.g., “uniform”, “exp”, “gau”. If None, the default reference potential will be used for a given experimental observable
sigma (list) – (sigma_min, sigma_max, dsigma)
use_global_ref_sigma (bool) – (defaults to True)
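To illustrate how a (sigma_min, sigma_max, dsigma) triple might expand into a set of allowed nuisance-parameter values, here is a minimal sketch. It assumes dsigma is a multiplicative step (a geometric grid), which the default factor of 1.02 suggests but this reference does not state. `geometric_grid` is a hypothetical helper and not part of biceps.

```python
def geometric_grid(minimum, maximum, factor):
    """Return minimum, minimum*factor, minimum*factor**2, ... below maximum.

    Assumption: the third element of the triple is a multiplicative step.
    """
    if minimum <= 0 or factor <= 1:
        raise ValueError("need minimum > 0 and factor > 1")
    values = []
    value = minimum
    while value < maximum:
        values.append(value)
        value *= factor
    return values


# Default triple from the Restraint signature above.
sigma_grid = geometric_grid(0.05, 20.0, 1.02)
print(len(sigma_grid), sigma_grid[0], sigma_grid[-1])
```

A geometric grid spends more points on small sigma values, which is one plausible reason to prefer it over a linear grid for a scale parameter.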
- Restraint_cs.init_restraint(data, energy, extension='H', weight=1, file_fmt='pickle', verbose=False)¶
Initialize the chemical shift restraints for each experimental and theoretical observable given data.
- Parameters
data (str) – filename of data
energy (float) – The (reduced) free energy of the conformation
extension (str) – “H”, “Ca”, “N”
weight (float) – weight for restraint
- Restraint_J.init_restraint(data, energy, extension='J', weight=1, file_fmt='pickle', verbose=False)¶
Initialize the scalar coupling constant restraints for each exp (experimental) and model (theoretical) observable given data.
- Parameters
data (str) – filename of data
energy (float) – The (reduced) free energy of the conformation
weight (float) – weight for restraint
- Restraint_noe.init_restraint(data, energy, extension='noe', weight=1, file_fmt='pickle', verbose=False, log_normal=False, gamma=[0.2, 10.0, 1.01])¶
Initialize the NOE distance restraints for each experimental and theoretical observable given data. When
log_normal=True
, the modified sum of squared errors is used: \(\chi_{d}^{2}(X)=\sum_{j} w_{j}( \ln ( r_{j}(X) / (\gamma' r_{j}^{exp}) ))^{2}\)
- Parameters
data (str) – filename of data
energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation
weight (float) – weight for restraint
log_normal (bool) – use log normal distribution
gamma (list) – [gamma_min, gamma_max, dgamma] in log space
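The log-normal sum of squared errors for the NOE restraint can be evaluated directly from its definition. The sketch below is only an illustration of the formula, not the biceps implementation. The function name `chi2_log_normal` and the distances are hypothetical.

```python
import math


def chi2_log_normal(r_model, r_exp, weights, gamma):
    """Sum over j of w_j * ln(r_j / (gamma * r_j_exp))**2."""
    return sum(
        w * math.log(r / (gamma * r0)) ** 2
        for r, r0, w in zip(r_model, r_exp, weights)
    )


# Hypothetical distances (model and experiment in the same units).
r_model = [2.5, 3.0, 4.2]
r_exp = [2.4, 3.3, 4.0]
weights = [1.0, 1.0, 1.0]

chi2 = chi2_log_normal(r_model, r_exp, weights, gamma=1.0)
print(chi2)
```

Because the error is measured on the logarithm of the distance ratio, the error vanishes whenever every model distance equals gamma times its experimental counterpart.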
- Restraint_pf.init_restraint(data, energy, precomputed=False, pf_prior=None, Ncs_fi=None, Nhs_fi=None, beta_c=(0.05, 0.25, 0.01), beta_h=(0.0, 5.2, 0.2), beta_0=(-10.0, 0.0, 0.2), xcs=(5.0, 8.5, 0.5), xhs=(2.0, 2.7, 0.1), bs=(15.0, 16.0, 1.0), extension='pf', weight=1, file_fmt='pickle', states=None, verbose=False)¶
Initialize protection factor restraints for each exp (experimental) and model (theoretical) observable given data.
- Parameters
data (str) – filename of data
energy (float) – The (reduced) free energy \(f=\beta*F\) of the conformation
weight (float) – weight for restraint
beta_c (list) – [min, max, spacing]
beta_h (list) – [min, max, spacing]
beta_0 (list) – [min, max, spacing]
xcs (list) – [min, max, spacing]
xhs (list) – [min, max, spacing]
bs (list) – [min, max, spacing]
PosteriorSampler¶
- class biceps.PosteriorSampler(ensemble, freq_write_traj=100.0, freq_save_traj=100.0, verbose=False)¶
A class to perform posterior sampling of conformational populations.
- Parameters
ensemble (object) – a
biceps.Ensemble
object
freq_write_traj (int) – the frequency (in steps) to write the MCMC trajectory
freq_print (int) – the frequency (in steps) to print status
freq_save_traj (int) – the frequency (in steps) to store the MCMC trajectory
- PosteriorSampler.neglogP(states, parameters, parameter_indices)¶
Return -ln P of the current configuration.
- Parameters
states (list) – the new conformational state being sampled in
PosteriorSampler.sample
parameters (list) – a list of the new parameters for each of the restraints
parameter_indices (list) – parameter indices that correspond to each restraint
- PosteriorSampler.sample(nsteps, burn=0, print_freq=1000, verbose=False, progress=True)¶
Perform nsteps steps of posterior sampling, where Monte Carlo moves are accepted or rejected according to the Metropolis criterion. Energies are computed via
neglogP
.
- Parameters
nsteps (int) – the number of steps of sampling
burn (int) – the number of steps to burn
print_freq (int) – the frequency of printing to the screen
verbose (bool) – control over verbosity
Tip
Set verbose=False when using multiprocessing.
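The Metropolis criterion mentioned for the sample method can be sketched on a toy problem. The example below samples only a discrete state index from fixed energies treated as -ln P. The actual sampler also moves through nuisance parameters, which this sketch does not attempt to reproduce. The function names, the uniform proposal, and the three energies are all assumptions for illustration.

```python
import math
import random


def metropolis_accept(old_energy, new_energy, rng):
    """Accept a move with probability min(1, exp(-(new - old)))."""
    if new_energy <= old_energy:
        return True
    return rng.random() < math.exp(old_energy - new_energy)


def sample_states(energies, nsteps, burn=0, seed=0):
    """Count visits to each state; energies play the role of -ln P."""
    rng = random.Random(seed)
    state = 0
    counts = [0] * len(energies)
    for step in range(nsteps):
        proposal = rng.randrange(len(energies))  # symmetric uniform proposal
        if metropolis_accept(energies[state], energies[proposal], rng):
            state = proposal
        if step >= burn:  # discard the first `burn` steps
            counts[state] += 1
    return counts


counts = sample_states([0.0, 1.0, 3.0], nsteps=20000, burn=1000)
print(counts)
```

With enough steps, the visit counts should approach the Boltzmann-like weights exp(-energy), so lower-energy states are visited more often.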
- class biceps.PosteriorSamplingTrajectory(ensemble, sampler, nreplicas)¶
A container class to store and perform operations on the trajectories of sampling runs.
- Parameters
ensemble (list) – ensemble of
biceps.Restraint.Restraint
objects
nreplicas (int) – number of replicas
- PosteriorSamplingTrajectory.process_results(filename=None)¶
Process the trajectory, computing sampling statistics and ensemble-average NMR observables.
Benefits of the NumPy compressed archive (npz) format: 1) it is part of a standard Python library (NumPy), 2) it writes several arrays compactly into a single binary file, and 3) the files are significantly smaller than in many other formats.
- Parameters
filename (str) – relative path and filename for MCMC trajectory
Tip
It is possible to convert the trajectory file to a Pandas DataFrame (pickle file) with the following:
biceps.toolbox.npz_to_DataFrame
Key | Short Description |
---|---|
rest_type | list of strings for each restraint type |
ref | list of strings for each reference potential type |
allowed_parameters | list of numpy arrays containing the allowed range of nuisance parameters with shape (m,n) |
sampled_parameters | list of numpy arrays containing the counts of nuisance parameters sampled for each restraint with shape (m,n) |
trajectory_headers | e.g., [step, energy, accept, state, [nuisance parameter index]] |
trajectory | list of values (see trajectory_headers) |
sep_accept | list of separated acceptance ratios with shape (n+1,) |
traces | list of sampled nuisance parameters with shape (n) |
state_trace | list of sampled conformational state index |
n is the number of allowed parameters
m is the number of restraints
Analysis¶
- class biceps.Analysis(outdir, nstates=0, precheck=True, BSdir='BS.dat', popdir='populations.dat', picfile='BICePs.pdf', verbose=False)¶
A class to perform analysis and plot figures.
- Parameters
nstates (int) – number of conformational states
trajs (str) – relative path to glob ‘*.npz’ trajectories (analysis files and figures will be placed inside this directory)
precheck (bool) – find all the states that haven’t been sampled, if any
BSdir (str) – relative path for BICePs score file name
popdir (str) – relative path for BICePs reweighted populations file name
picfile (str) – relative path for BICePs figure
- Analysis.plot(plottype='hist', figname='BICePs.pdf', figsize=None, label_fontsize=12, legend_fontsize=10)¶
Plot figures for population and sampled nuisance parameters.
- Parameters
show (bool) – show the plot in Jupyter Notebook.
Convergence¶
- biceps.convergence.exponential_fit(autocorrelation, exp_function='single', v0=None, verbose=False)¶
Calls on
single_exp_decay
(‘single’) or
double_exp_decay
(‘double’) for an exponential fitting of an autocorrelation curve. See SciPy curve fit for more details.
- Parameters
autocorrelation (np.ndarray) – the autocorrelation of some timeseries
exp_function (str) – default=’single’ (‘single’ or ‘double’)
v0 (list) – Initial conditions for exponential fitting. Default for ‘single’ is v0=[0.0, 1.0, 4000.]=[a0, a1, tau1] where \(f(x) = a_{0} + a_{1}*exp(-(x/\tau_{1}))\) and default for ‘double’ is v0=[0.0, 0.9, 0.1, 4000., 200.0]=[a0, a1, a2, tau1, tau2] where \(f(x) = a_{0} + a_{1}*exp(-(x/\tau_{1})) + a_{2}*exp(-(x/\tau_{2}))\)
- Returns
the y-values of the fitted curve.
- Return type
yfit(np.ndarray)
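The two model functions behind the ‘single’ and ‘double’ options can be written out from the formulas given for v0. The sketch below only evaluates the models at their default initial conditions. It does not perform the fit, and the argument order (mirroring the order of v0) is an assumption rather than the documented signature of single_exp_decay or double_exp_decay.

```python
import math


def single_exp_decay(x, a0, a1, tau1):
    """f(x) = a0 + a1 * exp(-x / tau1)."""
    return a0 + a1 * math.exp(-x / tau1)


def double_exp_decay(x, a0, a1, a2, tau1, tau2):
    """f(x) = a0 + a1 * exp(-x / tau1) + a2 * exp(-x / tau2)."""
    return a0 + a1 * math.exp(-x / tau1) + a2 * math.exp(-x / tau2)


# Default initial conditions quoted in the parameter description.
v0_single = [0.0, 1.0, 4000.0]
v0_double = [0.0, 0.9, 0.1, 4000.0, 200.0]

lags = [0, 1000, 4000]
print([single_exp_decay(x, *v0_single) for x in lags])
print([double_exp_decay(x, *v0_double) for x in lags])
```

Both defaults start at 1 for zero lag, which matches a normalized autocorrelation curve. The double exponential adds a small, fast-decaying component on top of the slow one.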
- biceps.convergence.compute_autocorrelation_curves(data, max_tau, normalize=True)¶
Calculates the autocorrelation for a list of arrays, where each array is a separate time-series.
- Parameters
data (list) – list of separate timeseries
max_tau (int) – the upper bound of autocorrelation lag time
normalize (bool) – whether to normalize the autocorrelation
Returns: np.ndarray
- biceps.convergence.g(f, max_tau=10000, normalize=True)¶
Calculate the autocorrelation function for a time-series f(t).
- Parameters
f (np.ndarray) – a 1D numpy array containing the time series f(t)
max_tau (int) – the maximum autocorrelation time to consider.
normalize (bool) – if True, return g(tau)/g[0]
Returns: np.array: a numpy array of size (max_tau+1,) containing g(tau)
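As a rough illustration of what an autocorrelation function of a time series looks like in code, here is a pure-Python sketch. It assumes the common estimator that subtracts the mean and averages the products of values separated by tau. The exact estimator used by biceps.convergence.g is not specified here, so treat `autocorrelation` as a hypothetical stand-in.

```python
def autocorrelation(f, max_tau, normalize=True):
    """Return [g(0), ..., g(max_tau)] for the mean-subtracted series f."""
    n = len(f)
    if max_tau >= n:
        raise ValueError("max_tau must be smaller than the series length")
    mean = sum(f) / n
    df = [value - mean for value in f]
    g = []
    for tau in range(max_tau + 1):
        # Average the product of all pairs separated by tau steps.
        products = [df[t] * df[t + tau] for t in range(n - tau)]
        g.append(sum(products) / len(products))
    if normalize:
        if g[0] == 0:
            raise ValueError("cannot normalize a constant series")
        g = [value / g[0] for value in g]
    return g


series = [0.0, 1.0, 0.0, -1.0] * 25  # toy signal with period 4
curve = autocorrelation(series, max_tau=4)
print(curve)
```

The returned list has max_tau + 1 entries, matching the size stated for g above. With normalize=True the first entry is 1 by construction.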
- biceps.convergence.compute_autocorrelation_time(autocorrelations)¶
Computes the autocorrelation time \(\tau_{auto} = \int C_{\tau} d\tau\)
- Parameters
autocorrelations (np.ndarray) – an array containing the autocorrelations for each time-series.
Returns: np.ndarray
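The integral defining the autocorrelation time can be approximated numerically from a sampled curve. The sketch below uses the trapezoidal rule on a synthetic exponential decay. The integration rule and the helper name `autocorrelation_time` are assumptions, since this reference does not say how compute_autocorrelation_time discretizes the integral.

```python
import math


def autocorrelation_time(curve, dtau=1.0):
    """Trapezoidal estimate of the integral of C(tau) over the sampled lags."""
    if len(curve) < 2:
        raise ValueError("need at least two points to integrate")
    interior = sum(curve[1:-1])
    return dtau * (0.5 * (curve[0] + curve[-1]) + interior)


# Synthetic curve C(tau) = exp(-tau / 10), whose exact integral is 10.
curve = [math.exp(-tau / 10.0) for tau in range(201)]
tau_auto = autocorrelation_time(curve)
print(round(tau_auto, 2))
```

For a purely exponential curve the integral equals the decay constant, which gives a convenient sanity check before applying the estimate to noisy data.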
- biceps.convergence.get_blocks(data, nblocks=5)¶
Method used to partition data into blocks. The data is a list of arrays, where each array is a separate time-series or autocorrelation.
- Parameters
data (list) – list of separate timeseries
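Partitioning each time series into blocks might look like the following sketch. It assumes contiguous blocks of equal size and drops trailing samples that do not fill a block. How get_blocks treats the remainder is not documented here, and `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(series, nblocks=5):
    """Split one series into nblocks contiguous blocks of equal size.

    Assumption: trailing samples that do not fill a block are dropped.
    """
    if nblocks < 1 or nblocks > len(series):
        raise ValueError("nblocks must be between 1 and len(series)")
    size = len(series) // nblocks
    return [series[i * size:(i + 1) * size] for i in range(nblocks)]


# Two toy time series, each partitioned independently.
data = [list(range(12)), list(range(100, 112))]
blocks = [split_into_blocks(series, nblocks=5) for series in data]
print(blocks[0])
```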
- biceps.convergence.compute_JSD(T1, T2, T_total, ind, allowed_parameters)¶
Compute JSD for a given part of trajectory.
\(JSD = H(P_{comb}) - {\pi_{1}}{H(P_{1})} - {\pi_{2}}{H(P_{2})}\), where \(P_{comb}\) is the combined data (\(P_{1} \cup P_{2}\)). \(H\) is the Shannon entropy of distribution \(P_{i}\) and \(\pi_{i}\) is the weight for the probability distribution \(P_{i}\). \(H(P_{i}) = \sum -\frac{r_{i}}{N_{i}}*\ln(\frac{r_{i}}{N_{i}})\), where \(r_{i}\) and \(N_{i}\) represent the number of times a specific parameter index was sampled and the total number of samples of the parameter, respectively.
- Variables
T1, T2, T_total – part 1, part 2 and total (part 1 + part 2) of the trajectory
rest_type – experimental restraint type
allowed_parameters – nuisance parameter range
- Return float
Jensen–Shannon divergence
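The Jensen–Shannon divergence formula above can be evaluated from two histograms of sampled parameter indices. The sketch below assumes the weights are \(\pi_{i} = N_{i}/(N_{1}+N_{2})\), which the text does not state explicitly. The function names and the counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """H(P) = -sum (r/N) ln(r/N), skipping bins that were never sampled."""
    total = sum(counts)
    return -sum((r / total) * math.log(r / total) for r in counts if r > 0)


def jsd_from_counts(counts1, counts2):
    """JSD = H(P_comb) - pi_1 H(P_1) - pi_2 H(P_2), with pi_i = N_i / N."""
    n1, n2 = sum(counts1), sum(counts2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both histograms need at least one sample")
    combined = [a + b for a, b in zip(counts1, counts2)]
    pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return (shannon_entropy(combined)
            - pi1 * shannon_entropy(counts1)
            - pi2 * shannon_entropy(counts2))


# Hypothetical counts of a sampled parameter index in two trajectory halves.
first_half = [40, 35, 25]
second_half = [38, 36, 26]
jsd = jsd_from_counts(first_half, second_half)
print(jsd)
```

Identical histograms give a divergence of zero, and two equally sized histograms with no overlap give ln 2, so small values indicate that the two parts of the trajectory sample similar distributions.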
- class biceps.Convergence(traj=None, filename=None, outdir='./', verbose=False)¶
Convergence submodule for BICePs.
- Parameters
filename (str) – relative path and filename to MCMC trajectory (NumPy npz file)
outdir (str) – relative path for output files
- Convergence.plot_traces(figname='traj_traces.png', xlim=None)¶
Plot trajectory traces.
- Parameters
xlim (tuple) – matplotlib x-axis limits
- Convergence.plot_auto_curve(xlim=None, figname='autocorrelation_curve.png', std_x=None, std_y=None)¶
Plot auto-correlation curve. This function saves a figure of auto-correlation with error bars at the 95% confidence interval (\(\tau_{auto}\) is rounded to the nearest integer).
- Parameters
xlim (tuple) – matplotlib x-axis limits
std_x (np.ndarray) –
std_y (np.ndarray) –
- Convergence.plot_block_avg(nblock, r_max, figname='block_avg.png')¶
Plot block average
- Parameters
nblock (int) – is the number of partitions in the time series
r_max (np.ndarray) – maximum sampled parameters for each restraint
figname (str) – figure name without relative path (taken care of)
- Convergence.get_autocorrelation_curves(method='auto', nblocks=5, maxtau=10000, plot_traces=False)¶
Compute the autocorrelation function for a time-series f(t), partition the data into the specified number of blocks and plot the autocorrelation curve. Saves a figure of autocorrelation curves for each restraint.
- Parameters
method (str) – method for computing autocorrelation time; “block-avg-auto” or “exp” or “auto”
nblocks (int) – number of blocks to split up the trajectory
maxtau (int) – the upper bound of autocorrelation lag time
plot_traces (bool) – plot the trajectory traces?
- Convergence.process(nblock=5, nfold=10, nround=100, savefile=True, block_avg=False, normalize=True)¶
Process the trajectory and execute
compute_JSD()
with
plot_JSD_conv()
and
plot_JSD_distribution()
. If
block_avg=True
, then block averaging will be executed and
plot_block_avg()
will be executed as well.
- Parameters
nblock (int) – is the number of partitions in the time series
nfold (int) – is the number of partitions in the shuffled (subsampled) trajectory
nround (int) – is the number of rounds of bootstrapping when computing JSDs
savefile (bool) –
block_avg (bool) – use block averaging
verbose (bool) – verbosity
toolbox¶
- biceps.toolbox.sort_data(dataFiles)¶
Sort the data by extension into lists. Data can be located in various directories. Provide a list of paths where the data can be found. Some examples of file extensions: {.noe,.J,.cs_H,.cs_Ha}.
- Parameters
dataFiles (list) – list of strings where the data can be found
- Raises
ValueError – if the data directory does not exist
>>> biceps.toolbox.sort_data()
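Grouping files by extension, as described for sort_data, can be sketched with the standard library. The helper `group_by_extension` and the file names below are hypothetical. The real function takes directories rather than a flat list of file names, so this only illustrates the grouping idea.

```python
import os
from collections import defaultdict


def group_by_extension(filenames):
    """Return one sorted list of file names per extension."""
    groups = defaultdict(list)
    for name in filenames:
        extension = os.path.splitext(name)[1].lstrip(".")
        groups[extension].append(name)
    # Extensions are ordered alphabetically (uppercase before lowercase).
    return [sorted(groups[ext]) for ext in sorted(groups)]


files = ["data/1.noe", "data/0.J", "data/0.noe", "data/1.J", "data/0.cs_H"]
grouped = group_by_extension(files)
print(grouped)
```

The plain sorted call orders names lexicographically, so file names with multi-digit state indices would need a natural sort instead.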
- biceps.toolbox.get_files(path)¶
Return a sorted list of files globbed from the given path. The sorting can handle decimals and multiple numbers that are separated by characters (see https://pypi.org/project/natsort/).
- Parameters
path (str) –
- Returns
sorted list
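The kind of number-aware ordering described for get_files can be approximated without third-party packages. The sketch below is a simplified stand-in written with the standard library, not the natsort package itself. `natural_key` is a hypothetical helper and the file names are made up.

```python
import re

# Matches integers and simple decimals such as "10" or "0.50".
_NUMBER = re.compile(r"(\d+(?:\.\d+)?)")


def natural_key(name):
    """Sort key that compares embedded numbers by value, not by character."""
    parts = _NUMBER.split(name)
    # With one capturing group, odd positions hold numbers and even hold text.
    return [float(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["state_10.noe", "state_2.noe", "state_1.noe"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

A plain lexicographic sort would place state_10 before state_2, which is why a numeric key matters when file names encode state indices.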
- biceps.toolbox.list_res(input_data)¶
Determine the ordering of the experimental restraints that will be included in sampling.
- Parameters
input_data (list) – see
biceps.Ensemble.initialize_restraints
>>> biceps.toolbox.list_res()
- biceps.toolbox.list_possible_restraints()¶
Function will return a list of all possible restraint classes in Restraint.py.
>>> biceps.toolbox.list_possible_restraints()
- biceps.toolbox.list_extensions(input_data)¶
Determine the ordering of the experimental restraints that will be included in sampling.
- Parameters
input_data (list) – see
biceps.Ensemble.initialize_restraints
>>> biceps.toolbox.list_extensions()
- biceps.toolbox.list_possible_extensions()¶
Function will return a list of all possible input data file extensions.
>>> biceps.toolbox.list_possible_extensions()
- biceps.toolbox.npz_to_DataFrame(file, out_filename='traj_lambda0.00.pkl', verbose=False)¶
Converts numpy Z compressed file to Pandas DataFrame (*.pkl)
>>> biceps.toolbox.npz_to_DataFrame(file, out_filename="traj_lambda0.00.pkl")
- biceps.toolbox.save_object(obj, filename)¶
Saves python object as pickle file.
- Parameters
obj (object) – python object
filename (str) – relative path for output
>>> biceps.toolbox.save_object()