Posterior Distribution¶
- Overview:
Exact Posterior Classes - BranchProPosterior, BranchProPosteriorMultSI, LocImpBranchProPosterior, LocImpBranchProPosteriorMultSI, GammaDist
MCMC Sampling-based Log-Posterior Classes - PoissonBranchProLogLik, PoissonBranchProLogPosterior, MultiCatPoissonBranchProLogLik, MultiCatPoissonBranchProLogPosterior, NegBinBranchProLogLik, NegBinBranchProLogPosterior
Branch Process Posterior Distribution¶
- class branchpro.BranchProPosterior(inc_data, daily_serial_interval, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]¶
BranchProPosterior Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process.
The prior distribution is chosen to be the conjugate prior of the (Poisson) likelihood of the observed incidence data, and is therefore a Gamma distribution. We express it in the shape-rate configuration, so that the PDF takes the form:
\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]
Hence, the posterior distribution is also Gamma-distributed.
- Parameters:
inc_data – (pandas DataFrame) contains numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
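The conjugacy stated in the class description can be made concrete with a short derivation. This is a sketch under one assumption that this page does not state explicitly: the Poisson likelihood of the incidence \(I_k\) at time \(k\) has mean \(R \Lambda_k\), where \(\Lambda_k\) (a symbol introduced here for illustration) is the serial-interval-weighted sum of past incidences.

```latex
\begin{align*}
p(R \mid I)
  &\propto \left[\prod_{k} \frac{(R\Lambda_k)^{I_k} e^{-R\Lambda_k}}{I_k!}\right]
     \frac{\beta^\alpha}{\Gamma(\alpha)} R^{\alpha-1} e^{-\beta R} \\
  &\propto R^{\alpha + \sum_k I_k - 1}\, e^{-\left(\beta + \sum_k \Lambda_k\right) R}
\end{align*}
```

Under that assumption the posterior is again a Gamma distribution, with shape \(\alpha + \sum_k I_k\) and rate \(\beta + \sum_k \Lambda_k\).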
Notes
Always apply method run_inference before calling BranchProPosterior.get_intervals() to get R behaviour dataframe!
- get_intervals(central_prob)[source]¶
Returns a dataframe of the reproduction number posterior mean with percentiles over time.
The lower and upper percentiles are computed from the posterior distribution, using the specified central probability to form an equal-tailed interval.
The results are returned in a dataframe with the following columns: ‘Time Points’, ‘Mean’, ‘Lower bound CI’ and ‘Upper bound CI’
- Parameters:
central_prob – level of the computed credible interval of the estimated R number values, i.e. the central probability covered by the interval.
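The link between a central probability and the two percentiles of an equal-tailed interval can be sketched as follows. This is an illustration in plain Python, not the library's code: the helper names and the Gamma posterior with shape 20 and rate 10 are assumptions made for the example, and the percentiles are estimated from samples rather than computed exactly.

```python
import random


def equal_tailed_levels(central_prob):
    """Return the (lower, upper) cumulative levels of an equal-tailed interval."""
    tail = (1.0 - central_prob) / 2.0
    return tail, 1.0 - tail


def empirical_quantile(sorted_samples, level):
    """Pick the sample closest to the requested cumulative level."""
    index = round(level * (len(sorted_samples) - 1))
    return sorted_samples[index]


# Illustrative Gamma posterior with shape 20 and rate 10 (mean 2.0).
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
rng = random.Random(0)
shape, rate = 20.0, 10.0
samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(20000))

lower_level, upper_level = equal_tailed_levels(0.95)
lower = empirical_quantile(samples, lower_level)
upper = empirical_quantile(samples, upper_level)
mean = sum(samples) / len(samples)
```

A central probability of 0.95 leaves 2.5% of the posterior mass in each tail, so the bounds are the 2.5th and 97.5th percentiles.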
- last_time_r_threshold(type_threshold, central_prob=0.95, method='Mean')[source]¶
Returns, for each of the reproduction number posterior mean, lower bound and upper bound (for a specified central probability), the first time point after it crosses the imposed threshold for the last time during the inference.
- Parameters:
type_threshold – (str) type of threshold imposed; ‘more’ = last time R > 1 and ‘less’ = last time R < 1.
central_prob – level of the computed credible interval of the estimated R number values, i.e. the central probability covered by the interval.
method – choice for the average trajectory of the reproduction number considered; can be either Mean or Median.
- proportion_time_r_more_than_1(central_prob=0.95, method='Mean')[source]¶
Returns the proportion of time for which the reproduction number posterior mean, lower bound and upper bound (for a specified central probability) are each greater than 1.
- Parameters:
central_prob – level of the computed credible interval of the estimated R number values, i.e. the central probability covered by the interval.
method – choice for the average trajectory of the reproduction number considered; can be either Mean or Median.
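The quantity described above can be sketched in a few lines. The function name, the strict inequality and the equal weighting of time points are assumptions of this example, and the trajectories are made-up numbers rather than library output.

```python
def proportion_above(values, threshold=1.0):
    """Fraction of time points at which a trajectory exceeds the threshold."""
    if not values:
        raise ValueError('trajectory must contain at least one time point')
    return sum(1 for v in values if v > threshold) / len(values)


# Made-up trajectories over six time points (illustrative numbers only).
mean_r = [1.8, 1.4, 1.1, 0.9, 0.8, 0.7]
lower_r = [1.3, 1.0, 0.8, 0.6, 0.5, 0.5]
upper_r = [2.4, 1.9, 1.5, 1.2, 1.1, 0.9]

# One proportion per trajectory: mean, lower bound, upper bound.
proportions = tuple(proportion_above(t) for t in (mean_r, lower_r, upper_r))
```

Because the lower bound is below the mean and the upper bound above it, the three proportions are ordered in the same way.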
- run_inference(tau)[source]¶
Runs the inference of the reproduction numbers based on the entirety of the incidence data available.
The first inferred R value is given at the time point immediately after the end of the tau-window over the initial incidences.
- Parameters:
tau – size of the sliding time window over which the reproduction number is estimated.
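A sliding-window estimate of this kind can be sketched in plain Python. Everything below is an assumption made for illustration rather than the library's implementation: the conjugate Gamma update (shape plus windowed cases, rate plus windowed transmission potential), a window of tau + 1 time points, and `serial_interval[0]` taken as the weight for a delay of one time unit.

```python
def transmission_potential(incidence, serial_interval, t):
    """Sum over s >= 1 of w_s * I_{t-s}, with w normalised to sum to one."""
    total_w = sum(serial_interval)
    return sum(
        (w / total_w) * incidence[t - s]
        for s, w in enumerate(serial_interval, start=1)
        if t - s >= 0
    )


def sliding_window_posterior(incidence, serial_interval, tau, alpha, beta):
    """Return (t, shape, rate) of the Gamma posterior for each window end t."""
    results = []
    # Assumed convention: the window ending at t covers t - tau, ..., t,
    # and the first window starts at index 1 so that every point in it
    # has at least one earlier incidence value.
    for t in range(tau + 1, len(incidence)):
        window = range(t - tau, t + 1)
        shape = alpha + sum(incidence[k] for k in window)
        rate = beta + sum(
            transmission_potential(incidence, serial_interval, k)
            for k in window
        )
        results.append((t, shape, rate))
    return results


# Cases doubling each day, with all transmission one day after infection.
incidence = [1, 2, 4, 8, 16, 32]
results = sliding_window_posterior(incidence, [1.0], tau=1, alpha=1.0, beta=1.0)
posterior_means = [shape / rate for _, shape, rate in results]
```

With these toy inputs the posterior mean moves from 1.75 towards 2 as the windows slide forward, consistent with an epidemic that doubles every day.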
Branch Process Posterior Distribution with Multiple Serial Intervals¶
- class branchpro.BranchProPosteriorMultSI(inc_data, daily_serial_intervals, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]¶
BranchProPosteriorMultSI Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process using multiple serial intervals. Based on the BranchProPosterior.
In order to incorporate the uncertainty in the serial interval into the posterior of \(R_t\), this class employs the approximation
\[p(R_t|I) = \int p(R_t|I, w) p(w) dw \approx \frac{1}{N} \sum_{i=1}^N p(R_t|I,w^{(i)}); \quad w^{(i)} \sim p(w)\]
where \(I\) indicates the incidence data. At instantiation, the user supplies the samples \(w^{(i)}\), which are assumed to have been drawn IID from the distribution of serial intervals.
Requested posterior percentiles are computed from the above density using numerical integration.
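The mixture approximation and the numerical integration step can be sketched as follows. This is a minimal illustration, not the library's routine: it assumes each \(p(R_t|I,w^{(i)})\) is a Gamma density, uses hypothetical (shape, rate) pairs, and inverts the CDF with a simple trapezoidal rule on a fixed grid.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape-rate parametrisation."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def mixture_pdf(x, components):
    """Equal-weight average of Gamma densities, one per serial interval."""
    return sum(gamma_pdf(x, a, b) for a, b in components) / len(components)


def mixture_percentile(level, components, upper=10.0, n_grid=20000):
    """Invert the mixture CDF, built with the trapezoidal rule on a grid."""
    step = upper / n_grid
    cdf = 0.0
    previous = mixture_pdf(0.0, components)
    for i in range(1, n_grid + 1):
        x = i * step
        current = mixture_pdf(x, components)
        cdf += 0.5 * (previous + current) * step
        if cdf >= level:
            return x
        previous = current
    return upper


# Hypothetical (shape, rate) pairs, one per sampled serial interval.
components = [(30.0, 20.0), (30.0, 15.0), (30.0, 12.0)]
median = mixture_percentile(0.5, components)
lower = mixture_percentile(0.025, components)
upper_bound = mixture_percentile(0.975, components)
```

The resulting interval is wider than that of any single component, which is how the serial interval uncertainty shows up in the posterior of \(R_t\).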
- Parameters:
inc_data – (pandas DataFrame) contains numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_intervals – (list of lists) List of unnormalised probability distributions, each giving the probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
- get_intervals(central_prob)[source]¶
Returns a dataframe of the reproduction number posterior mean with percentiles over time.
The lower and upper percentiles are computed from the posterior distribution, using the specified central probability to form an equal-tailed interval.
The results are returned in a dataframe with the following columns: ‘Time Points’, ‘Mean’, ‘Lower bound CI’ and ‘Upper bound CI’
- Parameters:
central_prob – level of the computed credible interval of the estimated R number values, i.e. the central probability covered by the interval.
- run_inference(tau, progress_fn=None)[source]¶
Runs the inference of the reproduction numbers based on the entirety of the incidence data available.
The first inferred R value is given at the time point immediately after the end of the tau-window over the initial incidences.
- Parameters:
tau – size of the sliding time window over which the reproduction number is estimated.
progress_fn – A function with integer argument. If provided, it will be called every 10 iterations of the loop over serial intervals, with the current iteration number passed as the argument.
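A callback suitable for progress_fn only needs to accept one integer. The sketch below pairs such a callback with a stand-in loop so that it can be run on its own; the loop, the helper names and the 1-based iteration count are assumptions of this example and do not reproduce the library's internals.

```python
def make_progress_logger(total, log):
    """Build a callback taking one integer: the current iteration number."""
    def progress_fn(iteration):
        log.append('processed {} of {} serial intervals'.format(
            iteration, total))
    return progress_fn


def mock_loop_over_serial_intervals(num_intervals, progress_fn=None):
    """Stand-in for the loop over serial intervals; NOT the library's code."""
    for i in range(1, num_intervals + 1):
        # The posterior for the i-th serial interval would be computed here.
        if progress_fn is not None and i % 10 == 0:
            progress_fn(i)


messages = []
mock_loop_over_serial_intervals(35, make_progress_logger(35, messages))
```

Collecting messages in a list keeps the example testable; in interactive use the callback could print or update a progress bar instead.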
Local and Imported Branch Process Posterior Distribution¶
- class branchpro.LocImpBranchProPosterior(inc_data, imported_inc_data, epsilon, daily_serial_interval, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]¶
LocImpBranchProPosterior Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process with local and imported cases.
The prior distribution is chosen to be the conjugate prior of the (Poisson) likelihood of the observed incidence data, and is therefore a Gamma distribution. We express it in the shape-rate configuration, so that the PDF takes the form:
\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]
Hence, the posterior distribution is also Gamma-distributed.
We assume that at all times the R number of the imported cases is proportional to the R number of the local incidences:
\[R_{t}^{\text{(imported)}} = \epsilon R_{t}^{\text{(local)}}\]
- Parameters:
inc_data – (pandas DataFrame) contains numbers of local new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
time_key – label key given to the temporal data in the inc_data and imported_inc_data dataframes.
inc_key – label key given to the incidence data in the inc_data and imported_inc_data dataframes.
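The role of epsilon can be illustrated with a small calculation. The sketch assumes that the expected number of new local cases is the local R number times the weighted past local cases, plus epsilon times the local R number times the weighted past imported cases; this form, the helper names and the numbers are assumptions for illustration and are not taken from the library.

```python
def weighted_past_cases(cases, serial_interval, t):
    """Serial-interval-weighted sum of cases before time t."""
    total_w = sum(serial_interval)
    return sum(
        (w / total_w) * cases[t - s]
        for s, w in enumerate(serial_interval, start=1)
        if t - s >= 0
    )


def expected_local_cases(r_local, local, imported, serial_interval, epsilon, t):
    """Expected local incidence when R_imported = epsilon * R_local."""
    from_local = r_local * weighted_past_cases(local, serial_interval, t)
    from_imported = (epsilon * r_local) * weighted_past_cases(
        imported, serial_interval, t)
    return from_local + from_imported


local = [5, 6, 8, 9]
imported = [2, 0, 4, 1]
serial_interval = [1, 2, 1]  # unnormalised, delays of 1, 2 and 3 days
expected = expected_local_cases(1.5, local, imported, serial_interval, 0.5, 3)
```

With epsilon below 1, each imported case contributes less to onward transmission than a local case; epsilon equal to 1 treats both kinds of case identically.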
Notes
Always apply method run_inference before calling BranchProPosterior.get_intervals() to get R behaviour dataframe!
- run_inference(tau)[source]¶
Runs the inference of the reproduction numbers based on the entirety of the local and imported incidence data available.
The first inferred (local) R value is given at the time point immediately after the end of the tau-window over the initial incidences.
- Parameters:
tau – size of the sliding time window over which the reproduction number is estimated.
Local and Imported Branch Process Posterior Distribution with Multiple Serial Intervals¶
- class branchpro.LocImpBranchProPosteriorMultSI(inc_data, imported_inc_data, epsilon, daily_serial_intervals, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]¶
- run_inference(tau, progress_fn=None)[source]¶
Runs the inference of the reproduction numbers based on the entirety of the incidence data available.
The first inferred R value is given at the time point immediately after the end of the tau-window over the initial incidences.
- Parameters:
tau – size of the sliding time window over which the reproduction number is estimated.
progress_fn – A function with integer argument. If provided, it will be called every 10 iterations of the loop over serial intervals, with the current iteration number passed as the argument.
Gamma distribution¶
- class branchpro.GammaDist(shape, rate)[source]¶
Gamma distribution.
Smaller version of the scipy.stats class. It uses the scipy methods, but only saves the shape and rate parameters in the object. Instantiation is much faster than scipy; method calls are similar in speed. It also uses less memory than scipy.
It also has a new density function, big_pdf(), which is faster on large array inputs.
We use the shape/rate parametrization, under which the gamma pdf is:
\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]
for shape \(\alpha\) and rate \(\beta\).
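The shape/rate density above can be evaluated directly with the standard library, which is a handy check when moving between parametrisations (scipy.stats.gamma, for instance, takes a scale parameter equal to 1 / rate). The function names below are chosen for this example and are not part of GammaDist.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log of the Gamma pdf for shape alpha and rate beta, valid for x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def gamma_pdf(x, shape, rate):
    """Gamma pdf, computed through the log-density for numerical stability."""
    return math.exp(gamma_logpdf(x, shape, rate))


# With shape 1 the Gamma reduces to an exponential with the same rate.
value = gamma_pdf(0.5, 1.0, 2.0)
expected = 2.0 * math.exp(-1.0)

# Shape-rate moments: mean = shape / rate, variance = shape / rate ** 2.
shape, rate = 3.0, 2.0
mean, variance = shape / rate, shape / rate ** 2
```

Working in log space avoids overflow in \(\Gamma(\alpha)\) when the shape is large, as happens once many cases have been added to the prior shape.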
New Posterior Classes using MCMC Sampling algorithms¶
Branch Process with Poisson Noise¶
Log-likelihood Class¶
- class branchpro.PoissonBranchProLogLik(inc_data, daily_serial_interval, tau, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]¶
PoissonBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a Poisson branching process in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
tau – (numeric) size of the sliding time window over which the reproduction number is estimated.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
- evaluateS1(x)[source]¶
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L') where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.
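The (L, L') convention can be illustrated with a single-parameter Poisson log-likelihood. This is a simplified sketch, not the library's likelihood: it assumes one reproduction number r shared by all time points and Poisson means r * potentials, where `potentials` is a hypothetical list of serial-interval-weighted past incidences.

```python
import math


def poisson_loglik_s1(r, incidence, potentials):
    """Return (L, [dL/dr]) for Poisson counts with means r * potentials."""
    loglik = 0.0
    derivative = 0.0
    for cases, lam in zip(incidence, potentials):
        mean = r * lam
        loglik += cases * math.log(mean) - mean - math.lgamma(cases + 1)
        # d/dr of cases * log(r * lam) - r * lam
        derivative += cases / r - lam
    return loglik, [derivative]


incidence = [3, 5, 4]
potentials = [2.0, 2.5, 3.0]
value, gradient = poisson_loglik_s1(1.4, incidence, potentials)

# Central finite difference as an independent check of the derivative.
h = 1e-6
numeric = (poisson_loglik_s1(1.4 + h, incidence, potentials)[0]
           - poisson_loglik_s1(1.4 - h, incidence, potentials)[0]) / (2 * h)
```

The gradient is returned as a sequence even for one parameter, matching the description of L' as a sequence of length n_parameters.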
- get_serial_intervals()[source]¶
Returns serial intervals for the model.
- Returns:
Serial intervals for the model.
- Return type:
list
- n_parameters()[source]¶
Returns number of parameters for log-likelihood object.
- Returns:
Number of parameters for log-likelihood object.
- Return type:
int
Log-posterior Class¶
- class branchpro.PoissonBranchProLogPosterior(inc_data, daily_serial_interval, tau, alpha, beta, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]¶
PoissonBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the Poisson Branching process model in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
- return_loglikelihood(x)[source]¶
Return the log-likelihood used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-likelihood.
- Returns:
Value of the log-likelihood at the given point in the free parameter space.
- Return type:
float
- return_logposterior(x)[source]¶
Return the log-posterior used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-posterior.
- Returns:
Value of the log-posterior at the given point in the free parameter space.
- Return type:
float
- return_logprior(x)[source]¶
Return the log-prior used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-prior.
- Returns:
Value of the log-prior at the given point in the free parameter space.
- Return type:
float
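The three quantities returned by these methods are related in the usual Bayesian way: up to an additive constant, the log-posterior is the log-likelihood plus the log-prior. The sketch below illustrates that relation with a single reproduction number, a Gamma prior and an assumed Poisson likelihood with means r * potentials; the function names and data are hypothetical and do not call the library.

```python
import math


def log_prior(r, alpha, beta):
    """Gamma(alpha, beta) log-density in the shape-rate form."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(r) - beta * r)


def log_likelihood(r, incidence, potentials):
    """Poisson log-likelihood with means r * potentials (assumed form)."""
    return sum(
        i * math.log(r * lam) - r * lam - math.lgamma(i + 1)
        for i, lam in zip(incidence, potentials)
    )


def log_posterior(r, incidence, potentials, alpha, beta):
    """Unnormalised log-posterior: log-likelihood plus log-prior."""
    return log_likelihood(r, incidence, potentials) + log_prior(r, alpha, beta)


incidence = [3, 5, 4]
potentials = [2.0, 2.5, 3.0]
alpha, beta = 1.0, 0.2

# Grid search for the maximiser, compared with the conjugate Gamma mode.
grid = [0.01 * k for k in range(1, 501)]
best = max(grid, key=lambda r: log_posterior(
    r, incidence, potentials, alpha, beta))
analytic_mode = (alpha + sum(incidence) - 1.0) / (beta + sum(potentials))
```

In this conjugate toy case the grid maximiser agrees with the closed-form mode of the Gamma posterior to within the grid spacing.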
- run_inference(num_iter)[source]¶
Runs the parameter inference routine for the Poisson branching process model.
- Parameters:
num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.
- Returns:
3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.
- Return type:
numpy.array
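A returned 3D array of samples is typically summarised by discarding a warm-up period and pooling the chains. The sketch below assumes the axes are ordered (chains, iterations, parameters), which is the PINTS convention but is not stated on this page; it uses nested lists with made-up numbers so that it runs without numpy, and the same indexing would apply to an array.

```python
def posterior_means(chains, warm_up):
    """Pool chains after discarding warm-up and average each parameter."""
    pooled = [draw for chain in chains for draw in chain[warm_up:]]
    n_parameters = len(pooled[0])
    return [
        sum(draw[p] for draw in pooled) / len(pooled)
        for p in range(n_parameters)
    ]


# Toy array: 2 chains, 4 iterations, 2 parameters (made-up numbers).
chains = [
    [[9.0, 9.0], [1.0, 2.0], [1.2, 2.2], [0.8, 1.8]],
    [[0.1, 0.1], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0]],
]
means = posterior_means(chains, warm_up=1)
```

The first iteration of each toy chain is deliberately far from the rest, showing why early samples are usually dropped before computing summaries.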
- run_optimisation()[source]¶
Runs the initial conditions optimisation routine for the Poisson branching process model.
- Returns:
numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.
float – Value of the log-posterior at the optimised point in the free parameter space.
Multiple Categories Branch Process with Poisson Noise¶
Log-likelihood Class¶
- class branchpro.MultiCatPoissonBranchProLogLik(inc_data, daily_serial_interval, num_cat, contact_matrix, transm, tau, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number', multipleSI=False)[source]¶
MultiCatPoissonBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a Poisson branching process with multiple population categories in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the categorical numbers of new cases by time unit (usually days). Data are stored in columns, one for time and one for the incidence numbers of each category.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s and for each category.
num_cat – (int) Number of categories in which the population is split.
contact_matrix – (array) Matrix of contacts between the different categories in which the population is split.
transm – (list) List of overall reductions in transmissibility per category.
tau – (numeric) size of the sliding time window over which the reproduction number is estimated.
imported_inc_data – (pandas DataFrame) contains numbers of categorical imported new cases by time unit (usually days). Data are stored in columns, one for time and one for the incidence numbers of each category.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
multipleSI – (boolean) Different serial intervals used for categories.
- evaluateS1(x)[source]¶
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L') where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.
- n_parameters()[source]¶
Returns number of parameters for log-likelihood object.
- Returns:
Number of parameters for log-likelihood object.
- Return type:
int
- set_serial_intervals(serial_intervals, multipleSI=False)[source]¶
Updates serial intervals for the model.
- Parameters:
serial_intervals – New unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s and for each category.
multipleSI – (boolean) Different serial intervals used for categories.
Log-posterior Class¶
- class branchpro.MultiCatPoissonBranchProLogPosterior(inc_data, daily_serial_interval, num_cat, contact_matrix, transm, tau, alpha, beta, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]¶
MultiCatPoissonBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the Poisson Branching process model in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
num_cat – (int) Number of categories in which the population is split.
contact_matrix – (array) Matrix of contacts between the different categories in which the population is split.
transm – (list) List of overall reductions in transmissibility per category.
tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
- return_loglikelihood(x)[source]¶
Return the log-likelihood used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-likelihood.
- Returns:
Value of the log-likelihood at the given point in the free parameter space.
- Return type:
float
- return_logposterior(x)[source]¶
Return the log-posterior used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-posterior.
- Returns:
Value of the log-posterior at the given point in the free parameter space.
- Return type:
float
- return_logprior(x)[source]¶
Return the log-prior used for the optimisation or inference.
- Parameters:
x (list) – List of free parameters used for computing the log-prior.
- Returns:
Value of the log-prior at the given point in the free parameter space.
- Return type:
float
- run_inference(num_iter)[source]¶
Runs the parameter inference routine for the Poisson branching process model.
- Parameters:
num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.
- Returns:
3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.
- Return type:
numpy.array
- run_optimisation()[source]¶
Runs the initial conditions optimisation routine for the Poisson branching process model.
- Returns:
numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.
float – Value of the log-posterior at the optimised point in the free parameter space.
Branch Process with Negative Binomial Noise¶
Log-likelihood Class¶
- class branchpro.NegBinBranchProLogLik(inc_data, daily_serial_interval, tau, phi, infer_phi=True, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]¶
NegBinBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a negative binomial branching process in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
tau – (numeric) size of the sliding time window over which the reproduction number is estimated.
phi – (numeric) Value of the overdispersion parameter for the negative binomial noise distribution.
infer_phi – (boolean) Indicator value of whether the overdispersion parameter for the negative binomial noise distribution is inferred or not.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
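The effect of an overdispersion parameter can be seen in a small negative binomial calculation. The parametrisation below, with variance equal to mean + phi * mean ** 2, is an assumption chosen for illustration; the library may define phi differently (for example as the inverse of this quantity), so the sketch should not be read as its likelihood.

```python
import math


def negbin_logpmf(k, mean, phi):
    """Negative binomial log-pmf with variance mean + phi * mean ** 2."""
    size = 1.0 / phi
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


mean, phi = 4.0, 0.5
# Truncating at 400 counts leaves a negligible tail for these values.
probabilities = [math.exp(negbin_logpmf(k, mean, phi)) for k in range(400)]
total = sum(probabilities)
empirical_mean = sum(k * p for k, p in enumerate(probabilities))
empirical_var = sum((k - mean) ** 2 * p for k, p in enumerate(probabilities))
```

For a mean of 4, a Poisson distribution would have variance 4; with phi = 0.5 in this parametrisation the variance rises to 12, which is the extra noise the negative binomial model is meant to capture.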
Log-posterior Class¶
- class branchpro.NegBinBranchProLogPosterior(inc_data, daily_serial_interval, tau, phi, alpha, beta, infer_phi=False, phi_shape=None, phi_rate=None, phi_prior=None, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]¶
NegBinBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the negative binomial branching process model in a PINTS framework.
- Parameters:
inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
daily_serial_interval – (list) Unnormalised probability that the recipient first displays symptoms s days after the infector first displays symptoms, for each delay s.
tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.
phi – (numeric) Value of the overdispersion parameter for the negative binomial noise distribution.
alpha – the shape parameter of the Gamma distribution of the prior.
beta – the rate parameter of the Gamma distribution of the prior.
infer_phi – (boolean) Indicator value of whether the overdispersion parameter for the negative binomial noise distribution is inferred or not.
phi_shape – the shape parameter of the Gamma distribution of the prior of the overdispersion.
phi_rate – the rate parameter of the Gamma distribution of the prior of the overdispersion.
phi_prior – (pints.LogPrior) Prior distribution of the phi parameter. Can be non-Gamma.
imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns, one for time and one for incidence number.
epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.
time_key – label key given to the temporal data in the inc_data dataframe.
inc_key – label key given to the incidence data in the inc_data dataframe.
- run_inference(num_iter)[source]¶
Runs the parameter inference routine for the negative binomial branching process model.
- Parameters:
num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.
- Returns:
3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.
- Return type:
numpy.array
- run_optimisation()[source]¶
Runs the initial conditions optimisation routine for the negative binomial branching process model.
- Returns:
numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.
float – Value of the log-posterior at the optimised point in the free parameter space.