Posterior Distribution

Overview:

Branch Process Posterior Distribution

class branchpro.BranchProPosterior(inc_data, daily_serial_interval, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]

BranchProPosterior Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process.

The prior is chosen to be conjugate to the (Poisson) likelihood of observing the given incidence data, and is therefore a Gamma distribution. We express it in the shape-rate configuration, so that the PDF takes the form:

\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]

Hence, the posterior distribution will also be Gamma-distributed.
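
To make the conjugacy concrete, the following minimal sketch assumes the standard renewal-model likelihood, in which the incidence at time s is Poisson-distributed with mean R times the total infectiousness at time s. Under that assumption, a Gamma(alpha, beta) prior gives a Gamma posterior whose shape is alpha plus the summed incidences and whose rate is beta plus the summed infectiousness. The function name and the example numbers are illustrative and are not part of branchpro.

```python
def gamma_poisson_update(alpha, beta, incidences, infectiousness):
    """Return the posterior (shape, rate) of R for a Gamma(alpha, beta) prior.

    Assumes incidences[s] ~ Poisson(R * infectiousness[s]) independently.
    """
    post_shape = alpha + sum(incidences)
    post_rate = beta + sum(infectiousness)
    return post_shape, post_rate


# Illustrative numbers only (hypothetical window of three days).
shape, rate = gamma_poisson_update(
    alpha=1.0, beta=0.2,
    incidences=[4, 6, 5],
    infectiousness=[3.0, 4.5, 5.0])
posterior_mean = shape / rate  # mean of a Gamma in shape-rate form
print(shape, rate, posterior_mean)
```

The posterior mean, shape / rate, is the kind of quantity reported in the 'Mean' column described for get_intervals.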

Parameters:
  • inc_data – (pandas DataFrame) contains numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

Notes

Always call run_inference before calling BranchProPosterior.get_intervals(), which returns the dataframe describing the behaviour of R.

get_intervals(central_prob)[source]

Returns a dataframe of the reproduction number posterior mean with percentiles over time.

The lower and upper percentiles are computed from the posterior distribution, using the specified central probability to form an equal-tailed interval.

The results are returned in a dataframe with the following columns: ‘Time Points’, ‘Mean’, ‘Lower bound CI’ and ‘Upper bound CI’

Parameters:

central_prob – level of the computed credible interval of the estimated R number values; the interval contains this central probability mass.
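
As a small illustration of what "equal-tailed" means here, the sketch below converts a central probability into the two cumulative probabilities at which the lower and upper bounds are read off. The helper name is hypothetical, and the range check is a choice made for this sketch rather than documented library behaviour.

```python
def equal_tailed_levels(central_prob):
    """Return the (lower, upper) cumulative probabilities of the interval."""
    if not 0 < central_prob < 1:
        raise ValueError('central_prob must lie strictly between 0 and 1')
    tail = (1 - central_prob) / 2  # probability left out in each tail
    return tail, 1 - tail


lower, upper = equal_tailed_levels(0.95)
print(lower, upper)
```

For a central probability of 0.95, the bounds are the 2.5th and 97.5th percentiles of the posterior.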

get_serial_intervals()[source]

Returns serial intervals for the model.

last_time_r_threshold(type_threshold, central_prob=0.95, method='Mean')[source]

Returns, for each of the reproduction number posterior mean (or median), lower bound and upper bound at the specified central probability, the first time point after the trajectory crosses the imposed threshold for the last time during the inference.

Parameters:
  • type_threshold – (str) type of threshold imposed; ‘more’ = last time R > 1 and ‘less’ = last time R < 1.

  • central_prob – level of the computed credible interval of the estimated R number values; the interval contains this central probability mass.

  • method – choice of the average trajectory of the reproduction number considered; can be either Mean or Median.

proportion_time_r_more_than_1(central_prob=0.95, method='Mean')[source]

Returns the proportion of time for which the reproduction number posterior mean (or median), lower bound and upper bound at the specified central probability are, respectively, greater than 1.

Parameters:
  • central_prob – level of the computed credible interval of the estimated R number values; the interval contains this central probability mass.

  • method – choice of the average trajectory of the reproduction number considered; can be either Mean or Median.
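
The calculation can be sketched as follows, assuming equally spaced time points so that the proportion of time equals the fraction of points above 1. The function name and the three example trajectories are hypothetical; in practice they would come from the columns returned by get_intervals.

```python
def proportion_above_one(r_values):
    """Fraction of time points at which an R trajectory exceeds 1."""
    if not r_values:
        raise ValueError('r_values must not be empty')
    return sum(1 for r in r_values if r > 1) / len(r_values)


# Hypothetical average, lower-bound and upper-bound trajectories.
mean_r = [1.8, 1.4, 1.1, 0.9, 0.7]
lower_r = [1.2, 1.0, 0.8, 0.6, 0.5]
upper_r = [2.5, 1.9, 1.5, 1.2, 1.0]
proportions = [proportion_above_one(t) for t in (mean_r, lower_r, upper_r)]
print(proportions)
```

Values exactly equal to 1 are not counted in this sketch, since the comparison is strict.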

run_inference(tau)[source]

Runs the inference of the reproduction numbers based on the entirety of the incidence data available.

The first inferred R value is given at the time point immediately after the tau-window of the initial incidences ends.

Parameters:

tau – size of the sliding time window over which the reproduction number is estimated.

set_serial_intervals(serial_intervals)[source]

Updates serial intervals for the model.

Parameters:

serial_intervals – New unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.
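
To illustrate how an unnormalised serial interval is typically used, the sketch below normalises the weights and forms the serial-interval-weighted sum of past incidences (often called the total infectiousness). This is a minimal sketch, not the library's code: the function names are hypothetical, and it assumes the first list entry corresponds to a lag of one day.

```python
def normalise(serial_interval):
    """Scale the weights so that they sum to 1."""
    total = sum(serial_interval)
    if total <= 0:
        raise ValueError('serial interval must have positive total weight')
    return [w / total for w in serial_interval]


def infectiousness(incidences, serial_interval, t):
    """Sum of w_s * I_(t-s) over lags s = 1, 2, ... (w_1 is the first entry)."""
    w = normalise(serial_interval)
    return sum(
        w[s - 1] * incidences[t - s]
        for s in range(1, min(t, len(w)) + 1))


incidences = [1, 2, 4, 8]
lam = infectiousness(incidences, [1, 2, 1], t=3)
print(lam)
```

Because the weights are normalised inside the helper, passing [1, 2, 1] or [0.25, 0.5, 0.25] gives the same result.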

Branch Process Posterior Distribution with Multiple Serial Intervals

class branchpro.BranchProPosteriorMultSI(inc_data, daily_serial_intervals, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]

BranchProPosteriorMultSI Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process using multiple serial intervals. Based on BranchProPosterior.

In order to incorporate the uncertainty in the serial interval into the posterior of \(R_t\), this class employs the approximation

\[p(R_t|I) = \int p(R_t|I, w) p(w) dw \approx \frac{1}{N} \sum_{i=1}^N p(R_t|I,w^{(i)}); w^{(i)} \sim p(w)\]

where \(I\) indicates the incidence data. At instantiation, the user supplies the samples \(w^{(i)}\) which are assumed to have been drawn IID from the distribution of serial intervals.

Requested posterior percentiles are computed from the above density using numerical integration.
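
The approximation above can be sketched as an equal-weight mixture of Gamma densities, one per serial interval sample, with percentiles read from a numerically integrated CDF. The code below is an illustration under stated assumptions rather than the library's implementation: it uses a plain trapezium rule on a fixed grid, the function names are hypothetical, and the (shape, rate) pairs are made-up posterior parameters.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape-rate parametrisation, for x > 0."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def mixture_percentile(params, prob, upper=10.0, n=20000):
    """Percentile of an equal-weight Gamma mixture via the trapezium rule.

    params is a list of (shape, rate) pairs, one per serial interval sample.
    """
    h = upper / n
    cdf = 0.0
    prev = 0.0  # density at 0 taken as 0 (valid when every shape > 1)
    for k in range(1, n + 1):
        x = h * k
        dens = sum(gamma_pdf(x, a, b) for a, b in params) / len(params)
        cdf += 0.5 * h * (prev + dens)
        if cdf >= prob:
            return x
        prev = dens
    raise ValueError('upper is too small to reach the requested probability')


# Hypothetical per-sample posteriors for R_t.
params = [(16.0, 12.7), (16.0, 11.0), (16.0, 14.2)]
lower = mixture_percentile(params, 0.025)
median = mixture_percentile(params, 0.5)
upper_bound = mixture_percentile(params, 0.975)
print(lower, median, upper_bound)
```

The mixture is generally not itself a Gamma distribution, which is why its percentiles have no closed form and need numerical integration.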

Parameters:
  • inc_data – (pandas DataFrame) contains numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_intervals – (list of lists) List of unnormalised probability distributions of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

get_intervals(central_prob)[source]

Returns a dataframe of the reproduction number posterior mean with percentiles over time.

The lower and upper percentiles are computed from the posterior distribution, using the specified central probability to form an equal-tailed interval.

The results are returned in a dataframe with the following columns: ‘Time Points’, ‘Mean’, ‘Lower bound CI’ and ‘Upper bound CI’

Parameters:

central_prob – level of the computed credible interval of the estimated R number values; the interval contains this central probability mass.

get_serial_intervals()[source]

Returns serial intervals for the model.

run_inference(tau, progress_fn=None)[source]

Runs the inference of the reproduction numbers based on the entirety of the incidence data available.

The first inferred R value is given at the time point immediately after the tau-window of the initial incidences ends.

Parameters:
  • tau – size of the sliding time window over which the reproduction number is estimated.

  • progress_fn – A function with integer argument. If provided, it will be called every 10 iterations of the loop over serial intervals, with the current iteration number passed as the argument.
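
A progress callback of this kind might be used as sketched below. The loop is a stand-in that performs no real inference, its name is hypothetical, and whether the library starts counting iterations at 0 or 1 is an assumption of this sketch.

```python
def run_over_serial_intervals(serial_intervals, progress_fn=None):
    """Stand-in for the loop over serial intervals; does no real inference."""
    results = []
    for i, w in enumerate(serial_intervals):
        results.append(sum(w))  # placeholder for the per-sample inference
        if progress_fn is not None and i % 10 == 0:
            progress_fn(i)  # report every 10th iteration
    return results


reported = []
run_over_serial_intervals([[1, 2, 1]] * 25, progress_fn=reported.append)
print(reported)
```

Any callable taking one integer works as progress_fn, for example a function that updates a progress bar.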

set_serial_intervals(serial_intervals)[source]

Updates serial intervals for the model.

Parameters:

serial_intervals – New unnormalised probability distributions of the recipient first displaying symptoms s days after the infector first displays symptoms.

Local and Imported Branch Process Posterior Distribution

class branchpro.LocImpBranchProPosterior(inc_data, imported_inc_data, epsilon, daily_serial_interval, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]

LocImpBranchProPosterior Class: Class for computing the posterior distribution used for the inference of the reproduction numbers of an epidemic in the case of a branching process with local and imported cases.

The prior is chosen to be conjugate to the (Poisson) likelihood of observing the given incidence data, and is therefore a Gamma distribution. We express it in the shape-rate configuration, so that the PDF takes the form:

\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]

Hence, the posterior distribution will also be Gamma-distributed.

We assume that at all times the R number of the imported cases is proportional to the R number of the local incidences:

\[R_{t}^{\text{imported}} = \epsilon R_{t}^{\text{local}}\]
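
As a worked consequence, assume (as in the standard renewal model) that the expected number of new local cases at time \(s\) is \(R_{t}^{\text{local}} (\Lambda_s^{\text{local}} + \epsilon \Lambda_s^{\text{imported}})\), where each \(\Lambda_s\) is the serial-interval-weighted sum of past incidences of that type. The symbols \(\Lambda_s\) and \(I_s\) are notation introduced here and are not taken from the library. Under that assumption, the conjugate update over the tau-window ending at \(t\) would read:

\[R_{t}^{\text{local}} \mid I \sim \text{Gamma}\left(\alpha + \sum_{s} I_s^{\text{local}},\ \beta + \sum_{s} \left(\Lambda_s^{\text{local}} + \epsilon \Lambda_s^{\text{imported}}\right)\right)\]

Setting \(\epsilon = 0\) removes onward transmission from imported cases, while \(\epsilon = 1\) treats imported and local cases identically.
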
Parameters:
  • inc_data – (pandas DataFrame) contains numbers of local new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • time_key – label key given to the temporal data in the inc_data and imported_inc_data dataframes.

  • inc_key – label key given to the incidence data in the inc_data and imported_inc_data dataframes.

Notes

Always call run_inference before calling BranchProPosterior.get_intervals(), which returns the dataframe describing the behaviour of R.

run_inference(tau)[source]

Runs the inference of the reproduction numbers based on the entirety of the local and imported incidence data available.

The first inferred (local) R value is given at the time point immediately after the tau-window of the initial incidences ends.

Parameters:

tau – size of the sliding time window over which the reproduction number is estimated.

set_epsilon(new_epsilon)[source]

Updates proportionality constant of the R number for imported cases with respect to its analog for local ones.

Parameters:

new_epsilon – new value of constant of proportionality.

Local and Imported Branch Process Posterior Distribution with Multiple Serial Intervals

class branchpro.LocImpBranchProPosteriorMultSI(inc_data, imported_inc_data, epsilon, daily_serial_intervals, alpha, beta, time_key='Time', inc_key='Incidence Number')[source]

run_inference(tau, progress_fn=None)[source]

Runs the inference of the reproduction numbers based on the entirety of the incidence data available.

The first inferred R value is given at the time point immediately after the tau-window of the initial incidences ends.

Parameters:
  • tau – size of the sliding time window over which the reproduction number is estimated.

  • progress_fn – A function with integer argument. If provided, it will be called every 10 iterations of the loop over serial intervals, with the current iteration number passed as the argument.

Gamma distribution

class branchpro.GammaDist(shape, rate)[source]

Gamma distribution.

Smaller version of the scipy.stats class. It uses the scipy methods, but only saves the shape and rate parameters in the object. Instantiation is much faster than scipy; method calls are similar in speed. It also uses less memory than scipy.

It also has a new density function, big_pdf(), which is faster on large array inputs.

We use the shape/rate parametrization, under which the gamma pdf is:

\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]

for shape \(\alpha\) and rate \(\beta\).
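
For readers used to the scale parametrisation (as in scipy.stats.gamma, where scale = 1 / rate), the sketch below evaluates the density formula above directly with the standard library. It is an illustration of the formula only, and the function name is hypothetical rather than a branchpro method.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density f(x) in the shape-rate parametrisation, for x > 0."""
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(x) - rate * x)


shape, rate = 3.0, 2.0
scale = 1 / rate     # the equivalent scale parameter
mean = shape / rate  # mean of the distribution
density = gamma_pdf(1.0, shape, rate)
print(scale, mean, density)
```

Working with the logarithm of the density and exponentiating at the end avoids overflow in the Gamma function for large shapes.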

big_pdf(x)[source]

Probability density function optimized for large inputs.

For small arrays x, it will be slower than the regular pdf. However it can be much faster if x is a large array.

New Posterior Classes using MCMC Sampling algorithms

Branch Process with Poisson Noise

Log-likelihood Class

class branchpro.PoissonBranchProLogLik(inc_data, daily_serial_interval, tau, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]

PoissonBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a Poisson branching process in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • tau – (numeric) size of the sliding time window over which the reproduction number is estimated.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

evaluateS1(x)[source]

Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.

The returned data is a tuple (L, L') where L is a scalar value and L' is a sequence of length n_parameters.

Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.

This is an optional method that is not always implemented.
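
The (L, L') return convention can be illustrated with a deliberately simplified sketch: a Poisson log-likelihood with a single reproduction number, assuming the incidence at each time has mean r times the infectiousness at that time. The real class has more parameters; the function name and data here are hypothetical.

```python
import math


def poisson_loglik_s1(r, incidences, infectiousness):
    """Return (L, dL) for a single reproduction number r > 0.

    Assumes incidences[t] ~ Poisson(r * infectiousness[t]).
    """
    log_l = sum(
        i * math.log(r * lam) - r * lam - math.lgamma(i + 1)
        for i, lam in zip(incidences, infectiousness))
    # Derivative of the log-likelihood with respect to r.
    d_log_l = sum(incidences) / r - sum(infectiousness)
    return log_l, [d_log_l]  # the derivative has one entry per parameter


L, dL = poisson_loglik_s1(1.2, [4, 6, 5], [3.0, 4.5, 5.0])
print(L, dL)
```

In this example the derivative is (numerically) zero because r = 1.2 equals the summed incidences divided by the summed infectiousness, which is the maximum-likelihood value under the stated assumption.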

get_serial_intervals()[source]

Returns serial intervals for the model.

Returns:

Serial intervals for the model.

Return type:

list

n_parameters()[source]

Returns number of parameters for log-likelihood object.

Returns:

Number of parameters for log-likelihood object.

Return type:

int

set_epsilon(new_epsilon)[source]

Updates proportionality constant of the R number for imported cases with respect to its analog for local ones.

Parameters:

new_epsilon – new value of constant of proportionality.

set_serial_intervals(serial_intervals)[source]

Updates serial intervals for the model.

Parameters:

serial_intervals – New unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

Log-posterior Class

class branchpro.PoissonBranchProLogPosterior(inc_data, daily_serial_interval, tau, alpha, beta, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]

PoissonBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the Poisson Branching process model in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

return_loglikelihood(x)[source]

Return the log-likelihood used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-likelihood.

Returns:

Value of the log-likelihood at the given point in the free parameter space.

Return type:

float

return_logposterior(x)[source]

Return the log-posterior used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-posterior.

Returns:

Value of the log-posterior at the given point in the free parameter space.

Return type:

float

return_logprior(x)[source]

Return the log-prior used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-prior.

Returns:

Value of the log-prior at the given point in the free parameter space.

Return type:

float

run_inference(num_iter)[source]

Runs the parameter inference routine for the Poisson branching process model.

Parameters:

num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.

Returns:

3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.

Return type:

numpy.array
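
The sketch below shows one way to summarise such a 3D result, assuming the axes are ordered (chain, iteration, parameter); that ordering is an assumption here and should be checked against the returned array's shape. Nested lists stand in for the numpy array, and the helper name and values are hypothetical.

```python
# Toy stand-in for the sampler output: 2 chains, 4 iterations, 1 parameter.
chains = [
    [[0.5], [1.0], [1.2], [1.4]],
    [[2.0], [1.5], [1.3], [1.1]],
]


def posterior_mean(chains, param_index, warm_up):
    """Average one parameter over all chains after discarding warm-up."""
    kept = [it[param_index] for chain in chains for it in chain[warm_up:]]
    return sum(kept) / len(kept)


estimate = posterior_mean(chains, param_index=0, warm_up=2)
print(estimate)
```

Discarding the first iterations of every chain is a common precaution, since early samples still depend on the starting point.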

run_optimisation()[source]

Runs the initial conditions optimisation routine for the Poisson branching process model.

Returns:

  • numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.

  • float – Value of the log-posterior at the optimised point in the free parameter space.
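
To show what a (parameters, log-posterior value) pair looks like, the sketch below maximises a simplified single-parameter log-posterior with a crude grid search and compares the result with the closed-form mode of the conjugate Gamma posterior. It assumes a Poisson likelihood with mean r times the infectiousness and a Gamma(alpha, beta) prior; the grid search is a stand-in chosen for brevity and is not the optimiser used by the library, and all names and data are hypothetical.

```python
import math


def log_posterior(r, alpha, beta, incidences, infectiousness):
    """Unnormalised log-posterior for one reproduction number r > 0."""
    log_prior = (alpha - 1) * math.log(r) - beta * r
    log_lik = sum(i * math.log(r * lam) - r * lam
                  for i, lam in zip(incidences, infectiousness))
    return log_lik + log_prior  # constants independent of r are dropped


def grid_maximise(f, lower, upper, n=3000):
    """Crude optimiser: evaluate f on an even grid and keep the best point."""
    step = (upper - lower) / n
    grid = [lower + step * (k + 1) for k in range(n)]
    best = max(grid, key=f)
    return best, f(best)


incidences, infectiousness = [4, 6, 5], [3.0, 4.5, 5.0]
alpha, beta = 1.0, 0.2
best_r, best_value = grid_maximise(
    lambda r: log_posterior(r, alpha, beta, incidences, infectiousness),
    lower=0.0, upper=3.0)
# Closed-form mode of the conjugate Gamma posterior, for comparison.
mode = (alpha + sum(incidences) - 1) / (beta + sum(infectiousness))
print(best_r, mode, best_value)
```

Agreement between the numerical optimum and the closed-form mode is a useful sanity check when the simple conjugate setting applies.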

Multiple Categories Branch Process with Poisson Noise

Log-likelihood Class

class branchpro.MultiCatPoissonBranchProLogLik(inc_data, daily_serial_interval, num_cat, contact_matrix, transm, tau, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number', multipleSI=False)[source]

MultiCatPoissonBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a Poisson branching process with multiple population categories in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the categorical numbers of new cases by time unit (usually days). Data are stored in columns: one for time and one for the incidence number vector per category.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms, for each category.

  • num_cat – (int) Number of categories in which the population is split.

  • contact_matrix – (array) Matrix of contacts between the different categories in which the population is split.

  • transm – (list) List of overall reductions in transmissibility per category.

  • tau – (numeric) size of the sliding time window over which the reproduction number is estimated.

  • imported_inc_data – (pandas DataFrame) contains numbers of categorical imported new cases by time unit (usually days). Data are stored in columns: one for time and one for the incidence number vector per category.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

  • multipleSI – (boolean) Different serial intervals used for categories.

evaluateS1(x)[source]

Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.

The returned data is a tuple (L, L') where L is a scalar value and L' is a sequence of length n_parameters.

Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.

This is an optional method that is not always implemented.

get_serial_intervals()[source]

Returns serial intervals for the model.

n_parameters()[source]

Returns number of parameters for log-likelihood object.

Returns:

Number of parameters for log-likelihood object.

Return type:

int

set_serial_intervals(serial_intervals, multipleSI=False)[source]

Updates serial intervals for the model.

Parameters:
  • serial_intervals – New unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms, for each category.

  • multipleSI – (boolean) Different serial intervals used for categories.

Log-posterior Class

class branchpro.MultiCatPoissonBranchProLogPosterior(inc_data, daily_serial_interval, num_cat, contact_matrix, transm, tau, alpha, beta, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]

MultiCatPoissonBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the Poisson branching process model with multiple population categories in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • num_cat – (int) Number of categories in which the population is split.

  • contact_matrix – (array) Matrix of contacts between the different categories in which the population is split.

  • transm – (list) List of overall reductions in transmissibility per category.

  • tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

return_loglikelihood(x)[source]

Return the log-likelihood used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-likelihood.

Returns:

Value of the log-likelihood at the given point in the free parameter space.

Return type:

float

return_logposterior(x)[source]

Return the log-posterior used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-posterior.

Returns:

Value of the log-posterior at the given point in the free parameter space.

Return type:

float

return_logprior(x)[source]

Return the log-prior used for the optimisation or inference.

Parameters:

x (list) – List of free parameters used for computing the log-prior.

Returns:

Value of the log-prior at the given point in the free parameter space.

Return type:

float

run_inference(num_iter)[source]

Runs the parameter inference routine for the Poisson branching process model.

Parameters:

num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.

Returns:

3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.

Return type:

numpy.array

run_optimisation()[source]

Runs the initial conditions optimisation routine for the Poisson branching process model.

Returns:

  • numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.

  • float – Value of the log-posterior at the optimised point in the free parameter space.

Branch Process with Negative Binomial Noise

Log-likelihood Class

class branchpro.NegBinBranchProLogLik(inc_data, daily_serial_interval, tau, phi, infer_phi=True, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]

NegBinBranchProLogLik Class: Controller class to construct the log-likelihood needed for optimisation or inference of a negative binomial branching process in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • tau – (numeric) size of the sliding time window over which the reproduction number is estimated.

  • phi – (numeric) Value of the overdispersion parameter for the negative binomial noise distribution.

  • infer_phi – (boolean) Indicator value of whether the overdispersion parameter for the negative binomial noise distribution is inferred or not.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.
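
To give a feel for what an overdispersion parameter does, the sketch below evaluates a negative binomial log-pmf under one common parametrisation, in which the variance is mean + phi * mean**2 and the Poisson case is recovered as phi tends to 0. Whether the library uses exactly this parametrisation is an assumption of the sketch, and the function names are hypothetical.

```python
import math


def negbin_log_pmf(k, mean, phi):
    """Log-pmf of a negative binomial with the given mean and
    variance mean + phi * mean**2 (assumed parametrisation)."""
    n = 1 / phi               # size parameter
    p = 1 / (1 + phi * mean)  # success probability
    return (math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
            + n * math.log(p) + k * math.log(1 - p))


def poisson_log_pmf(k, mean):
    """Log-pmf of a Poisson distribution, for comparison."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


# The pmf sums to 1 and has the requested mean.
probs = [math.exp(negbin_log_pmf(k, 5.0, 0.5)) for k in range(2000)]
total = sum(probs)
mean_check = sum(k * q for k, q in enumerate(probs))
# With very small phi, the log-pmf is close to the Poisson one.
gap = abs(negbin_log_pmf(4, 5.0, 1e-4) - poisson_log_pmf(4, 5.0))
print(total, mean_check, gap)
```

Larger values of phi put more probability on counts far from the mean, which is how the model accommodates noisier incidence data than a Poisson distribution allows.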

get_overdispersion()[source]

Returns overdispersion noise parameter for the model.

n_parameters()[source]

Returns number of parameters for log-likelihood object.

Returns:

Number of parameters for log-likelihood object.

Return type:

int

set_overdispersion(phi)[source]

Updates overdispersion noise parameter for the model.

Parameters:

phi – New value of the overdispersion parameter for the negative binomial noise distribution.

Log-posterior Class

class branchpro.NegBinBranchProLogPosterior(inc_data, daily_serial_interval, tau, phi, alpha, beta, infer_phi=False, phi_shape=None, phi_rate=None, phi_prior=None, imported_inc_data=None, epsilon=None, time_key='Time', inc_key='Incidence Number')[source]

NegBinBranchProLogPosterior Class: Controller class for the optimisation or inference of parameters of the negative binomial branching process model in a PINTS framework.

Parameters:
  • inc_data – (pandas DataFrame) Dataframe of the numbers of new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • daily_serial_interval – (list) Unnormalised probability distribution of the recipient first displaying symptoms s days after the infector first displays symptoms.

  • tau – (numeric) Size of the sliding time window over which the reproduction number is estimated.

  • phi – (numeric) Value of the overdispersion parameter for the negative binomial noise distribution.

  • alpha – the shape parameter of the Gamma distribution of the prior.

  • beta – the rate parameter of the Gamma distribution of the prior.

  • infer_phi – (boolean) Indicator value of whether the overdispersion parameter for the negative binomial noise distribution is inferred or not.

  • phi_shape – the shape parameter of the Gamma distribution of the prior of the overdispersion.

  • phi_rate – the rate parameter of the Gamma distribution of the prior of the overdispersion.

  • phi_prior – (pints.LogPrior) Prior distribution of the phi parameter. Can be non-Gamma.

  • imported_inc_data – (pandas DataFrame) contains numbers of imported new cases by time unit (usually days). Data are stored in two columns: one for time and one for incidence number.

  • epsilon – (numeric) Proportionality constant of the R number for imported cases with respect to its analog for local ones.

  • time_key – label key given to the temporal data in the inc_data dataframe.

  • inc_key – label key given to the incidence data in the inc_data dataframe.

run_inference(num_iter)[source]

Runs the parameter inference routine for the negative binomial branching process model.

Parameters:

num_iter (integer) – Number of iterations the MCMC sampler algorithm is run for.

Returns:

3D-matrix of the proposed parameters for each iteration for each of the chains of the MCMC sampler.

Return type:

numpy.array

run_optimisation()[source]

Runs the initial conditions optimisation routine for the negative binomial branching process model.

Returns:

  • numpy.array – Matrix of the optimised parameters at the end of the optimisation procedure.

  • float – Value of the log-posterior at the optimised point in the free parameter space.