common Package

This package provides common functionality that is useful for different types of estimators. It includes test problems, a kernel density estimation handle, and some other utility functions.

kde Module

class sde_calibration.common.kde.KDE(h, X, Y, kernel_type='gauss')

Bases: object

Class for estimating probability densities using kernel density estimation.

Parameters
  • h (np.ndarray) – Kernel bandwidth.

  • X (np.ndarray) – Training data (NxD).

  • Y (np.ndarray) – Training labels (Nx1). They are used for regression.

  • kernel_type (str, optional) –

    String describing the kernel function that should be used. Currently implemented options are:

    • gauss (Gaussian kernel)

    • parzen (Parzen window)

    |default| 'gauss'

Return type

None

Raises

ValueError – if the inputs X and Y have a different number of entries.

cross_validation_error(p, h=None)

Computes the leave-one-out cross-validation least-squares metric for local polynomial regression.

Parameters
  • p (int) – Order of the local polynomial estimator.

  • h (np.ndarray, optional) –

    Array of bandwidths that should be used for cross-validation (mx1). If h is None, the bandwidth stored in the object is used.

    |default| None

Returns

Cross-validation error metric for each bandwidth (mx1).

Return type

np.ndarray
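
As an illustration of the leave-one-out idea, the following minimal sketch computes the least-squares cross-validation score for the simplest case p=0 (Nadaraya-Watson regression) with a Gaussian kernel. It is written in plain NumPy and is not the package's implementation; the function name `loo_cv_error` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def loo_cv_error(X, Y, bandwidths):
    """Leave-one-out least-squares score of a Nadaraya-Watson fit per bandwidth."""
    X = np.asarray(X, dtype=float).ravel()
    Y = np.asarray(Y, dtype=float).ravel()
    errors = np.empty(len(bandwidths))
    for j, h in enumerate(bandwidths):
        # Gaussian kernel weights between all pairs of training points
        W = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
        np.fill_diagonal(W, 0.0)  # leave point i out of its own prediction
        prediction = W @ Y / W.sum(axis=1)
        errors[j] = np.mean((Y - prediction) ** 2)
    return errors

# Synthetic regression data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=300)
Y = np.sin(2.0 * X) + 0.1 * rng.normal(size=300)

bandwidths = np.array([0.02, 0.1, 0.3, 1.0, 3.0])
cv = loo_cv_error(X, Y, bandwidths)
best_h = bandwidths[np.argmin(cv)]  # bandwidth with the smallest score
```

A very large bandwidth averages over the whole sample and therefore scores poorly on this data, while a very small one mostly fits noise; the minimum of the score lies in between.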

diffusion_estimation(x, p=0, uncertainty=False, show_progress=False, display=<built-in function print>)

Regression estimator for the diffusion term of a diffusion process. The training data X is interpreted as time series data from the process.

Parameters
  • x (np.ndarray) – Points for evaluation of the diffusion function (nx1).

  • p (int, optional) –

    Degree of the local polynomial estimator. The default, p=0, corresponds to Nadaraya-Watson regression.

    |default| 0

  • uncertainty (bool, optional) – Determines if bias and variance should be estimated. |default| False

  • show_progress (bool, optional) – Determines whether the progress should be displayed. |default| False

  • display (Callable, optional) – Function to be executed in order to display the progress |default| <built-in function print>

Returns

Three values, in this order:
  • Estimate of the diffusion term at the points specified by x (nx1).

  • Estimated bias at the points specified by x (nx1). None if the uncertainty flag is not set.

  • Estimated variance at the points specified by x (nx1). None if the uncertainty flag is not set.

Return type

tuple (np.ndarray, np.ndarray, np.ndarray)
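
The following sketch shows one common way to build such a regression estimator for p=0: the squared increments of the time series, divided by the time step, are smoothed with Gaussian kernel weights. It estimates the squared diffusion coefficient; whether the method above returns the coefficient or its square is not stated here, so treat that as an open assumption. The function name `nw_squared_diffusion`, the time step `dt`, and the simulated Ornstein-Uhlenbeck path are all hypothetical.

```python
import numpy as np

def nw_squared_diffusion(x, path, dt, h):
    """Nadaraya-Watson estimate of sigma^2(x) from an equidistant time series."""
    states = path[:-1]
    squared_increments = np.diff(path) ** 2 / dt
    # Gaussian kernel weights between evaluation points and observed states
    W = np.exp(-0.5 * ((x[:, None] - states[None, :]) / h) ** 2)
    return W @ squared_increments / W.sum(axis=1)

# Simulate dX = -theta * X dt + sigma dW with the Euler-Maruyama scheme
rng = np.random.default_rng(2)
theta, sigma, dt, n_steps = 1.0, 0.5, 0.01, 20000
path = np.empty(n_steps + 1)
path[0] = 0.0
noise = rng.normal(size=n_steps) * np.sqrt(dt)
for i in range(n_steps):
    path[i + 1] = path[i] - theta * path[i] * dt + sigma * noise[i]

x_eval = np.array([-0.3, 0.0, 0.3])
sigma2_hat = nw_squared_diffusion(x_eval, path, dt, h=0.1)  # should be near sigma**2
```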

drift_estimation(x, p=0, uncertainty=False, show_progress=False, display=<built-in function print>)

Regression estimator for the drift term of a diffusion process. The training data X is interpreted as time series data from the process.

Parameters
  • x (np.ndarray) – Points for evaluation of the drift function (nx1).

  • p (int, optional) –

    Degree of the local polynomial estimator. The default, p=0, corresponds to Nadaraya-Watson regression.

    |default| 0

  • uncertainty (bool, optional) – Determines if bias and variance should be estimated. |default| False

  • show_progress (bool, optional) – Determines whether the progress should be displayed. |default| False

  • display (Callable, optional) – Function to be executed in order to display the progress |default| <built-in function print>

Returns

Three values, in this order:
  • Estimate of the drift term at the points specified by x (nx1).

  • Estimated bias at the points specified by x (nx1). None if the uncertainty flag is not set.

  • Estimated variance at the points specified by x (nx1). None if the uncertainty flag is not set.

Return type

tuple (np.ndarray, np.ndarray, np.ndarray)
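
To illustrate what a higher polynomial degree means, the sketch below implements a local linear (p=1) drift estimator in plain NumPy: at each evaluation point, the scaled increments are regressed on a constant and a linear term using Gaussian kernel weights, and the intercept is taken as the drift estimate. This is a sketch under stated assumptions, not the package's code; `local_linear_drift`, `dt`, and the simulated path are hypothetical.

```python
import numpy as np

def local_linear_drift(x, path, dt, h):
    """Local linear (p=1) estimate of the drift b(x) from an equidistant time series."""
    states = path[:-1]
    response = np.diff(path) / dt  # finite-difference proxy for the drift
    estimate = np.empty(len(x))
    for k, x0 in enumerate(x):
        d = states - x0
        w = np.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weights
        # Weighted least squares of the response on [1, d]; the intercept is b(x0)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * response).sum(), (w * d * response).sum()
        estimate[k] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)
    return estimate

# Simulate dX = -theta * X dt + sigma dW with the Euler-Maruyama scheme
rng = np.random.default_rng(3)
theta, sigma, dt, n_steps = 1.0, 0.5, 0.01, 200000
path = np.empty(n_steps + 1)
path[0] = 0.0
noise = rng.normal(size=n_steps) * np.sqrt(dt)
for i in range(n_steps):
    path[i + 1] = path[i] - theta * path[i] * dt + sigma * noise[i]

x_eval = np.array([-0.3, 0.0, 0.3])
drift_hat = local_linear_drift(x_eval, path, dt, h=0.1)  # true drift is -theta * x
```

Drift estimates are much noisier than diffusion estimates for the same path, because the increment noise scales like the inverse square root of the time step; this is why the example uses a longer series.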

get_bandwidth()

Returns the current bandwidth stored in an instance.

Returns

Bandwidth (scalar).

Return type

float

get_probability(x)

Estimates the probability density function (pdf) at the specified points.

Parameters

x (np.ndarray) – Points for evaluation (nxD).

Returns

Probability density at the points specified by x (nx1).

Return type

np.ndarray
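
For reference, a kernel density estimate with a Gaussian kernel can be sketched in a few lines of NumPy. The function below uses a single scalar bandwidth for all dimensions and a product kernel; both choices, as well as the name `gaussian_kde_sketch`, are assumptions of this example and need not match how the class computes its estimate.

```python
import numpy as np

def gaussian_kde_sketch(x, X, h):
    """Gaussian kernel density estimate at points x (n x D) from samples X (N x D)."""
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    # Scaled differences between every evaluation point and every sample: (n, N, D)
    u = (x[:, None, :] - X[None, :, :]) / h
    # Standard Gaussian kernel, taken as a product over the D dimensions
    k = np.exp(-0.5 * np.sum(u**2, axis=2)) / (2.0 * np.pi) ** (D / 2.0)
    return k.sum(axis=1) / (N * h**D)

rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 1))             # draws from a standard normal
grid = np.linspace(-3.0, 3.0, 7).reshape(-1, 1)  # evaluation points (n x 1)
density = gaussian_kde_sketch(grid, samples, h=0.3)
```

The estimate at the centre of the grid should be close to, but slightly below, the true peak of the standard normal density, since kernel smoothing flattens peaks.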

set_bandwidth(h)

Sets a new bandwidth for further computations.

Parameters

h (float) – New bandwidth (scalar).

Return type

None

set_data(X, Y=None)

Adjust the training data stored in the object.

Parameters
  • X (np.ndarray) – New training data (NxD).

  • Y (np.ndarray, optional) – New training labels (Nx1). |default| None

Return type

None

Raises

ValueError – if the inputs X and Y have a different number of entries.

test_problems Module

class sde_calibration.common.test_problems.Problems(problem_type='OU')

Bases: object

This class provides easy access to some common benchmark test problems. The following models are included:

  • Ornstein-Uhlenbeck (OU) process (1D and 2D)

  • Cox-Ingersoll-Ross (CIR) process

  • Hyperbolic process

  • Modified CIR process

  • Double well process

  • Black-Scholes process

Note that all processes, except the Black-Scholes process, are set up such that an invariant distribution exists.

Parameters

problem_type (str, optional) –

The process that should be set up. Possible options are:

  • OU

  • OU_2D

  • CIR

  • hyperbolic

  • modified_CIR

  • double_well

  • black_scholes

|default| 'OU'

Return type

None

get_density(x)

Evaluates the invariant density function of a process given the points of evaluation.

Parameters

x (np.ndarray) – Array of the points at which the density should be evaluated.

Returns

The density function evaluated at the points \(x\).

Return type

np.ndarray

Raises

NotImplementedError – If the invariant density does not exist or no analytic expression exists.
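
As an example of such an analytic expression, the one-dimensional Ornstein-Uhlenbeck process dX = theta (mu - X) dt + sigma dW has a Gaussian invariant distribution with mean mu and variance sigma^2 / (2 theta). The sketch below evaluates that density; the parameter names `theta`, `mu`, and `sigma` are chosen for this example and may differ from the notation used in the dictionary returned by get_setup.

```python
import numpy as np

def ou_invariant_density(x, theta, mu, sigma):
    """Invariant density of dX = theta * (mu - X) dt + sigma dW."""
    variance = sigma**2 / (2.0 * theta)
    return np.exp(-0.5 * (x - mu) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)

grid = np.linspace(-5.0, 5.0, 2001)
pdf = ou_invariant_density(grid, theta=1.0, mu=0.0, sigma=1.0)

# Sanity check: the density integrates to one (trapezoidal rule)
mass = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))
```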

get_setup()

Returns the setup of the process that the instance is initialized with.

Returns

The dictionary of parameters describing the process, the drift function, and the diffusion function.

Return type

tuple (dict, Callable, Callable)

See also

The notation of the parameters in the returned dictionary is taken from Simulation and Inference for Stochastic Differential Equations.

get_transition_density(y, x, dt)

Evaluates the transition density function of a process given the points of evaluation.

Parameters
  • y (np.ndarray) – Array of points at which the density should be evaluated.

  • x (np.ndarray) – Array of the points that are conditioned on.

  • dt (float) – Difference in time between the states \(x\) and \(y\).

Returns

The transition density function evaluated at the points \(y\) given the points \(x\) and the time step \(\Delta t\) between the states.

Return type

np.ndarray

Raises

NotImplementedError – If no analytic expression for the transition density exists.
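
For the Ornstein-Uhlenbeck process dX = theta (mu - X) dt + sigma dW the transition density is known in closed form: given the state x, the state after a time step dt is Gaussian with mean mu + (x - mu) exp(-theta dt) and variance sigma^2 (1 - exp(-2 theta dt)) / (2 theta). The sketch below evaluates it with the same argument order (y, x, dt) as the method above; the remaining parameter names are assumptions of this example.

```python
import numpy as np

def ou_transition_density(y, x, dt, theta, mu, sigma):
    """Density of X(t + dt) = y given X(t) = x for an Ornstein-Uhlenbeck process."""
    mean = mu + (x - mu) * np.exp(-theta * dt)
    variance = sigma**2 / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * dt))
    return np.exp(-0.5 * (y - mean) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)

y = np.linspace(-5.0, 5.0, 2001)
# Short time step: the density is still concentrated near the starting point x
short_step = ou_transition_density(y, x=1.0, dt=0.1, theta=1.0, mu=0.0, sigma=1.0)
# Long time step: the starting point is forgotten and the invariant density emerges
long_step = ou_transition_density(y, x=1.0, dt=50.0, theta=1.0, mu=0.0, sigma=1.0)
```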

utils Module

class sde_calibration.common.utils.Preprocessor(X, y=None, batch_size=1024, validation_size=0.2, input_scaling=None, output_scaling=None, input_columns=None, output_columns=None)

Bases: object

Class for preprocessing a dataset. This includes scaling the data as well as batching, caching, and prefetching.

Parameters
  • X (np.ndarray) – Array of predictors.

  • y (np.ndarray, optional) – Array of responses. |default| None

  • batch_size (int, optional) – Batch size that should be used to perform training on minibatches. |default| 1024

  • validation_size (float, optional) –

    Fraction of the provided dataset that should be used for validation. Must lie in the interval [0, 1).

    |default| 0.2

  • input_scaling (str, optional) –

    Gives the type of scaling that should be used for the predictors. Possible options are:

    • None

    • minmax

    • standard

    • robust

    If None is chosen, no scaling of the data is performed.

    |default| None

  • output_scaling (str, optional) –

    Gives the type of scaling that should be used for the response variables. Possible options are:

    • None

    • minmax

    • standard

    • robust

    If None is chosen, no scaling of the data is performed.

    |default| None

  • input_columns (list, optional) –

    If not all of the predictor variables should be transformed, the columns to transform can be specified via this parameter. If None is chosen, all variables are transformed.

    |default| None

  • output_columns (list, optional) –

    If not all of the response variables should be transformed, the columns to transform can be specified via this parameter. If None is chosen, all variables are transformed.

    |default| None

Return type

None

Raises
  • ValueError – If the in- or output scaling type is not valid.

  • ValueError – If the passed validation size does not lie in the interval [0, 1).
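
The interplay of batch_size and validation_size can be pictured with the following dependency-free sketch. It holds out the trailing rows for validation and cuts the remainder into minibatches. How the class actually splits the data (for example whether it shuffles) is not documented here, so the splitting rule and the name `split_and_batch` are assumptions of this example; the class itself returns TensorFlow datasets rather than lists.

```python
import numpy as np

def split_and_batch(X, y, batch_size=1024, validation_size=0.2):
    """Hold out the trailing rows for validation and batch the remaining rows."""
    if not 0.0 <= validation_size < 1.0:
        raise ValueError("validation_size must lie in the interval [0, 1)")
    n_val = int(round(len(X) * validation_size))
    n_train = len(X) - n_val
    X_train, y_train = X[:n_train], y[:n_train]
    # Consecutive slices of at most batch_size rows; the last batch may be smaller
    batches = [
        (X_train[start:start + batch_size], y_train[start:start + batch_size])
        for start in range(0, n_train, batch_size)
    ]
    validation = (X[n_train:], y[n_train:]) if n_val > 0 else None
    return batches, validation

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X
batches, validation = split_and_batch(X, y, batch_size=3, validation_size=0.2)
```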

get_batch_size()

Gives access to the batch size that is used to preprocess the dataset.

Returns

Internally stored batch size for preprocessing.

Return type

float

get_dataset(dataset_type='train')

Gives access to the transformed, but not yet preprocessed datasets.

Parameters

dataset_type (str, optional) –

Specifies if either the train or the validation dataset should be returned.

|default| 'train'

Returns

The transformed predictors and responses of the requested (train or validation) dataset. If the validation size is zero, None is returned for the validation dataset.

Return type

tuple (np.ndarray, np.ndarray)

Raises

ValueError

If the parameter dataset_type is not one of the following options:

  • train

  • validation

get_processed_datasets()

Gives access to the train and validation datasets after they have been preprocessed.

Returns

Processed train and validation datasets in TensorFlow format.

Return type

tuple (tf.data.Dataset, tf.data.Dataset)

inverse_transform_data(data, transform_type='input')

Performs the inverse transformation of arbitrary data according to the scaling that is initialized by the given dataset.

Parameters
  • data (np.ndarray) – The data that should be transformed by the stored processor. If no scaling is performed, the data will be returned as it is.

  • transform_type (str, optional) –

    Specifies whether the scaling should be performed according to the predictor (input) scaling or the response (output) scaling.

    |default| 'input'

Returns

The inverse-transformed data, i.e. if scaled data is passed in, the scaling is reversed.

Return type

np.ndarray

Raises

ValueError

If the parameter transform_type is not one of the following values:

  • input

  • output

transform_data(data, transform_type='input')

Transforms arbitrary data according to the scaling that is initialized by the given dataset.

Parameters
  • data (np.ndarray) – The data that should be transformed by the stored processor. If no scaling is performed, the data will be returned as it is.

  • transform_type (str, optional) –

    Specifies whether the scaling should be performed according to the predictor (input) scaling or the response (output) scaling.

    |default| 'input'

Returns

The scaled data.

Return type

np.ndarray

Raises

ValueError

If the parameter transform_type is not one of the following values:

  • input

  • output
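
The relationship between transform_data and inverse_transform_data can be illustrated with a minimal min-max scaler: the scaling parameters are fitted once on the training data and then applied to, or reversed on, arbitrary new data. The class `MinMaxSketch` below is a hypothetical stand-in written for this example; which scaler implementation the Preprocessor uses internally is not specified here.

```python
import numpy as np

class MinMaxSketch:
    """Column-wise min-max scaling fitted on a training set."""

    def __init__(self, data):
        self.low = data.min(axis=0)
        self.span = data.max(axis=0) - self.low

    def transform(self, data):
        # Maps the training range of each column to [0, 1]
        return (data - self.low) / self.span

    def inverse_transform(self, data):
        # Reverses transform() and returns values on the original scale
        return data * self.span + self.low

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 40.0]])
scaler = MinMaxSketch(train)

new_points = np.array([[2.5, 25.0]])
scaled = scaler.transform(new_points)        # [[0.25, 0.5]]
restored = scaler.inverse_transform(scaled)  # equals new_points again
```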

sde_calibration.common.utils.clean_directory(dir_path)

Makes sure that the given directory exists. If it does not exist, it is created. If it already exists, it is cleaned and all subdirectories are removed.

Parameters

dir_path (str) – String specifying the path of the directory to be cleaned.

Return type

None
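
The described behaviour can be approximated with the standard library as shown below. This sketch removes the directory tree and recreates it empty, which assumes that "cleaned" means that files are deleted as well; the name `clean_directory_sketch` is hypothetical and the actual function may proceed differently.

```python
import shutil
import tempfile
from pathlib import Path

def clean_directory_sketch(dir_path):
    """Ensure that dir_path exists and is empty."""
    path = Path(dir_path)
    if path.exists():
        shutil.rmtree(path)  # removes the directory with all files and subdirectories
    path.mkdir(parents=True)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results"
    clean_directory_sketch(target)   # does not exist yet: gets created
    created = target.is_dir()
    (target / "run_1").mkdir()
    (target / "log.txt").write_text("old output")
    clean_directory_sketch(target)   # exists: gets emptied
    remaining = list(target.iterdir())
```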

sde_calibration.common.utils.setup_logger(name='logging', fname=None, level=10, log_format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

Sets up a basic logger that prints to the console and, if a file is specified, also logs to that file.

Parameters
  • name (str, optional) – Name of the logger. |default| 'logging'

  • fname (str, optional) –

    A path to the file that should be used for logging. If None is provided the logger does not print any results to a file.

    |default| None

  • level (int, optional) –

    The logging level, i.e. the lowest severity at which the logger emits messages. Possible options are:

    • logging.NOTSET (0)

    • logging.DEBUG (10)

    • logging.INFO (20)

    • logging.WARNING (30)

    • logging.ERROR (40)

    • logging.CRITICAL (50)

  • log_format (str, optional) –

    Format string specifying the format that should be used for logging. For more information on the format, see the LogRecord attributes section of the Python logging documentation.

    |default| '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

Returns

A Logger object which can be used for further logging.

Return type

logging.Logger

Raises

ValueError – If the provided log level is not one of the previously given possible options.
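
A logger with this behaviour can be assembled from the standard logging module as sketched below: one console handler is always attached, and a file handler is added only if a file name is given. The function `setup_logger_sketch` mirrors the documented signature but is written for this example and is not the package's implementation.

```python
import logging

VALID_LEVELS = (
    logging.NOTSET, logging.DEBUG, logging.INFO,
    logging.WARNING, logging.ERROR, logging.CRITICAL,
)

def setup_logger_sketch(
    name="logging",
    fname=None,
    level=logging.DEBUG,
    log_format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
):
    """Create a logger that writes to the console and optionally to a file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"Invalid log level: {level}")
    logger = logging.getLogger(name)
    logger.setLevel(level)
    formatter = logging.Formatter(log_format)
    handlers = [logging.StreamHandler()]
    if fname is not None:
        handlers.append(logging.FileHandler(fname))
    for handler in handlers:
        handler.setFormatter(formatter)
        # Note: calling this function twice with the same name adds handlers twice
        logger.addHandler(handler)
    return logger

logger = setup_logger_sketch(name="calibration", level=logging.INFO)
logger.info("Logger is ready")
logger.debug("This message is below the INFO level and is not emitted")
```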