common Package

`common` Package

This package provides common functionality that is useful for different types of estimators. It includes test problems, a kernel density estimation class, and several other utility functions.

`kde` Module

class sde_calibration.common.kde.KDE(h, X, Y, kernel_type='gauss')

Bases: object

Class for estimating probability densities using kernel density estimation.

Parameters

h (np.ndarray) – Kernel bandwidth.
X (np.ndarray) – Training data (NxD).
Y (np.ndarray) – Training labels (Nx1). They are used for regression.
kernel_type (str, optional) –
String describing the kernel function that should be used. Currently implemented options are:
- gauss (Gaussian kernel)
- parzen (Parzen window)
  
  |default| 'gauss'

Return type

None

Raises

ValueError – if the inputs X and Y have a different number of entries.

cross_validation_error(p, h=None)

Computes the leave-one-out cross-validation least-squares metric for local polynomial regression.

Parameters

p (int) – Order of the local polynomial estimator.
h (np.ndarray, optional) –
Array of bandwidths that should be used for cross-validation (mx1). If h is None, the bandwidth stored in the object is used.

|default| None

Returns

Cross-validation error metric for each bandwidth (mx1).

Return type

np.ndarray
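
The leave-one-out idea can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: it assumes one-dimensional data, a Gaussian kernel, and the local constant case (p = 0), and the function name `loo_cv_error` is hypothetical.

```python
import numpy as np

def loo_cv_error(X, Y, bandwidths):
    """Leave-one-out least-squares error of Nadaraya-Watson regression (p = 0)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None] - X[None, :]            # pairwise differences, shape (N, N)
    errors = []
    for h in np.atleast_1d(bandwidths):
        K = np.exp(-0.5 * (diff / h) ** 2)    # Gaussian kernel weights
        np.fill_diagonal(K, 0.0)              # exclude sample i from its own prediction
        pred = K @ Y / K.sum(axis=1)
        errors.append(np.mean((Y - pred) ** 2))
    return np.array(errors)

# Synthetic regression data (assumption, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=300)
Y = np.sin(2.0 * X) + 0.1 * rng.normal(size=300)

bandwidths = np.array([0.02, 0.1, 0.5, 2.0])
cv = loo_cv_error(X, Y, bandwidths)
best_h = bandwidths[np.argmin(cv)]
```

The bandwidth with the smallest error is a natural candidate for subsequent estimation; a strongly oversmoothing bandwidth typically yields a visibly larger error.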

diffusion_estimation(x, p=0, uncertainty=False, show_progress=False, display=<built-in function print>)

Regression estimator for the diffusion term of a diffusion process. The training data X is interpreted as time series data from the process.

Parameters

x (np.ndarray) – Points for evaluation of the diffusion function (nx1).
p (int, optional) –
Degree for local polynomial estimator. The default means that Nadaraya-Watson regression is performed.

|default| 0
uncertainty (bool, optional) – Determines if bias and variance should be estimated. |default| False
show_progress (bool, optional) – Determines whether the progress should be displayed. |default| False
display (Callable, optional) – Function to be executed in order to display the progress |default| <built-in function print>

Returns

A tuple with three entries:
- Estimate of the diffusion term at the points specified by x (nx1).
- Estimated bias at the points specified by x (nx1). None if the uncertainty flag is not set.
- Estimated variance at the points specified by x (nx1). None if the uncertainty flag is not set.

Return type

tuple (np.ndarray, np.ndarray, np.ndarray)

drift_estimation(x, p=0, uncertainty=False, show_progress=False, display=<built-in function print>)

Regression estimator for the drift term of a diffusion process. The training data X is interpreted as time series data from the process.

Parameters

x (np.ndarray) – Points for evaluation of the drift function (nx1).
p (int, optional) –
Degree for local polynomial estimator. The default means that Nadaraya-Watson regression is performed.

|default| 0
uncertainty (bool, optional) – Determines if bias and variance should be estimated. |default| False
show_progress (bool, optional) – Determines whether the progress should be displayed. |default| False
display (Callable, optional) – Function to be executed in order to display the progress |default| <built-in function print>

Returns

A tuple with three entries:
- Estimate of the drift term at the points specified by x (nx1).
- Estimated bias at the points specified by x (nx1). None if the uncertainty flag is not set.
- Estimated variance at the points specified by x (nx1). None if the uncertainty flag is not set.

Return type

tuple (np.ndarray, np.ndarray, np.ndarray)
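
As a rough illustration of kernel regression for drift and diffusion, the following sketch applies Nadaraya-Watson regression (p = 0) to the increments of a simulated path. It does not use the `KDE` class. The time step `dt`, the Euler-Maruyama simulation, the helper `nadaraya_watson`, and the choice to estimate the squared diffusion and take its square root are all assumptions made for this example.

```python
import numpy as np

def nadaraya_watson(x_eval, X, Y, h):
    """Local constant (p = 0) regression with a Gaussian kernel."""
    u = (np.asarray(x_eval, dtype=float)[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)
    return K @ Y / K.sum(axis=1)

# Simulate an Ornstein-Uhlenbeck path dX = -theta * X dt + sigma dW
# with the Euler-Maruyama scheme (parameters are illustrative assumptions).
rng = np.random.default_rng(2)
theta, sigma, dt, n_steps = 1.0, 0.5, 0.01, 50_000
path = np.empty(n_steps + 1)
path[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=n_steps)
for i in range(n_steps):
    path[i + 1] = path[i] - theta * path[i] * dt + sigma * noise[i]

states = path[:-1]              # X_t
increments = np.diff(path)      # X_{t+dt} - X_t
x = np.linspace(-0.5, 0.5, 5)   # evaluation points

# Drift: regress increments / dt on the state.
drift_hat = nadaraya_watson(x, states, increments / dt, h=0.1)
# Diffusion: regress squared increments / dt on the state, then take the root.
diffusion_hat = np.sqrt(nadaraya_watson(x, states, increments**2 / dt, h=0.1))
```

For this process the drift estimate should decrease roughly linearly in x, while the diffusion estimate should stay close to the constant `sigma`.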

get_bandwidth()

Returns the current bandwidth stored in an instance.

Returns: Bandwidth (scalar).
Return type: float

get_probability(x)

Estimates the probability density function (pdf) at the specified points.

Parameters: x (np.ndarray) – Points for evaluation (nxD).
Returns: Probability density at the points specified by x (nx1).
Return type: np.ndarray
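
For orientation, a kernel density estimate with a Gaussian kernel can be sketched in a few lines. This is a one-dimensional illustration under stated assumptions, not the code behind `get_probability`; the function name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(x_eval, samples, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h."""
    x_eval = np.asarray(x_eval, dtype=float).reshape(-1, 1)    # shape (n, 1)
    samples = np.asarray(samples, dtype=float).reshape(1, -1)  # shape (1, N)
    u = (x_eval - samples) / h                                 # shape (n, N)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / h                             # shape (n,)

# Standard normal samples as training data (assumption, for illustration only).
rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
grid = np.linspace(-6.0, 6.0, 241)
density = gaussian_kde_1d(grid, samples, h=0.3)
```

The estimate is non-negative everywhere and integrates to approximately one over a grid that covers the data.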

set_bandwidth(h)

Sets a new bandwidth for further computations.

Parameters: h (float) – New bandwidth (scalar).
Return type: None

set_data(X, Y=None)

Adjust the training data stored in the object.

Parameters

X (np.ndarray) – New training data (NxD).
Y (np.ndarray, optional) – New training labels (Nx1). |default| None

Return type

None

Raises

ValueError – if the inputs X and Y have a different number of entries.

`test_problems` Module

class sde_calibration.common.test_problems.Problems(problem_type='OU')

Bases: object

This class provides some common test problems, which makes it easy to use them as benchmarks. The following models are included:

Ornstein-Uhlenbeck (OU) process (1D and 2D)

Cox-Ingersoll-Ross (CIR) process

Hyperbolic process

Modified CIR process

Double well process

Black-Scholes process

Note that all processes except the Black-Scholes process are set up such that an invariant distribution exists.

Parameters

problem_type (str, optional) –

The process that should be set up. Possible options are:

OU
OU_2D
CIR
hyperbolic
modified_CIR
double_well
black_scholes
|default| 'OU'

Return type

None

get_density(x)

Evaluates the invariant density function of a process given the points of evaluation.

Parameters: x (np.ndarray) – Array of the points at which the density should be evaluated.
Returns: The density function evaluated at the points \(x\).
Return type: np.ndarray
Raises: NotImplementedError – If the invariant density does not exist or no analytic expression exists.
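
As an example of an invariant density with an analytic expression, consider a one-dimensional Ornstein-Uhlenbeck process. The sketch below assumes the parametrization dX = -theta * X dt + sigma dW, whose invariant distribution is the normal distribution N(0, sigma^2 / (2 * theta)). The parametrization used by `Problems` is not documented here, so the function name and the parameter values are hypothetical.

```python
import numpy as np

def ou_invariant_density(x, theta=1.0, sigma=0.5):
    """Invariant density of dX = -theta * X dt + sigma dW (assumed parametrization)."""
    variance = sigma**2 / (2.0 * theta)
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2 / variance) / np.sqrt(2.0 * np.pi * variance)

grid = np.linspace(-3.0, 3.0, 601)
density = ou_invariant_density(grid)
```

Such a closed-form density is useful as a reference when checking a kernel density estimate obtained from a simulated path.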

get_setup()

Returns the setup of the process that the instance was initialized with.

Returns: The dictionary of parameters describing the process, the drift function, and the diffusion function
Return type: tuple (dict, Callable, Callable)

`utils` Module

class sde_calibration.common.utils.Preprocessor(X, y=None, batch_size=1024, validation_size=0.2, input_scaling=None, output_scaling=None, input_columns=None, output_columns=None)

Bases: object

Class for preprocessing a dataset. This includes scaling the data as well as batching, caching, and prefetching.

Parameters

X (np.ndarray) – Array of predictors.
y (np.ndarray, optional) – Array of responses. |default| None
batch_size (int, optional) – Batch size that should be used to perform training on minibatches. |default| 1024
validation_size (float, optional) –
Portion of the provided dataset that should be used for validation.

|default| 0.2
input_scaling (str, optional) –
Gives the type of scaling that should be used for the predictors. Possible options are:
- None
- minmax
- standard
- robust
If None is chosen, no scaling of the data is performed.

|default| None
output_scaling (str, optional) –
Gives the type of scaling that should be used for the response variables. Possible options are:
- None
- minmax
- standard
- robust
If None is chosen, no scaling of the data is performed.

|default| None
input_columns (list, optional) –
If only some of the predictor variables should be transformed, the corresponding columns can be specified via this parameter. If None is chosen, all variables are transformed.

|default| None
output_columns (list, optional) –
If only some of the response variables should be transformed, the corresponding columns can be specified via this parameter. If None is chosen, all variables are transformed.

|default| None

Return type

None

Raises

ValueError – If the in- or output scaling type is not valid.
ValueError – If the passed validation size does not lie in the interval [0, 1).

get_batch_size()

Gives access to the batch size that is used to preprocess the dataset.

Returns: Internally stored batch size for preprocessing.
Return type: float

get_dataset(dataset_type='train')

Gives access to the transformed, but not yet preprocessed datasets.

Parameters

dataset_type (str, optional) –

Specifies if either the train or the validation dataset should be returned.

|default| 'train'

Returns

The transformed predictors and responses of the requested (train or validation) dataset. If the validation size is zero, None is returned for the validation dataset.

Return type

tuple (np.ndarray, np.ndarray)

Raises

ValueError –

If the parameter dataset_type is not one of the following options:

train
validation

get_processed_datasets()

Gives access to the train and validation datasets after they have been preprocessed.

Returns: Processed train and validation datasets in TensorFlow format.
Return type: tuple (tf.data.Dataset, tf.data.Dataset)

inverse_transform_data(data, transform_type='input')

Performs the inverse transformation of arbitrary data according to the scaling that was initialized with the given dataset.

Parameters

data (np.ndarray) – The data that should be transformed by the stored processor. If no scaling is performed, the data will be returned as it is.
transform_type (str, optional) –
Specifies whether the scaling should be performed according to the predictor (input) scaling or the response (output) scaling.

|default| 'input'

Returns

The inverse-transformed data, i.e. if scaled data is passed in, the scaling is reversed.

Return type

np.ndarray

Raises

ValueError –

If the parameter transform_type is not one of the following values:

input
output

transform_data(data, transform_type='input')

Transforms arbitrary data according to the scaling that is initialized by the given dataset.

Parameters

data (np.ndarray) – The data that should be transformed by the stored processor. If no scaling is performed, the data will be returned as it is.
transform_type (str, optional) –
Specifies whether the scaling should be performed according to the predictor (input) scaling or the response (output) scaling.

|default| 'input'

Returns

The scaled data.

Return type

np.ndarray

Raises

ValueError –

If the parameter transform_type is not one of the following values:

input
output
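
The relationship between `transform_data` and `inverse_transform_data` can be illustrated with a small stand-alone sketch of min-max scaling on selected columns. The class `MinMaxColumns` is hypothetical and only mimics the idea; how `Preprocessor` implements its scalers is not stated here.

```python
import numpy as np

class MinMaxColumns:
    """Hypothetical helper: min-max scale selected columns to [0, 1]."""

    def __init__(self, data, columns=None):
        data = np.asarray(data, dtype=float)
        # With columns=None every column is scaled.
        self.columns = list(range(data.shape[1])) if columns is None else list(columns)
        self.low = data[:, self.columns].min(axis=0)
        # Note: a constant column would give a zero span; not handled in this sketch.
        self.span = data[:, self.columns].max(axis=0) - self.low

    def transform(self, data):
        out = np.array(data, dtype=float)   # copy, so the input stays untouched
        out[:, self.columns] = (out[:, self.columns] - self.low) / self.span
        return out

    def inverse_transform(self, data):
        out = np.array(data, dtype=float)
        out[:, self.columns] = out[:, self.columns] * self.span + self.low
        return out

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 40.0]])
scaler = MinMaxColumns(train, columns=[1])   # scale only the second column
new_data = np.array([[2.0, 25.0]])
scaled = scaler.transform(new_data)
restored = scaler.inverse_transform(scaled)
```

The scaling parameters are fitted once on the training data and then reused for any other data, so that applying the inverse transformation to scaled data recovers the original values.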

sde_calibration.common.utils.clean_directory(dir_path)

Makes sure that the given directory exists. If it does not exist, the directory is created. If it already exists, it is cleaned and all subdirectories are removed.

Parameters: dir_path (str) – String specifying the path of the directory to be cleaned.
Return type: None
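
The described behaviour can be approximated with the standard library. The sketch below is an assumption about what "cleaned" means (files and subdirectories inside the directory are removed); the function name `clean_directory_sketch` is hypothetical and the actual implementation may differ.

```python
import os
import shutil
import tempfile

def clean_directory_sketch(dir_path):
    """Create dir_path if missing; otherwise remove everything inside it."""
    os.makedirs(dir_path, exist_ok=True)
    with os.scandir(dir_path) as it:
        entries = list(it)                  # collect first, then delete
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)       # remove subdirectory recursively
        else:
            os.remove(entry.path)           # remove file or symlink

# Demonstration in a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "results")
os.makedirs(os.path.join(target, "run_1"))
with open(os.path.join(target, "log.txt"), "w") as handle:
    handle.write("old output")

clean_directory_sketch(target)
remaining = os.listdir(target)
shutil.rmtree(base)                         # tidy up the demonstration
```

Since the contents of an existing directory are deleted, such a function should only be pointed at directories that hold disposable output.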

sde_calibration.common.utils.setup_logger(name='logging', fname=None, level=10, log_format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

Sets up a basic logger that prints to the console and, if a file is specified, also logs to that file.

Parameters

name (str, optional) – Name of the logger. |default| 'logging'
fname (str, optional) –
A path to the file that should be used for logging. If None is provided the logger does not print any results to a file.

|default| None
level (int, optional) –
The logging level, i.e. the minimum severity a message must have to be emitted by the logger. Possible options are:
- logging.NOTSET (0)
- logging.DEBUG (10)
- logging.INFO (20)
- logging.WARNING (30)
- logging.ERROR (40)
- logging.CRITICAL (50)
  
  |default| 10
log_format (str, optional) –
Format string specifying the format that should be used for logging. For more information on the format, see the LogRecord attributes section of the Python logging documentation.

|default| '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

Returns

A Logger object which can be used for further logging.

Return type

logging.Logger

Raises

ValueError – If the provided log level is not one of the previously given possible options.
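
A logger with this behaviour can be sketched with Python's standard `logging` module. The code below is an illustration under assumptions, not the package's source: the name `setup_logger_sketch` is hypothetical, and clearing existing handlers on repeated calls is a design choice of this sketch.

```python
import logging

VALID_LEVELS = {logging.NOTSET, logging.DEBUG, logging.INFO,
                logging.WARNING, logging.ERROR, logging.CRITICAL}

def setup_logger_sketch(name="logging", fname=None, level=logging.DEBUG,
                        log_format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"):
    """Return a logger that writes to the console and, optionally, to a file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"Unsupported log level: {level}")
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()                 # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter(log_format)
    handlers = [logging.StreamHandler()]    # console output
    if fname is not None:
        handlers.append(logging.FileHandler(fname))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

logger = setup_logger_sketch(name="calibration", level=logging.INFO)
logger.info("Estimator initialised")
```

With level 20 (`logging.INFO`), debug messages are suppressed while info, warning, error, and critical messages are emitted.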
