sparsereg.model package
Submodules
sparsereg.model.base module
sparsereg.model.base.print_model(coef, input_features, errors=None, intercept=None, error_intercept=None, precision=3, pm='±')
Parameters:
- coef –
- input_features –
- errors –
- intercept –
- error_intercept –
- precision –
- pm –
Returns:
class sparsereg.model.base.STRidge(threshold=0.01, alpha=0.1, max_iter=100, normalize=True, fit_intercept=True, threshold_intercept=False, copy_X=True, unbias=True, ridge_kw=None)
Bases: sklearn.linear_model.base.LinearModel, sklearn.base.RegressorMixin
- complexity
sparsereg.model.bayes module
class sparsereg.model.bayes.JMAP(ae0=1e-06, be0=1e-06, af0=1e-06, bf0=1e-06, max_iter=300, tol=0.001, normalize=False, fit_intercept=True, copy_X=True)
Bases: sklearn.linear_model.base.LinearModel, sklearn.base.RegressorMixin, sparsereg.model.base.PrintMixin
sparsereg.model.bayes.jmap(g, H, ae0, be0, af0, bf0, max_iter=1000, tol=0.0001, rcond=None, observer=None)
Maximum a posteriori estimator for g = H @ f + e, with

p(g | f) = normal(H f, ve I)
p(ve) = inverse_gauss(ae0, be0)
p(f | vf) = normal(0, vf I)
p(vf) = inverse_gauss(af0, bf0)

JMAP maximizes p(f, ve, vf | g) = p(g | f) p(f | vf) p(ve) p(vf) / p(g) with respect to f, ve and vf.

Original Author: Ali Mohammad-Djafari, April 2015

Parameters:
- g –
- H –
- ae0 –
- be0 –
- af0 –
- bf0 –
- max_iter –
- rcond –
Returns:
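The joint posterior that jmap maximizes can be written out as a short derivation sketch. It assumes g has M entries and f has N entries (M and N are symbols introduced here, not taken from the docstring), leaves the priors p(v_e) and p(v_f) unspecified, and only shows the stationarity condition in f for fixed variances. Whether jmap performs exactly this update is not confirmed by this page.

```latex
\begin{aligned}
\log p(f, v_e, v_f \mid g)
  &= -\frac{M}{2}\log v_e - \frac{\lVert g - Hf \rVert^2}{2 v_e}
     - \frac{N}{2}\log v_f - \frac{\lVert f \rVert^2}{2 v_f}
     + \log p(v_e) + \log p(v_f) + \text{const}, \\
\nabla_f \log p = 0
  \;&\Longrightarrow\;
  \left( H^\top H + \frac{v_e}{v_f}\, I \right) f = H^\top g .
\end{aligned}
```

For fixed variances, the f-step is therefore a ridge regression with penalty v_e / v_f, which is consistent with the estimator being exposed as a linear model.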
sparsereg.model.efs module
sparsereg.model.efs.mutate(names, importance, toursize, operators, rng=<module 'random'>)
class sparsereg.model.efs.LibTrafo(names, operators)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
class sparsereg.model.efs.EFS(q=1, mu=1, max_size=5, t=0.95, toursize=5, max_stall_iter=20, max_iter=2000, random_state=None, operators={'add': <ufunc 'add'>, 'cos': <ufunc 'cos'>, 'div': <ufunc 'true_divide'>, 'exp': <ufunc 'exp'>, 'log': <ufunc 'log'>, 'mul': <ufunc 'multiply'>, 'sin': <ufunc 'sin'>, 'sqrt': <ufunc 'sqrt'>, 'square': <ufunc 'square'>, 'subtract': <ufunc 'subtract'>}, max_coarsity=2, n_jobs=1)
Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin, sklearn.base.TransformerMixin
Evolutionary feature synthesis.
sparsereg.model.ffx module
class sparsereg.model.ffx.FFXElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, precompute=False, max_iter=1000, copy_X=True, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Bases: sparsereg.model.base.PrintMixin, sklearn.linear_model.coordinate_descent.ElasticNet
Mixin, implements only the score method.
class sparsereg.model.ffx.FFXRationalElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, precompute=False, max_iter=1000, copy_X=True, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Bases: sparsereg.model.base.RationalFunctionMixin, sparsereg.model.ffx.FFXElasticNet
class sparsereg.model.ffx.Strategy
Bases: tuple
Create new instance of Strategy(exponents, operators, consider_products, index, base)
- base: Alias for field number 4
- consider_products: Alias for field number 2
- exponents: Alias for field number 0
- index: Alias for field number 3
- operators: Alias for field number 1
sparsereg.model.ffx.enet_path(est, x_train, x_test, y_train, y_test, num_alphas, eps, l1_ratio, target_score, n_tail, max_complexity)
sparsereg.model.ffx.run_strategy(strategy, x_train, x_test, y_train, y_test, num_alphas, eps, l1_ratios, target_score, n_tail, max_complexity, n_jobs, **kw)
sparsereg.model.ffx.run_ffx(x_train, x_test, y_train, y_test, exponents, operators, num_alphas=100, l1_ratios=(0.1, 0.3, 0.5, 0.7, 0.9, 0.95), eps=1e-30, target_score=0.01, max_complexity=50, n_tail=15, random_state=None, strategies=None, n_jobs=1, rational=True, **kw)
class sparsereg.model.ffx.WeightedEnsembleEstimator(estimators, weights)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
class sparsereg.model.ffx.FFX(l1_ratios=(0.4, 0.8, 0.95), num_alphas=30, eps=1e-05, random_state=None, strategies=None, target_score=0.01, n_tail=5, decision='min', max_complexity=50, exponents=[1, 2], operators={}, n_jobs=1, rational=True, **kw)
Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin
Fast Function eXtraction model.
Parameters:
- l1_ratios (iterable) – Ratios of l1 to l2 penalty term to consider (0 <= l1_ratio <= 1).
- num_alphas (int) – Number of different ratios of cost function to penalty term (alpha values) to consider.
- eps (float) – Ratio of smallest to largest alpha considered (0 < eps < 1).
- random_state (int) –
- strategies (iterable) – Strategy instances to consider
- target_score (float) – break condition on cost function for innermost loop
- n_tail (int) – length of path (in alpha) to check into past for saturation
- decision (str) – one of 'weight' or 'min'
- max_complexity (float) – break condition on model complexity for innermost loop
- exponents (iterable) – can contain float and negative values
- operators (dict) – mapping operator name to callable (of one variable)
- n_jobs (int) –
- rational (bool) – Whether to consider general rational functions as well
- kw –
The implemented algorithm is described in http://dx.doi.org/10.1007/978-1-4614-1770-5_13.

A Strategy is determined by a set of nonlinear functions from which an extended set of features will be generated by evaluating these functions on all given features. You can either supply the strategies directly via the strategies parameter or let the strategies be generated. Generation of strategies is configured by the parameters exponents, operators and rational. When strategies is given, exponents, operators and rational have no effect.

Strategy generation takes place in the following manner:
- exponents: Orders of the monomials to consider for each single feature (no products between features here). exponents is an iterable of numbers; floats and negative values are possible, and 1 is always included automatically. The first step in strategy generation is calculating all monomials.
- operators: Mapping of str to callable taking one parameter. All callables in operators are evaluated on all monomials from the first step.
- products: Not configurable. All products of each operator feature from the second step with each monomial feature from the first are always considered, as are all products of monomial features with monomial features based on a different feature (thus generating mixed products up to order 2*max(exponents)).
- rational: If true, consider not only generalized linear models built from all basis functions but also rational functions, using the rational function trick described here
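The generation steps listed above can be sketched in plain Python for a single sample. This is only an illustration under stated assumptions: generate_features and the feature naming scheme are hypothetical and not part of sparsereg, and the sketch assumes that the raw monomials and operator features are kept alongside their products. The rational step is left out.

```python
import math
from itertools import product


def generate_features(x, exponents, operators):
    """Build named basis functions for one sample x (a list of feature values).

    Hypothetical helper for illustration; returns a dict name -> value.
    """
    # Step 1: monomials x_i**e for each single feature; exponent 1 is always included.
    exps = sorted(set(exponents) | {1})
    monomials = {}  # name -> (index of the underlying feature, value)
    for i, xi in enumerate(x):
        for e in exps:
            monomials[f"x{i}^{e}"] = (i, xi ** e)

    # Step 2: every operator evaluated on every monomial.
    op_features = {
        f"{op}({name})": fn(value)
        for op, fn in operators.items()
        for name, (_, value) in monomials.items()
    }

    features = {name: value for name, (_, value) in monomials.items()}
    features.update(op_features)

    # Step 3a: products of each operator feature with each monomial.
    for (op_name, op_val), (m_name, (_, m_val)) in product(
        op_features.items(), monomials.items()
    ):
        features[f"{op_name}*{m_name}"] = op_val * m_val

    # Step 3b: products of monomials that are based on different features.
    names = list(monomials)
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            (ia, va), (ib, vb) = monomials[names[a]], monomials[names[b]]
            if ia != ib:
                features[f"{names[a]}*{names[b]}"] = va * vb
    return features


# Two input features, exponents=[2] (1 is added automatically), one operator.
example = generate_features([2.0, 3.0], exponents=[2], operators={"sin": math.sin})
```

With two input features, exponents 1 and 2 and a single operator, this yields 4 monomials, 4 operator features, 16 operator-monomial products and 4 mixed monomial products, so the basis grows quickly with the number of inputs.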
For each Strategy, an elastic net optimizer is run with many combinations of l1_ratio and alpha. An l1_ratio of 0 corresponds to ridge regression (only l2 penalty); an l1_ratio of 1 corresponds to LASSO regression (only l1 penalty). alpha determines the amount of regularization, where alpha=0 would mean no regularization and alpha -> infty would mean only regularization. For details on the elastic net algorithm used, see sklearn.linear_model.ElasticNet.

The number of alphas is loosely determined by num_alphas (the actual number is close and never smaller). The maximum value of alpha considered is determined dynamically based on Tibshirani’s “Strong Rules”, see https://doi.org/10.1111/j.1467-9868.2011.01004.x. The rule gives an alpha for which the fitted model will (in most relevant cases) have a complexity of 0 (no nonzero terms). This maximum alpha also depends on the l1_ratio, therefore the iteration over alpha takes place in the innermost loop.

The innermost loop would iterate from the maximum alpha down to eps times the maximum alpha. With decreasing alpha, the complexity (number of non-zero terms) is expected to increase, whereas the cost (nrmse evaluated on the training set) is expected to decrease.

The innermost loop has three break conditions:
- train_score: The cost is less than or equal to target_score.
- complexity: The complexity is greater than or equal to max_complexity.
- saturation: No significant improvement in the cost during the last n_tail iterations. (Significant -> last 4 decimal digits)
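The alpha iteration and its three break conditions can be sketched as follows. This is a minimal illustration, not the library's code: alpha_grid, walk_path and the fit callable are hypothetical names, the grid is assumed to be log-spaced, and saturation is interpreted here as the cost being unchanged after rounding to 4 decimals over the last n_tail steps.

```python
import math


def alpha_grid(alpha_max, eps, num_alphas):
    """Log-spaced alphas from alpha_max down to eps * alpha_max (assumed spacing)."""
    if num_alphas < 2:
        return [alpha_max]
    step = math.log(eps) / (num_alphas - 1)
    return [alpha_max * math.exp(i * step) for i in range(num_alphas)]


def walk_path(alphas, fit, target_score, max_complexity, n_tail, decimals=4):
    """Visit alphas in order and stop at the first break condition.

    fit(alpha) is a hypothetical callable returning (cost, complexity).
    Returns the visited (alpha, cost, complexity) triples and the stop reason.
    """
    path = []
    for alpha in alphas:
        cost, complexity = fit(alpha)
        path.append((alpha, cost, complexity))
        if cost <= target_score:
            return path, "train_score"
        if complexity >= max_complexity:
            return path, "complexity"
        # Saturation: the rounded cost did not change over the last n_tail steps.
        tail = [round(c, decimals) for _, c, _ in path[-n_tail:]]
        if len(tail) == n_tail and len(set(tail)) == 1:
            return path, "saturation"
    return path, "exhausted"
```

Because the grid starts at the largest alpha, the first models visited are the sparsest ones, and the walk stops as soon as the model is good enough, too complex, or no longer improving.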
To obtain a single model from the Pareto front of models, the Akaike information criterion (AIC) is used (see https://en.wikipedia.org/wiki/Akaike_information_criterion). How it is used is determined by the decision parameter. If decision == 'min', the model with the smallest AIC is taken; if decision == 'weighted', the resulting model is a linear combination of all models the front consists of, weighted by exp((min(AIC)-AIC)/2).
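The AIC-based weighting described above can be illustrated with a few lines of Python. The function names are hypothetical, and dividing by the sum of the weights in combine is an assumption; the text only states the unnormalized weight exp((min(AIC)-AIC)/2).

```python
import math


def akaike_weights(aics):
    """Unnormalized weights exp((min(AIC) - AIC) / 2), one per model."""
    best = min(aics)
    return [math.exp((best - a) / 2) for a in aics]


def combine(predictions, aics):
    """Weighted average of per-model scalar predictions.

    Normalizing by the sum of the weights is an assumption of this sketch.
    """
    weights = akaike_weights(aics)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


aics = [10.0, 12.0, 10.0]
weights = akaike_weights(aics)        # the best models get weight 1.0
best_index = aics.index(min(aics))    # the model picked when taking the minimum
```

A model whose AIC exceeds the best one by 2 receives weight exp(-1), so models far from the minimum contribute very little to the combination.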
score(x, y)
Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameters:
- X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like, shape = [n_samples], optional) – Sample weights.
Returns: score – R^2 of self.predict(X) wrt. y.
Return type: float
Notes

The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
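The R^2 definition given for score can be checked by hand with a small single-output sketch. r_squared is a hypothetical helper written for illustration; it ignores sample weights and is not the sklearn implementation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination 1 - u/v for a single output.

    Raises ZeroDivisionError if y_true is constant (v == 0).
    """
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - u / v


y = [1.0, 2.0, 3.0]
perfect = r_squared(y, y)                   # every prediction is exact
constant = r_squared(y, [2.0, 2.0, 2.0])    # always predicts the mean of y
reversed_fit = r_squared(y, [3.0, 2.0, 1.0])  # worse than the constant model
```

The three cases cover the range described in the text: an exact fit, the constant baseline, and a model that does worse than the baseline and therefore scores below zero.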