sparsereg.model package¶
Submodules¶
sparsereg.model.base module¶
-
sparsereg.model.base.print_model(coef, input_features, errors=None, intercept=None, error_intercept=None, precision=3, pm='±')[source]¶ Parameters: - coef –
- input_features –
- errors –
- intercept –
- error_intercept –
- precision –
- pm –
Returns:
-
class
sparsereg.model.base.STRidge(threshold=0.01, alpha=0.1, max_iter=100, normalize=True, fit_intercept=True, threshold_intercept=False, copy_X=True, unbias=True, ridge_kw=None)[source]¶ Bases:
sklearn.linear_model.base.LinearModel,sklearn.base.RegressorMixin
-
complexity¶
-
sparsereg.model.bayes module¶
-
class
sparsereg.model.bayes.JMAP(ae0=1e-06, be0=1e-06, af0=1e-06, bf0=1e-06, max_iter=300, tol=0.001, normalize=False, fit_intercept=True, copy_X=True)[source]¶ Bases:
sklearn.linear_model.base.LinearModel,sklearn.base.RegressorMixin,sparsereg.model.base.PrintMixin
-
sparsereg.model.bayes.jmap(g, H, ae0, be0, af0, bf0, max_iter=1000, tol=0.0001, rcond=None, observer=None)[source]¶ Maximum a posteriori estimator for g = H @ f + e
p(g | f) = normal(H f, ve I)
p(ve) = inverse_gauss(ae0, be0)
p(f | vf) = normal(0, vf I)
p(vf) = inverse_gauss(af0, bf0)
JMAP maximizes p(f, ve, vf | g) = p(g | f) p(f | vf) p(ve) p(vf) / p(g) with respect to f, ve and vf.
Original Author: Ali Mohammad-Djafari, April 2015
Parameters: - g –
- H –
- ae0 –
- be0 –
- af0 –
- bf0 –
- max_iter –
- rcond –
Returns:
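The joint maximization described for jmap can be illustrated with a small alternating scheme. This is a minimal sketch, not the package's implementation: the function name `jmap_sketch` is hypothetical, and it assumes that the variance priors are inverse-gamma distributions with shape/scale parameters (ae0, be0) and (af0, bf0), which may differ from what the library actually uses. Under that assumption, each variance has a closed-form conditional mode, and f is a ridge solution with penalty ve / vf.

```python
import numpy as np


def jmap_sketch(g, H, ae0=1e-6, be0=1e-6, af0=1e-6, bf0=1e-6,
                max_iter=300, tol=1e-4):
    """Alternating joint-MAP updates for g = H @ f + e (illustrative only)."""
    n, m = H.shape
    HtH, Htg = H.T @ H, H.T @ g
    f = np.linalg.lstsq(H, g, rcond=None)[0]  # least-squares starting point
    for _ in range(max_iter):
        # Conditional modes of the variances (ASSUMES inverse-gamma priors)
        r = g - H @ f
        ve = (be0 + 0.5 * (r @ r)) / (ae0 + 1 + 0.5 * n)
        vf = (bf0 + 0.5 * (f @ f)) / (af0 + 1 + 0.5 * m)
        # For fixed variances, f solves a ridge problem with penalty ve / vf
        f_new = np.linalg.solve(HtH + (ve / vf) * np.eye(m), Htg)
        converged = np.linalg.norm(f_new - f) <= tol * max(np.linalg.norm(f), 1e-12)
        f = f_new
        if converged:
            break
    return f, ve, vf


# Usage on synthetic data
rng = np.random.default_rng(0)
H = rng.normal(size=(200, 3))
f_true = np.array([1.0, -2.0, 0.5])
g = H @ f_true + 0.1 * rng.normal(size=200)
f_hat, ve_hat, vf_hat = jmap_sketch(g, H)
```

With very small hyperparameters the priors are nearly flat, so on well-conditioned data the estimate stays close to the least-squares solution.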
sparsereg.model.efs module¶
-
sparsereg.model.efs.mutate(names, importance, toursize, operators, rng=<module 'random'>)[source]¶
-
class
sparsereg.model.efs.LibTrafo(names, operators)[source]¶ Bases:
sklearn.base.BaseEstimator,sklearn.base.TransformerMixin
-
class
sparsereg.model.efs.EFS(q=1, mu=1, max_size=5, t=0.95, toursize=5, max_stall_iter=20, max_iter=2000, random_state=None, operators={'add': <ufunc 'add'>, 'cos': <ufunc 'cos'>, 'div': <ufunc 'true_divide'>, 'exp': <ufunc 'exp'>, 'log': <ufunc 'log'>, 'mul': <ufunc 'multiply'>, 'sin': <ufunc 'sin'>, 'sqrt': <ufunc 'sqrt'>, 'square': <ufunc 'square'>, 'subtract': <ufunc 'subtract'>}, max_coarsity=2, n_jobs=1)[source]¶ Bases:
sklearn.base.BaseEstimator,sklearn.base.RegressorMixin,sklearn.base.TransformerMixin
Evolutionary feature synthesis.
sparsereg.model.ffx module¶
-
class
sparsereg.model.ffx.FFXElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, precompute=False, max_iter=1000, copy_X=True, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')[source]¶ Bases:
sparsereg.model.base.PrintMixin,sklearn.linear_model.coordinate_descent.ElasticNet
Mixin, implements only the score method.
-
class
sparsereg.model.ffx.FFXRationalElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, precompute=False, max_iter=1000, copy_X=True, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')[source]¶ Bases:
sparsereg.model.base.RationalFunctionMixin,sparsereg.model.ffx.FFXElasticNet
-
class
sparsereg.model.ffx.Strategy¶ Bases:
tuple
Create new instance of Strategy(exponents, operators, consider_products, index, base)
-
base¶ Alias for field number 4
-
consider_products¶ Alias for field number 2
-
exponents¶ Alias for field number 0
-
index¶ Alias for field number 3
-
operators¶ Alias for field number 1
-
-
sparsereg.model.ffx.enet_path(est, x_train, x_test, y_train, y_test, num_alphas, eps, l1_ratio, target_score, n_tail, max_complexity)[source]¶
-
sparsereg.model.ffx.run_strategy(strategy, x_train, x_test, y_train, y_test, num_alphas, eps, l1_ratios, target_score, n_tail, max_complexity, n_jobs, **kw)[source]¶
-
sparsereg.model.ffx.run_ffx(x_train, x_test, y_train, y_test, exponents, operators, num_alphas=100, l1_ratios=(0.1, 0.3, 0.5, 0.7, 0.9, 0.95), eps=1e-30, target_score=0.01, max_complexity=50, n_tail=15, random_state=None, strategies=None, n_jobs=1, rational=True, **kw)[source]¶
-
class
sparsereg.model.ffx.WeightedEnsembleEstimator(estimators, weights)[source]¶ Bases:
sklearn.base.BaseEstimator,sklearn.base.TransformerMixin
-
class
sparsereg.model.ffx.FFX(l1_ratios=(0.4, 0.8, 0.95), num_alphas=30, eps=1e-05, random_state=None, strategies=None, target_score=0.01, n_tail=5, decision='min', max_complexity=50, exponents=[1, 2], operators={}, n_jobs=1, rational=True, **kw)[source]¶ Bases:
sklearn.base.BaseEstimator,sklearn.base.RegressorMixin
Fast Function eXtraction model.
Parameters: - l1_ratios (iterable) – Determines the ratio of l1 to l2 penalty term. 0 <= l1_ratio <= 1.
- num_alphas (int) – Determines the number of different ratios of cost function to penalty term.
- eps (float) – ratio of smallest to largest alpha considered (0 < eps < 1).
- random_state (int) –
- strategies (iterable) – Strategy instances to consider
- target_score (float) – break condition on cost function for innermost loop
- n_tail (int) – length of path (in alpha) to check into past for saturation
- decision (str) – one of 'weight' or 'min'
- max_complexity (float) – break condition on model complexity for innermost loop
- exponents (iterable) – can contain float and negative values
- operators (dict) – mapping operator name to callable (of one variable)
- n_jobs (int) –
- rational (bool) – Whether to consider general rational functions as well
- kw –
The implemented algorithm is found in http://dx.doi.org/10.1007/978-1-4614-1770-5_13.
A Strategy is determined by a set of nonlinear functions from which an extended set of features will be generated by evaluating these functions on all given features. You can either supply the strategies directly via the strategies parameter or let the strategies be generated. Generation of strategies is configured by the parameters exponents, operators and rational. When strategies is given, exponents, operators and rational have no effect.
Strategy generation takes place in the following manner:
- exponents
Orders of the monomials to consider for each single feature (no products between features here). exponents is an iterable of numbers (floats and negative values are possible; 1 will always be included automatically). The first step in strategy generation is calculating all monomials.
- operators
Mapping of str to callable taking one parameter. All callables in operators will be evaluated on all monomials from the first step.
- products
Not configurable. Always consider all products of each operator feature from the second step with each monomial feature from the first, and all products of monomial features with all monomial features based on a different feature (thus generating mixed products up to order 2*max(exponents)).
- rational
If true, do not only consider generalized linear models from all basis functions but consider also rational functions using the rational function trick described here
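The three generation steps (monomials, operators on monomials, products) can be sketched as follows. This is only an illustration of the description above, not the package's code: `build_features` and the feature naming scheme are hypothetical, and the rational-function step is left out.

```python
from itertools import product

import numpy as np


def build_features(X, exponents=(1, 2), operators=None):
    """Return (names, matrix) for a basis built in three steps (sketch)."""
    operators = operators or {}
    exps = sorted(set(exponents) | {1})  # 1 is always included
    # Step 1: monomials of each single feature, tagged with the source column
    mono = [(f"x{j}^{e}", X[:, j] ** e, j)
            for j in range(X.shape[1]) for e in exps]
    # Step 2: every operator evaluated on every monomial
    ops = [(f"{name}({m})", fn(col))
           for name, fn in operators.items() for m, col, _ in mono]
    feats = [(n, c) for n, c, _ in mono] + ops
    # Step 3a: each operator feature times each monomial feature
    feats += [(f"{on}*{mn}", oc * mc)
              for (on, oc), (mn, mc, _) in product(ops, mono)]
    # Step 3b: monomial times monomial of a different feature (each pair once)
    feats += [(f"{a}*{b}", ca * cb)
              for a, ca, i in mono for b, cb, j in mono if i < j]
    names, cols = zip(*feats)
    return list(names), np.column_stack(cols)


# Usage: two input features, exponents 1 and 2, one operator
X = np.arange(1.0, 11.0).reshape(5, 2)
names, F = build_features(X, exponents=(1, 2), operators={"sin": np.sin})
```

For two input features, two exponents and one operator this produces 4 monomials, 4 operator features, 16 operator-monomial products and 4 mixed monomial products.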
For each Strategy, an elastic net optimizer will be run with many combinations of l1_ratio and alpha. An l1_ratio of 0 corresponds to ridge regression (only l2 penalty), an l1_ratio of 1 corresponds to LASSO regression (only l1 penalty). alpha determines the amount of regularization, where alpha=0 would mean no regularization and alpha -> infty would mean only regularization. For details on the used elastic net algorithm see sklearn.linear_model.ElasticNet.
The number of alphas is loosely determined by num_alphas (the actual number is close and never smaller). The maximum value of the considered alpha is determined dynamically based on Tibshirani’s “Strong Rules”, see https://doi.org/10.1111/j.1467-9868.2011.01004.x. The rule gives an alpha for which the fitted model will (in most relevant cases) have a complexity of 0 (no nonzero terms). This maximum alpha also depends on the l1_ratio, therefore the iteration over alpha takes place in the innermost loop.
The innermost loop iterates from the maximum alpha down to eps times the maximum alpha. With decreasing alpha, the complexity (number of non-zero terms) is expected to increase, whereas the cost (nrmse evaluated on the training set) is expected to decrease.
The innermost loop has three break conditions:
- train_score
If the cost is less than or equal to target_score
- complexity
If the complexity is greater than or equal to max_complexity
- saturation
No significant improvement in the cost during the last n_tail iterations. (Significant -> last 4 decimal digits)
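The innermost loop and its three break conditions can be sketched in plain Python. All names here (`alpha_grid`, `stop_reason`, `run_path`, and the `fit_and_score` callback) are hypothetical. The geometric spacing of the grid is an assumption, and the saturation test is one possible reading of "last 4 decimal digits": the cost, rounded to four decimals, has not improved over the last n_tail iterations.

```python
def alpha_grid(alpha_max, eps, num_alphas):
    """Grid from alpha_max down to eps * alpha_max (ASSUMES geometric spacing)."""
    if num_alphas < 2:
        return [alpha_max]
    return [alpha_max * eps ** (i / (num_alphas - 1)) for i in range(num_alphas)]


def stop_reason(costs, complexity, target_score, max_complexity, n_tail):
    """Return the name of the break condition that fires, or None."""
    if costs[-1] <= target_score:
        return "train_score"
    if complexity >= max_complexity:
        return "complexity"
    # Saturation (one interpretation): no improvement at 4-decimal precision
    # compared with the cost n_tail iterations ago.
    if len(costs) > n_tail and round(costs[-1], 4) >= round(costs[-1 - n_tail], 4):
        return "saturation"
    return None


def run_path(fit_and_score, alpha_max, eps=1e-5, num_alphas=30,
             target_score=0.01, max_complexity=50, n_tail=5):
    """Walk the alpha grid, collecting (alpha, cost, complexity) until a break."""
    path, costs = [], []
    for alpha in alpha_grid(alpha_max, eps, num_alphas):
        cost, complexity = fit_and_score(alpha)  # hypothetical callback
        costs.append(cost)
        path.append((alpha, cost, complexity))
        reason = stop_reason(costs, complexity, target_score, max_complexity, n_tail)
        if reason is not None:
            return path, reason
    return path, None


# Usage with a toy callback whose cost simply equals alpha
path, reason = run_path(lambda alpha: (alpha, 1), alpha_max=1.0)
```

In a real run, `fit_and_score` would fit an elastic net at the given alpha and return the training cost together with the number of non-zero coefficients.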
To obtain a single model from the Pareto front of models, the Akaike information criterion (AIC) is used (see https://en.wikipedia.org/wiki/Akaike_information_criterion). How it is used is determined by the decision parameter. If decision == 'min', the model with the smallest AIC is taken; if decision == 'weighted', the resulting model will be a linear combination of all models the front consists of, weighted by exp((min(AIC)-AIC)/2).
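The weighting formula exp((min(AIC)-AIC)/2) is easy to check by hand. The sketch below computes these weights for a list of AIC values; the helper name `akaike_weights` is hypothetical, and normalizing the weights so they sum to one is an assumption, since the text above does not say whether the package normalizes them.

```python
import math


def akaike_weights(aics, normalize=True):
    """Weights exp((min(AIC) - AIC) / 2), optionally normalized to sum to 1."""
    best = min(aics)
    weights = [math.exp((best - a) / 2) for a in aics]
    if normalize:  # ASSUMPTION: normalization is not stated in the docs above
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Usage: the model with the smallest AIC always gets the largest weight
aics = [10.0, 12.0, 20.0]
raw = akaike_weights(aics, normalize=False)
norm = akaike_weights(aics)
best_index = aics.index(min(aics))  # what decision == 'min' would pick
```

Subtracting the minimum AIC before exponentiating keeps the largest raw weight at exactly 1, which avoids underflow for large AIC values.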
score(x, y)[source]¶ Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
Parameters: - X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like, shape = [n_samples], optional) – Sample weights.
Returns: score – R^2 of self.predict(X) wrt. y.
Return type: float
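The definition of R^2 given above, 1 - u/v, can be written out directly. This is a minimal single-output sketch in plain Python; the helper name `r2_sketch` is hypothetical and sample weights are not handled.

```python
def r2_sketch(y_true, y_pred):
    """Coefficient of determination 1 - u/v for a single output."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)                # total sum of squares
    return 1 - u / v


# Usage: perfect prediction, constant mean prediction, and a worse-than-mean one
y = [1.0, 2.0, 3.0]
perfect = r2_sketch(y, [1.0, 2.0, 3.0])
constant = r2_sketch(y, [2.0, 2.0, 2.0])
reversed_pred = r2_sketch(y, [3.0, 2.0, 1.0])
```

The three cases illustrate the range described above: the best score, the score of a constant model predicting the mean, and a negative score.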
Notes
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').