hic3defdr.util.lowess module

hic3defdr.util.lowess.lowess_fit(x, y, logx=False, logy=False, left_boundary=None, right_boundary=None, frac=0.3, delta=0.01)
Opinionated convenience wrapper for lowess smoothing.
Parameters:
- x, y – The x and y values to fit, respectively.
- logx, logy – Pass True to perform the fit on the scale of log(x) and/or log(y), respectively.
- left_boundary, right_boundary – Allows specifying boundaries for the fit, in the original x space. If a float is passed, the fitted function returns the farthest-left or farthest-right lowess-estimated y_hat (from the original fitting set) for all points to the left of the left boundary or to the right of the right boundary, respectively. Pass None to use linear extrapolation for these points instead.
- frac (float) – The lowess smoothing fraction to use.
- delta (float) – Distance (on the scale of x or log(x)) within which to use linear interpolation when constructing the initial fit, expressed as a fraction of the range of x or log(x).
Returns: A function that takes x values on the original x scale and returns estimated y values on the original y scale (regardless of what is passed for logx and logy). It still returns sane estimates for y at points not in the original fitting set, by performing linear interpolation in the space the fit was performed in.
Return type: function
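The returned function can be pictured as linear interpolation carried out in the fitting space, with inputs and outputs mapped back to the original scales. The sketch below is not the hic3defdr implementation: make_interpolator and its arguments are hypothetical names, the smoothed values are assumed to be already available, and only the Python standard library is used so the example stays self-contained.

```python
import math
from bisect import bisect_right


def make_interpolator(x_fit, y_hat, logx=False, logy=False):
    """Return f(x) that interpolates linearly in the space the fit used.

    Hypothetical helper: x_fit must be sorted ascending with distinct
    values, and y_hat holds the smoothed values at those points, both
    given on the original scale.
    """
    # Move the fitted points into the fitting space.
    xs = [math.log(v) for v in x_fit] if logx else list(x_fit)
    ys = [math.log(v) for v in y_hat] if logy else list(y_hat)

    def f(x):
        t = math.log(x) if logx else x
        # Pick the segment containing t; the end segments are reused for
        # points outside the fitted range (linear extrapolation).
        i = min(max(bisect_right(xs, t) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        val = ys[i] + slope * (t - xs[i])
        # Map the estimate back to the original y scale.
        return math.exp(val) if logy else val

    return f


# A power law y = x**2 is a straight line in log-log space, so
# interpolating there recovers it between the fitted points.
f = make_interpolator([1.0, 10.0, 100.0], [1.0, 100.0, 10000.0],
                      logx=True, logy=True)
print(f(50.0))  # approximately 2500
```

Interpolating in the fitting space rather than the original space is what keeps the estimate on the straight log-log line in this example.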
Notes
No filtering of input values is performed; clients are expected to handle this if desired. NaN values should not break the function, but x points with zero values passed when logx is True are expected to break the function.

The default value of the delta parameter is set to be non-zero, matching the behavior of lowess smoothing in R and improving performance.

Linear interpolation between x-values in the original fitting set is used to provide a familiar functional interface to the fitted function.

Boundary conditions on the fitted function are exposed via left_boundary and right_boundary, mostly as a convenience for points where x == 0 when fitting was performed on the scale of log(x).

When left_boundary or right_boundary is None (the default), the fitted function is linearly extrapolated for points beyond the lowest and highest x-values in x.
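The boundary behavior described in these notes can be sketched as a thin wrapper around a fitted function. This is an illustration only: with_boundaries is a hypothetical name, the stand-in fit f is made up for the example, and whether the real comparison is strict (x < left_boundary) or inclusive is an assumption.

```python
import math


def with_boundaries(f, y_hat, left_boundary=None, right_boundary=None):
    """Wrap a fitted function f with constant values beyond the boundaries.

    Hypothetical helper: y_hat holds the smoothed values at the fitted
    points, ordered by ascending x.
    """
    def g(x):
        if left_boundary is not None and x < left_boundary:
            return y_hat[0]    # farthest-left fitted estimate
        if right_boundary is not None and x > right_boundary:
            return y_hat[-1]   # farthest-right fitted estimate
        return f(x)            # otherwise defer to the fit itself
    return g


# Stand-in for a fit performed on the scale of log(x): y = 2 / x.
# Evaluating it at x == 0 raises ValueError because log(0) is undefined.
def f(x):
    return math.exp(math.log(2.0) - math.log(x))


y_hat = [2.0, 1.0, 0.5]  # estimates at x = 1, 2, 4

g = with_boundaries(f, y_hat, left_boundary=1.0)
print(g(0.0))  # 2.0, taken from y_hat[0] without evaluating log(0)
print(g(2.0))  # 1.0, computed by the fit
```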
hic3defdr.util.lowess.weighted_lowess_fit(x, y, logx=False, logy=False, left_boundary=None, right_boundary=None, frac=None, auto_frac_factor=15.0, delta=0.01, w=20, power=0.25, interpolate_before_increase=True)
Performs lowess fitting as in lowess_fit(), but weights the data points automatically according to the precision in the y values, as estimated by a rolling window sample variance.

Points are weighted proportionally to a specified power (power) of their precision by adding duplicated points to the dataset. This should approximate the effects of a true weighted lowess fit, with the caveat that the weights are rounded, since a point can only be duplicated a whole number of times.

Weighting the data points according to this rolling window sample variance is probably only a good idea if the marginal distribution of x values is uniform.

Parameters:
- x, y – The x and y values to fit, respectively.
- logx, logy – Pass True to perform the fit on the scale of log(x) and/or log(y), respectively.
- left_boundary, right_boundary – Allows specifying boundaries for the fit, in the original x space. If a float is passed, the fitted function returns the farthest-left or farthest-right lowess-estimated y_hat (from the original fitting set) for all points to the left of the left boundary or to the right of the right boundary, respectively. Pass None to use linear extrapolation for these points instead.
- frac (float, optional) – The lowess smoothing fraction to use. Pass None to use the default: auto_frac_factor divided by the product of the average of the unscaled weights and the largest scaled weight.
- auto_frac_factor (float) – When frac is None, this factor scales the automatically determined fraction parameter.
- delta (float) – Distance (on the scale of x or log(x)) within which to use linear interpolation when constructing the initial fit, expressed as a fraction of the range of x or log(x).
- w (int) – The size of the rolling window to use when estimating the precision of the y values.
- power (float) – Precisions will be taken to this power to obtain unscaled weights.
- interpolate_before_increase (bool) – Hacky flag introduced to handle a quirk of Hi-C dispersion vs distance relationships, in which dispersion is elevated at extremely short distances. When True, this function will identify a group of points with the lowest x-values across which the y-value is monotonically decreasing. These points will be included in the variance estimation, but will be excluded from lowess fitting. Linear interpolation will be used at these x-values instead, since it is hard to convince lowess to follow a sharp change in the trend that is only supported by 3-4 data points out of 200-500 total data points, even with our best attempts at weighting. Pass False to perform a simple weighted lowess fit with no linear interpolation.
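The group of leading points that interpolate_before_increase refers to can be located with a short scan. The sketch below is an assumption about one reasonable way to do this, not the library's code: first_increase_index is a hypothetical name, strict decrease is assumed, and whether the turning point itself belongs to the interpolated group or to the lowess fit is left open.

```python
def first_increase_index(y):
    """Return the index at which y stops strictly decreasing.

    Hypothetical helper: y must be ordered by ascending x. Returns 0
    when y does not decrease at the start, so no points are set aside.
    """
    i = 0
    while i + 1 < len(y) and y[i + 1] < y[i]:
        i += 1
    return i


# Dispersion-like values: elevated at the shortest distances, then rising.
y = [5.0, 3.0, 2.0, 2.5, 3.0, 3.2]
k = first_increase_index(y)
print(k)      # 2
print(y[:k])  # [5.0, 3.0], the elevated points preceding the turning point
```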
Returns: A function that takes x values on the original x scale and returns estimated y values on the original y scale (regardless of what is passed for logx and logy). It still returns sane estimates for y at points not in the original fitting set, by performing linear interpolation in the space the fit was performed in.
Return type: function
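The duplication-based weighting that weighted_lowess_fit describes can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the hic3defdr implementation: duplication_counts and duplicate are hypothetical names, the rolling window is assumed to be centered and truncated at the ends, weights are assumed to be scaled so that the smallest maps to one copy, and windows with zero variance are not handled.

```python
from statistics import variance


def duplication_counts(y, w=20, power=0.25):
    """Integer repeat counts approximating precision**power weights.

    Hypothetical sketch: the window placement and the scaling of the
    weights are assumptions, not documented behavior. Requires w >= 2
    and non-constant windows (a zero variance would divide by zero).
    """
    half = w // 2
    weights = []
    for i in range(len(y)):
        window = y[max(0, i - half): i + half + 1]
        var = variance(window)                # rolling sample variance
        weights.append((1.0 / var) ** power)  # precision raised to power
    smallest = min(weights)
    # Scale so the least precise point appears once, then round.
    return [max(1, round(wt / smallest)) for wt in weights]


def duplicate(x, y, counts):
    """Repeat each (x, y) pair according to its integer count."""
    xs = [xi for xi, c in zip(x, counts) for _ in range(c)]
    ys = [yi for yi, c in zip(y, counts) for _ in range(c)]
    return xs, ys


# Noisy values on the left, tightly clustered values on the right.
x = list(range(8))
y = [0.0, 4.0, 0.0, 4.0, 1.0, 1.1, 1.0, 1.1]
counts = duplication_counts(y, w=4, power=0.25)
xs, ys = duplicate(x, y, counts)
print(counts)
print(len(xs))
```

The duplicated xs and ys could then be passed to an unweighted lowess routine; points in the low-variance region appear more often and so pull the fit harder.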