projections¶

class pybbda.analysis.projections.MarcelProjectionsBatting(stats_df=None, primary_pos_df=None)[source]¶

COMPUTED_METRICS = ['1B', '2B', '3B', 'HR', 'BB', 'HBP', 'SB', 'CS', 'SO', 'SH', 'SF']¶

LEAGUE_AVG_PT = 100¶

METRIC_WEIGHTS = (5, 4, 3)¶

NUM_REGRESSION_PLAYING_TIME = 200¶

PLAYING_TIME_COLUMN = 'PA'¶

PT_WEIGHTS = (0.5, 0.1, 0)¶

RECIPROCAL_AGE_METRICS = ['SO', 'CS']¶

REQUIRED_COLUMNS = ['AB', 'BB']¶

compute_playing_time_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)¶

computes the playing-time projection. metric_values, metric_weights, and seasonal_averages are not used; they are accepted only so that the signature matches compute_rate_projection
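A minimal sketch of the idea, assuming the commonly described Marcel playing-time formula: a weighted sum of prior seasons' playing time plus a fixed regression amount. The function name `playing_time_projection_sketch` is hypothetical and pybbda's actual implementation may differ; the default arguments reuse the PT_WEIGHTS and NUM_REGRESSION_PLAYING_TIME class constants listed above.

```python
def playing_time_projection_sketch(pt_values, pt_weights=(0.5, 0.1, 0), num_regression_pt=200):
    """Weighted sum of past playing time plus a regression constant.

    pt_values is ordered most recent season first (an assumption of this sketch).
    """
    if len(pt_values) != len(pt_weights):
        raise ValueError("pt_values and pt_weights must have the same length")
    weighted_pt = sum(w * pt for w, pt in zip(pt_weights, pt_values))
    return weighted_pt + num_regression_pt


# 600, 550 and 500 PA over the last three seasons:
# 0.5 * 600 + 0.1 * 550 + 0 * 500 + 200
projected_pa = playing_time_projection_sketch([600, 550, 500])
print(round(projected_pa))  # 555
```

With a zero weight on the third season, only the two most recent seasons influence the result in this sketch.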

Parameters

metric_values –
pt_values – playing time values
metric_weights –
pt_weights – playing time weights
seasonal_averages –
num_regression_pt – number of playing-time units to use for regression

Returns

compute_rate_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)¶

computes the rate projection. metric_values and metric_weights must have the same length. pt_weights is not used; it is accepted only so that the signature matches compute_playing_time_projection
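The following is a hedged sketch of a Marcel-style regressed rate, not pybbda's actual code. It assumes the rate is the weighted event total plus a league-average prior, divided by the weighted playing time plus the prior's playing time. The name `rate_projection_sketch` is hypothetical, a single `league_rate` stands in for the seasonal_averages array, and the numeric inputs are made up for illustration.

```python
def rate_projection_sketch(metric_values, pt_values, metric_weights, league_rate, num_regression_pt):
    """Regressed, weighted rate: (weighted events + prior events) / (weighted PT + prior PT)."""
    if not (len(metric_values) == len(pt_values) == len(metric_weights)):
        raise ValueError("values and weights must have the same length")
    weighted_events = sum(w * x for w, x in zip(metric_weights, metric_values))
    weighted_pt = sum(w * pt for w, pt in zip(metric_weights, pt_values))
    # The regression term acts like num_regression_pt units of league-average play.
    return (weighted_events + num_regression_pt * league_rate) / (weighted_pt + num_regression_pt)


hr_rate = rate_projection_sketch(
    metric_values=[30, 25, 20],  # HR, most recent season first
    pt_values=[600, 550, 500],   # PA in the same seasons
    metric_weights=(5, 4, 3),    # METRIC_WEIGHTS for batting
    league_rate=0.03,            # made-up league HR/PA, for illustration only
    num_regression_pt=1200,      # illustrative value, not taken from pybbda
)
print(f"{hr_rate:.4f}")
```

A player with no recorded playing time falls back to `league_rate` in this sketch, which is the point of the regression term.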

Parameters

metric_values – float array
pt_values – float array
metric_weights – float array
pt_weights –
seasonal_averages – float array
num_regression_pt – float

Returns

filter_non_representative_data(stats_df, primary_pos_df)[source]¶

filters pitchers-as-batters. primary_pos_df is a data frame containing playerID, yearID, and primaryPos
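As a rough illustration of this kind of filter, the sketch below drops batting rows for player-seasons whose primary position is pitcher. It uses plain lists of dicts in place of data frames to stay dependency-free. The function name, the player IDs, and the assumption that pitchers are coded as "P" in primaryPos are all hypothetical, so this is not necessarily how pybbda does it.

```python
def filter_pitchers_as_batters_sketch(stats_rows, primary_pos_rows, pitcher_code="P"):
    """Drop batting rows for player-seasons whose primary position is pitcher."""
    # Collect the (playerID, yearID) pairs that belong to pitchers.
    pitcher_seasons = {
        (row["playerID"], row["yearID"])
        for row in primary_pos_rows
        if row["primaryPos"] == pitcher_code
    }
    return [
        row for row in stats_rows
        if (row["playerID"], row["yearID"]) not in pitcher_seasons
    ]


batting = [
    {"playerID": "batter01", "yearID": 2019, "PA": 600},
    {"playerID": "pitcher01", "yearID": 2019, "PA": 60},
]
positions = [
    {"playerID": "batter01", "yearID": 2019, "primaryPos": "OF"},
    {"playerID": "pitcher01", "yearID": 2019, "primaryPos": "P"},
]
kept = filter_pitchers_as_batters_sketch(batting, positions)
print([row["playerID"] for row in kept])  # ['batter01']
```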

Parameters

stats_df – a data frame like Lahman batting
primary_pos_df – data frame

Returns

data frame

get_num_regression_pt(stats_df)¶

Parameters: stats_df – data frame
Returns: float

metric_projection(metric_name, projected_season)¶

returns the projection for metric_name.

Parameters

metric_name – str
projected_season – int

Returns

data frame

metric_projection_detail(metric_name, projected_season)¶

returns the projection result for metric_name, including the detailed components separately. The use case for the details is primarily debugging

Parameters

metric_name – str
projected_season – int

Returns

data frame

preprocess_data(stats_df)[source]¶

preprocesses the data.

Parameters: stats_df – a data frame like Lahman batting
Returns: data frame

projections(projected_season, computed_metrics=None)¶

returns projections for all metrics in computed_metrics. If computed_metrics is None it uses the default set.

Parameters

projected_season – int
computed_metrics – list(str)

Returns

data frame

seasonal_average(stats_df, metric_name, playing_time_column)¶

seasonal average rate of metric_name

Parameters

stats_df – data frame
metric_name – str
playing_time_column – str

Returns

data frame
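One plausible reading of "seasonal average rate" is the league-wide total of the metric divided by the league-wide total of playing time, computed per season. The sketch below implements that reading with plain Python records. The function name is hypothetical, the rows are made up, and the choice of yearID as the grouping column and of pooled totals (rather than a mean of per-player rates) are assumptions.

```python
from collections import defaultdict


def seasonal_average_sketch(stats_rows, metric_name, playing_time_column):
    """League-wide rate of metric_name per unit of playing time, keyed by season."""
    totals = defaultdict(lambda: [0, 0])  # yearID -> [metric total, playing-time total]
    for row in stats_rows:
        totals[row["yearID"]][0] += row[metric_name]
        totals[row["yearID"]][1] += row[playing_time_column]
    # Skip seasons with no playing time to avoid dividing by zero.
    return {year: metric / pt for year, (metric, pt) in totals.items() if pt > 0}


rows = [
    {"yearID": 2018, "HR": 20, "PA": 500},
    {"yearID": 2018, "HR": 10, "PA": 500},
    {"yearID": 2019, "HR": 30, "PA": 600},
]
averages = seasonal_average_sketch(rows, "HR", "PA")
print(averages)
```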

validate_data(stats_df)¶

class pybbda.analysis.projections.MarcelProjectionsPitching(stats_df=None, primary_pos_df=None)[source]¶

COMPUTED_METRICS = ['H', 'HR', 'ER', 'BB', 'SO', 'HBP', 'R']¶

LEAGUE_AVG_PT = 134¶

METRIC_WEIGHTS = (3, 2, 1)¶

NUM_REGRESSION_PLAYING_TIME = None¶

PLAYING_TIME_COLUMN = 'IPouts'¶

PT_WEIGHTS = (0.5, 0.1, 0)¶

RECIPROCAL_AGE_METRICS = ['H', 'HR', 'ER', 'BB', 'HBP', 'R']¶

REQUIRED_COLUMNS = ['IPouts']¶

compute_playing_time_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)¶

computes the playing-time projection. metric_values, metric_weights, and seasonal_averages are not used; they are accepted only so that the signature matches compute_rate_projection

Parameters

metric_values –
pt_values – playing time values
metric_weights –
pt_weights – playing time weights
seasonal_averages –
num_regression_pt – number of playing-time units to use for regression

Returns

compute_rate_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)¶

computes the rate projection. metric_values and metric_weights must have the same length. pt_weights is not used; it is accepted only so that the signature matches compute_playing_time_projection

Parameters

metric_values – float array
pt_values – float array
metric_weights – float array
pt_weights –
seasonal_averages – float array
num_regression_pt – float

Returns

filter_non_representative_data(stats_df, primary_pos_df)[source]¶

filters batters-as-pitchers. primary_pos_df is a data frame containing playerID, yearID, and primaryPos

Parameters

stats_df – data frame like Lahman pitching
primary_pos_df – data frame

Returns

data frame

get_num_regression_pt(stats_df)[source]¶

gets the number of batters-faced for the regression component, computed as a function of the fraction of games as a starter.

Parameters: stats_df – data frame like Lahman pitching
Returns: numpy array
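To illustrate how a regression amount could depend on the starter fraction, here is a minimal sketch that interpolates linearly between a reliever value and a starter value. Everything in it is an assumption: the function name is hypothetical, the linear form is a guess, and the defaults (75 and 180) are 25 and 60 innings expressed in outs, the constants commonly quoted for Marcel pitcher playing time. pybbda may use different values or units, and it returns an array covering many pitchers, whereas this sketch handles one.

```python
def num_regression_pt_sketch(games, games_started, reliever_pt=75, starter_pt=180):
    """Interpolate the regression playing time between reliever and starter values."""
    # Fraction of appearances that were starts; treat a pitcher with no games as a reliever.
    starter_fraction = games_started / games if games > 0 else 0.0
    return reliever_pt + (starter_pt - reliever_pt) * starter_fraction


print(num_regression_pt_sketch(games=30, games_started=30))  # 180.0 (pure starter)
print(num_regression_pt_sketch(games=60, games_started=0))   # 75.0 (pure reliever)
print(num_regression_pt_sketch(games=40, games_started=10))  # 101.25 (swingman)
```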

metric_projection(metric_name, projected_season)¶

returns the projection for metric_name.

Parameters

metric_name – str
projected_season – int

Returns

data frame

metric_projection_detail(metric_name, projected_season)¶

returns the projection result for metric_name, including the detailed components separately. The use case for the details is primarily debugging

Parameters

metric_name – str
projected_season – int

Returns

data frame

preprocess_data(stats_df)[source]¶

preprocesses the data.

Parameters: stats_df – data frame like Lahman pitching
Returns: data frame

projections(projected_season, computed_metrics=None)¶

returns projections for all metrics in computed_metrics. If computed_metrics is None it uses the default set.

Parameters

projected_season – int
computed_metrics – list(str)

Returns

data frame

seasonal_average(stats_df, metric_name, playing_time_column)¶

seasonal average rate of metric_name

Parameters

stats_df – data frame
metric_name – str
playing_time_column – str

Returns

data frame

validate_data(stats_df)¶