projections

class pybbda.analysis.projections.MarcelProjectionsBatting(stats_df=None, primary_pos_df=None)
COMPUTED_METRICS = ['1B', '2B', '3B', 'HR', 'BB', 'HBP', 'SB', 'CS', 'SO', 'SH', 'SF']
LEAGUE_AVG_PT = 100
METRIC_WEIGHTS = (5, 4, 3)
NUM_REGRESSION_PLAYING_TIME = 200
PLAYING_TIME_COLUMN = 'PA'
PT_WEIGHTS = (0.5, 0.1, 0)
RECIPROCAL_AGE_METRICS = ['SO', 'CS']
REQUIRED_COLUMNS = ['AB', 'BB']
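RECIPROCAL_AGE_METRICS suggests that for some metrics (SO and CS for batters), where a higher value is worse for the player, the aging adjustment is applied inversely. A minimal sketch of that idea follows. It assumes a Marcel-style age rule with a baseline age of 29, a 0.6% gain per year younger, and a 0.3% loss per year older. These constants and the names age_factor and age_adjust are assumptions for illustration, not pybbda's implementation.

```python
# Hypothetical sketch of a Marcel-style aging adjustment; not pybbda's code.
PEAK_AGE = 29        # assumed baseline age
YOUNG_RATE = 0.006   # assumed gain per year younger than PEAK_AGE
OLD_RATE = 0.003     # assumed loss per year older than PEAK_AGE

RECIPROCAL_AGE_METRICS = ["SO", "CS"]  # copied from MarcelProjectionsBatting


def age_factor(age: float) -> float:
    """Multiplicative aging factor: above 1 for young players, below 1 for old ones."""
    if age < PEAK_AGE:
        return 1.0 + (PEAK_AGE - age) * YOUNG_RATE
    return 1.0 - (age - PEAK_AGE) * OLD_RATE


def age_adjust(metric_name: str, rate: float, age: float) -> float:
    """Apply the aging factor, inverting it for metrics where more is worse."""
    factor = age_factor(age)
    if metric_name in RECIPROCAL_AGE_METRICS:
        # Improvement with youth means *fewer* strikeouts / caught stealings.
        return rate / factor
    return rate * factor
```

For a 25-year-old the factor is 1.024 under these assumptions, so a home-run rate is scaled up by 2.4% while a strikeout rate is divided by 1.024.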
compute_playing_time_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)

Computes the playing-time projection. metric_values, metric_weights, and seasonal_averages are not used; they are accepted only so that the signature matches compute_rate_projection.
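A minimal sketch of a Marcel-style playing-time projection, assuming the projection is a weighted sum of recent seasons' playing time (most recent season first) plus a fixed baseline. The function name and the baseline_pt parameter are hypothetical; how pybbda derives its baseline (for example from LEAGUE_AVG_PT or num_regression_pt) is not stated in this reference.

```python
import numpy as np


def playing_time_projection_sketch(pt_values, pt_weights, baseline_pt):
    """Hypothetical: weighted sum of past playing time plus a constant baseline.

    pt_values and pt_weights are ordered most recent season first.
    """
    pt_values = np.asarray(pt_values, dtype=float)
    pt_weights = np.asarray(pt_weights, dtype=float)
    if pt_values.shape != pt_weights.shape:
        raise ValueError("pt_values and pt_weights must have the same shape")
    return float(np.dot(pt_weights, pt_values) + baseline_pt)
```

With PT_WEIGHTS = (0.5, 0.1, 0), seasons of 600, 550, and 500 PA, and an illustrative baseline of 200, this gives 0.5*600 + 0.1*550 + 0*500 + 200 = 555 PA.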

Parameters
  • metric_values – unused

  • pt_values – playing time values

  • metric_weights – unused

  • pt_weights – playing time weights

  • seasonal_averages – unused

  • num_regression_pt – number of playing-time units to use for regression

Returns

compute_rate_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)

Computes the rate projection. The value arrays (metric_values, pt_values) and metric_weights must have the same length. pt_weights is not used; it is accepted only so that the signature matches compute_playing_time_projection.
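The regression toward the mean behind a Marcel rate projection can be sketched as below. This assumes the usual Marcel form: the weighted metric totals are combined with num_regression_pt units of playing time at a league-average rate, projected rate = (Σ w_i·m_i + N·r̄) / (Σ w_i·p_i + N). The choice of r̄ as the seasonal_averages weighted by metric_weights × pt_values is an assumption, and rate_projection_sketch is a hypothetical name; this is an illustration, not pybbda's source. pt_weights is omitted because the method does not use it.

```python
import numpy as np


def rate_projection_sketch(metric_values, pt_values, metric_weights,
                           seasonal_averages, num_regression_pt):
    """Hypothetical Marcel-style regressed rate per unit of playing time.

    All arrays are ordered most recent season first and must share a length.
    """
    m = np.asarray(metric_values, dtype=float)
    pt = np.asarray(pt_values, dtype=float)
    w = np.asarray(metric_weights, dtype=float)
    avg = np.asarray(seasonal_averages, dtype=float)
    if not (m.shape == pt.shape == w.shape == avg.shape):
        raise ValueError("all arrays must have the same length")

    weighted_pt = np.dot(w, pt)       # sum_i w_i * pt_i
    weighted_metric = np.dot(w, m)    # sum_i w_i * m_i
    if weighted_pt > 0:
        # Assumed league reference rate: averages weighted by w_i * pt_i.
        league_rate = np.dot(w * pt, avg) / weighted_pt
    else:
        # No playing time: fall back to the metric-weighted league average.
        league_rate = np.dot(w, avg) / w.sum()
    # Add num_regression_pt units of playing time at the league rate.
    numerator = weighted_metric + num_regression_pt * league_rate
    denominator = weighted_pt + num_regression_pt
    return float(numerator / denominator)
```

Under this form, a player with little playing time is pulled almost entirely to the league rate, and the pull fades as Σ w_i·p_i grows relative to num_regression_pt.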

Parameters
  • metric_values – float array

  • pt_values – float array

  • metric_weights – float array

  • pt_weights – unused

  • seasonal_averages – float array

  • num_regression_pt – float

Returns

filter_non_representative_data(stats_df, primary_pos_df)

Filters out pitchers-as-batters. primary_pos_df is a data frame containing the columns playerID, yearID, and primaryPos.
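One way to express this filter with pandas, assuming primary_pos_df has one row per playerID and yearID and that pitchers are coded as "P" in primaryPos. The "P" code and the function name drop_pitchers_batting are assumptions, not taken from pybbda. Player-seasons with no primaryPos entry are kept.

```python
import pandas as pd


def drop_pitchers_batting(stats_df: pd.DataFrame,
                          primary_pos_df: pd.DataFrame,
                          pitcher_code: str = "P") -> pd.DataFrame:
    """Hypothetical: remove batting rows for player-seasons whose primary position is pitcher."""
    merged = stats_df.merge(
        primary_pos_df[["playerID", "yearID", "primaryPos"]],
        on=["playerID", "yearID"],
        how="left",  # keep every batting row, even without a position entry
    )
    keep = merged["primaryPos"] != pitcher_code  # NaN (no match) compares as not equal, so it is kept
    return merged.loc[keep].drop(columns="primaryPos").reset_index(drop=True)
```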

Parameters
  • stats_df – a data frame like Lahman batting

  • primary_pos_df – data frame

Returns

get_num_regression_pt(stats_df)
Parameters

stats_df – data frame

Returns

float

metric_projection(metric_name, projected_season)

Returns the projection for metric_name.

Parameters
  • metric_name – str

  • projected_season – int

Returns

data frame

metric_projection_detail(metric_name, projected_season)

Returns the projection for metric_name with each component of the calculation reported separately. The detailed output is intended primarily for debugging.

Parameters
  • metric_name – str

  • projected_season – int

Returns

data frame

preprocess_data(stats_df)

Preprocesses the data.

Parameters

stats_df – a data frame like Lahman batting

Returns

data frame

projections(projected_season, computed_metrics=None)

Returns projections for all metrics in computed_metrics. If computed_metrics is None, the default set (COMPUTED_METRICS) is used.

Parameters
  • projected_season – int

  • computed_metrics – list(str)

Returns

data frame

seasonal_average(stats_df, metric_name, playing_time_column)

Computes the seasonal average rate of metric_name per unit of playing_time_column.
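A minimal sketch of a seasonal average rate, assuming it is the league-wide total of metric_name divided by the total of playing_time_column for each yearID (a ratio of sums rather than a mean of individual player rates). The function name and the output column name are hypothetical, and pybbda's returned layout may differ.

```python
import pandas as pd


def seasonal_average_sketch(stats_df, metric_name, playing_time_column):
    """Hypothetical: per-season ratio of summed metric to summed playing time."""
    totals = stats_df.groupby("yearID")[[metric_name, playing_time_column]].sum()
    rates = totals[metric_name] / totals[playing_time_column]
    # Return a data frame with columns yearID and <metric_name>_rate.
    return rates.rename(f"{metric_name}_rate").reset_index()
```

A ratio of sums weights each player by playing time, so a handful of plate appearances cannot swing the league rate the way a simple mean of player rates could.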

Parameters
  • stats_df – data frame

  • metric_name – str

  • playing_time_column – str

Returns

data frame

validate_data(stats_df)

class pybbda.analysis.projections.MarcelProjectionsPitching(stats_df=None, primary_pos_df=None)
COMPUTED_METRICS = ['H', 'HR', 'ER', 'BB', 'SO', 'HBP', 'R']
LEAGUE_AVG_PT = 134
METRIC_WEIGHTS = (3, 2, 1)
NUM_REGRESSION_PLAYING_TIME = None
PLAYING_TIME_COLUMN = 'IPouts'
PT_WEIGHTS = (0.5, 0.1, 0)
RECIPROCAL_AGE_METRICS = ['H', 'HR', 'ER', 'BB', 'HBP', 'R']
REQUIRED_COLUMNS = ['IPouts']
compute_playing_time_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)

Computes the playing-time projection. metric_values, metric_weights, and seasonal_averages are not used; they are accepted only so that the signature matches compute_rate_projection.

Parameters
  • metric_values – unused

  • pt_values – playing time values

  • metric_weights – unused

  • pt_weights – playing time weights

  • seasonal_averages – unused

  • num_regression_pt – number of playing-time units to use for regression

Returns

compute_rate_projection(metric_values, pt_values, metric_weights, pt_weights, seasonal_averages, num_regression_pt)

Computes the rate projection. The value arrays (metric_values, pt_values) and metric_weights must have the same length. pt_weights is not used; it is accepted only so that the signature matches compute_playing_time_projection.

Parameters
  • metric_values – float array

  • pt_values – float array

  • metric_weights – float array

  • pt_weights – unused

  • seasonal_averages – float array

  • num_regression_pt – float

Returns

filter_non_representative_data(stats_df, primary_pos_df)

Filters out batters-as-pitchers. primary_pos_df is a data frame containing the columns playerID, yearID, and primaryPos.

Parameters
  • stats_df – data frame like Lahman pitching

  • primary_pos_df – data frame

Returns

data frame

get_num_regression_pt(stats_df)

Gets the number of batters faced used for the regression component, computed as a function of the fraction of games in which the pitcher was the starter.
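The description suggests that the regression amount moves between a reliever value and a starter value according to the share of games started. A minimal sketch, assuming a linear interpolation on GS / G using the Lahman pitching columns G and GS. The function name and the default endpoints reliever_pt and starter_pt are hypothetical placeholders, not pybbda's values.

```python
import numpy as np
import pandas as pd


def num_regression_pt_sketch(stats_df: pd.DataFrame,
                             reliever_pt: float = 100.0,   # hypothetical endpoint
                             starter_pt: float = 300.0     # hypothetical endpoint
                             ) -> np.ndarray:
    """Hypothetical: regression playing time per row, interpolated on fraction of games started."""
    games = stats_df["G"].to_numpy(dtype=float)
    starts = stats_df["GS"].to_numpy(dtype=float)
    # Fraction of appearances as a starter; rows with G == 0 are treated as relievers.
    frac_start = np.divide(starts, games, out=np.zeros_like(games), where=games > 0)
    return reliever_pt + (starter_pt - reliever_pt) * frac_start
```

A pure starter receives starter_pt, a pure reliever receives reliever_pt, and a swingman who started a quarter of his games lands a quarter of the way between them.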

Parameters

stats_df – data frame like Lahman pitching

Returns

numpy array

metric_projection(metric_name, projected_season)

Returns the projection for metric_name.

Parameters
  • metric_name – str

  • projected_season – int

Returns

data frame

metric_projection_detail(metric_name, projected_season)

Returns the projection for metric_name with each component of the calculation reported separately. The detailed output is intended primarily for debugging.

Parameters
  • metric_name – str

  • projected_season – int

Returns

data frame

preprocess_data(stats_df)

Preprocesses the data.

Parameters

stats_df – data frame like Lahman pitching

Returns

data frame

projections(projected_season, computed_metrics=None)

Returns projections for all metrics in computed_metrics. If computed_metrics is None, the default set (COMPUTED_METRICS) is used.

Parameters
  • projected_season – int

  • computed_metrics – list(str)

Returns

data frame

seasonal_average(stats_df, metric_name, playing_time_column)

Computes the seasonal average rate of metric_name per unit of playing_time_column.

Parameters
  • stats_df – data frame

  • metric_name – str

  • playing_time_column – str

Returns

data frame

validate_data(stats_df)