An Exploration of the Open Powerlifting dataset¶

Andrew Lau
There's a great powerlifting data set that collates hundreds of thousands of powerlifting competition results. I thought I'd do a bit of exploration and fit a quick and dirty XGBoost model to see if I can gather some insight on the impact on the total lifted in competition of:

age, particularly when people tend to "peak" in their lifting careers
bodyweight
gender
equipment

I'm curious about the impacts of the above features on the typical powerlifter. Of course, one of the issues with this dataset is the self-selection bias of lifters (e.g. lifters tend to compete when they feel strong/uninjured). Particularly when looking at the impact of age, older lifters may just choose not to lift after years of accumulated injuries and declining performance and may not appear in the dataset.

The problem here is rather straightforward (there is a lot of clean data (>2m rows), there aren't that many features available) and I'm just curious about the "pure effect" of the features on the target. To make the problem even simpler, the general impact of features like age, gender, drug testing and bodyweight are known within the powerlifting community (e.g. we know that in general males lift more and heavier lifters lift more).

Thus, a quick and dirty model will suffice (e.g. I will just use the tried and tested XGBoost with no feature engineering, basic pre-processing, minimal hyperparameter tuning).

Data sourced from: https://www.openpowerlifting.org/

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import seaborn as sns
from sklearn.impute import SimpleImputer
import numpy as np
from sklearn.metrics import mean_squared_error
import shap
import statsmodels.api as sm
from sklearn.model_selection import RandomizedSearchCV
import pickle

pd.set_option('display.max_columns', None)

N_JOBS = 6

# import the data
data = pd.read_csv("openpowerlifting-2021-01-20.csv",
                  low_memory=False)

data.shape

(2301966, 41)

display(data)

Exploratory data analysis¶

Let's have a play around with the data and see what it looks like.

Target distribution¶

Let's take a cheeky look at a random subset of the data.

data_EDA = data.sample(n=100_000, random_state=1)

data_EDA.TotalKg.hist()

<AxesSubplot:>

# Let's check out are target, the comp total
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(15,5))

#, legend=True after updating pandas
data_EDA.Best3SquatKg.hist(ax=ax1)
data_EDA.Best3BenchKg.hist(ax=ax2)
data_EDA.Best3DeadliftKg.hist(ax=ax3)
data_EDA.TotalKg.hist(ax=ax4)

<AxesSubplot:>

Distribution of the target as well as the best bench, squat and deadlift is consistent with my anectoal experience and seems reasonable.

data.describe()
# there seems to be negatives in the weights lifted. I'm guessing the negatives are bad lifts

One-way analysis¶

Let's plot some one-way charts to give us a feel of certain features on the target. Of course, these won't be "pure effects" as such, as they will be confounding by correlated variables (mix effects - for example a one-way chart of total lifted vs. weight may overestimate the pure effect of weight as higher weights are correlated with being male).

def one_way(x_var, y_var, data, axis):
    ax = sns.regplot(x=x_var, y=y_var, data=data, lowess=True, scatter_kws={"alpha":0.1},
                     line_kws={"color": "red"}, ax=axis)

Let's have a look at the impact of some features on amount lifted

data_subset = data_EDA.loc[data_EDA.Tested == "Yes"].sample(n=25_000, random_state=1)

fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8), (ax9, ax10, ax11, ax12)) = plt.subplots(3, 4, figsize=(17.5,10))

one_way("Age", "Best3SquatKg", data=data_subset[data_subset.Best3SquatKg > 0],
       axis=ax1)
one_way("BodyweightKg", "Best3SquatKg", data=data_subset[data_subset.Best3SquatKg > 0],
       axis=ax2)
sns.boxplot(x="Sex", y="Best3SquatKg", data=data_subset[data_subset.Best3SquatKg > 0], ax=ax3)
sns.boxplot(x="Equipment", y="Best3SquatKg", data=data_subset[data_subset.Best3SquatKg > 0], ax=ax4)


one_way("Age", "Best3BenchKg", data=data_subset[data_subset.Best3BenchKg > 0],
       axis=ax5)
one_way("BodyweightKg", "Best3BenchKg", data=data_subset[data_subset.Best3BenchKg > 0],
       axis=ax6)
sns.boxplot(x="Sex", y="Best3BenchKg", data=data_subset[data_subset.Best3BenchKg > 0], ax=ax7)
sns.boxplot(x="Equipment", y="Best3BenchKg", data=data_subset[data_subset.Best3BenchKg > 0], ax=ax8)

one_way("Age", "Best3DeadliftKg", data=data_subset[data_subset.Best3DeadliftKg > 0],
       axis=ax9)
one_way("BodyweightKg", "Best3DeadliftKg", data=data_subset[data_subset.Best3DeadliftKg > 0],
       axis=ax10)
sns.boxplot(x="Sex", y="Best3DeadliftKg", data=data_subset[data_subset.Best3DeadliftKg > 0], ax=ax11)
sns.boxplot(x="Equipment", y="Best3DeadliftKg", data=data_subset[data_subset.Best3DeadliftKg > 0], ax=ax12)

plt.show()

The average lifter appears to peak around the age of 30, before declining over time. The deterioration as the average lifter ages is better than what I was expecting. As expected bodyweight has a significant impact on weight lifted. Again, I note that these are averages and are impacted by conflating/correlated variables. As expected, males tend to lift more and equipped lifters also tend to lift more.

Pre-processing¶

Feature selection and missing data imputation¶

I'm just fitting a quick and dirty model here, so I'm removing duplicated/redudant features and not bothering with any feature engineering.

features_list = ['Age',
#  'AgeClass',
#  'Bench1Kg',
#  'Bench2Kg',
#  'Bench3Kg',
#  'Bench4Kg',
#  'Best3BenchKg',
#  'Best3DeadliftKg',
#  'Best3SquatKg',
#  'BirthYearClass',
 'BodyweightKg',
 'Country',
#  'Date',
#  'Deadlift1Kg',
#  'Deadlift2Kg',
#  'Deadlift3Kg',
#  'Deadlift4Kg',
#  'Division',
#  'Dots',
 'Equipment',
 'Event',
#  'Federation',
#  'Glossbrenner',
#  'Goodlift',
#  'MeetCountry',
#  'MeetName',
#  'MeetState',
#  'MeetTown',
#  'Name',
 'ParentFederation',
#  'Place',
 'Sex',
#  'Squat1Kg',
#  'Squat2Kg',
#  'Squat3Kg',
#  'Squat4Kg',
#  'State',
#  'T',
 'Tested',
#  'TotalKg',
#  'WeightClassKg',
#  'Wilks'
                ]

# feature selection
X = data[features_list]

# one hot encoding on full dataset to avoid categorical levels not in training but in valid/test
X = pd.get_dummies(X, drop_first=True)
X_col_names = X.columns

y = data["TotalKg"]

# removing rows where there isn't a total, the age and bodyeight are invalid
y_notna = ~y.isna()
X_valid_age = X.Age > 10
X_valid_weight = X.BodyweightKg > 20

mask = y_notna & X_valid_age & X_valid_weight

y = y[mask]
X = X[mask]

print("na in y_train:", y.isna().sum())
print("na in X_train:", np.sum(np.isnan(X)))

na in y_train: 0
na in X_train: Age                       0
BodyweightKg              0
Country_Afghanistan       0
Country_Algeria           0
Country_American Samoa    0
                         ..
ParentFederation_WRPF     0
ParentFederation_WUAP     0
ParentFederation_XPC      0
Sex_M                     0
Sex_Mx                    0
Length: 224, dtype: int64

# train test split
# 20% test, 20% validation, 60% training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

data.shape

(2301966, 41)

# imputing missing values. not needed for XGB, used for testing out other ML models that can't handle missing data
# imp = SimpleImputer(missing_values=np.nan, strategy='median')
# X_train = imp.fit_transform(X_train)
# X_valid = imp.fit_transform(X_valid)
# X_test = imp.fit_transform(X_test)

Fitting a model - XGBoost¶

We now fit a model to see get a better understanding of the "pure effects" of the features. There is quite a bit of data and the problem is fairly straightforward, so we just default to the trusty XGBoost model. We conduct a light random gridsearch. Early stopping rounds and a validation data set were used to help find the optimal number of estimators at each fit.

# fitted this earlier on Google Cloud Platform
# xgb_1 = XGBRegressor(objective="reg:squarederror", n_estimators=500, n_jobs=16)
# eval_set = [(X_valid, y_valid)]


# gbm_param_grid = {
#     'max_depth': [2, 6],
#     'learning_rate': [0.01, 0.1],
#     'gamma': [0, 0.5, 2], 
#     'colsample_bytree':[0.8, 1]
# }

# fit_params={"early_stopping_rounds":25,
#             "eval_set" : [[X_valid, y_valid]]
#            }

# # Perform random search: grid_mse
# random_search = RandomizedSearchCV(xgb_1, 
#                                    cv=4,
#                                    param_distributions=gbm_param_grid, 
#                                    n_iter=5,
#                                    n_jobs=2,
#                                    verbose=False, 
#                                    random_state=123,
#                                   )

# random_search.fit(X_train, y_train, **fit_params)

# pickle.dump(random_search, open("random_search.pickle", 'wb'))
random_search = pickle.load(open("random_search.pickle", 'rb'))

random_search.best_estimator_.get_params()

{'objective': 'reg:squarederror',
 'base_score': 0.5,
 'booster': 'gbtree',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'gamma': 0,
 'gpu_id': -1,
 'importance_type': 'gain',
 'interaction_constraints': '',
 'learning_rate': 0.1,
 'max_delta_step': 0,
 'max_depth': 6,
 'min_child_weight': 1,
 'missing': nan,
 'monotone_constraints': '()',
 'n_estimators': 500,
 'n_jobs': 16,
 'num_parallel_tree': 1,
 'random_state': 0,
 'reg_alpha': 0,
 'reg_lambda': 1,
 'scale_pos_weight': 1,
 'subsample': 1,
 'tree_method': 'exact',
 'validate_parameters': 1,
 'verbosity': None}

¶

Diagnostics¶

MSE and MAPE¶

best_iter = random_search.best_estimator_.best_iteration
y_train_pred = random_search.best_estimator_.predict(X_train, ntree_limit=best_iter)
y_valid_pred = random_search.best_estimator_.predict(X_valid, ntree_limit=best_iter)
y_test_pred = random_search.best_estimator_.predict(X_test, ntree_limit=best_iter)

print("MSE on training data:", mean_squared_error(y_train, y_train_pred))
print("MSE on validation data:", mean_squared_error(y_valid, y_valid_pred))
print("MSE on test data:", mean_squared_error(y_test, y_test_pred))

def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(abs(y_true / y_pred - 1))

print("MAPE on training data:", mean_absolute_percentage_error(y_train, y_train_pred))
print("MAPE on validation data:", mean_absolute_percentage_error(y_valid, y_valid_pred))
print("MAPE on test data:", mean_absolute_percentage_error(y_test, y_test_pred))

MSE on training data: 4349.150095184332
MSE on validation data: 4446.135423677414
MSE on test data: 4410.636658830921
MAPE on training data: 0.13601618227032103
MAPE on validation data: 0.13763610656733644
MAPE on test data: 0.13728857659985094

The mean squared error for the validation/test dataset aren't significantly worse than the the training data. No signs of overfitting here. MSE is a little hard to interpret, so we also look at MAPE (mean absolute percentage error) to get a feel of the average percentage prediction error. Around 13-14% on all splits of the data, seems like a reasonable model.

One-ways by key factors¶

We compare the average fitted and actuals by key factors to see if there are any segments that do not fit well. The fit looks quite good. The fit deteriorates a bit for higher ages/weights, but they are low exposure segments.

pd.DataFrame({"factor":X_valid.loc[:, "Age"].round(), "y_true":y_valid, "y_pred":y_valid_pred}).groupby("factor").mean().plot()
pd.DataFrame({"factor":X_valid.loc[:, "BodyweightKg"].round(), "y_true":y_valid, "y_pred":y_valid_pred}).groupby("factor").mean().plot()
pd.DataFrame({"factor":X_valid.loc[:, "Sex_M"], "y_true":y_valid, "y_pred":y_valid_pred}).groupby("factor").mean().plot()

<AxesSubplot:xlabel='factor'>

Quantile plot¶

The next diagnostic we look at is the quantile plot. We compare predicted and actuals by quantiles.

def quantile_plot(y_true, y_pred, quantiles=10, scale=1.0, title=None, return_df=False):
    """ 
    description: plots a quantile plot based on sorting observations by y_pred
        and grouping them into quantiles.
    inputs:
        y_true - array like object of actuals
        y_pred - array like object of predicted values
        quantiles - number of quantiles/bins to separate the data
        scale - the predicted values can be scaled to focus on the ability of
            the model to rank the observations
        title - plot title
        return_df - False. Whether to return a pandas dataframe with the quantiles
    returns:
        DataFrame with y_true and y_pred grouped into quantiles sorted by y_pred
    """
    if scale == "auto":
        scale_used = sum(y_true) / sum(y_pred)
    else:
        scale_used = scale

    df = pd.DataFrame({'Actual':np.array(y_true).reshape(-1),
                                  'Predicted':np.array(y_pred).reshape(-1) * scale_used})
    df = df.groupby(pd.qcut(df.rank(method="first").loc[:, "Predicted"], quantiles)).mean()
    df.loc[:, "Quantile"] = (list(range(1, quantiles + 1)))
    df = df.set_index("Quantile")

    df.plot()
    plt.gca().set_ylabel("Target")    
    plt.xticks(rotation = 90)
    
    if title:
        plt.title(title)    
    else:
        plt.title("Predicted vs. Observed (" + str(quantiles) + " quantiles)")

    if return_df:
        return df

quantile_plot(y_train, y_train_pred, quantiles=100, scale=1.0, title="training")
quantile_plot(y_valid, y_valid_pred, quantiles=100, scale=1.0, title="validation")
quantile_plot(y_valid, y_valid_pred, quantiles=100, scale=1.0, title="test")

Quantile plot looks good - the predictions are close to the actuals in each quantile and there is decent separation between the high and low quantiles

Lorenz curve¶

def gini_score(y_true, y_pre, **kwargs):
    """
    description: model scoring metric. returns gini index.
    input: y_true, y_pre
    returns: gini score
    adapted from: https://zhiyzuo.github.io/Plot-Lorenz/
    """
    sorted_arr = np.array([x for _, x in sorted(zip(y_true, y_pre))])
    n = sorted_arr.size
    coef_ = 2. / n
    const_ = (n + 1.) / n
    weighted_sum = sum([(i + 1) * yi for i, yi in enumerate(sorted_arr)])

    return coef_ * weighted_sum / (sorted_arr.sum()) - const_
def lorenz_curve(y_true, y_pred):
    """
    description: plots the lorenz curve based on y_true and y_pred.
    input: y_true, y_pre
    adapted from: https://zhiyzuo.github.io/Plot-Lorenz/
    """
    X = np.array([x for _,x in sorted(zip(y_true, y_pred))])

    X_lorenz = X.cumsum() / X.sum()
    X_lorenz = np.insert(X_lorenz, 0, 0)
    X_lorenz[0], X_lorenz[-1]
    fig, ax = plt.subplots(figsize=[6,6])
    ## scatter plot of Lorenz curve
    ax.scatter(np.arange(X_lorenz.size)/(X_lorenz.size-1), X_lorenz,
               marker='x', color='orange', s=5, label="Predicted Separation")

    # repeat for perfect separation
    X = np.array([x for _,x in sorted(zip(y_true, y_true))])

    X_lorenz = X.cumsum() / X.sum()
    X_lorenz = np.insert(X_lorenz, 0, 0)
    X_lorenz[0], X_lorenz[-1]

    ## scatter plot of Lorenz curve
    ax.scatter(np.arange(X_lorenz.size)/(X_lorenz.size-1), X_lorenz,
               marker='x', color='red', s=5, label="Perfect Separation")

    ## line plot of equality\
    ax.plot([0,1], [0,1], color='k')
    ax.set_title("Gini Index - Predicted: %.4f, Gini Index Perfect: %.4f" % (gini_score(y_true, y_pred), gini_score(y_true, y_true)))
    ax.legend()
    
lorenz_curve(y_train, y_train_pred)
lorenz_curve(y_valid, y_valid_pred)
lorenz_curve(y_test, y_test_pred)

Interpreting results¶

Now that we're happy with this model, let's draw some insight from it!

Feature importance¶

# compute SHAP values
explainer = shap.TreeExplainer(random_search.best_estimator_)
shap_values = explainer.shap_values(X_train)

shap.summary_plot(shap_values, features=X_train, feature_names=X_col_names, plot_type="bar")

The most important feature is the event, which makes sense, given the drastic differences in totals from a full powerlifting meet (squat, bench and deadlift) and a bench only meet. It's interesting how gender came up above bodyweight.

shap.summary_plot(shap_values, features=X_train, feature_names=X_col_names)

The above chart shows the impact of the top features and the distributions of their SHAP values. When the colours mix, it implies an interaction with something else.

A full powerlifting meeting has a positive impact on total
Being male has a positive impact on total
A heavier bodyweight has a positive impact on total
Age is too heavily interacted with other features to discern a simple effect here
Raw (minimal equipment) has a negative impact on total

Partial dependencies¶

Let's look at some SHAP partial dependencies to analyse the pure effects of each feature

shap.dependence_plot("rank(0)", shap_values, X_train, feature_names = X_col_names, alpha=0.01)
shap.dependence_plot("rank(1)", shap_values, X_train, feature_names = X_col_names, alpha=0.01)
shap.dependence_plot("rank(2)", shap_values, X_train, feature_names = X_col_names, alpha=0.01)
shap.dependence_plot("rank(3)", shap_values, X_train, feature_names = X_col_names, alpha=0.01)
shap.dependence_plot("rank(4)", shap_values, X_train, feature_names = X_col_names, alpha=0.01)

Passing parameters norm and vmin/vmax simultaneously is deprecated since 3.3 and will become an error two minor releases later. Please pass vmin/vmax directly to the norm when creating it.

Passing parameters norm and vmin/vmax simultaneously is deprecated since 3.3 and will become an error two minor releases later. Please pass vmin/vmax directly to the norm when creating it.

Passing parameters norm and vmin/vmax simultaneously is deprecated since 3.3 and will become an error two minor releases later. Please pass vmin/vmax directly to the norm when creating it.

Passing parameters norm and vmin/vmax simultaneously is deprecated since 3.3 and will become an error two minor releases later. Please pass vmin/vmax directly to the norm when creating it.

Event_SBD is such a strong factor, it's interacting with quite a lot here. Interesting how lifters between the ages of 20-40 (in their prime), doing full powerlifting meets tended to lift more (above the impact of the full powerlifting meet).

These clouds of SHAP values are a little harder to interpret, so I've averaged it out in the graphs below.

def shap_plot_mean(var):
    shap_df = pd.DataFrame({"SHAP Value":shap_values[:,X_col_names.get_indexer([var])[0]], var:X_train.iloc[:,X_col_names.get_indexer([var])[0]]}).round().groupby(var).mean()
    shap_df.plot()
    
shap_plot_mean("Age")
shap_plot_mean("BodyweightKg")
shap_plot_mean("Sex_M")

Based off the above, you'll only taper off in performance after the age of 35 (noting the self-selection bias of the data), not suprisingly bodyweight increases weight lifted (albeit with diminishing returns), and the typical male powerlifter lifts about 140kg more than the typical female powerlifter.

	Name	Sex	Event	Equipment	Age	AgeClass	BirthYearClass	Division	BodyweightKg	WeightClassKg	Squat1Kg	Squat2Kg	Squat3Kg	Squat4Kg	Best3SquatKg	Bench1Kg	Bench2Kg	Bench3Kg	Bench4Kg	Best3BenchKg	Deadlift1Kg	Deadlift2Kg	Deadlift3Kg	Deadlift4Kg	Best3DeadliftKg	TotalKg	Place	Dots	Wilks	Glossbrenner	Goodlift	Tested	Country	State	Federation	ParentFederation	Date	MeetCountry	MeetState	MeetTown	MeetName
0	Dominik Gabriel	M	B	Raw	17.0	16-17	14-18	T 16-17	102.5	110	NaN	NaN	NaN	NaN	NaN	120.0	130.0	-140.0	NaN	130.0	NaN	NaN	NaN	NaN	NaN	130.0	1	79.18	78.36	74.83	58.87	NaN	Slovakia	NaN	WUAP	WUAP	2015-06-16	Czechia	NaN	Praha	European Championships
1	Marek Herák	M	B	Multi-ply	19.0	18-19	19-23	T 18-19	59.8	60	NaN	NaN	NaN	NaN	NaN	165.0	-175.0	175.0	-180.0	175.0	NaN	NaN	NaN	NaN	NaN	175.0	1	148.11	149.71	146.22	84.82	NaN	Slovakia	NaN	WUAP	WUAP	2015-06-16	Czechia	NaN	Praha	European Championships
2	Miroslav Adamove	M	B	Multi-ply	18.0	18-19	14-18	T 18-19	87.7	90	NaN	NaN	NaN	NaN	NaN	250.0	-270.0	-270.0	NaN	250.0	NaN	NaN	NaN	NaN	NaN	250.0	1	163.81	161.77	155.22	85.72	NaN	Slovakia	NaN	WUAP	WUAP	2015-06-16	Czechia	NaN	Praha	European Championships
3	Kamil Lipiňski	M	B	Multi-ply	19.0	18-19	19-23	T 18-19	89.5	90	NaN	NaN	NaN	NaN	NaN	-170.0	-180.0	-180.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	DQ	NaN	NaN	NaN	NaN	NaN	Poland	NaN	WUAP	WUAP	2015-06-16	Czechia	NaN	Praha	European Championships
4	Gabriel Kováč	M	B	Multi-ply	20.0	20-23	19-23	Juniors	81.6	82.5	NaN	NaN	NaN	NaN	NaN	260.0	280.0	-290.0	NaN	280.0	NaN	NaN	NaN	NaN	NaN	280.0	1	190.86	188.82	181.77	100.89	NaN	Slovakia	NaN	WUAP	WUAP	2015-06-16	Czechia	NaN	Praha	European Championships
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2301961	Fady Meilad	M	SBD	Raw	34.5	24-34	24-39	Open	94.5	105	200.0	210.0	-225.0	NaN	210.0	150.0	160.0	-165.0	NaN	160.0	230.0	250.0	-255.0	NaN	250.0	620.0	2	391.49	386.58	369.78	80.48	Yes	Egypt	NaN	AfricanPF	IPF	2015-10-24	Morocco	NaN	Meknes	African Classic Powerlifting Championships
2301962	Mohamed Omar	M	SBD	Raw	30.5	24-34	24-39	Open	110.3	120	150.0	NaN	NaN	NaN	150.0	-100.0	100.0	115.0	NaN	115.0	-230.0	240.0	-265.0	NaN	240.0	505.0	2	298.79	296.94	283.83	60.95	Yes	Egypt	NaN	AfricanPF	IPF	2015-10-24	Morocco	NaN	Meknes	African Classic Powerlifting Championships
2301963	Boudjemaa Nait Ladjemil	M	SBD	Raw	33.5	24-34	24-39	Open	116.4	120	270.0	290.0	-300.0	NaN	290.0	170.0	177.5	182.5	NaN	182.5	280.0	300.0	312.5	NaN	312.5	785.0	1	455.52	454.68	435.41	92.48	Yes	Algeria	NaN	AfricanPF	IPF	2015-10-24	Morocco	NaN	Meknes	African Classic Powerlifting Championships
2301964	Mohamed Eldeken	M	SBD	Raw	38.5	35-39	24-39	Open	128.3	120+	150.0	NaN	NaN	NaN	150.0	100.0	NaN	NaN	NaN	100.0	150.0	NaN	NaN	NaN	150.0	400.0	2	224.87	226.78	216.80	45.19	Yes	Egypt	NaN	AfricanPF	IPF	2015-10-24	Morocco	NaN	Meknes	African Classic Powerlifting Championships
2301965	Mohamed Bouafia	M	SBD	Raw	38.0	35-39	24-39	Open	130.5	120+	350.0	375.0	390.0	NaN	390.0	210.0	220.0	222.5	NaN	222.5	335.0	355.0	-367.5	NaN	355.0	967.5	1	541.12	546.84	522.26	108.52	Yes	Algeria	NaN	AfricanPF	IPF	2015-10-24	Morocco	NaN	Meknes	African Classic Powerlifting Championships

	Age	BodyweightKg	Squat1Kg	Squat2Kg	Squat3Kg	Squat4Kg	Best3SquatKg	Bench1Kg	Bench2Kg	Bench3Kg	Bench4Kg	Best3BenchKg	Deadlift1Kg	Deadlift2Kg	Deadlift3Kg	Deadlift4Kg	Best3DeadliftKg	TotalKg	Dots	Wilks	Glossbrenner	Goodlift
count	1.405630e+06	2.272788e+06	548366.000000	541887.000000	525775.000000	5965.000000	1.512067e+06	917694.000000	906210.000000	876436.000000	16381.000000	2.019193e+06	629286.000000	616535.000000	587107.000000	15711.000000	1.638153e+06	2.137665e+06	2.118599e+06	2.118599e+06	2.118599e+06	1.929263e+06
mean	3.100391e+01	8.405366e+01	112.103324	92.430403	32.275855	72.392890	1.749615e+02	83.474699	55.112395	-18.091854	23.887011	1.193173e+02	159.314554	129.661601	16.392483	77.594206	1.893182e+02	3.755357e+02	2.731590e+02	2.721775e+02	2.566639e+02	6.390920e+01
std	1.318477e+01	2.273521e+01	144.052702	169.183631	196.035332	188.238942	6.995331e+01	104.345559	129.211113	143.203392	164.077946	5.483921e+01	109.758061	160.004725	212.291169	188.829659	6.252989e+01	2.069084e+02	1.302450e+02	1.297588e+02	1.234186e+02	1.619867e+01
min	0.000000e+00	1.510000e+01	-555.000000	-580.000000	-600.500000	-550.000000	-5.080200e+02	-502.500000	-575.000000	-575.000000	-500.000000	-5.225000e+02	-461.000000	-470.000000	-587.500000	-461.000000	-4.100000e+02	1.000000e+00	6.800000e-01	6.700000e-01	6.400000e-01	5.000000e-01
25%	2.100000e+01	6.705000e+01	87.500000	70.000000	-160.000000	-100.000000	1.225000e+02	55.000000	-52.500000	-140.000000	-125.000000	7.711000e+01	125.000000	115.000000	-205.000000	-100.000000	1.400000e+02	2.000000e+02	1.407300e+02	1.402100e+02	1.327900e+02	5.219000e+01
50%	2.750000e+01	8.196000e+01	145.000000	142.900000	110.000000	131.000000	1.700000e+02	105.000000	95.000000	-60.000000	75.000000	1.150000e+02	180.000000	177.500000	117.500000	142.500000	1.900000e+02	3.500000e+02	2.938400e+02	2.929600e+02	2.735200e+02	6.362000e+01
75%	3.900000e+01	9.890000e+01	200.000000	202.500000	190.000000	200.000000	2.200000e+02	145.000000	145.000000	120.000000	155.000000	1.542200e+02	225.000000	230.000000	205.000000	206.380000	2.350000e+02	5.307000e+02	3.715400e+02	3.700700e+02	3.504900e+02	7.517000e+01
max	9.800000e+01	2.602000e+02	555.000000	577.500000	560.000000	592.390000	5.810500e+02	467.500000	487.500000	478.540000	487.610000	5.034900e+02	450.000000	460.400000	457.500000	440.500000	4.604000e+02	1.407500e+03	7.952200e+02	7.933300e+02	7.569000e+02	1.464900e+02