Analysis of Olympic History Data Using SAS Part-01

Oct 26, 2022

In this project, we will walk through the process of Exploratory Data Analysis (EDA), dig into the data, and try to answer some common interview questions with SAS.
I downloaded the Olympic dataset from Kaggle.
The Olympic Games are considered the world’s foremost sports competition, with more than 200 countries participating. The Games are normally held every four years; since 1994, the Summer and Winter Olympics have alternated every two years within the four-year cycle.
This dataset contains information about the Olympics from 1896 to 2016. It consists of two files: the athletes file and the region file.
The athletes file contains 271116 rows and 15 columns. Each row corresponds to one athlete competing in one Olympic event (an athlete-event).
The Region file contains 230 rows and 3 columns.
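
The steps that follow assume both files have already been loaded into the WORK library. Here is a minimal sketch of that import. The file names athlete_events.csv and noc_regions.csv, the folder /path/to/data, and the table name work.noc_regions are assumptions, so adjust them to your own setup.

/* Hypothetical path and file names: change them to match where you saved the download */
proc import datafile="/path/to/data/athlete_events.csv"
out=work.athlete_events dbms=csv replace;
getnames=yes;
/* Scan every row before choosing column types and lengths (slower, but safer) */
guessingrows=max;
run;

proc import datafile="/path/to/data/noc_regions.csv"
out=work.noc_regions dbms=csv replace;
getnames=yes;
guessingrows=max;
run;

If the import succeeds, the SAS log should report the row and column counts quoted above for each file.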

What is Exploratory Data Analysis, and what is it used for?
EDA is a critical process in which we get to know our data by summarizing it with statistical tools or by plotting it with visualization tools.
EDA helps us identify trends and patterns, and check the properties of each variable: its data type, descriptive statistics, missing values, and the uniqueness of its values.

1. Look at the metadata of the dataset. PROC CONTENTS reports the number of observations and variables, and lists the variables along with their attributes.

proc contents data=athlete_events;
run;

Dataset Metadata

2. Take a sneak peek at the data (print only the first 5 records).

proc print data=work.athlete_events (obs=5) noobs;
run;

3. Get the descriptive statistics of the data.

proc means data=work.athlete_events mean median mode std var min max;
run;
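
Without a VAR statement, PROC MEANS summarizes every numeric variable in the table. As a rough sketch, the same step can be limited to a few variables and split by group. The variable names Age, Height, and Weight are assumed to exist in the athletes file; check them against the PROC CONTENTS output first.

/* Age, Height and Weight are assumed column names: confirm them with PROC CONTENTS */
proc means data=work.athlete_events n nmiss mean median std min max maxdec=1;
class Season;
var Age Height Weight;
run;

The CLASS statement produces one row of statistics per Season for each variable, and MAXDEC=1 rounds the displayed values to one decimal place.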

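4. Count the missing values in each numeric variable. PROC MEANS reports NMISS only for numeric variables, so character variables are not covered by this step.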

proc means data=work.athlete_events nmiss;
run;
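
To cover character variables as well, one common approach is to map every value to either "Missing" or "Not Missing" with a format and let PROC FREQ count the two groups. This is a sketch of that idea, not part of the original walkthrough. The format names missfmt and $missfmt are arbitrary labels chosen here.

/* User-defined formats that collapse every value into two groups */
proc format;
value $missfmt ' ' = 'Missing' other = 'Not Missing';
value missfmt . = 'Missing' other = 'Not Missing';
run;

/* One small table per variable: count of missing vs. non-missing values */
proc freq data=work.athlete_events;
format _character_ $missfmt. _numeric_ missfmt.;
tables _all_ / missing nocum nopercent;
run;

The MISSING option makes PROC FREQ treat missing values as a valid level, so they appear in each table instead of only in a footnote.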

5. Count the distinct values of a variable.

5.1. Count distinct values using proc sql.

proc sql;
select count(distinct Name) as Name,
count(distinct Team) as Team,
count(distinct Season) as Season
from work.athlete_events ;
quit;

5.2. Count distinct values using proc freq.

proc freq data=work.athlete_events (keep = Name Team Season) nlevels;
tables Name Team Season / noprint;
run;

6. Check the distribution of the data with univariate analysis and a histogram.

PROC UNIVARIATE gives an overall picture of each numeric variable: Moments, Basic Statistical Measures, Tests for Location, Quantiles, and Extreme Observations.

proc univariate data=work.athlete_events novarcontents;
histogram Year ;
run;
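
Because the call above has no VAR statement, PROC UNIVARIATE prints its tables for every numeric variable. As a minimal sketch, the analysis can be restricted to one variable, with a fitted normal curve and a small summary box added to the histogram. The variable Age is assumed to exist in the athletes file.

/* Age is an assumed column name: swap in any numeric variable from your table */
proc univariate data=work.athlete_events;
var Age;
histogram Age / normal;
inset n mean std / position=ne;
run;

The NORMAL option overlays a normal density estimated from the data, and the INSET statement places N, the mean, and the standard deviation in the top-right corner of the plot.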