Scores

Introduction:

This time you will create the data.

This exercise is based on Chris Albon's work; the credit belongs to him.

Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

Step 2. Create a DataFrame that looks like the one below.

In [2]:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
            'female': [0, 1, 1, 0, 1],
            'age': [42, 52, 36, 24, 73], 
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}

df = pd.DataFrame(raw_data)
df
  first_name last_name  female  age  preTestScore  postTestScore
0      Jason    Miller       0   42             4             25
1      Molly  Jacobson       1   52            24             94
2       Tina       Ali       1   36            31             57
3       Jake    Milner       0   24             2             62
4        Amy     Cooze       1   73             3             70
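The dict above is column-oriented: each key becomes a column and each list holds that column's values. As a sketch, the same table can also be built row by row from a list of dicts, which `pd.DataFrame` accepts as well; the names `by_columns`, `by_rows`, and `records` are only illustrative.

```python
import pandas as pd

# Column-oriented input, as in Step 2: one list per column
by_columns = pd.DataFrame({
    'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
    'female': [0, 1, 1, 0, 1],
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

# Row-oriented input: one dict per person; the keys become the column names
records = [
    {'first_name': 'Jason', 'last_name': 'Miller', 'female': 0, 'age': 42, 'preTestScore': 4, 'postTestScore': 25},
    {'first_name': 'Molly', 'last_name': 'Jacobson', 'female': 1, 'age': 52, 'preTestScore': 24, 'postTestScore': 94},
    {'first_name': 'Tina', 'last_name': 'Ali', 'female': 1, 'age': 36, 'preTestScore': 31, 'postTestScore': 57},
    {'first_name': 'Jake', 'last_name': 'Milner', 'female': 0, 'age': 24, 'preTestScore': 2, 'postTestScore': 62},
    {'first_name': 'Amy', 'last_name': 'Cooze', 'female': 1, 'age': 73, 'preTestScore': 3, 'postTestScore': 70},
]
by_rows = pd.DataFrame(records)

# Raises an AssertionError if values, dtypes, index, or column order differ
pd.testing.assert_frame_equal(by_columns, by_rows)
print(by_rows.shape)  # (5, 6)
```

The column-oriented form is usually more compact when the data is typed in by hand, as it is here.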

Step 3. Create a scatterplot of preTestScore and postTestScore, with the size of each point determined by age.

Hint: Don’t forget to add the labels.

In [3]:
# Point size (s) comes from each person's age; pandas labels the axes with the column names
df.plot.scatter(x='preTestScore', y='postTestScore', s=df['age'])
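For comparison, here is a sketch of the same plot drawn with matplotlib directly, where the labels from the hint have to be set explicitly. It rebuilds only the columns the plot needs, and the title text is an arbitrary choice, not part of the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Only the columns this plot needs
df = pd.DataFrame({
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# s is the marker area in points squared; here it is simply each person's age
ax.scatter(df['preTestScore'], df['postTestScore'], s=df['age'])
ax.set_xlabel('preTestScore')
ax.set_ylabel('postTestScore')
ax.set_title('preTestScore x postTestScore')  # illustrative title
```

Because `s` is an area rather than a radius, a person twice as old gets a marker with twice the area, not twice the width.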

Step 4. Create a scatterplot of preTestScore and postTestScore.

This time, the size of each point should be 4.5 times its postTestScore, and the color should be determined by sex (the female column).

In [4]:
# Size is 4.5 x postTestScore; color maps the 0/1 'female' column through the viridis colormap
df.plot.scatter(x='preTestScore', y='postTestScore', s=df['postTestScore'] * 4.5, c='female', colormap='viridis')
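Since `female` only takes the values 0 and 1, a continuous colormap is more than the data needs. One alternative, sketched below, draws each group as its own series so matplotlib can build a legend. It assumes that `female == 0` means male, and the colors and `alpha` value are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'female': [0, 1, 1, 0, 1],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# One scatter call per group; assumes female == 0 encodes male
for value, label, color in [(0, 'male', 'tab:blue'), (1, 'female', 'tab:orange')]:
    group = df[df['female'] == value]
    ax.scatter(group['preTestScore'], group['postTestScore'],
               s=group['postTestScore'] * 4.5,  # size: 4.5 x postTestScore
               color=color, label=label, alpha=0.7)
ax.set_xlabel('preTestScore')
ax.set_ylabel('postTestScore')
ax.legend(title='sex')
```

The trade-off is one plotting call per category, which stays manageable for a binary column like this one.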

BONUS: Create your own question and answer it.
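One possible bonus question, offered only as an illustration: which group improved more, on average, between the pre-test and the post-test? A minimal sketch, again assuming `female == 0` means male:

```python
import pandas as pd

df = pd.DataFrame({
    'female': [0, 1, 1, 0, 1],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

# Per-person gain between the two tests
improvement = df['postTestScore'] - df['preTestScore']
# Average gain for each value of the 0/1 'female' column
mean_improvement = improvement.groupby(df['female']).mean()
print(mean_improvement)
```

With only five people, this is a toy comparison and says nothing about a real difference between the groups.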