5.17 Intro to Data Science: Simulation and Static Visualizations

Instructor Note: This notebook's code has been organized into cells differently than the snippets presented in the book. In a notebook, all the code that affects a visualization's appearance must appear in the same cell, because any later code that modifies that appearance would have to re-display the visualization. For this reason, snippet numbers in this notebook do not match the snippet numbers in the book.

  • Visualizations help you “get to know” your data.
  • They give you a powerful way to understand data that goes beyond simply looking at raw data.
  • The Seaborn visualization library is built over the Matplotlib visualization library and simplifies many Matplotlib operations.

5.17.1 Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls

  • A vertical bar chart for 600 die rolls summarizes the frequency with which each of the six faces appears, and each frequency's percentage of the total.
  • Seaborn refers to this type of graph as a bar plot:

Screen capture of a vertical bar chart for 600 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total

  • Expect about 100 occurrences of each die face.
  • For a small number of rolls, none of the frequencies is exactly 100 and most of the percentages are not close to 16.667% (about 1/6th).
  • For 60,000 die rolls, the bars will become much closer in size.
  • At 6,000,000 die rolls, they’ll appear to be exactly the same size.
  • “Law of large numbers” at work.
  • The first screen capture below shows the results for 60,000 die rolls; expect about 10,000 of each face.
  • The second screen capture below shows the results for 6,000,000 rolls; expect about 1,000,000 of each face.
  • With more die rolls, the frequency percentages are much closer to the expected 16.667%.

Screen capture of a vertical bar chart for 60,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total

Screen capture of a vertical bar chart for 6,000,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total
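
The convergence described above can also be checked numerically, without plotting. The following is a minimal sketch rather than the book's code: the helper name face_percentages and the fixed seed are assumptions added so that runs are repeatable.

```python
import random


def face_percentages(number_of_rolls, seed=None):
    """Roll a six-sided die number_of_rolls times and return each
    face's share of the total as a percentage (faces 1 through 6)."""
    rng = random.Random(seed)  # assumption: private generator for repeatable runs
    counts = [0] * 6
    for _ in range(number_of_rolls):
        counts[rng.randrange(1, 7) - 1] += 1
    return [100 * count / number_of_rolls for count in counts]


for number_of_rolls in (600, 60_000):
    percentages = face_percentages(number_of_rolls, seed=2020)
    # largest gap between any face's percentage and the expected 100/6
    worst_gap = max(abs(p - 100 / 6) for p in percentages)
    print(f'{number_of_rolls:>9,} rolls: largest gap {worst_gap:.3f} points')
```

Adding 6,000,000 to the tuple works the same way but takes noticeably longer in pure Python.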

5.17.2 Visualizing Die-Roll Frequencies and Percentages

Launching IPython for Interactive Matplotlib Development

  • To enable IPython's built-in support for interactively developing Matplotlib graphs:
ipython --matplotlib

Importing the Libraries

Note: %matplotlib inline is an IPython magic that enables Matplotlib-based graphics to be displayed directly in the notebook. Where several of the book's snippets were combined into a single cell, they are separated by two blank lines.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
In [2]:
import numpy as np
In [3]:
import random
In [4]:
import seaborn as sns
  1. matplotlib.pyplot contains the Matplotlib library’s graphing capabilities that we use. This module typically is imported with the name plt.
  2. The NumPy (Numerical Python) library includes the function unique that we’ll use to summarize the die rolls. The numpy module typically is imported as np.
  3. random contains Python’s random-number generation functions.
  4. seaborn contains the Seaborn library’s graphing capabilities we use. This module typically is imported with the name sns.

Rolling the Die and Calculating Die Frequencies

In [5]:
rolls = [random.randrange(1, 7) for i in range(600)]
  • NumPy's unique function expects an ndarray argument and returns an ndarray.
  • If you pass a list, NumPy converts it to an ndarray for better performance.
  • The keyword argument return_counts=True tells unique to count each unique value’s number of occurrences.
  • In this case, unique returns a tuple of two one-dimensional ndarrays containing the sorted unique values and their corresponding frequencies, respectively.
In [6]:
values, frequencies = np.unique(rolls, return_counts=True)
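
To make the shape of that result concrete, here is a pure-Python sketch of the same summary. It swaps in collections.Counter for NumPy and returns lists rather than ndarrays; the helper name unique_with_counts and the sample rolls are made up for illustration.

```python
from collections import Counter


def unique_with_counts(items):
    """Return (sorted unique values, matching counts) as two lists.

    Hypothetical stand-in for the pair of arrays that
    np.unique(items, return_counts=True) returns."""
    counts = Counter(items)
    values = sorted(counts)
    return values, [counts[value] for value in values]


sample_rolls = [3, 1, 3, 6, 1, 3]  # made-up rolls for illustration
values, frequencies = unique_with_counts(sample_rolls)
print(values)       # [1, 3, 6]
print(frequencies)  # [2, 3, 1]
```

A face that never occurs is absent from both results. That is unlikely with 600 rolls, but it matters for very small samples because the bar plot would then show fewer than six bars.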

Creating the Initial Bar Plot

Setting the Window Title and Labeling the x- and y-Axes

Finalizing the Bar Plot

In [7]:
title = f'Rolling a Six-Sided Die {len(rolls):,} Times'


sns.set_style('whitegrid')  # default is white with no grid


# create and display the bar plot
axes = sns.barplot(x=values, y=frequencies, palette='bright')


# set the title of the plot
axes.set_title(title)


# label the axes
axes.set(xlabel='Die Value', ylabel='Frequency')  


# scale the y-axis to add room for text above bars
axes.set_ylim(top=max(frequencies) * 1.10)


# create and display the text for each bar
for bar, frequency in zip(axes.patches, frequencies):
    text_x = bar.get_x() + bar.get_width() / 2.0  
    text_y = bar.get_height() 
    text = f'{frequency:,}\n{frequency / len(rolls):.3%}'
    axes.text(text_x, text_y, text, 
              fontsize=11, ha='center', va='bottom')
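
The text placed above each bar relies on two format specifiers: `,` inserts thousands separators, and `.3%` multiplies the value by 100 and shows it with three decimal places and a percent sign. A small sketch with made-up numbers (bar_label is a hypothetical helper, not part of the notebook):

```python
def bar_label(frequency, total_rolls):
    """Build the two-line label: the count, then its percentage of the total."""
    return f'{frequency:,}\n{frequency / total_rolls:.3%}'


print(bar_label(100, 600))
print(bar_label(1_000_123, 6_000_000))
```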

Rolling Again and Updating the Bar Plot—Introducing IPython Magics

In [8]:
# plt.cla()
# We placed this code in a comment because it was meant for use 
# in an interactive IPython session in which we clear the window,
# then display a new graph in it. In a notebook, we can simply 
# display a new graph inline.

When you execute the next cell, the notebook will add another cell below it containing the code in Snippet 5. You should then change 600 to 60000.

In [9]:
%recall 5
In [18]:
rolls = [random.randrange(1, 7) for i in range(600)]

When you execute the next cell, the notebook will add another cell below it containing the code in Snippets 6-7. Executing that cell will produce a new graph.

In [19]:
%recall 6-7
In [20]:
values, frequencies = np.unique(rolls, return_counts=True)
title = f'Rolling a Six-Sided Die {len(rolls):,} Times'


sns.set_style('whitegrid')  # default is white with no grid


# create and display the bar plot
axes = sns.barplot(x=values, y=frequencies, palette='bright')


# set the title of the plot
axes.set_title(title)


# label the axes
axes.set(xlabel='Die Value', ylabel='Frequency')  


# scale the y-axis to add room for text above bars
axes.set_ylim(top=max(frequencies) * 1.10)


# create and display the text for each bar
for bar, frequency in zip(axes.patches, frequencies):
    text_x = bar.get_x() + bar.get_width() / 2.0  
    text_y = bar.get_height() 
    text = f'{frequency:,}\n{frequency / len(rolls):.3%}'
    axes.text(text_x, text_y, text, 
              fontsize=11, ha='center', va='bottom')

Saving Snippets to a File with the %save Magic

In [11]:
%save RollDie.py 1-7
The following commands were written to file `RollDie.py`:
get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib.pyplot as plt
import numpy as np
import random
import seaborn as sns
rolls = [random.randrange(1, 7) for i in range(600)]
values, frequencies = np.unique(rolls, return_counts=True)
title = f'Rolling a Six-Sided Die {len(rolls):,} Times'


sns.set_style('whitegrid')  # default is white with no grid


# create and display the bar plot
axes = sns.barplot(x=values, y=frequencies, palette='bright')


# set the title of the plot
axes.set_title(title)


# label the axes
axes.set(xlabel='Die Value', ylabel='Frequency')  


# scale the y-axis to add room for text above bars
axes.set_ylim(top=max(frequencies) * 1.10)


# create and display the text for each bar
for bar, frequency in zip(axes.patches, frequencies):
    text_x = bar.get_x() + bar.get_width() / 2.0  
    text_y = bar.get_height() 
    text = f'{frequency:,}\n{frequency / len(rolls):.3%}'
    axes.text(text_x, text_y, text, 
              fontsize=11, ha='center', va='bottom')

Command-Line Arguments; Displaying a Plot from a Script

  • Provided with this chapter’s examples is an edited version of the RollDie.py file you saved above.
  • We added comments and two modifications so you can run the script with an argument that specifies the number of die rolls, as in:

    ipython RollDieWithArg.py 600
    
  • The sys module enables a script to receive command-line arguments that are passed into the program.
  • These include the script’s name and any values that appear to the right of it when you execute the script.
  • The sys module’s argv list contains the arguments.
  • Matplotlib and Seaborn do not automatically display the plot for you when you create it in a script. So at the end of the script we added the following call to Matplotlib’s show function, which displays the window containing the graph:
    plt.show()
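
The contents of RollDieWithArg.py are not shown here, so the following is only a sketch of how a script might take the roll count from an argv-style list. The helper name roll_count_from_args and the default of 600 are assumptions for illustration.

```python
import random


def roll_count_from_args(argv, default=600):
    """Return the number of rolls given in argv[1], or default if absent.

    argv[0] is the script's name; values typed after it follow."""
    if len(argv) > 1:
        return int(argv[1])
    return default


# In a real script you would pass sys.argv; a literal list is used here
# so the example behaves the same wherever it is run.
number_of_rolls = roll_count_from_args(['RollDieWithArg.py', '6000'])
rolls = [random.randrange(1, 7) for _ in range(number_of_rolls)]
print(f'Rolled the die {len(rolls):,} times')
```

Command-line arguments always arrive as strings, which is why the sketch converts argv[1] with int before using it as a loop bound.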
    
In [24]:
run RollDieWithArg.py 6000

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.