import pandas as pd
titanic = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/carData/TitanicSurvival.csv')
Displaying a large DataFrame shows only the first 30 rows, followed by “…” and the last 30 rows.
To save space, view just the first and last five rows with the DataFrame methods head and tail.
pd.set_option('display.precision', 2)  # format for floating-point values
titanic.head()
titanic.tail()
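The age value in row 1305 is NaN (not a number), indicating a missing value in the dataset.
The first column has an uninformative name ('Unnamed: 0'), so assign more readable column names:
titanic.columns = ['name', 'survived', 'sex', 'age', 'class']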
titanic.head()
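As a minimal sketch of the renaming step above, the following uses a hypothetical two-row stand-in for the dataset (the rows and the 'passengerClass' name are illustrative assumptions, not taken from the notebook). It contrasts assigning the whole columns list with DataFrame.rename, which changes only the names you list.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are illustrative only.
df = pd.DataFrame({
    'Unnamed: 0': ['Passenger A', 'Passenger B'],
    'survived': ['yes', 'no'],
    'sex': ['female', 'male'],
    'age': [29.0, None],
    'passengerClass': ['1st', '3rd'],
})

# Option 1: replace every column name at once.
# The list must contain exactly one name per existing column.
df_all = df.copy()
df_all.columns = ['name', 'survived', 'sex', 'age', 'class']

# Option 2: rename only selected columns and leave the rest untouched.
df_some = df.rename(columns={'Unnamed: 0': 'name', 'passengerClass': 'class'})

print(list(df_all.columns))
print(list(df_some.columns))
```

Assigning to columns is compact but depends on column order; rename is safer when the order of the incoming columns might change.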
Calling describe on a DataFrame containing both numeric and non-numeric columns produces descriptive statistics only for the numeric columns, in this case only the age column.
titanic.describe()
Note the discrepancy between the count (1046) and the dataset’s number of rows (1309; the last row’s index was 1308 when we called tail).
Only 1046 (the count above) of the records contained an age.
The remaining ages were missing and are marked as NaN.
When performing calculations, pandas ignores missing data (NaN) by default.
For the 1046 people with valid ages:
The average (mean) age was 29.88 years old.
The youngest passenger (min) was just over two months old (0.17 * 12 is 2.04).
The oldest passenger (max) was 80.
The median age was 28 (indicated by the 50% quartile).
The 25% quartile is the median age in the first half of the passengers (sorted by age).
The 75% quartile is the median of the second half of passengers.
Compare the survived column to 'yes' to get a new Series containing True/False values, then use describe to summarize the results:
(titanic.survived == 'yes').describe()
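The gap between count and the number of rows discussed above can be reproduced on a tiny example. This is a sketch on a hypothetical six-value Series (the ages are made up for illustration); it shows that count, mean, and the quartiles are computed from the non-missing values only, unless skipna=False is requested.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; two of the six entries are missing.
ages = pd.Series([2.0, 28.0, np.nan, 40.0, np.nan, 30.0])

n_rows = len(ages)             # every row, including the missing ones
n_valid = ages.count()         # non-missing values only
n_missing = ages.isna().sum()  # rows whose age is NaN

mean_default = ages.mean()               # NaN skipped: (2 + 28 + 40 + 30) / 4
mean_with_nan = ages.mean(skipna=False)  # NaN propagates into the result

stats = ages.describe()  # count, mean, std, min, quartiles, max over valid values

print(n_rows, n_valid, n_missing)
print(mean_default, mean_with_nan)
print(stats)
```

Comparing len with count is a quick way to see how many values a column is missing before interpreting its statistics.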
For non-numeric data, describe displays different descriptive statistics:
count is the total number of items in the result.
unique is the number of unique values (2) in the result: True (survived) and False (died).
top is the most frequently occurring value in the result.
freq is the number of occurrences of the top value.
%matplotlib inline
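To make the count, unique, top, and freq fields described above concrete, here is a minimal sketch on a hypothetical five-passenger Series (the values are invented for illustration). It also shows that summing or averaging the boolean result gives the number and fraction of True values, since True counts as 1.

```python
import pandas as pd

# Hypothetical survival labels for five passengers.
survived = pd.Series(['yes', 'no', 'no', 'yes', 'no'])

# Element-wise comparison yields a boolean Series.
mask = survived == 'yes'

summary = mask.describe()  # count, unique, top, freq for the True/False values

n_survivors = mask.sum()        # True counts as 1, False as 0
fraction_survived = mask.mean() # share of True values

print(summary)
print(n_survivors, fraction_survived)
```

Here top is False with a freq of 3, because three of the five hypothetical passengers did not survive.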
A DataFrame’s hist method analyzes each numerical column’s data and produces a separate histogram for each numerical column.
histogram = titanic.hist()
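The following sketch illustrates what hist returns, using a hypothetical DataFrame (the 'fare' column and all values are assumptions made for illustration). It selects Matplotlib's non-interactive Agg backend so the code can also run outside a notebook; inside Jupyter with %matplotlib inline that line is unnecessary.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for running outside Jupyter
import pandas as pd

# Hypothetical data: two numeric columns and one non-numeric column.
df = pd.DataFrame({
    'age': [2.0, 28.0, 40.0, 30.0, 55.0, 61.0],
    'fare': [7.25, 71.28, 8.05, 53.10, 13.00, 26.55],
    'sex': ['female', 'male', 'male', 'female', 'male', 'female'],
})

# hist draws one subplot per numeric column and skips 'sex'.
axes = df.hist(bins=5)

# The result is a 2-D NumPy array of Matplotlib Axes objects.
titles = sorted(ax.get_title() for ax in axes.flatten() if ax.get_title())
print(axes.shape, titles)
```

Keeping the returned Axes objects makes it possible to adjust titles or axis labels after the histograms are drawn.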
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.