DataFrames

A DataFrame is an enhanced two-dimensional array. Each column in a DataFrame is a Series.

Creating a DataFrame from a Dictionary

Create a DataFrame from a dictionary that represents student grades on three exams.

import pandas as pd
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict)

Pandas displays DataFrames in tabular format with indices left aligned in the index column and the remaining columns’ values right aligned.

grades
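
As a rough check of how the dictionary maps onto the table, the sketch below rebuilds the same DataFrame and inspects its labels and shape. The helper names `column_names`, `row_labels` and `eva_is_series` are illustrative only and are not part of the original notebook.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict)

# Dictionary keys become column names; list positions become rows 0, 1, 2.
column_names = list(grades.columns)
row_labels = list(grades.index)
shape = grades.shape  # (number of rows, number of columns)

# Each column is itself a Series.
eva_is_series = isinstance(grades['Eva'], pd.Series)

print(column_names, row_labels, shape, eva_is_series)
```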

Customizing a DataFrame’s Indices with the index Attribute

Use the index attribute to change the DataFrame’s indices from sequential integers to labels.

grades.index = ['Test1', 'Test2', 'Test3']
grades
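
The labels can also be supplied when the DataFrame is created, through the constructor's `index` keyword argument. The sketch below compares the two approaches; the variable names `relabeled` and `labeled_at_creation` are made up for this illustration.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
tests = ['Test1', 'Test2', 'Test3']

# Option 1: create with default integer indices, then assign labels.
relabeled = pd.DataFrame(grades_dict)
relabeled.index = tests

# Option 2: pass the labels to the constructor.
labeled_at_creation = pd.DataFrame(grades_dict, index=tests)

# Both routes should produce the same labels and data.
same = relabeled.equals(labeled_at_creation)
print(same)
```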

Accessing a DataFrame’s Columns

Get Eva’s grades by name. The column is returned as a Series.

grades['Eva']

If a DataFrame’s column-name strings are valid Python identifiers, you can use them as attributes.

grades.Sam
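
A small sketch of the two access styles side by side. The renamed column `'Sam Jr'` is a hypothetical name, introduced only to show a label that is not a valid identifier and so is reachable only with `[]`.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# Bracket access and attribute access return the same column.
sam_by_key = grades['Sam']
sam_by_attr = grades.Sam

# Hypothetical column name containing a space: it is not an identifier,
# so only bracket access can reach it.
with_space = grades.rename(columns={'Sam': 'Sam Jr'})
sam_jr = with_space['Sam Jr']

print(sam_by_key.equals(sam_by_attr), list(sam_jr))
```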

Selecting Rows via the loc and iloc Attributes

DataFrames support indexing capabilities with [], but the pandas documentation recommends using the attributes loc, iloc, at and iat, which are optimized to access DataFrames and also provide additional capabilities.

Access a row by its label via the DataFrame’s loc attribute.

grades.loc['Test1']

Access rows by zero-based integer index using the iloc attribute (the i in iloc means that it’s used with integer indices).

grades.iloc[1]
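
To make the row lookups above concrete, this sketch selects one row by label and one by position. Each result is a Series whose index holds the students' names. The names `test1` and `second_row` are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

test1 = grades.loc['Test1']   # row selected by label
second_row = grades.iloc[1]   # row selected by position (0-based)

# A selected row is a Series named after the row's label.
print(type(test1).__name__, test1['Eva'])
print(second_row.name, second_row['Wally'])
```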
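
Selecting Rows via Slices and Lists with the loc and iloc Attributes

The index can be a slice. When using slices containing labels with loc, the range specified includes the high index ('Test3'):

grades.loc['Test1':'Test3']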

When using slices containing integer indices with iloc, the range you specify excludes the high index (2):

grades.iloc[0:2]
grades.loc[['Test1', 'Test3']]
grades.iloc[[0, 2]]
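
The difference in slice endpoints is easy to miss, so here is a short sketch that counts the rows each form returns and confirms that the two list-based selections above pick the same rows. Variable names are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test1':'Test3']   # label slice: end label included
by_position = grades.iloc[0:2]           # integer slice: end position excluded

# Lists select specific rows, which need not be adjacent.
listed_by_label = grades.loc[['Test1', 'Test3']]
listed_by_position = grades.iloc[[0, 2]]

print(len(by_label), len(by_position))
print(listed_by_label.equals(listed_by_position))
```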

You can also select subsets of both the rows and the columns. View only Eva’s and Katie’s grades on Test1 and Test2:

grades.loc['Test1':'Test2', ['Eva', 'Katie']]

Use iloc with a list and a slice to select the first and third tests and the first three columns for those tests:

grades.iloc[[0, 2], 0:3]
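
The two selections above can be checked against the source data with a sketch like the following, which converts each subset to nested lists. The names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# Rows by label slice, columns by list of names.
by_label = grades.loc['Test1':'Test2', ['Eva', 'Katie']]

# Rows by list of positions, columns by integer slice (end excluded).
by_position = grades.iloc[[0, 2], 0:3]

print(by_label.values.tolist())
print(list(by_position.columns), by_position.values.tolist())
```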
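
Boolean indexing selects the values that satisfy a condition and returns them in a new DataFrame. Values for which the condition is False are represented as NaN (not a number) in the new DataFrame. NaN is pandas’ notation for missing values.

grades[grades >= 90]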
grades[(grades >= 80) & (grades < 90)]
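
The sketch below repeats the two selections above and counts how many grades survive each mask. The third selection, which combines conditions with `|`, is an extra illustration that is not in the original notebook. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

a_grades = grades[grades >= 90]                    # 90 and above
b_grades = grades[(grades >= 80) & (grades < 90)]  # 80 through 89

# Illustrative extra: grades that are either 90+ or below 70.
extremes = grades[(grades >= 90) | (grades < 70)]

# count() ignores NaN, so summing it gives the number of surviving grades.
a_count = int(a_grades.count().sum())
b_count = int(b_grades.count().sum())
extremes_count = int(extremes.count().sum())

print(a_count, b_count, extremes_count)
```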

Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise AND), not the and Boolean operator. For or conditions, use | (bitwise OR). NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional array containing only the values that satisfy the condition.

Accessing a Specific DataFrame Cell by Row and Column

A DataFrame’s at and iat attributes get a single value from a DataFrame.

grades.at['Test2', 'Eva']
grades.iat[2, 0]
grades.at['Test2', 'Eva'] = 100
grades.at['Test2', 'Eva']
grades.iat[1, 2] = 87
grades.iat[1, 2]
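
The following sketch replays the cell reads and writes above on a fresh DataFrame and records which cell each call touches. With the column order used here (Wally, Eva, Sam, Katie, Bob), position `[1, 2]` is Sam's Test2 grade. The variable names are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# Read single cells: at uses labels, iat uses integer positions.
eva_test2_before = grades.at['Test2', 'Eva']
wally_test3 = grades.iat[2, 0]
sam_test2_before = grades.iat[1, 2]

# Assign to single cells through the same attributes.
grades.at['Test2', 'Eva'] = 100
grades.iat[1, 2] = 87

eva_test2_after = grades.at['Test2', 'Eva']
sam_test2_after = grades.at['Test2', 'Sam']

print(eva_test2_before, eva_test2_after)
print(sam_test2_before, sam_test2_after)
```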

The DataFrame describe method calculates basic descriptive statistics for the data and returns them as a DataFrame. The statistics are calculated by column.

grades.describe()
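
Control the display precision and other default settings with pandas’ set_option function. Use the full option name display.precision: in recent pandas releases the short name 'precision' matches more than one option and raises an OptionError.

pd.set_option('display.precision', 2)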
grades.describe()
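
A sketch of the same summary with the display precision applied only temporarily. It uses `pd.option_context` instead of `set_option`, which is a substitution made for this example so that the setting does not persist. The precision affects only how values are printed, not the stored statistics.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

stats = grades.describe()   # one column of statistics per student
stat_names = list(stats.index)

# Limit printed floats to two decimal places inside this block only.
with pd.option_context('display.precision', 2):
    print(stats)

# The underlying values keep full precision.
eva_mean = stats.at['mean', 'Eva']
```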
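
For student grades, the most important of these statistics is probably the mean. Calculate it for each student by calling mean on the DataFrame.

grades.mean()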
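
Because each column holds one student's grades, `mean` returns one average per student. The sketch below cross-checks those values against a plain Python calculation; `manual_means` is an illustrative helper name.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

student_means = grades.mean()   # Series indexed by student name

# The same averages computed directly from the dictionary.
manual_means = {name: sum(scores) / len(scores)
                for name, scores in grades_dict.items()}

for name in grades_dict:
    print(name, round(student_means[name], 2), round(manual_means[name], 2))
```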

Transposing the DataFrame with the T Attribute

Use the T attribute to quickly transpose the rows and columns and get a view of the transposed DataFrame.

grades.T

To get the summary statistics by test rather than by student, call describe on grades.T.

grades.T.describe()
grades.T.mean()
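
After transposing, the tests become the columns, so `mean` yields one average per test. As a sketch, the code below also compares the result with `grades.mean(axis=1)`, which computes row-wise means without transposing. That alternative is not in the original notebook.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

transposed = grades.T              # students as rows, tests as columns
test_means = transposed.mean()     # one mean per test

# Alternative shown for comparison: mean across each row of the original.
row_means = grades.mean(axis=1)

max_difference = (test_means - row_means).abs().max()
print(transposed.shape, test_means.round(2).to_dict(), max_difference)
```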

You can sort a DataFrame by its rows or columns, based on their indices or values. Sort the rows by their indices in descending order using sort_index and its keyword argument ascending=False.

grades.sort_index(ascending=False)

Sort the columns into ascending order (left to right) by their column names. The axis=1 keyword argument indicates that we wish to sort the column indices, rather than the row indices. axis=0 (the default) sorts the row indices.

grades.sort_index(axis=1)
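
The two index sorts above can be verified by looking at the resulting label order, as in this sketch. The names `rows_descending` and `columns_ascending` are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# axis=0 (the default): reorder rows by their labels.
rows_descending = grades.sort_index(ascending=False)

# axis=1: reorder columns by their names.
columns_ascending = grades.sort_index(axis=1)

print(list(rows_descending.index))
print(list(columns_ascending.columns))
```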

To view Test1’s grades in descending order so we can see the students’ names in highest-to-lowest grade order, call method sort_values. The by and axis arguments work together to determine which values will be sorted. In this case, we sort based on the column values (axis=1) for Test1.

grades.sort_values(by='Test1', axis=1, ascending=False)
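
It might be easier to read the grades and names if they were in a column, so we can sort the transposed DataFrame instead.

grades.T.sort_values(by='Test1', ascending=False)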

Since we’re sorting only Test1’s grades, we might not want to see the other tests at all.

grades.loc['Test1'].sort_values(ascending=False)
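
A sketch of the value sorts above, with a check that the original DataFrame keeps its column order afterwards. Eva and Katie both scored 100 on Test1, so their relative order in the result is not relied on here. The final step applies `inplace=True` to a separate copy, named `working_copy` for this illustration.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# Reorder the columns by their Test1 values, highest first.
by_test1 = grades.sort_values(by='Test1', axis=1, ascending=False)

# Sort just the Test1 row, which is a Series.
test1_only = grades.loc['Test1'].sort_values(ascending=False)

# Sorting returned new objects; the original column order is untouched.
original_order = list(grades.columns)

# inplace=True modifies the object itself and returns None.
working_copy = grades.T.copy()
result = working_copy.sort_values(by='Test1', ascending=False, inplace=True)

print(list(by_test1.columns), list(test1_only), original_order, result)
```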

By default, sort_index and sort_values return a copy of the original DataFrame. To sort the DataFrame in place rather than copying the data, pass the keyword argument inplace=True.

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.