1.3 Data Hierarchy

Figure: Data hierarchy from bits to records

Bits

  • A bit (short for “binary digit”—a digit that can assume one of two values) is the smallest data item in a computer
  • Can have the value 0 or 1
  • Bits are the basis of the binary number system
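As a minimal sketch of how bits form binary numbers, the snippet below converts between ordinary integers and strings of 0s and 1s. The helper names `to_bits` and `from_bits` are hypothetical and chosen only for this illustration. Both rely on Python's built-in `format` and `int`.

```python
# Convert between integers and their bit patterns.
# to_bits and from_bits are illustrative helper names, not part of any library.

def to_bits(value, width=8):
    """Return value as a string of 0s and 1s, left-padded with 0s to width digits."""
    return format(value, f'0{width}b')


def from_bits(bits):
    """Return the integer represented by a string of 0s and 1s."""
    return int(bits, 2)


print(to_bits(5))         # 00000101
print(from_bits('1011'))  # 11
```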

Characters

  • Decimal digits (0–9), letters (A–Z and a–z) and special symbols such as

    $ @ % & * ( ) – + " : ; , ? /

  • Computer's character set contains the characters used to write programs and represent data items
  • Computers process only 1s and 0s, so a character set represents every character as a pattern of 1s and 0s
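To make the idea of a character as a pattern of 1s and 0s concrete, here is a small sketch using the built-in `ord` function, which returns a character's numeric code. The helper name `char_to_pattern` is an assumption made for this example.

```python
# Show the numeric code and bit pattern behind a few characters.
# char_to_pattern is an illustrative helper name.

def char_to_pattern(ch):
    """Return the character's code as a bit string of at least 8 digits."""
    return format(ord(ch), '08b')


for ch in ('A', 'a', '$'):
    print(ch, ord(ch), char_to_pattern(ch))
# A 65 01000001
# a 97 01100001
# $ 36 00100100
```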

Characters (cont.)

  • Python uses Unicode® characters composed of one, two, three or four bytes (8, 16, 24 or 32 bits, respectively)—known as UTF-8 encoding
  • Unicode contains characters for many of the world’s languages
  • The ASCII (American Standard Code for Information Interchange) character set is a subset of Unicode that represents letters (a–z and A–Z), digits and some common special characters
  • The Unicode Consortium publishes code charts covering all languages, symbols, emojis and more
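The variable-length nature of UTF-8 can be checked directly with `str.encode`. The sketch below counts the bytes needed for four sample characters, one for each possible length. The sample characters and the helper name `utf8_length` are choices made for this illustration. It also shows that an ASCII character has the same encoding in ASCII and in UTF-8.

```python
# Count how many bytes UTF-8 needs for characters from different ranges.
# utf8_length is an illustrative helper name.

def utf8_length(ch):
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode('utf-8'))


samples = {
    'A': 'Latin capital letter A',
    '\u00e9': 'Latin small letter e with acute',
    '\u20ac': 'euro sign',
    '\U0001F600': 'grinning face emoji',
}

for ch, description in samples.items():
    print(f'{description}: {utf8_length(ch)} byte(s)')
# Latin capital letter A: 1 byte(s)
# Latin small letter e with acute: 2 byte(s)
# euro sign: 3 byte(s)
# grinning face emoji: 4 byte(s)

# An ASCII character is encoded identically in ASCII and UTF-8.
print('A'.encode('ascii') == 'A'.encode('utf-8'))  # True
```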

Fields

  • Just as characters are composed of bits, fields are composed of characters or bytes
  • A field is a group of characters or bytes that conveys meaning, for example:
    • a person’s name
    • a person’s age
    • etc.

Records

  • A record is a group of related fields
  • A record for an employee might consist of
    • Employee identification number (a whole number)
    • Name (a string of characters)
    • Address (a string of characters)
    • Hourly pay rate (a number with a decimal point)
    • Year-to-date earnings (a number with a decimal point)
    • Amount of taxes withheld (a number with a decimal point)
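One way to model the employee record listed above in Python is a data class, with one attribute per field. This is only a sketch: the class name `EmployeeRecord`, the attribute names and the sample values are all hypothetical. A real payroll program would likely use `decimal.Decimal` rather than `float` for monetary amounts.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """A record: a group of related fields describing one employee."""
    employee_id: int        # whole number
    name: str               # string of characters
    address: str            # string of characters
    hourly_pay_rate: float  # number with a decimal point
    ytd_earnings: float     # number with a decimal point
    taxes_withheld: float   # number with a decimal point


# Sample values are made up for illustration.
record = EmployeeRecord(1001, 'Ana Perez', '12 Main St', 25.50, 18360.00, 3672.00)

print(record.name)                                 # Ana Perez
print(record.ytd_earnings - record.taxes_withheld) # 14688.0
```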

Files

  • A file is a group of related records
  • More generally, a file contains arbitrary data in arbitrary formats
  • Any organization of the bytes in a file, such as organizing the data into records, is a view created by the application programmer
  • Not unusual for an organization to have many files, some containing billions, or even trillions, of characters of information
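The point that records are a view imposed on a file's bytes can be sketched as follows. The byte string stands in for a file's contents, and the program chooses to interpret it as comma-separated records. The layout (ID, name, hourly rate), the sample data and the function name `read_records` are assumptions made for this example.

```python
import csv
import io

# Stand-in for the raw contents of a file: just a sequence of bytes.
raw_bytes = b'1001,Ana Perez,25.50\n1002,Li Wei,31.00\n'


def read_records(data):
    """Interpret raw bytes as UTF-8 text holding one comma-separated record per line."""
    text = data.decode('utf-8')
    reader = csv.reader(io.StringIO(text))
    # Convert each field to the type this program expects.
    return [(int(emp_id), name, float(rate)) for emp_id, name, rate in reader]


records = read_records(raw_bytes)
print(records)  # [(1001, 'Ana Perez', 25.5), (1002, 'Li Wei', 31.0)]
```

A different program could read the same bytes with a different layout in mind. The record structure lives in the code, not in the file.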

Databases

  • A database is a collection of data organized for easy access and manipulation
  • Most popular model is the relational database, in which data is stored in simple tables
  • A table includes records and fields
  • You can search, sort and otherwise manipulate the data, based on its relationship to multiple tables or databases
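As a small sketch of a relational table, the example below uses Python's standard `sqlite3` module with an in-memory database. SQLite is just one convenient relational database for demonstration. The table name, column names and rows are hypothetical.

```python
import sqlite3

# An in-memory database exists only while the connection is open.
connection = sqlite3.connect(':memory:')

# A table's columns correspond to fields; its rows correspond to records.
connection.execute(
    'CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, hourly_pay_rate REAL)'
)
connection.executemany(
    'INSERT INTO employees VALUES (?, ?, ?)',
    [(1001, 'Ana Perez', 25.5), (1002, 'Li Wei', 31.0), (1003, 'Sam Osei', 19.75)],
)

# Search and sort: employees earning more than 20 per hour, highest rate first.
rows = connection.execute(
    'SELECT name, hourly_pay_rate FROM employees '
    'WHERE hourly_pay_rate > ? ORDER BY hourly_pay_rate DESC',
    (20,),
).fetchall()

print(rows)  # [('Li Wei', 31.0), ('Ana Perez', 25.5)]
connection.close()
```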

Big Data

  • Table below shows some common byte measurements:
Unit              Bytes            Which is approximately
1 kilobyte (KB)   1024 bytes       10^3 bytes (exactly 2^10 = 1024 bytes)
1 megabyte (MB)   1024 kilobytes   10^6 (1,000,000) bytes
1 gigabyte (GB)   1024 megabytes   10^9 (1,000,000,000) bytes
1 terabyte (TB)   1024 gigabytes   10^12 (1,000,000,000,000) bytes
1 petabyte (PB)   1024 terabytes   10^15 (1,000,000,000,000,000) bytes
1 exabyte (EB)    1024 petabytes   10^18 (1,000,000,000,000,000,000) bytes
1 zettabyte (ZB)  1024 exabytes    10^21 (1,000,000,000,000,000,000,000) bytes
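The table's two columns can be compared with a few lines of Python. This sketch assumes the table's convention that each unit is 1024 times the previous one. It computes both the exact byte count and the power-of-ten approximation. The names `UNITS` and `unit_sizes` are hypothetical.

```python
# Compare exact binary sizes (powers of 1024) with their decimal approximations.
UNITS = ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB']


def unit_sizes():
    """Return (unit, exact_bytes, approximate_bytes) for each unit in UNITS."""
    return [
        (unit, 1024 ** power, 10 ** (3 * power))
        for power, unit in enumerate(UNITS, start=1)
    ]


for unit, exact, approx in unit_sizes():
    print(f'1 {unit} = {exact:,} bytes, approximately {approx:,}')
```

The gap between the two columns widens with each step, because every factor of 1024 adds another 2.4% over the corresponding factor of 1000.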

Big Data (cont.)

  • Amount of data being produced worldwide is enormous and its growth is accelerating
  • Big data applications deal with massive amounts of data
  • Field is growing quickly
  • Lots of opportunity for software developers

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 1 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
