6.2.7 Example: Word Counts

  • The following script builds a dictionary that counts the number of occurrences of each word in a tokenized string.
  • Python automatically concatenates adjacent string literals that are separated only by whitespace; the parentheses let the literals span multiple lines.
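
As a minimal sketch of the concatenation rule in the second bullet, the snippet below joins two adjacent literals inside parentheses. The variable names `joined` and `missing_space` are illustrative only and do not appear in the script. Note that Python adds no separator of its own, so a trailing space has to be part of the first literal.

```python
# Adjacent string literals are joined into one string;
# the parentheses only allow the expression to span several lines.
joined = ('several words '
          'this is more')
print(joined)         # several words this is more

# Without the trailing space, the two words run together.
missing_space = ('several words'
                 'this is more')
print(missing_space)  # several wordsthis is more
```
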
# fig06_02.py
"""Tokenizing a string and counting unique words."""

text = ('this is sample text with several words '
        'this is more sample text with some different words')

word_counts = {}

# count occurrences of each unique word
for word in text.split():
    if word in word_counts:
        word_counts[word] += 1  # update existing key-value pair
    else:
        word_counts[word] = 1  # insert new key-value pair

print(f'{"WORD":<12}COUNT')

for word, count in sorted(word_counts.items()):
    print(f'{word:<12}{count}')

print('\nNumber of unique words:', len(word_counts))
In [1]:
run fig06_02.py
WORD        COUNT
different   1
is          2
more        1
sample      2
several     1
some        1
text        2
this        2
with        2
words       2

Number of unique words: 10
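
The `if`/`else` test in the counting loop above can also be written with the dictionary method `get`, which returns a default value when a key is missing. This is one possible alternative, shown as a sketch rather than as the script's own approach:

```python
text = ('this is sample text with several words '
        'this is more sample text with some different words')

word_counts = {}

for word in text.split():
    # get returns 0 when word is not yet a key, so one statement
    # handles both the insert case and the update case
    word_counts[word] = word_counts.get(word, 0) + 1

print('Number of unique words:', len(word_counts))  # 10
```

Both versions produce the same dictionary; the `get` form simply trades the explicit membership test for a default value.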

Python Standard Library Module collections

  • The Python Standard Library's collections module already provides the counting functionality shown above.
  • A Counter is a customized dictionary (a subclass of dict) that receives an iterable and counts how many times each element occurs.
In [2]:
from collections import Counter
In [3]:
text = ('this is sample text with several words '
        'this is more sample text with some different words')
In [4]:
counter = Counter(text.split())
In [5]:
for word, count in sorted(counter.items()):
    print(f'{word:<12}{count}')
    
different   1
is          2
more        1
sample      2
several     1
some        1
text        2
this        2
with        2
words       2
In [6]:
print('Number of unique keys:', len(counter))
Number of unique keys: 10
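
Because a Counter is a dictionary, it supports the usual dictionary operations, and it adds a few of its own. The sketch below repeats the setup so that it runs on its own and illustrates two of them: the `most_common` method and the zero default for missing keys. The name `top_three` and the lookup word `'python'` are illustrative choices, not part of the session above.

```python
from collections import Counter

text = ('this is sample text with several words '
        'this is more sample text with some different words')
counter = Counter(text.split())

# most_common(n) returns a list of the n (element, count) pairs
# with the highest counts
top_three = counter.most_common(3)
print(top_three)

# looking up a missing key yields 0 instead of raising KeyError,
# and does not add the key to the Counter
print(counter['python'])       # 0
print('python' in counter)     # False

# the counts add up to the total number of tokens
print(sum(counter.values()))   # 16
```
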

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 6 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
