Lecture 16

Review

  1. What is data abstraction?
  2. What is data encapsulation?

Classes

Classes are programming structures that hold data and methods. They allow you to bundle together all data related to a thing and all of the methods for interacting with that thing. This allows for both data encapsulation and data abstraction: you are encapsulating all of the information related to a thing in one variable/object, and you are abstracting away (or hiding) the details of how that information is stored. Basically, by creating a class, you are able to create a new kind of data type.

In the paragraph above, classes are described as representing "things". What kinds of things can classes represent? Anything, probably. When creating a class, you will need to answer these two questions:

The answers to these two questions depend on how the class will be used. Not all data for a thing is relevant for a program.

Some terminology:

Creating a Simple Class

A class has both a class header and a class body. The class header starts with the class keyword, then the class name, ending with a colon. The class header has some optional parts (such as inheritance) that we may cover later. The class body is indented from the class header and contains fields (variables) and methods (functions).

class ClassEx1:
    var1 = None
    var2 = 'cat'
    
    def set_var1(self, value):
        self.var1 = value

In the example above, the class is named "ClassEx1". Any valid identifier will work as a class name. Traditionally, class names capitalize the first letter in each word of the name (e.g. the 'C' in "Class" and "E" in "Ex").

In the body of the class, there are two fields. Notice that creating a field is very similar to creating a variable. There is also a method in the body. Creating a method is very similar to creating a function. Below, we'll cover some of the nuance in fields and methods (such as what self is doing in the method).

To use a class, you must create an instance of the class. With that instance, you can access fields and methods. In most cases, each instance of a class has its own set of fields. So, two instances of a class can each have their own values for the fields.

Code:

a = ClassEx1() #this creates an instance of the class
print('a.var2 =', a.var2) #this accesses var2, belonging to the instance 'a'
a.var2 = 'dog' #changing a's var2's value
print('a.var2 =', a.var2)
print('a.var1 =', a.var1)
a.var1 = 1
print('a.var1 =', a.var1)
a.set_var1('ant') #here's how you call a method...more on this below
print('a.var1 =', a.var1)
print("notice a.var1's value changed")

b = ClassEx1()
print('b.var1 =', b.var1)
print('b.var2 =', b.var2)
print("notice b's fields are separate from a's")
b.var1 = 'bird'
print('b.var1 =', b.var1)
print('a.var1 =', a.var1)
print("changing b's var1 does not affect a's var1")

Output:

a.var2 = cat
a.var2 = dog
a.var1 = None
a.var1 = 1
a.var1 = ant
notice a.var1's value changed

b.var1 = None
b.var2 = cat
notice b's fields are separate from a's
b.var1 = bird
a.var1 = ant
changing b's var1 does not affect a's var1
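
The phrase "in most cases" above hides one exception worth seeing. The sketch below uses a hypothetical class named Shared (not part of these notes) to show what can happen when a field written in the class body holds a mutable value such as a list: modifying the list in place affects every instance, while assigning a new value creates a field on just one instance.

```python
class Shared:  # hypothetical class, used only for this illustration
    items = []     # one list object, stored on the class itself
    label = 'x'

    def add_item(self, item):
        # append() changes the existing list in place; no new field is created,
        # so every instance still sees the same list
        self.items.append(item)


a = Shared()
b = Shared()

a.add_item('ant')
print('b.items =', b.items)   # b.items = ['ant']

a.label = 'y'                 # assignment creates a field that belongs to a only
print('a.label =', a.label)   # a.label = y
print('b.label =', b.label)   # b.label = x
```

Creating fields inside a method with an assignment to self (as set_var1 does) avoids this kind of sharing.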

Methods and self

There are a few additions that make methods different from functions. In almost all cases, methods need access to the object's fields. In Python, this is accomplished with the first parameter of a method. This first parameter always refers to the object the method is being called on. Traditionally, this first parameter is called self; the name is not required, but Python programmers will be confused if you do not follow this convention. Whenever you write a method, you must include at least one parameter (the self parameter).

Code:

class ClassEx2:
    def method():
        print('method was called')

a = ClassEx2()
a.method()

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: method() takes 0 positional arguments but 1 was given

When calling a method, Python automatically passes in a reference to the object you're dealing with. So even though no arguments were given to method in the example above, Python automatically included one.
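
One way to see this automatic argument is to make the same call two ways. The sketch below uses a hypothetical class named Greeter (not part of these notes): the usual call through the object, and an explicit call through the class where the object is passed in by hand.

```python
class Greeter:  # hypothetical class, used only for this illustration
    name = 'nobody'

    def greet(self):
        return 'hello from ' + self.name


g = Greeter()
g.name = 'g'

# The usual call: Python passes g in as self automatically.
via_instance = g.greet()

# The same call written out explicitly: look the method up on the class
# and pass the object in by hand.
via_class = Greeter.greet(g)

print(via_instance)   # hello from g
print(via_class)      # hello from g
```

Both calls run the same method with the same object as self, which is why a method defined with no parameters fails as soon as it is called on an object.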

This also means that if you want to pass an argument into a method call, you must have two parameters. The first parameter will be the self reference and the second will be the value to pass in.

Code:

class ClassEx3:
    def methodBAD(value): #"bad" because its only parameter receives the object, not the argument
        print('the value was:', value)
    
    def methodGOOD(self, value): #"good" because it has a self reference parameter and a value parameter
        print('the value was:', value)

a = ClassEx3()
print('calling methodBAD')
a.methodBAD('argument passed in')

print('calling methodGOOD')
a.methodGOOD('argument passed in')

Output:

calling methodBAD
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: methodBAD() takes 1 positional argument but 2 were given
calling methodGOOD
the value was: argument passed in

The __init__ Method

When you create a new object (e.g. a = ClassEx3()), you're actually calling a constructor. A constructor is a method that "constructs" (also called "builds" or "initializes") the object, basically by setting up the object's fields. In Python, the constructor is called "__init__" (those are two underscores before "init" and two after), short for "initializing". For example:

class Person:
    def __init__(self):
        print('initializing a new Person object')

Just like any other method, constructors can take arguments. For example, to initialize a Person object with their name, write the constructor as:

class Person:
    def __init__(self, name):
        self.name = name

Notice that the field "name" doesn't exist until we create it in the constructor. This is ok. Often, the __init__ method is where fields are created. Since this method is called whenever an object is created, fields created in it exist for the object's entire lifetime. Now, when we create Person objects, we must pass in the person's name.

Code:

class Person:
    def __init__(self, name):
        self.name = name

print('creating Joel')
joel = Person('Joel')
print('creating baby')
baby = Person()

Output:

creating Joel
creating baby
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'name'
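
For contrast, here is a small sketch of the successful case. It reuses the Person class from above; the second person ('Ann') is made up for illustration. Each object gets its own name field from the argument passed to the constructor, and the failing call is wrapped in try/except so the script keeps running.

```python
class Person:
    def __init__(self, name):
        self.name = name   # the field is created here, once per object


joel = Person('Joel')
ann = Person('Ann')        # made-up second person for the example

print('joel.name =', joel.name)   # joel.name = Joel
print('ann.name =', ann.name)     # ann.name = Ann

# Leaving out the required argument still fails, but now we handle the error.
try:
    baby = Person()
except TypeError as error:
    print('could not create baby:', error)
```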

"Private" Members and Set/Get Methods

Often, we can't trust users to do the right thing. Thus, with classes, we often engage in data hiding. Data hiding is when we hide fields from the user and force them to use methods to access and modify fields. These methods can then validate the values coming in from the user. They also help to ensure data abstraction and encapsulation. By default, all members are publicly accessible; that is, they are accessible both inside the class and outside of it. Private members are those that are only directly accessible inside the class (i.e. the members are hidden).

To hide a field (or method) from the user, start the member's name with two underscores. This causes Python to secretly mangle the name so that it isn't easily accessible outside of the class. Inside the class, you can still access the member using its name (including the two underscores).

For example, we want an age field in our Person class, but it only makes sense for age to be a positive number (maybe even just a positive int). So, we don't want the age field to be publicly accessible (why not?). To make age private, we name it __age. See the example below:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.__age = age

The user of the Person class can still access name directly but they can't access age:

>>> joel = Person('Joel', 65)
>>> print('name =', joel.name)
name = Joel
>>>
>>> print('age =', joel.__age)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Person' object has no attribute '__age'
>>>
>>> print('age =', joel.age)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Person' object has no attribute 'age'
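
The mangling mentioned above can be observed directly. In the sketch below, the field written as __age inside Person ends up stored under the name _Person__age. This is shown only to explain the error messages; reaching for the mangled name from outside the class defeats the purpose of hiding the field.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.__age = age   # stored on the object under the mangled name _Person__age


joel = Person('Joel', 65)

print(hasattr(joel, '__age'))          # False
print(hasattr(joel, '_Person__age'))   # True
print(joel._Person__age)               # 65

# Outside the class, names are not mangled, so this assignment does NOT
# change the hidden field. It quietly creates a new, unrelated field.
joel.__age = -5
print(joel._Person__age)               # 65
```

The last two lines are a common source of confusion: the assignment appears to work, yet the value used inside the class is unchanged.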

If you want to allow the user to access your private fields, you need to write set and get methods (also called setters and getters, or mutators and accessors). A get method gets the value of a field. A set method sets, modifies, or mutates the value of a field. Traditionally, accessors follow the naming convention "get_field_name" and mutators follow the naming convention "set_field_name". For example:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.set_age(age)
    
    def set_age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')
        
        self.__age = age
    
    def get_age(self):
        return self.__age

Now, if the user wants to access age, they can use the set_age and get_age methods. Notice that the set_age method also validates the age, ensuring that it is a positive int. Notice also that the __init__ method now calls the set_age method instead of directly setting __age. Why do you think this change was made?
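
A short usage sketch for the class above (the class is repeated so the example runs on its own). The invalid values tried here, -1 and 'old', are just sample inputs chosen for illustration.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.set_age(age)

    def set_age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')

        self.__age = age

    def get_age(self):
        return self.__age


joel = Person('Joel', 65)
joel.set_age(66)
print('age =', joel.get_age())   # age = 66

# Invalid values are rejected before the assignment happens.
for bad_age in (-1, 'old'):
    try:
        joel.set_age(bad_age)
    except (TypeError, ValueError) as error:
        print(type(error).__name__, '-', error)

print('age =', joel.get_age())   # age = 66
```

Because the setter raises before reaching the assignment, a rejected value never replaces the stored age.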

Your setter should validate the value before doing the assignment. In many cases, your setter and getter methods should save and return copies of the fields and not the originals so that the field's value can only be modified through the setter. We'll talk more about this when we cover data aggregation.
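
To illustrate the point about copies, here is a minimal sketch. The nicknames field and its set/get methods are hypothetical additions made up for this example; they are not part of the Person class used in the rest of these notes.

```python
class Person:
    def __init__(self, name, nicknames):
        self.name = name
        self.set_nicknames(nicknames)

    def set_nicknames(self, nicknames):  # hypothetical setter for illustration
        if not isinstance(nicknames, list):
            raise TypeError('nicknames must be a list')
        self.__nicknames = list(nicknames)   # save a copy, not the caller's list

    def get_nicknames(self):             # hypothetical getter for illustration
        return list(self.__nicknames)        # return a copy, not the stored list


original = ['Joe']
joel = Person('Joel', original)

original.append('Hacker')                # changes the caller's list only
joel.get_nicknames().append('Intruder')  # changes a temporary copy only

print(joel.get_nicknames())              # ['Joe']
```

Without the two calls to list(), both append calls would have changed the list stored inside the object, bypassing the setter and its validation.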

In many other programming languages, it is strongly recommended to make all fields private. Python is more lax about that, but it is still a good idea to make your fields private.

Property

With some fields public (e.g. name in the Person class) and other fields private (e.g. age in the Person class), accessing fields becomes inconsistent. The programmer will need to memorize which fields they can access directly and which fields they must use methods for.

One solution to this problem is to provide set/get methods for all fields. A lot of programming languages take this approach (and also say that almost all fields should be private). This is an option for Python as well. In our Person class, we could do:

class Person:
    def __init__(self, name, age):
        self.set_name(name)
        self.set_age(age)
    
    def set_name(self, name):
        self.__name = name
    
    def get_name(self):
        return self.__name
    
    def set_age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')
        
        self.__age = age
    
    def get_age(self):
        return self.__age

In the code above, notice that name was changed to __name, set/get methods were written, and set_name is used in the __init__ method.

While this solves our problem of inconsistent access to fields, it has the unfortunate effect of making the user type more to access a field. For example, now the user has to type:

joel.set_name('Joel')

instead of the shorter option of:

joel.name = 'Joel'

Python offers an alternative to this first solution: create properties. Properties are basically pseudo-fields. The user can treat them like a field, but secretly the user is interacting with methods. To create a property, first write set and get methods. Then, use the property function to create the property by passing in the get and set methods (do not call these methods when passing them in). The property function returns the pseudo-field, so store it in the class body under the name you want the pseudo-field to have. For example:

class Person:
    def __init__(self, name, age):
        self.set_name(name)
        self.set_age(age)
    
    def set_name(self, name):
        self.__name = name
    
    def get_name(self):
        return self.__name
    
    name = property(get_name, set_name)
    
    def set_age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')
        
        self.__age = age
    
    def get_age(self):
        return self.__age
    
    age = property(get_age, set_age)

To now use these properties, treat them just like a field:

>>> joel = Person('Joel', 65)
>>> print('name =', joel.name)
name = Joel
>>> print('age =', joel.age)
age = 65
>>> joel.age = 66
>>> print('age =', joel.age)
age = 66
>>> joel.age = -1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: age must be a positive int
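
Python code in the wild often writes the same thing with decorator syntax instead of calling property directly. The sketch below is an alternative spelling, not the approach used in these notes: @property marks the get method, and @age.setter marks the set method. Only the age field is shown; name is left as a plain public field to keep the example short.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age   # this assignment goes through the setter below

    @property
    def age(self):
        return self.__age

    @age.setter
    def age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')

        self.__age = age


joel = Person('Joel', 65)
joel.age = 66
print('age =', joel.age)   # age = 66

try:
    joel.age = -1
except ValueError as error:
    print('rejected:', error)   # rejected: age must be a positive int
```

The behavior matches the property(get_age, set_age) version; the difference is that no separately named get_age and set_age methods are left on the class.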

There are other things you can do with properties, such as providing a del method and documentation. We won't cover these features. However, the last thing to cover with properties is that you can use them to make constant fields. Constant fields are fields whose values do not change. To create a constant field, write only a get method and pass only that get method to the property function. In the example below, two constant fields are created, birthday and home_planet, and some other changes are made to support them:

class Person:
    def __init__(self, name, age, birthday):
        self.set_name(name)
        self.set_age(age)
        self.__set_birthday(birthday)
    
    def set_name(self, name):
        self.__name = name
    
    def get_name(self):
        return self.__name
    
    name = property(get_name, set_name)
    
    def set_age(self, age):
        if not isinstance(age, int):
            raise TypeError('age must be a positive int')
        elif age < 0:
            raise ValueError('age must be a positive int')
        
        self.__age = age
    
    def get_age(self):
        return self.__age
    
    age = property(get_age, set_age)
    
    def __set_birthday(self, birthday):
        #some validation code to ensure birthday is a valid date should be here
        self.__birthday = birthday
    
    def get_birthday(self):
        return self.__birthday
    
    birthday = property(get_birthday)
    
    def get_home_planet(self):
        return 'Earth'
    
    home_planet = property(get_home_planet)
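
A brief sketch of how constant fields behave when used. This is a trimmed-down Person that keeps only the two constant fields, and the birthday string is a made-up value for illustration. Reading works as usual; assigning raises an AttributeError because no set method was given to property.

```python
class Person:
    def __init__(self, name, birthday):
        self.name = name
        self.__birthday = birthday

    def get_birthday(self):
        return self.__birthday

    birthday = property(get_birthday)         # get method only: read-only

    def get_home_planet(self):
        return 'Earth'

    home_planet = property(get_home_planet)   # get method only: read-only


joel = Person('Joel', '1958-03-14')   # made-up birthday for the example

print('birthday =', joel.birthday)         # birthday = 1958-03-14
print('home planet =', joel.home_planet)   # home planet = Earth

# Assigning to a property that has no set method fails.
try:
    joel.birthday = '2000-01-01'
except AttributeError as error:
    print('cannot change birthday:', error)

print('birthday =', joel.birthday)         # birthday = 1958-03-14
```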

Notice that:
