Introduction to Python Data Types

Introduction

You cannot have a concert without musicians. Whether it is a 100-member orchestra or an individual singer, without musicians to provide music, there is no concert. In much the same way, there can be no software without data.

From the most complex piece of top-secret military software to a one-line “Hello, World!” script, every piece of software is a transfer of data back and forth between the user (and/or the programmer) and the computer.

A computer is just a complex tool that is better at storing, retrieving, and processing data than a human is. From simple calculators to cell phones to experimental supercomputers, every computer is a tool that is used to handle data.

A calculator, for instance, takes in data (the numbers and operators you type) and returns data in the form of the solution to your problem. Your laptop’s operating system is primarily a tool that stores and retrieves data (your files and their contents) for you in a way that is faster and less burdensome than keeping and referencing paper files. Your browser sends HTTP requests to this website, and the website returns this blog post.

Because the exchange of data is at the heart of all software, Python data types and data objects are the basic, fundamental building blocks that you will use to construct your code.

I will dedicate significant time and a significant number of posts to data types. This post will act as your top-level Python data type hub for this website. The data type discussions below will be linked to more in-depth discussions of each data type.

If you’ve come to this blog to learn Python, I hope you’ll read through each of the detailed data type discussions.

Python’s Built-In Data Types

Python comes with fifteen (15) built-in data types. Among them, there are several different kinds of data types. Let’s take a look at the fifteen data types, grouped by the different kinds of information that they store.

  • Text Data Types
    • Strings, represented as str
  • Numeric Data Types
    • Integers (int)
    • Floating Point Numbers (float)
    • Complex Numbers (complex)
  • Sequence Data Types
    • Lists (list)
    • Tuples (tuple)
    • Ranges (range)
  • Mapping Data Types
    • Dictionaries (dict)
  • Set Data Types
    • Sets (set)
    • Frozen Sets (frozenset)
  • Boolean Data Type
    • Boolean (bool)
  • Binary Data Types
    • Bytes (bytes)
    • Byte Arrays (bytearray)
    • Memory Views (memoryview)
      • NOTE: A memory view wraps an existing bytes-like object, so a call would look like this: memoryview(bytes(5))
  • The None Type
    • None Type (NoneType)

It isn’t likely that you’ll have any reason to use the binary types as a beginner, so we are going to mostly skip over them in this article. While the None Type will be useful later on, it isn’t something we’ll use right away so we’ll leave it out of the discussion for now.

In this article, we’ll focus on the remaining eleven (11) data types. While we have broken them down into several different kinds of data types above, it is easiest to think of them as falling into two broad groups.

The numeric data types – integers, floating point numbers (usually called ‘floats’), and complex numbers – along with the boolean type, are single-value data types. When you are dealing with any of these data types, you will know that they hold only one value.

All of the other data types are data structures. Each of these data types is capable of holding more than one value. Some of them store their values in a specific order. Others store their values in pairs. All of them, however, can hold and return more than one value.

Now that we are familiar with Python’s data types, let’s look at how we can check the data type of an object in Python.

Checking the Type of a Python Variable or Data Object

You may encounter situations where you need to know the data type of a variable. For example, if you are collecting a user’s age, you will be expecting an integer. You may need to perform operations, such as assigning the user to an age range, that are dependent on the age being a numeric value. As an error check, you may want to confirm the data type of the user’s input before storing it.

In situations like this, we can use the built-in type() function to check on a variable. In the following example, a variable called ‘age’ is created in the age_check() function that takes input from a user. Because input() always returns a string, the function first tries to convert the input with int(). It then uses type() to ensure that the user’s input has become an integer.

def age_check():
    age = input('What is your age? ')    #input() always returns a string
    try:
        age = int(age)    #Convert the text to an integer if possible
    except ValueError:
        pass    #Leave age as a string if it cannot be converted
    if type(age) == int:    #Using type() to ensure age is an integer
        print('Thank you!')
    else:    #An else statement to handle situations where age is not an integer
        print('Please enter an integer value')

The type() function can also be used for debugging. For instance, if you are getting a type error, you may want to insert print(type(problem_var)) (replacing problem_var with the variable in your code that is causing the error) just before the line where the error is occurring. When the program runs, it will print the data type of your problem variable. If it is not the data type that you are expecting, then you can modify your code to fix the problem.
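As a quick illustration of what type() reports, the sketch below prints the type of a few values. The name problem_var is just the placeholder used in the paragraph above, and isinstance() is shown as a built-in alternative when all you need is a yes-or-no answer.

```python
# problem_var is a placeholder name, not part of any real program
problem_var = '42'

print(type(problem_var))    # <class 'str'>
print(type(42))             # <class 'int'>
print(type(4.2))            # <class 'float'>

# isinstance() answers a yes/no question about a value's type
is_number = isinstance(problem_var, (int, float))
print(is_number)            # False
```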

I used the type() function for this exact reason when working on a program to help me with my fantasy baseball endeavors. I had gathered a fairly large (absolutely massive) amount of statistics and was performing additional calculations specific to my purposes.

In baseball, most statistics that represent percentages, such as batting average, on-base percentage, and slugging percentage, are represented as decimals. For instance, if a batter gets a hit in exactly 30% of their at-bats, then their batting average is .300.

However, I was trying to perform calculations with an ‘advanced’ statistic called fly ball percentage, and I kept getting a data type error. The system was telling me that it was expecting a numeric data type but was not getting it.

Based on the rest of the information in the error message, I was able to determine the variable that was causing the error. We’ll call it fly_ball_pct. Right before the line in my code where the error was occurring, I put the following:

print(type(fly_ball_pct))
input('')

I use the input() function here to pause the program at the moment that it prints the data type so that I can examine what it is. An alternative would be to write the result of the print() statement to a text file. Doing this, however, is beyond the scope of this article.

As it turned out, the fly ball percentage statistic, and a cluster of other percentage statistics related to it, were strings. Rather than being stored in decimal format like batting average, these statistics were stored as strings with percent symbols. So, instead of sending the value .300 like I thought, the function I was running was actually receiving the string value ‘30.00%’ and trying to perform calculations on it as though it were a numeric value.
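A fix for this kind of problem might look like the sketch below. The helper name pct_to_decimal is hypothetical, and this is not necessarily how the original program was repaired; it simply shows one way to turn a percent string into a decimal value.

```python
def pct_to_decimal(text):
    """Convert a string such as '30.00%' into a decimal such as 0.3."""
    cleaned = text.strip().rstrip('%')    # drop whitespace and the percent sign
    return float(cleaned) / 100           # '30.00' -> 30.0 -> 0.3

fly_ball_pct = '30.00%'                   # the kind of value being received
fly_ball_pct = pct_to_decimal(fly_ball_pct)

print(type(fly_ball_pct))                 # <class 'float'>
print(fly_ball_pct)
```

Once the value is a float, calculations on it behave the same way they would for batting average.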

As you will see later on, you must exercise a fair bit of diligence to avoid data type errors in Python. It is one of the few drawbacks to the flexibility that Python gives you when assigning values to variables and data objects. Luckily, it is usually fairly easy to pinpoint, identify, and fix errors, just like I described above.

Setting the Data Type of a Python Variable or Data Object

Setting a variable’s data type in Python is very easy to do. It is less easy, however, to explain.

In this section, I will first explain in fairly general terms how data types are set in Python and show you how to assign a value to a variable. Then we will take a step back and talk about the concept of dynamic typing and how it makes Python different from many other programming languages!

How to Set a Variable’s Data Type

In Python, a variable’s data type is set at the moment that you assign a value to it. You do not need to declare the fact that you are creating a variable, nor do you have to declare the type of the variable that you are creating. This is markedly different from statically typed programming languages such as C++.

In C++, for example, if we want to declare a variable that will hold an integer value, we must declare it as an integer. The syntax is:

variable_type variable_name = value;

Let’s look at an example. In the below C++ code, we declare a variable called my_number, assign a value of 13 to that variable, and then print the value of the variable to the terminal.

int my_number = 13;
cout << my_number;

Note that we first declare the variable as an integer with the int keyword, then name the variable, then assign a value to it. If we were to instead write ‘my_number = 13;’ without declaring my_number first, the compiler would report an error because it has never been told what my_number is.

In Python, declaring a variable as an integer is much simpler. To assign the integer data type to a variable, all you need to do is assign an integer value to that variable! In the below example, we assign a value of 13 to the variable my_number. Because 13 is an integer value, the integer data type is automatically assigned to the my_number data object.

my_number = 13

One of the reasons (although not the only reason) that variables must be declared in other languages has to do with memory. Among its basic numeric data types, C++ has integers, floats, and doubles. As in Python, integers are whole numbers. Floats and doubles both represent decimal values. The difference is precision: a float is accurate to roughly six or seven significant digits, while a double is accurate to roughly fifteen.

When C++ was first released in 1985, computer memory was not nearly as abundant as it is today. The Apple Lisa, released a couple of years earlier, had a 5 Megabyte hard drive and only a single megabyte of RAM. A C++ integer takes up two or four bytes, depending on the platform and compiler rather than on the value stored. On most systems, a float uses four bytes and a double uses eight.

In a world where you only had 1MB of RAM to work with, it made sense to only use floats and doubles if they were necessary. With only one million bytes to work with, every single byte counted! Luckily, computing has advanced tremendously since the 1980s and, for the vast majority of beginners, low-level memory concerns will not be an issue.

If you do want to specify a variable’s type, you can do so by typing the name of the data type followed by parentheses enclosing the variable or value that you want to assign a type to. The following example shows how we would do this for each of the Python data types except the None Type, which has only one value (None) that is assigned directly.

#Variable assignments that would assign specific data type to some value

myvar = str('some value') #Sets myvar to the string value of 'some value'
myvar = int(134)
myvar = float(134.5)
myvar = complex(8j)
myvar = list(('some', 'different', 'values', 45))
myvar = tuple(('item1', 345, 'item3'))
myvar = range(9)
myvar = dict(name='PyTaster', age=1)
myvar = set(('item1', 'item2', 'item3'))
myvar = frozenset(('item1', 'item2', 'item3'))
myvar = bool(True)
myvar = bytes(5)
myvar = bytearray(5)
myvar = memoryview(bytes(5))

#using a type name to convert a value to a different type

myvar = '100' #note that this is a string, not an integer
my_number = int(myvar) #turning the myvar string into an integer value

Because of something called dynamic typing, the above code would actually run without error. In the next section, we’ll talk about what dynamic typing is and how it makes Python different from many other widely-used programming languages.
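As a small sketch of the conversions described above, the snippet below converts a few values and shows what happens when a conversion is impossible. The variable names are only for illustration.

```python
myvar = '100'                  # a string, not an integer
my_number = int(myvar)         # now the integer 100
my_float = float('145.678')    # a string converted to a float
my_text = str(13)              # an integer converted to the string '13'

print(my_number + 1)           # 101

try:
    int('thirteen')            # this text cannot be read as a number
except ValueError:
    conversion_failed = True
    print('could not convert')
```

Converting is not the same as guessing: Python only converts text that actually looks like a number, and raises a ValueError otherwise.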

Dynamic Typing in Python

Python uses dynamic typing with its data objects. What this means is that, just as we do not have to initialize a variable’s type when we assign a value to it, we do not need to perform any special initialization to change a variable’s data type. In other words, we can reassign variables to different data types.

Remember in C++ how we declared an integer variable:

int my_number = 13;

C++ is a statically typed language. This means that once a variable is declared as a specific data type (such as integer), the variable will remain that data type for as long as it exists. If we were to later write the following line of code, it would produce a type error in C++.

my_number = "thirteen";

Because my_number has already been declared as an integer, we can only assign integer values to it. Because the declaration of my_number told the compiler that the variable called my_number is an integer, a type error will be produced if we attempt to assign it a string value.

This is not the case in Python. In Python, the following code is totally acceptable:

my_number = 13

my_number = 'thirteen'

my_number = [12, 13, 'fourteen']

Here we have changed the data type of my_number from an integer to a string to a list simply by assigning it a value of each of those types. There are obvious advantages to dynamic typing; however, it also creates some pitfalls to look out for. Let’s take a look at the major pros and cons of dynamic typing.
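Before moving on, the reassignments above can be confirmed with type(). This is only a sketch; the names first_type, second_type, and third_type are made up to hold the results.

```python
my_number = 13
first_type = type(my_number)

my_number = 'thirteen'
second_type = type(my_number)

my_number = [12, 13, 'fourteen']
third_type = type(my_number)

print(first_type, second_type, third_type)
# <class 'int'> <class 'str'> <class 'list'>
```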

Advantages of Dynamic Typing

  • Flexibility. Because of dynamic typing, variables are extremely flexible in Python. The same variable name can easily be reused without any additional code.
  • Readability. Because Python code is not cluttered with variable type declarations, code is much more readable.
  • Efficiency. Especially in large projects with hundreds or thousands of variables, the elimination of variable type declarations in Python can save a significant amount of time.
  • Ease. Because you do not have to think about a variable’s type and remember to declare it, it is relatively easy to learn variable assignment in Python.

Disadvantages of Dynamic Typing

  • Easy to Make Mistakes. Because variables can be easily reassigned to a different data type, it is also easy to accidentally reassign a variable without receiving an error. You will thus need to carefully keep track of your variable names.
  • Unexpected Data Type Errors. Because Python data types are determined by the value assigned to the variable, unexpected data type errors are easy to encounter, especially when dealing with user input. As discussed above, it is a good idea to have type-checking functionality built into your code when you will be performing operations that expect a certain data type.
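The type-checking idea in the last bullet can be sketched as follows. The function name add_tax and the 7% rate are invented for illustration; the point is only that the function refuses a value of the wrong type instead of failing later in a confusing way.

```python
def add_tax(price):
    """Return price plus 7% tax, refusing values that are not numbers."""
    if not isinstance(price, (int, float)):
        raise TypeError('price must be an int or a float')
    return price * 1.07

total = add_tax(100)
print(total)

try:
    add_tax('100')    # a string that only looks like a number
except TypeError as error:
    rejected = True
    print(error)      # price must be an int or a float
```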

The Major Python Data Types

Now that you understand what data types are and how they are assigned and checked, let’s take a look at the major individual data types. As you’ll see, each data type can be useful in different situations. Especially when you are deciding between two similar data types, such as a list and a set, you’ll need to think carefully about how you want to use the data you’ll be storing. Doing this simple thought exercise will often make the choice obvious.

The String Data Type

In general, strings are used to hold text, from letters to words to sentences. To declare a variable as a string, put the value inside of single ‘ or double ” quotes. It doesn’t matter whether you use single or double quotes; however, whichever you choose must both open and close the string.

So, to declare a string, we would write something like:

mystring = 'string'

It should be noted that if you place a number in between quotation marks when declaring a variable, then that number will be read as a string, not a number. For instance, the variables in the following example would each be read as a string, even though they look like an integer and a float, respectively.

not_integer = '133'

not_float = '145.678'

This is important to keep in mind as you are writing software. Arithmetic does not work on a string the way it does on a number, so you’ll need to be sure all numbers that you plan to work with have numeric data types!
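To see why this matters, here is a short sketch of how the same operators behave on a numeric-looking string versus a real integer. The name mixing_failed is only there to record that the last operation raised an error.

```python
not_integer = '133'

print(not_integer + '1')       # 1331   (the strings are joined)
print(not_integer * 2)         # 133133 (the string is repeated)
print(int(not_integer) + 1)    # 134
print(int(not_integer) * 2)    # 266

try:
    not_integer + 1            # mixing a string and an integer
except TypeError:
    mixing_failed = True
```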

At the beginning of this article, I listed all fifteen of Python’s data types. Strings were at the top of that list, and were in their own category. I identified strings as a ‘text data type.’

While this is an important distinction of strings–they are the only data type designed specifically for holding text–strings could easily have been categorized as a sequence data type along with lists, tuples and ranges.

This is because a string is actually an ordered sequence of characters. Let’s create some strings to use as examples.

myword = 'special'
mysentence = 'I am glad to be learning Python!'

It can be easy to think of myword in the above example as a variable holding a single value, i.e., the word ‘special.’ Likewise, it can be easy to think of mysentence as holding a single sentence, or a collection of seven words.

However, as a string, myword is an ordered sequence of the characters in the word ‘special.’ Likewise, mysentence is an ordered sequence of the characters (including spaces) that make up the sentence “I am glad to be learning Python!”

Every character in a string has an index position. A Python string’s index runs from left to right and begins at zero. In the myword string above, the letter ‘s’ would have an index of 0, the letter ‘p’ would have an index of 1, and so forth:

index:  0  1  2  3  4  5  6
char.:  s  p  e  c  i  a  l
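The index positions above can be tried directly with square brackets. This is a brief sketch; the negative index on the last line is an extra detail, counting from the end of the string.

```python
myword = 'special'

print(myword[0])      # s
print(myword[1])      # p
print(myword[6])      # l
print(len(myword))    # 7
print(myword[-1])     # l
```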

While string indexing will become important as you learn about strings in greater detail, for now it is sufficient to know that strings are used to hold text. Let’s move now to the numeric data types, integers and floating point numbers.

Numeric Python Data Types: Integers and Floating Point Numbers

Integers and Floats are the two main numeric data types in Python.

Integers

Integers are a fairly straightforward data type. They are just whole numbers. Any whole number without a decimal, whether positive or negative, is an integer in Python.

In the following example, all three variables that are created are integer values. Note, however, that if any of them were followed by a decimal–even if it is ‘.0’–they would be read as a float by Python.

x = 3

y = 45

z = 69584873748945905

Floating Point Numbers

A floating point number (‘float’ for short) is any positive or negative number that contains a decimal point. Unlike C++, where a float is only accurate to six or seven significant digits and a double has to be used for further precision, in Python all decimal numbers are floats, and a Python float typically has the same precision as a C++ double. Python floats can also be written in scientific notation, using the letter ‘e’ to indicate the power of 10.

#using scientific notation to assign a float to a variable

my_float = 96.4e34

Importantly, if you create a variable that is a whole number but use .0 or .00 after, then the number will be read as a float. This is illustrated in the example below.

my_int = 143 #this will be read as an integer

my_number = 143.00 #this number will be read as a float
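The difference can be checked with type(), as in this short sketch. Note that the two values still compare as equal even though their types differ.

```python
my_int = 143
my_number = 143.00

print(type(my_int))           # <class 'int'>
print(type(my_number))        # <class 'float'>
print(my_number)              # 143.0
print(my_int == my_number)    # True
```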

Now that we’ve covered the numeric data types, we’ll move on to the sequence data types, lists, tuples and ranges.

Sequence Data Types

The sequence data types include lists, tuples, and ranges. Although you are most likely to use lists and occasionally tuples, it is good to understand what a range is and how it works as well.

Lists

The list is the most common sequence data type that you will encounter in Python. A list is an ordered sequence of objects.

To tell Python that we are creating a list when we initialize a variable, we enclose the list values in square brackets (‘[ ]’). Empty brackets can be used to create an empty list:

empty_list = []

The data objects in a list may be of different types. For instance, it is acceptable for a list to have floats, integers, and strings as items:

mylist = ['my_name', 14, 345.3]

As you can see in the above example, the items in a list are separated by commas.

Because they are ordered sequences, lists are indexed, just like strings. We can therefore use a list’s index to retrieve specific items, just like we can with a string. For instance, if we wanted to retrieve 345.3 from the list in the previous example, we would write mylist[2]. Remember: indexing starts at ZERO!
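Here is a minimal sketch of list indexing, along with two ways a list can be changed after it is created. The replacement value 15 and the appended string are arbitrary.

```python
mylist = ['my_name', 14, 345.3]

print(mylist[0])       # my_name
print(mylist[2])       # 345.3

mylist[1] = 15         # replace the item at index 1
mylist.append('new')   # add an item to the end
print(mylist)          # ['my_name', 15, 345.3, 'new']
print(len(mylist))     # 4
```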

Tuples

The next type of data object that we’ll look at is the tuple. To declare a tuple, we use parentheses (). Here is an example:

my_tuple = (143, 'pytaster', 43.6)

Tuples are very similar to lists. Just like a list, a tuple is an ordered sequence of objects. Tuples, however, have one major difference from lists. Tuples are immutable.

Immutability means that once an element is inside a tuple, it cannot be reassigned. With a list, we can replace, add, or remove items if we want to. We cannot do this with a tuple. For instance, in the tuple that we created above, the string ‘pytaster’ will always be at index position 1. We cannot change it to a different value.

One example of a situation where you might use a tuple is when you have a list that needs to be kept in order. For instance, say you have an alphabetized list of names that you are going to use for addressing envelopes. You need the names to stay in order to ensure they match with addresses and other information. You may store the alphabetized list in a tuple rather than a list to ensure that the list is not modified.
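Immutability is easy to see in practice. In this sketch, reading from the tuple works just as it does for a list, while trying to replace an item raises a TypeError. The name message is only used to capture the error text.

```python
my_tuple = (143, 'pytaster', 43.6)
print(my_tuple[1])    # pytaster

try:
    my_tuple[1] = 'something else'
except TypeError as error:
    message = str(error)
    print(message)    # 'tuple' object does not support item assignment
```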

Ranges

A range is, essentially, exactly what it sounds like. To declare a range, we call its type and put an integer in parentheses:

my_range = range(7)

my_range now represents the range of values from zero up to, but not including, seven. In other words, it covers the seven integers from 0 through 6.

If we were to write print(my_range), the output would be range(0, 7). If we wanted to print all of the numbers in the range, we could use iteration. We’ll discuss iteration in a later post. For now, just understand that the code in the following example runs through each number in my_range and prints it to the terminal. I’ll demonstrate both the input and the output for clarity.

Input:

my_range = range(7)

for i in my_range:    #iterates through each of the values in my_range
    print(i)        #prints each of the values in my_range

Output:

0
1
2
3
4
5
6

Ranges can be used in a variety of applications, from survey scores to generating random numbers within a defined set. While you’re unlikely to encounter ranges very often as a beginner, you’ll know what they are when you see them. They are another tool in your growing Python tool belt!
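To make the boundaries of a range concrete, here is a brief sketch. Wrapping a range in list() is simply a convenient way to see all of its values at once, and the variable names are illustrative.

```python
first_seven = list(range(7))        # start defaults to 0; 7 is excluded
one_to_seven = list(range(1, 8))    # explicit start and stop
evens = list(range(0, 10, 2))       # the third value is the step

print(first_seven)      # [0, 1, 2, 3, 4, 5, 6]
print(one_to_seven)     # [1, 2, 3, 4, 5, 6, 7]
print(evens)            # [0, 2, 4, 6, 8]
print(7 in range(7))    # False
```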

Dictionaries

Dictionaries are similar to lists, but their values are looked up by key rather than by position. Instead of being stored in an indexed sequence, dictionary values are stored in key-value pairs. This means that every entry in a dictionary is actually two pieces of data: a key, and the value that the key is tied to.

To declare a dictionary, we use curly braces ( { } ) to hold our values. The items in each key value pair are separated by a colon (‘:’). Examine the following dictionary that stores names paired with birth months:

my_dictionary = {'John':'Dec', 'Susan':'Jul', 'Brian':'Mar', 'Jake':'Dec'}

This dictionary demonstrates how a dictionary might be useful. For instance, if we want to know Brian’s birth month, we can look up Brian in the dictionary and his name will be paired with the month of March.

Similarly, if we wanted to know who had a December birthday we could search the dictionary for ‘Dec’ and find out that John and Jake both have December birthdays.
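Both lookups can be sketched in a few lines. Looking up a value by its key uses square brackets, while searching by value means walking through the pairs. The list name december is an assumption made for this example.

```python
my_dictionary = {'John': 'Dec', 'Susan': 'Jul', 'Brian': 'Mar', 'Jake': 'Dec'}

# look up a value by its key
print(my_dictionary['Brian'])    # Mar

# search the pairs for every name with a December birthday
december = []
for name, month in my_dictionary.items():
    if month == 'Dec':
        december.append(name)

print(december)    # ['John', 'Jake']
```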

Sets

Like lists and tuples, sets are collections of objects, and the objects within a set can be of different data types. Sets, however, have some key differences from the sequence data types.

A set is an unordered collection of unique objects.

Based on this definition, we can see that sets differ from lists and tuples because sets are unordered, while lists and tuples are ordered sequences.

More importantly, however, each object in a set must be unique. This means, for instance, if the number 100 (or the string ‘mystring’, etc.) appears in a set, it can only appear once.

To declare a set, we place a list of unique individual items inside a set of curly braces { }. Although sets and dictionaries both use curly braces to store their data, Python can tell the difference because set items are not stored in key-value pairs and thus do not have the colon separators that are characteristic of dictionaries.

In the following example, we declare a set containing a string, an integer, and a float.

my_set = {'some string', 143, 67.99}

If you declare a set and repeat a value within it, you will not encounter an error. However, the set that is created will have the repeated value listed only once. This example will better illustrate what I am saying.

Input:

my_set = {1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4}
print(my_set)

Output:

{1, 2, 3, 4}

Sets are a perfect tool for removing duplicates from a list. All you need to do is declare a new variable that puts the list into a set data type. Let’s take a look with an example.

Input:

#A list with duplicated values

my_list = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4]

#turning the list into a set

my_set = set(my_list)

print(my_set)

Output:

{1, 2, 3, 4}
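If you need a list again after removing the duplicates, one option is sketched below. Because a set has no guaranteed order, sorted() is used here to produce a list in a predictable order. One related detail: empty curly braces create a dictionary, so an empty set is written set().

```python
my_list = [1, 1, 2, 2, 3, 3, 4]

unique_items = sorted(set(my_list))    # back to a list, in sorted order
print(unique_items)         # [1, 2, 3, 4]

print(3 in set(my_list))    # True
print(type({}))             # <class 'dict'>
print(type(set()))          # <class 'set'>
```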

Booleans

We will end our introduction to Python data types with booleans. Boolean data objects can hold only one of two values: True or False. The capitalization here is intentional–boolean values in Python are capitalized.

Boolean objects are generally used when we want a function or process to continue running while something is true, but want that process to end or be modified if the condition becomes false.

The following example is simplified, but it should give you an idea of how you might practically use a boolean variable.

paid = False

answer = input('Do you want to pay? ')

if answer.lower() == 'yes':
    #the payment would be processed here
    paid = True

In this example, we have a boolean variable, paid, that is set to false. Perhaps this is part of a system that monitors monthly subscription payments.

Once the user has paid, the boolean variable is set to true. In a working system, this variable might automatically reset to false after 30 days to restart the next month’s payment cycle.
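The idea of running a process until a condition changes can be sketched with a while loop. Here the counter attempts is a stand-in for real payment logic, and the number 3 is arbitrary.

```python
paid = False
attempts = 0

while not paid:          # keep going until paid becomes True
    attempts += 1
    if attempts == 3:    # stand-in for "the payment went through"
        paid = True

print(paid)        # True
print(attempts)    # 3
```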

Conclusion

With that, we’ll wrap up your “introduction” to Python data types. I know that this may seem like a lot. I promise that this article, and the additional articles on the individual data types, will be worth the time and effort. When you start really writing code you’ll see that the early work on understanding data types was worthwhile.

I hope that you were able to learn something from this article, and I hope that you enjoyed reading it, too. Until next time, happy coding!