Wednesday, 19 October 2016

Python Built-in Data Structures : List, Dictionary, Tuple, Set, String, File Object

Hello again, readers! This is the 5th article in the series Python on Terminal and, this time, we are going to cover the basics of data structures in Python. Everything in Python that occupies memory is treated as an object, and the type an object belongs to is called its data structure (or data type). An object is a region of memory holding related data, which may be assigned or predefined, and it is associated with a set of operations. There are two kinds of data types (or data structures):


1. Built-In Objects or Data Types
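The four most important built-in container objects in Python are listed below. Numbers, strings and file objects, which are covered later in this article, are built in as well.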

  • List
  • Dictionary
  • Tuple
  • Set

2. User-Defined Objects or Data Types
We will look at these data types when we discuss OOP (Object Oriented Programming), where we can define our own data structures in Python and use them.

When we speak about Python data structures, it helps to look at the hierarchy of a Python program from the highest level to the lowest. Let's look at the following figure.

  |  Programs   |  Highest Level
  |   Modules   |
  | Statements  |
  | Expressions |
  |   Objects   |  Lowest Level

Python programs are made of modules, which, in turn, are made up of a number of statements. Each statement contains expressions, and expressions create objects and process them using operations or methods, so objects form the lowest level of the Python programming hierarchy. To accomplish a task in any programming language, we have to deal with two important considerations:

1. Data Structure:
Here we choose our data types so that the program consumes as little memory as possible. Along with this, we also have to take care of the time needed for program execution, which also depends on the logic used in writing the program. So, defining a data structure is defining objects.

2. Time-space:
This trade-off depends on the algorithm of your program. Ideally, one should write an algorithm that uses as little space and as little time as possible, although in practice improving one often costs some of the other. This is where our data types come into the picture: a Python programmer should identify which data structure is suitable to accomplish the task optimally.

Let's look at each object in detail now.

Python Lists

If you know C or C++, you can relate this data type to arrays. How? That will become clear as we learn about lists in more detail.

  • Lists are a sequential data type. Each item in a list is identified by its position (also known as its offset) relative to the start of the list, counting from 0
  • Lists have no fixed size; they can be shrunk or extended
  • Lists are mutable, as they can be modified -
    • Assign a new value using the position of the item in the list
    • Use the built-in methods to change or append to the list
Unlike arrays in C, Python lists can contain data of dissimilar types. In other words, the first item can be a float, the second an int, the third a str and so on. Python keeps track of the type of each item and checks that the operation you attempt on it is valid for that type. For example, you can perform arithmetic on the first and second items, and concatenation on the string at the third position. Hence, a list is a good tool for representing a collection of dissimilar objects, like mailboxes, shopping carts etc.

Creating a List

It doesn't take much effort to create a Python list: you just need a bunch of Python objects separated by commas and enclosed in a pair of square brackets [ ]. Consider the example shown below:

>>> Friendlist = ['Ram', 'Mandar', 'Kumar', 'Mighty']

In the above example, Friendlist is a simple variable. It is assigned comma separated string values in square brackets ([]), which makes it an object of type list.

Now, as I mentioned earlier, each list item is associated with a positional parameter called the 'index', starting from '0' and counting from left to right in the list. So, 'Ram' is at index '0', 'Mandar' at '1', 'Kumar' at '2' and 'Mighty' at '3'. Thus, 'Ram' and 'Mighty' can be accessed using their indices as follows:

>>> Friendlist[0] # This returns Ram

>>> Friendlist[3] # This returns Mighty

You might have noticed that the above list contains elements of the same kind, i.e. all of them are strings. But, as I mentioned earlier, this is not a limitation; you may also have objects of different types in a list, as shown in the example below:

>>> Mydata = ['Ram', 11.20, 400, 'India']

It contains string, float and integer data. We will learn about Python lists in more detail in subsequent articles in this series.
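
To make the points about mutability and mixed types a little more concrete, here is a minimal sketch. It reuses the Mydata list from above; the replacement values 'Bharat' and 'Pune' are only illustrative assumptions and not part of the original example.

```python
# A list holding objects of different types (same list as above)
Mydata = ['Ram', 11.20, 400, 'India']

# Lists are mutable: assign a new value using the item's index
Mydata[3] = 'Bharat'            # illustrative replacement value

# Lists can grow: append() adds an item at the end
Mydata.append('Pune')           # illustrative value

# Lists can shrink: pop() removes and returns the last item
last_item = Mydata.pop()

# Each item keeps its own type, so type-specific operations work
total = Mydata[1] + Mydata[2]       # arithmetic on a float and an int
greeting = 'Hello ' + Mydata[0]     # concatenation of two strings

print(Mydata)
print(total)
print(greeting)
```

Note that assignment by index, append() and pop() all change the same list object; no new list is created.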

Python Dictionaries

Python dictionaries store data as a mapping. In other words, unlike Python lists, the elements in a dictionary are identified using a 'key' rather than a position. Each 'value' is mapped to its respective 'key'. As an example, suppose we have to maintain information about a product whose colour is 'Red'. In a Python dictionary, we can represent this as 'Colour' : 'Red', where 'Colour' is the key, which should be unique within the dictionary, and 'Red' is the value associated with it.

  • Dictionaries are also mutable i.e. their contents can be modified.
  • Unlike lists, dictionary elements are accessed using their respective keys instead of their position in the dictionary.
  • They are also called 'associative arrays' or 'hashes'
  • Like lists, Python dictionaries can grow on demand and can contain elements of dissimilar types.
  • Each key is unique and has exactly one associated value. This value can be any object, be it a string, integer, float, list or even another dictionary, and the nesting can go to any depth
  • Multiple keys in a dictionary may have the same value, but the keys themselves must be unique
  • Items are not associated with positional parameters. In older versions of Python a dictionary is an unordered collection; since Python 3.7 it remembers insertion order, but items are still looked up by key, not by index
Dictionaries are stored internally as hash tables (a data structure that starts small and grows on demand while keeping data retrieval fast). Python uses optimized hashing algorithms to retrieve the stored data.
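
The properties listed above can be sketched roughly as follows. The product dictionary and all of its keys and values ('Price', 'Sizes', 'Stock', 'Reserved') are hypothetical, chosen only to extend the 'Colour' : 'Red' example.

```python
# Keys map to values; values may be of any type, including lists
product = {'Colour': 'Red', 'Price': 49.5, 'Sizes': ['S', 'M']}

# Dictionaries are mutable: assigning to an existing key replaces its value
product['Colour'] = 'Blue'

# Assigning to a new key adds a key:value pair, so the dictionary grows
product['Stock'] = 12

# Different keys may share the same value
product['Reserved'] = 12

# Keys are unique: repeating a key in a literal keeps only the last value
duplicate = {'a': 1, 'a': 2}

print(product['Colour'])
print(len(product))
print(duplicate)
```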

Creating Python Dictionaries

You can create dictionaries in a number of ways. The most common way is to write comma separated Key:Value pairs enclosed in curly braces { }, as follows:

>>> MyData = {'name': 'Vishal', 'age': 23}

We have just created the dictionary object MyData, wherein the keys are 'name' and 'age' and their respective values are 'Vishal' and 23. Don't be confused by the data in single quotes here; these are string objects (which we will discuss later in this article). Values in a dictionary can be of any type; keys can be of any immutable (hashable) type, such as strings, numbers or tuples, and must be unique within that dictionary. You can see in the following dictionary 'D' that the first Key:Value pair is 'integer type':'float type', which is valid.

>>> D = { 11:12.5, 12:'Mandar'}

As mentioned earlier, dictionary values can be accessed using their respective keys, as demonstrated in the example below:

>>> MyData['name'] # This would return 'Vishal'

>>> MyData['age'] # This would return 23

This is how we access a value using its key in Python dictionaries. Please note that we have used square brackets (and not curly ones) to access the dictionary values. Now, to show that dictionary values can themselves be nested objects, we have created a dictionary 'E' in the example given below:

>>> E = { 'India':{'City':'Mumbai','Street':50}}

Here, in dictionary 'E', the key is 'India' and it points to another dictionary, which may again have further nesting in it. In such cases, how do we access the values using dictionary keys? Just have a look at the example below:

>>> E['India']['Street'] # This would return 50

In the above example, E['India'] gives us the value {'City':'Mumbai','Street':50}, which is another dictionary, and the second key 'Street' points to the value 50 in that dictionary. In this way, using 'double indexing', you can access a value in a dictionary enclosed in another dictionary (such nested structures are also called 'compound objects').
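
Double indexing is simply two lookups in a row, and because dictionaries are mutable, the nested values can also be changed in place. The sketch below reuses dictionary E; the new city 'Pune', the 'Pin' key and the 'Japan' lookup are hypothetical additions for illustration.

```python
E = {'India': {'City': 'Mumbai', 'Street': 50}}

# Double indexing, one step at a time
inner = E['India']           # first lookup returns the inner dictionary
street = inner['Street']     # second lookup returns the value stored there

# The same thing written as a single expression
same_street = E['India']['Street']

# Nested dictionaries can be modified in place
E['India']['City'] = 'Pune'      # replace an existing value (illustrative)
E['India']['Pin'] = 411001       # add a new key (hypothetical)

# get() returns a default instead of raising KeyError for a missing key
missing = E.get('Japan', 'not found')

print(street)
print(E)
print(missing)
```

Since inner refers to the same object as E['India'], the changes made through E are visible through inner as well.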

When you pass one of these dictionaries, E or D or MyData, as a parameter to the built-in type() function, it returns the 'type' (or 'class') the object belongs to. The output below is from Python 2; Python 3 prints <class 'dict'> instead. Let's try it:

>>> type(E)
<type 'dict'>

We will learn the operations and the built-in methods used to manipulate dictionaries in a separate article.

Python Tuples

A tuple is another data type (or data structure or object) in Python, which is very similar to a Python list, with small but crucial differences. A tuple has features of its own that we need to learn, and we must understand why this data type exists in Python when lists are already available.

  • We can create a tuple by writing comma separated values enclosed in parentheses ( ). Creating a tuple is almost the same as creating a list, with the brackets making the difference.
  • Like a Python list, a tuple supports an arbitrary collection of objects
  • The collection of elements is ordered, meaning that the individual elements are identified by their position in the sequence (index), counting from left to right starting with '0'.
  • A tuple is a fixed-length object, i.e. we can't shrink it, extend it or update its items. Hence tuples are immutable objects.
  • Tuples support nesting of compound objects, as dictionaries and lists do.
  • As tuples are immutable, you can't change the size of a tuple without making a copy of it.
Let's create a tuple named T.

>>> T = (0,'Ram','Nikhil',24.4)

Tuple T contains objects of integer, string and float types. We can access these tuple items in the same way we accessed items in the list, i.e. using indices inside square brackets.

>>> T[1] # This returns 'Ram'

So the offset starts from 0. Hence T[0] returns 0, T[2] returns 'Nikhil' and so on.

One of the ways in which Python tuples differ from Python lists is that we cannot make changes to a tuple; hence tuples are immutable objects. But of the many things tuples and lists have in common, one is 'nesting'. Just as you can have a list within a list, you can have a tuple inside a tuple, a list inside a tuple, or even a dictionary inside a tuple. Just have a look at the example below:

>>> M = ('dept',(23,'MG road'),[50,34,'India'])

We have created tuple M containing a string, a tuple and a list. We can access the item 'MG road' in tuple M the same way we did with dictionaries. Remember 'double indexing'?

>>> M[1][1] # This returns 'MG road'.

The first offset [1] in M[1][1] points to the tuple (23,'MG road') and the second offset [1] points to position 1 inside this compound object. So, ultimately, M[1][1] points to 'MG road'.

Let us check what type the tuple T belongs to, using the type() function as follows:
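
Immutability is easiest to see by trying to break it. The following is a small sketch using the tuples T and M from above; the values 'Shyam', 'Pune' and 'Asia' are hypothetical and used only for illustration.

```python
T = (0, 'Ram', 'Nikhil', 24.4)

# Assigning to an item of a tuple raises a TypeError
try:
    T[1] = 'Shyam'
except TypeError as err:
    message = str(err)
    print('Cannot modify a tuple: %s' % message)

# "Extending" a tuple really builds a new tuple; T itself is unchanged
T2 = T + ('Pune',)

# A mutable object stored inside a tuple can still be changed in place
M = ('dept', (23, 'MG road'), [50, 34, 'India'])
M[2].append('Asia')

print(T)
print(T2)
print(M)
```

The tuple M still holds the same three objects after the append; only the contents of the list it refers to have changed.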

>>> type(T)
<type 'tuple'>

For more details about Python tuples, the operations and methods related to them, please check this article.

Python Sets

This data type is used when the existence of an item is more important than its position. Simply put, if we want to know whether 'Ankit' is present in a group of friends FriendSet, irrespective of its position in the group, we can use a Python set. In other words, a Python set is an unordered collection of unique items (it is not of type list). A set can be created by calling the built-in set() and passing it a list of items as a parameter. Let's create a set:

>>> FriendSet = set(['Ram', 'Mandar', 'Jessica'])

Here we have created the set named FriendSet, which contains the names of friends. We have passed a list object to the set() function in the above example, but it is not the only way. In Python 2.7 and later you can also write the items directly inside curly braces. This looks like a dictionary without values, but it is a set literal, so wrapping it in set() is optional:

>>> Bestfriend = {'Mandar'} # Same as set(['Mandar'])

Sets in Python logically resemble sets in mathematics. You can find the union (using the | operator) and the intersection (using the & operator) of two or more sets. The intersection is shown below:

# This operation returns common items, called as 'Intersection'
>>> FriendSet & Bestfriend
set(['Mandar'])

In the same way, we can find the difference between two sets, which returns the items of the first set that are not present in the second.

# This operation returns the items of FriendSet that are not in Bestfriend
>>> FriendSet - Bestfriend
set(['Ram', 'Jessica'])
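
The membership test and the set operations mentioned above can be sketched together as follows. Here 'Ankit' is added to Bestfriend purely as an assumption, so that the results of the different operations are easier to tell apart; sorted() is used only to print the items in a predictable order.

```python
FriendSet = set(['Ram', 'Mandar', 'Jessica'])
Bestfriend = set(['Mandar', 'Ankit'])    # 'Ankit' added for illustration

# Membership test: is 'Ankit' present in FriendSet?
has_ankit = 'Ankit' in FriendSet

union = FriendSet | Bestfriend             # items in either set
common = FriendSet & Bestfriend            # items in both sets
only_friends = FriendSet - Bestfriend      # in FriendSet but not in Bestfriend
either_not_both = FriendSet ^ Bestfriend   # symmetric difference

# Duplicates are discarded when a set is built
unique = set(['Ram', 'Ram', 'Mandar'])

print(has_ankit)
print(sorted(union))
print(sorted(common))
print(sorted(only_friends))
print(sorted(either_not_both))
print(len(unique))
```

If you want the unpaired items from both sets, the symmetric difference (^) is the operation to use, since the plain difference (-) only looks at the first set.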

That's all for the introduction to Python sets. We will learn more about them in a later part of the series.

Python Numeric Data Types

Numeric data types in Python are very straightforward. Python provides a variety of number types needed for arithmetic and scientific calculations. This set of numeric objects includes -

  • Integers (numbers without decimal point)
  • Floats (with decimal points)
  • Complex numbers (with real and imaginary parts)
You can perform basic arithmetic operations, like addition ('+' operator), subtraction ('-' operator), multiplication ('*' operator), division ('/' operator) and exponentiation ('**' operator), on objects of the numeric data types. Python determines the type of each number involved in a calculation automatically. If we say a = 5, then the variable 'a' points to memory that contains an integer value (type int). Similarly, if we say a = 4.99, then a is automatically treated as a floating point variable (type float). Consider the examples below:

>>> a = 5 # Creating a variable which stores an integer value
>>> b = 1.234 # Creating a variable which stores a float value

We must take care of letter case, as names in Python are case sensitive. Thus, the variables var and VAR are treated as different objects by Python. As mentioned earlier, we don't have to declare variables ahead of their use, as Python dynamically decides the type of an object based on the assigned value. In the example given below, we never declare the variable c before using it. We then add an int and a float successfully, with Python raising no errors.

>>> a = 5 # Variable 'a' of type 'int'

>>> b = 6.9 # Variable 'b' of type 'float'

>>> c = a + b # Addition of 'a' and 'b' stored in another variable 'c'

>>> print(c)
11.9

>>> type(c) # Checking its type using the type() function
<type 'float'>
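
The arithmetic operators listed earlier can be tried out in one go. This is only a sketch with arbitrarily chosen numbers; a float operand is used for the division because the result of '/' between two integers differs between Python 2 and Python 3.

```python
a = 7        # int
b = 2.0      # float

total = a + b          # addition, gives 9.0
difference = a - b     # subtraction, gives 5.0
product = a * b        # multiplication, gives 14.0
quotient = a / b       # division, gives 3.5
power = a ** 2         # exponentiation, gives 49

# Complex numbers have a real and an imaginary part
z = complex(3, 4)      # can also be written as 3 + 4j
magnitude = abs(z)     # distance from zero, gives 5.0

print(total)
print(quotient)
print(power)
print(z.real)
print(z.imag)
print(magnitude)
```

Mixing an int and a float in one expression gives a float, just as in the example with c above.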

While discussing numeric data types in Python, it is worth introducing the math module, which provides an extensive set of functions for numeric calculations. In the example given below, we use the sqrt() function available in the math module. Before we can use it, we have to import the math module, as shown in line 1.

>>> import math # 'math' is a built-in module 
>>> math.sqrt(75) # sqrt() is a built-in function in math module
8.6602540378443873 # Output

In the earlier article, we described a module as a package or a simple piece of Python code saved with the .py extension. When we have to use functions from a module, we have to import it explicitly in our Python code by writing import followed by the module name (math in the above example).
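
As a small sketch of the import statement in practice, the snippet below shows two common ways of getting at the same function. The numbers are arbitrary, and the circle-area calculation is only an illustrative use of math.pi.

```python
import math                  # import the whole module
from math import sqrt, pi    # or import selected names directly

# With "import math", names are accessed through the module
root = math.sqrt(16)

# With "from math import ...", the names can be used directly
same_root = sqrt(16)

# Area of a circle of radius 2 (illustrative use of pi)
area = pi * 2 ** 2

print(root)
print(same_root)
print(area)
```

Using import math keeps it obvious where sqrt() comes from, which is why that form is used in this article.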

Python Strings

A string is one of the most important data types in Python, as almost every Python program needs it for recording textual information or an arbitrary collection of bytes (e.g. an image file; in Python 3, raw bytes have their own bytes type). String objects are ordered collections of characters, indexed from left to right starting at '0'. Let's take a look at the following example to learn strings in an easier way.

>>> S = 'Be like Ninja' # Creating a 13 character string and storing it to S

>>> S[0] # This returns 'B' 

>>> S[3] # This returns 'l' 

So, as you can see in the above example, a string is a Python sequence, like a list. Thus, we can access its elements the same way we did with lists. Strings are immutable (they can't be modified in place), so if we try to change a character of a string object, Python raises an error.

We can operate on and manipulate strings with a bunch of built-in functions like len() and methods like split(). We will learn them all in later chapters of the series.
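
Here is a quick sketch of len(), split() and the immutability of strings, using the string S from above. The slicing expression S[1:] (everything from index 1 onwards) is introduced here only as one assumed way of building a new string.

```python
S = 'Be like Ninja'

length = len(S)        # number of characters in the string
words = S.split()      # split on whitespace into a list of words

# Strings are immutable: item assignment raises a TypeError
try:
    S[0] = 'b'
except TypeError as err:
    error_text = str(err)
    print('Cannot modify a string: %s' % error_text)

# To "change" a string, build a new one instead
lowered = 'b' + S[1:]
shouted = S.upper()

print(length)
print(words)
print(lowered)
print(shouted)
print(S)
```

Methods such as upper() return a new string and leave S untouched, which is why the last line still prints the original text.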

Python File Objects

When we have to deal with data external to the program, in any programming language, we generally use files, and Python is no exception. File objects in Python are used to interact with all types of external data like mails, audio-video clips, documents, CSV files etc. The type of data defines how the file is handled; for an audio file, for example, Python treats the contents as a machine-readable stream of bytes.

In order to access or modify files saved on our hard drive from Python code, we have to open the file to get a file object, then read from or write to the file using the file object and, finally, close the file.

Open the file ---> Read or Write to the file ---> Close the file

While opening a file, we must tell Python in what mode the file has to be opened. The modes can be Read ('r'), Write ('w') or Append ('a'), optionally combined with Binary ('b'), as in 'rb'. The built-in open() function opens the file and returns a file object, and the methods read(), write() and close() of that file object read, write and close the file respectively. Let's create the 'Ninja.txt' file and write something to it. To create a new file, we pass the file name and the mode 'w' as parameters to the open() function, which creates a file object f for us. Python creates a new file with the specified name in the current directory (if a file with that name already exists, 'w' mode overwrites it).

>>> f = open('Ninja.txt', 'w') # Creates a new file object 'f' by opening 'Ninja.txt' in Write mode

>>> f.write('Ninja is a coder,\n') # Writes a line to file

>>> f.write('Be like Ninja.\n') # Writes another line
>>> f.close() # Closes the file

In order to read this file, we need to open it in 'r' mode. If no mode is passed to the open() function, the file is opened in read mode by default. Let's see the example below:

>>> f = open('Ninja.txt') # Opens the file in Read mode

>>> var = f.read() # Reads the file contents and stores them in 'var'

>>> print(var) # Prints the file contents
Ninja is a coder,
Be like Ninja.

We must close the file object, using the close() method, whenever we are done with it.
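
A common way to make sure the file is always closed is the with statement, which closes the file automatically when the block ends. The sketch below follows the same open, write, read sequence as above. It writes into a temporary folder (an assumption made here so that the example does not touch your current directory), and the appended line 'Keep coding.' is purely illustrative.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'Ninja.txt')

# 'w' mode creates the file (or overwrites an existing one)
with open(path, 'w') as f:
    f.write('Ninja is a coder,\n')
    f.write('Be like Ninja.\n')

# 'a' mode adds to the end of the file instead of overwriting it
with open(path, 'a') as f:
    f.write('Keep coding.\n')      # illustrative extra line

# No mode given, so the file is opened in read mode
with open(path) as f:
    lines = f.readlines()          # list with one string per line

print(lines)
print(f.closed)
```

Each with block closes its file as soon as the block ends, so there is no separate close() call to forget.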

With this, we have come to the end of this article. We have learned about most of the built-in Python data structures. In the next few articles in the series, we will explore them all in more detail. Please post your views and feedback in the comment section below and stay connected!

This article was originally published at - Introduction to Built-in Data Structures in Python