NumPy Library - The first step in data science in Python

 


This article introduces the NumPy library for Python: how to install it and how to use it for data processing.





NumPy is one of the libraries you cannot ignore when you get into data science, for two main reasons. The first is that many data science and machine learning libraries rely heavily on it. The second is that it lets you work with arrays in a better way than lists, the data structure that is built into Python.

The advantage of NumPy over lists is that NumPy arrays are faster in read and write operations and are considered more efficient and robust.

As part of spreading knowledge of the Python programming language in Arabic, this article explains how to work with the NumPy library.

At the end of the article you will get the necessary knowledge to:

  • Install the NumPy library.
  • Create a NumPy array.
  • Import data from external files into an array and export arrays to external files.
  • Learn about array properties in NumPy.
  • Perform mathematical operations on arrays.
  • Perform subsetting, slicing, and indexing operations.
  • Change the shape of arrays.
  • Get help from NumPy's dedicated functions.
  • Use array methods, and add and delete elements.

The article also provides some helpful links for learning NumPy.

Note: This article assumes that you are familiar with the concept of arrays and with the Python programming language.

The NumPy Array

A NumPy array resembles a regular Python list in some respects, but there are also big differences between the two.

The NumPy array is one of the most important data structures in the NumPy library, whose name is short for Numerical Python. As the name suggests, the library specializes in scientific computing in Python and contains a variety of tools and techniques for solving mathematical problems in science and engineering. One of its most important tools is the high-performance multidimensional array, to which you can apply many mathematical functions and operations in order to solve many problems.

Install the NumPy Library

Before we start working with arrays in NumPy, we must make sure that the library is installed in our Python environment. There are several ways to install it, and we can choose any one of them:

First: using Python Wheels

Follow these steps:

  • Make sure that Python is installed on your machine. If it is not, install Python first.

  • Make sure pip is available and that Python is included in the PATH environment variable.

  • In the cmd window, run the pip install --upgrade pip command to update pip.

  • Download the wheel file that matches your version of Python.

  • Run the command pip install numpy_wheel_file

  • Open the Python interpreter and run import numpy. If you don't see any error message, the library has been installed successfully and you can start working with it.
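The final check in the steps above can be sketched as a short script. This is only a minimal sanity check; the variable name check is an assumption made for illustration.

```python
import numpy as np

# Printing the version confirms that the import succeeded.
print(np.__version__)

# Build a tiny array as an extra sanity check.
check = np.array([1, 2, 3])
print(check.sum())  # 6
```

If the import raises ModuleNotFoundError, the library is not installed in the Python environment that is currently running.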

Second: using the Anaconda Python distribution

To get the NumPy library quickly and easily, you can install Anaconda.

You might be wondering what actually makes this option easier.

The good thing about getting Python with Anaconda is that you won't have to worry about separately installing NumPy or any other data analysis library such as Pandas or Scikit-learn.

If you are new to Python or to programming in general, the Anaconda Python distribution is very convenient: it contains more than 100 ready-made libraries specialized in data science, and it is relied on by many data analysts.

Anaconda also includes several open source development tools and environments such as Jupyter and Spyder.

In short, I recommend installing Anaconda to work with NumPy and other data analysis libraries, and to move on into data science.

Create a NumPy Array

Now that we have finished installing Python and the NumPy library, it's time to get to work.

Before we can create an array, we must import the library into the Python file we are writing (or into the interpreter session) with the import numpy command. Following best practice, we import the library under the name np, which makes the code quicker and easier for other Python developers to read.

We use the np.array function to create the array as follows:

>> import numpy as np
>> arr = np.array([1,2,3,4,5,6])
>> print(arr)
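As a small extension of the example above, the sketch below also builds a two-dimensional array from nested lists and sets the element type explicitly. The names table and floats are assumptions made for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
print(arr)          # [1 2 3 4 5 6]

# A list of lists becomes a two-dimensional array (2 rows, 3 columns).
table = np.array([[1, 2, 3], [4, 5, 6]])
print(table.shape)  # (2, 3)

# The optional dtype argument sets the element type explicitly.
floats = np.array([1, 2, 3], dtype=float)
print(floats)       # [1. 2. 3.]
```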

Sometimes we don't know in advance what data the array will hold or what its shape will be, or we want to import its data from another source. In these cases we create empty arrays or arrays filled with placeholder values, or we use special functions to fetch the data from an external source.

Some functions for creating special arrays

To create a 3x4 array in which every element is 1, we use the np.ones function:

>> np.ones((3,4))

To create an array with the same dimensions in which every element is 0, we use the np.zeros function:

>> np.zeros((3,4))

To create an array of 2 rows and 3 columns in which every element is a value that we specify, we use the np.full function:

>> np.full((2,3), 10)

To get an array that starts at one number and stops before another, advancing by a given step, we use the np.arange function:

>> np.arange(5,30,5)

To get an array that starts at one number and ends at another (inclusive), with a given number of evenly spaced elements, we use the np.linspace function:

>> np.linspace(1,5,5)

With NumPy you can also create the identity matrix (also called the unit matrix) by using either the np.eye or the np.identity function.

Note: the identity matrix is a square matrix in which the diagonal elements are 1 and all other elements are 0.
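The creation functions described in this section can be tried together in one short script. The variable names below are assumptions chosen for illustration.

```python
import numpy as np

ones = np.ones((3, 4))         # 3x4 array filled with 1.0
zeros = np.zeros((3, 4))       # 3x4 array filled with 0.0
tens = np.full((2, 3), 10)     # 2x3 array filled with 10
steps = np.arange(5, 30, 5)    # starts at 5, stops before 30, step 5
points = np.linspace(1, 5, 5)  # 5 evenly spaced values from 1 to 5
identity = np.eye(3)           # 3x3 identity matrix

print(steps)   # [ 5 10 15 20 25]
print(points)  # [1. 2. 3. 4. 5.]

# np.identity builds the same matrix as np.eye for a square size.
print(np.array_equal(identity, np.identity(3)))  # True
```

Note that np.arange excludes the stop value (30 here), while np.linspace includes its end point by default.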

Importing data from external files

When you work with NumPy arrays in real data analysis, you will most likely not create arrays and type in their values directly. Instead, you will import the values from external files of various formats and types.

Below, we will learn how to import data from txt files and store it in arrays.

We can read the data from a text file with the np.loadtxt or np.genfromtxt functions.

Suppose you have a file called data.txt that contains text like this:

Table of data
id      mobile      com

1       0555        0.2
2       9999        0.1
3       6565        0.2
4       888         0.2
5       74744       0.3

The following command imports this data from the file into the work environment:

>> id, mobile, com = np.loadtxt('data.txt',skiprows=2,unpack=True)

In the previous code, we first passed the file name (the file is located in the same path as the session or the Python file). We then set the skiprows parameter to 2, which means ignoring the first two lines of the file (because they contain text rather than numbers). Finally, we set the unpack parameter to True so that the function returns each column as a separate array.

If the columns in the text file are separated by commas, or you want to specify the data type, you can do so by setting the delimiter and dtype parameters respectively.
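The delimiter and dtype parameters can be sketched as follows. To keep the example self-contained it reads from an in-memory string instead of a file on disk; the sample content and all variable names are assumptions made for illustration.

```python
import io

import numpy as np

# Hypothetical comma-separated content standing in for a real file on disk.
csv_text = "id,mobile,com\n1,555,0.2\n2,9999,0.1\n3,6565,0.2\n"

# delimiter="," splits on commas; skiprows=1 skips the header line.
ids, mobile, com = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, unpack=True
)
print(ids)  # [1. 2. 3.]
print(com)  # [0.2 0.1 0.2]

# dtype=int keeps whole numbers as integers; usecols picks the first two columns.
int_ids, int_mobile = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1,
    usecols=(0, 1), dtype=int, unpack=True,
)
print(int_mobile)  # [ 555 9999 6565]

# np.genfromtxt fills a missing field with nan instead of raising an error.
with_gap = np.genfromtxt(io.StringIO("1,2,3\n4,,6\n"), delimiter=",")
print(with_gap)
```

With the default float data type, a value such as 0555 in the file is read as the number 555.0, so leading zeros are not preserved.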

Save arrays to an external file

To export the data in an array to an external file, we use the np.savetxt function.

>> x = np.arange(0,5,0.5)
>> np.savetxt('test.txt',x,delimiter = ',')

This is not the only way to export arrays to a file. Functions such as np.save, np.savez and np.savez_compressed serve the same goal for different purposes: they write arrays in NumPy's binary .npy and .npz formats rather than as text.

For more information and examples about the functions used to save and export arrays in NumPy, refer to the NumPy documentation.
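A round trip through both kinds of file can be sketched like this. The example writes into a temporary folder so that it leaves nothing behind; the file names and variable names are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

x = np.arange(0, 5, 0.5)

with tempfile.TemporaryDirectory() as folder:
    # Text format: human readable, one value per line for a 1-D array.
    text_path = os.path.join(folder, "test.txt")
    np.savetxt(text_path, x, delimiter=",")
    from_text = np.loadtxt(text_path, delimiter=",")

    # Binary .npy format: stores the values together with dtype and shape.
    binary_path = os.path.join(folder, "test.npy")
    np.save(binary_path, x)
    from_binary = np.load(binary_path)

print(np.allclose(x, from_text))        # True
print(np.array_equal(x, from_binary))   # True
```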

Matrix properties

The following table lists the properties that you can query for an array called arr.

Description                                          Property
Shape of the array                                   arr.shape
Number of dimensions of the array                    arr.ndim
Number of elements in the array                      arr.size
Data type of the elements                            arr.dtype
Total size of the array's elements, in bytes         arr.nbytes
Size of a single element in the array, in bytes      arr.itemsize
Information about the array's memory layout          arr.flags
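The properties in the table can be queried as in the following sketch. The element type is fixed to 64-bit integers here (an assumption made so that the byte sizes are the same on every platform).

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

print(arr.shape)     # (2, 4)
print(arr.ndim)      # 2
print(arr.size)      # 8
print(arr.dtype)     # int64
print(arr.itemsize)  # 8
print(arr.nbytes)    # 64, which is size * itemsize
print(arr.flags)     # memory layout details such as C_CONTIGUOUS
```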

 

Now that you know how to create a NumPy array, either with the np.array function or with the np.loadtxt and np.genfromtxt functions, it is time to learn more about NumPy arrays and to perform mathematical operations on them.

It is important to mention that many mathematical operations on arrays come with restrictions. For example, to add or subtract two arrays, their shapes must be compatible, or one of them must hold a single element. Similar conditions apply to multiplication, division, and other operations.

In this article, I will not dwell on these restrictions and conditions; I will focus instead on introducing the library from the programming side.
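As a brief illustration of the shape restriction mentioned above, the following sketch adds two arrays of the same shape, adds a single value to an array, and then shows the error raised for incompatible shapes. The arrays are made up for the example.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
b = np.array([[10, 20, 30], [40, 50, 60]])  # shape (2, 3)

print(a + b)    # same shape: added element by element
print(a + 100)  # a single value is applied to every element

c = np.array([1, 2])  # shape (2,), not compatible with (2, 3)
try:
    a + c
except ValueError as error:
    print("incompatible shapes:", error)
```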

Mathematical operations

You can perform mathematical operations (addition, subtraction, multiplication, division, remainder) on arrays using the well-known symbols +, -, *, /, and % respectively.

In addition, NumPy lets you perform the same operations with the np.add, np.subtract, np.multiply, np.divide, and np.remainder functions.

You can also perform more advanced mathematical operations on arrays, such as exponentiation, square root, sin, cos, log, and the dot product.

The NumPy documentation lists all the mathematical operations that you can perform on NumPy arrays.
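The operators and their function equivalents can be compared in a short sketch. The two small arrays are chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 1], [4, 3]])

print(a + b)              # same result as np.add(a, b)
print(np.subtract(a, b))  # same result as a - b
print(a * b)              # element-wise product, same as np.multiply(a, b)
print(a / b)              # element-wise division, same as np.divide(a, b)
print(a % b)              # remainder, same as np.remainder(a, b)

print(np.sqrt(a))         # element-wise square root
print(np.dot(a, b))       # matrix (dot) product: [[10  7], [22 15]]
```

Note that * multiplies element by element; the matrix product is obtained with np.dot.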

In addition to the mathematical operations, we can also compare the elements of arrays with each other using comparison operators such as ==, <, and >.

The result of such a comparison is a Boolean array of the same shape, whose values (True, False) express the result of comparing each element of the first array with the corresponding element of the second array.

>> import numpy as np
>> a = np.array([[1,2],[3,4]])
>> b = np.array([[2,1],[4,3]])
>> a == b

array([[False, False],
       [False, False]])

>> a > b

array([[False,  True],
       [False,  True]])

The np.array_equal function also compares two arrays, but it returns a single True or False: the result is True only if the two arrays have the same shape and every element of the first array is equal to the corresponding element of the second; otherwise the result is False.

>> np.array_equal(a,b)

False

Besides comparing two arrays, we can perform logical operations such as OR, AND, and NOT on arrays using the np.logical_or, np.logical_and, and np.logical_not functions.
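The logical functions can be sketched on the Boolean arrays produced by comparisons. The names bigger and small are assumptions made for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 1], [4, 3]])

bigger = a > b  # [[False, True], [False, True]]
small = a < 3   # [[True, True], [False, False]]

print(np.logical_and(bigger, small))  # True where both conditions hold
print(np.logical_or(bigger, small))   # True where at least one holds
print(np.logical_not(bigger))         # flips every value
```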

Subset, Slice, Index operations

Besides performing mathematical operations on arrays, we may want to extract a specific part of an array and operate on that part alone, or we may need to work with individual elements. For this we use operations such as subsetting, slicing, and indexing.

If you have worked with lists in Python, you already know how to perform these operations. If you are new to the topic, keep the following two points in mind:

First: Write the name of the array followed by [] to tell Python that you are performing one of these operations.

Second: Usually you pass numbers inside the [], but you can also use : inside them, with or without numbers, to tell Python which part of the array you are interested in.

The following arrays are used in the examples below.

>> import numpy as np
>> array1d = np.array([1,2,3,4,5,6])
>> array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

Subsetting process:

To get the first element of array1d:

>> print( array1d[ 0 ] )

1

To get the element in row 1 and column 2 of array2d (indices start at 0):

>> print( array2d[ 1 , 2 ])

7

Slicing process:

This process is similar to subsetting but more advanced: instead of picking elements at specific positions, you get a range or region of the array.

To get the first three elements of array1d:

>> print(array1d [ 0 : 3 ])

[1 2 3]

To get the elements of column 1 from rows 0 and 1 of array2d:

>> print(array2d [ 0:2 , 1 ] )

[2 6]

Indexing process:

We'll only talk about Boolean Indexing, not Fancy Indexing.

To get elements greater than 3 in array1d:

>> print( array1d[ array1d > 3 ] )

[4 5 6]

To get the even elements from array2d:

>> print( array2d[ array2d % 2 == 0 ] )

[2 4 6 8]
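A few more variations may help, in particular using : on its own to take a whole row or column, and combining two conditions in a Boolean index. This is only a sketch; the name mask is an assumption made for illustration.

```python
import numpy as np

array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(array2d[0, :])    # whole first row: [1 2 3 4]
print(array2d[:, 0])    # whole first column: [1 5]
print(array2d[:, 1:3])  # columns 1 and 2 of every row

# Conditions are combined with & (and) or | (or); each needs parentheses.
mask = (array2d > 2) & (array2d % 2 == 0)
print(array2d[mask])    # [4 6 8]
```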

Get help

We can use Python's built-in help function to get the documentation for the functions, modules, or attributes we are working with.

NumPy also offers its own ways to get help or information about the code you write: the np.lookfor and np.info functions. Note that np.lookfor is only available in NumPy 1.x; it was removed in NumPy 2.0.
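As a small sketch of np.info, the example below collects the help text for np.linspace in an in-memory buffer instead of printing it directly, so that only the beginning is shown. The buffer and the 300-character cut are assumptions made for illustration.

```python
import io

import numpy as np

# np.info writes to sys.stdout by default; the output argument redirects it.
buffer = io.StringIO()
np.info(np.linspace, output=buffer)
text = buffer.getvalue()

# Show only the beginning of the help text.
print(text[:300])
```

Calling np.info(np.linspace) without the output argument simply prints the full text to the screen.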

Dealing with arrays

Array Transpose

Suppose you have an array A of shape 3 x 2. Its transpose is an array with the same elements in a different arrangement, in which the rows become columns and the columns become rows, so the transpose of A has the shape 2 x 3.

We can obtain the transpose of an array using the np.transpose function or the array's T attribute:

>> np.transpose( array2d )

array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])

>> array2d.T

array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])

Note: One-dimensional arrays are not affected by the np.transpose function or by the T attribute; both return the same array unchanged.
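The note above can be checked with a short sketch. Reshaping the one-dimensional array into a single column first is one possible way to get a result that does change under transposition; the names row and column are assumptions made for illustration.

```python
import numpy as np

row = np.array([1, 2, 3])
print(row.T.shape)     # (3,)  unchanged for a one-dimensional array

# A two-dimensional array with a single column does change when transposed.
column = row.reshape(3, 1)
print(column.shape)    # (3, 1)
print(column.T.shape)  # (1, 3)
```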

Change the shape of the matrix

I mentioned previously that some restrictions must be taken into account when operating on arrays. One of them is that the shapes of the two arrays must be compatible for operations such as the dot product.

But what do we do if the arrays are incompatible? In that case we can use functions that change the shape of an array or enlarge it, such as the np.resize function.

When an array is passed to np.resize together with the required shape, and the new array is larger than the original, the result is filled with repeated copies of the original array's values, in order, as many as needed.

>> np.resize( array2d, (4,6))

array([[1, 2, 3, 4, 5, 6],
       [7, 8, 1, 2, 3, 4],
       [5, 6, 7, 8, 1, 2],
       [3, 4, 5, 6, 7, 8]])

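The array's own resize method serves the same purpose, but it changes the array in place, returns nothing, and fills the new positions with zeros. The example below works on a copy so that array2d keeps its original shape for the examples that follow. If NumPy refuses to resize because the array is referenced elsewhere, passing refcheck=False skips that check.

>> resized = array2d.copy()
>> resized.resize((4,6))
>> resized

array([[1, 2, 3, 4, 5, 6],
       [7, 8, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])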

Besides resizing an array, it is possible to change its shape with the reshape function, in which case the number of elements in the array does not change.

To clarify further, suppose there is an array of shape 3 x 4, so it has 12 elements. You can use reshape to change the shape of the array as long as it keeps its 12 elements: shapes such as 4 x 3, 6 x 2, and 2 x 6 are acceptable, while any shape with a different number of elements raises an error.

>> array2d.reshape((4,2))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
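The rule that reshape must keep the number of elements can be sketched as follows, including the error for a shape that does not fit. Using -1 to let NumPy work out one dimension is an extra convenience shown here for illustration.

```python
import numpy as np

array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # 8 elements

print(array2d.reshape((4, 2)).shape)   # (4, 2)

# -1 asks NumPy to infer that dimension from the number of elements.
print(array2d.reshape((8, -1)).shape)  # (8, 1)

# 3 x 3 would need 9 elements, so NumPy raises an error.
try:
    array2d.reshape((3, 3))
except ValueError as error:
    print("cannot reshape:", error)
```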

Matrix methods

The ravel function flattens an array, converting an array of any number of dimensions into a one-dimensional array.

>> array2d.ravel()

array([1, 2, 3, 4, 5, 6, 7, 8])

Adding to an array

New elements are added at the end of an array with the np.append function, which returns a new array that contains the added elements.

To add a new row

>> new_array = np.append( array2d, [[9,10,11,12]], axis = 0)
>> new_array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

To add a new column

>> new_array = np.append(array2d,[[9],[10]],axis = 1)
>> new_array

array([[ 1,  2,  3,  4,  9],
       [ 5,  6,  7,  8, 10]])

Note that when adding, we used the axis parameter, which determines whether a row or a column is added. (In a two-dimensional array, axis = 0 adds a row and axis = 1 adds a column.)
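The effect of the axis parameter can also be seen by leaving it out or by passing a shape that does not fit, as in this sketch. The name flat is an assumption made for illustration.

```python
import numpy as np

array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Without axis, both inputs are flattened before the values are appended.
flat = np.append(array2d, [9, 10])
print(flat)           # [ 1  2  3  4  5  6  7  8  9 10]

# With axis given, the new values must match the array's dimensions.
try:
    np.append(array2d, [9, 10, 11], axis=0)
except ValueError as error:
    print("shape mismatch:", error)

# np.append returns a new array; the original is left untouched.
print(array2d.shape)  # (2, 4)
```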

To add to the array in a specified location (not at the end of it) we use the np.insert function in the following way:

>> new_array = np.insert( array2d , (0,0) , 5, axis = 1)
>> new_array

array([[5, 5, 1, 2, 3, 4],
       [5, 5, 5, 6, 7, 8]])

In the previous example, the indices (0,0) ask for two insertions before column 0, so two columns filled with the value 5 were added at the beginning of the array. To remove elements from a specific location in the array, we use the np.delete function in a similar way.
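Deleting with np.delete can be sketched in the same style. The indices and variable names are assumptions chosen for illustration.

```python
import numpy as np

array2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Remove row 0.
without_row = np.delete(array2d, 0, axis=0)
print(without_row)    # [[5 6 7 8]]

# Remove columns 1 and 3.
without_cols = np.delete(array2d, [1, 3], axis=1)
print(without_cols)   # [[1 3]
                      #  [5 7]]

# Like np.append and np.insert, np.delete returns a new array.
print(array2d.shape)  # (2, 4)
```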

Links for further learning:

1- The Numpy_Python_Cheat_Sheet file, which summarizes important information about the NumPy library.

2-  https://www.datacamp.com/community/blog/python-numpy-cheat-sheet#gs.0_T9D90

3-  http://cs231n.github.io/python-numpy-tutorial

4-  http://www.python-course.eu/numpy.php

5-  https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.Ereixfc
