Data Analysis — Numpy contd…

Sanjeev Kumar
Jun 30, 2021

Hello readers, and welcome back. The last blog covered many ideas about NumPy, so let’s move one more step forward on our journey into Data Science. I hope you are all enjoying this series.

Linear Algebra with NumPy:-

NumPy has a module known as linalg which, together with a few top-level functions, provides the functionality required for Linear Algebra. Some of the functions are mentioned below:-

dot: Computes the dot product of two arrays (available as numpy.dot)

det: Computes the determinant of an array (numpy.linalg.det)

inv: Computes the inverse of a matrix (numpy.linalg.inv)

There are many more functions that can help you in your day-to-day work.
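The three functions listed above can be tried on a small matrix. This is only a minimal sketch; the matrix and vector values are made up for illustration.

```python
import numpy as np

# Example 2x2 matrix and vector (values chosen only for illustration)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])

product = np.dot(A, v)           # matrix-vector product
determinant = np.linalg.det(A)   # determinant of A
inverse = np.linalg.inv(A)       # inverse of A

print(product)
print(determinant)
print(inverse)

# A matrix times its inverse is (approximately) the identity matrix
print(np.allclose(A @ inverse, np.eye(2)))
```

Because these are floating-point computations, results such as the determinant may differ from the exact value by a tiny rounding error, which is why the last check uses np.allclose.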

Vectorized Operation:-

A NumPy array holds data of a single type, i.e. an array may contain only 8-bit integers or only 32-bit floating-point numbers, but it cannot store a mixture of them. Python lists and tuples, by contrast, can store strings, integers, and other objects together.
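A quick way to see this is to look at the dtype attribute of an array. The values below are arbitrary and only meant as a sketch.

```python
import numpy as np

mixed_list = [1, "two", 3.0]                # a list can mix types freely

ints = np.array([1, 2, 3], dtype=np.int8)   # every element is an 8-bit integer
floats = np.array([1, 2.5, 3])              # the integers are upcast to float

print(ints.dtype)
print(floats.dtype)
```

When the inputs have different numeric types, NumPy picks one type that can represent all of them, so the second array ends up holding only floats.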

“NumPy delegates the task of performing mathematical operations on the array’s contents to optimized, compiled C code, which is faster. This process is referred to as vectorization.”

Let’s see an example:-

Surprisingly, on my system the vectorized sum in this example takes 89.2 µs while the iterative sum takes 138 ns, which is a huge difference, and I expect the gap to widen as the input grows.

Note:- Don’t worry about “timeit”. It is just used to measure how long a piece of code takes to run. In IPython or Jupyter, %timeit followed by a statement does the measurement.
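Outside a notebook, a comparison of this kind can be sketched with the standard-library timeit module. The array size, the repeat count, and the function names iterative_sum and vectorized_sum are assumptions made for this sketch, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000                              # assumed array size
values = np.arange(n, dtype=np.int64)

def iterative_sum(arr):
    """Add the elements one by one in a Python loop."""
    total = 0
    for x in arr:
        total += x
    return total

def vectorized_sum(arr):
    """Let NumPy add the elements in compiled code."""
    return arr.sum()

# Time 5 runs of each approach (total seconds for the 5 runs)
loop_time = timeit.timeit(lambda: iterative_sum(values), number=5)
vec_time = timeit.timeit(lambda: vectorized_sum(values), number=5)

print(f"iterative:  {loop_time:.6f} s")
print(f"vectorized: {vec_time:.6f} s")
```

Both functions return the same total; only the time taken differs.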

Universal Functions:-

Whenever we hear the term universal, the first thing that comes to mind is that it must be usable everywhere, and that is roughly the idea here too.

Universal functions, called “ufuncs” in NumPy, are used to implement vectorization: they operate on arrays element by element. As mentioned earlier, this is much faster than iterating over the elements in Python.

Suppose we have two lists l1 = [1, 2, 3, 4] and l2 = [5, 6, 7, 8] and we have to add them element by element. We have 2 ways:

  1. Iterate over both lists and add each pair of elements.
  2. Use the ufunc for this, add(x, y), which produces the same result; on large arrays the vectorized method takes much less time.
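Both ways can be sketched with the two lists from the text. The variable names loop_result and ufunc_result are only illustrative.

```python
import numpy as np

l1 = [1, 2, 3, 4]
l2 = [5, 6, 7, 8]

# Way 1: iterate over both lists and add each pair of elements
loop_result = [a + b for a, b in zip(l1, l2)]

# Way 2: let the add ufunc do the element-wise addition
ufunc_result = np.add(l1, l2)

print(loop_result)    # [6, 8, 10, 12]
print(ufunc_result)   # [ 6  8 10 12]

# np.add really is a ufunc object
print(isinstance(np.add, np.ufunc))   # True
```

Note that np.add returns a NumPy array rather than a list. For inputs this small the loop is perfectly fine; the speed advantage of the ufunc shows up on large arrays.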

Broadcasting and Shape manipulation:-

In NumPy, broadcasting refers to the ability to work with arrays of different shapes during arithmetic operations. Arithmetic operations on arrays are usually done on corresponding elements. If two arrays have the same shape, these operations are performed directly, but if their shapes differ, plain element-to-element operations are not possible. With broadcasting, NumPy can still operate on arrays of different shapes, as long as the shapes are compatible: it conceptually stretches the smaller array to match the larger one and then performs the operation.

Broadcasting Rules:
Broadcasting two arrays together follows these rules:

  1. If the arrays don’t have the same rank then prepend the shape of the lower rank array with 1s until both shapes have the same length.
  2. The two arrays are compatible in a dimension if they have the same size in the dimension or if one of the arrays has size 1 in that dimension.
  3. The arrays can be broadcast together if and only if they are compatible in all dimensions.
  4. After broadcasting, each array behaves as if it had a shape equal to the element-wise maximum of shapes of the two input arrays.
  5. In any dimension where one array had size 1 and the other array had a size greater than 1, the first array behaves as if it were copied along that dimension.
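The rules above can be seen in action with a few small arrays. The shapes and values here are chosen only for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,), treated as (1, 3) by rule 1
column = np.array([[100], [200]])    # shape (2, 1)

# The row is reused for every row of the matrix (rule 5)
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is reused for every column of the matrix
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# The result shape is the element-wise maximum of the two shapes (rule 4)
print(np.broadcast(matrix, column).shape)   # (2, 3)

# Shapes (2, 3) and (2,) are not compatible in the last dimension (3 vs 2)
try:
    matrix + np.array([1, 2])
    compatible = True
except ValueError as err:
    compatible = False
    print("cannot broadcast:", err)
```

No data is actually copied when an array is stretched; NumPy only behaves as if it were.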

Shape manipulation:-

NumPy gives us the flexibility to change the shape of an array. There are several ways to do this, but the most commonly used is reshape.

Changing array shape:

1. reshape(a, newshape[, order]) → Gives a new shape to an array without changing its data.

2. ravel(a[, order]) → Return a contiguous flattened array.

3. ndarray.flat → A 1-D iterator over the array.

4. ndarray.flatten([order]) → Return a copy of the array collapsed into one dimension.
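The four options listed above can be compared on a small array. This is a minimal sketch with made-up values.

```python
import numpy as np

a = np.arange(6)                  # [0 1 2 3 4 5]

grid = np.reshape(a, (2, 3))      # same data, now with shape (2, 3)
raveled = np.ravel(grid)          # flattened; a view of the data when possible
flattened = grid.flatten()        # flattened; always a copy
items = list(grid.flat)           # iterate over the elements one by one

print(grid)
print(raveled)
print(items)

# Changing the copy made by flatten() leaves the original untouched
flattened[0] = 99
print(grid[0, 0])                 # 0
```

The practical difference between ravel and flatten is that flatten always makes a copy, so it is the safer choice when the result will be modified.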

Boolean Masking:-

Indexing and slicing are handy and powerful, but boolean masking is often more convenient when you want to select elements by a condition. A mask is an array of boolean values produced by comparing an array with the given condition.

When we index the array x with the mask and assign a value, every element where the mask is True is replaced by the assigned value.
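A minimal sketch of masking, assuming an array x of small integers and the condition "less than zero" (both chosen only for illustration):

```python
import numpy as np

x = np.array([3, -1, 7, -4, 0, 5])

mask = x < 0             # boolean array, True where the condition holds
print(mask)              # [False  True False  True False False]

negatives = x[mask]      # select only the elements where the mask is True
print(negatives)         # [-1 -4]

x[mask] = 0              # replace those elements with the assigned value
print(x)                 # [3 0 7 0 0 5]
```

The mask has the same shape as x, so it can be reused for both selecting and assigning.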

Date and Time:-

A date in Python is not a data type of its own, but we can import a module named datetime to work with dates as date objects.

NumPy, however, has no separate date and time objects, just a single “datetime64” type that represents a single moment in time. datetime64 lets you choose its precision, from years all the way down to attoseconds (10^-18 seconds). We can create a date from a year-month-day string by using numpy.datetime64().
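Here is a short sketch of datetime64 in use. The dates are arbitrary examples, and the variable names are only illustrative.

```python
import numpy as np

day = np.datetime64("2021-06-30")             # precision inferred as days
minute = np.datetime64("2021-06-30T12:30")    # precision inferred as minutes

print(day)            # 2021-06-30
print(day.dtype)      # datetime64[D]
print(minute.dtype)   # datetime64[m]

# An array of dates with an explicit day precision
dates = np.array(["2021-06-30", "2021-07-01", "2021-07-02"],
                 dtype="datetime64[D]")

next_days = dates + np.timedelta64(1, "D")    # shift every date by one day
span = dates[-1] - dates[0]                   # difference is a timedelta64

print(next_days)
print(span)           # 2 days
```

Differences between dates come back as timedelta64 values, which carry the same kind of precision unit.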


Sanjeev Kumar

Day Dreamer, Python Enthusiast, Future Data Scientist... for rest contact me on LinkedIn. https://www.linkedin.com/in/sanjeev-kumar-588242176/