{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "

NUMERICAL PYTHON

\n", "
\n", "\n", "

Python fo computational science

\n", "

CINECA, 16-18 October 2017

\n", "\n", "

m.cestari@cineca.it

\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", " ![](files/images/rubber_duck.png)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "https://en.wikipedia.org/wiki/Rubber_duck_debugging" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Goal of the lecture\n", "\n", "Learn \n", "* how to use the objects provided by numpy\n", "* how to read data from files and manipulate it\n", "* how to integrate python with plotting\n", "* and of course ...\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* ... how to have fun with it ...\n", "
\n", "![](files/images/xkcd.png)\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import antigravity" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "but don't get overly exicted\n", "
\n", "![](files/images/coding_drunk.png)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Introduction (1): history\n", "* Numerical Python: evolution\n", " * In origin, 2 different libraries Numeric and Numarray\n", " * They became NumPy (2006). All the features of the two original libraries (plus more..) converging in one library\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introduction (2): why bother?\n", "* NumPy offers efficient array storing and computation\n", "* Python scientific libraries (i.e. scipy, matplotlib, pandas) make use of NumPy objects\n", "* NumPy is the __de facto__ standard" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", "

The Pydata stack

\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Where are the bottlenecks?\n", "Let’s start by trying to understand why high level languages like Python are slower than compiled code\n", "\n", "\n", "In Python the main reasons of the \"slowness\" of the code are:\n", "\n", "- Dynamic Typing\n", "- Data (memory) Access " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Dynamic typing\n", "Even for simple operations, the Python interpreter has a fair bit of work to do. For example, in the statement `a+b`, the interpreter has to know which operation to invoke\n", "\n", " - If `a` and `b` are integers, then `a+b` requires sum of integers\n", " - If `a` and `b` are strings, then `a+b` requires string concatenation\n", " - If `a` and `b` are lists, then `a+b` requires list concatenation\n", " \n", "The operator `+` is **overloaded**: its action depends on the type of the objects on which it acts\n", "\n", "As a result, Python must check the type of the objects and then call the correct operation: **this involves substantial overheads**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Static typing\n", "Compiled languages avoid these overheads with explicit, static types; consider the following C code, which sums the integers from 1 to 10\n", "\n", "```C\n", "#include \n", "\n", "int main(void) {\n", " int i;\n", " int sum = 0;\n", " for (i = 1; i <= 10; i++) {\n", " sum = sum + i;\n", " }\n", " printf(\"sum = %d\\n\", sum);\n", " return 0;\n", "}\n", "```\n", "The variables `i` and `sum` are explicitly declared to be integers, hence the meaning of addition here is completely unambiguous." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Data Access\n", "\n", "Another drag on speed for high level languages is data access; to illustrate, let’s consider the problem of summing some data, a collection of integers for example\n", "\n", "### Summing with Compiled Code\n", "In C or Fortran, these integers would typically be stored in an array, a simple data structure for storing homogeneous data. Such an array is stored in a single contiguous block of memory\n", "\n", "For example, a 64 bit integer is stored in 8 bytes of memory and an array of `n` such integers occupies `8xn` consecutive memory slots. \n", "\n", "Moreover, the compiler is made aware of the data type by the programmer; in this case 64 bit integers,\n", "hence each successive data point can be accessed by shifting forward in memory space by a known and fixed amount, in this case 8 bytes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Data Access\n", "\n", "\n", "### Summing in Pure Python\n", "\n", "Python tries to replicate these ideas to some degree; for example, in the standard Python implementation (CPython), list elements are placed in memory locations that are in a sense contiguous\n", "\n", "However, these list elements are more like pointers to data rather than actual data, hence there is still overhead involved in accessing the data values themselves\n", "\n", "This is a considerable drag on speed: in fact it is generally true that the memory traffic is a major contributor to the \"slowness\" of the python code\n", "\n", "Let’s look at some ways around these problems" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## what's inside the box\n", "
\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Using numpy\n", "\n", "* Default for this presentation (and for the standard documentation)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## numpy (main) objects\n", "\n", "* NumPy provides two main objects:\n", " * __ndarray__;\n", " * __ufunc__;\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray\n", "\n", "Like lists, tuples and sets, ndarrays are collection of items\n", "```python\n", ">>> myarray, type(myarray)\n", "(array([0, 1, 2, 3, 4, 5, 6, 7, 8]), numpy.ndarray)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray/2\n", "\n", "Examples:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1., 2, 3]) # try also: \n", " # b = np.array([[1.5, 2.2, 34.4], [2.3, 4, 6.5]])\n", " # c = np.array(['lof', 'l', 'ga']) \n", "a.dtype" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array(['lof', '3', '2'],\n", " dtype=' \n", "typedef struct PyArrayObject {\n", "char *data; /* pointer to raw data buffer */ \n", "PyArray_Descr *descr; /* Pointer to type structure */ \n", "int nd; /* number of dimensions, also called ndim */\n", "npy_intp *dimensions; /* size in each dimension (shape) */\n", "npy_intp *strides; /* bytes to jump to get to the next element in each dimension */\n", "PyObject *base; /* This object should be decref'd upon deletion of array */ \n", " /* For views it points to the original array */ \n", "int flags; /* Flags describing array */\n", "} PyArrayObject;\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray: data\n", "\n", "```C\n", "char *data; /* pointer to raw data buffer */ \n", "PyArray_Descr *descr;/* pointer to type structure */\n", "```\n", "\n", "* `data` is just a pointer to bytes in memory\n", "* `data` needs some sort of descriptor (dtype) to be interpreted\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3]) # buffer of 12 bytes intepreted as 3 ints\n", "a.dtype # int32 -> 32bits -> 4 bytes" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 1., 2., 3.])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = a.astype(np.float) # dtype('float64') \n", "b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray: some dtype types\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray: dtype cont'ed\n", "\n", "* the elements of `ndarrays` can be specified with a dtype object:\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "__numerical__\n", "```python\n", ">>> np.dtype(np.int32) # 32 are the bits used to \n", "dtype('np.int32') # represent the integer \n", " # (4 bytes int) \n", "```\n", "__strings__\n", "```python\n", ">>> np.dtype('S3') # mind the capital case here\n", " # 3 means 3 bytes \n", "```\n", "__both__\n", "```python\n", ">>> np.dtype([('f1', np.uint), ('f2', 'S3')])\n", "dtype([('f', '>> help(np.dtype) (python)\n", ">>> np.dtype? (ipython)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 1: dtype\n", "* Solution" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype([('symb', 'S3'), ('PA', '\n", "typedef struct PyArrayObject {\n", "char *data; /* pointer to raw data buffer */ \n", "PyArray_Descr *descr; /* Pointer to type structure */ \n", "int nd; /* number of dimensions, also called ndim */\n", "npy_intp *dimensions; /* size in each dimension (shape) */\n", "npy_intp *strides; /* bytes to jump to get to the next element in each dimension */\n", "PyObject *base; /* This object should be decref'd upon deletion of array */ \n", " /* For views it points to the original array */ \n", "int flags; /* Flags describing array */\n", "} PyArrayObject;\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray: dimensions\n", "\n", "```C\n", "int nd; /* number of dimensions, also called ndim */\n", " \n", "npy_intp *dimensions; /* size in each dimension */\n", "``` " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([ [1,2,5], [3,4,6] ])\n", "b.shape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.shape" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1],\n", " [2],\n", " [5],\n", " [3],\n", " [4],\n", " [6]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.shape = (6,1) # reshaping the array\n", "b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## numpy array summary\n", "\n", "\"Drawing\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Creating arrays/1\n", "\n", "__5 general mechanism__ to create arrays:\n", "\n", "__(1)__ Conversion from other Python structures (e.g. lists, tuples): np.array function" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 2 3 7] \n" ] } ], "source": [ "list_ = [0,2,3,7]\n", "a = np.array(list_)\n", "print(a, type(a))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 2, 3, 7],\n", " [0, 2, 3, 7]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([list_,list_]) # l, m python lists\n", "b = np.asarray(b) # convert b to an array if b is, \n", " # i.e. a list; otherwise does nothing \n", "b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating arrays/2\n", " \n", "__(2)__ Intrinsic `NumPy` array creation functions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", "[[ 1. 1.]\n", " [ 1. 1.]\n", " [ 1. 1.]]\n" ] } ], "source": [ "a_ = np.zeros(10)\n", "print(a_)\n", "b_ = np.ones(shape=(3,2))\n", "print(b_)\n", "np.ones?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.14492754 0.28985507 0.43478261 0.57971014\n", " 0.72463768 0.86956522 1.01449275 1.15942029 1.30434783\n", " 1.44927536 1.5942029 1.73913043 1.88405797 2.02898551\n", " 2.17391304 2.31884058 2.46376812 2.60869565 2.75362319\n", " 2.89855072 3.04347826 3.1884058 3.33333333 3.47826087\n", " 3.62318841 3.76811594 3.91304348 4.05797101 4.20289855\n", " 4.34782609 4.49275362 4.63768116 4.7826087 4.92753623\n", " 5.07246377 5.2173913 5.36231884 5.50724638 5.65217391\n", " 5.79710145 5.94202899 6.08695652 6.23188406 6.37681159\n", " 6.52173913 6.66666667 6.8115942 6.95652174 7.10144928\n", " 7.24637681 7.39130435 7.53623188 7.68115942 7.82608696\n", " 7.97101449 8.11594203 8.26086957 8.4057971 8.55072464\n", " 8.69565217 8.84057971 8.98550725 9.13043478 9.27536232\n", " 9.42028986 9.56521739 9.71014493 9.85507246 10. ]\n", "70\n" ] } ], "source": [ "c_ = np.linspace(0,10,70) # start, stop, npoints \n", "print(c_)\n", "print(c_.size)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 5 7 9]\n" ] } ], "source": [ "d_ = np.arange(1,10,2) # try also np.arange(1,10,2.)\n", "print(d_) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating arrays/3\n", " \n", "Try the shortcut: __np.r_ __\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2]\n", "[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]\n", "[ 0. 0.10526316 0.21052632 0.31578947 0.42105263 0.52631579\n", " 0.63157895 0.73684211 0.84210526 0.94736842 1.05263158 1.15789474\n", " 1.26315789 1.36842105 1.47368421 1.57894737 1.68421053 1.78947368\n", " 1.89473684 2. ]\n" ] } ], "source": [ "a = np.r_[0,1,2] # ~ np.array\n", "print(a)\n", "a = np.r_[0:1:0.1] # np.arange\n", "print(a)\n", "a = np.r_[0:2:20j] # np.linspace\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating arrays/4\n", "\n", "__(4)__ Reading arrays from disk, either from standard or custom formats (mind the __byte order!!__)\n", "\n", "```\n", "loadtxt, genfromtxt # custom ASCII formats\n", "\n", "fromfile() # custom binary\n", ".tofile() \n", "\n", "h5py, pytables # binary standard formats \n", "netcdf-python \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " ## Creating arrays/5\n", "\n", "__Reading__ from ASCII text files\n", "```\n", "loadtxt(fname[, dtype=, comments=, ...])\n", "```\n", "__Save__ an array to ASCII text file (auto recognizes if .gz) \n", "```\n", "savetxt(fname, X[, fmt=, delimiter=])\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* `loadtxt`   is faster but less flexible\n", "* `genfromtxt`   runs two loops, the first reads all the lines (→strings), the second parses the line and converts each string to a data type\n", " * It has the advantage of taking into account missing data\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 2: creating array\n", "\n", "1. Create a ndarray of 6x5 elements, all set to 1\n", "```\n", ">>> help(np.ones)\n", "```\n", "\n", "2. Reshape the array to 3x10\n", "\n", "3. Download 'mol.xyz' from here and build a new array of atom dtype\n", "```\n", ">>> help(np.loadtxt)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 2: creating array\n", "\n", "1.\n", "```python\n", ">>> a = np.ones((6,5))\n", "```\n", "2.\n", "```python\n", ">>> a.shape = 3,10\n", "```\n", "3.\n", "```python\n", ">>> dt = np.dtype([('symb','|S3'),('PA',np.int), ('x',np.float64),('y',np.float64),('z',np.float64)]) \n", ">>> arr = np.loadtxt('mol.xyz',dtype=dt)\n", "```" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(b'C', 12, -0.102, -0.301, -0.276) (b'C', 12, -0.024, -0.189, 1.125)\n", " (b'C', 12, 1.254, -0.165, 1.713) (b'C', 12, 2.406, -0.245, 0.941)\n", " (b'C', 12, 2.311, -0.359, -0.456) (b'C', 12, 1.042, -0.388, -1.058)\n", " (b'C', 12, -1.248, -0.099, 1.954) (b'C', 12, -1.288, 0.693, 3.11 )\n", " (b'C', 12, -2.434, 0.786, 3.899) (b'C', 12, -3.583, 0.07 , 3.539)\n", " (b'C', 12, -3.563, -0.728, 2.385) (b'C', 12, -2.417, -0.807, 1.61 )\n", " (b'O', 16, -4.755, 0.081, 4.233) (b'C', 12, -4.849, 0.871, 5.418)\n", " (b'C', 12, -6.253, 0.694, 5.978) (b'C', 12, -6.478, 1.501, 7.263)\n", " (b'C', 12, -7.892, 1.324, 7.828) (b'C', 12, -8.114, 2.125, 9.104)\n", " (b'O', 16, -9.453, 1.885, 9.535) (b'C', 12, -9.9 , 2.496, 10.668)\n", " (b'C', 12, -11.226, 2.214, 11.032) (b'C', 12, -11.774, 2.784, 12.17 )\n", " (b'C', 12, -11.027, 3.656, 12.986) (b'C', 12, -9.706, 3.927, 12.605)\n", " (b'C', 12, -9.138, 3.36 , 11.464) (b'C', 12, -11.617, 4.262, 14.203)\n", " (b'C', 12, -12.527, 3.546, 15.001) (b'C', 12, -13.084, 4.108, 16.143)\n", " (b'C', 12, -12.737, 5.415, 16.525) (b'C', 12, -11.828, 6.143, 15.739)\n", " (b'C', 12, -11.283, 5.572, 14.597) (b'C', 12, -13.305, 5.999, 17.703)\n", " (b'N', 14, -13.767, 6.475, 18.66 ) (b'H', 1, -2.416, -1.453, 0.738)\n", " (b'C', 12, 3.495, -0.445, -1.257) (b'H', 1, -13.779, 3.538, 16.751)\n", " (b'H', 1, -12.781, 2.525, 14.737) (b'H', 1, -10.607, 6.158, 13.984)\n", " (b'H', 1, -11.566, 7.156, 16.025) (b'H', 1, -12.809, 2.572, 12.418)\n", " (b'H', 1, -11.807, 1.552, 10.399) (b'H', 1, -8.109, 3.587, 11.215)\n", " (b'H', 1, -9.091, 4.572, 13.225) (b'H', 1, -7.406, 1.815, 9.886)\n", " (b'H', 1, -7.966, 3.199, 8.921) (b'H', 1, -8.637, 1.634, 7.085)\n", " (b'H', 1, -8.083, 0.265, 8.041) (b'H', 1, -6.293, 2.566, 7.066)\n", " (b'H', 1, -5.74 , 1.197, 8.019) (b'H', 1, -6.977, 0.996, 5.212)\n", " (b'H', 1, -6.422, -0.372, 6.169) (b'H', 1, -4.094, 0.543, 6.148)\n", " (b'H', 1, -4.653, 1.927, 5.183) (b'H', 1, -2.423, 1.421, 4.777)\n", " (b'H', 1, -0.416, 1.277, 3.389) (b'H', 1, -4.457, -1.285, 2.126)\n", " (b'H', 1, -1.074, -0.294, -0.759) (b'H', 1, 0.963, -0.467, -2.138)\n", " (b'H', 1, 3.384, -0.232, 1.412) (b'H', 1, 1.344, -0.109, 2.793)\n", " (b'N', 14, 4.457, -0.515, -1.908)]\n" ] } ], "source": [ "dt = np.dtype([('symb','|S3'),('PA',np.int), ('x',np.float64),('y',np.float64), ('z', np.float64)])\n", "arr = np.loadtxt('mol.xyz', dtype=dt)\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating arrays/6\n", "* From the ndarray just built we can save a binary file with the ndarray data\n", "\n", "* It can be read with np.fromfile\n", "\n", "* Be careful when using np.fromfile w/ fortran binary files, you may need to deal with __field separators__...\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(b'C', 12, -0.102, -0.301, -0.276) (b'C', 12, -0.024, -0.189, 1.125)\n", " (b'C', 12, 1.254, -0.165, 1.713) (b'C', 12, 2.406, -0.245, 0.941)\n", " (b'C', 12, 2.311, -0.359, -0.456) (b'C', 12, 1.042, -0.388, -1.058)\n", " (b'C', 12, -1.248, -0.099, 1.954) (b'C', 12, -1.288, 0.693, 3.11 )\n", " (b'C', 12, -2.434, 0.786, 3.899) (b'C', 12, -3.583, 0.07 , 3.539)\n", " (b'C', 12, -3.563, -0.728, 2.385) (b'C', 12, -2.417, -0.807, 1.61 )\n", " (b'O', 16, -4.755, 0.081, 4.233) (b'C', 12, -4.849, 0.871, 5.418)\n", " (b'C', 12, -6.253, 0.694, 5.978) (b'C', 12, -6.478, 1.501, 7.263)\n", " (b'C', 12, -7.892, 1.324, 7.828) (b'C', 12, -8.114, 2.125, 9.104)\n", " (b'O', 16, -9.453, 1.885, 9.535) (b'C', 12, -9.9 , 2.496, 10.668)\n", " (b'C', 12, -11.226, 2.214, 11.032) (b'C', 12, -11.774, 2.784, 12.17 )\n", " (b'C', 12, -11.027, 3.656, 12.986) (b'C', 12, -9.706, 3.927, 12.605)\n", " (b'C', 12, -9.138, 3.36 , 11.464) (b'C', 12, -11.617, 4.262, 14.203)\n", " (b'C', 12, -12.527, 3.546, 15.001) (b'C', 12, -13.084, 4.108, 16.143)\n", " (b'C', 12, -12.737, 5.415, 16.525) (b'C', 12, -11.828, 6.143, 15.739)\n", " (b'C', 12, -11.283, 5.572, 14.597) (b'C', 12, -13.305, 5.999, 17.703)\n", " (b'N', 14, -13.767, 6.475, 18.66 ) (b'H', 1, -2.416, -1.453, 0.738)\n", " (b'C', 12, 3.495, -0.445, -1.257) (b'H', 1, -13.779, 3.538, 16.751)\n", " (b'H', 1, -12.781, 2.525, 14.737) (b'H', 1, -10.607, 6.158, 13.984)\n", " (b'H', 1, -11.566, 7.156, 16.025) (b'H', 1, -12.809, 2.572, 12.418)\n", " (b'H', 1, -11.807, 1.552, 10.399) (b'H', 1, -8.109, 3.587, 11.215)\n", " (b'H', 1, -9.091, 4.572, 13.225) (b'H', 1, -7.406, 1.815, 9.886)\n", " (b'H', 1, -7.966, 3.199, 8.921) (b'H', 1, -8.637, 1.634, 7.085)\n", " (b'H', 1, -8.083, 0.265, 8.041) (b'H', 1, -6.293, 2.566, 7.066)\n", " (b'H', 1, -5.74 , 1.197, 8.019) (b'H', 1, -6.977, 0.996, 5.212)\n", " (b'H', 1, -6.422, -0.372, 6.169) (b'H', 1, -4.094, 0.543, 6.148)\n", " (b'H', 1, -4.653, 1.927, 5.183) (b'H', 1, -2.423, 1.421, 4.777)\n", " (b'H', 1, -0.416, 1.277, 3.389) (b'H', 1, -4.457, -1.285, 2.126)\n", " (b'H', 1, -1.074, -0.294, -0.759) (b'H', 1, 0.963, -0.467, -2.138)\n", " (b'H', 1, 3.384, -0.232, 1.412) (b'H', 1, 1.344, -0.109, 2.793)\n", " (b'N', 14, 4.457, -0.515, -1.908)]\n" ] } ], "source": [ "arr.tofile('mol_bin') \n", "arr2 = np.fromfile('mol_bin', dtype=dt) \n", "print(arr2)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 mirko mirko 2135 ott 15 10:53 mol_bin\n" ] } ], "source": [ "%%bash\n", "ls -lrt mol_bin " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating arrays/7\n", "\n", "__(5)__ Special library functions (e.g. random, ...) or _ad hoc_ defined function" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 1., 1., 1.],\n", " [ 1., 2., 3., 4.],\n", " [ 1., 3., 5., 7.]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def func(i, j):\n", " return i*j+1\n", "\n", "np.fromfunction(func, (3,4)) # meshgrid" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.91556047, 0.13240769, 0.27650742, 0.79593441])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.rand(4)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Why a numpy array is useful?\n", "\n", "memory-efficient container that provides fast numerical operations.\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.56 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "L = list(range(10000))\n", "\n", "t1 = %timeit -o [i**2 for i in L] # list elements squared" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10.1 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "a = np.arange(10000)\n", "\n", "t2 = %timeit -n 1000 -o a**2" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "numpy speedUP: 698.4395632553798\n" ] } ], "source": [ "print(\"numpy speedUP:\", t1.best/t2.best)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## ndarray attributes\n", "\n", "Some useful ndarray attributes are: (let arr be the array)\n", "* __`arr.dtype`__   data-type object for this array\n", "```python\n", "a = np.ones(10); a.dtype\n", "dtype('float64')\n", "```\n", "* __`arr.shape`__   returns a tuple with the size of the array. Changing the tuple re-shapes the array\n", "* __`arr.ndim`__   number of dimension in array\n", "* __`arr.size`__   total number of elements\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (some) ndarray functions\n", "Some useful ndarray functions are:\n", "* __`arr.copy()`__   returns a copy of the array\n", "* __`arr.view()`__   returns a new view of the array using the same data of arr (save memory space)\n", "* __`arr.reshape()`__   returns an array with a new shape\n", "* __`arr.transpose()`__   returns an array view with the shape transposed\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (some) ndarray functions/2\n", "* __`arr.sum()`__\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a = np.array([[1,4,2],[3,1,8]])\n", "print(a)\n", "print(a.shape)\n", "a.sum(axis=1) # try also axis=None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (some) ndarray functions/3\n", "```\n", ">>> a = np.array([[1,1],[2,2]])\n", ">>> a.sum(axis=0)\n", ">>> array([ 3, 3])\n", "```\n", "![](files/images/sum.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (some) ndarray functions/4\n", "* __`arr.min|max()`__   return min (or max) value\n", "* __`arr.argmin|argmax()`__   return the index of min (max) value\n", "* __`arr.var|mean|std|prod()`__   take the usual meaning\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (some) ndarray functions/5\n", "__`sort`__   array sorting\n", "```python\n", ">>> a = np.array([[1,4],[3,1]])\n", ">>> np.sort(a) # sort along the last axis\n", "array([[1, 4],\n", " [1, 3]])\n", ">>> np.sort(a, axis=0) # sort along x axis\n", "array([[1, 1],\n", " [3, 4]])\n", ">>> np.sort(a, axis=None) # sort the flat array\n", "array([1, 1, 3, 4])\n", ">>> np.sort(a, order=field) # 'field' of a structured array \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic routines overview\n", "* Creating arrays routines (mentioned earlier in this presentation)\n", "* Operation on two or more arrays (`dot, ...`)\n", "* Shape functions (`split, resize, concatenate, stack`)\n", "* Basic functions (`average, cov, vectorize,...`)\n", "* Two-dimensional array functions (`eye, diag, ...`) \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Indexing\n", "* Indexing means selecting some of the array's values\n", "* ndarrays can be indexed using the following syntax:\n", "
`X[obj]`
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
`X[obj]`
\n", "Depending on __obj__, a different indexing is triggered:\n", "* __Basic slicing__ when __obj__ is an integer, or a tuple (or list) ofintegers, and slice objects.\n", " * always returns a __view__ of the array\n", "* __Record access__ when __obj__ is the string of a field of structured array\n", "* __Advanced selection__ when __obj__ contains a ndarray of data type integer or bool.\n", " * always returns a copy of the array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic slicing/1\n", "* A slice object is represented by X[__I:J:K__]\n", " * __I__ = __first element__ (included, default is 0)\n", " * __J__ = __last element__ (excluded, default is length of the array)\n", " * __K__ = __step or stride __ (default is 1)\n", " \n", "* Slicing for ndarray works similarly to standard Python sequences but can be done over __multiple dimensions__\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic slicing/2\n", "Examples: given a two dimensional __5x10__ ndarray “A”\n", "* `A[3] = A[3,:] = A[3,::]`   represents the __4th__ row of the array (10 elements)\n", "* `A[:,1] = A[::,1]`   represents the __2nd__ column of the array (5 elements)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic slicing/3\n", "\n", "![](files/images/slice.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic slicing/4\n", "* Ellipses (...) can be used to replace zero or more “:” terms so that the total number of dimensions in the slicing tuple matches the shape of the ndarray" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "a = np.arange(60) \n", "a.shape=(3,4,5) \n", "print(a[...,3]) # print a[:,:,3] → (3,4)\n", "print('='*30)\n", "print(a[1,...,3]) # print a[1,:,3] → (4)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 3: Basic slicing\n", "* Build a 2-dim array, then calculate all the average values along the rows for its odd and even columns.\n", "\n", "![](files/images/slice_ex.png)\n", "\n", "`hint: np.mean() or arr.mean()`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Record access\n", "The fields of structured arrays can be accessed by indexing the array with string\n", "```python\n", ">>> ra\n", "array([(b'H', 1), (b'C', 6)], \n", " dtype=[('at', 'S3'), ('Z', '>> ra['at'] # array(['H', 'C'], dtype='|S3')\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Advanced selection\n", "* Indexing arrays with other arrays\n", "* `X[obj]` → `obj` is a `ndarray`\n", "* There are two different ways:\n", " * __Boolean index arrays__. Involve giving a boolean array of the proper shape to indicate the values to be selected\n", " * __Integer index arrays__. Use one or more arrays of index values.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Adv. selection: boolean index arrays\n", "* ```python\n", " a = np.array([True, False, True]) # bool\n", " ```\n", "* Boolean arrays must be of the (or _broadcastable_ to the) same shape of the array being indexed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "y = np.arange(18).reshape(3,6)\n", "print(y)\n", "y>6\n", "b = y > 6\n", "y[y>6] # 1-d array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 4 ndarray selection\n", "* From the atoms array calculate the contribution to the molecular weight for each of the different kind of atoms (C, N, H, O) making use of the Numpy selection\n", "\n", "$PM_c = \\sum PA_c $\n", "\n", "Hint: use the advanced boolean section" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 4 ndarray selection\n", "* Solution\n", "```python\n", ">>> arr[arr['symb']=='C']['PA'].sum() # 372\n", ">>> arr[arr['symb']=='H']['PA'].sum() # 26\n", ">>> arr[arr['symb']=='N']['PA'].sum() # 28\n", ">>> arr[arr['symb']=='O']['PA'].sum() # 32\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ndarray selection\n", "To summarize, selection:\n", "* can save you __time__ (and code lines)\n", "* enhances the code __performances__\n", " * avoid traversing array through python loops \n", " * internal (C) loops are still performed... more on that later)\n", "* can save __memory__ (in place selection)\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## NumPy (main) objects\n", "NumPy provides two fundamental objects:\n", "* __ndarray__;\n", "* __ufunc__; functions that operate on one or more `ndarrays` __element-by-element__\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## ufunc" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Universal functions/1\n", "* Universal functions, ufuncs are functions that __operate__ on the `ndarray` objects \n", "* all `ufuncs` are instances of a general class, thus they behave the same. All `ufuncs` perform __element-by-element__ operations on the entire `ndarray(s)`\n", "* These functions are __wrapper__ of some core functions, typically implemented in compiled (C or Fortran) code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Universal functions/2\n", "\n", "![](files/images/ufunc.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Universal functions/3\n", "* Ufunc loop may look something like\n", "
\n",
    "void ufunc_loop(void **args, int *dimensions, int *steps, void *data)\n",
    "{\n",
    "    /* int8 output = elementwise_function(int8 input_1, int8  \n",
    "    * input_2)\n",
    "    *\n",
    "    * This function must compute the ufunc for many values at \n",
    "    * once, in the way shown below.\n",
    "    */\n",
    "    char *input_1 = (char*)args[0];\n",
    "    char *input_2 = (char*)args[1];\n",
    "    char *output = (char*)args[2];\n",
    "    int i;\n",
    "    for (i = 0; i < dimensions[0]; ++i) {\n",
    "        *output = elementwise_function(*input_1, *input_2);\n",
    "        input_1 += steps[0];\n",
    "        input_2 += steps[1];\n",
    "        output += steps[2];\n",
    "    }\n",
    "}\n",
    "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Universal functions/4\n", "\n", "Examples:\n", "```python\n", ">>> a = np.array([a1,a2,...,an]) \n", "```\n", "```python\n", ">>> b = 1.5*a + 2 # * + / - all ufuncs \n", "```\n", "* all the elements of “a” are multiplied by 1.5; result stored in a __temporary array__\n", "* All the elements of the temp array are increased by 2; result stored in another __temporary array__\n", "* b becomes a reference to the __second__ temp array\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Universal functions (3)\n", "* What happens when ufunc has to operate on two arrays of __different dimensions__?\n", "* To be efficient, element-wise operations are required\n", "* To operate element-by-element broadcasting is needed.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/images/broadcasting1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/images/broadcasting2.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Broadcasting/1\n", "* Broadcasting allows to deal in a meaningful way with inputs that do not have the same shape.\n", "* After having applied the broadcasting rules the shape of all arrays must match\n", "* let's see when different shapes are compatible from the broadcasting standpoint\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Broadcasting/2\n", "\n", "Broadcasting follows these rules:\n", "\n", "__1.__ Prepend 1 until all the arrays dimension matches\n", "\n", "```\n", "a.shape (2,2,3,6) # a.ndim = 4 \n", "b.shape (3,6) # b.ndim = 2 \n", "\n", "1 → b.ndim = 4\n", "1 → b.shape (1,1,3,6)\n", "```\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "__2.__ Arrays with a size of 1 along a particular dimension act like the array with the __largest size__ along that dimension.\n", " * The values of the array are assumed to be __the same along the broadcasted dimension__" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Broadcasting/3\n", "* 1d-ndarrays are always broadcastable\n", "\n", "```python\n", "a.shape → (3) b.shape -> (4,3); \n", "a.shape → (1,3) # first rule, n. of dim matches\n", "a.shape → (4,3) # second rule\n", "c = a*b \n", "c.shape → (4,3) # as expected \n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "# another example\n", "a.shape → (2,1,3) \n", "b.shape → (2,4,1); \n", "c = a*b c.shape → (2,4,3) # second rule\n", "\n", "# Arrays not broadcastable\n", "a.shape → (2,5,3)\n", "b.shape → (2,4,1); \n", "c = a*b # shape mismatch\n", "```\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Broadcasting/4\n", "* If needed it is possible to add a trailing dimension to the shape of a `ndarray`. This is a convenient way of taking the outer product (or any other outer operation) of two arrays\n", "```python\n", "a.shape → (3) b.shape → (4); \n", "c = a[:,np.newaxis] # c.shape → (3,1)\n", "c * b \n", "c.shape → (3,4) \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Universal functions: performance\n", "* Iterations over array should be avoided because they are not efficient. Performance using array arithmetics is better than using python loops.\n", " \n", "* ~ 18x speedup (standard across NumPy)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Using ufunc\n", "x = np.arange(10**7)\n", "%timeit y = x*3" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Using for loop\n", "%timeit y = [x*3 for x in range(10**7)] " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Efficiency note\n", "The operation\n", "```python\n", ">>> b = 2*a + 2\n", "```\n", "can be done with in-place operations\n", "```python\n", ">>> b = a # b is a reference to a\n", ">>> b *= 2\n", ">>> b += 2\n", "```\n", "More efficient and doesn't require storing temp array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "These operations affect array “a” as well; we can avoid this with\n", "```python\n", ">>> b = a.copy()\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Other in-place operations\n", "```python\n", ">>> b += a \n", ">>> b *= a \n", ">>> b **= a\n", ">>> b -= a\n", ">>> b /= a\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Efficiency note/2:\n", "* Let's consider the function\n", "```python\n", ">>> def myfunc(x): # if x is an array\n", " if x<0: # x<0 becomes a bool array\n", " return 0\n", " else: \n", " return sin(x)\n", "```\n", "\n", "* “if x<0” results in a boolean array and __cannot__ be evaluated by the if statement" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Efficiency note (3): Vectorization\n", "We can vectorize our function using the function where\n", "```python\n", ">>> def vecmyfunc(x):\n", " y = np.where(x<0, 0, np.sin(x))\n", " return y # now returns an array\n", "```\n", "`np.where`\n", "```python\n", "np.where(condition, x, y) # condition → bool array \n", " # condition is true → x \n", " # condition is false → y \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## NumPy universal functions/1\n", "* There are more than 60 ufuncs\n", "* Some `ufuncs` are called automatically: i.e. np.multiply(x,y) is called internally when a*b is typed\n", "* NumPy offers trigonometric functions, their inverse counterparts, and hyperbolic versions as well as the exponential and logarithmic functions. Here, a few examples:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", ">>> b = np.sin(a)\n", ">>> b = np.arcsin(a)\n", ">>> b = np.sinh(a)\n", ">>> b = a**2.5 # exponential\n", ">>> b = np.log(a)\n", ">>> b = np.exp(a)\n", ">>> b = np.sqrt(a)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## NumPy universal functions/2\n", "* Comparison functions: __greater, less, equal, logical_and/_or/_xor/_nor, maximum, minimum, ...__\n", "```python\n", ">>> a = np.array([2,0,3,5])\n", ">>> b = np.array([1,1,1,6])\n", ">>> np.maximum(a,b)\n", "array([2, 1, 3, 6])\n", "```\n", "* Floating point functions: __floor, ceil, isreal, iscomplex,...__\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 5 \n", "![](files/images/inertia_t.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Exercise 5: universal functions\n", "* Calculate the molecular center of mass\n", "$$\n", "xcm=\\frac{1}{MW}\\sum_ix_i m_i \\;\\;\\;\\;\\;\\;\\; \n", "ycm=\\frac{1}{MW}\\sum_iy_i m_i \\;\\;\\;\\:\\;\\;\\; \n", "zcm=\\frac{1}{MW}\\sum_iz_i m_i\n", "$$ \n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Calculate the moment of inertia tensor\n", "$$ \n", "\\left[ \\begin{array}{ccc}\n", "I_{11} & I_{12} & I_{13} \\\\\n", "I_{21} & I_{22} & I_{23} \\\\\n", "I_{31} & I_{32} & I_{33} \\end{array} \\right] \n", "$$\n", "whose elemets can be calculated as:\n", "$$\n", "I_{11} = I_{xx} = \\sum_i m_i(y_i^2+z_i^2) \\;\\;\\;\\;\\;\\;\\;\n", "I_{11} = I_{xx} = \\sum_i m_i(y_i^2+z_i^2) \\;\\;\\;\\;\\;\\;\\;\n", "I_{11} = I_{xx} = \\sum_i m_i(y_i^2+z_i^2) \n", "$$\n", "$$\n", "I_{12} = I_{xy} = -\\sum_i m_ix_iy_i \\;\\;\\;\\;\\;\\;\\;\n", "I_{12} = I_{xy} = -\\sum_i m_ix_iz_i \\;\\;\\;\\;\\;\\;\\;\n", "I_{12} = I_{xy} = -\\sum_i m_iy_iz_i\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# molecular center of mass\n", "MW = arr['PA'].sum() \n", "xcm = 1./MW * (arr['PA']*arr['x']).sum()\n", "ycm = 1./MW * (arr['PA']*arr['y']).sum()\n", "zcm = 1./MW * (arr['PA']*arr['z']).sum()\n", "print(xcm,ycm,zcm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# moment of inertia tensor\n", "Ixx = (arr['PA']*(arr['y']**2 + arr['z']**2)).sum()\n", "Iyy = (arr['PA']*(arr['x']**2 + arr['z']**2)).sum()\n", "Izz = (arr['PA']*(arr['x']**2 + arr['y']**2)).sum()\n", "Ixy = -(arr['PA']*arr['x']*arr['y']).sum()\n", "Ixz = -(arr['PA']*arr['x']*arr['z']).sum()\n", "Iyz = -(arr['PA']*arr['y']*arr['z']).sum()\n", "\n", "TI = np.array([[Ixx, Ixy, Ixz],[Ixy, Iyy, Iyz], [Ixz, Iyz, Izz]])\n", "print(TI)\n", "mTI = np.matrix(TI)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Standard classes inheriting from ndarray\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Matrix objects/1\n", "`Matrix` objects inherit from ndarray classes\n", "* Matrix can be created using a Matlab-style notation\n", "```python\n", ">>> a = np.matrix('1 2; 3 4')\n", ">>> print a\n", "[[1 2]\n", " [3 4]]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* Matrix are always __two-dimensional__ objects\n", "* Matrix objects over-ride multiplication to be __matrix-multiplication__, and over-ride power to be __matrix raised to a power__\n", "* Matrix objects have higher __priority__ than ndarrays. Mixed operations result therefore in a matrix object " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Matrix objects/2\n", "\n", "* Matrix objects have special attributes:\n", " * `.T`   returns the transpose of the matrix\n", " * `.H`   returns the conjugate transpose of the matrix\n", " * `.I`   returns the inverse of the matrix\n", " * `.A`   returns a view of the data of the matrix as a 2-d array \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Basic modules" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic modules: linear algebra/1\n", "\n", "Contains functions to solve __linear systems__, finding the __inverse__, the __determinant__ of a matrix, and computing __eigenvalues__ and __eigenvectors__\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# example\n", "\n", "A = np.zeros((10,10)) # arrays initialization\n", "x = np.arange(10)/2.0\n", "for i in range(10):\n", " for j in range(10):\n", " A[i,j] = 2.0 + float(i+1)/float(j+i+1)\n", "b = np.dot(A, x)\n", "y = np.linalg.solve(A, b) # A*y=b → y=x \n", "print(x)\n", "print(y)\n", "np.allclose(y,x, 10**-3) # True, mind the tolerance! " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic modules: linear algebra/2\n", "* Eigenvalues can also be computed:\n", "\n", "```python\n", "# eigenvalues only:\n", ">>> A_eigenvalues = np.linalg.eigvals(A)\n", "\n", "# eigenvalues and eigenvectors:\n", ">>> A_eigenvalues, A_eigenvectors = np.linalg.eig(A)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 6: matrices and linalg\n", "\n", "1. From the array TI of the last exercise build a matrix (__mTI__), calculate eigenvalues (__e__) and eigenvectors (__Ev__) and prove that: \n", "$\n", "Ev^T mTI \\; Ev = \n", "\\left[ \\begin{array}{ccc}\n", "e_1 & 0 & 0 \\\\\n", "0 & e_2 & 0 \\\\\n", "0 & 0 & e_3 \\end{array} \\right] \n", "$\n", "\n", "2. Now, for example, you can rotate all the atoms coordinates by multiplying them by the __Ev__ matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "mTI = np.matrix(TI)\n", "ev, Ev = np.linalg.eig(mTI)\n", "print(ev)\n", "Ev.T * mTI * Ev" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Basic modules: random numbers\n", "* Calling the standard random module of Python in a loop to generate a sequence of random number is inefficient. Instead:\n", "\n", "```python\n", ">>> np.random.seed(100)\n", ">>> x = np.random.random(4)\n", "array([ 0.89132195, 0.20920212, 0.18532822, \n", "0.10837689])\n", ">>> y = np.random.uniform(-1, 1, n) # n uniform numbers in interval (-1,1)\n", "``` " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* This module provides more general distributions like the normal distribution\n", "\n", "```python\n", ">>> mean = 0.0; stdev = 1.0\n", ">>> u = np.random.normal(mean, stdev, n)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise 7: random numbers\n", "* Given a game in which you win back 10 times your bet if the sum of 4 dice is less than 10, determine (brute force, i.e. 1000000 trows) whether it is convenient to play the game or not\n", "\n", "* hint: use `np.random.randint`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Numpy: good to know" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Masked arrays (1)\n", "\n", "```python\n", ">>> x = np.arange(20).reshape(4,5) # 2d array \n", ">>> x[2:,2:] = 2 # assignment\n", "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 2, 2, 2],\n", " [15, 16, 2, 2, 2]])\n", ">>> x[x<5] # x<5 bool array \n", "array([0, 1, 2, 3, 2, 2, 2, 2, 2, 2]) # 1d array\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* What if we want to maintain the original shape of the array\n", "* We can use masked arrays: the __shape of the array is kept__\n", "* Invalid data can be flagged and ignored in math computations\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "x = np.arange(20).reshape(4,5)\n", "x[2:,2:] = 2\n", "import numpy.ma as ma\n", "y = ma.masked_greater(x, 4)\n", "y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Meshgrids/1\n", "* Given a two-dimensional grid of points, namely $a_{i,j}=(x i , y j)$ we want to calculate for each point of the grid the values of a function $f(a_{ij})=f(x_i,y_j)$\n", "\n", "* Suppose $f = \\frac{x}{2} + y$\n", "\n", "* $f\\;$ is a `ufunc` (implicitly uses ufuncs: `add`, `divide`) so it operates element-by-element)\n", "\n", "![](files/images/mesh.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# wrong\n", "def f(x,y): \n", " return x/2 + y\n", "\n", "x = np.linspace(0,4,5)\n", "y = x.copy()\n", "\n", "values = f(x,y)\n", "values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Meshgrids/2\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x = np.linspace(0,4,5)\n", "y = x.copy()\n", "xv,yv = np.meshgrid(x,y)\n", "print(xv)\n", "print(yv)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/images/mesh2.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "values=f(xv,yv)\n", "values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Bibliography\n", "* http://docs.scipy.org/doc/\n", "* Guide to NumPy (Travis E. Oliphant)\n", "* NumPy User Guide\n", "* Numpy Reference Guide\n", "* Python Scripting for Computational Science (HansPetter Langtangen)\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }