"SciPy es un ecosistema para computo cientifico en python, esta constriuido sobre los arreglos de NumPy. Scipy incluye herramientas como Matplotlib, pandas , SymPy y scikit-learn. \n",
"\n",
"## 5.1 NumPy\n",
"NumPy es la base para todos los paquetes de computo científico en python, provee soporte para arreglos multidimensionales y matrices, junto con una amplia coleccion de funciones matematicas de alto nivel para operar con estos arreglos.\n",
"\n",
"### 5.1.1 numpy.array \n",
"El tipo de dato mas importante de numpy es **numpy.array** sus atibutos mas importantes son:\n",
"* numpy.array.**ndim**: -numero de dimensiones del arreglo.\n",
"* numpy.array.**shape**: Un tupla indicando el tamaño del arreglo en cada dimension.\n",
"* numpy.array.**size**: El numero total elementos en el arreglo.\n",
"* numpy.array.**dtype**: El tipo de elemenos en el arreglo e.g. numpy.int32, numpy.int16, and numpy.float64.\n",
"* numpy.array.**itemsize**: el tamaño en bytes de cada elemento del arrglo.\n",
"* numpy.array.**data**: El bloque de memoria que contiene los datos del arreglo.\n"
"[**Producto punto**](https://en.wikipedia.org/wiki/Dot_product) y [**Multiplicacion Matricial**](https://en.wikipedia.org/wiki/Matrix_multiplication)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a@b == a.dot(b)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"A = np.array([[1, 0], [0, 1]])\n",
"B = np.array([[4, 1], [2, 2]]) "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ True, True],\n",
" [ True, True]])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.matmul(A, B) == A.dot(B)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Elementos, filas, columnas y submatrices."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 2, 3],\n",
" [10, 11, 12, 13],\n",
" [20, 21, 22, 23],\n",
" [30, 31, 32, 33],\n",
" [40, 41, 42, 43]])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def f(x,y):\n",
" return 10*x+y\n",
"B = np.fromfunction(f,(5,4),dtype=int)\n",
"B"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"B[0,3]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([10, 11, 12, 13])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"B[1,:]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 11, 21, 31, 41])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"B[:,1]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 2],\n",
" [11, 12]])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"B[:2,1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterando elementos"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3]\n",
"[10 11 12 13]\n",
"[20 21 22 23]\n",
"[30 31 32 33]\n",
"[40 41 42 43]\n"
]
}
],
"source": [
"for row in B:\n",
" print(row)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1\n",
"2\n",
"3\n",
"10\n",
"11\n",
"12\n",
"13\n",
"20\n",
"21\n",
"22\n",
"23\n",
"30\n",
"31\n",
"32\n",
"33\n",
"40\n",
"41\n",
"42\n",
"43\n"
]
}
],
"source": [
"for element in B.flat:\n",
" print(element)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cambio de forma"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2., 9., 8., 6.],\n",
" [8., 4., 2., 9.],\n",
" [1., 1., 4., 5.]])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.floor(10*np.random.random((3,4)))\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 4)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.shape"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2., 9.],\n",
" [8., 6.],\n",
" [8., 4.],\n",
" [2., 9.],\n",
" [1., 1.],\n",
" [4., 5.]])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.reshape(6,2)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2., 8., 1.],\n",
" [9., 4., 1.],\n",
" [8., 2., 4.],\n",
" [6., 9., 5.]])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.T"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ True, True, True],\n",
" [ True, True, True],\n",
" [ True, True, True],\n",
" [ True, True, True]])"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.transpose()==a.T"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.T.shape"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2., 9., 8., 6.],\n",
" [8., 4., 2., 9.],\n",
" [1., 1., 4., 5.]])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# La dimensión con -1 se calcula automaticamente\n",
"a.reshape(3,-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.2 Ejercicos\n",
"\n",
"### 5.2.1 Sin utilizar numpy escribe una funcion para obten el producto punto de dos vectores."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"166.39999999999998\n"
]
}
],
"source": [
"a = [2, 5.6, 9, 8, 10]\n",
"b = [1, 3, 2.4, 2, 11]\n",
"\n",
"def ProductoPunto(a,b):\n",
" '''Encontrar el producto punto de dos vectores dados a y b, sin usar arreglos de Numpy. \n",
" \n",
" Args: \n",
" a(), b(): son los vectores que se multiplicaran, los cuales son de la misma longitud.\n",
" \n",
" Returns:\n",
" float, int: El resultado del producto punto entre a y b.\n",
" \n",
" Ejemplos: \n",
" \n",
" >>> a = [2, 5.6, 9, 8, 10] \n",
" >>> b = [1, 3, 2.4, 2, 11]\n",
" >>> ProdPunto(a, b)\n",
" el Resultado debe ser: 166.39999999999998\n",
" '''\n",
" \n",
" return sum( map((lambda V1, V2: V1*V2), a, b) )\n",
" \n",
" \n",
"\n",
"print (ProductoPunto(a,b))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2.2 Sin utilizar numpy escribe una funcion que obtenga la multiplicacion de dos matrices.\n"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[23, -5, 2, 5]\n",
"[15, 6, 26, 39]\n",
"None\n"
]
}
],
"source": [
"# A = [[1,2,3],[4,5,6]]\n",
"# B = [[7,8],[9,10],[11,12]]\n",
"\n",
"def MultiplicacionMatrices(a,b):\n",
" \n",
" '''Multiplicar dos matrices dadas A y B, sena A una de 3x2 y B una de 2x3. \n",
" \n",
" Args: \n",
" a(), b(): sonlas matrices dadas de diferente longitud.\n",
"### 5.2.3 Utiliza numpy para probar que las dos funciones anteriores dan el resultado correcto."
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"166.4\n"
]
},
{
"data": {
"text/plain": [
"matrix([[ 1, 0],\n",
" [ 2, -1]])"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Sean \n",
"a = [2, 5.6, 9, 8, 10]\n",
"b = [1, 3, 2.4, 2, 11]\n",
"\n",
"\n",
"#Comprobacion con Numpy ejercicio 1\n",
"print(np.matmul(a, b))\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sea A:\n",
"[[1 2 3]\n",
" [4 5 6]]\n",
"Sea B:\n",
"[[ 7 8]\n",
" [ 9 10]\n",
" [11 12]]\n",
"El Producto de A por B es:\n",
"[[ 58 64]\n",
" [139 154]]\n"
]
}
],
"source": [
"#Comprobar la Multiplicacion de Matrices usndo Numpy Sean \n",
"# A = [[1,2,3],[4,5,6]]\n",
"# B = [[7,8],[9,10],[11,12]]\n",
"\n",
"\n",
"import numpy as np\n",
"A = np.matrix('1 2 3 ; 4 5 6')\n",
"print ('Sea A:')\n",
"print (A)\n",
"B = np.matrix('7 8; 9 10 ; 11 12')\n",
"\n",
"print ('Sea B:')\n",
"print (B) \n",
"\n",
"print ('El Producto de A por B es:')\n",
"print (A*B)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2.4 Utilizando solo lo visto hasta el momento de numpy escribe una funcion que encuentre la inversa de una matriz por el metodo de Gauss-Jordan.\n",
"[Wikipedia](https://en.wikipedia.org/wiki/Gaussian_elimination): En matemáticas, la eliminación de Gauss Jordan, llamada así en honor de Carl Friedrich Gauss y Wilhelm Jordan es un algoritmo del álgebra lineal que se usa para determinar las soluciones de un sistema de ecuaciones lineales, para encontrar matrices e inversas. Un sistema de ecuaciones se resuelve por el método de Gauss cuando se obtienen sus soluciones mediante la reducción del sistema dado a otro equivalente en el que cada ecuación tiene una incógnita menos que la anterior. El método de Gauss transforma la matriz de coeficientes en una matriz triangular superior. El método de Gauss-Jordan continúa el proceso de transformación hasta obtener una matriz diagonal"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sea A:\n",
"[[1 2 3]\n",
" [4 5 6]]\n"
]
},
{
"data": {
"text/plain": [
"matrix([[-0.94444444, 0.44444444],\n",
" [-0.11111111, 0.11111111],\n",
" [ 0.72222222, -0.22222222]])"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from numpy import matrix\n",
"import numpy as np\n",
"A = np.matrix('1 2 3 ; 4 5 6')\n",
"print ('Sea A:')\n",
"print (A)\n",
"\n",
"A.I"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2.5 Utilizando la funcion anterior escribe otra que obtenga la pseduo-inversa de una matriz."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.3 Pandas\n",
"En python, pandas es una biblioteca de software escrita como extensión de NumPy para manipulación y análisis de datos. En particular, ofrece estructuras de datos y operaciones para manipular tablas numéricas y series temporales.\n",
"and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Su objetivo es ser un bloque de construccion fundamental para realizar analisis de datos en el mundo real.\n",
"El nombre de la biblioteca deriva del término \"datos de panel\" (PANel DAta), término de econometría que designa datos que combinan una dimensión temporal con otra dimensión transversal.\n",
"\n",
"Pandas tiene dos typos de datos principales, **Series** (1D) y **DataFrame** (2D), *Dataframe* es un contenedr para *Series* y *Series* es un contenedor de escalares. \n",
"\n",
"### 5.3.1 Series\n",
"Series es un arreglo unidimensional etiquetado capaz de contener cualquier tipo de dato (Enteros, cadenas, punto flotante, objetos, etc), El eje de etiquetas es llamado indice (**index**).\n",
"### Las Series son compatibles con *numpy.array* y *dict*"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0851635954341003"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[0]"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a 1.085164\n",
"b 0.684054\n",
"c -3.137046\n",
"dtype: float64"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[:3]"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a 1.085164\n",
"b 0.684054\n",
"e 0.497413\n",
"dtype: float64"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[s>s.mean()]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a 2.170327\n",
"b 1.368108\n",
"c -6.274092\n",
"d -1.561832\n",
"e 0.994826\n",
"dtype: float64"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s*2"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a True\n",
"b True\n",
"c False\n",
"d False\n",
"e False\n",
"dtype: bool"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s>s.median()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0851635954341003"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[\"a\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alieneacion Automatica"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"a = np.array(range(10))\n",
"s = pd.Series(a)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 0\n",
"1 1\n",
"2 2\n",
"3 3\n",
"4 4\n",
"5 5\n",
"6 6\n",
"7 7\n",
"8 8\n",
"9 9\n",
"dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 4, 6, 8, 10, 12, 14])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(a[:6]+a[4:])"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 NaN\n",
"1 NaN\n",
"2 NaN\n",
"3 NaN\n",
"4 8.0\n",
"5 10.0\n",
"6 NaN\n",
"7 NaN\n",
"8 NaN\n",
"9 NaN\n",
"dtype: float64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(s[:6]+s[4:])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.3.2 DataFrame\n",
"DataFrame es un estructura bidimensional etiquetada con columnas que pueden ser de diferentes tipos, es el objeto mas usado en pandas, se puede pensar en ella como un diccionario de **Series**. **DataFrame** acepta diferentes tipos de entradas como:\n",
"* Diccionarios de arreglos unidimensionales, listas dicionarios o series. \n",