Replace 08-Test2.ipynb

parent 4cddd649
@@ -4,9 +4,551 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Examen Unidad 2" "# Examen 2\n",
"\n",
"Responda a las siguentes preguntas. Tiene 24 horas para entregar la solución de todo el examen en formato de notebook, en su propia rama. Pasadas las 24 horas se descontará 1 punto por hora extra hasta un máximo de 5 horas.\n",
"\n",
"Los criterios para la evaluación de cada pregunta incluyen:\n",
"\n",
"* 80% de la puntuación si cumple cabalmente con la consigna y funciona;\n",
"* 20% de la puntuación si la solución cumple el paradigma orientado a objetos, la lógica es puntual, con buen estilo e incluye docstring."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I. Matriz Documento-Término\n",
"\n",
"Una colección de $n$ documentos indexados por $m$ términos puede ser representada por una matriz $M_{[n x m]}$ conocida como matriz documento-término donde el valor de cada elemento $a_{ij}$ define la importancia del término $j$ en el documento $i$.\n",
"\n",
"La figura 1 muestra una matriz documento-término muy simple, donde cada columna representa un término en la colección, cada renglón un documento y cada celda o elemento de la matriz la ocurrencia del término en el documento. En ella podemos ver que el término 1 aparece en el documento 1 y 3, pero no en los otros dos documentos.\n",
"\n",
"\n",
" Término1 Término 2 Término 3\n",
" Documento1 1 0 0\n",
" Documento2 0 0 1\n",
" Documento3 1 1 1\n",
" Documento4 0 1 0\n",
"\n",
" Figura 1 – Matriz documento-termino simple.\n",
"\n",
"\n",
"### (4 puntos)\n",
"\n",
"* Defina la clase MatrizDT( ) cuyo constructor recibe una lista de documentos ([texto1, texto2, ...]) y tiene los siguientes métodos:\n",
"\n",
" * tf( ) que calcula una matriz documento-término donde cada celda $a_{ij}$ tiene el valor de la frecuencia de término : $ 1+ \\log Count(t_j, d_i) $ si $Count(t_j, d_i) > 0$; ó $0$ cuando el término $t_j$ no aparece en el documento $i$.\n",
"\n",
" * idf( ) que calcula una matriz documento-término donde cada celda $a_{ij}$ tiene el valor de la frecuencia inversa del término : $ \\log (\\frac{n}{df_t}) $ en donde $n$ es el número total de documentos y df_t es el número de textos en los cuales aparece el término $t$.\n",
"\n",
" * tf-idf( ) que calcula una matriz documento-término donde cada celda $a_{ij}$ tiene el producto de la frecuencia de término y de la frecuencia inversa del término. Es decir, el producto, por elemento, de las matrices anteriores.\n",
"\n",
"\n",
"\n",
"#### Observaciones\n",
"\n",
"* Utilice numpy y pandas para manipular los datos mediante estructuras de arregos y arreglos de arreglos;\n",
"\n",
"\n",
"* Utilice pandas para mostrar las matrices en el notebook;\n",
"\n",
"* No utilice modulos que generen directamente la matriz documento-término ni reutilice código que no haya sido programado por usted y que no sea capaz de explicar.\n",
"\n"
]
},
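The binary occurrence matrix of Figure 1 can be built in a few lines of pandas. This is a minimal sketch. The four documents and the tokens `t1`, `t2`, `t3` are hypothetical placeholders, chosen only so that the result reproduces the figure. Each cell records presence (1) or absence (0), not a count.

```python
import pandas as pd

# Hypothetical documents, written so that their term occurrences match Figure 1.
documentos = ['t1', 't3', 't1 t2 t3', 't2']
terminos = ['t1', 't2', 't3']

# Cell (i, j) is 1 if term j occurs in document i, and 0 otherwise.
binaria = pd.DataFrame(
    [[int(t in doc.split()) for t in terminos] for doc in documentos],
    index=['Document 1', 'Document 2', 'Document 3', 'Document 4'],
    columns=['Term 1', 'Term 2', 'Term 3'],
)
print(binaria)
```

The tf, idf and tf-idf weightings replace these 0/1 entries with the log-scaled counts and document frequencies defined above.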
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" angora azul blanco cafe gato loco siames\n",
"0 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
"1 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
"2 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
" angora azul blanco cafe gato loco siames\n",
"0 0.0 1.0 0.0 0.0 1.0 1.0 0.0\n",
"1 1.0 0.0 1.0 0.0 1.0 0.0 0.0\n",
"2 0.0 0.0 0.0 1.0 1.0 0.0 1.0\n",
" angora azul blanco cafe gato loco siames\n",
"0 0.000000 0.477121 0.000000 0.000000 0.0 0.477121 0.000000\n",
"1 0.477121 0.000000 0.477121 0.000000 0.0 0.000000 0.000000\n",
"2 0.000000 0.000000 0.000000 0.477121 0.0 0.000000 0.477121\n"
]
}
],
"source": [
"import math\n",
"import pandas as pd \n",
"import numpy as np\n",
"from sklearn.feature_extraction.text import CountVectorizer \n",
"\n",
"class MatrizDT:\n",
" \n",
" def __init__(self,d):\n",
" '''\n",
" funcion que crea la funcion matrizdt, para realizalo pedido\n",
" Args:\n",
" docs: un Array que contenga uno o más cadenas de texto\n",
" \n",
" Ejemplo:\n",
" >>>docs = ['gato loco azul ', 'gato angora blanco', 'gato siames cafe']\n",
" >>>c=MatrizDT(docs)\n",
" '''\n",
" self.documentos=d\n",
" vec = CountVectorizer()\n",
" x = vec.fit_transform(self.documentos)\n",
" \n",
" #funcion tf\n",
" matriz=np.array(x.toarray(),dtype=float)\n",
" for i in range(len(matriz)):\n",
" for j in range(len(matriz[0])):\n",
" if(matriz[i,j]!=0):\n",
" matriz[i][j]=1+math.log(matriz[i][j],10)\n",
" self.vtf= pd.DataFrame(matriz, columns=vec.get_feature_names())\n",
" \n",
" #funcion idf\n",
" n=len(self.documentos)\n",
" res=np.zeros((len(matriz),len(matriz[0])))\n",
" for i in range(len(matriz[0])):\n",
" df_t=0\n",
" for j in range(len(matriz)):\n",
" if(matriz[j][i] !=0):\n",
" df_t +=1\n",
" val=math.log((n/df_t),10)\n",
" for k in range(len(matriz)):\n",
" res[k][i]=val\n",
" self.vidf = pd.DataFrame(res, columns=vec.get_feature_names())\n",
" \n",
" #función tfidf\n",
" res2=np.zeros((len(matriz),len(matriz[0])))\n",
" for i in range(len(matriz)):\n",
" for j in range(len(matriz[0])):\n",
" res2[i][j]=matriz[i][j]*res[i][j]\n",
" self.vtfidf=pd.DataFrame(res2, columns=vec.get_feature_names())\n",
" \n",
" \n",
" def tf(self):\n",
" '''\n",
" \n",
" tf( ) que calcula una matriz documento-término donde cada celda 𝑎𝑖𝑗 tiene el valor de la frecuencia de término : \n",
" 1+log𝐶𝑜𝑢𝑛𝑡(𝑡𝑗,𝑑𝑖) si 𝐶𝑜𝑢𝑛𝑡(𝑡𝑗,𝑑𝑖)>0 ; ó 0 cuando el término 𝑡𝑗 no aparece en el documento 𝑖 .\n",
" \n",
" \n",
" \n",
" angora azul blanco cafe gato loco siames\n",
" 0 0.0 1.0 0.0 0.0 1.0 1.0 0.0\n",
" 1 1.0 0.0 1.0 0.0 1.0 0.0 0.0\n",
" 2 0.0 0.0 0.0 1.0 1.0 0.0 1.0\n",
" '''\n",
" return self.vtf\n",
" \n",
" def idf(self):\n",
" '''\n",
" idf( ) calcula una matriz documento-término donde cada celda 𝑎𝑖𝑗 tiene el valor de la frecuencia inversa del término \n",
" : log(𝑛𝑑𝑓𝑡) en donde 𝑛 es el número total de documentos y df_t es el número de textos en los cuales aparece el término 𝑡 .\n",
" \n",
" \n",
" Ejemplo:\n",
" >>>c.idf()\n",
" angora azul blanco cafe gato loco siames\n",
" 0 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
" 1 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
" 2 0.477121 0.477121 0.477121 0.477121 0.0 0.477121 0.477121\n",
" '''\n",
" return self.vidf\n",
" \n",
" def tfidf(self):\n",
" '''\n",
" tf-idf( ) calcula una matriz documento-término donde cada celda 𝑎𝑖𝑗 tiene el producto de la frecuencia de término y de\n",
" la frecuencia inversa del término. Es decir, el producto, por elemento, de las matrices anteriores.\n",
" \n",
" \n",
" Ejemplo:\n",
" >>>c.tfidf()\n",
" angora azul blanco cafe gato loco siames\n",
" 0 0.000000 0.477121 0.000000 0.000000 0.0 0.477121 0.000000\n",
" 1 0.477121 0.000000 0.477121 0.000000 0.0 0.000000 0.000000\n",
" 2 0.000000 0.000000 0.000000 0.477121 0.0 0.000000 0.477121\n",
" '''\n",
" return self.vtfidf\n",
" \n",
"\n",
"documentos = ['gato loco azul ', 'gato angora blanco', 'gato siames cafe']\n",
"\n",
"Cont=MatrizDT(documentos)\n",
"\n",
"\n",
"print(Cont.idf()) \n",
"\n",
"print(Cont.tf() )\n",
"\n",
"print (Cont.tfidf())\n",
"\n",
"\n",
"#import doctest \n",
"# doctest.testmod (verbose = True)"
]
},
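The printed values can be checked by hand for the three example documents. The code uses base-10 logarithms; the exam statement writes $\log$ without a base. Only `gato` appears in all $n = 3$ documents, and every other term appears in exactly one. With $d_1$ as the first document (row 0), this gives:

```latex
\begin{aligned}
\mathrm{idf}(\text{gato}) &= \log_{10}\tfrac{3}{3} = 0 \\
\mathrm{idf}(\text{azul}) &= \log_{10}\tfrac{3}{1} \approx 0.477121 \\
\mathrm{tf}(\text{azul}, d_1) &= 1 + \log_{10} 1 = 1 \\
\mathrm{tfidf}(\text{azul}, d_1) &= 1 \times 0.477121 = 0.477121 \\
\mathrm{tfidf}(\text{gato}, d_1) &= 1 \times 0 = 0
\end{aligned}
```

This is why the `gato` column is zero in both the idf and the tf-idf tables. A term that occurs in every document does not help distinguish one document from another.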
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## II. Nubes de palabras\n",
"\n",
"Una nube de palabras o nube de etiquetas es una representación visual de las palabras que conforman un documento o una colección de documentos, en donde el tamaño es mayor para las palabras que son más \"importantes\" según un criterio dado. Son muy útiles para visualizar las palabras clave del contenido o para visualizar las ideas principales de un tema. La figura 2 muestra un ejemplo de nube de palabras extraida de \"Don Quijote\" es el sguiente:\n",
"\n",
"\n",
"<img crossorigin=\"anonymous\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/Nube_de_etiquetas_-_Don_Quijote_de_la_Mancha.png/320px-Nube_de_etiquetas_-_Don_Quijote_de_la_Mancha.png\" class=\"png mw-mmv-dialog-is-open\" width=\"245\" height=\"145\">\n",
"\n",
" Figura 2 – Nube de etiquetas para el primer capítulo de Don Quijote de la Mancha.\n",
"\n",
"\n",
"\n",
"### (4 puntos)\n",
"\n",
"* Defina la clase NubePalabras() cuyo constructor recibe un diccionario cuyas llaves son palabras y cuyos valores son de tipo numérico y representan la \"importancia de la palabras\" e incluya el método plot_cloud() para generar la visualización utilizando Matplotlib tanto para controlar los aspectos visuales de la nube de palabras como para generar la figura.\n",
"\n",
"* Defina el método store_cloud('/algun/nombre/archivo.jpg') para guardar la figura en un archivo .jpg.\n",
"\n",
"* Modifique el constructor para aceptar un argumento opcional llamado \"stopwords\" que es una lista de palabras que no deben considerarse para la visuación. Si \"stopwords\" no es proporcionado al constructor, utilice por defaul una lista con las preposiciones y los verbos más comunes en español.\n",
"\n",
"\n",
"#### Observaciones\n",
"\n",
"* Las palabras deben de mostrarse en horizontal;\n",
"\n",
"* el tamaño de la letra debe refleja la importancia;\n",
"\n",
"* La disposición de las palabras puede se aleatoria pero las palabras más importantes deberían ocupar lugares centrales de la figura resultante;\n",
"\n",
"* El color de las palabras puede se aleatorio pero se aprecia una paleta de colores que se vean bien juntos;\n",
"\n",
"* El tamaño de la figura resultante debe ser apropiado para un monitor promedio, ni muy grande ni muy pequeño;\n",
"\n",
"* No utilice modulos de nubes de palabras ni reutilice código que no haya sido programado por usted y que no sea capaz de explicar.\n"
]
},
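The following is a minimal sketch of how NubePalabras could satisfy these requirements using only Matplotlib text artists. The class and method names `NubePalabras`, `plot_cloud()` and `store_cloud()` come from the statement above. Everything else is a hypothetical choice for illustration: the helper `tamanos()`, the `semilla` parameter, the short `STOPWORDS_ES` list and the "viridis" palette. Font size is scaled linearly between 10 and 48 points. Words are placed at a distance from the centre that grows with their rank, so the most important words sit closest to the middle. The sketch does not prevent words from overlapping; a fuller solution would test bounding boxes before placing each word.

```python
import math
import random

import matplotlib.pyplot as plt

# Hypothetical default stopword list: a few common Spanish prepositions and verbs.
STOPWORDS_ES = [
    'a', 'ante', 'bajo', 'con', 'contra', 'de', 'desde', 'en', 'entre', 'hacia',
    'hasta', 'para', 'por', 'según', 'sin', 'sobre', 'tras',
    'ser', 'es', 'era', 'fue', 'estar', 'está', 'haber', 'ha', 'hay',
    'tener', 'tiene', 'hacer', 'hace', 'ir', 'va', 'poder', 'puede', 'decir', 'dijo',
]


class NubePalabras:
    '''Word cloud drawn only with Matplotlib text artists (sketch).'''

    def __init__(self, pesos, stopwords=None):
        '''pesos: dict word -> importance; stopwords: words to leave out.'''
        if stopwords is None:
            stopwords = STOPWORDS_ES
        excluir = {s.lower() for s in stopwords}
        filtradas = [(p, v) for p, v in pesos.items() if p.lower() not in excluir]
        # Most important words first: they are placed closest to the centre.
        self.pesos = dict(sorted(filtradas, key=lambda pv: pv[1], reverse=True))
        self.fig = None

    def tamanos(self, minimo=10, maximo=48):
        '''Map each weight linearly to a font size in [minimo, maximo] points.'''
        if not self.pesos:
            return {}
        lo, hi = min(self.pesos.values()), max(self.pesos.values())
        rango = (hi - lo) or 1  # avoid dividing by zero when all weights are equal
        return {p: minimo + (v - lo) / rango * (maximo - minimo)
                for p, v in self.pesos.items()}

    def plot_cloud(self, semilla=0):
        '''Draw the cloud and return the Matplotlib figure.'''
        rng = random.Random(semilla)
        paleta = plt.get_cmap('viridis')  # perceptually uniform colour map
        self.fig, ax = plt.subplots(figsize=(8, 5))
        ax.set_xlim(-1.2, 1.2)
        ax.set_ylim(-1.2, 1.2)
        ax.axis('off')
        tamanos = self.tamanos()
        n = len(tamanos)
        for k, (palabra, tam) in enumerate(tamanos.items()):
            # Distance from the centre grows with rank; the angle is random.
            r = 0.9 * k / max(n - 1, 1)
            ang = rng.uniform(0, 2 * math.pi)
            ax.text(r * math.cos(ang), r * math.sin(ang), palabra,
                    fontsize=tam, rotation=0, ha='center', va='center',
                    color=paleta(rng.uniform(0.15, 0.85)))
        return self.fig

    def store_cloud(self, ruta):
        '''Save the figure; a .jpg extension makes Matplotlib write a JPEG.'''
        if self.fig is None:
            self.plot_cloud()
        self.fig.savefig(ruta)
```

For example, `NubePalabras({'quijote': 5, 'de': 9, 'don': 3}).store_cloud('nube.jpg')` would drop `de` as a stopword. It would then draw `quijote` at 48 points near the centre and `don` at 10 points further out.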
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAADhCAYAAADGdn6kAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXl8FEX2wL+VhCTcIYRwhCuEcMstoCJyiCAeuOutoLseuLr6U8H1AgUVVFQQdV0VxV1AFBVvxZNDvLgCCCJXIBzhvkICIYEk9ftjepI5ema6Z3pO6vv58Ml09auq183M6+qqV+8JKSUKhUKhiF3iwq2AQqFQKIKLMvQKhUIR4yhDr1AoFDGOMvQKhUIR4yhDr1AoFDGOMvQKhUIR4wTF0AshhgohNgkhcoUQDwejD4VCoVAYQ1jtRy+EiAc2A4OBfGAFcL2U8k9LO1IoFAqFIYIxou8F5Eopt0kpTwFzgeFB6EehUCgUBgiGoc8Adjkc52tlCoVCoQgDCUFoU+iUuc0PCSFGAaO0wx5B0EOhiHhq1G/qVlZ8OD8MmiiilENSyga+hIJh6POBZg7HTYE9rkJSyunAdAAhhAq4ozgjyR5yOwnJtZzKct4eEyZtFFHIDiNCwZi6WQFkCyEyhRCJwHXA50HoR6GIen5/dzw5b4+JSePe45YpJNVJC7caCoIwopdSlgkh7ga+BeKBt6WU663uR6FQKBTGCMbUDVLK+cD8YLStUHjizpwbAXitx5wwa6Ko3Tgr3CooHAiKoVcozjR63DIFkOS8/YD22Rkrp2YanTWAjLMv1T2n109qq25k9h9BzttjSO9wPs36XGFKP73rKdy9iS3fTvcp2+mqRwz11enqR0mqXV+3/1ic1go1lm+Y8ksJtRirsIBwjugdDVz+ii/Zv24RYDN09nlqXwbL3oYvuW4jnyauWhIbP5/GiUM2T+YW511NWts+uvXthl7KCoSIY937T3LqxDE3vV3rxScm03XEJAB+n/MYZaXFAHS/eTIiPsGrrs3P+SsN2p/HH/OeobTwkNfrcdRj1f8eRFaU2/r523OIuHiv/SjIkVL29CWkYt0oFBZSVnqi0sgD/DHvGcv7WD37UXLeHlNp5AF2/PIhBTv/AKBF32t16wkRR87bYyqNPHg3oHYjn/P2mEojD7Bq5kMB6a+HfUHabuTBZvQV1qAMvUJhIb/PedytzG5M9aZArGTrD/8FIK1Nr6D2A7Dz13kAdLzSeqPvyN413wG4uaAqzKEMvUJxhnBk22rDstVTGwNwNO933fOHNi8DILlueuCKeaGs5AQAQujtw1QYRS3GKgyTmlWXS14eQGLtRFa+sZbf52wMqL1z7utO++FZFO09wZJnlrN/ne+5XBEnuOHTy0mqncgP435h5y9ue/EM0eGvrelxayfiEuJYM+vPgK8l1PjzdlC4e7Nh2bQ2vQGol9mFHpnBfROxE+w3njMZZehjEPui5JJnlrN+3hZDsuB5EdNRxs65o3tw7ugeXuu51rfLubaXVCeRv/5viNe2qtVI4LafnOeeL3l5AACFu48z5/LPvOrgqosjZq4l3CTWqsdZ14wDIPe7NzmW7/yA8mYsHee/fWGX3b3yK/atXeiHpsbpeuNE4pOqA5Dz3wfAwUGkSY+LadzlwqD2fyagpm5imH6PGJ+r3ff7Qd1yV8O49JU1/DR5hVcZb7jKnjxSUvl5+4+eY7w4GvnSwlMsfmope3IOAFAnoxZXvDXYdN9LX1nDyunrvMpEGnYjn/P2GDcjbyX71toWlBudNTBofdipNPJvj3Ey8gAJidWD3v+ZgBrRxyCv9ZhjyGANee78ys+f3PKd2/lrP6jy1XYd6f7xgW0awN7PnTk3mh7ZG8XTW8eGT7cCcNWci2nczftcsbc3lxVvrHWSMXItZrD7vBfu3mRZm3pkXmDdQ6qs5DhQZYTNUlF+GrAZ6tIA9GjQ/r
wAaivsqBF9jOPN4Lca1Nxr3dSsugBep38OrD9sSp8Z/T4wJe/I2/0/1C2fd+PXhtvwZsDNXoseelMnjc6yTTHpbTCyktSs7kFp1z5fbwb7G0G7y+/1u9/klIZ+11U4o0b0Mcrip5bS/7E+hmT1jF/7K6q2sC95ZrnHuh/d9E3lw6TjVdk+1wROnThtSCc7XUa0r/xcWnTKo1zx4ZPUqK8/+nS8Fm84Xksg6Bn78lMlOpL6so5lsqLcyZ983QeTOOuasZUy5adLia+WBMDWBf8la9DfA9LdkZy3x9Djlim06HsNLfpe41FGD/sbAQi3a3StU3G6lLhqSZW7ix0jndt1UASGGtHHKPZpDU90ubGd1/NGHxKOdB3Zwev50yfLTLd57v3GRqmLnljq8Zw/1+IveoZvd8581rwz1q/27DtD7Zw6foT1Hz9XeWw38qtnPkTBjj/86sMbtrWADbrn7C6W3uoaYfXsRzm+f5t25GzkFdagRvRnAHpzznYvk4rTFYbqG6Fmeg2v53f8tNtQO/5g1M0yFIutZgyUP8aspGC/x3p65Ue2rfbqQ+9Lh9zv3jKnoIm27Wz66tWA21B4Ro3oY5jfpq3yKfNGn/dCoIlCoQgnPkf0Qoi3gUuBA1LKTlpZKvA+0BLYDlwjpTwqbNvXXgKGAcXA36SUvq2NIiismb2Bc+6zTX1cNPl8vnvoJ8D8qDbSfcsBqtdLMiQXDdeiUFiNkRH9/4ChLmUPAwuklNnAAu0Y4GIgW/s3CnjNGjUV/lJ+yjY1k3Whu4fNO5d+6rFeMKdZzJD7naFMafS47SyP5yLlWhSKcOHT0EsplwBHXIqHAzO1zzOBKxzKZ0kbS4EUIURjq5RVmGf6OZ6nZor2nvB4bv59iys/XzP3EitVMsX3j/xsSO6s69p6POd4Lbqp6xUKF666ujp5uxqRt6tRuFWxBH8XYxtKKfcCSCn3CiHsu1UygF0Ocvla2V7/VVRYRfXUZC57bZDpevWzU4KgjXk8bWQSccat950r/d8MdcG/utHthjZOZdO6vQ+oBcNY4/mpdcOtgqVYvRir94vTTSoihBglhFgphFhpsQ4KF/KX7QPgb99fSf3WNqNtxNg5ytyZcyN35txIekdbFqC4hDjOuq5tZXkwvVm+GbPESQ9Hw37Ne8P4x4obkBXec9f4upa+/+oZkmtRKMKBvyP6/UKIxtpovjFwQCvPB5o5yDUFdP3epJTTgekQ2xmm7s25xuOI0z4aNMN9q/WTSnhr64u7FvhtvFzDKVw5y3W5JvjkLd5FwY5CUlrUAeAfK25wk3n97Hd9XmMkXItCEQ78NfSfAzcDz2p/P3Mov1sIMRfoDRyzT/EoAmfQOJ8Zwwyx+CnvG11csY+Gr5w1tHIUbOfPj3P5cZK59vzhvb9+AcCtS64hsWa1yvI3er1LRbnxcYL9Wu5Yej1x1ZxfaEN1LYrIpnOXar6Fogwj7pXvAf2BNCFEPjAem4H/QAhxK7ATuFoTn4/NtTIXm3uldfuxFZx1pbGt/HpY4Vb40U3fhK1vO95i5Zjp50zeP/DclLoMGpTE4SMVvPzicb78Qj88gz9cNCSZv99ag7PPTuRoQQXffVPCtKnHOXjQ98a8QGjTNoFXXk0hu00Cq1ed5tV/H2fhD/6FU/vsS/0k5cHi0cdqM2xYMolJgoULSnl8bCGnTlk7yaGSgwcZb1M3y95cz2//Mb5t3dO0Dfg3DaQwh7fF2EjB0Uvk9zWnueIyW6C2/7yewsWXJHutm9lsn+n+xj5em9tur2lY/p3ZxTz2aKGpPhyvyVHH+HjI3e7bK+buuwr4ysPDbMb/6jFwkLE9GN4we++MevMYaFclB490et/eMdwqKGKYLl1tUxB5uxr5NPJ2uSFDfcs5ypsx8gAjRtawzGXRiJEHPBp5wBIjbxYz15+3qxEjRnoPLWIEZeijhLRsZ3evZW+uD5MmimiiXXtzy3Cvv2mNK225j/h1ycn+bWjoP8BmmK
PVv90fvZ96uk7A/aqgZlHCiA+cPUQyujUIkyaKaOLr79IAmPrCcV556biujKvxydvVyNBURGazfeTtakRxsaRj2/1eZXPzGhHvYG02bGno11TRuMdr899Z9Zx08EStWoLbRnl/49Cr73o//NFTDzPttu+QwPxv05zqBqKHGtGHmCVT1ljSToM2kbGJSRH5vDO72KORB32DM2iwsSmNzGb7fBp5gNaZ1hjLrNZVTwtfhu/4ccm0qZ6vO5QsWJzmdNyxnfd7tuHPMgb1P+RUFshbjDL0IWbVO87p5Go28C9VW1KdRCvUUZwBGFn8dDWab71dz4Ok/7j2MfRi4+sBvtqKdFplVT2gfvi+lOITvv1Ptm01n7/BE2rqJszcNG8or13wiak6n/zzR/7y6gVB0ih8XD7tfFpd0MSt/Pe5W1g0OThBUC+f1pdWF2Q4F0qY1t3dm8afxCm+qFYjgb9/cQk1Ut2N3pdjfiF3oeeE6dFO/wFJfPO1edfOnt0O+BaKIP79H+e379tvORpyHZShDzNGRuYJic5Zhnb8am40o+eW6a9boD9tudZxlE+qk8idP/7Fa/0u12XT5bpswDqj781VFVF1fuvi3Xxxvy2w2tZFu+l1q/csWkYQcYJ7c/RT8zly6ZSqxNj+/n+Nf8ycK2MoSa3v34TC4UPB9cm3mksu8//N5a5/FPCf16seFPePqcWLU8xPR6mpmzCwfMafpuRHfhSbW/Vv+epSn0belS7XZXPVmwP87nPAQ929G3kXsvpnVMrvX+8axNU859zVyZCRd+W+1ddy+/eXm64363/FhmXHPuL8UEhLU+Yh3Cz9zTlPsll3VjtqRB8Gfv33OlMjw7pNawVRm/BwyXPnUqeJf1/apj3T+b8VV/Py2R+aqtf79o6VbwZmuW/1tQFvjjLzgNGjZlp1S/TwxLvvFDPpmSpXvtvvqMkzk4oM1W3QII7lq9J9CyoCWlStUdM/t1Rl6COAy6f15fP7jMVd/+yeJb6FIpzr5wymYYdUp7Ij2wuZ9Zev3WQHju1J56vcQz/EJZgfbZ5zVyfd8iVT1rgtkutNrwRiqP/6muc1lekXfkbxYfe56vtWXasbDzaYxt6R1q19m4eHH63NHXf698BWhA5l6CMAt8VAL+T9HP0x4lyNvDejtXDSShZOWqlrZG949yLeveE7Q32ajfopK2TluUBH4m2HNqd5H/dRnC9jbV8Q1us/vX09DmwI7qJego/YXt5Gpjt3lvPenGIWLihl86YyQ3UUwUMZ+gjn4mfOCbcKQcXoyHRat/fdDF56+8BcAAPp2wx6/4dmRuR6/d/w7kVBH9Xvzi/3eE7PYN91RwFfz7cuQFqsEg7XULXaEiaWv2lsQbbtUPdcr7FCqAKCWeF1tHfdYb/61nObnHf7ItPtvNLbfT3ivlWBvWm44uod8t8ZxhdyM5vtU0Y+gvFp6IUQzYQQi4QQG4QQ64UQ92rlqUKI74UQW7S/9bRyIYR4WQiRK4RYK4ToHuyLiEZ+/c+6cKsQdeT9pJvDJiS8f9MPftUbtWC4W1n+SvN+4PYk704YWJe7+x7jC/nPTHaOp7Rli/6+gY8/cw7jO/KGwL2RFMHFyIi+DBgjpWwP9AH+KYToADwMLJBSZgMLtGOAi4Fs7d8o4DXLtY5B+tzhO5Kl0fnoaMCf0bzdn90Rv3YIR2lQbKML9o6MedC4oa9d25hHR7fuzpP3P/90yoOkMy1axvsWikFOnw7/F86noZdS7pVSrtI+FwEbsCX8Hg7M1MRmAldon4cDs6SNpUCKlm5Q4YU+/9D3CHEk2ItvkY5eJqk6jbyHcL3sxb5uZSF7YOrYza2Ld/vd3LYf3es27hzaJBmBsPinMzMQX5tWznFtwrEgbWqOXgjREugGLAMa2tMEan/tTrQZwC6HavlamcKFtfO2ej2v4tUbwMcgNKu/+1fvwMbQPDBv+uhitzK9t5JAuHbmhT5lrrzadzyl9RsbOh1/+slJj7Llnt
doPeLonx/p7N8f/J23n38V2ge0YUMvhKgFfATcJ6X0tq9a76fnNhQTQowSQqwUQqw0qkOssXCS90v35PetCA+nTpiLdZOaGRnG7YWpdalXz/NP/aaba7htxLn//455lO/d03mNwdcINS4ObhgRePKMUNHv3INOx1aMwF09bc7qXI37RhufVtuY25Bnn6/rW9ADhtwrhRDVsBn5OVLKj7Xi/UKIxlLKvdrUjP1/Px9o5lC9KeC2iialnA5M19oP/yRWBNDs7HR2rdBfqJs78vsQa6NwZefSfbQe1DTcavjFqrW2F+5VOacZeeMRThZLXvp3Cpdd7u4V5Mv9Ty/WTN6uRgwbcogNf1Y9DO+5txajH4i+Xd16+VrzdjXiu29LuOO2AsCWQPyiIclce3110tLiDLlMtmm1n83bqt6c7r2/FvfeX3V/Fi4opXp1QY+e1UhM9G8HrCeMJAcXwAxgg5RyqsOpz4GbsSUKvxn4zKH8biHEXKA3cMw+xaPwzpXTB3hcpNz3h/JsCDfFR/1LNh1O7MlB7HTvUc1tmsaRkhJjY65/jT7G81OdR5iOiTKM6BLJvPHaCbcdvxcNSQ5I/9OnJa2a72PbTv02gpnW0MjUzXnASGCgEGKN9m8YNgM/WAixBRisHQPMB7YBucCbwF3Wqx37qHjzkUf5KT8mpyMAoxt0Zrx5gvbZvpOIAMz78CT/vLPAtA5vvnHCcJ1w8uzTRbz+H+t1ldL/DVP/9pI8xhc+R/RSyp/xvOQ1SEdeAv/0W6MzjAMbjuru8DQb1VERfOpmRG9MF7txeenfKVw+3Hm6pn/fg+zYYf4hNv/LEjK/3EfjJvH8uszdo+bZp4t44zVnY/n0xCKenmgsUJqdcCUZmfxMEZOfKaJhwziWrtQP2DbpySLeetP8A8F+TenpcSzL0W/7009Oel0rMYMKgRBm3r3hu4BjqShCg9k8vWWl5SQkRZbv+L13F3Dv3YG1kXHzKGq0aedUdtEsZ5kt48bYhq8+qNGmHRk3j6qqN3a00/mUc86nwaX6g55D33zB0Z/M7zJu/cRziAR90+faP9i8cIw+bNIuvpx6fft7PF9efIJtkx6rPD5wwHjbgaAMfRQw/+Hfwq2CAvPTae/d+D0j5znnEmgzpDmbv91pmU4bvtxuWVteEYLsiVMMi9tl9QynEeKSq5P12CSvMmlDL6No3RrKCoy5y2ZPmmpYxozemQ9NIKGOMQ+r+Bo1/eojUFSsmwjDMauQHSsNw5nGjy+sdiuzIkuUEQ5vdX/tHvastUHqvn1smaXt6dHwyutNGXk7Rg2wKwl16vo08mb7MGLk/ZU3auQD6SNQ1Ig+Aijce4I6jW3zv60HRq773uXT3HeZRjqr52zmgge6OZWde/dZprN8RQLhmOKr2+tc6nQ/2618x7TJnDrovnCb1KQpzf9pG6nmPf+UX31mPjS+8nPe5CcoK3R/YGY9/gxxSca8VFwNasGvSzj41aducil9+tLgsr861TMy6s5/81Wa3m5blizdnc/O/+gb8PqDhpA6cIhTWc22HTixKfjfRWXoI4CZV8znnmVXVx5f9dZAS9s/ln/ckixVZuLmK2wsmbqGfqO7hlsNv0kffpXT8enDB9k+9RmP8qV78gOakrAbZVlWRu74Bz3KbX3yEVPt2fGmW8HSnzl9rIAmI26pLMsa/yxbn3jYYx2Ak9u3sus/L1Kye5dXucMLvuXIkkW0nvBsZVmTm24LyRSOmrqJAFwjEzbtUbXo983YpQG3/7/h893KzI4Oh0zsHbAe4WKxTjLxq2eYe5j6O5peNXuTW5k/benVMZtK0SyuRnLHS895NfJWcWLTn16NvFESGzjvFziyyHeMoxMb/kBWVP0e4xKNrcv4MvJ25GljAeCsRhn6CGfj/B0BtyErAt943P6SlgG3ES7WzN3iVpbR3bgHzeAJvQLq/2SB+0YrM8beU27dirLgx2Rx5NSB0Lg57pn1liXttLjvIafjwz98Y6
he7mMPOB2nX2E+mXukoQz9GULJMfeRxH2rr6XdxS281rv9+8tjwv3zxCH3IF33rb6WHje105F2luk4PDOgvt8Y4D4fbG/bF/etvpZbvrrUrTxUSVvsFP2eE9L+Iom6Z/extL3S/aEPFKDm6M8QXu//ia5hGfp0H4Y+3QdZIfntP38gpaTjFa1IaRZ9MUq88ebgz3Wv//z7u3D+/V2QFZKcmRs5XVJO9oXNSMt2DyAVSEpBT3XtZRXlktVzNlFaeJqsARk07JjqJmvnozsW+6WDGepf6OwWuu+DOUHvE6B488agtFuw1NqooQFhwRu2WZShjxBePvtD/m/F1b4FA2DWlV/rhs4FEHGCc+8+y2v9ad3eZ+jTfXy+BUQq3gy1iBP0/Ht7j3XfHPx5UPuPixc+3y7sbXjDqs03qQMusqQdsxz65ougtJvSpy8pfULjNZZybj8aXHKFb8EQoqZuIgS9+Var0w0e2VbI9As/8y2og93AfPNo4IvD4cSfKY9p3d7XnfoJVf9W1I0Wyoq8RUCPbLInTSV70tSIM/KgRvQRjdEE4mYoPlzCtG7vk96uHje853vUFovGxX5NvqZh5o783i1q6IGNR0lv5x6byJ/+M/s1YfhL53uVLdpXzIyLgzPKjWVEfOhCTxjZ+FS6bw8lu3Zyav9eGgwbbgvSH0KENBCPIuhKqHj0CkVEYcb/PBBcY91sm/QY5cXWRI10vIbc8Q8iy8wljjFCq7FPEV/D2Stq/8dzKcxZ7lmvp15wMvQB3tscKWVPX0JG4tEnA0uAJE1+npRyvBAiE5gLpAKrgJFSylNCiCRgFtADOAxcK6Xc7vdlKIJCesMudOx4ncfzO7YvYts2337HAwYa96tetLBqk0u1ajXpe/44t3JvfXiS69r1VuqltvZY3+i1KKooP3Gc+JqxsyBfr29/jiz+wfJ2XY28IaMtrE0qYgQj7w+lwEApZRegKzBUCNEHmAy8KKXMBo4Ct2rytwJHpZStgRc1OUUE0X/ARK9GHqBFywE+jbgZIx8s+g+Y6NXIg+1aGqSrtIxm2Pfhu84FYTBOVlJ/8DDL23SN/2N4ZB6Ge2kkHr0E7BHvq2n/JDAQuEErnwlMAF4DhmufAeYB/xZCCBkJc0QKABYvGseAgc+weNFYpHRfBDZiwB1l9Eba9vMbN8xj797g+WAbvZZOnW70+eagqKJ4i7ObY/bEKSGNthgVRNHDz9CKgBAiXgixBlte2O+BrUCBlNI+6ZUP2AOhZAC7ALTzx4DQpjxX+GTRwkd0DaPt3KOVn5OSvSckPnTI+4Jxu/ZXeT1vBVZdiyK2qCgtcTpOHTA4TJqEH0OGXkpZLqXsii3Rdy9Az+HYPmLXe8y5jeaFEKOEECuFECuNKqsIFVX/Xc2auodNdmTd2tm65dvzFliqkf8YvxaFM66Ll6EMq2sFW5981Om4/oX6e0hCSea/HvMtFARM+fhIKQuAxUAfIEUIYZ/6aQrs0T7nA80AtPN1AbfM1lLK6VLKnkZWjBXho1o1P9PnReBrrd/XcoaiF1gse9JUQ4G+EurUiYgHg2uo5OxJU92yY3mi1biJXq9hzztvOx1nPf601/ayJ00lISUw11x/MeJ10wA4LaUsEEJUBy7EtsC6CLgKm+fNzYB9J87n2vFv2vmFan4+cqlePZXefUYjhH9+x1273caa1e5BqFq2tDbUshECvRaFO1vGjnYzdlnjbWF2ZVkZB7/8mLKiQpKbZ5LS5zzikpL1mgkbZQVHKdm1g+RmVbu5K905peTAZ/MoKzxGUuMm1O7Sg8T0hh5acufEhj+cjuOSknVj2KcNvYx65w+oPN765CNkPR5aRwYjG6YaAzOF7dcTB3wgpfxSCPEnMFcIMRFYDczQ5GcAs4UQudhG8t7dOxRhIVCPme15C2iZOYh69bJo0KAjBw+urzznOHJevGhsQP0YIRK8f2IZPWMPIBISoiKy467XX6Ju7/NIv/xK5xNCkH5FYGFH9O
6Nt7eA/Df/TUWpezTTYGPE62Yt0E2nfBu2+XrX8hIguEFbFAHhaBiLivJZueJVrzJ65OX9QMvMQQB0OmuERzlPi6RGSUjwPkK04loUvtkydjStJzyLqGY8b25FSYlvoRBxbNkvHFv2i+npJCOeRnmTJ5D50ARL2goWKgTCGcZZnW+q/Byou+GihY/oGtHy8tMs+fHxgNq20yTDcyx4Xy6eCmvJnVCVaSkuOZn6g4dRu1MX4qrXoHTvborWrqbglx9NtVm8eWNIDaBjX7W79qD+wIuolprGqcOHOLF+LYe++8p0m2WFhZXt1j6rKw0u/QtxNWpyMm8rRxZ8w8kdeV71CAXK0J9hpKV5jtBolv4DbAmcd+xYzLat35qqW1ZWbEguKyv8nhIKdypKSjj4xccc/OLjcKviN0VrcihaY+0ej6J1ayhat8bSNq1ARa9UuHFB/4mG5ISwfX3MGnmAUK3PG70WhSKWUYb+DKOw0Hduy7g4c14rGRmBZeDxNIduxdy62WtRKGIRFb3yDMTVgJaWHHPaNVpaeoykJNvxvr2r2LBBPwn1uec+bGi3qaf58959xlCjRprPut6Cmll1LQpFlGIoeqUa0Z+B/Parc5w5V2P96y/P8uefvuPQGw0p4GlkvmzpFE4WH/JYzzF8gSesuhaFIpZRI3qFaZo2PZfsNpcBxkMML1v2IsUnDgRdN4XiDMOaePQKhSt2I//Lz5MM1+nc+WaW/vZ8sFTymzZPOPtVbx6vIjQqYg81daPwm1OnjvsW0jh92pqsQQqFwjzK0CtCwupV08OtgkJxxqIMvcJvvLk/ZmVd7HS+osL6fJ0KhcIYao5eYRpHl0cjvu4qPIFCEV6UoVf4xaKFj9C+/VU0atzDo8yPix9TI3mFIgIwbOi1MMUrgd1SykuFEJnYYtGnAquAkVLKU0KIJGAW0AM4DFwrpdxuueaKsLNhwzw2bJgXbjUUCoUPzMzR3wtscDieDLwopcwGjgK3auW3AkellK2BFzU5hUKhUIQJo8nBmwKXAG9pxwIYCNiHczOBK7TPw7VjtPODNHmFQqFQhAGjUzfTgAeB2tpxfaBASmmfgM0HMrTPGcAuACllmRDimCbvea97DBBXPYnW79i27G+XDW/kAAAgAElEQVT/v1c4tTt8l5t+2yWkXFwVx7143TbyJ8z0UkOhUMQyRnLGXgockFLmCCH624t1RKWBc47tjgJGGdQz4rEbeYCWL9/D5ivHh1EbhVVk3j+OaimpuudK8new882XTLUX6E5co/Ud5Rxlsh6eSHz1GpXHFaUl5D7tHlPItZ/yE8fZ+pz5ZDJJjTJocecYrzKnDh1g+yvPmm7b6RonjAGXcC6u1+DIoQXzObLkB9N96pE9bjKiWjXdc8VbN5M/63VL+gkEIyP684DLhRDDgGSgDrYRfooQIkEb1TcF9mjy+UAzIF8IkQDUxZY71gkp5XRgOkR/rBuR6P6fXKNzK4rXbguDNgorSB/2V1J69/Uqk9y0BW2emMqh77/kyM8LQ6SZ/+gZvrikZNo8MdXpYaAnF1+zlpuc2b48kZiWXinvbwiKJtffwp53bWmr20yYAj5mi9MGDSNt0DDdB4RRjFxjjaw2tHliKid3bWfXWy/71Y8V+Jyjl1I+IqVsKqVsiS3R90Ip5Y3AIuAqTexm4DPt8+faMdr5hTISIqcFEXnqtFuZMvLRS5sJU3waeUfSBl9qyrCFg4wbb/N63j7q9nUdrcb4flMN5F74W7dW245V9U0sCbaZMMXQNTmS2KChaT2rN2sZ1u9IIDtjHwJGCyFysc3Bz9DKZwD1tfLRwMMe6isUEUdqvwvdDEVFyUn2fTyHXTNe4ehvnnOiioTI3JZSf8BQarbpAEDRH2vYPds9HEVSowya3fLPyuOTO/PYNeMVN7mEOsZCU+tRum83u9+dwc43X6Jg2c8e5fw1iHr1Tm7fyp53Z5A/63WKt27SrWfmmhLqptDy7od0zx3/cy27Z0
9nz9z/UpK/w7COocDUN1NKuRhYrH3eBrhlbpZSlgBXW6BbVKHm5GODtEHDKj/rTSOc3JnHwW9sL6+uP9rsx56LyOiX9ftfBDhfz+bxo930r94iy5CcL+x1itatZu+82boyJfk7ODDflm82GMZP7/+heOvmys+ufRqdlmo12n2dQq/e8Q3rAu7LSlSsG4VCByM/xEg06p4o+sM9YfWu/77qVlaw9Ce3sp1vOBuqBkOG++xv8/jRHo28nqwrWQ8+aaiu0fZcOb5pvel29R5I/n5ParZuZ7r/QFCGXqFwYf+X/u/2bXL9LRZqYh17P5zlVnZy+1a3sgNff+JWVrIn3+m43rkXWKeYxpYn/+V0HF+zll/tGH342hduA8HMg95VNmNkaB0OlaFXKFw4tuJXw7IHv/7U6bhWu05Wq3NGIMvLw60CSY0zPJ5rM2GK07Esi64YTsrQKxQBcHTpknCroNDIe8l4xjOA00cOOx3XzG7vWdhlgX7LUw+a6gtwc+Nsdus95tvwk8h0E/BCm4+ecDoOZBE0rkYyrWc7h9D11Z5r/54IxuKsp74Ll6xl30sfVR6XHS4MuK+MsSOo2T3bq8yusTM4uXFnwH1FEnpTHIrowNVw+6K8+DjVUutXHsclJVutkhM7p0+j+R33Vx5Xb54Z1P4ciTpDX/TrH9Q+t+r1uM1HT/htVF2N/LZbIy+nKfh+uNTp15k6/TpTuvMAO+5/lcKf1pI24kLL+3Gl2SRbHDtZepotN0w03V8kordoqbCG2p27kzZgKNVS08KtSlgo2bMrbH1HnaHfO+VDJ0NvJWUFxnOghgozxjepebrfDz6zRt4RkVSNjLEj2D3pHb/bUMQecYmJtB5rPrSBwnqiztDrkXb9QA69Z24LuqthK/p5nQdJZ/SMaPUOLWj2lPXeFp6M7+arJjjN92W9/SDxdWtWHjvG3THK5ivH6/aXd9c0Tu8/6lSW/cHjiPh4p7Ka3bNJSK1N2ZEi030rYoukho1pcde/fAsqQkZUGnpXo5R61QWmDb0re1/036Xu5J/6u+ACoc28CW5leya/x/HlG93Kt97ynK2Odk/iqicF1LevN4It1zwJcXG0+dBZrtWbD6iNY2c4Ta79G7U6dPZ4Pn/m6xRv26x7LtLDSEQzUWnozwhcV/mvedKnC5qnUblRTBnpioqA+1PEHnpGPpo2lsUqMeNeWau3F9coF1rPcl6EPbFqi9XqBERi0wZuZUb9jHNvNOdiplBYhb87RxXBJ2oNvevos8mD1xmuG1fT2Y0q0hYRW750t9OxPG18c0ZFySmr1VEo/GLb1KfCrUJE0fiam30LBYmoNfT+UrNr63CrYJot16kfjCL6KDt21LeQRrWUekHUxHrq9jzHdJ3aHbs4HbuGfQgmUW3oXRdQ9RYwXcl4bKTTsVo8VEQLtdqdFW4Vgkbm/Y+FWwWvbH12nNNxw8sCD9AbyrAPhhZjhRDbgSKgHCiTUvYUQqQC7wMtge3ANVLKo1oi8JeAYUAx8Dcp5SrrVbe5RDa+/6qqApWDHLAlQtHLeuUPcdWTqH/9QOpd0seS9hTOmAlZ2+T6vwdZm/BQI6ttuFXwSfnJYreyZrfeoxuzX49wexSZGdEPkFJ2lVL21I4fBhZIKbOBBVQlGLkYyNb+jQJes0rZQImvU8O3UJhxXT/wh+J1eQHVbzNvAm0+eoI2Hz1B63ceVUbeQvZ/8aF7oYEBSrgNhT80vflOnzJxSUk0vemOEGgTOK4P5OrNMw2FMdB7QEdTPPrhwEzt80zgCofyWdLGUmy5ZRsH0I9Xckc87XTszd0v67/OmWEicdomsZF+Imoz+Bvrxm7c1ZtR8Di28je3sjYTptD8Dv0ffuNrbo4aI39owXyn4xqtskntO9CjfJsnptL60WeCrZaluK47NLv1Ho/pCxPq1KXNE1Pdptz8CogWIEb96CXwnZbE+w0tsXdDKeVeACnlXiFEuiabATgGdcjXyvZapLMTFSdLg9
Fs2CgvOhlwG7K8wnQdbw/I4nV55D85Cyrc21V+9OY5ffQI1eo5P9CTmzQ1ZND3ffwujf56Q7BUC4gjS35wytAFtny6aYMv9VnXn0xW4WDb1Kd0k4+7hjH2hCwrC0uIY6OG/jwp5R7NmH8vhHDfnlmF3nDQLTm4EGIUtqmdgDn2/UrqDu5Zedzonr+w7xXnBAquBsn1TSBSOH3AuKeCJ5JaNjQlr2est90+hbIjgUfBVLiTN20idbv3puHwaw3XKcnfwc43XwKIWEMP/qceBDi2ahl1u/cOhlqWsnnCGOoPGFqZotFwvTDuKTA0dSOl3KP9PQB8gi1X7H77lIz294Amng80c6jeFNij0+Z0KWVPhzl/v9n/+hdOx3X6d/VZJ9beBByp3r5FQPU3XzleGfkgc2zVMsM//B3/eaHSyEcDRq/r1OGDTrL7P3s/WCpZzuFF37B5/GhDo/OyosKwbxzzOaIXQtQE4qSURdrni4Angc+Bm4Fntb+faVU+B+4WQswFegPH7FM8isij2cRbw61CWLHiBxhIG/7UtTKfrS+5zH88xJHfFprW09974qmeXY9jv68IuA87Vjw8wzHf7g9GRvQNgZ+FEL8Dy4GvpJTfYDPwg4UQW4DB2jHAfGAbkAu8CdxludY6uEafdEya0fAflzmdswcBixZ8JQAJhOrtmzsdl+7YH7S+FNFFu3FTSUprSOPLrld6RDk+Db2UcpuUsov2r6OUcpJWflhKOUhKma39PaKVSynlP6WUWVLKs6SUK4N9EeC+eSpj7IjKz47z9wDlx06EQiW/KfhqqdOx47UEm1O7DvgW0hDVVEw8hSIaiOqdsf5Q9NPacKvgkwNvf+133ez3xvkWcsB1d17tvsZ3X2bPjezdjIrA2DhxNKWH9rN9Rni9YSJFj2gmpgy9q1984weudYuZvnfaR0QjRtwYM1+91/SO2Nzr/Yt2qdwqzwzyXp9Myd78cKsRMXpEKzFl6F2pfU4HiIvOS9TbzNXmoyeo0bmVrnybj56gmh+brfTibXgz4imX9FFGXqGIMmJukrXscCEJ9evontvz/Fy/2zVr3HzJG9mVq5fYo+l436FOzSYEKfh2BSlDznYqM1K/4OvlHHjrK2X4FR5p8pcR1GrdAVlRQdHGtez76oOw6NH0mluomdWOE3mbyX9/hlMqTjO0vG00yQ2bULh+DXs+jazw5t6IOUO/bdQUj4bn+NINIdYmcA7N+YG0Gy80LO9PWIcD07+kZtfWVGtoPFTs7knvRFzCljORRsOuJqW7LWTuxom+3Q3bjaua59aTdzzviJG2fbWT0q0PKd2q4iZ5a9MKPeKr1yB7zESnslqtO9Bu7BStrTHo7OV06t/en6s+dTp1p06n7qZ1ChfROa/hB/7Gfwk3Rz7+ic1Xjuf0viNe5TZfNSGg2D15d00zVP/khp1svnK8MvIRwr75VUHSsu42vhDvyThtnDiavDee5+DirykvNu+dlpjawMkoFu/cyv5vPubIsh+d5A7/5j3Hc6B6NBzyFycjf3JXHnu/mMvJ3VX5nduNm0Kt1h28tpM+eLjT9ez/7lMO/7rAScbTQymSENLPVxhLlbDF0FEoFH7ga5RuVs5Oap8BpF94mWF5f/owQjD0MPtmoyfT+LLrqdvFNuV5bO0K9n7+niHdLCbHSHSBM2ZEr1DEKmYN6om84L+NOb5phBojDxvHcl8j8sI/cnTL935RZdjrdj5bVyZSUIZeoYghMq70vVi/a07wU0Q0GhZ4BqZA2fPJbK/nd7033Vg7n86xQp2wogy9QhFD1G7fRbe87aMvhKT/3JefrPzcbtxU0voNCUm/ehSuX+31/Imt3oLwxhbK0CsUMYDjVIRIcN80J7T9JPu//cTtnJWUFRY46ZLWbwjtxk21Gf0Lhga1b4VnlKFXKGKMtg9Pdjqu26VX5eejK34KiQ4bJ452e6iknX8R7cZNJb56zZDooKgi5vzoFZFPr5FV2Xhyl8zkyI7gxB
9y7Mcoy2ePCYImodFl09P/ou2jz7uVN77sOgAqTp8yrUMgHF3xU+WDJfuBScQnV7d9HvMUh3/5gYOL5nurrrAQQyN6IUSKEGKeEGKjEGKDEOIcIUSqEOJ7IcQW7W89TVYIIV4WQuQKIdYKIboH9xIU0YSrwWvdz/fiocIYsqIqnEXzm/7pdn7z5IdDqY4TW14Y6zSlU/8845sAFYFjdOrmJeAbKWU7oAuwAXgYWCClzAYWaMcAFwPZ2r9RQPCX+BVRjT+jXYV3ajTPAqB6RmDZxqwmlLtIfS1At3kwuhKTB4KRDFN1gH7A3wCklKeAU0KI4UB/TWwmsBh4CBgOzJK2nVhLtbeBxv5kmfJmAPx9xQ5Gm4rI5MiOtaS26BxuNYDQ6bL5uUecDFiLv99r+yDNJ4yPVsoKC0iok1K5AO2JuMQk24cI2DQabIzM0bcCDgL/FUJ0AXKAe4GGduMtpdyrJQ4HyAB2OdTP18pUOkGFLqdOBJ4QXY/cJTO9ng/lm0SodKk4VZULOblhRuXnjZMesKR9X7QbN5WKU6Vsfu4Rt3MiPoG2jwQ/u1vuy09WboJqN24qR5Yv4cB3n1aej0tMcnoYbpwU+wM8I4Y+AegO3COlXCaEeImqaRo9hE6Z2yNTCDEK29ROyKiWXDuU3SkMsubjib6FFIYpyPmVlB7n0vJ2cwbM2w5RvXOepmHiEpN87jb1J6CZGT02ThxdKZvaqx+pvfqZ1iOWMDJHnw/kSymXacfzsBn+/UKIxgDa3wMO8s0c6jcF9rg2KqWcLqXsaSROgx7Z/f9uuk63qyf405XCQpbPHsOfX7/E6ZOF7P1zsZouCwL7vnZOqxlKY7blhbFez+/78v2Q6bNx4mhyX9QP1Lf9ralnjJEHg0HNhBA/AbdJKTcJISYAdkfYw1LKZ4UQDwOpUsoHhRCXAHcDw4DewMtSyl66DVe1r6uEr9dZs0bC6vYU0Y3e9yFc34FI0kURVRgKambUj/4eYI4QIhHYBvwd29vAB0KIW4GdgD24xXxsRj4XKNZkA+boznXUa248n6lCoVAobBgy9FLKNYDeU2OQjqwE3J14A6RO42yrm1QoFIozgqgJgRBfLdmtrFG78w3X7361c9apzQvfClgnhUKhiAaixtADVJSXOR03P/sKw3UTkms5HRfsjr60ggqFQuEPURXrZuW7D0XBLkpBs+6XUK9ZR5JrN+DUyUJOFuzjwJbfOLpzXbiVq6Ra9TpknnMNtdMzkVJy/MA2dqz4lNLj3lMWKmKfFmdfQUqzTiTWqEtJ4UH2rP2ew9u9h/wNFqnNO9OwXV+q121IXEIip0uKOLL9dw5s/pXSIO2/8ETjDv1Jb9eXpBopnCw8wKGtK9i7flFIdfCXqDL0kYq3h09ijbok1qhL3SZtncp///RpSosOB6V/T94aiTXr0fWv+nlFU5p2JKVpR6eyFe/8C2lyR2UkBRJTGCO5bjqdL39I91z1ug3JOn8EWeePAEDKCla886+g6SLi4jn7Rs+bqpJq1adxp4E07jTQqfzglqXkLbU+q5Wn73P1ug1p1v1SmnW/tLIskr/HytD7SYPsPmT28T+LTpcrHgWC8+Wo16wTR3f94VTmjwE+e8TzEf3lVQSO2e+FEHH0GjmFwn25bPze2jBWgbytN8juQ4PsPoA1vyl/dLHXicTfTNQb+rj4BLe5e1ccn7oAO5YHlnzByumjXiOncGT7GnJ/8p72zAzNew53MvT+6rt+/otWqaSIMNoMvI2UjPZ+16/TqDW9Rk6xxKh1vfJxEmvUDbgdKwj0voDt97b6w/GcLjlukVaBE1WLsQDrv37J6bjTpb6/aI07DnA63r/pZ0t1CpTUll0tbS+pVmrl50AeSicO51uhjiLC6H71EwEbMzuBDnqy+t4YMUa+et2Glt2Xbi5efuEm6kb0Jw7tdDpOrpPuQTJ4rP5wvNt/pKwoZ8WcB73Wq52eSfshd+ues2p05NqmK/s3/ezxjSbznG
to0Lp3QH36uobIX0yPbRq1O9/NA83OrlVfsXf9Qo91Pf3fBfLdrZ+pn64id8ksjuz43WvdpNr1K6dA7fjrTdeo3fkevfhOnShgzcdPeazb8/pniEtIdCsPxm/aX6LO0EcCjq9kOXPHUn66xFC9ogN5LJ89JiTGznXxd9+GJexc+ZnXOnm/fUDebx8Ann+AiujGkzEzYpCWzx5DYs0Uuv71Mbdz/hi11Bb6b7JG2yktOlwpG18tiR7XPe33/phA7svK92yROvV+15Fi7KNu6sYs9Zp1cjreueJTD5LmWD57DMtnjzFs5F3r6tG638hA1aqk7aCqwKDLZ4/xaeRdOZy3yjJdFJGBpwGGGUN06kSBR3nH75wR9L7vqz5wf4gYofx0qeU5Ksy2Z9V9CQZRaeg3L3rb6bhl7ys9yrpGudy3MTTJkf3B0wgnECJhNKGIXPx1ldTbb+H6FukPZaXFAbdhBf7eF/vo3hEr7kugRKWhL8hf73Sc3ubcMGniPztzPg96Hwe2LA16H4rowNOo1ew+CTu/fzLJwxm9dBSRi9X3paJMPwG73nRXKIlKQx8L7Pvzx6D3sT0IG0gUsUOgb3uF+3LdynqN9J6n1ReRsFh/ZPuagOrr3ZfEmikBtRkoPg29EKKtEGKNw79CIcR9QohUIcT3Qogt2t96mrwQQrwshMgVQqwVQqhVPYUiBgl0w5Sn6ZGzRzwfULuBEuieFqs3klmBT0MvpdwkpewqpewK9MAWY/4TbOkEF0gps4EFVKUXvBjI1v6NAiLmqiN5fl6hCBat+90UbhV08TQ9Yt99W88lJIfCf8xO3QwCtkopdwDDAXvG45mA3T9pODBL2lgKpNhTDlqJ64Jsgyz3JFY9r3/G6dgqj5toYM8fC8KtgiJCSG3RJWL7y3nvUY/nsgfcQq+RU+g1cgpx8dZ7gjdse57lbXoj1P8Pjpi9e9cB72mfG0op9wJIKfcKIew7lzKAXQ518rWyvYEo6orrgmzmuddycOtypzK9TQzBpHnP4TRqr5+EONQc3bE23CoozlAadejvc7OTnfKyUtZ9/hxnXe59s2HPGyYDcHj7arb+9E7AOgI0bNvXknaMYua+WI1hQ6+lEbwccPcfchHVKXPLCSuEGIVtaidqESIu7POJnig5bk1kTEVscqr4WNDarpXW3JT8yWP7DW8krN+yG/VbdgNgxZwHkRXlfukItqidrlh1X8pKi0lIquFUZva+WImZEf3FwCop5X7teL8QorE2mm8MHNDK84FmDvWaAntcG5NSTgemg+fk4Fayf9MvlrbXa8QLIKLLlUyhsHP6ZGG4VXDD7gVk1PPGHs7Yyr0iVt2X0ycL3Qx9ODFj6K+natoG4HPgZuBZ7e9nDuV3CyHmAr2BY/YpHqvJ++0DMs+5pvI4qVZq5UaO9hfd5SS7Y/nHlvVr9It4bPdGDm5dTuHezZSdOul3OwqF1cQnVg+3Ch6xG+72Q/5J7fRWPuWtDA9s1X2JtPtryNALIWoAg4E7HIqfBT4QQtwK7ATswdnnA8OAXGweOs5bUy3kYO4yJ0Pf6bIHKhd3ajfMCkqf3ozzgU2/sN3CB4pCESySa9cPtwo+2fDtqwDUbtiK9hf906e8FXFlrLovkRKR044hQy+lLAbqu5QdxuaF4yorAd//K0EgPiEpqO17MvKHtuWw7Zd3g9q3QmEtwZt2tHqatGj/tkoDXj+zO1l9b/Qoa8bY79/4Ew3bne9SGj33xQxqZ6wFKCOviGT83c7vL/mr5wet7cN5q1g+e4zXoHtCGDNrO1d9aZVahgjmffHFGWHo/Y1RbQR/XxU9xQRXKKxm88IZIe3Pn4iuZtn68xxWvKuf59aoJ5z0kZnOakJxXzwR9Yb+YO4yt7K0Vj2djv2NUe1IupaP0io6DL3H0vYUCk8c27Mx3CoEBVlepqKzGiTqDb09UYadZt0vpdV511veT92mHSxtL7l2mqXtKRRmSa7TIKD6umtWMuie0kEnUG
+4SPSmi3pD74prflirKCs5EZR2FYpQkPP+OLeyzsMf1pEMjOXvPGB5m8FE774Eg00L3gxJP56IOUPvyrE9myxp53BejiXtQNV2boUiVJTr7OMAaNn7Kr/ai8RRqz9YfV8c3b0dCff0WUwY+oL8Pz2e27RguiV96MWY9oezb3wuKAGaFApf6Hl9pLc5x3Q7GV2G6Jav9LA4qodVv4Gs80cE3IbePH96m3M8XqcnMroMoUHr3m7lZu5LsIgJQ795UWi9ChwxM7LpfMUjiLj4IGqjUHjGU0TTXiOnkNqis6E2ul/zJBmdL9I9V2HSi8UembJOo9am6tlp3e+myrg3jvz+6dN+tedKRueLDN+Xes3Psuy+BAM1tDSBlBW6Prq9Rk5h34YlHhNw27LuOG/EOFmwj+opjYKhZsRQLbkWKRkdqJWeSUpGe6pVr60rZ39YHtuzieMHt1OwZyMnDu20VJfkOunU1vSolZ5JNQ/urb1GTqG8rJRjuzfadNm9kZLCA7qygepSOz2Tuhntfepy/MB2ju3ZSNHB7QHfF0/Bw1r3u9lB5gEc4xDWa34W2Rf8zWe7/tJu8J1Ox0X7t7Fj+ccUF7hHTknP7kPLPle7lTtSWmQ+oJ+R+7Lm46c4daKg8jixeh26XvU43jZZRYpXkDL0Jljxzr88juAbte9nKkTxui+ep8d1k4ivlmyVemHFijnbuk3aUrdJW4+vzEZ/NIHqEp+QRGqLLqS26ELznsPDrov9vnjCrDHxFSnSbDpAq41Z7Yat6HSZf4u6geji676YzfsaKUYeYmTqxhOHtq20vM1A//NOnSiobCNn7lgrVFIoTGOVEYokY2aFLstnj7FkY1Mk3ReIoRF9Wclxt92m2355z4N0YBiNne3KwS1LyVMJuxURwvLZY0hp2pE2A24xXXfVB49TVuq/y3FFeRmlRYdJsiCImLdpU3/ImTvW7/uyf9PP7Fj+iWW6WIWQEbDBIRTx6INFq/NuIK1VD4/nZUU5K+Z4z56jUEQCPW94lrj4ah7P78r5gr1/Lg5K33EJifS4bpLhODWhDCTYduDt1M1o5/H8sT2bLPPu84McKWVPX0LK0CsUCkX0YsjQG3p8CiHuF0KsF0L8IYR4TwiRLITIFEIsE0JsEUK8r6UaRAiRpB3naudbBnYdCoVCoQgEn4ZeCJEB/B/QU0rZCYjHliR8MvCilDIbOArcqlW5FTgqpWwNvKjJKRQKhSJMGPW6SQCqCyESgBrAXmAgME87PxO4Qvs8XDtGOz9ICJVcVaFQKMKFT0MvpdwNvIAtXeBe4BiQAxRIKe1bvvKBDO1zBrBLq1umyUd+3jKFQqGIUYxM3dTDNkrPBJoANYGLdUTtC6p6o3e3xVYhxCghxEohhPXO7gqFQqGoxMjUzYVAnpTyoJTyNPAxcC6Qok3lADQF9mif84FmANr5usAR10allNOllD2NrBgrFAqFwn+MGPqdQB8hRA1trn0Q8CewCLDH8rwZsO9Y+Fw7Rju/UEaCD6dCoVCcoRjyoxdCPAFcC5QBq4HbsM3FzwVStbIRUspSIUQyMBvohm0kf52UcpuP9tWDQKFQKMyjNkwpFApFjGPdhimFQqFQRC/K0CsUCkWMowy9QqFQxDjK0CsUCkWMowy9QqFQxDjK0CsUCkWMEykZpo4Dm8KthEnSgEPhVsIE0aYvKJ1DQbTpC0pnR1oYEYoUQ78p2kIhCCFWRpPO0aYvKJ1DQbTpC0pnf1BTNwqFQhHjKEOvUCgUMU6kGPqwZdYNgGjTOdr0BaVzKIg2fUHpbJqIiHWjUCgUiuARKSN6hUKhUASJsBt6IcRQIcQmIUSuEOLhcOsDIIRoJoRYJITYIIRYL4S4VytPFUJ8L4TYov2tp5ULIcTL2jWsFUJ0D6Pu8UKI1UKIL7XjTCHEMk3n94UQiVp5knacq51vGQZdU4QQ84QQG7V7fU6k32MhxP3ad+IPIcR7QojkSLvHQoi3hRAHhBB/OJSZvq
9CiJs1+S1CiJv1+gqyzs9r3421QohPhBApDuce0XTeJIQY4lAeEnuip6/DuQeEEFIIkaYdh0DWtp0AAARASURBVP8eSynD9g+IB7YCrYBE4HegQzh10vRqDHTXPtcGNgMdgOeAh7Xyh4HJ2udhwNfY0ij2AZaFUffRwLvAl9rxB9hyAgC8Dtypfb4LeF37fB3wfhh0nQncpn1OBFIi+R5jy8GQB1R3uLd/i7R7DPQDugN/OJSZuq/Y8kxs0/7W0z7XC7HOFwEJ2ufJDjp30GxFErYUp1s1WxIye6Knr1beDPgW2AGkRco9DukPRedmnQN863D8CPBIOHXyoOdnwGBsm7oaa2WNsfn/A7wBXO8gXykXYj2bAguAgcCX2hfrkMOPpfJ+a1/Gc7TPCZqcCKGudTSjKVzKI/YeU5X4PlW7Z18CQyLxHgMtXYymqfsKXA+84VDuJBcKnV3O/QWYo312shP2+xxqe6KnLzAP6AJsp8rQh/0eh3vqxv7DsZOvlUUM2ut2N2AZ0FBKuRdA+5uuiUXKdUwDHgQqtOP6QIGUskxHr0qdtfPHNPlQ0Qo4CPxXm2p6SwhRkwi+x1LK3cAL2NJr7sV2z3KI3HvsiNn7Gvb77cIt2EbFEKE6CyEuB3ZLKX93ORV2fcNt6IVOWcS4AQkhagEfAfdJKQu9ieqUhfQ6hBCXAgeklDmOxTqi0sC5UJCA7dX3NSllN+AEtikFT4RbX7R57eHYpguaADWBi73oFXadDeBJx4jRXQgxFlsa0zn2Ih2xsOoshKgBjAUe1zutUxZSfcNt6POxzWnZaQrsCZMuTgghqmEz8nOklB9rxfuFEI21842BA1p5JFzHecDlQojt2HL5DsQ2wk8RQthDXTjqVamzdr4uthy/oSIfyJdSLtOO52Ez/JF8jy8E8qSUB6WUp4GPgXOJ3HvsiNn7Ggn3G22B8lLgRqnNb3jRLZw6Z2EbAPyu/QabAquEEI286BUyfcNt6FcA2ZrXQiK2BavPw6wTQggBzAA2SCmnOpz6HLCvjN+Mbe7eXn6TtrreBzhmf00OFVLKR6SUTaWULbHdx4VSyhuBRcBVHnS2X8tVmnzIRmxSyn3ALiFEW61oEPAnEXyPsU3Z9BFC1NC+I3adI/Ieu2D2vn4LXCSEqKe9yVyklYUMIcRQ4CHgcillscOpz4HrNK+mTCAbWE4Y7YmUcp2UMl1K2VL7DeZjc+jYRyTc42Aurhhc0BiGzatlKzA23PpoOvXF9gq1Flij/RuGbX51AbBF+5uqyQvgVe0a1gE9w6x/f6q8blph+xHkAh8CSVp5snacq51vFQY9uwIrtfv8KTbPg4i+x8ATwEbgD2A2Ns+PiLrHwHvY1hBOYzM4t/pzX7HNi+dq//4eBp1zsc1h23+DrzvIj9V03gRc7FAeEnuip6/L+e1ULcaG/R6rnbEKhUIR44R76kahUCgUQUYZeoVCoYhxlKFXKBSKGEcZeoVCoYhxlKFXKBSKGEcZeoVCoYhxlKFXKBSKGEcZeoVCoYhx/h/wcI3dkBo18QAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from wordcloud import WordCloud\n",
"\n",
"# Word frequencies as (word, count) pairs\n",
"reader = (('quijote', 5), ('primera', 4), ('don', 3), ('novela', 3), ('parte', 3),\n",
"          ('obra', 3), ('título', 2), ('ingenioso', 2), ('mancha', 2), ('1605', 2))\n",
"\n",
"# Default Spanish stopwords left out of the cloud\n",
"STOPWORDS_ES = frozenset({'al', 'la', 'las', 'el', 'los', 'en', 'de', 'para',\n",
"                          'por', 'del', 'con', 'es', 'se', 'su'})\n",
"\n",
"\n",
"class NubePalabras():\n",
"    '''Word cloud built from a dictionary of word frequencies.'''\n",
"\n",
"    def __init__(self, diccionario, stopwords=STOPWORDS_ES):\n",
"        '''\n",
"        Args:\n",
"            diccionario: dict mapping each word to its frequency.\n",
"            stopwords: words to leave out of the cloud.\n",
"        '''\n",
"        self.stopwords = set(stopwords)\n",
"        # generate_from_frequencies() does not apply stopwords, so filter them here\n",
"        frecuencias = {k: int(v) for k, v in diccionario.items()\n",
"                       if k.lower() not in self.stopwords}\n",
"        self.wordcloud = WordCloud(background_color='white', max_words=50,\n",
"                                   width=1500, height=850, prefer_horizontal=1\n",
"                                   ).generate_from_frequencies(frecuencias)\n",
"\n",
"    def plot_cloud(self):\n",
"        '''Display the word cloud generated when the object was created.'''\n",
"        plt.imshow(self.wordcloud, interpolation='bilinear')\n",
"        plt.axis('off')\n",
"        plt.show()\n",
"\n",
"    def store_cloud(self, filename='NubePalabrasEx2.jpg'):\n",
"        '''Save the word cloud image to `filename`.'''\n",
"        self.wordcloud.to_file(filename)\n",
"\n",
"\n",
"c = NubePalabras(dict(reader))\n",
"c.plot_cloud()\n",
"c.store_cloud()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## III. Zipf's law\n",
"\n",
"Zipf's law states that the frequency of a word (in almost every language) is inversely proportional to its position $r$ in a frequency ranking. For English, in particular, it was established that:\n",
"\n",
"<img src=\"http://mathworld.wolfram.com/images/equations/ZipfsLaw/NumberedEquation1.gif\" class=\"numberedequation\" width=\"116\" height=\"37\" border=\"0\" alt=\" P(r) approx 1/(rln(1.78R)), \">\n",
"\n",
"\n",
"where $R$ is the number of distinct words.\n",
"\n",
"\n",
"### (2 points)\n",
"\n",
"Use the data provided in the repository to build a Zipf probability distribution, taking $\\ln(\\text{rank})$ as the random variable ($x_k$) and $\\ln(\\text{frequency})$ as the distribution associated with that variable ($p_k$).\n",
"\n",
"* Data\n",
"\n",
"./data/named_entity_recognition_sp_MX_locations.JSON\n",
"\n",
"\n",
"* Plot the probability density function.\n",
"* Using scipy.stats, show with experimental evidence whether the coefficient 1.78 in $\\ln(1.78R)$ also holds for Spanish; if it does not, state the corresponding coefficient for Spanish.\n",
"\n",
"\n",
"#### Notes\n",
"\n",
"* Base your answer on the custom distribution example (rv_histogram) seen in class.\n",
"* Ignore the ``<START:location>`` and ``<END>`` tags when generating the distribution.\n",
"\n",
"\n",
"#### References:\n",
"\n",
"* https://es.wikipedia.org/wiki/Ley_de_Zipf\n",
"* http://mathworld.wolfram.com/ZipfsLaw.html"
]
},
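{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible approach, not a reference solution. It assumes that `words` holds the tokens read from the data file with the ``<START:location>`` and ``<END>`` tags already removed; the helpers `zipf_distribution` and `zipf_coefficient` are hypothetical names, and the synthetic `words` list is for illustration only. The histogram takes $\\ln(\\text{rank})$ as $x_k$, weighted by $\\ln(\\text{frequency})$.\n",
"\n",
"To test the coefficient, note that $P(r) \\approx 1/(r\\ln(cR))$ implies $r\\,P(r) \\approx 1/\\ln(cR)$ for every rank, so the sketch estimates $c$ from the geometric mean of $r\\,P(r)$ (an assumed estimator). For an ideal Zipf distribution, normalization gives $\\ln(cR) = \\sum_{r=1}^{R} 1/r \\approx \\ln R + \\gamma$, hence $c \\approx e^{\\gamma} \\approx 1.78$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch under stated assumptions; zipf_distribution and zipf_coefficient are hypothetical helpers\n",
"from collections import Counter\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"\n",
"def zipf_distribution(words, bins=30):\n",
"    '''Return an rv_histogram over ln(rank), weighted by ln(frequency), plus the sorted counts.'''\n",
"    counts = np.array(sorted(Counter(words).values(), reverse=True), dtype=float)\n",
"    xk = np.log(np.arange(1, len(counts) + 1))  # ln(rank), rank 1 = most frequent word\n",
"    pk = np.log(counts)                         # ln(frequency); zero for words seen once\n",
"    return stats.rv_histogram(np.histogram(xk, bins=bins, weights=pk)), counts\n",
"\n",
"\n",
"def zipf_coefficient(counts):\n",
"    '''Estimate c in P(r) = 1 / (r ln(cR)) from counts sorted by rank.'''\n",
"    R = len(counts)\n",
"    r = np.arange(1, R + 1)\n",
"    p = counts / counts.sum()  # observed P(r)\n",
"    k = stats.gmean(r * p)     # r P(r) should be roughly constant: 1 / ln(cR)\n",
"    return np.exp(1.0 / k) / R\n",
"\n",
"\n",
"# Synthetic Zipf-like tokens for illustration only; replace with the words from the data file\n",
"words = [f'w{r}' for r in range(1, 201) for _ in range(1000 // r)]\n",
"dist, counts = zipf_distribution(words)\n",
"\n",
"x = np.linspace(0, np.log(len(counts)), 200, endpoint=False)\n",
"plt.plot(x, dist.pdf(x))\n",
"plt.xlabel('ln(rank)')\n",
"plt.ylabel('density')\n",
"plt.show()\n",
"\n",
"print('Estimated coefficient c:', zipf_coefficient(counts))"
]
},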
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bsorb 93\n",
"bilirub 58\n",
"bophleb 31\n",
"bmandib 28\n",
"backb 25\n",
"biofib 25\n",
"bsorbab 22\n",
"boemb 15\n",
"bustib 15\n",
"bservab 13\n",
"bicarb 11\n",
"bilob 9\n",
"biocompatib 9\n",
"boxyb 9\n",
"boglob 8\n",
"bioavailab 7\n",
"butab 5\n",
"biodegradab 5\n",
"bolizab 4\n",
"boxyhemoglob 4\n",
"bofuranosylb 3\n",
"bromob 3\n",
"bparab 2\n",
"bothromb 2\n",
"bscrib 2\n",
"butanedicarb 2\n",
"buloseb 2\n",
"bjectionab 2\n",
"butylb 2\n",
"billb 2\n",
"blackb 1\n",
"bilimb 1\n",
"butyrob 1\n",
"brevib 1\n",
"bottleb 1\n",
"bisab 1\n",
"bonob 1\n",
"blueb 1\n",
"bifidob 1\n",
"butylideneb 1\n",
"biodistrib 1\n",
"bitab 1\n",
"benzylb 1\n",
"bocarb 1\n",
"breastb 1\n",
"bonylb 1\n",
"batroxob 1\n",
"bubarb 1\n",
"balthemoglob 1\n",
"bocyclob 1\n",
"benzeneb 1\n",
"biobib 1\n",
"bonimidoylb 1\n",
"bizumab 1\n"
]
}
],
"source": [
"import re\n",
"from collections import Counter\n",
"\n",
"# Read the whole file as text; explicit encoding avoids platform-dependent defaults\n",
"with open('d2016.bin', 'r', encoding='utf-8') as open_file:\n",
"    file_to_string = open_file.read()\n",
"\n",
"# \\b marks a word boundary: capitalized or lowercase words of 3 to 10 letters\n",
"words = re.findall(r'\\b[A-Za-z][a-z]{2,9}\\b', file_to_string)\n",
"\n",
"# Count how many times each word appears\n",
"frequency = Counter(words)\n",
"\n",
"# Print the words from most to least frequent\n",
"for key, value in frequency.most_common():\n",
"    print(key, value)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [
{
"ename": "UnicodeDecodeError",
"evalue": "'charmap' codec can't decode byte 0x9d in position 893: character maps to <undefined>",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-2-0b95ab146e7c>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mfrequency\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[0mopen_file\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34mr'C:\\Data\\test2.json'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'r'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 5\u001b[1;33m \u001b[0mfile_to_string\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mopen_file\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfile_to_string\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32mC:\\ProgramData\\Anaconda3\\lib\\encodings\\cp1252.py\u001b[0m in \u001b[0;36mdecode\u001b[1;34m(self, input, final)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;32mclass\u001b[0m \u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mIncrementalDecoder\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mdecode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfinal\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 23\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcharmap_decode\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minput\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0merrors\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mdecoding_table\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;32mclass\u001b[0m \u001b[0mStreamWriter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mCodec\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mcodecs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mStreamWriter\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mUnicodeDecodeError\u001b[0m: 'charmap' codec can't decode byte 0x9d in position 893: character maps to <undefined>"
]
}
],
"source": [
"import re\n",
"from collections import Counter\n",
"\n",
"# Open with an explicit UTF-8 encoding: the Windows default (cp1252) raised UnicodeDecodeError\n",
"with open(r'C:\\Data\\test2.json', 'r', encoding='utf-8') as open_file:\n",
"    file_to_string = open_file.read()\n",
"\n",
"# Drop the annotation tags, which must not be counted as words\n",
"file_to_string = re.sub(r'<START:location>|<END>', ' ', file_to_string)\n",
"\n",
"# Regular expression for words of 3 to 10 letters, accented letters included\n",
"# (Zipf's law concerns how often each distinct word appears)\n",
"words = re.findall(r'\\b[^\\W\\d_]{3,10}\\b', file_to_string)\n",
"\n",
"# Frequency of each word\n",
"frequency = Counter(words)\n",
"\n",
"# Print each word (key) and the number of times it appeared, most frequent first\n",
"for key, value in frequency.most_common():\n",
"    print(key, value)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
...@@ -31,7 +573,7 @@ ...@@ -31,7 +573,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.8rc1" "version": "3.7.1"
} }
}, },
"nbformat": 4, "nbformat": 4,
......