{ "cells": [ { "cell_type": "markdown", "id": "07d17aca", "metadata": {}, "source": [ "# Math599 2022S" ] }, { "cell_type": "markdown", "id": "aec20849", "metadata": {}, "source": [ "## ML-exam" ] }, { "cell_type": "markdown", "id": "a28b2d94", "metadata": {}, "source": [ "Name: " ] }, { "cell_type": "markdown", "id": "bddbf2f2", "metadata": {}, "source": [ "Student ID #: " ] }, { "cell_type": "markdown", "id": "f8f6de4f", "metadata": {}, "source": [ "Please read the instructions carefully:\n", "1. Write your **name** and **Student ID #** first.\n", "2. You are allowed to use the internet, but you are **NOT allowed to communicate** with others in any form.\n", "3. Different problems might use the same variable names. Make sure you use the right one to answer the problem.\n", "4. If the answer is too long, it is enough to write **two digits after the decimal point**.\n", "5. Please copy your answer and paste it into the Markdown cell with \"_Your answer:_\".\n", "6. Run the next cell first. Then you may start." ] }, { "cell_type": "code", "execution_count": 1, "id": "869f73b5", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "ver = \"A\"" ] }, { "cell_type": "markdown", "id": "f20c914f", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "4645bb63", "metadata": {}, "source": [ "###### Problem 1 [2pt]" ] }, { "cell_type": "markdown", "id": "c74c9021", "metadata": {}, "source": [ "Let \n", "```python \n", "data = np.genfromtxt(\"p1-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "If you project the points (rows) in `data` onto its 0-th and 1-st principal components, then you will see a four-letter word in upper case. \n", "What is the word? 
\n", "(Note the word might be upside down or left/right mirrored.)" ] }, { "cell_type": "code", "execution_count": 2, "id": "8fa6b89e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "### your code to reach the answer\n", "data = np.genfromtxt(\"p1-%s.csv\"%ver, delimiter=\",\")\n", "\n", "from sklearn.decomposition import PCA\n", "model = PCA(2)\n", "X_new = model.fit_transform(data)\n", "\n", "plt.axis(\"equal\")\n", "plt.scatter(*X_new.T)" ] }, { "cell_type": "markdown", "id": "1f891a80", "metadata": {}, "source": [ "_Your answer:_ Free" ] }, { "cell_type": "markdown", "id": "9a821199", "metadata": {}, "source": [ "###### Problem 2 [2pt]" ] }, { "cell_type": "markdown", "id": "f52345ed", "metadata": {}, "source": [ "Let \n", "```python\n", "dist = np.genfromtxt(\"p2-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "be the distance matrix of a dataset. \n", "Use MDS to reconstruct the dataset in $\\mathbb{R}^2$. \n", "Then draw the scatter plot of them. \n", "You are supposed to see one of the four suit symbols:\n", "\n", "- spade ♠\n", "- heart ♥\n", "- diamond ♦\n", "- club ♣\n", "\n", "What do you see? " ] }, { "cell_type": "code", "execution_count": 3, "id": "4557e2f7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "### your code to reach the answer\n", "dist = np.genfromtxt(\"p2-%s.csv\"%ver, delimiter=\",\")\n", "\n", "from sklearn.manifold import MDS\n", "model = MDS(dissimilarity=\"precomputed\")\n", "X_new = model.fit_transform(dist)\n", "\n", "plt.axis(\"equal\")\n", "plt.scatter(*X_new.T)" ] }, { "cell_type": "markdown", "id": "ae2c5dde", "metadata": {}, "source": [ "_Your answer:_ heart" ] }, { "cell_type": "markdown", "id": "4037b735", "metadata": {}, "source": [ "###### Problem 3 [2pt]" ] }, { "cell_type": "markdown", "id": "8214252d", "metadata": {}, "source": [ "Let \n", "```python\n", "X = np.genfromtxt(\"p3-%s.csv\"%ver, delimiter=\",\")\n", "y = np.array([0]*50 + [1]*50 + [2]*50 + [3]*50)\n", "w = np.array([0,0,0,0])\n", "```\n", "Suppose after training a $k$-means model with the data `X` , \n", "the output label array is `y` . \n", "What is the predicted label of `w` under this setting?" 
] }, { "cell_type": "code", "execution_count": 4, "id": "6027603e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "X = np.genfromtxt(\"p3-%s.csv\"%ver, delimiter=\",\")\n", "y = np.array([0]*50 + [1]*50 + [2]*50 + [3]*50)\n", "w = np.array([0,0,0,0])\n", "\n", "centers = np.zeros((4,X.shape[1]))\n", "for i in range(4):\n", " centers[i] = X[y == i].mean(axis=0)\n", " \n", "dist = np.sqrt(np.sum((centers - w) ** 2, axis=1))\n", "dist.argmin()" ] }, { "cell_type": "markdown", "id": "d5ad1024", "metadata": {}, "source": [ "_Your answer:_ 2" ] }, { "cell_type": "markdown", "id": "d3b78dc4", "metadata": {}, "source": [ "###### Problem 4 [2pt]" ] }, { "cell_type": "markdown", "id": "c39707fe", "metadata": {}, "source": [ "Let \n", "```python\n", "X = np.genfromtxt(\"p4-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "One of the data points (rows) in `X` is an obvious outlier. \n", "Use DBSCAN to find its index. " ] }, { "cell_type": "code", "execution_count": 5, "id": "30881c17", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([375]),)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "X = np.genfromtxt(\"p4-%s.csv\"%ver, delimiter=\",\")\n", "\n", "from sklearn.cluster import DBSCAN\n", "model = DBSCAN()\n", "y_new = model.fit_predict(X)\n", "\n", "np.where(y_new == -1)" ] }, { "cell_type": "markdown", "id": "74c9f212", "metadata": {}, "source": [ "_Your answer:_ 375" ] }, { "cell_type": "markdown", "id": "aa4b2acf", "metadata": {}, "source": [ "###### Problem 5 [2pt]" ] }, { "cell_type": "markdown", "id": "f5095930", "metadata": {}, "source": [ "Let \n", "```python\n", "data = np.genfromtxt(\"p5-%s.csv\"%ver, delimiter=\",\")\n", "x,y = data.T\n", "```\n", "Let $x_i$ and $y_i$ be the entries in `x` and `y` , respectively. 
\n", "Let $f(x) = 3 + 4x$. \n", "Find the value $\\sum_{i} (f(x_i) - y_i)^2$. " ] }, { "cell_type": "code", "execution_count": 6, "id": "fc006257", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.849279764498071" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "data = np.genfromtxt(\"p5-%s.csv\"%ver, delimiter=\",\")\n", "x,y = data.T\n", "\n", "np.sum((3 + 4*x - y)**2) " ] }, { "cell_type": "markdown", "id": "7f24fc2a", "metadata": {}, "source": [ "_Your answer:_ ~4.84" ] }, { "cell_type": "markdown", "id": "b1023f76", "metadata": {}, "source": [ "###### Problem 6 [2pt]" ] }, { "cell_type": "markdown", "id": "1fecc353", "metadata": {}, "source": [ "Let \n", "```python\n", "data = np.genfromtxt(\"p6-%s.csv\"%ver, delimiter=\",\")\n", "x,y = data.T\n", "```\n", "Let $x_i$ and $y_i$ be the entries in `x` and `y` , respectively. \n", "Let $f(x) = c_2x^2 + c_4x^4$. \n", "Find $c_2, c_4$ such that $\\sum_{i} (f(x_i) - y_i)^2$ is minimized." 
] }, { "cell_type": "code", "execution_count": 7, "id": "7e19ee43", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1.99944934, 2.99999746])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "data = np.genfromtxt(\"p6-%s.csv\"%ver, delimiter=\",\")\n", "x,y = data.T\n", "\n", "X = x[:,np.newaxis] ** np.array([2,4])\n", "from sklearn.linear_model import LinearRegression\n", "model = LinearRegression(fit_intercept=False)\n", "model.fit(X, y)\n", "model.coef_" ] }, { "cell_type": "markdown", "id": "b80f0205", "metadata": {}, "source": [ "_Your answer:_ $c_2 \\sim -1.99$, $c_4 \\sim 2.99$" ] }, { "cell_type": "markdown", "id": "ee06e733", "metadata": {}, "source": [ "###### Problem 7 [2pt]" ] }, { "cell_type": "markdown", "id": "d5d8fe2f", "metadata": {}, "source": [ "Let \n", "```python\n", "path = \"p7-%s\"%ver\n", "```\n", "There are 100 pictures in the `path` folder, \n", "where `digit00.png` ~ `digit49.png` are pictures of hand-written digit 0, \n", "while `digit50.png` ~ `digit99.png` are pictures of hand-written digit 1. \n", "Train a $k$-nearest neighbor classification model with `k = 5` . \n", "Then predict which hand-written digit (0 or 1) appears in `blur.png` ."
] }, { "cell_type": "code", "execution_count": 8, "id": "0179b01d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "path = \"p7-%s\"%ver\n", "\n", "import os\n", "from PIL import Image\n", "X = np.zeros((100, 28, 28), dtype=int)\n", "for j in range(100):\n", " img = Image.open(os.path.join(path, \"digit%02d.png\"%j))\n", " X[j] = np.array(img)\n", "X_flat = X.reshape(100, 28*28)\n", "y = np.array([0]*50 + [1]*50)\n", "\n", "img = Image.open(os.path.join(path, \"blur.png\"))\n", "blur = np.array(img)\n", "blur_flat = blur.reshape(1, 28*28)\n", "\n", "from sklearn.neighbors import KNeighborsClassifier \n", "model = KNeighborsClassifier(5)\n", "model.fit(X_flat, y)\n", "y_new = model.predict(blur_flat)\n", "y_new" ] }, { "cell_type": "markdown", "id": "ac8180b1", "metadata": {}, "source": [ "_Your answer:_ 1" ] }, { "cell_type": "markdown", "id": "6fb8084a", "metadata": {}, "source": [ "###### Problem 8 [2pt]" ] }, { "cell_type": "markdown", "id": "2ff55bfa", "metadata": {}, "source": [ "Let \n", "```python\n", "y = np.genfromtxt(\"p8-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "The array `y` contains 1000 labels, \n", "representing the three categories `0, 1, 2` . \n", "Find the Gini impurity of `y` ." 
] }, { "cell_type": "code", "execution_count": 9, "id": "1b592d99", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.33999999999999997" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "y = np.genfromtxt(\"p8-%s.csv\"%ver, delimiter=\",\")\n", "\n", "dtrib = np.unique(y, return_counts=True)[1]\n", "prob = dtrib / dtrib.sum()\n", "np.sum(prob * (1 - prob))" ] }, { "cell_type": "markdown", "id": "a32b204e", "metadata": {}, "source": [ "_Your answer:_ ~0.33" ] }, { "cell_type": "markdown", "id": "a79222bc", "metadata": {}, "source": [ "###### Problem 9 [2pt]" ] }, { "cell_type": "markdown", "id": "81b41427", "metadata": {}, "source": [ "Let \n", "```python\n", "X = np.genfromtxt(\"p9-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "Train $k$-means models with `k = 1,2,...,10` \n", "and find the corresponding inertias. \n", "Use the elbow method to suggest the number of clusters in `X` ." ] }, { "cell_type": "code", "execution_count": 10, "id": "60d73c16", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "### your code to reach the answer\n", "X = np.genfromtxt(\"p9-%s.csv\"%ver, delimiter=\",\")\n", "\n", "from sklearn.cluster import KMeans\n", "\n", "inertias = []\n", "for k in range(1,11):\n", " model = KMeans(k)\n", " model.fit(X)\n", " inertias.append(model.inertia_)\n", " \n", "plt.plot(np.arange(1,11), inertias)" ] }, { "cell_type": "markdown", "id": "98c62ee2", "metadata": {}, "source": [ "_Your answer:_ 3" ] }, { "cell_type": "markdown", "id": "75db8030", "metadata": {}, "source": [ "###### Problem 10 [2pt]" ] }, { "cell_type": "markdown", "id": "4bd084a7", "metadata": {}, "source": [ "Let \n", "```python \n", "X = np.genfromtxt(\"p10-%s.csv\"%ver, delimiter=\",\")\n", "```\n", "It is known that the points (rows) in `X` look like three layers of spheres centered at the origin. \n", "Find their radii. \n", "\n", "Recall that the radius of a sphere centered at the origin is $\\sqrt{x^2 + y^2 + z^2}$ \n", "for any point $(x,y,z)$ on the sphere. " ] }, { "cell_type": "code", "execution_count": 11, "id": "1e3c39ae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1.73205081],\n", " [1. 
],\n", " [2.23606798]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### your code to reach the answer\n", "X = np.genfromtxt(\"p10-%s.csv\"%ver, delimiter=\",\")\n", "\n", "radii = np.sqrt(np.sum(X**2, axis=1))[:,np.newaxis]\n", "from sklearn.cluster import KMeans\n", "model = KMeans(3)\n", "model.fit(radii)\n", "\n", "model.cluster_centers_" ] }, { "cell_type": "markdown", "id": "775a8c8c", "metadata": {}, "source": [ "_Your answer:_ ~1, ~1.73, ~2.23" ] }, { "cell_type": "markdown", "id": "88a3105f", "metadata": {}, "source": [ "###### Problem 11 [extra 2pt]" ] }, { "cell_type": "markdown", "id": "9a61a2b9", "metadata": {}, "source": [ "Suppose you have 1000 pictures, \n", "each labeled either `0` for dogs or `1` for cats. \n", "If you want to train a model to predict, for other pictures, \n", "whether they are pictures of dogs or cats, \n", "which model would you choose? \n", "\n", "Choose one of the following: \n", "\n", "1. PCA\n", "2. MDS\n", "3. KMeans\n", "4. DBSCAN\n", "5. LinearRegression\n", "6. PolynomialRegression\n", "7. KNeighborsClassifier\n", "8. DecisionTreeClassifier\n", "\n", "The answer might not be unique. \n", "Add one or two sentences to justify your choice." ] }, { "cell_type": "markdown", "id": "73d873a9", "metadata": {}, "source": [ "_Your answer:_ Either 7 or 8 is okay, since it is a classification problem." ] }, { "cell_type": "markdown", "id": "c2b8c98a", "metadata": {}, "source": [ "--- \n", "Exam ends here. 
\n", "Total points = 20 (+2)" ] }, { "cell_type": "code", "execution_count": null, "id": "9a72916c", "metadata": {}, "outputs": [], "source": [ "### points for each problem\n", "pts = [0,0,0,0,0, \n", " 0,0,0,0,0, \n", " 0]\n", "total = sum(pts)\n", "print(\"Your total score =\", total)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 5 }