author    Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
committer Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
commit    708c9358bbb9fd43d7bd4333ac621e1b35a77751 (patch)
tree      c5fde4c963943a184569303b6c2391c41f92ab1f /demo
parent    1b23d69cd6127d5b7161fd091f975fb044a2cdcf (diff)
Update CoreNLP colab tutorial to be compatible with v1.1.1
Diffstat (limited to 'demo')
-rw-r--r--  demo/Stanza_CoreNLP_Interface.ipynb | 105
1 file changed, 77 insertions(+), 28 deletions(-)
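A large share of the hunks below mechanically reset each code cell's `"execution_count": 0` to `null`, so the notebook ships without stale run counters. The commit itself contains no script for this, but as a hypothetical illustration, the same cleanup can be done programmatically on a notebook's JSON (the `clear_execution_counts` helper below is not part of Stanza):

```python
import json

def clear_execution_counts(nb):
    # Reset execution_count on every code cell; json.dumps writes None as null
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
    return nb

# A minimal notebook fragment mimicking one of the cells touched by this diff
nb = {"cells": [{"cell_type": "code", "execution_count": 0,
                 "source": ["import stanza"], "outputs": []}]}
cleared = clear_execution_counts(nb)
print(json.dumps(cleared["cells"][0]["execution_count"]))  # prints null
```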
diff --git a/demo/Stanza_CoreNLP_Interface.ipynb b/demo/Stanza_CoreNLP_Interface.ipynb
index 33d6f550..a19cffbe 100644
--- a/demo/Stanza_CoreNLP_Interface.ipynb
+++ b/demo/Stanza_CoreNLP_Interface.ipynb
@@ -70,7 +70,7 @@
         "# Import stanza\n",
         "import stanza"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -84,39 +84,80 @@
         "\n",
         "In order for the interface to work, the Stanford CoreNLP library has to be installed and a `CORENLP_HOME` environment variable has to be pointed to the installation location.\n",
         "\n",
-        "**Note**: if you are want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
-        "\n",
-        "```bash\n",
-        "export CORENLP_HOME=path_to_corenlp\n",
-        "```\n",
-        "\n",
-        "Here we instead set this variable with the Python `os` library, simply because `export` command is not well-supported in Colab notebook."
+        "Here we are going to show you how to download and install the CoreNLP library on your machine with Stanza's installation command:"
       ]
     },
     {
       "cell_type": "code",
       "metadata": {
-        "id": "O3oBy0i-6HWL",
+        "id": "MgK6-LPV-OdA",
         "colab_type": "code",
         "colab": {}
       },
       "source": [
-        "# Download the Stanford CoreNLP Java library and unzip it to a ./corenlp folder\n",
-        "!echo \"Downloading CoreNLP...\"\n",
-        "!wget \"http://nlp.stanford.edu/software/stanford-corenlp-4.1.0.zip\" -O corenlp.zip\n",
-        "!unzip corenlp.zip\n",
-        "!mv ./stanford-corenlp-4.1.0 ./corenlp\n",
+        "# Download the Stanford CoreNLP package with Stanza's installation command\n",
+        "# This'll take several minutes, depending on the network speed\n",
+        "corenlp_dir = './corenlp'\n",
+        "stanza.install_corenlp(dir=corenlp_dir)\n",
         "\n",
         "# Set the CORENLP_HOME environment variable to point to the installation location\n",
         "import os\n",
-        "os.environ[\"CORENLP_HOME\"] = \"./corenlp\""
+        "os.environ[\"CORENLP_HOME\"] = corenlp_dir"
       ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Jdq8MT-NAhKj",
+        "colab_type": "text"
+      },
+      "source": [
+        "That's all for the installation! 🎉 We can now double-check that the installation is successful by listing files in the CoreNLP directory. You should be able to see a number of `.jar` files by running the following command:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "K5eIOaJp_tuo",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Examine the CoreNLP installation folder to make sure the installation is successful\n",
+        "!ls $CORENLP_HOME"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
       "cell_type": "markdown",
       "metadata": {
+        "id": "S0xb9BHt__gx",
+        "colab_type": "text"
+      },
+      "source": [
+        "**Note 1**:\n",
+        "If you want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
+        "\n",
+        "```bash\n",
+        "export CORENLP_HOME=path_to_corenlp_dir\n",
+        "```\n",
+        "\n",
+        "Here we instead set this variable with the Python `os` library, simply because the `export` command is not well supported in Colab notebooks.\n",
+        "\n",
+        "\n",
+        "**Note 2**:\n",
+        "The `stanza.install_corenlp()` function is only available since Stanza v1.1.1. If you are using an earlier version of Stanza, please check out our [manual installation page](https://stanfordnlp.github.io/stanza/client_setup.html#manual-installation) for how to install CoreNLP on your computer.\n",
+        "\n",
+        "**Note 3**:\n",
+        "Besides the installation function, we also provide a `stanza.download_corenlp_models()` function to help you download additional CoreNLP models for different languages that are not shipped with the default installation. Check out our [automated installation page](https://stanfordnlp.github.io/stanza/client_setup.html#automated-installation) for more information on how to use it."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
         "id": "xJsuO6D8D05q",
         "colab_type": "text"
       },
@@ -149,7 +190,7 @@
         "# Import client module\n",
         "from stanza.server import CoreNLPClient"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -163,6 +204,8 @@
         "\n",
         "Additionally, the client constructor accepts a `memory` argument, which specifies how much memory will be allocated to the background Java process. An `endpoint` option can be used to specify a port number used by the communication between the server and the client. The default port is 9000. However, since this port is pre-occupied by a system process in Colab, we'll manually set it to 9001 in the following example.\n",
         "\n",
+        "Also, here we manually set `be_quiet=True` to avoid an I/O issue in Colab notebooks. You should be able to use `be_quiet=False` on your own computer, which will print detailed logging information from CoreNLP during usage.\n",
+        "\n",
         "For more options in constructing the clients, please refer to the [CoreNLP Client Options List](https://stanfordnlp.github.io/stanza/corenlp_client.html#corenlp-client-options)."
       ]
     },
@@ -175,7 +218,11 @@
       },
       "source": [
         "# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001\n",
-        "client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001')\n",
+        "client = CoreNLPClient(\n",
+        "    annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+        "    memory='4G', \n",
+        "    endpoint='http://localhost:9001',\n",
+        "    be_quiet=True)\n",
         "print(client)\n",
         "\n",
         "# Start the background server and wait for some time\n",
@@ -183,7 +230,7 @@
         "client.start()\n",
         "import time; time.sleep(10)"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -193,7 +240,7 @@
         "colab_type": "text"
       },
       "source": [
-        "Now if you print the background processes, you should be able to find the Java CoreNLP server running."
+        "After the above code block finishes executing, if you print the background processes, you should be able to find the Java CoreNLP server running."
       ]
     },
     {
@@ -205,9 +252,10 @@
       },
       "source": [
         "# Print background processes and look for java\n",
+        "# You should be able to see a StanfordCoreNLPServer java process running in the background\n",
         "!ps -o pid,cmd | grep java"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -237,7 +285,7 @@
         "document = client.annotate(text)\n",
         "print(type(document))"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -271,7 +319,7 @@
         "        print(\"{:12s}\\t{:12s}\\t{:6s}\\t{}\".format(t.word, t.lemma, t.pos, t.ner))\n",
         "    print(\"\")"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -299,7 +347,7 @@
         "    for m in sent.mentions:\n",
         "        print(\"{:30s}\\t{}\".format(m.entityMentionText, m.entityType))"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -326,7 +374,7 @@
         "# Print annotations of a mention\n",
         "print(document.sentence[0].mentions[0])"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -365,7 +413,7 @@
         "time.sleep(10)\n",
         "!ps -o pid,cmd | grep java"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -403,7 +451,8 @@
       },
       "source": [
         "print(\"Starting a server with the Python \\\"with\\\" statement...\")\n",
-        "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001') as client:\n",
+        "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+        "        memory='4G', endpoint='http://localhost:9001', be_quiet=True) as client:\n",
        "    text = \"Albert Einstein was a German-born theoretical physicist.\"\n",
         "    document = client.annotate(text)\n",
         "\n",
@@ -414,7 +463,7 @@
         "\n",
         "print(\"\\nThe server should be stopped upon exit from the \\\"with\\\" statement.\")"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
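The client hunks above pin the server endpoint to port 9001 because the default port 9000 is pre-occupied by a system process in Colab. As a hypothetical companion to that change (this helper is not part of the commit or of Stanza), one can probe whether a port is already taken before constructing the client, falling back to the next free one:

```python
import socket

def port_in_use(port, host="localhost"):
    # connect_ex returns 0 only when something is already listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

# Start from 9001 as the tutorial does, and fall back to the next free port
port = 9001
while port_in_use(port):
    port += 1
endpoint = "http://localhost:%d" % port
print(endpoint)
```

The resulting `endpoint` string can then be passed to the `CoreNLPClient` constructor in place of the hard-coded `'http://localhost:9001'`.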