author    Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
committer Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
commit    708c9358bbb9fd43d7bd4333ac621e1b35a77751 (patch)
tree      c5fde4c963943a184569303b6c2391c41f92ab1f /demo
parent    1b23d69cd6127d5b7161fd091f975fb044a2cdcf (diff)
Update CoreNLP colab tutorial to be compatible with v1.1.1
Diffstat (limited to 'demo')
-rw-r--r--  demo/Stanza_CoreNLP_Interface.ipynb | 105
1 file changed, 77 insertions(+), 28 deletions(-)
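A large share of the hunks below mechanically reset each code cell's `"execution_count": 0` to `null`, so the notebook ships without stale run counters. The commit itself contains no script for this, but as a hypothetical illustration, the same cleanup can be done programmatically on a notebook's JSON (the `clear_execution_counts` helper below is not part of Stanza):

```python
import json

def clear_execution_counts(nb):
    # Reset execution_count on every code cell; json.dumps writes None as null
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
    return nb

# A minimal notebook fragment mimicking one of the cells touched by this diff
nb = {"cells": [{"cell_type": "code", "execution_count": 0,
                 "source": ["import stanza"], "outputs": []}]}
cleared = clear_execution_counts(nb)
print(json.dumps(cleared["cells"][0]["execution_count"]))  # prints null
```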
diff --git a/demo/Stanza_CoreNLP_Interface.ipynb b/demo/Stanza_CoreNLP_Interface.ipynb
index 33d6f550..a19cffbe 100644
--- a/demo/Stanza_CoreNLP_Interface.ipynb
+++ b/demo/Stanza_CoreNLP_Interface.ipynb
@@ -70,7 +70,7 @@
         "# Import stanza\n",
         "import stanza"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -84,39 +84,80 @@
         "\n",
         "In order for the interface to work, the Stanford CoreNLP library has to be installed and a `CORENLP_HOME` environment variable has to be pointed to the installation location.\n",
         "\n",
-        "**Note**: if you are want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
-        "\n",
-        "```bash\n",
-        "export CORENLP_HOME=path_to_corenlp\n",
-        "```\n",
-        "\n",
-        "Here we instead set this variable with the Python `os` library, simply because `export` command is not well-supported in Colab notebook."
+        "Here we are going to show you how to download and install the CoreNLP library on your machine with Stanza's installation command:"
       ]
     },
     {
       "cell_type": "code",
       "metadata": {
-        "id": "O3oBy0i-6HWL",
+        "id": "MgK6-LPV-OdA",
         "colab_type": "code",
         "colab": {}
       },
       "source": [
-        "# Download the Stanford CoreNLP Java library and unzip it to a ./corenlp folder\n",
-        "!echo \"Downloading CoreNLP...\"\n",
-        "!wget \"http://nlp.stanford.edu/software/stanford-corenlp-4.1.0.zip\" -O corenlp.zip\n",
-        "!unzip corenlp.zip\n",
-        "!mv ./stanford-corenlp-4.1.0 ./corenlp\n",
+        "# Download the Stanford CoreNLP package with Stanza's installation command\n",
+        "# This'll take several minutes, depending on the network speed\n",
+        "corenlp_dir = './corenlp'\n",
+        "stanza.install_corenlp(dir=corenlp_dir)\n",
         "\n",
         "# Set the CORENLP_HOME environment variable to point to the installation location\n",
         "import os\n",
-        "os.environ[\"CORENLP_HOME\"] = \"./corenlp\""
+        "os.environ[\"CORENLP_HOME\"] = corenlp_dir"
       ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Jdq8MT-NAhKj",
+        "colab_type": "text"
+      },
+      "source": [
+        "That's all for the installation! 🎉 We can now double-check that the installation is successful by listing files in the CoreNLP directory. You should be able to see a number of `.jar` files by running the following command:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "K5eIOaJp_tuo",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "# Examine the CoreNLP installation folder to make sure the installation is successful\n",
+        "!ls $CORENLP_HOME"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
       "cell_type": "markdown",
       "metadata": {
+        "id": "S0xb9BHt__gx",
+        "colab_type": "text"
+      },
+      "source": [
+        "**Note 1**:\n",
+        "If you want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
+        "\n",
+        "```bash\n",
+        "export CORENLP_HOME=path_to_corenlp_dir\n",
+        "```\n",
+        "\n",
+        "Here we instead set this variable with the Python `os` library, simply because the `export` command is not well supported in Colab notebooks.\n",
+        "\n",
+        "\n",
+        "**Note 2**:\n",
+        "The `stanza.install_corenlp()` function is only available since Stanza v1.1.1. If you are using an earlier version of Stanza, please check out our [manual installation page](https://stanfordnlp.github.io/stanza/client_setup.html#manual-installation) for how to install CoreNLP on your computer.\n",
+        "\n",
+        "**Note 3**:\n",
+        "Besides the installation function, we also provide a `stanza.download_corenlp_models()` function to help you download additional CoreNLP models for different languages that are not shipped with the default installation. Check out our [automated installation page](https://stanfordnlp.github.io/stanza/client_setup.html#automated-installation) for more information on how to use it."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
         "id": "xJsuO6D8D05q",
         "colab_type": "text"
       },
@@ -149,7 +190,7 @@
         "# Import client module\n",
         "from stanza.server import CoreNLPClient"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -163,6 +204,8 @@
         "\n",
         "Additionally, the client constructor accepts a `memory` argument, which specifies how much memory will be allocated to the background Java process. An `endpoint` option can be used to specify a port number used by the communication between the server and the client. The default port is 9000. However, since this port is pre-occupied by a system process in Colab, we'll manually set it to 9001 in the following example.\n",
         "\n",
+        "Also, here we manually set `be_quiet=True` to avoid an I/O issue in Colab notebooks. You should be able to use `be_quiet=False` on your own computer, which will print detailed logging information from CoreNLP during usage.\n",
+        "\n",
         "For more options in constructing the clients, please refer to the [CoreNLP Client Options List](https://stanfordnlp.github.io/stanza/corenlp_client.html#corenlp-client-options)."
       ]
     },
@@ -175,7 +218,11 @@
       },
       "source": [
         "# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001\n",
-        "client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001')\n",
+        "client = CoreNLPClient(\n",
+        "    annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+        "    memory='4G', \n",
+        "    endpoint='http://localhost:9001',\n",
+        "    be_quiet=True)\n",
         "print(client)\n",
         "\n",
         "# Start the background server and wait for some time\n",
@@ -183,7 +230,7 @@
         "client.start()\n",
         "import time; time.sleep(10)"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -193,7 +240,7 @@
         "colab_type": "text"
       },
       "source": [
-        "Now if you print the background processes, you should be able to find the Java CoreNLP server running."
+        "After the above code block finishes executing, if you print the background processes, you should be able to find the Java CoreNLP server running."
       ]
     },
     {
@@ -205,9 +252,10 @@
       },
       "source": [
         "# Print background processes and look for java\n",
+        "# You should be able to see a StanfordCoreNLPServer java process running in the background\n",
         "!ps -o pid,cmd | grep java"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -237,7 +285,7 @@
         "document = client.annotate(text)\n",
         "print(type(document))"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -271,7 +319,7 @@
         "        print(\"{:12s}\\t{:12s}\\t{:6s}\\t{}\".format(t.word, t.lemma, t.pos, t.ner))\n",
         "    print(\"\")"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -299,7 +347,7 @@
         "    for m in sent.mentions:\n",
         "        print(\"{:30s}\\t{}\".format(m.entityMentionText, m.entityType))"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -326,7 +374,7 @@
         "# Print annotations of a mention\n",
         "print(document.sentence[0].mentions[0])"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -365,7 +413,7 @@
         "time.sleep(10)\n",
         "!ps -o pid,cmd | grep java"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
@@ -403,7 +451,8 @@
       },
       "source": [
         "print(\"Starting a server with the Python \\\"with\\\" statement...\")\n",
-        "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001') as client:\n",
+        "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+        "        memory='4G', endpoint='http://localhost:9001', be_quiet=True) as client:\n",
        "    text = \"Albert Einstein was a German-born theoretical physicist.\"\n",
         "    document = client.annotate(text)\n",
         "\n",
@@ -414,7 +463,7 @@
         "\n",
         "print(\"\\nThe server should be stopped upon exit from the \\\"with\\\" statement.\")"
       ],
-      "execution_count": 0,
+      "execution_count": null,
       "outputs": []
     },
     {
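The client hunks above pin the server endpoint to port 9001 because the default port 9000 is pre-occupied by a system process in Colab. As a hypothetical companion to that change (this helper is not part of the commit or of Stanza), one can probe whether a port is already taken before constructing the client, falling back to the next free one:

```python
import socket

def port_in_use(port, host="localhost"):
    # connect_ex returns 0 only when something is already listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

# Start from 9001 as the tutorial does, and fall back to the next free port
port = 9001
while port_in_use(port):
    port += 1
endpoint = "http://localhost:%d" % port
print(endpoint)
```

The resulting `endpoint` string can then be passed to the `CoreNLPClient` constructor in place of the hard-coded `'http://localhost:9001'`.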