github.com/stanfordnlp/stanza.git
author    Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
committer Yuhao Zhang <zyh@stanford.edu>  2020-08-14 02:07:54 +0300
commit    708c9358bbb9fd43d7bd4333ac621e1b35a77751 (patch)
tree      c5fde4c963943a184569303b6c2391c41f92ab1f /demo
parent    1b23d69cd6127d5b7161fd091f975fb044a2cdcf (diff)
Update CoreNLP colab tutorial to be compatible with v1.1.1
Diffstat (limited to 'demo')
-rw-r--r--  demo/Stanza_CoreNLP_Interface.ipynb | 105
1 file changed, 77 insertions(+), 28 deletions(-)
diff --git a/demo/Stanza_CoreNLP_Interface.ipynb b/demo/Stanza_CoreNLP_Interface.ipynb
index 33d6f550..a19cffbe 100644
--- a/demo/Stanza_CoreNLP_Interface.ipynb
+++ b/demo/Stanza_CoreNLP_Interface.ipynb
@@ -70,7 +70,7 @@
"# Import stanza\n",
"import stanza"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -84,39 +84,80 @@
"\n",
"In order for the interface to work, the Stanford CoreNLP library has to be installed and a `CORENLP_HOME` environment variable has to be pointed to the installation location.\n",
"\n",
- "**Note**: if you are want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
- "\n",
- "```bash\n",
- "export CORENLP_HOME=path_to_corenlp\n",
- "```\n",
- "\n",
- "Here we instead set this variable with the Python `os` library, simply because `export` command is not well-supported in Colab notebook."
+ "Here we are going to show you how to download and install the CoreNLP library on your machine, with Stanza's installation command:"
]
},
{
"cell_type": "code",
"metadata": {
- "id": "O3oBy0i-6HWL",
+ "id": "MgK6-LPV-OdA",
"colab_type": "code",
"colab": {}
},
"source": [
- "# Download the Stanford CoreNLP Java library and unzip it to a ./corenlp folder\n",
- "!echo \"Downloading CoreNLP...\"\n",
- "!wget \"http://nlp.stanford.edu/software/stanford-corenlp-4.1.0.zip\" -O corenlp.zip\n",
- "!unzip corenlp.zip\n",
- "!mv ./stanford-corenlp-4.1.0 ./corenlp\n",
+ "# Download the Stanford CoreNLP package with Stanza's installation command\n",
+ "# This will take several minutes, depending on your network speed\n",
+ "corenlp_dir = './corenlp'\n",
+ "stanza.install_corenlp(dir=corenlp_dir)\n",
"\n",
"# Set the CORENLP_HOME environment variable to point to the installation location\n",
"import os\n",
- "os.environ[\"CORENLP_HOME\"] = \"./corenlp\""
+ "os.environ[\"CORENLP_HOME\"] = corenlp_dir"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Jdq8MT-NAhKj",
+ "colab_type": "text"
+ },
+ "source": [
+ "That's all for the installation! 🎉 We can now double-check that the installation was successful by listing files in the CoreNLP directory. You should be able to see a number of `.jar` files by running the following command:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "K5eIOaJp_tuo",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "# Examine the CoreNLP installation folder to make sure the installation is successful\n",
+ "!ls $CORENLP_HOME"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
+ "id": "S0xb9BHt__gx",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Note 1**:\n",
+ "If you want to use the interface in a terminal (instead of a Colab notebook), you can set the `CORENLP_HOME` environment variable with:\n",
+ "\n",
+ "```bash\n",
+ "export CORENLP_HOME=path_to_corenlp_dir\n",
+ "```\n",
+ "\n",
+ "Here we instead set this variable with the Python `os` library, simply because the `export` command is not well supported in Colab notebooks.\n",
+ "\n",
+ "\n",
+ "**Note 2**:\n",
+ "The `stanza.install_corenlp()` function is only available since Stanza v1.1.1. If you are using an earlier version of Stanza, please check out our [manual installation page](https://stanfordnlp.github.io/stanza/client_setup.html#manual-installation) for how to install CoreNLP on your computer.\n",
+ "\n",
+ "**Note 3**:\n",
+ "Besides the installation function, we also provide a `stanza.download_corenlp_models()` function to help you download additional CoreNLP models for different languages that are not shipped with the default installation. Check out our [automated installation page](https://stanfordnlp.github.io/stanza/client_setup.html#automated-installation) for more information on how to use it."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
"id": "xJsuO6D8D05q",
"colab_type": "text"
},
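The installation step above boils down to downloading CoreNLP into a directory, pointing `CORENLP_HOME` at it, and checking for the `.jar` files. A stdlib-only sketch of that flow; the `list_corenlp_jars` helper is hypothetical, not part of Stanza:

```python
import glob
import os

# Point CORENLP_HOME at the installation directory (./corenlp, the same
# directory stanza.install_corenlp(dir=...) is given in the notebook).
corenlp_dir = os.path.abspath("./corenlp")
os.environ["CORENLP_HOME"] = corenlp_dir

def list_corenlp_jars(home=None):
    """Return the sorted .jar files under the CoreNLP home directory."""
    home = home or os.environ.get("CORENLP_HOME", "")
    return sorted(glob.glob(os.path.join(home, "*.jar")))

# A successful installation should yield a non-empty list here.
print(f"{len(list_corenlp_jars())} jar files under {corenlp_dir}")
```

This mirrors the `!ls $CORENLP_HOME` check from the notebook, but works the same in a terminal, where `export CORENLP_HOME=...` is the usual alternative.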
@@ -149,7 +190,7 @@
"# Import client module\n",
"from stanza.server import CoreNLPClient"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -163,6 +204,8 @@
"\n",
"Additionally, the client constructor accepts a `memory` argument, which specifies how much memory will be allocated to the background Java process. An `endpoint` option can be used to specify a port number used by the communication between the server and the client. The default port is 9000. However, since this port is pre-occupied by a system process in Colab, we'll manually set it to 9001 in the following example.\n",
"\n",
+ "Also, here we manually set `be_quiet=True` to avoid an I/O issue in the Colab notebook. You should be able to use `be_quiet=False` on your own computer, which will print detailed logging information from CoreNLP during usage.\n",
+ "\n",
"For more options in constructing the clients, please refer to the [CoreNLP Client Options List](https://stanfordnlp.github.io/stanza/corenlp_client.html#corenlp-client-options)."
]
},
@@ -175,7 +218,11 @@
},
"source": [
"# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001\n",
- "client = CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001')\n",
+ "client = CoreNLPClient(\n",
+ " annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+ " memory='4G', \n",
+ " endpoint='http://localhost:9001',\n",
+ " be_quiet=True)\n",
"print(client)\n",
"\n",
"# Start the background server and wait for some time\n",
@@ -183,7 +230,7 @@
"client.start()\n",
"import time; time.sleep(10)"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
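The notebook waits a fixed `time.sleep(10)` after `client.start()`. As a sketch under assumptions, one could instead poll the endpoint until it answers; `wait_for_server` and its injectable `probe` hook are hypothetical helpers, not part of the Stanza API:

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url, timeout=60.0, interval=1.0, probe=None):
    """Poll `url` until it responds or `timeout` seconds elapse.

    Returns True once the server answered, False on timeout. A custom
    `probe(url) -> bool` can replace the default HTTP check (e.g. in tests).
    """
    def default_probe(u):
        try:
            urllib.request.urlopen(u, timeout=interval)
            return True
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or default_probe
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False

# Usage sketch after client.start():
#   wait_for_server("http://localhost:9001")
```

Polling returns as soon as the server is up, instead of always paying the full ten seconds or failing when startup is slower than expected.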
@@ -193,7 +240,7 @@
"colab_type": "text"
},
"source": [
- "Now if you print the background processes, you should be able to find the Java CoreNLP server running."
+ "After the above code block finishes executing, if you print the background processes, you should be able to find the Java CoreNLP server running."
]
},
{
@@ -205,9 +252,10 @@
},
"source": [
"# Print background processes and look for java\n",
+ "# You should be able to see a StanfordCoreNLPServer java process running in the background\n",
"!ps -o pid,cmd | grep java"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -237,7 +285,7 @@
"document = client.annotate(text)\n",
"print(type(document))"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -271,7 +319,7 @@
" print(\"{:12s}\\t{:12s}\\t{:6s}\\t{}\".format(t.word, t.lemma, t.pos, t.ner))\n",
" print(\"\")"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
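The token-printing loop in the hunk above relies on a fixed-width format string to line up the columns. A small illustration with plain strings standing in for the protobuf `Token` fields; the `format_token_row` helper is hypothetical:

```python
# Same column layout as the notebook's loop: word, lemma, POS tag, NER tag,
# padded to 12/12/6 characters and separated by tabs.
ROW = "{:12s}\t{:12s}\t{:6s}\t{}"

def format_token_row(word, lemma, pos, ner):
    """Format one token's annotations as a fixed-width table row."""
    return ROW.format(word, lemma, pos, ner)

print(format_token_row("Einstein", "Einstein", "NNP", "PERSON"))
print(format_token_row("was", "be", "VBD", "O"))
```

In the real notebook these values come from `sent.token` entries of the annotated protobuf `Document` (`t.word`, `t.lemma`, `t.pos`, `t.ner`).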
@@ -299,7 +347,7 @@
" for m in sent.mentions:\n",
" print(\"{:30s}\\t{}\".format(m.entityMentionText, m.entityType))"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -326,7 +374,7 @@
"# Print annotations of a mention\n",
"print(document.sentence[0].mentions[0])"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
@@ -365,7 +413,7 @@
"time.sleep(10)\n",
"!ps -o pid,cmd | grep java"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
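The `!ps -o pid,cmd | grep java` check used before and after `client.stop()` can also be done from Python. A stdlib-only stand-in, assuming a Linux-style `ps` as found in Colab; the `java_processes` helper is hypothetical:

```python
import subprocess

def java_processes():
    """Return the ps lines of running processes whose command contains 'java'."""
    out = subprocess.run(["ps", "-o", "pid,cmd"], capture_output=True, text=True)
    return [line for line in out.stdout.splitlines() if "java" in line]

# After client.stop(), the StanfordCoreNLPServer process should be gone.
print(f"{len(java_processes())} java process(es) still running")
```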
@@ -403,7 +451,8 @@
},
"source": [
"print(\"Starting a server with the Python \\\"with\\\" statement...\")\n",
- "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], memory='4G', endpoint='http://localhost:9001') as client:\n",
+ "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
+ " memory='4G', endpoint='http://localhost:9001', be_quiet=True) as client:\n",
" text = \"Albert Einstein was a German-born theoretical physicist.\"\n",
" document = client.annotate(text)\n",
"\n",
@@ -414,7 +463,7 @@
"\n",
"print(\"\\nThe server should be stopped upon exit from the \\\"with\\\" statement.\")"
],
- "execution_count": 0,
+ "execution_count": null,
"outputs": []
},
{
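The "with" form shown above is safer than calling `start()`/`stop()` by hand because a context manager's exit hook runs even when the body raises. A minimal stand-in illustrating the lifecycle; `FakeClient` is hypothetical and only mirrors the start/stop semantics, not the real `CoreNLPClient`:

```python
class FakeClient:
    """Hypothetical stand-in mirroring CoreNLPClient's start/stop lifecycle."""

    def __init__(self):
        self.running = False

    def __enter__(self):                      # corresponds to client.start()
        self.running = True
        return self

    def __exit__(self, exc_type, exc, tb):    # corresponds to client.stop()
        self.running = False
        return False                          # do not suppress exceptions

client = FakeClient()
with client:
    assert client.running                     # server is up inside the block
assert not client.running                     # ...and stopped on exit

# The server is stopped even when the body raises:
try:
    with client:
        raise RuntimeError("annotation failed")
except RuntimeError:
    pass
assert not client.running
```

This is why the notebook can promise that "the server should be stopped upon exit from the \"with\" statement" without any explicit cleanup code.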