github.com/zabbix/zabbix.git
Diffstat (limited to 'templates/app/hadoop_http/README.md')
-rw-r--r-- templates/app/hadoop_http/README.md | 8
1 files changed, 4 insertions, 4 deletions
diff --git a/templates/app/hadoop_http/README.md b/templates/app/hadoop_http/README.md
index dd52627bf8e..638eefc55d2 100644
--- a/templates/app/hadoop_http/README.md
+++ b/templates/app/hadoop_http/README.md
@@ -115,22 +115,22 @@ There are no template links in this template.
|----|-----------|----|----|----|
|ResourceManager: Service is unavailable |<p>-</p> |`last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0` |AVERAGE |<p>Manual close: YES</p> |
|ResourceManager: Service response time is too high |<p>-</p> |`min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN}` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- ResourceManager: Service is unavailable</p> |
-|ResourceManager: Service has been restarted |<p>Uptime is less than 10 minutes</p> |`last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m` |INFO |<p>Manual close: YES</p> |
+|ResourceManager: Service has been restarted |<p>Uptime is less than 10 minutes.</p> |`last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m` |INFO |<p>Manual close: YES</p> |
|ResourceManager: Failed to fetch ResourceManager API page |<p>Zabbix has not received data for items for the last 30 minutes.</p> |`nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- ResourceManager: Service is unavailable</p> |
|ResourceManager: Cluster has no active NodeManagers |<p>Cluster is unable to execute any jobs without at least one NodeManager.</p> |`max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0` |HIGH | |
|ResourceManager: Cluster has unhealthy NodeManagers |<p>YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthyNodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved.</p> |`min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0` |AVERAGE | |
|NameNode: Service is unavailable |<p>-</p> |`last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0` |AVERAGE |<p>Manual close: YES</p> |
|NameNode: Service response time is too high |<p>-</p> |`min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN}` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- NameNode: Service is unavailable</p> |
-|NameNode: Service has been restarted |<p>Uptime is less than 10 minutes</p> |`last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m` |INFO |<p>Manual close: YES</p> |
+|NameNode: Service has been restarted |<p>Uptime is less than 10 minutes.</p> |`last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m` |INFO |<p>Manual close: YES</p> |
|NameNode: Failed to fetch NameNode API page |<p>Zabbix has not received data for items for the last 30 minutes.</p> |`nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- NameNode: Service is unavailable</p> |
|NameNode: Cluster capacity remaining is low |<p>A good practice is to ensure that disk use never exceeds 80 percent capacity.</p> |`max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN}` |WARNING | |
|NameNode: Cluster has missing blocks |<p>A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica.</p> |`min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0` |AVERAGE | |
|NameNode: Cluster has volume failures |<p>HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice.</p> |`min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0` |AVERAGE | |
|NameNode: Cluster has DataNodes in Dead state |<p>The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes.</p> |`min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0` |AVERAGE | |
-|{#HOSTNAME}: Service has been restarted |<p>Uptime is less than 10 minutes</p> |`last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m` |INFO |<p>Manual close: YES</p> |
+|{#HOSTNAME}: Service has been restarted |<p>Uptime is less than 10 minutes.</p> |`last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m` |INFO |<p>Manual close: YES</p> |
|{#HOSTNAME}: Failed to fetch NodeManager API page |<p>Zabbix has not received data for items for the last 30 minutes.</p> |`nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- {#HOSTNAME}: NodeManager has state {ITEM.VALUE}.</p> |
|{#HOSTNAME}: NodeManager has state {ITEM.VALUE}. |<p>The state is different from normal.</p> |`last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING"` |AVERAGE | |
-|{#HOSTNAME}: Service has been restarted |<p>Uptime is less than 10 minutes</p> |`last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m` |INFO |<p>Manual close: YES</p> |
+|{#HOSTNAME}: Service has been restarted |<p>Uptime is less than 10 minutes.</p> |`last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m` |INFO |<p>Manual close: YES</p> |
|{#HOSTNAME}: Failed to fetch DataNode API page |<p>Zabbix has not received data for items for the last 30 minutes.</p> |`nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1` |WARNING |<p>Manual close: YES</p><p>**Depends on**:</p><p>- {#HOSTNAME}: DataNode has state {ITEM.VALUE}.</p> |
|{#HOSTNAME}: DataNode has state {ITEM.VALUE}. |<p>The state is different from normal.</p> |`last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live"` |AVERAGE | |
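
The trigger expressions in the table touched by this hunk rely on three patterns: `min(...,15m)>0`, `max(...,5m)=0`, and `last(...)<10m`. The Python below is a rough sketch of what each pattern means for a window of collected samples. It is not Zabbix code: the function names are hypothetical, the sample lists stand in for item history, and an empty window simply returns `False` here, whereas a real trigger would handle missing data differently.

```python
from typing import Sequence


def min_over_window_exceeds(samples: Sequence[float], threshold: float = 0) -> bool:
    """Rough analogue of `min(/host/key,15m)>threshold`.

    True only if every sample in the window is above the threshold,
    so a single healthy reading keeps the trigger from firing.
    """
    return len(samples) > 0 and min(samples) > threshold


def max_over_window_is_zero(samples: Sequence[float]) -> bool:
    """Rough analogue of `max(/host/key,5m)=0`.

    For a non-negative counter this is true only if every sample
    in the window is zero.
    """
    return len(samples) > 0 and max(samples) == 0


def last_below(samples: Sequence[float], limit: float) -> bool:
    """Rough analogue of `last(/host/key)<limit`.

    Only the most recent sample is considered.
    """
    return len(samples) > 0 and samples[-1] < limit


if __name__ == "__main__":
    # Hypothetical history for a missing-blocks counter over 15 minutes.
    missing_blocks = [0, 2, 3]
    print(min_over_window_exceeds(missing_blocks))

    # Hypothetical uptime readings in seconds; 10m is 600 seconds.
    uptime = [3600, 120]
    print(last_below(uptime, 600))
```

Using `min` over a window rather than `last` means a single transient non-zero reading does not raise a problem, which matches the 15-minute windows used for the missing-block and volume-failure triggers above.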