templates/db/gridgain_jmx/README.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177


# GridGain by JMX

## Overview

For Zabbix version: 6.2 and higher.
Official JMX Template for GridGain In-Memory Computing Platform.
This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.


This template was tested on:

- GridGain, version 8.8.5

## Setup

> See [Zabbix template operation](https://www.zabbix.com/documentation/6.2/manual/config/templates_out_of_the_box/jmx) for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

1. Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for [instructions](https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html). Current JMX tree hierarchy contains classloader by default. Add the following jvm option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false`to will exclude one level with Classloader name. You can configure Cache and Data Region metrics which you want using [official guide](https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/configuring-metrics).
2. Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

## Zabbix configuration

No specific Zabbix configuration is required.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} |<p>The maximum percent of checkpoint buffer utilization for high trigger expression.</p> |`80` |
|{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} |<p>The maximum percent of checkpoint buffer utilization for warning trigger expression.</p> |`66` |
|{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} |<p>The maximum percent of data region utilization for high trigger expression.</p> |`90` |
|{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} |<p>The maximum percent of data region utilization for warning trigger expression.</p> |`80` |
|{$GRIDGAIN.JOBS.QUEUE.MAX.WARN} |<p>The maximum number of queued jobs for trigger expression.</p> |`10` |
|{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} |<p>Filter of discoverable cache groups.</p> |`.*` |
|{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} |<p>Filter to exclude discovered cache groups.</p> |`CHANGE_IF_NEEDED` |
|{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} |<p>Filter of discoverable data regions.</p> |`.*` |
|{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} |<p>Filter to exclude discovered data regions.</p> |`^(sysMemPlc|TxLog)$` |
|{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} |<p>Filter of discoverable thread pools.</p> |`.*` |
|{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} |<p>Filter to exclude discovered thread pools.</p> |`^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$` |
|{$GRIDGAIN.PASSWORD} |<p>-</p> |`<secret>` |
|{$GRIDGAIN.PME.DURATION.MAX.HIGH} |<p>The maximum PME duration in ms for high trigger expression.</p> |`60000` |
|{$GRIDGAIN.PME.DURATION.MAX.WARN} |<p>The maximum PME duration in ms for warning trigger expression.</p> |`10000` |
|{$GRIDGAIN.THREAD.QUEUE.MAX.WARN} |<p>Threshold for thread pool queue size. Can be used with thread pool name as context.</p> |`1000` |
|{$GRIDGAIN.THREADS.COUNT.MAX.WARN} |<p>The maximum number of running threads for trigger expression.</p> |`1000` |
|{$GRIDGAIN.USER} |<p>-</p> |`zabbix` |

## Template links

There are no template links in this template.

## Discovery rules

|Name|Description|Type|Key and additional info|
|----|-----------|----|----|
|Cache groups |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p><p>**Filter**:</p>AND <p>- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}`</p><p>- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`</p> |
|Cache metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p><p>**Filter**:</p>AND <p>- {#JMXGROUP} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}`</p><p>- {#JMXGROUP} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`</p> |
|Cluster metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |
|Data region metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p><p>**Filter**:</p>AND <p>- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}`</p><p>- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}`</p> |
|GridGain kernal metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |
|Local node metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |
|TCP Communication SPI metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |
|TCP discovery SPI |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |
|Thread pool metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p><p>**Filter**:</p>AND <p>- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}`</p><p>- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}`</p> |
|Transaction metrics |<p>-</p> |JMX |jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]<p>**Preprocessing**:</p><p>- JAVASCRIPT: `The text is too long. Please see the template.`</p> |

## Items collected

|Group|Name|Description|Type|Key and additional info|
|-----|----|-----------|----|---------------------|
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime |<p>Uptime of GridGain instance.</p> |JMX |jmx["{#JMXOBJ}",UpTime]<p>**Preprocessing**:</p><p>- MULTIPLIER: `0.001`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Version |<p>Version of GridGain instance.</p> |JMX |jmx["{#JMXOBJ}",FullVersion]<p>**Preprocessing**:</p><p>- REGEX: `(.*)-\d+ \1`</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID |<p>Unique identifier for this node within grid.</p> |JMX |jmx["{#JMXOBJ}",LocalNodeId]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline |<p>Total baseline nodes that are registered in the baseline topology.</p> |JMX |jmx["{#JMXOBJ}",TotalBaselineNodes]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline |<p>The number of nodes that are currently active in the baseline topology.</p> |JMX |jmx["{#JMXOBJ}",ActiveBaselineNodes]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client |<p>The number of client nodes in the cluster.</p> |JMX |jmx["{#JMXOBJ}",TotalClientNodes]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total |<p>Total number of nodes.</p> |JMX |jmx["{#JMXOBJ}",TotalNodes]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server |<p>The number of server nodes in the cluster.</p> |JMX |jmx["{#JMXOBJ}",TotalServerNodes]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current |<p>Number of cancelled jobs that are still running.</p> |JMX |jmx["{#JMXOBJ}",CurrentCancelledJobs] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current |<p>Number of jobs rejected after more recent collision resolution operation.</p> |JMX |jmx["{#JMXOBJ}",CurrentRejectedJobs] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current |<p>Number of queued jobs currently waiting to be executed.</p> |JMX |jmx["{#JMXOBJ}",CurrentWaitingJobs] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current |<p>Number of currently active jobs concurrently executing on the node.</p> |JMX |jmx["{#JMXOBJ}",CurrentActiveJobs] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate |<p>Total number of jobs handled by the node per second.</p> |JMX |jmx["{#JMXOBJ}",TotalExecutedJobs]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate |<p>Total number of jobs cancelled by the node per second.</p> |JMX |jmx["{#JMXOBJ}",TotalCancelledJobs]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate |<p>Total number of jobs this node rejects during collision resolution operations since node startup per second.</p> |JMX |jmx["{#JMXOBJ}",TotalRejectedJobs]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current |<p>Current PME duration in milliseconds.</p> |JMX |jmx["{#JMXOBJ}",CurrentPmeDuration] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current |<p>Current number of live threads.</p> |JMX |jmx["{#JMXOBJ}",CurrentThreadCount] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used |<p>Current heap size that is used for object allocation.</p> |JMX |jmx["{#JMXOBJ}",HeapMemoryUsed] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator |<p>Current coordinator UUID.</p> |JMX |jmx["{#JMXOBJ}",Coordinator]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left |<p>Nodes left count.</p> |JMX |jmx["{#JMXOBJ}",NodesLeft] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined |<p>Nodes join count.</p> |JMX |jmx["{#JMXOBJ}",NodesJoined] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed |<p>Nodes failed count.</p> |JMX |jmx["{#JMXOBJ}",NodesFailed] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue |<p>Message worker queue current size.</p> |JMX |jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate |<p>Number of times node tries to (re)establish connection to another node per second.</p> |JMX |jmx["{#JMXOBJ}",ReconnectCount]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages |<p>The number of messages received per second.</p> |JMX |jmx["{#JMXOBJ}",TotalProcessedMessages]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate |<p>The number of messages processed per second.</p> |JMX |jmx["{#JMXOBJ}",TotalReceivedMessages]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue |<p>Outbound messages queue size.</p> |JMX |jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate |<p>The number of  messages received per second.</p> |JMX |jmx["{#JMXOBJ}",ReceivedMessagesCount]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate |<p>The number of  messages sent per second.</p> |JMX |jmx["{#JMXOBJ}",SentMessagesCount]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate |<p>Gets maximum number of reconnect attempts used when establishing connection with remote nodes per second.</p> |JMX |jmx["{#JMXOBJ}",ReconnectCount, "maxNumbers"]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys |<p>The number of keys locked on the node.</p> |JMX |jmx["{#JMXOBJ}",LockedKeysNumber] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current |<p>The number of active transactions for which this node is the initiator.</p> |JMX |jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current |<p>The number of active transactions holding at least one key lock.</p> |JMX |jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate |<p>The number of transactions which were rollback per second.</p> |JMX |jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate |<p>The number of transactions which were committed per second.</p> |JMX |jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
|GridGain |Cache group [{#JMXGROUP}]: Cache gets, rate |<p>The number of gets to the cache per second.</p> |JMX |jmx["{#JMXOBJ}",CacheGets]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Cache group [{#JMXGROUP}]: Cache puts, rate |<p>The number of puts to the cache per second.</p> |JMX |jmx["{#JMXOBJ}",CachePuts]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Cache group [{#JMXGROUP}]: Cache removals, rate |<p>The number of removals from the cache per second.</p> |JMX |jmx["{#JMXOBJ}",CacheRemovals]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Cache group [{#JMXGROUP}]: Cache hits, pct |<p>Percentage of successful hits.</p> |JMX |jmx["{#JMXOBJ}",CacheHitPercentage] |
|GridGain |Cache group [{#JMXGROUP}]: Cache misses, pct |<p>Percentage of accesses that failed to find anything.</p> |JMX |jmx["{#JMXOBJ}",CacheMissPercentage] |
|GridGain |Cache group [{#JMXGROUP}]: Cache transaction commits, rate |<p>The number of transaction commits per second.</p> |JMX |jmx["{#JMXOBJ}",CacheTxCommits]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate |<p>The number of transaction rollback per second.</p> |JMX |jmx["{#JMXOBJ}",CacheTxRollbacks]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Cache group [{#JMXGROUP}]: Cache size |<p>The number of non-null values in the cache as a long value.</p> |JMX |jmx["{#JMXOBJ}",CacheSize] |
|GridGain |Cache group [{#JMXGROUP}]: Cache heap entries |<p>The number of entries in heap memory.</p> |JMX |jmx["{#JMXOBJ}",HeapEntriesCount]<p>**Preprocessing**:</p><p>- CHANGE_PER_SECOND</p> |
|GridGain |Data region {#JMXNAME}: Allocation, rate |<p>Allocation rate (pages per second) averaged across rateTimeInternal.</p> |JMX |jmx["{#JMXOBJ}",AllocationRate] |
|GridGain |Data region {#JMXNAME}: Allocated, bytes |<p>Total size of memory allocated in bytes.</p> |JMX |jmx["{#JMXOBJ}",TotalAllocatedSize] |
|GridGain |Data region {#JMXNAME}: Dirty pages |<p>Number of pages in memory not yet synchronized with persistent storage.</p> |JMX |jmx["{#JMXOBJ}",DirtyPages] |
|GridGain |Data region {#JMXNAME}: Eviction, rate |<p>Eviction rate (pages per second).</p> |JMX |jmx["{#JMXOBJ}",EvictionRate] |
|GridGain |Data region {#JMXNAME}: Size, max |<p>Maximum memory region size defined by its data region.</p> |JMX |jmx["{#JMXOBJ}",MaxSize] |
|GridGain |Data region {#JMXNAME}: Offheap size |<p>Offheap size in bytes.</p> |JMX |jmx["{#JMXOBJ}",OffHeapSize] |
|GridGain |Data region {#JMXNAME}: Offheap used size |<p>Total used offheap size in bytes.</p> |JMX |jmx["{#JMXOBJ}",OffheapUsedSize] |
|GridGain |Data region {#JMXNAME}: Pages fill factor |<p>The percentage of the used space.</p> |JMX |jmx["{#JMXOBJ}",PagesFillFactor] |
|GridGain |Data region {#JMXNAME}: Pages replace, rate |<p>Rate at which pages in memory are replaced with pages from persistent storage (pages per second).</p> |JMX |jmx["{#JMXOBJ}",PagesReplaceRate] |
|GridGain |Data region {#JMXNAME}: Used checkpoint buffer size |<p>Used checkpoint buffer size in bytes.</p> |JMX |jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
|GridGain |Data region {#JMXNAME}: Checkpoint buffer size |<p>Total size in bytes for checkpoint buffer.</p> |JMX |jmx["{#JMXOBJ}",CheckpointBufferSize] |
|GridGain |Cache group [{#JMXNAME}]: Backups |<p>Count of backups configured for cache group.</p> |JMX |jmx["{#JMXOBJ}",Backups] |
|GridGain |Cache group [{#JMXNAME}]: Partitions |<p>Count of partitions for cache group.</p> |JMX |jmx["{#JMXOBJ}",Partitions] |
|GridGain |Cache group [{#JMXNAME}]: Caches |<p>List of caches.</p> |JMX |jmx["{#JMXOBJ}",Caches]<p>**Preprocessing**:</p><p>- DISCARD_UNCHANGED_HEARTBEAT: `3h`</p> |
|GridGain |Cache group [{#JMXNAME}]: Local node partitions, moving |<p>Count of partitions with state MOVING for this cache group located on this node.</p> |JMX |jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
|GridGain |Cache group [{#JMXNAME}]: Local node partitions, renting |<p>Count of partitions with state RENTING for this cache group located on this node.</p> |JMX |jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
|GridGain |Cache group [{#JMXNAME}]: Local node entries, renting |<p>Count of entries remains to evict in RENTING partitions located on this node for this cache group.</p> |JMX |jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
|GridGain |Cache group [{#JMXNAME}]: Local node partitions, owning |<p>Count of partitions with state OWNING for this cache group located on this node.</p> |JMX |jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
|GridGain |Cache group [{#JMXNAME}]: Partition copies, min |<p>Minimum number of partition copies for all partitions of this cache group.</p> |JMX |jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
|GridGain |Cache group [{#JMXNAME}]: Partition copies, max |<p>Maximum number of partition copies for all partitions of this cache group.</p> |JMX |jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
|GridGain |Thread pool [{#JMXNAME}]: Queue size |<p>Current size of the execution queue.</p> |JMX |jmx["{#JMXOBJ}",QueueSize] |
|GridGain |Thread pool [{#JMXNAME}]: Pool size |<p>Current number of threads in the pool.</p> |JMX |jmx["{#JMXOBJ}",PoolSize] |
|GridGain |Thread pool [{#JMXNAME}]: Pool size, max |<p>The maximum allowed number of threads.</p> |JMX |jmx["{#JMXOBJ}",MaximumPoolSize] |
|GridGain |Thread pool [{#JMXNAME}]: Pool size, core |<p>The core number of threads.</p> |JMX |jmx["{#JMXOBJ}",CorePoolSize] |

## Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----|----|----|
|GridGain [{#JMXIGNITEINSTANCENAME}]: Host has been restarted |<p>Uptime is less than 10 minutes.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m` |INFO |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data |<p>Zabbix has not received data for items for the last 10 minutes.</p> |`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1` |WARNING |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed |<p>The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Perform Ack to close.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0` |INFO |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology |<p>One or more server node left the topology. Ack to close.</p> |`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0` |WARNING |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology |<p>One or more server node added to the topology. Ack to close.</p> |`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0` |INFO |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology |<p>One or more server node left the topology. Ack to close.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])` |INFO |<p>Manual close: YES</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high |<p>Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}` |WARNING | |
|GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |<p>PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}` |WARNING |<p>**Depends on**:</p><p>- GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |<p>PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}` |HIGH | |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high |<p>Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}` |WARNING |<p>**Depends on**:</p><p>- GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long</p> |
|GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed |<p>The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Perform Ack to close.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0` |WARNING |<p>Manual close: YES</p> |
|Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m |<p>-</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0` |AVERAGE | |
|Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m |<p>-</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)` |WARNING |<p>**Depends on**:</p><p>- Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m</p> |
|Cache group [{#JMXGROUP}]: All entries are in heap |<p>All entries are in heap. Possibly you use eager queries it may cause out of memory exceptions for big caches. Ack to close.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])` |INFO |<p>Manual close: YES</p> |
|Data region {#JMXNAME}: Node started to evict pages |<p>You store more data than region can accommodate. Data started to move to disk it can make requests work slower. Ack to close.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0` |INFO |<p>Manual close: YES</p> |
|Data region {#JMXNAME}: Data region utilization is too high |<p>Data region utilization is high. Increase data region size or delete any data.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}` |WARNING |<p>**Depends on**:</p><p>- Data region {#JMXNAME}: Data region utilization is too high</p> |
|Data region {#JMXNAME}: Data region utilization is too high |<p>Data region utilization is high. Increase data region size or delete any data.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}` |HIGH | |
|Data region {#JMXNAME}: Pages replace rate more than 0 |<p>There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0` |WARNING | |
|Data region {#JMXNAME}: Checkpoint buffer utilization is too high |<p>Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}` |WARNING |<p>**Depends on**:</p><p>- Data region {#JMXNAME}: Checkpoint buffer utilization is too high</p> |
|Data region {#JMXNAME}: Checkpoint buffer utilization is too high |<p>Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}` |HIGH | |
|Cache group [{#JMXNAME}]: One or more backups are unavailable |<p>-</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)` |WARNING | |
|Cache group [{#JMXNAME}]: List of caches has changed |<p>List of caches has changed. Significant changes have occurred in the cluster. Ack to close.</p> |`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0` |INFO |<p>Manual close: YES</p> |
|Cache group [{#JMXNAME}]: Rebalance in progress |<p>Ack to close.</p> |`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0` |INFO |<p>Manual close: YES</p> |
|Cache group [{#JMXNAME}]: There is no copy for partitions |<p>-</p> |`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0` |WARNING | |
|Thread pool [{#JMXNAME}]: Too many messages in queue |<p>Number of messages in queue more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.</p> |`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}` |AVERAGE | |

## Feedback

Please report any issues with the template at https://support.zabbix.com.

You can also provide feedback, discuss the template, or ask for help at [ZABBIX forums](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback/).