# GridGain by JMX

## Overview

For Zabbix version: 6.2 and higher

Official JMX Template for GridGain In-Memory Computing Platform.
This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

This template was tested on:

- GridGain, version 8.8.5

## Setup

> See [Zabbix template operation](https://www.zabbix.com/documentation/6.2/manual/config/templates_out_of_the_box/jmx) for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

1. Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for [instructions](https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html). The current JMX tree hierarchy contains the classloader by default. Add the JVM option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false` to exclude the level with the classloader name. You can choose which Cache and Data Region metrics to enable by following the [official guide](https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/configuring-metrics).
2. Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

## Zabbix configuration

No specific Zabbix configuration is required.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} |The maximum percent of checkpoint buffer utilization for the high trigger expression. |`80` |
|{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} |The maximum percent of checkpoint buffer utilization for the warning trigger expression. |`66` |
|{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} |The maximum percent of data region utilization for the high trigger expression. |`90` |
|{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} |The maximum percent of data region utilization for the warning trigger expression. |`80` |
|{$GRIDGAIN.JOBS.QUEUE.MAX.WARN} |The maximum number of queued jobs for the trigger expression. |`10` |
|{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} |Filter of discoverable cache groups. |`.*` |
|{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} |Filter to exclude discovered cache groups. |`CHANGE_IF_NEEDED` |
|{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} |Filter of discoverable data regions. |`.*` |
|{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} |Filter to exclude discovered data regions. |`^(sysMemPlc|TxLog)$` |
|{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} |Filter of discoverable thread pools. |`.*` |
|{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} |Filter to exclude discovered thread pools. |`^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$` |
|{$GRIDGAIN.PASSWORD} |- |`` |
|{$GRIDGAIN.PME.DURATION.MAX.HIGH} |The maximum PME duration in ms for the high trigger expression. |`60000` |
|{$GRIDGAIN.PME.DURATION.MAX.WARN} |The maximum PME duration in ms for the warning trigger expression. |`10000` |
|{$GRIDGAIN.THREAD.QUEUE.MAX.WARN} |Threshold for thread pool queue size. Can be used with thread pool name as context. |`1000` |
|{$GRIDGAIN.THREADS.COUNT.MAX.WARN} |The maximum number of running threads for the trigger expression. |`1000` |
|{$GRIDGAIN.USER} |- |`zabbix` |

## Template links

There are no template links in this template.

## Discovery rules

|Name|Description|Type|Key and additional info|
|----|-----------|----|----|
|Cache groups |

-

|JMX |jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

**Filter**:

AND

- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}`

- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`

| |Cache metrics |

-

|JMX |jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

**Filter**:

AND

- {#JMXGROUP} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}`

- {#JMXGROUP} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`

| |Cluster metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

| |Data region metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

**Filter**:

AND

- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}`

- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}`

| |GridGain kernal metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

| |Local node metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

| |TCP Communication SPI metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

| |TCP discovery SPI |

-

|JMX |jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

| |Thread pool metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

**Filter**:

AND

- {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}`

- {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}`

| |Transaction metrics |

-

|JMX |jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

**Preprocessing**:

- JAVASCRIPT: `The text is too long. Please see the template.`

|

## Items collected

|Group|Name|Description|Type|Key and additional info|
|-----|----|-----------|----|---------------------|
|GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime |

Uptime of GridGain instance.

|JMX |jmx["{#JMXOBJ}",UpTime]

**Preprocessing**:

- MULTIPLIER: `0.001`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Version |

Version of GridGain instance.

|JMX |jmx["{#JMXOBJ}",FullVersion]

**Preprocessing**:

- REGEX: `(.*)-\d+ \1`

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID |

Unique identifier for this node within grid.

|JMX |jmx["{#JMXOBJ}",LocalNodeId]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline |

Total baseline nodes that are registered in the baseline topology.

|JMX |jmx["{#JMXOBJ}",TotalBaselineNodes]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline |

The number of nodes that are currently active in the baseline topology.

|JMX |jmx["{#JMXOBJ}",ActiveBaselineNodes]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client |

The number of client nodes in the cluster.

|JMX |jmx["{#JMXOBJ}",TotalClientNodes]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total |

Total number of nodes.

|JMX |jmx["{#JMXOBJ}",TotalNodes]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server |

The number of server nodes in the cluster.

|JMX |jmx["{#JMXOBJ}",TotalServerNodes]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current |

Number of cancelled jobs that are still running.

|JMX |jmx["{#JMXOBJ}",CurrentCancelledJobs] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current |

Number of jobs rejected after the most recent collision resolution operation.

|JMX |jmx["{#JMXOBJ}",CurrentRejectedJobs] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current |

Number of queued jobs currently waiting to be executed.

|JMX |jmx["{#JMXOBJ}",CurrentWaitingJobs] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current |

Number of currently active jobs concurrently executing on the node.

|JMX |jmx["{#JMXOBJ}",CurrentActiveJobs] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate |

Total number of jobs handled by the node per second.

|JMX |jmx["{#JMXOBJ}",TotalExecutedJobs]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate |

Total number of jobs cancelled by the node per second.

|JMX |jmx["{#JMXOBJ}",TotalCancelledJobs]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate |

Number of jobs per second that this node rejects during collision resolution operations.

|JMX |jmx["{#JMXOBJ}",TotalRejectedJobs]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current |

Current PME duration in milliseconds.

|JMX |jmx["{#JMXOBJ}",CurrentPmeDuration] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current |

Current number of live threads.

|JMX |jmx["{#JMXOBJ}",CurrentThreadCount] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used |

Current heap size that is used for object allocation.

|JMX |jmx["{#JMXOBJ}",HeapMemoryUsed] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator |

Current coordinator UUID.

|JMX |jmx["{#JMXOBJ}",Coordinator]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left |

Nodes left count.

|JMX |jmx["{#JMXOBJ}",NodesLeft] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined |

Nodes joined count.

|JMX |jmx["{#JMXOBJ}",NodesJoined] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed |

Nodes failed count.

|JMX |jmx["{#JMXOBJ}",NodesFailed] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue |

Message worker queue current size.

|JMX |jmx["{#JMXOBJ}",MessageWorkerQueueSize] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate |

Number of times per second the node tries to (re)establish a connection to another node.

|JMX |jmx["{#JMXOBJ}",ReconnectCount]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages |

The number of messages processed per second.

|JMX |jmx["{#JMXOBJ}",TotalProcessedMessages]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate |

The number of messages received per second.

|JMX |jmx["{#JMXOBJ}",TotalReceivedMessages]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue |

Outbound messages queue size.

|JMX |jmx["{#JMXOBJ}",OutboundMessagesQueueSize] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate |

The number of messages received per second.

|JMX |jmx["{#JMXOBJ}",ReceivedMessagesCount]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate |

The number of messages sent per second.

|JMX |jmx["{#JMXOBJ}",SentMessagesCount]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate |

Maximum number of reconnect attempts used when establishing a connection with remote nodes, as a per-second rate of change.

|JMX |jmx["{#JMXOBJ}",ReconnectCount]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys |

The number of keys locked on the node.

|JMX |jmx["{#JMXOBJ}",LockedKeysNumber] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current |

The number of active transactions for which this node is the initiator.

|JMX |jmx["{#JMXOBJ}",OwnerTransactionsNumber] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current |

The number of active transactions holding at least one key lock.

|JMX |jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate |

The number of transactions rolled back per second.

|JMX |jmx["{#JMXOBJ}",TransactionsRolledBackNumber] | |GridGain |GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate |

The number of transactions which were committed per second.

|JMX |jmx["{#JMXOBJ}",TransactionsCommittedNumber] | |GridGain |Cache group [{#JMXGROUP}]: Cache gets, rate |

The number of gets to the cache per second.

|JMX |jmx["{#JMXOBJ}",CacheGets]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Cache group [{#JMXGROUP}]: Cache puts, rate |

The number of puts to the cache per second.

|JMX |jmx["{#JMXOBJ}",CachePuts]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Cache group [{#JMXGROUP}]: Cache removals, rate |

The number of removals from the cache per second.

|JMX |jmx["{#JMXOBJ}",CacheRemovals]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Cache group [{#JMXGROUP}]: Cache hits, pct |

Percentage of successful hits.

|JMX |jmx["{#JMXOBJ}",CacheHitPercentage] | |GridGain |Cache group [{#JMXGROUP}]: Cache misses, pct |

Percentage of accesses that failed to find anything.

|JMX |jmx["{#JMXOBJ}",CacheMissPercentage] | |GridGain |Cache group [{#JMXGROUP}]: Cache transaction commits, rate |

The number of transaction commits per second.

|JMX |jmx["{#JMXOBJ}",CacheTxCommits]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate |

The number of transaction rollbacks per second.

|JMX |jmx["{#JMXOBJ}",CacheTxRollbacks]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Cache group [{#JMXGROUP}]: Cache size |

The number of non-null values in the cache as a long value.

|JMX |jmx["{#JMXOBJ}",CacheSize] | |GridGain |Cache group [{#JMXGROUP}]: Cache heap entries |

The number of entries in heap memory.

|JMX |jmx["{#JMXOBJ}",HeapEntriesCount]

**Preprocessing**:

- CHANGE_PER_SECOND

| |GridGain |Data region {#JMXNAME}: Allocation, rate |

Allocation rate (pages per second) averaged across rateTimeInterval.

|JMX |jmx["{#JMXOBJ}",AllocationRate] | |GridGain |Data region {#JMXNAME}: Allocated, bytes |

Total size of memory allocated in bytes.

|JMX |jmx["{#JMXOBJ}",TotalAllocatedSize] | |GridGain |Data region {#JMXNAME}: Dirty pages |

Number of pages in memory not yet synchronized with persistent storage.

|JMX |jmx["{#JMXOBJ}",DirtyPages] | |GridGain |Data region {#JMXNAME}: Eviction, rate |

Eviction rate (pages per second).

|JMX |jmx["{#JMXOBJ}",EvictionRate] | |GridGain |Data region {#JMXNAME}: Size, max |

Maximum memory region size defined by its data region.

|JMX |jmx["{#JMXOBJ}",MaxSize] | |GridGain |Data region {#JMXNAME}: Offheap size |

Offheap size in bytes.

|JMX |jmx["{#JMXOBJ}",OffHeapSize] | |GridGain |Data region {#JMXNAME}: Offheap used size |

Total used offheap size in bytes.

|JMX |jmx["{#JMXOBJ}",OffheapUsedSize] | |GridGain |Data region {#JMXNAME}: Pages fill factor |

The percentage of the used space.

|JMX |jmx["{#JMXOBJ}",PagesFillFactor] | |GridGain |Data region {#JMXNAME}: Pages replace, rate |

Rate at which pages in memory are replaced with pages from persistent storage (pages per second).

|JMX |jmx["{#JMXOBJ}",PagesReplaceRate] | |GridGain |Data region {#JMXNAME}: Used checkpoint buffer size |

Used checkpoint buffer size in bytes.

|JMX |jmx["{#JMXOBJ}",UsedCheckpointBufferSize] | |GridGain |Data region {#JMXNAME}: Checkpoint buffer size |

Total size in bytes for checkpoint buffer.

|JMX |jmx["{#JMXOBJ}",CheckpointBufferSize] | |GridGain |Cache group [{#JMXNAME}]: Backups |

Count of backups configured for cache group.

|JMX |jmx["{#JMXOBJ}",Backups] | |GridGain |Cache group [{#JMXNAME}]: Partitions |

Count of partitions for cache group.

|JMX |jmx["{#JMXOBJ}",Partitions] | |GridGain |Cache group [{#JMXNAME}]: Caches |

List of caches.

|JMX |jmx["{#JMXOBJ}",Caches]

**Preprocessing**:

- DISCARD_UNCHANGED_HEARTBEAT: `3h`

| |GridGain |Cache group [{#JMXNAME}]: Local node partitions, moving |

Count of partitions with state MOVING for this cache group located on this node.

|JMX |jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] | |GridGain |Cache group [{#JMXNAME}]: Local node partitions, renting |

Count of partitions with state RENTING for this cache group located on this node.

|JMX |jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] | |GridGain |Cache group [{#JMXNAME}]: Local node entries, renting |

Count of entries remaining to evict in RENTING partitions located on this node for this cache group.

|JMX |jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] | |GridGain |Cache group [{#JMXNAME}]: Local node partitions, owning |

Count of partitions with state OWNING for this cache group located on this node.

|JMX |jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] | |GridGain |Cache group [{#JMXNAME}]: Partition copies, min |

Minimum number of partition copies for all partitions of this cache group.

|JMX |jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] | |GridGain |Cache group [{#JMXNAME}]: Partition copies, max |

Maximum number of partition copies for all partitions of this cache group.

|JMX |jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] | |GridGain |Thread pool [{#JMXNAME}]: Queue size |

Current size of the execution queue.

|JMX |jmx["{#JMXOBJ}",QueueSize] | |GridGain |Thread pool [{#JMXNAME}]: Pool size |

Current number of threads in the pool.

|JMX |jmx["{#JMXOBJ}",PoolSize] | |GridGain |Thread pool [{#JMXNAME}]: Pool size, max |

The maximum allowed number of threads.

|JMX |jmx["{#JMXOBJ}",MaximumPoolSize] | |GridGain |Thread pool [{#JMXNAME}]: Pool size, core |

The core number of threads.

|JMX |jmx["{#JMXOBJ}",CorePoolSize] |

## Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----|----|----|
|GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted |

Uptime is less than 10 minutes.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m` |INFO |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data |

Zabbix has not received data for items for the last 10 minutes.

|`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1` |WARNING |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed |

GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0` |INFO |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology |

One or more server nodes have left the topology. Ack to close.

|`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0` |WARNING |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology |

One or more server nodes have been added to the topology. Ack to close.

|`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0` |INFO |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology |

One or more server nodes are not in the baseline topology. Ack to close.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])` |INFO |

Manual close: YES

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high |

Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}` |WARNING | | |GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |

PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}` |WARNING |

**Depends on**:

- GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

| |GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |

PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}` |HIGH | | |GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high |

Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}` |WARNING |

**Depends on**:

- GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

| |GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed |

GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0` |WARNING |

Manual close: YES

| |Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m |

-

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0` |AVERAGE | | |Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m |

-

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)` |WARNING |

**Depends on**:

- Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m

| |Cache group [{#JMXGROUP}]: All entries are in heap |

All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Ack to close.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])` |INFO |

Manual close: YES

| |Data region {#JMXNAME}: Node started to evict pages |

You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0` |INFO |

Manual close: YES

| |Data region {#JMXNAME}: Data region utilization is too high |

Data region utilization is high. Increase the data region size or delete some data.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}` |WARNING |

**Depends on**:

- Data region {#JMXNAME}: Data region utilization is too high

| |Data region {#JMXNAME}: Data region utilization is too high |

Data region utilization is high. Increase the data region size or delete some data.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}` |HIGH | | |Data region {#JMXNAME}: Pages replace rate more than 0 |

There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0` |WARNING | | |Data region {#JMXNAME}: Checkpoint buffer utilization is too high |

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}` |WARNING |

**Depends on**:

- Data region {#JMXNAME}: Checkpoint buffer utilization is too high

| |Data region {#JMXNAME}: Checkpoint buffer utilization is too high |

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}` |HIGH | | |Cache group [{#JMXNAME}]: One or more backups are unavailable |

-

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)` |WARNING | | |Cache group [{#JMXNAME}]: List of caches has changed |

List of caches has changed. Significant changes have occurred in the cluster. Ack to close.

|`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0` |INFO |

Manual close: YES

| |Cache group [{#JMXNAME}]: Rebalance in progress |

Ack to close.

|`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0` |INFO |

Manual close: YES

| |Cache group [{#JMXNAME}]: There is no copy for partitions |

-

|`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0` |WARNING | | |Thread pool [{#JMXNAME}]: Too many messages in queue |

Number of messages in queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.

|`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}` |AVERAGE | |

## Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at [ZABBIX forums](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback/).
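
To make the discovery filter macros and the utilization triggers listed above more concrete, the following Python sketch mimics the two checks outside of Zabbix. It is only an illustration under assumptions: the helper names `passes_lld_filter` and `utilization_exceeds` are hypothetical, the sample values are invented, and Python's `re.search` stands in for the regular expression matching that Zabbix performs, which may differ for less common patterns.

```python
import re

# Default filter values copied from the "Macros used" table.
DATA_REGION_MATCHES = r".*"
DATA_REGION_NOT_MATCHES = r"^(sysMemPlc|TxLog)$"


def passes_lld_filter(name, matches, not_matches):
    """Hypothetical helper: AND of the MATCHES_REGEX and NOT_MATCHES_REGEX conditions."""
    return (re.search(matches, name) is not None
            and re.search(not_matches, name) is None)


def utilization_exceeds(used_samples, total, threshold_pct):
    """Hypothetical helper mirroring min(used,5m)/last(total)*100 > threshold."""
    return min(used_samples) / total * 100 > threshold_pct


# Invented data region names; the two system regions are excluded by default.
regions = ["default", "sysMemPlc", "TxLog"]
kept = [r for r in regions
        if passes_lld_filter(r, DATA_REGION_MATCHES, DATA_REGION_NOT_MATCHES)]
print(kept)

# Invented samples for a 5-minute window: the lowest value is 850 of 1000 bytes.
used = [850, 900, 880]
print(utilization_exceeds(used, 1000, 80))  # checked against the WARN default of 80
print(utilization_exceeds(used, 1000, 90))  # checked against the HIGH default of 90
```

Because the trigger expressions take `min` over the window, a single spike above the threshold does not fire them; every sample in the last 5 minutes has to be above the threshold.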