# Ceph plugin
Provides a native Zabbix solution for monitoring Ceph clusters (a distributed storage system). It can monitor several
Ceph instances simultaneously, remote or local to the Zabbix agent.
It is best used in conjunction with the official
[Ceph template](https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/ceph_agent2).
You can extend it or create your own template to suit your specific needs.

## Requirements
* Zabbix Agent 2
* Go >= 1.13 (required only to build from source)

## Supported versions
* Ceph, version 14+

## Installation
* Configure the Ceph RESTful Module according to the [documentation](https://docs.ceph.com/en/latest/mgr/restful/) (a brief example is sketched after this list).
* Make sure a RESTful API endpoint is available for connection.
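
For reference, enabling the RESTful module on a typical cluster involves commands along the following lines (the "zabbix" user name is only an example; consult the Ceph documentation above for the authoritative steps and for retrieving the generated API key):

    ceph mgr module enable restful
    ceph restful create-self-signed-cert
    ceph restful create-key zabbix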

## Configuration
The Zabbix agent 2 configuration file is used to configure plugins.

**Plugins.Ceph.InsecureSkipVerify** — InsecureSkipVerify controls whether an HTTP client verifies the
server's certificate chain and host name. If InsecureSkipVerify is true, TLS accepts any certificate presented by 
the server and any host name in that certificate. In this mode, TLS is susceptible to man-in-the-middle attacks.  
**This should be used only for testing.**  
*Default value:* false  
*Limits:* false | true

**Plugins.Ceph.Timeout** — The maximum time in seconds to wait for a request to complete. The timeout includes
connection time, any redirects, and reading the response body.  
*Default value:* equals the global Timeout configuration parameter.  
*Limits:* 1-30

**Plugins.Ceph.KeepAlive** — The time (in seconds) to wait before closing unused connections.  
*Default value:* 300 sec.  
*Limits:* 60-900
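
A minimal sketch of the corresponding section of zabbix_agent2.conf, assuming the defaults are acceptable apart from a shorter request timeout, might look like this:

    Plugins.Ceph.Timeout=5
    Plugins.Ceph.KeepAlive=300
    Plugins.Ceph.InsecureSkipVerify=false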

### Configuring connection
A connection can be configured using either keys' parameters or named sessions.     

*Notes*:  
* It is not possible to mix configuration using named sessions and keys' parameters simultaneously.
* You can leave any connection parameter empty; a hard-coded default value will be used in that case.
* Embedded URI credentials (userinfo) are forbidden and will be ignored, so you cannot pass the credentials this way:   
  
      ceph.ping[https://user:apikey@127.0.0.1] — WRONG  
  
  The correct way is:
    
      ceph.ping[https://127.0.0.1,user,apikey]
      
* The only supported network scheme for a URI is "https".  
Examples of valid URIs:
    - https://127.0.0.1:8003
    - https://localhost
    - localhost
      
#### Using keys' parameters
The common parameters for all keys are: [ConnString][,User][,ApiKey]  
Where ConnString can be either a URI or a session name.  
ConnString will be treated as a URI if no session with the given name is found.  
If you use ConnString as a session name, just skip the rest of the connection parameters.  
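
For example, the following two items are equivalent, assuming a session named "Prod" is defined as shown in the next section (the user name and API key are placeholders):

    ceph.ping[https://192.168.1.1:8003,<UserForProd>,<ApiKeyForProd>]
    ceph.ping[Prod]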
 
#### Using named sessions
Named sessions allow you to define specific parameters for each Ceph instance. Currently, only three parameters are
supported: Uri, User and ApiKey. This is a slightly more secure way to store credentials compared to item keys or macros.  

For example, suppose you have two Ceph clusters: "Prod" and "Test".
You should add the following options to the agent configuration file:   

    Plugins.Ceph.Sessions.Prod.Uri=https://192.168.1.1:8003
    Plugins.Ceph.Sessions.Prod.User=<UserForProd>
    Plugins.Ceph.Sessions.Prod.ApiKey=<ApiKeyForProd>
        
    Plugins.Ceph.Sessions.Test.Uri=https://192.168.0.1:8003
    Plugins.Ceph.Sessions.Test.User=<UserForTest>
    Plugins.Ceph.Sessions.Test.ApiKey=<ApiKeyForTest>
        
Then you will be able to use these names as the first parameter (ConnString) in keys instead of URIs, e.g.:

    ceph.ping[Prod]
    ceph.ping[Test]
    
*Note*: session names are case-sensitive.
    
## Supported keys
**ceph.df.details[\<commonParams\>]** — Returns information about cluster’s data usage and distribution among pools.    
Uses data provided by "df detail" command.  
*Output sample:*
```json
{
    "pools": {
        "device_health_metrics": {
            "percent_used": 0,
            "objects": 0,
            "bytes_used": 0,
            "rd_ops": 0,
            "rd_bytes": 0,
            "wr_ops": 0,
            "wr_bytes": 0,
            "stored_raw": 0,
            "max_avail": 1390035968
        },
        "new_pool": {
            "percent_used": 0,
            "objects": 0,
            "bytes_used": 0,
            "rd_ops": 0,
            "rd_bytes": 0,
            "wr_ops": 0,
            "wr_bytes": 0,
            "stored_raw": 0,
            "max_avail": 695039808
        },
        "test_zabbix": {
            "percent_used": 0,
            "objects": 4,
            "bytes_used": 786432,
            "rd_ops": 0,
            "rd_bytes": 0,
            "wr_ops": 4,
            "wr_bytes": 24576,
            "stored_raw": 66618,
            "max_avail": 1390035968
        },
        "zabbix": {
            "percent_used": 0,
            "objects": 0,
            "bytes_used": 0,
            "rd_ops": 0,
            "rd_bytes": 0,
            "wr_ops": 0,
            "wr_bytes": 0,
            "stored_raw": 0,
            "max_avail": 1390035968
        }
    },
    "rd_ops": 0,
    "rd_bytes": 0,
    "wr_ops": 4,
    "wr_bytes": 24576,
    "num_pools": 4,
    "total_bytes": 12872318976,
    "total_avail_bytes": 6898843648,
    "total_used_bytes": 2752249856,
    "total_objects": 4
}
```

**ceph.osd.stats[\<commonParams\>]** — Returns aggregated and per OSD statistics.  
Uses data provided by "pg dump" command.  
*Output sample:*
```json
{
    "osd_latency_apply": {
        "min": 0,
        "max": 0,
        "avg": 0
    },
    "osd_latency_commit": {
        "min": 0,
        "max": 0,
        "avg": 0
    },
    "osd_fill": {
        "min": 47,
        "max": 47,
        "avg": 47
    },
    "osd_pgs": {
        "min": 65,
        "max": 65,
        "avg": 65
    },
    "osds": {
        "0": {
            "osd_latency_apply": 0,
            "osd_latency_commit": 0,
            "num_pgs": 65,
            "osd_fill": 47
        },
        "1": {
            "osd_latency_apply": 0,
            "osd_latency_commit": 0,
            "num_pgs": 65,
            "osd_fill": 47
        },
        "2": {
            "osd_latency_apply": 0,
            "osd_latency_commit": 0,
            "num_pgs": 65,
            "osd_fill": 47
        }
    }
}
```

**ceph.osd.discovery[\<commonParams\>]** — Returns a list of discovered OSDs in LLD format.
Can be used in conjunction with "ceph.osd.stats" and "ceph.osd.dump" in order to create "per osd" items.  
Uses data provided by "osd crush tree" command.  
*Output sample:*
```json
[
  {
    "{#OSDNAME}": "0",
    "{#CLASS}": "hdd",
    "{#HOST}": "node1"
  },
  {
    "{#OSDNAME}": "1",
    "{#CLASS}": "hdd",
    "{#HOST}": "node2"
  },
  {
    "{#OSDNAME}": "2",
    "{#CLASS}": "hdd",
    "{#HOST}": "node3"
  }
]
```

**ceph.osd.dump[\<commonParams\>]** — Returns usage thresholds and statuses of OSDs.  
Uses data provided by "osd dump" command.  
*Output sample:*
```json
{
    "osd_backfillfull_ratio": 0.9,
    "osd_full_ratio": 0.95,
    "osd_nearfull_ratio": 0.85,
    "num_pg_temp": 65,
    "osds": {
        "0": {
            "in": 1,
            "up": 1
        },
        "1": {
            "in": 1,
            "up": 1
        },
        "2": {
            "in": 1,
            "up": 1
        }
    }
}
```

**ceph.ping[\<commonParams\>]** — Tests if a connection is alive or not.
Uses data provided by "health" command.    
*Returns:*
- "1" if a connection is alive.
- "0" if a connection is broken (if there is any error presented including AUTH and configuration issues).

**ceph.pool.discovery[\<commonParams\>]** — Returns a list of discovered pools in LLD format.
Can be used in conjunction with "ceph.df.details" in order to create "per pool" items.  
Uses data provided by "osd dump" and "osd crush rule dump" commands.  
*Output sample:*
```json
[
    {
        "{#POOLNAME}": "device_health_metrics",
        "{#CRUSHRULE}": "default"
    },
    {
        "{#POOLNAME}": "test_zabbix",
        "{#CRUSHRULE}": "default"
    },
    {
        "{#POOLNAME}": "zabbix",
        "{#CRUSHRULE}": "default"
    },
    {
        "{#POOLNAME}": "new_pool",
        "{#CRUSHRULE}": "newbucket"
    }
]
```

**ceph.status[\<commonParams\>]** — Returns the overall cluster status.  
Uses data provided by "status" command.  
*Output sample:*
```json
{
    "overall_status": 2,
    "num_mon": 3,
    "num_osd": 3,
    "num_osd_in": 2,
    "num_osd_up": 1,
    "num_pg": 66,
    "pg_states": {
        "activating": 0,
        "active": 0,
        "backfill_toofull": 0,
        "backfill_unfound": 0,
        "backfill_wait": 0,
        "backfilling": 0,
        "clean": 0,
        "creating": 0,
        "deep": 0,
        "degraded": 36,
        "down": 0,
        "forced_backfill": 0,
        "forced_recovery": 0,
        "incomplete": 0,
        "inconsistent": 0,
        "laggy": 0,
        "peered": 65,
        "peering": 0,
        "recovering": 0,
        "recovery_toofull": 0,
        "recovery_unfound": 1,
        "recovery_wait": 0,
        "remapped": 0,
        "repair": 0,
        "scrubbing": 0,
        "snaptrim": 0,
        "snaptrim_error": 0,
        "snaptrim_wait": 0,
        "stale": 0,
        "undersized": 65,
        "unknown": 1,
        "wait": 0
    },
    "min_mon_release_name": "octopus"
}
```

## Troubleshooting
The plugin writes to the Zabbix agent's log. You can increase the debugging level of the Zabbix agent if you need more
details about what is happening.  
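
For example, raising the log verbosity in zabbix_agent2.conf should make the plugin's debug messages visible (DebugLevel is a standard agent parameter; 4 enables debug output):

    DebugLevel=4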

If you get the error "x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs",
you probably need to set the InsecureSkipVerify option to "true" or use a certificate that is signed by your
organization's certificate authority.
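
If you choose to skip certificate verification (for testing only, as noted above), the corresponding line in the agent configuration file would be:

    Plugins.Ceph.InsecureSkipVerify=true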