Host infrastructure monitoring importer
Host infrastructure monitoring is the continuous process of collecting, analyzing, and acting on data from an organization's IT infrastructure. It safeguards the health, performance, and availability of the servers, virtual machines, containers, and other components that run an organization's applications and services. Real-time visibility into the state of this infrastructure lets IT teams identify and resolve issues early, optimize performance, and plan for future capacity needs.
At its core, host infrastructure monitoring tracks key performance indicators (KPIs) and metrics across several layers of the IT stack: the physical hardware, the operating system, and the network connections. The primary goal is to move from a reactive to a proactive stance in IT management, addressing potential problems before they escalate and affect end users.
Configuration
You can find the configuration option in Versio.io at Environment settings > OneImporter > Configurations > Host infrastructure monitoring.
A configuration is not required for this module: activating the module on the OneImporter side is sufficient to start importing data. You can create an additional configuration to restrict the import or to override defaults.
Figure: Host infrastructure monitoring configuration
The following metrics are monitored depending on the specific operating system:
ID | Unit | Description |
---|---|---|
agent:host.cpu.usage | percent | Overall CPU usage as a percentage of total available capacity, including user and system time. |
agent:host.cpu.idle | percent | Percentage of time the CPU is idle and not executing any tasks (excluding I/O wait). |
agent:host.cpu.other | percent | CPU time not categorized as user, system, idle, or iowait; may include interrupt handling or guest time. |
agent:host.cpu.steal | percent | Time the virtual CPU waits for a real CPU while the hypervisor is servicing other virtual machines. |
agent:host.cpu.iowait | percent | Percentage of time the CPU is idle while waiting for I/O operations (e.g., disk or network) to complete. |
agent:host.cpu.user | percent | Time spent on user processes (non-kernel code), including applications and services. |
agent:host.cpu.system | percent | Time spent executing system (kernel) processes and handling system calls. |
agent:host.cpu.interrupt | percent | Time spent handling hardware interrupts. |
agent:host.cpu.load | number | The system load, representing the number of processes running or waiting for CPU time. |
agent:host.cpu.load5 | number | The average system load over the last 5 minutes. |
agent:host.cpu.load15 | number | The average system load over the last 15 minutes. |
agent:host.memory.total | byte | The total amount of installed physical memory (RAM). |
agent:host.memory.used | byte | The amount of memory currently in use by processes, the kernel, and caches. |
agent:host.memory.reclaimable | byte | The portion of memory occupied by the operating system that can be released when needed, e.g., file caches and buffers. |
agent:host.memory.kernel | byte | The portion of system memory reserved for the operating system kernel, device drivers, and core system services. |
agent:host.memory.swap.total | byte | The total size of the configured swap space on disk. Swapping is the process by which the operating system temporarily moves inactive data from fast RAM to slower disk storage to free memory for active applications when RAM is full. |
agent:host.memory.swap.used | byte | The amount of swap space currently in use. |
agent:host.memory.pgfault | numberRate | The number of page faults per unit of time. A page fault occurs when a process accesses data that is not in physical memory. |
agent:host.disk.total | byte | The total storage capacity of the filesystem or drive. |
agent:host.disk.used | byte | The amount of used storage space on the filesystem or drive. |
agent:host.network.traffic.in | bitRate | The rate of incoming network traffic in bits per unit of time. |
agent:host.network.traffic.out | bitRate | The rate of outgoing network traffic in bits per unit of time. |
agent:host.network.packets.in | numberRate | The number of network packets received per unit of time. |
agent:host.network.packets.out | numberRate | The number of network packets sent per unit of time. |
agent:host.network.errors.in | numberRate | The number of errors on received network packets per unit of time. |
agent:host.network.errors.out | numberRate | The number of errors on sent network packets per unit of time. |
agent:host.network.drops.in | numberRate | The number of incoming network packets dropped per unit of time. |
agent:host.network.drops.out | numberRate | The number of outgoing network packets dropped per unit of time. |
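Several of these raw values are easier to interpret as ratios. The following Python sketch shows one way to derive utilization percentages from a single set of metric values. It is an illustration only: the function names, the output keys, and the sample dictionary are hypothetical and not part of Versio.io, and the code assumes that reclaimable memory is counted within `agent:host.memory.used`, as the description of that metric suggests.

```python
from typing import Dict, Mapping


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 if whole is not positive."""
    return 100.0 * part / whole if whole > 0 else 0.0


def derive_utilization(sample: Mapping[str, float]) -> Dict[str, float]:
    """Derive utilization ratios from raw metric values keyed by metric ID.

    The keys of the returned dictionary are hypothetical names chosen
    for this sketch.
    """
    mem_total = sample["agent:host.memory.total"]
    mem_used = sample["agent:host.memory.used"]
    # Assumption: reclaimable memory (caches, buffers) is part of "used".
    reclaimable = sample.get("agent:host.memory.reclaimable", 0.0)

    return {
        "memory_used_percent": percent(mem_used, mem_total),
        # Memory that cannot simply be released by dropping caches.
        "memory_unreclaimable_percent": percent(mem_used - reclaimable, mem_total),
        "swap_used_percent": percent(
            sample.get("agent:host.memory.swap.used", 0.0),
            sample.get("agent:host.memory.swap.total", 0.0),
        ),
        "disk_used_percent": percent(
            sample["agent:host.disk.used"],
            sample["agent:host.disk.total"],
        ),
        # bitRate is in bits per unit of time; convert to megabits
        # per the same unit of time.
        "network_in_mbit": sample["agent:host.network.traffic.in"] / 1_000_000,
    }


# Hypothetical sample values for illustration only.
sample = {
    "agent:host.memory.total": 16 * 1024**3,
    "agent:host.memory.used": 8 * 1024**3,
    "agent:host.memory.reclaimable": 2 * 1024**3,
    "agent:host.memory.swap.total": 0,
    "agent:host.memory.swap.used": 0,
    "agent:host.disk.total": 100_000_000_000,
    "agent:host.disk.used": 25_000_000_000,
    "agent:host.network.traffic.in": 12_000_000,
}

result = derive_utilization(sample)
```

The zero check in `percent` keeps hosts without configured swap space from causing a division by zero; such hosts simply report 0.0 for swap utilization.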
Example
The following images show an example of a monitored Linux host:
Figure: Monitored CPU metrics
Figure: Monitored RAM metrics
Figure: Monitored disk metrics
Figure: Monitored NIC metrics