Host infrastructure monitoring importer

Host infrastructure monitoring is the continuous process of collecting, analyzing, and acting on data from an organization's IT infrastructure. It safeguards the health, performance, and availability of the servers, virtual machines, containers, and other components that power an organization's applications and services. By providing real-time visibility into the state of this infrastructure, it enables IT teams to identify and resolve issues early, optimize performance, and plan for future capacity needs.

At its core, host infrastructure monitoring tracks key performance indicators (KPIs) and metrics across several layers of the IT stack: the physical hardware, the operating system, and the network connections. The primary goal is to move IT management from a reactive to a proactive stance, so that potential problems are addressed before they escalate and affect end users.

Configuration

You can find the configuration option in Versio.io at Environment settings > OneImporter > Configurations > Host infrastructure monitoring.

This module does not require a configuration. Activating it in the OneImporter is sufficient to start importing data. You can create an additional configuration to restrict the import or to override defaults.

Figure: Host infrastructure monitoring configuration

The following metrics are monitored depending on the specific operating system:

| ID | Name | Unit | Description |
| --- | --- | --- | --- |
| agent:host.cpu.usage | CPU usage | percent | Overall CPU usage as a percentage of total available capacity, including user and system time. |
| agent:host.cpu.idle | CPU idle | percent | Percentage of time the CPU is idle and not executing any tasks (excluding I/O wait). |
| agent:host.cpu.other | CPU other | percent | CPU time not categorized as user, system, idle, or iowait; may include interrupt handling or guest time. |
| agent:host.cpu.steal | CPU steal | percent | Time the virtual CPU waits for a real CPU while the hypervisor is servicing other virtual machines. |
| agent:host.cpu.iowait | CPU iowait | percent | Percentage of time the CPU is idle while waiting for I/O operations (e.g., disk or network) to complete. |
| agent:host.cpu.user | CPU user | percent | Time spent on user processes (non-kernel code), including applications and services. |
| agent:host.cpu.system | CPU system | percent | Time spent executing system (kernel) processes and handling system calls. |
| agent:host.cpu.interrupt | CPU interrupt | percent | Time spent handling hardware interrupts. |
| agent:host.cpu.load | CPU load | number | The system load, representing the number of processes running or waiting for CPU time. |
| agent:host.cpu.load5 | CPU load 5 | number | The average system load over the last 5 minutes. |
| agent:host.cpu.load15 | CPU load 15 | number | The average system load over the last 15 minutes. |
| agent:host.memory.total | Memory total | byte | The total amount of installed physical memory (RAM). |
| agent:host.memory.used | Memory used | byte | The amount of memory currently in use by processes, the kernel, and caches. |
| agent:host.memory.reclaimable | Memory reclaimable | byte | The portion of memory occupied by the operating system that can be released when needed, e.g., file caches and buffers. |
| agent:host.memory.kernel | Memory kernel | byte | The portion of system memory reserved for the operating system kernel, device drivers, and core system services. |
| agent:host.memory.swap.total | Swap total | byte | The total size of the configured swap space on disk. Swapping is the process by which the operating system temporarily moves inactive data from fast RAM to a slower disk to free memory for active applications when RAM is full. |
| agent:host.memory.swap.used | Swap used | byte | The amount of swap space currently in use. |
| agent:host.memory.pgfault | Page faults | numberRate | The number of page faults per unit of time. A page fault occurs when a process accesses data that is not in physical memory. |
| agent:host.disk.total | Disk total | byte | The total storage capacity of the filesystem or drive. |
| agent:host.disk.used | Disk used | byte | The amount of used storage space on the filesystem or drive. |
| agent:host.network.traffic.in | Traffic in | bitRate | The rate of incoming network traffic in bits per unit of time. |
| agent:host.network.traffic.out | Traffic out | bitRate | The rate of outgoing network traffic in bits per unit of time. |
| agent:host.network.packets.in | Packets in | numberRate | The number of network packets received per unit of time. |
| agent:host.network.packets.out | Packets out | numberRate | The number of network packets sent per unit of time. |
| agent:host.network.errors.in | Errors in | numberRate | The number of errors on received network packets per unit of time. |
| agent:host.network.errors.out | Errors out | numberRate | The number of errors on sent network packets per unit of time. |
| agent:host.network.drops.in | Dropped in | numberRate | The number of incoming network packets dropped per unit of time. |
| agent:host.network.drops.out | Dropped out | numberRate | The number of outgoing network packets dropped per unit of time. |
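
To illustrate how percentage and rate metrics such as those listed above are typically derived, the following minimal Python sketch computes them from two snapshots of cumulative counters. It is not the OneImporter implementation. The function names `cpu_percentages` and `bit_rate`, the counter keys, and the rule that "usage" equals everything except idle and iowait are assumptions made for this example only.

```python
from typing import Dict


def cpu_percentages(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, float]:
    """Turn two snapshots of cumulative CPU time counters into percentages.

    Both snapshots map a category name (e.g. "user", "idle") to a
    monotonically increasing tick counter. The keys are hypothetical and
    chosen for this example.
    """
    # Time spent in each category between the two snapshots.
    deltas = {name: curr[name] - prev[name] for name in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed (or a counter reset): report zeros instead of dividing by zero.
        return {name: 0.0 for name in deltas}

    shares = {name: 100.0 * delta / total for name, delta in deltas.items()}
    # Assumption for this sketch: overall usage is everything that is
    # neither idle nor waiting for I/O.
    shares["usage"] = 100.0 - shares.get("idle", 0.0) - shares.get("iowait", 0.0)
    return shares


def bit_rate(prev_bytes: int, curr_bytes: int, seconds: float) -> float:
    """Convert a cumulative byte counter into bits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr_bytes - prev_bytes) * 8 / seconds


if __name__ == "__main__":
    # Two made-up snapshots taken some time apart.
    before = {"user": 100, "system": 50, "idle": 800, "iowait": 50}
    after = {"user": 150, "system": 70, "idle": 920, "iowait": 60}
    print(cpu_percentages(before, after))
    print(bit_rate(1_000, 26_000, 10.0))
```

Sampling counters at two points in time and dividing the difference by the elapsed interval is a common way to obtain "per unit of time" values; the interval and counter source used in practice depend on the operating system.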

Example

The following figures show an example of a monitored Linux host:

Figure: Monitored CPU metrics

Figure: Monitored RAM metrics

Figure: Monitored disk metrics

Figure: Monitored NIC metrics