Performance Analysis
The performance of VAIOS was evaluated using a comprehensive benchmark suite executed on 5 March 2026. The benchmarks were designed to assess the behavior of key subsystems including task scheduling, inter-task communication, memory management, floating-point computation, and DMA-based data transfer. All tests were conducted on an STM32F401RE (Cortex-M4) platform with a 1 ms system tick.
The benchmark suite covered a total of 14 test cases spanning multiple subsystems, all of which completed successfully without errors, timeouts, or failures. The results demonstrate stable operation under both isolated and concurrent workloads.
Experimental Platform
The benchmarks were conducted on a Nucleo-F401RE development board. The configuration details are summarized in Table 6.1.
| Field | Value |
|---|---|
| Board | STM32F401RE (Nucleo-64) |
| CPU | Cortex-M4 @ 84 MHz |
| FPU | Enabled (hard-float ABI) |
| DMA | Enabled (UART logging) |
| SysTick | 1 ms period |
| Build | Release (-O2) |
Benchmark Summary
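The suite executed on 5 March 2026 covers five subsystems (floating-point computation, DMA transfer, task scheduling, inter-task communication, and memory management) plus a combined stress test, all run on the STM32F401RE at 84 MHz. Each row below reports the pass/fail status, the measured duration, and either a throughput figure or a verification outcome.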
| Benchmark | Status | Duration | Throughput |
|---|---|---|---|
| FPU: sinf throughput | PASS | 4 ms | 250,000 ops/s |
| FPU: sqrtf throughput | PASS | 3 ms | 333,333 ops/s |
| FPU: multi-task context-save | PASS | 4 ms | 250,000 ops/s |
| DMA: Memory-to-Memory (Looped) | PASS | 20 ms | 2,000 KB/s |
| DMA: Concurrent Streams | PASS | 21 ms | 1,904 KB/s |
| TASK: Context switch rate | PASS | 212 ms | 3,773 sw/s |
| TASK: Priority preemption | PASS | 65 ms | Verified |
| TASK: Delay accuracy | PASS | 101 ms | 1% Error |
| IPC: Semaphore ping-pong | PASS | 31 ms | 32,258 trips/s |
| IPC: Mutex shared counter | PASS | 69 ms | Verified |
| MEM: Alloc/free throughput | PASS | 97 ms | 6,185 ops/s |
| MEM: Fragmentation resilience | PASS | — | Coalesced |
| STRESS: All subsystems concurrent | PASS | 8,004 ms | Verified |
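As a consistency check on the table, multiplying each reported throughput by its duration recovers the amount of work each benchmark performed. The sketch below does this for the rows that report a rate. The work counts it prints are inferred from the table, not taken from the benchmark source, and the shortened row labels and the helper name `implied_work` are illustrative only.

```python
# (duration in ms, reported throughput per second), copied from the table above.
ROWS = {
    "FPU sinf (ops)": (4, 250_000),
    "FPU sqrtf (ops)": (3, 333_333),
    "DMA mem-to-mem (KB)": (20, 2_000),
    "DMA concurrent (KB)": (21, 1_904),
    "TASK context switch (sw)": (212, 3_773),
    "IPC semaphore (trips)": (31, 32_258),
    "MEM alloc/free (ops)": (97, 6_185),
}


def implied_work(duration_ms: int, rate_per_s: int) -> int:
    """Work units implied by a rate sustained for duration_ms, rounded to nearest."""
    return round(rate_per_s * duration_ms / 1000)


for name, (ms, rate) in ROWS.items():
    print(f"{name:26s} {implied_work(ms, rate):>6d}")
```

Every row lands on a round figure (1,000 operations or trips, 40 KB, 800 switches, 600 allocations), which suggests the throughput column was computed from a fixed iteration count and a duration reported in whole milliseconds. If so, the 3 ms and 4 ms FPU rows are the ones most affected by timing granularity.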
Task Scheduling: The scheduler achieved a context switch rate of approximately 3,773 switches per second in the release build at 84 MHz. This corresponds to roughly 22,300 clock cycles elapsed per switch (84,000,000 / 3,773), a figure that includes PendSV overhead and scheduler logic. Delay accuracy showed a 1% error (101 ms measured for a 100 ms requested delay). The 1 ms overshoot equals one system tick, which is the resolution limit of a tick-based delay.
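The arithmetic behind the cycles-per-switch and delay-error figures can be reproduced with a short host-side sketch. The constants come from the platform and benchmark tables; the function names are illustrative and not part of VAIOS.

```python
CPU_HZ = 84_000_000      # core clock from the platform table
SWITCHES_PER_S = 3_773   # reported context switch rate


def cycles_per_switch(cpu_hz: int, switches_per_s: int) -> float:
    """Clock cycles that elapse, on average, between two context switches."""
    return cpu_hz / switches_per_s


def delay_error_pct(requested_ms: float, measured_ms: float) -> float:
    """Relative overshoot of a delay, as a percentage of the requested time."""
    return 100.0 * (measured_ms - requested_ms) / requested_ms


print(round(cycles_per_switch(CPU_HZ, SWITCHES_PER_S)))  # 22263
print(delay_error_pct(100, 101))                         # 1.0
```

The cycle count is an elapsed-time figure: it covers everything that happens between two switches in the benchmark loop, so it is best read as an upper bound on the cost of the switch itself.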
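Computational Performance: The hardware FPU performs efficiently: sqrtf reached 333,333 ops/s against 250,000 ops/s for sinf, about 33% higher. This is consistent with sqrtf mapping to the FPU's single-precision square-root instruction (VSQRT.F32), whereas sinf has no hardware equivalent and is evaluated in software. Because the two durations (3 ms and 4 ms) differ by a single millisecond, the ratio is only coarsely resolved. The multi-tasking test verified FPU context preservation, with lazy stacking ensuring that only tasks that actually use the FPU incur the register save/restore overhead.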
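The sqrtf-to-sinf comparison, and the per-call cycle budget each rate implies, can be worked out as follows. This is a sketch using only the table values and the 84 MHz clock; the helper names are hypothetical.

```python
CPU_HZ = 84_000_000        # core clock from the platform table
SINF_OPS_PER_S = 250_000   # reported sinf throughput
SQRTF_OPS_PER_S = 333_333  # reported sqrtf throughput


def speedup_pct(fast: float, slow: float) -> float:
    """How much higher `fast` is than `slow`, as a percentage of `slow`."""
    return 100.0 * (fast - slow) / slow


def cycles_per_op(cpu_hz: int, ops_per_s: int) -> float:
    """Average clock cycles available per operation at the given rate."""
    return cpu_hz / ops_per_s


print(round(speedup_pct(SQRTF_OPS_PER_S, SINF_OPS_PER_S), 1))  # 33.3
print(round(cycles_per_op(CPU_HZ, SINF_OPS_PER_S)))            # 336
print(round(cycles_per_op(CPU_HZ, SQRTF_OPS_PER_S)))           # 252
```

The per-call figures include loop and call overhead as well as the timing granularity of the measurement, so they bound the cost of each function from above and should not be read as instruction latencies.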
Subsystem Reliability: A 3,000 ms high-contention stress test concurrently executed 200 DMA transfers, 6,902 semaphore trips, and numerous memory operations. No allocation failures and no data corruption were observed, which supports kernel stability under the sustained load expected in autonomous flight scenarios. Reliability fixes made during benchmarking include a bounded spin-wait in the logging subsystem and a revised DMA flag-clearing sequence that prevents spurious interrupts.
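To put the stress-test counts in context, they can be converted to rates and compared with the isolated semaphore benchmark. The sketch assumes the 3,000 ms stated above is the window over which the counts were accumulated; the function name is illustrative.

```python
STRESS_WINDOW_MS = 3_000           # stress window stated in the text (assumed load period)
SEM_TRIPS = 6_902                  # semaphore trips during the stress test
DMA_TRANSFERS = 200                # DMA transfers during the stress test
ISOLATED_TRIPS_PER_S = 32_258      # semaphore ping-pong rate measured in isolation


def rate_per_s(count: int, window_ms: int) -> float:
    """Average events per second over a window given in milliseconds."""
    return count * 1000 / window_ms


sem_rate = rate_per_s(SEM_TRIPS, STRESS_WINDOW_MS)
dma_rate = rate_per_s(DMA_TRANSFERS, STRESS_WINDOW_MS)

print(round(sem_rate))                                   # 2301
print(round(dma_rate, 1))                                # 66.7
print(round(100 * sem_rate / ISOLATED_TRIPS_PER_S, 1))   # 7.1
```

Under that assumption the semaphore path ran at roughly 7% of its isolated rate while sharing the CPU with the other workloads, which is the kind of slowdown expected when all subsystems contend for the same core.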