It happens, hopefully not that often, but it can't always be avoided: virtual machines do crash sometimes, and that isn't exactly a pleasure for sysadmins. So what can be done to at least get them back up automatically?
The following script starts every existing VM that has the option “Start at boot” set to yes but is currently stopped. It is run periodically by a systemd timer, and the interval can be changed in watchdog_vm.timer. Exclusions are also possible if needed; here I implemented them with the tags feature. If a VM has the “Start at boot” box ticked but carries one of the example tags “test” or “fuss”, it is skipped. Keep in mind that the watchdog will also restart a VM you shut down on purpose, unless it carries one of these tags. Of course you can easily adapt the .sh file to your needs, as it is only meant as inspiration.
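The script below checks the tags with bash's =~ operator, which matches substrings, so a tag such as “testing” would exclude a VM as well. If you'd rather compare whole tags, here is a minimal sketch of an exact-match helper. It assumes that qm config prints all tags on a single “tags:” line separated by semicolons (for example “tags: fuss;prod”); the function name is_excluded and the EXCLUDE_TAGS array are hypothetical names chosen for this example.

```shell
#!/bin/bash
# Sketch: exact-match tag exclusion (hypothetical helper, not part of watchdog.sh).
# Assumption: the tag string comes from a "tags: ..." line and uses ';' as
# separator; commas and spaces are tolerated as separators too.
EXCLUDE_TAGS=(test fuss)   # hypothetical list, adjust to your own tags

# Returns 0 (true) if any tag in $1 is exactly equal to one of EXCLUDE_TAGS.
is_excluded() {
  local tag excluded
  local -a tags
  # Split the raw tag string on ';', ',' and spaces into an array.
  IFS=';, ' read -r -a tags <<< "$1"
  for tag in "${tags[@]}"; do
    for excluded in "${EXCLUDE_TAGS[@]}"; do
      # Quoted right-hand side: literal comparison, no pattern matching.
      [[ "$tag" == "$excluded" ]] && return 0
    done
  done
  return 1
}

is_excluded "fuss;prod" && echo "skip"    # prints: skip
is_excluded "testing" || echo "watch"     # prints: watch
```

In the watchdog loop, the two =~ checks could then be replaced by a single ! is_excluded "$tags".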
watchdog.sh
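#!/bin/bash
# Start every VM that has "Start at boot" (onboot: 1) set but is currently stopped.
# VMs carrying one of the exclusion tags ("test", "fuss") are skipped.
for vmid in $(qm list | awk 'NR > 1 {print $1}'); do
    # Read the config once; anchor the patterns so only the real keys match
    config=$(qm config "$vmid")
    onboot=$(awk '/^onboot:/ {print $2}' <<< "$config")
    tags=$(awk '/^tags:/ {sub(/^tags: */, ""); print}' <<< "$config")
    # Note: =~ is a substring match, so e.g. a "testing" tag excludes the VM too
    if [[ "$onboot" == "1" && ! "$tags" =~ test && ! "$tags" =~ fuss ]]; then
        status=$(qm status "$vmid" | awk '{print $2}')
        if [[ "$status" == "stopped" ]]; then
            if qm start "$vmid"; then
                echo "Started VM $vmid"
            else
                echo "Failed to start VM $vmid"
            fi
        elif [[ "$status" == "running" ]]; then
            echo "VM $vmid already running"
        else
            echo "Unexpected status '$status' for VM $vmid, leaving it alone"
        fi
    fi
done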
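To try the logic without touching real VMs, you can shadow qm with a shell function. This is a minimal dry-run sketch: the canned output is hypothetical and only contains the fields the script parses, and run_watchdog is a hypothetical wrapper so the output can be checked.

```shell
#!/bin/bash
# Dry run: replace the real `qm` with a stub returning hand-written output.
qm() {
  case "$1" in
    list)
      printf '%s\n' "VMID NAME STATUS" "101 web stopped" "102 db running" "103 lab stopped"
      ;;
    config)
      echo "onboot: 1"
      if [[ "$2" == "103" ]]; then echo "tags: fuss"; fi
      ;;
    status)
      if [[ "$2" == "102" ]]; then echo "status: running"; else echo "status: stopped"; fi
      ;;
    start)
      return 0   # pretend the start succeeded
      ;;
  esac
}

# Same checks as watchdog.sh, wrapped in a function.
run_watchdog() {
  local vmid config onboot tags status
  for vmid in $(qm list | awk 'NR > 1 {print $1}'); do
    config=$(qm config "$vmid")
    onboot=$(awk '/^onboot:/ {print $2}' <<< "$config")
    tags=$(awk '/^tags:/ {sub(/^tags: */, ""); print}' <<< "$config")
    if [[ "$onboot" == "1" && ! "$tags" =~ test && ! "$tags" =~ fuss ]]; then
      status=$(qm status "$vmid" | awk '{print $2}')
      if [[ "$status" == "stopped" ]]; then
        qm start "$vmid" && echo "Started VM $vmid"
      elif [[ "$status" == "running" ]]; then
        echo "VM $vmid already running"
      fi
    fi
  done
}

run_watchdog
# Expected output:
# Started VM 101
# VM 102 already running
```

VM 103 is skipped because of its “fuss” tag, even though it is stopped and has onboot set.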
watchdog_vm.service
[Unit]
Description=Check the status of every VM with the 'onboot' parameter set and start it if it is not running
After=network.target
[Service]
Type=oneshot
ExecStart=/root/watchdog.sh
[Install]
WantedBy=multi-user.target
watchdog_vm.timer
[Unit]
Description=Timer for watchdog_vm.service
[Timer]
OnBootSec=15min
OnUnitActiveSec=2min
Unit=watchdog_vm.service
[Install]
WantedBy=timers.target
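To get a feeling for what these two values mean, here is a rough sketch of when the watchdog runs. It simplifies on purpose: it ignores the script's own runtime and systemd's AccuracySec slack (1min by default), both of which shift the runs slightly.

```shell
#!/bin/bash
# Approximate run times for OnBootSec=15min and OnUnitActiveSec=2min.
boot_delay=15   # OnBootSec, in minutes
interval=2      # OnUnitActiveSec, in minutes

# Minutes after boot of the first five watchdog runs.
runs=()
for ((i = 0; i < 5; i++)); do
  runs+=("$((boot_delay + i * interval))")
done
echo "runs at minute: ${runs[*]}"   # prints: runs at minute: 15 17 19 21 23

# A VM that crashes right after a check stays down until the next run,
# so the interval is roughly the worst-case downtime (plus the slack).
echo "worst-case downtime: ~${interval} min"
```

A VM that crashes within the first 15 minutes after boot is only picked up at the first run; shortening OnUnitActiveSec reduces the downtime at the cost of more frequent qm calls.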
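Save both unit files in /etc/systemd/system/, make the script executable, then reload systemd and enable and start the timer:
chmod +x /root/watchdog.sh
systemctl daemon-reload
systemctl enable watchdog_vm.timer
systemctl start watchdog_vm.timer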