Through hours of testing, I have found that the nextcloud desktop sync client for ubuntu 20.04 (appimage or ppa) both seem to have a bug to where... if a common nextcloud file sync error occurs , kswapd0 spikes to 100% of CPU and the swapfile on Debian 10.5 server becomes completely filled. (clamscan also spikes 45% to 100% during kswapd0's climb to 100% of CPU). My other sync clients do not cause this problem (mobile, ubuntu native "online accounts") .
top command output
top - 16:08:59 up 22 min, 2 users, load average: 89.42, 84.04, 55.66
Tasks: 378 total, 12 running, 359 sleeping, 0 stopped, 7 zombie
%Cpu(s): 3.4 us, 57.0 sy, 0.0 ni, 0.1 id, 39.5 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3946.8 total, 90.2 free, 3766.4 used, 90.1 buff/cache
MiB Swap: 6144.0 total, 0.0 free, 6144.0 used. 4.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
36 root 30 10 0 0 0 R 98.3 0.0 12:43.68 kswapd0
1691 mysql 20 0 1739540 2376 0 S 3.9 0.1 0:34.59 mysqld
1300 root 10 -10 116752 3400 0 D 3.3 0.1 0:41.96 AliYunDun
1544 root 20 0 806108 640 0 D 2.4 0.0 0:09.45 aliyun-service
161 root 20 0 4556 1904 1844 S 0.9 0.0 0:10.60 plymouthd
2746 git 20 0 1374728 6020 0 S 0.7 0.1 0:07.23 gitea
1114 root 20 0 24312 284 0 S 0.5 0.0 0:03.74 AliYunDunUpdate
5805 web2 20 0 292472 215456 920 D 0.4 5.3 0:05.43 clamscan
155 root 0 -20 0 0 0 I 0.3 0.0 0:07.11 kworker/0:1H-kbl+
232 root 20 0 70888 284 88 D 0.3 0.0 0:03.74 systemd-journal
936 memcache 20 0 408168 0 0 S 0.3 0.0 0:02.19 memcached
3492 root 20 0 11380 756 556 R 0.3 0.0 0:03.28 top
1 root 20 0 170192 2972 0 D 0.3 0.1 0:11.03 systemd
1041 redis 20 0 54244 428 0 D 0.3 0.0 0:03.28 redis-server
4029 www-data 20 0 339376 2436 16 D 0.3 0.1 0:00.85 /usr/sbin/apach
I have tried using nice and cpulimit to prevent kswapd0 from reaching 100% and completely consuming the swap memory.. but kswapd0 seems to just power through both commands whether run individually or simultaneously and consumes 100% of CPU and swap, leaving me no choice but to reboot the server in order to clear the swap cache.
I have already reduced swapiness to zero. And I have tried:
To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches
As I figure nextcloud file sync errors will be a common thing in the future, might someone be able to suggest how I can mitigate / prevent a simple file sync error from taking down my entire server?
UPDATE
After some additional testing and reading.. it seems that ClamAV is running clamscan on every upload and email which is spiking CPU usage to 100%. The relation to nextcloud is that I have anitvirus for files activated. Therefore, my file sync uploads also start clamscan as well, then overload the server.
The solution seems to be stop using clamscan but instead implement clamav-daemon. I am researching the problem now, but if someone can tell me how to switch from clamscan to clamav-daemon. I would appreciate it.