Posted by Eltharyon 3 years ago
to keep you informed regarding server performance in preparation for the next reset day. We’ve been hard at work, and I finally have some really good news to report:
We’ve found the cause of the abominable combat performance in large fights, and last Friday we rolled out a temporary workaround that massively improved the situation over the weekend!
While we’re not happy with this workaround as a long-term solution (see below), we’re confident that fights are already significantly better and that you’ll be able to fight more smoothly than ever before. Happy zerging!
Robin “Eltharyon” Henkys
The long story
For everyone who wants to know a bit more, here are the details:
Last week we invested a lot of time in setting up our very own “large-battle-testlab” in our office. To recreate the conditions the servers experienced under stress, we simulated hundreds of players fighting each other at the same time in the same cluster, using all kinds of different spells, mounts and equipment. To our puzzlement, we simply could not reproduce the issues we’d been having on the live systems: fights with 300 simulated players ran flawlessly in the lab, while fights with far fewer players were struggling on the live servers.
To double-check our results, we ran the same simulation on a computer in the datacenter and, lo and behold, the lag problems occurred just as they did in the live game.
That began a tedious process of elimination, in which we went through the differences between the two setups one by one to identify which component was causing the slowdown.
Eventually we found it: a bug in our networking technology, triggered by the fact that we’d outfitted the live servers with multi-CPU systems (not to be confused with multi-core systems!) in preparation for the Free 2 Play launch of Albion. Under pressure, these multi-CPU systems performed far worse than a single-CPU system, so our short-term solution was simply to deactivate the additional CPUs on these machines.
Even with only a single CPU active, performance is far better than it was before. For obvious reasons, however, this is not a satisfactory long-term solution: it means that a significant portion of our server investment is now sitting idle. We’re now debugging and fixing the network code together with our technology partners, and we’re looking forward to putting these issues behind us and refocusing on future features and other performance improvements!