Posted by Eltharyon 2 years ago (Source)
Last weekend, after we experienced extreme performance issues during the Invasion Day fights, I promised
we would keep you informed about the progress being made on server performance in large fights.
The short version is:
- We’ve spent the week looking into the problem, running various networking tests and tweaking settings
- We’ve decided to upgrade selected hardware which may be causing issues and in general decided to invest in additional redundancy
- We’re hoping to roll out this additional hardware during tomorrow's (Friday April 26th) maintenance
While we do hope that the additional hardware will bring some relief for the weekend, we’re still convinced we haven’t found ‘the thing’ that is causing these slowdowns. We currently still believe that one of the many improvements we made in preparation for the Free To Play launch is causing large fights to perform worse than they should, and we will continue to investigate.
For those of you who are wondering why we’re not putting too much faith in additional server hardware, I’ve written up an in-depth explanation of the issues we’re facing below!
We will continue to update you on this topic until we’re convinced large fight performance is as good as it can be.
Robin ‘Eltharyon’ Henkys
What is the problem?
In short, the problem is that players involved in large fights experience severe ‘lag’. The term ‘lag’ describes all sorts of symptoms caused by communication issues between a game client and its servers. Typically it means that, while the rendering is unaffected (FPS are still high), the game becomes unresponsive to your commands, movement of your own and/or other characters becomes erratic, and often the impact of game actions (such as damage taken) appears in sudden bursts on your game client, as information about them arrives much later than intended.
What causes lag?
There are many causes for lag. When lag occurs, it means that messages being sent back and forth between your computer and our servers are not being transported and/or processed as quickly and reliably as they should be.
This can be caused by a disruption in your local network (someone using all the available bandwidth to download/stream content, for example), an interruption at your ISP, a problem somewhere in transit, or a problem within our data center.
If many players are experiencing lag at the same time, it is more likely the problem lies within our data center. When this occurs, there are generally two possible explanations: either a game server is unable to process all the messages it needs to receive and send out, or some piece of networking hardware (all the routers, switches, firewalls etc. involved) is unable to put through the bandwidth needed to send those messages back and forth.
Why does lag occur in big fights?
Big fights are a particularly challenging case in terms of game engineering. This is due to fundamental mathematics and the nature of networking.
In order for a player action to become visible on your client, the game server needs to send your computer a message about this action. In fact, every action you take, be it walking, casting a spell or chopping down a tree causes a message to be sent to the server and the server then distributes this information to all nearby players who should be able to see this action.
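The fan-out described above can be sketched in a few lines of Python. This is only an illustration and not Albion's actual network code: the `Player` class, the `broadcast_action` function and the visibility radius are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from math import dist

# Hypothetical value; the real visibility rules are not described in this post.
VISIBILITY_RADIUS = 30.0


@dataclass
class Player:
    name: str
    pos: tuple[float, float]
    inbox: list[str] = field(default_factory=list)


def broadcast_action(actor, action, players, radius=VISIBILITY_RADIUS):
    """Deliver one action to every other player within `radius` of the actor.

    Returns the number of messages the server had to send.
    """
    sent = 0
    for other in players:
        if other is actor:
            continue  # the actor does not need to be told about their own action
        if dist(actor.pos, other.pos) <= radius:
            other.inbox.append(f"{actor.name}: {action}")
            sent += 1
    return sent


if __name__ == "__main__":
    alice = Player("alice", (0.0, 0.0))
    bob = Player("bob", (10.0, 0.0))
    carol = Player("carol", (100.0, 0.0))  # too far away to see alice
    print(broadcast_action(alice, "casts a spell", [alice, bob, carol]))
```

In this sketch a single action costs one message per nearby observer, so the cost of every action grows with the size of the crowd around it.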
This means, the more players are around to see an action you take, the more messages each action generates. Or, mathematically speaking: if n is the number of players nearby, a the number of actions a player takes on average and m the number of messages that need to be sent, then m = a*n²
It's the squared bit that is causing difficulties in large fights and that puts an engineering limit on how many players can ever participate in a fight, as every additional player becomes significantly harder to process the more players there already are.
To put this in perspective: If there are 100 players fighting, and each of them sends one action to everyone else, the server has to handle 100*99 = 9900 messages.
If these players are running around in groups of two instead, all the server has to handle is 100*1 = 100 messages. In other words: in terms of stress on the server, 9900 players could be duelling in pairs instead of 100 players having a fight together.
It gets even worse when we get to 200 players. 200 players send messages to 199 players each, causing 200*199 = 39800 messages to be sent. This would be enough for 39800 players in pairs! It only gets worse from here, especially because this simplified scenario does not take into account additional multipliers, such as AoE attacks hitting massive numbers of players (and causing massive numbers of messages to be sent).
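The arithmetic in these examples can be reproduced with a short Python sketch. It uses the same simplified model as the text (every player sends each action to every other player in their group) and the exact count n*(n-1), which is what the 100*99 and 200*199 figures use and which is roughly n² for large n. The function names are made up for this illustration.

```python
def messages_all_see_all(n, actions_per_player=1):
    """Messages needed when each of n players sends every action to the other n - 1."""
    return actions_per_player * n * (n - 1)


def messages_in_groups(n, group_size, actions_per_player=1):
    """Messages needed when n players are split into isolated groups of `group_size`.

    Assumes n is a multiple of group_size; each player only reaches their own group.
    """
    return actions_per_player * n * (group_size - 1)


if __name__ == "__main__":
    for n in (100, 200, 300):
        print(n, "players in one fight:", messages_all_see_all(n), "messages")
    print("100 players in pairs:", messages_in_groups(100, 2), "messages")
```

Under this model the single fight needs 9900 messages for 100 players and 39800 for 200, while the same 100 players in pairs need only 100.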
Due to the mathematics of this situation, not only are infinitely large fights impossible, even fights larger than a few hundred players are extremely challenging, and increasing the maximum fighter count requires bigger and bigger optimizations to add fewer and fewer players into the mix.
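To illustrate these diminishing returns, the same simplified model can be turned around: given a fixed number of messages a server can handle, how many players fit into one fight? The "message budget" below is a hypothetical quantity introduced only for this sketch; the post does not state any real capacity figures.

```python
from math import isqrt


def max_players(message_budget, actions_per_player=1):
    """Largest n such that actions_per_player * n * (n - 1) <= message_budget."""
    per_action = message_budget // actions_per_player
    # n * (n - 1) <= B  is equivalent to  (2n - 1)^2 <= 4B + 1,
    # so the largest valid n is floor((1 + isqrt(4B + 1)) / 2).
    return (1 + isqrt(1 + 4 * per_action)) // 2


if __name__ == "__main__":
    for budget in (9900, 19800, 39800):
        print(budget, "messages ->", max_players(budget), "players")
```

In this model, doubling the budget from 9900 to 19800 messages raises the limit only from 100 to 141 players, and it takes about four times the budget (39800) to reach 200 players.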
Despite all this, we’ve seen the Albion network code deal with well over 300 players in a fight reasonably well, which is why we’re particularly frustrated that right now problems seem to occur at far lower numbers than that.
Why don’t you just add more servers?
As you can see from the above explanation, the problems do not primarily occur due to the total number of players in the game. Albion’s game world is split amongst a large number of individual servers, and the servers we have all show good health with the number of players in the game world. Since most of the players are well distributed (and overload mode keeps hotspots like cities from becoming a problem), the number of messages generated by the large player numbers is easily handled by all the different servers.
The challenge in a battle that is taking place in a single zone is that it cannot be handled by different computers. Synchronizing all the necessary information across several machines would be far too slow and would actually slow down the entire process. Instead, a single machine has to be able to handle the entire fight. The quality of our individual machines is already at the upper limit of what you can get on the market, so we cannot improve through further upgrades, especially when you take into account that significant improvements are needed to make a difference.
This is why we’re focussing our search and improvements on other places which could be causing problems: network infrastructure and networking code.
If infinitely large fights are not possible, what is your plan then?
Right now we’re focussed on getting back to at least the same level of performance we’ve had before. Once we have managed to achieve that, we will continue to search for optimizations and improvements to push the number of players even higher.
At the same time, we have to face the fact that there will always be more players wanting to fight than we can handle in a single fight. For this reason, we’re currently working on different concepts for handling overloaded fights. One solution we’re discussing involves temporarily putting weaker/uninvolved players in a stasis mode during large engagements, giving them a chance to wait until player numbers have gone down or to be transferred to a nearby region.
Implementing such a solution will be a priority once we’ve dealt with the immediate issues.