Difference between revisions of "Optimizing Plugins (SourceMod Scripting)"

From AlliedModders Wiki
Jump to: navigation, search
m (Decl on Local Arrays: Uh, fuck. Forgot to close the <tt> tag)
(Update syntax. Remove decl notes (deprecated))
 
(3 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
==Introduction==
 
==Introduction==
SourceMod is pretty fast, but there are some common assumptions about scripts:
+
This guide contains some general suggestions as to how to improve local performance in your code. However, '''take note'''. Do not use this as a guide to prematurely optimizing your program. You should focus on making your scripts easily readable and maintainable. Premature optimization can make your code more complex, introducing bugs that you otherwise wouldn't have had.
*You can't possibly make it any faster.
 
*It's pre-compiled, so it's already quite fast.
 
*Details don't matter, as it's only "scripting" anyway.
 
  
None of these are true. The compiler, in fact, is very poor at optimizing, and the JIT is left to do most of the work. You can '''greatly''' increase the speed and efficiency of your plugins by keeping a few rules in mind.  Remember - it's more important to minimize instructions than it is to minimize lines of code.
+
If you do notice performance problems on your server, and you think they are introduced by your plugin, there are a few steps you can take. The best way to start is to get a profiler. You can use the [[SourceMod Profiler]] to tell you how much time is being spent in script functions.
  
Note that many of these optimizations should be taken in context. An admin command probably doesn't need fine-tuned optimizations.  But a timer that executes every 0.1 seconds, or a GameFrame hook, should definitely be as fast as possible.
+
However, if you are worried about your code, you can also try to estimate the cost of operations in your program. Anything that happens repeatedly in a small period of time - the contents of a loop, the body of a timer, an <tt>OnGameFrame</tt> hook - are good targets.
  
=Always Save Results=
+
'''DISCLAIMER:''' The units of time in this article are comparative only. We estimate most SourcePawn operations as costing 1-10 cycles, where a cycle is measured in nanoseconds.
Observe the example code snippet below:
 
<pawn>if (GetClientTeam(player) == 3)
 
{
 
    //...code
 
}
 
else if (GetClientTeam(player) == 2)
 
{
 
    //...code
 
}
 
else if (GetClientTeam(player) == 1)
 
{
 
    //...code
 
}</pawn>
 
This is a mild example of "cache your results".  When the compiler generates assembly for this code, it will (in pseudo code) generate:
 
<pre>
 
  CALL GetClientTeam
 
  COMPARE+JUMP
 
  CALL GetClientTeam
 
  COMPARE+JUMP
 
  CALL GetClientTeam
 
  COMPARE+JUMP
 
</pre>
 
Notice the problem?  We have called <tt>GetClientTeam</tt> an extra two times than necessary.  The result doesn't change, so we can save it.  Observe:
 
<pawn>new team = GetClientTeam(player)
 
if (team == 3)
 
{
 
    //...code
 
}
 
else if (team == 2)
 
{
 
    //...code
 
}
 
else if (team == 1)
 
{
 
    //...code
 
}</pawn>
 
Now, the compiler will only generate this:
 
<pre>
 
  CALL get_user_team
 
  COMPARE+JUMP
 
  COMPARE+JUMP
 
  COMPARE+JUMP
 
</pre>
 
If <tt>GetClientTeam</tt> were a more expensive operation (it's relatively cheap), we would have recalculated the entire result each branch of the <tt>if</tt> case, wasting CPU cycles.
 
 
 
Similarly, this type of code is usually not a good idea:
 
<pre>for (new i = 0; i < strlen(string); i++) ....
 
</pre>
 
 
 
Better code is:
 
<pre>new len = strlen(string);
 
for (new i = 0; i < len; i++)</pre>
 
 
 
Similarly, you may not want to put <tt>GetMaxClients()</tt> in a loop about players.  While it is a very cheap function call, it incurs the cost of a native call, which might be significant in highly performance-sensitive code.
 
 
 
 
 
=Switch instead of If=
 
If you can, you should use <tt>switch</tt> cases instead of <tt>if</tt>.  This is because for an <tt>if</tt> statement, the compiler must branch to each consecutive <tt>if</tt> case.  Using the example from above, observe the switch version:
 
<pawn>new team = GetClientTeam(player)
 
switch (team)
 
{
 
  case 3:
 
    //code...
 
  case 2:
 
    //code...
 
  case 1:
 
    //code...
 
}</pawn>
 
This will generate what's called a "case table".  Rather than worm through displaced <tt>if</tt> tests, the compiler generates a table of possible values.  The JIT is smart enough to optimize this even further, and the best case is:
 
<pre>
 
  JUMP Table[CALL GetUserTeam]
 
</pre>
 
 
 
If your switch cases are listed in perfectly sequential order (that is, skipping no numbers, either ascending or descending), the JIT can make the best optimizations. For example, "7,8,9" and "2,1,0" are examples of perfect switch cases.  "1,3,4" is not.
 
 
 
 
 
=Don't Re-index Arrays=
 
A common practice in Pawn is to "save space" by re-indexing arrays.  There are a few myths behind this, such as saving memory, assuming the compiler does it for you, or readability. Fact: none of these are true.  Observe the code below. 
 
<pawn>SomeFunction(const clients[], num_clients)
 
{
 
  for (new i = 0; i < num_clients; i++)
 
  {
 
      if (!IsClientInGame(clients[i]))
 
      {
 
        continue;
 
      }
 
      SetSomething(clients[i], GetSomething(clients[i]) + 1);
 
  }
 
}
 
</pawn>
 
  
For this, the compiler generates code similar to:
+
=Estimating Cost=
  
<pre>
+
The goal of estimating cost is to figure out two things:
:LOOP_BEGIN
+
* How expensive an operation is, run only once.
  LOAD i
+
* How many times the operation is repeated.
  LOAD clients
 
  CALC
 
  LOAD clients[i]
 
  CALL IsClientInGame
 
  LOAD i
 
  LOAD clients
 
  CALC
 
  LOAD clients[i]
 
  CALL GetSomething
 
  LOAD i
 
  LOAD players
 
  CALC
 
  LOAD players[i]
 
  CALL SetSomething
 
</pre>
 
  
See what happened?  The compiler does not cache array indexing. Because we've used <tt>clients[i]</tt> each time, every instance generates 4-6 (or more) instructions which load <tt>i</tt>, the address of <tt>clients</tt>, computes the final location, and then grabs the data out of memory. It is much faster to do:
+
Let's try this with a simple example loop below. Most syntactic language features have a simple cost. Natives can be trickier.
  
<pawn>SomeFunction(const clients[], num_clients)
+
<sourcepawn>for (int i = 0; i < strlen(string); i++)
 
{
 
{
  new client;
+
    if (string[i] == '"')
  for (new i = 0; i < num_clients; i++)
+
    {
  {
+
        return i;
      client = clients[i];
+
    }
      if (!IsClientInGame(client))
 
      {
 
        continue;
 
      }
 
      SetSomething(client, GetSomething(client) + 1);
 
  }
 
 
}
 
}
</pawn>
+
</sourcepawn>
 
 
Not only is this more readable, but look at how much cruft we've shaved off the compiler's generated code:
 
<pre>
 
:LOOP_BEGIN
 
  LOAD i
 
  LOAD clients
 
  CALC
 
  LOAD clients[i]
 
  STORE client
 
  CALL IsClientInGame
 
  LOAD client
 
  CALL GetSomething
 
  LOAD client
 
  CALL SetSomething
 
</pre>
 
In a large loop you can drastically reduce codesize in this manner.
 
 
 
=Decl on Local Arrays=
 
There is a small caveat to the 'new' statement in Pawn; it automatically writes a zero for every byte in the data structure.  For example:
 
 
 
<pawn>new String:elephant[512];</pawn>
 
 
 
If placed inside local scope, all 512 bytes of the variable will be cleared every time the variable is created.  In a performance-sensitive callback, this could be disastrous.  Even something as innocuous as:
 
  
<pawn>new temp_players[MAX_PLAYERS];</pawn>
+
First, let's determine how many times this loop will run. If the length of <tt>string</tt> is ''n'', then the loop will run ''n'' times. Next, let's see what each iteration of the loop costs:
 +
* <tt>strlen(string)</tt>: This has to count all of the characters in |string|. Let's say counting a character costs 1 unit of time. Therefore, |strlen(string)| will cost ''n'' units of time.
 +
* <tt>if (string[i] == '"')</tt>: This contains an array load and comparison. Let's say those each cost 1 unit of time, totaling 2.
 +
* <tt>i++</tt>: This increments a local variable. Let's say that costs 1 unit of time.
  
Could be a significant slow-down if called too often.  To solve this, SourceMod has a special replacement for <tt>new</tt> called <tt>decl</tt>.  Unlike <tt>new</tt>, <tt>decl</tt> does not bother setting the entire array or structure to zero.  Thus, the data will be filled with random garbage.
+
Therefore, every iteration of the loop costs <tt>n + 3</tt> units of time.
  
While this is much faster, you must be careful to initialize the data before you use it.  Observe the following code:
+
That means this loop may cost up to <tt>(n * (n + 3))</tt> units of time. Now, if we know that <tt>n</tt> is always small - say, under 100 - that might not be a problem. But what happens if the string has 10,000 characters? Now, the loop will take over 100,000,000 units of time! If a "unit of time" is even as small as a nanosecond, that loop will take a whole tenth of a second, delaying the server by multiple frames!
  
<pawn>decl String:my_string[256];
+
This example is easy to fix. We can identify that <tt>strlen</tt> magnifies the cost of the loop, and rewrite it like this:
 
+
<sourcepawn>int length = strlen(string);
if (Something())
+
for (int i = 0; i < length; i++)
 
{
 
{
  Format(my_string, sizeof(my_string), "clam");
+
    if (string[i] == '"')
 +
    {
 +
        return i;
 +
    }
 
}
 
}
 +
</sourcepawn>
  
PrintToChat(client, "%s", my_string);</pawn>
+
Now the cost of this loop is just: <tt>n</tt> times a very, very small amount of time.
 
 
Notice the bug?  If <tt>Something()</tt> returned false, <tt>my_string</tt> would still have garbage in it.  Thus, you must always take care and make sure that you might not accidentally be reading from uninitialized data.  Once that happens, you will get undefined behavior and possibly even crashes.
 
 
 
A common practice is to short-initialize strings.  For example:
 
<pawn>decl String:my_string[256];
 
my_string[0] = '\0';</pawn>
 
 
 
This sets the first byte of the string to zero, which makes sure the string will be read as a valid, but empty, string.
 
 
 
Note that <tt>decl</tt> will work on any local variable type (array, string, float, integer, et cetera).  The one thing it cannot be used for is initialization.  This is invalid:
 
 
 
<pawn>decl var = 5;</pawn>
 
 
 
Since the purpose of <tt>decl</tt> is to avoid initializaton, such a line would have no meaning, and thus it is invalid syntax. You must use <tt>new</tt> in order to do initialize:
 
 
 
<pawn>new var = 5;</pawn>
 
  
 
=Avoid Large KeyValues=
 
=Avoid Large KeyValues=
Line 199: Line 53:
  
 
If you're trying to use a KeyValues file with thousands of entries and updating/loading it on events such as player connections or disconnections, you will find that the structure will grow to an unmanageably slow size.  If that's the case, you should consider moving to something like SQLite or MySQL.
 
If you're trying to use a KeyValues file with thousands of entries and updating/loading it on events such as player connections or disconnections, you will find that the structure will grow to an unmanageably slow size.  If that's the case, you should consider moving to something like SQLite or MySQL.
 
 
=Use Stock/Public Correctly=
 
Using <tt>stock</tt> and <tt>public</tt> correctly won't necessarily optimize your plugin, but they will help your code be more maintainable and will eliminate extra exports/data being written to your plugin file.
 
 
==Stock==
 
A <tt>stock</tt> function is compiled, but only written to the plugin's binary if it's used.  Generally, you should use the <tt>stock</tt> keyword if:
 
*You may use the function in the future, and don't want the compiler telling you it's never used;
 
*You are writing an include file and don't want the function compiled if it's never used.
 
 
==Public==
 
A <tt>public</tt> function is exported externally.  Every plugin binary has a list of all its <tt>public</tt> functions, and Core uses this table whenever it needs to find a matching <tt>forward</tt>.
 
 
Generally, there are only two reasons you should ever use <tt>public</tt>:
 
*You are implementing a <tt>forward</tt>.
 
*You are implementing a callback to another function, and it requires you to use <tt>public</tt>.
 
 
It seems common for users to randomly add <tt>public</tt> to completely private functions -- not only is that unnecessary, but it only adds to the amount of work Core has to do to find functions in your plugin.
 
 
 
=Conclusion=
 
Although optimization is important, you should always keep context in mind.  Don't replace every single <tt>new</tt> in your plugin with <tt>decl</tt> just because it might be faster.  Identify the areas of your plugin where optimization is significant, and tweak from there.
 
 
  
 
[[Category:SourceMod Scripting]]
 
[[Category:SourceMod Scripting]]

Latest revision as of 17:49, 29 March 2020

Introduction

This guide contains some general suggestions as to how to improve local performance in your code. However, take note. Do not use this as a guide to prematurely optimizing your program. You should focus on making your scripts easily readable and maintainable. Premature optimization can make your code more complex, introducing bugs that you otherwise wouldn't have had.

If you do notice performance problems on your server, and you think they are introduced by your plugin, there are a few steps you can take. The best way to start is to get a profiler. You can use the SourceMod Profiler to tell you how much time is being spent in script functions.

However, if you are worried about your code, you can also try to estimate the cost of operations in your program. Anything that happens repeatedly in a small period of time - the contents of a loop, the body of a timer, an OnGameFrame hook - are good targets.

DISCLAIMER: The units of time in this article are comparative only. We estimate most SourcePawn operations as costing 1-10 cycles, where a cycle is measured in nanoseconds.

Estimating Cost

The goal of estimating cost is to figure out two things:

  • How expensive an operation is, run only once.
  • How many times the operation is repeated.

Let's try this with a simple example loop below. Most syntactic language features have a simple cost. Natives can be trickier.

for (int i = 0; i < strlen(string); i++)
{
    if (string[i] == '"')
    {
        return i;
    }
}

First, let's determine how many times this loop will run. If the length of string is n, then the loop will run n times. Next, let's see what each iteration of the loop costs:

  • strlen(string): This has to count all of the characters in |string|. Let's say counting a character costs 1 unit of time. Therefore, |strlen(string)| will cost n units of time.
  • if (string[i] == '"'): This contains an array load and comparison. Let's say those each cost 1 unit of time, totaling 2.
  • i++: This increments a local variable. Let's say that costs 1 unit of time.

Therefore, every iteration of the loop costs n + 3 units of time.

That means this loop may cost up to (n * (n + 3)) units of time. Now, if we know that n is always small - say, under 100 - that might not be a problem. But what happens if the string has 10,000 characters? Now, the loop will take over 100,000,000 units of time! If a "unit of time" is even as small as a nanosecond, that loop will take a whole tenth of a second, delaying the server by multiple frames!

This example is easy to fix. We can identify that strlen magnifies the cost of the loop, and rewrite it like this:

int length = strlen(string);
for (int i = 0; i < length; i++)
{
    if (string[i] == '"')
    {
        return i;
    }
}

Now the cost of this loop is just: n times a very, very small amount of time.

Avoid Large KeyValues

KeyValues is an n-ary structure using linked lists. This type of structure is extremely expensive to allocate and traverse. While it might be suitable for tiny pieces of information (that is, under 10KB of data or so), its complexity growth is very poor.

If you load KeyValues data, you should make an effort to, at the very least, cache its Handle so you don't need to reparse the file every time. Caching its contents on a needed basis would be a bonus as well.

If you're trying to use a KeyValues file with thousands of entries and updating/loading it on events such as player connections or disconnections, you will find that the structure will grow to an unmanageably slow size. If that's the case, you should consider moving to something like SQLite or MySQL.