Optimizing Plugins (AMX Mod X Scripting)
Contents
Introduction
Admin-Mod and AMX Mod X became very popular because of their easy to use scripting language. However, the words "scripting language" come with a lot of loaded preconceptions. Most people assume that because it's scripted:
- You can't possibly make it any faster
- It's pre-compiled, so it's already quite fast
- Details don't matter, as it's only "scripting" anyway
Especially, with Pawn (formerly Small), none of these are true. The compiler, in fact, is very poor at optimizing, and you can greatly increase the speed and efficiency of your plugins by keeping a few rules in mind. Remember - it's more important to minimize instructions than it is to minimize lines of code.
Terms
To read this document, you will need to understand a few concepts beforehand:
- BRANCHING - When the processor takes a different path of code. For example, to call a function or to use an if statement, the processor will "branch". Modern processors attempt to predict pathways with "branch prediction", but it's best to avoid branching a lot if possible.
- STACK ALLOCATION - In Pawn, all local data is stored on the stack, a big chunk of continuous memory. Whenever you create a variable on the stack, it is automatically written with zeroes.
- HEAP ALLOCATION - In Pawn, temporary data that needs to be referenced by a native is stored on the heap, another area of contiguous, but less restrictive memory.
- DATA SECTION - This is an area of memory built into your .amxx file. In fact, it "becomes" the heap at load time. All your strings and arrays are hardcoded into this area.
- EXPENSIVENESS - To be "expensive" in computer science means an operation requires a lot of CPU processing. It usually does not refer to memory size, only to processing cycles and time. Addition is inexpensive, floating power operations are expensive. However, both are inexpensive in comparison to writing a file. An inexpensive operation can also be called "cheap".
- BIG-OH NOTATION - O(*) notation refers to the expensiveness of an algorithm. If something is O(n), it occurs in linear time -- meaning that for N items, it will complete relative to N. O(N^2) means with N items, it will complete relative to N^2. O(1) means "constant time" - no matter what N is, it will run in the same amount of time.
Compiler Optimizations
These optimizations have to do with changing how your code is compiled. While the syntax may remain the same, you are not only increasing compile time, but increasing your plugin's efficiency and speed at run time.
Always Save Results
Observe the example code snippet below:
if (get_user_team(player) == TEAM_T) { //...code } else if (get_user_team(player) == TEAM_CT) { //...code } else if (get_user_team(player) == TEAM_SPECTATOR) { //...code }
This is a mild example of "cache your results". When the compiler generates assembly for this code, it will (in pseudo code) generate:
CALL get_user_team COMPARE+BRANCH CALL get_user_team COMPARE+BRANCH CALL get_user_team COMPARE+BRANCH
Notice the problem? We have called get_user_team an extra two times than necessary. The result doesn't change, so we can save it. Observe:
new team = get_user_team(player) if (team == TEAM_T) { //...code } else if (team == TEAM_CT) { //...code } //...etc
Now, the compiler will only generate this:
CALL get_user_team COMPARE+BRANCH COMPARE+BRANCH COMPARE+BRANCH
If get_user_team were an expensive operation (it's relatively cheap), we would have recalculated the entire result each branch of the if case.
Switch instead of If
If you can, you should use switch cases instead of if. This is because for an if statement, the compiler must branch to each consecutive if case. Using the example from above, observe the switch version:
new team = get_user_team(player) switch (team) { case TEAM_T: //code... case TEAM_CT: //code... case TEAM_SPECTATOR: //code... }
This will generate what's called a "case table". Rather than worm through displaced if tests, the compiler generates a table of possible values, which the virtual machine knows to browse through:
CALL get_user_team COMPARE COMPARE COMPARE
Don't Re-index Arrays
A common practive in Small is to "save space" by re-indexing arrays. There are a few myths behind this, such as saving memory, assuming the compiler does it for you, or readability. Fact: none of these are true. Observe the code below:
new players[32], num, team get_players(players, num) for (new i=0; i<num; i++) { if (!is_user_connected(players[i])) continue; team = get_user_team(players[i]) set_user_frags(players[i], 0) }
For this, the compiler generates code similar to:
:LOOP_BEGIN LOAD i LOAD players CALC LOAD players[i] CALL is_user_connected LOAD i LOAD players CALC LOAD players[i] CALL get_user_team LOAD i LOAD players CALC LOAD players[i] CALL set_user_frags
See what happened? The compiler does not cache array indexing. Because we've used players[i] each time, every instance generates 4-6 (or more) instructions which load i, the address of players, computes the final location, and then grabs the data out of memory. It is much faster to do:
new player for (new i=0; i<num; i++) { player = players[i] if (!is_user_connected(player)) continue; team = get_user_team(player) }
Not only is this more readable, but look at how much cruft we've shaved off the compiler's generated code:
:LOOP_BEGIN LOAD i LOAD players CALC LOAD players[i] STORE player CALL is_user_connected LOAD player CALL get_user_team LOAD player CALL set_user_frags
In a large loop you can drastically reduce codesize in this manner.
Global vs Local and Variables in Loops
It is important to realize that every variable in Pawn is automatically zeroed. For global variables, they are static and permanent, thus they are zeroed when your plugin is loaded. Variables in functions, however, must be zeroed dynamically. This is a slow and tedious operation, and you should not only avoid relying on it when necessary, but you should keep that fact in mind when using arrays.
Arrays in Pawn are "cells" of data. Each cell is four or eight bytes, depending on whether your processor is 32bit or 64bit. To create an array of 4096 cells, for example:
new array[4096]
The compiler generates code to manually zero every single one of the 16,384 bytes in that location. Normally, this isn't too bad -- but it can be absolutely deadly in a function which is called quite often. For example, server_frame is called on every server tick in AMX Mod X. To declare an array of that size in server_frame is highly unnecessary. Instead, you can take advantage of the fact that not only are globals free of charge, but server_frame does not need to be re-entrant. That is, you will never call server_frame inside of server_frame, so making the variable global will not bring up the problem of one instance of the function overwriting data from another instance of the same function. Observe:
new g_serverframe_array[4096] public server_frame() { //...code that uses g_serverframe_array }
This will execute conseridably faster. You can do this with smaller arrays too.
Likewise, it is equally important to avoid declaring arrays inside of loops. Consider the following block of code:
for (new i=0; i<num; i++) { new message[255], name[32], player player = players[i] get_user_name(player, name, 31) format(message, 254, "Hello, %s", name) }
If there are 32 players, this loop will actually resize and zero out over 1K of memory thirty two times in a row. Not good! However, on the other hand, it's nice that the message is zeroed out for us each time. Tip: you often only need to zero out the first character of a string. This will make the entire string empty. The code below is much more efficient:
new message[255], name[32], player for (new i=0; i<num; i++) { player = players[i] name[0] = '\0' message[0] = '\0' get_user_name(player, name, 31) format(message, 254, "Hello, %s", name) }
This has the effect of safely making the string empty beforehand, as well as largely reducing a lot of run-time overhead.
Static vs Global
A way to drastically improve the memory footprint of your functions without the potentially confusing and damaging method of global variables can be found in static variables.
A variable declared with the keyword "static" instead of "new" operates in the same way a global does (it is created only once) but the variable is still local to the function; this means it generates much easier to read code, while drastically improving speed.
For Loop Comparisons
A common mistake is to write code like this:
new string[256] = "something long" for (new i=0; i<strlen(string); i++) //...code
This plays off a similar principle from before: cache results. The compiler will actually recompute your string length on each iteration of the loop. This will have even worse effects if your string changes mid-loop. A more sensible method is:
new string[256] = "something long" new len = strlen(string) for (new i=0; i<len; i++) //...code
Tips and Tricks
Lookup Tables
Precompute what can be precomputed. For example, say you have a mapping of weapon indices to names:
if (weapon == CSW_AK47) copy(name, len, "weapon_ak47")
Ignoring the fact that we have get_weapon_name in AMX Mod X, this is inefficient. We could precompute this result in a table:
new g_WeaponNamesTable[TOTAL_WEAPONS] = { //..0 to CSW_AK47-1 "weapon_ak47", //..CSW_AK47+1 to TOTAL_WEAPONS-1 };
Perfect Hashing
- TODO: explain this
Local Strings
The Pawn compiler does not optimize the DATA section, which stores all hardcoded strings and global arrays. If you reference the same hardcoded string 500 times in your plugin, it will appear 500 different times. If this seems bad enough, it actually does this with all strings. For example, the empty string ("") appears everywhere in the include files, usually used as a default parameter to many natives. This too is copied into the data section for each unique reference.
For example:
set_cvar_string("amx_gaben", get_cvar_string("amx_gaben") + 1)
This will create two copies of "amx_gaben" in the DATA section. While this doesn't really hurt, it does increase the size of your plugin.
Similarly, this has the same problem:
#define AMX_GABEN "amx_gaben" set_cvar_string(AMX_GABEN, get_cvar_string(AMX_GABEN) + 1)
The only way to avoid this mess is to use global variables. As stated earlier, they're basically free storage.
new AMX_GABEN[] = "amx_gaben" set_cvar_string(AMX_GABEN, get_cvar_string(AMX_GABEN) + 1)
Again, while not necessary, this will reduce your plugin's size in memory and on disk. If you're already using defines, you can make this switch easily.
Note: Be careful that if you use this global array method, you do not allow the array to be modified.