Difference between revisions of "User:Nosoop/Guide/Advanced"
Latest revision as of 22:19, 10 June 2024
This section is provided for users that want or need to work with game-specific functionality that SourceMod doesn't provide access to out of the box.
It's assumed that you're comfortable with programming and various terms. By the end of this page you'll have some knowledge of calling / hooking arbitrary functions in the game.
What is gamedata?
Gamedata, also known as game data, gameconfig, and gameconf, are files used to specify information tied to a specific game.
As the games that SourceMod runs on are updated independently of SourceMod itself, gamedata is used as a unified way to keep plugins and extensions up to date on game changes without needing to recompile them.
Finding Functions
- TODO refer to public SDK if you don't know what you're looking for
- TODO explain what to do in a game with symbols
- TODO suggest opening IDA's options and enabling opcode bytes
- TODO inlined functions
- TODO debugging
Finding VTable Offsets
In C++, a virtual method table (shorthand "vtable") is effectively an array of function pointers. It's intended for inheritance: a virtual ::DoThing() method can be different for different classes, and so the code will look up the correct function for a specific instance based on the table for the instance's class. Every instance of a class that uses a vtable holds a pointer to that table, typically at the start of the instance.
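To make the "array of function pointers" idea concrete, here is a minimal hand-rolled sketch of virtual dispatch. All names (Instance, Method, CallVirtual, the two tables) are hypothetical and exist only for illustration; a real compiler generates the equivalent machinery for you, and the exact layout depends on the platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of virtual dispatch; not how any specific game lays out its classes.
struct Instance;
using Method = int (*)(Instance*);

struct Instance {
    const Method* vtable; // the per-class table pointer held by every instance
    int value;
};

int BaseDoThing(Instance* self)    { return self->value; }
int DerivedDoThing(Instance* self) { return self->value * 2; }

// One table per "class"; index 0 is the DoThing slot in both tables.
const Method kBaseVTable[]    = { BaseDoThing };
const Method kDerivedVTable[] = { DerivedDoThing };

// The caller only knows the slot index, not which class the instance belongs to.
int CallVirtual(Instance* self, std::size_t index) {
    assert(self != nullptr);
    return self->vtable[index](self);
}
```

Calling CallVirtual on two instances with the same index runs different code depending on which table each instance points to. That slot index is what a vtable offset in gamedata refers to.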
The Hard Way
Once you have found the virtual function, jump to its reference in .rodata and make a note of that address. Scroll up until you see an offset reference (off_* in IDA, PTR_* in Ghidra); that is likely the first entry in the vtable (index 0). Disassemblers create this reference because it is the address that is stored in class instances.
Get the difference between the address you noted and that of the first entry, then divide by the pointer size (4 on 32-bit platforms, 8 on 64-bit).
For example, given a 32-bit function pointer located at 011AE84Ch and the start at 011AE3C8h, you do (0x011AE84C-0x011AE3C8) / 4, resulting in the index 289.
Alternatively, if you're familiar with the code or have sources to cross-reference against, you can search for the virtual call itself. In a Linux disassembly, it will look something like this:
    ; get the first vtable by dereferencing the pointer at the start of the class instance
    8B 03                   mov     eax, [ebx]

    ; push the class instance as a parameter
    89 1C 24                mov     [esp], ebx

    ; call the fourth entry (at index 3) in the vtable: 0xC / sizeof(void*) = 0x3
    FF 50 0C                call    dword ptr [eax+0Ch]
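Both calculations in this section are plain integer arithmetic. The sketch below writes them out using the numbers from the text; the function names are hypothetical helpers, not part of any SourceMod or disassembler API.

```cpp
#include <cassert>
#include <cstdint>

// Index of a vtable entry, given the address of the entry in .rodata and
// the address of the first entry (index 0).
constexpr std::uint64_t VTableIndexFromAddresses(std::uint64_t entry,
                                                 std::uint64_t first,
                                                 std::uint64_t pointerSize) {
    return (entry - first) / pointerSize;
}

// Index of a vtable entry, given the displacement in an instruction such as
// `call dword ptr [eax+0Ch]`.
constexpr std::uint64_t VTableIndexFromDisplacement(std::uint64_t displacement,
                                                    std::uint64_t pointerSize) {
    return displacement / pointerSize;
}

// The worked examples from the text, checked at compile time (32-bit pointers).
static_assert(VTableIndexFromAddresses(0x011AE84C, 0x011AE3C8, 4) == 289,
              "address difference 0x484 divided by 4");
static_assert(VTableIndexFromDisplacement(0x0C, 4) == 3,
              "displacement 0xC divided by 4");
```

For a 64-bit binary, pass 8 as the pointer size instead.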
The Easy Way
If the game isn't stripped of debugging symbols, use asherkin's VTable Dumper (https://asherkin.github.io/vtable/). It provides correct offsets for Linux binaries (as that's what it works with), and estimates usually correct offsets for Windows.
The dumper isn't always correct, so double-check its output in the following known cases:
- Classes that have discontinuous overloaded functions (https://github.com/asherkin/vtable/issues/7)
- Possibly multiple inheritance
Aside - the layout of vtables is not the same across platforms. Notable differences are:
- Linux may have multiple virtual destructors; Windows appears to only have up to one.
- Linux overloads are in the same order as they are initially defined in the original code. On Windows, this is the same, except that overloaded functions (those with the same name that accept different parameters) are grouped together and emitted in reverse order.
Creating Signatures
The Hard Way
After you've found a function, you need to tell SourceMod the sequence of bytes unique to it. Those bytes make up a signature.
Note: If you're using IDA and only see the mnemonics in the "IDA View" tab, make sure to set the number of opcode bytes in IDA options to a non-zero number. 8 is sufficient in most cases.
You could treat just the sequence of bytes as the signature directly, but this would break very easily whenever the game is updated. At the machine-code level, the instructions might be the same for "move X to Y", but the data might change: X and Y might be in a different location in the binary altogether. For an example within a longer signature:
    ; stores the offset of aString at the top of the stack (the address held in esp)
    ; the bytes 3B B3 25 01 are the absolute offset of aString in this binary in little-endian format (0x0125B33B)
    C7 04 24 3B B3 25 01    mov     dword ptr [esp], offset aString

    ; call function, the four bytes after E8 are the location of the function relative to the next instruction
    E8 78 F0 48 00          call    _Z12UTIL_VarArgsPKcz

    ; sets eax to arg 0
    8B 45 08                mov     eax, [ebp+arg_0]
The naive signature for that would be \xC7\x04\x24\x3B\xB3\x25\x01\xE8\x78\xF0\x48\x00\x8B\x45\x08. However, you can't rely on those bytes mentioned to be constant at all:
- The offsets of aString and UTIL_VarArgs might be located somewhere else after a game update
- Relocations may be performed such that the data bytes are different in memory from their on-disk representation
As a solution to this, you use wildcards to mask off the bytes you don't care about. For SourceMod game config files, the sequence \x2A indicates that particular byte shouldn't be checked and to continue to the next one.
Here is what the previous signature looks like with the masked bytes displayed as ??:
    C7 04 24 ?? ?? ?? ??    mov     dword ptr [esp], offset aString
    E8 ?? ?? ?? ??          call    _Z12UTIL_VarArgsPKcz
    8B 45 08                mov     eax, [ebp+arg_0]
A masked signature would then be \xC7\x04\x24\x2A\x2A\x2A\x2A\xE8\x2A\x2A\x2A\x2A\x8B\x45\x08.
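To illustrate how a masked signature tolerates changed offsets, here is a minimal sketch of a wildcard-aware byte scan. It is not SourceMod's actual scanner; FindSignature is a hypothetical name, and the sketch only assumes the \x2A convention described above (which also means a literal 0x2A byte in a signature behaves as a wildcard).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the first match of `sig` within `memory`, or -1 if
// there is none. A signature byte of 0x2A matches any byte in memory.
std::ptrdiff_t FindSignature(const std::vector<std::uint8_t>& memory,
                             const std::vector<std::uint8_t>& sig) {
    if (sig.empty() || sig.size() > memory.size()) {
        return -1;
    }
    for (std::size_t start = 0; start + sig.size() <= memory.size(); ++start) {
        bool matched = true;
        for (std::size_t i = 0; i < sig.size(); ++i) {
            if (sig[i] != 0x2A && sig[i] != memory[start + i]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return static_cast<std::ptrdiff_t>(start);
        }
    }
    return -1;
}
```

Scanning a buffer in which the two four-byte operands differ from the original binary still succeeds with the masked signature, whereas the naive signature no longer matches.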
Masking is used mainly for offsets, such as for functions and variables. Instructions generally don't change unless the function code itself is modified, at which point you'll want to revisit your binary and update accordingly.
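If you're using DHooks with byte signatures (covered later), you may want to also mask out the first six bytes, as a detour will patch in an unconditional JMP at the start to trampoline into a user-defined function, and subsequent scans for the byte signature will fail.
Note: This is no longer the case as of SourceMod 1.11, which stores a copy of the original data for scanning purposes. However, it's noted here for historical / implementation detail reasons.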
For an extended lesson, you can look at the following material:
- Signature Scanning on the AlliedModders wiki
The Easy Way
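If you're using IDA (including Free), use the makesig7.idc script (https://github.com/alliedmodders/sourcemod/blob/master/tools/ida_scripts/makesig7.idc). If you're using Ghidra, use makesig.py (https://github.com/alliedmodders/sourcemod/blob/master/tools/ghidra_scripts/makesig.py).
They generally do pretty well at finding and masking byte signatures, but when they fail or you want a more robust signature, you should understand how to create the signatures manually.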
Both scripts may produce different byte signatures for the same function due to using different methods to determine if a given byte should be masked.
It's exceedingly rare, but possible, that the binary has two copies of the exact same short function (for example, when they are typechecked and statically cast to different subclasses). Both scripts will fail in that case. SourceMod's signature scanner will use the first match it finds, so if any match is acceptable, you can still use an appropriately masked signature.
If two copies of a function seem to exist, be sure to look at the disassembly to make sure that the functions are indeed the same.
Finding Addresses
Sometimes you have a symbol, but you need an address to work with. That is what the "Addresses" section of a game configuration file is used for.
To find an address, you start from a known location reference (signature). You may then have to jump to references (that is, dereference locations), then get an offset from the previous reference.
read keys indicate an offset to load / dereference relative to the previous address, and offset means to shift the previous address without any dereference. These key / value pairs are processed in the order you specify them in the file; offset is only valid as the last "operation".
For a C++-like example:
    // start from an address
    // "FindLocation" would return the location of either a named symbol reference or the start of a byte signature
    uintptr_t addr = FindLocation("some_signature");
    addr = *reinterpret_cast<uintptr_t*>(addr + 40); // gameconf: "read" "40"
    addr = *reinterpret_cast<uintptr_t*>(addr);      // gameconf: "read" "0"
    addr += 13;                                      // gameconf: "offset" "13"
Calling Game Functions
SDKCall Order
When performing an SDKCall, the parameters need to be passed in the following order:
1. The SDKCall handle received from EndPrepSDKCall.
2. The this instance. this may be omitted in the following cases:
   - The function was declared as static, where there is no this to pass in.
   - The function was declared with the SDKCall_GameRules or SDKCall_EntityList call types; SDKTools itself will provide the appropriate global instance.
3. The return buffer, if applicable.
   - If the function returns a Vector or QAngle, the parameter is a float[3].
   - If the function returns a char*, the parameters should be a char[] buffer and an int specifying the size of the buffer. The return value of the SDKCall will be the number of characters written, or -1 if the function returned a null pointer (to differentiate it from an empty string).
   - If the function returns a primitive type / entity / edict, it will be the return value of the SDKCall, so no such return buffer is necessary.
4. Any remaining parameters for the function.
Examples:
    // Vector CBaseCombatCharacter::Weapon_ShootPosition() -- has 'this' and 'Vector' return
    float vecShootPosition[3];
    SDKCall(g_hSDKCall, client, vecShootPosition);

    // const char *CBaseAnimating::GetSequenceName(int iSequence) -- has 'this', 'char*' return, and parameter
    char sequenceName[64];
    SDKCall(g_hSDKCall, entity, sequenceName, sizeof(sequenceName), iSequence);

    // bool CGlobalEntityList::IsEntityPtr(void* pTest) -- SDKCall_EntityList is used, so no 'this' explicitly needed
    // SDKCall passes the return value from the called function as its return value, so use an assignment operator
    Address pTest;
    bool result = SDKCall(g_hSDKCall, pTest);
Calling via Signature or Offset?
If you have a class with a virtual method, you generally should set up the SDKCall to take a virtual offset. Doing so allows your plugin to have the expected interactions with other plugins' hooks, covered in the next section.
You should use a signature either when the function is not virtual or if you need to bypass the virtual override on an entity (e.g. calling the parent class's function). In those instances, only detours will take effect.
Hooking Game Functions (with DHooks)
DHooks is an extension bundled with SourceMod that enables plugins to hook functions of their choosing (currently restricted to those accessible via server / engine binaries). You may use its functionality by including <dhooks>.
As with SDKCalls, you must ensure that your hook setup is declared with the same parameter and return types as the function being hooked, so that the server continues to operate as you'd expect.
Note: This section is a work-in-progress.
Virtual Hook or Detour?
A virtual hook is mainly used for hooking virtual methods of a class; a detour is used for hooking any function.
While detours can be used to hook the function a virtual table calls into, virtual hooks still have the merit of hooking specific classes / instances. More specifically:
- DHooks provides the bookkeeping on which instances are and aren't hooked, so for virtual hooks the callback will only be invoked on those you specifically hook. On detours, you have to filter on instances yourself.
- On chained inheritance, a virtual hook will only act on the exact class and not on any parent class or subclass, even if they all point to the same virtual function. Detours will, again, be called on any invocation of the function, including calls to it made by its subclass.
- On some binaries (especially with more aggressive optimizations in place), multiple function symbols may map to the same place in memory if they output the same code. As a result, detours may be called when you don't expect them to be.
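The first point above, filtering on instances yourself in a detour, can be sketched in plain C++ as follows. This is a conceptual model only: Entity, g_Watched, and OnDetouredCall are hypothetical names, and a real DHooks detour callback is written in SourcePawn with a different signature.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for a game object.
struct Entity {
    int health;
};

// Instances the plugin cares about. With a virtual hook, DHooks keeps this
// bookkeeping for you; with a detour, you maintain it yourself.
std::unordered_set<const Entity*> g_Watched;
int g_CallbackHits = 0;

// Stand-in for a detour callback: it fires on every call to the detoured
// function, so it must check the instance before acting.
void OnDetouredCall(Entity* self) {
    if (g_Watched.count(self) == 0) {
        return; // not an instance we asked about; ignore it
    }
    ++g_CallbackHits;
}
```

Because the check runs on every invocation, remember to remove instances from the set when they are destroyed, or a reused address may be treated as watched.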