diff --git a/.gitignore b/.gitignore
index 250595f..f53d18e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,2 @@
__pycache__
.vscode
-custom_hooks.py
-tests/bin/*
-tests/out/*
-tests/Makefile
diff --git a/README.md b/README.md
index 9546154..e01c6a7 100644
--- a/README.md
+++ b/README.md
@@ -1,77 +1,95 @@
-# Symless
-
-Automatic structures recovering plugin for IDA. Able to reconstruct structures/classes and virtual tables used in a binary.
-
-### Features
-* Automatic creation of identified structures (c++ classes, virtual tables and others)
-* Xrefs on structures usages
-* Functions typing using gathered information
-
-Two modes are available: **Pre-Analysis** and **Plugin**.
-
-## Plugin mode
-Interactive IDA plugin. Uses static analysis from an entry point selected by the user to build and propagate a structure.
-
-
-
-
-
-
-
-
-### Installation
-```
-$ python plugin/install.py [-u]
-```
-
-**Manual installation**: copy the [symless](symless/) directory and [symless_plugin.py](plugin/symless_plugin.py) into IDA plugins folder.
-
-### Usage
-While in IDA disassembly view:
-- Right-click a register that contains a structure pointer
-- Select **Propagate structure**
-- Select which structure & shift to apply
-
-Symless will then propagate the structure, build it and type untyped functions / operands with the harvested information. This action can be undone with **Ctrl-Z**. A new structure can be created, an existing one can be completed.
-
-## Pre-Analysis mode
-
-### Before use
-
-#### Specify your IDA installation:
-
-```
-export IDA_DIR="$HOME/idapro-M.m"
-```
-
-#### Edit the config file to suit your case:
-
-Specify the memory allocation functions used in your executable in the [imports.csv](symless/config/imports.csv) file. Syntax is discussed there.
-
-Symless uses those to find structures creations from memory allocations. C++ classes can also be retrieved from their virtual tables.
-
-### Usage
-```
- $ python3 symless.py [-c config.csv]
-```
-
-* ```config.csv``` - configuration to be used (defaults to [imports.csv](symless/config/imports.csv))
-* ```target(s)``` - one or more binaries / IDA bases
-
-Symless will create a new IDA base when given an executable as an argument. Otherwise keep in mind it may overwrite user-modifications on existing bases.
-
-Once done the IDA base will be populated with information about identified structures.
-
-## Support
-Both stripped and non-stripped binaries are supported. Symbols are only used to name the created structures.
-
-**x64** and **i386** binairies using the following calling conventions are supported:
-* Windows x64 (```__fastcall```)
-* Windows i386 (```__stdcall``` & ```__thiscall```)
-* System V x64 (```__fastcall```)
-* System V i386 (```__stdcall```)
-
-**IDA Pro 7.6** or newer & **python 3**
-
-## Disclaimer
-Symless is still in development and might not fit every use cases.
+# Symless
+
+An **IDA Pro plugin** that assists with **structure reconstruction**. Using static data-flow analysis to gather information, Symless automates most of the structure creation workflow. Its key features are:
+
+* Inferring and creating structure fields based on access patterns
+* Identifying and creating C++ virtual function tables
+* Placing cross-references to link each structure field with the code that uses it
+
+## Installation
+
+```bash
+$ python3 plugin/install.py [-u]
+```
+
+Or install manually: copy the [symless](symless/) directory and [symless_plugin.py](plugin/symless_plugin.py) file into your IDA plugins folder.
+
+## Usage
+
+The **interactive plugin** helps reconstruct a chosen structure. In the Disassembly or Pseudocode view, right-click a line that uses the structure you want to rebuild and select **Propagate structure** from the context menu:
+
+
+
+
+
+
+
+A form will appear prompting for:
+
+* The **name of the new structure** to create, or an existing structure to extend.
+* An **entry point** for the data-flow analysis, which is performed on the microcode. This entry point is a microcode operand that holds a pointer to the structure.
+
+> [!NOTE]
+> The microcode is IDA's intermediate representation (IR), generated from the CPU-specific assembly. Since it closely mirrors the assembly, it is easy to read.
+
+
+
+
+
+
+
+Additional options are:
+
+* **Shifted by**, the shift to apply to the structure pointer
+* **Spread in callees**, whether the analysis should extend into called functions and discovered virtual methods
+
+Clicking **Propagate** starts the analysis. The structure pointer is tracked from the selected entry, and observed accesses are used to infer structure fields.
+
+> [!TIP]
+> To get a more complete structure, run the analysis from the code that initializes the structure (for example, right after an allocation or inside a constructor).
+
+The new structure is added to the Local Types view. Cross-references are added on assembly operands for each field access:
+
+
+
+
+
+
+
+You can then edit field types directly from the pseudocode. The plugin reduces the back-and-forth navigation between the disassembly, pseudocode and local types views that is otherwise required when creating structures and placing cross-references.
+
+## CLI mode
+
+An **automatic command-line mode** also exists, which identifies and reconstructs most of the structures used in a binary. Symless uses two sources to discover structures:
+
+* Dynamic memory allocations
+* C++ virtual function tables and constructors
+
+This automatic mode is intended as a pre-analysis step to create structures and improve decompilation before manual work.
+
+First, list the memory allocators used by your executable in [imports.csv](symless/config/imports.csv). This lets Symless rebuild structures from dynamic allocations; without it, only C++ classes with virtual tables are reconstructed.
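+
+For example, the following entry (already present in the default config) declares that `HeapAlloc` from `kernel32` is a malloc-like allocator whose size argument is at index 2. Entries for your own allocators follow the same `module, function, kind(argument index)` pattern; see the comments in [imports.csv](symless/config/imports.csv) for the full syntax:
+
+```
+KERNEL32, HeapAlloc, malloc(2)
+```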
+
+The pre-analysis is run with:
+
+```bash
+$ python3 symless.py [-c config.csv] target
+```
+
+* ```config.csv``` - configuration file to use (defaults to [imports.csv](symless/config/imports.csv))
+* ```target``` - a binary or an IDA database
+
+If target is an executable, a new IDA database will be created. When the analysis finishes, the database is populated with the reconstructed structures.
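+
+A typical first run on a fresh binary might look like this (the paths are illustrative; setting `IDA_DIR` is only needed when your IDA installation is not found automatically):
+
+```bash
+$ export IDA_DIR="$HOME/idapro-9.0"
+$ python3 symless.py ./target
+```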
+
+### Limitations
+
+The main challenge for the automatic analysis is resolving conflicts between structures. This can cause functions to be incorrectly typed, or duplicated structures to be created. In some cases it is better to use the interactive plugin, which is less prone to errors.
+
+## Support
+
+Symless supports any architecture handled by your IDA decompiler.
+
+**IDA 8.4** or later is required.
+
+## Credits
+
+The Thalium team, and Célian Debéthune, who worked on the architecture-agnostic version during his internship at Thalium.
diff --git a/img/plugin-demo.gif b/img/plugin-demo.gif
deleted file mode 100644
index 267b8d8..0000000
Binary files a/img/plugin-demo.gif and /dev/null differ
diff --git a/img/plugin_builder_form.png b/img/plugin_builder_form.png
new file mode 100644
index 0000000..dc43d74
Binary files /dev/null and b/img/plugin_builder_form.png differ
diff --git a/img/plugin_built_structure.png b/img/plugin_built_structure.png
new file mode 100644
index 0000000..537e750
Binary files /dev/null and b/img/plugin_built_structure.png differ
diff --git a/img/plugin_context_menu.png b/img/plugin_context_menu.png
new file mode 100644
index 0000000..03e9c4a
Binary files /dev/null and b/img/plugin_context_menu.png differ
diff --git a/plugin/symless_plugin.py b/plugin/symless_plugin.py
index 81f735c..4b5be81 100644
--- a/plugin/symless_plugin.py
+++ b/plugin/symless_plugin.py
@@ -1,10 +1,10 @@
import base64
import collections
import importlib
-import inspect
import os
import pkgutil
import sys
+import traceback
from typing import Collection
import idaapi
@@ -15,9 +15,10 @@
class fixedBtn(idaapi.Form.ButtonInput):
- def __init__(self, plugin: "SymlessPlugin"):
+ def __init__(self, plugin: "SymlessPlugin", form: "SymlessInfoForm"):
super().__init__(self.reload, "0")
self.plugin = plugin
+ self.form = form
def reload(self, code):
idaapi.show_wait_box("Reloading Symless..")
@@ -26,18 +27,17 @@ def reload(self, code):
# terminate all extensions
self.plugin.term()
- # reload symless code
- reload_plugin()
+ remove_old_modules()
# rebind all extensions
- self.plugin.find_extensions()
+ self.plugin.find_extensions(reload=True)
except Exception as e:
- import traceback
-
+ idaapi.hide_wait_box()
utils.g_logger.critical(repr(e) + "\n" + traceback.format_exc())
- finally:
+ else:
idaapi.hide_wait_box()
+ self.form.Close(1)
def get_tag(self):
return "" % (
@@ -72,7 +72,7 @@ def __init__(self, plugin: "SymlessPlugin"):
{
"img": idaapi.Form.StringLabel(img_html, tp=idaapi.Form.FT_HTML_LABEL, size=None),
"info": idaapi.Form.StringLabel(info_html, tp=idaapi.Form.FT_HTML_LABEL, size=None),
- "reload": fixedBtn(plugin),
+ "reload": fixedBtn(plugin, self),
},
)
@@ -92,7 +92,7 @@ def init(self) -> idaapi.plugmod_t:
return idaapi.PLUGIN_KEEP
# find and load extensions from symless plugins folder
- def find_extensions(self):
+ def find_extensions(self, reload: bool = False):
for mod_info in pkgutil.walk_packages(plugins.__path__, prefix="symless.plugins."):
if mod_info.ispkg:
continue
@@ -111,21 +111,27 @@ def find_extensions(self):
spec.loader.exec_module(module)
except BaseException as e:
sys.modules.pop(module.__name__)
- print(f"Error while loading extension {mod_info.name}: {e}")
+ utils.g_logger.error(f"Error while loading extension {mod_info.name}:")
+ utils.g_logger.error(repr(e) + "\n" + traceback.format_exc())
continue
# module defines an extension
if not hasattr(module, "get_plugin"):
continue
-
ext: plugins.plugin_t = module.get_plugin()
+
+ # notify the extension that it has been reloaded
+ if reload:
+ ext.reload()
+
self.ext.append(ext)
- # debug - reload plugin action
+ # display info panel
def run(self, args):
info = SymlessInfoForm(self)
info.Compile()
- return info.Execute()
+ info.Execute()
+ info.Free()
# term all extensions
def term(self):
@@ -138,31 +144,13 @@ def PLUGIN_ENTRY() -> idaapi.plugin_t:
return SymlessPlugin()
-# reload one module, by first reloading all imports from that module
-# to_reload contains all modules to reload
-def reload_module(module, to_reload: set):
- if module not in to_reload:
- return
-
- # remove from set first, avoid infinite recursion if recursive imports
- to_reload.remove(module)
-
- # reload all imports first
- for _, dep in inspect.getmembers(module, lambda k: inspect.ismodule(k)):
- reload_module(dep, to_reload)
-
- # reload the module
- utils.g_logger.info(f"Reloading {module.__name__} ..")
- importlib.reload(module)
-
-
-# reload all symless code
-def reload_plugin():
- # list all modules to reload, unordered
- to_reload = set()
- for k, mod in sys.modules.items():
+# remove old symless modules from loaded modules
+def remove_old_modules():
+ to_remove = set()
+ for k in sys.modules.keys():
if k.startswith("symless"):
- to_reload.add(mod)
+ to_remove.add(k)
- for mod in list(to_reload): # copy to alter
- reload_module(mod, to_reload)
+ for r in to_remove:
+ print(f"Removing old {r} ..")
+ del sys.modules[r]
diff --git a/run_script.py b/run_script.py
index b366d75..b1a1615 100644
--- a/run_script.py
+++ b/run_script.py
@@ -8,8 +8,8 @@
from typing import List, Optional, Tuple
# max & min supported majors
-MIN_MAJOR = 7
-MAX_MAJOR = 8
+MIN_MAJOR = 8
+MAX_MAJOR = 9
def stderr_print(line: str):
@@ -40,20 +40,23 @@ def find_ida_Linux() -> Optional[str]:
# find in PATH
if "PATH" in os.environ:
for path in os.environ["PATH"].split(":"):
- if os.path.exists(os.path.join(path, "idat64")):
+ if os.path.exists(os.path.join(path, "idat64")) or os.path.exists(os.path.join(path, "idat")):
return path
# find in default location
for major in range(MAX_MAJOR, MIN_MAJOR - 1, -1):
- for minor in range(9, 0, -1):
- current = "%s/idapro-%d.%d" % (os.environ["HOME"], major, minor)
- if os.path.exists(current):
- return current
+ for minor in range(9, -1, -1):
+ p1 = "%s/idapro-%d.%d" % (os.environ["HOME"], major, minor)
+ p2 = "%s/ida-pro-%d.%d" % (os.environ["HOME"], major, minor)
+ if os.path.exists(p1):
+ return p1
+ if os.path.exists(p2):
+ return p2
return None
# find idat executables
-def find_idat() -> Tuple[str, str]:
+def find_idat() -> Tuple[Optional[str], str]:
ida_dir = None
# user defined IDA path
@@ -73,18 +76,23 @@ def find_idat() -> Tuple[str, str]:
print(f'Using IDA installation: "{ida_dir}"')
suffix = ".exe" if sys.platform == "win32" else ""
- ida32 = os.path.join(ida_dir, "idat" + suffix)
- ida64 = os.path.join(ida_dir, "idat64" + suffix)
+ idat = os.path.join(ida_dir, "idat" + suffix)
+ idat64 = os.path.join(ida_dir, "idat64" + suffix)
- if not os.path.isfile(ida32):
+ if not (os.path.isfile(idat) or os.path.isfile(idat64)):
stderr_print('Missing idat%s in "%s"' % (suffix, ida_dir))
return None
- if not os.path.isfile(ida64):
- stderr_print('Missing idat64%s in "%s"' % (suffix, ida_dir))
- return None
+ # earliest IDA 9 version - only idat64
+ if not os.path.isfile(idat):
+ return (None, idat64)
+
+ # IDA 9 + - only idat
+ if not os.path.isfile(idat64):
+ return (None, idat)
- return (ida32, ida64)
+ # IDA 8 or earlier
+ return (idat, idat64)
# craft IDA batch command
@@ -119,7 +127,7 @@ def run_ida_batchmode(idat: str, filepath: str) -> int:
# Create .idb from 32 bits executable or .i64 from 64 bits exe
def make_idb(ida_install: tuple, filepath: str) -> Tuple[str, int]:
- if run_ida_batchmode(ida_install[0], filepath) == 0:
+ if ida_install[0] and run_ida_batchmode(ida_install[0], filepath) == 0:
return (f"{filepath}.idb", 0)
# 32 bits analysis failed, try 64 bits mode
diff --git a/scripts/ctors.py b/scripts/ctors.py
index 5b431ee..68a6bf1 100644
--- a/scripts/ctors.py
+++ b/scripts/ctors.py
@@ -1,3 +1,4 @@
+import argparse
import inspect
import os
import re
@@ -19,6 +20,10 @@
""" Debug script - Find ctors/dtors in binary """
if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--prefix", type=str, default="")
+ args = parser.parse_args(idc.ARGV[1:])
+
# wait for autoanalysis, we'll need its results
idaapi.auto_wait()
@@ -26,10 +31,11 @@
i = 0
for vtbl in families:
- print("Family %x:" % vtbl)
+ print("%sFamily 0x%x:" % (args.prefix, vtbl))
for ctor in families[vtbl]:
- name = ida_utils.demangle_ea(ctor)
+ fea = ctor.func.start_ea
+ name = ida_utils.demangle_ea(fea)
match = re_ctors.match(name)
if match is None:
@@ -44,8 +50,8 @@
else:
typ = "[CONSTRUCTOR]"
- print(" %s %x -> %s" % (typ, ctor, name))
- print()
+ print("%s %s 0x%x -> %s" % (args.prefix, typ, fea, name))
+ print(args.prefix)
i += 1
diff --git a/scripts/dump.py b/scripts/dump.py
index 6ec5a13..a6b3584 100644
--- a/scripts/dump.py
+++ b/scripts/dump.py
@@ -4,6 +4,8 @@
import idautils
import idc
+""" Dump all information found in one database (types, functions, ..) """
+
def report(output: str = ""):
if len(output) == 0:
@@ -17,7 +19,10 @@ def report(output: str = ""):
def dump_functions() -> dict:
out = {"total": 0}
- for fea in idautils.Functions():
+ all_fcts = [fea for fea in idautils.Functions()]
+ all_fcts.sort()
+
+ for fea in all_fcts:
# only print user defined function types
if not idaapi.is_userti(fea):
continue
@@ -85,10 +90,10 @@ def dump_structures() -> dict:
struc = idaapi.get_struc(sid)
# do not dump hidden structs
- if struc.props & idaapi.SF_HIDDEN:
- continue
+ # if struc.props & idaapi.SF_HIDDEN:
+ # continue
- is_vtable = name.endswith("_vtbl")
+ is_vtable = "_vtbl" in name
if is_vtable:
out["total vtables"] += 1
@@ -136,7 +141,7 @@ def dump_local_types() -> dict:
idati = idaapi.get_idati()
- count = idaapi.get_ordinal_qty(idati)
+ count = idaapi.get_ordinal_count(idati)
if count == 0 or count == 0xFFFFFFFF:
return
@@ -150,7 +155,7 @@ def dump_local_types() -> dict:
# sort by name
types.sort(key=lambda k: str(k[1]))
- for ordinal, tinfo in types:
+ for _, tinfo in types:
# do not print types imported as structures
name = str(tinfo)
if idaapi.get_struc_id(name) != idaapi.BADADDR:
diff --git a/scripts/entries.py b/scripts/entries.py
index c2aca18..e83dd98 100644
--- a/scripts/entries.py
+++ b/scripts/entries.py
@@ -12,7 +12,8 @@
import symless.allocators as allocators
import symless.model.entrypoints as entrypoints
-import symless.model.model as model
+
+# import symless.model.model as model
""" Debug script - Get all entrypoints (structures creations) identified in one binary """
@@ -26,15 +27,15 @@
config_path = os.path.abspath(os.path.join(symless_dir, "symless", "config", "imports.csv"))
imports = allocators.get_allocators(config_path)
- if imports is None:
+    if not imports:
print("%sNo allocators identified" % args.prefix)
- imports = list()
+ idc.qexit(0)
# get initial entrypoints
ctx = entrypoints.retrieve_entrypoints(imports)
# build entries tree
- model.analyze_entrypoints(ctx)
+ # model.analyze_entrypoints(ctx)
entries = ctx.get_entrypoints()
allocs = ctx.get_allocators()
diff --git a/scripts/functions.py b/scripts/functions.py
deleted file mode 100644
index da2b5d4..0000000
--- a/scripts/functions.py
+++ /dev/null
@@ -1,47 +0,0 @@
-import argparse
-import inspect
-import os
-import sys
-
-import idaapi
-import idc
-
-# add symless dir to search path
-symless_dir = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(inspect.getsourcefile(lambda: 0))), ".."))
-sys.path.append(symless_dir)
-
-import symless.allocators as allocators
-import symless.model.entrypoints as entrypoints
-import symless.model.model as model
-import symless.utils.ida_utils as ida_utils
-
-""" Debug script - Dump information about analyzed functions """
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser()
- parser.add_argument("--prefix", type=str, default="")
- args = parser.parse_args(idc.ARGV[1:])
-
- idaapi.auto_wait()
-
- config_path = os.path.abspath(os.path.join(symless_dir, "symless", "config", "imports.csv"))
-
- imports = allocators.get_allocators(config_path)
- if imports is None:
- print("%sNo allocators identified" % args.prefix)
- imports = list()
-
- # get initial entrypoints
- ctx = entrypoints.retrieve_entrypoints(imports)
-
- # build entries tree
- model.analyze_entrypoints(ctx)
- entries = ctx.get_entrypoints()
- allocs = ctx.get_allocators()
-
- # dump analyzed functions
- for fct in ctx.get_functions():
- fct_name = ida_utils.demangle_ea(fct.ea).split("(")[0]
- print("%s%s (0x%x), at least %d args" % (args.prefix, fct_name, fct.ea, fct.get_nargs()))
-
-idc.qexit(0)
diff --git a/scripts/vtables.py b/scripts/vtables.py
index 0123524..41d0976 100644
--- a/scripts/vtables.py
+++ b/scripts/vtables.py
@@ -1,7 +1,9 @@
+import argparse
import inspect
import os
import sys
+import idaapi
import idc
# add symless dir to search path
@@ -9,20 +11,29 @@
sys.path.append(symless_dir)
import symless.utils.ida_utils as ida_utils
+import symless.utils.vtables as vtables
""" Debug script - Scans binary for vtables """
if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--prefix", type=str, default="")
+ args = parser.parse_args(idc.ARGV[1:])
+
+ idaapi.auto_wait()
+
stats = [0, 0]
- for vtbl_ref, vtbl_addr in ida_utils.get_all_vtables():
- name = ida_utils.demangle_ea(vtbl_addr)
- size = ida_utils.vtable_size(vtbl_addr)
- print("%x ref for %x (size: 0x%x) -> %s" % (vtbl_ref, vtbl_addr, size, name))
+ for vtbl in vtables.get_all_vtables():
+ name = ida_utils.demangle_ea(vtbl.ea)
+ print("%s0x%x (size: 0x%x) -> %s" % (args.prefix, vtbl.ea, vtbl.size(), name))
+
+ for x in vtbl.get_loads():
+ print("%s\tload @ 0x%x" % (args.prefix, x))
stats[1] += 1
stats[0] += 1 if "vftable" in name else 0
- print("Total: %d, corrects (for sure): %d" % (stats[1], stats[0]))
+ print("%sTotal: %d, corrects (from symbols): %d" % (args.prefix, stats[1], stats[0]))
idc.qexit(0)
diff --git a/symless.py b/symless.py
index 14681c7..1a46ecb 100644
--- a/symless.py
+++ b/symless.py
@@ -17,8 +17,6 @@ def ida_main():
args = parser.parse_args(idc.ARGV[1:])
start_analysis(args.config)
- idc.qexit(0)
-
""" Command line main """
@@ -72,3 +70,5 @@ def cmd_main():
from symless.main import start_analysis
ida_main() # script run from IDA
+
+ idc.qexit(0)
diff --git a/symless/__init__.py b/symless/__init__.py
index 1e535c1..c5a20d6 100644
--- a/symless/__init__.py
+++ b/symless/__init__.py
@@ -1,3 +1,3 @@
# plugin info
-PLUGIN_VERSION = 1.0
-PLUGIN_DESC = "Structure building helper"
+PLUGIN_VERSION = 1.1
+PLUGIN_DESC = "Structure reconstruction assistant"
diff --git a/symless/allocators.py b/symless/allocators.py
index 09f2822..03b4d8f 100644
--- a/symless/allocators.py
+++ b/symless/allocators.py
@@ -1,6 +1,6 @@
import enum
import re
-from typing import List, Tuple
+from typing import Any, List, Tuple
import idaapi
@@ -9,18 +9,17 @@
import symless.utils.utils as utils
# do not consider alloc bigger than this to be object allocs
-g_max_alloc = 0xFFFFFF
+g_max_alloc = 0x4000
def valid_size(size: int):
- return size > 0 and size <= g_max_alloc
+ return size > 0 and size < g_max_alloc
class alloc_action_t(enum.Enum): # allocator action
STATIC_ALLOCATION = 0 # malloc(n)
WRAPPED_ALLOCATOR = 1 # func(x) -> return malloc(x)
- JUMP_TO_ALLOCATOR = 2 # func(x) -> jump malloc
- UNDEFINED = 3
+ UNDEFINED = 2
# a heap allocation function
@@ -43,7 +42,7 @@ def next_index(self) -> int:
def get_name(self) -> str:
return f"{self.type}_like_{self.index:x}"
- def get_child(self, ea: int, args: tuple):
+ def get_child(self, ea: int, args: tuple) -> "allocator_t":
child = self.__class__.__new__(self.__class__) # is there a nicer way to do this ?
child.__init__(ea, *args)
return child
@@ -53,13 +52,12 @@ def make_type(self, func_data: idaapi.func_type_data_t):
pass
# what type of allocation for given state + allocation size for STATIC_ALLOCATION
- def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, int]:
+ def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, Any]:
return (alloc_action_t.UNDEFINED, 0)
- # for WRAPPED_ALLOCATOR action, does wrapper ret confirm it is a wrapper
+    # for WRAPPED_ALLOCATOR action, does the wrapper's return value confirm it is a wrapper
def on_wrapper_ret(self, state: cpustate.state_t, call_ea: int) -> bool:
- ret_val = state.ret.code
- if isinstance(ret_val, cpustate.call_t) and ret_val.where == call_ea:
+ if isinstance(state.ret, cpustate.call_t) and state.ret.where == call_ea:
return True
return False
@@ -80,24 +78,16 @@ def __init__(self, ea: int, size_index: int = 0):
allocator_t.__init__(self, ea, "malloc")
self.size_index = size_index
- def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, int]:
- is_jump = state.call_type == cpustate.call_type_t.JUMP
+ def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, Any]:
+ if len(state.call_args) <= self.size_index:
+ return (alloc_action_t.UNDEFINED, 0)
- # size parameter
- arg = cpustate.get_argument(cpustate.get_default_cc(), state, self.size_index, False, is_jump)
-
- # size argument comes from wrapper arguments, wrapper might be an allocator
+ arg = state.call_args[self.size_index]
if isinstance(arg, cpustate.arg_t):
index = arg.idx
-
- if is_jump:
- return (alloc_action_t.JUMP_TO_ALLOCATOR, (index,))
-
return (alloc_action_t.WRAPPED_ALLOCATOR, (index,))
-
- # static size - memory allocation
- if isinstance(arg, cpustate.int_t) and valid_size(arg.get_val()):
- return (alloc_action_t.STATIC_ALLOCATION, arg.get_val())
+ elif isinstance(arg, cpustate.int_t) and valid_size(arg.get_uval()):
+ return (alloc_action_t.STATIC_ALLOCATION, arg.get_uval())
return (alloc_action_t.UNDEFINED, 0)
@@ -121,28 +111,23 @@ def __init__(self, ea: int, count_index: int = 0, size_index: int = 1):
self.count_index = count_index
self.size_index = size_index
- def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, int]:
- is_jump = state.call_type == cpustate.call_type_t.JUMP
-
- count_arg = cpustate.get_argument(cpustate.get_default_cc(), state, self.count_index, False, is_jump)
- size_arg = cpustate.get_argument(cpustate.get_default_cc(), state, self.size_index, False, is_jump)
+ def on_call(self, state: cpustate.state_t) -> Tuple[alloc_action_t, Any]:
+ if len(state.call_args) <= max(self.size_index, self.count_index):
+ return (alloc_action_t.UNDEFINED, 0)
+ count_arg = state.call_args[self.count_index]
+ size_arg = state.call_args[self.size_index]
if isinstance(count_arg, cpustate.arg_t) and isinstance(size_arg, cpustate.arg_t):
count_index = count_arg.idx
size_index = size_arg.idx
-
- if is_jump:
- return (alloc_action_t.JUMP_TO_ALLOCATOR, (count_index, size_index))
-
return (alloc_action_t.WRAPPED_ALLOCATOR, (count_index, size_index))
-
- if (
+ elif (
isinstance(count_arg, cpustate.int_t)
- and valid_size(count_arg.get_val())
+ and valid_size(count_arg.get_uval())
and isinstance(size_arg, cpustate.int_t)
- and valid_size(size_arg.get_val())
+ and valid_size(size_arg.get_uval())
):
- size = count_arg.get_val() * size_arg.get_val()
+ size = count_arg.get_uval() * size_arg.get_uval()
return (alloc_action_t.STATIC_ALLOCATION, size)
return (alloc_action_t.UNDEFINED, 0)
diff --git a/symless/config/__init__.py b/symless/config/__init__.py
index b1bb173..7865c6d 100644
--- a/symless/config/__init__.py
+++ b/symless/config/__init__.py
@@ -32,6 +32,7 @@ def initialize(self, config_file: str):
settings = json.load(config)
for key, value in settings.items():
self.__setattr__(key, value)
+ self.debug = self.log_level <= LOG_LEVEL_DEBUG
# global settings variable
diff --git a/symless/config/config.json b/symless/config/config.json
index 605b63c..32651ed 100644
--- a/symless/config/config.json
+++ b/symless/config/config.json
@@ -1,5 +1,4 @@
{
- "log_level": 20,
- "debug": true,
+ "log_level": 30,
"rebase_db" : true
}
diff --git a/symless/config/imports.csv b/symless/config/imports.csv
index a7c8dce..2a5276c 100644
--- a/symless/config/imports.csv
+++ b/symless/config/imports.csv
@@ -36,6 +36,8 @@ kernel32, VirtualAllocEx, malloc(2)
kernel32, VirtualAllocExNuma, malloc(2)
kernel32, VirtualAllocFromApp, malloc(1)
+KERNEL32, HeapAlloc, malloc(2)
+
# Windows kernel
ntoskrnl, ExAllocatePool, malloc(1)
ntoskrnl, ExAllocatePool2, malloc(1)
@@ -58,3 +60,10 @@ ntoskrnl, ExAllocatePoolWithTagPriority, malloc(1)
.dynsym, _Znwj@@GLIBCXX_, malloc # operator new(uint)
.dynsym, _Znwm@@GLIBCXX_, malloc # operator new(ulong)
+
+libstdc++-6, _Znwy, malloc # operator new(unsigned long long)
+
+# macOS libsystem
+/usr/lib/libSystem.B.dylib, _malloc, malloc
+/usr/lib/libSystem.B.dylib, _realloc, realloc
+/usr/lib/libSystem.B.dylib, _calloc, calloc
diff --git a/symless/conflict.py b/symless/conflict.py
index 2066348..4acc512 100644
--- a/symless/conflict.py
+++ b/symless/conflict.py
@@ -7,19 +7,16 @@
import symless.utils.utils as utils
-# for possible overlapping fields for a structure
-# which one to choose
+# when two possible fields are overlapping in a structure, select which to keep
def fields_conflicts_solver(field: generation.field_t, old_field: generation.field_t) -> bool:
- # care when replacing a typed field
if old_field.has_type():
- # do not untype our field
if not field.has_type():
return False
return old_field.replace(field)
- # default policy: replace
- return True
+ # default: keep smallest
+ return field.size <= old_field.size
# when converting from var field to field
@@ -32,8 +29,25 @@ def field_size_solver(field: model.field_t) -> int:
# define less derived structure between multiple structs
# the less derived should be the nearest from their common base
def less_derived(candidates: List[Tuple[generation.structure_t, int]]) -> Tuple[generation.structure_t, int]:
+ # search for smallest struc with smallest shift applied
candidates.sort(key=lambda i: (i[1], i[0].get_size()))
- return candidates[0]
+ chosen = candidates[0]
+
+ __vftable = chosen[0].get_field(0)
+ if not isinstance(__vftable, generation.vtbl_ptr_field_t):
+ return chosen
+
+ # if we still have multiple candidates left, and they are cpp classes
+ # find the one with the less derived associated vtable
+ for struc, shift in filter(lambda c: c[1] == chosen[1] and c[0].get_size() == chosen[0].get_size(), candidates[1:]):
+ __vftable_2 = struc.get_field(0)
+ if not isinstance(__vftable_2, generation.vtbl_ptr_field_t):
+ continue
+
+ if __vftable.value.get_most_derived(__vftable_2.value) == __vftable.value:
+ chosen = (struc, shift)
+
+ return chosen
# find which class a vtable belongs to
@@ -113,11 +127,6 @@ def get_conflicting_structures(
# can we consider two structures to be duplicates & merge them
def are_structures_identical(one: generation.structure_t, other: generation.structure_t) -> bool:
- # even if computed size is not always right
- # consider different sized structures to be different
- if one.get_size() != other.get_size():
- return False
-
i, j = 0, 0
while i < len(one.range) and j < len(other.range):
off_one, size_one = one.range[i][0], one.range[i][1]
@@ -159,6 +168,10 @@ def are_structures_identical(one: generation.structure_t, other: generation.stru
i += 1
j += 1
+ # if not at the end of both structures, they are different
+ if i < len(one.range) or j < len(other.range):
+ return False
+
return True
@@ -181,7 +194,7 @@ def merge_structures(src: generation.structure_t, dst: generation.structure_t, e
entry.set_structure(shift, dst)
-# find & merge duplicated structures
+# try to find & merge duplicated structures
def remove_dupes(entries: model.entry_record_t, structures: generation.structure_record_t):
# find conflicting structures
dupe_conflicts = get_conflicting_structures(entries)
diff --git a/symless/cpustate/__init__.py b/symless/cpustate/__init__.py
index 1b3e15b..3cefac1 100644
--- a/symless/cpustate/__init__.py
+++ b/symless/cpustate/__init__.py
@@ -1,289 +1,20 @@
import copy
-import ctypes
import enum
-from typing import Dict, Generator, List, Optional, Tuple
+from typing import Collection, Dict, Generator, List, Optional
+import ida_hexrays
import idaapi
-import idc
-
-###################
-# CPU definitions #
-###################
-
-
-# convert ida op_type to string
-def op_type_str(op_type: int) -> str:
- if op_type == idaapi.o_void:
- return "void"
- if op_type == idaapi.o_reg:
- return "reg"
- if op_type == idaapi.o_mem:
- return "mem"
- if op_type == idaapi.o_phrase:
- return "phrase"
- if op_type == idaapi.o_displ:
- return "disp"
- if op_type == idaapi.o_imm:
- return "imm"
- if op_type == idaapi.o_far:
- return "far"
- if op_type == idaapi.o_near:
- return "near"
-
-
-# convert size in bytes to string
-def to_size(nbytes: int) -> str:
- if nbytes == 1:
- return "u8"
- if nbytes == 2:
- return "u16"
- if nbytes == 4:
- return "u32"
- if nbytes == 8:
- return "u64"
- if nbytes == 16:
- return "u128"
- return "invalid"
-
-
-# ida x64 register names
-X64_REGISTERS = {
- 0: "rax",
- 1: "rcx",
- 2: "rdx",
- 3: "rbx",
- 4: "rsp",
- 5: "rbp",
- 6: "rsi",
- 7: "rdi",
- 8: "r8",
- 9: "r9",
- 10: "r10",
- 11: "r11",
- 12: "r12",
- 13: "r13",
- 14: "r14",
- 15: "r15",
- 29: "es",
- 30: "cs",
- 31: "ss",
- 32: "ds",
- 33: "fs",
- 34: "gs",
- 56: "mm0",
- 57: "mm1",
- 58: "mm2",
- 59: "mm3",
- 60: "mm4",
- 61: "mm5",
- 62: "mm6",
- 63: "mm7",
- 64: "xmm0",
- 65: "xmm1",
- 66: "xmm2",
- 67: "xmm3",
- 68: "xmm4",
- 69: "xmm5",
- 70: "xmm6",
- 71: "xmm7",
- 72: "xmm8",
- 73: "xmm9",
- 74: "xmm10",
- 75: "xmm11",
- 76: "xmm12",
- 77: "xmm13",
- 78: "xmm14",
- 79: "xmm15",
- 81: "ymmm0",
- 82: "ymmm1",
- 83: "ymmm2",
- 84: "ymmm3",
- 85: "ymmm4",
- 86: "ymmm5",
- 87: "ymmm6",
- 88: "ymmm7",
- 89: "ymmm8",
- 90: "ymmm9",
- 91: "ymmm10",
- 92: "ymmm11",
- 93: "ymmm12",
- 94: "ymmm13",
- 95: "ymmm14",
- 96: "ymmm15",
-}
-
-
-# convert ida register to string
-def reg_string(reg: int) -> str:
- return X64_REGISTERS[reg]
-
-
-INSN_MOVES = [idaapi.NN_mov, idaapi.NN_movups, idaapi.NN_movdqu]
-
-INSN_MATH = [idaapi.NN_add, idaapi.NN_or, idaapi.NN_sub]
-
-INSN_MATHS = [idaapi.NN_add, idaapi.NN_sub]
-
-INSN_XORS = [idaapi.NN_xor]
-
-INSN_ANDS = [idaapi.NN_and]
-
-INSN_CALLS = [idaapi.NN_call, idaapi.NN_callfi, idaapi.NN_callni]
-
-INSN_RETS = [idaapi.NN_retn, idaapi.NN_retf]
-
-INSN_JUMPS = [
- idaapi.NN_ja,
- idaapi.NN_jae,
- idaapi.NN_jb,
- idaapi.NN_jbe,
- idaapi.NN_jc,
- idaapi.NN_jcxz,
- idaapi.NN_je,
- idaapi.NN_jecxz,
- idaapi.NN_jg,
- idaapi.NN_jge,
- idaapi.NN_jl,
- idaapi.NN_jle,
- idaapi.NN_jmp,
- idaapi.NN_jmpfi,
- idaapi.NN_jmpni,
- idaapi.NN_jmpshort,
- idaapi.NN_jna,
- idaapi.NN_jnae,
- idaapi.NN_jnb,
- idaapi.NN_jnbe,
- idaapi.NN_jnc,
- idaapi.NN_jne,
- idaapi.NN_jng,
- idaapi.NN_jnge,
- idaapi.NN_jnl,
- idaapi.NN_jnle,
- idaapi.NN_jno,
- idaapi.NN_jnp,
- idaapi.NN_jns,
- idaapi.NN_jnz,
- idaapi.NN_jo,
- idaapi.NN_jp,
- idaapi.NN_jpe,
- idaapi.NN_jpo,
- idaapi.NN_jrcxz,
- idaapi.NN_js,
- idaapi.NN_jz,
-]
-
-INSN_UNCONDITIONAL_JUMPS = [idaapi.NN_jmp, idaapi.NN_jmpfi, idaapi.NN_jmpni]
-
-INSN_TESTS = [idaapi.NN_test]
-
-INSN_CMPS = [idaapi.NN_cmp]
-
-INSN_LEAS = [idaapi.NN_lea]
-
-
-# for operand [rax + rcx*scale + disp] get base reg (rax)
-def x64_base_reg(insn: idaapi.insn_t, op: idaapi.op_t) -> int:
- if op.specflag1 == 0: # no SIB in op
- return op.phrase
-
- base = op.specflag2 & 7
-
- # REX byte, 64-bytes mode
- if insn.insnpref & 1: # sid base extension
- base |= 8
-
- return base
-
-
-x86_INDEX_NONE = 4
-
-
-# for operand [rax + rcx*scale + disp] get index reg (rcx)
-def x64_index_reg(insn: idaapi.insn_t, op: idaapi.op_t) -> int:
- if op.specflag1 == 0:
- return x86_INDEX_NONE
-
- index = (op.specflag2 >> 3) & 7
- if insn.insnpref & 2: # sib index extension
- index |= 8
-
- return index
-
-
-# insn.itype to string
-def insn_itype_str(insn_itype: int) -> str:
- if insn_itype == idaapi.NN_lea:
- return "lea"
- if insn_itype == idaapi.NN_push:
- return "push"
- if insn_itype == idaapi.NN_pop:
- return "pop"
- if insn_itype in INSN_MOVES:
- return "mov"
- if insn_itype in INSN_MATH:
- return "math"
- if insn_itype in INSN_CALLS:
- return "call"
- if insn_itype in INSN_TESTS:
- return "test"
- if insn_itype in INSN_JUMPS:
- return "jump"
- if insn_itype in INSN_RETS:
- return "ret"
- return "invalid"
-
-
-# op.dtype to string
-def op_dtype_str(dtype: int) -> str:
- if dtype == idaapi.dt_byte:
- return "byte"
- if dtype == idaapi.dt_word:
- return "word"
- if dtype == idaapi.dt_dword:
- return "dword"
- if dtype == idaapi.dt_qword:
- return "qword"
- return ""
-
-
-# get instruction str representation
-def insn_str(insn: idaapi.insn_t) -> str:
- return f"insn:{insn.ea:x} type:{insn.itype} {insn_itype_str(insn.itype)} ({idc.generate_disasm_line(insn.ea, 0)})"
-
-
-# get operand str representation
-def op_str(op: idaapi.op_t) -> str:
- registers = [idaapi.o_reg, idaapi.o_displ]
- reg_suffix = " " + reg_string(op.reg) if op.type in registers else ""
- return f"op: type:{op_type_str(op.type)} reg:{op.reg}{reg_suffix} val:{op.value:x} ea:{op.addr:x} dtype:{op.dtype:x}:{op_dtype_str(op.dtype)}"
-
-
-# get instruction + operands representation
-def insn_str_full(insn: idaapi.insn_t) -> str:
- out = insn_str(insn)
- for op in insn.ops:
- if op.type == idaapi.o_void:
- break
- out += "\n\t" + op_str(op)
- return out
-
-
-# convert data to given size & sign
-def convert_imm(value: int, sizeof: int, signed: bool = True) -> int:
- mask = 1 << (sizeof * 8)
- out = value & (mask - 1)
- if signed and (out & (mask >> 1)):
- out -= mask
- return out
+import symless.utils.ida_utils as ida_utils
+import symless.utils.utils as utils
########################
# CPU state definition #
########################
-# represents an instruction's operand
-# implementations of this class define the operand's type & value
+# abstract operand - defines a micro-operand type & possible value
+# used to track value propagation between variables
class absop_t:
# should this operand value be transferred from a caller to a callee as an arg
def should_dive(self) -> bool:
@@ -309,29 +40,7 @@ def __hash__(self) -> int:
return self.where
def __repr__(self):
- return f"call:0x{self.arg:x} @0x{self.where:x}"
-
-
-# displacement operand
-class disp_t(absop_t):
- def __init__(self, reg: int, offset: int, nbytes: int):
- self.reg = reg
- self.offset = offset
- self.nbytes = nbytes
-
- def __eq__(self, other) -> bool:
- return (
- isinstance(other, disp_t)
- and self.reg == other.reg
- and self.offset == other.offset
- and self.nbytes == other.nbytes
- )
-
- def __hash__(self) -> int:
- return hash((self.reg, self.offset, self.nbytes))
-
- def __repr__(self):
- return "%s[%s+0x%x]" % (to_size(self.nbytes), reg_string(self.reg), self.offset)
+ return f"call:0x{self.arg:x}@0x{self.where:x}"
# (unknown) function argument operand
@@ -340,7 +49,7 @@ def __init__(self, idx: int):
self.idx = idx
def should_dive(self) -> bool:
- return False
+ return False # a caller's arg is not also a callee's arg
def __eq__(self, other) -> bool:
return isinstance(other, arg_t) and self.idx == other.idx
@@ -357,49 +66,45 @@ class buff_t(absop_t):
def __init__(self, shift: int = 0):
self.shift = shift
- def offset(self, shift: int):
- return self.clone(ctypes.c_int32(self.shift + shift).value)
-
-
-# structure pointer
-class sid_t(buff_t):
- def __init__(self, sid, shift=0):
- super().__init__(shift)
- self.sid = sid
+ def shift_by(self, add: int, size: int) -> "buff_t":
+ out = copy.copy(self)
+ out.shift = utils.to_c_integer(out.shift + add, size)
+ return out
- def clone(self, shift: int):
- return sid_t(self.sid, shift)
- def should_dive(self) -> bool:
- return False
+# a pointer dereference (e.g. access to an object's field)
+class deref_t(absop_t):
+ def __init__(self, ptr: Optional[absop_t], size: int):
+ self.ptr = ptr # pointer being dereferenced
+ self.size = size
def __eq__(self, other) -> bool:
- return isinstance(other, sid_t) and self.sid == other.sid and self.shift == other.shift
+ return isinstance(other, deref_t) and self.ptr == other.ptr and self.size == other.size
def __hash__(self) -> int:
- return hash((self.sid, self.shift))
+ return hash((self.ptr, self.size))
def __repr__(self):
- return f"sid:0x{self.sid:x}+0x{self.shift:x}"
+ return f"[{self.ptr}:{self.size:#x}]"
-# stack pointer
-class stack_ptr_t(buff_t):
- def clone(self, shift: int):
- return stack_ptr_t(shift)
+# structure pointer
+class sid_t(buff_t):
+ def __init__(self, sid, shift=0):
+ super().__init__(shift)
+ self.sid = sid
- # stack tracking is local to function
def should_dive(self) -> bool:
- return False
+ return False # sid represents an entrypoint -> local to a function
def __eq__(self, other) -> bool:
- return isinstance(other, stack_ptr_t) and self.shift == other.shift
+ return isinstance(other, sid_t) and self.sid == other.sid and self.shift == other.shift
def __hash__(self) -> int:
- return self.shift
+ return hash((self.sid, self.shift))
def __repr__(self):
- return f"stack_ptr:0x{self.shift:x}"
+ return f"sid:0x{self.sid:x}+0x{self.shift:x}"
# immediate operand, supports the same operations as buff_t
@@ -408,13 +113,13 @@ def __init__(self, val: int, sizeof: int):
super().__init__(val)
self.size = sizeof
- self.shift = convert_imm(self.shift, self.size, False) # keep int_t unsigned
+ self.shift = utils.to_c_integer(self.shift, self.size)
def get_val(self) -> int:
return self.shift
- def clone(self, shift: int):
- return int_t(shift, self.size)
+ def get_uval(self) -> int: # unsigned value
+ return utils.to_c_integer(self.shift, self.size, False)
def __eq__(self, other) -> bool:
return isinstance(other, int_t) and self.shift == other.shift and self.size == other.size
@@ -423,312 +128,270 @@ def __hash__(self) -> int:
return hash((self.shift, self.size))
def __repr__(self):
- return f"int32:0x{self.get_val():x} ({self.get_val()})"
+ return f"int{self.size*8}:0x{self.get_uval():x}"
-# memory operand
+# a memory address or a value read @ addr
class mem_t(int_t):
def __init__(self, value: int, addr: int, sizeof: int):
super().__init__(value, sizeof)
self.addr = addr
- def clone(self, shift: int):
- return mem_t(shift, self.addr, self.size)
-
def __repr__(self):
- return f"mem:0x{self.addr:x}:0x{self.get_val():x}"
-
-
-# registers values for given cpu state
-class registers_t:
- def __init__(self):
- pass
+ return f"mem:0x{self.addr:x}:0x{self.get_uval():x}"
# memory write
class write_t:
- def __init__(self, ea: int, disp: disp_t, src: absop_t):
- self.ea = ea
- self.disp = disp
- self.src = src
+ def __init__(self, ea: int, target: Optional[absop_t], size: int, value: Optional[absop_t]):
+ self.ea = ea # ea of the write
+ self.target = target # write dst
+ self.size = size # write size
+ self.value = value # written value
def __repr__(self):
- return "0x%x %r=%r" % (self.ea, self.disp, self.src)
+ return f"{self.ea:#x} u{self.size*8}[{self.target}]={self.value}"
# memory read
class read_t:
- def __init__(self, ea: int, disp: disp_t, dst: int):
- self.ea = ea
- self.disp = disp
- self.dst = dst
+ def __init__(self, ea: int, target: Optional[absop_t], size: int, dst: ida_hexrays.mop_t):
+ self.ea = ea # ea of the read
+ self.target = target # read src
+ self.size = size # read size
+ self.dst = dst # dst operand of the read, not copied
def __repr__(self):
- return "0x%x %r=%r" % (self.ea, X64_REGISTERS[self.dst], self.disp)
+ dstname = (
+ ida_hexrays.get_mreg_name(self.dst.r, self.size)
+ if self.dst.t == ida_hexrays.mop_r
+ else f"stk:{self.dst.s.off:x}"
+ )
+ return f"{self.ea:#x} {dstname}=u{self.size*8}[{self.target}]"
# memory access
class access_t:
- def __init__(self, ea: int, op_index: int, key: disp_t):
- self.ea = ea
- self.op_index = op_index
- self.key = key
+ def __init__(self, ea: int, target: Optional[absop_t], loc: idaapi.mop_t, size: int):
+ self.ea = ea # ea for the access
+ self.target = target # target being accessed
+ self.size = size # access size
+ self.loc = loc # target operand, kept by reference (no copy); it must not get freed
def __repr__(self):
- return "0x%x key = %r op_index = %r" % (self.ea, self.key, self.op_index)
+ return f"{self.ea:#x} u{self.size*8}[{self.target}]"
-# function return value and address
-class ret_t:
- def __init__(self, code: absop_t, where: int):
- self.code = code
- self.where = where
+# a visited function
+class function_t:
+ def __init__(self, mba: ida_hexrays.mba_t):
+ self.ea = mba.entry_ea
- def __repr__(self):
- return "ret:%s at 0x%x" % (self.code, self.where)
-
-
-# function arguments count & calling convention guesser
-# focuses on sid_t arguments, our target
-class arguments_t:
- def __init__(self, state, cc):
- self.args: Dict[int, int] = dict() # id(args) -> index
- self.cc = cc # current function's cc (or default cc if unknown)
- self.guessed_args_count = -1 # args count - 1
- self.individual_validation = [False for i in range(cc.get_arg_count())]
-
- # record values of (potential) arguments
- for i in range(cc.get_arg_count()):
- arg = get_argument(cc, state, i)
- if arg is not None:
- self.args[id(arg)] = i
-
- # value has been used, if it comes from an arg validate it
- def validate(self, value: absop_t) -> bool:
- try:
- index = self.args[id(value)]
- self.guessed_args_count = max(self.guessed_args_count, index)
- self.individual_validation[index] = True
- return True
-
- except KeyError:
- return False
-
- # returns guessed (cc, start_arg, args_count) for current propagation
- def guess_cc(self):
- return self.cc.guess_function_cc(self.guessed_args_count + 1, self.individual_validation)
-
-
-# tracks the stack state
-class stack_t:
- def __init__(self):
- self.stack: Dict[int, absop_t] = dict() # offset -> value
-
- def push(self, shift: int, value: absop_t):
- self.stack[shift] = value
-
- def pop(self, shift: int) -> absop_t:
- if shift in self.stack:
- return self.stack[shift]
+ # location of function's return value
+ self.retloc: Optional[ida_hexrays.vdloc_t] = None
+
+ # location of function's arguments
+ self.argloc: List[ida_hexrays.vdloc_t] = list()
+
+ # function tinfo; force decompilation for an accurate argument count
+ finfo = ida_utils.get_fct_type(self.ea, True)
+ if not finfo:
+ return
+
+ fdata = idaapi.func_type_data_t()
+ if not finfo.get_func_details(fdata):
+ utils.g_logger.warning(f"No func_details for fea {self.ea:#x}")
+ return
+
+ # update retloc & arglocs
+ if fdata.retloc.atype() != idaapi.ALOC_NONE:
+ self.retloc = mba.idaloc2vd(fdata.retloc, ida_utils.get_ptr_size())
+
+ for arg in fdata:
+ self.argloc.append(mba.idaloc2vd(arg.argloc, ida_utils.get_ptr_size()))
+
+ def get_args_count(self) -> int:
+ return len(self.argloc)
+
+ def get_retloc(self) -> Optional[ida_hexrays.vdloc_t]:
+ return self.retloc
+
+ def get_argloc(self, idx: int) -> Optional[ida_hexrays.vdloc_t]:
+ if idx < self.get_args_count():
+ return self.argloc[idx]
return None
- def copy(self, origin: "stack_t"):
- self.stack = origin.stack.copy()
+ def __repr__(self):
+ return f"fct {hex(self.ea)} ({self.get_args_count()} args)"
-class call_type_t(enum.Enum):
- CALL = 0
- JUMP = 1
+# type of the state's last processed instruction
+class last_insn_type_t(enum.Enum):
+ LAST_INSN_ANY = 0
+ LAST_INSN_RET = 1
+ LAST_INSN_CALL = 2
-# a cpu state (stack, registers, ..)
+# a cpu state (stack, registers (variables), ..)
class state_t:
- def __init__(self, fct_ea: int = idaapi.BADADDR):
- self.fct_ea = fct_ea # function this state is for
-
- self.previous = registers_t() # registers before computing current insn
- self.registers = registers_t() # registers after computing current insn
-
- self.writes: List[write_t] = []
- self.reads: List[read_t] = []
- self.access: List[access_t] = []
-
- self.call_type: Optional[call_type_t] = None
- self.call_to: Optional[idaapi.func_t] = None
- self.ret: Optional[ret_t] = None
-
- # track the use of function's args
- self.arguments = None
-
- # stack tracker
- self.stack = stack_t()
- set_stack_ptr(self, stack_ptr_t())
-
- # must be called before use
- def reset_arguments(self, cc):
- self.arguments = arguments_t(self, cc)
-
- # drop register state
- def drop_register(self, reg: int):
- self.drop_register_str(reg_string(reg))
-
- def drop_register_str(self, reg: str):
- try:
- delattr(self.registers, reg)
- except AttributeError:
- pass
-
- # get register state, if any
- def get_register(self, reg: int) -> absop_t:
- return self.get_register_str(reg_string(reg))
-
- def get_register_str(self, reg: str, n: int = 0) -> absop_t:
- source = self.previous if n else self.registers
- try:
- return getattr(source, reg)
- except AttributeError:
- return None
-
- # save register state
- def set_register(self, reg: int, arg: absop_t):
- self.set_register_str(reg_string(reg), arg)
-
- def set_register_str(self, reg: str, arg: absop_t, n: int = 0):
- source = self.previous if n else self.registers
- try:
- setattr(source, reg, arg)
- except AttributeError:
- pass
-
- def get_previous_register(self, reg: int) -> absop_t:
- return self.get_register_str(reg_string(reg), 1)
-
- def get_registers(self) -> Generator[Tuple[int, absop_t], None, None]:
- for reg in vars(self.registers):
- yield (reg, self.get_register_str(reg))
+ def __init__(self, mba: ida_hexrays.mba_t, fct: Optional[function_t]):
+ self.mba = mba # microcode where the propagation takes place
+ self.fct = fct # owning function's model
+
+ # type of the last processed instruction
+ # we mostly care about function calls & ret
+ self.last_insn_type: last_insn_type_t = last_insn_type_t.LAST_INSN_ANY
+
+ # record current micro register values (mreg_t: value)
+ self.registers: Dict[int, absop_t] = {}
+
+ # record current stack variable values (index: value)
+ self.locals: Dict[int, absop_t] = {}
+
+ self.writes: List[write_t] = [] # writes performed by last insn
+ self.reads: List[read_t] = [] # reads performed by last insn
+ self.accesses: List[access_t] = [] # memory accesses performed by last insn
+
+ self.call_to: Optional[idaapi.func_t] = None # current call target
+ self.call_args: List[Optional[absop_t]] = [] # arguments for current call insn
+
+ self.ret: Optional[absop_t] = None # current ret value
+
+ # start ea for function in which we propagate
+ def get_fea(self) -> int:
+ return self.fct.ea
+
+ # get value for given mreg_t
+ def get_register(self, mreg: int) -> Optional[absop_t]:
+ return self.registers.get(mreg)
+
+ # set value for mreg_t
+ def set_register(self, mreg: int, value: Optional[absop_t]):
+ if value is not None:
+ self.registers[mreg] = value
+ else:
+ self.drop_register(mreg)
+
+ # drop recorded value for mreg_t
+ def drop_register(self, mreg: int):
+ self.registers.pop(mreg, None)
+
+ # get value for given stack variable
+ def get_local(self, idx: int) -> Optional[absop_t]:
+ return self.locals.get(idx)
+
+ # set value for stack variable
+ def set_local(self, idx: int, value: Optional[absop_t]):
+ if value is not None:
+ self.locals[idx] = value
+ else:
+ self.drop_local(idx)
+
+ # drop recorded stack variable
+ def drop_local(self, idx: int):
+ self.locals.pop(idx, None)
+
+ # get value for given micro operand
+ def get_var_from_mop(self, mop: ida_hexrays.mop_t) -> Optional[absop_t]:
+ if mop.t == ida_hexrays.mop_r:
+ return self.get_register(mop.r)
+ if mop.t == ida_hexrays.mop_S:
+ return self.get_local(mop.s.off)
+ utils.g_logger.warning(f"{ida_utils.g_mopt_name[mop.t]} operands not handled")
+ return None
+
+ # set value for given micro operand
+ def set_var_from_mop(self, mop: ida_hexrays.mop_t, value: Optional[absop_t]):
+ if mop.t == ida_hexrays.mop_r:
+ self.set_register(mop.r, value)
+ elif mop.t == ida_hexrays.mop_S:
+ self.set_local(mop.s.off, value)
+ else:
+ utils.g_logger.error(f"{ida_utils.g_mopt_name[mop.t]} operands not handled")
+
+ # drop var from given micro operand
+ def drop_var_from_mop(self, mop: ida_hexrays.mop_t):
+ if mop.t == ida_hexrays.mop_r:
+ self.drop_register(mop.r)
+ elif mop.t == ida_hexrays.mop_S:
+ self.drop_local(mop.s.off)
+ else:
+ utils.g_logger.info(f"{ida_utils.g_mopt_name[mop.t]} operands not handled")
+
+ # get value at the specified vd location (stack or register)
+ def get_var_from_loc(self, loc: ida_hexrays.vdloc_t) -> Optional[absop_t]:
+ if loc.is_reg1():
+ return self.get_register(loc.reg1())
+ if loc.is_stkoff():
+ return self.get_local(loc.stkoff())
+ return None
+
+ # set value at the specified vd location (stack or register)
+ def set_var_from_loc(self, loc: ida_hexrays.vdloc_t, value: Optional[absop_t]):
+ if loc.is_reg1():
+ self.set_register(loc.reg1(), value)
+ elif loc.is_stkoff():
+ self.set_local(loc.stkoff(), value)
+
+ # drop recorded values for kregs used to pass results between inlined minsns
+ def drop_kregs(self):
+ for kreg in self.mba.tmp_result_kregs:
+ self.drop_register(kreg)
+ self.drop_register(self.mba.call_result_kreg)
+
+ def get_vars(self) -> Generator[absop_t, None, None]:
+ for var in self.registers.values():
+ yield var
+ for var in self.locals.values():
+ yield var
def get_nb_types(self, wanted_type) -> int:
ret = 0
- for _, reg in self.get_registers():
- if type(reg) == wanted_type:
- ret += 1
+ for var in self.get_vars():
+ ret += int(isinstance(var, wanted_type))
return ret
- # prepare to transit to next state
+ # reset information about current insn
def reset(self):
+ self.last_insn_type = last_insn_type_t.LAST_INSN_ANY
self.writes.clear()
self.reads.clear()
- self.access.clear()
- self.ret: ret_t = None
+ self.accesses.clear()
self.call_to = None
- self.call_type = None
- self.previous = copy.copy(self.registers)
+ self.call_args.clear()
+ self.ret = None
# copy persistent content into another state
def copy(self) -> "state_t":
- out = state_t(self.fct_ea)
+ out = state_t(self.mba, self.fct)
out.registers = copy.copy(self.registers)
- out.stack.copy(self.stack)
-
- # keep same version of arguments tracking object
- out.arguments = self.arguments
+ out.locals = copy.copy(self.locals)
return out
# save write
- def write_to(self, ea: int, key: disp_t, src: absop_t):
- if src:
- self.writes.append(write_t(ea, key, src))
+ def write_to(self, ea: int, target: Optional[absop_t], loc: idaapi.mop_t, size: int, value: Optional[absop_t]):
+ self.access_to(ea, target, loc, size)
+ self.writes.append(write_t(ea, target, size, value))
# save read
- def read_from(self, ea: int, disp: disp_t, dst: int):
- self.reads.append(read_t(ea, disp, dst))
+ def read_from(self, ea: int, target: Optional[absop_t], loc: idaapi.mop_t, size: int, dst: ida_hexrays.mop_t):
+ self.access_to(ea, target, loc, size)
+ self.reads.append(read_t(ea, target, size, dst))
# save access
- def access_to(self, ea: int, n: int, key: disp_t):
- self.access.append(access_t(ea, n, key))
-
- # save ret
- def save_ret(self, where: int):
- item = get_ret_value(self)
- if item:
- self.ret = ret_t(item, where)
-
- # cpu state representation
- def __repr__(self):
- regs = []
- for k in sorted(vars(self.registers)):
- regs.append(f"{k}:{getattr(self.registers, k)}")
- str_call_to = f" call_to start_ea {self.call_to.start_ea}" if self.call_to is not None else ""
- return " ".join(regs) + str_call_to
-
-
-#####################
-# Arch specific ops #
-#####################
-
-import symless.cpustate.arch as arch
+ def access_to(self, ea: int, target: Optional[absop_t], loc: idaapi.mop_t, size: int):
+ self.accesses.append(access_t(ea, target, loc, size))
-# global calling convention & abi
-g_abi = None
+ # state contains call info from last call instruction
+ def has_call_info(self) -> bool:
+ return self.last_insn_type == last_insn_type_t.LAST_INSN_CALL
+ # state contains ret info from last function ret
+ def has_ret_info(self) -> bool:
+ return self.last_insn_type == last_insn_type_t.LAST_INSN_RET
-def get_abi() -> arch.abi_t:
- global g_abi
-
- if g_abi is None:
- g_abi = arch.get_abi()
- return g_abi
-
-
-# default calling convention to use on basic functions
-def get_default_cc() -> arch.abi_t:
- return get_abi().get_default_cc()
-
-
-# cc to use for class methods
-def get_object_cc() -> arch.abi_t:
- return get_abi().get_object_cc()
-
-
-# set value of stack ptr in given state_t
-def set_stack_ptr(state: state_t, value: absop_t):
- get_abi().set_stack_ptr(state, value)
-
-
-# get stack ptr value in given state_t
-def get_stack_ptr(state: state_t) -> absop_t:
- return get_abi().get_stack_ptr(state)
-
-
-# set ret register value
-def set_ret_value(state: state_t, value: absop_t):
- get_abi().set_ret_value(state, value)
-
-
-# get ret register value
-def get_ret_value(state: state_t) -> absop_t:
- return get_abi().get_ret_value(state)
-
-
-# set argument at given index in given state using given cc
-def set_argument(
- cc: arch.abi_t,
- state: state_t,
- index: int,
- value: absop_t,
- from_callee: bool = True,
- is_jump: bool = False,
-):
- if is_jump: # jmp, always from caller state
- cc.set_jump_argument(state, index, value)
- else:
- cc.set_argument(state, index, value, from_callee)
-
-
-# get argument at given index in given state using given cc
-def get_argument(
- cc: arch.abi_t, state: state_t, index: int, from_callee: bool = True, is_jump: bool = False
-) -> absop_t:
- if is_jump:
- return cc.get_jump_argument(state, index)
- return cc.get_argument(state, index, from_callee)
+ # cpu state representation
+ def __repr__(self) -> str:
+ regs = ", ".join([f"{idaapi.get_mreg_name(r, 8)}({v})" for r, v in self.registers.items()])
+ lcls = ", ".join([f"{loc:#x}({val})" for loc, val in sorted(self.locals.items(), key=lambda k: k[0])])
+ return f"[regs: {regs}], [stack: {lcls}]"
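For reference, the `convert_imm` helper removed above is superseded by `utils.to_c_integer`, which is assumed here to keep the same semantics: truncate a value to a given byte width and optionally reinterpret it as a signed two's-complement integer. A standalone sketch of that behavior:

```python
# Standalone sketch of the removed convert_imm helper
# (utils.to_c_integer is assumed to behave equivalently).
def convert_imm(value: int, sizeof: int, signed: bool = True) -> int:
    mask = 1 << (sizeof * 8)            # 2^(bit width)
    out = value & (mask - 1)            # truncate to sizeof bytes
    if signed and (out & (mask >> 1)):  # sign bit set?
        out -= mask                     # two's-complement wrap
    return out

print(convert_imm(0xFF, 1))         # -1  (0xFF as a signed byte)
print(convert_imm(0xFF, 1, False))  # 255 (unsigned)
print(convert_imm(0x1FF, 2))        # 511 (fits in 16 bits, positive)
```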
diff --git a/symless/cpustate/arch.py b/symless/cpustate/arch.py
index 8475b85..bec6866 100644
--- a/symless/cpustate/arch.py
+++ b/symless/cpustate/arch.py
@@ -1,249 +1,14 @@
import idaapi
-import symless.utils.utils as utils
-from symless.cpustate import *
-
-
-# Define arch specific calling convention & abi
-class abi_t:
- def __init__(self, name: str, ret: str, stack_ptr: str):
- self.name = name
- self.ret = ret
- self.stack_ptr = stack_ptr
-
- # set value of stack ptr in given state_t
- def set_stack_ptr(self, state: state_t, value):
- state.set_register_str(self.stack_ptr, value)
-
- # get stack ptr value in given state_t
- def get_stack_ptr(self, state: state_t):
- return state.get_register_str(self.stack_ptr)
-
- # set ret register value
- def set_ret_value(self, state: state_t, value):
- state.set_register_str(self.ret, value)
-
- # get ret register value
- def get_ret_value(self, state: state_t):
- return state.get_register_str(self.ret)
-
- # max arguments count we are willing to consider for a function
- def get_arg_count(self) -> int:
- return 0
-
- # set argument at given index in given state
- def set_argument(self, state: state_t, index: int, value, from_callee: bool = True):
- pass
-
- # get argument at given index in given state
- def get_argument(self, state: state_t, index: int, from_callee: bool = True):
- return None
-
- # set argument for a jmp instruction (from caller state)
- def set_jump_argument(self, state: state_t, index: int, value):
- self.set_argument(self, state, index, value, False)
-
- # get argument for a jmp instruction (from caller state)
- def get_jump_argument(self, state: state_t, index: int):
- return self.get_argument(state, index, False)
-
- # guess function's cc & args count after propagating in it
- # guessed_args_count: count of args that have been used (using default abi/cc)
- # individual_validation: for each args, which one have been recorded to be used
- # returns (guessed_abi, first_valid_arg, guessed_args_count)
- def guess_function_cc(self, guessed_args_count: int, individual_validation: list) -> tuple:
- return (self, 0, guessed_args_count)
-
- # calling convention to use for normal functions
- def get_default_cc(self):
- return self
-
- # calling convention to use for class methods
- def get_object_cc(self):
- return self
-
-
-# for calling convention passing arguments through registers (__fastcall)
-class reg_cc_abi_t(abi_t):
- def __init__(self, name: str, params: list, ret: str, stack_ptr: str):
- super().__init__(name, ret, stack_ptr)
- self.params = params
-
- def get_arg_count(self) -> int:
- return len(self.params)
-
- def set_argument(self, state: state_t, index: int, value, from_callee: bool = True):
- state.set_register_str(self.params[index], value)
-
- def get_argument(self, state: state_t, index: int, from_callee: bool = True):
- return state.get_register_str(self.params[index])
-
-
-# for calling convention passing arguments through stack (__cdecl & __stdcall)
-class stack_cc_abi_t(abi_t):
- def __init__(self, name: str, ret: str, stack_ptr: str, max_args_count: int = 4):
- super().__init__(name, ret, stack_ptr)
- self.max_args_count = max_args_count # increase default ?
-
- def get_arg_count(self) -> int:
- return self.max_args_count
-
- # index of the first argument in the function local stack
- # or None if rsp does not track stack
- def first_args_index(self, state: state_t, from_callee: bool):
- if from_callee:
- return 4 # after saved eip
- ptr = self.get_stack_ptr(state)
- if not isinstance(ptr, stack_ptr_t):
- return None
- return ptr.shift
-
- # consider the args to be 4 bytes aligned on stack
- def get_args_shift(self, state: state_t, index: int, from_callee: bool):
- start = self.first_args_index(state, from_callee)
- if start is not None:
- start += index * 4
- return start
-
- def set_argument(self, state: state_t, index: int, value, from_callee: bool = True):
- shift = self.get_args_shift(state, index, from_callee)
- if shift is not None:
- state.stack.push(shift, value)
-
- def get_argument(self, state: state_t, index: int, from_callee: bool = True):
- shift = self.get_args_shift(state, index, from_callee)
- if shift is None:
- return None
- return state.stack.pop(shift)
-
- def set_jump_argument(self, state: state_t, index: int, value):
- shift = self.get_args_shift(state, index, False)
- if shift is not None:
- # saved eip won't be pushed by jmp, it is already in stack
- # first arg index is 4 and not 0
- state.stack.push(shift + 4, value)
-
- def get_jump_argument(self, state: state_t, index: int):
- shift = self.get_args_shift(state, index, False)
- if shift is None:
- return None
- return state.stack.pop(shift + 4)
-
-
-# default abi & cc to use before guessing function's cc
-# should be called once per analysis
-def get_abi() -> abi_t:
- if idaapi.inf_get_filetype() == idaapi.f_PE:
- if idaapi.get_inf_structure().is_64bit():
- selected = win_64_abi_t()
- else:
- selected = win_32_abi_t()
-
- elif idaapi.get_inf_structure().is_64bit():
- selected = systemv_64_abi_t()
- else:
- selected = systemv_32_abi_t()
-
- utils.g_logger.info("Applying %s calling convention" % selected.name)
-
- return selected
-
def is_arch_supported() -> bool:
- return is_filetype_supported() and is_proc_supported()
-
-
-def is_filetype_supported() -> bool:
- return idaapi.inf_get_filetype() in [idaapi.f_PE, idaapi.f_ELF]
-
-
-def is_elf() -> bool:
- return idaapi.inf_get_filetype() == idaapi.f_ELF
+ return is_proc_supported()
def is_proc_supported() -> bool:
- return idaapi.inf_get_procname() == "metapc"
+ # name = idaapi.inf_get_procname()
+ return True # every arch should be supported by microcode
def get_proc_name() -> str:
return idaapi.inf_get_procname()
-
-
-# Win i386 __thiscall ABI -> first arg (this) in ecx, rest in stack
-class win_32_thiscall_abi_t(stack_cc_abi_t):
- def __init__(self):
- super().__init__("Microsoft i386 __thiscall", "rax", "rsp")
- self.cc = idaapi.CM_CC_THISCALL
-
- def set_argument(self, state: state_t, index: int, value, from_callee: bool = True):
- if index == 0:
- state.set_register_str("rcx", value)
- else:
- super().set_argument(state, index - 1, value, from_callee)
-
- def get_argument(self, state: state_t, index: int, from_callee: bool = True):
- if index == 0:
- return state.get_register_str("rcx")
- return super().get_argument(state, index - 1, from_callee)
-
- def set_jump_argument(self, state: state_t, index: int, value):
- if index == 0:
- state.set_register_str("rcx", value)
- else:
- super().set_jump_argument(state, index - 1, value)
-
- def get_jump_argument(self, state: state_t, index: int):
- if index == 0:
- return state.get_register_str("rcx")
- return super().get_jump_argument(state, index - 1)
-
-
-# Win i386 __stdcall ABI
-class win_32_stdcall_abi_t(stack_cc_abi_t):
- def __init__(self):
- super().__init__("Microsoft i386 __stdcall", "rax", "rsp")
- self.cc = idaapi.CM_CC_STDCALL
-
-
-# win 32 abi with default cc (merged between __stdcall & __thiscall calling conventions)
-class win_32_abi_t(win_32_thiscall_abi_t):
- def __init__(self):
- super().__init__()
- self.name = "Microsoft i386"
- self.max_args_count = 5 # ecx + 4 args from stack
-
- # possible ABIs for a function
- self.stdcall = win_32_stdcall_abi_t()
- self.thiscall = win_32_thiscall_abi_t()
-
- def guess_function_cc(self, guessed_args_count: int, individual_validation: list) -> tuple:
- if individual_validation[0]: # ecx is used (thiscall)
- return (self.thiscall, 0, guessed_args_count)
- return (self.stdcall, 1, max(0, guessed_args_count - 1)) # default __stdcall
-
- def get_default_cc(self):
- return self.stdcall
-
- def get_object_cc(self):
- return self.thiscall
-
-
-# System V x86_64 ABI
-class systemv_64_abi_t(reg_cc_abi_t):
- def __init__(self):
- super().__init__("System V x86_64", ["rdi", "rsi", "rdx", "rcx", "r8", "r9"], "rax", "rsp")
- self.cc = idaapi.CM_CC_FASTCALL
-
-
-# Win x86_64 ABI
-class win_64_abi_t(reg_cc_abi_t):
- def __init__(self):
- super().__init__("Microsoft x86_64", ["rcx", "rdx", "r8", "r9"], "rax", "rsp")
- self.cc = idaapi.CM_CC_FASTCALL
-
-
-# System V i386 ABI
-class systemv_32_abi_t(stack_cc_abi_t):
- def __init__(self):
- super().__init__("System V i386", "rax", "rsp") # rax & eax have the same reg_id in IDA
- self.cc = idaapi.CM_CC_STDCALL
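The deleted `stack_cc_abi_t` helpers above located stack-passed arguments with simple slot arithmetic: from the callee's view the first argument sits just past the 4-byte saved eip, with 4-byte-aligned slots, and a tail jump (which does not push a new return address) shifts the caller-side slots by one extra word. A minimal paraphrase of that arithmetic, not part of the new microcode-based code:

```python
# Paraphrase of the removed stack_cc_abi_t slot arithmetic (i386, 4-byte slots).

def callee_arg_offset(index: int) -> int:
    # from the callee: skip the saved eip, then 4-byte aligned args
    return 4 + index * 4

def jump_arg_offset(caller_sp_shift: int, index: int) -> int:
    # tail jump: saved eip is already on the stack, so slots
    # start one word past the caller's current stack shift
    return caller_sp_shift + index * 4 + 4

print([callee_arg_offset(i) for i in range(3)])  # [4, 8, 12]
```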
diff --git a/symless/cpustate/cpustate.py b/symless/cpustate/cpustate.py
index eef3549..2b48eae 100644
--- a/symless/cpustate/cpustate.py
+++ b/symless/cpustate/cpustate.py
@@ -1,12 +1,11 @@
-import copy
-import ctypes
-import logging
-from typing import Collection, Dict, Iterator, List, Set, Tuple
+from collections import deque
+from collections.abc import Callable
+from typing import Collection, Dict, Generator, List, Set, Tuple
+import ida_hexrays
import idaapi
import symless.config as config
-import symless.cpustate.arch as arch
import symless.utils.ida_utils as ida_utils
import symless.utils.utils as utils
from symless.cpustate import *
@@ -14,499 +13,444 @@
# max functions depth to propagate a structure
MAX_PROPAGATION_RECURSION = 100
-# Explicit constants
-ONE_OPERAND_INSTRUCTIONS = 0
-TWO_OPERAND_INSTRUCTIONS = 1
+# handles mov (mop_r | mop_S), (mop_r | mop_S)
+def handle_mov_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ v = state.get_var_from_mop(insn.l)
+ state.set_var_from_mop(insn.d, v)
-# ignore instruction
-def handle_ignore(state: state_t, *args):
- pass
+# handles mov mop_n, (mop_r | mop_S)
+def handle_mov_imm_var(state: state_t, insn: ida_hexrays.minsn_t):
+ v = int_t(insn.l.nnn.value, insn.d.size)
+ state.set_var_from_mop(insn.d, v)
-# drop one reg values when we do no know its new value
-def handle_reg_drop(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- if op.type == idaapi.o_reg:
- state.drop_register(op.reg)
+# handles mov mop_v, (mop_r | mop_S)
+def handle_mov_gbl_var(state: state_t, insn: ida_hexrays.minsn_t):
+ gvalue = ida_utils.get_nb_bytes(insn.l.g, insn.d.size)
+ if gvalue is None:
+ return state.drop_var_from_mop(insn.d)
+ v = mem_t(gvalue, insn.l.g, insn.d.size)
+ state.set_var_from_mop(insn.d, v)
-def handle_mov_reg_reg(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- cur = state.get_register(src.reg)
- state.set_register(dst.reg, cur)
+# handles mov mop_a, (mop_r | mop_S)
+def handle_mov_addr_var(state: state_t, insn: ida_hexrays.minsn_t):
+ if insn.l.a.t != ida_hexrays.mop_v: # mop_l, mop_S or mop_r
+ return state.drop_var_from_mop(insn.d)
+ v = mem_t(insn.l.a.g, insn.l.a.g, insn.d.size)
+ state.set_var_from_mop(insn.d, v)
-def handle_mov_disp_reg(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- # mov [rax+rbx*2+n]
- # ignore basereg + indexreg*scale + offset cases
- if x64_index_reg(insn, dst) != x86_INDEX_NONE:
- # FIXME: stack value may be replaced here and we won't know
- return
-
- base = x64_base_reg(insn, dst)
- cur = state.get_register(src.reg)
- nex = state.get_register(base)
- nbytes = idaapi.get_dtype_size(dst.dtype)
- if isinstance(nex, stack_ptr_t):
- shift = ctypes.c_int32(dst.addr + nex.shift).value
- state.stack.push(shift, cur)
- else:
- # do not report src to be used when pushed in stack
- state.arguments.validate(cur)
+# handles stx (mop_r | mop_S), mop_r, (mop_r | mop_S)
+# note: sel register is ignored
+def handle_stx_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ dst = state.get_var_from_mop(insn.d)
+ # if not isinstance(dst, buff_t): # stx to unknown
+ # return
- disp = disp_t(base, dst.addr, nbytes)
- state.write_to(insn.ea, disp, cur)
+ v = state.get_var_from_mop(insn.l)
+ state.write_to(insn.ea, dst, insn.d, insn.l.size, v)
-def handle_mov_reg_imm(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- nbytes = idaapi.get_dtype_size(dst.dtype)
- state.set_register(dst.reg, int_t(src.value, nbytes))
+# handles stx mop_n, mop_r, (mop_r | mop_S)
+# note: sel register is ignored
+def handle_stx_imm_var(state: state_t, insn: ida_hexrays.minsn_t):
+ dst = state.get_var_from_mop(insn.d)
+ # if not isinstance(dst, buff_t): # stx to unknown
+ # return
+ v = int_t(insn.l.nnn.value, insn.l.size)
+ state.write_to(insn.ea, dst, insn.d, insn.l.size, v)
-def handle_mov_disp_imm(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- # FIXME: mov [rsp + rcx*2 + 16], 200h will modify the stack
- # without us unvalidating the old value
- if x64_index_reg(insn, dst) != x86_INDEX_NONE:
- return
- base = x64_base_reg(insn, dst)
- cur = state.get_register(base)
- nbytes = idaapi.get_dtype_size(src.dtype)
-
- if isinstance(cur, stack_ptr_t):
- shift = ctypes.c_int32(dst.addr + cur.shift).value
- state.stack.push(shift, int_t(src.value, nbytes))
-
- else:
- # special win32 vtable load case
- # mov [ecx], offset vftable
- # to simplify vtable detection, consider immediate to be a mem_t
+# handles stx mop_v, mop_r, (mop_r | mop_S)
+# note: sel register is ignored
+def handle_stx_gbl_var(state: state_t, insn: ida_hexrays.minsn_t):
+ dst = state.get_var_from_mop(insn.d)
+ # if not isinstance(dst, buff_t): # stx to unknown
+ # return
- disp = disp_t(base, dst.addr, nbytes)
- state.write_to(insn.ea, disp, mem_t(src.value, src.value, nbytes))
+ gvalue = ida_utils.get_nb_bytes(insn.l.g, insn.l.size)
+ if gvalue is not None:
+ v = mem_t(gvalue, insn.l.g, insn.l.size)
+ state.write_to(insn.ea, dst, insn.d, insn.l.size, v)
-def handle_mov_reg_mem(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- nbytes = idaapi.get_dtype_size(dst.dtype)
- value = ida_utils.get_nb_bytes(src.addr, nbytes)
- if value is not None:
- state.set_register(dst.reg, mem_t(value, src.addr, nbytes))
- else: # register loaded with bss data
- state.drop_register(dst.reg)
+# handles stx mop_a, mop_r, (mop_r | mop_S)
+# note: sel register is ignored
+def handle_stx_addr_var(state: state_t, insn: ida_hexrays.minsn_t):
+ dst = state.get_var_from_mop(insn.d)
+ # if not isinstance(dst, buff_t): # stx to unknown
+ # return
-
-def handle_mov_reg_disp(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- base = x64_base_reg(insn, src)
- cur = state.get_register(base)
-
- # mov rbx, [rax+rcx*2+n], ignored
- if x64_index_reg(insn, src) != x86_INDEX_NONE:
- state.drop_register(dst.reg)
+ if insn.l.a.t != ida_hexrays.mop_v: # mop_l, mop_S or mop_r
return
+ v = mem_t(insn.l.a.g, insn.l.a.g, insn.l.size)
+ state.write_to(insn.ea, dst, insn.d, insn.l.size, v)
- # mov rax, [rsp+0x10]
- if isinstance(cur, stack_ptr_t):
- shift = ctypes.c_int32(src.addr + cur.shift).value
- value = state.stack.pop(shift)
- if value is not None:
- state.set_register(dst.reg, value)
- return
- nbytes = idaapi.get_dtype_size(dst.dtype)
+# handles ldx mop_r, (mop_r | mop_S), (mop_r | mop_S)
+# note: sel register is ignored
+def handle_ldx_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ src = state.get_var_from_mop(insn.r)
+ state.read_from(insn.ea, src, insn.r, insn.d.size, insn.d) # record read
- # PIE memory move: mov rdx, [rax + vtbl_offset]
- dref = idaapi.get_first_dref_from(insn.ea)
- if dref != idaapi.BADADDR:
- value = ida_utils.get_nb_bytes(dref, nbytes)
- if value is not None:
- state.set_register(dst.reg, mem_t(value, dref, nbytes))
- return
+ # set dst mop value
+ deref = deref_t(src, insn.d.size) # default : unknown access
+ if isinstance(src, mem_t):
+ v = ida_utils.get_nb_bytes(src.get_uval(), insn.d.size) # try getting read value from memory
+ deref = mem_t(v, src.get_uval(), insn.d.size) if v is not None else deref
+ state.set_var_from_mop(insn.d, deref)
- # other cases
- disp = disp_t(base, src.addr, nbytes)
- state.set_register(dst.reg, disp)
- state.read_from(insn.ea, disp, dst.reg)
+# handles xdu (mop_r | mop_S), (mop_r | mop_S)
+def handle_xdu_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ assert insn.l.size < insn.d.size
+ src = state.get_var_from_mop(insn.l)
+ if not isinstance(src, int_t): # only makes sense to extend int
+ return state.drop_var_from_mop(insn.d)
-def handle_call(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- state.call_type = call_type_t.CALL
- resolve_callee(insn, state)
+ v = copy.copy(src)
+ v.size = insn.d.size
+ state.set_var_from_mop(insn.d, v)
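`xdu` widens a tracked value to the destination size; only integers are worth extending, everything else is dropped. The size bookkeeping can be sketched outside IDA as follows — `IntVal` is a hypothetical stand-in for `int_t`, not the real class:

```python
import copy

# Sketch of handle_xdu_var_var's size bookkeeping: copy the tracked
# integer and widen it to the destination size. IntVal is a
# hypothetical stand-in for int_t.
class IntVal:
    def __init__(self, val, size):
        self.val, self.size = val, size

def xdu(src, dst_size):
    assert src.size < dst_size  # xdu only widens
    v = copy.copy(src)          # keep the original untouched
    v.size = dst_size
    return v

widened = xdu(IntVal(0x7F, 1), 8)
```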
-def handle_jump(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- state.call_type = call_type_t.JUMP
- if insn.itype in INSN_UNCONDITIONAL_JUMPS:
- resolve_callee(insn, state)
+# handles xdu mop_n, (mop_r | mop_S)
+def handle_xdu_imm_var(state: state_t, insn: ida_hexrays.minsn_t):
+ handle_mov_imm_var(state, insn)
-def handle_lea_reg_mem(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- # avoid 'lea esi, ds:2[rax*2]' flagged as 'lea reg, mem'
- if src.specflag1: # hasSIB
- state.drop_register(dst.reg)
- else:
- state.set_register(dst.reg, mem_t(src.addr, src.addr, ida_utils.get_ptr_size()))
+# handles xds (mop_r | mop_S), (mop_r | mop_S)
+def handle_xds_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ handle_xdu_var_var(state, insn) # we do not differentiate signed and unsigned (should we?)
-def handle_lea_reg_disp(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- base = x64_base_reg(insn, src)
- cur = state.get_register(base)
+# handles xds mop_n, (mop_r | mop_S)
+def handle_xds_imm_var(state: state_t, insn: ida_hexrays.minsn_t):
+ handle_xdu_imm_var(state, insn)
- # mov rbx, [rax+rcx*2+n], ignored
- if x64_index_reg(insn, src) != x86_INDEX_NONE:
- state.drop_register(dst.reg)
- return
- # apply offset shift instead if input operand is a sid
- if isinstance(cur, buff_t):
- state.set_register(dst.reg, cur.offset(src.addr))
- else:
- # data can be referenced from reg disp in PIE
- # check if we have a data ref on the insn
- dref = idaapi.get_first_dref_from(insn.ea)
- if dref != idaapi.BADADDR:
- state.set_register(dst.reg, mem_t(dref, dref, ida_utils.get_ptr_size()))
- else:
- # we don't have any use for this
- state.drop_register(dst.reg)
+# handles call mop_v, (mop_f | mop_z)
+def handle_call(state: state_t, insn: ida_hexrays.minsn_t):
+ state.last_insn_type = last_insn_type_t.LAST_INSN_CALL
+ # resolve call arguments
+ if insn.d.t == ida_hexrays.mop_f:
+ # assert(insn.l.g == insn.d.f.callee) # insn.d.f.callee is not always resolved
-def handle_add_reg_imm(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- cur = state.get_register(dst.reg)
- if not cur:
- return
+ # we should not have non-flattened mop_d in the args list
+ assert not any([i.t == ida_hexrays.mop_d for i in insn.d.f.args])
- state.arguments.validate(cur)
+ state.call_args.extend([state.get_var_from_mop(i) for i in insn.d.f.args])
+ utils.g_logger.debug(f"call site {insn.ea:#x} : {len(state.call_args)} argument(s)")
- if not isinstance(cur, buff_t):
- state.drop_register(dst.reg)
+ # try to resolve callee
+ callee = idaapi.get_func(insn.l.g)
+ if callee is None or callee.start_ea != insn.l.g:
return
- if insn.itype == idaapi.NN_add:
- shift = src.value
- else:
- shift = -src.value
+ utils.g_logger.debug(f"call @ {insn.ea:#x} resolved to function {callee.start_ea:#x}")
+ state.call_to = callee
- # TODO: the problem is that we do not know the access size in advance.
- # For now we use the architecture size, as if it were a pointer.
- # But if it is, say, an x64 pointer to a DWORD inside the structure,
- # then the size should not be the architecture size.
- if src.type in [idaapi.o_imm]:
- size = ida_utils.get_ptr_size()
- if size == idaapi.get_dtype_size(dst.dtype):
- state.access_to(insn.ea, 1, disp_t(dst.reg, shift, size))
- state.set_register(dst.reg, cur.offset(shift))
+# handles icall mop_r, mop_r, (mop_f | mop_z)
+# note: sel register is ignored
+def handle_icall(state: state_t, insn: ida_hexrays.minsn_t):
+ state.last_insn_type = last_insn_type_t.LAST_INSN_CALL
+ # resolve call arguments
+ if insn.d.t == ida_hexrays.mop_f:
+ assert not any([i.t == ida_hexrays.mop_d for i in insn.d.f.args])
-def handle_xor_reg_reg(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- if dst.reg == src.reg:
- state.set_register(dst.reg, int_t(0, ida_utils.get_ptr_size()))
- else:
- state.drop_register(dst.reg)
+ state.call_args.extend([state.get_var_from_mop(i) for i in insn.d.f.args])
+ utils.g_logger.debug(f"icall site {insn.ea:#x} : {len(state.call_args)} argument(s)")
+ # try to resolve callee
+ off = state.get_var_from_mop(insn.r)
+ if not isinstance(off, mem_t):
+ return
-# handle stack alignements
-def handle_and_reg_imm(state: state_t, insn: idaapi.insn_t, dst: idaapi.op_t, src: idaapi.op_t):
- cur = state.get_register(dst.reg)
- if isinstance(cur, buff_t):
- value = ctypes.c_int32(cur.shift & src.value).value
- state.set_register(dst.reg, cur.clone(value))
- else:
- state.drop_register(dst.reg)
+ callee = idaapi.get_func(off.get_uval())
+ if callee is None or callee.start_ea != off.get_uval():
+ return
+
+ utils.g_logger.debug(f"icall @ {insn.ea:#x} resolved to function {callee.start_ea:#x}")
+ state.call_to = callee
-# stack shift by a push/pop operation
-def handle_stack_shift(state: state_t, op: idaapi.op_t, is_push: bool) -> stack_ptr_t:
- size = idaapi.get_dtype_size(op.dtype)
- stack_ptr = get_stack_ptr(state)
- if not isinstance(stack_ptr, stack_ptr_t):
- return None
+# special case for ret handling
+# there is no micro-insn for a ret
+def handle_ret(state: state_t) -> state_t:
+ state.reset()
- if is_push:
- size = -size
+ state.last_insn_type = last_insn_type_t.LAST_INSN_RET
- stack_ptr = stack_ptr.offset(size)
- set_stack_ptr(state, stack_ptr)
- return stack_ptr
+ retloc = state.fct.get_retloc()
+ if retloc is None:
+ return state
+ state.ret = state.get_var_from_loc(retloc)
-def handle_push_reg(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- stack_ptr = handle_stack_shift(state, op, True)
- reg = state.get_register(op.reg)
- if stack_ptr is not None and reg is not None:
- state.stack.push(stack_ptr.shift, reg)
+ return state
-def handle_push_imm(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- stack_ptr = handle_stack_shift(state, op, True)
- if stack_ptr is not None:
- nbytes = idaapi.get_dtype_size(op.dtype)
- state.stack.push(stack_ptr.shift, int_t(op.value, nbytes))
+# handles add (mop_r | mop_S), mop_n, (mop_r | mop_S)
+def handle_add_var_imm(state: state_t, insn: ida_hexrays.minsn_t, sign: int = 1):
+ v = state.get_var_from_mop(insn.l)
+ if not isinstance(v, buff_t):
+ return state.drop_var_from_mop(insn.d)
+ shifted_v = v.shift_by(sign * insn.r.nnn.value, insn.r.size)
+ state.set_var_from_mop(insn.d, shifted_v)
-def handle_pop_reg(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- # drop dst reg in any case
- state.drop_register(op.reg)
+ # this add may be a lea that we need to type:
+ # register an access to a field of size 0 (size unknown)
+ state.access_to(insn.ea, shifted_v, insn.l, 0)
- stack_ptr = get_stack_ptr(state)
- if isinstance(stack_ptr, stack_ptr_t):
- # record poped value
- value = state.stack.pop(stack_ptr.shift)
- if value is not None:
- state.set_register(op.reg, value)
- # shift stack ptr
- size = idaapi.get_dtype_size(op.dtype)
- set_stack_ptr(state, stack_ptr.offset(size))
+# handles add (mop_r | mop_S), (mop_r | mop_S), (mop_r | mop_S)
+def handle_add_var_var(state: state_t, insn: ida_hexrays.minsn_t, sign: int = 1):
+ v = state.get_var_from_mop(insn.l)
+ v2 = state.get_var_from_mop(insn.r)
+ if not isinstance(v, buff_t) or not isinstance(v2, int_t):
+ return state.drop_var_from_mop(insn.d)
+ shifted_v = v.shift_by(sign * v2.get_val(), insn.r.size)
+ state.set_var_from_mop(insn.d, shifted_v)
-# shift stack pointer, ignore pushed/poped value
-def handle_ignored_push_pop(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- handle_stack_shift(state, op, (insn.itype == idaapi.NN_push))
+ state.access_to(insn.ea, shifted_v, insn.l, 0) # dummy access
-# validate register operand to be a used argument, to keep track of function args count
-# other type of operands (displ, phrase) are already validated in process_instruction()
-def validate_operand(state: state_t, insn: idaapi.insn_t, op: idaapi.op_t):
- if op.type == idaapi.o_reg:
- state.arguments.validate(state.get_previous_register(op.reg))
+# handles sub (mop_r | mop_S), mop_n, (mop_r | mop_S)
+def handle_sub_var_imm(state: state_t, insn: ida_hexrays.minsn_t):
+ handle_add_var_imm(state, insn, -1)
-# handle test instruction
-def handle_test(state: state_t, insn: idaapi.insn_t, op1: idaapi.op_t, op2: idaapi.op_t):
- validate_operand(state, insn, op1)
- validate_operand(state, insn, op2)
+# handles sub (mop_r | mop_S), (mop_r | mop_S), (mop_r | mop_S)
+def handle_sub_var_var(state: state_t, insn: ida_hexrays.minsn_t):
+ handle_add_var_var(state, insn, -1)
-# instructions specific handlers
-# list of (list[insn.itype], tuple(ops.type), handler)
-g_insn_handlers = [
- (
- # 1 operand instructions
- ([idaapi.NN_push], (idaapi.o_reg,), handle_push_reg), # push rbp
- ([idaapi.NN_push], (idaapi.o_imm,), handle_push_imm), # push 42h
- ([idaapi.NN_push], (idaapi.o_displ,), handle_ignored_push_pop), # push [ebp+var_14]
- ([idaapi.NN_push], (idaapi.o_mem,), handle_ignored_push_pop), # push bss_var
- ([idaapi.NN_push], (idaapi.o_phrase,), handle_ignored_push_pop), # push dword[rcx]
- ([idaapi.NN_pop], (idaapi.o_reg,), handle_pop_reg), # pop rbp
- ([idaapi.NN_pop], (idaapi.o_displ,), handle_ignored_push_pop), # pop [rbp+var_14]
- ([idaapi.NN_pop], (idaapi.o_phrase,), handle_ignored_push_pop), # pop [rcx]
- ([idaapi.NN_pop], (idaapi.o_mem,), handle_ignored_push_pop), # pop data_var
- (INSN_CALLS, (0,), handle_call), # call ?
- (INSN_JUMPS, (0,), handle_jump), # jne ?
+# handlers per instructions types
+g_per_minsn_handlers = {
+ ida_hexrays.m_mov: (
+ # mov rax, rcx
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_z,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_mov_var_var,
+ ),
+ # mov #0, rax
+ ((ida_hexrays.mop_n,), (ida_hexrays.mop_z,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_mov_imm_var),
+ # mov dword_0, rax
+ ((ida_hexrays.mop_v,), (ida_hexrays.mop_z,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_mov_gbl_var),
+ # mov &dword_0, rax
+ ((ida_hexrays.mop_a,), (ida_hexrays.mop_z,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_mov_addr_var),
),
- (
- # 2 operands instructions
- (INSN_MOVES, (idaapi.o_phrase, idaapi.o_reg), handle_mov_disp_reg), # mov [rcx], rax
- (INSN_MOVES, (idaapi.o_displ, idaapi.o_reg), handle_mov_disp_reg), # mov [rcx+10h], rax
- (INSN_MOVES, (idaapi.o_phrase, idaapi.o_imm), handle_mov_disp_imm), # mov [rcx], 10h
- (INSN_MOVES, (idaapi.o_displ, idaapi.o_imm), handle_mov_disp_imm), # mov [rcx+10h], 10h
- (INSN_MOVES, (idaapi.o_reg, idaapi.o_reg), handle_mov_reg_reg), # mov rax, rbx
- (INSN_MOVES, (idaapi.o_reg, idaapi.o_imm), handle_mov_reg_imm), # mov rax, 10h
- (INSN_MOVES, (idaapi.o_reg, idaapi.o_mem), handle_mov_reg_mem), # mov rax, @addr
- (INSN_MOVES, (idaapi.o_reg, idaapi.o_phrase), handle_mov_reg_disp), # mov rax, [rbx]
+ ida_hexrays.m_stx: (
+ # stx rax, ds, rcx
(
- INSN_MOVES,
- (idaapi.o_reg, idaapi.o_displ),
- handle_mov_reg_disp,
- ), # mov rax, [rbx+10h]
- (INSN_MOVES, (idaapi.o_mem, 0), handle_ignore), # mov @addr, ?
- (INSN_TESTS, (0, 0), handle_test), # test ?, ?
- (INSN_CMPS, (0, 0), handle_test), # cmp ?, ?
- (INSN_LEAS, (idaapi.o_reg, idaapi.o_mem), handle_lea_reg_mem), # lea rax, @addr
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_stx_var_var,
+ ),
+ # stx #0, ds, rcx
+ ((ida_hexrays.mop_n,), (ida_hexrays.mop_r,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_stx_imm_var),
+ # stx dword_0, ds, rcx
+ ((ida_hexrays.mop_v,), (ida_hexrays.mop_r,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_stx_gbl_var),
+ # stx &dword_0, ds, rcx
+ ((ida_hexrays.mop_a,), (ida_hexrays.mop_r,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_stx_addr_var),
+ ),
+ ida_hexrays.m_ldx: (
+ # ldx ds, rax, rcx
(
- INSN_LEAS,
- (idaapi.o_reg, idaapi.o_displ),
- handle_lea_reg_disp,
- ), # lea rax, [rbx+10h]
- (INSN_LEAS, (idaapi.o_reg, 0), handle_ignore), # lea rax, ?
- (INSN_XORS, (idaapi.o_reg, idaapi.o_reg), handle_xor_reg_reg), # xor rax, rax
- (INSN_ANDS, (idaapi.o_reg, idaapi.o_imm), handle_and_reg_imm), # and esp, 0xfffffff0
- (INSN_MATHS, (idaapi.o_reg, idaapi.o_imm), handle_add_reg_imm), # add rax, 10h
+ (ida_hexrays.mop_r,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_ldx_var_var,
+ ),
),
-]
-
+ ida_hexrays.m_xdu: (
+ # xdu esi, rsi
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_z,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_xdu_var_var,
+ ),
+ # xdu #0, rsi
+ ((ida_hexrays.mop_n,), (ida_hexrays.mop_z,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_xdu_imm_var),
+ ),
+ ida_hexrays.m_xds: (
+ # xds esi, rsi
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_z,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_xds_var_var,
+ ),
+ # xds #0, rsi
+ ((ida_hexrays.mop_n,), (ida_hexrays.mop_z,), (ida_hexrays.mop_r, ida_hexrays.mop_S), handle_xds_imm_var),
+ ),
+ ida_hexrays.m_call: (
+ # call sub_0, (arg1, arg2, ..)
+ ((ida_hexrays.mop_v,), (ida_hexrays.mop_z,), (ida_hexrays.mop_f, ida_hexrays.mop_z), handle_call),
+ ),
+ ida_hexrays.m_icall: (
+ # icall cs, x16, (arg1, arg2, ..)
+ ((ida_hexrays.mop_r,), (ida_hexrays.mop_r,), (ida_hexrays.mop_f, ida_hexrays.mop_z), handle_icall),
+ ),
+ ida_hexrays.m_add: (
+ # add rax, #0, rax
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_n,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_add_var_imm,
+ ),
+ # add rax, rcx, rax
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_add_var_var,
+ ),
+ ),
+ ida_hexrays.m_sub: (
+ # sub rax, #0, rax
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_n,),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_sub_var_imm,
+ ),
+ # sub rax, rcx, rax
+ (
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ (ida_hexrays.mop_r, ida_hexrays.mop_S),
+ handle_sub_var_var,
+ ),
+ ),
+}
-# check wheter given insn types meet the required ones
-def check_types(effective: tuple, expected: tuple) -> bool:
- for i in range(len(expected)):
- if expected[i] != 0 and effective[i] != expected[i]:
- return False
- return True
+def get_handler_for_insn(insn: ida_hexrays.minsn_t) -> Optional[Callable[[state_t, ida_hexrays.minsn_t], None]]:
+ family = g_per_minsn_handlers.get(insn.opcode, tuple())
-# dump full instruction
-def dump_insn(insn: idaapi.insn_t, level: int = config.LOG_LEVEL_VERBOSE_DEBUG):
- if level >= utils.g_logger.level: # do not compute __repr__ everytime
- utils.g_logger.log(level, insn_str_full(insn))
+ for lft, rgt, dst, handler in family:
+ if insn.l.t in lft and insn.r.t in rgt and insn.d.t in dst:
+ return handler
+ return None
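The operand-kind dispatch above can be mimicked outside IDA with a toy model. All names below (`MOV`, `REG`, the handlers, `HANDLERS`) are hypothetical stand-ins for the `ida_hexrays` `m_*` opcode and `mop_*` operand-type constants:

```python
# Toy model of g_per_minsn_handlers / get_handler_for_insn: each entry
# lists the admissible operand kinds for (l, r, d); the first entry
# whose three kind-sets all match wins. All names are hypothetical.
MOV = "mov"
REG, IMM, STKVAR, NONE = "r", "n", "S", "z"

def handle_mov_var_var(state, insn):
    return "var -> var"

def handle_mov_imm_var(state, insn):
    return "imm -> var"

HANDLERS = {
    MOV: (
        ((REG, STKVAR), (NONE,), (REG, STKVAR), handle_mov_var_var),
        ((IMM,), (NONE,), (REG, STKVAR), handle_mov_imm_var),
    ),
}

def get_handler(opcode, l, r, d):
    # scan the opcode's family for a matching operand-kind triple
    for lft, rgt, dst, handler in HANDLERS.get(opcode, ()):
        if l in lft and r in rgt and d in dst:
            return handler
    return None
```

An unsupported opcode yields `None`; as in `process_instruction`, the caller then simply drops the destination variable.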
-# handle zero-operand instructions
-def handle_no_op_insn(state: state_t, insn: idaapi.insn_t):
- if insn.itype in INSN_RETS:
- state.save_ret(insn.ea)
+# debug: pretty print current state and insn
+def dbg_dump_state_insn(insn: ida_hexrays.minsn_t, state: state_t):
+ if utils.g_logger.level > config.LOG_LEVEL_VERBOSE_DEBUG:
+ return
-# handle one-operand instructions
-def handle_one_op_insn(state: state_t, insn: idaapi.insn_t, ops):
- handler, it_type = None, None
- op = ops[0]
+ utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, f"<----- insn & state @ {insn.ea:#x} ----->")
+ utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, f"insn: {ida_utils.insn_str_full(insn)}")
+ utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, f"state: {state}")
- for itype, optype, current in g_insn_handlers[ONE_OPERAND_INSTRUCTIONS]:
- if insn.itype in itype:
- it_type = insn.itype
- if check_types((op.type,), optype):
- handler = current
- break
- if not it_type:
- handle_reg_drop(state, insn, op)
- return
+# split a microinstruction into the sub-instructions composing it
+# returned instructions are ordered from first to last executed
+# also patches instructions to use dedicated kregs to transfer sub-results
+# note: this does not handle mop_d in mop_f argument lists
+def flatten_minsn(minsn: ida_hexrays.minsn_t, mba: ida_hexrays.mba_t) -> Collection[ida_hexrays.minsn_t]:
+ subs = deque()
+ used_kregs = deque()
- if handler:
- handler(state, insn, op)
- return
+ # always copy: minsn_t objects in IDA Python 8.4 tend to get freed prematurely. TODO: why?
+ # mk_copy &= any([(op.t == ida_hexrays.mop_d) for op in (minsn.l, minsn.r, minsn.d)])
+ to_patch = ida_hexrays.minsn_t(minsn)
- if False:
- dump_insn(insn)
- raise BaseException("not implemented")
-
-
-# handle two-operands instructions
-def handle_two_ops_insn(state: state_t, insn: idaapi.insn_t, ops):
- handler = None
- dst, src = ops[0], ops[1]
- known_type = None
- for itypes, optype, current in g_insn_handlers[TWO_OPERAND_INSTRUCTIONS]:
- if insn.itype in itypes:
- known_type = insn.itype
- if check_types((dst.type, src.type), optype):
- handler = current
- break
-
- if not known_type:
- # drop destination register only
- handle_reg_drop(state, insn, dst)
- return
+ # search operands for sub instructions
+ for num_op in ("l", "r", "d"):
+ op = getattr(to_patch, num_op)
+ if op.t != ida_hexrays.mop_d: # skip operands that are not sub-insns
+ continue
- if handler:
- handler(state, insn, dst, src)
- return
+ # sub ret value is used as insn operand
+ if op.d.d.t == ida_hexrays.mop_z:
+ sub = ida_hexrays.minsn_t(op.d) # copy to patch
- if dst.type == idaapi.o_reg:
- state.drop_register(dst.reg)
+ # kreg to use for transferring sub ret to insn
+ kreg = mba.tmp_result_kregs.pop()
+ used_kregs.append(kreg)
+ krop = ida_hexrays.mop_t(kreg, op.size) # make mop_r
- if False:
- dump_insn(insn)
- raise BaseException("not implemented")
+ # sub ret to kreg
+ sub.d = krop
+ # sub may also contain sub instructions
+ subs.extend(flatten_minsn(sub, mba))
-# pretty print state and insn
-def dbg_dump_state_insn(insn: idaapi.insn_t, state: state_t):
- utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, "---------------------------------------------------------")
- dump_insn(insn, config.LOG_LEVEL_VERBOSE_DEBUG)
- utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, state)
+ # sub-insn does not return a value (m_call)
+ else:
+ # insn should read call ret from call_result_kreg
+ # call_result_kreg is set by flow_in_callee
+ krop = ida_hexrays.mop_t(mba.call_result_kreg, op.size)
+ subs.extend(flatten_minsn(op.d, mba))
-def handle_struc_access(state: state_t, insn: idaapi.insn_t, ops: List[idaapi.op_t]):
- # register any access through displ missed by custom handlers
- for i, op in enumerate(ops):
- if op.type in [idaapi.o_phrase, idaapi.o_displ]:
- base = x64_base_reg(insn, op)
- index = x64_index_reg(insn, op)
+ # replace minsn original operand with kreg
+ setattr(to_patch, num_op, krop)
- # validate base reg for parameters tracking
- cur = state.get_previous_register(base)
- state.arguments.validate(cur)
+ # release used kregs
+ mba.tmp_result_kregs.extend(used_kregs)
- if index == x86_INDEX_NONE: # ignore base + index*scale + offset
- nbytes = idaapi.get_dtype_size(op.dtype)
- state.access_to(insn.ea, i, disp_t(base, op.addr, nbytes))
- else: # validate index usage
- cur = state.get_previous_register(index)
- state.arguments.validate(cur)
+ subs.append(to_patch)
+ return subs
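The flattening scheme can be illustrated with a toy model: an operand that is itself an instruction is executed first into a scratch "kreg", and the parent is patched to read that kreg. The `Insn` class and register names are hypothetical; the real code operates on `ida_hexrays.minsn_t` / `mop_t`:

```python
from collections import deque

# Toy model of flatten_minsn: sub-instructions are hoisted out, write
# their results into scratch kregs, and the parent reads those kregs.
# Insn and the register names are hypothetical.
class Insn:
    def __init__(self, op, l=None, r=None, d=None):
        self.op, self.l, self.r, self.d = op, l, r, d

def flatten(insn, kregs):
    subs = deque()
    used = deque()
    for slot in ("l", "r"):
        sub = getattr(insn, slot)
        if not isinstance(sub, Insn):
            continue
        kreg = kregs.pop()                # reserve a scratch register
        used.append(kreg)
        subs.extend(flatten(sub, kregs))  # sub may nest further
        sub.d = kreg                      # sub-insn writes its result to kreg
        setattr(insn, slot, kreg)         # parent reads kreg instead
    kregs.extend(used)                    # release scratch registers
    subs.append(insn)
    return list(subs)

# add (ldx rax), #1 -> rcx  becomes  ldx rax -> k1 ; add k1, #1 -> rcx
insn = Insn("add", Insn("ldx", l="rax"), "#1", d="rcx")
order = [i.op for i in flatten(insn, deque(["k0", "k1"]))]
```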
# process one instruction & update current state
-def process_instruction(state: state_t, insn: idaapi.insn_t):
- ops: list[idaapi.op_t] = ida_utils.get_insn_ops(insn)
+def process_instruction(state: state_t, insn: ida_hexrays.minsn_t):
+ # reset previous instruction state
state.reset()
- op_len = len(ops)
- if op_len == 0:
- handle_no_op_insn(state, insn)
- elif op_len == 1:
- handle_one_op_insn(state, insn, ops)
- elif op_len == 2:
- handle_two_ops_insn(state, insn, ops)
- elif op_len == 3:
- handle_reg_drop(state, insn, ops[0])
- elif op_len == 4:
- handle_reg_drop(state, insn, ops[0])
+ # modify the current state according to the insn
+ handler = get_handler_for_insn(insn)
+ if handler is None:
+ utils.g_logger.log(config.LOG_LEVEL_VERBOSE_DEBUG, f"unsupported insn @ {insn.ea:#x}")
+ state.drop_var_from_mop(insn.d)
else:
- utils.g_logger.error("unsupported instruction with %d operands:" % op_len)
- dump_insn(insn, logging.ERROR)
-
- handle_struc_access(state, insn, ops)
+ handler(state, insn)
+ # dump the new state
dbg_dump_state_insn(insn, state)
-# read next instruction within giben basic block
-def next_instruction(ea: int, block: idaapi.range_t, insn: idaapi.insn_t) -> bool:
- while (ea != idaapi.BADADDR and ea < block.end_ea) and not idaapi.is_code(idaapi.get_flags(ea)):
- ea = idaapi.get_item_end(ea)
-
- if ea >= block.end_ea or ea == idaapi.BADADDR:
- return False
-
- idaapi.decode_insn(insn, ea)
- return True
-
-
# select most interesting state (most sid_t, call_t)
-def select_state(states: list) -> state_t:
+def select_state(states: List[state_t]) -> state_t:
states.sort(key=lambda e: (e.get_nb_types(sid_t), e.get_nb_types(call_t)), reverse=True)
return states[0]
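The ranking used by `select_state` — most tracked structure ids first, call count as the tie-breaker — can be sketched with a hypothetical `FakeState` standing in for `state_t`:

```python
# Sketch of select_state's ranking: sort candidate states by
# (number of sid_t, number of call_t), descending, and keep the head.
# FakeState is a hypothetical stand-in for state_t.
class FakeState:
    def __init__(self, n_sids, n_calls):
        self.n_sids, self.n_calls = n_sids, n_calls

    def get_nb_types(self, kind):
        return self.n_sids if kind == "sid" else self.n_calls

def select_state(states):
    states.sort(
        key=lambda s: (s.get_nb_types("sid"), s.get_nb_types("call")),
        reverse=True,
    )
    return states[0]

# two states track two sids; the one with more calls wins the tie
best = select_state([FakeState(1, 5), FakeState(2, 0), FakeState(2, 3)])
```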
-# Get the starting state for a basic block
+# get the starting state for a basic block
# if many states are possible, select the one with the most info in it
-def get_previous_state(flow, idx, prev_states) -> state_t:
- npred = flow.npred(idx)
+def get_previous_state(block: ida_hexrays.mblock_t, prev_states: Dict[int, state_t]) -> state_t:
+ npred = block.npred()
initial = prev_states[idaapi.BADADDR]
- # no predecessor, just use starting state
- if npred == 0:
- out = state_t(initial.fct_ea)
- out.arguments = initial.arguments # keep arguments tracker
- return out
-
- # only one predecessor, use its state
- if npred == 1:
- last_node = flow.pred(idx, 0)
- if last_node == idx:
- out = state_t(initial.fct_ea)
- out.arguments = initial.arguments
- return out
-
- if last_node not in prev_states.keys():
- raise BaseException("invalid previous node")
-
- return prev_states[last_node].copy()
-
- # multiple predecessors, find one suitable
- predecessors = []
+ # get all candidates for previous state
+ states = []
for i in range(npred):
- predecessor_node = flow.pred(idx, i)
- if predecessor_node in prev_states.keys():
- predecessors.append(prev_states[predecessor_node])
+ prev = block.pred(i)
+ if prev in prev_states:
+ states.append(prev_states[prev])
- if len(predecessors) == 0:
- raise BaseException("no previous node found")
+ if len(states) == 0:
+ return state_t(initial.mba, initial.fct)
- return select_state(predecessors).copy()
+ return select_state(states).copy()
# next node to visit from given list
@@ -534,89 +478,38 @@ def pop_node(nodes: Collection[Tuple[int, Set[int]]], visited: Set[int]) -> int:
return node
-def walk_topological(flow) -> Iterator[int]:
+def walk_topological(mba: ida_hexrays.mba_t) -> Generator[int, None, None]:
# generate a list of nodes with predecessors
nodes: Collection[Tuple[int, Set[int]]] = list()
- for i in range(flow.size()):
- # all predecessors, excluding current node
- preds = set([flow.pred(i, j) for j in range(flow.npred(i)) if flow.pred(i, j) != i])
- nodes.append((i, preds))
+
+ cur: ida_hexrays.mblock_t = mba.blocks
+ while cur:
+ # avoid empty blocks (head, tail & other purged blocks)
+ if not cur.empty():
+ preds = set(cur.pred(i) for i in range(cur.npred()) if not mba.get_mblock(cur.pred(i)).empty())
+ nodes.append((cur.serial, preds))
+ cur = cur.nextb
visited: Set[int] = set()
while len(nodes):
yield pop_node(nodes, visited)
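The walk visits basic blocks so that, loops aside, a block is only processed once its predecessors are, which lets each block start from the richest predecessor state. A standalone sketch over a hypothetical CFG given as `{node: predecessor set}`:

```python
# Standalone sketch of walk_topological: emit each node once all of its
# predecessors were emitted, falling back to an arbitrary pending node
# to break cycles (loop back-edges). The CFG here is hypothetical.
def walk_topological(preds):
    pending = dict(preds)  # node -> set of predecessors
    done = set()
    order = []
    while pending:
        ready = sorted(n for n, p in pending.items() if p <= done)
        node = ready[0] if ready else min(pending)  # cycle fallback
        order.append(node)
        done.add(node)
        del pending[node]
    return order

# diamond: 0 -> {1, 2} -> 3
cfg = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
```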
-# a visited function
-class function_t:
- def __init__(self, ea):
- self.ea = ea
-
- # guessed cc
- self.cc = get_abi()
-
- # approximate count of arguments
- self.args_count = self.cc.get_arg_count()
- self.args = [set() for i in range(self.args_count)] # sets of (sid, shift)
-
- self.cc_not_guessed = True
-
- def update_visited(self, state: state_t):
- for i in range(self.args_count):
- cur = get_argument(self.cc, state, i)
- if isinstance(cur, sid_t):
- self.args[i].add((cur.sid, cur.shift))
-
- def has_args(self) -> bool:
- for i in range(self.args_count):
- if len(self.args[i]) > 0:
- return True
- return False
-
- # guess function cc & arguments count
- def guess_function_cc(self, arguments: arguments_t):
- # always use guessed cc from arguments, in case arguments'cc is de-synced with self.cc
- cc, start_arg, args_count = arguments.guess_cc()
- self.cc = cc
-
- fixed_args_count = min(self.cc.get_arg_count(), args_count)
- if self.cc_not_guessed:
- self.args_count = fixed_args_count
-
- # shift args array if needed
- if start_arg > 0:
- self.args = self.args[start_arg:]
-
- self.cc_not_guessed = False
-
- elif self.args_count < fixed_args_count:
- self.args_count = fixed_args_count
-
- # guessed args count
- def get_count(self) -> int:
- if self.cc_not_guessed:
- return 0
- return self.args_count
-
- def __repr__(self):
- return f"cpustate.function_t {hex(self.ea)}"
-
-
# Injector into state_t
class injector_t:
def __init__(self, callback=None, when: int = 0):
- self.callback = callback # callback(state: state_t, insn: idaapi.insn_t, before_update: bool)
+ self.callback = callback # callback(state: state_t, ea: int, sub_ea: int, before_update: bool)
self.when = when # when & 1 -> inject before, when & 2 -> inject after
# inject value before processing current instruction
- def inject_before(self, state: state_t, insn: idaapi.insn_t):
+ def inject_before(self, state: state_t, ea: int, sub_ea: int):
if self.when & 1:
- self.callback(state, insn, True)
+ self.callback(state, ea, sub_ea, True)
# inject value after the current instruction has been processed
- def inject_after(self, state: state_t, insn: idaapi.insn_t):
+ def inject_after(self, state: state_t, ea: int, sub_ea: int):
if self.when & 2:
- self.callback(state, insn, False)
+ self.callback(state, ea, sub_ea, False)
# should_propagate default callback
@@ -653,190 +546,162 @@ def has_function(self, ea: int) -> bool:
return ea in self.visited
# get or create function
- def get_function(self, ea: int) -> function_t:
- if not self.has_function(ea):
- self.visited[ea] = function_t(ea)
- return self.visited[ea]
-
- # get function's cc
- def get_function_cc(self, ea: int) -> arch.abi_t:
- if self.has_function(ea):
- return self.visited[ea].cc
- return get_abi() # default cc
-
-
-# if given instruction is a call / jmp, get its target
-def resolve_callee(insn: idaapi.insn_t, state: state_t):
- target = insn.ops[0]
- if target.type == idaapi.o_reg: # call rax
- cur = state.get_register(target.reg)
- if not isinstance(cur, mem_t):
- return
-
- target_addr = cur.get_val()
-
- elif target.type in [idaapi.o_mem, idaapi.o_far, idaapi.o_near]:
- target_addr = target.addr
- if target.type == idaapi.o_mem:
- target_addr = ida_utils.dereference_pointer(target_addr)
-
- else:
- return
-
- callee = idaapi.get_func(target_addr)
- if callee is None or callee.start_ea != target_addr:
- return
-
- utils.g_logger.debug(f"call at 0x{insn.ea:x} resolved to function 0x{callee.start_ea:x}")
- state.call_to = callee
+ def get_function_for_mba(self, mba: ida_hexrays.mba_t) -> function_t:
+ if not self.has_function(mba.entry_ea):
+ self.visited[mba.entry_ea] = function_t(mba)
+ return self.visited[mba.entry_ea]
+ # get or create function model for ea
+ def get_function(self, fct: idaapi.func_t) -> Optional[function_t]:
+ mba = ida_utils.get_func_microcode(fct)
+ if mba is None:
+ return None
-# validate that function arguments are used if they are passed to another function
-def validate_passthrough_args(caller_state: state_t, callee: function_t, is_call: bool):
- for i in range(callee.get_count()):
- cur = get_argument(callee.cc, caller_state, i, False, not is_call)
- caller_state.arguments.validate(cur)
+ return self.get_function_for_mba(mba)
# copy callee's arguments from caller's state and propagate in callee
-def flow_in_callee(call_ea: int, state: state_t, param: dflow_ctrl_t) -> Iterator[Tuple[int, state_t]]:
+def flow_in_callee(
+ call_ea: int, state: state_t, params: dflow_ctrl_t
+) -> Generator[Tuple[int, int, state_t], None, None]:
ret_value = call_t(idaapi.BADADDR if state.call_to is None else state.call_to.start_ea, call_ea)
- is_call = state.call_type == call_type_t.CALL
if state.call_to is not None: # callee was resolved
- model = param.get_function(state.call_to.start_ea)
+ # microcode for callee
+ mba = ida_utils.get_func_microcode(state.call_to)
+ if mba is None:
+ utils.g_logger.warning(f"No mba for callee {state.call_to.start_ea:#x}")
+ else:
+ # callee initial state
+ cistate = state_t(mba, params.get_function_for_mba(mba))
+ populate_arguments(cistate, state)
+
+ params.depth -= 1
- cistate = state_t(model.ea)
- populate_arguments(cistate, model.cc, state, is_call)
+ # propagate in callee
+ # peep at intermediate states to catch return values
+ for ea, sea, cstate in function_data_flow(cistate, params):
+ if isinstance(cstate.ret, absop_t) and cstate.ret.should_dive() and cstate.fct == cistate.fct:
+ ret_value = cstate.ret
- param.depth -= 1
- for ea, cstate in function_data_flow(state.call_to, cistate, param):
- # get callee return value
- if cstate.ret is not None and state.call_to.contains(cstate.ret.where) and not isinstance(ret_value, sid_t):
- ret_value = cstate.ret.code
+ yield ea, sea, cstate
- yield ea, cstate
+ params.depth += 1
- param.depth += 1
+ # set last call return value
+ utils.g_logger.debug(f"ret value for call @ {call_ea:#x} set to {ret_value}")
+ state.set_register(state.mba.call_result_kreg, ret_value)
- # validate parameters used in callee
- validate_passthrough_args(state, model, is_call)
- set_ret_value(state, ret_value)
+# propagate in a function, using given initial state and parameters
+def function_data_flow(initial_state: state_t, params: dflow_ctrl_t) -> Generator[Tuple[int, int, state_t], None, None]:
+ mba = initial_state.mba
+ # apply entry injection before deciding if we should continue
+ # note: function's ea may differ from first insn.ea
+ params.injector.inject_before(initial_state, mba.entry_ea, -1)
-# propagate in given function, using given initial state and parameters
-def function_data_flow(
- fct: idaapi.func_t, initial_state: state_t, param: dflow_ctrl_t
-) -> Iterator[Tuple[int, state_t]]:
- model = param.get_function(fct.start_ea) # function's model
+ # check if we can get new info by propagating there
+ if not params.should_propagate(initial_state.fct, initial_state):
+ return
# record initial states for every node
prev_states = dict() # bb index -> state
prev_states[idaapi.BADADDR] = initial_state
+ # analyze calls & resolve callees' arguments
+ # this triggers decompilation, so only do it once we are sure to analyze the function
+ ida_utils.mba_analyze_calls(mba)
+
# get nodes flooding order
- flow = idaapi.qflow_chart_t()
- flow.create("", fct, fct.start_ea, fct.end_ea, idaapi.FC_NOEXT)
- nodes = walk_topological(flow)
+ nodes = walk_topological(mba)
+ # get entry basic block
try:
- entry = flow[next(nodes)] # function's entry block
+ block = mba.get_mblock(next(nodes))
except StopIteration: # function has no block
- utils.g_logger.error(f"No entry block for function 0x{fct.start_ea}")
+ utils.g_logger.error(f"No entry block for function {mba.entry_ea:#x}")
return
- insn = idaapi.insn_t() # current instruction
- next_instruction(entry.start_ea, entry, insn)
+ insn = block.head # first instruction
+ state = initial_state # first state
- # apply entry injection before recording function's arguments
- param.injector.inject_before(initial_state, insn)
- initial_state.reset_arguments(model.cc)
-
- # check if we can get new info by propagating there
- if not param.should_propagate(model, initial_state):
- return
- model.update_visited(initial_state)
-
- # process first instruction
- process_instruction(initial_state, insn)
-
- state = initial_state
- node_id = 0
+ # two minsn may have the same ea, use sub_ea to distinguish them
+ sub_ea = 0
# for every basic block
while True:
# for every instruction
- while True:
- # yield state after instruction processing
- # before after-process injection
- yield insn.ea, state
+ while insn:
+ # for every subinstruction forming the instruction
+ for subinsn in flatten_minsn(insn, mba):
+ params.injector.inject_before(state, subinsn.ea, sub_ea)
+
+ process_instruction(state, subinsn)
+
+ # yield state after processing the insn
+ yield subinsn.ea, sub_ea, state
- # we need to go deeper
- if state.call_to is not None or state.call_type == call_type_t.CALL:
- for cea, cstate in flow_in_callee(insn.ea, state, param):
- yield cea, cstate
+ # we need to go deeper
+ if state.has_call_info():
+ yield from flow_in_callee(subinsn.ea, state, params)
- param.injector.inject_after(state, insn)
+ params.injector.inject_after(state, subinsn.ea, sub_ea)
+ sub_ea += 1
- if not next_instruction(insn.ea + insn.size, entry, insn):
- break
+ # forget intermediate results
+ state.drop_kregs()
- param.injector.inject_before(state, insn)
- process_instruction(state, insn)
+ sub_ea = sub_ea if (insn.next and insn.ea == insn.next.ea) else 0
+ insn = insn.next
+
+ # tail basic block (ending with a ret)
+ # there is no specific minsn for ret, a tail bb is only followed by the special BLT_STOP bb
+ # note: a call to a noreturn function creates a special bb without any successor
+ if block.nsucc() == 1 and mba.get_mblock(block.succ(0)).type == idaapi.BLT_STOP:
+ yield block.end, -1, handle_ret(state)
# add updated state to previous states
- prev_states[node_id] = state
+ prev_states[block.serial] = state
# next block to process
try:
- node_id = next(nodes)
+ block = mba.get_mblock(next(nodes))
except StopIteration:
break
- entry = flow[node_id] # next block
- next_instruction(entry.start_ea, entry, insn) # assume there is at least one instruction per bb
-
- state = get_previous_state(flow, node_id, prev_states)
- param.injector.inject_before(state, insn)
- process_instruction(state, insn)
+ insn = block.head
+ state = get_previous_state(block, prev_states)
- # deduce function's calling convention
- model.guess_function_cc(initial_state.arguments)
+# copy arguments from caller state to callee state
+def populate_arguments(callee_state: state_t, caller_state: Optional[state_t] = None):
+ # warn if the number of arguments at the call site differs from the function's prototype
+ if caller_state and callee_state.fct.get_args_count() != len(caller_state.call_args):
+ utils.g_logger.warning(
+ f"fct {callee_state.get_fea():#x} mismatch between fct nargs ({callee_state.fct.get_args_count()}) and call site args {len(caller_state.call_args)}"
+ )
-# copy arguments from caller state to callee state, depending on callee cc
-def populate_arguments(
- callee_state: state_t, callee_cc: arch.abi_t, caller_state: state_t = None, is_call: bool = True
-):
- for i in range(callee_cc.get_arg_count()):
- arg = None
- if caller_state is not None:
- arg = get_argument(callee_cc, caller_state, i, False, not is_call)
+ for i in range(callee_state.fct.get_args_count()):
+ val = (
+ caller_state.call_args[i] if (caller_state and i < len(caller_state.call_args)) else None
+ ) # get caller value for the arg
+ val = val if isinstance(val, absop_t) and val.should_dive() else arg_t(i) # use default arg when required
- if arg is None or not arg.should_dive():
- set_argument(callee_cc, callee_state, i, arg_t(i))
- else:
- # copy so we have a fresh reference for args count tracking
- set_argument(callee_cc, callee_state, i, copy.copy(arg))
+ callee_state.set_var_from_loc(callee_state.fct.get_argloc(i), val)
-# generate cpu state for input function
-# params.depth = when propagating in function call/jumps, max depth to go
-# -1 = follow until no more sid_t in state, 0 = don't follow calls
+# generate cpu state for given function
def generate_state(
- func: idaapi.func_t, params: dflow_ctrl_t = None, cc: arch.abi_t = None
-) -> Iterator[Tuple[int, state_t]]:
- starting_state = state_t(func.start_ea)
-
- if params is None:
- params = dflow_ctrl_t()
-
- if cc is None:
- cc = get_abi()
+ func: idaapi.func_t, params: Optional[dflow_ctrl_t] = None
+) -> Generator[Tuple[int, int, state_t], None, None]:
+ mba = ida_utils.get_func_microcode(func)
+ if mba is None:
+ utils.g_logger.error(f"no microcode for {func.start_ea:#x}, no states generated")
+ return
- # Set up starting state with arguments
- populate_arguments(starting_state, cc)
+ params = params or dflow_ctrl_t()
+ starting_state = state_t(mba, params.get_function_for_mba(mba))
+ populate_arguments(starting_state)
- for ea, state in function_data_flow(func, starting_state, params):
- yield ea, state
+ yield from function_data_flow(starting_state, params)
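The `(ea, sub_ea)` indexing introduced above exists because several microinstructions can share one address. The counter logic can be sketched in isolation: `sub_ea` advances while consecutive instructions share an address and resets when the address changes (a simplified standalone model, not the IDA code):

```python
from typing import Iterable, List, Tuple

def number_subinstructions(eas: Iterable[int]) -> List[Tuple[int, int]]:
    # assign (ea, sub_ea) pairs: sub_ea distinguishes several
    # (micro)instructions sharing the same address
    out: List[Tuple[int, int]] = []
    sub_ea = 0
    prev = None
    for ea in eas:
        if prev is not None and ea != prev:
            sub_ea = 0  # new address, restart the secondary counter
        out.append((ea, sub_ea))
        sub_ea += 1
        prev = ea
    return out

pairs = number_subinstructions([0x10, 0x10, 0x14, 0x18, 0x18, 0x18])
```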
diff --git a/symless/existing.py b/symless/existing.py
index 91df74c..c37fc1c 100644
--- a/symless/existing.py
+++ b/symless/existing.py
@@ -1,71 +1,140 @@
+from collections import deque
+from typing import Collection, Tuple
+
import idaapi
-import idc
+import symless.generation as generation
import symless.utils.ida_utils as ida_utils
-
-
-# set existing structure padding fields to undefined
-def remove_padd_fields(struc: idaapi.struc_t):
- offset = idaapi.get_struc_first_offset(struc)
- size = idaapi.get_struc_size(struc)
-
- while offset < size and offset != idaapi.BADADDR:
- member = idaapi.get_member(struc, offset)
-
- if member is not None: # avoid undefined fields
- name = idaapi.get_member_name(member.id)
- if name.startswith("padd_"):
- idaapi.del_struc_member(struc, offset)
-
- offset = idaapi.get_struc_next_offset(struc, offset)
-
-
-# get flags giving the right type for given struct member size
-def get_data_flags(size: int):
- flags = idaapi.FF_DATA
- if size < 32: # avoid ymmword type, raises warnings
- flags |= idaapi.get_flags_by_size(size)
- return flags
-
-
-# Add padding fields to structure
-def add_padd_fields(struc: idaapi.struc_t, size: int):
- current, next = 0, 0
- struc_size = idaapi.get_struc_size(struc.id)
-
- while next != struc_size:
- if idc.get_member_id(struc.id, next) != -1:
- if next - current > 0:
- msize = next - current
- idaapi.add_struc_member(struc, f"padd__{current:08x}", current, get_data_flags(msize), None, msize)
- next = idc.get_next_offset(struc.id, next)
- current = next
- else:
- next = idc.get_next_offset(struc.id, next)
-
- if struc_size < size:
- msize = size - struc_size
- idaapi.add_struc_member(struc, f"padd__{struc_size:08x}", struc_size, get_data_flags(msize), None, msize)
-
-
-# was a structured assigned to an assembly operand
-def has_op_stroff(ea: int, n: int):
+import symless.utils.utils as utils
+
+
+# add special gap field to structure
+def make_gap(struc: idaapi.tinfo_t, off: int, size: int):
+ udm = idaapi.udm_t()
+ udm.offset = off * 8
+ udm.size = size * 8
+ udm.name = f"gap{off:X}"
+ udm.tafld_bits |= idaapi.TAFLD_GAP # is_gap
+
+ # set type to _BYTE[size]
+ arr = idaapi.array_type_data_t(0, size)
+ arr.elem_type = ida_utils.get_basic_type(idaapi.BT_VOID | idaapi.BTMT_SIZE12)
+ udm.type = idaapi.tinfo_t()
+ udm.type.create_array(arr)
+
+ tcode = struc.add_udm(udm, idaapi.ETF_MAY_DESTROY)
+ if tcode != idaapi.TERR_OK:
+ utils.g_logger.error(
+ f'Failed to add gap {udm.name} (size {size:#x}, type "{udm.type}") to {struc.get_type_name()}: "{idaapi.tinfo_errstr(tcode)}" ({tcode:#x})'
+ )
+
+
+# remove padding fields from structure
+def remove_padd_fields(struc: idaapi.tinfo_t):
+ details = idaapi.udt_type_data_t()
+ struc.get_udt_details(details)
+
+ # search for gaps
+ gaps: Collection[Tuple[int, int]] = deque()
+ for udm in details:
+ if udm.name == f"gap{(udm.offset // 8):X}": # gap identified by name
+ # we do not want consecutive gaps later, merge them
+ if len(gaps) and gaps[0][1] == udm.offset:
+ gaps[0] = (gaps[0][0], udm.offset + udm.size)
+ else:
+ gaps.appendleft((udm.offset, udm.offset + udm.size))
+
+ # merge all gaps into real gaps fields
+ for off, end in gaps:
+ make_gap(struc, off // 8, (end // 8) - (off // 8))
+
+
+# remove gap flag from padd fields
+# + padd to reach at least min_size bytes
+def add_padd_fields(struc: idaapi.tinfo_t, min_size: int):
+ csize = struc.get_size()
+ if csize < min_size: # padd to final size
+ make_gap(struc, csize, min_size - csize)
+
+ # remove gap flags from gaps - collapse padding
+ details = idaapi.udt_type_data_t()
+ struc.get_udt_details(details)
+ for udm in details:
+ udm.tafld_bits &= ~idaapi.TAFLD_GAP
+ struc.create_udt(details) # removes tid & ordinal info
+
+
+# get the structure path assigned to given operand
+def get_op_stroff(ea: int, n: int):
delta, path = idaapi.sval_pointer(), idaapi.tid_array(idaapi.MAXSTRUCPATH)
- return idaapi.get_stroff_path(path.cast(), delta.cast(), ea, n) > 0
+ if idaapi.get_stroff_path(path.cast(), delta.cast(), ea, n) == 0:
+ return idaapi.BADADDR
+ return path[0]
# find existing vtable structure from vtable ea
def find_existing_vtable(ea: int) -> int:
tinfo = idaapi.tinfo_t()
- if not idaapi.get_tinfo(tinfo, ea):
+ if not (idaapi.get_tinfo(tinfo, ea) and tinfo.is_udt()):
return idaapi.BADADDR
- return ida_utils.struc_from_tinfo(tinfo)
+ return tinfo.get_tid()
-# can we replace existing type with a struct type
-# only if type is a scalar or a scalar ptr
-def can_type_be_replaced(tinfo: idaapi.tinfo_t) -> bool:
- ptr_data = idaapi.ptr_type_data_t()
- if tinfo.get_ptr_details(ptr_data):
- tinfo = ptr_data.obj_type
- return tinfo.is_scalar() and not tinfo.is_enum()
+# find existing structure with given name
+def find_existing_structure(name: str) -> int:
+ tinfo = ida_utils.get_local_type(name)
+ if tinfo is None:
+ return idaapi.BADADDR
+
+ if tinfo.is_forward_struct():
+ ida_utils.replace_forward_ref(tinfo)
+ elif not tinfo.is_udt():
+ return idaapi.BADADDR
+
+ return tinfo.get_tid()
+
+
+# should we replace an existing type in the idb with our struc ptr
+# types we think are ok to replace: void, scalars and scalar pointers
+def should_arg_type_be_replaced(tinfo: idaapi.tinfo_t) -> bool:
+ if tinfo.is_ptr():
+ ptr_data = idaapi.ptr_type_data_t()
+ if not tinfo.get_ptr_details(ptr_data) or ptr_data.parent.get_realtype() != idaapi.BT_UNK:
+ return False
+ tinfo = ptr_data.obj_type # decide on pointee type
+
+ # void, ints, floats, bools
+ return idaapi.get_base_type(tinfo.get_realtype()) < idaapi.BT_PTR
+
+
+# should we replace an existing struc field type
+# only replace integers and void pointers
+def should_field_type_be_replaced(tinfo: idaapi.tinfo_t) -> bool:
+ if tinfo.is_ptr():
+ ptr_data = idaapi.ptr_type_data_t()
+ if not tinfo.get_ptr_details(ptr_data) or ptr_data.parent.get_realtype() != idaapi.BT_UNK:
+ return False
+
+ # void*
+ return idaapi.get_base_type(ptr_data.obj_type.get_realtype()) == idaapi.BT_VOID
+
+ # integers
+ rt = idaapi.get_base_type(tinfo.get_realtype())
+ return rt >= idaapi.BT_INT8 and rt <= idaapi.BT_INT
+
+
+# should we rename a field, avoid renaming user-provided fields
+def should_field_name_be_replaced(offset: int, old_name: str, new_name: str) -> bool:
+ default_names = { # record of preferred default names
+ "": -1,
+ generation.unk_data_field_t.get_default_name(offset): 0,
+ generation.field_t.get_default_name(offset): 1,
+ generation.ptr_field_t.get_default_name(offset): 2,
+ generation.fct_ptr_field_t.get_default_name(offset): 3,
+ generation.struc_ptr_field_t.get_default_name(offset): 3,
+ generation.vtbl_ptr_field_t.get_default_name(offset): 4,
+ }
+
+ old_name_score = default_names.get(old_name, 5)
+ new_name_score = default_names.get(new_name, 5)
+ return new_name_score > old_name_score
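The renaming policy above boils down to ranking names. The sketch below inlines the default-name patterns used by the `generation` field classes (`buff_`, `field_`, `ptr_`, `method_`, `struc_`); any name outside those patterns is treated as user-provided and never replaced (the vtable pattern is omitted here for brevity):

```python
from typing import Dict

def should_replace_name(offset: int, old: str, new: str) -> bool:
    # preferred default names, least to most informative
    scores: Dict[str, int] = {
        "": -1,
        f"buff_{offset:08x}": 0,
        f"field_{offset:08x}": 1,
        f"ptr_{offset:08x}": 2,
        f"method_{offset:08x}": 3,
        f"struc_{offset:08x}": 3,
    }
    # any non-default (user-provided) name scores highest
    return scores.get(new, 5) > scores.get(old, 5)

# a generic default gives way to a more specific default,
# while a user-chosen name is kept
r1 = should_replace_name(8, "field_00000008", "ptr_00000008")
r2 = should_replace_name(8, "my_refcount", "ptr_00000008")
```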
diff --git a/symless/generation/__init__.py b/symless/generation/__init__.py
index 0dc9072..cc80ef8 100644
--- a/symless/generation/__init__.py
+++ b/symless/generation/__init__.py
@@ -1,5 +1,6 @@
import collections
-from typing import Collection, Iterator, Optional, Set, Tuple, Union
+from math import log2
+from typing import Any, Collection, Generator, List, Optional, Set, Tuple, Union
import idaapi
@@ -27,14 +28,14 @@ def __init__(
self.name: Optional[str] = None # field's name
self.owner: Optional[structure_t] = None # structure this field belongs to
- # get field's default name
- def get_default_name(self) -> str:
- return f"field_{self.offset:08x}"
+ # default name for the fields of this class, at given offset
+ def get_default_name(off: int) -> str:
+ return f"field_{off:08x}"
# get field's name
def get_name(self) -> str:
if self.name is None:
- return self.get_default_name()
+ return self.__class__.get_default_name(self.offset)
return self.name
# compute wished field's name using symbols
@@ -58,8 +59,19 @@ def get_comment(self) -> Optional[str]:
return None
# get field's type
- def get_type(self) -> Optional[idaapi.tinfo_t]:
- return None
+ def get_type(self) -> idaapi.tinfo_t:
+ if self.size not in (1, 2, 4, 8, 16):
+ raise RuntimeError(f"Unexpected {self} size")
+
+ # try the cool kids types
+ bt = ("uint8_t", "uint16_t", "uint32_t", "uint64_t", "uint128_t")[int(log2(self.size))]
+ t = ida_utils.get_local_type(bt)
+ if t is not None:
+ return t
+
+ # resort to verbose types
+ bt = (idaapi.BT_INT8, idaapi.BT_INT16, idaapi.BT_INT32, idaapi.BT_INT64, idaapi.BT_INT128)[int(log2(self.size))]
+ return ida_utils.get_basic_type(bt | idaapi.BTMT_USIGNED)
# do we have information on this field's type
def has_type(self) -> bool:
@@ -95,61 +107,106 @@ def replace(self, other: "field_t") -> bool:
utils.g_logger.warning(
f"Can not decide between fields set in {self.block.get_owner().entry_id()} and {other.block.get_owner().entry_id()} for structure {self.owner.get_name()}"
)
- return True
+ return False
def __str__(self) -> str:
- return f"{self.__class__.__name__} (0x{self.offset:x}:0x{self.size:x}), name: {self.get_name()}"
+ return f"{self.get_name()}[{self.offset:#x}:{self.size:x}]"
+
+
+# class for fields of unknown size
+# i.e we know the address of the field was used, no idea what's in there
+class unk_data_field_t(field_t):
+ def __init__(self, offset, block: model.block_t):
+ super().__init__(offset, 1, None, block)
+
+ def get_default_name(off: int) -> str:
+ return f"buff_{off:08x}"
+
+ def replace(self, other: "field_t") -> bool:
+ return True
+
+ # char[1]
+ def get_type(self) -> idaapi.tinfo_t:
+ t = idaapi.tinfo_t()
+ a = idaapi.array_type_data_t()
+ a.elem_type = ida_utils.get_basic_type(idaapi.BT_INT8 | idaapi.BTMT_CHAR)
+ a.nelems = 1
+ t.create_array(a)
+ return t
# field typed with an unknown pointer
class ptr_field_t(field_t):
- def __init__(self, offset: int, flow: model.entry_t, block: model.block_t):
+ def __init__(self, value: Any, offset: int, flow: model.entry_t, block: model.block_t):
super().__init__(offset, ida_utils.get_ptr_size(), flow, block)
+ self.value = value # for base ptr_field_t, an integer value
- def get_default_name(self) -> str:
- return f"ptr_{self.offset:08x}"
+ def get_default_name(off: int) -> str:
+ return f"ptr_{off:08x}"
def has_type(self) -> bool:
return True
+ # guess type from pointed data type
+ def get_type(self) -> idaapi.tinfo_t:
+ if not idaapi.is_mapped(self.value):
+ return ida_utils.void_ptr()
+
+ tinfo = idaapi.tinfo_t()
+ if not idaapi.get_tinfo(tinfo, self.value):
+ return ida_utils.void_ptr()
+
+ tinfo.create_ptr(tinfo)
+ return tinfo
+
# field typed with a function pointer
class fct_ptr_field_t(ptr_field_t):
def __init__(
self,
- fct: model.function_t,
+ fct_ea: int,
offset: int,
flow: model.entry_t,
block: model.block_t,
):
- super().__init__(offset, flow, block)
- self.fct = fct # pointed function
+ super().__init__(fct_ea, offset, flow, block)
- def get_default_name(self) -> str:
- return f"method_{self.offset:08x}"
+ def get_default_name(off: int) -> str:
+ return f"method_{off:08x}"
def preferred_name(self) -> Optional[str]:
- signature = ida_utils.demangle_ea(self.fct.ea)
- return None if len(signature) == 0 else symbols.method_name_from_signature(signature)
+ signature = ida_utils.demangle_ea(self.value)
+ if len(signature) == 0:
+ return None
+
+ simple = symbols.method_name_from_signature(signature)
+
+ # as much as we would love to use '~' in dtor names, IDA does not really support it
+ # it can cause problems when applying stroff & xrefs
+ if simple[0] == "~":
+ simple = "%s_dtor%s" % (simple[1:], "" if self.offset == 0 else f"_{(self.offset//self.size):x}")
+
+ return simple.strip("~")
def get_comment(self) -> str:
- return idaapi.get_name(self.fct.ea)
+ return f"{self.value:#x}"
- def get_type(self) -> Optional[idaapi.tinfo_t]:
- func_tinfo, func_data = ida_utils.get_or_create_fct_type(self.fct.ea, self.fct.get_ida_cc())
+ def get_type(self) -> idaapi.tinfo_t:
+ func_tinfo, func_data = ida_utils.get_or_create_fct_type(self.value)
# owner is a vtable, make sure to type method's 'this' argument
- if isinstance(self.owner, vtable_t):
+ if isinstance(self.owner, vtable_struc_t):
this, shift = self.owner.get_class()
this_tinfo = this.find_ptr_tinfo()
ida_utils.set_function_argument(func_data, 0, this_tinfo, shift, this_tinfo, "this")
func_tinfo.create_func(func_data)
- func_tinfo.create_ptr(func_tinfo)
- return func_tinfo
+ if func_tinfo.create_ptr(func_tinfo):
+ return func_tinfo
+ return ida_utils.void_ptr() # default to void*
def match(self, other: "fct_ptr_field_t") -> bool:
- return self.fct == other.fct
+ return self.value == other.value
# field typed with a structure pointer
@@ -161,17 +218,16 @@ def __init__(
flow: model.entry_t,
block: model.block_t,
):
- super().__init__(offset, flow, block)
- self.value = ep # ep written in this field
+ super().__init__(ep, offset, flow, block)
- def get_default_name(self) -> str:
- return f"struc_{self.offset:08x}"
+ def get_default_name(off: int) -> str:
+ return f"struc_{off:08x}"
# get structure this field points to
def get_structure(self) -> Tuple[int, "structure_t"]:
return self.value.get_structure()
- def get_type(self) -> Optional[idaapi.tinfo_t]:
+ def get_type(self) -> idaapi.tinfo_t:
shift, struc = self.get_structure()
tinfo = struc.find_ptr_tinfo()
@@ -213,41 +269,45 @@ def __init__(
):
self.add_vtable(type.entry, self.in_order)
- def get_default_name(self) -> str:
- return "%s%s" % (idaapi.VTBL_MEMNAME, "" if self.offset == 0 else f"_{self.offset:08x}")
+ def get_default_name(off: int) -> str:
+ return "%s%s" % (idaapi.VTBL_MEMNAME, "" if off == 0 else f"_{off:08x}")
def get_comment(self) -> str:
_, vtbl = self.get_structure()
return vtbl.get_name()
# add given vtable to vtables values list
- # is as_latest is set, consider it to be effective field's value
+ # if as_latest is set, consider it to be effective field's value
# else use less derived vtable between first & last added
- def add_vtable(self, vtbl: model.vtbl_entry_t, as_latest: bool) -> bool:
+ def add_vtable(self, vtbl: model.vtbl_entry_t, as_latest: bool):
if vtbl in self.values: # already encountered
- return False
+ return
self.values.append(vtbl)
- self.value = vtbl
-
- if not as_latest and self.values[0].is_most_derived(vtbl): # keep first vtbl set
- self.value = self.values[0]
- return False
-
- return True
+ self.value = vtbl if as_latest else vtbl.get_most_derived(self.values[0])
def replace(self, other: "field_t") -> bool:
if not isinstance(other, vtbl_ptr_field_t):
return False
- # different data flow, do not take into account
- if other.flow != self.flow:
- return False
+ # from same data flow, take latest vtable into account
+ if other.flow == self.flow:
+ old_vtbl = self.value
+ new_vtbl = other.value
+
+ # keep all info into new field
+ other.values = self.values
+ other.value = self.value
+ other.add_vtable(new_vtbl, other.in_order)
- if self.add_vtable(other.value, self.in_order):
- other.values = self.values # keep vtable record in new field
+ utils.g_logger.debug(
+ f"__vftable_{self.offset:#x} selecting vtbl {other.value.ea:#x} between ({old_vtbl.ea:#x}, {new_vtbl.ea:#x}) (in order {other.in_order})"
+ )
return True
+ # else: different data flow not taken into account
+ # effective vtable should have been found in this flow
+
return False
@@ -257,7 +317,7 @@ def __init__(self, sid: int):
self.sid = sid # model sid
self.size = -1 # structure size, if known
self.ea = idaapi.BADADDR
- self.ida_sid = idaapi.BADADDR # associated IDA struc sid
+ self.ida_tid = idaapi.BADADDR # associated IDA struc tid
self.fields: dict[int, field_t] = dict() # structure's members
self.range: Collection[tuple[int, int]] = list() # structure's ranges occupied by fields (offset, size)
@@ -268,19 +328,29 @@ def __init__(self, sid: int):
# records (shift, entry), shift is used when entry is a shift ptr on our strucs
self.root_eps: Set[Tuple[int, model.entry_t]] = set()
+ # is the structure associated with some xrefs
+ # if not, no need to generate it
+ self.has_xrefs = False
+
+ # force this struc generation into IDA database
+ self.force_generation = False
+
def set_size(self, size: int):
self.size = size
- # size of struc
- # if unknown, get current size from defined fields
- def get_size(self) -> int:
- if self.size >= 0:
- return self.size
+ # structure size from its last field
+ def get_size_from_fields(self) -> int:
if len(self.range) == 0:
return 0
last = self.range[-1]
return last[0] + last[1]
+ # structure size, known (malloc) or from the fields we found
+ def get_size(self) -> int:
+ if self.size >= 0:
+ return self.size
+ return self.get_size_from_fields()
+
# add a field to the structure
# solver_cb callback used to resolve overlapping fields
def set_field(self, field: field_t, solver_cb) -> bool:
@@ -346,13 +416,23 @@ def set_field(self, field: field_t, solver_cb) -> bool:
return True
+ # get the field that occupies given offset
+ def has_field_at(self, offset: int) -> Optional[field_t]:
+ i = 0
+ while i < len(self.range) and self.range[i][0] <= offset:
+ if self.range[i][0] + self.range[i][1] > offset:
+ return self.fields[self.range[i][0]]
+ i += 1
+ return None
+
+ # get field starting at given offset
def get_field(self, offset: int) -> Optional[field_t]:
try:
return self.fields[offset]
except KeyError:
return None
- def get_fields(self) -> Iterator[field_t]:
+ def get_fields(self) -> Generator[field_t, None, None]:
for field in self.fields.values():
yield field
@@ -367,39 +447,41 @@ def associate_root(self, entry: model.entry_t, shift: int):
entry.set_structure(shift, self)
# get structure's entries in the data flow
- def associated_root(self) -> Iterator[Tuple[int, model.entry_t]]:
+ def associated_root(self) -> Generator[Tuple[int, model.entry_t], None, None]:
for shift, root in self.root_eps:
yield shift, root
# get the flow of nodes traveled by the structure
# yields (root node, current shift, current block)
- def node_flow(self) -> Iterator[Tuple[model.entry_t, int, model.block_t]]:
- for initial_shift, root in self.associated_root():
- for shift, node in root.node_flow():
- yield (root, initial_shift + shift, node)
+ def node_flow(self, all_roots: bool = True) -> Generator[Tuple[model.entry_t, model.block_t, int], None, None]:
+ for initial_shift, initial_root in self.associated_root():
+ for root, node, shift in model.flow_from_root(initial_root, all_roots):
+ yield (root, node, shift + initial_shift)
# find existing IDA structure that this model represents
def find_existing(self) -> int:
- self.ida_sid = idaapi.get_struc_id(self.get_name())
- return self.ida_sid
+ self.ida_tid = existing.find_existing_structure(self.get_name())
+ return self.ida_tid
# set existing IDA struc represented by this model
- def set_existing(self, sid: int):
- self.ida_sid = sid
+ def set_existing(self, tid: int):
+ self.ida_tid = tid
# get IDA tinfo_t representing our structure
# the structure must have been created before
- def find_tinfo(self) -> Optional[idaapi.tinfo_t]:
- if self.ida_sid == idaapi.BADADDR:
+ def find_tinfo(self) -> idaapi.tinfo_t:
+ if self.ida_tid == idaapi.BADADDR:
raise Exception(f"find_tinfo on {self.get_name()} failed, no IDA structure associated")
- return ida_utils.tinfo_from_stuc(self.ida_sid)
+
+ tif = idaapi.tinfo_t()
+ tif.get_type_by_tid(self.ida_tid) # should return True
+ return tif
# get a struc pointer tinfo
- def find_ptr_tinfo(self) -> Optional[idaapi.tinfo_t]:
- simple = self.find_tinfo()
- if simple is not None:
- simple.create_ptr(simple)
- return simple
+ def find_ptr_tinfo(self) -> idaapi.tinfo_t:
+ t = self.find_tinfo()
+ t.create_ptr(t)
+ return t
def get_name(self) -> str:
if self.name is not None:
@@ -419,7 +501,7 @@ def compute_names(self):
field.set_name(taken)
# comment associated to structure
- def get_comment(self) -> str:
+ def get_comment(self) -> Optional[str]:
if not config.g_settings.debug:
return None
@@ -430,7 +512,25 @@ def get_comment(self) -> str:
# do we generate this structure in IDA
def relevant(self) -> bool:
- return True
+ return self.force_generation or (
+ self.has_xrefs # a struc without xrefs is useless
+ and len(self.fields) > 0
+ and (
+ len(self.fields) > 1 # more than 1 field - not a buffer
+ or self.get_size_from_fields() > ida_utils.get_ptr_size() # unique field not at off 0 - not a buffer
+ or self.fields[self.range[0][0]].has_type() # unique field has a relevant type
+ )
+ )
+
+ # do we need to apply the __cppobj flag
+ def is_cppobj(self) -> bool:
+ return isinstance(self.get_field(0), vtbl_ptr_field_t)
+
+ # do we need to apply the VFT flag
+ def is_vtable(self) -> bool:
+ return False
def __eq__(self, other) -> bool:
return isinstance(other, structure_t) and self.sid == other.sid
@@ -440,7 +540,7 @@ def __hash__(self) -> int:
# model of a vtable
-class vtable_t(structure_t):
+class vtable_struc_t(structure_t):
def __init__(self, sid: int):
super().__init__(sid)
self.owning_class: Optional[Tuple[structure_t, int]] = None # class owning this vtable, with associated offset
@@ -456,9 +556,9 @@ def get_comment(self) -> str:
# find existing vtable structure from typed vtable
def find_existing(self) -> int:
- self.ida_sid = existing.find_existing_vtable(self.ea)
- if self.ida_sid != idaapi.BADADDR:
- return self.ida_sid
+ self.ida_tid = existing.find_existing_vtable(self.ea)
+ if self.ida_tid != idaapi.BADADDR:
+ return self.ida_tid
return super().find_existing()
@@ -472,14 +572,17 @@ def get_class(self) -> Tuple[structure_t, int]:
def relevant(self) -> bool:
return self.owning_class is not None
+ def is_vtable(self) -> bool:
+ return True
+
# empty structure model from associated entry
def empty_model_from_ep(entry: model.entry_t) -> structure_t:
if isinstance(entry, model.vtbl_entry_t):
- return vtable_t(entry.struc_id)
+ return vtable_struc_t(entry.struc_id)
struc = structure_t(entry.struc_id)
- if isinstance(entry, model.ret_entry_t): # known size structure
+ if isinstance(entry, model.alloc_entry_t): # known size structure
struc.set_size(entry.size)
return struc
@@ -495,7 +598,7 @@ def __init__(self, merge: structure_t):
# record of all structures to be built
class structure_record_t:
def __init__(self, entries: model.entry_record_t):
- self.structures: list[Union[structure_t, merge_t]] = [None for _ in range(entries.structures_count())]
+ self.structures: List[Union[structure_t, merge_t]] = [None for _ in range(entries.structures_count())]
# fill structures array
for entry in entries.get_entries():
@@ -524,7 +627,9 @@ def get_structure(self, struc: structure_t) -> structure_t:
st = self._get_structure(st.merge_id)
return st
- def get_structures(self, cls: type = structure_t, include_discarded: bool = True) -> Iterator[structure_t]:
+ def get_structures(
+ self, cls: type = structure_t, include_discarded: bool = True
+ ) -> Generator[structure_t, None, None]:
for struc in self.structures:
if isinstance(struc, cls) and (include_discarded or struc.relevant()):
yield struc
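The merge-chasing loop in `get_structure` above (`st = self._get_structure(st.merge_id)`) resolves a slot through a chain of merge records to the surviving structure. A minimal standalone sketch of that resolution, using a hypothetical `Merge` stand-in rather than the real `merge_t` class:

```python
# Sketch: resolve a structure slot by following merge links,
# as structure_record_t.get_structure() does with merge_id.

class Merge:
    """Hypothetical stand-in for merge_t: records the slot it was merged into."""
    def __init__(self, merge_id: int):
        self.merge_id = merge_id

def resolve(slots, idx):
    # follow merge links until we land on a real (non-merge) entry
    entry = slots[idx]
    while isinstance(entry, Merge):
        entry = slots[entry.merge_id]
    return entry

slots = ["struc_A", Merge(0), Merge(1)]  # 2 merged into 1, 1 merged into 0
assert resolve(slots, 2) == "struc_A"
```

The real record resolves one level per call inside a loop; the effect is the same as this transitive chase.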
diff --git a/symless/generation/generate.py b/symless/generation/generate.py
index e85540e..5182ccd 100644
--- a/symless/generation/generate.py
+++ b/symless/generation/generate.py
@@ -1,68 +1,84 @@
+from typing import Dict, Tuple
+
import idaapi
+import idautils
+import idc
import symless.allocators as allocators
import symless.existing as existing
import symless.model.entrypoints as entrypoints
import symless.symbols as symbols
+import symless.utils.ida_utils as ida_utils
from symless.generation import *
+# folder in local types listing, to store symless generated types
STRUC_DIR = "Symless"
# make symless structures directory
def make_structures_dir():
- if not ida_utils.can_create_folder():
+ root = ida_utils.get_local_types_folder()
+ if root is None:
return
- root = idaapi.get_std_dirtree(idaapi.DIRTREE_STRUCTS)
err = root.mkdir(STRUC_DIR)
if err not in (idaapi.DTE_OK, idaapi.DTE_ALREADY_EXISTS):
- utils.g_logger.error(f'Could not create {STRUC_DIR} structures directory: "{root.errstr(err)}"')
+ utils.g_logger.error(f'Could not create {STRUC_DIR} local types folder: "{root.errstr(err)}"')
# create an empty IDA structure used to contain given struc
def make_IDA_structure(struc: structure_t):
- if struc.ida_sid != idaapi.BADADDR:
+ if struc.ida_tid != idaapi.BADADDR:
return
name = struc.get_name()
# check for existing struc
- ida_sid = struc.find_existing()
- if ida_sid != idaapi.BADADDR:
- utils.g_logger.info(f'Re-using existing structure for model "{name}"')
+ ida_tid = struc.find_existing()
+ if ida_tid != idaapi.BADADDR:
+ utils.g_logger.info(f'Re-using existing structure (tid {ida_tid:#x}) for model "{name}"')
return
# create new structure
- struc.set_existing(idaapi.add_struc(-1, name, False))
- if struc.ida_sid == idaapi.BADADDR:
+ struc.set_existing(idc.add_struc(idaapi.BADADDR, name, False))
+ if struc.ida_tid == idaapi.BADADDR:
utils.g_logger.error(f'Could not create empty structure "{name}"')
+ return
- elif ida_utils.can_create_folder():
- # move structure to symless dir
- root = idaapi.get_std_dirtree(idaapi.DIRTREE_STRUCTS)
- err = root.rename(name, f"{STRUC_DIR}/{name}")
- if err != idaapi.DTE_OK:
- utils.g_logger.warning(
- f'Could not move structure "{name}" into {STRUC_DIR} directory: "{root.errstr(err)}"'
- )
+ # move it to symless folder
+ root = ida_utils.get_local_types_folder()
+ if root is None:
+ return
+
+ err = root.rename(name, f"{STRUC_DIR}/{name}")
+ if err != idaapi.DTE_OK:
+ utils.g_logger.warning(f'Could not move structure "{name}" into {STRUC_DIR} directory: "{root.errstr(err)}"')
-# do we decide to type IDA base we given entry data
+# do we want to type IDA base with given entry data
def should_type_entry(entry: entrypoints.entry_t, ctx: entrypoints.context_t) -> bool:
- # root are always right
+ # root is always right
if entry.is_root():
return True
shift, struc = entry.get_structure()
+ if not struc.relevant():
+ return False # structure will not be generated
- # do not overwrite typing set by user on entry's operands
- for ea, n, _ in entry.get_operands():
- if existing.has_op_stroff(ea, n):
- return False
+ struc_tif = struc.find_tinfo()
+
+ # do not overwrite typing set by user on operands
+ # TODO we should only check the operand we will type
+ # for now check all operands on the instructions about to be typed
+ # assume only one operand per instruction can be typed with a struc path (not sure if always true)
+ for ea, _, _ in entry.get_operands():
+ for n in range(idaapi.UA_MAXOP):
+ # operand was typed with a different structure than ours, stop
+ if existing.get_op_stroff(ea, n) not in (idaapi.BADADDR, struc_tif.get_tid()):
+ return False
# always type with vtbl, no matter its size
- if isinstance(struc, vtable_t):
+ if isinstance(struc, vtable_struc_t):
return True
# arguments entries special case
@@ -87,9 +103,9 @@ def should_type_entry(entry: entrypoints.entry_t, ctx: entrypoints.context_t) ->
# update given function's returned type with the given entry
-def type_function_return(fct: entrypoints.function_t, entry: entrypoints.entry_t):
+def type_function_return(fct: entrypoints.prototype_t, entry: entrypoints.entry_t):
# entry is not returned, exit
- if fct.get_ret() != entry.id:
+ if fct.get_ret() != entry:
return
shift, struc = entry.get_structure()
@@ -98,8 +114,8 @@ def type_function_return(fct: entrypoints.function_t, entry: entrypoints.entry_t
if shift != 0:
return
- func_tinfo, func_data = ida_utils.get_or_create_fct_type(fct.ea, fct.get_ida_cc())
- if not existing.can_type_be_replaced(func_data.rettype):
+ func_tinfo, func_data = ida_utils.get_or_create_fct_type(fct.ea)
+ if not existing.should_arg_type_be_replaced(func_data.rettype):
return
tinfo = struc.find_ptr_tinfo()
@@ -112,18 +128,18 @@ def type_function_return(fct: entrypoints.function_t, entry: entrypoints.entry_t
# update function's type with given arg entrypoint
-def type_function_argument(fct: entrypoints.function_t, arg: entrypoints.entry_t):
+def type_function_argument(fct: entrypoints.prototype_t, arg: entrypoints.entry_t):
if not isinstance(arg, entrypoints.arg_entry_t):
return
idx = arg.index
- if idx >= fct.get_nargs():
- return
- func_tinfo, func_data = ida_utils.get_or_create_fct_type(fct.ea, fct.get_ida_cc())
+ func_tinfo, func_data = ida_utils.get_or_create_fct_type(fct.ea)
+ if idx >= func_data.size():
+ return
# do not replace existing (complex) type
- if idx < func_data.size() and not existing.can_type_be_replaced(func_data[idx].type):
+ if not existing.should_arg_type_be_replaced(func_data[idx].type):
return
shift, struc = arg.get_structure()
@@ -145,11 +161,72 @@ def type_function_argument(fct: entrypoints.function_t, arg: entrypoints.entry_t
utils.g_logger.info(f"Typing fct_0x{fct.ea:x} arg_{idx} with {struc.get_name()} shifted by 0x{shift:x}")
-# Apply struc type on operand
-def set_operand_type(ea: int, n: int, sid: int, shift: int):
- path = idaapi.tid_array(1)
- path[0] = sid
- idaapi.op_stroff(ea, n, path.cast(), 1, shift)
+# which op of the given insn should we type with a structure path
+# for given reg and given field offset
+# returns op and shift to apply
+def find_op_for_stroff(insn: idaapi.insn_t, regid: int, off: int) -> Tuple[Optional[idaapi.op_t], int]:
+ for op in ida_utils.get_insn_ops(insn):
+ # disp/phrase with regid for base register, this is what we want to type
+ # we assume there should not be more than one disp/phrase op per insn
+ if op.type in (idaapi.o_phrase, idaapi.o_displ) and op.phrase == regid:
+ # compute shift to apply to type with member at given offset
+ displ = op.addr if op.type == idaapi.o_displ else 0 # signed int32
+ shift = utils.to_c_integer(off - displ, 4)
+
+ return (op, shift)
+
+ # immediate operand preceded by regid
+ # this must be an arithmetic operation on our struc ptr, so we type the immediate value
+ # this assumes the src reg comes before the immediate, which is the case in IDA disassembly for ARM and x64
+ if (
+ op.type == idaapi.o_imm
+ and op.n > 0
+ and insn.ops[op.n - 1].type == idaapi.o_reg
+ and insn.ops[op.n - 1].reg == regid
+ ):
+ imm_size = idaapi.get_dtype_size(op.dtype)
+
+ displ = op.value
+ shift = utils.to_c_integer(off - displ, imm_size)
+
+ return (op, shift)
+
+ return (None, 0)
+
+
+# type operand with the given "struct offset"
+def apply_stroff_to_op(ea: int, regid: int, struc: idaapi.tinfo_t, off: int):
+ insn = idaapi.insn_t()
+ if idaapi.decode_insn(insn, ea) == 0:
+ return
+
+ udm = idaapi.udm_t()
+ udm.offset = off * 8
+ mid = struc.get_udm_tid(ida_utils.find_udm_wrap(struc, udm))
+ path = idaapi.tid_array(idaapi.MAXSTRUCPATH)
+
+ op, shift = find_op_for_stroff(insn, regid, off)
+ if op is None:
+ utils.g_logger.warning(f"No op to apply stroff for {ea:#x} {idaapi.get_reg_name(regid,8)}({regid:#x})")
+ return
+
+ # type operand with struc path
+ path[0] = struc.get_tid()
+ path[1] = mid
+ idaapi.op_stroff(ea, op.n, path.cast(), 2, shift)
+
+ idaapi.auto_wait() # let IDA digest
+
+ # IDA 8.4: in some cases op_stroff does not set the right struc path
+ # instead of '[#struc.field_0]' we end up with '[#struc]'
+ # thus missing an xref on field_0 for the instruction
+ # this "fix" should force the xref
+ if mid not in idautils.DataRefsFrom(ea):
+ path[0] = mid
+ idaapi.op_stroff(ea, op.n, path.cast(), 1, shift) # type op
+ idaapi.add_dref(ea, mid, idaapi.dr_I | idaapi.XREF_USER) # force xref
+
+ utils.g_logger.debug(f"Typing op {ea:#x} {op.n} with stroff {path[0]:#x}:{shift:#x}")
# type IDA base with data from given entrypoint
@@ -158,19 +235,29 @@ def type_entry(entry: entrypoints.entry_t, ctx: entrypoints.context_t):
utils.g_logger.debug(f"Not typing database with {entry.entry_id()} data")
return
- utils.g_logger.debug(f"Typing database with {entry.entry_id()} data")
-
# make sure the associated structure exists in IDA
shift, struc = entry.get_structure()
- if struc.ida_sid == idaapi.BADADDR:
- utils.g_logger.error(
+ if struc.ida_tid == idaapi.BADADDR:
+ utils.g_logger.warning(
f'Structure "{struc.get_name()}" was not generated, preventing from typing {entry.entry_id()}'
)
return
+ struc_tif = struc.find_tinfo()
+
+ utils.g_logger.debug(f"Typing database with {entry.entry_id()} data")
+
# type disassembly operands
- for ea, n, off in entry.get_operands():
- set_operand_type(ea, n, struc.ida_sid, shift + off)
+ for ea, regid, offs in entry.get_operands():
+ apply_stroff_to_op(ea, regid, struc_tif, shift + offs[0])
+
+ # multiple fields referenced by one instruction, add xrefs on additional fields
+ for i in range(1, len(offs)):
+ udm = idaapi.udm_t()
+ udm.offset = (shift + offs[i]) * 8
+ mid = struc_tif.get_udm_tid(ida_utils.find_udm_wrap(struc_tif, udm))
+ idaapi.add_dref(ea, mid, idaapi.dr_I | idaapi.XREF_USER)
+ utils.g_logger.debug(f"Adding xref for field {struc_tif.get_tid():#x}:{(shift + offs[i]):#x} on {ea:#x}")
# type containing function
fct_ea = entry.get_function()
@@ -184,92 +271,145 @@ def type_entry(entry: entrypoints.entry_t, ctx: entrypoints.context_t):
type_function_return(fct, entry)
-# Set type & rename memory allocators if needed
-def type_allocator(alloc: allocators.allocator_t, ctx: entrypoints.context_t):
+# set type & rename memory allocators if needed
+def type_allocator(alloc: allocators.allocator_t):
# give a default name
if not symbols.has_relevant_name(alloc.ea):
idaapi.set_name(alloc.ea, alloc.get_name())
- fct = ctx.get_function(alloc.ea)
-
- # avoid function pointer
- # TODO: be able to type them
- func_tinfo = idaapi.tinfo_t()
- idaapi.get_tinfo(func_tinfo, fct.ea)
- if func_tinfo.is_ptr():
+ # set function type
+ func_tinfo, func_data = ida_utils.get_or_create_fct_type(alloc.ea)
+ if func_tinfo.is_ptr(): # avoid function pointers
return
- # set function type
- func_tinfo, func_data = ida_utils.get_or_create_fct_type(fct.ea, fct.get_ida_cc())
alloc.make_type(func_data)
if func_tinfo.create_func(func_data):
- idaapi.apply_tinfo(fct.ea, func_tinfo, idaapi.TINFO_DEFINITE)
+ idaapi.apply_tinfo(alloc.ea, func_tinfo, idaapi.TINFO_DEFINITE)
+
+ utils.g_logger.info(f"Typing allocator_{alloc.ea:x} ({alloc.get_name()})")
+
+
+# apply __cppobj & VFT flags
+def apply_udt_flags(struc: structure_t, tinfo: idaapi.tinfo_t):
+ taudt = idaapi.TAUDT_CPPOBJ if struc.is_cppobj() else 0
+ taudt |= idaapi.TAUDT_VFTABLE if struc.is_vtable() else 0
+ if taudt == 0:
+ return
- utils.g_logger.info(f"Typing allocator_{fct.ea:x} ({alloc.get_name()})")
+ # apply flags to tinfo
+ details = idaapi.udt_type_data_t()
+ tinfo.get_udt_details(details)
+ details.taudt_bits |= taudt
+ tinfo.create_udt(details)
+
+
+# add given field to given IDA structure
+def add_field_to_IDA_struc(struc: idaapi.tinfo_t, field: field_t, updated: Dict[int, Tuple[idaapi.tinfo_t, int]]):
+ bits_offset = field.offset * 8
+ bits_size = field.size * 8
+
+ t_ord = struc.get_ordinal()
+ if t_ord == 0: # ordinal number was lost
+ pass
+
+ elif t_ord not in updated:
+ updated[t_ord] = (struc.copy(), struc.get_size()) # copy or pray
+ existing.remove_padd_fields(struc) # reset gapX fields as padding
+ else:
+ struc, _ = updated[t_ord] # use our updated tinfo, not IDA's
+
+ # search for existing field
+ udm = idaapi.udm_t()
+ udm.offset = bits_offset
+ if ida_utils.find_udm_wrap(struc, udm) == idaapi.BADADDR:
+ pass
+
+ elif udm.is_gap(): # no STRMEM_SKIP_GAPS on IDA 8
+ # we want to add a field beyond the gap, without knowing what's after - abort
+ # a gap should not be followed by another gap
+ if bits_offset + bits_size > udm.offset + udm.size:
+ utils.g_logger.warning(
+ f"Abort making {field} into {struc.get_type_name()}: bigger than gap[{udm.offset//8:#x}:{udm.size//8:#x}]"
+ )
+ return
+ udm = idaapi.udm_t() # ignore gap
-# does IDA struc with given sid have a comment
-def has_struc_comment(sid: int) -> bool:
- return idaapi.get_struc_cmt(sid, False) is not None
+ # field is within an embedded structure
+ elif udm.type.is_udt() and (udm.offset + udm.size) >= (bits_offset + bits_size):
+ field.offset = (bits_offset - udm.offset) // 8 # update field_t directly, it is not re-used after
+ return add_field_to_IDA_struc(udm.type, field, updated)
+
+ # existing field with different boundaries
+ elif udm.offset != bits_offset or udm.size != bits_size:
+ utils.g_logger.warning(
+ f"Abort making {field} into {struc.get_type_name()}: conflict with {udm.name}[{udm.offset//8:#x}:{udm.size//8:#x}]"
+ )
+ return
+
+ # replace field type if ok to do so
+ ftif = field.get_type()
+ if udm.type.get_realtype() == idaapi.BT_UNK or (
+ existing.should_field_type_be_replaced(udm.type) and not existing.should_field_type_be_replaced(ftif)
+ ):
+ udm.type = ftif
+
+ # set field comment if no existing
+ fcomm = field.get_comment()
+ if fcomm and len(udm.cmt) == 0:
+ udm.cmt = fcomm
+
+ # replace field name if needed
+ fname = field.get_name()
+ if existing.should_field_name_be_replaced(field.offset, udm.name, fname):
+ udm.name = fname
+
+ # add field to struc tinfo
+ udm.offset = bits_offset
+ udm.size = bits_size
+ tcode = struc.add_udm(udm, idaapi.ETF_MAY_DESTROY)
+ if tcode != idaapi.TERR_OK:
+ utils.g_logger.error(
+ f'Failed making {udm.name} (off {udm.offset//8:#x}, size {udm.size//8:#x}, type "{udm.type}") to {struc.get_type_name()}: "{idaapi.tinfo_errstr(tcode)}" ({tcode:#x})'
+ )
# fill IDA structure with given model info
# does not overwrite fields of already existing IDA structure
def fill_IDA_structure(struc: structure_t):
- if struc.ida_sid == idaapi.BADADDR:
+ if struc.ida_tid == idaapi.BADADDR:
utils.g_logger.error(f'Could not generate structure "{struc.get_name()}"')
return
- ida_struc = idaapi.get_struc(struc.ida_sid)
+ struc_tif = struc.find_tinfo()
+
+ # record of structures to update (current struc and its embedded strucs)
+ updated: Dict[int, Tuple[idaapi.tinfo_t, int]] = dict()
+ updated[struc_tif.get_ordinal()] = (struc_tif, struc_tif.get_size())
+ existing.remove_padd_fields(struc_tif)
- # remove padding fields
- existing.remove_padd_fields(ida_struc)
+ # set udt attr
+ apply_udt_flags(struc, struc_tif)
# add fields
- for offset, field in struc.fields.items():
- err = idaapi.add_struc_member(
- ida_struc,
- field.get_name(),
- offset,
- existing.get_data_flags(field.size),
- None,
- field.size,
- )
- if err != idaapi.STRUC_ERROR_MEMBER_OK and err != idaapi.STRUC_ERROR_MEMBER_OFFSET:
- utils.g_logger.error(f"Could not add field_{offset:08x} to structure {struc.get_name()}, error: {err}")
-
- member = idaapi.get_member(ida_struc, offset)
- if member is None:
- continue
-
- # update field's type
- # TODO: do not overwrite existing field's type
- ftype = field.get_type()
- if ftype is not None:
- err = idaapi.set_member_tinfo(ida_struc, member, 0, ftype, idaapi.SET_MEMTI_COMPATIBLE)
- if err != idaapi.SMT_OK:
- utils.g_logger.error(
- f'Could set type of field_{offset:08x} (in {struc.get_name()}) to "{ftype}", error: {err}'
- )
-
- # set field's comment
- # TODO: do not replace old comment
- comment = field.get_comment()
- if comment is not None:
- if not idaapi.set_member_cmt(member, comment, True):
- utils.g_logger.warning(
- f'Could not set comment "{comment}" for member at off 0x{offset} of {struc.get_name()}'
- )
-
- # reset padding fields
- existing.add_padd_fields(ida_struc, struc.size)
+ for field in struc.fields.values():
+ add_field_to_IDA_struc(struc_tif, field, updated)
# set structure's comment
- comment = struc.get_comment()
- if not (has_struc_comment(ida_struc.id) or comment is None):
- if not idaapi.set_struc_cmt(ida_struc.id, comment, False):
- utils.g_logger.warning(f"Could not set comment for {struc.get_name()}")
+ scomm = struc.get_comment()
+ if scomm and struc_tif.get_type_cmt() is None:
+ tcode = struc_tif.set_type_cmt(scomm)
+ if tcode != idaapi.TERR_OK:
+ utils.g_logger.error(
+ f'Failed to set comment for {struc.get_name()}: "{idaapi.tinfo_errstr(tcode)}" ({tcode:#x})'
+ )
+
+ # reset gapX fields on all updated structures + save to IDA
+ while len(updated):
+ t_ord, (tinfo, min_size) = updated.popitem()
+ existing.add_padd_fields(tinfo, min_size)
+ tinfo.set_numbered_type(None, t_ord, idaapi.NTF_REPLACE)
# imports all structures defined into given record into IDA
@@ -286,7 +426,7 @@ def import_structures(record: structure_record_t):
fill_IDA_structure(struc)
# type vtables with vtables structures
- for vtbl in record.get_structures(cls=vtable_t, include_discarded=False):
+ for vtbl in record.get_structures(cls=vtable_struc_t, include_discarded=False):
tinfo = vtbl.find_tinfo()
if not idaapi.apply_tinfo(vtbl.ea, tinfo, idaapi.TINFO_DEFINITE):
utils.g_logger.warning(f"Could not apply type {tinfo} to vtable 0x{vtbl.ea:x}")
@@ -302,4 +442,4 @@ def import_context(context: entrypoints.context_t):
# type allocators
for allocator in context.get_allocators():
- type_allocator(allocator, context)
+ type_allocator(allocator)
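The `shift` computed in `find_op_for_stroff` is wrapped to a signed C integer of the operand's width via `utils.to_c_integer`, whose implementation is not shown in this patch. A plausible sketch of that two's-complement wrap, under the assumption that `size` is the width in bytes:

```python
def to_c_integer(value: int, size: int) -> int:
    """Wrap a Python int into a signed two's-complement integer of `size` bytes."""
    bits = size * 8
    value &= (1 << bits) - 1
    # re-interpret the top bit as the sign bit
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

# a 4-byte wrap, as used for o_displ operands (off - displ may go negative)
assert to_c_integer(0x10 - 0x20, 4) == -0x10
assert to_c_integer(0xFFFFFFF0, 4) == -0x10
assert to_c_integer(0x7FFFFFFF, 4) == 0x7FFFFFFF
```

This matters because `op.addr` / `op.value` come back from IDA as unsigned, while `op_stroff` expects a signed delta.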
diff --git a/symless/generation/structures.py b/symless/generation/structures.py
index 025e37c..111af9f 100644
--- a/symless/generation/structures.py
+++ b/symless/generation/structures.py
@@ -1,4 +1,5 @@
-from typing import Dict, Tuple
+from collections import defaultdict
+from typing import Dict, List, Set, Tuple
import symless.conflict as conflict
import symless.model.entrypoints as entrypoints
@@ -10,48 +11,61 @@
# create a structure's field (fixed) from an entrypoint's field (ambiguous)
# size_solver_cb is used to choose prefered field's size
def make_field(
- var_field: model.field_t,
- shift: int,
- flow: model.entry_t,
- block: model.block_t,
- size_solver_cb,
- ctx: model.context_t,
+ var_field: model.field_t, shift: int, flow: model.entry_t, block: model.block_t, size_solver_cb
) -> "field_t":
var_type = var_field.get_type()
offset = shift + var_field.offset
- if isinstance(var_type, model.ftype_ptr_t): # an unknown pointer
- out = ptr_field_t(offset, flow, block)
- elif isinstance(var_type, model.ftype_struc_t): # a structure pointer
+
+ # unknown pointer
+ if isinstance(var_type, model.ftype_ptr_t):
+ v = var_type.value.get_uval() # value size is assumed to be ptr_size
+ return ptr_field_t(v, offset, flow, block)
+
+ # structure pointer
+ if isinstance(var_type, model.ftype_struc_t):
+ # vtable pointer
if isinstance(var_type.entry, model.vtbl_entry_t):
- out = vtbl_ptr_field_t(list(var_field.type), offset, flow, block)
+ return vtbl_ptr_field_t(list(var_field.type), offset, flow, block)
+
else:
- out = struc_ptr_field_t(var_type.entry, offset, flow, block)
- elif isinstance(var_type, model.ftype_fct_t): # a function pointer
- fea = var_type.value.get_val()
- out = fct_ptr_field_t(ctx.get_function(fea), offset, flow, block)
- else: # default field
- size = size_solver_cb(var_field)
- out = field_t(offset, size, flow, block)
- return out
+ return struc_ptr_field_t(var_type.entry, offset, flow, block)
+
+ # function pointer
+ if isinstance(var_type, model.ftype_fct_t):
+ fea = var_type.value.get_uval()
+ return fct_ptr_field_t(fea, offset, flow, block)
+
+ size = size_solver_cb(var_field)
+ return field_t(offset, size, flow, block)
# fill structures models
-def define_structure(struc: structure_t, ctx: entrypoints.context_t):
- visited = set()
+def define_structure(struc: structure_t):
+ visited: Set[Tuple[model.entry_t, model.block_t, int]] = set()
- for root, shift, node in struc.node_flow(): # every node in struc's flow
+ # get structures fields from associated entries fields
+ for root, node, shift in struc.node_flow():
# do not visit the same node twice, with the same shift
- path_id = (node.get_owner().id, node.id, shift)
+ path_id = (node.get_owner(), node, shift)
if path_id in visited:
continue
visited.add(path_id)
- # compute every field
+ # add all fields from node
for vfield in node.get_fields():
# structure field from entrypoint field
- field = make_field(vfield, shift, root, node, conflict.field_size_solver, ctx)
+ field = make_field(vfield, shift, root, node, conflict.field_size_solver)
struc.set_field(field, conflict.fields_conflicts_solver)
+ # search for xrefs on paddings and create dummy fields for it
+ # this happens when the address of a field is taken without its content being accessed
+ for entry, node, shift in visited:
+ for _, _, offs in entry.get_operands():
+ for off in offs:
+ eoff = shift + off
+ if struc.has_field_at(eoff) is None and eoff < struc.get_size():
+ struc.set_field(unk_data_field_t(eoff, node), None)
+
# define which structure an entry is associated to
def associate_entry(
@@ -69,14 +83,14 @@ def associate_entry(
eff_shift, eff_struc = associate_entry(field.value, entries)
utils.g_logger.debug(
- f"Setting {entry.entry_id()} to be a ptr to {eff_struc.get_name()}, shifted by 0x{eff_shift:x}"
+ f"{entry.entry_id()} associated with struc {eff_struc.get_name()} (shift {eff_shift:#x})"
)
entry.set_structure(eff_shift, eff_struc)
else:
# select less derived structure that flew through this ep
- candidates = list()
+ candidates: List[Tuple[structure_t, int]] = list()
for shift, parent in entry.get_parents():
pshift, pstruc = associate_entry(parent, entries)
candidates.append((pstruc, shift + pshift))
@@ -95,17 +109,15 @@ def associate_entries(entries: model.entry_record_t):
# compute the owner of each defined vtable
def select_vtables_owners(record: structure_record_t):
- owners: Dict[vtable_t, Collection[Tuple[structure_t, int]]] = dict()
+ owners: Dict[vtable_struc_t, List[Tuple[structure_t, int]]] = defaultdict(list)
# find all conflicts on owners
- for struc in record.get_structures():
+ for struc in record.get_structures(include_discarded=False):
for field in struc.fields.values():
if not isinstance(field, vtbl_ptr_field_t):
continue
_, vtbl = field.get_structure()
- if vtbl not in owners:
- owners[vtbl] = list()
owners[vtbl].append((struc, field.offset))
# select owner among candidates for each vtable
@@ -121,7 +133,7 @@ def define_structures(ctx: entrypoints.context_t) -> structure_record_t:
# make strucs models and generate empty structures
for struc in record.get_structures():
- define_structure(struc, ctx)
+ define_structure(struc)
# define which structure to be associated to each entry
associate_entries(entries)
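The padding-xref pass in `define_structure` above creates a dummy field wherever an operand references an offset that falls into padding. It can be illustrated with plain dicts (hypothetical names, not the real `structure_t` / `unk_data_field_t` API):

```python
# Sketch: add placeholder fields for offsets referenced by operands
# but not covered by any defined field, as long as they fall inside
# the structure's known size.

def add_dummy_fields(fields: dict, referenced_offsets, struc_size: int) -> dict:
    for off in referenced_offsets:
        if off not in fields and off < struc_size:
            fields[off] = f"unk_data_{off:#x}"  # placeholder field
    return fields

fields = {0x0: "vtbl_ptr", 0x10: "field_10"}
add_dummy_fields(fields, [0x0, 0x8, 0x20], struc_size=0x18)
assert fields[0x8] == "unk_data_0x8"   # padding at 0x8 now carries a field
assert 0x20 not in fields              # beyond the structure, ignored
```

The real pass additionally applies the entry's shift to each operand offset before the lookup; this sketch keeps offsets absolute for brevity.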
diff --git a/symless/main.py b/symless/main.py
index 98c075c..62daff2 100644
--- a/symless/main.py
+++ b/symless/main.py
@@ -18,6 +18,11 @@ def start_analysis(config_path):
utils.g_logger.error("Unsupported arch (%s) or filetype" % arch.get_proc_name())
return
+ # check that the decompiler exists
+ if not idaapi.init_hexrays_plugin():
+ utils.g_logger.error("You do not have the decompiler for this architecture")
+ return
+
# rebase if required
if config.g_settings.rebase_db:
err = idaapi.rebase_program(-idaapi.get_imagebase(), idaapi.MSF_FIXONCE)
@@ -45,12 +50,12 @@ def start_analysis(config_path):
model.analyze_entrypoints(ctx)
utils.print_delay("Entrypoints graph built", start, time.time())
- # structure generation
+ # build structures
start = time.time()
strucs = structures.define_structures(ctx)
utils.print_delay("Structures defined", start, time.time())
- # structure generation
+ # import structures in IDA
start = time.time()
generate.import_structures(strucs)
generate.import_context(ctx)
diff --git a/symless/model/__init__.py b/symless/model/__init__.py
index 6b7f789..6ef319e 100644
--- a/symless/model/__init__.py
+++ b/symless/model/__init__.py
@@ -1,7 +1,17 @@
-import collections
-import copy
-from typing import Any, Collection, Dict, Iterator, Optional, Set, Tuple
-
+from collections import defaultdict, deque
+from typing import (
+ Any,
+ Collection,
+ Dict,
+ Generator,
+ Iterator,
+ List,
+ Optional,
+ Set,
+ Tuple,
+)
+
+import ida_hexrays
import idaapi
import symless.allocators as allocators
@@ -10,6 +20,7 @@
import symless.symbols as symbols
import symless.utils.ida_utils as ida_utils
import symless.utils.utils as utils
+import symless.utils.vtables as vtables
# a field's type & potential value
@@ -17,14 +28,14 @@ class ftype_t:
def __init__(self, value: cpustate.absop_t):
self.value = value
- # should we propagate this type when one of its values is read from a structure's field
+ # should we propagate the field value when read
def should_propagate(self) -> bool:
return False
# get value to use when propagating with cpustate
def get_propagated_value(self) -> cpustate.absop_t:
if self.should_propagate():
- return copy.copy(self.value) # copy to be sure not to mess with arguments tracking
+ return self.value
return None
def __eq__(self, other) -> bool:
@@ -38,7 +49,6 @@ def __str__(self) -> str:
# structure pointer type
-# does not record shifted struc ptr
class ftype_struc_t(ftype_t):
def __init__(self, entry: "entry_t"):
super().__init__(cpustate.sid_t(entry.id))
@@ -68,7 +78,7 @@ class field_t:
def __init__(self, offset: int):
self.offset = offset
self.size: int = 0 # bitfield of possible sizes
- self.type: Collection[ftype_t] = collections.deque() # list of affected types, in propagation's order
+ self.type: deque[ftype_t] = deque() # list of affected types, in propagation's order
# add a type to the field's possible types list
def set_type(self, type: ftype_t):
@@ -86,12 +96,15 @@ def set_size(self, size: int):
# get all possible field's sizes
def get_size(self) -> Collection[int]:
- out = collections.deque()
+ out = deque()
for i in range(8):
if self.size & (1 << i):
out.append(pow(2, i))
return out
+ def __str__(self) -> str:
+ return f"field_{self.offset:#x}:{self.size:#x}"
+
# records the data flow of a structure in a basic block
# since our data flow is flattened, loops & conditions are not taken into account
@@ -99,7 +112,7 @@ def get_size(self) -> Collection[int]:
class block_t:
def __init__(self, owner: "entry_t", id: int = 0):
self.owner = owner
- self.fields: dict[int, field_t] = dict() # fields defined in the block & their types
+ self.fields: Dict[int, field_t] = dict() # fields defined in the block & their types
# block index in owner's blocks list
self.id = id
@@ -175,15 +188,11 @@ def get_field_type(self, offset: int) -> Optional[ftype_t]:
return self.get_field(offset).get_type() if (ftype is None and self.has_field(offset)) else ftype
- # get the flow of blocks following this one
- # yields (shift, block)
- def node_flow(self) -> Iterator[Tuple[int, "block_t"]]:
- yield (0, self)
+ def __eq__(self, other) -> bool:
+ return isinstance(other, block_t) and other.id == self.id
- if self.has_callee():
- shift, callee = self.get_callee()
- for c_shift, c_block in callee.node_flow():
- yield (shift + c_shift, c_block)
+ def __hash__(self) -> int:
+ return self.id
# data flow entrypoints
@@ -196,14 +205,15 @@ class entry_t:
# this type of ep can have children
can_ramificate = True
- def __init__(self, ea: int):
+ def __init__(self, ea: int, sub_ea: int = 0):
self.ea = ea # entry address
+ self.sub_ea = sub_ea # index of the sub minsn the entry is for
self.id = -1 # entry identifier
# for entrypoints defining a structure (root ep)
self.struc_id = -1
- # structure associated to this entrypoint
+ # structure associated with this entrypoint
# the structure we will use to type this ep
self.struc: Optional[generation.structure_t] = None
self.struc_shift = 0
@@ -211,14 +221,15 @@ def __init__(self, ea: int):
# data flow injection parameters
self.to_analyze = True # yet to analyze
- # list of instruction's operands associated with this ep
- self.operands: dict[Tuple[int, int], int] = dict() # (insn.ea, op_index) -> (shift)
+ # list of operands accessing this ep fields
+ # a single op might reference multiple fields (offsets), as with ARM STP/STM insns
+ self.operands: Dict[Tuple[int, int], Collection[int]] = defaultdict(list) # (ea, reg_id) -> [offsets]
# list of the entries that can precede this one in a data flow
- self.parents: Collection[Tuple[int, entry_t]] = collections.deque()
+ self.parents: Collection[Tuple[int, entry_t]] = deque()
# list of entries we want to analyze following this one
- self.children: Collection[Tuple[int, entry_t]] = collections.deque()
+ self.children: Collection[Tuple[int, entry_t]] = deque()
# entrypoint size
self.bounds: Optional[Tuple[int, int]] = None
@@ -239,6 +250,7 @@ def has_structure(self) -> bool:
def set_structure(self, shift: int, struc: "generation.structure_t"):
self.struc = struc
self.struc_shift = shift
+ struc.has_xrefs |= self.has_operands()
# get the structure associated with the entry
def get_structure(self) -> Tuple[int, "generation.structure_t"]:
@@ -297,13 +309,16 @@ def get_boundaries(self) -> Tuple[int, int]:
return self.bounds
- # associate operand at (ea, n) to this entrypoint, for given shift
- def add_operand(self, ea: int, n: int, shift: int):
- self.operands[(ea, n)] = shift
+ # associate accessed operand with this ep
+ def add_operand(self, ea: int, off: int, regid: int):
+ self.operands[(ea, regid)].append(off)
- def get_operands(self) -> Iterator[Tuple[int, int, int]]:
- for (ea, n), shift in self.operands.items():
- yield (ea, n, shift)
+ def get_operands(self) -> Generator[Tuple[int, int, Collection[int]], None, None]:
+ for (ea, regid), offs in self.operands.items():
+ yield (ea, regid, offs)
+
+ def has_operands(self) -> bool:
+ return len(self.operands) > 0
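The new `(ea, reg_id) -> [offsets]` mapping can be sketched outside of IDA; `defaultdict(list)` keeps a single append path for both the one-field case and the multi-field (STP/STM-like) case. Names below are illustrative, not the plugin's API:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (insn ea, register id) -> list of structure offsets accessed through that reg
operands: Dict[Tuple[int, int], List[int]] = defaultdict(list)

def add_operand(ea: int, off: int, regid: int) -> None:
    # one op may touch several fields, e.g. an ARM STP storing a register pair
    operands[(ea, regid)].append(off)

add_operand(0x1000, 0x8, 5)
add_operand(0x1000, 0x10, 5)   # same insn & reg, second field
add_operand(0x1004, 0x0, 3)
```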
# does the given node precede this node in the data flow
def has_parent(self, parent: "entry_t") -> bool:
@@ -339,15 +354,17 @@ def end_block(self, callee: "entry_t", shift: int) -> bool:
# get node's parents
# yields (shift, parent)
- def get_parents(self) -> Iterator[Tuple[int, "entry_t"]]:
+ def get_parents(self) -> Generator[Tuple[int, "entry_t"], None, None]:
for off, p in self.parents:
yield (off, p)
# get node's children
# if all is set, returns following children + end blocks callee children
# else only returns following children
- def get_children(self, all: bool = False) -> Iterator[Tuple[int, "entry_t"]]:
+ def get_children(self, all: bool = False) -> Generator[Tuple[int, "entry_t"], None, None]:
if all:
+ assert self.blocks is not None
+
current = self.blocks
while current.next is not None:
yield current.get_callee()
@@ -356,25 +373,10 @@ def get_children(self, all: bool = False) -> Iterator[Tuple[int, "entry_t"]]:
for off, c in self.children:
yield (off, c)
- # get the flow of blocks following this entrypoint
- # yields (shift, block)
- def node_flow(self) -> Iterator[Tuple[int, block_t]]:
- # flow for entry's blocks
- current = self.blocks
- while current is not None:
- for shift, block in current.node_flow():
- yield (shift, block)
- current = current.next
-
- # flow for entry's children
- for shift, child in self.get_children():
- for c_shift, c_block in child.node_flow():
- yield (shift + c_shift, c_block)
-
# get distance to given child
# assume self is parent of child
def distance_to(self, child: "entry_t") -> int:
- q = collections.deque()
+ q = deque()
q.append((child, 0))
while len(q) > 0:
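`distance_to` walks the entry graph breadth-first with a `deque`; the same search, stripped of the entry types (the parent-link graph shape here is hypothetical), looks like:

```python
from collections import deque
from typing import Dict, List

def distance_to(parents: Dict[str, List[str]], child: str, target: str) -> int:
    """BFS from child up its parent links; -1 if target is unreachable."""
    q = deque([(child, 0)])
    seen = {child}
    while q:
        node, dist = q.popleft()
        if node == target:
            return dist
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                q.append((p, dist + 1))
    return -1

graph = {"leaf": ["mid"], "mid": ["root"], "root": []}
```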
@@ -389,7 +391,7 @@ def distance_to(self, child: "entry_t") -> int:
# inject entrypoint on given state
# return True if the ep had to be analyzed
- def inject(self, ea: int, state: cpustate.state_t, ctx: "context_t", reset: bool = True) -> bool:
+ def inject(self, state: cpustate.state_t, reset: bool = True) -> bool:
if reset:
self.reset()
had_to = self.to_analyze
@@ -401,7 +403,7 @@ def reset(self):
# reset blocks
self.blocks = block_t(self)
self.cblock = self.blocks
- utils.g_logger.debug(f"Resetting {self.entry_id()}")
+ utils.g_logger.debug(f"Resetting entry {self.entry_id()}")
# get unique key identifying the ep from others
# to be implemented by heirs
@@ -432,12 +434,13 @@ def __hash__(self) -> int:
def __str__(self) -> str:
out = "%s\n" % self.entry_header()
- out += f"\t| Parents: {len([i for i in self.get_parents()])}\n"
+ out += f"\t| Parents: ({', '.join(str(p.id) for _, p in self.get_parents())})\n"
if len(self.operands) > 0:
out += "\t| Operands:\n"
- for (ea, op), shift in self.operands.items():
- out += f"\t\t{ida_utils.addr_friendly_name(ea)}, ea: 0x{ea:x}, op: {op}, shift 0x{shift:x}\n"
+ for ea, regid, offs in self.get_operands():
+ for off in offs:
+ out += f"\t\t{ida_utils.addr_friendly_name(ea)}, ea: 0x{ea:x}, reg {idaapi.get_reg_name(regid,8)}({regid:#x}), off {off:#x}\n"
if len(self.children) > 0:
out += "\t| Children:\n"
@@ -447,12 +450,45 @@ def __str__(self) -> str:
return out
+# travel the flows of nodes from given entrypoint
+# yields (flow root, node, shift)
+def flow_from_root(entry: entry_t, all_roots: bool = True) -> Generator[Tuple[entry_t, block_t, int], None, None]:
+ roots: Collection[Tuple[int, entry_t]] = deque()
+ roots.append((0, entry))
+
+ while len(roots) > 0:
+ rshift, root = roots.pop()
+ if all_roots:
+ roots.extend([(i + rshift, j) for i, j in root.get_children()])
+
+ blocks: Collection[Tuple[int, block_t]] = deque()
+ blocks.append((rshift, root.blocks))
+
+ while len(blocks) > 0:
+ bshift, node = blocks.pop()
+ yield root, node, bshift
+
+ # record the next block for later
+ if node.next is not None:
+ blocks.append((bshift, node.next))
+
+ # process blocks from a direct function call before the next block
+ if node.has_callee():
+ cshift, callee = node.get_callee()
+ blocks.append((bshift + cshift, callee.blocks))
+
+ # children are not in this direct flow (e.g. virtual method recorded from a vtable load)
+ # process them as different roots
+ if all_roots:
+ roots.extend([(bshift + cshift + i, j) for i, j in callee.get_children()])
+
+
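`flow_from_root` replaces the removed recursive `node_flow` generator with an explicit work-list; the shift bookkeeping reduces to the pattern below (toy node type, assumed for illustration):

```python
from collections import deque

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # (shift, Node) pairs

def iter_flow(root):
    """Yield (cumulative shift, name) for every reachable node, no recursion."""
    stack = deque([(0, root)])
    while stack:
        shift, node = stack.pop()
        yield shift, node.name
        # push children with their offset added to the path's shift
        stack.extend((shift + off, c) for off, c in node.children)

leaf = Node("leaf")
mid = Node("mid", [(0x10, leaf)])
root = Node("root", [(0x8, mid)])
```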
# entrypoint as a method's argument
class arg_entry_t(entry_t):
inject_before = True
def __init__(self, ea: int, index: int):
- super().__init__(ea)
+ super().__init__(ea, -1)
self.index = index
def get_function(self) -> int:
@@ -465,10 +501,12 @@ def find_name(self) -> Tuple[Optional[str], int]:
fct_name = ida_utils.demangle_ea(self.ea)
return symbols.get_classname_from_ctor(fct_name), 1
- def inject(self, ea: int, state: cpustate.state_t, ctx: "context_t") -> bool:
- had_to = super().inject(ea, state, ctx, False)
- cc = ctx.dflow_info.get_function_cc(ea)
- cpustate.set_argument(cc, state, self.index, cpustate.sid_t(self.id))
+ def inject(self, state: cpustate.state_t) -> bool:
+ had_to = super().inject(state, False)
+
+ vdloc = state.fct.get_argloc(self.index)
+ state.set_var_from_loc(vdloc, cpustate.sid_t(self.id))
+
return had_to
def get_key(self) -> int:
@@ -478,41 +516,48 @@ def entry_id(self) -> str:
return f"ep_0x{self.ea:x}_arg{self.index}"
def entry_header(self) -> str:
- return "Entry[sid=%d], arg %d of ea: 0x%x(%s), [%s]" % (
+ return "Entry[sid=%d], arg %d of %s (0x%x), [%s]" % (
self.id,
self.index,
- self.ea,
ida_utils.addr_friendly_name(self.ea),
+ self.ea,
("TO_ANALYZE" if self.to_analyze else "ANALYZED"),
)
-# entry point in a register
-# as a destination operand (inject_before == False)
-class dst_reg_entry_t(entry_t):
- def __init__(self, ea: int, fct_ea: int, reg: str):
- super().__init__(ea)
- self.reg = reg
+# entry point in a variable (a micro operand)
+# for destination operands (inject_before == False)
+class dst_var_entry_t(entry_t):
+ def __init__(self, ea: int, sub_ea: int, fct_ea: int, mop: ida_hexrays.mop_t):
+ super().__init__(ea, sub_ea)
+ self.mop = ida_hexrays.mop_t(mop) # copy or it gets freed
+ assert self.mop.t in (ida_hexrays.mop_r, ida_hexrays.mop_S)
+
+ if self.mop.t == ida_hexrays.mop_r:
+ self.key = ida_hexrays.get_mreg_name(self.mop.r, ida_utils.get_ptr_size())
+ else:
+ self.key = f"stk:#{self.mop.s.off:x}"
+
self.fct_ea = fct_ea
def get_function(self) -> int:
return self.fct_ea
- def inject(self, ea: int, state: cpustate.state_t, ctx: "context_t") -> bool:
- had_to = super().inject(ea, state, ctx)
- state.set_register_str(self.reg, cpustate.sid_t(self.id))
+ def inject(self, state: cpustate.state_t) -> bool:
+ had_to = super().inject(state)
+ state.set_var_from_mop(self.mop, cpustate.sid_t(self.id))
return had_to
def get_key(self) -> str:
- return self.reg
+ return self.key
def entry_id(self) -> str:
- return f"ep_0x{self.ea:x}_{self.reg}"
+ return f"ep_0x{self.ea:x}_{self.get_key()}"
def entry_header(self) -> str:
- return "Entry[sid=%d], reg %s at ea: 0x%x(%s), [%s]" % (
+ return "Entry[sid=%d], %s at ea: 0x%x(%s), [%s]" % (
self.id,
- self.reg,
+ self.get_key(),
self.ea,
ida_utils.addr_friendly_name(self.ea),
("TO_ANALYZE" if self.to_analyze else "ANALYZED"),
@@ -521,7 +566,7 @@ def entry_header(self) -> str:
# entry point in a register
# as a src operand (inject_before == True)
-class src_reg_entry_t(dst_reg_entry_t):
+class src_reg_entry_t(dst_var_entry_t):
# inject_before needs to be a static member
# because of its use in get_entry_by_key()
# thus two reg_entry_t classes are required
@@ -530,28 +575,27 @@ class src_reg_entry_t(dst_reg_entry_t):
# entry point as a value read from a structure
# can be used to propagate a structure ptr written & read from a structure
-class read_entry_t(dst_reg_entry_t):
+class read_entry_t(dst_var_entry_t):
can_ramificate = False
- def __init__(self, ea: int, fct_ea: int, reg: str, source: entry_t, off: int):
- super().__init__(ea, fct_ea, reg)
+ def __init__(self, ea: int, sub_ea: int, fct_ea: int, mop: ida_hexrays.mop_t, source: entry_t, off: int):
+ super().__init__(ea, sub_ea, fct_ea, mop)
# source ep & offset this ep was read from
self.src = source
self.src_off = off
def entry_id(self) -> str:
- return f"ep_rd_0x{self.ea:x}_{self.reg}"
+ return f"ep_rd_0x{self.ea:x}_{self.get_key()}"
def add_parent(self, parent: "entry_t", shift: int) -> bool:
raise Exception("read_entry_t are not meant to be linked with parents")
-# entry point as a callee ret value
-# with known size (static allocation)
-class ret_entry_t(dst_reg_entry_t):
- def __init__(self, ea: int, fct_ea: int, size: int):
- super().__init__(ea, fct_ea, cpustate.get_default_cc().ret)
+# entry point as an allocation with known size
+class alloc_entry_t(dst_var_entry_t):
+ def __init__(self, ea: int, sub_ea: int, size: int, mba: ida_hexrays.mba_t):
+ super().__init__(ea, sub_ea, mba.entry_ea, ida_hexrays.mop_t(mba.call_result_kreg, ida_utils.get_ptr_size()))
self.size = size
# retrieve name from factory function
@@ -598,39 +642,30 @@ def add_parent(self, parent: entry_t, shift: int) -> bool:
def end_block(self, callee: entry_t, shift: int) -> bool:
return False
- def inject(self, ea: int, state: cpustate.state_t, ctx: "context_t") -> bool:
+ def inject(self, state: cpustate.state_t) -> bool:
raise Exception(f"{self.entry_id()} is not to be injected in the data flow")
# vtable root entry
class vtbl_entry_t(cst_entry_t):
- def __init__(self, ea: int):
- super().__init__(ea)
+ def __init__(self, vtbl: vtables.vtable_t):
+ super().__init__(vtbl.ea)
+ self.vtbl = vtbl
self.reset() # add default block
- self.total_xrefs = 0 # count of xref towards vtable's functions
- # find vtable methods, build the model
- i = 0
+ # make fields
ptr_size = ida_utils.get_ptr_size()
- for fea in ida_utils.vtable_members(ea):
- field = entry_t.add_field(self, i, ptr_size)
+ for i, (fea, _) in enumerate(vtbl.get_members()):
+ field = entry_t.add_field(self, i * ptr_size, ptr_size)
field.set_type(ftype_fct_t(cpustate.mem_t(fea, fea, ptr_size)))
- self.total_xrefs += len(ida_utils.get_data_references(fea))
- i += ptr_size
-
- # is self derived from other
- def is_most_derived(self, other: "vtbl_entry_t") -> bool:
- self_refs, self_size = self.total_xrefs, self.get_boundaries()[1]
- other_refs, other_size = other.total_xrefs, other.get_boundaries()[1]
- if self_size > other_size:
- return True
- if other_size > self_size:
- return False
- if self_refs > other_refs:
- return False
- return True
- def get_key(self) -> Any:
+ # get most derived between self and other
+ def get_most_derived(self, other: "vtbl_entry_t") -> "vtbl_entry_t":
+ if self.vtbl.get_most_derived(other.vtbl) == self.vtbl:
+ return self
+ return other
+
+ def get_key(self) -> Any: # ea is enough to discriminate vtables
return None
def find_name(self) -> Tuple[Optional[str], int]:
@@ -640,7 +675,7 @@ def entry_id(self) -> str:
return f"ep_0x{self.ea:x}_vtbl"
def entry_header(self) -> str:
- return f"Vtable at {ida_utils.demangle_ea(self.ea)}"
+ return f"vftable {ida_utils.demangle_ea(self.ea)}"
# records all entrypoints
@@ -648,11 +683,11 @@ class entry_record_t:
g_next_sid = -1
def __init__(self):
- self.entries_per_sid: list[entry_t] = list() # entry per sid
+ self.entries_per_sid: List[entry_t] = list() # entry per sid
# sorted entries, by ea for quick access
# & by inject_before / inject_after
- self.entries_per_ea: dict[int, Tuple[Collection[entry_t], Collection[entry_t]]] = dict()
+ self.entries_per_ea: dict[Tuple[int, int, bool], Collection[entry_t]] = defaultdict(deque)
# next entry point id
def next_id(self) -> int:
@@ -663,13 +698,12 @@ def structures_count(self) -> int:
# add an entrypoint to the graph
def add_entry(self, entry: entry_t, as_root: bool = False, inc_sid: bool = True) -> entry_t:
- existing = self.get_entry_by_key(entry.ea, entry.__class__, entry.get_key())
+ existing = self.get_entry_by_key(entry.ea, entry.sub_ea, entry.__class__, entry.get_key())
if existing is not None:
return existing
- if entry.ea not in self.entries_per_ea:
- self.entries_per_ea[entry.ea] = (collections.deque(), collections.deque())
- self.entries_per_ea[entry.ea][int(not entry.__class__.inject_before)].append(entry)
+ key = (entry.ea, entry.sub_ea, entry.__class__.inject_before)
+ self.entries_per_ea[key].append(entry)
entry.id = self.next_id()
self.entries_per_sid.append(entry)
@@ -705,15 +739,12 @@ def remove_entry(self, entry: entry_t):
if len(child.parents) == 0:
self.remove_entry(child)
- self.entries_per_ea[entry.ea][int(not entry.__class__.inject_before)].remove(entry)
+ self.entries_per_ea[(entry.ea, entry.sub_ea, entry.__class__.inject_before)].remove(entry)
self.entries_per_sid[entry.id] = None
# all entries to inject at given ea
- def get_entries_at(self, ea: int, inject_after: bool) -> Collection[entry_t]:
- if ea not in self.entries_per_ea:
- return []
-
- return self.entries_per_ea[ea][int(inject_after)]
+ def get_entries_at(self, ea: int, sub_ea: int, inject_before: bool) -> Collection[entry_t]:
+ return self.entries_per_ea.get((ea, sub_ea, inject_before), tuple())
# entry by sid
def get_entry_by_id(self, sid: int) -> Optional[entry_t]:
@@ -722,13 +753,14 @@ def get_entry_by_id(self, sid: int) -> Optional[entry_t]:
return self.entries_per_sid[sid]
# entry by ea, class & unique key identifier
- def get_entry_by_key(self, ea: int, cls: type, key: Any = None) -> Optional[entry_t]:
- if ea not in self.entries_per_ea:
+ def get_entry_by_key(self, ea: int, sub_ea: int, cls: type, key: Any = None) -> Optional[entry_t]:
+ eakey = (ea, sub_ea, cls.inject_before)
+ if eakey not in self.entries_per_ea:
return None
c = filter(
lambda e: isinstance(e, cls) and e.get_key() == key,
- self.entries_per_ea[ea][int(not cls.inject_before)],
+ self.entries_per_ea[eakey],
)
try:
@@ -736,14 +768,14 @@ def get_entry_by_key(self, ea: int, cls: type, key: Any = None) -> Optional[entr
except StopIteration:
return None
- def get_entries(self) -> Iterator[entry_t]:
+ def get_entries(self, analyzed: bool = True) -> Generator[entry_t, None, None]:
for entry in self.entries_per_sid:
- if entry is not None:
+ if entry is not None and not (analyzed and entry.to_analyze):
yield entry
# yield all unexplored entrypoints
# TODO: yield from most interesting function to less (fct having the most entrypoints)
- def next_to_analyze(self) -> Iterator[entry_t]:
+ def next_to_analyze(self) -> Generator[entry_t, None, None]:
current_len = len(self.entries_per_sid)
for i in range(current_len):
if self.entries_per_sid[i].to_analyze:
@@ -756,37 +788,22 @@ def __str__(self) -> str:
return out
-# defines a function
-class function_t:
+# information about a function's prototype
+class prototype_t:
def __init__(self, ea: int):
self.ea = ea
- self.nargs = 0 # arguments count (minimum estimated)
- self.cc = cpustate.get_abi() # guessed calling convention
- # optional entrypoint sid as function's ret value
- self.ret_sid: int = -1
+ # structure returned by function
+ self.ret: Optional[entry_t] = None
# is a virtual method
self.virtual = False
- def set_ret(self, sid: int):
- self.ret_sid = sid
-
- def get_ret(self) -> int:
- return self.ret_sid
+ def set_ret(self, ret: entry_t):
+ self.ret = ret
- def set_nargs(self, nargs: int):
- self.nargs = nargs
-
- def get_nargs(self) -> int:
- return self.nargs
-
- def set_cc(self, cc: cpustate.arch.abi_t):
- self.cc = cc
-
- # get IDA CM_CC_ calling convention
- def get_ida_cc(self) -> int:
- return self.cc.cc
+ def get_ret(self) -> Optional[entry_t]:
+ return self.ret
def set_virtual(self):
self.virtual = True
@@ -795,7 +812,7 @@ def is_virtual(self) -> bool:
return self.virtual
def __eq__(self, other: object) -> bool:
- return isinstance(other, function_t) and self.ea == other.ea
+ return isinstance(other, prototype_t) and self.ea == other.ea
# global model
@@ -804,7 +821,7 @@ class context_t:
# init from a list of entrypoints and propagation context
def __init__(self, entries: entry_record_t, allocators: Set[allocators.allocator_t]):
self.allocators = allocators # all registered allocators
- self.functions: dict[int, function_t] = dict() # ea -> function_t
+ self.functions: Dict[int, prototype_t] = dict() # ea -> prototype_t
self.graph = entries # entrypoints tree hierarchy
# information gathered by data flow
@@ -817,22 +834,12 @@ def __init__(self, entries: entry_record_t, allocators: Set[allocators.allocator
# dive into callee decision
self.dive_in: bool = False
- # record all visited functions into model
- def record_functions(self, record: Dict[int, cpustate.function_t]):
- for ea, visited in record.items():
- if visited.cc_not_guessed:
- continue
-
- fct = self.get_function(ea)
- fct.set_nargs(visited.get_count())
- fct.set_cc(visited.cc)
-
- def get_function(self, ea: int) -> function_t:
+ def get_function(self, ea: int) -> prototype_t:
if ea not in self.functions:
- self.functions[ea] = function_t(ea)
+ self.functions[ea] = prototype_t(ea)
return self.functions[ea]
- def get_functions(self) -> Collection[function_t]:
+ def get_functions(self) -> Collection[prototype_t]:
return self.functions.values()
def get_entrypoints(self) -> entry_record_t:
diff --git a/symless/model/entrypoints.py b/symless/model/entrypoints.py
index 8915fcf..094bfd2 100644
--- a/symless/model/entrypoints.py
+++ b/symless/model/entrypoints.py
@@ -1,12 +1,14 @@
-import collections
import enum
-from typing import Collection, Dict, Iterable, Optional, Set, Tuple, Union
+from collections import defaultdict, deque
+from typing import Collection, Dict, Iterable, Set, Tuple, Union
+import ida_hexrays
import idaapi
import symless.allocators as allocators
import symless.cpustate.cpustate as cpustate
import symless.utils.ida_utils as ida_utils
+import symless.utils.vtables as vtables
from symless.model import *
""" Entry points from memory allocations """
@@ -14,88 +16,80 @@
# Type of a memory allocation
class allocation_type(enum.Enum):
- WRAPPED_ALLOCATION = 0
- STATIC_SIZE = 1
- UNKNOWN = 2
+ WRAPPED_ALLOCATION = 0 # allocator is just a wrapper calling another allocator
+ STATIC_SIZE = 1 # static size allocation
+ UNKNOWN = 2 # any other case we do not handle
# Analyze a given call to a memory allocator
# defines if the caller is an allocator wrapper, or if the call is a static allocation (known size)
def analyze_allocation(
caller: idaapi.func_t, allocator: allocators.allocator_t, call_ea: int
-) -> Tuple[allocation_type, Optional[Union[int, Iterable[int]]]]:
- before_allocation = True
- wrapper_args = None
+) -> Tuple[allocation_type, Union[allocators.allocator_t, alloc_entry_t, None]]:
+ action, wrapper_args = None, None
params = cpustate.dflow_ctrl_t(depth=0)
- for ea, state in cpustate.generate_state(caller, params, cpustate.get_default_cc()):
- if ea == call_ea and before_allocation:
- before_allocation = False
-
+ for ea, sub_ea, state in cpustate.generate_state(caller, params):
+ if ea == call_ea and state.has_call_info() and action is None:
action, wrapper_args = allocator.on_call(state)
- # caller jumps to allocator, with size argument past through
- if action == allocators.alloc_action_t.JUMP_TO_ALLOCATOR:
- return (allocation_type.WRAPPED_ALLOCATION, wrapper_args)
+ # caller calls allocator, with size argument passed through
+ if action == allocators.alloc_action_t.WRAPPED_ALLOCATOR:
+ pass
# known size allocation
- if action == allocators.alloc_action_t.STATIC_ALLOCATION:
- return (allocation_type.STATIC_SIZE, wrapper_args)
+ elif action == allocators.alloc_action_t.STATIC_ALLOCATION:
+ return (allocation_type.STATIC_SIZE, alloc_entry_t(ea, sub_ea, wrapper_args, state.mba))
# unknown size allocation
- if action == allocators.alloc_action_t.UNDEFINED:
+ elif action == allocators.alloc_action_t.UNDEFINED:
return (allocation_type.UNKNOWN, None)
- # else: allocators.alloc_action_t.WRAPPED_ALLOCATOR
- # find if the caller returns the callee return value
-
- elif state.ret and not before_allocation:
- # allocation returned value is returned by caller
- if allocator.on_wrapper_ret(state, call_ea):
- return (allocation_type.WRAPPED_ALLOCATION, wrapper_args)
+ # allocator wrapper returns the allocation
+ elif action and state.has_ret_info() and allocator.on_wrapper_ret(state, call_ea):
+ return (allocation_type.WRAPPED_ALLOCATION, allocator.get_child(caller.start_ea, wrapper_args))
return (allocation_type.UNKNOWN, None)
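`analyze_allocation` folds one pass over the caller's data flow into three outcomes; the decision itself, stripped of the cpustate machinery, is just (illustrative inputs, not the plugin's types):

```python
import enum

class allocation_type(enum.Enum):
    WRAPPED_ALLOCATION = 0  # size forwarded from the caller's own args
    STATIC_SIZE = 1         # size is a known constant
    UNKNOWN = 2             # anything unresolved

def classify(size_arg):
    """size_arg: int constant, the string 'arg' when the size is passed
    through from the caller's arguments, or anything else when unresolved."""
    if isinstance(size_arg, int):
        return allocation_type.STATIC_SIZE
    if size_arg == "arg":
        return allocation_type.WRAPPED_ALLOCATION
    return allocation_type.UNKNOWN
```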
# Analyze all calls to a memory allocator and its wrappers
-# returns a set of entrypoints (static allocation) made with this allocator
+# returns a set of entrypoints (static allocations) made with this allocator
def analyze_allocator_heirs(
allocator: allocators.allocator_t,
- allocators: Set[allocators.allocator_t],
+ allocs: Set[allocators.allocator_t],
entries: entry_record_t,
):
- if allocator in allocators: # avoid infinite recursion if crossed xrefs
+ if allocator in allocs: # avoid infinite recursion if crossed xrefs
return
+ allocs.add(allocator)
- allocators.add(allocator)
+ # for all xrefs to allocator
+ for allocation_ea in ida_utils.get_all_references(allocator.ea):
+ # function referencing the allocator
+ caller = idaapi.get_func(allocation_ea)
+ if caller is None:
+ continue
- # for all calls to allocator
- for allocation in ida_utils.get_all_references(allocator.ea):
- insn = idaapi.insn_t()
- if idaapi.decode_insn(insn, allocation) <= 0:
+ # instruction referencing the allocator
+ call_insn = ida_utils.get_ins_microcode(allocation_ea)
+ if call_insn is None:
continue
- if insn.itype in [
- idaapi.NN_jmp,
- idaapi.NN_jmpfi,
- idaapi.NN_jmpni,
- idaapi.NN_call,
- idaapi.NN_callfi,
- idaapi.NN_callni,
- ]:
- caller = idaapi.get_func(allocation)
- if caller is None:
- continue
+ utils.g_logger.debug(f"Analyzing xref {allocation_ea:#x}: {call_insn.dstr()} to allocator {allocator}")
+
+ # verify this is a call / jmp instruction
+ if call_insn.opcode not in (ida_hexrays.m_call, ida_hexrays.m_icall, ida_hexrays.m_goto, ida_hexrays.m_ijmp):
+ continue
- type, args = analyze_allocation(caller, allocator, allocation)
+ type, alloc = analyze_allocation(caller, allocator, allocation_ea)
- if type == allocation_type.WRAPPED_ALLOCATION:
- wrapper = allocator.get_child(caller.start_ea, args)
- analyze_allocator_heirs(wrapper, allocators, entries)
+ if type == allocation_type.WRAPPED_ALLOCATION:
+ utils.g_logger.debug(f"{allocation_ea:#x} is a wrapper around {allocator}")
+ analyze_allocator_heirs(alloc, allocs, entries)
- elif type == allocation_type.STATIC_SIZE:
- entry = ret_entry_t(allocation, caller.start_ea, args)
- entries.add_entry(entry, True)
+ elif type == allocation_type.STATIC_SIZE:
+ utils.g_logger.debug(f"{allocation_ea:#x} is a static allocation of {alloc.size:#x}")
+ entries.add_entry(alloc, True)
# get all entrypoints from defined allocators
@@ -113,93 +107,108 @@ def get_allocations_entrypoints(
""" Entry points from ctors & dtors """
-# count of xrefs to vtable functions
-def vtable_ref_count(vtable_ea: int) -> Tuple[int, int]:
- count, size = 0, 0
- for fea in ida_utils.vtable_members(vtable_ea):
- count += len(ida_utils.get_data_references(fea))
- size += 1
- return count, size
-
-
-# which one is the most derived vtable
-# base heuristics: biggest one, or the one with the less referenced functions
-def most_derived_vtable(v1: int, v2: int) -> int:
- c1, s1 = vtable_ref_count(v1)
- c2, s2 = vtable_ref_count(v2)
- if s1 > s2:
- return v1
- if s2 > s1:
- return v2
- if c1 > c2:
- return v2
- return v1
-
-
-# is given function a ctor/dtor (does it load a vtable into a class given as first arg)
-def is_ctor(func: idaapi.func_t, load_addr: int) -> Tuple[bool, int]:
- state = cpustate.state_t()
+# a constructor / destructor and the vtables it loads into the 'this' object
+class ctor_t:
+ def __init__(self, func: idaapi.func_t):
+ self.func = func
+
+ # vtables loaded into the 'this' object by this ctor
+ self.vtables: Dict[vtables.vtable_t, Optional[int]] = dict() # vtbl_ea -> load_offset
+
+ # get what we think is the right vtable for the class associated with this ctor
+ def get_associated_vtable(self) -> Tuple[vtables.vtable_t, int]:
+ candidates = [(vtbl, off) for (vtbl, off) in self.vtables.items() if off is not None]
+ candidates.sort(key=lambda k: k[1], reverse=True)
+
+ vtbl, off = candidates.pop() # at least one candidate should be present
+ while len(candidates) > 0:
+ vtbl2, off2 = candidates.pop()
+ if off2 != off:
+ break
+ vtbl = vtbl.get_most_derived(vtbl2) # conflict, try to find the inheriting vtable
+
+ return (vtbl, off)
+
+
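`get_associated_vtable` keeps only offset-resolved candidates, sorts them by offset descending, then pops the lowest-offset one off the end, resolving ties at the same offset by keeping the most-derived vtable. The same selection over plain tuples (with a hypothetical derived-of relation):

```python
def pick_vtable(candidates, most_derived):
    """candidates: list of (vtbl, offset); most_derived(a, b) returns a or b."""
    cands = sorted(candidates, key=lambda k: k[1], reverse=True)
    vtbl, off = cands.pop()            # lowest offset comes off the end
    while cands:
        vtbl2, off2 = cands.pop()
        if off2 != off:
            break
        vtbl = most_derived(vtbl, vtbl2)  # conflict at the same offset
    return vtbl, off

# toy derived-of relation: "derived" inherits from "base"
derived = {("base", "derived"): "derived", ("derived", "base"): "derived"}
md = lambda a, b: derived.get((a, b), a)
```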
+# analyze given ctor, returns True if it really is a ctor
+def analyze_ctor(ctor: ctor_t) -> bool:
+ ptr_size = ida_utils.get_ptr_size()
+ yet_to_see = set(ctor.vtables.keys())
+
+ mba = ida_utils.get_func_microcode(ctor.func)
+ if mba is None:
+ return False
+
params = cpustate.dflow_ctrl_t(depth=0)
- cpustate.set_argument(cpustate.get_object_cc(), state, 0, cpustate.sid_t(0))
- for _, state in cpustate.function_data_flow(func, state, params):
- if len(state.writes) > 0:
- write = state.writes[0]
+ state = cpustate.state_t(mba, params.get_function_for_mba(mba))
+
+ if state.fct.get_args_count() == 0: # function does not take a 'this' argument
+ return False
+ state.set_var_from_loc(state.fct.get_argloc(0), cpustate.sid_t(0))
+
+ ret = False
+ for _, _, state in cpustate.function_data_flow(state, params):
+ if len(yet_to_see) == 0: # nothing more to see
+ return ret
- if not isinstance(write.src, cpustate.mem_t):
+ for write in state.writes:
+ if write.size != ptr_size or not isinstance(write.value, cpustate.mem_t):
continue
- if write.src.addr != load_addr:
+ val = write.value.get_uval()
+ if val not in yet_to_see: # skip unless the written value is one of our vtables' ea
continue
+ yet_to_see.remove(val)
- dst = state.get_previous_register(write.disp.reg)
- if isinstance(dst, cpustate.sid_t): # arg 0 = struct ptr -> ctor/dtor
- offset = cpustate.ctypes.c_int32(write.disp.offset + dst.shift).value
- if offset >= 0:
- return (True, offset)
+ # check the value is written into our 'this' object
+ if not isinstance(write.target, cpustate.sid_t):
+ continue
- # vtable moved somewhere else
- return (False, -1)
+ # update shift for vtable
+ utils.g_logger.debug(f"Load for vtbl {val:#x} into this:{write.target.shift:x}")
+ ctor.vtables[val] = write.target.shift
+ ret = True
- return (False, -1)
+ return ret
# get ctors & dtors families
-def get_ctors() -> Dict[int, Collection[int]]:
- # associate each ctor/dtor to one vtable (effective vtable of one class)
- ctor_vtbl = dict() # ctor_ea -> vtbl_ea
- for vtbl_ref, vtbl_addr in ida_utils.get_all_vtables():
- for xref in ida_utils.get_data_references(vtbl_ref):
- if not ida_utils.is_vtable_load(xref):
- continue
+def get_ctors() -> Dict[int, Collection[ctor_t]]:
+ all_ctors: Dict[int, ctor_t] = dict()
+ ctors_for_family: Dict[int, Collection[ctor_t]] = defaultdict(deque)
+
+ # make a record of candidate ctors & the vtables they load
+ for vtbl in vtables.get_all_vtables():
+ for xref in vtbl.get_loads():
+ fct = idaapi.get_func(xref) # cannot return None
+ if fct.start_ea not in all_ctors:
+ all_ctors[fct.start_ea] = ctor_t(fct)
+ all_ctors[fct.start_ea].vtables[vtbl] = None
+
+ # analyze all ctors, find the base vtable for their class
+ for ctor in all_ctors.values():
+ utils.g_logger.debug(
+ f"Analyzing fct {ctor.func.start_ea:#x} for {len(ctor.vtables)} vtable loads into 'this'"
+ )
+ if not analyze_ctor(ctor):
+ continue
- func = idaapi.get_func(xref)
- if func is None:
- continue
+ vtbl, offset = ctor.get_associated_vtable()
+ utils.g_logger.info(f"Found one ctor/dtor @ {ctor.func.start_ea:#x} for vtbl {vtbl.ea:#x} (off {offset:#x})")
- ctor, shift = is_ctor(func, vtbl_ref)
- if ctor and shift == 0: # only take first vtable in account
- if func.start_ea in ctor_vtbl:
- ctor_vtbl[func.start_ea] = most_derived_vtable(vtbl_addr, ctor_vtbl[func.start_ea])
- else:
- ctor_vtbl[func.start_ea] = vtbl_addr
+ if offset != 0: # we are only interested in vtables loaded at off:0
+ continue
- # regroup ctors/dtors by families
- mifa = dict() # vtbl_ea -> list of ctors
- for ctor, vtbl in ctor_vtbl.items():
- if vtbl not in mifa:
- mifa[vtbl] = collections.deque()
- mifa[vtbl].append(ctor)
+ ctors_for_family[vtbl.ea].append(ctor)
- return mifa
+ return ctors_for_family
# get all entrypoints from identified ctors / dtors
def get_ctors_entrypoints(entries: entry_record_t):
- for _, fam in get_ctors().items():
- first = True
- for ctor in fam:
- entries.add_entry(arg_entry_t(ctor, 0), True, first)
- first = False
+ for fam in get_ctors().values():
+ for i, ctor in enumerate(fam):
+ entries.add_entry(arg_entry_t(ctor.func.start_ea, 0), True, i == 0)
# find root entrypoints, from classes & allocators found in the base
diff --git a/symless/model/model.py b/symless/model/model.py
index 9b3e7bc..725e10a 100644
--- a/symless/model/model.py
+++ b/symless/model/model.py
@@ -2,6 +2,7 @@
import symless.cpustate.cpustate as cpustate
import symless.utils.ida_utils as ida_utils
+import symless.utils.vtables as vtables
from symless.model import *
from symless.utils.utils import g_logger as logger
@@ -9,152 +10,187 @@
# handle function ret, record ret type for function typing
-def handle_ret(ea: int, state: cpustate.state_t, ctx: context_t):
+def handle_ret(state: cpustate.state_t, ctx: context_t):
if state.ret is None:
return
- value = state.ret.code
- if not isinstance(value, cpustate.sid_t) or value.shift != 0:
+ # only record struct pointer returned without shift
+ if not isinstance(state.ret, cpustate.sid_t) or state.ret.shift != 0:
return
- fea = idaapi.get_func(state.ret.where).start_ea
-
- fct = ctx.get_function(fea)
- fct.set_ret(value.sid)
+ prot = ctx.get_function(state.get_fea())
+ prot.set_ret(ctx.graph.get_entry_by_id(state.ret.sid))
# Build model members from state access
-def handle_access(ea: int, state: cpustate.state_t, ctx: context_t):
- for access in state.access:
- disp = access.key
-
- # use previous registers values, before insn was computed
- cur = state.get_previous_register(disp.reg)
+def handle_access(ea: int, sub_ea: int, state: cpustate.state_t, ctx: context_t):
+ for access in state.accesses:
+ struc = access.target
- if not isinstance(cur, cpustate.sid_t):
+ # a structure is accessed
+ if not isinstance(struc, cpustate.sid_t):
continue
- offset = cpustate.ctypes.c_int32(disp.offset + cur.shift).value
+ offset = access.target.shift
+ size = access.size
- entry = ctx.graph.get_entry_by_id(cur.sid)
- if entry.add_field(offset, disp.nbytes) is None:
- continue
+ # record the field being accessed
+ # do not make a field for an access of unknown (0) size
+ entry = ctx.graph.get_entry_by_id(struc.sid)
+ if size != 0:
+ f = entry.add_field(offset, size)
+ logger.debug(f"{ea:#x}.{sub_ea:x}: adding {f} to {entry.entry_id()}")
- logger.debug(
- f"Handle access 0x{ea:x}: add operand to {entry.entry_id()} at offset 0x{offset:x}, ea: 0x{access.ea:x}, n: {access.op_index}"
- )
- entry.add_operand(access.ea, access.op_index, cur.shift)
+ # record operand for the access, to type later
+ # we only care about register operands
+ if access.loc.t == ida_hexrays.mop_r:
+ regid = ida_hexrays.mreg2reg(access.loc.r, access.loc.size)
+ if regid == -1: # special kreg, nothing to type
+ continue
+ logger.debug(f"{ea:#x}.{sub_ea:x}: add operand to {entry.entry_id()} at offset 0x{offset:x}")
+ entry.add_operand(ea, offset, regid)
-# retrieve virtual methods for given vtable
-# and add child entry points for each of them to current entry
-def analyze_virtual_methods(vtable_ea: int, current: entry_t, offset: int, ctx: context_t):
+
+# add arg_0 entry for each virtual method
+# assuming every virtual method takes 'this' as first argument
+# note: this assumption can be wrong ('this' unused by the virtual method + optimization)
+def analyze_virtual_methods(vtbl: vtables.vtable_t, current: entry_t, offset: int, ctx: context_t):
if not ctx.can_follow_calls():
return
- for fea in ida_utils.vtable_members(vtable_ea):
+ for fea, _ in vtbl.get_members():
+ fct = idaapi.get_func(fea)
+ if fct is None: # no body for virtual method, it may be an import
+ continue
+
+ model = ctx.dflow_info.get_function(fct)
+
+ # virtual method should have at least one argument (this)
+ # if not, there is no point in analysing it
+ if model is None or model.get_args_count() == 0:
+ continue
+
# add entry to analyse
child = ctx.graph.add_entry_as_child(current, arg_entry_t(fea, 0), offset, False)
if child is not None:
logger.debug(f"Add virtual method 0x{fea:x}, {child.entry_id()}, as child of {current.entry_id()}")
# mark function as virtual
- fct = ctx.get_function(fea)
- fct.set_virtual()
+ prot = ctx.get_function(fea)
+ prot.set_virtual()
# Handle writes to struc members
-def handle_write(ea: int, state: cpustate.state_t, ctx: context_t):
+def handle_write(ea: int, sub_ea: int, state: cpustate.state_t, ctx: context_t):
ptr_size = ida_utils.get_ptr_size()
for write in state.writes:
- disp = write.disp
- src = write.src
+ struc = write.target
+ size = write.size
# mov [sid + offset], mem -> ptr loaded
- cur = state.get_previous_register(disp.reg)
- if not (isinstance(cur, cpustate.sid_t) and isinstance(src, cpustate.mem_t) and disp.nbytes == ptr_size):
+ if not (isinstance(struc, cpustate.sid_t) and isinstance(write.value, cpustate.mem_t) and size == ptr_size):
continue
- value = src.get_val()
- entry = ctx.graph.get_entry_by_id(cur.sid)
- offset = cpustate.ctypes.c_int32(disp.offset + cur.shift).value
+ offset = struc.shift
+
+ value = write.value.get_uval()
+ entry = ctx.graph.get_entry_by_id(struc.sid)
# check if addr is a vtable
- vtbl_size = ida_utils.vtable_size(value)
+ vtbl = vtables.vtable_t(value)
# value is not a vtable address
- if vtbl_size == 0:
- type = ftype_ptr_t(src)
- logger.debug(f'Handle write 0x{ea:x}: add type "{type}" to field 0x{offset:x} of {entry.entry_id()}')
+ if not vtbl.valid():
+ type = ftype_ptr_t(write.value)
+ logger.debug(f'{ea:#x}.{sub_ea:x}: add type "{type}" to field 0x{offset:x} of {entry.entry_id()}')
else:
+ # record vtable loading sites
+ # used later during conflict resolution to differentiate a base vtable from an inheriting one
+ vtbl.search_loads()
+
# get / create vtable entry point
- vtbl = ctx.graph.add_entry(vtbl_entry_t(value), True)
- type = ftype_struc_t(vtbl)
+ vtbl_entry = ctx.graph.add_entry(vtbl_entry_t(vtbl), True)
+ type = ftype_struc_t(vtbl_entry)
logger.debug(
- f"Handle write 0x{ea:x}: associate {vtbl.entry_id()} to field 0x{offset:x} of {entry.entry_id()}"
+ f"{ea:#x}.{sub_ea:x} associate {vtbl_entry.entry_id()} to field 0x{offset:x} of {entry.entry_id()}"
)
# add entrypoints to analyze virtual methods
- analyze_virtual_methods(value, entry, offset, ctx)
+ analyze_virtual_methods(vtbl, entry, offset, ctx)
# type structure field with retrieved type
entry.get_field(offset).set_type(type)
# Handle read of struct members
-def handle_read(ea: int, state: cpustate.state_t, ctx: context_t):
+def handle_read(ea: int, sub_ea: int, state: cpustate.state_t, ctx: context_t):
+ ptr_size = ida_utils.get_ptr_size()
+
for read in state.reads:
- disp = read.disp
- src = state.get_previous_register(disp.reg)
- dst = cpustate.reg_string(read.dst)
+ struc = read.target
+ size = read.size
# mov reg, [sid + offset]
- if not isinstance(src, cpustate.sid_t):
+ if not isinstance(struc, cpustate.sid_t):
continue
- entry = ctx.graph.get_entry_by_id(src.sid)
- offset = cpustate.ctypes.c_int32(disp.offset + src.shift).value
+ offset = struc.shift
+
+ entry = ctx.graph.get_entry_by_id(struc.sid)
- # no read entries hierarchy
+ # do not expand entries graph too much from read_entry_t leaves
if isinstance(entry, read_entry_t):
logger.debug(f"Ignoring read from {entry.entry_id()}")
continue
# no fixed value, propagate read entrypoint
rtype = entry.get_field_type(offset)
- if rtype is None:
- r_entry = ctx.graph.add_entry(read_entry_t(ea, state.fct_ea, dst, entry, offset))
- logger.debug(f"Handle read at 0x{ea:x}: type not known, propagating {r_entry.entry_id()}")
+ if rtype is None and size == ptr_size:
+ r_entry = ctx.graph.add_entry(read_entry_t(ea, sub_ea, state.get_fea(), read.dst, entry, offset))
+ logger.debug(f"{ea:#x}.{sub_ea:x}: type not known, propagating {r_entry.entry_id()}")
+
+ elif rtype is None:
+ pass
# a struc ptr is read
elif isinstance(rtype, ftype_struc_t):
- r_entry = ctx.graph.add_entry_as_child(rtype.entry, dst_reg_entry_t(ea, state.fct_ea, dst), 0, False)
+ r_entry = ctx.graph.add_entry_as_child(
+ rtype.entry, dst_var_entry_t(ea, sub_ea, state.get_fea(), read.dst), 0, False
+ )
if r_entry is not None:
- logger.debug(f"Handle read at 0x{ea:x} from {rtype.entry.entry_id()}, propagating {r_entry.entry_id()}")
+ logger.debug(f"{ea:#x}.{sub_ea:x} from {rtype.entry.entry_id()}, propagating {r_entry.entry_id()}")
# propagate any field
else:
- state.set_register(read.dst, rtype.get_propagated_value())
- logger.debug(f"Handle read at 0x{ea:x}: propagating read type {rtype}")
+ state.set_var_from_mop(read.dst, rtype.get_propagated_value())
+ logger.debug(f"{ea:#x}.{sub_ea:x}: propagating read type {rtype}")
# handle call, add entrypoints in callee
-def handle_call(ea: int, state: cpustate.state_t, ctx: context_t):
+def handle_call(ea: int, sub_ea: int, state: cpustate.state_t, ctx: context_t):
if not ctx.can_follow_calls():
return
if state.call_to is not None:
ctx.dive_in = False # default: do not dive in every callee
call_ea = state.call_to.start_ea
- fct = ctx.dflow_info.get_function(call_ea)
+
+ callee_model = ctx.dflow_info.get_function(state.call_to)
+ if callee_model is None:
+ return # function mba could not be generated, analysis is not possible
+
+ # callsite nargs can differ from callee nargs
+ callee_nargs = min(len(state.call_args), callee_model.get_args_count())
# look for entries to be propagated as callee's arguments
epc = 0
- for i in range(fct.args_count):
- arg = cpustate.get_argument(fct.cc, state, i, False, state.call_type == cpustate.call_type_t.JUMP)
+ for i in range(callee_nargs):
+ arg = state.call_args[i]
+
if not isinstance(arg, cpustate.sid_t):
continue
@@ -162,19 +198,18 @@ def handle_call(ea: int, state: cpustate.state_t, ctx: context_t):
# create new arg entry point
# one entry point is restricted to be propagated in only one function
- ctx.graph.add_entry_as_child(entry, arg_entry_t(call_ea, i), arg.shift, True)
- epc += 1
+ epc += int(ctx.graph.add_entry_as_child(entry, arg_entry_t(call_ea, i), arg.shift, True) is not None)
- logger.debug(f"Handle call at 0x{ea:x}, {epc} entrypoints recorded")
+ logger.debug(f"{ea:#x}.{sub_ea:x}, {epc} entrypoints recorded")
# handle new cpu state
-def handle_state(ea: int, state: cpustate.state_t, ctx: context_t):
- handle_access(ea, state, ctx)
- handle_write(ea, state, ctx)
- handle_read(ea, state, ctx)
- handle_call(ea, state, ctx)
- handle_ret(ea, state, ctx)
+def handle_state(ea: int, sub_ea: int, state: cpustate.state_t, ctx: context_t):
+ handle_access(ea, sub_ea, state, ctx)
+ handle_write(ea, sub_ea, state, ctx)
+ handle_read(ea, sub_ea, state, ctx)
+ handle_call(ea, sub_ea, state, ctx)
+ handle_ret(state, ctx)
""" Entrypoints analysis & entries graph building """
@@ -190,18 +225,17 @@ def dive_in(callee: cpustate.function_t, state: cpustate.state_t, ctx: context_t
if dive:
# arguments entries are to be built (again ?), reset their states
- for ep in ctx.graph.get_entries_at(callee.ea, 0):
+ for ep in ctx.graph.get_entries_at(callee.ea, -1, arg_entry_t.inject_before):
ep.reset()
-
utils.g_logger.debug("Diving into fct 0x%x: %s" % (callee.ea, "YES" if dive else "NO"))
return dive
# injector callback, inject entrypoints into cpustate
-def model_injector(state: cpustate.state_t, insn: idaapi.insn_t, before_update: bool, ctx: context_t):
- for ep in ctx.graph.get_entries_at(insn.ea, not before_update):
- ctx.dive_in |= ep.inject(insn.ea, state, ctx) # dive in callee if new eps are to be analyzed
- utils.g_logger.debug(f"Injecting {ep.entry_id()} at 0x{insn.ea:x}")
+def model_injector(state: cpustate.state_t, ea: int, sub_ea: int, before_update: bool, ctx: context_t):
+ for ep in ctx.graph.get_entries_at(ea, sub_ea, before_update):
+ utils.g_logger.debug(f"Injecting {ep.entry_id()} at {ea:#x}.{sub_ea:x}")
+ ctx.dive_in |= ep.inject(state) # dive in callee if new eps are to be analyzed
# entrypoints graph builder
@@ -211,8 +245,8 @@ def analyze_entrypoints(ctx: context_t):
entries = ctx.get_entrypoints()
# injector callback
- def inject_cb(state: cpustate.state_t, insn: idaapi.insn_t, before_update: bool):
- model_injector(state, insn, before_update, ctx)
+ def inject_cb(state: cpustate.state_t, ea: int, sub_ea: int, before_update: bool):
+ model_injector(state, ea, sub_ea, before_update, ctx)
inject = cpustate.injector_t(inject_cb, 3)
@@ -235,14 +269,21 @@ def inject_cb(state: cpustate.state_t, insn: idaapi.insn_t, before_update: bool)
logger.debug(f"Analyzing entry {entry.entry_header()} ..")
func = idaapi.get_func(entry.ea)
- for ea, state in cpustate.generate_state(func, ctx.dflow_info, cpustate.get_default_cc()):
- handle_state(ea, state, ctx)
+ for ea, sub_ea, state in cpustate.generate_state(func, ctx.dflow_info):
+ handle_state(ea, sub_ea, state, ctx)
+
+ # entrypoint was not injected
+ # this happens when the user selects an entrypoint that gets deleted from the mba after call analysis
+ # solution for now: let the user select another entrypoint, or update the mba
+ if entry.to_analyze:
+ logger.error(
+ f"Entry {entry.entry_header()} was not injected because it is invalid - redo the analysis."
+ )
+ current_count = 0
+ break
logger.debug(f"Entrypoints wave {current_wave} has been analyzed (total: {current_count})")
current_wave += 1
- # record visited functions information into model
- ctx.record_functions(ctx.dflow_info.visited)
-
# remove propagation data from model
del ctx.dflow_info
diff --git a/symless/plugins/__init__.py b/symless/plugins/__init__.py
index 15a348c..9f80f04 100644
--- a/symless/plugins/__init__.py
+++ b/symless/plugins/__init__.py
@@ -4,6 +4,10 @@ class plugin_t:
def __init__(self):
pass
+ # the extension has been reloaded
+ def reload(self):
+ pass
+
# terminate & clean the extension
def term(self):
pass
diff --git a/symless/plugins/builder.py b/symless/plugins/builder.py
index b4f6084..9e2006e 100644
--- a/symless/plugins/builder.py
+++ b/symless/plugins/builder.py
@@ -1,11 +1,13 @@
import os
-from typing import Optional, Tuple, Union
+from collections import deque
+from dataclasses import dataclass
+from typing import Collection, Optional, Tuple
+import ida_hexrays
import idaapi
from PyQt5 import QtCore, QtGui, QtWidgets
import symless
-import symless.cpustate as cpustate
import symless.cpustate.arch as arch
import symless.generation.generate as generate
import symless.generation.structures as structures
@@ -17,6 +19,10 @@
# builder window title
WINDOW_TITLE = "Symless structure builder"
+# fictive color_t used for tagging elements in our microcode view
+COLOR_TARGET = idaapi.COLOR_OPND1
+SCOLOR_TARGET = chr(COLOR_TARGET)
+
# Structure builder plugin extension
class BuilderPlugin(plugin_t):
@@ -24,9 +30,13 @@ def __init__(self):
self.uihook = PopUpHook()
self.uihook.hook()
+ def reload(self):
+ self.uihook.init_action()
+
def term(self):
self.uihook.unhook()
- self.uihook.term()
+ if self.uihook.loaded:
+ self.uihook.term()
# retrieve the extension
@@ -34,310 +44,744 @@ def get_plugin() -> plugin_t:
return BuilderPlugin()
-# base class for a tab in our plugin's UI
-class BuilderTabBase(QtWidgets.QWidget):
- def __init__(self, label: str, window: "BuilderMainWid", parent: QtWidgets.QWidget = None):
- super().__init__(parent)
- self.window = window
-
- # build widget
- lmain = QtWidgets.QVBoxLayout()
-
- # window's hint
- whint = QtWidgets.QLabel(self)
- whint.setTextFormat(QtCore.Qt.TextFormat.RichText)
- whint.setText(label)
- whint.setAlignment(QtCore.Qt.AlignCenter)
-
- lmain.addWidget(whint)
- lmain.setAlignment(whint, QtCore.Qt.AlignTop)
-
- self.populate(lmain)
-
- lbottom = QtWidgets.QGridLayout()
- cancel_btn = QtWidgets.QPushButton("Cancel", self)
- cancel_btn.clicked.connect(self.window.reject)
- ok_btn = QtWidgets.QPushButton("Propagate", self)
- ok_btn.clicked.connect(self.execute)
- lbottom.addWidget(cancel_btn, 0, 0)
- lbottom.addWidget(ok_btn, 0, 1)
-
- lmain.addLayout(lbottom)
- lmain.setAlignment(lbottom, QtCore.Qt.AlignBottom)
- self.setLayout(lmain)
-
- # populate widget's components
- def populate(self, layout: QtWidgets.QLayout):
- pass
-
- # is the form filled correctly
- def completed(self) -> Tuple[bool, str]:
- return True, None
-
- # call after popup is built, used to give focus
- def give_focus(self):
- pass
-
- def execute(self):
- valid, err = self.completed()
- if valid:
- self.window.execute(self)
+# a selected micro-operand for data flow entrypoint
+@dataclass
+class mop_sel_t:
+ ea: int # insn ea
+ sub_idx: int # sub insn idx
+ mop: ida_hexrays.mop_t # selected mop
+ as_dst: bool # is dst operand
+
+ def __str__(self) -> str:
+ if self.mop.t == ida_hexrays.mop_r:
+ name = ida_hexrays.get_mreg_name(self.mop.r, self.mop.size)
+ elif self.mop.t == ida_hexrays.mop_S:
+ name = f"stk:#{self.mop.s.off:x}"
+ return f"{name} @ {self.ea:#x}.{self.sub_idx:x} ({'DST' if self.as_dst else 'SRC'})"
+
+
+# context for parsing microinstruction and extracting operands
+class minsn_parse_ctx_t:
+ def __init__(self, mba: ida_hexrays.mba_t, ea: int):
+ self.mba = mba
+ self.ea = ea
+ self.sub_idx = 0
+ self.targets: Collection[mop_sel_t] = deque()
+
+ def add_mop(self, mop: ida_hexrays.mop_t, as_dst: bool):
+ self.targets.append(mop_sel_t(self.ea, self.sub_idx, ida_hexrays.mop_t(mop), as_dst))
+
+ def next_subinsn(self):
+ self.sub_idx += 1
+
+
+# get simplified name for global at ea
+# to be displayed in microcode view
+def get_simplified_gbl_name(ea: int) -> str:
+ d = ida_utils.demangle_ea(ea)
+ if not len(d):
+ return f"off_{ea:#x}"
+ if "(" in d:
+ return d.split("(")[0]
+ return d
+
+
+# parse a microcode operand
+# returns its str representation & update the contained micro-operands in the context
+# as_dst: operand value gets updated by the instruction
+def parse_mop(ctx: minsn_parse_ctx_t, op: ida_hexrays.mop_t, as_dst: bool = False) -> Optional[str]: # noqa: C901
+ if op.t == ida_hexrays.mop_z: # none
+ return None
+
+ if op.t == ida_hexrays.mop_r: # micro register
+ ctx.add_mop(op, as_dst) # add as target variable
+ return idaapi.COLSTR(idaapi.COLSTR(ida_hexrays.get_mreg_name(op.r, op.size), SCOLOR_TARGET), idaapi.SCOLOR_REG)
+
+ if op.t == ida_hexrays.mop_n: # immediate (number)
+ return f"#{idaapi.COLSTR(hex(op.signed_value()), idaapi.SCOLOR_NUMBER)}"
+
+ if op.t == ida_hexrays.mop_str: # immediate (string)
+ return f'"{idaapi.COLSTR(op.cstr, idaapi.SCOLOR_STRING)}"'
+
+ if op.t == ida_hexrays.mop_d: # result of another instruction
+ in_repr = parse_minsn(ctx, op.d, True)
+ ctx.next_subinsn()
+ return in_repr
+
+ if op.t == ida_hexrays.mop_S: # local stack variable
+ member = op.s.get_stkvar(None)
+ if member in (None, -1): # can happen for some stack operands
+ return idaapi.COLSTR(f"stk:#{op.s.off:x}", idaapi.SCOLOR_LOCNAME)
+ ctx.add_mop(op, as_dst)
+
+ if isinstance(member, int): # IDA 9 API changed get_stkvar() prototype
+ m = idaapi.udm_t()
+ op.s.get_stkvar(m)
+ varname = m.name
+
+ # IDA 8 case
else:
- idaapi.warning(err)
-
- def get_shift(self) -> int:
- raise Exception("Not implemented")
+ varname = idaapi.get_member_name(member.id)
+
+ return idaapi.COLSTR(idaapi.COLSTR(varname, SCOLOR_TARGET), idaapi.SCOLOR_LOCNAME)
+
+ if op.t == ida_hexrays.mop_v: # global variable
+ color = idaapi.SCOLOR_DNAME
+ if idaapi.get_func(op.g) is not None:
+ color = idaapi.SCOLOR_CNAME
+ return idaapi.COLSTR(get_simplified_gbl_name(op.g), color)
+
+ if op.t == ida_hexrays.mop_b: # micro basic block
+ b = ctx.mba.get_mblock(op.b)
+ assert b.serial == op.b
+ return f"{b.start:#x}" # type == BLT_STOP -> ret
+
+ if op.t == ida_hexrays.mop_f: # args list
+ return f"({', '.join([parse_mop(ctx, i) for i in op.f.args])})"
+
+ if op.t == ida_hexrays.mop_l: # local variable
+ return idaapi.COLSTR("?", idaapi.SCOLOR_LOCNAME) # should only exist at MMAT_LVARS maturity
+
+ if op.t == ida_hexrays.mop_a: # address of operand
+ return f"&({parse_mop(ctx, op.a)})"
+
+ if op.t == ida_hexrays.mop_h: # helper function
+ return idaapi.COLSTR(op.helper, idaapi.SCOLOR_MACRO)
+
+ if op.t == ida_hexrays.mop_c: # mcases
+ return idaapi.tag_remove(op.c._print()) # TODO test me
+
+ if op.t == ida_hexrays.mop_fn: # floating constant
+ return idaapi.COLSTR(idaapi.tag_remove(op.fpc._print()), idaapi.SCOLOR_NUMBER) # TODO test me
+
+ if op.t == ida_hexrays.mop_p: # operands pair
+ return f"({parse_mop(ctx, op.pair.lop, as_dst)}, {parse_mop(ctx, op.pair.hop, as_dst)})"
+
+ if op.t == ida_hexrays.mop_sc: # scattered
+ return op.scif.name # TODO test me
+
+ return "?"
+
+
+# parse microcode instruction
+# returns a str representation of the instruction + the variable it contains
+# provides a simpler representation than the one given by insn._print()
+def parse_minsn(ctx: minsn_parse_ctx_t, insn: ida_hexrays.minsn_t, inlined: bool = False) -> str:
+ ops_repr = filter(
+ lambda k: k is not None,
+ [parse_mop(ctx, i, j) for (i, j) in ((insn.l, False), (insn.r, False), (insn.d, insn.modifies_d()))],
+ )
+
+ repr_ = None # out string representation
+ par_ = ("(", ")") if inlined else ("", "")
+ padd = 0 if inlined else 9
+
+ # call special format
+ if insn.opcode == ida_hexrays.m_call:
+ repr_ = f"{idaapi.COLSTR(ida_utils.g_mcode_name[insn.opcode], idaapi.SCOLOR_INSN):{' '}<{padd}} {next(ops_repr)}{next(ops_repr, '()')}"
+
+ # special "ret" goto
+ elif (
+ insn.opcode == ida_hexrays.m_goto
+ and insn.l.t == ida_hexrays.mop_b
+ and ctx.mba.get_mblock(insn.l.b).type == ida_hexrays.BLT_STOP
+ ):
+ repr_ = idaapi.COLSTR("ret", idaapi.SCOLOR_INSN)
+
+ # special embedded operations
+ elif inlined and insn.d.t == ida_hexrays.mop_z:
+ if insn.opcode == ida_hexrays.m_add:
+ repr_ = f"{next(ops_repr)}+{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_sub:
+ repr_ = f"{next(ops_repr)}-{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_mul:
+ repr_ = f"{next(ops_repr)}*{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_shl:
+ repr_ = f"{next(ops_repr)}<<{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_shr:
+ repr_ = f"{next(ops_repr)}>>{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_or:
+ repr_ = f"{next(ops_repr)}|{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_and:
+ repr_ = f"{next(ops_repr)}&{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_xor:
+ repr_ = f"{next(ops_repr)}^{next(ops_repr)}"
+ elif insn.opcode == ida_hexrays.m_ldx:
+ repr_ = f"{next(ops_repr)}:{next(ops_repr)}"
+ par_ = ("[", "]")
+
+ # default repr
+ if repr_ is None:
+ repr_ = f"{idaapi.COLSTR(ida_utils.g_mcode_name[insn.opcode], idaapi.SCOLOR_INSN):{' '}<{padd}} {', '.join(ops_repr)}"
+
+ # return insn representation within appropriate parentheses
+ return f"{par_[0]}{repr_}{par_[1]}"
+
+
+def find_in_line_wrapper(
+ range: idaapi.tagged_line_section_t, line: idaapi.tagged_line_sections_t, tag: int
+) -> idaapi.tagged_line_section_t:
+ if hasattr(line, "find_in"):
+ return line.find_in(range, tag) # IDA 8
+ return line.nearest_after(range, range.start, tag) # IDA 9
+
+
+# view of the current function (simplified) microcode
+# for the user to select the propagation entry variable
+class MicrocodeViewer(idaapi.simplecustviewer_t):
+ def __init__(self, mba: ida_hexrays.mba_t, current_ea: int, hint: Tuple[int, int]):
+ super().__init__()
+ self.mba = mba
+ guess_mreg, guess_size = hint # guess for target variable
+
+ # chosen target (insn ea, operand, is a dst operand ?)
+ self.chosen: Optional[mop_sel_t] = None
+
+ # list of valid target operands for each line
+ self.ops_per_line: Collection[Optional[Collection[mop_sel_t]]] = list()
+
+ self.Create("Symless microcode view")
+
+ # fill view with microinstructions
+ block = self.mba.blocks
+ _jump = True
+ while block:
+ insn = block.head
+ while insn:
+ # print(idaapi.tag_remove(insn._print()))
+
+ ctx = minsn_parse_ctx_t(self.mba, insn.ea)
+ insn_repr = parse_minsn(ctx, insn)
+ self.ops_per_line.append(ctx.targets)
+
+ # set line in listing
+ if _jump and insn.ea >= current_ea:
+ self.AddLine(f"{idaapi.COLSTR(hex(insn.ea), idaapi.SCOLOR_INSN)} {insn_repr}")
+ self.Jump(self.Count() - 1)
+ _jump = False
+ else:
+ self.AddLine(f"{idaapi.COLSTR(hex(insn.ea), idaapi.SCOLOR_PREFIX)} {insn_repr}")
+
+ # find target mvar from hint
+ # if multiple mvar match hint, the last one is selected
+ if insn.ea == current_ea and self.chosen is None:
+ for i, mvar in enumerate(ctx.targets):
+ if mvar.mop.t == ida_hexrays.mop_r and mvar.mop.r == guess_mreg and mvar.mop.size == guess_size:
+ self.set_chosen(self.Count() - 1, i)
+ break
+
+ insn = insn.next
+ block = block.nextb
+ if block: # basic block boundaries
+ self.ops_per_line.append(None) # account for empty lines
+ self.AddLine("")
+
+ # remove IDA status bar
+ qwidget = idaapi.PluginForm.TWidgetToPyQtWidget(self.GetWidget())
+ for child in qwidget.children():
+ if isinstance(child, QtWidgets.QStatusBar):
+ child.setMaximumHeight(0)
+
+ qwidget.setMinimumWidth(512)
+
+ # get index of given section in given line
+ # walk the sections manually because tagged_line_sections_t is not iterable in IDAPython
+ def index_of_sect_in_line(self, section: idaapi.tagged_line_section_t, line: idaapi.tagged_line_sections_t) -> int:
+ range = idaapi.tagged_line_section_t()
+ range.start = 0
+ range.length = 0xFFFF
+
+ # loop over the tagged sections of the line
+ i = 0
+ current = find_in_line_wrapper(range, line, COLOR_TARGET)
+ while current and current.valid():
+ if section.start == current.start and section.length == current.length:
+ break
+ i += 1
+ range.start = current.start + current.length
+ current = find_in_line_wrapper(range, line, COLOR_TARGET)
+ return i
+
+ # refresh the view & try to disable (again) the default highlighting
+ def OnCursorPosChanged(self):
+ self.Refresh()
+ idaapi.set_highlight(self.GetWidget(), None, idaapi.HIF_LOCKED)
+ self.Close()
+
+ # set chosen target variable to given (line, idx) variable
+ def set_chosen(self, line: int, idx: int):
+ # highlight selected var
+ old_line = self.GetLine(line)[0]
+ pat_off, pat_end = 0, 0
+ for _ in range(idx + 1):
+ pat_off = old_line.find(idaapi.SCOLOR_ON + SCOLOR_TARGET, pat_end, len(old_line))
+ pat_end = old_line.find(idaapi.SCOLOR_OFF + SCOLOR_TARGET, pat_off, len(old_line)) + 2
+ self.EditLine(
+ line,
+ old_line[:pat_off]
+ + old_line[pat_off:pat_end].replace(SCOLOR_TARGET, idaapi.SCOLOR_ERROR)
+ + old_line[pat_end:],
+ )
- def get_dive(self) -> bool:
- raise Exception("Not implemented")
+ # set appropriate chosen
+ self.chosen = self.ops_per_line[line][idx]
+ utils.g_logger.debug(f"selected variable is {self.chosen}")
+
+ # forget current selection
+ def forget_chosen(self):
+ for i in range(self.Count()):
+ line = self.GetLine(i)[0]
+ if idaapi.SCOLOR_ERROR in line:
+ self.EditLine(i, line.replace(idaapi.SCOLOR_ERROR, SCOLOR_TARGET))
+ self.Refresh()
+
+ self.chosen = None
+
+ # the user clicked (hopefully) a target variable
+ def OnClick(self, shift):
+ self.forget_chosen() # clear any previous selection
+
+ # click location
+ loc = idaapi.listing_location_t()
+ if not idaapi.get_custom_viewer_location(loc, self.GetWidget(), idaapi.CVLF_USE_MOUSE):
+ return False
+
+ # click location as coords
+ y, x, _ = self.GetPos()
+
+ # get clicked variable
+ nearest = loc.tagged_sections.nearest_at(x, COLOR_TARGET)
+ if nearest is None or not nearest.valid():
+ return False
+
+ # get variable idx in line
+ var_idx = self.index_of_sect_in_line(nearest, loc.tagged_sections)
+
+ self.set_chosen(y, var_idx)
+ return True
+
+
+"""
+class StackViewer(idaapi.simplecustviewer_t):
+ def __init__(self, fea: int):
+ super().__init__()
+ self.Create("Symless stack view")
+ self.selected = None
+ func = idaapi.get_func(fea)
+ frame = idaapi.get_frame(func)
+ if not frame:
+ utils.g_logger.warning("No frame found")
+ return
+ self.items = []
+ for offset, name, size in idautils.StructMembers(frame.id):
+ # Get the member ID and type
+ mptr: idaapi.member_t = idaapi.get_member_by_name(frame, name)
+ tif = idaapi.tinfo_t()
+ idaapi.get_member_tinfo(tif, mptr)
+ mtype = idaapi.print_tinfo("", 0, 0, idaapi.PRTYPE_1LINE, tif, "", "")
+ if mtype is None:
+ mtype = "unknown"
+
+ # Add the details to the items list
+ self.items.append([hex(offset), name, hex(size), mtype])
+ self.AddLine(
+ f"{idaapi.COLSTR('rsp+0x{:04x}'.format(offset), idaapi.SCOLOR_KEYWORD)} "
+ f"{idaapi.COLSTR(name.ljust(10), idaapi.SCOLOR_DNAME)} "
+ f"{idaapi.COLSTR(mtype, idaapi.SCOLOR_NUMBER)}"
+ )
+
+ # remove status bar
+ qwidget = idaapi.PluginForm.TWidgetToPyQtWidget(self.GetWidget())
+ for child in qwidget.children():
+ if isinstance(child, QtWidgets.QStatusBar):
+ child.setMaximumHeight(0)
+ qwidget.setMinimumWidth(384)
+
+ def OnClick(self, shift):
+ if self.selected is not None:
+ line, _, _ = self.GetLine(self.selected)
+ self.EditLine(self.selected, line[2:])
+
+ line_no, x, y = self.GetPos()
+ if line_no != self.selected:
+ self.EditLine(line_no, f"> {self.GetCurrentLine()}")
+ self.selected = line_no
+ self.Refresh()
+"""
+
+
+# a line in the structures list
+class StrucSelItem(QtWidgets.QListWidgetItem):
+ def __init__(self, struc_name: str, size: int = 0):
+ super().__init__(struc_name)
+ self.name = struc_name
+ self.size = size
- def get_struc(self) -> Union[int, str]:
- raise Exception("Not implemented")
+ def get_name(self) -> str:
+ return self.name
+ def get_size(self) -> int:
+ return self.size
-# an item in a list, representing a structure
-class StrucSelItem(QtWidgets.QListWidgetItem):
- def __init__(self, sid: int, display: str):
- super().__init__(display)
- self.sid = sid
- def get_struc(self) -> int:
- return self.sid
+# default option for structure selector
+class StrucSelDefaultItem(QtWidgets.QListWidgetItem):
+ def __init__(self):
+ super().__init__("New structure")
+ icon = QtGui.QIcon(os.path.join(os.path.abspath(symless.__path__[0]), "resources", "cross.png"))
+ self.setIcon(icon)
+ ft = QtGui.QFont()
+ ft.setBold(True)
+ self.setData(QtCore.Qt.FontRole, ft)
+
+ def get_size(self) -> int:
+ return 0
-# structure selector
+# structure selector (list)
class StrucSelWid(QtWidgets.QListWidget):
def __init__(self, parent: QtWidgets.QWidget = None):
super().__init__(parent)
- idx = idaapi.get_first_struc_idx()
- while idx != idaapi.BADADDR:
- sid = idaapi.get_struc_by_idx(idx)
- self.addItem(StrucSelItem(sid, idaapi.get_struc_name(sid)))
- idx = idaapi.get_next_struc_idx(idx)
+ self.setWhatsThis("Select a structure to propagate.")
+
+ # Get structures from local types
+ tif = idaapi.tinfo_t()
+ for id in range(1, idaapi.get_ordinal_count(None)):
+ if tif.get_numbered_type(None, id) and (tif.is_struct() or tif.is_forward_struct()):
+ local_type_name = idaapi.idc_get_local_type_name(id)
+ self.addItem(StrucSelItem(local_type_name, 0 if tif.is_forward_struct() else tif.get_size()))
self.sortItems()
+ # default option
+ default = StrucSelDefaultItem()
+ self.insertItem(0, default)
+ self.setCurrentItem(default)
+
def sizeHint(self) -> QtCore.QSize:
size = super().sizeHint()
- size.setHeight(384)
+ size.setHeight(256)
return size
-# UI for propagating a new structure
-class BuilderNewTab(BuilderTabBase):
+# base class for a tab in our plugin's UI
+class BuilderTabBase(QtWidgets.QWidget):
def __init__(self, window: "BuilderMainWid", parent: QtWidgets.QWidget = None):
- super().__init__("Create a new structure", window, parent)
+ super().__init__(parent)
+ self.window = window
- def populate(self, layout: QtWidgets.QLayout):
- # central box
- wcenter = QtWidgets.QFrame(self)
- wcenter.setFrameStyle(QtWidgets.QFrame.Shape.Panel | QtWidgets.QFrame.Shadow.Raised)
+ # get form error, None if well filled
+ def get_error(self) -> Optional[str]:
+ return None
- # structure's name selector
- self.selector = QtWidgets.QLineEdit(self)
- self.selector.setPlaceholderText("New structure's name..")
- # shift selector
- self.shift = QtWidgets.QLineEdit(self)
- int_valid = QtGui.QIntValidator(self)
- int_valid.setBottom(0)
- self.shift.setValidator(int_valid)
- self.shift.setText("0")
- self.shift.setMaxLength(5)
- self.shift.setAlignment(QtCore.Qt.AlignVCenter | QtCore.Qt.AlignRight)
- self.shift.setFixedWidth(64)
- lshift = QtWidgets.QLabel(self)
- lshift.setText("shifted by")
- lshift.setBuddy(self.shift)
+"""
+# tab - propagate structure in stack
+class BuilderFromStkTab(BuilderTabBase):
+ def __init__(self, fea: int, window: "BuilderMainWid", parent: QtWidgets.QWidget = None):
+ super().__init__(window, parent)
+
+ # title
+ layout = QtWidgets.QVBoxLayout()
+ title = QtWidgets.QLabel(self)
+ title.setText("Select a stack offset")
+ layout.addWidget(title)
+
+ self.stack = StackViewer(fea)
+ stackQTW = idaapi.PluginForm.TWidgetToPyQtWidget(self.stack.GetWidget())
+ stackQTW.setWhatsThis("Choose the offset of your structure in the stack")
+ layout.addWidget(stackQTW, QtCore.Qt.AlignLeft)
+
+ # size selector
+ self.size = QtWidgets.QLineEdit(self)
+ self.size.setText("0x0")
+ self.size.setMaxLength(16)
+ self.size.setValidator(QtGui.QRegExpValidator(QtCore.QRegExp("^([0-9]+)|(0x[0-9a-fA-F]+)$"), self.size))
+ self.size.setWhatsThis("Size of the in-stack structure.")
+ lsize = QtWidgets.QLabel(self)
+ lsize.setText("Structure size")
+ lsize.setWhatsThis("Size of the in-stack structure.")
+ lsize.setBuddy(self.size)
# deep dive checkbox
- self.chk = QtWidgets.QCheckBox("spread in callees", self)
+ self.chk = QtWidgets.QCheckBox("Spread in callees", self)
+ self.chk.setWhatsThis("Should propagation follow functions calls.")
self.chk.setChecked(True)
- # layout
- lcenter = QtWidgets.QVBoxLayout()
- lbar = QtWidgets.QHBoxLayout()
-
- lbar.addWidget(lshift)
- lbar.setAlignment(lshift, QtCore.Qt.AlignRight)
- lbar.addWidget(self.shift)
- lbar.setAlignment(self.shift, QtCore.Qt.AlignLeft)
- lbar.addWidget(self.chk)
+ lcenter = QtWidgets.QHBoxLayout()
+ lcenter.addWidget(lsize)
+ lcenter.addWidget(self.size)
+ lcenter.addStretch()
+ lcenter.addWidget(self.chk)
+ layout.addLayout(lcenter)
- lcenter.addWidget(self.selector)
- lcenter.addLayout(lbar)
+ self.setLayout(layout)
- wcenter.setLayout(lcenter)
- layout.addWidget(wcenter)
+ def get_error(self) -> Optional[str]:
+ if self.get_stack_offset() is None:
+ return "Please select the offset of the structure in the stack."
- self.setWhatsThis(
- "Choose a name for your new structure, a shift to apply and specify if we should follow function calls."
- )
+ if self.get_structure_size() <= 0:
+ return "Please provide a valid structure size."
- def give_focus(self):
- self.selector.setFocus(QtCore.Qt.FocusReason.PopupFocusReason)
+ return None
- def get_dive(self) -> bool:
- return self.chk.isChecked()
+ def get_stack_offset(self) -> Optional[int]:
+ line_no = self.stack.selected
+ if line_no is not None and line_no < len(self.stack.items):
+ return int(self.stack.items[line_no][0], 16)
+ return None
- def get_shift(self) -> int:
+ def get_structure_size(self) -> int:
+ sval = self.size.text()
try:
- return int(self.shift.text())
+ return int(sval, 16 if sval.startswith("0x") else 10)
except ValueError:
- return 0
-
- def get_struc(self) -> str:
- return self.selector.text()
-
- def completed(self) -> Tuple[bool, str]:
- name = self.selector.text()
- if len(name) == 0:
- return False, "Please provide a name for the new structure"
-
- sid = idaapi.get_struc_id(name)
- if sid != idaapi.BADADDR:
- return False, f'Structure "{name}" already exists'
-
- return True, None
-
-
-# UI for propagating an existing structure
-class BuilderExistingTab(BuilderTabBase):
- def __init__(self, window: "BuilderMainWid", parent: QtWidgets.QWidget = None):
- super().__init__("Select an existing structure", window, parent)
-
- def populate(self, layout: QtWidgets.QLayout):
- # structure selector
- self.selector = StrucSelWid(self)
-
- # structure selector search bar
- self.search_bar = QtWidgets.QLineEdit(self)
- self.search_bar.setPlaceholderText("Search for a structure..")
- self.search_bar.textChanged.connect(self.find_struc)
+ return -1
+
+ def set_structure_size(self, size: int):
+ self.size.setText(hex(size))
+"""
+
+
+# tab - propagate structure pointer
+class BuilderFromPtrTab(BuilderTabBase):
+ def __init__(
+ self,
+ mba: ida_hexrays.mba_t,
+ ea: int,
+ hint: Tuple[int, int],
+ window: "BuilderMainWid",
+ parent: QtWidgets.QWidget = None,
+ ):
+ super().__init__(window, parent)
+
+ # title
+ layout = QtWidgets.QVBoxLayout()
+ title = QtWidgets.QLabel(self)
+ title.setText("Select an entry variable")
+ title.setAlignment(QtCore.Qt.AlignCenter)
+ layout.addWidget(title)
+
+ # microcode view
+ self.microcodeViewer = MicrocodeViewer(mba, ea, hint)
+ microQTW = idaapi.PluginForm.TWidgetToPyQtWidget(self.microcodeViewer.GetWidget())
+ microQTW.setWhatsThis(
+ "Select an entry point for the propagation. The entry point should be a variable whose value is a pointer to the structure to propagate."
+ )
+ layout.addWidget(microQTW, QtCore.Qt.AlignLeft)
# shift selector
self.shift = QtWidgets.QLineEdit(self)
- int_valid = QtGui.QIntValidator(self)
- int_valid.setBottom(0)
- self.shift.setValidator(int_valid)
- self.shift.setText("0")
- self.shift.setMaxLength(5)
- self.shift.setAlignment(QtCore.Qt.AlignVCenter | QtCore.Qt.AlignRight)
- self.shift.setFixedWidth(64)
+ self.shift.setText("0x0")
+ self.shift.setMaxLength(16)
+ self.shift.setValidator(QtGui.QRegExpValidator(QtCore.QRegExp("^([0-9]+)|(0x[0-9a-fA-F]+)$"), self.shift))
+ self.shift.setWhatsThis("Shift to apply to the propagated structure pointer.")
lshift = QtWidgets.QLabel(self)
- lshift.setText("shifted by")
+ lshift.setText("Shifted by")
+ lshift.setWhatsThis("Shift to apply to the propagated structure pointer.")
lshift.setBuddy(self.shift)
# deep dive checkbox
- self.chk = QtWidgets.QCheckBox("spread in callees", self)
+ self.chk = QtWidgets.QCheckBox("Spread in callees", self)
+ self.chk.setWhatsThis("Whether propagation should follow function calls.")
self.chk.setChecked(True)
- # layout
lcenter = QtWidgets.QHBoxLayout()
-
lcenter.addWidget(lshift)
- lcenter.setAlignment(lshift, QtCore.Qt.AlignRight)
lcenter.addWidget(self.shift)
- lcenter.setAlignment(self.shift, QtCore.Qt.AlignLeft)
+ lcenter.addStretch()
lcenter.addWidget(self.chk)
-
- layout.addWidget(self.selector)
- layout.addWidget(self.search_bar)
layout.addLayout(lcenter)
- self.setWhatsThis(
- "Select an existing structure to build & propagate.\nChoose an optional shift to apply and specify the need to follow function calls."
- )
+ self.setLayout(layout)
- def give_focus(self):
- self.search_bar.setFocus(QtCore.Qt.FocusReason.PopupFocusReason)
+ def get_error(self) -> Optional[str]:
+ if self.get_entry_variable() is None:
+ return "Please provide a variable as an entry point for the propagation."
- # filter structures list with given keyword
- def find_struc(self, key: str):
- lkey = key.lower()
- for i in range(self.selector.count()):
- current = self.selector.item(i)
+ if self.get_shift() < 0:
+ return "Please provide a valid shift (negative values not supported)."
- if lkey in current.text().lower():
- current.setHidden(False)
- else:
- current.setHidden(True)
+ return None
- def get_dive(self) -> bool:
- return self.chk.isChecked()
+ def get_entry_variable(self) -> Optional[ida_hexrays.mop_t]:
+ if self.microcodeViewer.chosen is None:
+ return None
+ return self.microcodeViewer.chosen.mop # chosen op
def get_shift(self) -> int:
+ sval = self.shift.text()
try:
- return int(self.shift.text())
+ return int(sval, 16 if sval.startswith("0x") else 10)
except ValueError:
- return 0
-
- def get_struc(self) -> int:
- selected: StrucSelItem = self.selector.currentItem()
- if selected is None:
- return idaapi.BADADDR
- return selected.get_struc()
-
- def completed(self) -> Tuple[bool, str]:
- selected = self.selector.currentItem()
- if selected is None:
- return False, "Please select a structure"
- return True, None
+ return -1
# plugin's main UI
class BuilderMainWid(QtWidgets.QDialog):
- def __init__(self, parent: QtWidgets.QWidget = None):
+ def __init__(self, mba: ida_hexrays.mba_t, ea: int, hint: Tuple[int, int], parent: QtWidgets.QWidget = None):
super().__init__(parent)
+ self.mba = mba
- # properties to gather using this form
- self.dive: bool = False
- self.struc: Union[int, str] = idaapi.BADADDR
- self.shift: int = 0
+ # main layout
+ layout = QtWidgets.QVBoxLayout()
- # tabs
- self.tabWidget = QtWidgets.QTabWidget(self)
- self.tabWidget.setMovable(False)
- self.tabWidget.setTabsClosable(False)
+ # window's title
+ whint = QtWidgets.QLabel(self)
+ whint.setText("Select a structure")
+ whint.setAlignment(QtCore.Qt.AlignCenter)
+ layout.addWidget(whint)
+ layout.setAlignment(whint, QtCore.Qt.AlignTop)
- f_tab = BuilderExistingTab(self, self.tabWidget)
- self.tabWidget.addTab(f_tab, "From existing")
+ # structure selector
+ self.struct_selector = StrucSelWid(self)
+ layout.addWidget(self.struct_selector, QtCore.Qt.AlignLeft)
+ layout.setStretch(1, 1)
- s_tab = BuilderNewTab(self, self.tabWidget)
- self.tabWidget.addTab(s_tab, "Build new")
+ # structure selector search bar
+ self.search_bar = QtWidgets.QLineEdit(self)
+ self.search_bar.setPlaceholderText("Structure name")
+ self.search_bar.setWhatsThis("Name of the structure (new or existing) to propagate.")
+ self.search_bar.textChanged.connect(self.search_for_structure)
+ layout.addWidget(self.search_bar)
+ self.search_bar.setFocus()
- # main layout
- layout = QtWidgets.QVBoxLayout()
- layout.addWidget(self.tabWidget)
+ # ctrl+f action
+ saction = QtWidgets.QAction(self)
+ saction.setShortcut(QtGui.QKeySequence.Find)
+ saction.triggered.connect(self.search_for)
+ self.addAction(saction)
+
+ # tabs
+ self.tabs = QtWidgets.QTabWidget(self)
+ self.tabs.setMovable(False)
+ self.tabs.setTabsClosable(False)
+ self.tab0 = BuilderFromPtrTab(self.mba, ea, hint, self, self.tabs)
+ self.tabs.addTab(self.tab0, "From pointer")
+ # self.tab1 = BuilderFromStkTab(self.mba.entry_ea, self, self.tabs)
+ # self.tabs.addTab(self.tab1, "From stack")
+ layout.addWidget(self.tabs)
+ layout.setStretch(3, 2)
+
+ # Cancel & Propagate buttons
+ lbottom = QtWidgets.QGridLayout()
+ cancel_btn = QtWidgets.QPushButton("Cancel", self)
+ cancel_btn.clicked.connect(self.reject)
+ ok_btn = QtWidgets.QPushButton("Propagate", self)
+ ok_btn.setDefault(True)
+ ok_btn.clicked.connect(self.execute)
+ lbottom.addWidget(cancel_btn, 0, 0)
+ lbottom.addWidget(ok_btn, 0, 1)
+
+ layout.addLayout(lbottom)
+ layout.setAlignment(lbottom, QtCore.Qt.AlignBottom)
self.setLayout(layout)
# window's properties
self.setWindowTitle(WINDOW_TITLE)
- self.setWhatsThis("You may use this form to automatically rebuild structures using Symless")
# window's icon
icon = QtGui.QIcon(os.path.join(os.path.abspath(symless.__path__[0]), "resources", "champi.png"))
self.setWindowIcon(icon)
- # set focused widget
- f_tab.give_focus()
+ # closing handler
+ self.finished.connect(self.on_finish)
+
+ def get_error(self) -> Optional[str]:
+ struc = self.struct_selector.currentItem()
- # execute form action
- def execute(self, form: BuilderTabBase):
- self.dive = form.get_dive()
- self.shift = form.get_shift()
- self.struc = form.get_struc()
+ if isinstance(struc, StrucSelDefaultItem) and len(self.search_bar.text()) == 0:
+ return "Please provide a name for the new structure."
- if self.shift < 0 or self.struc == idaapi.BADADDR:
- self.reject()
+ return None
+
+ # 'propagate' was clicked
+ def execute(self):
+ tab = self.tabs.currentWidget()
+ error = self.get_error() or tab.get_error()
+
+ if error is not None:
+ idaapi.warning(error)
+ return
self.accept()
+ # search for structure in list
+ def search_for_structure(self, key: str):
+ lkey = key.lower()
+ for i in range(self.struct_selector.count()):
+ current = self.struct_selector.item(i)
+ current.setHidden(isinstance(current, StrucSelItem) and lkey not in current.text().lower())
+
+ # Ctrl+F action
+ def search_for(self):
+ self.search_bar.setFocus()
+
+ # get the name of the structure selected by the user
+ def get_structure(self) -> str:
+ selected = self.struct_selector.currentItem()
+ if isinstance(selected, StrucSelDefaultItem):
+ return self.search_bar.text()
+
+ return selected.get_name()
+
+ # spread in callees checked by user
+ def get_dive(self) -> bool:
+ return self.tabs.currentWidget().chk.isChecked()
+
+ # struc ptr shift specified by user
+ def get_shift(self) -> int:
+ tab = self.tabs.currentWidget()
+ return tab.get_shift() if isinstance(tab, BuilderFromPtrTab) else 0
+
+ # get the microcode variable the user selected as entry point
+ def get_entry_variable(self) -> Optional[ida_hexrays.mop_t]:
+ tab = self.tabs.currentWidget()
+ return tab.get_entry_variable()
+
+ # ea of the selected entry point
+ def get_entry_ea(self) -> Tuple[int, int]:
+ tab = self.tabs.currentWidget()
+ return (
+ (tab.microcodeViewer.chosen.ea, tab.microcodeViewer.chosen.sub_idx)
+ if isinstance(tab, BuilderFromPtrTab)
+ else (self.mba.entry_ea, 0)
+ )
+
+ # selected entry operand is a destination operand
+ def entry_is_dst_op(self) -> bool:
+ tab = self.tabs.currentWidget()
+ return tab.microcodeViewer.chosen.as_dst if isinstance(tab, BuilderFromPtrTab) else False
+
+ # close custom viewers when the window is closed
+ # otherwise they will haunt IDA forever
+ def on_finish(self, result):
+ # TODO this does not seem to work
+ # old views still appear in "Synchronize with"
+ self.tab0.microcodeViewer.Close()
+ # self.tab1.stack.Close()
+
# Hook to attach new action to popup menu
class PopUpHook(idaapi.UI_Hooks):
- def __init__(self):
- idaapi.UI_Hooks.__init__(self)
+ loaded = False
+
+ # triggered when all UI elements have been initialized
+ def ready_to_run(self):
+ self.init_action()
+
+ def init_action(self):
+ if self.loaded:
+ return
+
+ # check that the decompiler exists
+ if not idaapi.init_hexrays_plugin():
+ utils.g_logger.error("You do not have the decompiler for this architecture; symless will not load")
+ self.unhook()
+ return
icon_path = os.path.join(utils.get_resources_path(), "propag.png")
self.icon = idaapi.load_custom_icon(icon_path)
@@ -347,99 +791,128 @@ def __init__(self):
"Propagate structure",
BuildHandler(),
"Shift+t",
- "Automatic t-t-t",
+ "Build structure from selected variable",
self.icon,
idaapi.ADF_OWN_HANDLER,
)
idaapi.register_action(self.action)
+ self.loaded = True
def term(self):
idaapi.unregister_action(self.action.name)
idaapi.free_custom_icon(self.icon)
- # right click menu popup
- def finish_populating_widget_popup(self, widget, popup, ctx):
- # window is DISASM & no selection
- if idaapi.get_widget_type(widget) != idaapi.BWN_DISASM or (ctx.cur_flags & idaapi.ACF_HAS_SELECTION) != 0:
+ # triggered on right click menu popup
+ def finish_populating_widget_popup(self, widget, popup, ctx: idaapi.action_ctx_base_t):
+ # window is (DISASM or PSEUDOCODE) & no selection
+ if ctx.widget_type not in (idaapi.BWN_DISASM, idaapi.BWN_PSEUDOCODE) or ctx.has_flag(idaapi.ACF_HAS_SELECTION):
return
- current_ea = idaapi.get_screen_ea()
- current_op = idaapi.get_opnum()
+ # we are inside a function
+ if ctx.cur_func is None:
+ return
- # install the action if target is a register or a call insn
- if operand_is_reg(current_ea, current_op) or insn_is_call(current_ea):
- idaapi.attach_action_to_popup(widget, popup, self.action.name)
+ idaapi.attach_action_to_popup(widget, popup, self.action.name)
# context menu structure builder action
class BuildHandler(idaapi.action_handler_t):
- def activate(self, ctx) -> int:
- current_ea = ctx.cur_ea
- reg_id, op, nb_ops = target_op_reg(current_ea, idaapi.get_opnum())
-
- # selection is a register as an instruction's operand
- if reg_id >= 0:
- dst_op = op.n == 0 and nb_ops != 1
-
- # selection is a call instruction
- elif insn_is_call(current_ea):
- reg_id = 0 # rax, return of malloc..
- dst_op = True
+ def activate(self, ctx: idaapi.action_ctx_base_t) -> int:
+ hint_mreg = ida_hexrays.mr_none
+ hint_size = 0
- # should not happen
- else:
- return 0
-
- # arch supported
if not arch.is_arch_supported():
utils.g_logger.error("Unsupported arch (%s) or filetype" % arch.get_proc_name())
return 0
- # convert to full register
- if reg_id in ida_utils.X64_REG_ALIASES:
- reg_id = ida_utils.X64_REG_ALIASES[reg_id]
+ mba = ida_utils.get_func_microcode(ctx.cur_func, True)
+ if not mba:
+ utils.g_logger.error(f"Could not generate microcode for function {ctx.cur_func.start_ea:#x}")
+ return 0
+
+ # guess the micro operand associated with the user selection
+ if ctx.widget_type == idaapi.BWN_DISASM:
+ hint_mreg, hint_size = self.guess_selected_mop_from_assembly()
+ else: # idaapi.BWN_PSEUDOCODE
+ # TODO implement guessing for pseudocode view
+ pass
# display plugin's UI
- reg = cpustate.reg_string(reg_id)
- form = BuilderMainWid()
- form.exec()
+ form = BuilderMainWid(mba, ctx.cur_ea, (hint_mreg, hint_size))
+ code = form.exec()
- # close if form cancel button was hit
- code = form.result()
+ # cancel button was hit
if code == QtWidgets.QDialog.Rejected:
return 0
- # build existing structure
- propagate_structure(current_ea, reg, dst_op, form.struc, form.shift, form.dive)
+ propagate_structure(
+ form.get_entry_ea(),
+ form.get_entry_variable(),
+ form.entry_is_dst_op(),
+ form.get_structure(),
+ form.get_shift(),
+ form.get_dive(),
+ )
- return 0
+ return 1 # all IDA windows will be refreshed
def update(self, ctx):
return idaapi.AST_ENABLE_ALWAYS
+ # use the currently selected assembly operand to guess the corresponding microcode operand
+ # returns (mreg_t, size)
+ def guess_selected_mop_from_assembly(self) -> Tuple[int, int]:
+ mreg = ida_hexrays.mr_none
+ mreg_size = 0
+
+ cur_ea = idaapi.get_screen_ea() # current address
+ cur_op = idaapi.get_opnum() # current op idx
+ cur_insn = idaapi.insn_t() # current instruction
+ insn_len = idaapi.decode_insn(cur_insn, cur_ea)
+
+ if insn_len == 0 or cur_op < 0 or cur_op >= ida_utils.get_len_insn_ops(cur_insn):
+ return (mreg, mreg_size)
+
+ op = cur_insn.ops[cur_op]
+ if op.type == idaapi.o_reg:
+ mreg = ida_hexrays.reg2mreg(op.reg) # mr_none if none
+ mreg_size = idaapi.get_dtype_size(op.dtype)
-# propagate & build an existing structure
-# from given ea an reg (register)
-# for given struc (sid or name), shift and dive (should follow callees) option
-# dst_op: is selected register a src or dst operand
-def propagate_structure(ea: int, reg: str, dst_op: bool, struc: Union[int, str], shift: int, dive: bool):
+ elif op.type in [idaapi.o_phrase, idaapi.o_displ]:
+ mreg = ida_hexrays.reg2mreg(op.phrase)
+ mreg_size = ida_utils.get_ptr_size()
+
+ if mreg_size:
+ utils.g_logger.debug(f"Guess for target mreg: {ida_hexrays.get_mreg_name(mreg, mreg_size)}")
+
+ return (mreg, mreg_size)
+
+
+# do the propagation & build the structure
+def propagate_structure(
+ ea_couple: Tuple[int, int], mop: ida_hexrays.mop_t, dst_op: bool, strucname: str, shift: int, dive: bool
+):
idaapi.show_wait_box("HIDECANCEL\nPropagating struct info..")
+ ea, subea = ea_couple
+
try:
# get containing function
fct = idaapi.get_func(ea)
- # define entry for selected register
- entries = model.entry_record_t()
- entry_before = model.src_reg_entry_t(ea, fct.start_ea, reg)
- entry_before.struc_shift = shift # set right shift on associated structure
- entries.add_entry(entry_before, True)
-
- # hack: if reg is on dst operand, create both inject_before and inject_after entries
+ # entry is to be injected after minsn is processed
if dst_op:
- entry_after = model.dst_reg_entry_t(ea, fct.start_ea, reg)
- entry_after.struc_shift = shift
- entries.add_entry(entry_after, True, False)
+ entry = model.dst_var_entry_t(ea, subea, fct.start_ea, mop)
+
+ # entry is to be injected before
+ else:
+ entry = model.src_reg_entry_t(ea, subea, fct.start_ea, mop)
+
+ entry.struc_shift = shift # shift for associated structure
+
+ # set root entries
+ entries = model.entry_record_t()
+ entries.add_entry(entry, True)
# build entrypoints graph
ctx = model.context_t(entries, set())
@@ -450,21 +923,24 @@ def propagate_structure(ea: int, reg: str, dst_op: bool, struc: Union[int, str],
strucs = structures.define_structures(ctx)
# associate generated model with chosen structure
- _, struc_model = entry_before.get_structure()
-
- # struc is a structure id
- if isinstance(struc, int):
- struc_model.set_existing(struc)
+ _, struc_model = entry.get_structure()
+ if struc_model is None:
+ pass # previous steps have failed (hopefully with an error msg)
- # struc is a structure name
else:
- struc_model.set_name(struc)
+ struc_model.set_name(strucname)
+ struc_model.force_generation = True # generate even if empty
- # import structures into IDA
- generate.import_structures(strucs)
+ # make sure not to reduce existing structure size by removing padding
+ struc = ida_utils.get_local_type(strucname)
+ if struc and not struc.is_forward_decl() and struc.get_size() > struc_model.get_size():
+ struc_model.set_size(struc.get_size())
- # type operands with structures
- generate.import_context(ctx)
+ # import structures into IDA
+ generate.import_structures(strucs)
+
+ # type operands with structures
+ generate.import_context(ctx)
except Exception as e:
import traceback
@@ -472,37 +948,7 @@ def propagate_structure(ea: int, reg: str, dst_op: bool, struc: Union[int, str],
utils.g_logger.critical(repr(e) + "\n" + traceback.format_exc())
finally:
- idaapi.hide_wait_box()
+ # no need to keep all mbas
+ ida_utils.g_microcode_cache.clear()
-
-# get the register at given adress & operand
-# returns (reg id, operand, nb operands)
-def target_op_reg(ea: int, op_num: int) -> Tuple[int, Optional[idaapi.op_t], int]:
- insn = idaapi.insn_t()
- insn_len = idaapi.decode_insn(insn, ea)
- nb_ops = ida_utils.get_len_insn_ops(insn)
-
- if insn_len == 0 or op_num < 0 or op_num >= nb_ops:
- return -1, None, 0
-
- op = insn.ops[op_num]
- if op.type == idaapi.o_reg:
- return op.reg, op, nb_ops
-
- if op.type in [idaapi.o_phrase, idaapi.o_displ]:
- return cpustate.x64_base_reg(insn, op), op, nb_ops
-
- return -1, None, 0
-
-
-# is given operand a register
-def operand_is_reg(ea: int, op_num: int) -> bool:
- reg_id, _, _ = target_op_reg(ea, op_num)
- return reg_id >= 0
-
-
-# is given instruction a call
-def insn_is_call(ea: int) -> bool:
- insn = idaapi.insn_t()
- idaapi.decode_insn(insn, ea)
- return insn.itype in cpustate.INSN_CALLS
+ idaapi.hide_wait_box()
diff --git a/symless/resources/bigger_champi.png b/symless/resources/bigger_champi.png
index 57ea259..2ad152b 100644
Binary files a/symless/resources/bigger_champi.png and b/symless/resources/bigger_champi.png differ
diff --git a/symless/resources/cross.png b/symless/resources/cross.png
new file mode 100644
index 0000000..94e04d8
Binary files /dev/null and b/symless/resources/cross.png differ
diff --git a/symless/symbols/__init__.py b/symless/symbols/__init__.py
index 1cd3130..15e4e97 100644
--- a/symless/symbols/__init__.py
+++ b/symless/symbols/__init__.py
@@ -3,7 +3,6 @@
import idaapi
-import symless.cpustate.arch as arch
import symless.utils.ida_utils as ida_utils
import symless.utils.utils as utils
@@ -22,7 +21,7 @@
)
# invalid method field names exps & replacements
-re_invalid_method_name = ((re.compile(r"[\s]+"), "_"), (re.compile(r"[^0-9a-zA-Z_]"), ""))
+re_invalid_method_name = ((re.compile(r"[\s]+"), "_"), (re.compile(r"[^0-9a-zA-Z~_-]"), ""))
# full method name from method signature
@@ -84,9 +83,12 @@ def get_vtable_name_from_ctor(vtable_ea: int) -> Optional[str]:
# get child & parent classes names from vtable symbol
def get_classnames_from_vtable(vtable_ea: int) -> Tuple[Optional[str], Optional[str]]:
- if arch.is_elf():
- return get_classnames_from_vtable_gcc(vtable_ea)
- return get_classnames_from_vtable_msvc(vtable_ea)
+ for f in (get_classnames_from_vtable_gcc, get_classnames_from_vtable_msvc):
+ derived, parent = f(vtable_ea)
+ if derived is not None:
+ return derived, parent
+
+ return None, None
# get child & parent classes names from vtable symbol for gcc compiled binaries
diff --git a/symless/symbols/rename.py b/symless/symbols/rename.py
index 513f65b..80c5c26 100644
--- a/symless/symbols/rename.py
+++ b/symless/symbols/rename.py
@@ -26,7 +26,7 @@ def find_structure_name(struc: generation.structure_t) -> Collection[str]:
# loop over all nodes associated to the structure
# get names from the associated nodes
- for root, shift, block in struc.node_flow():
+ for root, block, shift in struc.node_flow(False):
if root != current_root:
current_root, depth = root, 0
@@ -61,7 +61,7 @@ def find_structure_name(struc: generation.structure_t) -> Collection[str]:
def define_structures_names(record: generation.structure_record_t):
all_names = set() # all given names record
- for struc in record.get_structures():
+ for struc in record.get_structures(include_discarded=False):
# define structure's fields names
struc.compute_names()
diff --git a/symless/utils/ida_utils.py b/symless/utils/ida_utils.py
index 7e86b71..13a205c 100644
--- a/symless/utils/ida_utils.py
+++ b/symless/utils/ida_utils.py
@@ -1,29 +1,14 @@
-from typing import List, Optional, Tuple
+from collections import deque
+from typing import Generator, List, Optional, Tuple
+import ida_hexrays
import idaapi
import idautils
import idc
-import symless.cpustate.cpustate as cpustate
import symless.symbols as symbols
import symless.utils.utils as utils
-# alias small registers on full-width registers
-X64_REG_ALIASES = {
- 16: 0, # al -> rax
- 17: 1, # cl -> rcx
- 18: 2, # dl -> rdx
- 19: 3, # bl -> rbx
- 20: 0, # ah -> rax
- 21: 1, # ch -> rcx
- 22: 2, # dh -> rdx
- 23: 3, # bh -> rbx
- 25: 5, # bpl -> rbp
- 26: 6, # sil -> rsi
- 27: 7, # dil -> rdi
-}
-
-
""" Imports utilities """
@@ -53,9 +38,6 @@ def iterator(ea, name, ord):
def demangle(name: str, inf_attr=idc.INF_SHORT_DN) -> str:
- if not name:
- return name
-
demangled = idaapi.demangle_name(name, idc.get_inf_attr(inf_attr))
if demangled:
return demangled
@@ -101,8 +83,15 @@ def get_all_references(address: int) -> set:
""" Pointers utilities """
-def get_ptr_size():
- return 8 if idaapi.get_inf_structure().is_64bit() else 4
+g_ptr_size = None
+
+
+def get_ptr_size() -> int:
+ global g_ptr_size
+ g_ptr_size = (
+ g_ptr_size if g_ptr_size else (8 if idaapi.inf_is_64bit() else (4 if idaapi.inf_is_32bit_or_higher() else 2))
+ )
+ return g_ptr_size
def __dereference_pointer(addr: int, ptr_size: int) -> int:
@@ -113,14 +102,6 @@ def dereference_pointer(addr: int) -> int:
return __dereference_pointer(addr, get_ptr_size())
-def dereference_function_ptr(addr: int, ptr_size: int) -> bool:
- fea = __dereference_pointer(addr, ptr_size)
- func = idaapi.get_func(fea)
- if func is None or func.start_ea != fea: # addr is a function entry point
- return None
- return fea
-
-
# get size bytes from given ea, if ea is initialized with a value
def get_nb_bytes(ea: int, size: int) -> int:
if not idaapi.is_loaded(ea):
@@ -136,171 +117,6 @@ def get_nb_bytes(ea: int, size: int) -> int:
return idaapi.get_byte(ea)
-# return true if data at given ea & size has a value
-def is_data_initialized(ea: int, size: int) -> bool:
- # assume there can not be uninitialized bytes between data start & end
- return idaapi.is_loaded(ea) and idaapi.is_loaded(ea + size - 1)
-
-
-""" Vftable utilities """
-
-
-# can instruction at given ea load a vtable
-def is_vtable_load(ea: int) -> bool:
- if idaapi.get_func(ea) is None:
- return False
-
- insn = idaapi.insn_t()
- if idaapi.decode_insn(insn, ea) == 0:
- return False
-
- if insn.itype not in [idaapi.NN_lea, idaapi.NN_mov] or insn.ops[0].type not in (
- idaapi.o_reg,
- idaapi.o_phrase,
- idaapi.o_displ,
- ):
- return False
-
- # type 1: lea/mov rax, vtbl
- # type 2: lea/mov rax, [eax + vtbl_offset] (PIE case)
- return insn.ops[1].type in [idaapi.o_mem, idaapi.o_displ, idaapi.o_imm]
-
-
-# is vtable loaded at addr load stored later in a struct disp
-# returns the stored value if it is the case
-# TODO: miss mv [rax + rcx*2 + 16], rbx, even if we won't use it
-def is_vtable_stored(load: int, loaded: int) -> int:
- # following is: mov [rcx + n], rax
- bb = get_bb(load)
- if bb is None:
- return idaapi.BADADDR
-
- bb.start_ea = load
-
- state = cpustate.state_t()
- state.reset_arguments(cpustate.get_abi())
-
- insn = idaapi.insn_t()
- ea = bb.start_ea
-
- while cpustate.next_instruction(ea, bb, insn):
- cpustate.process_instruction(state, insn)
-
- if len(state.writes) > 0 and isinstance(state.writes[0].src, cpustate.mem_t):
- actual_loaded = state.writes[0].src.addr
- if loaded == actual_loaded:
- return state.writes[0].src.get_val()
-
- ea += insn.size
-
- return idaapi.BADADDR
-
-
-# is given ea a vtable or a vtable ref (.got)
-# returns effective vtable address
-def is_vtable_start(ea: int) -> int:
- if not idaapi.is_loaded(ea):
- return idaapi.BADADDR
-
- for xref in get_data_references(ea):
- # code loads the ea into a register
- if not is_vtable_load(xref):
- return idaapi.BADADDR
-
- # value from ea is stored into a struct
- stored_value = is_vtable_stored(xref, ea)
- if stored_value == idaapi.BADADDR:
- continue # continue because we miss the "mov [rax + rcx*n], vtbl" instructions
-
- # stored addr points to a functions ptrs array
- if vtable_size(stored_value) == 0:
- return idaapi.BADADDR
-
- utils.g_logger.debug(f"0x{ea:x} is a vtable / vtable ref for vtable 0x{stored_value:x}")
- return stored_value
-
- return idaapi.BADADDR
-
-
-# Returns function ea if function at given addr is in vtable, None otherwise
-def is_in_vtable(start_addr: int, addr: int, ptr_size: int):
- fea = dereference_function_ptr(addr, ptr_size)
- if fea is None:
- return None
-
- if addr == start_addr:
- return fea
-
- if (
- idaapi.get_first_dref_to(addr) != idaapi.BADADDR or idaapi.get_first_cref_to(addr) != idaapi.BADADDR
- ): # data is referenced, not part of the vtable
- return None
-
- return fea
-
-
-# yield all members of given vtable
-def vtable_members(addr: int):
- ptr_size = get_ptr_size()
-
- current = addr
- fea = is_in_vtable(addr, current, ptr_size)
- while fea is not None:
- yield fea
- current += ptr_size
- fea = is_in_vtable(addr, current, ptr_size)
-
-
-def vtable_size(addr: int) -> int:
- vtbl = [fea for fea in vtable_members(addr)]
- return len(vtbl) * get_ptr_size()
-
-
-# scans given segment for vtables
-# WARN: will not return vtables only used at virtual bases (vbase)
-def get_all_vtables_in(seg: idaapi.segment_t):
- utils.g_logger.info(
- "scanning segment %s[%x, %x] for vtables" % (idaapi.get_segm_name(seg), seg.start_ea, seg.end_ea)
- )
-
- current = seg.start_ea
- while current != idaapi.BADADDR and current < seg.end_ea:
- # do not cross functions
- chunk = idaapi.get_fchunk(current)
- if chunk is not None:
- current = chunk.end_ea
- continue
-
- # references a vtable ?
- effective_vtable = is_vtable_start(current)
- if effective_vtable != idaapi.BADADDR:
- utils.g_logger.info(f"vtable found at 0x{effective_vtable:x}")
- yield (current, effective_vtable)
-
- current = idaapi.next_head(current, seg.end_ea)
-
-
-# scans code segments for vtables
-def get_all_vtables():
- seg = idaapi.get_first_seg()
- while seg is not None:
- # search for vtables in .data and .text segments
- if seg.type == idaapi.SEG_CODE or seg.type == idaapi.SEG_DATA:
- for i in get_all_vtables_in(seg):
- yield i
-
- seg = idaapi.get_next_seg(seg.start_ea)
-
-
-# vtable ea from already existing vtable struc
-def get_vtable_ea(vtable: idaapi.struc_t) -> Tuple[int, str]:
- name = idaapi.get_struc_name(vtable.id)
- if not name.endswith(idaapi.VTBL_SUFFIX):
- return idaapi.BADADDR, name
-
- return idaapi.get_first_dref_to(vtable.id), name
-
-
""" Type utilities """
@@ -326,14 +142,20 @@ def get_local_type(name: str) -> Optional[idaapi.tinfo_t]:
return None
-# tinfo to struc sid, by name correspondance
-def struc_from_tinfo(tinfo: idaapi.tinfo_t) -> int:
- return idaapi.get_struc_id(tinfo.get_type_name())
+# convert a local variable forward ref into a real struct
+def replace_forward_ref(tif: idaapi.tinfo_t):
+ ord, tname = tif.get_ordinal(), tif.get_type_name()
+ mudt = idaapi.udt_type_data_t()
+ tif.create_udt(mudt)
+ err = tif.set_numbered_type(None, ord, idaapi.NTF_REPLACE)
+ if err != idaapi.TERR_OK:
+ utils.g_logger.error(f'Could not convert forward ref to "{tname}": {idaapi.tinfo_errstr(err)} ({err})')
-# struc sid to tinfo
-def tinfo_from_stuc(sid: int) -> Optional[idaapi.tinfo_t]:
- return get_local_type(idaapi.get_struc_name(sid))
+# a wrapper around find_udm that returns BADADDR instead of -1
+def find_udm_wrap(struc: idaapi.tinfo_t, udm: idaapi.udm_t) -> int:
+ rc = struc.find_udm(udm, idaapi.STRMEM_OFFSET)
+ return idaapi.BADADDR if rc in (-1, idaapi.BADADDR) else rc
""" Function utilities """
@@ -386,7 +208,7 @@ def set_function_argument(
# creates a new valid func_type_data_t object
-def new_func_data(cc: int = idaapi.CM_CC_UNKNOWN) -> idaapi.func_type_data_t:
+def new_func_data() -> idaapi.func_type_data_t:
func_data = idaapi.func_type_data_t()
# ret type to void
@@ -394,85 +216,220 @@ def new_func_data(cc: int = idaapi.CM_CC_UNKNOWN) -> idaapi.func_type_data_t:
ret_tinfo.create_simple_type(idaapi.BT_VOID)
func_data.rettype = ret_tinfo
- # calling convention
- func_data.cc = cc
+ # fastcall as default cc
+ func_data.cc = idaapi.CM_CC_FASTCALL
return func_data
+# get the tinfo for a given function
+def get_fct_type(fea: int, force_decompile: bool = False) -> Optional[idaapi.tinfo_t]:
+ tinfo = idaapi.tinfo_t()
+ hf = ida_hexrays.hexrays_failure_t()
+
+ if (not force_decompile) and idaapi.get_tinfo(tinfo, fea):
+ return tinfo
+
+ utils.g_logger.info(f"Forcing decompilation of fct {fea:#x}")
+ cfunc = ida_hexrays.decompile_func(idaapi.get_func(fea), hf, ida_hexrays.DECOMP_NO_WAIT)
+ if cfunc is None:
+ utils.g_logger.warning(f"Could not decompile fct {fea:#x}: {hf.str} ({hf.code})")
+ return None
+
+ if cfunc.get_func_type(tinfo):
+ return tinfo
+
+ utils.g_logger.error(f"No tinfo_t for fea {fea:#x}")
+ return None
+
+
# get function type, create default one if none
-def get_or_create_fct_type(fea: int, default_cc: int) -> Tuple[idaapi.tinfo_t, idaapi.func_type_data_t]:
- func_tinfo = idaapi.tinfo_t()
+def get_or_create_fct_type(fea: int) -> Tuple[idaapi.tinfo_t, idaapi.func_type_data_t]:
+ func_tinfo = get_fct_type(fea)
+ if func_tinfo is None:
+ return (idaapi.tinfo_t(), new_func_data())
+
func_data = idaapi.func_type_data_t()
+ if func_tinfo.get_func_details(func_data):
+ return (func_tinfo, func_data)
+
+ return (func_tinfo, new_func_data())
+
+
+# yields one instruction operands
+def get_insn_ops(insn: idaapi.insn_t) -> Generator[idaapi.op_t, None, None]:
+ i = 0
+ while i < idaapi.UA_MAXOP and insn.ops[i].type != idaapi.o_void:
+ yield insn.ops[i]
+ i += 1
+
+
+# get instruction's operands count
+def get_len_insn_ops(insn: idaapi.insn_t) -> int:
+ return len([i for i in get_insn_ops(insn)])
+
+
+# cache our mbas with their special kregs
+g_microcode_cache = dict()
+
- if idaapi.get_tinfo(func_tinfo, fea):
- # unable to retrieve func_data on __high fcts, maybe try get_func_details(func_data, GTD_NO_ARGLOCS) ?
- if not func_tinfo.get_func_details(func_data):
- return (idaapi.tinfo_t(), new_func_data(default_cc))
+# get the microcode for a given function
+def get_func_microcode(func: idaapi.func_t, analyze_calls: bool = False) -> Optional[ida_hexrays.mba_t]:
+ global g_microcode_cache
+ if func.start_ea in g_microcode_cache:
+ return g_microcode_cache[func.start_ea]
+
+ # generate the function microcode
+ mbr = ida_hexrays.mba_ranges_t(func)
+ hf = ida_hexrays.hexrays_failure_t()
+ mba: ida_hexrays.mba_t = ida_hexrays.gen_microcode(
+ mbr, hf, None, ida_hexrays.DECOMP_NO_WAIT, ida_hexrays.MMAT_PREOPTIMIZED
+ )
+
+ if not mba:
+ utils.g_logger.error(f"Could not generate mba for fct {func.start_ea:#x}: {hf.str} ({hf.code})")
+ return None
+
+ # build cfg and define blocks relations
+ mba.build_graph()
+
+ # resolve calls arguments and returns
+ if analyze_calls:
+ mba.analyze_calls(ida_hexrays.ACFL_GUESS)
+
+ # only cache mba used by cpustate (without initial call analysis)
else:
- utils.g_logger.warning(f"Could not retrieve tinfo_t for function 0x{fea:x}, trying decompile_func..")
+ g_microcode_cache[mba.entry_ea] = mba
- # call decompiler to get more info
- try:
- import ida_hexrays
+ # allocate special kregs
+ # used to pass result of inline minsns to parent minsn
+ setattr(mba, "tmp_result_kregs", deque())
+ for _ in range(8):
+ mba.tmp_result_kregs.append(mba.alloc_kreg(get_ptr_size()))
+ setattr(mba, "call_result_kreg", mba.alloc_kreg(get_ptr_size()))
- cfunc = ida_hexrays.decompile_func(idaapi.get_func(fea), ida_hexrays.hexrays_failure_t(), 0)
- if cfunc.__deref__() is not None and cfunc.get_func_type(func_tinfo):
- func_tinfo.get_func_details(func_data)
- return (func_tinfo, func_data)
+ return mba
- except ImportError:
- pass
- utils.g_logger.warning(f"Could not retrieve tinfo_t for function 0x{fea:x} from decompiling")
- func_data = new_func_data(default_cc)
+
+
+# analyze calls of the given mba, making sure each call has the correct argument count
+def mba_analyze_calls(mba: ida_hexrays.mba_t):
+ global g_microcode_cache
- return (func_tinfo, func_data)
+ if mba.callinfo_built(): # already done
+ return
+
+ hf = ida_hexrays.hexrays_failure_t()
+
+ # find all calls, decompile callees for accurate arguments count
+ for i in range(mba.qty):
+ mblock = mba.get_mblock(i)
+ minsn = mblock.head
+
+ while minsn:
+ if minsn.is_unknown_call() and minsn.l.t == ida_hexrays.mop_v:
+ fct = idaapi.get_func(minsn.l.g)
+ if fct:
+ utils.g_logger.info(f"decompiling callee {fct.start_ea:#x} for accurate call info")
+ ida_hexrays.decompile_func(fct, hf, ida_hexrays.DECOMP_NO_WAIT)
+ minsn = minsn.next
-# get basic block containing ea
-def get_bb(ea: int) -> idaapi.range_t:
- func = idaapi.get_func(ea)
- if func is None:
+ # resolve call arguments (and ret) in mba
+ mba.analyze_calls(ida_hexrays.ACFL_GUESS)
+
+
+# get the microcode block containing the specified ea
+def get_block_microcode(fct: idaapi.func_t, ea: int) -> Optional[Tuple[ida_hexrays.mblock_t, ida_hexrays.mba_t]]:
+ # function microcode
+ mba = get_func_microcode(fct)
+ if not mba:
return None
- flow = idaapi.qflow_chart_t()
- flow.create("", func, func.start_ea, func.end_ea, idaapi.FC_NOEXT)
- for i in range(flow.size()):
- if ea >= flow[i].start_ea and ea < flow[i].end_ea:
- return idaapi.range_t(flow[i].start_ea, flow[i].end_ea)
+ # return containing block
+ try:
+ return (
+ next(
+ filter(
+ lambda b: (not (b.flags & ida_hexrays.MBL_FAKE)) and b.start <= ea < b.end,
+ [mba.get_mblock(i) for i in range(mba.qty)],
+ )
+ ),
+ mba,
+ )
+ except StopIteration: # yes this can happen
+ return None
- return None
+
+
+# lift the instruction at ea into a micro instruction
+# returns a MMAT_PREOPTIMIZED minsn
+def get_ins_microcode(ea: int) -> Optional[ida_hexrays.minsn_t]:
+ insn = idaapi.insn_t()
+ rvec = idaapi.rangevec_t()
+ hf = ida_hexrays.hexrays_failure_t()
-# get instruction's operands + convert registers (al -> rax)
-def get_insn_ops(insn: idaapi.insn_t) -> List[idaapi.op_t]:
- ops = list()
- for index_op in range(get_len_insn_ops(insn)):
- op = insn.ops[index_op]
+ # get the instruction size
+ insn_size = idaapi.decode_insn(insn, ea)
+ if not insn_size:
+ utils.g_logger.warning(f"no instruction found at {ea:#x}")
+ return None
- if op.reg in X64_REG_ALIASES:
- op.reg = X64_REG_ALIASES[op.reg]
- ops.append(op)
- return ops
+ # generate the mba
+ rvec.push_back(idaapi.range_t(insn.ea, insn.ea + insn_size))
+ mbr = ida_hexrays.mba_ranges_t(rvec)
+ mba = ida_hexrays.gen_microcode(
+ mbr, hf, None, ida_hexrays.DECOMP_NO_WAIT | ida_hexrays.DECOMP_NO_FRAME, ida_hexrays.MMAT_PREOPTIMIZED
+ )
+ if not mba:
+ utils.g_logger.error(f"Could not get minsn for ea {ea:#x}: {hf.str} ({hf.code})")
+ return None
+ # find the minsn in the mba
+ # it should be the first instruction of the first non-fake block
+ minsn = next(
+ filter(lambda b: not (b.flags & ida_hexrays.MBL_FAKE), [mba.get_mblock(i) for i in range(mba.qty)])
+ ).head
-# get instruction's operands count
-def get_len_insn_ops(insn: idaapi.insn_t) -> int:
- res = 0
- for op in insn.ops:
- if op.type == idaapi.o_void:
- break
- res += 1
- return res
+ return ida_hexrays.minsn_t(minsn) # original is freed with mba
""" Misc """
-# does IDA support structures folders
-def can_create_folder() -> bool:
+# get root folder for local types, if supported
+def get_local_types_folder() -> Optional[idaapi.dirtree_t]:
try:
- return idaapi.get_std_dirtree is not None
+ return idaapi.get_std_dirtree(idaapi.DIRTREE_LOCAL_TYPES)
except AttributeError:
- return False
+ return None
+
+
+# mcode_t values to microinstruction names
+g_mcode_name = {
+ getattr(ida_hexrays, mcode): mcode[2:] for mcode in filter(lambda y: y.startswith("m_"), dir(ida_hexrays))
+}
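The reverse lookup used to build `g_mcode_name` (filtering a module's `m_*` attributes with `dir()` and inverting value → name) can be sketched on a plain namespace, with no dependency on ida_hexrays — `mod` below is a hypothetical stand-in module:

```python
import types

# hypothetical stand-in for a module exposing opcode constants like ida_hexrays' m_*
mod = types.SimpleNamespace(m_mov=1, m_add=2, unrelated=3)

# same comprehension as g_mcode_name: constant value -> name with the "m_" prefix stripped
name_map = {getattr(mod, n): n[2:] for n in filter(lambda y: y.startswith("m_"), dir(mod))}
```

Attributes not starting with `m_` (and dunders returned by `dir()`) are filtered out, so `name_map` only covers the opcode constants.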
+
+
+# mopt_t values to operand type as str
+g_mopt_name = [
+ "mop_z",
+ "mop_r",
+ "mop_n",
+ "mop_str",
+ "mop_d",
+ "mop_S",
+ "mop_v",
+ "mop_b",
+ "mop_f",
+ "mop_l",
+ "mop_a",
+ "mop_h",
+ "mop_c",
+ "mop_fn",
+ "mop_p",
+ "mop_sc",
+]
+
+
+# get instruction + operands representation
+def insn_str_full(insn: ida_hexrays.minsn_t) -> str:
+ return f"[{insn.dstr()}] = [{g_mcode_name[insn.opcode]} {', '.join([g_mopt_name[i.t] for i in (insn.l, insn.r, insn.d)])}]"
diff --git a/symless/utils/utils.py b/symless/utils/utils.py
index dfe779e..c4e2094 100644
--- a/symless/utils/utils.py
+++ b/symless/utils/utils.py
@@ -48,3 +48,12 @@ def print_delay(prefix: str, start: float, end: float):
min = int(delay / 60)
sec = delay - (min * 60)
g_logger.info("%s in %s%s" % (prefix, "%d minutes and " % min if min > 0 else "", "%d seconds" % sec))
+
+
+# convert integer to given sign & size
+def to_c_integer(value: int, sizeof: int, signed: bool = True) -> int:
+ mask = 1 << (sizeof * 8)
+ out = value & (mask - 1)
+ if signed and (out & (mask >> 1)):
+ out -= mask
+ return out
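Since `to_c_integer` is pure Python, its two's-complement behavior can be checked outside IDA; a minimal standalone copy:

```python
# standalone copy of to_c_integer, for illustration
def to_c_integer(value: int, sizeof: int, signed: bool = True) -> int:
    mask = 1 << (sizeof * 8)            # 2^(bit width)
    out = value & (mask - 1)            # truncate to sizeof bytes
    if signed and (out & (mask >> 1)):  # sign bit set -> wrap negative
        out -= mask
    return out

print(to_c_integer(0xFF, 1))                # -1
print(to_c_integer(0xFF, 1, signed=False))  # 255
print(to_c_integer(0xFFFFFFF0, 4))          # -16
```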
diff --git a/symless/utils/vtables.py b/symless/utils/vtables.py
new file mode 100644
index 0000000..9fbc40e
--- /dev/null
+++ b/symless/utils/vtables.py
@@ -0,0 +1,228 @@
+from collections import deque
+from typing import Collection, Generator, Optional, Tuple
+
+import idaapi
+
+import symless.cpustate.cpustate as cpustate
+import symless.utils.ida_utils as ida_utils
+import symless.utils.utils as utils
+
+""" Utilities for identifying virtual tables """
+
+
+# model for a virtual table
+class vtable_t:
+ def __init__(self, ea: int):
+ self.ea = ea
+ self.total_xrefs = 0 # total of xrefs on virtual methods
+
+ # all virtual methods (fea, is_imported)
+ self.members: Collection[Tuple[int, bool]] = deque()
+
+ for fea, is_import in vtable_members(ea):
+ self.members.append((fea, is_import))
+ self.total_xrefs += len(ida_utils.get_data_references(fea))
+
+ # list of ea where the vtable is loaded
+ self.load_xrefs: Collection[int] = deque()
+
+ def size(self) -> int:
+ return len(self.members) * ida_utils.get_ptr_size()
+
+ def add_load(self, xref: int):
+ self.load_xrefs.append(xref)
+
+ def get_loads(self) -> Collection[int]:
+ return self.load_xrefs
+
+ # search for places where this vtable is loaded into a structure
+ def search_loads(self):
+ for x in search_xrefs_for_vtable_load(self.ea, self.ea):
+ self.add_load(x)
+
+ # special RTTI case, vtable symbol may not point directly to the array of fct ptrs
+ # ctor loads the vtable in the object like this:
+ # lea rax, `vtable for'FooBar
+ # add rax, 10h
+ # mov [rbx], rax
+ if len(self.get_loads()) == 0:
+ for x in search_xrefs_for_vtable_load(self.ea - 2 * ida_utils.get_ptr_size(), self.ea):
+ self.add_load(x)
+
+ def get_members(self) -> Collection[Tuple[int, bool]]:
+ return self.members
+
+ def members_count(self) -> int:
+ return len(self.members)
+
+ # does this vtable only contain imported functions?
+ def all_imports(self) -> bool:
+ return all(is_import for _, is_import in self.members)
+
+ # we think this really is a vtable
+ def valid(self) -> bool:
+ return self.members_count() > 0 and not self.all_imports()
+
+ # get most derived vtable between self and other
+ # decision based on some not-so-accurate heuristics
+ def get_most_derived(self, other: "vtable_t") -> "vtable_t":
+ # biggest vtable is the most derived
+ if other.members_count() > self.members_count():
+ return other
+ if other.members_count() < self.members_count():
+ return self
+
+ # vtable with the most referenced methods is the base one
+ # why? its methods may be referenced from all inheriting vtables
+ if self.total_xrefs > other.total_xrefs:
+ return other
+ if other.total_xrefs > self.total_xrefs:
+ return self
+
+ # most referenced vtable is the base one
+ # as it is loaded more often (also loaded in child classes' ctors/dtors)
+ if len(self.load_xrefs) > len(other.load_xrefs):
+ return other
+
+ return self
+
+ def __hash__(self):
+ return self.ea
+
+ def __eq__(self, value):
+ return (isinstance(value, int) and self.ea == value) or (isinstance(value, vtable_t) and self.ea == value.ea)
+
+
+# returns the next member of the given vtable, None if we reached the end
+def next_vtable_member(vtbl_ea: int, member_ea: int, ptr_size: int) -> Optional[Tuple[int, bool]]:
+ fea = ida_utils.__dereference_pointer(member_ea, ptr_size) & ~1 # in case of thumb mode
+ func = idaapi.get_func(fea)
+
+ # addr is a function entry point
+ if func and func.start_ea == fea:
+ imported = False
+
+ # addr points to an import
+ elif idaapi.is_mapped(fea) and idaapi.getseg(fea).type == idaapi.SEG_XTRN:
+ imported = True
+
+ else:
+ return None
+
+ # if a reference is found on the member, consider it not part of the current vtable
+ if vtbl_ea != member_ea and (
+ idaapi.get_first_dref_to(member_ea) != idaapi.BADADDR or idaapi.get_first_cref_to(member_ea) != idaapi.BADADDR
+ ):
+ return None
+
+ return fea, imported
+
+
+# yield all members of given vtable
+def vtable_members(vtbl_ea: int) -> Generator[Tuple[int, bool], None, None]:
+ ptr_size = ida_utils.get_ptr_size()
+
+ current = vtbl_ea
+ r = next_vtable_member(vtbl_ea, current, ptr_size)
+ while r is not None:
+ yield r
+ current += ptr_size
+ r = next_vtable_member(vtbl_ea, current, ptr_size)
+
+
+# does the given xref point to a load of vtbl_ea into a struct?
+def is_vtable_loaded_at(fct: idaapi.func_t, xref_ea: int, vtbl_ea: int) -> bool:
+ block = ida_utils.get_block_microcode(fct, xref_ea)
+ if not block:
+ return False
+ mbb, mba = block
+
+ # flow in xref basic block, see if vtable is loaded & stored to a struct
+ minsn = mbb.head
+ state = cpustate.state_t(mba, None)
+
+ utils.g_logger.debug(f"Looking for a load of vtable {vtbl_ea:#x} at {xref_ea:#x}")
+
+ while minsn: # for every bb's instructions
+ for subinsn in cpustate.flatten_minsn(minsn, mba): # for every sub instruction
+ cpustate.process_instruction(state, subinsn)
+
+ # check for vtable ea to be stored
+ for write in state.writes:
+ if (
+ write.size == ida_utils.get_ptr_size()
+ and isinstance(write.value, cpustate.mem_t)
+ and write.value.get_uval() == vtbl_ea
+ ):
+ return True
+
+ minsn = minsn.next
+
+ return False
+
+
+# search the xrefs of given ea for loads of vtbl_ea
+def search_xrefs_for_vtable_load(ea: int, vtbl_ea: int) -> Generator[int, None, None]:
+ for xref in ida_utils.get_data_references(ea):
+ fct = idaapi.get_func(xref)
+
+ # referenced from other data
+ if fct is None and ida_utils.dereference_pointer(xref) == ea:
+ yield from search_xrefs_for_vtable_load(xref, vtbl_ea)
+
+ elif fct and is_vtable_loaded_at(fct, xref, vtbl_ea):
+ yield xref
+
+
+# get a model for the vtable at given address
+# None if no vtable found at that address
+def next_vtable(ea: int, end_ea: int) -> Tuple[Optional[vtable_t], int]:
+ if not idaapi.is_loaded(ea):
+ return None, idaapi.next_head(ea, end_ea)
+
+ vtbl = vtable_t(ea)
+ if vtbl.members_count() == 0: # not an array of fct ptrs
+ return None, idaapi.next_head(ea, end_ea)
+
+ if vtbl.all_imports(): # we are not sure any of these are fct ptrs
+ return None, ea + vtbl.size()
+
+ # find vtable loading sites
+ vtbl.search_loads()
+
+ # vtable only if it is loaded by code
+ if len(vtbl.get_loads()) != 0:
+ return vtbl, ea + vtbl.size()
+
+ return None, ea + vtbl.size()
+
+
+# scans given segment for vtables
+def get_all_vtables_in(seg: idaapi.segment_t) -> Generator[vtable_t, None, None]:
+ utils.g_logger.info(
+ f"scanning segment {idaapi.get_segm_name(seg)}[{seg.start_ea:#x}, {seg.end_ea:#x}] for vtables"
+ )
+
+ current = seg.start_ea
+ while current != idaapi.BADADDR and current < seg.end_ea:
+ # do not cross functions
+ chunk = idaapi.get_fchunk(current)
+ if chunk is not None:
+ current = chunk.end_ea
+ continue
+
+ # is it a vtable?
+ vtbl, current = next_vtable(current, seg.end_ea)
+ if vtbl:
+ yield vtbl
+
+
+# scans code and data segments for vtables
+def get_all_vtables() -> Generator[vtable_t, None, None]:
+ seg = idaapi.get_first_seg()
+ while seg is not None:
+ # search for vtables in .data and .text segments
+ if seg.type == idaapi.SEG_CODE or seg.type == idaapi.SEG_DATA:
+ yield from get_all_vtables_in(seg)
+
+ seg = idaapi.get_next_seg(seg.start_ea)
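The tie-breaking chain in `vtable_t.get_most_derived` is the subtle part of this file; the same heuristics can be sketched IDA-free (the class and field names below are illustrative, not the real `vtable_t`):

```python
from dataclasses import dataclass

@dataclass
class VtblInfo:  # hypothetical stand-in for vtable_t's comparison fields
    members: int       # number of virtual methods
    method_xrefs: int  # total xrefs on the methods
    loads: int         # number of sites loading the vtable

def most_derived(a: VtblInfo, b: VtblInfo) -> VtblInfo:
    # biggest vtable is the most derived
    if a.members != b.members:
        return a if a.members > b.members else b
    # the table whose methods are referenced most is the base one:
    # its methods may be reused by every inheriting vtable
    if a.method_xrefs != b.method_xrefs:
        return b if a.method_xrefs > b.method_xrefs else a
    # the most loaded table is the base one (also loaded in child ctors/dtors)
    return b if a.loads > b.loads else a

base = VtblInfo(members=4, method_xrefs=10, loads=3)
child = VtblInfo(members=4, method_xrefs=2, loads=1)
```

On a full tie the first argument wins, mirroring `get_most_derived` returning `self`.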