diff --git a/CHANGELOG.md b/CHANGELOG.md index 081e870b2..aab5f4668 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,7 +16,7 @@ - Support specifying IndexRef in sinks. - Support interactive mode, allowing users to modify the taint configuration file and re-run taint analysis without needing to re-run the whole program analysis. - Enhance TFG dumping by adding taint configuration and call site info to Source/Sink node and TaintTransfer edge. - - Support programmatical taint config provider. + - Support programmatic taint config provider. - Class hierarchy analysis (CHA) - Support ignoring call sites that call methods declared in `java.lang.Object`. - Support ignoring call sites whose callees exceed given limit. diff --git a/docs/en/index-single.adoc b/docs/en/index-single.adoc index 1ce6835aa..a2fe416f7 100644 --- a/docs/en/index-single.adoc +++ b/docs/en/index-single.adoc @@ -14,6 +14,8 @@ include::setup-in-intellij-idea.adoc[leveloffset=+1] include::command-line-options.adoc[leveloffset=+1] +include::types-classes.adoc[leveloffset=+1] + include::taint-analysis.adoc[leveloffset=+1] include::commonly-used-taint-config.adoc[leveloffset=+1] diff --git a/docs/en/index.adoc b/docs/en/index.adoc index c884188a2..d4361f9e8 100644 --- a/docs/en/index.adoc +++ b/docs/en/index.adoc @@ -9,6 +9,7 @@ The reference documentation consists of the following sections: * <> * <> +* <> * <> ** <> * <> diff --git a/docs/en/taint-analysis.adoc b/docs/en/taint-analysis.adoc index 146308ff5..26e290483 100644 --- a/docs/en/taint-analysis.adoc +++ b/docs/en/taint-analysis.adoc @@ -9,8 +9,16 @@ This documentation is dedicated to providing guidance on using our taint analysi == Enabling Taint Analysis +Taint analysis can be enabled in one of two ways, or both approaches together: + +* using the YAML configuration file. + +* using the programmatic configuration provider. 
+ +=== YAML Configuration File + In Tai-e, taint analysis is designed and implemented as a plugin of pointer analysis framework. -To enable taint analysis, simply start pointer analysis with option `taint-config`, for example: +To enable taint analysis with the YAML configuration file, simply start pointer analysis with option `taint-config`, for example: [source] ---- @@ -23,100 +31,102 @@ In the upcoming section, we will provide a comprehensive guide on crafting a con TIP: You could use various pointer analysis techniques to obtain different precision/efficiency tradeoffs. For additional details, please refer to <>. -== Configuring Taint Analysis - -In this section, we present instructions on configuring sources, sinks, taint transfers, and sanitizers for the taint analysis using a YAML configuration file. -To get a broad understanding, you can start by examining the https://github.com/pascal-lab/Tai-e/blob/master/src/test/resources/pta/taint/taint-config.yml[taint-config.yml] file from our test cases as an illustrative example. -NOTE: Certain configuration values include special characters, such as spaces, `[`, and `]`. -To ensure these values are correctly interpreted by the YAML parser, please make sure to enclose them within *quotation marks*. +==== Interactive Mode -=== Basic Concepts +Interactive mode enables users to modify the taint configuration file(s) and re-run taint analysis without needing to re-run the whole program analysis. -We first present several basic concepts employed in the configuration. +This feature significantly speeds up both taint configuration development/debugging and production scenarios that run multiple configuration sets. -==== Type +To enable interactive mode, append the additional `taint-interactive-mode:true` option when starting the taint analysis, for example: -You may write following types in configuration: +[source] +---- +-a pta=...;taint-config:;taint-interactive-mode:true;... 
+---- -[cols="1,2,2"] -|=== -| Type | Format | Examples +Once the taint analysis completes, Tai-e will enter an interactive state where you can: -| Class type -| Fully-qualified class name. -| `java.lang.String`, `org.example.MyClass` +1. Modify the taint configuration file(s) and press `r` in the console to re-run the taint analysis with your updated configuration. +2. Press `e` in the console to exit interactive mode. -| Array type -| A type following by one or more `[]`, where the number of `[]` equals the number of the array dimension. -| `java.lang.String[]`, `org.example.MyClass[][]`, `char[]` -| Primitive type -| Primitive type names in Java. -| `int`, `char`, etc. -|=== +=== Programmatic Taint Configuration Provider -==== Method Signature +In addition to the YAML configuration file, Tai-e also supports programmatic taint configuration. -In the configuration, we employ a method signature to provide a unique identifier for a method in the analyzed program. -The format of a method signature is given below: +To enable it, start pointer analysis with option `taint-config-providers`, for example: [source] ---- - +-a pta=...;taint-config-providers:[my.example.MyTaintConfigProvider];... ---- -* `CLASS_TYPE`: The class in which the method is declared. -* `RETURN_TYPE`: The return type of the method. -* `METHOD_NAME`: The name of the method. -* `PARAMETER_TYPES`: The list of parameters types of the method. -Multiple parameter types are separated by `,` (Do not insert spaces around `,`!). -If the method has no parameters, just write `()`. - -For example, the signatures of methods `equals` and `toString` of `Object` are: +The class `my.example.MyTaintConfigProvider` should extend the class `pascal.taie.analysis.pta.plugin.taint.TaintConfigProvider`. 
-[source] ----- - - +[source,java,subs="verbatim"] ---- +package my.example; -==== Field Signature +public class MyTaintConfigProvider extends TaintConfigProvider { + public MyTaintConfigProvider(ClassHierarchy hierarchy, TypeSystem typeSystem) { + super(hierarchy, typeSystem); + } -Just like methods, field signatures serve the purpose of uniquely identifying fields within the analyzed program. -The format of a field signature is given below: + @Override + protected List sources() { return List.of(); } -[source] ----- - + @Override + protected List sinks() { return List.of(); } +// ... +} ---- -* `CLASS_TYPE`: The class in which the field is declared. -* `FIELD_TYPE`: The type of the field. -* `FIELD_NAME`: The name of the field. -For example, the signature of the field `info` below +== Configuring Taint Analysis -[source,java] ----- -package org.example; +In this section, we present instructions on configuring sources, sinks, taint transfers, and sanitizers for the taint analysis using a YAML configuration file. +To get a broad understanding, you can start by examining the https://github.com/pascal-lab/Tai-e/blob/master/src/test/resources/pta/taint/taint-config.yml[taint-config.yml] file from our test cases as an illustrative example. -class MyClass { - String info; -} ----- +NOTE: Certain configuration values include special characters, such as spaces, `[`, and `]`. +To ensure these values are correctly interpreted by the YAML parser, please make sure to enclose them within *quotation marks*. -is +=== Basic Concepts -[source] ----- - ----- +We first present several basic concepts employed in the configuration. + +==== Type, Method, and Field Signatures + +In taint configuration, you'll need to specify types, methods, and fields within the program. +This is done using their signatures, as detailed in <>. + +To simplify the configuration process, our taint analysis also supports <>. +These patterns provide a more flexible way to specify program elements. 
+For example, instead of listing every method in a class, you might use a pattern to match all methods with a certain return type or parameter list. + +This approach reduces the amount of configuration needed and makes it easier to maintain and update your taint analysis settings. + +==== Index Reference + +In taint analysis configuration, it's often necessary to specify: + +* A variable +* A field of an object referenced by a variable +* Elements of an array referenced by a variable + +These specifications may be required at a call site or within a method. +To facilitate this, we introduce the concept of **index reference**. + +An index reference consists of two parts: -==== Variable Index +1. **Index**: This refers to the specified variable (also called the _variable index_). +2. **Reference**: This indicates whether we're referring to: +- The variable itself +- A field of the object referenced by the variable +- Elements of the array referenced by the variable -When setting up taint analysis, it's typically necessary to indicate a variable at a call site or within a method. -This can be accomplished using _variable index_. +This combination of variable indexes and references provides a flexible way to pinpoint exactly which program element you want to include in your taint analysis configuration. +Let's break this down. ===== Variable Index of A Call Site @@ -127,15 +137,15 @@ We classify variables at a call site into several kinds, and provide their corre | Kind | Description | Index | Result variable -| The variable that receives the result of the method call, also known as the left-hand side (LHS) variable of the call site. +| The variable receiving the method call result (i.e., the left-hand side or LHS variable) | `result` | Base variable -| The variable that points to the receiver object of the method call. Note that this variable is absent in the cases of static method calls. 
+| The variable pointing to the receiver object of the method call (absent in static method calls) | `base` | Arguments -| The arguments of the call site, indexed starting from 0. +| The arguments of the call site, indexed starting from 0 | `0`, `1`, `2`, ... |=== @@ -152,8 +162,8 @@ r = o.foo(p, q); ===== Variable Index of A Method -Currently, we support specifying parameters of a method using indexes. -Similar to arguments of a call site, the parameters are indexed starting from 0. +Within a method, we currently support indexing for method parameters. +Similar to call site arguments, the parameters are indexed starting from 0. For example, the indexes of parameters `t`, `s`, and `o` of method `foo` below are `0`, `1`, and `2`. [source,java] @@ -167,6 +177,18 @@ class MyClass { } ---- +===== Reference + +The reference part is optional and specifies which aspect of the indexed variable we're interested in: + +1. **No reference**: Refers to the variable itself as specified by the index. +2. **Field reference**: Append `.` to the index (e.g., `0.f` refers to field `f` of the object pointed to by the variable with index `0`). +3. **Array element reference**: Append `[\*]` to the index (e.g., `result[*]` refers to all elements of the array pointed to by the result variable). + +NOTE: `[` and `]` are special characters in YAML, so you need to enclose them in quotes like `"result[*]"`. + +This flexible system allows for precise specification of variables, object fields, and array elements in various contexts within your taint analysis configuration. + === Sources Taint objects are generated by sources. 
In the configuration file, sources are specified as a list of source entries following key `sources`, for example: @@ -188,11 +210,11 @@ The format of this kind of sources is: [source,yaml,subs="+normal"] ---- -- { kind: call, method: METHOD_SIGNATURE, index: INDEX, [underline]#type: TYPE# } +- { kind: call, method: METHOD_SIGNATURE, index: INDEX_REF, [underline]#type: TYPE# } ---- -If you write such a source in the configuration, then when the taint analysis finds that method `METHOD_SIGNATURE` is invoked at call site _l_, it will generate a taint object of type `TYPE` for the variable indicated by `INDEX` at call site _l_. -For how to specify `METHOD_SIGNATURE` and `INDEX`, please refer to <> and <>. +If you write such a source in the configuration, then when the taint analysis finds that method `METHOD_SIGNATURE` is invoked at call site _l_, it will generate a taint object of type `TYPE` for the reference indicated by `INDEX_REF` at call site _l_. +For how to specify `METHOD_SIGNATURE` and `INDEX_REF`, please refer to <> and <>. We use [underline]#underlining# to emphasize the optional nature of [underline]#`type: TYPE`# in call source configuration. When it is not specified, the taint analysis will utilize the corresponding declared type from the method. @@ -231,11 +253,11 @@ To address this requirement, our taint analysis provides the capability to confi [source,yaml,subs="+normal"] ---- -- { kind: param, method: METHOD_SIGNATURE, index: INDEX, [underline]#type: TYPE# } +- { kind: param, method: METHOD_SIGNATURE, index: INDEX_REF, [underline]#type: TYPE# } ---- -If you include this type of source in the configuration, when the taint analysis determines that the method `METHOD_SIGNATURE` is reachable, it will create a taint object of `TYPE` for the parameter indicated by `INDEX`. -For guidance on specifying `METHOD_SIGNATURE` and `INDEX`, please refer to the <> and <>. 
+If you include this type of source in the configuration, when the taint analysis determines that the method `METHOD_SIGNATURE` is reachable, it will create a taint object of `TYPE` for the reference indicated by `INDEX_REF`. +For guidance on specifying `METHOD_SIGNATURE` and `INDEX_REF`, please refer to the <> and <>. ==== Field Sources @@ -247,7 +269,7 @@ Our taint analysis also enables users to designate fields as taint sources using ---- When you include this type of source in the configuration, if the taint analysis identifies that the field `FIELD_SIGNATURE` is loaded into a variable `v` (e.g., `v = o.f`), it will generate a taint object of `TYPE` for `v`. -For instructions on specifying `FIELD_SIGNATURE`, please refer to <>. +For instructions on specifying `FIELD_SIGNATURE`, please refer to <>. === Sinks @@ -257,19 +279,18 @@ In the configuration file, sinks are defined as a list of sink entries under the [source,yaml] ---- sinks: - - { method: METHOD_SIGNATURE, index: INDEX } + - { method: METHOD_SIGNATURE, index: INDEX_REF } - ... ---- -If you include this type of sink in the configuration, when the taint analysis identifies that the method `METHOD_SIGNATURE` is invoked at call site `l` and the variable at `l`, as indicated by `INDEX`, points to any taint objects, it will generate reports for the detected taint flows. +If you include this type of sink in the configuration, when the taint analysis identifies that the method `METHOD_SIGNATURE` is invoked at call site `l` and the reference at `l`, as indicated by `INDEX_REF`, points to any taint objects, it will generate reports for the detected taint flows. -For guidance on specifying `METHOD_SIGNATURE` and `INDEX`, please refer to <> and <>. +For guidance on specifying `METHOD_SIGNATURE` and `INDEX_REF`, please refer to <> and <>. === Taint Transfers -In taint analysis, taint is associated with the data's content, allowing it to move between objects. 
-This process is referred to as _taint transfer_, and it occurs frequently in real-world code. -If not managed effectively, the failure to address these transfers can result in the oversight of numerous security vulnerabilities. +In taint analysis, taint is associated with data content and can move between objects. This process, known as _taint transfer_, is common in real-world code. +Effectively managing these transfers is crucial for detecting potential security vulnerabilities. ==== Introduction @@ -302,9 +323,9 @@ To address such scenarios, our taint analysis allows users to specify which meth ==== Configuration In this section, we provide instructions on configuring taint transfers. -Taint transfer essentially involves the triggering of taint propagation from specific variables to other variables at call sites through method calls. -We refer to the source of taint transfer as the _from-variable_ and the target as the _to-variable_. -For example, in the case of `sb.append(taint)` from the previous example, `taint` serves as the from-variable, and `sb` acts as the to-variable. +Taint transfer essentially involves the triggering of taint propagation from specific reference (e.g., variables or fields) to other references at call sites through method calls. +We refer to the source of taint transfer as the _from-ref_ and the target as the _to-ref_. +For example, in the case of `sb.append(taint)` from the previous example, `taint` serves as the from-ref, and `sb` acts as the to-ref. 
In the configuration file, taint transfers are defined as a list of transfer entries under the key `transfers`, as shown in the example below: @@ -320,14 +341,14 @@ Each transfer entry follows this format: [source,yaml,subs="+normal"] ---- -- { method: METHOD_SIGNATURE, from: INDEX, to: INDEX, [underline]#type: TYPE# } +- { method: METHOD_SIGNATURE, from: INDEX_REF, to: INDEX_REF, [underline]#type: TYPE# } ---- -Here, `METHOD_SIGNATURE` represents the method that triggers taint transfer, `from` and `to` specify the indexes of from-variable and to-variable at the call site. +Here, `METHOD_SIGNATURE` represents the method that triggers taint transfer, `from` and `to` specify the from-ref and to-ref at the call site. `TYPE` denotes the type of the transferred taint object, which is also *optional*. Taint transfer can be intricate in real-world programs. -To detect a broader range of security vulnerabilities, our taint analysis supports various types of taint transfers. +To detect a broader range of security vulnerabilities, our taint analysis supports various types of taint transfers using <>. You can use different expressions for `from` and `to` in transfer entries to enable different types of taint transfers, as outlined below: |=== @@ -378,8 +399,6 @@ At this call, the taint stored in array `cmds` is transferred to `expr`, and we Here, `from: "0[*]"` indicates that the taint analysis will examine _all elements_ in the array pointed to by _0-th parameter_ (i.e., `cmds`), and if it detects any taint objects, it will propagate them to the variable specified by `to: result` (i.e., `expr`). -NOTE: `[` and `]` are special characters in YAML, so you need to enclose them in quotes like `"0[*]"`. - === Sanitizers Our taint analysis allows users to define sanitizers in order to reduce false positives. 
This can be accomplished by writing a list of sanitizer entries under the key `sanitizers` in the configuration, as demonstrated below: @@ -393,6 +412,12 @@ sanitizers: Subsequently, the taint analysis will prevent the propagation of taint objects to the parameter specified by `INDEX` in the method `METHOD_SIGNATURE`. +[NOTE] +==== +Currently, sanitizers do not support index references. +You can only specify variables using the `INDEX` parameter. +==== + // TODO: === Call-Site Mode === Multiple Configuration Files @@ -403,38 +428,6 @@ Users can simply place all relevant configuration files within a designated dire TIP: The taint analysis will traverse the directory iteratively during the configuration loading process. Therefore, you have the flexibility to organize the configuration files as you see fit, including placing them in multiple subdirectories if desired. -=== Programmatical Taint Configuration Provider - -In addition to the YAML configuration file, Tai-e also supports programmatical taint configuration. - -To enable it, start pointer analysis with option `taint-config-provider`, for example: - -[source] ----- --a pta=...;taint-config-provider:[my.example.MyTaintConfigProvider];... ----- - -The class `my.example.MyTaintConfigProvider` should extends the interface `pascal.taie.analysis.pta.plugin.taint.TaintConfigProvider`. - -[source,java,subs="verbatim"] ----- -package my.example; - -public class MyTaintConfigProvider extends TaintConfigProvider { - public MyTaintConfigProvider(ClassHierarchy hierarchy, TypeSystem typeSystem) { - super(hierarchy, typeSystem); - } - - @Override - protected List sources() { return List.of(); } - - @Override - protected List sinks() { return List.of(); } -// ... -} ----- - - == Output of Taint Analysis Currently, the output of the taint analysis consists of two parts: console output and taint flow graph. 
@@ -463,13 +456,13 @@ Given that there are several kinds of <>, each kind has a corresponding a| * `METHOD_SIGNATURE`: The method containing the call site. * `[i@Ln]`: Position of the call site. * `CALL_STMT`: The call statement (site). - * `INDEX`: <> of the source point variable. + * `INDEX_REF`: <> of the source point. | Parameter source | A parameter of the source method. | `METHOD_SIGNATURE/INDEX` a| * `METHOD_SIGNATURE`: The source method. - * `INDEX`: <> of the source point variable. + * `INDEX_REF`: <> of the source point. | Field source | A variable that receives loaded value from the source field. diff --git a/docs/en/types-classes.adoc b/docs/en/types-classes.adoc new file mode 100644 index 000000000..bb6e69b2d --- /dev/null +++ b/docs/en/types-classes.adoc @@ -0,0 +1,272 @@ +include::attributes.adoc[] + += How to Specify and Access Types, Classes, and Class Members (Methods and Fields) + +Java programs are built using types and classes, which consist of class members such as methods and fields. +Tai-e assigns a unique identifier, known as a signature, to each type, class, and class member. +These signatures enable users to easily configure and specify the behavior of program analyzers for specific elements, such as in taint configuration (see <>). +Additionally, they allow analysis developers to easily retrieve and manipulate program elements through Tai-e's convenient APIs. + +In some cases, it may be necessary to specify _a large number_ of related classes or class members within a configuration or when implementing a particular program analysis. +To streamline this process, we have designed and implemented various signature patterns and matchers for classes, methods, and fields, enabling you to specify and retrieve multiple elements using a single signature pattern. + +This documentation will guide you through the format of signatures for types, classes, and class members, as well as the APIs for accessing these program elements via their signatures. 
+ +NOTE: Since generic types are erased in Java, type signatures, along with class and class member signatures, *do not include type parameters*. + +== Type Signatures +In this section, we introduce the signatures for various Java types, including primitive types, reference types, and the `void` type. + +=== Primitive Types +The signatures for the eight Java primitive types are simply their names: `byte`, `short`, `int`, `long`, `float`, `double`, `char`, and `boolean`. + +=== Reference Types +Java reference types include class types (encompassing interfaces and enums) and array types. +The signature formats for these types are outlined below. + +==== Class Types (Including Interfaces and Enums) +The signature for a class type is its fully-qualified class name, which includes the package name. +For an inner class, insert a `$` between the outer class name and the inner class name. +Here are some examples: + +* `java.lang.String` +* `pascal.taie.Main` +* `org.example.MyClass` +* `java.util.Map$Entry` + +==== Array Types +An array type signature consists of its base type followed by one or more `[]`, with the number of `[]` indicating the array's dimensions. +Here are some examples: + +* `java.lang.String[]` +* `org.example.MyClass[][]` +* `char[]` + +=== Void Type +The signature for the void type is simply `void`. This appears in <> for methods that do not return a value. + +=== Programmatically Accessing a Type via Signature +For analysis developers, Tai-e provides convenient APIs to access various types. All the classes related to types, mentioned below, are located in the `pascal.taie.language.type` package. + +In Tai-e, the `TypeSystem` class (accessible via `World.get().getTypeSystem()`) offers APIs to retrieve all types (except `void`, which is discussed later): + +* `TypeSystem.getPrimitiveType(String)`: Retrieves a primitive type by its signature. +* `TypeSystem.getClassType(String)`: Retrieves a class type by its signature. 
+* `TypeSystem.getArrayType(Type,int)`: Retrieves an array type by its base type and the number of dimensions. +* `TypeSystem.getType(String)`: Retrieves a primitive type, class type, or array type by its signature. + +Additionally, primitive types and the `void` type are implemented as enums in Tai-e, and can be directly accessed through their respective classes, such as `IntType.INT` and `VoidType.VOID`. + +== Class and Class Member Signatures +In this section, we introduce the signatures for classes and their members, specifically methods and fields. +While constructors are typically considered class members, in Tai-e, they are treated as methods with a special name `<init>`, as explained in <>. + +=== Class Signatures +Unsurprisingly, the format for class signatures is identical to that of <>, so we won’t repeat the details here. + +=== Method Signatures +The format of a method signature is as follows: + +[source] +---- +<CLASS_TYPE: RETURN_TYPE METHOD_NAME(PARAMETER_TYPES)> +---- + +* `CLASS_TYPE`: The signature of the class in which the method is declared. +* `RETURN_TYPE`: The signature of the method's return type. +* `METHOD_NAME`: The name of the method. +* `PARAMETER_TYPES`: A `,`-separated list of parameter type signatures (Do not insert spaces around the `,`!). +If the method has no parameters, use `()`. + +Here are some examples of method signatures: + +[source] +---- + + + +---- + +As mentioned earlier, *constructors* are treated as methods in Tai-e. +Each constructor has the name `<init>`, and its return type is always `void`. +For example, the constructor signatures for `ArrayList` are: + +[source] +---- +<java.util.ArrayList: void <init>()> +<java.util.ArrayList: void <init>(int)> +<java.util.ArrayList: void <init>(java.util.Collection)> +---- + +Another special class member is the *static initializer* (also known as the class initializer), which is treated as a method with no arguments and no return value in Tai-e. +The method name for a static initializer is `<clinit>`. +For example, the signature of the static initializer for `Object` is `<java.lang.Object: void <clinit>()>`. 
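
The method signature layout described in this section can be illustrated with a small, self-contained Java sketch. It is not Tai-e's own parser: the class `MethodSignatureSketch`, its fields, and the regular expression are hypothetical names introduced only for illustration. The sketch assumes the layout `<CLASS_TYPE: RETURN_TYPE METHOD_NAME(PARAMETER_TYPES)>`, with a single space after `:` and no spaces around `,`.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tai-e.
final class MethodSignatureSketch {

    // Assumed layout: <CLASS_TYPE: RETURN_TYPE METHOD_NAME(PARAMETER_TYPES)>
    private static final Pattern LAYOUT =
            Pattern.compile("<([^:]+): (\\S+) (\\S+)\\((.*)\\)>");

    final String classType;
    final String returnType;
    final String methodName;
    final List<String> parameterTypes;

    private MethodSignatureSketch(String classType, String returnType,
                                  String methodName, List<String> parameterTypes) {
        this.classType = classType;
        this.returnType = returnType;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes;
    }

    // Splits a signature string into its four components.
    static MethodSignatureSketch parse(String signature) {
        Matcher m = LAYOUT.matcher(signature);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a method signature: " + signature);
        }
        String params = m.group(4);
        // A method without parameters is written with an empty "()".
        List<String> parameterTypes =
                params.isEmpty() ? List.of() : List.of(params.split(","));
        return new MethodSignatureSketch(m.group(1), m.group(2), m.group(3), parameterTypes);
    }

    public static void main(String[] args) {
        // Constructors use the special method name <init> and return void.
        MethodSignatureSketch ctor =
                parse("<java.util.ArrayList: void <init>(java.util.Collection)>");
        System.out.println(ctor.classType + " / " + ctor.returnType + " / "
                + ctor.methodName + " / " + ctor.parameterTypes);
    }
}
```

In real analysis code, a signature string would normally be handed to `ClassHierarchy.getMethod(String)` instead of being split by hand; the sketch only makes the four components visible.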
+ +=== Field Signatures +Like methods, field signatures uniquely identify fields within a Java program. +The format of a field signature is as follows: + +[source] +---- + +---- + +* `CLASS_TYPE`: The signature of the class where the field is declared. +* `FIELD_TYPE`: The signature of the field's type. +* `FIELD_NAME`: The name of the field. + +For example, the signature for the field `info` in the following code: + +[source,java] +---- +package org.example; + +class MyClass { + String info; +} +---- + +is: + +[source] +---- + +---- + +=== Programmatically Accessing a Class or Member via Signature +Tai-e offers convenient APIs through the `pascal.taie.language.classes.ClassHierarchy` class, allowing analysis developers to access a class or member by its signature. +The available methods are: + +* `ClassHierarchy.getClass(String)`: Retrieves a class (`JClass`) by its signature. +* `ClassHierarchy.getMethod(String)`: Retrieves a method (`JMethod`) by its signature. +* `ClassHierarchy.getField(String)`: Retrieves a field (`JField`) by its signature. + +== Signature Patterns +Sometimes, users need to specify multiple related classes or members in a configuration, such as in <>. +To simplify this process, we have designed and implemented the _signature pattern_ mechanism, similar to regular expressions but specifically tailored for classes and members. +This allows users to conveniently specify multiple related classes or members using a single signature pattern. + +In this section, we will introduce the formats of signature patterns and explain how to use them in analysis development. + +=== Name Wildcards +Signatures are composed of various names, including class names, method names, field names, and type names within method and field signatures. +To simplify specifying these names, we introduce the concept of *name wildcards*, which form the foundation of signature patterns. 
+A name wildcard is any name that contains zero or more `+*+` characters, where each `+*+` can match any sequence of characters. + +Here are some examples: + +* `+java.util.*+` matches all classes in the `java.util` package and its sub-packages (like `java.util.regex`) +* `+get*+` matches all method names that start with `get` (like `getName` or `getKey`) +* Names without any `+*+` characters match exactly (like `toString` only matches the `toString` methods) + +=== Class Signature Pattern +Class signature patterns come in two forms: + +1. **Basic Pattern**: A name wildcard that directly matches class names. +* Example: `+java.util.*+` matches all classes in the `java.util` package +* Example: `java.util.HashMap` matches exactly that class + +2. **Subclass Pattern**: A name wildcard followed by `^` that matches both the specified classes and all their subclasses. +* Example: `java.util.List^` matches `List` and all classes that extend or implement it +* Example: `java.lang.*Exception^` matches all exception classes in the `java.lang` package and their subclasses, including classes like `RuntimeException`, `IllegalArgumentException`, and any custom exceptions that extend these classes + +The subclass pattern is particularly useful when you need to capture an entire class hierarchy without listing each class individually. + +=== Method Signature Pattern + +Method signature patterns follow a format similar to method signatures but with added flexibility to match multiple methods. The general format is: + +[source] +---- + +---- + +Each component of the method signature pattern supports different matching mechanisms: + +* `CLASS_PATTERN`: Can be a class signature pattern (basic or subclass pattern). +* `RETURN_TYPE_PATTERN`: A type signature pattern. +* `METHOD_NAME_PATTERN`: Can be a name wildcard. +* `PARAMETER_TYPE_PATTERNS`: A `,`-separated list of type signature patterns (no spaces around `,`), which also supports parameter wildcards. 
+ +**Type Signature Patterns**: + +- For class types, they are equivalent to class patterns. +- For other types, they use simple name wildcard matching. + +**Parameter Wildcards**: +Method signature patterns support parameter wildcards, allowing you to specify repetition of type signature patterns. +There are three types of repetition: + +1. Repeat exactly N times: `TYPE_PATTERN\{N\}` +2. Repeat at least N times: `TYPE_PATTERN{N+}` +3. Repeat between M and N times: `TYPE_PATTERN\{M-N\}` + +Here are some examples of method signature patterns: + +[source] +---- + +---- +This pattern matches all methods in `List` and its implementations that start with `get` and have one parameter of any type. + +[source] +---- + +---- +This pattern matches all methods in classes directly under the `java.lang` package that start with `set`, return `void`, and have two parameters: a `String` and any other type. + +[source] +---- +<*: java.lang.String toString()> +---- +This pattern matches `toString` methods that return `String` and have no parameters, in any class. + +[source] +---- + +---- +This pattern matches all methods in `Map` and its implementations that have two parameters: the first being `Object` or any of its subclasses, and the second being any type. + +[source] +---- + +---- +This pattern matches `format` methods in the `String` class that take a `String` parameter followed by zero or more `Object` (or subclass) parameters. + +[source] +---- + +---- +This pattern matches `asList` methods in the `Arrays` class that take between 1 and 5 `Object` parameters. + +Method signature patterns provide a powerful way to specify groups of related methods across multiple classes, greatly simplifying configuration in various analyses. +The addition of parameter wildcards further enhances this flexibility, allowing for precise matching of methods with varying numbers of parameters. 
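
To make the parameter wildcards more concrete, the following self-contained Java sketch translates a parameter pattern list into a regular expression and checks it against a concrete parameter type list. It is only an approximation under stated assumptions, not Tai-e's matcher: the class `ParamWildcardSketch` and its methods are hypothetical, `*` is restricted to matching within a single parameter type, and subclass patterns (`^`) are not handled because they require a class hierarchy.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tai-e.
final class ParamWildcardSketch {

    // One list element: TYPE, TYPE{N}, TYPE{N+}, or TYPE{M-N}
    private static final Pattern ELEMENT =
            Pattern.compile("([^{}]+)(?:\\{(\\d+)(?:(\\+)|-(\\d+))?\\})?");

    // Turns a name wildcard into a regex; each '*' matches any run of
    // characters that stays inside one parameter type (no ',').
    static String nameWildcardToRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append("[^,]*");
            }
            sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    // Compiles a ','-separated pattern list into one regex in which every
    // parameter type is followed by a ',' terminator.
    static Pattern compile(String paramPatterns) {
        StringBuilder regex = new StringBuilder();
        if (!paramPatterns.isEmpty()) {
            for (String element : paramPatterns.split(",")) {
                Matcher m = ELEMENT.matcher(element);
                if (!m.matches()) {
                    throw new IllegalArgumentException("Bad pattern: " + element);
                }
                String quantifier;
                if (m.group(2) == null) {
                    quantifier = "";                                   // exactly once
                } else if (m.group(3) != null) {
                    quantifier = "{" + m.group(2) + ",}";              // at least N
                } else if (m.group(4) != null) {
                    quantifier = "{" + m.group(2) + "," + m.group(4) + "}"; // M to N
                } else {
                    quantifier = "{" + m.group(2) + "}";               // exactly N
                }
                regex.append("(?:").append(nameWildcardToRegex(m.group(1)))
                     .append(",)").append(quantifier);
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String paramPatterns, List<String> paramTypes) {
        StringBuilder joined = new StringBuilder();
        for (String type : paramTypes) {
            joined.append(type).append(',');
        }
        return compile(paramPatterns).matcher(joined).matches();
    }

    public static void main(String[] args) {
        String pattern = "java.lang.String,java.lang.Object{0+}";
        System.out.println(matches(pattern, List.of("java.lang.String")));
        System.out.println(matches(pattern, List.of("int")));
    }
}
```

Encoding the repetition counts as regex quantifiers keeps the sketch short. A production matcher would more likely compare type by type so that subclass patterns can consult the class hierarchy.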
+ +=== Field Signature Pattern + +Field signature patterns follow a format similar to field signatures but with added flexibility to match multiple fields. +The format of a field signature pattern is: + +[source] +---- + +---- + +This format is simpler than the method signature pattern, as field signatures do not include a parameter list. +Each component (`CLASS_PATTERN`, `FIELD_TYPE_PATTERN`, and `FIELD_NAME_PATTERN`) supports the same matching mechanisms as in method signature patterns. + +Example: +[source] +---- + +---- +This pattern matches the `size` field in `java.util.List` and its subclasses, regardless of the field's type. + +=== Programmatically Accessing Multiple Classes or Members via Signature Pattern +Tai-e provides convenient APIs for analysis developers to retrieve multiple classes or members using signature patterns. +To use these, developers first create a `pascal.taie.language.classes.SignatureMatcher` object, passing a `ClassHierarchy` as an argument. +They can then use the following APIs: + +* `SignatureMatcher.getClasses(String)`: Retrieves classes (`JClass`) based on the specified class signature pattern. +* `SignatureMatcher.getMethods(String)`: Retrieves methods (`JMethod`) based on the specified method signature pattern. +* `SignatureMatcher.getFields(String)`: Retrieves fields (`JField`) based on the specified field signature pattern. 
diff --git a/gradle/wrapper/gradle-wrapper.jar b/gradle/wrapper/gradle-wrapper.jar index 2c3521197..a4b76b953 100644 Binary files a/gradle/wrapper/gradle-wrapper.jar and b/gradle/wrapper/gradle-wrapper.jar differ diff --git a/gradle/wrapper/gradle-wrapper.properties b/gradle/wrapper/gradle-wrapper.properties index 9355b4155..cea7a793a 100644 --- a/gradle/wrapper/gradle-wrapper.properties +++ b/gradle/wrapper/gradle-wrapper.properties @@ -1,6 +1,6 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionUrl=https\://services.gradle.org/distributions/gradle-8.10-bin.zip +distributionUrl=https\://services.gradle.org/distributions/gradle-8.12-bin.zip networkTimeout=10000 validateDistributionUrl=true zipStoreBase=GRADLE_USER_HOME