The .swiftsourceinfo file is generated by the Swift compiler during compilation. It is emitted alongside .swiftmodule and .swiftdoc when the -emit-module flag is present. As its name suggests, this file records the source information of a Swift module, including file paths, timestamps, the locations of symbol declarations (keyed by USR), and more.
The .swiftsourceinfo file is used to enhance diagnostics, indexing, and potentially debugging. However, it always embeds absolute paths. If the file is downloaded from a remote cache, those paths point to the machine that produced it, which breaks local diagnostics and indexing. Therefore, a tool is needed to remap the source paths.
Since .swiftsourceinfo is considered an implementation detail of the compiler, there is little documentation about its usage and format. However, I have gathered some insights from online discussions and my own investigation.
According to the initial proposal, the .swiftsourceinfo file was introduced to enhance diagnostics, that is, compiler error messages. In the Swift source code, I found a test case; a simplified version is shown here.
// ModuleA.swift
open class ParentClass {
    open func foo(a: Int) {}
}

// ModuleB.swift
import ModuleA
open class SubClass: ParentClass {
    open override func foo(a: String) {}
}
Here are the error messages with and without the .swiftsourceinfo file.
In Xcode, having the full path and line number lets us double-click the error message to jump directly to the corresponding file. Without .swiftsourceinfo, this is not possible.
During my investigation, I discovered that SourceKit uses .swiftsourceinfo to enhance indexing-related features. For instance, in the code example below, jumping to the definition of the symbol foo fails without .swiftsourceinfo, even if the IndexStore is fully populated. This is because foo is a synthesized symbol (s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ::SYNTHESIZED::s:10Foo0A6StructV), which can be observed in both the SourceKit logs and the generated symbol graph JSON file.
// Foo.swift
public protocol FooProtocol {}
public struct FooStruct : FooProtocol {}
extension FooProtocol where Self == FooStruct {
    public static var foo: String {
        "Hello, world!"
    }
}

// In another module
import Foo
print(FooStruct.foo)
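The synthesized USR quoted above looks like two USRs joined by ::SYNTHESIZED::, namely the member declared in the protocol extension and the concrete type it is surfaced on. The sketch below splits such a string. It assumes, based only on the USR observed above, that the separator is literal and appears at most once; splitSynthesizedUSR is a hypothetical helper, not a SourceKit API.

```swift
import Foundation

/// Hypothetical helper (not a SourceKit API): splits a synthesized USR of the form
/// "<member USR>::SYNTHESIZED::<parent type USR>" into its two parts.
/// The separator is taken from the USR observed in SourceKit logs; other forms may exist.
func splitSynthesizedUSR(_ usr: String) -> (member: String, parentType: String?) {
    let separator = "::SYNTHESIZED::"
    guard let range = usr.range(of: separator) else {
        // Not a synthesized symbol: the USR can be used as a lookup key directly.
        return (usr, nil)
    }
    let member = String(usr[..<range.lowerBound])
    let parent = String(usr[range.upperBound...])
    return (member, parent)
}

let synthesized = "s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ::SYNTHESIZED::s:10Foo0A6StructV"
let parts = splitSynthesizedUSR(synthesized)
print(parts.member)               // s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ
print(parts.parentType ?? "none") // s:10Foo0A6StructV
```

Both halves appear as plain keys in the DECL_USRS table discussed later, which is consistent with the idea that resolving the synthesized symbol depends on the declaration locations stored in .swiftsourceinfo.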
The .swiftsourceinfo file could also be used by the debugger, but I have yet to find a specific case that confirms this.
Warning
The format of .swiftsourceinfo is not guaranteed to be stable and may change between compiler versions. The following content has been verified with Swift 5.9 and 5.10.
Like many other LLVM and Swift artifacts, the .swiftsourceinfo file uses the LLVM Bitstream binary format. Using llvm-bcanalyzer, we can see its high-level block structure.
$ llvm-bcanalyzer -dump Foo.swiftsourceinfo
<BLOCKINFO_BLOCK/>
<MODULE_SOURCEINFO_BLOCK NumWords=281 BlockCodeSize=2>
  <CONTROL_BLOCK NumWords=36 BlockCodeSize=3>
    <METADATA abbrevid=5 op0=3 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=0/> blob data = 'Apple Swift version 5.10 (swiftlang-5.10.0.13 clang-1500.3.9.4)'
    <MODULE_NAME abbrevid=4/> blob data = 'Foo'
    <TARGET abbrevid=6/> blob data = 'arm64-apple-macosx14.0'
  </CONTROL_BLOCK>
  <DECL_LOCS_BLOCK NumWords=240 BlockCodeSize=4>
    <SOURCE_FILE_LIST abbrevid=4/> blob data = unprintable, 84 bytes.
    <BASIC_DECL_LOCS abbrevid=5/> blob data = unprintable, 460 bytes.
    <DECL_USRS abbrevid=6 op0=252/> blob data = unprintable, 292 bytes.
    <TEXT_DATA abbrevid=7/> blob data = unprintable, 75 bytes.
    <DOC_RANGES abbrevid=8/> blob data = unprintable, 1 bytes.
  </DECL_LOCS_BLOCK>
</MODULE_SOURCEINFO_BLOCK>
However, to understand the details, we need to look into the Swift source code. The serialization logic starts here, and the deserialization logic starts here.
MODULE_SOURCEINFO_BLOCK is the only core block in a .swiftsourceinfo file. It contains two sub-blocks: CONTROL_BLOCK and DECL_LOCS_BLOCK.
CONTROL_BLOCK has three records with basic compilation information: the module name (MODULE_NAME, e.g. FooModule), the compiler version (METADATA, e.g. Apple Swift version 5.9.2), and the target triple (TARGET, e.g. arm64-apple-ios17.2-simulator).
DECL_LOCS_BLOCK is much more interesting than CONTROL_BLOCK, since it holds all the source information. It consists of five records (SOURCE_FILE_LIST, BASIC_DECL_LOCS, DECL_USRS, TEXT_DATA, and DOC_RANGES), each of which is a binary blob that is opaque to end users.
SOURCE_FILE_LIST contains a list of fixed-size items, each describing one source file.
struct SourceFileRecord {
    uint32_t FileID;          // the offset to TEXT_DATA, indicating the file path
    uint8_t Fingerprint1[32]; // hash of interface including type members
    uint8_t Fingerprint2[32]; // hash of interface excluding type members
    uint64_t Timestamp;
    uint64_t FileSize;
};
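Under this layout each record is 4 + 32 + 32 + 8 + 8 = 84 bytes, which matches the 84-byte SOURCE_FILE_LIST blob in the llvm-bcanalyzer dump above and would mean that Foo has a single source file. The sketch below decodes such a blob into records. It assumes little-endian integers and keeps the fingerprints as raw bytes; the type and function names are hypothetical.

```swift
// Hypothetical decoder for the SOURCE_FILE_LIST blob, following the
// SourceFileRecord layout above. Assumes little-endian integers.
struct SourceFileRecord {
    var fileID: UInt32          // offset into TEXT_DATA (the file path)
    var fingerprint1: [UInt8]   // 32 bytes, interface hash including type members
    var fingerprint2: [UInt8]   // 32 bytes, interface hash excluding type members
    var timestamp: UInt64
    var fileSize: UInt64
}

let sourceFileRecordSize = 4 + 32 + 32 + 8 + 8  // 84 bytes

/// Reads an unsigned little-endian integer of `byteCount` bytes at `offset`.
func readLE(_ bytes: [UInt8], at offset: Int, byteCount: Int) -> UInt64 {
    var value: UInt64 = 0
    for i in 0..<byteCount {
        value |= UInt64(bytes[offset + i]) << (8 * i)
    }
    return value
}

/// Returns nil if the blob is not a whole number of records.
func decodeSourceFileList(_ blob: [UInt8]) -> [SourceFileRecord]? {
    guard blob.count % sourceFileRecordSize == 0 else { return nil }
    var records: [SourceFileRecord] = []
    var offset = 0
    while offset < blob.count {
        records.append(SourceFileRecord(
            fileID: UInt32(readLE(blob, at: offset, byteCount: 4)),
            fingerprint1: Array(blob[(offset + 4)..<(offset + 36)]),
            fingerprint2: Array(blob[(offset + 36)..<(offset + 68)]),
            timestamp: readLE(blob, at: offset + 68, byteCount: 8),
            fileSize: readLE(blob, at: offset + 76, byteCount: 8)))
        offset += sourceFileRecordSize
    }
    return records
}
```

The decoded fileID would then be resolved against TEXT_DATA to obtain the absolute path.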
BASIC_DECL_LOCS consists of a list of fixed-size items, each recording the location of one declaration identified by a USR.
// The layout of one record item
struct DeclLocRecord {
    uint32_t FileID;     // the offset to TEXT_DATA, indicating the file path
    uint32_t DocRangeID; // the offset to DocRangeRecord, indicating the documentation location
    struct {
        uint32_t Offset;
        uint32_t Line;
        uint32_t Column;
        struct {
            uint32_t Offset;
            uint32_t LineOffset;
            uint32_t Length;
            uint32_t FileID; // the offset to TEXT_DATA, indicating the file path
        } Directive; // ExternalSourceLocs::LocationDirective
    } RawLoc[3]; // ExternalSourceLocs::RawLoc, Loc/StartLoc/EndLoc
};
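Each RawLoc holds seven 32-bit fields (28 bytes), so under this layout a DeclLocRecord is 4 + 4 + 3 × 28 = 92 bytes; if that is right, the 460-byte BASIC_DECL_LOCS blob in the dump holds five records. The sketch below reads record number i from the blob, again assuming little-endian integers. All names are hypothetical, and LineOffset is read as unsigned to match the layout above.

```swift
// Hypothetical decoder for one BASIC_DECL_LOCS record, following the
// DeclLocRecord layout above. Assumes little-endian 32-bit fields.
struct LocationDirective { var offset, lineOffset, length, fileID: UInt32 }
struct RawLoc { var offset, line, column: UInt32; var directive: LocationDirective }
struct DeclLocRecord { var fileID, docRangeID: UInt32; var loc, startLoc, endLoc: RawLoc }

let rawLocSize = 7 * 4                          // 28 bytes
let declLocRecordSize = 4 + 4 + 3 * rawLocSize  // 92 bytes

func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 { value |= UInt32(bytes[offset + i]) << (8 * i) }
    return value
}

func readRawLoc(_ bytes: [UInt8], at base: Int) -> RawLoc {
    // Offset, Line, Column, then the four Directive fields.
    let f = (0..<7).map { readUInt32LE(bytes, at: base + 4 * $0) }
    return RawLoc(offset: f[0], line: f[1], column: f[2],
                  directive: LocationDirective(offset: f[3], lineOffset: f[4],
                                               length: f[5], fileID: f[6]))
}

/// Returns record `index` (as stored in DECL_USRS), or nil if it is out of range.
func declLocRecord(in blob: [UInt8], index: Int) -> DeclLocRecord? {
    let base = index * declLocRecordSize
    guard index >= 0, base + declLocRecordSize <= blob.count else { return nil }
    return DeclLocRecord(
        fileID: readUInt32LE(blob, at: base),
        docRangeID: readUInt32LE(blob, at: base + 4),
        loc: readRawLoc(blob, at: base + 8),
        startLoc: readRawLoc(blob, at: base + 8 + rawLocSize),
        endLoc: readRawLoc(blob, at: base + 8 + 2 * rawLocSize))
}
```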
- Serializing BASIC_DECL_LOCS
- Parsing BASIC_DECL_LOCS
- The LocationDirective is related to the use of #sourceLocation
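For reference, #sourceLocation is a line control statement that overrides the file name and line number the compiler reports for the code following it, which is typical in generated code. The sketch below shows its effect on #filePath and #line. How exactly the compiler maps a directive onto the Offset, LineOffset, Length, and FileID fields of LocationDirective is not verified here; the file name Template.swift and the function name are made up for illustration.

```swift
// From the next line on, locations are reported as "Template.swift",
// starting at line 100, until the directive is reset.
#sourceLocation(file: "Template.swift", line: 100)
func reportedLocation() -> (file: String, line: Int) {
    return (#filePath, #line)
}
#sourceLocation()
// After the reset, locations refer to the real file again.

let location = reportedLocation()
print(location.file, location.line)  // Template.swift 101
```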
DECL_USRS is a serialized llvm::OnDiskIterableChainedHashTable, where the key is a USR and the value is the index of a location record in BASIC_DECL_LOCS. Conceptually, the deserialized DECL_USRS looks like this:
s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ -> 4
s:10Foo0A6StructV -> 2
s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvgZ (Foo.swift:10:33) -> 3
s:10Foo0A8ProtocolP -> 1
s:10Foo3BarC -> 0
s:e:s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ -> 5
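Once DECL_USRS is deserialized, resolving a USR to its location takes two steps: look up the record index, then multiply it by the record size to find that record's bytes inside BASIC_DECL_LOCS. The sketch below models the table as a plain Swift dictionary built from the entries listed above (without the (Foo.swift:10:33) annotation) rather than reimplementing llvm::OnDiskIterableChainedHashTable. It assumes the 92-byte record size implied by the DeclLocRecord layout; declLocByteRange is a hypothetical helper.

```swift
// A stand-in for the deserialized DECL_USRS table: USR -> record index.
// This is a plain dictionary, not LLVM's on-disk hash table format.
let declUSRs: [String: Int] = [
    "s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ": 4,
    "s:10Foo0A6StructV": 2,
    "s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvgZ": 3,
    "s:10Foo0A8ProtocolP": 1,
    "s:10Foo3BarC": 0,
    "s:e:s:10Foo0A8ProtocolPA2A0A6StructVRszrlE3fooSSvpZ": 5,
]

// Size of one DeclLocRecord: FileID + DocRangeID + 3 RawLocs of 7 UInt32 each.
let declLocRecordSize = 4 + 4 + 3 * 7 * 4  // 92

/// Hypothetical helper: the byte range of the location record for `usr`
/// inside the BASIC_DECL_LOCS blob, or nil if the USR is not in the table.
func declLocByteRange(for usr: String) -> Range<Int>? {
    guard let index = declUSRs[usr] else { return nil }
    let start = index * declLocRecordSize
    return start..<(start + declLocRecordSize)
}

print(declLocByteRange(for: "s:10Foo0A6StructV")!)  // 184..<276
```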
TEXT_DATA is a list of \0-terminated strings, which are the actual source file paths. As mentioned before, they're always absolute paths.
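Because the FileID fields are byte offsets into TEXT_DATA, a path-remapping tool cannot simply rewrite the strings in place: a path of a different length shifts every later offset. One possible approach, sketched below, is to rebuild TEXT_DATA with remapped paths and return a table from old offsets to new offsets, which would then have to be applied to every FileID in SOURCE_FILE_LIST and BASIC_DECL_LOCS. The function name and the prefix-based remapping rule are assumptions for illustration, and the sketch does not rewrite the surrounding bitstream.

```swift
// Hypothetical sketch: split TEXT_DATA into its NUL-terminated strings,
// replace a path prefix, and rebuild the blob while tracking new offsets.
func remapTextData(_ blob: [UInt8], from oldPrefix: String, to newPrefix: String)
    -> (blob: [UInt8], offsetMap: [UInt32: UInt32]) {
    var newBlob: [UInt8] = []
    var offsetMap: [UInt32: UInt32] = [:]
    var start = 0
    while start < blob.count {
        // Find the terminating NUL of the string that begins at `start`.
        guard let end = blob[start...].firstIndex(of: 0) else { break }
        let path = String(decoding: blob[start..<end], as: UTF8.self)
        let remapped = path.hasPrefix(oldPrefix)
            ? newPrefix + String(path.dropFirst(oldPrefix.count))
            : path
        offsetMap[UInt32(start)] = UInt32(newBlob.count)
        newBlob += Array(remapped.utf8)
        newBlob.append(0)
        start = end + 1
    }
    return (newBlob, offsetMap)
}

// Example: two absolute paths from a hypothetical CI machine.
var textData: [UInt8] = Array("/Users/ci/work/Foo.swift".utf8)
textData.append(0)
textData += Array("/Users/ci/work/Bar.swift".utf8)
textData.append(0)
let result = remapTextData(textData, from: "/Users/ci/work", to: "/src")
print(result.offsetMap[25]!)  // 15
```

Here the second path moves from offset 25 to offset 15 because the first path became ten bytes shorter, so any FileID equal to 25 would need to be rewritten to 15.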
DOC_RANGES holds the locations of documentation, stored as fixed-size records. Here, documentation means a documentation comment in DocC format. The layout of the doc ranges for one USR is described below.
uint32_t nums; // the number (N) of DocRangeRecords that follow
struct {
    struct {
        ... // 7 uint32_t fields, same layout as the RawLoc in DeclLocRecord
    } RawLoc;
    uint32_t Unknown; // purpose unknown
} DocRangeRecord[N]; // N DocRangeRecords
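Following that layout, each DocRangeRecord is 8 × 4 = 32 bytes, and one entry occupies 4 + 32 × N bytes starting at the offset given by a DeclLocRecord's DocRangeID. The sketch below decodes one entry under the same little-endian assumption as before. The names are hypothetical, and the trailing field is kept under the neutral name unknown because its purpose is not established here.

```swift
// Hypothetical decoder for one DOC_RANGES entry: a UInt32 count N
// followed by N records of 8 UInt32 fields (7 RawLoc fields + 1 unknown).
struct DocRangeRecord {
    var rawLoc: [UInt32]  // Offset, Line, Column, Directive.{Offset, LineOffset, Length, FileID}
    var unknown: UInt32   // purpose not established
}

func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 { value |= UInt32(bytes[offset + i]) << (8 * i) }
    return value
}

/// Decodes the entry that starts at `offset`, or returns nil if the blob is too short.
func docRanges(in blob: [UInt8], at offset: Int) -> [DocRangeRecord]? {
    guard offset >= 0, offset + 4 <= blob.count else { return nil }
    let count = Int(readUInt32LE(blob, at: offset))
    let recordSize = 8 * 4
    guard offset + 4 + count * recordSize <= blob.count else { return nil }
    return (0..<count).map { i -> DocRangeRecord in
        let base = offset + 4 + i * recordSize
        let fields = (0..<8).map { readUInt32LE(blob, at: base + 4 * $0) }
        return DocRangeRecord(rawLoc: Array(fields[0..<7]), unknown: fields[7])
    }
}
```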