Machine Code Cache
The machine code cache is stored in a single contiguous region of memory allocated at Pharo startup. It has the following characteristics:
- It does not grow; however, its size can be set when the VM starts.
- The memory can be set either to readable/executable or to writable, but not both at the same time, so the permissions have to be swapped when switching between the two (the JIT on recent macOS machines has this requirement).
- The machine code cache is split in two:
- The trampoline section: small machine code routines generated automatically at VM startup (for this, the section is first marked as writable and then set to executable once the routines are instantiated).
- A machine code section, holding compiled methods and inline caches.
- As with any cache, it must manage invalidation between a cache entry and the method it represents.
It is important to realize that this section of memory exists to optimize execution time, so it only preserves the methods most recently used in the call stack.
When the VM decides that a method needs to be compiled to machine code, it sets the section as writable, stores the method in the machine code section, swaps back to executable, and continues running.
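The write/execute swap described above can be modelled with a toy code zone. This is a minimal Python sketch, not the VM's implementation: `CodeZone`, `install_method` and `fetch` are hypothetical names, and the real permission change relies on platform-specific OS facilities that are not shown here. It only illustrates the invariant that the zone is never writable and executable at the same time, and that it is swapped back to executable before execution continues.

```python
class CodeZone:
    """Toy code zone whose memory is either writable or executable, never both."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.writable = False  # starts readable/executable
        self.free_start = 0    # hypothetical bump pointer (like mzFreeStart)

    def make_writable(self):
        self.writable = True

    def make_executable(self):
        self.writable = False

    def write(self, offset, code):
        if not self.writable:
            raise PermissionError("zone is executable, not writable")
        self.memory[offset:offset + len(code)] = code

    def fetch(self, offset):
        """Read an instruction byte, as the CPU does when executing."""
        if self.writable:
            raise PermissionError("zone is writable, not executable")
        return self.memory[offset]


def install_method(zone, code):
    """Copy freshly compiled code into the zone, swapping permissions around the write."""
    if zone.free_start + len(code) > len(zone.memory):
        raise MemoryError("code zone full: a compaction is needed")
    start = zone.free_start
    zone.make_writable()
    try:
        zone.write(start, code)
    finally:
        zone.make_executable()  # always swap back before running code again
    zone.free_start += len(code)
    return start
```

Keeping the swap inside `install_method` (with `finally`) means no caller can forget to restore execute permission, which is the property the macOS-style restriction demands.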
When a method m1 calls another method m2, and m1 always (and only) calls m2 from the same call site, that call site is linked to m2 through a reference (pointer). These pointers are important because they must be updated whenever methods in the machine code section are moved or removed.
Each method/cache in the section keeps a counter of incoming references. The counter has only 3 bits, so its value ranges from 0 to 7.
- 0 : Methods or caches with no incoming references.
- 7 : Methods or caches which are the most referenced.
A compiled method on the stack is added to the machine code cache with its Cog method counter (CMC) set to 7. Successive compactions decrease the Cog method counters (using heuristics) by dividing them by 2, rounding down. So, in one compaction pass:
- CMCs of 7 and 6 become 3,
- CMCs of 5 and 4 become 2.
- In a following compaction pass, CMCs of 3 and 2 become 1.
- Eventually, if a method receives no new references, its CMC reaches 0.
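The successive halving can be checked with a few lines of Python. This sketch assumes the counter is simply replaced by `count // 2` on each compaction and that no new references arrive in between; `decay` and `decay_history` are hypothetical helpers, not Cogit methods.

```python
CMC_MAX = 7  # 3-bit counter: values 0..7


def decay(count):
    """One compaction pass: halve the Cog method counter, rounding down."""
    return count // 2


def decay_history(count):
    """Counter values after each pass until it reaches 0 (no new references assumed)."""
    history = [count]
    while count > 0:
        count = decay(count)
        history.append(count)
    return history
```

For example, `decay_history(7)` gives `[7, 3, 1, 0]`: a method marked at 7 and never referenced again reaches 0 after three compactions.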
This behavior is configured in multiple parts of the Cogit source code: #incrementUsageOfTargetIfLinkedSend:mcpc:ignored:, #compactCompiledCode and #initialClosedPICUsageCount.
[We could explore different combinations of numbers]
When the section is full and memory is requested (for example, when a method is compiled), the methods/caches whose counter is 0 are freed. If the freed memory does not satisfy the required size, it keeps freeing until there is enough free memory. The assumption behind this is that it matters more for the newest methods to run fast than for the oldest ones, which are probably not used anymore.
Once this process is completed, the compaction process moves the remaining methods/caches to the beginning of the section (since there is no way to predict the size of the machine code that will be allocated next).
During compaction, the reference counting starts from the lowest numbers of incoming references, taking care not to increment the methods already set to 7, and incrementing those that have new references on each pass.
If a method that is in the call stack had to be freed, it would be a very expensive operation, as the whole stack would also have to be reconfigured/patched to reflect the current state of the interpreter.
If a method is not in the call stack anymore, it will be removed during the next compaction.
The subsequent analysis requires an understanding of how machine code methods are represented in the cache. A method in machine code has a header, code, and a map. Instructions in the code section have annotations stored in the map, so message sends can be found by looking for annotations marked as “send”. For example, a “call” send to m2 is described by its annotation code, its offset, and its parameters, if any.
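A simplified model of walking the map to find send sites follows. It is only an illustration: the real Cog map is a compact byte encoding with its own annotation kinds, whereas this sketch assumes a plain list of `(annotation, delta)` pairs, where `delta` is the distance in bytes from the previous annotated instruction. `CogMethodSketch`, `annotated_pcs`, `send_sites` and the annotation names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CogMethodSketch:
    header: dict
    code: bytes
    # Hypothetical map: (annotation kind, bytes since the previous annotated pc)
    method_map: list = field(default_factory=list)


def annotated_pcs(method):
    """Yield (annotation, mcpc) pairs, mcpc being the offset of the annotated instruction."""
    mcpc = 0
    for annotation, delta in method.method_map:
        mcpc += delta
        yield annotation, mcpc


def send_sites(method):
    """Offsets of the instructions annotated as sends (e.g. the call to m2)."""
    return [mcpc for annotation, mcpc in annotated_pcs(method) if annotation == "send"]
```

With a map of `[("objectRef", 4), ("send", 6), ("send", 10)]`, the sends are found at offsets 10 and 20 of the method's code.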
The current format of the compiled method metadata is difficult to follow, which leads to code that is difficult to understand. [We could do a new Dojo session to propose ideas for new metadata.]
If we had a dynamic inliner, would the metadata section be a good candidate to represent the inlined version?
There are two kinds of metadata for methods (PICs do not have metadata because they are fixed structures):
- Metadata for register mapping: in optimized machine code there are inaccessible (stuff?) which are not registers, and for which there is no C API. For example, when leaving an optimized method, the data in the CPU registers must be extracted and stored in the heap or on the stack.
- Inlining information mappings, and then the mapping between variables.
But it also depends on the kinds of optimizations in play.
The analysis starts in the compactCogCompiledCode method.
- markActiveMethodsAndReferents → markCogMethodsAndReferentsOnPage: → markMethodAndReferents:
- It performs a “machine code do:”, iterating over the stack pages and over the machine code instructions that have a “send” annotation in the Cog method map, and applies the (current) heuristics of successive divisions by 2 in #incrementUsageOfTargetIfLinkedSend:mcpc:ignored:
- Only one level of message sends is considered. So if there are three Cog methods with counters set to 7, it iterates over those three Cog methods and the methods they refer to.
- A method could have two sends to the same method at two different call sites. In this case, the target is incremented once for each call site.
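The marking step can be sketched as follows, assuming a saturating increment capped at 7 (as described above) and a plain dictionary of counters; the function names mirror the Cogit selectors but are hypothetical Python stand-ins. The example shows both rules from the list: only one level of sends is followed, and a target called from two call sites is incremented twice.

```python
CMC_MAX = 7  # 3-bit usage counter saturates at 7


def increment_usage(usage, target):
    """Credit a target for one linked call site, never past the 3-bit maximum."""
    if usage[target] < CMC_MAX:
        usage[target] += 1


def mark_active_methods_and_referents(usage, stack_methods, linked_sends):
    """Mark methods found on the stack as most used, then credit the targets of
    their linked sends. Only one level of sends is followed.

    linked_sends: method -> list of targets, one entry per linked call site.
    """
    for method in stack_methods:
        usage[method] = CMC_MAX
    for method in stack_methods:
        for target in linked_sends.get(method, []):
            increment_usage(usage, target)
```

With `m1` on the stack calling `m2` from two call sites and `m3` from one, `m1` goes to 7, `m2` gains 2 and `m3` gains 1, while `m4` (called only from `m2`, which is not on the stack) is left unchanged.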
- freeOlderMethodsForCompaction
- This method actually frees methods depending on the contents of the stack, so a better name could be freeLowestReferredMethodsForCompaction or similar.
- The amount to free starts at 1/4 of the cache size.
- baseAddress is a pointer to the end of the trampoline section.
- The main part of this method is a loop that iterates over the (Cog) methods and frees them according to their counters. The iteration goes from baseAddress to the start of the method zone's free space (mzFreeStart). The first iteration scans the Cog methods with counters set to 0 and frees them, then checks whether enough space has been freed; if not, it repeats the scan with counters up to 1, then up to 2, and so on.
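The loop can be sketched like this. It is a simplified model under stated assumptions: methods are dictionaries in address order, the target is 1/4 of `cache_size`, space that was already free before the call is ignored, and the threshold of freeable counters starts at 0 and grows by one per pass up to 6, so methods marked at 7 are never freed here. The function name follows the Cogit selector, but the code is a hypothetical Python stand-in.

```python
CMC_MAX = 7  # assumption: counters of 7 (e.g. methods on the stack) are never freed here


def free_older_methods_for_compaction(methods, cache_size):
    """Free methods with the lowest counters first until 1/4 of the cache is freed.

    methods: list of dicts with 'size', 'usage' and 'free' keys, in address order
    (from baseAddress up to mzFreeStart). Returns the number of bytes freed.
    """
    amount_to_free = cache_size // 4
    freed = 0
    freeable_usage = 0  # first pass: only counters of 0
    while True:
        for m in methods:
            if freed >= amount_to_free:
                break
            if not m["free"] and m["usage"] <= freeable_usage:
                m["free"] = True
                freed += m["size"]
        if freed >= amount_to_free or freeable_usage + 1 >= CMC_MAX:
            return freed
        freeable_usage += 1  # not enough yet: rescan, also freeing the next counter value
```

Rescanning with a growing threshold keeps the most used methods alive as long as possible: a method with counter 3 is only freed if freeing everything at 0, 1 and 2 was not enough.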
- compactPICsWithFreedTargets
- Iterates over the section, managing PICs and MICs. Methods have a reference count, but when PICs and MICs are created their counters are set to an initial value of 3 for closed PICs and 6 for MICs (open PICs). This can be checked in the initialClosedPICUsageCount and initialOpenPICUsageCount methods; for methods, see initialMethodUsageCount.
- cPICCompactAndIsNowEmpty:
- It runs an algorithm that fixes the references broken by the previously freed methods. This method compacts the PIC itself: it loops from 1 to the number of cases, similar to a switch with that number of cases.
- As PICs have a different structure from Cog methods (they have a fixed structure, so they do not have metadata), they store the number of cases the PIC has in their header. In this machine code context, the PIC is compiled as a switch. Each case of the switch has the same number of instructions n, so case offsets are found by multiplying n: 2n, 3n, etc.
- It then compacts the cases of the PIC, removing the PIC if no cases remain. The PIC is one block which can contain holes (if there is a hole in the middle of the PIC, are all the other PICs moved up? Or do we assume there are no holes in the PIC?).
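A toy version of the PIC compaction follows. Everything in it is hypothetical: the case size `n`, the offset of the first case, and the `(class_tag, target_method)` representation of a case. It shows why equal-sized cases make offsets a simple multiplication, and how removing cases whose target was freed leaves holes that later cases are moved up into. Unlike the real selector, which answers only whether the PIC is now empty, this sketch also returns the list of case moves.

```python
from dataclasses import dataclass


@dataclass
class ClosedPIC:
    """Toy closed PIC: cases[i] is a (class_tag, target_method) pair."""
    cases: list
    case_size: int = 16          # hypothetical n: every case has the same size
    first_case_offset: int = 32  # hypothetical offset of case 1 inside the PIC

    def case_offset(self, i):
        """Offset of case i (1-based); equal-sized cases make this a multiplication."""
        return self.first_case_offset + (i - 1) * self.case_size


def cpic_compact_and_is_now_empty(pic, freed_methods):
    """Drop cases whose target was freed and slide later cases up over the holes.

    Returns (is_now_empty, moves), where moves lists (old_offset, new_offset)
    for every case that had to be copied up.
    """
    kept, moves = [], []
    for i in range(1, len(pic.cases) + 1):  # loop from 1 to the number of cases
        class_tag, target = pic.cases[i - 1]
        if target in freed_methods:
            continue  # this case would jump into freed code: drop it
        kept.append((class_tag, target))
        old, new = pic.case_offset(i), pic.case_offset(len(kept))
        if old != new:
            moves.append((old, new))
    pic.cases = kept
    return len(kept) == 0, moves
```

Freeing the target of the middle case of a three-case PIC moves the third case from offset 64 to 48; freeing the remaining targets leaves an empty PIC that can itself be freed.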
- planCompaction
- At this point the methods need to be moved to a new place and their references fixed.
- This method skips the free space and iterates over the Cog methods. The delta starts at 0 and goes down (negative values) with each free chunk skipped. If a method is not free, it is planned to move by the delta, which is stored in its objectHeader.
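The delta computation can be sketched as below, assuming methods are dictionaries in address order and that a dictionary key `object_header` stands in for the real objectHeader field; `plan_compaction` is a hypothetical Python stand-in for the Cogit method.

```python
def plan_compaction(methods):
    """Record in each live method the (zero or negative) delta it must move by.

    methods: list of dicts with 'size' and 'free' keys, in address order.
    Returns the total number of bytes that the compaction will reclaim.
    """
    delta = 0
    for m in methods:
        if m["free"]:
            delta -= m["size"]  # every free chunk shifts the following methods further down
        else:
            m["object_header"] = delta  # planned move, stored in the header slot
    return -delta
```

Each live method's delta is minus the total size of the free chunks before it, so sliding every live method by its delta packs them at the beginning of the section.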
- relocateMethodsPreCompaction
- Patches the calls from each method to its referents, i.e. updates the calls to point to the new locations, without yet moving the methods themselves.
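The pre-compaction patching can be sketched as below. The model is deliberately simplified and hypothetical: call targets are stored as absolute addresses keyed by call-site offset, each live method carries its planned delta in `object_header`, and sends to freed methods are assumed to have been unlinked already. With PC-relative call instructions, the caller's own delta would also have to be taken into account, because the call site moves too.

```python
def relocate_methods_pre_compaction(methods):
    """Point every linked call at the address its target WILL have after compaction.

    methods: dict name -> {'address', 'free', 'object_header', 'calls'}, where
    object_header holds the planned (<= 0) delta and calls maps a call-site
    offset to an absolute target address. No method is moved here.
    """
    by_address = {m["address"]: m for m in methods.values()}
    for m in methods.values():
        if m["free"]:
            continue
        for site, target in list(m["calls"].items()):
            callee = by_address[target]  # assumes sends to freed code were already unlinked
            m["calls"][site] = target + callee["object_header"]
```

Patching the calls before moving anything keeps the two concerns separate: the addresses are fixed using the plan, and the bytes are moved later in one sweep.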
- relocateAndPruneYoungReferrers
- Here the “young referrers” are a list of methods inside the cache whose machine code has pointers to young objects. The list is updated to the methods' new locations, and methods that were freed are pruned from it.
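Assuming the young referrers list is just a list of method addresses and each live method carries its planned delta, the relocate-and-prune step reduces to one filter and one shift. `relocate_and_prune_young_referrers` and the dictionary layout are hypothetical.

```python
def relocate_and_prune_young_referrers(young_referrers, methods):
    """Drop freed methods from the list and shift the rest to their new addresses.

    young_referrers: list of method addresses.
    methods: dict address -> {'free', 'object_header' (planned delta, live methods only)}.
    """
    return [address + methods[address]["object_header"]
            for address in young_referrers
            if not methods[address]["free"]]  # pruned before the delta is read
```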
- compactCompiledCode
- The first loop updates the usage count and clears the header value (the delta is no longer needed). It advances as long as it finds methods and stops as soon as it finds free memory, so it is looking for the first free chunk.
- The second loop takes the method object (in the heap) and links it to the machine code method.
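Finally, the move itself can be sketched as below. The model is hypothetical: a `bytearray` stands in for the method zone, each live method carries its planned delta in `object_header`, and the `links` dictionary stands in for the pointer from the method object in the heap to its machine code. The sketch folds the two loops into one for brevity and halves the usage count as the counter rules above describe.

```python
def compact_compiled_code(memory, methods, links):
    """Slide live methods down by their planned delta and return the new mzFreeStart.

    memory:  bytearray standing in for the method zone.
    methods: list of dicts in address order with 'address', 'size', 'free' and, for
             live methods, 'usage', 'object_header' (planned delta) and 'method_object'.
    links:   hypothetical stand-in for the heap method -> Cog method pointer.
    """
    free_start = 0
    for m in methods:
        if m["free"]:
            continue
        old, size = m["address"], m["size"]
        new = old + m["object_header"]
        # The right-hand slice is a copy, so overlapping moves are safe.
        memory[new:new + size] = memory[old:old + size]
        m["address"] = new
        m["usage"] //= 2                 # age the counter, as in each compaction
        m["object_header"] = 0           # clear the delta: it is not needed anymore
        links[m["method_object"]] = new  # relink the heap method to its machine code
        free_start = new + size
    return free_start
```

Methods before the first free chunk have a delta of 0, so they stay in place; everything after it is packed down, leaving a single free area starting at the returned address.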