-
Notifications
You must be signed in to change notification settings - Fork 200
Breaking the 1 billion instructions per second barrier via Dynamic Recompilation and WebAssembly
No doubt, writing a fast emulator for the Web Browsers is hard. You have to deal with a lot of web technologies and none of them is naturally suited for this job. Nevertheless jor1k manages over 100 million instruction per seconds (MIPS) utilizing asm.js. But jor1k is still based on ordinary machine code interpretation.
To speed up even more you have to take a look at other emulators such as QEMU which is probably the most advanced and versatile emulator out there. It beats jor1k by one or two orders of magnitude in terms of speed. The technique behind QEMU is called just in time (JIT) recompilation or dynamic recompilation. It translates the machine code of one machine to another machine reducing almost all overhead. To accomplish this, it has to work on very low level and obviously, it can't run in the browser.
But what do you need to use the same technology in the browser?
This can already be done in JavaScript via eval and can be done in principle in WebAssembly too by filling a bytearray.
All modules can share the same memory. E. g. you can give them the same array.
You can define an interface for the compiled code and import and export functions. However this can be very slow as the code can run through an foreign function interface.
While the goto statement is frowned upon, it is still quite natural on the machine language level. Neither Javascript, nor WebAssembly support this valueable feature. However, there is a trick to do the same via switch case
and an endless loop.
var goto_label = 1;
while(1) {
switch(goto_label) {
case 1: // label 1
.... do something ....
.... fall through ....
case 2: // label 2
.... do something ....
if (condition) goto_label = 2; else goto_label = -1;
continue;
case -1: // end execution
return;
}
}
Javascript and WebAssembly doesn't support it naturally. However, like in point 4, you can do it yourself. With switch case
, you can build a logic, in which you can jump to stop and start at every instruction. Unfortunately, this enlarges the code significantly.
This is automatically done via the JavaScript engine. Just remove all links to an object.
This is not supported. You have to recompile everything. But you can use tables with function pointers, which points to small code chunks.
So it looks like, at least in theory dynamic recompilation can be used in the browser as well. And actually, I tried it, and the commit can be found here. It is completely based on JavaScript, because WebAssembly was not available at that time.
The results were devastating:
- Because of the way modern CPUs work you should only compile very small code pieces (from jump to jump). But to provide the JavaScript engines ten thousands of code chunks a second (obviously as JavaScript strings) simply does not work. I have never seen browsers crashing so fast than with this code.
- The code was more than ten times slower than without dynamic recompilation. The JavaScript engine was compiling all the time and I had to additionally manage the more complex state of the machine (tables of functions, heuristics, dirty pages). The CPU was mainly busy by doing function calls (see number 7 above). The technique asm.js was even worse, because the calls run through a Foreign Function Interface (FFI). The overhead was just so gigantic.
With these seemingly insoluble problems I gave up for more than two years.
Now, we have WebAssembly, a sort of byte code for the Internet to provide a better load-time-efficiency. In contrast to bare JavaScript the code is denser and compiles faster. Unfortunately WebAssembly does not really solve the major problems I mentioned above. It is just a little bit more efficient. However it gave me new hope, that the problems might be solved in the future. Fortunately two years later I am more experienced with low level computing and especially Linux. I came up with an idea.
- Actually, what I want is to emulate the Linux operating system. Ignore other possible applications for now.
- The applications in Linux run in the so called user mode, which is much much restrict, bus also easier to handle. This is also the mode, in which the most time consuming code runs. So let's concentrate on this.
- The executable is usually not relocated in virtual memory, just the dynamic libraries.
So the idea is: Recompile the whole binary executable before it is executed
This solves the major problem. You compile only one module, e. g one function for an application.
Let's run through the details ...
The biggest part is to write the recompiler. E. g. read the ELF file and autogenerate a huge switch statement and put all the code in it. An example of such a code can be found here: ELF executable recompiled to asm.js code At the moment, it writes an asm.js file and compiles it via Binaryien In the future the CPU will autogenerate the WebAssembly binary directly.
The CPU must know exactly which program it is currently executing. Under Linux the programs are distinguished by their process id (PID). This is just an integer. To tell the current active PID the CPU, you can put a one liner into the __switch_to
function of Linux.
The compile task still takes a lot of time. So maybe you don't want to recompile every program. Also the CPU must know, which ELF Executable you want to start next. What you need, is a small program, which executes the program you want to recompile and tell the new pid and the file name the CPU.
This software is called recompile
.
At the moment only one program is supported.
Interpreted core | Recompiled core |
---|---|
link: Interpreted | link: Dynamic |
start by typing fbdemo
|
start by typing recompile /usr/bin/fbdemo
|