Safeguards / alternatives for MAP_FIXED mmap? #73

Open
rakslice opened this issue Jan 17, 2025 · 4 comments

Comments

@rakslice
Owner

rakslice commented Jan 17, 2025

Background:

Both on a real 68k host, where the Mac memory addresses need to be real host program memory addresses, and for emulation speed on non-68k hosts, macemu can take advantage of having the Mac memory in host memory at a fixed location known at compile time, so that the offset between Mac addresses and host addresses is fixed. The general case of a fixed offset is DIRECT_ADDRESSING; the special case where that offset is 0 is REAL_ADDRESSING. For more information on addressing modes see TECH. These fixed offsets are especially useful under JIT, where they mean you don't have to shuffle a memory offset around, and either reserve another register for it or incur an access to another global on basically every load/store.
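As a rough sketch of what a fixed offset buys you, address translation collapses to a single add of a constant. The names and the constant below are made up for illustration and are not the identifiers macemu actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not the identifiers macemu uses.
// With "direct" addressing this is some nonzero compile-time constant;
// with "real" addressing it is 0 and the translation disappears entirely.
constexpr uintptr_t kMemOffset = 0x10000000;

inline uint8_t *mac_to_host(uint32_t mac_addr) {
    return reinterpret_cast<uint8_t *>(kMemOffset + mac_addr);
}

inline uint32_t host_to_mac(const uint8_t *host_ptr) {
    const uintptr_t host = reinterpret_cast<uintptr_t>(host_ptr);
    assert(host >= kMemOffset);   // must lie inside the Mac address window
    return static_cast<uint32_t>(host - kMemOffset);
}
```

Because the offset is a constant, a JIT can fold it into the addressing of each load/store instead of keeping it in a register. The catch is that the host memory really has to live at that address, which is where the fixed allocations come in.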

Of course this is used for the mac's entire main RAM and ROM, which we know the sizes of at launch time.

But it is also used for the framebuffer. The various SDL video implementations basically assume you are using one of the fixed addressing modes, and always use their vm_acquire_framebuffer() to do a fixed allocation for the framebuffer, using whatever suitable OS platform functionality exists in vm_alloc.cpp. The way the code is organized, it needs to free and reallocate that framebuffer whenever the size changes because we change the video mode.

To support a fixed address block on a modern OS with virtual addressing you need to allocate process memory at fixed addresses.

On a platform where there is mmap() that supports the MAP_FIXED flag we use that...

The issue:

Unfortunately the common mmap() MAP_FIXED behaviour is that if there is already a memory allocation in the process address space covering the memory address range you asked for, it just cuts it up and deletes the part that overlaps the request and gives it to the mmap() caller instead.

And with multiple threads potentially making allocations you can't just use the API to check there are no pages currently in the range and then allocate.
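To make the hazard concrete, here is a small sketch (the function name is hypothetical) that maps a page, writes to it, and then asks for a MAP_FIXED mapping at the same address. It assumes a POSIX system where MAP_ANONYMOUS is available.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the byte the first mapping's pointer sees after a second MAP_FIXED
// mapping has been placed on top of it (0 means the old contents are gone).
int value_after_fixed_remap() {
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));

    // An ordinary anonymous mapping; the kernel picks the address.
    void *p = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    volatile uint8_t *victim = static_cast<uint8_t *>(p);
    victim[0] = 42;

    // MAP_FIXED at the same address: no error, the old page is just replaced.
    void *q = mmap(p, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    assert(q == p);

    const int seen = victim[0];   // fresh zero-filled page, not 42
    munmap(q, page);
    return seen;
}
```

The second mmap() succeeds silently, and whoever still holds the old pointer now reads and writes someone else's memory.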

Let me just quote the Linux manpage verbatim:

       MAP_FIXED
              Don't interpret addr as a hint: place the mapping at
              exactly that address.  addr must be suitably aligned: for
              most architectures a multiple of the page size is
              sufficient; however, some architectures may impose
              additional restrictions.  If the memory region specified
              by addr and length overlaps pages of any existing
              mapping(s), then the overlapped part of the existing
              mapping(s) will be discarded.  If the specified address
              cannot be used, mmap() will fail.

              Software that aspires to be portable should use the
              MAP_FIXED flag with care, keeping in mind that the exact
              layout of a process's memory mappings is allowed to change
              significantly between Linux versions, C library versions,
              and operating system releases.  Carefully read the
              discussion of this flag in NOTES!

[From NOTES:]

Using MAP_FIXED safely
       The only safe use for MAP_FIXED is where the address range
       specified by addr and length was previously reserved using
       another mapping; otherwise, the use of MAP_FIXED is hazardous
       because it forcibly removes preexisting mappings, making it easy
       for a multithreaded process to corrupt its own address space.

       For example, suppose that thread A looks through /proc/pid/maps
       in order to locate an unused address range that it can map using
       MAP_FIXED, while thread B simultaneously acquires part or all of
       that same address range.  When thread A subsequently employs
       mmap(MAP_FIXED), it will effectively clobber the mapping that
       thread B created.  In this scenario, thread B need not create a
       mapping directly; simply making a library call that, internally,
       uses dlopen(3) to load some other shared library, will suffice.
       The dlopen(3) call will map the library into the process's
       address space.  Furthermore, almost any library call may be
       implemented in a way that adds memory mappings to the address
       space, either with this technique, or by simply allocating
       memory.  Examples include brk(2), malloc(3), pthread_create(3),
       and the PAM libraries ⟨http://www.linux-pam.org/⟩.

       Since Linux 4.17, a multithreaded program can use the
       MAP_FIXED_NOREPLACE flag to avoid the hazard described above when
       attempting to create a mapping at a fixed address that has not
       been reserved by a preexisting mapping.

Here's FreeBSD:

       MAP_FIXED          Do not permit the system to select a different
                          address than the one specified.  If the specified
                          address cannot be used, mmap() will fail.  If
                          MAP_FIXED is specified, addr must be a multiple of
                          the page size.  If MAP_EXCL is not specified, a
                          successful MAP_FIXED request replaces any previous
                          mappings for the process' pages in the range from
                          addr to addr + len.  In contrast, if MAP_EXCL is
                          specified, the request will fail if a mapping
                          already exists within the range.
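The two flags quoted above can be wrapped in one helper. This is only a sketch: `map_fixed_noreplace` and `kNoReplaceFlags` are hypothetical names, and on platforms with neither flag it falls back to passing the address as a plain hint. The address check at the end follows the Linux man page's advice that older kernels, which don't recognize MAP_FIXED_NOREPLACE, may return a different address instead of failing.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Extra flags that ask for "exactly this address, but never replace".
#if defined(MAP_FIXED_NOREPLACE)
static const int kNoReplaceFlags = MAP_FIXED_NOREPLACE;      // Linux 4.17+
#elif defined(MAP_EXCL)
static const int kNoReplaceFlags = MAP_FIXED | MAP_EXCL;     // FreeBSD
#else
static const int kNoReplaceFlags = 0;  // no such flag: addr is only a hint
#endif

// Hypothetical helper: map `size` bytes at exactly `addr`, or fail.
void *map_fixed_noreplace(void *addr, size_t size) {
    // Fixed-address requests need a page-aligned address.
    assert(reinterpret_cast<uintptr_t>(addr) % sysconf(_SC_PAGESIZE) == 0);
    void *p = mmap(addr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | kNoReplaceFlags, -1, 0);
    // Without kernel support the address is treated as a hint, so a mapping
    // that landed somewhere else is undone and reported as a failure.
    if (p != MAP_FAILED && p != addr) {
        munmap(p, size);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```

With a helper like this the caller gets an error it can report and abort on, instead of a silent overlap.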

OpenBSD:

If the MAP_FIXED flag is specified, the allocation will happen at the specified address, 
replacing any previously established mappings in its range. 

Solutions:

There's no way to recover if the fixed memory allocation is really needed and that part of the process address space is already allocated. As far as already-allocated memory is concerned, the only possible improvement to a given fixed allocation call is to detect that it would overlap and crash out, so that we at least don't give the user a partly working system that is slowly destroying itself. But of course, if we can do the fixed allocations more wisely in the first place, i.e. 1) do them once and never change them and 2) do them as soon as possible at program start so the least possible other stuff is allocated, that would be an improvement irrespective of whether we can detect the overlap or not.

rakslice changed the title from "Safeguards for MAP_FIXED mmap?" to "Safeguards / alternatives for MAP_FIXED mmap?" on Jan 17, 2025
@rakslice
Owner Author

rakslice commented Jan 17, 2025

Recently I ran into a problem with pelya's Android port of a Basilisk II codebase where the video mode change was working some of the time but other times it would result in video corruption and/or crashes.

Guess what?

The video corruption happened when the framebuffer allocated at a fixed address by vm_acquire_framebuffer() (the_buffer) partly overlapped the change detection's scratch copy (the_buffer_copy), which is a regular calloc() allocation; i.e. the MAP_FIXED allocation had taken memory pages that already belonged to the allocator and were in use by an allocation the allocator had handed out.

This resulted in repeating patterns of a very spooky automata-looking nature (pelya/BasiliskII-android#9 (comment)), because the change detection would copy the changed framebuffer to the copy, but because of the overlap that copy would go into the original framebuffer at a different screen location, which would then trigger the change detection again for that second location and get copied to a third location, etc. (rinse, repeat)
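The feedback loop can be reproduced in a toy model. This is not the real change detection code: it is a simplified block-by-block compare in which the scratch copy is deliberately placed a few blocks into the framebuffer itself. The function name and the 64-byte block size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model: the scratch copy starts `overlap_chunks` blocks into the
// framebuffer, as if MAP_FIXED had taken pages the allocator already used.
// Returns how many framebuffer blocks hold the drawn data after one pass.
int chunks_touched_by_one_pass(int total_chunks, int overlap_chunks) {
    assert(total_chunks > 0 && overlap_chunks > 0);
    const size_t chunk = 64;                         // bytes per compared block
    std::vector<uint8_t> mem((total_chunks + overlap_chunks) * chunk, 0);
    uint8_t *fb = mem.data();                        // stands in for the_buffer
    uint8_t *fb_copy = fb + overlap_chunks * chunk;  // overlapping the_buffer_copy

    std::memset(fb, 0xFF, chunk);                    // guest draws into block 0 only

    // Simplified change detection: compare block by block, refresh the copy.
    for (int i = 0; i < total_chunks; i++) {
        uint8_t *src = fb + i * chunk;
        uint8_t *dst = fb_copy + i * chunk;
        if (std::memcmp(src, dst, chunk) != 0)
            std::memcpy(dst, src, chunk);            // this also writes into fb
    }

    int touched = 0;
    for (int i = 0; i < total_chunks; i++)
        if (fb[i * chunk] == 0xFF) touched++;
    return touched;
}
```

With 16 blocks and an overlap of 4, a single write to block 0 ends up in blocks 0, 4, 8 and 12 after one pass, which is the same kind of repeating pattern seen on screen.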

But the problem wasn't just video stuff; for instance quitting would sometimes hang in PrefsExit(), which is literally just the simplest possible function that traverses and frees the linked list of prefs and can't possibly hang unless all the pointers have been overwritten with garbage.

In that Android port's case, since it was the slowest plain vanilla interpreted 68k emulating Basilisk II using "virtual" addressing (i.e. the Mac memory is just a glorified array of bytes you got with glorified malloc()), I could just update video_sdl.cpp to malloc() some more bytes instead of using a fixed allocation, and the problems went away.

But let's look into the case of the video mode change on systems where the fixed allocation is needed. To make it simple let's set aside the use_vosf case for now.

The sequence of events for a video mode change at a high level is:

void SDL_monitor_desc::switch_to_current_mode(void)
{
	// Close and reopen display
	LOCK_EVENTS;
	video_close();
	video_open();
	UNLOCK_EVENTS;

video_close() does some stuff and then last thing destructs the video driver object:

	delete drv;
	drv = NULL;
}

That destructor releases the old framebuffer allocation:

driver_base::~driver_base()
{
...
		vm_release_framebuffer(the_buffer, the_buffer_size);

which munmap()s it.

Then the video_open() constructs a new video driver object and calls its init():

	// Create display driver object of requested type
	drv = new(std::nothrow) driver_base(*this);
	if (drv == NULL)
		return false;
	drv->init();

That sets the video mode:

void driver_base::init()
{
	set_video_mode(display_type == DISPLAY_SCREEN ? SDL_FULLSCREEN : 0);
	int aligned_height = (VIDEO_MODE_Y + 15) & ~15;

Which results in a call to SDL_SetVideoMode(), which has to allocate an unknown number of additional structures behind the scenes, down in the SDL driver implementation for your platform.
Then init() allocates the copy buffer using calloc():

		// Allocate memory for frame buffer
		the_buffer_size = (aligned_height + 2) * s->pitch;
		the_buffer_copy = (uint8 *)calloc(1, the_buffer_size);

and only then does it do a fixed framebuffer allocation at the same location (with a new size):

		the_buffer = (uint8 *)vm_acquire_framebuffer(the_buffer_size);

So even in the ideal world where the framebuffer was always the same size and you could only clobber things allocated from when it was freed to when it was allocated again, that's still a fair amount of things, and that's not considering what is happening in other threads. But actually it's not the same size, it's potentially expanding, so it can clobber other things besides.
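To put numbers on the "potentially expanding" part, the size computation quoted above can be evaluated for a couple of modes. The helper name is hypothetical, and the pitch values assume 32-bit pixels with no row padding, which SDL does not guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the size computation quoted above; `pitch` is bytes per scanline.
inline size_t framebuffer_size(int height, size_t pitch) {
    assert(height > 0);
    const int aligned_height = (height + 15) & ~15;   // round up to a multiple of 16
    return static_cast<size_t>(aligned_height + 2) * pitch;
}
```

Under those assumptions, 640x480 (pitch 2560) needs 1,233,920 bytes and 1024x768 (pitch 4096) needs 3,153,920 bytes. Switching up from the former to the latter therefore asks MAP_FIXED for almost 2 MB of address space beyond what the old framebuffer occupied.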

@rakslice
Owner Author

rakslice commented Jan 17, 2025

There are a variety of tricks already used to address the problem of not really being able to change a fixed allocation like this, that exist in a quality spectrum from a little better than handwaving to kinda sorta working some substantial percentage of the time:

  • Various platforms nudge the offset of the fixed allocations to places that are less likely to hit other memory use
  • There's a scheme with vm_init_reserved()/vm_acquire_reserved() where the init part allocates as big a piece as the biggest framebuffer you will ever possibly need, and then vm_acquire_reserved() just does a size check and gives you the existing allocation and it is never freed (and I guess hopefully you can do the init part soon enough after launch that there's very little that you could run into from before)
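The reserve-once scheme in the second bullet could look roughly like the sketch below. It is not the actual vm_init_reserved()/vm_acquire_reserved() code; the names, and the choice to pass the wanted address as a plain hint, are assumptions for illustration.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper names; this is not the actual vm_alloc.cpp code.
static void  *g_reserved_base = nullptr;
static size_t g_reserved_size = 0;

// Call once, as early as possible: claim the largest framebuffer ever needed.
// `addr_hint` may be nullptr to let the kernel choose the address.
bool reserve_framebuffer(void *addr_hint, size_t max_size) {
    assert(max_size > 0);
    void *p = mmap(addr_hint, max_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    if (addr_hint != nullptr && p != addr_hint) {   // wanted address was taken
        munmap(p, max_size);
        return false;
    }
    g_reserved_base = p;
    g_reserved_size = max_size;
    return true;
}

// Every later mode change: size check only, no new mapping, nothing freed.
void *acquire_reserved_framebuffer(size_t size) {
    if (g_reserved_base == nullptr || size > g_reserved_size) return nullptr;
    return g_reserved_base;
}
```

Since the mapping is never released, no other allocation can move into the range between mode changes. Anonymous pages are typically only backed by physical memory once touched, so reserving the maximum size up front should cost mostly address space.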

@rakslice
Owner Author

rakslice commented Jan 17, 2025

Some other hypothetical options:

  • As you can see, various versions of platforms have an additional mmap() option that makes mmap() fail if the request overlaps an existing mapping: on Linux 4.17+ MAP_FIXED_NOREPLACE, which is used in place of MAP_FIXED, and on FreeBSD MAP_EXCL, which is used along with MAP_FIXED
  • Like I did for that "virtual" memory Basilisk II situation, make sure we don't do fixed allocations in cases where there is no reason to do them rakslice/BasiliskII-android@81633a5
  • Get memory at a fixed location by just exhaustively doing regular mmap()s until we've got one for the addresses of interest (up to some "giving up" threshold) and then releasing all the unused ones. I implemented a version of this a while ago to deal with the absence of MAP_32BIT on some 64-bit platform, that never needed to make it out of my local repo at the time.
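The last option might look something like the following. This is a guess at the general shape, not the local implementation mentioned above: the function name, the range-based interface and the use of the range start as an mmap() hint are all assumptions.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: get `size` bytes somewhere inside [lo, hi) using only
// ordinary (non-MAP_FIXED) mmap() calls, so nothing existing is ever replaced.
void *acquire_in_range_by_probing(void *lo, void *hi, size_t size, int max_attempts) {
    const uintptr_t lo_addr = reinterpret_cast<uintptr_t>(lo);
    const uintptr_t hi_addr = reinterpret_cast<uintptr_t>(hi);
    assert(size > 0 && lo_addr < hi_addr);
    std::vector<void *> rejects;   // held so the kernel cannot hand them out again
    void *result = MAP_FAILED;

    for (int i = 0; i < max_attempts; i++) {
        // `lo` is only a hint here; the kernel is free to ignore it.
        void *p = mmap(lo, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) break;
        const uintptr_t a = reinterpret_cast<uintptr_t>(p);
        if (a >= lo_addr && a + size <= hi_addr) { result = p; break; }
        rejects.push_back(p);      // wrong place: keep it and try again
    }

    // Release everything that landed outside the range of interest.
    for (void *r : rejects) munmap(r, size);
    return result;
}
```

The worst case is that the function gives up and returns MAP_FAILED; it never removes anyone else's mapping. That makes it usable even where no MAP_FIXED_NOREPLACE or MAP_EXCL equivalent exists.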

@rakslice
Owner Author

rakslice commented Jan 18, 2025

I vaguely remember that previous issues around this had led to an allocation for the framebuffer being done very early in the process, maybe in main() along the lines of when the ROM and RAM allocations were put together and when the black screen issue on some video cards was still current, and the issue with the framebuffer allocation coming up at a lower address than memory. That may not have been unix, or it may have been unix and maybe it was X11 or whatever.
