
Database reader does not work for mmdb files larger than 2gb #154

Open
nonetallt opened this issue Feb 1, 2024 · 12 comments

@nonetallt

Trying to create a new reader (with or without a cache) for a 2.5 GB database:

new Reader(file, new CHMCache());

Results in the following exception:

java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
	at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1183)
	at com.maxmind.db.BufferHolder.<init>(BufferHolder.java:31)
	at com.maxmind.db.Reader.<init>(Reader.java:119)
	at com.maxmind.db.Reader.<init>(Reader.java:69)
	at app.service.geoip.LocalFilesystemGeoipDataProvider.loadDatabaseReader(LocalFilesystemGeoipDataProvider.java:72)

A quick search of the error message reveals that a ByteBuffer is indexed with an int, so its size is limited to Integer.MAX_VALUE bytes, i.e. roughly 2 GB.
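
For what it's worth, the limit can be reproduced without this library at all; a minimal sketch (the file name is a placeholder):

    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;

    public class MapLimitDemo {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("large.mmdb", "r");
                 FileChannel channel = raf.getChannel()) {
                // FileChannel.map takes a long size but rejects anything over
                // Integer.MAX_VALUE, because MappedByteBuffer is int-indexed.
                // For a file larger than 2 GB the next line throws:
                // java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            }
        }
    }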

Is there any workaround or fix for this, or is the reader simply unusable with a larger database?

@oschwald
Member

oschwald commented Feb 1, 2024

Unfortunately, I don't think there is a quick fix to this. If I recall correctly, it is also a limitation of Java's memory mapping. We would probably need to store an array of ByteBuffers and have multiple memory maps as well.
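
For illustration, a rough sketch of that multiple-map idea, assuming fixed-size chunks (this is not how the library works today):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch: map a file larger than 2 GB as an array of MappedByteBuffers,
    // each covering at most CHUNK bytes of the file.
    final class ChunkedMaps {
        static final long CHUNK = Integer.MAX_VALUE; // per-map ceiling

        static MappedByteBuffer[] mapAll(FileChannel channel) throws IOException {
            long size = channel.size();
            int count = (int) ((size + CHUNK - 1) / CHUNK);
            MappedByteBuffer[] maps = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long offset = i * CHUNK;
                long length = Math.min(CHUNK, size - offset);
                maps[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            }
            return maps;
        }
    }

Most of the added complexity in a real implementation would be in reads that straddle a chunk boundary.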

@nonetallt
Author

nonetallt commented Feb 1, 2024

> Unfortunately, I don't think there is a quick fix to this. If I recall correctly, it is also a limitation of Java's memory mapping. We would probably need to store an array of ByteBuffers and have multiple memory maps as well.

Any chance you would consider changing the internals to account for this limitation? Understandably, this might not be something that can be fixed with just a snap of the fingers, but just how large of a refactor are we talking about here?

@oschwald
Member

oschwald commented Feb 1, 2024

Given that all the MaxMind databases are well under this limit, it seems unlikely that we would implement this ourselves in the near future. The change will be at least somewhat invasive and I suspect it will harm the performance for smaller databases. We would consider merging a PR to address the problem if the impact on existing users was minimal.

In terms of how large of a refactor this would be, I suspect you would need to modify BufferHolder significantly and you would need to replace the use of ByteBuffer throughout the code with a similar abstraction that can handle more than 2 GB.
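
To make the scale of that refactor concrete, the replacement abstraction might look roughly like this (a sketch only; the interface name and methods are invented and not part of the library):

    // Hypothetical long-addressed stand-in for ByteBuffer; every read takes a
    // long offset instead of an int.
    interface LargeBuffer {
        byte get(long index);                                     // one byte at an absolute offset
        void get(long index, byte[] dst, int offset, int length); // bulk absolute read
        long capacity();                                          // total size, may exceed 2 GB
    }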

@nonetallt
Author

nonetallt commented Feb 1, 2024

Alright, thank you for the prompt answer. I don't think I currently have the resources to fix it myself, given my general lack of knowledge of the library's internals.

If you are concerned about performance, it would probably make sense to either have two different memory-handling implementations or a whole separate reader class.

@shawjef3

I created a PR to fix this. #222

@oschwald
Member

For both of you, may I ask what databases you are using that are so large? I believe MaxMind's largest publicly distributed database is under 400 MB in size.

Although it seems like we will need to fix this issue eventually, we are not ready to accept a PR that significantly increases the public API or the complexity of the code to do it. I still believe it should be doable with a relatively moderate change that doesn't increase the public API at all and where almost all of the additional complexity is limited to a class that wraps multiple ByteBuffer objects.

@nonetallt
Author

> For both of you, may I ask what databases you are using that are so large? I believe MaxMind's largest publicly distributed database is under 400 MB in size.

The largest single database in my use case was around 1.5 GB, provided by ipinfo. However, it is possible to merge multiple databases together using a tool like mmdbmeld, increasing the size further.

The two main reasons I see for using a merged database are:

  1. Multiple providers to cover a maximal range of IP addresses.
  2. Adding additional data points, avoiding having to run queries against multiple databases. For example, one database might include country information while another flags addresses associated with a VPN. If the application has to check for both, it makes sense to execute only one query to gather all the relevant data (see the sketch below).
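
As a sketch of reason 2: with a merged database, a single lookup can return every data point at once. The file name, IP address, and record fields below are placeholders; Reader.get is the library's lookup method:

    import java.io.File;
    import java.net.InetAddress;
    import java.util.Map;

    import com.maxmind.db.CHMCache;
    import com.maxmind.db.Reader;

    public class MergedLookup {
        public static void main(String[] args) throws Exception {
            // One reader over a merged database instead of one reader per provider.
            try (Reader reader = new Reader(new File("merged.mmdb"), new CHMCache())) {
                InetAddress address = InetAddress.getByName("203.0.113.7");
                Map<?, ?> record = reader.get(address, Map.class);
                // A merged record could carry both data points at once, e.g. a
                // country field from one source and a VPN flag from another.
                System.out.println(record);
            }
        }
    }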

@shawjef3

shawjef3 commented Feb 11, 2025

@oschwald I have a patch for #229 that gives people the option of implementing a reader for files > 2 GB. I believe my patch is backwards compatible. It introduces an interface and a constructor for Reader. Are those changes acceptable?

@oschwald
Member

@shawjef3, it is a relatively big change and increases the public API significantly, e.g., the new interfaces. It is also not the easiest for the user to use as they need to provide their own ByteReader implementation in this case. I think the ideal solution would be something with no public API change and no new dependencies that just used multiple ByteBuffer values when the database exceeded 2 GB.

@shawjef3

Okay. I give up.

@shawjef3

I edited my comment. I meant to ask about the patch I have for #229.

@oschwald
Member

> Okay. I give up.

I apologize that this has been frustrating. In general, I would recommend talking through proposed changes before making pull requests, especially large pull requests.

You are also free to maintain a fork of this repo with any changes that you deem appropriate. There is no reason that there has to be only one MMDB reader for Java. Many languages have several competing implementations that have different priorities.

As for the underlying issue here, I would like to see it fixed, but it has not been a priority for us, as all of our databases are so far from the limit. The fix I have in mind would not add any additional dependencies, would not require any changes from the user to use a larger database, and would not materially impact the performance for databases under 2 GB. Under the hood, I suspect it would provide two different ByteBuffer wrappers for performance reasons, one for the single-ByteBuffer case and one for the multiple-ByteBuffer case; that said, it is possible that a single implementation would add little enough overhead to serve both cases.
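
A minimal sketch of that shape, reusing the hypothetical LargeBuffer interface from earlier in this thread (all names invented, not a committed design):

    import java.nio.ByteBuffer;

    // Hypothetical fast path: a thin wrapper over one ByteBuffer, so databases
    // under 2 GB keep a direct int-indexed read.
    final class SingleBuffer implements LargeBuffer {
        private final ByteBuffer buffer;

        SingleBuffer(ByteBuffer buffer) { this.buffer = buffer; }

        public byte get(long index) { return buffer.get((int) index); }

        public void get(long index, byte[] dst, int offset, int length) {
            ByteBuffer view = buffer.duplicate(); // independent position state
            view.position((int) index);
            view.get(dst, offset, length);
        }

        public long capacity() { return buffer.capacity(); }
    }

    // Hypothetical slow path: routes each long offset to the chunk holding it.
    final class MultiBuffer implements LargeBuffer {
        private final ByteBuffer[] chunks;
        private final long chunkSize;

        MultiBuffer(ByteBuffer[] chunks, long chunkSize) {
            this.chunks = chunks;
            this.chunkSize = chunkSize;
        }

        public byte get(long index) {
            return chunks[(int) (index / chunkSize)].get((int) (index % chunkSize));
        }

        public void get(long index, byte[] dst, int offset, int length) {
            // Byte-at-a-time for brevity; a real version would copy chunk by
            // chunk and handle reads that straddle a chunk boundary.
            for (int i = 0; i < length; i++) {
                dst[offset + i] = get(index + i);
            }
        }

        public long capacity() {
            long total = 0;
            for (ByteBuffer chunk : chunks) {
                total += chunk.capacity();
            }
            return total;
        }
    }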
