Document source of various metadata files #1506

Open
steven-sheehy opened this issue Dec 10, 2024 · 3 comments

@steven-sheehy
Contributor

The cht, dat, and metadat folders contain files whose source of truth is unclear. Without knowing the source, it's hard to know if the data needs to be updated. Also, if the source were known, automated processes could be developed to keep the files up to date, or alternative sources could be suggested. For example, it might be interesting to replace most of the metadat files with code that scrapes a game database like screenscraper.fr or thegamesdb.net.

This documentation could be in the form of a simple README inside each specific sub-folder or a comment with a URL added to the header of the dat file.
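
As a rough illustration of the "URL in the header of the dat file" option, here is a minimal Python sketch that reads a source URL back out of a dat header. It assumes the header is a clrmamepro-style block of `key "value"` lines and that the source would be recorded under a `homepage` or `url` key; the helper names and the sample content are hypothetical, not taken from the repository.

```python
import re
from typing import Dict, Optional

# Assumed layout: a header block `clrmamepro ( ... )` made of `key "value"` lines.
HEADER_RE = re.compile(r'clrmamepro\s*\((.*?)\n\)', re.DOTALL)
FIELD_RE = re.compile(r'^\s*(\w+)\s+"([^"]*)"', re.MULTILINE)


def header_fields(dat_text: str) -> Dict[str, str]:
    """Return the key/value pairs found in the dat header, or {} if there is no header."""
    match = HEADER_RE.search(dat_text)
    if match is None:
        return {}
    return dict(FIELD_RE.findall(match.group(1)))


def source_of(dat_text: str) -> Optional[str]:
    """Return the documented source URL, if the header names one."""
    fields = header_fields(dat_text)
    return fields.get("homepage") or fields.get("url")


# Hypothetical sample content, for illustration only.
sample = '''clrmamepro (
    name "Example System"
    description "Example System"
    homepage "https://example.org/dats"
)

game (
    name "Example Game (USA)"
)
'''

print(source_of(sample))
```

A check like this could be run over every dat to list the files that still have no documented source.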

@OctopusButtons
Contributor

OctopusButtons commented Dec 30, 2024

> files whose source of truth is unclear. Without knowing the source, it's hard to know if the data needs to be updated

I thought the source list in the readme looks good. Is that new from after you posted the issue, or what kind of source do you mean?

But yes, I was looking for clearer documentation about exactly which parts make sense for GitHub contributions versus which parts are imported periodically from Redump etc. (which would overwrite in-house contributions). I'm looking through the Histories to try to determine this, in order to update the Readme.

Case Study Questions
The questions below came up when I tried to find out where RetroArch is getting the name "Virtua Fighter 4 - Evolution [Greatest Hits] (USA)", which then doesn't match any thumbnails.

  • dat documentation says it's "Customized DAT files, maintained by the libretro team"
    • How does it function alongside all the other dat files?
      • Answer: It has precedence because it's earlier in the list.
    • Is the reason so that we can work around / fix issues or gaps from the Redump etc groups, with our own addenda? If yes, that's great, and I'll PR added documentation, but I'm not sure.
    • What is the purpose or function for example of the SNES dat, is it because No-Intro excludes Virtual Console variants?
      • Answer: I'm surprised to find that No-Intro doesn't catalog Virtual Console SNES variants (as of 2025), and doesn't state that policy in any obvious place. I had assumed the libretro file was made because No-Intro just hadn't logged VC variants yet at the time. Yet I see No-Intro logs VC variants of a GBA game, but not VC SNES examples.
    • Conflicts? If there's conflicting info between dat and metadat (e.g. imported from Redump) which one wins?
      • Answer: Earlier items in the dat list take precedence. Hence the in-house dat can be used to override problems from No-Intro etc.
  • metadat/developer connects, for example, Virtua Fighter 4 - Evolution [Greatest Hits] (USA) to serial SLUS-20616 while the redump file instead uses SLUS-20616GH
    • Why is that a separate database? Shouldn't it be covered by the redump metadata (below)? In the case of Virtua Fighter 4 - Evolution [Greatest Hits], the developer dat info is only that name and a serial (which apparently conflicts with the redump "GH" serial). The metadat/developer dat doesn't always have checksum hashes to connect a file to its info.
  • metadat/redump (link) doesn't have any [Greatest Hits] subtitle/tag for any Virtua Fighter 4. Yet Redump's own website lists the "Greatest Hits" version tag in an "Edition" field, not in the name. And the metadat/redump file connects "Virtua Fighter 4 - Evolution (USA)" (no [Greatest Hits] title tag) to serial "SLUS-20616GH".
  • rdb is compiled and I don't have tools yet to open and view directly, therefore I'm looking at all the component files that went into it.
    • How does the final rdb assign the [Greatest Hits] title tag to my Virtua Fighter 4 Evolution file, when the redump database doesn't have the [Greatest Hits] title tag, and when the metadat/developer database has [Greatest Hits] name info but connects it to SLUS-20616 not SLUS-20616GH?
      • Answer: Presumably either a different (earlier) dat takes precedence for the same item, and/or the CRC maps to a serial and then the serial maps to the final title in a different dat. Or the file itself carries a serial in its metadata. Researching and tracing.
  • Pending. Will trace using my file hash to see which/where dat sequence hypothetically assigns the final RA name and serial, then update comment, and maybe update documentation accordingly if I learn.
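
To make the "earlier dat wins" rule from the answers above concrete, here is a minimal Python sketch of a first-wins merge that also records which dat supplied each field. It is only an illustration under stated assumptions: that entries are matched by CRC and that later dats can still fill in fields the earlier ones lack. The real build may match and merge differently, and every name, serial, and CRC below is a made-up placeholder.

```python
# Hypothetical, hand-written records; the CRC values are placeholders, not real hashes.
# Each dat is (label, {crc: {field: value}}), listed in precedence order.
dats = [
    ("dat (in-house)", {
        "AAAA0001": {"name": "Game A (USA) (Virtual Console)"},
    }),
    ("metadat/upstream", {
        "AAAA0001": {"name": "Game A (USA)", "serial": "XXXX-00001"},
        "BBBB0002": {"name": "Game B (USA)", "serial": "XXXX-00002"},
    }),
]


def merge_first_wins(dats):
    """Merge dats so that, per field, the earliest dat in the list wins."""
    merged = {}
    for label, entries in dats:
        for crc, fields in entries.items():
            record = merged.setdefault(crc, {})
            for field, value in fields.items():
                # setdefault keeps an existing value, so later dats only fill gaps.
                record.setdefault(field, (value, label))
    return merged


merged = merge_first_wins(dats)
for field, (value, label) in merged["AAAA0001"].items():
    print(f"{field}: {value!r} (from {label})")
```

Keeping the label next to each value is the useful part for tracing: it shows which dat contributed the final name and which contributed the serial.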

Big Question: which databases should people contribute to via libretro github databases? Versus which changes should only go through those groups rather than libretro's github dat addenda? It's obvious in cases like metadat/no-intro, metadat/tosec, and metadat/redump, but many others are less straightforward. I'm looking through the Histories and will update the ReadMe with specifics about each dat.

I'll update the documentation, after I understand it well enough.

@steven-sheehy
Contributor Author

steven-sheehy commented Dec 30, 2024

> I thought the source list in the readme looks good. Is that new from after you posted the issue, or what kind of source do you mean?

Sorry, I should've been more clear. The README mostly documents the source of the dats, metadat/no-intro, metadat/redump, metadat/tosec, and metadat/mame* folders. But not the cht or the rest of the metadat folders. Their source of truth is still unclear to me.

> Big Question: Should people contribute to libretro github databases where appropriate? Or do the Redump databases get copied periodically from Redump etc, meaning changes should only go through Redump etc?

I'm not an expert, but I believe Rob periodically syncs the metadat/no-intro, metadat/redump, metadat/tosec, and metadat/mame* folders from the upstream dats using his libretro-dats. See his most recent PR for an example. So I don't think you would want to manually update those files or the generated RDB, but the rest are fair game.
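
That split could be written down as a small check, sketched here in Python. The pattern list reflects only my belief stated above, not a confirmed project policy, and the `rdb/*` pattern and the example file names are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Folders believed (in this thread) to be synced from upstream dats.
# "rdb/*" assumes the generated RDB files live in an rdb folder.
UPSTREAM_SYNCED = [
    "metadat/no-intro/*",
    "metadat/redump/*",
    "metadat/tosec/*",
    "metadat/mame*/*",
    "rdb/*",
]


def is_upstream_synced(path: str) -> bool:
    """Return True if a manual edit to this path would likely be overwritten by a sync."""
    return any(fnmatchcase(path, pattern) for pattern in UPSTREAM_SYNCED)


# Hypothetical file names, for illustration only.
for path in ["metadat/redump/Example.dat", "metadat/developer/Example.dat", "cht/Example.cht"]:
    kind = "upstream-synced" if is_upstream_synced(path) else "fair game"
    print(f"{path}: {kind}")
```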

@OctopusButtons
Contributor

OctopusButtons commented Dec 31, 2024

Oh I see what you mean. Though cht specifically I figure is any source / anywhere / any contribution, since cheat codes are a low-stakes side perk in the app. But yeah: I'm going through and researching each sub-database in order to update the per dat bullet list with

  • A) description. (I'm now an expert on the history of "Ukie" because of the elspa dat...)
  • B) sourcing
  • C) clear flag on user-contribution-relevant or not.

If anyone can give any casual info, I'll revise/combine/format additions to Readme.
