Non-radiocarbon dates: the bazAARverse #91
Comments
From my perspective, it actually does not make so much sense to put OSL and Dendro into a package that is called c14bazaar. That would probably go much better into an oslbazaar and dendrobazaar package. Also, one could envisage extracting the filter functions (that might be useful for all the bazaars) into another package bazaarSanitizaar? |
I generally agree with you. My experience with On the other hand it's tedious to maintain and establish multiple packages. I would only invest this time if we have a solid number of expected users. |
Admittedly, multiple packages require a bit more coordination. But on the other hand, the amount of code to maintain should stay approximately the same, shouldn't it? |
Exactly that's why I wanted to avoid the overhead of multiple packages. But you convinced me. We could call it the bazAARverse with the packages
and the general helper package bazAAR. Now we only need a team of 5 and 3 weeks of time 👍. If only we could justify this investment... |
Can I detect a hint of sarcasm in those lines? ;-) |
Maybe a pinch. But seriously: This would be fantastic. A great standardization challenge that could advance the field. Some discussions I recently shared with @stschiff inspired me to think in bigger dimensions again. I only fear we're doing this mostly for ourselves at the moment. Is there a possibility that we reach the critical mass to make this an established tool in our field? Some developments in the last weeks give me hope, but I'm still wondering. |
This sounds mysterious. Nevertheless, starting with osl and dendro would be already a (maybe manageable?) enterprise and a 'leap forward'. Although, adna is also very tempting... |
I started to work on c14bazAAR because I profited from it for my own research projects (although the work went way beyond that at some points). I think this connection is crucial to do this in a feasible way. @yesdavid Do you think you would need OSL (and/or Uranium-Thorium etc.) dates for your research? If yes, could you imagine creating and maintaining such packages if you receive proper support from us? Maybe this is interesting for @felixriede as well? @MartinHinz Are you in a position where this would apply to you concerning dendro dates? Are there even open (!) databases out there for this kind of data? I could volunteer to coordinate the process and apply the necessary changes to c14bazAAR to disentangle 14C-related functions and general functions. I'm sure @dirkseidensticker would be on board as well. Is this a good way to approach this? A good investment of our time? I see this as a long-term, slow-paced transformation. |
We can look into it, perhaps experiment with data import from existing databases for the Palaeolithic, as part of @yesdavid's project. We'll put it on our to-do list. |
I will volunteer for the dendro part! Count me in! |
What a great discussion! I see great value in the way we approached standardization within c14bazAAR, which could be translated to other kinds of data as well.
I am very much thrilled about such an approach! We would need to discuss how the logic we already have should/could be split. A lot of our efforts with regard to standardization were aimed at the metadata that are associated with 14C dates. Our approaches towards 'thesaurification' in particular are only scratching the surface as of yet. @nevrome is right, a critical mass is important, as is a focus on research questions that benefit from such 'investments' of time and energy. Two action points from me:
Btw: aDRAC contains a few OSL dates as well ... might be a good time to turn myself in 😉 |
One interesting aspect @yesdavid brought forward: The amount of data we have manually compiled to simplify oddly specific sample material descriptions could be enough to try machine learning. I guess he was joking, but I would love to give this a try one day.
I got the impression the topic for the Hackathon is already pretty much fixed. But as this might not be the last one of these events, I think this is a good idea.
I think this is the kind of data where it might be easiest to contact the authors and ask for a data publication in a long-term archive.
Off with his head! But seriously: We should check all databases for this: #92 |
Machine learning sounds good! The only downside is that the results should be consistent for every user, and you cannot trust the machines to guarantee that on the client side. On the 'server side', though, this might be worthwhile.
Hackathon: not at CAA, but what about a virtual hackathon, or let's call it a sprint, on the GitHub repo one of these days?
Paywall: whoever is not in with open science is out. |
I like this idea. Maybe we could all try to free up one day in February to lay the foundation in a concerted effort. |
I am in! |
This sounds all great. Regarding aDNA data, just my five cents: This is so high-dimensional (a million genetic markers aren't uncommon) that it wouldn't fit into the exact same framework as the other data you have (C14, dendro, isotopes), which mostly come down to one number (plus extensive meta-information). One could of course think about summary stats (like Principal Components coordinates or something). But I like the generic setup of these bazAARverse packages, which would basically try to offer a consistent API into such datasets, perhaps even with cross-compatibility of at least overlapping meta-info fields (say, longitude, latitude, or even somehow universal individual IDs that would link experimental data for the same individual burial). |
Thanks for the clarification, I think you are absolutely right. On the other hand, things like haplogroups (mt or Y) could be made accessible that way, or just a link for downloading the original high-dimensional data. I am still not very familiar with the topic, but eager to learn, and I would already benefit from such a possibility. |
Yes, good point. Haplogroups could go into this, for sure, and they are already quite interesting. And of course, if we could even automatically download the full data somehow through a function, that would make people's lives a lot easier. I think a lot of this depends on the development of an open and consistent data format for aDNA data and its meta-data, and we're working on that with @nevrome and others. So he's in the right position to help push this on the frontend side once we're making progress on the backend side. |
@yesdavid I saw that the `14cpalaeolithic` database you added contains non-radiocarbon dates. c14bazAAR does not support these yet: We can't simply put them into the fields `c14age` and `c14std`, that wouldn't make any sense.
For the moment I decided to remove them:
I think we could make this possible in the future, but this will require some changes in the architecture of the whole package.
Maybe the easiest solution would be dedicated S3 classes for each dating method: e.g. `uratho_date_list`, `osl_date_list`, `dendro_date_list`. Some functions that were developed for the `c14_date_list` could be applied to these objects as well, others not. There should be a superclass that allows merging these dates despite their semantic differences.
I'm dependent on your input here, @dirkseidensticker and @MartinHinz. There are other possible solutions like extra packages for each kind of dates or -- much less ambitious -- some more columns for the `c14_date_list`.
Generally I'm a big fan of the Unix philosophy (Do One Thing and Do It Well), but in this case c14bazAAR already contains multiple functions that are useful for all kinds of dating information. On the other hand, dendro dates, for example, are a huge can of worms that we might want to avoid.
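To make the S3 idea a bit more concrete, here is a rough sketch in base R of how method-specific classes could share a superclass and be merged on their common columns. Everything in it is an assumption for illustration only: the superclass name `date_list`, the helpers `new_date_list()` and `merge_date_lists()`, the column names and the example values are all made up, and none of this exists in c14bazAAR.

```r
# Hypothetical constructor: each method-specific list also inherits from a
# shared superclass "date_list" (the name is an assumption).
new_date_list <- function(x, subclass) {
  stopifnot(is.data.frame(x))
  structure(x, class = c(subclass, "date_list", "data.frame"))
}

as.osl_date_list <- function(x) new_date_list(x, "osl_date_list")
as.dendro_date_list <- function(x) new_date_list(x, "dendro_date_list")

# Merge two lists on the superclass level: keep only the columns both share
# and record the original class in a "method" column, so the semantic
# difference between the dates is not lost.
merge_date_lists <- function(a, b) {
  stopifnot(inherits(a, "date_list"), inherits(b, "date_list"))
  shared <- intersect(names(a), names(b))
  to_plain <- function(x) {
    method <- class(x)[1]
    class(x) <- "data.frame"
    x <- x[, shared, drop = FALSE]
    x$method <- method
    x
  }
  new_date_list(rbind(to_plain(a), to_plain(b)), subclass = character(0))
}

# Made-up example records with illustrative column names
osl <- as.osl_date_list(data.frame(
  labnr = "OSL-1", site = "Site A", lat = 50.1, lon = 8.6,
  osl_age = 12000, osl_std = 800
))
dendro <- as.dendro_date_list(data.frame(
  labnr = "D-1", site = "Site B", lat = 47.4, lon = 9.3,
  felling_year = -3841
))

all_dates <- merge_date_lists(osl, dendro)
print(all_dates)
```

The method-specific columns (here `osl_age`, `osl_std`, `felling_year`) are dropped in the merged object. A real implementation would have to decide whether to keep them with `NA` padding instead, which is exactly the kind of architectural question raised above.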