For now, this page is something of a placeholder for an ongoing research project on which I'm a collaborator. The project consists of various tools to help retrieve, filter, harmonize, and clean nature occurrence records. While the project targets invertebrate occurrence records, most of the tools are agnostic to taxon. Until the manuscript is out, I'm going to keep it vague.
For anyone who hasn't worked with occurrence records, dates are one of the most crucial pieces of information for making use of ecological data. And for anyone who hasn't worked with date-centric data, dates are one of the most frustrating, varied, and often ambiguous data types to work with. Within the greater invertebrate occurrence records pipeline, the date-parsing function has been able to recover date information for crucial occurrence records, altering our perception of a species' presence and conservation needs.
My contribution to this code involved improving the regex used to recognize and parse unambiguous dates. What makes a date unambiguous is open to argument, and how desperate you are to recover data dictates how much you let through. In some newer industries, 09/23 would very clearly refer to September 2023. In other, older practices (like ecology), the 23 could just as well mean 1923 (or even 1823). And what if that date was written by an eccentric, forgetful scientist, and it really refers to September 23rd? These are the sorts of logical improvements I sought to contribute.
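To make the 09/23 problem concrete, here is a minimal Python sketch that lists every plausible reading of a two-part date. It is not the project's actual regex or code: the pattern, the `readings` function, and the `centuries` parameter are all hypothetical names invented for illustration, and the output strings are just a readable convention.

```python
import calendar
import re
from typing import List, Tuple

# Hypothetical pattern for illustration only: one or two digits, a slash, two digits.
TWO_PART = re.compile(r"^(\d{1,2})/(\d{2})$")


def readings(text: str, centuries: Tuple[int, ...] = (1800, 1900, 2000)) -> List[str]:
    """List every plausible reading of a two-part date such as '09/23'.

    'centuries' is an assumption supplied by the caller: which centuries
    a two-digit year is allowed to fall in.
    """
    match = TWO_PART.match(text.strip())
    if not match:
        return []
    first, second = int(match.group(1)), int(match.group(2))
    if not 1 <= first <= 12:
        return []  # the first part cannot be a month, so no reading applies

    # Reading 1: month/year, once for each candidate century.
    out = [f"{century + second:04d}-{first:02d}" for century in centuries]

    # Reading 2: month/day with the year unknown. A leap year is used for
    # the day-count check so that 02/29 is not rejected.
    if 1 <= second <= calendar.monthrange(2000, first)[1]:
        out.append(f"????-{first:02d}-{second:02d}")
    return out


def is_unambiguous(text: str, centuries: Tuple[int, ...] = (1800, 1900, 2000)) -> bool:
    """A string is unambiguous only if exactly one reading survives."""
    return len(readings(text, centuries)) == 1


print(readings("09/23"))
print(is_unambiguous("09/45", centuries=(1900,)))
```

With the default centuries, 09/23 yields four readings (three month/year candidates plus September 23rd), so it is rejected. Narrowing `centuries` to `(1900,)` leaves 09/45 with a single reading, which mirrors the trade-off above: the more you are willing to assume, the more you let through.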
My contributions to this other, unreleased paper involved various improvements to data retrieval, management, filtering, conservation metric calculation, and overall assistance with the structure and management of the project as a collaborative whole. When the paper is published, I will update this page with more information!