Provenance
What data is an AI system using when it gives its users a result? How can we trust a conclusion if we don’t know its premises?
We’re considering fostering the field of “AI Provenance”: knowing what data informs AI outputs.
We’re not the first group interested in this topic – both Training Data Attribution (TDA) methods (like TRAK) and retrieval-based language model architectures (like Retro) are promising approaches in this area that continue to develop.
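To make the idea concrete, here is a toy, TracIn-style sketch of gradient-based training data attribution: each training example is scored by the dot product of its loss gradient with the test example’s loss gradient. This is an illustrative simplification under assumed toy data – real TDA methods such as TRAK operate on large models using techniques like random projections of per-example gradients.

```python
import numpy as np

def grad_loss(w, x, y):
    # Gradient of the squared-error loss 0.5*(w.x - y)^2 for a linear model.
    return (w @ x - y) * x

def attribution_scores(w, train_set, test_point):
    # Score each training example by how aligned its loss gradient is
    # with the test point's loss gradient (a TracIn-style heuristic).
    g_test = grad_loss(w, *test_point)
    return [float(grad_loss(w, x, y) @ g_test) for x, y in train_set]

# Hypothetical toy data: a two-feature linear model.
w = np.array([1.0, -0.5])
train = [(np.array([1.0, 0.0]), 1.2),
         (np.array([0.0, 1.0]), -0.4),
         (np.array([1.0, 1.0]), 0.6)]
test = (np.array([1.0, 0.0]), 1.5)

scores = attribution_scores(w, train, test)
# The first training example points in the same direction as the test
# point, so it receives the largest attribution score.
```

The appeal of scores like these for provenance is that they point from an output back to specific training examples; the open question is whether such methods can be made accurate and cheap enough at frontier-model scale.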
We’re currently unsure how valuable further research here would be for our mission.
We’re looking for someone to understand the field, figure out how feasible it is to overcome the algorithmic bottlenecks, and identify likely pathways to implementation.
Knowing the sources an AI is drawing on allows people to be better calibrated on its results.
Please reach out if you think you have something to contribute to our inquiry!