Location
Ngee Ann Kongsi Auditorium (NAKA)
Start Date
4-6-2026 2:30 PM
End Date
4-6-2026 3:00 PM
Description
The rapid rise of generative AI has created unprecedented demand for large, high-quality research corpora. Open access repositories and other open scholarly infrastructures have therefore become primary sources for AI bots, because the research they host, while not universally reliable, is far more evidence-based than most web content. This is both an opportunity and a strain: repositories are now more valuable than ever, but machine traffic brings sustainability, capacity and policy challenges. Repositories may be able to scale, but who should fund that scaling, and under what conditions?
The core dilemma is how to curb abusive, high-load bot activity without undermining openness for both humans and machines. Much discovery now happens through retrieval-augmented generation (RAG) and similar AI services that fetch content on behalf of users, so blocking bots often blocks people. Blocking bot access does not stop AI use; it only removes research content from the answers people receive.
Copyright misunderstandings add confusion. Copyright protects the expression of a paper, not the scientific knowledge it conveys. Facts, data and ideas are not owned and cannot be copyrighted. The issue is not whether AI may learn from research, but how to enable lawful, well-governed machine access while protecting repository sustainability.
Drawing on the experience of CORE (core.ac.uk), a comprehensive index of the world’s scholarly literature, the talk will outline:
• rate-limiting and targeted blocking of bad bots
• recognising good vs bad bots and applying graduated controls
• keeping OAI-PMH and APIs open for legitimate machine use
• avoiding Cloudflare and other human-oriented blocks on machine interfaces
• community efforts including the OR2025 AI Bots panel and the COAR Task Force
The session argues for shared norms for responsible machine access, so open infrastructures remain usable and resilient in an AI-driven research ecosystem.
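As an illustration of the first two points in the outline above (rate-limiting, and graduated controls for good vs bad bots), the following is a minimal sketch of a per-client token bucket whose limits depend on a client's tier. It is not CORE's implementation. The tier names, the limits in `TIERS`, and the `classify` callback are all hypothetical; a real deployment would classify clients from signals such as verified crawler identity or API keys.

```python
import time

# Hypothetical tiers: (requests per second, burst size). Values are illustrative only.
TIERS = {
    "trusted": (10.0, 50),  # e.g. known harvesters using OAI-PMH or an API key
    "unknown": (1.0, 10),   # unidentified clients get a modest allowance
    "abusive": (0.0, 0),    # targeted block: never allowed
}


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full burst allowance
        self.updated = now

    def allow(self, now):
        # Refill according to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class GraduatedLimiter:
    """Applies a different token bucket to each client depending on its tier."""

    def __init__(self, classify):
        self.classify = classify  # assumed callback: client_id -> tier name
        self.buckets = {}

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        tier = self.classify(client_id)
        key = (client_id, tier)  # a client that changes tier gets a fresh bucket
        if key not in self.buckets:
            rate, burst = TIERS[tier]
            self.buckets[key] = TokenBucket(rate, burst, now)
        return self.buckets[key].allow(now)


# Usage: a toy classifier backed by a lookup table (hypothetical client names).
KNOWN = {"harvester-a": "trusted", "scraper-x": "abusive"}
limiter = GraduatedLimiter(lambda cid: KNOWN.get(cid, "unknown"))
```

A request handler would call `limiter.allow(client_id)` and answer a refused request with HTTP 429 (Too Many Requests) rather than a human-oriented challenge page, so that well-behaved machine clients can back off and retry.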
Managing AI Bot Access to Open Scholarly Infrastructures