Artificial Intelligence, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). New tools for accessing archival and bibliographic resources
DOI:
https://doi.org/10.6092/issn.2283-9364/19982
Keywords:
Artificial Intelligence (AI), Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Archives, Libraries, Retrieval Knowledge
Abstract
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems offer a new paradigm for querying and retrieving information. Because they can learn from large knowledge bases and generate responses grounded in them, they can make resource retrieval more efficient and accurate. This paper demonstrates these systems in simplified form in order to open a scientific discussion on integrating them into archival and bibliographic resource retrieval systems and, more broadly, into cultural heritage management.
License
Copyright (c) 2024 Giorgia Di Marcantonio
This work is licensed under a Creative Commons Attribution 4.0 International License.