Use of genAI for research
Guideline for researchers
Generative AI (genAI) is increasingly used across the various phases of research: from developing a research idea or grant proposal and defining the scope of a project, to conducting the research and presenting its results.
This guideline explains how genAI works and outlines the possibilities and risks of using it.
More information about the use of genAI can be found on the Learning and Teaching Community (LTC) Hub for staff. The tool picker provides additional information, including a risk assessment, on the use of genAI software such as Copilot.
What is generative AI (genAI)?
Generative AI (genAI) is a form of AI that automatically creates content based on 'prompts' (questions or requests from users). Large Language Models (LLMs) use probabilistic calculations to predict which words or structures best fit within a text. This enables them, for example, to detect spelling and grammatical errors and to suggest improvements to a text. Multimodal models can generate not only text but also video, audio, or images.
GenAI is trained on large amounts of existing data and operates based on probabilistic calculations and algorithms. It therefore does not constitute human 'intelligence' and has no 'understanding.' It can also make mistakes and generate factual inaccuracies ('hallucinations').
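To make the probabilistic principle concrete: a language model assigns a probability to every possible next word (token) and continues the text with a likely candidate. The sketch below illustrates this using the small, openly available GPT-2 model via the Hugging Face transformers library; both are chosen purely for illustration and say nothing about the workings of any specific genAI product.

```python
# Minimal illustration of next-token prediction, assuming the Hugging Face
# "transformers" library and the openly available GPT-2 model are installed.
# This sketches the general principle only, not any specific genAI product.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The court ruled that the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # raw scores over the vocabulary

# Turn the scores for the last position into probabilities and show the five
# most likely continuations: the model predicts, it does not 'understand'.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: p = {p:.3f}")
```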
Each LLM has its own model architecture and, depending on its intended use, its own capabilities and performance characteristics. Researchers should therefore carefully determine which model they use and for what purpose.
Principles: responsible and transparent use
Researchers at the law faculty are themselves responsible for using genAI responsibly in their research. Reviewing all genAI-generated output (proofreading it and correcting it where necessary) is essential, regardless of the capabilities of the system used.
This responsibility applies to all researchers, including PhD candidates and research assistants. Researchers should discuss the various risks with any co-authors and supervisors.
Junior researchers are encouraged to first develop sound research and writing skills early in their careers before turning to genAI tools.
The use of genAI must be both responsible and transparent. Depending on the application and the guidelines of a publisher, the use of genAI must be disclosed in a statement (for example in a footnote) or in the methods section of an article or chapter.
For certain applications, a publisher may require that prompts be recorded in a logbook or that output be documented for verification. Guidelines on the use of AI often also exist for writing grant applications (such as those of NWO and the ERC).
Doctoral research
PhD candidates and their supervisors share responsibility for discussing the use of genAI (and its associated risks) in doctoral research. The use of genAI must be disclosed and justified in the dissertation.
Supervisors may not rely solely on genAI to provide feedback on the work of PhD candidates, as this would impede adequate and substantive supervision of the doctoral trajectory. When genAI is used for feedback, the associated risks and ethical aspects must be considered.
For questions, researchers can consult the Graduate School.
Opportunities
GenAI offers many opportunities for research. For example, LLMs can be used to improve texts and to conduct research on the web. Researchers can also use genAI as a tool for coding, brainstorming, translating, analysing text or numerical data, and structuring their work.
GenAI can be applied for multiple purposes, but it is not equally suitable for all tasks. The overview below helps researchers determine more concretely what genAI is good at, reasonably good at, or poorly suited for.
GenAI is good at:
- Summarising and making text more accessible, such as documents, (parts of) court decisions, parliamentary documents, publications, and web pages;
- Rewriting text for clarity or in a specific style;
- Formatting and improving documents or presentations;
- Asking questions, assisting with brainstorming, and visualising text;
- Serving as a tool for literature research: searching for and extracting key points from literature;
- Serving as a tool for coding.
GenAI is reasonably good at:
- Answering general questions about a topic (based on publicly accessible information) through web searches;
- Writing general texts on a topic (based on publicly accessible data, with the risk of incorrect information and hallucinations);
- Generating text for presentations;
- Performing (an initial) analysis of datasets and documents;
- Translating texts.
GenAI is poor at:
- Creating new knowledge and ideas;
- Making legal decisions;
- Interpreting the consequences of policies and decisions;
- Safely searching for and analysing (grey) literature behind paywalls (although this is possible to a limited extent with 'agents').
Risks
The use of genAI entails risks. It is important to realise that the use of genAI-generated output is not without consequences. For example, such output may be cited in other research or used to inform important decisions.
Where possible, researchers are advised to use the 'opt-out' options in genAI systems to limit the reuse of their interactions for product improvement or model training, also in view of potential infringements of intellectual property rights and in the interest of knowledge security.
The following points deserve researchers' attention when using genAI.
Knowledge security
Processing research data with an AI system from a commercial provider carries risks for knowledge security. For example, it may be unclear to what extent the commercial provider uses the data for its own purposes, applies it for product improvement or model training, or is required to share it with governments.
Information regarding knowledge security can be found here.
Privacy and data protection
When using commercial genAI, data is processed in the company’s cloud. It is not always possible to limit or prevent data sharing. When processing personal data (data that can be directly or indirectly traced to a natural person), the General Data Protection Regulation (GDPR) applies. Researchers are advised to anonymise text before inputting it into genAI.
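As a purely illustrative sketch of what such anonymisation can look like in practice, the snippet below replaces e-mail addresses and phone numbers with placeholders before text is sent to a genAI system. The patterns are assumptions made for the example; genuine GDPR-compliant anonymisation (names, addresses, indirect identifiers) requires considerably more than this and must be assessed per project.

```python
# Deliberately simplified sketch of pseudonymising text before sending it to
# a genAI system. The regular expressions below are illustrative assumptions;
# GDPR-compliant anonymisation requires far more (names, addresses, indirect
# identifiers) than simple pattern matching.
import re

def pseudonymise(text: str) -> str:
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)            # e-mail addresses
    text = re.sub(r"\b(\+31|0)[\s-]?\d[\d\s-]{7,}\d\b", "[PHONE]", text)  # Dutch-style phone numbers
    return text

print(pseudonymise("Contact J. Jansen via j.jansen@law.example.nl or 06-12345678."))
# -> "Contact J. Jansen via [EMAIL] or [PHONE]."  Note: the name is NOT caught.
```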
Using local LLMs (models stored and run entirely on a local machine) can, in some cases, reduce privacy risks. Note, however, that the GDPR still applies, and personal data must always be adequately protected.
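As an illustration, the sketch below runs a small, openly available model (GPT-2, via the Hugging Face transformers library) entirely on a local machine. Which model is appropriate, and whether local use suffices in a given case, must be assessed per project.

```python
# Minimal sketch of running a language model locally, assuming the Hugging
# Face "transformers" library is installed. GPT-2 is used only because it is
# small and openly available; it is not a recommendation for research use.
from transformers import pipeline

# After the one-time model download, the prompt text is processed locally
# and is not sent to an external provider.
generator = pipeline("text-generation", model="gpt2")
result = generator("Summarise the following clause:", max_new_tokens=40)
print(result[0]["generated_text"])
```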
Researchers are advised to include the use of genAI in the Data Protection Impact Assessment (DPIA) as part of the research plan.
Researchers with questions about data protection can consult the Privacy and Security Officers.
Copyright
Uploading or copying copyrighted material can entail risks, for example if a publisher's license does not allow it. The precise extent of these risks is not yet fully clear.
In general, publishers can impose restrictions through licenses. As a rule of thumb, uploading your own material or material published under an open access license does not constitute copyright infringement. Legislation, case law, and parliamentary documents are free of copyright. In some cases, an exception applies for the individual use of copyrighted material in scientific research; this exception does not apply when collaborating with commercial parties or when commercial goals are also pursued.
By keeping a logbook or saving interactions with chatbots, researchers can, if necessary, demonstrate that the work is original and that copyright is respected.
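Such a logbook can be as simple as an append-only file. The sketch below records each prompt and the model's output with a timestamp; the file name and fields are illustrative assumptions, and publishers or funders may prescribe a different format.

```python
# Minimal sketch of a genAI logbook: each interaction is appended to a JSON
# Lines file. The file name and fields are illustrative assumptions.
import json
from datetime import datetime, timezone

def log_interaction(prompt: str, output: str, model: str,
                    path: str = "genai_logbook.jsonl") -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

log_interaction("Summarise this judgment in three sentences.",
                "<model output here>", model="example-model")
```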
Ethical aspects
When using genAI, it is important for researchers to be aware of the ethical aspects. For example, genAI relies on pre-trained models, which can contain biases. Some models, for instance, are trained predominantly on data reflecting a Western worldview, which can cause the output to reinforce certain viewpoints and stereotypes.
For collaborative research activities (such as publications), it is advisable to hold prior discussions about the use, justification, and ethical aspects of using genAI tools.
Environmental impact
GenAI tools require significant amounts of energy and can therefore have negative environmental consequences. In particular, the data centres that process the data underlying LLM output consume large amounts of electricity and water. Generating images, audio, and video, as well as conducting 'deep research' involving reasoning, uses even more energy than simpler applications.
For this reason, use genAI consciously and consider whether these tools are actually needed, also in comparison with other digital language tools.