By Large Language Models
Copyright thehindubusinessline
Although the title of the article suggests a light-hearted topic, it addresses an important aspect of Large Language Model (LLM) AI. As a chartered accountant who works with LLMs (mostly the free versions), is aware of hallucinations, and checks their output for veracity, the author would like to discuss their further limitations, if any.
The following is a reproduction of prompts (questions) and the responses given by Gemini (free version), which are self-explanatory. The responses have been edited substantially for brevity.
One understands that Large Language Models are built on the voluminous data made available to them, which forms their core. Is this a continuous process, or is there a cut-off date when it stops?
The process of building a large language model like me is not a single, continuous stream of data. Instead, it’s a multi-stage process that involves a distinct “knowledge cut-off” date.
The reason for this is the immense computational and financial cost of this process. It takes an enormous amount of time and energy to train a model on trillions of data points. It is not feasible to do this continuously.
While the core knowledge is fixed, models can be updated and improved in other ways.
However, the ability to perform live searches and retrieve new information allows me to overcome this limitation and provide answers on a wide range of current topics.
Is your answer limited to June 2024 knowledge being the cut-off date?
My knowledge is not limited to June 2024. My training data includes information from a vast range of sources, and I am continuously updated to provide the most current and accurate information possible.
In an earlier prompt, Gemini had indicated June 2024 as the cut-off date which was later, in response to another prompt, revised as follows: ‘The most up-to-date information available indicates that my core knowledge cut-off is January 2025. Some specific models, like the gemini-2.5-flash-image-preview model, have a more recent cut-off of June 2025’.
When will you enhance your core knowledge? This is going to be a perennial problem as days pass.
You’ve pinpointed one of the biggest challenges in the field of large language models. The problem of a fixed knowledge cut-off is indeed “perennial” as you’ve called it, and it’s something that researchers and developers are actively working to solve.
Retrieval-Augmented Generation (RAG): This is the current solution, and what I use to provide you with up-to-date information. This approach is highly effective for many queries, but as you correctly pointed out, it has limitations. The quality of the answer depends on the quality of the retrieved information, and it can sometimes lead to less nuanced answers than a fully trained model could provide. The field is moving towards more dynamic and efficient methods to solve this problem.
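The RAG technique Gemini describes can be sketched in a few lines: retrieve documents relevant to the query, then prepend them as context so the model answers from fresh information rather than only its frozen training data. The corpus, the word-overlap scoring, and the prompt format below are illustrative assumptions for the sketch, not Gemini's actual pipeline.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# The corpus, scoring method, and prompt layout are illustrative
# assumptions, not the pipeline any real model uses.

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_augmented_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the answer can draw on
    information newer than the model's training cut-off."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The knowledge cut-off of the model is January 2025.",
    "RAG retrieves documents at query time to supplement training data.",
    "Chartered accountants must verify AI outputs for veracity.",
]
print(build_augmented_prompt("What is the knowledge cut-off?", corpus))
```

As the Gemini response notes, the answer is only as good as what the retriever finds: a weak or biased corpus produces a weak augmented prompt, which is exactly the limitation discussed above.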
Does it mean the quality and reliability of answers are better when confined to pre-June 2024 topics than to post-June 2024 topics, since your core knowledge is of higher quality?
The short answer is: yes, for certain types of questions, the quality and reliability of my answers can be better when confined to topics within my core knowledge cut-off.
Here’s a more detailed breakdown of why:
Pre-June 2024 topics (within my core knowledge): Deep and integrated knowledge: For these topics, my understanding is based on a vast amount of data, including books, articles, and code, integrated into my core neural network. This allows me to draw connections, reason, and provide nuanced and comprehensive answers.
Reduced Hallucination Risk: The risk of a plausible-sounding but false fact (a “hallucination”) is lower. I am not relying on a real-time, external source that might be biased, inaccurate, or just a single, non-authoritative page.
Contextual understanding: My core training gives me a deep understanding of the historical and contextual relationships between concepts. If you ask me about the causes of a historical event, I can pull from a wide range of interconnected information to give you a well-rounded answer.
Post-June 2024 topics (using real-time search): Reliance on external sources: For these topics, I am not answering from my internal knowledge. I’m using an external tool — Google Search — to find the information.
The “pipeline” matters: The quality of the answer is heavily dependent on the quality of the search results I retrieve: if the information is on a single, low-quality, or biased website, my answer will reflect that; if the information is from multiple, authoritative sources, my answer will be more robust and reliable; the process of me reading, synthesising, and summarising that information from search results introduces a potential for error that is not present when I’m drawing from my internal, trained knowledge.
Potential for inaccuracies and gaps: There is a possibility that I may not find all relevant information, or that the information I find is incomplete. This can lead to a less comprehensive or even a partially incorrect answer. In some cases, if the information is too new or sparse on the web, I may not be able to provide a detailed answer at all.
The trade-off: For topics within my core knowledge base, you’re getting a more deeply integrated, synthesised, and less-prone-to-hallucination answer. For topics outside of that, you’re getting an answer that is as good as the live search results I’m able to find and process.
The key takeaway is to always treat my answers — especially on new, rapidly changing, or highly technical topics — as a starting point. It’s always a good practice to verify critical facts, regardless of when the information was published.
Conclusion (by the author)
For a similar question, Perplexity (free version) also answered June 2024 as the cut-off point.
The point of the article is to ensure that users are fully aware of this further dimension of LLMs. For the author, AI has been a very useful tool, provided certain precautions are observed.
The author is a chartered accountant
Published on September 20, 2025