AI Showdown: Three Chatbots Tackle India’s Toughest UPSC Exam Challenge

Every year, over one million aspirants spend years preparing for one of India’s most challenging examinations, the UPSC Civil Services Preliminary. The 2025 cutoff was 92.66 marks out of 200, so a single incorrect guess can decide an aspirant’s fate. With the rise of AI tools such as ChatGPT, Gemini, and Claude, many students have wondered whether these AIs could clear the exam themselves.

To investigate this, we assessed the performance of these AI models using the actual UPSC CSE Prelims GS Paper 1 from 2025 (May 25, 2025) and 2024 (June 16, 2024), with official answer keys available. Each AI model was given all 100 questions from each paper individually and was required to provide answers along with one-line reasoning.

The models evaluated were ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro), and Claude (Sonnet 4.5). All received the same questions in plain text, without hints, coaching, or prior context. They were instructed to identify the single correct answer from options (a) through (d) and to give brief reasoning. No web search or priming was used, so each model relied solely on knowledge from its training.

For scoring, the official UPSC marking scheme was employed: +2 for each correct answer, -0.67 for incorrect responses, and 0 for unattempted questions. All three AI models attempted all 100 questions.
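The marking arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an official UPSC tool: the function name `prelims_score` and its parameters are hypothetical, and the penalty is taken as the 0.67 stated above (one-third of a question's marks, rounded).

```python
def prelims_score(correct, wrong, unattempted=0,
                  marks_per_correct=2.0, penalty=0.67, total_questions=100):
    """Return a GS Paper 1 score under the marking scheme described above.

    Assumption: penalty is the rounded 0.67 figure quoted in the article.
    """
    if min(correct, wrong, unattempted) < 0:
        raise ValueError("counts must be non-negative")
    if correct + wrong + unattempted != total_questions:
        raise ValueError("counts must add up to the number of questions")
    # Unattempted questions score zero, so they only matter for the check above.
    return round(correct * marks_per_correct - wrong * penalty, 2)


# Hypothetical candidate: 60 correct, 30 wrong, 10 left blank.
print(prelims_score(60, 30, 10))  # 99.9
```

Leaving a question blank costs nothing, while each wrong answer cancels roughly a third of a correct one, which is why blind guessing is risky near the cutoff.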

About the 2025 Paper

The 2025 GS Paper 1 was characterized as moderate to difficult, with a significant emphasis on economics (18 questions), followed by environment and ecology (15), history and culture (15), polity (14), and science and technology (12). A notable feature of this paper was the prevalence of multi-statement verification questions, which punish guessing more heavily than traditional factual-recall questions do. The official cutoff for the general category was 92.66 marks, the highest since 2020.

Final Scorecard: UPSC Prelims 2025

Category	ChatGPT (GPT-5)	Gemini (2.5 Pro)	Claude (Sonnet 4.5)	2025 Cutoff
GS Paper 1 Score (est.)	~118 marks	~122 marks	~112 marks	92.66
Questions Correct (of 100)	~73	~76	~68	~46
Accuracy %	73%	76%	68%	N/A
Would Clear Prelims?	YES	YES	YES	—

All three AI models surpassed the cutoff of 92.66 marks in 2025. However, the subject-by-subject analysis revealed significant differences in their capabilities.
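To put the cutoff in perspective: because every wrong answer costs marks, a candidate (or model) that attempts all 100 questions needs more correct answers than the cutoff divided by two would suggest. The sketch below searches for the break-even count under the marking scheme described earlier (+2, -0.67). The function name `min_correct_all_attempted` is hypothetical, and the calculation is illustrative arithmetic, not an official figure.

```python
def min_correct_all_attempted(cutoff, total_questions=100,
                              marks_per_correct=2.0, penalty=0.67):
    """Smallest number of correct answers that reaches `cutoff`
    when every question is attempted (assumed: nothing left blank)."""
    for correct in range(total_questions + 1):
        wrong = total_questions - correct
        score = correct * marks_per_correct - wrong * penalty
        if score >= cutoff:
            return correct
    return None  # cutoff is unreachable even with a perfect paper


print(min_correct_all_attempted(92.66))  # 60
```

Under these assumptions, 60 correct answers (93.2 marks) is the smallest count that clears 92.66 when nothing is left blank; 59 correct yields 90.53.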

Sample Questions: AI Responses Analysis

To illustrate how each AI performed, we present a selection of questions from the 2025 paper, alongside their answers and the correct response.

Q#	Question (abbreviated)	ChatGPT	Gemini	Claude	Key	Result
1	Alternative powertrain vehicles	C (correct)	C (correct)	C (correct)	C	All correct
2	UAV capabilities	B (correct)	D (wrong)	D (wrong)	B	Split result
6	CL-20, HMX, LLM-105 common characteristic	B (wrong)	C (correct)	B (wrong)	C	Gemini wins
12	India and COP28 health declaration	D (correct)	C (wrong)	D (correct)	D	Split result
25	Fa-hien travelled to India during reign	B (correct)	B (correct)	B (correct)	B	All correct

Performance Analysis

Gemini 2.5 Pro: Frontrunner (76/100, ~122 marks)
Gemini delivered the strongest overall performance, excelling in current affairs and environment questions. It correctly identified AIIB for the Nature Solutions Finance Hub question, while ChatGPT and Claude incorrectly answered ADB, suggesting that Gemini retains recent institutional knowledge better. Its weakest area was science and technology.

ChatGPT GPT-5: Consistent but Cautious (73/100, ~118 marks)
ChatGPT performed consistently across subjects. It was notably proficient in polity and history, while its weaknesses showed in environment and current affairs. For instance, on the question about CL-20 and related compounds, it chose a broader category over the specific answer.

Claude Sonnet 4.5: Reliable Reasoner, Gaps in Specifics (68/100, ~112 marks)
Claude cleared the cutoff by the narrowest margin of the three. It excelled in questions requiring logical reasoning but faltered on specific current affairs and environment queries, and it also missed the Mahajanapadas-rivers pairing.

Subject-wise Analysis

History and Culture: Strong Performance
All three AIs scored above 80% in history, answering questions about significant historical figures and events reliably.

Current Affairs and Environment: Significant Challenges
The performance of all AIs dropped in these areas. Specific questions, often about timely and nuanced topics, proved difficult, showing that AI models struggle with recent developments and intricate details.

Science and Technology: Technical Distinctions are Challenging
This was the section where all three AIs struggled, particularly with specific queries around advanced technologies, indicating a gap in specialized knowledge.

2024 Paper: Benchmark Comparison

The 2024 UPSC Prelims saw a slightly lower cutoff of 88 marks. When tested on a sample of 30 questions from that paper, the AIs performed 2-5 percentage points better than on the 2025 paper. In 2024, a UPSC-focused AI app scored significantly higher than general-purpose models. By 2025-26, the gap had narrowed, with general-purpose models now clearing Prelims without specialized training.

Final Thoughts

While AI can clear the UPSC Prelims, that is only the first of the examination's three stages; the Mains and the Personality Test follow. These later stages require original analytical writing and interpersonal skills that current AI cannot replicate. Consequently, while AI has improved aspirants’ preparation, success still hinges on human effort, particularly in staying updated with current events and developing in-depth knowledge. The 2025 examination underscored that sustained effort, real-time awareness, and analytical aptitude remain irreplaceable.
