AutoInsight: AI-Powered Article Analysis Project

Disclaimer from a reviewer: This report is formatted as a GitHub README.md file (not a Jupyter Book), so I recommend reading it on GitHub via the following link to see the proper formatting.

Introduction

Analyzing a large number of articles by hand is time-consuming. This project delegates that task to a Large Language Model (LLM): the LLM reads the articles and answers your specific questions about them, replacing manual search and analysis. This repository provides a tool that downloads articles from PubMed and performs automatic question answering on their content, so that relevant scientific literature can be retrieved and mined for information quickly.

Results

Two problems have to be solved: downloading articles from a database and analyzing them with an LLM. The PubMedSearcher class handles the first: it finds the requested number of relevant articles for your search query and downloads the full text of the free ones. The OpenAiManager class handles the second: it takes the full text or the abstract of an article together with the question you want answered and returns the answer. To test the program, I entered the query “rapamycin in aging”, downloaded 200 articles, and asked the LLM the following questions about them:

  • Is rapamycin increasing lifespan?

  • What animals was the study conducted on?

  • Which dosages of treatment (and other parameters you consider relevant) were presented in papers?
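The search step described above can be illustrated with the public NCBI E-utilities ESearch endpoint. This is only a minimal sketch: the function names `build_search_url`, `parse_pmids`, and `fetch_pmids` are hypothetical, and the repository's PubMedSearcher class may work differently (for example, through a wrapper library).

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint. Assumption: PubMedSearcher may not use it directly.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(query: str, max_results: int = 200) -> str:
    """Build an ESearch URL asking PubMed for up to max_results article IDs."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text: str) -> List[str]:
    """Extract the list of PubMed IDs from an ESearch JSON response."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


def fetch_pmids(query: str, max_results: int = 200) -> List[str]:
    """Run the search over the network and return the matching PubMed IDs."""
    with urlopen(build_search_url(query, max_results)) as response:
        return parse_pmids(response.read().decode("utf-8"))
```

Calling `fetch_pmids("rapamycin in aging", 200)` would return up to 200 PubMed IDs for the test query; downloading full texts of the free articles is a separate step that this sketch does not cover.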

Final results for each question

image

Example 1 (https://pubmed.ncbi.nlm.nih.gov/31761958/)

Answer from LLM: image

Real data: image

Example 2 (https://pubmed.ncbi.nlm.nih.gov/28374166/)

Answer from LLM: image

Real data: image

Example 3 (https://pubmed.ncbi.nlm.nih.gov/26442901/)

Answer from LLM: image

Real data: image

Discussion

This is a basic implementation of automated article analysis that relies on a free LLM. To improve both the quality of the analysis and the performance of the program, future iterations may use a paid version of GPT, which could reduce the errors that occur when requests go through the g4f library. An alternative approach based on vector databases and the LangChain library could also be explored for further optimization and performance comparison.
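To give a rough idea of what a retrieval-based alternative involves, the sketch below splits an article into chunks and ranks them by similarity to the question, so that only the most relevant chunks are passed to the LLM. It is a deliberately simplified stand-in: it uses bag-of-words cosine similarity in plain Python instead of learned embeddings and a vector database, it does not use the LangChain API, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_into_chunks(text: str, chunk_words: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most chunk_words words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (a crude substitute for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(text: str, question: str, k: int = 3, chunk_words: int = 100) -> List[str]:
    """Return the k chunks of text most similar to the question."""
    question_vector = bag_of_words(question)
    chunks = split_into_chunks(text, chunk_words)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(bag_of_words(chunk), question_vector),
        reverse=True,
    )
    return ranked[:k]
```

In a real vector-database setup the `bag_of_words` step would be replaced by an embedding model and the sorting by an index lookup; the overall flow (chunk, score against the question, keep the top results) stays the same.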

Features

  • preprocesser.py: Fetch articles from PubMed using queries.

  • gpt_manager.py: Extract valuable insights from the downloaded articles by posing questions to the system.
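As an illustration of the question-answering step, the sketch below shows one way an article and a question could be combined into a single prompt. The function `build_prompt`, the prompt wording, and the `max_chars` limit are assumptions made for this example; they are not taken from gpt_manager.py, and the call to the LLM itself is left out.

```python
from typing import List

# The three questions used in the test run described in the Results section.
QUESTIONS: List[str] = [
    "Is rapamycin increasing lifespan?",
    "What animals was the study conducted on?",
    "Which dosages of treatment (and other parameters you consider relevant) "
    "were presented in papers?",
]


def build_prompt(article_text: str, question: str, max_chars: int = 12000) -> str:
    """Combine article text and a question into one prompt string.

    The text is truncated to max_chars characters (an assumed limit) so that
    long full-text articles do not exceed the model's context window.
    """
    text = article_text.strip()[:max_chars]
    return (
        "Answer the question using only the article below. "
        "If the article does not contain the answer, reply 'Not reported'.\n\n"
        f"Article:\n{text}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_prompts(article_text: str, questions: List[str]) -> List[str]:
    """Build one prompt per question for the same article."""
    return [build_prompt(article_text, question) for question in questions]
```

Each resulting string would then be sent to the LLM, and the reply stored next to the article's PubMed ID.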

Getting Started

  1. Clone the repository:

git clone https://github.com/ifreyk/gpt_pubmed.git

  2. Install dependencies:

pip install -r requirements.txt

  3. Open application.py as a notebook and follow the code.