Features
Pulls latest research papers via the arXiv API
Embeds abstracts using allenai/scibert_scivocab_uncased
Semantic search with AnnoyIndex
Ranks papers by how closely their abstracts match the query
Built with FastAPI for speed and developer friendliness
Tech Stack
FastAPI - Web framework
SciBERT - Sentence embeddings for scientific language
Annoy - Approximate nearest neighbor search
arXiv API - Source of research papers
NumPy, Requests, XML - Support libs
Installation
git clone https://github.com/yourusername/semantic-research-search.git
cd semantic-research-search
pip install -r requirements.txt
Make sure your requirements.txt includes:
fastapi
uvicorn
requests
sentence-transformers
annoy
numpy
Running the API
uvicorn main:app --reload
Visit http://127.0.0.1:8000/docs to explore the interactive Swagger UI.
API Endpoints
GET /
Purpose: Health check
Returns: {"Message": "Localhost works!"}
GET /FetchPapersFromARXIv?query=<your_topic>
Description: Fetches the top arXiv papers for a given query.
Example:
curl "http://localhost:8000/FetchPapersFromARXIv?query=deep+learning"
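As a rough illustration of what happens behind this endpoint, the sketch below queries the public arXiv API and pulls the title, abstract, first author, and link out of the Atom feed it returns. It uses only the standard library (urllib instead of Requests) so it runs on its own. The function names, the dictionary keys, and the all: search prefix are assumptions for illustration, not the project's actual code.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by the arXiv Atom feed


def build_query_url(topic, max_results=10):
    """Build an arXiv API URL that searches all fields for `topic`."""
    params = urllib.parse.urlencode(
        {"search_query": f"all:{topic}", "start": 0, "max_results": max_results}
    )
    return f"{ARXIV_API}?{params}"


def _clean(text):
    # arXiv titles and abstracts contain hard line breaks; collapse them.
    return " ".join((text or "").split())


def parse_feed(xml_data):
    """Extract paper metadata from an Atom feed given as str or bytes."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append(
            {
                "title": _clean(entry.findtext("atom:title", "", ATOM)),
                "abstract": _clean(entry.findtext("atom:summary", "", ATOM)),
                # The first <author> element is taken as the first author.
                "first_author": _clean(entry.findtext("atom:author/atom:name", "", ATOM)),
                "link": _clean(entry.findtext("atom:id", "", ATOM)),
            }
        )
    return papers


def fetch_papers(topic, max_results=10):
    """Fetch and parse papers for `topic` (performs a network request)."""
    with urllib.request.urlopen(build_query_url(topic, max_results), timeout=30) as resp:
        return parse_feed(resp.read())
```

Keeping parse_feed separate from the network call makes it possible to test the XML handling against a saved response.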
GET /SearchResearchPapers?query=<your_topic>
Description: Fetches papers from arXiv, embeds their abstracts, and searches them semantically.
Example:
curl "http://localhost:8000/SearchResearchPapers?query=neural+networks"
Returns: Top 3 semantically matched research papers with:
Title
Abstract
First Author
Link to paper
How It Works
Sends your query to the arXiv API.
Extracts metadata + abstract from top results.
Generates embeddings using SciBERT.
Stores them in an AnnoyIndex for fast vector similarity lookup.
Encodes the query and returns the closest papers by semantic meaning.