Deep Research (GitHub Project by dzhng)

James · March 30, 2025

Purpose and Overview

Deep Research (also called Open Deep Research) is an open-source AI-powered research assistant that performs iterative, “deep” research on any topic by combining web search, web scraping, and large language models (LLMs) (dzhng/deep-research, GitHub). It aims to refine its research direction over multiple steps, diving deeper based on previous findings. The project’s goal is to provide a simple yet effective implementation of a deep research agent in under 500 lines of code, for transparency and ease of extension. In practice, Deep Research takes a user query, systematically searches for information, analyzes the results, and produces a comprehensive report with findings and sources. It was created by David Zhang (GitHub user dzhng) and is released under the MIT License. The repository is very popular, with over 14,000 stars and 1,400 forks on GitHub, indicating strong community interest. (The project is sponsored by Aomni, an AI startup, but it is available for anyone to use or contribute to (dzhng/deep-research README).)

Features

Deep Research provides several key features for automated research (dzhng/deep-research README):

  • Iterative research – the agent refines its research direction over multiple rounds, diving deeper based on what it has already learned.
  • LLM-driven query generation – the language model formulates targeted search queries and follow-up questions to clarify the user’s intent.
  • Configurable depth and breadth – parameters control how many parallel queries run per round and how many recursive iterations are performed.
  • Concurrent processing – multiple searches and page analyses run in parallel for speed.
  • Comprehensive Markdown reports – findings are compiled into a report with the key learnings and citations to the source URLs.

Installation and Requirements

Deep Research is a Node.js application written in TypeScript, so it requires a Node.js runtime (the project targets Node v22.x per its package.json). You can either set it up directly on your machine or use Docker. Before installation, you will need API keys for two services: a Firecrawl API key (for web search and content extraction) and an OpenAI API key (for the LLM). Firecrawl is a service for performing search engine queries and scraping pages programmatically, and OpenAI’s API provides the language model for analysis. In short, the basic requirements are Node.js v22.x, a Firecrawl API key, and an OpenAI API key (dzhng/deep-research README).

Node.js Installation

To install Deep Research locally using Node.js:

  1. Clone the Repository: Download or clone the GitHub repository to your machine.
  2. Install Dependencies: In the project directory, run npm install to fetch all required Node packages, such as the Firecrawl client, the OpenAI SDK, and the other libraries the project depends on (Express, Zod, etc.), as listed in package.json.
  3. Configure API Keys: Create a .env.local file (you can copy or rename the provided .env.example file) and add your API keys and settings (dzhng/deep-research README). At minimum, you need to set:

        FIRECRAWL_KEY="your_firecrawl_key"
        OPENAI_KEY="your_openai_key"

    You can also configure optional variables here, such as FIRECRAWL_BASE_URL if using a self-hosted Firecrawl server, or OPENAI_ENDPOINT and OPENAI_MODEL if you want to use a custom or local LLM instead of OpenAI. (If you leave out the OpenAI key and instead provide a custom endpoint and model, the tool will use those settings to query a local or alternative model.) A sketch of how these settings are typically read appears after this list.
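
For illustration only, here is a minimal TypeScript sketch (not the project’s actual code) of how settings like the ones above are typically read from the environment, with the optional variables falling back to sensible defaults:

    // config.ts - illustrative sketch; variable names follow the .env.local example above
    export interface DeepResearchConfig {
      firecrawlKey: string;
      firecrawlBaseUrl?: string; // only needed for a self-hosted Firecrawl instance
      openaiKey?: string;        // may be omitted when a custom endpoint/model is used
      openaiEndpoint: string;
      openaiModel: string;
    }

    export function loadConfig(env = process.env): DeepResearchConfig {
      if (!env.FIRECRAWL_KEY) {
        throw new Error('FIRECRAWL_KEY is required - see .env.example');
      }
      return {
        firecrawlKey: env.FIRECRAWL_KEY,
        firecrawlBaseUrl: env.FIRECRAWL_BASE_URL,
        openaiKey: env.OPENAI_KEY,
        openaiEndpoint: env.OPENAI_ENDPOINT ?? 'https://api.openai.com/v1',
        openaiModel: env.OPENAI_MODEL ?? 'o3-mini', // the default model mentioned in the README
      };
    }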

Docker Installation

If you prefer using Docker (which avoids manual Node setup and isolates the environment), the repository includes a Docker setup:

  1. Clone the Repository: Get the code as above.
  2. Create Environment File: Rename the .env.example file to .env.local and insert your API keys into it (same as in the Node.js setup).
  3. Install Dependencies: Even in the Docker setup, it’s recommended to run npm install once (either on the host or in the Docker build) to ensure dependencies are in place.
  4. Start Docker Containers: Run the provided Docker Compose configuration with docker compose up -d to start the container in the background. This sets up a container (named deep-research by default) with the Node environment.
  5. Execute the App in Container: Finally, run the Deep Research program inside the Docker container, for example: docker exec -it deep-research npm run docker (dzhng/deep-research README). This attaches to the running container and starts the research agent. (The npm run docker script likely ensures the application runs with the correct settings inside Docker.)

After either installation method, you should have the application ready to accept research queries.

Usage

To use Deep Research, you simply run the program and interact with its prompts. If installed via Node, run npm start in the project directory to launch the research assistant (dzhng/deep-research README). (If using Docker, the last installation step already runs the program inside the container.) Once running, the tool operates through a console (command-line) interface:

  • Initial Prompt: The system first asks you to enter your research query (the topic or question you want investigated). It then asks for the breadth and depth parameters for the research: how broad the search should be (number of parallel search queries, recommended 3–10, default 4) and how deep it should go (number of recursive iterations, recommended 1–5, default 2) (dzhng/deep-research README). These settings control the scope of the exploration; a sketch of this prompting step appears after this list.
  • Interactive Refinement: The assistant may also ask follow-up questions to refine its understanding of your needs. For example, it might seek clarification on ambiguous terms or ask you to choose between subtopics, which helps it steer the research in the most relevant direction. After you respond to any follow-ups, the automated search process begins.
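
For concreteness, the console prompts described above could be implemented roughly like this. The following TypeScript sketch uses Node’s built-in readline module and is illustrative only, not the repository’s actual code:

    // prompt.ts - illustrative sketch of the console prompts described above
    import * as readline from 'node:readline/promises';
    import { stdin as input, stdout as output } from 'node:process';

    export async function askForResearchInputs() {
      const rl = readline.createInterface({ input, output });
      const query = await rl.question('What would you like to research? ');
      // Fall back to the defaults mentioned above when the user just presses Enter.
      const breadth = Number(await rl.question('Breadth (3-10, default 4)? ')) || 4;
      const depth = Number(await rl.question('Depth (1-5, default 2)? ')) || 2;
      rl.close();
      return { query, breadth, depth };
    }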

Once you’ve provided the necessary inputs, Deep Research will autonomously perform the following steps to gather information and produce results:

  1. Generate and Execute Searches: The agent uses the LLM to formulate relevant search engine queries based on your query and goals. It then executes these searches via the Firecrawl API, retrieving the top web results.
  2. Process Results: For each result, it visits the page (scrapes the content) and analyzes the information with the LLM. The system looks for answers, key facts, or new leads in the text, and keeps track of “learnings” (important findings) and “directions” (new questions or subtopics to pursue).
  3. Recursive Deepening: If the depth setting is greater than zero and there are new directions to explore, the agent iterates: it treats the new directions as next-level queries and repeats the search-and-analysis cycle. In each iteration, it incorporates what it has learned so far (prior goals and new questions) to focus the subsequent searches. This looping continues until the specified depth is reached (or no further directions remain). A simplified sketch of this loop appears after the list.
  4. Report Generation: After completing the required rounds of research, Deep Research compiles a comprehensive report in Markdown format. The report includes the question, the gathered information and answers, and references (citations) to the source URLs for verification (dzhng/deep-research README). The report is saved to a file in the working directory (by default named report.md, or answer.md in certain modes). You can open this Markdown file to read the findings presented in a structured way.
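
To make the control flow concrete, here is a heavily simplified TypeScript sketch of this loop. It is illustrative only: generateSearchQueries, searchAndScrape, and extractLearnings are hypothetical stand-ins for the LLM prompts and Firecrawl calls the real implementation makes, and details such as how breadth shrinks between rounds may differ from the repository.

    // Illustrative sketch of the iterative deep-research loop described above.
    interface ResearchState {
      learnings: string[];   // key findings accumulated so far
      visitedUrls: string[]; // sources to cite in the final report
    }

    // Hypothetical helpers standing in for the real LLM / Firecrawl calls:
    declare function generateSearchQueries(goal: string, learnings: string[], n: number): Promise<string[]>;
    declare function searchAndScrape(query: string): Promise<{ url: string; content: string }[]>;
    declare function extractLearnings(
      query: string,
      pages: { url: string; content: string }[],
    ): Promise<{ learnings: string[]; followUpQuestions: string[] }>;

    export async function deepResearch(
      query: string,
      breadth: number,
      depth: number,
      state: ResearchState = { learnings: [], visitedUrls: [] },
    ): Promise<ResearchState> {
      // 1. Ask the LLM for up to `breadth` search queries based on the goal and prior learnings.
      const queries = await generateSearchQueries(query, state.learnings, breadth);

      for (const q of queries) {
        // 2. Run the search and scrape the top results via Firecrawl.
        const pages = await searchAndScrape(q);
        state.visitedUrls.push(...pages.map((p) => p.url));

        // 3. Let the LLM pull out learnings and new research directions.
        const { learnings, followUpQuestions } = await extractLearnings(q, pages);
        state.learnings.push(...learnings);

        // 4. Recurse on the new directions until the depth budget is spent.
        if (depth > 1 && followUpQuestions.length > 0) {
          await deepResearch(followUpQuestions.join('\n'), Math.ceil(breadth / 2), depth - 1, state);
        }
      }
      return state;
    }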

Concurrency: By default, the agent runs a few searches in parallel to speed up the process. If you have a paid or self-hosted Firecrawl instance with higher rate limits, you can increase the concurrency setting in the code to run more searches simultaneously (dzhng/deep-research README). Conversely, if you’re on the free Firecrawl plan and hit rate limits, you might lower the concurrency to avoid errors. This adjustability lets you optimize performance based on your resources.
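
As an illustration of what such a concurrency setting looks like in practice, here is a small TypeScript sketch using the p-limit package, a common choice for this pattern; the constant name and value are assumptions, not the repository’s exact code:

    // Illustrative only: capping the number of parallel Firecrawl searches.
    import pLimit from 'p-limit';

    // Lower this (e.g. to 1) on the free Firecrawl plan; raise it with a paid
    // or self-hosted instance that has higher rate limits.
    const ConcurrencyLimit = 2;
    const limit = pLimit(ConcurrencyLimit);

    declare function searchAndScrape(query: string): Promise<unknown>;

    export async function runSearches(queries: string[]) {
      // All searches are scheduled at once, but at most ConcurrencyLimit run at a time.
      return Promise.all(queries.map((q) => limit(() => searchAndScrape(q))));
    }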

Advanced Usage: Deep Research is flexible about which language model it uses. Out of the box, it uses OpenAI’s “o3-mini” model via the OpenAI API for analyzing content and generating queries (dzhng/deep-research README). However, the tool can automatically switch to DeepSeek R1, an open-source model served by Fireworks.ai, if you provide a Fireworks API key. DeepSeek R1 is a powerful LLM that can replace OpenAI’s model, potentially saving costs. To use it, just add FIREWORKS_KEY="your_api_key" to the environment; the code detects this and will prefer the R1 model for all LLM operations. Additionally, you can configure custom endpoints and models – for example, pointing to a local LLM server or a different OpenAI-compatible API – by setting OPENAI_ENDPOINT and CUSTOM_MODEL in the environment (dzhng/deep-research README). These options make it possible to use Deep Research with various backend models (local or cloud) beyond the default.
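
The environment-based model selection can be pictured roughly as follows. This TypeScript sketch is an assumption about the overall shape of the logic, not the repository’s code, and the Fireworks model identifier shown is the publicly documented one at the time of writing:

    // Illustrative sketch of choosing an LLM backend from environment variables.
    type ModelChoice =
      | { provider: 'fireworks'; model: string; apiKey: string }
      | { provider: 'openai-compatible'; model: string; apiKey?: string; baseURL: string };

    export function chooseModel(env = process.env): ModelChoice {
      // A Fireworks key switches all LLM calls to the DeepSeek R1 model.
      if (env.FIREWORKS_KEY) {
        return {
          provider: 'fireworks',
          model: 'accounts/fireworks/models/deepseek-r1',
          apiKey: env.FIREWORKS_KEY,
        };
      }
      // Otherwise use OpenAI (default o3-mini) or any OpenAI-compatible endpoint
      // configured via OPENAI_ENDPOINT / CUSTOM_MODEL (e.g. a local LLM server).
      return {
        provider: 'openai-compatible',
        model: env.CUSTOM_MODEL ?? 'o3-mini',
        apiKey: env.OPENAI_KEY,
        baseURL: env.OPENAI_ENDPOINT ?? 'https://api.openai.com/v1',
      };
    }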

Supported Platforms and Dependencies

Platforms: Since Deep Research runs on Node.js, it is essentially platform-agnostic – it can run on any operating system that supports Node (Linux, Windows, macOS). The provided Docker setup further ensures it can run in a containerized environment, which can be deployed on cloud servers or other OSes with Docker support. There’s no specialized hardware requirement mentioned; however, access to the internet is necessary (for the search queries and web scraping).

Programming Language: The project is primarily written in TypeScript (approximately 97% TypeScript, with a small amount of JavaScript and Docker configuration) (dzhng/deep-research, GitHub). The code is executed under Node.js, with the TypeScript transpiled to JavaScript.

Dependencies: The package.json lists the libraries that the project uses (dzhng/deep-research package.json). Key libraries and services include:

  • Firecrawl client (@mendable/firecrawl-js) – performs the search engine queries and scrapes page content via the Firecrawl API.
  • OpenAI SDK / LLM client – calls the language model for query generation, content analysis, and report writing.
  • Zod – schema validation, used to keep configuration and structured LLM outputs well-formed.
  • Express – a web server framework (among the listed dependencies).
  • Assorted utility libraries also listed in package.json.

All these dependencies are installed automatically via npm install. The environment variables (.env file) are crucial for connecting to the external services (without valid API keys, the search and LLM features won’t function). The documentation stresses providing those keys during setup.
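
Because missing keys are the most common setup problem, a startup check is helpful. The following sketch is not part of the repository; it simply shows how Zod, one of the listed dependencies, could be used to validate the environment before running:

    // Illustrative startup check for the required environment variables using Zod.
    import { z } from 'zod';

    const EnvSchema = z.object({
      FIRECRAWL_KEY: z.string().min(1, 'FIRECRAWL_KEY is required'),
      // OPENAI_KEY may be omitted when a custom endpoint/model is configured instead.
      OPENAI_KEY: z.string().min(1).optional(),
      FIRECRAWL_BASE_URL: z.string().url().optional(),
      OPENAI_ENDPOINT: z.string().url().optional(),
      CUSTOM_MODEL: z.string().optional(),
    });

    export function checkEnv(env = process.env) {
      const parsed = EnvSchema.safeParse(env);
      if (!parsed.success) {
        console.error('Invalid environment configuration:', parsed.error.flatten().fieldErrors);
        process.exit(1);
      }
      return parsed.data;
    }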

Documentation

The main documentation for Deep Research is the README on its GitHub page (dzhng/deep-research README). This README serves as a user guide and includes:

  • an overview of what the agent does and how the iterative research loop works;
  • the requirements (Node.js and the Firecrawl and OpenAI API keys);
  • step-by-step setup instructions for both the Node.js and Docker paths;
  • usage instructions, including the breadth/depth parameters and notes on tuning concurrency;
  • configuration options for alternative models (DeepSeek R1 via Fireworks, custom OpenAI-compatible endpoints);
  • the license (MIT).

Beyond the README, there is no separate documentation website or wiki at this time. However, the codebase is intentionally kept small and commented where necessary, making it relatively approachable for developers to read and understand the logic. The project’s simplicity is part of its design – to serve as a clear example of a research agent – so developers can use the README and the source code itself as reference documentation.

Contributors and Community Support

Deep Research was created by David Zhang (GitHub handle dzhng), and he is the principal maintainer. The project has an active community around it: as of March 2025, the GitHub repository lists 14 contributors who have contributed code or documentation (dzhng/deep-research, GitHub). In addition to the original author, many others have submitted improvements, bug fixes, or enhancements via pull requests. The community involvement is also evident from the number of issues and discussions: there are dozens of open issues (feature requests, bug reports, ideas) and ongoing pull requests, indicating that users are actively engaging with the project.

For support or questions, the main avenue is the GitHub repository itself. Users can open issues to report problems or ask for help, and the maintainer and community members often respond there. (There is no dedicated Slack or Discord officially linked in the project documentation, so GitHub issues and discussions serve as the primary support channel.) Given the popularity of the project, there are also third-party discussions on forums like Reddit and Hacker News where users share experiences and tips. For example, users on Reddit have praised Deep Research as one of the best research tools and discussed how to run it with local models (Any open source or commercial apis for deep research out? - Reddit) (Anybody having luck with running local deep research? - Reddit). This kind of community conversation helps newcomers troubleshoot and get the most out of the tool even outside the official repo.

Moreover, the enthusiasm for Deep Research has led to community-driven extensions and ports. Notably:

  • A developer created a Python port of Deep Research (available as the repository epuerta9/deep-research-py on GitHub) which aims to replicate the functionality in Python, potentially to integrate with Python-based workflows or avoid Node setup. This means users who prefer Python can still leverage the ideas of Deep Research. (The tagline for the Python version mentions “save $200 a month” – likely implying it helps avoid expensive proprietary solutions by using this open tool.)
  • Another community project built a web-based UI for Deep Research (AnotiaWang/deep-research-web-ui on GitHub) so that users can interact with the research agent through a browser interface instead of the command line. This web UI wraps around dzhng’s core engine (and even supports the DeepSeek R1 model), providing a more user-friendly experience. Such projects extend Deep Research’s accessibility to less technical users and demonstrate the community’s willingness to improve the tool.

These community contributions and high-profile mentions (including a Hugging Face blog and various tech articles) form a support network around Deep Research. While the project itself is young, its community is growing quickly, which bodes well for those adopting it – one can likely find help or collaborators readily.

Recent Updates and Future Roadmap

Deep Research is under active development. The repository is updated frequently (over 60 commits within just a couple of months of its release), addressing issues and adding features. For instance, support for the Fireworks-hosted DeepSeek R1 model was introduced as a new feature, allowing users to leverage a powerful open-source model in place of OpenAI’s model (dzhng/deep-research README). This update indicates the maintainer’s responsiveness to community interest in more open and cost-effective LLM options. Other recent commits have likely included bug fixes and performance tweaks (for example, handling of concurrency limits or better parsing of search results), as is common in a fast-evolving project.

In terms of a future roadmap, the official repository does not provide a formal public roadmap document. There isn’t a specific section in the README or an official announcement detailing upcoming features. However, based on the project’s trajectory and discussions in the community, we can infer some possible directions:

  • The maintainer and contributors may continue to improve the agent’s capabilities, such as enabling it to read a wider variety of content formats (PDFs, docs, etc.) or handle more complex query types. (One external analysis suggested expanding the range of file formats the agent can read and adding more fine-grained file handling as beneficial improvements (Open-source DeepResearch – Freeing our search agents), though this is a general suggestion in the community rather than a committed plan for this repo.)
  • Integration with more tools could be explored. For example, replacing or augmenting the text-based web scraper with a more advanced browsing approach (even a vision-based browser for content that requires rendering) has been mentioned in AI agent circles (Open-source DeepResearch – Freeing our search agents). This could potentially appear in Deep Research if the community contributes it or if the sponsor (Aomni) has interest in such features.
  • Given the project’s emphasis on keeping things simple (<500 LOC), any new features will likely be carefully weighed to avoid unnecessary complexity. The author might focus on refinements – improving the quality of the research results, making the LLM prompts more effective, reducing API usage costs, etc., rather than dramatically increasing scope.

Overall, while no official roadmap is published, the momentum behind Deep Research suggests it will continue to evolve. The creator welcomes contributions (“feel free to open a PR”) and the broader community (including AI enthusiasts on Hugging Face and other open-source contributors) is already collaborating on similar “deep research” agents. It’s reasonable to expect future updates to include better performance, support for additional LLM providers (e.g., other OpenAI-compatible APIs or local models), and perhaps easier ways to use the tool (maybe a built-in web UI or integration with chat interfaces). Users interested in upcoming changes should watch the GitHub repository for new commits or issues labeled as enhancements. And if someone has a specific feature in mind, they can raise it in the issue tracker – given how active the project is, such suggestions are likely to be considered promptly.

Sources: The information above is summarized from the official Deep Research GitHub repository (dzhng/deep-research), in particular its README (features, usage, and installation instructions) and repository metadata (stars, forks, contributors), as well as community discussions and related projects such as epuerta9/deep-research-py and AnotiaWang/deep-research-web-ui.
