Deep Research (GitHub Project by dzhng)
Purpose and Overview
Deep Research (also called Open Deep Research) is an open-source, AI-powered research assistant that performs iterative, “deep” research on any topic by combining web search, web scraping, and large language models (LLMs) (GitHub - dzhng/deep-research: An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the simplest implementation of a deep research agent - e.g. an agent that can refine its research direction overtime and deep dive into a topic.). It refines its research direction over multiple steps, diving deeper based on previous findings. The project’s goal is to provide a simple yet effective implementation of a deep research agent in under 500 lines of code, for transparency and ease of extension (GitHub - dzhng/deep-research). In practice, Deep Research takes a user query, systematically searches for information, analyzes the results, and produces a comprehensive report with findings and sources. It was created by David Zhang (GitHub user dzhng) and is released under the MIT License (GitHub - dzhng/deep-research).
The repository is popular, with over 14,000 stars and 1,400 forks on GitHub, indicating strong community interest (GitHub - dzhng/deep-research). The project is sponsored by Aomni, an AI startup (deep-research/README.md at main · dzhng/deep-research · GitHub), but it is available for anyone to use or contribute to.
Features
Deep Research provides several key features for automated research (deep-research/README.md at main · dzhng/deep-research · GitHub):
- Iterative Research: It performs deep multi-step research by iteratively generating search queries, retrieving results, and then formulating new queries or directions based on what it learns (deep-research/README.md at main · dzhng/deep-research · GitHub). This allows the agent to progressively “dig deeper” into a topic rather than stopping at surface-level information.
- Intelligent Query Generation: The system uses LLMs to generate targeted search queries from the user’s query and the research context (deep-research/README.md at main · dzhng/deep-research · GitHub). In other words, it intelligently comes up with search terms or questions that are likely to find relevant information, guided by the current research goals and prior findings.
- Depth & Breadth Control: Users can configure how wide and deep the research goes via parameters. The breadth setting controls how many different search queries/topics to explore at each level, and the depth setting controls how many layers of follow-up research to conduct (deep-research/README.md at main · dzhng/deep-research · GitHub). This ensures the user can balance between exploring broadly versus drilling down narrowly.
- Smart Follow-Up Questions: Before the automated search begins, the assistant can ask the user clarifying follow-up questions to refine the research direction (deep-research/README.md at main · dzhng/deep-research · GitHub). This helps the agent understand the user’s intent and adjust its queries or focus accordingly, leading to more relevant results.
- Comprehensive Reports: Deep Research compiles all its findings into a detailed Markdown report, including the information gathered and citations to sources (deep-research/README.md at main · dzhng/deep-research · GitHub). The final report is saved as a file (e.g. report.md or answer.md) for the user to review, containing the answer or summary of the research question and references.
- Concurrent Processing: The tool can run multiple search queries and result analyses in parallel to improve efficiency (deep-research/README.md at main · dzhng/deep-research · GitHub). By default there is a concurrency limit in place, but advanced users with higher API rate limits can increase it for faster performance (deep-research/README.md at main · dzhng/deep-research · GitHub).
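To make the report feature concrete, here is a minimal sketch of assembling a Markdown report from findings and their source URLs. The `Finding` type and the `buildReport` function are hypothetical names invented for this illustration; in the project itself the report text is produced with the LLM, and its exact layout may differ.

```typescript
// Hypothetical shape of one research finding; not taken from the project.
interface Finding {
  text: string;      // a short statement learned during research
  sourceUrl: string; // the page the statement came from
}

// Build a Markdown report: the question, bulleted findings, then a
// deduplicated, numbered list of sources.
function buildReport(question: string, findings: Finding[]): string {
  const sources = Array.from(new Set(findings.map((f) => f.sourceUrl)));
  const lines: string[] = [`# ${question}`, "", "## Findings", ""];
  for (const f of findings) {
    // Cite each finding by the 1-based position of its source in the list below.
    lines.push(`- ${f.text} [${sources.indexOf(f.sourceUrl) + 1}]`);
  }
  lines.push("", "## Sources", "");
  sources.forEach((url, i) => {
    lines.push(`${i + 1}. ${url}`);
  });
  return lines.join("\n");
}
```

Numbering the sources once and citing them by index keeps the findings readable even when several findings come from the same page.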
Installation and Requirements
Deep Research is a Node.js application written in TypeScript, so it requires a Node.js runtime (the project targets Node v22.x per its configuration (deep-research/package.json at main · dzhng/deep-research · GitHub)). You can either set it up directly on your machine or use Docker. Before installation, you will need API keys for two services: a Firecrawl API key (for web search and content extraction) and an OpenAI API key (for the LLM) (deep-research/README.md at main · dzhng/deep-research · GitHub). (Firecrawl is a service for performing search engine queries and scraping pages programmatically, and OpenAI’s API provides the language model for analysis.) The basic requirements are summarized as follows (deep-research/README.md at main · dzhng/deep-research · GitHub):
- Platform/OS: Any system that can run Node.js (Linux, macOS, Windows). Docker can be used as an alternative for a containerized setup.
- Node.js Environment: Install Node.js (version 22.x recommended) and npm. The codebase is ~97% TypeScript (GitHub - dzhng/deep-research), so it runs via Node.
- API Keys: You must have API credentials for Firecrawl and OpenAI. (These will be placed in an environment file during setup.) Optionally, a Fireworks API key is needed to use the DeepSeek R1 model (an open-source large model) instead of OpenAI’s model (deep-research/README.md at main · dzhng/deep-research · GitHub).
Node.js Installation
To install Deep Research locally using Node.js:
- Clone the Repository: Download or clone the GitHub repository to your machine.
- Install Dependencies: In the project directory, run the installation command to install all required Node packages with npm (deep-research/README.md at main · dzhng/deep-research · GitHub):
npm install
This fetches libraries such as the Firecrawl client, the OpenAI SDK, and others (Express, Zod, etc.) that the project depends on (deep-research/package.json at main · dzhng/deep-research · GitHub).
- Configure API Keys: Create a .env.local file (you can copy or rename the provided .env.example file) and add your API keys and settings (deep-research/README.md at main · dzhng/deep-research · GitHub). At minimum, you need to set:
FIRECRAWL_KEY="your_firecrawl_key"
OPENAI_KEY="your_openai_key"
You can also configure optional variables here, such as FIRECRAWL_BASE_URL if using a self-hosted Firecrawl server, or OPENAI_ENDPOINT and OPENAI_MODEL if you want to use a custom or local LLM instead of OpenAI (deep-research/README.md at main · dzhng/deep-research · GitHub). If you leave out the OpenAI key and instead provide a custom endpoint and model, the tool will use those settings to query a local or alternative model.
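The rules above (a Firecrawl key is required, plus either an OpenAI key or a custom endpoint and model) can be expressed as a small validation step. The sketch below is an illustration only: `loadConfig` and `ResearchConfig` are hypothetical names, and the project's own configuration code may be organized differently. The environment variable names are the ones listed above.

```typescript
// A plain map of environment variables, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Hypothetical config shape for this illustration.
interface ResearchConfig {
  firecrawlKey: string;
  firecrawlBaseUrl?: string; // only for a self-hosted Firecrawl server
  openaiKey?: string;
  openaiEndpoint?: string;   // custom or local OpenAI-compatible endpoint
  openaiModel?: string;      // model name served by that endpoint
}

// Validate the settings described in the text and fail early with a clear message.
function loadConfig(env: Env): ResearchConfig {
  const firecrawlKey = env["FIRECRAWL_KEY"];
  if (!firecrawlKey) {
    throw new Error("FIRECRAWL_KEY is required");
  }
  const hasOpenAiKey = Boolean(env["OPENAI_KEY"]);
  const hasCustomLlm = Boolean(env["OPENAI_ENDPOINT"] && env["OPENAI_MODEL"]);
  if (!hasOpenAiKey && !hasCustomLlm) {
    throw new Error("Set OPENAI_KEY, or both OPENAI_ENDPOINT and OPENAI_MODEL");
  }
  return {
    firecrawlKey,
    firecrawlBaseUrl: env["FIRECRAWL_BASE_URL"],
    openaiKey: env["OPENAI_KEY"],
    openaiEndpoint: env["OPENAI_ENDPOINT"],
    openaiModel: env["OPENAI_MODEL"],
  };
}
```

In a Node.js program this would typically be called once at startup as `loadConfig(process.env)`, so that a missing key is reported before any search is attempted.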
Docker Installation
If you prefer using Docker (which avoids manual Node setup and isolates the environment), the repository includes a Docker setup:
- Clone the Repository: Get the code as above.
- Create Environment File: Rename the .env.example file to .env.local and insert your API keys into it (same as the Node.js step) (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Install Dependencies: Even in the Docker setup, it’s recommended to run npm install once (either on the host or in the Docker build) to ensure dependencies are in place (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Start Docker Containers: Run the provided Docker Compose configuration to start the container in the background (deep-research/README.md at main · dzhng/deep-research · GitHub):
docker compose up -d
This sets up a container (named deep-research by default) with the Node environment.
- Execute the App in Container: Finally, execute the Deep Research program inside the Docker container (deep-research/README.md at main · dzhng/deep-research · GitHub). For example:
docker exec -it deep-research npm run docker
This runs the command inside the already-running container and starts the research agent. (The npm run docker script likely ensures the application runs with the correct settings inside Docker.)
After either installation method, you should have the application ready to accept research queries.
Usage
To use Deep Research, run the program and respond to its prompts. If installed via Node, run npm start in the project directory to launch the research assistant (deep-research/README.md at main · dzhng/deep-research · GitHub). (If using Docker, the last installation step already runs the program inside the container.) Once running, the tool operates in a console (command-line) interface:
- Initial Prompt: The system will ask you to enter your research query (the topic or question you want investigated). After that, it asks for the breadth and depth parameters for the research (deep-research/README.md at main · dzhng/deep-research · GitHub). You can input how broad you want the search (number of parallel search queries, recommended 3–10, default 4) and how deep the research should go (number of recursive iterations, recommended 1–5, default 2) (deep-research/README.md at main · dzhng/deep-research · GitHub). These settings control the scope of the exploration.
- Interactive Refinement: The assistant may also ask follow-up questions to refine its understanding of your needs (deep-research/README.md at main · dzhng/deep-research · GitHub). For example, it might seek clarification on ambiguous terms or ask you to choose between subtopics, which helps it steer the research in the most relevant direction. After you respond to any follow-ups, the automated search process begins.
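As a rough illustration of how these two settings shape a run, the sketch below parses the prompt answers with the defaults quoted above (breadth 4, depth 2) and estimates a ceiling on the number of search queries. The function names are hypothetical, and the estimate rests on an assumption that is not stated in the README: that every query can spawn up to `breadth` follow-up queries at the next level. The real tool may issue fewer queries, for example by narrowing the breadth at deeper levels.

```typescript
const DEFAULT_BREADTH = 4; // default quoted in the usage notes
const DEFAULT_DEPTH = 2;   // default quoted in the usage notes

// Parse a prompt answer as a positive integer; fall back to the default
// when the answer is empty, not a number, or less than 1.
function parseSetting(answer: string, fallback: number): number {
  const n = Number.parseInt(answer.trim(), 10);
  return Number.isNaN(n) || n < 1 ? fallback : n;
}

// Ceiling on the number of search queries, ASSUMING each query spawns up to
// `breadth` follow-ups per level: breadth + breadth^2 + ... + breadth^depth.
function maxQueries(breadth: number, depth: number): number {
  let total = 0;
  let atLevel = 1;
  for (let level = 1; level <= depth; level++) {
    atLevel *= breadth;
    total += atLevel;
  }
  return total;
}
```

Under that assumption the defaults give at most 4 + 16 = 20 queries, while breadth 10 and depth 5 would allow up to 111,110, which is why both values are worth choosing conservatively when API usage is metered.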
Once you’ve provided the necessary inputs, Deep Research will autonomously perform the following steps to gather information and produce results:
- Generate and Execute Searches: The agent uses the LLM to formulate relevant search engine queries based on your query and goals. It then executes these searches via the Firecrawl API, retrieving the top web results (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Process Results: For each result, it will visit the page (scrape content) and analyze the information using the LLM (deep-research/README.md at main · dzhng/deep-research · GitHub). The system looks for answers, key facts, or new leads in the text. It keeps track of “learnings” (important findings) and “directions” (new questions or subtopics to pursue) (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Recursive Deepening: If the depth setting > 0 and there are new directions to explore, the agent will iterate. It treats the new directions as next-level queries and repeats the search and analysis cycle (deep-research/README.md at main · dzhng/deep-research · GitHub). In each iteration, it incorporates what it has learned so far (prior goals and new questions) to focus the subsequent searches (deep-research/README.md at main · dzhng/deep-research · GitHub). This looping continues until the specified depth is reached (or no further directions remain).
- Report Generation: After completing the required rounds of research, Deep Research compiles a comprehensive report in Markdown format (deep-research/README.md at main · dzhng/deep-research · GitHub). The report includes the question, the gathered information and answers, and references (citations) to the source URLs for verification (deep-research/README.md at main · dzhng/deep-research · GitHub). This report is saved to a file in the working directory (by default named report.md, or answer.md in certain modes) (deep-research/README.md at main · dzhng/deep-research · GitHub). You can open this Markdown file to read the findings presented in a structured way.
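The search, analyze, and deepen cycle described in these steps can be sketched as a short recursive function. This is an illustration under stated assumptions, not the project's code: `Analysis`, `ResearchState`, `SearchAndAnalyze`, and `deepResearch` are hypothetical names, the Firecrawl search and LLM analysis are hidden behind one injected callback, and queries are processed one at a time for clarity rather than in parallel.

```typescript
// What analyzing one query yields: findings, new questions, and visited pages.
interface Analysis {
  learnings: string[];
  directions: string[];
  urls: string[];
}

// Stand-in for "search the web, scrape the pages, analyze with the LLM".
type SearchAndAnalyze = (
  query: string,
  priorLearnings: readonly string[],
) => Promise<Analysis>;

interface ResearchState {
  learnings: string[];
  visitedUrls: string[];
}

// Append items that are not already present.
function addUnique(target: string[], items: string[]): void {
  for (const item of items) {
    if (target.indexOf(item) === -1) target.push(item);
  }
}

// Explore up to `breadth` queries, then recurse on each query's new
// directions until `depth` levels are done or no directions remain.
async function deepResearch(
  queries: string[],
  breadth: number,
  depth: number,
  searchAndAnalyze: SearchAndAnalyze,
  state: ResearchState = { learnings: [], visitedUrls: [] },
): Promise<ResearchState> {
  if (depth <= 0) return state;
  for (const query of queries.slice(0, breadth)) {
    // Earlier learnings are passed along so later searches can build on them.
    const result = await searchAndAnalyze(query, state.learnings);
    addUnique(state.learnings, result.learnings);
    addUnique(state.visitedUrls, result.urls);
    await deepResearch(result.directions, breadth, depth - 1, searchAndAnalyze, state);
  }
  return state;
}
```

Injecting the search-and-analyze step as a callback keeps the control flow testable without network access; the accumulated `learnings` and `visitedUrls` are what a final report step would consume.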
Concurrency: By default, the agent runs a few searches in parallel to speed up the process. If you have a paid or self-hosted Firecrawl with higher rate limits, you can increase the concurrency setting in the code to make it run even faster (conduct more searches simultaneously) (deep-research/README.md at main · dzhng/deep-research · GitHub). Conversely, if you’re on the free Firecrawl plan and hit rate limits, you might lower the concurrency to avoid errors (deep-research/README.md at main · dzhng/deep-research · GitHub). This adjustability allows the user to optimize performance based on their resources.
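A concurrency limit of this kind can be pictured with the following minimal sketch, a simplified stand-in for what a library such as p-limit provides. `createLimiter` is a hypothetical helper written for this illustration and is not the project's code; the limit value a user would pick depends on their Firecrawl plan.

```typescript
// Returns a `run` function that executes async tasks with at most `limit`
// of them in flight at any time; extra tasks wait in a FIFO queue.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) {
      next(); // hand the free slot directly to the next waiting task
    } else {
      active--; // nobody is waiting, so the slot becomes free
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // All slots are busy: wait until a finishing task hands over its slot.
      await new Promise<void>((resolve) => {
        waiting.push(resolve);
      });
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

With `const run = createLimiter(2)`, wrapping each search as `run(() => search(query))` and awaiting them with `Promise.all` keeps at most two requests in flight; raising or lowering the number is the adjustment described above.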
Advanced Usage: Deep Research is flexible in terms of which language model it uses. Out of the box, it uses OpenAI’s “o3-mini” model via the OpenAI API for analyzing content and generating queries (deep-research/README.md at main · dzhng/deep-research · GitHub). However, the tool can automatically switch to DeepSeek R1, an open-weight model developed by DeepSeek and served through Fireworks AI, if you provide a Fireworks API key (deep-research/README.md at main · dzhng/deep-research · GitHub). DeepSeek R1 is a powerful LLM that can be used as a replacement for OpenAI’s model, potentially saving costs. To use it, add FIREWORKS_KEY="your_api_key" to the environment; the code detects this and will prefer the R1 model for all LLM operations (deep-research/README.md at main · dzhng/deep-research · GitHub). Additionally, you can configure custom endpoints and models, for example pointing to a local LLM server or a different OpenAI-compatible API, by setting OPENAI_ENDPOINT and CUSTOM_MODEL in the environment (deep-research/README.md at main · dzhng/deep-research · GitHub). These options make it possible to use Deep Research with various backend models (local or cloud) beyond the default.
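The selection logic described in this paragraph might look like the sketch below. It is an assumption-laden illustration: `pickModel` and `ModelChoice` are hypothetical names, the R1 model identifier is a placeholder rather than the real Fireworks model id, and the relative priority of a custom model versus R1 when both are configured is a guess, since the text only says that R1 is preferred once a Fireworks key is present.

```typescript
// A plain map of environment variables, e.g. process.env in Node.js.
type EnvVars = Record<string, string | undefined>;

interface ModelChoice {
  provider: "fireworks" | "custom" | "openai";
  model: string;
  endpoint?: string;
}

// Placeholder only; the actual Fireworks model identifier is not given in the text.
const R1_MODEL_ID = "deepseek-r1";

function pickModel(env: EnvVars): ModelChoice {
  const endpoint = env["OPENAI_ENDPOINT"];
  const customModel = env["CUSTOM_MODEL"];
  // Assumed priority: Fireworks R1 first, then a custom endpoint, then the default.
  if (env["FIREWORKS_KEY"]) {
    return { provider: "fireworks", model: R1_MODEL_ID };
  }
  if (endpoint && customModel) {
    return { provider: "custom", model: customModel, endpoint };
  }
  return { provider: "openai", model: "o3-mini" };
}
```

Keeping the decision in one pure function makes it easy to see, and to test, which backend a given environment file will select.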
Supported Platforms and Dependencies
Platforms: Since Deep Research runs on Node.js, it is essentially platform-agnostic – it can run on any operating system that supports Node (Linux, Windows, macOS). The provided Docker setup further ensures it can run in a containerized environment, which can be deployed on cloud servers or other OSes with Docker support. There’s no specialized hardware requirement mentioned; however, access to the internet is necessary (for the search queries and web scraping).
Programming Language: The project is primarily written in TypeScript (approximately 97% TypeScript, with a small amount of JavaScript and Docker configuration) (GitHub - dzhng/deep-research). This means the code is executed under Node.js (with the TypeScript transpiled to JavaScript).
Dependencies: The package.json lists several dependencies that the project uses (deep-research/package.json at main · dzhng/deep-research · GitHub). Key libraries and services include:
- Firecrawl (Mendable’s Firecrawl API) – used for performing web searches (SERP queries) and scraping web page content via an API (deep-research/README.md at main · dzhng/deep-research · GitHub). The Node library @mendable/firecrawl-js is used to interact with this service (deep-research/package.json at main · dzhng/deep-research · GitHub). This handles the “web browsing” component for the agent.
- OpenAI API – used for natural language processing (generating search queries, analyzing text, asking follow-ups). The project uses the AI SDK’s OpenAI provider (@ai-sdk/openai) to call OpenAI’s models (deep-research/package.json at main · dzhng/deep-research · GitHub). By default it uses OpenAI’s o3-mini model (a small reasoning model in OpenAI’s o-series) for cost efficiency, but you can configure it to use other models or endpoints as noted above.
- Fireworks AI SDK – the dependency @ai-sdk/fireworks in the project suggests integration with Fireworks’ services (deep-research/package.json at main · dzhng/deep-research · GitHub). In particular this is tied to the DeepSeek R1 model support. When a Fireworks API key is provided, the agent will use the R1 model hosted by Fireworks instead of OpenAI, via this SDK.
- Express & CORS: The project includes Express (a web framework) and CORS middleware (deep-research/package.json at main · dzhng/deep-research · GitHub). This indicates there might be a small web server or API endpoints (perhaps for hosting a local interface or for Docker usage). It could be used to serve the final report or to facilitate the Docker container’s operation. However, the primary user interaction is via console; the presence of Express may be for future web UI integration or for handling any local callback requests during searches.
- Utility Libraries: It uses lodash-es for utility functions, p-limit to manage promise concurrency limits (deep-research/package.json at main · dzhng/deep-research · GitHub), uuid for generating unique identifiers, and zod for schema validation of data (deep-research/package.json at main · dzhng/deep-research · GitHub). It also includes js-tiktoken for tokenization (likely to manage token limits when sending data to the LLM, ensuring prompts don’t exceed model context length) (deep-research/package.json at main · dzhng/deep-research · GitHub).
All these dependencies are installed automatically via npm install. The environment variables (.env file) are crucial for connecting to the external services (without valid API keys, the search and LLM features won’t function). The documentation stresses providing those keys during setup.
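To illustrate the token-limit concern mentioned for js-tiktoken, here is a minimal sketch of trimming scraped text so that it fits a token budget. It does not call js-tiktoken: the token counter is injected, and the default `approxTokens` is a crude placeholder (roughly four characters per token) used only to keep the example self-contained. `trimToTokenLimit` is a hypothetical helper, and it assumes the counter never decreases as the text gets longer.

```typescript
// Counts tokens in a string. Real code would back this with an actual
// tokenizer; the default below is only a rough approximation.
type TokenCounter = (text: string) => number;

const approxTokens: TokenCounter = (text) => Math.ceil(text.length / 4);

// Return the longest prefix of `text` whose token count is within
// `maxTokens`, found by binary search over the prefix length.
function trimToTokenLimit(
  text: string,
  maxTokens: number,
  count: TokenCounter = approxTokens,
): string {
  if (count(text) <= maxTokens) return text;
  let lo = 0;           // longest prefix length known to fit
  let hi = text.length; // shortest prefix length known not to fit
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (count(text.slice(0, mid)) <= maxTokens) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return text.slice(0, lo);
}
```

Binary search keeps the number of tokenizer calls small (logarithmic in the text length), which matters when the counter is a real tokenizer and the scraped page is long.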
Documentation
The main documentation for Deep Research is the README on its GitHub page (deep-research/README.md at main · dzhng/deep-research · GitHub). This README serves as a user guide and includes:
- Introduction: An explanation of what the tool is and the philosophy behind it (i.e., an open, simple implementation of a deep research agent under 500 LOC) (GitHub - dzhng/deep-research). It briefly describes how the system works and the goals of the project.
- How It Works: There is a high-level overview of the internal workflow. The README even contains a flowchart (written in Mermaid markdown) illustrating the agent’s process flow (deep-research/README.md at main · dzhng/deep-research · GitHub), from taking input (user query, breadth, depth) to generating search queries, processing results, deciding whether to go deeper, and finally producing a report. This diagram gives users and developers a conceptual understanding of the iterative loop that Deep Research follows.
- Features List: A list of the main features and capabilities (iterative search, smart query generation, etc.), which we summarized above (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Setup Instructions: Step-by-step instructions for both Node.js setup and Docker setup are provided in the README (deep-research/README.md at main · dzhng/deep-research · GitHub). It details how to install dependencies and configure the environment variables, which is very helpful for new users.
- Usage Guide: The README outlines how to run the tool (npm start) and explains the prompts the user will see (entering the query, breadth, depth, etc.) (deep-research/README.md at main · dzhng/deep-research · GitHub). It then describes what the program will do in response (generating searches, etc.) and notes where the output is saved (deep-research/README.md at main · dzhng/deep-research · GitHub).
- Tips and Advanced Configurations: There are sections covering things like adjusting concurrency settings for performance (deep-research/README.md at main · dzhng/deep-research · GitHub), enabling the DeepSeek R1 model with the Fireworks API key (deep-research/README.md at main · dzhng/deep-research · GitHub), and using custom LLM endpoints/models via environment variables (deep-research/README.md at main · dzhng/deep-research · GitHub). These act as documentation for power users who might want to tweak the behavior of Deep Research or integrate it with other AI models.
- Sponsor and Contribution Note: The author mentions sponsorship by Aomni and encourages starring the repo and following him on Twitter/X (deep-research/README.md at main · dzhng/deep-research · GitHub), but there isn’t a formal contribution guide beyond “feel free to fork or PR.”
Beyond the README, there is no separate documentation website or wiki at this time. However, the codebase is intentionally kept small and commented where necessary, making it relatively approachable for developers to read and understand the logic. The project’s simplicity is part of its design – to serve as a clear example of a research agent – so developers can use the README and the source code itself as reference documentation.
Contributors and Community Support
Deep Research was created by David Zhang (GitHub handle dzhng), and he is the principal maintainer. The project has an active community around it: as of March 2025, the GitHub repository lists 14 contributors who have contributed code or documentation (GitHub - dzhng/deep-research). This suggests that in addition to the original author, others have submitted improvements, bug fixes, or enhancements via pull requests. The community involvement is also evident from the number of issues and discussions: there are dozens of open issues (feature requests, bug reports, ideas) and ongoing pull requests (GitHub - dzhng/deep-research), indicating that users are actively engaging with the project.
For support or questions, the main avenue is the GitHub repository itself. Users can open issues to report problems or ask for help, and the maintainer and community members often respond there. (There is no dedicated Slack or Discord officially linked in the project documentation, so GitHub issues and discussions serve as the primary support channel.) Given the popularity of the project, there are also third-party discussions on forums like Reddit and Hacker News where users share experiences and tips. For example, users on Reddit have praised Deep Research as one of the best research tools and discussed how to run it with local models (Any open source or commercial apis for deep research out? - Reddit) (Anybody having luck with running local deep research? - Reddit). This kind of community conversation helps newcomers troubleshoot and get the most out of the tool even outside the official repo.
Moreover, the enthusiasm for Deep Research has led to community-driven extensions and ports. Notably:
- A developer created a Python port of Deep Research (available as a repository deep-research-py) which aims to replicate the functionality in Python, potentially to integrate with Python-based workflows or avoid Node setup (epuerta9/deep-research-py: save 200 a month and use ... - GitHub). This suggests that users who prefer Python can still leverage the ideas of Deep Research. (The tagline for the Python version mentions “save $200 a month”, likely implying it helps avoid expensive proprietary solutions by using this open tool (epuerta9/deep-research-py: save 200 a month and use ... - GitHub).)
- Another community project built a web-based UI for Deep Research (deep-research-web-ui) so that users can interact with the research agent through a browser interface instead of the command line (AnotiaWang/deep-research-web-ui - GitHub). This web UI wraps around dzhng’s core engine (and even supports the DeepSeek R1 model) (AnotiaWang/deep-research-web-ui - GitHub), providing a more user-friendly experience. Such projects extend Deep Research’s accessibility to less technical users and demonstrate the community’s willingness to improve the tool.
These community contributions and high-profile mentions (including a Hugging Face blog and various tech articles) form a support network around Deep Research. While the project itself is young, its community is growing quickly, which bodes well for those adopting it – one can likely find help or collaborators readily.
Recent Updates and Future Roadmap
Deep Research is under active development. The repository is updated frequently (over 60 commits so far, within just a couple of months of its release), addressing issues and adding features. For instance, support for the Fireworks DeepSeek R1 model was introduced as a new feature, allowing users to leverage a powerful open-source model in place of OpenAI’s model (deep-research/README.md at main · dzhng/deep-research · GitHub). This update indicates the maintainer’s responsiveness to the community’s interest in more open and cost-effective LLM options. Other recent commits have likely included bug fixes and performance tweaks (for example, handling of concurrency limits or better parsing of search results), as common in a fast-evolving project.
In terms of a future roadmap, the official repository does not provide a formal public roadmap document. There isn’t a specific section in the README or an official announcement detailing upcoming features. However, based on the project’s trajectory and discussions in the community, we can infer some possible directions:
- The maintainer and contributors may continue to improve the agent’s capabilities, such as enabling it to read a wider variety of content formats (PDFs, docs, etc.) or handle more complex query types. (One external analysis suggested expanding the range of file formats the agent can read and adding more fine-grained file handling as beneficial improvements (Open-source DeepResearch – Freeing our search agents), though this is a general suggestion in the community rather than a committed plan for this repo.)
- Integration with more tools could be explored. For example, replacing or augmenting the text-based web scraper with a more advanced browsing approach (even a vision-based browser for content that requires rendering) has been mentioned in AI agent circles (Open-source DeepResearch – Freeing our search agents). This could potentially appear in Deep Research if the community contributes it or if the sponsor (Aomni) has interest in such features.
- Given the project’s emphasis on keeping things simple (<500 LOC), any new features will likely be carefully weighed to avoid unnecessary complexity. The author might focus on refinements – improving the quality of the research results, making the LLM prompts more effective, reducing API usage costs, etc., rather than dramatically increasing scope.
Overall, while no official roadmap is published, the momentum behind Deep Research suggests it will continue to evolve. The creator welcomes contributions (“feel free to open a PR”) and the broader community (including AI enthusiasts on Hugging Face and other open-source contributors) is already collaborating on similar “deep research” agents. It’s reasonable to expect future updates to include better performance, support for additional LLM providers (e.g., other OpenAI-compatible APIs or local models), and perhaps easier ways to use the tool (maybe a built-in web UI or integration with chat interfaces). Users interested in upcoming changes should watch the GitHub repository for new commits or issues labeled as enhancements. And if someone has a specific feature in mind, they can raise it in the issue tracker – given how active the project is, such suggestions are likely to be considered promptly.
Sources: The information above is summarized from the official Deep Research GitHub repository README and metadata, as well as community discussions and contributions related to the project. Key references include the GitHub README for features, usage, and installation instructions (deep-research/README.md at main · dzhng/deep-research · GitHub), and the repository page for project stats such as stars and contributors (GitHub - dzhng/deep-research). Community projects and discussions are noted from external sources like GitHub search results and blogs (epuerta9/deep-research-py: save 200 a month and use ... - GitHub) (AnotiaWang/deep-research-web-ui - GitHub).