Domain Research (dr) — Live Web

This guide shows how to use omicverse.llm.domain_research with a live web search retriever to produce a comprehensive, cited report sourced from the internet.

Prerequisites: - Optional but recommended: set TAVILY_API_KEY for the Tavily backend. - For the DuckDuckGo backend, install duckduckgo_search and beautifulsoup4 for better results.

Quick start (auto backend):

from omicverse.llm.domain_research import ResearchManager

# Easiest: `vector_store="web"` auto-selects Tavily if TAVILY_API_KEY is set,
# otherwise falls back to DuckDuckGo.
rm = ResearchManager(vector_store="web")
report = rm.run("State-of-the-art methods for single-cell integration in 2024")
print(report)

Force backend or tweak retrieval: - vector_store="web:tavily" or vector_store="web:duckduckgo" to force a backend. - For manual control (e.g., max_results, fetch_content), import and instantiate WebRetrieverStore directly.

Tips: - fetch_content=True fetches and extracts text from result URLs. Set to False to rely on snippets only. - Combine with an LLM-backed synthesizer for a stronger executive summary:

import os
from omicverse.llm.domain_research.write.synthesizer import PromptSynthesizer

synth = PromptSynthesizer(
    model="gpt-4o-mini",
    base_url="https://api.openai.com/v1",
    api_key=os.getenv("OPENAI_API_KEY", ""),
)
rm = ResearchManager(vector_store="web", synthesizer=synth)
print(rm.run("Multi-omics integration benchmarks 2023–2025"))

Troubleshooting: - Tavily: ensure TAVILY_API_KEY is set and valid. - DuckDuckGo: for best stability use the duckduckgo_search package; otherwise the HTML fallback may be rate-limited or change over time. - If pages are not HTML or are behind paywalls, the fetcher returns the raw response text (truncated) as the document body.