
collector-rss

collector-rss ingests the RSS and Atom feed URLs defined as collection sources. It downloads each feed, caps the number of items taken from it, honours the HTTP timeout and custom User-Agent configured in YAML, and publishes normalized entities downstream.
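Capping items per feed amounts to truncating each parsed item list. The sketch below is a minimal, hypothetical helper (`clampItems` is not the collector's own function), assuming the collector keeps the first items in the order the feed lists them, which for most feeds means newest first.

```go
package main

import "fmt"

// clampItems is a hypothetical helper: it keeps at most limit items, in the
// order the feed delivered them. A negative limit is treated as zero.
func clampItems[T any](items []T, limit int) []T {
	if limit < 0 {
		limit = 0
	}
	if len(items) > limit {
		return items[:limit]
	}
	return items
}

func main() {
	titles := []string{"post-4", "post-3", "post-2", "post-1"}
	// With items_max_per_feed set to 2, only the first two entries survive.
	fmt.Println(clampItems(titles, 2)) // [post-4 post-3]
}
```

Slicing rather than copying keeps the helper allocation-free; a caller that later appends to the result should copy it first.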

Container invocation

Setting                  Value
ICEHIVE_SERVICE          collector-rss
ICEHIVE_CONTROLLER_URL   Controller base URL
-configdir               /etc/icehive
-listen                  :8087 (default)
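A minimal sketch of how the invocation settings could be gathered with Go's standard `flag` package. The environment variable and flag names come from the table; treating `/etc/icehive` as the `-configdir` default, the `invocation` type, and looking for `collector-rss.yaml` inside `-configdir` are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// invocation is a hypothetical struct gathering the container settings.
// Names come from the docs; flag defaults and the YAML location are assumed.
type invocation struct {
	Service       string // ICEHIVE_SERVICE, expected to be "collector-rss"
	ControllerURL string // ICEHIVE_CONTROLLER_URL
	ConfigDir     string // -configdir
	Listen        string // -listen
}

// yamlPath returns where this sketch looks for the optional YAML file.
func (inv invocation) yamlPath() string {
	return filepath.Join(inv.ConfigDir, "collector-rss.yaml")
}

// parseInvocation reads the two flags from args and the two env vars via getenv
// (injected so the function can be tested without touching the real environment).
func parseInvocation(args []string, getenv func(string) string) (invocation, error) {
	fs := flag.NewFlagSet("collector-rss", flag.ContinueOnError)
	configDir := fs.String("configdir", "/etc/icehive", "directory holding collector-rss.yaml")
	listen := fs.String("listen", ":8087", "metrics + health listener address")
	if err := fs.Parse(args); err != nil {
		return invocation{}, err
	}
	return invocation{
		Service:       getenv("ICEHIVE_SERVICE"),
		ControllerURL: getenv("ICEHIVE_CONTROLLER_URL"),
		ConfigDir:     *configDir,
		Listen:        *listen,
	}, nil
}

func main() {
	inv, err := parseInvocation(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v yaml=%s\n", inv, inv.yamlPath())
}
```

Using a dedicated `flag.FlagSet` with `ContinueOnError` lets the parser return an error instead of exiting, which keeps the function testable.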

Optional YAML (collector-rss.yaml)

Key                     Default               Purpose
listen                  :8087                 Metrics and health listener address
poll_interval_seconds   60                    Seconds between scans of the relational collection recipes
fetch_timeout_seconds   90                    HTTP client timeout in seconds; only a positive integer overrides the built-in 90-second fallback
user_agent              (see env fallback)    If non-empty, sent as the User-Agent header; otherwise ICEHIVE_RSS_USER_AGENT is used, and if that is also unset, gofeed's default applies
items_max_per_feed      25                    Maximum number of items persisted from each polled feed

Feeds are not listed in YAML: they live in relational collection_sources rows whose collector_type is collector-rss. The YAML file only tunes transport settings.

Environment overrides

Variable                 Behaviour
ICEHIVE_RSS_USER_AGENT   Used as the User-Agent whenever YAML omits user_agent or leaves it empty
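The User-Agent precedence (non-empty YAML value, then `ICEHIVE_RSS_USER_AGENT`, then the library default) can be sketched as below. `resolveUserAgent` and `newFeedRequest` are hypothetical helpers. The sketch builds requests with plain `net/http` rather than gofeed, so when no agent is resolved the request falls back to `net/http`'s default agent, not gofeed's.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// resolveUserAgent applies the documented precedence: a non-empty YAML
// user_agent wins, otherwise ICEHIVE_RSS_USER_AGENT. An empty result means
// "send no explicit agent" so the library default applies.
func resolveUserAgent(yamlValue string, getenv func(string) string) string {
	if yamlValue != "" {
		return yamlValue
	}
	return getenv("ICEHIVE_RSS_USER_AGENT")
}

// newFeedRequest is a hypothetical helper that sets the User-Agent header
// only when an agent was resolved.
func newFeedRequest(feedURL, ua string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	return req, nil
}

func main() {
	ua := resolveUserAgent("", os.Getenv) // as if YAML omitted user_agent
	req, err := newFeedRequest("https://example.org/feed.xml", ua)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("User-Agent header: %q\n", req.Header.Get("User-Agent"))
}
```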

Routing keys

Schemas are published under collector.source_schema.collector-rss, and collection requests are enqueued under collector.collection_request.collector-rss; both routing keys end with the service name, collector-rss.
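Both keys follow a prefix-plus-service-name pattern. The sketch below builds them that way; the helper names are hypothetical, and generalizing from the documented collector-rss keys to other service names is an assumption. The message broker itself is not specified here, so the sketch only produces the key strings.

```go
package main

import "fmt"

// Routing-key prefixes; each key is completed by appending the service
// name (ICEHIVE_SERVICE), e.g. "collector-rss".
const (
	sourceSchemaPrefix      = "collector.source_schema."
	collectionRequestPrefix = "collector.collection_request."
)

// schemaRoutingKey is a hypothetical helper for the schema publish key.
func schemaRoutingKey(service string) string { return sourceSchemaPrefix + service }

// requestRoutingKey is a hypothetical helper for the enqueue (request) key.
func requestRoutingKey(service string) string { return collectionRequestPrefix + service }

func main() {
	fmt.Println(schemaRoutingKey("collector-rss"))  // collector.source_schema.collector-rss
	fmt.Println(requestRoutingKey("collector-rss")) // collector.collection_request.collector-rss
}
```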