RSS Feeds
The Syndication feed parser parses RSS and Atom feeds from upstream documents, extracts each feed item, and forwards the items, along with their metadata, to other middlewares in the pipeline.
Parsing Feeds
To use this middleware, you import it in your CDK stack and instantiate it as part of a pipeline.
Metadata
This middleware automatically extracts feed item metadata and makes it available in the output CloudEvents. The following metadata fields are extracted from feed items when available.
Metadata | Description |
---|---|
title | The title of the feed item. |
description | The description of the feed item. |
createdAt | The creation date of the feed item. |
updatedAt | The last update date of the feed item. |
authors | The authors associated with the feed item. |
keywords | The keywords associated with the feed item. |
language | The language of the feed item. |
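
To make the mapping above more concrete, here is a minimal sketch of how these fields could be read from an RSS 2.0 item. It uses only the Python standard library rather than the middleware's own code, and the helper name `extract_item_metadata`, the choice of RSS tags (`pubDate` for createdAt, `category` for keywords, the channel-level `language`), and the sample feed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, used for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <language>en</language>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>A short summary.</description>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <author>jane@example.com (Jane Doe)</author>
      <category>news</category>
      <category>aws</category>
    </item>
  </channel>
</rss>"""


def _texts(item, tag):
    """Return the non-empty, stripped text of every child element named `tag`."""
    return [e.text.strip() for e in item.findall(tag) if e.text and e.text.strip()]


def extract_item_metadata(item, channel_language=None):
    """Map one RSS <item> element to metadata fields, skipping missing ones."""
    metadata = {}

    for key in ("title", "description"):
        values = _texts(item, key)
        if values:
            metadata[key] = values[0]

    # Assumption: the RSS publication date stands in for createdAt.
    # RSS 2.0 items have no standard "updated" element, so updatedAt is omitted.
    pub_dates = _texts(item, "pubDate")
    if pub_dates:
        try:
            metadata["createdAt"] = parsedate_to_datetime(pub_dates[0]).isoformat()
        except (TypeError, ValueError):
            pass  # Leave the field out when the date cannot be parsed.

    authors = _texts(item, "author")
    if authors:
        metadata["authors"] = authors

    # Assumption: RSS categories are treated as keywords.
    keywords = _texts(item, "category")
    if keywords:
        metadata["keywords"] = keywords

    if channel_language:
        metadata["language"] = channel_language

    return metadata


channel = ET.fromstring(SAMPLE_RSS).find("channel")
items = [
    extract_item_metadata(item, channel.findtext("language"))
    for item in channel.findall("item")
]
```

Fields that are absent from an item are simply left out of the resulting dictionary, which mirrors the "when available" behavior described above.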
Output
This middleware takes RSS or Atom syndication feeds as input and outputs multiple HTML documents, one associated with each extracted feed item. This makes it possible for downstream middlewares to process the HTML documents that make up the original feed in parallel.
Below is an example of an output HTML document extracted from a feed item by the syndication feed processor.
Limits
This middleware does not request feed items over HTTP to compute their size. Therefore, the size property of the document event is not set on output events for feed items.
Another limitation is that this middleware only outputs HTML documents and does not currently forward RSS enclosures (e.g., associated image or video documents) to downstream middlewares.
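
The fan-out behavior and the two limits described here can be sketched as follows. This is an illustration using the Python standard library, not the middleware's implementation: the function name `feed_items_to_documents` and the `url` and `type` keys of the returned dictionaries are hypothetical and do not reflect the actual event schema.

```python
import xml.etree.ElementTree as ET


def feed_items_to_documents(rss_xml):
    """Return one HTML document descriptor per RSS <item> that has a link."""
    root = ET.fromstring(rss_xml)
    documents = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # Without a link there is no HTML document to point to.
        documents.append({
            "url": link,          # Hypothetical key name.
            "type": "text/html",  # Every output document is typed as HTML.
            # No "size" key: the item is never fetched, so its size is unknown.
        })
        # <enclosure> children (images, audio, video) are deliberately ignored.
    return documents


# Hypothetical feed with one enclosure, used for illustration only.
feed = """<rss version="2.0"><channel>
  <item>
    <link>https://example.com/episode-1</link>
    <enclosure url="https://example.com/episode-1.mp3" length="123" type="audio/mpeg"/>
  </item>
</channel></rss>"""

documents = feed_items_to_documents(feed)
```

Each descriptor in the returned list is independent of the others, which is what lets downstream steps handle the items in parallel.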
Architecture
This middleware is based on a Lambda compute and uses the feedparser Python library to parse the feeds and extract the feed items.
Properties
Supported Inputs
Mime Type | Description |
---|---|
application/rss+xml | RSS feeds. |
application/atom+xml | Atom feeds. |
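
As a rough illustration of the two input types above, the sketch below guesses which MIME type an XML document corresponds to by looking at its root element. The function name `detect_feed_mime_type` is hypothetical, and this check is an assumption made for the example: it is not how the middleware itself selects its inputs.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_feed_mime_type(xml_text):
    """Return the matching feed MIME type, or None for any other document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return "application/rss+xml"
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "application/atom+xml"
    return None
```

For example, `detect_feed_mime_type('<rss version="2.0"><channel/></rss>')` returns `application/rss+xml`, while a document with any other root element returns `None`.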
Supported Outputs
Mime Type | Description |
---|---|
text/html | HTML documents. |
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Article Curation Pipeline - Builds a pipeline converting HTML articles into plain text.