
Build an AI News Summarizer: Complete Python Tutorial for Automated Content Digests

👤 By harshith
📅 Dec 13, 2025
⏱️ 18 min read


Introduction to AI News Summarization

In an era of information overload, AI-powered news summarization has become essential for staying informed without spending hours reading. From morning briefings to research alerts, intelligent summarization systems help users consume more information in less time.

In this comprehensive tutorial, you’ll build a complete news summarization system that aggregates content from multiple sources, generates concise summaries using state-of-the-art NLP models, and presents them through a clean interface. It’s the same class of technology behind news apps such as Feedly, Google News, and Artifact.

What You’ll Build

By the end of this tutorial, you’ll have a news summarizer that:

  • Fetches news from multiple RSS feeds and APIs
  • Extracts article content intelligently
  • Generates abstractive summaries using transformers
  • Groups related articles into topics
  • Provides customizable summary lengths
  • Includes a web dashboard for browsing summaries

Understanding Text Summarization

Types of Summarization

Extractive Summarization: Selects important sentences from the original text. Like highlighting key passages.

Abstractive Summarization: Generates new sentences that capture the essence. Like writing a brief in your own words.

Hybrid Approaches: Combine both methods to balance speed and quality.
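
To see the difference in practice, here is a small preview that runs both summarizers built later in this tutorial (Steps 3 and 4) on the same article; sample_article.txt is a stand-in for any long news story you have saved locally:

# Preview: run from the project root once Steps 3 and 4 are complete
from summarizer.extractive import ExtractiveSummarizer
from summarizer.abstractive import AbstractiveSummarizer

text = open("sample_article.txt").read()  # hypothetical local file with article text

# Extractive: returns three sentences copied verbatim from the article
print(ExtractiveSummarizer().summarize(text, num_sentences=3))

# Abstractive: returns newly generated sentences that paraphrase the article
print(AbstractiveSummarizer().summarize_news(text, style="brief"))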

Challenges in News Summarization

  • Maintaining factual accuracy without hallucination
  • Preserving key entities (names, dates, numbers)
  • Handling multiple perspectives on the same story
  • Generating coherent multi-document summaries
  • Adapting to different news domains (tech, politics, sports)

Prerequisites and Setup

Required Libraries

# Create virtual environment
python -m venv news_summarizer_env
source news_summarizer_env/bin/activate

# Core dependencies
pip install transformers torch
pip install newspaper3k lxml_html_clean
pip install feedparser requests
pip install beautifulsoup4 trafilatura

# Additional tools
pip install schedule python-dotenv
pip install flask flask-cors

# For better summaries
pip install sentencepiece
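
The project structure below includes a requirements.txt; a minimal, unpinned version mirroring the installs above would look like this (pin versions for reproducible builds):

# requirements.txt
transformers
torch
newspaper3k
lxml_html_clean
feedparser
requests
beautifulsoup4
trafilatura
schedule
python-dotenv
flask
flask-cors
sentencepiece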

Project Structure

news_summarizer/
├── app.py                    # Flask web application
├── fetcher/
│   ├── __init__.py
│   ├── rss_fetcher.py        # RSS feed parsing
│   ├── article_extractor.py  # Full article extraction
│   └── news_api.py           # News API integration
├── summarizer/
│   ├── __init__.py
│   ├── extractive.py         # Extractive summarization
│   ├── abstractive.py        # Transformer-based summarization
│   └── multi_doc.py          # Multi-document summarization
├── models/
│   └── config.py             # Model configurations
├── templates/
│   └── index.html            # Web dashboard
├── data/
│   └── feeds.json            # RSS feed list
└── requirements.txt
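
The data/feeds.json file is optional: the RSSFetcher below ships with a hardcoded default feed list and never reads it. If you later want to externalize your feeds, a small helper along these lines (hypothetical, not wired into the tutorial code) would do:

# data-driven feed list (optional)
import json

def load_feeds(path: str = "data/feeds.json") -> dict:
    """Load a mapping like {"tech": ["https://techcrunch.com/feed/", ...], ...}."""
    with open(path) as f:
        return json.load(f)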

Step 1: News Fetcher

First, let’s create a robust news fetching system:

# fetcher/rss_fetcher.py
import feedparser
from typing import List, Dict, Optional
from dataclasses import dataclass, field
from datetime import datetime
import hashlib

@dataclass
class NewsArticle:
    title: str
    url: str
    source: str
    published: Optional[datetime] = None
    summary: str = ""
    content: str = ""
    author: str = ""
    tags: List[str] = field(default_factory=list)
    image_url: str = ""
    article_id: str = ""

    def __post_init__(self):
        if not self.article_id:
            self.article_id = hashlib.md5(self.url.encode()).hexdigest()[:12]

class RSSFetcher:
    """Fetch news articles from RSS feeds."""

    def __init__(self):
        self.default_feeds = {
            "tech": [
                "https://feeds.arstechnica.com/arstechnica/technology-lab",
                "https://www.theverge.com/rss/index.xml",
                "https://techcrunch.com/feed/",
            ],
            "general": [
                "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
                "https://feeds.bbci.co.uk/news/rss.xml",
                "https://www.theguardian.com/world/rss",
            ],
            "science": [
                "https://www.sciencedaily.com/rss/all.xml",
                "https://www.nature.com/nature.rss",
            ]
        }

    def fetch_feed(self, feed_url: str, source_name: str = "") -> List[NewsArticle]:
        """Fetch articles from a single RSS feed."""
        articles = []

        try:
            feed = feedparser.parse(feed_url)

            # Get source name from feed if not provided
            if not source_name:
                source_name = feed.feed.get("title", "Unknown")

            for entry in feed.entries[:20]:  # Limit to 20 per feed
                article = NewsArticle(
                    title=entry.get("title", ""),
                    url=entry.get("link", ""),
                    source=source_name,
                    summary=self._clean_html(entry.get("summary", "")),
                    author=entry.get("author", ""),
                    tags=[tag.term for tag in entry.get("tags", [])],
                )

                # Parse published date
                if "published_parsed" in entry and entry.published_parsed:
                    article.published = datetime(*entry.published_parsed[:6])

                # Try to get image
                if "media_content" in entry:
                    for media in entry.media_content:
                        if "url" in media:
                            article.image_url = media["url"]
                            break

                articles.append(article)

        except Exception as e:
            print(f"Error fetching {feed_url}: {e}")

        return articles

    def _clean_html(self, text: str) -> str:
        """Remove HTML tags from text."""
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(text, "html.parser")
        return soup.get_text(separator=" ").strip()

    def fetch_category(self, category: str) -> List[NewsArticle]:
        """Fetch all articles from a category."""
        all_articles = []

        feeds = self.default_feeds.get(category, [])
        for feed_url in feeds:
            articles = self.fetch_feed(feed_url)
            all_articles.extend(articles)

        # Sort by date, newest first
        all_articles.sort(
            key=lambda x: x.published or datetime.min,
            reverse=True
        )

        return all_articles

    def fetch_all(self) -> Dict[str, List[NewsArticle]]:
        """Fetch articles from all categories."""
        results = {}
        for category in self.default_feeds:
            results[category] = self.fetch_category(category)
        return results
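
Before moving on, it’s worth confirming the fetcher works against live feeds. A quick check from the project root (output depends on what the feeds return at run time):

# quick sanity check for the RSS fetcher
from fetcher.rss_fetcher import RSSFetcher

fetcher = RSSFetcher()
for article in fetcher.fetch_category("tech")[:5]:
    print(f"{article.source}: {article.title} ({article.published})")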

Step 2: Article Content Extractor

Extract full article content from URLs:

# fetcher/article_extractor.py
from typing import Optional
import requests
from newspaper import Article
import trafilatura

class ArticleExtractor:
    """Extract full article content from URLs."""

    def __init__(self):
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        self.timeout = 10

    def extract_with_newspaper(self, url: str) -> Optional[str]:
        """Extract article using newspaper3k."""
        try:
            article = Article(url)
            article.download()
            article.parse()
            return article.text
        except Exception as e:
            print(f"Newspaper extraction failed: {e}")
            return None

    def extract_with_trafilatura(self, url: str) -> Optional[str]:
        """Extract article using trafilatura."""
        try:
            downloaded = trafilatura.fetch_url(url)
            if downloaded:
                text = trafilatura.extract(downloaded)
                return text
        except Exception as e:
            print(f"Trafilatura extraction failed: {e}")
            return None

    def extract(self, url: str) -> str:
        """Extract article content using multiple methods."""

        # Try trafilatura first (usually better)
        content = self.extract_with_trafilatura(url)
        if content and len(content) > 200:
            return content

        # Fallback to newspaper3k
        content = self.extract_with_newspaper(url)
        if content and len(content) > 200:
            return content

        return ""

    def extract_batch(self, urls: list) -> dict:
        """Extract content from multiple URLs."""
        results = {}
        for url in urls:
            results[url] = self.extract(url)
        return results
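
A quick way to exercise both pieces together is to pull full text for the newest tech story (network-dependent, and some sites block scrapers, so an occasional empty result is normal):

# fetch a feed entry, then extract its full text
from fetcher.rss_fetcher import RSSFetcher
from fetcher.article_extractor import ArticleExtractor

articles = RSSFetcher().fetch_category("tech")
if articles:
    content = ArticleExtractor().extract(articles[0].url)
    print(content[:300] if content else "Extraction failed for this URL")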

Step 3: Extractive Summarizer

Create a fast extractive summarizer for initial processing:

# summarizer/extractive.py
import re
from typing import List, Tuple
from collections import Counter
import math

class ExtractiveSummarizer:
    """Extractive summarization using TF-IDF scoring."""

    def __init__(self):
        self.stop_words = set([
            "the", "a", "an", "and", "or", "but", "in", "on", "at", "to",
            "for", "of", "with", "by", "from", "as", "is", "was", "are",
            "were", "been", "be", "have", "has", "had", "do", "does", "did",
            "will", "would", "could", "should", "may", "might", "must",
            "that", "which", "who", "whom", "this", "these", "those",
            "it", "its", "they", "them", "their", "we", "us", "our",
            "you", "your", "he", "him", "his", "she", "her", "i", "me", "my"
        ])

    def _tokenize(self, text: str) -> List[str]:
        """Simple tokenization."""
        text = text.lower()
        words = re.findall(r'\b[a-z]+\b', text)
        return [w for w in words if w not in self.stop_words and len(w) > 2]

    def _split_sentences(self, text: str) -> List[str]:
        """Split text into sentences."""
        sentences = re.split(r'(?<=[.!?])\s+', text)
        return [s.strip() for s in sentences if len(s.strip()) > 20]

    def _compute_tf(self, words: List[str]) -> dict:
        """Compute term frequency."""
        tf = Counter(words)
        total = len(words)
        return {word: count / total for word, count in tf.items()}

    def _compute_idf(self, sentences: List[str]) -> dict:
        """Compute inverse document frequency."""
        n_docs = len(sentences)
        word_doc_count = Counter()

        for sentence in sentences:
            words = set(self._tokenize(sentence))
            word_doc_count.update(words)

        idf = {}
        for word, count in word_doc_count.items():
            idf[word] = math.log(n_docs / (1 + count))

        return idf

    def _score_sentence(
        self,
        sentence: str,
        tf: dict,
        idf: dict,
        position: int,
        total_sentences: int
    ) -> float:
        """Score a sentence based on multiple factors."""
        words = self._tokenize(sentence)
        if not words:
            return 0.0

        # TF-IDF score
        tfidf_score = sum(tf.get(w, 0) * idf.get(w, 0) for w in words) / len(words)

        # Position score (first sentences are often more important)
        position_score = 1.0 - (position / total_sentences) * 0.3

        # Length score (prefer medium-length sentences)
        length = len(words)
        if 10 <= length <= 30:
            length_score = 1.0
        elif length < 10:
            length_score = length / 10
        else:
            length_score = 30 / length

        return tfidf_score * position_score * length_score

    def summarize(
        self,
        text: str,
        num_sentences: int = 3,
        max_length: int = 500
    ) -> str:
        """Generate extractive summary."""
        sentences = self._split_sentences(text)

        if len(sentences) <= num_sentences:
            return text

        # Compute TF-IDF
        all_words = self._tokenize(text)
        tf = self._compute_tf(all_words)
        idf = self._compute_idf(sentences)

        # Score sentences
        scored_sentences = []
        for i, sentence in enumerate(sentences):
            score = self._score_sentence(sentence, tf, idf, i, len(sentences))
            scored_sentences.append((i, sentence, score))

        # Select top sentences
        scored_sentences.sort(key=lambda x: x[2], reverse=True)
        selected = scored_sentences[:num_sentences]

        # Sort by original position for coherence
        selected.sort(key=lambda x: x[0])

        summary = " ".join(s[1] for s in selected)

        # Truncate if needed
        if len(summary) > max_length:
            summary = summary[:max_length].rsplit(" ", 1)[0] + "..."

        return summary
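
The extractive summarizer needs no model download, so it’s handy for a quick test. Here is a short sketch that chains it with the extractor from Step 2 (any real article URL works in place of the placeholder):

# extractive summary of a live article
from fetcher.article_extractor import ArticleExtractor
from summarizer.extractive import ExtractiveSummarizer

text = ArticleExtractor().extract("https://example.com/some-article")  # placeholder URL
if text:
    print(ExtractiveSummarizer().summarize(text, num_sentences=3))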

Step 4: Abstractive Summarizer

Use transformer models for high-quality summaries:

# summarizer/abstractive.py
from typing import List, Optional
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    pipeline
)

class AbstractiveSummarizer:
    """Abstractive summarization using transformer models."""

    def __init__(
        self,
        model_name: str = "facebook/bart-large-cnn",
        device: str = None
    ):
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model_name = model_name

        print(f"Loading {model_name} on {self.device}...")

        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)

        # Create pipeline for easier use
        self.summarizer = pipeline(
            "summarization",
            model=self.model,
            tokenizer=self.tokenizer,
            device=0 if self.device == "cuda" else -1
        )

        print("Model loaded successfully!")

    def summarize(
        self,
        text: str,
        max_length: int = 150,
        min_length: int = 50,
        do_sample: bool = False
    ) -> str:
        """Generate abstractive summary."""

        # Tokenize to check input length; the pipeline call below re-tokenizes
        # and truncates to the model's limit on its own (truncation=True)
        max_input_length = 1024
        inputs = self.tokenizer(
            text,
            max_length=max_input_length,
            truncation=True,
            return_tensors="pt"
        )

        if len(inputs["input_ids"][0]) < 50:
            return text  # Text too short to summarize

        # Generate summary
        summary = self.summarizer(
            text,
            max_length=max_length,
            min_length=min_length,
            do_sample=do_sample,
            truncation=True
        )

        return summary[0]["summary_text"]

    def summarize_batch(
        self,
        texts: List[str],
        max_length: int = 150,
        min_length: int = 50
    ) -> List[str]:
        """Summarize multiple texts efficiently."""

        summaries = self.summarizer(
            texts,
            max_length=max_length,
            min_length=min_length,
            truncation=True,
            batch_size=4
        )

        return [s["summary_text"] for s in summaries]

    def summarize_news(
        self,
        text: str,
        style: str = "brief"
    ) -> str:
        """Generate news-optimized summary."""

        length_configs = {
            "brief": {"max_length": 75, "min_length": 30},
            "standard": {"max_length": 150, "min_length": 50},
            "detailed": {"max_length": 300, "min_length": 100}
        }

        config = length_configs.get(style, length_configs["standard"])

        return self.summarize(text, **config)
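
A minimal usage sketch; the first run downloads facebook/bart-large-cnn (roughly 1.6 GB), so expect a wait, and sample_article.txt again stands in for any saved article text:

# abstractive summaries at two lengths
from summarizer.abstractive import AbstractiveSummarizer

summarizer = AbstractiveSummarizer()
text = open("sample_article.txt").read()  # hypothetical local file
print(summarizer.summarize_news(text, style="brief"))
print(summarizer.summarize_news(text, style="detailed"))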

Step 5: Multi-Document Summarizer

Summarize multiple related articles:

# summarizer/multi_doc.py
from typing import List, Dict
from dataclasses import dataclass
from .extractive import ExtractiveSummarizer
from .abstractive import AbstractiveSummarizer

@dataclass
class ArticleCluster:
    topic: str
    articles: List[Dict]
    combined_summary: str
    key_points: List[str]

class MultiDocSummarizer:
    """Summarize multiple related articles."""

    def __init__(self, abstractive_summarizer: AbstractiveSummarizer = None):
        self.extractive = ExtractiveSummarizer()
        self.abstractive = abstractive_summarizer or AbstractiveSummarizer()

    def _extract_key_points(self, texts: List[str], n_points: int = 5) -> List[str]:
        """Extract key points from multiple texts."""
        # Combine all texts
        combined = " ".join(texts)

        # Get extractive summary
        sentences = self.extractive._split_sentences(combined)

        # Score all sentences
        all_words = self.extractive._tokenize(combined)
        tf = self.extractive._compute_tf(all_words)
        idf = self.extractive._compute_idf(sentences)

        scored = []
        for i, sent in enumerate(sentences):
            score = self.extractive._score_sentence(sent, tf, idf, i, len(sentences))
            scored.append((sent, score))

        # Get top unique points
        scored.sort(key=lambda x: x[1], reverse=True)
        key_points = []
        seen = set()

        for sent, score in scored:
            # Avoid similar sentences
            words = set(sent.lower().split()[:5])
            if not words & seen:
                key_points.append(sent)
                seen.update(words)
            if len(key_points) >= n_points:
                break

        return key_points

    def summarize_cluster(
        self,
        articles: List[Dict],
        topic: str = "News"
    ) -> ArticleCluster:
        """Summarize a cluster of related articles."""

        # Extract content from articles
        texts = [a.get("content", a.get("summary", "")) for a in articles]
        texts = [t for t in texts if t]

        if not texts:
            return ArticleCluster(
                topic=topic,
                articles=articles,
                combined_summary="No content available",
                key_points=[]
            )

        # Create combined text with source attribution
        combined_parts = []
        for i, (article, text) in enumerate(zip(articles, texts)):
            source = article.get("source", f"Source {i+1}")
            # Take first 500 chars from each
            combined_parts.append(f"[{source}]: {text[:500]}")

        combined_text = " ".join(combined_parts)

        # Generate abstractive summary
        try:
            summary = self.abstractive.summarize(
                combined_text,
                max_length=200,
                min_length=75
            )
        except Exception as e:
            print(f"Abstractive summarization failed: {e}")
            summary = self.extractive.summarize(combined_text, num_sentences=3)

        # Extract key points
        key_points = self._extract_key_points(texts)

        return ArticleCluster(
            topic=topic,
            articles=articles,
            combined_summary=summary,
            key_points=key_points
        )

    def create_daily_digest(
        self,
        articles_by_category: Dict[str, List[Dict]]
    ) -> Dict[str, ArticleCluster]:
        """Create a daily digest from categorized articles."""

        digest = {}

        for category, articles in articles_by_category.items():
            if articles:
                # Take top 5 articles per category
                top_articles = articles[:5]
                cluster = self.summarize_cluster(top_articles, topic=category)
                digest[category] = cluster

        return digest
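
Note that create_daily_digest expects plain dicts, not the NewsArticle dataclass from Step 1, so convert before calling it. A sketch of a full digest run (slow on CPU, since it loads the BART model):

# build a digest straight from live feeds
from fetcher.rss_fetcher import RSSFetcher
from summarizer.multi_doc import MultiDocSummarizer

news = RSSFetcher().fetch_all()
# The fetcher leaves .content empty, so fall back to the RSS summary
# (or run the Step 2 extractor first for richer input)
articles_by_category = {
    cat: [{"source": a.source, "summary": a.summary, "content": a.content or a.summary}
          for a in arts]
    for cat, arts in news.items()
}

digest = MultiDocSummarizer().create_daily_digest(articles_by_category)
for category, cluster in digest.items():
    print(f"\n== {category} ==")
    print(cluster.combined_summary)
    for point in cluster.key_points:
        print(f" - {point}")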

Step 6: Flask Application

Create the web interface:

# app.py
import os
from flask import Flask, render_template, jsonify, request
from flask_cors import CORS
from datetime import datetime

from fetcher.rss_fetcher import RSSFetcher
from fetcher.article_extractor import ArticleExtractor
from summarizer.abstractive import AbstractiveSummarizer
from summarizer.extractive import ExtractiveSummarizer

app = Flask(__name__)
CORS(app)

# Initialize components
rss_fetcher = RSSFetcher()
article_extractor = ArticleExtractor()
extractive_summarizer = ExtractiveSummarizer()

# Lazy load abstractive summarizer (heavy model)
abstractive_summarizer = None

def get_abstractive_summarizer():
    global abstractive_summarizer
    if abstractive_summarizer is None:
        abstractive_summarizer = AbstractiveSummarizer()
    return abstractive_summarizer

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/api/news")
def get_news():
    """Fetch latest news from all categories."""
    category = request.args.get("category", "all")

    if category == "all":
        all_news = rss_fetcher.fetch_all()
        # Flatten and sort
        articles = []
        for cat, arts in all_news.items():
            for art in arts[:10]:  # 10 per category
                articles.append({
                    "id": art.article_id,
                    "title": art.title,
                    "url": art.url,
                    "source": art.source,
                    "category": cat,
                    "summary": art.summary[:200] if art.summary else "",
                    "published": art.published.isoformat() if art.published else None,
                    "image": art.image_url
                })
    else:
        articles_list = rss_fetcher.fetch_category(category)
        articles = [
            {
                "id": art.article_id,
                "title": art.title,
                "url": art.url,
                "source": art.source,
                "category": category,
                "summary": art.summary[:200] if art.summary else "",
                "published": art.published.isoformat() if art.published else None,
                "image": art.image_url
            }
            for art in articles_list[:20]
        ]

    return jsonify({"articles": articles})

@app.route("/api/summarize", methods=["POST"])
def summarize_article():
    """Summarize a specific article."""
    data = request.get_json()
    url = data.get("url")
    style = data.get("style", "standard")
    method = data.get("method", "abstractive")

    if not url:
        return jsonify({"error": "URL required"}), 400

    # Extract article content
    content = article_extractor.extract(url)

    if not content:
        return jsonify({"error": "Could not extract article content"}), 400

    # Generate summary
    if method == "extractive":
        num_sentences = {"brief": 2, "standard": 3, "detailed": 5}.get(style, 3)
        summary = extractive_summarizer.summarize(content, num_sentences=num_sentences)
    else:
        summarizer = get_abstractive_summarizer()
        summary = summarizer.summarize_news(content, style=style)

    return jsonify({
        "summary": summary,
        "content_length": len(content),
        "summary_length": len(summary),
        "compression_ratio": f"{len(summary)/len(content)*100:.1f}%"
    })

@app.route("/api/digest")
def get_digest():
    """Get daily news digest."""
    all_news = rss_fetcher.fetch_all()

    digest = {}
    for category, articles in all_news.items():
        if articles:
            # Get summaries for top 3 articles
            summaries = []
            for art in articles[:3]:
                summaries.append({
                    "title": art.title,
                    "source": art.source,
                    "summary": art.summary[:150] + "..." if art.summary else ""
                })
            digest[category] = summaries

    return jsonify({"digest": digest, "generated_at": datetime.now().isoformat()})

if __name__ == "__main__":
    app.run(debug=True, port=5000)
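
Once the server is running (see “Running the Application” below), the JSON API can be exercised directly from another Python shell; this sketch assumes at least one article comes back from the tech feeds:

# poke the API with requests (server must already be running on port 5000)
import requests

news = requests.get("http://localhost:5000/api/news", params={"category": "tech"}).json()
print(len(news["articles"]), "articles fetched")

resp = requests.post(
    "http://localhost:5000/api/summarize",
    json={"url": news["articles"][0]["url"], "style": "brief", "method": "extractive"},
)
print(resp.json())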

Step 7: Web Dashboard

The dashboard is a single HTML template with inline CSS and vanilla JavaScript that calls the Flask API:

<!-- templates/index.html -->
<!DOCTYPE html>
<html>
<head>
    <title>AI News Summarizer</title>
    <style>
        * { box-sizing: border-box; margin: 0; padding: 0; }
        body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; background: #f0f2f5; }
        .container { max-width: 1200px; margin: 0 auto; padding: 20px; }
        header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 30px 20px; margin-bottom: 20px; border-radius: 12px; }
        h1 { font-size: 28px; margin-bottom: 10px; }
        .subtitle { opacity: 0.9; }

        .categories { display: flex; gap: 10px; margin-bottom: 20px; flex-wrap: wrap; }
        .category-btn { padding: 8px 16px; border: none; background: white; border-radius: 20px; cursor: pointer; font-size: 14px; transition: all 0.2s; }
        .category-btn:hover, .category-btn.active { background: #667eea; color: white; }

        .news-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(350px, 1fr)); gap: 20px; }
        .news-card { background: white; border-radius: 12px; overflow: hidden; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
        .news-card img { width: 100%; height: 180px; object-fit: cover; }
        .news-card .content { padding: 20px; }
        .news-card h3 { font-size: 16px; margin-bottom: 10px; line-height: 1.4; }
        .news-card .meta { font-size: 12px; color: #666; margin-bottom: 10px; }
        .news-card .summary { font-size: 14px; color: #444; line-height: 1.6; }
        .news-card .actions { margin-top: 15px; display: flex; gap: 10px; }

        button { padding: 8px 16px; border: none; border-radius: 6px; cursor: pointer; font-size: 13px; }
        .btn-primary { background: #667eea; color: white; }
        .btn-secondary { background: #e0e0e0; color: #333; }

        .modal { display: none; position: fixed; top: 0; left: 0; width: 100%; height: 100%; background: rgba(0,0,0,0.5); align-items: center; justify-content: center; }
        .modal.active { display: flex; }
        .modal-content { background: white; padding: 30px; border-radius: 12px; max-width: 600px; width: 90%; max-height: 80vh; overflow-y: auto; }
        .modal-content h2 { margin-bottom: 20px; }
        .modal-content .summary-text { line-height: 1.8; font-size: 16px; }
        .close-btn { float: right; font-size: 24px; cursor: pointer; }

        .loading { opacity: 0.5; pointer-events: none; }
    </style>
</head>
<body>
    <div class="container">
        <header>
            <h1>AI News Summarizer</h1>
            <p class="subtitle">Stay informed with AI-powered summaries</p>
        </header>

        <div class="categories">
            <button class="category-btn active" data-category="all">All News</button>
            <button class="category-btn" data-category="tech">Technology</button>
            <button class="category-btn" data-category="general">General</button>
            <button class="category-btn" data-category="science">Science</button>
        </div>

        <div class="news-grid" id="news-grid">
            <p>Loading news...</p>
        </div>
    </div>

    <div class="modal" id="summary-modal">
        <div class="modal-content">
            <span class="close-btn" onclick="closeModal()">&times;</span>
            <h2 id="modal-title"></h2>
            <div class="summary-text" id="modal-summary"></div>
        </div>
    </div>

    <script>
        let currentCategory = "all";

        async function loadNews(category) {
            currentCategory = category;
            const grid = document.getElementById("news-grid");
            grid.innerHTML = "<p>Loading...</p>";

            document.querySelectorAll(".category-btn").forEach(btn => {
                btn.classList.toggle("active", btn.dataset.category === category);
            });

            try {
                const response = await fetch(`/api/news?category=${category}`);
                const data = await response.json();
                displayNews(data.articles);
            } catch (error) {
                grid.innerHTML = "<p>Error loading news</p>";
            }
        }

        function displayNews(articles) {
            const grid = document.getElementById("news-grid");
            grid.innerHTML = articles.map(article => `
                <div class="news-card">
                    ${article.image ? `<img src="${article.image}" alt="">` : ""}
                    <div class="content">
                        <h3>${article.title}</h3>
                        <div class="meta">${article.source} | ${article.category}</div>
                        <p class="summary">${article.summary}</p>
                        <div class="actions">
                            <button class="btn-primary" onclick="summarize('${article.url}', '${article.title.replace(/'/g, "")}')">
                                AI Summary
                            </button>
                            <button class="btn-secondary" onclick="window.open('${article.url}', '_blank')">
                                Read Full
                            </button>
                        </div>
                    </div>
                </div>
            `).join("");
        }

        async function summarize(url, title) {
            const modal = document.getElementById("summary-modal");
            const modalTitle = document.getElementById("modal-title");
            const modalSummary = document.getElementById("modal-summary");

            modalTitle.textContent = title;
            modalSummary.innerHTML = "Generating AI summary...";
            modal.classList.add("active");

            try {
                const response = await fetch("/api/summarize", {
                    method: "POST",
                    headers: { "Content-Type": "application/json" },
                    body: JSON.stringify({ url, style: "standard", method: "abstractive" })
                });
                const data = await response.json();

                if (data.error) {
                    modalSummary.innerHTML = `Error: ${data.error}`;
                } else {
                    modalSummary.innerHTML = `
                        <p>${data.summary}</p>
                        <hr style="margin: 20px 0">
                        <small>Compression: ${data.compression_ratio} | Original: ${data.content_length} chars</small>
                    `;
                }
            } catch (error) {
                modalSummary.innerHTML = "Failed to generate summary";
            }
        }

        function closeModal() {
            document.getElementById("summary-modal").classList.remove("active");
        }

        document.querySelectorAll(".category-btn").forEach(btn => {
            btn.addEventListener("click", () => loadNews(btn.dataset.category));
        });

        // Close modal on outside click
        document.getElementById("summary-modal").addEventListener("click", (e) => {
            if (e.target.id === "summary-modal") closeModal();
        });

        // Load initial news
        loadNews("all");
    </script>
</body>
</html>

Running the Application

# Start the application
python app.py

# Access at http://localhost:5000
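
The setup step installed the schedule package, which this tutorial doesn’t otherwise use; if you want feeds refreshed automatically, a small standalone script along these lines (hypothetical, run separately from the Flask app) is one option:

# refresh_digest.py -- hypothetical periodic fetch job
import time
import schedule
from fetcher.rss_fetcher import RSSFetcher

def refresh():
    news = RSSFetcher().fetch_all()
    total = sum(len(arts) for arts in news.values())
    print(f"Fetched {total} articles across {len(news)} categories")

schedule.every().day.at("07:00").do(refresh)  # morning digest
schedule.every(2).hours.do(refresh)           # periodic refresh during the day

while True:
    schedule.run_pending()
    time.sleep(60)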

Conclusion

You’ve built a complete AI news summarization system that fetches, processes, and summarizes news from multiple sources. This system demonstrates production patterns including multi-source aggregation, hybrid summarization, and an interactive web interface.

Key takeaways:

  • Hybrid summarization combines extractive speed with abstractive quality
  • RSS feeds provide reliable, structured news access
  • Trafilatura excels at content extraction from web pages
  • BART and similar models generate fluent abstractive summaries
  • Lazy loading keeps the heavy model out of startup; caching fetched feeds and summaries is a natural next optimization

This foundation can be extended with personalization based on reading history, topic clustering for related stories, email digest delivery, and integration with mobile push notifications.
