NLP: Text Summarisation with Python

Open Source For You | March 2025

Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.

- Dr Dipankar Ray

In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.

Steps in frequency-based summarisation

Preprocessing:

  • Tokenization: Split the text into sentences and words.

  • Stop word removal: Remove common words like 'and', 'the', or 'is' that contribute little to meaning.

  • Stemming: Reduce words to their stems, e.g., 'running' to 'run', so that variants of a word are counted together.

Word frequency calculation:

  • Count the occurrences of each word in the text.

  • Normalise frequencies if needed, e.g., by dividing by the total number of words.

Sentence scoring:

  • Assign scores to sentences based on the cumulative frequency of the words they contain.

  • Sentences with more frequent words score higher.

Sentence selection:

  • Rank sentences by their scores.

  • Select the top n sentences (based on a predefined ratio or word count) to form the summary.
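The steps above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than the article's own implementation: the regex-based `split_sentences` and `content_words` helpers and the tiny `STOP_WORDS` set are assumptions standing in for NLTK's tokenizers and stop word corpus, and stemming is omitted for brevity.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; NLTK's corpus is far larger).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "or", "the", "to"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed (no stemming here)."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarise(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequency calculation, normalised by the total number of content words.
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values()) or 1
    freq = {w: c / total for w, c in counts.items()}
    # Sentence scoring: cumulative frequency of the words in each sentence.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    # Sentence selection: rank by score, keep the top n, restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))


text = (
    "Cats sleep a lot. "
    "Cats chase mice and cats climb trees. "
    "Dogs bark."
)
print(summarise(text, n=1))
# Cats chase mice and cats climb trees.
```

Because scores are cumulative, longer sentences tend to rank higher; dividing each score by the sentence's word count is one common adjustment if that bias is unwanted.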

The implementation uses the Natural Language Toolkit (NLTK) package, which provides all the required modules. The following imports are used here.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

Tokenization

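In natural language processing, tokenization divides a string into a list of tokens, such as sentences or words. Tokens are the units in which later steps look for patterns, for example when counting word frequencies. (The term also has an unrelated meaning in data security, where sensitive data elements are replaced with non-sensitive substitutes.)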
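To make the idea concrete, here is a small sketch of sentence and word tokenization using only Python's `re` module. The regular expressions are simplified stand-ins (an assumption, not NLTK's algorithm) for `sent_tokenize` and `word_tokenize`, which handle abbreviations, contractions and other edge cases more carefully, so their output may differ.

```python
import re

text = "Tokens are useful. They expose patterns!"

# Plain whitespace splitting leaves punctuation glued to the words.
print(text.split())
# ['Tokens', 'are', 'useful.', 'They', 'expose', 'patterns!']

# Sentence tokens: split after '.', '!' or '?' when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)
print(sentences)
# ['Tokens are useful.', 'They expose patterns!']

# Word tokens: runs of word characters, or single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", text)
print(words)
# ['Tokens', 'are', 'useful', '.', 'They', 'expose', 'patterns', '!']
```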
