Exploratory Data Analysis and Market Basket Analysis with Python

In the realm of retail, understanding customer behavior and optimizing product offerings can be a game-changer. In this blog post, we’ll explore how to perform Exploratory Data Analysis (EDA) and Market Basket Analysis using Python, specifically focusing on a dataset related to retail transactions.

Introduction

The dataset we’re working with contains information about retail transactions. It includes details such as InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country. Our goal is to explore customer purchase patterns and uncover associations between different products.

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset
data = pd.read_excel('data_path')

A quick look at the column types and the first few rows confirms this structure:

data.info()
data.head()

Data Overview

Before diving into the analysis, let’s get an overview of the dataset. We observe that there are over 500,000 entries, and some columns, such as Customer ID and Description, have missing values.

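data.info()
data.Country.value_counts()
# unique() keeps NaN as one entry; data.CustomerID.nunique() would exclude it
len(data.CustomerID.unique())

We have transactions from various countries, with the majority coming from the United Kingdom. There are 4,373 distinct CustomerID values; because the column has missing values and unique() keeps NaN, that figure includes one entry for the missing IDs.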
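How missing IDs affect a distinct count can be checked on a small series. This is only a sketch: the IDs below are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CustomerID column (two rows have no ID).
customer_id = pd.Series([17850.0, 13047.0, np.nan, 17850.0, np.nan])

n_with_nan = len(customer_id.unique())     # NaN counts as one distinct value
n_without_nan = customer_id.nunique()      # NaN is excluded by default
n_missing = int(customer_id.isna().sum())  # rows with no customer attached

print(n_with_nan, n_without_nan, n_missing)
```

On the real data, `data.isna().sum()` gives the same kind of missing-value count for every column at once.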

Customer Insights

Let’s identify the top customers based on their total purchase amount:

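# Credit records are still in the data at this point, so they count toward these totals
data['TotalPrice'] = data['Quantity'] * data['UnitPrice']
customer_purchase = data.groupby('CustomerID')['TotalPrice'].sum().sort_values(ascending=False)
customer_purchase.head(10)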

Sorting the totals in descending order puts CustomerID 14646.0 at the top. Rows without a CustomerID are left out, because groupby drops missing keys by default.

Data Cleaning

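To ensure accurate analysis, we clean the data by removing credit records (invoices whose number starts with 'C') and stripping stray whitespace from the product descriptions:

# Remove credit records; .copy() avoids a SettingWithCopyWarning on the assignment below
data = data[~data.InvoiceNo.astype('str').str.startswith('C')].copy()
# Strip whitespace from Description
data['Description'] = data.Description.str.strip()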
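The two cleaning steps can be tried on a few made-up rows first. This is a minimal sketch with invented invoice numbers and descriptions. The mixed integer and string invoice numbers are an assumption about what loading from Excel might produce, which would explain the cast to string before the prefix check.

```python
import pandas as pd

# Made-up rows; an invoice number starting with 'C' marks a credit record.
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536370],
    "Description": [" WHITE MUG ", "WHITE MUG", "RED LAMP  "],
    "Quantity": [6, -1, 2],
})

# Cast to str so integer invoice numbers can be prefix-checked as well.
is_credit = toy["InvoiceNo"].astype(str).str.startswith("C")

# Keep non-credit rows; copy so the assignment below modifies an independent frame.
cleaned = toy[~is_credit].copy()
cleaned["Description"] = cleaned["Description"].str.strip()

print(cleaned)
```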

Basket Creation

Now, we focus on transactions from a specific country, for instance, Germany:

data_Germany = data[data.Country == 'Germany']
basket_Germany = data_Germany.groupby(['InvoiceNo', 'Description'])['Quantity'].sum().unstack().reset_index().fillna(0).set_index('InvoiceNo')

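With the basket created, each cell is encoded as present or absent, since Apriori only needs to know whether an item appeared on an invoice:

# True if the item was bought on the invoice; a vectorized comparison replaces applymap, which is deprecated in pandas 2.1+
basket_Germany = basket_Germany > 0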
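To see what the basket looks like before and after encoding, here is a small sketch on invented transactions. The item names and quantities are hypothetical, and `unstack(fill_value=0)` is used as a shorter alternative to the reset_index/fillna/set_index chain.

```python
import pandas as pd

# Made-up line items: invoice 2 lists MUG twice.
toy = pd.DataFrame({
    "InvoiceNo": [1, 1, 2, 2, 3],
    "Description": ["MUG", "LAMP", "MUG", "MUG", "LAMP"],
    "Quantity": [2, 1, 1, 3, 4],
})

# One row per invoice, one column per item, cells hold the summed quantity.
basket = (
    toy.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
)

# Presence/absence encoding: True where the item was bought at least once.
encoded = basket > 0

print(basket)
print(encoded)
```

Repeated lines for the same item on one invoice are summed before encoding, so they collapse into a single True.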

Market Basket Analysis

Using the Apriori algorithm, we identify itemsets that appear in at least 5% of invoices and generate association rules with a minimum confidence of 0.1:

frq_items = apriori(basket_Germany, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="confidence", min_threshold=.1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])

The association rules provide insights into item relationships. Let’s explore some of the interesting rules:

rules.head(20)

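Among the top rules, we find associations like “JUMBO BAG WOODLAND ANIMALS” being associated with “POSTAGE” with high confidence. Confidence alone can mislead here: a consequent that appears on most invoices will score high confidence with almost any antecedent, so it is worth checking the lift column before treating such a rule as a real product association.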
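To make the rule metrics concrete, the sketch below computes support, confidence, and lift by hand using their standard definitions. The baskets are invented for illustration, and `support` and `rule_metrics` are hypothetical helper names, not part of mlxtend.

```python
# Hypothetical baskets, each a set of item names; POSTAGE is in all of them.
baskets = [
    {"BAG", "POSTAGE"},
    {"BAG", "POSTAGE"},
    {"LAMP", "POSTAGE"},
    {"LAMP", "POSTAGE"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    sup = support(antecedent | consequent)
    conf = sup / support(antecedent)   # P(consequent | antecedent)
    lift = conf / support(consequent)  # 1.0 means the two sides are independent
    return sup, conf, lift


sup, conf, lift = rule_metrics({"BAG"}, {"POSTAGE"})
print(sup, conf, lift)
```

In this toy data the rule BAG -> POSTAGE has confidence 1.0 but lift 1.0: every basket contains POSTAGE, so buying a bag tells us nothing extra about it.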

Conclusion

In this blog post, we embarked on a journey of exploring retail transaction data, identifying top customers, cleaning the data, and performing Market Basket Analysis. Understanding customer behavior and product associations can empower businesses to make informed decisions.

This is just a glimpse into the vast world of data analysis and its application in the retail domain. Further exploration and fine-tuning of parameters can reveal deeper insights, paving the way for data-driven strategies.

By leveraging Python and its rich ecosystem of libraries, businesses can unlock valuable information hidden within their data, driving growth and enhancing customer satisfaction.

Feel free to experiment with your own datasets and adapt the code to suit your specific business needs. Happy analyzing!
