[Deep Learning][Python] Comparing ReLU and Sigmoid Activation Functions in a Multilayer Perceptron (MLP)

螃蟹_crab

Published in AI深度學習筆記 (AI Deep Learning Notes)

Published 2024/05/26, updated 2024/05/26, 8-minute read

This article compares the effect of two activation functions, ReLU and Sigmoid, by training a simple multilayer perceptron (MLP) to classify the Fashion-MNIST dataset.

[Figure: ReLU vs. Sigmoid]

1. Function definitions

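Sigmoid function

The sigmoid function squashes its input into the range between 0 and 1:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Properties:

The output range is $(0, 1)$.
As $x$ tends to positive infinity the output approaches 1; as $x$ tends to negative infinity the output approaches 0.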
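
As a quick check of the properties above, here is a minimal pure-Python sketch of the sigmoid function. The helper name `sigmoid` and the branch on the sign of `x` (used only to avoid floating-point overflow) are choices made for this illustration, not something taken from the article's Keras code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


# Large negative inputs map close to 0, large positive inputs close to 1.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
```

Mathematically the output never reaches 0 or 1, although floating-point results can round to those values for extreme inputs.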

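ReLU function

The ReLU function keeps positive values and outputs 0 for negative values:

$\mathrm{ReLU}(x) = \max(0, x)$

Properties:

The output range is $[0, \infty)$.
All negative inputs are mapped to 0, while positive inputs pass through unchanged.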
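
The same definition can be sketched in a few lines of plain Python. The function name `relu` is just an illustrative choice.

```python
def relu(x: float) -> float:
    """Rectified linear unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


# Negative inputs are clipped to 0; positive inputs are unchanged.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"relu({x:+.1f}) = {relu(x):.1f}")
```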

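2. Gradient properties

Sigmoid function

The gradient of the sigmoid function approaches 0 when the input is very large in either the positive or the negative direction, which leads to the vanishing gradient problem.
When the activation is close to 0 or 1, the derivative becomes very small, so the gradient propagated back to the earlier layers is almost zero and training becomes very slow.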
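
To make the saturation concrete, the sketch below evaluates the sigmoid derivative, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, which peaks at 0.25 when $x = 0$. The helper `best_case_gradient_factor` is hypothetical and deliberately simplified: it multiplies only the activation derivatives and ignores the weight matrices that also enter the real backpropagated gradient.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def best_case_gradient_factor(num_layers: int) -> float:
    """Product of sigmoid derivatives over num_layers layers, assuming every
    layer sits at x = 0 (the most favourable point). Weights are ignored;
    this is an illustrative simplification, not full backpropagation."""
    return sigmoid_grad(0.0) ** num_layers


# The derivative shrinks quickly as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"sigmoid'({x:.1f}) = {sigmoid_grad(x):.6f}")

# Even in the best case the activation factor shrinks geometrically with depth.
for n in (1, 2, 5, 10):
    print(f"{n:2d} layers -> factor {best_case_gradient_factor(n):.2e}")
```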

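ReLU function

The gradient of ReLU is 1 for positive inputs and 0 for negative inputs.
ReLU mitigates the vanishing gradient problem because the gradient does not shrink in the positive range.
However, ReLU suffers from the "dying ReLU" (dead neuron) problem: if a neuron's pre-activation is negative for every input, its output is always 0, its gradient is always 0, and the neuron stops updating.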
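
The dead-neuron case can be illustrated with a single neuron computing `relu(w * x + b)`. Everything in this sketch is hypothetical: the helper names, the toy inputs, the constant upstream gradient of 1, and the convention of using gradient 0 at exactly `x == 0`.

```python
def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise
    (0 at x == 0 is a convention assumed for this sketch)."""
    return 1.0 if x > 0 else 0.0


def weight_gradient(w: float, b: float, inputs: list[float]) -> float:
    """Sum over the inputs of d relu(w*x + b) / dw, with upstream gradient 1."""
    return sum(relu_grad(w * x + b) * x for x in inputs)


# Toy inputs in [0, 1], like normalized pixel values.
inputs = [0.1, 0.5, 0.9]

# Pre-activations are positive, so the weight receives a gradient.
healthy = weight_gradient(w=1.0, b=0.0, inputs=inputs)

# A large negative bias makes every pre-activation negative:
# the gradient is exactly 0 and the weight can no longer be updated.
dead = weight_gradient(w=1.0, b=-5.0, inputs=inputs)

print(f"healthy neuron gradient: {healthy:.2f}")
print(f"dead neuron gradient:    {dead:.2f}")
```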

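3. Computational efficiency

Sigmoid function

Computing the sigmoid involves an exponential, which is relatively expensive.

ReLU function

ReLU needs only a simple comparison (taking the maximum of 0 and the input), so it is very cheap to compute.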

4. Use cases

Sigmoid function

Commonly used in the output layer, especially for binary classification problems.
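
As a rough sketch of why sigmoid suits a binary output layer: it turns a raw score (logit) into a probability that can be thresholded and scored with binary cross-entropy. The logit value, the 0.5 threshold, and the helper names below are assumptions made for this illustration; the article's own model uses a 10-way softmax output instead.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


logit = 2.0                      # hypothetical raw output of the last layer
p = sigmoid(logit)               # probability of the positive class
label = 1 if p >= 0.5 else 0     # assumed decision threshold of 0.5

print(f"probability = {p:.4f}, predicted label = {label}")
print(f"loss if true label is 1: {binary_cross_entropy(1, p):.4f}")
print(f"loss if true label is 0: {binary_cross_entropy(0, p):.4f}")
```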

ReLU function

Commonly used in hidden layers; it is the preferred activation function in most modern neural network architectures.

Code example

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
import matplotlib.pyplot as plt

# Load the Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Scale the pixel values to the [0, 1] range
x_train = x_train / 255.0
x_test = x_test / 255.0

# Define the model architecture: one hidden layer with the given activation
def create_model(activation):
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(128, activation=activation),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Create two models, one using ReLU and the other using Sigmoid
model_relu = create_model('relu')
model_sigmoid = create_model('sigmoid')

# Train both models
history_relu = model_relu.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test), verbose=2)
history_sigmoid = model_sigmoid.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test), verbose=2)

# Evaluate both models on the test set
test_loss_relu, test_acc_relu = model_relu.evaluate(x_test, y_test, verbose=2)
test_loss_sigmoid, test_acc_sigmoid = model_sigmoid.evaluate(x_test, y_test, verbose=2)

print(f"ReLU Model Test Accuracy: {test_acc_relu}")
print(f"Sigmoid Model Test Accuracy: {test_acc_sigmoid}")

# Plot accuracy and loss over the course of training
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history_relu.history['accuracy'], label='ReLU Train Accuracy')
plt.plot(history_relu.history['val_accuracy'], label='ReLU Val Accuracy')
plt.plot(history_sigmoid.history['accuracy'], label='Sigmoid Train Accuracy')
plt.plot(history_sigmoid.history['val_accuracy'], label='Sigmoid Val Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history_relu.history['loss'], label='ReLU Train Loss')
plt.plot(history_relu.history['val_loss'], label='ReLU Val Loss')
plt.plot(history_sigmoid.history['loss'], label='Sigmoid Train Loss')
plt.plot(history_sigmoid.history['val_loss'], label='Sigmoid Val Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

[Figure: accuracy and loss over the course of training]

[Figure: high accuracy]

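Results and comparison

ReLU (Rectified Linear Unit): usually works better in deep learning because gradients propagate well through the positive range, which reduces the vanishing gradient problem.
Sigmoid: in multi-layer networks it can run into the vanishing gradient problem, which makes training slower and harder to converge.

In this example, the ReLU model's accuracy (0.875) is slightly higher than the Sigmoid model's (0.872), a gap of about 0.003, and the training curves show ReLU converging faster than Sigmoid. In general, ReLU tends to perform a little better than Sigmoid.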
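
One way to put a number on "converges faster" is to find the first epoch at which a metric reaches a chosen threshold. The helper `first_epoch_reaching` below is hypothetical, and the two accuracy lists are made-up placeholder values for illustration only; they are not results from the run described in this article.

```python
from typing import Optional, Sequence


def first_epoch_reaching(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based epoch at which `values` first reaches `threshold`,
    or None if it never does."""
    for epoch, value in enumerate(values, start=1):
        if value >= threshold:
            return epoch
    return None


# Placeholder curves (NOT measured results), one value per epoch.
relu_val_acc = [0.84, 0.86, 0.87, 0.87, 0.88]
sigmoid_val_acc = [0.82, 0.84, 0.86, 0.87, 0.87]

print(first_epoch_reaching(relu_val_acc, 0.87))     # 3
print(first_epoch_reaching(sigmoid_val_acc, 0.87))  # 4
print(first_epoch_reaching(sigmoid_val_acc, 0.95))  # None
```

With a real run, the lists would come from `history_relu.history['val_accuracy']` and `history_sigmoid.history['val_accuracy']` in the code example above.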
