As we may know, deep networks can adapt to this kind of prior structure in the data, while shallow ones cannot. Let's actually see the results in different scenarios in this post. It is inspired by the work https://ojs.aaai.org/index.php/AAAI/article/view/10913 and by my professor, who suggested exploring this problem in the Predictive Analytics class.
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Generate the data: Y = 2*(2*cos^2(X) - 1)^2 - 1, i.e. cos(4X)
n_samples = 120000
X = np.random.uniform(-2 * np.pi, 2 * np.pi, n_samples)
Y = 2 * ((2 * np.cos(X)**2 - 1)**2) - 1

# 50/50 train/test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5, random_state=42)
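As a quick sanity check, the double-angle identity 2*cos^2(x) - 1 = cos(2x), applied twice, shows that the target is simply Y = cos(4X). The assertion below is just an aside to confirm this numerically; it is not part of the experiment itself.
# Verify that 2*(2*cos^2(X) - 1)^2 - 1 equals cos(4X) up to floating-point error
assert np.allclose(Y, np.cos(4 * X))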
def create_model(layers, units):
    # Fully connected ReLU network: `layers` hidden layers with `units` neurons each
    model = Sequential()
    model.add(Dense(units, input_shape=(1,), activation='relu'))
    for _ in range(layers - 1):
        model.add(Dense(units, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer=SGD(learning_rate=0.0001, momentum=0.9), loss='mean_squared_error')
    return model
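As a quick usage illustration (not part of the experiment itself), instantiating one of the deeper configurations and printing its summary shows how the layers and parameter counts are laid out:
# Example: a 3-hidden-layer network with 8 units per layer
example = create_model(3, 8)
example.summary()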
# Define configurations
configs = {
    '1_layer': [24, 48, 72, 128, 256],
    '2_layers': [12, 24, 36],
    '3_layers': [8, 16, 24]
}
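Note that the first three widths in each row are chosen so the total number of hidden neurons matches across depths (1 x 24 = 2 x 12 = 3 x 8, and likewise for 48 and 72); the two extra single-layer widths (128, 256) give the shallow network even more capacity. The small loop below is only an illustration of those totals:
# Total hidden-neuron budget for each configuration
for name, units_list in configs.items():
    depth = int(name.split('_')[0])
    print(name, [depth * u for u in units_list])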
# Store MSE and number of parameters for each configuration
performance = {}

# Train and evaluate models for each configuration
for name, units_list in configs.items():
    layers = int(name.split('_')[0])
    mse_list = []
    param_list = []
    for units in units_list:
        model = create_model(layers, units)
        model.fit(X_train, Y_train, epochs=10, batch_size=32, verbose=0)
        mse = model.evaluate(X_test, Y_test, verbose=0)
        mse_list.append(mse)
        params = model.count_params()  # Count the total number of parameters
        param_list.append(params)
    performance[name] = {'mse': mse_list, 'params': param_list}
# Plot Number of Units vs MSE
plt.figure(figsize=(12, 5))
for name, data in performance.items():
    layers = int(name.split('_')[0])
    plt.plot(configs[name], data['mse'], label=f'{layers} hidden layer{"s" if layers > 1 else ""}')
plt.xlabel('Number of Units')
plt.ylabel('Mean Squared Error')
plt.title('MSE vs. Number of Neurons in Hidden Layers')
plt.legend()
plt.show()
# Plot Number of Parameters vs MSE
plt.figure(figsize=(12, 5))
for name, data in performance.items():
    layers = int(name.split('_')[0])
    plt.plot(data['params'], data['mse'], label=f'{layers} hidden layer{"s" if layers > 1 else ""}')
plt.xlabel('Number of Parameters')
plt.ylabel('Mean Squared Error')
plt.title('MSE vs. Number of Parameters in Neural Network')
plt.legend()
plt.show()
In both plots, the deep networks consistently perform better than the shallow one, even when the total number of hidden neurons and all other training settings are the same.
Parameter Efficiency: Even with the same total number of neurons, a deep architecture might use its parameters more efficiently. By distributing neurons across multiple layers, a DNN can facilitate a more intricate partitioning of the input space, leading to better generalization from the data. Shallow networks might not partition the input space as effectively, potentially requiring more neurons to achieve similar performance.
Generalization to New Data: Deep networks tend to generalize better to new, unseen data. Their layered structure allows them to extract and abstract features in a way that is more robust to variations in input data.
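To make the parameter-efficiency point concrete, the snippet below (illustrative only, reusing create_model from above) prints the parameter count for three configurations that all contain 24 hidden neurons in total:
# Same budget of 24 hidden neurons, arranged at different depths
for depth, width in [(1, 24), (2, 12), (3, 8)]:
    model = create_model(depth, width)
    print(f'{depth} layer(s) x {width} units -> {model.count_params()} parameters')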
However, it's important to note that deeper networks are not universally better. They require more computational resources and data to train effectively, and they can be more prone to overfitting, especially in scenarios with limited training data. In some simpler tasks, or when interpretability is key, a shallower network might be preferable.
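If you do use a deeper network on a small dataset, one common mitigation, sketched below with Keras's EarlyStopping callback (this is an addition, not part of the experiment above), is to hold out a validation split and stop training once validation loss stops improving:
from tensorflow.keras.callbacks import EarlyStopping

# Stop training when validation loss has not improved for 3 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model = create_model(3, 8)
model.fit(X_train, Y_train, validation_split=0.2, epochs=100,
          batch_size=32, callbacks=[early_stop], verbose=0)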