OCR Neural Network

Project Overview

This project is a complete Optical Character Recognition (OCR) system written entirely in C. Unlike typical Python implementations built on libraries such as TensorFlow or PyTorch, it implements the mathematical core of a neural network (backpropagation, matrix operations, activation functions) from scratch.

Technical Architecture

The software pipeline transforms a raw image of text into digital characters through several distinct stages:

1. Preprocessing

Raw images are often noisy. We implemented several filters to prepare the data:

Grayscale Conversion: Weighted average of RGB channels.
Binarization (Otsu's Method): Calculating the optimal threshold to separate text from background.
Noise Reduction: Median and Gaussian filters to remove artifacts.
Rotation Correction: Detecting the skew angle with the Hough transform and rotating the image to correct it.
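
As an illustration of the first two filters, here is a minimal sketch in C. The function names, the BT.601 luma weights, and the flat 8-bit pixel buffer are assumptions made for this example and are not taken from the project's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weighted grayscale conversion. The 0.299/0.587/0.114 weights are the
   common ITU-R BT.601 luma coefficients (assumed here; the project's exact
   weights are not stated). Integer arithmetic with rounding. */
uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((299u * r + 587u * g + 114u * b + 500u) / 1000u);
}

/* Otsu's method: returns the threshold t that maximizes the between-class
   variance when pixels <= t form one class and pixels > t the other. */
int otsu_threshold(const uint8_t *gray, size_t n)
{
    assert(gray != NULL && n > 0);

    size_t hist[256] = {0};
    for (size_t i = 0; i < n; i++)
        hist[gray[i]]++;

    double sum_all = 0.0;
    for (int v = 0; v < 256; v++)
        sum_all += (double)v * (double)hist[v];

    double sum_low = 0.0;   /* sum of intensities in the low class */
    double best_var = -1.0;
    size_t w_low = 0;       /* number of pixels in the low class */
    int best_t = 0;

    for (int t = 0; t < 256; t++) {
        w_low += hist[t];
        if (w_low == 0)
            continue;       /* low class still empty */
        size_t w_high = n - w_low;
        if (w_high == 0)
            break;          /* high class empty from here on */
        sum_low += (double)t * (double)hist[t];

        double mean_low = sum_low / (double)w_low;
        double mean_high = (sum_all - sum_low) / (double)w_high;
        double diff = mean_low - mean_high;
        /* Between-class variance, up to a constant factor 1/n^2. */
        double var = (double)w_low * (double)w_high * diff * diff;
        if (var > best_var) {
            best_var = var;
            best_t = t;
        }
    }
    return best_t;
}
```

A caller would convert each pixel with rgb_to_gray, compute the threshold once per image, and then mark every pixel above the threshold as background (or as text, depending on polarity).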

2. Segmentation

The system isolates individual characters using the XY-Cut algorithm and histogram projection profiles. This step produces the input matrices that are fed into the neural network.
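
The projection-profile idea can be sketched as follows. This assumes a binarized, row-major image where non-zero means ink; the names row_profile, find_spans, and span_t are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open interval [start, end) of consecutive non-empty rows or columns. */
typedef struct {
    size_t start;
    size_t end;
} span_t;

/* Horizontal projection profile: number of ink pixels in each row. */
void row_profile(const uint8_t *img, size_t width, size_t height,
                 size_t *profile)
{
    assert(img != NULL && profile != NULL);
    for (size_t y = 0; y < height; y++) {
        size_t count = 0;
        for (size_t x = 0; x < width; x++)
            count += (img[y * width + x] != 0);
        profile[y] = count;
    }
}

/* Splits a profile at its empty entries. Writes at most max_spans spans
   and returns how many were written. */
size_t find_spans(const size_t *profile, size_t n,
                  span_t *spans, size_t max_spans)
{
    size_t count = 0;
    size_t i = 0;
    while (i < n && count < max_spans) {
        while (i < n && profile[i] == 0)
            i++;                      /* skip the blank gap */
        if (i == n)
            break;
        size_t start = i;
        while (i < n && profile[i] != 0)
            i++;                      /* walk through the inked run */
        spans[count].start = start;
        spans[count].end = i;
        count++;
    }
    return count;
}
```

Applied to row profiles, the spans correspond to text lines. An XY-Cut style segmentation would then repeat the same cut on the column profile of each line to isolate characters, alternating directions until no blank gap remains.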

3. Neural Network (The Core)

We built a Multilayer Perceptron (MLP) in C.

Topology: Input layer (pixel grid), Hidden layers, Output layer (characters).
Training: Supervised learning using the Backpropagation algorithm.
Maths: Implementation of the sigmoid activation function and its derivative, used for gradient descent optimization.
Serialization: Ability to save and load trained weights to/from a file.
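
To show how the sigmoid derivative enters gradient descent, here is a minimal sketch of one backpropagation step for the output layer. It assumes a squared-error loss and that the activations were already produced by a sigmoid forward pass; the function names and the row-major weight layout are assumptions made for this example, not the project's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Derivative of the sigmoid expressed through its output a = sigmoid(z):
   d(sigmoid)/dz = a * (1 - a). */
double sigmoid_prime_from_output(double a)
{
    return a * (1.0 - a);
}

/* One gradient-descent step for a sigmoid output layer, assuming the loss
   E = 1/2 * sum_j (act[j] - target[j])^2.
   weights is row-major: weights[j * n_in + i] links input i to output j.
   delta (length n_out) receives the error terms so that a caller could
   propagate them back to the previous layer. */
void output_layer_step(double *weights, double *bias,
                       const double *in, size_t n_in,
                       const double *act, const double *target, size_t n_out,
                       double learning_rate, double *delta)
{
    assert(weights && bias && in && act && target && delta);
    for (size_t j = 0; j < n_out; j++) {
        /* dE/dz_j = (a_j - t_j) * sigmoid'(z_j) */
        delta[j] = (act[j] - target[j]) * sigmoid_prime_from_output(act[j]);
        for (size_t i = 0; i < n_in; i++)
            weights[j * n_in + i] -= learning_rate * delta[j] * in[i];
        bias[j] -= learning_rate * delta[j];
    }
}
```

Writing the derivative in terms of the stored activation avoids recomputing the exponential during the backward pass, which is a common choice in hand-written MLPs.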

Challenges & Optimization

The main challenges were performance and memory management. Because we allocated memory manually for the large matrices holding weights and biases, preventing memory leaks (checked with Valgrind) was critical. We also optimized the training loop so that it converged efficiently on the XOR problem before scaling up to OCR datasets.
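
The kind of manual matrix management described above might look like the following sketch. The matrix_t type and its helper functions are hypothetical and only illustrate the pattern of pairing every allocation with a single release path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Heap-allocated dense matrix stored in row-major order. */
typedef struct {
    size_t rows;
    size_t cols;
    double *data;   /* rows * cols elements */
} matrix_t;

/* Allocates a zero-initialized matrix. Returns NULL on invalid size or
   allocation failure, and never leaves a partial allocation behind. */
matrix_t *matrix_new(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0 || rows > SIZE_MAX / cols)
        return NULL;
    matrix_t *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = calloc(rows * cols, sizeof *m->data);
    if (m->data == NULL) {
        free(m);    /* avoid leaking the header when the buffer fails */
        return NULL;
    }
    m->rows = rows;
    m->cols = cols;
    return m;
}

/* Releases the buffer and the header. Accepts NULL, like free(). */
void matrix_free(matrix_t *m)
{
    if (m == NULL)
        return;
    free(m->data);
    free(m);
}

/* Bounds-checked element access. */
double *matrix_at(matrix_t *m, size_t r, size_t c)
{
    assert(m != NULL && r < m->rows && c < m->cols);
    return &m->data[r * m->cols + c];
}
```

Running a program built this way under valgrind --leak-check=full reports any matrix that was allocated but never passed to matrix_free.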

Conclusion

This project received a final grade of 16.87/20. It was a rigorous exercise in applying low-level programming concepts to high-level AI problems. The source code is available on GitHub: Perceptio-S3-EPITA.
