Definition:
Vectorization /ˌvɛk.tə.raɪˈzeɪ.ʃən/ noun — In machine learning and data processing, the process of transforming input data into numerical vectors that can be processed efficiently by mathematical models, especially neural networks and other linear-algebra-based algorithms.
Vectorization is essential for enabling machines to handle text, images, audio, and other non-numeric data types. Examples include (a short code sketch follows this list):
- Text data: transformed into vectors using Bag of Words, TF-IDF, or word embeddings like Word2Vec, GloVe, or BERT
- Images: represented as high-dimensional pixel arrays
- Categorical features: converted using one-hot encoding or embedding layers
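To make the text case concrete, here is a minimal sketch using scikit-learn's TfidfVectorizer on a toy two-document corpus; the corpus and variable names are illustrative, not prescribed by any particular workflow:

```python
# Minimal TF-IDF vectorization sketch (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "vectorization turns text into numbers",
    "numbers are what models understand",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix of shape (2, vocab_size)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # TF-IDF weight of each term per document
```

Each document becomes one row of the matrix, so a downstream model sees a fixed-width numeric input regardless of the original text length.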
Benefits of vectorization:
- Improves computational efficiency by enabling batch processing and GPU acceleration
- Standardizes input format for model training
- Captures semantic and structural relationships in data (e.g., word embeddings place similar words near each other in vector space)
In a broader context, vectorization also refers to rewriting algorithms to operate on entire data structures (vectors, matrices) rather than using explicit loops, thus accelerating performance in numerical computing libraries such as NumPy, TensorFlow, and PyTorch.
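In that second sense, a minimal NumPy sketch (the array size is illustrative) shows the contrast between an explicit loop and a whole-array operation:

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Loop version: one element at a time, with Python interpreter
# overhead on every iteration.
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = a[i] * b[i]

# Vectorized version: a single call that runs over the whole
# array in optimized compiled code.
out_vec = a * b

assert np.allclose(out, out_vec)
```

Both compute the same element-wise product; the vectorized form is typically orders of magnitude faster on large arrays because the loop runs in compiled code rather than in the Python interpreter.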

