The foundational techniques of AI image generation involve teaching the model to recognize images and text together by processing images with a tokenizer or encoder to convert them into vectors (embeddings), doing the same for text, and then combining the two vectors through algebra. This approach enables the model to answer questions like explaining the content of an image based on text or generating an image from text.