OpenAI Model Timing
Introduction

This article explores the latency of different OpenAI models. Latency is an important factor to consider when using AI models in production.

Comparing Model Architectures

First, I test the latency of the following OpenAI models: gpt-4, gpt-4-0613, gpt-3.5-turbo, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k, gpt-3.5-turbo-16k-0613, text-davinci-003, text-davinci-002, text-davinci-001, text-curie-001, text-babbage-001, text-ada-001, davinci-002, babbage-002, davinci, curie, babbage, and ada. These are all the OpenAI models that are available for inference through the chat and completions endpoints.

The models can be divided into chat models, instruct models, and base models. Chat models, the gpt-4 and gpt-3.5 families, are LLMs optimized for chat. Instruct models are trained with reinforcement learning from human feedback to follow instructions [1]. ...
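
As a rough illustration of the setup described above, the following Python sketch groups model names into the three families and times repeated calls to a request function. It is a minimal sketch, not the measurement code used for this article. The helper names (model_family, time_call, fake_request) are hypothetical, grouping models by name prefix is an assumption made for illustration, and fake_request is a stand-in that sleeps briefly instead of contacting the OpenAI API.

```python
import statistics
import time
from typing import Callable, List


def model_family(name: str) -> str:
    """Group a model name into chat, instruct, or base.

    Assumption: the family can be read off the name prefix
    (gpt-* is chat, text-* is instruct, everything else is base).
    """
    if name.startswith(("gpt-4", "gpt-3.5")):
        return "chat"
    if name.startswith("text-"):
        return "instruct"
    return "base"


def time_call(call: Callable[[], object], repeats: int = 3) -> List[float]:
    """Return the wall-clock latency in seconds of each of `repeats` calls."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_request() -> str:
    """Hypothetical stand-in for a real API request; it only sleeps."""
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    for model in ["gpt-4", "text-davinci-003", "davinci"]:
        latencies = time_call(fake_request, repeats=3)
        median_ms = statistics.median(latencies) * 1000
        print(f"{model} ({model_family(model)}): median {median_ms:.1f} ms")
```

To measure real latency, fake_request would be replaced by a function that sends one request to the chosen model. Reporting the median over several repeats makes the result less sensitive to a single slow call.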