#AI | Writing | Ozgur Yildiz

May 29, 2025

Vector search: finding similar content without keyword matching.

How vector databases work, how to query them efficiently, and when to use them instead of traditional search.

May 26, 2025

Embeddings: what they encode and a practical use case.

What embedding vectors represent, how to generate them with the OpenAI API, and how to use them for semantic search.

May 22, 2025

Rate limits and retry logic for AI APIs.

How rate limits work in the OpenAI API and other AI APIs, and how to build retry logic that handles them without hammering the provider.

May 19, 2025

Caching LLM responses: what to cache, what not to, and the key design.

A practical guide to caching LLM API responses, including what makes a good cache key and when caching backfires.

May 15, 2025

Latency budget in a real-time AI app: where the milliseconds go.

How to break down end-to-end latency in a real-time AI application and identify which hops to optimize first.

May 12, 2025

STT to LLM to TTS: a pipeline where every hop adds latency.

How to architect a speech-to-speech pipeline and where to optimize each stage to minimize end-to-end latency.

May 8, 2025

Text-to-speech: latency, voice selection, and streaming audio back.

How TTS APIs work, how to pick voices, and how to stream audio to the client before the full synthesis is done.

May 5, 2025

Gemini vs OpenAI: the API differences that matter when you use both.

A practical comparison of the Gemini and OpenAI APIs covering authentication, message format, tool calling, and streaming.

May 1, 2025

OpenAI function calling: the feature that makes LLMs do real work.

How OpenAI function calling works, how to define tools, and how to wire the model's output back into your application.

Apr 28, 2025

Streaming LLM responses: why waiting for the full answer feels broken.

LLMs generate tokens one at a time. Streaming sends them as they're produced instead of waiting for completion. How to implement streaming with the OpenAI API and handle it on the client.

Apr 24, 2025

JSON mode in the OpenAI API: getting structured output you can actually use.

Getting consistent, parseable JSON from LLMs requires more than asking nicely. JSON mode, structured outputs, and the patterns that make LLM output reliable.

Apr 21, 2025

Speaker diarization: turning a transcript into 'who said what.'

Transcription gives you text. Diarization adds speaker identity. How diarization works, the tools available, and how to combine it with Whisper output.

Apr 17, 2025

Chunking audio for transcription: size, overlap, and the timing that matters.

The Whisper API has a 25 MB file size limit, so long recordings have to be chunked. How to split audio correctly so transcription quality doesn't suffer at the boundaries.

Apr 14, 2025

OpenAI Whisper API: what the response actually looks like and how to use it.

A practical look at the Whisper transcription API response format, the verbose JSON mode with timestamps, and how to use the output in real applications.