Date: 2023-09-26

OpenAI's ChatGPT is set for a major update that will allow the chatbot to hold voice conversations with users and interact using images. The update will move it closer to popular artificial intelligence (AI) assistants such as Apple's Siri, Amazon's Alexa, and Samsung's Bixby.

In a blog post on Monday (September 25), OpenAI said that the voice feature "opens doors to many creative and accessibility-focused applications".

"We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you're talking about," the post read.

The company said that voice and image will give users more ways to use ChatGPT in day-to-day life. For instance, one can take a picture of a landmark while travelling and have a live conversation with the chatbot about what's interesting about it.

The company gave another example: after dinner, a user can help a child with a math problem by taking a photo, circling the problem set, and having ChatGPT share hints with both of them.

Currently, similar AI services like Siri and Alexa are integrated with the devices they run on. They are often used to set alarms and reminders and to retrieve information from the internet.

OpenAI said it is rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming to iOS and Android (opt-in through settings), and images will be available on all platforms.

How to enable voice conversations?

The company said that to get started with voice, users should head to Settings → New Features in the mobile app and opt into voice conversations. They can then tap the headphone button in the top-right corner of the home screen and choose their preferred voice from five options.

OpenAI said in the post that the new voice capability is powered by a new text-to-speech model, which is capable of generating human-like audio from just text and a few seconds of sample speech.

"We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text," it said.

Meanwhile, the image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images.

The blog post said users can now show ChatGPT one or more images and ask questions about them. For example, one can troubleshoot why a grill won't start, explore the contents of a fridge to plan a meal, or analyse a complex graph for work-related data.

(With inputs from agencies)

