
Siri's Supposed Upgrade: Revolutionary AI or Just Catching Up with the Crowd?

4/7/24

Editorial team at Bits with Brains

Apple, long regarded as the dark horse in AI, has recently made significant strides in the field, particularly in reference resolution and multimodal AI.

Apple's recent research in artificial intelligence (AI) has led to the development of a new system called ReALM (Reference Resolution As Language Modeling), which significantly enhances how AI understands and interacts with the content displayed on screens.


In essence, ReALM is designed to make sense of the ambiguous references we often use in daily conversation, such as "that" or "this," when we're referring to something specific on our devices. For example, if you're looking at a list of restaurants on your phone and ask Siri to "call the first one," ReALM helps Siri understand exactly which restaurant you're talking about, even though you haven't named it explicitly. This technology bridges the gap between human speech patterns and the AI's understanding of context, making interactions with devices more intuitive and natural.


The innovative approach behind ReALM lies in its method of converting everything visible on a device's screen into text. This includes all buttons, images, and other interactive elements. By transforming these visual elements into a language it can understand, ReALM allows the AI to process requests more efficiently, without the heavy computational demands of image recognition technologies. This not only speeds up the AI's response time but also enhances privacy and security by eliminating the need to send data to the cloud for processing. As a result, Apple's AI, particularly Siri, becomes more capable of assisting users in a way that feels seamless and aligned with how we naturally communicate, making our interactions with technology smoother and more enjoyable.
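The screen-to-text idea can be sketched as follows: UI elements are flattened into plain text, ordered roughly top-to-bottom and left-to-right, so a language model can reason about screen content without any image recognition. The dataclass and the exact text format below are assumptions for illustration; Apple's paper describes its own encoding.

```python
# Illustrative sketch of serializing on-screen elements into text that an
# LLM can consume. The ScreenElement fields and "[kind] label" format are
# assumptions, not Apple's actual encoding.

from dataclasses import dataclass

@dataclass
class ScreenElement:
    kind: str   # "button", "text", "image", ...
    label: str  # visible text or accessibility label
    x: int      # left coordinate on screen
    y: int      # top coordinate on screen

def screen_to_text(elements: list[ScreenElement]) -> str:
    """Serialize elements top-to-bottom, left-to-right into one text block."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return "\n".join(f"[{e.kind}] {e.label}" for e in ordered)

screen = [
    ScreenElement("button", "Call 555-0123", 10, 120),
    ScreenElement("text", "Luigi's Trattoria", 10, 40),
    ScreenElement("image", "restaurant photo", 10, 60),
]
print(screen_to_text(screen))
# [text] Luigi's Trattoria
# [image] restaurant photo
# [button] Call 555-0123
```

Because the output is plain text, the resolution step can run through the same lightweight language model as the rest of the request, rather than a separate, heavier vision pipeline.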

With ReALM, Apple is positioning itself as a major competitor in the AI space, particularly in the development of autonomous agents that can understand and navigate the visual world. Reportedly, Apple's model outperforms GPT-4 on certain reference-resolution benchmarks.


Nevertheless, there’s wide agreement that Apple is in catch-up mode when it comes to AI, and there has been a great deal of speculation that Apple will try to bridge the gap via acquisition.


One possible target is Perplexity AI, a company that specializes in web search using large language models. Perplexity AI's approach to web search involves using a large language model to summarize information from various sources, providing users with a more personalized and intuitive search experience. This approach could potentially revolutionize the way online search works, with users being able to do image search, video search, and even generate images in the format they need for their specific use case.


Another potential acquisition for Apple is Anthropic, a company whose models could help Apple run features on-device using large language models. This is particularly relevant for Apple's custom chips, which could run large language models without requiring a constant internet connection. That could give Apple a significant advantage in the AI space, as on-device AI models can provide faster and more private experiences for users.


The implications of these developments for organizations looking to implement AI could be significant. By integrating visual and textual data, Apple is creating more sophisticated AI models that can better understand and interact with the world around them. The potential acquisition of Perplexity AI and Anthropic could further solidify Apple's position in AI, providing the company with access to cutting-edge AI technology and expertise.


Sources:

[1] https://www.youtube.com/watch?v=h-nAx90DzeI

[2] https://petapixel.com/2024/04/02/apple-researchers-make-ai-that-can-read-between-the-pixels/

[3] https://hyscaler.com/insights/apple-realm-reference-resolution/

[4] https://arxiv.org/html/2403.20329v1

[5] https://www.marktechpost.com/2024/04/03/apple-researchers-present-realm-an-ai-that-can-see-and-understand-screen-context/

[6] https://venturebeat.com/ai/apple-researchers-develop-ai-that-can-see-and-understand-screen-context/

[7] https://economictimes.indiatimes.com/tech/technology/ettech-explainer-is-apples-realm-better-than-openais-gpt-4/articleshow/109003347.cms

[8] https://timesofindia.indiatimes.com/technology/tech-news/apple-claims-its-realm-is-better-than-gpt-4-at-this-task-explained/articleshow/109030643.cms

