Anthropic has introduced two major upgrades to its AI lineup; Claude 3.5 Sonnet and Claude 3.5 Haiku. Alongside these advancements, a new computer use feature has been launched in a public beta. These developments push the boundaries of automation, coding, and computer navigation, bringing new possibilities for developers and businesses alike.
Claude 3.5 Sonnet: Enhancing Software Engineering
The Claude 3.5 Sonnet offers significant upgrades over its previous version, with enhanced abilities in coding and automation. This model shines in agentic coding tasks, improving its performance on benchmarks like SWE-bench Verified, moving from 33.4% to 49%, outperforming publicly available models, including OpenAI’s o1-preview. It also scored higher on the TAU bench, used to assess tool-based problem-solving:
- Retail domain: From 62.6% to 69.2%
- Airline domain: From 36% to 46%
These gains come with no added cost or latency, making Claude 3.5 Sonnet an ideal solution for complex, multi-step development tasks. Companies such as GitLab have reported up to 10% better reasoning on DevSecOps tasks. The Browser Company also found the model to be exceptional for automating web-based workflows.
This model has been tested rigorously in partnership with the US and UK AI Safety Institutes to ensure safe deployment. Its compliance with the ASL-2 Standard, part of Anthropic’s Responsible Scaling Policy, confirms that it meets safety benchmarks required for broader use.
Image Source: Anthropic
Claude 3.5 Haiku: Affordable, Fast, and Capable AI
The new Claude 3.5 Haiku model is designed for speed and cost-efficiency while matching the performance of Claude 3 Opus, which is the Anthropic’s largest previous model, across many evaluations. This model demonstrates excellent results in low-latency tasks, making it suitable for real-time applications like user-facing products and data-intensive tasks.
Claude 3.5 Haiku scores 40.6% on SWE-bench Verified, outperforming earlier Claude models and even GPT-4o in some areas. It provides accurate tool usage and improved instruction-following capabilities, making it effective for generating personalized experiences from large datasets, such as purchase history, pricing records, or inventory data.
This model will be available later in October, via Anthropic’s API, Amazon Bedrock, and Google Cloud Vertex AI. Initially, it will support text-only tasks, with image input functionality expected soon.
AI-Driven Computer Use in Public Beta
One of the most exciting features Anthropic has introduced is Claude’s ability to use computers. Now in public beta, developers can use Claude to perform tasks just like a human, such as navigating screens, typing, clicking, and more. This feature allows the model to automate repetitive processes, conduct open-ended research, and even test software across multiple platforms.
Early adopters like Replit are already using this capability for automating complex UI navigation tasks, helping their Replit Agent product evaluate applications as they are developed.
In tests conducted by OSWorld, Claude 3.5 Sonner scored 22% when given more time to complete a task, outperforming other AI models that scored just 7.8%. Even so, the feature is still experimental and has some limitations. Tasks that require scrolling, zooming, or dragging can be challenging for the AI to perform smoothly. Developers are advised to start with low-risk projects to explore its potential. Anthropic promises ongoing improvements to this feature based on the feedback.
Ensuring Safe Deployment
To address concerns around security risks, such as spam, fraud, or misinformation, Anthropic has developed new classifiers to monitor and prevent misuse of the computer use feature. This proactive approach helps ensure the responsible deployment of AI-driven automation.
Dataset and Training Details of Claude Models
According to Google Cloud, all Claude models are trained through several techniques:
- Unsupervised learning (learning from patterns in raw data)
- Reinforcement Learning with Human Feedback (RLHF) (improving with feedback from people)
- Constitutional AI (a process involving both supervised learning and reinforcement learning).
Training Infrastructure
Claude 3.5 Sonnet v2 is trained using cloud services provided by Amazon Web Services (AWS) and Google Cloud Platform (GCP). The main frameworks used for development include PyTorch, JAX, and Triton.
Sources of Training Data
Claude models use a mix of data that includes:
- Public internet information that was collected up to August 2023, with Claude 3.5 Sonnet v2’s training ending in April 2024.
- Non-public data from third parties, which includes content created or labeled by users, companies, or hired service providers.
- Internally generated data by Anthropic for refining the model.
Data Cleaning and Filtering
To ensure high-quality data, Anthropic applies methods like deduplication (removing repeated information) and classification to filter out irrelevant or low-quality data.
Crawling Practices
When gathering public data from websites, Anthropic follows responsible crawling practices:
- robots.txt files and other website signals are respected to ensure compliance with site owners' preferences.
- Anthropic does not access password-protected or sign-in pages or bypass CAPTCHAs to collect data.
- Their web-crawling system operates transparently, making it easy for site owners to identify visits and communicate their preferences to Anthropic.
What’s Next
The Claude Sonnet 3.5 is already available for use, and the Claude 3.5 Haiku will be released later in October. Both models along with the computer use feature, can be accessed via Anthropic’s API, Amazon Bedrock, and Google Cloud Vertex AI. As these innovations evolve, Anthropic invites developers to provide feedback and experiment with these tools in safe practical applications.