AWS tightens default security on Redshift Wed, 05 Feb 2025 03:14:08 +0000

Amazon’s security improvements for its AWS Redshift managed data warehouse service are welcome additions, says an expert.

Loris Degioanni, chief technology officer at Sysdig, told InfoWorld that AWS’s enhanced security defaults for Amazon Redshift are a “necessary evolution for the accelerated cloud adoption we’ve seen across organizations with varying security expertise. Secure configurations are the first line of defense, and by enforcing best practices from day one, these changes reinforce the shift-left mindset Sysdig has long championed. However, security doesn’t stop at strong defaults — continuous monitoring, risk prioritization, and real-time threat detection are critical.”

Redshift allows organizations to store and analyze their data using their choice of business intelligence tools. According to HG Insights, just over 27,700 organizations use Redshift.

The changes

The three changes are in Redshift’s default settings for newly created clusters, Redshift Serverless workgroups, and clusters restored from snapshots:

  • Public accessibility to data has been disabled. Newly created clusters will be accessible only within a customer’s virtual private cloud (VPC). If an administrator needs public access, they must explicitly override the default and set the “PubliclyAccessible” parameter to “true” when running the “CreateCluster” or “RestoreFromClusterSnapshot” API operations (a minimal boto3 sketch follows this list). If Redshift client applications run in a different VPC, customers can configure cross-VPC access.
    For a publicly accessible cluster, AWS recommends that admins always use security groups or network access control lists (network ACLs) to restrict access.
  • Database encryption is enabled by default. In other words, the ability to create unencrypted clusters in the Redshift console is gone. When an admin uses the console, CLI, API, or CloudFormation to create a provisioned cluster without specifying an AWS Key Management Service (AWS KMS) key, the cluster will automatically be encrypted with an AWS-owned key, which AWS manages.
  • Secure connections are enforced by default. Communication between a customer’s applications and the Amazon Redshift data warehouse is now encrypted, protecting the confidentiality and integrity of data in transit.
    A new default parameter group named “default.redshift-2.0” is created for newly created or restored clusters, with the “require_ssl” parameter set to “true” by default. New clusters created without a specified parameter group will automatically use “default.redshift-2.0”, which will be preselected in the Redshift console. This change is also reflected in the “CreateCluster” and “RestoreFromClusterSnapshot” API operations, as well as in the corresponding console, AWS CLI, and AWS CloudFormation operations.
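
For illustration, here is a minimal sketch of what creating a cluster under the new defaults might look like using boto3, the AWS SDK for Python. The cluster identifier, credentials, node type, and key ARN are placeholders rather than values from AWS’s announcement, and the PubliclyAccessible and Encrypted parameters are spelled out only to make the new defaults visible.

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Omitting PubliclyAccessible and Encrypted now yields a VPC-only, encrypted
# cluster; they are shown explicitly here to highlight the new defaults.
redshift.create_cluster(
    ClusterIdentifier="analytics-cluster",            # placeholder name
    NodeType="ra3.xlplus",
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="ChangeMe-Str0ng-Passw0rd",     # placeholder credential
    PubliclyAccessible=False,  # new default; set True only as a deliberate override
    Encrypted=True,            # new default; uses an AWS-owned key unless KmsKeyId is given
    # KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",  # optional customer-managed key
)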

For customers using existing or custom parameter groups, Redshift will continue to honor the “require_ssl” value specified in the customer’s parameter group. However, AWS recommends that admins set this parameter to “true” to enhance the security of connections. Admins still have the option to change this value in their custom parameter groups as needed; the procedure is outlined in the Amazon Redshift Management Guide under configuring security options for connections.
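
For admins who manage parameter groups programmatically, a sketch along the following lines (again using boto3, with a placeholder parameter group name) would set require_ssl to true.

import boto3

redshift = boto3.client("redshift")

# require_ssl is a static parameter, so clusters attached to this group
# generally need a reboot before the change takes effect.
redshift.modify_cluster_parameter_group(
    ParameterGroupName="my-custom-parameter-group",  # placeholder
    Parameters=[
        {"ParameterName": "require_ssl", "ParameterValue": "true"},
    ],
)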

Amazon also noted that those creating unencrypted clusters by using automated scripts or using data sharing with unencrypted clusters could be impacted by the changes. “If you regularly create new unencrypted consumer clusters and use them for data sharing, review your configurations to verify that the producer and consumer clusters are both encrypted to reduce the chance that you will experience disruptions in your data-sharing workloads,” it advised.

Asked why these changes are being made, Sirish Chandrasekaran, Redshift’s vice-president, said the additional security defaults will help customers adhere to best practices in data security from day one without requiring additional setup, reducing the risk of potential misconfigurations.

Out of the box, Redshift comes with a number of security capabilities, including support for multi-factor authentication, encryption of data at rest, access control, and identity management with federation for single sign-on. But these and other tools are useless unless they are used and properly configured.

Recommendations

In a series of blogs, Palo Alto Networks made a number of recommendations to Redshift admins:

  • Know exactly which users and roles can use the “redshift:GetClusterCredentials” IAM action and the “redshift-data” API. An attacker can use these to generate temporary credentials and access a Redshift cluster (a policy sketch follows this list).
  • Create users and groups and assign them only the privileges they need. Admins should create one user per identity so that, in the event of a security incident, it is possible to track and monitor what data was accessed, or even to prevent unauthorized access before it happens.
  • Because Redshift can authorize identity and access roles that allow a data cluster to access external data sources, limit these roles to only those who need such access.
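
To illustrate the first recommendation, a least-privilege IAM policy can scope “redshift:GetClusterCredentials” to a single database user and database. In the sketch below, the account ID, region, cluster, database, and user names are placeholders, and boto3 is used only to register the policy; it is an illustration, not a verbatim Palo Alto Networks recommendation.

import json

import boto3

# Placeholder ARNs: scope temporary-credential issuance to one database user
# and one database on a single cluster.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:us-east-1:111122223333:dbuser:analytics-cluster/app_reader",
                "arn:aws:redshift:us-east-1:111122223333:dbname:analytics-cluster/sales",
            ],
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="redshift-temp-creds-app-reader",
    PolicyDocument=json.dumps(policy_document),
)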

Palo Alto Networks also provided more detail on access control in a blog post, and Sysdig has offered its own advice on Redshift security best practices.

Finally, Sysdig’s Degioanni provided this additional caution: “While cloud providers provide a level of security for their underlying infrastructure and services, organizations remain responsible for protecting their own applications and data under a shared responsibility model. Because attacks can happen quickly — sometimes within minutes — having real-time visibility into production workloads is crucial for detecting active risk and responding to threats as they occur.”

https://www.infoworld.com/article/3816964/aws-tightens-default-security-on-redshift.html
Rust update fixes ‘forever’ compilation Wed, 05 Feb 2025 01:26:57 +0000

The Rust Team has published a point release update, Rust 1.84.1, to address a few regressions in last month’s Rust 1.84 release, including fixing painfully slow compilation related to the next-generation trait solver.

Announced January 30, Rust 1.84.1 can be accessed as an update by running the command rustup update stable.

With the new trait solver, the goal is to check whether a given trait bound is satisfied, most notably when type checking the body of potentially generic functions. However, the new feature resulted in unexpectedly slow compilation in Rust 1.84. This is fixed with Rust 1.84.1.

A second fix addresses errors for overlapping impls in incremental builds. A third resolves a compiler error in duplicate-crate diagnostics, and a fourth restores missing debug info when LLVM’s location discriminator value limit is exceeded.

Several fixes also were made for those building Rust from source, including resolving symlinks of LLVM tool binaries before copying them and making it possible to use ci-rustc on tarball sources.

The follow-up to Rust 1.84, Rust 1.85, is due February 20, with capabilities including stable support for async closures.

https://www.infoworld.com/article/3816878/rust-update-fixes-forever-compilation.html
Automated builds, tests, and quality gates are key to software quality – report Tue, 04 Feb 2025 22:54:30 +0000

When it comes to software quality, the top three practices cited for production readiness are automated building and deploying, automated tests, and automated quality gates for every merge request, according to the JetBrains Qodana State of Software Quality Report 2024. In addition, the report found that the practices teams most wanted but lacked were automated quality gates for each merge request, feature flags for controlled rollouts, and manual testing by a dedicated QA/testing team.

Released February 4, the JetBrains report includes findings from a survey of 808 developers, QA engineers, and other technical professionals done in July and August 2024. JetBrains, which produces the Qodana code quality platform, found that there was a clear shift toward more rigorous practices as products evolve from launch to maturity. This shift highlights the increasing complexity of maintaining code quality as companies grow, the report said.

Asked which software development processes led to production readiness, 64% of respondents said automated building and deploying was “very effective,” followed by 63% for automated tests and 55% for automated quality gates for every merge request. Asked which practices they desired but were lacking for assessing production readiness, 27% said they would implement automated quality gates if they had more resources and time, while 21% expressed the desire to use feature flags and 19% reported a need for manual testing by a dedicated QA/testing team. JetBrains also found that companies typically implement four to six mandatory checks as part of the pre-production quality assurance process. Chief among these was passing all unit tests written for the existing code.

The report also offers recommendations for making the adoption of new practices manageable. These include focusing on new and changed code only and starting with low-risk areas of newly introduced products or processes. Organizations also should choose tools that are readily available to developers.

Other findings in the JetBrains Qodana State of Software Quality Report 2024:

  • Common foes of software quality include technical challenges and organizational and process issues.
  • For pre-production readiness, data security and privacy issues were considered a “must fix” by 60% of respondents.
  • 39% of respondents use manual testing by a dedicated team member and find it effective. 
  • 39% of respondents said manual testing required too much manual effort or workaround solutions — more than any other practice.
  • 61% of teams use automated tests and find them effective in overall software quality assurance.

https://www.infoworld.com/article/3816755/automated-builds-tests-and-quality-gates-are-key-to-software-quality-report.html
Anthropic unveils new framework to block harmful content from AI models Tue, 04 Feb 2025 10:31:48 +0000

Anthropic has showcased a new security framework designed to reduce the risk of harmful content generated by its large language models (LLMs), a move that could have far-reaching implications for enterprise tech companies.

Large language models undergo extensive safety training to prevent harmful outputs but remain vulnerable to jailbreaks – inputs designed to bypass safety guardrails and elicit harmful responses, Anthropic said in a statement.

“Some jailbreaks exploit the system by flooding it with excessively long prompts, while others manipulate the input style, such as using unusual capitalization,” the company noted. Detecting and blocking these tactics has historically been challenging.

“In our new paper, we describe a system based on Constitutional Classifiers that guards models against jailbreaks,” Anthropic said. “These Constitutional Classifiers are input and output classifiers trained on synthetically generated data that filter the overwhelming majority of jailbreaks with minimal over-refusals and without incurring a large compute overhead.”

Constitutional Classifiers are based on a process similar to Constitutional AI, a technique previously used to align Claude, Anthropic said. Both methods rely on a constitution – a set of principles the model is designed to follow.

“In the case of Constitutional Classifiers, the principles define the classes of content that are allowed and disallowed (for example, recipes for mustard are allowed, but recipes for mustard gas are not),” the company added.
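
As a rough illustration of the layered input/output classifier pattern, and emphatically not Anthropic’s implementation, the Python sketch below wires a toy classifier around both the prompt and the response. The keyword check and the echo “model” are trivial stand-ins for trained classifiers and a real LLM.

# Toy stand-in for a constitution of disallowed content classes.
DISALLOWED_TOPICS = ("mustard gas",)

def classify(text: str) -> str:
    """Toy classifier: flag text that touches a disallowed topic."""
    return "disallowed" if any(t in text.lower() for t in DISALLOWED_TOPICS) else "allowed"

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"Model response to: {prompt}"

def guarded_completion(prompt: str) -> str:
    if classify(prompt) == "disallowed":        # input classifier screens the prompt
        return "Sorry, I can't help with that."
    draft = call_model(prompt)
    if classify(draft) == "disallowed":         # output classifier screens the response
        return "Sorry, I can't help with that."
    return draft

print(guarded_completion("Share a recipe for mustard"))      # allowed
print(guarded_completion("Share a recipe for mustard gas"))  # blocked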

This advancement could help organizations mitigate AI-related risks such as data breaches, regulatory non-compliance, and reputational damage arising from AI-generated harmful content.

Other tech companies have taken similar steps, with Microsoft introducing its “prompt shields” feature in March last year, and Meta unveiling a prompt guard model in July 2024.

Evolving security paradigms

As AI adoption accelerates across industries, security paradigms are evolving to address emerging threats.

Constitutional Classifiers represent a more structured approach to AI security, embedding ethical and safety considerations through layered, scalable filtering mechanisms that competitors have yet to demonstrate publicly.

“Anthropic’s approach focuses on ‘universal jailbreaks,’ which systematically bypass the model’s safeguards and create unauthorized model changes that can increase the number of prompts, draining system resources or injecting illegal data at scale,” said Neil Shah, partner and co-founder at Counterpoint Research. “A systematic approach can effectively reduce jailbreaks, helping enterprises not only protect their data from hacking, extraction, or unauthorized manipulation but also avoid unexpected costs from unlimited API calls in cloud environments and prevent resource strain when deployed on-premises.”

Such a shift toward comprehensive, multi-layered security frameworks highlights the growing complexity of managing AI systems in enterprise environments.

As organizations increasingly rely on AI for critical operations, robust security measures like this will be key to mitigating both technical and financial risks.

Advantages in competition

As AI security becomes increasingly critical to enterprise adoption of LLMs, the latest offering could offer Anthropic a competitive edge.

“Competitors like OpenAI, with the widespread adoption and usage of ChatGPT, are in a stronger position due to their experience curve, allowing them to enhance their security frameworks and approaches almost daily,” Shah said. “However, Anthropic is more vocal and confident that its new security approach could serve as a differentiator, potentially sparking a new battle over security mechanisms as most models reach similar levels of capability in terms of parameters and performance.”

This emerging focus on security as a key differentiator highlights the evolving priorities within the AI industry, where technical performance alone may no longer be enough to maintain a competitive advantage.

For enterprises, this shift underscores the importance of evaluating not just model capabilities but also the robustness of security frameworks when selecting AI solutions.

https://www.infoworld.com/article/3816273/anthropic-unveils-new-framework-to-block-harmful-content-from-ai-models.html
Not seeing ROI from your AI? Observability may be the missing link Tue, 04 Feb 2025 09:00:00 +0000

From chatbots, to coding copilots, to AI agents, generative AI-powered apps are seeing increased traction among enterprises. As they go mainstream, however, their shortcomings are becoming more clear and problematic. Incomplete, offensive, or wildly inaccurate responses (aka hallucinations), security vulnerabilities, and disappointingly generic responses can be roadblocks to deploying AI — and for good reason.

In the same way that cloud-based platforms and applications gave birth to new tools designed to evaluate, debug, and monitor those services, the proliferation of AI requires its own set of dedicated observability tools. AI-powered applications are becoming too important to treat as interesting but unreliable test cases — they must be managed with the same rigor as any other business-critical application. In other words, AI needs observability.

What is AI observability?

Observability refers to the technologies and business practices used to understand the complete state of a technical system, platform, or application. For AI-powered applications specifically, observability means understanding all aspects of the system, from end to end. Observability helps companies evaluate and monitor the quality of inputs, outputs, and intermediate results of applications based on large language models (LLMs), and can help to flag and diagnose hallucinations, bias, and toxicity, as well as performance and cost issues.

We need observability in AI because the technology is starting to show its limitations at the precise moment that it’s becoming indispensable — and for enterprises, the limitations are unacceptable.

For example, I teach a computer science course on trustworthy machine learning at Stanford University and advise my students to consider LLMs’ answers as hallucinatory unless proven otherwise. Why? Because LLMs are trained to generalize from large bodies of text, generating original text modeled on the general patterns found in the text they were trained on. They are not built to memorize facts. 

But when LLMs are being used in place of search engines, some users approach them with the expectation that they will deliver accurate and helpful results. If the AI fails to do that, it erodes trust. In one egregious example, two lawyers were fined for submitting a legal brief written by AI that cited non-existent cases. 

Hallucinations, security leaks, and incorrect answers undermine the trust businesses need to have in the AI-powered applications they build, and present roadblocks for bringing AI into production. If the LLM produces inappropriate answers, it also hurts the ability of consumers to trust the company itself, causing damage to the brand. 

Moving beyond ‘looks good to me’

As one corporate LLM user told me, “We want an easy way to evaluate and test the accuracy of different models and apps instead of taking the ‘looks good to me’ approach.” From evaluation to ongoing monitoring, observability is increasingly important to any organization using AI applications.

AI observability gives the owners of AI applications the power to monitor, measure, and correct performance, helping in three different aspects of corporate AI use:

  1. Evaluation and experimentation: With so many AI models and tools on the market, it’s important that enterprises can easily determine which elements work best for their specific AI app use case. Observability is critical for evaluating different LLMs, configuration choices, code libraries, and more, enabling users to optimize their tech choices for each project.
  2. Monitoring and iteration: Once an AI app has been deployed and is in use, observability helps with logging execution traces and monitoring its ongoing performance. When a problem crops up, observability is crucial for diagnosing the cause, fixing the problem, and then validating the fix — an iterative process of continuous improvement familiar to anyone who has worked with cloud software.
  3. Tracking costs and latency: Tech leaders are becoming increasingly practical about their AI efforts. Gone are the days of unchecked AI spending — leaders are now deeply concerned about the ROI of AI projects and about understanding which use cases are delivering business results. From this perspective, the two essential dimensions to measure are how much an application costs and how much time it takes to deliver answers (known as latency). Throwing more GPUs and servers at an application can reduce latency, but it drives up cost. You can’t find the right balance for your application unless you can measure both accurately. Observability gives enterprises a clearer picture of both of these elements, enabling them to maximize results and minimize costs. (A minimal instrumentation sketch follows this list.)
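
As a small illustration of that third point, the sketch below wraps a stand-in model call with latency timing and a rough token-based cost estimate. The per-token price and the call_llm function are assumptions made for the example, not figures from any vendor.

import time

PRICE_PER_1K_TOKENS = 0.002  # placeholder price, not a real rate card

def call_llm(prompt: str) -> tuple[str, int]:
    """Stand-in for a real model call; returns (response, tokens_used)."""
    response = f"Answer to: {prompt}"
    return response, len(prompt.split()) + len(response.split())

def observed_call(prompt: str) -> str:
    start = time.perf_counter()
    response, tokens = call_llm(prompt)
    latency_s = time.perf_counter() - start
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    # In practice these records would flow to an observability backend, not stdout.
    print(f"latency={latency_s:.3f}s tokens={tokens} est_cost=${cost:.5f}")
    return response

observed_call("Summarize last quarter's support tickets")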

What enterprises should expect and demand from AI

As enterprises bring AI applications into production, they must expect and demand more than “good enough.” For AI to become a reliable, trustworthy component of business infrastructure, its answers must align with the “3H rule,” being honest, harmless, and helpful.

AI needs to be honest, meaning factually accurate and free of hallucinations. Enterprises must be able to use LLMs for tasks where their generalization is desirable: Summarizing, generating inferences, and planning. Honest AI also means the system recognizes and acknowledges when it cannot accurately answer a question. For example, if the answer just does not exist, the LLM should say “I cannot answer that” as opposed to spitting out something random.

For tasks where memorization of facts is more important, we need to supplement LLMs with additional information and data sources to ensure that responses are accurate. This is an active field of research known as retrieval-augmented generation, or RAG: Combining LLMs with databases of factual data that they can retrieve to answer specific questions. 
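
A toy sketch of that pattern might look like the following, with a naive keyword retriever standing in for a real vector search and the prompt assembly standing in for an actual LLM call.

# Two placeholder documents stand in for a knowledge base.
DOCUMENTS = [
    "Acme's refund policy allows returns within 30 days of purchase.",
    "Acme support hours are 9am to 5pm Eastern, Monday through Friday.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    scored = sorted(
        DOCUMENTS,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer_with_context(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real system would send this prompt to an LLM; the point here is the
    # grounding step that ties the answer to retrieved facts.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

print(answer_with_context("What is the refund policy?"))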

AI needs to be harmless, meaning answers don’t leak personally identifiable information and are not vulnerable to “jailbreak” attacks designed to circumvent their designers’ guardrails. Those guardrails must ensure that the answers don’t embody bias, hurtful stereotypes, or toxicity. 

Finally, AI needs to be helpful. It needs to deliver answers that match the queries users give it, that are concise and coherent, and provide useful results.

The RAG Triad: A framework for evaluating AI apps

The RAG Triad is one example of a set of metrics that helps evaluate RAG apps to ensure that they are honest and helpful. It includes three metrics — context relevance, groundedness, and answer relevance — to measure the quality of the three steps of a typical RAG application. 

  1. Context relevance measures how relevant each piece of context retrieved from the knowledge base is to the query that was asked.
  2. Groundedness measures how well the final response is grounded in or supported by the retrieved pieces of context. 
  3. Answer relevance measures how relevant the final response is to the query that was asked.

By decomposing a composite RAG system into components — query, context, and response — this evaluation framework can triage failure points, provide a clearer understanding of where improvements are needed in the RAG system, and guide targeted optimization.
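
Expressed as code, the triad is simply three scoring functions over the query, the retrieved context, and the response. The word-overlap scorer below is a crude stand-in for the LLM- or embedding-based judges a real evaluation framework would use; it is meant only to show the shape of the three metrics.

def overlap(a: str, b: str) -> float:
    """Crude relevance proxy: fraction of words in `a` that also appear in `b`."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def context_relevance(query: str, context: str) -> float:
    return overlap(query, context)      # is the retrieved context about the query?

def groundedness(response: str, context: str) -> float:
    return overlap(response, context)   # is the response supported by the context?

def answer_relevance(query: str, response: str) -> float:
    return overlap(query, response)     # does the response address the query?

query = "what is the refund window"
context = "returns are accepted within a 30 day refund window"
response = "the refund window is 30 days"
print(context_relevance(query, context), groundedness(response, context), answer_relevance(query, response))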

Figure 1. The RAG Triad (Snowflake)

Guarding against harm involves aligning models to mitigate safety risks (see, for example, Llama Guard) and adding guardrails to applications for metrics related to toxicity, stereotyping, adversarial attacks, and more.

There has been substantial progress toward achieving the 3H requirements and making AI apps honest, harmless, and helpful. With AI observability, we can guard against hallucinations, catch irrelevant and incomplete responses, and identify security lapses. The rise of agentic workflows raises an additional set of challenges — checking that the right tools were called with the right parameters in the right sequence, that the execution traces from multi-agent distributed systems are properly logged and monitored, and that the end-to-end system behaves as expected — further underscoring the importance of AI observability.

Observability is critical to all applications that are critical to the business. AI observability will be a key enabling technology for helping AI realize its full potential to transform businesses, optimize processes, reduce costs, and unlock new revenue. 

Anupam Datta is the principal research scientist and AI research team lead at Snowflake. He joined Snowflake as part of the acquisition of TruEra, where he served as co-founder, president, and chief scientist. Datta was on the faculty at Carnegie Mellon University from 2007 to 2022, most recently as a tenured professor of electrical and computer engineering and computer science.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.

https://www.infoworld.com/article/3810789/not-seeing-roi-from-your-ai-observability-may-be-the-missing-link.html
What you need to know about developing AI agents Tue, 04 Feb 2025 09:00:00 +0000

Imagine developing an API for the generative AI era. This API would support natural language input, leverage intelligence from large language models, automate decisions by integrating with SaaS and enterprise systems, and enable business process orchestration by connecting to other genAI-enabled APIs.

That’s one way to understand what AI agents are and how they work. They integrate the reasoning of RAG-enabled LLMs with access to contextual business knowledge and the ability to act semi-independently. Whereas an LLM takes natural language input and answers questions, an AI agent is more like a business partner that actually gets work done. The more sophisticated autonomous agents are designed to work largely free of human interactions. Being autonomous, they incur more risks in interpreting inputs and taking actions.

AI agents are transforming the future of work by providing developers, business users, and others with a role-based partner that proactively automates tasks and acts as a knowledgeable collaborator. AI agents are generating significant business interest for their range of use cases and applicability beyond just intelligent information processing with genAI. Many AI agent use cases are not moonshots but solid proposals likely to get funded.

“AI agents have evolved from rule-based systems to intelligent, genAI-driven tools capable of natural language interaction,” says Ram Palaniappan, CTO of TEKsystems. “They can be deployed in diverse use cases, such as procurement to evaluate suppliers and place orders, customer support to perform complex troubleshooting, and healthcare to provide diagnosis summaries. Organizations with process-driven, human-intensive operations benefit from scalability, 24/7 availability, and streamlined workflows.”

AI agents versus APIs

Let’s start by understanding how AI agents differ from APIs and other web services.

  • AI agents accept natural language and other non-technical inputs, including emerging AI agents that will accept voice, images, and video. Contrast this to APIs that only accept system-oriented inputs like JSON and XML.
  • Instead of coded business rules, AI agents connect to RAG, language models, and other genAI models for relevant knowledge.
  • AI agents can reason about the next courses of action based on their assigned role and the defined guardrails of what they are not supposed to do. Because APIs are rules-driven, they require far more programming time and effort to simulate different roles, decisions, and actions.
  • Both APIs and AI agents can perform transactions and communicate with other services. AI agents can also act on their decisions and send natural language prompts to other AI agents.
  • AI agents provide natural language outputs to human collaborators. Non-technical people can validate an AI agent’s level of understanding and its actions because the decision-making steps and connection to APIs and other AI agents are summarized in natural language.

“Developers building AI agents for use cases like customer service should use natural language to encode business logic instead of code,” says Deon Nicholas, co-founder of Forethought. “They should equip agents with the ability to communicate with APIs in the same way humans communicate with websites. This will unlock truly agentic AI, which can take action and resolve issues, delivering a true value add.”

What kind of AI agent do you need?

There are several types of AI agents, classified by how they make decisions and perform actions. Model-based agents replace rules with AI models and supporting data, while goal and utility-based agents compare different scenarios before selecting a course of action. The more sophisticated AI learning agents use feedback loops to improve results, while hierarchical agents work in a group to deconstruct complex tasks.

Can you wrap an API with a natural language interface and call it an AI agent? The answer is yes; these are simple reflex agents that leverage rules to connect natural language input to action.
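
A toy example makes the point: the reflex “agent” below maps natural-language requests onto API-style actions using nothing more than hard-coded rules. The order functions are hypothetical stand-ins for real API calls.

def get_order_status(order_id: str) -> str:
    return f"Order {order_id} has shipped."          # stand-in for a real API call

def cancel_order(order_id: str) -> str:
    return f"Order {order_id} has been cancelled."   # stand-in for a real API call

def reflex_agent(utterance: str, order_id: str) -> str:
    """Simple reflex agent: rules map natural language directly to an action."""
    text = utterance.lower()
    if "cancel" in text:
        return cancel_order(order_id)
    if "status" in text or "where is" in text:
        return get_order_status(order_id)
    return "Sorry, I can only check or cancel orders."

print(reflex_agent("Where is my package?", "A-1042"))
print(reflex_agent("Please cancel it", "A-1042"))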

What are the prerequisites to developing AI agents?

When developing AI agents, you should be aware of prerequisites involving platforms, data, integration, security, and compliance.

“The success of AI agents requires a foundational platform to handle data integration, effective process automation, and unstructured data management,” says Rich Waldron, co-founder and CEO of Tray.ai. “AI agents can be architected to align with strict data policies and security protocols, which makes them effective for IT teams to drive productivity gains while ensuring compliance.”

Mike Connell, COO of Enthought, says you need a high volume of clean and (for some applications) labeled data that accurately represents the problem domain to train and validate models. Connell says, “A robust data pipeline is essential for preprocessing, transforming, and ensuring the availability of real-time data streams to refine the model and keep it calibrated to a changing world. Additionally, you should consider the need for domain-specific ontologies or embeddings to enhance the agent’s contextual understanding and decision-making capabilities.”

Regarding security and compliance, Joseph Regensburger, VP of research at Immuta, says AI agents have identities, so access to complex AI chains and knowledge graphs requires controls as if they were human. Regensburger recommends, “Capture the frequent changes in regulations and business agreements in an access control solution and enforce them on all potential human and machine actors.” Keeping up with changing business rules is essential to ensure AI agents are not developed based on outdated usage agreements.

Technologies and platforms for developing AI agents

Enterprise platforms such as Appian, Atlassian, Cisco Webex, Cloudera, Pega, Salesforce, SAP, ServiceNow, and Workday have announced AI agent capabilities embedded in their workflows and user experiences. For example, the Workday recruiter agent helps HR recruiters find and hire talent, while Atlassian’s AI-powered virtual service agent helps automate tier-1 support issues.

Some platforms also have capabilities for subject matter experts and non-technical business users to develop their own AI agents. Salesforce Agent Builder allows non-technical users to create customized AI agents. Users describe the agent’s role and select topics representing the work to be done, and the AI maps these to activities that can be performed on the platform. Other platforms with AI agent-building capabilities include Cisco Webex AI Agent Studio, ServiceNow Agentic AI, and Tray.ai Merlin Agent Builder.

One option for AI agent development comes directly as a service from platform vendors that use your data to enable agent analysis, then provide the APIs to perform transactions. A second option is from low-code or no-code, automation, and data fabric platforms that can offer general-purpose tools for agent development.

“A mix of low-code and pro-code tools will be used to build agents, but low-code will dominate since business analysts will be empowered to build their own solutions,” says David Brooks, SVP of Evangelism at Copado. “This will benefit the business through rapid iteration of agents that address critical business needs. Pro coders will use AI agents to build services and integrations that provide agency.”

A third option comes from developing agents natively with code, AI agent builders, or LLM application frameworks.

“You can either build AI agents natively—such as with Python or C++—or use a framework like AutoGen, LangGraph, or CrewAI, but these may not scale well or have sufficient guardrails,” says Abhi Maheshwari, CEO of Aisera. “You also need modern data infrastructure, such as a data lake or lake house.” Data must also be relevant for the domain and “seamlessly integrated using techniques like fine-tuning of LLMs or RAG,” says Maheshwari.

Organizations looking to be early adopters in developing AI agents will likely need to review their data management platforms and development tools, and adopt smarter devops processes, to enable developing and deploying agents at scale.

“To accelerate agent development, companies will need a robust set of tools that allow them to design, customize, deploy, and monitor agents at scale,” says Maryam Ashoori, director of product management of watsonx.ai, IBM. “This includes models optimized for function calling, middleware to orchestrate agents and connect them with broader enterprise toolsets, optimized runtime, technical guardrails, and governance capabilities to ensure they operate as intended. It will also require tooling that caters to a wide set of users and skillsets, from pro-code tools for developers to low-code and no-code tools for business users to embed them in daily workflows.”

Testing AI agents

Testing LLMs and validating accuracy requires human testers, automation, and synthetic data for basic accuracy testing, while more sophisticated techniques leverage secondary AI models and use generative adversarial networks (GANs) to test at scale.

Rahul Pradhan, VP of product and strategy, AI, data, and analytics at Couchbase, says, “Testing for accuracy via sophisticated observability tools, feedback loops, and fallback mechanisms will help organizations establish trust in AI agents, marking a leap toward leveraging agents that can perform tasks with autonomy.”

Mike Finley, CTO and co-founder of AnswerRocket, says AI agents can be tested for accuracy in two stages:

  • Require AI agents to provide documented proof points, where any facts used or quoted include their sources and any decisions made include documented logical steps describing their inputs.
  • AI verifiers are supervisor agents whose job is to watch the work of other AI agents and review accuracy while looking for subtle cues like a shift in tone (a toy sketch of such a check follows this list).
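
A toy version of such a supervisor check might look like the sketch below, which flags an agent’s structured output when the documented proof points are missing. The field names and rules are illustrative assumptions, not a description of any vendor’s verifier.

def verify(agent_output: dict) -> list[str]:
    """Return a list of problems found in another agent's structured output."""
    problems = []
    if not agent_output.get("sources"):
        problems.append("No sources cited for the facts used.")
    if not agent_output.get("reasoning_steps"):
        problems.append("Decision lacks documented logical steps.")
    return problems

draft = {
    "answer": "Q3 churn rose 4%.",
    "sources": [],                                            # missing proof points
    "reasoning_steps": ["Compared Q2 and Q3 churn reports."],
}
print(verify(draft))  # -> ['No sources cited for the facts used.']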

AI agents will redefine the workforce

LLMs and RAG drove much of the early hype around generative AI; that hype has given way to the potential for AI agents to impact productivity across a wide range of business workflows. As more platforms make agents available and development tooling scales, new human and AI responsibilities will likely emerge.

“Agentic AI will reshape the workplace and create new roles, such as ‘Agent Managers’ to oversee specialized agents, strategically guide these systems, and ensure alignment with business roles, similar to supervisors managing teams today,” says Artem Kroupenev, VP of strategy at Augury. “As multi-agent systems grow, HR-like departments may emerge to manage a hybrid workforce of human and AI agents, focusing on training, coordination, and performance metrics. This hybrid approach could blend human intuition with machine efficiency for better productivity.”

The key to growth may not be in how easy it is to develop AI agents, but in whether and how organizations will trust them and whether employees will embrace their capabilities.

https://www.infoworld.com/article/3812583/what-you-need-to-know-about-developing-ai-agents.html
Public cloud providers are missing the mark with AI Tue, 04 Feb 2025 09:00:00 +0000

Microsoft’s recent earnings call painted a sobering picture. Despite the company’s aggressive push into artificial intelligence and substantial investments in infrastructure, growth numbers fell short of expectations. As CEO Satya Nadella attempted to explain the shortfall to investors, one reality became increasingly clear: The traditional public cloud model is struggling to deliver on the promises of generative AI.

I’ve been warning about this scenario for months—losing some of my friends at public cloud providers in the process. The challenge isn’t just about having enough GPUs or data center capacity; it’s about a fundamental misalignment between how public clouds are built and what AI workloads need. The public cloud providers are attempting to fit the square pegs of their existing cloud computing offerings into the round hole of AI. It’s not working out well.

When enterprises try to scale their AI initiatives on traditional cloud infrastructure, they’re often met with unpredictable costs, performance bottlenecks, and infrastructure limitations that make sustained growth hard. The problem is a basic architectural mismatch. Public cloud providers built infrastructure to accommodate generalized computing workloads. These are the standard enterprise applications that dominated the past decade. However, AI workloads are different. They require specialized hardware configurations, massive data throughput, and complex orchestration capabilities that weren’t part of the original public cloud design philosophy.

An AI and cloud mismatch

When I point this out, I get pushback. I am told that infrastructure built for general-purpose computing needs will also accommodate the special needs of AI workloads. You don’t have to be an AI engineer to understand that such a plan will not work. I suspect the cloud providers hoped to forgo the expense and risk of building out infrastructure to better accommodate AI systems and thought nobody would notice. Well, they noticed.

The mismatch manifests in several critical ways. First, the pricing models that worked well for traditional applications became prohibitively expensive when applied to AI workloads. Companies running large language models or training sophisticated AI systems are finding their cloud bills skyrocketing, often without proportional business value. I get a call a week from clients alarmed that their cloud bills are 20 times higher than what they expected. They are in a panic now that they are on the hook to get an AI system deployed that might drain IT budgets.

Second, the infrastructure itself isn’t optimized for the intensive, sustained computational demands of AI applications. What works for running a web application or database simply doesn’t cut it for modern AI workloads.

We’re already seeing the consequences. More enterprises are exploring alternative approaches, including private AI infrastructure and hybrid solutions. They’re finding that the promise of simple, scalable AI deployment in the public cloud often comes with hidden complexities and costs that make it difficult to achieve growth.

This isn’t just about technical limitations; it’s about business model adaptation. Public cloud providers need to recognize that AI requires a different approach to infrastructure, pricing, and service delivery. The current model of charging for general compute resources and adding premium fees for AI-specific services isn’t sustainable for most enterprises, and they are moving on to non-cloud alternatives.

The stakes are high. As enterprises continue to invest heavily in AI initiatives, they’ll gravitate toward platforms that can provide predictable performance, reasonable costs, and specialized infrastructure. Currently, that is AI private clouds, traditional on-prem hardware, managed service providers, and the new AI-focused microclouds, such as CoreWeave. Public cloud providers risk losing their position as the default choice for enterprise computing if they can’t adapt quickly enough. Since they are still lashing out at me, I suspect they have yet to get a clue.

What should enterprises do?

In the rapidly evolving landscape of artificial intelligence, enterprises face a pivotal moment. Imagine a company standing at a crossroads, confronted by the limitations of public cloud providers to adequately support their AI ambitions. Recognizing this challenge, savvy leaders are developing strategies to secure their organization’s future.

One approach gaining traction is the hybrid strategy, a dual-path plan that balances the agility of public cloud resources with the control of private infrastructure. Companies leverage public clouds for bursts of experimentation while dedicating specialized AI infrastructure for resource-intensive workloads. Flexibility and efficiency in an approach that has been around for a while.

Cost management is another vital consideration. The finance team should diligently monitor cloud expenses, armed with sophisticated tools that track usage in real time. They’re analyzing the total cost of ownership, uncovering insights about reserved instances and committed-use discounts, carefully picking the most economical options for their predictable AI workloads.

As they delve deeper, these enterprises begin a thorough assessment of their infrastructure needs. They ask the crucial questions: Which workloads truly require cloud scalability? What can run efficiently on dedicated hardware? By investing in specialized AI accelerators, they find the right balance between cost and performance.

Risk mitigation is paramount as well. To prevent vendor lock-in, leaders ensure their applications remain portable, mastering the art of container orchestration. They embrace flexibility in their data architecture, prepared to pivot as needed.

The path forward may be complex, but those who navigate it wisely will position themselves for success in an AI-driven world. It’s a journey to ensure not just survival, but growth and innovation to harness the true power of this AI stuff.

https://www.infoworld.com/article/3815775/public-cloud-providers-are-missing-the-mark-with-ai.html
Linux Foundation warns of US OFAC sanctions Mon, 03 Feb 2025 23:03:44 +0000

Developers must be cautious about who they interact with and where contributions come from, warns a Linux Foundation blog post intended to help open-source developers navigate US OFAC (Office of Foreign Assets Control) sanctions.

Published January 29 and titled, “Navigating Global Regulations and Open Source: US OFAC Sanctions,” the blog states that increased cybersecurity risk and regulatory compliance are creating burdens on open source communities that these communities must meet. While there are sanctions programs in place around the world, many developers will need to be mindful of laws and regulations like OFAC, which restrict or prohibit transactions with certain countries, entities, and individuals, the foundation warned.

Issues involving OFAC sanctions programs and open source are not common, but are important to be aware of, the foundation noted. “With the US and international sanctions targeting technology companies based in Russia, this issue has become a topic in certain open source communities that have participation from entities targeted by such sanctions,” it said. Violating OFAC sanctions can result in serious consequences, including large civil fines and criminal penalties. OFAC sanctions apply not just to financial transactions, but often to almost all interactions with a sanctions target, including those in the open source community spaces.

Further, OFAC sanctions reflect only the US sanctions programs. Many other jurisdictions have similar sanctions programs in place, including the European Union, United Kingdom, Japan, Australia, Switzerland, China, and many more. OFAC-sanctioned countries include comprehensively sanctioned countries such as Russia, Iran, Cuba, and North Korea, as well as other countries subject to OFAC sanctions, including Iraq, Lebanon, Venezuela, and Nicaragua.

OFAC publishes a list of Specially Designated Nationals (SDNs) and provides an OFAC SDN List search tool. Using the search tool, a user can check whether an organization is on the OFAC SDN List. Some sanctions apply to entire countries (e.g., Iran), regions (e.g., the Crimea region of Ukraine), or governments (e.g., the government of Venezuela), and those countries, regions, and governments are not on the SDN List. The OFAC SDN List and search tool are also not exhaustive, and any analysis cannot rely solely on this list, the foundation said.

The foundation stressed its commitment to open source and global collaboration and doing so responsibly while complying with laws and regulations where the foundation and community members operate. “It is disappointing that the open source community cannot operate independently of international sanctions programs, but these sanctions are the law of each country and are not optional,” the foundation said.

Several Russian maintainers of the Linux kernel were removed last fall because of various compliance requirements.

https://www.infoworld.com/article/3815771/linux-foundation-warns-of-us-ofac-sanctions.html
EU supports AI challenge to Silicon Valley and China Mon, 03 Feb 2025 15:49:30 +0000

The European Union is backing a major AI initiative, OpenEuroLLM, in a bid to counter the dominance of Silicon Valley and China in artificial intelligence.

The initiative is being led by Jan Hajič, a computational linguist at Charles University in Czechia, and Peter Sarlin, co-founder of Silo AI.

“The models that OpenEuroLLM will build will be fully open, to exploit the advantages of Open Source that have been so successful in software development in the past,” Hajič told InfoWorld. This approach will allow the models to comply with EU regulations more easily, he said, adding that so-called open models from outside the EU are either only partially open (for example, just the model weights), or there is no transparency about their provenance, making it impossible to be sure whether they are unbiased.

Sarlin said the consortium’s focus is on enabling European companies to innovate while maintaining control of the AI technology they use.

“Many European businesses are already experimenting with AI, but with proprietary models, they don’t own the technology they build upon. OpenEuroLLM provides the foundation for companies to develop AI solutions they truly own and control, while working within European frameworks for responsible AI development,” he said. “Hopefully, the project can serve as an inspiration for a regulatory framework that embraces open source.”

The consortium behind the OpenEuroLLM project brings together 20 leading European research institutions, companies, and high-performance computing (HPC) centers to develop multilingual, open-source large language models tailored for European businesses and public services.

The project has received funding from the EU’s Digital Europe Program, and became part of the EU’s Strategic Technologies for Europe Platform (STEP) on Monday.

The EU’s three-year grant to fund the project is small for the moment — around €20.6 million. The consortium has €37.4 million in total funding, and additional contributions in the form of computing resources, according to the European Commission.

The biggest challenge for a project like this is access to computing capacity, Hajič said.

“There is no single place in Europe to have such a large computing power as in the U.S. or China. However, there are several large facilities available in the EuroHPC network which we will use. There is more computing power coming, such as in the AI Factories program funded by the EU. There are technical challenges related to distributed computing, especially for LLM training, but we are confident we can do it,” he said.

Another challenge is making multilingual LLMs that can serve all EU member states equally.

“We want to have comparable quality for all European languages, large or small, which is not the case today due to the scarcity of data in so-called low-resource languages. This is also a substantial challenge,” he said. “We are building on the expertise of all the partners — academic as well as companies — acquired in previous smaller projects.”

The EU has 24 official languages, several of them spoken by less than 1% of the EU’s population of around 450 million.

In addition to Silo AI, the private AI lab acquired last year by AMD for $665 million, four other companies are involved in the project: Aleph Alpha and Ellamind of Germany, Spain’s Prompsit Language Engineering, and LightOn of France, which recently became Europe’s first publicly traded generative AI company.

“The consortium aims to develop open-source multilingual and high-performance language models that will support the competitiveness and digital sovereignty of Europe,” LightOn said in a statement.

“Europe has the talent and resources necessary to take a leading position in this global AI competition,” Laurent Daudet, co-CEO of LightOn said in the statement. “To transform these efforts into a real strategic lever, Europe must not only capitalize on the AI Act, a true catalyst for innovation towards trustworthy AI, but also support a coordinated approach from its leaders. This is now possible thanks to the OpenEuroLLM consortium.”

Competing in a shifting AI landscape

The rise of China’s DeepSeek, with an open-source AI reasoning model that has disrupted the global AI market, has intensified the urgency for Europe to develop its own competitive AI ecosystem.

OpenEuroLLM seeks to address this challenge by developing AI models that are transparent and allow for community involvement. The initiative aims to ensure that AI models, software, and datasets remain fully open, enabling businesses to customize them according to their industry-specific needs while preserving linguistic and cultural diversity.

Europe’s AI future: a high-stakes battle

The EU’s AI ambitions come at a time when American and Chinese companies are racing to establish dominance in generative AI. While OpenAI, Google, and Anthropic lead AI development in the US, China’s DeepSeek has gained attention for its cost-effective and high-performance models.

With OpenEuroLLM, European leaders are betting that an open, collaborative approach can provide businesses with AI solutions tailored to their needs while ensuring Europe remains competitive in the AI race.

“This project is clearly contributing to Europe’s digital sovereignty. The AI regulations and standards are already in force, and we believe this project will test them to the limits,” said Hajič.

But even when the project runs up against those limits, that’s not the end of the story, he said. “If we see that there are rules that block us from bringing affordable, high quality models to the market, we will certainly work with the European Commission on amending them.”

https://www.infoworld.com/article/3815484/eu-supports-ai-challenge-to-silicon-valley-and-china.html
Download the Agentic AI Enterprise Spotlight Mon, 03 Feb 2025 15:00:00 +0000

Download the February 2025 issue of the Enterprise Spotlight from the editors of CIO, Computerworld, CSO, InfoWorld, and Network World.

https://us.resources.infoworld.com/resources/download-the-agentic-ai-enterprise-spotlight/
Review: Solver can code that for you Mon, 03 Feb 2025 09:00:00 +0000

In the area of programming automation, sometimes it’s better to fix a software problem by first analyzing the whole repository rather than guessing which files are relevant to the problem. That’s particularly relevant when you are managing your project on GitHub or one of its competitors, because problem tickets can be part of the repo.

If you can automate the entire fix, even better. Essentially, wholesale is better than retail. Why not use AI to fix entire problems rather than limiting it to completing a line of code?

If you’re intimately familiar with your code and trying to solve a bug yourself, then you might be able to jump right to the offending methods. But that’s the exception, rather than the rule, and being the human expert doesn’t usually scale.

There are currently at least three commercial products addressing repo-level automated software engineering using large language models: Solver, GitHub Copilot Workspace, and Amazon Q Developer Agent. There are also several research projects in the space, for example OpenHands.

SWE-bench tests and ranks the efficacy of such efforts. Solver is currently (as of this writing) ranked #16 on the SWE-bench Verified leaderboard, with 50% of the benchmark problems resolved. As you can see yourself by viewing the leaderboard, rankings can change on a daily basis as new submissions come in.

What is Solver?

The people behind Solver describe it as a “cloud-native API that transforms software coding from a manual craft into a scalable utility. Unlike supervised AI coding assistants that offer inline suggestions focused on code completion, Solver empowers developers to offload time-consuming and tedious programming tasks wholesale.” They also talk a lot about “elastic engineering” and “repository-based reasoning.”

The API won’t be available to customers until later this year, but you can request early access at the website. The site looks nearly empty when you start, but it quickly fills up when you have Solver working on a problem. You can work on your own repos, or work on three demo repos supplied by the company. Suggested tasks are supplied for the demo repos, including bug fixes, security scans and fixes, functionality and speed improvements, refactoring, improving tests and documentation, and doing maintenance.

[Screenshot: Solver 00 (IDG)]

Solver example

I tried asking Solver to fix an example bug in the requests repo. Here’s the transcript. The TL;DR version of this section is that Solver does go off on tangents (like every AI code generation model I know of) but can recover from them, at least in Python, which is currently the only language in which it can run test code.

My query:

Requests that include multiple path separators (e.g. //) are now causing failures. This may be related to Authentication.

Solver:

Found 8 relevant files

src/requests/sessions.py

src/requests/adapters.py

src/requests/models.py

tests/test_adapters.py

tests/test_requests.py

src/requests/auth.py

tests/test_utils.py

src/requests/utils.py

Let’s first examine the sessions.py file since it appears most relevant to handling URLs with multiple path separators and authentication.

Reading src/requests/sessions.py

Let’s check how authentication stripping is handled since this seems to be related to both path separators and authentication. Let’s look at the should_strip_auth method.

Searching for should_strip_auth

Let’s examine the should_strip_auth method to understand how it handles URL parsing and authentication stripping.

Reading src/requests/sessions.py

Let’s look at how URLs are actually prepared, since the issue seems to be with path separators. Let’s check the PreparedRequest class in models.py.

Reading src/requests/models.py

Let’s see the rest of the prepare_url method since it’s responsible for URL preparation and might be where the path separator issue is occurring.

Reading src/requests/models.py

Let’s check how parse_url and requote_uri functions work since they’re key to URL processing. They should be in utils.py.

Searching for def parse_url

Let’s check where parse_url is imported from, as it might be in a different module.

Searching repo for parse_url

Let’s look at the URL handling in utils.py around line 1004 where there seems to be special handling for netloc and path.

Reading src/requests/utils.py

Let’s check the requote_uri function since it’s used in the final URL preparation and might be related to how path separators are handled.

Searching for def requote_uri

Let’s look at the requote_uri function to see how it handles URL encoding.

Reading src/requests/utils.py

Looking at the code, I believe I understand the issue. When URLs contain multiple path separators (//), they’re being handled in multiple places:

  1. In sessions.py, there’s special handling for URLs starting with // (scheme-relative URLs)
  2. In models.py’s prepare_url, the URL is parsed and reconstructed
  3. In utils.py, requote_uri handles URL encoding but preserves forward slashes as safe characters

The issue likely occurs when authentication is present and the URL contains multiple slashes. Let’s check how the URL is initially processed in models.py’s prepare_url method.

Reading src/requests/models.py

I believe I’ve identified the issue. The problem occurs in the URL preparation process when there are multiple path separators and authentication is present. Here’s what’s happening:

  1. When a URL contains multiple slashes (e.g. http://example.com//path) and authentication is present, the URL parsing and reconstruction process in prepare_url is not properly preserving the multiple slashes.
  2. The key issue is in the URL reconstruction process where urlunparse is used with the parsed components, followed by requote_uri. While requote_uri preserves single slashes (they’re in the safe characters list), the initial parsing and reconstruction may be normalizing multiple slashes into single ones.

Me:

Finally! This sounds promising.

Solver:

Let’s check if there’s a test case for this scenario.

Searching for test_path_with_multiple_slashes

Let’s check for any existing tests that handle URLs with double slashes in general.

Searching for double slash

Based on my analysis, I can now submit a clear explanation of the issue and suggested fix. The problem occurs in the URL preparation process and needs to be addressed in the prepare_url method of PreparedRequest.

Solver finished

At this point I pressed the continue solving button to get Solver to continue along these lines.

Me:

You’re on the right track; please continue.

Solver:

Let’s add a test case first to verify the issue with multiple slashes and authentication.

Let’s check the existing test file first to see where we should add our test.

Reading tests/test_requests.py

Let’s add our test case to the TestRequests class. First, let’s find a good location near other URL-related tests.

Searching for test_path

Perfect, let’s add our test case right after the test_path_is_not_double_encoded test.

[Screenshot: Solver 01 (IDG)]

Now let’s modify the prepare_url method in models.py to preserve multiple slashes. The issue is likely in how the path is handled before being passed to urlunparse.

[Screenshot: Solver 02 (IDG)]

Note that a null byte is a suspect choice of marker.

Solver:

Let’s run the tests to verify our changes fix the issue. We’ll use pytest to run just our new test case.


$ python -m pytest tests/test_requests.py -k test_path_with_multiple_slashes -v
============================= test session starts ==============================
platform linux -- Python 3.9.20, pytest-8.3.3, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /repo
configfile: pyproject.toml
plugins: cov-5.0.0, httpbin-2.1.0
collecting ... collected 330 items / 329 deselected / 1 selected

tests/test_requests.py::TestRequests::test_path_with_multiple_slashes FAILED [100%]

=================================== FAILURES ===================================
_________________ TestRequests.test_path_with_multiple_slashes _________________

self = 

    def test_path_with_multiple_slashes(self):
        """Test that URLs with multiple slashes are preserved."""
        url = "http://example.com//path//to//resource"
        request = requests.Request("GET", url).prepare()
>       assert request.url == url
E       AssertionError: assert 'http://examp...to%00resource' == 'http://examp.../to//resource'
E         
E         - http://example.com//path//to//resource
E         ?                    ^    ^^  ^^
E         + http://example.com/%00path%00to%00resource
E         ?                    ^^^    ^^^  ^^^

tests/test_requests.py:151: AssertionError
=========================== short test summary info ============================
FAILED tests/test_requests.py::TestRequests::test_path_with_multiple_slashes
====================== 1 failed, 329 deselected in 0.75s =======================
Exit Code: 1

The test failure shows that our null byte replacement approach isn’t working correctly – the null bytes are getting URL encoded. Let’s try a different approach using a more URL-safe temporary marker.

[Screenshot: Solver 03 (IDG)]

We need to add the re import at the top of the file.

Let’s check the current imports first.

Reading src/requests/models.py

I see we can add re to the existing imports. Let’s add it with the other standard library imports at the top.

[Screenshot: Solver 04 (IDG)]

Now let’s run the test again to see if our new approach works.


$ python -m pytest tests/test_requests.py -k test_path_with_multiple_slashes -v
============================= test session starts ==============================
platform linux -- Python 3.9.20, pytest-8.3.3, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /repo
configfile: pyproject.toml
plugins: httpbin-2.1.0, cov-5.0.0
collecting ... collected 330 items / 329 deselected / 1 selected

tests/test_requests.py::TestRequests::test_path_with_multiple_slashes FAILED [100%]

=================================== FAILURES ===================================
_________________ TestRequests.test_path_with_multiple_slashes _________________

self = 

    def test_path_with_multiple_slashes(self):
        """Test that URLs with multiple slashes are preserved."""
        url = "http://example.com//path//to//resource"
>       request = requests.Request("GET", url).prepare()

tests/test_requests.py:150: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/requests/models.py:299: in prepare
    p.prepare(
src/requests/models.py:368: in prepare
    self.prepare_url(url, params)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = , url = 'http://example.com/path/to/resource'
params = {}

    def prepare_url(self, url, params):
        """Prepares the given HTTP URL."""
        #: Accept objects that have string representations.
        #: We're unable to blindly call unicode/str functions
        #: as this will include the bytestring indicator (b'')
        #: on python 3.x.
        #: https://github.com/psf/requests/pull/2238
        if isinstance(url, bytes):
            url = url.decode("utf8")
        else:
            url = str(url)
    
        # Remove leading whitespaces from url
        url = url.lstrip()
    
        # Don't do any URL preparation for non-HTTP schemes like `mailto`,
        # `data` etc to work around exceptions from `url_parse`, which
        # handles RFC 3986 only.
        if ":" in url and not url.lower().startswith("http"):
            self.url = url
            return
    
        # Support for unicode domain names and paths.
        try:
            scheme, auth, host, port, path, query, fragment = parse_url(url)
        except LocationParseError as e:
            raise InvalidURL(*e.args)
    
        if not scheme:
            raise MissingSchema(
                f"Invalid URL {url!r}: No scheme supplied. "
                f"Perhaps you meant https://{url}?"
            )
    
        if not host:
            raise InvalidURL(f"Invalid URL {url!r}: No host supplied")
    
        # In general, we want to try IDNA encoding the hostname if the string contains
        # non-ASCII characters. This allows users to automatically get the correct IDNA
        # behaviour. For strings containing only ASCII characters, we need to also verify
        # it doesn't start with a wildcard (*), before allowing the unencoded hostname.
        if not unicode_is_ascii(host):
            try:
                host = self._get_idna_encoded_host(host)
            except UnicodeError:
                raise InvalidURL("URL has an invalid label.")
        elif host.startswith(("*", ".")):
            raise InvalidURL("URL has an invalid label.")
    
        # Carefully reconstruct the network location
        netloc = auth or ""
        if netloc:
            netloc += "@"
        netloc += host
        if port:
            netloc += f":{port}"
    
        # Bare domains aren't valid URLs.
        if not path:
            path = "/"
        else:
            # Preserve multiple consecutive slashes in the path
            original_path = path
            path = re.sub("//+", "/", path)  # Normalize to single slashes for parsing
    
        if isinstance(params, (str, bytes)):
            params = to_native_string(params)
    
        enc_params = self._encode_params(params)
        if enc_params:
            if query:
                query = f"{query}&{enc_params}"
            else:
                query = enc_params
    
        url = requote_uri(urlunparse([scheme, netloc, path, None, query, fragment]))
    
        # Restore multiple slashes if they were present in the original path
        if path != "/" and "original_path" in locals():
            # Find the path portion in the final URL and replace it
>           url_parts = list(urlparse(url))
E           NameError: name 'urlparse' is not defined

src/requests/models.py:490: NameError
=========================== short test summary info ============================
FAILED tests/test_requests.py::TestRequests::test_path_with_multiple_slashes
====================== 1 failed, 329 deselected in 0.55s =======================
Exit Code: 1

Ah, we need to use the imported urlparse. Let’s fix the imports and try a simpler approach.

[Screenshot: Solver 05 (IDG)]

Let’s run the test again to see if this simpler approach works.


$ python -m pytest tests/test_requests.py -k test_path_with_multiple_slashes -v
============================= test session starts ==============================
platform linux -- Python 3.9.20, pytest-8.3.3, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /repo
configfile: pyproject.toml
plugins: httpbin-2.1.0, cov-5.0.0
collecting ... collected 330 items / 329 deselected / 1 selected

tests/test_requests.py::TestRequests::test_path_with_multiple_slashes PASSED [100%]

====================== 1 passed, 329 deselected in 0.62s =======================
Exit Code: 0

The test passes. The simpler solution works because we’re letting the URL components pass through without any normalization. Let’s submit our changes which include:

  1. Added a test case to verify multiple slashes are preserved in URLs with and without authentication
  2. Modified prepare_url to preserve the exact path structure without normalization
  3. Added a comment explaining the preservation of multiple slashes

Solver finished
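
For anyone who wants to sanity-check the result outside of Solver, the behavior the new test pins down boils down to a few lines of Python. This is a minimal sketch mirroring the generated test case (the URL is the same illustrative one Solver used), run against the patched copy of the requests repo:

import requests

# URLs with repeated path separators should survive preparation unchanged.
url = "http://example.com//path//to//resource"
prepared = requests.Request("GET", url).prepare()
assert prepared.url == url, prepared.url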

Discussion with a founder

Somewhere in the middle of the session shown in the previous section, you’ll see that I pressed the continue solving button and told Solver, “You’re on the right track; please continue.”

I wondered why Solver had stopped here to wait for user input, so I sent in an issue report and asked, “Why does Solver stop after identifying the problem? Yes, it did continue when I pressed the appropriate button. This is a question, not a bug report.”

Surprisingly quickly, I got a considered response from Mark Gabel, one of the founders of Solver:

Hi, Martin! It’s great to hear from you and I’m glad you’ve had the chance to start playing with Solver.

Your question was:

Why does Solver stop after identifying the problem? Yes, it did continue when I pressed the appropriate button. This is a question, not a bug report.

This is a very insightful thing to point out — most users would have just hit the “please continue” and forgotten about it.

You’ve actually hit on a point that’s relevant to all kinds of AI agents, Solver included: Just how “trigger happy” should it be? By “trigger happy,” I mean its willingness to go ahead and actually do work — making code changes, etc.

It’s something we have to be careful with and there tends to be a delicate balance. If we train Solver to be too trigger happy, then it may run off on its own and make dozens of code changes based on a hunch or a guess, even given a very ambiguous task.

However, if we overcorrect to the opposite, it becomes way too shy, often not wanting to do anything on its own without checking with the developer for permission and confirmation of the right approach. And that’s really at odds with our vision of “elastic engineering,” in which you’re truly delegating tasks.

It’s something we tweak constantly, and you actually caught a case where Solver was probably a bit too shy.

One thing I’ve proposed — but we’re debating because it’s somewhat complex — is actually giving the developer control over this via some kind of slider or knob. Imagine the “lowest” setting being, basically, “Be extra cautious; check with me before you do anything; and ask lots of questions” and the “highest” setting being equivalent to “Just go for it; take a reasonable guess at any information you’re missing.” And the values you could adjust to in between would be blends of the two.

Thank you again for pointing out this issue and being thoughtful about it. And please don’t hesitate to reach out again — every email you send or issue you report is 100% visible to my team.

Best,
Mark

Automating software development tasks

While it’s not yet finished — I expect integrations with popular IDEs sooner rather than later — Solver demonstrably raises the bar on automated software engineering. It has competition, but it’s worth testing now, especially if your projects are in Python, the only language in which it can currently run code by itself.

]]>
https://www.infoworld.com/article/3617560/review-solver-can-code-that-for-you.html 3617560Artificial Intelligence, Development Tools, Generative AI, Software Development
Tetragon: Extending eBPF and Cilium to runtime security Mon, 03 Feb 2025 09:00:00 +0000

If you’re responsible for security, you’d better be good at context switching. Enterprise infrastructure has never been more distributed, and the software stack has never had more moving parts. The average application today has more than 150 dependencies, according to Sonatype, and major languages like Java release updates up to 10 times per year. 

No surprise, then, that modern security threats are increasingly of the software supply chain variety, entering through the most vulnerable components in our stacks. And troubleshooting scenarios have become exceptionally complicated. 

From determining the entry point of the threat (a backdoor in versions 5.6.0 and 5.6.1 in the case of XZ Utils), to discovering where in your environment you are running compromised software, to pinpointing exactly which files were touched or exfiltrated by malicious actors — chasing all of this down means unpacking a Russian doll of different domains.

Runtime security is missing context

A decade since the first commit was made to Kubernetes, security teams and developers are still adapting to the shift from virtual machines to containers. Not only did Kubernetes’ labels and pods (bundles of containers) break traditional IP-centric security approaches, but questions that were straightforward with virtual machines — where services were running, for how long, where they ingressed, where they egressed, and what else they touched along the way — became murky in container environments.

In the past, a systems administrator would log in over SSH and walk through a well-established, scripted procedure to run these interrogations. Today’s cloud-native environments — huge pools of Kubernetes nodes running interconnected microservices at scale, with pods coming and going at high frequency — have outgrown any sort of scripted procedure or established mental model. The lifetimes of pods can vary widely, and when terminated they push IP addresses back into the pool.

Not only are the locations of services difficult to pin down in a Kubernetes cluster, but the abstraction layers that make these distributed systems possible — between namespaces, pods, containers, kernels, and network — create blind spots for security pros who need to investigate them at run time and block threats with policies.

In order to answer security questions in containerized environments, you need much more context, at a much lower level. The place to get that context is the Linux kernel. 

If you think about the Linux kernel as an API, opening any file or socket is like making an API call. The granularity of security data that is emitted to and from the kernel makes it the perfect collection point. Further, when a new packet comes into your server, it needs to be processed — routed, broken down, distributed, and brought into user space. That makes the kernel the optimal point in the data path to enforce security policies. And we can do that using Tetragon.

Building on Cilium and eBPF

Created by Isovalent, the creators of Cilium, Tetragon is a Kubernetes-aware security observability and runtime enforcement tool that takes advantage of Cilium’s networking, security, and observability capabilities, which in turn rely on eBPF’s hooks into the Linux kernel. Like Cilium, Tetragon is free and open source, available under the Apache 2.0 license.

One of the first widely adopted eBPF projects, Cilium has become the de facto container network interface (CNI) for Kubernetes environments, chosen by all three major public cloud providers for their managed Kubernetes services. Cilium was the first graduated project in the CNCF’s CNI category, and it’s one of the three most-contributed-to cloud-native open source projects, along with Kubernetes and OpenTelemetry. 

Cilium uses eBPF to provide advanced networking capabilities including multi-cluster and multi-cloud networking, advanced monitoring capabilities including network, service, and security observability, and extensive network security capabilities including traffic encryption, network policy enforcement, and runtime enforcement. Tetragon is the subproject that focuses on runtime enforcement, drawing on eBPF’s hooks into the Linux kernel’s data path to give runtime security the missing context and policy enforcement primitives that never shipped with the Kubernetes operating model. 

Detection and enforcement in the Linux kernel

Tetragon runs on any Linux machine and uses eBPF for observability and enforcement. It focuses on “security-significant” events, such as process execution, system call activity, and high-volume I/O, including network and file access. By sitting in the kernel with eBPF, Tetragon can truly observe everything.

What sets Tetragon apart is its intelligent in-kernel filtering and aggregation capabilities. Rather than sending all events to user space for processing, Tetragon leverages eBPF to perform sophisticated filtering directly in the kernel. This approach enables real-time threat detection and policy enforcement with minimal impact on system performance.

Because of its lineage with parent-project Cilium, Tetragon brings powerful network-aware capabilities. It can correlate process-level activities with network flows, showing which specific processes within containers are establishing connections or attempting suspicious network activities. This network awareness extends across clusters and environments, with deep context about the originating processes and binaries.

Tetragon excels in use cases requiring deep security observability, such as detecting unauthorized process executions, monitoring sensitive file access, tracking privilege escalations, and identifying suspicious network patterns. 

From an enforcement perspective, Tetragon enables real-time policy controls over system calls, file operations, network communications, and process behaviors — all defined through Kubernetes-native policies. This combination of deep visibility and granular control makes Tetragon particularly valuable for implementing zero-trust security postures in cloud-native environments.

A new era of network observability and security

Most engineers tend to think of “system calls” when they think of observability in the Linux kernel. But when operating at the kernel level, we can look at file access, at specific namespaces or containers, and then tie them with identity metadata. We can also look much more closely at networking events. 

With the hooks of eBPF in the kernel and extension into the network, Tetragon promises much richer remediation workflows than are now possible. Traditional tools wait too long to act—they observe something happening in the kernel, move it into user space, and then decide whether to act on it or send an alert. Milliseconds matter. With Tetragon, policies are built to be enforced in real time within the kernel so malicious events never execute. Instead of just reacting to the security events we see over the network, with Tetragon we can actually block an action with a SIGKILL or override a value so it never executes.

We’re at an interesting crossroads in security and networking today. Personas that were once completely separate—network administrators, virtual machine administrators, Kubernetes platform engineers, cloud engineers working on specific public cloud platforms—are melding into masters-of-all “platform engineering” teams. 

Similarly, we’re seeing the boundaries between layers of the enterprise stack starting to relax, and seeking common operating models for simplicity. With Tetragon, the rich contextualization of security events, and ability to enforce against them, across not just user space but the kernel and the network layer, will make powerful runtime security a generalized skill that platform engineers and developers alike can tap into, to stay ahead of the software supply chain threat domain. 

Jeremy Colvin is a senior engineer at Isovalent. Jeremy’s passion is digging into the bits and bytes of what makes good security. As one of the world’s leading experts on the eBPF-based Tetragon project, Jeremy works with security engineers to create defense strategies for new threats like the XZ Utils backdoor, which require advanced contextualization across Kubernetes identities, network infrastructure, and namespaces. Jeremy graduated from Princeton, focusing on policy around privacy and information security, and has a master’s degree in information security from UC Berkeley. Outside of Isovalent, Jeremy enjoys playing soccer and volunteering with Best Buddies.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

]]>
https://www.infoworld.com/article/3810607/tetragon-extending-ebpf-and-cilium-to-runtime-security.html 3810607Application Security, Containers, Devops, DevSecOps, Kubernetes, Network Security
The DeepSeek lesson Mon, 03 Feb 2025 09:00:00 +0000

During the past two weeks, DeepSeek unraveled Silicon Valley’s comfortable narrative about generative artificial intelligence by introducing dramatically more efficient ways to scale large language models (LLMs). Without billions in venture capital to spend on Nvidia GPUs, the DeepSeek team had to be more resourceful and learned how to “activate only the most relevant portions of their model for each query,” as Reflexivity president Giuseppe Sette notes.

It didn’t take long for everyone to start interpreting DeepSeek’s feat through the lens of their own biases. Closed-model vendors cried foul over theft of training data (given how much of their own training data was lifted from others, the irony police were out in full force), while open sourcerors saw DeepSeek as a natural fulfillment of open source superiority (despite the fact that there is no correlation between being open and winning in tech).

Lost in all this confirmation bias were two big developments, one positive and the other quite negative. First, AI need no longer be dominated by a billionaire’s club. DeepSeek didn’t democratize AI, exactly, but it has shown that AI entry costs needn’t require seed rounds in the hundreds of billions. Second, although there’s no reason to think open approaches to AI will win, there’s every reason to think that OpenAI’s hyper-closed approach will most definitely lose because it’s customer-unobsessed. Winning in AI won’t be about open versus closed, but rather about customer trust.

‘Techno-feudalism on steroids’

I don’t have anything to add to the financial implications of DeepSeek’s approach. As DeepLearning.AI founder Andrew Ng points out, “LLM token prices have been falling rapidly, and open weights have contributed to this trend and given developers more choice.” DeepSeek, by optimizing how it handles compute and memory, takes this to the next level: “OpenAI’s o1 costs $60 per million output tokens; DeepSeek R1 costs $2.19.” As he concludes, the expectation is that “humanity [and developers] will use more intelligence…as it gets cheaper.”

But who will build the tools to access that AI-driven intelligence? Here’s where things get interesting.

Although it’s fun to eviscerate OpenAI and others for finger-pointing over stolen training data, given these LLM vendors’ propensity to “borrow” copious quantities of others’ data to train their own models, there’s something far more troubling at play. As Me & Qi cofounder Arnaud Bertrand argues, “The far more worrying aspect here is that OpenAI is suggesting that there are some cases in which they own the output of their model.” This is “techno-feudalism on steroids,” he warns: a world in which LLM owners can claim ownership of “every piece of content touched by AI.”

This isn’t open source versus closed source. Closed source software doesn’t try to take ownership of the data it touches. This is something more. OpenAI, for example, is clear(ish) that users own the outputs of their prompts, but that users can’t use those outputs to train a competing model. That would violate OpenAI’s terms and conditions. This isn’t really different from Meta’s Llama being open to use—unless you’re competing at scale.

And yet, it is different. OpenAI seems to be suggesting that its input (training) data should be open and unfettered, but the data others use (including data that competitive LLMs have recycled from OpenAI) can be closed. This is muddy, murky new ground, and it doesn’t bode well for adoption if enterprise customers have to worry—even a little bit—about their output data being owned by the model vendors. The heart of the issue is trust and customer control, not open source versus closed source.

Exacerbating enterprise mistrust

RedMonk cofounder Steve O’Grady nicely sums up enterprise concern with AI: “Enterprises recognize that to maximize the benefit from AI, they need to be able to grant access to their own internal data.” However, they’ve been “unwilling to do this at scale” because they don’t trust the LLM vendors with their data. OpenAI has exacerbated this mistrust. The vendors that will end up winning will be those that earn customers’ trust. Open source can help with this, but ultimately enterprises don’t care about the license; they care about how the vendor deals with their data. This is just one of the reasons AWS and Microsoft were first to build booming cloud businesses. Enterprises trusted them to take care of their sensitive data.

In this early rush for gold in the AI market, we’ve become so fixated on the foundational models that we’ve forgotten that the biggest market has yet to emerge, and trust will be central to winning it. Tim O’Reilly is, as ever, spot on when he calls out the “AI company leaders and their investors” for being “too fixed on the pursuit or preservation of monopoly power and the outsized returns that come with it.” They forget that “most great companies actually come after a period of experimentation and market expansion, not through lock-in at the beginning.” The AI companies are trying to optimize for profit too soon in the market’s evolution. Efforts to control model output will tend to constrain customer adoption, not expand it.

In sum, AI vendors that want to win need to think carefully about how they can establish trust in a market that has moved too quickly for enterprise buyers to feel secure. Grasping statements, such as OpenAI’s, about model output data don’t help.

]]>
https://www.infoworld.com/article/3814855/the-deepseek-lesson.html 3814855Artificial Intelligence, Cloud Computing, Generative AI, Open Source, Software Development
JavaScript Temporal to ease dates and times Fri, 31 Jan 2025 23:10:21 +0000

Web developers will soon find it easier to work with dates and times in JavaScript, thanks to emerging browser support for JavaScript’s Temporal object, according to Mozilla.

Implementations of the new Temporal object have begun shipping in experimental releases of browsers, said Brian Smith, staff technical writer on the MDN Web Docs team at Mozilla, in a blog post on January 23. Applications relying on scheduling, time-sensitive data, or internationalization can use Temporal for precise, consistent dates, times, calendars, and durations.

However, stable, cross-browser support for Temporal has not been reached yet, and changes may be made as implementations develop, Smith said. Mozilla’s Firefox browser appears to have the most mature implementation at this juncture, with support built into the Nightly version behind the javascript.options.experimental.temporal preference, he said. The main browser bugs tracking Temporal implementations cover Firefox, Safari, and Chrome.

JavaScript Temporal was designed as a full replacement for the JavaScript Date object, to make date and time management reliable and predictable, Smith said. The Date object supports only the user’s local time and UTC, with no support for other time zones. It also has unreliable parsing, and because Date is mutable, it can produce hard-to-trace bugs, according to Smith. Date also has problems across daylight saving time transitions and historical calendar changes.

By contrast, Temporal offers support for time zone and calendar representations and many built-in methods for conversions, comparisons, computations, formatting, and other capabilities. Key concepts in Temporal include unique points in history, wall clock times, and durations. With experimental implementations landing, it is a good time to try out Temporal and become familiar with it, said Smith. MDN has online documentation on Temporal.

]]>
https://www.infoworld.com/article/3814766/javascript-temporal-to-ease-dates-and-times.html 3814766JavaScript, Programming Languages, Software Development, Web Development
Plunge into Python: New tools and tips for Python developers Fri, 31 Jan 2025 09:00:00 +0000

Create luscious-looking full-stack websites in nothing but Python, get a hands-on lesson in integrating Rust and Python, and learn everything you need to know about the care and feeding of Python virtual environments. And, if you’re a noob looking for an absolute beginner’s guide to Python, check out the third revised edition of a now-classic text. (Actually, you might want to check it out even if you aren’t a noob.)

Top picks for Python readers on InfoWorld

3 Python web frameworks for beautiful front ends
Build gorgeous sites in nothing but Python with Anvil, Reflex, and NiceGUI—that’s right, no JavaScript required, not unless you want it.

Plunge into Python with ‘Python Crash Course 3rd Edition’
One of the best books for getting up to speed with Python is better than ever, with segments on data analysis, web development, and even video games.

Python virtualenv and venv dos and don’ts
Virtual environments keep Python projects under control, but what about keeping the virtual environments under control? Here’s what you need to know.

How to use Rust with Python, and Python with Rust
Two of the most popular languages for developers can work hand-in-hand if you know how to get them talking. Here’s how to combine Python’s development speed with Rust’s speedy execution.

More good reads and Python updates elsewhere

On this day in history …
Did you know it was 31 years ago, on January 27, 1994, that Python 1.0 was released? You can still read the original release notes from Guido van Rossum.

Codegen: A Python library for manipulating code bases
Codegen lets you write Python code that treats other codebases as trees of objects, not text, making refactoring and other major code revisions much easier.

A new static type checker for Python, written in Rust?
Details are slim so far, but Astral, the folks behind Python’s uv package manager, say they’re working on yet another potentially groundbreaking Python tool.

How to visualize your Python project’s dependency graph
Need to get your project dependencies under control? The tach utility breaks them out in a handy visual format.

]]>
https://www.infoworld.com/article/3813685/plunge-into-python-new-tools-and-tips-for-python-developers.html 3813685Programming Languages, Python, Software Development
The quantum computing reality check Fri, 31 Jan 2025 09:00:00 +0000

In late 2024, a major pharmaceutical company invested $50 million in quantum computing initiatives through a leading cloud provider, hoping to revolutionize their drug discovery process. Six months later, they quietly shifted their resources to traditional high-performance computing and AI-driven solutions. This story isn’t unique; it represents a growing pattern of enterprises learning the hard way that most of quantum computing’s promises remain theoretical despite the aggressive marketing of quantum computing as a service (QaaS).

In some circumstances, quantum computing is a massive waste of money and a huge distraction. I suspect I’ll get some hate for this statement, even from companies and people I consider friends. But QaaS is taking priority over other investments during this push, which has been underway for decades but has yet to return much value. We must stop falling into these hype traps that cause us to lose money and progress.

I know a few of you will remember my post when I was skeptical but more optimistic about the potential for quantum computing in the cloud. Since then, I’ve done many hours of research and worked on enterprise projects that have evaluated or are using quantum computing. I now have a less optimistic take on QaaS, so I felt compelled to provide an updated view of this technology. I’m moving from a stance of “wait for it!” to “perhaps it’s time we moved on.”

The reality of quantum

As we enter 2025, it’s time for a frank discussion about quantum computing’s place in enterprise technology strategies. The allure is undeniable: QaaS promises to solve complex problems in minutes where traditional computers would take a millennium. Yet, the reality is far more nuanced and disappointing for those who seek immediate business value.

Major cloud providers have made quantum computing accessible through their platforms, which creates an illusion of readiness for enterprise adoption. However, this accessibility masks a fatal flaw: Most quantum computing applications remain experimental. Indeed, most require deep expertise in quantum physics and specialized programming knowledge. Real-world applications are severely limited, and the costs are astronomical compared to the actual value delivered.

Consider the experience of a global financial services firm that spent $20 million developing quantum algorithms for portfolio optimization. After 18 months of development, they discovered that enhanced classical algorithms combined with existing AI solutions could achieve comparable results at a fraction of the cost. This illustrates a crucial point: The opportunity cost of investing in quantum computing often outweighs the potential benefits.

Business ROI is a moving goalpost

The timeline to practical quantum computing applications is another sobering reality. Industry experts suggest we’re still 7 to 15 years away from quantum systems capable of handling production workloads. This extended horizon makes it difficult to justify significant investments. Until then, more immediate returns could be realized through existing technologies.

I’ve spent the past 10 years consulting on quantum computing projects, both cloud and non-cloud, where the best path to value was not to use quantum computing at all. In many instances, the client ignored my recommendation and pressed on. In those cases, the objective quietly changed from “finding business value” to “developing an emerging technology,” and I later learned the projects had failed. I always tell enterprises they won’t profit from building their own IT systems around R&D. Let others experiment.

The distraction factor cannot be overlooked. Companies pursuing quantum computing initiatives often divert valuable resources, including financial and human capital, from more pressing technological needs. Quantum computing may eventually deliver on its promises, but organizations must focus on solving today’s business problems with today’s tools.

The industry’s fascination with quantum computing has made companies fear being left behind or, worse, not being part of the “cool kids club”; they want to deliver extraordinary presentations to investors and customers. We tend to jump into new trends too fast because the allure of being part of something exciting and new is just too compelling. I’ve fallen into this trap myself.

“Juice worth the squeeze”

Organizations need to adopt a more pragmatic approach to quantum computing. Instead of making speculative investments in QaaS, companies should focus on understanding how existing technologies can solve their current business challenges. Don’t ignore quantum computing entirely, but do maintain an informed perspective while avoiding costly premature commitments.

The key takeaway isn’t that quantum computing lacks potential; it’s that the timeline for realizing this potential is longer than many vendors and enthusiasts suggest. Organizations must balance their excitement for quantum computing with practical considerations about immediate business value and return on investment. I’m optimistic about the potential value in QaaS. However, I don’t think it will be as much as many expect, and it might end up being just a stepping stone to other technologies with more business value potential.

Throughout 2025, the most successful organizations will resist the premature urge to jump on the quantum bandwagon. Instead, they’ll focus on leveraging proven technologies while maintaining a watchful eye. The future of quantum computing may be bright, but for most organizations, the present demands a more grounded approach to technology investment and innovation.

Remember, not every technological advance needs to be adopted immediately. Sometimes, the wisest decision is to let others pioneer the path while you extract maximum value from existing technologies. In the case of quantum computing, patience is the most profitable strategy.

]]>
https://www.infoworld.com/article/3813740/the-quantum-computing-reality-check.html 3813740Cloud Computing, Emerging Technology, Quantum Computing
TypeScript checks returns for conditional and indexed access types Thu, 30 Jan 2025 23:38:08 +0000

Microsoft has moved TypeScript 5.8 into the beta stage. The new release of the company’s typed superset of JavaScript introduces a limited form of type checking against conditional types in return statements.

The TypeScript 5.8 beta was introduced January 29. To get started with the beta, developers can run the following npm command: npm install -D typescript@beta.

With checked returns for conditional and indexed access types, TypeScript now supports a limited form of checking against conditional types in return statements, helping developers avoid type assertions. The problem with type assertions is that they can defeat legitimate checks that TypeScript would otherwise perform. With TypeScript 5.8, when a function’s return type is a generic conditional type, TypeScript will use control flow analysis for generic parameters whose types are used in the conditional type, instantiate the conditional type with the narrowed type of each parameter, and relate the returned expression against that new type.

Microsoft notes some limits to the feature. Special checking, for instance, happens only when a single parameter is associated with the type being checked against in a conditional type or used as a key in an indexed access type. When using a conditional type, at least two checks must exist, with a terminal branch including never. The parameter’s type has to be generic and be constrained by a union type. Overall, checks will not occur when a specific type is associated with the type parameter.

With TypeScript 5.8, developers will not have to wait long for general availability, according to the release schedule. A release candidate is due February 11 and the final release is due to arrive February 25. TypeScript 5.8 follows the release of TypeScript 5.7 in November, which featured improved error reporting.

Other features planned for TypeScript 5.8 include the following:

  • Optimizations are being introduced to improve the time to build a program as well as update a program based on a file change in either --watch mode or editor scenarios. TypeScript now avoids any array allocations that would be involved while normalizing paths. Also, when edits do not change the fundamental structure of a project, TypeScript avoids re-validating the options provided to it.
  • TypeScript adds support for require() of ECMAScript modules in --module nodenext. When --module nodenext is enabled, TypeScript will avoid issuing errors on these require() calls to ESM files.
  • TypeScript introduces a stable --module node18 flag. Users of Node.js 18 are thus provided with a flag offering a stable point of reference that does not incorporate certain behaviors that are in --module nodenext. Among these behaviors is that require() of ECMAScript modules is disallowed under node18 but allowed under nodenext.
  • TypeScript introduces an --erasableSyntaxOnly flag. When enabled, TypeScript will only allow the use of constructs that can be erased from a file, and will issue an error if any constructs are encountered that cannot be erased.
  • Node.js 22 no longer accepts import assertions using the assert syntax. And when --module nodenext is enabled in TypeScript 5.8, TypeScript issues an error if it encounters an import assertion.

]]>
https://www.infoworld.com/article/3813778/typescript-checks-returns-for-conditional-and-indexed-access-types.html 3813778JavaScript, Programming Languages, Software Development, Typescript
Microsoft’s new DocumentDB rethinks NoSQL on PostgreSQL Thu, 30 Jan 2025 09:00:00 +0000

Microsoft’s recent launch of a standalone version of the MongoDB compatibility layer for its global-scale Azure Cosmos DB brought back an old name. Back in 2018, when the company unveiled a public version of the Project Florence database engine that powers much of Azure, they called it DocumentDB. That original name worked well for some of the database’s personalities, but its support for much more than JSON documents soon led to a new, now more familiar name. Cosmos DB has continued to evolve, with its document database capabilities offering a familiar set of MongoDB-compatible APIs.

A recent set of updates introduced the vCore variant of Azure Cosmos DB, which moves from the multi-tenant, cross-region, transparently scalable resource unit-based Cosmos DB to an alternative architecture that behaves more like traditional Azure services, with defined host virtual machines and a more predictable pricing model. The vCore-based MongoDB APIs are the same as those used with the cloud-scale resource unit version, but the underlying technologies are quite different, and moving from one version to the other requires a complete migration of your data.

Last week Microsoft revealed the differences in the two implementations when it unveiled an open-source release of the vCore Cosmos DB engine. Built on the familiar PostgreSQL platform, the new public project adds NoSQL features with the MongoDB APIs. As it focuses purely on storing JSON content, Microsoft decided to bring back the original DocumentDB name.

The new DocumentDB comes with a permissive MIT license and is intended to provide a standard NoSQL environment for your data to reduce the complexity associated with migrating from one platform to another. Choosing to work with PostgreSQL is part of that, as it has long been a popular platform for developers, one that’s had something of a recent renaissance.

A modern NoSQL database with PostgreSQL roots

By open sourcing a tool that’s already widely used in Azure, Microsoft is giving developers the ability to run something that’s already proven to work well. Most of the features we expect to find in a modern NoSQL store are already there, from basic CRUD (create, read, update, delete) operations to more complex vector search tools and the indexes needed to support them. This ensures you will be able to build on and extend a database that can support most scenarios.

DocumentDB sits on top of the existing PostgreSQL platform, which manages storage, indexing, and other key low-level operations. The result is that DocumentDB is implemented using two components: one to add support for BSON (Binary JavaScript Object Notation) data types and one to support the DocumentDB APIs, adding CRUD operations, queries, and index management.

BSON is the fundamental data type used in MongoDB, with implementations in most common languages. If you’re going to build a common NoSQL store based on MongoDB APIs, then BSON will be the way you represent your standard NoSQL data structures, such as key-value pairs and arrays. It’s easy to build JSON documents, but using BSON allows you to store and search content more effectively.

You can think of DocumentDB as a stack. At the bottom is PostgreSQL itself, then the DocumentDB extension that gives the database the ability to work with BSON data. Once installed it lets you parse BSON data and then use the PostgreSQL engine to build indexes, not only using the database engine’s standard tools but also other extensions. The result is the ability to deliver complex indexes that support all kinds of queries.

One useful feature is the ability to use PostgreSQL’s vector index capabilities to build your BSON data into a retrieval-augmented generation (RAG) application or use nearest-neighbor searches to build recommendation engines or identify fraud patterns. There’s a lot of utility in a NoSQL database with many different indexing options; it gives you the necessary foundations for many different application types—all working on the same data set.

Getting started with DocumentDB

This first public release of DocumentDB inherits code already running in Azure, so it’s ready to build and use, hosted on GitHub. The instructions in the project wiki are focused on using VS Code and Docker to build on top of WSL 2.0, though you can use any Linux via VS Code’s remote engine. You build the container, then make, install, and launch the binaries. The DocumentDB container already holds PostgreSQL, so once setup is complete, you can connect to its shell and start experimenting with BSON support.

From the shell, you can embed API calls in select statements. This allows you to experiment with operations before adding them to calls from your code. The shell lets you build collections, add items, and experiment with CRUD operations. Other operations apply filters and support queries, as well as building indexes across one or more fields in a collection. You can find a lengthy list of documented API functions in the project wiki, grouped into common sets of operations.

For now, the GitHub wiki is the main source of documentation for DocumentDB. It’s a little on the thin side and could do with more examples. However, DocumentDB is currently intended for developers who want an alternative to MongoDB, one that’s available with an open source license rather than a source-available license. For now, as there’s no SDK, you’ll need to build your own calls to the API. These are based on MongoDB, so porting applications shouldn’t be too complex.

Why this? Why now?

The reasoning behind the DocumentDB project seems to be the big ambition to deliver a standard NoSQL API and engine, much like that developed for SQL. Microsoft has a lot of experience working in standards bodies, especially building and delivering the essential tests needed to make sure that any implementation of the resulting standard meets the necessary requirements.

We’ve seen Microsoft deliver extensive test suites for protocols and languages, and we can expect this level of tooling to be a key component of any future NoSQL standard. We need common APIs and engine features to help with application and data portability. A common standard will allow NoSQL stores to compete on performance and other business-essential features such as scalability and resilience.

DocumentDB’s layered approach to delivering basic functionality is perhaps the most important part of what Microsoft is doing here. The blog post announcing DocumentDB talks about “a protocol translation layer” on top of the BSON extension, bridging APIs to the document store in a way that makes it possible to have a single store that looks like MongoDB to one set of clients, Aerospike to another, or CouchDB, Couchbase, and more.

A reference for a NoSQL standard

For DocumentDB to be the foundation of a NoSQL standard, it has to be vendor-neutral. By allowing you to switch protocols on top of the same underlying store, you can use the APIs you’re familiar with, no matter their source. Query engine designers can focus on their area of expertise, while the PostgreSQL team can continue to deliver the resilient, high-performance database necessary for modern applications.

One example of this is the latest release of the open source FerretDB NoSQL database. The latest release, FerretDB 2.0, is built using DocumentDB and is getting a considerable performance increase. The FerretDB team can continue to work on its own features, taking advantage of the open source DocumentDB to provide the core BSON support necessary for a MongoDB-compatible NoSQL database. The FerretDB team claims up to 20x better performance. It will continue to use its own Apache 2.0 license in parallel with Microsoft’s MIT license.

Another interesting point shows how much Microsoft has changed in the past decade or so: The first product shipping on the standalone DocumentDB is coming from Ferret, an open source company that’s not Microsoft.

DocumentDB is a project to keep an eye on, especially when Microsoft starts the process of using it as a reference implementation for a new NoSQL standard. With community support, hopefully we’ll then see a rapid rollout of the MongoDB API features that are currently missing—adding them into both the middleware layer to map them to PostgreSQL operations and the API implementation.

]]>
https://www.infoworld.com/article/3812630/microsofts-new-documentdb-rethinks-nosql-on-postgresql.html 3812630Cloud Computing, Databases, Document Databases, NoSQL Databases, PostgreSQL, Software Development
Browser Use: An open-source AI agent to automate web-based tasks Thu, 30 Jan 2025 09:00:00 +0000

Author’s note: The generative AI revolution has sparked an explosion of open-source tools that fundamentally transform how developers build and deploy AI-powered applications. Each month here, I will introduce an innovative new project from the open-source AI ecosystem, providing an overview of the project along with some tips to help you harness its capabilities.

Project overview – Browser Use

Browser Use is an open-source project created by Magnus Muller and Gregor Zunic to make websites accessible to AI agents. As of January 2025, the project’s GitHub repository boasts over 21,000 stars and 51 contributors, reflecting its growing popularity in the AI automation landscape.

While APIs are the preferred mechanism for integrating external applications with AI agents, web browser automation plays an important role in digital interactions. Browser Use connects AI agents directly to web browsers, enabling them to autonomously navigate, interact with, and extract information from websites—effectively bridging the gap between artificial intelligence and web browsing. This is useful for developers seeking to create intelligent, web-native agents that can perform tasks ranging from data collection to complex multi-step workflows.

What problem does Browser Use solve?

Web automation and browser interaction have long been challenging for developers and AI researchers. Traditional tools like Selenium struggle with dynamic web elements, complex user interactions, and maintaining test stability across different browser environments.

Existing web automation frameworks are typically rigid, requiring extensive coding expertise and constant maintenance, which creates significant overhead for development teams.

The current landscape of browser automation is fragmented and inefficient. Developers face multiple pain points:

  • Managing dynamic web content that changes quickly
  • Ensuring cross-browser compatibility
  • Developing reliable interaction scripts
  • Maintaining test suites as web applications evolve

AI agents attempting web interactions encounter even more complex challenges. Most existing solutions lack the flexibility to autonomously navigate websites, interpret complex UI elements, and perform multi-step tasks without breaking. The WebArena leaderboard reveals that even the best-performing AI models have a success rate of only 35.8% when attempting real-world web tasks.

These limitations particularly impact developers, AI researchers, and automation engineers. Startups and enterprises seeking to build intelligent web-browsing agents are constrained by current technological barriers and unable to create robust, adaptable solutions that can reliably interact with diverse web environments.

A closer look at Browser Use

Browser Use is an open-source library for Python developers, designed to give AI agents seamless web browsing capabilities. It provides a robust framework that enables AI to interact with websites dynamically, mimicking human-like browsing behavior across different programming ecosystems.

At the heart of Browser Use’s browser automation is Playwright, a powerful cross-browser automation library developed by Microsoft. Playwright enables reliable, fast web automation by providing a unified API for Chromium, Firefox, and WebKit browsers. It offers advanced features like automatic waiting, network interception, and robust selector engines, which Browser Use leverages to create more intelligent and resilient web interaction agents.
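
For context, here is roughly what driving Chromium through Playwright’s Python API looks like on its own. This is a minimal sketch of the underlying plumbing that Browser Use builds on, not Browser Use code itself; the URL is just an example, and Playwright’s browser binaries must be installed first (playwright install chromium):

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # Launch headless Chromium, the same engine Browser Use drives.
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()

asyncio.run(main())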

Browser Use relies heavily on Chromium to perform its tasks. I couldn’t find a way to change this behavior to utilize an existing browser on my machine.

The project supports multiple models:

  • OpenAI’s GPT models
  • Google Gemini
  • Azure OpenAI
  • Anthropic Claude
  • DeepSeek
  • Ollama

Browser Use distinguishes itself through several unique features:

  • Integration with multiple large language models (LLMs)
  • Persistent browser sessions
  • Complex workflow management
  • Intelligent DOM interaction

The library integrates smoothly with:

  • LangChain for AI workflow management
  • Playwright for cross-browser automation
  • Major AI development platforms

Browser Use employs a hierarchical agent architecture featuring:

  • A planner agent for task decomposition
  • A browser navigation agent for web interactions
  • Flexible skills for web page sensing and acting

By leveraging LangChain, Browser Use taps into the wide range of LLM support already provided by the popular framework.
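
To give a sense of the developer experience, here is a minimal sketch of the usage pattern the project documents: hand an Agent a natural-language task and a LangChain chat model, then await the run. The task string and model name are illustrative, and parameters may differ between releases, so treat this as a sketch rather than a definitive example:

import asyncio
from langchain_openai import ChatOpenAI
from browser_use import Agent

async def main():
    agent = Agent(
        task="Find the top story on news.ycombinator.com and summarize it in one sentence.",
        llm=ChatOpenAI(model="gpt-4o"),  # any LangChain-supported chat model should work here
    )
    result = await agent.run()  # the agent plans, drives the browser, and returns its findings
    print(result)

asyncio.run(main())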

One limitation I encountered while exploring the framework is its lack of integration with mainstream agent frameworks such as CrewAI, AutoGen, and PhiData. Consequently, I had to develop a custom tool and register it with the agent—not a straightforward process, as I needed to understand the JSON schema of the output and carefully extract the final content.

Key use cases for Browser Use

1. Web research and data extraction: Browser Use enables AI agents to autonomously navigate complex websites, extract structured information, and perform comprehensive research tasks. For instance, an AI agent can:

  • Automatically search job boards and compile detailed job listings
  • Scrape product information across multiple e-commerce platforms
  • Gather competitive intelligence by analyzing websites in real-time
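
To illustrate the first of these, a research-style task can be phrased entirely in natural language and handed to the agent. The sketch below is hypothetical; the job board, fields, and output format are placeholders rather than a recipe for any particular site.

```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

# Illustrative research task: the site, fields, and output format are placeholders.
agent = Agent(
    task=(
        "Search a public job board for 'Python developer' roles, open the "
        "first five listings, and return each title, company, and location "
        "as a JSON list."
    ),
    llm=ChatOpenAI(model="gpt-4o"),
)
# history = await agent.run()  # run inside an asyncio event loop
```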

2. Workflow automation: The library allows AI agents to interact with web interfaces just like humans, automating multi-step processes such as:

  • Filling out online forms
  • Booking travel reservations
  • Tracking package deliveries
  • Managing account registrations and updates

3. Cross-platform integration: Browser Use supports seamless integration with multiple LLMs and frameworks, enabling developers to build sophisticated web-interacting agents across various domains.

I tried out Browser Use with GPT-4o by attempting to bypass the BotDetect CAPTCHA demo (shown below), achieving a 75% success rate.

[Image: Browser Use CAPTCHA demo]

Harnessing AI agents for browser automation

Browser Use represents a pivotal innovation in AI agent development. It addresses critical challenges in web automation and browser interaction. By providing an open-source framework that enables AI agents to navigate websites dynamically, the project fills a significant gap in current web automation technologies.

The project thrives on community collaboration, welcoming contributions from developers worldwide. With an active GitHub community and open issues, Browser Use encourages developers to participate in expanding its capabilities. The project’s transparent development approach and MIT licensing make it accessible for both individual developers and enterprise teams.

While Browser Use is an open-source library for AI-driven browser automation, BrowserBase is a notable commercial alternative that provides headless browser infrastructure for web automation. It distinguishes itself with features like advanced debugging, session recording, proxy support, and stealth mechanisms to avoid bot detection. Unlike Browser Use's library approach, BrowserBase offers a complete infrastructure platform for running headless browsers, targeting enterprises that need scalable web automation solutions.

Bottom line – Browser Use

Browser Use stands out as a significant tool for developers seeking to integrate AI agents with web browsers. Its comprehensive features, ease of use, and active community support make it an asset in the realm of AI-driven web automation. By facilitating seamless AI-browser interactions, Browser Use contributes to the advancement of intelligent web-based applications.

https://www.infoworld.com/article/3812644/browser-use-an-open-source-ai-agent-to-automate-web-based-tasks.html
Artificial Intelligence, Development Libraries and Frameworks, Generative AI, Open Source, Software Development
Sourcegraph unveils AI coding agents Thu, 30 Jan 2025 09:00:00 +0000

Looking to challenge the well-established GitHub Copilot in AI-powered coding assistance, Sourcegraph has announced coding agents for code review, code migration, testing, documentation, and notification, along with an API for building custom agents.

Sourcegraph made the announcement on January 29. The Code Review Agent and the Agent API are available via early access programs, whereas the other agents will become available in the coming months, the company said. Sourcegraph also announced auto-edit for Sourcegraph Cody for Visual Studio Code, an AI-powered feature that suggests edits in the current file and elsewhere based on recent edits, as well as instant code review, testing, and documentation feedback from agents. The company said it began rolling out auto-edit to Cody users today.

Sourcegraph also announced a unified experience for code search, chat, and agents in the editor, code review, and on the web, powered by an agentic RAG layer for greater accuracy. With agentic chat enabled, Sourcegraph Cody gains access to a suite of tools for retrieving and refining context, including code search, codebase files, the terminal, web browser searches, and any OpenCtx-compatible development tools.

https://www.infoworld.com/article/3812799/sourcegraph-unveils-ai-coding-agents.html
Artificial Intelligence, Development Tools, Generative AI, Software Development