Software Architect, Agent Evaluation & Core Framework Job at Datagrid AI, San Mateo, CA

  • Datagrid AI
  • San Mateo, CA

Job Description

Job Title:

Software Architect, Agent Evaluation & Core Framework

Location:

Remote First

SF Bay Area preferred

About Datagrid

Datagrid is the AI Agent that gets work done for you.

Instead of just answering questions, Datagrid’s agents take action—automating entire workflows across your tools, files, and systems. Whether it’s searching through documents to find answers, cross-referencing data to uncover gaps, or running a financial analysis that updates your Excel file—Datagrid does the work, so you don’t have to.

You get your time back. You 10x your output. The AI runs the playbook.

Behind the scenes, Datagrid connects to over 100 platforms and 2,000+ APIs, including Excel, Google Docs, SharePoint, Slack, PDFs, websites, and more. It handles multimodal problems with ease, working with unstructured data such as images and documents as well as entire databases, and communicates through channels like Teams, Slack, or SMS.

It’s built for trust and precision: agents cite their sources and operate safely in real-time. Enterprise teams get full control with teamspaces, RBAC, and usage reports. You can customize everything—launch fast on your own, or partner with our expert team.

From research to reporting, from digging through files to delivering results, Datagrid doesn’t just assist. It executes.

We’re looking for passionate individuals to join us at the frontier of AI innovation.

About the role:

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app.

The role of Software Architect, Agent Evaluation & Core Framework, is crucial because we cannot manually test the vast array of agent interactions and capabilities. You will own and drive the extension of our evaluation harness so that it provides actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases. As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you’ll do:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open-source or closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation performance via alerts and dashboards.

What you’ll have:
  • Proven track record of building test harnesses for Chat Agents from 0 ⇒ 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Familiarity with JavaScript frameworks such as React or AngularJS.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Nice to Haves
  • Experience with any LLM evaluation platform (Galileo, Arize, LangSmith, Orq)
  • Background in B2B SaaS automation tools
  • Contributions to open-source AI projects or published research
  • Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

$200,000 – $240,000 USD per year, depending on experience and qualifications.

At Datagrid, we set pay ranges using market data, internal benchmarks, and the scope of responsibilities. Final compensation within this range will be determined based on relevant experience, skills, and geographic location.

In addition to base salary, this role may be eligible for:

  • Equity in the company
  • Home office set-up reimbursement
  • Health, dental, and vision benefits
  • Flexible PTO and remote work options

Equal Opportunity Employer

Datagrid is an equal opportunity employer and is committed to building a diverse and inclusive team. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law. We encourage candidates from all backgrounds to apply.

Job Tags

Local area, Home office, Flexible hours
