🤖Building the Full Data Layer for AI Applications

🦾Plus: 📜 OpenAI’s new ‘social contract’ ideas for society, ASI

In partnership with

Hey folks! Let’s get into Big Data and AI craziness…

In today's edition: Anthropic's revenue is spiking hard. The New Yorker surfaced the Altman firing memos. And there are now 200+ free datasets for every ML project you'll ever need. 👇

  • 📦200+ Free Datasets for Data Science, Machine learning, AI, NLP

  • ⚡Cursor doubles AI coding speed on Blackwell GPUs

  • 🤖Google prepares Jules V2 agent capable of taking bigger tasks

  • 🔬How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

  • 📜 OpenAI’s new ‘social contract’ ideas for society, ASI

  • 🚀 Wang's first Meta models getting ready to ship

  • 💡 AI Tutorial:How to run Google's Gemma 4 model locally on your phone

  • 🤖AI Tools and Data Tools to checkout

In this article, you will learn why production AI applications need both a vector database for semantic retrieval and a relational database for structured, transactional workloads. If you look at the architecture diagram of almost any AI startup today, you will see a large language model (LLM) connected to a vector store.

AI Agents Are Reading Your Docs. Are You Ready?

Last month, 48% of visitors to documentation sites across Mintlify were AI agents—not humans.

Claude Code, Cursor, and other coding agents are becoming the actual customers reading your docs. And they read everything.

This changes what good documentation means. Humans skim and forgive gaps. Agents methodically check every endpoint, read every guide, and compare you against alternatives with zero fatigue.

Your docs aren't just helping users anymore—they're your product's first interview with the machines deciding whether to recommend you.

That means:
→ Clear schema markup so agents can parse your content
→ Real benchmarks, not marketing fluff
→ Open endpoints agents can actually test
→ Honest comparisons that emphasize strengths without hype

In the agentic world, documentation becomes 10x more important. Companies that make their products machine-understandable will win distribution through AI.

Presented below are datasets spanning a wide spectrum, catering to domains such as Data Science, Machine Learning, AI, NLP, Data Analysis, Analytics, Education, Computer Vision, Pricing Optimization, Classification, and Pre-Trained Models.

Traditional MoE models waste processing time managing data around individual experts during single-token generation. By assigning computing power directly to outputs instead, warp decode skips these extra steps, making inference 1.8x faster and more accurate.

Google appears to be building the next generation of its Jules coding agent, internally referenced as “Jitro,” which could represent a fundamental rethinking of how developers work with AI-powered software engineering tools. While the current Jules experiment has seen little visible progress in recent months

Meta's AI agents weren't making useful edits quickly enough when pointed at one of the company's large-scale data processing pipelines. The company fixed this by building a pre-compute engine consisting of a swarm of over 50 specialized AI agents that systematically read every file to produce context files that encode the tribal knowledge that previously lived only in engineers' heads.

👨‍💻 Data Tools, Libraries

Spin (GitHub Repo)

Spin is a bash utility that improves the Docker experience. It can replicate any environment on any machine and centralize infrastructure from a single configuration file.

AI Gateway (GitHub Repo)

AI Gateway is an interface between apps and hosted large language models. It streamlines API requests to LLM providers using a unified API.

AI Toolkit (GitHub Repo)

AI Toolkit is a header-only C++ library that brings finite state machines, behavior trees, utility AI, and goal-oriented action planning to game NPCs.

AI News:

OpenAI just published a 13-page policy document with ideas to help society navigate superintelligence and its societal impacts, asking Washington to tax AI-driven profits, create a wealth fund, implement a 4-day workweek, and more. The proposal said we are “beginning a transition toward superintelligence”, with Altman telling Axios the moment requires a new “social contract” for society.

Your AI is resolving tickets. Is it keeping customers?

Resolution rates look great. But Gladly's 2026 Customer Expectations Report reveals the metric most CIOs are missing — and what the data says about where AI investments actually translate into retention, not just throughput.

Anthropic's annual revenue run-rate has spiked from roughly $9 billion at the end of 2025 to more than $30 billion. Fewer than 135 S&P companies booked at least $30 billion in sales in the past 12 months. OpenAI's annual revenue run-rate is around $24 billion. Anthropic recently announced an expansion of its partnership with Google and Broadcom.

The New Yorker published an investigation into Sam Altman, drawing on 100+ interviews, unseen memos from ex–chief scientist Ilya Sutskever, and notes from Dario Amodei — alleging a long-running pattern of deception at the top of OpenAI. The reporting spans Altman's full career arc, including conflicts at his startup Loopt, Y Combinator partners trying to push him out, and the OAI board drama.

Google launched its own dictation app, “Google AI Edge Eloquent” on iOS. Akin to SuperWhisper, the free app allows you to dictate on your phone once its Gemma-based automatic speech recognition (ASR) models are downloaded. You can see the live transcription in the app, and you can also turn off the cloud mode to use local-only processing.

Meta is set to release the first AI models developed by Alexandr Wang’s Superintelligence team, with Axios reporting the company will make some of them available as open source — though the largest models will reportedly stay closed. Meta and Wang’s codenamed ‘Avocado’ model was delayed in March over benchmark performances that fell short of rival models across the board.

How Will You Generate Retirement Income?

Most people with $1,000,000 or more saved have a number. Fewer have a plan for turning it into reliable income. Fisher Investments' Definitive Guide to Retirement Income helps you calculate future costs and build a portfolio strategy around them.

AI Tutorial

How to run Google's Gemma 4 model locally on your phone

  1. Download the Google AI Edge Gallery from the Google Play Store, App Store, or install the APK from the latest release on GitHub.

  2. Open Google AI Edge Gallery, tap on AI Chat, and download Google's Gemma 4 model E2B or E4B. (Pick the variant that best fits your device. The heaviest model isn't always the best choice.)

You can also click on the “+” at the bottom to import your own models.

Once downloaded, you're ready to run everything locally on your phone. No internet connection needed.

  1. Select AI Chat to start a conversation, or explore the other available features:

  • Agent Skills: transforms your LLM from a conversationalist into a proactive assistant.

  • Ask Image: use multimodal power to identify objects, solve visual puzzles, or get detailed descriptions using your camera or photo gallery.

  • Audio Scribe: transcribe and translate voice recordings into text in real time.

🔥Top AI tools to increase productivity: 

  1. Zoice is the single platform for every creator. Transcribe, generate, and animate your content.

  2. Floowed is a flexible, no-code AI credit workflow automation platform

  3. BookSwift is a modern appointment booking platform for providers

  4. Marketsy.ai provides a smart e-commerce experience supported by a powerful admin panel.

  5. StrideFuel - Built for weight loss success—especially GLP-1 users

  6. WorldEngen is an AI copilot for 3D production that helps professional teams

  7. AppWizzy is an AI tool that helps you build and host full-stack web applications

  8. SongGuru.AI: An AI-Based Music Creation and Audio Processing Platform

View our database of all the best AI tools for your needs: aitoolsup.com

Have cool resources to share? Submit AI tool

A.I. Generated Image of the Day

👀 The Warrior Princess

AI Tools Up NewsletterReceive a weekly email with updates on new AI tools, helpful prompts, and the latest AI developments. Join over 20000 + professionals from Google, OpenAI, Notion, Apple, and more.

SPONSOR US

Get your product in front of Big Data & AI enthusiasts

Our newsletter is read by thousands of tech professionals, investors, engineers, managers, and business owners around the world.

Interested in Sponsoring the Big Data News Weekly Newsletter?Get in touch today

What did you think of today's email?

Your feedback helps me create better emails for you!

Login or Subscribe to participate in polls.