📖Hugging Face’s 200-page guide to train your own models

🦾Plus: OpenAI-AWS $38B Cloud deal 💰

Hey folks! Let’s get into Big Data and AI craziness…

In today's edition: What's Shaping the Future of Data?

  • šŸ¤How software engineers-data scientists work together?

  • ⚔LangChain DeepAgents framework now in CLI

  • šŸ¤–AI Workflows that Agents Can Build and Run On-the-Fly

  • šŸ”Vectorless Vision-Based RAG - No OCR, No Database

  • ā¤ļø Facebook Dating Is a Surprise Hit for the Social Network

  • šŸ“Š New benchmark tests AI’s freelance automation

  • šŸ’” AI Tutorial:How to create realistic AI voices for your content

  • šŸ¤– AI Tools and Data Tools to checkout

Hugging Face just dropped their "Smol Training Playbook," a 200+ page deep dive into building their SmolLM3 model from scratch. The team documents the complete pipeline (pretraining, post-training, and infrastructure), sharing what worked, what failed, and how to keep training runs stable. Think of it as the field notes from training a competitive 3B parameter model, minus the usual vendor mystique. And it’s completely free.

Want to get the most out of ChatGPT?

ChatGPT is a superpower if you know how to use it correctly.

Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

Data scientists and software engineers should share responsibility for an issue and work to resolve it at whatever step of the work it appears. Continuous communication helps the team catch discrepancies early.

LangChain just shipped DeepAgents CLI, bringing their DeepAgents framework straight to your terminal. Install it with pip install deepagents-cli, and you get an agent that can edit files, run shell commands, search the web, and retain information across sessions by writing memories locally, such as API patterns, project conventions, and context from previous conversations.
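To make the "memories written locally" idea concrete, here is a minimal sketch of a file-backed memory. It is not the DeepAgents API: the LocalMemory class, its remember and recall methods, and the agent_memory.json file name are all hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


class LocalMemory:
    """Tiny file-backed key-value memory (hypothetical, not the DeepAgents API)."""

    def __init__(self, path: Path):
        self.path = Path(path)
        # Load notes left by earlier sessions, if the file already exists.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        """Store a note and write the whole memory back to disk."""
        self.notes[key] = value
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, key: str, default: str = "") -> str:
        """Return a stored note, or the default if nothing was saved."""
        return self.notes.get(key, default)


# Session 1: the agent saves a project convention (file name is an assumption).
memory_file = Path(tempfile.mkdtemp()) / "agent_memory.json"
LocalMemory(memory_file).remember("api_style", "REST, snake_case fields")

# Session 2: a fresh object reads the same file and still knows the convention.
recalled = LocalMemory(memory_file).recall("api_style")
print(recalled)
```

A real agent would decide for itself what is worth remembering. The point here is only that plain local files are enough to carry context from one session to the next.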

Declarative AI workflows you can read, write, and trust - like Dockerfile or SQL but for multi-step LLM pipelines. Pipelex gives you a DSL and Python runtime for repeatable AI workflows. You declare what happens at each step, and any model or provider can run it.
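For a feel of what "declarative" means here, the sketch below describes a two-step pipeline as plain data and runs it with a small generic runner. This is not Pipelex's DSL or runtime: WORKFLOW, run_workflow, and echo_model are hypothetical names, and echo_model is a stand-in for a real LLM call.

```python
from typing import Callable, Dict, List

# The workflow is data, not code: each step declares its input, output, and prompt.
WORKFLOW: List[Dict[str, str]] = [
    {"name": "summarize", "input": "article", "output": "summary",
     "prompt": "Summarize in one sentence: {article}"},
    {"name": "headline", "input": "summary", "output": "headline",
     "prompt": "Write a headline for: {summary}"},
]


def run_workflow(steps: List[Dict[str, str]],
                 model: Callable[[str], str],
                 inputs: Dict[str, str]) -> Dict[str, str]:
    """Run the declared steps in order, passing results through a shared context."""
    context = dict(inputs)
    for step in steps:
        if step["input"] not in context:
            raise KeyError(f"step '{step['name']}' needs '{step['input']}'")
        prompt = step["prompt"].format(**context)
        context[step["output"]] = model(prompt)
    return context


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call; any provider could be swapped in here."""
    return f"[model output for: {prompt}]"


result = run_workflow(WORKFLOW, echo_model,
                      {"article": "Hugging Face released a training playbook."})
print(result["headline"])
```

Because the steps are just data, the same workflow can be reviewed, versioned, and handed to a different model without touching the runner.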

DeepSeek OCR is powerful, but do you even need OCR models for RAG? PageIndex takes a different approach with vision-based RAG that mimics how humans actually read documents: reasoning over a hierarchical table-of-contents structure to identify relevant pages, then processing those pages as images with VLMs like GPT-4.1 for visual understanding and answer generation.
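The page-selection step can be sketched in a few lines. This is not PageIndex's code: TocNode, score, and find_pages are hypothetical names, and the word-overlap score is a toy stand-in for the LLM reasoning the real system would use to pick a section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TocNode:
    """One entry in a document's table of contents (hypothetical structure)."""
    title: str
    pages: range
    children: List["TocNode"] = field(default_factory=list)


def score(question: str, title: str) -> int:
    """Toy relevance: number of shared lowercase words (stand-in for LLM reasoning)."""
    return len(set(question.lower().split()) & set(title.lower().split()))


def find_pages(node: TocNode, question: str) -> List[int]:
    """Walk down the tree, following the best-matching child at each level."""
    while node.children:
        best = max(node.children, key=lambda child: score(question, child.title))
        if score(question, best.title) == 0:
            break  # nothing below matches, so stay at the current section
        node = best
    return list(node.pages)


toc = TocNode("Annual Report", range(1, 41), [
    TocNode("Financial Results", range(5, 21), [
        TocNode("Revenue by Segment", range(5, 9)),
        TocNode("Operating Costs", range(9, 21)),
    ]),
    TocNode("Risk Factors", range(21, 41)),
])

pages = find_pages(toc, "what was revenue by segment in the financial results")
print(pages)  # prints [5, 6, 7, 8]
```

In a vision-based setup, the selected pages would then be rendered as images and passed to a VLM for the answer, so no vector database or OCR step is involved.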

Q4 is the perfect window to turn this year’s numbers into a clear, actionable forecast aligned with your goals. Set your business up for a stronger 2026 with BELAY’s new guide.

šŸ‘Øā€šŸ’» Data Tools, Libraries

Newsbang: Your Lens on Emerging Trends

PostgreSQL Index Advisor is a PostgreSQL extension for recommending indexes to improve query performance.

pylyzer (GitHub Repo)

pylyzer is a static code analyzer and language server for Python.

AI News:

OpenAI just secured a seven-year, $38B agreement with Amazon Web Services for computing infrastructure, marking the company’s largest diversification away from Microsoft’s cloud services. The partnership grants OpenAI access to hundreds of thousands of Nvidia GPUs across AWS data centers, with deployment targeted for completion by late 2026.

Free, private email that puts your privacy first

Proton Mail’s free plan keeps your inbox private and secure: no ads, no data mining. Built by privacy experts, it gives you real protection with no strings attached.

Facebook Dating debuted in 2019. The feature lets people create a free dating profile and swipe and match with other users. It has more than 21 million daily users, making it one of the most popular online dating services. Facebook Dating shows how social networking is evolving into two broad categories: content and services.

AI cloud startup Lambda has signed a multi-billion-dollar agreement with Microsoft to deploy tens of thousands of Nvidia GPUs across its infrastructure. The partnership will expand Microsoft’s use of high-end AI chips through external providers, helping it meet surging demand for model training capacity without the delays of building data centers.

Brian Koo (grandson of LG Group's founder) has co-founded Utopai East, a 50-50 joint venture with Utopai Studios to build AI-powered film and TV production infrastructure. The partnership will produce content using existing infrastructure initially, with the first piece of content expected to launch next year, focusing on Korean creators and international IP expansion.

Scale AI and the Center for AI Safety published the Remote Labor Index, a new benchmark that tests AI models on real freelance projects, revealing that even the top systems complete less than 3% of tasks at professional human standards.

You shouldn’t be. Get paid up to 2 days early and make your money go further with 4% interest on savings,* up to $200 in free overdraft coverage,** and more.

AI Tutorial

How to create realistic AI voices for your content

  • Open Google AI Studio and select ‘Native Speech Generation.’

  • Pick your mode: Single-speaker for narrations or Multi-speaker for dialogues.

  • Write your script, adding style notes and choosing voices for each speaker.

  • Click ‘Run’ to generate the audio, then download it for your project.

🔥Top AI tools to increase productivity:

  1. YouBrief is a free AI tool designed to help users quickly extract summaries from YouTube videos

  2. VocalReplica is an AI-powered web-based tool that allows users to effortlessly isolate vocals

  3. HomeStage lets you upload a picture and our AI will add furniture within seconds.

  4. ChatMaxima is a Conversational Marketing SaaS platform that revolutionizes the way businesses connect with customers

  5. Wemate - Explore, craft, and communicate with the virtual companions of your dreams through Wemate.

  6. Forewrite - Craft and enhance various content forms, including images, code, and speech-to-text

  7. Editby - Create content for your blog, newspaper, newsletter, press notes, social networks etc. with AI.

  8. Data Analyst AI connects Google Analytics with ChatGPT, delivering AI-powered eCommerce insights and automated weekly reports.

View our database of all the best AI tools for your needs: aitoolsup.com

Have cool resources to share? Submit AI tool

A.I. Generated Image of the Day

👀 Brutalist Utopia

AI Tools Up Newsletter

Receive a weekly email with updates on new AI tools, helpful prompts, and the latest AI developments. Join over 15,000 professionals from Google, OpenAI, Notion, Apple, and more.

SPONSOR US

Get your product in front of Big Data & AI enthusiasts

Our newsletter is read by thousands of tech professionals, investors, engineers, managers, and business owners around the world.

Interested in Sponsoring the Big Data News Weekly Newsletter? Get in touch today
