RMMD / data & LLM engineer
Ricardo M. Maldonado — Mathematician · Mexico City (remote)

I build the data and retrieval layer that makes AI systems fast — and correct.

Five years turning messy corporate data into production applications. Now extending that craft into LLM agents and RAG.

60M+ rows processed · 5+ production apps · government-deployed platform

01 Approach

Performance is a feature, not an afterthought.

I come from mathematics, and I treat data systems the same way: find the cheapest correct path. Most of my work runs on constrained hardware — on-prem servers, a single VPS — so I lean on columnar formats, lazy evaluation and push-down filtering to move only the data a query actually needs.
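A toy, pure-Python model of why push-down filtering moves less data: Parquet stores min/max statistics per row group, so a reader can skip any group whose range cannot satisfy the filter. The `RowGroup` class and both functions are hypothetical illustrations of the idea, not how Polars or Arrow implement it internally.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group: rows plus min/max stats for one column."""
    rows: list[dict]
    column: str
    min_value: int
    max_value: int


def write_row_groups(rows: list[dict], column: str, group_size: int) -> list[RowGroup]:
    """Split rows into fixed-size groups and record the min/max of `column` in each."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        values = [r[column] for r in chunk]
        groups.append(RowGroup(chunk, column, min(values), max(values)))
    return groups


def scan_with_pushdown(groups: list[RowGroup], low: int, high: int) -> tuple[list[dict], int]:
    """Return (matching rows, number of groups actually read) for low <= value <= high."""
    matched, groups_read = [], 0
    for g in groups:
        # The statistics prove no row here can match, so the group is never decoded.
        if g.max_value < low or g.min_value > high:
            continue
        groups_read += 1
        matched.extend(r for r in g.rows if low <= r[g.column] <= high)
    return matched, groups_read


if __name__ == "__main__":
    # Rows sorted by year, 1,000 per year, so each row group covers exactly one year.
    rows = [{"year": 2018 + i // 1000, "amount": i} for i in range(6000)]
    groups = write_row_groups(rows, "year", group_size=1000)
    hits, read = scan_with_pushdown(groups, 2021, 2021)
    print(len(hits), read)  # 1000 1
```

Pruning only pays off when the file is sorted or clustered on the filter column. In Polars the same effect comes from building a lazy query, e.g. `pl.scan_parquet(path).filter(pl.col("year") == 2021).collect()`, and letting the optimizer push the filter into the Parquet scan.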

That same discipline carries into LLM work. An agent is only as good as the layer feeding it: a well-modeled schema, fast retrieval, and deterministic guards around a non-deterministic model. I build the whole path — extraction, transformation, and the app people actually use.

02 The stack

Tools I reach for.

LLM & agents

LangGraph · Text-to-SQL · Prompt engineering · Eval suites · OpenAI · Gemini · Ollama / Llama

RAG & retrieval

PostgreSQL + pgvector · Embeddings · DuckDB · Polars · Arrow · Parquet

Python & backend

FastAPI · Pydantic v2 · REST APIs · Playwright · async

Data & ops

PostgreSQL / PostGIS · Databricks · PySpark · Docker · Traefik · Linux VPS · Git

03 Selected work

Things I've shipped.

LLM

Electoral Analyst Bot

Conversational Text-to-SQL agent · personal, open-source

A LangGraph agent that answers natural-language questions about Mexican elections (2018–2024). It combines an intent router, LLM-based SQL generation, semantic validation guards, an iterative corrector, and an evaluation suite, all running over a PostgreSQL star schema (~1.2M rows). The guards reject incorrect SQL before it runs, adding deterministic quality control on top of an LLM.

LangGraph · PostgreSQL · Pydantic · Polars · Docker
Building phase 2 — Telegram + VPS deploy
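A minimal sketch of what deterministic guards around LLM-generated SQL can look like, assuming an in-memory SQLite database as a stand-in for the project's PostgreSQL star schema. The schema, `KNOWN_YEARS`, `validate_sql`, and `generate_with_guards` are all hypothetical; the `generate` callable stands in for the LLM call, and guard failures are fed back to it in the spirit of an iterative corrector.

```python
import re
import sqlite3
from typing import Callable

# Hypothetical star schema; SQLite is only a stand-in for PostgreSQL here.
SCHEMA = """
CREATE TABLE dim_election (election_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_party (party_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_votes (election_id INTEGER, party_id INTEGER, votes INTEGER);
"""
KNOWN_YEARS = {2018, 2021, 2024}  # hypothetical: election years present in the data
WRITE_WORDS = re.compile(r"\b(insert|update|delete|drop|alter|create|attach)\b", re.I)


def validate_sql(sql: str, conn: sqlite3.Connection) -> list[str]:
    """Deterministic checks; an empty list means the query may run."""
    problems = []
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # naive: also trips on semicolons inside string literals
        problems.append("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", stmt, re.I) or WRITE_WORDS.search(stmt):
        problems.append("only read-only SELECT queries are allowed")
    # Semantic guard: reject years the model invented.
    for year in re.findall(r"\b(?:19|20)\d{2}\b", stmt):
        if int(year) not in KNOWN_YEARS:
            problems.append(f"year {year} is not in the data")
    if not problems:
        try:
            # EXPLAIN compiles the query (syntax, table and column names) without running it.
            conn.execute("EXPLAIN " + stmt)
        except sqlite3.Error as exc:
            problems.append(f"does not compile: {exc}")
    return problems


def generate_with_guards(
    question: str,
    generate: Callable[[str, list[str]], str],  # stand-in for the LLM call
    conn: sqlite3.Connection,
    max_attempts: int = 3,
) -> str:
    """Ask for SQL, feed guard failures back as context, give up after max_attempts."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        feedback = validate_sql(sql, conn)
        if not feedback:
            return sql
    raise ValueError(f"no valid SQL after {max_attempts} attempts: {feedback}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(validate_sql("SELECT year FROM dim_election WHERE year = 2020", conn))
    # ['year 2020 is not in the data']
```

Keyword checks like these are coarse; executing through a read-only database role is a sensible second line of defence.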
DATA

Creer Para Ver — BI platform

End-to-end analytics · Natura&Co México

Automated ETL from Databricks SQL and a corporate portal (Playwright RPA), Polars LazyFrames with predicate push-down over Apache Parquet, and an interactive dashboard. Handles 60M+ records and cut report generation from ~45 minutes to under 3 seconds.

Python · Polars · Parquet · Playwright · Shiny
GOV

Child Labour Data Explorer

ILO → hosted by the Mexican government (STPS)

Interactive analysis of child-labour data aligned to SDG 8.7: a logistic-regression risk model, multi-level Leaflet maps, and downloadable tables. Deployed on a Linux VPS with full documentation and a vulnerability assessment. Now hosted at trabajoinfantil.stps.gob.mx.

R · Shiny · Leaflet · Logistic regression · VPS
OSS

Prepara tus Datos

Open-source · no-code data prep

A no-code web app for cleaning and transforming data, built for NGOs and non-technical users. It supports cleaning, pivoting, filtering, grouping, and fuzzy joins. MIT-licensed, modular, and deployed with Docker/Traefik.

R · Shiny · MIT · Docker
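Fuzzy joins matter when the same entity is spelled differently across sources (accents, casing, stray spaces). The app itself is written in R; this is a hypothetical Python sketch using the standard library's `difflib.SequenceMatcher` similarity ratio, not the app's actual matching method. `normalize`, `fuzzy_left_join`, and the 0.85 threshold are illustrative assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace, so 'Querétaro ' ~ 'QUERETARO'."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def fuzzy_left_join(left: list[dict], right: list[dict], left_key: str,
                    right_key: str, threshold: float = 0.85) -> list[dict]:
    """Keep every left row; attach the best-scoring right row if its score >= threshold."""
    right_norm = [(normalize(r[right_key]), r) for r in right]
    out = []
    for lrow in left:
        key = normalize(lrow[left_key])
        best, best_score = None, 0.0
        for rkey, rrow in right_norm:
            score = SequenceMatcher(None, key, rkey).ratio()
            if score > best_score:  # strict: earlier candidates win ties
                best, best_score = rrow, score
        merged = dict(lrow)
        if best is not None and best_score >= threshold:
            merged.update({k: v for k, v in best.items() if k != right_key})
            merged["match_score"] = round(best_score, 3)
        out.append(merged)
    return out


if __name__ == "__main__":
    left = [{"state": "Querétaro", "id": 1}, {"state": "Yucatán", "id": 2}]
    right = [{"name": "QUERETARO ", "code": "22"}, {"name": "Quintana Roo", "code": "23"}]
    for row in fuzzy_left_join(left, right, "state", "name"):
        print(row)
    # {'state': 'Querétaro', 'id': 1, 'code': '22', 'match_score': 1.0}
    # {'state': 'Yucatán', 'id': 2}
```

Scoring every pair is O(n·m); for larger tables one would typically block on a cheap key first and only score candidates within each block.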

Also: a normalized child-risk index built for UNICEF from INEGI microdata, and a peer-reviewed psychometric validation using Kendall's W and Fleiss' Kappa (CISETC 2023).
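For reference, the two agreement statistics named above, written as plain Python from their standard textbook definitions (Kendall's W without the tie correction). This is an illustrative sketch, not the code used in the CISETC paper; the function names are hypothetical.

```python
def kendalls_w(ranks: list[list[float]]) -> float:
    """Kendall's W, no tie correction. ranks[r][i] = rank rater r gives item i (1..n).

    W = 12 S / (m^2 (n^3 - n)), with S = sum_i (R_i - mean R)^2 and R_i the rank total of item i.
    """
    m, n = len(ranks), len(ranks[0])
    totals = [sum(rater[i] for rater in ranks) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa. counts[i][j] = raters who put subject i in category j.

    kappa = (P_bar - P_e) / (1 - P_e): observed vs chance agreement.
    """
    n_subjects, n_raters = len(counts), sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    n_categories = len(counts[0])
    # p_j: share of all ratings that fell into category j.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_categories)]
    # P_i: fraction of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
    print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```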

04 Contact

Open to remote roles in LLM/AI and data engineering.

Building production systems for mid-sized teams — agents, RAG, and the data layer underneath. If that's the kind of problem you have, let's talk.