Celebrating Homai - Using AI for Good

Our colleague Aigiz Kunafin has achieved an outstanding milestone - the importance of his side project Homai was acknowledged by the “AI for Good” initiative of the United Nations. Homai won the final round of the AIFG Innovation Incubator in Malta, which was run by Mellifera under the umbrella of the International Telecommunication Union (ITU).

AI for Good is an increasingly important initiative. It promotes AI projects that help make the world a better place by focusing on the Sustainable Development Goals of the UN.

Homai uses AI to address four of these goals

In this blog post we will talk about two things:

  1. What is Homai and how does it use AI to make the world a better place?
  2. How is this project implemented behind the scenes? What do RAG, LLMs and Fine-tuning have to do with it?

What is Homai?

Homai is a platform designed to preserve endangered languages and cultures using AI.

Homai started simply. Aigiz is a native speaker of Bashkir, a Turkic language with fewer than 750,000 native speakers and one among many endangered languages worldwide.

Aigiz aimed to preserve the sound of his language and the beauty of its rich culture, protecting them from extinction. He wanted his children to have the chance to hear and practice Bashkir every day.

Along the way, he sought to give all Bashkir children the opportunity to hear fairy tales and songs, and to converse in their native tongue.

Ultimately, Homai expanded its mission: offering the same opportunity - a chance for cultural survival in the digital age - to all children and endangered languages across the globe.

This might sound impossible. Thankfully, modern technology - AI - provides the necessary tools to make this vision a reality. Interestingly, this is the same "AI" that some view as dangerous or threatening.

Here's how Homai works:

At its core is a smart speaker. This is how it appears when placed in schools and kindergartens:

Children can talk to this smart speaker in their native language: they can ask questions and listen to songs and fairy tales.

Obviously, when asked a cultural question, this smart speaker won't simply rely on ChatGPT (OpenAI knows nothing about local villages or poets) or attempt to find a non-existent Wikipedia page. Instead, it will leverage its own curated knowledge base to provide relevant facts or tell a fairy tale.

Saving one language is not nearly enough. On average, a language disappears every two weeks. Linguists estimate that nearly half of the world's approximately 7,000 languages could become extinct within the next century.

To address this, Aigiz and his team are creating a comprehensive platform designed to digitize and preserve multiple languages simultaneously. This ecosystem aims to support low-resource languages by providing:

  1. A cost-effective and robust smart speaker device.

  2. Smart speaker backend infrastructure capable of hosting various languages and cultural content.

  3. A streamlined onboarding process for new languages onto this platform.

  4. A dedicated system for capturing and organizing custom cultural knowledge.

  5. A dataset collection toolkit: tools, software, processes and guidance.

  6. Assistance for training or tuning Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Large Language Models (LLMs).

Digitizing even a single language demands considerable effort but is now achievable by small, passionate teams - something that previously required extensive resources from large linguistic institutions. Modern technologies, artificial intelligence, and powerful foundational models have made this possible.

This process can run for multiple languages in parallel, and it already does:

Homai integrates all these elements together - from the smart speaker design to empowering linguistic communities to digitize their own language and culture, making them accessible to children at home and in kindergartens.

How does Homai work behind the scenes?

You have seen a nice speaker photo above.

Aigiz and his team of enthusiasts remember this smart speaker from different perspectives throughout its years of evolution:

The smart speaker device is crucial for adoption - it has to be cost-effective, powerful and sturdy enough. Only then can it be placed in many families and kindergartens for the children to talk to.

Under the hood, it's essentially a simple PCB powered by an ESP32-S3 microcontroller, complete with a microphone, a few diodes, and buttons. The ESP32-S3 is an inexpensive, compact microcontroller typically favored by hobbyists for creating remote temperature sensors and IoT automation.

It has only a tiny amount of memory (just 512 KB of RAM for both data and instructions, 8 MB of PSRAM and 16 MB of storage) and only two cores.
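To get a feel for these numbers, here is a rough back-of-the-envelope sketch of how much raw audio would fit into each memory region. The audio format (16 kHz, 16-bit, mono) is an assumption made for illustration, not a detail from the Homai firmware, and the calculation ignores the fact that the same memory also has to hold code, buffers and the wake-word model.

```python
# Assumed audio format (not confirmed by the Homai team): 16 kHz, 16-bit PCM, mono.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1

# Memory sizes quoted in the text.
SRAM_BYTES = 512 * 1024
PSRAM_BYTES = 8 * 1024 * 1024


def seconds_of_audio(buffer_bytes: int) -> float:
    """Return how many seconds of raw audio fit into a buffer of the given size."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return buffer_bytes / bytes_per_second


print(f"SRAM:  {seconds_of_audio(SRAM_BYTES):.1f} s")   # SRAM:  16.4 s
print(f"PSRAM: {seconds_of_audio(PSRAM_BYTES):.1f} s")  # PSRAM: 262.1 s
```

Under these assumptions, even a short conversation cannot be buffered comfortably on the device, which is one reason why streaming audio to a server is a natural design.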

Despite these limitations, Aigiz and his team have successfully transformed this into a functional smart Wi-Fi-connected speaker. This device runs:

  • A small, specialized wake-word detection model (a machine learning model trained to detect a specific word or phrase).

  • An audio processing pipeline.

  • A streaming client to maintain continuous communication with the “brains” of the system.

The software itself is native C++ code developed using the ESP-IDF framework, with firmware updates deployed over-the-air (OTA).

Homai Server is a more traditional application. At the heart of it is an event-driven coordination server written in Go. It maintains connections with all the devices via WebSockets and manages their state through stages such as “listening”, “running ASR” or “sending response”.

This backend is also responsible for resiliency in the face of failures, authentication, and managing jobs for the machine learning models and external services.
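The per-device state tracking described above can be pictured as a small state machine. The following is a minimal sketch, written in Python for brevity even though the real server is written in Go. The three named stages come from the text; the idle state, the event names and the `DeviceSession` class are hypothetical and only serve the illustration.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"  # hypothetical resting state, not named in the post
    LISTENING = "listening"
    RUNNING_ASR = "running ASR"
    SENDING_RESPONSE = "sending response"


# Hypothetical events and the transitions they allow.
TRANSITIONS = {
    (DeviceState.IDLE, "wake_word"): DeviceState.LISTENING,
    (DeviceState.LISTENING, "audio_complete"): DeviceState.RUNNING_ASR,
    (DeviceState.RUNNING_ASR, "reply_ready"): DeviceState.SENDING_RESPONSE,
    (DeviceState.SENDING_RESPONSE, "reply_sent"): DeviceState.IDLE,
}


class DeviceSession:
    """Tracks the state of one connected speaker."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.state = DeviceState.IDLE

    def handle(self, event: str) -> DeviceState:
        """Apply an event: reset to IDLE on error, ignore events that do not fit."""
        if event == "error":
            self.state = DeviceState.IDLE
        else:
            self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


if __name__ == "__main__":
    session = DeviceSession("speaker-1")
    for event in ["wake_word", "audio_complete", "reply_ready", "reply_sent"]:
        print(event, "->", session.handle(event).value)
```

Keeping the allowed transitions in one table makes it easy to reject out-of-order events and to reset a device to a known state after a failure.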

If the backend is the heart, then the GPU servers running custom machine learning models are the muscles.

There are custom fine-tuned machine learning models for each language. Speech recognition is currently based on Wav2Vec2-BERT, while text-to-speech runs on VITS (a conditional variational autoencoder with adversarial learning). However, these architectures are an implementation detail. They can change rapidly, tracking the current state of the art in speech technology.

These GPU servers run Python-based agents. They continuously pull jobs from the backend server, run them through the GPU pipelines and push the results back to the server. Multiple GPU servers can run in parallel to provide redundancy and load balancing.
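The pull-based pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryBackend`, `Job`, `Result` and `run_agent` are hypothetical names, the backend is replaced by an in-process queue, and the GPU pipelines are replaced by plain functions.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    job_id: int
    kind: str      # hypothetical labels such as "asr" or "tts"
    payload: str


@dataclass
class Result:
    job_id: int
    output: str


class InMemoryBackend:
    """Stand-in for the coordination server; a real agent would talk over the network."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self.results: List[Result] = []

    def submit(self, job: Job) -> None:
        self._jobs.put(job)

    def pull_job(self, timeout: float = 0.05) -> Optional[Job]:
        try:
            return self._jobs.get(timeout=timeout)
        except queue.Empty:
            return None

    def push_result(self, result: Result) -> None:
        self.results.append(result)


def run_agent(backend: InMemoryBackend,
              handlers: Dict[str, Callable[[str], str]],
              max_idle_polls: int = 1) -> int:
    """Pull jobs until the queue stays empty; return the number of processed jobs."""
    processed = 0
    idle_polls = 0
    while idle_polls < max_idle_polls:
        job = backend.pull_job()
        if job is None:
            idle_polls += 1
            continue
        idle_polls = 0
        handler = handlers.get(job.kind)
        if handler is None:
            output = f"unsupported job kind: {job.kind}"
        else:
            output = handler(job.payload)  # a GPU pipeline would run here
        backend.push_result(Result(job.job_id, output))
        processed += 1
    return processed


if __name__ == "__main__":
    backend = InMemoryBackend()
    backend.submit(Job(1, "asr", "clip-001.wav"))
    run_agent(backend, {"asr": lambda audio: f"transcript of {audio}"})
    print(backend.results)
```

Because agents pull work instead of having it pushed to them, adding another GPU server is a matter of starting one more agent against the same backend, which matches the redundancy and load-balancing goal.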

There is also a special type of server, the agentic server, which is responsible for cultural intelligence and overall conversations. This server is also written in Python. It keeps track of conversations, detects intents and integrates with third-party information sources.

For example, when Homai is used in classes, teachers frequently prepare class notes and exercises and upload them to their profile via a dedicated website for educators. The associated Homai device can then refer to these notes and exercises when teachers mention them during the class. The agentic server implements all the functionality required for that, along with managing other language-specific bits of knowledge:

  • fairy tales and songs

  • skills and integrations

  • cultural knowledge base

As you may have already guessed, this part is implemented as a specialised advanced RAG system. It uses patterns and practices similar to the ones discussed in the Enterprise RAG Challenge (see “How I Won the Enterprise RAG Challenge”).
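To make the idea of retrieval-augmented generation concrete, here is a deliberately simple sketch of its two steps: retrieve relevant entries from a curated knowledge base, then build a prompt that grounds the language model in them. It is not Homai's implementation. Retrieval here is plain keyword overlap (production systems typically use embeddings or more advanced ranking), the knowledge base entries are placeholders, and all names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Tiny illustrative stopword list; a real system would use a language-specific one.
STOPWORDS = {"a", "about", "and", "in", "me", "of", "tell", "the", "to"}


@dataclass
class Document:
    title: str
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text, split it into words and drop stopwords."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def retrieve(query: str, documents: List[Document], top_k: int = 2) -> List[Document]:
    """Rank documents by the number of words they share with the query."""
    query_tokens = tokenize(query)
    scored = []
    for doc in documents:
        overlap = len(query_tokens & tokenize(doc.title + " " + doc.text))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query: str, documents: List[Document]) -> str:
    """Combine the retrieved entries and the question into one prompt for an LLM."""
    context = "\n".join(f"- {doc.title}: {doc.text}" for doc in documents)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Placeholder entries standing in for curated cultural content.
    knowledge_base = [
        Document("Village history", "Placeholder notes about the history of a local village."),
        Document("Local poet", "Placeholder biography of a local poet."),
        Document("Lullaby", "Placeholder lyrics of a traditional lullaby."),
    ]
    question = "Tell me about the local poet"
    print(build_prompt(question, retrieve(question, knowledge_base)))
```

The important property is that the answer is built from curated, community-provided content rather than from whatever a general-purpose model happens to know.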

Technology-wise, the project uses Nix and NixOS to manage multiple deployments (and deployment stages) and to connect them via a private secure network. In addition to the servers mentioned above, there are also components for observability and logging, SSL termination, serving content and APIs, and managing firmware updates.

This would not be possible without modern AI

AI gets no credit on this team slide

Advancements in AI, particularly through open-source research and the release of powerful, multimodal language models, have made it technically viable to capture speech and preserve cultural heritage effectively. Collaborative efforts from linguists worldwide have further lowered the barriers to training custom speech recognition and text-to-speech models, enabling even individual researchers to accomplish this.

However, speech recognition and generation alone are insufficient for cultural preservation; a smart assistant requires intelligence and cultural insight. Recent breakthroughs in LLMs, advanced RAG, and reasoning architectures helped here.

LLM Benchmarks, the Enterprise RAG Challenge and our insights in AI Cases contributed to this progress. They helped to make efficient design decisions and, in turn, have drawn inspiration from the successes of Homai. The power of international collaboration between talented teams became a source of inspiration and motivation for the “AI Strategy & Research Hub” at TIMETOACT GROUP Austria. Its purpose is to coordinate practical AI R&D in the community and push the state of the art forward together.

The process of language preservation is a community effort. It has to be structured and organised into a repeatable process. AI helped here as well. AI coding was used to quickly build numerous tools and interfaces, including volunteer-oriented websites for compiling cultural knowledge and chatbots designed for recording and verifying audio samples. Anthropic Claude emerged as the most frequently used AI tool in these workflows, along with a dash of ChatGPT o1 pro for the most challenging tasks.

LLM-driven processes also helped with dataset preparation and cleanup at scale.

To amplify the human effort even further, AI was used to design and develop the most complex parts of the system: the backend orchestrator in Go, the Python agents and the smart speaker firmware.

Long story short, AI was pretty helpful. It allowed a small team to achieve a lot.

At TIMETOACT GROUP Austria, we are taking these lessons to heart in our “Embrace AI” initiative - helping to spread patterns and practices of AI in coding among all the peers, while providing them with all the required tools and support.

AI, just like any tool, is only as good as the person who handles it. The Homai project was made possible by Aigiz's years of dedicated work, continuous learning, and passion for creatively applying AI to problems worth solving. We are honored (and a bit in awe) to follow our talented colleague on this amazing journey and learn from him.

AI For Good

These days, we often hear concerns about AI’s disruptive impact on society and modern culture. Hopefully, this blog post has provided you with a different, more optimistic perspective - one where AI is indispensable for preserving disappearing languages and cultures.

If you have a worthy problem where AI can also be used to make the world a better place at scale, don’t hesitate to reach out to us at TIMETOACT GROUP Austria. We’d love to hear more from you!
