# NOTICE — тенгрИИ / tengrAI

This product (тенгрИИ / tengrAI, hereafter "the Product") includes third-party software and data covered by their own licenses. Their authors are credited below.

The Product itself remains the property of Roman Polyakov. The brand marks "тенгрИИ" and "tengrAI", together with their logos and trade dress, are trademarks of Roman Polyakov; their use is governed separately and is **not** granted by the licenses below.

---

## Speech models / TTS

### XTTS-v2 (base model & training framework)
- **Source:** [Coqui-AI / XTTS](https://github.com/coqui-ai/TTS)
- **License:** Mozilla Public License 2.0 (MPL-2.0) for the Coqui TTS code; Coqui Public Model License (CPML) for the XTTS-v2 model weights
- **Use:** base architecture, conditioning latents, and vocoder. Fine-tuned by Roman Polyakov on a Kazakh-language speech corpus (Phase 1 + Phase 2, May 2026).
- **Notice:** any modifications to MPL-licensed source files are made available on request.

### KSC2 — Kazakh Speech Corpus 2
- **Source:** ISSAI, Nazarbayev University — [github.com/IS2AI/Kazakh_TTS](https://github.com/IS2AI/Kazakh_TTS)
- **License:** MIT
- **Use:** training data for Whisper-KK fine-tune (ASR) and XTTS-v0.10 KK Phase 1 (TTS).
- **Citation:** Mussakhojayeva et al., "KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus", Interspeech 2022.

### KazakhTTS (Mussakhojayeva et al.)
- **Source:** ISSAI — [github.com/IS2AI/KazakhTTS](https://github.com/IS2AI/KazakhTTS)
- **License:** Creative Commons Attribution 4.0 International (CC BY 4.0)
- **Use:** training data for XTTS-v0.10 KK Phase 1+2 (TTS).
- **Citation:** Mussakhojayeva et al., 2022. As required by CC BY 4.0, modifications are indicated here: the audio was resampled (22→24 kHz), segmented, and normalized.

---

## ASR

### Whisper Large v3 Turbo
- **Source:** OpenAI — [github.com/openai/whisper](https://github.com/openai/whisper)
- **License:** MIT
- **Use:** base ASR model, fine-tuned with a PEFT/LoRA adapter on a Kazakh corpus (Whisper-KK).

---

## Language model base

### Qwen2.5-7B
- **Source:** Alibaba Cloud — [huggingface.co/Qwen](https://huggingface.co/Qwen)
- **License:** Apache License 2.0
- **Use:** base architecture for the тенгрИИ language model. Fine-tuned by Roman Polyakov on Kazakh corpora and instruction data.

---

## Frameworks & runtime

- **Astro** (web framework) — MIT License — [astro.build](https://astro.build)
- **Cloudflare Pages / Workers** (hosting) — proprietary platform
- **PyTorch** — modified BSD-style license — [pytorch.org](https://pytorch.org)
- **Transformers / PEFT / Datasets** (Hugging Face) — Apache 2.0 — [huggingface.co](https://huggingface.co)
- **llama.cpp / Ollama** (inference) — MIT License
- **FastAPI / uvicorn** — MIT License

---

## Data sources (corpora)

- **Wikipedia (KK)** — CC BY-SA 3.0 — used for continued pretraining.
- **Common Crawl C4 (kk)** — ODC-By 1.0 — used for continued pretraining.
- **HPLT v2 (kk)** — CC0 1.0 — used for continued pretraining.
- **adilet.zan.kz (Government of Kazakhstan legal acts)** — public domain (Law of the Republic of Kazakhstan No. 419-V "On Legal Acts", Art. 36) — used for the legal layer.

---

## Trademarks

The following are trademarks of Roman Polyakov:
- **тенгрИИ** (KK/RU)
- **tengrAI** (EN)
- their associated logos and trade dress.

Registration is pending with Казпатент (Kazpatent, Kazakhstan; May 2026) and under the WIPO Madrid System (Q3 2026).

---

## Contact

For license inquiries, attribution corrections, or to report a missing notice:
- **Email:** legal@tengrai.ai
- **Website:** [tengrai.ai](https://tengrai.ai)

---

*Last updated: 2026-05-04*
*This NOTICE is provided in addition to, and does not supersede, the individual license texts of each component.*
