Our Projects

Explore our comprehensive portfolio of open-source contributions, including state-of-the-art models, massive datasets, and specialized tools for the AI research community.

SparkEmbedding-300m

Text Embeddings · Multilingual

A state-of-the-art multilingual text embedding model by XenArcAI, fine-tuned from EmbeddingGemma. It is optimized for cross-lingual retrieval and semantic search, and supports Matryoshka Representation Learning (MRL), which lets embeddings be truncated to smaller sizes. It covers 119 languages with a 2048-token context window and is designed for efficient, scalable deployment.

Parameters: 0.3B
Context: 2048 tokens
Languages: 119
View on Hugging Face
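Matryoshka Representation Learning trains an embedding so that its leading dimensions also work as a smaller embedding on their own. The following is a minimal plain-Python sketch of how such embeddings are typically used for semantic search: truncate each vector to its first `dim` components, re-normalize it to unit length, and rank documents by cosine similarity. The toy 4-dimensional vectors and the helper names (`truncate_mrl`, `rank`) are hypothetical stand-ins. SparkEmbedding-300m's actual output size, supported truncation sizes, and loading API are not stated here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_mrl(vec, dim):
    """Keep the first `dim` components of an MRL-style embedding and re-normalize."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank(query, docs, dim):
    """Return document indices sorted from most to least similar at size `dim`."""
    q = truncate_mrl(query, dim)
    scored = [(cosine(q, truncate_mrl(d, dim)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical toy vectors; real embeddings would come from the model itself.
query = [0.9, 0.1, 0.3, -0.2]
docs = [
    [0.8, 0.2, -0.5, 0.1],
    [-0.7, 0.6, 0.3, -0.2],
    [0.95, 0.05, 0.3, -0.25],
]

print(rank(query, docs, dim=4))  # → [2, 0, 1]
print(rank(query, docs, dim=2))  # → [2, 0, 1]
```

Smaller `dim` values reduce storage and comparison cost at some loss of accuracy. In this toy case the ranking is identical at 4 and 2 dimensions, but that will not hold in general.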

MathX Dataset

Mathematical Reasoning · Training Data

A high-quality, synthetically curated, and meticulously filtered dataset designed for advanced mathematical reasoning and AI model training.

Size: 5M
Items: 1
View on Hugging Face

CodeX

Synthetic Data · Reasoning · Code

A massive collection of pre-curated coding datasets by XenArcAI, featuring over 9 million samples. Includes 'CodeX-2M-Thinking' with step-by-step reasoning for instruction tuning and 'CodeX-7M-Non-Thinking' for raw pattern learning.

Samples: 9.5M+
Datasets: 2
Size: 10GB+
View on Hugging Face

AIRealNet

AI Detection · Computer Vision

A binary image classifier that distinguishes AI-generated images from real, human-captured photographs. It is built on Microsoft's SwinV2-Tiny (Swin Transformer V2) architecture for high accuracy and efficient deployment, and it prioritizes privacy by excluding personal data from its training set.

Downloads: 10k/month
Parameters: 0.2B
View on Hugging Face
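A binary classifier like this usually ends in a score per class that is turned into a decision. The following is a minimal sketch of that last step, assuming the classification head emits two logits with index 0 meaning "AI-generated". The label names, their order, and the 0.5 threshold are hypothetical; AIRealNet's actual label mapping and recommended threshold are not stated here.

```python
import math

# Hypothetical label order; the real model's id-to-label mapping may differ.
LABELS = ("artificial", "real")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Map two-class logits to (label, probability that the image is AI-generated)."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    p_ai = softmax(logits)[0]  # assumes index 0 is the AI-generated class
    label = LABELS[0] if p_ai >= threshold else LABELS[1]
    return label, p_ai


print(classify([2.0, -1.0]))  # label "artificial", p_ai ≈ 0.953
print(classify([-1.0, 2.0]))  # label "real", p_ai ≈ 0.047
```

Raising `threshold` flags fewer real photographs as AI-generated, at the cost of missing more AI-generated images.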

XenArcAI

Advancing AI research through innovation, ethics, and collaboration for a better future.

© 2025 XenArcAI. All rights reserved.
