
NeuroName: Domain-Specific AI Architecture for Creative Name Generation

License: MIT | Python 3.9+ | PyTorch

🧠 What is NeuroName?

NeuroName is a purpose-built neural architecture for generating creative, novel names for brands, YouTube channels, social media handles, products, and more. Unlike generic LLMs that produce obvious word combinations, NeuroName creates genuinely new words that:

  • Sound natural and pronounceable
  • Evoke intended meanings without being literal
  • Are controllable (length, style, language feel, energy)
  • Are truly novel: not existing words or obvious compounds

🔬 Why Current LLMs Fail at Creative Naming

| Problem | Why It Happens | NeuroName Solution |
| --- | --- | --- |
| Too generic | LLMs predict probable tokens from the training distribution | Character-level VAE generates outside known distributions |
| Obvious combinations | Token-level = existing word chunks | Char-level latent space enables smooth morphological blending |
| No sound awareness | No phonotactic model | Dedicated Phonotactic Discriminator scores pronounceability |
| Can't be truly novel | Constrained to recombine training tokens | VAE latent interpolation creates genuinely new sequences |
| No fine control | Prompt engineering is imprecise | Energy-based composable attribute control in latent space |
| RLHF kills creativity | Safety alignment → conservative outputs | No RLHF; creativity is the objective function |

🏗️ Architecture Overview

Input: semantic_hints + control_params (length, style, language_feel, energy)
                   │
                   ▼
    ┌─────────────────────────────┐
    │   Semantic Encoder          │  ← Transformer encodes meaning hints
    │   (attention-pooled)        │
    └──────────────┬──────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │   Conditional Prior         │  ← P(z|semantics, controls) - Gaussian
    │   Network (μ, σ learned)    │
    └──────────────┬──────────────┘
                   │
                   ▼ z ~ N(μ, σ²)
    ┌─────────────────────────────┐
    │   Latent Space + EBM        │  ← Energy-based attribute composition
    │   (ODE-guided sampling)     │
    └──────────────┬──────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │   Character Decoder         │  ← Transformer generates char-by-char
    │   (cross-attends to z)      │
    └──────────────┬──────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │   Phonotactic Validator     │  ← CNN+Transformer scores sound quality
    └──────────────┬──────────────┘
                   │
                   ▼
         Generated Name: "Velocix" ✓
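
The sampling step in the diagram, z ~ N(μ, σ²), is commonly implemented with the reparameterization trick, z = μ + σ·ε with ε ~ N(0, 1). The following is a minimal pure-Python sketch of that step only; the function name `sample_latent` and the 3-dimensional example are illustrative assumptions, and the actual model presumably does this on PyTorch tensors.

```python
import random

def sample_latent(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) elementwise via z = mu + sigma * eps."""
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)          # seeded so the example is repeatable
mu = [0.5, -1.0, 2.0]           # hypothetical prior mean (3-d for brevity)
z_det = sample_latent(mu, [0.0, 0.0, 0.0], rng)  # sigma = 0 collapses to the mean
z = sample_latent(mu, [0.1, 0.1, 0.1], rng)      # small sigma stays near the mean
```

Raising σ spreads samples further from the prior mean, which is one way a sampler can trade fidelity to the hints for variety.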

🧬 Key Innovations

1. Character-Level VAE (not token-level)

Operates on individual characters rather than subword tokens, so the decoder can emit character sequences that a subword vocabulary would make unlikely or awkward to produce.
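
To make the character-level view concrete, here is a minimal sketch of a 44-symbol vocabulary matching the "lowercase + digits + special" description under Technical Details. The eight special symbols chosen here are an assumption (the source does not list them), as are the helper names `encode` and `decode`.

```python
import string

# Hypothetical special symbols; the README only says "special".
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>", "-", "_", ".", " "]
VOCAB = SPECIALS + list(string.ascii_lowercase) + list(string.digits)
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
MAX_LEN = 32  # maximum name length stated in this README

def encode(name):
    """Map a name to ids: <sos>, one id per character, <eos>."""
    body = [CHAR_TO_ID.get(ch, CHAR_TO_ID["<unk>"]) for ch in name.lower()[:MAX_LEN]]
    return [CHAR_TO_ID["<sos>"]] + body + [CHAR_TO_ID["<eos>"]]

def decode(ids):
    """Invert encode(), dropping the control symbols."""
    control = {"<pad>", "<sos>", "<eos>", "<unk>"}
    return "".join(VOCAB[i] for i in ids if VOCAB[i] not in control)

ids = encode("Velocix")
roundtrip = decode(ids)
```

Because every letter is its own symbol, any pronounceable string is reachable one character at a time, with no dependence on how a tokenizer happened to split the training words.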

2. Phonotactic Discriminator

A learned model of sound combinations (bigrams, trigrams, syllable structure), informed by the bouba/kiki effect and cross-linguistic phonotactics. It scores candidates so that outputs stay pronounceable and pleasant-sounding.
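
The idea of scoring sound combinations can be illustrated with a much simpler stand-in than the CNN+Transformer discriminator described here: a smoothed character-bigram model. This sketch is not the project's scorer; the function names, the toy corpus, and the `alphabet_size` value are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram_model(names):
    """Count character bigrams (with ^ and $ boundary markers) over a corpus."""
    counts, context = Counter(), Counter()
    for name in names:
        padded = "^" + name.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return counts, context

def pronounceability(name, model, alphabet_size=28, alpha=1.0):
    """Average log-probability per bigram with add-alpha smoothing (higher is better)."""
    counts, context = model
    padded = "^" + name.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = sum(
        math.log((counts[(a, b)] + alpha) / (context[a] + alpha * alphabet_size))
        for a, b in pairs
    )
    return total / len(pairs)

# Toy corpus, invented for this example only.
corpus = ["velora", "nexura", "tervon", "lumina", "solara", "marena", "fluxen"]
model = train_bigram_model(corpus)
good = pronounceability("velura", model)   # built from bigrams seen in the corpus
bad = pronounceability("xqzvrk", model)    # built from unseen bigrams
```

A name assembled from familiar bigrams scores higher than one made of unseen clusters, which is the same signal a learned discriminator would be expected to capture with far more context.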

3. Morphological Composition Module

Explicit linguistic word-formation operations as differentiable modules:

  • Blending: "breakfast + lunch → brunch" style merging
  • Affixation: Meaningful prefix/suffix attachment
  • Vowel Harmony: Sound shifting for cohesion
  • Clipping + Extension: Shortening with style
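
As a plain symbolic illustration of the blending operation (not the differentiable module itself), the sketch below joins the onset of one word to the remainder of another. The rule used, splitting each word at its first vowel letter, is an assumption chosen because it reproduces the "brunch" example; real blends follow looser patterns.

```python
VOWELS = set("aeiou")

def first_vowel_index(word):
    """Index of the first vowel letter, or len(word) if there is none."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return i
    return len(word)

def blend(left, right):
    """Join the letters before `left`'s first vowel to `right` from its first vowel on."""
    left, right = left.lower(), right.lower()
    onset = left[:first_vowel_index(left)]
    rest = right[first_vowel_index(right):]
    return onset + rest

print(blend("breakfast", "lunch"))  # brunch
print(blend("smoke", "fog"))        # smog
print(blend("motor", "hotel"))      # motel
```

A differentiable version would replace the hard split points with learned, soft alignments, but the input/output behaviour it aims for is the same.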

4. Energy-Based Composable Control

Multiple attributes (style, length, language feel) are composed via energy functions in latent space. The control is mathematically principled rather than prompt hacking.
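
One common way to compose attributes with energy functions is to sum per-attribute energies and follow the negative gradient of the total. The sketch below assumes simple quadratic energies E_i(z) = ½‖z − c_i‖² and plain Euler integration of dz/dt = −Σ w_i ∇E_i(z); the attribute centers, weights, and function names are hypothetical, and the project's learned energies and ODE sampler are likely more elaborate.

```python
def quadratic_energy_grad(z, center):
    """Gradient of E(z) = 0.5 * ||z - center||^2 with respect to z."""
    return [zi - ci for zi, ci in zip(z, center)]

def compose_and_descend(z0, attributes, step=0.1, n_steps=200):
    """Euler-integrate dz/dt = -sum_i w_i * grad E_i(z).

    `attributes` is a list of (weight, center) pairs, one per attribute.
    """
    z = list(z0)
    for _ in range(n_steps):
        grad = [0.0] * len(z)
        for weight, center in attributes:
            g = quadratic_energy_grad(z, center)
            grad = [acc + weight * gi for acc, gi in zip(grad, g)]
        z = [zi - step * gi for zi, gi in zip(z, grad)]
    return z

attributes = [
    (1.0, [1.0, 0.0]),   # hypothetical "modern" style target
    (0.5, [0.0, 2.0]),   # hypothetical "energetic" target, half the weight
]
z_final = compose_and_descend([0.0, 0.0], attributes)
```

With quadratic energies the trajectory settles at the weight-averaged center, here (2/3, 2/3), so changing a weight shifts the result smoothly toward or away from that attribute.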

5. Sound Symbolism Integration

Phoneme-meaning associations baked into the architecture:

  • Plosives (b, d, k, t): Power, strength → "Kodak", "TikTok"
  • Fricatives (f, s, sh, v): Speed, elegance → "Swift", "Visa"
  • Nasals (m, n): Warmth, comfort → "Amazon", "Nintendo"
  • Close vowels (i, e): Precision, tech → "Pixel", "Wii"
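
The associations above can be turned into a rough per-name profile by counting letters in each class. This is a letter-level approximation under stated assumptions: the class table simply mirrors the list above, "sh" is not treated as a digraph, and `sound_profile` is a hypothetical helper rather than part of the package.

```python
from collections import Counter

# Letter sets copied from the list above; spelling stands in for phonemes.
SOUND_CLASSES = {
    "plosive": set("bdkt"),
    "fricative": set("fsv"),   # "sh" is approximated by its letter s
    "nasal": set("mn"),
    "close_vowel": set("ie"),
}

def sound_profile(name):
    """Count how many letters of `name` fall into each sound class."""
    counts = Counter()
    for ch in name.lower():
        for label, letters in SOUND_CLASSES.items():
            if ch in letters:
                counts[label] += 1
    return dict(counts)

kodak = sound_profile("Kodak")   # three plosive letters: k, d, k
visa = sound_profile("Visa")     # v and s are fricatives, i is a close vowel
```

Such a profile could serve as a cheap feature or sanity check alongside a learned model, for example to confirm that a name requested as "bold" is not dominated by nasals.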

📦 Installation

pip install torch numpy pyyaml tqdm
git clone https://huggingface.co/asdf98/neuroname
cd neuroname
pip install -e .

🚀 Quick Start

from neuroname import NeuroNameGenerator

# Initialize generator
generator = NeuroNameGenerator()

# Generate brand names with semantic hints
names = generator.generate(
    semantic_hints=["speed", "technology", "future"],
    style="modern",        # modern/classic/playful/techy/organic/elegant/bold/minimal
    language_feel="latin", # english/latin/greek/japanese/nordic/spanish/french/abstract
    energy="energetic",    # calm/neutral/energetic
    length_range=(5, 8),
    num_names=10,
    temperature=0.8
)
print(names)
# ['Velocix', 'Tervon', 'Nexura', 'Fluxen', 'Zyphos', ...]

# Generate YouTube channel names
names = generator.generate(
    semantic_hints=["gaming", "adventure", "epic"],
    style="playful",
    language_feel="english",
    energy="energetic",
    length_range=(6, 12),
    num_names=10
)

# Generate social media handles
names = generator.generate(
    semantic_hints=["art", "minimal", "aesthetic"],
    style="elegant",
    language_feel="french",
    energy="calm",
    length_range=(4, 8),
    num_names=10
)

🏋️ Training

# Train from scratch
python train.py --config configs/default.yaml

# Train with custom data
python train.py --data_path your_names.txt --epochs 100
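
The format expected for `your_names.txt` is not documented here. The sketch below assumes one name per line, restricted to lowercase letters and digits and to the 32-character maximum stated in this README; `clean_names` is a hypothetical helper for preparing such a file, not part of the package.

```python
import string

ALLOWED = set(string.ascii_lowercase + string.digits)
MAX_LEN = 32  # maximum name length stated in this README

def clean_names(raw_names):
    """Lowercase and strip each name, drop invalid ones, and dedupe in order."""
    seen, cleaned = set(), []
    for raw in raw_names:
        name = raw.strip().lower()
        if not name or len(name) > MAX_LEN:
            continue
        if any(ch not in ALLOWED for ch in name):
            continue
        if name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned

names = clean_names(["Velocix\n", "velocix", "Nex ura", "", "Tervon"])
# One name per line is an assumed file format, not one the source specifies.
file_text = "\n".join(names) + "\n"
```

Writing `file_text` to disk would give a file to pass via `--data_path`, provided the assumed format matches what `train.py` actually reads.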

📁 Repository Structure

neuroname/
├── README.md                   # This file
├── pyproject.toml              # Package configuration
├── neuroname/
│   ├── __init__.py             # Package exports
│   ├── model.py                # Core architecture (VAE + all components)
│   ├── generator.py            # High-level generation interface
│   ├── phonotactics.py         # Phonotactic scoring & sound symbolism
│   ├── morphology.py           # Morphological composition operations
│   ├── latent_ops.py           # Energy-based latent space control
│   ├── data.py                 # Dataset & data loading utilities
│   └── config.py               # Configuration management
├── train.py                    # Training script
├── configs/
│   └── default.yaml            # Default training configuration
└── notebooks/
    └── demo.ipynb              # Interactive demonstration

📊 Sound Symbolism Research Basis

Our architecture is grounded in linguistic research on sound-meaning associations:

| Phoneme Type | Associations | Example Brands |
| --- | --- | --- |
| Voiced plosives (b, g, d) | Strong, bold, grounded | Bose, Google, Dell |
| Voiceless plosives (p, t, k) | Sharp, precise, clean | PayPal, Tesla, Kodak |
| Fricatives (f, v, s, z) | Fast, flowing, futuristic | Visa, Zara, Spotify |
| Nasals (m, n) | Warm, nurturing, smooth | Amazon, Nintendo |
| Liquids (l, r) | Fluid, dynamic, premium | Lexus, Rolex |
| High vowels (i, ee) | Small, quick, technical | Pixel, Wii |
| Low vowels (a, o) | Big, open, powerful | Apple, Volvo |

🔧 Technical Details

  • Model Size: ~15M parameters (intentionally small: domain-specific, not general)
  • Latent Dimension: 128
  • Character Vocabulary: 44 chars (lowercase + digits + special)
  • Max Name Length: 32 characters
  • Training: ELBO loss + phonotactic reward + attribute classification
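
The training objective listed above can be sketched as a negative ELBO plus two auxiliary terms. The KL term below is the closed form between two diagonal Gaussians, which fits a learned posterior measured against the conditional prior. How the terms are weighted is not stated in the source, so `beta`, `reward_weight`, `attr_weight`, and both function names are assumptions.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total

def total_loss(recon_nll, kl, phon_score, attr_ce,
               beta=1.0, reward_weight=0.1, attr_weight=0.1):
    """Negative ELBO plus auxiliary terms; a higher phonotactic score lowers the loss.

    The three weights are hypothetical defaults, not values from the project.
    """
    return recon_nll + beta * kl - reward_weight * phon_score + attr_weight * attr_ce

kl_same = kl_diag_gaussians([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])  # identical Gaussians
kl_shift = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])                     # unit mean shift
loss = total_loss(recon_nll=12.0, kl=kl_shift, phon_score=0.8, attr_ce=0.5)
```

The KL is zero when posterior and prior coincide and grows with the squared mean shift, so `beta` controls how strongly the latent code is pulled toward what the semantic hints and controls predict.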

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Architecture inspired by:

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asdf98/neuroname"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
