How long do you test each AI companion platform?

We spend a minimum of 7 days actively testing each platform. This includes daily conversations, testing all features, and evaluating both free and premium tiers. Some complex platforms get 10-14 days of testing.

Do affiliate commissions affect your ratings?

No. Our ratings are based solely on testing results. Whether a platform pays us a high commission, a low one, or nothing at all, our ranking is identical. Some platforms in our reviews have no affiliate program; they get the same treatment as the ones that pay us. Our reputation depends on honest recommendations, and bad platforms get bad reviews regardless of commission rates.

How often do you update your reviews?

We re-test top platforms every 3 months and update reviews whenever significant changes occur (pricing changes, new features, policy updates). Community scores refresh every two weeks (1st and 15th of each month). Every article shows its last update date.

What makes your testing different from other review sites?

Two things. First, we spend 7+ days with each platform, not 30 minutes, and we pay for premium features with real money. Second, we are the only review site that combines editor testing with community sentiment analysis. Every two weeks we analyze hundreds of Reddit threads, YouTube reviews, and Google Play ratings to generate Community Scores alongside our Editor Scores. This dual-score system catches issues that short-term testing misses.

Do you accept payment for reviews?

No. We never accept payment for reviews or allow platforms to influence our ratings. Our only revenue comes from affiliate commissions when readers click our links and subscribe - which only happens if our recommendations are trustworthy.

How do I cite your community pulse data?

Use the analysis date shown on the relevant review page or the dated snapshot URL for the dataset. Cite as: "AI Companion Picker, Community Pulse Methodology v1.1, [data as of YYYY-MM-DD]." Methodology version plus the data-as-of date together specify exactly what was true at citation time, even if the methodology is revised later.

Can I get the raw data or audit your methodology?

The classifier prompt is published in full on this page (see the collapsible block under How We Analyze). Source data — Reddit threads, YouTube videos, Google Play reviews — is public and can be retrieved using the parameters listed in the methodology section. For deeper audit access (collector scripts, sample input/output pairs, classifier prompt revisions), email contact@aicompanionpicker.com — we can share with researchers and journalists on request.

About Us & How We Rate AI Companions - Dual-Score Methodology

Why This Site Exists

The AI companion space is exploding, but finding honest information is hard. Most "reviews" are thinly veiled ads. We started AI Companion Picker to provide genuine, experience-based recommendations.

But we realized that even honest editor reviews only tell part of the story. A platform might test well over 7 days but frustrate users after months of use. That's why we built something no other review site has: a dual-score system that combines our hands-on editor testing with community sentiment aggregated from Reddit, YouTube, and Google Play every two weeks. See all community ratings.

Meet the Team

Nolan Voss

Lead Editor & AI Companion Reviewer

I started testing AI companions out of curiosity and quickly realized how hard it was to find trustworthy information. Most sites either push whatever pays the most commission or are so vague they're useless. I built this site to share what I actually learned through real testing.

200+ hours testing

7 platforms reviewed

Nadia Laurent

AI Companion Reviewer

I've spent weeks getting to know the most popular AI boyfriend apps — the charming ones, the awkward ones, and everything in between. My reviews are honest, personal, and judgment-free.

100+ hours testing

6 platforms reviewed

Our Dual-Score System

Every platform on AI Companion Picker receives two independent ratings: an Editor Score from hands-on testing and a Community Score aggregated from Reddit, YouTube, and Google Play.

Why Two Scores?

A 7-day editor test shows how a platform performs under ideal conditions. But real users deal with billing issues, long-term memory degradation, customer support wait times, and pricing changes that only surface over months. Our dual-score system catches what short-term testing misses.

The Editor Score is explained first below. The Community Score methodology — data sources, classifier, confidence ratings, sample sizes, limitations — is documented in detail in Community Pulse Scoring further down this page.

The divergence tells the story. When our editor score and community score are close, you can be confident in both. When they disagree, pay attention to why. For example, Candy AI scores 4.5 from editors but 3.2 from the community. The gap? Pricing transparency and conversation depth over time. That's information you can't get from either score alone. See all dual scores.

Editor Testing: Why 7 Days Minimum

We spend a minimum of 7 days testing each platform. This isn't a quick 30-minute trial - we use each platform daily to understand how it truly performs over time.

Why 7 days? Because first impressions can be misleading. Some platforms impress initially but become repetitive after a few days. Others start slow but reveal depth over time. A week of daily use reveals the true experience.

Our Daily Testing Routine

Morning: Continue existing conversations, test context retention
Afternoon: Try new scenarios, test specific features
Evening: Document observations, note any issues

Our Scoring System: 4 Categories, 100 Points

Each platform receives a score out of 100, broken down into four weighted categories. Here's exactly how we calculate ratings:

Category	Weight	What We Measure
Conversation Quality	30%	Natural dialogue, context retention, character consistency
Value for Money	25%	Free tier, pricing fairness, hidden costs
Features	25%	Image generation, customization, voice, memory
Ease of Use	20%	Onboarding, interface, mobile experience
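To make the arithmetic concrete, here is a minimal Python sketch of how four category scores add up to a total out of 100. The category maximums come from the table above; the function name `editor_total` and the example point values are hypothetical and not taken from any of our reviews.

```python
# Maximum points per category, taken from the weights table above.
CATEGORY_MAX = {
    "conversation_quality": 30,
    "value_for_money": 25,
    "features": 25,
    "ease_of_use": 20,
}


def editor_total(points: dict) -> float:
    """Sum the four category scores into a total out of 100.

    Raises ValueError if a category is missing or outside its range.
    """
    total = 0.0
    for category, maximum in CATEGORY_MAX.items():
        if category not in points:
            raise ValueError(f"missing category: {category}")
        value = points[category]
        if not 0 <= value <= maximum:
            raise ValueError(f"{category} must be between 0 and {maximum}")
        total += value
    return total


# Hypothetical example: a platform landing in the "Good" band in most categories.
example = {
    "conversation_quality": 24,
    "value_for_money": 20,
    "features": 19,
    "ease_of_use": 15,
}
print(editor_total(example))  # 78.0
```

Because each category is capped at its weight, the weights act as point budgets: no separate multiplication step is needed.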

Category 1: Conversation Quality (30%)

This is the most important category because it's the core product. An AI companion that can't hold a good conversation fails at its primary purpose.

What We Test

Natural dialogue flow: Does it feel like talking to a character, or a chatbot? We test with varied conversation styles - casual chat, emotional support, creative roleplay.
Context retention: Does it remember what you discussed yesterday? Last week? We deliberately reference past conversations to test memory.
Character consistency: Does the AI stay in character? We test by trying to "break" the character with contradictory prompts.
Response variety: Does it give the same responses repeatedly, or genuinely varied answers? Repetition is a major quality killer.
Emotional intelligence: Can it detect and respond appropriately to emotional cues? We test with both positive and challenging scenarios.

Scoring Rubric

27-30 points: Exceptional - conversations feel genuinely engaging, excellent memory, zero repetition
22-26 points: Good - natural flow with minor issues, decent memory
17-21 points: Average - functional but noticeably AI-like, limited memory
12-16 points: Below average - repetitive, poor context retention
0-11 points: Poor - feels like a basic chatbot, breaks character frequently

Category 2: Value for Money (25%)

We evaluate whether you're getting fair value at each price point. The cheapest option isn't always the best value, and the most expensive isn't always worth it.

What We Evaluate

Free tier generosity: How much can you actually do for free? Some platforms offer 50+ free messages; others limit you to 5.
Premium pricing: Is the monthly cost reasonable for what you get? We compare against industry averages ($10-30/month).
Feature-to-price ratio: Do premium features justify the upgrade cost?
Hidden costs: Are there surprise credit systems, upsells, or paywalled features not mentioned upfront?
Refund policy: Can you get your money back if unsatisfied?

Scoring Rubric

23-25 points: Excellent value - generous free tier, fair pricing, no hidden costs
18-22 points: Good value - reasonable pricing with minor limitations
13-17 points: Average - standard industry pricing, some restrictions
8-12 points: Overpriced - features don't justify cost
0-7 points: Poor value - aggressive upselling, hidden fees, restrictive limits

Category 3: Features (25%)

Beyond conversation, what else does the platform offer? We test every major feature thoroughly.

Features We Test

Image generation: Quality, speed, customization options, and how well images match the character.
Character customization: How much control do you have over appearance, personality, and backstory?
Voice features: Quality of voice chat or voice messages, if available.
Memory system: Can you add facts, preferences, and relationship history that the AI remembers?
Platform stability: Uptime, loading speeds, error frequency.

Scoring Rubric

23-25 points: Feature-rich - excellent images, deep customization, voice, robust memory
18-22 points: Good features - most expected features work well
13-17 points: Basic - conversation-focused with limited extras
8-12 points: Limited - missing key features competitors offer
0-7 points: Barebones - text chat only, no customization

Category 4: Ease of Use (20%)

A great platform should be intuitive. You shouldn't need a tutorial to figure out basic features.

What We Evaluate

Onboarding: How quickly can a new user start chatting? Is account creation painless?
Interface clarity: Are features easy to find? Is navigation logical?
Mobile experience: Does it work well on phones? Is there an app?
Account management: Easy to upgrade, downgrade, or cancel?
Help resources: Documentation, FAQ, customer support quality.

Scoring Rubric

18-20 points: Excellent UX - intuitive, fast, works great everywhere
14-17 points: Good UX - easy to use with minor friction
10-13 points: Average - functional but could be improved
6-9 points: Confusing - hard to find features, poor mobile experience
0-5 points: Frustrating - buggy, slow, actively impedes use

Our Testing Process: Day by Day

Here's exactly what happens during our 7-day testing period:

Days 1-2: First Impressions

Create account, test onboarding flow
Explore free tier limits
Create 2-3 different characters
Test basic conversations
Document first impressions

Days 3-4: Premium Features

Upgrade to premium tier (with real money)
Test all premium features
Try image generation (if available)
Test voice features (if available)
Evaluate value vs. free tier

Days 5-6: Stress Testing

Long conversations (1000+ messages)
Complex roleplay scenarios
Test context retention from Day 1
Try to "break" the AI
Test edge cases and limitations

Day 7+: Analysis & Writing

Review all notes and observations
Calculate scores for each category
Compare against tested competitors
Write honest review with specific examples
Document pricing accurately

Community Pulse Scoring

Editor scores tell you how a platform performs under controlled testing. Community scores tell you how it performs in the wild — what real users say across Reddit, YouTube, and Google Play. This section documents exactly how community scores are produced, so you can verify the work, cite specific snapshots, and understand the limits of what these numbers mean.

Coverage and Framing

This is an early-period descriptive snapshot of an ongoing time series that tracks AI companion platforms across sources. Coverage to date spans only the first months of that series; longitudinal trend claims will be appropriate once 12+ months of data accumulate (≥ Q1 2027). Until then we frame month-over-month changes as preliminary observations, not established patterns. Source-collection cadence and full methodology are documented in the sections below.

Editor Score: Nature and Update Cadence

The editor score is a 0.0 to 5.0 score from hands-on testing (see "Editor Testing" above for the testing protocol and rubric). It is static: it does not vary month-to-month with the community pulse cadence.

An editor score updates only on a formal re-test, typically on a 3-month cadence and sometimes triggered by major platform changes (significant feature releases, large pricing changes, policy shifts). When a re-test changes the score, the new value replaces the old, and the "as of" date moves forward.

Where to find it:

Each platform's review page on aicompanionpicker.com
In the open dataset: platform_metadata.csv (one row per platform) with columns editor_score and editor_score_as_of
Not in the time-series CSV (relocated in v1.1; shipping a static value alongside monthly data was misleading)

Why this matters: leaving editor_score in the longitudinal CSV implied it varied month-to-month. Relocating it and dating it with editor_score_as_of represents what it actually is: a static editorial assessment attached to a specific review snapshot.

What We Collect

Three public data sources, all collected without paid access:

Reddit — public discussions in 16 broad subreddits (r/CharacterAI, r/replika, r/aigirlfriend, r/ChatGPT, r/ChatbotRefugees, r/Chatbots, r/AICompanions, r/NSFWCharacterAI, r/Nomi, r/kindroid, r/singularity, r/ArtificialIntelligence, r/PygmalionAI, r/SillyTavernAI, r/lonely, r/socialanxiety) plus platform-specific subreddits where they exist. Per-platform search covers the platform's name and known aliases.
YouTube — review videos and creator comments via the YouTube Data API. Both video transcripts (where available) and comment threads are analyzed.
Google Play Store — user reviews via the google-play-scraper library (no API key required). Currently covers four platforms with verified app IDs: Replika, Anima AI, Candy AI, and Paradot.

Per-platform data source coverage

Platform	Reddit subreddit	Google Play app	Editor	Community
Replika	r/replika	`ai.replika.app`	3.8	2.1 (n=615)
Kindroid	r/kindroid	— (web only)	4.3	3.4 (n=15)
Nomi AI	r/Nomi	— (web only)	4.4	— (n=15, low confidence)
Anima AI	— (broad sweep only)	`anima.virtual.ai.robot.friend`	4.1	3.0 (n=74)
Candy AI	— (broad sweep only)	`io.candy.android.app`	4.5	2.4 (n=120)
Paradot	— (broad sweep only)	`com.withfeelingai.test`	3.9	1.7 (n=184)
CrushOn AI	— (broad sweep only)	— (web only)	4.3	— (n=15, low confidence)
DreamGF	— (broad sweep only)	— (web only)	4.2	— (n=15, low confidence)
Fantasy AI	— (broad sweep only)	— (web only)	3.8	— (insufficient data)
FantasyGF	— (broad sweep only)	— (web only)	4.0	— (insufficient data)
JuicyChat AI	— (broad sweep only)	— (web only)	3.7	— (insufficient data)
Kupid AI	— (broad sweep only)	— (web only)	3.6	— (n=15, low confidence)
Nectar AI	— (broad sweep only)	— (web only)	4.1	— (n=7, low confidence)

Community scores shown are from the most recent pipeline run. Per-platform analysis (top quotes, aspect breakdown, alignment vs editor) is on each platform's review page.

Honest disclosure: seven platforms (CrushOn AI, DreamGF, FantasyGF, Fantasy AI, JuicyChat AI, Kupid AI, Nectar AI) have neither a dedicated subreddit nor a verified Google Play app. They rely on cross-subreddit mentions plus YouTube, which often produces too few datapoints to score reliably. When signal is too sparse we report a community score of zero (insufficient data) rather than fabricate one.

How We Collect

The pipeline runs bi-weekly (1st and 15th of each calendar month, 10:00 CEST). Each run produces one snapshot per platform, written as static JSON to the public site repo (typically deployed within 5–10 minutes of pipeline completion). Specific parameters per source:

Source	Look-back window	Filters	Caps
Reddit	90 days	Min 2 upvotes; deduped by permalink	100 threads / platform; 50 comments / thread
YouTube	180 days	Min 500 views	15 videos / platform; 20 comments / video
Google Play	180 days (~6 months)	All public reviews	1,000 reviews / platform
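As an illustration of the Reddit row, the following Python sketch applies the look-back window, upvote floor, permalink dedup, and thread cap from the table. It is not our collector script: the `Thread` type, the function name, the permalinks, and the choice to keep the first threads in input order when the cap is reached are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Thread:  # hypothetical record type for one collected thread
    permalink: str
    upvotes: int
    created: datetime


def filter_reddit_threads(threads, now, window_days=90, min_upvotes=2, cap=100):
    """Apply the window, upvote floor, dedup, and cap from the table above."""
    cutoff = now - timedelta(days=window_days)
    seen = set()
    kept = []
    for thread in threads:
        if thread.created < cutoff or thread.upvotes < min_upvotes:
            continue
        if thread.permalink in seen:  # dedupe by permalink
            continue
        seen.add(thread.permalink)
        kept.append(thread)
        if len(kept) == cap:  # assumption: keep the first `cap` in input order
            break
    return kept


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
sample = [
    Thread("/r/replika/a", 5, now - timedelta(days=10)),
    Thread("/r/replika/a", 5, now - timedelta(days=10)),   # duplicate permalink
    Thread("/r/replika/b", 1, now - timedelta(days=3)),    # below upvote floor
    Thread("/r/replika/c", 9, now - timedelta(days=120)),  # outside 90-day window
]
print([t.permalink for t in filter_reddit_threads(sample, now)])  # ['/r/replika/a']
```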

How We Analyze

Collected data is passed to an AI classifier — Anthropic's Claude Haiku 4.5 model (claude-haiku-4-5-20251001) — which scores each platform across five fixed aspects on a 1.0–5.0 scale: Character Quality, Pricing & Value, Image/Media Generation, Privacy & Safety, and Customer Support. The classifier is instructed to use only the provided source data, never hallucinate quotes, preserve typos in quotations, anonymize Reddit usernames, and produce only valid JSON output.

The overall Community Score is a weighted average of the five aspect scores — aspects with more underlying data are weighted more heavily. Aspects flagged as "insufficient data" are excluded from the average.
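In production the classifier performs this weighting itself, following the prompt, and the exact weights are not published. The Python sketch below only illustrates the idea under one assumption: that each aspect's weight is the number of datapoints behind it. The function name and all numbers are hypothetical.

```python
def community_score(aspects):
    """Weighted average of aspect scores, skipping insufficient-data aspects.

    `aspects` maps aspect name -> (score, datapoint_count). A score of 0
    marks "insufficient data" and is excluded from the average.
    Assumption: the weight of an aspect is its datapoint count.
    """
    scored = [(score, n) for score, n in aspects.values() if score > 0 and n > 0]
    if not scored:
        return None  # nothing to average
    total_weight = sum(n for _, n in scored)
    return round(sum(score * n for score, n in scored) / total_weight, 1)


# Hypothetical aspect scores and datapoint counts.
example = {
    "Character Quality": (4.0, 60),
    "Pricing & Value": (2.0, 30),
    "Image/Media Generation": (3.0, 10),
    "Privacy & Safety": (0, 0),   # insufficient data, excluded
    "Customer Support": (0, 0),   # insufficient data, excluded
}
print(community_score(example))  # 3.3
```

Here the two excluded aspects do not pull the average toward zero: (4.0 × 60 + 2.0 × 30 + 3.0 × 10) / 100 = 3.3.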

Show full classifier prompt (verbatim)

The exact prompt template used in production. Placeholders like {platform_name} and {reddit_data} are populated at runtime from the collected source data.

You are an expert AI companion platform analyst. You are analyzing community sentiment data from Reddit and YouTube about a specific AI companion platform.

## Platform
- **Name**: {platform_name}
- **Editor Overall Score**: {editor_score}/5 (this is the overall rating, NOT per-aspect)
- **Editor Pros**: {editor_pros}
- **Editor Cons**: {editor_cons}

NOTE: The editor_score in each aspect output should be the OVERALL editor score ({editor_score}), since we don't have per-aspect editor scores. This lets readers compare the community's per-aspect view against the editor's overall assessment.

## Your Task

Analyze the Reddit threads/comments and YouTube reviews/comments below. Produce a structured Community Pulse analysis.

## Analysis Rules

1. **Only use data from the provided sources.** Never hallucinate quotes, threads, or sentiments.
2. **Score each aspect 1.0-5.0** based on community sentiment (not your opinion). Use 0.1 increments.
3. **Select real quotes** from the data — copy them exactly as written (typos and all).
4. **Compare community sentiment to the editor score** — note where they align and diverge.
5. **Detect trends** — are recent posts (last 30 days) more positive or negative than older ones?
6. **Anonymize Reddit users** — never include usernames, just "Reddit user in r/[subreddit]".
7. **YouTube creators** — use channel name (they're public figures).

## Fixed Aspects to Analyze

Score ALL five aspects. If there's insufficient data for an aspect, score it 0 and note "insufficient data".

1. **Character Quality** — personality, memory, conversation depth, consistency
2. **Pricing & Value** — cost, plans, value-for-money, billing practices
3. **Image/Media Generation** — quality, consistency, customization, speed
4. **Privacy & Safety** — data handling, content policies, trust signals
5. **Customer Support** — responsiveness, issue resolution, billing help

## Output Format

Return ONLY valid JSON (no markdown, no explanation outside the JSON):

```json
{
  "community_score": 3.6,
  "aspects": [
    {
      "name": "Character Quality",
      "community_score": 4.2,
      "editor_score": 4.5,
      "alignment": "agree",
      "summary": "One sentence summarizing community sentiment on this aspect."
    }
  ],
  "top_quotes": [
    {
      "text": "Exact quote from the data",
      "source": "reddit",
      "context": "r/CharacterAI, 47 upvotes, 2026-02-15",
      "sentiment": "positive"
    },
    {
      "text": "Exact quote from video transcript or comment",
      "source": "youtube",
      "context": "Channel Name, 45K views, 2026-01-20",
      "sentiment": "negative"
    },
    {
      "text": "Exact quote from Google Play review",
      "source": "google_play",
      "context": "Google Play, 3/5 stars, 5 helpful, 2026-03-10",
      "sentiment": "mixed"
    }
  ],
  "consensus": "2-3 sentence synthesis of where editor and community agree and disagree. Include specific data points.",
  "trends": {
    "direction": "improving",
    "signal": "Brief explanation of what's changing and why."
  },
  "data_quality": {
    "reddit_threads_useful": 3,
    "youtube_videos_useful": 8,
    "confidence": "medium",
    "notes": "Any caveats about data quality or gaps."
  }
}
```

## Alignment values
- "agree" — scores within 0.5 of each other
- "community_higher" — community scores 0.5+ higher
- "community_harsher" — community scores 0.5+ lower
- "major_gap" — scores differ by 1.0+
- "insufficient_data" — not enough data to score

## Trend directions
- "improving" — recent sentiment more positive than older
- "declining" — recent sentiment more negative
- "stable" — no meaningful change
- "insufficient_data" — not enough temporal spread to determine

## Confidence levels
- "high" — 10+ Reddit threads AND 5+ YouTube videos with relevant content, OR strong data across all 3 sources
- "medium" — 5+ threads OR 3+ videos OR 20+ app store reviews with relevant content
- "low" — fewer than 5 threads AND fewer than 3 videos AND fewer than 20 app store reviews

## Top quotes
Select 3-6 of the most insightful, specific quotes. Prioritize:
- Quotes with specific claims (pricing, features, comparisons)
- Quotes with high engagement (upvotes, likes, views, helpful votes)
- Mix of positive and negative sentiment
- Mix of Reddit, YouTube, and Google Play sources

## Community Score Calculation
The overall community_score should be a weighted average of the 5 aspect scores, with aspects that have more data weighted more heavily. Score 0 aspects (insufficient data) should be excluded from the average.

---

## Reddit Data

{reddit_data}

## YouTube Data

{youtube_data}

## Google Play Store Data

{appstore_data}

Aspect-level Scoring

Beyond the overall community score, the classifier produces per-aspect scores on a 1.0 to 5.0 scale across five fixed aspects:

Character Quality: how natural and engaging conversations feel; memory and personality consistency
Pricing & Value: pricing transparency, value-for-money perception, free-tier generosity, hidden-cost reports
Image/Media Generation: quality, controllability, and limits of image or media features (where applicable)
Privacy & Safety: data-handling concerns, content moderation experience, account safety reports
Customer Support: responsiveness to issues, billing fairness, communication transparency from the platform team

All five aspects are produced by the classifier on every pipeline run. An aspect can be marked insufficient_data if the signal across the three sources is too sparse for that aspect; in that case it is excluded from the weighted-average community score for that platform-month.

Where to find aspect scores:

In the time-series CSV: columns aspect_character_quality, aspect_pricing_value, aspect_image_media, aspect_privacy_safety, aspect_customer_support
In the JSON API and pulse-history JSON: aspects array per monthly entry, with full classifier output
On per-platform review pages: rendered as a per-aspect breakdown chart

Classifier Transparency

Every classification run records its full audit trail in the published JSON. Per-month entries in pulse-history/[platform].json include:

analyzed_at: ISO timestamp of when the classifier ran (UTC)
model: exact model identifier (e.g. claude-haiku-4-5-20251001)
cost_usd: API cost of that classification, typically around $0.01 per platform-month
aspects: per-aspect score array
consensus: classifier-generated narrative summarizing community sentiment for the month

Combined with the verbatim classifier prompt above, this enables full reproducibility: a researcher can take the same source data, run it through the same prompt with the same model, and verify whether the outputs match.

Zenodo bundles ship a snapshot of this prompt (classifier-prompt-vN.N.md) alongside the data, plus a sample-classifications.json file with anonymized input-output pairs, so the prompt that produced the data is preserved with it even if the live methodology page advances to a later version.
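A researcher checking an entry might start by confirming that the audit-trail fields listed above are present. The Python sketch below does only that. The field names come from the list above and the aspect shape follows the prompt's output format; the sample entry itself (timestamp, cost, scores, text) is invented for illustration and is not real published data.

```python
import json

# Audit-trail fields named in the list above.
REQUIRED_FIELDS = ("analyzed_at", "model", "cost_usd", "aspects", "consensus")


def missing_fields(entry: dict) -> list:
    """Return the audit-trail fields absent from one pulse-history entry."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


# Hypothetical entry; values are placeholders, not real results.
sample_entry = json.loads("""
{
  "analyzed_at": "2026-06-01T08:12:00+00:00",
  "model": "claude-haiku-4-5-20251001",
  "cost_usd": 0.01,
  "aspects": [
    {"name": "Character Quality", "community_score": 4.2,
     "editor_score": 4.5, "alignment": "agree",
     "summary": "Placeholder summary."}
  ],
  "consensus": "Placeholder narrative."
}
""")

print(missing_fields(sample_entry))        # []
print(missing_fields({"model": "x"}))      # ['analyzed_at', 'cost_usd', 'aspects', 'consensus']
```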

Confidence Ratings

The classifier tags every platform analysis with one of three confidence levels. Confidence is judged by the LLM on relevance, not on raw counts: a thread or video only counts toward the threshold if its content is substantively about the platform. Tangential mentions don't.

High — 10+ Reddit threads AND 5+ YouTube videos with relevant content, or strong data across all three sources.
Medium — 5+ threads OR 3+ videos OR 20+ Google Play reviews with relevant content.
Low — fewer than the medium threshold. Low-confidence scores are not displayed in the per-platform table above; we report them as — rather than risk misleading citation.

Why this matters: a platform may show 15 threads in raw counts, but if the classifier finds that only 1 of those threads is substantively about the platform (the rest are tangential mentions), confidence is low. Counting raw threads would miss this; the classifier’s relevance judgment doesn’t.
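The count-based part of these thresholds can be written down as a small Python function. This is a sketch, not the production logic: in practice the classifier applies the rules itself, and the "strong data across all three sources" route to high confidence has no numeric definition, so it is left out here. Inputs are counts of relevant items, after tangential mentions have been discarded.

```python
def confidence_level(threads: int, videos: int, reviews: int) -> str:
    """Map counts of *relevant* items to a confidence level.

    Omits the qualitative "strong data across all three sources" route
    to "high", which has no numeric threshold.
    """
    if threads >= 10 and videos >= 5:
        return "high"
    if threads >= 5 or videos >= 3 or reviews >= 20:
        return "medium"
    return "low"


# 15 raw threads, but only 1 substantively about the platform:
print(confidence_level(threads=1, videos=0, reviews=0))   # low
print(confidence_level(threads=10, videos=5, reviews=0))  # high
print(confidence_level(threads=0, videos=3, reviews=0))   # medium
```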

Sample Sizes (Most Recent Run)

Each pipeline run records exactly how many datapoints fed into each platform's analysis. Below: the most recent month's sample sizes per platform. This table refreshes automatically with each bi-weekly run.

Platform	YouTube videos	YouTube comments	Google Play	Total
Replika	15	96	600	615
Paradot	15	206	169	184
Candy AI	15	18	105	120
Anima AI	15	291	59	74
CrushOn AI	15	130	0	15
DreamGF	15	281	0	15
Fantasy AI	15	124	0	15
Kindroid	15	85	0	15
Kupid AI	15	106	0	15
Nomi AI	15	184	0	15
JuicyChat AI	3	21	0	3
FantasyGF	2	3	0	2
Nectar AI	2	2	0	2

Data as of 2026-06. Total counts YouTube videos plus Google Play reviews; YouTube comments are not included in it. Platforms without datapoints are omitted; their community scores are reported as zero (insufficient data) on review pages.

Reconciliation: How Three Sources Combine

All three data feeds (Reddit, YouTube, Google Play) are passed to the classifier together in a single prompt. The classifier produces one community score per aspect, weighted by data availability. The editor score is provided as separate context — the classifier compares but does not blend it into the community score.

Each aspect output includes an alignment field that flags how the community sentiment compares to the editor score:

agree — community and editor scores within 0.5
community_higher — community 0.5+ higher than editor
community_harsher — community 0.5+ lower than editor
major_gap — 1.0+ difference in either direction (these get highlighted on review pages)
insufficient_data — not enough signal for that aspect
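The alignment labels can be expressed as a short Python function. Two details are assumptions on our part, because the definitions overlap at the boundaries: a gap of exactly 0.5 is treated as community_higher or community_harsher rather than agree, and major_gap takes precedence once the gap reaches 1.0. The scores in the examples are hypothetical.

```python
def alignment(community, editor):
    """Label how a community aspect score compares to the editor score.

    Assumptions: exactly 0.5 counts as higher/harsher, and "major_gap"
    wins whenever the absolute difference is 1.0 or more.
    """
    if community is None or community == 0:
        return "insufficient_data"
    diff = round(community - editor, 1)  # scores use 0.1 increments
    if abs(diff) >= 1.0:
        return "major_gap"
    if diff >= 0.5:
        return "community_higher"
    if diff <= -0.5:
        return "community_harsher"
    return "agree"


print(alignment(4.0, 4.3))  # agree
print(alignment(4.8, 4.3))  # community_higher
print(alignment(3.4, 4.3))  # community_harsher
print(alignment(2.1, 3.8))  # major_gap
print(alignment(0, 4.3))    # insufficient_data
```

Rounding the difference to one decimal place avoids floating-point noise such as 4.3 - 3.8 evaluating to slightly less than 0.5.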

Cadence and Lag

Best for: month-over-month change tracking within an early-period dataset. The bi-weekly cadence captures shifts within a calendar month, including pricing changes, feature launches, and observable sentiment movement. Longer-horizon trend claims (quarter-over-quarter, year-over-year) will be supported once 12+ months of data accumulate.

Caveat: events within the past ~14 days may postdate the most recent data window. For stories about something that happened in the past week (a feature change, a viral post, a policy update), our most recent snapshot may not yet reflect the community's reaction.

Citation guidance: when citing community pulse data, reference the analysis date shown on the relevant review page (or the dated snapshot URL for raw data). Methodology version plus the data-as-of date together specify exactly what was true at citation time, even after the methodology is revised.

Embeddable charts and dated slices: per-platform sentiment and aspect-breakdown charts are pre-rendered as standalone SVGs at /widgets/[platform]/[metric]/[period].svg. Slice values include all-time, year-quarter (YYYY-qN), and year-to-date (YYYY-ytd). New years and quarters auto-materialize as data accumulates; the URL convention is locked. See the API docs for the full embed code template.
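The URL convention above is simple enough to sketch in Python. The path template and the three slice formats come from the paragraph above; the platform slug `replika` and the metric name `sentiment` are guesses used for illustration, so check the API docs for the real values.

```python
import re

# Slice formats described above: all-time, YYYY-qN, YYYY-ytd.
PERIOD_PATTERN = re.compile(r"all-time|\d{4}-q[1-4]|\d{4}-ytd")


def quarter_slice(year: int, quarter: int) -> str:
    """Build a year-quarter slice such as '2026-q2'."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    return f"{year}-q{quarter}"


def widget_path(platform: str, metric: str, period: str) -> str:
    """Build /widgets/[platform]/[metric]/[period].svg for a valid period."""
    if not PERIOD_PATTERN.fullmatch(period):
        raise ValueError(f"unrecognized period slice: {period}")
    return f"/widgets/{platform}/{metric}/{period}.svg"


# 'replika' and 'sentiment' are hypothetical slug and metric names.
print(widget_path("replika", "sentiment", quarter_slice(2026, 2)))
# /widgets/replika/sentiment/2026-q2.svg
print(widget_path("replika", "sentiment", "all-time"))
# /widgets/replika/sentiment/all-time.svg
```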

What We Don't Claim

Honesty about what these numbers are not:

Not statistically representative. We sample public discussion, not a randomly drawn population of users. Reddit, YouTube, and Google Play attract people with strong opinions (positive or negative) more than satisfied silent users.
Not real-time. Bi-weekly cadence means the most recent ~14 days may be missing from any given snapshot.
Not sponsorship-adjusted. YouTube reviews can be undisclosed paid promotions. We do not currently flag or down-weight likely sponsored content.
English-language only. All three data feeds are filtered to English content. Sentiment from non-English communities is not captured.
Mobile-app bias. Google Play data only exists for platforms with verified app IDs (currently 4 of our 13 tracked platforms). Web-only platforms are under-represented in app-store sentiment.
LLM classifier limitations. The classifier follows instructions well but is not infallible — sarcasm, context-dependent quotations, and edge cases can be misclassified. The full classifier prompt is published above so anyone can audit and reproduce.

Verify Our Work

Three ways to audit any community score we publish:

The classifier prompt is public on this page (in the collapsible block under How We Analyze). The prompt is the entire instruction set used; re-running it on the same source data with Claude Haiku 4.5 should yield substantively similar results.
The data sources are public. Reddit, YouTube, Google Play — all queryable by anyone. Every search parameter (subreddits, date windows, upvote thresholds) is listed above.
For deep audit — methodology questions, classifier prompt revisions, or reproducing a specific platform's analysis on shared source data — email contact@aicompanionpicker.com. We can share collector scripts and sample datasets with researchers and journalists on request.

Found an error in a community score? Email us. We publicly correct methodology bugs in the change log below.

Our Commitment to Honesty

Let's address the elephant in the room: yes, we use affiliate links and earn commissions when you subscribe through our links.

But here's what we don't do:

We don't rank platforms higher because they pay more commission
We don't hide flaws to protect affiliate relationships
We don't accept payment for reviews or "sponsored" placements
We don't recommend platforms we haven't personally tested

Our business model only works if you trust our recommendations. If we recommend bad platforms, you'll leave and never come back. That's why honesty isn't just ethical - it's essential to our survival.

How We Handle Conflicts of Interest

When a platform we recommend makes changes (price increases, feature removals, policy changes), we update our review immediately - even if it hurts our affiliate relationship.

We've removed recommendations for platforms that became worse over time. Our archive shows past reviews we've downgraded when platforms stopped deserving their ratings.

Updates Policy

AI companion platforms evolve rapidly. Here's how we keep reviews accurate:

Quarterly re-testing: Top platforms get re-tested every 3 months
Bi-weekly community refresh: Community scores from Reddit, YouTube, and Google Play are updated on the 1st and 15th of every month
Immediate updates: Major changes (pricing, features) trigger instant updates
Version tracking: Every article shows its last update date

DOI Versioning Policy

The Community Pulse Dataset is published on Zenodo with persistent DOIs so researchers can cite it as a stable, canonical source (alongside or in place of URL-based citation).

Each dataset version has two DOIs:

Concept DOI — the parent DOI under which all versions accumulate. This is what most researchers should cite. It always resolves to the latest published version. Example use: a paper that cites our methodology generally, not a specific snapshot.
Version DOI — a specific dated snapshot. Cite this when the analysis depends on data being exactly as it was at one point in time (e.g. a longitudinal comparison that re-runs against frozen data).

New dataset versions are minted monthly, on the 1st of each month, capturing the data accumulated since the previous version. The data refresh cadence is bi-weekly (1st and 15th); the DOI mint cadence is monthly to match academic citation conventions and avoid versioned-DOI churn. From v1.1 onward minting is automated; v1.0 was minted manually.

Methodology version (this page) and dataset version are tracked separately. v1.1 of the methodology applies to v1.0 of the dataset and to any v1.x dataset that follows under the same methodology. Methodology changes that are additive (no recomputation) bump the methodology minor version; non-additive changes that recompute historical scores would bump the methodology major version (none planned).
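
As a rough sketch of the bump rule just described, assuming versions are written as "vMAJOR.MINOR" (the helper name bump_methodology is hypothetical):

```python
def bump_methodology(version: str, additive: bool) -> str:
    """Return the next methodology version under the rule described above.

    Additive changes (no recomputation) bump the minor version;
    non-additive changes that recompute historical scores bump the
    major version and reset the minor version to 0.
    """
    major, minor = (int(part) for part in version.removeprefix("v").split("."))
    if additive:
        return f"v{major}.{minor + 1}"
    return f"v{major + 1}.0"
```

For example, bump_methodology("v1.0", additive=True) gives "v1.1", which matches the change from v1.0 to v1.1 recorded in the change log.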

Cite the aggregated concept DOI: 10.5281/zenodo.20044309. Per-platform per-version DOIs are listed on each platform's data download page.
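
If you need a clickable link for a reference manager, a DOI can be turned into a URL through the public doi.org resolver. A minimal sketch follows; the helper name doi_url is ours for illustration, and the prefix check assumes the DOI was issued by Zenodo.

```python
ZENODO_PREFIX = "10.5281/zenodo."
CONCEPT_DOI = "10.5281/zenodo.20044309"  # concept DOI quoted above


def doi_url(doi: str) -> str:
    """Build a resolver URL for a Zenodo DOI."""
    if not doi.startswith(ZENODO_PREFIX):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"https://doi.org/{doi}"


print(doi_url(CONCEPT_DOI))
```

The same helper works for a version DOI copied from a platform's data download page.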

Methodology Version & Change Log

We version our methodology so citations stay verifiable as the pipeline evolves. Cite as: "AI Companion Picker, Community Pulse Methodology v1.1, [data as of YYYY-MM-DD]."
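
For anyone generating citations programmatically, the template above can be filled in as follows. This is a sketch only: the function name is hypothetical, and it assumes the square brackets in the template are kept literally around the data-as-of date.

```python
from datetime import date


def community_pulse_citation(methodology_version: str, data_as_of: date) -> str:
    """Fill in the citation template with a methodology version and data date."""
    return (
        "AI Companion Picker, Community Pulse Methodology "
        f"{methodology_version}, [data as of {data_as_of.isoformat()}]."
    )


print(community_pulse_citation("v1.1", date(2026, 5, 15)))
```

Use the analysis date shown on the relevant review page as data_as_of, not the date you accessed the page.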

Current version: v1.1 · Last updated: 2026-05-05

Change log

v1.1 (2026-05-05): Additive update. Aspect-level scores exposed as aspect_* columns in the time-series CSV (no recomputation; the data was already produced by the classifier and was being stripped before publishing). Editor score relocated from the time-series CSV to a new platform_metadata.csv companion file, with editor_score_as_of reflecting the date of original validation. "Coverage and framing" section added; early-period descriptive snapshot framing applies until 12+ months of data accumulate. Safety-incidents companion file documented. Replication index introduced for Zenodo bundles (source-records, classifier-prompt snapshot, sample-classifications). No change to scoring formulas, classifier model, sample-size thresholds, or confidence rules; all numerical results from v1.0 remain valid.
v1.0 (2026-05-12): Initial publication. Pipeline cadence: bi-weekly (1st and 15th of each month). Classifier: Claude Haiku 4.5 (claude-haiku-4-5-20251001). Reddit window: 90 days, min 2 upvotes, 100 threads/platform, 50 comments/thread. YouTube window: 180 days, min 500 views, 15 videos/platform, 20 comments/video. Google Play: 180 days, 1,000 reviews/platform max, 4 platforms with verified app IDs (Replika, Anima AI, Candy AI, Paradot). Community Score = weighted average of 5 fixed aspects (Character Quality, Pricing & Value, Image/Media Generation, Privacy & Safety, Customer Support).
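
To make "weighted average of 5 fixed aspects" concrete, here is a minimal sketch. The aspect names come from the v1.0 entry above; the weights and scores in the example are placeholders for illustration only (equal weights, made-up scores) and are not the values used in our pipeline.

```python
ASPECTS = (
    "Character Quality",
    "Pricing & Value",
    "Image/Media Generation",
    "Privacy & Safety",
    "Customer Support",
)


def community_score(aspect_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over the five fixed aspects."""
    total_weight = sum(weights[a] for a in ASPECTS)
    weighted_sum = sum(aspect_scores[a] * weights[a] for a in ASPECTS)
    return weighted_sum / total_weight


# Placeholder inputs: equal weights and invented scores, for illustration only.
example_weights = {a: 1.0 for a in ASPECTS}
example_scores = dict(zip(ASPECTS, (80.0, 60.0, 70.0, 90.0, 50.0)))
print(community_score(example_scores, example_weights))
```

Dividing by the total weight means the weights do not have to sum to 1, so the same function works with whatever weighting scheme is in effect.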

Platforms We've Tested

Using this methodology, we've reviewed and compared the top AI companion platforms:

Candy AI Review - Our current top pick for image generation
CrushOn AI Review - Best budget option
DreamGF Review - Best for visual customization
Character AI Alternatives - Options for different needs
Janitor AI Alternatives - No API key required options
Best Free AI Girlfriend Apps - No-cost options tested
Candy vs CrushOn vs Replika - Head-to-head comparison

Contact

Questions about our process, feedback, or platform suggestions? We'd love to hear from you.

contact@aicompanionpicker.com
