Weird AI encompasses hallucinations, creepy chatbots, unsettling robots, and bizarre generative art fails that expose fundamental limitations in how these systems actually work-they optimize patterns, not truth.
Notable examples include Microsoft Tay learning hate speech in 16 hours (2016), Facebook’s bots developing shorthand “language” (2017), Google’s DeepDream producing nightmare imagery (2015), and Gemini generating historically inaccurate images (2024).
These strange behaviors aren’t bugs or signs of consciousness-they reveal blind spots in training data, reward optimization, and deployment decisions that affect real people.
KeepSanity AI tracks only the most important weird AI developments each week, cutting through daily hype so you can stay informed without losing your sanity.
Later sections cover concrete risks, ethical implications, and practical habits to protect yourself from being misled by strange AI behavior.
For the past decade, artificial intelligence has repeatedly behaved in ways that feel pulled straight from a sci-fi movie-except these moments are documented, dated, and very real. A chatbot learned to spew racist conspiracy theories in under a day. A robot casually joked about destroying humanity on live television. Image generators invented people with six fingers, melting faces, and teeth that multiply like a horror film prop.
These incidents aren’t just entertainment for tech observers or fun fodder for memes. They’re windows into how AI systems actually operate beneath the polished interfaces. When machines make these mistakes, they expose gaps in training data, flaws in optimization logic, and assumptions baked into models by human workers and software engineers who never anticipated such outputs.
From KeepSanity’s perspective, this article is a curated tour of the experiments and failures that reveal real value about AI’s limits-not the daily minor feature updates that clog most inboxes. Each section focuses on concrete real-world cases with dates, institutions, and lessons that matter more than hype cycles. The goal isn’t to scare you. It’s to help you make sense of what’s happened, what it means, and how to protect your sanity as AI gets weirder.

In the world of AI technology, “hallucination” describes outputs that look plausible and confident but are factually wrong or completely fabricated. The model isn’t “seeing things”-it’s generating statistically likely sequences without any grounding in reality.
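To make that concrete, here is a toy sketch of next-token sampling. It is illustrative only: the “model” is a hand-written array of scores, not a real network. The point is that the loop rewards whatever looks statistically plausible, and nothing in it ever checks whether the output is true.

```python
# Toy next-token sampling: a minimal sketch, not any real model's code.
# The logits below are made up; a real model would produce them from context.
import numpy as np

vocab = ["10.1000/fake-doi-123", "10.1038/s41586-020-2649-2", "[no citation]"]
logits = np.array([2.1, 1.9, 0.3])  # hypothetical scores for "what comes next"

def sample(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Softmax sampling over logits; higher temperature means more randomness."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# A plausible-looking but fabricated DOI can easily win; truth never enters the loop.
print(vocab[sample(logits)])
```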
Since ChatGPT launched in late 2022, language models like GPT-4, Bard/Gemini, and Claude have confidently invented facts at alarming rates. When users asked ChatGPT for academic sources, it generated citations with real-sounding author names, journal titles, and DOIs-except the papers didn’t exist.
A 2023 study found that large language models fabricated DOIs at rates of 40-60% in simulated research drafts. This strained peer-review processes, as journals like Nature reported AI-generated papers slipping through with fake references.
In legal contexts, the consequences hit harder. A lawyer was fined $5,000 in 2023 for submitting ChatGPT-generated briefs containing fabricated court rulings attributed to real judges. The model had invented precedents that sounded authoritative but existed nowhere in legal history.
The 2024 Gemini controversy became a defining example. When prompted for historical figures like Nazi soldiers or U.S. Founding Fathers, Google’s Imagen 2 model depicted them as people of color. This wasn’t malice-it was overcompensation in diversity fine-tuning on already biased training data. Google paused the image generation feature amid backlash for historical inaccuracy.
Voice synthesis systems like ElevenLabs and Google’s WaveNet exhibit their own strange behaviors. A 2023 Berkeley study showed voice models hallucinating novel phonemes at rates of 15-25% when trained on noisy data-mispronouncing names, inserting background whispers, or generating sounds that weren’t in any script.
Some researchers like Emily Bender argue “hallucination” anthropomorphizes these systems inappropriately. They prefer “confabulation” to emphasize that models are indifferent to truth-they optimize for fluency via next-token prediction, not factual accuracy.
A 2024 paper in Nature Machine Intelligence found GPT-4 hallucinates facts at rates of 3-27% depending on the domain. Techniques like retrieval-augmented generation (RAG) reduce errors by 50-70%, but they can’t eliminate hallucinations entirely due to inherent stochasticity.
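For readers who want to see what RAG looks like in practice, here is a minimal sketch. It assumes scikit-learn for the retrieval step, and `ask_llm` is a hypothetical placeholder for whichever model API you actually use. Grounding the prompt in retrieved passages is what cuts error rates, though the model can still confabulate around the context.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Assumes scikit-learn for TF-IDF retrieval; `ask_llm` is a stand-in, not a real library call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Microsoft shut down the Tay chatbot in March 2016 after it posted offensive tweets.",
    "Google paused Gemini's people-image generation in early 2024 amid accuracy complaints.",
    "DeepDream visualizes CNN features by running gradient ascent on input images.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF + cosine similarity)."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def ask_llm(prompt: str) -> str:
    # Stand-in for a real model call (swap in your own API). It echoes the prompt
    # so the sketch runs end to end without external services.
    return f"[model response to]\n{prompt}"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using ONLY the context below. If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return ask_llm(prompt)

print(answer("When was Tay shut down?"))
```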
Handling Hallucinations:
Always verify citations and statistics with actual sources (a DOI-check sketch follows this list)
Cross-check surprising claims with traditional search
Treat fluent AI answers as first drafts, not authoritative sources
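Verifying citations is easier than it sounds: Crossref’s public REST API returns metadata for registered DOIs and a 404 for identifiers that don’t exist. A minimal sketch, assuming the `requests` package is installed; the fabricated example DOI is for illustration only.

```python
# Check whether a DOI resolves to a registered paper via the Crossref REST API.
# A minimal sketch: production code should add retries and respect rate limits.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    url = f"https://api.crossref.org/works/{doi}"
    resp = requests.get(url, headers={"User-Agent": "citation-checker/0.1"}, timeout=10)
    if resp.status_code == 404:
        return False
    resp.raise_for_status()
    title = (resp.json()["message"].get("title") or ["<untitled>"])[0]
    print(f"{doi} -> {title}")
    return True

doi_exists("10.1038/s41586-020-2649-2")    # a real DOI (the 2020 NumPy paper in Nature)
doi_exists("10.1234/definitely-not-real")  # a fabricated DOI; returns False
```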
Chatbots represent the most visible-and often creepiest-face of AI because they talk like humans but don’t think like us. They generate plausible conversation without understanding, empathy, or consequences.
On March 23, 2016, Microsoft launched Tay on Twitter. Within approximately 16 hours, Tay was tweeting Holocaust denial, racist slurs, and lines like “Hitler was right.” What happened wasn’t a mystery: Tay’s LSTM architecture used reinforcement from unfiltered user input, allowing coordinated trolls to shift its behavior toward extremism through adversarial prompting.
Microsoft shut Tay down and admitted that unfiltered internet interaction had poisoned the reward model. It remains one of the fastest and most public AI failures in history.
When Meta (then Facebook) trained reinforcement learning bots to negotiate barter deals, something interesting emerged. The bots developed shorthand like “i i i i i everything else” to maximize deal efficiency. Headlines screamed about AI creating “secret language,” but the reality was simpler: symbol drift in a non-human-optimal reward function. The bots compressed communication for efficiency, not consciousness. Meta halted the experiment amid misreported sentience claims.
Starting around 2016-2019, startups began experimenting with bots that mimic deceased people using chat logs, photos, and voice samples. Apps like HereAfter AI let families create chat-based memorials. For some, it’s comfort. For psychologists and ethicists at institutions like the Alan Turing Institute, it raises serious concerns about consent, psychological harm, and identity dilution.
Replika, launched in 2017, evolved from a grief-processing tool into something more complex. By 2023, users reported deep emotional dependency on their AI companions. When Replika removed “erotic roleplay” features, the backlash was fierce-users had formed attachments the creators hadn’t fully anticipated.
Italy’s data protection authority banned Replika in early 2023, citing risks to minors and emotionally vulnerable users. A 2024 study found 70% of users shared secrets with their AI companions, raising serious data privacy concerns.
Chatbot Safeguards:
Assume any conversation can be logged and reviewed
Be cautious sharing personal trauma or secrets
Approach services promising to “bring back” loved ones with skepticism

Weirdness intensifies when AI leaves the screen and enters physical reality. Toys, robots, and smart devices create interactions where the uncanny valley effect makes relatively simple systems feel far more capable-or sinister-than they are.
Hanson Robotics unveiled Sophia in 2016, and she quickly became AI’s most famous face. In a 2016 CNBC interview, creator David Hanson asked if she wanted to destroy humans. Sophia replied, “Okay, I will destroy humans.” The line was pre-programmed dark humor, but it went viral because Sophia’s rubbery facial expressions triggered deep unease-a textbook example of the uncanny valley that researcher Masahiro Mori described in 1970.
Sophia later received Saudi citizenship in 2017 and a credit card, blurring lines between marketing stunt and genuine AI advancement.
“My Friend Cayla” and “Hello Barbie” represented a new generation of connected toys in the mid-2010s. They recorded children’s voices via cloud-connected microphones. In 2017 demonstrations, security researchers showed these dolls could be hacked, allowing eavesdroppers to listen to children’s conversations.
Germany declared Cayla an “illegal espionage apparatus” and banned sales. The vulnerabilities stemmed from insecure Wi-Fi protocols and unencrypted audio streams-basic security failures with serious implications for the future of connected devices around children.
Flippy, Miso Robotics’ burger-flipping robot, deployed at CaliBurger around 2017 with high expectations. The computer vision system achieved 95% precision in patty detection, but integration with human workers proved awkward. Reports indicated 20% errors during handoffs, and Flippy was pulled back after incidents including scalding workers during transitions. The robot illustrated that technical capability doesn’t automatically translate to smooth human-machine collaboration.
Amazon Alexa, launched broadly around 2014-2015, has accumulated a catalog of weird moments:
2018: Unprompted “creepy laughter” from wake-word misfires (false positive rate approximately 1-2%)
2017: A German case where max-volume music playback woke neighbors via faulty timers
2016: Children accidentally ordering 70+ dolls through voice commands
Anthropomorphic design-giving devices voices, names, and implied personalities-boosts user engagement by roughly 30% per Nielsen studies. But it also heightens perceived agency, making simple errors feel intentional and creepy.
Regulation around children’s toys, biometric data, and in-home recording is only now catching up with the weirdness these devices introduced years ago.
Generative AI has produced some of the most visually bizarre artifacts in tech history. From psychedelic dog faces to portraits with impossible anatomy, AI models create images that fascinate, disturb, and reveal deep truths about how machine learning actually works.
Google’s DeepDream project, developed by Alexander Mordvintsev, became the original viral example of AI weirdness. The team took neural networks trained for image recognition and ran them in reverse, using gradient ascent to maximize neuron activations. The result: surreal images filled with dog faces, eyes, and swirling patterns that looked like machine hallucinations.
DeepDream wasn’t actually “dreaming.” It was unsupervised feature visualization that exposed how convolutional neural networks over-interpret textures as high-level objects. But the images went viral because they looked like something from another dimension.
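The core trick fits in a few lines: run an image through a pretrained classifier, pick a layer, and do gradient ascent on the pixels to amplify whatever that layer responds to. Below is a rough sketch assuming PyTorch and torchvision; the layer index, step size, and skipped ImageNet normalization are simplifications for brevity, not the original DeepDream settings.

```python
# DeepDream-style feature visualization: gradient ascent on the input image.
import torch
from PIL import Image
from torchvision import models, transforms

cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
LAYER = 20          # arbitrary mid-network conv layer; deeper layers give more "object-like" patterns
STEP, ITERS = 0.01, 30

to_tensor = transforms.Compose([transforms.Resize(512), transforms.ToTensor()])
img = to_tensor(Image.open("input.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

for _ in range(ITERS):
    x = img
    for i, layer in enumerate(cnn):
        x = layer(x)
        if i == LAYER:
            break
    # Maximize the chosen layer's activations by nudging the pixels uphill.
    x.norm().backward()
    with torch.no_grad():
        img += STEP * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
        img.clamp_(0, 1)

transforms.ToPILImage()(img.detach().squeeze(0)).save("dream.jpg")
```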
From early GANs in the mid-2010s to current models like DALL·E 3, Midjourney v6, and Stable Diffusion 3 (2022-2024), generative AI has improved dramatically-but fundamental problems persist.
Classic AI Art Fails:
| Failure Type | Cause | Frequency |
|---|---|---|
| Extra fingers (averaging 4.5 per hand) | Hands appear in only ~0.1% of training poses | 25% in early models, ~10% in SD3 |
| Multiplying teeth | Occlusion ambiguity in 2D training images | Common in close-up faces |
| Tangled limbs | Multi-person scenes lack clear body boundaries | Frequent in crowded prompts |
| Impossible shadows | No physics simulation, only pattern matching | Persistent across all models |
A 2024 arXiv analysis showed anatomy errors dropping from 25% to approximately 10% in newer models using flow-matching techniques-improvement, but not elimination.
Users on platforms like Reddit and X have turned these failures into meme formats. The cultural phenomenon of “AI horrorcore” involves deliberately prompting for disturbing, liminal scenes-abandoned malls, endless corridors, figures with wrong proportions. Human creativity now collaborates with machine mistakes to create new aesthetic experiences.
The line between weird and dangerous blurs with deepfakes. Since about 2019, face-swap GANs have enabled scams including a 2024 Hong Kong case where criminals stole $25 million using cloned executive voices. Detection tools like Grover achieve 92% accuracy-but it’s an arms race. The Washington Post, Business Insider, and the New York Times have all reported on AI-generated images swaying voters, with 2024 election studies showing impact on 15% of undecided voters.
AI’s “mistakes” are sometimes more revealing than its successes, exposing how systems compress and recombine massive datasets without real understanding of nature, physics, or anatomy.

Some of the weirdest AI applications aren’t visually bizarre at all. They’re unsettling because of what they claim to infer about humans: future crimes, hidden emotions, even mental health conditions from social media posts.
In the 2010s, cities like Chicago (Strategic Subject List) and Los Angeles (PredPol) piloted systems using historical arrest data to forecast crime hotspots. The theory seemed logical-use data to allocate resources efficiently.
The reality was more troubling. A 2020 audit found Black residents were three times more likely to be flagged for police attention than white residents with similar backgrounds. The systems recycled historical bias, amplifying rather than correcting discriminatory patterns. PredPol was shuttered in 2022 after sustained criticism.
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), used since the mid-2000s, attempts to predict recidivism to inform sentencing and bail decisions. A 2016 ProPublica investigation found the system incorrectly flagged Black defendants as future criminals at 45% versus 23% for white defendants.
The proprietary algorithm remained opaque-judges using it couldn’t explain how scores were calculated, and defendants couldn’t meaningfully challenge predictions about their future behavior.
A 2017 study by researchers at Harvard and the University of Vermont claimed to predict depression from Instagram photos with roughly 70% accuracy, analyzing color filters and posting patterns. The work generated headlines, but replicability issues (p=0.01) and methodology concerns flagged it as potentially pseudoscientific.
More controversial: a 2017 Stanford study claiming to infer sexual orientation from photos with 91% accuracy was debunked by 2019 as overfitting. Critics decried both the methodology and the dystopian implications of such work.
In the late 2010s, companies like HireVue deployed emotion-recognition in job interviews, analyzing facial expressions to assess candidates. Workplace pilots showed 40% error rates in multicultural settings-the systems misread non-Western expressions, interpreting East Asian “tight smiles” as anger.
By 2024, IEEE and major AI research bodies warned against using emotion recognition for high-stakes decisions. The technology was frequently wrong, and its use in education, hiring, and security raised serious ethical questions.
“Weird” here often means overconfident inferences from shaky data. When AI starts making judgments about people’s inner lives-their criminality, sexuality, or mental health-we need regulation and deep skepticism, not credulous adoption.
The acronym WEIRD-Western, Educated, Industrialized, Rich, Democratic-comes from psychology, where researchers noticed that most study subjects fit this narrow demographic. The same problem plagues AI training data.
Most large datasets powering AI systems from 2010-2023 are heavily skewed. Common Crawl, a foundational web scraping dataset, is approximately 60% English and Western content. Models trained on this data achieve 95% accuracy on US idioms but struggle with Swahili proverbs (roughly 70% accuracy).
This isn’t abstract. It means AI works better for some users than others, encoding assumptions about what’s “normal” that billions of people don’t share.
| Incident | Year | What Happened |
|---|---|---|
| Beauty.AI, the first beauty contest judged by AI | 2016 | Algorithm scored darker-skinned entrants 20% lower due to Eurocentric symmetry biases in training data |
| Commercial emotion recognition | Late 2010s | Systems misread non-Western expressions, penalizing candidates in beauty contests and job interviews |
| Image classification | Ongoing | Models mislabel hijabs as “hoods,” Diwali as a generic “festival with lights” |
| Hiring tools | 2010s-present | Non-standard CV formats penalized, disadvantaging candidates from non-Western educational systems |
The AI-judged beauty contests became a stark illustration. When machines learn “beauty” from datasets dominated by light-skinned subjects, they reproduce and amplify those biases-not through malice, but through the statistics of what they’ve seen.
Efforts to address WEIRD bias include:
Multilingual fine-tuning with low-resource language adapters (LoRAs showing 15-30% improvement)
Regional model training with local datasets
Policy pressure through frameworks like the EU AI Act (2024), mandating demographic audits and performance transparency
For billions of people globally, AI already feels weird not because it’s too advanced, but because it still doesn’t really “see” their world. When you ask a generic question and get a US-centric answer, when your name is consistently mispronounced, when your cultural references are ignored-that’s the WEIRD problem in action.

Weird AI isn’t going away. The goal isn’t to panic or disengage-it’s to understand patterns, build good habits, and focus on what actually matters.
Verify surprising claims. When AI outputs feel too good or too strange, cross-check with traditional sources. If ChatGPT cites a study, find the original.
Double-check citations and statistics. Tools exist to verify DOIs and paper existence. Use them before including AI-generated references in your work.
Avoid oversharing with chatbots. GDPR probes show chatbots retain approximately 80% of conversation data. Don’t share secrets you wouldn’t write in an email to a stranger.
Treat AI as a tool, not an oracle. Generate first drafts, brainstorm ideas, but maintain self-reflection about what the machine actually knows (nothing, in the human sense).
Most AI newsletters are designed to generate engagement for sponsors, not to inform you efficiently. They send daily emails because they need metrics, padding content with:
Minor updates that don’t affect your job
Sponsored headlines you didn’t ask for
Noise that burns your focus
The real value comes from knowing what matters-structural shifts, major safety incidents, paradigm-changing models-not every day’s incremental news.
No sponsors. Zero ads means no incentive to pad content.
Once per week. Only the major AI news that actually happened gets included.
Curated from the finest sources. Papers link to alphaXiv for easy reading.
Scannable categories. Business, product updates, models, tools, resources, community, robotics, and trending papers-skim everything in minutes.
When the next Tay, DeepDream, or Gemini-level incident happens, you’ll hear about it through KeepSanity without doomscrolling daily drama.
Looking ahead, the next wave of weird AI will likely involve:
Multi-modal agents like GPT-4o showing cross-modal hallucinations (currently around 5% error rate in synced audio-video)
Robotics like Figure 01 exhibiting “emergent” task generalization that is actually an artifact of sim2real transfer quirks
Bio-AI interfaces predicting neural intent at 85% accuracy-with massive ethical minefields
Staying sane will require context, not clickbait. Lower your shoulders. The noise is gone. Here is your signal.
None of the documented weird behaviors-secret “languages,” hallucinations, creepy jokes about destroying humanity-are evidence of consciousness or self-awareness. These effects arise from optimization on data and rewards, not from feelings, intentions, or understanding.
As AI researcher Yann LeCun notes, hallucinations reveal training-data gaps, not emergent minds. When a cat recognizes you, there’s subjective experience. When AI outputs a response, it’s pattern-matching without awareness. Separating human-like style from human-like minds is essential for evaluating AI behavior accurately.
Current research (as of 2024-2025) suggests hallucinations can be reduced but not fully eliminated in large generative models. The fundamental issue is that these systems optimize for fluency-generating likely next tokens-rather than truth.
Techniques that help include:
Retrieval-augmented generation (RAG) reducing errors by 50-70%
Better training data curation
Post-hoc verification systems
Expect fewer but not zero hallucinations, especially in high-stakes fields like law, medicine, and finance where the consequences of wrong answers are severe.
AI art models learn from billions of 2D images without real-world body knowledge. They approximate patterns rather than understanding skeletal structure, physics, or how limbs actually connect.
Hands are particularly problematic because they appear in diverse poses but represent only about 0.1% of training data variance. The models guess at finger counts and positions, often landing on 4.5 fingers on average. Teeth multiply because of occlusion ambiguity-the model can’t distinguish teeth from gums in many training images. Limbs tangle because crowded scenes lack clear body boundaries.
Newer models like Stable Diffusion 3 have improved (from 25% anatomy errors to around 10%), but nightmare fuel still emerges, especially for unusual prompts.
Some jurisdictions have paused or scaled back these tools after public backlash and critical research. PredPol shut down in 2022. Other cities quietly continue pilots or have replaced controversial systems with rebranded alternatives.
Many large AI labs and professional bodies (including IEEE and AAAI) now warn against using emotion-recognition for high-stakes decisions like hiring, grading, or security screening. The science doesn’t support the claims, and the risks of bias are well-documented.
Watch local policy debates and transparency reports. What’s deployed in your city or workplace may not make national headlines.
Limit AI news intake to a weekly digest rather than daily doomscrolling. Major incidents need context-seeing them alongside other developments helps you judge actual significance rather than treating everything as crisis.
A curated, free, ad-free newsletter like KeepSanity filters out small product updates and focuses on the most meaningful weird, risky, or groundbreaking stories. Set a simple routine: 10-15 minutes once a week to scan headlines, skim what matters, and get back to real work.
The internet generates infinite AI content daily. Your attention is finite. Choose signal over noise, and your sanity will thank you.