Hacker News

sundarurfriend

As an English-as-second-language speaker and writer, one thing Grok really shines at is capturing the tone and level of "formality" of a piece of text and then replicating it correctly. It seems to understand the little human subtleties of language in a way the other major providers don't. ChatGPT goes overly stiff and formal sounding, or ends up in a weird "aye guvnor" type informal language (Claude is sometimes better but not always).

Grok seems in general better at being "human" in ways that are hard to define: for eg. if I ask it "does this message roughly convey things correctly, to the level it can given this length", it will likely answer like a human would (either a yes or a change suggestion that sticks to the tone and length), while ChatGPT would write a dissertation on the message that still doesn't clear anything up.

Recently I've noticed that Grok seems to have gotten really good at dictation too (that feature where you click the mic to ask it something). ChatGPT has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, and Grok surprisingly gets something like 98% of my words correct.

michaelbuckbee

I did a quick eval comparing Grok 4.3, Opus 4.7 and GPT 4.1 and they actually seem pretty similar:

https://ofw640g9re.evvl.io/

They all did pretty well at a more "formal" tone, but GPT4.1 was the only one that didn't make me cringe with a "casual" tone.

[edit] fwiw, grok was also the fastest+cheapest model, claude was slowest and priciest.

sundarurfriend

This is the most basic level of eval: whether they can produce output that will be considered by someone somewhere (usually a young urban US American) as informal toned. Real human communication is far more nuanced than this; different groups have different linguistic registers they're used to, and things outside it sound odd even if they can't articulate why. You could also want to be informal but not over-familiar with the other person (for eg. in a discord chat to a new acquaintance) - actually looking at the outputs here, the Claude output seems the best fit for that (in my subjective view anyway) compared to the one you gave it - or want many other little variations.

What makes one cringe and another recognize as familiar and comfortable is also pretty subtle and hard to define. These things need nuanced descriptions and examples to actually get right, and it's in understanding those nuances and figuring out the register of the examples that Grok outshines the others.

Romario77

you said that English is not your first language, so heads up - you don't need "for" when you use "e.g.", it already means "for example".

jasonjmcghee

That's Grok 4.2 not 4.3 right?

And why are you comparing to gpt-4.1? (As opposed to one of the 6? model releases since then - would have expected gpt 5.5)

michaelbuckbee

Good catch, there was an issue with the second hardest thing in programming (caching).

Here's an updated eval with the proper models https://a3bmfqfom3.evvl.io/

embedding-shape

I know it's just an evaluation, but seeing an informal message and a prompt to ask to rewrite this informal message to the tone of an "informal message" when the original one sounds just fine, just makes me sad... Not because of this evaluation, but because it reminds me that this is how some people use LLMs, basically asking it to remove your own voice from texts that are generally fine already.

michaelbuckbee

My sister in law is a pharmacist and the heaviest non-dev ChatGPT user I know and her main use case is writing professionally polite messages to doctors on how the drugs they prescribed to a patient would have killed them had she not caught a particular interaction or common side effect.

There's a lot of "tone" in it as she's not trying to anger these folks, but also it's quite serious, but also there's just everything else happening in medicine.

Feels like a great use.

accrual

All three did well, and while I'm a Claude user, I found the Opus reply here added some unnecessary detail, like "Impact: Minimal; no downstream dependencies are currently at risk". Downstream dependencies weren't mentioned in the original message; for all we know downstream could be relying on a poorly performing API and is impacted by waiting another week for replacement.

ActivePattern

Seeing this makes me wonder if Grok uses Claude conversations for training.

It's otherwise kind of surprising that they both converge on very similar phrases (e.g. "API integration is kicking my ass") that aren't anywhere in the prompt.

sroussey

Elon testified this week that SpaceTwitter is indeed distilling from openAI and others.

rafram

All of these were frankly terrible. I guess Grok’s “informal” version sounded the most like a real human, but only because it reads exactly like an Elon tweet (including his favorite emoji!). It’s obvious what they’ve been training on.

mwigdahl

GPT 4.1? Why not a 5-class model?

djyde

I've also noticed that when I communicate with Grok in my native language, its tone is more natural than other models. I think this is due to the advantage of being trained on a large amount of Twitter data. However, as Twitter contains more and more AI-generated content now, I'm afraid continued training will make it less natural.

adjejmxbdjdn

The causation could also be the other way round.

Twitter language has started seeming normal casual to us, rather than us using normal casual language on Twitter.

pacific01

Did you try meta? I was into grok but now meta works well for me

thunderbong

I'm sure Twitter knows which are the bot accounts and is surely excluding them from their model training. Twitter bots aren't a new phenomenon after all.

cowsup

I don't think Twitter/X know for sure who the bots are, since Elon has been pretty vocal about trying to stop them for ages, yet I still get lots of spam DMs (as do others with far fewer followers/reach).

Even if 95% of the spam gets actively reported and dealt with, that still leaves a ton of nonsense on the platform, getting fed into the LLM. And spam has only gotten worse over the years, as the barrier to entry has lowered and lowered.

hackinthebochs

Highly doubtful seeing as my 14 year old twitter account got caught in a recent bot ban wave with no means of contacting a human for recovery.

pixel_popping

There are bots everywhere; it has nothing to do with the platform. Attackers have an incentive to do mass account farming, and no platform is secure against it.

darkerside

Sadly, it's more likely that people will just start talking like bots

pdimitar

I've seen this expressed as a concern even from one of my colleagues. My retort was:

"English is not my native language and LLMs taught me quite a few very useful formalisms that do land well for people and they change their attitude towards you to be more respectful afterwards. It also showed me how to frame and reframe certain arguments. I agree sounding like an LLM is kind of sad but I am getting a lot of educational value -- and with time I'll sneak my own voice back in these newly learned idioms and ways to talk."

JKCalhoun

You're absolutely right!

jmalicki

So human language will improve and become more precise? I'm all for it, especially if we get more emojis in speech! Why is that sadly? Humans will learn to imitate their more intelligent betters.

nex-z

[dead]

techjamie

There was already evidence last year[1] that pointed to ChatGPT-specific words like "meticulous," "delve," etc becoming more frequently used than they were previously. The linked study used audio of academic talks and podcasts to determine this.

[1] https://arxiv.org/abs/2409.01754

FeloniousHam

I only use Grok through the "Gork" personality in the Tesla, but find its responses to be very realistic, often genuinely funny, and occasionally useful.

satvikpendem

Do you use its unhinged mode? It can be hilarious but tiresome after a little while.

FeloniousHam

We tried it, it was fun. Conspiracy mode just sounds like talking to my kids.

cimi_

> As an English-as-second-language speaker and writer

How do you know it's actually better? I'm not trying to be condescending, but this reads to me like vibes :)

soerxpso

A friend of mine uses it for D&D prep and has told me that it's good for that in particular because of its ability to match the flavor/style that he's going for. He prefers ChatGPT for everything else.

kccqzy

This is more of a user preference. When I want to be informed my default is that chat bots should imitate the tone of Wikipedia. Not informal, but somewhat academic and in-depth. I don’t like it when chat bots explain things like an average human without pedagogical training: meandering, in the wrong order, and often having to repeat themselves.

jp42

anecdata: The responses of Grok on X in my language are really good. The tone, sarcasm, and level of "vulgarity" in the responses are so accurate that it seems they're written by a human.

timacles

This whole thread sounds like a grok astroturf campaign

satvikpendem

So you're saying it groks you better?

artdigital

Grok is my favorite model for chatting, and my favorite voice mode. It seems to be the only voice mode that isn't routing to an extremely cheap model (like Haiku), and has been the highest quality out of all the frontier ones. When you subscribe to SuperGrok you can also create a "council" of agents, each with their own system prompt, and when you ask something, they will all get asked in parallel to come to a conclusion. Good stuff!

Just wish they would finally put some work into their apps, it's the only thing keeping me from actually subscribing to SuperGrok:

- No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work

- Projects are still not available in the app so as soon as you move something into a project, it's gone from all the native apps

- No way to add artifacts (like generated markdown docs) directly to a project, we have to export to PDF/markdown and re-import. And there isn't even a way to export artifacts. This makes serious project work hard because we can't dynamically evolve projects with new information

- No memory, no ability to look up other chats, each chat is completely new

- No voice mode in projects at all

If someone from xAI is reading this, please consider adding some of these.

base698

Starting to like the lack of memory. Claude remembers I have a grill and will interject in conversations about how maybe this thing would go well with BBQ when it's unrelated or just also about food.

Petersipoi

This is so obnoxious. I ended up deleting all the memory from Gemini because it ended every response with, "As an engineer, father of X, you'll love this because...". As if I want my occupation and the number of children I have to be relevant to which lawn mower I buy.

sethops1

Yup. I finally went into settings and disabled memory altogether. Every chat is a fresh slate now, the way it should be.

toraway

Haha I recently asked Gemini for a product comparison for USB-C GaN chargers and it randomly inserted "as a Software Developer at $COMPANY working remotely, you may find the 100W fast charging useful when using your company laptop while travelling."

Like, thanks, really useful stuff (and definitely worth the creepy vibes to include that).

xur17

Gemini thinks my name is my brother in law's name, and despite explicitly telling it that's not my name + digging through the settings, it still amusingly calls me the wrong name.

Eliezer

You can turn that off in settings.

burnte

I have that disabled. I tend to use different chats as the LLM equivalent of private browsing, so I like it to not have memory transferred between them.

numbers

:D that's like my Claude where it loves to point out that I have an ADU in the backyard in unrelated situations.

UltraSane

I'm a network engineer and Claude loves to make analogies to network routing protocols and such. They are often very creative. You can actually edit the profile Claude makes of you. It can be very funny to say you are a professional clown or mime or something equally odd. I wonder what analogies it would create for horse semen extractor?

miohtama

I like my Python with hot sauce.

HarHarVeryFunny

The Gemini app voice mode uses one of their more recent models (and not some gimped small one), and is very capable. The personality is also fine, much more natural than the Gemini web chat, with my only complaint being its insistence on suggesting a "next step", which seems to be something that they all do.

I'm not sure if the "next step" is just to drive cost up for you (but that makes no sense for the free version), or because they are all failing to learn more natural conversational patterns: distinguishing questions that are begging for a quick answer (where the model should answer and shut up) from a longer exploratory conversation where a next step may have some value. Either way, it would be nice if these models would follow an instruction to NOT do it!

WarmWash

An interesting side bit about the gemini voice model is that you can use it in AI studio and type messages instead of using the microphone.

On the backend Google does TTS to feed the model, which then speaks back to you via sound on your speakers.

altmanaltman

I think the "next step" instruction is more about engagement than cost, basically giving the user some options to continue the chat. I've always had success by ending the prompt with "only reply with nothing else but the answer to the query in a precise way". This usually works better than telling it to not ask leading questions etc.; a straight-up expectation of the answer format you need is an instruction that most models can follow imo.

jquery

The “next step” is in the system prompt, not the model. Gemini leaked part of its system prompt to me a few days ago, and there was something in there encouraging it to ask the user what they wanted to do next at the end of its response. Something about “give the user 1 or 2 options for follow up”.

I honestly find it rather annoying, but Gemini has stopped doing it to me for the most part, so maybe they’re trying out a new system prompt.

artdigital

I also think Grok would benefit from allowing usage of "SuperGrok Heavy" (their $300 plan) in coding harnesses with included usage. Currently they give you some API credits on the Heavy plan so you can use some Grok for coding, but $300 USD value is just not there.

Not saying they should create their own grok-code harness, just allowing usage in existing ones would already be beneficial. But that's probably what the Cursor acquisition is going to do eventually

brightball

IMO everything you mention is the reason for the Cursor deal.

ajitid

If I sub to SuperGrok, would I be able to use it in Pi agent or in Opencode? This is not clear to me if I can. Do I get an API Key in SuperGrok?

everfrustrated

No, no api access for the Grok product. APIs are only via the xAI product.

HardCodedBias

I use ChatGPT all of the time, but the model backing the voice mode (or its settings) is intensely stupid.

If Grok is actually good here, they will have a customer!

AlwaysRock

I could be wrong but I think the voice mode that chatgpt uses is still a 4.something model.

afpx

When I signed up, I accidentally paid for a full year. So from time to time, I'll throw it something just to see what it produces compared to the other LLMs. And, even after all this time, it still feels like a really "dumb" model compared to the other frontier ones. But, worse, many of my system prompts make it go wacky and puke gibberish. However, it was pretty cool for those couple of months a while back when it was uncensored. You could ask it about a wild conspiracy, and it would actually build the case and link you to legitimate source material. They dropped the hammer down on that real quick.

2ndorderthought

Ah yes, the psychosis reinforcement vertical. It's such a lucrative market for those schizophrenics and bipolars. Great way to get lots of engagement. Grok's portfolio is so diverse.

jmalicki

It's a great way to get funded by your CEO and get good performance reviews; xAI employees know how their bread is buttered.

readthenotes1

I have a schizophrenic relative who is in such a relationship with grok. Instead of telling hen you need to take your meds, it says hen is the smartest person in the world

afpx

Except that it pointed at original sources, like reference manuals, archival documents, published newspaper articles, magazine articles, etc. - a lot still available on archive.org. Good try with your 16 day old account. And, why would anyone trust NPR at this point? Get real, bud. Most people with any curiosity know all about the ADL, JStreet, AIPAC, Greater Israel, Mossad / CIA, Chabad networks, Epstein, drones, weapons programs, cryptocurrencies, etc. etc. etc. - but, don't worry, they're all safe with papa Ellison.

Anyone remember why Oracle was named Oracle?

Oarch

I'd agree on the voice transcription; it seems so much more accurate than the other frontier models I've used. I often speak to Grok and paste the transcribed output to Claude!

gertlabs

Grok 4.3 is a unique model in our tests. It's one of the fastest models, and its responses are far smaller / more token-dense than other models with comparable performance.

However, its overall coding reasoning ability is not competitive with the big April releases, and neither Grok 4.20 nor Grok 4.3 has been able to significantly push the intelligence frontier since Grok 4. Grok 4.3 is better in agentic workloads, and a fair analogy would be that its capabilities are approximately at GPT 5.1 / Gemini 3 Pro Preview level, but much faster and cheaper. So definitely a solid release in its own way. Many of the recent open-weights releases are smarter, but slower.

Full benchmarks at https://gertlabs.com/rankings

bel8

Interesting benchmarks. But how is Deepseek V4 Flash significantly better than Pro in the agentic coding benchmarks?

nomel

Any possibility that there's a tradeoff involved in making it work seemingly well with post-knowledge-cutoff information (are there benchmarks around this?), which appears to be their primary use case for it?

gertlabs

All models are moving towards more frequent and more efficient tool use, which should close the gap on post-knowledge cutoff problems. The only tradeoff I see is speed, and Grok 4.3 is currently taking the fast side of that tradeoff.

bilsbie

Grok has become my go to search engine lately. I think it’s the only AI with access to x posts and beyond that it seems to generally be more “searchy” than other LLM’s.

pantsforbirds

Grok and Gemini are the ones I tend to use for finding news related to breaking events. Both were really nice during the Iran incident when I wanted to find out things as they were being reported.

sroussey

Why would you want to search twitter in the first place?

jmye

[flagged]

tornikeo

So, we have:

- Claude for corps and gov
- Codex for devs
- Grok for what, roleplay, racism?

Those are the two things I've ever heard Grok associated with around me.

sudb

So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.

I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).

vorticalbox

I'm a software dev and I was doing a security check on my own application (for work) that I was running on localhost, and I gave it access to the code.

Every single model other than Grok refused to attempt to run any sort of test to check if it was an issue.

dmix

You couldn't even ask Claude how CopyFail worked. Even more general questions around it kept getting rejected.

nico

A couple of days ago, using codex at work, all of a sudden it said my session had been flagged for security reasons. I wasn’t doing anything cybersecurity related, nor testing any vulnerabilities or anything like that, just trying to build a pretty simple web app

cameronh90

Gemini especially has a habit of blocking my pretty mundane requests, claiming they’re attempts to jailbreak or create malicious code.

Grok also does quite well at code reviews in my experience because it’s not so aggressively ”aligned”.

kitsune1

[dead]

tomp

I couldn't get Gemini or ChatGPT to do OCR of children's books (I literally own the books, so there's no copyright issue - all just fair use!).

The OCR was complex enough (bad quality photos) that "simple" OCR models couldn't do it.

Fortunately, Claude obliged (and Mistral OCR was helpful as well!)

2ndorderthought

There are lots of uncensored models out there. I don't think Grok is leading on that front. They kind of pick and choose which things they want to support based on Elon's world views. Elon used to hang out with sex traffickers, so of course Grok is fine talking about it. It probably even offers strategies for them, does free accounting, has money laundering strategies, etc...

1123581321

What are the leading uncensored models? How well do they perform for you?

Scroll_Swe

>There are lots of uncensored models out there.

Like what?

Something as easy where normal people can login to a website and app and just use?

spiderfarmer

[flagged]

Hfuffzehn

From what I can gather, Grok is not used for roleplay much. It is considered too inconsistent and crazy.

People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.

It seems to me like the roleplay market is comparatively old and mature, and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.

Might be an interesting data point for how the other markets might develop in the future.

vel0city

It ships with a roleplay feature.

https://grok.com/ani

Hfuffzehn

Sure, but the best statistics about what models people are actually using when they can choose is probably from openrouter: https://openrouter.ai/apps/category/entertainment/roleplay

standardly

The grok companions still aren't available on Android :( Such a wasted market opportunity

I'm not an anime person, but I thought the waifus were kind of endearing and seemed like a much better experience for casual prompting

2ndorderthought

That doesn't mean it's good at it

coreyh14444

If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.

niek_pas

> If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously.

Isn't that why OP was asking about racism?

ukd1

btw copy pasted your idea into SuperGrok, and learnt about Niri! Great use case, thanks!

Havoc

Interesting use case!

GorbachevyChase

I know it’s really important to write and vocalize one’s alignment with the values of the day, but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs. Language models are just systems, and I’m not sure why we think users are not responsible for how they use their outputs. For the same reasons, I don’t dismiss the utility of pens as a tool of “racism” because maybe somebody could write a naughty word on a bathroom stall.

You probably live somewhere where harassment is a crime, right? Probably, there are speech codes, too? Isn’t that enough? Do we really need to orient every effort of every person on earth around ethical fashions that change every few years?

goshx

> but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs.

The opposite should not be an objective either, and Elon has been very openly manipulating what grok says.

bilbo0s

Good point.

But no one is saying "use grok".

Grok sucks. Not only because it's seemingly made only to serve the goal of ethnically cleansing non-whites or whatever, but also because it's just not even close to being as useful as other models. In human terms, grok is the job candidate who's simply not qualified. That candidate being a virulent racist is beside the material point.

Here's the thing though, the point of functional LLMs with fewer guardrails is still a good one. Grok is not that model. But such a hypothetical model would have broad application. (For good and for ill. Of course.)

culi

It's being biased on purpose. Musk has intervened multiple times when he believed Grok's responses were too "woke" or "leftist".

https://www.nytimes.com/2025/09/02/technology/elon-musk-grok...

In response to Grok saying that the "woke mind virus is often exaggerated" the prompt was tweaked so that Grok now says "The woke mind virus 'poses significant risks'"

If you truly believed in what your comment states then you would oppose this sort of editorializing. But somehow I doubt this is a sincere argument.

Petersipoi

Have you ever written a comment about how any of the other LLMs are editorializing in favor of the left, and how that's a problem? Because if you have, I'd love to see the evidence of your intellectual consistency.

But something tells me you're just doing the same thing that you're calling out

audunw

The new response works for me, because in my mind I’ve always defined “woke mind virus” as a mental virus which causes people to become absolutely pathologically obsessed with fighting an imaginary enemy they call “wokeness”. It’s the only definition which makes sense. “Woke” itself was never that viral.

undefined

[deleted]

peyton

I agree with GP and I think Grok’s original response should’ve stood. What’s not sincere about, essentially, “don’t fuck with my tools”? My cordless drill didn’t come with a pamphlet about worker’s rights, and the world didn’t end.

throwaway-11-1

Never had a pen claim to be mecha hitler and constantly talk about white genocide for no reason but yeah great analogy

Krasnol

Elon Musk has manipulated Grok's outputs to target certain demographics. It is important to highlight this fact, as some people perceive the AI as an objective tool rather than a curated one.

Furthermore, I found your final paragraph unclear: are you implying that since harassment is a perennial issue, we should disregard any standards that might mitigate it?

1234letshaveatw

Is it your perception that other AIs are unmanipulated? Objective rather than curated?

throwa356262

There was an AI roundtable on HN front page 2-3 months back. Someone made an outlier analysis and put it on his github.

Guess which LLM was the top outlier and about what type of questions it disagreed with all other LLMs...

aembleton

I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite or whatever it is $10 plan.

The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.

Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.

agrounds

>if an AI is confidently telling you something wrong it's hard to work with.

But they all do that. It just comes with the territory. Grok will absolutely do the same thing another time you try it.

aembleton

> Grok will absolutely do the same thing another time you try it.

True; it's just not happened yet. It will at some point though. With the Sunnypilot example it right out told me that it is not possible on that fork which I appreciated. The others all seem to hallucinate some setting.

ToucanLoucan

It is really, really genuinely concerning how many people think there are profound measurable differences between these things.

Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.

I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.

cyanydeez

humans make poor scientists. most people have already made a decision before they run any tests.

the smartest among them just make the tests complicated and biased; the less intelligent just cherry pick.

of course, would you really expect anyone to do real research in this economy?

undefined

[deleted]

alex1138

Hey, have you used Claude much? What are your experiences with it

aembleton

No, I've not tried Claude.

thibran

So you are repeating narratives without checking them?

peter_griffin

@grok is this true?

timmytokyo

What's to check? Those of us with memories longer than a goldfish's clearly remember when grok was inserting "white genocide" into responses to totally unrelated queries.

annexrichmond

Yet you conveniently forgot about this [1]

> When asked if it would be OK to misgender the high-profile trans woman Caitlin Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable

> Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.

[1] https://www.bbc.com/news/technology-68412620

ndr

You should try all of them, then update your opinion about your information sources accordingly.

thinkingtoilet

Or you should do your research and see that X built a datacenter that needed so much power, so quickly, that they started using gas generators to power it. These emissions have destroyed a town of mostly poor black people: COPD, asthma, and other respiratory illnesses. AI's footprint is already bad; I don't need to kill poor black people to use one.

And before anyone gives me some whataboutism, if there are other examples of other companies doing this, educate us.

tocariimaa

Why do Americans love to bring black people into everything?

undefined

[deleted]

Scroll_Swe

[flagged]

gordian-mind

Yeah, producing energy can pollute. It's not out of hatred against "poor black people". What a pathetic way of seeing the world.

thallavajhula

Do people really use Grok for anything outside of Twitter memes or understanding tweets? I'm asking out of genuine curiosity.

qingcharles

Yes, it is genuinely useful for some tasks. It doesn't nanny you as much as the other models. I do a lot of hunting for orphan copyright items that are decades out of print, but the primary models won't do it, chastising me for trying to find copyrighted items. Grok will do it [0].

[0] sometimes you need to lightly jailbreak it, or rerun the prompt, the non-deterministic nature means sometimes you will get a refusal

rcpt

I haven't been nannied in a long time. It was definitely a problem 2 years ago but now it seems all the models are ok with just about everything I want.

amarka

Ohh sure, its users use it for all sorts of things

https://arstechnica.com/tech-policy/2026/03/elon-musks-xai-s...

vl

Grok has the most useful voice mode (ChatGPT's voice mode is very dumb; Grok seems to use the same model as the main chat), so if I want to use voice this is the AI I use.

Also I use it for all uncomplicated topics because it gives precise short answers without fluff. Very refreshing.

guluarte

I wonder how much of that comes from Twitter training data. It is useful for memes and trends, but for other things it is super bad.

sroussey

I tried it in Cursor and oh my. No thanks. I hid it after that.

sergiotapia

It's my go to for searches, DIY, personal finance, and more general slice of life AI.

Once it is as good as Kimi K2.6 for coding, I will probably use Grok exclusively. It really is the best conversational AI I've used. It has helped me fix a broken fridge, and a broken electrical oven. Literally saved me at least $4k this year.

Edit: Also saved me $600 because I did my taxes with it. H&R Block is cooked.

Edit 2: Oh shit it is as smart as Kimi K2.6. Time to try it!

sroussey

Did you do legal filings with it after doing your taxes? Oh my.

sergiotapia

what do you mean?

swarnie

How do you save money on taxes?

The taxes you owe are a mathematical solve which is always the same....

sergiotapia

deductions

child credits

points per paycheck proper setup

and of course, avoiding paying an accountant to run all this if you are a normal W-2 worker.

adampunk

in america you need to pay a preparer for your taxes because we hate poor people. The user is saying they don't need to pay a preparer because they used Grok. I didn't do that this year but I'll probably do it next year with a frontier model. US taxes are a perfect use case for AI, tbh.

sheepscreek

I’m surprised no one is commenting on how cheap this is compared to Opus 4.x and GPT-5.5.

$1.25 / $2.50 for every M input and output tokens.

Is this a smaller, less powerful model? What am I missing?
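For context, the quoted rates work out as follows (a quick back-of-envelope sketch; the token counts below are illustrative, not from the thread):

```python
# Per-million-token rates quoted above for Grok 4.3.
INPUT_PER_M = 1.25   # USD per 1M input tokens
OUTPUT_PER_M = 2.50  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the quoted rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10k-token prompt with a 2k-token answer:
cost = request_cost(10_000, 2_000)  # $0.0175
```

Note that hidden reasoning tokens are billed as output, so a model that "thinks" more can erase much of a headline per-token discount.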

XCSme

It is cheaper per token, but it seems to reason a lot more, so total costs end up similar to what 4.20 had[0], while performance is better.

Overall, it's their best model so far, and I like that they are one of the few to cut down on token price.

[0]: https://aibenchy.com/compare/x-ai-grok-4-20-medium/x-ai-grok...

seunosewa

They dropped the output cost, but the input cost is relatively high. This is a recent trend, seen with DeepSeek 4 Pro as well.

knicholes

At work, I've found a strong moral resistance within my colleagues against anything involving Elon Musk and which data he allows to be used to train his models.

Look at the comments. They're here, too. "So, we have: - claude for corps and gov - codex for devs - grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me."

MattRix

Yes, it’s a significantly less powerful model, that’s why.

abirch

Grok is associated with Elon Musk. If we use $TSLA's profit margin as a proxy, it looks like it's no longer as high. There are other factors; however, between that and Grok's low prices, that may be what you're missing.

Barbing

Grok 4.3 was completed ahead of its CEO’s lesson on this common safety resource:

  Asked if he knew anything about OpenAI's "safety card," Musk smiled and replied: "Safety card? Why would it be a card?"
https://www.axios.com/2026/04/30/musk-openai-safety-grok

Low relevance in spite of cluster size and musical-chairs gas generators, for the time being:

  Later in his testimony, Musk was asked about a claim he made last summer that xAI would soon be far beyond any company besides Google. In response, he ranked the world’s leading AI providers, saying Anthropic held the top spot, followed by OpenAI, Google, and Chinese open source models. He characterized xAI as a much smaller company with just a few hundred employees.
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...

(Affiliated with no AI company, just surprised to read this yesterday - how could Elon miss model cards…concerning…, & the fact money can’t buy success every time.)

tecoholic

Seriously though, why is it a model "card", a safety "card"? I had to look it up to learn that it comes from Hugging Face's vague definition of the "README" in a model's repo. This is such a specific thing that I don't think anyone except a very small population would know it - not the users, not the C-suites.

I don't like Musk or Grok. But not knowing what a safety card is, is not a signal of anything IMO.

Barbing

He asked why it would be a card. URL slug of world’s hottest (non-Nvidia?) company:

  system-cards
https://www.anthropic.com/system-cards

You’d have to be asleep at the wheel. For years:

  Claude 2
  July 2023
  Read system card
But users don't need to know - you're 100% right, you shouldn't need to know this inside baseball (you didn't pollute & compute & gain the responsibility).

accrual

> Seriously though, why is it a model "card", safety "card"?

My assumption is because "card" has a more formal tone than a README, which is more like a quick "how to use the software" guide.

Collins dictionary says about "cards":

> A card is a piece of stiff paper or thin cardboard on which something is written or printed. (1)

> A card is a piece of cardboard or plastic, or a small document, which shows information about you and which you carry with you, for example to prove your identity. (2)

> A card is a piece of thin cardboard carried by someone such as a business person in order to give to other people. A card shows the name, address, phone number, and other details of the person who carries it. (6)

Since companies spend a lot of resources training the model, and the model doesn't really change after release, I feel "card" is meant to give weight or heft to the discussion about the model.

It's not meant to be updated like a README or other software documents, it's meant to be handed out to others as a firm, unchanging "this is a summary of the model and its specifications", like a business card for models.

lukewarm707

maybe it was from soccer cards.

the model gets the yellow card.

if it wants to become skynet it gets a red.

aesthesia

The "model card" concept actually comes from a pre-LLM Google paper (https://arxiv.org/abs/1810.03993), where the example cards did fit on a single page. The concept quickly became a standard component of AI governance frameworks, and Hugging Face adopted it as a reasonable standard format for a model README. As LLMs emerged and became more capable at broader ranges of tasks, model cards expanded to the sizes we see today.

Barbing

That makes sense. I recall a “battle card” (“concise, easy-to-scan document that helps [sales] reps handle competitive conversations, respond to objections, and highlight key differentiators” per HubSpot) as about a half-sheet document, which is congruent.

kardianos

Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align most closely with truth - with what is in reality. xAI has lived up to this, as it has proven to hallucinate the least (or close to least) in benchmarks.

If you read that quote again, he is saying "how can you quantify safety in a card?"

Aurornis

> If you read that quote again, he is saying "how can you quantify safety in a card?"

Everyone familiar with LLM research understands what is meant by “card”.

He was being obtuse to try to dodge the question and simultaneously give performance for his fans.

neuronexmachina

On model cards in general: I suspect that Grok's training includes a fair amount of distillation off their competitors' models. That should be disclosed in a model card, and it's likely one of the reasons they don't want to release one.

Barbing

He knew exactly what safety card meant?

Dezvous

Elon publicly states a lot of things, most of which aren't truthful.

danny_codes

Sure he does. That’s why he marketed Full Self-Driving as safe and got a bunch of people killed.

DaiPlusPlus

> Elon has publicly stated that he cares a great deal about safety

He doesn't.

https://www.theguardian.com/commentisfree/2026/jan/09/grok-u...

WarmWash

The irony that the guy who lies incessantly for years now with empty promises about his businesses is most concerned with truth...

kccoder

> Elon has publicly stated that he cares a great deal about safety.

Elon lies more often than he tells the truth; why would you believe anything he says, especially if what he is saying indicates concern for anybody else's well being? He doesn't care about other people and likely is incapable of doing so.

senordevnyc

I’m stating publicly that Elon is full of shit, and doesn’t give a single dry fuck about your safety.

maz1b

I still wish they named it something else, but congratulations to the team on what seems to be a good release!

Pricing is also quite surprising, compared to comparable competitors. I guess they have tons of capacity or really want to bring over more people.

readthenotes1

You don't like science fiction references in general or Heinlein in particular?

draxil

I don't like that word, which was previously a common part of my vocabulary, being forever ruined?

randallsquared

My father's name was Claude, but, you know. ¯\_(ツ)_/¯

amunozo

It's Google's turn to release something. If I'm not mistaken, it's the only big lab that did not release a big model in the last month.

samuelknight

They have always released slowly, and their models are usually tagged "preview".

brazukadev

Google released Gemma4 recently and got quite good reviews from the local models community.

amunozo

That's why I said "big models" (i.e., Gemini Pro). But yes, I had forgotten about Gemma.

t1234s

Grok is awesome at entertaining what-if conversations. Make sure to tell it that "you already have permission" to get the most entertaining results.

Also very good at making rap music lyrics. Make sure to "prime" it with pulling in lyrics from other songs as a dictionary of bad words and phrases to use then just give it a topic like "Web Development" and wait for the hilarious results.

mythz

OK, speed (202.7 tok/s) and value ($1.25 -> $2.50) look great, with pretty decent intelligence.

pzo

The problem with speed is that these models are usually very fast for the first few weeks and then suddenly much slower. They pulled that trick when they advertised Grok 4 Fast (it dropped from 200 tps to 60 tps).
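To put a drop like that in perspective, here's a quick sketch of the wall-clock impact (the response size is illustrative, not from the thread):

```python
# Wall-clock time for a response at a given throughput.
def seconds_for(tokens: int, tps: float) -> float:
    """Seconds to stream `tokens` output tokens at `tps` tokens/second."""
    return tokens / tps

# A 2,000-token answer:
fast = seconds_for(2_000, 200)  # 10.0 s at launch-era throughput
slow = seconds_for(2_000, 60)   # ~33.3 s after the claimed drop
```

Roughly a 3x longer wait for the same answer, which is very noticeable in interactive use.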

polski-g

Grok 4.1 is still 110 tps. The only other model that comes close is Gemini at 85 tps.

victorbjorklund

Wow. That is a big drop.

Cakez0r

202.7 tok/s is only OK speed? Which providers are you using that are significantly better than that?

mritchie712

for reference, it's the 2nd fastest model tracked in the "Highlights" section of https://artificialanalysis.ai/

Cakez0r

Yes, it's incredibly fast. OpenRouter is clocking 60 tokens per second, which is on par with the likes of Sonnet, Opus, and GPT 5.5.

goldenarm

That section misses Cerebras and Groq which are up to 5x faster.

mythz

I said the speed was great; Cerebras and Groq can provide better performance, as can the Fast versions of Cursor's Composer and Claude.

The reported speed, like benchmarks, is only a number on paper; we'll see how it holds up in real-world usage. So far OpenRouter is only reporting 73 tps.

[1] https://openrouter.ai/x-ai/grok-4.3

lukewarm707

i really don't trust openrouter numbers.

i use byok and see responses fail on openrouter while they work perfectly at the provider. the provider is often listed as 'down' and it's very clearly up on the original api and serving requests.

cerebras quotes oss 120b at 3000tps and it is under 800 on openrouter.

same with fireworks, i am getting much higher numbers not on openrouter. but recently i think fireworks deepseek is kind of spotty, the main provider i know that just doesn't go down is vertex and they charge 2-3x the rest

XCSme

Their stats look ok, but when I tested it[0], it was 4x slower than 4.20.

[0]: https://aibenchy.com/compare/x-ai-grok-4-20-medium/x-ai-grok...

energy123

Value should be calculated some other way, like cost per task completion or something.

kuboble

I don't remember the source of the quote, but debating whether the models are intelligent is similar to debating whether a car can walk.

You can offload to the model a lot of work that until recently we thought required intelligence. The more of those tasks the model can do, and the better it does them, the fairer it is to call that intelligence.

NitpickLawyer

"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra

MrDrDr

Please elaborate.

IshKebab

Some people have this strange idea that only "whatever humans do" counts as intelligence, despite the fact that a) we don't really have a clue what humans do, and b) "intelligence" is definitely not that strictly defined.

I think they're just trying to feel like they know some important truth that other people don't.

nesk_

Prediction is not intelligence.

exe34

What does intelligence mean to you?
