eviks 2 days ago

For a case study, it would be nice if the case were actually studied…

> had unusually legible handwriting, but even “easy” early modern paleography like this is still the sort of thing that requires days or weeks of training to get the hang of.

Why would you need weeks of training to use some OCR tool? There's no comparison to any alternative tools in the article. And testing only on "unusually legible" handwriting isn't that relevant for the… usual cases

> This is basically perfect,

I’ve counted at least 5 errors in the first line; how is this anywhere close to perfection?

Same with the translation: first, is this such an obscure text that it has no existing translation to compare the accuracy against, instead of relying on your own poor knowledge? Second, what about existing tools?

> which I hadn’t considered as being relevant to understanding a specific early modern map, but which, on reflection, actually are (the Peter Burke book on the Renaissance sense of the past).

How?

> Does this replace the actual reading required? Not at all.

With seemingly irrelevant books like the previous one, yes, it does; the poor student has a rather limited time budget

  • benbreen a day ago

    I agree, I probably should've gone into more detail on the actual case studies and implications. I may write this up as a more academic article at some point so I have space to do that.

    To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.

    Speaking entirely for myself here, a pretty significant part of what professional historians do is to take a ton of photos of hard-to-read archival documents, then slowly puzzle them out after the fact - not by using any OCR tool (because none of them that I'm aware of are good enough to deal with difficult paleography) but the old fashioned way, by printing them out, finding individual letters or words that are readable, and then going from there. It's tedious work and it requires at least a few days of training to get the hang of.

    If anyone wants to get a sense of what this paleography actually looks like, this is something I wrote about back in 2013 when I was in grad school - https://resobscura.blogspot.com/2013/07/why-does-s-look-like...

    For those looking for a specific example of an intermediate-difficulty manuscript in English, that post shows a manuscript of the John Donne poem "The Triple Fool" which gives a sense of a typical 17th century paleography challenge that GPT-4o is able to transcribe (and which, as far as I know, OCR tools can't handle - though please correct me if I'm wrong). The "Sea surgeon" manuscript below it is what I would consider advanced-intermediate and is around the point where GPT-4o, and probably most PhD students in history, gets completely lost.

    re: basically perfect, the errors I see are entirely typos which don't change the meaning (descritto instead of descritta, and the like). So yes, not perfect, but nothing that would impact a historical researcher. In terms of existing tools for translation, the state of the art that I was aware of before LLMs was Google Translate, and I think anyone who tries both on the same text can see which works better there.

    re: "irrelevant books," there's really no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary. For that reason, in my own work, this is very much about augmenting rather than replacing human labor. The main work begins after this sort of LLM-augmented research. It isn't replaced by it in any way.

    • eviks 7 hours ago

      > To your point about OCR: I think you'll find that the existing OCR tools will not know where to begin with the 18th century Mexican medical text in the second case study. If you can find one that is able to transcribe that lettering, please do let me know because it would be incredibly useful.

      My point about OCR is that you haven't done any comparison and are now making the same mistake of claiming without evidence. The most basic tool, Google Translate, does "know where to begin"; it doesn't even make the "physical" mistake, though it makes others. It also knows where to begin with the image from your second post, although it does worse there. And it's not the state of the art, and I don't know what that is for Spanish either, but again, that wasn't my point. You do not have a care-free option: to catch that "physical" mistake you'd still need to read the source, which means you still need those days/weeks of training

      > none of them that I'm aware of are good enough to deal with difficult paleography

      And you haven't demonstrated anything re. difficult paleography for the LLMs in your article either!

      > entirely typos which don't change the meaning

      First, you'd need to actually demonstrate that, and that would require the full accounting which you haven't done (and no, I don't plan to do it either). A typo could be in a name or a year, which is bound to have some impact on a historical researcher: he'd search for the misspelled name and find nothing, while there could've been an interesting connection in some other text.

      > translation, the state of the art that I was aware of before LLMs was Google Translate, and I think anyone who tries both on the same text can see which works better there.

      Yes, do try it, for example in DeepL, to see that it's no worse

      > no way to make an objective statement about what's relevant and what's not until you actually read something rather than an AI summary

      Sure, but presumably you did that before making the claim of relevance "on reflection"? So how is it reasonable to demand this "human labor" of the students?

  • carschno 2 days ago

    I wanted to say this, but could not express it as well. I think what your points also reveal is the biggest success factor of ChatGPT: it can do many things that specialised tools have long been doing (better), but many ChatGPT users had not known about those tools.

    I do understand that a mere user of, e.g., OCR tooling does not perform a systematic evaluation of the available tools, although that would be the scientific way to choose one. For a researcher, however, the lack of knowledge about the tooling ecosystem seems concerning.

  • simonw a day ago

    Full quote:

    > Granted, Monte had unusually legible handwriting, but even “easy” early modern paleography like this is still the sort of thing that requires days or weeks of training to get the hang of.

    He isn't talking about weeks of training to learn to use OCR software; he means weeks of training to learn to read that handwriting without any assistance from software at all.

    • eviks a day ago

      And how would this change anything? If you needed to learn to read it before, despite being able to use OCR, why would this new tool let you skip learning it?

      Or, to get back to my original comment, if it's OK to be illiterate, why would you need weeks of training to use an alternative OCR tool?

      • simonw a day ago

        Why are you talking about spending weeks learning to use an OCR tool?

  • pjc50 2 days ago

    Do you know any OCR tools that work on early modern English handwriting?

    • conjectures a day ago

      I used to work for a historical records org. As of 10 years back, "OCR" for this kind of material meant getting humans to transcribe it. So whatever the limitations of genAI, my prior is against there being a perfectly good old-fashioned OCR solution to the 'obscure historical handwriting' problem.

    • carschno a day ago

      I would start here: https://www.transkribus.org/

      Experts in the field might know more specialized tools, or how to train an actually better Transkribus model, which doesn't require deep technical knowledge.

simonw 2 days ago

I'd love to read way more stuff like this. There are plenty of people writing about LLMs from a computer science point of view, but I'm much more interested in hearing from people in fields like this one (academic history) who are vigorously exploring applications of these tools.

  • dr_dshiv 2 days ago

    I’m working with Neo-Latin texts at the Ritman Library of Hermetic Philosophy in Amsterdam (aka Embassy of the Free Mind).

    Most of the library is untranslated Latin. I have a book that was recently professionally translated but it has not yet been published. I’d like to benchmark LLMs against this work by having experts rate preference for human translation vs LLM, at a paragraph level.

    I’m also interested in a workflow that can enable much more rapid LLM transcriptions and translations — whereby experts might only need to evaluate randomized pages to create a known error rate that can be improved over time. This can be contrasted to a perfect critical edition.
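
    As a rough sketch of that sampling idea (the function names, page counts, error counts, and the normal-approximation interval here are all illustrative assumptions, not the actual workflow), estimating a known error rate from randomized pages might look like this in Python:

      import math
      import random

      def sample_pages(num_pages, sample_size, seed=0):
          # Pick a random subset of pages for expert review.
          rng = random.Random(seed)
          return sorted(rng.sample(range(1, num_pages + 1), sample_size))

      def error_rate_ci(errors, words_checked, z=1.96):
          # Word-level error rate with a normal-approximation 95% CI.
          p = errors / words_checked
          half = z * math.sqrt(p * (1 - p) / words_checked)
          return p, max(0.0, p - half), min(1.0, p + half)

      # Hypothetical numbers: experts review 20 of 400 pages and
      # flag 38 errors across 5,200 transcribed words.
      pages = sample_pages(num_pages=400, sample_size=20)
      rate, low, high = error_rate_ci(errors=38, words_checked=5200)
      print(f"pages to review: {pages}")
      print(f"error rate: {rate:.2%} (95% CI {low:.2%}-{high:.2%})")

    Re-running this on fresh samples over time would show whether the error rate actually improves, which is exactly the contrast with a one-shot perfect critical edition.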

    And, on this topic, just yesterday I tried and failed to find English translations of key works by Gustav Fechner, an early German psychologist. This isn’t obscure—he invented the median and created the field of “empirical aesthetics.” A quick translation of some of his work with Claude immediately revealed the concept I was looking for. Luckily, I had a German around to validate the translation…

    LLMs will have a huge impact on humanities scholarship; we need methods and evals.

  • benbreen 2 days ago

    Thank you! Have been a big fan of your writing on LLMs over the past couple years. One thing I have been encouraged by over this period is that there are some interesting interdisciplinary conversations starting to happen. Ethan Mollick has been doing a good job as a bridge between people working in different academic fields, IMO.

grobbyy 2 days ago

A basic problem is they're trained on the Internet, and take on all the biases. Ask any of them who proposed edX to MIT or who wrote the platform. You'll get back the official PR. Look at a primary source (e.g. public git history or private email records) and you'll get a factual story.

The tendency to reaffirm popular beliefs would make current LLMs almost useless for actual historical work, which often involves sifting fact from fiction.

  • dmix 2 days ago

    Couldn’t LLMs cite primary sources much the same way a textbook or Wikipedia does? That's how you circumvent the biases in textbook and Wikipedia summaries.

    • simonw 2 days ago

      A raw LLM is a bad tool for citations, because you can't guarantee that their model weights will contain accurate enough information to be citable.

      Instead, you should find the primary sources through other means and then paste them into the LLMs to help translate/evaluate/etc, which is what this author is doing.

    • Almondsetat 2 days ago

      Circumventing the bias would mean providing a uniform sampling of the primary sources, which is not guaranteed to happen

dartos 2 days ago

This is a showcase of exactly what LLMs are good at.

Handwriting recognition, a classic neural network application, and surfacing information and ideas, however flawed, that one might not have had oneself.

This is really cool. This is AI augmenting human capabilities.

BeefWellington 2 days ago

Good read on what someone in a specific field considers to have been achieved (rightly or wrongly). It does lead me to wonder how many of these old manuscripts and their translations are in the training set. That may limit its abilities against any random sample that isn't included.

Then again, maybe not; OCR is one of the most worked-on problems, so the quality of parsing characters into text perhaps shouldn't be so surprising.

Off topic: it's wild to me that in 2025 sites like substack don't apply `prefers-color-scheme` logic to all their blogs.

satisfice 2 days ago

The intractable problem here is that “LLMs are good historians” is a nearly useless heuristic.

I’m not a historian. I don’t speak old Spanish. I am not a domain expert at all. I can’t do what the author of this post can do: expertly review the work of an LLM in his field.

My expertise is in software testing, and I can report that LLMs sometimes have reasonable testing ideas, but that doesn’t mean they are safe and effective when used for that purpose by an amateur.

Despite what the author writes, I cannot use an LLM to get good information about history.

  • simonw 2 days ago

    This is similar to the problem with some of the things people have been doing with o1 and o3. I've seen people share "PhD level" results from them... but if I don't have a PhD myself in that subject it's almost impossible for me to evaluate their output and spot if it makes sense or not.

    I get a ton of value out of LLMs as a programmer partly because I have 20+ years of programming experience, so it's trivial for me to spot when they are doing "good" work as opposed to making dumb mistakes.

    I can't credibly evaluate their higher level output in other disciplines at all.

    • xigency 2 days ago

      This raises the question: is this wave of LLM AI anything more than a fancy mirror? They're certainly very good at agreeing with people and following along but, as many have noted, not really useful for anything when acting on their own.

  • amelius 2 days ago

    You __can__ get good information from an LLM; you just have to backtrack every once in a while when the information turns out to be false.

    • userbinator 2 days ago

      > you just have to backtrack every once in a while when the information turns out to be false

      The problem is, how do you know? I've seen developers go completely off-course just from bad search engine results and one did admit he felt something wasn't right but kept going because he didn't know better; now imagine he's being told by a very confident but incorrect LLM, and you can see how hazardous that'll be.

      "You don't know what you don't know."

      • sebmellen 2 days ago

        Unless you have a very good understanding of the system you’re working on or the tools you’re using, it’s very possible to get knee deep in crap without knowing it. That’s one of the biggest risks of using LLMs as assistants.

      • simonw 2 days ago

        You need to develop skills like critical thinking, metacognition, analytical reasoning - being able to get to a robust mental model from a bunch of different inputs, some of which may even contradict each other.

        • userbinator 2 days ago

          People were generally already horrible at that before AI.

    • mvdtnz 2 days ago

      And therein lies the problem - if you're not already an expert, there's no way to tell when the right moment to backtrack is.

    • nithril 2 days ago

      The exact definition of a useful heuristic: "good enough"

jolmg 2 days ago

> explicación poética

> There are, again, a couple errors here: it should be “explicación phisica” [physical explanation] not “poetic explanation” in the first line, for instance.

The image seems to say "phicica" (with a "c"), but that's not Spanish. "ph" is not even a thing in Spanish. "Physical" is "física", at least today; IDK about the 1700s. So, if you try to make sense of it by assuming a nonsense word is you misreading rather than the writer "miswriting", I can see why it guesses it might say "poética", even though that makes less sense semantically.

  • benbreen 2 days ago

    Author here, I agree that my read may not be correct either. It’s tough to make out. Although keep in mind that “ph” is used in Latin and Greek (or at least transliterations of Greek into the Roman alphabet), so in an early modern medical context (i.e. one in which it is assumed the reader knows Latin, regardless of the language being used) “ph” is still a plausible start to a word. Early modern spelling in general is famously variable - it's common to see an author spell the same word two different ways in the same text.

    • jolmg 2 days ago

      > So, if you try to make sense of it by assuming a nonsense word is you misreading

      > I agree that my read may not be correct either

      Just in case, by "you", I meant from the POV of the AI, not you the author.

      That's interesting to know about "ph". I didn't know it was present in Latin, and I wonder if that's also the case with Spanish.

      • schoen 2 days ago

        I just looked in the Corpus Diacrónico del Español

        https://corpus.rae.es/cordenet.html

        and it found 33 hits for "phisica" and 99 for "phisico", mostly from the 1490s. Now some of these can be deceptive, like a few are from a bilingual Spanish-Latin book and occur in the Latin portions rather than the Spanish portions, but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.

        I don't know when the Iberian languages first got their more phonetic orthographies, especially suppressing that h (that was originally in Latin digraphs used to transliterate Greek letters θ, φ, χ).

        Edit: There are also about two dozen hits for physico/physica, interestingly more from the 1700s rather than 1400s.

        • jolmg 2 days ago

          > but it seems like some authors in the 1400s wrote "ph" in some Spanish words, at least when they knew the Latin or Greek etymologies.

          You know, that might be analogous to Spanish speakers familiar with English writing "tweet" in Spanish text, while being unaware that the RAE added "tuit"[1] to the language, which is more in line with general language rules. IDK if any Spanish speaker has ever written "tuit" in real life.

          [1] https://dle.rae.es/tuit?m=form

throwup238 2 days ago

> After all (he said, pleadingly) consciousness really is an irreducible interior fortress that refuses to be pinned down by the numeric lens (really, it is!)

I love this line and the “flattening of human complexity into numbers” quote above it. It sums up perfectly how I feel about the whole LLM to AGI hype/debate (even though he’s talking about consciousness).

Everyone who develops a model has to jump through the benchmark hoop which we all use to measure progress but we don’t even have anything approaching a rigorous definition of intelligence. Researchers are chasing benchmarks but it doesn’t feel like we’re getting any closer to true intelligence, just flattening its expression into next token prediction (aka everything is a vector).

  • voidhorse 2 days ago

    Yeah precisely. Ever since the "brain as computer" metaphor was birthed in the 50s-60s the chief line of attack in the effort to make "intelligent" machines has been to continually narrow what we mean by intelligence further and further until we can divest it of any dependence on humanist notions. We have "intelligent" machines today more as a byproduct of our lowering the bar for what constitutes intelligence than by actually producing anything we'd consider remotely capable of the same ingenuity as the average human being.

    • afthonos a day ago

      I find this take strange. My observation has been the opposite. We used to say it would take human intelligence to play chess. Then Deep Blue came up and we said, no, not like that. Then it was go. Then AlphaGo came up and we said no, not like that. Along the way, it was recognizing images. And then AlexNet came along, and we said no, not like that. Then it was creating art, and then LLMs came along, and we said no, not like that.

      I agree a narrowing has happened. But the narrowing is to move us closer to saying "if it's not implemented in a brain, located inside a skull, in a body that was developed by DNA-coded cells replicating in a controlled manner over a period of years, it's not really AI."

      There's an emotional attachment to intelligence being what makes us human that causes people to lose their minds when machines approach our intelligence. Machines aren't humans. If we value humanity, we should recognize that distinction—even as machines become intelligent and even sentient.

      And we should definitely think twice, or, you know, many many many many more times, before building intelligent machines. But I don't think pretending we're not doing that right now is helpful.

      • voidhorse a day ago

        I think that's a great take and though they appear contradictory, I actually think both perspectives are correct.

        I think what both viewpoints show is that, at the end of the day, intelligence is a broad, fuzzily defined thing, and attempting to claim that a single capability is evidence of intelligence always seems to be insufficient (from either direction).

        I also think your points about our own emotional attachment and about thinking carefully about intelligent machines are superb. I see a lot of people chasing certain tech right now and a far smaller number asking whether or not this tech is something we need or want. I personally don't need to live in a world in which robots are 1:1 emulations of humans (or better). I'd be just as content to live in a world of highly specific and highly optimized robots or "intelligences", each only capable of doing one thing really well (a Unix philosophy of "agents", as it were)

zwischenzug 2 days ago

I wrote this piece in 2023, which argues similarly that LLMs are a boon, not a threat, to historians:

https://zwischenzugs.com/2023/12/27/what-i-learned-using-pri...

  • adamredwoods 2 days ago

    >> One of the well-known limitations with ChatGPT is that it doesn’t tell you what the relevant sources are that it looked at to generate the text it gives you.

    This isn't a limitation; it's critically dangerous. Commercial AI is a centralized, controlled, biased LLM. At what point will someone train it to say something they want people to believe? How can it be trusted?

    Consensus-based information is still best, and I don't feel LLMs will give us that.

    • thom 2 days ago

      This is the thing I specifically use LLMs for when I’m doing history courses. I’ll remember some vague quote or event and ask for the primary sources, and the latest ChatGPTs are excellent at getting the right reference, which I can then look up and check myself. Maybe this works better for Latin and Greek texts because it’s gobbled up all the Loebs out there, but it works well for me.

    • lmm 2 days ago

      Consensus-based history has similar problems. It's extremely easy for the consensus to be distorted by contemporary politics.

    • delichon 2 days ago

      On the contrary. The heart of an LLM is a next-word predictor, based on statistics. LLMs do much the same with concepts, making them essentially consensus-distillation devices. They are zeitgeisters. They get weird mainly when their training data is too sparse to find actual consensus, so instead tell you to stick cheese to your pizza with glue.

      • astrange 2 days ago

        > They get weird mainly when their training data is too sparse to find actual consensus, so instead tell you to stick cheese to your pizza with glue.

        That's exactly not how that happened. It happened because Google's summaries are based on its search results, and one of those results contained that advice.

      • ericjmorey a day ago

        This is only useful if you know what data was used to train the model.

Animats 2 days ago

"LLMs, which are exquisitely well-tuned machines for finding the median viewpoint on a given issue..."

That's an excellent way to put it. It's the default mode of an LLM. You can ask an LLM for biases, and get them, of course.

  • astrange 2 days ago

    I don't think there is any reason to believe this except that everyone seems to want it to be true.

    An easy way to make it not be true would be to emphasize some sources in pretraining by putting them in the corpus multiple times.

    • miki123211 2 days ago

      A much better way is to RLHF the LLM until you get the behavior you want.

      As far as I know, modern LLMs try to strike a balance between being somewhat neutral and not being too neutral on topics outside of the Overton window. They'll give you a "both sides have their good points" argument on abortion, religion, guns or immigration, but won't do that for obvious racism or Nazi viewpoints.

      Early LLMs had a problem with getting this balance right, I feel like many of them were a lot more left-leaning. I don't know how much of the change is caused by us understanding the technology better and how much is just the political winds shifting, though.

      I felt like we had a moment there when some models were a bit too "well it depends", even on very uncontroversial subjects.

      • pjc50 2 days ago

        > I feel like many of them were a lot more left-leaning

        "Reality has a liberal bias"

    • dleeftink 2 days ago

      Maybe not 'median' but rather 'sufficiently representative': as with all distributional semantics, given a large enough corpus we can approach the 'true' distribution of words/phrases in a given language.

      • krainboltgreene 2 days ago

        Except the corpus itself is a fraction of all media. This is like saying Twitter is sufficiently representative of all human history.

gcanyon 2 days ago

I wonder (and hope) that, for any given issue, the majority of the internet/the training data, and therefore the model's output, will be fairly near the truth. Maybe not for every topic, but most.

E.g., the models won't report that unicorns are real because the majority of the internet doesn't report that unicorns are real. Of course, there may be issues (like ghosts?) where the majority of the internet isn't accurate?

tptacek a day ago

This was so good. I'm super curious to learn more about the strategies used to set up system prompts for the custom GPT that was set up here.

DennisP 2 days ago

It was pretty neat seeing this because a recent paper found that AI models are bad historians: https://techcrunch.com/2025/01/19/ai-isnt-very-good-at-histo...

But the gist of its argument just seems to be that they don't know fine details of history, and make the same generalized assumptions that humans would make with only a cursory knowledge of a particular topic. This seems unavoidable for a model that compresses a broad swath of human knowledge down to a couple hundred gigabytes.

Using AI as a research tool instead of a fact database is of course a whole different thing.

trgn 2 days ago

One thing I'd love is for models to help me confirm a thing or find the source of something I have a vague memory of, which may be right or wrong; I just don't know.

E.g. I have this recollection of a quote, slightly pithy, from around the 1900s, about hobby clubs controlling social life; maybe from Mark Twain, maybe not.

I just cannot come up with the prompt that gets me the answer; instead I get hallucination after hallucination, each confirming whatever I put in, like a student who didn't study for the test and just goes along with whatever the professor asks at the oral exam.

sloproth 2 days ago

In my experience, these AI models haven't been great with knowledge about one specific figure (like a President). I wonder if there's a movement to start introducing these AI models to books or e-books that aren't accessible online? I wish I could discuss the less publicly known details of historical figures' lives or upbringings with AI, but it's clear that the more niche information you can only read about isn't available to it.

ris 2 days ago

Still waiting for someone to train an LLM entirely on sources written before a chosen date, so you could discuss concepts with something that apparently lacks any knowledge of the world after that date.

  • lionkor 2 days ago

    Try to get an LLM to admit it doesn't know something, first

    • jfengel 2 days ago

      They're pretty apologetic about it. Then they tell you a different wrong thing.

      • lionkor 2 days ago

        That requires you to point out that it's wrong. Ask it something it can't know, and it will answer anyway.

  • EcommerceFlow 2 days ago

    It would be fascinating to try to get an LLM trained on pre-1900 data to discover Einstein's physics

    • jychang 2 days ago

      Wouldn't be too difficult. Poincaré/Lorentz/Hilbert were close to developing the same concepts behind special relativity around 1905 as well. If Einstein had randomly died in 1904, I think relativity would have been discovered within the decade anyway, just by combining their knowledge.

      Lorentz developed the Lorentz contraction independently of Einstein; he was just hampered by his adherence to the idea of the luminiferous ether as a medium for light propagation. I fully believe Hilbert plus knowledge of tensors (Einstein didn't know the concept of tensors in 1905! [1]) would have developed general relativity as well.

      [1] Einstein actually had the idea for general relativity well before he figured it out; he literally just lacked the mathematical knowledge to formalize it. He had to learn tensors in order to develop the Einstein Field Equations https://www.quora.com/How-did-Einstein-get-the-idea-that-he-...

      • suddenlybananas 2 days ago

        It wouldn't be difficult for a different (very intelligent) human, it would almost certainly be impossible for LLMs, they have never done anything remotely analogous.

  • Jordan-117 2 days ago

    "What would have happened if ChatGPT was invented in the 17th century? MonadGPT is a possible answer. MonadGPT is a finetune of Mistral-Hermes 2 on 11,000 early modern texts in English, French and Latin, mostly coming from EEBO and Gallica. Like the original Mistral-Hermes, MonadGPT can be used in conversation mode. It will not only answer in an historical language and style but will use historical and dated references. This is especially visible for science questions (astronomy, medecine). Obviously, it's not recommended to follow any advice from Monad-GPT." Available to install and run locally -- or you can try it out for free online."

    https://www.metafilter.com/201537/O-brave-new-world-that-has...

  • waveBidder 2 days ago

    It might work for, say, the post-1800s in literate countries, but for e.g. Rome our sources are so sparse and so far removed from the time they describe that it would be worse than nothing.

    • duskwuff 2 days ago

      For a period like the Roman Empire, there might be too little source material to even train the model to speak Latin, let alone to say anything about its world. IIRC, the entire surviving corpus of ancient Latin would fit into a couple of bookshelves - it's minuscule.

  • dataviz1000 2 days ago

    In the 1950s, most people believed that the Soviets made the biggest contribution to stopping the Nazis. However, today, most people think it was actually the Americans who played the biggest role in defeating the Nazis.

    > "In 1945, the French public said the Soviets did the most to defeat Nazi Germany - but in 2024 they're most likely to say it was the Americans"[0]

    [0] https://yougov.co.uk/politics/articles/49613-d-day-anniversa...

    • kranke155 2 days ago

      The Soviets put the men. The Americans put the materiel.

      Stalin thought he would’ve lost if it weren’t for Lend-Lease.

      • somenameforme 2 days ago

        The USSR might have lost if not for US supplies, but the Allies would definitely have lost if not for the Red Army. Technology has come a long way since the '30s, yet even contemporary wars again emphasize that in the end it all just comes down to manpower.

        And this is extremely remarkable if you think about it. Germany basically declared war on the world, and very nearly won.

        • bee_rider 2 days ago

          Germany was not in a position to hit the US. They weren’t a great naval power. They needed to cross the Atlantic, but the English Channel was too big a hurdle.

          Eventually the fact that the war was happening in their land would grind them down. Plus, the US had nukes and aircraft carriers by the end, which would have presented a challenging situation.

          • somenameforme 2 days ago

            Well you're making a huge shift without accounting for it. The land war picture would have been radically different without the Soviets. In total they deployed more than 34 million soldiers during WW2 [1]. That's substantially larger than the contribution of every other ally, combined. The second largest force was the US with a total of 12.2 million soldiers (by the end of the war). [2]

            So what would have happened in this scenario is difficult to even imagine, because Germany would have been under far less pressure. They were already working on the development of 'Projekt Amerika' [3]. It went nowhere, but without the pressures of the Red Army they would have had vastly more resources to expend on such ventures.

            [1] - https://en.wikipedia.org/wiki/Red_Army

            [2] - https://www.nationalww2museum.org/students-teachers/student-...

            [3] - https://en.wikipedia.org/wiki/Amerikabomber

            • bee_rider a day ago

              But you said the Allies definitely would have lost without the Soviet Union. Is it hypothetically possible that the Nazis might have invented some mystery weapon under less pressure? Maybe… but it isn't a sure thing.

              Getting a plane across the Atlantic, with the limited bomb and fuel load that entails, might not have accomplished much. If only a couple of bombs could have done the job, I guess the Allies wouldn't have been building all those bomber fleets, right?

              And they didn’t build a serious surface navy before the war, or in the first year-or-so when they were still trading partners with the Soviet Union. It seems there was something beyond manpower pressure holding them back.

              • somenameforme 19 hours ago

                A mystery weapon would not have been necessary. You're talking about a Nazi army that would now be facing an enemy force missing the majority of its soldiers, in a war that was already very close. The Nazis clearly would not have fallen, or even really come close to falling, without the Soviets.

                Now I think your argument is basically the same, but for America. In that the Nazis were in no position to invade and occupy America - which I fully agree with. But victory in a war doesn't require you occupy every single enemy nation. The most common way wars end is in settlement. And I see no possible path where the Nazis would not have been able to achieve a favorable settlement.

        • kranke155 a day ago

          I didn’t say it was either-or. It was a symbiotic relationship. The Soviets were willing (and, given their position, forced) to lose millions. It was a tragedy.

          The Americans were in a position to fund and aid it. Both Stalin and Khrushchev acknowledged that American material help was of great importance.

      • otabdeveloper4 2 days ago

        American Lend-Lease was 11% of the Soviet war effort. The vast majority of that 11% was consumer goods, not weaponry.

        • kranke155 a day ago

          Then why did both Stalin and Khrushchev say it was important for the victory?

          Sorry, but I’ve read tons of AskHistorians answers about this. I recommend you go there and search this same question. The Soviets needed that 11%. My understanding is that a lot of it helped mechanize the Soviet Army, i.e. trucks were delivered in large quantities.

          From Claude I got this:

          “In Khrushchev's memoirs, he recalled Stalin saying in a private conversation that without American aid through Lend-Lease, the Soviet Union "would not have been able to cope because we lost so much of our industry." Khrushchev himself wrote that Lend-Lease was "of utmost importance" and that "we would have been in a difficult position without it." “

        • kranke155 a day ago

          Here is some more from Claude (it sounds about right, but I haven't had time to check it):

          “The significance of Lend-Lease goes far beyond the raw percentage of industrial output. Here's why it was so crucial:

          1. Timing and Critical Shortages: The aid arrived during the most critical period (1941-1942), when Soviet industry was being relocated east of the Urals. During this vulnerable period, American trucks, food, and materials helped keep the Soviet army mobile and fed. Without this bridge of support during the industrial relocation, the USSR would have faced severe shortages at its most vulnerable moment.

          2. Strategic Materials and Bottlenecks: The Soviets received specific materials that were severe bottlenecks in their production: aviation fuel and high-octane gas; aluminum for aircraft production; radio equipment and communications gear; special grades of steel and industrial equipment. These materials were critical multipliers that enabled Soviet production.

          3. Transportation and Logistics: Nearly 450,000 trucks were provided, which revolutionized Soviet logistics. Before Lend-Lease, the Red Army relied heavily on horse transport. American Studebaker trucks allowed for rapid troop movements and superior logistics. This mobility was crucial for later Soviet offensive operations.”

      • StefanBatory 2 days ago

        And why did the Soviets put in the men? Because they started the war in the first place, together with the Germans.

        Too many people forget about that. They were allies at first.

      • dataviz1000 2 days ago

        [flagged]

        • AdieuToLogic 2 days ago

          > According to ChatGPT ...

          ChatGPT is not an authoritative source of knowledge in any way, shape, or form.

          To cite it as such is folly.

        • kranke155 a day ago

          I am not any of that.

          Good luck asking ChatGPT for world history.

          • dataviz1000 a day ago

            My last comment was meant as a joke. I should have made that clearer by emphasizing Poe's Law. My prompt included "this is a joke." Originally, I was only talking about data that shows how people's views of history change over time, whether those changes are true or not. I vaguely remember hearing that statement about ten years ago, and I searched the internet for a trustworthy source to support it.

            Continuing our discussion on how perspectives of history change: it might be that Americans deserve even more credit for this historical event than they received 70 years ago.

            The conversation was about how historical perspectives change over time, but the following comments focused on modern views of a specific historical moment.

    • jfengel 2 days ago

      That's very funny. I'd have thought they'd be hard pressed to get details of the Eastern front, while American involvement was right in front of them.

      As far as I can tell, the Americans and Brits took too much credit. Then the Soviets and Russians insisted on more credit -- arguably too much. Of late I'm hearing historians say "Yeah, the Germans overextended themselves at the start and likely would have lost even if Hitler hadn't betrayed Stalin". I'm sure that analysis too will change.

      • bee_rider 2 days ago

        I wonder if the “biggest contribution” was parsed differently in different eras. The Soviets clearly sacrificed the most, which probably was felt keenly by people who were surrounded by all the dying and personally experienced the violence.

        The US industrial contribution is easier to understand looking back. It makes a lot of sense to us nowadays, looking at it in a table (not to cheapen it, it was an astonishing amount of stuff that was produced).

        It seems entirely possible that the Soviets gave up more for the victory, while the US contributed more to victory.

  • aero142 2 days ago

    Are there any successful models that weren't trained with RLHF, or within a system using RLHF? I'm curious whether this could be done without a fine-tuning step that would meaningfully bias it.

  • Uehreka 2 days ago

    Normally I balk when commenters go “well then you’re the perfect person to go do it!”, but actually… this is the kind of thing that sounds like it could be a fun project if you’re legit interested. The necessary datasets are likely not hard to gather and collate; a lot of it is probably on places like Project Gutenberg or can be gleaned through OCR of images downloaded from various publicly available archives.

    Granted, you’d need to spend about a year on this and for a lot of that time your graphics card (and possibly whole computer) would be unusable, but then if the results were compelling you’d get a cool 15 minutes of internet fame when you posted your results.

    • tormeh 2 days ago

      I got my 15 minutes for a basically useless compiler and programming language that I spent 6 months on. Just on effort-to-result ratio, I feel like it's possible to do quite a lot better.

  • sloproth 2 days ago

    Yes! There's this measure of historical expertise that involves "eating the brains", so to speak, of the people living back then, such that if you time-traveled back to a bar or street in [insert period], you could carry on a conversation about the events going on at that time :) I would love something that uses newspaper fragments, books, etc. to simulate this experience!

  • csmpltn 2 days ago

    The only reason LLMs “work” is that they are trained on a vast corpus of (text-based) human interactions online. The main reason LLMs weren’t a thing 25 years ago was that there just wasn’t enough scrapeable and useful data available online…

    Reduce the dataset to “knowledge as of year 1880” - and it’s not certain you’d even be able to “interact” with the LLM in any meaningful way…

    • VierScar 2 days ago

      The main reason LLMs weren't a thing 25 years ago is that they hadn't been invented yet, nor had many of the prior steps. And even if they had been, we didn't have the compute to create them.

  • monktastic1 2 days ago

    I think I'm slow. Can you explain this again, maybe with more words?

    • kccqzy 2 days ago

      Let's say we choose 1900 as the cutoff date. That means during training the model is only able to access material written before 1900. It would have good knowledge of everything discovered in the 19th century and before. There's a great deal of mathematics, physics and chemistry available by then. What if we now engage that LLM in a discussion of something discovered after 1900? Say transmutation and nuclear weapons, or general relativity, or ZFC set theory.

      • VierScar 2 days ago

        Wouldn't it be easier to cut off at pre-2020-ish, and ask it to create the transformer architecture of GPT? 1900 is so long ago that I doubt most documents are good quality, if they've been digitised at all. Most likely just low-quality scanned images of inconsistent, half-illegible typewriter documents, transcribed with OCR at best.

        • kccqzy 2 days ago

          The problem I see with any date after the popularity of the internet is that you just can't be sure of the right date. A lot of traditional web forums now have backdated forum posts that are clearly made by an LLM with an implausible date: https://hallofdreams.org/posts/physicsforums/

          • throwup238 2 days ago

            You can use CommonCrawl - which has massive datasets going back to 2008 - and the Internet Archive.
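
            As a minimal sketch of the date-filtering step (assuming the warcio library and a local WARC file; the cutoff date is illustrative), filtering by crawl timestamp sidesteps the backdating problem, since a page captured in 2015 can't contain text written later:

              from warcio.archiveiterator import ArchiveIterator

              CUTOFF = "2020-01-01"  # ISO timestamps compare lexicographically

              def pre_cutoff_pages(warc_path):
                  # Yield raw page bytes from captures made before the cutoff.
                  with open(warc_path, "rb") as stream:
                      for record in ArchiveIterator(stream):
                          if record.rec_type != "response":
                              continue
                          crawl_date = record.rec_headers.get_header("WARC-Date", "")
                          if crawl_date and crawl_date < CUTOFF:
                              yield record.content_stream().read()

            Note the crawl date is only an upper bound on when the text was written, which is fine for a cutoff-trained model: it guarantees nothing newer leaks in.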

        • cellis 2 days ago

          Also, there's so little training data from that era. Like, exponentially more data was created after, say, <year when most records became digitized = 1970>

urbandw311er 2 days ago

How would one know that the translation of the Italian text (that he gives as an example) was not just already baked into the model’s training data?

cyrillite 2 days ago

Now the question is how can I, someone without a PhD in history but currently a PhD candidate in another discipline, use these tools to reliably interrogate topics of interest and produce at least a graduate level understanding of them?

I know this is possible, but the further away I get from my core domains, the harder it is for me to use these tools in a way that doesn’t feel like too much blind faith (even if it works!)

  • simonw 2 days ago

    I think the trick here is to treat everything these models tell you as part of a larger information diet.

    Like if you have a friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice. You quickly learn to treat them as a source of probably-correct information, but only as part of the way you learn any given topic.

    I do this with LLMs all the time: I'm constantly asking them clarifying questions about things, but I always assume that they might be making mistakes or feeding me convincing sounding half-truths or even full hallucinations.

    Being good at mixing together information from a variety of sources - of different levels of accuracy - is key to learning anything well.

    • serviceberry 2 days ago

      This strikes me as an odd claim. You don't hang around with a friend who makes things up because they somehow enhance your learning process. You hang around with them despite the fact they're annoyingly unreliable, presumably because you value their company for other reasons.

      Let's say you're trying to get a university degree, but you have a professor who makes up 20% of what they say. Is that helping you "learn well"?

      • tormeh 2 days ago

        Once everything has made it through the jungle telephone you're lucky if it's 80% correct. 20% wrong is a downright reliable source by human standards, at least about topics which people care about.

        • notTooFarGone 2 days ago

          But a human can tell you whether they are not too sure or completely sure.

        • krainboltgreene 2 days ago

          This is cute, but ultimately not true nor helpful.

      • brandall10 2 days ago

        You might want to read the academic criticisms of an influential pop history book written by an academic, such as Sapiens.

        And 20% is way overstated, especially for a SOTA model when it comes to verifiable facts.

        • serviceberry 2 days ago

          I don't think that's a useful comparison. Humans writing history books have agendas and biases, but they're usually fairly transparent. In contrast, LLM failure modes are more or less impenetrable and very non-human. You're just inexplicably served with some very convincing but incorrect stuff.

      • ismailmaj 2 days ago

      20% is a harsh figure, but it could be a good entry point for figuring out the unknown unknowns and then going in depth with more reliable sources once you have the relevant keywords.

      • edgineer 2 days ago

      Well, that sounds like oral history, which is how all people used to learn. Strictly fact-checking everything you say seems like a modern invention.

    • eslaught 2 days ago

      Confidence hijacks the human brain. Without direct, personal expertise or experience to the contrary, spending time around your hypothetical "friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice" is going to subconsciously influence your opinions, possibly without you even knowing.

      It's easy to laugh and say, well I'm smart enough to defeat this. I know the trick. I'll just mentally discount this information so that I'm not unduly influenced by it. But I suspect you are over-indexing on fields where you are legitimately an expert—where your expertise gives you a good defense against this effect. Your expertise works as a filter because you can quickly discard bad information. In contrast, in any area where you're not an expert, you have to first hold the information in your head before you can evaluate it. The longer you do that, the higher the risk you integrate whatever information you're given before you can evaluate it for truthfulness.

      But this all assumes a high degree of motivation and effort. Like the opening to this article says, all empirical evidence clearly points in the direction of people simply not trying when they don't need to.

      Personally, I solve the problem in my friend circle by avoiding overconfident people and cultivating friendships among people who have a good understanding of their own certainty and degree of expertise, and the humility to admit when they don't know something. I think we need the same with these AIs, though as far as I understand getting the AI to correctly estimate its own certainty is still an open problem.

    • vunderba 2 days ago

      > Like if you have a friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice. You quickly learn to treat them as a source of probably-correct information, but only as part of the way you learn any given topic.

      I can't speak to everyone's experience - but whenever I'm having a conversation around relatively complex topics with MY friends - the deeper they dive, the more they're constantly referring back to their dive computer. They'll also try to make arguments that are principally anchored to the pegs that they're convinced will hold. I'm aware I'm mixing metaphors here but the point stands.

      As far as "mixing information" - yes there are commonly known tricks to trying to get a more accurate answer:

      - Query several LLMs

      - Query the same LLM multiple times with different context histories

      - Socratically force it to re-assess itself

      - Provide RAG / documents / access to search engines

      - Force quantitative tests in the form of virtualized envs though this is more for Compsci/Tech/Math

      etc.
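
      Here's a rough sketch of the first two tricks (the query_model stub and its canned answers are invented stand-ins for real API calls; only the consensus logic is the point):

        from collections import Counter

        def query_model(model, prompt):
            # Hypothetical stand-in for a real API call to each provider.
            canned = {"model-a": "1685", "model-b": "1685", "model-c": "1683"}
            return canned[model]

        def consensus(prompt, models):
            # Normalize the answers and report the most common one with its share.
            answers = [query_model(m, prompt).strip().lower() for m in models]
            top, count = Counter(answers).most_common(1)[0]
            return top, count / len(answers)

        answer, agreement = consensus(
            "What year was J.S. Bach born? Reply with the year only.",
            ["model-a", "model-b", "model-c"],
        )
        print(answer, f"{agreement:.0%}")  # 1685 67%

      Low agreement doesn't tell you which answer is right, only that you should fall back on the slower checks above.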

      LLMs don't currently have a good sense of their boundaries - they can't provide realistic confidence scores and weight their outputs accordingly - the human equivalent of saying, "I only have a passing familiarity with the original Greek of the Septuagint, but I think...."

      Using an LLM as a glorified fact-checker is a poor use of it - it's far better as a tool for free-form exploration.

      > Being good at mixing together information from a variety of sources - of different levels of accuracy - is key to learning anything well.

      I have a pretty extensive background in teaching/education and I would heavily disagree with this assertion - at least when starting as a complete novice. The key to learning well is to establish a strong foundation by learning from the most accurate resources as possible. When you pick up a musical instrument, you don't want a teacher who's just one page ahead of you in the lesson book.

  • aquafox 2 days ago

    You ask them for references and check yourself. They are good exploratory and hypothesis-generating tools, but not more. Getting a sensible-sounding answer should not be an excuse to skip confirming it. Often, the devil is in the details.

  • kozikow 2 days ago

    > the harder it is for me to use these tools in a way that doesn’t feel like too much blind faith (even if it works!)

    I tend to ask multiple models and if they all give me roughly the same answer, then it's probably right.

    • energy123 2 days ago

      Also keeping context short. Virtually all my cases of bad hallucinations with o1 have been when I've provided too much context or the conversation has been going on for too long. Starting a new chat fixes it.

      You can see this effect in the ARC-AGI evals; too much context impacts even o3 (high).

    • aquafox 2 days ago

      > if they all give me roughly the same answer, then it's probably right.

      ... or they had a lot of overlapping training data in that area.

    • otabdeveloper4 2 days ago

      Or maybe they were just trained on the same (incorrect) dataset.

  • sdesol 2 days ago

    I wrote a chat app built around mistrust for LLM responses. You can see an example here:

    https://beta.gitsense.com/?chat=ed907b02-4f03-477f-a5e4-ce9a...

    If you click on the Evaluation links, you can see how you can use multiple LLMs to validate an LLM response. The evaluation of the accurate response is interesting, since Llama 3.3 was the most critical.

    https://beta.gitsense.com/?chat=fdfb053d-f0e2-4346-bdfc-7305...

    At this point, you would ask Llama to explain why the response was not rated 100%, which you can use to cross-reference other LLMs or to do your own research.

  • yannis 2 days ago

    I find them useful for summarizing the state of the art to get me going on a new topic, but then again, so is Wikipedia. A useful side angle: if you're using LaTeX, you can cut and paste references into ChatGPT and turn them into BibTeX format with >80% success. For a PhD study, though, starting from textbooks, papers etc. will be better; but they can augment successfully. Like any tool, use it for what it's best at.
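
    As a minimal sketch of that cut-and-paste step (using the OpenAI Python SDK; the model name and the sample reference are assumptions, and per the >80% figure the output still needs eyeballing):

      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

      # A hypothetical plain-text reference to convert.
      reference = ("Burke, Peter. The Renaissance Sense of the Past. "
                   "London: Edward Arnold, 1969.")

      resp = client.chat.completions.create(
          model="gpt-4o",
          messages=[{
              "role": "user",
              "content": "Convert this reference into a BibTeX entry. "
                         "Reply with only the entry:\n" + reference,
          }],
      )
      print(resp.choices[0].message.content)  # check fields before citing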

  • AdieuToLogic 2 days ago

    > Now the question is how can I, someone without a PhD in history but currently a PhD candidate in another discipline, use these tools to reliably interrogate topics of interest and produce at least a graduate level understanding of them?

    You can't, because LLMs are statistical generative text algorithms, dependent upon their training data set and subsequent reinforcement. Think Bayesian statistics.

    What you are asking for is "to reliably interrogate topics of interest", which is not what LLMs do. Concepts such as reliability are orthogonal to their purpose.

pelagicAustral 2 days ago

I'm not sure what good a system that only focuses on targeted truths will ever do for humanity; we already live in a world where stats are only valid if they do not offend a single person. The reason AIs are so doctored is that sometimes we just do not want to hear the truth, and so we don't.

britch 2 days ago

Interesting perspective. I appreciate that it tests the models at different "layers" of understanding.

I have always felt that LLMs would fall apart beyond summarization. Maybe they can regurgitate someone else's analysis. The author seems to think there's some level of intelligent creativity at play.

I'm hopeful that the author is right: that truly creative thinking may be beyond the abilities of LLMs and decades away.

I don't think the author considers the implications of broad societal use of LLMs. Will people be willing to fund human history grad students when they can get an LLM for a fraction of the price? Will prospective historians have gained the necessary training if they've used an LLM all through school?

I believe the education system could figure it out over time. I'm more worried that LLMs like this will be used as further justification to defund or halt humanities research. Who needs a history department when I can get 80% of one for the cost of a few ChatGPT queries?

rgmerk 2 days ago

I'm no professional historian, but every time I try this kind of thing I'm very disappointed in the results.

A hobby of mine is editing Wikipedia articles about Australian motorsport (yes, I have an odd hobby, sue me).

The vehicles in the premier domestic auto racing category in Australia, the Supercars Championship, are unique to the category. Like NASCAR, they're built on a dedicated space frame chassis with body panels that look like either a Mustang or a Camaro draped over the top.

I'd seen occasional claims on forums that when the organising body was deciding on the design of the current generation of cars, they considered using the "Group GT3" rules that are used for a bunch of racing series around the world (including the German DTM championship, the GT World Challenge events raced across Europe, Asia, and Australia, and the IMSA GTD and GTD Pro categories). If true, it might be an interesting side note to the article about the Supercars Championship.

So I asked Copilot (the paid model) to find articles in motor sport media about this (there are a number of professional online publications that cover the series extensively). It confidently claimed that yes, indeed, there was some interest in using GT3 cars in the Supercars championship, and pointed me to three articles making this case.

The first was an article featuring quotes from the promoter of the DTM series saying what a good idea it was to have a common car across different national series. So the first article was relevant, but didn't actually show that anyone involved in the administration of the Supercars Championship was interested in the idea.

The second and third references were articles about drivers and teams whose core business is the Supercars championship also running cars in the local GT3 championship (while not explicitly mentioned in the article, they do this for a large wad of cash from the rich hobbyists who co-drive and fund most GT3 racing). Copilot's interpretation of the articles was just flat-out wrong.

Yes, this was a sample size of one historical query, but its response was very poor.

  • simonw 2 days ago

    LLM-powered search usually isn't very good. If you watch what it's actually doing, it's running basically the same searches you would, then looking at the first 5-10 results and using those.

    If those 5-10 results aren't great, the LLM's response won't be great either.
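
    To make the shape of that loop concrete, here's a minimal, self-contained sketch in Python. Everything in it (web_search, Result, the llm call) is a hypothetical stand-in for whatever the real product does internally, not its actual API:

        from dataclasses import dataclass

        @dataclass
        class Result:
            url: str
            snippet: str

        def web_search(query: str) -> list[Result]:
            # Stand-in for a real search API; a real one would use the
            # query, this stub just returns canned results.
            return [
                Result("https://example.com/dtm", "DTM promoter praises common GT3 rules."),
                Result("https://example.com/gt3", "Supercars team also fields a GT3 entry."),
            ]

        def llm(prompt: str) -> str:
            # Stand-in for a model call.
            return f"(answer synthesized from a {len(prompt)}-character context)"

        def answer_with_search(question: str, k: int = 8) -> str:
            results = web_search(question)[:k]  # roughly the query a human would run
            context = "\n\n".join(f"{r.url}\n{r.snippet}" for r in results)
            # The answer can only be as good as these top-k results; weak or
            # off-topic snippets propagate straight into the final response.
            return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")

        print(answer_with_search("Did Supercars consider GT3 rules?"))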

  • energy123 2 days ago

    > So I asked Copilot (the paid model)

    Was this using o1? The author of the article was quite clear that his opinion doesn't apply to previous models.

    • rgmerk 2 days ago

      No, it wasn’t.

      Clearly, I’ll need to give o1 a try at some point to see if it does better.

socki 2 days ago

Wow what an incredibly interesting article. Thank you for sharing.

fencepost 2 days ago

Good tools for translations, etc? Sure!

Good historians? Ehhhhhhh.

The problem is one of trust: it's very difficult to trust the output of LLMs to be correct/true rather than merely "truthy" without extensive verification, which may be as laborious as doing the original research, or may be difficult or impossible without knowledge and understanding of internals and sources that may not be available.

afinlayson 2 days ago

This also means they'll be excellent at changing history for those who wish history were more aligned with their views.

  • sdesol 2 days ago

    I actually think changing history will be harder in the future as it requires alignment across models.

    • esafak 2 days ago

      Why wouldn't people use models of their preference, just as they do news sources?

      • sdesol 2 days ago

        They would, but I think challenging facts will be easier. Instead of saying "I heard it from blah" and not having an easy way to fact-check it, you just get LLMs to challenge one another. LLMs (today's, not future versions, which could be drastically different) don't have a built-in goal-post mover.

    • afinlayson 2 days ago

      That assumption presumes the existence of more than one model. Numerous models are currently being developed, and I hope prices decrease. However, if a single model captures 90% of the market, the others will cease to be updated, making it easier to control. What would transpire in an authoritarian country? Would they permit a disputed border to be included in that model? If that company were acquired by the first trillionaire, could they alter history to favor whatever wrongdoings they committed to achieve that status? Power is accumulating, not dispersing.

      • thomashop 2 days ago

        It feels like we've been moving in the opposite direction, where more and more models from various countries are state-of-the-art.

        The idea that there will be one model to rule them all seems very unlikely.

tolerance 2 days ago

This seems to sap the intrigue out of research. But I get it. My impression of academia is antiquated. People have Jobs to do. Capital J. And this is more convenient to them. Even though I think it makes them look sort of dumb. But that’s just me and I’m not an academic anyhow.

While I welcome the rise of parallel shadow institutions as civilization grows spiritlessly utilitarian, the future for common sense looks bleak.

  • voidhorse 2 days ago

    Yep, we are witnessing the climactic zenith of instrumental reason, operationalized and distributed on a worldwide scale. At least there's a contingent of semi-humanist thinkers left, but their number is growing worryingly slim.

3willows 2 days ago

On the last point, why struggle with history:

Robert Nozick (in The Examined Life) asked how we would feel if we found out that, say, Beethoven actually composed music based on a secret formula, one which is entirely mechanical and required no effort from him at all.

Would we still appreciate the music in the same way? If not, does our appreciation really stem from the fact that we feel he also struggled, like we do, and nevertheless produced something incredible?

I remember as a very small child watching figure skaters on TV and thinking "that's no big deal". And before I started programming: "it's just logic, all very straightforward". But that was before I first entered an ice rink or centre-d a div.

Maybe we don't really appreciate something unless we appreciate, in a visceral way, that it is hard.

  • vunderba 2 days ago

    > Beethoven actually composed music based on a secret formula, one which is entirely mechanical and required no effort from him at all.

    If he discovered the formula, then yes I imagine most people would appreciate the music just as much, if not more so.

    If he copied the formula from somebody else, then he was just turning the crank - a far more sterile and mechanical affair.

    Using Suno to "create" music is just turning the crank.

    Related, but much of Bach's music is just as incredible for its mathematically relational structures as it is for its pure virtuosity and brilliance.

    https://mathscholar.org/2021/06/bach-as-mathematician

    • 3willows 2 days ago

      To clarify: what Nozick meant was, what if Beethoven was just turning the crank?

      "Yet our experience of Beethoven's string quartets would be diminished if we discovered he had stumbled upon someone else's rules for musical composition, which he applied mechanically". (p. 38 of https://archive.org/details/examinedlife00robe/page/n15/mode...)

      I guess another way to put the question is this. Suppose there is an alien civilisation where their brains are hard-wired to make Beethoven level music automatically. Most of us can hum a tune without effort: these aliens can hum music that would strike us as original and compelling without much effort. How would we react then?

      Plus: nice pointer on the Math Scholar link. I remember loving the musical parts of Gödel, Escher, Bach. Wish there were a good interactive website where I could revisit all the content (and listen to all the music) in the browser.

    • unraveller 2 days ago

      This stolen valor mindset becomes absurd if Bach anonymously open-sources the formula he discovered and dies before releasing any music himself.

      Whatever music follows, however hard fought for - even output 1:1 identical to what Bach kept in a drawer - isn't beautiful? Just string plucking? That's not music appreciation; that's love of a reputation you can easily grasp and associate with.

      • 3willows a day ago

        I think Nozick's example is meant to make us re-think whether there is a strict separation between appreciation of music (or other art) and love of reputation.

        In Anarchy, State and Utopia, he tackles some utopian theorists' claim that, if equality prevails, everyone will rise to the level of the greatest writers and artists. Would people be content then? Or would they still want to vie for "eyeballs"? If the latter, should we just admit that there is a deep-seated human desire to compete for dominance?

        For what it is worth, I've written up my reflections on skimming Anarchy, State and Utopia here: https://books-blog.3willows.xyz/posts/2024-10-26-anarchy-sta...

    • visarga 2 days ago

      The artistic value isn't magically generated by the piano itself, nor by the LLM in isolation. It's the result of the skilled interaction, the human artistry expressed through this new and powerful instrument.

  • gcanyon 2 days ago

    > Maybe we don't really appreciate something unless we appreciate it is hard in a visceral way.

    Count me out of that "we" -- I appreciate the artist who put in the work because they put in the work to make the thing I like, but I don't appreciate the thing because of the effort. I can marvel at the effort required to produce art in a certain way, but I'm (largely) indifferent to the effort in my actual appreciation of the thing (or lack of it).

    I look forward to the time when I can have as much high-quality (to me) fiction to read as I like, because it's all generated by LLM. Some time after that, I'd love to see the main Star Wars sequence done properly. I won't care that it isn't created by a vast team of humans.

    • kranner 2 days ago

      To represent the other side: I enjoy reading Urdu and Persian poetry, but I will never be interested in reading anything generated by an LLM. No matter how 'high quality' it is represented to be, I'm aware that it was produced by a process that shares nothing with my own experience of the world. It has felt no hope, disappointment, fear, pain, mortality, loss of loved ones, or lack of control over itself; it has no world model that has changed over the years; and it doesn't know that it all doesn't amount to much in the end, and yet that this is all there is for it. It may turn out to be sentient in some way, but it's almost certainly not sentient in the way that I am sentient. I know it's just mimicking being human as instructed; to take it seriously devalues everything about my own humanity. I'm not ready for that kind of enlightened insight, I think.

      • visarga 2 days ago

        > I know it's just mimicking being human as instructed; to take it seriously devalues everything about my own humanity.

        Since it is mirroring human culture, why do you see it in such a negative light? Instead, see it for what it is: an interactive reconstruction, or maybe a microscope for zooming into any idea.

        • kranner 2 days ago

          I’m happy to use LLMs in all other contexts, quite enthusiastically actually. I’ve got DeepSeek 32B running locally on a beefy PC already.

          It’s just in the context of poetry, and literary writing in general, that I feel differently about them. There’s also the fact that I haven’t read all that human writers and poets have already written (and never will be able to in this short life), so there’s no need to turn to synthetic output. No supply problem exists. Poetry in particular is something to ponder over and over; you can’t really run out.

      • gcanyon 2 days ago

        Sure, that's your choice/preference, so good for you (sincerely).

        • kranner 2 days ago

          Thanks, I respect your choice as well.

      • boredhedgehog 2 days ago

        > It has felt no hope, disappointment, fear, pain, mortality, loss of loved ones, lack of control over itself, a world model that has changed over the years, and doesn't know that it all doesn't amount to much in the end and yet this is all there is for itself.

        You can't know what any poet felt or didn't feel while writing a poem. Perhaps it was a commission piece, or an experiment or an emulation of something the poet had heard elsewhere.

        And more generally, whether the specific emotion another man feels is similar or even comparable to your own is also unknowable. He might use the same word to describe it, but the subjective experience associated with it might be completely different, and completely impossible to share.

        • kranner 2 days ago

          Yes but at least it was possible for that poet to have felt what I feel they might have felt while writing that poem. And the closer they are to me culturally the more likely it is that I am not misidentifying their emotions entirely.

          Also, poems are not really puzzles to be solved. If it produces an effect and is solid craft-wise, that is enough. There’s a lot to the craft side, btw, in the Urdu and Persian ghazal form, which is what I had in mind while writing my original comment. LLMs can easily master the craft but have nothing to do with the effect. Their output is pure form without substance.

          Edit: I want to add that ambiguity (ابہام) is even a desirable property in Urdu ghazal, specifically. The more interpretations a couplet can have, the greater is the accomplishment in terms of craft.

    • majormajor 2 days ago

      > I look forward to the time when I can have as much high-quality (to me) fiction to read as I like, because it's all generated by LLM

      Are your tastes so hyper-specific that we aren't already in this world? Fiction is (even pre-LLM) easier to find in whatever genre you want than ever.

      • gcanyon a day ago

        I think it's possible to slice preferences endlessly, and I do think I'm a bit unusual.

        I gave ChatGPT a list of my favorite SF novels, and a brief description of why, and asked for similar works. It recommended 10 novels, three of which I'd read and which weren't in the sweet spot. Also, everything it recommended was 30+ years old -- to be fair, the same is true of the list I gave it, but I think it goes against your point that there's an unlimited supply.

        So I told it about those three and asked it to adjust and give more recent works, and it obliged. One of the new recommendations was from the Culture series, of which I've read one book that wasn't my jam. Another was Project Hail Mary by Andy Weir; I've read and enjoyed The Martian, but I'm betting that's the only Andy Weir I'll like. The others I'll have to check out.

        It's an interesting exercise.

    • kannanvijayan 2 days ago

      > I look forward to the time when I can have as much high-quality (to me) fiction to read as I like, because it's all generated by LLM. Some time after that, I'd love to see the main Star Wars sequence done properly. I won't care that it isn't created by a vast team of humans.

      I think the problem here is analogous to the "500 channels and nothing to watch" issue in the heyday of cable.

      Ok, let's say you have an LLM in your hand that can generate any story you want, high quality. So you say "tell me a story" and it tells you a story. But what story? Who is in it? What characters? Why are they there?

      The only novelty going into this is the prompt. Everything else is regurgitated weights and probability associations. The question is: does the full infinite closure of recombination over some finite learning set (no matter how large) encompass enough of the essence of creativity to produce something "new"?

      This is a hard question to answer because it forces us to try to define creativity, or lacking that - at least try to identify where it comes from.

      I don't have a clear answer to this but I'll suggest a line of thinking that seems plausible.

      When a person writes a story, it's not derived as an amalgam of everything they have read. It's not some probabilistic weighted average of all those associations. The story they write is also derived from their lived experience. Their personal interactions, their observations, their musings, their passions, their fears.. and how all of those things interact with their circumstance, influencing their reactions, those reactions influencing their environment, and that feeding back into the above process.

      There are two components that seem important here: the first is the existence of a rich, dynamic, and active _dialogue_ between the mind and its environment. It's not static, and it involves a feedback loop between the mind and the environment it models.

      The second is a motive force. For humans the origin is biological: fear, hunger, satiation, arousal, etc. - those core primitive emotional drives that originally developed to help us survive, but were then layered over with an intellect that elaborated on them. What originated as a motive force driving the mating instinct evolves into a sonnet about an unattainable maiden. The fear of the dark that keeps us away from the places where we would be eaten evolves into stories about unfathomable creatures and impossible colours that drive men insane.

      And I think there's a third one that's unelaborated and implied but should be made explicit: introspection & reflection. The ability to consider your choices and consequences with respect to your motivations, and adjust any number of things - from the motivations themselves, to expectations/understanding.

      This creature would have a lived experience, some underlying motivations, and a feedback loop established between the two using introspection. I have no idea how you'd build any of that.. but it feels like that's what you'd need before you got yourself a good storyteller.

      But by that point, you'd also be compelled to question whether or not it's even ethical to force it to tell you a story anymore.

      I don't think it's impossible that some broader AI system eventually is capable of genuinely creating creative output. LLMs are not that, though.

      They seem more like a substrate.

  • LincolnedList 2 days ago

    The value of art is in meaning and context. Purely generative art is as meaningful as a pretty rock. Think of the models as a camera. If you take shots from a car's dash cam in a city at random, you might happen upon some really beautiful photos. But this is chance; the camera didn't create the city or its scenes. A photographer can choose or create meaningful scenes because he has a mind, consciousness, and life experience.

    • derektank 2 days ago

      Pretty rocks (e.g. mountains, gems, etc.) are frequently ascribed substantial meaning, despite the fact that no consciousness had a hand in creating them.

      • Retric 2 days ago

        Raw gemstones are generally uninteresting until people shape them. Diamonds worth thousands may not even qualify as interesting enough to pick up in their raw state, assuming you don't know what they are.

        Mountains get meaning as aspects of our environment, but try to name your top 10 most aesthetically pleasing mountains. At least for me, I may appreciate a scenic view, but I just don't think of mountains in that kind of context.

    • visarga 2 days ago

      > A photographer can choose or create meaningful scenes because he has a mind, consciousness and life experience.

      But so does a user. Users don't prompt "draw a dog"; they give 3 lines of intricate details and iterate a dozen times until it looks right. It's not as if these models work all on their own.

    • fallinditch 2 days ago

      So is it still art if a photographer is driving the car with a dash cam, driving it with the intention of capturing great images, and then going over all the captured frames to find the best ones?

      I would say yes, this dash cam technique can be an artistic method. Reminds me of Jon Rafman's wonderful Nine Eyes project - he captures screenshots from Google Streetview, see https://9-eyes.com

    • gedy 2 days ago

      There are people who define art that way, and there are those who define it as beautiful things. I'd personally rather own and display something beautiful made by an algorithm than much postmodern art, which is frequently visually unpleasant in spite of being rich in some message.

      Our brains are drawn to some things visually for instinctive reasons, and I don't need a big message when I'm decorating or wanting to please the eye.

      • LincolnedList a day ago

        A rose is beautiful; a painting of a rose less so, since it's two-dimensional and lacks many aspects of the true rose. But the painting has more value, because there are millions of roses, while the painting captures the painter's unique experience of viewing the rose and transmits it to the person viewing the painting.

        Unless there are ghosts in the shell, MidJourney gives an approximation of what a painting of a rose looks like. It's like an aggregate function that averages a million artists. It's a weird concept.

      • guax 2 days ago

        That's a very reductive and limited way to look at art. You're right in framing it as decoration, but I would not conflate the two. They are different things, both valuable in their own right.

  • satvikpendem 2 days ago

    Have you seen the documentary Tim's Vermeer? Its thesis is that Vermeer, using lenses that were advanced for the time, was able to paint essentially mechanically rather than through a fine grasp of artistic brushstrokes in the traditional sense. Some, including voices in the documentary itself, think that this would ruin the purpose of the art, but I see it differently, especially with all of these AI artists now online: the intent of the human making the art is all that matters, not the instrument or manner in which they do it.

    • Verdex 2 days ago

      I've got a similar outlook w.r.t. Tim's Vermeer.

      In my mind, art has always been a technological endeavor. Language, writing, and grammar are all tools. Brushes, stroke technique, and paint composition are all tools. I heard a story about Tony Hawk pioneering some skateboard move, being the first in the world to get it right - and then seeing some teenagers doing the same thing years later in a park.

      Real artists learn what is possible and then develop tools to break those limits.

  • conception 2 days ago

    The real value of almost everything is based on effort, I think. The best gifts aren't the ones that are the most expensive but the ones the giver put the most effort and time into. One of the reasons I like pre-CGI film is that the amount of skill and effort put into the FX is astonishing. Claymation and stop motion don't look amazing - it's the effort.

    And to your point, knowing how much effort really goes into something often requires a bit of experience to really appreciate it.

    • pinoy420 2 days ago

      The only claymation that looks good is that of Aardman Animations. Everything else looks like absolute garbage.

      • spencerflem 2 days ago

        Aardman is incredible and their polish is wonderful, but this is not true.

        Fantastic Mr Fox, Coraline, Jack Stauber's OPAL, etc. are also very beautiful.

        • jfengel 2 days ago

          The first two are stop motion, but not claymation. The third is claymation, but I'm hard pressed to call it "beautiful". Striking, to be sure, but also conspicuously ugly.

          At this point Aardman is also doing a lot of non-clay stop motion, but clay is still the core of their work.

          • spencerflem 2 days ago

            Oh good point, missed the distinction

      • edm0nd 2 days ago

        Wrong. Celebrity Deathmatch is the best claymation.

  • manquer 2 days ago

    Why is this a dilemma at all?

    Appreciating a piece of music should be based purely on the merits of its content, not on whether it was easy to create.

    The background, ethics, skill, or even creative process of the people behind it has no bearing on whether the music itself is good or how much we like it; even if Hitler had written the 9th symphony, it would still be just as great a masterpiece. To consider anything but the merits of the output is a slippery slope of deciding which biases are acceptable, one that inevitably ends up being racial or at least exclusionary.

    Even if it was not as difficult as you imagined it to be, he was still the first to find it, or at least the first to popularize it, and that is all that matters.

    • bormaj 2 days ago

      I think it's fair to say that the context in which a work was created adds to the novelty and ingenuity of its existence. These works don't exist in a vacuum, and there's certainly a difference between a symphony created by Beethoven in his time and setting and a symphony produced as some model's output.

      Sure, they may functionally have the same effect or enjoyment, but appreciation of a fine work goes deeper than its function.

  • AndrewKemendo 2 days ago

    You’re assuming there is any coherent or consistent epistemological grounding for the average person’s beliefs.

    It’s a fool’s errand - it’s an infinitely small set of people who can accurately describe their reasoning - even fewer have consistent reasoning - fewer still have coherence between beliefs.

    The ones that do, we call either monks or crazy.

    I’d argue people aren’t even coherent enough to know how or what to appreciate.

    • pinoy420 2 days ago

      And, of course, you happen to be one of them no doubt :)

      The classic internet philosopher’s lament: everyone else is irrational, inconsistent, and incapable of coherent thought—except, of course, the enlightened commentator making the claim. The irony is that this kind of sweeping generalization is itself an incoherent mess, built on vague cynicism rather than any serious engagement with human reasoning. If you actually believe that consistency and coherence are so rare, what exactly do you think you’re demonstrating here? Because from where I’m sitting, it looks less like deep insight and more like self-important nihilism masquerading as wisdom.

      • igravious 2 days ago

        > The classic internet philosopher’s lament: everyone else is irrational, inconsistent, and incapable of coherent thought—except, of course, the enlightened commentator making the claim.

        But that wasn't the claim. @AndrewKemendo said "it’s an infinitely small set of people who can accurately describe their reasoning - even fewer have consistent reasoning - fewer still have coherence between beliefs". So he didn't say that everyone else is irrational; he said that very few can accurately describe their reasoning. And I think this is true. Very few take the time to introspect. Fewer still do so to the point that they are consistent in their thinking. And fewer still analyze their values and beliefs and get them to square up with each other. There is nothing controversial here. It's demonstrably true; all one has to do is listen to people carefully and probe them to motivate their reasoning every now and again.

        > The irony is that this kind of sweeping generalization is itself an incoherent mess, built on vague cynicism rather than any serious engagement with human reasoning.

        I reject that it's a "sweeping generalization" – I assert that most if not all people who spend enough time carefully introspecting and observing others must necessarily come to this conclusion. What about the claim is an "incoherent mess"? Clearly this is a personal peeve of yours, because your response is emotional and doesn't refute the claim in any decent way.

        > If you actually believe that consistency and coherence are so rare, what exactly do you think you’re demonstrating here?

        That's a logical fallacy.

        > Because from where I’m sitting, it looks less like deep insight and more like self-important nihilism masquerading as wisdom.

        Twaddle.

        Emotional twaddle.

        • darkerside 2 days ago

          Have you considered that it isn't that these people don't understand or can't express their motivations, beliefs, and values, but rather that they feel zero need to justify them to you or anyone else who questions them with the sole intent of proving themselves correct?

          • throwup238 2 days ago

            Generally the questioning comes after they’ve already insisted on expressing their values.

      • AndrewKemendo 2 days ago

        You’re free to read all my writings and evaluate for yourself

        • pinoy420 2 days ago

          > Iraq war veteran

          No thanks. Not really interested in the views of a murderer.

daveguy 2 days ago

[flagged]

  • simonw 2 days ago

    Did you read the article or are you just reacting to the headline?

    • daveguy 2 days ago

      Yes, I read the article. Using these black-box jumbles of weights to interpret historical documents is anathema to historical study.

      "I’m told that OpenAI’s newish o1 model is genuinely helpful and creative when it comes to thinking through open problems in the sciences..."

      "Likewise, although my knowledge of Italian is not great, I can read it well enough to confirm that the translation it offers is good enough to use for research:"

      Using a translation for research that you couldn't perform yourself seems extraordinarily substandard for a historian.

      The only part I agree with is a simple search to identify relevant sources that you had not considered - i.e., primary sources to be examined directly, the way historians have done it for millennia.

      I don't think history should be filtered through model hallucinations. It seems an invitation to mistakes.

      The reference to a "medallion, or seat of humors, or badge of office" is obviously to something being held. The historian specifically says it is not any of these, but a "urine flask." A historian obviously should not take an LLM's output as factual.

      Later, the author writes, "After all: when you get down to it, o1 talking about a panopticon and Foucault in the above snippet is very, very similar to what a first year history PhD student might produce."

      This is exactly the point. It's a mediocre average of the writing that a first-year student would produce. Sure, it could be used for "I hadn't considered that," but it surely should not be used for any factual interpretation.

      • simonw 2 days ago

        "Using a translation for research that you couldn't perform yourself seems extraordinarily substandard for a historian."

        Are you saying historians should only ever consider sources in languages they are personally fluent in?

        • StefanBatory 2 days ago

          If they're trying to translate it themselves, then yes, for sure.

          • daveguy a day ago

            Exactly, thank you. Or with the help of a translator who is fluent in both - not when that other entity has a tendency to make up plausible-sounding bullshit.

fumeux_fume 2 days ago

Are LLMs good historians? Of course not. These types of articles always have some click/rage-bait title declaring AI supremacy at whatever task. I have used ChatGPT 4o to help translate old high German from broadsides printed in the 15th-16th centuries into English, and it seems to work pretty well. I don't think I'm doing serious, ground-breaking research, but I feel like LLMs open doors and expand access to many things that were once completely locked without specialized knowledge.

  • p3rls 2 days ago

    LLMs are trained on far too many Reddit posts and top-12-kangs-of-ancient-history-type blogs to even contend with Wikipedia for anything beyond the surface level. Thucydidean-level insights? Forget it.

    • astrange 2 days ago

      "Training on" a website doesn't mean the output will agree with the website.

      (eg: imagine pretraining where some of the documents are prepended with a "this is a bad example" token.)
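
      A minimal sketch of that idea, assuming entirely hypothetical names (QUALITY_TOKENS, looks_low_quality) and a toy keyword heuristic where a real pipeline would use a trained classifier or source-level metadata:

          QUALITY_TOKENS = {"good": "<|good_example|>", "bad": "<|bad_example|>"}

          def looks_low_quality(doc: str) -> bool:
              # Toy heuristic standing in for a real quality classifier.
              return "upvote" in doc.lower()

          def tag_corpus(docs: list[str]) -> list[str]:
              # Prepend a control token so the model learns to associate the
              # token with the kind of text that follows; at inference time
              # you condition on the "good" token to steer generation away
              # from the rest.
              return [
                  f"{QUALITY_TOKENS['bad' if looks_low_quality(d) else 'good']} {d}"
                  for d in docs
              ]

          corpus = [
              "Thucydides locates the war's causes in Spartan fear of Athenian power.",
              "top 12 kangs of ancient history!! upvote if you agree",
          ]
          for line in tag_corpus(corpus):
              print(line)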

      • p3rls a day ago

        I just mean the format and topics. When I use o1, it's great for things like calculating square-mile comparisons based on an uploaded map of tribes, or translating script like the OP's example, but as far as history itself goes - drawing inferences and making sense of primary sources - not so much. Ask a question about anything in-depth and it feels like you're interacting with Buzzfeed, not Gibbon.

  • dang 2 days ago

    Ok, I've replaced the title above with more representative language from the article.

abathur 2 days ago

Someone who knows a lot of history is a history buff.

A historian works with (and may even seek out in musty rooms) primary and secondary sources to produce novel research and interpretation.

An AI is at best limited to ~reading sources that human historians/archivists/librarians have already identified and digitized.

There's certainly value to be had here wrt finding needles in, and making sense of, already-digitized historical records, but that's more like a research assistant.

  • yannis 2 days ago

    Yes, it is more like a research assistant. The "novel research and interpretation" part is your own synthesis, deserving to be published or awarded a degree, and a research assistant can save you a lot of time. As AI companies put more money into their training data, or tools become available for researchers to easily enhance it by uploading their own data, the answers will become more "accurate" and more detailed.

  • AlotOfReading 2 days ago

    A significant amount of historical work is re-analyzing existing, known material rather than seeking out novel sources.

    I do know some people working in classical literature who have been testing LLMs against untranslated sources and finding that they perform reasonably well. It's completely within the scope of possibility to imagine them becoming more useful for academic work over time.

  • otabdeveloper4 2 days ago

    > Certainly value to be had here

    I don't agree. You won't cite an LLM in an academic paper as a source (since it's unverifiable and not reproducible), and claiming an LLM's output as your own original work would be fraud. So unless you never plan on publishing anything, what's the point?

petermcneeley 2 days ago

Brings a whole new meaning to "history is written by the winners"

option 2 days ago

Yeah, especially the ones from China /s

otabdeveloper4 2 days ago

[flagged]

  • dang 2 days ago

    "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

    https://news.ycombinator.com/newsguidelines.html

    Edit: I think part of the problem here was the title. I've replaced it with more representative language from the article.

  • simonw 2 days ago

    He's a historian. He's using LLMs to assist, a little, in the evaluation of those sources.

    • otabdeveloper4 2 days ago

      [flagged]

      • dang 2 days ago

        If you familiarize yourself with Ben's work you'll soon discover that he is no fraud.

        Also, on HN please don't make it look like you're quoting someone when you're not. That's an internet snark trope and we're trying for something quite different here. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.