This tool might be useful for quick one-off referencing, but most people will probably be better off using a proper citation manager like the open-source Zotero.
Keep Zotero/Mendeley for collection management; use this simple tool when you just need the formatted references list in five seconds.
Where it helps
- Deep-dive reading – fetch a bulk RIS file, dump a seminal paper's entire bibliography into Zotero/Mendeley, and follow the threads.
- Bulk citing – grab BibTeX entries for a cluster of related papers without hunting them down one by one.
- LLM grounding – feed language models a clean reference list so they stop hallucinating citations.
Did you just use an LLM to write this reply?
Zotero can't extract references from a paper to read later – or at least, if it can, I've been using it wrong for years now.
Is an open-source library being used for this? Or can you describe the methods you use? I worked on this and related problems around extracting features from paper PDFs, we could all learn from how you did it.
Generally, an About page is always appreciated for such web tools with minimal UX, particularly when it's rather automagical.
It's actually open source. Here's the repo: https://github.com/mireklzicar/doi-reference-extractor
APIs used:

OpenCitations API (v2)
- Endpoint: https://opencitations.net/index/api/v2/references/
- Purpose: retrieves a list of all references from a paper by its DOI
- Data format: JSON containing cited DOIs and metadata

DOI Content Negotiation
- Endpoint: https://doi.org/{DOI}
- Purpose: fetches metadata and formatted citations for DOIs
- Formats: BibTeX, RIS, CSL JSON, RDF XML, etc.
- Implements CSL (Citation Style Language) for text-based citations

Local Citation Style Files
- Purpose: provides access to thousands of citation styles
- Storage: pre-generated JSON files with style information
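As a rough sketch of that two-call flow (the helper names are mine; the "cited" field format follows the OpenCitations v2 docs, but treat it as an assumption, not a spec):

```python
import urllib.request

# OpenCitations v2 expects the identifier scheme as a prefix, e.g. "doi:10.1000/xyz"
OC_REFS = "https://opencitations.net/index/api/v2/references/doi:{doi}"

def extract_cited_dois(rows):
    """Pull plain DOIs out of OpenCitations v2 reference rows.

    Each row's "cited" field is a space-separated list of identifiers,
    e.g. "omid:br/0612... doi:10.1000/xyz"; keep only the doi: ones.
    """
    dois = []
    for row in rows:
        for ident in row.get("cited", "").split():
            if ident.startswith("doi:"):
                dois.append(ident[len("doi:"):])
    return dois

def fetch_bibtex(doi):
    """Resolve one DOI to BibTeX via doi.org content negotiation;
    doi.org redirects to the registration agency's metadata service."""
    req = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Everything after that – RIS, CSL JSON, styled text – is just a different Accept header on the same doi.org request.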
In this case it's querying the relevant DOI registration agency's API (statistically, that's likely Crossref) for the metadata that the publisher themselves registered. So it doesn't look like there's any extraction going on here.
Could you share _your_ work though? It's always interesting to see new approaches to metadata.
Traditionally, it was a bit of a one-way street (data comes from publisher) but there's some interesting work being done by COMET [0] and (separately) OpenAlex [1] around cleanup of the publisher-supplied data within the community.
(I used to work at Crossref; am a little involved with COMET)
[0] https://www.cometadata.org/
[1] https://openalex.org/
Looks like it's just calling the Crossref API.
You can look at the network requests to see what it's doing. It's querying the OpenCitations database followed by the DOI.org content negotiation endpoint, which 302's to Crossref (or whoever the relevant DOI registration agency is).
More info on content negotiation:
https://citation.doi.org/
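For instance, citation.doi.org documents a text/x-bibliography media type that takes a CSL style parameter; a request for a styled plain-text citation can be built like this (the helper name is mine):

```python
import urllib.request

def citation_request(doi, style="apa"):
    """Build a doi.org content-negotiation request for a styled
    plain-text citation (CSL style names such as "apa" or "ieee")."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "text/x-bibliography; style=" + style},
    )

# Passing the request to urllib.request.urlopen() follows the 302 to the
# registration agency's service and returns the formatted reference text.
```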
One suggestion: show the full reference of the DOI entered.
I wonder how OpenCitations populates their data? One example I tried showed 9 references where the paper had 30+.
Well, it has some caveats: a) the papers need to be in Crossref, which covers roughly 70% of DOIs; b) it works poorly with preprints, for instance. The advantage is that it's fast and doesn't need to download or extract anything from the PDF – though for 100% reliability, extraction would probably be necessary.
How does DOI interact with blockchain? I did a quick Google search and didn’t find much (lots of mismatches against “DAO”). Does DOI need blockchain for any legit reasons, like provenance?
I’m no blockchain evangelist in its current state of “value” but this seems like a great test case for resolving the academic or legitimate origin of published material.
DOI has nothing to do with blockchain. There's no great looming issue with resolving the legitimate origin of published material. There's no provenance problem to solve. There's a registration problem, that has been solved, and for which blockchains are a terrible fit.
The current naivete of the scientific community is exhausting.
As if the current political climate isn't going to result in the sabotage of scientific infrastructure once some state actor decides that it could provide an economic or military advantage. (Hello, Three-Body Problem.)
DOIs should have been hashes; that would have been cheaper, more resilient, and more convenient. But sadly, librarians tend to rebuild paper workflows digitally instead of building digitally native infrastructure.
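To illustrate what a hash-based identifier would look like (a sketch of the idea, not any existing scheme):

```python
import hashlib

def content_id(artifact_bytes: bytes) -> str:
    """A content-addressed identifier: the SHA-256 digest of the artifact
    itself. Unlike a registry-assigned DOI, anyone can recompute and verify
    it from the file alone, with no resolver in the loop."""
    return "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()
```

The trade-off is that the identifier changes with every byte-level revision of the file, so versioning and descriptive metadata would still need some shared log or registry alongside the hash.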
Blockchain would be fine as a timestamp service to replace publishers, although a consensus-based system hosted by the world's libraries would also serve that purpose and require a lot less machinery.
DOIs could be stored for lookup in a blockchain. Isn't there currently a centralized single point of failure in DOI and ORCID resolution?
Users would generate and centrally register (or be issued) a W3C DID keypair with which to sign their ScholarlyArticles and peer-review CreativeWorks.
W3C DIDs (Decentralized Identifiers) solve for what DOI and ORCID solve for, without requiring a central registry.
W3C PROV is for describing provenance; PROV RDF can be signed with a DID secret key.
PDFs can be signed with their own digital signature scheme, but there's no good way to publish Linked Data in a PDF (prepared as a LaTeX manuscript, for example).
Bibliographic and experimental-control metadata only goes so far in assuring the provenance and authenticity of the articles, data, and peer reviews that legitimize them.
From https://news.ycombinator.com/item?id=28382186 :
>> JOSS (Journal of Open Source Software) has managed to get articles indexed by Google Scholar [rescience_gscholar]. They publish their costs [joss_costs]: $275 Crossref membership, DOIs: $1/paper: