(Go: >> BACK << -|- >> HOME <<)

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

National Library of Poland ISBN search #3036

Merged
merged 5 commits into from
Jun 15, 2023
Merged

Conversation

Eccenux
Copy link
Contributor
@Eccenux Eccenux commented May 20, 2023

Hi. I've added BN search for Zotero, to be used hopefully on Wikipedia (Citoid) in future.

This is a simple marcXml search and BN API is rather stable. I've created the test in the Scaffold IDE and it seems to parse data without problems.

@Eccenux
Copy link
Contributor Author
Eccenux commented May 20, 2023

I just noticed I've copy&pasted wrong browserSupport. Not sure if that matters. Using ES6, so no IE support.

@adam3smith
Copy link
Collaborator

Thanks -- looks good in general.
There is a larger question about how many different ISBN searches we're going to add. I think in general this one makes sense -- we don't have good coverage of Poland/Eastern Europe in the current ISBN searches -- but this is run on every single ISBN that people input and makes things a bit slower, so there is a real downside to keep adding these.
@dstillman what's your view on this? One option would be to just run this on ISBNs with Polish region code -- that's surely going to miss some stuff (e.g. other Eastern countries well represented in the NLP) but it would avoid the issue above.

National Library of Poland ISBN.js Outdated Show resolved Hide resolved
@Eccenux
Copy link
Contributor Author
Eccenux commented May 20, 2023

...One option would be to just run this on ISBNs with Polish region code -- that's surely going to miss some stuff (e.g. other Eastern countries well represented in the NLP) but it would avoid the issue above.

I happen to have access to full dump of BN. I'll check for patterns in ISBN.

@Eccenux
Copy link
Contributor Author
Eccenux commented May 20, 2023

Filtering by /^(97883|83)/ might work. Stats from the database:

  • pl83: 1 085 412
  • total with ~valid isbn: 1 433 909

Stats SQL:

select 'pl83' as type, sum(isbn_count) from temp_isbn_short
--select * from temp_isbn_short
where isbn_prefix ~ '^97883' or isbn_prefix ~ '^83'
union all
select 'total with ~valid isbn', sum(isbn_count) from temp_isbn_short
;

Prefix data of BN:
data-isbn-short.csv

@adam3smith
Copy link
Collaborator

Oh yes, I think that's a strong case for prefiltering ISBNs. (which would then also allow us to add many more ISBN translators). @dstillman would it make more sense to do this in the identifier code or the translators individually (which I think we can already do?)

@Eccenux Eccenux requested a review from adam3smith May 21, 2023 02:21
@dstillman
Copy link
Member

Much better to do it in the translator detect code — then it'll just work everywhere.

@Eccenux
Copy link
Contributor Author
Eccenux commented May 21, 2023

Done. Added a simple filter which should be fast.

@Eccenux
Copy link
Contributor Author
Eccenux commented May 28, 2023

@adam3smith Hi. I think I resolved everything here, but please let me know if there is more work for me to do? Otherwise, could you re-run tests and merge?

@Eccenux
Copy link
Contributor Author
Eccenux commented Jun 14, 2023

Hi, I've updated the base for my pull request so merging should be easy.

Maybe this linter will work too... Although I don't know where the version number '65161a4967b2...' came from in the script. From what I see it's some change from 2022-10-28:
OUP: Lint, add authors from DOM if necessary (#2898)

@adam3smith
Copy link
Collaborator
adam3smith commented Jun 15, 2023

(I believe CI fails if there have been any CI-relevant changes on the master branch while the PR is under review)

edit: I don't know either then ;)

@adam3smith
Copy link
Collaborator

Sorry for the delay -- I wanted to spend some time testing this to make sure this works as intended, since we haven't used the idea of restricting to ISBN regions before -- it's looking great and I'm very excited that this means we'll be able to add other national (or quasi-national) libraries to ISBN search without real costs.

@adam3smith adam3smith merged commit a6e270e into zotero:master Jun 15, 2023
0 of 2 checks passed
morganfshirley pushed a commit to morganfshirley/translators that referenced this pull request Jun 15, 2023
sebastian-berlin-wmse added a commit to sebastian-berlin-wmse/translators that referenced this pull request Nov 14, 2023
Libris is a service by The National Library of Sweden ("Kungliga
biblioteket"). It collects information from several Swedish
libraries. For more information see
http://libris.kb.se/help/about_libris_eng.jsp?language=en.

This translator is based on the translator for National Library of
Poland ISBN. Based on the discussion in zotero#3036 it filters only books
published in Sweden. It uses the Xsearch API:
http://libris.kb.se/help/xsearch_eng.jsp?language=en.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants