XX[DEPRECATED] Useful commands for common tasks: SW
These are either Rails console (Ruby) or rake task commands for various common tasks.
Show an author's alternate names (author identities), looking the author up by SUNet ID:
sunetid='peter12345'
author=Author.find_by_sunetid(sunetid)
author.author_identities # gives the user's alternate names data
Show the publications that would be harvested for a given CAP profile ID, without actually adding them to the profile.
Note: this only shows the new publications that would be harvested, not any publications already in the profile.
cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author_pub_swids = author.publications.with_sciencewire_id.pluck(:sciencewire_id).uniq # this shows the current publications in their profile
harvester=ScienceWire::HarvestBroker.new(author, ScienceWireHarvester.new, alternate_name_query: true) # use alternate_name_query: false to skip alt names query
ids_for_author = harvester.send(:ids_for_author) # just Stanford (no alt names)
ids_for_alternate_names = harvester.send(:ids_for_alternate_names) # all alternate names
ids_to_harvest = harvester.generate_ids # gets all ids and removes the current ones (i.e. just the ones that would be harvested)
pubs = ids_to_harvest.collect { |swid| ScienceWireClient.new.get_sw_xml_source_for_sw_id(swid) };nil
titles = pubs.collect { |i| "'#{i.at_xpath('//PublicationItem/Title').text}' BY #{i.at_xpath('//PublicationItem/AuthorList').text}" };nil;puts titles
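The `generate_ids` comment above describes gathering all candidate IDs and removing the ones already on the profile. Here is a rough plain-Ruby illustration of that set arithmetic. It is an assumption about the behaviour, not the actual `HarvestBroker` code, and the IDs are made up:

```ruby
# Hypothetical sketch: which ScienceWire IDs would be new for a profile?
# candidate_id_lists: results of the name and alternate-name queries
# existing_ids:       IDs already attached to the author's publications
def ids_to_harvest(candidate_id_lists, existing_ids)
  # Flatten all query results, drop duplicates, then remove IDs already on the profile
  candidate_id_lists.flatten.uniq - existing_ids
end

ids_for_author          = [101, 102, 103]
ids_for_alternate_names = [103, 104]
author_pub_swids        = [102]

new_ids = ids_to_harvest([ids_for_author, ids_for_alternate_names], author_pub_swids)
puts new_ids.inspect
```

`Array#-` compares elements with `eql?`, so the integer `102` and the string `'102'` are different values. When comparing IDs by hand in the console, make sure both lists use the same type (e.g. `map(&:to_i)`).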
Show the IDs that result from a "dumb query" (i.e. a name-only search), regardless of the number of publications currently on the profile.
Note that this runs only the main identity (no alternate names/identities).
cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author_name = ScienceWire::AuthorName.new(author.last_name,author.first_name,author.middle_name)
harvester=ScienceWire::HarvestBroker.new(author, ScienceWireHarvester.new, alternate_name_query: true) # use alternate_name_query: false to skip alt names query
institution = ScienceWire::AuthorInstitution.new(Settings.HARVESTER.INSTITUTION.name,ScienceWire::AuthorAddress.new(Settings.HARVESTER.INSTITUTION.address.to_hash))
author_attributes = ScienceWire::AuthorAttributes.new(author_name, '', [], institution)
ids_for_dumb_query = harvester.send(:ids_from_dumb_query,author_attributes)
# for another arbitrary institution from their profile
institution_name = author.author_identities[0].institution
institution = ScienceWire::AuthorInstitution.new(institution_name,ScienceWire::AuthorAddress.new({}))
author_attributes = ScienceWire::AuthorAttributes.new(author_name, '', [], institution)
ids_for_dumb_query = harvester.send(:ids_from_dumb_query,author_attributes)
Run a harvest for a given CAP profile from the Rails console:
cap_profile_id='12345'
AuthorHarvestJob.new.perform(cap_profile_id, harvest_alternate_names: true) # with alternate names
AuthorHarvestJob.new.perform(cap_profile_id, harvest_alternate_names: false) # without alternate names
# note: you may want to pull the latest profile data from CAP before doing this (see the cap:poll_for_cap_profile_id command on this page)
Or with rake:
RAILS_ENV=production bundle exec rake sw:cap_profile_harvest[41135] # uses the default flag for alternate names harvesting (false as of 12/1/2016); the default is USE_AUTHOR_IDENTITIES in config/settings.yml
RAILS_ENV=production bundle exec rake sw:cap_profile_harvest_alt_names[41135] # forces alternate names harvesting to true
List an author's publications together with their identifiers:
cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author.publications.each {|pub| puts pub.title + "|" + pub.publication_identifiers.map {|pub_id| "#{pub_id.identifier_type}:#{pub_id.identifier_value}"}.join('|')};nil
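Below, the one-liner above is pulled out into a small function. Plain Structs stand in for the `Publication` and identifier models, and the titles and identifiers are made up. Using `join` instead of `pub.title + "|"` means a publication with a nil title prints an empty field instead of raising `NoMethodError`:

```ruby
# Hypothetical stand-ins for the Publication / publication identifier models,
# exposing only the attributes used in the console one-liner above.
PubId = Struct.new(:identifier_type, :identifier_value)
Pub   = Struct.new(:title, :publication_identifiers)

# Build one "title|type:value|type:value" line per publication.
def identifier_lines(pubs)
  pubs.map do |pub|
    ids = pub.publication_identifiers.map { |id| "#{id.identifier_type}:#{id.identifier_value}" }
    [pub.title, *ids].join('|') # join turns a nil title into an empty string
  end
end

pubs = [
  Pub.new('A paper', [PubId.new('PMID', '12345'), PubId.new('doi', '10.1000/xyz')]),
  Pub.new('Untitled draft', [])
]
lines = identifier_lines(pubs)
puts lines
```

In the console, the same function could be called as `puts identifier_lines(author.publications)`.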
Rebuild the pub hash for a single publication. This is useful if the methods that parse source records into pub hashes change:
pub=Publication.find_by_pmid('12345') # OR
pub=Publication.find_by_sciencewire_id('12345')
pub.rebuild_pub_hash
pub.save
Rebuild the pub hashes for all of an author's publications. This is useful if the parsing or other algorithms used to build the pub hash from a source record change. Note that it doesn't re-pull the source record itself from PubMed or ScienceWire; it just reparses the data we already have and rebuilds the pub hash.
author=Author.find_by_sunetid('sunet') # or
author=Author.find_by_cap_profile_id(12345)
author.publications.each do |pub|
pub.rebuild_pub_hash
pub.save
end
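If one record fails partway through, the loop above stops and the remaining publications are skipped. This minimal sketch keeps going instead and collects the failures so they can be inspected afterwards. `rebuild_all` is a hypothetical helper. It assumes failures show up either as a raised exception or as `save` returning false, which is how ActiveRecord reports validation failures. `FakePub` is a stub standing in for `Publication`:

```ruby
# Hypothetical helper: rebuild and save every publication, continue past
# failures, and return [publication, reason] pairs for the ones that failed.
# Works with anything that responds to #rebuild_pub_hash and #save.
def rebuild_all(pubs)
  failures = []
  pubs.each do |pub|
    begin
      pub.rebuild_pub_hash
      # ActiveRecord's #save returns false (rather than raising) on validation failure
      failures << [pub, 'save returned false'] unless pub.save
    rescue StandardError => e
      failures << [pub, e.message]
    end
  end
  failures
end

# Stub standing in for Publication so the helper can be tried outside Rails.
FakePub = Struct.new(:id, :broken) do
  def rebuild_pub_hash
    raise "cannot parse source record #{id}" if broken
  end

  def save
    true
  end
end

failures = rebuild_all([FakePub.new(1, false), FakePub.new(2, true)])
failures.each { |pub, msg| puts "#{pub.id}: #{msg}" }
```

In the console this would be `failures = rebuild_all(author.publications)`.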
Pull the latest data for a CAP profile from CAP:
RAILS_ENV=production bundle exec rake cap:poll_for_cap_profile_id[12345] # pulls in latest author identities as well as main author info
Or, in the Rails console:
cap_profile_id='12345'
cap_http_client = CapHttpClient.new
record = cap_http_client.get_auth_profile(cap_profile_id)
poller = CapAuthorsPoller.new
puts JSON.pretty_generate(record) # display it
poller.process_record(record) # actually process it
Add a PubMed publication (by PMID) to an author:
author = Author.find_by_cap_profile_id(84517)
pub=Publication.find_or_create_by_pmid('20337330') # this will pull from PubMed and create a pub record if it doesn't exist
ScienceWireHarvester.new.add_contribution_for_harvest_suggestion(author,pub) # add this publication to this author
# should be taken care of by "add_all_db_contributions_to_my_pub_hash"?
#authorship_hash = {:featured=>'false',:visibility=>'public',:status=>'new',:cap_profile_id=>author.cap_profile_id}
#pub.contributions.build_or_update(author, authorship_hash)
pub.add_all_db_contributions_to_my_pub_hash
pub.save
To add multiple PMIDs to an author:
pmids=%w{19752419 19752420 19674727 21526923 22555112 23337555 24835760 24727261 24652518 25307130 26143666 26086920 26681392}
author = Author.find_by_cap_profile_id(84517)
harvester=ScienceWireHarvester.new
pmids.each do |pmid|
pub=Publication.find_or_create_by_pmid(pmid)
harvester.add_contribution_for_harvest_suggestion(author,pub)
pub.add_all_db_contributions_to_my_pub_hash
pub.save
end
Or, using a rake task:
# import the provided list of PMIDs from a plain text file, no header row, one line per PMID, into the cap profile ID 12345
RAILS_ENV=production bundle exec rake pubmed:pmid_profile_id_import['/tmp/list_of_pmids','12345']
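The rake task above expects a plain-text file with one PMID per line and no header row. Before running it, you may want to sanity-check the file. This sketch reads such a list, drops blank lines and duplicates, and reports any lines that are not purely numeric. `read_pmid_list` is a hypothetical helper, not the rake task's own parser, and the sample input is made up:

```ruby
require 'stringio'

# Hypothetical pre-flight check for a PMID list (one PMID per line, no header).
# Returns [unique_valid_pmids, rejected_lines]; PMIDs are kept as strings,
# as in the console examples on this page.
def read_pmid_list(io)
  valid = []
  rejected = []
  io.each_line do |line|
    pmid = line.strip            # also removes "\r" from Windows line endings
    next if pmid.empty?          # ignore blank lines
    if pmid.match?(/\A\d+\z/)    # PMIDs consist only of digits
      valid << pmid
    else
      rejected << pmid
    end
  end
  [valid.uniq, rejected]
end

# For a real file: File.open('/tmp/list_of_pmids') { |f| read_pmid_list(f) }
# A StringIO stands in for the file here.
pmids, bad = read_pmid_list(StringIO.new("19752419\n\n19752420\nPMID 123\n19752419\n"))
puts "#{pmids.size} unique PMIDs, rejected: #{bad.inspect}"
```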
Show an author's publications (all of them, or only the approved ones):
author = Author.find_by_cap_profile_id(12345)
author.publications
author.approved_publications
Query ScienceWire directly by ScienceWire ID, WoS ID, DOI or PMID:
client=ScienceWireClient.new
sciencewire_id='42708914'
result = client.get_full_sciencewire_pubs_for_sciencewire_ids([sciencewire_id])
wos_id='000323660000020'
result = client.get_full_sciencewire_pubs_for_wos_ids([wos_id])
doi='10.1038/nature11397'
result = client.get_pub_by_doi(doi, 1)
pmid='10000166'
result = client.pull_records_from_sciencewire_for_pmids([pmid])
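The four lookups above each take their identifier in a slightly different shape: a one-element array for most of them, and a bare value plus a second argument for the DOI. This hypothetical wrapper puts the choice in one place. The method names and arguments are copied from the examples above. The wrapper itself and its symbol keys are illustrative assumptions, not part of the application. A recording stub replaces `ScienceWireClient` so the wrapper can be run without network access:

```ruby
# Hypothetical convenience wrapper around the ScienceWireClient calls shown above.
def sciencewire_lookup(client, type, value)
  case type
  when :sciencewire_id then client.get_full_sciencewire_pubs_for_sciencewire_ids([value])
  when :wos_id         then client.get_full_sciencewire_pubs_for_wos_ids([value])
  when :doi            then client.get_pub_by_doi(value, 1) # second argument as in the example above
  when :pmid           then client.pull_records_from_sciencewire_for_pmids([value])
  else raise ArgumentError, "unknown identifier type: #{type.inspect}"
  end
end

# Stub client that records which method was called and with what arguments.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def method_missing(name, *args)
    @calls << [name, args]
    :ok
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

client = RecordingClient.new
sciencewire_lookup(client, :doi, '10.1038/nature11397')
p client.calls
```

With the real client, this would be `sciencewire_lookup(ScienceWireClient.new, :wos_id, '000323660000020')`.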
Fetch records directly from PubMed for a PMID:
pmid='25277988'
pmclient=PubmedClient.new
pmclient.fetch_records_for_pmid_list([pmid])
Search all sources for a PMID:
pmid='25277988'
result=PubmedHarvester.search_all_sources_by_pmid([pmid])
Manually update the pub hash for a publication from its ScienceWire source record (as cached in our database; no new pull from ScienceWire).
This is not a great idea and is only useful as a quick patch to an existing publication that is wrong in ScienceWire (e.g. a bad DOI in their system)
sciencewire_id='42708914'
publication=Publication.find_by(:sciencewire_id=>sciencewire_id)
sw_source=SciencewireSourceRecord.find_by(:sciencewire_id=>sciencewire_id)
puts sw_source.source_data
sw_source.source_data = sw_source.source_data.gsub('some bad string','some fixed string') # edit the record if needed; reassigning (rather than gsub!) makes sure the change is tracked and saved
sw_source.save
pub_hash=SciencewireSourceRecord.get_sciencewire_hash_for_sw_id(sciencewire_id)
publication.build_from_sciencewire_hash(pub_hash)
publication.save
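When patching cached source data by hand, it is easy to save a record in which the "bad" string was never actually found (`String#gsub!` returns nil in that case). This minimal sketch counts the occurrences before replacing them, so you can confirm the edit did something before saving. It uses the block form of `gsub` so that backslash sequences such as `\0` in the replacement text are inserted literally instead of being treated as back-references. `patch_source` is a hypothetical helper, and the sample DOI typo is made up:

```ruby
# Hypothetical helper: replace a literal string in cached source data and
# report how many occurrences were replaced (0 means nothing matched).
def patch_source(source_data, bad, fixed)
  count = source_data.scan(bad).size          # a String pattern is matched literally
  [source_data.gsub(bad) { fixed }, count]    # block form: no back-reference expansion
end

xml = '<DOI>10.1038/natur11397</DOI>'
patched, count = patch_source(xml, 'natur11397', 'nature11397')
puts count
puts patched
```

In the console, this might look like `sw_source.source_data, n = patch_source(sw_source.source_data, 'bad', 'fixed')`, followed by `sw_source.save if n > 0`.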
Fetch the raw ScienceWire XML source for a ScienceWire ID:
sciencewire_id='42708914'
result=ScienceWireClient.new.get_sw_xml_source_for_sw_id(sciencewire_id)
Re-pull an existing ScienceWire or PubMed record into its SciencewireSourceRecord/PubmedSourceRecord and then update the pub hash.
This is useful if an existing ScienceWire and/or Pubmed record is updated to correct data and we need to resync with them and then update the local publication with the new data
sciencewire_id='42708914'
publication=Publication.find_by(:sciencewire_id=>sciencewire_id)
sw_source = SciencewireSourceRecord.find_by_sciencewire_id(sciencewire_id)
sw_source.sciencewire_update
publication.build_from_sciencewire_hash(sw_source.source_as_hash)
publication.sync_publication_hash_and_db
publication.save
For a PubMed record:
pmid='42708914'
publication=Publication.find_by(:pmid=>pmid)
pm_source_record = PubmedSourceRecord.find_by_pmid(pmid)
pm_source_record.pubmed_update
publication.pub_hash = pm_source_record.source_as_hash
publication.save
Add an existing publication to an author:
pmid='42708914'
cap_profile_id='12345'
publication=Publication.find_by_pmid(pmid) # find an existing pub somehow
author = Author.find_by_cap_profile_id(cap_profile_id) # author that we need to add this pub to
sw=ScienceWireHarvester.new
sw.add_contribution_for_harvest_suggestion(author,publication)
Use case: a single author has two author rows, each with publications associated with it, and you want to merge one author into the other, carrying over any existing publications without duplicating them. This happens when two profiles are created initially because CAP was not able to match the physician information to the faculty information until after both profiles had been created. CAP "merged" them on its side, but the publications were not merged on the SUL-PUB side. This shows up as unexpected behavior (missing pubs, etc.). The rake task takes two cap_profile_ids and merges all of the publications from DUPE_CAP_PROFILE_ID's profile into PRIMARY_CAP_PROFILE_ID's profile. It then deactivates DUPE_CAP_PROFILE_ID's profile (which should now have no publications associated with it) to prevent harvesting into it.
RAILS_ENV=production bundle exec rake cleanup:merge_profiles[123,456] # will merge all publications from cap_profile_id 456 into 123, without duplication
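The rake task's implementation is not shown here. As a rough illustration of the "merge without duplication" bookkeeping it describes, this plain-Ruby sketch takes the publication IDs attached to each profile and works out which of the duplicate profile's publications would need to move and which are already on the primary profile. `merge_plan` and `MergePlan` are hypothetical names, and the IDs are made up:

```ruby
# Hypothetical sketch of the set logic behind merging a duplicate profile's
# publications into a primary profile without creating duplicates.
MergePlan = Struct.new(:to_move, :already_present, :merged)

def merge_plan(primary_pub_ids, dupe_pub_ids)
  dupes           = dupe_pub_ids.uniq
  to_move         = dupes - primary_pub_ids        # only on the duplicate profile
  already_present = dupes & primary_pub_ids        # on both profiles; skip these
  MergePlan.new(to_move, already_present, primary_pub_ids | dupe_pub_ids)
end

plan = merge_plan([1, 2, 3], [3, 4, 4, 5])
puts "move: #{plan.to_move.inspect}"
puts "skip (already on primary): #{plan.already_present.inspect}"
puts "primary after merge: #{plan.merged.inspect}"
```

Before running the rake task, you could preview the effect in the console by passing `Author.find_by_cap_profile_id(123).publications.pluck(:id)` and the equivalent list for 456.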