PyPubMed: a free, easy-to-use tool for retrieving papers

· Searching PubMed over a slow Internet connection?
· Files failing to open again? Too many published papers to check?
· Don't want to read abstracts and reviews one by one?
· Struggling to filter by impact factor, publication year, or abstract when hunting for a target paper?

No need to worry anymore! PyPubMed is built to solve these problems.

PyPubMed is a tool that makes searching for papers in a large database easier. It was conceived by Huan Yu, director of the Novogene Medical Department, and developed by Qingdong Su, Senior Bioinformatics Engineer. It retrieves papers, outputs detailed information about each one, and quickly saves the results as an Excel table, so the relevant information is easy to review and sort.

Installation

1. Install Python3 Environment

Go to the official Python download page, https://www.python.org/downloads/, and download the installer for your operating system. The steps below use Windows 10 as an example.

1.1 Download and install the Windows installer (64-bit). 

1.2 During installation, check the option that adds Python to the PATH environment variable, then complete the installation.

2. Install PyPubMed

Open a command-line interface: on Windows, open the Command Prompt (press Win+R, type cmd, and press Enter); on Mac, open the Terminal.

At the command line, run the following command to install PyPubMed:

pip3 install pypubmed

Note: Commands and parameters must be separated by at least one space. On non-Windows systems, commands are also case-sensitive.

If the installation is too slow or reports an error, you can use the Alibaba Cloud mirror to speed it up with the following command:

pip3 install pypubmed -i https://mirrors.aliyun.com/pypi/simple

After installation, run the following command to check that it succeeded:

pypubmed

If the following prompt appears, the installation succeeded:

Usage: pypubmed [OPTIONS] COMMAND [ARGS]...
  Toolkits for NCBI Pubmed
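
If the prompt does not appear, a quick check of the environment can help narrow down the cause. The snippet below is a minimal sketch, not part of PyPubMed: `check_environment` is a hypothetical helper that only tests whether a Python 3 interpreter is running and whether a `pypubmed` executable can be found on PATH.

```python
import shutil
import sys


def check_environment(command="pypubmed"):
    """Return (python_ok, command_path) for a quick installation check."""
    # PyPubMed is installed with pip3, so a Python 3 interpreter is required.
    python_ok = sys.version_info >= (3,)
    # shutil.which returns the full path of the executable, or None if it is
    # not on PATH (for example, if Python was not added to PATH at install).
    command_path = shutil.which(command)
    return python_ok, command_path


if __name__ == "__main__":
    python_ok, command_path = check_environment()
    print("Python 3 interpreter:", "yes" if python_ok else "no")
    print("pypubmed found at:", command_path or "not found on PATH")
```

If the command is reported as not found even though pip3 finished without errors, the Python scripts directory is probably missing from PATH.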

View the current version:

pypubmed --version

Update pypubmed to the latest version:

pip3 install -U pypubmed

3. Add API_KEY parameter

Adding an API_KEY before first use is recommended, because it raises the access frequency limit.

To generate an API_KEY, register an NCBI account and log in, then visit the link below and click to generate your key. Link: https://www.ncbi.nlm.nih.gov/account/settings/#accountSettingsApiK

Input the command:

pypubmed -k YOUR_API_KEY search -h

Note: The -k parameter only needs to be supplied the first time you use the tool.
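
For background on why the key matters: NCBI's E-utilities accept the key as an `api_key` query parameter, and NCBI documents a limit of 3 requests per second without a key and 10 per second with one. The sketch below only builds such a request URL and sends nothing. It is not taken from PyPubMed's source; `build_esearch_url` is a hypothetical helper, and whether PyPubMed calls this exact endpoint is an assumption.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, api_key=None, retmax=20):
    """Build an ESearch URL for PubMed, attaching the API key if one is given."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    if api_key:
        # With a key, NCBI allows a higher request rate for this client.
        params["api_key"] = api_key
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # YOUR_API_KEY is a placeholder, exactly as in the command above.
    print(build_esearch_url("NGS[Title/Abstract]", api_key="YOUR_API_KEY", retmax=5))
```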

Function 1: Literature search

1. Keywords / PMID search

The usual approach is to build the query with PubMed's advanced search first, then pass that query to the pypubmed search command to search for and download the literature.

On first use, run the help command pypubmed -h to see the common commands and their descriptions.

Here are a few options to keep in mind:

  1. -min, --min-factor FLOAT: sets the minimum impact factor of the documents to retrieve.
  2. -l, --limit INTEGER: limits the number of output documents. Important: before each search, test your keywords on the NCBI website to find the best query and a sensible output limit. Too many results (e.g. tens of thousands) will cause problems such as very long running times.
  3. -o, --outfile TEXT: sets the output file name; the default is pubmed.xlsx.
  4. -c, --cache: stores translated results in a cache file, which helps when translation is slow or gets interrupted.

Query example: suppose you want to find articles with the keywords ngs and disease in the title or abstract, output the first 5, and name the output file ngs_disease.xlsx. First use PubMed's advanced search to get the query NGS[Title/Abstract] AND disease[Title/Abstract], then enter the following command:

pypubmed search "NGS[Title/Abstract] AND disease[Title/Abstract]" -l 5 -o ngs_disease.xlsx
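
If you run such searches often, the query and the command can be assembled in a small script. The following is a minimal sketch: `build_query` and `build_search_command` are hypothetical helpers (not part of PyPubMed) that use only the -l and -o options described above.

```python
import shlex


def build_query(terms, field="Title/Abstract", logic="AND"):
    """Join search terms into a PubMed query, tagging each with the same field."""
    tagged = [f"{term}[{field}]" for term in terms]
    return f" {logic} ".join(tagged)


def build_search_command(query, limit=None, outfile=None):
    """Assemble the argument list for `pypubmed search`."""
    args = ["pypubmed", "search", query]
    if limit is not None:
        args += ["-l", str(limit)]
    if outfile is not None:
        args += ["-o", outfile]
    return args


query = build_query(["NGS", "disease"])
command = build_search_command(query, limit=5, outfile="ngs_disease.xlsx")
# shlex.join applies POSIX shell quoting so the query stays a single argument.
print(shlex.join(command))
```

The printed line uses POSIX-style single quotes; in the Windows Command Prompt, keep the double quotes shown in the command above. Passing the `command` list directly to `subprocess.run` would sidestep shell quoting altogether.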

You can also search by PMID. First export the PMIDs from your PubMed query as a txt file (here pmid-NGSTitleAb-set.txt, saved in C:\Users\Summer\Desktop), then enter the following command:

pypubmed search C:\Users\Summer\Desktop\pmid-NGSTitleAb-set.txt
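
Before handing a PMID file to the tool, it can be worth checking that the export really contains PMIDs. The sketch below assumes the exported txt file lists one PMID per line; `read_pmids` is a hypothetical helper, not part of PyPubMed.

```python
from pathlib import Path


def read_pmids(path):
    """Return the PMIDs listed in a text file, one per line, skipping blanks."""
    pmids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        value = line.strip()
        if not value:
            continue
        # PMIDs are plain integers; anything else suggests the wrong file.
        if not value.isdigit():
            raise ValueError(f"not a PMID: {value!r}")
        pmids.append(value)
    return pmids


# Example use (the path is the one from the text above):
# pmids = read_pmids(r"C:\Users\Summer\Desktop\pmid-NGSTitleAb-set.txt")
# print(len(pmids), "PMIDs found")
```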

2. Advanced Search

If the query logic is relatively simple, you can use the pypubmed advance-search command to search for and download the literature. Enter the following command:

pypubmed advance-search

For example, to search for NGS literature on heart disease, follow the prompts:

>>> please choose a number of field [48]: 48

your choice is: 48-Title/Abstract

>>> please enter a search term: cardiopathy

query box now: "cardiopathy"[Title/Abstract]
input finish? [y/N]: n

>>> please choose a number of field [48]: 48
your choice is: 48-Title/Abstract

>>> please enter a search term: ngs
>>> please input the logic (and, or, not) [and]: and

The final query is as follows:

query box now: ("cardiopathy"[Title/Abstract]) AND ("ngs"[Title/Abstract])

The number of retrieved documents is as follows:

final query box: ("ngs"[Title/Abstract]) AND ("cardiopathy"[Title/Abstract])
count: 1
detail: "ngs"[Title/Abstract]:14431, "cardiopathy"[Title/Abstract]:3534

You can continue with the follow-up prompts to download the results. When the number of documents is large, however, this download method is not recommended; use the search command described above instead.
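
The way the prompts build up the query box can be mimicked in a few lines. This is only a sketch of the pattern visible in the session above, not PyPubMed's actual code: `add_term` is a hypothetical helper, and the nesting of parentheses for a third or later term is an assumption.

```python
def add_term(query, term, field="Title/Abstract", logic="and"):
    """Append one quoted, field-tagged term to an existing query string."""
    clause = f'"{term}"[{field}]'
    if not query:
        # The first term stands alone, e.g. "cardiopathy"[Title/Abstract]
        return clause
    if logic.lower() not in {"and", "or", "not"}:
        raise ValueError(f"unsupported logic: {logic!r}")
    # Later terms are joined with the chosen logic, each side parenthesized.
    return f"({query}) {logic.upper()} ({clause})"


query = add_term("", "cardiopathy")
query = add_term(query, "ngs", logic="and")
print(query)
```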

Function 2: Batch generation of citations

Enter the following command line:

pypubmed citations -h

Commonly used option:

-f, --fmt [ama|mla|apa|nlm]

The citation format of the final output.

Query example: to export citations for two PMIDs, 33567694 and 33546218, enter the following command:

pypubmed citations 33567694 33546218 -f apa

The query results are as follows:

33567694
Esposito, MV, Comegna, M., Cernera, G., Gelzo, M., Paparo, L., Berni Canani, R., & Castaldo, G. (2021). NGS Gene Panel Analysis Revealed Novel Mutations in Patients with Rare Congenital Diarrheal Disorders. Diagnostics (Basel, Switzerland), 11(2), 262.

33546218
Maggi, J., Koller, S., Bähr, L., Feil, S., Kivrak Pfiffner, F., Hanson, J., Maspoli, A., Gerth-Kahlert, C., & Berger, W. (2021). Long-Range PCR-Based NGS Applications to Diagnose Mendelian Retinal Diseases. International journal of molecular sciences, 22(4), 1508.
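
To make the structure of the APA output easier to read, here is a minimal sketch that rebuilds the second citation from its parts. It is not PyPubMed's implementation: `format_authors` and `format_apa` are hypothetical helpers, and the sketch ignores details of the full APA style, such as truncating very long author lists.

```python
def format_authors(authors):
    """Join author names APA-style: commas between names, '&' before the last."""
    if not authors:
        return ""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + f", & {authors[-1]}"


def format_apa(authors, year, title, journal, volume, issue, pages):
    """Format one journal article as an APA-style reference string."""
    return (
        f"{format_authors(authors)} ({year}). {title}. "
        f"{journal}, {volume}({issue}), {pages}."
    )


# Record for PMID 33546218, taken from the query results above.
citation = format_apa(
    authors=[
        "Maggi, J.", "Koller, S.", "Bähr, L.", "Feil, S.",
        "Kivrak Pfiffner, F.", "Hanson, J.", "Maspoli, A.",
        "Gerth-Kahlert, C.", "Berger, W.",
    ],
    year=2021,
    title="Long-Range PCR-Based NGS Applications to Diagnose Mendelian Retinal Diseases",
    journal="International journal of molecular sciences",
    volume=22,
    issue=4,
    pages="1508",
)
print(citation)
```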


PyPubMed is quick and simple to install, takes up little storage, is convenient and efficient to use, and has won praise from its users.

Install it, try it out, and share it with your friends!