
How To Use SRA Toolkit For Mac



By default, the SRA Toolkit installed on Biowulf uses the central Biowulf configuration file, which is set up NOT to maintain a local cache of SRA data. After discussion with NCBI SRA developers, it was decided that this was the most appropriate setup for most users on Biowulf. The open-source SRA Toolkit can be downloaded from the SRA Toolkit webpage or from NCBI's GitHub repository and is available for the major operating systems. The GitHub repository also provides the source code if you would like to compile the toolkit yourself.


A collaborator recently asked if I could help pull down a few thousand sequence files from the NCBI Sequence Read Archive (SRA) for a secondary analysis. This is a short post primarily to help me (and hopefully others) remember how to do this once you have a set of SRR IDs of interest.

While I came across several great resources providing information on how to download SRA files using the SRA Toolkit, I wanted to retain just the basics, and some example code, should this type of request come across my desk again in the future. Hopefully this post will keep me from having to start from scratch the next time this comes up and not rehash the same mistakes I made the first time around.

There are several great resources for learning more about accessing data and metadata from the SRA including:

  • The NCBI SRA Download Guide
  • An excellent series of posts from Rob Edwards
  • Morgan Langille's very helpful Download_From_SRA wiki


However, for this type of request, all I think I will need to remember is a few lines of code and that I want to grab the fastq files, in parallel and without compression, to save time.

So below is what I think will be most useful once one has a set of SRR IDs of interest.


I typically work on an iMac Pro and have access to multiple cores for parallel computing. Installation of the SRA Toolkit can be performed quickly on a Mac using Homebrew. Installing parallel will also allow you to perform the download…well…in parallel.

If you are working on a Linux or Windows machine, binaries can be found here.

The code below will install both the toolkit and parallel.
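
A minimal sketch of that install step, assuming Homebrew is already set up and that the formula names are `sratoolkit` (which provides `fastq-dump`) and `parallel` (GNU parallel). The check-before-install loop is my own addition, not necessarily what was originally run:

```shell
# Each entry is "command:formula". The command is what we look for on the PATH;
# the formula is the Homebrew package assumed to provide it.
for pair in fastq-dump:sratoolkit parallel:parallel; do
    cmd=${pair%%:*}       # text before the colon
    formula=${pair##*:}   # text after the colon
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd is already installed"
    elif command -v brew >/dev/null 2>&1; then
        brew install "$formula"
    else
        echo "Homebrew not found; install it from https://brew.sh first" >&2
    fi
done
```

If you would rather skip the checks, `brew install sratoolkit parallel` does the same thing in one line.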


Now we simply want to open the terminal, create a new folder that will house the sequence files, and navigate to that folder before running the fastq-dump command. Here I am creating a new folder on my desktop that will store a bunch of fastq files generated from 16S rRNA gene sequencing on the Illumina MiSeq. As you can guess from the name, these files come from the well-known RISK IBD cohort.
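
Something along these lines would do it. The folder name `risk_fastq` is a hypothetical stand-in, since the post only tells us the folder sits on the desktop and is named after the cohort:

```shell
# Hypothetical folder name; change it to whatever suits your project
outdir="$HOME/Desktop/risk_fastq"

mkdir -p "$outdir"   # -p: no error if the folder already exists
cd "$outdir"         # fastq-dump writes into the current directory by default
```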


Now the fun part. The fastq-dump command will:

  • Download the sequence data for each SRR ID contained in the /Users/olljt2/desktop/sra_ids.txt text file
  • Download the data as fastq files (without compression)
  • Run the job over 16 threads (modify per the number available on your machine)
  • Dump each run into a separate file (so we will have either 1 or 2 fastqs per SRR ID depending on whether the run used single- or paired-end sequencing)
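
The steps in the list above could be sketched as follows. This is a reconstruction under assumptions rather than the exact command: it assumes the SRR ID is the first tab-delimited field on each line of the file, and it uses GNU parallel's `--jobs` option to run 16 copies of `fastq-dump` at once. The variable names `ids_file` and `jobs` are mine.

```shell
# Path taken from the post; point this at your own list of SRR IDs
ids_file="/Users/olljt2/desktop/sra_ids.txt"
jobs=16   # number of simultaneous fastq-dump processes; match your core count

if [ ! -r "$ids_file" ]; then
    echo "Cannot read $ids_file" >&2
elif ! command -v parallel >/dev/null 2>&1 || ! command -v fastq-dump >/dev/null 2>&1; then
    echo "parallel and fastq-dump must both be on the PATH" >&2
else
    # awk prints the first field of every non-empty line (assumed to be the SRR ID).
    # --split-files writes each read of a run to its own file (_1.fastq, _2.fastq);
    # leaving out --gzip keeps the output uncompressed.
    awk 'NF {print $1}' "$ids_file" |
        parallel --jobs "$jobs" fastq-dump --split-files {}
fi
```

Each ID read from the file replaces `{}` in turn, so every run is fetched and converted by its own `fastq-dump` process.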

A full list of the fastq-dump options can be found here. I prefer to download the files without compression as this greatly reduces the download time. Once the files are on your local machine, you can then go ahead and compress them at your leisure without worrying about maintaining the connection.
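
For the compress-later step, a small sketch assuming plain `gzip` is acceptable and the fastq files sit directly in the working folder. The helper name `compress_fastqs` is hypothetical:

```shell
# Gzip every .fastq file directly inside the given folder (default: current folder).
# find passes the matching files to gzip in batches; with no matches, gzip never runs.
compress_fastqs() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.fastq' -exec gzip {} +
}

compress_fastqs .
```

gzip replaces each `SRRxxxx_1.fastq` with `SRRxxxx_1.fastq.gz`, so make sure the downloads have finished before running it.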


The sra_ids.txt text file called above is simply a tab-delimited text file with no header that has a structure of…



Hopefully, I will now remember where to find this information, and snippets of code, the next time the need arises!


If you'd like to use publicly available NGS data, you may want to learn how to use the SRA Toolkit. A downloaded .sra file can be converted to a .fastq file.


FYI: What is SRA?

Though the link above provides comprehensive information, my customer wanted to know 'exactly how' to use the SRA Toolkit, so I did it myself and summarized the workflow in the scripts below (run in the Mac Terminal) and in the pdf file.

Hope this helps! If you have any trouble, please feel free to contact me! 🙂

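#install sra toolkit
#Homebrew's installer is now a bash script; the old ruby one-liner no longer works
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget
#sratoolkit now lives in Homebrew's core tap, so tapping homebrew/science is no longer needed
brew install sratoolkit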

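#download individual sra file
#note: NCBI appears to have retired the sra-instant FTP tree; if this URL fails, "prefetch SRR384905" from the toolkit fetches the run instead
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP009/SRP009459/SRR384905/SRR384905.sra #change me#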

#if you would like to download a series of sra files, do something like this
sra_list=({384905..384962}) #change me# (brace ranges need two dots)
base_url=ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP009/SRP009459

for sra_id in "${sra_list[@]}"
do
wget "${base_url}/SRR${sra_id}/SRR${sra_id}.sra"
sleep 600 #wait 10 minutes between downloads (seconds work with every version of sleep)
done


#convert sra to fastq
#--split-files takes two plain hyphens; the long dash in the original was a typesetting artifact
fastq-dump --split-files ./SRR384905.sra #change me#

#if you would like to convert a series of sra files to fastq files, do something like this
sra_list=({384905..384962}) #change me#
for sra_id in "${sra_list[@]}"
do
fastq-dump --split-files "./SRR${sra_id}.sra"
done


#above is summarized in below pdf file




