module avail
Now that you’re (hopefully) re-caffeinated and have stretched your legs and cleared your brain, let’s have a look at some more need-to-know information.
Software
The linux OS has some really powerful utilities but these are not truly bioinformatics software. We have to ‘install’ or ‘load’ software. We do not want to have all our software loaded all the time, as many informatic tools are very complex i.e. requiring different versions of python or java to run. Having software that requires conflicting versions loaded at the same time can cause a malfunction.
Most managed HPC systems (not your own personal computer) manage this by using modules. These are preconfigured installers for your software packages. Cardiff systems use modules - if this is the case you can see the software that is available try the following command:
this will bring up a software list that looks like the following:
------------ /mnt/clusters/darter/software/spack/share/spack/modules/linux-rocky8-zen -------------
abricate/1.0.0-qk6snru freebayes/1.3.6-l3l4sxc py-panaroo/1.2.10-fukmiob
abyss/2.3.5-b76h5om gatk/4.5.0.0-riltjjd py-quast/5.2.0-hgcvroq
alan/2.1.1-izjr2po gdbm/1.23-n6o3b5r py-unicycler/0.5.0-d6k3wzh
augustus/3.5.0-qinnslv grace/5.1.25-gukphap python/3.7.4-ftvphto
barrnap/0.9-kewos3t gromacs/2022.6-3xkd27n python/3.10.5-fsxphqu
bcftools/1.19-d7iukuu hmmer/3.4-yak3kcd python/3.11.7-d2ytb5r
bedtools2/2.31.1-7imv7b4 igv/2.16.2-qiah4ab qualimap/2.2.1-c5hghxs
blast-plus/2.14.1-eqbljdp kraken2/2.1.2-lam4dbm r/4.4.1-5jt6lfe
bowtie/1.3.1-zgreifk mafft/7.505-beg3hnq racon/1.4.3-qpn4rnq
bowtie2/2.5.2-4j3lpsk mcl/14-137-hetggoi raxml-ng/1.1.0-3gc6hnr
busco/5.4.3-av4kf4t minimap2/2.28-w7jvpwe samtools/1.19.2-m76oqh7
bwa/0.7.17-kikwf2c ncurses/6.5-yhetvhq seqtk/1.4-l7d6lqn
caliper/2.11.0-774rsmi perl-bioperl/1.7.6-semipod spades/3.15.5-4zo7gyg
cdhit/4.8.1-z4h4fo5 perl/5.34.0-hvoyvlh star/2.7.11a-cbbjioq
dos2unix/7.4.4-scyfkh5 picard/3.1.1-peglbgr subread/2.0.6-abbqxcc
dyninst/13.0.0-ssbc5ep pilon/1.22-hoz46gi tbl2asn/2022-04-26-mogg2jy
emboss/6.6.0-mlzei3k prodigal/2.6.3-o4shfn4 tmux/3.4-pp5yxaf
fastp/0.23.4-mr6qifo prokka/1.14.6-barrnap-bedtools-5z4fskk trimmomatic/0.39-erd3ktn
fasttree/2.1.11-x2xclli prokka/1.14.6-mqcdb45 velvet/1.2.10-rryts6f
fastx-toolkit/0.0.14-gp27cub py-macs3/3.0.0b3-w77yz26 vt/0.5772-d76l4yn
figtree/1.4.4-knoy37j py-multiqc/1.15-mqimc53 wtdbg2/2.3-iy52ckp
------------------------ /mnt/clusters/darter/software/nospack/modulefiles ------------------------
apptainer/1.3.3 fmlrc/v1.0.0 MAGScoT/1.0.0-conda qiime2/2024.5-conda star/2.7.11b-conda
artemis/18.2.0-conda go/1.22.4 maxbin2/2.2.7-conda reportseff/v2.7.6
coverm/0.7.0-conda java/11.0.2 metabat2/2.17-conda snippy/v4.6.0
fastqc/v0.12.1 java/17.0.1 mitos/2.0.4-conda snp-sites/2.5.1
Any of these software packages can be loaded by simply using the module load command, e.g.
module load fastqc/v0.12.1
Remember to unload the module when you are finished so it does not conflict with the next piece of software, or if you want to unload all modules, use the module purge command.
module unload fastqc/v0.12.1
or
module purge
Module versions can change so check they are correct with module avail
rather than just copy and pasting from the guide.
Data
The data for your course is available to you on a pre-mapped drive location. You can see and list the folders it contains using the following simple command:
ls ~/classdata/
#this should yield the following list of folders
datasets REFS Session1 Session2 Session3 Session4 Session5 Session6
You can see these folders but not write to them! Before each session, copy the corresponding folder to your mydata directory using cp
.
Note the inclusion of the ‘-r’ flag to allow you to copy the folder ‘recursively’: this will copy all the folders and files inside.
Visualising text files
There are many commands available for reading text files on Linux/Unix. These are useful when you want to look at the contents of a file, but not edit them. Among the most common of these commands are cat
, more
, less
, head
and tail
.
cat
streams the entire contents of a file to your terminal and is thus not that useful for reading long files as the text streams past too quickly to read. It is a very useful facility for reading files into other programs.
head
and tail
show the beginning and end 10 lines of a document, respectively. head
in particular is useful for taking a quick peek inside a file. You can use the argument -n [number] to change the number of lines displayed. For example, to see the first 20 lines of a file:
head -n 20 file.txt
more
and less
are commands that show the contents of a file one page at a time, but less
has more functionality than more
(doh)! With both more
and less
, you can use the space bar to scroll down the page, and typing the letter q causes the program to quit – returning you to your command line prompt.
Once you are reading a document with more
or less
, typing a forward slash / will start a prompt at the bottom of the page, and you can then type in text that is searched for below the point in the document you were at. Typing in a ? also searches for a text string you enter, but it searches in the document above the point you were at. Hitting the n key during a search looks for the next instance of that text in the file.
With less
(but not more
), you can use the arrow keys to scroll up and down the page, and the b key to move back up the document if you wish to.
Remember these are just for reading the files. If you want to edit them you’ll need a text editor like nano or vi (see below).
There are many command line options available for each of the above commands, as well as functionality we do not cover here. To read more about them, consult the manual pages.
Move into mydata/Session1
Explore the file O97435.txt by using the commands cat
, more
, less
, head
and tail
. (Note that the file name starts with a capital O, rather than a zero.)
Don’t forget that tab completion can save you typing effort!
cat O97435.txt
more O97435.txt # Use the spacebar to scroll down
Press q to quit.
Now try less
less O97435.txt
Use the spacebar to scroll down, b to go up a page, and the up and down arrow keys to move up and down the file line by line.
Press the / key and search for the letters sequence in the file. Press the ? key and search for the letters gene in the file.
Press the n key to search for other instances of gene in the file.
head O97435.txt
tail O97435.txt
For reading files yourself, we recommend the command less.
The command cat
is more usually used in conjunction with other commands when you wish to process text from within a file in some way.
Remember the man pages
There are many command line options available for each of the above commands, as well as functionality we do not cover here. To read more about them, consult the manual pages:
man cat
man more
man less
man head
man tail
Using text editors
Plain text files are important, both as input to bioinformatics programs and as input or configuration files for system programs. We highly recommend that you learn to use a text editor to prepare and edit plain text files.
Text files, Word Processors and Bioinformatics
Documents written using a word processor such as Microsoft Word or OpenOffice Write are not plain text documents and will not work in either a text editor or if used as a script. If your filename has an extension such as .doc or .odt, it is unlikely to be a plain text document. (Try opening a Word document in Notepad or another text editor on Windows if you want proof of this.)
Word processors are very useful for preparing documents, but do not use them for working with bioinformatics-related files. We recommend that you prepare text files for bioinformatics analyses using Linux-based text editors and not Windows- or Mac-based text editors. This is because Windows- or Mac-based text editors may insert hidden characters that are not handled properly by Linux-based programs. There are endless posts on bioinformatics forums bemoaning the problems that arise from this!
There are a number of different text editors available on Linux. These range in ease of use, and each has its pros and cons. In this practical we will briefly look at two editors, nano and vi. Vi is an extremely powerful text editor and popular with many professionals, as it is installed on almost every Linux system. However, it is also notoriously frustrating to get to grips with. For this reason, you are probably better starting off with nano, which is very easy to use and displays instructions at the bottom of the screen.
Nano
Pros:
Very easy – For example, command options are visible at the bottom of the window, it can be used when logged in without graphical support, and is fast to start up and use
Cons:
By default it puts return characters into lines too long for the screen (i.e. using nano for system administration can be dangerous!) This behavior can be changed by setting different defaults for the program or running it with the –w option. It is not completely intuitive for people who are used to graphical word processors.
Vi (or Vim)
Pros:
Appears on nearly every Unix system. Can be very powerful if you take the time to know the key-short cuts.
Cons:
You have to know the shortcuts!! There’s no menus and no on screen prompts
Note that unlike most graphical programmes, where you write a document and then name it at the end, command-line text editors require you to name a file when you create it.
Create a file with nano
nano test_nano.txt
Type some text, exit ctrl X, save and return to command line now list the contents of the file you created
less test_nano.txt
Create a file with vi
vi test_vi.txt
Vi has two separate modes: ‘input’ mode, where you can type and edit text, and ‘control’ mode where you can do things such as save, search and exit the programme.
Type ‘i’ to enter input mode, and add some text.
Press [esc] to enter control mode.
Exit, saving your edits, by typing :wq
- this stands for write and quit! If you wanted to exit without saving the edits, typing :q!
will force vi to exit without saving.
Now list the contents of the file you created
less test_vi.txt
Runnign code when you need to have a coffee
Often you have some code or script that will take some time to run and you want to shut down your computer and go for coffee or sleep (yes even informaticians sleep) but you want the code to continue to run. To make this work use the command nohup
. The syntax is simple - here are 2 simple examples.
#run this and you can close session and enjoy relax time
nohup [command/script] &
# this command will write any output or logs to a file called - logfile.txt
nohup [command/script] > logfile.txt 2>&1 &
End of Session 1. Go and relax….and have a coffee!
Or if you’re a total nerd and can’t leave the command line, go ahead and have a look at Extension: Commandline Tools and Scripting. This contains more Linux-related tidbits and a sneak preview of some material we’ll cover in Session 2.