For the first project, please create a Jupyter notebook and write a short report that includes the sections listed below:

Methods

Write a methods section that describes the pipeline you created for this project. It should follow the conventions we discussed in lecture and be a few paragraphs at most.

Results

Write a short results section that includes the following findings and visualizations:

  1. Create a circos plot from the GFF file generated by Prokka
  2. Open the report generated by QUAST and report the following findings for the polished assembly:
    • The genome fraction (%)
    • The duplication ratio
    • The # of misassemblies
    • The # of mismatches per 100 kbp
    • The total length of the assembly
    • The GC content (%)
  3. Open the report generated by QUAST for the unpolished assembly:
    • Note any differences from the above report
  4. Open the report generated by BUSCO and report the following values:
    • The string indicating the BUSCO results (e.g. C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023)
    • Please read the BUSCO documentation or the source publication and provide a short, one-paragraph explanation of what BUSCO is doing and what the values in the above string indicate about your assembly. We will discuss in more detail in class exactly what it is doing.
    • Include the plot created by the BUSCO_PLOT process
  5. Write a short paragraph evaluating how successful you believe the experiment was in generating a high-quality assembly. Point to specific metrics in the reports and findings above that support your conclusions. You may also compare these findings to the reference assembly found here
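
For items 2 and 3, one possible way to line up the polished and unpolished QUAST metrics in your notebook is sketched below. It assumes QUAST wrote a tab-separated `report.tsv` with the metric name in the first column and the value in the second, and that the metric labels match the ones listed in `METRICS`; check both against your own report, since labels can differ between versions. The function names are hypothetical helpers, not part of QUAST.

```python
import csv

# Metric labels assumed to match the first column of QUAST's report.tsv.
# Adjust these strings if your report words them differently.
METRICS = [
    "Genome fraction (%)",
    "Duplication ratio",
    "# misassemblies",
    "# mismatches per 100 kbp",
    "Total length",
    "GC (%)",
]


def read_quast_report(lines):
    """Map each metric name to its value (as a string) for the first assembly column.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    rows = csv.reader(lines, delimiter="\t")
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def compare_reports(polished, unpolished, metrics=METRICS):
    """Return (metric, polished value, unpolished value) tuples; 'NA' marks a missing metric."""
    return [
        (name, polished.get(name, "NA"), unpolished.get(name, "NA"))
        for name in metrics
    ]
```

In a notebook you could open each report with `open(path)`, pass the handle to `read_quast_report`, and print the rows from `compare_reports` side by side; the paths depend on where your pipeline publishes the QUAST output.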

Keep the results section concise: it should be no more than 500 words.
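
To help with item 4, here is a minimal sketch of how the BUSCO summary string could be split into its fields. In that string, C is the percentage of complete BUSCOs (S single-copy plus D duplicated), F is fragmented, M is missing, and n is the number of BUSCO groups searched. The function name `parse_busco_summary` and the regular expression are illustrative assumptions, and the string used is the example from the list above, not a real result.

```python
import re

# Pattern for the core fields of the one-line BUSCO summary.
# This is a sketch: other BUSCO versions may append extra fields after n.
BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(summary):
    """Return the percentages as floats and the number of BUSCO groups as an int."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a recognizable BUSCO summary: {summary!r}")
    values = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "total"
    }
    values["total"] = int(match.group("total"))
    return values


# Example string copied from the assignment text above.
example = "C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023"
parsed = parse_busco_summary(example)
print(parsed)
```

A quick sanity check on any parsed string is that S and D add up to C, and that C, F, and M add up to roughly 100%.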
