cenplot
A library for building centromere figures.

Quickstart
Getting Started
Install the package from PyPI.
pip install cenplot
CLI
Generate split HOR tracks with the `cenplot draw` command.
# examples/example_cli.sh
cenplot draw \
-t tracks_hor.toml \
-c "chm13_chr10:38568472-42561808" \
-p 4 \
-d plots \
-o "plot/merged_image.png"
Python API
The same HOR track can be created with a few lines of code.
# examples/example_api.py
from cenplot import plot_one_cen, read_one_cen_tracks
chrom = "chm13_chr10:38568472-42561808"
track_list, settings = read_one_cen_tracks("tracks_hor.toml", chrom=chrom)
fig, axes, outfiles = plot_one_cen(track_list.tracks, "plots", chrom, settings)
Development
Requires Git LFS to pull test files.
Create a venv, build `cenplot`, and install it. Also, generate the docs.
git lfs install && git lfs pull
make dev && make build && make install
pdoc ./cenplot -o docs/
Overview
Configuration comes in the form of `TOML` files with two fields, `[settings]` and `[[tracks]]`.
[settings]
format = "png"
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "relative"
[[tracks]]
title = "Sequence Composition"
position = "relative"
`[settings]` determines figure-level settings while `[[tracks]]` determines track-level settings.
- To view all of the possible options for `[settings]`, see `cenplot.PlotSettings`.
- To view all of the possible options for `[[tracks]]`, see one of `cenplot.TrackSettings`.
Track Order
Order is determined by the placement of tracks in the file. Here the "Alpha-satellite HOR monomers" track comes before the "Sequence Composition" track.
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "relative"
[[tracks]]
title = "Sequence Composition"
position = "relative"

Reversing the order plots "Sequence Composition" before "Alpha-satellite HOR monomers".
[[tracks]]
title = "Sequence Composition"
position = "relative"
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "relative"

Overlap
Tracks can be overlapped with the `position` setting (see `cenplot.TrackPosition`).
[[tracks]]
title = "Sequence Composition"
position = "relative"
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "overlap"

The preceding track is overlapped and the legend elements are merged.
Track Types and Data
Track types, or `cenplot.TrackType`s, are specified via the `type` parameter.
[[tracks]]
title = "Sequence Composition"
position = "relative"
type = "label"
path = "rm.bed"
Each type expects a different BED file layout in the `path` option.
- For example, `TrackType.SelfIdent` expects the following columns.
query | query_st | query_end | reference | reference_st | reference_end | percent_identity_by_events |
---|---|---|---|---|---|---|
x | 1 | 5000 | x | 1 | 5000 | 100.0 |
When using the Python API, each type has an associated `read_*` function (e.g. `cenplot.read_bed_identity`).
- Using `cenplot.read_one_cen_tracks` is preferred.
Note: If contig names in the input BED files include coordinates, the interval coordinates are expected to be absolute.
Absolute coordinates
chrom | chrom_st | chrom_end |
---|---|---|
chm13:100-200 | 105 | 130 |
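The following sketch shows one way to sanity-check the absolute-coordinate convention from the table above. It assumes contig names end in `:<start>-<end>`; the regular expression and the `is_absolute` helper are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical pattern: "<name>:<start>-<end>" at the end of the contig name.
CONTIG_COORDS = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def is_absolute(chrom: str, chrom_st: int, chrom_end: int) -> bool:
    """Return True if the interval lies within the range embedded in the contig name."""
    match = CONTIG_COORDS.match(chrom)
    if match is None:
        # No coordinates in the name, so there is nothing to check against.
        return True
    ctg_st, ctg_end = int(match["start"]), int(match["end"])
    return ctg_st <= chrom_st <= chrom_end <= ctg_end


# Interval from the table, given in absolute coordinates.
print(is_absolute("chm13:100-200", 105, 130))
# The same interval written as an offset from the contig start.
print(is_absolute("chm13:100-200", 5, 30))
```

This is only a heuristic: a relative interval can fall inside the contig range by coincidence, so a passing check does not prove the coordinates are absolute.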
Proportion
Each track must account for some proportion of the total plot dimensions.
- The plot dimensions are specified with `cenplot.PlotSettings.dim`.
Here, the proportions sum to `0.2`, so each track with a proportion of `0.1` takes up `50%` of the total plot dimensions.
[[tracks]]
title = "Sequence Composition"
position = "relative"
type = "label"
proportion = 0.1
path = "rm.bed"
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "relative"
type = "hor"
proportion = 0.1
path = "stv_row.bed"
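The arithmetic behind the 50% figure can be sketched as a simple normalization. This assumes each track's share is its proportion divided by the sum of all proportions, which matches the example above; `track_shares` is a hypothetical helper, not a `cenplot` function.

```python
def track_shares(proportions: list[float]) -> list[float]:
    """Normalize per-track proportions so they sum to 1 (hypothetical helper)."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("At least one track needs a positive proportion.")
    return [p / total for p in proportions]


# Two tracks with proportion = 0.1 each, as in the configuration above.
shares = track_shares([0.1, 0.1])
print([f"{share:.0%}" for share in shares])
```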
When the position is `cenplot.TrackPosition.Overlap`, the proportion is ignored.
[[tracks]]
title = "Sequence Composition"
position = "relative"
type = "label"
proportion = 0.1
path = "rm.bed"
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "overlap"
type = "hor"
path = "stv_row.bed"
Options
Options for specific `cenplot.TrackType` types can be specified in `options`.
- See `cenplot.TrackSettings`.
[[tracks]]
title = "Sequence Composition"
position = "relative"
proportion = 0.5
type = "label"
path = "rm.bed"
# hide_x must be false on both overlapping tracks to keep the x-axis.
options = { hide_x = false }
[[tracks]]
title = "Alpha-satellite HOR monomers"
position = "overlap"
type = "hor"
path = "stv_row.bed"
# Change mode to showing HOR variant and reduce legend number of cols.
options = { hide_x = false, mode = "hor", legend_ncols = 2 }

Examples
Examples of both the CLI and Python API can be found in the root of `cenplot`'s project directory under `examples/` or `test/`.
r"""
A library for building centromere figures.
"""

import logging

from .lib.draw import (
    draw_hor,
    draw_hor_ort,
    draw_label,
    draw_strand,
    draw_self_ident,
    draw_bar,
    draw_line,
    draw_legend,
    draw_self_ident_hist,
    draw_local_self_ident,
    plot_one_cen,
    merge_plots,
    PlotSettings,
)
from .lib.io import (
    read_bed9,
    read_bed_hor,
    read_bed_identity,
    read_bed_label,
    read_one_cen_tracks,
)
from .lib.track import (
    Track,
    TrackType,
    TrackPosition,
    TrackList,
    LegendPosition,
    TrackSettings,
    SelfIdentTrackSettings,
    LineTrackSettings,
    LocalSelfIdentTrackSettings,
    HORTrackSettings,
    HOROrtTrackSettings,
    StrandTrackSettings,
    BarTrackSettings,
    LabelTrackSettings,
    PositionTrackSettings,
    LegendTrackSettings,
    SpacerTrackSettings,
)

__author__ = "Keith Oshima (oshimak@pennmedicine.upenn.edu)"
__license__ = "MIT"
__all__ = [
    "plot_one_cen",
    "merge_plots",
    "draw_hor",
    "draw_hor_ort",
    "draw_label",
    "draw_self_ident",
    "draw_self_ident_hist",
    "draw_local_self_ident",
    "draw_bar",
    "draw_line",
    "draw_strand",
    "draw_legend",
    "read_bed9",
    "read_bed_hor",
    "read_bed_identity",
    "read_bed_label",
    "read_one_cen_tracks",
    "Track",
    "TrackType",
    "TrackPosition",
    "TrackList",
    "LegendPosition",
    "PlotSettings",
    "TrackSettings",
    "SelfIdentTrackSettings",
    "LocalSelfIdentTrackSettings",
    "StrandTrackSettings",
    "HORTrackSettings",
    "HOROrtTrackSettings",
    "BarTrackSettings",
    "LineTrackSettings",
    "LabelTrackSettings",
    "PositionTrackSettings",
    "LegendTrackSettings",
    "SpacerTrackSettings",
]

logging.getLogger(__name__).addHandler(logging.NullHandler())
def plot_one_cen(
    tracks: list[Track],
    outdir: str,
    chrom: str,
    settings: PlotSettings,
) -> tuple[Figure, np.ndarray, list[str]]:
    """
    Plot a single centromere figure from a list of `Track`s.

    # Args
    * `tracks`
        * List of tracks to plot. The order in the list determines placement on the figure.
    * `outdir`
        * Output directory.
    * `chrom`
        * Chromosome name to filter for in `Track.data`
    * `settings`
        * Settings for output plots.

    # Returns
    * Figure, its axes, and the output filename(s).

    # Usage
    ```python
    import cenplot

    chrom = "chm13_chr10:38568472-42561808"
    track_list, settings = cenplot.read_one_cen_tracks("tracks_example_api.toml", chrom=chrom)
    fig, axes, outfiles = cenplot.plot_one_cen(track_list.tracks, "plots", chrom, settings)
    ```
    """
    # Show chrom trimmed of spaces for logs and filenames.
    logging.info(f"Plotting {chrom}...")

    if not settings.xlim:
        # Get min and max position of all tracks for this cen.
        _, min_st_pos = get_min_max_track(tracks, typ="min")
        _, max_end_pos = get_min_max_track(tracks, typ="max", default_col="chrom_end")
    else:
        min_st_pos = settings.xlim[0]
        max_end_pos = settings.xlim[1]

    # # Scale height based on track length.
    # adj_height = height * (trk_max_end / max_end_pos)
    # height = height if adj_height == 0 else adj_height

    fig, axes, track_indices = create_subplots(
        tracks,
        settings,
    )
    if settings.legend_pos == LegendPosition.Left:
        track_col, legend_col = 1, 0
    else:
        track_col, legend_col = 0, 1

    track_labels: list[str] = []

    def get_track_label(chrom: str, track: Track, all_track_labels: list[str]) -> str:
        if not track.title:
            return ""
        try:
            fmt_track_label = track.title.format(chrom=chrom)
        except KeyError:
            fmt_track_label = track.title

        track_label = fmt_track_label.encode("ascii", "ignore").decode("unicode_escape")

        # Update track label for each overlap.
        if track.pos == TrackPosition.Overlap:
            try:
                track_label = f"{all_track_labels[-1]}\n{track_label}"
            except IndexError:
                pass

        return track_label

    num_hor_split = 0
    for idx, track in enumerate(tracks):
        track_row = track_indices[idx]
        track_label = get_track_label(chrom, track, track_labels)

        try:
            track_ax: Axes = axes[track_row, track_col]
        except IndexError:
            print(f"Cannot get track ({track_row, track_col}) for {track}.")
            continue
        try:
            legend_ax: Axes = axes[track_row, legend_col]
        except IndexError:
            legend_ax = None

        # Set xaxis limits
        track_ax.set_xlim(min_st_pos, max_end_pos)

        if track.opt == TrackType.Legend:
            draw_legend(track_ax, axes, track, tracks, track_row, track_col)
        elif track.opt == TrackType.Position:
            # Hide everything but x-axis
            format_ax(
                track_ax,
                grid=True,
                xticklabel_fontsize=track.options.legend_fontsize,
                yticks=True,
                yticklabel_fontsize=track.options.legend_fontsize,
                spines=("right", "left", "top"),
            )
        elif track.opt == TrackType.Spacer:
            # Hide everything.
            format_ax(
                track_ax,
                grid=True,
                xticks=True,
                yticks=True,
                spines=("right", "left", "top", "bottom"),
            )
        else:
            # Switch track option. {bar, label, ident, hor}
            # Add legend.
            if track.opt == TrackType.HOR or track.opt == TrackType.HORSplit:
                draw_fn = draw_hor
            elif track.opt == TrackType.HOROrt:
                draw_fn = draw_hor_ort
            elif track.opt == TrackType.Label:
                draw_fn = draw_label
            elif track.opt == TrackType.SelfIdent:
                draw_fn = draw_self_ident
            elif track.opt == TrackType.LocalSelfIdent:
                draw_fn = draw_local_self_ident
            elif track.opt == TrackType.Bar:
                draw_fn = draw_bar
            elif track.opt == TrackType.Line:
                draw_fn = draw_line
            elif track.opt == TrackType.Strand:
                draw_fn = draw_strand
            else:
                raise ValueError("Invalid TrackType. Unreachable.")

            draw_fn(
                ax=track_ax,
                legend_ax=legend_ax,
                track=track,
                zorder=idx,
            )

        # Store label if more overlaps.
        track_labels.append(track_label)

        # Set labels for both x and y axis.
        set_both_labels(track_label, track_ax, track)

        if not legend_ax:
            continue

        # Make legend title invisible for HORs split after 1.
        if track.opt == TrackType.HORSplit:
            legend_ax_legend = legend_ax.get_legend()
            if legend_ax_legend and num_hor_split != 0:
                legend_title = legend_ax_legend.get_title()
                legend_title.set_alpha(0.0)
            num_hor_split += 1

        # Minimalize all legend cols except self-ident
        if track.opt != TrackType.SelfIdent or (
            track.opt == TrackType.SelfIdent and not track.options.legend
        ):
            format_ax(
                legend_ax,
                grid=True,
                xticks=True,
                xticklabel_fontsize=track.options.legend_fontsize,
                yticks=True,
                yticklabel_fontsize=track.options.legend_fontsize,
                spines=("right", "left", "top", "bottom"),
            )
        else:
            format_ax(
                legend_ax,
                grid=True,
                xticklabel_fontsize=track.options.legend_fontsize,
                yticklabel_fontsize=track.options.legend_fontsize,
                spines=("right", "top"),
            )

    # Add title
    if settings.title:
        title = settings.title.format(chrom=chrom)
        fig.suptitle(
            title,
            x=settings.title_x,
            y=settings.title_y,
            horizontalalignment=settings.title_horizontalalignment,
            fontsize=settings.title_fontsize,
        )

    os.makedirs(outdir, exist_ok=True)
    if isinstance(settings.format, str):
        output_format = [settings.format]
    else:
        output_format = settings.format
    # Pad between axes.
    fig.get_layout_engine().set(h_pad=settings.axis_h_pad)

    # PNG must always be plotted last.
    # Matplotlib modifies figure settings causing formatting errors in vectorized image formats (svg, pdf)
    png_output = "png" in output_format
    if png_output:
        output_format.remove("png")

    outfiles = []
    for fmt in output_format:
        outfile = os.path.join(outdir, f"{chrom}.{fmt}")
        fig.savefig(outfile, dpi=settings.dpi, transparent=settings.transparent)
        outfiles.append(outfile)

    if png_output:
        outfile = os.path.join(outdir, f"{chrom}.png")
        fig.savefig(
            outfile,
            dpi=settings.dpi,
            transparent=settings.transparent,
        )
        outfiles.append(outfile)

    return fig, axes, outfiles
Plot a single centromere figure from a list of `Track`s.
Args
tracks
- List of tracks to plot. The order in the list determines placement on the figure.
outdir
- Output directory.
chrom
- Chromosome name to filter for in `Track.data`.
settings
- Settings for output plots.
Returns
- Figure, its axes, and the output filename(s).
Usage
import cenplot
chrom = "chm13_chr10:38568472-42561808"
track_list, settings = cenplot.read_one_cen_tracks("tracks_example_api.toml", chrom=chrom)
fig, axes, outfiles = cenplot.plot_one_cen(track_list.tracks, "plots", chrom, settings)
def merge_plots(
    figures: list[tuple[Figure, np.ndarray, list[str]]], outfile: str
) -> None:
    """
    Merge plots produced by `plot_one_cen`.

    # Args
    * `figures`
        * List of figures, their axes, and the name of the output files. Only pngs are concatenated.
    * `outfile`
        * Output merged file.
        * Either `png` or `pdf`

    # Returns
    * None
    """
    if outfile.endswith(".pdf"):
        with PdfPages(outfile) as pdf:
            for fig, _, _ in figures:
                pdf.savefig(fig)
    else:
        merged_images = np.concatenate(
            [
                plt.imread(file)
                for _, _, files in figures
                for file in files
                if file.endswith("png")
            ]
        )
        plt.imsave(outfile, merged_images)
Merge plots produced by `plot_one_cen`.
Args
figures
- List of figures, their axes, and the names of the output files. Only PNGs are concatenated.
outfile
- Output merged file.
- Either `png` or `pdf`.
Returns
- None
def draw_hor(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
):
    """
    Draw HOR plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    legend = track.options.legend
    border = track.options.bg_border
    bg_color = track.options.bg_color

    if track.pos != TrackPosition.Overlap:
        spines = (
            ("right", "left", "top", "bottom") if hide_x else ("right", "left", "top")
        )
    else:
        spines = None

    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticks=True,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    ylim = ax.get_ylim()
    height = ylim[1] - ylim[0]

    if track.options.mode == "hor":
        colname = "name"
    else:
        colname = "mer"

    # Add HOR track.
    for row in track.data.iter_rows(named=True):
        start = row["chrom_st"]
        end = row["chrom_end"]
        color = row["color"]
        rect = Rectangle(
            (start, 0),
            end + 1 - start,
            height,
            color=color,
            lw=0,
            label=row[colname],
            zorder=zorder,
        )
        ax.add_patch(rect)

    if border:
        # Ensure border is always on top.
        add_rect(ax, height, zorder + 1.0)

    if bg_color:
        # Ensure bg is below everything.
        add_rect(ax, height, zorder - 1.0, fill=True, color=bg_color)

    if legend_ax and legend:
        draw_uniq_entry_legend(
            legend_ax,
            track,
            ref_ax=ax,
            ncols=track.options.legend_ncols,
            loc="center left",
            alignment="left",
        )
Draw HOR plot on axis with the given `Track`.
def draw_hor_ort(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
):
    """
    Draw HOR ort plot on axis with the given `Track`.
    """
    draw_strand(ax, track, zorder=zorder, legend_ax=legend_ax)
Draw HOR ort plot on axis with the given `Track`.
def draw_label(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
) -> None:
    """
    Draw label plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    color = track.options.color
    alpha = track.options.alpha
    legend = track.options.legend
    border = track.options.bg_border
    edgecolor = track.options.edgecolor

    patch_options: dict[str, Any] = {"zorder": zorder}
    patch_options["alpha"] = alpha

    # Overlapping tracks should not cause the overlapped track to have their spines/ticks/ticklabels removed.
    if track.pos != TrackPosition.Overlap:
        spines = (
            ("right", "left", "top", "bottom") if hide_x else ("right", "left", "top")
        )
        yticks = True
    else:
        yticks = False
        spines = None
    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticks=yticks,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    ylim = ax.get_ylim()
    height = ylim[1] - ylim[0]

    patch_options["edgecolor"] = edgecolor

    for row in track.data.iter_rows(named=True):
        start = row["chrom_st"]
        end = row["chrom_end"]

        if row["name"] == "-" or not row["name"]:
            labels = {}
        else:
            labels = {"label": row["name"]}

        # Allow override.
        if color:
            patch_options["facecolor"] = color
        elif "color" in row:
            patch_options["facecolor"] = row["color"]

        if track.options.shape == "rect":
            rect = Rectangle(
                (start, 0),
                end + 1 - start,
                height,
                **labels,
                **patch_options,
            )
            ax.add_patch(rect)
        elif track.options.shape == "tri":
            midpt = ((end - start) / 2) + start
            vertices = [
                (start, height),
                (end, height),
                # tip
                (midpt, 0),
            ]
            ptch = Polygon(
                vertices,
                closed=True,
                **labels,
                **patch_options,
            )
            ax.add_patch(ptch)

    if border:
        # Ensure border on top with larger zorder.
        add_rect(ax, height, fill=False, zorder=zorder + 1.0)

    # Draw legend.
    if legend_ax and legend:
        draw_uniq_entry_legend(
            legend_ax,
            track,
            ref_ax=ax,
            ncols=track.options.legend_ncols,
            label_order=track.options.legend_label_order,
            loc="center left",
            alignment="left",
        )
Draw label plot on axis with the given `Track`.
def draw_self_ident(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
) -> None:
    """
    Draw self identity plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    invert = track.options.invert
    legend = track.options.legend

    colors, verts = [], []
    spines = ("right", "left", "top", "bottom") if hide_x else ("right", "left", "top")
    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticks=True,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    if invert:
        df_track = track.data.with_columns(y=-pl.col("y"))
    else:
        df_track = track.data

    for _, df_diam in df_track.group_by(["group"]):
        df_points = df_diam.select("x", "y")
        color = df_diam["color"].first()
        colors.append(color)
        verts.append(df_points)

    # https://stackoverflow.com/a/29000246
    polys = PolyCollection(verts, zorder=zorder)
    polys.set(array=None, facecolors=colors)
    ax.add_collection(polys)

    ax.set_ylim(df_track["y"].min(), df_track["y"].max())

    if legend_ax and legend:
        draw_self_ident_hist(legend_ax, track, zorder=zorder)
Draw self identity plot on axis with the given `Track`.
def draw_self_ident_hist(ax: Axes, track: Track, *, zorder: float = 1.0):
    """
    Draw self identity histogram plot on axis with the given `Track`.
    """
    legend_bins = track.options.legend_bins
    legend_xmin = track.options.legend_xmin
    legend_asp_ratio = track.options.legend_asp_ratio
    colorscale = track.options.colorscale
    assert isinstance(colorscale, dict), (
        f"Colorscale not a identity interval mapping for {track.title}"
    )

    cmap = IntervalTree(
        Interval(rng[0], rng[1], color) for rng, color in colorscale.items()
    )
    cnts, values, bars = ax.hist(
        track.data["percent_identity_by_events"], bins=legend_bins, zorder=zorder
    )
    ax.set_xlim(legend_xmin, 100.0)
    ax.minorticks_on()
    ax.set_xlabel(
        "Mean nucleotide identity\nbetween pairwise intervals",
        fontsize=track.options.legend_title_fontsize,
    )
    ax.set_ylabel(
        "# of Intervals (thousands)", fontsize=track.options.legend_title_fontsize
    )

    # Ensure that legend is only a portion of the total height.
    # Otherwise, take up entire axis dim.
    ax.set_box_aspect(legend_asp_ratio)

    for _, value, bar in zip(cnts, values, bars):
        # Make value a non-null interval
        # ex. (1,1) -> (1, 1.000001)
        color = cmap.overlap(value, value + 0.00001)
        try:
            color = next(iter(color)).data
        except Exception:
            color = None
        bar.set_facecolor(color)
Draw self identity histogram plot on axis with the given `Track`.
def draw_local_self_ident(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
) -> None:
    """
    Draw local, self identity plot on axis with the given `Track`.
    """
    if not track.options.legend_label_order:
        track.options.legend_label_order = [
            f"{cs[0]}-{cs[1]}"
            for cs in track.options.colorscale.keys()
        ]
    draw_label(ax, track, zorder=zorder, legend_ax=legend_ax)
Draw local, self identity plot on axis with the given `Track`.
def draw_bar(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
) -> None:
    """
    Draw bar plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    color = track.options.color
    alpha = track.options.alpha
    legend = track.options.legend
    ymin = track.options.ymin
    ymax = track.options.ymax
    label = track.options.label

    if track.pos != TrackPosition.Overlap:
        spines = ("right", "top")
    else:
        spines = None

    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    plot_options = {"zorder": zorder, "alpha": alpha}
    if color:
        plot_options["color"] = color
    elif "color" in track.data.columns:
        plot_options["color"] = track.data["color"]
    else:
        plot_options["color"] = track.options.DEF_COLOR

    # Add bar
    ax.bar(
        track.data["chrom_st"],
        track.data["name"],
        track.data["chrom_end"] - track.data["chrom_st"],
        label=label,
        **plot_options,
    )
    # Trim plot to margins
    ax.margins(x=0, y=0)
    ax.set_ylim(ymin=ymin, ymax=ymax)

    if legend_ax and legend:
        draw_uniq_entry_legend(
            legend_ax,
            track,
            ref_ax=ax,
            ncols=track.options.legend_ncols,
            loc="center left",
            alignment="left",
        )
Draw bar plot on axis with the given `Track`.
def draw_line(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
) -> None:
    """
    Draw line plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    color = track.options.color
    alpha = track.options.alpha
    legend = track.options.legend
    ymin = track.options.ymin
    ymax = track.options.ymax
    label = track.options.label
    linestyle = track.options.linestyle
    linewidth = track.options.linewidth
    marker = track.options.marker
    markersize = track.options.markersize

    if track.pos != TrackPosition.Overlap:
        spines = ("right", "top")
    else:
        spines = None

    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    plot_options = {"zorder": zorder, "alpha": alpha}
    if color:
        plot_options["color"] = color
    elif "color" in track.data.columns:
        plot_options["color"] = track.data["color"]
    else:
        plot_options["color"] = track.options.DEF_COLOR

    if linestyle:
        plot_options["linestyle"] = linestyle
    if linewidth:
        plot_options["linewidth"] = linewidth

    # Fill between cannot add markers
    if not track.options.fill:
        plot_options["marker"] = marker
        if markersize:
            plot_options["markersize"] = markersize

    if track.options.position == "midpoint":
        df = track.data.with_columns(
            chrom_st=pl.col("chrom_st") + (pl.col("chrom_end") - pl.col("chrom_st")) / 2
        )
    else:
        df = track.data

    if track.options.log_scale:
        ax.set_yscale("log")

    # Add line, or filled area when fill is set.
    if track.options.fill:
        ax.fill_between(
            df["chrom_st"],
            df["name"],
            0,
            label=label,
            **plot_options,
        )
    else:
        ax.plot(
            df["chrom_st"],
            df["name"],
            label=label,
            **plot_options,
        )

    # Trim plot to margins
    ax.margins(x=0, y=0)
    ax.set_ylim(ymin=ymin, ymax=ymax)

    if legend_ax and legend:
        draw_uniq_entry_legend(
            legend_ax,
            track,
            ref_ax=ax,
            ncols=track.options.legend_ncols,
            loc="center left",
            alignment="left",
        )
Draw line plot on axis with the given `Track`.
def draw_strand(
    ax: Axes,
    track: Track,
    *,
    zorder: float = 1.0,
    legend_ax: Axes | None = None,
):
    """
    Draw strand plot on axis with the given `Track`.
    """
    hide_x = track.options.hide_x
    fwd_color = (
        track.options.fwd_color if track.options.fwd_color else track.options.DEF_COLOR
    )
    rev_color = (
        track.options.rev_color if track.options.rev_color else track.options.DEF_COLOR
    )
    scale = track.options.scale
    legend = track.options.legend

    if track.pos != TrackPosition.Overlap:
        spines = (
            ("right", "left", "top", "bottom") if hide_x else ("right", "left", "top")
        )
    else:
        spines = None

    format_ax(
        ax,
        xticks=hide_x,
        xticklabel_fontsize=track.options.fontsize,
        yticks=True,
        yticklabel_fontsize=track.options.fontsize,
        spines=spines,
    )

    ylim = ax.get_ylim()
    height = ylim[1] - ylim[0]

    for row in track.data.iter_rows(named=True):
        # sample arrow
        start = row["chrom_st"]
        end = row["chrom_end"]
        strand = row["strand"]
        if strand == "-":
            tmp_start = start
            start = end
            end = tmp_start
            color = rev_color
        else:
            color = fwd_color

        if track.options.use_item_rgb:
            color = row["color"]

        arrow = FancyArrowPatch(
            (start, height * 0.5),
            (end, height * 0.5),
            mutation_scale=scale,
            color=color,
            clip_on=False,
            zorder=zorder,
            label=row["name"],
        )
        ax.add_patch(arrow)

    if legend_ax and legend:
        draw_uniq_entry_legend(
            legend_ax,
            track,
            ref_ax=ax,
            ncols=track.options.legend_ncols,
            loc="center",
        )
Draw strand plot on axis with the given Track.
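In the source above, arrow direction comes from swapping the start and end coordinates for rows on the "-" strand, so the arrow head lands on the lower coordinate. The sketch below isolates that step using only the standard library; arrow_endpoints and the sample rows are hypothetical names used for illustration.

```python
def arrow_endpoints(chrom_st, chrom_end, strand):
    """Return (tail, head) x-coordinates for a strand arrow."""
    if strand == "-":
        # Reverse strand: the arrow points from chrom_end back to chrom_st.
        return chrom_end, chrom_st
    # Forward strand (or anything else): the arrow points left to right.
    return chrom_st, chrom_end


rows = [(100, 500, "+"), (500, 900, "-")]
arrows = [arrow_endpoints(*row) for row in rows]
```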
def draw_legend(
    ax: Axes,
    axes: np.ndarray,
    track: Track,
    tracks: list[Track],
    track_row: int,
    track_col: int,
) -> None:
    """
    Draw legend plot on axis for the given `Track`.

    # Args
    * `ax`
        * Axis to plot on.
    * `axes`
        * 2D `np.ndarray` of all axes to get reference axis.
    * `track`
        * Current `Track`.
    * `tracks`
        * All tracks to get reference `Track`.
    * `track_row`
        * Reference track row.
    * `track_col`
        * Reference track col.

    # Returns
    * None
    """
    ref_track_row = (
        track.options.index if isinstance(track.options.index, int) else track_row - 1
    )
    try:
        ref_track_ax: Axes = axes[ref_track_row, track_col]
    except IndexError:
        # `file=` is required; a positional sys.stderr would be printed as text.
        print(f"Reference axis index ({ref_track_row}) doesn't exist.", file=sys.stderr)
        return None

    # TODO: Will not work with HOR split.
    if hasattr(tracks[ref_track_row].options, "mode"):
        legend_colname = (
            "name"
            if tracks[ref_track_row].options.mode == "hor"
            else tracks[ref_track_row].options
        )
    else:
        legend_colname = "name"

    try:
        srs_track = tracks[ref_track_row].data[legend_colname]
    except Exception:
        print(
            f"Legend column ({legend_colname}) doesn't exist in {track}.",
            file=sys.stderr,
        )
        return None

    draw_uniq_entry_legend(
        ax,
        track,
        ref_track_ax,
        ncols=track.options.legend_ncols
        if track.options.legend_ncols
        else srs_track.n_unique(),
        loc="center",
        alignment="center",
    )
    format_ax(
        ax,
        grid=True,
        xticks=True,
        yticks=True,
        spines=("right", "left", "top", "bottom"),
    )
def read_bed9(infile: str | TextIO, *, chrom: str | None = None) -> pl.DataFrame:
    """
    Read a BED9 file with no header.

    # Args
    * `infile`
        * Input file or IO stream.
    * `chrom`
        * Chromosome in `chrom` column to filter for.

    # Returns
    * BED9 pl.DataFrame.
    """
    try:
        df = pl.read_csv(infile, separator="\t", has_header=False)
        df = df.rename(
            {col: val for col, val in BED9_COL_MAP.items() if col in df.columns}
        )
        df_adj = adj_by_ctg_coords(df, "chrom").sort(by="chrom_st")
    except Exception:
        df_adj = pl.DataFrame(schema=BED9_COL_MAP.values())

    if chrom:
        df_adj = df_adj.filter(pl.col("chrom") == chrom)
    if "item_rgb" not in df_adj.columns:
        df_adj = df_adj.with_columns(item_rgb=pl.lit("0,0,0"))
    if "name" not in df_adj.columns:
        df_adj = df_adj.with_columns(name=pl.lit("-"))

    return df_adj
Read a BED9 file with no header.
Args
infile
- Input file or IO stream.
chrom
- Chromosome in the chrom column to filter for.
Returns
- BED9 pl.DataFrame.
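To make the reader's behavior concrete, here is a rough standard-library sketch of the same steps: read headerless tab-separated rows, fill in a default name ("-") and item_rgb ("0,0,0") when those columns are absent, filter by chrom, and sort by start. It is not the library's implementation: read_bed9_rows is a hypothetical helper, the column order in BED9_COLS is assumed from the BED specification, and the coordinate adjustment done by adj_by_ctg_coords is omitted.

```python
import csv
import io

# Assumed column order, following the UCSC BED9 layout.
BED9_COLS = (
    "chrom", "chrom_st", "chrom_end", "name", "score",
    "strand", "thick_st", "thick_end", "item_rgb",
)


def read_bed9_rows(handle, chrom=None):
    """Read headerless BED rows into dicts, sorted by start coordinate."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        # zip() stops at the shorter input, so BED3/BED4 rows also work.
        row = dict(zip(BED9_COLS, fields))
        row["chrom_st"] = int(row["chrom_st"])
        row["chrom_end"] = int(row["chrom_end"])
        # Defaults mirror the ones applied in the source above.
        row.setdefault("name", "-")
        row.setdefault("item_rgb", "0,0,0")
        if chrom is None or row["chrom"] == chrom:
            rows.append(row)
    return sorted(rows, key=lambda r: r["chrom_st"])


bed_text = "chr1\t200\t300\tB\nchr1\t0\t100\tA\nchr2\t5\t10\tC\n"
rows = read_bed9_rows(io.StringIO(bed_text), chrom="chr1")
```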
def read_bed_hor(
    infile: str | TextIO,
    *,
    chrom: str | None = None,
    live_only: bool = True,
    mer_size: int = HORTrackSettings.mer_size,
    mer_filter: int = HORTrackSettings.mer_filter,
    hor_filter: int | None = None,
    sort_by: str = "mer",
    sort_order: str = HORTrackSettings.sort_order,
    color_map_file: str | None = None,
    use_item_rgb: bool = HORTrackSettings.use_item_rgb,
) -> pl.DataFrame:
    """
    Read a HOR BED9 file with no header.

    # Args
    * `infile`
        * Input file or IO stream.
    * `chrom`
        * Chromosome in `chrom` column to filter for.
    * `live_only`
        * Filter for only live data.
        * Contains `L` in `name` column.
    * `mer_size`
        * Monomer size to calculate monomer number.
    * `mer_filter`
        * Filter for HORs with at least this many monomers.
    * `hor_filter`
        * Filter for HORs that occur at least this many times.
    * `color_map_file`
        * Convenience color map file for `mer` or `hor`.
        * Two-column TSV file with no header.
        * If `None`, use default color map.
    * `sort_by`
        * Sort `pl.DataFrame` by `mer`, `hor`, or `hor_count`.
        * Can be a path to a list of `mer` or `hor` names
    * `sort_order`
        * Sort in ascending or descending order.
    * `use_item_rgb`
        * Use `item_rgb` column or generate random colors.

    # Returns
    * HOR `pl.DataFrame`
    """
    df = (
        read_bed9(infile, chrom=chrom)
        .lazy()
        .with_columns(
            length=pl.col("chrom_end") - pl.col("chrom_st"),
        )
        .with_columns(
            mer=(pl.col("length") / mer_size).round().cast(pl.Int8).clip(1, 100)
        )
        .filter(
            pl.when(live_only).then(pl.col("name").str.contains("L")).otherwise(True)
            & (pl.col("mer") >= mer_filter)
        )
        .collect()
    )
    # Read color map.
    if color_map_file:
        color_map: dict[str, str] = {}
        with open(color_map_file, "rt") as fh:
            for line in fh.readlines():
                try:
                    name, color = line.strip().split()
                except Exception:
                    logging.error(f"Invalid color map. ({line})")
                    continue
                color_map[name] = color
    else:
        color_map = MONOMER_COLORS

    df = map_value_colors(
        df,
        map_col="mer",
        # Use the color map read above rather than always the default.
        map_values=color_map,
        use_item_rgb=use_item_rgb,
    )
    df = df.join(df.get_column("name").value_counts(name="hor_count"), on="name")

    if hor_filter:
        df = df.filter(pl.col("hor_count") >= hor_filter)

    if os.path.exists(sort_order):
        with open(sort_order, "rt") as fh:
            defined_sort_order = []
            for line in fh:
                line = line.strip()
                defined_sort_order.append(int(line) if sort_by == "mer" else line)
    else:
        defined_sort_order = None

    if sort_by == "mer":
        sort_col = "mer"
    elif sort_by == "name" and defined_sort_order:
        sort_col = "name"
    else:
        sort_col = "hor_count"

    if defined_sort_order:
        # Get elements that are not listed in the defined order.
        remaining_elems = set(df[sort_col]).difference(defined_sort_order)
        # Add remainder so all elements covered.
        defined_sort_order.extend(remaining_elems)
        df = df.cast({sort_col: pl.Enum(defined_sort_order)}).sort(by=sort_col)
    else:
        df = df.sort(sort_col, descending=sort_order == HORTrackSettings.sort_order)

    return df
Read a HOR BED9 file with no header.
Args
infile
- Input file or IO stream.
chrom
- Chromosome in the chrom column to filter for.
live_only
- Filter for only live data.
- Contains L in name column.
mer_size
- Monomer size to calculate monomer number.
mer_filter
- Filter for HORs with at least this many monomers.
hor_filter
- Filter for HORs that occur at least this many times.
color_map_file
- Convenience color map file for mer or hor.
- Two-column TSV file with no header.
- If None, use default color map.
sort_by
- Sort pl.DataFrame by mer, hor, or hor_count.
- Can be a path to a list of mer or hor names.
sort_order
- Sort in ascending or descending order.
use_item_rgb
- Use item_rgb column or generate random colors.
Returns
- HOR pl.DataFrame
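The filtering described above can be illustrated without polars. The sketch below derives the monomer number as the rounded interval length divided by mer_size (clipped to 1 to 100, as in the source), keeps live HORs whose name contains "L", and applies mer_filter and hor_filter. The helper names and the sample HOR names are made up for illustration, and Python's round() may treat exact .5 ties differently from polars.

```python
from collections import Counter


def monomer_number(chrom_st, chrom_end, mer_size=171):
    """Estimate the number of monomers in a HOR from its length."""
    mer = round((chrom_end - chrom_st) / mer_size)
    # Clip to the range 1..100, as the source does.
    return max(1, min(mer, 100))


def filter_hors(rows, *, live_only=True, mer_filter=2, hor_filter=None):
    """Keep (name, mer) pairs that pass the live, mer, and count filters."""
    kept = []
    for chrom_st, chrom_end, name in rows:
        mer = monomer_number(chrom_st, chrom_end)
        if live_only and "L" not in name:
            continue
        if mer < mer_filter:
            continue
        kept.append((name, mer))
    # Count how often each HOR name occurs among the kept rows.
    counts = Counter(name for name, _ in kept)
    if hor_filter:
        kept = [(n, m) for n, m in kept if counts[n] >= hor_filter]
    return kept


rows = [
    (0, 684, "S4CYH1L.44-1"),     # 684 / 171 = 4 monomers
    (684, 1368, "S4CYH1L.44-1"),  # 4 monomers
    (1368, 1539, "S4CYH1L.1"),    # 1 monomer, removed by mer_filter=2
    (1539, 2223, "S4CYH1d.3"),    # no "L" in name, removed by live_only
]
kept = filter_hors(rows)
kept_frequent = filter_hors(rows, hor_filter=3)
```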
def read_bed_identity(
    infile: str | TextIO,
    *,
    chrom: str | None = None,
    mode: str = "2D",
    colorscale: Colorscale | str | None = None,
    band_size: int = LocalSelfIdentTrackSettings.band_size,
    ignore_band_size=LocalSelfIdentTrackSettings.ignore_band_size,
) -> tuple[pl.DataFrame, Colorscale]:
    """
    Read a self, sequence identity BED file generated by `ModDotPlot`.

    Requires the following columns
    * `query,query_st,query_end,ref,ref_st,ref_end,percent_identity_by_events`

    # Args
    * `infile`
        * File or IO stream.
    * `chrom`
        * Chromosome name in `query` column to filter for.
    * `mode`
        * 1D or 2D self-identity.
    * `band_size`
        * Number of windows to calculate average sequence identity over. Only applicable if mode is 1D.
    * `ignore_band_size`
        * Number of windows ignored along self-identity diagonal. Only applicable if mode is 1D.

    # Returns
    * Coordinates of colored polygons in 2D space.
    """
    df = pl.read_csv(
        infile, separator="\t", has_header=False, new_columns=BED_SELF_IDENT_COLS
    )
    if chrom:
        df = df.filter(pl.col("query") == chrom)

    # Check mode. Set by dev not user.
    mode = Dim(mode)

    # Build expr to filter range of colors.
    color_expr = None
    rng_expr = None
    ident_colorscale = read_ident_colorscale(colorscale)
    for rng, color in ident_colorscale.items():
        if not isinstance(color_expr, pl.Expr):
            color_expr = pl.when(
                pl.col("percent_identity_by_events").is_between(rng[0], rng[1])
            ).then(pl.lit(color))
            rng_expr = pl.when(
                pl.col("percent_identity_by_events").is_between(rng[0], rng[1])
            ).then(pl.lit(f"{rng[0]}-{rng[1]}"))
        else:
            color_expr = color_expr.when(
                pl.col("percent_identity_by_events").is_between(rng[0], rng[1])
            ).then(pl.lit(color))
            rng_expr = rng_expr.when(
                pl.col("percent_identity_by_events").is_between(rng[0], rng[1])
            ).then(pl.lit(f"{rng[0]}-{rng[1]}"))

    if isinstance(color_expr, pl.Expr):
        color_expr = color_expr.otherwise(None)
    else:
        color_expr = pl.lit(None)
    if isinstance(rng_expr, pl.Expr):
        rng_expr = rng_expr.otherwise(None)
    else:
        rng_expr = pl.lit(None)

    if mode == Dim.ONE:
        df_window = (
            (df["query_end"] - df["query_st"])
            .value_counts(sort=True)
            .rename({"query_end": "window"})
        )
        if df_window.shape[0] > 1:
            logging.warning(f"Multiple windows detected. Taking largest.\n{df_window}")
        window = df_window.row(0, named=True)["window"] + 1
        df_local_ident = pl.DataFrame(
            convert_2D_to_1D_ident(df.iter_rows(), window, band_size, ignore_band_size),
            schema=[
                "chrom_st",
                "chrom_end",
                "percent_identity_by_events",
            ],
            orient="row",
        )
        query = df["query"][0]
        df_res = (
            df_local_ident.lazy()
            .with_columns(
                chrom=pl.lit(query),
                color=color_expr,
                name=rng_expr,
                score=pl.col("percent_identity_by_events"),
                strand=pl.lit("."),
                thick_st=pl.col("chrom_st"),
                thick_end=pl.col("chrom_end"),
                item_rgb=pl.lit("0,0,0"),
            )
            .select(*BED9_COLS, "color")
            .collect()
        )
    else:
        tri_side = math.sqrt(2) / 2
        df_res = (
            df.lazy()
            .with_columns(color=color_expr)
            # Get window size.
            .with_columns(
                window=(pl.col("query_end") - pl.col("query_st")).max().over("query")
            )
            .with_columns(
                first_pos=pl.col("query_st") // pl.col("window"),
                second_pos=pl.col("ref_st") // pl.col("window"),
            )
            # x y coords of diamond
            .with_columns(
                x=pl.col("first_pos") + pl.col("second_pos"),
                y=-pl.col("first_pos") + pl.col("second_pos"),
            )
            .with_columns(
                scale=(pl.col("query_st").max() / pl.col("x").max()).over("query"),
                group=pl.int_range(pl.len()).over("query"),
            )
            .with_columns(
                window=pl.col("window") / pl.col("scale"),
            )
            # Rather than generate new dfs. Add new x,y as arrays per row.
            .with_columns(
                new_x=[tri_side, 0.0, -tri_side, 0.0],
                new_y=[0.0, tri_side, 0.0, -tri_side],
            )
            # Rescale x and y.
            .with_columns(
                ((pl.col("new_x") * pl.col("window")) + pl.col("x")) * pl.col("scale"),
                ((pl.col("new_y") * pl.col("window")) + pl.col("y")) * pl.col("window"),
            )
            .select(
                "query",
                "new_x",
                "new_y",
                "color",
                "group",
                "percent_identity_by_events",
            )
            # arr to new rows
            .explode("new_x", "new_y")
            # Rename to filter later on.
            .rename({"query": "chrom", "new_x": "x", "new_y": "y"})
            .collect()
        )
    return df_res, ident_colorscale
Read a self, sequence identity BED file generated by ModDotPlot.
Requires the following columns
query,query_st,query_end,ref,ref_st,ref_end,percent_identity_by_events
Args
infile
- File or IO stream.
chrom
- Chromosome name in the query column to filter for.
mode
- 1D or 2D self-identity.
band_size
- Number of windows to calculate average sequence identity over. Only applicable if mode is 1D.
ignore_band_size
- Number of windows ignored along self-identity diagonal. Only applicable if mode is 1D.
Returns
- Coordinates of colored polygons in 2D space.
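The source builds a chained when/then expression so that each identity value takes the color of the first colorscale range that contains it, and no color otherwise. A plain-Python sketch of that lookup is below. It assumes the colorscale is a mapping from (start, end) tuples to color strings, which is how the source iterates over it; classify_identity and the sample ranges are hypothetical.

```python
def classify_identity(percent_identity, colorscale):
    """Return (color, range label) for the first range containing the value."""
    for (start, end), color in colorscale.items():
        # Both bounds are inclusive, matching polars' default is_between.
        if start <= percent_identity <= end:
            return color, f"{start}-{end}"
    # No range matched: mirrors `.otherwise(None)` in the source.
    return None, None


# Made-up colorscale for illustration only.
colorscale = {
    (0.0, 90.0): "blue",
    (90.0, 97.5): "orange",
    (97.5, 100.0): "red",
}
assigned = [classify_identity(v, colorscale) for v in (85.0, 90.0, 99.2, 101.0)]
```

Because the bounds are inclusive and the first match wins, a value that sits exactly on a shared boundary (90.0 here) is assigned to the earlier range.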
def read_bed_label(infile: str | TextIO, *, chrom: str | None = None) -> pl.DataFrame:
    """
    Read a BED9 file with no header.
    * Labels are ordered by length.

    # Args
    * `infile`
        * Input file or IO stream.
    * `chrom`
        * Chromosome in `chrom` column to filter for.

    # Returns
    * BED9 pl.DataFrame.
    """
    df_track = read_bed9(infile, chrom=chrom)

    # Order facets by descending length. This prevents larger annotations from blocking others.
    fct_name_order = (
        df_track.group_by(["name"])
        .agg(len=(pl.col("chrom_end") - pl.col("chrom_st")).sum())
        .sort(by="len", descending=True)
        .get_column("name")
    )
    return df_track.cast({"name": pl.Enum(fct_name_order)})
Read a BED9 file with no header.
- Labels are ordered by length.
Args
infile
- Input file or IO stream.
chrom
- Chromosome in the chrom column to filter for.
Returns
- BED9 pl.DataFrame.
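"Ordered by length" here means ordered by the total length covered by each label name, largest first, so that larger annotations do not hide smaller ones. The following standard-library sketch shows that ordering; order_names_by_length and the sample rows are hypothetical.

```python
from collections import defaultdict


def order_names_by_length(rows):
    """Order label names by their summed interval length, descending."""
    total = defaultdict(int)
    for chrom_st, chrom_end, name in rows:
        total[name] += chrom_end - chrom_st
    return sorted(total, key=total.get, reverse=True)


rows = [
    (0, 100, "ct"),
    (100, 5000, "asat"),
    (5000, 5200, "ct"),
    (5200, 5250, "gsat"),
]
order = order_names_by_length(rows)
```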
def read_one_cen_tracks(
    input_track: BinaryIO, *, chrom: str | None = None
) -> tuple[TrackList, PlotSettings]:
    """
    Read a `TOML` or `YAML` file of tracks to plot optionally filtering for a chrom name.

    Expected to have two items:
    * `[settings]`
        * See `cenplot.PlotSettings`
    * `[[tracks]]`
        * See one of the `cenplot.TrackSettings` for more details.

    Example:
    ```toml
    [settings]
    format = "png"
    transparent = true
    dim = [16.0, 8.0]
    dpi = 600
    ```

    ```yaml
    settings:
      format: "png"
      transparent: true
      dim: [16.0, 8.0]
      dpi: 600
    ```

    # Args:
    * input_track:
        * Input track `TOML` or `YAML` file.
    * chrom:
        * Chromosome name in 1st column (`chrom`) to filter for.
        * ex. `chr4`

    # Returns:
    * List of tracks w/contained chroms and plot settings.
    """
    all_tracks = []
    chroms = set()
    # Reset file position.
    input_track.seek(0)
    # Try TOML
    try:
        dict_settings = tomllib.load(input_track)
    except Exception:
        input_track.seek(0)
        # Then YAML
        try:
            dict_settings = yaml.safe_load(input_track)
        except Exception:
            raise TypeError("Invalid file type for settings.")

    settings: dict[str, Any] = dict_settings.get("settings", {})
    if settings.get("dim"):
        settings["dim"] = tuple(settings["dim"])

    for track_info in dict_settings.get("tracks", []):
        for track in read_one_track_info(track_info, chrom=chrom):
            all_tracks.append(track)
            # Tracks legend and position have no data.
            if not isinstance(track.data, pl.DataFrame):
                continue
            chroms.update(track.data["chrom"])
    tracklist = TrackList(all_tracks, chroms)

    _, min_st_pos = get_min_max_track(all_tracks, typ="min")
    _, max_end_pos = get_min_max_track(all_tracks, typ="max", default_col="chrom_end")
    if settings.get("xlim"):
        settings["xlim"] = tuple(settings["xlim"])
    else:
        settings["xlim"] = (min_st_pos, max_end_pos)

    plot_settings = PlotSettings(**settings)
    return tracklist, plot_settings
Read a TOML or YAML file of tracks to plot, optionally filtering for a chrom name.
Expected to have two items:
[settings]
- See cenplot.PlotSettings
[[tracks]]
- See one of the cenplot.TrackSettings for more details.
Example:
[settings]
format = "png"
transparent = true
dim = [16.0, 8.0]
dpi = 600
settings:
  format: "png"
  transparent: true
  dim: [16.0, 8.0]
  dpi: 600
Args:
- input_track:
- Input track TOML or YAML file.
- chrom:
- Chromosome name in 1st column (chrom) to filter for.
- ex. chr4
Returns:
- List of tracks w/contained chroms and plot settings.
class Track(NamedTuple):
    """
    A centromere track.
    """

    title: str | None
    """
    Title of track.
    * ex. "{chrom}"
    * ex. "HOR monomers"
    """
    pos: TrackPosition
    """
    Track position.
    """
    opt: TrackType
    """
    Track option.
    """
    prop: float
    """
    Proportion of track in final figure.
    """
    data: pl.DataFrame
    """
    Track data.
    """
    options: TrackSettings  # type: ignore
    """
    Plot settings.
    """
A centromere track.
Create new instance of Track(title, pos, opt, prop, data, options)
Plot settings.
class TrackType(StrEnum):
    """
    Track options.
    * Input track data is expected to be headerless.
    """

    HOR = auto()
    """
    An alpha-satellite higher order repeat (HOR) track with HORs by monomer number overlapping.

    Expected format:
    * [`BED9`](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
        * `name` as HOR variant
            * ex. `S4CYH1L.44-1`
    """
    HORSplit = auto()
    """
    A split alpha-satellite higher order repeat (HOR) track with each type of HOR as a single track.
    * `mer` or the number of monomers within the HOR.
    * `hor` or HOR variant.

    Expected format:
    * [`BED9`](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
        * `name` as HOR variant
            * ex. `S4CYH1L.44-1`
    """
    HOROrt = auto()
    """
    An alpha-satellite higher order repeat (HOR) orientation track.
    * This is calculated with default settings via the [`censtats`](https://github.com/logsdon-lab/CenStats) library.

    Expected format:
    * [`BED9`](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
        * `name` as HOR variant
            * ex. `S4CYH1L.44-1`
        * `strand` as `+` or `-`
    """
    Label = auto()
    """
    A label track. Elements in the `name` column are displayed as colored rectangles.

    Expected format:
    * [`BED4-9`](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
        * `name` as any string value.
    """
    Bar = auto()
    """
    A bar plot track. Elements in the `name` column are displayed as bars.

    Expected format:
    * `BED9`
        * `name` as any numeric value.
    """

    Line = auto()
    """
    A line plot track.

    Expected format:
    * `BED9`
        * `name` as any numeric value.
    """

    SelfIdent = auto()
    """
    A self, sequence identity heatmap track displayed as a triangle.
    * Similar to plots from [`ModDotPlot`](https://github.com/marbl/ModDotPlot)

    Expected format:
    * `BEDPE*`
        * Paired identity bedfile produced by `ModDotPlot` without a header.

    |query|query_st|query_end|reference|reference_st|reference_end|percent_identity_by_events|
    |-|-|-|-|-|-|-|
    |x|1|5000|x|1|5000|100.0|

    """
    LocalSelfIdent = auto()
    """
    A self, sequence identity track showing local identity.
    * Derived from [`ModDotPlot`](https://github.com/marbl/ModDotPlot)

    Expected format:
    * `BEDPE*`
        * Paired identity bedfile produced by `ModDotPlot` without a header.

    |query|query_st|query_end|reference|reference_st|reference_end|percent_identity_by_events|
    |-|-|-|-|-|-|-|
    |x|1|5000|x|1|5000|100.0|
    """

    Strand = auto()
    """
    Strand track.

    Expected format:
    * `BED9`
        * `strand` as either `+` or `-`
    """

    Position = auto()
    """
    Position track.
    * Displays the x-axis position as well as a label.

    Expected format:
    * None
    """

    Legend = auto()
    """
    Legend track. Displays the legend of a specified track.
    * NOTE: This does not work with `TrackType.HORSplit`

    Expected format:
    * None
    """

    Spacer = auto()
    """
    Spacer track. Empty space.

    Expected format:
    * None
    """
Track options.
- Input track data is expected to be headerless.
A self, sequence identity heatmap track displayed as a triangle.
- Similar to plots from ModDotPlot
Expected format:
BEDPE*
- Paired identity bedfile produced by ModDotPlot without a header.
query | query_st | query_end | reference | reference_st | reference_end | percent_identity_by_events |
---|---|---|---|---|---|---|
x | 1 | 5000 | x | 1 | 5000 | 100.0 |
A self, sequence identity track showing local identity.
- Derived from ModDotPlot
Expected format:
BEDPE*
- Paired identity bedfile produced by ModDotPlot without a header.
query | query_st | query_end | reference | reference_st | reference_end | percent_identity_by_events |
---|---|---|---|---|---|---|
x | 1 | 5000 | x | 1 | 5000 | 100.0 |
Position track.
- Displays the x-axis position as well as a label.
Expected format:
- None
Legend track. Displays the legend of a specified track.
- NOTE: This does not work with TrackType.HORSplit
Expected format:
- None
class TrackList(NamedTuple):
    """
    Track list.
    """

    tracks: list[Track]
    """
    Tracks.
    """
    chroms: set[str]
    """
    Chromosomes found with `tracks`.
    """
Track list.
@dataclass
class PlotSettings:
    """
    Plot settings for a single plot.
    """

    title: str | None = None
    """
    Figure title.

    Can use "{chrom}" to replace with chrom name.
    """

    title_x: float | None = 0.02
    """
    Figure title x position.
    """

    title_y: float | None = None
    """
    Figure title y position.
    """

    title_fontsize: float | str = "xx-large"
    """
    Figure title fontsize.
    """

    title_horizontalalignment: str = "left"
    """
    Figure title position.
    """

    format: list[OutputFormat] | OutputFormat = "png"
    """
    Output format(s). Either `"pdf"`, `"png"`, or `"svg"`.
    """
    transparent: bool = True
    """
    Output a transparent image.
    """
    dim: tuple[float, float] = (20.0, 12.0)
    """
    The dimensions of each plot.
    """
    dpi: int = 600
    """
    Set the plot DPI per plot.
    """
    layout: str = "tight"
    """
    Layout engine option for matplotlib. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure.
    """
    legend_pos: LegendPosition = LegendPosition.Right
    """
    Legend position as `LegendPosition`. Either `LegendPosition.Right` or `LegendPosition.Left`.
    """
    legend_prop: float = 0.2
    """
    Legend proportion of plot.
    """
    axis_h_pad: float = 0.2
    """
    Apply a height padding to each axis.
    """
    xlim: tuple[int, int] | None = None
    """
    Set x-axis limit across all plots.
    * `None` - Use the min and max position across all tracks.
    * `tuple[float, float]` - Use provided coordinates as min and max position.
    """
Plot settings for a single plot.
Output format(s). Either "pdf", "png", or "svg".
Layout engine option for matplotlib. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure.
Legend position as LegendPosition. Either LegendPosition.Right or LegendPosition.Left.
@dataclass
class SelfIdentTrackSettings(DefaultTrackSettings):
    """
    Self-identity heatmap triangle plot options.
    """

    invert: bool = True
    """
    Invert the self identity triangle.
    """
    legend_bins: int = 300
    """
    Number of bins for `percent_identity_by_events` in the legend.
    """
    legend_xmin: float = 70.0
    """
    Legend x-min coordinate. Used to constrain x-axis limits.
    """
    legend_asp_ratio: float | None = 1.0
    """
    Aspect ratio of legend. If `None`, takes up entire axis.
    """
    colorscale: Colorscale | str | None = None
    """
    Colorscale for identity as TSV file.
    * Format: `[start, end, color]`
        * Color is a `str` representing a color name or hexcode.
        * See https://matplotlib.org/stable/users/explain/colors/colors.html
    * ex. `0\t90\tblue`
    """
Self-identity heatmap triangle plot options.
Colorscale for identity as TSV file.
- Format: [start, end, color]
- Color is a str representing a color name or hexcode.
- See https://matplotlib.org/stable/users/explain/colors/colors.html
- ex. 0 90 blue (tab-separated)
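A colorscale file in this format is three tab-separated columns per line: range start, range end, and a color name or hex code. One way to parse it with the standard library is sketched below. This is not the library's own reader (the source refers to read_ident_colorscale, whose implementation is not shown here); parse_colorscale is a hypothetical helper, and the resulting mapping of (start, end) tuples to colors is an assumption.

```python
import io


def parse_colorscale(handle):
    """Parse `start<TAB>end<TAB>color` lines into {(start, end): color}."""
    colorscale = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        start, end, color = line.split("\t")
        colorscale[(float(start), float(end))] = color
    return colorscale


tsv = "0\t90\tblue\n90\t100\t#ff0000\n"
colorscale = parse_colorscale(io.StringIO(tsv))
```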
@dataclass
class LocalSelfIdentTrackSettings(LabelTrackSettings):
    """
    Local self-identity plot options.
    """

    colorscale: Colorscale | str | None = None
    """
    Colorscale for identity as TSV file.
    * Format: `[start, end, color]`
        * Color is a `str` representing a color name or hexcode.
        * See https://matplotlib.org/stable/users/explain/colors/colors.html
    * ex. `0\t90\tblue`
    """
    band_size: int = 5
    """
    Number of windows to calculate average sequence identity over.
    """
    ignore_band_size: int = 2
    """
    Number of windows ignored along self-identity diagonal.
    """
Local self-identity plot options.
Colorscale for identity as TSV file.
- Format: [start, end, color]
- Color is a str representing a color name or hexcode.
- See https://matplotlib.org/stable/users/explain/colors/colors.html
- ex. 0 90 blue (tab-separated)
@dataclass
class StrandTrackSettings(DefaultTrackSettings):
    """
    Strand arrow plot options.
    """

    DEF_COLOR = "black"
    """
    Default color for arrows.
    """
    scale: float = 50
    """
    Scale arrow attributes by this factor as well as length.
    """
    fwd_color: str | None = None
    """
    Color of `+` arrows.
    """
    rev_color: str | None = None
    """
    Color of `-` arrows.
    """
    use_item_rgb: bool = False
    """
    Use `item_rgb` column if provided. Otherwise, use `fwd_color` and `rev_color`.
    """
Strand arrow plot options.
@dataclass
class HORTrackSettings(DefaultTrackSettings):
    """
    Higher order repeat plot options.
    """

    sort_order: str = "descending"
    """
    Plot HORs by `{mode}` in `{sort_order}` order.

    Either:
    * `ascending`
    * `descending`
    * Or a path to a single column file specifying the order of elements of `mode`.

    Mode:
    * If `{mer}`, sort by `mer` number
    * If `{hor}`, sort by `hor` frequency.
    """
    mode: Literal["mer", "hor"] = "mer"
    """
    Plot HORs with `mer` or `hor`.
    """
    live_only: bool = True
    """
    Only plot live HORs. Filters only for rows with `L` character in `name` column.
    """
    mer_size: int = 171
    """
    Monomer size to calculate number of monomers for mer_filter.
    """
    mer_filter: int = 2
    """
    Filter HORs that have less than `mer_filter` monomers.
    """
    hor_filter: int = 5
    """
    Filter HORs that occur less than `hor_filter` times.
    """
    color_map_file: str | None = None
    """
    Monomer color map TSV file. Two column headerless file that has `mode` to `color` mapping.
    """
    use_item_rgb: bool = False
    """
    Use `item_rgb` column for color. If omitted, use default mode color map or `color_map`.
    """
    split_prop: bool = False
    """
    If split, divide proportion evenly across each split track.
    """
    split_top_n: int | None = None
    """
    If split, show top n HORs for a given mode.
    """
    bg_border: bool = False
    """
    Add black border containing all added labels.
    """
    bg_color: str | None = None
    """
    Background color for track.
    """
Higher order repeat plot options.
Plot HORs by {mode} in {sort_order} order.
Either:
- ascending
- descending
- Or a path to a single column file specifying the order of elements of mode.
Mode:
- If {mer}, sort by mer number.
- If {hor}, sort by hor frequency.
Monomer color map TSV file. Two column headerless file that has mode to color mapping.
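When sort_order is a path to a single-column file, the listed elements come first in the given order, and any elements not listed are appended so that every value is still covered. The sketch below shows that resolution in plain Python. resolve_sort_order is a hypothetical helper, and the leftover elements are sorted here only to make the result deterministic, which is an assumption rather than the library's documented behavior.

```python
def resolve_sort_order(values, defined_order=None, descending=True):
    """Return the order in which unique values should be plotted."""
    unique = set(values)
    if defined_order:
        # Listed elements first, then whatever the file did not mention.
        remaining = sorted(unique.difference(defined_order))
        return list(defined_order) + remaining
    # No file given: fall back to plain ascending or descending order.
    return sorted(unique, reverse=descending)


mers = [4, 2, 7, 4, 12]
default_order = resolve_sort_order(mers)
ascending_order = resolve_sort_order(mers, descending=False)
custom_order = resolve_sort_order(mers, defined_order=[7, 4])
```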
@dataclass
class HOROrtTrackSettings(StrandTrackSettings):
    """
    Higher order repeat orientation arrow plot options.
    """

    live_only: bool = True
    """
    Only plot live HORs.
    """
    mer_filter: int = 2
    """
    Filter HORs that have at least 2 monomers.
    """
    bp_merge_units: int | None = 256
    """
    Merge HOR units into HOR blocks within this number of base pairs.
    """
    bp_merge_blks: int | None = 8000
    """
    Merge HOR blocks into HOR arrays within this number of base pairs.
    """
    min_blk_hor_units: int | None = 2
    """
    Grouped stv rows must have at least `n` HOR units unbroken.
    """
    min_arr_hor_units: int | None = 10
    """
    Require that an HOR array have at least `n` HOR units.
    """
    min_arr_len: int | None = 30_000
    """
    Require that an HOR array is this size in bp.
    """
    min_arr_prop: float | None = 0.9
    """
    Require that an HOR array has at least this proportion of HORs by length.
    """
Higher order repeat orientation arrow plot options.
@dataclass
class BarTrackSettings(DefaultTrackSettings):
    """
    Bar plot options.
    """

    DEF_COLOR = "black"
    """
    Default color for bar plot.
    """

    color: str | None = None
    """
    Color of bars. If `None`, uses `item_rgb` column colors.
    """

    alpha: float = 1.0
    """
    Alpha of bars.
    """

    ymin: int = 0
    """
    Minimum y-value.
    """

    ymax: int | None = None
    """
    Maximum y-value.
    """

    label: str | None = None
    """
    Label to add to legend.
    """
Bar plot options.
@dataclass
class LineTrackSettings(BarTrackSettings):
    """
    Line plot options.
    """

    position: Literal["start", "midpoint"] = "start"
    """
    Draw position at start or midpoint of interval.
    """
    fill: bool = False
    """
    Fill under line.
    """
    linestyle: str = "solid"
    """
    Line style. See https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html.
    """
    linewidth: int | None = None
    """
    Line width.
    """
    marker: str | None = None
    """
    Marker shape. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers.
    """
    markersize: int | None = None
    """
    Marker size.
    """
    log_scale: bool = False
    """
    Use log-scale for plot.
    """
Line plot options.
Marker shape. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers.
@dataclass
class LabelTrackSettings(DefaultTrackSettings):
    """
    Label plot options.
    """

    DEF_COLOR = "black"
    """
    Default color for label.
    """

    color: str | None = None
    """
    Label color. Used if no color is provided in `item_rgb` column.
    """

    use_item_rgb: bool = True
    """
    Use `item_rgb` column if provided. Otherwise, generate a random color for each value in column `name`.
    """

    alpha: float = 1.0
    """
    Label alpha.
    """

    shape: Literal["rect", "tri"] = "rect"
    """
    Shape to draw.
    * `"tri"` Always pointed down.
    """

    edgecolor: str | None = None
    """
    Edge color for each label.
    """

    bg_border: bool = False
    """
    Add black border containing all added labels.
    """
Label plot options.
@dataclass
class LegendTrackSettings(DefaultTrackSettings):
    index: int | None = None
    """
    Index of plot to get legend of.
    """