Axes & Legends
Now that we know how to create, configure, compose, and link scatter plots, it's time to learn about axes and legends, which are often essential for making sense of the visualized data.
Axes
You might have noticed already that axes are drawn by default. For example:
import jscatter
from numpy.random import rand
scatter = jscatter.Scatter(x=rand(500), y=rand(500))
scatter.show()
INFO
You can hide the axes via scatter.axes(False) when they are not informative, as is the case for t-SNE or UMAP embeddings.
You can also enable a grid, which makes it easier to locate points.
scatter.axes(grid=True)
Finally, you can label the axes:
scatter.axes(labels=['Speed (km/h)', 'Weight (tons)'])
Legend
When you encode data properties with the point color, opacity, or size, it's immensely helpful to show a legend that relates the visual properties to the underlying data.
import jscatter
import numpy as np
import pandas as pd
df = pd.DataFrame({
    # Random floats
    "mass": np.random.rand(500),
    "speed": np.random.rand(500),
    "pval": np.random.rand(500),
    # Gaussian-distributed floats
    "effect_size": np.random.normal(.5, .2, 500),
    # Random letters A, B, C, D, E, F, G, H (round(x * 7) yields 0 to 7)
    "cat": np.vectorize(lambda x: chr(65 + round(x * 7)))(np.random.rand(500)),
    # Random letters X, Y, Z
    "group": np.vectorize(lambda x: chr(88 + round(x * 2)))(np.random.rand(500)),
})
scatter = jscatter.Scatter(
    data=df,
    x="mass",
    y="speed",
    color_by="cat",
    size_by="pval",
    legend=True,
)
scatter.show()
When you encode a categorical data property (like cat) using color, Jupyter Scatter will list each category in the legend. In contrast, for continuous data properties (like pval), only five values are shown in the legend: the minimum, the maximum, and three equally spaced values in between.
scatter.color(by="pval").opacity(by="cat").size(5)
Notice how the legend now shows only five entries for color, as it encodes a continuous variable.
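The two legend behaviors described above can be sketched in plain Python. This is only an illustration of the described behavior, not Jupyter Scatter's actual implementation: the helper name `legend_entries`, its `n` parameter, and the sorting of categories are assumptions made for this example.

```python
from numbers import Number


def legend_entries(values, n=5):
    """Hypothetical helper mimicking the legend behavior described above.

    Categorical data: one entry per distinct category.
    Continuous data: n equally spaced values from the minimum to the maximum.
    """
    values = list(values)
    if all(isinstance(v, Number) for v in values):
        lo, hi = min(values), max(values)
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]
    # Sorting is an assumption made here to get a deterministic result
    return sorted(set(values))


print(legend_entries(["B", "A", "C", "A", "B"]))  # ['A', 'B', 'C']
print(legend_entries([0.0, 0.4, 1.0, 0.7]))       # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Under this reading, a categorical property with many distinct values produces a correspondingly long legend, whereas a continuous property always produces the same compact set of five entries.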
Beyond showing how data values map to visual properties, Jupyter Scatter can also label continuous properties.
scatter.color(labeling={
    "variable": "p-value",
    "minValue": "significant",
    "maxValue": "insignificant",
})
Annotating a numerical range like this makes the color mapping easier to grasp, both for yourself and, above all, for collaborators and other readers.