Selections
A primary way to interact with a scatter plot is through selections. Jupyter Scatter makes this easy by offering selections that are synchronized between the Python kernel and the JavaScript front end.
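There are two ways to select points in a scatter plot: programmatically via `scatter.selection()`, or interactively with the lasso tool.
Similarly, to act upon selected data points, you can either retrieve them via `scatter.selection()` or observe the scatter widget's `selection` property to react to changes automatically.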
To demonstrate how these approaches work, we're going to use the following DataFrame of random values:
```python
import jscatter
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # Random floats
    "mass": np.random.rand(500),
    "speed": np.random.rand(500),
    "pval": np.random.rand(500),
    # Random letters A, B, C, D, E, F, G, H (int() floors x * 8 to 0..7)
    "cat": np.vectorize(lambda x: chr(65 + int(x * 8)))(np.random.rand(500)),
})

scatter = jscatter.Scatter(data=df, x="mass", y="speed", color_by="cat")
scatter.show()
```
Programmatically Select Points
To select a specific set of points, use the `scatter.selection()` method, which accepts a list of point indices as input:
```python
scatter.selection([1, 2, 3])
```
With the help of pandas' `query()` method, we can easily select points matching some criteria:
```python
scatter.selection(df.query("cat == 'A'").index)
```
The call above, for instance, selects all points that belong to category `A`.
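Query strings can also combine several conditions. The sketch below uses a small hand-made DataFrame with the same columns as above so the result is predictable; the helper name `select_indices` and the 0.05 threshold are illustrative choices, not part of Jupyter Scatter.

```python
import pandas as pd

# Hypothetical helper (not a Jupyter Scatter API): return the index
# labels of all rows matching a pandas query string
def select_indices(df: pd.DataFrame, expr: str) -> list:
    return df.query(expr).index.tolist()

df = pd.DataFrame({
    "mass": [0.10, 0.50, 0.90, 0.30],
    "speed": [0.20, 0.40, 0.60, 0.80],
    "pval": [0.01, 0.20, 0.03, 0.04],
    "cat": ["A", "A", "B", "A"],
})

# Combine conditions with `and`; string literals need inner quotes
idx = select_indices(df, "cat == 'A' and pval < 0.05")
print(idx)  # [0, 3]
```

In a notebook, you would then pass the result on via `scatter.selection(idx)`.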
Note: By default, Jupyter Scatter references points by their range index, meaning that `scatter.selection([0, 1, 2])` selects the first, second, and third point.
Alternatively, if you bind a DataFrame to a `Scatter` instance via `Scatter(data=df)` and your DataFrame has a custom index, you can make the `Scatter` instance reference data points by the DataFrame's index via `Scatter(data=df, data_use_index=True)`.
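If you keep the default range-index referencing but your DataFrame carries a custom index, the values you pass to `scatter.selection()` need to be row positions rather than index labels. A minimal sketch of converting between the two with standard pandas and NumPy calls; the `sample_id` labels below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a custom (non-range) index
df = pd.DataFrame(
    {"mass": [0.2, 0.7, 0.9, 0.4], "cat": ["A", "B", "A", "C"]},
    index=pd.Index(["s10", "s11", "s12", "s13"], name="sample_id"),
)

# Index labels of the rows we care about (e.g., from a query)
labels = df.query("cat == 'A'").index

# Convert labels to 0-based row positions, which is what
# range-index referencing expects
positions = df.index.get_indexer(labels)

# Alternatively, build positions directly from a boolean mask
mask_positions = np.flatnonzero(df["mass"].to_numpy() > 0.5)

print(positions.tolist())       # [0, 2]
print(mask_positions.tolist())  # [1, 2]
```

Going the other way, `df.iloc[positions]` retrieves rows by position, whereas `df.loc[labels]` retrieves them by label. With `data_use_index=True`, you would pass the labels instead.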
Lasso Select Points
As you might have seen already in the interactions guide, we can also select points interactively using the lasso tool.
Get Selected Points
Importantly, once you have selected some points, you can retrieve the selection using the same `scatter.selection()` method that we used earlier. This time, just call it without any arguments:
```python
scatter.selection()
# => e.g., [0, 1, 2]
```
This returns the indices of the selected points.
If you have bound a DataFrame to the scatter instance, you can use these indices to retrieve the original data records.
```python
scatter.selection(df.query("cat == 'A'").index)
df.loc[scatter.selection()]
```
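Since the data is random, your values will differ, but the output looks something like this:

|      | mass | speed | pval | cat |
|------|------|-------|------|-----|
| 0    | 0.13 | 0.27  | 0.51 | A   |
| 42   | 0.87 | 0.93  | 0.80 | A   |
| …    | …    | …     | …    | …   |
| 1337 | 0.10 | 0.25  | 0.25 | A   |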
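Once you have the selected indices, a common next step is to compare the selection against the remaining points. The following sketch assumes the default range index (so indices equal row positions) and uses a hard-coded `selected` list as a stand-in for the value `scatter.selection()` would return:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mass": [0.1, 0.4, 0.8, 0.6, 0.2],
    "pval": [0.01, 0.50, 0.02, 0.90, 0.30],
    "cat": ["A", "B", "A", "C", "B"],
})

# Stand-in for scatter.selection(); indices are row positions here
selected = [0, 2]

# Boolean mask marking the selected rows
mask = np.zeros(len(df), dtype=bool)
mask[selected] = True

# Label each row and average the numeric columns per group
groups = np.where(mask, "selected", "other")
summary = df[["mass", "pval"]].groupby(groups).mean()
print(summary)
```

The mask-based approach keeps the comparison independent of how the DataFrame is sorted, since it only relies on row positions.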
Observe Selected Points
The real magic happens when you react to selections automatically. You can do this by observing the scatter widget's `selection` property:
```python
import ipywidgets
import jscatter
import numpy as np
import pandas as pd
from IPython.display import display

df = pd.DataFrame({
    # Random floats
    "mass": np.random.rand(500),
    "speed": np.random.rand(500),
    "pval": np.random.rand(500),
    # Random letters A, B, C, D, E, F, G, H (int() floors x * 8 to 0..7)
    "cat": np.vectorize(lambda x: chr(65 + int(x * 8)))(np.random.rand(500)),
})

scatter = jscatter.Scatter(data=df, x="mass", y="speed", color_by="cat")

# Output widget that displays the currently selected rows
output = ipywidgets.Output()

@output.capture(clear_output=True)
def selection_change_handler(change):
    # change.new holds the indices of the newly selected points
    display(df.loc[change.new].style.hide(axis='index'))

# Call the handler whenever the widget's selection changes
scatter.widget.observe(selection_change_handler, names=["selection"])

ipywidgets.HBox([scatter.show(), output])
```
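The `observe()` call above comes from traitlets, the library ipywidgets builds on: the handler receives a change object whose `new` attribute holds the updated value. The sketch below mimics that mechanism with a hypothetical stand-in class (`FakeScatterWidget`, not part of Jupyter Scatter) so the handler logic can be exercised without a notebook front end. It models `selection` as a plain list trait for simplicity; the real widget's trait type may differ.

```python
import pandas as pd
from traitlets import HasTraits, List

df = pd.DataFrame({"mass": [0.2, 0.7, 0.9], "cat": ["A", "B", "A"]})

# Hypothetical stand-in for scatter.widget with a single list-valued trait
class FakeScatterWidget(HasTraits):
    selection = List()

log = []

def selection_change_handler(change):
    # change.new holds the new selection (row indices)
    selected = df.loc[change.new]
    log.append(selected["cat"].tolist())

widget = FakeScatterWidget()
widget.observe(selection_change_handler, names=["selection"])

# Assigning a new value triggers the handler, much like a lasso selection
widget.selection = [0, 2]
widget.selection = [1]
print(log)  # [['A', 'A'], ['B']]
```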
If you want to learn how point selections can help you explore large-scale datasets, check out our in-depth talk and tutorial from SciPy '23.