docs: MVP plotly-express docs #554

Merged
merged 39 commits into from
Jul 29, 2024
Changes from 1 commit
Commits
d124531
Docs
alexpeters1208 Jun 11, 2024
af94f9f
Continue docs
alexpeters1208 Jun 12, 2024
6907460
More docs
alexpeters1208 Jun 13, 2024
8b56f55
Add polar and ternary examples
alexpeters1208 Jun 18, 2024
2fbf7b2
Add multiple-axes
alexpeters1208 Jun 20, 2024
03c067b
Start re-wording
alexpeters1208 Jun 20, 2024
275aa7d
Fix spacing
alexpeters1208 Jun 20, 2024
50ee789
Start "what are they useful for"
alexpeters1208 Jun 20, 2024
566fadc
Small language changes
alexpeters1208 Jun 20, 2024
6d15404
When are they appropriate
alexpeters1208 Jun 21, 2024
ad907ba
Update ecdf
alexpeters1208 Jun 21, 2024
1e21a4d
Merge branch 'main' into dx-min-docs
alexpeters1208 Jun 21, 2024
b615e3e
Update notes and warnings
alexpeters1208 Jun 21, 2024
1ad4acd
Simplify "what are they useful for"
alexpeters1208 Jun 21, 2024
d872a90
Don Area suggestion
alexpeters1208 Jun 24, 2024
d39b2b7
More review suggestions
alexpeters1208 Jun 28, 2024
643003a
Funnel plot
alexpeters1208 Jul 2, 2024
6eb71fb
Funnel, funnel area, timeline
alexpeters1208 Jul 3, 2024
9ec22f5
Merge branch 'main' into dx-min-docs
alexpeters1208 Jul 9, 2024
ab8ebf8
Start plot by
alexpeters1208 Jul 10, 2024
19dd7b3
More plot by
alexpeters1208 Jul 10, 2024
0cb05ce
Apply suggestions from code review
alexpeters1208 Jul 15, 2024
143249e
See how area plot renders
alexpeters1208 Jul 15, 2024
f9436bf
Check rendering
alexpeters1208 Jul 15, 2024
a47adfe
More tidying
alexpeters1208 Jul 16, 2024
d6638b6
More plot by
alexpeters1208 Jul 16, 2024
86c1b73
First round of revisions from Chip
alexpeters1208 Jul 24, 2024
4e4cc91
Scatter progress
alexpeters1208 Jul 25, 2024
6ef84bc
Merge branch 'main' into dx-min-docs
alexpeters1208 Jul 25, 2024
b4a186f
Revise scatter
alexpeters1208 Jul 25, 2024
ab8f2c4
More polish
alexpeters1208 Jul 25, 2024
778246e
More polish, Don review, add density heatmap
alexpeters1208 Jul 25, 2024
b5665a5
Pascal case
alexpeters1208 Jul 26, 2024
a562b64
Links
alexpeters1208 Jul 26, 2024
a432741
Revise concept pieces
alexpeters1208 Jul 26, 2024
c408bf2
Deterministic large datasets
alexpeters1208 Jul 26, 2024
f3c7ebe
Chip review
alexpeters1208 Jul 26, 2024
83fcb87
Chip and Joe suggestions
alexpeters1208 Jul 26, 2024
6dc0ed6
Move density heatmap up
alexpeters1208 Jul 29, 2024
2 changes: 1 addition & 1 deletion plugins/plotly-express/docs/area.md
@@ -26,7 +26,7 @@ usa_population = gapminder.where("Country == `United States`")
area_plot = dx.area(usa_population, x="Year", y="Pop")
```

-### Color by group
+### Area by group

Area plots are unique in that the y-axis demonstrates each group's total contribution to the whole. Pass the name of the grouping column(s) to the `by` argument.
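
The "contribution to the whole" idea can be illustrated without a plotting library. The following is a minimal sketch, assuming the bands are stacked in group order; the group names and values are hypothetical and the helper `stack` is not part of `deephaven.plot.express`.

```python
# Hypothetical yearly values for three groups (illustrative only)
series = {
    "A": [1, 2, 3],
    "B": [2, 2, 2],
    "C": [4, 1, 0],
}


def stack(series):
    """Return the upper boundary of each group's band in a stacked area plot."""
    running = [0] * len(next(iter(series.values())))
    tops = {}
    for name, values in series.items():
        # each band sits on top of the running total of the bands before it
        running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


tops = stack(series)
for name, boundary in tops.items():
    print(name, boundary)
```

The boundary of the last group equals the total across all groups at each x value, which is why the height of each band reads as that group's share of the whole.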

6 changes: 4 additions & 2 deletions plugins/plotly-express/docs/box.md
@@ -16,11 +16,13 @@ Box plots are appropriate when the data have a continuous variable of interest.

Visualize the distribution of a single variable by passing the column name to `x` or `y`.

-```python order=box_plot,tips
+```python order=box_plot_x,box_plot_y,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

-box_plot = dx.box(tips, y="TotalBill")
+# control the plot orientation using `x` or `y`
+box_plot_x = dx.box(tips, x="TotalBill")
+box_plot_y = dx.box(tips, y="TotalBill")
```

### Distributions for multiple groups
6 changes: 4 additions & 2 deletions plugins/plotly-express/docs/histogram.md
@@ -16,14 +16,16 @@ Histograms are appropriate when the data contain a continuous variable of intere

Visualize the distribution of a single variable by passing the column name to the `x` or `y` arguments.

-```python order=hist_plot,setosa,iris
+```python order=hist_plot_x,hist_plot_y,setosa,iris
import deephaven.plot.express as dx
iris = dx.data.iris()

# subset to get specific species
setosa = iris.where("Species == `setosa`")

-hist_plot = dx.histogram(setosa, x="SepalLength")
+# control the plot orientation using `x` or `y`
+hist_plot_x = dx.histogram(setosa, x="SepalLength")
+hist_plot_y = dx.histogram(setosa, y="SepalLength")
```

Modify the bin size by setting `nbins` equal to the number of desired bins.
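
To show how the number of bins determines the bin size, here is a minimal pure-Python sketch. It assumes equal-width bins spanning the range of the data, which may not match exactly how the library chooses bin edges; `equal_width_counts` is a hypothetical helper, not a Deephaven function.

```python
def equal_width_counts(values, nbins):
    """Count values in `nbins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * nbins
    if hi == lo:
        # degenerate case: every value is identical, so use the first bin
        counts[0] = len(values)
        return counts
    width = (hi - lo) / nbins
    for v in values:
        # the maximum value is placed in the last bin instead of overflowing
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts


data = list(range(11))  # hypothetical data: 0 through 10
coarse = equal_width_counts(data, 2)
fine = equal_width_counts(data, 5)
print(coarse, fine)
```

Fewer bins give wider bins and a smoother picture; more bins give narrower bins and more detail, at the cost of noisier counts.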
2 changes: 1 addition & 1 deletion plugins/plotly-express/docs/line.md
@@ -26,7 +26,7 @@ dog_prices = my_table.where("Sym = `DOG`")
line_plot = dx.line(dog_prices, x="Timestamp", y="Price")
```

-### Color line plot by group
+### Line by group

Create a line with a unique color for each group in the dataset by passing the grouping column name to the `by` argument.

21 changes: 17 additions & 4 deletions plugins/plotly-express/docs/multiple-axes.md
@@ -10,22 +10,35 @@ When two or more response variables appear in separate columns, pass their colum

```python order=line_plot_multi,brazil,gapminder
import deephaven.plot.express as dx
-gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
+gapminder = dx.data.gapminder()

# get a specific country
brazil = gapminder.where("Country == `Brazil`")

-# specify multiple y-axis columns and order axes left to right with yaxis_sequence
+# specify multiple y-axis columns and split axes with yaxis_sequence
line_plot_multi = dx.line(brazil, x="Year", y=["Pop", "GdpPerCap"], yaxis_sequence=[1, 2])
```

+If `xaxis_sequence` or `yaxis_sequence` are not specified, the series will share an axis, which may or may not be useful depending on the units and scale of the data.
+
+```python order=line_plot_shared,brazil,gapminder
+import deephaven.plot.express as dx
+gapminder = dx.data.gapminder()
+
+# get a specific country
+brazil = gapminder.where("Country == `Brazil`")
+
+# population and per capita gdp have very different scales and units
+line_plot_shared = dx.line(brazil, x="Year", y=["Pop", "GdpPerCap"])
+```
+
### Use `by` with multiple axes

When a single response variable has observations from several groups of data, use the `by` parameter to specify the grouping column.

```python order=line_plot_by,cat_dog,stocks
import deephaven.plot.express as dx
-stocks = dx.data.stocks() # import the example stock market data set
+stocks = dx.data.stocks()

# subset to get two symbols
cat_dog = stocks.where("Sym in `CAT`, `DOG`")
@@ -40,7 +53,7 @@ Finally, plots can be layered to achieve multiple axes. Use the `dx.layer` funct

```python order=line_plot_layered,fish,bird,stocks
import deephaven.plot.express as dx
-stocks = dx.data.stocks() # import the example stock market data set
+stocks = dx.data.stocks()

# subset to get two tables with a shared x-axis
fish = stocks.where("Sym == `FISH`")
2 changes: 2 additions & 0 deletions plugins/plotly-express/docs/plot-by.md
@@ -2,6 +2,8 @@

To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names denoting other variables of interest in the dataset. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart.
Collaborator

It should be clarified that `by` is not simply an alias for `color`.
It can be tweaked by using `by_vars` and passing in these other parameters, such as `symbol` and `size`.
It also behaves slightly differently, though. Take this example:

import deephaven.plot.express as dx
tips = dx.data.tips() # import the example tips data set

by_list = dx.scatter(tips, x="TotalBill", y="Tip", by=["Time", "Smoker"], by_vars=["color", "symbol"])
by_prod = dx.scatter(tips, x="TotalBill", y="Tip", by="Time", symbol="Smoker")

by_list just loops through color/symbol combos (jointly)
whereas by_prod assigns colors to specific column values, so, of the four joint values, Lunch and Dinner have the same color and Yes and No have the same symbol.

The first method is more useful to emphasize differences, whereas the second is more useful to emphasize similarities.
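
The distinction drawn in this comment can be sketched in plain Python. This is only an illustration under stated assumptions: the palette, the symbol names, the group ordering, and both helper functions are hypothetical and do not reflect the library's actual internals.

```python
from itertools import cycle

colors = ["blue", "red", "green", "purple"]     # hypothetical palette
symbols = ["circle", "square", "diamond", "x"]  # hypothetical symbol sequence

# the four joint (Time, Smoker) values in the tips data
groups = [("Dinner", "No"), ("Dinner", "Yes"), ("Lunch", "No"), ("Lunch", "Yes")]


def joint_styles(groups):
    """Each unique (Time, Smoker) pair takes the next color and symbol together."""
    return {g: (c, s) for g, c, s in zip(groups, cycle(colors), cycle(symbols))}


def independent_styles(groups):
    """Color depends only on Time; symbol depends only on Smoker."""
    color_of, symbol_of, styles = {}, {}, {}
    for time, smoker in groups:
        color_of.setdefault(time, colors[len(color_of) % len(colors)])
        symbol_of.setdefault(smoker, symbols[len(symbol_of) % len(symbols)])
        styles[(time, smoker)] = (color_of[time], symbol_of[smoker])
    return styles


by_list_styles = joint_styles(groups)        # mirrors by=[...], by_vars=[...]
by_prod_styles = independent_styles(groups)  # mirrors by="Time", symbol="Smoker"
print(by_list_styles)
print(by_prod_styles)
```

In the joint assignment every group gets a fully distinct style, while in the independent assignment groups that share a `Time` value share a color and groups that share a `Smoker` value share a symbol.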

Contributor Author

I removed the "(for which `by` is an alias)" part; hopefully that clears up the confusion. As for `by_vars`, that seems like something that belongs in the expanded version of this doc once the full thing is written, and not necessarily in the introduction. What do you think?

Collaborator

Yeah, shouldn't be in the intro, that's fine


+Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given color column to create each series. This efficient implementation means that plots with multiple groups can scale to tens of millions or even billions of rows.
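
Conceptually, partitioning splits one table into a sub-table per unique key value, and each sub-table becomes one series. The following is a minimal pure-Python sketch of that idea only; it uses a list of dictionaries in place of a table, the rows are hypothetical, and `partition_rows` is not the engine's implementation.

```python
from collections import defaultdict

# hypothetical rows standing in for a table with a grouping column "Sym"
rows = [
    {"Sym": "CAT", "Price": 10.0},
    {"Sym": "DOG", "Price": 20.0},
    {"Sym": "CAT", "Price": 11.0},
]


def partition_rows(rows, key):
    """Split rows into one list per unique value of `key`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


series = partition_rows(rows, "Sym")
for name, part in series.items():
    print(name, [r["Price"] for r in part])
```

Each row is visited once, so the cost of building the series grows linearly with the number of rows rather than with the number of groups.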

## Examples

### Scatter plot by a categorical variable
2 changes: 1 addition & 1 deletion plugins/plotly-express/docs/scatter.md
@@ -85,7 +85,7 @@ custom_colors_3 = dx.scatter(
y="SepalLength",
by="example_colors",
# When set to `identity`, the column data passed to the
-# color parameter will used as the actual color
+# grouping/color parameter will be used as the actual color
color_discrete_map="identity"
)
```
4 changes: 0 additions & 4 deletions plugins/plotly-express/docs/sidebar.json
@@ -133,10 +133,6 @@
{
"label": "Multiple axes",
"path": "multiple-axes.md"
-},
-{
-"label": "Titles and legends",
-"path": "other.md"
}
]
}
6 changes: 4 additions & 2 deletions plugins/plotly-express/docs/violin.md
@@ -16,14 +16,16 @@ Violin plots are appropriate when the data contain a continuous variable of inte

Visualize the distribution of a single variable by passing the column name to the `x` or `y` arguments.

-```python order=violin_plot,versicolor
+```python order=violin_plot_x,violin_plot_y,versicolor
import deephaven.plot.express as dx
iris = dx.data.iris()

# subset to get a specific group
versicolor = iris.where("Species == `versicolor`")

-violin_plot = dx.violin(versicolor, x="SepalLength")
+# control the plot orientation using `x` or `y`
+violin_plot_x = dx.violin(versicolor, x="SepalLength")
+violin_plot_y = dx.violin(versicolor, y="SepalLength")
```

### Distributions for multiple groups