Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(number): add exponentialDistribution function #3375

Open
wants to merge 2 commits into
base: next
Choose a base branch
from

Conversation

ST-DDT
Copy link
Member

@ST-DDT ST-DDT commented Jan 18, 2025

Implements #1862


Adds a function that generates values according to an exponential distribution.

The following table shows the rough distribution of values generated using
Math.floor(exponentialDistribution({ min: 0, max: 10, base: x })):

Value Base 0.1 Base 0.5 Base 1 Base 2 Base 10
0 4.1% 7.4% 10.0% 13.8% 27.8%
1 4.5% 7.8% 10.0% 12.5% 16.9%
2 5.0% 8.2% 10.0% 11.5% 12.1%
3 5.7% 8.7% 10.0% 10.7% 9.4%
4 6.6% 9.3% 10.0% 10.0% 7.8%
5 7.8% 9.9% 10.0% 9.3% 6.6%
6 9.4% 10.7% 10.0% 8.8% 5.7%
7 12.1% 11.5% 10.0% 8.2% 5.0%
8 16.9% 12.6% 10.0% 7.8% 4.5%
9 27.9% 13.8% 10.0% 7.5% 4.1%

The exponential distribution is achieved by using base ** exponent where exponent is a random float between 0 and 1.
The result is then stretched evenly to fit to min and max.

@ST-DDT ST-DDT added c: feature Request for new feature p: 1-normal Nothing urgent m: number Something is referring to the number module labels Jan 18, 2025
@ST-DDT ST-DDT added this to the vAnytime milestone Jan 18, 2025
@ST-DDT ST-DDT requested review from a team January 18, 2025 15:01
@ST-DDT ST-DDT self-assigned this Jan 18, 2025
Copy link

netlify bot commented Jan 18, 2025

Deploy Preview for fakerjs ready!

Name Link
🔨 Latest commit d3236f4
🔍 Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/678d224a323e040008c9d1b8
😎 Deploy Preview https://deploy-preview-3375.fakerjs.dev
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Jan 18, 2025

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 99.97%. Comparing base (192b540) to head (d3236f4).

Files with missing lines Patch % Lines
src/modules/number/index.ts 92.59% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             next    #3375      +/-   ##
==========================================
- Coverage   99.97%   99.97%   -0.01%     
==========================================
  Files        2811     2811              
  Lines      217035   217062      +27     
  Branches      940      950      +10     
==========================================
+ Hits       216982   217007      +25     
- Misses         53       55       +2     
Files with missing lines Coverage Δ
src/modules/number/index.ts 99.14% <92.59%> (-0.86%) ⬇️

Comment on lines 542 to 555
* The following table shows the rough distribution of values generated using `Math.floor(exponentialDistribution({ min: 0, max: 10, base: x }))`:
*
* | Value | Base 0.1 | Base 0.5 | Base 1 | Base 2 | Base 10 |
* | :---: | -------: | -------: | -----: | -----: | ------: |
* | 0 | 4.1% | 7.4% | 10.0% | 13.8% | 27.8% |
* | 1 | 4.5% | 7.8% | 10.0% | 12.5% | 16.9% |
* | 2 | 5.0% | 8.2% | 10.0% | 11.5% | 12.1% |
* | 3 | 5.7% | 8.7% | 10.0% | 10.7% | 9.4% |
* | 4 | 6.6% | 9.3% | 10.0% | 10.0% | 7.8% |
* | 5 | 7.8% | 9.9% | 10.0% | 9.3% | 6.6% |
* | 6 | 9.4% | 10.7% | 10.0% | 8.8% | 5.7% |
* | 7 | 12.1% | 11.5% | 10.0% | 8.2% | 5.0% |
* | 8 | 16.9% | 12.6% | 10.0% | 7.8% | 4.5% |
* | 9 | 27.9% | 13.8% | 10.0% | 7.5% | 4.1% |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have these tables here or in the parameter descriptions?

If we have them here we can show them on the documentation.
If we have them in the options we cannot display them.

IMO we should not have them in both places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave the tables here so they show up in the docs.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jan 18, 2025

I'm also not sure whether exponentialDistribution is the correct name for the function.

@matthewmayer
Copy link
Contributor

I don't like calling it "exponentialDistribution " because you're not returning a distribution, you are returning a single value from that distribution.

So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value.

src/modules/number/index.ts Show resolved Hide resolved
* faker.number.exponentialDistribution(10) // 1.656598169056771
* faker.number.exponentialDistribution({ min: 10, max: 100 }) // 88.7273250669911
* faker.number.exponentialDistribution({ min: 0, max: 100, base: 10 }) // 6.9442760672808745
* faker.number.exponentialDistribution({ min: 0, max: 100, bias: 10 }) // 67.03715679154617
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure adding a second way (bias) of achieving the same results is that useful; would prefer picking either bias or base.

Copy link
Member Author

@ST-DDT ST-DDT Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base is the mathematical basis that the function uses, however it is kind of hard to use because to invert the effect you have to base = 1/base and big base => probably low value.
Bias is intended as a non mathematical way to define the base, where if you want to invert the value you can just bias = -bias and big bias => probably big value.

I would prefer to keep base, because it is the actual foundation as the implementation,
and I would prefer to keep bias, due to its ease of use.
But that's just my opinion and could be changed.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jan 19, 2025

you are returning a single value from that distribution.

I'm not even sure that is the case.

So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value.

Good idea. I'm not finally sure which term I would like to use but stepping away from calling it exponentialDistribution is the correct choice.

I'm currently considering one of these:

  • exponential
  • exponentialValue
  • exponentialNumber
  • exponentialFloat
  • floatExponential
  • exponentiallySpread (or something similar)
  • powerLaw___ (<- ChatGpt)

*
* @since 9.5.0
*/
exponentialDistribution(
Copy link
Member

@xDivisionByZerox xDivisionByZerox Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from my side (I'll leave a comment at the code to have a separate discussion block):

  1. The functions seems out of place in this module. All function names so far have had a technical nature to them while this one is more mathmatically named. Following that schema number.int would be number.natural or number.float would be number.rational. Not that dramatic, but something I noticed after we tried organizing the modules in v8 so neatly.
  2. Is it correct to make a new function at all. Reading the original issue, the usecase described a distribution of integers along an exponential scale. This function however returns floats. IMO, integrating the parameters into the existing methods is the most logical option. The default could be set to return a unionized value distribution (base = 1) to not introduce breaking changes.
    2.1. When the parameters are intrgrated into the other functions they should maybe grouped in a separate, optional object. Since I would consider this API pretty advanced for the usual user, I'd "hide" it in an extra nested layer. Additionally, I could think of faker providing some utility function or constants that have predefined, commonly used distributions:
import { faker, exponentialDistibution, linearDistibution } from '@faker-js/faker';

console.log(exponentialDistibution()); // { bias: -1, base: 10 }
console.log(exponentialDistibution({ inverted: true })); // { bias: -1, base: 0.1 }

faker.nuber.int({ distibution: exponentialDistibution() });
faker.nuber.float({ distibution: exponentialDistibution({ inverted: true }) });
faker.nuber.float({ distibution: linearDistibution() });

What do you think? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is logic in making it an option for faker.number.float() etc instead

The only downside is when the really basic functions like float() and int() have too many options it could make it overwhelming for new users to read the documentation.

Copy link
Member

@xDivisionByZerox xDivisionByZerox Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree on your reasoning. Thats why I propsed an additional nested layer in the options. That way the arguments bias and base do not bleed into the first level of options. Additionally, adding some overloads to the function could work. Make one overload (the default) with less arguments in the options that are used more often and one overload with an options object that represents a superset of the default, simple one.

type SimpleOptions = { min: number; max: number };
type AdvancedOptions = SimpleOptions & { distibution: { base: number; bias: number } };

class NumberModule {
  int(): number; // no args = default args
  int(options: AdvancedOptions): number; // advanced arguments, put first to not get picked up by IDE (IDE usually picks last overload as default)
  int(options: SimpleOptions): number; // simple arguments, last to be the default
  int(options?: AdvancedOptions = {}): number { // implementation can just use optional advanced options since it includes all possible cases
    // ...
  }

  // ...
}

The types are separated for representation purposes here, I'm not sure if our docs already allow for non inline type definitions.

Copy link
Member Author

@ST-DDT ST-DDT Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distribution: exponentialDistribution({base: 2}) option seems like a good solution.
type Distribution = (options: {random, min, max}): number
That way other developers can add their own distributions as needed.
I'll try it soon.

I fear AdvancedOptions might be complicated to implement for our api docs.
Currently we just use the last (api) signature.

const signatureData = required(signatures.at(-1), 'method signature');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way other developers can add their own distributions as needed.

Thats the idea. Although, I'd propably provide some common use caes out of the box. Thats not a requirement for me, tho.

I fear AdvancedOptions might be complicated to implement for our api docs.
Currently we just use the last (api) signature.

No worries. Lets try to focus on designing the best possible API for our users first. We can discuss the handling of the documentation afterwards, IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a ton of different probability distributions and they might all have different numbers of parameters and traditional names for their parameters: https://en.wikipedia.org/wiki/Probability_distribution - not to mention some make sense for discrete distributions (like ints) and some for continuous distributions (like floats).

So i dont think you'll be able to have a standardised way to refer to the parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good argument. I was probably looking too naively at the problem at hand.
It would still be awesome if we could solve the problem without introducing a new method. I stated the module specification problem in my first comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats why I try to implement the distribution option as a function, not data.

function int(options: { distribution: (fakerCore) => number, ....}) {

f() ∈ [0, 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thing that occurred to me; in many cases when generating a small finite number of different ints, developers may find it easier to just use weightedArrayElement and just manually set the percentage chance of getting each value.

So i think this is actually much more useful in the float case, when you can't easily do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the usecases for this method is the regexp method: specifically for * lengths.

@xDivisionByZerox
Copy link
Member

Team decision

The feature will be implemented by extending the already existing methods (number.int(), number.float(), etc). An additional parameter will be added. The parameter will be a function that exepts a randomizer instance and will return a biased randomized value. The return value will be used by the original function to "push" the value in the direction the distribution desires.

We are currently unsure how to name the parameter. While distribution is okay, we are not sure if this is the mathmatical correct term.
The name "spread" was discussed, but discarded since it might confuse users with the JS spread operator.
We will think about other names until the next meeting. If we do not come up with any, the current name of "distribution" is fine for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: feature Request for new feature m: number Something is referring to the number module p: 1-normal Nothing urgent
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants