Monday, 15 May 2023

Descriptive Statistics and Linear Regression Using 'statistics' module and 'statsmodels' module

Using 'statistics' module

Using 'statsmodels' module

Tags: Technology,Python,Machine Learning,

Friday, 12 May 2023

Python 'math' Module, 'statistics' Module and Descriptive statistics using Pandas, NumPy, SciPy and StatsModels

Note: In this article we discuss three things:
1. math Module 
2. statistics Module 
3. Descriptive statistics using Pandas, NumPy, SciPy and StatsModels

Python math Module

Python has a built-in module that you can use for mathematical tasks.

The math module has a set of methods and constants.


Math Methods

Method Description
math.acos() Returns the arc cosine of a number
math.acosh() Returns the inverse hyperbolic cosine of a number
math.asin() Returns the arc sine of a number
math.asinh() Returns the inverse hyperbolic sine of a number
math.atan() Returns the arc tangent of a number in radians
math.atan2() Returns the arc tangent of y/x in radians
math.atanh() Returns the inverse hyperbolic tangent of a number
math.ceil() Rounds a number up to the nearest integer
math.comb() Returns the number of ways to choose k items from n items without repetition and without order
math.copysign() Returns a float consisting of the value of the first parameter and the sign of the second parameter
math.cos() Returns the cosine of a number
math.cosh() Returns the hyperbolic cosine of a number
math.degrees() Converts an angle from radians to degrees
math.dist() Returns the Euclidean distance between two points p and q, each given as a sequence of coordinates of the same dimension
math.erf() Returns the error function of a number
math.erfc() Returns the complementary error function of a number
math.exp() Returns e raised to the power of x
math.expm1() Returns e**x - 1
math.fabs() Returns the absolute value of a number
math.factorial() Returns the factorial of a number
math.floor() Rounds a number down to the nearest integer
math.fmod() Returns the remainder of x/y
math.frexp() Returns the mantissa and the exponent, of a specified number
math.fsum() Returns the sum of all items in any iterable (tuples, arrays, lists, etc.)
math.gamma() Returns the gamma function at x
math.gcd() Returns the greatest common divisor of two integers
math.hypot() Returns the Euclidean norm
math.isclose() Checks whether two values are close to each other, or not
math.isfinite() Checks whether a number is finite or not
math.isinf() Checks whether a number is infinite or not
math.isnan() Checks whether a value is NaN (not a number) or not
math.isqrt() Returns the integer square root of a non-negative integer, i.e. the square root rounded down to the nearest integer
math.ldexp() Returns the inverse of math.frexp() which is x * (2**i) of the given numbers x and i
math.lgamma() Returns the log gamma value of x
math.log() Returns the natural logarithm of a number, or the logarithm of the number to a given base
math.log10() Returns the base-10 logarithm of x
math.log1p() Returns the natural logarithm of 1+x
math.log2() Returns the base-2 logarithm of x
math.perm() Returns the number of ways to choose k items from n items with order and without repetition
math.pow() Returns the value of x to the power of y
math.prod() Returns the product of all the elements in an iterable
math.radians() Converts a degree value into radians
math.remainder() Returns the IEEE 754-style remainder of x with respect to y, i.e. x - n*y where n is the integer closest to x/y
math.sin() Returns the sine of a number
math.sinh() Returns the hyperbolic sine of a number
math.sqrt() Returns the square root of a number
math.tan() Returns the tangent of a number
math.tanh() Returns the hyperbolic tangent of a number
math.trunc() Returns the truncated integer parts of a number

Math Constants

Constant Description
math.e Returns Euler's number (2.7182...)
math.inf Returns a floating-point positive infinity
math.nan Returns a floating-point NaN (Not a Number) value
math.pi Returns PI (3.1415...)
math.tau Returns tau (6.2831...)

Some of these methods come up very frequently in our work. These include:

math.ceil(): Rounds a number up to the nearest integer
math.floor(): Rounds a number down to the nearest integer
math.factorial(): Returns the factorial of a number
math.comb(): Returns the number of ways to choose k items from n items without repetition and without order
math.degrees(): Converts an angle from radians to degrees
math.radians(): Converts a degree value into radians
math.gcd(): Returns the greatest common divisor of two integers
math.dist(): Returns the Euclidean distance between two points p and q, each given as a sequence of coordinates of the same dimension
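
As a quick illustration of the frequently used functions listed above, here is a minimal sketch. The input values are arbitrary and chosen only so the results are easy to check by hand; math.comb() and math.dist() need Python 3.8 or newer.

```python
import math

# Rounding to the nearest integer above or below.
rounded_up = math.ceil(4.2)       # 5
rounded_down = math.floor(4.8)    # 4

# Counting: 5! and "5 choose 2".
fact = math.factorial(5)          # 120
choose = math.comb(5, 2)          # 10

# Angle conversion in both directions.
deg = math.degrees(math.pi)       # 180.0
rad = math.radians(180)           # approximately math.pi

# Greatest common divisor and Euclidean distance.
divisor = math.gcd(12, 18)        # 6
distance = math.dist((0, 0), (3, 4))  # 5.0

print(rounded_up, rounded_down, fact, choose, deg, rad, divisor, distance)
```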

Python statistics Module

Averages and measures of central location

These functions calculate an average or typical value from a population or sample.

mean()

Arithmetic mean (“average”) of data.

fmean()

Fast, floating point arithmetic mean, with optional weighting.

geometric_mean()

Geometric mean of data.

harmonic_mean()

Harmonic mean of data.

median()

Median (middle value) of data.

median_low()

Low median of data.

median_high()

High median of data.

median_grouped()

Median, or 50th percentile, of grouped data.

mode()

Single mode (most common value) of discrete or nominal data.

multimode()

List of modes (most common values) of discrete or nominal data.

quantiles()

Divide data into intervals with equal probability.
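
The functions above can be tried on a small list. This is only a sketch: the two lists are made-up illustration values, and fmean(), geometric_mean() and multimode() assume Python 3.8 or newer.

```python
import statistics as st

data = [1, 2, 4, 8]          # even number of values, so low/high medians differ
votes = [1, 1, 2, 2, 3]      # two values tie for most common

mean_value = st.mean(data)             # 3.75
fast_mean = st.fmean(data)             # 3.75, always a float
geo_mean = st.geometric_mean(data)     # fourth root of 1*2*4*8
harm_mean = st.harmonic_mean(data)     # 4 / (1/1 + 1/2 + 1/4 + 1/8)

middle = st.median(data)               # 3.0, average of the two middle values
low = st.median_low(data)              # 2, the lower middle value
high = st.median_high(data)            # 4, the higher middle value

single_mode = st.mode(votes)           # 1, the first mode encountered
all_modes = st.multimode(votes)        # [1, 2]

print(mean_value, fast_mean, geo_mean, harm_mean)
print(middle, low, high, single_mode, all_modes)
```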

Measures of spread

These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.

pstdev()

Population standard deviation of data.

pvariance()

Population variance of data.

stdev()

Sample standard deviation of data.

variance()

Sample variance of data.
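
The difference between the population and sample versions is the divisor: the population functions divide the sum of squared deviations by n, the sample functions by n - 1. A minimal sketch with made-up values whose mean is exactly 5:

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32

pop_var = st.pvariance(data)   # 32 / 8 = 4
pop_sd = st.pstdev(data)       # square root of 4 = 2.0
samp_var = st.variance(data)   # 32 / 7
samp_sd = st.stdev(data)       # square root of 32 / 7

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The sample versions are the usual choice when the data is a sample drawn from a larger population.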

Statistics for relations between two inputs

These functions calculate statistics regarding relations between two inputs.

covariance()

Sample covariance for two variables.

correlation()

Pearson's correlation coefficient for two variables.

linear_regression()

Slope and intercept for simple linear regression.

NormalDist

NormalDist is a tool for creating and manipulating normal distributions of a random variable. It is a class that treats the mean and standard deviation of data measurements as a single entity. Normal distributions arise from the Central Limit Theorem and have a wide range of applications in statistics.
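
A minimal sketch of how NormalDist can be used. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, as is the short sample list.

```python
from statistics import NormalDist

# A normal distribution described by its mean (mu) and standard deviation (sigma).
scores = NormalDist(mu=100, sigma=15)

# Probability of a value falling within one standard deviation of the mean.
within_one_sd = scores.cdf(115) - scores.cdf(85)

# Value below which 50% of the distribution lies (the mean, by symmetry).
midpoint = scores.inv_cdf(0.5)

# Estimate mu and sigma from data instead of supplying them directly.
fitted = NormalDist.from_samples([13, 15, 16, 16, 19, 20])

print(round(within_one_sd, 4))
print(midpoint)
print(fitted.mean, fitted.stdev)
```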

l = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Sum of all elements

print(sum(l))

# Count of each item

from collections import Counter 
print(Counter(l))

# Mean

import statistics as st

print(st.mean(l))

print("Median:", st.median(l))

# Mode

print(st.mode(l))

# Mid-range 

print(st.mean([max(l), min(l)]))

# Other statistical measures

print(st.quantiles(data = l, n = 4)) # [20.0, 25.0, 35.25]
print(st.stdev(l))
print(st.variance(l))

import pandas
l = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]
df = pandas.DataFrame(l, columns=['Numbers'])
total = df['Numbers'].sum()  # named 'total' so the built-in sum() is not shadowed
count_val = df['Numbers'].value_counts()
mode = df['Numbers'].mode().values.tolist()
midrange = (df['Numbers'].max() + df['Numbers'].min()) / 2

print("The sum of the given data using pandas ", total)
print("\nThe count of values \n", count_val)
print("\nMean of the given data using pandas ", df['Numbers'].mean())
print("\nMedian of the given data using pandas ", df['Numbers'].median())
print("\nMode of the given data using pandas ", mode[0])
print("\nMidrange of the given data using pandas:", midrange)
print("\nStandard deviation for given data using pandas:", df['Numbers'].std())
print("\nVariance for given data using pandas:", df['Numbers'].var())
print("\nQuantiles\n", df['Numbers'].quantile([0.25,0.50,0.75]))
print("\n\n")

import numpy as np

data=np.array(l)
print("Using NumPy\n")
unique_values, counts = np.unique(data, return_counts=True)
quantiles=np.percentile(data,[25,50,75])
print("Sum ",np.sum(data))
print("\nCount of values \n")
for value, count in zip(unique_values, counts):
    print( value, count)   
print("Mean :",np.mean(data))
print("\nMedian:",np.median(data))
# np.bincount only accepts non-negative integers; argmax returns the smallest of any tied modes
print("\nMode:",np.argmax(np.bincount(data)))
print("\nStandard deviation",np.std(data))
print("\nVariance :",np.var(data))
print("\nQuantiles \n")
print(quantiles[0],quantiles[1],quantiles[2])
print("\n\n")


from scipy import stats 

print("Using SciPy\n")
mode=stats.mode(data)
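# np.atleast_1d copes with both the array result of older SciPy and the scalar result of SciPy 1.11+
print("Mode: ", np.atleast_1d(mode.mode)[0])

# For counts, use np.unique(data, return_counts=True) as above; scipy.stats.itemfreq() is no longer available in recent SciPy releases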
# Other statistical measures similar to numpy



$ python statistical_summary.py 
774
Counter({25: 4, 35: 3, 16: 2, 20: 2, 22: 2, 33: 2, 13: 1, 15: 1, 19: 1, 21: 1, 30: 1, 36: 1, 40: 1, 45: 1, 46: 1, 52: 1, 70: 1})
29.76923076923077
Median: 25.0
25
41.5
[20.0, 25.0, 35.25]
13.158442741624686
173.14461538461538
The sum of the given data using pandas  774

The count of values 
25    4
35    3
16    2
20    2
22    2
33    2
13    1
40    1
52    1
46    1
45    1
30    1
36    1
15    1
21    1
19    1
70    1
Name: Numbers, dtype: int64

Mean of the given data using pandas  29.76923076923077

Median of the given data using pandas  25.0

Mode of the given data using pandas  25

Midrange of the given data using pandas: 41.5

Standard deviation for given data using pandas: 13.158442741624686

Variance for given data using pandas: 173.14461538461538

Quantiles
0.25    20.25
0.50    25.00
0.75    35.00
Name: Numbers, dtype: float64



Using NumPy

Sum  774

Count of values 

13 1
15 1
16 2
19 1
20 2
21 1
22 2
25 4
30 1
33 2
35 3
36 1
40 1
45 1
46 1
52 1
70 1
Mean : 29.76923076923077

Median: 25.0

Mode: 25

Standard deviation 12.902914674622618

Variance : 166.4852071005917

Quantiles 

20.25 25.0 35.0



Using SciPy

/home/ashish/Desktop/statistical_summary.py:90: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
    mode=stats.mode(data)
Mode:  25
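
Note that in the output above NumPy reports a standard deviation of about 12.90 and a variance of about 166.49, while the statistics module and pandas report about 13.16 and 173.14. The values differ because np.std() and np.var() default to the population formulas (ddof=0), whereas st.stdev() and the pandas .std() method default to the sample formulas (divisor n - 1). Passing ddof=1 to the NumPy functions makes them agree. The sketch below reproduces both numbers using only the statistics module:

```python
import math
import statistics as st

l = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70]

sample_sd = st.stdev(l)        # divides by n - 1, matches the pandas output
population_sd = st.pstdev(l)   # divides by n, matches the NumPy output

# The two differ only by the factor sqrt(n / (n - 1)).
n = len(l)
ratio = sample_sd / population_sd
expected_ratio = math.sqrt(n / (n - 1))

print(sample_sd)
print(population_sd)
print(ratio, expected_ratio)
```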

Tags: Technology,Python,Mathematical Foundations for Data Science,

Thursday, 11 May 2023

Index of Lessons in Technology


Artificial Intelligence

Data Mining

Tags: Technology,Data Mining,Index

Distinguishing Between Artificial Intelligence and Data Science Using Images

Artificial Intelligence & Data Science

AI, ML and Deep Learning in an Euler Diagram

Development of a smart computer system that is able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision making and language translation.

Development of a ‘computer program or machine’ which can learn, think and act on its own:
# Learn: acquire data
# Think: analyze data
# Act: take action

A broad view of Data Science

Data Science Venn Diagram

Machine Learning (ML): Applying statistics on a computer
Traditional Software (TS): Doing business using a computer
Traditional Research (TR): Use of statistics to understand, explain and grow business
Data Science (DS): A broad canvas that encompasses Machine Learning, Traditional Software and Traditional Research

Machine Learning

# A discipline under Data Science
# Empowers machines to learn, think and act for themselves
# Helps computers learn from patterns and behavior and act accordingly, without human intervention or being explicitly programmed

ML Technology: It uses algorithms and mathematical models to analyze data and learn from it. For example, the following statistical models are used to analyze data:
# Linear Regression
# Decision Trees
# Naïve Bayes Classification Model

Exploratory Data Analysis:
# This is the first step in machine learning to be applied to a data set
# It deals with computing descriptive and inferential statistics on the data set

Data Science

# It is the root of all
# A discipline that utilizes a combination of mathematical, statistical and computational tools to acquire, process and analyze Big Data
# It helps derive meaning from large amounts of Big Data

Data scientists and analysts use:

# ‘Statistical Inference’ to extract hidden and useful patterns from large data sets
# ‘Data Visualization’ techniques to communicate those insights in business-oriented directions (with domain expertise)

Processes used in Data Science:
# Data Extraction
# Data Cleaning
# Data Analysis
# Visualization of Data
# Generalization of actionable insights

Sunday, 9 April 2023

Animation for Single Digit Subtraction

Subtraction means taking some items away from a larger group of them. For example, say you have a basket with 7 apples and you take out 2 apples. How many apples are left in the basket? You get the answer by subtracting the quantity you take out (i.e., 2) from the quantity that was there (i.e., 7). So the answer is: 7 - 2 = 5. You are left with 5 apples in the basket.

Note: We will subtract the smaller number from the larger number.
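
The same idea can be sketched in a few lines of Python. The function name subtract_with_sticks and the use of "|" and "x" characters as sticks are illustrative choices made here, not part of the original animation.

```python
def subtract_with_sticks(first, second):
    """Subtract the smaller number from the larger and draw the result as sticks."""
    larger, smaller = max(first, second), min(first, second)
    difference = larger - smaller
    # One "|" per item that remains, one "x" per item that was taken away.
    sticks = "|" * difference + "x" * smaller
    return difference, sticks


difference, sticks = subtract_with_sticks(7, 2)
print(f"7 - 2 = {difference}")
print(sticks)  # |||||xx
```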







Tags: Mathematical Foundations for Data Science,

Hindi to English Learning (Version 3)
