Data processing for Machine LearningPython for Data Analytics

======= ======================================
2) Missing Values

=================== ===========================

Scaling

Some Additional things in Jupyter Note Book

Pandas Very useful methods for data analysis

1) Importing Pandas library & looking at versions.

-------

2) Reading the data which resides in Excel file.

-------

3) DF.[Column].unique(), get unique records in the column

# the following code gets unique records of column, 'Region' , values are :['Central', 'East', 'West', 'South'],

type: Numpy array

code: type(Sample_Superstore_Updated['Region'].unique())

-------

4) DF.info() function gets the information about the DataFrame, it includes number of records, Index, Data type of the input(DataFrame), number of not null records in each column, data type

Columns for each different kind of data type.

# removing stop words

# import nltk

# from nltk.corpus import stopwords

# stop_words_1 = list(set(stopwords.words('english')))

# import spacy

# en = spacy.load('en_core_web_sm')

# stop_words_2 = list(set(en.Defaults.stop_words))

# stop_words_total = list(set(stop_words_2 + stop_words_1))

# # print( "len(stop_words):" , len(stop_words_total),

# # "\nlen(stop_words_2)-SPACY:" , len(stop_words_2),

# # "\nlen(stop_words_1)-NLTK:" , len(stop_words_1)

# # )

# removing stop words

#https://gist.github.com/sebleier/554280

# import nltk

# from nltk.corpus import stopwords

# stop_words_1 = list(set(stopwords.words('english')))

# import spacy # en = spacy.load('en_core_web_sm') # stop_words_2 = list(set(en.Defaults.stop_words)) # stop_words_total = list(set(stop_words_2 + stop_words_1)) # print( "len(stop_words):" , len(stop_words_total), # "\nlen(stop_words_2)-SPACY:" , len(stop_words_2), # "\nlen(stop_words_1)-NLTK:" , len(stop_words_1) # ) # import requests # stopwords_list = requests.get("https://gist.githubusercontent.com/rg089/35e00abf8941d72d419224cfd5b5925d/raw/12d899b70156fd0041fa9778d657330b024b959c/stopwords.txt").content # stopwords = set(stopwords_list.decode().splitlines())

# stopwords

Data processing for Machine Learning

Contact Form