How I realised Data Science is for me

My father's friend once met me and asked what I was offering in college, and I replied that I am a Machine Learning Engineer. Meanwhile, I was still in my curious phase of exploring the many fields of computer science (data science, web/app development, etc.) before choosing a particular one. He went on to ask what I would do with Machine Learning Engineering, and I said I would use it to make predictions from data in order to draw insights for companies. So he told me he had some data he would want me to analyse. I responded that I would do it, so he should just send the data. Meanwhile, I had only learnt to build a diabetes prediction project. So I went home to play with some data provided by the Madhu Charan blog so that I could learn some of the most used functions in the data science libraries (pandas, numpy, etc.). I gathered the functions in my notebook so that I could refer to them whenever I want to analyse data. Let me show you some:

Pandas

Pandas normally needs matplotlib under the hood in order to display a plot/graph.

- df = pd.read_csv() - read a CSV file into a DataFrame
- df.describe() - statistical description
- df.info() - data types of the columns
- df.columns - column headers
- df["Index"] - values or rows under the label "Index"
- df.plot.box(), df['petal_length'].plot.hist(), df.plot.scatter() - pandas' own plotting API to visualise data
- df.sort_values(by=...), df.sort_index()
- pd.concat([...], join='inner') - inner means intersection, outer is union
- df.merge() - just like concat, but used to combine DataFrames that share a common key or link
- df.dropna() - drop missing values
- df.sepal_length = df.sepal_length.fillna(df.sepal_length.mean()) - fill missing values with the column mean
- df.apply() - allows you to apply a function to your DataFrame (it also works on a single column such as df['index'])
- df.reset_index(drop=True) - reset the index column after dropping rows
- df.iloc[3] - show the values at positional index 3
- df.loc[row, column] - show the value at the intersection of that row and column
- np.random.choice() - accepts an array (usually 1-D) to select randomly from
- df.hist() plots a chart for every numeric column, while df['index'].plot.hist() plots an individual one
- df.ndim - number of dimensions of the DataFrame
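Here is a minimal sketch of how a few of these pandas calls fit together, assuming an iris-style CSV (the file name iris.csv and its columns are just placeholders):

```python
import pandas as pd

# Read a CSV into a DataFrame (iris.csv is a placeholder file name)
df = pd.read_csv("iris.csv")

# Quick look at the data
print(df.describe())    # summary statistics for numeric columns
df.info()               # column names and data types
print(df.columns)       # just the column headers

# Fill missing values in one column with its mean, then drop the rest
df["sepal_length"] = df["sepal_length"].fillna(df["sepal_length"].mean())
df = df.dropna().reset_index(drop=True)

# Sorting and selecting
df = df.sort_values(by="petal_length")
print(df.iloc[3])                   # row at positional index 3
print(df.loc[0, "sepal_length"])    # value at row label 0, column "sepal_length"

# pandas' plotting API (matplotlib does the drawing under the hood)
df["petal_length"].plot.hist()
df.plot.scatter(x="sepal_length", y="petal_length")
```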

Seaborn

- sns.countplot - bar chart of label counts
- sns.boxplot - visualises data in box form and also shows outliers (anomalous data points)
- Density plots - help us see the relationship between each variable and the target variable
- plt.subplots(4, 4, figsize=(20, 25)) is different from plt.subplot - the first creates a whole grid of axes for multiple plots, while the second adds a single subplot
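A quick sketch of the seaborn calls above, using seaborn's built-in iris sample dataset:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# seaborn ships with small sample datasets; "iris" is one of them
df = sns.load_dataset("iris")

# Bar chart of how many rows carry each label
sns.countplot(x="species", data=df)
plt.show()

# Box plot: shows the spread of a variable and flags outliers (anomalous points)
sns.boxplot(x="species", y="sepal_length", data=df)
plt.show()

# plt.subplots creates a whole grid of axes in one figure,
# unlike plt.subplot which adds a single axis at a time
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(df["sepal_length"], ax=axes[0, 0])
sns.histplot(df["sepal_width"], ax=axes[0, 1])
sns.histplot(df["petal_length"], ax=axes[1, 0])
sns.histplot(df["petal_width"], ax=axes[1, 1])
plt.show()
```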

Matplotlib.pyplot

- plt.figure(figsize=(3, 3)) - the size is (width, height) in inches
- %matplotlib inline - displays the visualisations inside the notebook itself
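For example (the numbers plotted here are made up just to show the figure size):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
plt.figure(figsize=(3, 3))
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()

# In a Jupyter notebook you would run this once, on its own line:
# %matplotlib inline
```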

Numpy

df["Price"] = df["Price"].clip(lower=df["Price"].quantile(0.05), upper=df["Price"].quantile(0.95))- to clip(cut short) outliers Np.random.choice(replace=false Np.random.seed = to make random numbers stagnant and not keep changing

Keras

Keras takes an input and passes it to the next layer. Each layer performs mathematical operations on it before passing it on to the next. The core layers in Keras are Dense, Activation and Dropout. There are other layers that are more complex, including convolutional layers and pooling layers.
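A minimal sketch of a Keras model that uses those core layers; the layer sizes and input shape here are made up for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, input_shape=(4,)),     # dense: weights * inputs + bias
    layers.Activation("relu"),              # activation: non-linear function on the output
    layers.Dropout(0.2),                    # dropout: randomly zeroes units during training
    layers.Dense(3, activation="softmax"),  # output layer for 3 classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```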

So these were some of the most used functions in most dataset analysis. I was doing better 😂

In conclusion

After learning some basics like Python and SQL, do not worry about reading every library's documentation. Documentation is for reference. Build projects with help and on your own, then do something different from what you already know. There are people who are ahead of you. Ask for help. Get a good mentor.