The collection of exercises I did during Ironhack's Data Science bootcamp.

ricardozacarias/ironhack-labs

Ironhack Data Analytics Bootcamp

This repo contains all of the practical exercises I did during the Data Analytics Bootcamp at Ironhack. The course lasted 9 weeks (20 January to 20 March 2020), plus an additional career week. It was divided into 3 modules:

  1. Git, Python and SQL
  2. Statistics and probability
  3. Machine Learning

Lab Index

The table below indexes every exercise, ordered by bootcamp module and week. For each lab it gives a link to the exercise, the programming language, the libraries used, and the main topics covered or methods I used to solve the problems.

| Mod/Week | Lab | Language | Libraries | Topics/Methods |
| --- | --- | --- | --- | --- |
| M1-W1 | resolving-git-conflicts | Git, Command Line, Bash | - | GitHub, add, commit, push, pull, merge, conflicts, pull requests |
| M1-W1 | tuple-set-dict | Python | random, operator, pandas | random.sample, operator.itemgetter, pd.DataFrame |
| M1-W1 | list-comprehensions | Python | os, numpy, pandas | os.listdir, os.path.join, pd.concat, np.array, _get_numeric_data |
| M1-W1 | string-operations | Python | re, math | f-strings, str.lower, str.endswith, str.join, str.split, str.replace, re.findall, re.search, bag of words |
| M1-W1 | lambda-functions | Python | - | functions, lambda, zip, sorted, dict.items |
| M1-W1 | numpy | Python | numpy | np.random (random, rand, sample), np.ones, size, shape, np.reshape, np.transpose, np.array_equal, max, min, mean, np.empty, np.nditer |
| M1-W1 | functions | Python | iter | functions, iterators, generators, yield |
| M1-W1 | intro-pandas | Python | pandas, numpy | pd.Series, pd.DataFrame, df.columns, subsetting, df.mean, df.max, df.median, df.sum |
| M1-W1 | python-project | Python | inquirer, playsound | Escape Room text game in Python: functions, dictionaries, conditions |
| M1-W2 | map-reduce-filter | Python | numpy, pandas, functools | functions, map, reduce, filter |
| M1-W2 | import-export | Python | pandas | pd.read_csv, df.to_csv, pd.read_excel, df.head, df.value_counts |
| M1-W2 | dataframe-calculations | Python | pandas, numpy, zipfile | df.shape, df.unique, str.contains, df.astype, df.isnull, df.apply, df.sort_values, df.equals, pd.get_dummies, df.corr, df.drop, df.groupby.agg, df.quantile |
| M1-W2 | first-queries | SQL | - | create db, create table, select, distinct, group by, order by, where, limit, count |
| M1-W2 | my-sql-select | SQL | - | aliases, inner join, left join, sum, coalesce |
| M1-W2 | my-sql | SQL | - | db design, table relationships, db seeding, forward engineering schemas, one-to-many, many-to-one, many-to-many, linking tables |
| M1-W2 | advanced-mysql | SQL | - | temporary tables, subqueries, permanent tables |
| M1-W2 | data-cleaning | Python | pandas, numpy, scipy.stats | df.rename, df.dtypes, pd.merge, df.fillna, np.abs, stats.zscore |
| M1-W2 | project-cities | Python | pandas | collected data online from different sources and analyzed the effect of the growing number of Airbnb listings in Lisbon on hotel prices |
| M1-W3 | api-scavenger | Python, APIs, Command Line | pandas, pandas.io.json | curl, pd.read_json, json_normalize, pd.to_datetime |
| M1-W3 | web-scraping | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
| M1-W3 | advanced-regex | Python | re | re.findall, re.sub |
| M1-W3 | matplotlib-seaborn | Python | matplotlib.pyplot, seaborn, numpy, pandas | plt.plot, plt.show, plt.subplots, plt.legend, plt.bar, plt.barh, plt.pie, plt.boxplot, plt.xticks, ax.set_title, ax.set_xlabel, sns.set, sns.distplot, sns.barplot, sns.despine, sns.violinplot, sns.catplot, sns.heatmap, np.linspace, df.select_dtypes, pd.Categorical, df.cat.codes, np.triu, sns.diverging_palette |
| M1-W3 | pandas-deep-dive | Python | pandas | df.describe, df.groupby.agg, df.apply |
| M1-W3 | project-data-thieves | Python | pandas, geopandas, geoplot | combined Kaggle survey data with web scraping to analyze the best countries in the world for data jobs (quality of life, number of offers and average salaries) |
| M2-W4 | subsetting-and-descriptive-stats | Python | pandas, matplotlib, seaborn | df.loc, df.groupby.agg, df.quantile, df.describe |
| M2-W4 | understanding-descriptive-stats | Python | pandas, random, matplotlib, numpy | random.choice, plt.hist, plt.vlines, np.mean, np.std |
| M2-W4 | regression-analysis | Python | numpy, pandas, scipy, sklearn.linear_model, matplotlib, seaborn | plt.scatter, df.corr, scipy.stats.linregress, sns.heatmap, sklearn.LinearRegression, lm.fit, lm.score, lm.coef_, lm.intercept_ |
| M2-W4 | advanced-pandas | Python | pandas, numpy, random | df.isnull, df.set_index, df.reset_index, random.choices, df.lookup, pd.cut |
| M2-W4 | mini-project1 | Python | pandas, numpy, matplotlib, seaborn, scipy.stats | EDA, df.map, df.info, df.apply (with lambda), df.replace, df.dropna, sns.boxplot, plt.subplots_adjust, df.drop, sns.pairplot, sns.regplot, sns.jointplot, stats.linregress |
| M2-W4 | pivot-table-and-correlation | Python | pandas, scipy.stats | df.pivot_table(index, columns, aggfunc), stats.linregress, plt.scatter, stats.pearsonr, stats.spearmanr |
| M2-W4 | tableau | Tableau | - | mini project: analyzed the relationship between the number of characters in the title and description of apps and the number of downloads |
| M2-W5 | intro-probability | Probability | - | probability space, conditional probability, contingency tables |
| M2-W5 | reading-stats-concepts | Statistics | - | p-values, AB testing, means and expected values |
| M2-W5 | probability-distributions | Python | scipy.stats, numpy | discrete: stats.binom, stats.poisson. continuous: stats.uniform, stats.norm, stats.expon, np.random.exponential, stats.rvs, stats.cdf, stats.pdf, stats.ppf |
| M2-W5 | confidence-intervals | Python | scipy.stats, numpy | stats.norm.interval, calculating sample sizes |
| M2-W5 | intro-to-scipy | Python | scipy, numpy | stats.tmean, stats.fisher_exact, scipy.interpolate, interpolate.interp1d, np.arange |
| M2-W5 | hypothesis-testing-1 | Python | scipy.stats, numpy, pandas, statsmodels | stats.ttest_1samp, stats.sem, stats.t.interval, pd.crosstab, statsmodels.proportions_ztest |
| M2-W5 | hypothesis-testing-2 | Python | pandas, scipy.stats | stats.f_oneway, stats.ttest_ind, stats.ttest_rel, pd.concat |
| M2-W5 | mini-project2 | Python | pandas, numpy, scipy.stats, matplotlib | stats.norm, stats.ppf, stats.t.interval, stats.pdf, np.linspace, stats.shapiro |
| M2-W6 | two-sample-hyp-test | Python | pandas, scipy.stats, numpy | stats.ttest_ind, stats.ttest_rel, stats.ttest_1samp, stats.chi2_contingency, np.where |
| M2-W6 | goodfit-indeptests | Python | scipy.stats, numpy | stats.poisson, stats.pmf, stats.chisquare, stats.norm, stats.kstest, stats.cdf, stats.chi2_contingency, stats.binom |
| M3-W7 | intro-to-ml | Python | pandas, numpy, datetime, sklearn.model_selection | pd.to_numeric, df.interpolate, np.where, dt.strptime, dt.toordinal, train_test_split |
| M3-W7 | supervised-learning-feature-extraction | Python | pandas, numpy | pd.to_numeric, df.apply, pd.to_datetime, np.where, pd.merge |
| M3-W7 | supervised-learning | Python | pandas, seaborn, sklearn.model_selection, sklearn.linear_model, sklearn.neighbors, sklearn.preprocessing | df.corr, sns.heatmap, df.drop, df.dropna, pd.get_dummies, train_test_split, LogisticRegression, confusion_matrix, accuracy_score, KNeighborsClassifier, RobustScaler |
| M3-W7 | supervised-learning-sklearn | Python | sklearn.linear_model, sklearn.datasets, sklearn.preprocessing, sklearn.model_selection, statsmodels.api, sklearn.metrics, sklearn.feature_selection | LinearRegression, load_diabetes, PolynomialFeatures, StandardScaler, train_test_split, sm.OLS, r2_score, RFE |
| M3-W7 | unsupervised-learning | Python | sklearn.preprocessing, sklearn.cluster, sklearn.metrics, yellowbrick.cluster | StandardScaler, KMeans, silhouette_score, KElbowVisualizer, DBSCAN |
| M3-W7 | unsupervised-learning-and-sklearn | Python | sklearn.preprocessing, sklearn.cluster, mpl_toolkits.mplot3d | LabelEncoder, KMeans, fig.gca(projection='3d') |
| M3-W8 | problems-in-ml | Python | sklearn.metrics, sklearn.model_selection, sklearn.ensemble, sklearn.datasets, sklearn.svm, matplotlib.colors | r2_score, mean_squared_error, train_test_split, RandomForestRegressor, load_boston, SVC, ListedColormap |
| M3-W8 | imbalance | Python | sklearn.model_selection, sklearn.preprocessing, sklearn.linear_model, sklearn.tree, sklearn.metrics | train_test_split, LabelEncoder, LogisticRegression, DecisionTreeClassifier, RobustScaler, StandardScaler, PolynomialFeatures, MinMaxScaler, confusion_matrix, accuracy_score |
| M3-W8 | deep-learning | Python | tensorflow, keras.models, keras.layers, keras.utils, sklearn.model_selection | keras.Sequential, keras.Dense, keras.to_categorical, save_weights, load_weights |
| M3-W8 | nlp | Python | re, nltk, nltk.stem, nltk.corpus, sklearn.feature_extraction.text, nltk.probability | WordNetLemmatizer, stopwords, CountVectorizer, TfidfVectorizer, ConditionalFreqDist, nltk.word_tokenize, nltk.PorterStemmer, nltk.WordNetLemmatizer, nltk.NaiveBayesClassifier, nltk.classify.accuracy, classifier.show_most_informative_features |
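
As a taste of the kind of exercise indexed above, here is a minimal sketch of the map, filter and reduce pattern listed for the map-reduce-filter lab. It is not taken from the lab itself: the function name `sum_of_even_squares` and the sample input are hypothetical, chosen only for illustration, and it uses nothing beyond the standard library's `functools`.

```python
from functools import reduce


def sum_of_even_squares(numbers):
    """Square each number, keep the even squares, and add them up.

    Hypothetical helper for illustration; not code from the lab.
    """
    # map: transform every element into its square
    squares = map(lambda n: n * n, numbers)
    # filter: keep only the squares that are even
    even_squares = filter(lambda s: s % 2 == 0, squares)
    # reduce: fold the remaining values into a single total, starting from 0
    return reduce(lambda total, s: total + s, even_squares, 0)


if __name__ == "__main__":
    print(sum_of_even_squares([1, 2, 3, 4, 5, 6]))  # 4 + 16 + 36 = 56
```

The initial value `0` passed to `reduce` keeps the function well defined for an empty input, where `reduce` would otherwise raise a `TypeError`.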