This repo contains all of the practical exercises I did during the Data Analytics Bootcamp @ Ironhack. The course lasted 9 weeks (20 January to 20 March 2020), plus an additional career week, and was divided into 3 modules.
The table below is an index of the exercises, ordered by bootcamp module and week. For each one it gives a link to the exercise, the programming language, the libraries used, and the main topics covered or methods I used to solve the problems.
Mod/Week | Lab | Language | Libraries | Topics/Methods |
---|---|---|---|---|
M1-W1 | resolving-git-conflicts | Git, Command Line, Bash | - | GitHub, add, commit, push, pull, merge, conflicts, pull requests |
M1-W1 | tuple-set-dict | Python | random, operator, pandas | random.sample, operator.itemgetter, pd.DataFrame |
M1-W1 | list-comprehensions | Python | os, numpy, pandas | os.listdir, os.path.join, pd.concat, np.array, df._get_numeric_data |
M1-W1 | string-operations | Python | re, math | f-strings, str.lower, str.endswith, str.join, str.split, str.replace, re.findall, re.search, bag of words |
M1-W1 | lambda-functions | Python | - | functions, lambda, zip, sorted, dict.items |
M1-W1 | numpy | Python | numpy | np.random (random, rand, sample), np.ones, size, shape, np.reshape, np.transpose, np.array_equal, max, min, mean, np.empty, np.nditer |
M1-W1 | functions | Python | - | functions, iter, iterators, generators, yield |
M1-W1 | intro-pandas | Python | pandas, numpy | pd.Series, pd.DataFrame, df.columns, subsetting, df.mean, df.max, df.median, df.sum |
M1-W1 | python-project | Python | inquirer, playsound | Escape Room text game in Python: functions, dictionaries, conditionals |
M1-W2 | map-reduce-filter | Python | numpy, pandas, functools | functions, map, reduce, filter |
M1-W2 | import-export | Python | pandas | pd.read_csv, df.to_csv, pd.read_excel, df.head, df.value_counts |
M1-W2 | dataframe-calculations | Python | pandas, numpy, zipfile | df.shape, df.unique, str.contains, df.astype, df.isnull, df.apply, df.sort_values, df.equals, pd.get_dummies, df.corr, df.drop, df.groupby.agg, df.quantile |
M1-W2 | first-queries | SQL | - | create db, create table, select, distinct, group by, order by, where, limit, count |
M1-W2 | my-sql-select | SQL | - | aliases, inner join, left join, sum, coalesce |
M1-W2 | my-sql | SQL | - | db design, table relationships, db seeding, forward engineering schemas, one-to-many, many-to-one, many-to-many, linking tables |
M1-W2 | advanced-mysql | SQL | - | temporary tables, subqueries, permanent tables |
M1-W2 | data-cleaning | Python | pandas, numpy, scipy.stats | df.rename, df.dtypes, pd.merge, df.fillna, np.abs, stats.zscore |
M1-W2 | project-cities | Python | pandas | collected online data from several sources and analyzed the effect of the growing number of Airbnb listings in Lisbon on hotel prices |
M1-W3 | api-scavenger | Python, APIs, Command Line | pandas, pandas.io.json | curl, pd.read_json, json_normalize, pd.to_datetime |
M1-W3 | web-scraping | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
M1-W3 | advanced-regex | Python | re | re.findall, re.sub |
M1-W3 | matplotlib-seaborn | Python | matplotlib.pyplot, seaborn, numpy, pandas | plt.plot, plt.show, plt.subplots, plt.legend, plt.bar, plt.barh, plt.pie, plt.boxplot, plt.xticks, ax.set_title, ax.set_xlabel, sns.set, sns.distplot, sns.barplot, sns.despine, sns.violinplot, sns.catplot, sns.heatmap, np.linspace, df.select_dtypes, pd.Categorical, df.cat.codes, np.triu, sns.diverging_palette |
M1-W3 | pandas-deep-dive | Python | pandas | df.describe, df.groupby.agg, df.apply |
M1-W3 | project-data-thieves | Python | pandas, geopandas, geoplot | used Kaggle survey data and web scraping to identify the best countries in the world for data jobs (quality of life, number of offers and average salaries) |
M2-W4 | subsetting-and-descriptive-stats | Python | pandas, matplotlib, seaborn | df.loc, df.groupby.agg, df.quantile, df.describe |
M2-W4 | understanding-descriptive-stats | Python | pandas, random, matplotlib, numpy | random.choice, plt.hist, plt.vlines, np.mean, np.std |
M2-W4 | regression-analysis | Python | numpy, pandas, scipy, sklearn.linear_model, matplotlib, seaborn | plt.scatter, df.corr, scipy.stats.linregress, sns.heatmap, sklearn.LinearRegression, lm.fit, lm.score, lm.coef_, lm.intercept_ |
M2-W4 | advanced-pandas | Python | pandas, numpy, random | df.isnull, df.set_index, df.reset_index, random.choices, df.lookup, pd.cut |
M2-W4 | mini-project1 | Python | pandas, numpy, matplotlib, seaborn, scipy.stats | EDA, df.map, df.info, df.apply (with lambda), df.replace, df.dropna, sns.boxplot, plt.subplots_adjust, df.drop, sns.pairplot, sns.regplot, sns.jointplot, stats.linregress |
M2-W4 | pivot-table-and-correlation | Python | pandas, scipy.stats | df.pivot_table(index, columns, aggfunc), stats.linregress, plt.scatter, stats.pearsonr, stats.spearmanr |
M2-W4 | tableau | Tableau | - | mini project: analyzed the relationship between the number of characters in the title and description of apps and the number of downloads |
M2-W5 | intro-probability | Probability | - | probability space, conditional probability, contingency tables |
M2-W5 | reading-stats-concepts | Statistics | - | p-values, AB testing, means and expected values |
M2-W5 | probability-distributions | Python | scipy.stats, numpy | discrete: stats.binom, stats.poisson. continuous: stats.uniform, stats.norm, stats.expon, np.random.exponential, stats.rvs, stats.cdf, stats.pdf, stats.ppf |
M2-W5 | confidence-intervals | Python | scipy.stats, numpy | stats.norm.interval, calculating sample sizes |
M2-W5 | intro-to-scipy | Python | scipy, numpy | stats.tmean, stats.fisher_exact, scipy.interpolate, interpolate.interp1d, np.arange |
M2-W5 | hypothesis-testing-1 | Python | scipy.stats, numpy, pandas, statsmodels | stats.ttest_1samp, stats.sem, stats.t.interval, pd.crosstab, statsmodels.proportions_ztest |
M2-W5 | hypothesis-testing-2 | Python | pandas, scipy.stats | stats.f_oneway, stats.ttest_ind, stats.ttest_rel, pd.concat |
M2-W5 | mini-project2 | Python | pandas, numpy, scipy.stats, matplotlib | stats.norm, stats.ppf, stats.t.interval, stats.pdf, np.linspace, stats.shapiro |
M2-W6 | two-sample-hyp-test | Python | pandas, scipy.stats, numpy | stats.ttest_ind, stats.ttest_rel, stats.ttest_1samp, stats.chi2_contingency, np.where |
M2-W6 | goodfit-indeptests | Python | scipy.stats, numpy | stats.poisson, stats.pmf, stats.chisquare, stats.norm, stats.kstest, stats.cdf, stats.chi2_contingency, stats.binom |
M3-W7 | intro-to-ml | Python | pandas, numpy, datetime, sklearn.model_selection | pd.to_numeric, df.interpolate, np.where, dt.strptime, dt.toordinal, train_test_split |
M3-W7 | supervised-learning-feature-extraction | Python | pandas, numpy | pd.to_numeric, df.apply, pd.to_datetime, np.where, pd.merge |
M3-W7 | supervised-learning | Python | pandas, seaborn, sklearn.model_selection, sklearn.linear_model, sklearn.neighbors, sklearn.preprocessing | df.corr, sns.heatmap, df.drop, df.dropna, pd.get_dummies, train_test_split, LogisticRegression, confusion_matrix, accuracy_score, KNeighborsClassifier, RobustScaler |
M3-W7 | supervised-learning-sklearn | Python | sklearn.linear_model, sklearn.datasets, sklearn.preprocessing, sklearn.model_selection, statsmodels.api, sklearn.metrics, sklearn.feature_selection | LinearRegression, load_diabetes, PolynomialFeatures, StandardScaler, train_test_split, sm.OLS, r2_score, RFE |
M3-W7 | unsupervised-learning | Python | sklearn.preprocessing, sklearn.cluster, sklearn.metrics, yellowbrick.cluster | StandardScaler, KMeans, silhouette_score, KElbowVisualizer, DBSCAN |
M3-W7 | unsupervised-learning-and-sklearn | Python | sklearn.preprocessing, sklearn.cluster, mpl_toolkits.mplot3d | LabelEncoder, KMeans, fig.gca(projection='3d') |
M3-W8 | problems-in-ml | Python | sklearn.metrics, sklearn.model_selection, sklearn.ensemble, sklearn.datasets, sklearn.svm, matplotlib.colors | r2_score, mean_squared_error, train_test_split, RandomForestRegressor, load_boston, SVC, ListedColormap |
M3-W8 | imbalance | Python | sklearn.model_selection, sklearn.preprocessing, sklearn.linear_model, sklearn.tree, sklearn.metrics | train_test_split, LabelEncoder, LogisticRegression, DecisionTreeClassifier, RobustScaler, StandardScaler, PolynomialFeatures, MinMaxScaler, confusion_matrix, accuracy_score |
M3-W8 | deep-learning | Python | tensorflow, keras.models, keras.layers, keras.utils, sklearn.model_selection | keras.Sequential, keras.Dense, keras.to_categorical, save_weights, load_weights |
M3-W8 | nlp | Python | re, nltk, nltk.stem, nltk.corpus, sklearn.feature_extraction.text, nltk.probability | stopwords, CountVectorizer, TfidfVectorizer, ConditionalFreqDist, nltk.word_tokenize, nltk.PorterStemmer, nltk.WordNetLemmatizer, nltk.NaiveBayesClassifier, nltk.classify.accuracy, classifier.show_most_informative_features |