Welcome to CodeSnippets! Here you can search for code snippets filtered by category and by CRISP-DM phase. As an unregistered user you can only search; after registering, you can add your own categories and upload your own code snippets to help others.

Importing Required Libraries

Importing Pandas library

Business understanding
import pandas as pd

Importing train_test_split library

Business understanding
from sklearn.model_selection import train_test_split

Importing DecisionTreeClassifier library

Business understanding
from sklearn.tree import DecisionTreeClassifier

Importing metrics library

Business understanding
from sklearn import metrics

Importing matplotlib library

Business understanding
import matplotlib.pyplot as plt

Importing Counter library

Business understanding
from collections import Counter

Importing confusion_matrix library

Business understanding
from sklearn.metrics import confusion_matrix

Importing NumPy library

Business understanding
import numpy as np

Importing Seaborn library

Business understanding
import seaborn as sns

Importing accuracy_score library

Business understanding
from sklearn.metrics import accuracy_score

Importing precision_recall_fscore_support

Business understanding
from sklearn.metrics import precision_recall_fscore_support

Importing DecisionTreeRegressor for regression prediction

Business understanding
from sklearn.tree import DecisionTreeRegressor

Importing StandardScaler

Business understanding
from sklearn.preprocessing import StandardScaler

Importing LogisticRegression

Business understanding
from sklearn.linear_model import LogisticRegression

Importing NumPy's array constructor

Business understanding
from numpy import array

Importing colors for visualization

Business understanding
from matplotlib.colors import ListedColormap

Setting inline mode for Jupyter plot display

Business understanding
%matplotlib inline

Importing library for random value generation

Business understanding
import random

Getting input data matrix shape

Data preparation
m, n = X.shape

Creating bias column for data matrix

Data preparation
bias = np.ones((X.shape[0], 1))

Expanding data matrix with bias column

Data preparation
biased_X = np.hstack((bias, X))
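
A quick shape check (a small sketch reusing X, m and n from above): hstack prepends the ones column, so the data matrix gains exactly one feature.

Data preparation
print(bias.shape, biased_X.shape)   # (m, 1) and (m, n + 1); column 0 is all ones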

Initializing random number generator with fixed seed

Data preparation
random_gen = np.random.RandomState(1)

Importing Keras Dense layer for model building

Business understanding
from tensorflow.keras.layers import Dense

Importing Keras Sequential API

Business understanding
from tensorflow.keras import Sequential

Importing one-hot encoding tools

Business understanding
from tensorflow.keras.utils import to_categorical

Importing regex for text processing

Business understanding
import re

Importing NLP toolkit (NLTK)

Business understanding
import nltk

Importing HTTP request library

Business understanding
import requests

Importing HTML parser from lxml

Business understanding
from lxml import html

Downloading NLTK tokenizer module

Business understanding
nltk.download('punkt')

Importing word tokenizer

Business understanding
from nltk.tokenize import word_tokenize

Loading Data

Loading dataset

Data understanding
diabetes = pd.read_csv('diabetes_inbalanced.csv', index_col=0)

Loading dataset

Data understanding
titanic = pd.read_csv('titanic.csv')

Loading dataset

Data understanding
df = pd.read_csv('ice_cream_data.csv', sep=";")

Loading dataset

Data understanding
df = pd.read_csv('Heart.csv')

Defining training data (input examples and expected outputs)

Data preparation
training_data = [
    (array([121,16.8]), 1),
    (array([114,15.2]), 1),
    (array([210,9.4]), -1),
    (array([195,8.1]), -1),
] 

Alternative training set for XOR problem

Data preparation
training_data = [
    (array([3,-2]), -1),
    (array([3,1]), 1),
    (array([2,0]), -1),
] 

Generating linearly separable data with two classes

Data preparation
from sklearn import datasets  # needed for make_blobs

X, y = datasets.make_blobs(n_samples=100, n_features=2,
                           centers=2, cluster_std=1,
                           random_state=3)

Test data for model error calculation

Data preparation
mal_byt = np.array([1,2,3,4])   # 'mal_byt' = expected values
bol = np.array([1,0,2,5])       # 'bol' = actual values

Loading darts dataset

Data understanding
data = pd.read_csv('darts.csv')

Opening text file for reading

Data understanding
text_file = open('human_rights.txt', 'r')

Loading text file content

Data understanding
h_rights = text_file.read()

Loading tweets from CSV file

Data understanding
tweets = pd.read_csv("tweets.csv")

Sample tweet for regex demonstration

Data preparation
tweet = "@nltk T awesome! #regex #pandas #python"

Sample text for NLP operations

Data preparation
text = "The cat is in the box. The cat likes the box. The box is over the cat."

Data Visualization

Printing contents of 'person' variable

Data preparation
osoba

Displaying scaled training data

Data understanding
X_train_scaled

Displaying first 4 rows of expanded matrix

Data preparation
biased_X[:4]

Displaying first 4 predicted values

Data preparation
output_pred[:4]

Displaying first 4 model errors

Data preparation
errors[:4]

Data Preprocessing

Encoding gender

Data preparation
titanic['Sex'] = titanic['Sex'].replace({'male': 0, 'female': 1})

Reshaping input data to 2D array for model compatibility

Data preparation
osoba = osoba.reshape(1,-1)
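
Why the reshape is needed: scikit-learn predictors expect a 2-D array of shape (n_samples, n_features), so a single 6-feature vector must become shape (1, 6). A minimal illustration with a hypothetical vector v:

Data preparation
v = np.array([10, 0, 0, 1, 0, 3])   # hypothetical 1-D vector, shape (6,)
print(v.reshape(1, -1).shape)       # (1, 6): one sample, six features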

Removing 'Unnamed: 0' column

Data preparation
df = df.drop(columns='Unnamed: 0')

Converting ChestPain category to numerical values

Data preparation
df['ChestPain'] = df['ChestPain'].astype('category')
df['ChestPain'] = df['ChestPain'].cat.codes

Converting Thal category to numerical values

Data preparation
df['Thal'] = df['Thal'].astype('category')
df['Thal'] = df['Thal'].cat.codes

Converting AHD category to numerical values

Data preparation
df['AHD'] = df['AHD'].astype('category')
df['AHD'] = df['AHD'].cat.codes

Scaling training data

Data preparation
X_train_scaled = scaler.fit_transform(X_train)

Scaling test data

Data preparation
X_test_scaled = scaler.transform(X_test)

Extending class encoding for multi-class classification

Data preparation
multi = data
multi["competitor"] = multi["competitor"].replace({'Steve': 0, 'Susan': 1, 'Michael': 2, 'Kate': 3})

Feature Selection

Selecting all dataset features except target variable, defining target variable

Data preparation
X = diabetes[diabetes.columns.difference(['Outcome'])]
y = diabetes['Outcome']
y = y.astype('int')

Selecting all dataset features except target variable, defining target variable

Data preparation
X = titanic[titanic.columns.difference(['Survived'])]
y = titanic['Survived']
y = y.astype('int')

Defining test vector representing individual with various attributes

Data preparation
osoba = np.array([10, #age
                  0, #fare
                  0, #parent/children
                  1, #pclass
                  0, #sex
                  3]) #siblings/spouses
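
Note that the attributes above are listed alphabetically: pandas' columns.difference (used to build X) returns the remaining columns in sorted order, so a raw vector fed to the model must follow that same ordering.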

Separating input features (X) from target variable (y)

Data preparation
X = df.drop(['Revenue'], axis = 1)
y = df['Revenue']

Selecting Temperature and Revenue attributes

Data preparation
X = df['Temperature'].values
y = df['Revenue'].values
# note: scikit-learn estimators expect 2-D inputs, so X.reshape(-1, 1) may be needed before fitting

Creating input feature set (used for feature names in tree visualization)

Data preparation
vstup = df.drop(["Revenue"], axis=1)

Separating input features (X) from target variable (y)

Data preparation
X = df.drop(columns = "AHD")
y = df['AHD']

Separating input features from target variable

Data preparation
X = vyber[vyber.columns.difference(['competitor'])]
y = vyber['competitor']
y = y.astype('int')

Preparing data for 4-class classification

Data preparation
X = multi[multi.columns.difference(['competitor'])]
y = to_categorical(multi['competitor'])

Splitting Data

Splitting data into training and test sets

Data preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
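
Without random_state, each run produces a different split. A variant with a fixed seed (an optional addition, not part of the snippet above):

Data preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)   # reproducible split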

Splitting data into train/test sets

Data preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)

Splitting data into train/test sets

Data preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

Building Decision Tree Model

Creating decision tree model with gini criterion

Modeling
clf = DecisionTreeClassifier(criterion='gini')
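
'gini' is scikit-learn's default criterion, so this call is equivalent to DecisionTreeClassifier(); a hypothetical variant using information gain instead:

Modeling
clf_entropy = DecisionTreeClassifier(criterion='entropy')   # hypothetical: split on information gain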

Creating decision tree model with default parameters

Modeling
clf = DecisionTreeClassifier()

Creating decision tree regression model

Modeling
regressor = DecisionTreeRegressor()

Creating StandardScaler model

Modeling
scaler = StandardScaler()

Creating and training logistic model

Modeling
log_reg = LogisticRegression(random_state=0).fit(X_train_scaled, y_train)

Model Training

Training decision tree on training data

Modeling
clf = clf.fit(X_train, y_train)

Training decision tree

Modeling
regressor.fit(X_train, y_train)

Training model for 50 epochs

Modeling
model.fit(X_train, y_train, epochs=50)

Training model for 200 epochs

Modeling
model.fit(X_train, y_train, epochs=200)

Model Prediction

Predicting values on test data

Modeling
y_pred = clf.predict(X_test)

Using decision tree model to predict for 'person'

Modeling
clf.predict(osoba)

Predicting values on test data

Modeling
y_pred = regressor.predict(X_test)

Predicting values on training data

Modeling
log_reg.predict(X_train_scaled)

Predicting probabilities for test data

Modeling
log_reg.predict_proba(X_test_scaled)

Predicting output for custom input vector

Modeling
vstup_q = np.array([[-4,8]])
classifier.predict(vstup_q)

Using model to predict test data (X_test)

Modeling
predictions = model.predict(X_test)

Generating predictions for test data

Evaluation
y_pred = model.predict(X_test).round()

Evaluating the Model

Evaluating model accuracy

Evaluation
print("Presnosť:",metrics.accuracy_score(y_test, y_pred))

Creating confusion matrix to analyze correct/incorrect predictions

Evaluation
confusion_matrix(y_test, y_pred, labels=[1,0])
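
To read the matrix programmatically, the four cells can be unpacked (a sketch using scikit-learn's default label order [0, 1] rather than the [1, 0] order above):

Evaluation
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()   # default labels=[0, 1]
print('TP:', tp, 'FP:', fp, 'FN:', fn, 'TN:', tn)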

Displaying maximum depth of trained decision tree

Evaluation
clf.get_depth()

Listing all parameters of trained decision tree

Evaluation
clf.get_params()

Evaluating model accuracy

Evaluation
accuracy_score(y_test, y_pred)

Calculating precision, recall, f1-score and support

Evaluation
precision_recall_fscore_support(y_test, y_pred, labels=[1, 0])

Printing precision, recall, f1-score and support

Evaluation
p, r, f, s = precision_recall_fscore_support(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print('acc:  ', metrics.accuracy_score(y_test, y_pred))
print('prec: ', (p[0]+p[1])/2, '(', p[0], '/', p[1], ')')
print('rec:  ', (r[0]+r[1])/2, '(', r[0], '/', r[1], ')')
print('f1-sc:', (f[0]+f[1])/2)

print(cm)

Creating DataFrame for result comparison

Evaluation
d = pd.DataFrame({'Real Values':y_test, 'Predicted Values':y_pred})

Calculating residual squares

Evaluation
d['sqr_res'] = pow((d['Real Values'] - d['Predicted Values']), 2)

Summing residual squares

Evaluation
d['sqr_res'].sum()

Evaluating model accuracy on training data

Evaluation
log_reg.score(X_train_scaled, y_train)

Evaluating model accuracy on test data

Evaluation
log_reg.score(X_test_scaled, y_test)

Displaying first 6 predictions

Evaluation
preds[:6]

Displaying first 6 actual values

Evaluation
y_test[:6]

Converting predictions to class labels

Evaluation
labels_predict=np.argmax(y_pred,axis=1)
labels_predict[:6]

Creating confusion matrix for evaluation

Evaluation
confusion_matrix(labels_predict, np.argmax(y_test, axis=1))

Printing classifier accuracy

Evaluation
print("Presnost: ",metrics.accuracy_score(labels_predict, np.argmax(y_test, axis=1)))

Visualizing Decision Trees

Visualizing decision tree

Deployment
from sklearn.tree import plot_tree  # needed for plot_tree

plt.figure(figsize=(40, 20))
plot_tree(regressor, feature_names=vstup.columns.tolist())

Data Analysis

Displaying basic statistical values of dataset

Data understanding
diabetes.describe()

Counting value frequency of attribute

Data understanding
Counter(diabetes.Outcome)

Counting frequency of 'AHD' attribute values

Data understanding
Counter(df.AHD)

Checking for missing values

Data understanding
df.isnull().sum()

Analyzing class distribution in data

Data preparation
Counter(vyber.competitor)

Displaying loaded text content

Data understanding
h_rights

Calculating total character count

Data preparation
len(h_rights)

Calculating number of unique words

Evaluation
len(set(h_rights.split()))

Finding longest word in text

Evaluation
max_len = 0
longest = ''
for w in slova:              # 'slova' = words
    if len(w) > max_len:
        max_len = len(w)
        longest = w

Displaying tweet dataset structure

Data understanding
tweets.head()

Applying word count function to entire dataset

Data preparation
tweets['word_count'] = tweets.apply(tweet_count, axis=1)

Calculating tweet character counts

Data preparation
tweets['char_count'] = tweets['tweet'].str.len()

Calculating average word length in tweets

Evaluation
tweets['avg_len'] = (tweets['char_count'] - (tweets['word_count'] - 1)) / tweets['word_count']
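
For example, a tweet "hello world" has char_count 11 and word_count 2, so avg_len = (11 - (2 - 1)) / 2 = 5: the single space is subtracted before dividing by the word count.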

Model Interpretation

Displaying decision path taken for 'person' prediction

Deployment
clf.decision_path(osoba).toarray()

Defining decision boundary line for visualization

Evaluation
def priamka(x):    # 'priamka' = line
    # solve W[0]*x + W[1]*y + b = 0 for y
    return -(W[0]*x + b) / W[1]

Printing model structure

Modeling
model.summary()

Advanced Visualization

Visualizing actual vs predicted values

Deployment
plt.scatter(X_test, y_test, color='red')
plt.scatter(X_test, y_pred, color='green')
plt.title('Decision Tree Regression')
plt.xlabel('Temperature')
plt.ylabel('Revenue')
plt.show()

Creating grid for smoother visualization

Deployment
X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))

Visualizing regression predictions over the grid

Deployment
plt.plot(X_grid, regressor.predict(X_grid), color='black')
plt.title('Decision Tree Regression')
plt.xlabel('Temperature')
plt.ylabel('Revenue')
plt.show()

Visualizing training data and decision boundary

Evaluation
cm = plt.cm.RdBu
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot()
ax.set_title("Result")

for x, expected in training_data:
    if expected == 1:
        vzor = 'r'    # 'vzor' = marker colour
    else:
        vzor = 'b'
    ax.scatter(x[0], x[1], color=vzor)

plt.plot([100,300],[priamka(100),priamka(300)])
plt.show() 

Visualizing decision boundary for XOR problem

Evaluation
cm = plt.cm.RdBu
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot()
ax.set_title("Result")

for x, expected in training_data:
    if expected == 1:
        vzor = 'r'
    else:
        vzor = 'b'
    ax.scatter(x[0], x[1], color=vzor)

plt.plot([0,8],[priamka(0),priamka(8)])
plt.show() 

Visualizing loss vs epochs relationship

Evaluation
plt.plot(range(1, len(classifier.cost) + 1), classifier.cost)
plt.title("Adaline: learn-rate 0.001")
plt.xlabel('Epochs')
plt.ylabel('Cost (Sum-of-Squares)')
plt.show()

Visualizing pairwise variable relationships

Evaluation
sns.pairplot(data, hue='competitor')

Data Exploration

Displaying dataset

Data understanding
diabetes

Displaying first 5 dataset rows for quick overview

Data understanding
titanic.head()

Neural Network Setup

Defining step activation function for the perceptron (inspired by the biological neuron's firing threshold)

Modeling
def aktivacna_fn(x):    # 'aktivacna_fn' = activation function
    if x >= 0:
        return 1
    else:
        return -1

Calculating neuron output (weighted sum of inputs + bias)

Modeling
def neuron(X, W, b):
    return aktivacna_fn(np.dot(X, W) + b)

Initializing weights and bias with chosen starting values

Modeling
W = array([-30, 300])
b = -1230
eta = 0.01
print('current weights: ', W)
print('bias: ', b)
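
With these starting values the neuron already classifies the first and third training examples correctly: for x = (121, 16.8) the net input is -30*121 + 300*16.8 - 1230 = -3630 + 5040 - 1230 = 180 >= 0, giving 1; for x = (210, 9.4) it is -6300 + 2820 - 1230 = -4710 < 0, giving -1.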

Training perceptron using delta rule (weight updates based on error)

Modeling
for i in range(0, 4):
    print('---')
    x, y = training_data[i]
    print('training data: ', x, ', expected: ', y)

    predikcia = neuron(x, W, b)    # 'predikcia' = prediction
    print('prediction: ', predikcia)
    chyba = y - predikcia          # 'chyba' = error
    if chyba != 0:
        print('weights need to be updated')
        W = W + (eta * chyba * x)
        b = b + (eta * chyba * 1)
    print('current weights: ', W)
    print('bias: ', b)

Predicting output for custom input vector

Modeling
vektor = array([100, 10])
neuron(vektor, W, b)

Initializing weights and bias with random values (XOR problem)

Modeling
r1 = random.randint(-100, 100)
r2 = random.randint(-100, 100)
W = array([r1, r2])
b = random.randint(-100, 100)
eta = 0.5
print('current weights: ', W)
print('bias: ', b)

Training perceptron in epochs (iterating through training data)

Modeling
uprava_vahy = True    # 'uprava_vahy' = weights-updated flag
epocha_id = 1         # 'epocha_id' = epoch number

while uprava_vahy:
    print('epoch: ', epocha_id)
    epocha_id += 1
    uprava_vahy = False
    for i in range(0, 3):
        print('---')
        x, y = training_data[i]
        predikcia = neuron(x, W, b)
        chyba = y - predikcia
        if chyba != 0:
            uprava_vahy = True
            W = W + (eta * chyba * x)
            b = b + (eta * chyba * 1)
        print('current weights: ', W, ', bias: ', b)

Defining sum of squared errors function

Modeling
def sum_squared_errors(y, output_pred):
    errors = y - output_pred
    return (errors**2).sum()/2.0

Calculating the loss for the sample expected/actual vectors

Modeling
sum_squared_errors(mal_byt, bol)
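
Here the errors are mal_byt - bol = [0, 2, 1, -1], so the returned loss is (0 + 4 + 1 + 1) / 2 = 3.0.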

Calculating weighted sum of inputs (neuron's internal potential)

Modeling
def vnutorny_potencial(X, weights):    # 'vnutorny_potencial' = net input
    return np.dot(X, weights)

Defining linear activation function for Adaline (identity function)

Modeling
def aktivacna_fn(x):
    return x 

Generating initial weights from normal distribution

Modeling
weights = random_gen.normal(loc = 0.0, scale = 0.01, size = biased_X.shape[1]) 

Initializing list for storing errors and calculating predictions

Modeling
cost = []
learn_rate = 0.5
output_pred = aktivacna_fn(vnutorny_potencial(biased_X, weights)) 

Calculating errors between actual and predicted values

Modeling
errors = y - output_pred

Updating weights using gradient descent

Modeling
weights += (learn_rate * biased_X.T.dot(errors))
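
This is one step of batch gradient descent on the cost J(w) = 1/2 * sum((y - output_pred)**2): the gradient with respect to the weights is -X.T.dot(errors), so adding learn_rate * biased_X.T.dot(errors) moves the weights in the direction of steepest descent.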

Displaying updated model weights

Modeling
weights

Calculating loss using sum of squared errors

Evaluation
cost_i = (errors**2).sum() / 2.0
cost_i = sum_squared_errors(y, output_pred)    # equivalent result via the helper function

Training Adaline model for 20 epochs

Modeling
for i in range(20):
    output_pred = aktivacna_fn(vnutorny_potencial(biased_X, weights))
    errors = y - output_pred
    weights += (learn_rate * biased_X.T.dot(errors))
    cost_i = (errors**2).sum() / 2.0
    cost.append(cost_i)

Implementing Adaline algorithm with automatic data scaling

Modeling
class Adaline(object):

    def __init__(self, learn_rate = 0.001, iterations = 10000):
        self.learn_rate = learn_rate
        self.iterations = iterations

    def fit(self, X, y, biased_X = False, standardised_X = False):
        if not standardised_X:
            X = self._standardise_features(X)
        if not biased_X:
            X = self._add_bias(X)
        self._initialise_weights(X)
        self.cost = []

        for cycle in range(self.iterations):
            output_pred = self._activation(self._net_input(X))
            errors = y - output_pred
            self.weights += (self.learn_rate * X.T.dot(errors))
            cost = (errors**2).sum() / 2.0
            self.cost.append(cost)
        return self

    def _net_input(self, X):
        return np.dot(X, self.weights)

    def predict(self, X, biased_X=False):
        if not biased_X:
            X = self._add_bias(X)
        return np.where(self._activation(self._net_input(X)) >= 0.0, 1, 0)

    def _add_bias(self, X):
        bias = np.ones((X.shape[0], 1))
        biased_X = np.hstack((bias, X))
        return biased_X

    def _initialise_weights(self, X):
        random_gen = np.random.RandomState(1)
        self.weights = random_gen.normal(loc = 0.0, scale = 0.01, size = X.shape[1])
        return self

    def _standardise_features(self, X):
        X_norm = (X - np.mean(X, axis=0)) / np.std(X, axis = 0)
        return X_norm

    def _activation(self, X):
        return X 

Creating and training Adaline classifier

Modeling
classifier = Adaline(learn_rate = 0.001, iterations = 100)
a = classifier.fit(X, y)

Displaying final trained model weights

Modeling
a.weights

Defining sequential model with three layers

Modeling
model = Sequential()
model.add(Dense(48,input_shape=(6,),activation="sigmoid"))
model.add(Dense(6,activation="sigmoid"))
model.add(Dense(1))

Compiling model with Adam optimizer and MSE loss

Modeling
model.compile(optimizer="adam", loss="mse") 

Defining model for multi-class classification

Modeling
model = Sequential()
model.add(Dense(4,input_shape=(2,),activation="relu"))
model.add(Dense(4,activation="softmax"))

Compiling model with categorical cross-entropy

Modeling
model.compile(optimizer="adam", loss="categorical_crossentropy")  # categorical, not binary: the target is one-hot encoded with 4 classes

Defining lambda function for simple addition

Modeling
x = lambda a: a + 100
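
For example, x(5) returns 105.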

Text Processing

Cleaning text (removing special characters/punctuation)

Data preparation
h_rights = h_rights.replace('\n', ' ')
h_rights = h_rights.replace("\ufeff", ' ')
h_rights = h_rights.replace(',', ' ')
h_rights = h_rights.replace('.', ' ')

Tokenizing text into words

Data preparation
slova = h_rights.split()

Function for counting words in tweets

Modeling
def tweet_count(row):
    my_var = row['tweet']
    return len(my_var.split())

Basic regex match test

Modeling
re.match('abc','abcdefgh')
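
re.match only succeeds at the beginning of the string; it matches here because 'abcdefgh' starts with 'abc'. A quick contrast with re.search:

Modeling
re.match('def', 'abcdefgh')    # None: 'def' is not at the start
re.search('def', 'abcdefgh')   # match: search scans the whole string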

Detecting hashtags with regex

Modeling
re.search('#[A-Za-z0-9]+', tweet)

Extracting all hashtags from tweet

Modeling
[w for w in tweet.split() if re.search('#[A-Za-z0-9]+', w)]

Demonstrating regex findall for 'b.+ing' pattern

Modeling
sentence1 = "In the beginning was the Word"
re.findall("b.+ing", sentence1)

Validating emails with regex

Modeling
sent = 'My email is jkapusta@ukf.sk and my colleague has mdrlik@ukf.sk . This is the bad email: jkkkapusta@u.k'
[w for w in sent.split(" ") if re.search(r"[a-z]+@[a-z.]+\.[a-z]{2,3}$", w)]
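
With the escaped dot, jkapusta@ukf.sk and mdrlik@ukf.sk are returned, while jkkkapusta@u.k is rejected: only one letter follows the final dot, and the pattern requires 2-3.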

Validating emails with regex

Evaluation
[w for w in sent.split(' ') if re.search(r"^[a-zA-Z0-9+_.-]{1,64}@[a-zA-Z0-9-]{1,255}\.[a-zA-Z0-9.-]{2,}$", w)]

Tokenizing text into words

Data preparation
tokens = word_tokenize(text)

Normalizing text to lowercase

Data preparation
smalym = text.lower()

Tokenizing normalized text

Data preparation
word_tokenize(smalym)

Creating word frequency distribution

Evaluation
v = Counter(word_tokenize(smalym))

Getting top 5 frequent words

Evaluation
v.most_common(5)

Web Scraping

Downloading university contacts page (source text for email extraction)

Modeling
link = "http://www.tu.ff.ukf.sk/kontakty"
stranka = requests.get(link)    # 'stranka' = page
sent = stranka.text