Some useful tips in Python

Ruoxinli
Jan 1, 2021
1. Extract numbers from a string

Some data arrives as strings that contain numbers we need for calculations. The code below extracts every number from a string, whether it is an integer or a decimal. Numbers are usually separated from the surrounding words by spaces, but sometimes they sit directly next to a word (for example, "12.5kg").

import re

# a is the input string; this sample value is only an illustration
a = "Weight 12.5kg, 3 boxes"
# Step 1: insert a space on both sides of every number using a regular expression
b = re.sub(r"([0-9]+(\.[0-9]+)?)", r" \1 ", a).strip()
# Step 2: create a list of all the numbers in the string
num_lst = [float(s) for s in b.split() if re.fullmatch(r"[0-9]+(\.[0-9]+)?", s)]
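To make the two steps easier to reuse, they can be wrapped in a small function. This is only a sketch: the name extract_numbers and the sample sentence are made up for illustration, and the pattern handles unsigned integers and decimals only (no minus signs, thousands separators, or scientific notation).

```python
import re

# Pattern for an unsigned integer or decimal, e.g. "3" or "12.5".
NUMBER = r"[0-9]+(?:\.[0-9]+)?"

def extract_numbers(text):
    """Return every integer or decimal in text as a float (hypothetical helper)."""
    # Step 1: pad each number with spaces so it is separated from adjacent words.
    spaced = re.sub(rf"({NUMBER})", r" \1 ", text).strip()
    # Step 2: keep only the tokens that are entirely a number.
    return [float(tok) for tok in spaced.split() if re.fullmatch(NUMBER, tok)]

sample = "Paid 12.5USD for 3 items, 40% off"
print(extract_numbers(sample))  # [12.5, 3.0, 40.0]
```

If the padded string is not needed for anything else, re.findall with the same pattern would probably give the same list in one step.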

2. Create Bins for categorical data with numbers

Sometimes we want to treat numeric data, such as years, NAICS codes, or ratings, as categorical data. To do that, we create bins that group the values into ranges. The example below bins years from 1916 to 2017 with the pandas cut function.

import pandas as pd
import numpy as np

# df is assumed to be an existing DataFrame with a numeric 'year' column
# bin the years and convert them into dummies
bin_year = [1916, 1991, 2006, 2010, 2013, 2017]
year_range = ['1916-1991', '1991-2006', '2006-2010', '2010-2013', '2013-2017']
# include_lowest=True keeps 1916 itself in the first bin; by default the left edge is excluded
year_bin = pd.cut(df['year'], bin_year, labels=year_range, include_lowest=True)
# get dummies
d_year = pd.get_dummies(year_bin).astype(np.int64)
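Here is a small self-contained run of the same idea, so the bin edges can be checked by eye. The DataFrame contents are invented sample data, not from a real dataset. One thing worth noticing: pd.cut builds right-closed intervals by default, so a year equal to the lowest edge (1916) falls outside every bin unless include_lowest=True is passed.

```python
import pandas as pd

# Hypothetical sample data, chosen to land on and between the bin edges.
df = pd.DataFrame({"year": [1916, 1950, 1991, 2006, 2008, 2017]})

bin_year = [1916, 1991, 2006, 2010, 2013, 2017]
year_range = ["1916-1991", "1991-2006", "2006-2010", "2010-2013", "2013-2017"]

# Default behaviour: intervals are (left, right], so 1916 gets no bin (NaN).
default_bin = pd.cut(df["year"], bins=bin_year, labels=year_range)

# include_lowest=True closes the first interval on the left as well.
year_bin = pd.cut(df["year"], bins=bin_year, labels=year_range, include_lowest=True)

# One indicator column per bin, stored as integers.
d_year = pd.get_dummies(year_bin).astype(int)

print(default_bin.isna().tolist())
print(year_bin.astype(str).tolist())
print(d_year)
```

Because each interval is closed on the right, a boundary year such as 1991 goes to '1916-1991' rather than '1991-2006', which matters when the labels look like they overlap.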

I will keep adding to this list.
