crsp_siz_to_ciz.md (8776B)
1 # Transitioning to the new CRSP file format 2 3 CRSP is changing the way they disseminate price data. 4 This is mostly relevant for the daily and monthly stock price data. 5 6 General information and code examples are available on [WRDS Website](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/stocks-and-indices/crsp-stock-and-indexes-version-2/). 7 I try to provide a short guide about how I went about converting some of the most basic crsp filters to the new format. 8 Note that the legacy files are named `SIZ` (CRSP 1.0) and the new file format is `CIZ` (CRSP 2.0). 9 10 WRDS has excellent guides, and what follows is mainly for my reference and how we would do this simply in julia. 11 [crsp-ciz-faq](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/stocks-and-indices/crsp-stock-and-indexes-version-2/crsp-ciz-faq/) 12 13 ## Getting mapping tables from old to new formats 14 First, I am going to reference the main mapping and metadata tables that allow to quickly convert old code into modern one. 15 These tables are available from the WRDS postgres server 16 17 ```julia 18 using FinanceRoutines 19 using DataPipes, DataFrames, DataFramesMeta 20 import LibPQ: LibPQ.execute, LibPQ.Connection # to connect directly to the server 21 import Tables: columntable 22 23 wrds_conn = FinanceRoutines.open_wrds_pg(); # open a wrds connection with credentials 24 see(df) = show(df, allcols=true, allrows=true, truncate=0) # to view df without column truncation 25 ``` 26 27 ```julia 28 const get_postgres_table = FinanceRoutines._get_postgres_table 29 # main table for converting columns 30 siz_to_ciz = get_postgres_table("crsp", "metasiztociz"; wrds_conn=wrds_conn) |> DataFrame 31 # flag information 32 flag_info = get_postgres_table("crsp", "metaFlagInfo"; wrds_conn=wrds_conn) |> DataFrame 33 flag_coverage = get_postgres_table("crsp", "metaFlagCoverage"; wrds_conn=wrds_conn) |> DataFrame 34 item_info = get_postgres_table("crsp", "metaItemInfo"; wrds_conn=wrds_conn) |> DataFrame 35 36 stock_names = get_postgres_table("crsp", "stocknames"; wrds_conn=wrds_conn) |> DataFrame 37 stock_info_hist = get_postgres_table("crsp", "StkSecurityInfoHist"; wrds_conn=wrds_conn) |> DataFrame 38 ``` 39 40 ## Datasets 41 42 I do not fully understand the difference between `stkmthsecuritydata` and `msf_v2` dataset (first is underlying data, second is somehow merged) 43 44 In one of the transition slides, there is a dataset mapping from `SIZ` to `CIZ` and the final datasets `DSF_V2` and `MSF_V2`. 45 46 | Old File Format: MSF 1.0 or **SIZ** | New File Format: MSF 2.0 **CIZ** | **WRDS** | 47 |-------------------------------------|----------------------------------|------------------| 48 | `DSF` | `stkDlySecurityData` | `DSF_V2` | 49 | `MSF` | `stkMthSecurityData` | `MSF_V2` | 50 | ` ` | | | 51 | `StockNames` | `stkSecurityInfoHist` | `StockNames_V2` | 52 | `DSE` | `stkDelists` | | 53 | `MSE` | `stkDistributions` | | 54 | ` ` | | | 55 | `DSI` | `indDlySeriesData`(`_ind`) | | 56 | `MSI` | `indMthSeriesData`(`_ind`) | | 57 |-------------------------------------|----------------------------------|------------------| 58 59 60 61 62 ### Main Stock Files 63 64 ```julia 65 postgre_query = """ 66 SELECT * 67 FROM crsp.msf_v2 68 WHERE mthcaldt >= '2000-01-01' AND mthcaldt <= '2002-01-01' 69 """ 70 msf_v2 = execute(wrds_conn, postgre_query) |> columntable |> DataFrame 71 72 postgre_query = """ 73 SELECT * 74 FROM crsp.stkmthsecuritydata 75 WHERE mthcaldt >= '2000-01-01' AND mthcaldt <= '2002-01-01' 76 """ 77 stkmthsecuritydata = execute(wrds_conn, postgre_query) |> columntable |> DataFrame 78 setdiff(names(msf_v2), names(stkmthsecuritydata)) 79 ``` 80 81 ### Information on Stocks 82 83 ```julia 84 postgre_query = "SELECT * FROM crsp.stkSecurityInfoHist" 85 stksecurityinfohist = execute(wrds_conn, postgre_query) |> columntable |> DataFrame 86 postgre_query = "SELECT * FROM crsp.stocknames_v2" 87 stocknames_v2 = execute(wrds_conn, postgre_query) |> columntable |> DataFrame 88 89 names(stksecurityinfohist) 90 names(stocknames_v2) 91 ``` 92 93 ### Index Files 94 95 ```julia 96 indmthseriesdata = execute(wrds_conn, "SELECT * FROM crsp.indmthseriesdata") |> DataFrame 97 98 # more information on indices 99 indseriesinfohdr = execute(wrds_conn, "SELECT * FROM crsp.IndSeriesInfoHdr") |> DataFrame |> see 100 indfamilyinfohdr = execute(wrds_conn, "SELECT * FROM crsp.IndFamilyInfoHdr") |> DataFrame |> see 101 stkindmembership = execute(wrds_conn, "SELECT * FROM crsp.stkindmembership_ind") |> DataFrame |> see 102 ``` 103 104 105 106 107 108 ## Standard Filters 109 110 ### CRSP Share Codes Filters 111 It is standard to impose in the legacy file that the share codes is either `10` or `11`. 112 For transparency, CRSP replaced the variable `SHRCD` with multiple flags that convey the information more clearly. 113 114 First we are going to want to see the mapping in the metadata mapping table: 115 ```julia 116 @rsubset(siz_to_ciz, :sizitemname == "SHRCD") 117 # see how the split is down precisely 118 unique(@rsubset(siz_to_ciz, :sizitemname == "SHRCD"), :cizitemname) 119 ``` 120 We have five different flags that correspond to the legacy share codes. 121 How to map specifically the share code: recall that the first digit `1` translates to "ordinary common shares" and the second digit `0` or `1` translates to securities which "have not been further defined" or "need not be further defined" respectively. 122 123 The new flags are `ShareType`, `SecurityType`, `SecuritySubType`, `USIncFlg`, and `IssuerType`. 124 We can look at `ShareType` in the `metaFlagInfo` table: 125 ```julia 126 @rsubset(flag_info, contains(:flagtypedesc, r"share.*type"i)) 127 ``` 128 We can view how they map 129 ```julia 130 @p outerjoin( 131 unique(select(stock_names, :permno, :shrcd)), 132 unique(select(stock_info_hist, :permno, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype)), 133 on = :permno) |> 134 @rsubset(__, :shrcd ∈ [10, 11]) |> 135 groupby(__, [:shrcd, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype]) |> 136 combine(__, nrow) 137 ``` 138 The mapping at this point is less than obvious, so we gather some more information on the meaning of the relevant flags: 139 ```julia 140 import Unicode 141 function get_info_flag(flag_name::String) 142 innerjoin(flag_info, 143 select(unique( 144 @rsubset(flag_coverage, Unicode.normalize(:itemname, casefold=true)==Unicode.normalize(flag_name, casefold=true)), 145 :flagvalue), :flagkey), 146 on = :flagkey) 147 end 148 149 get_info_flag("ShareType") |> see 150 get_info_flag("SecurityType") |> see 151 get_info_flag("SecuritySubType") |> see 152 get_info_flag("USIncFlg") |> see 153 get_info_flag("IssuerType") |> see 154 ``` 155 156 To which it appears more clear that the proper mapping will be 157 ```julia 158 stock_info_hist_subset = @rsubset(stock_info_hist, 159 :sharetype ∈ ["NS", "N/A"], :securitytype=="EQTY", :securitysubtype=="COM", :issuertype ∈ ["ACOR", "CORP"], :usincflg=="Y") 160 @p outerjoin( 161 unique(select(stock_names, :permno, :shrcd)), 162 unique(select(stock_info_hist_subset, :permno, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype)), 163 on = :permno) |> 164 groupby(__, [:shrcd, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype]) |> 165 combine(__, nrow) 166 ``` 167 There still seems to be some discrepancy in the mapping. 168 If we do not want to worry, we simply use the [CRSP cross reference guide](https://www.crsp.org/wp-content/uploads/guides/CRSP_Cross_Reference_Guide_1.0_to_2.0.pdf) which leads us to this [mapping table](https://www.crsp.org/wp-content/uploads/ShareCode.html). 169 170 171 ### Exchange Filters 172 The legacy filters set the exchange code variable to `1`, `2`, or `3` (respectively for NYSE, ASE, or Nasdaq). 173 There is almost a direct mapping for exchange filters, though it also relies on two flag variables `conditionaltype` and `TradingStatusFlg` to account for halted or suspended trading (which were previously `-1` and `-2`). 174 Thus new version of the filter would read: 175 ```julia 176 stock_info_hist_subset = @rsubset(stock_info_hist, :primaryexch ∈ ["N", "A", "Q"]) 177 unique(innerjoin(stock_names, select(stock_info_hist_subset, :permno, :primaryexch), on = :permno), [:exchcd, :primaryexch]) 178 ``` 179 180 To remove halted trading we can filter the additional flags: 181 ```julia 182 stock_info_hist_subset = @rsubset(stock_info_hist, 183 :primaryexch ∈ ["N", "A", "Q"], :conditionaltype == "RW", :tradingstatusflg == "A") 184 get_info_flag("conditionaltype") |> see 185 get_info_flag("TradingStatusFlg") |> see 186 ``` 187 188 189 190 191