crsp_siz_to_ciz.md - FinanceRoutines.jl - Financial data routines for Julia

crsp_siz_to_ciz.md (8776B)
      1 # Transitioning to the new CRSP file format
      2 
      3 CRSP is changing the way they disseminate price data.
      4 This is mostly relevant for the daily and monthly stock price data.
      5 
      6 General information and code examples are available on [WRDS Website](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/stocks-and-indices/crsp-stock-and-indexes-version-2/).
      7 I try to provide a short guide about how I went about converting some of the most basic crsp filters to the new format.
      8 Note that the legacy files are named `SIZ` (CRSP 1.0) and the new file format is `CIZ` (CRSP 2.0). 
      9 
     10 WRDS has excellent guides, and what follows is mainly for my reference and how we would do this simply in julia.
     11 [crsp-ciz-faq](https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/stocks-and-indices/crsp-stock-and-indexes-version-2/crsp-ciz-faq/)
     12 
     13 ## Getting mapping tables from old to new formats
     14 First, I am going to reference the main mapping and metadata tables that allow to quickly convert old code into modern one.
     15 These tables are available from the WRDS postgres server
     16 
     17 ```julia
     18 using FinanceRoutines
     19 using DataPipes, DataFrames, DataFramesMeta
     20 import LibPQ: LibPQ.execute, LibPQ.Connection # to connect directly to the server
     21 import Tables: columntable
     22 
     23 wrds_conn = FinanceRoutines.open_wrds_pg(); # open a wrds connection with credentials
     24 see(df) = show(df, allcols=true, allrows=true, truncate=0) # to view df without column truncation
     25 ```
     26 
     27 ```julia
     28 const get_postgres_table = FinanceRoutines._get_postgres_table
     29 # main table for converting columns
     30 siz_to_ciz = get_postgres_table("crsp", "metasiztociz"; wrds_conn=wrds_conn) |> DataFrame 
     31 # flag information
     32 flag_info = get_postgres_table("crsp", "metaFlagInfo"; wrds_conn=wrds_conn) |> DataFrame 
     33 flag_coverage = get_postgres_table("crsp", "metaFlagCoverage"; wrds_conn=wrds_conn) |> DataFrame 
     34 item_info = get_postgres_table("crsp", "metaItemInfo"; wrds_conn=wrds_conn) |> DataFrame 
     35 
     36 stock_names = get_postgres_table("crsp", "stocknames"; wrds_conn=wrds_conn) |> DataFrame 
     37 stock_info_hist = get_postgres_table("crsp", "StkSecurityInfoHist"; wrds_conn=wrds_conn) |> DataFrame 
     38 ```
     39 
     40 ## Datasets
     41 
     42 I do not fully understand the difference between `stkmthsecuritydata` and `msf_v2` dataset (first is underlying data, second is somehow merged)
     43 
     44 In one of the transition slides, there is a dataset mapping from `SIZ` to `CIZ` and the final datasets `DSF_V2` and `MSF_V2`. 
     45 
     46 | Old File Format: MSF 1.0 or **SIZ** | New File Format: MSF 2.0 **CIZ** | **WRDS**         |
     47 |-------------------------------------|----------------------------------|------------------|
     48 | `DSF`                               | `stkDlySecurityData`             | `DSF_V2`         |
     49 | `MSF`                               | `stkMthSecurityData`             | `MSF_V2`         |
     50 | ` `                                 |                                  |                  |
     51 | `StockNames`                        | `stkSecurityInfoHist`            | `StockNames_V2`  |
     52 | `DSE`                               | `stkDelists`                     |                  |
     53 | `MSE`                               | `stkDistributions`               |                  |
     54 | ` `                                 |                                  |                  |
     55 | `DSI`                               | `indDlySeriesData`(`_ind`)       |                  |
     56 | `MSI`                               | `indMthSeriesData`(`_ind`)       |                  |
     57 |-------------------------------------|----------------------------------|------------------|
     58 
     59 
     60 
     61 
     62 ### Main Stock Files
     63 
     64 ```julia
     65 postgre_query = """
     66 SELECT *
     67     FROM crsp.msf_v2
     68     WHERE mthcaldt >= '2000-01-01' AND mthcaldt <= '2002-01-01'
     69 """
     70 msf_v2 = execute(wrds_conn, postgre_query) |> columntable |> DataFrame
     71 
     72 postgre_query = """
     73 SELECT *
     74     FROM crsp.stkmthsecuritydata
     75     WHERE mthcaldt >= '2000-01-01' AND mthcaldt <= '2002-01-01'
     76 """
     77 stkmthsecuritydata = execute(wrds_conn, postgre_query) |> columntable |> DataFrame
     78 setdiff(names(msf_v2), names(stkmthsecuritydata))
     79 ```
     80 
     81 ### Information on Stocks
     82 
     83 ```julia
     84 postgre_query = "SELECT * FROM crsp.stkSecurityInfoHist"
     85 stksecurityinfohist = execute(wrds_conn, postgre_query) |> columntable |> DataFrame
     86 postgre_query = "SELECT * FROM crsp.stocknames_v2"
     87 stocknames_v2 = execute(wrds_conn, postgre_query) |> columntable |> DataFrame
     88 
     89 names(stksecurityinfohist)
     90 names(stocknames_v2)
     91 ```
     92 
     93 ### Index Files
     94 
     95 ```julia
     96 indmthseriesdata = execute(wrds_conn, "SELECT * FROM crsp.indmthseriesdata") |> DataFrame
     97 
     98 # more information on indices
     99 indseriesinfohdr = execute(wrds_conn, "SELECT * FROM crsp.IndSeriesInfoHdr") |> DataFrame |> see
    100 indfamilyinfohdr = execute(wrds_conn, "SELECT * FROM crsp.IndFamilyInfoHdr") |> DataFrame |> see
    101 stkindmembership = execute(wrds_conn, "SELECT * FROM crsp.stkindmembership_ind") |> DataFrame |> see
    102 ```
    103 
    104 
    105 
    106 
    107 
    108 ## Standard Filters
    109 
    110 ### CRSP Share Codes Filters
    111 It is standard to impose in the legacy file that the share codes is either `10` or `11`. 
    112 For transparency, CRSP replaced the variable `SHRCD` with multiple flags that convey the information more clearly. 
    113 
    114 First we are going to want to see the mapping in the metadata mapping table:
    115 ```julia
    116 @rsubset(siz_to_ciz, :sizitemname == "SHRCD")
    117 # see how the split is down precisely
    118 unique(@rsubset(siz_to_ciz, :sizitemname == "SHRCD"), :cizitemname)
    119 ```
    120 We have five different flags that correspond to the legacy share codes. 
    121 How to map specifically the share code: recall that the first digit `1` translates to "ordinary common shares" and the second digit `0` or `1` translates to securities which "have not been further defined" or "need not be further defined" respectively.
    122 
    123 The new flags are `ShareType`, `SecurityType`, `SecuritySubType`, `USIncFlg`, and `IssuerType`.
    124 We can look at `ShareType` in the `metaFlagInfo` table:
    125 ```julia
    126 @rsubset(flag_info, contains(:flagtypedesc, r"share.*type"i))
    127 ```
    128 We can view how they map 
    129 ```julia
    130 @p outerjoin(
    131     unique(select(stock_names, :permno, :shrcd)),
    132     unique(select(stock_info_hist, :permno, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype)),
    133     on = :permno) |>
    134     @rsubset(__, :shrcd ∈ [10, 11]) |>
    135     groupby(__, [:shrcd, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype]) |>
    136     combine(__, nrow)
    137 ```
    138 The mapping at this point is less than obvious, so we gather some more information on the meaning of the relevant flags:
    139 ```julia
    140 import Unicode
    141 function get_info_flag(flag_name::String) 
    142     innerjoin(flag_info,
    143         select(unique(
    144             @rsubset(flag_coverage, Unicode.normalize(:itemname, casefold=true)==Unicode.normalize(flag_name, casefold=true)), 
    145             :flagvalue), :flagkey),
    146         on = :flagkey)
    147 end
    148 
    149 get_info_flag("ShareType") |> see
    150 get_info_flag("SecurityType") |> see
    151 get_info_flag("SecuritySubType") |> see
    152 get_info_flag("USIncFlg") |> see
    153 get_info_flag("IssuerType") |> see
    154 ```
    155 
    156 To which it appears more clear that the proper mapping will be
    157 ```julia
    158 stock_info_hist_subset = @rsubset(stock_info_hist, 
    159     :sharetype ∈ ["NS", "N/A"], :securitytype=="EQTY", :securitysubtype=="COM", :issuertype ∈ ["ACOR", "CORP"], :usincflg=="Y")
    160 @p outerjoin(
    161     unique(select(stock_names, :permno, :shrcd)),
    162     unique(select(stock_info_hist_subset, :permno, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype)),
    163     on = :permno) |>
    164     groupby(__, [:shrcd, :sharetype, :securitytype, :securitysubtype, :usincflg, :issuertype]) |>
    165     combine(__, nrow)
    166 ```
    167 There still seems to be some discrepancy in the mapping.
    168 If we do not want to worry, we simply use the [CRSP cross reference guide](https://www.crsp.org/wp-content/uploads/guides/CRSP_Cross_Reference_Guide_1.0_to_2.0.pdf) which leads us to this [mapping table](https://www.crsp.org/wp-content/uploads/ShareCode.html).
    169 
    170 
    171 ### Exchange Filters
    172 The legacy filters set the exchange code variable to `1`, `2`, or `3` (respectively for NYSE, ASE, or Nasdaq).
    173 There is almost a direct mapping for exchange filters, though it also relies on two flag variables `conditionaltype` and `TradingStatusFlg` to account for halted or suspended trading (which were previously `-1` and `-2`).
    174 Thus new version of the filter would read:
    175 ```julia
    176 stock_info_hist_subset = @rsubset(stock_info_hist, :primaryexch ∈ ["N", "A", "Q"])
    177 unique(innerjoin(stock_names, select(stock_info_hist_subset, :permno, :primaryexch), on = :permno), [:exchcd, :primaryexch])
    178 ```
    179 
    180 To remove halted trading we can filter the additional flags:
    181 ```julia
    182 stock_info_hist_subset = @rsubset(stock_info_hist, 
    183     :primaryexch ∈ ["N", "A", "Q"], :conditionaltype == "RW", :tradingstatusflg == "A")
    184 get_info_flag("conditionaltype") |> see
    185 get_info_flag("TradingStatusFlg") |> see
    186 ```
    187 
    188 
    189 
    190 
    191
	FinanceRoutines.jl Financial data routines for Julia
	Log \| Files \| Refs \| README \| LICENSE