# Pair Database


## Revision as of 11:36, 30 March 2015

This is the manual for the new pair database system, released on March 30, 2015. The new database fully replaces the older solution introduced in 2012, which was based on studies applied to all sector-based pairs. The new solution is standalone (no longer based on studies) and also includes cointegration and orthogonal measures.

Unlike the old system, this pair database includes all brute-force combinations of qualified US equities **regardless of sector**. An equity qualifies for the database if, over the last 2 months of the screening period, its average price exceeds 0.2 USD and its average volume exceeds 80k. For the database period of Jan 1st 2012 - Jan 1st 2015, this gives more than 10,000,000 pairs.
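The qualification rule and the brute-force pairing can be sketched as follows. This is only an illustration, not the database's actual code: the function names and the sample universe are hypothetical, "80k" is read as 80,000, and pairs are assumed to be unordered (A/B and B/A count as one pair).

```python
from itertools import combinations
from statistics import mean

MIN_AVG_PRICE = 0.2       # USD threshold taken from the text
MIN_AVG_VOLUME = 80_000   # "80k" from the text, read here as 80,000 (assumption)


def is_qualified(prices, volumes):
    """Check one equity using data from the last 2 months of the screening period."""
    return mean(prices) > MIN_AVG_PRICE and mean(volumes) > MIN_AVG_VOLUME


def all_pairs(symbols):
    """Every unordered combination of two symbols, regardless of sector."""
    return list(combinations(sorted(symbols), 2))


def pair_count(n):
    """Number of unordered pairs that n equities produce: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical sample universe: symbol -> (prices, volumes)
universe = {
    "AAA": ([10.0, 10.5, 11.0], [150_000, 120_000, 130_000]),
    "BBB": ([0.10, 0.15, 0.12], [900_000, 800_000, 850_000]),  # average price too low
    "CCC": ([5.0, 5.2, 5.1], [50_000, 60_000, 40_000]),        # average volume too low
    "DDD": ([30.0, 31.0, 29.0], [200_000, 210_000, 190_000]),
    "EEE": ([2.0, 2.1, 2.2], [90_000, 95_000, 100_000]),
}

qualified = [s for s, (p, v) in universe.items() if is_qualified(p, v)]
pairs = all_pairs(qualified)
```

Under the unordered-pair assumption, `pair_count(4473)` is the first value above 10,000,000, so a database of that size implies at least 4,473 qualified equities.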

For each pair, we took the last 3 years of data and calculated profit measures and orthogonal stats, performed cointegration tests, and so on. In practice, we have included most of the data you can see and play with in the pair analyzer.

We have set up a high-performance cluster that allows us to perform hundreds of tests and calculations for each of these more than 10,000,000 pairs. All the data is then indexed in a special database for fast searching. The system can execute all kinds of complex queries on this database in less than 30 seconds.

## Data description

For each pair, we have performed certain tests and calculations. You can filter pairs using any of these measures and test results. You can also sort results by some of these measures.

### General statistics

- Slow correlation - average correlation with period 240 applied to the whole period, a value between -1 and 1
- Fast correlation - average correlation with period 60 applied to the whole period, a value between -1 and 1
- Half-life of ratio - [half-life](http://marcoagd.usuarios.rdc.puc-rio.br/half-life.html) of the ratio series
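One plausible reading of "average correlation with period N applied to the whole period" is a rolling Pearson correlation over windows of N observations, averaged across all windows. The sketch below follows that reading. Whether the database correlates prices or returns is not stated here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant window; 0 is an assumption
    return sxy / sqrt(sxx * syy)


def average_rolling_correlation(a, b, period):
    """Mean of the correlations computed over every window of `period` observations."""
    if len(a) != len(b):
        raise ValueError("both legs must have the same length")
    if len(a) < period:
        raise ValueError("not enough data for one full window")
    values = [
        pearson(a[i:i + period], b[i:i + period])
        for i in range(len(a) - period + 1)
    ]
    return sum(values) / len(values)


# Hypothetical series: leg_b is a linear function of leg_a, leg_c moves against it
leg_a = [float((i * 7) % 11) for i in range(300)]
leg_b = [2.0 * x + 1.0 for x in leg_a]
leg_c = [-x for x in leg_a]

slow = average_rolling_correlation(leg_a, leg_b, 240)  # "slow" period from the text
fast = average_rolling_correlation(leg_a, leg_c, 60)   # "fast" period from the text
```

Because each window's correlation lies between -1 and 1, the average does too. In this toy data `slow` is approximately 1 and `fast` is approximately -1.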

One important point at this stage: some measures (such as the half-life) depend on the variable ordering. What does this mean? A pair is created from two equities (legs), for instance A and B. The variable ordering determines whether the pair is built as A/B or as B/A. For some measures (such as correlation) the ordering does not matter. For others it does: the half-life of the ratio series, for instance, differs because the A/B pair has a different ratio series from the B/A pair.

How has this been solved in the database? For all cases where the ordering matters, we calculate the measure for each order (the half-life of A/B and the half-life of B/A in this case).
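To see why the ordering matters, here is a small sketch that estimates the half-life of a ratio series in both orders. It uses a common estimator, a least-squares fit of the change in the series on its lagged value with half-life = -ln(2) / slope. This is an assumption for illustration and not necessarily the estimator the database uses. The legs are synthetic.

```python
from math import log


def half_life(series):
    """Half-life from an OLS fit of (y[t+1] - y[t]) on y[t]; inf if no mean reversion."""
    lagged = series[:-1]
    delta = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_d = sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, delta))
    slope = sxy / sxx
    if slope >= 0:
        return float("inf")  # the series does not revert under this estimator
    return -log(2) / slope


def ratio(leg_x, leg_y):
    """Ratio series of two legs, element by element."""
    return [x / y for x, y in zip(leg_x, leg_y)]


# Synthetic legs: the A/B ratio decays towards 1, halving its distance each step
leg_b = [20.0 + 0.5 * t for t in range(12)]
leg_a = [b * (1.0 + 0.5 ** t) for t, b in enumerate(leg_b)]

hl_ab = half_life(ratio(leg_a, leg_b))  # half-life of A/B
hl_ba = half_life(ratio(leg_b, leg_a))  # half-life of B/A
```

In this example the A/B ratio follows an exactly linear reversion, so `hl_ab` is about ln(2) / 0.5. The B/A ratio is the reciprocal series, whose reversion is no longer linear, so the same estimator returns a longer half-life.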

Then we apply this trick:

- we will save the better result of the measure (over A/B vs B/A) and we will call it **the best case scenario** (loose matching when searching)
- we will save the worse result of the measure (over A/B vs B/A) and we will call it **the worst case scenario** (strict matching when searching)
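The best case / worst case idea can be sketched as below. The class and function names are hypothetical, and the sketch assumes that a lower value is better (as one would expect for a half-life). Loose matching is read as "at least one ordering passes the filter" and strict matching as "both orderings pass".

```python
from dataclasses import dataclass


@dataclass
class OrderedMeasure:
    """Stored summary of a measure that depends on the A/B vs B/A ordering."""
    best: float   # the better of the two results (best case scenario)
    worst: float  # the worse of the two results (worst case scenario)


def summarize(value_ab, value_ba, lower_is_better=True):
    """Keep the better and the worse result over the two orderings."""
    low, high = sorted((value_ab, value_ba))
    if lower_is_better:
        return OrderedMeasure(best=low, worst=high)
    return OrderedMeasure(best=high, worst=low)


def matches(measure, limit, strict, lower_is_better=True):
    """Loose search tests the best case; strict search tests the worst case."""
    value = measure.worst if strict else measure.best
    return value <= limit if lower_is_better else value >= limit


# Hypothetical pair: half-life of A/B is 12, half-life of B/A is 30
hl = summarize(12.0, 30.0)

loose_hit = matches(hl, limit=20.0, strict=False)   # 12 <= 20, the pair is returned
strict_hit = matches(hl, limit=20.0, strict=True)   # 30 > 20, the pair is filtered out
```

Storing only these two values per measure means a search never has to look at both orderings separately: a strict filter that passes on the worst case is guaranteed to pass for either ordering.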