Using Financial Networks to Predict Stock Market Volatility

Predicting future market volatility and why it's a non-trivial task.

The stock market is a highly dynamic system that exhibits complex behavior due to interactions between things like economic indicators, market sentiment, and investor behavior, making it hard to predict. In addition, the behavior of this system can have severe consequences in people's lives as it drives much of the financial investment in the economy, so understanding the stability of this system at any point is essential. By creating a cohesive model of the dynamics of financial markets, we can gain insight into how a series of events can lead to instability and, in turn, economic volatility. This article will explain the construction of the model using financial networks. Additionally, it will define an algorithm that utilizes both current events data and this model's structure to make predictions about future market volatility.

Financial networks are graphs where each node is an actor in an economy like a company or country, and the connections, along with the magnitude, describe some relationship between these actors. Previous research has widely used these networks to represent economies of different sizes to gain insight into how actors interact. By comprehending the structure and connections between actors in this network, we can obtain critical information about how the volatility of one actor can impact others in the market. There are two commonly used financial networks: correlation networks and causation networks. Correlation networks provide insight into how different actors in the market move together at any moment in time. In contrast, causation networks reveal how one actor may cause a change in another in the future. By understanding how specific market conditions can impact future movements, causation networks can provide information on the stability of a market in future timesteps given specific initial conditions or events.

These cause-and-effect relationships among actors in the stock market will be represented by a directional causation financial network, with each node being a company in the stock market. We will utilize Transfer Entropy to measure the amount of active information that flows from one node to another for the connections between these companies. Transfer Entropy provides a measure of how a change in one actor could result in a change in another actor in the future. After this network is generated, an algorithm can begin to understand how events happening inside this network would interact, thereby offering insight into the future stability and robustness of the market at any given point. By understanding how the effect of one actor ripples outward, we can gain insight into the robustness of the actors around it.

Effectively this algorithm will find a coefficient that optimizes:

The probability of a high future volatility of a company given that:

a) The company has low current volatility

b) The company has a large coefficient

The added condition of low current volatility is a key part of the coefficient's predictive power. The coefficient is meant to be an indicator of future volatility constructed from economic events that occur around the node of interest in the graph. If the node itself is already showing high volatility, it is hard to distinguish the predictive power of the coefficient from the company simply continuing to be volatile.
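Stated a bit more formally (this is just an informal restatement of the objective above, in the same spirit as the formulas later in this article), the algorithm's parameters are tuned to maximize:

P( company i has high future volatility | company i has low current volatility AND company i has a large coefficient )

with the aim of pushing this conditional probability as far above the baseline rate of high future volatility as possible.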

Figure 1. Small Example Network

An example of the type of information we are trying to model is above: multiple companies connected with numerous degrees of separation and sometimes circular dependencies. When a shock happens, like increased regulation of lithium mines, that effect can be tracked through the network, giving insight into how other companies' stock prices might change. When a complete network is created and multiple events are considered, we can build a model that gives a picture of the future effects of events in any given period.

Have these causation networks been used before, and how were they used? What makes this analysis unique?

Previous research has extensively used transfer entropy to study the interaction between market agents because of its ability to measure causality. These works have primarily focused on using the network structure to make high-level predictions about how specific market agents would interact under theoretical market events, or to explain past behavior between agents. Although analyzing network properties and past behavior is crucial, this article proposes an algorithm that provides specific insights into future behavior based on network structure and current events. This article also presents an experimental setup using real-world data to demonstrate how causation networks can reveal statistical likelihoods of future volatility and where it might occur.

Financial networks created in previous works in this area vary in several aspects, including the definition of nodes, the time interval of the data, the information being transferred, and the scale of analysis. For instance, Sandoval, L [1] discusses research where each node represents a company, using day-by-day data and stock price as the information transferred and analyzing the top 197 companies by market capitalization. Other studies consider nodes such as asset classes of commodities [2] or cryptocurrency [3]. The time intervals may range from days to years, as explored in [4], which examines the connectedness of markets based on the size of rolling windows. The information being transferred in these networks may include price, volume, or volatility. 

Although the underlying approach is similar, the selection of the four criteria mentioned above can yield substantially different insights into how nodes interact with each other. While this article shares several characteristics with previous research, such as using companies as nodes, transferring price information, and analyzing data in windows ranging from days to weeks, there is a significant difference in this article's setup. While previous work focused on smaller networks with nodes representing powerful market agents, this article examines a network that is an order of magnitude larger than previous research in both the number of nodes and connections, allowing additional smaller agents to be included. The larger network size can reveal how minor associations might affect the broader network's stability.

This article employs several methods from previous research in creating the transfer entropy network, such as effective transfer entropy, introduced in [5]. E.T.E. reduces noise in the Transfer Entropy calculation by subtracting the average Transfer Entropy measured on randomized data. The concept of predicting the impact of specific events using the network structure and the proximity of other nodes to the event has also been explored in earlier studies like [6], which employ Minimum Spanning Trees to identify these close neighbors. Additionally, this article employs similar techniques to measure the significance of the transfer entropy between any two time series of data, as presented in [7], which uses random trials and t-tests to determine whether the measured transfer entropy is significant. While these individual methods have been used in many previous works, this article aims to bring them together to create a network and algorithm with the best chance of providing significant results.

One of this article's main contributions is a modified random walk with restart (RWWR) algorithm, which takes a significant change in a company's price as input and provides a list of other nodes that may become more volatile based on that event. It also gives a coefficient that indicates the likelihood of a significant change occurring for each node in the network. By running this algorithm for all significant events seen at any given timestep, this article can provide a detailed understanding of the future volatility of the network and identify the nodes most susceptible to future volatility. This approach provides finer-grained results than previous works, which focused on rare significant events or high-level interactions.

What data do we need, and how will we collect it?

A complete and continuous dataset with many data points is vital to conduct these experiments successfully. Therefore, all companies listed on the AMEX, NASDAQ, and NYSE with a market capitalization over $50 million as of January 11th, 2021 were used. Companies with an IPO date within the last five years were excluded, leaving 2800 companies with five years of continuous data. This results in a complete graph of about 9 million possible connections.

The pricing data was collected by retrieving daily close prices for the companies using the open platform yfinance, and the total market index ticker VTI was obtained to normalize the stock price movements. Two data sets were constructed from this daily data: the unmodified single-day timestep data, with 1000 data points, and data sampled every five days, with 200 data points. These will be called the 1-day and 5-day data sets. Each was split further into a training set and a testing set: the training set is all pricing data before 3/1/2022, and the testing set is everything after 3/1/2022. The network is constructed only on training data so there is no data leakage when testing the resulting algorithm on future data. In addition, characteristics of the data, such as distribution metrics, are calculated on this past data for use when evaluating the model on future data.
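As a rough illustration of this step, a data pull and split along these lines might look like the sketch below. This assumes the yfinance package; the ticker list, column handling, and exact dates are placeholders rather than the exact code used for this project.

```python
import numpy as np
import pandas as pd
import yfinance as yf

# Placeholder ticker list; the real list is the ~2800 screened companies plus VTI.
tickers = ["AAPL", "MSFT", "XOM", "VTI"]

# Daily close prices for the five-year window used to build the networks.
prices = yf.download(tickers, start="2018-03-01", end="2023-05-31")["Close"]
prices = prices.dropna(how="any")          # keep only days with complete data

# 1-day data set (unmodified) and 5-day data set (sample every fifth day).
data_1d = prices
data_5d = prices.iloc[::5]

# Train/test split: the network is built only on data before 3/1/2022.
train_1d = data_1d[data_1d.index < "2022-03-01"]
test_1d  = data_1d[data_1d.index >= "2022-03-01"]

# Log prices are used later for the Effective Transfer Entropy calculation
# to normalize away differences in absolute share price.
log_train_1d = np.log(train_1d)
```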

The diagram below shows, in general, how the data will be used. The past data will be used to construct the Transfer Entropy Network at a given date, and a subset of that past data will be used to find events and tune the model that runs on top of the network. The future data will then be used to test how well the model performs.

Figure 2. Data Partitioning Diagram

For the Effective Transfer Entropy calculations, the log of all data points is taken to reduce the value range. This makes the Transfer Entropy calculation more effective between companies with large and small stock prices, as the magnitude of a stock's price is determined largely by the number of outstanding shares rather than actual market forces, so this difference needs to be normalized in some fashion. This log modification to the time series was used to good effect in previous works such as [1].

Why use pricing data instead of more fine-grain indicators?

For this article, I focused on a company's stock price because price changes best represent the market's consensus on the effects and information known at any given time. In addition, pricing information reflects factors such as revenue, future outlook, and product demand, so it is an excellent all-in-one measure of the economics of a company. One could build a multi-layered network that tried to model changes in these more fine-grained indicators; however, that might significantly increase the complexity of the model needed to make predictions correctly.

What exactly is Effective Transfer Entropy?

The effective transfer entropy and its statistical significance will be used to analyze the chosen companies' relationships. Transfer entropy measures the amount of information transferred between two time series while accounting for any shared historical effects. This project calculates a more comprehensive transfer entropy by incorporating VTI as a background time series to filter out total-market comovement in the data. The result measures how much a change in one company's stock price will affect another, while accounting for market-wide forces that affect every company.

The transfer entropy formula between two time series is defined as follows: 

Figure 3. Transfer Entropy Calculation [10]

Essentially, this analysis measures the probability of predicting the next step in the target time series (y) based on the previous steps of the source time series (x), while also considering the likelihood of predicting the next step of y based on the earlier steps of both y and the background process W. (For this study, only one background process is used).

Figure 4. Transfer Entropy Simplified [1]

Another benefit of transfer entropy is that it is agnostic to the direction of correlation. If the same two time series had a negative rather than a positive correlation, the same transfer entropy would be calculated. This is important because this article investigates total volatility, not just shared directional movement between companies.
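To make the quantity concrete, below is a minimal, self-contained sketch of a binned transfer entropy estimate conditioned on a single background series, using a plain joint-entropy decomposition. It is an illustration of the formula above, not the library call used for the actual project, and the quantile binning is an assumption.

```python
import numpy as np

def joint_entropy(*cols):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    arr = np.column_stack(cols)
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def transfer_entropy(source, target, background, k=1, bins=3):
    """Estimate TE from source -> target, conditioned on a background series,
    using the last k steps of the target's history. Inputs are 1-D log-price arrays."""
    def discretize(series):
        r = np.diff(np.asarray(series, dtype=float))           # log returns
        edges = np.quantile(r, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(r, edges)                            # quantile bin labels

    x, y, w = discretize(source), discretize(target), discretize(background)
    y_next = y[k:]                                              # y_{t+1}
    y_hist = [y[k - 1 - j: len(y) - 1 - j] for j in range(k)]   # y_t, y_{t-1}, ..., y_{t-k+1}
    x_past, w_past = x[k - 1:-1], w[k - 1:-1]                   # x_t, w_t

    # TE = H(Y_next, Y_hist, W) - H(Y_hist, W) - H(Y_next, Y_hist, X, W) + H(Y_hist, X, W)
    return (joint_entropy(y_next, *y_hist, w_past)
            - joint_entropy(*y_hist, w_past)
            - joint_entropy(y_next, *y_hist, x_past, w_past)
            + joint_entropy(*y_hist, x_past, w_past))
```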

As established by previous research, the transfer entropy measurement can be susceptible to noise. To address this issue, two methods are utilized: the Effective Transfer Entropy and a significance test on the measured Transfer Entropy. Both techniques involve a similar analysis in which a "Random" Transfer Entropy between the two companies is computed from a randomized source time series and a fixed target and background time series.

To obtain the Random Transfer Entropy, the source time series is randomized while the target and background time series stay constant; then, the Transfer Entropy calculation is made. This process is repeated for thirty trials, storing the resulting values in an array. We then calculate the mean of this array to obtain the Random Transfer Entropy. This value is then subtracted from the standard estimated transfer entropy to derive the Effective Transfer Entropy.

E.T.E. = Calculated T.E. – Average Random T.E.

Significance is determined using a one-sample t-test to assess whether the measured transfer entropy is greater than the mean of the randomized Transfer Entropy trials. The t-test is used to minimize trials and account for the unknown standard deviation of the randomized trials. Any measurement with a p-value less than 0.01 is retained for network creation. 

H1 (alternative hypothesis): um – ur > 0

H0 (null hypothesis): um – ur ≤ 0

um: Measured Transfer Entropy

ur: Mean of the Randomized Transfer Entropy trials
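Putting the pieces together, the E.T.E. and significance test can be sketched as follows, building on the transfer_entropy sketch above. The 30 shuffle trials and the p < 0.01 cutoff follow the text; everything else, including the exact shuffling scheme, is an assumption for illustration.

```python
import numpy as np
from scipy import stats

def effective_te(source, target, background, k=1, trials=30, rng=None):
    """Effective Transfer Entropy plus a one-sample t-test against shuffled sources."""
    rng = np.random.default_rng() if rng is None else rng
    measured = transfer_entropy(source, target, background, k=k)

    # "Random" TE: shuffle the source while the target and background stay fixed.
    random_tes = np.array([
        transfer_entropy(rng.permutation(source), target, background, k=k)
        for _ in range(trials)
    ])

    ete = measured - random_tes.mean()   # E.T.E. = measured T.E. - average random T.E.

    # One-sample t-test: is the mean of the randomized trials below the measured TE?
    _, p_value = stats.ttest_1samp(random_tes, popmean=measured, alternative="less")
    return ete, p_value

# A connection is kept for the network only if it is significant:
# ete, p = effective_te(log_prices["AAA"], log_prices["BBB"], log_prices["VTI"], k=4)
# keep_edge = (p < 0.01) and (ete > 0)
```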

Another variable needed for the transfer entropy is the value k: how many past timesteps are used in the Transfer Entropy calculation. Formally, k should be infinite [Lizier2012, Active Information]; however, the library used does not support this, and large k is computationally intensive. As k increases, the measurement gets closer to the true transfer entropy. The calculation above was therefore run for combinations of companies as source and target with k = 1, 2, 3, 4 on both the 1-day and 5-day data sets to deduce a proper value of k. The results below show the total accumulated E.T.E. and the number of significant connections found for each value of k in each data set.

Figure 5. k Calculation Results

As we can see, by the time k is four, each data set approaches the true transfer entropy. These calculations were done on a randomized subset of 2.25 million source-target pairs of companies to save on computation time.

The candlestick plots display the distribution of significant transfer entropies found for each period and each k. Each plot shows that a k of 4 is sufficient to get close to the true transfer entropy. It is necessary to keep k manageable, since larger values significantly increase computation time, but it needs to be large enough to capture the true transfer entropy.

The k value calculation is one area I originally got wrong in the original class project. I used the k value that produced the largest total sum of Transfer Entropies. This made for a very noisy network, and later reading showed that a better approach is to use a value of k that is large enough to be close to the true Transfer Entropy but small enough not to increase computation time dramatically.
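For reference, the k sweep described above could be sketched roughly like this, reusing the earlier data and effective_te sketches. The sample size and pair selection here are placeholders, not the exact setup behind the figure.

```python
import itertools
import random
import numpy as np

# Build a dict of log-price arrays from the earlier data sketch (placeholder).
log_prices = {t: np.log(train_1d[t].to_numpy()) for t in train_1d.columns}
all_pairs = [(a, b) for a, b in itertools.permutations(log_prices, 2)
             if "VTI" not in (a, b)]
sampled_pairs = random.sample(all_pairs, min(1000, len(all_pairs)))

summary = {}
for k in (1, 2, 3, 4):
    total_ete, n_significant = 0.0, 0
    for src, tgt in sampled_pairs:
        ete, p = effective_te(log_prices[src], log_prices[tgt], log_prices["VTI"], k=k)
        if p < 0.01 and ete > 0:
            total_ete += ete
            n_significant += 1
    summary[k] = {"total_ete": total_ete, "significant_pairs": n_significant}
# Pick the smallest k whose totals stop growing appreciably (k = 4 in the figure above).
```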

Now that we have the data and know how to calculate connections between nodes, how will we make the network?

All the selected companies are put into unique source-target pairs where the source does not equal the target, resulting in about 9 million pairs. The Effective Transfer Entropy calculation, along with the significance test, is performed for each pair with k equal to 4 and VTI as the background process. All pairs with a p-value above 0.01 are filtered out. Then, for each data set (1-day, 5-day), a transfer entropy threshold is found that filters out about 95% of the remaining pairs. This threshold acts as a filter on the background noise connecting all actors in the market. The PDF of the filtered graph in the results section shows an underlying network that closely follows a power law. This filtered network is then used for all subsequent calculations. This type of thresholding has also been used in previous works.

Figure 6. PDF for the Network Pre-Filter (Left) and Post-Filter (Right)

Above are the PDF distributions of the pre-filtered (left) and post-filtered (right) node degrees. Pre-filter, the network has a fairly random structure; however, it does show multiple possible distributions within the network, with various peaks. When the filter is applied, the graph moves closer to a power law, showing a single distribution. In experiments varying this filter, the more closely the network followed a power law, the better the models performed.

This could be because, if all connections are considered, there is too much noise from too many weak underlying connections. Once the final filtered list of pairs is found, a directional weighted graph is generated where each node is a company and each connection carries the directional effective transfer entropy as its weight.
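A rough sketch of this construction using networkx is below. The list of pairs with precomputed E.T.E. values and p-values is assumed to come from the calculations above, and the 95th-percentile cutoff follows the description in the text.

```python
import numpy as np
import networkx as nx

def build_te_network(pairs):
    """pairs: iterable of (source, target, ete, p_value) from the E.T.E. step."""
    # Keep only statistically significant, positive connections.
    significant = [(s, t, e) for s, t, e, p in pairs if p < 0.01 and e > 0]

    # Threshold that removes roughly 95% of the remaining pairs.
    threshold = np.quantile([e for _, _, e in significant], 0.95)

    G = nx.DiGraph()
    for s, t, e in significant:
        if e >= threshold:
            G.add_edge(s, t, weight=e)   # directed edge weighted by E.T.E.
    return G
```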

Figure 7. Example Transfer Entropy Network

Above is an example network created for an early iteration of the project. The network typically showed many communities of companies, with a group of companies acting like glue connecting them. With much of the network following a power law, there was a small subset of each group that was very well connected. Typically these companies were either very well connected within their group, or had many weaker connections outside the group with a couple of powerful links to the inside. This led to further analysis of what I call "Gateway Companies," through which shocks from other groups have to flow to get inside a given group. Typically, these companies were deep into commodities, energy, or finance, which makes sense. I may go more into this analysis in a future article, as this one mainly focuses on the model run on top of this network and not the network structure itself.

Using a Random Walk with Restart algorithm to tell us what companies might be volatile given a single event.

Once the network is constructed, there is a structure on which to run an algorithm to investigate how a significant event might affect the market. The algorithm takes in an "event," defined as two consecutive timesteps of highly volatile price movement. The event is defined as happening on the second timestep of volatility to prevent "looking into the future" when calculating. An event consists of a node (or company) and the magnitude of the event. The algorithm is run on this event and outputs a table with companies as entries and a coefficient as values. This table's set of entries represents a neighborhood of close companies, similar to a Minimum Spanning Tree. The coefficient aims to show how affected that company will be by the selected event. This algorithm only runs on a single event; later, this article will explore how a master table is constructed from a timestep with multiple events.

One trial of the model algorithm will be as follows. 

  1. Start at the starting node. Set depth = 0. Table = {} 

  2. Create a list of weights for each outgoing edge equal to (Edge Weight / (factor_base ^ depth)) ^ prob_base 

  3. Add restarting with a probability of restart_factor to the choices.

  4. Randomly select an option based on weight. 

  5. If a restart is selected, go to starting node and set depth to 0; otherwise, go to the selected output node.

  6. Add 1/(depth + 1) to the selected node in the table. 

  7. If depth is greater than or equal to limit_depth, exit; otherwise, add one to depth and go to step 2.

The results of all trials are merged into a single table. The following are descriptions of the algorithm's parameters.

  limit_depth: The farthest number of steps the algorithm will take

  factor_base: Determines how fast the probabilities of each output node diminish the farther the algorithm gets from the start.

  restart_factor: How likely a restart is for any edge traversal.

  prob_base: An exponent applied to each edge weight that can make the probability distribution over edges more exponential (skewed toward the strongest connections).

  trials: The number of times the algorithm is run to accumulate table values

In summary, this algorithm takes in a company and the size of a price movement. It will output a table of companies affected by this event and a coefficient of how much that company might be affected. Later, we determine how to evaluate how close this table is to representing reality.
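Below is a sketch of one way to implement the steps above, written against the networkx graph from the earlier sketch. How the restart option is weighted relative to the outgoing edges, and how the event's magnitude feeds into the result, are not fully specified above, so those parts are assumptions.

```python
import random
from collections import defaultdict

def rwwr_event(G, start, limit_depth=10, factor_base=2.0, prob_base=1.0,
               restart_factor=0.1, trials=500):
    """Modified random walk with restart from a single event node.
    Returns a table mapping company -> accumulated coefficient."""
    table = defaultdict(float)
    for _ in range(trials):
        node, depth = start, 0
        while depth < limit_depth:                       # step 7's exit condition
            successors = list(G.successors(node))
            if not successors:
                break
            # Step 2: weight outgoing edges, damped by how far we are from the start.
            weights = [(G[node][n]["weight"] / (factor_base ** depth)) ** prob_base
                       for n in successors]
            # Step 3: add the restart option (assumption: weighted so a restart
            # happens with probability ~restart_factor on each traversal).
            options = successors + [None]
            weights.append(sum(weights) * restart_factor / (1.0 - restart_factor))
            # Step 4: weighted random selection.
            choice = random.choices(options, weights=weights, k=1)[0]
            if choice is None:                           # step 5: restart
                node, depth = start, 0
                continue
            node = choice
            table[node] += 1.0 / (depth + 1)             # step 6: depth-discounted credit
            depth += 1
    return dict(table)
```

In this sketch the event magnitude is ignored; it could, for example, scale the resulting table values before merging, but that choice is a tuning decision rather than something specified above.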

A small example of the algorithm.

Figure 8. A single trial of the algorithm.

Above is a quick example of how a table is created from an event.

  1. A significant price change is seen in Q’s stock price.

  2. With a random selection based on the edge weights, the walk moves to the “NXTX” node. “NXTX” then gets updated to 1 in the table.

  3. The same random selection happens again and “NXP” is selected.

  4. “NXP” then gets updated with a reduced factor of 0.5.

  5. The same random selection happens again and “RIOT” is selected.

  6. “RIOT” then gets updated with a further reduced factor of 1/3 (following the 1/(depth + 1) rule).

This is a single trial of the algorithm; multiple trials are run to build up the table values. Also note that at any of these steps the walk could restart to the origin node, based on the restart_factor, and continue from there.

Now that we know how to tabulate the effects of one event, how do we merge tables for multiple events?

The next step is that, for a specific timestep, all significant events are found: for example, all companies in a particular day or week with two consecutive periods of high volatility. All of the events in that timestep are run through the algorithm, resulting in multiple tables. The tables are then merged to create a master table for that timestep. This master table contains nodes and their likelihood of future volatility in the coming timesteps based on the complete set of events in that period. The merging function for the tables is simple addition. Using multiplication or some non-linear combination gave a range of values that was difficult to analyze: the differences between most companies were too small to be meaningful compared to those with the largest coefficients.
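Since the merge is a plain element-wise addition, it can be sketched in a few lines on top of the rwwr_event sketch above (the event list itself is assumed to be detected separately):

```python
from collections import Counter

def master_table(G, events, **rwwr_params):
    """events: list of (company, magnitude) pairs seen in one timestep."""
    merged = Counter()
    for company, _magnitude in events:
        merged.update(rwwr_event(G, company, **rwwr_params))  # simple addition
    return dict(merged)
```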

Now that we have the network and algorithm, we need to find a way to optimize the algorithm's parameters and evaluate how well the table represents future volatility. 

Trying to optimize and measure the results by focusing on one event at a time.

My first set of analyses focused on tuning the algorithm above by optimizing its output for a single event (essentially concentrating on a single table rather than the master table). To do this, I found all significant events over the "Tuning Period" of data. The algorithm was run on all these events individually, resulting in many tables. Each table was then evaluated on how well it could predict future price movement in the days or weeks after the event the table represented.

To measure the success of any given table, I plotted the log of the value in the table versus the future price difference seen for that company. The future price difference was sometimes just the next timestep, sometimes an average over the next four timesteps, and so on. I found that about 30% of tables had a significant positive correlation at a p-value threshold of 0.10. But when I looked into the insignificant correlations, I saw graphs like the one below:

Figure 9. Table Value to Future Volatility Graph

When just looking at single events, companies with significant price changes often had small table values. These cases made the positive correlation less significant and, at times, turned it into a significant negative correlation. However, upon further analysis, this isn't necessarily a bad indicator of an individual table's performance. The model does not claim that a company with a low value won't have a significant price change, as another event elsewhere could have caused it. I tried to filter out companies with substantial price differences in the previous week, or ones connected to other nodes with significant price differences, to try and isolate the effect of the single event being evaluated.

This worked, as the number of significant tables increased. However, this filter introduced another set of parameters that had to be tuned. So, in the end, I decided that while looking at single events was helpful for finding ballpark parameter ranges and for easy initial analysis, it would be best to tune and focus on the multi-event model, since optimizing one model at the very end is easier (though it may be more computationally expensive).

Measuring the results by focusing on all events in a given timestep. 

The difference in the multiple-event analysis is that all significant events in each timestep are collected, and a master table is generated by running all events in each timestep through the RWWR algorithm and merging their tables, whereas the single-event analysis looked only at the unmerged tables and tried to measure the performance of each one. This multi-event model results in a master table for each timestep in the future data. For example, in the 5-day timestep, the master table lists companies and their likelihood of future volatility in the coming weeks based on all of the price change events seen in that 5-day period.

Two measurements are done on the list of master tables.

Probability of a Selected Company being Correct versus Random Selection

This calculation gives an idea of how valuable the table value is for providing insight into how much future volatility a node might have. This measure focuses on the model's specificity in predicting companies and their future price movement. It is done by going through all companies in the graph at a given timestep. If a company's current volatility is less than 1.68 standard deviations above 0, that company-timestep is placed in the low current volatility set. If the company's table coefficient is some number of standard deviations above the mean coefficient (this value is varied later, but the initial analysis used 3), it is placed in the high coefficient, or selected, set. Then, if that node's average volatility over the next couple of timesteps is above the current median volatility, it is placed in the high future volatility set. This results in three sets of companies from the master table and one complete set:

  • Selected: Companies selected by the algorithm to have high future volatility

  • Low Current Volatility: Companies whose current price changes show no sign of imminent volatility.

  • High Future Volatility: Companies that show a higher-than-normal price fluctuation in the future. 

  • Complete: All companies in the network.

Figure 10. Visualization of how companies are split into sets.

Above is an illustration of the relationship of these sets to the selections made by the algorithm. All companies can be put into four mutually exclusive sets of high or low current volatility and high or low future volatility. Each company in the complete set will then either be selected or not selected by the algorithm. The goal of the algorithm is to correctly select companies that have low current volatility but will have high future volatility. The figure above depicts a well-tuned model, as most of the companies in the top-left region are selected.

As the calculation is repeated for each timestep, the sets are accumulated rather than reset. The resulting analysis will hopefully converge to a probability higher than the baseline probability of a node having high future volatility.
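A sketch of this bookkeeping is below, using the thresholds described above (1.68 standard deviations for current volatility, a z-score cutoff on the coefficient, and the current median for future volatility). The per-company current and future volatility measures are assumed to be precomputed; the exact definitions used for the figures may differ.

```python
import numpy as np

def classify_timestep(table, current_vol, future_vol, coef_z=3.0, vol_z=1.68):
    """table: company -> coefficient for one timestep (the master table).
    current_vol / future_vol: company -> volatility measures (assumed precomputed)."""
    coefs = np.array(list(table.values()))
    coef_cut = coefs.mean() + coef_z * coefs.std()       # high-coefficient threshold
    cur = np.array(list(current_vol.values()))
    cur_cut = vol_z * cur.std()                          # 1.68 std devs above 0
    fut_cut = np.median(cur)                             # current median volatility

    selected, low_current, high_future = set(), set(), set()
    for c in current_vol:
        if current_vol[c] < cur_cut:
            low_current.add(c)
        if future_vol.get(c, 0.0) > fut_cut:
            high_future.add(c)
        if table.get(c, 0.0) > coef_cut and current_vol[c] < cur_cut:
            selected.add(c)
    return selected, low_current, high_future

# Probability that a selected company is "correct" versus random selection:
# p_selected = len(selected & high_future) / max(len(selected), 1)
# p_random   = len(high_future) / len(current_vol)
```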

Significance of the Selected versus Non-Selected Population

The second measure determines whether the set of companies in the master table above a specific coefficient is more likely to be highly volatile in the future than the set of all other companies. Specifically, we compare the "Selected" set against all other companies in the "Complete" set. To do this, the master tables for each timestep are iterated through, and the two sets are compared using a two-sample Kolmogorov-Smirnov (KS) test. The null hypothesis is that the future volatilities of the two sets come from the same distribution; a small p-value indicates that the selected set's distribution differs from the rest, which is then checked to be in the direction of higher future volatility.

This performance measurement focuses on how the table performs at selecting a population of companies rather than on its ability to predict any single company. In the next optimization section, we will see that optimizing the model for specificity conflicts with optimizing it to perform well on a population of companies. However, in the end, a scheme is created that achieves reasonable specificity and population prediction by using a sliding significance value.
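For the population measure, the comparison can be sketched with SciPy's two-sample KS test. A two-sided test is used here, with the direction checked separately by comparing the two sets' means; the exact variant used for the figures may differ.

```python
import numpy as np
from scipy import stats

def population_significance(selected_future_vols, other_future_vols):
    """Two-sample KS test between selected and non-selected future volatilities."""
    ks_stat, p_value = stats.ks_2samp(selected_future_vols, other_future_vols)
    higher_mean = np.mean(selected_future_vols) > np.mean(other_future_vols)
    return ks_stat, p_value, higher_mean
```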

How can we optimize parameters now that we can measure the model's success given multiple events?

Note: For the optimization results below, a network was created with a network date of 3/1/2023.

Now that we can measure a model's success, we can tune the model to find an optimal set of parameters. I used a brute-force method that searches an extensive range for each parameter with coarse-grained steps. I then graphed how the model performed for each specific set of parameters and collected the number of companies in the selected table, the future price volatility of the companies in the table, and the significance of the two-sample KS test showing whether the selected population's future volatility was higher than the non-selected population's.

It is essential to understand how many companies end up in the algorithm's resulting table to get a sense of both the specificity of prediction and the significance of the selected population, as described above. If the algorithm selects just a single company, the outcome is very polarizing: either that selection is correct or it is not. However, if the algorithm selects many companies, the performance of the population can provide more insightful results.
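The brute-force search itself is conceptually a nested loop over coarse parameter grids; a sketch is below. The grid values are placeholders, and evaluate() stands in for running the multi-event model over the tuning period and returning the three quantities above.

```python
import itertools

param_grid = {
    "limit_depth":    [5, 10, 20],
    "factor_base":    [1.5, 2.0, 3.0],
    "restart_factor": [0.05, 0.1, 0.2],
    "prob_base":      [1.0, 2.0],
    "trials":         [100, 500, 1000],
    "coef_z":         [1.0, 2.0, 3.0],    # selection threshold in std devs
}

results = []
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    # evaluate() is a placeholder: run the multi-event model over the tuning
    # period and report (number selected, probability correct, KS-test p-value).
    n_selected, p_correct, ks_p = evaluate(params)
    results.append({**params, "n_selected": n_selected,
                    "p_correct": p_correct, "ks_p": ks_p})
```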

Figure 11. Optimization of the selection threshold.

For example, the graph above shows how a change in the threshold:

  1. Affects the number of companies the model selects (x-axis)

  2. The probability of a company having higher future volatility (y-axis)

For this set, we can see that a higher threshold lets runs that select fewer companies perform well in terms of the correctness of those selections. In addition, we see that fewer and fewer companies are selected as the threshold rises, which makes sense.

This sort of analysis was done for all of the parameters to get an idea of how each parameter needs to be optimized to allow for optimal performance of the model. The biggest drivers of performance were mostly the number of trials, depth, and threshold significance. The other parameters just needed to be tuned to ballpark values.

The most insightful set of graphs in terms of optimization is below:

Figure 12. Optimization of the number of Trials

The top row is analogous to the threshold graph above. Selecting a number of trials that yields a small number of selected companies allows for a high probability, which is what we are looking for. As the number of trials increases, more selections are made, but the probability drops to near zero.

Next, we look at the bottom row. This row plots the significance of the test of whether the selected population's mean is higher than the non-selected population's. Here we see that this test doesn't have much power when only a small number of companies are selected, as seen with the high p-value in the bottom left; however, as the number of trials grows, the test becomes very significant, with p-values near zero.

Here we see the conflict between the model's ability to be specific in selecting companies with a high probability of future volatility, and its ability to select populations of companies with a higher mean than the non-selected companies. This conflict is addressed later in this article, but for now, a set of parameters between these two extremes was chosen, where enough companies were selected to give a significant test and the selected set still showed a higher probability of future volatility than random selection.

Now that we know the parameters and how to select them, how well does the model perform using a single set of optimal parameters?

Note: For the results below the network date was 1/1/2023.

From the above, a single set of parameters that performed well on the past data was selected to create a fixed algorithm to run. Then, day by day or week by week, all significant price change events were found to create a master table for that day or week. This master table was then iterated over to find selected companies based on the criterion of having low current volatility but a high coefficient inside the table. This created the selected companies set; all other companies were put in the non-selected companies set. Then, to evaluate each of these sets, the future volatilities of the companies in each set were measured in the various ways below to try and determine if the selected companies had higher future volatility than the non-selected companies.

One Day Timestep

First let us focus on the 1-day network.

Probability Measure

Figure 13. Probability of future volatility using selection criteria (Blue) versus Random Selection (Orange)

For this measure, we compared the probability that a selected company had higher future volatility versus a company selected at random, to see if the model provided any more insight than random selection. Here we see that the days close to the model creation date have a very good record of selecting correct companies; this probability then slips before recovering to a higher value in the long term.

Distribution of Volatilities

Figure 14. Distribution of future volatility seen in selected companies (Blue) versus non-selected companies (Orange)

For this measure, we plot the histogram of the volatilities seen for companies in the selected set versus the non-selected set. Ideally, the blue population mean is significantly higher than the orange population mean, and in this instance that was the case, with a p-value of 0.004. Over the 30 days, there were 23 selected companies versus 23,000 non-selected entries.

Overall, the single-day timestep analysis was successful. The algorithm found a coefficient threshold that allowed it to select nodes with high, or at least higher-than-normal, future volatility. This is hard to see in the candlestick plot but easier to see in the distribution histograms. The analysis is also relatively successful because the algorithm made a reasonable number of selections over the thirty-day range; even though the count is low, it is significantly more fine-grained than previous research, which looked at events that only happen every couple of years.

One Week Timestep

Probability Measure

Figure 15. Probability of future volatility using selection criteria (Blue) versus Random Selection (Orange)

Distribution of Volatilities

Figure 16. Distribution of future volatility seen in selected companies (Blue) versus non-selected companies (Orange)

Over the 7 weeks, there were 7 selected companies versus 44,000 non-selected entries.

The one-week timestep still needs more work to understand why it selects so few nodes and whether that is because of the increased timestep. The interesting part here is the highly selective but accurate nature of the selections. This rings alarm bells that the algorithm or optimization is cheating in some way, which is being looked into.

But can we run the algorithm in such a way that we both get significant data about the behavior of many companies and reliably single out a few specific companies at the same time?

With the above, the model was tuned to one set of parameters and run. It was then evaluated on the precision with which it could select companies and on the general significance of its chosen population. As seen during model tuning, these goals may conflict, so the question is whether there is a way to get both the precision of selecting a small number of companies with a high probability of future volatility, and a selected population with a significantly higher mean than the non-selected companies. The answer is yes. I ended up running the model with the exact same parameters except for the z value of significance. The first run uses many trials and a smaller z, which outputs a large set of selected companies that typically gives a significant prediction for the population. The model can then either be rerun with a higher z, or you can simply take the companies with the top table values in the merged tables. When the number of companies chosen was in the 1 to 5 range, the model performed very well at picking specific companies with high volatility.
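A sketch of this two-stage selection on top of a merged master table is below (low_z, top_n, and the volatility cutoff are placeholders for the tuned values):

```python
import numpy as np

def two_stage_selection(master, current_vol, low_z=1.0, top_n=5, vol_z=1.68):
    """First pass: broad selection with a low z threshold (population-level test).
    Second pass: keep only the top-N table values (specific predictions)."""
    coefs = np.array(list(master.values()))
    broad_cut = coefs.mean() + low_z * coefs.std()
    cur_cut = vol_z * np.array(list(current_vol.values())).std()

    broad = [c for c, v in master.items()
             if v > broad_cut and current_vol.get(c, 0.0) < cur_cut]
    precise = sorted(broad, key=lambda c: master[c], reverse=True)[:top_n]
    return broad, precise
```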

What happens when we select a set of optimized parameters for a specific date and run them over time? How does the model perform over time?

Note: For the results below, a network was created and a model optimized for a network date of 3/1/2023.

With the promising results above, I focused on the 5-day timestep and the selected parameters. I created a rolling transfer entropy network for each week from January 2023 to May 2023 that used the previous year's data, and ran the model every week based on the events seen that week. I then plotted how the model performed over these few months. Note that the parameters were tuned for the network created for 3/1/2023.

Figure 17. Probability of future volatility using selection criteria minus random selection (Blue) and random selection among other low-current-volatility companies (Orange)

Figure 18. Significance of the selected population's future volatility being greater than the non-selected population's.

The graphs show that the model performs very well for the week of 3/1 and some subsequent weeks, with up to 40% better predictability of future volatility over random selection and over random selection among low-current-volatility companies. In addition, the test on the population mean was very significant. The drop-off in performance further from the tuning date points to one of two things.

  1. The model will have to be tuned each week it runs. If the focus is on weekly models, this is a manageable barrier, as running the optimization takes a day or so. But this constant tuning may be more difficult for anything higher-frequency, such as daily or faster runs.

  2. Something went wrong with the network creation or the data that went into the networks. I have done some debugging on this, and it did look like the networks were all using the correct data.

I hypothesize that the model is sensitive to the number and nature of the events in any given week. This means the model may need to change its strategy based on the type and number of events seen in a period. From my analysis, the number and significance of these events vary considerably over the year, so this is very plausible.

An early concept I had for measuring this methodology's success.

Note: The Network Date for the results below was for 11/15/2021 (From original Paper)

Figure 19. Significant events seen versus significance of results.

Going back to early in this project, my first report had the chart above. The red line shows the number of significant events, and the green shows when the merged multi-event table significantly correlated with the price difference that week. Interestingly, this significant relationship looks like a precursor to large swells of market volatility: after essentially every red-to-green block event, several significant events happened in the next couple of weeks. This graph is what led me to believe a deeper analysis would be worthwhile, which I think it was.

Conclusion

This was a fun project that spanned a year or two and went through many phases. I rewrote much of the code several times, optimizing it on each pass and trying to make it more modular. It used to take a day to create a network for 500 companies; in the end, I could run it for 3000 companies in just an hour or two by using a lot of Python multiprocessing and batching many of the runs. I wasted a lot of time trying to analyze single events independently when the resulting tables were just not independent enough to create a good performance metric; it was better to focus on the multi-event model, which gave a larger picture. Finally, figuring out how to measure the success of the model took a long time. Though I didn't show it here, my initial approach was to compare the number of significant results against a "random" network, which did show promise but in the end didn't demonstrate any real-world performance or connection to the model. Who cares how much better the Transfer Entropy network performs than a random graph? You need real-world statistics tying it to actual market behavior.

In terms of future work, I may write another article analyzing the Transfer Entropy network structure, which contains many neat stories. How the network changes over time is also a worthwhile (though computationally intensive) endeavor. The model could also use a GNN to predict values instead of a RWWR. So there is still a lot that can be done with this project. But with a security class next semester, I will probably look into Zero-Knowledge acceleration optimizations instead, but who knows? I hope to pick this up in the future. It would also make a neat online app and data visualization project.

Thanks for reading.

Original Paper

Paper from class, which this article expands on: Original Paper for Class

Citations

[1] Sandoval, L., Jr. Structure of a Global Network of Financial Companies Based on Transfer Entropy. Entropy 2014, 16, 4443-4482. https://doi.org/10.3390/e16084443

[2] Stelios B., et al. "Information diffusion, cluster formation and entropy-based network dynamics in equity and commodity markets."

[3] Qiang J., et al. "Information interdependence among energy, cryptocurrency and major commodity markets."

[4] Chen G., et al. "Measuring the network connectedness of global stock markets."

[5] Marschinski, R. & Kantz, H. (2002). Analysing the information flow between financial time series: An improved estimator for transfer entropy. European Physical Journal B, 30, 275-281.

[6] Mantegna, Rosario. (1998). Hierarchical Structure in Financial Markets. arXiv.org, Quantitative Finance Papers. 11.

[7] C.-Z. Yao and H.-Y. Li, "Effective Transfer Entropy Approach to Information Flow Among EPU, Investor Sentiment and Stock Market," Frontiers in Physics, vol. 8, 2020.

[8] https://www.nasdaq.com/market-activity/stocks/screener?exchange=NASDAQ&render=download

[9] Yahoo Finance
[10] https://elife-asu.github.io/PyInform/timeseries.html
