Thursday, July 18, 2024

Fourier analytic Barron Space theory

I recently gave a talk about applications of Fourier analytic Barron space theory (or Barron-E theory for short) at the Erice International School on Complexity: the XVIII Course "Machine Learning approaches for complexity", and I have released a toolbox to enable such applications: gbtoolbox. But what is Fourier analytic Barron space theory?

Fourier analytic Barron space theory combines two things, using the path norm. The first is a theoretical understanding of the approximation error of neural networks: the difference between the true value and the prediction of the best neural network in the function space of neural networks with some given set of hyperparameters. The second is a theoretical understanding of the estimation error of the machine learning algorithm: how much data is required to distinguish one function from another in that function space. I refer to it as a theory, but there are really several subtly different theories in the literature, and I have not yet seen a final theory (I intend to submit a paper shortly with my own slightly different version, which I also don't think is the final theory). The theory was first presented in complete form in "A priori estimates of the population risk for two-layer neural networks" by E, Ma, and Wu (see also "Towards a Mathematical Understanding of Neural Network-Based Machine Learning: What We Know and What We Don't" with Wojtowytsch). A good understanding of machine learning theory is required (I recommend "Understanding Machine Learning: From Theory to Algorithms" by Shalev-Shwartz and Ben-David), and I think the original works by Barron (later with collaborator Klusowski) are also needed to understand the theory: "Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks" and "Universal approximation bounds for superpositions of a sigmoidal function". The path norm used by E, Ma, and Wu was introduced in "Norm-Based Capacity Control in Neural Networks" by Neyshabur, Tomioka, and Srebro. The specific purpose of the theory was to develop an a priori bound on the generalization error.

One obvious way that the theory is incomplete is the absence of optimization error (the difference between the best possible neural network and the neural network actually found by the training procedure, as defined by its hyperparameters).

In this first post, I will discuss how I think of the approximation part of the theory. My slides are also available at the link above.

Consider the task of a neural network. Basically, you have some data \begin{equation}\{\mathbf{x}_k,y_k\}\end{equation} where k indexes the data points, \begin{equation}\mathbf{x}_k\in [-1,1]^d~,\end{equation} and d is the number of features. We are essentially assuming that there is some function f(x) such that \begin{equation}f(\mathbf{x}_k)=y_k~.\end{equation}

In Barron-E theory, the task for the neural network is to approximate the effective target function f*, which extends f(x) from the domain of the data to all of ℝ^d. The extension only changes the domain (f* agrees with f on the data), and among all such extensions it is selected to minimize the Barron norm \begin{equation} \gamma(f^*) = \int \|\mathbf{\omega}\|_1^2 |\tilde{f}^*(\mathbf{\omega})|\, d\mathbf{\omega} < \infty~,\end{equation} where the tilde denotes the Fourier transform and \begin{equation}\|\mathbf{\omega}\|_1=\sum_j |\omega_j| \end{equation} is the Manhattan norm.
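To make the Barron norm concrete, here is a small numerical sketch for a hypothetical one-dimensional example (my choice, purely for illustration): the Gaussian f*(x) = e^{-x²/2}, whose Fourier transform in the sign convention used in this post is √(2π)e^{-ω²/2}. In one dimension the Manhattan norm is just |ω|. Leaving off factors of 2π as in the rest of this post, the exact value is 2π, which the quadrature should reproduce.

```python
import numpy as np

def barron_norm_1d(f_tilde, omega_max=20.0, n=400_001):
    """Approximate gamma = integral of |omega|^2 |f_tilde(omega)| d omega on a uniform grid.

    Assumes d = 1 (so ||omega||_1 = |omega|) and that f_tilde decays fast enough
    for truncating the integral to [-omega_max, omega_max] to be negligible.
    """
    omega = np.linspace(-omega_max, omega_max, n)
    d_omega = omega[1] - omega[0]
    integrand = np.abs(omega) ** 2 * np.abs(f_tilde(omega))
    # Trapezoid rule written out explicitly to avoid NumPy version differences.
    return d_omega * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def gaussian_ft(omega):
    # Fourier transform of exp(-x^2/2) with the convention: integral of e^{i y w} f(y) dy.
    return np.sqrt(2.0 * np.pi) * np.exp(-omega ** 2 / 2.0)

gamma = barron_norm_1d(gaussian_ft)
print(gamma, 2.0 * np.pi)  # the two numbers should agree to many digits
```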

There are many mathematical subtleties here that I am going to ignore. One could imagine sampling points from some true function that is nowhere continuous. The effective target function defined above would likely not match the true function anywhere, and Barron-E theory would not be applicable. But for a true function with a finite number of discontinuities, we would expect that, given enough data points to define the effective target function, we could find a function that both has a finite Barron norm and that we would be almost certain matches the set of test points to which we want to apply it. This representativeness of the training and test data is a concern of the machine learning theory of estimation error, so we will move on for now (but may return to it in a later post).

Since we can define a Fourier transform for the effective target function, \begin{equation}\tilde{f}^*(\mathbf{\omega}) = \int e^{i\mathbf{y}\cdot \mathbf{\omega}} f^*(\mathbf{y})\, d\mathbf{y}~,\end{equation} we then have \begin{equation} f^*(\mathbf{x}) \simeq \int e^{-i\mathbf{x}\cdot \mathbf{\omega}} \tilde{f}^*(\mathbf{\omega})\, d\mathbf{\omega} = \int \int e^{-i\mathbf{x}\cdot \mathbf{\omega}} e^{i\mathbf{y}\cdot \mathbf{\omega}} f^*(\mathbf{y})\, d\mathbf{y}\, d\mathbf{\omega}~,\end{equation} where we have left off factors of 2π. Then, for well-behaved effective target functions, \begin{equation} \|\mathbf{\omega}\|_1^2 \int \sigma(\mathbf{\hat{\omega}}\cdot \mathbf{x} + \alpha)\, e^{i \|\mathbf{\omega}\|_1 \alpha}\, d\alpha \simeq -e^{-i \mathbf{\omega} \cdot \mathbf{x}}~, \end{equation} where σ(y) = max(0, y) is the ramp function (ReLU) and \begin{equation}\mathbf{\hat{\omega}} = \mathbf{\omega} / \|\mathbf{\omega}\|_1~. \end{equation} The minus sign comes from the distributional integral ∫₀^∞ u e^{iκu} du = -1/κ², with κ = ‖ω‖₁. Applying this, once more leaving off factors of 2π, we have \begin{equation} f^*(\mathbf{x}) \simeq -\int \int \tilde{f}^*(\mathbf{\omega}) \|\mathbf{\omega}\|_1^2 \sigma(\mathbf{\hat{\omega}}\cdot \mathbf{x} + \alpha)\, e^{i \|\mathbf{\omega}\|_1 \alpha}\, d\alpha\, d\mathbf{\omega}~.\end{equation}
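The ≃ in the ramp-function identity hides a sign, so here is a numerical sketch of it. The α integral does not converge on its own, so I assume an Abel-style damping factor e^{-ε(t+α)} (an illustrative regularization, not part of the theory), with t = ω̂·x and κ = ‖ω‖₁; the particular values of t, κ and ε are arbitrary. As ε → 0 the damped integral should approach -e^{-iκt} = -e^{-iω·x}, with a leftover error of roughly 2ε/κ.

```python
import numpy as np

def damped_ramp_transform(t, kappa, eps=0.01, du=0.005):
    """kappa^2 * integral of sigma(t + alpha) e^{i kappa alpha} e^{-eps (t + alpha)} d alpha.

    Substituting u = t + alpha, only u >= 0 contributes because sigma is the ramp
    function, so this equals kappa^2 e^{-i kappa t} * integral_0^inf u e^{(i kappa - eps) u} du.
    """
    u_max = 40.0 / eps                         # e^{-40} makes the neglected tail negligible
    u = np.arange(0.0, u_max + du, du)
    integrand = u * np.exp((1j * kappa - eps) * u)
    # Trapezoid rule on the uniform grid.
    integral = du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return kappa ** 2 * np.exp(-1j * kappa * t) * integral

t, kappa = 0.3, 3.0
numeric = damped_ramp_transform(t, kappa)
target = -np.exp(-1j * kappa * t)              # the right-hand side, -e^{-i omega . x}
print(numeric, target)
```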

This is very suggestive, especially when we recall that we aren't interested in functions defined on all of ℝ^d but only in functions defined over the domain of the data. Taking the real part (since f* is real), we have \begin{equation}f^*(\mathbf{x}) \simeq \nu \int\int h(\mathbf{\omega},\mathbf{x},\alpha)\, p(\mathbf{\omega},\alpha)\, d\mathbf{\omega}\, d\alpha\end{equation} where \begin{equation}h(\mathbf{\omega},\mathbf{x},\alpha) = -\mathrm{sgn}(\cos{(\|\mathbf{\omega}\|_1\alpha+\phi(\mathbf{\omega}))})\, \sigma(\mathbf{\hat{\omega}}\cdot\mathbf{x}+\alpha)~,\end{equation} \begin{equation}p(\mathbf{\omega},\alpha) = \|\mathbf{\omega}\|_1^2 |\tilde{f}^*(\mathbf{\omega})| |\cos{(\|\mathbf{\omega}\|_1\alpha + \phi(\mathbf{\omega}))}|/\nu~,\end{equation} \begin{equation} \tilde{f}^*(\mathbf{\omega})=|\tilde{f}^*(\mathbf{\omega})|e^{i \phi(\mathbf{\omega})}~,\end{equation} and ν is the constant that normalizes p to a probability density. Because x ∈ [-1,1]^d implies |ω̂·x| ≤ 1, the bias only needs to range over α ∈ [-1,1] (at the cost of terms affine in x, another subtlety we ignore), and since |cos| ≤ 1 on an interval of length 2, \begin{equation}\nu = \int\int \|\mathbf{\omega}\|_1^2 |\tilde{f}^*(\mathbf{\omega})| |\cos{(\|\mathbf{\omega}\|_1\alpha + \phi(\mathbf{\omega}))}|\, d\mathbf{\omega}\, d\alpha \leq 2\gamma(f^*)~.\end{equation} Using this, we can define a Monte Carlo estimator, \begin{equation} f_m(\{\mathbf{\omega},\alpha\},\mathbf{x}) = \frac{\nu}{m}\sum_{j=1}^m h(\mathbf{\omega}_j,\mathbf{x},\alpha_j)~,\end{equation} which approximates the effective target function for \begin{equation}\{ \mathbf{\omega}_j,\alpha_j \} \end{equation} drawn from the probability density function p(ω,α). The variance of such simple Monte Carlo estimators is easy to calculate, and so we have a bound on the approximation error of this Monte Carlo estimator of \begin{equation} 4 \gamma^2(f^*)/m ~.\end{equation}
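As a hypothetical end-to-end check, here is a sketch of the Monte Carlo estimator for the one-dimensional Gaussian f*(x) = e^{-x²/2}. Its transform √(2π)e^{-ω²/2} is real and positive, so φ = 0. Unlike the text, the sketch keeps the 1/(2π) of the inverse transform, which makes γ(f*) = 1 and so ν ≤ 2. It samples (ω, α) from p by rejection sampling (my choice for convenience, not something the theory prescribes) and forms f_m with outer weights ±ν/m. Because the bias is restricted to α ∈ [-1,1], the estimator converges to f* only up to an affine term. Working the α integral out by hand for this even example gives the constant -2e^{-1/2}, so the sketch compares against e^{-x²/2} - 2e^{-1/2}.

```python
import numpy as np

def sample_p(m, rng):
    """Rejection-sample m pairs (omega, alpha) from p(omega, alpha) for the Gaussian example.

    Proposal: |omega| ~ chi(3) with a random sign (density proportional to omega^2 e^{-omega^2/2})
    and alpha ~ Uniform[-1, 1]; accept with probability |cos(|omega| alpha)|.
    Also returns nu = 2 * E_proposal|cos|, which holds here because gamma = 1.
    """
    kept_omega, kept_alpha = [], []
    n_kept, cos_sum, n_proposed = 0, 0.0, 0
    while n_kept < m:
        n = 2 * m
        omega = np.sqrt(rng.chisquare(3, n)) * rng.choice([-1.0, 1.0], n)
        alpha = rng.uniform(-1.0, 1.0, n)
        weight = np.abs(np.cos(np.abs(omega) * alpha))
        cos_sum += weight.sum()
        n_proposed += n
        accept = rng.uniform(0.0, 1.0, n) < weight
        kept_omega.append(omega[accept])
        kept_alpha.append(alpha[accept])
        n_kept += int(accept.sum())
    omega = np.concatenate(kept_omega)[:m]
    alpha = np.concatenate(kept_alpha)[:m]
    nu = 2.0 * cos_sum / n_proposed
    return omega, alpha, nu

def barron_e_network(x, omega, alpha, nu):
    """f_m(x) = (nu / m) * sum_j h(omega_j, x, alpha_j) in d = 1 with phi = 0."""
    kappa = np.abs(omega)                      # ||omega||_1 in one dimension
    omega_hat = np.sign(omega)                 # omega / ||omega||_1
    signs = -np.sign(np.cos(kappa * alpha))    # -sgn(cos(||omega||_1 alpha + phi))
    pre_activation = np.outer(x, omega_hat) + alpha
    h = signs * np.maximum(pre_activation, 0.0)
    return nu * h.mean(axis=1)

rng = np.random.default_rng(0)
omega, alpha, nu = sample_p(200_000, rng)
x = np.linspace(-1.0, 1.0, 5)
approx = barron_e_network(x, omega, alpha, nu)
# With alpha restricted to [-1, 1] the estimator targets f*(x) up to an affine term;
# for this even example that term works out to the constant -2 e^{-1/2}.
target = np.exp(-x ** 2 / 2.0) - 2.0 * np.exp(-0.5)
print(np.max(np.abs(approx - target)))
```

Increasing m should shrink the error roughly like 1/√m (the mean squared error like 1/m), in line with the variance argument above. Rejection sampling is used here only because it is simple in one dimension.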

This Monte Carlo estimator looks very similar to a shallow neural network with m hidden nodes (which we will call a Barron-E neural network). There are some important differences between Barron-E neural networks and those that we work with in standard practice, and for some of them we can argue that standard networks would give smaller approximation errors than those given by Barron-E theory. First and most simply, the inner weight vectors in the Barron-E neural network have a Manhattan norm of 1. However, this can be addressed with an easy rescaling of each node, since the ramp function satisfies σ(cz) = cσ(z) for c > 0. Also, the integral of |cos| over the bias will generally be much less than 2, so that ν < 2γ(f*), but this only results in a smaller bound. Most importantly, the outer weights of a Barron-E neural network are constants that depend on the Barron norm. In practice, this suggests using Barron-E theory for applications such as inner weight initialization rather than for a generalization bound (see my slides above or gbtoolbox), or adding a second step in which we take some large number of nodes with constant outer weights and interpolate to a smaller number of nodes with non-constant outer weights. Or we improve the theory.
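The rescaling mentioned above can be sketched as follows. The sketch assumes a Barron-E network stored as arrays of ω_j (one row per hidden node), α_j, signs s_j = ±1 and a constant ν, with outer weights ν s_j/m. The function names are hypothetical, and this is not the gbtoolbox API. Each node is multiplied inside by c_j = ‖ω_j‖₁ and divided outside by c_j, which leaves the output unchanged because σ(cz) = cσ(z) for c > 0.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def barron_e_forward(x, omega, alpha, signs, nu):
    """(nu/m) * sum_j s_j relu(omega_hat_j . x + alpha_j), with ||omega_hat_j||_1 = 1."""
    kappa = np.abs(omega).sum(axis=1)            # ||omega_j||_1 for each hidden node
    omega_hat = omega / kappa[:, None]           # unit Manhattan-norm inner weights
    m = omega.shape[0]
    return (nu / m) * (relu(x @ omega_hat.T + alpha) @ signs)

def to_standard_network(omega, alpha, signs, nu):
    """Hypothetical helper: rescale node j by c_j = ||omega_j||_1 > 0.

    Uses relu(c z) = c relu(z): inner weights and bias are multiplied by c_j,
    and the outer weight is divided by c_j, so the output is unchanged.
    """
    kappa = np.abs(omega).sum(axis=1)
    W = omega                                    # unconstrained inner weights
    b = kappa * alpha                            # rescaled biases
    a = (nu / omega.shape[0]) * signs / kappa    # outer weights, no longer constant
    return W, b, a

def standard_forward(x, W, b, a):
    return relu(x @ W.T + b) @ a

rng = np.random.default_rng(1)
d, m = 3, 50
omega = rng.normal(size=(m, d))
alpha = rng.uniform(-1.0, 1.0, m)
signs = rng.choice([-1.0, 1.0], m)
nu = 1.7                                         # arbitrary constant for illustration
x = rng.uniform(-1.0, 1.0, size=(10, d))

y_barron = barron_e_forward(x, omega, alpha, signs, nu)
y_standard = standard_forward(x, *to_standard_network(omega, alpha, signs, nu))
print(np.max(np.abs(y_barron - y_standard)))     # differences at floating-point round-off level
```

Note that the rescaled outer weights ν s_j/(m‖ω_j‖₁) vary from node to node, which is one way to see how a standard network relaxes the constant-outer-weight restriction.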

The development that I present here isn't meant to be a presentation of the best theory (which doesn't exist yet), but it is a useful presentation of Fourier analytic Barron space theory. This presentation was adapted from reports that I wrote. I intend to write additional blog posts on this topic, and I have a draft on Fourier analytic Barron space theory applications that should be approved this summer.

Tuesday, July 16, 2024

Science and Scientists

 I left academia back in 2019 to become a research scientist at Onto Innovation. My title was research scientist, but was I still a scientist?

I read papers and thought up approaches using machine learning and the theory of machine learning to solve problems in semiconductor metrology (primarily for optical critical dimension (OCD) and a little X-ray critical dimension (XCD) metrology). I also worked on simulation. I mostly did applied research, looking to put ideas together from papers that could be used for OCD.

However, my work's results were primarily trade secrets. Some were turned into preliminary patents, and some were handed off to engineers to be put into products. But none of my results were communicated to other scientists. This was true even internally. I made presentations, but the other research scientists at Onto Innovation were not interested unless my results were relevant to their work at the time. I did not go to conferences.

And I felt frustrated.

In January of 2022, I left to focus on Euler Scientific. Our main project was research, with a strong (maybe too strong) basic research component. But our main customer was the Department of Defense, our main goal was to produce an application (the basic research was supposed to turn into applied research and then into an alpha application), and we wanted to have a successful company (and so more customers). As a result, there was no discussion outside of the limited number of involved scientists (really only myself and two scientists from Fermi National Accelerator Laboratory with minimal time commitments), and I didn't attend any conferences. There was an agreement to write papers, but without dedicated time and effort, they have been slow: two are in the editing process (one needs to be re-edited for submission to another journal, and one needs to be submitted for the first time), two more are almost finished, and one more requires more work.

Was I still a scientist?

I spent the last 8 months focused on finding new employment. Despite limited funds, I attended a conference I was invited to: the Erice International School on Complexity: the XVIII Course "Machine Learning approaches for complexity". There I realized what I had been missing. Since the fall of 2017 (when I took parental leave, which became a parental leave sabbatical in the spring of 2018 and continued until I left academia in 2019), I had not communicated my research to other scientists (non-collaborators). Science cannot be done alone; it must be communicated. This was what I had been missing.

I am not sure if I will still be a scientist in my next career step. I think that being a scientist outside of laboratories and academia is a privilege, and one that I cannot maintain. I look forward to bringing my work to customers, and my title will be engineer.

Sunday, July 14, 2024

Physics heroes, classes and graduate school

I wrote most of this five years ago but didn't publish it because I was still thinking about it. I am sharing it now, including my original opinions, despite my opinions changing. My new opinions are given in the last three paragraphs.

Over the years, I have thought about physics heroes. A lot of people love Feynman, and while I enjoyed his autobiography and had a professor I TAed for compare me to him, I didn't really consider him my hero. The same goes for many other physicists. I think that Einstein and Newton were my heroes as I started to pursue physics, but over the last 15 years, I have discovered that I consider Freeman Dyson a hero, and I have since read several of his books. While I understand the argument that heroes can hurt science, I also think that they can do a lot of good, and not just by providing inspirational role models like Jim Gates.

When I was a freshman, Freeman Dyson visited my college. He taught a class for non-majors and gave a couple of lectures for the physics students. After one that I attended, several of us, including Dyson, left the lecture hall to go to the theater and watch The Matrix. One thing he said at the time stuck with me, at least the concept (the words didn’t): physics is something you do and not something you study, and you need to get involved in research and not just take classes.

I didn’t truly understand and internalize this idea until I almost dropped out of graduate school in my third year. It has become one of my guiding philosophies as a physicist and physics professor.

I have observed that online graduate degrees are popular (universities withstood MOOCs but risk being outwitted by OPMs). I don’t see the point of them. Even a non-lab undergraduate degree loses a lot of value by being online only, and graduate degrees lose most of their value. I think a good undergraduate degree should be 70-80% coursework, a master's degree should be 30-50% coursework, and a PhD should be around 10% coursework. The non-coursework component can be done with industrial mentors instead of academic mentors, but the good industrial mentors will generally be in the same locations as the good academic mentors. Who will do the legwork of arranging this, and how will that legwork work for industrial mentors in a location without academic mentors?

I think the real signal with these online graduate degrees is that new things have been learned. But that isn’t the purpose of a graduate degree.

Since I graduated with my PhD, I have continually learned new things and worked in new fields. I have never taken a course, instead reading papers (and sometimes books) to understand where a field is or to find a good technique. I think that, instead of doing this, many people are taking a Master's degree (and spending money on it). They do get a certificate that others can see, but they don’t get the deep knowledge that traditionally comes from a Master's (or PhD).

This opinion of mine has changed.

In the last 8 months, I have searched for a new position in industry. The requirements for finding a software-engineering-adjacent position have changed since I left academia for industry in 2019. I did not get the interviews I expected, and I ran into rounds of coding assessments that were well beyond my level (especially 8 months ago, when I received my first interview at a top AI startup).

I didn't pursue an online graduate degree, but if I had had the funds, I would have, and it would have benefited me: both as a signal to recruiters and hiring managers, and because, while I have self-studied, learned a lot, and followed free online self-study courses (without reputable certificates) like those found at CodeSignal and NeetCode, it would have been beneficial to have the direction of a professor.

So, my position on this has changed because I think the signal is important and valuable. I may still never do an online Master's. But if I had had the assets to do one in the last year, I would have done one. And it would have been beneficial for me.

Entrepreneurship

For most of the past three years, I have been engaged in a new adventure: trying to start a company. I was not, and am not, a natural at this endeavor, as I am a scientist and not business-minded.

I left my former employer, Onto Innovation, in January 2022 to lead Euler Scientific and its efforts, primarily to develop a toolbox to enable the interpretability of neural networks, including a bound on the generalization error (the difference between the neural network's performance on the training data and its performance on unseen test data).

In 2023, we attempted to pivot, and in the winter of 2023, I shifted to focus on finding new employment (after releasing an alpha toolbox). Finally, in the summer of 2024, I found a new position.

Looking back at my time as an entrepreneur, I really needed to be more customer-focused in 2022. I was focused on solving technical problems, which were worthwhile and needed me to solve them, but I also needed to focus on customers. Doing both required more time than I had available.

The other thing I needed, which I also needed to find new positions (now and in the past), is a good network. Most entrepreneurs, especially those focused on business-to-business sales, use their network to find their customers. My network is too international and academic to be useful for business-to-business sales.

Being an entrepreneur at this time is not right for me. Before I consider stepping out into entrepreneurship again, I think that I need to have a large network, not just ideas and technical ability (and even investors), to serve as a seed for business-to-business sales.

Monday, April 26, 2021

Apportionment

I have been very busy lately. But I have been relieved about US politics. I was so anxious going into November and continuing all the way to January 20 (with a spike on January 6, and I heard enough online to be anxious about January 6 after D).

This isn't an anxious post. I think that there might be things to be anxious about (in Georgia, for example), but I am not anxious. I did want to share thoughts about apportionment. Apparently, New York, California, West Virginia, Ohio, Pennsylvania, Illinois and Michigan all lost seats. Oregon, North Carolina, Colorado, Montana, Florida and Texas (two seats) all gained. Some people were anxious about this.

Florida, Colorado, North Carolina, Ohio, Pennsylvania and Michigan have all been considered swing states. Texas, Montana and West Virginia are considered Republican, while New York, California, Oregon and Illinois have been considered Democratic. This would suggest a 2-seat shift from Democratic to Republican. However, Ohio has looked very Republican lately and Colorado has looked very Democratic. This would give a 1-seat shift from Democratic to Republican. Finally, if Texas actually becomes a swing state, then Democrats and Republicans each lose 1 seat and the swing states gain 2.
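The seat counting above can be double-checked with a quick tally. This is just a sketch: the seat changes and state leanings are the ones listed in this post, and the state abbreviations are mine.

```python
# Seat changes from the apportionment, as listed in this post.
changes = {"NY": -1, "CA": -1, "WV": -1, "OH": -1, "PA": -1, "IL": -1, "MI": -1,
           "OR": 1, "NC": 1, "CO": 1, "MT": 1, "FL": 1, "TX": 2}

def tally(lean):
    """Sum the seat changes by partisan leaning."""
    totals = {"D": 0, "R": 0, "swing": 0}
    for state, delta in changes.items():
        totals[lean[state]] += delta
    return totals

base = {"FL": "swing", "CO": "swing", "NC": "swing", "OH": "swing", "PA": "swing",
        "MI": "swing", "TX": "R", "MT": "R", "WV": "R",
        "NY": "D", "CA": "D", "OR": "D", "IL": "D"}
recent = {**base, "OH": "R", "CO": "D"}       # Ohio leaning R, Colorado leaning D
texas_swing = {**recent, "TX": "swing"}       # Texas becomes a swing state

print(tally(base))         # {'D': -2, 'R': 2, 'swing': 0}
print(tally(recent))       # {'D': -1, 'R': 1, 'swing': 0}
print(tally(texas_swing))  # {'D': -1, 'R': -1, 'swing': 2}
```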

So no, this is not something to be anxious about.


Monday, August 3, 2020

Bias in life and physics (sexism)

My thesis advisor (Prof. Betsy Beise) was a woman physicist, as were both of my postdoctoral supervisors (Prof. Olga Botner and Prof. Catherine De Clercq), as was one of the co-leaders of my primary experiment as a professor in Chile (Prof. Debbie Harris). Despite this, and the progress it represents, I think that there is a bias that women face, and that it is not just societal imbalances relating to parental leave, parental responsibilities and expectations.

My experience in physics is that a bias exists. I have heard numerous male physicists express in private that women physicists were good or acceptable as lecturers, colleagues and even administrators, but not as thought leaders or researchers. This wasn't just from elderly physicists, but also from physicists of my generation.

Also, being an active and involved father of two young girls has opened my eyes to some of the bias that exists in this world and in myself. My girls always want the story to be about girls, or they assume that anyone not given a gender is a girl. My observation is that many stories unnecessarily give a character a male gender, and my own bias comes through in my discussion of stories without an explicit gender, where I tend to give characters a male gender if an explicit female gender is not given.

It is clear that an explicit effort to attract female talent to physics is necessary, and I appreciate those, such as Prof. Kim (University of Chicago), who do this. Also necessary is a societal rebalancing of parenting, which has started in Sweden (and other places in Europe) and which some in the United States would like to implement here. Part of this rebalancing must include a rebalancing of expectations and responsibilities, as in Sweden, where men have parental leave.

I think a step that hasn't been taken anywhere is to make some minimal amount of parental leave (6 months) mandatory.

Sunday, July 12, 2020

Church

For many people, the community aspects of religion are crucial. In fact, I know plenty of Christians for whom the community aspect is the main and most important part of Christianity. By community aspect, I mean that by going to church, an individual is part of a community and has friends, support networks, and so on.

For most of my life this community or body aspect of church was foreign to me, despite attending weekly.

Growing up, I didn’t make friends at church, I didn’t really talk to others. I would go and listen to the sermon or sleep or think about games/books. I would sometimes take part in religious discussions there, even on occasion being a very active participant, but that was all.

When I returned to church (not Christianity; I never left Christianity) in the middle of graduate school, I began to appreciate three more components. The first was worship: singing and praising as the body of Christ. The second was being inspired, as I came to very much appreciate pastors who could inspire me for the coming week to work to improve my life. The final one, relatively ill-formed for me compared to the first two, was service. I didn’t lead or play a significant role in service to the community, but I did occasionally play a bit part, and I found that service was also an important and valuable component of church. I also found camaraderie in service.

But I still struggled with the community aspect. Part of this is just a fundamental difficulty with socializing that I also have at physics conferences and the like, and my behavior at receptions is often similar. But at church I would take part in the service and then leave. Sometimes I tried to force myself to become part of the community by staying, but I would just stand awkwardly in a corner. Sometimes I would plan to go greet someone, but that would be over quickly, and then what? So the community aspect of church was foreign to me.

In the last couple of years, I began to understand, to internalize, it a bit more. For the first time, community became (at times) the most significant component of church for me, rather than worship or inspiration or having a set-aside time to rest. This was because I had children and involved them in the children’s programs. They loved being involved with the other kids, and I followed them.

So now, for the first time and as we can no longer worship together in person, I find myself wanting the community part of church.

The Economist ("The virus is accelerating dechurching in America") posited that people who stop attending would find other sources for what they got from religion. I have also heard concerns about this from pastors whom I know and admire.

It is true that the habit has been broken. But inspiration and praise are available remotely and online, and all forms of community, not just religious community, are missing at this time. People need community (especially those with families) and will return to them or renew them when they are able to.

So no, I don't think that there is going to be a significant increase in dechurching beyond what has been going on for the last two decades, which at least partially originates in the alliance between evangelical Christianity and the Right in the United States.