How to Optimize for the Second Chunk of Data
We randomly select a subset of the parameters and treat that subset as one specific subnetwork. To make sure this random subset runs all the way from the input to the output, we randomly select a subset for each layer. The parameter partitions that define these subnetworks are, of course, non-overlapping. The full network contains all of the partitions, and that network, which has seen all of the data, is the one we use to make predictions or to run downstream tasks.
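A minimal sketch of the partitioning idea described above, written under assumptions not stated in the episode: each layer is a weight matrix stored as a flat list, parameters are split by a random shuffle followed by strided slicing, and `partition_layers` and `masked_update` are hypothetical helper names. Because every subnetwork receives a slice of every layer, each one spans input to output, and the union of all slices is the full network. The `masked_update` step is an assumed way to train one subnetwork on its own chunk of data; it is not confirmed as the speaker's method.

```python
import random
from typing import List, Sequence, Set, Tuple


def partition_layers(
    layer_shapes: Sequence[Tuple[int, int]],
    num_subnets: int,
    seed: int = 0,
) -> List[List[Set[int]]]:
    """Split every layer's parameters into `num_subnets` disjoint random subsets.

    Returns partitions[k][layer]: the set of flat parameter indices that belong
    to subnetwork k in that layer. Every subnetwork gets a subset of *every*
    layer, so each one runs from the input layer to the output layer.
    """
    rng = random.Random(seed)
    partitions: List[List[Set[int]]] = [[] for _ in range(num_subnets)]
    for rows, cols in layer_shapes:
        indices = list(range(rows * cols))
        rng.shuffle(indices)  # random assignment, drawn independently per layer
        for k in range(num_subnets):
            # Strided slices of one shuffled list are disjoint and near-equal in size.
            partitions[k].append(set(indices[k::num_subnets]))
    return partitions


def masked_update(
    weights: List[float], grads: List[float], subset: Set[int], lr: float
) -> List[float]:
    """Hypothetical SGD step that only changes the parameters in `subset`.

    Parameters outside the subset (owned by other subnetworks) stay frozen.
    """
    return [
        w - lr * g if i in subset else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]
```

For example, `partition_layers([(4, 8), (8, 8), (8, 2)], num_subnets=3)` yields three subnetworks. Training on the second chunk of data would then call `masked_update` with `partitions[1][layer]` for each layer, leaving the other two partitions untouched. The shuffle-then-stride split is one simple way to guarantee the partitions are non-overlapping and together cover every parameter.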