Discussion Group at BIDS

Information and Uncertainty

Information is the reduction of uncertainty. This is the blog page for the information and uncertainty discussion group, which meets at BIDS bi-weekly on Fridays at 11:00am. The blog posts contain summaries of the meetings.

The knowledge gap in data science is fundamental knowledge. That is, many companies provide tools and programming interfaces to perform various tasks in data science, but the high-level picture is still missing. Our group uses high-school-level math and scientific reasoning (mostly at the level of freshman physics) to build up the theoretical foundations of the science in data science from scratch. These ideas are then connected with statistics and computer science concepts to form a complete picture.

The discussion group is a forum where anybody can join and ask their data science questions about information, uncertainty, entropy, probabilities, machine learning, generalization and the rest of the universe. The discussion group facilitates an open, nonjudgmental discussion (no grades, no promotion, no funding!) about the intuition behind, and practical use of, concepts that are often deeply buried in mathematical language. We strive for clarity and understanding via concrete problems, practical examples and visualization.

Sign up for our mailing list (called bidsentropy): https://groups.google.com/a/lists.berkeley.edu/forum/#!forum/bidsentropy

Suggested topics:

  • Fractal Dimension and Entropy
  • Cellular automata, Turing completeness, ergodicity, the data processing inequality, and the 2nd law
  • The Galton Board and the physics of information and uncertainty: Why do we have to wait in line at the supermarket?
  • Entropy in Thermodynamics, Information Theory, and what if events cannot be independent?
  • Urn and ball experiments revisited: The probabilistic viewpoint, the information viewpoint, and the deep learning viewpoint (Nov 2nd)
  • Entropy and Learning: What does compression have to do with generalization?
  • Complexity and human evolution: Why does the Latin alphabet have 27 characters?
  • Shannon’s Communication Model: So what doesn’t fit in there? And what are alternatives?
  • Information, Work, Energy: What’s the difference?
  • Decisions, past and future: Is there a connection between communication and computation?
  • The Data Processing Inequality and the scientific process: Why do we have the scientific process and can we show that it actually works?
  • The Pigeonhole Principle: Why do prime numbers exist?

These topics are just fillers. You are very much free to bring your own question and inspire the group with it. We will also have guests on selected topics and will advertise their visits on this website.

Note: Spontaneous ideas in an open academic discussion setting might be wrong. This is not a polished seminar class. This meeting assumes that the attendees are trained, or are being trained, in how to follow the scientific process to verify ideas and turn them into publishable scientific results.

Video Modules

Check this: YouTube Playlist

Literature:

This list will grow.

  • Richard Feynman: “Feynman Lectures on Computation”, CRC Press (2000).
  • David MacKay: “Information Theory, Inference, and Learning Algorithms”, Cambridge University Press (2003).