Ask HN: Codebases with great, easy to read code?
234 points by impjohn on March 21, 2022 | 141 comments
A colleague told me the best way to level up coding skills is to read excellent code.

Do you have favorite repos that highlight this?

I have an irrational fear of unknown codebases since it feels like most of the code is either boilerplate or tied to some framework.

Do you have tips and tricks you use to read codebases?




Redis. Read the redis source code if you want to see nice C.

The reason it always impresses me is that C can look like gobbledygook, yet this codebase is clean and understandable.


I was impressed with the Redis codebase too. I think it benefits from being relatively new in C terms, so it doesn't have too much baggage (2009 is really new in terms of C projects!). It must also take a lot of discipline on the part of the maintainer.

I seem to remember Postgres and Sqlite were relatively accessible to a low intermediate C programmer. When I've had to look at Android code (more C++ admittedly) I've started to get lost very quickly.


I second Postgres. Not only is the source code a pleasure to read, there's also an unusual amount of well-presented material about its internals available online.


this probably correlates to why there is such a rich ecosystem of pg extensions and forks.


Also explains how Pivotal was able to refactor it into an MPP (Greenplum).


while true, the architecture rather than code hygiene deserves credit for facilitating the ecosystem of extensions and forks. in my husky opinion.


Postgres's Yacc definition helped me a ton when I was using Yacc. The documentation out there for Yacc/bison isn't great, but Postgres served as a decent set of examples.


If you're looking for code in C, the implementation of Tcl is a wonderful code base. You can even focus on specific parts instead of the complete scripting language: how to create a hash table, for example.


I came here to say this. Redis and SQLite, mentioned elsewhere, have their roots in Tcl, so there are some connections.


Agreed, and not only Redis: the way Salvatore Sanfilippo (Redis's creator) programs is very readable and instructive. A glance at his repos is worth the time.


The Plan 9 operating system is a good C codebase to explore too.


sqlite too. It's good.


Are we allowed to share repos we've written? :)

If so, then here's distributed consensus in Zig:

https://github.com/coilhq/tigerbeetle/blob/main/src/vsr/repl...

Something that differentiates this from many consensus implementations is that there's no boilerplate networking/multithreading code leaking through, it's all message passing, so that it can be deterministically fuzz tested.

I learned so much, and had so much fun writing this, that I also hope it's an enjoyable read—or please let me know what can be improved!


Very clean, tasteful formatting. Easy to read. There's even correct splitting of long argument lists over multiple lines: [1], [2]. You don't see that very often; most programmers have awful taste about this.

I have some qualms with these one-line early returns though:

  if (commit <= self.commit_min) return;
Control flow statements should always be on their own lines; then it's easy to find all of them by visually scanning top-down, without needing to read to the end of each line.

[1]: https://github.com/coilhq/tigerbeetle/blob/main/src/vsr/repl...

[2]: https://github.com/coilhq/tigerbeetle/blob/main/src/vsr/repl...


Zig's code formatter puts list items on separate lines if the last item has a trailing comma; otherwise, it puts them all on the same line. So if you use the code formatter, you'll either have just one line, or one line per item. It's pretty nice.


Almost anything from suckless.org.

Here's a window manager (dwm) and its docs and build system in 13 files and just around 3000 lines of code.

https://git.suckless.org/dwm/files.html

And sbase, a sort of "busybox-like" set of common *NIX base utils written to be small and portable. Some of the commands are just a few dozen lines.

https://git.suckless.org/sbase/files.html


Along with the other recommendations, I was introduced to The Architecture of Open Source Applications from an HN post some time back, and have found it quite interesting. You can use it together with a more detailed walk through the respective projects' source code, to get a great idea of what some big names are doing.

http://aosabook.org/en/index.html


+1.

This is what I recommend to everybody who wants to read code. Why? Because the books explain the design behind the code, without which it is quite difficult to understand the codebase. You also get exposure to a bazillion different projects, which is crucial to "grok" software architecture and large-scale design.


I've always enjoyed lichess's chess API: https://github.com/lichess-org/scalachess/tree/master/src/ma...

It's funny because I remember comparing it to mine that I had tried to write during college, and appreciating how much better it is.

Pay attention to how there's a bunch of different types of chess in there too, and how that's factored.


https://github.com/seattlerb/minitest really removed the FUD for me when I started learning Ruby and Rails. It's full of metaprogramming and fancy tricks but is also quite small, practical and informal in its style.

e.g. "assert_equal" is really just "expected == actual" at its core, but it both uses a block param (a kind of closure) for composing a default message and calls "diff", which is a dumb wrapper around the system "diff" utility (horrors!). There is even some evolved nastiness in there for an API change that uses the existing assert/refute logic to raise an informative message. This is handled with a simple if and not some sort of complex hard-to-follow factory pattern or dependency injection misuse.

https://github.com/seattlerb/minitest/blob/master/lib/minite...
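A rough Python analogue of the shape described above, as a sketch rather than minitest's actual code: the default failure message lives in a closure that only runs when the comparison fails, and the diff comes from the standard library's `difflib` instead of shelling out to the system `diff`. The names `assert_equal`, `diff`, and `default_message` are chosen here for illustration.

```python
import difflib


def diff(expected, actual):
    """Return a unified diff of the two values' repr() text."""
    exp_lines = repr(expected).splitlines()
    act_lines = repr(actual).splitlines()
    return "\n".join(
        difflib.unified_diff(exp_lines, act_lines, "expected", "actual", lineterm="")
    )


def assert_equal(expected, actual, message=None):
    """Raise AssertionError with a lazily built message if the values differ."""

    # The default message is a closure; it is only evaluated on failure.
    def default_message():
        return diff(expected, actual)

    if expected == actual:
        return True
    if message is None:
        message = default_message
    # Accept either a ready-made string or a callable that builds one.
    text = message() if callable(message) else message
    raise AssertionError(text)
```

The core really is one comparison; everything else exists to produce a useful failure message.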


I can't believe it. It was the same for me back in 2009 when I was learning Ruby. Thanks for this.


That's an oldie, but the experience was the same for me. I was reading Metaprogramming Ruby at the time, and going over the implementation made everything much clearer to understand.


> A colleague told me the best way to level up coding skills is to read excellent code.

The best way to level up is to code. Reading code can be a complementary activity that can bring insights but it's not a way to level up. Active > passive.

> Do you have favorite repos that highlight this?

For what language? Desktop, mobile? Systems programming or web development? Linux/BSD/etc. all have source code available. I believe Microsoft has open sourced the .NET Framework or parts of it.

It's like you are learning a foreign language and want us to recommend good books? Can't really help you if you don't tell us the foreign language and your goals for the language ( casual conversation, business, translation, etc ).


> The best way to level up is to code.

Up to a point, yes. But beyond that point, in my experience, a deliberate study of software architecture is required to move forward. That and mentorship/code reviews by people who have a deeper appreciation of software architecture.

You start by wanting to learn how to code, then you write a lot of code, then you progress by learning how to write less code and less complex code.


"The best way to level up is to code. Reading code can be a complementary activity that can bring insights but it's not a way to level up. Active > passive."

Both writing and reading code are important. It's just that most people, in my experience, do not actively search out code to read and spend more time writing code.


>> A colleague told me the best way to level up coding skills is to read excellent code.

> The best way to level up is to code.

I think it's much more subtle than either of these.

First of all, "excellent code" is an extremely subjective thing. I once worked with this one developer. He could cook up solutions to complex problems very quickly. But he didn't comment or docstring any of his code, he favored writing his own libraries and frameworks rather than pull in dependencies, and every single thing he wrote was grossly over-engineered once you managed to figure out what it was doing.

Which is a long way of saying, he was a brilliant programmer who wrote very shitty code. And unfortunately, there are a large number of open source projects and maintainers like this, so picking some at random to study may not get you very far.


> Reading code can be a complementary activity that can bring insights but it's not a way to level up.

Counter-point: As a professional developer one might spend far more time reading code than writing code. In my experience, all the good developers I've worked with have the ability to skim through large code bases and quickly zone into the parts that interest them. It is a very deliberate skill to cultivate.

I once put down my thoughts on this : http://lonetwin.net/20090829/hacks-you-can-live-without/on-r...


Not sure your counterpoint is a counterpoint. I think it just illustrates there are multiple ways of reading code, just like reading text.

Reading text to research is different than reading text to learn to be a better writer.


"Practice makes perfect" is a common refrain, and there is some truth to it... but more accurately, perfect practice makes perfect. If you practice bad form, you will execute bad form. Merely writing code is not necessarily good practice. Writing good code is good practice. Ergo, alternating between reading good code and writing code is an effective means of leveling up.


The post originally asked about Python programming. OP made it more general on purpose.

I'll (tongue-in-cheekly) prompt you with the following:

Language: Whatever qiskit is most familiar with OR has a favorite recommendation for (based on qiskit's interests).

Domain: Whatever qiskit is most familiar with OR has a favorite recommendation for (based on qiskit's interests).


Unfortunately, a lot of developers run into clever, over-abstracted code they can't understand (repeatedly) and then eventually grasp it and think it's OK or even preferred to write clever code like that themselves. It's like a virus.


I've been thinking a lot recently how to get devs to read more code and it's a very interesting reason you give that you don't want to wade through boilerplate. I never thought of that before.

I don't know boilerplate-heavy systems like Rails or Django too well. But I just wouldn't suggest starting with reading web app code (though maybe I've ignored reading too much web app code over time).

The easiest code to start thinking about is libraries and things you use today already like the nginx code base or the CPython code base or your logging library or your web server library code.

In these cases maybe you download the repo, build it, see how you could make a small tweak and run it. And soon you're looking through its code to understand how it works.

Another maybe easier technique to start reading more is when you are programming and have an error in a 3rd party library, use grep to find that error in 3rd library code and just start poking around when you do. Maybe add some print statements to it so you can see more of what goes wrong. Try to solve the problem just looking at the code and modifying it instead of using google.
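The grep step is small enough to sketch in Python, assuming the library's source is a directory of `.py` files on disk. The function name `find_error_text` and its parameters are made up for this example; plain `grep -rn` does the same job.

```python
from pathlib import Path


def find_error_text(root, needle, suffixes=(".py",)):
    """Yield (path, line_number, line) for each source line containing needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file; skip it and keep searching
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Calling `list(find_error_text(path_to_library, "text of the error"))` gives you the file and line to start poking around in.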

If you ever get into it I'd love to hear from you. Email is on my site and Discord is in my HN profile.


Django and DRF are really well written systems with good documentation.


I must be in the minority, I have tried to "learn Django" on more than one occasion and gave up every time after struggling with how hilariously over-engineered it is.

My best guess is that Django must be great for writing big important relational-database-backed apps, or rolling your own CMS, or something else people get paid a lot of money to do. But personally, for my small projects I get more mileage out of starting with a micro-framework and just choosing and bolting on the bits I need.


That's exactly the case. My career has been mostly making CMS systems using Django. The neat thing is that you have this front-end batteries-included framework with a gargantuan ecosystem of system tools bolted on its back end (Python). Django is not a web page tool. It's a tool to build large-scale web systems.


I'm totally with you. Whenever I tried to build anything with Django, I soon went back to Tornado for its glorious simplicity. Being able to quickly see how something is implemented is worth much more to me than any magic I mostly end up fighting. Django just has too much going on with everything being somehow connected to everything else (which is sort of the point). I might agree it's well-engineered, but an example of easy to read code...?


absolutely +1 for django's source. and after they shipped their black integration, everything is properly formatted, which makes reading the source even easier.


GRBL, the CNC firmware for Arduinos:

https://github.com/grbl/grbl/

It feels like it has more comments than code. The comments are written in very nice, understandable language that even actively teaches about concepts that are only adjacent to the code at hand.

E.g. https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L142 or https://github.com/grbl/grbl/blob/master/grbl/stepper.c#L233


That's over-commented code written by a junior developer from my quick look. The first random thing I looked at in grbl/eeprom.c:

  ...
  char old_value; // Old EEPROM value.
  char diff_mask; // Difference mask, i.e. old value XOR new value.

  cli(); // Ensure atomic operation for the write operation.
  ...

You can remove the need for the first comment by calling the variable old_eeprom_value. Boom, simple and obvious. Commenting cli() is similarly ridiculous: call the function disable_interrupts() and it's completely obvious what it's doing. Later on:

  sei(); // Restore interrupt flag state.

This is incorrect. It's enabling interrupts, not restoring them. If the intent was actually to restore the interrupt disable flag to its original state, then this function is buggy and will unintentionally enable them. It would be far better to document the expected semantics in the documentation for the function above; instead, you have to read the code of eeprom_put_char() to figure out what the semantics are. What would be better is to have a comment in the function description saying "this function can only be called with interrupts enabled" or "this function is atomic and can be safely called from an interrupt handler or with interrupts enabled". Then it's obvious when reading the code which semantics are guaranteed / expected.

So, sure, overly commented code makes it easy to figure things out, but this is a sign of a junior developer who is focused too much on the code and not enough on the overall system. This isn't something I'd point a developer at who is looking to learn good habits. Good habits are telling other developers what they can expect from a function. Bad habits are making them read the code to figure that out.
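The enable-versus-restore distinction is easy to show with a toy model. This is Python standing in for the hardware, not AVR code: the `Cpu` class and both `write_then_*` functions are hypothetical, and only the names `cli` and `sei` come from the snippet above.

```python
class Cpu:
    """Toy stand-in for a CPU's global interrupt-enable flag."""

    def __init__(self, interrupts_enabled=True):
        self.interrupts_enabled = interrupts_enabled

    def cli(self):
        """Disable interrupts."""
        self.interrupts_enabled = False

    def sei(self):
        """Enable interrupts unconditionally."""
        self.interrupts_enabled = True


def write_then_enable(cpu):
    """Disable, do the critical work, then unconditionally enable."""
    cpu.cli()
    # ... critical section would go here ...
    cpu.sei()


def write_then_restore(cpu):
    """Disable, do the critical work, then put the flag back as it was."""
    saved = cpu.interrupts_enabled
    cpu.cli()
    # ... critical section would go here ...
    cpu.interrupts_enabled = saved


caller_disabled = Cpu(interrupts_enabled=False)
write_then_enable(caller_disabled)
print(caller_disabled.interrupts_enabled)  # True: the caller's state was lost

caller_disabled = Cpu(interrupts_enabled=False)
write_then_restore(caller_disabled)
print(caller_disabled.interrupts_enabled)  # False: the caller's state survived
```

Either behavior can be the right one; the point is that the function's documentation should say which one callers get.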


It's funny this got mentioned, because I recently got a 3D printer that runs Marlin, which embeds GRBL. So, I decided I would take a look at it. I thought a lot of places were really garbled. Especially motion_control.c, which has a ton of #ifdef logic


SerenityOS, especially the userland, has always seemed very elegant to me:

https://github.com/SerenityOS/serenity


one of the best C++ codebases in existence.


How do you guys approach the "start" of reading a codebase? I never know where to start looking. Specifically, if it's a language I am not too familiar with, I have no idea where to start, and sometimes I have no idea where the program execution starts.


Mitchell Hashimoto has published a blog post describing how he approaches complex codebases. That might give you an idea where to start.

https://mitchellh.com/writing/contributing-to-complex-projec...


Great guide! I would add fixing bugs. I often learn most about a code base by fixing my bugs. A good debugger can be a blessing. Profiling is part of debugging to me. Questions can come up about why something is taking a long time that lead to more debugging and thinking about what is going on.


The best runs I've had working on others' codebases have come from jumping into documenting them. Many projects love having someone read, ask questions about, and document code, even (or especially) from a naive standpoint, since that's who'll benefit the most from it. In the process you learn how the code's structured, track where references lead, and more often than not kick over some bugs worth fixing.


Good advice! I especially like the first piece of advice:

> The first step to understanding the internals of any project is to become a user of the project.

It's normally easier to figure out complex behaviour from the spec/doc/interaction than from the code.


I've watched an interview on MSDN with one of the developers of .NET (I think she was responsible for the GC), who also used to work on Windows, making those famous workarounds making games work on newer Windows releases, even when they relied on old kernel bugs. I think she said that the best way to get familiar with a complex new codebase is to step through it with a debugger, going through several scenarios. I think it's a great idea. I only wish I had a working visual debugger in my day to day work.

EDIT: Found the interview: <https://docs.microsoft.com/en-us/shows/Careers-Behind-the-Co...>


Besides the good methods others have posted, and a really nice method (if you have it) of having someone else familiar with the code give you a tour, you can also do pretty well just by brute forcing it.

Get a list of all the files, sorted however (`find . -name '*.foo'` works), and start going through them top to bottom, or bottom to top if that's a clearer convention of the language. Maybe shuffle the order a bit if you discover unit tests (nearby, or by asking a tool to cross-reference a call) so you read the code and the test around the same time, but resist the urge to jump around too much or too deeply. Jot down short notes about what seems to be the main purpose(s) of the file, and move on. Keep going and keep track of what you've seen. Your first goal is to do a complete survey of all the files and not get too distracted by fully understanding new syntax (Java annotations and Python decorators can both be understood as high level declarative tags even though under the hood they're quite different) or by endless note revisions from new insights as you progress and start seeing connections or just finally understanding terminology ("wtf is a 'hero'?").

You'd be surprised how fast you can do a single (high level, shallow, skimming in places) pass even for larger code bases, by the end of it you'll also have found the/an entry point, and are in a better place for followup study or producing materials that can help the next person (like an architecture diagram that lists the files involved in each element, at least at that moment, or just some important cross references you've noted that a tool isn't necessarily going to make clear). And for easy code, a single pass may be all you ever need, even if you read it in a strange order. A completed puzzle is perfectly clear regardless of the order you put the pieces down.
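A minimal sketch of that first pass in Python, assuming you just want a sorted checklist to take notes against. The helper names `survey` and `notes_template` are invented for this example.

```python
from pathlib import Path


def survey(root, suffix):
    """Return every file under root whose name ends in suffix, sorted."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


def notes_template(root, suffix):
    """Build a checklist with one line per file, ready for short notes."""
    lines = [
        f"[ ] {path.relative_to(root).as_posix()}: "
        for path in survey(root, suffix)
    ]
    return "\n".join(lines)
```

Printing `notes_template("path/to/repo", ".foo")` into a scratch file gives you one line per file to fill in with a short note as you go, which also makes it obvious what you have not looked at yet.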


Short answer -

git clone <repo> ;

open project in editor/IDE

Read the readme.md to get an idea of the author's opinions

Start at `func main(){}` and find what I find.

Longer answer can be taught by taking the patterns out of

https://www.goodreads.com/book/show/567610.How_to_Read_a_Boo...


I came up with the idea of ENTRYPOINT comments to solve this problem: https://gist.github.com/gushogg-blake/247b1bf2ed46b035d1c8a2...


I have a very different suggestion. This codebase (RPI Engine [1]) is what I initially cut my teeth on, and I learned a lot about good program design just by seeing what worked and what didn't. Reading and understanding code that's stood the test of time can also be quite valuable because you can see which patterns can survive lots of people touching it and which patterns start to fall apart when the original designer isn't available to onboard new people. MUDs develop through time with a few concurrent developers at most, and generally have stretches where there are no active developers, or the people executing code changes are learning it as they go.

I'd suggest this codebase as an excellent lesson in how bloat and complexity enter into the picture over time - I wish the actual commit history was available, but unfortunately the open source release was just a snapshot in time.

1. https://github.com/webbj74/RPI-Engine


Something I find really helpful is to start with a question that I want to answer.

Often this will be along the lines of "How does it do X?" - where X is something I either didn't know was possible or that I suspect to be really difficult.

Then I can dive in to the codebase (usually starting with GitHub code search) and try to figure out how they do it.

This helps me skip straight past the boilerplate and means I often get to a satisfying conclusion - where I've learned something new - in a very small amount of time.

And along the way I pick up knowledge about how their code is organized and often a few other tricks too.


One recent example: I wanted to know if the SQLite package in Python took any steps to avoid calling "interrupt" on a closed connection, which the SQLite C documentation warns against.

A couple of searches against https://github.com/python/cpython led me to this code here: https://github.com/python/cpython/blob/4674fd4e938eb4a29ccd5...
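You can also poke at the behavior from Python directly. In the CPython versions I'm familiar with, calling `interrupt()` on a closed connection raises `sqlite3.ProgrammingError` rather than reaching the SQLite C API; treat that as an observation to verify on your own version, not a description of how the linked code is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.close()

try:
    conn.interrupt()
    outcome = "no error"
except sqlite3.ProgrammingError as exc:
    # The module refuses to operate on a closed connection.
    outcome = f"ProgrammingError: {exc}"

print(outcome)
```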


It's nice that code that I wrote more than a decade ago is mentioned here.


Thank you very much for building this, I benefit from it every day!

Since you're here... there was actually a question raised on the SQLite forum about that code and whether it is genuinely safe against a specific race condition... and I don't have nearly enough Python C knowledge to know the answer!

Does this look like it could be a problem to you? https://sqlite.org/forum/forumpost/f37ae374cc


And it's even nicer that you get the chance to see it!


Anything written in Zig or Go.

Both languages are extremely readable, even when looking at unfamiliar code.

The Zig standard library is small, yet covers a lot of common tools and structures. Every file contains implementations of one particular thing, so you can casually browse random files and understand what's going on without having to understand the entire context.


Couldn't disagree more in the Go case. Folder-level namespaces (rather than file-level) make Go exceptionally annoying to navigate, and interfaces heavily obscure the.. interface between abstractions and implementations.


I'll +1 Zig code, but I came here to say this about Go as well. Once Go codebases creep past medium sized (25-50k LOC perhaps) into large I find them to be inscrutable. I've noticed this even in projects that people in the community point to as the gold standard such as the various HashiCorp tools.

I don't write much Go anymore, but my hunch is that it's a combination of the package layout and auto method delegation for embedded structs. Even Java does a much better job of helping the developer by making the interfaces between different subsystems obvious.


Java basically does the same thing and I've never heard anyone complaining about that.

Python does the file-level thing and it's a source of constant annoyance with cyclical imports. Ugh, I've wasted so many hours on fixing it.


> Java basically does the same thing and I've never heard anyone complaining about that.

I don't love dealing with Java, but it has none of these particular issues. It's easy to find the definition of a class, because it enforces that the file path must match the fully qualified class name for all public classes (and this tends to be followed for privates as well in practice). Finding the implementation isn't quite trivial, but if you know a class in the tree then you can easily find the whole parent tree by following the "extends Foo" clauses, or find subclasses by grepping for it instead.


It seems like gopls solves your problems. It will do the correct thing for navigating to definitions and listing implementations of interfaces.


In my experience gopls hasn't been useful for much more than crashing.

But that aside, Go's incessant insistence on structural typing makes it impossible for any tool to generate a list that is both complete and free from false positives.
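Python's `typing.Protocol` is also structural, so it can illustrate the false-positive half of this. The sketch below is an analogy rather than Go, and `Closer`, `File`, and `Door` are hypothetical names: `Door` never declares any relationship to `Closer`, yet a structural check cannot tell it apart from `File`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closer(Protocol):
    """Anything with a close() method satisfies this, declared or not."""

    def close(self) -> None: ...


class File:
    """Written with Closer in mind."""

    def close(self) -> None:
        print("file closed")


class Door:
    """Unrelated to resource handling, but has a matching method."""

    def close(self) -> None:
        print("door closed")


candidates = [File, Door, int]
matches = [cls.__name__ for cls in candidates if issubclass(cls, Closer)]
print(matches)  # ['File', 'Door']
```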


This is a massive generalisation. It is very easy to write bad code in any language.

Well, bad on a large scale. Go in particular has some nice tools to ensure code at a small scale is always good (enforcing syntax style), but no language can stop you from having a bad project architecture.


Go basically codifies a lot of best practices for writing C++ at Google, but I wouldn’t say it always teaches good code. Good Go would definitely help you learn how to write good systems code in any systems programming language though, and to be explicit/clear instead of terse.

But you can easily code yourself into a corner with Go too. If someone doesn’t know how to use concurrency well they can do bad things like overcreating goroutines or making a mess with channels. And some of the common patterns (particularly excessively overriding things) can be considered an anti-pattern in terms of understanding (IMO).


Absolutely this. Especially Go's standard library is a pleasure to read. Lots of idioms and good practices to be learnt.


A bit old now of course but both Underscore [1] and Backbone [2] have annotated sources and are a pleasure to read.

1. https://underscorejs.org/docs/underscore-esm.html

2. https://backbonejs.org/docs/backbone.html


In past threads, people have mentioned enjoying my Tarsnap (https://github.com/Tarsnap/tarsnap) code. I personally think that the spiped (https://github.com/Tarsnap/spiped) code is even better.


When people ask this question about Python codebases, I always recommend the Shodan Python client - https://github.com/achillean/shodan-python

It is easy to read and has taught me some neat Python-isms.


I find Ramda very easy to read! It's a functional Javascript library based on currying and composition. https://github.com/ramda/ramda/

I find a lot of code fairly alienating to read. Lots of codebases require you to get into the "mindset" of the person who wrote the code: their idioms, assumptions, patterns they lean on, etc. So unless you've got the time to get deep into it, the insights you can draw from reading it are minimal.

Ramda, by comparison, is just a library of utility functions, and all of those utilities perform very simple operations: merging, plucking, appending, equality checking, etc.

There's a lot of intention in the Ramda API as well. All functions are "data last," meaning that the actual piece of data you're operating on is the final argument to every function. This enables you to write Ramda code that is very structurally consistent: function parameters first, data last, every time.

It gives me a sense of empowerment, reading the code. It's like "This doesn't have to be rocket science. If you just start from these basic operations, and write those basic operations with a simple but strict ideology of 'data last' every time, and stick them together like lego blocks using compose, then you can achieve some very cool stuff with very little code."
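Here is a small Python sketch of the "data last" idea, not Ramda's implementation. Each helper takes its parameters first and returns a function that takes the data, which is a simplified stand-in for Ramda's auto-currying. The helper names mirror Ramda's `pluck`, `take`, `append`, and `compose`, but the bodies are assumptions written for this example.

```python
from functools import reduce


def compose(*funcs):
    """Compose right to left: compose(f, g)(x) == f(g(x))."""
    return lambda data: reduce(lambda acc, f: f(acc), reversed(funcs), data)


def pluck(key):
    """Parameters first: return a function that extracts key from each row."""
    return lambda rows: [row[key] for row in rows]


def take(n):
    """Return a function that keeps the first n items."""
    return lambda items: items[:n]


def append(item):
    """Return a function that adds item to the end of a new list."""
    return lambda items: [*items, item]


# Every block has the same shape (data in, data out), so they snap together.
names_plus_guest = compose(append("guest"), take(2), pluck("name"))

users = [{"name": "ada"}, {"name": "bob"}, {"name": "cy"}]
print(names_plus_guest(users))  # ['ada', 'bob', 'guest']
```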


To be honest, I don’t know any code bases I would call “great” or “easy to read” but I can tell you what I do when I need to work in codebases I don’t know.

I’ve got three main strategies:

1) I look at the part of the app I want to modify when I use the app and search for that part in the code. Once I’ve found that code I roughly try to find out how that code works by adding exploratory code (you can also use a debugger). Once I “think” I know what is going on I try to modify the code. This is where you usually find some exceptions or misunderstandings on your part if you haven’t touched the code before. If you are lucky and work in a team, somebody can tell you in a code review what you didn’t understand. If you are alone you will have to see things blow up, debug and fix the problem.

2) You can try to figure out from the main entry point how the app works. This works better for some apps than for others. If you have an event based app this is most likely just a supplement to method 1, if you have a cli app or some type of data munching app this can replace method 1.

3) You can try looking at early versions of a code base in Git to get an understanding of its architecture before the app became “more complex”.

You will always be a bit overwhelmed by any code base, and many code bases are just too large for a single person, so get comfortable working on “parts” of an app first rather than working on or understanding “the whole thing”. Also, code reading is not like reading books; code is way, way denser than any book you can read (and that includes Heidegger), so you will not just “read” it, you will need to work with it. Zed Shaw’s “Learn X the Hard Way” series relies on you working with the code to understand it. The same holds true for code you “read”: you will at least need to try to “run” the code in your mind if you can’t run it for real.

You might also want to get over your thing about frameworks. Qt, GTK, Ruby on Rails, React, ncurses: frameworks and libs are in just about any app, and many apps that get larger might extract significant parts of their functionality into libs or frameworks. A lot of boilerplate is usually a good indication that an app could benefit from a framework. I never understood the “I want to be free from the constraints of frameworks” people. Their code bases usually have the start of multiple architectures and a lot of boilerplate code. I think they always search for some “perfect” solution and just can’t find it. The truth is, libs and frameworks are great: they give you an easy in on a new app and they give you documentation that probably wouldn’t exist on fully home-grown code. In other words, they make “reading” code easier.


I've found the Chef project (https://github.com/chef/chef) to be high quality and easily readable but I've been working with Chef for like 8 years at this point which might be influencing how I view it.

HashiCorp projects also seem very well done, especially given how extensible they are.


Pihole [1] is mostly written in bash, which reads rather well, as far as I am concerned.

[1] https://github.com/pi-hole/pi-hole


I have found using github's language search to be helpful for this sort of thing.

If you are using ruby, for instance, just search for https://github.com/search?q=language%3Aruby and look for popular codebases. You can decide which are beautiful for yourself.


I think my favourite open source project to poke around in recently is Reshade (https://github.com/crosire/reshade). The code is pretty readable and is doing a lot of interesting stuff. Every time I've taken a look at it I've learned something new. Definitely super light on boilerplate, given that it's solving a bit of a unique problem.

In terms of tips and tricks, I often start looking at new code by trying to write out, in plain English prose, a bit of a story of how the code works. Almost like I'm writing a blog post explaining how things work to someone else. Often this process uncovers rabbit holes that I need to go down to understand isolated bits of logic before I can return to building this big picture view, which is sort of the point.


Every time that I can't figure out how to do something with Django, I just read the code [1] and then everything is easy and clear.

[1]: https://github.com/django/django


I really like DWM: https://git.suckless.org/dwm/

If you have a Linux machine, you can compile and install manually by just following the instructions on the README.

Then you can customize the window manager by copying and pasting the patches into your version and recompiling. That forces you to learn how to build and extend your own window manager in pure C. And it isn’t hard at all, even for a beginner.

That inspired the creation of many tiling window managers, because people understood the code and decided to build their own, like i3 or xmonad.

The project also features other easy to read C apps, like ST terminal and the surf web browser.


Look through the stack you’re familiar with. For me that means nginx, uwsgi, flask, sqlalchemy, alembic - but I’ll look at anything I have a question about.

My trick is to dig in when something doesn’t work the way I expect. Or someone says “I don’t think there’s a way to do X with blah”. My immediate reaction is to clone the code and take a look. I have a “tools” folder on my local machine that contains many of the tools / libraries I use.

Orientation is easier than you expect. The easiest scenarios are around “why did I get that error” situations. Grep for the error and away you go. But having a question to answer will definitely give you a direction to investigate.
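If you don't have grep handy, the "grep for the error" step can be sketched in a few lines of Python. This is only an illustration: `grep_tree`, the extension list, and the `tools/flask` path are assumptions made up for the example.

```python
import os


def grep_tree(root, needle, extensions=(".py", ".c", ".h")):
    """Yield (path, line_number, line) for every line under `root` containing `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # skip files we don't consider source code
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip()


# Hypothetical usage: search a local clone for the text of an error message.
for path, number, line in grep_tree("tools/flask", "Working outside of application context"):
    print(f"{path}:{number}: {line}")
```

Each hit is a place where the message is raised or defined, which gives you a starting point to read outward from.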


I agree about having an error to investigate or a question to answer. Some of the Python libraries with great code I'd recommend reading are Flask and Werkzeug. Both have very clean interfaces and excellent documentation. Sqlalchemy is somewhere in the middle. As an ORM it has a high minimum level of complexity. But the codebase is still reasonably well organized. Looking for an answer to something like "how does it emit a JOIN" may be a lot harder than you expect though.

For packages to avoid, stay away from Celery. It's just... icky.


Yup, sqla is super meta and, thus, complicated (just due to the problem space). Better is alembic / alembicutils where you can dig into the autogenerate system they’re layering over sqla.

Weirdly enough, I enjoy digging through C and Java libs more, mostly because they’re more unfamiliar to me. I’d spend more time in Postgres / fontforge / mupdf / pdfbox / nginx / uwsgi on the whole.


I think the recommendation to study Flask's code base is an echo from years ago, when it was considered a micro/toy framework. I used to recommend it too. As a more seasoned programmer, I don't anymore. Having used and studied the framework, I've seen some of the limitations that directly stem from some of its design choices. Today, I think it tried to be needlessly clever sometimes. Some of those choices, which seemed fun and clever back in the day, did not age well. As time went on and people started pushing the framework, some artefacts required hackish workarounds. And those kept accumulating, to such an extent that the codebase now has a bunch of flags that control its contingent behavior; ifs and buts that make the whole thing harder to reason about. You'll still learn lots of clever Python tricks, but I wouldn't put it in the "clean and clear" category.


Doom 3 is a perennial favorite for "most beautiful C++ codebase" lists [0]

[0] https://github.com/id-Software/DOOM-3-BFG


I was going to comment the same thing. All the Carmack-era id Software open source code (Quake, Doom) is very nicely structured and quite easy to grok.


This is a very interesting question.

Are you interested in any particular languages?

For Python, take a look at: https://github.com/psf/requests


I initially had Python in the title but I removed it to give way to a broader discussion. Definitely checking this one out


Kenneth Reitz actually wrote a book called The Hitchhiker’s Guide to Python! which includes a chapter on Reading Great Code.

https://docs.python-guide.org/writing/reading/


That looks like exactly what I was looking for, thanks for the resource


I hate to be the "it's complicated" guy but "excellent" is too broad.

I see every day code that is elegant but has bugs, ugly code that is foolproof, optimized code that performs abysmally because of some architecture change that happened in between, and a lot of abominations that make the code bad for guy A and good for guy B (e.g. a neat typechecked, object-oriented, very elegant, Pythonic numerical code that is 100 times more confusing for your research level numerical analyst than an uglier but functional Matlab script).

What I agree on is "the best way to improve X in my code" is "read code that has quality X".

Given the broadness of your question I suspect you are still finding your way around programming in general. If that's the case my method is to be driven by curiosity.

- Why does macOS behave this way? Let's look up xnu's code

- I wonder about list implementation... Let's look at CPython code for appending items to a list

And so on... There is a lot of open code for stuff we are using everyday. It is interesting to get into it.
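To make the list example concrete, here is a small experiment you could run before opening the C source. It assumes CPython, where `sys.getsizeof` reflects the list's allocated storage; `resize_points` is a made-up helper name, and the exact numbers it prints vary between versions and platforms.

```python
import sys


def resize_points(n):
    """Return the list lengths at which an append changed the allocated size."""
    items = []
    points = []
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list grew its backing storage on this append.
            points.append(len(items))
            last_size = size
    return points


print(resize_points(64))
```

On CPython you should see far fewer resize points than appends, which is the over-allocation you can then go read about in `Objects/listobject.c`.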


You might want to join the https://codereading.club/


I'm surprised no one has mentioned reading tests as a good starting point. Anyway, besides main, tests are usually a good place to start too.


This guy made a HN mobile reader and put all the code on Github for his NDC Oslo presentation, it was good and shows off very readable asynchronous code in C#:

https://github.com/brminnick/AsyncAwaitBestPractices


Thanks for the kind words!

I’ve also published an open-source iOS + Android app to the App Stores, called GitTrends that leverages my AsyncAwaitBestPractices library if anyone wants to see how to use it in a real/live production app!

The source code for GitTrends is available here: https://gittrends.com


I see that you're primarily looking into Python work, so I'd recommend `smart_open` as a nice, compact way to get started.

https://github.com/RaRe-Technologies/smart_open


The zig stdlib has been good reading so far. You also basically have to read it if you want to use it.


For C, I've yet to see better code than ReactOS. Look at how they keep even monstrous functions readable: https://github.com/reactos/reactos/blob/3fa57b8ff7fcee47b8e2...

For C++, try Chromium: https://chromium.googlesource.com/chromium/chromium/+/refs/h...


GitLab is an excellent example of a large, complex Rails codebase: https://gitlab.com/gitlab-org/gitlab/


Stockfish is well written, commented, and documented C++ code:

https://github.com/official-stockfish/Stockfish


I remember back in the day reading parts of the Python standard library. I don't know if that's generally good advice or still viable, but that's what I did, and I found it helpful. It was directly available, and usually connected to things I used with Python.

One upside of this might also be that it's not as you said boilerplate, because it's very foundational and not heavily using other stuff. It also is well documented, so you'll find good explanations why things are the way they are.


A lot of the Java concurrency primitives written by Doug Lea and co. are great reads, and very well commented. See the source of `ConcurrentHashMap` for example: https://github.com/openjdk/jdk/blob/master/src/java.base/sha...



Depending on your interest, I could vouch for OpenBSD having a very clean readable codebase. Often it has some of the best practices coded in with useful commentary.


For TypeScript, Ghost: https://github.com/TryGhost/Ghost


Maybe I'm missing something but this repo looks like it's exclusively Javascript with no typescript.


Wordpress is pretty great.

https://github.com/WordPress/WordPress



I've had a look at NetBSD's codebase before. It was fairly easy to follow.

I've also heard good things said for OpenBSD's readability.


Reading and using YUI3 (https://github.com/yui/yui3) took my JavaScript to the next level. It's no longer relevant because of improvements to the language, but it's the best model of readable JavaScript I've ever seen.


Postgres


Most sections of the codebase that are actively developed are very readable, but I still got quite lost in the core parts of xact/multixact recently. I feel that is more of an exception, though.


Box2D https://github.com/erincatto/box2d I went over every file of this writing a Unity plugin for it in work once. I was really impressed, learned a lot.


The book The Programmer's Brain contains a lot of tips on improving code reading skills - https://www.manning.com/books/the-programmers-brain


I think it can be hard to recommend a particular codebase. Well-written code can be good to read, but if you want to become better at a language or problem domain then sometimes reading badly written code may be a better way to learn.

Working through some badly written code that actually performs well can be a real eye opener. I mainly work in C and reading some legacy code (sometimes even my own) can be a challenge to work out exactly what's going on.

If you want to learn how an algorithm works, then a good clean codebase with lots of comments is a good way to go. If you want to learn the details of a particular language, then just read a lot of code in that language whether it’s good or bad.


For anyone looking for a (nontrivial) C# project, I can only recommend going through ILSpy decompiler. https://github.com/icsharpcode/ilspy


I had to modify FFmpeg for a job and I found it surprisingly accessible and easy to read/modify: https://github.com/FFmpeg/FFmpeg



It's been years since I've looked but I remember being impressed by the NGINX codebase. https://github.com/nginx/nginx


I’ve learned a TON from the [okhttp3](https://square.github.io/okhttp/) codebase, highly recommend studying it.



Read the source code of the libraries you use in your current projects. It helps you understand them better and improves your coding skills. You can start with a small feature, an API, a util or a configuration.
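In Python, for instance, you don't even need to leave the interpreter to start. A minimal sketch using the standard `inspect` module; `show_source` is a hypothetical helper, and `json.dumps` is just a convenient example of a library function written in pure Python.

```python
import inspect
import json


def show_source(obj, max_lines=20):
    """Return the first `max_lines` lines of the source code for `obj`."""
    lines = inspect.getsource(obj).splitlines()
    return "\n".join(lines[:max_lines])


# Where does the function live, and what does it look like?
print(inspect.getsourcefile(json.dumps))
print(show_source(json.dumps))
```

This only works for code implemented in Python; functions implemented in C will raise an error, and for those you have to go to the repository.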


Not my current employer.

For jumping into new codebases I stick to the Jetbrains toolbox because it’s usually a consistent enough environment to investigate a new codebase. I also greatly appreciate the indexing.


Noda time is very clean/well written IMO -> https://github.com/nodatime/nodatime


Prefect workflow orchestrator: https://github.com/PrefectHQ/prefect



I think `xsv` is easy to read. I have a fork of it for personal use and it was easy to add features to it even though I'm not a rust daily user.


Honestly, the SQLite codebase is a fantastic read.


You don't get good at a language by just listening to it all the time. You get good by engaging. Same goes for programming.

Also, a lot of "clean code" stuff can be confusing dogma.

You should try building things you find interesting, and try to build them in a way that "feels correct", and try to emphasize - what if someone else was reading this? What if someone else dived into this codebase to add this feature? Could they?


- Anything from suckless

- Lua

- Redis

- idtech3

- libuv

- linux kernel

- sqlite

As much as Ruby, Python, and Go are touted as elegant or clean to read, they are pretty horrible to read in the wild. C is where it's at.


Anything written in Lisp/Scheme


Any explicit examples?

Starting to explore scheme more and would be interested in some good pointers


I really enjoyed working with the Redis codebase. Great, easy to understand C code.


I wish people mentioned the language of the repo they are sharing, in their posts.


> Do you have tips and tricks you use to read codebases?

#1: If the codebase is huge, you can't read all of it. So you'd best know how to navigate it.

#2: You need an IDE or cscope-like tool to navigate a codebase. The codebase is like a web of, say, wikipedia articles, and you're going to have to browse it a lot like how you'd browse wikipedia. Symbols are links!

#3: It helps to understand the big picture. What does this codebase implement? Where are the "entry points" -- where to start reading? What's the architecture? (E.g., Java is a byte-compiled language with a bytecode interpreter known as a JVM.) What's the design look like?

#4: If it's just for fun, well, just browse till you find something interesting, then read it carefully, and go spelunking like it's a wikipedia article.

#5: If you're reading it to debug something, you need to first find the relevant entry points.

#6: If you're reading it to add features, you really need to read the developer docs (if they exist), the internals docs (if they exist), and figure out a lot of things like APIs exported, internal utilities libraries, portability layers, external dependencies, protocols, etc. This will take time, and that's ok. Start with small features, and work your way. You'll build a deeper understanding as you go.

#7: You don't have to understand all that much about the codebase in question, and it might not be possible to if we're talking about a codebase that's in the hundreds of millions of lines of code. You'll have to specialize as you dive deep, and generalize as you wade "near the top".

#8: It can take time to pick up these skills to the point where you can do this quickly. And even then, it can take time to understand a large codebase well enough. There's just a ton of detail that you have to digest into a mental picture that's sufficiently high-level that you can use it productively. So be patient, and keep on going. Just because it's a lot to learn, you shouldn't be discouraged.

To really deal with huge codebases, you have to be a bit like a generalist who can specialize as needed.

For example, if you're reading the OpenJDK, you'll want to understand what Java is, what the JVM is, and so on. You won't have to understand all of that if you just want to read the OpenJDK implementation of, say, TLS, but you will sometimes have to navigate outside that particular bit of the OpenJDK. And if you tease out code threads far enough, you will probably learn a thing or three about seemingly unrelated things like the GC.

Get comfortable doing these things, and you'll be able to deal with codebases in the millions of lines of code.
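To illustrate the "symbols are links" idea from #2, here is a toy symbol index for Python source built on the standard `ast` module. It is a sketch of what tools like cscope or an IDE do, not how they actually work; `index_symbols`, `find_calls`, and `SAMPLE` are names invented for the example.

```python
import ast


def index_symbols(source, filename="<string>"):
    """Map each function or class name to the line numbers where it is defined."""
    index = {}
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index.setdefault(node.name, []).append(node.lineno)
    return index


def find_calls(source, name):
    """Return the line numbers where `name` is called as a plain function."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == name):
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = '''
def parse(text):
    return text.split()

def main():
    return parse("a b")
'''

print(index_symbols(SAMPLE))      # definitions: where each "link" points
print(find_calls(SAMPLE, "parse"))  # references: where the "link" is used
```

Jumping from a call site to a definition and back is the browsing loop described above; a real tool does the same across thousands of files and handles methods, imports, and macros that this sketch ignores.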


LevelDB


+1000

https://github.com/google/leveldb

Jeff Dean and Sanjay Ghemawat are amazing engineers and this code is (/was?) nice.


My trick in Go projects is to use sqlc to generate code from SQL, which is a great time saver and reduces errors. I'm glad to avoid an ORM for as long as possible and to keep frameworks minimal. It gets my job done and lets me spend more time on business logic.

Add Tailwind on top, and nothing locks you in.


I was always impressed with Near's emulators, RIP.


cs.chromium.org is an example of how tooling can drastically help with readability. It's incredibly easy to navigate the codebase.


I’m a fan of both SQLite and Postgres


Codemirror 6


git, curl & nginx.



