Backend Fundamentals, Databases, API Design, and Application Architecture
Backend Basics
- What are Git and Git Flow? How are they used?
- Git = a set of immutable objects (blob / tree / commit) + an index + a working directory.
- blob = the file content itself (immutable snapshot). tree = the project’s directory structure, not the index. index = the staging area (the prep area for the next commit). commit = a timeline node (points to a tree + parent commits).
- When running
git add, Git stores the file content as an immutable blob and writes it into the index, recording which files will be included in the next commit. - When running
git commit, Git creates a tree snapshot based on the index and then creates a commit with timestamp and history links. - blob stores content, tree stores directory structure, commit forms the real timeline. All objects use hashes as their unique identity.
- What is Nginx? What are its uses?
- Nginx is a lightweight and high-performance web server. It sits at the front, receiving all user requests, handling routing, forwarding, load distribution, and caching, while the application server focuses solely on business logic. This makes systems safer, faster, more stable, and easier to scale.
- Core use cases:
- Reverse Proxy: Nginx stands in front and forwards all incoming HTTP requests to backend apps. Backends no longer need to be exposed to the public internet.
- Load Balancing: Nginx distributes traffic so multiple backend servers behave like a single, “infinitely scalable” server.
- HTTP Caching: Extremely fast caching layer that stores hot content in memory and returns it instantly.
- What is a CDN? Why use a CDN?
- CDN = Content Delivery Network.
- CDN places cache nodes around the world so users fetch content from nearby nodes rather than from the origin server. This significantly reduces latency, offloads traffic, prevents bottlenecks, and enhances security. Modern websites rely heavily on CDNs for performance.
- What is Cache? How does caching work?
- Cache stores frequently accessed, time-consuming, read-heavy data in memory to speed up access. It reduces database load but introduces consistency and concurrency challenges. Common patterns include Cache Aside, Read/Write Through, and Write Behind, along with handling cache penetration, avalanche, breakdown, and hotspots.
- How does JWT work?
- JWT (JSON Web Token) is a stateless authentication mechanism. The server does not store login state; instead, clients present a token each time to prove identity. The server simply verifies the token’s signature.
- What are TCP and UDP? How do they differ?
- TCP (Transmission Control Protocol) is connection-oriented: three-way handshake, reliable, ordered, retransmission, congestion control → slower but stable.
- “Handshake” refers to the protocol exchange that ensures both parties are ready before data transfer begins.
- UDP (User Datagram Protocol) is connectionless: unreliable, unordered, no retransmission, extremely fast → ideal for real-time applications.
- What is a Rate Limiter? What strategies exist?
- It restricts the rate of incoming requests so the system is not overwhelmed.
- Common strategies: Token Bucket, Fixed Window, Sliding Window.
- Access Control Design: ACL vs RBAC
- ACL (Access Control List): users get permissions directly via lists.
- RBAC (Role-Based Access Control): permissions are assigned to roles; users get roles.
- Differences between session-based and token-based authentication
- Session-based = stateful: the server stores session data and must “remember” user state.
- Token-based = stateless: the server stores no session data; the token itself proves identity.
API Design
- What is a RESTful API? Understanding REST with real-world analogies
- API = Application Programming Interface.
- REST = Representational State Transfer.
- Core features: stateless, cacheable, client/server separation, layered architecture.
- What is GraphQL?
- GraphQL is a query language for APIs.
- GraphQL: single endpoint + query fields to specify exactly what you need → one request can fetch multiple resources with field-level precision.
- Pros: precise, fetch-all-in-one, highly extensible. Cons: complex, can introduce performance issues, higher security risks.
- SOAP vs REST: how to choose?
- SOAP (Simple Object Access Protocol) is an XML-based protocol for distributed computing.
- XML (eXtensible Markup Language) is a markup language for describing data.
- SOAP is strict, XML-based, security-heavy, reliable, transactional — used in enterprise systems (banking, insurance).
- REST is a lightweight HTTP-based architecture style, using JSON, resource-oriented, fast, scalable — dominant for web/mobile APIs.
- API Design — How to build predictable APIs (Idempotency)
- Predictability = APIs must behave consistently across normal/abnormal conditions.
- Idempotency = calling an API once or 10 times results in the same final server state.
- Use idempotent keys:
- First time a key appears → process and store the result.
- Subsequent times → return the stored result without re-executing.
Database
- SQL JOIN
- SQL = Structured Query Language.
- SQL JOIN performs set operations on two result sets, taking two sets and producing a new one.
- INNER JOIN = intersection; LEFT JOIN = all rows from left + matched rows from right; RIGHT JOIN = symmetric; FULL JOIN = union. FULL/LEFT/RIGHT JOIN + IS NULL → find unmatched rows.
FROMchooses the main table,JOINselects the table to merge,ONdefines the match rule,WHEREfilters the final rows.
SELECT orders.*
FROM users
RIGHT JOIN orders
ON users.user_id = orders.user_id
WHERE users.user_id IS NULL;
- What is database normalization? Why is normalization necessary?
- Database normalization means restructuring tables into a “clean, non-redundant, and update-safe” form. Two goals: reduce redundancy and prevent update anomalies.
- 1NF (First Normal Form): each column must contain a single atomic value.
- 2NF (Second Normal Form): every non-key attribute must depend on the entire primary key.
- 3NF (Third Normal Form): a non-key attribute A cannot depend on another non-key attribute B (A should not require B to reach the primary key).
- BCNF (Boyce–Codd Normal Form): no non-key attribute may determine a key attribute.
- What is database denormalization? What are its pros and cons?
- Database denormalization intentionally introduces redundancy—adding extra fields, pre-joined tables, or precomputed values—to reduce JOIN operations and improve query performance. Typical techniques include pre-JOINing, mirror tables, table sharding, and storing derived values.
- What is NoSQL? What are the characteristics of NoSQL databases?
- NoSQL originally meant Non-SQL, later reinterpreted as Not Only SQL: flexible, dynamic data structures.
- Key-Value: extremely fast
- Wide Column: handles massive datasets
- Document: highly flexible
- Graph: optimized for relationship networks
- What is ACID? Explain the ACID properties.
- ACID describes four essential properties of database transactions to ensure correctness under critical operations.
- A — Atomicity: a transaction either completes entirely or not at all — no partial execution.
- C — Consistency: the system must remain logically valid before and after the transaction; no constraint violations.
- I — Isolation: concurrent transactions must not interfere with each other; prevents dirty or inconsistent reads.
- D — Durability: once a transaction is committed, the data must persist, even after crashes.
- ACID ensures reliability and predictability, preventing issues like “money deducted but order not created.”
- What is ORM? What are the advantages and disadvantages of ORM?
- ORM (Object-Relational Mapping) allows developers to work with objects in code while the ORM generates SQL and interacts with the database underneath.
- Without ORM (writing raw SQL directly): developers manually write queries, manage joins, handle mappings, and maintain full control over performance and schema interactions.
cursor.execute("SELECT * FROM users WHERE id = 3;")
result = cursor.fetchone()
- 用 ORM
user = User.objects.get(id=3)
- ORM = a translator. We speak in code (objects/classes in Python/Java/TypeScript), but the database only understands SQL. ORM’s job is to translate your code into SQL so the database can execute it.