How to Troubleshoot Linux Problems (Server Crash, Out of Memory): Diagnosing and Fixing Common Issues

When your Linux server seems to be offline or otherwise inaccessible, you should still be able to log in through the web console or a VNC connection.

 

Steps to follow when troubleshooting:

  1. Identify the problem: Determine the specific issue or error you’re experiencing, and collect as much information as possible about it.

  2.  Check system logs: Look at system logs, such as the kernel log, syslog, or the systemd journal, to see if there are any relevant messages or errors.

  3.  Verify system configuration: Check configuration files, such as /etc/fstab, /etc/network/interfaces, or /etc/sudoers, to ensure they are correctly configured.

  4.  Test hardware: If you suspect a hardware issue, run diagnostic tools to test your hard disk, memory, CPU, and other components.

  5.  Check network connectivity: Use ping, traceroute, or other diagnostic tools to verify connectivity to remote hosts or the internet.

  6.  Update and upgrade packages: Ensure that all packages on your system are up-to-date and have the latest security patches installed.

  7.  Check for conflicting software: If you have recently installed new software or made configuration changes, check for conflicts or compatibility issues with existing software.

  8.  Restart services: Try restarting relevant services, such as Apache, MySQL, or SSH, to see if that resolves the issue.

  9.  Reboot the system: If all else fails, try rebooting the system to see if that clears up any issues.
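For step 2, one quick check after a crash is whether the kernel's out-of-memory (OOM) killer was involved. The sketch below filters kernel log text for the messages the OOM killer typically writes. The function name oom_events is made up for this example, and the exact message wording can differ between kernel versions, so treat the pattern as a starting point.

```shell
# Filter kernel log text on stdin down to lines that look like OOM killer activity.
# The pattern is an assumption based on common kernel messages such as
# "invoked oom-killer" and "Out of memory: Killed process ...".
oom_events() {
    grep -i -E 'invoked oom-killer|out of memory: kill'
}

# Usage (reading the kernel log may require root):
#   dmesg | oom_events
#   journalctl -k --no-pager | oom_events
```

If the function prints nothing, the crash was probably not caused by the OOM killer, and the other steps are the next place to look.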

 

Here’s a general troubleshooting guide:

  1. Identify the problem:

    • Check system logs: Examine system log files like /var/log/messages (Red Hat-based systems), /var/log/syslog (Debian-based systems), or /var/log/dmesg for any error messages or indications of what went wrong.

    • Check application logs: Look into logs specific to the application or service that experienced the issue. The location of these logs may vary depending on the application.

  2. Check system resources:

    • Memory usage: Use the free or top command to check the available memory and identify if the system is running low on memory.

    • CPU usage: Use the top or htop command to monitor CPU usage and check if any processes are consuming excessive resources.

    • Disk space: Verify that there is enough free disk space using the df command.

  3. Investigate specific issues:

    • Server crashes: If the server crashed, it could be due to hardware issues, kernel panics, or software bugs. Analyze the system logs to find any error messages or clues about the cause.

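    • Out-of-memory errors: If the system runs out of memory, the kernel's OOM killer terminates processes and records this in the kernel log. Identify which process is consuming excessive memory using the top command (press Shift+M to sort by memory usage). The cause could be a memory leak or a misconfigured application. Consider optimizing or restarting the problematic process.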

  4. Take preventive measures:

    • Keep the system updated: Regularly update the kernel, software packages, and security patches to ensure stability and fix any known issues.

    • Monitor system performance: Use tools like Nagios, Zabbix, or Prometheus to monitor system resources, services, and network activity. This helps identify potential issues before they become critical.

    • Implement resource limits: Set resource limits (ulimits) for processes to prevent them from consuming excessive memory, CPU, or other resources.
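The memory check from step 2 can also be scripted, which is handy for a cron job or a quick look over a slow console. This is a minimal sketch that assumes a kernel exposing MemAvailable in /proc/meminfo (Linux 3.14 and later). The function names and the 10% default threshold are illustrative choices, not fixed recommendations.

```shell
# Print available memory as a whole-number percentage of total memory.
# Reads meminfo-formatted text from the file given as $1 (default: /proc/meminfo).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total > 0) printf "%d\n", avail * 100 / total
             else exit 1
         }' "${1:-/proc/meminfo}"
}

# Warn when available memory drops below a threshold percentage.
# The default of 10 is a hypothetical value; pick one that suits your workload.
check_memory() {
    threshold="${1:-10}"
    pct="$(mem_available_pct)" || { echo "could not read /proc/meminfo" >&2; return 2; }
    if [ "$pct" -lt "$threshold" ]; then
        echo "WARNING: only ${pct}% of memory is available"
        return 1
    fi
    echo "OK: ${pct}% of memory is available"
}
```

Running check_memory 15 would warn once less than 15% of memory is available. The free command reports the same figure in its "available" column.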

 

Commands to use for troubleshooting:

  1. dmesg: Displays kernel messages and can help you identify hardware-related issues.

  2. journalctl: Displays system logs, including messages from the systemd service manager.

  3. lsblk: Lists all available block devices, such as hard disks and USB drives.

  4. df: Displays disk usage information for all mounted file systems.

  5. ip or ifconfig: Displays network interface information, including IP addresses and network configuration. On current distributions, ip (for example, ip addr) replaces the deprecated ifconfig.

  6. ping: Tests network connectivity by sending ICMP echo requests to a remote host.

    When the server is unreachable, ping reports 100% packet loss or a "Destination Host Unreachable" error.

    When the server is reachable, each request gets a reply and the summary shows 0% packet loss.

  7. traceroute: Displays the route taken by packets to reach a remote host.

  8. apt-get or yum: Package managers for Debian-based or Red Hat-based distributions, respectively. Use these to install, remove, or update packages on your system.

  9. systemctl: Command-line interface for controlling the systemd system and service manager.

  10. ps: Lists running processes on your system and their status.
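A full disk is a common cause of crashes, so it can help to turn the df output into a short list of file systems that need attention. The sketch below reads the output of df -P (the portable POSIX format) and prints every mount point at or above a usage limit. The function name full_filesystems and the 90% default are assumptions for this example, and the parsing assumes device names and mount points without spaces.

```shell
# Read `df -P` output on stdin and print mount points whose usage is at or
# above the given percentage (default 90, a hypothetical threshold).
# Assumes neither the device name nor the mount point contains spaces.
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Usage:
#   df -P | full_filesystems 90
```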

     

 

Additional troubleshooting steps and commands you can use:

Check network configuration: Verify the network configuration files to ensure they are correctly set up. The location of these files may vary depending on the distribution. For example:

  • For Ubuntu and Debian: /etc/network/interfaces (recent Ubuntu releases use Netplan files under /etc/netplan/ instead)

  • For CentOS and Red Hat: /etc/sysconfig/network-scripts/ifcfg-<interface>

Check firewall settings: Verify the firewall rules to ensure that necessary ports are open and accessible. Use the following commands:

  • For iptables: sudo iptables -L

  • For firewalld: sudo firewall-cmd --list-all

Check DNS resolution: Verify that the DNS configuration is correct and that the server can resolve domain names. Use the following commands:

  • nslookup <domain>

  • dig <domain>
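nslookup and dig query DNS servers directly, while most applications resolve names through the system resolver configured in /etc/nsswitch.conf (usually /etc/hosts first, then DNS). To test that path from a script, getent hosts is a reasonable choice. A small sketch, where the function name resolve_host is an assumption for this example:

```shell
# Resolve a host name through the system resolver and print the first address.
# Returns non-zero when the name does not resolve.
resolve_host() {
    addr="$(getent hosts "$1" | awk '{ print $1; exit }')"
    if [ -n "$addr" ]; then
        echo "$addr"
    else
        return 1
    fi
}

# Usage:
#   resolve_host <domain> || echo "DNS resolution failed"
```

If dig resolves a name but resolve_host does not, the DNS server is probably fine and the local resolver configuration is the more likely suspect.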

Check service status: Check the status of specific services that are essential for the server’s operation. Use the following commands:

  • systemctl status <service>: Check the status of a systemd service.

  • service <service> status: Check the status of a service (SysVinit).
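When several services matter, checking them in one pass saves time. The sketch below uses systemctl is-active where systemd is present and falls back to the service command otherwise. The function names are made up for this example, and service names differ between distributions (for example, ssh on Debian and sshd on Red Hat), so adjust the arguments to your system.

```shell
# Return success when the named service is running.
# Prefers systemd; falls back to the SysVinit-style `service` command.
service_running() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl is-active --quiet "$1"
    else
        service "$1" status >/dev/null 2>&1
    fi
}

# Print a one-line status for each service name given as an argument.
report_services() {
    for svc in "$@"; do
        if service_running "$svc"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
        fi
    done
}

# Usage:
#   report_services sshd nginx
```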

Check open ports: Use the netstat command (or ss, its replacement on newer distributions) to check which ports are open and which processes are listening on them. For example:

  • netstat -tuln: Display all listening TCP and UDP ports.
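To check for one specific port instead of reading the whole listing, the output can be piped through a small filter. This sketch works with both netstat -tuln and ss -tuln, because it only looks for a local address field ending in a colon followed by the port number. The function name port_in_listing is an assumption for this example.

```shell
# Read `netstat -tuln` or `ss -tuln` output on stdin.
# Succeed if any field ends in ":<port>", i.e. something is listening on it.
port_in_listing() {
    awk -v port="$1" '
        { for (i = 1; i <= NF; i++) if ($i ~ (":" port "$")) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage:
#   netstat -tuln | port_in_listing 22 && echo "something is listening on port 22"
#   ss -tuln | port_in_listing 22 || echo "nothing is listening on port 22"
```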

Verify file permissions: Use the ls -l command to view the permissions and ownership of files.

Test connectivity to specific hosts or services: Use tools such as telnet, curl, or nc to check if you can establish connections to remote hosts or services.
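A port test with nc can be wrapped in a function so the result is easy to use in scripts. This is a sketch that assumes a netcat build supporting the -z (scan without sending data) and -w (timeout in seconds) options, which OpenBSD netcat and recent Ncat versions do. The function name tcp_port_open and the 3-second default timeout are illustrative choices.

```shell
# Succeed if a TCP connection to host $1 on port $2 can be opened.
# $3 is an optional timeout in seconds (default 3, a hypothetical value).
tcp_port_open() {
    nc -z -w "${3:-3}" "$1" "$2" >/dev/null 2>&1
}

# Usage:
#   if tcp_port_open <host> 22; then
#       echo "SSH port is reachable"
#   else
#       echo "SSH port is NOT reachable"
#   fi
```

A host that answers ping but fails this test usually points to a stopped service or a firewall rule, not a network outage.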

 

Conclusion:

Troubleshooting steps can vary depending on the specific issue and Linux distribution you are using. Adapt the steps accordingly based on your situation and refer to the documentation or support resources specific to your distribution if needed.